Some years ago I was interested in the following Markov chain whose state space is the positive integers. The chain begins at state "1", and from state "n" the chain next jumps to a state uniformly selected from {n+1,n+2,...,2n}.
As time goes on, this chain goes to infinity, with occasional large jumps. In any case, the chain is quite unlikely to hit any particular large n.
If you define p(n) to be the probability that this chain visits state "n", then p(n) goes to zero like c/n for some constant c. In fact,
$$ np(n) \to c = {1\over 2\log(2)-1} = 2.588699. \tag1$$
In order to prove this convergence, I recast it as an analytic problem. Using the Markov property, you can see that the sequence satisfies:
$$ p(1)=1\quad\mbox{ and }\quad p(n)=\sum_{\lceil n/2\rceil}^{n-1} {p(j)\over j}\mbox{ for }n>1. \tag2$$
For some weeks, using generating functions etc. I tried and failed to find an analytic proof of the convergence in (1). Finally, at a conference in 2003 Tom Mountford showed me a (non-trivial) probabilistic proof.
So the result is true, but since then I've continued to wonder if I missed something obvious. Perhaps there is a standard technique for showing that (2) implies (1).
Question: Is there a direct (short?, analytic?) proof of (1)?
Perhaps someone who understands sequences better than I do could take a shot at this.
Update: I'm digging through my old notes on this. I now remember that I had a proof (using generating functions) that if $\ np(n)$ converges, then the limit is $1\over{2\log (2)-1}$. It was the convergence that eluded me.
I also found some curiosities like: $\sum_{n=1}^\infty {p(n)\over n(2n+1)}={1\over 2}.$
Another update: Here is the conditional result mentioned above.
As in Qiaochu's answer, define $Q$ to be the generating function of $p(n)/n$, that is, $Q(t)=\sum_{n=1}^\infty {p(n)\over n} t^n$ for $0\leq t<1$. Differentiating gives $$Q^\prime(t)=1+{Q(t)-Q(t^2)\over 1-t}.$$ This is slightly different from Qiaochu's expression because $p(n)\neq \sum_{j=\lceil n/2\rceil}^{n-1} {p(j)\over j}$ when $n=1$, so that $p(1)$ has to be treated separately.
Differentiating again and multiplying by $1-t$, we get $$(1-t)Q^{\prime\prime}(t)=-1+2\left[Q^\prime(t)-t Q^\prime(t^2)\right],$$ that is, $$(1-t)\sum_{j=0}^\infty (j+1) p(j+2) t^j = -1+2\left[\sum_{j=1}^\infty (jp(j)) {t^j-t^{2j}\over j}\right].$$
Assume that $\lim_n np(n)=c$ exists. Letting $t\to 1$ above the left hand side gives $c$, while the right hand side is $-1+2c\log(2)$ and hence $c={1\over 2\log(2)-1}$.
Note: $\sum_{j=1}^\infty {t^j-t^{2j}\over j}=\log(1+t).$
New update: (Sept. 2)
Here's an alternative proof of the conditional result that my colleague Terry Gannon showed me in 2003.
Start with the sum $\sum_{n=2}^{2N}\ p(n)$, substitute the formula in the title, exchange the variables $j$ and $n$, and rearrange to establish the identity:
$${1\over 2}=\sum_{j=N+1}^{2N} {j-N\over j}\ {p(j)}.$$
If $jp(j)\to c$, then $1/2=\lim_{N\to\infty} \sum_{j=N+1}^{2N} {j-N\over j^2}\ c=(\log(2)-1/2)\ c,$ so that $c={1\over 2\log(2)-1}$.
New update: (Sept. 8) Despite the nice answers and interesting discussion below, I am still holding out for an (nice?, short?) analytic proof of convergence. Basic Tauberian theory is allowed :)
New update: (Sept 13) I have posted a sketch of the probabilistic proof of convergence under "A fun and frustrating recurrence sequence" in the "Publications" section of my homepage.
Final Update: (Sept 15th) The deadline is approaching, so I have decided to award the bounty to T.. Modulo the details(!), it seems that the probabilistic approach is the most likely to lead to a proof.
My sincere thanks to everyone who worked on the problem, including those who tried it but didn't post anything.
In a sense, I did get an answer to my question: there doesn't seem to be an easy, or standard proof to handle this particular sequence.
Update: the following probabilistic argument I had posted earlier shows only that $p(1) + p(2) + \dots + p(n) = (c + o(1)) \log(n)$ and not, as originally claimed, the convergence $np(n) \to c$. Until a complete proof is available [edit: George has provided one in another answer] it is not clear whether $np(n)$ converges or has some oscillation, and at the moment there is evidence in both directions. Log-periodic or other slow oscillation is known to occur in some problems where the recursion accesses many previous terms. Actually, everything I can calculate about $np(n)$ is consistent with, and in some ways suggestive of, log-periodic fluctuations, with convergence being the special case where the bounds could somehow be strengthened and the fluctuation scale thus squeezed down to zero.
$p(n) \sim c/n$ is [edit: only in average] equivalent to $p(1) + p(2) + \dots + p(n)$ being asymptotic to $c \log(n)$. The sum up to $p(n)$ is the expected time the walk spends in the interval [1,n]. For this quantity there is a simple probabilistic argument that explains (and can rigorously demonstrate) the asymptotics.
This Markov chain is a discrete approximation to a log-normal random walk. If $X$ is the position of the particle, $\log X$ behaves like a simple random walk with steps $\mu \pm \sigma$ where $\mu = 2 \log 2 - 1 = 1/c$ and $\sigma^2 = (1- \mu)^2/2$. This is true because the Markov chain is bounded between two easily analyzed random walks with continuous steps.
(Let X be the position of the particle and $n$ the number of steps; the walk starts at X=1, n=1.)
Lower bound walk $L$: at each step, multiply X by a uniform random number in [1,2] and replace n by (n+1). $\log L$ increases, on average, by $\int_1^2 log(t) dt = 2 \log(2) - 1$ at each step.
Upper bound walk $U$: at each step, jump from X to uniform random number in [X+1,2X+1] and replace n by (n+1).
$L$ and $U$ have means and variances that are the same within $O(1/n)$, where the $O()$ constants can be made explicit. Steps of $L$ are i.i.d and steps of $U$ are independent, asymptotically identical-distributed. Thus, the Central Limit theorem shows that $\log X$ after $n$ steps is approximately a Gaussian with mean $n\mu + O(\log n)$ and variance $n\sigma^2 + O(\log n)$.
The number of steps for the particle to escape the interval $[1,t]$ is therefore $({\log t})/\mu$ with fluctuations of size $A \sqrt{\log t}$ having probability that decays rapidly in A (bounded by $|A|^p \exp(-qA^2)$ for suitable constants). Thus, the sum p(1) + p(2) + ... p(n) is asymptotically equivalent to $(\log n)/(2\log (2)-1)$.
Maybe this is equivalent to the 2003 argument from the conference. If the goal is to get a proof from the generating function, it suggests that dividing by $(1-x)$ may be useful for smoothing the p(n)'s.
After getting some insight by looking at some numerical calculations to see what $np(n)-c$ looks like for large $n$, I can now describe the convergence in some detail. First, the results of the simulations strongly suggested the following.
This suggests the following asymptotic form for $p(n)$. $$ p(n)=\frac{c}{n}\left(1+\frac{a_1(n)}{n}+\frac{a_2(n)}{n^2}\right)+o\left(\frac{1}{n^3}\right) $$ where $a_1(n)$ has period 2, with $a_1(0)=0$, $a_1(1) < 0$ and $a_2(n)$ has period 4 with $a_2(0) > 0$ and $a_2(2) < 0$. In fact, we can expand $p(n)$ to arbitrary order [Edit: Actually, not quite true. See below] $$ \begin{array} {}p(n)=c\sum_{r=0}^s\frac{a_r(n)}{n^{r+1}}+O(n^{-s-2})&&(1) \end{array} $$ and the term $a_r(n)$ is periodic in $n$, with period $2^r$. Here, I have normalized $a_0(n)$ to be equal to 1.
We can compute all of the coefficients in (1). As $p$ satisfies the recurrence relation $$ \begin{array} \displaystyle p(n+1)=(1+1/n)p(n)-1_{\lbrace n\textrm{ even}\rbrace}\frac2np(n/2) -1_{\lbrace n=1\rbrace}.&&(2) \end{array} $$ we can simply plug (1) into this, expand out the terms $(n+1)^{-r}=\sum_s\binom{-r}{s}n^{-r-s}$ on the left hand side, and compare coefficients of $1/n$. $$ \begin{array} \displaystyle a_r(k+1)-a_r(k)=a_{r-1}(k)-\sum_{s=0}^{r-1}\binom{-s-1}{r-s}a_{s}(k+1)-1_{\lbrace k\textrm{ even}\rbrace}2^{r+1}a_{r-1}(k/2).&&(3) \end{array} $$ Letting $\bar a_r$ be the average of $a_r(k)$ as $k$ varies, we can average (3) over $k$ to get a recurrence relation for $\bar a_r$. Alternatively, the function $f(n)=\sum_r\bar a_rn^{-r-1}$ must satisfy $f(n+1)=(1+1/n)f(n)-f(n/2)/n$ which is solved by $f(n)=1/(n+1)=\sum_{r\ge0}(-1)^rn^{-r-1}$, so we get $\bar a_r=(-1)^r$. Then, (3) can be applied iteratively to obtain $a_r(k+1)-a_r(k)$ in terms of $a_s(k)$ for $s < r$. Together with $\bar a_r$, this gives $a_r(k)$ and it can be seen that the period of $a_r(k)$ doubles at each step. Doing this gives $a_r\equiv(a_r(0),\ldots,a_r(2^r-1))$ as follows $$ \begin{align} & a_0=(1),\\\\ & a_1=(0,-2),\\\\ & a_2=(7,-3,-9,13)/2 \end{align} $$ These agree with the numerical simulation mentioned above.
Update: I tried another numerical simulation to check these asymptotics, by successively subtracting out the leading order terms. You can see it converges beautifully to the levels $a_0$, $a_1$, $a_2$ but, then...
... it seems that after the $a_2n^{-3}$ part, there is an oscillating term! I wasn't expecting that, but it there is an asymptotic of the form $cn^{-r}\sin(a\log n+\theta)$, where you can solve to leading order to obtain $r\approx3.54536$, $a\approx10.7539$.
Update 2: I was re-thinking this question a few days ago, and it suddenly occured how you can not only prove it using analytic methods, but give a full asymptotic expansion to arbitrary order. The idea involves some very cool maths! (if I may say so myself). Apologies that this answer has grown and grown, but I think it's worth it. It is a very interesting question and I've certainly learned a lot by thinking about it. The idea is that, instead of using a power series generating function as in Qiaochu's answer, you use a Dirichlet series which can be inverted with Perron's formula [1]. First, the expansion is as follows, $$ \begin{array}{ccc} \displaystyle p(n)=\sum_{\Re[r]+k\le \alpha}a_{r,k}(n)n^{-r-k}+o(n^{-\alpha}),&&(4) \end{array} $$ for any real $\alpha$. The sum is over nonnegative integers $k$ and complex numbers $r$ with real part at least 1 and satisfying $r+1=2^r$ (the leading term being $r=1$), and $a_{r,k}(n)$ is a periodic function of $n$, with period $2^k$. The reason for such exponents is that the difference equation (2) has the continuous-time limit $p^\prime(x)=p(x)/x-p(x/2)/x$ which has solutions $p(x)=x^{-r}$ for precisely such exponents. Splitting into real and imaginary parts $r=u+iv$, all solutions to $r+1=2^r$ lie on the line $(1+u)^2+v^2=4^u$ and, other than the leading term $u=1$, $v=0$, there is precisely one complex solution with imaginary part $2n\pi\le v\log2\le2n\pi+\pi/2$ (positive integer $n$) and, together with the complex conjugates, this lists all possible exponents $r$. Then, $a_{r,k}(n)$ is determined (as a multiple of $a_{r,0}$) for $k > 0$ by substituting this expansion back into the difference equation as I did above. I arrived at this expansion after running the simulations plotted above (and T..'s mention of complex exponents of n helped). Then, the Dirichlet series idea nailed it.
Define the Dirichlet series with coefficients $p(n)/n$ $$ L(s)=\sum_{n=1}^\infty p(n)n^{-1-s}, $$ which converges absolutely for the real part of $s$ large enough (greater than 0, since $p$ is bounded by 1). It can be seen that $L(s)-1$ is of order $2^{-1-s}$ in the limit as the real part of $s$ goes to infinity. Multiplying (2) by $n^{-s}$, summing and expanding $n^{-s}$ in terms of $(n+1)^{-s}$ on the LHS gives the functional equation $$ \begin{array} \displaystyle (s-1+2^{-s})L(s)=s+\sum_{k=1}^\infty(-1)^k\binom{-s}{k+1}(L(s+k)-1).&&(5) \end{array} $$ This extends $L$ to a meromorphic function on the whole complex plane with simple poles precisely at the points $-r$ with real part at least one and $r+1 = 2^r$, and then at $-r-k$ for nonnegative integers $k$. The pole with largest real component is at $s = -1$ and has residue $$ {\rm Res}(L,-1)={\rm Res}(s/(s-1+2^{-s}),-1)=\frac{1}{2\log2-1}. $$ If we define $p^\prime(n)$ by the truncated expansion (4) for some suitably large $\alpha$, then it will satisfy the recurrence relation (2) up to an error term of size $O(n^{-\alpha-1})$ and its Dirichlet series will satisfy the functional equation (5) up to an error term which will be an analytic function over $\Re[s] > -\alpha$ (being a Dirichlet series with coefficients $o(n^{-\alpha-1}))$. It follows that $p^\prime$ has simple poles in the same places as $p$ on the domain $\Re[s] > -\alpha$ and, by choosing $a_{r,0}$ appropriately, it will have the same residues. Then, the Dirichlet series associated with $q(n)= p^\prime(n)-p(n)$ will be analytic on $\Re[s] > -\alpha$ We can use Perron's formula [2] to show that $q(n)$ is of size $O(n^\beta)$ for any $\beta > -\alpha$ and, by making $\alpha$ as large as we like, this will prove the asymptotic expansion (4). Differentiated, Perron's formula gives $$ dQ(x)/dx = \frac{1}{2\pi i}\int_{\beta-i\infty}^{\beta+i\infty}x^sL(s)\,ds $$ where $Q(x)=\sum_{n\le x}q(n)$ and $\beta > -\alpha$. This expression is intended to be taken in the sense of distributions (i.e., multiply both sides by a smooth function with compact support in $(0,\infty)$ and integrate). If $\theta$ is a smooth function of compact support in $(0,\infty)$ then $$ \begin{array} \displaystyle\sum_{n=1}^\infty q(n)\theta(n/x)/x&\displaystyle=\int_0^\infty\theta(y)\dot Q(xy)\,dy\\\\ &\displaystyle=\frac{1}{2\pi i}\int_{\beta-i\infty}^{\beta+i\infty}x^sL(s)\int\theta(y)y^s\,dy\,ds=O(x^\beta)\ \ (6) \end{array} $$ We obtain the final bound, because by integration by parts, the integral of $\theta(y)y^s$ can be shown to go to zero faster than any power of $1/s$, so the integrand is indeed integrable and the $x^\beta$ term can be pulled outside. This is enough to show that $q(n)$ is itself of $O(n^\beta)$. Trying to finish this answer off without too much further detail, the argument is as follows. If $q(n)n^{-\beta}$ was unbounded then would keep exceeding its previous maximum and, by the recurrence relation (2), it would take time at least Ω(n) [3] to get back close to zero. So, if $\theta$ has support in $[1,1+\epsilon]$ for small enough $\epsilon$, the integral $\int\theta(y)\dot Q(ny)\,dy$ will be of order $q(n)$ at such times and, as this happens infinitely often, it would contradict (6). Phew! I knew that this could be done, but it took some work! Possibly not as simple or direct as you were asking for, but Dirichlet series are quite standard (more commonly in analytic number theory, in my experience). However, maybe not really more difficult than the probabilistic method and you do get a whole lot more. This approach should also work for other types of recurrence relations and differential equations too.
Finally, I added a much more detailed writeup on my blog, fleshing out some of the details which I skimmed over here. See Asymptotic Expansions of a Recurrence Relation [4].
[1] http://en.wikipedia.org/wiki/Perron%27s_formulaHere is the standard generating function gruntwork. Let $Q(x) = \sum_{n \ge 1} \frac{p(n)}{n} x^n$. Then
$$Q'(x) = \sum_{n \ge 1} p(n) x^{n-1} = \sum_{n \ge 1}^{\infty} x^{n-1} \sum_{j = \lceil \frac{n}{2} \rceil}^{n-1} \frac{p(j)}{j}.$$
Exchanging the order of summation gives
$$Q'(x) = \sum_{j=1}^{\infty} \frac{p(j)}{j} \sum_{n=j+1}^{2j} x^{n-1} = \sum_{j=1}^{\infty} \frac{p(j)}{j} \frac{x^{2j} - x^j}{x - 1}.$$
So it follows that
$$Q'(x) = \frac{Q(x^2) - Q(x)}{x - 1}$$
with initial conditions $Q(0) = 0, Q'(0) = 1$.
Now, as I recall there are theorems in Flajolet and Sedgewick's Analytic Combinatorics [1] that allow us to deduce asymptotic behavior about the coefficients of $Q$ from these kinds of relations; I'm going to hunt one down now, and in the meantime others can see what they can do with this.
[1] http://algo.inria.fr/flajolet/Publications/books.htmlHere is an observation: It looks like $(2n) p(2n) - c$ converges to $0$ much faster than $(2n+1) p(2n+1)-c$ does. Here is a plot of the points $(\log n, n p(n) -c)$, with blue points for$n$ even and red points for $n$ odd.
It might be easiest to first prove the result for $n$ even, then use the fact that $(2n) p(2n) - (2n-1) p(2n-1) = p(2n) + p(2n-1)$ to get that the odd and even points must have the same limit.
I think these observations are probably already in other answers, or even too simple to have been listed, but I wanted to add my working anyway:
p(2n+2)=(n+1)/n(p(2n)-2p(n)/(2n+1))
thus if k(n)= np(n)
k(2n+2)=(n+1)^2/n^2/(2n+1)((2n+1)k(2n)-4k(n))