
A Simple Non-Stationary Mean Ergodic Theorem, with Bonus Weak Law of Large Numbers

Cosma Rohilla Shalizi Departments of Statistics and of Machine Learning, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, and the Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501.
(1 November 2021, small revision 19 March 2022)
Abstract

This brief pedagogical note re-proves a simple theorem on the convergence, in $L_2$ and in probability, of time averages of non-stationary time series to the mean of the expectation values. The basic condition is that the sum of covariances grows sub-quadratically with the length of the time series. I make no claim to originality; the result is a widely, but unevenly, spread bit of folklore among users of applied probability. The goal of this note is merely to even out that distribution.

I assume some familiarity with basic probability and stochastic processes, along the lines of Grimmett and Stirzaker (1992), but re-prove a number of basic results, to show that everything really is elementary.

Let $X_1, X_2, \ldots, X_t, \ldots$ be a sequence of real-valued random variables. Assume that $\mu_t \equiv \mathbb{E}\left[X_t\right]$ and $\mathrm{Cov}\left[X_t, X_s\right]$ exist and are finite for all $t, s$. The time average of the $X_t$s,

$$A_n = \frac{1}{n}\sum_{t=1}^{n}{X_t}$$   (1)

converges on the average of the expectation values,

$$m_n = \frac{1}{n}\sum_{t=1}^{n}{\mu_t}$$   (2)

under a condition on the sum of the covariances,

$$V_n = \sum_{t=1}^{n}{\sum_{s=1}^{n}{\mathrm{Cov}\left[X_t, X_s\right]}}$$   (3)

The condition will be obvious after the following lemma, which will also be useful for extensions.

Lemma 1
$$\mathbb{E}\left[(A_n - m_n)^2\right] = \mathrm{Var}\left[A_n\right] = \frac{1}{n^2}V_n$$   (4)

Proof: Since, for any $Z$, $\mathbb{E}\left[Z^2\right] = \left(\mathbb{E}\left[Z\right]\right)^2 + \mathrm{Var}\left[Z\right]$, we have

$$\mathbb{E}\left[(A_n - m_n)^2\right] = \left(\mathbb{E}\left[A_n - m_n\right]\right)^2 + \mathrm{Var}\left[A_n\right]$$   (5)

By linearity of expectation,

$$\mathbb{E}\left[A_n\right] = m_n$$   (6)

On the other hand,

$$\mathrm{Var}\left[A_n\right] = \frac{1}{n^2}\mathrm{Var}\left[\sum_{t=1}^{n}{X_t}\right]$$   (7)
$$= \frac{1}{n^2}\sum_{t=1}^{n}{\sum_{s=1}^{n}{\mathrm{Cov}\left[X_t, X_s\right]}}$$   (8)
$$= \frac{1}{n^2}V_n$$   (9)

through repeated application of the identity $\mathrm{Var}\left[B + C\right] = \mathrm{Var}\left[B\right] + \mathrm{Var}\left[C\right] + 2\,\mathrm{Cov}\left[B, C\right]$, and the definition of $V_n$. Substituting into Eq. 5,

$$\mathbb{E}\left[(A_n - m_n)^2\right] = 0^2 + \mathrm{Var}\left[A_n\right] = \frac{1}{n^2}V_n$$   (10)

as was to be shown. \Box
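
Lemma 1 is easy to check numerically. The following sketch (Python with numpy; the drifting means, the covariance kernel, and all parameter values are my own arbitrary choices for illustration, not part of the argument) simulates many independent sample paths of a non-stationary, correlated Gaussian sequence and compares the empirical variance of $A_n$ with $V_n/n^2$.

    import numpy as np

    rng = np.random.default_rng(42)
    n, reps = 200, 20000

    # A deliberately non-stationary setup: drifting means, growing variances,
    # and exponentially decaying correlations (all arbitrary choices).
    t = np.arange(n)
    mu = np.sin(t / 10.0)                                   # instantaneous expectations mu_t
    scale = np.sqrt(1.0 + t / n)                            # standard deviations grow with t
    Sigma = np.exp(-np.abs(t[:, None] - t[None, :]) / 5.0)  # correlation matrix
    Sigma *= scale[:, None] * scale[None, :]                # covariance matrix Cov[X_t, X_s]

    L = np.linalg.cholesky(Sigma)
    X = mu + rng.standard_normal((reps, n)) @ L.T           # reps independent sample paths
    A = X.mean(axis=1)                                      # A_n for each path

    print("empirical Var[A_n]:", A.var())
    print("V_n / n^2         :", Sigma.sum() / n**2)        # Lemma 1

The two printed numbers should agree up to Monte Carlo error, for any choice of mean vector and any valid covariance matrix.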

Theorem 1 (Mean ergodic theorem)

If $V_n = o(n^2)$, then $A_n - m_n \rightarrow 0$ in $L_2$ as $n \rightarrow \infty$.

Proof: Convergence in $L_2$ means that $\mathbb{E}\left[(A_n - m_n)^2\right] \rightarrow 0$, and, by the lemma, $\mathbb{E}\left[(A_n - m_n)^2\right] = V_n/n^2$. By the assumption of the theorem, $V_n = o(n^2)$, hence $V_n/n^2 = o(1)$, so we have

$$\mathbb{E}\left[(A_n - m_n)^2\right] = o(1) \rightarrow 0$$   (11)

Thus $A_n - m_n \rightarrow 0$ in $L_2$; this is a mean (or mean-square) ergodic theorem. $\Box$

Since $V_n$ is a sum of $n^2$ terms, if the sum is to be $o(n^2)$, most of those terms must be shrinking rapidly to zero as $n$ grows. That is, $\mathrm{Cov}\left[X_t, X_{t+h}\right]$ must go to zero as $h \rightarrow \infty$. Stationarity (see immediately below) is not required.

Definition 1

The sequence of $X$s is weakly or second-order stationary when $\mathrm{Cov}\left[X_t, X_{t+h}\right] = \gamma(h)$ for some $\gamma$ and all $t$.

If one does assume weak stationarity, a sufficient condition for $V_n = o(n^2)$ is that $\sum_{h=-\infty}^{\infty}{\gamma(h)} = \tau\gamma(0) < \infty$, since in that case $V_n = O(n)$. $\tau$ has names like “correlation time”, “autocorrelation time”, “integrated autocovariance time”, etc. (Some authors use these names for $\sum_{h=0}^{\infty}{\gamma(h)}$, or even for $\sum_{h=1}^{\infty}{\gamma(h)}$; any one of these sums is finite if and only if the others are.) From Lemma 1 (Eq. 4), for uncorrelated variables, $\mathrm{Var}\left[A_n\right] = \gamma(0)/n$, but when $0 < \tau < \infty$, $n\,\mathrm{Var}\left[A_n\right] \rightarrow \tau\gamma(0)$, so the “effective sample size” is reduced from $n$ to $n/\tau$.
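
As a concrete illustration of the correlation time, consider a stationary AR(1) process with coefficient $\phi$, for which $\gamma(h) = \gamma(0)\phi^{|h|}$ and $\tau = (1+\phi)/(1-\phi)$. The sketch below (Python with numpy; the AR(1) choice and the parameter values are mine, purely for illustration) checks that $n\,\mathrm{Var}\left[A_n\right]$ approaches $\tau\gamma(0)$.

    import numpy as np

    rng = np.random.default_rng(1)
    phi, n, reps = 0.8, 5000, 2000

    # Stationary AR(1): X_t = phi X_{t-1} + eps_t with unit-variance innovations,
    # so gamma(0) = 1/(1 - phi^2) and tau = sum_h phi^|h| = (1 + phi)/(1 - phi).
    gamma0 = 1.0 / (1.0 - phi**2)
    tau = (1.0 + phi) / (1.0 - phi)

    X = np.empty((reps, n))
    X[:, 0] = np.sqrt(gamma0) * rng.standard_normal(reps)   # start in the stationary distribution
    for t in range(1, n):
        X[:, t] = phi * X[:, t - 1] + rng.standard_normal(reps)

    A = X.mean(axis=1)
    print("n * Var[A_n]  :", n * A.var())
    print("tau * gamma(0):", tau * gamma0)   # so the effective sample size is about n / tau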

It is unnecessary for the theorem to assume that $m_n$ converges, i.e., that the instantaneous expectations $\mu_n$ are Cesàro-convergent. Still less is it necessary to assume that the $\mu_n$ have a limit. $\mathrm{Var}\left[X_t\right]$ need not be constant or tending to a limit either, though it cannot grow too fast.

$L_2$ convergence implies convergence in probability, or what is usually known as the weak law of large numbers.

Corollary 1 (Weak law of large numbers)

Under the conditions of the theorem, $A_n - m_n \rightarrow 0$ in probability.

Proof: Use Chebyshev’s inequality (re-proved below as Proposition 1): for any random variable $Z$,

$$\mathbb{P}\left(|Z - \mathbb{E}\left[Z\right]| \geq \epsilon\right) \leq \frac{\mathrm{Var}\left[Z\right]}{\epsilon^2}$$   (12)

Applied to $A_n$, this gives, for each fixed $\epsilon$,

$$\mathbb{P}\left(|A_n - m_n| \geq \epsilon\right) \leq \frac{V_n n^{-2}}{\epsilon^2} \rightarrow 0$$   (13)

which is the definition of convergence to 0 in probability. \Box
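
For a quick numerical illustration of the corollary, the following sketch (Python with numpy; the particular independent, heteroskedastic sequence is an arbitrary choice of mine) compares the empirical probability that $|A_n - m_n| \geq \epsilon$ with the Chebyshev bound $V_n/(n^2\epsilon^2)$ as $n$ grows.

    import numpy as np

    rng = np.random.default_rng(2)
    reps, eps = 10000, 0.1

    # Independent but heteroskedastic: E[X_t] = (-1)^t and Var[X_t] = 1 + t/n,
    # so V_n = sum_t (1 + t/n) is roughly 3n/2 = o(n^2).
    for n in (100, 400, 1600):
        t = np.arange(1, n + 1)
        mu = (-1.0) ** t
        sd = np.sqrt(1.0 + t / n)
        X = mu + sd * rng.standard_normal((reps, n))
        A = X.mean(axis=1)
        m = mu.mean()
        bound = np.sum(sd**2) / (n**2 * eps**2)     # Chebyshev bound from Eq. 13
        print(n, np.mean(np.abs(A - m) >= eps), "<=", bound)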

The convergence results carry over easily to $d$-dimensional vectors, at least if $d$ is fixed as $n$ grows.

Corollary 2

Let $\vec{Y}_t = (Y_{t1}, Y_{t2}, \ldots, Y_{td})$ be a sequence of $d$-dimensional vectors, with expected values $\vec{\mu}_t$, and define $V_{nj} = \sum_{t=1}^{n}{\sum_{s=1}^{n}{\mathrm{Cov}\left[Y_{tj}, Y_{sj}\right]}}$. If $V_{nj} = o(n^2)$ for all $j$, then

$$\left\|\frac{1}{n}\sum_{t=1}^{n}{\vec{Y}_t} - \frac{1}{n}\sum_{t=1}^{n}{\vec{\mu}_t}\right\| \rightarrow 0$$   (14)

in $L_2$ and in probability.

Proof: Apply the theorem and the previous corollary to each coordinate of $\vec{Y}$ separately to get convergence along each coordinate, and hence convergence of the Euclidean distance to zero. $\Box$

If $V_n$ grows too fast, then the convergence to a deterministic limit fails.

Corollary 3

If $V_n = \Omega(n^2)$, then $A_n - m_n \not\rightarrow 0$ in $L_2$.

Proof: $V_n = \Omega(n^2)$ means that $\liminf V_n/n^2 = v > 0$. From the lemma, we know that $\mathrm{Var}\left[A_n\right] = V_n/n^2$, which does not go to zero, so convergence in $L_2$ must fail. $\Box$

Convergence in probability is slightly more delicate.

Corollary 4

If $V_n = \Omega(n^2)$ and $\mathrm{Var}\left[A_n^2\right] = O(V_n^2/n^4)$, then $A_n - m_n \not\rightarrow 0$ in probability.

Proof: Begin with the previous corollary, and apply the Paley-Zygmund inequality (Proposition 3) to the non-negative random variable $(A_n - m_n)^2$, whose expected value is (again, from Lemma 1, Eq. 4) $V_n/n^2$. By the inequality, for any $\epsilon \leq V_n/n^2$,

$$\mathbb{P}\left((A_n - m_n)^2 \geq \epsilon\right) \geq \frac{(V_n/n^2 - \epsilon)^2}{\mathrm{Var}\left[(A_n - m_n)^2\right] + (V_n/n^2)^2}$$   (15)
$$\mathbb{P}\left(\left|A_n - m_n\right| \geq \sqrt{\epsilon}\right) \geq \frac{(V_n/n^2 - \epsilon)^2}{\mathrm{Var}\left[A_n^2\right] + (V_n/n^2)^2}$$   (16)

Restrict ourselves to $\epsilon < v/2$. Then, for all sufficiently large $n$,

$$\mathbb{P}\left(\left|A_n - m_n\right| \geq \sqrt{\epsilon}\right) \geq \frac{(v - \epsilon)^2}{v^2 + \mathrm{Var}\left[A_n^2\right]}$$   (17)
$$\geq \frac{(v/2)^2}{v^2 + \mathrm{Var}\left[A_n^2\right]}$$   (18)
$$= \frac{1}{4}\,\frac{1}{1 + \mathrm{Var}\left[A_n^2\right]/v^2} > 0$$   (19)

so $A_n - m_n \not\rightarrow 0$ in probability. $\Box$

Remark 3: I suspect the extra condition needed to force non-convergence in probability can be weakened, because the underlying Paley-Zygmund inequality used in the proof isn’t necessarily sharp. But an example helps show that some condition is necessary. Suppose that for each $t$, $X_t = \pm t^{3/2}$, each with probability $\frac{1}{2}t^{-2}$, otherwise $X_t = 0$, and that the $X_t$ are all mutually independent. Then $m_n = 0$ for all $n$. Moreover, $\mathbb{P}\left(X_t \neq 0\right) = 1/t^2$. Since those probabilities are summable, by the Borel-Cantelli lemma, $\mathbb{P}\left(X_t \neq 0 \text{ infinitely often}\right) = 0$. But then $X_t = 0$ for all but finitely many $t$ almost surely, hence $A_n \rightarrow 0$ in probability. On the other hand, $\mathrm{Var}\left[X_t\right] = t$, so $V_n = n(n+1)/2$, $\lim V_n/n^2 = 1/2$, and $A_n \not\rightarrow 0$ in $L_2$. Verifying that the second condition of the corollary does not hold involves some straightforward but detailed algebra, given in Appendix C. While this example is deliberately stylized, it does get at what’s needed to have convergence in probability without convergence in $L_2$: the probability that $|A_n - m_n| \geq \epsilon$ has to be going to zero, no matter how small we set $\epsilon$, but when there are fluctuations in $A_n$ away from $m_n$, they need to be getting larger and larger. Whether this is a realistic concern or a paranoid fear will depend on the application.
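
A simulation of this stylized example (Python with numpy; the sample sizes, replication counts, and tolerance are arbitrary choices of mine) shows both halves of the story: the probability of a noticeable deviation shrinks with $n$, while the empirical second moment of $A_n$ hovers near $V_n/n^2 \approx 1/2$.

    import numpy as np

    rng = np.random.default_rng(7)

    def sample_A(n, reps):
        """Draw `reps` independent copies of A_n for the Remark 3 example:
        X_t = +/- t^{3/2}, each with probability (1/2) t^{-2}, else X_t = 0."""
        total = np.zeros(reps)
        for t in range(1, n + 1):
            hit = rng.random(reps) < 1.0 / t**2
            sign = np.where(rng.random(reps) < 0.5, 1.0, -1.0)
            total += np.where(hit, sign * t**1.5, 0.0)
        return total / n

    eps = 0.1
    for n in (500, 2000):
        A = sample_A(n, reps=50_000)
        # P(|A_n| >= eps) shrinks with n (convergence in probability), while the
        # empirical second moment stays near V_n / n^2 -> 1/2 (no L2 convergence).
        # The second-moment estimate is noisy, because it is dominated by rare,
        # enormous fluctuations -- which is exactly the point of the remark.
        print(n, np.mean(np.abs(A) >= eps), np.mean(A**2), (n + 1) / (2 * n))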

Remark 4: Convergence of the $A_n$ not to the deterministic $m_n$ but to a random limit, as in the full mean-square ergodic theorem for weakly stationary sequences (as given by, e.g., Grimmett and Stirzaker (1992, §9.5, theorem 3)) would seem to require more advanced tools.

Credit

I do not know the history of Theorem 1, but I want to emphasize again that it is not original to me. I learned it, without attribution to any particular source or even a name, when studying statistical mechanics in the physics department of the University of Wisconsin-Madison in the mid-1990s. A version of the argument which assumes weak (second-order) stationarity of the XtX_{t} but allows for continuous time appears in Frisch (1995, pp. 50–51). The oldest version of the result I have been able to locate is Taylor (1922). (While the paper was not published until 1922, it was read before the London Mathematical Society in 1920.) This again develops the result assuming weak stationarity, but in both discrete and continuous time. Taylor presents this as a new result, but someone else might be able to claim historical priority.

Acknowledgments

I am grateful to David Darmon and Paul J. Wolfson for correspondence which led me to write this; to Carnegie Mellon University for support for a sabbatical year in 2017–2018; and to my students in 36-462, “Data over Space and Time”, in 2018 and 2020, for letting me test versions of this material on them.

References

  • Frisch, Uriel (1995). Turbulence: The Legacy of A. N. Kolmogorov. Cambridge, England: Cambridge University Press.
  • Grimmett, G. R. and D. R. Stirzaker (1992). Probability and Random Processes. Oxford: Oxford University Press, 2nd edn.
  • Taylor, G. I. (1922). “Diffusion by Continuous Movements.” Proceedings of the London Mathematical Society, 20: 196–212. doi:10.1112/plms/s2-20.1.196.

Appendix A Upper Bounds: Markov and Chebyshev

Going from $L_2$ convergence to convergence in probability uses an inequality which has come to be associated with the name of Chebyshev:

Proposition 1 (Chebyshev inequality)

For any real-valued random variable $Z$,

$$\mathbb{P}\left(|Z - \mathbb{E}\left[Z\right]| \geq \epsilon\right) \leq \frac{\mathrm{Var}\left[Z\right]}{\epsilon^2}$$

This is itself an easy consequence of another inequality:

Proposition 2 (Markov inequality)

For any non-negative, real-valued random variable $Z$,

$$\mathbb{P}\left(Z \geq \epsilon\right) \leq \frac{\mathbb{E}\left[Z\right]}{\epsilon}$$

The intuition behind Markov’s inequality is simple: the probability of $Z$ being large can’t be too big, without also driving up the expected value of $Z$.

Proof (of Proposition 2): For any event $B$, $\mathbb{I}\left(B\right) + \mathbb{I}\left(B^c\right) = 1$. So, clearly,

$$Z = Z\,\mathbb{I}\left(Z \geq \epsilon\right) + Z\,\mathbb{I}\left(Z < \epsilon\right)$$

and thus

$$\mathbb{E}\left[Z\right] = \mathbb{E}\left[Z\,\mathbb{I}\left(Z \geq \epsilon\right)\right] + \mathbb{E}\left[Z\,\mathbb{I}\left(Z < \epsilon\right)\right]$$   (20)
$$\geq \mathbb{E}\left[Z\,\mathbb{I}\left(Z \geq \epsilon\right)\right]$$   (21)
$$\geq \mathbb{E}\left[\epsilon\,\mathbb{I}\left(Z \geq \epsilon\right)\right]$$   (22)
$$= \epsilon\,\mathbb{E}\left[\mathbb{I}\left(Z \geq \epsilon\right)\right] = \epsilon\,\mathbb{P}\left(Z \geq \epsilon\right)$$   (23)

as was to be shown. \Box

Proof (of Proposition 1): $|Z - \mathbb{E}\left[Z\right]| \geq \epsilon$ if and only if $(Z - \mathbb{E}\left[Z\right])^2 \geq \epsilon^2$. But $(Z - \mathbb{E}\left[Z\right])^2$ is a non-negative, real-valued random variable, with expected value $\mathrm{Var}\left[Z\right]$, so the proposition follows by applying Proposition 2 with threshold $\epsilon^2$. $\Box$
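
Both inequalities are easy to sanity-check numerically; the following sketch (Python with numpy, using an exponential random variable purely as an arbitrary example of mine) compares empirical probabilities with the Markov and Chebyshev bounds.

    import numpy as np

    rng = np.random.default_rng(3)
    Z = rng.exponential(scale=2.0, size=1_000_000)   # E[Z] = 2, Var[Z] = 4
    eps = 5.0

    # Markov (Proposition 2): P(Z >= eps) <= E[Z] / eps
    print(np.mean(Z >= eps), "<=", Z.mean() / eps)

    # Chebyshev (Proposition 1): P(|Z - E[Z]| >= eps) <= Var[Z] / eps^2
    print(np.mean(np.abs(Z - Z.mean()) >= eps), "<=", Z.var() / eps**2)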

Appendix B Lower Bounds: Paley-Zygmund

Markov’s inequality says that the probability of large values can’t be too high, without increasing the expected value. A counterpart inequality essentially says that the probability of large values can’t be too small, either, without decreasing the expected value. Start with Equation 20, again assuming $Z \geq 0$:

$$\mathbb{E}\left[Z\right] = \mathbb{E}\left[Z\,\mathbb{I}\left(Z \geq \epsilon\right)\right] + \mathbb{E}\left[Z\,\mathbb{I}\left(Z < \epsilon\right)\right]$$   (24)
$$\leq \mathbb{E}\left[Z\,\mathbb{I}\left(Z \geq \epsilon\right)\right] + \epsilon\,\mathbb{E}\left[\mathbb{I}\left(Z < \epsilon\right)\right]$$   (25)
$$\leq \sqrt{\mathbb{E}\left[Z^2\right]\mathbb{E}\left[\mathbb{I}\left(Z \geq \epsilon\right)\right]} + \epsilon\,\mathbb{E}\left[\mathbb{I}\left(Z < \epsilon\right)\right]$$   (26)
$$\leq \sqrt{\mathbb{E}\left[Z^2\right]\mathbb{P}\left(Z \geq \epsilon\right)} + \epsilon$$   (27)
$$\frac{\left(\mathbb{E}\left[Z\right] - \epsilon\right)^2}{\mathbb{E}\left[Z^2\right]} \leq \mathbb{P}\left(Z \geq \epsilon\right)$$   (28)

where Eq. 26 uses the Cauchy-Schwarz inequality. We have thus proved

Proposition 3 (Paley-Zygmund Inequality)

For a random variable $Z \geq 0$, and $\epsilon \leq \mathbb{E}\left[Z\right]$,

$$\mathbb{P}\left(Z \geq \epsilon\right) \geq \frac{\left(\mathbb{E}\left[Z\right] - \epsilon\right)^2}{\mathrm{Var}\left[Z\right] + \mathbb{E}\left[Z\right]^2}$$   (29)

(The proposition is usually stated in the form

$$\mathbb{P}\left(Z \geq \theta\,\mathbb{E}\left[Z\right]\right) \geq \frac{(1 - \theta)^2\,\mathbb{E}\left[Z\right]^2}{\mathrm{Var}\left[Z\right] + \mathbb{E}\left[Z\right]^2}$$   (30)

for $\theta \in (0,1)$, but this is clearly equivalent, taking $\epsilon = \theta\,\mathbb{E}\left[Z\right]$.)
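
A numerical sanity check of Eq. 29, in the same spirit as before (Python with numpy; the exponential example is again an arbitrary choice of mine):

    import numpy as np

    rng = np.random.default_rng(4)
    Z = rng.exponential(scale=2.0, size=1_000_000)   # Z >= 0, with E[Z] = 2
    eps = 1.0                                        # any eps <= E[Z] will do

    lower = (Z.mean() - eps) ** 2 / (Z.var() + Z.mean() ** 2)   # right-hand side of Eq. 29
    print(np.mean(Z >= eps), ">=", lower)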

Appendix C Algebraic Details for Remark 3

The example in Remark 3 converges in probability but not in $L_2$, so we need to verify that one or the other condition of Corollary 4 fails. In fact, the failing condition is the one about fourth moments, that $\mathrm{Var}\left[A_n^2\right] = O(V_n^2/n^4)$.

$$A_n^2 = n^{-2}\left[\sum_{t=1}^{n}{X_t^2} + 2\sum_{t=1}^{n-1}\sum_{s=t+1}^{n}{X_t X_s}\right]$$   (31)
$$\mathrm{Var}\left[A_n^2\right] = n^{-4}\Bigg[\sum_{t=1}^{n}\mathrm{Var}\left[X_t^2\right] + 4\sum_{t=1}^{n-1}\sum_{s=t+1}^{n}\mathrm{Var}\left[X_t X_s\right] + 2\sum_{t=1}^{n-1}\sum_{s=t+1}^{n}\mathrm{Cov}\left[X_t^2, X_s^2\right]$$
$$\qquad + 4\sum_{t=1}^{n}\sum_{s=1}^{n-1}\sum_{r=s+1}^{n}\mathrm{Cov}\left[X_t^2, X_s X_r\right] + 8\sum_{t=1}^{n-1}\sum_{s=t+1}^{n}\sum_{(r,q) \neq (t,s)}\mathrm{Cov}\left[X_t X_s, X_r X_q\right]\Bigg]$$   (32)

Take the terms appearing in the expression for $\mathrm{Var}\left[A_n^2\right]$ one at a time:

$$\mathrm{Var}\left[X_t^2\right] = \mathbb{E}\left[X_t^4\right] - \left(\mathbb{E}\left[X_t^2\right]\right)^2$$   (33)
$$= t^6/t^2 - \left(\mathrm{Var}\left[X_t\right] + \left(\mathbb{E}\left[X_t\right]\right)^2\right)^2$$   (34)
$$= t^4 - (t + 0)^2 = t^4 - t^2 = t^2(t^2 - 1)$$   (35)
$$\mathrm{Var}\left[X_t X_s\right] = \mathbb{E}\left[X_t^2 X_s^2\right] - \left(\mathbb{E}\left[X_t X_s\right]\right)^2$$   (36)
$$= \mathbb{E}\left[X_t^2\right]\mathbb{E}\left[X_s^2\right] - \left(\mathbb{E}\left[X_t\right]\mathbb{E}\left[X_s\right]\right)^2$$   (37)
$$= ts - 0 = ts$$   (38)

using the fact that the expectation of the product of two independent variables is the product of their expectations. On the other hand, for $s \neq t$,

$$\mathrm{Cov}\left[X_t^2, X_s^2\right] = 0$$   (39)

by independence of the $X_t$s. Similarly, unless $s = t$ or $r = t$, $\mathrm{Cov}\left[X_t^2, X_s X_r\right] = 0$. Since $s \neq r$, at most one of $s$ and $r$ could equal $t$, so, without loss of generality, say it’s $s = t$. Then we have

$$\mathrm{Cov}\left[X_t^2, X_t X_r\right] = \mathbb{E}\left[X_t^3 X_r\right] - \mathbb{E}\left[X_t^2\right]\mathbb{E}\left[X_t X_r\right]$$   (40)
$$= \mathbb{E}\left[X_t^3\right]\mathbb{E}\left[X_r\right] - \mathbb{E}\left[X_t^2\right]\mathbb{E}\left[X_t\right]\mathbb{E}\left[X_r\right]$$   (41)
$$= 0$$   (42)

So $\mathrm{Cov}\left[X_t^2, X_t X_r\right] = 0$ as well. Likewise $\mathrm{Cov}\left[X_t X_s, X_r X_q\right] = 0$ if all the indices are distinct, but even if indices overlap, everything still comes out to zero.

Going back to the variance of $A_n^2$, then,

$$\mathrm{Var}\left[A_n^2\right] = \frac{1}{n^4}\left[\sum_{t=1}^{n}{\left(t^4 - t^2\right)} + \sum_{t=1}^{n-1}{\sum_{s=t+1}^{n}{ts}}\right]$$   (43)

Now $\sum{t^4} \sim n^5$, so $\mathrm{Var}\left[A_n^2\right] \sim n$, but $V_n^2/n^4 = (n(n+1)/2)^2/n^4 \rightarrow 1/4$, so the second condition of the corollary is indeed violated.
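
As a final check on the algebra, the following sketch (Python with numpy; it simply evaluates the sums in Eq. 43 directly) confirms that $\mathrm{Var}\left[A_n^2\right]$ grows roughly linearly in $n$ while $V_n^2/n^4$ tends to $1/4$.

    import numpy as np

    # Evaluate the sums in Eq. 43 directly and compare with V_n^2 / n^4.
    for n in (100, 1000, 10000):
        t = np.arange(1, n + 1, dtype=float)
        fourth = np.sum(t**4 - t**2)                      # sum_t (t^4 - t^2)
        cross = (np.sum(t) ** 2 - np.sum(t**2)) / 2.0     # sum_{t < s} t s
        var_A2 = (fourth + cross) / n**4                  # Eq. 43
        Vn = n * (n + 1) / 2.0
        print(n, var_A2, Vn**2 / n**4)                    # grows like n, versus -> 1/4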