
Limiting partition function for the Mallows model: a conjecture and partial evidence

Soumik Pal Department of Mathematics
University of Washington
Seattle, WA 98195
soumik@uw.edu
Abstract.

Let $\mathbb{S}_{n}$ denote the set of permutations of $n$ labels. We consider a class of Gibbs probability models on $\mathbb{S}_{n}$ that is a subfamily of the so-called Mallows model of random permutations. The Gibbs energy is given by a class of right invariant divergences on $\mathbb{S}_{n}$ that includes common choices such as the Spearman foot rule and the Spearman rank correlation. Mukherjee [Muk16] computed the limit of the (scaled) log partition function (i.e. normalizing factor) of such models as $n\rightarrow\infty$. Our objective is to compute the exact limit, as $n\rightarrow\infty$, without the log. We conjecture that this limit is given by the Fredholm determinant of an integral operator related to the so-called Schrödinger bridge probability distributions from optimal transport theory. We provide partial evidence for this conjecture, although the argument lacks a final error bound that is needed for it to become a complete proof.

Key words and phrases:
Mallows model, random permutation, Schrödinger bridge, Fredholm determinant
2000 Mathematics Subject Classification:
60B15, 60C99
This research is partially supported by NSF grants DMS-2052239 and DMS-2134012, and by the PIMS Research Network grant Kantorovich Initiative.

1. Introduction

Let $c:[0,1]^{2}\rightarrow[0,\infty)$ denote a cost function satisfying the following assumptions:

  • $c$ is twice continuously differentiable on $[0,1]^{2}$.

  • $c(x,x)=0$ for all $x\in[0,1]$.

  • $c$ is symmetric, i.e. $c(x,y)=c(y,x)$ for all $(x,y)\in[0,1]^{2}$.

  • $c(x,y)=c(1-x,1-y)$ for all $(x,y)\in[0,1]^{2}$.

An example of such a cost function is $c(x,y)=(x-y)^{2}$. The first of these assumptions is made for technical convenience as will be apparent below. No attempt has been made to get the optimal set of assumptions.

Fix $n\in\mathbb{N}$. Let $\mathbb{S}_{n}$ denote the set of all permutations of $n$ labels $[n]:=\{1,2,\ldots,n\}$. Consider the following quantity

$$L_{n}=\frac{1}{n!}\sum_{\sigma\in\mathbb{S}_{n}}\exp\left(-\sum_{i=1}^{n}c(i/n,\sigma_{i}/n)\right).\tag{1}$$
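For very small $n$ the quantity in (1) can be evaluated by enumerating all permutations. The following is a minimal sketch under that assumption; the function names `partition_function` and `squared_cost` are ours and purely illustrative, and the enumeration is only feasible for single-digit $n$.

```python
import itertools
import math

def partition_function(c, n):
    """Brute-force L_n from (1): average of exp(-sum_i c(i/n, sigma_i/n)) over S_n."""
    total = 0.0
    for sigma in itertools.permutations(range(1, n + 1)):
        energy = sum(c(i / n, s / n) for i, s in enumerate(sigma, start=1))
        total += math.exp(-energy)
    return total / math.factorial(n)

def squared_cost(x, y):
    # The example cost c(x, y) = (x - y)^2 mentioned in the Introduction.
    return (x - y) ** 2

for n in range(1, 8):
    print(n, partition_function(squared_cost, n))
```

Since the number of permutations grows like $n!$, such a computation says nothing about the limit by itself; it is only meant to make the definition concrete.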

We are interested in the limit of this sequence as $n\rightarrow\infty$. The reason it comes up is that this is the partition function of a family of probability distributions which is a subset of the well-known Mallows models [Mal57] of random permutations. See the Introduction in [Muk16] and many applications listed in [Dia88, Chapters 5 and 6]. For example, the case of $c(x,y)=(x-y)^{2}$ is related to the Spearman rank correlation.

Our goal in this paper is to understand $\lim_{n\rightarrow\infty}L_{n}$. This problem is important in statistical estimation [Muk16] and also for understanding scaling limits of large random permutations with fixed patterns [KKRW20]. See also [DR00, Section 2e] for generalizations to other groups where the importance of this problem is stressed. We will try to convince the reader that there are constants $\Gamma_{0}$ and $C$ such that

$$\lim_{n\rightarrow\infty}e^{n\Gamma_{0}}L_{n}=C.\tag{2}$$

The value of the constant $\Gamma_{0}$ is already known due to [Muk16] and is the value of an entropy-regularized optimal transport problem with uniform marginals. See also [Sta09] for a special case of a discontinuous cost function. We conjecture in this paper, and give partial evidence, that, under suitable assumptions, the constant $C$ is the Fredholm determinant [vN22, Definition 14.35] of a certain integral operator related to the so-called Schrödinger bridge, the optimal coupling for the same entropy-regularized optimal transport problem. Both these concepts are described below. Taken together, they give the limiting partition function of this class of Mallows models that satisfy all our assumptions.

The concept of entropy-regularized optimal transport and the related notion of Schrödinger bridges can be found in [Léo12]. Let $\mu$ denote the Uni$(0,1)$ distribution. Let $\Pi(\mu,\mu)$ denote the set of couplings (i.e., joint distributions) with both marginals $\mu$. Then the entropic OT problem is given as the solution to the following optimization problem on $\Pi(\mu,\mu)$:

$$\Gamma_{0}:=\inf_{\xi\in\Pi(\mu,\mu)}\left[\int c(x,y)\xi(x,y)dxdy+\mathrm{Ent}(\xi)\right],\tag{3}$$

where $\mathrm{Ent}(\cdot)$ is the optimal transport entropy (the negative of the usual differential Shannon entropy) given by $\mathrm{Ent}(\xi)=\int\xi(x,y)\log\xi(x,y)dxdy$ if $\xi$ has a density (also denoted by $\xi$) and infinity otherwise.

The optimal $\rho\in\Pi(\mu,\mu)$ that attains $\Gamma_{0}$ exists and is called the (static) Schrödinger bridge for the cost $c$ and marginals $\mu$ and $\mu$. From the work of Rüschendorf and Thomsen [RT93] (building on Csiszár [Csi75]) it is known that the Schrödinger bridge always admits a density of the following form

$$\rho(x,y)=\exp\left(-c(x,y)-a(x)-a(y)\right),\tag{4}$$

for some measurable function $a$ satisfying the marginal constraint $\int_{0}^{1}\rho(x,y)dy=1$ almost everywhere, that is,

$$\int_{0}^{1}e^{-c(x,y)-a(y)}dy=e^{a(x)},\quad\text{for}\;x\in[0,1].\tag{5}$$

Please note that we are using a standard abuse of notation by referring to both the measure and its density by the letter $\rho$.

In particular, $\rho$ is symmetric in its arguments (since both $c$ and the marginal constraints are symmetric in the coordinates) and

$$\begin{split}\Gamma_{0}&=\int c(x,y)\rho(x,y)dxdy+\mathrm{Ent}(\rho)\\ &=\int c(x,y)\rho(x,y)dxdy-\int\left(c(x,y)+a(x)+a(y)\right)\rho(x,y)dxdy\\ &=-2\int_{0}^{1}a(x)dx,\end{split}$$

where the final equality is due to the fact that $\rho\in\Pi(\mu,\mu)$.
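The potential $a$ is rarely available in closed form, but it can be approximated numerically. The sketch below is an illustration only and is not part of the argument of this paper: it assumes a midpoint grid on $[0,1]$, uses a damped symmetric Sinkhorn iteration (our choice of method) to enforce the discrete version of the marginal constraint, and then compares $-2\int a$ with $\int c\rho+\mathrm{Ent}(\rho)$ on the grid. All function names and the grid size are hypothetical.

```python
import math

def solve_potential(c, m=60, iterations=200):
    """Approximate the potential a of (4) on the midpoint grid x_j = (j + 1/2)/m.

    Uses the damped symmetric Sinkhorn update u <- sqrt(u / (K u)) for u = exp(-a),
    where (K u)_i = (1/m) * sum_j exp(-c(x_i, x_j)) * u_j.
    """
    xs = [(j + 0.5) / m for j in range(m)]
    kernel = [[math.exp(-c(x, y)) for y in xs] for x in xs]
    u = [1.0] * m
    for _ in range(iterations):
        ku = [sum(row[j] * u[j] for j in range(m)) / m for row in kernel]
        u = [math.sqrt(u[i] / ku[i]) for i in range(m)]
    a = [-math.log(v) for v in u]
    return xs, a

def bridge_density(c, xs, a):
    """Grid values rho(x_i, x_j) = exp(-c(x_i, x_j) - a_i - a_j)."""
    m = len(xs)
    return [[math.exp(-c(xs[i], xs[j]) - a[i] - a[j]) for j in range(m)] for i in range(m)]

def squared_cost(x, y):
    return (x - y) ** 2

xs, a = solve_potential(squared_cost)
rho = bridge_density(squared_cost, xs, a)
m = len(xs)
row_means = [sum(row) / m for row in rho]      # discrete marginals, each should be close to 1
gamma0_from_a = -2.0 * sum(a) / m               # grid version of -2 * integral of a
gamma0_direct = sum(                            # grid version of integral of c*rho plus Ent(rho)
    rho[i][j] * (squared_cost(xs[i], xs[j]) + math.log(rho[i][j]))
    for i in range(m) for j in range(m)
) / m**2
print(max(abs(r - 1.0) for r in row_means), gamma0_from_a, gamma0_direct)
```

The two printed values of $\Gamma_{0}$ agree on the grid precisely because the discrete marginals equal one, which mirrors the final equality in the display above.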

Mukherjee [Muk16, Theorem 1.5] shows that the log-partition function has the following large deviation limit

$$\lim_{n\rightarrow\infty}\frac{1}{n}\log L_{n}=-\Gamma_{0}.$$

To compare our notation with that of [Muk16], note that $\theta=1$, their $f=-c$, $\mathcal{M}=\Pi(\mu,\mu)$, $D(\cdot||u)=\mathrm{Ent}(\cdot)$, $Z_{n}(f,\theta)=\log(n!L_{n})$ and $Z_{n}(0)=\log n!$.

Hence, it makes sense to consider $\lim_{n\rightarrow\infty}e^{n\Gamma_{0}}L_{n}$. Towards that goal, define

$$\begin{split}D_{n}&:=\frac{1}{n!}\sum_{\sigma\in\mathbb{S}_{n}}\prod_{i=1}^{n}\rho(i/n,\sigma_{i}/n)\\ &=\frac{1}{n!}\sum_{\sigma\in\mathbb{S}_{n}}\exp\left(-\sum_{i=1}^{n}c(i/n,\sigma_{i}/n)-2\sum_{i=1}^{n}a(i/n)\right)=L_{n}\exp\left(-2\sum_{i=1}^{n}a(i/n)\right)\\ &\approx L_{n}\exp\left(-2n\int_{0}^{1}a(x)dx\right)=e^{n\Gamma_{0}}L_{n}.\end{split}\tag{6}$$

The $\approx$ in (6) can be quantified as the discretization error in the Riemann sum approximation. In fact, assuming that $a$ is twice continuously differentiable, we get

$$\begin{split}\int_{0}^{1}a(x)dx&-\frac{1}{n}\sum_{i=1}^{n}a(i/n)=\sum_{i=1}^{n}\int_{(i-1)/n}^{i/n}\left(a(x)-a(i/n)\right)dx\\ &=\sum_{i=1}^{n}a^{\prime}(i/n)\int_{(i-1)/n}^{i/n}(x-i/n)dx+O\left(\sum_{i=1}^{n}\int_{(i-1)/n}^{i/n}(x-i/n)^{2}dx\right)\\ &=-\frac{1}{2n^{2}}\sum_{i=1}^{n}a^{\prime}(i/n)+O\left(\frac{1}{n^{2}}\right),\end{split}$$

since $\int_{(i-1)/n}^{i/n}(x-i/n)dx=-1/(2n^{2})$.

How can we guarantee that $a$ is twice continuously differentiable? This follows from the assumed twice continuous differentiability of $c$ and the integral equation (5). Hence,

$$\lim_{n\rightarrow\infty}n\left[\int_{0}^{1}a(x)dx-\frac{1}{n}\sum_{i=1}^{n}a(i/n)\right]=-\frac{1}{2}\int_{0}^{1}a^{\prime}(x)dx=-\frac{a(1)-a(0)}{2}.$$

Now, due to our assumption $c(x,y)=c(1-x,1-y)$, we must have $a(0)=a(1)$, so the limit above is zero. This is because $\mu$ is invariant under the map $x\mapsto 1-x$. Hence, if $\rho(x,y)$ is the Schrödinger bridge, so is $\rho(1-x,1-y)$, and the two coincide due to the uniqueness of the solution of the strictly convex optimization problem (3). Comparing the two densities via (4) on the diagonal $y=x$ gives $a(x)=a(1-x)$ for all $x\in[0,1]$.
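The role of the condition $a(0)=a(1)$ can be seen in a small numerical experiment. The sketch below uses two hypothetical test functions in place of the unknown potential $a$: one with equal endpoint values and one without. It prints $n$ times the Riemann sum error for increasing $n$; the names are ours.

```python
def scaled_riemann_error(a, integral, n):
    """Return n * (integral of a over [0,1] - (1/n) * sum_{i=1}^n a(i/n))."""
    riemann_sum = sum(a(i / n) for i in range(1, n + 1)) / n
    return n * (integral - riemann_sum)

# Test function with a(0) = a(1): the scaled error should shrink with n.
symmetric = lambda x: x * (1.0 - x)        # exact integral 1/6
# Test function with a(1) - a(0) = 1: the scaled error should stay of order 1,
# with magnitude close to |a(1) - a(0)| / 2.
asymmetric = lambda x: x * x               # exact integral 1/3

for n in (10, 100, 1000):
    print(n, scaled_riemann_error(symmetric, 1.0 / 6.0, n),
          scaled_riemann_error(asymmetric, 1.0 / 3.0, n))
```

Only when the scaled error vanishes does the factor $\exp\left(-2\sum_{i}a(i/n)\right)$ in (6) become asymptotically equivalent to $e^{n\Gamma_{0}}$.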

Hence, from (6),

$$\lim_{n\rightarrow\infty}e^{n\Gamma_{0}}L_{n}=\lim_{n\rightarrow\infty}D_{n}.\tag{7}$$

This is what we will evaluate below.

The first step is to consider the Markov integral operator corresponding to the probability density $\rho$ and derive its spectral decomposition. Consider the separable Hilbert space $\mathcal{H}=L^{2}[0,1]$. Define the integral operator: for $u\in\mathcal{H}$,

$$Tu(x):=\int_{0}^{1}u(y)\rho(x,y)dy.$$

Clearly, for any other $v\in\mathcal{H}$, by Fubini’s Theorem and the symmetry of $\rho$,

$$\begin{split}\int_{0}^{1}v(x)Tu(x)dx&=\int_{0}^{1}\int_{0}^{1}v(x)\rho(x,y)u(y)dydx=\int_{0}^{1}u(y)\left[\int_{0}^{1}v(x)\rho(y,x)dx\right]dy\\ &=\int_{0}^{1}u(y)Tv(y)dy.\end{split}$$

In particular, $T$ is a self-adjoint linear operator on the separable Hilbert space $\mathcal{H}$. Since this is a Hilbert-Schmidt operator, it is also compact. Hence, it admits a spectral decomposition. In particular, there exists a countable sequence of eigenvalues $(\lambda_{n},\;n\in\mathbb{N})$ and their corresponding orthonormal eigenfunctions $(\phi_{n},\;n\in\mathbb{N})$ (with multiplicities) such that

$$\rho(x,y)=1+\sum_{n=1}^{\infty}\lambda_{n}\phi_{n}(x)\phi_{n}(y).$$

All the eigenvalues are real since $T$ is self-adjoint. In fact, $I-T$ is a nonnegative operator, where $I$ is the identity operator. Thus all the eigenvalues of $T$ lie in the interval $[-1,1]$. The eigenfunction corresponding to eigenvalue $1$ is the constant function $\phi_{0}(x)\equiv 1$, which accounts for the leading term $1$ in the expansion above.

Assumption 1.

Assume that there is a positive spectral gap, i.e.,

$$\sigma^{2}=\left(1-\max_{i\geq 1}\lambda^{2}_{i}\right)\in(0,1).\tag{8}$$

Assumption 2.

Assume that each eigenfunction $\phi_{n}$ is Lipschitz continuous on $[0,1]$.

Consider $T$ as a self-adjoint operator on $\mathcal{H}_{1}$, the subspace of $\mathcal{H}$ that is orthogonal to the constant functions. Since $T^{2}$ is a trace class operator on the Hilbert space $\mathcal{H}_{1}$, the Fredholm determinant of $I-T^{2}$ exists and is given by the absolutely convergent infinite product

$$\mathrm{det}_{\mathrm{F}}(I-T^{2})=\prod_{n=1}^{\infty}(1-\lambda^{2}_{n}).$$

See [vN22, Definition 14.35, Theorem 14.44].

Conjecture. Our main conjecture is that, under Assumptions 1 and 2, the following limit holds:

$$\lim_{n\rightarrow\infty}D_{n}=C=\frac{1}{\sqrt{\mathrm{det}_{\mathrm{F}}(I-T^{2})}}.\tag{9}$$

Hence, by (7), $\lim_{n\rightarrow\infty}e^{n\Gamma_{0}}L_{n}=\left(\mathrm{det}_{\mathrm{F}}(I-T^{2})\right)^{-1/2}$, which would complete our aim outlined in (2).
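To make the conjectured constant concrete, one may discretize $T$ on a grid and evaluate a finite determinant. The following is a rough sketch under assumptions of our own: a midpoint-rule discretization, a plain Gaussian-elimination determinant, and a hypothetical rank-one kernel chosen only because its single nontrivial eigenvalue is known in closed form. It is not claimed to be an accurate general-purpose approximation of the Fredholm determinant.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting (on a copy)."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[pivot][k] == 0.0:
            return 0.0
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            det = -det
        det *= a[k][k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    return det

def conjectured_constant(rho, m=40):
    """Discretize T on the midpoint grid and return det(I - T~^2)^(-1/2).

    T~ is T with the constant eigenfunction removed, i.e. with kernel rho - 1.
    """
    xs = [(j + 0.5) / m for j in range(m)]
    t = [[(rho(x, y) - 1.0) / m for y in xs] for x in xs]
    t2 = [[sum(t[i][k] * t[k][j] for k in range(m)) for j in range(m)] for i in range(m)]
    i_minus_t2 = [[(1.0 if i == j else 0.0) - t2[i][j] for j in range(m)] for i in range(m)]
    return 1.0 / math.sqrt(determinant(i_minus_t2))

# Hypothetical rank-one kernel: one nontrivial eigenvalue lam with eigenfunction
# sqrt(2) * cos(pi * x); the marginals are uniform and rho > 0 for |lam| < 1/2.
lam = 0.25
def rank_one_kernel(x, y):
    return 1.0 + lam * 2.0 * math.cos(math.pi * x) * math.cos(math.pi * y)

print(conjectured_constant(rank_one_kernel), 1.0 / math.sqrt(1.0 - lam**2))
```

For this particular kernel the product in the Fredholm determinant has a single factor $1-\lambda^{2}$, which is what the second printed number uses.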

Although we will give a partial proof towards this conjecture in the next section, let us provide some intuition why such a limit should be true. In [HLP20, Theorem 2], the present author and coauthors proved a similar but more complex limit. The relationship between this paper and that one may be explained in the following way. If we consider $L_{n}$ in (1) as a function of the empirical distribution $\hat{\mu}_{n}:=\frac{1}{n}\sum_{i=1}^{n}\delta_{i/n}$, in [HLP20] we consider a similar function of the empirical distribution $\tilde{\mu}_{n}:=\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}$, where $X_{1},X_{2},\ldots$ is a sequence of i.i.d. Uni$(0,1)$ random variables. In the latter case, the limit $\lim_{n\rightarrow\infty}D_{n}$ is random and belongs to the class of second order Gaussian chaos as shown in [HLP20, Theorem 2]. The reason we get the Gaussian chaos is due to the limiting Gaussian fluctuation between $\tilde{\mu}_{n}$ and Uni$(0,1)$ as established by standard empirical process theory. There is, of course, no limiting Gaussian fluctuation for the difference between $\hat{\mu}_{n}$ and Uni$(0,1)$. Hence one may assume that the limiting Gaussian random variables all have zero variance. If we plug this in [HLP20, Theorem 2] and simplify to our case at hand we get (9).

Although this connection has been pointed out in the introduction of [HLP20], that proof simply cannot cover this case due to the lack of randomness. The difference between the two set-ups may be explained by the following analogy. Whereas the proof in [HLP20] can generalize to sampling with replacement from the finite set $(i/n,\;i\in[n])$, our current set-up is about sampling without replacement. The combinatorics is much more involved, which leads to our inability to complete the proof of the conjecture.

2. A partial proof of the conjectured limit

Let $\tilde{\rho}$ be the kernel $\rho-1$. Then $\tilde{\rho}(x,y)=\sum_{i=1}^{\infty}\lambda_{i}\phi_{i}(x)\phi_{i}(y)$ where the series converges in $L^{2}$. In particular, due to the marginal constraints,

$$\int_{0}^{1}\tilde{\rho}(x,y)dy=0=\int_{0}^{1}\tilde{\rho}(z,w)dz,\tag{10}$$

for $x,w$ in $[0,1]$.

For any choice of $(x_{1},\ldots,x_{n})$ and any $\sigma\in\mathbb{S}_{n}$,

$$\begin{split}\prod_{i=1}^{n}\rho(x_{i},x_{\sigma_{i}})&=\prod_{i=1}^{n}\left(1+\tilde{\rho}(x_{i},x_{\sigma_{i}})\right)=1+\sum_{A\subseteq[n],\;A\neq\emptyset}\prod_{i\in A}\tilde{\rho}(x_{i},x_{\sigma_{i}}).\end{split}$$

Hence,

$$\begin{split}D_{n}&=\frac{1}{n!}\sum_{\sigma\in\mathbb{S}_{n}}\left[1+\sum_{A\subseteq[n],\;A\neq\emptyset}\prod_{i\in A}\tilde{\rho}(i/n,\sigma_{i}/n)\right]\\ &=1+\frac{1}{n!}\sum_{\sigma\in\mathbb{S}_{n}}\sum_{A\subseteq[n],\;A\neq\emptyset}\prod_{i\in A}\tilde{\rho}(i/n,\sigma_{i}/n)\\ &=1+\frac{1}{n!}\sum_{\sigma\in\mathbb{S}_{n}}\sum_{r=1}^{n}\sum_{A:\left\lvert A\right\rvert=r}\prod_{i\in A}\tilde{\rho}(i/n,\sigma_{i}/n)\\ &=1+\frac{1}{n!}\sum_{r=1}^{n}\sum_{A:\left\lvert A\right\rvert=r}\sum_{\sigma\in\mathbb{S}_{n}}\prod_{i\in A}\tilde{\rho}(i/n,\sigma_{i}/n)\\ &=1+\sum_{r=1}^{n}\frac{(n-r)!}{n!}\sum_{1\leq i_{1}<i_{2}<\cdots<i_{r}\leq n}\sum_{1\leq j_{1}\neq j_{2}\neq\cdots\neq j_{r}\leq n}\prod_{t=1}^{r}\tilde{\rho}(i_{t}/n,j_{t}/n).\end{split}$$

Here, the condition $\{1\leq j_{1}\neq j_{2}\neq\cdots\neq j_{r}\leq n\}$ means all the indices are distinct and in $[n]$. The factor $(n-r)!$ counts the permutations $\sigma$ with prescribed values $\sigma_{i_{t}}=j_{t}$ for $t\in[r]$, which is why it depends on $r$ and sits inside the sum over $r$.
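The expansion above is a purely algebraic identity, so it can be checked by brute force for a small $n$. The sketch below does this for $n=4$ with an arbitrary random matrix standing in for the values $\tilde{\rho}(i/n,j/n)$, with the weight $(n-r)!/n!$ applied to the $r$-th term; the function names are ours.

```python
import itertools
import math
import random

def d_n_direct(rho_tilde, n):
    """(1/n!) * sum over sigma of prod_i (1 + rho_tilde[i][sigma_i])."""
    total = 0.0
    for sigma in itertools.permutations(range(n)):
        total += math.prod(1.0 + rho_tilde[i][sigma[i]] for i in range(n))
    return total / math.factorial(n)

def d_n_expanded(rho_tilde, n):
    """1 + sum_r (n-r)!/n! * sum over i_1<...<i_r and distinct j_1..j_r of prod rho_tilde."""
    total = 1.0
    for r in range(1, n + 1):
        weight = math.factorial(n - r) / math.factorial(n)
        inner = 0.0
        for rows in itertools.combinations(range(n), r):
            for cols in itertools.permutations(range(n), r):
                inner += math.prod(rho_tilde[i][j] for i, j in zip(rows, cols))
        total += weight * inner
    return total

random.seed(0)
n = 4
# Arbitrary matrix standing in for the values rho_tilde(i/n, j/n); the identity is algebraic.
rho_tilde = [[random.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
print(d_n_direct(rho_tilde, n), d_n_expanded(rho_tilde, n))
```

The two printed numbers should coincide up to floating point rounding.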

Fix $K\in\mathbb{N}$. For all $n\geq K$, let

$$D_{n,K}:=1+\sum_{r=1}^{K}\frac{(n-r)!}{n!}\sum_{1\leq i_{1}<i_{2}<\cdots<i_{r}\leq n}\sum_{1\leq j_{1}\neq j_{2}\neq\cdots\neq j_{r}\leq n}\prod_{t=1}^{r}\tilde{\rho}(i_{t}/n,j_{t}/n).$$

For $L\in\mathbb{N}$, define

$$\tilde{\rho}^{(L)}(x,y)=\sum_{l=1}^{L}\lambda_{l}\phi_{l}(x)\phi_{l}(y).$$

Finally, define

$$\begin{split}D_{n,K}^{(L)}&:=1+\sum_{r=1}^{K}\frac{(n-r)!}{n!}\sum_{1\leq i_{1}<i_{2}<\cdots<i_{r}\leq n}\sum_{1\leq j_{1}\neq j_{2}\neq\cdots\neq j_{r}\leq n}\prod_{t=1}^{r}\tilde{\rho}^{(L)}(i_{t}/n,j_{t}/n)\\ &=1+\sum_{r=1}^{K}\frac{(n-r)!}{n!}\sum_{1\leq i_{1}<i_{2}<\cdots<i_{r}\leq n}\sum_{1\leq j_{1}\neq j_{2}\neq\cdots\neq j_{r}\leq n}\prod_{t=1}^{r}\left[\sum_{l=1}^{L}\lambda_{l}\phi_{l}(i_{t}/n)\phi_{l}(j_{t}/n)\right]\\ &=1+\sum_{r=1}^{K}\sum_{1\leq l_{1},\ldots,l_{r}\leq L}\prod_{t=1}^{r}\lambda_{l_{t}}\times\\ &\quad\left[\frac{(n-r)!}{r!n!}\sum_{1\leq i_{1}\neq\cdots\neq i_{r}\leq n}\sum_{1\leq j_{1}\neq\cdots\neq j_{r}\leq n}\prod_{t=1}^{r}\phi_{l_{t}}(i_{t}/n)\phi_{l_{t}}(j_{t}/n)\right].\end{split}$$

Recall that the sum over indices $i_{1}\neq\cdots\neq i_{r}$ means that all the indices are distinct. The factor $1/r!$ in the last line compensates for replacing the ordered indices $i_{1}<\cdots<i_{r}$ by all distinct tuples.

Now fix a vector $(l_{1},l_{2},\ldots,l_{r})$ in $[L]^{r}$. Let $a_{l}$ denote the frequency of appearance of $l\in[L]$ in this sequence. Then each $a_{l}\geq 0$ and $\sum_{l=1}^{L}a_{l}=r\leq K$. Since, for fixed $r$,

$$\frac{(n-r)!}{n!}\approx\frac{1}{n^{r}},$$

consider the normalized inner sum

$$\begin{split}\frac{1}{n^{r}}&\sum_{1\leq i_{1}\neq\cdots\neq i_{r}\leq n}\sum_{1\leq j_{1}\neq\cdots\neq j_{r}\leq n}\prod_{t=1}^{r}\phi_{l_{t}}(i_{t}/n)\phi_{l_{t}}(j_{t}/n)\\ &=\left[\frac{1}{n^{r/2}}\sum_{1\leq i_{1}\neq\cdots\neq i_{r}\leq n}\prod_{t=1}^{r}\phi_{l_{t}}(i_{t}/n)\right]^{2}.\end{split}\tag{11}$$

The claim is that, unless $a_{l}$ is even for every $l\in[L]$, the contributions of the corresponding terms in the normalized sum inside the square converge to zero as $n\rightarrow\infty$.

To see this, let us introduce a sequence of i.i.d. Uni$(0,1)$ random variables $(U_{1},U_{2},\ldots)$. For $n\in\mathbb{N}$, define

$$U_{i}^{(n)}:=\frac{1}{n}\lceil nU_{i}\rceil.$$

Then, for each $n$, $\left(U_{1}^{(n)},U^{(n)}_{2},\ldots\right)$ is a sequence of i.i.d. random variables, each uniformly distributed on $\{1/n,2/n,\ldots,1\}$, and obviously, $\left\lvert U_{i}^{(n)}-U_{i}\right\rvert\leq 1/n$.

Note

$$\frac{1}{n^{r}}\sum_{1\leq i_{1}\neq\cdots\neq i_{r}\leq n}\prod_{t=1}^{r}\phi_{l_{t}}(i_{t}/n)=\mathrm{E}\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}\left(U^{(n)}_{t}\right)\right)1\left\{U^{(n)}_{1}\neq U^{(n)}_{2}\neq\cdots\neq U^{(n)}_{r}\right\}\right].$$

Thus, to analyze the limit of (11), it suffices to analyze $n^{r/2}$ times the RHS.

Consider the following events $F^{(n)}_{ij}=\{U^{(n)}_{i}=U^{(n)}_{j}\}$, for $i<j\in\mathbb{N}$. Then

$$\cup_{1\leq i<j\leq r}F^{(n)}_{ij}=\{U^{(n)}_{1}\neq U^{(n)}_{2}\neq\cdots\neq U^{(n)}_{r}\}^{c}.$$

Thus

$$\begin{split}\mathrm{E}&\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right)1\{U^{(n)}_{1}\neq U^{(n)}_{2}\neq\cdots\neq U^{(n)}_{r}\}\right]\\ =&\mathrm{E}\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right)\right]-\mathrm{E}\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right);\cup_{1\leq i<j\leq r}F^{(n)}_{ij}\right].\end{split}$$

Here, for any event $A$ and any integrable random variable $Y$ in a probability space, $\mathrm{E}(Y;A):=\mathrm{E}(Y1_{A})$.

Since every $\phi_{i}$ is an eigenfunction orthogonal to $1$, they satisfy $\mathrm{E}(\phi_{i}(U_{1}))=0$ and $\mathrm{E}(\phi_{i}^{2}(U_{1}))=1$. Due to the spatial discreteness this is not going to be exactly true for $U_{i}^{(n)}$. However, by Assumption 2 on the Lipschitzness of eigenfunctions, it follows that

$$\begin{split}\mathrm{E}\left(\phi_{i}(U^{(n)}_{1})\right)&=O\left(\frac{1}{n}\right),\\ \mathrm{E}\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right)\right]&=\prod_{t=1}^{r}\mathrm{E}\left(\phi_{l_{t}}(U^{(n)}_{t})\right)=O\left(\frac{1}{n^{r}}\right),\end{split}\tag{12}$$

where the constant in $O(1/n)$ can be chosen uniformly for any finite collection of eigenfunctions. In the calculation below, every time we encounter an expression as in (12) we will ignore the $O(1/n)$ term and put zero instead. This is simply for the clarity of the combinatorial expressions. Because $L$ and $K$ are both finite, only finitely many eigenfunctions ever get used and the constant in $O(1/n)$ remains uniformly bounded. The primary reason why we failed to complete this proof is that we cannot suitably estimate this error when $L$ and $K$ are not bounded. Nevertheless, with this convenient abuse of notation,

$$\begin{split}n^{r/2}\mathrm{E}&\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right)1\{U^{(n)}_{1}\neq U^{(n)}_{2}\neq\cdots\neq U^{(n)}_{r}\}\right]\\ &=-n^{r/2}\mathrm{E}\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right);\cup_{1\leq i<j\leq r}F^{(n)}_{ij}\right],\end{split}$$

where, let us repeat, we have ignored an $O(n^{-r/2})$ error.

On the other hand, by the inclusion-exclusion principle, and by utilizing exchangeability,

$$\begin{split}\mathrm{E}&\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right);\cup_{1\leq i<j\leq r}F^{(n)}_{ij}\right]\\ &=\sum_{k=1}^{r(r-1)/2}(-1)^{k-1}\sum\mathrm{E}\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right);\text{intersection of $k$ many $F^{(n)}_{ij}$s}\right],\end{split}$$

where the inner sum is over all choices of $k$ many $F^{(n)}_{ij}$s. Fix $(i_{1},j_{1}),\ldots,(i_{k},j_{k})$. Then

$$\cap_{m=1}^{k}F^{(n)}_{i_{m}j_{m}}=\{U^{(n)}_{i_{1}}=U^{(n)}_{j_{1}},\ldots,U^{(n)}_{i_{k}}=U^{(n)}_{j_{k}}\}.$$

We now make the following observations. The above constraint gives a partition of $[r]$, where, for each block of the partition, the discrete uniform random variables corresponding to the indices in that block take the same value.

  • If the partition contains a block that is a singleton, i.e. $\cup_{m=1}^{k}\{i_{m},j_{m}\}\neq[r]$, there will be some $U^{(n)}_{t}$ which has no constraint and is, therefore, independent of the other uniform random variables. Let the number of singletons be $r-r^{\prime}$, for some $r^{\prime}\in\{0\}\cup[r-1]$. Then, by (12), independence and exchangeability,

    $$\begin{split}\mathrm{E}&\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right);\cap_{m=1}^{k}F^{(n)}_{i_{m}j_{m}}\right]\\ =&O\left(\frac{1}{n^{r-r^{\prime}}}\right)\mathrm{E}\left[\left(\prod_{t=1}^{r^{\prime}}\phi_{l_{t}}(U^{(n)}_{t})\right);\cap_{m=1}^{k}F^{(n)}_{i_{m}j_{m}}\right].\end{split}$$

    The cases below will show that $n^{r^{\prime}/2}$ times the expectation on the RHS remains bounded, as $n\rightarrow\infty$. Hence $n^{r/2}$ times the expectation on the LHS goes to zero as $n\rightarrow\infty$ whenever $r^{\prime}<r$. Hence, asymptotically, the only non-zero terms come from partitions of $[r]$ that do not contain any singleton blocks.

  • Now consider the case where every block in the partition is of size two. Such partitions are in correspondence with perfect matchings of the complete graph $K_{r}$. In particular, $r=2k$ must be even. For any such perfect matching, say $U^{(n)}_{i_{1}}=U^{(n)}_{j_{1}}$, $U^{(n)}_{i_{2}}=U^{(n)}_{j_{2}}$, etc.,

    $$\mathrm{E}\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right);\cap_{m=1}^{k}F^{(n)}_{i_{m}j_{m}}\right]=\frac{1}{n^{r/2}}\prod_{m=1}^{r/2}\mathrm{E}\left(\phi_{l_{i_{m}}}(U^{(n)}_{i_{m}})\phi_{l_{j_{m}}}(U^{(n)}_{i_{m}})\right).$$

    The above product is zero in all cases, except when $l_{i_{m}}=l_{j_{m}}$, for all $m\in[r/2]$, in which case the product is one. This is due to the orthonormal property of the eigenfunctions. Both claims hold up to a smaller discretization error as in (12). When multiplied by $n^{r/2}$, each such expectation gives an $O(1)$ term, and there are only finitely many matchings of $K_{r}$, $r\leq K$.

  • Finally, consider the contribution of a partition of $[r]$ that contains a block of size $3$ or more and no singletons. Such a partition has fewer than $r/2$ blocks, so $P\left(\cap_{m=1}^{k}F^{(n)}_{i_{m}j_{m}}\right)=o\left(\frac{1}{n^{r/2}}\right)$. Since the finitely many eigenfunctions involved are uniformly bounded,

    $$\mathrm{E}\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right);\cap_{m=1}^{k}F^{(n)}_{i_{m}j_{m}}\right]=o\left(\frac{1}{n^{r/2}}\right).$$

Since $r\leq K$ and there are only finitely many partitions of $[r]$,

$$\sum\mathrm{E}\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right);\text{intersection of $k$ many $F^{(n)}_{ij}$s}\right]=o\left(\frac{1}{n^{r/2}}\right)$$

for all $r$ odd and, when $r$ is even, for all $k\neq r/2$. The only remaining case is when $r$ is even and $k=r/2$. This is the computation done in the second bulleted item in the itemized list above. The limiting contribution of each term in that case is either zero or one as shown. Hence the total contribution is the number of terms that contribute one.

To find this number, let $a_{i}\in\mathbb{N}\cup\{0\}$ denote the number of times $i\in[L]$ appears in the sequence $(l_{1},\ldots,l_{r})$. If any $a_{i}$ is odd, there is no perfect matching of $K_{r}$ that pairs every index $t$ with an index $s$ such that $l_{s}=l_{t}$, and then the sum is zero. If all $a_{i}$s are even, the only partitions of $[r]$ that have nonzero contributions are precisely those that belong to the direct product of the sets of perfect matchings of the complete graphs $K_{a_{i}}$. Hence, the number of such terms is the product of the numbers of perfect matchings of the $K_{a_{i}}$s. Thus, in this only remaining case,

$$\sum\mathrm{E}\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right);\text{intersection of $k$ many $F^{(n)}_{ij}$s}\right]=\frac{1}{n^{r/2}}\prod_{i=1}^{L}(a_{i}-1)!!,$$

where, by convention, $(0-1)!!=1$. Let $a_{i}:=2b_{i}$; the above may also be written as

$$\frac{1}{n^{r/2}}\prod_{i=1}^{L}(2b_{i}-1)!!.$$
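The matching count can be illustrated numerically in the simplest case $L=1$, where a single function is repeated $r$ times. The sketch below is only an illustration under assumptions of our own: it takes the hypothetical choice $\phi(x)=\sqrt{2}\cos(\pi x)$ (mean zero, unit $L^{2}$ norm) and evaluates the squared normalized sum in (11) through elementary symmetric polynomials, using the fact that the sum over distinct indices of a product of identical factors equals $r!$ times the $r$-th elementary symmetric polynomial. For large $n$ one expects values near $\left((r-1)!!\right)^{2}$ for even $r$ and near zero for odd $r$.

```python
import math

def normalized_distinct_sum_squared(phi, r, n):
    """Square of n^(-r/2) * sum over distinct (i_1..i_r) in [n]^r of prod_t phi(i_t/n)."""
    values = [phi(i / n) for i in range(1, n + 1)]
    e = [1.0] + [0.0] * r            # e[k] = k-th elementary symmetric polynomial so far
    for v in values:
        for k in range(r, 0, -1):
            e[k] += e[k - 1] * v
    return (math.factorial(r) * e[r] / n ** (r / 2)) ** 2

def double_factorial_odd(b):
    """(2b - 1)!! with the convention (-1)!! = 1."""
    return math.prod(range(1, 2 * b, 2))

# Illustrative eigenfunction-like choice: mean zero and unit L^2 norm on [0, 1].
phi = lambda x: math.sqrt(2.0) * math.cos(math.pi * x)

for r in (1, 2, 3, 4):
    print(r, normalized_distinct_sum_squared(phi, r, 2000))
```

The convergence for odd $r$ is slow, of order $n^{-1/2}$ before squaring, which is in line with the error terms that are ignored in the argument above.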

Combining all these terms,

$$\begin{split}D_{n,K}^{(L)}=1+\sum_{r=1,\;\text{even}}^{K}\frac{1}{r!}\sum\prod_{i=1}^{L}\lambda_{i}^{2b_{i}}\left(\prod_{i=1}^{L}(2b_{i}-1)!!\right)^{2}+o(1),\;\;\text{as $n\rightarrow\infty$},\end{split}\tag{13}$$

where the inner sum is over all sequences $(l_{1},\ldots,l_{r})\in[L]^{r}$ in which every $i\in[L]$ appears an even number of times, and the nonnegative integers $(2b_{i},\;i\in[L])$, which satisfy $2\sum_{i=1}^{L}b_{i}=r$, record the frequency of appearance of $i$ in the sequence.

We now recall a property of Hermite polynomials: for even $n$, the number of perfect matchings of $K_{n}$ is exactly $\lvert H_{n}(0)\rvert$, the absolute value at zero of the $n$th Hermite polynomial

$$H_{n}(x)=(-1)^{n}e^{x^{2}/2}\frac{d^{n}}{dx^{n}}e^{-x^{2}/2}.$$

Only the square of $H_{n}(0)$ enters below, so the sign is immaterial. See [HL72, eqn. (3.14)] for a more general identity involving moment polynomials. Also see [God81] for similar identities involving more general graphs.
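The relation between Hermite polynomials at zero and matching counts is easy to check for small indices. The sketch below assumes the standard three-term recurrence $H_{k+1}(x)=xH_{k}(x)-kH_{k-1}(x)$ for the polynomials defined above, evaluated at $x=0$; the function names are ours.

```python
import math

def hermite_at_zero(n):
    """H_n(0) for the probabilists' Hermite polynomials via H_{k+1}(x) = x H_k(x) - k H_{k-1}(x)."""
    previous, current = 0, 1      # placeholder for H_{-1}(0), and H_0(0) = 1
    for k in range(n):
        previous, current = current, -k * previous
    return current

def perfect_matchings(n):
    """Number of perfect matchings of the complete graph K_n: (n-1)!! for even n, else 0."""
    if n % 2 == 1:
        return 0
    return math.prod(range(1, n, 2))

for n in range(0, 11):
    print(n, hermite_at_zero(n), perfect_matchings(n))
```

The two columns agree in absolute value; the signs of $H_{n}(0)$ alternate along even $n$.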

Thus, one may also write (13) as

$$D_{n,K}^{(L)}=1+\sum_{r=1,\;\text{even}}^{K}\frac{1}{r!}\sum\left(\prod_{i=1}^{L}\lambda_{i}^{b_{i}}H_{2b_{i}}(0)\right)^{2}+o_{n}(1),\;\;\text{as $n\rightarrow\infty$},\tag{14}$$

where, as before, the inner sum is over all sequences $(l_{1},\ldots,l_{r})\in[L]^{r}$ and the nonnegative integers $(2b_{i},\;i\in[L])$ such that $2\sum_{i=1}^{L}b_{i}=r$ record the frequency of appearance of $i$ in the sequence.

Consider any permutation symmetric function $f:[L]^{r}\rightarrow\mathbb{R}$. For every choice of nonnegative integers $\mathbf{a}:=(a_{1},\ldots,a_{L})$ such that $\sum_{i=1}^{L}a_{i}=r$, let $\Gamma(a_{1},\ldots,a_{L})$ denote the subset of $[L]^{r}$ consisting of all sequences in which each $i\in[L]$ appears exactly $a_{i}$ times. Pick a representative element $\ell_{\mathbf{a}}\in\Gamma(a_{1},\ldots,a_{L})$. Since $\Gamma(a_{1},\ldots,a_{L})$ has $r!/(a_{1}!\cdots a_{L}!)$ elements, it follows that

$$\frac{1}{r!}\sum_{(l_{1},\ldots,l_{r})\in[L]^{r}}f(l_{1},\ldots,l_{r})=\sum_{(a_{1},\ldots,a_{L})}\frac{1}{a_{1}!\ldots a_{L}!}f(\ell_{\mathbf{a}}).$$

Thus, from (14),

$$\lim_{n\rightarrow\infty}D_{n,K}^{(L)}=D_{K}^{(L)}:=1+\sum_{r=1,\;\text{even}}^{K}\sum\frac{1}{(2b_{1})!\ldots(2b_{L})!}\left(\prod_{i=1}^{L}\lambda_{i}^{b_{i}}H_{2b_{i}}(0)\right)^{2},\tag{15}$$

where now the inner sum is over all choices of nonnegative integers $(b_{1},\ldots,b_{L})$ such that $2\sum_{i=1}^{L}b_{i}=r$.

Note that, at least formally,

$$\begin{split}\lim_{K\rightarrow\infty}D_{K}^{(L)}&=D^{(L)}:=1+\sum\prod_{i=1}^{L}\frac{\lambda_{i}^{2b_{i}}}{(2b_{i})!}\left(H_{2b_{i}}(0)\right)^{2},\end{split}$$

where now the sum is over all choices of nonnegative integers $(b_{1},\ldots,b_{L})$ that are not all zero. The fact that the expression on the RHS is finite (and therefore the limit exists) is a consequence of the multilinear Mehler formula for the Hermite polynomials. See [Foa81, eqn. (2.4)] for a choice of $n=L$, $S_{n}$ to be the $L\times L$ zero matrix, the diagonal matrix $D_{n}$ to have the diagonal vector $(\lambda^{2}_{1},\ldots,\lambda^{2}_{L})$ and the indeterminate vectors $y,z$ to both be the zero vector. For this choice, in their notation, the only symmetric matrices $N$ that will contribute nonzero terms must be diagonal with $\nu_{ii}=a_{i}$ equal to our $2b_{i}$. Hence, by [Foa81, eqn. (2.4)] the following limit exists and is given by the simple determinantal expression

$$D^{(L)}=\frac{1}{\prod_{i=1}^{L}\sqrt{1-\lambda^{2}_{i}}}.$$
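The closed form for $D^{(L)}$ can be sanity-checked numerically. Together with the leading $1$, the multi-index sum factorizes into a product over $i\in[L]$ of one-variable series in $b_{i}\geq 0$, and $\left(H_{2b}(0)\right)^{2}=\left((2b-1)!!\right)^{2}$. The sketch below sums each series with a fixed truncation; the eigenvalues in `lams` are hypothetical values chosen for illustration, not computed from any cost function.

```python
import math

def hermite_series(lam, terms=200):
    """Partial sum over b >= 0 of lam^(2b) * ((2b-1)!!)^2 / (2b)!."""
    total = 0.0
    term = 1.0                       # b = 0 term
    for b in range(terms):
        total += term
        # ratio of consecutive terms: lam^2 * (2b+1)^2 / ((2b+1)(2b+2))
        term *= lam * lam * (2 * b + 1) / (2 * b + 2)
    return total

def truncated_constant(lams):
    """Product over i of the one-variable series, i.e. the sum defining D^(L)."""
    return math.prod(hermite_series(lam) for lam in lams)

lams = [0.5, 0.3, -0.2]              # hypothetical eigenvalues with |lam| < 1
closed_form = 1.0 / math.prod(math.sqrt(1.0 - lam * lam) for lam in lams)
print(truncated_constant(lams), closed_form)
```

The fixed truncation at 200 terms is adequate only when the $|\lambda_{i}|$ stay well away from $1$, which is where the spectral gap assumption enters.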

Note that this is finite by our assumption on the positive spectral gap (8). Therefore,

$$\lim_{L\rightarrow\infty}D^{(L)}=\frac{1}{\prod_{i=1}^{\infty}\sqrt{1-\lambda^{2}_{i}}}=\frac{1}{\sqrt{\mathrm{det}_{\mathrm{F}}\left(I-T^{2}\right)}}.$$

The relation of the above with the limit conjectured in (9) is now obvious. What we have shown is that

$$\lim_{K,L\rightarrow\infty}\lim_{n\rightarrow\infty}D^{(L)}_{n,K}=\frac{1}{\sqrt{\mathrm{det}_{\mathrm{F}}\left(I-T^{2}\right)}},$$

while what we need to show is

$$\lim_{n\rightarrow\infty}\lim_{K,L\rightarrow\infty}D^{(L)}_{n,K}=\frac{1}{\sqrt{\mathrm{det}_{\mathrm{F}}\left(I-T^{2}\right)}}.$$

The interchange of limits requires a uniform error bound. In [HLP20] such an error bound has been proved for that set-up, but the argument does not extend to this case because the combinatorics is different. However, it is reasonable to guess that a more careful combinatorial analysis will provide the error bound needed to establish (9).

3. Acknowledgment

I am very grateful to the editors for giving me an opportunity to contribute towards the IJPAM volume celebrating the memory of Prof. K. R. Parthasarathy. May his work continue to inspire many further generations of Indian probabilists.

References

  • [Csi75] I. Csiszár, I-divergence geometry of probability distributions and minimization problems, Ann. Probab. 3 (1975), 146–158.
  • [Dia88] P. Diaconis, Group representations in probability and statistics, Institute of Mathematical Statistics Lecture Notes – Monograph Series, vol. 11, IMS, Hayward, CA, 1988.
  • [DR00] Persi Diaconis and Arun Ram, Analysis of systematic scan Metropolis algorithms using Iwahori-Hecke algebra techniques, Michigan Mathematical Journal 48 (2000), no. 1, 157–190.
  • [Foa81] Dominique Foata, Some Hermite polynomial identities and their combinatorics, Advances in Applied Mathematics 2 (1981), no. 3, 250–259.
  • [God81] C. D. Godsil, Hermite polynomials and a duality relationship for matching polynomials, Combinatorica 1 (1981), no. 3, 257–262.
  • [HL72] Ole J. Heilmann and Elliott H. Lieb, Theory of monomer-dimer systems, Commun. math. Phys. 25 (1972), 190–232.
  • [HLP20] Z. Harchaoui, L. Liu, and S. Pal, Asymptotics of discrete Schrödinger bridges via chaos decomposition, To appear in Bernoulli. Preprint available at arxiv.org/abs/2011.08963, 2020.
  • [KKRW20] R. Kenyon, D. Král’, C. Radin, and P. Winkler, Permutations with fixed pattern densities, Random Structures and Algorithms 56 (2020).
  • [Léo12] Christian Léonard, From the Schrödinger problem to the Monge-Kantorovich problem, Journal of Functional Analysis 262 (2012), no. 4, 1879–1920.
  • [Mal57] C. L. Mallows, Non-null ranking models. I., Biometrika 44 (1957), 114–130.
  • [Muk16] Sumit Mukherjee, Estimation in exponential families on permutations, The Annals of Statistics 44 (2016), no. 2, 853–875.
  • [RT93] L. Rüschendorf and W. Thomsen, Note on the Schrödinger equation and I-projections, Statistics and Probability Letters 17 (1993), 369–375.
  • [Sta09] Shannon Starr, Thermodynamic limit for the Mallows model on $S_{n}$, Journal of Mathematical Physics 50 (2009), no. 9, 095208.
  • [vN22] J. van Neerven, Functional analysis, Cambridge Studies in Advanced Mathematics, Cambridge University Press, 2022.