Limiting partition function for the Mallows model: a conjecture and partial evidence

Soumik Pal
Department of Mathematics
University of Washington
Seattle, WA 98195
soumik@uw.edu

Abstract.

Let $\mathbb{S}_{n}$ denote the set of permutations of $n$ labels. We consider a class of Gibbs probability models on $\mathbb{S}_{n}$ that is a subfamily of the so-called Mallows model of random permutations. The Gibbs energy is given by a class of right invariant divergences on $\mathbb{S}_{n}$ that includes common choices such as the Spearman foot rule and the Spearman rank correlation. Mukherjee [Muk16] computed the limit of the (scaled) log partition function (i.e. normalizing factor) of such models as $n\rightarrow\infty$ . Our objective is to compute the exact limit, as $n\rightarrow\infty$ , without the log. We conjecture that this limit is given by the Fredholm determinant of an integral operator related to the so-called Schrödinger bridge probability distributions from optimal transport theory. We provide partial evidence for this conjecture, although the argument lacks a final error bound that is needed for it to become a complete proof.

Key words and phrases:

Mallows model, random permutation, Schrödinger bridge, Fredholm determinant

2000 Mathematics Subject Classification:

60B15, 60C99

This research is partially supported by NSF grants DMS-2052239 and DMS-2134012 and the PIMS Research Network grant Kantorovich Initiative.

1. Introduction

Let $c:[0,1]^{2}\rightarrow[0,\infty)$ denote a cost function satisfying the following assumptions:

• $c$ is twice continuously differentiable on $[0,1]^{2}$.

• $c(x,x)=0$ for all $x\in[0,1]$.

• $c$ is symmetric, i.e. $c(x,y)=c(y,x)$ for all $(x,y)\in[0,1]^{2}$.

• $c(x,y)=c(1-x,1-y)$ for all $(x,y)\in[0,1]^{2}$.

An example of such a cost function is $c(x,y)=(x-y)^{2}$ . The first of these assumptions is made for technical convenience as will be apparent below. No attempt has been made to get the optimal set of assumptions.

Fix $n\in\mathbb{N}$ . Let $\mathbb{S}_{n}$ denote the set of all permutations of $n$ labels $[n]:=\{1,2,\ldots,n\}$ . Consider the following quantity

(1)

L_{n}=\frac{1}{n!}\sum_{\sigma\in\mathbb{S}_{n}}\exp\left(-\sum_{i=1}^{n}c(i/n,\sigma_{i}/n)\right).

We are interested in the limit of this sequence as $n\rightarrow\infty$ . The reason it comes up is that this is the partition function of a family of probability distributions which is a subset of the well-known Mallows models [Mal57] of random permutations. See the Introduction in [Muk16] and many applications listed in [Dia88, Chapters 5 and 6]. For example, the case of $c(x,y)=(x-y)^{2}$ is related to the Spearman rank correlation.
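For very small $n$ the quantity in (1) can be evaluated by direct enumeration, which is a convenient sanity check on the definition. The following is a minimal sketch, assuming the example cost $c(x,y)=(x-y)^{2}$; the function names `cost` and `partition_function` are illustrative and not taken from any reference. The enumeration costs $n!$ operations, so it is only usable for single-digit $n$.

```python
import itertools
import math


def cost(x, y):
    """Example cost from the text: c(x, y) = (x - y)^2."""
    return (x - y) ** 2


def partition_function(n, c=cost):
    """Brute-force evaluation of L_n in (1) by averaging over all n! permutations."""
    total = 0.0
    for sigma in itertools.permutations(range(1, n + 1)):
        # energy of sigma: sum_i c(i/n, sigma_i/n)
        energy = sum(c(i / n, s / n) for i, s in enumerate(sigma, start=1))
        total += math.exp(-energy)
    return total / math.factorial(n)


for n in range(1, 8):
    print(n, partition_function(n))
```

For the zero cost every term equals one, so the function returns $1$ for every $n$; for any other admissible cost the value lies strictly between $0$ and $1$ once $n\geq 2$.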

Our goal in this paper is to understand $\lim_{n\rightarrow\infty}L_{n}$ . This problem is important in statistical estimation [Muk16] and also to understand scaling limits of large random permutations with fixed patterns [KKRW20]. See also [DR00, Section 2e] for generalizations to other groups where the importance of this problem is stressed. We will try to convince the reader that there are constants $\Gamma_{0}$ and $C$ such that

(2)

\lim_{n\rightarrow\infty}e^{n\Gamma_{0}}L_{n}=C.

The value of the constant $\Gamma_{0}$ is already known due to [Muk16] and is the value of an entropy-regularized optimal transport problem with uniform marginals. See also [Sta09] for a special case of a discontinuous cost function. We conjecture in this paper, and give partial evidence, that, under suitable assumptions, the constant $C$ is the Fredholm determinant [vN22, Definition 14.35] of a certain integral operator related to the so-called Schrödinger bridge, the optimal coupling for the same entropy-regularized optimal transport problem. Both these concepts are described below. Taken together, they give the limiting partition function of this class of Mallows models that satisfy all our assumptions.

The concept of entropy-regularized optimal transport and the related notion of Schrödinger bridges can be found in [Léo12]. Let $\mu$ denote the Uni$(0,1)$ distribution. Let $\Pi(\mu,\mu)$ denote the set of couplings (i.e., joint distributions) with both marginals $\mu$. The entropic OT problem is then the following optimization problem over $\Pi(\mu,\mu)$:

(3)

\Gamma_{0}:=\inf_{\xi\in\Pi(\mu,\mu)}\left[\int c(x,y)\xi(x,y)dxdy+\mathrm{Ent}(\xi)\right],

where $\mathrm{Ent}(\cdot)$ is the optimal transport entropy (the negative of the usual differential Shannon entropy) given by $\mathrm{Ent}(\xi)=\int\xi(x,y)\log\xi(x,y)dxdy$ if $\xi$ has a density (also denoted by $\xi$ ) and infinity otherwise.

The optimal $\rho\in\Pi(\mu,\mu)$ that attains $\Gamma_{0}$ exists and is called the (static) Schrödinger bridge for the cost $c$ and marginals $\mu$ and $\mu$. From the work of Rüschendorf and Thomsen [RT93] (building on Csiszar [Csi75]) it is known that the Schrödinger bridge always admits a density of the following form

(4)

\rho(x,y)=\exp\left(-c(x,y)-a(x)-a(y)\right),

for some measurable function $a$ satisfying the following marginal constraint almost surely.

(5)

\int_{0}^{1}e^{-c(x,y)-a(y)}dy=e^{a(x)},\quad\text{for}\;x\in[0,1].

Please note that we are using a standard abuse of the notation by referring to both the measure and its density by the letter $\rho$ .

In particular, $\rho$ is symmetric in its arguments (since both $c$ and the marginal constraints are symmetric in the coordinates) and

\begin{split}\Gamma_{0}&=\int c(x,y)\rho(x,y)dxdy+\mathrm{Ent}(\rho)\\ &=\int c(x,y)\rho(x,y)dxdy-\int\left(c(x,y)+a(x)+a(y)\right)\rho(x,y)dxdy\\ &=-2\int_{0}^{1}a(x)dx,\end{split}

where the final equality is due to the fact that $\rho\in\Pi(\mu,\mu)$ .
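The potential $a$ and the constant $\Gamma_{0}=-2\int_{0}^{1}a(x)dx$ are rarely available in closed form, but they can be approximated numerically. Requiring $\int_{0}^{1}\rho(x,y)dy=1$ in (4) gives the fixed-point relation $a(x)=\log\int_{0}^{1}e^{-c(x,y)-a(y)}dy$. The sketch below iterates an averaged version of this relation (a symmetric variant of the Sinkhorn iteration) on a midpoint grid. The grid size `m`, the iteration count `iters`, the averaging step and the midpoint quadrature are all choices made for this illustration and are not part of the argument in the text.

```python
import numpy as np


def schrodinger_potential(c, m=400, iters=200):
    """Approximate the potential a on the midpoint grid x_k = (k - 1/2)/m.

    Illustrative scheme: averaged fixed-point iteration of
    a(x) = log int_0^1 exp(-c(x, y) - a(y)) dy, with midpoint quadrature.
    """
    x = (np.arange(m) + 0.5) / m
    kernel = np.exp(-c(x[:, None], x[None, :]))
    a = np.zeros(m)
    for _ in range(iters):
        target = np.log(kernel @ np.exp(-a) / m)
        # averaging old and new values keeps a single symmetric potential
        a = 0.5 * (a + target)
    return x, a, kernel


x, a, kernel = schrodinger_potential(lambda x, y: (x - y) ** 2)
rho = np.exp(-a[:, None]) * kernel * np.exp(-a[None, :])
gamma0 = -2.0 * a.mean()
print("max marginal error:", np.abs(rho.mean(axis=1) - 1.0).max())
print("Gamma_0 estimate:", gamma0)
```

The printed marginal error measures how well the discretized density satisfies the uniform marginal constraint; the estimate of $\Gamma_{0}$ carries an additional quadrature error that this sketch does not control.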

Mukherjee [Muk16, Theorem 1.5] shows that the log-partition function has the following large deviation limit

\lim_{n\rightarrow\infty}\frac{1}{n}\log L_{n}=-\Gamma_{0}.

To compare our notation with that of [Muk16], note that $\theta=1$ , their $f=-c$ , $\mathcal{M}=\Pi(\mu,\mu)$ , $D(\cdot||u)=\mathrm{Ent}(\cdot)$ , $Z_{n}(f,\theta)=\log(n!L_{n})$ and $Z_{n}(0)=\log n!$ .

Hence, it makes sense to consider $\lim_{n\rightarrow\infty}e^{n\Gamma_{0}}L_{n}$ . Towards that goal, define

(6)

\begin{split}D_{n}&:=\frac{1}{n!}\sum_{\sigma\in\mathbb{S}_{n}}\prod_{i=1}^{n}\rho(i/n,\sigma_{i}/n)\\ &=\frac{1}{n!}\sum_{\sigma\in\mathbb{S}_{n}}\exp\left(-\sum_{i=1}^{n}c(i/n,\sigma_{i}/n)-2\sum_{i=1}^{n}a(i/n)\right)=L_{n}\exp\left(-2\sum_{i=1}^{n}a(i/n)\right)\\ &\approx L_{n}\exp\left(-2n\int_{0}^{1}a(x)dx\right)=e^{n\Gamma_{0}}L_{n}.\end{split}

The $\approx$ in the middle can be quantified as the discretization error in the Riemann sum approximation. In fact, assuming that $a$ is twice continuously differentiable, we get

\begin{split}\int_{0}^{1}a(x)dx&-\frac{1}{n}\sum_{i=1}^{n}a(i/n)=\sum_{i=1}^{n}\int_{(i-1)/n}^{i/n}\left(a(x)-a(i/n)\right)dx\\ &=\sum_{i=1}^{n}a^{\prime}(i/n)\int_{(i-1)/n}^{i/n}(x-i/n)dx+O\left(\sum_{i=1}^{n}\int_{(i-1)/n}^{i/n}(x-i/n)^{2}dx\right)\\ &=-\frac{1}{2n^{2}}\sum_{i=1}^{n}a^{\prime}(i/n)+O\left(\frac{1}{n^{2}}\right).\end{split}

How can we guarantee that $a$ is twice continuously differentiable? This follows from the assumed twice continuous differentiability of $c$ and the integral equation (5). Hence,

\lim_{n\rightarrow\infty}n\left[\int_{0}^{1}a(x)dx-\frac{1}{n}\sum_{i=1}^{n}a(i/n)\right]=-\frac{1}{2}\int_{0}^{1}a^{\prime}(x)dx=\frac{a(0)-a(1)}{2}.

Now, due to our assumption $c(x,y)=c(1-x,1-y)$, we must have $a(0)=a(1)$. Indeed, since $\mu$ is invariant under the map $x\mapsto 1-x$, if $\rho(x,y)$ is the Schrödinger bridge, then so is $\rho(1-x,1-y)$. By the uniqueness of the solution of the strictly convex optimization problem (3), $\rho(x,y)=\rho(1-x,1-y)$, and hence $a(x)=a(1-x)$ for $x\in[0,1]$.

Hence, from (6),

(7)

\lim_{n\rightarrow\infty}e^{n\Gamma_{0}}L_{n}=\lim_{n\rightarrow\infty}D_{n}.

This is what we will evaluate below.
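The step leading to (7) relies on the scaled Riemann-sum error $n\left[\int_{0}^{1}a(x)dx-\frac{1}{n}\sum_{i=1}^{n}a(i/n)\right]$ vanishing when $a(0)=a(1)$. The sketch below checks this on a hypothetical stand-in, $a(x)=(x-1/2)^{2}$, chosen only because it is symmetric about $1/2$ and has the known integral $1/12$; it is not the actual potential of any cost. For this stand-in the scaled error equals $-1/(6n)$ exactly.

```python
def scaled_riemann_error(a, integral, n):
    """Return n * [ integral of a over [0, 1] - (1/n) * sum_{i=1}^n a(i/n) ]."""
    riemann = sum(a(i / n) for i in range(1, n + 1)) / n
    return n * (integral - riemann)


def symmetric_potential(x):
    """Hypothetical stand-in for a, symmetric about 1/2 so that a(0) = a(1)."""
    return (x - 0.5) ** 2


for n in (10, 100, 1000):
    print(n, scaled_riemann_error(symmetric_potential, 1.0 / 12.0, n))
```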

The first step is to consider the Markov integral operator corresponding to the probability density $\rho$ and derive its spectral decomposition. Consider the separable Hilbert space $\mathcal{H}=L^{2}[0,1]$ . Define the integral operator: for $u\in\mathcal{H}$ ,

Tu(x):=\int_{0}^{1}u(y)\rho(x,y)dy.

Clearly, for any other $v\in\mathcal{H}$ , by Fubini’s Theorem and the symmetry of $\rho$ ,

\begin{split}\int_{0}^{1}v(x)Tu(x)dx&=\int_{0}^{1}\int_{0}^{1}v(x)\rho(x,y)u(y)dydx=\int_{0}^{1}u(y)\left[\int_{0}^{1}v(x)\rho(y,x)dx\right]dy\\ &=\int_{0}^{1}u(y)Tv(y)dy.\end{split}

In particular, $T$ is a self-adjoint linear operator on the separable Hilbert space $\mathcal{H}$ . Since this is a Hilbert-Schmidt operator, it is also compact. Hence, it admits a spectral decomposition. In particular, there exists a countable sequence of eigenvalues $(\lambda_{n},\;n\in\mathbb{N})$ and their corresponding eigenfunctions (with multiplicities) such that

\rho(x,y)=1+\sum_{n=1}^{\infty}\lambda_{n}\phi_{n}(x)\phi_{n}(y).

All the eigenvalues are real since $T$ is self-adjoint. In fact, $I-T$ is a nonnegative operator, where $I$ is the identity operator. Thus all the eigenvalues of $T$ lie in the interval $[-1,1]$. The eigenfunction corresponding to the eigenvalue $1$ is the constant function $\phi_{0}(x)\equiv 1$, which accounts for the leading term $1$ in the expansion of $\rho$ above.

Assumption 1.

Assume that there is a positive spectral gap, i.e.,

(8)

\sigma^{2}=\left(1-\max_{i\geq 1}\lambda^{2}_{i}\right)\in(0,1).

Assumption 2.

Assume that each eigenfunction $\phi_{n}$ is Lipschitz continuous on $[0,1]$ .

Consider $T$ as a self-adjoint operator on $\mathcal{H}_{1}$ , the subspace of $\mathcal{H}$ that is orthogonal to the constant functions. Since $T^{2}$ is a trace class operator on the Hilbert space $\mathcal{H}_{1}$ , the Fredholm determinant of $I-T^{2}$ exists and is given by the absolutely convergent infinite product

\mathrm{det}_{\mathrm{F}}(I-T^{2})=\prod_{n=1}^{\infty}(1-\lambda^{2}_{n}).

See [vN22, Definition 14.35, Theorem 14.44].

Conjecture. Our main conjecture is that, under Assumptions 1 and 2, the following limit holds.

(9)

\lim_{n\rightarrow\infty}D_{n}=C=\frac{1}{\sqrt{\mathrm{det}_{\mathrm{F}}(I-T^{2})}}.

Hence, by (7), $\lim_{n\rightarrow\infty}e^{n\Gamma_{0}}L_{n}=\left(\mathrm{det}_{\mathrm{F}}(I-T^{2})\right)^{-1/2}$ which completes our aim outlined in (2).
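The conjectured constant can be approximated numerically by discretizing $T$. The sketch below recomputes the potential $a$ by an averaged fixed-point iteration of the marginal constraint, forms the matrix $\rho(x_{k},x_{j})/m$ on a midpoint grid (a Nyström-type discretization), discards the top eigenvalue (which belongs to the constant eigenfunction), and multiplies $1-\lambda^{2}$ over the rest. The grid size `m`, the iteration count `iters` and the name `conjectured_constant` are assumptions of this illustration; no claim is made about the size of the discretization error.

```python
import numpy as np


def conjectured_constant(c, m=400, iters=200):
    """Approximate 1 / sqrt(det_F(I - T^2)) on a midpoint grid of size m."""
    x = (np.arange(m) + 0.5) / m
    kernel = np.exp(-c(x[:, None], x[None, :]))
    a = np.zeros(m)
    for _ in range(iters):
        # averaged iteration of a(x) = log int exp(-c(x, y) - a(y)) dy
        a = 0.5 * (a + np.log(kernel @ np.exp(-a) / m))
    rho = np.exp(-a[:, None]) * kernel * np.exp(-a[None, :])
    # (T u)(x_k) is approximated by (1/m) * sum_j rho(x_k, x_j) u(x_j)
    eigvals = np.linalg.eigvalsh(rho / m)[::-1]  # sorted in descending order
    top, rest = eigvals[0], eigvals[1:]
    det = np.prod(1.0 - rest ** 2)
    return top, rest, 1.0 / np.sqrt(det)


top, rest, C = conjectured_constant(lambda x, y: (x - y) ** 2)
print("top eigenvalue:", top)
print("largest remaining eigenvalues:", rest[:3])
print("approximate constant C:", C)
```

For small $n$, the output can be compared with a brute-force evaluation of $D_{n}$, keeping in mind that the conjecture concerns the limit $n\rightarrow\infty$ only.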

Although we will give a partial proof towards this conjecture in the next section, let us provide some intuition why such a limit should be true. In [HLP20, Theorem 2], the present author and coauthors proved a similar but more complex limit. The relationship between this paper and that one may be explained in the following way. If we consider $L_{n}$ in (1) as a function of the empirical distribution $\hat{\mu}_{n}:=\frac{1}{n}\sum_{i=1}^{n}\delta_{i/n}$, in [HLP20] we consider a similar function of the empirical distribution $\tilde{\mu}_{n}:=\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}$, where $X_{1},X_{2},\ldots$ is a sequence of i.i.d. Uni$(0,1)$ random variables. In the latter case, the limit $\lim_{n\rightarrow\infty}D_{n}$ is random and belongs to the class of second order Gaussian chaos as shown in [HLP20, Theorem 2]. The reason we get the Gaussian chaos is due to the limiting Gaussian fluctuation between $\tilde{\mu}_{n}$ and Uni$(0,1)$ as established by standard empirical process theory. There is, of course, no limiting Gaussian fluctuation for the difference between $\hat{\mu}_{n}$ and Uni$(0,1)$. Hence one may assume that the limiting Gaussian random variables all have zero variance. If we plug this in [HLP20, Theorem 2] and simplify to our case at hand we get (9).

Although this connection has been pointed out in the introduction of [HLP20], that proof simply cannot cover this case due to the lack of randomness. The difference between the two set-ups may be explained by the following analogy. Whereas the proof in [HLP20] can generalize to sampling with replacement from the finite set $(i/n,\;i\in[n])$, our current set-up is about sampling without replacement. The combinatorics is much more involved, which is why we are unable to complete the proof of the conjecture.

2. A partial proof of the conjectured limit

Let $\tilde{\rho}$ be the kernel $\rho-1$ . Then $\tilde{\rho}(x,y)=\sum_{i=1}^{\infty}\lambda_{i}\phi_{i}(x)\phi_{i}(y)$ where the series converges in $L^{2}$ . In particular, due to the marginal constraints,

(10)

\int_{0}^{1}\tilde{\rho}(x,y)dy=0=\int_{0}^{1}\tilde{\rho}(z,w)dz,

for $x,w$ in $[0,1]$ .

For any choice of $(x_{1},\ldots,x_{n})$ and any $\sigma\in\mathbb{S}_{n}$ ,

\begin{split}\prod_{i=1}^{n}\rho(x_{i},x_{\sigma_{i}})&=\prod_{i=1}^{n}\left(1+\tilde{\rho}(x_{i},x_{\sigma_{i}})\right)=1+\sum_{A\subseteq[n],\;A\neq\emptyset}\prod_{i\in A}\tilde{\rho}(x_{i},x_{\sigma_{i}}).\end{split}

Hence,

\begin{split}D_{n}&=\frac{1}{n!}\sum_{\sigma\in\mathbb{S}_{n}}\left[1+\sum_{A\subseteq[n],\;A\neq\emptyset}\prod_{i\in A}\tilde{\rho}(i/n,\sigma_{i}/n)\right]\\ &=1+\frac{1}{n!}\sum_{\sigma\in\mathbb{S}_{n}}\sum_{A\subseteq[n],\;A\neq\emptyset}\prod_{i\in A}\tilde{\rho}(i/n,\sigma_{i}/n)\\ &=1+\frac{1}{n!}\sum_{\sigma\in\mathbb{S}_{n}}\sum_{r=1}^{n}\sum_{A:\left\lvert A\right\rvert=r}\prod_{i\in A}\tilde{\rho}(i/n,\sigma_{i}/n)\\ &=1+\frac{1}{n!}\sum_{r=1}^{n}\sum_{A:\left\lvert A\right\rvert=r}\sum_{\sigma\in\mathbb{S}_{n}}\prod_{i\in A}\tilde{\rho}(i/n,\sigma_{i}/n)\\ &=1+\sum_{r=1}^{n}\frac{(n-r)!}{n!}\sum_{1\leq i_{1}<i_{2}<\cdots<i_{r}\leq n}\sum_{1\leq j_{1}\neq j_{2}\neq\cdots\neq j_{r}\leq n}\prod_{t=1}^{r}\tilde{\rho}(i_{t}/n,j_{t}/n).\end{split}

Here, the condition $\{1\leq j_{1}\neq j_{2}\neq\cdots\neq j_{r}\leq n\}$ means all the indices are distinct and in $[n]$ .
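The expansion of $D_{n}$ above is a purely algebraic identity that holds for any $n\times n$ array in place of $\tilde{\rho}(i/n,j/n)$: for each $r$, a fixed choice of distinct images $j_{1},\ldots,j_{r}$ of the rows $i_{1}<\cdots<i_{r}$ is shared by exactly $(n-r)!$ permutations, which yields the factor $(n-r)!/n!$ for that $r$. The sketch below checks the identity by brute force on a small random matrix; the matrix `M` and the function names are illustrative stand-ins.

```python
import itertools
import math
import random


def average_product(M):
    """(1/n!) * sum over permutations sigma of prod_i (1 + M[i][sigma_i])."""
    n = len(M)
    total = sum(
        math.prod(1.0 + M[i][s] for i, s in enumerate(sigma))
        for sigma in itertools.permutations(range(n))
    )
    return total / math.factorial(n)


def expansion(M):
    """1 + sum_r (n-r)!/n! * sum_{i_1<...<i_r} sum_{distinct j} prod_t M[i_t][j_t]."""
    n = len(M)
    result = 1.0
    for r in range(1, n + 1):
        weight = math.factorial(n - r) / math.factorial(n)
        inner = sum(
            math.prod(M[i][j] for i, j in zip(rows, cols))
            for rows in itertools.combinations(range(n), r)   # i_1 < ... < i_r
            for cols in itertools.permutations(range(n), r)   # distinct j_1, ..., j_r
        )
        result += weight * inner
    return result


random.seed(0)
n = 5
M = [[random.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
print(average_product(M), expansion(M))
```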

Fix $K\in\mathbb{N}$ . For all $n\geq K$ , let

D_{n,K}:=1+\sum_{r=1}^{K}\frac{(n-r)!}{n!}\sum_{1\leq i_{1}<i_{2}<\cdots<i_{r}\leq n}\sum_{1\leq j_{1}\neq j_{2}\neq\cdots\neq j_{r}\leq n}\prod_{t=1}^{r}\tilde{\rho}(i_{t}/n,j_{t}/n).

For $L\in\mathbb{N}$ , define

\tilde{\rho}^{(L)}(x,y)=\sum_{l=1}^{L}\lambda_{l}\phi_{l}(x)\phi_{l}(y).

Finally, define

\begin{split}D_{n,K}^{(L)}&:=1+\sum_{r=1}^{K}\frac{(n-r)!}{n!}\sum_{1\leq i_{1}<i_{2}<\cdots<i_{r}\leq n}\sum_{1\leq j_{1}\neq j_{2}\neq\cdots\neq j_{r}\leq n}\prod_{t=1}^{r}\tilde{\rho}^{(L)}(i_{t}/n,j_{t}/n)\\ &=1+\sum_{r=1}^{K}\frac{(n-r)!}{n!}\sum_{1\leq i_{1}<i_{2}<\cdots<i_{r}\leq n}\sum_{1\leq j_{1}\neq j_{2}\neq\cdots\neq j_{r}\leq n}\prod_{t=1}^{r}\left[\sum_{l=1}^{L}\lambda_{l}\phi_{l}(i_{t}/n)\phi_{l}(j_{t}/n)\right]\\ &=1+\sum_{r=1}^{K}\sum_{1\leq l_{1},\ldots,l_{r}\leq L}\prod_{t=1}^{r}\lambda_{l_{t}}\times\\ &\left[\frac{(n-r)!}{r!n!}\sum_{1\leq i_{1}\neq\cdots\neq i_{r}\leq n}\sum_{1\leq j_{1}\neq\cdots\neq j_{r}\leq n}\prod_{t=1}^{r}\phi_{l_{t}}(i_{t}/n)\phi_{l_{t}}(j_{t}/n)\right].\end{split}

Recall that the sum over indices $i_{1}\neq\cdots\neq i_{r}$ means that all the indices are distinct.

Now fix a vector $(l_{1},l_{2},\ldots,l_{r})$ in $[L]^{r}$ . Let $a_{l}$ denote the frequency of appearance of $l\in[L]$ in this sequence. Then each $a_{l}\geq 0$ and assume $\sum_{l=1}^{L}a_{l}=r\leq K$ . Since, for fixed $r$ ,

\frac{(n-r)!}{n!}\approx\frac{1}{n^{r}},

consider the normalized inner sum

(11)

\begin{split}\frac{1}{n^{r}}&\sum_{1\leq i_{1}\neq\cdots\neq i_{r}\leq n}\sum_{1\leq j_{1}\neq\cdots\neq j_{r}\leq n}\prod_{t=1}^{r}\phi_{l_{t}}(i_{t}/n)\phi_{l_{t}}(j_{t}/n)\\ &=\left[\frac{1}{n^{r/2}}\sum_{1\leq i_{1}\neq\cdots\neq i_{r}\leq n}\prod_{t=1}^{r}\phi_{l_{t}}(i_{t}/n)\right]^{2}\end{split}

The claim is that, unless $a_{l}$ is even for every $l\in[L]$, the normalized sum inside the square converges to zero as $n\rightarrow\infty$.

To see this let us introduce a sequence of i.i.d. Uni $(0,1)$ random variables $(U_{1},U_{2},\ldots)$ . For $n\in\mathbb{N}$ , define

U_{i}^{(n)}:=\frac{1}{n}\lceil nU_{i}\rceil.

Then, for each $n$, $\left(U_{1}^{(n)},U_{2}^{(n)},\ldots\right)$ is a sequence of i.i.d. random variables, each uniformly distributed on the finite set $\{i/n,\;i\in[n]\}$, and obviously, $\left\lvert U_{i}^{(n)}-U_{i}\right\rvert\leq 1/n$.

Note

\frac{1}{n^{r}}\sum_{1\leq i_{1}\neq\cdots\neq i_{r}\leq n}\prod_{t=1}^{r}\phi_{l_{t}}(i_{t}/n)=\mathrm{E}\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}\left(U^{(n)}_{t}\right)\right)1\left\{U^{(n)}_{1}\neq U^{(n)}_{2}\neq\cdots\neq U^{(n)}_{r}\right\}\right].

Thus, to analyze the limit of (11), it suffices to analyze $n^{r/2}$ times the RHS.

Consider the following events $F^{(n)}_{ij}=\{U^{(n)}_{i}=U^{(n)}_{j}\}$ , for $i<j\in\mathbb{N}$ . Then

\cup_{1\leq i<j\leq r}F^{(n)}_{ij}=\{U^{(n)}_{1}\neq U^{(n)}_{2}\neq\cdots\neq U^{(n)}_{r}\}^{c}.

Thus

\begin{split}\mathrm{E}&\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right)1\{U^{(n)}_{1}\neq U^{(n)}_{2}\neq\cdots\neq U^{(n)}_{r}\}\right]\\ =&\mathrm{E}\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right)\right]-\mathrm{E}\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right);\cup_{1\leq i<j\leq r}F^{(n)}_{ij}\right].\end{split}

Here, for any event $A$ and any integrable random variable $Y$ in a probability space, $\mathrm{E}(Y;A):=\mathrm{E}(Y1_{A})$ .

Since every $\phi_{i}$ is an eigenfunction orthogonal to $1$ , they satisfy $\mathrm{E}(\phi_{i}(U_{1}))=0$ and $\mathrm{E}(\phi_{i}^{2}(U_{1}))=1$ . Due to the spatial discreteness this is not going to be exactly true for $U_{i}^{(n)}$ . However, by Assumption 2 on the Lipschitzness of eigenfunctions, it follows that

(12)

\begin{split}\mathrm{E}\left(\phi_{i}(U^{(n)}_{1})\right)&=O\left(\frac{1}{n}\right),\\ \mathrm{E}\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right)\right]&=\prod_{t=1}^{r}\mathrm{E}\left(\phi_{l_{t}}(U^{(n)}_{t})\right)=O\left(\frac{1}{n^{r}}\right),\end{split}

where the constant in $O(1/n)$ can be chosen uniformly for any finite collection of eigenfunctions. In the calculation below, every time we encounter an expression as in (12) we will ignore the $O(1/n)$ term and put zero instead. This is simply for the clarity of the combinatorial expressions. Because $L$ and $K$ are both finite, only finitely many eigenfunctions ever get used and the constant in $O(1/n)$ remains uniformly bounded. The primary reason why we failed to complete this proof is that we cannot suitably estimate this error when $L$ and $K$ are not bounded. Nevertheless, with this convenient abuse of notation,

\begin{split}n^{r/2}\mathrm{E}&\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right)1\{U^{(n)}_{1}\neq U^{(n)}_{2}\neq\cdots\neq U^{(n)}_{r}\}\right]\\ &=-n^{r/2}\mathrm{E}\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right);\cup_{1\leq i<j\leq r}F^{(n)}_{ij}\right],\end{split}

where, let us repeat again, we have ignored an $O(n^{-r/2})$ error.

On the other hand, by the inclusion-exclusion principle, and by utilizing exchangeability,

\begin{split}\mathrm{E}&\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right);\cup_{1\leq i<j\leq r}F^{(n)}_{ij}\right]\\ &=\sum_{k=1}^{r(r-1)/2}(-1)^{k-1}\sum\mathrm{E}\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right);\text{intersection of $k$ many $F^{(n)}_{ij}$s}\right],\end{split}

where the inner sum is over all choices of $k$ many $F^{(n)}_{ij}$ s. Fix $(i_{1},j_{1}),\ldots,(i_{k},j_{k})$ . Then

\cap_{m=1}^{k}F^{(n)}_{i_{m}j_{m}}=\{U^{(n)}_{i_{1}}=U^{(n)}_{j_{1}},\ldots,U^{(n)}_{i_{k}}=U^{(n)}_{j_{k}}\}.

We now make the following observations. The above constraint gives a partition of $[r]$ , where, for each block of the partition, the discrete uniform random variables corresponding to the indices in that block take the same value.

• If the partition contains a block that is a singleton, i.e. $\cup_{m=1}^{k}\{i_{m},j_{m}\}\neq[r]$, there will be some $U^{(n)}_{t}$ which has no constraint and is, therefore, independent of the other uniform random variables. Let the number of singletons be $r-r^{\prime}$, for some $r^{\prime}\in\{0\}\cup[r-1]$. Then, by (12), independence and exchangeability,

\begin{split}\mathrm{E}&\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right);\cap_{m=1}^{k}F^{(n)}_{i_{m}j_{m}}\right]\\ =&O\left(\frac{1}{n^{r-r^{\prime}}}\right)\mathrm{E}\left[\left(\prod_{t=1}^{r^{\prime}}\phi_{l_{t}}(U^{(n)}_{t})\right);\cap_{m=1}^{k}F^{(n)}_{i_{m}j_{m}}\right].\end{split}

The cases below will show that $n^{r^{\prime}/2}$ times the expectation on the RHS remains bounded, as $n\rightarrow\infty$ . Hence $n^{r/2}$ times the expectation on the LHS goes to zero as $n\rightarrow\infty$ whenever $r^{\prime}<r$ . Hence, asymptotically, the only non-zero terms come from partitions of $[r]$ that do not contain any singleton blocks.

• Now consider the case where every block in the partition is of size two. Such partitions are in correspondence with perfect matchings of the complete graph $K_{r}$. In particular, $r=2k$ must be even. For any such perfect matching, say $U^{(n)}_{i_{1}}=U^{(n)}_{j_{1}}$, $U^{(n)}_{i_{2}}=U^{(n)}_{j_{2}}$, etc.,

\mathrm{E}\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right);\cap_{m=1}^{k}F^{(n)}_{i_{m}j_{m}}\right]=\frac{1}{n^{r/2}}\prod_{m=1}^{r/2}\mathrm{E}\left(\phi_{l_{i_{m}}}(U^{(n)}_{i_{m}})\phi_{l_{j_{m}}}(U^{(n)}_{i_{m}})\right).

The above product is zero in all cases, except when $l_{i_{m}}=l_{j_{m}}$ , for all $m\in[r/2]$ , in which case the product is one. This is due to the orthonormal property of the eigenfunctions. Both claims hold up to a smaller discretization error as in (12). When multiplied by $n^{r/2}$ , each such expectation gives an $O(1)$ term, and there are only finitely many matchings of $K_{r}$ , $r\leq K$ .

• Finally, consider the contribution of a partition of $[r]$ that contains a block of size $3$ or more and no singletons. Such a partition has fewer than $r/2$ blocks, so $P\left(\cap_{m=1}^{k}F^{(n)}_{i_{m}j_{m}}\right)=o\left(\frac{1}{n^{r/2}}\right)$. Since the finitely many eigenfunctions $\phi_{l}$, $l\in[L]$, are uniformly bounded (being Lipschitz by Assumption 2),

\mathrm{E}\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right);\cap_{m=1}^{k}F^{(n)}_{i_{m}j_{m}}\right]=o\left(\frac{1}{n^{r/2}}\right).

Since $r\leq K$ and there are only finitely many partitions of $[r]$ ,

\sum\mathrm{E}\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right);\text{intersection of $k$ many $F^{(n)}_{ij}$s}\right]=o\left(\frac{1}{n^{r/2}}\right)

for all $r$ odd and all $k\neq r/2$ , when $r$ is even. The only remaining case is when $r$ is even and $k=r/2$ . This is the computation done in the second bulleted item in the itemized list above. The limiting contribution of each term in that case is either zero or one as shown. Hence the total contribution is the number of terms that contribute one.

To find this number, recall that $a_{i}\in\mathbb{N}\cup\{0\}$ denotes the number of times $i\in[L]$ appears in the sequence $(l_{1},\ldots,l_{r})$. If any $a_{i}$ is odd, there is no perfect matching of $K_{r}$ that pairs every index with another index carrying the same label, and then the sum is zero. If all $a_{i}$s are even, the only partitions of $[r]$ that have nonzero contributions are precisely those that belong to the direct product of the sets of perfect matchings of the complete graphs $K_{a_{i}}$, $i\in[L]$. Hence, the number of such terms is the product of the numbers of perfect matchings of the $K_{a_{i}}$s. Thus, in this only remaining case,

\sum\mathrm{E}\left[\left(\prod_{t=1}^{r}\phi_{l_{t}}(U^{(n)}_{t})\right);\text{intersection of $k$ many $F^{(n)}_{ij}$s}\right]=\frac{1}{n^{r/2}}\prod_{i=1}^{L}(a_{i}-1)!!,

where, by convention, $(0-1)!!=1$. Let $a_{i}:=2b_{i}$; the above may also be written as

\frac{1}{n^{r/2}}\prod_{i=1}^{L}(2b_{i}-1)!!.
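The matching count can be observed numerically. The sketch below evaluates the normalized distinct-index sum $n^{-r/2}\sum_{i_{1}\neq\cdots\neq i_{r}}\prod_{t}\phi_{l_{t}}(i_{t}/n)$ for large $n$, using the fact that this sum equals $\prod_{l}a_{l}!$ times the coefficient of $\prod_{l}s_{l}^{a_{l}}$ in $\prod_{i=1}^{n}\left(1+\sum_{l}s_{l}\phi_{l}(i/n)\right)$. The functions $\phi_{l}(x)=\sqrt{2}\cos(l\pi x)$ are hypothetical stand-ins (orthonormal, mean zero and Lipschitz), not the eigenfunctions of any particular $T$, and the function names are illustrative. According to the computation above, the normalized sum should approach $\pm\prod_{l}(a_{l}-1)!!$ when all $a_{l}$ are even and $0$ otherwise, the sign being $(-1)^{r/2}$ from inclusion-exclusion; the sign disappears after squaring in (11).

```python
import math
import numpy as np


def distinct_index_sum(phi_values, counts):
    """Sum of prod_t phi_{l_t}(x_{i_t}) over tuples of distinct indices.

    phi_values[l, i] is phi_l at the i-th grid point; function l is used
    counts[l] times, so the tuple length is r = sum(counts).
    """
    num_funcs, n = phi_values.shape
    # coef holds the coefficients of prod_i (1 + sum_l s_l * phi_l(x_i)),
    # truncated at degree counts[l] in the variable s_l
    coef = np.zeros([k + 1 for k in counts])
    coef[(0,) * num_funcs] = 1.0
    for i in range(n):
        updated = coef.copy()
        for l in range(num_funcs):
            if counts[l] == 0:
                continue
            src = [slice(None)] * num_funcs
            dst = [slice(None)] * num_funcs
            src[l] = slice(0, counts[l])
            dst[l] = slice(1, counts[l] + 1)
            updated[tuple(dst)] += phi_values[l, i] * coef[tuple(src)]
        coef = updated
    return coef[tuple(counts)] * math.prod(math.factorial(k) for k in counts)


def normalized_sum(n, counts):
    """n^{-r/2} times the distinct-index sum for the cosine stand-ins."""
    grid = np.arange(1, n + 1) / n
    phi_values = np.array(
        [np.sqrt(2.0) * np.cos((l + 1) * np.pi * grid) for l in range(len(counts))]
    )
    return distinct_index_sum(phi_values, counts) / n ** (sum(counts) / 2)


for counts in ([2], [4], [1, 1], [2, 2]):
    print(counts, [normalized_sum(n, counts) for n in (50, 500, 5000)])
```

The convergence is only of order $1/n$, which is consistent with the discretization errors that the text sets aside.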

Combining all these terms,

(13)

\begin{split}D_{n,K}^{(L)}=1+\sum_{r=1,\;\text{even}}^{K}\frac{1}{r!}\sum\prod_{i=1}^{L}\lambda_{i}^{2b_{i}}\left(\prod_{i=1}^{L}(2b_{i}-1)!!\right)^{2}+o(1),\;\;\text{as $n\rightarrow\infty$},\end{split}

where the inner sum is over all sequences $(l_{1},\ldots,l_{r})\in[L]^{r}$ and the nonnegative integers $(2b_{i},\;i\in[L])$ such that $2\sum_{i=1}^{L}b_{i}=r$ record the frequency of appearance of $i$ in the sequence.

We now recall a property of Hermite polynomials: the number of perfect matchings of $K_{n}$ is exactly $\left\lvert H_{n}(0)\right\rvert$, the absolute value at zero of the $n$th Hermite polynomial

H_{n}(x)=(-1)^{n}e^{x^{2}/2}\frac{d^{n}}{dx^{n}}e^{-x^{2}/2}.

See [HL72, eqn. (3.14)] for a more general identity involving moment polynomials. Also see [God81] for similar identities involving more general graphs.
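The relation between perfect matchings and Hermite polynomials is easy to check for small $n$. The sketch below counts perfect matchings of $K_{n}$ by recursion and compares the count with the double factorial $(n-1)!!$ and with the value at zero of the probabilists' Hermite polynomial, evaluated through `numpy.polynomial.hermite_e.hermeval`. The value at zero alternates in sign, so the comparison is made in absolute value; this does not affect (14), where only the square appears. The helper names are illustrative.

```python
import math
from numpy.polynomial.hermite_e import hermeval


def count_perfect_matchings(vertices):
    """Count perfect matchings of the complete graph on the given vertex list."""
    if not vertices:
        return 1
    if len(vertices) % 2:
        return 0
    rest = vertices[1:]
    # pair the first vertex with each remaining vertex in turn
    return sum(
        count_perfect_matchings(rest[:k] + rest[k + 1:]) for k in range(len(rest))
    )


def hermite_at_zero(n):
    """Value at 0 of the n-th probabilists' Hermite polynomial."""
    coeffs = [0] * n + [1]
    return float(hermeval(0.0, coeffs))


def double_factorial_below(n):
    """(n - 1)!! for even n, with the convention (0 - 1)!! = 1."""
    return math.prod(range(1, n, 2))


for n in range(0, 9, 2):
    print(n, count_perfect_matchings(list(range(n))),
          double_factorial_below(n), hermite_at_zero(n))
```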

Thus, one may also write (13) as

(14)

D_{n,K}^{(L)}=1+\sum_{r=1,\;\text{even}}^{K}\frac{1}{r!}\sum\left(\prod_{i=1}^{L}\lambda_{i}^{b_{i}}H_{2b_{i}}(0)\right)^{2}+o_{n}(1),\;\;\text{as $n\rightarrow\infty$},

where, as before, the inner sum is over all sequences $(l_{1},\ldots,l_{r})\in[L]^{r}$ and the nonnegative integers $(2b_{i},\;i\in[L])$ such that $2\sum_{i=1}^{L}b_{i}=r$ record the frequency of appearance of $i$ in the sequence.

Consider any permutation symmetric function $f:[L]^{r}\rightarrow\mathbb{R}$. For every choice of nonnegative integers $\mathbf{a}:=(a_{1},\ldots,a_{L})$ such that $\sum_{i=1}^{L}a_{i}=r$, let $\Gamma(a_{1},\ldots,a_{L})$ denote the subset of $[L]^{r}$ consisting of all sequences in which each $i\in[L]$ appears exactly $a_{i}$ times. Pick a representative element $\ell_{\mathbf{a}}\in\Gamma(a_{1},\ldots,a_{L})$. Since $f$ is constant on $\Gamma(a_{1},\ldots,a_{L})$, which has $r!/(a_{1}!\cdots a_{L}!)$ elements, it follows easily that

\frac{1}{r!}\sum_{(l_{1},\ldots,l_{r})\in[L]^{r}}f(l_{1},\ldots,l_{r})=\sum_{(a_{1},\ldots,a_{L})}\frac{1}{a_{1}!\ldots a_{L}!}f(\ell_{\mathbf{a}}).
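The regrouping identity above can be confirmed by enumeration for small $L$ and $r$. The sketch below compares both sides for a hypothetical permutation symmetric test function; the weights, the function `f` and the helper names are illustrative, and labels run over $0,\ldots,L-1$ in the code rather than over $[L]$.

```python
import itertools
import math


def compositions(total, parts):
    """Yield all tuples of `parts` nonnegative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def sum_over_sequences(f, L, r):
    """Left side: (1/r!) * sum of f over all label sequences of length r."""
    return sum(f(seq) for seq in itertools.product(range(L), repeat=r)) / math.factorial(r)


def sum_over_frequencies(f, L, r):
    """Right side: sum over frequency vectors a of f(representative) / prod a_i!."""
    total = 0.0
    for a in compositions(r, L):
        representative = tuple(label for label in range(L) for _ in range(a[label]))
        total += f(representative) / math.prod(math.factorial(k) for k in a)
    return total


weights = [0.7, -0.3, 0.5]


def f(seq):
    """Hypothetical symmetric test function: depends only on label frequencies."""
    return math.prod(weights[l] for l in seq) * len(set(seq))


print(sum_over_sequences(f, 3, 4), sum_over_frequencies(f, 3, 4))
```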

Thus, from (14),

(15)

\lim_{n\rightarrow\infty}D_{n,K}^{(L)}=D_{K}^{(L)}:=1+\sum_{r=1,\;\text{even}}^{K}\sum\frac{1}{(2b_{1})!\ldots(2b_{L})!}\left(\prod_{i=1}^{L}\lambda_{i}^{b_{i}}H_{2b_{i}}(0)\right)^{2},

where now the inner sum is over all choices of nonnegative integers $(b_{1},\ldots,b_{L})$ such that $2\sum_{i=1}^{L}b_{i}=r$ .

Note that, at least formally,

\begin{split}\lim_{K\rightarrow\infty}D_{K}^{(L)}&=D^{(L)}:=1+\sum\prod_{i=1}^{L}\frac{\lambda_{i}^{2b_{i}}}{(2b_{i})!}\left(H_{2b_{i}}(0)\right)^{2}.\end{split}

where now the sum is over all choices of nonnegative integers $(b_{1},\ldots,b_{L})$ that are not all zero. The fact that the expression on the RHS is finite (and therefore the limit exists) is a consequence of the multilinear Mehler formula for the Hermite polynomials. See [Foa81, eqn. (2.4)] for a choice of $n=L$, $S_{n}$ to be the $L\times L$ zero matrix, the diagonal matrix $D_{n}$ to have the diagonal vector $(\lambda^{2}_{1},\ldots,\lambda^{2}_{L})$ and the indeterminate vectors $y,z$ to both be the zero vector. For this choice, in their notation, the only symmetric matrices $N$ that will contribute nonzero terms must be diagonal with $\nu_{ii}=a_{i}$ equal to our $2b_{i}$. Hence, by [Foa81, eqn. (2.4)] the following limit exists and is given by the simple determinantal expression

D^{(L)}=\frac{1}{\prod_{i=1}^{L}\sqrt{1-\lambda^{2}_{i}}}.

Note that this is finite by our assumption on the positive spectral gap (8). Therefore,

\lim_{L\rightarrow\infty}D^{(L)}=\frac{1}{\prod_{i=1}^{\infty}\sqrt{1-\lambda^{2}_{i}}}=\frac{1}{\sqrt{\mathrm{det}_{\mathrm{F}}\left(I-T^{2}\right)}}.

The relation of the above with the limit conjectured in (9) is now obvious. What we have shown is that

\lim_{K,L\rightarrow\infty}\lim_{n\rightarrow\infty}D^{(L)}_{n,K}=\frac{1}{\sqrt{\mathrm{det}_{\mathrm{F}}\left(I-T^{2}\right)}},

while what we need to show is

\lim_{n\rightarrow\infty}\lim_{K,L\rightarrow\infty}D^{(L)}_{n,K}=\frac{1}{\sqrt{\mathrm{det}_{\mathrm{F}}\left(I-T^{2}\right)}}.

The interchange of limits requires a uniform error bound. In [HLP20] such an error bound has been proved for that set-up. But the argument does not extend to this case because the combinatorics is different. However, it is reasonable to guess that a more careful combinatorial analysis will provide the error bound needed to establish (9).
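As a sanity check on the product formula for $D^{(L)}$ used above, the contribution of a single eigenvalue can be summed by hand. The following is a sketch that bypasses [Foa81] and uses only $H_{2b}(0)^{2}=\left((2b-1)!!\right)^{2}$, the identity $(2b-1)!!=(2b)!/(2^{b}b!)$ and the binomial series; here $\lambda$ stands for any one of the $\lambda_{i}$ with $\left\lvert\lambda\right\rvert<1$.

```latex
\sum_{b=0}^{\infty}\frac{\lambda^{2b}}{(2b)!}\left((2b-1)!!\right)^{2}
  =\sum_{b=0}^{\infty}\frac{(2b)!}{4^{b}(b!)^{2}}\,\lambda^{2b}
  =\sum_{b=0}^{\infty}\binom{2b}{b}\left(\frac{\lambda^{2}}{4}\right)^{b}
  =\frac{1}{\sqrt{1-\lambda^{2}}}.
```

Since the summand defining $D^{(L)}$ factorizes over $i\in[L]$, and the term with all $b_{i}=0$ accounts for the leading $1$, multiplying these single-mode series over $i$ recovers $\prod_{i=1}^{L}(1-\lambda_{i}^{2})^{-1/2}$.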

3. Acknowledgment

I am very grateful to the editors for giving me an opportunity to contribute to the IJPAM volume celebrating the memory of Prof. K. R. Parthasarathy. May his work continue to inspire many further generations of Indian probabilists.

References

[Csi75] I. Csiszar, I-divergence geometry of probability distributions and minimization problems, Ann. Probab. 3 (1975), 146–158.
[Dia88] P. Diaconis, Group representations in probability and statistics, Institute of Mathematical Statistics Lecture Notes – Monograph Series, vol. 11, IMS, Hayward, CA, 1988.
[DR00] Persi Diaconis and Arun Ram, Analysis of systematic scan Metropolis algorithms using Iwahori-Hecke algebra techniques, Michigan Mathematical Journal 48 (2000), no. 1, 157–190.
[Foa81] Dominique Foata, Some Hermite polynomial identities and their combinatorics, Advances in Applied Mathematics 2 (1981), no. 3, 250–259.
[God81] C. D. Godsil, Hermite polynomials and a duality relationship for matching polynomials, Combinatorica 1 (1981), no. 3, 257–262.
[HL72] Ole J. Heilmann and Elliott H. Lieb, Theory of monomer-dimer systems, Commun. math. Phys. 25 (1972), 190–232.
[HLP20] Z. Harchaoui, L. Liu, and S. Pal, Asymptotics of discrete Schrödinger bridges via chaos decomposition, to appear in Bernoulli, preprint available at arxiv.org/abs/2011.08963, 2020.
[KKRW20] R. Kenyon, D. Král’, C. Radin, and P. Winkler, Permutations with fixed pattern densities, Random Structures and Algorithms 56 (2020).
[Léo12] Christian Léonard, From the Schrödinger problem to the Monge-Kantorovich problem, Journal of Functional Analysis 262 (2012), no. 4, 1879–1920.
[Mal57] C. L. Mallows, Non-null ranking models. I., Biometrika 44 (1957), 114–130.
[Muk16] Sumit Mukherjee, Estimation in exponential families on permutations, The Annals of Statistics 44 (2016), no. 2, 853–875.
[RT93] L. Rüschendorf and W. Thomsen, Note on the Schrödinger equation and I-projections, Statistics and Probability Letters 17 (1993), 369–375.
[Sta09] Shannon Starr, Thermodynamic limit for the Mallows model on $S_{n}$ , Journal of Mathematical Physics 50 (2009), no. 9, 095208.
[vN22] J. van Neerven, Functional analysis, Cambridge Studies in Advanced Mathematics, Cambridge University Press, 2022.