
Fluctuation results for general block spin Ising models

Holger Knöpfel Fachbereich Mathematik und Informatik, Universität Münster, Einsteinstraße 62, 48149 Münster, Germany Holger.Knoepfel@ruhr-uni-bochum.de , Matthias Löwe Fachbereich Mathematik und Informatik, Universität Münster, Einsteinstraße 62, 48149 Münster, Germany maloewe@math.uni-muenster.de , Kristina Schubert Fakultät für Mathematik, TU Dortmund, Vogelpothsweg 87, 44227 Dortmund, Germany kristina.schubert@tu-dortmund.de and Arthur Sinulis Fakultät für Mathematik, Universität Bielefeld, Postfach 100131, 33501 Bielefeld, Germany asinulis@math.uni-bielefeld.de

Abstract.

We study a block spin mean-field Ising model, i. e. a model of spins in which the vertices are divided into a finite number of blocks with each block having a fixed proportion of vertices, and where pair interactions are given according to their blocks. For the vector of block magnetizations we prove Large Deviation Principles and Central Limit Theorems under general assumptions for the block interaction matrix. Using the exchangeable pair approach of Stein’s method we establish a rate of convergence in the Central Limit Theorem for the block magnetization vector in the high temperature regime.

Key words and phrases:

block spin Ising models, central limit theorem, large deviation principle, phase transition, Stein’s method

1991 Mathematics Subject Classification:

Primary 60F05, 60F10, Secondary 82B20

M.L.’s research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy EXC 2044-390685587, Mathematics Münster: Dynamics - Geometry - Structure

A.S. acknowledges financial support by the German Research Foundation via the CRC 1283.

1. Introduction

Mean-field block models were introduced as an approximation of a lattice model of a meta-magnet, see e.g. formula (4.1) in [24]. Furthermore, they can arise in disordered systems with random pair interactions, studied for example in [32], [31], [9]. Later, they were rediscovered as interesting models for statistical mechanics systems, see [20], [17], [8], [27], [25], as well as models for social interactions between several groups, e.g. in [19], [1], [29]. This latter approach closely follows the social re-interpretation of the Curie-Weiss model for one group in [6] or of the Hopfield model in [10] or [26]. A third source of interest in mean-field spin block models comes from statistics. In [3], the authors gave another analysis of the bipartite mean-field Ising block model with equal block sizes and asked whether one can recover the blocks from several observations of this model and, if so, how many observations are needed. In this respect, block spin models are related to the stochastic block models from random graph theory. These have been at the center of interest in statistics and probability theory over the past couple of years (see, e.g. [2], [21]). The statistical interest in them arises from their relation to graphical models. In this framework a major question is always how to reconstruct the block structure under sparsity assumptions (see e.g. [5], [28], [4]).

Our starting point is [27]. There, the fluctuations of an order parameter for a two-group block model with equal block sizes were analyzed on the level of large deviation principles (LDPs, for short) and central limit theorems (CLTs). Starting from these results, several natural questions arise. First: Can these results also be proven for systems with not necessarily identical block sizes? Second: Can they be generalized to more than two groups? And third: Can we give a speed of convergence for the CLT? The main goal of the current note is to (partially) answer these questions. To this end, we present a new approach to mean-field block spin models via the corresponding block interaction matrix. Moreover, to obtain a speed of convergence in the CLT, we employ Stein’s method as in [14], [7] for the standard mean-field Ising, or Curie–Weiss, model.

The rest of this note is organized as follows. In the remaining part of this introduction, we define our model in a way that makes it accessible to our techniques in Sections 2 and 3, and state our main results. Section 2 is devoted to the proof of the LDP results. Afterwards, we analyze the critical points of the rate function and obtain the mean-field equations, showing that in the high temperature case the only maximum is $0$, whereas in the low temperature case there are nonzero maximizers; moreover, we obtain an explicit solution for a special class of block interaction matrices. In Section 3 we prove the CLT for the order parameter of the model in two ways. The first uses the classical Hubbard–Stratonovich transformation, which was already used to prove the CLT for the magnetization in the Curie–Weiss model in [16] and is also the core technique for the CLT in [27]. The second proof uses a multivariate version of the exchangeable pair approach in Stein’s method, developed in [30]. Lastly, Section 4 contains a discussion of some of the results and further open questions.

1.1. The model

The block spin Ising model will be characterized by two quantities: a number $k\in\operatorname{\mathbb{N}}$ – the number of blocks – and a symmetric, positive definite matrix $A\in\operatorname{\mathbb{R}}^{k\times k}$, the block interaction matrix. The entry $A_{ij}$ determines the strength of the interaction between a particle in block $i$ and a particle in block $j$. Here, $\operatorname{\mathbb{R}}^{r_{1}\times r_{2}}$ is the set of all $r_{1}$ by $r_{2}$ matrices with real entries.

Let $N(n)$ be a strictly increasing subsequence of $\operatorname{\mathbb{N}}$. For a system of size $N=N(n)$ let $B_{1}^{(n)},\ldots,B_{k}^{(n)}\subset\{1,\ldots,N\}$ be a partition of $\{1,\ldots,N\}$ into $k$ blocks. Without loss of generality, we assume that the indices in the blocks are ordered, i.e. if $i_{0}\in B_{i}^{(n)}$, $j_{0}\in B_{j}^{(n)}$ and $i<j$, then $i_{0}<j_{0}$. We call $|B_{i}^{(n)}|$ the block size of the $i$-th block. In particular, for $n\in\mathbb{N}$ the system has size

N=N(n)=\sum_{i=1}^{k}\lvert B_{i}^{(n)}\rvert.

Define for each $n\in\operatorname{\mathbb{N}}$ the matrix of the relative block sizes

\Gamma_{n}\coloneqq\operatorname{diag}\left(\frac{\sqrt{\lvert B_{1}^{(n)}\rvert}}{\sqrt{N}},\ldots,\frac{\sqrt{\lvert B_{k}^{(n)}\rvert}}{\sqrt{N}}\right)\in\mathbb{R}^{k\times k}.

We assume that for each $i=1,\ldots,k$ the limit

\gamma_{i}\coloneqq\lim_{n\to\infty}\sqrt{\frac{\lvert B_{i}^{(n)}\rvert}{N}}\in(0,1)

exists, so that the matrix of asymptotic relative block sizes

\Gamma_{\infty}\coloneqq\operatorname{diag}(\gamma_{1},\ldots,\gamma_{k})\in\operatorname{\mathbb{R}}^{k\times k}

is invertible. If the $k$ partition blocks are asymptotically of the same size, i.e.

\Gamma_{\infty}=\frac{1}{\sqrt{k}}\operatorname{Id}\qquad\mbox{ resp.}\quad\gamma_{i}=\frac{1}{\sqrt{k}}\mbox{ for }i=1,\ldots,k,

we call this the uniform case. The block spin Ising model with $k$ blocks of sizes $\lvert B^{(n)}_{1}\rvert,\ldots,\lvert B^{(n)}_{k}\rvert$ and block interaction matrix $A$ is defined as the Ising model with interaction matrix

J_{n}\coloneqq\frac{1}{N}\begin{pmatrix}A_{11}O(\lvert B_{1}^{(n)}\rvert,\lvert B_{1}^{(n)}\rvert)&\cdots&A_{1k}O(\lvert B_{1}^{(n)}\rvert,\lvert B_{k}^{(n)}\rvert)\\ \vdots&\ddots&\vdots\\ A_{k1}O(\lvert B_{k}^{(n)}\rvert,\lvert B_{1}^{(n)}\rvert)&\cdots&A_{kk}O(\lvert B_{k}^{(n)}\rvert,\lvert B_{k}^{(n)}\rvert)\end{pmatrix},

where $O(m,n)\in\operatorname{\mathbb{R}}^{m\times n}$ is the matrix with all entries equal to $1$ . We denote this model by $\mu_{J_{n}}$ . More precisely, $\mu_{J_{n}}$ is the probability measure on $\{-1,+1\}^{N},N=N(n),$ defined by

\mu_{J_{n}}(x)=Z_{n}^{-1}\exp\left(H_{n}(x)\right)=Z_{n}^{-1}\exp\left(\frac{1}{2}\langle x,J_{n}x\rangle\right)=Z_{n}^{-1}\exp\left(\frac{1}{2}\sum_{i,j=1}^{N}(J_{n})_{ij}x_{i}x_{j}\right).

Here, of course, $Z_{n}$ is the partition function

Z_{n}:=\sum_{x\in\{-1,+1\}^{N}}\exp\left(\frac{1}{2}\sum_{i,j=1}^{N}(J_{n})_{ij}x_{i}x_{j}\right).

Note that, for technical convenience and contrary to the usual convention, we do not require the diagonal of $J_{n}$ to be zero. However, since $x_{i}^{2}=1$, the diagonal only contributes the constant $\frac{1}{2}\sum_{i=1}^{N}(J_{n})_{ii}$ to the exponent, which cancels against the partition function; hence $J_{n}$ and its “dediagonalized” version $\widetilde{J}_{n}=J_{n}-\operatorname{diag}((J_{n})_{11},\ldots,(J_{n})_{NN})$ give rise to the same Ising model. Here and in the sequel, $\operatorname{diag}(\lambda_{1},\ldots,\lambda_{l})$ is a diagonal $l\times l$ matrix with values $\lambda_{1},\ldots,\lambda_{l}$ on its diagonal. Lastly, for any $p,q\in[1,\infty]$ and any matrix $A\in\operatorname{\mathbb{R}}^{k\times k}$ we define the operator norm

\lVert A\rVert_{p\to q}\coloneqq\sup_{x\in\operatorname{\mathbb{R}}^{k}:\lVert x\rVert_{p}=1}\lVert Ax\rVert_{q}.

1.2. Main results

We prove results on the fluctuations of the block magnetization vector on different scales. In what follows, we use the non-normalized and normalized versions of the block magnetization vector defined as

	$\displaystyle m^{(n)}=m^{(n)}(x)=(m^{(n)}_{1}(x),\ldots,m^{(n)}_{k}(x))$	$\displaystyle=\left(\sum_{j\in B_{i}^{(n)}}x_{j}\right)_{i=1,\ldots,k},$
	$\displaystyle\widetilde{m}^{(n)}=\widetilde{m}^{(n)}(x)=(\widetilde{m}^{(n)}_{1}(x),\ldots,\widetilde{m}^{(n)}_{k}(x))$	$\displaystyle=\left(\frac{1}{\lvert B_{i}^{(n)}\rvert}\sum_{j\in B_{i}^{(n)}}x_{j}\right)_{i=1,\ldots,k},$
	$\displaystyle\widehat{m}^{(n)}=\widehat{m}^{(n)}(x)=(\widehat{m}^{(n)}_{1}(x),\ldots,\widehat{m}^{(n)}_{k}(x))$	$\displaystyle=\left(\frac{1}{\sqrt{\lvert B_{i}^{(n)}\rvert}}\sum_{j\in B_{i}^{(n)}}x_{j}\right)_{i=1,\ldots,k}.$

Note that this allows us to rewrite the Hamiltonian $H_{n}$ of $\mu_{J_{n}}$ as

H_{n}(x)=\frac{1}{2N}\left\langle m^{(n)},Am^{(n)}\right\rangle=\frac{1}{2}\left\langle\widehat{m}^{(n)},\Gamma_{n}A\Gamma_{n}\widehat{m}^{(n)}\right\rangle=\frac{N}{2}\left\langle\Gamma_{n}^{2}A\Gamma_{n}^{2}\widetilde{m}^{(n)},\widetilde{m}^{(n)}\right\rangle,

which we use tacitly.
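As a sanity check of the three expressions for $H_{n}$, the following Python sketch (using NumPy; the function name `block_ising_hamiltonian_check` is ours, not from the paper) builds $J_{n}$ from $A$ and the block sizes, draws a random spin configuration and evaluates $\frac{1}{2}\langle x,J_{n}x\rangle$ together with the three block-magnetization formulas. It assumes consecutive blocks, as in the ordering convention above.

```python
import numpy as np


def block_ising_hamiltonian_check(A, sizes, seed=0):
    """Evaluate H_n(x) in four ways at a random spin configuration x.

    A     : symmetric k x k block interaction matrix
    sizes : block sizes |B_1|, ..., |B_k| (blocks are consecutive index ranges)
    """
    A = np.asarray(A, dtype=float)
    sizes = np.asarray(sizes)
    N = sizes.sum()
    labels = np.repeat(np.arange(len(sizes)), sizes)  # block index of each site
    J = A[np.ix_(labels, labels)] / N                  # J_n = (1/N) A_{block(i), block(j)}
    x = np.random.default_rng(seed).choice([-1.0, 1.0], size=N)

    m = np.bincount(labels, weights=x)                 # m^(n): block sums
    m_tilde = m / sizes                                # block averages
    m_hat = m / np.sqrt(sizes)                         # CLT-scaled block sums
    G = np.diag(np.sqrt(sizes / N))                    # Gamma_n

    h_direct = 0.5 * x @ J @ x
    h1 = m @ A @ m / (2 * N)
    h2 = 0.5 * m_hat @ (G @ A @ G) @ m_hat
    h3 = 0.5 * N * m_tilde @ (G @ G @ A @ G @ G) @ m_tilde
    return h_direct, h1, h2, h3
```

All four returned numbers agree up to floating-point rounding; indexing $A$ by the block labels avoids writing out the blocks $A_{ij}O(\cdot,\cdot)$ explicitly.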

We begin by presenting the large deviation results. The first result is a generalization of [27, Theorem 2.1]. In that paper, an LDP for $\widetilde{m}^{(n)}$ was proved in the situation of $k=2$ blocks of equal size. Here we analyze the general case.

Theorem 1.1.

Let $k\in\operatorname{\mathbb{N}}$ and $A$ be a block interaction matrix. The sequence $(\widetilde{m}^{(n)})_{n\in\operatorname{\mathbb{N}}}$ satisfies an LDP under $(\mu_{J_{n}})_{n\in\operatorname{\mathbb{N}}}$ with speed $N$ and rate function

J(x)\coloneqq\sup_{y\in\operatorname{\mathbb{R}}^{k}}I(y)-I(x),

where

I(x)\coloneqq\frac{1}{2}\left\langle x,\Gamma_{\infty}^{2}A\Gamma_{\infty}^{2}x\right\rangle-\sum_{i=1}^{k}\gamma_{i}^{2}L^{*}(x_{i}),

and $L^{*}$ denotes the convex conjugate of $\log\cosh$ , i.e.

L^{*}(x)\coloneqq\frac{1}{2}(1+x)\log(1+x)+\frac{1}{2}(1-x)\log(1-x)\quad\quad x\in[-1,+1].

More precisely, in the language of large deviations, the sequence of push-forwards $(\mu_{J_{n}}\circ(\widetilde{m}^{(n)})^{-1})_{n\in\operatorname{\mathbb{N}}}$ satisfies an LDP with speed $N$ and rate function $J$.

In the special case of asymptotically uniform block sizes the function $I$ is related to the matrix $A$ in an even more straightforward way, since in this case

I(x)=\frac{1}{2k^{2}}\langle x,Ax\rangle-\frac{1}{k}\sum_{i=1}^{k}L^{*}(x_{i}).
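The closed form of $L^{*}$ can be checked against its definition as a Legendre transform, $L^{*}(x)=\sup_{t\in\mathbb{R}}(tx-\log\cosh t)$. The sketch below (NumPy; both function names are illustrative) compares the closed form, using the convention $0\log 0=0$ at $x=\pm 1$, with a brute-force supremum over a fine grid of $t$; the grid range and size are arbitrary choices.

```python
import numpy as np


def L_star(x):
    """Closed form of the convex conjugate of log cosh on [-1, 1], with 0 log 0 = 0."""
    x = np.asarray(x, dtype=float)
    # (1 +/- x) log(1 +/- x) is set to 0 where 1 +/- x = 0 (endpoints x = -1 or x = 1)
    a = np.where(1 + x > 0, (1 + x) * np.log(np.where(1 + x > 0, 1 + x, 1.0)), 0.0)
    b = np.where(1 - x > 0, (1 - x) * np.log(np.where(1 - x > 0, 1 - x, 1.0)), 0.0)
    return 0.5 * (a + b)


def L_star_numeric(x, t_max=30.0, n_grid=200001):
    """Brute-force Legendre transform sup_t (t x - log cosh t) over a grid of t."""
    t = np.linspace(-t_max, t_max, n_grid)
    # log cosh t computed stably as |t| + log(1 + e^{-2|t|}) - log 2
    log_cosh = np.abs(t) + np.log1p(np.exp(-2 * np.abs(t))) - np.log(2.0)
    return np.max(t * x - log_cosh)
```

In the interior the supremum is attained at $t=\operatorname{artanh}(x)$, while at $x=\pm 1$ it is only approached as $\lvert t\rvert\to\infty$, with value $\log 2$.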

We show that the rate function $J$ has a unique minimum at $0$ (equivalently, $I$ has a unique maximum at $0$) in the case $\lVert\Gamma_{\infty}A\Gamma_{\infty}\rVert_{2\to 2}\leq 1$, which yields the following corollary.

Corollary 1.2.

Under the general assumptions, if $\lVert\Gamma_{\infty}A\Gamma_{\infty}\rVert_{2\to 2}\leq 1$, the normalized vector of magnetizations $\widetilde{m}^{(n)}$ converges to $0$ exponentially fast in $\mu_{J_{n}}$-probability. More precisely, for each $\varepsilon>0$ there is a constant $I_{\varepsilon}>0$ such that for all sufficiently large $n$

\mu_{J_{n}}(\lVert\widetilde{m}^{(n)}\rVert\geq\varepsilon)\leq\exp(-NI_{\varepsilon}).

Let us discuss the large deviation results. In the classical Curie–Weiss model, i.e. the case $k=1$, there is a phase transition: the limiting behavior of $\widetilde{m}^{(n)}$ changes depending on whether $A_{11}\leq 1$ (the high temperature regime) or $A_{11}>1$ (the low temperature regime) (see [15] for an extensive treatment of this model). A corresponding phase transition can be observed in our model. This is stated in [18] for the bipartite model. In [25] the authors prove the existence of such a phase transition using the method of moments. Of course, with that method one cannot obtain an exponential speed of convergence as in Corollary 1.2. In accordance with the terminology of the classical Curie–Weiss model, we call these parameter regimes the high temperature and the low temperature regime, respectively. Here, the high temperature regime corresponds to $\lVert\Gamma_{\infty}A\Gamma_{\infty}\rVert_{2\to 2}\leq 1$ and the low temperature regime to $\lVert\Gamma_{\infty}A\Gamma_{\infty}\rVert_{2\to 2}>1$. In the special case of asymptotically uniform block sizes (i.e. $\Gamma_{\infty}=\frac{1}{\sqrt{k}}\operatorname{Id}$) these conditions reduce to $\lVert A\rVert_{2\to 2}\leq k$ and $\lVert A\rVert_{2\to 2}>k$, respectively.
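The regime classification only requires the spectral norm of $\Gamma_{\infty}A\Gamma_{\infty}$. A minimal NumPy sketch, with a hypothetical helper `temperature_regime` that takes the limits $\gamma_{i}$; it reports the boundary case $\lVert\Gamma_{\infty}A\Gamma_{\infty}\rVert_{2\to 2}=1$ separately (up to floating-point tolerance), since Corollary 1.2 covers it while Theorem 1.5 assumes a strict inequality.

```python
import numpy as np


def temperature_regime(A, gamma):
    """Classify (A, Gamma_infty) by the spectral norm of Gamma A Gamma.

    gamma : the limits gamma_i = lim sqrt(|B_i| / N); their squares sum to 1.
    """
    G = np.diag(np.asarray(gamma, dtype=float))
    C = G @ np.asarray(A, dtype=float) @ G
    norm = np.linalg.norm(C, ord=2)  # largest singular value = operator norm 2 -> 2
    if np.isclose(norm, 1.0):
        return norm, "critical"
    return norm, ("high" if norm < 1.0 else "low")
```

For the matrix of Figure 1 in the uniform case, $\Gamma_{\infty}A\Gamma_{\infty}=A/2$ has eigenvalues $0.85$ and $0.25$, so the norm is $0.85$ and the model is in the high temperature regime.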

Next, we consider the scaled block magnetization vector $\widehat{m}^{(n)}$ . Again, in the classical (i.e. one-dimensional) case it is known that the magnetization satisfies a central limit theorem with variance $\sigma^{2}=(1-A_{11})^{-1}$ whenever $A_{11}<1$ . The following theorem is a generalization of this phenomenon.

Theorem 1.3.

Let $k\in\operatorname{\mathbb{N}}$ and $A$ be a block interaction matrix. In the high temperature regime with strict inequality, i.e. if $\lVert\Gamma_{\infty}A\Gamma_{\infty}\rVert_{2\to 2}<1$, we have

\widehat{m}^{(n)}\Rightarrow\mathcal{N}(0,\Sigma_{\infty})=\mathcal{N}\left(0,\left(\operatorname{Id}-\Gamma_{\infty}A\Gamma_{\infty}\right)^{-1}\right).

Consequently, in the uniform case

\widehat{m}^{(n)}\Rightarrow\mathcal{N}\left(0,\left(\operatorname{Id}-\frac{1}{k}A\right)^{-1}\right).

Figure 1. A visualization of the block magnetization vector $\widehat{m}^{(n)}$ (left) for $n=500$, sampled with Glauber dynamics, and a heat map of the limiting normal distribution. Here, we choose $k=2$, $A=\begin{pmatrix}1.1&0.6\\ 0.6&1.1\end{pmatrix}$ and the uniform case.

Note that $\Sigma_{\infty}$ exists, and it can be expanded into a Neumann series. Moreover, if $\Gamma_{\infty}A\Gamma_{\infty}=V^{T}\Lambda V$ is an orthogonal decomposition, then $\Sigma_{\infty}=V^{T}\operatorname{diag}((1-\lambda_{i})^{-1})V$ . Again, a similar statement is derived in [25] using the method of moments.
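Both descriptions of $\Sigma_{\infty}$ are easy to compare numerically. A NumPy sketch (the helper name and the number of Neumann terms are illustrative assumptions) computing $(\operatorname{Id}-\Gamma_{\infty}A\Gamma_{\infty})^{-1}$ directly, via the spectral decomposition, and via a truncated Neumann series $\sum_{j\geq 0}(\Gamma_{\infty}A\Gamma_{\infty})^{j}$:

```python
import numpy as np


def limiting_covariance(A, gamma, n_terms=200):
    """Sigma_infty = (Id - Gamma A Gamma)^{-1}, computed three ways (high temperature only)."""
    G = np.diag(np.asarray(gamma, dtype=float))
    C = G @ np.asarray(A, dtype=float) @ G
    k = C.shape[0]
    direct = np.linalg.inv(np.eye(k) - C)
    # eigh returns C = U diag(lam) U^T, i.e. V = U^T in the notation of the text
    lam, U = np.linalg.eigh(C)
    spectral = U @ np.diag(1.0 / (1.0 - lam)) @ U.T
    # truncated Neumann series sum_{j >= 0} C^j, convergent since ||C||_{2->2} < 1
    neumann = np.zeros_like(C)
    power = np.eye(k)
    for _ in range(n_terms):
        neumann += power
        power = power @ C
    return direct, spectral, neumann
```

For the parameters of Figure 1 all three agree with $\Sigma_{\infty}=\begin{pmatrix}4&8/3\\ 8/3&4\end{pmatrix}$; the truncation error of the Neumann series is of order $0.85^{200}$.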

Furthermore, we can treat the critical case. In the Curie–Weiss model at $\beta=1$, the quantity $N^{-3/4}\sum_{i=1}^{N}\sigma_{i}$ converges weakly to a measure with Lebesgue-density $g_{1}(x)\coloneqq Z^{-1}\exp\left(-\frac{x^{4}}{12}\right)$ (see e.g. [15, Theorem V.9.5]). As proven in [27] and [18], a similar statement holds for the vector of magnetizations in the case of $k=2$ blocks. The next theorem generalizes this fact to the case $k\geq 2$. Moreover, it shows that statistics associated with the orthogonal decomposition of the block interaction matrix give rise to $k$ asymptotically independent random variables, each with either a Gaussian distribution or a distribution whose Lebesgue-density is a rescaled version of $g_{1}$.

In the multidimensional critical case $\lVert\Gamma_{\infty}A\Gamma_{\infty}\rVert_{2\to 2}=1$ we restrict to the uniform case in which $A$ has the simple largest eigenvalue $\lambda_{k}=k$, i.e. $A=V^{T}\operatorname{diag}(\lambda_{1},\ldots,\lambda_{k-1},k)V$. Let $\Gamma_{n}A\Gamma_{n}=V^{T}_{n}\Lambda_{n}V_{n}$ be the orthogonal decomposition, where $V_{n}$ is an orthogonal $k\times k$-matrix and $\Lambda_{n}$ a diagonal $k\times k$-matrix. If we define the normalized vector

w^{\prime}=w^{\prime}_{n}\coloneqq\operatorname{diag}(N^{-1/2},\ldots,N^{-1/2},N^{-3/4})Vm^{(n)}

and the matrix

\hat{C}_{N}\coloneqq\operatorname{diag}(\lambda_{1},\ldots,\lambda_{k-1},kN^{1/2}),

we have the following result.

Theorem 1.4.

Under the above assumptions let $Y_{n}\sim\mathcal{N}(0,\hat{C}_{N}^{-1})$ and $X_{n}\sim\mu_{J_{n}}$ be independent random variables, defined on a common probability space. Then $w^{\prime}_{n}(X_{n})+Y_{n}$ converges in distribution to a probability measure with density

(1.1)

\widetilde{g}_{k}(x)\coloneqq\widetilde{Z}^{-1}\exp\left(-\frac{1}{2}\sum_{i=1}^{k-1}\left(\lambda_{i}-\frac{\lambda_{i}^{2}}{k}\right)x_{i}^{2}-\frac{k^{3}}{12}x_{k}^{4}\sum_{i=1}^{k}V_{ki}^{4}\right)

for a suitable normalization $\widetilde{Z}$ that makes the expression (1.1) a probability density.

Thus, the vector $(w_{n}^{\prime}(X_{n})_{j})_{j=1,\ldots,k-1}$ converges to a normal distribution with covariance matrix $\Sigma=\operatorname{diag}\left((k-\lambda_{j})^{-1}\right)$ and the random variable $w_{n}^{\prime}(X_{n})_{k}$ converges to a distribution with Lebesgue-density $Z^{-1}\exp\left(-(\frac{k^{3}}{12}\sum_{i=1}^{k}V_{ki}^{4})x^{4}\right)dx$ .
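The step from the density (1.1) to the covariance $\Sigma$ removes the independent Gaussian smoothing $Y_{n}$. Coordinate by coordinate, and assuming for this sketch that second moments converge along with the distributions (a rigorous argument can instead divide characteristic functions, which is possible because a Gaussian characteristic function never vanishes), the bookkeeping for $j<k$ reads as follows; it uses only that the $j$-th diagonal entry of $\hat{C}_{N}$ is $\lambda_{j}>0$ and that $\lambda_{j}<k$.

```latex
% Precision of the j-th Gaussian factor of (1.1), j < k:
\lambda_{j}-\frac{\lambda_{j}^{2}}{k}=\frac{\lambda_{j}(k-\lambda_{j})}{k}
\quad\Longrightarrow\quad
\lim_{n\to\infty}\operatorname{Var}\big(w^{\prime}_{n}(X_{n})_{j}+(Y_{n})_{j}\big)=\frac{k}{\lambda_{j}(k-\lambda_{j})}.
% Independence of X_n and Y_n, together with Var((Y_n)_j) = 1/lambda_j, then gives
\lim_{n\to\infty}\operatorname{Var}\big(w^{\prime}_{n}(X_{n})_{j}\big)
  =\frac{k}{\lambda_{j}(k-\lambda_{j})}-\frac{1}{\lambda_{j}}
  =\frac{k-(k-\lambda_{j})}{\lambda_{j}(k-\lambda_{j})}
  =\frac{1}{k-\lambda_{j}}.
```

This formally matches Theorem 1.3 in these directions: with $\lvert B_{i}^{(n)}\rvert=N/k$ one has $N^{-1/2}Vm^{(n)}=k^{-1/2}V\widehat{m}^{(n)}$, and $k^{-1}(1-\lambda_{j}/k)^{-1}=(k-\lambda_{j})^{-1}$.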

We believe it is possible to extend Theorem 1.4 to the case where the eigenvalue $k$ has multiplicity greater than $1$ , by appropriately rescaling all the eigenvectors which belong to the eigenvalue $k$ .

Note that the parameter $\sigma^{2}\coloneqq\frac{k^{3}}{12}\sum_{i=1}^{k}V_{ki}^{4}$ is directly related to the variance of a random variable with that distribution; indeed, the substitution $x=\sigma^{-1/2}y$ shows that if $X$ has Lebesgue-density proportional to $\exp(-\sigma^{2}x^{4})$, then $\mathrm{Var}(X)=c\sigma^{-1}$ with the absolute constant $c=\int_{\mathbb{R}}y^{2}e^{-y^{4}}dy\big/\int_{\mathbb{R}}e^{-y^{4}}dy$. Moreover, $\sum_{i=1}^{k}V_{ki}^{4}=\lVert v_{k}\rVert_{4}^{4}$, where $v_{k}$ (the $k$-th row of $V$) is the eigenvector belonging to the eigenvalue $k$.
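The constant $c$ can be made explicit: from $\int_{0}^{\infty}y^{s-1}e^{-y^{4}}dy=\frac{1}{4}\Gamma(s/4)$, with $\Gamma$ here denoting Euler's Gamma function (not the matrix $\Gamma_{n}$), one gets $c=\Gamma(3/4)/\Gamma(1/4)\approx 0.338$. The NumPy sketch below checks $\mathrm{Var}(X)\,\sigma\approx c$ by quadrature; the integration window and grid size are our choices.

```python
import math

import numpy as np


def quartic_variance(sigma2, half_width=None, n_grid=400001):
    """Variance of X with Lebesgue density proportional to exp(-sigma2 * x^4), by quadrature."""
    if half_width is None:
        # natural scale is sigma2^{-1/4}; beyond 8 scales the density is below exp(-4096)
        half_width = 8.0 * sigma2 ** -0.25
    x = np.linspace(-half_width, half_width, n_grid)
    w = np.exp(-sigma2 * x**4)
    return np.sum(x**2 * w) / np.sum(w)  # the grid spacing cancels in the ratio


C_ABS = math.gamma(0.75) / math.gamma(0.25)  # the constant c
```

For instance, for $k=2$ and $v_{k}=(1,1)/\sqrt{2}$ one has $\lVert v_{k}\rVert_{4}^{4}=1/2$ and $\sigma^{2}=\frac{8}{12}\cdot\frac{1}{2}=\frac{1}{3}$.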

In a final step, we establish convergence rates in the CLT in the high temperature case for a special class of test functions. We use the exchangeable pair approach of Stein’s method, which was also used in [14] and [7] for the Curie–Weiss model. The proof of the next result relies on a multivariate version of Stein’s method proven in [30]. To this end, define the function class

\mathcal{F}_{3}\coloneqq\left\{h:\operatorname{\mathbb{R}}^{k}\to\operatorname{\mathbb{R}}:h\in\mathcal{C}^{3}(\operatorname{\mathbb{R}}^{k}),\ \sup_{x\in\operatorname{\mathbb{R}}^{k}}\lvert\partial^{\alpha}h(x)\rvert\leq 1\ \text{for all multi-indices }\alpha\in\mathbb{N}_{0}^{k}\text{ with }1\leq\lvert\alpha\rvert\leq 3\right\}

of all three times continuously differentiable functions whose partial derivatives of orders one to three are bounded by $1$.

Theorem 1.5.

Assume that $\lVert\Gamma_{\infty}A\Gamma_{\infty}\rVert_{2\to 2}<1$ and for each $n\in\operatorname{\mathbb{N}}$ let $\Sigma_{n}\coloneqq\operatorname{\mathbb{E}}\hat{m}^{(n)}(\hat{m}^{(n)})^{T}$ . For $Z\sim\mathcal{N}(0,\operatorname{Id})$ , we have

\sup_{h\in\mathcal{F}_{3}}\big{\lvert}\operatorname{\mathbb{E}}_{\mu_{J_{n}}}\big{(}h\big{(}\hat{m}^{(n)}\big{)}\big{)}-\operatorname{\mathbb{E}}h(\Sigma_{n}^{1/2}Z)\big{\rvert}=O(N^{-1/2}).
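Theorem 1.5 centres the comparison at the finite-$n$ covariance $\Sigma_{n}$ rather than at $\Sigma_{\infty}$. For $k=2$, $\Sigma_{n}$ can be computed exactly, without sampling, because the law of $(m^{(n)}_{1},m^{(n)}_{2})$ under $\mu_{J_{n}}$ is proportional to $\binom{\lvert B_{1}\rvert}{s_{1}}\binom{\lvert B_{2}\rvert}{s_{2}}\exp(\langle m,Am\rangle/(2N))$ with $m_{i}=2s_{i}-\lvert B_{i}\rvert$, where $s_{i}$ is the number of $+1$ spins in block $i$. A NumPy sketch (the function name is hypothetical):

```python
import math

import numpy as np


def exact_block_covariance(A, sizes):
    """Exact Sigma_n = E[m_hat m_hat^T] under mu_{J_n} for k = 2 blocks."""
    A = np.asarray(A, dtype=float)
    b1, b2 = sizes
    N = b1 + b2

    def log_binom(b):
        # log C(b, s) for s = 0, ..., b
        return np.array([math.lgamma(b + 1) - math.lgamma(s + 1) - math.lgamma(b - s + 1)
                         for s in range(b + 1)])

    m1 = 2.0 * np.arange(b1 + 1) - b1  # possible values of m_1
    m2 = 2.0 * np.arange(b2 + 1) - b2  # possible values of m_2
    M1, M2 = np.meshgrid(m1, m2, indexing="ij")
    energy = (A[0, 0] * M1**2 + 2 * A[0, 1] * M1 * M2 + A[1, 1] * M2**2) / (2.0 * N)
    log_w = log_binom(b1)[:, None] + log_binom(b2)[None, :] + energy
    w = np.exp(log_w - log_w.max())  # subtract the maximum to avoid overflow
    w /= w.sum()
    h1, h2 = M1 / math.sqrt(b1), M2 / math.sqrt(b2)  # m_hat; its mean is 0 by symmetry
    c12 = np.sum(w * h1 * h2)
    return np.array([[np.sum(w * h1**2), c12], [c12, np.sum(w * h2**2)]])
```

For the parameters of Figure 1 with two blocks of $1000$ spins, the result should be close to $\Sigma_{\infty}=\begin{pmatrix}4&8/3\\ 8/3&4\end{pmatrix}$ from Theorem 1.3, the remaining gap being a finite-$N$ effect.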

2. Proofs of the large deviation results and the mean-field equations

Let us start off by proving the LDP result for the rescaled block magnetization vector $\widetilde{m}^{(n)}$. Recall the notion of an LDP (for which we also refer to [13] and [12]): If $\mathcal{X}$ is a Polish space and $(a_{n})_{n\in\operatorname{\mathbb{N}}}$ is an increasing sequence of positive real numbers with $a_{n}\to\infty$, we say that a sequence of probability measures $(\nu_{n})_{n}$ on $\mathcal{X}$ satisfies a large deviation principle with speed $a_{n}$ and rate function $I:\mathcal{X}\to[0,\infty]$ (i.e. a lower semi-continuous function with compact level sets $\{x:I(x)\leq L\}$ for all $L\geq 0$), if for all Borel sets $B\in\mathcal{B}(\mathcal{X})$ we have

-\inf_{x\in\mathrm{int}(B)}I(x)\leq\liminf_{n\to\infty}\frac{\log\nu_{n}(B)}{a_{n}}\leq\limsup_{n\to\infty}\frac{\log\nu_{n}(B)}{a_{n}}\leq-\inf_{x\in\mathrm{cl}(B)}I(x),

where $\mathrm{int}(B)$ and $\mathrm{cl}(B)$ denote the topological interior and closure of a set $B$ , respectively.

We say that a sequence of random variables $X_{n}:\Omega\to\mathcal{X}$ satisfies an LDP with speed $a_{n}$ and rate function $I:\mathcal{X}\to[0,\infty]$ under a sequence of probability measures $\mu_{n}$ on $\Omega$ if the sequence of push-forwards $\nu_{n}\coloneqq\mu_{n}\circ X_{n}^{-1}$ satisfies an LDP with speed $a_{n}$ and rate function $I$.

To prove Theorem 1.1, we will need the following lemma.

Lemma 2.1.

Let $\mathcal{X}$ be a Polish space and assume that a sequence of probability measures $(\mu_{n})_{n\in\operatorname{\mathbb{N}}}$ on $\mathcal{X}$ satisfies an LDP with speed $n$ and rate function $I$. Let $F:\mathcal{X}\to\operatorname{\mathbb{R}}$ be a continuous function which is bounded from above and let $\eta_{n}:\mathcal{X}\to\operatorname{\mathbb{R}}$ be a sequence of functions such that $\lVert\eta_{n}\rVert_{L^{\infty}(\mu_{n})}\to 0$. Then the sequence of probability measures

d\widetilde{\mu}_{n}=\widetilde{Z}_{n}^{-1}\exp(nF+n\eta_{n})d\mu_{n},\qquad\widetilde{Z}_{n}\coloneqq\int_{\mathcal{X}}\exp(nF+n\eta_{n})d\mu_{n},

satisfies an LDP with speed $n$ and rate function

J(x)=\sup_{\lambda\in\mathcal{X}}\left(F(\lambda)-I(\lambda)\right)-(F(x)-I(x)).

Proof.

Note that this is a slight modification of the tilted LDP, which is an immediate consequence of Varadhan’s Lemma ([13, Theorem III.17]). Indeed, by the tilted LDP, the sequence of probability measures $(\nu_{n})_{n}$ whose $\mu_{n}$-density is proportional to $\exp(nF)$ satisfies an LDP with speed $n$ and rate function $J$. Since for any $n\in\operatorname{\mathbb{N}}$ and any $B\in\mathcal{B}(\mathcal{X})$ the inequalities

e^{-2n\lVert\eta_{n}\rVert_{L^{\infty}(\mu_{n})}}\nu_{n}(B)\leq\widetilde{\mu}_{n}(B)\leq e^{2n\lVert\eta_{n}\rVert_{L^{\infty}(\mu_{n})}}\nu_{n}(B)

hold, this easily implies an LDP for $(\widetilde{\mu}_{n})_{n}$ with speed $n$ and the same rate function $J$ due to $\lVert\eta_{n}\rVert_{L^{\infty}(\mu_{n})}\to 0$ . ∎

Proof of Theorem 1.1.

First, note that under the uniform measure $\mu_{0}$ (i.e. $A\equiv 0$ ) we have

\operatorname{\mathbb{E}}_{\mu_{0}}\exp\left(N\langle t,\widetilde{m}^{(n)}\rangle\right)=\prod_{i=1}^{k}\cosh\left(t_{i}\frac{N}{\lvert B_{i}^{(n)}\rvert}\right)^{\lvert B_{i}^{(n)}\rvert},

so that

\lim_{N\to\infty}\frac{1}{N}\log\operatorname{\mathbb{E}}_{\mu_{0}}\exp\left(N\langle t,\widetilde{m}^{(n)}\rangle\right)=\sum_{i=1}^{k}\gamma_{i}^{2}\log\cosh\left(\frac{t_{i}}{\gamma_{i}^{2}}\right).
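The product formula for the moment generating function under $\mu_{0}$ can be checked by brute force for a handful of spins. A Python/NumPy sketch (function names illustrative; the enumeration is exponential in $N$, so it is only meant for tiny systems) that sums over all $2^{N}$ configurations:

```python
import itertools

import numpy as np


def mgf_brute_force(t, sizes):
    """E_{mu_0} exp(N <t, m_tilde>) by enumerating all 2^N spin configurations."""
    sizes = np.asarray(sizes)
    N = int(sizes.sum())
    labels = np.repeat(np.arange(len(sizes)), sizes)  # block index of each site
    total = 0.0
    for x in itertools.product([-1.0, 1.0], repeat=N):
        m_tilde = np.bincount(labels, weights=x, minlength=len(sizes)) / sizes
        total += np.exp(N * np.dot(t, m_tilde))
    return total / 2**N  # mu_0 is uniform on {-1, +1}^N


def mgf_product_formula(t, sizes):
    """prod_i cosh(t_i N / |B_i|)^{|B_i|}, the closed form used in the proof."""
    sizes = np.asarray(sizes, dtype=float)
    N = sizes.sum()
    return float(np.prod(np.cosh(np.asarray(t) * N / sizes) ** sizes))
```

Taking $\frac{1}{N}\log$ of the product formula gives $\sum_{i}\frac{\lvert B_{i}^{(n)}\rvert}{N}\log\cosh\big(t_{i}\frac{N}{\lvert B_{i}^{(n)}\rvert}\big)$, whose limit is the expression above.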

By the Gärtner-Ellis Theorem ([12, Theorem 2.3.6]), $\widetilde{m}^{(n)}$ satisfies an LDP under $\mu_{0}$ with speed $N$ and rate function

J_{\mu_{0}}(x)\coloneqq\sup_{t\in\operatorname{\mathbb{R}}^{k}}\left(\langle t,x\rangle-\sum_{i=1}^{k}\gamma_{i}^{2}\log\cosh\left(\frac{t_{i}}{\gamma_{i}^{2}}\right)\right)=\sum_{i=1}^{k}\gamma_{i}^{2}L^{*}(x_{i}),

where $L^{*}(x)$ is the convex conjugate of $\log\cosh$ . Next, it is easy to see that we can rewrite the $\mu_{0}$ -density of $\mu_{J_{n}}$ as

\frac{d\mu_{J_{n}}}{d\mu_{0}}(x)=\frac{2^{N}}{Z_{n}}\exp\left(\frac{N}{2}\langle(\Gamma_{n}^{2}A\Gamma_{n}^{2})\widetilde{m}^{(n)},\widetilde{m}^{(n)}\rangle\right)=\frac{2^{N}}{Z_{n}}\exp\left(NF(\widetilde{m}^{(n)})-N\eta_{n}(\widetilde{m}^{(n)})\right),

where

	$\displaystyle F(x)$	$\displaystyle=\frac{1}{2}\langle\Gamma_{\infty}^{2}A\Gamma_{\infty}^{2}x,x\rangle=\frac{1}{2}\langle\Gamma_{\infty}^{2}A\Gamma_{\infty}^{2}x,x\rangle\wedge\frac{1}{2}k\lVert\Gamma_{\infty}^{2}A\Gamma_{\infty}^{2}\rVert_{2\to 2},$
	$\displaystyle\eta_{n}(x)$	$\displaystyle=\frac{1}{2}\langle\Gamma_{\infty}^{2}A\Gamma_{\infty}^{2}x,x\rangle-\frac{1}{2}\langle\Gamma_{n}^{2}A\Gamma_{n}^{2}x,x\rangle.$

Note that we artificially inserted the truncation in $F$ to emphasize the boundedness of $F(\widetilde{m}^{(n)})$ . This does not affect the quadratic form, as

\left|\frac{1}{2}\left\langle\Gamma_{\infty}^{2}A\Gamma_{\infty}^{2}\widetilde{m}^{(n)},\widetilde{m}^{(n)}\right\rangle\right|\leq\frac{1}{2}\lVert\Gamma_{\infty}^{2}A\Gamma_{\infty}^{2}\rVert_{2\to 2}\lVert\widetilde{m}^{(n)}\rVert_{2}^{2}\leq\frac{k}{2}\lVert\Gamma_{\infty}^{2}A\Gamma_{\infty}^{2}\rVert_{2\to 2}.

Moreover, $F$ is obviously continuous and $\eta_{n}$ satisfies

\displaystyle\lVert\eta_{n}\rVert_{\infty}\leq k\lVert\Gamma_{\infty}^{2}A\Gamma_{\infty}^{2}-\Gamma_{n}^{2}A\Gamma_{n}^{2}\rVert_{2\to 2}\to 0

on $[-1,1]^{k}$, which contains the support of $\mu_{0}\circ(\widetilde{m}^{(n)})^{-1}$, so that the assertion follows from Lemma 2.1. ∎

2.1. The mean-field equations

Theorem 1.1 states that the function

I(x)=\frac{1}{2}\left\langle x,\Gamma_{\infty}^{2}A\Gamma_{\infty}^{2}x\right\rangle-\sum_{i=1}^{k}\gamma_{i}^{2}L^{*}(x_{i})

determines the asymptotic behavior of the magnetization, and thus the critical points of $I$ are of central importance. They satisfy the so-called mean-field equations

(2.1)

\displaystyle\begin{split}x_{1}&=\tanh((A\Gamma_{\infty}^{2}x)_{1})=\tanh\bigg{(}\sum_{j=1}^{k}A_{1j}\gamma_{j}^{2}x_{j}\bigg{)}\\ \vdots&\quad\vdots\quad\vdots\\ x_{k}&=\tanh((A\Gamma_{\infty}^{2}x)_{k})=\tanh\bigg{(}\sum_{j=1}^{k}A_{kj}\gamma_{j}^{2}x_{j}\bigg{)}.\end{split}

For example, in the well-studied case $k=2$ , choosing

A=\begin{pmatrix}A_{11}&A_{12}\\ A_{12}&A_{22}\end{pmatrix}\quad\text{ and }\quad\Gamma_{\infty}^{2}=\begin{pmatrix}\gamma&0\\ 0&1-\gamma\end{pmatrix}

for a positive definite matrix $A$ and $\gamma\in(0,1)$ equations (2.1) reduce to

\displaystyle\begin{split}x_{1}&=\tanh(\gamma A_{11}x_{1}+(1-\gamma)A_{12}x_{2}),\\ x_{2}&=\tanh(\gamma A_{12}x_{1}+(1-\gamma)A_{22}x_{2}).\end{split}

Whereas for the two-dimensional fixed point problem the existence of a nonzero solution can be shown by monotonicity arguments, the existence of such a solution to (2.1) for general $k$ is more involved. First, we show that in the high temperature regime the only critical point of $I$ is $0$. This immediately yields Corollary 1.2.

Proof of Corollary 1.2.

In the sense of the formulation in Corollary 1.2, $\widetilde{m}^{(n)}$ concentrates exponentially fast at the minima of the rate function $J$. Under the condition $\lVert\Gamma_{\infty}A\Gamma_{\infty}\rVert_{2\to 2}\leq 1$, however, there is only one minimum, namely $0$. To see this, note that any local minimum satisfies

(2.2)

\displaystyle\nabla(J)(x)=-\Gamma_{\infty}^{2}A\Gamma_{\infty}^{2}x+\Gamma_{\infty}^{2}\operatorname{artanh}(x)=0.

Here, $\operatorname{artanh}(x)$ is understood componentwise. Clearly, $0$ is a solution, and due to

(2.3)

\operatorname{Hess}\left(J\right)(0)=-\Gamma_{\infty}^{2}A\Gamma_{\infty}^{2}+\Gamma_{\infty}^{2}=\Gamma_{\infty}\left(\operatorname{Id}-\Gamma_{\infty}A\Gamma_{\infty}\right)\Gamma_{\infty}\geq 0

this is a local minimum. Assume there is some $y\neq 0$ solving (2.2), and observe that

(2.4)

\displaystyle\begin{split}\lVert\Gamma_{\infty}^{2}y\rVert_{2}^{2}&=\left\langle\Gamma_{\infty}^{2}y,\Gamma_{\infty}^{2}\operatorname{artanh}(y)+\left(\operatorname{Id}-\Gamma_{\infty}^{2}A\right)\Gamma_{\infty}^{2}y\right\rangle\geq\langle\Gamma_{\infty}^{2}y,\Gamma_{\infty}^{2}\operatorname{artanh}(y)\rangle\\&=\sum_{i=1}^{k}\gamma_{i}^{4}\operatorname{artanh}(y_{i})y_{i}\geq\sum_{i=1}^{k}\gamma_{i}^{4}y_{i}^{2}=\lVert\Gamma_{\infty}^{2}y\rVert_{2}^{2}.\end{split}

Here the first inequality follows from the general fact that the spectra of the matrices $BC$ and $CB$ agree, applied to $B=\Gamma_{\infty}$ and $C=\Gamma_{\infty}A$. The last inequality follows from $\operatorname{artanh}(x)x\geq x^{2}$ for all $x\in(-1,1)$, with equality only for $x=0$. This means that for any solution $y$ we have equality throughout (2.4). However, equality can only hold if $y_{i}=0$ whenever $\gamma_{i}\neq 0$. Due to our assumption $\gamma_{i}\in(0,1)$, this proves the claim. ∎

In contrast, in the low temperature regime, there are other solutions to the mean-field equations (2.1). Let us start with the following proposition showing the connection of the $k$ -dimensional mean-field equations to the one-dimensional equations of the Curie–Weiss model. It provides an explicit formula for the solution of the $k$ -dimensional problem in terms of the solution of the Curie–Weiss equation.

Proposition 2.2.

Let $k\in\operatorname{\mathbb{N}}$, $\Gamma_{\infty}=\frac{1}{\sqrt{k}}\operatorname{Id}$ and let $A$ be a positive semidefinite, symmetric matrix with $\lVert A\rVert_{2\to 2}>k$. If the eigenvector $v_{k}$ belonging to the largest eigenvalue $\lambda_{k}$ can be rescaled to satisfy $v_{k}\in\{-1,0,1\}^{k}$, then there exists a solution $x\neq 0$ to the mean-field equations (2.1); one such solution is $x=m^{*}v_{k}$, where $m^{*}$ is the positive solution of the one-dimensional Curie–Weiss equation at inverse temperature $\beta=\lambda_{k}k^{-1}>1$.

Proof.

Let $m^{*}>0$ be the unique positive solution of the Curie–Weiss equation $\tanh(\frac{\lambda_{k}}{k}x)=x$ for $\beta\coloneqq\frac{\lambda_{k}}{k}>1$ and define $v\coloneqq m^{*}v_{k}$ . We have

\tanh\bigg{(}\frac{1}{k}Av\bigg{)}=\tanh\bigg{(}\frac{m^{*}}{k}Av_{k}\bigg{)}=\tanh\bigg{(}\frac{m^{*}\lambda_{k}}{k}v_{k}\bigg{)}=\tanh\bigg{(}\frac{m^{*}\lambda_{k}}{k}\bigg{)}v_{k}=v,

where in the second-to-last step we have used explicitly that $v_{k}\in\{-1,0,1\}^{k}$ , and so $v$ is a critical point of $I$ . Moreover, in this case it is easily seen that

\operatorname{Hess}\left(\frac{1}{2k^{2}}\langle x,Ax\rangle-\frac{1}{k}\sum_{i=1}^{k}L^{*}(x_{i})\right)(v)=\frac{1}{k^{2}}A-\frac{1}{k(1-(m^{*})^{2})}\operatorname{Id}

is negative definite. Indeed, from

g(x)\coloneqq\frac{\operatorname{artanh}(x)}{x}-\frac{1}{1-x^{2}}=\sum_{j=0}^{\infty}x^{2j}\left(\frac{1}{1+2j}-1\right)\leq 0

we obtain

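\displaystyle\begin{split}\left\langle y,\left(\frac{1}{k^{2}}\Lambda-\frac{1}{k(1-(m^{*})^{2})}\operatorname{Id}\right)y\right\rangle&=\frac{1}{k}\sum_{i=1}^{k}y_{i}^{2}\left(\frac{\lambda_{i}}{k}-\frac{1}{1-(m^{*})^{2}}\right)\\&\leq\frac{1}{k}\sum_{i=1}^{k}y_{i}^{2}\left(\frac{\lambda_{k}}{k}-\frac{1}{1-(m^{*})^{2}}\right)=\frac{g(m^{*})}{k}\sum_{i=1}^{k}y_{i}^{2}\\&\leq 0.\end{split}

Here the identity $\frac{\lambda_{k}}{k}-\frac{1}{1-(m^{*})^{2}}=g(m^{*})$ holds because $m^{*}=\tanh(\beta m^{*})$ gives $\frac{\lambda_{k}}{k}=\beta=\frac{\operatorname{artanh}(m^{*})}{m^{*}}$; moreover, $g(m^{*})<0$ since $m^{*}\neq 0$.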

∎

Example 2.3.

Even though the assumptions in the previous proposition seem to be tailor-made for its proof (and the conclusion also holds true more generally), there are interesting non-trivial examples of a matrix satisfying the conditions of Proposition 2.2. One of them is the family of $k\times k$ matrices ( $k\in\operatorname{\mathbb{N}}$ ) of the form

A(\alpha,\beta)=(\beta-\alpha)\operatorname{Id}+\alpha O(k,k)

for any parameters $(\alpha,\beta)$ satisfying

(2.5)

\displaystyle\beta+(k-1)\alpha>k\quad\text{and}\quad\beta>\alpha.

This corresponds to $k$ groups with an interaction parameter $\beta$ within the group and $\alpha$ between the groups. For example, the condition (2.5) is satisfied whenever $\beta>\alpha>1$ .
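For the family $A(\alpha,\beta)$ with $\alpha\geq 0$, the eigenvector of the largest eigenvalue $\beta+(k-1)\alpha$ is $v_{k}=(1,\ldots,1)$, so Proposition 2.2 applies. The NumPy sketch below (helper names hypothetical) computes $m^{*}$ by fixed-point iteration of $m\mapsto\tanh(bm)$ started at $m=1$, which decreases monotonically to the positive root when $b>1$; here $b=\lambda_{k}/k$ is the Curie–Weiss inverse temperature, not the parameter $\beta$ of $A(\alpha,\beta)$. It then checks that $m^{*}v_{k}$ solves the uniform-case mean-field equations $x=\tanh(Ax/k)$ and that the Hessian from the proof has only negative eigenvalues.

```python
import numpy as np


def curie_weiss_m_star(b, tol=1e-14, max_iter=100000):
    """Positive solution of m = tanh(b m) for b > 1, by fixed-point iteration from m = 1."""
    m = 1.0
    for _ in range(max_iter):
        m_new = np.tanh(b * m)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    raise RuntimeError("fixed-point iteration did not converge")


def example_matrix(k, alpha, beta):
    """A(alpha, beta) = (beta - alpha) Id + alpha O(k, k) from Example 2.3."""
    return (beta - alpha) * np.eye(k) + alpha * np.ones((k, k))


def check_proposition(k, alpha, beta):
    A = example_matrix(k, alpha, beta)
    lam_k = beta + (k - 1) * alpha        # eigenvalue for v_k = (1, ..., 1) when alpha >= 0
    m_star = curie_weiss_m_star(lam_k / k)
    x = m_star * np.ones(k)               # candidate solution m* v_k
    residual = np.max(np.abs(np.tanh(A @ x / k) - x))  # uniform case: A Gamma^2 = A / k
    hessian = A / k**2 - np.eye(k) / (k * (1 - m_star**2))
    return residual, np.linalg.eigvalsh(hessian).max()
```

Both test parameter choices below satisfy $\beta>\alpha>1$ and hence condition (2.5).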

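In the general case, the existence of a nonzero solution still holds, although without an explicit formula. The proof relies on the fact that the continuous function $I$ attains a global maximum on its compact domain $[-1,1]^{k}$, and the next lemma excludes maxima on the boundary. Hence there is always at least one solution $y\neq 0$ to (2.1): in the low temperature regime $\operatorname{Hess}(I)(0)=\Gamma_{\infty}(\Gamma_{\infty}A\Gamma_{\infty}-\operatorname{Id})\Gamma_{\infty}$ has a positive eigenvalue, so $0$ is a saddle point or a local minimum of $I$ and in particular not a maximum.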

Lemma 2.4.

Let $I$ be the large deviation rate function from Theorem 1.1, i.e.

I(x)\coloneqq\frac{1}{2}\left\langle x,\Gamma_{\infty}^{2}A\Gamma_{\infty}^{2}x\right\rangle-\sum_{i=1}^{k}\gamma_{i}^{2}L^{*}(x_{i}),\quad x\in[-1,1]^{k}

and $L^{*}$ denotes the convex conjugate of $\log\cosh$ .

(1)

$I$ has no global maxima on the boundary of $[-1,1]^{k}$ .
(2)

If $x\in[-1,1]^{k}$ satisfies the mean-field equations, we have

(2.6) $I(x)=-\frac{1}{2}\sum_{i=1}^{k}\gamma_{i}^{2}\left(x_{i}\operatorname{artanh}(x_{i})+\log(1-x_{i}^{2})\right).$
(3)

The set of all global maximisers has a positive distance from the boundary.

Proof.

$(1)$ : Assume that $x$ is a global maximum of $I$ on the boundary. Then there is at least one index $j\in\{1,\ldots,k\}$ such that $x_{j}=1$ (if $x_{j}=-1$ , switch to $-x$ since $I(-x)=I(x)$ ). Rewriting the fact that $x$ is a maximum of $I$ , we have for any $y_{j}\in[-1,1]$ and $C\coloneqq\Gamma_{\infty}^{2}A\Gamma_{\infty}^{2}$

\frac{1}{2}\left(\left\langle(\overline{x}_{j},1),C(\overline{x}_{j},1)\right\rangle-\left\langle(\overline{x}_{j},y_{j}),C(\overline{x}_{j},y_{j})\right\rangle\right)\geq\gamma_{j}^{2}(L^{*}(1)-L^{*}(y_{j})),

where $\overline{x}_{j}\in\mathbb{R}^{k-1}$ is the vector obtained from $x$ by deleting the $j$-th component. Dividing both sides by $1-y_{j}$ and letting $y_{j}\to 1$, the left-hand side stays bounded, since $x\mapsto\frac{1}{2}\langle x,Cx\rangle$ belongs to $C^{\infty}(\operatorname{\mathbb{R}}^{k})$, whereas the right-hand side tends to $\infty$ by l’Hospital’s rule, because $(L^{*})^{\prime}(y)=\operatorname{artanh}(y)\to\infty$ as $y\to 1$. This contradiction proves statement (1).

$(2)$ : Clearly, $x$ can only satisfy the mean-field equations if $x\in(-1,+1)^{k}$ . Since it solves the mean-field equations, for any $i=1,\ldots,k$ we have

\displaystyle\operatorname{artanh}(x_{i})=\sum_{j=1}^{k}A_{ij}\gamma_{j}^{2}x_{j}=(A\Gamma_{\infty}^{2}x)_{i}.

Inserting this into the function $I$ gives

\begin{aligned}I(x)&=\frac{1}{2}\sum_{i=1}^{k}\gamma_{i}^{2}x_{i}(A\Gamma_{\infty}^{2}x)_{i}-\sum_{i=1}^{k}\gamma_{i}^{2}L^{*}(x_{i})=\frac{1}{2}\sum_{i=1}^{k}\gamma_{i}^{2}\left(x_{i}\operatorname{artanh}(x_{i})-2L^{*}(x_{i})\right)\\ &=-\frac{1}{2}\sum_{i=1}^{k}\gamma_{i}^{2}\left(x_{i}\operatorname{artanh}(x_{i})+\log(1-x_{i}^{2})\right)\eqqcolon\frac{1}{2}\sum_{i=1}^{k}\gamma_{i}^{2}R(x_{i}),\end{aligned}

where we used $L^{*}(t)=t\operatorname{artanh}(t)+\frac{1}{2}\log(1-t^{2})$ for $t\in(-1,1)$.

$(3)$ : The function $I$ is bounded in $[-1,1]^{k}$ , as

\lvert I(x)\rvert\leq 2\log 2+\frac{k}{2}\lVert\Gamma_{\infty}^{2}A\Gamma_{\infty}^{2}\rVert_{2\to 2}.

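On the other hand, by (1) every global maximiser lies in $(-1,1)^{k}$ and is thus a critical point of $I$, i.e. it satisfies the mean-field equations, so that $I(x)=\frac{1}{2}\sum_{i=1}^{k}\gamma_{i}^{2}R(x_{i})$ by the computation in (2). If there were a sequence of maximisers approaching the boundary, i.e. with $\lvert x_{i}\rvert\to 1$ for at least one $i$, then $R(x_{i})\to\infty$, and since $R\geq 0$ this would contradict the boundedness of $I$. ∎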
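The closed form $L^{*}(t)=t\operatorname{artanh}(t)+\frac{1}{2}\log(1-t^{2})$ of the convex conjugate of $\log\cosh$, which turns $x_{i}\operatorname{artanh}(x_{i})-2L^{*}(x_{i})$ into $R(x_{i})$ in the proof of (2), can be sanity-checked numerically. The sketch below compares it with a brute-force supremum $\sup_{t}(tx-\log\cosh t)$ over a grid and checks that $R(x)=-x\operatorname{artanh}(x)-\log(1-x^{2})$ is positive and increasing on $(0,1)$, which is what part (3) uses. The grid parameters are arbitrary choices and the helper names are hypothetical.

```python
import math

def log_cosh(t):
    """Numerically stable log(cosh(t)) = |t| + log1p(exp(-2|t|)) - log 2."""
    a = abs(t)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def l_star_closed(x):
    """Closed form x artanh(x) + log(1 - x^2) / 2 of the conjugate of log cosh on (-1, 1)."""
    return x * math.atanh(x) + 0.5 * math.log(1.0 - x * x)

def l_star_grid(x, t_max=10.0, steps=40001):
    """Brute-force sup_t (t x - log cosh t) over a uniform grid on [-t_max, t_max]."""
    h = 2.0 * t_max / (steps - 1)
    return max(t * x - log_cosh(t) for t in (-t_max + j * h for j in range(steps)))

def rate_r(x):
    """R(x) = -x artanh(x) - log(1 - x^2), the function R from the proof of (2)."""
    return -x * math.atanh(x) - math.log(1.0 - x * x)

for x in (0.0, 0.3, 0.7, 0.95):
    print(x, l_star_closed(x), l_star_grid(x))

grid = [j / 100 for j in range(100)]
print(all(rate_r(u) < rate_r(v) for u, v in zip(grid, grid[1:])))  # R increasing on [0, 1)
```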

In the case of two blocks, i.e. $k=2$ , equal block sizes and the same interaction within a group, the set of maximisers of the rate function is explicitly known. Indeed, in [3, Proposition 4.1] and [27, Theorem 2.1] the authors show that for

A=\begin{pmatrix}\beta&\alpha\\ \alpha&\beta\end{pmatrix}

satisfying $\beta\geq\alpha\geq 0$ and $\beta+\alpha>2$ (the low temperature case) the distribution of $\widetilde{m}^{(n)}$ concentrates at the two points $x=(m^{+}((\beta+\alpha)/2),m^{+}((\beta+\alpha)/2))$ and $-x$. In the case $\alpha<0$ (with $\beta+\lvert\alpha\rvert>2$) the limit points for $\widetilde{m}^{(n)}$ become $x=(m^{+}((\beta+\lvert\alpha\rvert)/2),-m^{+}((\beta+\lvert\alpha\rvert)/2))$ and $-x$. Here $m^{+}(b)$ is the largest solution to

m=\tanh(bm).

If $\beta+\lvert\alpha\rvert\leq 2$, the distribution of $\widetilde{m}^{(n)}$ concentrates at the origin. For $k=2$, we can extend this result to arbitrary block sizes.
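The limit points are explicit once $m^{+}(b)$ is known, and $m^{+}(b)$ is easy to compute numerically. A minimal sketch, using only the definition of $m^{+}(b)$: for $b\leq 1$ the only solution of $m=\tanh(bm)$ is $m=0$, while for $b>1$ the map $m\mapsto\tanh(bm)-m$ is positive on $(0,m^{+}(b))$ and negative on $(m^{+}(b),1]$, so bisection on $(0,1]$ locates $m^{+}(b)$. The names `m_plus` and `limit_points` are hypothetical, and the second helper only covers the case $\beta\geq\alpha\geq 0$ with equal block sizes.

```python
import math

def m_plus(b, tol=1e-13):
    """Largest solution of m = tanh(b m) (hypothetical helper name)."""
    if b <= 1.0:
        return 0.0  # m = 0 is the only solution
    lo, hi = 0.0, 1.0  # for b > 1 the positive root lies in (lo, hi]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.tanh(b * mid) > mid:
            lo = mid  # tanh(b m) > m exactly for 0 < m < m_plus(b)
        else:
            hi = mid
    return 0.5 * (lo + hi)

def limit_points(alpha, beta):
    """Limit points for k = 2, equal block sizes and beta >= alpha >= 0."""
    m = m_plus((beta + alpha) / 2.0)
    return [(m, m), (-m, -m)]

print(limit_points(0.5, 2.0))
```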

Proposition 2.5.

Let $k=2$, let $A=\begin{pmatrix}\beta&\alpha\\ \alpha&\beta\end{pmatrix}$ be a block interaction matrix and let $\gamma_{1}^{2}=\gamma$, $\gamma_{2}^{2}=1-\gamma$ for some $0<\gamma<\frac{1}{2}$. In the low temperature case, if the groups are not interacting (i.e. $\alpha=0$), there exist either two or four global maxima of $I$; for $\alpha\neq 0$, there are always two global maxima of $I$.

Note that we have to restrict to $\lvert\alpha\rvert<\beta$ and $\beta>0$ in order for $A$ to be positive definite. Moreover, the high temperature phase is characterized by $\Gamma_{\infty}A\Gamma_{\infty}\prec\mathrm{Id}$ (where $\prec$ denotes the strict Loewner order), i.e. by positive definiteness of $\mathrm{Id}-\Gamma_{\infty}A\Gamma_{\infty}$, which by Sylvester’s criterion reduces to $\langle(\mathrm{Id}-\Gamma_{\infty}A\Gamma_{\infty})e_{1},e_{1}\rangle>0$ and $\det(\mathrm{Id}-\Gamma_{\infty}A\Gamma_{\infty})>0$. Thus we are in the high temperature regime if and only if

\beta\gamma<1\quad\text{and}\quad(\beta^{2}-\alpha^{2})\gamma(1-\gamma)>\beta-1.
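This reduction can be cross-checked numerically. The following sketch builds $\Gamma_{\infty}A\Gamma_{\infty}$ for $\gamma_{1}^{2}=\gamma$, $\gamma_{2}^{2}=1-\gamma$, computes its largest eigenvalue in closed form (it is a symmetric $2\times 2$ matrix), and compares the condition “largest eigenvalue $<1$” with the two inequalities above on an arbitrary parameter grid with $\lvert\alpha\rvert<\beta$. An empty list of mismatches is expected; the function names are hypothetical.

```python
import math

def high_temp_by_eigenvalue(alpha, beta, gamma):
    # Gamma A Gamma for Gamma = diag(sqrt(gamma), sqrt(1 - gamma)).
    a11 = beta * gamma
    a22 = beta * (1.0 - gamma)
    a12 = alpha * math.sqrt(gamma * (1.0 - gamma))
    # Largest eigenvalue of the symmetric matrix [[a11, a12], [a12, a22]].
    lam_max = 0.5 * (a11 + a22) + math.sqrt(0.25 * (a11 - a22) ** 2 + a12 ** 2)
    return lam_max < 1.0

def high_temp_by_criterion(alpha, beta, gamma):
    # beta * gamma < 1 and (beta^2 - alpha^2) gamma (1 - gamma) > beta - 1.
    return beta * gamma < 1.0 and (beta ** 2 - alpha ** 2) * gamma * (1.0 - gamma) > beta - 1.0

mismatches = [
    (alpha, beta, gamma)
    for beta in (0.5, 1.0, 1.5, 2.0, 3.0)
    for alpha in (-0.4 * beta, 0.0, 0.3 * beta, 0.9 * beta)
    for gamma in (0.1, 0.25, 0.4)
    if high_temp_by_eigenvalue(alpha, beta, gamma) != high_temp_by_criterion(alpha, beta, gamma)
]
print(mismatches)
```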

Proof.

The case $\alpha=0$ is an easy consequence of the statements for the one-dimensional Curie–Weiss model, since $I(x_{1},x_{2})=I_{1}(x_{1})+I_{2}(x_{2})$ and $\Gamma_{\infty}^{2}A\Gamma_{\infty}^{2}=\mathrm{diag}(\beta\gamma^{2},\beta(1-\gamma)^{2})$.

We only treat the case $\alpha>0$; the case $\alpha<0$ follows immediately from the equality $I_{\alpha,\beta}(x,y)=I_{-\alpha,\beta}(x,-y)$ (with the appropriate modifications, e.g. the maximum will be in the second quadrant instead of the first).

Since $I(0)=0$, the maximum of the rate function is non-negative; let us call this maximum $\eta$. Every global maximum lies in the interior of $[-1,1]^{2}$ by Lemma 2.4, so it is a critical point of $I$ and satisfies the mean-field equations; consequently, the value of $I$ at any maximum is given by equation (2.6). If $\eta=0$, then, since $R(t)>0$ for $t\neq 0$, every maximiser would be $(x,y)=0$, which contradicts the low temperature assumption (recall the Hessian of $I$ at $0$ given in equation (2.3)); hence $\eta>0$. As a consequence, all global maxima lie on the contour line $C_{\eta}\coloneqq\{x\in[-1,1]^{2}:\gamma R(x_{1})+(1-\gamma)R(x_{2})=2\eta\}$, where $R(x)=-x\operatorname{artanh}(x)-\log(1-x^{2})$ was defined in the previous lemma.

Firstly, let us show that in the first quadrant there can only be one such point. Due to symmetry, the global maximum will also be present in the third quadrant. For $x_{1}>0$ the points on the contour line $C_{\eta}$ can be described by a function $x_{2}=g(x_{1})$ , and due to the monotonicity of $R$ the function $g$ is non-increasing. Moreover, the solutions of the mean-field equations can be described by the functions

\displaystyle\begin{split}f_{1}(x)&:=\frac{1}{(1-\gamma)\alpha}\left(\operatorname{artanh}(x)-\gamma\beta x\right),\\ f_{2}(x)&:=\frac{1}{\gamma\alpha}\left(\operatorname{artanh}(x)-(1-\gamma)\beta x\right)\end{split}

via

x_{2}=f_{1}(x_{1})\quad\text{and}\quad x_{1}=f_{2}(x_{2}).

The function $f_{1}$ can behave in two ways, depending on the parameter $\gamma\beta$: for $\gamma\beta\leq 1$ it is strictly increasing, whereas for $\gamma\beta>1$ it first decreases and then increases on $[0,1)$. More precisely, in the latter case, $f_{1}(t)=0$ if and only if $t\in\{0,\pm m_{\gamma\beta}\}$ for some $m_{\gamma\beta}>0$, and $f_{1}$ is strictly increasing for $t\geq m_{\gamma\beta}$. Moreover, the point $(x,f_{1}(x))$ lies in the first quadrant only if $m_{\gamma\beta}<x<1$. In either case, since $g$ is non-increasing while $f_{1}$ is strictly increasing on the part of its graph lying in the first quadrant, there is only one intersection point of $g$ and $f_{1}$ in the first quadrant.

Secondly, the maximum cannot be in the second quadrant. Assume that there are solutions to the mean-field equations both in the first and in the second quadrant. For $c>1$ denote by $m_{c}$ the positive zero of $\varphi_{c}(t)\coloneqq\operatorname{artanh}(t)-ct$. For the solution $(x,y)$ in the second quadrant, we easily see that $-m_{\beta\gamma}<x<0$ and $0\leq y\leq m_{\beta(1-\gamma)}$. Hence, as $R$ is even and increasing on $[0,1)$,

I(x,y)=\frac{1}{2}\left(\gamma R(x)+(1-\gamma)R(y)\right)<\frac{1}{2}\left(\gamma R(m_{\beta\gamma})+(1-\gamma)R(m_{(1-\gamma)\beta})\right).

If there is also a solution in the first quadrant with coordinates $(x^{*},y^{*})$ , we obtain analogously

I(x^{*},y^{*})>\frac{1}{2}\left(\gamma R(m_{\beta\gamma})+(1-\gamma)R(m_{(1-\gamma)\beta})\right).

This yields that the maximum must lie in the first quadrant. ∎

Furthermore, we can treat the case $k>2$ for uniform block sizes and special matrices. The proof is motivated by [3, Proposition 4.1].

Lemma 2.6.

Let $k\geq 2$ and let $A$ be a block interaction matrix with positive entries such that, for two constants $c_{1},c_{2}>0$, we have $A_{ii}=c_{1}$ and $\sum_{j\neq i}A_{ij}=c_{2}$ for all $i=1,\ldots,k$.

If $c_{1}+c_{2}>k$, then in the uniform case there are exactly two maximisers of the rate function $I$, namely $x=\pm m^{*}(1,\ldots,1)$, where $m^{*}>0$ is the positive solution of the Curie–Weiss equation $\frac{c_{1}+c_{2}}{k}x=\operatorname{artanh}(x)$.

Proof.

Using the equality $xy=-\frac{1}{2}(x-y)^{2}+\frac{1}{2}x^{2}+\frac{1}{2}y^{2}$ we can rewrite the rate function as

	$\displaystyle I(x)$	$\displaystyle=\frac{1}{k}\left(\frac{1}{2k}\sum_{i\neq j}A_{ij}x_{i}x_{j}+\frac{c_{1}}{2k}\langle x,x\rangle-\sum_{i=1}^{k}L^{*}(x_{i})\right)$
		$\displaystyle=\frac{1}{k}\left(-\frac{1}{4k}\sum_{i,j}A_{ij}(x_{i}-x_{j})^{2}+\frac{c_{1}+c_{2}}{2k}\langle x,x\rangle-\sum_{i=1}^{k}L^{*}(x_{i})\right)$
		$\displaystyle\leq\frac{c_{1}+c_{2}}{2k^{2}}\langle x,x\rangle-\frac{1}{k}\sum_{i=1}^{k}L^{*}(x_{i}),$

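where equality holds if and only if $x_{i}=x_{j}$ for all $i,j$, since all entries of $A$ are positive. The right-hand side equals $\frac{1}{k}\sum_{i=1}^{k}\big(\frac{c_{1}+c_{2}}{2k}x_{i}^{2}-L^{*}(x_{i})\big)$ and can thus be maximised coordinatewise; in particular, its maximum is attained at points with $x_{i}=x_{j}$ for all $i,j$, where the bound is sharp. Hence all maximisers of $I$ lie on the generalized diagonal $\{x\in[-1,1]^{k}:x_{i}=x_{j}\,\forall i,j\}$, and on this set we have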

I((x,\ldots,x))=\frac{c_{1}+c_{2}}{2k}x^{2}-L^{*}(x),

i.e. it reduces to the one-dimensional Curie–Weiss rate function with inverse temperature $(c_{1}+c_{2})/k$. For $c_{1}+c_{2}>k$ the corresponding Curie–Weiss equation has a unique positive solution $m^{*}$, and the two points $x=\pm m^{*}(1,\ldots,1)$ solve the $k$-dimensional maximization problem. ∎

Unfortunately, the proof cannot be modified in a straightforward way to deal with non-equal block sizes, not even in the case $k=2$ . The reason is that the inequality used in the proof does not give any information on the actual maximiser in this setting (i. e. $I$ is not maximized on any type of (weighted) diagonal). As such, we cannot reduce this to the one-dimensional setting.

Example.

For example, Lemma 2.6 can be used to prove that, given three positive parameters $\alpha,\beta,\gamma$ with $\beta>\alpha$, $\beta+\alpha>2\gamma$ and $\beta+\alpha+2\gamma>4$, the rate function corresponding to

A=\begin{pmatrix}\beta&\alpha&\gamma&\gamma\\ \alpha&\beta&\gamma&\gamma\\ \gamma&\gamma&\beta&\alpha\\ \gamma&\gamma&\alpha&\beta\end{pmatrix}

only has two maximisers in the uniform case. The first two conditions on $\alpha,\beta,\gamma$ ensure that $A$ is positive definite and the third one that $c_{1}+c_{2}>k$, where clearly $c_{1}=\beta$ and $c_{2}=\alpha+2\gamma$.
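The positive definiteness claim can be made explicit: $(1,1,1,1)$, $(1,1,-1,-1)$, $(1,-1,0,0)$ and $(0,0,1,-1)$ are eigenvectors of $A$ with eigenvalues $\beta+\alpha+2\gamma$, $\beta+\alpha-2\gamma$, $\beta-\alpha$ and $\beta-\alpha$. Since these four vectors are orthogonal, this is the full spectrum. A short Python check, with arbitrary parameter values and hypothetical helper names (here $\gamma$ is the matrix entry of the example, not a block proportion):

```python
def example_matrix(alpha, beta, gamma):
    """The 4 x 4 block interaction matrix of the example."""
    return [
        [beta, alpha, gamma, gamma],
        [alpha, beta, gamma, gamma],
        [gamma, gamma, beta, alpha],
        [gamma, gamma, alpha, beta],
    ]

def eigenpairs(alpha, beta, gamma):
    """Claimed (orthogonal) eigenvectors together with their eigenvalues."""
    return [
        ([1, 1, 1, 1], beta + alpha + 2 * gamma),
        ([1, 1, -1, -1], beta + alpha - 2 * gamma),
        ([1, -1, 0, 0], beta - alpha),
        ([0, 0, 1, -1], beta - alpha),
    ]

def check(alpha, beta, gamma, tol=1e-12):
    """True if A v = lam v holds for all claimed eigenpairs."""
    a = example_matrix(alpha, beta, gamma)
    for v, lam in eigenpairs(alpha, beta, gamma):
        av = [sum(r * x for r, x in zip(row, v)) for row in a]
        if any(abs(y - lam * x) > tol for y, x in zip(av, v)):
            return False
    return True

alpha, beta, gamma = 0.5, 1.5, 0.8
print(check(alpha, beta, gamma))
# Off-diagonal row sums give c_2 = alpha + 2 gamma, as used with Lemma 2.6.
print(all(abs(sum(row) - row[i] - (alpha + 2 * gamma)) < 1e-12
          for i, row in enumerate(example_matrix(alpha, beta, gamma))))
```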

As a concluding remark, let us note that the previous results imply that there is indeed a phase transition in our block spin model. However, if $k>2$ or the block sizes are not equal, it seems hard to give a similarly explicit formula for the limit points. Nevertheless, the above observations show that there is a phase transition in a very general class of block spin models with an arbitrary number of blocks and a general class of block sizes. In particular, they also justify the names “high temperature regime” and “low temperature regime”.

3. Proofs of the limit theorems

In this section we prove (standard and non-standard) Central Limit Theorems for the vector $\widehat{m}^{(n)}$. In the first subsection we treat the high temperature regime. Here we derive a standard CLT using the Hubbard–Stratonovich transform. This is similar in spirit to the third section in [27] and technically related to [22]. The result can also be derived from [17], where similar techniques are used. However, the subsection also prepares nicely for Subsection 3.2, where we treat the critical case and show a non-standard CLT. This generalizes results from [18] and [27]. Finally, in Subsection 3.3 we use Stein’s method, an alternative approach to proving the CLT for $\widehat{m}^{(n)}$. This is not only interesting in its own right, but also has the advantage of providing a rate of convergence, which is missing in the proof via the Hubbard–Stratonovich transform.

3.1. Central limit theorem: Hubbard–Stratonovich approach

For the proof we shall use the transformed block magnetization vectors

	$\displaystyle w^{(n)}$	$\displaystyle\coloneqq V_{n}m^{(n)},$
	$\displaystyle\widehat{w}^{(n)}$	$\displaystyle\coloneqq V_{n}\widehat{m}^{(n)},$
	$\displaystyle\widetilde{w}^{(n)}$	$\displaystyle=V_{n}\Gamma_{n}\widetilde{m}^{(n)},$

where $\Gamma_{n}A\Gamma_{n}=V_{n}^{T}\Lambda_{n}V_{n}$ is the spectral decomposition, with $V_{n}$ orthogonal and $\Lambda_{n}$ diagonal. It is easy to see that

H_{n}=\frac{1}{2N}\langle w^{(n)},\Lambda_{n}w^{(n)}\rangle=\frac{1}{2}\left\langle\widehat{w}^{(n)},\Lambda_{n}\widehat{w}^{(n)}\right\rangle=\frac{N}{2}\left\langle\Lambda_{n}\widetilde{w}^{(n)},\widetilde{w}^{(n)}\right\rangle.

Proof of Theorem 1.3.

As in [27] or [17] (both papers are inspired by [16]), we use the Hubbard–Stratonovich transform (i.e. a convolution with an independent normal distribution). For each $n\in\operatorname{\mathbb{N}}$ ,

\mu_{J_{n}}(\sigma)=Z_{n}^{-1}\exp\left(\frac{1}{2}\langle\Lambda_{n}\widehat{w}^{n},\widehat{w}^{n}\rangle\right).

Our first step is to prove that $\widehat{w}^{n}$ converges weakly to a normal distribution. Let $Y_{n}\sim\mathcal{N}(0,\Lambda_{n}^{-1})$ be an independent sequence, which is moreover independent of $(\widehat{w}^{n})_{n\in\operatorname{\mathbb{N}}}$ . We have for any $B\in\mathcal{B}(\operatorname{\mathbb{R}}^{k})$

	$\displaystyle\operatorname{\mathbb{P}}(\widehat{w}^{n}+Y_{n}\in B)$
	$\displaystyle=C_{n}^{-1}\sum_{\sigma\in\{\pm 1\}^{N}}\mu_{J_{n}}(\sigma)\int_{B}\exp\left(-\frac{1}{2}\langle x-\widehat{w}^{n},\Lambda_{n}(x-\widehat{w}^{n})\rangle\right)dx$
	$\displaystyle=\frac{2^{N}}{C_{n}Z_{n}}\int_{B}\exp\left(-\frac{1}{2}\langle x,\Lambda_{n}x\rangle\right)\operatorname{\mathbb{E}}_{\mu_{0}}\exp\left(N\left\langle\frac{1}{\sqrt{N}}\Gamma_{n}V^{T}\Lambda_{n}x,\frac{1}{N}\Gamma_{n}^{-2}m\right\rangle\right)dx$
	$\displaystyle=\frac{2^{N}}{C_{n}Z_{n}}\int_{B}\exp\left(-\Phi_{n}(x)\right)dx,$

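where we have defined $C_{n}\coloneqq\int_{\operatorname{\mathbb{R}}^{k}}\exp\left(-\frac{1}{2}\langle x,\Lambda_{n}x\rangle\right)dx$ and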

\displaystyle\begin{split}\Phi_{n}(x)&\coloneqq\frac{1}{2}\langle x,\Lambda_{n}x\rangle-\sum_{i=1}^{k}\lvert B_{i}^{(n)}\rvert\log\cosh\left(\frac{\sqrt{N}}{\lvert B_{i}^{(n)}\rvert}(\Gamma_{n}V_{n}\Lambda_{n}x)_{i}\right)\\ &=\frac{1}{2}\langle x,\Lambda_{n}x\rangle-\sum_{i=1}^{k}\lvert B_{i}^{(n)}\rvert\log\cosh\left(\lvert B^{(n)}_{i}\rvert^{-1/2}(V_{n}\Lambda_{n}x)_{i}\right).\end{split}

Since $\log\cosh(x)=\frac{1}{2}x^{2}+O(x^{4})$ , we obtain

(3.1)		$\displaystyle\Phi_{n}(x)$	$\displaystyle=\frac{1}{2}\langle x,\Lambda_{n}x\rangle-\frac{1}{2}\langle x,\Lambda_{n}^{2}x\rangle+\frac{1}{N}O\left(\sum_{i=1}^{k}\frac{N}{\lvert B^{(n)}_{i}\rvert}(V_{n}\Lambda_{n}x)_{i}^{4}\right)$
		$\displaystyle=\frac{1}{2}\langle x,(\Lambda_{n}-\Lambda_{n}^{2})x\rangle+\frac{1}{N}O(\lVert\Gamma_{n}^{-1/2}V_{n}\Lambda_{n}x\rVert_{4}^{4}).$

For parameters $r,R>0$ let $B_{0,r,R}\coloneqq\{x\in\operatorname{\mathbb{R}}^{k}:r\leq\lVert x\rVert_{2}\leq R\}$ and decompose

	$\displaystyle\operatorname{\mathbb{P}}(\widehat{w}^{n}+Y_{n}\in B)$	$\displaystyle=\frac{2^{N}}{C_{n}Z_{n}}\left(\int_{B\cap B_{R}(0)}+\int_{B\cap B_{0,R,r\sqrt{N}}}+\int_{B\cap B_{r\sqrt{N}}(0)^{c}}\right)\exp\left(-\Phi_{n}(x)\right)dx$
		$\displaystyle\eqqcolon\frac{2^{N}}{C_{n}Z_{n}}\left(I_{1}+I_{2}+I_{3}\right).$

Since $\Lambda_{n}\to\Lambda_{\infty}$ (which is a consequence of the continuity of the eigenvalues) we have for any $R>0$

\lim_{n\to\infty}I_{1}=\int_{B\cap B_{R}(0)}\exp\left(-\frac{1}{2}\langle x,(\Lambda_{\infty}-\Lambda_{\infty}^{2})x\rangle\right)dx.

Next, we will estimate (3.1) from below in order to obtain an upper bound for $I_{2}$ . If we define $C_{2,4}\coloneqq\lVert\operatorname{Id}\rVert_{2\to 4}$ , it follows that

\displaystyle\begin{split}\Phi_{n}(x)&\geq\frac{1}{2}\langle x,(\Lambda_{n}-\Lambda_{n}^{2})x\rangle-C(r)\lVert\Gamma_{n}^{-1/2}\rVert_{4\to 4}r^{2}\lVert\Lambda_{n}\rVert_{2\to 2}^{2}\langle x,x\rangle\\ &\geq\frac{1}{2}\langle x,\left(\Lambda_{n}-\Lambda_{n}^{2}-C(r)r^{2}C\right)x\rangle\\ &\geq c\frac{1}{2}\langle x,x\rangle.\end{split}

Here, we have used the convergence of $\Gamma_{n}$ to $\Gamma_{\infty}$ to bound $\lVert\Gamma_{n}^{-1/2}\rVert_{4\to 4}$ and the fact that $C(r)r^{2}\to 0$ as $r\to 0$ , so that the right hand side is positive definite for $r$ small enough, uniformly in $n$ . Thus, after taking the limit $n\to\infty$ , $I_{2}$ will vanish in the limit $R\to\infty$ .

Lastly, we need to show that $I_{3}$ vanishes as well. To this end, let $\widetilde{\Phi}_{n}(y)\coloneqq N^{-1}\Phi_{n}(\sqrt{N}y)$, so that $\exp(-\Phi_{n}(x))=\exp(-N\widetilde{\Phi}_{n}(N^{-1/2}x))$, and bound $\widetilde{\Phi}_{n}$ from below by a positive constant outside a ball, uniformly for $n$ large enough. Since $\lVert\Lambda_{n}-\Lambda_{\infty}\rVert_{2\to 2}\to 0$ and $\lVert\Lambda_{\infty}\rVert_{2\to 2}<1$, choose $n$ large enough so that $\lVert\Lambda_{n}\rVert_{2\to 2}<1$ uniformly. Again, as before, it can be seen that $0$ is the only minimum of $\widetilde{\Phi}_{n}$ for $n$ chosen that way. Indeed, after some manipulations any critical point satisfies $\Gamma_{n}A\Gamma_{n}\tanh(y)=y$, and since $\lVert\tanh(y)\rVert_{2}\leq\lVert y\rVert_{2}$ and $\lVert\Gamma_{n}A\Gamma_{n}\rVert_{2\to 2}<1$, this is only possible for $y=0$. As a consequence, for any $r>0$ there is a constant $c>0$ such that $\widetilde{\Phi}_{n}(y)\geq c$ for all $\lVert y\rVert_{2}>r$, uniformly in $n$, i.e.

I_{3}\leq\int\mathbf{1}_{\{\lVert x\rVert_{2}>r\sqrt{N}\}}\exp\left(-\Phi_{n}(x)\right)dx\leq\int_{B_{r\sqrt{N}}(0)^{c}}\exp\left(-N\widetilde{\Phi}_{n}(N^{-1/2}x)\right)dx\to 0.

Finally, choosing $r>0$ so small that $\Lambda_{n}-\Lambda_{n}^{2}-C(r)r^{2}C$ is uniformly positive definite, we obtain

\lim_{n\to\infty}\operatorname{\mathbb{P}}(\widehat{w}^{n}+Y_{n}\in B)=\mathcal{N}(0,(\Lambda_{\infty}-\Lambda_{\infty}^{2})^{-1})(B).

From here, it remains to undo the convolution (e.g. by using the characteristic function), giving

\lim_{n\to\infty}\mu_{J_{n}}(\widehat{w}^{n}\in B)=\mathcal{N}(0,(\operatorname{Id}-\Lambda_{\infty})^{-1})(B).
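For completeness, the covariance in the last display can be obtained from the previous one by dividing characteristic functions. This is a short computation, assuming (as in the text) that $\Lambda_{\infty}$ is invertible and using that all matrices involved are diagonal and hence commute: since $Y_{n}\sim\mathcal{N}(0,\Lambda_{n}^{-1})$ is independent of $\widehat{w}^{n}$ and $\Lambda_{n}\to\Lambda_{\infty}$,

```latex
\begin{aligned}
\lim_{n\to\infty}\mathbb{E}\,e^{i\langle t,\widehat{w}^{n}\rangle}
&=\exp\Big(-\tfrac{1}{2}\big\langle t,\big[(\Lambda_{\infty}-\Lambda_{\infty}^{2})^{-1}-\Lambda_{\infty}^{-1}\big]t\big\rangle\Big),\\
(\Lambda_{\infty}-\Lambda_{\infty}^{2})^{-1}-\Lambda_{\infty}^{-1}
&=\Lambda_{\infty}^{-1}\big[(\mathrm{Id}-\Lambda_{\infty})^{-1}-\mathrm{Id}\big]
=\Lambda_{\infty}^{-1}(\mathrm{Id}-\Lambda_{\infty})^{-1}\Lambda_{\infty}
=(\mathrm{Id}-\Lambda_{\infty})^{-1}.
\end{aligned}
```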

With the help of Slutsky’s theorem and the definition $\widehat{m}^{n}=V^{T}_{n}\widehat{w}^{n}$ this implies

\mu_{J_{n}}\circ\widehat{m}^{n}\Rightarrow\mathcal{N}(0,V^{T}(\operatorname{Id}-\Lambda_{\infty})^{-1}V)=\mathcal{N}(0,\left(\operatorname{Id}-\Gamma_{\infty}A\Gamma_{\infty}\right)^{-1})

as claimed. ∎

Example.

Consider the case $k=2$ and

A_{2}=\begin{pmatrix}\beta&\alpha\\ \alpha&\beta\end{pmatrix}.

$A_{2}$ is positive semidefinite if and only if $\beta\geq 0$ and $(\beta-\alpha)(\beta+\alpha)\geq 0$, i.e. if $\lvert\alpha\rvert\leq\beta$, and it is positive definite if $\lvert\alpha\rvert<\beta$. We have the diagonalization

A_{2}=\frac{1}{2}\begin{pmatrix}1&1\\ 1&-1\end{pmatrix}\begin{pmatrix}\beta+\alpha&0\\ 0&\beta-\alpha\end{pmatrix}\begin{pmatrix}1&1\\ 1&-1\end{pmatrix}\eqqcolon V^{T}\Lambda V,

and $w=V^{T}m=\frac{1}{\sqrt{2}}\begin{pmatrix}1&1\\ 1&-1\end{pmatrix}m$ corresponds to the transformation performed in [27, Theorem 1.2] (up to a factor of $\sqrt{2}$). Here the two blocks have equal size, so $\Gamma_{\infty}A_{2}\Gamma_{\infty}=\frac{1}{2}A_{2}$ and

\left(\operatorname{Id}-\frac{1}{2}A_{2}\right)^{-1}=\frac{2}{(\beta-2)^{2}-\alpha^{2}}\begin{pmatrix}2-\beta&\alpha\\ \alpha&2-\beta\end{pmatrix}

which is exactly the covariance matrix in [27] (again up to a factor of $2$ ). Note that similar results have been derived in [25].
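The closed-form inverse is easy to double-check: multiplying $\operatorname{Id}-\frac{1}{2}A_{2}$ by the matrix on the right-hand side should give the identity whenever $(\beta-2)^{2}\neq\alpha^{2}$. The following sketch does this in exact rational arithmetic with Python’s `fractions` module; the parameter values and helper names are arbitrary choices.

```python
from fractions import Fraction as F

def id_minus_half(alpha, beta):
    """Id - A_2 / 2 for A_2 = [[beta, alpha], [alpha, beta]]."""
    return [[1 - beta / 2, -alpha / 2], [-alpha / 2, 1 - beta / 2]]

def claimed_inverse(alpha, beta):
    """2 / ((beta - 2)^2 - alpha^2) * [[2 - beta, alpha], [alpha, 2 - beta]]."""
    c = 2 / ((beta - 2) ** 2 - alpha ** 2)
    return [[c * (2 - beta), c * alpha], [c * alpha, c * (2 - beta)]]

def matmul(p, q):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(p[i][l] * q[l][j] for l in range(2)) for j in range(2)] for i in range(2)]

alpha, beta = F(1, 3), F(5, 4)
print(matmul(id_minus_half(alpha, beta), claimed_inverse(alpha, beta)))
```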

Remark.

If $A\in M_{k}(\operatorname{\mathbb{R}})$ is symmetric and positive semidefinite, a variant of the proof shows the following: if we write $A=V^{T}\Lambda V$ with $\Lambda=\operatorname{diag}(\lambda_{1},\ldots,\lambda_{l},0,\ldots,0)$ for some $l<k$, then $((V\widetilde{m})_{i})_{i\leq l}$ converges to an $l$-dimensional normal distribution with covariance matrix $\Sigma_{l}\coloneqq(\operatorname{Id}-\Lambda_{l})^{-1}$, where $\Lambda_{l}=\operatorname{diag}(\lambda_{1},\ldots,\lambda_{l})$. This can be applied to the matrix $A_{2}$ above with $\alpha=\beta$, resulting in a CLT for the magnetization in a Curie–Weiss model, which of course can also be obtained by choosing $k=1$ and $0<\beta<1$.

3.2. Non-central limit theorem

Recall the situation of Theorem 1.4: the block interaction matrix has eigenvalues $0<\lambda_{1}\leq\ldots\leq\lambda_{k-1}<\lambda_{k}=k$ and we consider the uniform case, i.e. $\Gamma_{\infty}^{2}=k^{-1}\operatorname{Id}$. Moreover, we use the definitions

	$\displaystyle w^{\prime}$	$\displaystyle=\operatorname{diag}(N^{-1/2},\ldots,N^{-1/2},N^{-3/4})Vm^{(n)},$
	$\displaystyle\hat{C}_{N}$	$\displaystyle=\operatorname{diag}(\lambda_{1},\ldots,\lambda_{k-1},kN^{1/2}),$

so that

H_{n}=\frac{1}{2}\langle\hat{C}_{N}w^{\prime},w^{\prime}\rangle.

Proof of Theorem 1.4.

Let $Y_{n}\sim\mathcal{N}(0,\hat{C}_{N}^{-1})$ and $X_{n}\sim\mu_{J_{n}}$ be independent random variables, defined on a common probability space. We have for any Borel set $B\in\mathcal{B}(\operatorname{\mathbb{R}}^{k})$

	$\displaystyle\operatorname{\mathbb{P}}\left(w_{n}^{\prime}(X_{n})+Y_{n}\in B\right)$	$\displaystyle=2^{N}Z_{n}^{-1}\int_{B}\exp\left(-\frac{1}{2}\langle\hat{C}_{N}x,x\rangle\right)\operatorname{\mathbb{E}}_{\mu_{0}}\exp\left(\langle x,\hat{C}_{N}w^{\prime}\rangle\right)dx$
		$\displaystyle=\widetilde{Z}_{n}^{-1}\int_{B}\exp\left(-\frac{1}{2}\langle\hat{C}_{N}x,x\rangle+\frac{N}{k}\sum_{i=1}^{k}\log\cosh((V^{T}\widetilde{x})_{i})\right)dx$
		$\displaystyle=\widetilde{Z}_{n}^{-1}\int_{B}\exp\left(-\Phi_{N}(x)\right)dx$
		$\displaystyle=\widetilde{Z}_{n}^{-1}\int_{B}\exp\left(-N\widetilde{\Phi}_{N}\left(\frac{x_{1}}{N^{1/2}},\ldots,\frac{x_{k-1}}{N^{1/2}},\frac{x_{k}}{N^{1/4}}\right)\right)dx$

where we used

	$\displaystyle\Phi_{N}(x)$	$\displaystyle\coloneqq\frac{1}{2}\langle x,\hat{C}_{N}x\rangle-\frac{N}{k}\sum_{i=1}^{k}\log\cosh\left(\left(V^{T}\Lambda\left(\frac{x_{1}}{N^{1/2}},\ldots,\frac{x_{k-1}}{N^{1/2}},\frac{x_{k}}{N^{1/4}}\right)\right)_{i}\right),$
	$\displaystyle\widetilde{\Phi}_{N}(x)$	$\displaystyle\coloneqq\frac{1}{2}\langle x,\Lambda x\rangle-\frac{1}{k}\sum_{i=1}^{k}\log\cosh\left((V^{T}\Lambda x)_{i}\right).$

Now the proof follows the same lines as the proof of the CLT in the high temperature phase, with the slight modification that we expand $\log\cosh$ up to fourth order:

\log\cosh(x)=\frac{x^{2}}{2}-\frac{x^{4}}{12}+O(x^{6}).
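This expansion can be checked numerically. The remainder $\log\cosh(x)-\frac{x^{2}}{2}+\frac{x^{4}}{12}$ should scale like $x^{6}$, and its leading coefficient is the next Taylor coefficient $\frac{1}{45}$. A minimal sketch, where the sample points are arbitrary:

```python
import math

def remainder(x):
    """log cosh(x) - (x^2 / 2 - x^4 / 12), expected to be O(x^6) as x -> 0."""
    return math.log(math.cosh(x)) - (x ** 2 / 2 - x ** 4 / 12)

for x in (0.4, 0.2, 0.1):
    # remainder / x^6 should approach the next Taylor coefficient 1/45.
    print(x, remainder(x) / x ** 6, 1 / 45)
```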

We again split $\operatorname{\mathbb{R}}^{k}$ into three regions, namely the inner region $I_{1}=B_{R}(0)$ for an arbitrary $R>0$ , the intermediate region $I_{2}=K_{r}\backslash B_{R}(0)$ for some arbitrary $r>0$ , where

K_{r}\coloneqq\left\{x\in\operatorname{\mathbb{R}}^{k}:\left\lVert\left(N^{-1/2}x_{1},\ldots,N^{-1/2}x_{k-1},N^{-1/4}x_{k}\right)\right\rVert_{\infty}\leq r\right\},

and the outer region $I_{3}\coloneqq K_{r}^{c}$ . Also define the rescaled vector

\displaystyle\widetilde{x}\coloneqq\left(\lambda_{1}N^{-1/2}x_{1},\ldots,\lambda_{k-1}N^{-1/2}x_{k-1},kN^{-1/4}x_{k}\right).

Firstly, in the inner region we rewrite

	$\displaystyle\Phi_{N}(x)$	$\displaystyle=\frac{1}{2}\langle x,\hat{C}_{N}x\rangle-\frac{N}{2k}\sum_{i=1}^{k}(V^{T}\widetilde{x})_{i}^{2}+\frac{N}{12k}\sum_{i=1}^{k}(V^{T}\widetilde{x})_{i}^{4}+\frac{N}{k}O(\lVert V^{T}\widetilde{x}\rVert_{6}^{6})$
		$\displaystyle=\frac{1}{2}\sum_{i=1}^{k-1}\left(\lambda_{i}-\frac{\lambda_{i}^{2}}{k}\right)x_{i}^{2}+\frac{N}{12k}\lVert V^{T}\widetilde{x}\rVert_{4}^{4}+\frac{N}{k}O(\lVert V^{T}\widetilde{x}\rVert_{6}^{6})$
		$\displaystyle=\frac{1}{2}\sum_{i=1}^{k-1}\left(\lambda_{i}-\frac{\lambda_{i}^{2}}{k}\right)x_{i}^{2}+\frac{k^{3}}{12}x_{k}^{4}\sum_{i=1}^{k}V_{ki}^{4}+O(N^{-1/4})+\frac{N}{k}O(\lVert V^{T}\widetilde{x}\rVert_{6}^{6}),$

and since the convergence of the error terms is uniform on any compact subset of $\operatorname{\mathbb{R}}^{k}$ , for any fixed $R>0$ this yields

\displaystyle\lim_{N\to\infty}\int_{B\cap I_{1}}\exp\left(-\Phi_{N}(x)\right)dx=\int_{B\cap I_{1}}\exp\left(-\frac{1}{2}\sum_{i=1}^{k-1}\left(\lambda_{i}-\frac{\lambda_{i}^{2}}{k}\right)x_{i}^{2}-\frac{k^{3}}{12}x_{k}^{4}\sum_{i=1}^{k}V_{ki}^{4}\right)dx.

Secondly, we show that the outer region does not contribute to the limit $N\to\infty$. It can be seen by elementary tools that $\widetilde{\Phi}_{N}$ has a unique minimiser at $0$, with minimum value $0$, and so for any $r>0$ we have $\inf_{\lVert y\rVert_{\infty}>r}\widetilde{\Phi}_{N}(y)>0$. Using the monotone convergence theorem, we obtain

\lim_{N\to\infty}\int_{B\cap I_{3}}\exp\left(-N\widetilde{\Phi}(x)\right)dx=0.

Lastly, we will estimate the contribution of the intermediate region from above by a quantity which vanishes as $R\to\infty$ . To this end, we will bound the function $\Phi_{N}$ from below. Recall that

	$\displaystyle\Phi_{N}(x)$	$\displaystyle=\frac{1}{2}\langle x,\hat{C}_{N}x\rangle-\frac{N}{2k}\sum_{i=1}^{k}(V^{T}\widetilde{x})_{i}^{2}+\frac{N}{12k}\sum_{i=1}^{k}(V^{T}\widetilde{x})_{i}^{4}+\frac{N}{k}O(\lVert V^{T}\widetilde{x}\rVert_{6}^{6})$
		$\displaystyle=\frac{1}{2}\langle x,\hat{C}_{N}x\rangle-\frac{N}{2k}\langle\widetilde{x},\widetilde{x}\rangle+\frac{N}{12k}\lVert V^{T}\widetilde{x}\rVert_{4}^{4}+\frac{N}{k}O(\lVert V^{T}\widetilde{x}\rVert_{6}^{6})$

and since $\lVert V^{T}\widetilde{x}\rVert_{4}^{4}\geq C\lVert\widetilde{x}\rVert^{4}_{4}$ for $C=\lVert V\rVert_{4\to 4}^{-4}$ (because $\widetilde{x}=VV^{T}\widetilde{x}$), this yields

	$\displaystyle\Phi_{N}(x)$	$\displaystyle\geq\frac{1}{2}\langle x,\hat{C}_{N}x\rangle-\frac{N}{2k}\langle\widetilde{x},\widetilde{x}\rangle+\frac{N}{12k}C\lVert\widetilde{x}\rVert_{4}^{4}+\frac{N}{k}O(\lVert V^{T}\widetilde{x}\rVert_{6}^{6})$
		$\displaystyle\geq\frac{1}{2}\langle\left(\Lambda-k^{-1}\Lambda^{2}\right)x,x\rangle+\frac{k^{3}}{12}Cx_{k}^{4}+\frac{N}{k}O(\lVert V^{T}\widetilde{x}\rVert_{6}^{6}).$

Now, as in the case of the central limit theorem, we can estimate from below the error term in such a way that there is a positive constant $c$ and a positive definite matrix $C$ such that

\Phi_{N}(x)\geq\frac{1}{2}\langle C(x_{1},\ldots,x_{k-1},0),(x_{1},\ldots,x_{k-1},0)\rangle+cx_{k}^{4},

from which we obtain an upper bound, i.e.

\int_{B\cap I_{2}}\exp\left(-\Phi_{N}(x)\right)dx\leq\int_{B\cap I_{2}}\exp\left(-\frac{1}{2}\sum_{i,j=1}^{k-1}C_{ij}x_{i}x_{j}-cx_{k}^{4}\right)dx,

and the right hand side vanishes as $R\to\infty$ by dominated convergence. As a result, the limit $n\to\infty$ exists and is equal to

\lim_{n\to\infty}\operatorname{\mathbb{P}}\left(w^{\prime}_{n}(X_{n})+Y_{n}\in B\right)=Z^{-1}\int_{B}\exp\left(-\frac{1}{2}\sum_{i=1}^{k-1}\left(\lambda_{i}-\frac{\lambda_{i}^{2}}{k}\right)x_{i}^{2}-\frac{k^{3}}{12}x_{k}^{4}\sum_{i=1}^{k}V_{ki}^{4}\right)dx.

The convergence results for the non-convoluted vector follow easily by considering the characteristic functions. We have for any $t\in\operatorname{\mathbb{R}}^{k}$

\operatorname{\mathbb{E}}\exp\left(i\langle t,w_{n}^{\prime}(X_{n})+Y_{n}\rangle\right)\to\exp\left(-\frac{1}{2}\langle(t_{1},\ldots,t_{k-1}),\widetilde{\Sigma}(t_{1},\ldots,t_{k-1})\rangle\right)\varphi(t_{k}),

where $\widetilde{\Sigma}=\operatorname{diag}\left(\lambda_{i}^{-1}+(k-\lambda_{i})^{-1}\right)_{i\leq k-1}$ and $\varphi$ is the characteristic function of a random variable with Lebesgue density proportional to $\exp\left(-\frac{k^{3}}{12}x^{4}\sum_{i=1}^{k}V_{ki}^{4}\right)$. Using the independence of $X_{n}$ and $Y_{n}$, the results follow by dividing by the characteristic function of $Y_{n}$ and letting $N\to\infty$. ∎
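The entries of $\widetilde{\Sigma}$ come from inverting the coefficients $\lambda_{i}-\lambda_{i}^{2}/k$ of the quadratic part of the limit density, while the Gaussian smoothing $Y_{n}$ contributes variance $\lambda_{i}^{-1}$ in the $i$-th coordinate for $i<k$. The exact-arithmetic sketch below, with arbitrary test values $0<\lambda<k$, checks the partial fraction identity behind $\widetilde{\Sigma}$ and shows that removing the smoothing leaves the variance $(k-\lambda)^{-1}$. It illustrates the scalar computation only, not the limit argument, and `limit_variance` is a hypothetical name.

```python
from fractions import Fraction as F

def limit_variance(lam, k):
    """Inverse of the coefficient lam - lam^2 / k of x_i^2 in the limit density."""
    return 1 / (lam - lam ** 2 / k)

for lam, k in ((F(1, 2), 3), (F(7, 3), 4), (F(5, 2), 5)):
    v = limit_variance(lam, k)
    # Partial fractions: 1 / (lam - lam^2 / k) = 1 / lam + 1 / (k - lam).
    assert v == 1 / lam + 1 / (k - lam)
    # Removing the smoothing variance 1 / lam leaves 1 / (k - lam).
    print(lam, k, v - 1 / lam)
```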

3.3. Central limit theorem: Stein’s method

Lastly, we will prove Theorem 1.5 using Stein’s method of exchangeable pairs. For brevity’s sake, for the rest of this section we fix $n\in\operatorname{\mathbb{N}}$ and we will drop all sub- and superscripts (e.g. we write $B_{i}$ instead of $B_{i}^{(n)}$ , $\hat{m}$ instead of $\hat{m}^{(n)}$ , $J$ instead of $J_{n}$ et cetera). It is more convenient to formulate this approach in terms of random variables. Let $X$ be a random vector with distribution $\mu_{J}$ and $I$ be an independent random variable uniformly distributed on $\{1,\ldots,N\}$ . First, denote by $(X,\widetilde{X})$ the exchangeable pair which is given by taking a step in the Glauber chain for $\mu_{J}$ , i.e. $\widetilde{X}$ is the vector after replacing $X_{I}$ by an independent $\widetilde{X}_{I}$ with distribution $\widetilde{X}_{I}\sim\mu_{J}(\cdot\mid\overline{X}_{I})$ (the exchangeability follows from the reversibility of the Glauber dynamics). Consequently, $(\hat{m},\hat{m}^{\prime})=(\hat{m}(X),\hat{m}(\widetilde{X}))$ is also exchangeable. More precisely, with the standard basis vectors $(e_{i})_{i=1,\ldots,k}$ of $\operatorname{\mathbb{R}}^{k}$ we have

(3.2)

\hat{m}^{\prime}\coloneqq\hat{m}-\frac{X_{I}-\widetilde{X}_{I}}{\sqrt{\lvert B_{h(I)}\rvert}}e_{h(I)},\quad\text{so that}\quad\hat{m}-\hat{m}^{\prime}=\frac{X_{I}-\widetilde{X}_{I}}{\sqrt{\lvert B_{h(I)}\rvert}}e_{h(I)}.

We need the following lemma to identify the conditional expectation of $\widetilde{X}_{i}$. Here, we write $h:\{1,\ldots,N\}\to\{1,\ldots,k\}$ for the function that assigns to each position its block, i.e. $h(j)=l\Longleftrightarrow j\in B_{l}$.

Lemma 3.1.

Let $\mathcal{F}=\sigma(X)$ and $(X,\widetilde{X})$ be defined as above. Then for each fixed $i\in\{1,\ldots,N\}$

\operatorname{\mathbb{E}}\left(\widetilde{X}_{i}\mid\mathcal{F}\right)=\tanh\left(\frac{1}{\sqrt{N}}(A\Gamma\hat{m})_{h(i)}-\frac{1}{N}A_{h(i)h(i)}X_{i}\right).

Proof.

For any Ising model $\mu=\mu_{J}$ the conditional distribution of $\widetilde{X}_{i}$ is given by $\mu(\cdot\mid\overline{X}_{i})$ and so

\operatorname{\mathbb{E}}\left(\widetilde{X}_{i}\mid\mathcal{F}\right)=2\mu(1\mid\overline{X}_{i})-1=\tanh\left((J^{(d)}X)_{i}\right),

where we recall the notation $J^{(d)}$ for the matrix without its diagonal, i.e. $J^{(d)}=J-\operatorname{diag}(J_{ii})$ . In the case that $J=J_{n}$ is the block model matrix, this yields

	$\displaystyle\operatorname{\mathbb{E}}\left(\widetilde{X}_{i}\mid\mathcal{F}\right)$	$\displaystyle=\tanh\left(N^{-1}\sum_{j=1}^{k}A_{h(i)j}\sum_{l\in B_{j}}X_{l}-N^{-1}A_{h(i)h(i)}X_{i}\right)$
		$\displaystyle=\tanh\left(N^{-1}(Am)_{h(i)}-N^{-1}A_{h(i)h(i)}X_{i}\right)$
		$\displaystyle=\tanh\left(N^{-1/2}(A\Gamma\hat{m})_{h(i)}-N^{-1}A_{h(i)h(i)}X_{i}\right).$

∎

Since the conditional expectation will be of importance, we define

g_{i}(X)\coloneqq N^{-1}(Am)_{h(i)}-N^{-1}A_{h(i)h(i)}X_{i}=N^{-1/2}(A\Gamma\hat{m})_{h(i)}-N^{-1}A_{h(i)h(i)}X_{i},

so that $\operatorname{\mathbb{E}}(\widetilde{X}_{i}\mid\mathcal{F})=\tanh(g_{i}(X))$. Note that $g_{i}$ does not actually depend on $X_{i}$: the second term only removes the contribution of $X_{i}$ contained in the first one, which allows the compact notation of the first term. Thus we have $\tanh(g_{i}(X))=\operatorname{\mathbb{E}}(\widetilde{X}_{i}\mid\overline{X}_{i})$.
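The exchangeable pair and the quantities $g_{i}$ can be made concrete with a small simulation. The sketch below is illustrative only. It assumes the block model coupling $J_{il}=A_{h(i)h(l)}/N$ for $i\neq l$, as used in the proof of Lemma 3.1, and it stores the block structure as a list `block` with `block[i]` equal to $h(i)$ (0-based). All function names and parameter values are hypothetical. One Glauber step resamples a uniformly chosen spin $X_{I}$ with $\operatorname{\mathbb{P}}(\widetilde{X}_{I}=+1)=\frac{1}{2}(1+\tanh g_{I}(X))$, so that $\operatorname{\mathbb{E}}(\widetilde{X}_{I}\mid\mathcal{F},I)=\tanh(g_{I}(X))$, and returns $\hat{m}-\hat{m}^{\prime}$.

```python
import math
import random

def block_sums(x, block, k):
    """Unnormalised block magnetisations m_b = sum_{j in B_b} X_j."""
    m = [0.0] * k
    for spin, b in zip(x, block):
        m[b] += spin
    return m

def m_hat(x, block, k):
    """Normalised block magnetisations |B_b|^{-1/2} m_b."""
    m = block_sums(x, block, k)
    return [m[b] / math.sqrt(block.count(b)) for b in range(k)]

def g(i, x, block, a):
    """g_i(X) = N^{-1} (A m)_{h(i)} - N^{-1} A_{h(i)h(i)} X_i."""
    k = len(a)
    m = block_sums(x, block, k)
    hi = block[i]
    return (sum(a[hi][j] * m[j] for j in range(k)) - a[hi][hi] * x[i]) / len(x)

def glauber_step(x, block, a, rng):
    """One Glauber step; returns the new configuration and m_hat(X) - m_hat(X')."""
    k = len(a)
    i = rng.randrange(len(x))  # I uniform on {0, ..., N - 1}
    p_plus = 0.5 * (1.0 + math.tanh(g(i, x, block, a)))  # E[new spin] = tanh(g_i)
    new_spin = 1 if rng.random() < p_plus else -1
    y = list(x)
    y[i] = new_spin
    diff = [0.0] * k  # only the coordinate of block h(I) can change
    diff[block[i]] = (x[i] - new_spin) / math.sqrt(block.count(block[i]))
    return y, diff

rng = random.Random(0)
a = [[0.6, 0.2], [0.2, 0.5]]  # hypothetical 2 x 2 interaction matrix
block = [0] * 3 + [1] * 5      # h(i) for N = 8 sites, block sizes 3 and 5
x = [rng.choice((-1, 1)) for _ in block]
y, diff = glauber_step(x, block, a, rng)
print(diff)
```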

Lemma 3.2.

We have

\operatorname{\mathbb{E}}\left(\hat{m}-\hat{m}^{\prime}\mid\mathcal{F}\right)=N^{-1}\left(\operatorname{Id}-\Gamma A\Gamma\right)\hat{m}+R(X),

with

R(X)\coloneqq N^{-1}\sum_{i=1}^{k}e_{i}\left((\Gamma A\Gamma\hat{m})_{i}-\lvert B_{i}\rvert^{-1/2}\sum_{j\in B_{i}}\tanh\left(g_{j}(X)\right)\right).

Proof.

From equation (3.2) and Lemma 3.1 we obtain

	$\displaystyle\operatorname{\mathbb{E}}\left(\hat{m}-\hat{m}^{\prime}\mid\mathcal{F}\right)$	$\displaystyle=N^{-1}\sum_{i=1}^{k}e_{i}\lvert B_{i}\rvert^{-1/2}\sum_{j\in B_{i}}\operatorname{\mathbb{E}}(X_{j}-\widetilde{X}_{j}\mid\mathcal{F})$
		$\displaystyle=N^{-1}\sum_{i=1}^{k}e_{i}\hat{m}_{i}-N^{-1}\sum_{i=1}^{k}e_{i}\lvert B_{i}\rvert^{-1/2}\sum_{j\in B_{i}}\tanh(g_{j}(X))$
		$\displaystyle=N^{-1}\hat{m}-N^{-1}\sum_{i=1}^{k}e_{i}\lvert B_{i}\rvert^{-1/2}\Big{(}\sum_{j\in B_{i}}N^{-1/2}(A\Gamma\hat{m})_{i}\Big{)}+R(X)$
		$\displaystyle=N^{-1}\left(\operatorname{Id}-\Gamma A\Gamma\right)\hat{m}+R(X).$

∎

For $n$ large enough we have $\lVert\Gamma A\Gamma\rVert_{2\to 2}<1$, since $\lVert\Gamma_{\infty}A\Gamma_{\infty}\rVert_{2\to 2}<1$ in the high temperature regime and $\Gamma\to\Gamma_{\infty}$. Hence the matrix $\Lambda\coloneqq N^{-1}(\operatorname{Id}-\Gamma A\Gamma)$ is invertible by a Neumann series argument, with inverse $\Lambda^{-1}=N\sum_{l=0}^{\infty}(\Gamma A\Gamma)^{l}$. Moreover, we also have $\lVert\Lambda^{-1}\rVert_{2\to 2}\leq N(1-\lVert\Gamma A\Gamma\rVert_{2\to 2})^{-1}$.
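The Neumann series representation and the norm bound can be illustrated numerically. In the sketch below an arbitrary symmetric $2\times 2$ matrix $M$ with $\lVert M\rVert_{2\to 2}<1$ stands in for $\Gamma A\Gamma$. The code sums $\sum_{l\leq L}M^{l}$, compares the result with the exact inverse of $\operatorname{Id}-M$, and checks $\lVert(\operatorname{Id}-M)^{-1}\rVert_{2\to 2}\leq(1-\lVert M\rVert_{2\to 2})^{-1}$. The factor $N$ is dropped, since it only rescales both sides, and the helper names are hypothetical.

```python
import math

def mat_mul(p, q):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(p[i][l] * q[l][j] for l in range(2)) for j in range(2)] for i in range(2)]

def neumann_inverse(m, terms=200):
    """Approximate (Id - m)^{-1} by the partial sum of sum_l m^l."""
    total = [[1.0, 0.0], [0.0, 1.0]]
    power = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(terms):
        power = mat_mul(power, m)
        total = [[total[i][j] + power[i][j] for j in range(2)] for i in range(2)]
    return total

def exact_inverse(m):
    """(Id - m)^{-1} for a 2 x 2 matrix via the adjugate formula."""
    a, b = 1.0 - m[0][0], -m[0][1]
    c, d = -m[1][0], 1.0 - m[1][1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def spectral_norm_sym(m):
    """Spectral norm of a symmetric 2 x 2 matrix (largest |eigenvalue|)."""
    mean = 0.5 * (m[0][0] + m[1][1])
    rad = math.sqrt(0.25 * (m[0][0] - m[1][1]) ** 2 + m[0][1] ** 2)
    return max(abs(mean + rad), abs(mean - rad))

m = [[0.4, 0.2], [0.2, 0.3]]  # stands in for Gamma A Gamma; its norm is about 0.56
approx, exact = neumann_inverse(m), exact_inverse(m)
print(max(abs(approx[i][j] - exact[i][j]) for i in range(2) for j in range(2)))
# For symmetric positive semidefinite m the bound holds with equality, hence the tolerance.
print(spectral_norm_sym(exact) <= 1.0 / (1.0 - spectral_norm_sym(m)) + 1e-12)
```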

We will need the following approximation theorem for random vectors.

Theorem 3.3 ([30], Theorem 2.1).

Assume that $(W,W^{\prime})$ is an exchangeable pair of $\operatorname{\mathbb{R}}^{d}$ -valued random vectors such that

\operatorname{\mathbb{E}}W=0,\quad\quad\operatorname{\mathbb{E}}WW^{t}=\Sigma,

with $\Sigma\in\operatorname{\mathbb{R}}^{d\times d}$ symmetric and positive definite. Suppose further that

\operatorname{\mathbb{E}}[W^{\prime}-W\mid W]=-\Lambda W+R

is satisfied for an invertible matrix $\Lambda$ and a $\sigma(W)$-measurable random vector $R$. Then, if $Z$ has a $d$-dimensional standard normal distribution, we have for every three times differentiable function $h$

\lvert\operatorname{\mathbb{E}}h(W)-\operatorname{\mathbb{E}}h(\Sigma^{1/2}Z)\rvert\leq\frac{\lvert h\rvert_{2}}{4}E_{1}+\frac{\lvert h\rvert_{3}}{12}E_{2}+\left(\lvert h\rvert_{1}+\frac{1}{2}d\lVert\Sigma\rVert^{1/2}\lvert h\rvert_{2}\right)E_{3},

where, with $\lambda{(i)}\coloneqq\sum_{m=1}^{d}\lvert\left(\Lambda^{-1}\right)_{m,i}\rvert$ , we define the three error terms

	$\displaystyle E_{1}$	$\displaystyle=\sum_{i,j=1}^{d}\lambda{(i)}\sqrt{\operatorname{Var}\operatorname{\mathbb{E}}\left[(W_{i}^{\prime}-W_{i})(W_{j}^{\prime}-W_{j})\mid W\right]},$
	$\displaystyle E_{2}$	$\displaystyle=\sum_{i,j,k=1}^{d}\lambda{(i)}\operatorname{\mathbb{E}}\lvert(W_{i}^{\prime}-W_{i})(W_{j}^{\prime}-W_{j})(W_{k}^{\prime}-W_{k})\rvert,$
	$\displaystyle E_{3}$	$\displaystyle=\sum_{i=1}^{d}\lambda{(i)}\sqrt{\operatorname{Var}R_{i}}.$

Here, $\lvert h\rvert_{j}$ denotes the supremum of the partial derivatives of $h$ up to order $j$.

Note that in the proof of Theorem 3.3 the choice of $\sigma(W)$ for the conditional expectation is arbitrary; it suffices to take any $\sigma$-algebra $\mathcal{F}$ with respect to which $W$ is measurable. Clearly, the error term $E_{1}$ then has to be adjusted accordingly.

Corollary 3.4.

Let $\hat{m}$ be the block magnetization vector and $\hat{m}^{\prime}$ as above, define $\Sigma\coloneqq\operatorname{\mathbb{E}}\hat{m}\hat{m}^{T}$ and let $Z\sim\mathcal{N}(0,\Sigma)$. Then for any function $h\in\mathcal{F}_{3}$ we have

\lvert\operatorname{\mathbb{E}}h(\hat{m}(X))-\operatorname{\mathbb{E}}h(Z)\rvert\leq CN\left(\frac{\lvert h\rvert_{2}}{4}E_{1}+\frac{\lvert h\rvert_{3}}{12}E_{2}+\left(\lvert h\rvert_{1}+\frac{1}{2}k\lVert\Sigma\rVert^{1/2}\lvert h\rvert_{2}\right)E_{3}\right)

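with the three error terms given below. Here the mixed terms of Theorem 3.3 with distinct indices vanish, because $X$ and $\widetilde{X}$ differ in at most one coordinate, so that at most one block magnetization changes: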

	$\displaystyle E_{1}$	$\displaystyle=\sum_{i=1}^{k}\sqrt{\operatorname{Var}\left(\operatorname{\mathbb{E}}((\hat{m}_{i}(X)-\hat{m}_{i}(\widetilde{X}))^{2}\mid\mathcal{F})\right)}$
	$\displaystyle E_{2}$	$\displaystyle=\sum_{i=1}^{k}\operatorname{\mathbb{E}}\lvert\hat{m}_{i}(X)-\hat{m}_{i}(\widetilde{X})\rvert^{3}$
	$\displaystyle E_{3}$	$\displaystyle=\sum_{i=1}^{k}\sqrt{\operatorname{Var}(R_{i})}.$
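The factor $CN$ in front of the error terms comes from the weights $\lambda(i)$ of Theorem 3.3. Assuming $\Lambda=N^{-1}(\operatorname{Id}-\Gamma A\Gamma)$ as above, $\lambda(i)$ is $N$ times the $\ell^{1}$ norm of the $i$-th column of $(\operatorname{Id}-\Gamma A\Gamma)^{-1}$, hence at most $N\sqrt{k}\,(1-\lVert\Gamma A\Gamma\rVert_{2\to 2})^{-1}$ by the Cauchy–Schwarz inequality and the Neumann series. The sketch below illustrates this on a hypothetical three-block example (the matrix $A$ and the block proportions are made up, and $\Gamma$ is assumed to be the diagonal matrix of square roots of the proportions): $\lambda(i)/N$ does not change with $N$ and stays below that bound.

```python
import numpy as np


def lambda_weights(Lam):
    """lambda(i) = sum_m |(Lam^{-1})_{m,i}|, i.e. the absolute column sums of Lam^{-1}."""
    return np.abs(np.linalg.inv(Lam)).sum(axis=0)


# Hypothetical three-block example (illustrative values only).
A = np.array([[0.9, 0.3, -0.2],
              [0.3, 0.7, 0.1],
              [-0.2, 0.1, 0.8]])
Gamma = np.diag(np.sqrt([0.2, 0.3, 0.5]))  # assumed: diag(sqrt(|B_i| / N))
M = Gamma @ A @ Gamma
k = M.shape[0]
rho = np.linalg.norm(M, 2)                  # ||Gamma A Gamma||_{2->2}
bound = np.sqrt(k) / (1 - rho)              # sqrt(k) (1 - rho)^{-1}

Ns = (10**2, 10**4, 10**6)
# lambda(i) / N for Lambda = N^{-1}(Id - Gamma A Gamma); the factor N cancels exactly
ratios = [lambda_weights((np.eye(k) - M) / N) / N for N in Ns]
```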

Finally, the following lemma shows that all error terms $E_{i}$ can be bounded by a term of order $N^{-3/2}$ .

Lemma 3.5.

In the situation of Corollary 3.4 we have

\max(E_{1},E_{2},E_{3})=O(N^{-3/2}).

Before we prove this lemma (and consequently Theorem 1.5), we state concentration of measure results for block spin Ising models, which we need to bound $E_{1},E_{2},E_{3}$. The first step is a logarithmic Sobolev inequality for the Ising model $\mu_{J_{n}}$ with a constant that is uniform in $n$.

Proposition 3.6.

Under the general assumptions, if $\lVert\Gamma_{\infty}A\Gamma_{\infty}\rVert_{2\to 2}<1$ , then for $n$ large enough the Ising model $\mu_{J_{n}}$ satisfies a logarithmic Sobolev inequality with a constant $\sigma^{2}=\sigma^{2}(\lVert\Gamma_{\infty}A\Gamma_{\infty}\rVert_{2\to 2})$ , i.e. for any function $f:\{-1,+1\}^{N}\to\operatorname{\mathbb{R}}$ we have

(3.3)

\operatorname{Ent}_{\mu_{J_{n}}}(f^{2})\leq 2\sigma^{2}\sum_{i=1}^{N}\operatorname{\mathbb{E}}_{\mu_{J_{n}}}(f-f\circ T_{i})^{2},

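where, for a nonnegative function $g$, $\operatorname{Ent}_{\mu}(g)\coloneqq\operatorname{\mathbb{E}}_{\mu}(g\log g)-\operatorname{\mathbb{E}}_{\mu}g\log\operatorname{\mathbb{E}}_{\mu}g$ denotes the entropy functional, and $T_{i}:\{-1,+1\}^{N}\to\{-1,+1\}^{N},\ (x_{1},\ldots,x_{N})\mapsto(x_{1},\ldots,x_{i-1},-x_{i},x_{i+1},\ldots,x_{N})$ is the sign flip operator at site $i$.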

This follows immediately from [23, Proposition 1.1]: since $\Gamma_{n}A\Gamma_{n}\to\Gamma_{\infty}A\Gamma_{\infty}$, the norms converge as well, so that $\lVert\Gamma_{n}A\Gamma_{n}\rVert_{2\to 2}<1$ for $n$ large enough. Although [23] assumes $\lVert J\rVert_{1\to 1}<1$, this was only for the sake of applications, and $\lVert J\rVert_{2\to 2}<1$ suffices to establish the logarithmic Sobolev inequality.

For any function $f:\{-1,+1\}^{N}\to\operatorname{\mathbb{R}}$ and any $r\in\{1,\ldots,N\}$ we write

\mathfrak{h}_{r}f(x)=\lvert f(x)-f(T_{r}x)\rvert,

so that (3.3) becomes

\operatorname{Ent}_{\mu_{J_{n}}}(f^{2})\leq 2\sigma^{2}\sum_{r=1}^{N}\int(\mathfrak{h}_{r}f(x))^{2}d\mu_{J_{n}}(x).

Moreover, it is known that (3.3) implies a Poincaré inequality

(3.4)

\operatorname{Var}(f)\leq\sigma^{2}\sum_{r=1}^{N}\operatorname{\mathbb{E}}\mathfrak{h}_{r}f(X)^{2}.
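For the reader's convenience, here is the standard linearization behind this implication, sketched for an arbitrary $g:\{-1,+1\}^{N}\to\operatorname{\mathbb{R}}$ and $\varepsilon\to 0$. The first line follows from a second-order Taylor expansion of $x\mapsto x\log x$ around $1$ (uniform in the configuration, since $g$ is bounded on the finite state space), the second from $\mathfrak{h}_{r}(1+\varepsilon g)=\varepsilon\,\mathfrak{h}_{r}g$:

```latex
\begin{aligned}
\operatorname{Ent}_{\mu_{J_{n}}}\big((1+\varepsilon g)^{2}\big)
  &= 2\varepsilon^{2}\operatorname{Var}_{\mu_{J_{n}}}(g)+O(\varepsilon^{3}),\\
2\sigma^{2}\sum_{r=1}^{N}\int\big(\mathfrak{h}_{r}(1+\varepsilon g)\big)^{2}\,d\mu_{J_{n}}
  &= 2\sigma^{2}\varepsilon^{2}\sum_{r=1}^{N}\int(\mathfrak{h}_{r}g)^{2}\,d\mu_{J_{n}}.
\end{aligned}
```

Inserting both expressions into (3.3), dividing by $2\varepsilon^{2}$ and letting $\varepsilon\to 0$ yields (3.4) with the same constant $\sigma^{2}$.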

Proof of Lemma 3.5.

Error term $\mathbf{E_{1}}$ : To treat the term $E_{1}$ , fix $i\in\{1,\ldots,k\}$ and observe that

	$\displaystyle\operatorname{\mathbb{E}}\left((\hat{m}_{i}(X)-\hat{m}_{i}(\widetilde{X}))^{2}\mid\mathcal{F}\right)$	$\displaystyle=N^{-1}\sum_{j=1}^{N}\operatorname{\mathbb{E}}\left((\hat{m}_{i}(X)-\hat{m}_{i}(\overline{X}_{j},\widetilde{X}_{j}))^{2}\mid\mathcal{F}\right)$
		$\displaystyle=(N\lvert B_{i}\rvert)^{-1}\sum_{j\in B_{i}}\operatorname{\mathbb{E}}\left((X_{j}-\widetilde{X}_{j})^{2}\mid\mathcal{F}\right)$
		$\displaystyle=-2(N\lvert B_{i}\rvert)^{-1}\sum_{j\in B_{i}}X_{j}\tanh(g_{j}(X))+2N^{-1}.$

Thus, if we define

f_{i}(X)\coloneqq\lvert B_{i}^{(n)}\rvert^{-1/2}\sum_{j\in B_{i}}X_{j}\tanh\left(N^{-1}\sum_{l=1}^{k}A_{il}m_{l}(X)-N^{-1}A_{ii}X_{j}\right)=\lvert B_{i}^{(n)}\rvert^{-1/2}\sum_{j\in B_{i}}X_{j}\tanh(g_{j}(X)),

we see that

\operatorname{Var}^{1/2}\left(\operatorname{\mathbb{E}}\left((\hat{m}_{i}^{(n)}-\hat{m}_{i}^{(n)\prime})^{2}\mid\mathcal{F}\right)\right)=2N^{-1}\lvert B_{i}^{(n)}\rvert^{-1/2}\operatorname{Var}^{1/2}(f_{i}(X)),

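and we need to show that $\operatorname{Var}(f_{i}(X))=O(1)$. By the Poincaré inequality (3.4) it suffices to prove that $\mathfrak{h}_{r}f_{i}(X)^{2}\leq C\lvert B_{i}^{(n)}\rvert^{-1}$ for all $r$ and $X$, since then $\operatorname{Var}(f_{i}(X))\leq\sigma^{2}NC\lvert B_{i}^{(n)}\rvert^{-1}=O(1)$, as $\lvert B_{i}^{(n)}\rvert$ is of order $N$.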

Let $r\in\{1,\ldots,N\}$ be arbitrary. For $j\in B_{i}^{(n)}$ the local field $g_{j}(X)=N^{-1}\sum_{l=1}^{k}A_{il}m_{l}(X)-N^{-1}A_{ii}X_{j}$ does not depend on $X_{j}$, and $\tanh$ is $1$-Lipschitz. The first case is that $r\in B_{i}^{(n)}$, for which

	$\displaystyle\mathfrak{h}_{r}f_{i}(X)$	$\displaystyle\leq\lvert B_{i}\rvert^{-1/2}\lvert 2X_{r}\tanh(g_{r}(X))\rvert+\lvert B_{i}\rvert^{-1/2}\sum_{\begin{subarray}{c}j\in B_{i}\\ j\neq r\end{subarray}}\lvert\tanh(g_{j}(X))-\tanh(g_{j}(T_{r}X))\rvert$
		$\displaystyle\leq 4\lvert B_{i}\rvert^{-1/2}+\lvert B_{i}\rvert^{-1/2}N^{-1}\sum_{\begin{subarray}{c}j\in B_{i}\\ j\neq r\end{subarray}}\left\lvert\sum_{l=1}^{k}A_{il}(m_{l}(X)-m_{l}(T_{r}X))\right\rvert$
		$\displaystyle\leq\lvert B_{i}\rvert^{-1/2}(4+2\lVert A\rVert_{\infty}).$

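The second case $r\notin B_{i}^{(n)}$ follows by similar reasoning: then no spin $X_{j}$ with $j\in B_{i}$ is flipped, only the second sum (now over all $j\in B_{i}$) appears, and we obtain $\mathfrak{h}_{r}f_{i}(X)\leq 2\lVert A\rVert_{\infty}\lvert B_{i}\rvert^{-1/2}$.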
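The computations for $E_{1}$ can be checked by brute force on a very small example. The sketch below assumes that $\widetilde{X}$ is obtained from $X$ by choosing a site $j$ uniformly at random and resampling $X_{j}$ from its conditional law given the other spins (a heat-bath step), with local field $g_{j}(X)=N^{-1}\sum_{l}A_{b(j)l}m_{l}(X)-N^{-1}A_{b(j)b(j)}X_{j}$, where $b(j)$ (a name introduced here) is the block of site $j$. The block layout and the matrix $A$ are hypothetical, and $\lVert A\rVert_{\infty}$ is read as the largest absolute entry of $A$. The code compares the closed form for $\operatorname{\mathbb{E}}((\hat{m}_{i}(X)-\hat{m}_{i}(\widetilde{X}))^{2}\mid\mathcal{F})$ with a direct evaluation, and checks $\mathfrak{h}_{r}f_{i}(X)\leq\lvert B_{i}\rvert^{-1/2}(4+2\lVert A\rVert_{\infty})$ for all $X$, $r$ and $i$.

```python
import itertools
import numpy as np

# Hypothetical small model: N = 6 spins in two blocks of size 3 (illustration only).
blocks = np.array([0, 0, 0, 1, 1, 1])       # blocks[j] = b(j)
A = np.array([[1.2, -0.4], [-0.4, 0.9]])
N, k = len(blocks), A.shape[0]
sizes = np.bincount(blocks, minlength=k)     # |B_i|


def local_field(x):
    """g_j(x) = N^{-1} sum_l A_{b(j) l} m_l(x) - N^{-1} A_{b(j) b(j)} x_j (no self-interaction)."""
    m = np.array([x[blocks == l].sum() for l in range(k)])
    return (A[blocks] @ m - A[blocks, blocks] * x) / N


def m_hat(x, i):
    """Normalized block magnetization |B_i|^{-1/2} sum_{j in B_i} x_j."""
    return x[blocks == i].sum() / np.sqrt(sizes[i])


def cond_second_moment(x, i):
    """E[(m_hat_i(X) - m_hat_i(X~))^2 | X = x], evaluated directly over j and the new spin."""
    g = local_field(x)
    total = 0.0
    for j in range(N):
        for s in (-1, 1):
            y = x.copy()
            y[j] = s
            p = np.exp(s * g[j]) / (2 * np.cosh(g[j]))  # heat-bath probability of X~_j = s
            total += p * (m_hat(x, i) - m_hat(y, i)) ** 2
    return total / N


def closed_form(x, i):
    """-2 (N |B_i|)^{-1} sum_{j in B_i} x_j tanh(g_j(x)) + 2 N^{-1}."""
    g = local_field(x)
    inside = blocks == i
    return -2 * np.sum(x[inside] * np.tanh(g[inside])) / (N * sizes[i]) + 2 / N


def f_block(x, i):
    """f_i(x) = |B_i|^{-1/2} sum_{j in B_i} x_j tanh(g_j(x))."""
    g = local_field(x)
    inside = blocks == i
    return np.sum(x[inside] * np.tanh(g[inside])) / np.sqrt(sizes[i])


configs = [np.array(c) for c in itertools.product((-1, 1), repeat=N)]
max_gap = max(abs(cond_second_moment(x, i) - closed_form(x, i))
              for x in configs for i in range(k))
# Largest value of h_r f_i(x) divided by the bound |B_i|^{-1/2} (4 + 2 max|A_il|).
worst = max(abs(f_block(x, i) - f_block(np.where(np.arange(N) == r, -x, x), i))
            * np.sqrt(sizes[i]) / (4 + 2 * np.abs(A).max())
            for x in configs for r in range(N) for i in range(k))
```

Here `max_gap` is at the level of rounding errors and `worst` stays below $1$.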

Error term $\mathbf{E_{2}}$ : The second term $E_{2}$ is much easier to estimate, as

\operatorname{\mathbb{E}}\lvert\hat{m}_{i}-\hat{m}_{i}^{\prime}\rvert^{3}=N^{-1}\lvert B_{i}\rvert^{-3/2}\sum_{j\in B_{i}}\operatorname{\mathbb{E}}\lvert X_{j}-\widetilde{X}_{j}\rvert^{3}\leq 8N^{-1}\lvert B_{i}\rvert^{-1/2}=O(N^{-3/2}).
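The two elementary facts behind this estimate are spelled out below. The factor $N^{-1}$ is the probability of resampling a given site $j$, the factor $\lvert B_{i}\rvert^{-3/2}$ comes from the normalization of $\hat{m}_{i}$, and $p_{i}\coloneqq\lvert B_{i}\rvert/N$ is notation introduced only here; it stays bounded away from $0$ because each block contains a fixed proportion of the vertices:

```latex
\lvert X_{j}-\widetilde{X}_{j}\rvert\in\{0,2\}
\;\Longrightarrow\;
\lvert X_{j}-\widetilde{X}_{j}\rvert^{3}\leq 8,
\qquad
8N^{-1}\lvert B_{i}\rvert^{-1/2}=8\,p_{i}^{-1/2}N^{-3/2}.
```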

Error term $\mathbf{E_{3}}$ : To estimate the variance of the remainder term $R$ we first split it into two sums. For any $i=1,\ldots,k$ write

	$\displaystyle R_{i}(X)$	$\displaystyle=N^{-1}\lvert B_{i}\rvert^{-1/2}\sum_{j\in B_{i}}\Big(g_{j}(X)-\tanh(g_{j}(X))+N^{-1}A_{ii}X_{j}\Big)$
		$\displaystyle=N^{-1}\lvert B_{i}\rvert^{-1/2}\sum_{j\in B_{i}}\big(g_{j}(X)-\tanh(g_{j}(X))\big)+N^{-2}A_{ii}\hat{m}_{i}(X)$
		$\displaystyle\eqqcolon R_{i}^{(1)}(X)+R_{i}^{(2)}(X).$

Clearly $\lVert R_{i}-\operatorname{\mathbb{E}}R_{i}\rVert_{2}\leq\lVert R_{i}^{(1)}-\operatorname{\mathbb{E}}R_{i}^{(1)}\rVert_{2}+\lVert R_{i}^{(2)}-\operatorname{\mathbb{E}}R_{i}^{(2)}\rVert_{2}$, and we estimate these terms separately. Since $\lVert\hat{m}_{i}\rVert_{2}=O(1)$ by the moment bound stated at the end of this proof, the $L^{2}$ norm of the second term is of order $O(N^{-2})$. To estimate $R^{(1)}_{i}$, we use $\lVert Y-\operatorname{\mathbb{E}}Y\rVert_{2}\leq\lVert Y\rVert_{2}$ and $\lvert x-\tanh(x)\rvert\leq\lvert x\rvert^{3}/3$ to obtain

	$\displaystyle\lVert R^{(1)}_{i}-\operatorname{\mathbb{E}}R^{(1)}_{i}\rVert_{2}$	$\displaystyle\leq CN^{-1}\lvert B_{i}\rvert^{-1/2}\sum_{j\in B_{i}}\lVert\lvert g_{j}(X)\rvert^{3}\rVert_{2}$
		$\displaystyle\leq CN^{-1}\lvert B_{i}\rvert^{-1/2}\sum_{j\in B_{i}}\Big(\lVert\lvert N^{-1/2}(A\Gamma\hat{m})_{i}\rvert^{3}\rVert_{2}+N^{-3}\lvert A_{ii}\rvert^{3}\Big)$
		$\displaystyle=O(N^{-2})+O(N^{-7/2}).$

In the last step we have used that $\lVert(A\Gamma\hat{m})_{i}^{3}\rVert_{2}=\lVert(A\Gamma\hat{m})_{i}\rVert_{6}^{3}$ and that for all $p\geq 2$

\lVert(A\Gamma\hat{m})_{i}\rVert_{p}\leq C\sum_{l=1}^{k}\lVert\hat{m}_{l}\rVert_{p}\leq Ck(\sigma^{2}p)^{1/2},

which for $p=6$ gives $\lVert(A\Gamma\hat{m})_{i}\rVert_{6}^{3}=O(1)$; for the details see [23]. The constant $C$ depends on a norm of $A\Gamma$, which can again be chosen independently of $n$, since $A\Gamma$ converges to $A\Gamma_{\infty}$. ∎
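The elementary bound $\lvert x-\tanh(x)\rvert\leq\lvert x\rvert^{3}/3$, which controls $g_{j}(X)-\tanh(g_{j}(X))$ in the estimate of $R^{(1)}_{i}$, follows from a monotonicity argument; the auxiliary function $\varphi$ is introduced only for this purpose:

```latex
\varphi(x)\coloneqq\frac{x^{3}}{3}-x+\tanh(x),\qquad
\varphi(0)=0,\qquad
\varphi^{\prime}(x)=x^{2}-1+\cosh^{-2}(x)=x^{2}-\tanh^{2}(x)\geq 0 .
```

Hence $0\leq x-\tanh(x)\leq x^{3}/3$ for $x\geq 0$ (the lower bound because $\tanh(x)\leq x$ there), and since both sides are odd functions, $\lvert x-\tanh(x)\rvert\leq\lvert x\rvert^{3}/3$ for all $x\in\operatorname{\mathbb{R}}$.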

Proof of Theorem 1.5.

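The theorem follows immediately from Corollary 3.4 and Lemma 3.5, since together they bound the right-hand side of Corollary 3.4 by $CN\cdot O(N^{-3/2})=O(N^{-1/2})$. ∎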

4. Discussion and open questions

Although the questions raised in the introduction have been answered to a certain degree, several questions remain open.

The first question concerns the maxima of the rate function $I$. Note first that, by [11, Theorem A.1], the global maxima of $I$ are related to the global minima of the so-called pressure functional, which can be found, for example, in [17, equation (14)]. By the compactness of $[-1,1]^{k}$ and the continuity of $I$, a maximiser exists, but the number of maximisers remains unclear. From the real analyticity of $I$ we can infer that the set of maximisers is a $\lambda^{k}$-null set, but in principle it could contain infinitely many points. However, Lemmas 2.5 and 2.6 as well as numerics suggest that for positive interactions and $k\geq 2$ the number of local minima is twice the number of independent systems; see Figure 2 below for the case $k=3$ and Figure 3 for the case $k=2$.

However, we believe that negative interactions between groups might change the picture drastically. Indeed, consider a model with three blocks, a positive interaction $\beta$ within the blocks and a negative interaction $\alpha$ between the blocks. If $\beta$ is large enough, the spins within each block tend to be aligned. However, as $\alpha$ is negative, the magnetizations of blocks one and two tend to have opposite signs, as do those of blocks two and three, and those of blocks three and one. Since three signs cannot be pairwise opposite, frustration occurs. In this respect, a model with positive and negative interactions carries features of a spin glass.
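The frustration can already be seen on the level of signs. The sketch below keeps only the signs $s_{1},s_{2},s_{3}\in\{-1,+1\}$ of the three block magnetizations (0-based in the code) and a hypothetical inter-block exponent $\alpha\sum_{\text{pairs}}s_{a}s_{b}$ with $\alpha<0$; this caricature ignores the intra-block interaction and the sizes of the magnetizations. It enumerates all eight sign patterns.

```python
import itertools

# Block pairs (0-based block indices) coupled by the negative interaction alpha.
PAIRS = [(0, 1), (1, 2), (2, 0)]


def opposite_pairs(signs):
    """Number of block pairs whose magnetization signs disagree."""
    return sum(signs[a] != signs[b] for a, b in PAIRS)


def inter_block_exponent(signs, alpha):
    """Hypothetical inter-block exponent alpha * sum over pairs of s_a * s_b.
    For alpha < 0 it is larger when the signs of a pair disagree."""
    return alpha * sum(signs[a] * signs[b] for a, b in PAIRS)


alpha = -1.0
patterns = list(itertools.product((-1, 1), repeat=3))
best_exponent = max(inter_block_exponent(s, alpha) for s in patterns)
maximizers = [s for s in patterns if inter_block_exponent(s, alpha) == best_exponent]
# The products s_0 s_1, s_1 s_2, s_2 s_0 multiply to (s_0 s_1 s_2)^2 = 1,
# so they can never all be -1: at best two of the three pairs disagree.
```

All six non-constant sign patterns are optimal, and each of them leaves one pair of blocks with equal signs.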

Another question is the relationship between Theorems 1.3 and 1.5. In Theorem 1.5 we consider the distance to a normal distribution with covariance matrix $\Sigma_{n}\coloneqq\operatorname{\mathbb{E}}\hat{m}^{(n)}(\hat{m}^{(n)})^{T}$ and not to $\Sigma_{\infty}\coloneqq(\operatorname{Id}-\Gamma_{\infty}A\Gamma_{\infty})^{-1}$, which is the covariance matrix of the limiting distribution. Testing against functions $h\in\mathcal{C}_{c}^{\infty}(\operatorname{\mathbb{R}}^{k})$, we see that $\Sigma_{\infty}$ is the limit of the matrices $\Sigma_{n}$. It is an interesting task to provide suitable bounds on $\lVert\Sigma_{n}-\Sigma_{\infty}\rVert$ in some matrix norm, since [30, Proposition 2.8] provides bounds on $\lvert\operatorname{\mathbb{E}}h(X)-\operatorname{\mathbb{E}}h(Y)\rvert$ for two random vectors $X\sim\mathcal{N}(0,\Sigma_{0})$ and $Y\sim\mathcal{N}(0,\Sigma_{1})$ in terms of the $1$-distance of $\Sigma_{0}$ and $\Sigma_{1}$.
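To get a feeling for the size of $\Sigma_{n}-\Sigma_{\infty}$, one can compute $\Sigma_{n}$ exactly for two blocks by summing over the possible block sums. The sketch below assumes the Gibbs weight $\exp\big((2N)^{-1}\sum_{k,l}A_{kl}m_{k}m_{l}\big)$, where $m_{l}$ is the (unnormalized) sum of the spins in block $l$, $\hat{m}_{l}=\lvert B_{l}\rvert^{-1/2}m_{l}$, and $\Gamma=\operatorname{diag}((\lvert B_{l}\rvert/N)^{1/2})$. This convention is consistent with the local fields $g_{j}$ in the proof of Lemma 3.5, but it is an assumption here, as are the symmetric example matrix and the block sizes.

```python
import numpy as np
from math import lgamma


def log_binom(n, r):
    """log of the binomial coefficient C(n, r)."""
    return lgamma(n + 1) - lgamma(r + 1) - lgamma(n - r + 1)


def exact_sigma_n(A, sizes):
    """Exact E[m_hat m_hat^T] for two blocks and symmetric A under the ASSUMED Gibbs weight
    exp((2N)^{-1} sum_{k,l} A_kl m_k m_l), m_l = sum of the spins in block l."""
    n1, n2 = sizes
    N = n1 + n2
    s1 = np.arange(-n1, n1 + 1, 2)  # attainable values of m_1
    s2 = np.arange(-n2, n2 + 1, 2)  # attainable values of m_2
    # log of the number of spin configurations with block sums (m_1, m_2)
    log_mult = (np.array([log_binom(n1, (n1 + s) // 2) for s in s1])[:, None]
                + np.array([log_binom(n2, (n2 + s) // 2) for s in s2])[None, :])
    M1, M2 = np.meshgrid(s1, s2, indexing="ij")
    exponent = (A[0, 0] * M1**2 + 2 * A[0, 1] * M1 * M2 + A[1, 1] * M2**2) / (2 * N)
    log_w = log_mult + exponent
    w = np.exp(log_w - log_w.max())  # shift by the maximum to avoid overflow
    w /= w.sum()
    # normalized block magnetizations m_hat_l = m_l / sqrt(|B_l|) on the grid
    h = np.stack([M1 / np.sqrt(n1), M2 / np.sqrt(n2)])
    return np.einsum("ab,iab,jab->ij", w, h, h)


def sigma_infinity(A, sizes):
    """(Id - Gamma A Gamma)^{-1} with the ASSUMED Gamma = diag(sqrt(|B_l| / N))."""
    p = np.asarray(sizes, dtype=float)
    Gamma = np.diag(np.sqrt(p / p.sum()))
    return np.linalg.inv(np.eye(len(p)) - Gamma @ A @ Gamma)


A = np.array([[0.6, 0.3], [0.3, 0.6]])  # hypothetical interaction matrix
sizes = (400, 400)                       # hypothetical block sizes, N = 800
S_n = exact_sigma_n(A, sizes)
S_inf = sigma_infinity(A, sizes)
gap = np.abs(S_n - S_inf).max()
```

For this choice the entries of $\Sigma_{n}$ and $\Sigma_{\infty}$ differ by less than $0.05$. Such a computation only illustrates the convergence for one example; it does not provide the quantitative bound asked for above.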

Thirdly, it remains an open problem to quantify the distance to a normal distribution with the “limiting” covariance matrix $\Sigma_{\infty}$. In the one-dimensional Curie–Weiss model this problem has been solved, for example, in [14, Corollary 2.9], where the limiting variance $(1-\beta)^{-1}$ can be read off from the approximate linear regression condition. A similar condition holds in the multidimensional case: in Lemma 3.2 we have proven

(4.1)

\operatorname{\mathbb{E}}\left(\hat{m}^{(n)}-\hat{m}^{(n)}\,{}^{\prime}\mid\mathcal{F}\right)=\lambda\Lambda^{-1}\hat{m}^{(n)}+R(X),

where $\lambda=N^{-1}$ and $\Lambda=(\operatorname{Id}-\Gamma_{n}A\Gamma_{n})^{-1}$ (note that this $\Lambda$ is normalized differently from the matrix $\Lambda$ introduced before Theorem 3.3). Thus, if $\Gamma_{n}\equiv\Gamma_{\infty}$ (e.g. along a subsequence on which this holds), $\Lambda$ is the covariance matrix of the limit distribution. However, we have been unable to find a suitable modification of [30, Theorem 2.1] that allows one to compare the distribution of the random vector $\hat{m}^{(n)}$ with $\mathcal{N}(0,\Lambda)$.
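In the one-block Curie–Weiss case ($k=1$, $A=\beta$, $\Gamma=1$) the approximate linear regression (4.1) can be checked directly, since the linear part is $N^{-1}(1-\beta)\hat{m}$. The sketch below assumes that a uniformly chosen spin is resampled from its conditional law with local field $\beta(m-x_{j})/N$, where $m$ is the spin sum and $\hat{m}=m/\sqrt{N}$; the system size and $\beta$ are hypothetical. It evaluates the conditional drift in closed form and compares it with the linear part.

```python
import numpy as np


def cw_drift(m, N, beta):
    """E[m_hat - m_hat' | X] in the Curie-Weiss case (k = 1, A = beta), as a function of
    the spin sum m. ASSUMED dynamics: a uniformly chosen spin j is resampled from its
    conditional law, whose mean is tanh(g_j) with g_j = beta * (m - x_j) / N."""
    n_plus, n_minus = (N + m) / 2, (N - m) / 2  # numbers of +1 and -1 spins
    tanh_sum = n_plus * np.tanh(beta * (m - 1) / N) + n_minus * np.tanh(beta * (m + 1) / N)
    # m_hat - m_hat' = (x_J - X~_J) / sqrt(N); average over J and over X~_J
    return (m - tanh_sum) / (N * np.sqrt(N))


def linear_part(m, N, beta):
    """lambda * Lambda^{-1} m_hat from (4.1) with k = 1, i.e. (1 - beta) * m_hat / N."""
    return (1 - beta) * (m / np.sqrt(N)) / N


N, beta = 10_000, 0.5          # hypothetical system size and interaction
ms = np.arange(-400, 401, 2)   # spin sums of order sqrt(N) (N is even, so m is even)
remainder = cw_drift(ms, N, beta) - linear_part(ms, N, beta)
```

A Taylor expansion of $\tanh$ suggests that in this setting the remainder behaves like $\beta mN^{-5/2}+\beta^{3}m^{3}N^{-7/2}/3$, i.e. it is $O(N^{-2})$ for $m$ of order $\sqrt{N}$, whereas the linear part is of order $N^{-1}$.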

References

[1] E. Agliari, R. Burioni, and P. Contucci. A diffusive strategic dynamics for social systems. J. Stat. Phys., 139(3):478–491, 2010.
[2] A. A. Amini and E. Levina. On semidefinite relaxations for the block model. Ann. Statist., 46(1):149–179, 2018.
[3] Q. Berthet, P. Rigollet, and P. Srivastava. Exact recovery in the Ising blockmodel. Ann. Statist., 47(4):1805–1834, 2019.
[4] G. Bresler. Efficiently learning Ising models on arbitrary graphs [extended abstract]. In STOC’15—Proceedings of the 2015 ACM Symposium on Theory of Computing, pages 771–782. ACM, New York, 2015.
[5] G. Bresler, E. Mossel, and A. Sly. Reconstruction of Markov random fields from samples: some observations and algorithms. SIAM J. Comput., 42(2):563–578, 2013.
[6] W. A. Brock and S. N. Durlauf. Discrete choice with social interactions. Rev. Econom. Stud., 68(2):235–260, 2001.
[7] S. Chatterjee and Q.-M. Shao. Nonnormal approximation by Stein’s method of exchangeable pairs with application to the Curie-Weiss model. Ann. Appl. Probab., 21(2):464–483, 2011.
[8] F. Collet. Macroscopic limit of a bipartite Curie-Weiss model: a dynamical approach. J. Stat. Phys., 157(6):1301–1319, 2014.
[9] F. Comets. Large deviation estimates for a conditional probability distribution. Applications to random interaction Gibbs measures. Probab. Theory Related Fields, 80(3):407–432, 1989.
[10] R. Cont and M. Löwe. Social distance, heterogeneity and social interactions. J. Math. Econom., 46(4):572–590, 2010.
[11] M. Costeniuc, R. S. Ellis, and H. Touchette. Complete analysis of phase transitions and ensemble equivalence for the Curie–Weiss–Potts model. J. Math. Phys., 46(6):063301, 2005.
[12] A. Dembo and O. Zeitouni. Large deviations techniques and applications, volume 38 of Stochastic Modelling and Applied Probability. Springer-Verlag, Berlin, 2010.
[13] F. den Hollander. Large deviations, volume 14 of Fields Institute Monographs. American Mathematical Society, Providence, RI, 2000.
[14] P. Eichelsbacher and M. Löwe. Stein’s method for dependent random variables occurring in statistical mechanics. Electron. J. Probab., 15:no. 30, 962–988, 2010.
[15] R. S. Ellis. Entropy, large deviations, and statistical mechanics. Classics in Mathematics. Springer-Verlag, Berlin, 2006.
[16] R. S. Ellis and C. M. Newman. Limit theorems for sums of dependent random variables occurring in statistical mechanics. Z. Wahrsch. Verw. Gebiete, 44(2):117–139, 1978.
[17] M. Fedele and P. Contucci. Scaling limits for multi-species statistical mechanics mean-field models. J. Stat. Phys., 144(6):1186–1205, 2011.
[18] M. Fedele and F. Unguendoli. Rigorous results on the bipartite mean-field model. J. Phys. A, 45(38):385001, 18, 2012.
[19] I. Gallo, A. Barra, and P. Contucci. Parameter evaluation of a simple mean-field model of social interaction. Math. Models Methods Appl. Sci., 19(suppl.):1427–1439, 2009.
[20] I. Gallo and P. Contucci. Bipartite mean field spin systems. Existence and solution. Math. Phys. Electron. J., 14:Paper 1, 21, 2008.
[21] C. Gao, Z. Ma, A. Y. Zhang, and H. H. Zhou. Achieving optimal misclassification proportion in stochastic block models. J. Mach. Learn. Res., 18:Paper No. 60, 45, 2017.
[22] B. Gentz and M. Löwe. The fluctuations of the overlap in the Hopfield model with finitely many patterns at the critical temperature. Probab. Theory Related Fields, 115(3):357–381, 1999.
[23] F. Götze, H. Sambale, and A. Sinulis. Higher order concentration for functions of weakly dependent random variables. Electron. J. Probab., 24:no. 85, 19 pp, 2019.
[24] J. M. Kincaid and E. G. D. Cohen. Phase diagrams of liquid helium mixtures and metamagnets: Experiment and mean field theory. Physics Reports, 22(2):57–143, 1975.
[25] W. Kirsch and G. Toth. Two groups in a Curie-Weiss model with heterogeneous coupling. Journal of Theoretical Probability, 2019.
[26] H. Knöpfel and M. Löwe. Zur Meinungsbildung in einer heterogenen Bevölkerung—ein neuer Zugang zum Hopfield Modell. Math. Semesterber., 56(1):15–38, 2009.
[27] M. Löwe and K. Schubert. Fluctuations for block spin Ising models. Electron. Commun. Probab., 23:Paper No. 53, 12, 2018.
[28] E. Mossel, J. Neeman, and A. Sly. Belief propagation, robust reconstruction and optimal recovery of block models. Ann. Appl. Probab., 26(4):2211–2256, 2016.
[29] A. A. Opoku, K. Owusu Edusei, and R. K. Ansah. A conditional Curie-Weiss model for stylized multi-group binary choice with social interaction. J. Stat. Phys., 171(1):106–126, 2018.
[30] G. Reinert and A. Röllin. Multivariate normal approximation with Stein’s method of exchangeable pairs under a general linearity condition. Ann. Probab., 37(6):2150–2173, 2009.
[31] J. L. van Hemmen, D. Grensing, A. Huber, and R. Kühn. Elementary solution of classical spin-glass models. Z. Phys. B, 65(1):53–63, 1986.
[32] J. L. van Hemmen, A. C. D. van Enter, and J. Canisius. On a classical spin glass model. Z. Phys. B, 50(4):311–336, 1983.