a Dipartimento di Matematica “Guido Castelnuovo”, Sapienza Università di Roma, Roma, Italy
b GNFM-INdAM, Gruppo Nazionale di Fisica Matematica, Istituto Nazionale di Alta Matematica, Lecce, Italy

Non-linear PDEs approach to statistical mechanics of Dense Associative Memories

Elena Agliari a,b    Alberto Fachechi a,b    Chiara Marullo agliari@mat.uniroma1.it
Abstract

Dense associative memories (DAMs) are widespread models in artificial intelligence used for pattern recognition tasks. Computationally, they have been proven to be robust against adversarial inputs; theoretically, leveraging their analogy with spin-glass systems, they are usually treated by means of statistical-mechanics tools. Here we develop analytical methods, based on nonlinear PDEs, to investigate their functioning. In particular, we prove differential identities involving the DAM partition function and macroscopic observables that are useful for a qualitative and quantitative analysis of the system. These results allow for a deeper comprehension of the mechanisms underlying DAMs and provide interdisciplinary tools for their study.

Keywords:
$p$-spin, Statistical mechanics, Burgers hierarchy, Nonlinear systems, Mean-field theory, PDE

1 Introduction

Artificial intelligence (AI) is rapidly changing the face of our society thanks to its impressive abilities in accomplishing complex tasks and extracting information from non-trivially structured, high-dimensional datasets. The successful applications of modern AI range from hard sciences and technology to more practical scenarios (e.g. medical sciences, economics and finance, and many daily tasks). Its important success is primarily due to the ascent of deep learning LeCun2015 ; Schmidhuber2015 , a set of semi-heuristic techniques consisting in stacking together several minimal building blocks in complex architectures with extremely-high learning performances. Despite its success, a rigorous theoretical framework guiding the development of such machine-learning architectures is still lacking. In this context, statistical mechanics of complex systems offers ideal tools to study neural network models from a more theoretical (and rigorous) point of view, thus drawing a feasible path which makes AI less empirical and more explainable.

In statistical mechanics, a primary classification of physical systems is the following. On the one side, we have simple systems, which are essentially characterized by the fact that the number of equilibrium configurations does not depend on the system size $N$. A paradigmatic (mean-field) realization of this situation is the Curie-Weiss (CW) model, in which all the spins $\sigma_{i}$, $i=1,\dots,N$, making up the system interact pairwise through a constant, positive (i.e. ferromagnetic) coupling $J$. Below the critical temperature, in fact, the system exhibits ordered collective behaviors, and the equilibrium configurations of the system are characterized by only two possible values of the global magnetization $m(\boldsymbol{\sigma}):=\frac{1}{N}\sum_{i=1}^{N}\sigma_{i}$ (which are further related by the spin-flip symmetry $\sigma_{i}\to-\sigma_{i}$ for each $i=1,\dots,N$). On the opposite side, we have complex systems, in which the number of equilibrium configurations increases according to an appropriate function of the system size $N$ due to the presence of frustrated interactions mezard1988spin . The prototypical example of a mean-field spin glass is the Sherrington-Kirkpatrick (SK) model sherrington1975 , in which the interaction strengths between the spin pairs are i.i.d. Gaussian variables. With respect to simple systems, spin-glass models exhibit a richer physical and mathematical structure, as shown by the presence of spontaneous replica-symmetry breaking and an infinite number of phase transitions (e.g. see Parisi1979rsb1 ; Parisi1979rsb2 ; Parisi1980rsb3 ; Parisi1980sequence ; mezard1984replica ; guerra2003broken ; ghirlanda1998general ; talagrand2000rsb ) as well as the ultrametric organization of pure states (e.g. see panchenko2010ultra ; panchenko2011ultra ; panchenko2013parisi ).
Statistical mechanics of spin glasses has acquired a prominent role during the last decades due to its ability to describe the equilibrium dynamics of several paradigmatic models for AI, in particular thanks to the work by Amit, Gutfreund, and Sompolinsky amit1985 for associative neural networks. For our concerns, the relevant ones are the Hopfield model hopfield1982hopfield ; pastur1977exactly and its $p$-spin extensions, the Dense Associative Memories (DAMs) Krotov2016DenseAM ; Krotov2018DAMS ; AD-EPJ2020 , exhibiting features typical of both ferromagnetic (simple) and spin-glass (complex) systems. In these models, the interactions between the spins are designed in order to store $K$ “information patterns”, each encoded by a binary vector of length $N$; the patterns are denoted by $\{\boldsymbol{\xi}^{\mu}\}_{\mu=1,\dots,K}$, with $\boldsymbol{\xi}^{\mu}=(\xi_{1}^{\mu},\xi_{2}^{\mu},\dots,\xi_{N}^{\mu})\in\{-1,+1\}^{N}$, where each $\xi^{\mu}_{i}$ is a Rademacher random variable for any $i=1,\dots,N$ and $\mu=1,\dots,K$. The $\mu$-th pattern is said to be stored if the configuration $\boldsymbol{\sigma}=\boldsymbol{\xi}^{\mu}$ is an equilibrium state, and the relaxation to this configuration, starting from a relatively close one (i.e. a corrupted version of $\boldsymbol{\xi}^{\mu}$), is interpreted as the retrieval of that pattern. The Hamilton function (or the energy in physics jargon) of these systems can be expressed as

$$H_{N,p}(\boldsymbol{\sigma})\propto-\sum_{\mu=1}^{K}(m_{\mu}(\boldsymbol{\sigma}))^{p},$$

where $p$ is the interaction order (for the Hopfield model $p=2$, while $p>2$ for the DAMs) and $m_{\mu}(\boldsymbol{\sigma}):=\frac{1}{N}\sum_{i=1}^{N}\xi^{\mu}_{i}\sigma_{i}$ are the so-called Mattis magnetizations, measuring the retrieval of the $\mu$-th pattern. It has been shown that the number of storable patterns is bounded by a function of the system size; more precisely, $K<\alpha_{c}(p)N^{p-1}$, where $\alpha_{c}(p)\in\mathbb{R}$ depends on the interaction order $p$ and is referred to as the critical storage capacity baldi1987number ; bovier2001spin . By a statistical-mechanics investigation of these models one can highlight the macroscopic observables (order parameters) useful to describe the overall behavior of the system, namely to assess whether it exhibits retrieval capabilities, and the natural control parameters whose tuning can qualitatively change the system behavior; such knowledge can then be summarized in a phase diagram.
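As an illustration of the definitions above, the following minimal Python sketch computes the Mattis magnetizations and a DAM-type energy for a corrupted pattern. The overall prefactor $N$ in the energy is an assumption made here for concreteness (the text only fixes the energy up to proportionality), and the function names are hypothetical.

```python
import random

def mattis_magnetizations(sigma, patterns):
    """Return m_mu = (1/N) * sum_i xi_i^mu * sigma_i for every pattern."""
    n = len(sigma)
    return [sum(x * s for x, s in zip(xi, sigma)) / n for xi in patterns]

def dam_energy(sigma, patterns, p):
    """Energy -N * sum_mu m_mu^p (the prefactor N is an assumed normalization)."""
    n = len(sigma)
    return -n * sum(m ** p for m in mattis_magnetizations(sigma, patterns))

rng = random.Random(0)
N, K, p = 50, 3, 4
# Rademacher patterns: each entry is +1 or -1 with equal probability.
patterns = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(K)]

# Corrupt pattern 0 by flipping 5 distinct spins.
sigma = list(patterns[0])
for i in rng.sample(range(N), 5):
    sigma[i] = -sigma[i]

m = mattis_magnetizations(sigma, patterns)  # m[0] measures retrieval of pattern 0
```

Flipping $k$ spins of a pattern lowers its Mattis magnetization from $1$ to $1-2k/N$, which is how the distance of a corrupted input from the stored pattern shows up in this observable.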

Regarding the methods, statistical mechanics offers a wide set of techniques for analyzing the equilibrium dynamics of complex systems, and in particular for solving for their free energy. Historically, the first method (which was applied to the SK model and the Hopfield model amit1985 ; steffan1994replica ) is the replica trick, which, despite being straightforward and effective, is semi-heuristic and suffers from delicate points, see for example tanaka2007moment . Alternative, rigorous approaches were developed over the years and, among these, the relevant one for our concerns is Guerra’s interpolating framework. In this case, we can take advantage of rigorous mathematical methods by applying sum rules guerra2001sum or by mapping the relevant quantities (the free energy or the model order parameters) of the statistical setting to the solutions of PDE systems. Indeed, differential equations involving the partition functions (or related quantities) of thermodynamic models have been extensively investigated in the literature, see for example agliari2012notes ; barra2013mean ; barra2008mean ; barra2010replica ; barra2014proc ; Moro2018annals ; AMT-JSP2018 ; Moro2019PRE ; ABN-JMP2019 ; fachechi2021pde ; AAAF-JPA2021 . In particular, they allow us to express the equation of state (or the self-consistency equations) governing the equilibrium dynamics of the system in terms of solutions of non-linear differential equations, and to describe phase transition phenomena as the development of shock waves, thus linking critical behaviours to gradient catastrophe theory Barra2015Annals ; DeNittis2012PrsA ; Giglio2016Physica ; Moro2014Annals . In a recent work fachechi2021pde , a direct connection between the thermodynamics of ferromagnetic models with interactions of order $p$ and the equations of the Burgers hierarchy was established by identifying the solution of the latter with the equilibrium value of the order parameter of the former (i.e. the global magnetization $m$).
In the present paper, we extend these results to complex models, in particular to the Hopfield model and the DAMs.

The paper is organised as follows. In Section 2, we introduce the relevant tools for our investigations, in particular Guerra’s interpolating scheme for the PDE duality. In Section 3, as a warm up, we review some basic results about the $p$-spin ferromagnetic models. In Section 4, we extend our results to the Derrida models (constituting the $p$-spin extension of the SK spin glass) derrida1981rem . In Section 5, we merge our results in a unified methodology for dealing with the DAMs, especially in the so-called high storage limit, and re-derive the self-consistency equations for the order parameters by means of PDE technology.

2 Generalities and notation

In this Section, we present the thermodynamic objects we aim to study. We start with a system made up of $N$ spins, whose configurations $\boldsymbol{\sigma}\in\Sigma_{N}\equiv\{-1,+1\}^{N}$ are the nodes of a hypercube, interacting via a suitable tensor $\boldsymbol{J}$ of order $p$. The Hamilton functions of the systems we will consider in this paper are of the form

$$H_{N,p,\boldsymbol{J}}(\boldsymbol{\sigma})=-\frac{1}{D_{p,N,J}}\sum_{i_{1},\dots,i_{p}=1}^{N}J_{i_{1},i_{2},\dots,i_{p}}\sigma_{i_{1}}\sigma_{i_{2}}\dots\sigma_{i_{p}}, \tag{1}$$

where $D_{p,N,J}$ is a normalization factor ensuring the linear extensivity of the energy with the system size. Once the Hamiltonian is fixed, we introduce the partition function in the usual Boltzmann-Gibbs form. Thus, given the level of thermal noise $\beta\in\mathbb{R}_{+}$ of the system, the partition function is defined as

$$Z_{N,p,\boldsymbol{J}}(\beta):=\sum_{\boldsymbol{\sigma}\in\Sigma_{N}}\exp\left[-\beta H_{N,p,\boldsymbol{J}}(\boldsymbol{\sigma})\right]. \tag{2}$$

For simple systems, the partition function can be computed exactly for any coupling tensor $\boldsymbol{J}$. As is standard in statistical mechanics, it is convenient to compute intensive quantities which are well-defined in the thermodynamic limit $N\to\infty$. Since the partition function is a sum of $2^{N}$ contributions, it is sufficient to take the intensive logarithm of the partition function, i.e.

$$A_{N,p,\boldsymbol{J}}(\beta):=\frac{1}{N}\log Z_{N,p,\boldsymbol{J}}(\beta),$$

which is the intensive statistical pressure of the system (that is, apart from a factor $-\beta$, the usual free energy). However, when dealing with spin-glass systems, the coupling tensor $\boldsymbol{J}$ is a multidimensional random variable, thus the partition function defines a random measure on the configuration space. For good enough probability distributions of the coupling tensor, the intensive logarithm of the partition function is expected to converge to its expectation value in the thermodynamic limit $N\to\infty$ by virtue of self-averaging theorems ShcherbinaPastur-JSP1991 ; Bovier-JPA1994 , so it is natural to consider the quenched intensive pressure associated to the partition function (2), which is defined as

$$A_{N,p}(\beta):=\frac{1}{N}\mathbb{E}_{\boldsymbol{J}}\log Z_{N,p,\boldsymbol{J}}(\beta), \tag{3}$$

where $\mathbb{E}_{\boldsymbol{J}}$ denotes the average over the quenched disorder $\boldsymbol{J}$ (we stress that, in this case, the free energy no longer depends on the coupling tensor because of the average operation).
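For very small systems, the partition function (2) and the intensive pressure can be evaluated by brute-force enumeration of the $2^{N}$ configurations. The sketch below does so for a generic Hamiltonian passed as a Python callable; the helper names are hypothetical, and the ferromagnetic $p$-spin energy $-N m(\boldsymbol{\sigma})^{p}$ is used only as a convenient test case. For a disordered model, the quenched pressure (3) would additionally require averaging the returned value over independent samples of $\boldsymbol{J}$, which is not done here.

```python
import itertools
import math

def pressure(hamiltonian, n, beta):
    """A_N = (1/N) log sum_sigma exp(-beta H(sigma)), enumerating all 2^N states."""
    z = sum(math.exp(-beta * hamiltonian(sigma))
            for sigma in itertools.product((-1, 1), repeat=n))
    return math.log(z) / n

def p_spin_ferromagnet(p):
    """Return H(sigma) = -N m(sigma)^p, i.e. constant couplings with J = 1."""
    def h(sigma):
        n = len(sigma)
        return -n * (sum(sigma) / n) ** p
    return h

N = 10
a_hot = pressure(p_spin_ferromagnet(2), N, beta=0.0)   # infinite noise: log 2
a_cold = pressure(p_spin_ferromagnet(2), N, beta=1.0)  # interactions raise the pressure
```

At $\beta=0$ every configuration has unit weight, so the pressure reduces to $\log 2$, which gives a quick sanity check of the enumeration.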

Rather than working with the quantity (2), we will use as a fundamental object Guerra’s interpolated partition function and its associated interpolating intensive pressure. For instance, for spin-glass systems we would have

$$\begin{split}Z_{N,p,\boldsymbol{J}}(t,\boldsymbol{x})&:=\sum_{\boldsymbol{\sigma}\in\Sigma_{N}}\exp\left[-H_{N,p,\boldsymbol{J}}(t,\boldsymbol{x})\right],\\ A_{N,p}(t,\boldsymbol{x})&:=\frac{1}{N}\mathbb{E}_{\boldsymbol{J}}\log Z_{N,p,\boldsymbol{J}}(t,\boldsymbol{x}),\end{split} \tag{4}$$

where $H_{N,p,\boldsymbol{J}}(t,\boldsymbol{x})$ denotes the interpolating Hamiltonian, satisfying the following properties: at $\boldsymbol{x}=\boldsymbol{0}$ and $t\neq 0$, it recovers the Hamiltonian (1) times $\beta$, while at $t=0$ and $\boldsymbol{x}\neq\boldsymbol{0}$ it corresponds to an exactly-solvable $1$-body system, namely a system where spins interact only with an external field that has to be set a posteriori. The interpolating parameters $t$ and $\boldsymbol{x}$ are interpreted, in a mechanical analogy, as spacetime coordinates with suitable dimensionality.

The interpolating structure (4) implies a generalized measure, whose related Boltzmann factor is

$$B_{N,p,\boldsymbol{J}}(t,\boldsymbol{x}):=\exp\left[-H_{N,p,\boldsymbol{J}}(t,\boldsymbol{x})\right]. \tag{5}$$

Thus, for an arbitrary observable $O(\boldsymbol{\sigma})$ in the configuration space $\Sigma_{N}$, we can introduce the Boltzmann average induced by the partition function (4) as

$$\omega_{t,\boldsymbol{x}}(O):=\frac{1}{Z_{N,p,\boldsymbol{J}}(t,\boldsymbol{x})}\sum_{\boldsymbol{\sigma}\in\Sigma_{N}}O(\boldsymbol{\sigma})B_{N,p,\boldsymbol{J}}(t,\boldsymbol{x}). \tag{6}$$

Usually, in spin-glass systems, the quenched average is performed after taking the Boltzmann expectation values on the $s$-replicated space $\Sigma_{N}^{(s)}=(\Sigma_{N})^{\otimes s}\equiv\{-1,+1\}^{sN}$, which is naturally endowed with a random Gibbs measure corresponding to the partition function $Z_{N,p,\boldsymbol{J}}^{(s)}(t,\boldsymbol{x})=Z_{N,p,\boldsymbol{J}}(t,\boldsymbol{x})^{s}$. Given a function $O:\Sigma_{N}^{(s)}\to\mathbb{R}$, the Boltzmann average in the $s$-replicated space is straightforwardly defined as

$$\Omega_{t,\boldsymbol{x}}^{(s)}(O):=\frac{1}{Z_{N,p,\boldsymbol{J}}^{(s)}(t,\boldsymbol{x})}\sum_{\underline{\boldsymbol{\sigma}}\in\Sigma_{N}^{(s)}}O(\underline{\boldsymbol{\sigma}})B_{N,p,\boldsymbol{J}}^{(s)}(t,\boldsymbol{x}),$$

where $\underline{\boldsymbol{\sigma}}\in\Sigma_{N}^{(s)}$ is the global configuration of the replicated system, and $B_{N,p,\boldsymbol{J}}^{(s)}(t,\boldsymbol{x})$ is the Boltzmann factor associated to the $s$-replicated partition function. Of course, in spin-glass theory, the relevant quantities are the quenched expectation values, which are defined as

$$\langle O\rangle_{t,\boldsymbol{x}}:=\mathbb{E}_{\boldsymbol{J}}\Omega_{t,\boldsymbol{x}}^{(s)}(O). \tag{7}$$

For the sake of simplicity, we dropped the index $s$ from the quenched averages, as it will be clear from the context.

With all these definitions in mind, we are then able to find the link between the resolution of the statistical mechanics of a given spin-like model and a specific PDE problem in the fictitious space $(t,\boldsymbol{x})$. Before concluding this section it is worth recalling that here we will work under the replica-symmetry (RS) assumption, meaning that we assume the self-averaging property for any order parameter $X$, i.e. the fluctuations around its expectation value vanish in the thermodynamic limit. In the distributional sense, this corresponds to

$$\lim_{N\to\infty}\mathcal{P}_{t,\boldsymbol{x}}(X)=\delta(X-\bar{X}), \tag{8}$$

where $\bar{X}=\langle X\rangle_{t,\boldsymbol{x}}$ is the expectation value w.r.t. the interpolating measure $\mathcal{P}_{t,\boldsymbol{x}}(X)$. Typically, for simple systems this assumption is correct; conversely, for complex systems this is not always the case: for instance, in spin glasses the RS is broken at low temperature mezard1988spin . When dealing with neural-network models, RS constitutes a standard working assumption as it usually applies (at least) in a limited region of the parameter space, while elsewhere it yields only small quantitative discrepancies with respect to the exact solution amit1989 ; SpecialIssue-JPA . The latter, accounting for RSB phenomena, can be obtained by iteratively perturbing the RS interpolation scheme (e.g. see AABO-JPA2020 ; AAAF-JPA2021 ; albanese2021 ). Thus, our results find direct application on the practical side and provide the starting point for further refinements on the theoretical side.

3 $p$-spin ferromagnetic models: how to deal with simple systems

The present section is a compendium of the results reported in fachechi2021pde , so we refer to that work for a detailed derivation. In $p$-spin ferromagnets, the interaction between spins is fixed by the requirement $J_{i_{1},i_{2},\dots,i_{p}}=J$ for each $i_{1},\dots,i_{p}=1,\dots,N$ and $J>0$; without loss of generality, one can set $J=1$, since it corresponds to a rescaling of the thermal noise. Thus, the Hamilton function of the model simply reads as

$$H_{N,p}(\boldsymbol{\sigma}):=-\frac{1}{N^{p-1}}\sum_{i_{1},\dots,i_{p}=1}^{N}\sigma_{i_{1}}\dots\sigma_{i_{p}}=-N(m(\boldsymbol{\sigma}))^{p}, \tag{9}$$

with

$$m(\boldsymbol{\sigma}):=\frac{1}{N}\sum_{i}\sigma_{i},$$

being the global magnetization of the system. Following the same lines as fachechi2021pde , Guerra’s interpolating partition function reads as

$$Z_{N,p}(t,x)=\sum_{\boldsymbol{\sigma}\in\Sigma_{N}}\exp\left(-H_{N,p}(t,x)\right), \tag{10}$$
$$H_{N,p}(t,x)=tNm(\boldsymbol{\sigma})^{p}-Nxm(\boldsymbol{\sigma}), \tag{11}$$

where $(t,x)\in\mathbb{R}^{2}$. The starting point is to notice that the interpolating statistical pressure associated to the partition function (10) has spacetime derivatives

$$\partial_{t}A_{N,p}(t,x)=-\omega_{t,x}(m(\boldsymbol{\sigma})^{p}), \tag{12}$$
$$\partial_{x}A_{N,p}(t,x)=\omega_{t,x}(m(\boldsymbol{\sigma})). \tag{13}$$

The expectation value of monomials of the global magnetization satisfies the following relation fachechi2021pde :

$$\partial_{x}\omega_{t,x}(m(\boldsymbol{\sigma})^{s})=N\left(\omega_{t,x}(m(\boldsymbol{\sigma})^{s+1})-\omega_{t,x}(m(\boldsymbol{\sigma})^{s})\,\omega_{t,x}(m(\boldsymbol{\sigma}))\right). \tag{14}$$

This means that we can act on the expectation value $\omega_{t,x}(m(\boldsymbol{\sigma}))$ to generate higher moments. In particular, calling $u(t,x)=\omega_{t,x}(m(\boldsymbol{\sigma}))$ and applying (14) iteratively up to $s=p-1$, we directly get the Burgers hierarchy

$$\partial_{t}u(t,x)+\partial_{x}\left(\frac{1}{N}\partial_{x}+u(t,x)\right)^{p-1}u(t,x)=0. \tag{15}$$
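Relation (14) can be checked numerically at finite $N$. The following sketch (with hypothetical helper names) computes the Boltzmann average $\omega_{t,x}$ exactly by grouping the $2^{N}$ configurations according to the number $k$ of up spins, so that $m=(2k-N)/N$ with binomial degeneracy, and compares a central finite difference in $x$ with the right-hand side of (14). The parameter values are arbitrary choices made for illustration.

```python
import math

def omega(f, n, p, t, x):
    """Boltzmann average of f(m) with weight exp(-t N m^p + N x m), cf. (10)-(11)."""
    num = 0.0
    den = 0.0
    for k in range(n + 1):                      # k = number of up spins
        m = (2 * k - n) / n
        w = math.comb(n, k) * math.exp(-t * n * m ** p + n * x * m)
        num += f(m) * w
        den += w
    return num / den

N, p, s = 12, 3, 2
t, x, h = -0.3, 0.2, 1e-5

# Left-hand side of (14): central finite difference in x.
lhs = (omega(lambda m: m ** s, N, p, t, x + h)
       - omega(lambda m: m ** s, N, p, t, x - h)) / (2 * h)

# Right-hand side of (14): connected correlation between m^s and m, times N.
rhs = N * (omega(lambda m: m ** (s + 1), N, p, t, x)
           - omega(lambda m: m ** s, N, p, t, x) * omega(lambda m: m, N, p, t, x))
```

The two sides agree up to the finite-difference error, since differentiating the weight with respect to $x$ simply brings down a factor $N m(\boldsymbol{\sigma})$.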

This duality also allows us to analyse the thermodynamic limit, corresponding to the inviscid scenario for the Burgers hierarchy. Indeed, setting $\bar{u}(t,x)=\lim_{N\to\infty}u(t,x)=\lim_{N\to\infty}\omega_{t,x}(m(\boldsymbol{\sigma}))$, we have the initial value problem

$$\begin{cases}\partial_{t}\bar{u}(t,x)+p\,\bar{u}(t,x)^{p-1}\partial_{x}\bar{u}(t,x)=0,\\ \bar{u}(0,x)=\tanh(x),\end{cases} \tag{16}$$

where the initial profile is easily computed by straightforward calculations (since it is a $1$-body problem). This system describes the propagation of non-linear waves, and can be solved by assuming a solution in implicit form $\bar{u}(t,x)=\tanh(x-v(t,x)t)$, where $v(t,x)=p\,\bar{u}(t,x)^{p-1}$ is the effective velocity. Recalling that the thermodynamics of the original $p$-spin model associated to the Hamilton function (9) is recovered by setting $t=-\beta$ and $x=0$, we directly obtain

$$\bar{m}=\tanh(\beta p\,\bar{m}^{p-1}), \tag{17}$$

where $\bar{m}=\bar{u}(-\beta,0)$. This is precisely the self-consistency equation for the global magnetization of the $p$-spin ferromagnetic model fachechi2021pde . The phase transition of the system is expected to take place where the gradient of the solution explodes, which, on the Burgers side, corresponds to the development of a shock wave at $x=0$. Since the temporal coordinate $t$ is directly related to the thermal noise at which the phase transition occurs, with standard PDE methods we can analytically determine the critical temperature according to the simple system

$$\begin{cases}\bar{\xi}=\dfrac{F(\bar{\xi})}{F^{\prime}(\bar{\xi})},\\ T_{c}=F^{\prime}(\bar{\xi}),\end{cases}$$

where $F(\xi)=p\tanh(\xi)^{p-1}$. This prediction is in perfect agreement with the numerical solutions of the self-consistency equation (17).
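Both the self-consistency equation (17) and the system for the critical temperature lend themselves to a quick numerical exploration. The sketch below is only illustrative: it solves (17) by plain fixed-point iteration started from $m_0=1$ (an assumed choice, which selects the magnetized branch when it exists) and solves $\bar{\xi}=F(\bar{\xi})/F'(\bar{\xi})$ by bisection, looking for the non-trivial root for $p>2$. Function names, iteration counts and the bracketing interval are assumptions, and temperature is taken as $T=1/\beta$.

```python
import math

def magnetization(beta, p, m0=1.0, iters=2000):
    """Fixed-point iteration of m = tanh(beta * p * m^(p-1)), cf. (17)."""
    m = m0
    for _ in range(iters):
        m = math.tanh(beta * p * m ** (p - 1))
    return m

def critical_temperature(p, lo=1e-3, hi=10.0, iters=200):
    """Solve xi = F(xi)/F'(xi) by bisection, F(xi) = p tanh(xi)^(p-1); meant for p > 2."""
    def F(xi):
        return p * math.tanh(xi) ** (p - 1)

    def dF(xi):
        return p * (p - 1) * math.tanh(xi) ** (p - 2) / math.cosh(xi) ** 2

    def g(xi):
        return xi - F(xi) / dF(xi)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid          # sign change in [lo, mid]
        else:
            lo = mid          # sign change in [mid, hi]
    xi = 0.5 * (lo + hi)
    return xi, dF(xi)         # (xi_bar, T_c)

xi_bar, t_c = critical_temperature(3)
m_below = magnetization(beta=1.0 / (0.9 * t_c), p=3)  # T slightly below T_c
m_above = magnetization(beta=1.0 / (1.1 * t_c), p=3)  # T slightly above T_c
```

With these choices the iteration converges to a finite magnetization just below the computed $T_c$ and collapses to zero just above it, which is the qualitative behaviour the tangency condition is meant to capture.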

4 Derrida models: how to deal with complex systems

In this section, we adapt the previous methodologies to treat complex systems with $p$-spin interactions. The paradigmatic case is given by the $p$-spin SK model, also referred to as the Derrida model, defined as follows.

Definition 1.

Let $\boldsymbol{\sigma}$ be the generic point in the configuration space $\Sigma_{N}=\{-1,+1\}^{N}$ of the system. Let $\boldsymbol{J}$ be a rank-$p$ random tensor with i.i.d. entries $J_{i_{1}\dots i_{p}}\sim\mathcal{N}(0,1)$. The Hamilton function of the $p$-spin Derrida model is defined as

$$H_{N,p,\boldsymbol{J}}(\boldsymbol{\sigma})=-\sqrt{\frac{p!}{2N^{p-1}}}\sum_{1\leq i_{1}<\dots<i_{p}\leq N}J_{i_{1}\dots i_{p}}\sigma_{i_{1}}\dots\sigma_{i_{p}}. \tag{18}$$
Remark 1.

Clearly, for $p=2$ we recover the Sherrington-Kirkpatrick model sherrington1975 .
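To make Definition 1 concrete, the following sketch samples one realization of the couplings and evaluates the Hamilton function (18) for a small system. The dictionary-based storage of $\boldsymbol{J}$ and the function names are illustrative assumptions; the approach enumerates all $\binom{N}{p}$ ordered index tuples and is therefore only practical for small $N$.

```python
import itertools
import math
import random

def sample_couplings(n, p, rng):
    """One i.i.d. N(0,1) coupling for every index tuple i_1 < ... < i_p."""
    return {idx: rng.gauss(0.0, 1.0)
            for idx in itertools.combinations(range(n), p)}

def derrida_energy(sigma, couplings, p):
    """H = -sqrt(p!/(2 N^(p-1))) * sum_{i_1<...<i_p} J_{i_1..i_p} sigma_{i_1}...sigma_{i_p}."""
    n = len(sigma)
    norm = math.sqrt(math.factorial(p) / (2 * n ** (p - 1)))
    total = sum(j * math.prod(sigma[i] for i in idx)
                for idx, j in couplings.items())
    return -norm * total

rng = random.Random(1)
N, p = 8, 3
J = sample_couplings(N, p, rng)
sigma = [rng.choice((-1, 1)) for _ in range(N)]
flipped = [-s for s in sigma]

e_sigma = derrida_energy(sigma, J, p)
e_flipped = derrida_energy(flipped, J, p)   # for odd p the energy changes sign
```

For odd $p$ a global spin flip reverses the sign of every monomial, so the two energies are opposite; for even $p$ (including the SK case $p=2$) they coincide.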

Remark 2.

In the usual definition of the $p$-spin SK model, the sum is performed with the constraint $1\leq i_{1}<i_{2}<\dots<i_{p}\leq N$, as in (18). Beyond that formulation, it is possible to consider an alternative one, where the summation is performed independently over all the indices; the difference between the two prescriptions vanishes in the thermodynamic limit, that is

$$\sum_{1\leq i_{1}<\dots<i_{p}\leq N}(\cdot)=\frac{1}{p!}\sum_{i_{1},\dots,i_{p}=1}^{N}(\cdot)+\textnormal{ contributions vanishing as }N\to\infty. \tag{19}$$

Since we are interested in the thermodynamic limit, we will often use the equality

$$\sum_{1\leq i_{1}<\dots<i_{p}\leq N}(\cdot)=\frac{1}{p!}\sum_{i_{1},\dots,i_{p}=1}^{N}(\cdot), \tag{20}$$

holding in the $N\to\infty$ limit.
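A simple counting argument illustrates why the two prescriptions agree for large $N$: the ordered sum contains $\binom{N}{p}$ terms, whereas the unconstrained sum divided by $p!$ corresponds to $N^{p}/p!$ terms, the mismatch being due to tuples with repeated indices. The sketch below (function name hypothetical) only compares these counts; it does not estimate the size of the neglected contributions for a specific random Hamiltonian.

```python
import math

def ordered_to_full_ratio(n, p):
    """Ratio between the number of tuples i_1 < ... < i_p and N^p / p!."""
    return math.comb(n, p) * math.factorial(p) / n ** p

# The ratio equals (1 - 1/N)(1 - 2/N)...(1 - (p-1)/N) and tends to 1 as N grows.
ratios = [ordered_to_full_ratio(n, 3) for n in (10, 100, 1000, 10000)]
```

The relative weight of the tuples with repeated indices is therefore of order $1/N$ at fixed $p$.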

Definition 2.

Given $(t,x)\in\mathbb{R}^{2}$ and given a family $\{J_{i}\}_{i=1}^{N}$ of i.i.d. $\mathcal{N}(0,1)$-distributed random variables, Guerra’s interpolating partition function for the $p$-spin SK model is

$$Z_{N,p,\boldsymbol{J}}(t,x)=\sum_{\boldsymbol{\sigma}\in\Sigma_{N}}\exp\big(-H_{N,p,\boldsymbol{J}}(t,x)\big), \tag{21}$$
$$H_{N,p,\boldsymbol{J}}(t,x)=-\sqrt{\frac{tp!}{2N^{p-1}}}\sum_{1\leq i_{1}<\dots<i_{p}\leq N}J_{i_{1}\dots i_{p}}\sigma_{i_{1}}\dots\sigma_{i_{p}}-\sqrt{x}\sum_{i=1}^{N}J_{i}\sigma_{i}. \tag{22}$$

The Boltzmann factor associated to this partition function is denoted by $B_{N,p,\boldsymbol{J}}(t,x)$.

As stated in Sec. 2, when dealing with spin glasses we need to enlarge our analysis to the $s$-replicated version of the configuration space. To this aim, we use the following

Definition 3.

Let $\Sigma_{N}^{(s)}=(\Sigma_{N})^{\otimes s}\equiv\{-1,+1\}^{sN}$ be the $s$-replicated configuration space. We denote by $\underline{\boldsymbol{\sigma}}=(\boldsymbol{\sigma}^{(1)},\dots,\boldsymbol{\sigma}^{(s)})\in\Sigma^{(s)}_{N}$ the global configuration of the replicated system. The space $\Sigma_{N}^{(s)}$ is naturally endowed with the $s$-replicated Boltzmann-Gibbs measure associated to the partition function

$$Z^{(s)}_{N,p,\boldsymbol{J}}(t,x)=\sum_{\underline{\boldsymbol{\sigma}}\in\Sigma_{N}^{(s)}}\exp\Big(\sqrt{\frac{tp!}{2N^{p-1}}}\sum_{a=1}^{s}\sum_{1\leq i_{1}<\dots<i_{p}\leq N}J_{i_{1}\dots i_{p}}\sigma_{i_{1}}^{(a)}\dots\sigma_{i_{p}}^{(a)}+\sqrt{x}\sum_{a=1}^{s}\sum_{i=1}^{N}J_{i}\sigma_{i}^{(a)}\Big). \tag{23}$$

We will denote by $B^{(s)}_{N,p,\boldsymbol{J}}(t,x)$ the Boltzmann factor appearing in the $s$-replicated partition function. Given an observable $O:\Sigma_{N}^{(s)}\to\mathbb{R}$ on the replicated space, the Boltzmann average w.r.t. the $s$-replicated partition function is

$$\Omega^{(s)}_{t,x}(O)=\frac{\sum_{\underline{\boldsymbol{\sigma}}}O(\underline{\boldsymbol{\sigma}})B_{N,p,\boldsymbol{J}}^{(s)}(t,x)}{\sum_{\underline{\boldsymbol{\sigma}}}B_{N,p,\boldsymbol{J}}^{(s)}(t,x)}. \tag{24}$$
Remark 3.

Clearly, the thermodynamics of the original model is recovered with $t=\beta^{2}$ and $x=0$.

Remark 4.

Since replicas are independent, $Z^{(s)}_{N,p,\boldsymbol{J}}(t,x)\equiv(Z_{N,p,\boldsymbol{J}}(t,x))^{s}$.

In the following, in order to lighten the notation, the replica index $s$ of the Boltzmann average $\Omega^{(s)}_{t,x}$ will be dropped, since it is understood directly from the function to be averaged.

Definition 4.

Given an observable $O:\Sigma_{N}^{(s)}\to\mathbb{R}$ on the replicated space, the quenched average is defined as

$$\langle O\rangle_{t,x}=\mathbb{E}_{\boldsymbol{J}}\Omega_{t,x}(O). \tag{25}$$
Remark 5.

In the last definition, the average $\mathbb{E}_{\boldsymbol{J}}$ is again the expectation value performed over all the quenched disorder, thus including the auxiliary random variables in the interpolating setup.

Definition 5.

The order parameter for the $p$-spin SK model is the replica overlap

$$q_{ab}=\frac{1}{N}\sum_{i=1}^{N}\sigma_{i}^{(a)}\sigma_{i}^{(b)}, \tag{26}$$

where $\boldsymbol{\sigma}^{(a)}$ and $\boldsymbol{\sigma}^{(b)}$ are two generic configurations of different replicas of the system.
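The overlap (26) is straightforward to evaluate once two replica configurations are available. In the sketch below the two configurations are drawn uniformly at random, which is an assumption made purely for illustration: in the model they would be sampled from the Boltzmann-Gibbs measure at the same realization of the disorder.

```python
import random

def overlap(sigma_a, sigma_b):
    """q_ab = (1/N) * sum_i sigma_i^(a) * sigma_i^(b), cf. (26)."""
    return sum(a * b for a, b in zip(sigma_a, sigma_b)) / len(sigma_a)

rng = random.Random(2)
N = 1000
replica_1 = [rng.choice((-1, 1)) for _ in range(N)]
replica_2 = [rng.choice((-1, 1)) for _ in range(N)]

q11 = overlap(replica_1, replica_1)   # self-overlap, identically 1
q12 = overlap(replica_1, replica_2)   # lies in [-1, 1]
```

The self-overlap $q_{aa}=1$ is the identity that produces the term proportional to $\langle O\rangle_{t,x}$ when diagonal contributions appear in replica sums.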

We can now focus on the PDE approach to the statistical mechanics of the $p$-spin SK model. To this aim, we compute the spacetime derivatives of the quenched intensive pressure, conveniently expressed through the following

Definition 6.

For all $p\geq 2$, Guerra’s action functional is defined as

$$S_{N,p}(t,x)=2A_{N,p}(t,x)-x-\frac{t}{2}. \tag{27}$$
Lemma 1.

The spacetime derivatives of Guerra’s action functional read as

$$\partial_{t}S_{N,p}(t,x)=-\frac{1}{2}\langle q_{12}^{p}\rangle_{t,x}+R_{N}(t,x), \tag{28}$$
$$\partial_{x}S_{N,p}(t,x)=-\langle q_{12}\rangle_{t,x}, \tag{29}$$

where $R_{N}(t,x)$ takes into account the contributions coming from (19) and vanishing in the $N\to\infty$ limit.

The proof of this lemma can be found in Appendix A.

Lemma 2.

Given an observable $O:\Sigma^{(s)}_{N}\to\mathbb{R}$ on the replicated space, the following streaming equation holds:

$$\begin{split}\partial_{x}\langle O(\underline{\boldsymbol{\sigma}})\rangle_{t,x}&=\frac{N}{2}\sum_{a,b=1}^{s}\langle O(\underline{\boldsymbol{\sigma}})q_{ab}\rangle_{t,x}-sN\sum_{a=1}^{s}\langle O(\underline{\boldsymbol{\sigma}})q_{a,s+1}\rangle_{t,x}\\ &-\frac{s}{2}N\langle O(\underline{\boldsymbol{\sigma}})\rangle_{t,x}+\frac{s(s+1)}{2}N\langle O(\underline{\boldsymbol{\sigma}})q_{s+1,s+2}\rangle_{t,x}.\end{split} \tag{30}$$
Proof.

The proof is long and rather cumbersome, so we will just give a sketch. First of all, we recall that

$$\langle O(\underline{\boldsymbol{\sigma}})\rangle_{t,x}=\mathbb{E}_{\boldsymbol{J}}\frac{1}{Z_{N,p,\boldsymbol{J}}(t,x)^{s}}\sum_{\underline{\boldsymbol{\sigma}}\in\Sigma_{N}^{(s)}}O(\underline{\boldsymbol{\sigma}})B_{N,p,\boldsymbol{J}}^{(s)}(t,x). \tag{31}$$

When taking the $x$-derivative of this quantity, we get two contributions: the first one follows from the derivative of $B_{N,p,\boldsymbol{J}}^{(s)}(t,x)$, and the second one follows from the derivative of $1/Z_{N,p,\boldsymbol{J}}^{s}$ (which results in adding a new replica). In quantitative terms:

$$\begin{split}\partial_{x}\langle O(\underline{\boldsymbol{\sigma}})\rangle_{t,x}=\frac{1}{2\sqrt{x}}&\mathbb{E}_{\boldsymbol{J}}\Bigl(\sum_{a=1}^{s}\sum_{i=1}^{N}J_{i}\frac{1}{Z_{N,p,\boldsymbol{J}}(t,x)^{s}}\sum_{\boldsymbol{\sigma}^{(1)}}\dots\sum_{\boldsymbol{\sigma}^{(s)}}O(\underline{\boldsymbol{\sigma}})\sigma_{i}^{(a)}B_{N,p,\boldsymbol{J}}^{(s)}(t,x)\\ &-s\sum_{i=1}^{N}J_{i}\frac{1}{Z_{N,p,\boldsymbol{J}}(t,x)^{s+1}}\sum_{\boldsymbol{\sigma}^{(1)}}\dots\sum_{\boldsymbol{\sigma}^{(s+1)}}O(\underline{\boldsymbol{\sigma}})\sigma_{i}^{(s+1)}B_{N,p,\boldsymbol{J}}^{(s+1)}(t,x)\Bigr).\end{split} \tag{32}$$

The factor $J_{i}$ appearing in both terms of the right-hand side can be handled by applying the Wick-Isserlis theorem. Each $J_{i}$-derivative results in two different contributions, and its action on the denominators (involving the partition functions) again results in the appearance of Boltzmann averages with more replicas. Further, the explicit $x$-dependence of the derivative precisely cancels (since the $J_{i}$-derivative produces factors proportional to $\sqrt{x}$). Indeed, after all the computations and recalling $\langle\cdot\rangle_{t,x}=\mathbb{E}_{\boldsymbol{J}}\Omega_{t,x}(\cdot)$, we get

$$\begin{split}\partial_{x}\langle O(\underline{\boldsymbol{\sigma}})\rangle_{t,x}&=\frac{1}{2}\sum_{a,b=1}^{s}\sum_{i=1}^{N}\langle O(\underline{\boldsymbol{\sigma}})\sigma_{i}^{(a)}\sigma_{i}^{(b)}\rangle_{t,x}-\frac{s}{2}\sum_{a=1}^{s}\sum_{i=1}^{N}\langle O(\underline{\boldsymbol{\sigma}})\sigma_{i}^{(a)}\sigma_{i}^{(s+1)}\rangle_{t,x}\\ &-\frac{s}{2}\sum_{a=1}^{s+1}\sum_{i=1}^{N}\langle O(\underline{\boldsymbol{\sigma}})\sigma_{i}^{(a)}\sigma_{i}^{(s+1)}\rangle_{t,x}+\frac{s(s+1)}{2}\sum_{i=1}^{N}\langle O(\underline{\boldsymbol{\sigma}})\sigma_{i}^{(s+1)}\sigma_{i}^{(s+2)}\rangle_{t,x}.\end{split} \tag{33}$$

Recalling Def. 5, using $q_{s+1,s+1}=1$ and after some rearrangements of the quantities, we get the thesis. ∎

Corollary 1.

For each $l\in\mathbb{N}$, the following equality holds:

$$\partial_{x}\langle q_{12}^{l}\rangle_{t,x}=N\bigl(\langle q_{12}^{l+1}\rangle_{t,x}-4\langle q^{l}_{12}q_{23}\rangle_{t,x}+3\langle q_{12}^{l}q_{34}\rangle_{t,x}\bigr). \tag{34}$$
Proof.

The proof works simply by putting $O(\underline{\boldsymbol{\sigma}})=q_{12}^{l}$ (which is a function of two replicas of the system, so that $s=2$) in Lemma 2. ∎

In order to proceed, we now have to make some physical assumptions on the model. As is standard in spin-glass theory, the simplest requirement is RS in the thermodynamic limit. In fact, as we are going to show, this makes the PDE approach feasible, since it allows us to express non-trivial expectation values of functions of the replicas in a very simple form.

Proposition 1.

For the interpolated Derrida model (2), the following equality holds:

\begin{equation}
\langle q_{12}^{p}\rangle_{t,x}=\left(\frac{1}{N}\partial_{x}+\langle q_{12}\rangle_{t,x}\right)^{p-1}\langle q_{12}\rangle_{t,x}+Q_{N}^{(p-1)}(t,x),
\tag{35}
\end{equation}

where $Q_{N}^{(p-1)}(t,x)$ vanishes in the $N\to\infty$ limit and under the RS assumption.

Proof.

Let us consider the $x$-derivative of $\langle q_{12}^{l}\rangle_{t,x}$ given in (34) and rearrange the contribution involving $q_{23}$:

\begin{equation}
\langle q_{12}^{l}q_{23}\rangle_{t,x}=\langle q_{12}^{l}\Delta(q_{23})\rangle_{t,x}+\langle q_{23}\rangle_{t,x}\langle q_{12}^{l}\rangle_{t,x},
\tag{36}
\end{equation}

where $\Delta(q_{ab})=q_{ab}-\langle q_{ab}\rangle_{t,x}$, for all $a,b$, is the fluctuation of the overlap w.r.t. its thermodynamic value. Further,

\begin{equation}
\begin{split}
\langle q_{12}^{l}q_{23}\rangle_{t,x}&=\langle\Delta(q_{12}^{l})\Delta(q_{23})\rangle_{t,x}+\langle q_{12}^{l}\rangle_{t,x}\langle\Delta(q_{23})\rangle_{t,x}+\langle q_{23}\rangle_{t,x}\langle q_{12}^{l}\rangle_{t,x}\\
&=\langle q_{12}\rangle_{t,x}\langle q_{12}^{l}\rangle_{t,x}+R^{(1,l)}_{N}(t,x),
\end{split}
\tag{37}
\end{equation}

where $R^{(1,l)}_{N}(t,x)$ collects the terms involving the fluctuations of the overlap. In the last equality, we also used the fact that $\langle q_{23}\rangle_{t,x}=\langle q_{12}\rangle_{t,x}$, since the average is independent of the replica labelling. The last term in (34) has a similar expansion:

\begin{equation}
\langle q_{12}^{l}q_{34}\rangle_{t,x}=\langle q_{12}\rangle_{t,x}\langle q_{12}^{l}\rangle_{t,x}+R^{(2,l)}_{N}(t,x),
\tag{38}
\end{equation}

thus we finally get

\begin{equation}
\langle q_{12}^{l+1}\rangle_{t,x}=\frac{1}{N}\partial_{x}\langle q_{12}^{l}\rangle_{t,x}+\langle q_{12}\rangle_{t,x}\langle q_{12}^{l}\rangle_{t,x}+R_{N}^{(l)}(t,x),
\tag{39}
\end{equation}

where $R_{N}^{(l)}(t,x)=4R^{(1,l)}_{N}(t,x)-3R^{(2,l)}_{N}(t,x)$. We can then express higher moments of the overlap in terms of lower ones:

\begin{equation}
\langle q_{12}^{l+1}\rangle_{t,x}=\left(\frac{1}{N}\partial_{x}+\langle q_{12}\rangle_{t,x}\right)\langle q_{12}^{l}\rangle_{t,x}+R^{(l)}_{N}(t,x).
\tag{40}
\end{equation}

Iterating this procedure from $l=1$ up to $l=p-1$, we obtain

\begin{equation}
\langle q_{12}^{p}\rangle_{t,x}=\left(\frac{1}{N}\partial_{x}+\langle q_{12}\rangle_{t,x}\right)^{p-1}\langle q_{12}\rangle_{t,x}+Q^{(p-1)}_{N}(t,x),
\tag{41}
\end{equation}

where $Q^{(p-1)}_{N}(t,x)$ collects all the terms involving $R_{N}^{(l)}(t,x)$, and thus vanishes in the $N\to\infty$ limit. ∎
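As a minimal illustration of the iteration (a sketch, not part of the original derivation), consider $p=3$ and write $u_N:=\langle q_{12}\rangle_{t,x}$, a shorthand introduced only for this example. Neglecting the remainder $Q_N^{(2)}$, the operator appearing in (41) expands as

```latex
\begin{align*}
\Big(\tfrac{1}{N}\partial_x+u_N\Big)u_N &= \tfrac{1}{N}\,\partial_x u_N+u_N^{2},\\
\Big(\tfrac{1}{N}\partial_x+u_N\Big)^{2}u_N &= \tfrac{1}{N^{2}}\,\partial_x^{2}u_N+\tfrac{3}{N}\,u_N\,\partial_x u_N+u_N^{3}.
\end{align*}
```

Assuming the derivatives of $u_N$ stay bounded, only the cubic term survives as $N\to\infty$, while the terms carrying powers of $1/N$ play the role of viscous corrections.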

At this point, we have all the ingredients needed to make our approach explicit. Using (41) in (28), we get

\begin{equation}
\partial_{t}S_{N,p}(t,x)=-\frac{1}{2}\left(\frac{1}{N}\partial_{x}+\langle q_{12}\rangle_{t,x}\right)^{p-1}\langle q_{12}\rangle_{t,x}+R_{N}(t,x)-\frac{1}{2}Q_{N}^{(p-1)}(t,x).
\tag{42}
\end{equation}

Differentiating (42) with respect to the spatial coordinate $x$, we have

\[
\partial_{t}\partial_{x}S_{N,p}(t,x)=-\frac{1}{2}\partial_{x}\left(\frac{1}{N}\partial_{x}+\langle q_{12}\rangle_{t,x}\right)^{p-1}\langle q_{12}\rangle_{t,x}-V_{N}(t,x),
\]

where $V_{N}(t,x):=-\partial_{x}\bigl(R_{N}(t,x)-\frac{1}{2}Q_{N}^{(p-1)}(t,x)\bigr)$ vanishes in the $N\to\infty$ limit. Since the $x$-derivative of Guerra's action is $-\langle q_{12}\rangle_{t,x}$, we can then write the following equation

\begin{equation}
\partial_{t}\langle q_{12}\rangle_{t,x}-\frac{1}{2}\partial_{x}\left(\frac{1}{N}\partial_{x}+\langle q_{12}\rangle_{t,x}\right)^{p-1}\langle q_{12}\rangle_{t,x}=V_{N}(t,x).
\tag{43}
\end{equation}

On the l.h.s. we recognize a Burgers hierarchy structure, while on the r.h.s. we have a source term (which vanishes in the thermodynamic limit). The equilibrium dynamics of the Derrida model is then obtained by taking the limit $N\to\infty$, as summarized in the following

Theorem 1.

The expectation value of the order parameter for the $p$-spin Derrida model under the RS ansatz is given by the function $\bar{q}(\beta)=u(\beta^{2},0)$, where $u(t,x)$ is the solution of the inviscid limit of the $p$-th element of the Burgers hierarchy with initial profile (49), i.e.

\begin{equation}
\begin{cases}
\partial_{t}u(t,x)-\frac{1}{2}\partial_{x}u^{p}(t,x)=0\\
u(0,x)=\mathbb{E}_{J}\tanh^{2}(\sqrt{x}J)
\end{cases}.
\tag{44}
\end{equation}
Proof.

Taking the limit $N\to\infty$ of equation (43) and recalling that $V_{N}(t,x)\to 0$ in this limit, we get

\begin{equation}
\partial_{t}u(t,x)-\frac{1}{2}\partial_{x}u^{p}(t,x)=0,
\tag{45}
\end{equation}

where $u(t,x):=\lim_{N\to\infty}\langle q_{12}\rangle_{t,x}$. The initial profile of the Cauchy problem associated with the PDE (45) is easily determined, since for $t=0$ the partition function reduces to a one-body problem. Thus, we have to compute $u(0,x)=\lim_{N\to\infty}\langle q_{12}\rangle_{0,x}$. To this aim, we start from the partition function evaluated at $t=0$, which is

\begin{equation}
Z_{N,p,\boldsymbol{J}}(0,x)=\sum_{\boldsymbol{\sigma}}\exp\Bigl(\sqrt{x}\sum_{i=1}^{N}J_{i}\sigma_{i}\Bigr)=\prod_{i=1}^{N}2\cosh(\sqrt{x}J_{i}).
\tag{46}
\end{equation}

Taking the logarithm and averaging over the quenched disorder $\boldsymbol{J}$, we obtain the intensive pressure:

\begin{equation}
A_{N,p}(0,x)=\frac{1}{N}\mathbb{E}_{\boldsymbol{J}}\log\prod_{i=1}^{N}2\cosh(\sqrt{x}J_{i})=\log 2+\frac{1}{N}\sum_{i=1}^{N}\mathbb{E}_{\boldsymbol{J}}\log\cosh(\sqrt{x}J_{i}).
\tag{47}
\end{equation}

Recalling that Guerra's action is defined in (27) and that the $J_{i}$ are i.i.d., so that the sum of the quenched averages of functions of $J_{i}$ is $N$ times the average w.r.t. a single quenched variable $J\sim\mathcal{N}(0,1)$, we get

\begin{equation}
S_{N}(0,x)=2\log 2+2\mathbb{E}_{J}\log\cosh(\sqrt{x}J)-x.
\tag{48}
\end{equation}

Finally, taking the derivative w.r.t. the spatial coordinate, we obtain the initial profile for the overlap expectation value, which reads

\begin{equation}
u(0,x)=\lim_{N\to\infty}\langle q_{12}\rangle_{0,x}=\mathbb{E}_{J}\tanh^{2}(\sqrt{x}J).
\tag{49}
\end{equation}

Here, we used again the Wick-Isserlis theorem for normally distributed random variables. Putting together (45) and (49), we get the thesis. ∎
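The initial profile can be cross-checked numerically. The following Python sketch, which is not part of the original derivation, evaluates the Gaussian averages in (48) and (49) with a plain trapezoidal rule and compares $-\partial_{x}S_{N}(0,x)$, obtained by a central finite difference, with $\mathbb{E}_{J}\tanh^{2}(\sqrt{x}J)$. The function names, the integration window and the step sizes are illustrative assumptions.

```python
import math

def gauss_expect(f, half_width=8.0, steps=4000):
    """Approximate E[f(J)], J ~ N(0,1), with the trapezoidal rule on [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        j = -half_width + k * h
        edge = 0.5 if k in (0, steps) else 1.0
        total += edge * f(j) * math.exp(-0.5 * j * j)
    return total * h / math.sqrt(2.0 * math.pi)

def log_cosh(a):
    """Overflow-safe log(cosh(a))."""
    a = abs(a)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def action_at_t0(x):
    """S_N(0, x) as written in Eq. (48)."""
    return 2.0 * math.log(2.0) + 2.0 * gauss_expect(lambda j: log_cosh(math.sqrt(x) * j)) - x

def initial_profile(x):
    """u(0, x) as written in Eq. (49)."""
    return gauss_expect(lambda j: math.tanh(math.sqrt(x) * j) ** 2)

def minus_dx_action(x, eps=1e-5):
    """-dS/dx at t = 0, estimated with a central finite difference."""
    return -(action_at_t0(x + eps) - action_at_t0(x - eps)) / (2.0 * eps)

if __name__ == "__main__":
    for x in (0.3, 1.0, 2.5):
        print(x, initial_profile(x), minus_dx_action(x))
```

The two printed columns should agree up to the finite-difference error, which reflects the Gaussian integration by parts used in the proof.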

Proposition 2.

The implicit solution of the inviscid Burgers hierarchy (44) is the self-consistency equation for the order parameter $\bar{q}(\beta)$ of the $p$-spin model under the RS ansatz.

Proof.

Let us rewrite the differential equation (44) as

\begin{equation}
\partial_{t}u-\frac{p}{2}u^{p-1}\partial_{x}u=0.
\tag{50}
\end{equation}

This is a non-linear wave equation and, as is well known, it admits a solution of the form $u(t,x)=u_{0}(x-v(t,x)t)$, where $u_{0}$ is the initial profile and $v(t,x)$ is the effective velocity. For the case under consideration, we have $v(t,x)=-\frac{p}{2}u^{p-1}(t,x)$, thus

\begin{equation}
u(t,x)=\mathbb{E}_{J}\tanh^{2}\left(\sqrt{x+t\frac{p}{2}u(t,x)^{p-1}}\,J\right).
\tag{51}
\end{equation}

Recalling that $\bar{q}(\beta)=u(\beta^{2},0)$, we finally have

\begin{equation}
\bar{q}=\mathbb{E}_{J}\tanh^{2}\left(\beta\sqrt{\tfrac{p}{2}\bar{q}^{p-1}}\,J\right),
\tag{52}
\end{equation}

which is precisely the self-consistency equation for the $p$-spin glass model, as reported also in agliari2012notes . ∎
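The self-consistency equation (52) can be explored numerically by a plain fixed-point iteration. The Python sketch below is only an illustration under stated assumptions: the Gaussian average is computed with a trapezoidal rule, the iteration starts from $\bar{q}=1$ (so that, the map being increasing in $\bar{q}$, it descends towards the largest solution), and all function names and tolerances are hypothetical choices rather than the procedure used for Fig. 1.

```python
import math

def gauss_expect(f, half_width=8.0, steps=2000):
    """Approximate E[f(J)], J ~ N(0,1), with the trapezoidal rule on [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        j = -half_width + k * h
        edge = 0.5 if k in (0, steps) else 1.0
        total += edge * f(j) * math.exp(-0.5 * j * j)
    return total * h / math.sqrt(2.0 * math.pi)

def rhs(q, beta, p):
    """Right-hand side of Eq. (52) evaluated at q."""
    scale = beta * math.sqrt(0.5 * p * q ** (p - 1))
    return gauss_expect(lambda j: math.tanh(scale * j) ** 2)

def rs_overlap(beta, p, q_init=1.0, tol=1e-12, max_iter=10000):
    """Iterate q -> rhs(q) from q_init until two successive values differ by less than tol."""
    q = q_init
    for _ in range(max_iter):
        q_new = rhs(q, beta, p)
        if abs(q_new - q) < tol:
            return q_new
        q = q_new
    return q

def residual(q, beta, p):
    """Mismatch q - rhs(q); it vanishes on solutions of Eq. (52)."""
    return q - rhs(q, beta, p)

if __name__ == "__main__":
    for beta in (0.5, 2.0):
        q = rs_overlap(beta, p=2)
        print(beta, q, residual(q, beta, 2))
```

For $p=2$ the iteration collapses to $\bar{q}=0$ at $\beta=0.5$ and converges to a positive overlap at $\beta=2$.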

Corollary 2.

The (ergodicity breaking) phase transition of the $p$-spin model coincides with the gradient catastrophe of the Cauchy problem (44), and the critical temperature is determined by the system

\begin{equation}
\begin{cases}
T_{c}=\sqrt{-F^{\prime}(\bar{\xi})}\\
\bar{\xi}=\frac{F(\bar{\xi})}{F^{\prime}(\bar{\xi})},
\end{cases}
\tag{53}
\end{equation}

where $F(\xi)=-\tfrac{p}{2}\mathbb{E}_{J}\tanh^{2}(\sqrt{\xi}J)$.

Proof.

The critical temperature can be determined with the usual analysis of intersecting characteristics of the Cauchy problem (44), along the same lines as fachechi2021pde . ∎
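For the reader's convenience, we sketch how a system of the form (53) arises from the method of characteristics. This is only an outline under stated assumptions: $F(\xi)$ is read as the velocity of the characteristic issued from the point $\xi$ at $t=0$, $u_0$ is the initial profile, and we use the identification $t=\beta^{2}$ of Theorem 1 together with $T=1/\beta$.

```latex
\begin{align*}
u(t,x) &= u_0(\xi), \qquad x=\xi+F(\xi)\,t,\\
\partial_x u(t,x) &= \frac{u_0'(\xi)}{1+F'(\xi)\,t}
\;\Longrightarrow\; t_c=-\frac{1}{F'(\bar{\xi})},\\
x=0 &\;\Longrightarrow\; \bar{\xi}=-F(\bar{\xi})\,t_c=\frac{F(\bar{\xi})}{F'(\bar{\xi})},\\
t_c=\beta_c^{2}=T_c^{-2} &\;\Longrightarrow\; T_c=\sqrt{-F'(\bar{\xi})}.
\end{align*}
```

The gradient of $u$ diverges when the denominator $1+F'(\xi)t$ vanishes, and requiring that this happens on the line $x=0$ selects the characteristic $\bar{\xi}$.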

As a comparison, in Fig. 1 we report the solutions of the self-consistency equations (52) for $p=2,\dots,8$ (solid curves), and the critical temperatures as predicted by the system (53) (dashed lines).

Figure 1: Solutions of the self-consistency equations (52) for $p=2,\dots,8$ (solid curves), and the critical temperatures as predicted by the system (53) (dashed lines).

5 Application to Dense Associative Memories

Going beyond the pure spin-glass case, in this Section we will approach the DAMs that are the main focus of this work.

Definition 7.

Let $\boldsymbol{\sigma}$ be the generic point in the configuration space $\Sigma_{N}\equiv\{-1,+1\}^{N}$ of the system. Given $K$ random patterns $\{\boldsymbol{\xi}^{\mu}\}_{\mu=1}^{K}$, each made of $N$ i.i.d. binary entries drawn with equal probability $P(\xi^{\mu}_{i}=-1)=P(\xi^{\mu}_{i}=1)=\frac{1}{2}$ for all $i=1,\dots,N$, the Hamiltonian of the $p$-th order DAM is

\begin{equation}
H_{N,p,\boldsymbol{\xi},K}(\boldsymbol{\sigma}):=-\frac{1}{N^{p-1}}\sum_{\mu=1}^{K}\sum_{i_{1},\dots,i_{p}}^{N}\xi^{\mu}_{i_{1}}\dots\xi^{\mu}_{i_{p}}\sigma_{i_{1}}\dots\sigma_{i_{p}}.
\tag{54}
\end{equation}
Remark 6.

The normalization factor $\frac{1}{N^{p-1}}$ ensures the linear extensivity of the Hamiltonian in the volume of the network $N$, i.e. $\lim_{N\to\infty}\left|\frac{H_{N,p,\boldsymbol{\xi},K}}{N}\right|\in(0,+\infty)$.

As already stated in Sec. 1, DAMs with $p$-order interactions are able to store at most a number of patterns $K=\alpha N^{p-1}$, with $\alpha<\alpha_{c}$ baldi1987number ; bovier2001spin . In the following, we will study the model in two different regimes, namely $K$ finite and $K$ such that $\alpha$ is finite, corresponding, respectively, to a simple and a complex scenario.

5.1 Low storage

Let us start the analysis of the network in the low-load regime, where a finite number of patterns is stored. Again, the goal is to use interpolation techniques to derive PDEs able to describe the thermodynamics of the system. To do this, let us start by defining

Definition 8.

The order parameters used to describe the macroscopic behavior of the model are the so-called Mattis magnetizations, defined as

\begin{equation}
m_{\mu}(\boldsymbol{\sigma}):=\frac{1}{N}\sum_{i=1}^{N}\xi^{\mu}_{i}\sigma_{i}\quad\forall\mu=1,\dots,K,
\tag{55}
\end{equation}

measuring the overlap between the network configuration and the stored patterns.

Remark 7.

The Hamiltonian (54), written in terms of the Mattis magnetizations, is

\[
H_{N,p,\boldsymbol{\xi},K}(\boldsymbol{\sigma})=-N\sum_{\mu=1}^{K}\big(m_{\mu}(\boldsymbol{\sigma})\big)^{p}.
\]
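The identity of Remark 7 follows because the sum over the index tuple $(i_{1},\dots,i_{p})$ factorizes into the $p$-th power of $\sum_{i}\xi^{\mu}_{i}\sigma_{i}=Nm_{\mu}(\boldsymbol{\sigma})$. The Python sketch below checks this by brute force on a small random instance; the function names, the sizes and the random seed are illustrative assumptions, and the explicit tensor sum is only feasible for small $N$ and $p$.

```python
import itertools
import random

def mattis(xi_mu, sigma):
    """Mattis magnetization m_mu(sigma) of Eq. (55)."""
    return sum(x * s for x, s in zip(xi_mu, sigma)) / len(sigma)

def hamiltonian_tensor(patterns, sigma, p):
    """Eq. (54): explicit sum over all index tuples (i_1, ..., i_p)."""
    n = len(sigma)
    total = 0.0
    for xi_mu in patterns:
        for idx in itertools.product(range(n), repeat=p):
            term = 1.0
            for i in idx:
                term *= xi_mu[i] * sigma[i]
            total += term
    return -total / n ** (p - 1)

def hamiltonian_mattis(patterns, sigma, p):
    """Remark 7: the same energy written through the Mattis magnetizations."""
    n = len(sigma)
    return -n * sum(mattis(xi_mu, sigma) ** p for xi_mu in patterns)

# Small random instance (sizes chosen only to keep the tensor sum cheap).
rng = random.Random(0)
N, K = 6, 3
PATTERNS = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(K)]
SIGMA = [rng.choice((-1, 1)) for _ in range(N)]

if __name__ == "__main__":
    for p in (2, 3, 4):
        print(p, hamiltonian_tensor(PATTERNS, SIGMA, p), hamiltonian_mattis(PATTERNS, SIGMA, p))
```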

Next, we define the basic objects of our investigations within the interpolating framework.

Definition 9.

Given $(t,\boldsymbol{x})\in\mathbb{R}^{K+1}$, the spacetime Guerra's interpolating partition function for the DAM model (in the low-load regime) reads as

\begin{align}
Z_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x}) &= \sum_{\boldsymbol{\sigma}\in\Sigma_{N}}\exp\big(-H_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})\big), \tag{56}\\
H_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x}) &= tN\sum_{\mu=1}^{K}m_{\mu}(\boldsymbol{\sigma})^{p}-N\sum_{\mu=1}^{K}x_{\mu}m_{\mu}(\boldsymbol{\sigma}). \tag{57}
\end{align}
Remark 8.

Clearly, the spacetime Guerra's interpolating partition function recovers the one related to the DAM by setting $t=-\beta$ and $\boldsymbol{x}=\boldsymbol{0}$.

Remark 9.

We recall that, for the CW model, the Guerra mechanical analogy consists in interpreting the statistical pressure through the Burgers hierarchy describing the motion of viscid non-linear waves in a $(1+1)$-dimensional space. In the case of the DAMs, we have $K$ Mattis magnetizations, and the dual mechanical system describes non-linear waves travelling in a $(K+1)$-dimensional space.

Definition 10.

For each configuration $\boldsymbol{\sigma}\in\Sigma_{N}$ of the system, the Boltzmann factor corresponding to the partition function (56) is

\begin{equation}
B_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})=\exp\Big(-tN\sum_{\mu=1}^{K}m_{\mu}(\boldsymbol{\sigma})^{p}+N\sum_{\mu=1}^{K}x_{\mu}m_{\mu}(\boldsymbol{\sigma})\Big).
\tag{58}
\end{equation}
Proposition 3.

The first-order spacetime derivatives of the Guerra intensive pressure associated with the partition function (56) read as

\begin{align}
\partial_{t}A_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x}) &= -\sum_{\mu=1}^{K}\omega_{t,\boldsymbol{x}}(m_{\mu}(\boldsymbol{\sigma})^{p}), \tag{59}\\
\partial_{\mu}A_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x}) &= \omega_{t,\boldsymbol{x}}(m_{\mu}(\boldsymbol{\sigma})), \tag{60}
\end{align}

where $\partial_{\mu}:=\partial_{x^{\mu}}$.

Proof.

Recalling the definition of the intensive pressure $A_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})=\frac{1}{N}\log\sum_{\boldsymbol{\sigma}\in\Sigma_{N}}B_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})$, along with Eq. (58), the proof follows from straightforward computations. The temporal derivative reads

\[
\partial_{t}A_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})=\frac{1}{N}Z^{-1}_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})\sum_{\boldsymbol{\sigma}\in\Sigma_{N}}\Big(-N\sum_{\mu=1}^{K}m_{\mu}(\boldsymbol{\sigma})^{p}\Big)B_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})=-\sum_{\mu=1}^{K}\omega_{t,\boldsymbol{x}}(m_{\mu}(\boldsymbol{\sigma})^{p}),
\]

while the spatial derivative reads

\[
\partial_{\mu}A_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})=\frac{1}{N}Z^{-1}_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})\sum_{\boldsymbol{\sigma}\in\Sigma_{N}}\Big(Nm_{\mu}(\boldsymbol{\sigma})\Big)B_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})=\omega_{t,\boldsymbol{x}}(m_{\mu}(\boldsymbol{\sigma})).
\]

Proposition 4.

The higher (non-centered) moments of the Mattis magnetizations satisfy

\begin{equation}
\omega_{t,\boldsymbol{x}}(m_{\mu}^{s+1})=\Big(\frac{1}{N}\partial_{\mu}+\omega_{t,\boldsymbol{x}}(m_{\mu})\Big)\omega_{t,\boldsymbol{x}}(m_{\mu}^{s}),
\tag{61}
\end{equation}

for each integer $s\geq 1$.

Proof.

We start by computing the spatial derivative of the expectation value of the Mattis magnetizations:

\[
\begin{split}
\partial_{\nu}\omega_{t,\boldsymbol{x}}(m_{\mu}^{s})&=\partial_{\nu}\Big(Z^{-1}_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})\sum_{\boldsymbol{\sigma}\in\Sigma_{N}}m_{\mu}(\boldsymbol{\sigma})^{s}B_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})\Big)\\
&=NZ^{-1}_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})\sum_{\boldsymbol{\sigma}\in\Sigma_{N}}m_{\mu}(\boldsymbol{\sigma})^{s}m_{\nu}(\boldsymbol{\sigma})B_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})\\
&\quad-NZ^{-1}_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})\sum_{\boldsymbol{\sigma}\in\Sigma_{N}}m_{\mu}(\boldsymbol{\sigma})^{s}B_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})\cdot Z^{-1}_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})\sum_{\boldsymbol{\sigma}^{\prime}\in\Sigma_{N}}m_{\nu}(\boldsymbol{\sigma}^{\prime})B_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})\\
&=N\omega_{t,\boldsymbol{x}}(m_{\mu}^{s}m_{\nu})-N\omega_{t,\boldsymbol{x}}(m_{\mu}^{s})\omega_{t,\boldsymbol{x}}(m_{\nu}).
\end{split}
\]

In particular, for $\nu=\mu$, we have

\[
\partial_{\mu}\omega_{t,\boldsymbol{x}}(m_{\mu}^{s})=N[\omega_{t,\boldsymbol{x}}(m_{\mu}^{s+1})-\omega_{t,\boldsymbol{x}}(m_{\mu}^{s})\omega_{t,\boldsymbol{x}}(m_{\mu})].
\]

Expressing the higher-order moment in terms of the other quantities, we reach the thesis. ∎
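Both (60) and the recursion (61) can be verified numerically on a small instance, where the Boltzmann averages are computed by exhaustive enumeration of $\Sigma_{N}$ and the spatial derivatives by central finite differences. The following Python sketch is only an illustration: the helper names, the instance sizes, the random seed and the chosen point $(t,\boldsymbol{x})$ are assumptions made for the example.

```python
import itertools
import math
import random

def mattis_vector(patterns, sigma):
    """Mattis magnetizations m_mu(sigma), Eq. (55)."""
    n = len(sigma)
    return [sum(a * s for a, s in zip(xi, sigma)) / n for xi in patterns]

def weighted_sums(patterns, p, t, x, observable):
    """Return (sum_sigma O*B, sum_sigma B), with B the Boltzmann factor of Eq. (58)."""
    n = len(patterns[0])
    num = den = 0.0
    for sigma in itertools.product((-1, 1), repeat=n):
        m = mattis_vector(patterns, sigma)
        b = math.exp(-t * n * sum(mk ** p for mk in m)
                     + n * sum(xk * mk for xk, mk in zip(x, m)))
        num += observable(m) * b
        den += b
    return num, den

def omega(patterns, p, t, x, observable):
    """Boltzmann average omega_{t,x}(observable)."""
    num, den = weighted_sums(patterns, p, t, x, observable)
    return num / den

def pressure(patterns, p, t, x):
    """Intensive pressure A = (1/N) log Z."""
    _, z = weighted_sums(patterns, p, t, x, lambda m: 1.0)
    return math.log(z) / len(patterns[0])

def d_mu(func, x, mu, eps=1e-5):
    """Central finite difference of func with respect to the component x[mu]."""
    xp, xm = list(x), list(x)
    xp[mu] += eps
    xm[mu] -= eps
    return (func(xp) - func(xm)) / (2.0 * eps)

# Illustrative instance: N spins, K patterns, interaction order P, point (T_INTERP, X0).
rng = random.Random(1)
N, K, P, T_INTERP = 6, 2, 3, -0.4
PATTERNS = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(K)]
X0 = [0.1, -0.2]

def check_eq60(mu=0):
    """Difference between the two sides of Eq. (60)."""
    lhs = d_mu(lambda y: pressure(PATTERNS, P, T_INTERP, y), X0, mu)
    return lhs - omega(PATTERNS, P, T_INTERP, X0, lambda m: m[mu])

def check_eq61(s=2, mu=0):
    """Difference between the two sides of Eq. (61)."""
    lhs = omega(PATTERNS, P, T_INTERP, X0, lambda m: m[mu] ** (s + 1))
    deriv = d_mu(lambda y: omega(PATTERNS, P, T_INTERP, y, lambda m: m[mu] ** s), X0, mu)
    rhs = deriv / N + (omega(PATTERNS, P, T_INTERP, X0, lambda m: m[mu])
                       * omega(PATTERNS, P, T_INTERP, X0, lambda m: m[mu] ** s))
    return lhs - rhs

if __name__ == "__main__":
    print(check_eq60(), check_eq61())
```

For $s=1$ the recursion states that $\frac{1}{N}\partial_{\mu}\omega_{t,\boldsymbol{x}}(m_{\mu})$ equals the variance of $m_{\mu}$, which is the usual relation between susceptibility and fluctuations.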

Calling $u^{(s)}_{\mu}(t,\boldsymbol{x}):=\omega_{t,\boldsymbol{x}}(m_{\mu}(\boldsymbol{\sigma})^{s})$, we can express all $u^{(s)}_{\mu}(t,\boldsymbol{x})$ with $s>1$ in terms of $u_{\mu}^{(1)}(t,\boldsymbol{x}):=u_{\mu}(t,\boldsymbol{x})$. Indeed

\begin{equation}
u_{\mu}^{(s+1)}(t,\boldsymbol{x})=\Big(\frac{1}{N}\partial_{\mu}+u_{\mu}(t,\boldsymbol{x})\Big)u_{\mu}^{(s)}(t,\boldsymbol{x})=\Big(\frac{1}{N}\partial_{\mu}+u_{\mu}(t,\boldsymbol{x})\Big)^{s}u_{\mu}(t,\boldsymbol{x}).
\tag{62}
\end{equation}

To simplify the notation, we define the operator $D_{\mu}:=\frac{1}{N}\partial_{\mu}+u_{\mu}(t,\boldsymbol{x})$.

Theorem 2.

The expectation values of the Mattis magnetizations of the interpolated DAM model (9) satisfy the non-linear evolution equations

\begin{equation}
\partial_{t}u_{\mu}(t,\boldsymbol{x})=-\sum_{\nu=1}^{K}\partial_{\mu}D_{\nu}^{p-1}u_{\nu}(t,\boldsymbol{x}).
\tag{63}
\end{equation}
Proof.

First, we put $s=p-1$ in (62), so that

\[
u_{\nu}^{(p)}(t,\boldsymbol{x})=D_{\nu}^{p-1}u_{\nu}(t,\boldsymbol{x}).
\]

Now, recall that $u_{\nu}^{(p)}(t,\boldsymbol{x})=\omega_{t,\boldsymbol{x}}(m_{\nu}(\boldsymbol{\sigma})^{p})$ and $\partial_{t}A_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})=-\sum_{\mu=1}^{K}\omega_{t,\boldsymbol{x}}(m_{\mu}(\boldsymbol{\sigma})^{p})$, thus

\[
\partial_{t}A_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})=-\sum_{\nu=1}^{K}u_{\nu}^{(p)}(t,\boldsymbol{x})=-\sum_{\nu=1}^{K}D_{\nu}^{p-1}u_{\nu}(t,\boldsymbol{x}).
\]

Taking the derivative $\partial_{\mu}$, commuting $\partial_{t}$ and $\partial_{\mu}$, and recalling that $\partial_{\mu}A_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})=\omega_{t,\boldsymbol{x}}(m_{\mu}(\boldsymbol{\sigma}))=u_{\mu}(t,\boldsymbol{x})$, we directly reach the thesis. ∎

Lemma 3.

The evolutive equations (63) can be linearized by means of the Cole-Hopf transform.

Proof.

To prove our assertion, we use the basic identities

\begin{align*}
\Big(\partial_{\mu}+\frac{\partial_{\mu}\Psi}{\Psi}\Big)^{s}\frac{\partial_{\mu}\Psi}{\Psi} &= \frac{\partial_{\mu}^{s+1}\Psi}{\Psi},\\
\partial_{t}\frac{\partial_{\mu}\Psi}{\Psi} &= \partial_{\mu}\frac{\partial_{t}\Psi}{\Psi}.
\end{align*}

Performing the multi-dimensional Cole-Hopf transform $u_{\mu}(t,\boldsymbol{x})=\frac{1}{N}\partial_{\mu}(\log\Psi)$, we have

\[
\frac{1}{N}\partial_{t}\frac{\partial_{\mu}\Psi}{\Psi}=-\frac{1}{N^{p}}\sum_{\nu=1}^{K}\partial_{\mu}\Big(\partial_{\nu}+\frac{\partial_{\nu}\Psi}{\Psi}\Big)^{p-1}\frac{\partial_{\nu}\Psi}{\Psi},
\]

and using the previous properties we have

\[
\partial_{\mu}\Big(\frac{\partial_{t}\Psi}{\Psi}+\frac{1}{N^{p-1}}\sum_{\nu=1}^{K}\frac{\partial_{\nu}^{p}\Psi}{\Psi}\Big)=0.
\]

Setting the argument of the spatial derivative to zero and assuming $\Psi\neq 0$, we have

\[
\partial_{t}\Psi+\frac{1}{N^{p-1}}\sum_{\nu=1}^{K}\partial_{\nu}^{p}\Psi=0.
\]
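The first Cole-Hopf identity used in the proof can be checked by induction on $s$; we sketch the step here as a complement. The case $s=0$ is trivial, and if the identity holds with exponent $s-1$, one further application of the operator gives

```latex
\begin{align*}
\Big(\partial_\mu+\frac{\partial_\mu\Psi}{\Psi}\Big)\frac{\partial_\mu^{s}\Psi}{\Psi}
&=\frac{\partial_\mu^{s+1}\Psi}{\Psi}
-\frac{\partial_\mu^{s}\Psi\,\partial_\mu\Psi}{\Psi^{2}}
+\frac{\partial_\mu\Psi\,\partial_\mu^{s}\Psi}{\Psi^{2}}
=\frac{\partial_\mu^{s+1}\Psi}{\Psi}.
\end{align*}
```

The two quadratic terms cancel, which is precisely the mechanism that linearizes the hierarchy.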

Remark 10.

In the proof, the function $\Psi$ is nothing but Guerra's interpolating partition function, as can be understood by comparing the definitions $u_{\mu}(t,\boldsymbol{x})=\frac{1}{N}\partial_{\mu}\log\Psi(t,\boldsymbol{x})$ and $u_{\mu}(t,\boldsymbol{x})=\partial_{\mu}A_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})$. Indeed, by computing the derivatives of the partition function we easily get

\begin{align*}
\partial_{t}Z_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x}) &= -N\sum_{\mu=1}^{K}\sum_{\boldsymbol{\sigma}\in\Sigma_{N}}m_{\mu}^{p}(\boldsymbol{\sigma})B_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x}),\\
\partial_{\mu}^{p}Z_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x}) &= N^{p}\sum_{\boldsymbol{\sigma}\in\Sigma_{N}}m_{\mu}^{p}(\boldsymbol{\sigma})B_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x}).
\end{align*}

A direct comparison shows that Guerra's interpolating partition function satisfies the same differential equation as the potential $\Psi$.
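As a numerical illustration of this remark, the Python sketch below checks the linear equation $\partial_{t}Z+\frac{1}{N^{p-1}}\sum_{\nu}\partial_{\nu}^{p}Z=0$ in the case $p=2$, where it reduces to a heat-type equation. The partition function is computed by exhaustive enumeration and the derivatives by central finite differences; the function names, the instance sizes, the seed and the step sizes are assumptions made for the example, and the check is only accurate up to the finite-difference error.

```python
import itertools
import math
import random

def partition_function(patterns, p, t, x):
    """Z_{N,p,xi,K}(t, x) of Eq. (56) by exhaustive enumeration (small N only)."""
    n = len(patterns[0])
    z = 0.0
    for sigma in itertools.product((-1, 1), repeat=n):
        m = [sum(a * s for a, s in zip(xi, sigma)) / n for xi in patterns]
        z += math.exp(-t * n * sum(mk ** p for mk in m)
                      + n * sum(xk * mk for xk, mk in zip(x, m)))
    return z

def pde_residual_p2(patterns, t, x, eps_t=1e-4, eps_x=1e-3):
    """Residual of dZ/dt + (1/N) sum_nu d^2Z/dx_nu^2 = 0 for p = 2, relative to |dZ/dt|."""
    n = len(patterns[0])
    z0 = partition_function(patterns, 2, t, x)
    dz_dt = (partition_function(patterns, 2, t + eps_t, x)
             - partition_function(patterns, 2, t - eps_t, x)) / (2.0 * eps_t)
    laplacian = 0.0
    for nu in range(len(x)):
        xp, xm = list(x), list(x)
        xp[nu] += eps_x
        xm[nu] -= eps_x
        laplacian += (partition_function(patterns, 2, t, xp) - 2.0 * z0
                      + partition_function(patterns, 2, t, xm)) / eps_x ** 2
    return (dz_dt + laplacian / n) / abs(dz_dt)

# Illustrative instance with an odd number of spins and two patterns.
rng = random.Random(2)
N, K = 5, 2
PATTERNS = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(K)]

if __name__ == "__main__":
    print(pde_residual_p2(PATTERNS, -0.3, [0.2, -0.1]))
```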

Remark 11.

The case $K=1$ corresponds to the $p$-spin CW model treated in fachechi2021pde . Indeed, the partition function of the system can be handled as

\[
\begin{split}
Z_{N,p,\boldsymbol{\xi},K=1}(t,\boldsymbol{x})&=\sum_{\boldsymbol{\sigma}\in\Sigma_{N}}\exp\Big(-tN\big(\tfrac{1}{N}\sum_{i}\xi^{1}_{i}\sigma_{i}\big)^{p}+Nx\big(\tfrac{1}{N}\sum_{i}\xi^{1}_{i}\sigma_{i}\big)\Big)\\
&=\sum_{\boldsymbol{\sigma}\in\Sigma_{N}}\exp\Big(-tN\big(\tfrac{1}{N}\sum_{i}\sigma_{i}\big)^{p}+Nx\big(\tfrac{1}{N}\sum_{i}\sigma_{i}\big)\Big)=Z_{N,p}^{(CW)}(t,\boldsymbol{x}),
\end{split}
\]

where we used the invariance of the partition function under the transformation $\sigma_{i}\to\xi^{1}_{i}\sigma_{i}$. In this particular case we recover the Burgers hierarchy with viscosity parameter $1/N$: calling $x_{1}=x$ and $u_{1}(t,\boldsymbol{x})=u(t,\boldsymbol{x})$, the family (63) reduces to

\[
\partial_{t}u+\partial_{x}\Big(\frac{1}{N}\partial_{x}+u\Big)^{p-1}u=0.
\]

Within this framework, we generate multi-dimensional generalizations of the Burgers hierarchy; see Appendix B for further details and examples.

5.2 High-storage

Here we will study the $p$-spin DAMs in the high-load regime $\lim_{N\to\infty}\frac{K}{N^{p-1}}=\alpha>0$ for even $p$, where the model behaves as a complex system with global non-trivial properties. Let us start by observing that, in this case, the partition function related to the Hamiltonian (54) can also be written in the following form\footnote{Notice the slight abuse of notation in the expression $Z_{N,p,\boldsymbol{\xi},\alpha}(\beta)$: in the subscript, $\alpha$ is meant as the ratio $K/N^{p-1}$, by-passing the thermodynamic limit.}:

\begin{equation}
\begin{split}
Z_{N,p,\boldsymbol{\xi},\alpha}(\beta)&:=\sum_{\boldsymbol{\sigma}\in\Sigma_{N}}\exp\left(-\beta H_{N,p,\boldsymbol{\xi},\alpha}(\boldsymbol{\sigma})\right)\\
&=\sum_{\boldsymbol{\sigma}\in\Sigma_{N}}\exp\Bigg[\frac{\beta}{p!N^{p-1}}\sum_{\mu=1}^{K}\sum_{i_{1},\dots,i_{p}}^{N}\xi^{\mu}_{i_{1}}\dots\xi^{\mu}_{i_{p}}\sigma_{i_{1}}\dots\sigma_{i_{p}}\Bigg]\\
&=\sum_{\boldsymbol{\sigma}\in\Sigma_{N}}\exp\Bigg[\frac{\beta}{p!N^{p-1}}\sum_{\mu=1}^{K}\Big(\sum_{i_{1},\dots,i_{p/2}}^{N}\xi^{\mu}_{i_{1}}\dots\xi^{\mu}_{i_{p/2}}\sigma_{i_{1}}\dots\sigma_{i_{p/2}}\Big)^{2}\Bigg],
\end{split}
\tag{64}
\end{equation}

and, by applying the Hubbard-Stratonovich transformation, we get

\begin{equation}
Z_{N,p,\boldsymbol{\xi},\alpha}(\beta)=\sum_{\boldsymbol{\sigma}\in\Sigma_{N}}\prod_{\mu=1}^{K}\int d\tau_{\mu}\frac{e^{-\frac{\tau_{\mu}^{2}}{2}}}{\sqrt{2\pi}}\exp\Bigg(\sqrt{\frac{2\beta}{p!N^{p-1}}}\sum_{i_{1},\dots,i_{p/2}}^{N}\xi^{\mu}_{i_{1}}\dots\xi^{\mu}_{i_{p/2}}\sigma_{i_{1}}\dots\sigma_{i_{p/2}}\tau_{\mu}\Bigg).
\tag{65}
\end{equation}

Here, we used the index $\alpha$ rather than $K$ in order to distinguish between the partition functions of the low and high storage regimes.
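For completeness, the step from (64) to (65) relies on the Gaussian identity recalled below, applied once for each pattern $\mu$. The symbols $a$ and $X_{\mu}$ are shorthands introduced only for this sketch.

```latex
\begin{equation*}
\exp\!\Big(\frac{a^{2}}{2}\Big)=\int_{-\infty}^{+\infty}\frac{d\tau_\mu}{\sqrt{2\pi}}\,
\exp\!\Big(-\frac{\tau_\mu^{2}}{2}+a\,\tau_\mu\Big),
\qquad
a=\sqrt{\frac{2\beta}{p!\,N^{p-1}}}\;X_\mu,
\qquad
X_\mu=\sum_{i_1,\dots,i_{p/2}}^{N}\xi^{\mu}_{i_1}\cdots\xi^{\mu}_{i_{p/2}}\sigma_{i_1}\cdots\sigma_{i_{p/2}}.
\end{equation*}
```

With this choice $a^{2}/2=\frac{\beta}{p!N^{p-1}}X_{\mu}^{2}$, which is the $\mu$-th term in the exponent of (64); the price of linearizing the square is the introduction of one auxiliary Gaussian variable $\tau_{\mu}$ per pattern.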

Remark 12.

As standard in statistical mechanics of (complex) neural networks, we will assume that a single pattern is a candidate to be retrieved, say $\boldsymbol{\xi}^{1}$. Under this assumption, we can treat the Mattis magnetization $m:=m_{1}$ corresponding to the recalled pattern separately from those associated with the non-retrieved ones.

Definition 11.

Given $(t,\boldsymbol{x})\in\mathbb{R}^{4}$, the spacetime Guerra's interpolating partition function for the DAM model (in the high storage regime) reads as

\begin{align}
Z_{N,p,\boldsymbol{\xi},\alpha}(t,\boldsymbol{x}) &= \sum_{\boldsymbol{\sigma}\in\Sigma_{N}}\prod_{\mu=1}^{K}\int d\tau_{\mu}\frac{e^{-\frac{\tau_{\mu}^{2}}{2}}}{\sqrt{2\pi}}\exp\big(-H_{N,p,\boldsymbol{\xi},\alpha}(t,\boldsymbol{x})\big), \tag{66}\\
H_{N,p,\boldsymbol{\xi},\alpha}(t,\boldsymbol{x}) &= -\frac{tN}{2}m^{p}-\sqrt{\frac{t}{N^{p-1}}}\sum_{\mu=2}^{K}\sum_{i_{1},\dots,i_{p/2}}^{N}\xi_{i_{1}}^{\mu}\dots\xi^{\mu}_{i_{p/2}}\sigma_{i_{1}}\dots\sigma_{i_{p/2}}\tau_{\mu} \notag\\
&\quad-\sqrt{x}\sum_{i=1}^{N}\eta_{i}\sigma_{i}-\sqrt{N^{1-p/2}y}\sum_{\mu=1}^{K}\theta_{\mu}\tau_{\mu}+\frac{N^{1-p/2}}{2}(t-t_{0}+y)\sum_{\mu=1}^{K}\tau_{\mu}^{2}-\frac{N}{2}zm, \tag{67}
\end{align}

where $\boldsymbol{x}=(x,y,z)$.

Remark 13.

Clearly, the partition function of the original DAM model (65) is recovered by setting $t_{0}=\frac{2\beta}{p!}$ and $(t,x,y,z)=(\frac{2\beta}{p!},0,0,0)$.

In the case under consideration, the Boltzmann average w.r.t. the interpolating measure is

\begin{equation}
\omega_{t,\boldsymbol{x}}(O)=\frac{\sum_{\boldsymbol{\sigma}}\int d\mu(\boldsymbol{\tau})\,O(\boldsymbol{\sigma},\boldsymbol{\tau})B_{N,p,\boldsymbol{\xi},\alpha}(t,\boldsymbol{x})}{Z_{N,p,\boldsymbol{\xi},\alpha}(t,\boldsymbol{x})},
\tag{68}
\end{equation}

where $O(\boldsymbol{\sigma},\boldsymbol{\tau})$ is a generic observable in the configuration space of the system, and $B_{N,p,\boldsymbol{\xi},\alpha}$ is the generalized Boltzmann factor for the $p$-spin model, defined as

\[
\begin{split}
B_{N,p,\boldsymbol{\xi},\alpha}(t,\boldsymbol{x}):&=\exp\biggl(\frac{tN}{2}m^{p}+\sqrt{\frac{t}{N^{p-1}}}\sum_{\mu=2}^{K}\sum_{i_{1},\dots,i_{p/2}}^{N}\xi_{i_{1}}^{\mu}\dots\xi^{\mu}_{i_{p/2}}\sigma_{i_{1}}\dots\sigma_{i_{p/2}}\tau_{\mu}\\
&+\sqrt{x}\sum_{i=1}^{N}\eta_{i}\sigma_{i}+\sqrt{N^{1-p/2}y}\sum_{\mu=1}^{K}\theta_{\mu}\tau_{\mu}-\frac{N^{1-p/2}}{2}(t-t_{0}+y)\sum_{\mu=1}^{K}\tau_{\mu}^{2}+\frac{N}{2}zm\biggr).
\end{split}
\]

As we said in Sec. 1, the high storage regime of associative neural networks exhibits both ferromagnetic and spin-glass features. Thus, besides the usual Mattis magnetisations, we need the overlap for the two sets of relevant variables in the integral formulation of the partition function (65):

Definition 12.

The order parameters used to describe the macroscopic behavior of the model are the overlap $m$ (already defined in (8) and used to quantify the retrieval capability of the network), the replica overlap in the $\boldsymbol{\sigma}$ variables

\begin{equation}
q_{12}:=\frac{1}{N}\sum_{i=1}^{N}\sigma_{i}^{(1)}\sigma_{i}^{(2)},
\tag{69}
\end{equation}

and the replica overlap in the $\boldsymbol{\tau}$ variables

\begin{equation}
p_{12}:=\frac{1}{N^{p/2}}\sum_{\mu=1}^{K}\tau_{\mu}^{(1)}\tau_{\mu}^{(2)}.
\tag{70}
\end{equation}

With all these ingredients at our hand, we now move to the formulation of the PDE duality of DAMs in the high-storage limit.

Definition 13.

For all even $p\geq 2$, Guerra's action functional is defined as

\begin{equation}
S_{N,p,\alpha}(t,\boldsymbol{x}):=2A_{N,p,\alpha}(t,\boldsymbol{x})-x.
\tag{71}
\end{equation}
Lemma 4.

The partial derivatives of Guerra's action $S_{N,p,\alpha}(t,\boldsymbol{x})$ can be expressed in terms of the generalized expectations of the order parameters as

\begin{equation}
\begin{split}
\partial_{t}S_{N,p,\alpha}&=\langle m^{p}\rangle_{t,\boldsymbol{x}}-\langle p_{12}q^{p/2}_{12}\rangle_{t,\boldsymbol{x}},\\
\partial_{x}S_{N,p,\alpha}&=-\langle q_{12}\rangle_{t,\boldsymbol{x}},\\
\partial_{y}S_{N,p,\alpha}&=-\langle p_{12}\rangle_{t,\boldsymbol{x}},\\
\partial_{z}S_{N,p,\alpha}&=\langle m\rangle_{t,\boldsymbol{x}}.
\end{split}
\tag{72}
\end{equation}

The computation of the spacetime derivatives is lengthy but straightforward; it is reported in Appendix C.

In order to derive differential identities for the expectation values of the order parameters, we need to compute the spatial derivatives of a generic function of two replicas $O(\boldsymbol{\sigma}^{(1)},\boldsymbol{\sigma}^{(2)},\boldsymbol{\tau}^{(1)},\boldsymbol{\tau}^{(2)})$.

Proposition 5.

Let $\boldsymbol{\underline{\sigma}}=(\boldsymbol{\sigma}^{(1)},\boldsymbol{\sigma}^{(2)})$ and $\boldsymbol{\underline{\tau}}=(\boldsymbol{\tau}^{(1)},\boldsymbol{\tau}^{(2)})$ be the configurations of the 2-replicated system. Then

\begin{equation}
\begin{split}
\partial_{x}\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\rangle_{t,\boldsymbol{x}}&=\frac{N}{2}\sum_{a,b=1}^{2}\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})q_{ab}\rangle_{t,\boldsymbol{x}}-2N\sum_{a=1}^{2}\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})q_{a3}\rangle_{t,\boldsymbol{x}}-N\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\rangle_{t,\boldsymbol{x}}\\
&+3N\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})q_{34}\rangle_{t,\boldsymbol{x}},
\end{split}
\tag{73}
\end{equation}
\begin{equation}
\begin{split}
\partial_{y}\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\rangle_{t,\boldsymbol{x}}&=\frac{N}{2}\sum_{a,b=1}^{2}\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})p_{ab}\rangle_{t,\boldsymbol{x}}-2N\sum_{a=1}^{2}\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})p_{a3}\rangle_{t,\boldsymbol{x}}-N\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})p_{33}\rangle_{t,\boldsymbol{x}}\\
&+3N\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})p_{34}\rangle_{t,\boldsymbol{x}},
\end{split}
\tag{74}
\end{equation}

and

\begin{equation}
\partial_{z}\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\rangle_{t,\boldsymbol{x}}=N\left(\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})m\rangle_{t,\boldsymbol{x}}-\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\rangle_{t,\boldsymbol{x}}\langle m\rangle_{t,\boldsymbol{x}}\right).
\tag{75}
\end{equation}

The complete proof is given in Appendix D.

Corollary 3.

For all $l\in\mathbb{N}^{+}$, the following equality holds:

\begin{equation}
\partial_{x}\langle p_{12}q_{12}^{l}\rangle_{t,\boldsymbol{x}}=N\left(\langle p_{12}q_{12}^{l+1}\rangle_{t,\boldsymbol{x}}-4\langle p_{12}q_{12}^{l}q_{13}\rangle_{t,\boldsymbol{x}}+3\langle p_{12}q_{12}^{l}q_{34}\rangle_{t,\boldsymbol{x}}\right).
\tag{76}
\end{equation}
Proof.

The proof is a simple application of Eq. (73) with $O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})=p_{12}q_{12}^{l}$, using $q_{11}=q_{22}=1$ and the invariance of the averages under relabelling of the replicas. Thus

\[
\begin{split}
\partial_{x}\langle p_{12}q_{12}^{l}\rangle_{t,\boldsymbol{x}}&=\frac{N}{2}\left(\langle p_{12}q_{12}^{l}\rangle_{t,\boldsymbol{x}}+\langle p_{12}q_{12}^{l}q_{12}\rangle_{t,\boldsymbol{x}}+\langle p_{12}q_{12}^{l}q_{21}\rangle_{t,\boldsymbol{x}}+\langle p_{12}q_{12}^{l}\rangle_{t,\boldsymbol{x}}\right)\\
&-2N\left(\langle p_{12}q_{12}^{l}q_{13}\rangle_{t,\boldsymbol{x}}+\langle p_{12}q_{12}^{l}q_{23}\rangle_{t,\boldsymbol{x}}\right)-N\langle p_{12}q_{12}^{l}\rangle_{t,\boldsymbol{x}}+3N\langle p_{12}q_{12}^{l}q_{34}\rangle_{t,\boldsymbol{x}}\\
&=N\langle p_{12}q_{12}^{l+1}\rangle_{t,\boldsymbol{x}}-4N\langle p_{12}q_{12}^{l}q_{13}\rangle_{t,\boldsymbol{x}}+3N\langle p_{12}q_{12}^{l}q_{34}\rangle_{t,\boldsymbol{x}}.
\end{split}
\]

Also in this case, we will work under the RS assumption in order to simplify the computations by neglecting the fluctuations of the order parameters w.r.t. their expectation values.

Proposition 6.

The following equalities hold

$$\langle m^{p}\rangle_{t,\boldsymbol{x}}=\left(\frac{1}{N}\partial_{z}+\langle m\rangle_{t,\boldsymbol{x}}\right)^{p-1}\langle m\rangle_{t,\boldsymbol{x}},\tag{77}$$

$$\langle p_{12}q_{12}^{p/2}\rangle_{t,\boldsymbol{x}}=\left(\frac{1}{N}\partial_{x}+\langle q_{12}\rangle_{t,\boldsymbol{x}}\right)^{p/2}\langle p_{12}\rangle_{t,\boldsymbol{x}}+R_{N}^{(\frac{p}{2})}(t,\boldsymbol{x}),\tag{78}$$

where $R_{N}^{(\frac{p}{2})}(t,\boldsymbol{x})$ collects the terms involving the fluctuations of the order parameters, and thus vanishes in the $N\to\infty$ limit under the RS assumption.

Proof.

To simplify the notation, we drop the subscript $t,\boldsymbol{x}$ from the quenched averages. Equation (77) follows by iterating property (75), starting from $O(\boldsymbol{\sigma},\boldsymbol{\tau})=m(\boldsymbol{\sigma})^{p-1}$. Let us now observe that

$$\langle p_{12}q_{12}^{l}q_{13}\rangle=\left\langle p_{12}q_{12}^{l}\Delta(q_{13})\right\rangle+\langle p_{12}q_{12}^{l}\rangle\langle q_{13}\rangle=\langle p_{12}q_{12}^{l}\rangle\langle q_{13}\rangle+R^{(1,l)}_{N}(t,\boldsymbol{x}),$$

and

$$\langle p_{12}q_{12}^{l}q_{34}\rangle=\left\langle p_{12}q_{12}^{l}\Delta(q_{34})\right\rangle+\langle p_{12}q_{12}^{l}\rangle\langle q_{34}\rangle=\langle p_{12}q_{12}^{l}\rangle\langle q_{34}\rangle+R^{(2,l)}_{N}(t,\boldsymbol{x}),$$

where $\Delta(q_{ab}):=q_{ab}-\langle q_{ab}\rangle$, and $R^{(1,l)}_{N},R^{(2,l)}_{N}$ collect the contributions involving the fluctuations. Then, recalling Eq. (76), we can write

$$\begin{split}\langle p_{12}q_{12}^{l+1}\rangle&=\frac{1}{N}\partial_{x}\langle p_{12}q_{12}^{l}\rangle+4\langle p_{12}q_{12}^{l}\rangle\langle q_{13}\rangle-3\langle p_{12}q_{12}^{l}\rangle\langle q_{34}\rangle+R^{(l)}_{N}(t,\boldsymbol{x})=\\ &=\frac{1}{N}\partial_{x}\langle p_{12}q_{12}^{l}\rangle+\langle p_{12}q_{12}^{l}\rangle\langle q_{12}\rangle+R^{(l)}_{N}(t,\boldsymbol{x}),\end{split}$$

where $R^{(l)}_{N}(t,\boldsymbol{x}):=4R^{(1,l)}_{N}(t,\boldsymbol{x})-3R^{(2,l)}_{N}(t,\boldsymbol{x})$ and we used $\langle q_{34}\rangle=\langle q_{13}\rangle=\langle q_{12}\rangle$, which follows from the invariance under replica relabelling. The previous equation can thus be written as

$$\langle p_{12}q_{12}^{l+1}\rangle=\left(\frac{1}{N}\partial_{x}+\langle q_{12}\rangle\right)\langle p_{12}q_{12}^{l}\rangle+R^{(l)}_{N}(t,\boldsymbol{x}).\tag{79}$$

Iterating the procedure, we get

$$\langle p_{12}q_{12}^{l+1}\rangle=\left(\frac{1}{N}\partial_{x}+\langle q_{12}\rangle\right)^{l+1}\langle p_{12}\rangle+R_{N}^{(\frac{p}{2})}(t,\boldsymbol{x}),\tag{80}$$

where $R_{N}^{(\frac{p}{2})}(t,\boldsymbol{x})$ collects all the remainder terms of the previous expansions (and thus vanishes in the $N\to\infty$ limit under the RS assumption). Then, setting $l=p/2-1$, we get the thesis. ∎
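As an illustration of how the operator power in Eq. (78) unfolds, the case $p=4$ can be expanded explicitly. This is only a worked sketch (the choice $p=4$ and the shorthand $\langle\cdot\rangle$ for $\langle\cdot\rangle_{t,\boldsymbol{x}}$ are ours):

```latex
% Worked case p = 4 of the operator in Eq. (78); <.> stands for <.>_{t,x}
\begin{aligned}
\left(\frac{1}{N}\partial_{x}+\langle q_{12}\rangle\right)^{2}\langle p_{12}\rangle
&=\left(\frac{1}{N}\partial_{x}+\langle q_{12}\rangle\right)
  \left(\frac{1}{N}\partial_{x}\langle p_{12}\rangle+\langle q_{12}\rangle\langle p_{12}\rangle\right)\\
&=\frac{1}{N^{2}}\partial_{x}^{2}\langle p_{12}\rangle
 +\frac{1}{N}\Big(\langle p_{12}\rangle\,\partial_{x}\langle q_{12}\rangle
 +2\langle q_{12}\rangle\,\partial_{x}\langle p_{12}\rangle\Big)
 +\langle q_{12}\rangle^{2}\langle p_{12}\rangle .
\end{aligned}
```

Provided the derivatives stay bounded, only the last term survives formally as $N\to\infty$, which is the origin of the product $\bar{q}^{p/2}\bar{p}$ appearing in the limiting equations.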

Now we can use all the information obtained to build a PDE describing the thermodynamics of the DAM models. Indeed, recalling the temporal derivative of Guerra's action (72) and using the result obtained in Prop. 6, we have

$$\partial_{t}S_{N,p,\alpha}=\left(\frac{1}{N}\partial_{z}+\langle m\rangle_{t,\boldsymbol{x}}\right)^{p-1}\langle m\rangle_{t,\boldsymbol{x}}-\left(\frac{1}{N}\partial_{x}+\langle q_{12}\rangle_{t,\boldsymbol{x}}\right)^{p/2}\langle p_{12}\rangle_{t,\boldsymbol{x}}-R_{N}^{(\frac{p}{2})}(t,\boldsymbol{x}).\tag{81}$$

Finally, taking the spatial derivatives of this expression and denoting $\boldsymbol{D}_{N}(t,\boldsymbol{x})=-\boldsymbol{\nabla}R_{N}^{(p/2)}$, we have

$$\begin{split}-\partial_{t}\langle q_{12}\rangle_{t,\boldsymbol{x}}-\partial_{x}\Big(\frac{1}{N}\partial_{z}+\langle m\rangle_{t,\boldsymbol{x}}\Big)^{p-1}\langle m\rangle_{t,\boldsymbol{x}}+\partial_{x}\Big(\frac{1}{N}\partial_{x}+\langle q_{12}\rangle_{t,\boldsymbol{x}}\Big)^{p/2}\langle p_{12}\rangle_{t,\boldsymbol{x}}&=D_{N,x},\\[1ex] -\partial_{t}\langle p_{12}\rangle_{t,\boldsymbol{x}}-\partial_{y}\Big(\frac{1}{N}\partial_{z}+\langle m\rangle_{t,\boldsymbol{x}}\Big)^{p-1}\langle m\rangle_{t,\boldsymbol{x}}+\partial_{y}\Big(\frac{1}{N}\partial_{x}+\langle q_{12}\rangle_{t,\boldsymbol{x}}\Big)^{p/2}\langle p_{12}\rangle_{t,\boldsymbol{x}}&=D_{N,y},\\[1ex] \partial_{t}\langle m\rangle_{t,\boldsymbol{x}}-\partial_{z}\Big(\frac{1}{N}\partial_{z}+\langle m\rangle_{t,\boldsymbol{x}}\Big)^{p-1}\langle m\rangle_{t,\boldsymbol{x}}+\partial_{z}\Big(\frac{1}{N}\partial_{x}+\langle q_{12}\rangle_{t,\boldsymbol{x}}\Big)^{p/2}\langle p_{12}\rangle_{t,\boldsymbol{x}}&=D_{N,z}.\end{split}\tag{82}$$

The l.h.s. of the system of PDEs constitutes the (3+1)-dimensional DAM generalization of the Burgers hierarchy structure. Similarly to the Derrida model case, at finite $N$ we have a source term on the r.h.s., which vanishes in the limit $N\to\infty$ under the RS assumption on the order parameters. In this case, we can analyse the thermodynamic limit and describe the equilibrium dynamics of the model.

Theorem 3.

The high-storage regime for the DAM models under the RS assumption in the thermodynamic limit can be described by the following system of partial differential equations:

$$\begin{cases}\partial_{t}\bar{q}+\partial_{x}\bar{m}^{p}-\partial_{x}\bar{q}^{p/2}\bar{p}&=0\\ \partial_{t}\bar{p}+\partial_{y}\bar{m}^{p}-\partial_{y}\bar{q}^{p/2}\bar{p}&=0\\ \partial_{t}\bar{m}-\partial_{z}\bar{m}^{p}+\partial_{z}\bar{q}^{p/2}\bar{p}&=0\end{cases},\tag{83}$$

with the initial conditions

$$\begin{cases}\bar{q}(0,\boldsymbol{x})&=\mathbb{E}_{\eta}\left[\tanh^{2}\left(\sqrt{x}\eta+\frac{z}{2}\right)\right]\\ \bar{p}(0,\boldsymbol{x})&=\begin{cases}\frac{\alpha y}{(1+y-t_{0})^{2}}&\text{if }p=2\\ \alpha y&\text{if }p=2k\text{ with }k=2,3,\dots\end{cases}\\ \bar{m}(0,\boldsymbol{x})&=\mathbb{E}_{\eta}\left[\tanh\left(\sqrt{x}\eta+\frac{z}{2}\right)\right]\end{cases},\tag{84}$$

where $\mathbb{E}_{\eta}$ is the Gaussian average over the variable $\eta$.

Proof.

First, let us call

$$\bar{m}(t,\boldsymbol{x})=\lim_{N\to\infty}\langle m\rangle_{t,\boldsymbol{x}},\quad\bar{q}(t,\boldsymbol{x})=\lim_{N\to\infty}\langle q_{12}\rangle_{t,\boldsymbol{x}},\quad\bar{p}(t,\boldsymbol{x})=\lim_{N\to\infty}\langle p_{12}\rangle_{t,\boldsymbol{x}},$$

the expectation values of the order parameters in the thermodynamic limit. Taking $N\to\infty$ in (82) and recalling that the source contributions $\boldsymbol{D}_{N}(t,\boldsymbol{x})$ vanish in this limit under the RS assumption, we arrive at the PDE system (83). Let us now find the initial conditions (84). To do this, we start by computing the interpolating partition function at $t=0$:

$$\begin{split}Z_{N,p,\boldsymbol{\xi},\alpha}(0,\boldsymbol{x})&=\sum_{\boldsymbol{\sigma}}\int d\mu(\boldsymbol{\tau})\exp\biggl(\sqrt{x}\sum_{i=1}^{N}\eta_{i}\sigma_{i}+\sqrt{\frac{y}{N^{p/2-1}}}\sum_{\mu=1}^{K}\theta_{\mu}\tau_{\mu}-\frac{y-t_{0}}{2N^{p/2-1}}\sum_{\mu=1}^{K}\tau_{\mu}^{2}+\frac{z}{2}\sum_{i=1}^{N}\sigma_{i}\biggr)=\\ &=\sum_{\boldsymbol{\sigma}}\exp\Big(\sum_{i=1}^{N}(\sqrt{x}\eta_{i}+\tfrac{1}{2}z)\sigma_{i}\Big)\int d\mu(\boldsymbol{\tau})\exp\biggl(\sqrt{\frac{y}{N^{p/2-1}}}\sum_{\mu=1}^{K}\theta_{\mu}\tau_{\mu}-\frac{y-t_{0}}{2N^{p/2-1}}\sum_{\mu=1}^{K}\tau_{\mu}^{2}\biggr)=\\ &=\prod_{i=1}^{N}2\cosh\left(\sqrt{x}\eta_{i}+\frac{z}{2}\right)\prod_{\mu=1}^{K}\frac{1}{\sqrt{N^{1-p/2}(y-t_{0})+1}}\exp\left(\frac{N^{1-p/2}y\theta_{\mu}^{2}}{2\left(N^{1-p/2}(y-t_{0})+1\right)}\right).\end{split}$$

By using the definition of the interpolating statistical pressure (4), we see that

$$\begin{split}A_{N,p,\alpha}(0,\boldsymbol{x})&=\frac{1}{N}\sum_{i=1}^{N}\mathbb{E}\log 2\cosh\left(\sqrt{x}\eta_{i}+\frac{z}{2}\right)+\frac{1}{N}\sum_{\mu=1}^{K}\mathbb{E}\left(\frac{N^{1-p/2}y\theta_{\mu}^{2}}{2\left(N^{1-p/2}(y-t_{0})+1\right)}\right)\\ &\quad-\frac{K}{2N}\log\left(N^{1-p/2}(y-t_{0})+1\right)=\\ &=\mathbb{E}\log 2\cosh\left(\sqrt{x}\eta+\frac{z}{2}\right)+\frac{K}{N^{p/2}}\frac{y}{2\left(N^{1-p/2}(y-t_{0})+1\right)}\\ &\quad-\frac{K}{2N}\log\left(N^{1-p/2}(y-t_{0})+1\right),\end{split}$$

where we used the fact that the $\eta_{i}$'s are i.i.d. random variables and $\mathbb{E}\theta_{\mu}^{2}=1$ for all $\mu=1,\dots,K$. Now, recalling Eq. (71), we can straightforwardly derive the initial conditions for the order parameters according to (72). First

$$\begin{split}\bar{q}(0,\boldsymbol{x})&=\lim_{N\to\infty}\bigl(-\partial_{x}S_{N,p,\alpha}(0,\boldsymbol{x})\bigr)=-\partial_{x}\Big[2\,\mathbb{E}_{\eta}\log 2\cosh\left(\sqrt{x}\eta+\frac{z}{2}\right)-x\Big]=\\ &=\mathbb{E}_{\eta}\left[\tanh^{2}\left(\sqrt{x}\eta+\frac{z}{2}\right)\right].\end{split}\tag{85}$$

Analogously, we have

$$\begin{split}\bar{p}(0,\boldsymbol{x})&=\lim_{N\to\infty}\bigl(-\partial_{y}S_{N,p,\alpha}(0,\boldsymbol{x})\bigr)=\\ &=-\lim_{N\to\infty}\partial_{y}\Big(\frac{K}{N^{p/2}}\frac{y}{1+N^{1-p/2}(y-t_{0})}-\frac{K}{N}\log\big[1+N^{1-p/2}(y-t_{0})\big]\Big)=\\ &=-\lim_{N\to\infty}\left(-\frac{K}{N^{p-1}}\frac{y}{[1+N^{1-p/2}(y-t_{0})]^{2}}\right)=\\ &=\begin{cases}\frac{\alpha y}{[1+(y-t_{0})]^{2}}\quad&\text{if }p=2\\ \alpha y\quad&\text{if }p=2k\text{ and }k=2,3,\dots\end{cases}.\end{split}\tag{86}$$

Finally

$$\bar{m}(0,\boldsymbol{x})=\lim_{N\to\infty}\partial_{z}S_{N,p,\alpha}(0,\boldsymbol{x})=\partial_{z}\,2\,\mathbb{E}_{\eta}\log 2\cosh\left(\sqrt{x}\eta+\frac{z}{2}\right)=\mathbb{E}_{\eta}\tanh\left(\sqrt{x}\eta+\frac{z}{2}\right).\tag{87}$$

This concludes the proof. ∎
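The initial conditions for $\bar{q}$ and $\bar{m}$ can be cross-checked numerically. The sketch below is only a sanity check under stated assumptions: it takes $S=2A-x$ (as recalled in Appendix C), keeps only the $\mathbb{E}_{\eta}\log 2\cosh$ term of $A$ at $t=0$ (the only one depending on $x$ and $z$), and compares finite-difference derivatives with the closed forms in (84). The helper names, the quadrature grid and the step `h` are illustrative choices of ours.

```python
import math

def gaussian_average(f, n=401, cutoff=10.0):
    """Approximate E_eta[f(eta)], eta ~ N(0,1), on a uniform grid with normalized weights."""
    step = 2.0 * cutoff / (n - 1)
    nodes = [-cutoff + i * step for i in range(n)]
    weights = [math.exp(-0.5 * e * e) for e in nodes]
    total = sum(weights)
    return sum(w * f(e) for w, e in zip(weights, nodes)) / total

def log2cosh(a):
    """Numerically stable log(2 cosh a)."""
    a = abs(a)
    return a + math.log1p(math.exp(-2.0 * a))

def action_one_body(x, z):
    """(x, z)-dependent part of S(0, x) = 2 A(0, x) - x (assumed form)."""
    return 2.0 * gaussian_average(lambda e: log2cosh(math.sqrt(x) * e + z / 2)) - x

def check_initial_conditions(x=0.7, z=0.4, h=1e-5):
    # q(0, x) = -dS/dx and m(0, x) = +dS/dz via central finite differences
    q_fd = -(action_one_body(x + h, z) - action_one_body(x - h, z)) / (2 * h)
    m_fd = (action_one_body(x, z + h) - action_one_body(x, z - h)) / (2 * h)
    arg = lambda e: math.sqrt(x) * e + z / 2
    q_exact = gaussian_average(lambda e: math.tanh(arg(e)) ** 2)
    m_exact = gaussian_average(lambda e: math.tanh(arg(e)))
    return q_fd, q_exact, m_fd, m_exact
```

Calling `check_initial_conditions()` returns the finite-difference and closed-form values side by side; the two members of each pair should agree up to discretization error.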

Proposition 7.

The system of PDEs (83) can be rewritten as a nonlinear wave equation:

$$\partial_{t}\boldsymbol{\phi}+(\boldsymbol{v}\cdot\boldsymbol{\nabla})\boldsymbol{\phi}=0,\tag{88}$$

where $\boldsymbol{\phi}:=(\bar{q},\bar{p},\bar{m})$ is the vector of the order parameters and $\boldsymbol{v}:=\left(-\frac{p}{2}\bar{q}^{p/2-1}\bar{p},-\bar{q}^{p/2},-p\bar{m}^{p-1}\right)$ is the effective velocity.

Proof.

We prove equation (88) for the first component, as the others follow accordingly. Let us define the function $G(\boldsymbol{\phi})=\bar{m}^{p}-\bar{q}^{p/2}\bar{p}$, so that the PDE for the order parameter $\bar{q}$ can be rewritten as

$$\partial_{t}\bar{q}+\partial_{x}G(\boldsymbol{\phi})=0.$$

The $x$-derivative of the function $G$ is straightforwardly computed:

$$\partial_{x}G(\boldsymbol{\phi})=\partial_{x}(\bar{m}^{p}-\bar{q}^{p/2}\bar{p})=p\bar{m}^{p-1}\partial_{x}\bar{m}-\frac{p}{2}\bar{q}^{p/2-1}\bar{p}\ \partial_{x}\bar{q}-\bar{q}^{p/2}\partial_{x}\bar{p}.$$

Now

$$\partial_{x}\bar{p}=\partial_{x}\left(-\lim_{N\to\infty}\partial_{y}S_{N,p,\alpha}(t,\boldsymbol{x})\right)=\partial_{y}\left(-\lim_{N\to\infty}\partial_{x}S_{N,p,\alpha}(t,\boldsymbol{x})\right)=\partial_{y}\bar{q}.$$

In the same way

$$\partial_{x}\bar{m}=\partial_{x}\left(\lim_{N\to\infty}\partial_{z}S_{N,p,\alpha}(t,\boldsymbol{x})\right)=\partial_{z}\left(\lim_{N\to\infty}\partial_{x}S_{N,p,\alpha}(t,\boldsymbol{x})\right)=-\partial_{z}\bar{q}.$$

With these results, we have

$$\partial_{x}G(\boldsymbol{\phi})=-\frac{p}{2}\bar{q}^{p/2-1}\bar{p}\ \partial_{x}\bar{q}-\bar{q}^{p/2}\partial_{y}\bar{q}-p\bar{m}^{p-1}\partial_{z}\bar{q}=(v_{x}\partial_{x}+v_{y}\partial_{y}+v_{z}\partial_{z})\bar{q}=(\boldsymbol{v}\cdot\boldsymbol{\nabla})\bar{q}.$$

This leads to

$$\partial_{t}\bar{q}+(\boldsymbol{v}\cdot\boldsymbol{\nabla})\bar{q}=0.$$

∎

Proposition 8.

The equilibrium dynamics of the DAM models is given by the set of self-consistency equations

$$\begin{split}&\bar{q}=\mathbb{E}\left[\tanh^{2}\left(\sqrt{\beta^{\prime}\frac{p}{2}\bar{q}^{p/2-1}\bar{p}}\,\eta+\beta^{\prime}\frac{p}{2}\bar{m}^{p-1}\right)\right],\\ &\bar{m}=\mathbb{E}\left[\tanh\left(\sqrt{\beta^{\prime}\frac{p}{2}\bar{q}^{p/2-1}\bar{p}}\,\eta+\beta^{\prime}\frac{p}{2}\bar{m}^{p-1}\right)\right],\\ &\bar{p}=\begin{cases}\frac{\alpha\beta\bar{q}}{[1-\beta(1-\bar{q})]^{2}},&\text{if }p=2\\ \alpha\beta^{\prime}\bar{q}^{p/2}&\text{if }p=2k\text{ with }k=2,3,\dots\end{cases}\end{split}\tag{89}$$

where $\beta^{\prime}:=\frac{2\beta}{p!}$.

Proof.

In order to prove our assertion, we use the vector PDE (88), whose solution can be given in implicit form as

$$\boldsymbol{\phi}(t,\boldsymbol{x})=\boldsymbol{\phi}_{0}(\boldsymbol{x}-\boldsymbol{v}t),$$

where $\boldsymbol{\phi}_{0}(\boldsymbol{x})$ is the initial profile given by the conditions (84). For the first component, we have

$$\begin{split}\bar{q}(t,\boldsymbol{x})&=\phi_{0,x}(\boldsymbol{x}-\boldsymbol{v}t)=\mathbb{E}_{\eta}\tanh^{2}\left(\sqrt{x-v_{x}t}\,\eta+\frac{z-v_{z}t}{2}\right)=\\ &=\mathbb{E}_{\eta}\tanh^{2}\left(\sqrt{x+\frac{p}{2}\bar{q}^{p/2-1}\bar{p}t}\,\eta+\frac{z+p\bar{m}^{p-1}t}{2}\right).\end{split}\tag{90}$$

Analogously,

$$\bar{m}(t,\boldsymbol{x})=\phi_{0,z}(\boldsymbol{x}-\boldsymbol{v}t)=\mathbb{E}_{\eta}\tanh\left(\sqrt{x+\frac{p}{2}\bar{q}^{p/2-1}\bar{p}t}\,\eta+\frac{z+p\bar{m}^{p-1}t}{2}\right).\tag{91}$$

Finally, if $p=2$, we have

$$\bar{p}(t,\boldsymbol{x})=\phi_{0,y}(\boldsymbol{x}-\boldsymbol{v}t)=\frac{\alpha(y-v_{y}t)}{(1+y-v_{y}t-t_{0})^{2}}=\frac{\alpha(y+\bar{q}t)}{(1+y+\bar{q}t-t_{0})^{2}},\tag{92}$$

while, for $p=2k$ with $k\geq 2$, the same order parameter satisfies the self-consistency equation

$$\bar{p}(t,\boldsymbol{x})=\phi_{0,y}(\boldsymbol{x}-\boldsymbol{v}t)=\alpha(y-v_{y}t)=\alpha(y+\bar{q}^{p/2}t).\tag{93}$$

Recalling that the thermodynamics of the DAM models is reproduced when $t=t_{0}=\frac{2\beta}{p!}$ and $\boldsymbol{x}=\boldsymbol{0}$, we easily get the thesis. ∎
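For the reader's convenience, the final substitution can be spelled out. The following is a worked sketch that uses only the velocity $\boldsymbol{v}$ of Prop. 7 and the initial profile (84), evaluated at $\boldsymbol{x}=\boldsymbol{0}$ and $t=t_{0}=\beta^{\prime}$ (for $p=2$ one has $\beta^{\prime}=\beta$):

```latex
% Evaluation of the implicit solution at x = 0, t = t_0 = beta'
\begin{aligned}
\text{argument of }\tanh:\quad
&\sqrt{\tfrac{p}{2}\bar{q}^{p/2-1}\bar{p}\,t_{0}}\;\eta+\tfrac{p}{2}\bar{m}^{p-1}t_{0}
 =\sqrt{\beta^{\prime}\tfrac{p}{2}\bar{q}^{p/2-1}\bar{p}}\;\eta+\beta^{\prime}\tfrac{p}{2}\bar{m}^{p-1},\\
p=2:\quad
&\bar{p}=\frac{\alpha\,\bar{q}\,t_{0}}{(1+\bar{q}\,t_{0}-t_{0})^{2}}
 =\frac{\alpha\beta\bar{q}}{[1-\beta(1-\bar{q})]^{2}},\\
p=2k,\ k\geq 2:\quad
&\bar{p}=\alpha\,(0-v_{y}t_{0})=\alpha\beta^{\prime}\bar{q}^{p/2},
 \qquad v_{y}=-\bar{q}^{p/2}.
\end{aligned}
```

These three lines reproduce, respectively, the argument of the hyperbolic tangents and the two branches of $\bar{p}$ in (89).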

Remark 14.

We stress that the self-consistency equations in Prop. 8 are in agreement with those obtained by Gardner in gardner1987 by means of the replica trick.

Figure 2: Phase diagram for $p=4$ (panel a), $p=6$ (panel b), $p=8$ (panel c) and $p=10$ (panel d), in the space of the control parameters $\alpha$ and $\beta^{\prime}=2\beta/p!$. Region I, delimited by the solid black line, is the retrieval region, while regions II and III are the spin-glass and the ergodic phases, respectively. The dashed line is the prolongation of the spin-glass phase inside the retrieval region. Finally, the lighter dotted line inside region I identifies the portion of the parameter space in which the retrieval states are global minima of the free energy. Notice that the indentation visible in the transition line delimiting the retrieval phase is a spurious effect due to the RS approximation albanese2021 .

By solving the self-consistency equations (89) numerically with a fixed-point method for a given $p$ and tuning the parameters $T$ and $\alpha$, we obtain the phase diagrams shown in Fig. 2. As expected, the diagrams exhibit three different regions:

  • For high levels of noise $T$, whatever the value of the storage $\alpha$, the only stable solution is $\bar{m}=0$, $\bar{q}=0$, thus the system is ergodic (III);

  • At lower temperatures and with relatively high load, the system exhibits spin-glass behavior (II), and the solution is characterized by $\bar{m}=0$ and $\bar{q}\neq 0$;

  • For relatively small values of $\alpha$ and $T$, we have $\bar{m},\bar{q}\neq 0$ and the system is located in the retrieval phase (I). In this situation, the system behaves as an associative neural network spontaneously performing pattern recognition. In particular, the retrieval region can be further split into a pure retrieval region, where pure states are global minima of the free energy, and a mixed retrieval region, where pure states are local minima, yet their basin of attraction is large enough for the system to end up there if properly stimulated.

Thus, by increasing $p$ we incur higher costs in terms of resources, since the number of connection weights to be properly set grows as $\binom{N}{p}$, but we also gain on a coarse scale, since the number of storable patterns grows as $K\sim N^{p-1}$, as well as on a fine scale, since the critical load $\alpha_{c}$ also increases with $p$.
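The fixed-point method mentioned above for Eq. (89) can be sketched as follows for $p=2k$ with $k\geq 2$. This is a minimal illustration, not the code used to produce Fig. 2: the function names, the damping factor, the quadrature grid and the initial guess $(m_{0},q_{0})=(1,1)$ are our own assumptions.

```python
import math

def gaussian_nodes(n=201, cutoff=8.0):
    """Uniform grid with normalized weights approximating a standard Gaussian average."""
    step = 2.0 * cutoff / (n - 1)
    nodes = [-cutoff + i * step for i in range(n)]
    raw = [math.exp(-0.5 * eta * eta) for eta in nodes]
    total = sum(raw)
    return nodes, [w / total for w in raw]

def solve_rs(p, alpha, beta_prime, m0=1.0, q0=1.0, damping=0.5,
             tol=1e-10, max_iter=10000):
    """Damped fixed-point iteration of Eq. (89) for even p >= 4; returns (m, q, p_bar)."""
    if p < 4 or p % 2:
        raise ValueError("this sketch only covers even p >= 4")
    nodes, weights = gaussian_nodes()
    m, q = m0, q0
    for _ in range(max_iter):
        p_bar = alpha * beta_prime * q ** (p // 2)            # last line of (89)
        noise = math.sqrt(beta_prime * (p / 2) * q ** (p // 2 - 1) * p_bar)
        field = beta_prime * (p / 2) * m ** (p - 1)
        th = [math.tanh(noise * eta + field) for eta in nodes]
        m_new = sum(w * t for w, t in zip(weights, th))       # E[tanh(...)]
        q_new = sum(w * t * t for w, t in zip(weights, th))   # E[tanh^2(...)]
        if abs(m_new - m) + abs(q_new - q) < tol:
            m, q = m_new, q_new
            break
        # damped update, an illustrative choice to stabilize the iteration
        m = damping * m + (1.0 - damping) * m_new
        q = damping * q + (1.0 - damping) * q_new
    return m, q, alpha * beta_prime * q ** (p // 2)
```

Scanning `alpha` and `beta_prime` on a grid and classifying each solution by whether `m` and `q` vanish would give a rough picture of the three regions listed above; locating the transition lines precisely would additionally require comparing free energies, which this sketch does not do.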

6 Conclusions

In this work we focused on DAMs, which are neural-network models widely used for pattern recognition tasks and characterized by high-order (higher than quadratic) interactions between the constituting neurons. Extensive empirical evidence has shown that these models significantly outperform non-dense networks (displaying only quadratic interactions), especially in their ability to correctly recognize adversarial or extremely noisy examples Krotov2016DenseAM ; Krotov2018DAMS ; AABCF-PRL2020 ; AD-EPJ2020 , hence making these models particularly suitable for detecting and coping with malicious attacks. From the theoretical side, results are sparse and mainly based on the possibility of recasting these networks as spin-glass-like models with spins interacting $p$-wise; these models can, in turn, be effectively addressed by tools stemming from the statistical mechanics of disordered systems (e.g., Bovier-JPA1994 ; AABF-NN2020 ). Here, we pursue this route and develop analytical techniques for their investigation. More precisely, we translate the original statistical-mechanical problem into an analytical-mechanical one where control parameters play the role of spacetime coordinates, the free energy plays the role of an action, and the macroscopic observables that assess whether the system can be used for pattern recognition tasks play the role of effective velocities and are shown to fulfil a set of nonlinear partial differential equations. In this framework, transitions between different regimes (e.g., between a region in the control parameter space where the system performs correctly and another one where information processing capabilities are lost) appear as the emergence of shock waves.

A main advantage of this route is that it allows for rigorous investigations in a field where most knowledge is based on (pseudo) heuristic approaches, with a wide set of already available methods and strategies to rely upon. Further, by bridging two different perspectives, the statistical-mechanical and the analytical one, we anticipate a cross-fertilization that may lead to a deeper comprehension of the system's subtle mechanisms and ultimately to progress in the development of a complete theory for (deep) machine learning.

Acknowledgments

E.A. and A.F. would like to thank A. Moro for useful discussions. The authors acknowledge Sapienza University of Rome for financial support (RM120172B8066CB0, RM12117A8590B3FA, AR12117A623B0114).

Appendix A Proof of Lemma 1

Proof.

First of all, we compute the temporal derivative:

$$\begin{split}\partial_{t}A_{N,p}(t,x)&=\frac{1}{N}\mathbb{E}_{\boldsymbol{J}}\frac{1}{Z_{N,p,\boldsymbol{J}}(t,x)}\sum_{\boldsymbol{\sigma}}\sqrt{\frac{p!}{2N^{p-1}}}\frac{1}{2\sqrt{t}}\sum_{1\leq i_{1}<\dots<i_{p}\leq N}J_{i_{1}\dots i_{p}}\sigma_{i_{1}}\dots\sigma_{i_{p}}B_{N,p,\boldsymbol{J}}(t,x)=\\ &=\frac{1}{N}\sqrt{\frac{p!}{2N^{p-1}}}\frac{1}{2\sqrt{t}}\sum_{1\leq i_{1}<\dots<i_{p}\leq N}\mathbb{E}_{\boldsymbol{J}}J_{i_{1}\dots i_{p}}\omega_{t,x}(\sigma_{i_{1}}\dots\sigma_{i_{p}}).\end{split}\tag{94}$$

Here, we can use the Wick-Isserlis theorem for normally distributed random variables, ensuring that $\mathbb{E}_{\boldsymbol{J}}J_{l}f(\boldsymbol{J})=\mathbb{E}_{\boldsymbol{J}}\partial_{J_{l}}f(\boldsymbol{J})$ for each (sufficiently regular) function $f$ of the quenched disorder $\boldsymbol{J}$. Thus

$$\begin{split}\partial_{t}A_{N,p}(t,x)&=\frac{1}{N}\sqrt{\frac{p!}{2N^{p-1}}}\frac{1}{2\sqrt{t}}\sum_{1\leq i_{1}<\dots<i_{p}\leq N}\mathbb{E}_{\boldsymbol{J}}\partial_{J_{i_{1}\dots i_{p}}}\omega_{t,x}(\sigma_{i_{1}}\dots\sigma_{i_{p}})=\\ &=\frac{1}{N}\sqrt{\frac{p!}{2N^{p-1}}}\frac{1}{2\sqrt{t}}\sqrt{\frac{tp!}{2N^{p-1}}}\sum_{1\leq i_{1}<\dots<i_{p}\leq N}\mathbb{E}_{\boldsymbol{J}}\bigl(\omega_{t,x}(1)-\omega_{t,x}(\sigma_{i_{1}}\dots\sigma_{i_{p}})^{2}\bigr)=\\ &=\frac{p!}{4N^{p}}\sum_{1\leq i_{1}<\dots<i_{p}\leq N}\bigl(1-\mathbb{E}_{\boldsymbol{J}}\omega_{t,x}(\sigma_{i_{1}}\dots\sigma_{i_{p}})^{2}\bigr).\end{split}\tag{95}$$

The non-trivial contribution in the round brackets in the last equality can be expressed in terms of the overlap order parameter. Indeed

$$\begin{split}\mathbb{E}_{\boldsymbol{J}}\omega_{t,x}(\sigma_{i_{1}}\dots\sigma_{i_{p}})^{2}&=\mathbb{E}_{\boldsymbol{J}}\omega_{t,x}^{(1)}(\sigma_{i_{1}}^{(1)}\dots\sigma_{i_{p}}^{(1)})\ \omega_{t,x}^{(2)}(\sigma_{i_{1}}^{(2)}\dots\sigma_{i_{p}}^{(2)})=\\ &=\mathbb{E}_{\boldsymbol{J}}\Omega^{(2)}_{t,x}(\sigma_{i_{1}}^{(1)}\sigma_{i_{1}}^{(2)}\dots\sigma_{i_{p}}^{(1)}\sigma_{i_{p}}^{(2)})=\\ &=\langle\sigma_{i_{1}}^{(1)}\sigma_{i_{1}}^{(2)}\dots\sigma_{i_{p}}^{(1)}\sigma_{i_{p}}^{(2)}\rangle_{t,x},\end{split}\tag{96}$$

where $1,2$ are the replica indices. Now, using Rem. 2, in the thermodynamic limit the following equality holds:

$$\begin{split}\partial_{t}A_{N,p}(t,x)&=\frac{1}{4N^{p}}\sum_{i_{1},\dots,i_{p}=1}^{N}\bigl(1-\langle\sigma_{i_{1}}^{(1)}\sigma_{i_{1}}^{(2)}\dots\sigma_{i_{p}}^{(1)}\sigma_{i_{p}}^{(2)}\rangle_{t,x}\bigr)=\\ &=\frac{1}{4}\Bigl(1-\Bigl\langle\Bigl(\frac{1}{N}\sum_{i=1}^{N}\sigma_{i}^{(1)}\sigma_{i}^{(2)}\Bigr)^{p}\Bigr\rangle_{t,x}\Bigr)=\frac{1}{4}\bigl(1-\langle q_{12}^{p}\rangle_{t,x}\bigr).\end{split}\tag{97}$$

Concerning the spatial derivative, we proceed in the same way:

$$\begin{split}\partial_{x}A_{N,p}(t,x)&=\frac{1}{N}\mathbb{E}_{\boldsymbol{J}}\frac{1}{Z_{N,p,\boldsymbol{J}}(t,x)}\sum_{\boldsymbol{\sigma}}\frac{1}{2\sqrt{x}}\sum_{i=1}^{N}J_{i}\sigma_{i}B_{N,p,\boldsymbol{J}}(t,x)=\\ &=\frac{1}{2N\sqrt{x}}\sum_{i=1}^{N}\mathbb{E}_{\boldsymbol{J}}J_{i}\omega_{t,x}(\sigma_{i})=\frac{1}{2N\sqrt{x}}\sum_{i=1}^{N}\mathbb{E}_{\boldsymbol{J}}\partial_{J_{i}}\omega_{t,x}(\sigma_{i})=\\ &=\frac{1}{2N}\sum_{i=1}^{N}\bigl(1-\mathbb{E}_{\boldsymbol{J}}\omega_{t,x}(\sigma_{i})^{2}\bigr).\end{split}\tag{98}$$

Also in this case, we express $\mathbb{E}_{\boldsymbol{J}}\omega_{t,x}(\sigma_{i})^{2}=\langle\sigma_{i}^{(1)}\sigma_{i}^{(2)}\rangle_{t,x}$, thus

$$\partial_{x}A_{N,p}(t,x)=\frac{1}{2}\bigl(1-\langle q_{12}\rangle_{t,x}\bigr).\tag{99}$$

By simply exploiting Def. 6 and Rem. 2 we get the thesis. ∎
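The Gaussian integration-by-parts identity $\mathbb{E}[Jf(J)]=\mathbb{E}[\partial_{J}f(J)]$ used in the proof above can be checked numerically in one dimension. The sketch below is purely illustrative: the test function $f(J)=\tanh(aJ+b)$, the parameter values and the quadrature grid are our own choices.

```python
import math

def gaussian_average(f, n=401, cutoff=10.0):
    """Approximate E[f(J)], J ~ N(0,1), on a uniform grid with normalized weights."""
    step = 2.0 * cutoff / (n - 1)
    nodes = [-cutoff + i * step for i in range(n)]
    weights = [math.exp(-0.5 * j * j) for j in nodes]
    total = sum(weights)
    return sum(w * f(j) for w, j in zip(weights, nodes)) / total

def wick_sides(a=0.8, b=0.3):
    """Return (E[J f(J)], E[f'(J)]) for the test function f(J) = tanh(a J + b)."""
    f = lambda j: math.tanh(a * j + b)
    df = lambda j: a * (1.0 - math.tanh(a * j + b) ** 2)
    lhs = gaussian_average(lambda j: j * f(j))
    rhs = gaussian_average(df)
    return lhs, rhs
```

The two returned numbers should coincide up to quadrature error, mirroring the step from (94) to (95) where the factor $J_{i_{1}\dots i_{p}}$ is traded for a derivative.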

Appendix B Particular cases of low storage DAMs

In this Appendix, having established the equations describing the general case of the DAM models in the low storage regime, we study two special cases, namely the standard case $p=2$ and the more complex case $p=3$. In particular, we will observe that these two cases are described by the Burgers and Sharma-Tasso-Olver equations in a $(K+1)$-dimensional space, respectively. To start this study, however, we first need the following definition and Lemma.

Lemma 5.

For all $\mu,\nu=1,\dots,K$, we have

  1. $[D_{\mu},D_{\nu}]=0$;

  2. $[\partial_{\mu},D_{\nu}^{s}]=N[D_{\nu}^{s},u_{\mu}]$ $\forall s>0$,

where $[\cdot,\cdot]$ is the usual commutator.

Proof.

The proof of statement 1 works by direct computation. Indeed:

$$\begin{split}[D_{\mu},D_{\nu}]&=\Big[\frac{1}{N}\partial_{\mu}+u_{\mu},\frac{1}{N}\partial_{\nu}+u_{\nu}\Big]=\\ &=\frac{1}{N^{2}}[\partial_{\mu},\partial_{\nu}]+\frac{1}{N}[\partial_{\mu},u_{\nu}]+\frac{1}{N}[u_{\mu},\partial_{\nu}]+[u_{\mu},u_{\nu}]=\\ &=\frac{1}{N}(\partial_{\mu}u_{\nu}-\partial_{\nu}u_{\mu}).\end{split}$$

Since the field $u_{\mu}(t,\boldsymbol{x})$ is conservative, i.e. $u_{\mu}(t,\boldsymbol{x})=\partial_{\mu}A_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})$, we have

$$\partial_{\mu}u_{\nu}-\partial_{\nu}u_{\mu}=(\partial_{\mu}\partial_{\nu}-\partial_{\nu}\partial_{\mu})A_{N,p,\boldsymbol{\xi},K}(t,\boldsymbol{x})=0,$$

meaning that $[D_{\mu},D_{\nu}]=0$. Let us now prove property 2. In this case the proof works by exploiting the property $[D_{\mu},D_{\nu}]=0$ (from which $[D_{\mu},D_{\nu}^{s}]=0$ also follows) and the definition $D_{\mu}=\frac{1}{N}\partial_{\mu}+u_{\mu}$. Indeed

$$0=[D_{\mu},D_{\nu}^{s}]=\frac{1}{N}[\partial_{\mu},D^{s}_{\nu}]+[u_{\mu},D_{\nu}^{s}].$$

By rearranging the equality we easily get the thesis. ∎
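As a concrete check of property 2, the case $s=1$ can be verified by letting both commutators act on a generic smooth test function $f$ (the function $f$ is introduced here only for illustration):

```latex
% Property 2 of Lemma 5 for s = 1, acting on a test function f
\begin{aligned}
[\partial_{\mu},D_{\nu}]f
&=\partial_{\mu}\Big(\frac{1}{N}\partial_{\nu}f+u_{\nu}f\Big)
 -\Big(\frac{1}{N}\partial_{\nu}+u_{\nu}\Big)\partial_{\mu}f
 =(\partial_{\mu}u_{\nu})\,f,\\
N[D_{\nu},u_{\mu}]f
&=N\Big(\frac{1}{N}\partial_{\nu}(u_{\mu}f)+u_{\nu}u_{\mu}f
 -\frac{1}{N}u_{\mu}\partial_{\nu}f-u_{\mu}u_{\nu}f\Big)
 =(\partial_{\nu}u_{\mu})\,f .
\end{aligned}
```

The two right-hand sides coincide because $\partial_{\mu}u_{\nu}=\partial_{\nu}u_{\mu}$ for the conservative field $u_{\mu}=\partial_{\mu}A_{N,p,\boldsymbol{\xi},K}$.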

From the previous lemma we can then prove the following two propositions

Proposition 9.

In the $p=2$ case with a generic finite $K$, the evolution equations (63) reduce to the multidimensional Burgers equation.

Proof.

In the $p=2$ case, the equations (63) reduce to

$$\partial_{t}u_{\mu}=-\sum_{\nu=1}^{K}\partial_{\mu}D_{\nu}u_{\nu}.$$

Now, using the second claim of Lemma 5 with $s=1$, we have

$$\partial_{\mu}D_{\nu}u_{\nu}=\Big(D_{\nu}\partial_{\mu}+N[D_{\nu},u_{\mu}]\Big)u_{\nu}.$$

Recalling the definition of the $D$ operator, we have

$$D_{\nu}\partial_{\mu}u_{\nu}=\Big(\frac{1}{N}\partial_{\nu}+u_{\nu}\Big)\partial_{\mu}u_{\nu}=\frac{1}{N}\partial_{\nu}\partial_{\mu}u_{\nu}+u_{\nu}\partial_{\mu}u_{\nu},$$

$$\big[D_{\nu},u_{\mu}\big]=\Big[\frac{1}{N}\partial_{\nu}+u_{\nu},u_{\mu}\Big]=\frac{1}{N}\big[\partial_{\nu},u_{\mu}\big]=\frac{1}{N}\partial_{\nu}u_{\mu}.$$

Thus

$$\partial_{t}u_{\mu}=-\sum_{\nu=1}^{K}\Big(\frac{1}{N}\partial_{\nu}\partial_{\mu}u_{\nu}+u_{\nu}\partial_{\mu}u_{\nu}+u_{\nu}\partial_{\nu}u_{\mu}\Big).$$

But now

$$\partial_{\nu}\partial_{\mu}u_{\nu}=\partial_{\nu}\partial_{\mu}\partial_{\nu}A_{N,p=2,\boldsymbol{\xi},K}(t,\boldsymbol{x})=\partial_{\nu}^{2}u_{\mu},$$

$$u_{\nu}\partial_{\mu}u_{\nu}=u_{\nu}\partial_{\mu}\partial_{\nu}A_{N,p=2,\boldsymbol{\xi},K}(t,\boldsymbol{x})=u_{\nu}\partial_{\nu}u_{\mu}.$$

Using these results, we can rewrite the equation as

$$\partial_{t}u_{\mu}+\sum_{\nu=1}^{K}\Big(\frac{1}{N}\partial_{\nu}^{2}u_{\mu}+2u_{\nu}\partial_{\nu}u_{\mu}\Big)=0,$$

or, in vector form

$$\partial_{t}\boldsymbol{u}+\frac{1}{N}\boldsymbol{\nabla}^{2}\boldsymbol{u}+2(\boldsymbol{u}\cdot\boldsymbol{\nabla})\boldsymbol{u}=0,$$

which is precisely the Burgers equation in $(K+1)$-dimensional spacetime. ∎
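The structure of this equation can be probed numerically in the simplest setting $K=1$. The sketch below relies on an assumption of ours, not taken from the proof: a Cole-Hopf-type trial function $Z(t,x)=1+e^{kx-k^{2}t/N}$, which satisfies $\partial_{t}Z+\frac{1}{N}\partial_{x}^{2}Z=0$, and the field $u=\frac{1}{N}\partial_{x}\log Z$. The residual of $\partial_{t}u+\frac{1}{N}\partial_{x}^{2}u+2u\,\partial_{x}u$ is then evaluated by finite differences; `N`, `k` and `h` are illustrative values.

```python
import math

def u(t, x, N=10.0, k=1.0):
    """u = (1/N) d/dx log Z for the trial Z = 1 + exp(k x - k^2 t / N)."""
    theta = k * x - k * k * t / N
    return (k / N) / (1.0 + math.exp(-theta))

def burgers_residual(t, x, N=10.0, k=1.0, h=1e-3):
    """Finite-difference value of u_t + (1/N) u_xx + 2 u u_x at the point (t, x)."""
    u0 = u(t, x, N, k)
    u_t = (u(t + h, x, N, k) - u(t - h, x, N, k)) / (2 * h)
    u_x = (u(t, x + h, N, k) - u(t, x - h, N, k)) / (2 * h)
    u_xx = (u(t, x + h, N, k) - 2 * u0 + u(t, x - h, N, k)) / (h * h)
    return u_t + u_xx / N + 2 * u0 * u_x
```

The residual should be zero up to discretization error at any point, which illustrates how a linear equation for $Z$ translates into the nonlinear Burgers equation for $u$.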

Proposition 10.

In the $p=3$ case (and generic $K$), the evolution equations (63) reduce to the multidimensional Sharma-Tasso-Olver (STO) equation olver1977evolution ; tasso1976cole .

Proof.

In the $p=3$ case, the equations (63) reduce to

$$\partial_{t}u_{\mu}=-\sum_{\nu=1}^{K}\partial_{\mu}D^{2}_{\nu}u_{\nu}.$$

Recalling that $D_{\nu}:=\left(\frac{1}{N}\partial_{\nu}+u_{\nu}\right)$ we have

$$\begin{split}\partial_{\mu}D^{2}_{\nu}u_{\nu}&=\partial_{\mu}\left(\frac{1}{N}\partial_{\nu}+u_{\nu}\right)\left(\frac{1}{N}\partial_{\nu}+u_{\nu}\right)u_{\nu}\\ &=\partial_{\mu}\left[\frac{1}{N^{2}}\partial_{\nu}^{2}u_{\nu}+\frac{1}{N}\partial_{\nu}(u^{2}_{\nu})+\frac{1}{N}u_{\nu}\partial_{\nu}u_{\nu}+u^{3}_{\nu}\right]\\ &=\partial_{\mu}\left[\frac{1}{N^{2}}\partial_{\nu}^{2}u_{\nu}+\frac{3}{N}u_{\nu}\partial_{\nu}u_{\nu}+u^{3}_{\nu}\right].\end{split}$$

Performing the derivative with respect to the $\mu$-th component we therefore get

$$\partial_{\mu}D^{2}_{\nu}u_{\nu}=\frac{1}{N^{2}}\partial_{\mu}\partial_{\nu}^{2}u_{\nu}+\frac{3}{N}\left(\partial_{\mu}u_{\nu}\partial_{\nu}u_{\nu}+u_{\nu}\partial_{\mu}\partial_{\nu}u_{\nu}\right)+3u_{\nu}^{2}\partial_{\mu}u_{\nu}.\tag{100}$$

Now, recalling that $u_{\mu}(t,\boldsymbol{x}):=\omega_{t,\boldsymbol{x}}(m_{\mu}(\boldsymbol{\sigma}))$ and (60), we have

$$\begin{split}\partial_{\mu}\partial_{\nu}^{2}u_{\nu}&=\partial_{\mu}\partial_{\nu}^{3}A_{N,p=3,\boldsymbol{\xi},K}(t,\boldsymbol{x})=\partial_{\nu}^{3}u_{\mu}\\ \partial_{\mu}u_{\nu}&=\partial_{\mu}\partial_{\nu}A_{N,p=3,\boldsymbol{\xi},K}(t,\boldsymbol{x})=\partial_{\nu}u_{\mu}\\ \partial_{\mu}\partial_{\nu}u_{\nu}&=\partial_{\mu}\partial_{\nu}^{2}A_{N,p=3,\boldsymbol{\xi},K}(t,\boldsymbol{x})=\partial_{\nu}^{2}u_{\mu}.\end{split}$$

Using these results, we can rewrite the equation as

$$\partial_{t}u_{\mu}=-\sum_{\nu=1}^{K}\left(\frac{1}{N^{2}}\partial^{3}_{\nu}u_{\mu}+\frac{3}{N}(\partial_{\nu}u_{\mu})^{2}+\frac{3}{N}u_{\nu}\partial^{2}_{\nu}u_{\mu}+3u_{\nu}^{2}\partial_{\nu}u_{\mu}\right).$$
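For orientation, the single-pattern case $K=1$ (writing $u:=u_{1}$ and $x:=x_{1}$, a specialization chosen here only for illustration) makes the conservative structure of the equation explicit:

```latex
% K = 1 specialization of the p = 3 evolution equation
\begin{aligned}
\partial_{t}u+\frac{1}{N^{2}}\partial_{x}^{3}u
+\frac{3}{N}\big[(\partial_{x}u)^{2}+u\,\partial_{x}^{2}u\big]
+3u^{2}\partial_{x}u&=0,\\
\text{equivalently}\qquad
\partial_{t}u+\partial_{x}\Big[\frac{1}{N^{2}}\partial_{x}^{2}u
+\frac{3}{N}u\,\partial_{x}u+u^{3}\Big]&=0 .
\end{aligned}
```

The bracket in the second line is $D^{2}u$ with $D=\frac{1}{N}\partial_{x}+u$, in agreement with the expansion of $D_{\nu}^{2}u_{\nu}$ computed above.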

Appendix C Proof of Lemma 4

Proof.

We prove the equality for the $t$-derivative of Guerra's action, as the others follow with similar calculations. To do this, we first compute the temporal derivative of the interpolating statistical pressure:

$$\partial_{t}A_{N,p,\alpha}=\frac{1}{2}\mathbb{E}\omega_{t,\boldsymbol{x}}(m^{p})+\frac{1}{2N\sqrt{tN^{p-1}}}\sum_{\mu}\sum_{i_{1},\dots,i_{p/2}}\mathbb{E}\,\xi^{\mu}_{i_{1}}\dots\xi^{\mu}_{i_{p/2}}\omega_{t,\boldsymbol{x}}(\sigma_{i_{1}}\dots\sigma_{i_{p/2}}\tau_{\mu})-\frac{1}{2}\mathbb{E}\omega_{t,\boldsymbol{x}}(p_{11}).$$

Since non-retrieved patterns constitute a noise contribution to the system dynamics, we can assume, with standard arguments about the universality of noise Genovese_2012 ; Agliari_2019, that the whole product $\xi^{\mu}_{i_{1}}\dots\xi^{\mu}_{i_{p/2}}$ is Gaussian-distributed in the limit $N\to\infty$ and $K\to\infty$. Thus, we can apply the Wick-Isserlis theorem to the second contribution to get

\begin{equation*}
\begin{split}
\partial_{t}A_{N,p,\alpha}&=\frac{1}{2}\langle m^{p}\rangle_{t,\boldsymbol{x}}+\frac{1}{2N^{p}}\sum_{\mu}\sum_{i_{1},\dots,i_{p/2}}\mathbb{E}\left[\omega_{t,\boldsymbol{x}}(\tau_{\mu}^{2})-\omega^{2}_{t,\boldsymbol{x}}(\sigma_{i_{1}}\dots\sigma_{i_{p/2}}\tau_{\mu})\right]-\frac{1}{2}\langle p_{11}\rangle_{t,\boldsymbol{x}}\\
&=\frac{1}{2}\langle m^{p}\rangle_{t,\boldsymbol{x}}+\frac{1}{2N^{p}}\left(N^{p}\langle p_{11}\rangle_{t,\boldsymbol{x}}-N^{p}\langle q_{12}^{p/2}p_{12}\rangle_{t,\boldsymbol{x}}\right)-\frac{1}{2}\langle p_{11}\rangle_{t,\boldsymbol{x}}\\
&=\frac{1}{2}\langle m^{p}\rangle_{t,\boldsymbol{x}}-\frac{1}{2}\langle p_{12}q_{12}^{p/2}\rangle_{t,\boldsymbol{x}},
\end{split}
\end{equation*}

where we used the definitions of the overlap order parameters (69) and (70). Recalling that $S_{N,p,\alpha}(t,\boldsymbol{x})=2A_{N,p,\alpha}(t,\boldsymbol{x})-x$, we finally get the result. ∎
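The Wick-Isserlis step used in this proof rests on Gaussian integration by parts, $\mathbb{E}[\eta f(\eta)]=\mathbb{E}[f'(\eta)]$ for a standard Gaussian variable $\eta$. A minimal sketch of this identity, checked exactly on a hypothetical test polynomial (the function names are illustrative and do not come from the paper):

```python
def gaussian_moment(n):
    """E[eta**n] for eta ~ N(0, 1): zero for odd n, (n - 1)!! for even n."""
    if n % 2:
        return 0
    result = 1
    for k in range(n - 1, 0, -2):
        result *= k
    return result

def expect(coeffs):
    """E[f(eta)] for the polynomial f(eta) = sum_k coeffs[k] * eta**k."""
    return sum(c * gaussian_moment(k) for k, c in enumerate(coeffs))

def times_eta(coeffs):
    """Coefficients of eta * f(eta)."""
    return [0] + list(coeffs)

def derivative(coeffs):
    """Coefficients of f'(eta)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

# Hypothetical test polynomial f(eta) = 2 - eta + 4 eta^3 + 3 eta^4.
f = [2, -1, 0, 4, 3]
lhs = expect(times_eta(f))   # E[eta f(eta)]
rhs = expect(derivative(f))  # E[f'(eta)]
print(lhs, rhs)
```

Both sides evaluate to the same integer because the moments of a standard Gaussian are known in closed form; no sampling is involved.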

Appendix D Proof of Proposition 5

Proof.

We only prove Eq. (73); the other one can be obtained in an analogous way. For simplicity of notation, we denote $\langle\cdot\rangle_{t,\boldsymbol{x}}$ by $\langle\cdot\rangle$. Thus

\begin{equation}
\begin{split}
\partial_{x}\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\rangle&=\frac{1}{2\sqrt{x}}\sum_{i=1}^{N}\sum_{a=1}^{2}\mathbb{E}_{\eta}\,\eta_{i}\,\Omega^{(2)}\left(O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\sigma_{i}^{(a)}\right)-\frac{1}{\sqrt{x}}\sum_{i=1}^{N}\mathbb{E}\,\eta_{i}\,\Omega^{(3)}\left(O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\sigma_{i}^{(3)}\right)\\
&=\frac{1}{2\sqrt{x}}\sum_{i=1}^{N}\sum_{a=1}^{2}\mathbb{E}_{\eta}\,\partial_{\eta_{i}}\Omega^{(2)}\left(O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\sigma_{i}^{(a)}\right)-\frac{1}{\sqrt{x}}\sum_{i=1}^{N}\mathbb{E}\,\partial_{\eta_{i}}\Omega^{(3)}\left(O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\sigma_{i}^{(3)}\right),
\end{split}
\tag{101}
\end{equation}

where in the last line we used the Wick-Isserlis theorem. Now, it is simple to see that

\begin{equation}
\partial_{\eta_{i}}\Omega^{(2)}\left(O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\sigma_{i}^{(a)}\right)=\sqrt{x}\left[\sum_{b=1}^{2}\Omega^{(2)}\left(O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\sigma_{i}^{(a)}\sigma_{i}^{(b)}\right)-2\,\Omega^{(3)}\left(O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\sigma_{i}^{(a)}\sigma_{i}^{(3)}\right)\right], \tag{102}
\end{equation}

and

\begin{equation}
\partial_{\eta_{i}}\Omega^{(3)}\left(O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\sigma_{i}^{(3)}\right)=\sqrt{x}\left[\sum_{b=1}^{3}\Omega^{(3)}\left(O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\sigma_{i}^{(3)}\sigma_{i}^{(b)}\right)-3\,\Omega^{(4)}\left(O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\sigma_{i}^{(3)}\sigma_{i}^{(4)}\right)\right]. \tag{103}
\end{equation}
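The structure of (102) can be illustrated numerically on a very small system. The following sketch is only an illustration under stated assumptions: it takes a hypothetical toy Boltzmann weight in which the field enters as $\exp(\sqrt{x}\sum_{i}\eta_{i}\sigma_{i})$ (suggested by the $\sqrt{x}$ prefactor above), ignores the $\boldsymbol{\tau}$ variables, uses three Ising spins with an arbitrary coupling, and picks an arbitrary two-replica observable `O`. The derivative on the left-hand side is approximated by a central finite difference.

```python
import itertools
import math

N_SPINS = 3
STATES = list(itertools.product([-1, 1], repeat=N_SPINS))

def boltzmann(eta, x, coupling=0.3):
    """Normalized one-replica weights of a hypothetical toy model."""
    w = [math.exp(math.sqrt(x) * sum(e * s for e, s in zip(eta, st))
                  + coupling * st[0] * st[1] * st[2])
         for st in STATES]
    z = sum(w)
    return [wi / z for wi in w]

def replica_avg(F, eta, x, n):
    """Omega^(n)(F): average of F over n independent replicas sharing eta."""
    p = boltzmann(eta, x)
    total = 0.0
    for idx in itertools.product(range(len(STATES)), repeat=n):
        reps = [STATES[k] for k in idx]
        total += math.prod(p[k] for k in idx) * F(reps)
    return total

x, eta = 0.7, [0.4, -1.1, 0.25]   # arbitrary test values
i, a, h = 0, 0, 1e-5               # site, replica (0-based), finite-difference step

O = lambda reps: reps[0][1] * reps[1][2]   # observable on replicas 1 and 2
F = lambda reps: O(reps) * reps[a][i]      # O * sigma_i^(a)

def shifted(delta):
    """Omega^(2)(F) with eta_i shifted by delta."""
    e = list(eta)
    e[i] += delta
    return replica_avg(F, e, x, 2)

# Left-hand side of (102): derivative with respect to eta_i.
lhs = (shifted(h) - shifted(-h)) / (2 * h)

# Right-hand side of (102): two-replica terms minus twice the three-replica term.
rhs = math.sqrt(x) * (
    sum(replica_avg(lambda r, b=b: F(r) * r[b][i], eta, x, 2) for b in range(2))
    - 2 * replica_avg(lambda r: F(r) * r[2][i], eta, x, 3)
)
print(abs(lhs - rhs))
```

The third replica appears because differentiating the normalization of each of the two replicas produces a factor $\omega(\sigma_{i})$, which can be written as an average over an additional independent replica.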

By substituting (102) and (103) into (101) we get

\begin{equation*}
\begin{split}
\partial_{x}\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\rangle&=\frac{1}{2}\sum_{i=1}^{N}\sum_{a,b=1}^{2}\mathbb{E}\,\Omega^{(2)}\left(O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\sigma_{i}^{(a)}\sigma_{i}^{(b)}\right)-\sum_{i=1}^{N}\sum_{a=1}^{2}\mathbb{E}\,\Omega^{(3)}\left(O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\sigma_{i}^{(3)}\sigma_{i}^{(a)}\right)\\
&\quad-\sum_{i=1}^{N}\sum_{a=1}^{3}\mathbb{E}\,\Omega^{(3)}\left(O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\sigma_{i}^{(3)}\sigma_{i}^{(a)}\right)+3\sum_{i=1}^{N}\mathbb{E}\,\Omega^{(4)}\left(O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\sigma_{i}^{(3)}\sigma_{i}^{(4)}\right).
\end{split}
\end{equation*}

Recalling that $q_{ab}=\frac{1}{N}\sum_{i}\sigma_{i}^{(a)}\sigma_{i}^{(b)}$, we can write

\begin{equation}
\begin{split}
\partial_{x}\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\rangle&=\frac{N}{2}\sum_{a,b=1}^{2}\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})q_{ab}\rangle-N\sum_{a=1}^{2}\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})q_{a3}\rangle-N\sum_{a=1}^{3}\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})q_{3a}\rangle+3N\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})q_{34}\rangle\\
&=\frac{N}{2}\sum_{a,b=1}^{2}\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})q_{ab}\rangle-2N\sum_{a=1}^{2}\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})q_{a3}\rangle-N\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})\rangle+3N\langle O(\boldsymbol{\underline{\sigma}},\boldsymbol{\underline{\tau}})q_{34}\rangle,
\end{split}
\tag{104}
\end{equation}

thus obtaining Eq. (73). ∎
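The passage from the first to the second line of (104) uses only the symmetry $q_{a3}=q_{3a}$ and the self-overlap $q_{33}=1$. A small sketch of this bookkeeping, assuming binary spins $\sigma_{i}=\pm 1$ and randomly drawn hypothetical replica configurations (the variable names are illustrative):

```python
import random
from fractions import Fraction

def overlap(s, t):
    """q = (1/N) * sum_i s_i * t_i, kept exact as a Fraction."""
    return Fraction(sum(si * ti for si, ti in zip(s, t)), len(s))

random.seed(0)
N = 50
# Three hypothetical replica configurations of N binary spins.
replicas = {a: [random.choice((-1, 1)) for _ in range(N)] for a in (1, 2, 3)}
q = lambda a, b: overlap(replicas[a], replicas[b])

# Overlap combinations entering the two middle terms of (104), up to the factor -N.
first_line = sum(q(a, 3) for a in (1, 2)) + sum(q(3, a) for a in (1, 2, 3))
second_line = 2 * sum(q(a, 3) for a in (1, 2)) + 1
print(first_line == second_line, q(3, 3))
```

The extra constant comes from the $a=3$ term of the second sum, which is the self-overlap of replica 3.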

References

  • (1) Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
  • (2) J. Schmidhuber, “Deep learning in neural networks: an overview,” Neural Networks, vol. 61, pp. 85–117, 2015.
  • (3) M. Mézard, G. Parisi, and M. A. Virasoro, Spin glass theory and beyond: An Introduction to the Replica Method and Its Applications, vol. 9. World Scientific Publishing Company, 1987.
  • (4) D. Sherrington and S. Kirkpatrick, “Solvable model of a spin-glass,” Physical Review Letters, vol. 35, no. 26, p. 1792, 1975.
  • (5) G. Parisi, “Toward a mean field theory for spin glasses,” Physics Letters A, vol. 73, no. 3, pp. 203–205, 1979.
  • (6) G. Parisi, “Infinite number of order parameters for spin-glasses,” Physical Review Letters, vol. 43, no. 23, p. 1754, 1979.
  • (7) G. Parisi, “The order parameter for spin glasses: a function on the interval 0-1,” Journal of Physics A: Mathematical and General, vol. 13, no. 3, p. 1101, 1980.
  • (8) G. Parisi, “A sequence of approximated solutions to the SK model for spin glasses,” Journal of Physics A: Mathematical and General, vol. 13, no. 4, p. L115, 1980.
  • (9) M. Mézard, G. Parisi, N. Sourlas, G. Toulouse, and M. Virasoro, “Replica symmetry breaking and the nature of the spin glass phase,” Journal de Physique, vol. 45, no. 5, pp. 843–854, 1984.
  • (10) F. Guerra, “Broken replica symmetry bounds in the mean field spin glass model,” Communications in Mathematical Physics, vol. 233, no. 1, pp. 1–12, 2003.
  • (11) S. Ghirlanda and F. Guerra, “General properties of overlap probability distributions in disordered spin systems. Towards Parisi ultrametricity,” Journal of Physics A: Mathematical and General, vol. 31, no. 46, p. 9149, 1998.
  • (12) M. Talagrand, “Replica symmetry breaking and exponential inequalities for the Sherrington-Kirkpatrick model,” The Annals of Probability, vol. 28, no. 3, pp. 1018–1062, 2000.
  • (13) D. Panchenko, “A connection between the Ghirlanda–Guerra identities and ultrametricity,” The Annals of Probability, vol. 38, no. 1, pp. 327–347, 2010.
  • (14) D. Panchenko, “Ghirlanda–Guerra identities and ultrametricity: An elementary proof in the discrete case,” Comptes Rendus Mathematique, vol. 349, no. 13-14, pp. 813–816, 2011.
  • (15) D. Panchenko, “The Parisi ultrametricity conjecture,” Annals of Mathematics, pp. 383–393, 2013.
  • (16) D. J. Amit, H. Gutfreund, and H. Sompolinsky, “Storing infinite numbers of patterns in a spin-glass model of neural networks,” Physical Review Letters, vol. 55, no. 14, p. 1530, 1985.
  • (17) J. J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” Proceedings of the National Academy of Sciences, vol. 79, no. 8, pp. 2554–2558, 1982.
  • (18) L. A. Pastur and A. L. Figotin, “Exactly soluble model of a spin glass,” Journal of Low Temperature Physics, vol. 3, no. 6, pp. 378–383, 1977.
  • (19) D. Krotov and J. J. Hopfield, “Dense associative memory for pattern recognition,” Advances in Neural Information Processing Systems, vol. 29, 2016.
  • (20) D. Krotov and J. Hopfield, “Dense associative memory is robust to adversarial inputs,” Neural Computation, vol. 30, no. 12, pp. 3151–3167, 2018.
  • (21) E. Agliari and G. De Marzo, “Tolerance versus synaptic noise in dense associative memories,” The European Physical Journal Plus, vol. 135, no. 11, pp. 1–22, 2020.
  • (22) P. Baldi and S. S. Venkatesh, “Number of stable points for spin-glasses and neural networks of higher orders,” Physical Review Letters, vol. 58, no. 9, p. 913, 1987.
  • (23) A. Bovier and B. Niederhauser, “The spin-glass phase-transition in the Hopfield model with $p$-spin interactions,” Mathematical Physics and Mathematics, 2001.
  • (24) H. Steffan and R. Kühn, “Replica symmetry breaking in attractor neural network models,” Zeitschrift für Physik B Condensed Matter, vol. 95, no. 2, pp. 249–260, 1994.
  • (25) T. Tanaka, “Moment problem in replica method,” Interdisciplinary Information Sciences, vol. 13, no. 1, pp. 17–23, 2007.
  • (26) F. Guerra, “Sum rules for the free energy in the mean field spin glass model,” Fields Institute Communications, vol. 30, no. 11, 2001.
  • (27) E. Agliari, A. Barra, R. Burioni, and A. Di Biasio, “Notes on the p-spin glass studied via Hamilton-Jacobi and smooth-cavity techniques,” Journal of Mathematical Physics, vol. 53, no. 6, p. 063304, 2012.
  • (28) A. Barra, G. Dal Ferraro, and D. Tantari, “Mean field spin glasses treated with PDE techniques,” The European Physical Journal B, vol. 86, no. 7, pp. 1–10, 2013.
  • (29) A. Barra, “The mean field Ising model trough interpolating techniques,” Journal of Statistical Physics, vol. 132, no. 5, pp. 787–809, 2008.
  • (30) A. Barra, A. Di Biasio, and F. Guerra, “Replica symmetry breaking in mean-field spin glasses through the Hamilton–Jacobi technique,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2010, no. 09, p. P09006, 2010.
  • (31) A. Barra, A. Di Lorenzo, F. Guerra, and A. Moro, “On quantum and relativistic mechanical analogues in mean-field spin models,” Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 470, no. 2172, p. 20140589, 2014.
  • (32) G. De Matteis, F. Giglio, and A. Moro, “Exact equations of state for nematics,” Annals of Physics, vol. 396, pp. 386–396, 2018.
  • (33) E. Agliari, D. Migliozzi, and D. Tantari, “Non-convex multi-species Hopfield models,” Journal of Statistical Physics, vol. 172, no. 5, pp. 1247–1269, 2018.
  • (34) P. Lorenzoni and A. Moro, “Exact analysis of phase transitions in mean-field Potts models,” Physical Review E, vol. 100, no. 2, p. 022103, 2019.
  • (35) E. Agliari, A. Barra, and M. Notarnicola, “The relativistic Hopfield network: Rigorous results,” Journal of Mathematical Physics, vol. 60, no. 3, p. 033302, 2019.
  • (36) A. Fachechi, “PDE/statistical mechanics duality: Relation between Guerra’s interpolated $p$-spin ferromagnets and the Burgers hierarchy,” Journal of Statistical Physics, vol. 183, no. 1, pp. 1–28, 2021.
  • (37) E. Agliari, L. Albanese, F. Alemanno, and A. Fachechi, “A transport equation approach for deep neural networks with quenched random weights,” Journal of Physics A: Mathematical and Theoretical, vol. 54, no. 50, p. 505004, 2021.
  • (38) A. Barra and A. Moro, “Exact solution of the Van der Waals model in the critical region,” Annals of Physics, vol. 359, pp. 290–299, 2015.
  • (39) G. De Nittis and A. Moro, “Thermodynamic phase transitions and shock singularities,” Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 468, no. 2139, pp. 701–719, 2012.
  • (40) F. Giglio, G. Landolfi, and A. Moro, “Integrable extended Van der Waals model,” Physica D: Nonlinear Phenomena, vol. 333, pp. 293–300, 2016.
  • (41) A. Moro, “Shock dynamics of phase diagrams,” Annals of Physics, vol. 343, pp. 49–60, 2014.
  • (42) B. Derrida, “Random-energy model: An exactly solvable model of disordered systems,” Physical Review B, vol. 24, no. 5, p. 2613, 1981.
  • (43) L. Pastur and M. Shcherbina, “Absence of self-averaging of the order parameter in the Sherrington-Kirkpatrick model,” Journal of Statistical Physics, vol. 62, no. 1, pp. 1–19, 1991.
  • (44) A. Bovier, “Self-averaging in a class of generalized Hopfield models,” Journal of Physics A: Mathematical and General, vol. 27, no. 21, p. 7069, 1994.
  • (45) D. J. Amit, Modeling brain function: The world of attractor neural networks. Cambridge University Press, 1989.
  • (46) E. Agliari, A. Barra, P. Sollich, and L. Zdeborová, “Machine learning and statistical physics: preface,” Journal of Physics A: Mathematical and Theoretical, vol. 53, p. 500401, 2020.
  • (47) E. Agliari, L. Albanese, A. Barra, and G. Ottaviani, “Replica symmetry breaking in neural networks: a few steps toward rigorous results,” Journal of Physics A: Mathematical and Theoretical, vol. 53, no. 41, p. 415005, 2020.
  • (48) L. Albanese, F. Alemanno, A. Alessandrelli, and A. Barra, “Replica symmetry breaking in dense neural networks,” arXiv preprint arXiv:2111.12997, 2021.
  • (49) E. Gardner, “Multiconnected neural network models,” Journal of Physics A: Mathematical and General, vol. 20, no. 11, p. 3453, 1987.
  • (50) E. Agliari, F. Alemanno, A. Barra, M. Centonze, and A. Fachechi, “Neural networks with a redundant representation: detecting the undetectable,” Physical Review Letters, vol. 124, no. 2, p. 028301, 2020.
  • (51) E. Agliari, F. Alemanno, A. Barra, and A. Fachechi, “Generalized Guerra’s interpolation schemes for dense associative neural networks,” Neural Networks, vol. 128, pp. 254–267, 2020.
  • (52) P. J. Olver, “Evolution equations possessing infinitely many symmetries,” Journal of Mathematical Physics, vol. 18, no. 6, pp. 1212–1215, 1977.
  • (53) H. Tasso, “Cole’s ansatz and extensions of Burgers’ equation,” tech. rep., Max-Planck-Institut für Plasmaphysik, 1976.
  • (54) G. Genovese, “Universality in bipartite mean field spin glasses,” Journal of Mathematical Physics, vol. 53, no. 12, p. 123304, 2012.
  • (55) E. Agliari, A. Barra, and B. Tirozzi, “Free energies of Boltzmann machines: self-averaging, annealed and replica symmetric approximations in the thermodynamic limit,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2019, no. 3, p. 033301, 2019.