
Barriers for the performance of graph neural networks (GNN) in discrete random structures. A comment on [SBK22],[ART23],[SBK23]

David Gamarnik, Operations Research Center, Statistics and Data Science Center, Sloan School of Management, MIT; e-mail: gamarnik@mit.edu. Funding from NSF Grant DMS-2015517 is gratefully acknowledged.

Recently, graph neural network (GNN) based algorithms were proposed to solve a variety of combinatorial optimization problems, including the Maximum Cut problem, the Maximum Independent Set problem, and other similar problems [SBK22],[SBZK22]. The algorithms were tested in particular on random instances of these problems, namely instances in which the underlying graph is generated according to some specified probability distribution. Earlier, a similar proposal based on a somewhat different learning architecture was put forward to solve another optimization problem, that of finding ground states of spin glass models [SNL+22].

The publication [SBK22] stirred a debate as to whether the GNN-based method was adequately benchmarked against the best prior methods. In particular, the critical commentaries [ART23] and [Boe23] point out that a simple greedy algorithm performs better than the GNN in the setting of random graphs, and that in fact even stronger algorithmic performance can be reached with more sophisticated methods. A response from the authors [SBK23] argued that the GNN performance can be improved further by better tuning of its parameters.

We do not intend to discuss the merits of the arguments and counter-arguments in [SBK22],[ART23],[Boe23],[SBK23]. Rather, in this note we establish a fundamental limitation on running GNN on the random graphs considered in these references, for a broad range of choices of GNN architecture. Specifically, these barriers hold when the depth of the GNN does not scale with the graph size (we note that depth 2 was used in the experiments in [SBK22]), and, importantly, regardless of any other parameters of the GNN architecture, including the internal dimensions and update functions. These limitations arise from the presence of the Overlap Gap Property (OGP) phase transition, which is a barrier for many algorithms, both classical and quantum, and importantly for local algorithms [Gam21],[GMZ22]. As we demonstrate in this paper, it is also a barrier for GNN due to its local structure. At the same time, known algorithms, ranging from simple greedy algorithms to more sophisticated methods based on message passing, provide the best known results for these problems up to the OGP phase transition. This leaves very little room for GNN to outperform the known algorithms, and on this basis we side with the conclusions made in [ART23] and [Boe23].

1 GNN for Combinatorial Optimization in Random Graphs

A class of problems discussed in [SBK22] and solved using GNN based methods falls into the domain of combinatorial optimization in random graphs. A graph $G$ is a collection of nodes $V$ and edges $E$, where $E$ is a subset of unordered pairs or, more generally, tuples (hyper-edges) of nodes. A generic combinatorial optimization problem is defined by introducing a cost function $C:\{0,1\}^{V}\to\mathbb{R}$ (also called a Hamiltonian in physics jargon), which maps bit strings $\sigma\in\{0,1\}^{V}$ (aka "decisions") into real values $C(\sigma)$ (aka "cost" or "energy"), and solving the problem $\max_{\sigma}C(\sigma)$. The equivalent choice of $\sigma\in\{-1,1\}^{V}$ will often be adopted here for convenience. The presence of various kinds of combinatorial constraints on decisions, arising from the edges and hyper-edges, can be encoded into the cost function $C$.

A canonical example considered in the aforementioned references is the Independent Set problem (which we abbreviate as IS), namely the problem of finding a largest (in cardinality) subset $I\subset V$ such that no two nodes of $I$ are spanned by an edge: $(i,j)\notin E$ for all $i,j\in I$. This corresponds to the special case of $C$ given by $C(\sigma)=\left(\sum_{i\in V}\sigma_{i}\right){\bf 1}\left(\sigma_{i}\sigma_{j}=0,\ \forall (i,j)\in E\right)$. Namely, $C$ is the number of ones in the string $\sigma$ (indicating inclusion into the independent set) multiplied by the indicator of the event that $\sigma$ indeed encodes a legitimate independent set. Another example discussed in the same collection of references is the graph Maximum Cut problem (which we abbreviate as MAXCUT). This is the problem of partitioning the nodes of a graph into two sets so as to maximize the number of crossing edges. Formally, this corresponds to the cost function $C:\{-1,1\}^{V}\to\mathbb{R}$ defined by $C(\sigma)=\sum_{(i,j)\in E}{\bf 1}\left(\sigma_{i}\sigma_{j}=-1\right)$. This model extends naturally to hypergraphs as follows. A $K$-uniform hypergraph is a pair of a node set $V$ and a collection $E$ of hyperedges, where each hyperedge is an unordered subset of $K$ nodes. Thus a $2$-uniform hypergraph is just a graph. An extension of MAXCUT to hypergraphs is obtained by considering the cost function $C(\sigma)=\sum_{(i_{1},\ldots,i_{K})\in E}{\bf 1}\left(\sigma_{i_{1}}\sigma_{i_{2}}\cdots\sigma_{i_{K}}=-1\right)$. The case $K=2$ then reduces to MAXCUT on graphs.
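
For concreteness, the following minimal Python sketch (our own illustration, not code from the papers under discussion; the function names are ours) evaluates the IS and hypergraph MAXCUT cost functions defined above on a graph given as an edge list.

```python
# Illustrative sketch: evaluating the cost functions C(sigma) defined above
# for IS and (hypergraph) MAXCUT.

def cost_is(sigma, edges):
    # sigma[i] in {0,1}; returns sum_i sigma_i if sigma encodes an independent set, else 0
    feasible = all(sigma[i] * sigma[j] == 0 for i, j in edges)
    return sum(sigma) if feasible else 0

def cost_maxcut(sigma, hyperedges):
    # sigma[i] in {-1,+1}; counts hyperedges whose spin product equals -1
    cut = 0
    for edge in hyperedges:
        prod = 1
        for v in edge:
            prod *= sigma[v]
        cut += int(prod == -1)
    return cut

# Tiny usage example on a 4-cycle with nodes 0..3.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(cost_is([1, 0, 1, 0], edges))        # {0, 2} is independent -> 2
print(cost_maxcut([1, -1, 1, -1], edges))  # alternating partition cuts all 4 edges -> 4
```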

Our last example, arising from the study of spin glasses, corresponds to fixing an order-$p$ tensor $J=(J_{i_{1},\ldots,i_{p}},\ i_{1},\ldots,i_{p}\in V)\in\mathbb{R}^{n\otimes p}$ and defining $C(\sigma)=\sum_{i_{1},\ldots,i_{p}\in V}J_{i_{1},\ldots,i_{p}}\sigma_{i_{1}}\sigma_{i_{2}}\cdots\sigma_{i_{p}}$ for each $\sigma\in\{-1,1\}^{V}$. The optimization problem is that of finding the value of $\max_{\sigma}C(\sigma)$. Put differently, this is an unconstrained optimization problem on a complete weighted hyper-graph with hyper-edges $(i_{1},\ldots,i_{p}),\ i_{1},\ldots,i_{p}\in V$.
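
The spin glass cost function can likewise be evaluated directly. Below is a short illustrative sketch (ours, using numpy), with i.i.d. standard normal couplings as in the random setting described next.

```python
# Illustrative sketch: the p-spin cost C(sigma) = sum_{i_1..i_p} J_{i_1..i_p}
# sigma_{i_1} ... sigma_{i_p}, with i.i.d. standard normal couplings J.
import numpy as np

def p_spin_cost(J, sigma):
    # J: order-p tensor of shape (n,)*p; sigma: length-n vector of +/-1 entries.
    # Contract one tensor index with sigma at a time until a scalar remains.
    val = J
    for _ in range(J.ndim):
        val = val @ sigma
    return float(val)

rng = np.random.default_rng(0)
n, p = 6, 3
J = rng.standard_normal((n,) * p)     # random couplings
sigma = rng.choice([-1, 1], size=n)   # a candidate spin configuration
print(p_spin_cost(J, sigma))
```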

In the random setting, either the cost function $C$ or the graph $G$ (or both) are generated randomly according to some probability distribution. The setting discussed in [SBK22] is the IS problem where the underlying graph is a random $d$-regular graph on the set of $n$ nodes, denoted for convenience by $V=\{1,\ldots,n\}$. Here $d$-regular means that every node has exactly $d$ neighbors. The graph is generated uniformly at random from the space of all $d$-regular graphs on $n$ nodes (see [JLR11],[FK15] for some background regarding existence and constructions). The random graph constructed this way will be denoted by $\mathbb{G}_{d}(n)$. The setting of spin glasses corresponds to assuming that the entries of the tensor $J$ are generated randomly and independently from some common distribution, such as the standard normal distribution.

Next we turn to a generic description of GNN algorithms. We follow the notation used in [SBK22]. Given a graph $G=(V,E)$, the algorithm generates a sequence of node- and time-dependent features $(h_{u,t}\in\mathbb{R}^{d_{u}},\ u\in V,\ t\geq 0)$. Time evolves in discrete steps $t=0,1,2,\ldots$, and $d_{u}$ is the dimension of the feature space of node $u$. The feature vectors $h_{u,t}$ are generated as follows. The algorithm designer creates node- and time-dependent functions $(f_{u,t},\ u\in V,\ t\geq 0)$, where each $f_{u,t}$ maps $\mathbb{R}^{d_{u}+\sum_{v\in\mathcal{N}(u)}d_{v}}\to\mathbb{R}^{d_{u}}$. Here $\mathcal{N}(u)$ denotes the set of neighbors of $u$ (the set of nodes $v$ such that $(u,v)\in E$). The features are then updated according to the rule $h_{u,t+1}=f_{u,t}\left(h_{u,t},\{h_{v,t},\ v\in\mathcal{N}(u)\}\right)$. The update rules $f_{u,t}$ can be parametric or non-parametric (our conclusions do not depend on this), and can be learned using various learning algorithms. The algorithm runs for a certain number of steps $t=0,1,\ldots,R$, where $R$ is also the depth of the underlying neural architecture. The resulting vector of features $(h_{u,R},\ u\in V)$ is then projected to a desired solution of the problem. As we will see below, regardless of the actual details of how the update functions $f_{u,t}$ come about, and regardless of the dimensions $d_{u},\ u\in V$ that the algorithm designer opts to work with, the power of GNN algorithms is fundamentally limited by the Overlap Gap Property, to which we turn next.
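
To make the locality of this iteration explicit, here is a minimal Python sketch (our own illustration, not code from [SBK22]); the function mean_update is only a placeholder for the learned functions $f_{u,t}$, which in an actual GNN would be trained neural networks.

```python
# Minimal sketch of the generic GNN iteration
#   h_{u,t+1} = f_{u,t}(h_{u,t}, {h_{v,t} : v in N(u)}).
import numpy as np

def run_gnn(neighbors, h0, R, update):
    # neighbors: dict node -> list of adjacent nodes; h0: dict node -> feature vector.
    # After R rounds, h[u] depends only on the depth-R neighborhood of u (R-locality).
    h = dict(h0)
    for t in range(R):
        h = {u: update(h[u], [h[v] for v in nbrs], u, t) for u, nbrs in neighbors.items()}
    return h

def mean_update(h_u, neighbor_feats, u, t):
    # Placeholder update: add the mean of the neighbors' features to the node's own.
    return (h_u + np.mean(neighbor_feats, axis=0)) if neighbor_feats else h_u

# Usage on a 4-cycle with 2-dimensional features and depth R = 2 (the depth used in [SBK22]).
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
h0 = {u: np.random.randn(2) for u in neighbors}
features = run_gnn(neighbors, h0, R=2, update=mean_update)
```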

2 Limits of GNN

We begin with some background on the problems introduced earlier: IS and MAXCUT in the setting of random graphs, and ground states of spin glasses. Let $I_{n}^{*}$ denote (any) maximum-size independent set in $\mathbb{G}_{d}(n)$, which we recall is a random $d$-regular graph, and let $|I_{n}^{*}|$ denote its size (cardinality). The following two facts were established in [BGT13] and [FŁ92], respectively. For each $d$ there exists $\alpha_{d}$ such that $|I_{n}^{*}|/n$ converges to $\alpha_{d}$ with high probability as $n\to\infty$. Furthermore, $\alpha_{d}=2(1+o_{d}(1))\log d/d$. Here $o_{d}(1)$ denotes a function which converges to zero as $d\to\infty$. Informally, we summarize this by saying that the size $|I^{*}_{n}|$ of a largest independent set in $\mathbb{G}_{d}(n)$ is approximately $2(\log d/d)n$.

Next we turn to the discussion of algorithms for finding large independent sets in $\mathbb{G}_{d}(n)$. It turns out that the best known algorithm for this problem is in fact the Greedy algorithm (the algorithm discussed in [ART23],[Boe23]), which recovers an independent set within a factor $1/2$ of the optimum. More precisely, let $I_{\rm Greedy}$ be the independent set produced by the Greedy algorithm on $\mathbb{G}_{d}(n)$. Then $\lim_{d\to\infty}\alpha_{d}^{-1}\lim_{n\to\infty}(|I_{\rm Greedy}|/n)=1/2$; see Exercise 6.7.20 in [FK15]. No algorithm is known which beats Greedy by a factor non-vanishing in $d$.
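
For concreteness, the following is a short Python sketch (ours; it assumes the networkx package is available) of this Greedy algorithm run on a random $d$-regular graph.

```python
# Sketch of the simple Greedy algorithm for independent sets on G_d(n): scan the
# nodes in a uniformly random order and add a node whenever none of its neighbors
# was already added.
import random
import networkx as nx

def greedy_independent_set(G, seed=None):
    rng = random.Random(seed)
    order = list(G.nodes())
    rng.shuffle(order)
    chosen = set()
    for u in order:
        if all(v not in chosen for v in G.neighbors(u)):
            chosen.add(u)
    return chosen

n, d = 10_000, 20
G = nx.random_regular_graph(d, n, seed=0)
I_greedy = greedy_independent_set(G, seed=0)
# For large d the resulting density is close to (log d / d), i.e. roughly half of alpha_d.
print(len(I_greedy) / n)
```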

The theory based on the Overlap Gap Property (OGP) explains this phenomenon rigorously. The OGP for this problem was established in [GS17] and it reads as follows: for every $1/2+1/(2\sqrt{2})<\theta<1$ there exist $0<\nu_{1}<\nu_{2}<1$ such that for every two independent sets $I_{1},I_{2}$ which are $\theta$-optimal, namely $|I_{1}|/n\geq\theta\alpha_{d}$ and $|I_{2}|/n\geq\theta\alpha_{d}$, it is the case that either $|I_{1}\cap I_{2}|/n\leq\nu_{1}$ or $|I_{1}\cap I_{2}|/n\geq\nu_{2}$, for all large enough $d$, with high probability as $n\to\infty$. Informally, every two sufficiently large independent sets (namely those which are within a multiplicative factor $\theta$ of optimality) are either "close" to each other (overlap in at least $\nu_{2}n$ nodes) or "far" from each other (overlap in at most $\nu_{1}n$ nodes). Namely, solutions of the IS optimization problem with sufficiently large objective values exhibit a gap in their overlaps (hence the name of the property).
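
In code, the overlap quantity appearing in this statement is simply the normalized size of the intersection of the two sets; the following tiny sketch (ours) makes the forbidden middle range explicit.

```python
# The OGP asserts that for two theta-optimal independent sets I1, I2 (as sets of
# nodes) in a graph on n nodes, |I1 & I2|/n falls outside the interval (nu1, nu2).
def normalized_overlap(I1, I2, n):
    return len(I1 & I2) / n

def in_forbidden_gap(I1, I2, n, nu1, nu2):
    return nu1 < normalized_overlap(I1, I2, n) < nu2
```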

It turns out that OGP is a barrier for a broad class of algorithms, in particular algorithms which are local in an appropriately defined sense. This was established in the same paper [GS17]. We introduce the notion of locality only informally; the formal definition involves the concept of Factors of IID, for which we refer the reader to [GS17]. An algorithm which maps a graph $G$ to an independent set in $G$ is called $R$-local if, for every node $u$ of the graph $G$, the algorithmic decision as to whether to include this node in the constructed independent set is based entirely on the depth-$R$ neighborhood of the node $u$. In particular, we see that the GNN algorithm is $R$-local provided that the number of iterations of the GNN is at most $R$. Importantly, this holds regardless of the complexity of the feature dimensions $d_{u}$ and the choice of update functions $f_{u,t}$. We recall that the GNN algorithm reported in [SBK22] was based on 2 iterations, and as such it is 2-local.

A main theorem proved in [GS17] states that OGP is a barrier for all $R$-local algorithms, as long as $R$ is a constant not growing with the size of the graph. Specifically, for any $R$, consider any algorithm $\mathcal{A}$ which is $R$-local. Then the size of the independent set produced by $\mathcal{A}$ is at most $(1/2+1/(2\sqrt{2}))\alpha_{d}n$ for large enough $d$, with high probability as $n\to\infty$. Using a more sophisticated notion of multi-overlaps, this result was improved in [RV17] to the factor $1/2$ of optimality for the same class of local algorithms. Importantly, as we recall, $1/2$ is the threshold achievable by the Greedy algorithm. The result was recently extended to the class of algorithms based on low-degree polynomials and small-depth Boolean circuits in [GJW20],[Wei22]. It is conjectured that beating the $1/2$ threshold is not possible within the class of polynomial-time algorithms (though showing this would amount to proving $P\neq NP$).

As a consequence of the discussion above we obtain an important conclusion regarding the power of GNN for solving the IS problem in $\mathbb{G}_{d}(n)$.

Theorem 2.1.

Consider any architecture of the GNN algorithm with any choice of dimensions $(d_{v},\ v\in\{1,2,\ldots,n\})$, any choice of features $h_{u,t}$, and any choice of update functions $f_{u,t}$. Suppose the GNN algorithm iterates for $R$ steps and produces an independent set $I_{\rm GNN}$ in the random regular graph $\mathbb{G}_{d}(n)$. Then the size of $I_{\rm GNN}$ is at most half-optimal asymptotically in $d$, for any value of $R$.

We stress here that the depth parameter $R$ can be arbitrarily large and, in particular, may depend on the average degree $d$, provided it does not depend on the size $n$ of the graph. We recall that $R=2$ in the implementation reported in [SBK22]. Since the Greedy algorithm already achieves $1/2$-optimality, as we remarked earlier, this result leaves very little room for GNN to outperform the known (Greedy) algorithm for the IS problem on random regular graphs. We note that, while the results above are stated in the asymptotic sense of increasing degree $d$, OGP is a barrier to local algorithms as soon as OGP holds. For example, if the graph $\mathbb{G}_{10}(n)$, say, exhibits the OGP above some approximation factor $\rho$ of optimality, this would imply that GNN cannot beat the $\rho$-factor approximation for the IS problem in the random graph $\mathbb{G}_{10}(n)$ for any graph-size-independent depth (number of rounds) $R$. The obstruction here is proving the OGP for small values of $d$, which is mathematically more challenging.

Next we turn to the MAXCUT problem on random graphs and random hypergraphs. The situation here is rather similar, but it is better developed in the context of random Erdös-Rényi graphs and hypergraphs, as opposed to random regular graphs, and thus this is the class of random graphs to which we now turn. A random Erdös-Rényi graph with average degree $d$, denoted by $\mathbb{G}(n,d)$, is obtained by connecting every pair of nodes $i,j$ among $n$ nodes with probability $d/n$, independently across all unordered pairs $i\neq j$. A random $K$-uniform hypergraph is obtained similarly by creating a hyperedge out of a collection of nodes $i_{1},\ldots,i_{K}$ with probability $d/{n-1\choose K-1}$. We denote this hypergraph by $\mathbb{G}(n,d;K)$. It is easy to see that the average degree in both $\mathbb{G}(n,d)$ and $\mathbb{G}(n,d;K)$ is $d+o(1)$. It has been known for a while that the optimum value of MAXCUT in $\mathbb{G}(n,d;K)$ is of the form $n(d/(2K)+\gamma_{K}^{*}\sqrt{d}+o(\sqrt{d}))$ as $n\to\infty$ [CGHS04], for some constant $\gamma_{K}^{*}$. Namely, the optimum value is known up to the order $n\sqrt{d}$. The constant $\gamma_{K}^{*}$ was computed in [DMS17] and [Sen18] first for the case $K=2$ and then extended to general $K$ in [CGPR19]. As it turns out, this constant is the value of the ground state of a $K$-spin model, known since the works of Parisi [Par80], Talagrand [Tal06] and Panchenko [Pan13].
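
As a concrete illustration (our own sketch; the parameter values are arbitrary), one can sample $\mathbb{G}(n,d;K)$ directly from this definition and check that even a uniformly random partition already attains the leading term $nd/(2K)$ in expectation, which is why the $\sqrt{d}$ correction is the quantity of interest.

```python
# Sketch: sample G(n, d; K) by including each K-subset of nodes independently with
# probability d / C(n-1, K-1), then evaluate the cut produced by a random partition.
import math
import random
from itertools import combinations

def sample_hypergraph(n, d, K, seed=None):
    rng = random.Random(seed)
    p = d / math.comb(n - 1, K - 1)
    return [e for e in combinations(range(n), K) if rng.random() < p]

def cut_value(sigma, hyperedges):
    # number of hyperedges whose spin product equals -1 (same objective as in Section 1)
    return sum(1 for e in hyperedges if math.prod(sigma[v] for v in e) == -1)

n, d, K = 60, 10, 4
hyperedges = sample_hypergraph(n, d, K, seed=0)
random.seed(0)
sigma = [random.choice([-1, 1]) for _ in range(n)]
# A random sigma cuts each hyperedge with probability 1/2 and |E| is about n*d/K,
# so the cut value is near the leading term n*d/(2*K) = 75 here.
print(cut_value(sigma, hyperedges), n * d / (2 * K))
```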

Interestingly, as far as algorithms are concerned, there is a fundamental difference between the case $K=2$ (aka graphs) and the case $K\geq 3$. Specifically, for $K=2$, algorithms achieving the asymptotically optimal value $n(d/(2K)+\gamma_{K}^{*}\sqrt{d}+o(\sqrt{d}))$ are known, based on Approximate Message Passing (AMP) schemes [AMS21]. Furthermore, conjecturally, the OGP does not hold for this problem. However, when $K\geq 4$ and $K$ is even, OGP provably does hold and again presents a barrier for all local algorithms, as was established in [CGPR19]. Furthermore, the threshold associated with a sophisticated version of the multi-OGP, called the Branching-OGP, was computed in [HS22]; this threshold, denoted by $\gamma_{\rm B-OGP,K}$, matches the performance of the best known algorithms, which are again of AMP type. The formal statement of the OGP is very similar to the one for the IS problem and we skip it. As an implication we obtain our second conclusion.

Theorem 2.2.

Consider any architecture of the GNN algorithm which produces a partition $\sigma_{\rm GNN}\in\{\pm 1\}^{n}$ of the random hypergraph $\mathbb{G}(n,d;K)$. Suppose $K\geq 4$ and $K$ is even. Then for sufficiently large degree values $d$, the size of the cut associated with this solution is at most $n(d/(2K)+\gamma_{\rm B-OGP,K}\sqrt{d}+o(\sqrt{d}))$ with high probability, for any choice of the depth $R$. This is suboptimal since $\gamma_{\rm B-OGP,K}<\gamma_{K}^{*}$.

As the threshold $\gamma_{\rm B-OGP,K}$ is achievable by the AMP algorithm, this again leaves very little room for GNN to outperform the best known (namely, AMP) algorithm for this problem.

The story for the problem of finding near-ground states of spin glasses is very similar and is skipped; we refer the reader to the surveys [Gam21] and [GMZ22] for details. In fact, many of the results described above for the MAXCUT problem were first obtained by deriving them for spin glass models and then transferring them to the case of random graphs $\mathbb{G}(n,d;K)$ using an interpolation technique.

3 Discussion

In this paper we have presented barriers faced by GNN based algorithms in solving combinatorial optimization problems on random graphs and random structures. These barriers stem from a complex solution space geometry in the form of the Overlap Gap Property (OGP), a known barrier to broad classes of algorithms, and to local algorithms in particular. As GNN falls within the framework of local algorithms, OGP is a barrier to GNN as well. Since algorithms are known which achieve all of the optimality values below the OGP phase transition threshold, this leaves very little room for GNN to outperform the known algorithms.

Some further investigation can be done, however, to obtain a more refined picture. Most of the OGP results are obtained in the doubly-asymptotic regime where not only the graph size but also the degree (and degree-type parameters) diverges. While it is possible to prove OGP for fixed and sufficiently large values of the degree, the degree values arising from these computations tend to be quite large. Instead, it would be nice to see whether OGP takes place in random regular graphs $\mathbb{G}_{d}(n)$ for a small degree value such as, say, $d=5$. We need sharper mathematical techniques for this. Knowing this might point us to a place where non-trivial algorithms, going beyond the simple Greedy algorithm, might provide some value. It is known (as already observed in [ART23]) that a more clever version of the Greedy algorithm, known as the Degree Greedy algorithm and sketched below, provably outperforms the naive Greedy algorithm for fixed values of $d$. It is thus possible that a more sophisticated version of the GNN could achieve performance values even stronger than the ones obtained by the Degree Greedy algorithm. Whether this is possible is yet to be seen, but if this is indeed verified rigorously, it would provide a more compelling argument in favor of GNN than the one currently presented in [SBK22].
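
For reference, here is a minimal sketch (ours, assuming the networkx package) of the Degree Greedy algorithm mentioned above: repeatedly select a node of minimum remaining degree, add it to the independent set, and delete it together with its neighbors.

```python
# Sketch of the Degree Greedy algorithm for independent sets.
import networkx as nx

def degree_greedy_independent_set(G):
    H = G.copy()
    chosen = set()
    while H.number_of_nodes() > 0:
        u = min(H.nodes(), key=H.degree)          # node of minimum remaining degree
        chosen.add(u)
        H.remove_nodes_from(list(H.neighbors(u)) + [u])
    return chosen

G = nx.random_regular_graph(5, 1000, seed=0)
print(len(degree_greedy_independent_set(G)) / 1000)
```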

References

  • [AMS21] Ahmed El Alaoui, Andrea Montanari, and Mark Sellke. Local algorithms for maximum cut and minimum bisection on locally treelike regular graphs of large degree. arXiv preprint arXiv:2111.06813, 2021.
  • [ART23] Maria Chiara Angelini and Federico Ricci-Tersenghi. Modern graph neural networks do worse than classical greedy algorithms in solving combinatorial optimization problems like maximum independent set. Nature Machine Intelligence, 5(1):29–31, 2023.
  • [BGT13] M. Bayati, D. Gamarnik, and P. Tetali. Combinatorial approach to the interpolation method and scaling limits in sparse random graphs. Annals of Probability. (Conference version in Proc. 42nd Ann. Symposium on the Theory of Computing (STOC) 2010), 41:4080–4115, 2013.
  • [Boe23] Stefan Boettcher. Inability of a graph neural network heuristic to outperform greedy algorithms in solving combinatorial optimization problems. Nature Machine Intelligence, 5(1):24–25, 2023.
  • [CGHS04] D. Coppersmith, D. Gamarnik, M. Hajiaghayi, and G. Sorkin. Random MAXSAT, random MAXCUT, and their phase transitions. Random Structures and Algorithms, 24(4):502–545, 2004.
  • [CGPR19] Wei-Kuo Chen, David Gamarnik, Dmitry Panchenko, and Mustazee Rahman. Suboptimality of local algorithms for a class of max-cut problems. The Annals of Probability, 47(3):1587–1618, 2019.
  • [DMS17] Amir Dembo, Andrea Montanari, and Subhabrata Sen. Extremal cuts of sparse random graphs. The Annals of Probability, 45(2):1190–1217, 2017.
  • [FK15] Alan Frieze and Michał Karoński. Introduction to random graphs. Cambridge University Press, 2015.
  • [FŁ92] A.M. Frieze and T. Łuczak. On the independence and chromatic numbers of random regular graphs. Journal of Combinatorial Theory, Series B, 54(1):123–132, 1992.
  • [Gam21] David Gamarnik. The overlap gap property: A topological barrier to optimizing over random structures. Proceedings of the National Academy of Sciences, 118(41), 2021.
  • [GJW20] David Gamarnik, Aukosh Jagannath, and Alexander S Wein. Low-degree hardness of random optimization problems. In 61st Annual Symposium on Foundations of Computer Science, 2020.
  • [GMZ22] David Gamarnik, Cristopher Moore, and Lenka Zdeborová. Disordered systems insights on computational hardness. Journal of Statistical Mechanics: Theory and Experiment, 2022(11):114015, 2022.
  • [GS17] David Gamarnik and Madhu Sudan. Limits of local algorithms over sparse random graphs. Annals of Probability, 45:2353–2376, 2017.
  • [HS22] Brice Huang and Mark Sellke. Tight Lipschitz hardness for optimizing mean field spin glasses. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 312–322. IEEE, 2022.
  • [JLR11] Svante Janson, Tomasz Luczak, and Andrzej Rucinski. Random graphs, volume 45. John Wiley & Sons, 2011.
  • [Pan13] Dmitry Panchenko. The Sherrington-Kirkpatrick model. Springer Science & Business Media, 2013.
  • [Par80] Giorgio Parisi. A sequence of approximated solutions to the SK model for spin glasses. Journal of Physics A: Mathematical and General, 13(4):L115, 1980.
  • [RV17] Mustazee Rahman and Balint Virag. Local algorithms for independent sets are half-optimal. The Annals of Probability, 45(3):1543–1577, 2017.
  • [SBK22] Martin JA Schuetz, J Kyle Brubaker, and Helmut G Katzgraber. Combinatorial optimization with physics-inspired graph neural networks. Nature Machine Intelligence, 4(4):367–377, 2022.
  • [SBK23] Martin JA Schuetz, J Kyle Brubaker, and Helmut G Katzgraber. Reply to: Modern graph neural networks do worse than classical greedy algorithms in solving combinatorial optimization problems like maximum independent set. Nature Machine Intelligence, 5(1):32–34, 2023.
  • [SBZK22] Martin JA Schuetz, J Kyle Brubaker, Zhihuai Zhu, and Helmut G Katzgraber. Graph coloring with physics-inspired graph neural networks. Physical Review Research, 4(4):043131, 2022.
  • [Sen18] Subhabrata Sen. Optimization on sparse random hypergraphs and spin glasses. Random Structures & Algorithms, 53(3):504–536, 2018.
  • [SNL+22] Mutian Shen, Zohar Nussinov, Yang-Yu Liu, Changjun Fan, Yizhou Sun, and Zhong Liu. Finding spin glass ground states through deep reinforcement learning. In APS March Meeting Abstracts, volume 2022, pages K09–001, 2022.
  • [Tal06] Michel Talagrand. The Parisi formula. Annals of Mathematics, pages 221–263, 2006.
  • [Wei22] Alexander S Wein. Optimal low-degree hardness of maximum independent set. Mathematical Statistics and Learning, 4(3):221–251, 2022.