Hyperoptimized approximate contraction of tensor networks for rugged-energy-landscape spin glasses on periodic square and cubic lattices

Adil A. Gangat Physics & Informatics Laboratories, NTT Research, Inc., Sunnyvale, CA 94085 Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125 Johnnie Gray Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125

Abstract

Obtaining the low-energy configurations of spin glasses that have rugged energy landscapes is of direct relevance to combinatorial optimization and fundamental science. Search-based heuristics have difficulty with this task due to the existence of many local minima that are far from optimal. The work of [M. M. Rams et al., Phys. Rev. E 104, 025308 (2021)] demonstrates an alternative that can bypass this issue for spin glasses with planar or quasi-planar geometry: sampling the Boltzmann distribution via approximate contractions of tensor networks. The computational complexity of this approach is due only to the complexity of contracting the network, and is therefore independent of landscape ruggedness. Here we initiate an investigation of how to take this approach beyond (quasi-)planar geometry by utilizing hyperoptimized approximate contraction of tensor networks [J. Gray and G. K.-L. Chan, Phys. Rev. X 14, 011009 (2024)]. We perform tests on the periodic square- and cubic-lattice, planted-solution Ising spin glasses generated with tile planting [F. Hamze et al., Phys. Rev. E 97, 043303 (2018)] for up to 2304 (square lattice) and 216 (cubic lattice) spins. For a fixed bond dimension, the time complexity is quadratic. With a bond dimension of only four, over the tested system sizes the average relative energy error in the most rugged instance class remains at $\sim$ 1% (square lattice) or $\sim$ 10% (cubic lattice) of optimal. In less rugged instances the solution is always optimal for the square lattice and either optimal or within $\sim$ 1% for the cubic lattice. These results suggest that further development of optimization methods based on tensor-network representations of spin glass partition functions may be fruitful, especially given that such methods are not limited to the Ising (i.e., binary) or two-body (i.e., quadratic) settings.

I Introduction

Discrete combinatorial optimization problems and glassy materials with quenched disorder are both often modeled with classical spin Hamiltonians such that the interaction energies between the spins are disordered. The simplest such “spin glass” model is the disordered classical Ising model:

H=\frac{1}{2}\sum_{i\neq j}J_{ij}s_{i}s_{j},

(1)

where $i$ and $j$ are graph vertex labels; $J_{ij}$ is a random, real scalar for the interaction energy between spins at vertices $i$ and $j$ ; and $s_{i}=\pm 1$ . Obtaining good solutions to discrete optimization problems and also understanding the low-temperature behavior of glassy systems then entails obtaining the low-energy (a.k.a. near-optimal) and lowest-energy (a.k.a. optimal) configurations of models like Eq. (1) ¹¹1The definition of “near-optimal” is problem-specific. For example, for Max-Cut problems it is in some cases desirable to have solutions within $\sim 12\%$ of optimal and in other cases within $\sim 6\%$ [3]..

Specialized hardware platforms for obtaining near-optimal and optimal configurations of spin glasses are under active development [2, 3, 4, 5, 6]. Although such machines display an average time complexity that is superpolynomial for obtaining optimal solutions [3] (but see Refs. [7, 8, 9, 10] for the theoretical possibility of average polynomial time complexity), the hope is that the smaller constant factors in the scaling that are possible with such machines will one day yield superior performance compared to classical CPU-based methods for industrial optimization problems at scale. A recent milestone of particular note toward this goal is the experimental confirmation by D-Wave, in the context of a 5,000-qubit simulation of a three-dimensional quantum spin glass, of a scaling advantage of quantum annealing over both simulated annealing and (Monte Carlo) simulated quantum annealing for the solution quality (i.e., spin configuration energy distance from the ground state) versus the annealing time [11]. Regarding computing near-optimal solutions (a.k.a. approximate optimization), a recent work [12] showed that quantum annealing can obtain an average polynomial time complexity for solutions within 1% error from optimal of a quasi-two-dimensional classical spin glass, and further that it can do so with a scaling exponent that is smaller than that of the leading classical heuristic (parallel tempering Monte Carlo with isoenergetic cluster moves [13]).

But for both exact and approximate optimization, there is much room left to improve such specialized hardware platforms. For example, densely connected Ising Hamiltonians are the typical case in industrial applications [14], so the sparse connectivity between qubits in the current D-Wave hardware requires an embedding of the dense problem, which incurs a quadratic overhead in the number of spins [14] that dramatically impacts its scaling performance [15]. As another example, the quantum approximate optimization experiments in Ref. [16] show solution quality that remains constant with increasing problem size for spin glass problems with geometry that is native to their superconducting-qubit processor’s planar geometry, but decreasing solution quality with increasing problem size for geometrically non-native problems that are embedded (a.k.a. compiled) into their processor’s geometry. Therefore, further development of classical algorithms for exact and approximate optimization remains relevant, as they can compute larger scale and more complex problems than current specialized machines, and also serve as a way of benchmarking the performance of the specialized machines as they are further developed.

As for classical digital algorithms, many are based on Markov Chain Monte Carlo (MCMC) [17], which performs an annealing-based stochastic search over the low-temperature free-energy landscape in configuration space for the global minima and also local minima that are close in energy to the global minima. However, in various spin-glass cases the ruggedness of the landscape is such that these methods become exponentially slower with increasing system size [18, 19, 20]. There are some proposals to accelerate MCMC methods with assistance from machine learning, but Ref. [19] shows that these proposals fail in the presence of rugged energy landscapes. In fact, Ref. [19] concludes that rugged-energy-landscape spin glass problems may be intrinsically difficult for all machine learning algorithms, and we are not aware of any counterexamples thus far. In fact, the difficulty that MCMC-based methods face in dealing with rugged energy landscapes in the configuration space of spin glass phases has been suggested as indicative of a place to look for quantum advantage [21].

On the other hand, there are classical digital algorithms that are based on simulation of dynamical systems, such as the simulated coherent Ising machine [22, 23] and simulated bifurcation machine [24, 25], but these also amount to a search over the free-energy landscape in configuration space for optimal and near-optimal minima, and can therefore also suffer a steep performance drop (in terms of time to solution) with increasing landscape ruggedness, as observed for example in the supplementary materials of Ref. [23].

For industrial applications it is often necessary to obtain near-optimal solutions to combinatorial optimization problems at very large system sizes ²²2It is often the case in industrial contexts that the optimal solution is not necessary; reduced cost solutions are often sufficient., making polynomial time complexity highly desirable. It is important then to find numerical methods that can approximately solve rugged-energy-landscape classical spin glasses in polynomial time. Tensor networks [27, 28] offer a route to approximately sampling, in polynomial time, the low-temperature Boltzmann distribution of classical spin glasses without traversing the free energy landscape in configuration space, and may thus offer a way to bypass the problems presented by rugged energy landscapes without the use of quantum hardware. This route is now possible due to the following key developments: First, the pioneering work of Nishino [29, 30, 31] showed how to use tensor-network algorithms for homogenous two- and three-dimensional classical lattice models. Then, Murg et al. presented a way in Ref. [32] to use tensor networks to deterministically perform efficient evaluation of partition functions of finite, inhomogeneous two-dimensional classical lattice models with open boundaries (although they did not apply their approach to a spin glass). This latter approach is also reviewed in Ref. [33]. Their idea entails a lattice of low-rank tensors that reflects the geometry of the physical lattice; an approximation of the partition function is obtained in polynomial time by approximately contracting all of the tensors in a row-to-row (or column-to-column) fashion by treating the rows (or columns) as matrix product states (MPSs) or matrix product operators (MPOs), which are one-dimensional tensor networks. The approximation arises due to setting an upper bound on the size of the indices of the intermediate tensors that arise during the contraction. Since the computational complexity of this approximate method is independent of the elements in the tensors, it is independent of the ruggedness of the energy landscape. The subsequent work of Ref. [34] develops the idea of Murg et al. further by demonstrating how to sample actual spin configurations from the low-temperature Boltzmann distribution of spin glasses via such approximate contractions.

We note some limitations of the work in Ref. [34]: It is only able to natively (i.e., without embedding) handle spin glasses on graphs that are either planar or quasi-planar (e.g., the chimera graph), which prevents a native treatment of spin glass problems with non-planar geometry. Though non-planar problems can be embedded into quasi-planar graphs, this entails a polynomial overhead in the number of spins [14]. Further, while Ref. [34] demonstrates good solution quality with small-index-size tensors for large, rugged-landscape spin glasses that are native to the chimera graph, they do not test against problems with non-planar geometry that have been embedded into the chimera geometry. This is significant because tests carried out in Ref. [35] show that non-natively representing the Boltzmann distribution of a complete-graph model by using an MPS requires the size of the tensors of the MPS to grow rapidly with system size if good solution quality is desired. Also, the above-mentioned experiment [16] that found decreasing solution quality with increasing problem size for geometrically non-native problems that are embedded into the native planar geometry of a superconducting-qubit processor makes it doubtful that the quasi-planar tensor-network method in Ref. [34] would yield good solutions in polynomial time for problems that are not natively planar or quasi-planar.

Ref. [36] presents a different tensor network approach for sampling from the partition function, based on tropical algebra, that can natively handle non-planar geometries and that guarantees the optimal solution with a time to solution that is independent of landscape ruggedness. However, the time complexity of this exact approach is (unsurprisingly) superpolynomial in the system size, and due to the use of tropical algebra there is as yet no established way of incorporating approximations into that approach to reduce the time complexity to polynomial. The recent work in Ref. [37] presents a still different tensor network method that is tailored to the exact partition function computation of the p-spin model (which can exhibit rugged energy landscapes) and can also natively handle non-planar geometry, yet again in some regimes has superpolynomial time complexity.

Thus it is important to find ways of extending the polynomial-time approach of Ref. [34] to be able to natively handle non-planar geometries. Strategies for approximately contracting tensor networks with arbitrary geometry do exist [38, 39, 40], and thereby provide a potential route for achieving this. Here we initiate the exploration of this route by testing the ability of the approximate tensor-network contraction method in Ref. [39] to sample the low-temperature Boltzmann distributions of spin glasses on the following non-planar geometries: square and cubic lattices with periodic boundary conditions ³³3Ref. [39] shows that even in cases where an MPS-based contraction, such as the one used in Ref. [34], may be applied, the method of Ref. [39] can significantly outperform MPS-based contraction in terms of both memory and time costs.. We use planted-solution models whose instances span distinct classes of energy landscape ruggedness. For the square lattice we find the solution is always optimal except for in the most rugged class of instances, where we find an energy error of at most 3%. For the three ruggedness classes of the cubic lattice, we find an energy error of at most 12% in the most rugged class, and 0% and about 3% in the other two. These results are all obtained with the same polynomial time complexity ⁴⁴4The (assumed) distinction between P and NP does not preclude the existence of heuristic polynomial-time numerical methods for rugged-energy-landscape spin glasses—methods that have no performance guarantee regarding solution quality but in practice yield good solutions at least in some instances [3].

II Method

We follow the general idea in Ref. [34] for representing the Boltzmann distribution $p(\textbf{s})\sim\textrm{exp}[-\beta H(\textbf{s})]$ , where s is a spin configuration vector, as a contraction of a network of low-rank tensors; this gives access to any (unconditional or conditional) marginal probability of the system via trivial modifications of the tensor network. Details of our tensor network formulation are given in Appendix A. At a given value of $\beta$ , our method computes the marginal probability distribution for a single spin by contracting a modified version of the original tensor network, updates the original tensor network by projecting that spin onto its most probable state, then computes the (conditional) marginal probability distribution of the next spin with the projected version of the original tensor network, and serially iterates this procedure for all remaining spins. In this way the Boltzmann distribution for the entire spin glass at the chosen value of $\beta$ is sampled (though the sampling is not exact, due to the compression described below). We repeat this at a series of values of $\beta$ and choose the lowest energy configuration as the best solution (limited numerical precision precludes the possibility of using arbitrarily large $\beta$ ).

Ref. [34] performs a more sophisticated sampling via a branch and bound method, but we opt for the simpler strategy above since our focus here is only on providing evidence that hyperoptimized approximate tensor network contraction [39] can, with a fixed polynomial time complexity, readily obtain solutions that are within about 10% energy error from optimal for at least some non-planar spin glass models with rugged energy landscapes.

During the contraction of any tensor network, the size of the indices of the intermediate tensors can grow exponentially. The method in Ref. [39], which we employ in this work, can automatically limit, via compression, the index size to a maximum bond dimension ( $\chi$ ). For short-range spin models on lattices, using a constant value of $\chi$ limits the memory complexity for the tensor contractions to be constant in the number of spins ( $N$ ). The leading order memory complexity of the entire algorithm is therefore only that of the initial tensor network itself, which is $\mathcal{O}(N)$ for short-range models (see Appendix A for an example). For a fixed $\chi$ , the time complexity in terms of $N$ is set by the number of tensor contractions. For short-range spin models on lattices, there are $\mathcal{O}(N)$ tensor contractions for each single-spin marginal. Computing a spin configuration for a given $\beta$ with the method described above for all $N$ spins therefore has a time complexity $\mathcal{O}(N^{2})$ . We note that the computations at different values of $\beta$ and $\chi$ are independent, and can therefore be done in parallel.

As explained in Ref. [34], while increasing $\beta$ theoretically reduces thermal fluctuations and therefore the $\chi$ that is needed to achieve a given solution quality (larger $\chi$ yields less compression and therefore more accuracy in principle), limited numerical precision means that in practice there is a finite value of $\beta$ that is optimal at any given value of $\chi$ (to see this, consider that the numerical range of the Boltzmann weights in Eq. (2) in Appendix A monotonically increases with increasing $\beta$ ). Finite numerical precision also means that increasing $\chi$ does not monotonically decrease the error at a fixed $\beta$ . Thus the values of $\chi$ and $\beta$ that optimize the solution quality are not known a priori. But as mentioned above, our purpose here is only to provide evidence that the general scheme is sufficiently performant in the context of rugged, non-planar spin glasses to be of interest for combinatorial optimization. Here we say that this means that solutions within about $10\%$ of optimal should appear even at small values of fixed $\chi$ , since computational cost monotonically increases with increasing $\chi$ . Hence we usually set $\chi=4$ and test a few values of $\beta$ to show that multiple solutions with energy error $\lesssim 10\%$ do appear.

While the actual time cost of any tensor network contraction can depend strongly on the order of the tensor contractions, the contraction algorithm in Ref. [39] that we employ automatically chooses an optimal or near-optimal contraction order. This search for the contraction order is done only once for a given lattice geometry, and not for every spin glass instance, so we do not include its cost in the computational complexity of the algorithm. We implement the algorithm with the Python libraries quimb [43] and cotengra [44].

III Benchmark Models

Our testing uses the planted-solution Ising-model instances generated by the Chook library [45] with the tile planting method [46] on the square lattice and cubic lattice, both with periodic boundaries in all directions; in both cases the Hamiltonian takes the form in Eq. (1). Since the solutions are planted, the groundstate energy is exactly known.

III.1 Square lattice tile planting

A characterization of the landscape ruggedness of this model is contained in Ref. [47]. The model contains four base classes of instances, designated as $C_{1},C_{2},C_{3},$ and $C_{4}$ . The order of increasing (average) energy-landscape ruggedness of these is: $C_{4},C_{3},C_{1},$ and $C_{2}$ . The results in Ref. [47] also indicate that this is the order of increasing average computational hardness for at least the following stochastic-search methods: population annealing, simulated annealing, and simulated quantum annealing.

The ground state is doubly degenerate; by default they are the ferromagnetic state (all $+1$ or all $-1$ ), but the Chook library has the option to scramble them with gauge transformations, which we enable for all problem instances.

III.2 Cubic lattice tile planting

A (partial) characterization of the landscape ruggedness of this model is contained in Ref. [46]. The model contains three base classes of instances, designated as $F_{2},F_{4},$ and $F_{6}$ , where $F_{2}$ is subdivided into $F_{21}$ and $F_{22}$ , and $F_{4}$ into $F_{41}$ and $F_{42}$ The Chook library generates only problems in $F_{22}$ , $F_{42}$ , and $F_{6}$ , and mixtures thereof. The data in Ref. [46] shows that $F_{6}$ is computationally harder (i.e., more rugged) than both $F_{22}$ and $F_{42}$ for the stochastic-search methods of simulated annealing, parallel tempering, and population annealing. The hardest instances of these problems are much harder for these stochastic search methods than problems with either bimodal or Gaussian disorder.

$F_{22}$ , $F_{42}$ , and $F_{6}$ all have a ferromagnetic groundstate, but $F_{22}$ and $F_{6}$ also have other groundstates: $F_{22}$ has a total of four groundstates (up to $\mathbb{Z}_{2}$ symmetry) and $F_{6}$ has a total of eight (up to $\mathbb{Z}_{2}$ symmetry). For all problem instances, we enable the option in the Chook library to scramble the groundstates with gauge transformations.

IV Results

The simulations are done one instance at a time on an Apple M2 Ultra processor with 24 cores (16 performance cores and efficiency cores) and 128 GB of RAM.

IV.1 Square lattice tile planting

We test our method against the four base classes of the square-lattice model individually. Within each base class, we test ten problem instances at sizes $L$ ( $L^{2}$ ) = $20$ ( $400$ ), $32$ ( $1024$ ), and $48$ ( $2304$ ). The full data for solution quality vs. $\beta$ is presented in Appendix B. In Fig. 1 we show that, with $\chi=4$ , the method produces solutions that are within $2\%$ energy error for the most rugged base class ( $C_{2}$ ) at $L=48$ . In Fig. 2 we show the time to solution (TTS) vs. $N=L^{2}$ for $C_{2}$ with $\chi=4$ , revealing the predicted quadratic scaling at constant $\chi$ .

For a given problem instance, we consider the best solution as the one with the smallest energy error from the values of $\beta$ that we test. We find the smallest error to always be zero in base classes $C_{1},C_{3},$ and $C_{4}$ . For $C_{2}$ , in Fig. 3 we show that the average of such best solutions does not substantially change with increasing system size at $\chi=4$ .

In Fig. 4 we plot the energy error for all ten instances of $C_{2}$ at $L=20$ and $\beta=1$ as a function of $\chi$ to show that the error can be further decreased by only slightly increasing $\chi$ .

Refer to caption — Figure 1: (color online). Square-lattice tile planting; solution quality vs. $\beta$ . Ten instances from the most rugged base class ( $C_{2}$ ) at $L=48$ with $\chi=4$ . Data points belonging to the same instance are connected by dash-dotted lines. The minimum energy error over these values of $\beta$ is less than 2 $\%$ for all instances.

IV.2 Cubic lattice tile planting

We test our method against the $F_{22}$ , $F_{42}$ , and $F_{6}$ base classes of the cubic-lattice model individually. Within each base class, we test ten problem instances at sizes $L=4$ ( $N=L^{3}=64$ ) and $L=6$ ( $N=216$ ). We do not test larger sizes due to the long computation times required. In Fig. 5 we show results for the three base classes at $L=4$ and $L=6$ , all with $\chi=4$ . For the most rugged base class ( $F_{6}$ ), the minimum error is less than $10\%$ at $L=4$ ( $N=64$ ) and less than $12\%$ at $L=6$ ( $N=216$ ).

Due to having data from only two values of $L$ , we do not have a plot that meaningfully demonstrates the scaling of the time complexity, but the theoretical expectation is a quadratic scaling in $N=L^{3}$ , as explained in Sec. II.

We also perform some investigation of the performance in the most rugged base class ( $F_{6}$ ) at $L=4$ with larger values of $\chi$ . In Fig. 6 we plot the energy error for all ten instances of $F_{6}$ at $L=4$ and two values of $\beta$ as a function of $\chi$ . This illustrates that the optimal value of $\chi$ is $\beta$ -dependent, and that the error can be further decreased from what is obtained with $\chi=4$ by moderately increasing $\chi$ . In Fig. 7 we plot the energy error for all ten instances of $F_{6}$ at $L=4$ and $\chi=32$ as a function of $\beta$ . Comparison with Fig. 5(e) shows that the optimal value of $\beta$ is $\chi$ -dependent. Regarding the time cost of increasing $\chi$ , in Fig. 8 we plot the TTS for all ten instances of $F_{6}$ at $L=4$ and $\beta=1.2$ as a function of $\chi$ , revealing an approximately twenty-five times increase between $\chi=4$ and $\chi=32$ . It is unclear from the data in Fig. 8 what the asymptotic scaling of the TTS is with $\chi$ , but given that the average error lies below $5\%$ for $F_{6}$ at $L=4$ and $\chi=32$ (see Fig. 6) and that the solution quality doesn’t seem to have a strong size dependence (see Fig. 5), values of $\chi$ that are far beyond the range tested are likely not practically relevant for cubic-lattice Ising spin glasses.

V Summary and Discussion

We have investigated the ability of approximate tensor network contraction with small maximum bond dimension to yield optimal or near-optimal (i.e., $\lesssim 10\%$ energy error) solutions in polynomial time to at least some types of non-planar, short-range spin glass problems with rugged free energy landscapes. For planted-solution Ising spin glasses generated with tile planting, with a constant maximum bond dimension of $\chi=4$ , and therefore fixed polynomial time and memory complexity, we found no substantial reduction in solution quality with increasing system size between $400$ and $2304$ spins in the toroidal-square-lattice case, and between $64$ and $216$ spins in the toroidal-cubic-lattice case. For the most rugged subclass of instances in the toroidal-square-lattice case, the average best solution quality (over multiple instances) remained at $\sim 1\%$ with increasing system size, while it remained at $\sim 10\%$ for the most rugged subclass in the toroidal-cubic-lattice case. As far as solution quality, these results are encouraging given that the sampling method that we used was quite crude; it is likely that a more sophisticated sampling procedure, such as the one in Ref. [34], would yield a substantial reduction in the minimum energy error for some of the problem instances without increasing $\chi$ . At the same time, we emphasize that the method here does not have a strict guarantee on solution quality (i.e., it is categorized as a heuristic method). Regarding system size, though the time complexity of the method is quadratic, we were unable to reach beyond $6\times 6\times 6$ for the cubic-lattice due to the actual simulation time required; further development of the method is needed in order to be able to use tensor-network representations of partition functions to approximately optimize the much larger cubic-lattice sizes addressable with certain other methods.

For instance, the recent D-Wave quantum annealing experiment in Ref. [11] also yielded an approximate optimization of a cubic-lattice spin-glass model, but at a much larger size of over 5,000 spins while achieving an error of about 2 $\%$ on a much shorter time scale of microseconds. A direct comparison between their results and the ones here is not possible since they did not assess the computational time complexity (they used a fixed system size), nor did they test the performance of their method against different levels of energy-landscape ruggedness, which in the literature has become synonymous with computational hardness [47, 19]; it is known that if a method successfully performs approximate optimization for some spin glass models at very large sizes (e.g., 8,000 spins on a cubic lattice [48]), it may still perform poorly on energy landscapes that are more rugged [19]. But we can make a rough comparison between our results and those of Ref. [11] as follows: According to Ref. [46], the most rugged subclass ( $F_{6}$ ) for the cubic lattice model that we test is orders of magnitude more computationally difficult than the random $\pm J$ instances that are tested in Ref. [11]. Considering this, our obtained error of $\sim$ 10% for subclass $F_{6}$ (Figs. 5 (e-f)) suggests that our method (or a further development thereof) may perform competitively in terms of solution quality (i.e., relative error) for the spin glass instances of Ref. [11]. This is further suggested in Figs. 5 (a-d), which show that when $L=6$ our method achieves an error of about 3% for the $F_{22}$ subclass and an error of 0% in the $F_{42}$ subclass. Also, Fig. 6 (b) reveals an average error of $\sim 5\%$ in the $F_{6}$ subclass when $L=4$ and $\chi=32$ . But as mentioned above, it is unlikely that the method here could approximately optimize a cubic lattice as large as 5,000 spins in a practical amount of time, and in that sense one can say that the D-Wave results are superior. At the same time, the method here has a quadratic time complexity, and further development of this method may therefore eventually yield something competitive regarding reachable system sizes; the fact that small bond dimension ( $\chi$ ) yields good results suggests that the physics is highly local, and ongoing followup work [49] suggests that this might be leveraged to dramatically improve simulation time and thereby reach much larger system sizes. On the other hand, there is theoretical hope that the time complexity of quantum annealing may become universally polynomial [50, 8, 9, 10], which could help quantum annealing reach still larger system sizes. Also noteworthy is that Ref. [51] demonstrates that cubic spin glasses in the form of minor embeddings that are larger than what can fit on a single quantum annealing processor can also be approximately optimized with quantum annealing by a hybrid quantum-classical approach that outperforms classical simulated annealing at a fixed system size. Thus, the situation regarding competition between methods for optimization of cubic-lattice Ising spin glasses remains fluid.

It is also worthwhile to juxtapose our square-lattice results with those of the D-Wave quantum annealing experiment in Ref. [12], which uses a quasi-two-dimensional model whose interaction graph is equivalent to a honeycomb lattice with additional non-planar, short-range bonds. Our results (Fig. 3) show 0% error except in the most rugged subclass ( $C_{2}$ ), where we achieve an average error of $\lesssim$ 1%, and our time complexity is always quadratic. The error-corrected quantum-annealing and parallel-tempering Monte-Carlo data in Ref. [12] are for error thresholds at and very close to 1%, and always have sub-quadratic but super-linear time complexity. The Sidon-set [52] spin glass instances that they used in Ref. [12] are considerably computationally harder than $\pm J$ spin glass instances, but we do not know which of the subclasses of the present square-lattice model it is closest to in terms of hardness.

Regarding the possibility of addressing more complex classical spin glasses than quasi-two-dimensional Ising and cubic-lattice Ising, the tensor-network approach investigated here is amenable to beyond-binary variables and beyond-quadratic interactions, and can easily incorporate local fields. Further, Ref. [53] provides a constructive proof that square- and cubic-lattice classical Ising spin glasses faithfully capture the low-energy configuration space of any classical spin system with only a polynomial overhead in the number of spins (see also Ref. [54] regarding mapping between K-SAT and the cubic-lattice classical Ising spin glass). When dealing with such square- and cubic-lattice spin glasses that encode more general spin glasses whose interaction graphs are highly non-local, it is likely that the method investigated here would suffer from a significant bias in the sampling due to the finite bond dimension (the non-local nature of the original model that is encoded into the lattice model would likely require very large bond dimension for the present method to sample the lattice model approximately fairly). However, some work has demonstrated that unbiased sampling of Boltzmann distributions of classical spin lattices can be achieved with tensor-network contraction by combining it with Markov Chain Monte Carlo [55, 56, 57, 58]. In particular, Ref. [58] demonstrated this to work well with small bond dimension for a classical square-lattice spin glass at criticality, where the correlation length is very large; it may be that the method would work well at small bond dimension also for planar or cubic-lattice spin glass models that encode spin glasses with highly non-local interaction graphs. It is unclear, however, whether or not the time complexity would be polynomial.

Finally, short-range spin glass models are better models of real materials with quenched disorder than complete-graph models, and therefore are of relevance for condensed matter physics. A prominent example is the classical Edwards-Anderson Ising model on the cubic lattice, the nature of the low-temperature phase of which has remained unresolved due to the rugged-energy-landscape problem [59]. Since the results in this work demonstrate that tensor-network contraction can approximately optimize rugged-energy-landscape cubic spin glasses of moderate size in polynomial time, we hope that the developments here and in related work may eventually contribute toward resolving such fundamental questions.

VI Acknowledgments

We acknowledge discussions with Garnet Chan and Yu Tong. AAG also acknowledges correspondence with Sam Reifenstein, Timothée Leleu, Wangwei Lan, and Marek Rams.

References

Note [1] The definition of “near-optimal” is problem-specific. For example, for Max-Cut problems it is in some cases desirable to have solutions within $\sim 12\%$ of optimal and in other cases within $\sim 6\%$ [3].
Cai et al. [2020] F. Cai, S. Kumar, T. Van Vaerenbergh, X. Sheng, R. Liu, C. Li, Z. Liu, M. Foltin, S. Yu, Q. Xia, et al., Power-efficient combinatorial optimization using intrinsic noise in memristor hopfield neural networks, Nature Electronics 3, 409 (2020).
Mohseni et al. [2022] N. Mohseni, P. L. McMahon, and T. Byrnes, Ising machines as hardware solvers of combinatorial optimization problems, Nature Reviews Physics 4, 363 (2022).
Nguyen et al. [2024] L. Nguyen, M.-A. Miri, R. J. Rupert, W. Dyk, S. Wu, N. Vrahoretis, I. Huang, M. Begliarbekov, N. Chancellor, U. Chukwu, et al., Entropy computing: A paradigm for optimization in an open quantum system, arXiv preprint arXiv:2407.04512 (2024).
Zhang et al. [2024a] T. Zhang, Q. Tao, B. Liu, A. Grimaldi, E. Raimondo, M. Jimenez, M. J. Avedillo, J. Nunez, B. Linares-Barranco, T. Serrano-Gotarredona, et al., A review of ising machines implemented in conventional and emerging technologies, IEEE Transactions on Nanotechnology (2024a).
King et al. [2024] A. D. King, A. Nocera, M. M. Rams, J. Dziarmaga, R. Wiersema, W. Bernoudy, J. Raymond, N. Kaushal, N. Heinsdorf, R. Harris, et al., Computational supremacy in quantum simulation, arXiv preprint arXiv:2403.00910 (2024).
Traversa and Di Ventra [2017] F. L. Traversa and M. Di Ventra, Polynomial-time solution of prime factorization and np-complete problems with digital memcomputing machines, Chaos: An Interdisciplinary Journal of Nonlinear Science 27 (2017).
Bernaschi et al. [2024] M. Bernaschi, I. González-Adalid Pemartín, V. Martín-Mayor, and G. Parisi, The quantum transition of the two-dimensional ising spin glass, Nature 631, 749 (2024).
Zhang et al. [2024b] H. Zhang, K. Boothby, and A. Kamenev, Cyclic quantum annealing: Searching for deep low-energy states in 5000-qubit spin glass, arXiv preprint arXiv:2403.01034 (2024b).
Ghosh et al. [2024] R. Ghosh, L. A. Nutricati, N. Feinstein, P. Warburton, and S. Bose, Exponential speed-up of quantum annealing via n-local catalysts, arXiv preprint arXiv:2409.13029 (2024).
King et al. [2023] A. D. King, J. Raymond, T. Lanting, R. Harris, A. Zucca, F. Altomare, A. J. Berkley, K. Boothby, S. Ejtemaee, C. Enderud, et al., Quantum critical dynamics in a 5,000-qubit programmable spin glass, Nature 617, 61 (2023).
Bauza and Lidar [2024] H. M. Bauza and D. A. Lidar, Scaling advantage in approximate optimization with quantum annealing, arXiv preprint arXiv:2401.07184 (2024).
Zhu et al. [2015] Z. Zhu, A. J. Ochoa, and H. G. Katzgraber, Efficient cluster algorithm for spin glasses in any space dimension, Physical review letters 115, 077201 (2015).
Könz et al. [2021] M. S. Könz, W. Lechner, H. G. Katzgraber, and M. Troyer, Embedding overhead scaling of optimization problems in quantum annealing, PRX Quantum 2, 040322 (2021).
Hamerly et al. [2019] R. Hamerly, T. Inagaki, P. L. McMahon, D. Venturelli, A. Marandi, T. Onodera, E. Ng, C. Langrock, K. Inaba, T. Honjo, et al., Experimental investigation of performance differences between coherent ising machines and a quantum annealer, Science advances 5, eaau0823 (2019).
Harrigan et al. [2021] M. P. Harrigan, K. J. Sung, M. Neeley, K. J. Satzinger, F. Arute, K. Arya, J. Atalaya, J. C. Bardin, R. Barends, S. Boixo, et al., Quantum approximate optimization of non-planar graph problems on a planar superconducting processor, Nature Physics 17, 332 (2021).
Krauth [2006] W. Krauth, Statistical mechanics: algorithms and computations, Vol. 13 (OUP Oxford, 2006).
Machta [2009] J. Machta, Strengths and weaknesses of parallel tempering, Physical Review E 80, 056706 (2009).
Ciarella et al. [2023] S. Ciarella, J. Trinquier, M. Weigt, and F. Zamponi, Machine-learning-assisted monte carlo fails at sampling computationally hard problems, Machine Learning: Science and Technology 4, 010501 (2023).
Montanari and Semerjian [2006] A. Montanari and G. Semerjian, Rigorous inequalities between length and time scales in glassy systems, Journal of Statistical Physics 125, 23 (2006).
Jaumà et al. [2024] G. Jaumà, J. J. García-Ripoll, and M. Pino, Exploring quantum annealing architectures: A spin glass perspective, Advanced Quantum Technologies 7, 2300245 (2024).
Tiunov et al. [2019] E. S. Tiunov, A. E. Ulanov, and A. Lvovsky, Annealing by simulating the coherent Ising machine, Optics Express 27, 10288 (2019).
Leleu et al. [2021] T. Leleu, F. Khoyratee, T. Levi, R. Hamerly, T. Kohno, and K. Aihara, Scaling advantage of chaotic amplitude control for high-performance combinatorial optimization, Communications Physics 4, 266 (2021).
Goto et al. [2019] H. Goto, K. Tatsumura, and A. R. Dixon, Combinatorial optimization by simulating adiabatic bifurcations in nonlinear hamiltonian systems, Science Advances 5, eaav2372 (2019).
Goto et al. [2021] H. Goto, K. Endo, M. Suzuki, Y. Sakai, T. Kanao, Y. Hamakawa, R. Hidaka, M. Yamasaki, and K. Tatsumura, High-performance combinatorial optimization based on classical mechanics, Science Advances 7, eabe7953 (2021).
Note [2] It is often the case in industrial contexts that the optimal solution is not necessary; reduced cost solutions are often sufficient.
Orús [2014] R. Orús, A practical introduction to tensor networks: Matrix product states and projected entangled pair states, Annals of Physics 349, 117 (2014).
Bañuls [2023] M. C. Bañuls, Tensor network algorithms: A route map, Annual Review of Condensed Matter Physics 14, 173 (2023).
Nishino [1995] T. Nishino, Density matrix renormalization group method for 2d classical models, Journal of the Physical Society of Japan 64, 3598 (1995).
Nishino and Okunishi [1995] T. Nishino and K. Okunishi, Product wave function renormalization group, Journal of the Physical Society of Japan 64, 4084 (1995).
Nishino and Okunishi [1998] T. Nishino and K. Okunishi, A density matrix algorithm for 3d classical models, Journal of the Physical Society of Japan 67, 3066 (1998).
Murg et al. [2005] V. Murg, F. Verstraete, and J. I. Cirac, Efficient evaluation of partition functions of inhomogeneous many-body spin systems, Physical Review Letters 95, 057206 (2005).
Verstraete et al. [2008] F. Verstraete, V. Murg, and J. I. Cirac, Matrix product states, projected entangled pair states, and variational renormalization group methods for quantum spin systems, Advances in physics 57, 143 (2008).
Rams et al. [2021] M. M. Rams, M. Mohseni, D. Eppens, K. Jałowiecki, and B. Gardas, Approximate optimization, sampling, and spin-glass droplet discovery with tensor networks, Physical Review E 104, 025308 (2021).
Jałowiecki et al. [2021] K. Jałowiecki, M. M. Rams, and B. Gardas, Brute-forcing spin-glass problems with cuda, Computer Physics Communications 260, 107728 (2021).
Liu et al. [2021] J.-G. Liu, L. Wang, and P. Zhang, Tropical tensor network for ground states of spin glasses, Physical Review Letters 126, 090506 (2021).
Lanthier et al. [2024] B. Lanthier, J. Côté, and S. Kourtis, Tensor networks for $p$ -spin models, arXiv preprint arXiv:2405.08106 (2024).
Pan et al. [2020] F. Pan, P. Zhou, S. Li, and P. Zhang, Contracting arbitrary tensor networks: general approximate algorithm and applications in graphical models and quantum circuit simulations, Physical Review Letters 125, 060503 (2020).
Gray and Chan [2024] J. Gray and G. K.-L. Chan, Hyperoptimized approximate contraction of tensor networks with arbitrary geometry, Physical Review X 14, 011009 (2024).
Ma et al. [2024] L. Ma, M. T. Fishman, E. M. Stoudenmire, and E. Solomonik, Approximate contraction of arbitrary tensor networks with a flexible and efficient density matrix algorithm, arXiv preprint arXiv:2406.09769 (2024).
Note [3] Ref. [39] shows that even in cases where an MPS-based contraction, such as the one used in Ref. [34], may be applied, the method of Ref. [39] can significantly outperform MPS-based contraction in terms of both memory and time costs.
Note [4] The (assumed) distinction between P and NP does not preclude the existence of heuristic polynomial-time numerical methods for rugged-energy-landscape spin glasses—methods that have no performance guarantee regarding solution quality but in practice yield good solutions at least in some instances [3].
Gray [2018] J. Gray, quimb: A python package for quantum information and many-body calculations, Journal of Open Source Software 3, 819 (2018).
Gray and Kourtis [2021] J. Gray and S. Kourtis, Hyper-optimized tensor network contraction, Quantum 5, 410 (2021).
Perera et al. [2020a] D. Perera, I. Akpabio, F. Hamze, S. Mandra, N. Rose, M. Aramon, and H. G. Katzgraber, Chook–a comprehensive suite for generating binary optimization problems with planted solutions, arXiv preprint arXiv:2005.14344 (2020a).
Hamze et al. [2018] F. Hamze, D. C. Jacob, A. J. Ochoa, D. Perera, W. Wang, and H. G. Katzgraber, From near to eternity: spin-glass planting, tiling puzzles, and constraint-satisfaction problems, Physical Review E 97, 043303 (2018).
Perera et al. [2020b] D. Perera, F. Hamze, J. Raymond, M. Weigel, and H. G. Katzgraber, Computational hardness of spin-glass problems with tile-planted solutions, Physical Review E 101, 023316 (2020b).
Fan et al. [2023] C. Fan, M. Shen, Z. Nussinov, Z. Liu, Y. Sun, and Y.-Y. Liu, Searching for spin glass ground states through deep reinforcement learning, Nature communications 14, 725 (2023).
[49] A. A. Gangat, (in preparation).
Susa et al. [2018] Y. Susa, Y. Yamashiro, M. Yamamoto, and H. Nishimori, Exponential speedup of quantum annealing by inhomogeneous driving of the transverse field, Journal of the Physical Society of Japan 87, 023002 (2018).
Raymond et al. [2023] J. Raymond, R. Stevanovic, W. Bernoudy, K. Boothby, C. C. McGeoch, A. J. Berkley, P. Farré, J. Pasvolsky, and A. D. King, Hybrid quantum annealing for larger-than-qpu lattice-structured problems, ACM Transactions on Quantum Computing 4, 1 (2023).
Katzgraber et al. [2015] H. G. Katzgraber, F. Hamze, Z. Zhu, A. J. Ochoa, and H. Munoz-Bauza, Seeking quantum speedup through spin glasses: The good, the bad, and the ugly, Physical Review X 5, 031026 (2015).
De las Cuevas and Cubitt [2016] G. De las Cuevas and T. S. Cubitt, Simple universal models capture all classical spin physics, Science 351, 1180 (2016).
Zhang [2023] Z. Zhang, Mapping between Spin-Glass Three-Dimensional (3D) Ising Model and Boolean Satisfiability Problem, Mathematics 11, 237 (2023).
Ferris [2015] A. J. Ferris, Unbiased monte carlo for the age of tensor networks, arXiv preprint arXiv:1507.00767 (2015).
Huggins et al. [2017] W. Huggins, C. D. Freeman, M. Stoudenmire, N. M. Tubman, and K. B. Whaley, Monte Carlo tensor network renormalization, arXiv preprint arXiv:1710.03757 (2017).
Frías Pérez et al. [2023] M. Frías Pérez, M. Mariën, D. Pérez García, M. C. Bañuls, and S. Iblisdir, Collective Monte Carlo updates through tensor network renormalization, SciPost Physics 14, 123 (2023).
Chen et al. [2024] T. Chen, E. Guo, W. Zhang, P. Zhang, and Y. Deng, Tensor network Monte Carlo simulations for the two-dimensional random-bond ising model, arXiv preprint arXiv:2409.06538 (2024).
Charbonneau et al. [2023] P. Charbonneau, E. Marinari, M. Mézard, G. Parisi, F. Ricci-Tersenghi, G. Sicuro, and F. Zamponi, Spin Glass Theory and Far Beyond: Replica Symmetry Breaking after 40 Years (World Scientific, 2023).

Appendix A Tensor network example for conditional marginals

We envision the square and cubic spin lattices as graphs with vertices at each spin. The tensor network construction that we employ entails assigning a matrix to each edge of the graph. The matrix encodes the possible Boltzmann weights for the joint configuration of the two spins that it connects. For example, the matrix $T_{ij}$ assigned to the edge between spin $i$ and spin $j$ takes the form

T_{ij}=\begin{bmatrix}e^{-\beta J_{ij}}&e^{\beta J_{ij}}\\ e^{\beta J_{ij}}&e^{-\beta J_{ij}}\end{bmatrix}.

(2)

We represent these matrices with a larger (blue) circle and two legs (Fig. 9a), where each leg corresponds to an index of the matrix. This assignment of matrices on the square and cubic spin lattices results in a hyperindex at each spin, which is easily handled by the quimb [43] and cotengra [44] Python libraries that we use in this work. However, for pedagogical clarity, in this section we replace each hyperindex with a kronecker delta function, which we represent with a smaller (black) circle and one leg attached to the smaller circle for each index (Fig. 9b). Deleting a leg (as in going from Fig. 9b to Fig. 9c) denotes summing over the corresponding index. Deleting a smaller black circle and its dangling leg and replacing the remaining legs with dashed lines (as in going from Fig. 9b to Fig. 9d) represents a selection of a particular value of the corresponding index, which is equivalent to projecting the corresponding spin onto a particular state. Joining legs together represents a contraction of the corresponding indices (Fig. 9e).

To illustrate how this construction produces the desired marginals, we present an example for the three-site Ising chain with open boundaries. In Fig. 10a, we show the tensor network whose contraction produces a single three-site tensor, which encodes the (unnormalized) marginal distribution for all three spins. Selecting a particular value for a single open leg (i.e., uncontracted index) is equivalent to projecting the corresponding spin onto a particular state. In the case of Ising spins, each leg (i.e., index) in the uncontracted tensor network has dimension two. Selecting particular values for all of the dangling legs produces a tensor network whose contraction gives the (unnormalized) probability for the corresponding spin configuration of all three spins. Summing over the left two dangling legs produces the tensor network in Fig. 10b, whose contraction yields a vector containing the (unnormalized) unconditional marginal distribution for the third spin. Finally, summing over the first dangling leg and selecting a particular value for the middle dangling leg produces the tensor network in Fig. 10c, whose contraction yields the (unnormalized) conditional marginal distribution for the third spin (conditioned upon the selected value for the second spin); the marginal comes out of the contraction as a two-element vector (i.e., a circle with one open leg).

To compute the most likely spin configuration for the full spin glass at a given value of $\beta$ , the following procedure is used: 0) construct the tensor network corresponding to the marginal distribution for the entire spin lattice (call it TN0), 1) make a copy of TN0, and sum over all open indices except for the one corresponding to the first spin, 2) contract this modified copy of TN0 to produce the marginal for the first spin, 3) project the first spin in the original TN0 onto its most probable state as determined from the marginal produced in the previous step, 4) use this updated version of TN0 to perform steps (1)-(3) for the next spin.