
Improved accuracy for decoding surface codes with matching synthesis

Cody Jones ncodyjones@google.com Google Quantum AI, Santa Barbara, CA 93117, USA
Abstract

We present a method, called matching synthesis, for decoding quantum codes that produces an enhanced assignment of errors from an ensemble of decoders. We apply matching synthesis to develop a decoder named Libra, and show in simulations that Libra increases the error-suppression ratio $\Lambda$ by about $10\%$. Matching synthesis takes the solutions of an ensemble of approximate solvers for the minimum-weight hypergraph matching problem and produces a new solution that combines the best local solutions, where locality depends on the hypergraph. We apply matching synthesis to an example problem of decoding surface codes with error correlations in the conventional circuit model, which induces a hypergraph with hyperedges that are local in space and time. In this example, the ensemble consists of correlated minimum-weight matching run on hypergraphs with randomly perturbed error probabilities, with a different perturbation for each ensemble member. Furthermore, we extend matching synthesis to perform summation of probability over multiple low-weight solutions at small computational overhead, approximating the probability of an equivalence class; in our surface code problem, this shows a modest additional benefit. We show that matching synthesis has favorable scaling properties, with accuracy beginning to saturate at an ensemble size of 60, and we remark on pathways to real-time decoding at near-optimal decoding accuracy if one has an accurate model for the distribution of errors.

I Introduction

A fault-tolerant quantum computer requires real-time decoding of the parity checks that detect errors during the computation [1]. Accuracy of decoding is important as well, since improvements in accuracy can reduce the resource overhead for error correction [2, 3, 4, 5]. Hence decoding algorithms that achieve higher accuracy while being efficient to run in real time are desirable for making useful quantum computers.

Many works have studied accurate decoding for surface codes. Since accuracy improvements are reported relative to a baseline, the historical baseline [6] is Edmonds' Blossom algorithm [7] for minimum-weight perfect matching (where often the decoder itself is known as MWPM). The observation that correlations between the $X$ and $Z$ lattices (or primal and dual in the original work [8]) can be exploited by edge reweighting [9] led to improved accuracy with correlated MWPM [10, 11, 12, 13]. More recently, a mapping between surface and color codes [14] converts the problem into finding a good decoder for color codes. Accuracy in matching can also be improved with belief propagation [15] or by accounting for path degeneracy with multi-path summation [16].

Whereas correlated matching handles error correlations heuristically, there are other decoding algorithms that attempt to compute optimal solutions. Belief propagation [17, 18] can achieve high accuracy but sometimes fails to converge (Ref. [15] falls back to matching). Tensor-network decoders appear to approach optimal accuracy for 2D networks (noiseless syndrome measurement) [19, 20, 21], but contracting a 3D tensor network, required for fault tolerance in surface codes, is believed to be intractable beyond small code distances [22, 23, 24]. Likewise, integer-program decoders have exponential runtime in the worst case [25, 26], though the average runtime for below-threshold error rates could be more favorable; linear-program decoders [27, 28] may also enable circumventing intractable runtimes.

This work introduces a decoder called Libra that utilizes ensembling, where a collection of decoders produces differing decoding solutions that are analyzed collectively. Ensembling for decoding has appeared in machine-learning decoders [29, 30] and in matching with the Harmony decoder [13]. The Harmony decoder, developed by colleagues at the same institution, was an inspiration for how to identify improving cycles in a problem-independent way, as opposed to local search, which would introduce complexity that depends on problem structure. Both Harmony and Libra access an ensemble of decoders; Harmony makes fewer demands on the ensemble output (majority voting or selecting the global-minimum weight), whereas Libra can synthesize the best local solutions across the ensemble, provided the ensemble can produce an assignment of errors.

We remark that the matching synthesis method appears new to the author, but the procedure seems to apply to other optimization problems beyond decoding quantum codes. Hence this method, or something similar, may already appear in a different field under a different name. It was not found in a literature search, though such a search is limited by the author's knowledge of terminology in other fields. The use of a random ensemble and the combining of pieces of solutions resemble local search in simulated annealing [31, 32, 33] or "crossover" in genetic algorithms [34, 35, 36], but the similarity ends there; our method is neither local search nor a genetic algorithm. If a closer match to this method is found in prior work, the manuscript will be updated accordingly.

II Decoding from Minimum-Weight Hypergraph Perfect Matching

The minimum-weight hypergraph perfect matching (MWHPM) problem is a generalization of the more commonly encountered minimum-weight perfect matching problem [7]. However, whereas the latter can be solved optimally in polynomial time, MWHPM is an NP-hard problem [37]. Hence the best we can hope for in a new approach to this problem is to improve the trade-off between computational time and approximation to optimal accuracy. Let us first define the MWHPM problem and describe how this problem applies to decoding quantum codes.

Define a weighted hypergraph as $G=(V,E)$, where $V$ is a set of vertices and $E$ is a set of hyperedges. Each hyperedge $e\in E$ has some list of vertices $v(e)$, where the list size is any positive integer (contrast this with a graph, where the size must be two for every edge) and every vertex in the list is in $V$. Furthermore, each hyperedge $e$ has weight $w(e)$ that in general can be any real number, though we restrict our attention to $w(e)\geq 0$ in this work. Let us furthermore represent the sum of edge weights for a list of hyperedges $\mathbf{e}=\{e_i\}$ as

$$w(\mathbf{e}) = \sum_{e_i \in \mathbf{e}} w(e_i). \qquad (1)$$

In mapping to decoding stabilizer quantum codes [38], each vertex is a parity check, and each hyperedge corresponds to an error channel. The vertices $v(e)$ are the parity checks that flip when the error $e$ occurs. There are different approaches to selecting the hyperedge weights, but we will motivate a typical one. If errors are independent binary channels with probability $p(e)$ (i.e. a weighted coin flip for each error occurring or not), then setting $w(e)=\log[(1-p(e))/p(e)]$ creates an equivalence between minimizing a sum of hyperedge weights and maximizing posterior probability over error events consistent with the syndrome [9, 19, 20, 13], as described below. Because each hyperedge $e$ corresponds to a specific error occurring, we will interchangeably refer to $e$ as an error as well. Likewise, we will say that $\mathbf{e}$, a list of hyperedges, is also a configuration of errors.
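As a concrete illustration, the weight assignment above can be computed directly from the channel probability. The following minimal Python sketch (the function name is ours, not from the paper) shows that rarer errors receive larger weights, so minimizing total weight favors the most probable error configuration:

```python
import math

def edge_weight(p):
    """Log-likelihood weight w(e) = log[(1 - p)/p] for an independent
    binary error channel with probability p. Minimizing the sum of these
    weights is equivalent to maximizing posterior probability."""
    assert 0.0 < p < 1.0
    return math.log((1.0 - p) / p)

# A 1% error channel gets a large (costly) weight; a 40% channel a small one.
assert edge_weight(0.01) > edge_weight(0.40) > 0.0
# At p = 0.5 the channel is uninformative and the weight vanishes.
assert abs(edge_weight(0.5)) < 1e-12
```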

The MWHPM problem, generalized for our purposes, is defined as follows. For a weighted hypergraph, one is given a "syndrome" $S$ that is a list of vertices. In decoding quantum codes, the syndrome will be the parity checks that have flipped due to errors. Furthermore, let us define a syndrome function $s(e)=v(e)$ for a single hyperedge, and for a list of hyperedges $\mathbf{e}=\{e_i\}$

$$s(\mathbf{e}) = \bigoplus_{e_i \in \mathbf{e}} s(e_i), \qquad (2)$$

where $s(e_1)\oplus s(e_2)$ means the symmetric difference of the lists $s(e_1)$ and $s(e_2)$, or equivalently mod-2 parity. Hence, the MWHPM optimization problem is finding $\mathbf{e}$ that solves

$$\min w(\mathbf{e}) \;\; \mathrm{s.t.} \;\; s(\mathbf{e}) = S. \qquad (3)$$

Said in words, all configurations of hyperedges that satisfy $s(\mathbf{e})=S$ correspond to possible Pauli-error patterns consistent with the observed parity-check violations, and we seek the error pattern that minimizes the weight sum $w(\mathbf{e})$, because this event will be the most probable. For simplicity, we call any solution $\mathbf{e}$ that satisfies $s(\mathbf{e})=S$ a "matching for $S$".

The generalization to MWHPM used here is in the meaning of "perfect matching", implied by the $\oplus$ in Eqn. (2). Here we take perfect matching to mean that the parity of hyperedges incident to a given vertex $u$ must be odd for $u\in S$ and must be even for $u\notin S$. In the traditional formulation of perfect matching for graphs [7], there is no syndrome (or one could say the syndrome is implicitly all vertices), and the number of edges incident to every vertex must be exactly one, instead of any odd number.
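To make the definitions concrete, here is a minimal Python sketch (our own representation, not from the paper) that models each hyperedge as a frozenset of vertex ids and computes the syndrome function of Eqn. (2) by accumulating symmetric differences:

```python
def syndrome(edges):
    """Mod-2 parity of flipped parity checks: the symmetric difference of
    the vertex sets of all hyperedges in the configuration."""
    s = frozenset()
    for e in edges:
        s = s ^ e  # symmetric difference accumulates mod-2 parity
    return s

def is_matching(edges, S):
    """A configuration is a 'matching for S' when s(e) = S."""
    return syndrome(edges) == frozenset(S)

# A weight-3 hyperedge and a weight-2 edge sharing vertex 2:
# the shared vertex cancels out (even parity), the rest survive.
e1, e2 = frozenset({1, 2, 4}), frozenset({2, 3})
assert syndrome([e1, e2]) == frozenset({1, 3, 4})
assert is_matching([e1, e2], {1, 3, 4})
```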

To complete the connection between MWHPM and decoding for quantum codes, we additionally associate an "observable list" $o(e)$ with every hyperedge $e$. This does not appear in the optimization problem, but it does determine the result of decoding once a solution to the optimization problem has been chosen. For the quantum code, make some arbitrary choice of logical operators $\{O_i\}$ (which are Pauli strings in stabilizer codes) that propagate through the error correction circuit. For each error event, $o(e)$ is the list of operators in $\{O_i\}$ that anticommute with $e$ (i.e. error $e$ occurring would flip the observed value of $O_i$). For a set of hyperedges we can say that

$$o(\mathbf{e}) = \bigoplus_{e_i \in \mathbf{e}} o(e_i), \qquad (4)$$

where again $\oplus$ means symmetric difference. Equivalently, if we track observable lists as bitstrings with a 1 in position $i$ for each $O_i$, then the operation $\oplus$ means bitwise XOR.

These logical observables are important for two reasons. First, all error patterns with the same observable list define an equivalence class [9, 19]. Second, when decoding is employed in a fault-tolerant computation, the observable list for the chosen error configuration determines the outcome of logical-qubit measurements.

III Matching Synthesis

Matching synthesis is a procedure for taking two or more solutions to the MWHPM problem, described in the previous section, and producing a new solution that is in general better than any of the inputs (and at least as good as the best input). For two distinct MWHPM solutions, the procedure is:

  1. compute the symmetric difference (SD) of the assigned errors,

  2. separate the SD into pieces with null syndrome ("cycles"),

  3. determine the signed weight of each cycle, and

  4. form a new "synthetic matching" that incorporates only the cycles that lower the weight.

The rest of this section elaborates on the synthesis process, and later sections describe how this can be used in decoding quantum codes.

Following the nomenclature of Section II, suppose that for decoding a syndrome $S$, there are candidate matchings $\mathbf{e}$ and $\mathbf{f}$: $s(\mathbf{e})=s(\mathbf{f})=S$. Let $\mathbf{d}=\mathbf{e}\oplus\mathbf{f}$ be the symmetric difference of these two solutions; namely, cancel out the errors assigned in both configurations and keep the errors that appear in only one configuration. By linearity, $s(\mathbf{d})=s(\mathbf{e})\oplus s(\mathbf{f})=\emptyset$, the null syndrome.

Any set of hyperedges $\mathbf{c}$ with null syndrome has a property that is key to this work. Let us label any such $\mathbf{c}$ a "cycle", where we will refine the definition below after motivating its use. For any error configuration $\mathbf{e}$ and any cycle $\mathbf{c}$, $s(\mathbf{e}\oplus\mathbf{c})=s(\mathbf{e})$. Hence, given one decoding solution $s(\mathbf{e})=S$, we can use cycles to generate other solutions. If we are clever about which cycles to incorporate, then we can produce new solutions with lower weight, $w(\mathbf{e}\oplus\mathbf{c})<w(\mathbf{e})$, and improve our approximate solution to MWHPM.

Refer to caption
Figure 1: Example showing that two matchings ($\mathbf{A}$ and $\mathbf{B}$, top) for the same syndrome can be related by flipping two cycles (bottom), where the cycles are collectively the symmetric difference $\mathbf{A}\oplus\mathbf{B}$.

Let us be explicit about what we mean by "cycle" in our extended version of a hypergraph. For our purposes, a cycle is any hyperedge list $\mathbf{c}$ such that $s(\mathbf{c})=\emptyset$ and $o(\mathbf{c})=\emptyset$. Setting aside logical observables, this definition subsumes the traditional meaning of cycles on a graph. It also includes any configuration of hyperedges such that every vertex has an even number of incident hyperedges. The logical observables are furthermore necessary because they define equivalence classes, and we require that, given a cycle $\mathbf{c}$, $o(\mathbf{e})=o(\mathbf{e}\oplus\mathbf{c})$ for all $\mathbf{e}$, so that incorporating a cycle does not change the equivalence class of a given solution. An example using edges in a normal graph, which is easier to visualize than a hypergraph, is shown in Fig. 1.

We want to make the best (i.e. lowest weight) matching that we can for a given syndrome SS, and our approach will be to improve an initial solution 𝐞\mathbf{e} by combining with one or more cycles. To proceed, we define a “relative weight” function for cycles:

$$w(\mathbf{c}|\mathbf{e}) = w(\mathbf{c}\oplus\mathbf{e}) - w(\mathbf{e}). \qquad (5)$$

While it might seem like we are going in circles, this relative weight can be expressed as

$$w(\mathbf{c}|\mathbf{e}) = w(\mathbf{c}\setminus\mathbf{e}) - w(\mathbf{c}\cap\mathbf{e}), \qquad (6)$$

where $\mathbf{c}\setminus\mathbf{e}$ means "hyperedges in $\mathbf{c}$ and not in $\mathbf{e}$" and $\mathbf{c}\cap\mathbf{e}$ is the intersection, "hyperedges in both $\mathbf{c}$ and $\mathbf{e}$". The interpretation is straightforward: in going from $\mathbf{e}\rightarrow(\mathbf{e}\oplus\mathbf{c})$, we remove the hyperedges in $\mathbf{c}\cap\mathbf{e}$ and subtract their summed weight, and we add the hyperedges in $\mathbf{c}\setminus\mathbf{e}$ and add their weight. The formulation in Eqn. (6) shows that we can compute the relative weight of a cycle with complexity that scales with the cycle size, not the size of the matching as one might think from Eqn. (5). The relative weight is useful because it tells us how much a matching solution improves by combining it with a cycle.
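The equivalence of Eqns. (5) and (6) is easy to verify numerically. In this small Python sketch (sets of hyperedge labels with a hypothetical weight table of our own), the local computation touches only the cycle's edges yet agrees with the full recomputation:

```python
def rel_weight_full(c, e, w):
    """Eqn. (5): recompute the weight of both full matchings.
    c and e are sets of hyperedge labels; w maps label -> weight."""
    return sum(w[x] for x in c ^ e) - sum(w[x] for x in e)

def rel_weight_local(c, e, w):
    """Eqn. (6): touch only the cycle's own edges,
    so the cost scales with the cycle size, not the matching size."""
    return sum(w[x] for x in c - e) - sum(w[x] for x in c & e)

w = {"a": 1.0, "b": 2.0, "c": 0.5}
e, c = {"a", "b"}, {"b", "c"}   # flipping c swaps edge "b" for edge "c"
assert rel_weight_full(c, e, w) == rel_weight_local(c, e, w) == -1.5
```

The negative result means the flipped matching has lower weight, i.e. $\mathbf{c}$ is an improving cycle for $\mathbf{e}$.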

Refer to caption
Figure 2: Continuing from the example in Fig. 1, suppose that just one of the cycles $\mathbf{c}$ from the lower-right of Fig. 1 has negative relative weight. We can produce a new "synthetic" matching $\mathbf{D}=\mathbf{A}\oplus\mathbf{c}$ that has lower weight than either of the source matchings $\mathbf{A}$ or $\mathbf{B}$.

So we want to find cycles with negative relative weight, $w(\mathbf{c}|\mathbf{e})<0$, because this would mean $w(\mathbf{c}\oplus\mathbf{e})<w(\mathbf{e})$. If we take two matchings $\mathbf{e}$ and $\mathbf{f}$, compute the symmetric difference $\mathbf{d}=\mathbf{e}\oplus\mathbf{f}$, and compute $w(\mathbf{d}|\mathbf{e})$, we are left with the uninspiring result that we are just picking the lower-weight matching of the two. Now we reach the core result of matching synthesis. Instead of computing the weight of the entire symmetric difference, we break $\mathbf{d}$ into smaller cycles where possible,

$$\mathbf{d} = \bigoplus_i \mathbf{d}_i \;\; \mathrm{s.t.} \;\; s(\mathbf{d}) = s(\mathbf{d}_i) = \emptyset, \qquad (7)$$

where each $\mathbf{d}_i$ is a subset of $\mathbf{d}$. We then compute $w(\mathbf{d}_i|\mathbf{e})$ for each subset, and filter out the ones that have $o(\mathbf{d}_i)\neq\emptyset$ or $w(\mathbf{d}_i|\mathbf{e})\geq 0$. What remains are cycles $\{\mathbf{c}_i\}$ of negative relative weight. We can produce a new matching

$$\mathbf{g} = \mathbf{e} \oplus \left( \bigoplus_{\mathbf{c}\in\mathcal{N}} \mathbf{c} \right), \qquad (8)$$

where the set of negative-relative-weight cycles is $\mathcal{N}=\{\mathbf{c} : \mathbf{c}\in\mathbf{d},\, w(\mathbf{c}|\mathbf{e})<0\}$. Matching $\mathbf{g}$ is derived from $\mathbf{e}$ and $\mathbf{f}$, and it will equal $\mathbf{e}$ when $\mathcal{N}=\emptyset$ and equal $\mathbf{f}$ when $\mathcal{N}$ contains all of the cycles in $\mathbf{d}$. However, something interesting happens when $\mathcal{N}$ contains some, but not all, of the cycles in $\mathbf{d}$: in this case, $\mathbf{g}$ is an entirely new "synthetic" matching, and it has lower weight than either $\mathbf{e}$ or $\mathbf{f}$. This is depicted with a visual example in Fig. 2.
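The formation of the synthetic matching in Eqn. (8) can be sketched as follows. This is a simplified Python illustration (names and data layout are ours): it takes the null-syndrome pieces $\mathbf{d}_i$ as input, skips pieces with a nontrivial observable, and flips in exactly the pieces with negative relative weight:

```python
def synthetic_matching(e, pieces, w, observable):
    """Eqn. (8): starting from matching e, incorporate each null-syndrome
    piece that is a true cycle (null observable) and lowers the weight.
    e and each piece are sets of hyperedge labels; w maps label -> weight;
    observable(piece) returns the piece's observable bitmask (0 = trivial)."""
    g = set(e)
    for d_i in pieces:
        if observable(d_i):           # logical operator, not a cycle: skip
            continue
        # relative weight, Eqn. (6): edges added minus edges removed
        rel = sum(w[x] for x in d_i - g) - sum(w[x] for x in d_i & g)
        if rel < 0:                   # improving cycle: flip it in
            g ^= d_i
    return g

w = {"a": 2.0, "b": 1.0, "c": 3.0, "d": 1.0}
e = {"a", "c"}                        # weight 5.0
pieces = [{"a", "b"}, {"c", "d"}]     # both cycles improve: swap a->b, c->d
g = synthetic_matching(e, pieces, w, lambda p: 0)
assert g == {"b", "d"}                # weight 2.0 < 5.0
```

Because the pieces are non-overlapping connected components, the relative weight of each piece can equivalently be evaluated against $\mathbf{e}$ or against the running solution, as done here.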

Splitting a given $\mathbf{d}$ with $s(\mathbf{d})=\emptyset$ into pieces $\{\mathbf{d}_i\}$ with $s(\mathbf{d}_i)=\emptyset$ can be done in more than one way. The method we employ here is simply to break $\mathbf{d}$ into connected components, where two hyperedges are connected when they touch a common vertex. This can be computed efficiently using standard techniques in graph search. It results in subsets $\mathbf{d}_i$ that satisfy Eqn. (7) by being non-overlapping; note however that Eqn. (7) does not require $\{\mathbf{d}_i\}$ to be non-overlapping. Using connected components does not guarantee the pieces are as small as possible; for example, if two cycles touch at one mutual vertex, they will form a single connected component. We find that isolating cycles by connected components works well for decoding surface codes, as shown in Section V, and we leave the more general problem of cycle decomposition, which can be done with cycle-finding algorithms, to future work.
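The decomposition of the symmetric difference into connected components can be done with an ordinary graph search. This Python sketch (our own helper, with hyperedges again represented as frozensets of vertex ids) groups hyperedges that share a vertex:

```python
from collections import defaultdict

def connected_components(edges):
    """Group hyperedges into connected components, where two hyperedges
    are connected when they touch a common vertex (depth-first search)."""
    by_vertex = defaultdict(list)     # vertex -> incident hyperedges
    for e in edges:
        for v in e:
            by_vertex[v].append(e)
    seen, components = set(), []
    for e in edges:
        if e in seen:
            continue
        seen.add(e)
        stack, comp = [e], set()
        while stack:
            cur = stack.pop()
            comp.add(cur)
            for v in cur:             # walk to neighbors via shared vertices
                for nb in by_vertex[v]:
                    if nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        components.append(comp)
    return components

# Edges {1,2} and {2,3} share vertex 2; {5,6} is isolated.
edges = [frozenset({1, 2}), frozenset({2, 3}), frozenset({5, 6})]
assert sorted(len(c) for c in connected_components(edges)) == [1, 2]
```

The runtime is linear in the total number of edge-vertex incidences, consistent with the efficiency claim above.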

There is one final step to close the loop on matching synthesis, and that is addressing the observable lists associated with hyperedges. When we separate a symmetric difference $\mathbf{d}$ into pieces according to Eqn. (7), such as by connected components, sometimes pieces in $\{\mathbf{d}_i\}$ will be logical operators. Using our previous definitions, any $\mathbf{d}_i$ with $s(\mathbf{d}_i)=\emptyset$ and $o(\mathbf{d}_i)\neq\emptyset$ is a logical operator. Although they are not cycles, logical operators will be useful in two ways for improving our decoding solutions. First, using one solution $\mathbf{e}$, we can generate another solution in a different equivalence class via $\mathbf{e}\oplus\mathbf{d}_i$, because $o(\mathbf{e}\oplus\mathbf{d}_i)=o(\mathbf{e})\oplus o(\mathbf{d}_i)$. Second, for two pieces $\mathbf{d}_i$ and $\mathbf{d}_j$ with $s(\mathbf{d}_i)=s(\mathbf{d}_j)=\emptyset$ and $o(\mathbf{d}_i)=o(\mathbf{d}_j)$, we can make a cycle $\mathbf{d}_i\oplus\mathbf{d}_j$, because $o(\mathbf{d}_i\oplus\mathbf{d}_j)=o(\mathbf{d}_i)\oplus o(\mathbf{d}_j)=\emptyset$. In some cases, adding cycles from a pair of equivalent logical operators, which may not be found by the connected-components method, can yield an improving cycle.

While matching synthesis improves a solution to the MWHPM problem by incorporating cycles of negative relative weight, what can we make of cycles with positive relative weight? One application, which we will show can improve decoding for surface codes, is to record cycles with small positive (or zero) relative weight. Each such cycle can generate a new matching, and if the relative weight is small, this will be another approximate solution, with weight that is not the best known but close to it. Of course, if there are cycles with relative weight zero, they produce alternative solutions with the smallest weight found. If these small-weight cycles are non-overlapping, they generate many good solutions to the MWHPM problem: $n$ non-overlapping cycles encode $2^n$ distinct solutions. We show later that the Libra decoder can efficiently compute the sum of probability over all $2^n$ configurations in time linear in $n$, and also describe how it handles cases where these small-positive cycles overlap. This is reminiscent of multi-path summation [16] or the way Markov-chain [9], tensor-network [19, 20, 13], and belief-propagation [17, 15, 18] decoders account for multiple error configurations in an equivalence class.

IV Libra decoder

We apply matching synthesis to make a new decoder that we call Libra. Libra works by using an ensemble of decoders to generate multiple distinct matchings and synthesizing better matchings from them. Libra iteratively improves two or more matchings, one for each equivalence class represented. For example, in the simulations of Section V, there are two equivalence classes for a logical memory experiment (e.g. in an $X$-basis memory experiment, $X$ logical operators act trivially, so there are two equivalence classes and not four).

Refer to caption
Figure 3: Architecture of the Libra decoder in this work. The diagram depicts five decoders $D1$ through $D5$ in the ensemble, but the ensemble size can be arbitrary. Each $\oplus$ operation in this diagram stands for matching synthesis (Sec. III) between the two matchings indicated by the arrows pointing into the $\oplus$. The arrow coming out of the $\oplus$ indicates the result of incorporating the cycles of negative relative weight (relative to the matching from the horizontal arrow, which originates with the complementary matching step). As described in the text, when recording small-positive cycles, the sequence of synthesis stages in Step 3 is performed twice. In the final step, a comparison is made between two matchings after incorporating improving cycles from the ensemble.

The architecture of Libra as implemented in this work is shown in Fig. 3. To initialize a representative for each equivalence class, we first perform complementary matching [9, 39, 40, 41, 42], using correlated MWPM on the unperturbed error hypergraph. Correlated MWPM returns two "edge" matchings for the $X$ and $Z$ graphs, which can be converted to the most-probable assignment of hyperedges within the original hypergraph that contain these matched $X$ and $Z$ edges, using the assignment method in Ref. [13]. There are other ways of producing representatives, such as relying on an ensemble to stochastically generate matchings in more than one equivalence class [13]. However, complementary matching also provides the "complementary gap", the difference in weight between the two matchings found, which is very predictive of the probability of logical error for a given syndrome [39]. We use the gap to invoke the ensemble only on a small fraction of "hard" cases (where the gap is small), which happen sufficiently rarely (and with likelihood that decreases exponentially with code distance when below threshold) that the average runtime of Libra is dictated by complementary matching, and the cost of running an ensemble of 100 decoders becomes insignificant to the average runtime. Another option would be to use a small ensemble first, invoking the larger ensemble only if there are at least two equivalence classes among the matchings from the first ensemble [13].

There are many ways to choose an ensemble, but we opt for correlated minimum-weight perfect matching (MWPM) [10, 11, 12, 13] with randomly perturbed hyperedge weights. This is simple to implement (via Stim [43], one can edit DEM files) and is similar to the ensemble in Harmony [13], though not the same. The perturbation multiplies the probability $p_i \rightarrow p_i'$ for each error channel by a log-normally distributed random variable: $p_i' = p_i r_i$, where $r_i \sim \mathrm{Lognorm}(0,\sigma^2)$ is equivalent to $r_i = \exp(t_i)$ for $t_i \sim \mathrm{Norm}(0,\sigma^2)$. For example, if $\sigma = \ln 2$, then the perturbations will have a standard deviation of a factor of 2, normally distributed in the logarithm of probability. This ensures that the perturbed values remain valid probabilities for the small channel probabilities of interest, and that zero-probability events (if present) remain zero. Furthermore, since hyperedge weights are typically the logarithm of probability, this is effectively adding a normally distributed perturbation to hyperedge weights. To be explicit, we only use these randomly perturbed hypergraph weights to induce an ensemble of correlated-MWPM decoders that give different matching solutions; when we evaluate the weight of a matching or cycle (such as in Eqn. (6)), we use the unperturbed weights.
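The log-normal perturbation is straightforward to sketch. In this Python illustration (function and parameter names are ours), each channel probability is multiplied by the exponential of a Gaussian draw, so zero-probability channels stay exactly zero:

```python
import math
import random

def perturb_probabilities(probs, sigma=math.log(2), seed=0):
    """Multiply each error probability by a log-normal random factor:
    p_i' = p_i * exp(t_i) with t_i ~ Norm(0, sigma^2).
    sigma = ln 2 gives a factor-of-2 standard deviation in log-probability."""
    rng = random.Random(seed)
    return [p * math.exp(rng.gauss(0.0, sigma)) for p in probs]

probs = [1e-3, 5e-3, 0.0]
perturbed = perturb_probabilities(probs)
assert perturbed[2] == 0.0                   # zero-probability events stay zero
assert all(q > 0.0 for q in perturbed[:2])   # nonzero channels stay positive
```

In Libra, each ensemble member would receive a freshly perturbed copy of the error model (e.g. an edited DEM file), while the unperturbed model is kept for evaluating weights.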

For the comparison step in Fig. 3, one could compare the weights of the best solution for each equivalence class (e.g. two for a logical memory experiment) after matching synthesis across the ensemble. One can also use small-positive cycles found during synthesis to generate multiple configurations in each equivalence class, sum the probabilities for each equivalence class, and compare the probabilities for the equivalence classes, in a manner described below. Libra is configured to do both (the first is necessary, and the second comes at negligible additional cost), and Section V reports the results for decoding the surface code.

We describe here the algorithm currently employed for using the small-positive cycles to estimate probabilities for equivalence classes, though many variations are possible. As cycles with nonnegative relative weight are discovered, they are stored in a max-heap of fixed size (in the simulations of Section V, the size is 30). When an improving cycle (one with negative relative weight) is found, the heap is cleared, because the stored positive cycles were relative to a hyperedge configuration that has changed with the improving cycle. Alternatively, one could iterate through the cycles in the heap and modify or discard only those that overlap the improving cycle. Because clearing the heap could happen in any synthesis step, Libra currently performs the synthesis procedure twice; negative cycles are rarely discovered in the second pass, which mostly functions to discover small-positive cycles.

After all matching-synthesis operations are performed, we have a "best matching" for each of the equivalence classes being considered by Libra (e.g. two for the memory experiment). We also have the small-positive cycles that were discovered, from which we can generate many other configurations. For example, if we store 30 such cycles, we could generate as many as $2^{30} \approx 10^9$ configurations. The number could be less than $2^{30}$ if the stored cycles are not linearly independent (e.g. if the combination of two cycles is another stored cycle). Computing all such configurations generated by these cycles would be impractical. However, at least for surface codes, they tend to be small cycles from disparate parts of the hypergraph, so instead we split the cycles into connected components. For this step, two cycles are connected if they overlap in at least one edge. For each connected component of cycles $\{\mathbf{c}_i\}$, we compute a relative probability:

$$p_r(\{\mathbf{c}_i\}) = \sum_{\mathbf{c} \in \mathcal{G}(\{\mathbf{c}_i\})} p_r(\mathbf{c}), \qquad (9)$$

where $\mathcal{G}(\{\mathbf{c}_i\})$ is the set of all unique cycles generated by $\{\mathbf{c}_i\}$ and the relative probability for a single cycle $\mathbf{c}$ is

$$p_r(\mathbf{c}) = p(\mathbf{e} \oplus \mathbf{c}) / p(\mathbf{e}), \qquad (10)$$

i.e. the ratio of the probability of the configuration using the cycle to that of the "base" synthetic matching $\mathbf{e}$. We emphasize relative, because the "null" cycle $\emptyset$ is always an element of any $\mathcal{G}$, corresponding to the synthetic matching itself, and it has a relative probability of 1. Since weight is essentially the negative logarithm of probability, relative probability can be computed directly from the relative weight of the cycle, which is much faster than a computation over the entire matching. Finally, two implementation notes. First, as we generate $\mathcal{G}(\{\mathbf{c}_i\})$, the generating set from the connected component $\{\mathbf{c}_i\}$ may not be linearly independent. To guard against this, we use a hash table to store the elements of $\mathcal{G}(\{\mathbf{c}_i\})$ as they are produced, and skip redundant entries. An alternative would be to compute an independent vector basis, such as by Gaussian elimination over binary vectors. Second, it is possible that $\mathcal{G}(\{\mathbf{c}_i\})$ is very large, because its size is exponential in the size of $\{\mathbf{c}_i\}$. For speed, we truncate the sum by only considering a low-order approximation to the exponentially large number of combinations (e.g. use $\{\mathbf{c}_i\}$ itself as the set to sum over, or all combinations of two elements of $\{\mathbf{c}_i\}$). In the simulations, we make this approximation if $|\{\mathbf{c}_i\}| \geq 10$ cycles. The estimated probability of the equivalence class is then

$$p(\mathrm{equivalence\ class}) = p(\mathbf{e}) \prod_{k} p_r^{(k)}, \qquad (11)$$

where $p_r^{(k)}$ is the relative probability for the $k^{\mathrm{th}}$ connected component of the small-positive cycles.
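A first-order version of Eqns. (9) through (11) can be sketched as follows. This Python illustration (our own simplification) converts each stored cycle's relative weight to a relative probability via $p_r(\mathbf{c}) = \exp(-w(\mathbf{c}|\mathbf{e}))$, truncating the exponential set of combinations to single cycles as described above:

```python
import math

def component_relative_probability(rel_weights):
    """Eqn. (9), truncated to first order: the null cycle contributes 1,
    and each stored cycle contributes p_r(c) = exp(-w(c|e)), using the
    correspondence between relative weight and relative probability."""
    return 1.0 + sum(math.exp(-w) for w in rel_weights)

def class_probability(p_base, components):
    """Eqn. (11): base matching probability times the product of
    per-component relative probabilities. `components` is a list of
    lists of relative weights, one list per connected component."""
    prob = p_base
    for rel_weights in components:
        prob *= component_relative_probability(rel_weights)
    return prob

# One component holding a single zero-relative-weight cycle doubles the
# estimated class probability (two equally probable configurations).
assert class_probability(0.5, [[0.0]]) == 1.0
```

The decision between equivalence classes then compares these estimates rather than only the single best weight per class.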

Refer to caption
Figure 4: Matching synthesis can be performed by a logarithmic-depth binary tree of synthesis steps.

Another generalization of the procedure is the freedom in how to synthesize members of the ensemble. Figure 3 illustrates a sequential procedure where the current synthesized matching updates iteratively with each ensemble member. Alternatively, one could synthesize matchings from the ensemble in a logarithmic-depth binary tree, as shown in Fig. 4, which could have advantages for parallelization of the algorithm. Since synthesis can be performed between any two matchings, any binary tree with a number of leaves equal to the ensemble size can describe a synthesis sequence.
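The tree reduction of Fig. 4 can be sketched generically. In this Python illustration (the pairwise `synthesize` argument is a stand-in for the matching synthesis of Sec. III), the ensemble is combined in a logarithmic number of layers, each of which could run its pairwise steps in parallel:

```python
def tree_synthesize(matchings, synthesize):
    """Combine an ensemble of matchings pairwise in a logarithmic-depth
    binary tree; `synthesize` is any two-input synthesis function."""
    layer = list(matchings)
    while len(layer) > 1:
        nxt = []
        for i in range(0, len(layer) - 1, 2):
            nxt.append(synthesize(layer[i], layer[i + 1]))
        if len(layer) % 2:            # odd element passes through to next layer
            nxt.append(layer[-1])
        layer = nxt
    return layer[0]

# With min() standing in for synthesis, the tree returns the global best;
# true matching synthesis can do better than any single input.
assert tree_synthesize([5, 3, 8, 1, 7], min) == 1
```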

V The surface code with Pauli errors in the circuit model

For our simulations, we simulate a surface code memory experiment, which consists of initializing in a basis ($X$ or $Z$) by direct initialization of data qubits, repeatedly measuring the stabilizers with a circuit of Clifford gates for some number of cycles, and then measuring the logical qubit in the same basis as initialization, by direct measurement of the data qubits. Noise in the simulation consists of a depolarizing channel after every operation, with a 1-qubit channel for 1-qubit gates (and reset) or a 2-qubit channel for the 2-qubit gate, which is controlled-Z (CZ) in our circuits. Measurement error is modeled as a binary symmetric channel on the classical bit (i.e. flip the classical bit with some probability). The depolarizing channels and measurement error have different probabilities according to the SI-1000 model [44], which is representative of error rates in superconducting qubits [45, 23]. The weighting also captures features seen in other qubit platforms [46, 47], such as 1-qubit gates being more reliable than 2-qubit gates, and measurement being less reliable than coherent operations. In this model, the channel probabilities are determined only by the operation, and there is no inhomogeneity in space or time; e.g. all CZ gates have the same depolarizing probability.

Refer to caption
Figure 5: Relative accuracy of the decoders at $d=3$. Improvement ratio is the ratio of the given decoder's logical error rate to that of correlated MWPM. For the decoders, global ensembling refers to selecting the single best ensemble member, Libra refers to using matching synthesis, and "Libra degen" refers to estimating degeneracy with small-positive cycles.
Refer to caption
Figure 6: Relative accuracy of the decoders at $d=5$. As before, improvement ratio is the ratio of the given decoder's logical error rate to that of correlated MWPM. For the decoders, global ensembling refers to selecting the single best ensemble member, Libra refers to using matching synthesis, and "Libra degen" refers to estimating degeneracy with small-positive cycles.
Refer to caption
Figure 7: Relative accuracy of the decoders at $d=7$. As before, improvement ratio is the ratio of the given decoder's logical error rate to that of correlated MWPM. For the decoders, global ensembling refers to selecting the single best ensemble member, Libra refers to using matching synthesis, and "Libra degen" refers to estimating degeneracy with small-positive cycles.
Refer to caption
Figure 8: Relative accuracy of the decoders at $d=9$. As before, improvement ratio is the ratio of the given decoder's logical error rate to that of correlated MWPM. For the decoders, global ensembling refers to selecting the single best ensemble member, Libra refers to using matching synthesis, and "Libra degen" refers to estimating degeneracy with small-positive cycles.
Refer to caption
Figure 9: Relative accuracy of the decoders at $d=11$. As before, improvement ratio is the ratio of the given decoder's logical error rate to that of correlated MWPM. For the decoders, global ensembling refers to selecting the single best ensemble member, Libra refers to using matching synthesis, and "Libra degen" refers to estimating degeneracy with small-positive cycles.

The simulations here use a simple ensemble for Libra. As described in Section IV, each ensemble member is correlated MWPM [10, 11, 12, 13] configured with the hypergraph for the circuit (i.e. the DEM file [43]), but with the error probabilities randomly perturbed by multiplicative factors drawn from Lognorm(0, σ²). Half of the members have σ = ln 2 and the other half σ = ln 4. In testing, we found such an inhomogeneous ensemble to lead to a small improvement over a homogeneous ensemble, which may be due to different values of σ enabling the ensemble to discover a greater variety of cycles. Future work will explore alternative ways to construct an ensemble. Libra also includes a complementary matcher (Fig. 3) with unperturbed hypergraph, as described in Section IV.
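A sketch of this ensemble construction, under the assumption that each member is described by its list of hyperedge error probabilities (function and variable names are illustrative, not the paper's code):

```python
import math
import random

def perturb_probabilities(error_probs, sigma, rng):
    """Multiply each error probability by an independent Lognorm(0, sigma^2)
    factor, capping below 1/2 so every edge weight stays positive."""
    return [min(p * rng.lognormvariate(0.0, sigma), 0.5 - 1e-9)
            for p in error_probs]

def build_ensemble(error_probs, size, seed=0):
    """Half the members perturbed with sigma = ln 2, half with sigma = ln 4."""
    rng = random.Random(seed)
    members = []
    for i in range(size):
        sigma = math.log(2) if i < size // 2 else math.log(4)
        members.append(perturb_probabilities(error_probs, sigma, rng))
    return members

ensemble = build_ensemble([1e-3, 2e-3, 5e-4], size=4)
```

Each member then runs correlated MWPM on the same syndrome with its own perturbed weights.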

The simulations are of a surface-code memory experiment, in the logical Z basis, using CZ and Hadamard gates, with the SI-1000 depolarizing model at parameter p = 2×10⁻³. The parameters swept in the simulations are code distance (d), syndrome rounds (r), and ensemble size (s):

  • d ∈ {3, 5, 7, 9, 11},

  • r ∈ {10, 20, 30, 40, 50, 100, 150, 200, 250},

  • s ∈ {20, 40, 60, 80, 100}.

For every combination of (d, r, s) we perform correlated MWPM as a baseline, and complementary matching to determine a complementary gap and initialize two equivalence classes in Libra. If the complementary gap is less than 20 dB (i.e. the ratio of probability between the two matchings is < 100), then we run the ensemble; this conditional execution [48, 13] significantly reduces computation time, since the fraction of problems where the ensemble is run scales with the logical error rate [39], decreasing with distance. With the ensemble, we produce three decoder predictions:

  • “global” ensembling, which uses the minimum-total-weight error configuration found by any single ensemble member [13];

  • Libra using matching synthesis to make improved configurations;

  • Libra “degeneracy” using small-positive cycles to estimate the probability of equivalence classes.

Each of these three ensembled methods uses exactly the same ensemble (described above), so they demonstrate the effects of matching synthesis and of using small-positive cycles. However, it is important to note that different ensembles (e.g. distributions of weight perturbations for matching) may be better suited to each of the ensembling methods listed above. The size of the heap for smallest-relative-weight cycles is 30 in all cases. When the ensemble is not run because the complementary gap is large, all decoders default to the prediction of correlated MWPM.
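The conditional execution described above can be sketched as follows; `run_ensemble` and `baseline_prediction` are hypothetical stand-ins for the actual decoder calls:

```python
import math

GAP_THRESHOLD_DB = 20.0  # a probability ratio of 100 between the two classes

def complementary_gap_db(p_best, p_complement):
    """Complementary gap in dB between the two matching hypotheses."""
    return 10.0 * math.log10(p_best / p_complement)

def decode(p_best, p_complement, baseline_prediction, run_ensemble):
    """Run the (expensive) ensemble only on ambiguous shots."""
    if complementary_gap_db(p_best, p_complement) < GAP_THRESHOLD_DB:
        return run_ensemble()       # ambiguous: spend the extra computation
    return baseline_prediction      # confident: keep the correlated-MWPM answer
```

Because ambiguous shots become rarer as distance grows, the average cost per shot approaches that of the baseline decoder.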

We use correlated MWPM as the baseline decoder, which yields an error-suppression factor [23] of Λ ≈ 3.6 for SI-1000 (p = 2×10⁻³). We quantify accuracy as the ratio of improvement relative to the correlated-MWPM baseline:

improvement ratio(decoder) = εMWPM / εdecoder.  (12)

We calculate the logical error rate for a memory experiment of r rounds as

ε = (1/2)·(1 − (1 − 2nf/N)^(1/r)),  (13)

where N is the total number of samples and nf is the number of simulated failures. For all values of (d, r, s), we simulate to at least 1000 failures for every decoder. For a fixed (d, r) combination, all decoders for all values of s see the same sampled shots.
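Eq. (13) can be implemented directly; for example:

```python
def per_round_error_rate(n_failures, n_samples, rounds):
    """Eq. (13): per-round logical error rate from nf failures in N shots
    of an r-round memory experiment."""
    return 0.5 * (1.0 - (1.0 - 2.0 * n_failures / n_samples) ** (1.0 / rounds))

# With rounds = 1 this reduces to the raw failure fraction nf/N; for larger
# r it inverts the depolarizing-like accumulation of logical flips per round.
```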

Figure 10: Relative accuracy of the decoders as a function of distance for r = 10. Improvement ratio and decoder labels are as in Fig. 5.
Figure 11: Relative accuracy of the decoders as a function of distance for r = 30. Improvement ratio and decoder labels are as in Fig. 5.
Figure 12: Relative accuracy of the decoders as a function of distance for r = 100. Improvement ratio and decoder labels are as in Fig. 5.
Decoder         Λ3,11   Λ7,11   Λ9,11
MWPM-corr        3.68    3.64    3.52
global ens.
  ens=20         3.70    3.61    3.45
  ens=40         3.75    3.64    3.43
  ens=60         3.76    3.62    3.37
  ens=80         3.78    3.65    3.41
  ens=100        3.80    3.67    3.44
Libra
  ens=20         3.95    3.97    3.78
  ens=40         4.00    4.03    3.85
  ens=60         4.01    4.03    3.84
  ens=80         4.02    4.04    3.86
  ens=100        4.01    4.02    3.83
Libra degen.
  ens=20         3.99    4.01    3.84
  ens=40         4.05    4.10    3.96
  ens=60         4.03    4.04    3.83
  ens=80         4.04    4.06    3.85
  ens=100        4.05    4.05    3.85
Table 1: Values for the error-suppression ratio Λ calculated at r = 30 for various decoders. The ensemble size is reported as “ens=20”, etc. This table computes Λ in three ways, which yield slightly differing results: for logical error rate εd at code distance d, Λ3,11 = (ε3/ε11)^(1/4), Λ7,11 = (ε7/ε11)^(1/2), and Λ9,11 = ε9/ε11.
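The Λ definitions in the caption of Table 1 can be computed as below; the rates used here are illustrative round numbers, not the paper's data:

```python
def lam(eps, d1, d2):
    """Error-suppression ratio between distances d1 < d2: the factor per
    step of 2 in code distance, (eps[d1]/eps[d2]) ** (1 / ((d2 - d1) / 2))."""
    return (eps[d1] / eps[d2]) ** (2.0 / (d2 - d1))

# Illustrative per-round rates falling by exactly 4x per distance step:
eps = {3: 1e-3, 7: 6.25e-5, 9: 1.5625e-5, 11: 3.90625e-6}
# lam(eps, 3, 11), lam(eps, 7, 11), and lam(eps, 9, 11) all equal 4.0 here;
# on real data the three estimates differ slightly, as in Table 1.
```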

We compare performance of the ensembled decoders at each d, showing improvement over correlated MWPM as a function of r. In Figs. 5–8, we see two effects. First, Libra maintains performance (or improves slightly) with the number of rounds, while global ensembling degrades; the onset of the decrease becomes earlier with increasing d, showing that both d and r affect the performance of global ensembling. Second, including degeneracy by using small-positive cycles to compute probabilities for equivalence classes shows an additional modest benefit over using the best single matching.

We can look at the same data organized differently, grouping by fixed r value and plotting improvement as a function of distance. This is plotted for r ∈ {10, 30, 100} in Figures 10–12. We see that Libra shows an increasing improvement ratio with distance in all cases. For global ensembling, there is improvement with d for r = 10, some slowing of improvement for r = 30, and a decrease with distance for r = 100. Section VI proposes an explanation for this. If we take r = 30 as an example, both Libra and Libra-degen. appear to saturate in performance around ensemble size s = 60.

Figures 10–12 show that the improvement ratio increases with d in both versions of Libra. The rate of improvement can be quantified with Λ, and values for the different decoders are given in Table 1 for r = 30. Taking ensemble size s = 100 as an example, we find that Λ7,11 increases from 3.64 to 4.02 (+10%) for Libra, or to 4.05 (+11%) when including “degeneracy” from small-positive cycles.

However, we remark that Λ is not the only figure of merit: being a ratio of logical error rates, it increases if the numerator (a smaller code) has a higher logical error rate. For example, in Table 1 we see that at s = 80, Libra has Λ9,11 = 3.86 and Libra-degen. has Λ9,11 = 3.84 (as a reminder, these decode exactly the same samples of a memory experiment). Conversely, from Fig. 11, we see that Libra-degen. achieves a higher improvement ratio (hence lower logical error rate) for d = 9 and d = 11.

Overall, we see that Libra improves Λ by about 10% for the range of parameters studied here, but generalizing to higher distance or different error rates is a matter for future study.

VI Interpretation of Libra

In this section, we conjecture on the mechanism that enables Libra to work well for surface-code decoding. The key concept in Libra is identifying improving cycles. How well this works will depend on the structure of the weighted hypergraph and the nature of the ensemble. In some informal sense, the ensemble needs some “entropy” (yielding a diversity of solutions) but not too much (for a local neighborhood of the hypergraph, one of the ensemble members needs to find a good local solution).

In our simulations with the surface code (Section V), we found that Libra improved accuracy over correlated MWPM [10, 11, 12, 13]. We attribute this to the 3D locality of the hypergraph and the fact that the decoders in the ensemble produce a diversity of reasonably good solutions. The surface code is a topological code [8, 1] and the error model is local to each operation, so the hypergraph is local in three dimensions (every hyperedge is contained within a ball of some finite radius, independent of the code distance). Moreover, correlated MWPM is a “pretty good” decoder for the surface code, in the sense that high-accuracy decoders have demonstrated only modestly better performance [23, 30, 13], meaning Λ increases by about 10%. Hence the members of the ensemble tend to produce solutions that are good in a local neighborhood. We speculate that Libra will show benefit with other topological codes, such as color codes [26, 49], provided suitably “good” approximate decoders are available for the ensemble. Similarly, we expect Libra to generalize naturally to surface-code logical operations [50, 4, 51, 52], since these also have the properties of the error hypergraph being 3D-local and being amenable to matching.

We give some intuition for how Libra improves decoding accuracy, though we note that this picture will be studied more carefully in future work. When operating below the threshold for error correction, which is the domain of interest for decoding, errors tend to be sparsely distributed [6, 53]. In matching synthesis, this leads to small, local cycles.

Figure 13: Matching synthesis, and the Libra decoder, filter cycles by relative weight. Each gray box is a matching solution from the ensemble, and the smaller dashed or solid-blue boxes represent cycles of positive or negative relative weight, referenced to a baseline solution (in the text, we use correlated MWPM as a baseline). The synthesized matching only incorporates the improving cycles.

As a frame of reference, let us use the matching result produced by correlated MWPM as the baseline configuration of errors; we can then represent any other configuration as the collection of cycles and logical operators produced by the symmetric difference between the two. We observe that with randomly perturbed error probabilities, the cycles discovered in matching synthesis have an average weight that is positive. This means that the majority are not improving cycles. Hence, if we were to take whole matchings from the ensemble and only consider their total weight [13], then we find a better solution than the baseline only when an ensemble member gets lucky enough to sample enough improving cycles to land in the tail of the distribution. In contrast, Libra is able to select the improving cycles individually, as shown in Fig. 13.
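A toy illustration of this contrast, under the simplifying assumption that cycles from different members can be aligned by local neighborhood (which the real algorithm determines from the hypergraph); the numbers are illustrative only:

```python
def global_ensembling(members):
    """Best total relative weight achieved by any single whole member."""
    return min(sum(cycles) for cycles in members)

def matching_synthesis(members):
    """Keep each improving (negative-weight) cycle individually: for every
    neighborhood, take the best member's cycle if it improves, else keep
    the baseline (relative weight 0)."""
    return sum(min(0.0, min(col)) for col in zip(*members))

members = [
    [-1.0, +2.0, +0.5],   # member 0 improves neighborhood 0 only
    [+0.5, -0.5, +1.0],   # member 1 improves neighborhood 1 only
]
# global_ensembling(members) = 1.0: no whole member beats the baseline.
# matching_synthesis(members) = -1.5: both improving cycles are kept.
```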

Figure 14: Conceptual explanation for how ensembling with the total weight of an error configuration converges to the average performance of an ensemble member as the size of the error hypergraph increases from (a) to (b), with a distribution of cycles that have average positive relative weight. The conceptual representation of positive- or negative-weight cycles with dashed or solid-blue boxes, respectively, is the same as Fig. 13. The label “base” refers to a baseline decoder which is the reference for the relative weight of cycles – for example, correlated MWPM. The label “min weight” is the optimal solution, shown here only for illustration along the “weight” axis.

One might expect that there will be some density distribution of cycles that is independent of code distance (i.e. diameter of the hypergraph), owing to the local structure of the hypergraph described above. We caution that this is a conjecture, since we have not yet studied this carefully. This distribution could also be correlated with other observed values, such as there being more improving cycles when the complementary gap is small.

If the relative weight of each cycle can be modeled as a real-valued random variable with positive average value, then the “global” difference in weight between an ensemble member and the correlated-MWPM baseline, which is the sum over the cycle relative weights, will also have positive average value. Moreover, by the Central Limit Theorem, the ratio of the standard deviation to the mean of this global weight difference will shrink as the number of cycles increases (i.e. the sample mean converges to the distribution mean). We conjecture that the number of cycles will increase with the size of the hypergraph (i.e. the volume of the syndrome, which in a memory experiment is ∼ d²r for code distance d and number of syndrome cycles r), as depicted in Fig. 14. In such circumstances, using the global lowest-weight configuration in the ensemble as the solution [13] will require the size of the ensemble to increase with the syndrome volume, in order to sample from the tail of the distribution for global weight. In contrast, Libra would be insensitive to this globally imposed Central Limit Theorem because it can select the best local cycles individually. The data in Section V supports this interpretation, where we see that Libra performance is insensitive to d and r, whereas global-weight ensembling is sensitive to both and recovers by increasing the size of the ensemble. However, we do not propose a functional form for this dependence. Techniques like windowing [54, 55] or graph partitioning [56] are important for parallelization of decoding, as well as streaming for real-time operation. If ensembling is applied to increase accuracy, windowing and/or partitioning the hypergraph would limit the size of the syndrome volume in the cut-out region, mitigating the volume-dependent effects described above. However, we expect that the minimum size of the syndrome volume will scale ∼ d³ for the surface code, since logical operators of length ∼ d can be oriented in any of the three cardinal directions in 3D. Hence, these considerations become important as quantum error correction scales to large d.
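The scaling argument can be made concrete with the usual Central-Limit ratio: for n i.i.d. cycle weights with mean μ > 0 and standard deviation σ, the sum has relative spread (σ√n)/(μn) = σ/(μ√n). A small sketch (μ and σ are illustrative values, not fitted to data):

```python
import math

def std_over_mean(mu, sigma, n_cycles):
    """Relative spread of the summed weight difference: sigma/(mu*sqrt(n))."""
    return (sigma * math.sqrt(n_cycles)) / (mu * n_cycles)

# Tenfold more cycles (larger syndrome volume) shrinks the spread by sqrt(10),
# so a whole member's total weight lands in the favorable tail ever more rarely.
small_volume = std_over_mean(0.2, 1.0, 100)     # 0.5
large_volume = std_over_mean(0.2, 1.0, 10_000)  # 0.05
```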

VII Generalizations to other problems

This work focuses on solving a particular variant of MWHPM (Section II) that is relevant to decoding quantum codes, but the methods appear to be more general. If one is performing constrained optimization where solutions can be decomposed into pieces and evaluated individually, such as Eqn. (3), then it could be possible to synthesize two distinct solutions and keep the best pieces. In contrast to local search in stochastic algorithms [31, 32, 9], this could avoid becoming trapped in local minima [33].

Using small-positive cycles (from Section III) as generators of multiple “good” solutions could also be of interest in optimization. For example, suppose one is optimizing routing in a distribution network, and that there is a formulation of this problem to which matching synthesis or a generalization thereof can be applied. An optimal solution that has no small cycles could be interpreted as “brittle”, because if the real-world circumstances that inform the cost function change, the single solution could become a poor one. In contrast, if there is a solution with many small-positive cycles, it means there are many almost-best solutions that could be generated. In the example, we might say this is a “robust” solution because there are many options to make changes to routing, while still being a pretty good solution. Whether these conjectures can be applied usefully in optimization problems is left for future work.

VIII Discussion

We have presented a method for high-accuracy decoding of surface codes called matching synthesis, and demonstrated improvements in accuracy with a decoder called Libra. We showed that the method makes efficient use of ensembling, reaching saturation around an ensemble size of 60 when using correlated MWPM. Our interpretation of the algorithm and these results is that each matching in the ensemble can be characterized as differing from the optimal solution by some random distribution of cycles on the hypergraph. If the hypergraph has “local structure” whereby these cycles tend to be confined to local neighborhoods, then matching synthesis quickly approaches optimality when the ensemble finds a good solution in each neighborhood with high probability, which is more favorable than requiring an ensemble member to find a good global solution.

A question not answered here is how close Libra comes to optimal decoding, which we will investigate in future work. In the case of surface codes, while it is theoretically interesting to wonder how close one can get to optimal accuracy with a computationally efficient algorithm, what is the practical benefit? If Libra (or any other decoder) increases Λ by 10%, then the practical benefit is to reduce the code distance d required to achieve a target logical error rate ε according to the approximate formula ε = 0.1·Λ^(−(d+1)/2) [4, 5]. For example, if the target is ε = 10⁻¹² per syndrome round and the baseline decoder (e.g. correlated MWPM) achieves Λ = 4, then the benefit of an improved decoder with Λ = 4.4 is to reduce the required code distance from d = 37 to d = 35, reducing qubit overhead by about 1 − (35/37)² ≈ 11%.
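The overhead estimate above can be reproduced from the quoted formula, rounding up to the next odd code distance:

```python
import math

def required_distance(target_eps, lam):
    """Smallest odd d satisfying 0.1 * lam ** (-(d + 1) / 2) <= target_eps."""
    d = 2.0 * math.log(0.1 / target_eps) / math.log(lam) - 1.0
    d_odd = math.ceil(d)
    return d_odd if d_odd % 2 == 1 else d_odd + 1

d_base = required_distance(1e-12, 4.0)   # 37
d_libra = required_distance(1e-12, 4.4)  # 35
saving = 1.0 - (d_libra / d_base) ** 2   # ≈ 0.105, the ≈11% quoted above
```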

Besides chasing diminishing returns in accuracy, another avenue to applying these results is real-time decoding. An advantage inherent in ensembled decoding [13] is the ability to do most of the decoding in parallel. Furthermore, matching synthesis with a logarithmic-depth tree lends itself to short-depth computation. Instead of making an ensemble out of the already-pretty-good correlated MWPM decoder, one could instead use simpler algorithms for the ensemble like uncorrelated matching [6], Union-Find [57], clustering decoders [58], or renormalization-group decoders [59, 60]. Being less accurate, these might require a larger ensemble; however, if they can run in parallel on massively parallel hardware (e.g. FPGAs or GPUs) and be synthesized in a logarithmic-depth tree, the execution depth might actually be shorter than for MWPM.

Finally, the matching synthesis method is not restricted to surface codes. We expect that the technique will translate well to other topological codes, such as color codes [61, 26, 62, 63, 14, 64, 49]. Whether it offers a benefit for other families of codes, such as quantum LDPC codes  [65, 66, 67], is not clear at this time. We leave these investigations to future work.

Acknowledgements.
We thank Michael Newman, Dave Bacon, Oscar Higgott, and Noah Shutty for feedback on the matching synthesis procedure.

References

  • Terhal [2015] B. M. Terhal, Quantum error correction for quantum memories, Rev. Mod. Phys. 87, 307 (2015).
  • Steane [2003] A. M. Steane, Overhead and noise threshold of fault-tolerant quantum error correction, Phys. Rev. A 68, 042322 (2003).
  • Aliferis et al. [2006] P. Aliferis, D. Gottesman, and J. Preskill, Quantum accuracy threshold for concatenated distance-3 codes, Quantum Info. Comput. 6, 97–165 (2006).
  • Fowler et al. [2012] A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N. Cleland, Surface codes: Towards practical large-scale quantum computation, Phys. Rev. A 86, 032324 (2012).
  • Jones et al. [2012] N. C. Jones, R. Van Meter, A. G. Fowler, P. L. McMahon, J. Kim, T. D. Ladd, and Y. Yamamoto, Layered architecture for quantum computing, Phys. Rev. X 2, 031007 (2012).
  • Dennis et al. [2002] E. Dennis, A. Kitaev, A. Landahl, and J. Preskill, Topological quantum memory, Journal of Mathematical Physics 43, 4452 (2002).
  • Edmonds [1965] J. Edmonds, Paths, trees, and flowers, Can. J. Math. 17, 449 (1965).
  • Bravyi and Kitaev [1998] S. B. Bravyi and A. Y. Kitaev, Quantum codes on a lattice with boundary,   (1998), Preprint arXiv:quant-ph/9811052.
  • Hutter et al. [2014] A. Hutter, J. R. Wootton, and D. Loss, Efficient markov chain monte carlo algorithm for the surface code, Phys. Rev. A 89, 022326 (2014), Preprint arXiv:1302.2669.
  • Fowler [2013] A. G. Fowler, Optimal complexity correction of correlated errors in the surface code, arXiv:1310.0863  (2013).
  • Paler and Fowler [2023] A. Paler and A. G. Fowler, Pipelined correlated minimum weight perfect matching of the surface code, Quantum 7, 1205 (2023).
  • Higgott et al. [2023a] O. Higgott, T. C. Bohdanowicz, A. Kubica, S. T. Flammia, and E. T. Campbell, Improved decoding of circuit noise and fragile boundaries of tailored surface codes, Physical Review X 13, 031007 (2023a).
  • Shutty et al. [2024] N. Shutty, M. Newman, and B. Villalonga, Efficient near-optimal decoding of the surface code through ensembling, arXiv:2401.12434  (2024).
  • Benhemou et al. [2023] A. Benhemou, K. Sahay, L. Lao, and B. J. Brown, Minimising surface-code failures using a color-code decoder,   (2023), Preprint arXiv:2306.16476.
  • Higgott et al. [2023b] O. Higgott, T. C. Bohdanowicz, A. Kubica, S. T. Flammia, and E. T. Campbell, Improved decoding of circuit noise and fragile boundaries of tailored surface codes, Phys. Rev. X 13, 031007 (2023b).
  • Criger and Ashraf [2018] B. Criger and I. Ashraf, Multi-path Summation for Decoding 2D Topological Codes, Quantum 2, 102 (2018).
  • Old and Rispler [2023] J. Old and M. Rispler, Generalized Belief Propagation Algorithms for Decoding of Surface Codes, Quantum 7, 1037 (2023).
  • Chen et al. [2024] J. Chen, Z. Yi, Z. Liang, and X. Wang, Improved belief propagation decoding algorithms for surface codes,   (2024), Preprint arXiv:2407.11523.
  • Bravyi et al. [2014] S. Bravyi, M. Suchara, and A. Vargo, Efficient algorithms for maximum likelihood decoding in the surface code, Physical Review A 90, 032326 (2014).
  • Chubb and Flammia [2021] C. T. Chubb and S. T. Flammia, Statistical mechanical models for quantum codes with correlated noise, Annales de l’Institut Henri Poincaré D 8, 269 (2021).
  • Chubb [2021] C. T. Chubb, General tensor network decoding of 2d pauli codes, arXiv preprint arXiv:2101.04125  (2021).
  • Piveteau et al. [2023] C. Piveteau, C. T. Chubb, and J. M. Renes, Tensor network decoding beyond 2d, arXiv preprint arXiv:2310.10722  (2023).
  • Google Quantum AI [2023] Google Quantum AI, Suppressing quantum errors by scaling a surface code logical qubit, Nature 614, 676 (2023).
  • Bohdanowicz [2022] T. C. Bohdanowicz, Quantum Constructions on Hamiltonians, Codes, and Circuits, Ph.D. thesis, California Institute of Technology (2022).
  • Feldman et al. [2005] J. Feldman, M. J. Wainwright, and D. R. Karger, Using linear programming to decode binary linear codes, IEEE Transactions on Information Theory 51, 954 (2005).
  • Landahl et al. [2011] A. J. Landahl, J. T. Anderson, and P. R. Rice, Fault-tolerant quantum computing with color codes, arXiv preprint arXiv:1108.5738  (2011).
  • Li and Vontobel [2018] J. X. Li and P. O. Vontobel, LP Decoding of Quantum Stabilizer Codes, in 2018 IEEE International Symposium on Information Theory (ISIT) (IEEE Press, 2018) p. 1306–1310.
  • Fawzi et al. [2021] O. Fawzi, L. Grouès, and A. Leverrier, Linear programming decoder for hypergraph product quantum codes, in IEEE ITW 2020 - IEEE Information theory workshop 2020 (Riva del Garda / Virtual, Italy, 2021).
  • Sheth et al. [2020] M. Sheth, S. Z. Jafarzadeh, and V. Gheorghiu, Neural ensemble decoding for topological quantum error-correcting codes, Physical Review A 101, 032338 (2020).
  • Bausch et al. [2023] J. Bausch, A. W. Senior, F. J. Heras, T. Edlich, A. Davies, M. Newman, C. Jones, K. Satzinger, M. Y. Niu, S. Blackwell, et al., Learning to decode the surface code with a recurrent, transformer-based neural network,   (2023), Preprint arXiv:2310.05900.
  • Lin [1965] S. Lin, Computer solutions of the traveling salesman problem, The Bell System Technical Journal 44, 2245 (1965).
  • Lin and Kernighan [1973] S. Lin and B. W. Kernighan, An Effective Heuristic Algorithm for the Traveling-Salesman Problem, Operations Research 21, 498 (1973), https://doi.org/10.1287/opre.21.2.498.
  • Martin et al. [1991] O. Martin, S. W. Otto, and E. W. Felten, Large-step markov chains for the traveling salesman problem, Complex Systems 5, 299 (1991).
  • Mahfoud and Goldberg [1995] S. W. Mahfoud and D. E. Goldberg, Parallel recombinative simulated annealing: A genetic algorithm, Parallel Computing 21, 1 (1995).
  • Mühlenbein et al. [1988] H. Mühlenbein, M. Gorges-Schleuter, and O. Krämer, Evolution algorithms in combinatorial optimization, Parallel Computing 7, 65 (1988).
  • Manzoni et al. [2020] L. Manzoni, L. Mariot, and E. Tuba, Balanced crossover operators in genetic algorithms, Swarm and Evolutionary Computation 54, 100646 (2020), Preprint arXiv:1904.10494.
  • Håstad [1999] J. Håstad, Clique is hard to approximate within n1εn^{1-\varepsilon}, Acta Mathematica 182, 105 (1999).
  • Gottesman [1998] D. Gottesman, The Heisenberg Representation of Quantum Computers,   (1998), Preprint arXiv:quant-ph/9807006.
  • Gidney et al. [2023] C. Gidney, M. Newman, P. Brooks, and C. Jones, Yoked surface codes, arXiv:2312.04522  (2023).
  • Bombín et al. [2024] H. Bombín, M. Pant, S. Roberts, and K. I. Seetharam, Fault-tolerant postselection for low-overhead magic state preparation, PRX Quantum 5, 010302 (2024).
  • Meister et al. [2024] N. Meister, C. A. Pattison, and J. Preskill, Efficient soft-output decoders for the surface code,   (2024), Preprint arXiv:2405.07433.
  • Smith et al. [2024] S. C. Smith, B. J. Brown, and S. D. Bartlett, Mitigating errors in logical qubits,   (2024), Preprint arXiv:2405.03766.
  • Gidney [2021] C. Gidney, Stim: a fast stabilizer circuit simulator, Quantum 5, 497 (2021).
  • Gidney et al. [2021] C. Gidney, M. Newman, A. Fowler, and M. Broughton, A fault-tolerant honeycomb memory, Quantum 5, 605 (2021).
  • Krinner et al. [2022] S. Krinner, N. Lacroix, A. Remm, A. D. Paolo, E. Genois, C. Leroux, C. Hellings, S. Lazar, F. Swiadek, J. Herrmann, G. J. Norris, C. K. Andersen, M. Müller, A. Blais, C. Eichler, and A. Wallraff, Realizing repeated quantum error correction in a distance-three surface code, Nature 605, 669 (2022).
  • Bluvstein et al. [2024] D. Bluvstein, S. J. Evered, A. A. Geim, S. H. Li, H. Zhou, T. Manovitz, S. Ebadi, M. Cain, M. Kalinowski, D. Hangleiter, J. P. B. Ataides, N. Maskara, I. Cong, X. Gao, P. S. Rodriguez, T. Karolyshyn, G. Semeghini, M. J. Gullans, M. Greiner, V. Vuletić, and M. D. Lukin, Logical quantum processor based on reconfigurable atom arrays, Nature 626, 58 (2024).
  • da Silva et al. [2024] M. P. da Silva, C. Ryan-Anderson, J. M. Bello-Rivas, A. Chernoguzov, J. M. Dreiling, C. Foltz, F. Frachon, J. P. Gaebler, T. M. Gatterman, L. Grans-Samuelsson, D. Hayes, N. Hewitt, J. Johansen, D. Lucchetti, M. Mills, S. A. Moses, B. Neyenhuis, A. Paz, J. Pino, P. Siegfried, J. Strabley, A. Sundaram, D. Tom, S. J. Wernli, M. Zanner, R. P. Stutz, and K. M. Svore, Demonstration of logical qubits and repeated error correction with better-than-physical error rates,   (2024), Preprint arXiv:2404.02280.
  • Delfosse [2020] N. Delfosse, Hierarchical decoding to reduce hardware requirements for quantum computing,  (2020), Preprint arXiv:2001.11427.
  • Gidney and Jones [2023] C. Gidney and C. Jones, New circuits and an open source decoder for the color code,   (2023), Preprint arXiv:2312.08813.
  • Horsman et al. [2012] D. Horsman, A. G. Fowler, S. Devitt, and R. V. Meter, Surface code quantum computing by lattice surgery, New Journal of Physics 14, 123011 (2012).
  • Fowler and Gidney [2018] A. G. Fowler and C. Gidney, Low overhead quantum computation using lattice surgery,   (2018), Preprint arXiv:1808.06709.
  • Gidney [2024] C. Gidney, Inplace Access to the Surface Code Y Basis, Quantum 8, 1310 (2024).
  • Higgott and Gidney [2023] O. Higgott and C. Gidney, Sparse blossom: correcting a million errors per core second with minimum-weight matching, arXiv preprint arXiv:2303.15933  (2023).
  • Skoric et al. [2023] L. Skoric, D. E. Browne, K. M. Barnes, N. I. Gillespie, and E. T. Campbell, Parallel window decoding enables scalable fault tolerant quantum computation, Nat Commun 14, 7040 (2023), Preprint arXiv:2209.08552.
  • Tan et al. [2023] X. Tan, F. Zhang, R. Chao, Y. Shi, and J. Chen, Scalable surface-code decoders with parallelization in time, PRX Quantum 4, 040344 (2023).
  • Wu and Zhong [2023] Y. Wu and L. Zhong, Fusion Blossom: Fast MWPM Decoders for QEC, in 2023 IEEE International Conference on Quantum Computing and Engineering (QCE), Vol. 01 (2023) pp. 928–938, Preprint arXiv:2305.08307.
  • Delfosse and Nickerson [2021] N. Delfosse and N. H. Nickerson, Almost-linear time decoding algorithm for topological codes, Quantum 5, 595 (2021), Preprint arXiv:1709.06218.
  • Bravyi and Haah [2013] S. Bravyi and J. Haah, Quantum self-correction in the 3d cubic code model, Phys. Rev. Lett. 111, 200501 (2013).
  • Duclos-Cianci and Poulin [2010a] G. Duclos-Cianci and D. Poulin, Fast decoders for topological quantum codes, Phys. Rev. Lett. 104, 050504 (2010a).
  • Duclos-Cianci and Poulin [2010b] G. Duclos-Cianci and D. Poulin, A renormalization group decoding algorithm for topological quantum codes, in 2010 IEEE Information Theory Workshop (2010) pp. 1–5.
  • Bombin and Martin-Delgado [2008] H. Bombin and M. A. Martin-Delgado, Statistical mechanical models and topological color codes, Phys. Rev. A 77, 042322 (2008).
  • Delfosse [2014] N. Delfosse, Decoding color codes by projection onto surface codes, Phys. Rev. A 89, 012317 (2014).
  • Sahay and Brown [2022] K. Sahay and B. J. Brown, Decoder for the triangular color code by matching on a möbius strip, PRX Quantum 3, 010310 (2022).
  • Kubica and Delfosse [2023] A. Kubica and N. Delfosse, Efficient color code decoders in d2d\geq 2 dimensions from toric code decoders, Quantum 7, 929 (2023).
  • Kovalev and Pryadko [2013] A. A. Kovalev and L. P. Pryadko, Fault tolerance of quantum low-density parity check codes with sublinear distance scaling, Phys. Rev. A 87, 020304 (2013).
  • Breuckmann and Eberhardt [2021] N. P. Breuckmann and J. N. Eberhardt, Quantum low-density parity-check codes, PRX Quantum 2, 040101 (2021).
  • Bravyi et al. [2024] S. Bravyi, A. W. Cross, J. M. Gambetta, D. Maslov, P. Rall, and T. J. Yoder, High-threshold and low-overhead fault-tolerant quantum memory, Nature 627, 778 (2024).