
Tetrationally Compact Entanglement Purification

Craig Gidney craig.gidney@gmail.com Google Quantum AI, Santa Barbara, California 93117, USA
Abstract

This paper shows that entanglement can be purified using very little storage, assuming the only source of noise is in the quantum channel being used to share the entanglement. Entangled pairs with a target infidelity of $\epsilon$ can be created in $\tilde{O}(\log\frac{1}{\epsilon})$ time using $O(\log^{\ast}\frac{1}{\epsilon})$ storage space, where $\log^{\ast}$ is the iterated logarithm. This is achieved by using multiple stages of error detection, with boosting within each stage. For example, the paper shows that 11 qubits of noiseless storage is enough to turn entanglement with an infidelity of $1/3$ into entanglement with an infidelity of $10^{-1000000000000000000000000000}$.

Data availability: All code written for this paper is included in its attached ancillary files.

1 Introduction

If Alice and Bob are linked by a noisy quantum channel, they can use error correction to reduce the noise of the channel. For example, Alice can encode a qubit she wants to send into the $[[5,1,3]]$ perfect code [Laf+96], before transmitting over the channel to Bob. This increases the amount of data transmitted, but allows the message to be recovered if one error happens. However, directly encoding the data to be sent is suboptimal. A more effective technique is to use entanglement purification [Ben+96].

In an entanglement purification protocol, Alice never exposes important data to the noisy quantum channel. The channel is only used to share EPR pairs (copies of the state $|00\rangle+|11\rangle$). Once an EPR pair has been successfully shared, the message is moved from Alice to Bob using quantum teleportation [Ben+93]. The benefit of doing this is that, if a transmission error happens when sharing the EPR pair, nothing important has been damaged. EPR pairs are fungible; broken ones can be discarded and replaced with new ones. This allows error correction to be done with error detection, which is more efficient and more flexible. For example, discarding and retransmitting allows a distance three code to correct two errors instead of one.

The time complexity of entanglement purification is simple. Starting from a channel with $O(1)$ infidelity, creating at least one EPR pair with a target infidelity of $\epsilon$ requires $\Theta(\log\frac{1}{\epsilon})$ uses of the noisy channel. This time complexity is achievable with a variety of techniques, like encoding the entanglement into a quantum error correcting code with linear distance [PK21]. This time complexity is optimal because, if fewer than $\Theta(\log\frac{1}{\epsilon})$ noisy pairs are shared, the chance of every single physical pair being corrupted would be larger than $\epsilon$.

Surprisingly, the space complexity of entanglement purification is not so simple. My own initial intuition, which I think is shared by some other researchers based on a few conversations I've had, is that the space complexity would also be $\Theta(\log\frac{1}{\epsilon})$. If storage were noisy then this would be the case, since otherwise the chance of every single storage qubit failing simultaneously would be greater than the desired error rate. However, if you assume the only source of noise is the noisy channel (meaning storage is noiseless, local operations are noiseless, and classical communication is noiseless), then the space complexity drops dramatically even if you still require the time complexity to be $\tilde{\Theta}(\log\frac{1}{\epsilon})$.

This paper explains a construction that improves entanglement fidelity tetrationally versus storage space, achieving a space complexity of $O(\log^{\ast}\frac{1}{\epsilon})$. It relies heavily on the assumption of perfect storage, which is unrealistic, but the consequences are surprising enough to merit a short paper. Also, some of the underlying ideas are applicable to practical scenarios.

2 Construction

2.1 Noise Model

I'll be using a digitized noise model of shared EPR pairs. Errors are modelled by assuming that, when an EPR pair is shared, an unwanted Pauli may have been applied to one of its qubits. There are four Paulis that may be applied: the identity Pauli $I$ resulting in the correct shared state $B_{I}=|00\rangle+|11\rangle$, the bit flip $X$ resulting in the state $B_{X}=|01\rangle+|10\rangle$, the phase flip $Z$ resulting in the state $B_{Z}=|00\rangle-|11\rangle$, and the combined bit phase flip $Y$ resulting in the state $B_{Y}=|01\rangle-|10\rangle$.

I will represent noisy EPR pairs using a four dimensional vector of odds. The vector

\begin{bmatrix}w\\ x\\ y\\ z\end{bmatrix}

describes a noisy EPR pair where the odds of the applied Pauli being $I$, $X$, $Y$, or $Z$ are $w:x:y:z$ respectively. Note that this is a degenerate representation: there are multiple ways to represent the same noisy state. The exact density matrix of a noisy state described by an odds vector is:

\text{ErrModelDensityMatrix}\left(\begin{bmatrix}w\\ x\\ y\\ z\end{bmatrix}\right)=\frac{w|B_{I}\rangle\langle B_{I}|+x|B_{X}\rangle\langle B_{X}|+y|B_{Y}\rangle\langle B_{Y}|+z|B_{Z}\rangle\langle B_{Z}|}{w+x+y+z} \quad (1)

Note that this noise model can't represent coherent errors, like an unwanted $\sqrt[9]{X}$ gate being applied to one of the qubits of an EPR pair. Some readers may worry that this means my analysis won't apply to coherent noise. This isn't actually a problem, because the purification process will digitize the noise. But, for the truly paranoid, coherent noise can be forcibly transformed into incoherent noise by Pauli twirling. If Alice and Bob have an EPR pair that has undergone coherent noise, they can pick a Pauli $P\in\{I,X,Y,Z\}$ uniformly at random, then apply $P$ to both qubits and forget $P$. The density matrix describing the state after applying the random $P$, not conditioned on which $P$ was used, is expressible in the digitized noise model I'm using. Discarding coherence information in this way is suboptimal, but sufficient for correctness.
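The twirling argument can be made concrete with a small pure-Python sketch (the helper names like `twirl` and `sandwich` are mine, not from the paper): apply a coherent X rotation to one half of an EPR pair, average over the four $P\otimes P$ conjugations, and check that the result is diagonal in the Bell basis with the expected weights.

```python
import math

# Bell states as normalized length-4 state vectors:
# B_I = |00>+|11>, B_X = |01>+|10>, B_Y = |01>-|10>, B_Z = |00>-|11>.
s = 1 / math.sqrt(2)
BELL = {
    'I': [s, 0, 0, s],
    'X': [0, s, s, 0],
    'Y': [0, s, -s, 0],
    'Z': [s, 0, 0, -s],
}

# Single-qubit Paulis as 2x2 matrices.
PAULI = {
    'I': [[1, 0], [0, 1]],
    'X': [[0, 1], [1, 0]],
    'Y': [[0, -1j], [1j, 0]],
    'Z': [[1, 0], [0, -1]],
}

def kron(a, b):
    # Kronecker product of two 2x2 matrices, giving a 4x4 matrix.
    return [[a[i // 2][j // 2] * b[i % 2][j % 2] for j in range(4)] for i in range(4)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]

def outer(v):
    # |v><v|
    return [[a * b.conjugate() for b in v] for a in v]

def sandwich(u, rho, v):
    # <u| rho |v>
    return sum(u[i].conjugate() * rho[i][j] * v[j] for i in range(4) for j in range(4))

def twirl(rho):
    # Average rho over the four P (x) P conjugations (Paulis are self-inverse).
    out = [[0] * 4 for _ in range(4)]
    for p in 'IXYZ':
        pp = kron(PAULI[p], PAULI[p])
        conj = matmul(matmul(pp, rho), pp)
        out = [[out[i][j] + conj[i][j] / 4 for j in range(4)] for i in range(4)]
    return out

# A coherent error: exp(i*theta*X) applied to Alice's half of B_I.
theta = 0.3
rot = [[math.cos(theta), 1j * math.sin(theta)], [1j * math.sin(theta), math.cos(theta)]]
psi = matvec(kron(rot, PAULI['I']), BELL['I'])
rho_t = twirl(outer(psi))

# Twirling kills the coherence between B_I and B_X, leaving Bell-diagonal
# weights cos^2(theta) and sin^2(theta): exactly the digitized noise model.
assert abs(sandwich(BELL['I'], rho_t, BELL['X'])) < 1e-9
assert abs(sandwich(BELL['X'], rho_t, BELL['X']) - math.sin(theta) ** 2) < 1e-9
```

The off-diagonal Bell-basis terms cancel because each $P\otimes P$ conjugation flips their sign half the time, while the diagonal weights are untouched.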

Often, the exact odds will be inconveniently complicated and it will be beneficial to simplify them even if that makes the state worse. For that purpose, I define $u\xrightarrow{\text{decay}}v$ to mean "a state with odds vector $u$ can be turned into a state with odds vector $v$ via the addition of artificial noise". A sufficient condition for this relationship to hold is for the identity term to not grow and for the error terms to not shrink:

\Big(w_{1}\geq w_{2}\land x_{1}\leq x_{2}\land y_{1}\leq y_{2}\land z_{1}\leq z_{2}\Big)\implies\left(\begin{bmatrix}w_{1}\\ x_{1}\\ y_{1}\\ z_{1}\end{bmatrix}\xrightarrow{\text{decay}}\begin{bmatrix}w_{2}\\ x_{2}\\ y_{2}\\ z_{2}\end{bmatrix}\right) \quad (2)

2.2 Distilling with Rep Codes

The basic building block used by this paper is distillation with distance 2 rep codes. There are three relevant rep codes: X basis, Y basis, and Z basis. The X basis rep code has the stabilizer $XX$, the logical X observable $XI$, and the logical Z observable $ZY$. The Y basis rep code has the stabilizer $YY$, the logical X observable $XZ$, and the logical Z observable $ZZ$. The Z basis rep code has the stabilizer $ZZ$, the logical X observable $XY$, and the logical Z observable $ZI$. Beware that the choice of observables is slightly non-standard, and that their exact definition matters as it determines how errors propagate through the distillation process. See Figure 1 for circuits implementing these details correctly.
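The stabilizer/logical assignments above can be sanity checked mechanically (a sketch of my own, not the paper's code) using the symplectic bit representation of Pauli strings:

```python
# Each Pauli is an (x, z) bit pair; a Pauli string is a string over 'IXYZ'.
PAULI_BITS = {'I': (0, 0), 'X': (1, 0), 'Y': (1, 1), 'Z': (0, 1)}

def commutes(p, q):
    # Two Pauli strings commute iff they anticommute on an even number of qubits.
    total = 0
    for a, b in zip(p, q):
        (xa, za), (xb, zb) = PAULI_BITS[a], PAULI_BITS[b]
        total += xa * zb + za * xb
    return total % 2 == 0

# basis: (stabilizer, logical X observable, logical Z observable), as in the text.
CODES = {
    'X': ('XX', 'XI', 'ZY'),
    'Y': ('YY', 'XZ', 'ZZ'),
    'Z': ('ZZ', 'XY', 'ZI'),
}
for stab, xl, zl in CODES.values():
    assert commutes(stab, xl) and commutes(stab, zl)  # logicals preserve the code space
    assert not commutes(xl, zl)                       # logical X and Z must anticommute
```

All three codes pass: each logical observable commutes with its stabilizer, and the logical X and Z anticommute as they must.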

Figure 1: The operations executed when distilling with a rep code in each basis. Each $\text{Compare}_{P}$ circuit has a measurement flow $+PP\rightarrow(-1)^{m}$ as well as decoding flows $X_{L}\rightarrow XI$ and $Z_{L}\rightarrow ZI$. The $P^{P\neq Y}$ gate removes minus signs coming from the fact that $|00\rangle+|11\rangle$ is stabilized by $-YY$ instead of $+YY$. It applies $X$ when $P=X$, $I$ when $P=Y$, and $Z$ when $P=Z$. Note that these circuits are somewhat non-standard, due to the underlying definitions of $X_{L}$ and $Z_{L}$ being somewhat non-standard, to control the exact way in which input sign errors propagate into output sign errors. The $\text{Distill}_{P}$ circuit has Alice and Bob each locally run $\text{Compare}_{P}$, and then use classical communication to compare the results. If the results agree, they keep the output qubits. Otherwise they discard the output qubits and try again with new inputs.

A rep code distillation is performed by Alice and Bob each measuring the stabilizer of the rep code, and comparing their results by using classical communication. If the results differ from the result predicted by assuming they have shared copies of $|00\rangle+|11\rangle$, an error has been detected. Equivalently, an error is detected if the stabilizer of the rep code anticommutes with the combination of the noisy Paulis applied to the input EPR pairs. For example, an X basis rep code will raise a detection event if given a correct EPR pair and a phase flipped EPR pair. Conversely, it won't detect a problem if given a bit flipped EPR pair or if given two phase flipped EPR pairs.

Stepping back, a rep code distillation takes two noisy EPR pairs as input. It has some chance of detecting an error and discarding. Otherwise it succeeds and produces one output. See Figure 2 for diagrams showing how each rep code catches errors.

To analyze rep code distillation, it's necessary to be able to compute the chance of discarding and the noise model describing the state output upon success. I'll use $v\stackrel{B}{\star}w$ to represent the probability of a $B$-basis rep code distillation detecting an error. I used sympy to enumerate the cases detected by each rep code and symbolically accumulate the probability of a failure. The probabilities of a distillation failure for each basis are:

\begin{bmatrix}w_{1}\\ x_{1}\\ y_{1}\\ z_{1}\end{bmatrix}\stackrel{X}{\star}\begin{bmatrix}w_{2}\\ x_{2}\\ y_{2}\\ z_{2}\end{bmatrix}=\frac{(w_{1}+x_{1})(y_{2}+z_{2})+(y_{1}+z_{1})(w_{2}+x_{2})}{(w_{1}+x_{1}+y_{1}+z_{1})(w_{2}+x_{2}+y_{2}+z_{2})} \quad (3)
\begin{bmatrix}w_{1}\\ x_{1}\\ y_{1}\\ z_{1}\end{bmatrix}\stackrel{Y}{\star}\begin{bmatrix}w_{2}\\ x_{2}\\ y_{2}\\ z_{2}\end{bmatrix}=\frac{(w_{1}+y_{1})(x_{2}+z_{2})+(x_{1}+z_{1})(w_{2}+y_{2})}{(w_{1}+x_{1}+y_{1}+z_{1})(w_{2}+x_{2}+y_{2}+z_{2})} \quad (4)
\begin{bmatrix}w_{1}\\ x_{1}\\ y_{1}\\ z_{1}\end{bmatrix}\stackrel{Z}{\star}\begin{bmatrix}w_{2}\\ x_{2}\\ y_{2}\\ z_{2}\end{bmatrix}=\frac{(w_{1}+z_{1})(x_{2}+y_{2})+(x_{1}+y_{1})(w_{2}+z_{2})}{(w_{1}+x_{1}+y_{1}+z_{1})(w_{2}+x_{2}+y_{2}+z_{2})} \quad (5)

I'll use $u\stackrel{B}{\oplus}v$ to represent the state output by distilling a state $u$ against a state $v$ with a basis-$B$ rep code, given that no error was detected. I used sympy to enumerate the cases not detected by each rep code, and symbolically accumulate the weight of each output case. When distillation succeeds, the output states for each basis are:

\begin{bmatrix}w_{1}\\ x_{1}\\ y_{1}\\ z_{1}\end{bmatrix}\stackrel{X}{\oplus}\begin{bmatrix}w_{2}\\ x_{2}\\ y_{2}\\ z_{2}\end{bmatrix}=\begin{bmatrix}w_{1}w_{2}+x_{1}x_{2}\\ w_{1}x_{2}+x_{1}w_{2}\\ y_{1}y_{2}+z_{1}z_{2}\\ y_{1}z_{2}+z_{1}y_{2}\end{bmatrix} \quad (6)
\begin{bmatrix}w_{1}\\ x_{1}\\ y_{1}\\ z_{1}\end{bmatrix}\stackrel{Y}{\oplus}\begin{bmatrix}w_{2}\\ x_{2}\\ y_{2}\\ z_{2}\end{bmatrix}=\begin{bmatrix}w_{1}w_{2}+y_{1}y_{2}\\ x_{1}z_{2}+z_{1}x_{2}\\ w_{1}y_{2}+y_{1}w_{2}\\ x_{1}x_{2}+z_{1}z_{2}\end{bmatrix} \quad (7)
\begin{bmatrix}w_{1}\\ x_{1}\\ y_{1}\\ z_{1}\end{bmatrix}\stackrel{Z}{\oplus}\begin{bmatrix}w_{2}\\ x_{2}\\ y_{2}\\ z_{2}\end{bmatrix}=\begin{bmatrix}w_{1}w_{2}+z_{1}z_{2}\\ x_{1}y_{2}+y_{1}x_{2}\\ x_{1}x_{2}+y_{1}y_{2}\\ w_{1}z_{2}+z_{1}w_{2}\end{bmatrix} \quad (8)
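The formulas above translate directly into code. Here is a sketch using exact rational arithmetic (the function names `oplus` and `star` are mine, mirroring the paper's operators):

```python
from fractions import Fraction

# Odds vectors (w, x, y, z) give the odds of an I, X, Y, or Z error on a noisy pair.

def oplus(basis, u, v):
    # Output odds of a successful basis-B rep code distillation (Eqs. 6-8).
    (w1, x1, y1, z1), (w2, x2, y2, z2) = u, v
    if basis == 'X':
        return (w1*w2 + x1*x2, w1*x2 + x1*w2, y1*y2 + z1*z2, y1*z2 + z1*y2)
    if basis == 'Y':
        return (w1*w2 + y1*y2, x1*z2 + z1*x2, w1*y2 + y1*w2, x1*x2 + z1*z2)
    return (w1*w2 + z1*z2, x1*y2 + y1*x2, x1*x2 + y1*y2, w1*z2 + z1*w2)

def star(basis, u, v):
    # Probability that the basis-B distillation detects an error (Eqs. 3-5).
    (w1, x1, y1, z1), (w2, x2, y2, z2) = u, v
    a1 = {'X': w1 + x1, 'Y': w1 + y1, 'Z': w1 + z1}[basis]
    a2 = {'X': w2 + x2, 'Y': w2 + y2, 'Z': w2 + z2}[basis]
    return Fraction(a1 * (sum(v) - a2) + (sum(u) - a1) * a2, sum(u) * sum(v))

# Example: X-distilling two copies of the symmetric odds vector (1, p, p, p).
p = Fraction(1, 10)
u = (1, p, p, p)
out = oplus('X', u, u)
assert out == (1 + p**2, 2*p, 2*p**2, 2*p**2)
assert star('X', u, u) == Fraction(44, 169)
```

Note how the X-basis distillation doubles the X odds but multiplies the Y and Z odds together: the code passes X errors through while detecting the others.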

Note that $u\stackrel{B}{\oplus}v$ is left-associative. Also note that all these operators behave as you would expect with respect to decaying: if $a_{1}\xrightarrow{\text{decay}}a_{2}$, and $b_{1}\xrightarrow{\text{decay}}b_{2}$, and all states have less than 50% infidelity, then $\left(a_{1}\stackrel{B}{\oplus}b_{1}\right)\xrightarrow{\text{decay}}\left(a_{2}\stackrel{B}{\oplus}b_{2}\right)$ and $\left(a_{1}\stackrel{B}{\star}b_{1}\right)\leq\left(a_{2}\stackrel{B}{\star}b_{2}\right)$. This allows bounds on actual distilled states to be proved via bounds on decayed distilled states.

Figure 2: How each rep code detects entanglement errors. The correct state $B_{I}$ has stabilizer generators $+XX$ and $+ZZ$. The error state $B_{X}$ negates the sign of $+ZZ$ to $-ZZ$. The error state $B_{Z}$ negates the sign of $+XX$ to $-XX$. The error state $B_{Y}$ negates the sign of both stabilizer generators. The pair of input EPR pairs has stabilizer generators $\pm XIXI$, $\pm ZIZI$, $\pm IXIX$, and $\pm IZIZ$. The signs are determined by the errors in the inputs. If the inputs are correct, then all signs are positive. For each rep code, there is a combination of the input stabilizers that is transformed by the operations into a $\pm XX$ stabilizer on the output qubits (and also a $\pm ZZ$ stabilizer). This is why the output qubits are still entangled. When the input states are correct, the output states have positive signs. When the input states have sign errors, these sign errors can propagate into the output states. Purification works by catching bad signs attempting to propagate into the output. Note how it's possible to combine the input stabilizers to form the stabilizer $+PPPP$, for each Pauli basis $P$. The basis $P$ rep code measures this stabilizer by having Alice locally measure $+PPII$ and Bob locally measure $+IIPP$. These local results can be combined using classical communication, to get the full measurement of $+PPPP$, allowing Alice and Bob to verify the state was in the +1 eigenstate of $+PPPP$ as expected. If one of the inputs was in a state other than $B_{I}$ or $B_{P}$, this catches the mistake. This filters out some errors that would otherwise propagate into the output, improving its entanglement fidelity.

2.3 Staging

Purifying with one enormous error detecting code doesn't work very well. The issue is that, with large codes, the chance of seeing no detection events is extremely small. This can result in over-discarding, where the purification process becomes inefficient because nothing is ever good enough. It's more efficient to purify with a series of smaller stages, as shown in Figure 3, where each stage applies a small code and only successful outputs are collected to be used as inputs for the next stage.

For example, consider a process where input EPR pairs $u$ are distilled by an X basis rep code, and then survivor states $v$ are distilled by a Z basis rep code to produce final states $w$. I describe this distillation chain using the notation $u\xrightarrow{X}v\xrightarrow{Z}w$. The $[[4,1,2]]$ Shor code [Sho95] is also defined by concatenating a distance two X rep code with a distance two Z rep code, so you may expect a single stage distillation with the $[[4,1,2]]$ code to behave identically. The distillation chain $u\xrightarrow{[[4,1,2]]}w$ does produce the same output as $u\xrightarrow{X}v\xrightarrow{Z}w$, but it succeeds at a slightly slower rate. The issue is that when the $[[4,1,2]]$ distillation detects a Z error, it costs 4 input pairs, because the $[[4,1,2]]$ code takes all 4 data qubits in one group. The two stage distillation detects Z errors during the first stage, so a Z error only costs 2 input pairs.

A simple staging construction that purifies entanglement reasonably well is to alternate between stages that distill using the X basis rep code and stages that distill using the Z basis rep code:

\text{in}\xrightarrow{X}u_{1}\xrightarrow{Z}u_{2}\xrightarrow{X}u_{3}\xrightarrow{Z}u_{4}\xrightarrow{X}\dots\xrightarrow{X}u_{n}\xrightarrow{Z}\text{out}

The error suppression of an X stage followed by a Z stage is quadratic, because any single error will be caught by one stage or the other. Furthermore, each stage only needs one qubit of storage because, as soon as two qubits are ready within one stage, they are distilled into one qubit in the next stage. $O(1)$ additional stages cost $O(1)$ additional space to square the infidelity. Therefore, given a target $\epsilon$, alternating between X rep code and Z rep code stages can reach that infidelity using $O(\log\log\frac{1}{\epsilon})$ storage.
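The quadratic suppression per X+Z pair of stages can be checked numerically (a sketch under the paper's noiseless-storage assumption; `oplus` encodes Eqs. 6 and 8):

```python
# Alternate X and Z rep code stages, each distilling two copies of the
# previous stage's output, and watch the infidelity fall doubly exponentially.

def oplus(basis, u, v):
    (w1, x1, y1, z1), (w2, x2, y2, z2) = u, v
    if basis == 'X':  # Eq. 6
        return [w1*w2 + x1*x2, w1*x2 + x1*w2, y1*y2 + z1*z2, y1*z2 + z1*y2]
    return [w1*w2 + z1*z2, x1*y2 + y1*x2, x1*x2 + y1*y2, w1*z2 + z1*w2]  # Eq. 8

state = [1.0, 1e-3, 1e-3, 1e-3]
history = []
for basis in 'XZXZ':
    out = oplus(basis, state, state)
    state = [t / out[0] for t in out]  # renormalize so the identity odds stay 1
    history.append(sum(state[1:]))     # total error odds, ~ the infidelity

# Each X+Z pair of stages roughly squares the infidelity.
assert history[1] < 10 * history[0] ** 2
assert history[3] < 1e-9
```

Starting from ~3·10⁻³ total error odds, four stages are enough to pass below 10⁻⁹.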

Figure 3: A breakdown showing how three stages of rep code distillation are expanded into a circuit implementing the full distillation. Time moves from left to right. Stages and storage are laid out from top to bottom. Each stage's outputs are sequentially fed into the next stage as inputs. (Note that this diagram assumes no retries occur. If a retry did occur, the failing stage would discard its output and wait for more inputs to arrive in order to run again and produce a new potential output.) (Note that this diagram doesn't show what boosting looks like. Boosting would correspond to using the output of a stage as one of its inputs, some fixed number of times, before allowing the output to pass to the next stage.)

2.4 Boosting

In the previous subsection, each stage distilled copies of the output from the previous stage. Distillation was always combining two copies of the best-so-far state. "Boosting" is an alternative technique, where a good state is improved by repeatedly merging mediocre states into it. (Note: in the literature, "boosting" is more commonly called "pumping" [DB03]). In particular, staging as described in the previous subsection results in states being combined like this:

u_{k+3}=\left(\left(u_{k}\stackrel{Z}{\oplus}u_{k}\right)\stackrel{X}{\oplus}\left(u_{k}\stackrel{Z}{\oplus}u_{k}\right)\right)\stackrel{Z}{\oplus}\left(\left(u_{k}\stackrel{Z}{\oplus}u_{k}\right)\stackrel{X}{\oplus}\left(u_{k}\stackrel{Z}{\oplus}u_{k}\right)\right) \quad (9)

That kind of merging requires an amount of storage that increases with the amount of nesting. You can instead combine states like this:

b_{k+1}=\left(\left(\left(\left(\left(\left(b_{k}\stackrel{Z}{\oplus}b_{k}\right)\stackrel{X}{\oplus}b_{k}\right)\stackrel{Z}{\oplus}b_{k}\right)\stackrel{X}{\oplus}b_{k}\right)\stackrel{Z}{\oplus}b_{k}\right)\stackrel{X}{\oplus}b_{k}\right)\stackrel{Z}{\oplus}b_{k} \quad (10)

Here the result can be built up in a streaming fashion, with identical “booster states” arriving one by one to be folded into a single gradually improving “boosted state”.

The upside of boosting is that it squeezes more benefit out of a state that you are able to repeatedly make. The downside of boosting is that it can only be repeated a finite number of times. The boosted state doesn’t get arbitrarily good under the limit of infinite boosting. It approaches a floor set by the booster state’s error rates. A second downside is that, because each booster state has a fixed chance of failing, the chance of detecting an error and having to restart the boosting process limits to 100% as more boosts are performed. The infidelity floor and the growing restart chance force you to eventually stop boosting and start a new stage.
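Both effects are easy to see numerically (my own illustration; `oplus_y` encodes Eq. 7, and the booster's error rate $a$ is an arbitrary choice):

```python
# Fold identical booster states into one gradually improving boosted state.
def oplus_y(u, v):
    (w1, x1, y1, z1), (w2, x2, y2, z2) = u, v
    return [w1*w2 + y1*y2, x1*z2 + z1*x2, w1*y2 + y1*w2, x1*x2 + z1*z2]

a = 1e-3
booster = [1.0, a, a, a]
state = list(booster)
for _ in range(20):
    out = oplus_y(state, booster)
    state = [t / out[0] for t in out]  # keep the identity odds at 1

# X and Z odds were multiplicatively crushed by every boost...
assert state[1] < 1e-20 and state[3] < 1e-20
# ...but each boost added ~a to the Y odds: the floor that forces a new stage.
assert 0.015 < state[2] < 0.03
```

Only two stored states are ever live at once, which is why boosting is the streaming-friendly building block of the construction.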

2.5 Bias Boosting Stage

Suppose you have access to a booster state $A$ where the $X$, $Y$, and $Z$ errors of $A$ are all equal. You are boosting a state where the $X$ and $Z$ terms are equal. The following relationships hold:

\begin{bmatrix}1\\ x\\ y\\ x\end{bmatrix}\stackrel{Y}{\oplus}\begin{bmatrix}1\\ a\\ a\\ a\end{bmatrix}=\begin{bmatrix}1+ay\\ 2ax\\ y+a\\ 2ax\end{bmatrix}\xrightarrow{\text{decay}}\begin{bmatrix}1\\ 2ax\\ y+a\\ 2ax\end{bmatrix} \quad (11)
\begin{bmatrix}1\\ x\\ y\\ x\end{bmatrix}\stackrel{Y}{\star}\begin{bmatrix}1\\ a\\ a\\ a\end{bmatrix}=\frac{2a(1+y)+2x(1+a)}{(1+x+y+x)(1+a+a+a)}\leq 2a+2x \quad (12)

In other words: boosting with $A$ using a $Y$ basis rep code additively increases the chance of a $Y$ error and the chance of discarding, but multiplicatively suppresses $X$ and $Z$ errors.

If this boosting process is repeated $\lfloor\sqrt[3]{1/a}\rfloor-1$ times, the output is:

\begin{bmatrix}1\\ x\\ y\\ x\end{bmatrix}\overbrace{\stackrel{Y}{\oplus}\begin{bmatrix}1\\ a\\ a\\ a\end{bmatrix}\stackrel{Y}{\oplus}\begin{bmatrix}1\\ a\\ a\\ a\end{bmatrix}\stackrel{Y}{\oplus}\dots\stackrel{Y}{\oplus}\begin{bmatrix}1\\ a\\ a\\ a\end{bmatrix}}^{\lfloor\sqrt[3]{1/a}\rfloor-1\;\text{times}}\xrightarrow{\text{decay}}\begin{bmatrix}1\\ (2a)^{\lfloor\sqrt[3]{1/a}\rfloor-1}x\\ y+a\lfloor\sqrt[3]{1/a}\rfloor-a\\ (2a)^{\lfloor\sqrt[3]{1/a}\rfloor-1}x\end{bmatrix} \quad (13)

If the state being boosted is $A$ itself, then the result after all these boosts has exponentially smaller $X$ and $Z$ error rates (as long as the infidelity is small enough to begin with, e.g. less than 0.1%):

\begin{bmatrix}1\\ a\\ a\\ a\end{bmatrix}\overbrace{\stackrel{Y}{\oplus}\begin{bmatrix}1\\ a\\ a\\ a\end{bmatrix}\stackrel{Y}{\oplus}\begin{bmatrix}1\\ a\\ a\\ a\end{bmatrix}\stackrel{Y}{\oplus}\dots\stackrel{Y}{\oplus}\begin{bmatrix}1\\ a\\ a\\ a\end{bmatrix}}^{\lfloor\sqrt[3]{1/a}\rfloor-1\;\text{times}}\xrightarrow{\text{decay}}\begin{bmatrix}1\\ \frac{1}{2}(2a)^{\lfloor\sqrt[3]{1/a}\rfloor}\\ a\lfloor\sqrt[3]{1/a}\rfloor\\ \frac{1}{2}(2a)^{\lfloor\sqrt[3]{1/a}\rfloor}\end{bmatrix}\xrightarrow{\text{decay}}\begin{bmatrix}1\\ \exp(-1/\sqrt[3]{a})\\ a^{2/3}\\ \exp(-1/\sqrt[3]{a})\end{bmatrix} \quad (14)

In this sequence of boosts, the first one is the most likely to discard. The chance of any of the boosts discarding can be upper bounded by multiplying this probability by the number of boosts:

P(\text{bias boosting fails})\leq(2a+2a)\cdot(\lfloor\sqrt[3]{1/a}\rfloor-1)\leq 4a^{2/3} \quad (15)
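The bounds in Eqs. 13 and 15 can be checked exactly with rational arithmetic (a sketch with an illustrative $a=10^{-3}$ of my choosing; `oplus_y` and `star_y` encode Eqs. 7 and 4):

```python
from fractions import Fraction

def oplus_y(u, v):
    (w1, x1, y1, z1), (w2, x2, y2, z2) = u, v
    return (w1*w2 + y1*y2, x1*z2 + z1*x2, w1*y2 + y1*w2, x1*x2 + z1*z2)

def star_y(u, v):
    (w1, x1, y1, z1), (w2, x2, y2, z2) = u, v
    return Fraction((w1 + y1)*(x2 + z2) + (x1 + z1)*(w2 + y2), sum(u) * sum(v))

a = Fraction(1, 1000)   # chosen so that floor((1/a)^(1/3)) = 10 exactly
n = 10 - 1              # number of Y boosts performed
booster = (1, a, a, a)
state = booster
fail = Fraction(0)
for _ in range(n):
    fail += star_y(state, booster)   # union bound on the restart chance
    state = oplus_y(state, booster)

w, x, y, z = state
assert x / w <= (2*a)**n * a         # X suppressed per Eq. 13's decayed bound
assert z / w <= (2*a)**n * a         # Z likewise
assert y / w <= (n + 1) * a          # Y only grows additively
assert fail <= 4 * Fraction(1, 100)  # 4*a^(2/3) = 4/100, matching Eq. 15
```

Because everything is a `Fraction`, the inequalities are verified exactly rather than up to floating point error.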

2.6 Bias Busting Stage

The bias boosting stage creates an enormous disparity between the $Y$ error rate and the other error rates. The bias busting stage fixes this, bringing the $Y$ error rate down to match the others without increasing them by much.

This stage boosts using the biased state $B$ output from the previous stage. This state has equal $X$ and $Z$ terms that are much smaller than its $Y$ term. The boosts alternate between distilling with the X basis rep code and the Z basis rep code:

\begin{bmatrix}1\\ x\\ y\\ x\end{bmatrix}\stackrel{X}{\oplus}\begin{bmatrix}1\\ \alpha\\ \beta\\ \alpha\end{bmatrix}\stackrel{Z}{\oplus}\begin{bmatrix}1\\ \alpha\\ \beta\\ \alpha\end{bmatrix}=\begin{bmatrix}1+\alpha(x+x\beta+y\alpha)\\ x\beta+\alpha(\beta+y\beta+x\alpha)\\ y\beta^{2}+\alpha(x+x\beta+\alpha)\\ x\beta+\alpha(1+y+x\alpha)\end{bmatrix}\xrightarrow{\text{decay}}\begin{bmatrix}1\\ x\beta+2\alpha\\ y\beta^{2}+2\alpha\\ x\beta+2\alpha\end{bmatrix} \quad (16)

Assuming the initial infidelity is small enough (e.g. less than 0.1%), the chance of this pair of boosts failing is at most:

\left(\begin{bmatrix}1\\ x\\ y\\ x\end{bmatrix}\stackrel{X}{\star}\begin{bmatrix}1\\ \alpha\\ \beta\\ \alpha\end{bmatrix}\right)+\left(\left(\begin{bmatrix}1\\ x\\ y\\ x\end{bmatrix}\stackrel{X}{\oplus}\begin{bmatrix}1\\ \alpha\\ \beta\\ \alpha\end{bmatrix}\right)\stackrel{Z}{\star}\begin{bmatrix}1\\ \alpha\\ \beta\\ \alpha\end{bmatrix}\right)\leq 8x+4y+4\alpha+4\beta\leq 10\beta \quad (17)

The simplification to $10\beta$ is done by knowing that, immediately after a bias boosting stage, $\beta$ will be the largest term and the $\alpha$ and $x$ terms will be orders of magnitude smaller.

Repeating this pair of boosts $\lceil\frac{1}{2}\log_{\beta}(\alpha)\rceil$ times, starting from $B$, reduces all of the error terms below $4\alpha$:

\begin{bmatrix}1\\ \alpha\\ \beta\\ \alpha\end{bmatrix}\overbrace{\stackrel{X}{\oplus}\begin{bmatrix}1\\ \alpha\\ \beta\\ \alpha\end{bmatrix}\stackrel{Z}{\oplus}\begin{bmatrix}1\\ \alpha\\ \beta\\ \alpha\end{bmatrix}\stackrel{X}{\oplus}\begin{bmatrix}1\\ \alpha\\ \beta\\ \alpha\end{bmatrix}\stackrel{Z}{\oplus}\begin{bmatrix}1\\ \alpha\\ \beta\\ \alpha\end{bmatrix}\stackrel{X}{\oplus}\begin{bmatrix}1\\ \alpha\\ \beta\\ \alpha\end{bmatrix}\stackrel{Z}{\oplus}\begin{bmatrix}1\\ \alpha\\ \beta\\ \alpha\end{bmatrix}}^{\lceil\frac{1}{2}\log_{\beta}(\alpha)\rceil\;\text{times}}\xrightarrow{\text{decay}}\begin{bmatrix}1\\ 4\alpha\\ 4\alpha\\ 4\alpha\end{bmatrix} \quad (18)

Since bias busting is always applied immediately after bias boosting, it will be the case that $\beta=a^{2/3}$ and $\alpha=\exp(-1/\sqrt[3]{a})$ for some $a$. So the number of repetitions is at most $\frac{1}{2}\log_{\beta}(\alpha)\leq 1/\sqrt[3]{a}$. The chance of discarding across all the bias busting boosts is therefore at most

P(\text{bias busting fails})\leq 10\beta/\sqrt[3]{a}=10a^{2/3}/\sqrt[3]{a}\leq 10\sqrt[3]{a}=10\sqrt{\beta} \quad (19)

Ensuring that $10\sqrt[3]{a}$ is less than 50% requires starting from an $a$ below $0.01\%$. You can start boosting from higher error rates $a$, without causing a disastrous discard rate, if you do fewer boosts in the bias boosting and bias busting stages.
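A float-level check of Eq. 18's claim (with illustrative $\alpha$ and $\beta$ values of my choosing; `oplus_x` and `oplus_z` encode Eqs. 6 and 8):

```python
import math

def oplus_x(u, v):
    (w1, x1, y1, z1), (w2, x2, y2, z2) = u, v
    return [w1*w2 + x1*x2, w1*x2 + x1*w2, y1*y2 + z1*z2, y1*z2 + z1*y2]

def oplus_z(u, v):
    (w1, x1, y1, z1), (w2, x2, y2, z2) = u, v
    return [w1*w2 + z1*z2, x1*y2 + y1*x2, x1*x2 + y1*y2, w1*z2 + z1*w2]

def norm(u):
    # Renormalize so the identity odds stay at 1.
    return [t / u[0] for t in u]

alpha, beta = 1e-6, 1e-2   # biased booster: X = Z = alpha << Y = beta
booster = [1.0, alpha, beta, alpha]
state = list(booster)
reps = math.ceil(0.5 * math.log(alpha) / math.log(beta))  # ceil(log_beta(alpha)/2)
for _ in range(reps):
    state = norm(oplus_x(state, booster))
    state = norm(oplus_z(state, booster))

# All error odds end below 4*alpha, matching Eq. 18.
assert all(t <= 4 * alpha for t in state[1:])
```

Two X/Z boost pairs suffice here: each pair multiplies the dominant $Y$ term by roughly $\beta^2$ while only adding $\sim 2\alpha$ to the small terms.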

2.7 Full Construction

An example of a full construction is shown in Figure 4. The first step is to bootstrap from the initial infidelity to an infidelity low enough for the bias boosting and bias busting stages to start working. Bootstrapping is done with unboosted rep code stages. The bases of the rep codes are chosen by brute forcing a sequence that gets all error rates below 0.1% in the fewest steps.
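The bootstrap portion can be reproduced by iterating the distillation formulas (a sketch; the stage bases X, Y, X, Y, Z, X are read off of Figure 4, and `oplus`/`star` encode Eqs. 3-8):

```python
def oplus(basis, u, v):
    (w1, x1, y1, z1), (w2, x2, y2, z2) = u, v
    if basis == 'X':
        return [w1*w2 + x1*x2, w1*x2 + x1*w2, y1*y2 + z1*z2, y1*z2 + z1*y2]
    if basis == 'Y':
        return [w1*w2 + y1*y2, x1*z2 + z1*x2, w1*y2 + y1*w2, x1*x2 + z1*z2]
    return [w1*w2 + z1*z2, x1*y2 + y1*x2, x1*x2 + y1*y2, w1*z2 + z1*w2]

def star(basis, u, v):
    (w1, x1, y1, z1), (w2, x2, y2, z2) = u, v
    a1 = {'X': w1 + x1, 'Y': w1 + y1, 'Z': w1 + z1}[basis]
    a2 = {'X': w2 + x2, 'Y': w2 + y2, 'Z': w2 + z2}[basis]
    return (a1 * (sum(v) - a2) + (sum(u) - a1) * a2) / (sum(u) * sum(v))

state = [1.0, 1/6, 1/6, 1/6]   # channel output: infidelity 1/3
discards = []
for basis in 'XYXYZX':         # the bootstrap sequence shown in Figure 4
    discards.append(star(basis, state, state))
    out = oplus(basis, state, state)
    state = [t / out[0] for t in out]

assert 0.34 < discards[0] < 0.36   # "discards 35%" for the first X stage
# Final bootstrap state matches Figure 4: [1, 6.0e-4, 1.2e-4, 4.7e-5]
assert abs(state[1] - 6.0e-4) < 5e-5
assert abs(state[2] - 1.2e-4) < 1e-5
assert abs(state[3] - 4.7e-5) < 5e-6
```

Six unboosted stages take every error term below 0.1%, at which point the bias boosting stages take over.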

Once the error rate is bootstrapped, the construction begins alternating between bias boosting stages and bias busting stages. The number of repetitions is customized during the first alternation, instead of using the loose bounds specified in previous sections, to get a faster takeoff. Later alternations turn an infidelity of at most $a$ into an infidelity of at most $\exp(-1/\sqrt[3]{a})$. This is not quite an exponential decrease, due to the cube root, but doing two alternations fixes this and guarantees a reduction to an infidelity below $\exp(-1/a)$.

Because each stage uses 1 qubit of storage, and a constant number of stages exponentiates the error rate, each additional exponentiation has $O(1)$ space cost. It takes $O(\log^{\ast}\frac{1}{\epsilon})$ exponentiations to reach an infidelity of $\epsilon$, and therefore the storage needed to reach $\epsilon$ is $O(\log^{\ast}\frac{1}{\epsilon})$.

The last detail to discuss is the time complexity. Because the discard rate at each stage is proportional to the infidelity at that stage, and the infidelity is decreasing so much from stage to stage, the sum of the discard rates across all stages converges to a constant as the number of stages limits to infinity. This avoids one way that time costs can become non-linear versus number of stages. That said, the input-to-output conversion efficiency is not quite asymptotically optimal. Each stage is individually achieving a complexity of $\Theta(\log_{\epsilon_{\text{in}}}\frac{1}{\epsilon_{\text{out}}})$, with a constant factor of input-to-output waste. As the number of stages increases, these waste factors compound. As a result, the time complexity is $\tilde{\Theta}(\log\frac{1}{\epsilon})$ instead of the optimal $\Theta(\log\frac{1}{\epsilon})$.

\xrightarrow{\text{in}}\begin{bmatrix}1\\ 1.7\cdot 10^{-1}\\ 1.7\cdot 10^{-1}\\ 1.7\cdot 10^{-1}\end{bmatrix}\xrightarrow[\text{discards}\ 35\%]{X}\begin{bmatrix}1\\ 3.2\cdot 10^{-1}\\ 5.4\cdot 10^{-2}\\ 5.4\cdot 10^{-2}\end{bmatrix}\xrightarrow[\text{discards}\ 39\%]{Y}\begin{bmatrix}1\\ 3.5\cdot 10^{-2}\\ 1.1\cdot 10^{-1}\\ 1.1\cdot 10^{-1}\end{bmatrix}\xrightarrow[\text{discards}\ 29\%]{X}\begin{bmatrix}1\\ 7.0\cdot 10^{-2}\\ 2.3\cdot 10^{-2}\\ 2.3\cdot 10^{-2}\end{bmatrix}
\xrightarrow[\text{discards}\ 16\%]{Y}\begin{bmatrix}1\\ 3.2\cdot 10^{-3}\\ 4.6\cdot 10^{-2}\\ 5.4\cdot 10^{-3}\end{bmatrix}\xrightarrow[\text{discards}\ 9\%]{Z}\begin{bmatrix}1\\ 3.0\cdot 10^{-4}\\ 2.2\cdot 10^{-3}\\ 1.1\cdot 10^{-2}\end{bmatrix}\xrightarrow[\text{discards}\ 3\%]{X}\begin{bmatrix}1\\ 6.0\cdot 10^{-4}\\ 1.2\cdot 10^{-4}\\ 4.7\cdot 10^{-5}\end{bmatrix}
\xrightarrow[\text{discards}\ 2\%]{\text{bias (24 Y boosts)}}\begin{bmatrix}1\\ 10.0\cdot 10^{-81}\\ 3.0\cdot 10^{-3}\\ 9.6\cdot 10^{-81}\end{bmatrix}\xrightarrow[\text{discards}\ 10\%]{\text{bust (15 XZ boosts)}}\begin{bmatrix}1\\ 3.0\cdot 10^{-83}\\ 9.9\cdot 10^{-79}\\ 9.6\cdot 10^{-81}\end{bmatrix}
\xrightarrow[\text{discards}\ <1\%]{\text{bias (}10^{27}\text{ Y boosts)}}\begin{bmatrix}1\\ 10^{-10^{28}}\\ 10^{-55}\\ 10^{-10^{28}}\end{bmatrix}\xrightarrow[\text{discards}\ <1\%]{\text{bust (}10^{27}\text{ XZ boosts)}}\begin{bmatrix}1\\ 10^{-10^{28}}\\ 10^{-10^{28}}\\ 10^{-10^{28}}\end{bmatrix}
Figure 4: An example 10-stage sequence that purifies entanglement with an infidelity of 1/3 into entanglement with an infidelity of $10^{-10000000000000000000000000000}$. Error rates and discard rates were computed using a python script, except for the last two stages. They underflowed standard floating point numbers and so had to be estimated based on bounds described elsewhere in the paper.

3 Conclusion

In this paper, I showed that surprisingly little storage is needed to purify entanglement, assuming storage and local operations and classical communication are noiseless. An infidelity of $\epsilon$ can be reached using $O(\log^{\ast}\frac{1}{\epsilon})$ storage and $\tilde{O}(\log\frac{1}{\epsilon})$ time. Although I focused on an asymptotic limit more interesting to theory than practice, I'm hopeful that the presented ideas will inspire practical constructions.

References

  • [Ben+93] Charles H. Bennett, Gilles Brassard, Claude Crépeau, Richard Jozsa, Asher Peres and William K. Wootters “Teleporting an unknown quantum state via dual classical and Einstein-Podolsky-Rosen channels” In Physical Review Letters 70.13 American Physical Society (APS), 1993, pp. 1895–1899 DOI: 10.1103/physrevlett.70.1895
  • [Ben+96] Charles H. Bennett, Gilles Brassard, Sandu Popescu, Benjamin Schumacher, John A. Smolin and William K. Wootters “Purification of Noisy Entanglement and Faithful Teleportation via Noisy Channels” In Physical Review Letters 76.5 American Physical Society (APS), 1996, pp. 722–725 DOI: 10.1103/physrevlett.76.722
  • [DB03] W. Dür and H.-J. Briegel “Entanglement Purification for Quantum Computation” In Physical Review Letters 90.6 American Physical Society (APS), 2003 DOI: 10.1103/physrevlett.90.067901
  • [Laf+96] Raymond Laflamme, Cesar Miquel, Juan Pablo Paz and Wojciech Hubert Zurek “Perfect Quantum Error Correction Code” arXiv, 1996 DOI: 10.48550/ARXIV.QUANT-PH/9602019
  • [PK21] Pavel Panteleev and Gleb Kalachev “Asymptotically Good Quantum and Locally Testable Classical LDPC Codes” arXiv, 2021 DOI: 10.48550/ARXIV.2111.03654
  • [Sho95] Peter W. Shor “Scheme for reducing decoherence in quantum computer memory” In Physical Review A 52.4 American Physical Society (APS), 1995, pp. R2493–R2496 DOI: 10.1103/physreva.52.r2493