
Time-adaptive phase estimation

Brennan de Neeve brennan.mn@proton.me Institute for Quantum Electronics, ETH Zürich, Otto-Stern-Weg 1, 8093 Zürich, Switzerland Quantum Center, ETH Zürich, 8093 Zürich, Switzerland    Andrey V. Lebedev Theoretische Physik, Wolfgang-Pauli-Strasse 27, ETH Zürich, CH-8093, Zürich, Switzerland Current affiliation: Dukhov Research Institute of Automatics (VNIIA), Moscow, 127030, Russia Current affiliation: Advanced Mesoscience and Nanotechnology Centre, Moscow Institute of Physics and Technology (MIPT), Dolgoprudny, 141700, Russia    Vlad Negnevitsky Institute for Quantum Electronics, ETH Zürich, Otto-Stern-Weg 1, 8093 Zürich, Switzerland    Jonathan P. Home jhome@phys.ethz.ch Institute for Quantum Electronics, ETH Zürich, Otto-Stern-Weg 1, 8093 Zürich, Switzerland Quantum Center, ETH Zürich, 8093 Zürich, Switzerland
Abstract

Phase estimation is known to be a robust method for single-qubit gate calibration in quantum computers [1], while Bayesian estimation is widely used in devising optimal methods for learning in quantum systems [2]. We present Bayesian phase estimation methods that adaptively choose a control phase and the time of coherent evolution based on prior phase knowledge. In the presence of noise, we find near-optimal performance with respect to known theoretical bounds, and demonstrate some robustness of the estimates to noise that is not accounted for in the model of the estimator, making the methods suitable for calibrating operations in quantum computers. We determine the utility of control parameter values using functions of the prior probability of the phase that quantify expected knowledge gain either in terms of expected narrowing of the posterior or expected information gain. In particular, we find that by maximising the rate of expected gain we obtain phase estimates having standard deviation a factor of $1.43$ larger than the Heisenberg limit using a classical sequential strategy. The methods provide optimal solutions accounting for available prior knowledge and experimental imperfections with minimal effort from the user. The effect of many types of noise can be specified in the model of the measurement probabilities, and the rate of knowledge gain can easily be adjusted to account for times included in the measurement sequence other than the coherent evolution leading to the unknown phase, such as times required for state preparation or readout.

I Introduction

Phase estimation has found an increasing number of applications in metrology and quantum computing in recent years. Although resources are considered differently in these two settings [3, 4, 5], the methods of each have led to new applications in the other. In metrology, classical averaging limits estimates to a phase uncertainty $\Delta\hat{\phi}$ scaling at best according to the standard quantum limit (SQL), $\Delta\hat{\phi}>1/\sqrt{N}$, with the number of resources $N$, while general strategies are fundamentally limited only by Heisenberg's uncertainty principle and can obtain a $\sqrt{N}$ improvement over the SQL, reaching the so-called Heisenberg limit (HL), $\Delta\hat{\phi}>\pi/N$ [6]. Entanglement was initially thought to be a key ingredient in schemes to reach Heisenberg scaling, $\Delta\hat{\phi}\propto 1/N$ [7], but when it was found that the same scaling could be reached using sequentially prepared unentangled systems [8, 9, 10, 11, 12, 13, 14] (this is not necessarily true when noise is considered [88]), ideas from quantum computing [16, 17] soon led to new experimentally accessible metrology procedures with this scaling [18]. On the other hand, there have been several more recent proposals to use metrology methods to calibrate operations for quantum computing [19, 20, 21, 22, 1].

Many metrology proposals make use of a Bayesian approach to estimation which provides a natural framework to describe adaptive procedures, where the settings for future experiments are modified based on the results of previous measurements [23, 24, 25, 26, 27, 18, 28, 29, 30, 31]. While adaptive methods can lead to better performance, they are generally more complex than non-adaptive strategies and can also be more difficult to implement in some experiments. Remarkably, Higgins et al. found that they could reach near-optimal Heisenberg scaling using a non-adaptive procedure by optimising the number of measurements performed with different coherent applications of the unknown phase [32, 33].

While initial proposals like that of Higgins et al. [32] considered ideal settings with pure states and unitary operations, the role of noise and experimental imperfections has been increasingly studied over time [34]. This development has been both in the understanding of the fundamental limits to precision [35, 36, 37, 38, 39, 40, 41, 42, 43], and in devising better strategies to cope with non-ideal conditions, where the Bayesian framework with adaptive measurements has proved useful [44, 45, 46, 47, 48, 49, 50]. In some cases, adaptive methods have been used to design strategies better suited to the physics of particular experiments. In recent developments for the sensing of magnetic fields with NV centres, adaptive procedures were used to account for reduced visibility measurements [51, 52, 53, 54, 55, 56, 57, 58] (readout can also be improved by adaptive methods [89, 90]). In many proposals the adaptive control acts on a phase that can be seen as an adjustment of the measurement basis, but the time of coherent interaction is chosen non-adaptively, e.g. as detailed in [32]. Since the interaction times proposed in [32] are optimised without noise, and thus may not be suited to experimental conditions, some experimenters perform optimisations using numerical simulations that include relevant noise in order to find interaction times better suited to their experiments [60, 61, 62, 53, 63]. Recently, Belliardo and Giovannetti [64] have analytically shown how to modify coherent interaction times in the non-adaptive procedure of Higgins et al. [32] to account for certain types of noise.
Others have investigated the possibility of also choosing the interaction time adaptively, or equivalently, in the case of some optical measurements, the size of a so-called N00N state [33] (an example where entanglement can be used to convert temporal resources into spatial resources [11]): initially in proposals without noise [27, 33], or with mixed-state quantum computation [28], and later in proposals using numerical algorithms that can also account for noise and imperfections [66, 67, 68, 57, 69].

Commonly the performance of a strategy is only considered in the asymptotic regime, when the number of resources (i.e. number of physical systems and total estimation time) approaches infinity. Many experiments, however, could benefit from strategies that are optimised for finite resources. Phase estimation in metrology is typically studied in two settings. In one setting, often referred to as local, the goal is to achieve optimal sensitivity to phase fluctuations from an a priori known phase $\phi=\phi_{0}$. In the other setting, referred to as global, it is assumed that there is initially no a priori knowledge of the phase [47]. While the amount of a priori knowledge of the phase is not relevant in the asymptotic regime, it can significantly change the optimal estimation strategies for finite estimation times [70]. Therefore, it is useful to have methods that can account for arbitrary a priori phase knowledge.

In addition, the coherent interaction time in many proposals is optimised under the assumption that this time dominates the experiment time. While this is true in the asymptotic regime, many experiments may not reach this regime for all or at least a significant portion of the estimation time [54]. An extreme example is found in GaAs quantum dots where the measurement time is several orders of magnitude longer than the coherent evolution [71]; here the authors show an exponential improvement in the mean-square error (MSE) of parameter estimation by adaptively choosing the time of coherent evolution.

In this paper we present time-adaptive phase estimation (TAPE), a method that allows for the adaptive optimisation of the coherent interaction time and a control phase to be adjusted to the resources of the experiment. TAPE can provide strategies optimised both when the experiment time is proportional to the time of coherent interaction, and when the experiment time is proportional to the number of measurements; the method also allows for any resource allocation in between these two extreme cases. In addition TAPE can provide optimal strategies for arbitrary a priori knowledge of the phase, because the choice of measurement settings depends only on the prior knowledge of the phase after the last measurement, and not on any record of previous measurements. In contrast to earlier works investigating similar adaptive procedures [27, 33], we use a more general form for the measurement probabilities so that we can account for many types of noise or imperfections.

We propose and analyse several different objective functions for the adaptive parameter selection that lead to near-optimal performance with respect to known theoretical bounds, with and without noise. When the experiment time is proportional to the time of coherent interaction we reach uncertainties in the phase estimates that are a factor of $1.43$ larger than the HL. In addition, we find that TAPE is quite robust to errors that are not accounted for in the model of the estimator. This demonstrates that, similarly to proposals like robust phase estimation [1], our methods are well-suited to calibration of single-qubit operations in the context of quantum computing.

TAPE uses a numerical representation of the phase knowledge as a Fourier series that can easily describe arbitrary prior knowledge, at the possible expense of increased computation. Since the complexity of the Fourier representation (i.e. number of coefficients to track in memory) increases with phase knowledge, the computation time required for adaptive control of experimental parameters may become too large for some practical applications. To tackle this limitation we propose a method that reduces the interval over which the Fourier representation of the phase is used. This can significantly reduce the required memory and time of computation.

II Bayesian estimation

The evolution and measurement for a single step $s$ of sequential phase estimation can be described by the quantum circuit in Fig. 1 [72]

[Circuit diagram: an ancilla qubit prepared in $|0\rangle$ undergoes a Hadamard $H$, then controls $U^{k}$ acting on the register in state $|\phi\rangle$; the ancilla then undergoes $R_{z}(-\alpha)$ and a second Hadamard before being measured.]

Figure 1: Quantum circuit for sequential phase estimation.

where $H$ is the Hadamard gate, $R_{z}(-\alpha)=\mathrm{e}^{i\alpha Z/2}$ with Pauli operator $Z$ is a z-rotation, and $\alpha\in[0,\pi]$ is a control phase to be optimised. $U|\phi\rangle=\mathrm{e}^{i\phi}|\phi\rangle$ with $|\phi\rangle\in\mathbb{C}^{2^{n}}$ an eigenstate of the unitary $U\in\mathbb{C}^{2^{n}\times 2^{n}}$ with unknown phase $\phi\in[0,2\pi]$. $k\in\mathbb{N}^{+}$ determines the number of applications of the unitary evolution. The parameters $\alpha$ and $k$ will be chosen adaptively for each measurement as detailed in section III. We denote the possible measurement outcomes at step $s$ by $\xi\in\{\pm 1\}$, and in order to describe a range of noise or measurement errors [54], we assume the outcome $\xi$ to occur with probability

P_{\xi}(\alpha,k\phi)=\frac{1}{2}\Big(1+\xi\big((1-\lambda_{k})+\lambda_{k}\zeta_{k}\cos\left(\alpha-k\phi\right)\big)\Big)\,, (1)

where $\lambda_{k},\zeta_{k}\in[0,1]$ allow respectively for the description of asymmetry and reduced contrast in the measurement (while $\lambda_{k}<1$ describes a reduced probability for the outcome $\xi=-1$, the reverse situation can always be described by relabelling the outcomes). The subscript $k$ of $\lambda_{k},\zeta_{k}$ indicates that these values can generally depend on $k$. Since the outcome $\xi$ and the control parameters $\alpha$ and $k$ depend on $s$, we should really denote them $\xi_{s},\alpha_{s},k_{s}$, but we omit the subscripts for simplicity of notation.
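The outcome model (1) is straightforward to code. The following sketch is our own illustration (function and parameter names are choices made here), with $\lambda_{k}$ and $\zeta_{k}$ passed as plain numbers:

```python
import numpy as np

def outcome_probability(xi, alpha, k, phi, lam=1.0, zeta=1.0):
    """Probability of outcome xi in {+1, -1} following Eq. (1):
    lam < 1 models an asymmetric readout, zeta < 1 a reduced contrast,
    and lam = zeta = 1 recovers the ideal noiseless measurement."""
    return 0.5 * (1.0 + xi * ((1.0 - lam)
                              + lam * zeta * np.cos(alpha - k * phi)))
```

The two outcome probabilities always sum to one, and in the noiseless case the outcome $\xi=+1$ is certain when $\alpha=k\phi$ (mod $2\pi$).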

As in [27, 33, 51] we use a Fourier series to write the probability density describing our knowledge of the phase $\phi$ at step $s$:

p_{s}(\phi)=\sum_{n=-\infty}^{\infty}c_{n}^{(s)}\mathrm{e}^{in\phi}\,. (2)

Given prior knowledge $p_{s-1}(\phi)$ and measurement outcome $\xi$, we update our state of knowledge using Bayes' theorem:

p_{s}(\phi|\xi;\alpha,k)
=\frac{P_{\xi}(\alpha,k\phi)\,p_{s-1}(\phi)}{\Pi_{\xi}(\alpha,k)}
\propto P_{\xi}(\alpha,k\phi)\,p_{s-1}(\phi)
=\sum_{n=-\infty}^{\infty}\Bigg[\frac{1}{2}\big(1+\xi(1-\lambda_{k})\big)c_{n}^{(s-1)}
+\xi\lambda_{k}\frac{\zeta_{k}}{4}\left(\mathrm{e}^{i\alpha}c_{n+k}^{(s-1)}+\mathrm{e}^{-i\alpha}c_{n-k}^{(s-1)}\right)\Bigg]\mathrm{e}^{in\phi}\,, (3)

where

\Pi_{\xi}(\alpha,k)=\int_{0}^{2\pi}\frac{d\phi}{2\pi}P_{\xi}(\alpha,k\phi)\,p_{s-1}(\phi)

is the probability of obtaining outcome $\xi$. Equation (3) specifies how to modify the coefficients $c_{n}^{(s-1)}\rightarrow c_{n}^{(s)}$ given the measurement outcome.
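As a concrete illustration of the update rule (3), the following Python sketch (our own, not part of the paper; the dictionary representation of the coefficients and the function name are choices made here) applies one Bayesian update. Because the prior satisfies $c_{0}^{(s-1)}=1$, the unnormalised $n=0$ coefficient of the output equals $\Pi_{\xi}(\alpha,k)$:

```python
import numpy as np

def bayes_update(c, xi, alpha, k, lam=1.0, zeta=1.0):
    """One update of Eq. (3) on Fourier coefficients c (dict n -> complex,
    with c[0] = 1 so the density is normalised).  Returns the normalised
    posterior coefficients and the probability of the observed outcome."""
    get = lambda n: c.get(n, 0.0 + 0.0j)
    # The posterior support grows by +/- k around the prior support.
    support = set(c) | {n + k for n in c} | {n - k for n in c}
    new = {n: 0.5 * (1.0 + xi * (1.0 - lam)) * get(n)
              + xi * lam * zeta / 4.0 * (np.exp(1j * alpha) * get(n + k)
                                         + np.exp(-1j * alpha) * get(n - k))
           for n in support}
    prob = new[0].real  # unnormalised c_0 equals Pi_xi since c[0] = 1
    return {n: v / prob for n, v in new.items()}, prob
```

For a uniform prior (only $c_{0}=1$) and a noiseless measurement with $k=1$ and $\alpha=0$, the update returns $\Pi_{\xi}=1/2$ and the posterior $1+\xi\cos\phi$, i.e. $c_{\pm 1}=\xi/2$.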

We choose to use the estimator

\hat{\phi}=\arg\int_{0}^{2\pi}\frac{d\phi}{2\pi}p_{s}(\phi)\mathrm{e}^{i\phi}=\arg\left(c_{-1}^{(s)}\right)\,. (4)

A nice feature of the Fourier series representation for $p_{s}(\phi)$ is that the estimator (4) depends only on one coefficient. Given an estimate $\hat{\phi}$ of $\phi$ (in general now, not necessarily given by (4)), we quantify the uncertainty in $\hat{\phi}$ by the square root of the Holevo variance [74]:

\Delta\hat{\phi}=\sqrt{V(\hat{\phi})}=\sqrt{S(\hat{\phi})^{-2}-1}\,, (5)

where $S(\hat{\phi})=|\langle\mathrm{e}^{i\hat{\phi}}\rangle|$ is the sharpness and the angular brackets indicate an average over the estimates $\hat{\phi}$. If the estimate is biased, one can use $S=\langle\cos(\hat{\phi}-\phi)\rangle$ instead, where $\phi$ is the value of the correct system phase [33]. In [26] the authors show that the estimator (4) minimises the Holevo variance (5).
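To make Eqs. (4) and (5) concrete, here is a minimal sketch (ours, under the assumption that the phase knowledge is stored as a dict mapping the Fourier index $n$ to the coefficient $c_{n}$):

```python
import numpy as np

def estimate(c):
    """Phase estimate of Eq. (4): the argument of the c_{-1} coefficient."""
    return float(np.angle(c[-1]))

def holevo_std(estimates):
    """Square root of the Holevo variance, Eq. (5), from a collection of
    estimates: S = |<exp(i phi_hat)>|, Delta = sqrt(S**-2 - 1)."""
    S = abs(np.mean(np.exp(1j * np.asarray(estimates))))
    # Clamp tiny negative values caused by floating-point rounding.
    return float(np.sqrt(max(S ** -2 - 1.0, 0.0)))
```

Identical estimates give zero Holevo variance, while a spread of estimates on the circle gives a larger value; unlike the ordinary variance, this measure handles the $2\pi$ periodicity of the phase correctly.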

III Adaptive procedures

In the previous section we have described how in general to use Bayes' theorem to update our knowledge of the phase $\phi$ given the measurement outcomes, and how to obtain an estimate $\hat{\phi}$ from the density $p_{s}(\phi)$ at step $s$. The task of achieving minimal uncertainty in the estimates obtained from (4) is then dependent on the choices of the values of the control phase $\alpha$ and the number $k$ of applications of $U$. In a sequential procedure, the state coherently evolves according to $U^{k}$ for a time proportional to $k$, so that the choice of $k$ corresponds to choosing the time of coherent evolution. In the following we will assume such a sequential procedure, but we note that the results can also be applied in the case of some parallel procedures by using entanglement.

Given prior knowledge $p_{s-1}(\phi)$ at step $s-1$, we look to choose the optimal control phase for a given value of $k$ by maximising a function of $p_{s-1}(\phi)$, $k$, and $\alpha$ that quantifies the expected knowledge gain from the next measurement. One might then expect that the optimal choice of $k$ can be determined by maximising the expected knowledge gain over all possible values of $k$. This can be a good choice when the evolution time of $U^{k}$ is negligible compared to other times in the experiment, as in the example of GaAs quantum dots mentioned above [71]. But in general, experiments with different values of $k$ require different resources in terms of execution time, so that the expected knowledge gains for different values of $k$ are not directly comparable (even more generally, which resources are valuable depends on the setting and what the experimenter wants to optimise: e.g. they may have a restricted number of qubits to measure, so that the number of measurements becomes the relevant resource. In this sense it is useful to have a method where the optimisation can be adjusted by the experimenter to best describe how they value their resources).

In [27, 33], the authors studied a method where the value of $k$ is chosen adaptively. In [27], $k$ corresponds to the number of photons in a N00N state; however, as shown for the noiseless case in [33], the sequential procedure we discuss here is mathematically equivalent. This equivalence also holds under certain types of noise [13, 14]. Physically these two procedures are different, but they are connected by the use of entanglement to convert between temporal and spatial resources [11].

In the case that the experiment time is proportional to $k$, the relevant resource is the number of applications of $U$. This way of considering resources, which is typically chosen in quantum metrology, is considered in [27, 33]. In these works the authors calculate the expected (differential) entropy of the posterior after the next measurement using a Gaussian approximation (the differential entropy is the entropy of a continuous random variable, but it lacks some important properties of the Shannon entropy for discrete random variables; see e.g. [91], chapter 8. In this manuscript we will usually write simply "entropy" when referring to the differential entropy). Motivated by the resource dependence on $k$, they choose the value of $k$ that minimises the expected entropy divided by $\ln(N)$. Here $N$ refers to the total number of resources used since the beginning of the estimation sequence, where an initially uniform prior is assumed (no prior knowledge of the phase). Since this method requires knowing $N$ assuming an initially uniform prior, and because the priors in the later stages of the estimation must be approximately Gaussian, it is not well suited to incorporating arbitrary prior knowledge. Moreover, the approach of dividing the entropy by $\ln(N)$ is based on scaling of the uncertainty in the phase that is only valid without noise.

Here we study two different methods for adaptively choosing $k$ that have the following features:

  1. The resource dependence on $k$ is adjustable so that it can be chosen to match a given experiment. Since we consider sequential procedures, we take the resource requirement for performing an experiment with a particular value of $k$ to be the time required to perform that experiment.

  2. The choice of $k$ for a given measurement is determined only by the prior probability density at that step of the estimation sequence.

  3. The methods can be applied to different functions quantifying the expected knowledge gain associated with a particular choice of $k$. We study two such possible functions possessing different benefits: the expected sharpness gain and the expected (differential) entropy gain.

In the following we show how to calculate these two functions quantifying expected knowledge gain exactly; with the exact expressions in hand these functions can be computed for any prior, not only Gaussian. The expected knowledge gains are calculated from expressions that can describe experiments with noise. This allows the methods to determine good phase estimation procedures for noisy experiments, as well as in the noise-free case.

III.1 Expected knowledge gain

Given prior knowledge $p_{s-1}(\phi)$ at step $s-1$, one might expect that in order to minimise the Holevo variance of the estimates $\hat{\phi}$ (equation (5)) at step $s$, a good strategy can be to choose the control phase $\alpha$ that minimises the expected Holevo variance of the posterior Bayesian probability density for the next measurement. However, it is shown in [26, 33] that in order to minimise the Holevo variance of the estimates $\hat{\phi}$ one should choose $\alpha$ to maximise the expected sharpness of the posterior Bayesian probability density for the next measurement

\sum_{\xi}\Pi_{\xi}(\alpha,k)\,S\left[p_{s}(\phi|\xi;\alpha,k)\right]\,,

where we denote the sharpness of $p_{s}(\phi)$ by

S\left[p_{s}(\phi)\right]=\left|\int_{0}^{2\pi}\frac{d\phi}{2\pi}p_{s}(\phi)\mathrm{e}^{i\phi}\right|=\left|c_{-1}^{(s)}\right|\,. (6)

Similarly to the estimate (4), a nice feature of the Fourier series representation for $p_{s}(\phi)$ is that the sharpness (6) depends only on a single Fourier coefficient. Maximising the expected sharpness of $p_{s}(\phi)$ for the next measurement is a good strategy since the sharpness of the estimates can be written as the average of $S\left[p_{s}(\phi)\right]$ over possible measurement records; maximising the sharpness of the estimates is equivalent to minimising the Holevo variance of the estimates [33]. We note that the average of $S\left[p_{s}(\phi)\right]$ over possible measurement records considered in [33] assumes a uniform prior at the beginning of the estimation sequence, and this strategy may be less optimal for other priors.

In order to evaluate the best strategy for the measurement at step $s$, we define the expected sharpness gain as

\Delta_{s}S(\alpha,k)\equiv\sum_{\xi}\Pi_{\xi}(\alpha,k)\Big(S\left[p_{s}(\phi|\xi;\alpha,k)\right]-S\left[p_{s-1}(\phi)\right]\Big)
=-\left|c_{-1}^{(s-1)}\right|+\sum_{\xi}\bigg|\frac{1}{2}\big(1+\xi(1-\lambda_{k})\big)c_{-1}^{(s-1)}
+\xi\lambda_{k}\frac{\zeta_{k}}{4}\left(\mathrm{e}^{i\alpha}c_{-1+k}^{(s-1)}+\mathrm{e}^{-i\alpha}c_{-1-k}^{(s-1)}\right)\bigg|\,, (7)

rather than working with the expected sharpness of the posterior directly. The reason for working with the gain will become clear later on when we consider gain rates. Similarly, we define the expected entropy gain for the probability density $p_{s}(\phi)$ at step $s$,

\Delta_{s}H(\alpha,k)\equiv\sum_{\xi}\Pi_{\xi}(\alpha,k)\Big(H[p_{s-1}(\phi)]-H[p_{s}(\phi|\xi;\alpha,k)]\Big)\,, (8)

where $H[p(\phi)]$ is the differential entropy of $p(\phi)$,

H[p(\phi)]=-\int_{0}^{2\pi}\frac{d\phi}{2\pi}p(\phi)\ln\left(\frac{p(\phi)}{2\pi}\right)\,.

We note that $\Delta_{s}H(\alpha,k)$ can be seen as the expected information gain from the next measurement. It can be rewritten as (see Supplemental Material, section S.V)

\Delta_{s}H(\alpha,k)=\sum_{\xi}\Pi_{\xi}(\alpha,k)\,D_{KL}[p_{s}\|p_{s-1}]\,,

where we denoted $p_{s-1}=p_{s-1}(\phi)$ and $p_{s}=p_{s}(\phi|\xi;\alpha,k)$ for short, and $D_{KL}[p_{s}\|p_{s-1}]$ is the Kullback-Leibler divergence of the posterior from the prior [77, 78]. In ref. [50] the authors show that maximising expected entropy gain maximises the likelihood of estimating the true phase. In terms of the coefficients of the prior, this can be computed by an expression of the form

\Delta_{s}H(\alpha,k)=f(\lambda_{k},\zeta_{k})-\sum_{\xi}\Pi_{\xi}(\alpha,k)\ln\left(\Pi_{\xi}(\alpha,k)\right)
+\sum_{m=1}^{\infty}\Big(A_{m}\,\mathfrak{Re}\left\{\mathrm{e}^{i2m\alpha}c_{2mk}^{(s-1)}\right\}
+B_{m}\,\mathfrak{Re}\left\{\mathrm{e}^{i(2m-1)\alpha}c_{(2m-1)k}^{(s-1)}\right\}\Big)\,. (9)

The $m$-dependent coefficients $A_{m}$ and $B_{m}$ are generally also functions of $\lambda_{k}$ and $\zeta_{k}$. Since our knowledge $p(\phi)$ after a finite number of measurements is always described by a finite Fourier series, the sum over $m$ is always finite. The derivation of this expression as well as the exact expressions for $f$, $A_{m}$, and $B_{m}$ are given in the Supplemental Material, section S.V.
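As a sketch of how these two gains can be evaluated in practice (our own illustration, assuming the prior is stored as a dict mapping $n$ to $c_{n}$ with $c_{0}=1$; since the exact coefficients of Eq. (9) are given only in the Supplemental Material, the entropy gain below is instead evaluated numerically on a phase grid):

```python
import numpy as np

def expected_sharpness_gain(c, alpha, k, lam=1.0, zeta=1.0):
    """Expected sharpness gain of Eq. (7) from prior coefficients c."""
    get = lambda n: c.get(n, 0.0 + 0.0j)
    gain = -abs(get(-1))
    for xi in (+1, -1):
        gain += abs(0.5 * (1.0 + xi * (1.0 - lam)) * get(-1)
                    + xi * lam * zeta / 4.0
                      * (np.exp(1j * alpha) * get(-1 + k)
                         + np.exp(-1j * alpha) * get(-1 - k)))
    return gain

def expected_entropy_gain(c, alpha, k, lam=1.0, zeta=1.0, m=4096):
    """Expected entropy gain of Eq. (8), evaluated on a phase grid rather
    than via the closed form (9); additive ln(2*pi) terms cancel in the
    prior-minus-posterior difference, so they are dropped here."""
    phi = np.linspace(0.0, 2.0 * np.pi, m, endpoint=False)
    prior = np.real(sum(v * np.exp(1j * n * phi) for n, v in c.items()))
    ent = lambda p: -np.mean(p * np.log(np.maximum(p, 1e-300)))
    gain = 0.0
    for xi in (+1, -1):
        like = 0.5 * (1.0 + xi * ((1.0 - lam)
                                  + lam * zeta * np.cos(alpha - k * phi)))
        Pi = np.mean(like * prior)
        gain += Pi * (ent(prior) - ent(like * prior / Pi))
    return gain
```

For a uniform prior and a noiseless measurement, the sharpness gain is $1/2$ for $k=1$ and zero for $k>1$ (the posterior splits into $k$ peaks), while the entropy gain is $1-\ln 2\approx 0.31$ for every $k$, consistent with the contrast between the two gain functions discussed in the text.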

Before discussing in detail how to use the expected gains to choose $k$, we discuss the choice of the control phase $\alpha$ and compare the benefits of using sharpness versus entropy to quantify expected knowledge gain. For both methods described in section III.2 below, computing the expected gain for a given value of $k$ always involves an optimisation of the control phase $\alpha$ for that particular $k$-value. An example prior is shown in figure 2, and the values of the expected entropy and sharpness gains for the next measurement are plotted for $k\leq 5$. The expected entropy gain increases significantly from $k=1$ to $k=2$, while the increase for larger $k$-values is smaller. The expected sharpness gain also increases significantly from $k=1$ to $k=2$ but, contrary to the expected entropy gain, decreases for $k>2$ and becomes zero for $k>4$. This is because as $k$ increases, it is increasingly likely that the posterior will have multiple peaks. When $k>4$ the measurement leads to a posterior density that is split into multiple peaks with an envelope similar to the prior density, so that the sharpness is unchanged. Conversely, the entropy is still reduced when the probability density is split into multiple peaks, so that the expected entropy gain is large for all values of $k$.

Figure 2: An example prior (top) is plotted along with the expected entropy gain $\Delta H(\alpha,k)$ (middle) and expected sharpness gain $\Delta S(\alpha,k)$ (bottom) as a function of $\alpha$ for $k\leq 5$. $\Delta H(\alpha,k)$ and $\Delta S(\alpha,k)$ have a period of $\pi$ in $\alpha$ (if we see $\alpha$ as a change of the measurement basis, then $\alpha=\pi$ corresponds to the change $|0\rangle\rightarrow|1\rangle$, $|1\rangle\rightarrow|0\rangle$). The expected entropy gain increases to just over $0.3$ for large $k$, while the expected sharpness gain increases from $k=1$ to $2$ and then decreases, reaching zero for $k>4$. The gains for $k>1$ do not align with $p(\phi)$ since the control phase $\alpha$ is applied once per measurement, whereas the unknown phase $\phi$ is applied $k$ times.

These differences between the entropy and the sharpness make them useful for different purposes. Maximising the expected sharpness gain is useful to ensure that the probability density $p(\phi)$ has a single peak, which is important for obtaining an accurate estimate of the phase. On the other hand, the entropy allows one to quantify good strategies when there are multiple peaks in $p(\phi)$. In figure 3 we plot a prior with four peaks along with the expected entropy and sharpness gains to illustrate these differences. We see that the maximum entropy gain corresponds to a measurement with $k=2$ that will, with high probability, eliminate two of the peaks in $p(\phi)$. In contrast, the expected sharpness gain for $k=2$ is zero because a density with either four or two equally spaced peaks has zero sharpness, meaning that the strategy of eliminating two peaks is not quantified by the sharpness. The expected sharpness gain in this case is non-zero only for odd values of $k$, since these lead to a narrowing of the envelope of $p(\phi)$. The entropy quantifies strategies that sharpen the overall envelope ($k=1,3$), the strategy of eliminating two peaks ($k=2$), and the strategy of sharpening the individual peaks ($k=4$).

Figure 3: An example prior with four equally-spaced peaks (top) is plotted along with the expected entropy gain $\Delta H(\alpha,k)$ (middle) and expected sharpness gain $\Delta S(\alpha,k)$ (bottom) as a function of $\alpha$ for $k\leq 4$. $\Delta H(\alpha,k)$ quantifies three different strategies: sharpening the envelope of $p(\phi)$ ($k=1,3$), eliminating two peaks ($k=2$), and sharpening individual peaks ($k=4$). $\Delta S(\alpha,k)$ quantifies only strategies that sharpen the envelope ($k=1,3$), and $\Delta S(\alpha,k)=0$ for $k=2,4$. $\Delta S(\alpha,k)$ is independent of $\alpha$ because the envelope of the prior is flat.

III.2 Choosing $k$

We now describe the two methods for choosing $k$. As noted above, if experiments with different values of $k$ require different amounts of resources, comparing the expected knowledge gain for different $k$ does not directly provide a means for choosing $k$.

Let $\mathbf{K}=\{k_{1},k_{2},\dots,k_{L}\}$ be a vector of possible $k$-values for the next measurements, and let $\mathbf{T}=\{t_{1},t_{2},\dots,t_{L}\}$ be the vector of corresponding times; i.e. $t_{n}$ is the time required to perform an experiment with $k=k_{n}$. The methods for choosing $k$ take the vectors $\mathbf{K}$ and $\mathbf{T}$ as input. The value of the expected gain used by these methods is the maximum over possible values of $\alpha\in[0,\pi]$.

III.2.1 Multi-step gain method

The first method we study for the adaptive choice of $k$ is based on the following idea: if we compute the expected gain for more than one measurement, we can compare the expected knowledge gain for different sequences of measurements that take the same total time. Comparing many possible $k$-values in this way will generally lead to an expensive optimisation, since there can be many measurement sequences that take the same total time, and because the complexity of computing the expected knowledge gain grows exponentially with the number of measurements (see also the Supplemental Material, section S.III). Therefore we devise a technique which allows us to compare only a few $k$-values at a time.

The following example illustrates how this method proceeds. Suppose the amount of resources per $k$-value is proportional to $k$ ($\mathbf{T}\propto\mathbf{K}$) and we restrict the choice of $k$ to the set $\mathbf{K}=\{2^{n-1},\,n=1,2,\ldots,L\}$. We can then compute the expected knowledge gain for performing two measurements with $k=1$ versus the gain for performing one measurement with $k=2$. If the former is greater, we perform an experiment with $k=1$. Otherwise, we compute the expected knowledge gain for performing two measurements with $k=2$ versus that for performing one measurement with $k=4$. If the former is greater, we perform an experiment with $k=2$, and so on. Note that calculating the expected gain for multiple measurements requires optimising one control phase $\alpha$ for each measurement. This optimisation is performed sequentially, starting with the first measurement. A detailed description is given in the Supplemental Material, section S.III.

In this example we see that we need only compute the expected knowledge gain for one or two measurements and for only a few possible sequences of measurements. We expect this to converge to a locally optimal choice of $k$. This method for choosing $k$ can be generalised to other possible input vectors $\mathbf{K}$ and $\mathbf{T}$ using an algorithm which is given in the Supplemental Material, section S.III. It works with vectors $\mathbf{K}$ and $\mathbf{T}$ sorted in increasing order, under the assumption that $k_{j}>k_{i}$ implies $t_{j}\geq t_{i}$, and restricted to $t_{i}/t_{i-1}<32/7$. In this general algorithm we restrict the computation of expected knowledge gain to at most 5 measurements, to prevent the computation from becoming too expensive. To allow for general inputs $\mathbf{K}$ and $\mathbf{T}$ under this restriction, we in general compare sequences that require only approximately the same time. The detailed method is described by algorithm S.5 in the Supplemental Material, section S.III.
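The doubling comparison can be sketched as follows. This is an illustrative, grid-based implementation of our own with noiseless measurements and greedy per-step phase optimisation; it is not the paper's algorithm S.5, and the grid sizes and function names are choices made here:

```python
import numpy as np

PHI = np.linspace(0.0, 2.0 * np.pi, 1024, endpoint=False)  # phase grid
ALPHAS = np.linspace(0.0, np.pi, 16, endpoint=False)       # coarse alpha grid

def like(xi, alpha, k):
    # Noiseless outcome probability, Eq. (1) with lam = zeta = 1.
    return 0.5 * (1.0 + xi * np.cos(alpha - k * PHI))

def sharp(p):
    # Sharpness |c_{-1}| of a gridded density p (normalised to mean 1).
    return abs(np.mean(p * np.exp(1j * PHI)))

def one_step_gain(p, k):
    """Best expected sharpness gain of a single measurement with this k."""
    best = -np.inf
    for a in ALPHAS:
        g = -sharp(p)
        for xi in (+1, -1):
            Pi = np.mean(like(xi, a, k) * p)
            g += Pi * sharp(like(xi, a, k) * p / Pi)
        best = max(best, g)
    return best

def two_step_gain(p, k):
    """Greedy expected gain of two successive measurements with this k:
    optimise alpha for the first step, then add the best expected gain
    of a second step on each measurement branch."""
    best = -np.inf
    for a in ALPHAS:
        g = -sharp(p)
        for xi in (+1, -1):
            Pi = np.mean(like(xi, a, k) * p)
            post = like(xi, a, k) * p / Pi
            g += Pi * (sharp(post) + one_step_gain(post, k))
        best = max(best, g)
    return best

def choose_k_doubling(p, k):
    """With T proportional to K, two measurements at k cost the same time
    as one at 2k: keep k if the two-step gain at k wins, else move to 2k."""
    return k if two_step_gain(p, k) > one_step_gain(p, 2 * k) else 2 * k
```

For a uniform prior the comparison keeps $k=1$, since a single $k=2$ measurement leaves a two-peaked posterior with zero sharpness.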

III.2.2 Gain rate method

The second method we study for choosing $k$ uses the rate of expected knowledge gain. A similar idea has been used recently to estimate decoherence timescales in a qubit [79]. We compute a vector of expected knowledge gains $\mathbf{G}=\{g_{1},g_{2},\dots,g_{L}\}$, where $g_{n}$ is the expected knowledge gain for performing a single measurement with $k=k_{n}$, computed using either (7) or (9) depending on the choice of gain function. We then calculate the vector of expected rates of knowledge gain $\mathbf{R}=\mathbf{G}/\mathbf{T}$ (elementwise division) and choose $k=k_{n^{*}}$ with $n^{*}=\arg\max_{n}\mathbf{R}$. The value of $n^{*}$ may be found by a brute-force search over $k$-values. Since our computation runs in series, a brute-force search can take too long for many practical implementations, so we reduce the computation by performing a Fibonacci search [80] to find a good value of $k$. Since each element of $\mathbf{R}$ can be computed independently, a brute-force search could instead be performed by computing the gain rate for each $k$-value in parallel; as the brute-force search is expected to perform slightly better than the Fibonacci search used here, it could be considered for future implementations. Overall, the gain rate method has the advantage of being simpler and computationally cheaper than the multi-step method.
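The two search variants can be sketched as below. The brute-force version follows the definition $n^{*}=\arg\max_{n}\mathbf{R}$ directly; `unimodal_argmax` is a ternary-search stand-in for the Fibonacci search of ref. [80] (both exploit an assumed unimodality of the gain rate in $k$, and both use $O(\log L)$ gain evaluations).

```python
def argmax_rate(G, T):
    """Brute-force search: index n* maximising the elementwise rate R = G/T."""
    R = [g / t for g, t in zip(G, T)]
    return max(range(len(R)), key=R.__getitem__)

def unimodal_argmax(f, lo, hi):
    """Maximiser of a unimodal f on the integers [lo, hi] using ternary
    search -- a simple stand-in for the Fibonacci search used in the text."""
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if f(m1) < f(m2):
            lo = m1 + 1   # maximum lies in (m1, hi]
        else:
            hi = m2       # maximum lies in [lo, m2]
    return max(range(lo, hi + 1), key=f)
```

Here `f` would be the expected gain rate evaluated at a candidate $k$; the serial variant only evaluates `f` where the search probes, while `argmax_rate` assumes all rates were computed (e.g. in parallel).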

IV Numerical representations

The representation of phase knowledge using the Fourier series (2) means that the number of non-zero coefficients $c_{n}$ to keep track of increases with our knowledge of the phase. If the number of non-zero coefficients at step $s$ is $\Gamma$, the computation for the Bayesian update (3) requires calculating $\Gamma+k$ new coefficients. If we start with a uniform probability density, we initially have $c_{0}=1$ and $c_{n}=0$ for all $n\neq 0$ in (2). After performing some measurements with $N$ applications of $U$ in total, the last update will require calculating $\Gamma=N$ non-zero coefficients.
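The growth of $\Gamma$ by $k$ per update can be made concrete with a sketch of the Bayesian update on the coefficient array. The likelihood form used here, $P(d|\phi)=[1+(-1)^{d}\zeta_{k}\cos(k\phi+\alpha)]/2$, is our assumed reading of equation (1) (conventions for signs and the contrast $\zeta_k$ may differ from the paper's); multiplying the series by it shifts coefficient indices by $\pm k$.

```python
import numpy as np

def bayes_update(c, k, alpha, d, zeta=1.0):
    """Bayesian update of Fourier coefficients.

    c[i] holds c_n for n = i - G, with G = (len(c) - 1) // 2, normalised so
    that c_0 = 1 (a uniform prior is c = [1]).  Assumed outcome likelihood:
        P(d|phi) = [1 + (-1)**d * zeta * cos(k*phi + alpha)] / 2,  d in {0, 1}.
    The product with the series shifts indices by +-k, so Gamma grows by k."""
    G = (len(c) - 1) // 2
    new = np.zeros(len(c) + 2 * k, dtype=complex)
    w = 0.25 * (-1) ** d * zeta * np.exp(1j * alpha)
    new[k:k + len(c)] += 0.5 * c          # constant term of P(d|phi)
    new[2 * k:2 * k + len(c)] += w * c    # e^{+i(k*phi + alpha)} term
    new[:len(c)] += np.conj(w) * c        # e^{-i(k*phi + alpha)} term
    return new / new[G + k].real          # renormalise so that c_0 = 1
```

For example, updating the uniform prior with $k=1$, $\alpha=0$, outcome $d=0$ gives coefficients $(1/2,\,1,\,1/2)$ for $n=-1,0,1$, i.e. the posterior $p(\phi)\propto 1+\cos\phi$, as expected from $P(0|\phi)=(1+\cos\phi)/2$.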

If we are in a situation which results in Heisenberg scaling, then $N\propto 1/\Delta\hat{\phi}$. If the scaling of $\Delta\hat{\phi}$ is slower than Heisenberg scaling, it follows from (3) that the number of coefficients $\Gamma$ after reaching a given value of $\Delta\hat{\phi}$ will be greater than for Heisenberg scaling. However, if we assume a probability density $p(\phi)$ which on average preserves its functional form as estimation proceeds and $\Delta\hat{\phi}$ decreases (i.e. the overall shape stays the same and only the width changes), then standard Fourier analysis implies that $\Gamma$ should scale as $1/\Delta\hat{\phi}$. This follows from the fact that we expect the square root of the Holevo variance of the Bayesian probability density, $\sqrt{V[p(\phi)]}$, to scale in the same way as $\Delta\hat{\phi}$ with $N$. This suggests that when we converge more slowly than Heisenberg scaling, the description of $p(\phi)$ obtained from (3) uses disproportionately many non-zero coefficients and becomes inefficient. In this case we should seek a good representation of $p(\phi)$ with fewer coefficients to minimise computational overhead. In this sense we can expect that the number of coefficients can be made inversely proportional to the uncertainty, $\Gamma\propto 1/\Delta\hat{\phi}$, even without Heisenberg scaling.

The computation for adaptively choosing $\alpha$ and $k$ scales more favourably with $\Gamma$ than the Bayesian update. Computing the expected sharpness gain (7) does not scale with the total number of coefficients $\Gamma$, while the expected entropy gain (9) scales as $\Gamma/k$. Since we expect the optimal value of $k$ to increase with $N$ (in the noise-free case), there may exist algorithms for choosing $k$, with complexity independent of $N$, that typically maximise the expected entropy gain.

The practicality of using TAPE depends on whether the Bayesian update and the choice of control parameters ($\alpha$ and $k$) can be computed in a time shorter than or equal to the time it takes to run the next measurement. Since the update computation time scales linearly with $\Gamma$, it will become longer than the measurement time for sufficiently large $\Gamma$. In the asymptotic limit $N\rightarrow\infty$, the average measurement time, proportional to $k$ for sequential procedures, will also scale linearly with $N$ if we have Heisenberg scaling. If the slope of the linear increase in computation time is less than that of the measurement time, the computation will remain shorter than the measurement time on average for all $N$. However, this may not be the case for many experiments where TAPE could otherwise be useful, for example if decoherence prevents large-$k$ measurements from giving phase information, or if TAPE is used for parallel procedures. In these cases it is beneficial to find methods that reduce the computation time. One approach is to change representation from a truncated Fourier series to another function, such as a Gaussian, which is described by a number of parameters independent of $N$ [81]. Here we propose an alternative method that allows the truncated Fourier series representation to be used for arbitrarily large $N$ while ensuring that the number of coefficients needed in the series scales much more slowly than $1/\Delta\hat{\phi}$. This way all the procedures of sections II and III can still be used for large $N$ with minimal changes, and the user is free to trade off the generality of the description of $p(\phi)$ against computational overhead as suits their purposes.

Given a probability density $p(\phi)$, we define a contraction of $p(\phi)$ as a triple $(M,\phi_{0},q(\theta))$ such that $p(\phi_{0}+\theta/M)\approx q(\theta)$ for $0\leq\theta<2\pi$ and $p(\phi)\approx 0$ otherwise. We refer to $M\in\mathbb{N}^{+}$ as the magnification and to $\phi_{0}\in[0,2\pi)$ as the offset, and we represent $q(\theta)$ as a truncated Fourier series. Thus, by representing the density $p(\phi)$ as a contraction we use the truncated Fourier series representation for $\phi_{0}\leq\phi<\phi_{0}+2\pi/M$ and assume that $p(\phi)=0$ for values of $\phi$ outside this interval. We are effectively “zooming in” on a region around the expected value of $\phi$ and assuming that the probability of $\phi$ outside this region is negligible. Given a contraction $(M,\phi_{0},q(\theta))$ where $q(\theta)$ has coefficients $c_{n}$, $n=-\Gamma,\dots,\Gamma$, we can calculate a new contraction $(M^{\prime},\phi_{0}^{\prime},q^{\prime}(\theta^{\prime}))$, with $M^{\prime}$ a multiple of $M$ such that $m=M^{\prime}/M\in\mathbb{N}^{+}$, as

\begin{align}
\phi_{0}^{\prime} &= \phi_{0}+\hat{\theta}/M-\pi/M^{\prime}\,, \tag{10}\\
c^{\prime}_{n} &= \mathrm{e}^{i(\pi-\arg(c_{-m}))}\,c_{m\times n}\,,\quad \forall n=-\Gamma/m,\dots,\Gamma/m\,, \tag{11}
\end{align}

where $\hat{\theta}=\arg(c_{-1})$.

By using the truncated Fourier series on a reduced interval of size $2\pi/M$, a contraction $(M,\phi_{0},q(\theta))$ of $p(\phi)$ (with $\Gamma$ coefficients) uses only $\Gamma/M$ coefficients. The assumption that $p(\phi)=0$ outside the interval $[\phi_{0},\phi_{0}+2\pi/M)$ is often false with finite probability. However, by performing more measurements this probability can always be reduced arbitrarily, at the cost of increasing the number of coefficients $\Gamma$. Once this probability is small enough for the given application, we can make contractions to ensure that the number of coefficients no longer exceeds $\Gamma$. Thus, with this method we must balance the probability that a contraction fails to represent the estimated phase against the maximum computation time per measurement.
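A minimal sketch of the contraction step of eqs. (10)-(11) follows; the array layout (index $i$ holding $c_n$ with $n=i-\Gamma$) and the modular reduction of the offset are our assumptions, and only every $m$-th coefficient is kept, as eq. (11) states.

```python
import numpy as np

def contract(c, M, phi0, m):
    """Contraction (M, phi0, q) -> (M', phi0', q') per eqs. (10)-(11):
    zoom in by a factor m, keeping every m-th Fourier coefficient.
    c[i] holds c_n for n = i - G, with G = (len(c) - 1) // 2."""
    G = (len(c) - 1) // 2
    Mp = m * M
    theta_hat = np.angle(c[G - 1])                 # theta_hat = arg(c_{-1})
    phi0p = (phi0 + theta_hat / M - np.pi / Mp) % (2 * np.pi)   # eq. (10)
    Gp = G // m
    idx = G + m * np.arange(-Gp, Gp + 1)           # keep c_{m*n} only
    cp = np.exp(1j * (np.pi - np.angle(c[G - m]))) * c[idx]     # eq. (11)
    return Mp, phi0p, cp
```

The coefficient count drops from $2\Gamma+1$ to $2\lfloor\Gamma/m\rfloor+1$, while the global phase factor in eq. (11) re-centres the contracted density on the new interval.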

In the Supplemental Material, section S.IV, we show that under the assumption of a Gaussian probability density $p(\phi)$, using contractions reduces the number of Fourier coefficients required by $\mathcal{O}(\sqrt{\log(\log(\Gamma))})$, while keeping the probability that $p(\phi)\neq 0$ outside the interval $[\phi_{0},\phi_{0}+2\pi/M)$ below a constant value (which can be chosen arbitrarily small). In general, the computational complexity will depend on the particular functional form of $p(\phi)$, but the Gaussian case demonstrates that the method can sometimes greatly reduce computation. If a much poorer complexity is found when faster computation is needed, it could be helpful to modify the adaptive strategy for choosing $\alpha$ and $k$ such that $p(\phi)$ becomes closer to a Gaussian before each contraction is performed.

The expected sharpness or entropy gain relations (7) and (9) can be used for the contracted Fourier series representation $q(\theta)$ with the replacement $\phi\rightarrow\theta$, although the values of the expected sharpness gain in that case will not correspond to those for the full density $p(\phi)$ (although the differential entropy is not invariant under a change of scale, the differential entropy gain is). However, the control parameters that achieve the maximum gain for the contraction can still be used to determine the values for the next measurement. Denoting the optimal control phase thus obtained by $\beta$ and the optimal number of applications of $U$ by $j$, the corresponding values for the un-contracted density $p(\phi)$ are $k=Mj$ and $\alpha=\beta+k\phi_{0}$. In this way we can use contractions to reduce the time not only of the update computation but also of the search for optimal parameters $\alpha$ and $k$ using the expected entropy gain (this also works when using the sharpness gain, but without a speedup).

V Simulation results

In order to characterise the performance of the methods considered above we study the results of numerical simulations. We focus on estimation starting from a uniform prior (initially no information about the phase), though we note that our methods are also well suited to any prior that can be well represented with a Fourier series and the contractions described in section IV, given available computational resources and the particular timing requirements of an experiment. We first consider the case of noise-free quantum metrology. In this context the time of the measurement is assumed to be proportional to the number of applications $k$ of the unitary evolution $U$; in the following we set the time of $U$ to 1, so that the time of a single measurement is equal to $k$. In this case, the number of resources $N$ in the SQL, $\Delta\hat{\phi}>1/\sqrt{N}$, and the HL, $\Delta\hat{\phi}>\pi/N$, is the total number of applications of $U$ in the estimation procedure, and is equal to the total estimation time.

In figure 4 we plot the uncertainty in the phase estimates resulting from repeating the estimation procedure with randomly chosen values for the system phase $\phi$ (the performance of estimation may depend on the system phase when starting from a uniform prior if the control phase for the first measurement is fixed, e.g. $\alpha=0$; by choosing a random control phase $\alpha\in[0,\pi]$ for the first measurement, the performance is the same for any value of the system phase, so our results can also be interpreted as the performance for any particular value of the system phase assuming a random control phase is used in the first measurement). More specifically, we plot the uncertainty multiplied by the total estimation time, for which any phase estimation reaching HS gives a constant value independent of estimation time; for the HL, this constant is $\pi$. The uncertainty in the phase estimates $\Delta\hat{\phi}$ is calculated according to equation (5), where the sharpness is calculated as $S(\hat{\phi})=\langle\cos(\hat{\phi}-\phi)\rangle$ for a true system phase $\phi$, and the expectation value is approximated as an average over simulations. This choice can only lead to larger uncertainty values than the usual sharpness, $|\langle\mathrm{e}^{i(\hat{\phi}-\phi)}\rangle|$, since $\langle\cos(\hat{\phi}-\phi)\rangle=\mathfrak{Re}\{\langle\mathrm{e}^{i(\hat{\phi}-\phi)}\rangle\}$. For the results in figure 4 we find $\mathfrak{Im}\{\langle\mathrm{e}^{i(\hat{\phi}-\phi)}\rangle\}\ll\mathfrak{Re}\{\langle\mathrm{e}^{i(\hat{\phi}-\phi)}\rangle\}$, which shows that the estimator is unbiased and that either choice of sharpness measure leads to essentially the same uncertainty values. Nevertheless, in all simulation results presented in this manuscript we have calculated the phase uncertainty $\Delta\hat{\phi}$ using $S(\hat{\phi})=\langle\cos(\hat{\phi}-\phi)\rangle$.
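As a short sketch of this uncertainty estimate from repeated runs: equation (5) is not reproduced in this section, so the relation $\Delta\hat{\phi}=\sqrt{S^{-2}-1}$ used below is the standard Holevo form we assume it takes.

```python
import numpy as np

def phase_uncertainty(estimates, phi_true):
    """Uncertainty from repeated estimates via the sharpness
    S = <cos(phi_hat - phi)>, assuming the Holevo-type relation
    Delta phi_hat = sqrt(S**-2 - 1) for eq. (5)."""
    S = np.mean(np.cos(np.asarray(estimates) - phi_true))
    return np.sqrt(S ** -2 - 1.0)
```

Perfectly concentrated estimates give $S=1$ and hence zero uncertainty; any spread reduces $S$ below 1 and yields a positive $\Delta\hat{\phi}$.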

For the multi-step (gain rate) method, each plotted point is $(\Delta\hat{\phi})\times(\text{total estimation time})$ calculated from $6\times 10^{4}$ ($3\times 10^{6}$) realisations of estimation. We compare four adaptive methods for the choice of control parameters $\alpha$ and $k$. For the multi-step method we plot separately the results of maximising either the expected sharpness gain (7) or the expected entropy gain (9) for total estimation times $2^{m},\,m=1,\dots,12$. For the gain rate method we plot the results of maximising the expected sharpness gain (7) to determine the choices of $\alpha$ and $k$; we find that maximising the expected entropy gain (9) performs worse (not shown), which is due to the properties of entropy mentioned in section III.1. We also note that when maximising the expected entropy gain only, we tend to obtain very accurate estimates with a small number of outliers that significantly increase the phase uncertainty.

Instead of studying further the gain rate method maximising only the expected entropy gain, we plot the result of a hybrid strategy in which the choices of $\alpha$ and $k$ are determined using the expected entropy gain for at most the first half of the total estimation time, with the expected sharpness gain used for the remaining time. This approach is motivated by the fact that although maximising one's information about the phase is desirable, obtaining a precise estimate of the phase additionally requires a narrow probability density $p_{s}(\phi)$. Since multi-peak density functions can have the same entropy as single-peak functions with a broader peak, maximising the entropy does not necessarily lead to a narrow density. We choose the hybrid method to study the performance of strategies that maximise the information gain while still ensuring the narrow density $p_{s}(\phi)$ required for accurate phase estimation.

As shown by Gutiérrez-Rubio et al. [50], maximising the entropy gain leads to the maximum likelihood of estimating the correct system phase. Another way to understand the drawback of maximising only the entropy gain is that maximising the likelihood of estimating the system phase $\phi$ does not generally minimise $\Delta\hat{\phi}$, which depends on the entire distribution of phase estimates, call it $q(\hat{\phi})$, rather than on the value $q(\hat{\phi}=\phi)$ at a single point. Note that if one could maximise the expected gain over an entire measurement sequence used for estimation, maximising the expected sharpness gain would minimise $\Delta\hat{\phi}$ [33]. Since we perform the optimisation only for the next measurement, it becomes interesting to study the hybrid method.

The two plotted methods based on the gain rate are for total estimation times $2^{m},\,m=1,\dots,18$, and all methods make use of contractions (section IV) to reduce computation times: if the square root of the Holevo variance of the phase density, $\sqrt{V[p(\phi)]}$, is less than $\pi/(2^{13}M)$, where $M$ is the current magnification, then we perform at least one measurement using parameters $\alpha,k$ that maximise the expected sharpness gain, and then perform a contraction with $m=2$. The additional measurement maximising the expected sharpness gain is used in the hybrid method and in the multi-step method maximising entropy to reduce the probability of the system phase lying outside the reduced interval over which the Fourier series representation of the phase knowledge is used in the contraction. Since $k$-values are chosen adaptively, a sequence of optimal choices does not generally lead to the total times we choose to plot in figure 4. For the plotted simulations we therefore always restrict the optimisation over $k$ to values less than or equal to the total remaining time. For this reason we have performed the simulations for each plotted point independently of the other points for different total times.

Figure 4: Metrology scaling without noise. The time of all experiments in an estimation sequence is equal to the total number of applications of the unitary $U$. The phase uncertainties $\Delta\hat{\phi}$ are calculated from $6\times 10^{4}$ ($3\times 10^{6}$) realisations of estimation for the multi-step (gain rate) methods with system phase $\phi$ chosen uniformly at random. Error bars (typically too small to see) are calculated by error propagation from sample standard deviations of the estimation errors. Legend: multi-step method maximising expected sharpness gain ($\circ$), multi-step method maximising expected entropy gain ($\bullet$), gain rate method maximising expected sharpness gain ($+$), hybrid gain rate method ($\square$).

The results in figure 4 show that all methods perform better than the SQL. The gain rate method using the sharpness gain and the hybrid method perform best, and we fit their results for total estimation times from $2^{12}$ to $2^{18}$ with the functional form $\Delta\hat{\phi}=A\pi N^{-\gamma}$. The values of $A$ and $\gamma$ obtained from the fits are summarised in table 1. The hybrid method clearly reaches HS ($\gamma=1$), while the method maximising sharpness only is extremely close to HS, and both methods are close to the theoretically lowest achievable uncertainty for the values simulated. In particular, the uncertainty for a total time of $2^{18}$ (i.e. the longest time simulated) is a factor of $1.422\pm 0.015$ from the HL when maximising the sharpness gain only. For the same total time the hybrid method reaches an uncertainty within a factor of $1.429\pm 0.011$ of the HL.

Method       $A$                $\gamma$
sharpening   $1.371\pm 0.028$   $0.998\pm 0.002$
hybrid       $1.430\pm 0.036$   $1.000\pm 0.002$
Table 1: Values of the fitted parameters $A$ and $\gamma$ using the fit function $\Delta\hat{\phi}=A\pi N^{-\gamma}$ and the uncertainties obtained for estimation times from $2^{12}$ to $2^{18}$ for the two gain rate methods plotted in figure 4.

We attribute the poorer performance of the multi-step method to the fact that the compared gains require only approximately equal time and to the fact that only a local optimum is used (see Supplemental Material, section S.III). Although we expect the performance could be improved by starting the search at a $k$-value dependent on the Holevo variance of the prior (cheap to compute) and additionally using a “search down” method analogous to algorithm S.4 (see Supplemental Material, section S.III), we choose rather to focus on the gain rate method, since it demonstrates near-optimal performance in addition to being simpler and computationally more efficient.

In the Supplemental Material, section S.I, we have also compared the multi-step and gain rate methods in the case where we allow only the subset of $k$-values $k\in\{2^{n}\},\,n=1,2,3,\dots$. In this case the gains compared by the multi-step method require exactly the same time, and the method performs similarly to the gain rate method. However, even in this situation the multi-step method at best performs similarly to the hybrid method using the gain rate. For completeness, we have also simulated the gain rate methods studied here when a brute-force search for $k$ is used rather than a Fibonacci search. In that case the hybrid method performs best, reaching an uncertainty within a factor of $1.394\pm 0.010$ of the HL. A summary of simulation results for the best-performing methods we have studied is given in table S.1 in the Supplemental Material, section S.I.

In figure 5 we consider the performance of the methods using the gain rate when the measurement time is independent of $k$. This situation describes the limit where the time required to perform the unitary evolution $U$ is negligible compared to other times in the experiment, such as state preparation and readout. In this case the total time is equal to the number of measurements. We plot methods based on maximising either the sharpness gain or the entropy gain as well as a hybrid method, as in figure 4. In addition we compare the case where the gain maximisation is based on the correct rate in this context, i.e. $\mathbf{R}=\mathbf{G}/\mathbf{T}$ with $\mathbf{T}=\{1,1,\dots,1\}$ (so $\mathbf{R}=\mathbf{G}$), with the usual case in metrology, $\mathbf{T}\propto\mathbf{K}$. We plot the mean error resulting from $12\times 10^{6}$ realisations of estimation from 1 to 40 measurements.

Figure 5: Scaling without noise when the experiment time is equal to the number of measurements (independent of how many applications of $U$ are used). Plotted are the phase uncertainties $\Delta\hat{\phi}$ calculated from $12\times 10^{6}$ realisations of estimation with system phase $\phi$ chosen uniformly at random. Knowledge gains optimised with the rate from metrology scaling ($\mathbf{T}\propto\mathbf{K}$): expected sharpness gain ($\diamond$), expected entropy gain ($\circ$), hybrid method ($\star$). Knowledge gains optimised with the correct rate ($\mathbf{T}=\{1,1,\dots,1\}$): expected sharpness gain ($+$), expected entropy gain ($\bullet$), hybrid method ($\square$). Inset: closer view of results for 32 to 40 measurements.

The results in figure 5 show that the strategies using the correct rate ($\mathbf{R}=\mathbf{G}$) outperform those optimised for the usual metrology setting with $\mathbf{T}\propto\mathbf{K}$. In this case we find that the method maximising the expected entropy gain performs best, while that maximising the expected sharpness gain gives the lowest accuracy in phase estimation; the hybrid method has a performance in between the two. We fit the results from $t=10$ to $40$ measurements with an exponential decay $\Delta\hat{\phi}=A\exp(-\kappa t)$. The values of the fitted parameters $A$ and $\kappa$ for each of the plotted curves in figure 5 are listed in table 2.

Method           $A$                  $\kappa$
sharpening (M)   $0.7659\pm 0.0109$   $0.13164\pm 0.00102$
hybrid (M)       $0.7903\pm 0.0139$   $0.13536\pm 0.00127$
entropy (M)      $0.7984\pm 0.0119$   $0.14074\pm 0.00103$
sharpening       $0.8015\pm 0.0031$   $0.14149\pm 0.00028$
hybrid           $0.7739\pm 0.0018$   $0.14424\pm 0.00017$
entropy          $0.7829\pm 0.0027$   $0.14856\pm 0.00026$
Table 2: Values of the fitted parameters $A$ and $\kappa$ for the fit function $\Delta\hat{\phi}=A\exp(-\kappa t)$ for each of the methods plotted in figure 5. (M) indicates that the rate used for optimisation is the one usually considered in quantum metrology, $\mathbf{T}\propto\mathbf{K}$.

Although the performance is better when the correct rate is used, the improvement is not dramatic. This is perhaps not surprising, since the quantum phase estimation algorithm (QPEA) [85, 72] is known to be optimal even when the number of applications of $U$ is not the relevant resource (indeed this is also the case in the setting of quantum computation) [3, 4]. Methods based on the QPEA also lead to Heisenberg scaling in the context of quantum metrology [33]; that is, estimation procedures using a similar allocation of applications of $U$ per measurement have been shown to be optimal in both settings. Nevertheless, we see that in the case of the single-step optimisation we perform for sequential strategies, a slight improvement can be obtained by using the appropriate rate in the optimisation. In this setting we attribute the better performance when maximising the information gain rather than the sharpness gain to the fact that higher $k$-values are cheap compared to the metrology setting, allowing strategies that are better quantified by the richer nature of the entropy.

For the remaining simulations we return to the metrology setting where $\mathbf{T}\propto\mathbf{K}$, since in this context we can compare performance with known theoretical bounds. In particular, we study the performance in the presence of noise.

In figure 6 we model a system with dephasing. We assume perfectly prepared $|+\rangle\equiv(|0\rangle+|1\rangle)/\sqrt{2}$ states and $U=\mathrm{e}^{-iZ\phi/2}$, with $Z$ the Pauli-z operator. After each application of $U$ we additionally apply a dephasing channel described by the Kraus operators $K_{0}=\sqrt{\frac{1+\eta}{2}}\openone$ and $K_{1}=\sqrt{\frac{1-\eta}{2}}Z$. At the end we perform the measurement in the x-basis. In this case Heisenberg scaling is possible only initially, while increasing $k$ is beneficial; as $k$ is increased, dephasing eventually reduces the information available from the measurement and we are restricted to $1/\sqrt{N}$ scaling. The ultimate bound on precision can then be expressed as $c/\sqrt{N}$, where $c$ is a prefactor depending on the noise channel. In the case of dephasing this prefactor has been shown to be at least $\sqrt{1-\eta^{2}}/\eta$ (though it is not known whether this bound is tight) [39]. In figure 6 we consider the case $\eta=0.995$; the bound $c/\sqrt{N}$ is also plotted. As before, we simulate the gain rate method for the sharpness gain, the entropy gain, and the hybrid case described above. We compare these three methods when no decoherence is accounted for in the estimator model ($\zeta_{k}=1$ in equation (1)) and when the decoherence rate is already known ($\zeta_{k}=\mathrm{e}^{-(1-\eta)k}$). We plot the phase uncertainties calculated from $32\times 10^{4}$ realisations of the estimation starting from a uniform prior, with the system phase chosen uniformly at random.
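This dephasing model is small enough to simulate directly as a density matrix, which makes the origin of the contrast factor explicit: each dephasing channel multiplies the qubit coherence by $\eta$, so after $k$ rounds the outcome probability takes the form $(1+\eta^{k}\cos(k\phi-\alpha))/2$, consistent with $\zeta_{k}=\mathrm{e}^{-(1-\eta)k}\approx\eta^{k}$ for $\eta$ close to 1. The analysis phase $\alpha$ entering via the measurement basis is our convention and may differ in sign from the paper's eq. (1).

```python
import numpy as np

def measure_prob_dephasing(phi, k, alpha, eta):
    """Figure-6 model: |+> state, k rounds of U = exp(-i Z phi / 2) each
    followed by the dephasing channel {K0, K1}, then a measurement along
    the x-basis state (|0> + e^{i alpha}|1>)/sqrt(2)."""
    Z = np.diag([1.0, -1.0]).astype(complex)
    U = np.diag([np.exp(-0.5j * phi), np.exp(0.5j * phi)])
    K0 = np.sqrt((1 + eta) / 2) * np.eye(2, dtype=complex)
    K1 = np.sqrt((1 - eta) / 2) * Z
    rho = 0.5 * np.ones((2, 2), dtype=complex)     # |+><+|
    for _ in range(k):
        rho = U @ rho @ U.conj().T
        rho = K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T
    v = np.array([1.0, np.exp(1j * alpha)]) / np.sqrt(2)
    return float(np.real(v.conj() @ rho @ v))
```

For $\eta=1$ this reduces to the ideal fringe $(1+\cos(k\phi-\alpha))/2$, while $\eta=0$ destroys all coherence and returns $1/2$ regardless of $\phi$.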

Figure 6: Metrology scaling in the presence of decoherence. Plotted are the phase uncertainties $\Delta\hat{\phi}$ calculated from $32\times 10^{4}$ realisations of estimation with system phase $\phi$ chosen uniformly at random. HL: Heisenberg limit. SQL: standard quantum limit. QA: lower bound on achievable quantum advantage given by $c/\sqrt{N}$, where $c=\sqrt{1-\eta^{2}}/\eta$, $\eta=0.995$, and $N$ is the total time. Estimation with decoherence included in the estimator model, i.e. $\zeta_{k}=\mathrm{e}^{-(1-\eta)k}\,\forall k$ (without, i.e. $\zeta_{k}=1\,\forall k$): sharpness: $\diamond$ ($+$), entropy: $\circ$ ($\bullet$), hybrid: $\star$ ($\square$). $\diamond$ and $\star$ mostly overlap since the corresponding methods perform similarly.

When decoherence is not accounted for in the estimator model, maximising the entropy leads to the poorest performance, since this chooses larger $k$-values for which the system decoheres, leading to reduced information in the measurement. For all methods that do not account for decoherence in the model, estimation works initially, but as $k$ is increased, decoherence eventually introduces large random errors in the estimates. When decoherence is accounted for in the model we find that all methods perform well. When sufficiently many resources are used, the method that maximises the rate of entropy gain performs similarly to the other methods. This can be explained by the fact that decoherence eventually limits the $k$-values chosen by the strategy, and the density $p_{s}(\phi)$ is far less likely to have multiple peaks. We find that when the decoherence is accounted for, the performance after a total estimation time of $2^{15}$ is within a factor of $1.6715\pm 0.0021$ of the theoretical bound when maximising the expected entropy gain, $1.6807\pm 0.0063$ when maximising the expected sharpness gain, and $1.6734\pm 0.0021$ for the hybrid method. Our methods could also be combined with an optimised estimation of decoherence timescales, as proposed by Arshad et al. [79].

The results in figure 6 demonstrate some robustness of the estimation methods to errors not accounted for in the model. In figure 7 we present the results of a last set of simulations to examine further the robustness of the methods. In this case we assume perfectly prepared $|0\rangle$ states and $U=\mathrm{e}^{iY\phi/2}$. After the $k$ applications of $U$ we apply a bit-flip channel with Kraus operators $K_{0}=\sqrt{1-p_{b}}\openone$ and $K_{1}=\sqrt{p_{b}}X$, with $X,Y$ the Pauli-x,y operators, followed by spontaneous emission with Kraus operators

K_{0}=\begin{pmatrix}1&0\\ 0&\sqrt{1-p_{s}}\end{pmatrix},\quad K_{1}=\begin{pmatrix}0&\sqrt{p_{s}}\\ 0&0\end{pmatrix}\,.

We set $p_{b}=p/2$ and $p_{s}=p$, and simulate the hybrid gain rate method with $p=0.1,0.2,0.3$. For each value of $p$ we simulate estimation using an estimator model with errors included ($\lambda_{k}=\zeta_{k}=1-p\,\forall k$) and without ($\lambda_{k}=\zeta_{k}=1\,\forall k$).
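The figure-7 noise model can likewise be sketched as a short density-matrix simulation. The function and its structure are our illustration of the channels defined above (Pauli rotation convention $\mathrm{e}^{iY\phi/2}$, bit-flip, then spontaneous emission); the final measurement basis is assumed to be the computational basis.

```python
import numpy as np

def apply_channel(rho, kraus):
    """Apply a Kraus-operator channel: rho -> sum_i K_i rho K_i^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus)

def figure7_probability(phi, k, p):
    """Figure-7 model: |0> state, k applications of U = exp(i Y phi / 2),
    then a bit-flip channel (pb = p/2) and spontaneous emission (ps = p).
    Returns the probability of measuring |0>."""
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    c, s = np.cos(phi / 2), np.sin(phi / 2)
    U = np.array([[c, s], [-s, c]], dtype=complex)   # exp(i Y phi / 2)
    rho = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|
    for _ in range(k):
        rho = U @ rho @ U.conj().T
    pb, ps = p / 2, p
    rho = apply_channel(rho, [np.sqrt(1 - pb) * np.eye(2), np.sqrt(pb) * X])
    rho = apply_channel(rho, [np.array([[1, 0], [0, np.sqrt(1 - ps)]]),
                              np.array([[0, np.sqrt(ps)], [0, 0]])])
    return float(np.real(rho[0, 0]))
```

For $p=0$ both channels are the identity and the outcome probability is the ideal $\cos^{2}(k\phi/2)$; for $p=1$ spontaneous emission pumps everything into $|0\rangle$, so the measurement carries no phase information at all, which illustrates why large unknown errors destroy the estimates in figure 7.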

Figure 7: Metrology scaling with bit-flips and spontaneous emission. Each point is the phase uncertainty $\Delta\hat{\phi}$ calculated from $2\times 10^{6}$ realisations of estimation with system phase $\phi$ chosen uniformly at random. The hybrid method described in the main text is used for all simulations plotted here. Without noise included in the model of the estimator, i.e. $\lambda_{k}=\zeta_{k}=1\,\forall k$: $p=0.3$ ($\square$), $p=0.2$ ($\circ$), $p=0.1$ ($\bullet$). With noise included in the estimator model, i.e. $\lambda_{k}=\zeta_{k}=1-p\,\forall k$: $p=0.3$ ($\diamond$), $p=0.2$ ($+$), $p=0.1$ ($\star$).

We see that when errors are included in the model the phase uncertainty is significantly larger than in the noise-free case, but we still retain better-than-classical scaling for longer estimation times. We fit the phase uncertainty with the functional form $\Delta\hat{\phi}=A\pi N^{-\gamma}$, with fit parameters $A$ and $\gamma$. By fitting subsets of points we see that $\gamma$ tends to increase for longer times, but we would need to simulate further to see whether HS is eventually reached. When errors are not included in the model, only the $p=0.1$ case performs similarly to the SQL initially, while for larger errors the performance drops significantly. We see also that for longer estimation times the lack of errors in the model can prevent further reduction of the phase uncertainty. This suggests that it may be better to restrict to lower $k$-values in the presence of large unknown errors. Overall, these results demonstrate robustness of the method to large errors, indicating that TAPE can also be useful for calibration of quantum systems, e.g. for quantum computing, as considered in ref. [1].

In the noise-free case we find average computation times of $\sim 1\,\mathrm{ms}$ for the computation required for a single shot using a standard PC (CPU model: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz) without any parallel computation. When noise is included in the model we find that the maximisation of expected knowledge gain can take up to $\sim 3\,\mathrm{ms}$ for the entropy gain, and $\sim 1\,\mathrm{ms}$ when maximising the expected sharpness gain. This makes the method suitable for trapped-ion or neutral-atom quantum computing systems, where the time of a single shot is typically a few milliseconds. Since the computation of the expected knowledge gain for each $k$-value could be performed in parallel, it may be possible to obtain sufficient speedups for some applications in superconducting circuits or NV centres. Further speedups may also be possible when maximising the expected sharpness gain if the optimal value of $\alpha$ were determined analytically. An alternative approach to achieve faster computation with the implementation we use here is to restrict measurements to a subset of possible $k$-values; a further discussion can be found in the Supplemental Material, section S.I.

For experimental systems with sufficiently low measurement rates, TAPE thus provides a near-optimal and very flexible method for phase estimation. In the metrology setting without noise, the best performance we know of for phase estimation using a classical sequential strategy is the adaptive method of Higgins et al., inspired by QPEA from quantum computing [18, 4]. Their adaptive method reaches a phase uncertainty (as quantified by the square root of the Holevo variance of the estimates) a factor of $1.56$ larger than the HL, while they later devised a non-adaptive method, also inspired by QPEA, that performs similarly, demonstrating uncertainty less than a factor of $2.03$ larger than the HL [32, 33]. The latter non-adaptive method has also been shown to be a robust method for calibrating single-qubit gates for quantum computation [1]. Non-classical strategies for phase estimation reaching lower-uncertainty estimates are known, e.g. the method of Pezzè and Smerzi [86, 87], which provides phase estimates with uncertainty a factor of $1.27$ larger than the HL. However, classical sequential strategies are best suited to single-qubit gate calibration. In the metrology setting we have found that TAPE reaches an uncertainty a factor of $1.43$ larger than the HL using the hybrid method, demonstrating performance similar to or better than known methods. The results presented in figures 6 and 7 also show significant robustness to errors, demonstrating that TAPE can be a good method for calibrating single-qubit operations.

In contrast to algorithms inspired by QPEA, where a predetermined number of measurements is performed for values of $k\in\{2^{n}\},\,n=1,2,3,\dots$, TAPE chooses the time of phase evolution (i.e. the $k$-value) adaptively. For the non-adaptive method that is used for robust phase estimation [1, 32], the number of measurements for each $k$-value is optimised beforehand for a given total number of phase applications $N=\sum_{s}k_{s}$, assuming no prior knowledge of the phase. If noise is present the optimisation needs to be modified as shown in [64]. But as noted therein, the noise models they consider “are to be thought more as toy models” that “capture some of the key features of those scenarios”. Depending on the type of noise present in a given experiment, a further analysis and optimisation of the number of measurements to perform with each $k$-value will be required to minimise the phase uncertainty. By using the general form for the measurement probabilities (1), TAPE allows the description of a wide range of noise to be included in the model of the estimator, and directly provides near-optimal phase estimation procedures by accounting for the modelled noise in the optimisation of the control parameters $\alpha$ and $k$.

In addition, TAPE allows the exact experiment times to be easily included in the optimisation for the adaptive choice of $k$-value. This is very convenient for experiments where state preparation and readout times cannot be neglected in comparison to the time required to apply the unknown phase. With the QPEA-inspired methods above, a further optimisation of the number of measurements to perform with each $k$-value would be needed to minimise the phase uncertainty.
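A minimal sketch of this point: if the expected gain for a shot is divided by the full shot duration rather than by $k$ alone, fixed state-preparation and readout overheads shift the optimum towards larger $k$. The gain function here is a toy placeholder, not the paper's expected entropy or sharpness gain:

```python
import numpy as np

def gain_rate(gain, k, t_phase=1.0, t_prep=0.0, t_read=0.0):
    """Expected knowledge gain per unit wall-clock time for setting k.
    Including overheads only changes the denominator from k * t_phase
    to the full duration of a shot."""
    return gain(k) / (t_prep + k * t_phase + t_read)

toy_gain = lambda k: np.log2(1 + k)  # toy: diminishing returns in k
ks = np.arange(1, 1025)

best_free = ks[np.argmax([gain_rate(toy_gain, k) for k in ks])]
best_cost = ks[np.argmax([gain_rate(toy_gain, k, t_prep=50.0, t_read=50.0)
                          for k in ks])]
# With zero overhead the rate log2(1+k)/k is maximised at k = 1; a large
# fixed per-shot overhead pushes the optimum to much larger k.
```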

In the QPEA inspired methods above, choosing the value of $k$ to use for a particular step in the estimation sequence requires knowing how many measurements have been performed so far with each $k$-value. In TAPE, once the model of the noise is specified (by setting the values $\lambda_{k},\,\zeta_{k}\,\forall k$), and the experiment times required for each $k$-value are set, the values of the control parameters $\alpha$ and $k$ are determined using only the current prior knowledge density $p(\phi)$. This makes it easy to apply TAPE in situations where some prior knowledge may be available. As an example, suppose we would like to use phase estimation for calibrating single-qubit operations on a quantum computer where internal parameters of the device can drift over time. One could use either TAPE or a QPEA inspired method to initially estimate parameters. However, it would be easy to include the drift rate in the model of the phase knowledge $p(\phi)$, e.g. as a broadening of the probability density function over time. Then one could use TAPE to perform a minimal number of measurements to keep track of the parameters needed for single-qubit operations over time.
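As a sketch of how such drift could enter the model: if the periodic phase density is stored as a Fourier series (the implementation uses a series representation of $p(\phi)$, though its exact convention may differ from the one assumed here), convolving with a wrapped Gaussian of width $\sigma$ amounts to damping the coefficients:

```python
import numpy as np

def broaden_prior(c, sigma):
    """Broaden a periodic phase density by Gaussian drift of std sigma.

    For p(phi) = sum_n c[n] exp(i n phi) with n = -N..N, convolution with
    a wrapped Gaussian multiplies each coefficient by exp(-n**2 sigma**2 / 2),
    leaving the n = 0 coefficient (normalisation) untouched.
    """
    c = np.asarray(c, dtype=complex)
    N = len(c) // 2
    n = np.arange(-N, N + 1)
    return c * np.exp(-0.5 * (n * sigma) ** 2)

# A delta-like density has all coefficients equal; drift suppresses the
# high harmonics while preserving normalisation.
N = 8
c = np.full(2 * N + 1, 1.0 / (2 * np.pi))
cb = broaden_prior(c, 0.2)
```

Applied once per elapsed drift interval before the next Bayesian update, this keeps the prior honest about parameters that wander between calibration shots.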

Other proposals for phase estimation, such as [27, 67, 31] and some discussed in [25], are more similar to TAPE in that they choose the value of $k$ adaptively. However, they do not demonstrate better performance in terms of the uncertainty of phase estimates, or flexibility in terms of using potentially available prior information or accounting for experimental resources and imperfections.

VI Conclusion

Between the different forms of TAPE compared, we find that choosing the control parameters for the phase $\alpha$ and number of unknown phase applications $k$ based on the rate of knowledge gain gives near-optimal performance in several different settings, while requiring computation times that make it accessible to many experiments. In the context of noise-free quantum metrology we reach uncertainties in phase estimates within a factor of $1.43$ of the HL using a hybrid method maximising the rate of expected entropy and sharpness gains. In addition, we have found uncertainties within a factor of $1.39$ of the HL using the hybrid gain rate method with a brute-force search over the measurement setting $k$, rather than a Fibonacci search (Supplemental Material, section S.I, table S.1). Performing the computations for each $k$-value in parallel would allow this to be done in times comparable to those we find for performing the computation in series with the Fibonacci search method, or even faster.
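The trade-off between the brute-force and Fibonacci searches is between evaluation count and the risk of missing the global maximum when the gain is not exactly unimodal in $k$. As a simplified stand-in for the Fibonacci search used in the implementation, a ternary search on integers has the same logarithmic evaluation count for a unimodal objective:

```python
def search_max(f, lo, hi):
    """Return an integer maximiser of a unimodal f on [lo, hi] using
    O(log(hi - lo)) evaluations. (The Fibonacci search of the
    implementation has the same complexity with fewer evaluations;
    this ternary variant is just simpler to sketch.)"""
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if f(m1) < f(m2):
            lo = m1 + 1  # maximum lies strictly right of m1
        else:
            hi = m2      # maximum lies at or left of m2
    return max(range(lo, hi + 1), key=f)

# Example: a unimodal objective peaked at k = 37.
k_star = search_max(lambda k: -(k - 37) ** 2, 1, 1024)
```

If the objective has several local maxima, a series of such searches over sub-intervals (as done in the implementation) or a brute-force scan is needed to guarantee the global optimum.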

In a setting where experiment times are proportional to the number of measurements rather than to the number of unknown phase applications $k$, we find that maximising only the information gain leads to the best performance, while the hybrid strategy performs only slightly worse. The method is also able to find optimal strategies in the presence of different types of noise, and demonstrates significant robustness to errors. Combined with the fact that the optimisation can be easily tuned to the real times of experiments as a function of $k$ and can be used with arbitrary prior information, TAPE thus provides an extremely versatile phase estimation method that can directly give optimal performance in a wide range of experimental settings.

Code availability

The core implementation of TAPE used for all simulations in this work is available at:

The gain rate method is provided therein. The multi-step method was implemented in python using the methods from the core implementation. The code used to generate the figures and values simulated in this work is available from B.N. upon reasonable request.

Author contributions

Initial theory for phase estimation in the noise-free case for $k=1$, including derivation of an expression for the expected entropy gain, was done by A.V.L., and the resulting adaptive method was implemented by V.N. for a trapped-ion experiment in the group of J.P.H. to perform adaptive Ramsey measurements. B.N. had the idea to choose different $k$-values adaptively using the same formalism, and worked out the theory in the general case allowing for a range of noise. B.N. devised the gain rate and multi-step methods and chose to study the sharpness as a measure of knowledge in addition to the entropy. The manuscript was written by B.N. with input from all the authors.

Acknowledgements.
We acknowledge support from the Swiss National Science Foundation (SNF) under Grant No. 200020_179147, and from Intelligence Advanced Research Projects Activity (IARPA), via the US Army Research Office grant W911NF-16-1-0070. B.N. thanks Ivan Rojkov for helpful feedback on the manuscript.

References

  • Russo et al. [2021] A. E. Russo, W. M. Kirby, K. M. Rudinger, A. D. Baczewski, and S. Kimmel, Consistency testing for robust phase estimation, Phys. Rev. A 103, 042609 (2021).
  • Gebhart et al. [2023] V. Gebhart, R. Santagati, A. A. Gentile, E. M. Gauger, D. Craig, N. Ares, L. Banchi, F. Marquardt, L. Pezzè, and C. Bonato, Learning quantum systems, Nature Reviews Physics 5, 141 (2023).
  • van Dam et al. [2007] W. van Dam, G. M. D’Ariano, A. Ekert, C. Macchiavello, and M. Mosca, Optimal quantum circuits for general phase estimation, Phys. Rev. Lett. 98, 090501 (2007).
  • Wiseman et al. [2009] H. M. Wiseman, D. W. Berry, S. D. Bartlett, B. L. Higgins, and G. J. Pryde, Adaptive measurements in the optical quantum information laboratory, IEEE Journal of Selected Topics in Quantum Electronics 15, 1661 (2009).
  • Kaftal and Demkowicz-Dobrzański [2014] T. Kaftal and R. Demkowicz-Dobrzański, Usefulness of an enhanced kitaev phase-estimation algorithm in quantum metrology and computation, Phys. Rev. A 90, 062313 (2014).
  • Górecki et al. [2020] W. Górecki, R. Demkowicz-Dobrzański, H. M. Wiseman, and D. W. Berry, $\pi$-corrected heisenberg limit, Phys. Rev. Lett. 124, 030501 (2020).
  • Giovannetti et al. [2004] V. Giovannetti, S. Lloyd, and L. Maccone, Quantum-enhanced measurements: Beating the standard quantum limit, Science 306, 1330 (2004), https://www.science.org/doi/pdf/10.1126/science.1104149.
  • Luis [2002] A. Luis, Phase-shift amplification for precision measurements without nonclassical states, Phys. Rev. A 65, 025802 (2002).
  • Rudolph and Grover [2003] T. Rudolph and L. Grover, Quantum communication complexity of establishing a shared reference frame, Phys. Rev. Lett. 91, 217905 (2003).
  • de Burgh and Bartlett [2005] M. de Burgh and S. D. Bartlett, Quantum methods for clock synchronization: Beating the standard quantum limit without entanglement, Phys. Rev. A 72, 042301 (2005).
  • Giovannetti et al. [2006] V. Giovannetti, S. Lloyd, and L. Maccone, Quantum metrology, Phys. Rev. Lett. 96, 010401 (2006).
  • O’Loan [2009] C. J. O’Loan, Iterative phase estimation, Journal of Physics A: Mathematical and Theoretical 43, 015301 (2009).
  • Boixo and Heunen [2012] S. Boixo and C. Heunen, Entangled and sequential quantum protocols with dephasing, Phys. Rev. Lett. 108, 120402 (2012).
  • Maccone [2013] L. Maccone, Intuitive reason for the usefulness of entanglement in quantum metrology, Phys. Rev. A 88, 042109 (2013).
  • Note [1] This is not necessarily true when noise is considered [88].
  • Kitaev [1995] A. Y. Kitaev, Quantum measurements and the abelian stabilizer problem, Electron. Colloquium Comput. Complex. TR96 (1995).
  • Griffiths and Niu [1996] R. B. Griffiths and C.-S. Niu, Semiclassical fourier transform for quantum computation, Phys. Rev. Lett. 76, 3228 (1996).
  • Higgins et al. [2007] B. L. Higgins, D. W. Berry, S. D. Bartlett, H. M. Wiseman, and G. J. Pryde, Entanglement-free heisenberg-limited phase estimation, Nature 450, 393 (2007).
  • Teklu et al. [2009] B. Teklu, S. Olivares, and M. G. A. Paris, Bayesian estimation of one-parameter qubit gates, Journal of Physics B: Atomic, Molecular and Optical Physics 42, 035502 (2009).
  • Brivio et al. [2010] D. Brivio, S. Cialdi, S. Vezzoli, B. T. Gebrehiwot, M. G. Genoni, S. Olivares, and M. G. A. Paris, Experimental estimation of one-parameter qubit gates in the presence of phase diffusion, Phys. Rev. A 81, 012305 (2010).
  • Kimmel et al. [2015] S. Kimmel, G. H. Low, and T. J. Yoder, Robust calibration of a universal single-qubit gate set via robust phase estimation, Physical Review A 92, 062315 (2015).
  • Martínez-García et al. [2019] F. Martínez-García, D. Vodola, and M. Müller, Adaptive bayesian phase estimation for quantum error correcting codes, New Journal of Physics 21, 123027 (2019).
  • Wiseman and Killip [1997] H. M. Wiseman and R. B. Killip, Adaptive single-shot phase measurements: A semiclassical approach, Phys. Rev. A 56, 944 (1997).
  • Wiseman and Killip [1998] H. M. Wiseman and R. B. Killip, Adaptive single-shot phase measurements: The full quantum theory, Phys. Rev. A 57, 2169 (1998).
  • Berry and Wiseman [2000] D. W. Berry and H. M. Wiseman, Optimal states and almost optimal adaptive measurements for quantum interferometry, Phys. Rev. Lett. 85, 5098 (2000).
  • Berry et al. [2001] D. W. Berry, H. M. Wiseman, and J. K. Breslin, Optimal input states and feedback for interferometric phase estimation, Phys. Rev. A 63, 053804 (2001).
  • Mitchell [2005] M. W. Mitchell, Metrology with entangled states, in Quantum Communications and Quantum Imaging III, Vol. 5893, edited by R. E. Meyers and Y. Shih, International Society for Optics and Photonics (SPIE, 2005) p. 589310.
  • Boixo and Somma [2008] S. Boixo and R. D. Somma, Parameter estimation with mixed-state quantum computation, Phys. Rev. A 77, 052320 (2008).
  • Olivares and Paris [2009] S. Olivares and M. G. A. Paris, Bayesian estimation in homodyne interferometry, Journal of Physics B: Atomic, Molecular and Optical Physics 42, 055506 (2009).
  • Valeri et al. [2023] M. Valeri, V. Cimini, S. Piacentini, F. Ceccarelli, E. Polino, F. Hoch, G. Bizzarri, G. Corrielli, N. Spagnolo, R. Osellame, and F. Sciarrino, Experimental multiparameter quantum metrology in adaptive regime, Phys. Rev. Res. 5, 013138 (2023).
  • Smith et al. [2023] J. G. Smith, C. H. W. Barnes, and D. R. M. Arvidsson-Shukur, An adaptive Bayesian quantum algorithm for phase estimation, arXiv:2303.01517 [quant-ph] (2023).
  • Higgins et al. [2009] B. L. Higgins, D. W. Berry, S. D. Bartlett, M. W. Mitchell, H. M. Wiseman, and G. J. Pryde, Demonstrating heisenberg-limited unambiguous phase estimation without adaptive measurements, New Journal of Physics 11, 073023 (2009).
  • Berry et al. [2009] D. W. Berry, B. L. Higgins, S. D. Bartlett, M. W. Mitchell, G. J. Pryde, and H. M. Wiseman, How to perform the most accurate possible phase measurements, Phys. Rev. A 80, 052114 (2009).
  • Giovannetti et al. [2011] V. Giovannetti, S. Lloyd, and L. Maccone, Advances in quantum metrology, Nature Photonics 5, 222 (2011).
  • Shaji and Caves [2007] A. Shaji and C. M. Caves, Qubit metrology and decoherence, Phys. Rev. A 76, 032111 (2007).
  • Escher et al. [2011] B. M. Escher, R. L. de Matos Filho, and L. Davidovich, General framework for estimating the ultimate precision limit in noisy quantum-enhanced metrology, Nature Physics 7, 406 (2011).
  • Maccone and Giovannetti [2011] L. Maccone and V. Giovannetti, Beauty and the noisy beast, Nature Physics 7, 376 (2011).
  • Escher et al. [2012] B. M. Escher, L. Davidovich, N. Zagury, and R. L. de Matos Filho, Quantum metrological limits via a variational approach, Phys. Rev. Lett. 109, 190404 (2012).
  • Demkowicz-Dobrzański et al. [2012] R. Demkowicz-Dobrzański, J. Kołodyński, and M. GuŢă, The elusive heisenberg limit in quantum-enhanced metrology, Nature Communications 3, 1063 (2012).
  • Kołodyński and Demkowicz-Dobrzański [2013] J. Kołodyński and R. Demkowicz-Dobrzański, Efficient tools for quantum metrology with uncorrelated noise, New Journal of Physics 15, 073043 (2013).
  • Alipour et al. [2014] S. Alipour, M. Mehboudi, and A. T. Rezakhani, Quantum metrology in open systems: Dissipative cramér-rao bound, Phys. Rev. Lett. 112, 120405 (2014).
  • Macieszczak et al. [2014] K. Macieszczak, M. Fraas, and R. Demkowicz-Dobrzański, Bayesian quantum frequency estimation in presence of collective dephasing, New Journal of Physics 16, 113002 (2014).
  • Demkowicz-Dobrzański et al. [2017] R. Demkowicz-Dobrzański, J. Czajkowski, and P. Sekatski, Adaptive quantum metrology under general markovian noise, Phys. Rev. X 7, 041009 (2017).
  • Cole et al. [2006] J. H. Cole, A. D. Greentree, D. K. L. Oi, S. G. Schirmer, C. J. Wellard, and L. C. L. Hollenberg, Identifying a two-state hamiltonian in the presence of decoherence, Phys. Rev. A 73, 062333 (2006).
  • Maccone and De Cillis [2009] L. Maccone and G. De Cillis, Robust strategies for lossy quantum interferometry, Phys. Rev. A 79, 023812 (2009).
  • Dorner et al. [2009] U. Dorner, R. Demkowicz-Dobrzanski, B. J. Smith, J. S. Lundeen, W. Wasilewski, K. Banaszek, and I. A. Walmsley, Optimal quantum phase estimation, Phys. Rev. Lett. 102, 040403 (2009).
  • Kołodyński and Demkowicz-Dobrzański [2010] J. Kołodyński and R. Demkowicz-Dobrzański, Phase estimation without a priori phase knowledge in the presence of loss, Phys. Rev. A 82, 053804 (2010).
  • Kacprowicz et al. [2010] M. Kacprowicz, R. Demkowicz-Dobrzański, W. Wasilewski, K. Banaszek, and I. A. Walmsley, Experimental quantum-enhanced estimation of a lossy phase shift, Nature Photonics 4, 357 (2010).
  • Vidrighin et al. [2014] M. D. Vidrighin, G. Donati, M. G. Genoni, X.-M. Jin, W. S. Kolthammer, M. S. Kim, A. Datta, M. Barbieri, and I. A. Walmsley, Joint estimation of phase and phase diffusion for quantum metrology, Nature Communications 5, 3532 (2014).
  • Ángel Gutiérrez-Rubio et al. [2020] Ángel Gutiérrez-Rubio, P. Stano, and D. Loss, Optimal frequency estimation and its application to quantum dots, arXiv:2004.12049 [cond-mat.mes-hall] (2020).
  • Cappellaro [2012] P. Cappellaro, Spin-bath narrowing with adaptive parameter estimation, Phys. Rev. A 85, 030301 (2012).
  • Hayes and Berry [2014] A. J. F. Hayes and D. W. Berry, Swarm optimization for adaptive phase measurements with low visibility, Phys. Rev. A 89, 013838 (2014).
  • Bonato et al. [2016] C. Bonato, M. S. Blok, H. T. Dinani, D. W. Berry, M. L. Markham, D. J. Twitchen, and R. Hanson, Optimized quantum sensing with a single electron spin using real-time adaptive measurements, Nature Nanotechnology 11, 247 (2016).
  • Bonato and Berry [2017] C. Bonato and D. W. Berry, Adaptive tracking of a time-varying field with a quantum sensor, Phys. Rev. A 95, 052348 (2017).
  • Santagati et al. [2019] R. Santagati, A. A. Gentile, S. Knauer, S. Schmitt, S. Paesani, C. Granade, N. Wiebe, C. Osterkamp, L. P. McGuinness, J. Wang, M. G. Thompson, J. G. Rarity, F. Jelezko, and A. Laing, Magnetic-field learning using a single electronic spin in diamond with one-photon readout at room temperature, Phys. Rev. X 9, 021019 (2019).
  • Joas et al. [2021] T. Joas, S. Schmitt, R. Santagati, A. A. Gentile, C. Bonato, A. Laing, L. P. McGuinness, and F. Jelezko, Online adaptive quantum characterization of a nuclear spin, npj Quantum Information 7, 56 (2021).
  • McMichael et al. [2021] R. D. McMichael, S. Dushenko, and S. M. Blakley, Sequential bayesian experiment design for adaptive ramsey sequence measurements, Journal of Applied Physics 130, 144401 (2021), https://doi.org/10.1063/5.0055630.
  • Zohar et al. [2022] I. Zohar, Y. Romach, M. J. Arshad, N. Halay, N. Drucker, R. Stöhr, A. Denisenko, Y. Cohen, C. Bonato, and A. Finkler, Real-time frequency estimation of a qubit without single-shot-readout, arXiv:2210.05542 [quant-ph] (2022).
  • Note [2] Readout can also be improved by adaptive methods [89, 90].
  • Said et al. [2011] R. S. Said, D. W. Berry, and J. Twamley, Nanoscale magnetometry using a single-spin system in diamond, Phys. Rev. B 83, 125410 (2011).
  • Waldherr et al. [2012] G. Waldherr, J. Beck, P. Neumann, R. S. Said, M. Nitsche, M. L. Markham, D. J. Twitchen, J. Twamley, F. Jelezko, and J. Wrachtrup, High-dynamic-range magnetometry with a single nuclear spin in diamond, Nature Nanotechnology 7, 105 (2012).
  • Nusran et al. [2012] N. M. Nusran, M. U. Momeen, and M. V. G. Dutt, High-dynamic-range magnetometry with a single electronic spin in diamond, Nature Nanotechnology 7, 109 (2012).
  • Danilin et al. [2018] S. Danilin, A. V. Lebedev, A. Vepsäläinen, G. B. Lesovik, G. Blatter, and G. S. Paraoanu, Quantum-enhanced magnetometry by phase estimation algorithms with a single artificial atom, npj Quantum Information 4, 29 (2018).
  • Belliardo and Giovannetti [2020] F. Belliardo and V. Giovannetti, Achieving heisenberg scaling with maximally entangled states: An analytic upper bound for the attainable root-mean-square error, Phys. Rev. A 102, 042613 (2020).
  • Note [3] This is an example where entanglement can be used to convert temporal resources into spatial resources [11].
  • Granade et al. [2012] C. E. Granade, C. Ferrie, N. Wiebe, and D. G. Cory, Robust online hamiltonian learning, New Journal of Physics 14, 103013 (2012).
  • Wiebe and Granade [2016] N. Wiebe and C. Granade, Efficient Bayesian Phase Estimation, Phys. Rev. Lett. 117, 010503 (2016).
  • Paesani et al. [2017] S. Paesani, A. A. Gentile, R. Santagati, J. Wang, N. Wiebe, D. P. Tew, J. L. O’Brien, and M. G. Thompson, Experimental bayesian quantum phase estimation on a silicon photonic chip, Phys. Rev. Lett. 118, 100503 (2017).
  • Gebhart et al. [2021] V. Gebhart, A. Smerzi, and L. Pezzè, Bayesian Quantum Multiphase Estimation Algorithm, Phys. Rev. Appl. 16, 014035 (2021).
  • Demkowicz-Dobrzański [2011] R. Demkowicz-Dobrzański, Optimal phase estimation with arbitrary a priori knowledge, Phys. Rev. A 83, 061802 (2011).
  • Sergeevich et al. [2011] A. Sergeevich, A. Chandran, J. Combes, S. D. Bartlett, and H. M. Wiseman, Characterization of a qubit hamiltonian using adaptive measurements in a fixed basis, Phys. Rev. A 84, 052315 (2011).
  • Nielsen and Chuang [2010] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, 2010).
  • Note [4] While $\lambda_{k}<1$ describes a reduced probability for the outcome $\xi=-1$, the reverse situation can always be described by relabelling the outcomes.
  • Holevo [2011] A. Holevo, Probabilistic and Statistical Aspects of Quantum Theory, 2nd ed., Publications of the Scuola Normale Superiore (Edizioni della Normale Pisa, 2011).
  • Note [5] Even more generally, which resources are valuable depends on the setting and on what the experimenter wants to optimise; e.g. they may have a restricted number of qubits to measure, so that the number of measurements becomes the relevant resource. In this sense it is useful to have a method where the optimisation can be adjusted by the experimenter to best describe how they value their resources.
  • Note [6] The differential entropy is the entropy of a continuous random variable, but it lacks some important properties of the Shannon entropy for discrete random variables. See e.g. [91], chapter 8. In this manuscript we will usually write simply “entropy” when referring to the differential entropy.
  • Kullback and Leibler [1951] S. Kullback and R. A. Leibler, On Information and Sufficiency, The Annals of Mathematical Statistics 22, 79 (1951).
  • Kullback [1978] S. Kullback, Information Theory and Statistics (Dover Publications, Inc., Gloucester Mass., USA, 1978).
  • Arshad et al. [2022] M. J. Arshad, C. Bekker, B. Haylock, K. Skrzypczak, D. White, B. Griffiths, J. Gore, G. W. Morley, P. Salter, J. Smith, I. Zohar, A. Finkler, Y. Altmann, E. M. Gauger, and C. Bonato, Online adaptive estimation of decoherence timescales for a single qubit, arXiv:2210.06103 [quant-ph] (2022).
  • Avriel and Wilde [1966] M. Avriel and D. J. Wilde, Optimality proof for the symmetric Fibonacci search technique, Fibonacci Quarterly 4, 265 (1966).
  • van den Berg [2021] E. van den Berg, Efficient Bayesian phase estimation using mixed priors, Quantum 5, 469 (2021).
  • Note [7] Although the differential entropy is not invariant when changing the scale, the differential entropy gain is.
  • Note [8] This also works when using sharpness gain, but there is no speedup.
  • Note [9] The performance of estimation may be dependent on the system phase when starting from a uniform prior if the control phase for the first measurement is fixed (e.g. $\alpha=0$). By choosing a random control phase $\alpha\in[0,\pi]$ for the first measurement, the performance is the same for any value of the system phase. Thus, our results can also be interpreted as the performance for any particular value of the system phase assuming a random control phase is used in the first measurement.
  • Cleve et al. [1998] R. Cleve, A. Ekert, C. Macchiavello, and M. Mosca, Quantum algorithms revisited, Proceedings of the Royal Society of London A 454, 339–354 (1998).
  • Pezzè and Smerzi [2020] L. Pezzè and A. Smerzi, Heisenberg-Limited Noisy Atomic Clock Using a Hybrid Coherent and Squeezed State Protocol, Phys. Rev. Lett. 125, 210503 (2020).
  • Pezzè and Smerzi [2021] L. Pezzè and A. Smerzi, Quantum Phase Estimation Algorithm with Gaussian Spin States, PRX Quantum 2, 040301 (2021).
  • Demkowicz-Dobrzański and Maccone [2014] R. Demkowicz-Dobrzański and L. Maccone, Using entanglement against noise in quantum metrology, Phys. Rev. Lett. 113, 250801 (2014).
  • Myerson et al. [2008] A. H. Myerson, D. J. Szwer, S. C. Webster, D. T. C. Allcock, M. J. Curtis, G. Imreh, J. A. Sherman, D. N. Stacey, A. M. Steane, and D. M. Lucas, High-fidelity readout of trapped-ion qubits, Phys. Rev. Lett. 100, 200502 (2008).
  • D’Anjou et al. [2016] B. D’Anjou, L. Kuret, L. Childress, and W. A. Coish, Maximal adaptive-decision speedups in quantum-state readout, Phys. Rev. X 6, 011017 (2016).
  • Cover and Thomas [2005] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. (John Wiley & Sons, Inc., Hoboken, New Jersey, USA, 2005).
  • [92] Digital Library of Mathematical Functions, National Institute of Standards and Technology, https://dlmf.nist.gov/7.17, accessed: 19.01.2024.
  • Blair et al. [1976] J. M. Blair, C. A. Edwards, and J. H. Johnson, Rational Chebyshev Approximations for the Inverse of the Error Function, Mathematics of Computation 30, 827 (1976).
  • Note [10] This is possible since $|\beta|\leq 1$.
  • Boyadzhiev [2012] K. N. Boyadzhiev, Series with Central Binomial Coefficients, Catalan Numbers, and Harmonic Numbers, Journal of Integer Sequences 15 (2012).
  • Note [11] This is required by normalisation of $p_{s-1}(\phi)$.

Supplemental Material I

S.I $k$-value subsets

In order to simplify the optimisation procedure that is performed to choose $\alpha$ and $k$ adaptively for each experiment, it is interesting to consider procedures where we restrict $k$-values to certain subsets. The main advantage is that the optimisation can be performed in less time (without parallel computation), which could make TAPE accessible to experimental systems where the time of a single shot is shorter. Here we focus on the commonly studied subset containing only powers of two: $k\in\{2^{n}\},\,n=1,2,3,\dots$.

In the particular case when $\mathbf{T}\propto\mathbf{K}$, which is usually studied in quantum metrology, the subset with only powers of two is also interesting to study for the multi-step method; the results of section S.III show that the multi-step method is limited by the fact that it converges to local maxima when optimising the choice of $k$-value. This is because the expected knowledge gains compared are for sequences of experiments that only take approximately equal time. When $\mathbf{T}\propto\mathbf{K}$ and $k\in\{2^{n}\},\,n=1,2,3,\dots$, the gains compared by the multi-step method are for experiments that take exactly the same time, thereby avoiding the limitations of the multi-step method that occur in the general case.
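To see why the comparison becomes exact for powers of two: when $\mathbf{T}\propto\mathbf{K}$ the duration of a sequence is simply the sum of its $k$-values, and any power-of-two budget can be matched exactly by several candidate sequences. A small sketch enumerating such equal-time sequences (a hypothetical helper for illustration, not part of the implementation):

```python
def equal_time_sequences(total, max_exp):
    """All non-increasing sequences of k-values from {2**0, ..., 2**max_exp}
    whose k-values sum exactly to `total` (i.e. equal total time when the
    experiment time is proportional to k)."""
    if total == 0:
        return [[]]
    seqs = []
    for n in range(max_exp, -1, -1):
        k = 2 ** n
        if k <= total:
            # Restrict the tail to exponents <= n so each multiset appears once.
            seqs.extend([k] + rest for rest in equal_time_sequences(total - k, n))
    return seqs

# For a time budget of 4: [4], [2, 2], [2, 1, 1], [1, 1, 1, 1].
options = equal_time_sequences(4, 2)
```

The multi-step method can then compare expected gains across candidates of strictly identical cost, rather than only approximately equal cost as in the general case.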

Figure S.1: Metrology scaling without noise when only powers of $2$ are used as $k$-values. The time of all experiments in an estimation sequence is equal to the total number of applications of the unitary $U$. The phase uncertainties $\Delta\hat{\phi}$ are calculated from $10^{6}$ realisations of estimation with system phase $\phi$ chosen uniformly at random. For all methods plotted, contraction is used: when $\sqrt{V[p(\phi)]}$ is less than $\frac{\pi}{2^{14}M}$ with current magnification $M$, we perform at least one measurement using parameters $\alpha,k$ that maximise the expected sharpness gain, and then perform a contraction with $m=2$. Expected entropy gain, multi-step method: $\bullet$; hybrid method (uses gain rate): $\circ$; expected sharpness gain, multi-step method (gain rate method): $\square$ ($\star$).

We fit all results with a functional form $\Delta\hat{\phi}=A\pi N^{-\gamma}$. For the multi-step method, we fit for total estimation times from $2^{12}$ to $2^{14}$, and for the gain rate method from $2^{12}$ to $2^{18}$. The fit parameters as well as ratios to the HL after $N=2^{14}$ and $2^{18}$ are summarised in table S.1. Some results from the main text for cases where all $k$-values are used are also included in table S.1 for comparison.

Method | $A$ | $\gamma$ | ratio to HL, $N=2^{14}$ | ratio to HL, $N=2^{18}$
hybrid, all $k$ (BFS) | – | – | – | $1.394\pm 0.010$
sharp., all $k$ (BFS) | – | – | – | $1.400\pm 0.013$
hybrid, all $k$ (FS) | $1.430\pm 0.036$ | $1.0\pm 0.002$ | $1.420\pm 0.009$ | $1.429\pm 0.011$
sharp., all $k$ (FS) | $1.371\pm 0.028$ | $0.998\pm 0.002$ | $1.390\pm 0.010$ | $1.422\pm 0.015$
hybrid, $k\in\{2^{n}\}$ (BFS) | $1.304\pm 0.076$ | $0.983\pm 0.006$ | $1.738\pm 0.223$ | $1.617\pm 0.026$
sharp., $k\in\{2^{n}\}$ (BFS) | $0.452\pm 0.052$ | $0.862\pm 0.010$ | $1.851\pm 0.214$ | $2.528\pm 0.019$
multi-step, entropy, $k\in\{2^{n}\}$ | $2.131\pm 0.618$ | $1.0\pm 0.018$ | $2.128\pm 0.075$ | –
multi-step, sharp., $k\in\{2^{n}\}$ | $1.044\pm 0.007$ | $0.958\pm 0.001$ | $1.563\pm 0.010$ | –
Table S.1: Fit parameters for the methods plotted in figure S.1 using a functional form $\Delta\hat{\phi}=A\pi N^{-\gamma}$, and ratios to the HL after total estimation times of $N=2^{14}$ and $2^{18}$. The results for the hybrid and maximum sharpness gain rate methods using contraction from figure 4 are also included for comparison. In addition we include the results of gain rate methods when a brute-force search is used; these particular simulations are results from $2\times 10^{6}$ estimations with random system phase. BFS: brute-force search. FS: Fibonacci search. All results are fitted for total estimation times $\geq 2^{12}$ (except for BFS over all $k$, since we have only simulated the $N=2^{18}$ case).
Method | $t_{\mathrm{up}}$ | $t_{H}$ | $t_{S}$ | $t_{\mathrm{con}}$
hybrid, all $k$ (BFS) | $40\,(330)\,\mu s$ | $2.5\,(17)\,ms$ | $6\,(20)\,ms$ | $37\,(70)\,\mu s$
sharp., all $k$ (BFS) | $17\,(125)\,\mu s$ | – | $3.2\,(20)\,ms$ | $27\,(50)\,\mu s$
hybrid, all $k$ (FS) | $42\,(330)\,\mu s$ | $800\,(4300)\,\mu s$ | $800\,(1700)\,\mu s$ | $38\,(70)\,\mu s$
sharp., all $k$ (FS) | $18\,(120)\,\mu s$ | – | $450\,(1900)\,\mu s$ | $27\,(40)\,\mu s$
hybrid, $k\in\{2^{n}\}$ (BFS) | $90\,(700)\,\mu s$ | $250\,(1200)\,\mu s$ | $8\,(25)\,\mu s$ | $40\,(80)\,\mu s$
sharp., $k\in\{2^{n}\}$ (BFS) | $60\,(1000)\,\mu s$ | – | $8\,(40)\,\mu s$ | $45\,(100)\,\mu s$
Table S.2: Benchmarks. A summary of approximate times required for computation with CPU model: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz. All values are rough estimates from running $\sim 20$ repetitions with total estimation time $N=2^{16}$. In all simulations performed here we assume the experiment and modelled measurement probabilities are noise-free, and $\mathbf{T}\propto\mathbf{K}$. $t_{\mathrm{up}}$ is the time required for the Bayesian update. $t_{H}$ is the time required to determine the optimal values of $\alpha$ and $k$ by maximising the expected entropy gain rate. $t_{S}$ is the time required to determine the $\alpha$ and $k$ that maximise the expected sharpness gain rate. $t_{\mathrm{con}}$ is the time required for contraction (when $\sqrt{V[p(\phi)]}$ is less than $\frac{\pi}{2^{12}M}$, we perform at least one measurement using parameters $\alpha,k$ that maximise the expected sharpness gain, and then perform a contraction with $m=2$). Values represent rough averages over all shots and runs of estimation; in parentheses are maximum values observed on a single shot over all runs. Methods are listed for brute-force search (BFS) and Fibonacci search (FS) of the optimal $k$-value.

We see that gain rate methods optimising over all possible k-values reach HS, while those using the subset k\in\{2^{n}\},\,n=1,2,3,\dots do not. However, the hybrid method using this subset is very close to HS and performs only slightly worse than when all k-values are used. For methods allowing all k-values we performed contractions when the Holevo variance of the phase density, V[p(\phi)], was less than \frac{\pi}{2^{13}M}, while when using only a subset of k-values we had to use the condition V[p(\phi)]<\frac{\pi}{2^{14}M} to sufficiently suppress unwanted estimation errors. This suggests that the probability distribution of phase estimates can have larger tails when only subsets of k-values are used. This in turn requires a more cautious contraction criterion, leading to a larger number of coefficients in the series representation of p(\phi), and therefore longer computation times. If the goal of using subsets is to lower the computation time for determining the optimal k, the potential increase in computation due to the additional coefficients must therefore also be considered.

The results for the multi-step method show that, while it performs much better when the compared expected gains are for sequences requiring exactly the same time (rather than approximately the same time), it does not perform significantly better than the simpler and more versatile hybrid gain rate method. In particular, only the multi-step method optimising the entropy gain reaches HS, and with a notably larger pre-factor of 2.13 compared to the gain rate methods.

For the estimation sequences using a total time of 2^{16}, a summary of computation time benchmarks for the gain rate methods is given in table S.2. Since contraction is performed on a significant fraction of the later shots in a sequence, the times required for this computation are also listed. Here we study the cases where k\in\{2^{n}\},\,n=0,1,2,\dots, while the cases where all k-values (up to some maximum value) are searched using a series of Fibonacci searches were studied in the main text. For comparison, the case where a brute force search is performed over all k-values is also included.

We observe that the time required for the Bayesian update depends on the method used. We expect this to be related to the magnitude of the k-values chosen by a strategy, since this determines how many coefficients must be used in the Fourier representation. Optimising the rate of entropy gain tends to choose larger k-values than optimising the sharpness. When considering only a subset of possible k-values, we expect the algorithm to occasionally choose a larger value of k than if all values were considered. This then increases the maximum update time. We see that this is indeed the case when restricting to the subset k\in\{2^{n}\},\,n=0,1,2,\dots, and that the average update time is also slightly longer.

When considering the times required to determine the optimal values of \alpha and k, it is important to note that since the optimal k increases with phase knowledge, later shots (i.e. measurements) generally use higher k-values and require more time for the optimisation. When the phase knowledge is sufficient, contractions are performed that prevent the required computation per shot from increasing further. In the hybrid method, at most the first half of the total time (2^{15} for the values in table S.2) is used for shots with \alpha and k optimised using the entropy gain rate; since k tends to increase, this accounts for most of the shots in the estimation sequence. For the remaining time, \alpha and k are chosen to maximise the expected rate of sharpness gain. Due to these changes in the computation time with every shot, the times listed in table S.2 for the hybrid method cannot be directly compared with those where the expected rate of sharpness gain is maximised for all shots.

Using the same implementation and a similar processor in an experiment, the brute force optimisation over all k-values would be appropriate for single-shot times of at least \sim 10\;ms. Using a series of Fibonacci searches, as was done for the methods studied in the main text, would be appropriate for single-shot times of at least \sim 1\;ms for the hybrid method, and \sim 500\;\mu s when using only the sharpness gain. Using the subset k\in\{2^{n}\},\,n=0,1,2,\dots would be appropriate for experiments with single-shot times of at least \sim 400\;\mu s for the hybrid method, and \sim 100\;\mu s when using only the sharpness gain.

S.II Intermediate time-repetition regime

In the main text we discussed the performance of TAPE in different settings, distinguished by the relation between the time of an experiment and the number of repetitions k of the unknown phase (coherent evolution). In particular we considered two extreme cases: \mathbf{T}\propto\mathbf{K}, which is the usual situation in quantum metrology, and t=1\,\forall\,k, which is practically relevant for experiments where the measurement time is several orders of magnitude greater than the time of coherent evolution. Here we consider an intermediate case where the time required for state preparation and measurement (SPAM) is 100 times greater than the time required for a single application of the unknown phase (k=1). In addition we assume a system with decoherence as described in the main text, i.e. perfectly prepared |+\rangle states, U=\mathrm{e}^{-iZ\phi/2} with Z the Pauli-Z operator, and each application of the unknown phase U is followed by a dephasing channel K_{0}=\sqrt{\frac{1+\eta}{2}}\openone, K_{1}=\sqrt{\frac{1-\eta}{2}}Z with \eta=0.995.

Figure S.2: Phase uncertainty of estimates (10^{6} realisations of estimation) obtained by applying TAPE in the case that the time required for SPAM is 100 times greater than the time required for a single application of the unknown phase, and in the presence of dephasing (\eta=0.995). In particular, the time on the x-axis of the plot is related to the number of applications k of the unknown phase by t=(k+100)/101. The correct dephasing rate is included in the model of the estimators. Expected entropy gain: \bullet, expected sharpness gain: \square, hybrid method: \circ.

The results of 10^{6} realisations of estimation using the gain rate method with the entropy gain, the sharpness gain, and the hybrid method are plotted in figure S.2. For times shorter than twice the SPAM time the phase uncertainty decreases little, because only one measurement can be performed. For times slightly more than twice the SPAM time the situation is similar to the case t=1\,\forall\,k discussed in the main text, since applying large values of k does not significantly impact the experiment time unless k\gg 100. In the large-k limit we approach the usual situation in quantum metrology, \mathbf{T}\propto\mathbf{K}; however, since in this case k is limited by decoherence, we never reach this limit. All three of the plotted methods perform similarly. Initially this is consistent with the results of figure 5 for short times. In the large-k limit the situation becomes similar to that of figure 6 for longer times, where all three methods perform similarly when decoherence is included in the model of the estimator. Thus, the results in the limiting cases of the relation between \mathbf{T} and \mathbf{K} provide an indication of each method's performance in the intermediate case.

S.III Multi-step gain method

Supposing we have a prior probability density p_{s-1}(\phi) at step s-1 of a sequence of measurements, we calculate the maximum expected knowledge (sharpness or entropy) gain for step s as \Delta G(s)\equiv\max_{\alpha_{s},k_{s}}\Delta G(\alpha_{s},k_{s}) for a single measurement, using (7) or (9) for the sharpness or entropy gain, respectively. We calculate the maximum expected gain for two measurements as

\Delta G(s,s+1) \equiv \Delta G(s) + \sum_{\xi}\Pi_{\xi}(\alpha_{s+1}^{(\xi)},k_{s+1}^{(\xi)})\,\Delta G(s+1)\,.

Since \Delta G(\alpha_{s+1},k_{s+1}) depends on p_{s}(\phi|\xi;\alpha_{s},k_{s}), it implicitly depends on the outcome \xi. We therefore write \alpha_{s+1}^{(\xi)}, k_{s+1}^{(\xi)} to denote the values optimised conditioned on a particular measurement outcome at step s. We can similarly calculate the maximum expected gain for three measurements as

\Delta G(s,s+2) \equiv \Delta G(s) + \sum_{\xi}\Pi_{\xi}(\alpha_{s+1}^{(\xi)},k_{s+1}^{(\xi)})\,\Delta G_{\xi}(s+1,s+2)\,,

where the subscript \xi in \Delta G_{\xi}(s+1,s+2) is to remind us that this quantity depends on the measurement outcome at step s. Applying this recursively, we can write the maximum expected gain for j measurements as

\Delta G(s,s-1+j) = \Delta G(s) + \sum_{\xi}\Pi_{\xi}(\alpha_{s+1}^{(\xi)},k_{s+1}^{(\xi)})\,\Delta G_{\xi}(s+1,s-1+j)\,. \qquad (S.1)

Since the number of possible outcomes for the next j measurements is 2^{j}, the computation of the multi-step gain generally grows exponentially with the number of measurements. For this reason, algorithms S.1 and S.2 are written to compare measurement sequences of approximately equal time that contain at most 5 measurements. In general the gain calculations (7), (9) do not require all coefficients in (2), allowing a more efficient computation of the multi-step gain.
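The recursion (S.1) can be illustrated numerically. The sketch below is a minimal toy version, not the paper's implementation: it assumes a grid representation of the phase density, ideal two-outcome probabilities P_{\xi}=\frac{1}{2}(1+\xi\cos(\alpha-k\phi)), and the sharpness S[p]=|\langle\mathrm{e}^{i\phi}\rangle| as the knowledge measure; the grid sizes and parameter grids are arbitrary choices.

```python
import numpy as np

# Toy grid representation of the phase density (the paper uses a Fourier
# representation; a grid keeps this sketch short).
PHI = np.linspace(0.0, 2 * np.pi, 256, endpoint=False)

def sharpness(p):
    # S[p] = |<e^{i phi}>| for a density normalised so that mean(p) = 1.
    return abs(np.mean(p * np.exp(1j * PHI)))

def update(p, xi, alpha, k):
    # Bayesian update for outcome xi in {+1,-1} with ideal probabilities
    # P_xi = (1 + xi*cos(alpha - k*phi)) / 2; returns (posterior, Pi_xi).
    lik = 0.5 * (1.0 + xi * np.cos(alpha - k * PHI))
    w = np.mean(p * lik)
    return p * lik / w, w

def single_step(p, alphas, ks):
    # Best expected one-step gain Delta G(s) and its maximising (alpha, k).
    best = (-np.inf, None, None)
    for k in ks:
        for a in alphas:
            g = sum(w * sharpness(q)
                    for q, w in (update(p, xi, a, k) for xi in (1, -1)))
            g -= sharpness(p)
            if g > best[0]:
                best = (g, a, k)
    return best

def multi_step_gain(p, j, alphas, ks):
    # Recursion (S.1): the optimal single-step gain plus the
    # outcome-averaged optimal gain of the remaining j-1 steps.
    if j == 0:
        return 0.0
    g, a, k = single_step(p, alphas, ks)
    for xi in (1, -1):
        q, w = update(p, xi, a, k)
        g += w * multi_step_gain(q, j - 1, alphas, ks)
    return g
```

Because each step branches over both outcomes, the cost of `multi_step_gain` grows as 2^{j}, which is the exponential growth noted above.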

Algorithm S.1 Calculate intervals
Input: \mathbf{T}=\{t_{1},t_{2},\dots,t_{N}\},\,t_{n}\in\mathbb{R}, sorted in increasing order.
function Intervals(\mathbf{T})
  Let \mathbf{I} be a list of pairs (a,b), a,b\in\mathbb{N}.
  \triangleright \mathbf{I}.push((a,b)) denotes adding a pair (a,b) to the back of the list.
  \mathbf{I} is initially empty.
  n\leftarrow 1
  while n<N do
    m\leftarrow n
    n\leftarrow n+1
    while n\leq N and t_{n}/t_{m}<8/7 do
      n\leftarrow n+1
    end while
    \mathbf{I}.push((m,n-1))
  end while
  if n-1<N then
    \mathbf{I}.push((n,N))
  end if
  return \mathbf{I}
end function
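A direct transcription of algorithm S.1 into Python (the 1-based index pairs follow the pseudocode):

```python
def intervals(T):
    """Group 1-based indices of the sorted times T into maximal runs
    whose times lie within a factor 8/7 of the run's first time
    (algorithm S.1). Returns a list of (first, last) index pairs."""
    N = len(T)
    I = []
    n = 1
    while n < N:
        m = n
        n += 1
        while n <= N and T[n - 1] / T[m - 1] < 8 / 7:
            n += 1
        I.append((m, n - 1))
    # The last index may not yet belong to any interval.
    if n - 1 < N:
        I.append((n, N))
    return I
```

For T \propto K with k=1,\dots,8 every consecutive ratio is at least 8/7, so every index becomes its own interval; closely spaced times are grouped.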
Algorithm S.2 Compare two k-values
Requires: function gain(k,j), where k is the number of applications of the unitary U and j is the number of steps. gain(k,j) returns the expected knowledge gain for performing j steps (measurements) with U^{k}, i.e. the result of equation (S.1) with k_{s}=k_{s+1}=\dots=k_{s-1+j}=k.
Inputs:
\mathbf{K}=\{k_{1},k_{2},\dots,k_{N}\},\,k_{n}\in\mathbb{N}, \mathbf{T}=\{t_{1},t_{2},\dots,t_{N}\},\,t_{n}\in\mathbb{R},
n_{1},n_{2}\in\mathbb{N}, n_{1}<n_{2}, t_{n_{2}}/t_{n_{1}}<32/7.
function Compare(\mathbf{K}, \mathbf{T}, n_{1}, n_{2})
  r\leftarrow t_{n_{2}}/t_{n_{1}}
  if r<8/7 then j_{1}\leftarrow 1, j_{2}\leftarrow 1
  else if r<24/17 then j_{1}\leftarrow 4, j_{2}\leftarrow 3
  else if r<12/7 then j_{1}\leftarrow 3, j_{2}\leftarrow 2
  else if r<20/9 then j_{1}\leftarrow 2, j_{2}\leftarrow 1
  else if r<30/11 then j_{1}\leftarrow 5, j_{2}\leftarrow 2
  else if r<24/7 then j_{1}\leftarrow 3, j_{2}\leftarrow 1
  else if r<32/7 then j_{1}\leftarrow 4, j_{2}\leftarrow 1
  end if
  g_{1}\leftarrow gain(k_{n_{1}},j_{1}), g_{2}\leftarrow gain(k_{n_{2}},j_{2})
  if g_{1}>g_{2} then return n_{1}
  else return n_{2}
  end if
end function
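The ratio thresholds in algorithm S.2 can be collected into a lookup table. The helper below (an illustrative name, not from the paper) returns the step counts (j_{1},j_{2}) so that j_{1}t_{n_{1}}\approx j_{2}t_{n_{2}}:

```python
from fractions import Fraction

def step_counts(r):
    """Map the time ratio r = t_{n2}/t_{n1} to the step counts (j1, j2)
    compared by algorithm S.2; the compared sequences then have
    approximately equal total time, since j1/j2 approximates r."""
    table = [
        (Fraction(8, 7),   (1, 1)),
        (Fraction(24, 17), (4, 3)),
        (Fraction(12, 7),  (3, 2)),
        (Fraction(20, 9),  (2, 1)),
        (Fraction(30, 11), (5, 2)),
        (Fraction(24, 7),  (3, 1)),
        (Fraction(32, 7),  (4, 1)),
    ]
    for bound, js in table:
        if r < bound:
            return js
    raise ValueError("ratio out of range (must be < 32/7)")
```

Each threshold is roughly the midpoint between consecutive achievable ratios j_{1}/j_{2}, which is why the comparison only holds the sequence times approximately, not exactly, equal.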
Algorithm S.3 Search interval
Inputs:
\mathbf{K}=\{k_{1},k_{2},\dots,k_{N}\},\,k_{n}\in\mathbb{N}, \mathbf{T}=\{t_{1},t_{2},\dots,t_{N}\},\,t_{n}\in\mathbb{R},
n_{1},n_{2}\in\mathbb{N}, n_{1}\leq n_{2}
function searchInterval(\mathbf{K}, \mathbf{T}, n_{1}, n_{2})
  if n_{2}==n_{1} then
    if n_{1}==N then return n_{1}
    return compare(\mathbf{K},\mathbf{T},n_{1},n_{1}+1)
  else \triangleright Brute-force search for the maximum gain in the interval.
    max\leftarrow gain(k_{n_{1}},1), n\leftarrow n_{1}
    for m\leftarrow n_{1}+1,n_{2} do
      g\leftarrow gain(k_{m},1)
      if g>max then
        max\leftarrow g, n\leftarrow m
      end if
    end for
    return n
  end if
end function
Algorithm S.4 Search up
Requires: function findInterval(\mathbf{I},n), where \mathbf{I} is a list of intervals (see algorithm S.1) and n\in\mathbb{N}, that returns the index i of the unique interval (n_{1},n_{2}) in \mathbf{I} such that n_{1}\leq n\leq n_{2}.
Inputs:
\mathbf{K}=\{k_{1},k_{2},\dots,k_{N}\},\,k_{n}\in\mathbb{N}, \mathbf{T}=\{t_{1},t_{2},\dots,t_{N}\},\,t_{n}\in\mathbb{R},
intervals \mathbf{I} (see algorithm S.1), N_{I} (the number of intervals in \mathbf{I}), n\in\mathbb{N}
function searchUp(\mathbf{K}, \mathbf{T}, \mathbf{I}, n)
  i\leftarrow findInterval(\mathbf{I},n)
  while true do
    (n_{1},n_{2})\leftarrow\mathbf{I}[i]
    n\leftarrow searchInterval(\mathbf{K},\mathbf{T},n_{1},n_{2})
    if n_{2}==n_{1} then
      if n==n_{1} then return n
    else
      if n<n_{2} then return n
      if i==N_{I} then return n
      m\leftarrow compare(\mathbf{K},\mathbf{T},n_{1},n_{2}+1)
      if m==n_{1} then return n_{1}
    end if
    i\leftarrow i+1
  end while
end function
Algorithm S.5 Multi-step gain method
Inputs:
\mathbf{K}=\{k_{1},k_{2},\dots,k_{N}\},\,k_{n}\in\mathbb{N}, \mathbf{T}=\{t_{1},t_{2},\dots,t_{N}\},\,t_{n}\in\mathbb{R}
function multiStepMethod(\mathbf{K},\mathbf{T})
  \mathbf{I}\leftarrow intervals(\mathbf{T})
  n\leftarrow searchUp(\mathbf{K},\mathbf{T},\mathbf{I},1)
  return k_{n}
end function
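Algorithms S.1-S.5 can be combined into a single runnable sketch. The gain function below is a hypothetical stand-in with a single maximum (near k\sim 25), used only to exercise the control flow; everything else follows the pseudocode, with 1-based indices as above.

```python
import math

def gain(k, j):
    # Hypothetical stand-in for the j-step expected gain of measuring with
    # U^k: concave in j, rising then decaying in k (NOT the paper's gain).
    return math.sqrt(j) * (1 - math.exp(-k / 15)) * math.exp(-k / 65)

def intervals(T):
    # Algorithm S.1: group indices whose times lie within a factor 8/7.
    N, I, n = len(T), [], 1
    while n < N:
        m, n = n, n + 1
        while n <= N and T[n - 1] / T[m - 1] < 8 / 7:
            n += 1
        I.append((m, n - 1))
    if n - 1 < N:
        I.append((n, N))
    return I

# Ratio thresholds of algorithm S.2 mapping r = t_{n2}/t_{n1} to (j1, j2).
RATIOS = [(8/7, (1, 1)), (24/17, (4, 3)), (12/7, (3, 2)), (20/9, (2, 1)),
          (30/11, (5, 2)), (24/7, (3, 1)), (32/7, (4, 1))]

def compare(K, T, n1, n2):
    # Algorithm S.2: compare k-values via sequences of nearly equal time.
    r = T[n2 - 1] / T[n1 - 1]
    j1, j2 = next(js for bound, js in RATIOS if r < bound)
    return n1 if gain(K[n1 - 1], j1) > gain(K[n2 - 1], j2) else n2

def search_interval(K, T, n1, n2):
    # Algorithm S.3: brute-force single-step search inside one interval.
    if n1 == n2:
        return n1 if n1 == len(K) else compare(K, T, n1, n1 + 1)
    return max(range(n1, n2 + 1), key=lambda m: gain(K[m - 1], 1))

def search_up(K, T, I, n):
    # Algorithm S.4: climb through intervals until the gain stops improving.
    i = next(j for j, (a, b) in enumerate(I) if a <= n <= b)
    while True:
        n1, n2 = I[i]
        n = search_interval(K, T, n1, n2)
        if n1 == n2:
            if n == n1:
                return n
        else:
            if n < n2 or i == len(I) - 1:
                return n
            if compare(K, T, n1, n2 + 1) == n1:
                return n1
        i += 1

def multi_step_method(K, T):
    # Algorithm S.5: always start the search from the smallest k.
    return K[search_up(K, T, intervals(T), 1) - 1]

K = list(range(1, 201))
T = [float(k) for k in K]   # the usual case T proportional to K
best = multi_step_method(K, T)
```

As discussed below, the local search stops at a k well below the toy gain's optimum, illustrating the tendency of algorithm S.5 to return a lower k than the global maximum of the differential gains.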

To understand the effective functions maximised by the multi-step method (in the noise-free case), we calculate the multi-shot gains for sequences of approximately equal time as detailed in algorithm S.2. We assume the usual experiment times in sequential metrology experiments, i.e. \mathbf{T}\propto\mathbf{K}, and perform estimation using a total time of \sum_{s}k_{s}\approx 768 starting from a uniform prior. In figure S.3 we then plot the cumulative difference in the expected gain for the next measurement between consecutive k-values:

\sum_{m=2}^{k}\big(\mathrm{gain}(m,j_{m})-\mathrm{gain}(m-1,j_{m-1})\big)\,, \qquad (S.2)

where the values of j_{m} and j_{m-1} for all m=2,\dots,k are determined as in algorithm S.2. The gain difference calculated in (S.2) is adjusted between intervals with multiple k-values by comparing the first k-value of consecutive intervals (as in algorithm S.4). The values plotted for k=1 are set to zero. In this way the plotted values show an effective function that is maximised (locally) by algorithm S.5.

Figure S.3: Differential gains for the multi-step method. Upper (lower) plot: differential multi-shot gains for the expected sharpness (entropy) gain. The expected sharpness becomes flat for larger values of k since measurements with such large k-values would lead to multi-peaked densities for the phase knowledge, giving no increase in sharpness. Since multi-peaked densities still lead to a change in entropy, the differential gains in the lower plot continue to decrease for larger k-values. The intervals determined by algorithm S.1 are indicated by the colours. Black points correspond to intervals containing only one k-value. Intervals with more than one k-value are coloured, with a change in colour indicating the next interval.

We see that the differential gains for both the expected sharpness and entropy gains show a global maximum around k\sim 45. However, the differences in multi-shot gains oscillate because the times of the multi-step sequences are only approximately equal. Due to these oscillations, algorithm S.4 tends to stop at a local maximum. In algorithm S.5 we always start the search at k=1, so the value of k returned is typically much lower than the optimal value suggested by the plotted differential gains. Using the gain rate method to determine the choice of k for this measurement gives 50 (60) to maximise the expected sharpness (entropy) gain rate. More efficient methods to find the global maximum of the plotted functions could potentially also lead to accurate estimation procedures.

S.IV Computational complexity of contractions

In this section we derive an upper bound on the number of coefficients \Gamma needed for the Fourier series of p(\phi) when using the contraction method, under the assumption that p(\phi) is Gaussian. In particular, since p(\phi) is periodic, we assume that it has the form

p(\phi) = \sum_{n=-\infty}^{\infty}\mathrm{e}^{-\frac{1}{2}n^{2}\sigma^{2}}\mathrm{e}^{in(\phi-\mu)} = 1+2\sum_{n=1}^{\infty}\mathrm{e}^{-\frac{1}{2}n^{2}\sigma^{2}}\cos(n(\phi-\mu))\,,

where \mu is the mean phase, and the Holevo variance is V[p(\phi)]=\sigma^{2}. We have written the sum to \infty here; below we consider the number of coefficients \Gamma that would be used in an estimation sequence.

In the following we analyse the case where we perform contractions with m=2, so that after performing c contractions we go from a representation p(\phi) on an interval of size 2\pi to a representation q(\theta) on an interval of size 2\pi/2^{c}. Since p(\phi) is a measure of the probability of the system phase we are trying to estimate, we quantify the probability of the system phase being outside the interval represented by q(\theta) after a single contraction by

\varepsilon = \int_{\mu+\frac{\pi}{2}}^{\mu+\frac{3\pi}{2}}\frac{p(\phi)}{2\pi}\,d\phi
= \frac{1}{2\pi}\int_{\frac{\pi}{2}}^{\frac{3\pi}{2}}\left(1+2\sum_{n=1}^{\infty}\mathrm{e}^{-\frac{1}{2}n^{2}\sigma^{2}}\cos(n\phi)\right)d\phi
= \frac{1}{2}+\frac{1}{\pi}\sum_{n=1}^{\infty}\mathrm{e}^{-\frac{1}{2}n^{2}\sigma^{2}}\int_{\frac{\pi}{2}}^{\frac{3\pi}{2}}\cos(n\phi)\,d\phi
= \frac{1}{2}+\frac{2}{\pi}\sum_{n=0}^{\infty}\left(\frac{\mathrm{e}^{-\frac{1}{2}\sigma^{2}(4n+3)^{2}}}{4n+3}-\frac{\mathrm{e}^{-\frac{1}{2}\sigma^{2}(4n+1)^{2}}}{4n+1}\right)\,,

where in the second line we have used the fact that the result is independent of the mean phase \mu, so that we can, without loss of generality, set \mu to zero. The last line follows from solving the integral and relabelling the index of the sum to include only the odd values of n in the third line (the integral vanishes for even values of n). Although we have not found a closed form for the infinite series, we find that \varepsilon is very close to the corresponding quantity for a non-periodic Gaussian,

\varepsilon \approx 1+\mathrm{erf}\left(\frac{-\pi}{2\sqrt{2}\sigma}\right)\,,

when \sigma is less than \sim\pi/2.
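This agreement can be checked numerically against the series derived above; the number of series terms and the tolerance below are arbitrary choices:

```python
import math

def eps_periodic(sigma, terms=200):
    # Tail probability from the wrapped-Gaussian series derived above.
    s = 0.0
    for n in range(terms):
        s += (math.exp(-0.5 * sigma**2 * (4*n + 3)**2) / (4*n + 3)
              - math.exp(-0.5 * sigma**2 * (4*n + 1)**2) / (4*n + 1))
    return 0.5 + (2 / math.pi) * s

def eps_gauss(sigma):
    # Non-periodic Gaussian tail: 1 + erf(-pi / (2 sqrt(2) sigma)).
    return 1 + math.erf(-math.pi / (2 * math.sqrt(2) * sigma))
```

For example, at \sigma=0.5 both expressions give \varepsilon\approx 1.7\times 10^{-3}, and the difference stays well below 10^{-3} for \sigma up to about 1.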

If we specify an upper bound \varepsilon on the desired probability that the system phase cannot be represented after performing a contraction, then we obtain an upper bound on the standard deviation \sqrt{V[p(\phi)]}=\sigma before we should perform a contraction:

\sigma < \frac{\pi}{2\sqrt{2}\,\mathrm{erf}^{-1}(1-\varepsilon)} = \frac{\pi}{2\sqrt{2}\,\mathrm{erfc}^{-1}(\varepsilon)}\,, \qquad (S.3)

where \mathrm{erfc}^{-1} is the inverse complementary error function. As mentioned in the main text, we expect from Fourier analysis that the number of coefficients \Gamma is inversely proportional to \sqrt{V[p(\phi)]} (and therefore to \Delta\hat{\phi}). We can see this in the case of a Gaussian by assuming that any numerical representation has a finite machine precision. Let \delta be the smallest value that can be represented numerically. Then for the highest-order coefficient that can be represented, we have \mathrm{e}^{-\Gamma^{2}\sigma^{2}/2}=\delta, so

\Gamma = \frac{\sqrt{2\ln(\delta^{-1})}}{\sigma}\,. \qquad (S.4)

Substituting (S.3) into (S.4), we obtain a relation between the probability of an error, \varepsilon, and the number of coefficients:

\Gamma > \frac{4}{\pi}\sqrt{\ln(\delta^{-1})}\,\mathrm{erfc}^{-1}(\varepsilon)\,. \qquad (S.5)
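Since \mathrm{erfc}^{-1} is not in the Python standard library, a simple bisection on math.erfc suffices to evaluate (S.3) and (S.5); the function names below are illustrative:

```python
import math

def erfc_inv(y, lo=1e-12, hi=10.0):
    # Inverse complementary error function by bisection: erfc is
    # strictly decreasing on [0, inf), so the root is bracketed.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid) > y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sigma_max(eps):
    # (S.3): largest standard deviation before a contraction should be
    # performed, for contraction error probability eps.
    return math.pi / (2 * math.sqrt(2) * erfc_inv(eps))

def gamma_min(eps, delta=1e-16):
    # (S.5): corresponding lower bound on the number of coefficients.
    return 4 / math.pi * math.sqrt(math.log(1 / delta)) * erfc_inv(eps)
```

As expected, demanding a smaller error probability lowers the allowed \sigma and raises the required number of coefficients.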

If we now have a p(\phi) that is Gaussian with standard deviation \sigma^{*} given by (S.3), and that requires a representation to machine precision using \Gamma^{*} coefficients as given by (S.5), then the probability of an error after performing one contraction with m=2 will be less than \varepsilon. At this point we would have a density q(\theta) requiring only \Gamma^{*}/2 coefficients, and we could continue estimation until the standard deviation of q(\theta) reaches \sigma^{*}, at which point that of the p(\phi) represented by this contraction would be \sigma^{*}/2. Now q(\theta) would have \Gamma^{*} coefficients and we would need to perform another contraction to avoid increasing \Gamma above \Gamma^{*}. If we perform a second contraction, the total probability that either the first or the second contraction leads to an error is \varepsilon+\varepsilon(1-\varepsilon)<2\varepsilon. To keep the total error probability below the original value of \varepsilon we would then want to use slightly more coefficients to begin with, so that \varepsilon is half as large.

More generally, if we perform a total of c contractions, the total error probability \epsilon is less than c\varepsilon, and from (S.5) we should use a number of coefficients

\Gamma^{*} \equiv \left\lceil\frac{4}{\pi}\sqrt{\ln(\delta^{-1})}\,\mathrm{erfc}^{-1}\left(\frac{\epsilon}{c}\right)\right\rceil\,.

A representation without contractions would require \Gamma=2^{c}\Gamma^{*} coefficients. Let \Gamma_{1} be the number of coefficients required to keep the error probability below \epsilon for a single contraction:

\Gamma_{1} \equiv \left\lceil\frac{4}{\pi}\sqrt{\ln(\delta^{-1})}\,\mathrm{erfc}^{-1}(\epsilon)\right\rceil\,.

We have \Gamma=2^{c}\Gamma^{*}\geq 2^{c}\Gamma_{1}, so c\leq\log_{2}(\Gamma/\Gamma_{1}), and we can write an upper bound on the number of coefficients \Gamma^{*} when using contractions as a function of the number of coefficients \Gamma that would be needed without contractions:

\Gamma^{*} \leq \frac{4}{\pi}\sqrt{\ln(\delta^{-1})}\,\mathrm{erfc}^{-1}\left(\frac{\epsilon}{\log_{2}(\Gamma/\Gamma_{1})}\right)\,. \qquad (S.6)

In figure S.4 we plot the upper bound (S.6) with \epsilon=10^{-10} and \delta=10^{-16}. We see that it scales more slowly than \log(\Gamma), suggesting that we can use a number of coefficients \Gamma^{*}=\mathcal{O}(\log(\Gamma)) while keeping the total probability that an error occurs below \epsilon.

Figure S.4: Plot of the upper bound (S.6) with \epsilon=10^{-10} and \delta=10^{-16}.

To determine the asymptotic complexity of \Gamma^{*} in terms of \Gamma, we make use of an asymptotic expansion of \mathrm{erfc}^{-1}(x) for x\rightarrow 0 [92, 93]:

\mathrm{erfc}^{-1}(x) \sim u^{-1/2}\left(1+u\left(a_{2}u+a_{3}u^{2}+\dots\right)\right)\,, \qquad (S.7)

where

u = -2/\ln\!\big(\pi x^{2}\ln(1/x)\big)\,,
a_{2} = v/8\,, \qquad a_{3} = -(v^{2}+6v-6)/32\,,
v = \ln(\ln(1/x))-2+\ln(\pi)\,.

As x\rightarrow 0, v\rightarrow\infty and u\rightarrow 0, but each term a_{n+1}u^{n} in (S.7) will be \mathcal{O}\left((vu)^{n}\right). Since

\lim_{x\rightarrow 0}vu = 0\,,

we have \mathrm{erfc}^{-1}(x)\rightarrow u^{-1/2} as x\rightarrow 0. Going further, as x\rightarrow 0 we have u^{-1/2}<\sqrt{\ln(1/x)}. Since in (S.6) the argument is x=\epsilon/\log_{2}(\Gamma/\Gamma_{1}), this gives \Gamma^{*}=\mathcal{O}\left(\sqrt{\log(\log(\Gamma))}\right).
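A quick numerical check of the leading-order behaviour (the bisection helper and the 5% tolerance are arbitrary implementation choices):

```python
import math

def erfc_inv(y, lo=1e-12, hi=30.0):
    # erfc^{-1} by bisection (erfc is strictly decreasing on [0, inf)).
    for _ in range(300):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if math.erfc(mid) > y else (lo, mid)
    return 0.5 * (lo + hi)

def leading(x):
    # Leading term u^{-1/2} of the asymptotic expansion (S.7).
    u = -2 / math.log(math.pi * x * x * math.log(1 / x))
    return u ** -0.5
```

The ratio \mathrm{erfc}^{-1}(x)/u^{-1/2} approaches 1 slowly as x\rightarrow 0 (it is already within about 1% at x=10^{-4}).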

We note that the bound (S.6) is of practical interest for determining the value of \sigma that should be reached before performing a contraction with a certain error probability only if p(\phi) is Gaussian. Since we have not shown this for any of the estimation methods studied in the main text, it cannot be used there; in those cases the value of \sigma to be reached before contracting is determined empirically from simulation. The purpose of the bound (S.6) is mainly to determine the potential of the contraction method to reduce computational complexity, rather than to provide a practical way of calculating the required value of \sigma.

S.V Expected entropy gain

In this section we show the relation between the expected entropy gain (8) and the Kullback-Leibler (KL) divergence, and we derive equation (9). Starting from the definition of the expected entropy gain (8), we can rewrite the entropy of the posterior (we sometimes write p_{s}(\phi)=p_{s}(\phi|\xi;\alpha,k) for short) as

H[p_{s}(\phi|\xi;\alpha,k)] = -\int_{0}^{2\pi}\frac{d\phi}{2\pi}p_{s}(\phi)\ln\left(\frac{P_{\xi}(\alpha,k\phi)}{\Pi_{\xi}(\alpha,k)}\right)-\int_{0}^{2\pi}\frac{d\phi}{2\pi}p_{s}(\phi)\ln\left(\frac{p_{s-1}(\phi)}{2\pi}\right)
= -\int_{0}^{2\pi}\frac{d\phi}{2\pi}p_{s}(\phi)\ln\left(\frac{p_{s}(\phi)}{p_{s-1}(\phi)}\right)+H(p_{s},\,p_{s-1})
= -D_{KL}(p_{s}\|p_{s-1})+H(p_{s},\,p_{s-1})\,,

where D_{KL}(p_{s}\|p_{s-1}) is the KL divergence of the posterior from the prior, and H(p_{s},\,p_{s-1}) is the cross entropy. Noting that

\sum_{\xi}\Pi_{\xi}(\alpha,k)\Big(H[p_{s-1}(\phi)]-H\big(p_{s}(\phi|\xi;\alpha,k),\,p_{s-1}(\phi)\big)\Big) = 0\,,

we see that

\Delta_{s}H(\alpha,k) = \sum_{\xi}\Pi_{\xi}(\alpha,k)\,D_{KL}(p_{s}\|p_{s-1})\,,

i.e. the expected entropy gain is equal to the expected KL divergence. We can rewrite this as

\Delta_{s}H(\alpha,k) = \sum_{\xi}\int_{0}^{2\pi}\frac{d\phi}{2\pi}P_{\xi}(\alpha,k\phi)\,p_{s-1}(\phi)\ln\big(P_{\xi}(\alpha,k\phi)\big)-\sum_{\xi}\Pi_{\xi}(\alpha,k)\ln\big(\Pi_{\xi}(\alpha,k)\big)\,.

We would like to rewrite the first term, containing the integral, in a form that can be computed more efficiently.

S.V.1 Solving the integral

We have

\sum_{\xi}\int_{0}^{2\pi}\frac{d\phi}{2\pi}P_{\xi}(\alpha,k\phi)\,p_{s-1}(\phi)\ln\big(P_{\xi}(\alpha,k\phi)\big)
= -\ln(2)+\sum_{\xi}\int_{0}^{2\pi}\frac{d\phi}{2\pi}\,p_{s-1}(\phi)\,\frac{1}{2}\Big(1+\xi\big((1-\lambda)+\lambda\zeta\cos(\alpha-k\phi)\big)\Big)\ln\Big(1+\xi\big((1-\lambda)+\lambda\zeta\cos(\alpha-k\phi)\big)\Big)\,.

Let \gamma=\zeta\cos(\alpha-k\phi) and \beta=(1-\lambda)+\lambda\gamma. Then we have

\sum_{\xi}\int_{0}^{2\pi}\frac{d\phi}{2\pi}P_{\xi}(\alpha,k\phi)\,p_{s-1}(\phi)\ln\big(P_{\xi}(\alpha,k\phi)\big)
= -\ln(2)+\int_{0}^{2\pi}\frac{d\phi}{2\pi}\,p_{s-1}(\phi)\,\frac{1}{2}\bigg[\Big(\ln(1+\beta)+\ln(1-\beta)\Big)+\beta\Big(\ln(1+\beta)-\ln(1-\beta)\Big)\bigg]\,.

We can work out the terms in the last line using the Taylor series of \ln(1\pm\beta) (possible since |\beta|\leq 1):

\ln(1+\beta)+\ln(1-\beta) = -2\sum_{n=1}^{\infty}\frac{\beta^{2n}}{2n}\,,
\ln(1+\beta)-\ln(1-\beta) = 2\sum_{n=1}^{\infty}\frac{\beta^{2n-1}}{2n-1}\,.

So we have

\Big(\ln(1+\beta)+\ln(1-\beta)\Big)+\beta\Big(\ln(1+\beta)-\ln(1-\beta)\Big) = 2\sum_{n=1}^{\infty}\frac{\beta^{2n}}{2n(2n-1)}\,.
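This series identity is easy to verify numerically for |\beta|<1 (the number of terms and the tolerance are arbitrary choices):

```python
import math

def lhs(beta):
    # Left-hand side: the combination of logarithms derived above.
    return (math.log(1 + beta) + math.log(1 - beta)
            + beta * (math.log(1 + beta) - math.log(1 - beta)))

def rhs(beta, terms=400):
    # Right-hand side: the even-power series with coefficients 1/(2n(2n-1)).
    return 2 * sum(beta**(2 * n) / (2 * n * (2 * n - 1))
                   for n in range(1, terms + 1))
```

The agreement holds to high precision well inside the unit interval.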

Using this result, we have

\sum_{\xi}\int_{0}^{2\pi}\frac{d\phi}{2\pi}P_{\xi}(\alpha,k\phi)\,p_{s-1}(\phi)\ln\big(P_{\xi}(\alpha,k\phi)\big) = -\ln(2)+\sum_{n=1}^{\infty}\frac{1}{2n(2n-1)}\int_{0}^{2\pi}\frac{d\phi}{2\pi}\,p_{s-1}(\phi)\,\beta^{2n}\,,

and

\beta^{2n} = \big((1-\lambda)+\lambda\gamma\big)^{2n} = \sum_{u=0}^{2n}\binom{2n}{u}(1-\lambda)^{2n-u}\lambda^{u}\gamma^{u}\,,

so

\sum_{\xi}\int_{0}^{2\pi}\frac{d\phi}{2\pi}P_{\xi}(\alpha,k\phi)\,p_{s-1}(\phi)\ln\big(P_{\xi}(\alpha,k\phi)\big)
= -\ln(2)+\sum_{n=1}^{\infty}\frac{1}{2n(2n-1)}\sum_{u=0}^{2n}\binom{2n}{u}(1-\lambda)^{2n-u}\lambda^{u}\zeta^{u}\int_{0}^{2\pi}\frac{d\phi}{2\pi}\,p_{s-1}(\phi)\cos^{u}(\alpha-k\phi)\,.

We can rewrite the integral term as

\int_{0}^{2\pi}\frac{d\phi}{2\pi}\,p_{s-1}(\phi)\cos^{u}(\alpha-k\phi) = \frac{1}{2^{u}}\sum_{q=0}^{u}\binom{u}{q}\mathrm{e}^{i\alpha(u-2q)}c_{k(u-2q)}^{(s-1)}\,,

so

\sum_{\xi}\int_{0}^{2\pi}\frac{d\phi}{2\pi}P_{\xi}(\alpha,k\phi)\,p_{s-1}(\phi)\ln\big(P_{\xi}(\alpha,k\phi)\big)
= -\ln(2)+\sum_{n=1}^{\infty}\frac{1}{2n(2n-1)}\sum_{u=0}^{2n}\binom{2n}{u}(1-\lambda)^{2n-u}\left(\frac{\lambda\zeta}{2}\right)^{u}\sum_{q=0}^{u}\binom{u}{q}\mathrm{e}^{i\alpha(u-2q)}c_{k(u-2q)}^{(s-1)}\,. \qquad (S.8)

That solves the integral, but we are left with infinite series that are not very practical for computation. Ideally we would like an expression that is a sum over the coefficients c_{n}^{(s-1)} of the prior, but without any other infinite series.

S.V.2 Simplifying and solving the series

Using some results from [95], we find the following series solutions:

\sum_{n=1}^{\infty}\frac{1}{n(2n-1)}\binom{2n}{n}x^{n} = \frac{8x}{1+\sqrt{1-4x}}+2\ln(1+\sqrt{1-4x})-2\ln(2)\,, \qquad (S.9)
\sum_{n=m}^{\infty}\frac{1}{n(2n-1)}\binom{2n}{n-m}x^{n} = \left(\frac{1+2m\sqrt{1-4x}}{m(4m^{2}-1)}\right)\left(\frac{4x}{(1+\sqrt{1-4x})^{2}}\right)^{m}\,, \qquad (S.10)
\sum_{v=m}^{\infty}\frac{1}{2(v-1)(2v-1)}\binom{2v-1}{v-m}x^{2v-1} = \frac{x}{2}\left(\frac{1+(2m-1)\sqrt{1-4x^{2}}}{(m-1)m(2m-1)(1+\sqrt{1-4x^{2}})}\right)\left(\frac{4x^{2}}{(1+\sqrt{1-4x^{2}})^{2}}\right)^{m-1}\,, \qquad (S.11)
\sum_{v=2}^{\infty}\frac{1}{2(v-1)(2v-1)}\binom{2v-1}{v-1}x^{2v-1} = \frac{x}{2}\left(1+2\ln(2)-2\ln(1+\sqrt{1-4x^{2}})-\frac{2}{1+\sqrt{1-4x^{2}}}\right)\,, \qquad (S.12)

which will be used in the derivation below.
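As a quick numerical sanity check (not part of the derivation), the closed forms (S.9) and (S.10) can be compared against truncated partial sums; the helper names below are illustrative.

```python
import math

def lhs_S9(x, terms=200):
    # Truncated sum_{n>=1} C(2n, n) x^n / (n (2n-1)); converges for 0 <= x < 1/4
    return sum(math.comb(2*n, n)*x**n/(n*(2*n - 1)) for n in range(1, terms + 1))

def rhs_S9(x):
    r = math.sqrt(1 - 4*x)
    return 8*x/(1 + r) + 2*math.log(1 + r) - 2*math.log(2)

def lhs_S10(x, m, terms=200):
    # Truncated sum_{n>=m} C(2n, n-m) x^n / (n (2n-1))
    return sum(math.comb(2*n, n - m)*x**n/(n*(2*n - 1)) for n in range(m, m + terms))

def rhs_S10(x, m):
    r = math.sqrt(1 - 4*x)
    return (1 + 2*m*r)/(m*(4*m**2 - 1)) * (4*x/(1 + r)**2)**m

assert abs(lhs_S9(0.2) - rhs_S9(0.2)) < 1e-12
assert abs(lhs_S10(0.2, 3) - rhs_S10(0.2, 3)) < 1e-12
```

The checks for (S.11) and (S.12) follow the same pattern.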

We rewrite the sum over q in (S.8) separately for the cases where u is even or odd. If u is even, we have

q=0u(uq)eiα(u2q)ck(u2q)(s1)\displaystyle\sum_{q=0}^{u}\begin{pmatrix}u\\ q\end{pmatrix}\mathrm{e}^{i\alpha(u-2q)}c_{k(u-2q)}^{(s-1)} =m=u/2u/2(uu2m)eiα2mck2m(s1).\displaystyle=\sum_{m=-u/2}^{u/2}\begin{pmatrix}u\\ \frac{u}{2}-m\end{pmatrix}\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\,.

For every term in the sum with index m > 0 there is a corresponding term with index −m, and there is a single term with m = 0. So we can rewrite the sum as

(uu2)c0(s1)+m=1u/2((uu2m)eiα2mck2m(s1)+(uu2+m)eiα2mck2m(s1)).\displaystyle\begin{pmatrix}u\\ \frac{u}{2}\end{pmatrix}c_{0}^{(s-1)}+\sum_{m=1}^{u/2}\Bigg{(}\begin{pmatrix}u\\ \frac{u}{2}-m\end{pmatrix}\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}+\begin{pmatrix}u\\ \frac{u}{2}+m\end{pmatrix}\mathrm{e}^{-i\alpha 2m}c_{-k2m}^{(s-1)}\Bigg{)}\,.

We can use the property of binomial coefficients:

(nk)\displaystyle\begin{pmatrix}n\\ k\end{pmatrix} =(nnk), so we have (uu2m)=(uu2+m).\displaystyle=\begin{pmatrix}n\\ n-k\end{pmatrix}\text{, so we have }\begin{pmatrix}u\\ \frac{u}{2}-m\end{pmatrix}=\begin{pmatrix}u\\ \frac{u}{2}+m\end{pmatrix}\,.

Also noting that c_0^{(s-1)} = 1 (as required by normalisation of p_{s-1}(ϕ)) and that c_{-n}^{(s-1)} is the complex conjugate of c_n^{(s-1)} because p_{s-1}(ϕ) is real, we can rewrite the sum as

(uu2)+2m=1u/2(uu2m)𝔢{eiα2mck2m(s1)}.\displaystyle\begin{pmatrix}u\\ \frac{u}{2}\end{pmatrix}+2\sum_{m=1}^{u/2}\begin{pmatrix}u\\ \frac{u}{2}-m\end{pmatrix}\mathfrak{Re}\Big{\{}\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\Big{\}}\,.

Similarly, if u is odd, we can write

q=0u(uq)eiα(u2q)ck(u2q)(s1)\displaystyle\sum_{q=0}^{u}\begin{pmatrix}u\\ q\end{pmatrix}\mathrm{e}^{i\alpha(u-2q)}c_{k(u-2q)}^{(s-1)} =2m=1(u+1)/2(uu+12m)𝔢{eiα(2m1)ck(2m1)(s1)}.\displaystyle=2\sum_{m=1}^{(u+1)/2}\begin{pmatrix}u\\ \frac{u+1}{2}-m\end{pmatrix}\mathfrak{Re}\Big{\{}\mathrm{e}^{i\alpha(2m-1)}c_{k(2m-1)}^{(s-1)}\Big{\}}\,.
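The even- and odd-u reductions above rely on the conjugate symmetry c_{-n}^{(s-1)} = conj(c_n^{(s-1)}), which holds because p_{s-1}(ϕ) is real. As a numerical sanity check (not part of the derivation), both reductions can be verified with illustrative random coefficients:

```python
import cmath
import math
import random

random.seed(0)
alpha, k = 0.7, 2

# Illustrative prior Fourier coefficients with c_0 = 1 and c_{-n} = conj(c_n),
# as required for a real probability density p_{s-1}(phi)
c = {0: 1.0}
for n in range(1, 31):
    c[n] = complex(random.uniform(-0.5, 0.5), random.uniform(-0.5, 0.5))
    c[-n] = c[n].conjugate()

def q_sum(u):
    # Left-hand side: sum_{q=0}^{u} C(u, q) e^{i alpha (u - 2q)} c_{k(u - 2q)}
    return sum(math.comb(u, q)*cmath.exp(1j*alpha*(u - 2*q))*c.get(k*(u - 2*q), 0)
               for q in range(u + 1))

def even_form(u):
    # Reduction for even u
    return math.comb(u, u//2)*c[0] + 2*sum(
        math.comb(u, u//2 - m)*(cmath.exp(2j*m*alpha)*c.get(2*m*k, 0)).real
        for m in range(1, u//2 + 1))

def odd_form(u):
    # Reduction for odd u
    return 2*sum(
        math.comb(u, (u + 1)//2 - m)*(cmath.exp(1j*(2*m - 1)*alpha)*c.get((2*m - 1)*k, 0)).real
        for m in range(1, (u + 1)//2 + 1))

assert abs(q_sum(4) - even_form(4)) < 1e-12
assert abs(q_sum(5) - odd_form(5)) < 1e-12
```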

Now let

f(u)q=0u(uq)eiα(u2q)ck(u2q)(s1),\displaystyle f(u)\equiv\sum_{q=0}^{u}\begin{pmatrix}u\\ q\end{pmatrix}\mathrm{e}^{i\alpha(u-2q)}c_{k(u-2q)}^{(s-1)}\,,

to simplify the notation. Splitting the sum over u into even and odd parts, we have

u=02n(2nu)(1λ)2nu(λζ2)uf(u)\displaystyle\sum_{u=0}^{2n}\begin{pmatrix}2n\\ u\end{pmatrix}(1-\lambda)^{2n-u}\left(\frac{\lambda\zeta}{2}\right)^{u}f(u) =u=0n(2n2u)(1λ)2(nu)(λζ2)2uf(2u)\displaystyle=\sum_{u=0}^{n}\begin{pmatrix}2n\\ 2u\end{pmatrix}(1-\lambda)^{2(n-u)}\left(\frac{\lambda\zeta}{2}\right)^{2u}f(2u)
+v=1n(2n2v1)(1λ)2(nv)+1(λζ2)2v1f(2v1).\displaystyle+\sum_{v=1}^{n}\begin{pmatrix}2n\\ 2v-1\end{pmatrix}(1-\lambda)^{2(n-v)+1}\left(\frac{\lambda\zeta}{2}\right)^{2v-1}f(2v-1)\,.

Now we consider the even part of the sum:

u=0n(2n2u)(1λ)2(nu)(λζ2)2uf(2u)\displaystyle\sum_{u=0}^{n}\begin{pmatrix}2n\\ 2u\end{pmatrix}(1-\lambda)^{2(n-u)}\left(\frac{\lambda\zeta}{2}\right)^{2u}f(2u)
=(1λ)2n+u=1n(2n2u)(1λ)2(nu)(λζ2)2u(2uu)\displaystyle=(1-\lambda)^{2n}+\sum_{u=1}^{n}\begin{pmatrix}2n\\ 2u\end{pmatrix}(1-\lambda)^{2(n-u)}\left(\frac{\lambda\zeta}{2}\right)^{2u}\begin{pmatrix}2u\\ u\end{pmatrix}
+2u=1n(2n2u)(1λ)2(nu)(λζ2)2um=1u(2uum)𝔢{eiα2mck2m(s1)}.\displaystyle+2\sum_{u=1}^{n}\begin{pmatrix}2n\\ 2u\end{pmatrix}(1-\lambda)^{2(n-u)}\left(\frac{\lambda\zeta}{2}\right)^{2u}\sum_{m=1}^{u}\begin{pmatrix}2u\\ u-m\end{pmatrix}\mathfrak{Re}\Big{\{}\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\Big{\}}\,.

For the odd part we have

v=1n(2n2v1)(1λ)2(nv)+1(λζ2)2v1f(2v1)\displaystyle\sum_{v=1}^{n}\begin{pmatrix}2n\\ 2v-1\end{pmatrix}(1-\lambda)^{2(n-v)+1}\left(\frac{\lambda\zeta}{2}\right)^{2v-1}f(2v-1)
=2v=1n(2n2v1)(1λ)2(nv)+1(λζ2)2v1m=1v(2v1vm)𝔢{eiα(2m1)ck(2m1)(s1)}.\displaystyle=2\sum_{v=1}^{n}\begin{pmatrix}2n\\ 2v-1\end{pmatrix}(1-\lambda)^{2(n-v)+1}\left(\frac{\lambda\zeta}{2}\right)^{2v-1}\sum_{m=1}^{v}\begin{pmatrix}2v-1\\ v-m\end{pmatrix}\mathfrak{Re}\Big{\{}\mathrm{e}^{i\alpha(2m-1)}c_{k(2m-1)}^{(s-1)}\Big{\}}\,.

Combining everything, we have

ξ02πdϕ2πPξ(α,kϕ)ps1(ϕ)ln(Pξ(α,kϕ))\displaystyle\sum_{\xi}\int_{0}^{2\pi}\frac{d\phi}{2\pi}P_{\xi}(\alpha,k\phi)p_{s-1}(\phi)\ln\big{(}P_{\xi}(\alpha,k\phi)\big{)}
=ln(2)+n=1(12n(2n1))[(1λ)2n+u=1n(2n2u)(1λ)2(nu)(λζ2)2u(2uu)\displaystyle=-\ln(2)+\sum_{n=1}^{\infty}\left(\frac{1}{2n(2n-1)}\right)\Bigg{[}(1-\lambda)^{2n}+\sum_{u=1}^{n}\begin{pmatrix}2n\\ 2u\end{pmatrix}(1-\lambda)^{2(n-u)}\left(\frac{\lambda\zeta}{2}\right)^{2u}\begin{pmatrix}2u\\ u\end{pmatrix}
+2u=1n(2n2u)(1λ)2(nu)(λζ2)2um=1u(2uum)𝔢{eiα2mck2m(s1)}\displaystyle+2\sum_{u=1}^{n}\begin{pmatrix}2n\\ 2u\end{pmatrix}(1-\lambda)^{2(n-u)}\left(\frac{\lambda\zeta}{2}\right)^{2u}\sum_{m=1}^{u}\begin{pmatrix}2u\\ u-m\end{pmatrix}\mathfrak{Re}\Big{\{}\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\Big{\}}
+2v=1n(2n2v1)(1λ)2(nv)+1(λζ2)2v1m=1v(2v1vm)𝔢{eiα(2m1)ck(2m1)(s1)}].\displaystyle+2\sum_{v=1}^{n}\begin{pmatrix}2n\\ 2v-1\end{pmatrix}(1-\lambda)^{2(n-v)+1}\left(\frac{\lambda\zeta}{2}\right)^{2v-1}\sum_{m=1}^{v}\begin{pmatrix}2v-1\\ v-m\end{pmatrix}\mathfrak{Re}\Big{\{}\mathrm{e}^{i\alpha(2m-1)}c_{k(2m-1)}^{(s-1)}\Big{\}}\Bigg{]}\,. (S.13)

There are now four terms in the infinite sum over n. The first is

n=1(12n(2n1))(1λ)2n\displaystyle\sum_{n=1}^{\infty}\left(\frac{1}{2n(2n-1)}\right)(1-\lambda)^{2n} =12ln(1(1λ)2)+(1λ)tanh1(1λ).\displaystyle=\frac{1}{2}\ln\left(1-(1-\lambda)^{2}\right)+(1-\lambda)\tanh^{-1}(1-\lambda)\,. (S.14)
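This closed form follows from the partial-fraction split 1/(2n(2n−1)) = 1/(2n−1) − 1/(2n), and can be checked numerically (a sketch with illustrative names):

```python
import math

def series_S14(lam, terms=400):
    # Truncation of sum_{n>=1} (1 - lam)^(2n) / (2n (2n-1))
    y = 1 - lam
    return sum(y**(2*n)/(2*n*(2*n - 1)) for n in range(1, terms + 1))

def closed_S14(lam):
    # Right-hand side of (S.14)
    y = 1 - lam
    return 0.5*math.log(1 - y**2) + y*math.atanh(y)

assert abs(series_S14(0.3) - closed_S14(0.3)) < 1e-12
```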

The second is

n=1(12n(2n1))u=1n(2n2u)(1λ)2(nu)(λζ2)2u(2uu)\displaystyle\sum_{n=1}^{\infty}\left(\frac{1}{2n(2n-1)}\right)\sum_{u=1}^{n}\begin{pmatrix}2n\\ 2u\end{pmatrix}(1-\lambda)^{2(n-u)}\left(\frac{\lambda\zeta}{2}\right)^{2u}\begin{pmatrix}2u\\ u\end{pmatrix}
=m=1(λζ2)2m(2mm)(1λ)2mn=m(12n(2n1))(2n2m)(1λ)2n.\displaystyle=\sum_{m=1}^{\infty}\left(\frac{\lambda\zeta}{2}\right)^{2m}\begin{pmatrix}2m\\ m\end{pmatrix}(1-\lambda)^{-2m}\sum_{n=m}^{\infty}\left(\frac{1}{2n(2n-1)}\right)\begin{pmatrix}2n\\ 2m\end{pmatrix}(1-\lambda)^{2n}\,.

Using the solution to the infinite series:

n=mx2n2n(2n1)(2n2m)\displaystyle\sum_{n=m}^{\infty}\frac{x^{2n}}{2n(2n-1)}\begin{pmatrix}2n\\ 2m\end{pmatrix} =x2m4m(2m1)((1+x)(1+x)2m+(1x)(1x)2m),\displaystyle=\frac{x^{2m}}{4m(2m-1)}\Bigg{(}\frac{(1+x)}{(1+x)^{2m}}+\frac{(1-x)}{(1-x)^{2m}}\Bigg{)}\,, (S.15)

we can rewrite the second term as

(2λ)4m=1(1m(2m1))(2mm)(λζ2(2λ))2m\displaystyle\frac{(2-\lambda)}{4}\sum_{m=1}^{\infty}\left(\frac{1}{m(2m-1)}\right)\begin{pmatrix}2m\\ m\end{pmatrix}\left(\frac{\lambda\zeta}{2(2-\lambda)}\right)^{2m}
+λ4m=1(1m(2m1))(2mm)(ζ2)2m.\displaystyle+\frac{\lambda}{4}\sum_{m=1}^{\infty}\left(\frac{1}{m(2m-1)}\right)\begin{pmatrix}2m\\ m\end{pmatrix}\left(\frac{\zeta}{2}\right)^{2m}\,.

Using the result (S.9) we find this is equal to

(1λ2)F(δ)+λ2F(ζ)ln(2),\displaystyle\left(1-\frac{\lambda}{2}\right)F(\delta)+\frac{\lambda}{2}F(\zeta)-\ln(2)\,, (S.16)

where we defined

δλζ2λ,F(x)x2g1(x)+ln(g1(x)),g1(x)1+g0(x),g0(x)1x2.\displaystyle\delta\equiv\frac{\lambda\zeta}{2-\lambda},\qquad F(x)\equiv\frac{x^{2}}{g_{1}(x)}+\ln\left(g_{1}(x)\right),\qquad g_{1}(x)\equiv 1+g_{0}(x),\qquad g_{0}(x)\equiv\sqrt{1-x^{2}}\,. (S.17)
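The chain of steps leading to (S.16) can be verified numerically against the original double series (the second term in the n-sum of (S.13)); the code below is a sketch with illustrative names:

```python
import math

def g0(x): return math.sqrt(1 - x**2)
def F(x):
    # F as defined in (S.17)
    g1 = 1 + g0(x)
    return x**2/g1 + math.log(g1)

def second_term_direct(lam, zeta, N=250):
    # Truncated double series: the second term in the sum over n in (S.13)
    return sum(
        1/(2*n*(2*n - 1)) * sum(
            math.comb(2*n, 2*u)*(1 - lam)**(2*(n - u))
            * (lam*zeta/2)**(2*u)*math.comb(2*u, u)
            for u in range(1, n + 1))
        for n in range(1, N + 1))

def second_term_closed(lam, zeta):
    # Closed form (S.16)
    delta = lam*zeta/(2 - lam)
    return (1 - lam/2)*F(delta) + (lam/2)*F(zeta) - math.log(2)

assert abs(second_term_direct(0.8, 0.9) - second_term_closed(0.8, 0.9)) < 1e-9
```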

Next we consider the third term in the infinite sum in (S.13):

n=1(1n(2n1))u=1n(2n2u)(1λ)2(nu)(λζ2)2um=1u(2uum)𝔢{eiα2mck2m(s1)}\displaystyle\sum_{n=1}^{\infty}\left(\frac{1}{n(2n-1)}\right)\sum_{u=1}^{n}\begin{pmatrix}2n\\ 2u\end{pmatrix}(1-\lambda)^{2(n-u)}\left(\frac{\lambda\zeta}{2}\right)^{2u}\sum_{m=1}^{u}\begin{pmatrix}2u\\ u-m\end{pmatrix}\mathfrak{Re}\Big{\{}\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\Big{\}}
=m=1𝔢{eiα2mck2m(s1)}u=m(λζ2)2u(2uum)(1λ)2un=u(1n(2n1))(2n2u)(1λ)2n.\displaystyle=\sum_{m=1}^{\infty}\mathfrak{Re}\Big{\{}\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\Big{\}}\sum_{u=m}^{\infty}\left(\frac{\lambda\zeta}{2}\right)^{2u}\begin{pmatrix}2u\\ u-m\end{pmatrix}(1-\lambda)^{-2u}\sum_{n=u}^{\infty}\left(\frac{1}{n(2n-1)}\right)\begin{pmatrix}2n\\ 2u\end{pmatrix}(1-\lambda)^{2n}\,.

The last series over n is twice the series given in (S.15), so we have

u=m(λζ2)2u(2uum)(1λ)2un=u(1n(2n1))(2n2u)(1λ)2n\displaystyle\sum_{u=m}^{\infty}\left(\frac{\lambda\zeta}{2}\right)^{2u}\begin{pmatrix}2u\\ u-m\end{pmatrix}(1-\lambda)^{-2u}\sum_{n=u}^{\infty}\left(\frac{1}{n(2n-1)}\right)\begin{pmatrix}2n\\ 2u\end{pmatrix}(1-\lambda)^{2n}
=(2λ)2u=m(1u(2u1))(2uum)(λζ2(2λ))2u\displaystyle=\frac{(2-\lambda)}{2}\sum_{u=m}^{\infty}\left(\frac{1}{u(2u-1)}\right)\begin{pmatrix}2u\\ u-m\end{pmatrix}\left(\frac{\lambda\zeta}{2(2-\lambda)}\right)^{2u}
+λ2u=m(1u(2u1))(2uum)(ζ2)2u.\displaystyle+\frac{\lambda}{2}\sum_{u=m}^{\infty}\left(\frac{1}{u(2u-1)}\right)\begin{pmatrix}2u\\ u-m\end{pmatrix}\left(\frac{\zeta}{2}\right)^{2u}\,.

Using (S.10) we find this is equal to

(1λ2)G(δ,m)+λ2G(ζ,m),\displaystyle\left(1-\frac{\lambda}{2}\right)G(\delta,m)+\frac{\lambda}{2}G(\zeta,m)\,, (S.18)

where we defined

G(x,m)(1+2mg0(x)m(4m21))(xg1(x))2m.\displaystyle G(x,m)\equiv\left(\frac{1+2mg_{0}(x)}{m(4m^{2}-1)}\right)\left(\frac{x}{g_{1}(x)}\right)^{2m}\,. (S.19)
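For a fixed m, the result (S.18) can likewise be checked against the double series it replaces (here with the factor (1−λ)^{−2u} folded into the inner sum to avoid floating-point overflow); a numerical sketch:

```python
import math

def g0(x): return math.sqrt(1 - x**2)
def G(x, m):
    # G as defined in (S.19)
    g1 = 1 + g0(x)
    return (1 + 2*m*g0(x))/(m*(4*m**2 - 1)) * (x/g1)**(2*m)

def third_inner_direct(lam, zeta, m, U=120, N=120):
    # Truncated double series multiplying Re{e^{i 2 m alpha} c_{2mk}} in the third term
    total = 0.0
    for u in range(m, U):
        inner = sum((1 - lam)**(2*(n - u))*math.comb(2*n, 2*u)/(n*(2*n - 1))
                    for n in range(u, u + N))
        total += (lam*zeta/2)**(2*u)*math.comb(2*u, u - m)*inner
    return total

def third_inner_closed(lam, zeta, m):
    # Closed form (S.18)
    delta = lam*zeta/(2 - lam)
    return (1 - lam/2)*G(delta, m) + (lam/2)*G(zeta, m)

assert abs(third_inner_direct(0.8, 0.9, 2) - third_inner_closed(0.8, 0.9, 2)) < 1e-9
```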

Lastly, we consider the fourth term in the infinite sum in (S.13):

n=1(1n(2n1))v=1n(2n2v1)(1λ)2(nv)+1(λζ2)2v1m=1v(2v1vm)𝔢{eiα(2m1)ck(2m1)(s1)}\displaystyle\sum_{n=1}^{\infty}\left(\frac{1}{n(2n-1)}\right)\sum_{v=1}^{n}\begin{pmatrix}2n\\ 2v-1\end{pmatrix}(1-\lambda)^{2(n-v)+1}\left(\frac{\lambda\zeta}{2}\right)^{2v-1}\sum_{m=1}^{v}\begin{pmatrix}2v-1\\ v-m\end{pmatrix}\mathfrak{Re}\Big{\{}\mathrm{e}^{i\alpha(2m-1)}c_{k(2m-1)}^{(s-1)}\Big{\}}
=m=1𝔢{eiα(2m1)ck(2m1)(s1)}v=m(2v1vm)(λζ2)2v1\displaystyle=\sum_{m=1}^{\infty}\mathfrak{Re}\Big{\{}\mathrm{e}^{i\alpha(2m-1)}c_{k(2m-1)}^{(s-1)}\Big{\}}\sum_{v=m}^{\infty}\begin{pmatrix}2v-1\\ v-m\end{pmatrix}\left(\frac{\lambda\zeta}{2}\right)^{2v-1}
(1λ)(2v1)n=v(1n(2n1))(2n2v1)(1λ)2n.\displaystyle\qquad(1-\lambda)^{-(2v-1)}\sum_{n=v}^{\infty}\left(\frac{1}{n(2n-1)}\right)\begin{pmatrix}2n\\ 2v-1\end{pmatrix}(1-\lambda)^{2n}\,.

Here the last series over n is given by the following expression for m > 1 (the m = 1 case is treated separately below):

n=m(x2nn(2n1))(2n2m1)\displaystyle\sum_{n=m}^{\infty}\left(\frac{x^{2n}}{n(2n-1)}\right)\begin{pmatrix}2n\\ 2m-1\end{pmatrix} =(x2m12(m1)(2m1))((1x)(1x)2m1(1+x)(1+x)2m1),\displaystyle=\left(\frac{x^{2m-1}}{2(m-1)(2m-1)}\right)\left(\frac{(1-x)}{(1-x)^{2m-1}}-\frac{(1+x)}{(1+x)^{2m-1}}\right)\,,

and for m = 1

n=m(x2nn(2n1))(2n2m1)=n=1x2nn(2n1)2n=2n=1x2n(2n1)=2xtanh1(x),\displaystyle\sum_{n=m}^{\infty}\left(\frac{x^{2n}}{n(2n-1)}\right)\begin{pmatrix}2n\\ 2m-1\end{pmatrix}=\sum_{n=1}^{\infty}\frac{x^{2n}}{n(2n-1)}2n=2\sum_{n=1}^{\infty}\frac{x^{2n}}{(2n-1)}=2x\tanh^{-1}(x)\,,

so we have (for m > 1)

v=m(2v1vm)(λζ2)2v1(1λ)(2v1)n=v(1n(2n1))(2n2v1)(1λ)2n\displaystyle\sum_{v=m}^{\infty}\begin{pmatrix}2v-1\\ v-m\end{pmatrix}\left(\frac{\lambda\zeta}{2}\right)^{2v-1}(1-\lambda)^{-(2v-1)}\sum_{n=v}^{\infty}\left(\frac{1}{n(2n-1)}\right)\begin{pmatrix}2n\\ 2v-1\end{pmatrix}(1-\lambda)^{2n}
=λv=m(12(v1)(2v1))(2v1vm)(ζ2)2v1\displaystyle=\lambda\sum_{v=m}^{\infty}\left(\frac{1}{2(v-1)(2v-1)}\right)\begin{pmatrix}2v-1\\ v-m\end{pmatrix}\left(\frac{\zeta}{2}\right)^{2v-1}
(2λ)v=m(12(v1)(2v1))(2v1vm)(λζ2(2λ))2v1.\displaystyle-(2-\lambda)\sum_{v=m}^{\infty}\left(\frac{1}{2(v-1)(2v-1)}\right)\begin{pmatrix}2v-1\\ v-m\end{pmatrix}\left(\frac{\lambda\zeta}{2(2-\lambda)}\right)^{2v-1}\,.

Using (S.11) we find this is equal to

λζ4(J(ζ,m)J(δ,m)),\displaystyle\frac{\lambda\zeta}{4}\left(J(\zeta,m)-J(\delta,m)\right)\,, (S.20)

where we defined

J(x,m)(1+(2m1)g0(x)(m1)m(2m1)g1(x))(xg1(x))2(m1).\displaystyle J(x,m)\equiv\left(\frac{1+(2m-1)g_{0}(x)}{(m-1)m(2m-1)g_{1}(x)}\right)\left(\frac{x}{g_{1}(x)}\right)^{2(m-1)}\,. (S.21)
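Both closed forms used above for the series over n containing C(2n, 2m−1) (the m > 1 expression and the m = 1 case 2x·artanh(x)) can be checked numerically; a sketch with illustrative names:

```python
import math

def lhs_odd(x, m, terms=300):
    # Truncated sum_{n>=m} x^(2n) C(2n, 2m-1) / (n (2n-1))
    return sum(x**(2*n)*math.comb(2*n, 2*m - 1)/(n*(2*n - 1))
               for n in range(m, m + terms))

def rhs_odd(x, m):
    # Closed forms given above, for m = 1 and m > 1
    if m == 1:
        return 2*x*math.atanh(x)
    return x**(2*m - 1)/(2*(m - 1)*(2*m - 1)) * ((1 - x)**(2 - 2*m) - (1 + x)**(2 - 2*m))

assert abs(lhs_odd(0.5, 3) - rhs_odd(0.5, 3)) < 1e-10
assert abs(lhs_odd(0.5, 1) - rhs_odd(0.5, 1)) < 1e-10
```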

Finally, for the m = 1 case in the fourth term in (S.13), we have

v=m(2v1vm)(λζ2)2v1(1λ)(2v1)n=v(1n(2n1))(2n2v1)(1λ)2n\displaystyle\sum_{v=m}^{\infty}\begin{pmatrix}2v-1\\ v-m\end{pmatrix}\left(\frac{\lambda\zeta}{2}\right)^{2v-1}(1-\lambda)^{-(2v-1)}\sum_{n=v}^{\infty}\left(\frac{1}{n(2n-1)}\right)\begin{pmatrix}2n\\ 2v-1\end{pmatrix}(1-\lambda)^{2n}
=λζtanh1(1λ)\displaystyle=\lambda\zeta\tanh^{-1}(1-\lambda)
+v=2(12(v1)(2v1))(2v1v1)(λ(ζ2)2v1(2λ)(λζ2(2λ))2v1).\displaystyle+\sum_{v=2}^{\infty}\left(\frac{1}{2(v-1)(2v-1)}\right)\begin{pmatrix}2v-1\\ v-1\end{pmatrix}\left(\lambda\left(\frac{\zeta}{2}\right)^{2v-1}-(2-\lambda)\left(\frac{\lambda\zeta}{2(2-\lambda)}\right)^{2v-1}\right)\,.

Using (S.12) we can rewrite this as

λζ(tanh1(1λ)+L(δ)L(ζ)2),\displaystyle\lambda\zeta\left(\tanh^{-1}(1-\lambda)+\frac{L(\delta)-L(\zeta)}{2}\right)\,, (S.22)

where we defined

L(x)1g1(x)+ln(g1(x))\displaystyle L(x)\equiv\frac{1}{g_{1}(x)}+\ln\left(g_{1}(x)\right) (S.23)

S.V.3 Summary

Substituting the results and definitions (S.14), (S.16), (S.17), (S.18), (S.19), (S.20), (S.21), (S.22), and (S.23) into (S.13), we have

ξ02πdϕ2πPξ(α,kϕ)ps1(ϕ)ln(Pξ(α,kϕ))\displaystyle\sum_{\xi}\int_{0}^{2\pi}\frac{d\phi}{2\pi}P_{\xi}(\alpha,k\phi)p_{s-1}(\phi)\ln\big{(}P_{\xi}(\alpha,k\phi)\big{)}
=ln(2)+12ln(1(1λ)2)+(1λ)tanh1(1λ)+(1λ2)F(δ)+λ2F(ζ)ln(2)\displaystyle=-\ln(2)+\frac{1}{2}\ln\left(1-(1-\lambda)^{2}\right)+(1-\lambda)\tanh^{-1}(1-\lambda)+\left(1-\frac{\lambda}{2}\right)F(\delta)+\frac{\lambda}{2}F(\zeta)-\ln(2)
+m=1Am𝔢{eiα2mck2m(s1)}+Bm𝔢{eiα(2m1)ck(2m1)(s1)},\displaystyle+\sum_{m=1}^{\infty}A_{m}\mathfrak{Re}\left\{\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\right\}+B_{m}\mathfrak{Re}\left\{\mathrm{e}^{i\alpha(2m-1)}c_{k(2m-1)}^{(s-1)}\right\}\,,

where

Am\displaystyle A_{m} =(1λ2)G(δ,m)+λ2G(ζ,m)\displaystyle=\left(1-\frac{\lambda}{2}\right)G(\delta,m)+\frac{\lambda}{2}G(\zeta,m)
Bm\displaystyle B_{m} ={λζ2(ln(1λ2)+L(δ)ln(λ2)L(ζ)),if m=1λζ4(J(ζ,m)J(δ,m)),otherwise\displaystyle=\begin{cases}\frac{\lambda\zeta}{2}\left(\ln\left(1-\frac{\lambda}{2}\right)+L(\delta)-\ln\left(\frac{\lambda}{2}\right)-L(\zeta)\right)\,,&\text{if }m=1\\ \frac{\lambda\zeta}{4}\left(J(\zeta,m)-J(\delta,m)\right)\,,&\text{otherwise}\end{cases}

and the expected entropy gain (9) is

ΔsH(α,k)\displaystyle\Delta_{s}H(\alpha,\,k) =2ln(2)+12ln(1(1λ)2)+(1λ)2ln(2λλ)+(1λ2)F(δ)+λ2F(ζ)\displaystyle=-2\ln(2)+\frac{1}{2}\ln\left(1-(1-\lambda)^{2}\right)+\frac{(1-\lambda)}{2}\ln\left(\frac{2-\lambda}{\lambda}\right)+\left(1-\frac{\lambda}{2}\right)F(\delta)+\frac{\lambda}{2}F(\zeta)
+m=1(Am𝔢{eiα2mck2m(s1)}+Bm𝔢{eiα(2m1)ck(2m1)(s1)})ξΠξ(α,k)ln(Πξ(α,k)).\displaystyle+\sum_{m=1}^{\infty}\Big{(}A_{m}\mathfrak{Re}\left\{\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\right\}+B_{m}\mathfrak{Re}\left\{\mathrm{e}^{i\alpha(2m-1)}c_{k(2m-1)}^{(s-1)}\right\}\Big{)}-\sum_{\xi}\Pi_{\xi}(\alpha,\,k)\ln\left(\Pi_{\xi}(\alpha,\,k)\right)\,. (S.24)
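For completeness, (S.24) can be evaluated directly, truncating the m-sum once the prior coefficients have decayed. The function below is a sketch, not a reference implementation: `c` is a dictionary of prior Fourier coefficients c_n^{(s-1)} (with c[0] = 1 and c_{-n} = conj(c_n) implied), and the outcome probabilities Π_ξ(α, k) are computed from the n = 0 Fourier component of the update, consistent with (S.25); this form of Π_ξ is inferred from the derivation rather than quoted from the text.

```python
import cmath
import math

def g0(x): return math.sqrt(1 - x**2)
def g1(x): return 1 + g0(x)
def F(x): return x**2/g1(x) + math.log(g1(x))   # (S.17)
def L(x): return 1/g1(x) + math.log(g1(x))      # (S.23)
def G(x, m):                                    # (S.19)
    return (1 + 2*m*g0(x))/(m*(4*m**2 - 1)) * (x/g1(x))**(2*m)
def J(x, m):                                    # (S.21), only used for m >= 2
    return (1 + (2*m - 1)*g0(x))/((m - 1)*m*(2*m - 1)*g1(x)) * (x/g1(x))**(2*(m - 1))

def expected_entropy_gain(alpha, k, c, lam, zeta, M=50):
    """Evaluate (S.24) with the m-sum truncated at M.

    Requires 0 < lam <= 1, 0 <= zeta <= 1, and outcome probabilities
    strictly inside (0, 1). Absent indices of `c` are treated as zero.
    """
    delta = lam*zeta/(2 - lam)
    out = (-2*math.log(2)
           + 0.5*math.log(1 - (1 - lam)**2)
           + 0.5*(1 - lam)*math.log((2 - lam)/lam)
           + (1 - lam/2)*F(delta) + (lam/2)*F(zeta))
    for m in range(1, M + 1):
        A = (1 - lam/2)*G(delta, m) + (lam/2)*G(zeta, m)
        if m == 1:
            B = 0.5*lam*zeta*(math.log(1 - lam/2) + L(delta)
                              - math.log(lam/2) - L(zeta))
        else:
            B = 0.25*lam*zeta*(J(zeta, m) - J(delta, m))
        out += A*(cmath.exp(2j*m*alpha)*c.get(2*m*k, 0)).real
        out += B*(cmath.exp(1j*(2*m - 1)*alpha)*c.get((2*m - 1)*k, 0)).real
    # Subtract sum_xi Pi_xi ln(Pi_xi), with Pi_xi the prior predictive
    # probability of outcome xi (the n = 0 Fourier component, cf. (S.25))
    for xi in (1, -1):
        Pi = 0.5*(1 + xi*(1 - lam)) + 0.5*xi*lam*zeta*(cmath.exp(1j*alpha)*c.get(k, 0)).real
        out -= Pi*math.log(Pi)
    return out

# Limiting cases derived below: a flat prior with lam = zeta = 1 gives
# 1 - ln(2), while zeta = 0 gives zero expected entropy gain
assert abs(expected_entropy_gain(0.3, 2, {0: 1.0}, 1.0, 1.0) - (1 - math.log(2))) < 1e-9
assert abs(expected_entropy_gain(0.7, 3, {0: 1.0}, 0.5, 0.0)) < 1e-9
```

Maximising this quantity over α and k would implement the information-gain utility of the main text.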

S.V.4 Limiting cases

If λ → 1, then δ → ζ, and we have

ΔsH(α,k)\displaystyle\Delta_{s}H(\alpha,\,k) =2ln(2)+12ln(1)+(0)ln(1)+F(ζ)\displaystyle=-2\ln(2)+\frac{1}{2}\ln(1)+(0)\ln(1)+F(\zeta)
+m=1(Am𝔢{eiα2mck2m(s1)}+Bm𝔢{eiα(2m1)ck(2m1)(s1)})\displaystyle+\sum_{m=1}^{\infty}\Big{(}A_{m}\mathfrak{Re}\left\{\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\right\}+B_{m}\mathfrak{Re}\left\{\mathrm{e}^{i\alpha(2m-1)}c_{k(2m-1)}^{(s-1)}\right\}\Big{)}
ξΠξ(α,k)ln(Πξ(α,k)),\displaystyle-\sum_{\xi}\Pi_{\xi}(\alpha,\,k)\ln\left(\Pi_{\xi}(\alpha,\,k)\right)\,,

and A_m = G(ζ, m), B_m = 0, so

ΔsH(α,k)\displaystyle\Delta_{s}H(\alpha,\,k) =2ln(2)+F(ζ)+m=1G(ζ,m)𝔢{eiα2mck2m(s1)}ξΠξ(α,k)ln(Πξ(α,k)).\displaystyle=-2\ln(2)+F(\zeta)+\sum_{m=1}^{\infty}G(\zeta,m)\mathfrak{Re}\left\{\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\right\}-\sum_{\xi}\Pi_{\xi}(\alpha,\,k)\ln\left(\Pi_{\xi}(\alpha,\,k)\right)\,.

If we also have ζ = 1, then g_0(ζ) = 0, and we have

ΔsH(α,k)\displaystyle\Delta_{s}H(\alpha,\,k) =2ln(2)+1+m=1(1m(4m21))𝔢{eiα2mck2m(s1)}ξΠξ(α,k)ln(Πξ(α,k)).\displaystyle=-2\ln(2)+1+\sum_{m=1}^{\infty}\left(\frac{1}{m(4m^{2}-1)}\right)\mathfrak{Re}\left\{\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\right\}-\sum_{\xi}\Pi_{\xi}(\alpha,\,k)\ln\left(\Pi_{\xi}(\alpha,\,k)\right)\,.

If instead ζ = 0 (still with λ = 1), then g_0(ζ) = 1, so F(ζ) = ln(2) and G(ζ, m) = 0, and we have

ΔsH(α,k)\displaystyle\Delta_{s}H(\alpha,\,k) =2ln(2)+ln(2)ξΠξ(α,k)ln(Πξ(α,k)).\displaystyle=-2\ln(2)+\ln(2)-\sum_{\xi}\Pi_{\xi}(\alpha,\,k)\ln\left(\Pi_{\xi}(\alpha,\,k)\right)\,.

Since in this case Π_ξ(α, k) = 1/2, this is

ΔsH(α,k)\displaystyle\Delta_{s}H(\alpha,\,k) =ln(2)ln(12)=0,\displaystyle=-\ln(2)-\ln\left(\frac{1}{2}\right)=0\,,

as expected, since the measurement provides no information about ϕ.

Finally, we consider the case λ → 0. Then δ → 0, and we have

ΔsH(α,k)\displaystyle\Delta_{s}H(\alpha,\,k) =2ln(2)+12ln(0)+12ln(20)+ln(2)\displaystyle=-2\ln(2)+\frac{1}{2}\ln(0)+\frac{1}{2}\ln\left(\frac{2}{0}\right)+\ln(2)
+m=1(Am𝔢{eiα2mck2m(s1)}+Bm𝔢{eiα(2m1)ck(2m1)(s1)})\displaystyle+\sum_{m=1}^{\infty}\Big{(}A_{m}\mathfrak{Re}\left\{\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\right\}+B_{m}\mathfrak{Re}\left\{\mathrm{e}^{i\alpha(2m-1)}c_{k(2m-1)}^{(s-1)}\right\}\Big{)}
ξΠξ(α,k)ln(Πξ(α,k)),\displaystyle-\sum_{\xi}\Pi_{\xi}(\alpha,\,k)\ln\left(\Pi_{\xi}(\alpha,\,k)\right)\,,

and

Am\displaystyle A_{m} =0\displaystyle=0
Bm\displaystyle B_{m} ={0ln(20),if m=10,otherwise,\displaystyle=\begin{cases}0\ln\left(\frac{2}{0}\right)\,,&\text{if }m=1\\ 0\,,&\text{otherwise}\end{cases}\,,

and since Π_ξ(α, k) ∈ {1, 0},

ξΠξ(α,k)ln(Πξ(α,k))=0.\displaystyle\sum_{\xi}\Pi_{\xi}(\alpha,\,k)\ln\left(\Pi_{\xi}(\alpha,\,k)\right)=0\,.

But we see that the value of

12ln(1(1λ)2)+(1λ)2ln(2λλ)\displaystyle\frac{1}{2}\ln\left(1-(1-\lambda)^{2}\right)+\frac{(1-\lambda)}{2}\ln\left(\frac{2-\lambda}{\lambda}\right) 12ln(0)+12ln(20)\displaystyle\rightarrow\frac{1}{2}\ln(0)+\frac{1}{2}\ln\left(\frac{2}{0}\right)

is undefined. Recalling that this was the result of the series

n=1(12n(2n1))(1λ)2n,\displaystyle\sum_{n=1}^{\infty}\left(\frac{1}{2n(2n-1)}\right)(1-\lambda)^{2n}\,,

we can see that if we set λ = 0, we have

n=1(12n(2n1))(1λ)2n\displaystyle\sum_{n=1}^{\infty}\left(\frac{1}{2n(2n-1)}\right)(1-\lambda)^{2n} n=1(12n(2n1))\displaystyle\rightarrow\sum_{n=1}^{\infty}\left(\frac{1}{2n(2n-1)}\right)
=ln(2).\displaystyle=\ln(2)\,.

Similarly, the value of B_1 is undefined. All terms in B_1 are zero except one, which becomes undefined at λ = 0; this is the term that results from the series:

λζ(1λ)1n=1(1(2n1))(1λ)2n\displaystyle\lambda\zeta(1-\lambda)^{-1}\sum_{n=1}^{\infty}\left(\frac{1}{(2n-1)}\right)(1-\lambda)^{2n} (0)ζn=112n1.\displaystyle\rightarrow(0)\zeta\sum_{n=1}^{\infty}\frac{1}{2n-1}\,.

But since the series diverges while its prefactor vanishes, we must take the limit of the product:

limλ0λln(2λλ)\displaystyle\lim_{\lambda\rightarrow 0}\lambda\ln\left(\frac{2-\lambda}{\lambda}\right) =0.\displaystyle=0\,.

So B_1 → 0. Putting these results back together, we have

ΔsH(α,k)\displaystyle\Delta_{s}H(\alpha,\,k) =2ln(2)+ln(2)+ln(2)=0.\displaystyle=-2\ln(2)+\ln(2)+\ln(2)=0\,.

As in the ζ = 0 case, when λ = 0 the measurement gives no information about ϕ, so the expected entropy gain is zero.

S.VI Expected sharpness gain

The sharpness at step s is

S[ps(ϕ)]\displaystyle S\left[p_{s}(\phi)\right] =|eiϕs|=|02πdϕ2πps(ϕ)eiϕ|=|c1(s)|\displaystyle=\left|\langle\mathrm{e}^{i\phi}\rangle_{s}\right|=\left|\int_{0}^{2\pi}\frac{d\phi}{2\pi}p_{s}(\phi)\mathrm{e}^{i\phi}\right|=\left|c_{-1}^{(s)}\right|

We define the expected sharpness gain for step s:

ΔsS(α,k)\displaystyle\Delta_{s}S(\alpha,\,k) =ξΠξ(α,k)(S[ps(ϕ|ξ;α,k)]S[ps1(ϕ)])\displaystyle=\sum_{\xi}\Pi_{\xi}(\alpha,\,k)\Big{(}S\left[p_{s}(\phi|\xi;\alpha,k)\right]-S\left[p_{s-1}(\phi)\right]\Big{)}
=ξΠξ(α,k)S[ps(ϕ|ξ;α,k)]|c1(s1)|\displaystyle=\sum_{\xi}\Pi_{\xi}(\alpha,\,k)S\left[p_{s}(\phi|\xi;\alpha,k)\right]-\left|c_{-1}^{(s-1)}\right|
=ξ|02πdϕ2πPξ(α,kϕ)ps1(ϕ)eiϕ||c1(s1)|.\displaystyle=\sum_{\xi}\left|\int_{0}^{2\pi}\frac{d\phi}{2\pi}P_{\xi}(\alpha,k\phi)p_{s-1}(\phi)\mathrm{e}^{i\phi}\right|-\left|c_{-1}^{(s-1)}\right|\,.

We have

02πdϕ2πPξ(α,kϕ)ps1(ϕ)eiϕ\displaystyle\int_{0}^{2\pi}\frac{d\phi}{2\pi}P_{\xi}(\alpha,k\phi)p_{s-1}(\phi)\mathrm{e}^{i\phi}
=02πdϕ2πn=[12(1+ξ(1λ))cn(s1)+ξλζ4(eiαcn+k(s1)+eiαcnk(s1))]einϕeiϕ\displaystyle=\int_{0}^{2\pi}\frac{d\phi}{2\pi}\sum_{n=-\infty}^{\infty}\Bigg{[}\frac{1}{2}\big{(}1+\xi(1-\lambda)\big{)}c_{n}^{(s-1)}+\xi\lambda\frac{\zeta}{4}\left(\mathrm{e}^{i\alpha}c_{n+k}^{(s-1)}+\mathrm{e}^{-i\alpha}c_{n-k}^{(s-1)}\right)\Bigg{]}\mathrm{e}^{in\phi}\mathrm{e}^{i\phi}
=n=[12(1+ξ(1λ))cn(s1)+ξλζ4(eiαcn+k(s1)+eiαcnk(s1))]02πdϕ2πei(n+1)ϕ\displaystyle=\sum_{n=-\infty}^{\infty}\Bigg{[}\frac{1}{2}\big{(}1+\xi(1-\lambda)\big{)}c_{n}^{(s-1)}+\xi\lambda\frac{\zeta}{4}\left(\mathrm{e}^{i\alpha}c_{n+k}^{(s-1)}+\mathrm{e}^{-i\alpha}c_{n-k}^{(s-1)}\right)\Bigg{]}\int_{0}^{2\pi}\frac{d\phi}{2\pi}\mathrm{e}^{i(n+1)\phi}
=12(1+ξ(1λ))c1(s1)+ξλζ4(eiαc1+k(s1)+eiαc1k(s1)),\displaystyle=\frac{1}{2}\big{(}1+\xi(1-\lambda)\big{)}c_{-1}^{(s-1)}+\xi\lambda\frac{\zeta}{4}\left(\mathrm{e}^{i\alpha}c_{-1+k}^{(s-1)}+\mathrm{e}^{-i\alpha}c_{-1-k}^{(s-1)}\right)\,, (S.25)

Using (S.25), we have

ΔsS(α,k)\displaystyle\Delta_{s}S(\alpha,\,k) =ξ|12(1+ξ(1λ))c1(s1)+ξλζ4(eiαc1+k(s1)+eiαc1k(s1))||c1(s1)|.\displaystyle=\sum_{\xi}\bigg{|}\frac{1}{2}\big{(}1+\xi(1-\lambda)\big{)}c_{-1}^{(s-1)}+\xi\lambda\frac{\zeta}{4}\left(\mathrm{e}^{i\alpha}c_{-1+k}^{(s-1)}+\mathrm{e}^{-i\alpha}c_{-1-k}^{(s-1)}\right)\bigg{|}-\left|c_{-1}^{(s-1)}\right|\,. (S.26)
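Unlike the entropy gain, (S.26) involves only three prior coefficients and is cheap to evaluate. A sketch (illustrative names; `c` maps Fourier index n to c_n^{(s-1)}, with absent indices treated as zero):

```python
import cmath

def expected_sharpness_gain(alpha, k, c, lam, zeta):
    # Evaluate (S.26): expected sharpness after the update, minus |c_{-1}|
    gain = -abs(c.get(-1, 0))
    for xi in (1, -1):
        gain += abs(0.5*(1 + xi*(1 - lam))*c.get(-1, 0)
                    + 0.25*xi*lam*zeta*(cmath.exp(1j*alpha)*c.get(-1 + k, 0)
                                        + cmath.exp(-1j*alpha)*c.get(-1 - k, 0)))
    return gain

# Limiting cases derived below: no gain when lam = 0, or when zeta = 0 with lam = 1
c = {-1: 0.5 + 0.2j, 1: 0.5 - 0.2j}
assert abs(expected_sharpness_gain(0.4, 2, c, 0.0, 0.9)) < 1e-12
assert abs(expected_sharpness_gain(0.4, 2, c, 1.0, 0.0)) < 1e-12
```

Maximising this quantity over α and k would implement the sharpness-based utility.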

S.VI.1 Limiting cases

If λ = 1,

ΔsS(α,k)\displaystyle\Delta_{s}S(\alpha,\,k) =ξ|12c1(s1)+ξζ4(eiαc1+k(s1)+eiαc1k(s1))||c1(s1)|,\displaystyle=\sum_{\xi}\bigg{|}\frac{1}{2}c_{-1}^{(s-1)}+\xi\frac{\zeta}{4}\left(\mathrm{e}^{i\alpha}c_{-1+k}^{(s-1)}+\mathrm{e}^{-i\alpha}c_{-1-k}^{(s-1)}\right)\bigg{|}-\left|c_{-1}^{(s-1)}\right|\,,

and if additionally ζ = 0,

ΔsS(α,k)\displaystyle\Delta_{s}S(\alpha,\,k) =ξ|12c1(s1)||c1(s1)|=122|c1(s1)||c1(s1)|=0,\displaystyle=\sum_{\xi}\bigg{|}\frac{1}{2}c_{-1}^{(s-1)}\bigg{|}-\left|c_{-1}^{(s-1)}\right|=\frac{1}{2}2\bigg{|}c_{-1}^{(s-1)}\bigg{|}-\left|c_{-1}^{(s-1)}\right|=0\,,

as expected. And if λ = 0,

ΔsS(α,k)\displaystyle\Delta_{s}S(\alpha,\,k) =ξ|12(1+ξ)c1(s1)||c1(s1)|=|122c1(s1)||c1(s1)|=0,\displaystyle=\sum_{\xi}\bigg{|}\frac{1}{2}\big{(}1+\xi\big{)}c_{-1}^{(s-1)}\bigg{|}-\left|c_{-1}^{(s-1)}\right|=\bigg{|}\frac{1}{2}2c_{-1}^{(s-1)}\bigg{|}-\left|c_{-1}^{(s-1)}\right|=0\,,

as expected.