
Time-adaptive phase estimation

Brennan de Neeve brennan.mn@proton.me Institute for Quantum Electronics, ETH Zürich, Otto-Stern-Weg 1, 8093 Zürich, Switzerland Quantum Center, ETH Zürich, 8093 Zürich, Switzerland    Andrey V. Lebedev Theoretische Physik, Wolfgang-Pauli-Strasse 27, ETH Zürich, CH-8093, Zürich, Switzerland Current affiliation: Dukhov Research Institute of Automatics (VNIIA), Moscow, 127030, Russia Current affiliation: Advanced Mesoscience and Nanotechnology Centre, Moscow Institute of Physics and Technology (MIPT), Dolgoprudny, 141700, Russia    Vlad Negnevitsky Institute for Quantum Electronics, ETH Zürich, Otto-Stern-Weg 1, 8093 Zürich, Switzerland    Jonathan P. Home jhome@phys.ethz.ch Institute for Quantum Electronics, ETH Zürich, Otto-Stern-Weg 1, 8093 Zürich, Switzerland Quantum Center, ETH Zürich, 8093 Zürich, Switzerland
Abstract

Phase estimation is known to be a robust method for single-qubit gate calibration in quantum computers [1], while Bayesian estimation is widely used in devising optimal methods for learning in quantum systems [2]. We present Bayesian phase estimation methods that adaptively choose a control phase and the time of coherent evolution based on prior phase knowledge. In the presence of noise, we find near-optimal performance with respect to known theoretical bounds, and demonstrate some robustness of the estimates to noise that is not accounted for in the model of the estimator, making the methods suitable for calibrating operations in quantum computers. We determine the utility of control parameter values using functions of the prior probability of the phase that quantify expected knowledge gain either in terms of expected narrowing of the posterior or expected information gain. In particular, we find that by maximising the rate of expected gain we obtain phase estimates having standard deviation a factor of $1.43$ larger than the Heisenberg limit using a classical sequential strategy. The methods provide optimal solutions accounting for available prior knowledge and experimental imperfections with minimal effort from the user. The effect of many types of noise can be specified in the model of the measurement probabilities, and the rate of knowledge gain can easily be adjusted to account for times included in the measurement sequence other than the coherent evolution leading to the unknown phase, such as times required for state preparation or readout.

I Introduction

Phase estimation has found an increasing number of applications in metrology and quantum computing in recent years. Although resources are considered differently in these two settings [3, 4, 5], the methods of each have led to new applications in the other. In metrology, classical averaging limits estimates to a phase uncertainty $\Delta\hat{\phi}$ scaling at best according to the standard quantum limit (SQL), $\Delta\hat{\phi}>1/\sqrt{N}$, with the number of resources $N$, while general strategies are fundamentally limited only by Heisenberg's uncertainty principle and can obtain a $\sqrt{N}$ improvement over the SQL, reaching the so-called Heisenberg limit (HL), $\Delta\hat{\phi}>\pi/N$ [6]. Entanglement was initially thought to be a key ingredient in schemes to reach Heisenberg scaling, $\Delta\hat{\phi}\propto 1/N$ [7], but when it was found that the same scaling could be reached using sequentially prepared unentangled systems [8, 9, 10, 11, 12, 13, 14] (this is not necessarily true when noise is considered [88]), ideas from quantum computing [16, 17] soon led to new experimentally accessible metrology procedures with this scaling [18]. On the other hand, there have been several more recent proposals to use metrology methods to calibrate operations for quantum computing [19, 20, 21, 22, 1].

Many metrology proposals make use of a Bayesian approach to estimation which provides a natural framework to describe adaptive procedures, where the settings for future experiments are modified based on the results of previous measurements [23, 24, 25, 26, 27, 18, 28, 29, 30, 31]. While adaptive methods can lead to better performance, they are generally more complex than non-adaptive strategies and can also be more difficult to implement in some experiments. Remarkably, Higgins et al. found that they could reach near-optimal Heisenberg scaling using a non-adaptive procedure by optimising the number of measurements performed with different coherent applications of the unknown phase [32, 33].

While initial proposals like that of Higgins et al. [32] considered ideal settings with pure states and unitary operations, the role of noise and experimental imperfections has been increasingly studied over time [34]. This development has been both in the understanding of the fundamental limits to precision [35, 36, 37, 38, 39, 40, 41, 42, 43], and in devising better strategies to cope with non-ideal conditions, where the Bayesian framework with adaptive measurements has proved useful [44, 45, 46, 47, 48, 49, 50]. In some cases, adaptive methods have been used to design strategies better suited to the physics of particular experiments. In recent developments for the sensing of magnetic fields with NV centres, adaptive procedures were used to account for reduced visibility measurements [51, 52, 53, 54, 55, 56, 57, 58] (readout can also be improved by adaptive methods [89, 90]). In many proposals the adaptive control acts on a phase that can be seen as an adjustment of the measurement basis, but the time of coherent interaction is chosen non-adaptively, e.g. as detailed in [32]. Since the interaction times proposed in [32] are optimised without noise, and thus may not be suited to experimental conditions, some experimenters perform optimisations using numerical simulations that include relevant noise in order to find interaction times better suited to their experiments [60, 61, 62, 53, 63]. Recently, Belliardo and Giovannetti [64] have analytically shown how to modify coherent interaction times in the non-adaptive procedure of Higgins et al. [32] to account for certain types of noise.
Others have investigated the possibility of also choosing the interaction time adaptively, or equivalently, in the case of some optical measurements, the size of a so-called N00N state [33] (an example where entanglement can be used to convert temporal resources into spatial resources [11]): initially in proposals without noise [27, 33], or with mixed-state quantum computation [28], and later in proposals using numerical algorithms that can also account for noise and imperfections [66, 67, 68, 57, 69].

Commonly the performance of a strategy is only considered in the asymptotic regime, when the number of resources (i.e. number of physical systems and total estimation time) approaches infinity. Many experiments, however, could benefit from strategies that are optimised for finite resources. Phase estimation in metrology is typically studied in two settings. In one setting, often referred to as local, the goal is to achieve optimal sensitivity to phase fluctuations from an a priori known phase $\phi=\phi_{0}$. In the other setting, referred to as global, it is assumed that there is initially no a priori knowledge of the phase [47]. While the amount of a priori knowledge of the phase is not relevant in the asymptotic regime, it can significantly change the optimal estimation strategies for finite estimation times [70]. Therefore, it is useful to have methods that can account for arbitrary a priori phase knowledge.

In addition, the coherent interaction time in many proposals is optimised under the assumption that this time dominates the experiment time. While this is true in the asymptotic regime, many experiments may not reach this regime for all or at least a significant portion of the estimation time [54]. An extreme example is found in GaAs quantum dots where the measurement time is several orders of magnitude longer than the coherent evolution [71]; here the authors show an exponential improvement in the mean-square error (MSE) of parameter estimation by adaptively choosing the time of coherent evolution.

In this paper we present time-adaptive phase estimation (TAPE), a method that allows for the adaptive optimisation of the coherent interaction time and a control phase to be adjusted to the resources of the experiment. TAPE can provide strategies optimised both when the experiment time is proportional to the time of coherent interaction, and when the experiment time is proportional to the number of measurements; the method also allows for any resource allocation in between these two extreme cases. In addition TAPE can provide optimal strategies for arbitrary a priori knowledge of the phase, because the choice of measurement settings depends only on the prior knowledge of the phase after the last measurement, and not on any record of previous measurements. In contrast to earlier works investigating similar adaptive procedures [27, 33], we use a more general form for the measurement probabilities so that we can account for many types of noise or imperfections.

We propose and analyse several different objective functions for the adaptive parameter selection that lead to near-optimal performance with respect to known theoretical bounds, with and without noise. When the experiment time is proportional to the time of coherent interaction we reach uncertainties in the phase estimates that are a factor of $1.43$ larger than the HL. In addition, we find that TAPE is quite robust to errors that are not accounted for in the model of the estimator. This demonstrates that, similarly to proposals like robust phase estimation [1], our methods are well-suited to calibration of single-qubit operations in the context of quantum computing.

TAPE uses a numerical representation of the phase knowledge as a Fourier series that can easily describe arbitrary prior knowledge, at the possible expense of increased computation. Since the complexity of the Fourier representation (i.e. number of coefficients to track in memory) increases with phase knowledge, the computation time required for adaptive control of experimental parameters may become too large for some practical applications. To tackle this limitation we propose a method that reduces the interval over which the Fourier representation of the phase is used. This can significantly reduce the required memory and time of computation.

II Bayesian estimation

The evolution and measurement for a single step $s$ of sequential phase estimation can be described by the quantum circuit in Fig. 1 [72]

[Circuit diagram: an ancilla qubit prepared in $|0\rangle$ undergoes a Hadamard $H$, then controls $U^{k}$ acting on the register in state $|\phi\rangle$; the ancilla then undergoes $R_{z}(-\alpha)$ and a second Hadamard before being measured.]

Figure 1: Quantum circuit for sequential phase estimation.

where $H$ is the Hadamard gate, $R_{z}(-\alpha)=\mathrm{e}^{i\alpha Z/2}$ with Pauli operator $Z$ is a z-rotation, and $\alpha\in[0,\pi]$ is a control phase to be optimised. $U|\phi\rangle=\mathrm{e}^{i\phi}|\phi\rangle$ with $|\phi\rangle\in\mathbb{C}^{2^{n}}$ an eigenstate of the unitary $U\in\mathbb{C}^{2^{n}\times 2^{n}}$ with unknown phase $\phi\in[0,2\pi]$. $k\in\mathbb{N}^{+}$ determines the number of applications of the unitary evolution. The parameters $\alpha$ and $k$ will be chosen adaptively for each measurement as detailed in section III. We denote the possible measurement outcomes at step $s$ by $\xi\in\{\pm 1\}$, and in order to describe a range of noise or measurement errors [54], we assume the outcome $\xi$ to occur with probability

P_{\xi}(\alpha,k\phi)=\frac{1}{2}\Big(1+\xi\big((1-\lambda_{k})+\lambda_{k}\zeta_{k}\cos\left(\alpha-k\phi\right)\big)\Big)\,, (1)

where $\lambda_{k},\zeta_{k}\in[0,1]$ allow respectively for the description of asymmetry and reduced contrast in the measurement (while $\lambda_{k}<1$ describes a reduced probability for the outcome $\xi=-1$, the reverse situation can always be described by relabelling the outcomes). The subscript $k$ of $\lambda_{k},\zeta_{k}$ indicates that these values can generally depend on $k$. Since the outcome $\xi$ and the control parameters $\alpha$ and $k$ depend on $s$, we should really denote them $\xi_{s},\alpha_{s},k_{s}$, but we omit the subscripts for simplicity of notation.
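The outcome model (1) is straightforward to code. The following sketch is our own illustration (function and parameter names are choices made here), with $\lambda_{k}$ and $\zeta_{k}$ passed as plain numbers:

```python
import numpy as np

def outcome_probability(xi, alpha, k, phi, lam=1.0, zeta=1.0):
    """Probability of outcome xi in {+1, -1} following Eq. (1):
    lam < 1 models an asymmetric readout, zeta < 1 a reduced contrast,
    and lam = zeta = 1 recovers the ideal noiseless measurement."""
    return 0.5 * (1.0 + xi * ((1.0 - lam)
                              + lam * zeta * np.cos(alpha - k * phi)))
```

The two outcome probabilities always sum to one, and in the noiseless case the outcome $\xi=+1$ is certain when $\alpha=k\phi$ (mod $2\pi$).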

As in [27, 33, 51] we use a Fourier series to write the probability density describing our knowledge of the phase $\phi$ at step $s$:

p_{s}(\phi)=\sum_{n=-\infty}^{\infty}c_{n}^{(s)}\mathrm{e}^{in\phi}\,. (2)

Given prior knowledge $p_{s-1}(\phi)$ and measurement outcome $\xi$, we update our state of knowledge using Bayes' theorem:

p_{s}(\phi|\xi;\alpha,k)
=\frac{P_{\xi}(\alpha,k\phi)\,p_{s-1}(\phi)}{\Pi_{\xi}(\alpha,k)}
\propto P_{\xi}(\alpha,k\phi)\,p_{s-1}(\phi)
=\sum_{n=-\infty}^{\infty}\Bigg[\frac{1}{2}\big(1+\xi(1-\lambda_{k})\big)c_{n}^{(s-1)}
+\xi\lambda_{k}\frac{\zeta_{k}}{4}\left(\mathrm{e}^{i\alpha}c_{n+k}^{(s-1)}+\mathrm{e}^{-i\alpha}c_{n-k}^{(s-1)}\right)\Bigg]\mathrm{e}^{in\phi}\,, (3)

where

\Pi_{\xi}(\alpha,k)=\int_{0}^{2\pi}\frac{d\phi}{2\pi}P_{\xi}(\alpha,k\phi)\,p_{s-1}(\phi)

is the probability of obtaining outcome $\xi$. Equation (3) specifies how to modify the coefficients $c_{n}^{(s-1)}\rightarrow c_{n}^{(s)}$ given the measurement outcome.
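As a concrete illustration of the update rule (3), the following Python sketch (our own, not part of the paper; the dictionary representation of the coefficients and the function name are choices made here) applies one Bayesian update. Because the prior satisfies $c_{0}^{(s-1)}=1$, the unnormalised $n=0$ coefficient of the output equals $\Pi_{\xi}(\alpha,k)$:

```python
import numpy as np

def bayes_update(c, xi, alpha, k, lam=1.0, zeta=1.0):
    """One update of Eq. (3) on Fourier coefficients c (dict n -> complex,
    with c[0] = 1 so the density is normalised).  Returns the normalised
    posterior coefficients and the probability of the observed outcome."""
    get = lambda n: c.get(n, 0.0 + 0.0j)
    # The posterior support grows by +/- k around the prior support.
    support = set(c) | {n + k for n in c} | {n - k for n in c}
    new = {n: 0.5 * (1.0 + xi * (1.0 - lam)) * get(n)
              + xi * lam * zeta / 4.0 * (np.exp(1j * alpha) * get(n + k)
                                         + np.exp(-1j * alpha) * get(n - k))
           for n in support}
    prob = new[0].real  # unnormalised c_0 equals Pi_xi since c[0] = 1
    return {n: v / prob for n, v in new.items()}, prob
```

For a uniform prior (only $c_{0}=1$) and a noiseless measurement with $k=1$ and $\alpha=0$, the update returns $\Pi_{\xi}=1/2$ and the posterior $1+\xi\cos\phi$, i.e. $c_{\pm 1}=\xi/2$.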

We choose to use the estimator

\hat{\phi}=\arg\int_{0}^{2\pi}\frac{d\phi}{2\pi}p_{s}(\phi)\mathrm{e}^{i\phi}=\arg\left(c_{-1}^{(s)}\right)\,. (4)

A nice feature of the Fourier series representation for $p_{s}(\phi)$ is that the estimator (4) depends only on one coefficient. Given an estimate $\hat{\phi}$ of $\phi$ (in general now, not necessarily given by (4)), we quantify the uncertainty in $\hat{\phi}$ by the square root of the Holevo variance [74]:

\Delta\hat{\phi}=\sqrt{V(\hat{\phi})}=\sqrt{S(\hat{\phi})^{-2}-1}\,, (5)

where $S(\hat{\phi})=|\langle\mathrm{e}^{i\hat{\phi}}\rangle|$ is the sharpness and the angular brackets indicate an average over the estimates $\hat{\phi}$. If the estimate is biased, one can use $S=\langle\cos(\hat{\phi}-\phi)\rangle$ instead, where $\phi$ is the value of the correct system phase [33]. In [26] the authors show that the estimator (4) minimises the Holevo variance (5).
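To make Eqs. (4) and (5) concrete, here is a minimal sketch (ours, under the assumption that the phase knowledge is stored as a dict mapping the Fourier index $n$ to the coefficient $c_{n}$):

```python
import numpy as np

def estimate(c):
    """Phase estimate of Eq. (4): the argument of the c_{-1} coefficient."""
    return float(np.angle(c[-1]))

def holevo_std(estimates):
    """Square root of the Holevo variance, Eq. (5), from a collection of
    estimates: S = |<exp(i phi_hat)>|, Delta = sqrt(S**-2 - 1)."""
    S = abs(np.mean(np.exp(1j * np.asarray(estimates))))
    # Clamp tiny negative values caused by floating-point rounding.
    return float(np.sqrt(max(S ** -2 - 1.0, 0.0)))
```

Identical estimates give zero Holevo variance, while a spread of estimates on the circle gives a larger value; unlike the ordinary variance, this measure handles the $2\pi$ periodicity of the phase correctly.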

III Adaptive procedures

In the previous section we have described how in general to use Bayes' theorem to update our knowledge of the phase $\phi$ given the measurement outcomes, and how to obtain an estimate $\hat{\phi}$ from the density $p_{s}(\phi)$ at step $s$. The task of achieving minimal uncertainty in the estimates obtained from (4) is then dependent on the choices of the values of the control phase $\alpha$ and the number $k$ of applications of $U$. In a sequential procedure, the state coherently evolves according to $U^{k}$ for a time proportional to $k$, so that the choice of $k$ corresponds to choosing the time of coherent evolution. In the following we will assume such a sequential procedure, but we note that the results can also be applied in the case of some parallel procedures by using entanglement.

Given prior knowledge $p_{s-1}(\phi)$ at step $s-1$, we look to choose the optimal control phase for a given value of $k$ by maximising a function of $p_{s-1}(\phi)$, $k$, and $\alpha$ that quantifies the expected knowledge gain from the next measurement. One might then expect that the optimal choice of $k$ can be determined by maximising the expected knowledge gain over all possible values of $k$. This can be a good choice when the evolution time of $U^{k}$ is negligible compared to other times in the experiment, as in the example of GaAs quantum dots mentioned above [71]. But in general, experiments with different values of $k$ require different resources in terms of execution time, so that the expected knowledge gains for different values of $k$ are not directly comparable (even more generally, which resources are valuable depends on the setting and what the experimenter wants to optimise: e.g. they may have a restricted number of qubits to measure, so that the number of measurements becomes the relevant resource. In this sense it is useful to have a method where the optimisation can be adjusted by the experimenter to best describe how they value their resources).

In [27, 33], the authors studied a method where the value of $k$ is chosen adaptively. In [27], $k$ corresponds to the number of photons in a N00N state; however, as shown for the noiseless case in [33], the sequential procedure we discuss here is mathematically equivalent. This equivalence also holds under certain types of noise [13, 14]. Physically these two procedures are different, but they are connected by the use of entanglement to convert between temporal and spatial resources [11].

In the case that the experiment time is proportional to $k$, the relevant resource is the number of applications of $U$. This way of considering resources, which is typically chosen in quantum metrology, is considered in [27, 33]. In these works the authors calculate the expected (differential) entropy of the posterior after the next measurement using a Gaussian approximation (the differential entropy is the entropy of a continuous random variable, but it lacks some important properties of the Shannon entropy for discrete random variables; see e.g. [91], chapter 8. In this manuscript we will usually write simply "entropy" when referring to the differential entropy). Motivated by the resource dependence on $k$, they choose the value of $k$ that minimises the expected entropy divided by $\ln(N)$. Here $N$ refers to the total number of resources used since the beginning of the estimation sequence, where an initially uniform prior is assumed (no prior knowledge of the phase). Since this method requires knowing $N$ assuming an initially uniform prior, and because the priors in the later stages of the estimation must be approximately Gaussian, it is not well suited to incorporating arbitrary prior knowledge. Moreover, the approach of dividing the entropy by $\ln(N)$ is based on scaling of the uncertainty in the phase that is only valid without noise.

Here we study two different methods for adaptively choosing $k$ that have the following features:

  1. The resource dependence on $k$ is adjustable so that it can be chosen to match a given experiment. Since we consider sequential procedures, we take the resource requirement for performing an experiment with a particular value of $k$ to be the time required to perform that experiment.

  2. The choice of $k$ for a given measurement is determined only by the prior probability density at that step of the estimation sequence.

  3. The methods can be applied to different functions quantifying the expected knowledge gain associated with a particular choice of $k$. We study two such possible functions possessing different benefits: the expected sharpness gain and the expected (differential) entropy gain.

In the following we show how to calculate these two functions quantifying expected knowledge gain exactly; with the exact expressions in hand these functions can be computed for any prior, not only Gaussian. The expected knowledge gains are calculated from expressions that can describe experiments with noise. This allows the methods to determine good phase estimation procedures for noisy experiments, as well as in the noise-free case.

III.1 Expected knowledge gain

Given prior knowledge $p_{s-1}(\phi)$ at step $s-1$, one might expect that in order to minimise the Holevo variance of the estimates $\hat{\phi}$ (equation (5)) at step $s$, a good strategy can be to choose the control phase $\alpha$ that minimises the expected Holevo variance of the posterior Bayesian probability density for the next measurement. However, it is shown in [26, 33] that in order to minimise the Holevo variance of the estimates $\hat{\phi}$ one should choose $\alpha$ to maximise the expected sharpness of the posterior Bayesian probability density for the next measurement

\sum_{\xi}\Pi_{\xi}(\alpha,k)\,S\left[p_{s}(\phi|\xi;\alpha,k)\right]\,,

where we denote the sharpness of $p_{s}(\phi)$ by

S\left[p_{s}(\phi)\right]=\left|\int_{0}^{2\pi}\frac{d\phi}{2\pi}p_{s}(\phi)\mathrm{e}^{i\phi}\right|=\left|c_{-1}^{(s)}\right|\,. (6)

Similarly to the estimate (4), a nice feature of the Fourier series representation for $p_{s}(\phi)$ is that the sharpness (6) depends only on a single Fourier coefficient. Maximising the expected sharpness of $p_{s}(\phi)$ for the next measurement is a good strategy since the sharpness of the estimates can be written as the average of $S\left[p_{s}(\phi)\right]$ over possible measurement records; maximising the sharpness of the estimates is equivalent to minimising the Holevo variance of the estimates [33]. We note that the average of $S\left[p_{s}(\phi)\right]$ over possible measurement records considered in [33] assumes a uniform prior at the beginning of the estimation sequence, and this strategy may be less optimal for other priors.

In order to evaluate the best strategy for the measurement at step $s$, we define the expected sharpness gain as

\Delta_{s}S(\alpha,k)\equiv\sum_{\xi}\Pi_{\xi}(\alpha,k)\Big(S\left[p_{s}(\phi|\xi;\alpha,k)\right]-S\left[p_{s-1}(\phi)\right]\Big)
=-\left|c_{-1}^{(s-1)}\right|+\sum_{\xi}\bigg|\frac{1}{2}\big(1+\xi(1-\lambda_{k})\big)c_{-1}^{(s-1)}
+\xi\lambda_{k}\frac{\zeta_{k}}{4}\left(\mathrm{e}^{i\alpha}c_{-1+k}^{(s-1)}+\mathrm{e}^{-i\alpha}c_{-1-k}^{(s-1)}\right)\bigg|\,, (7)

rather than working with the expected sharpness of the posterior directly. The reason for working with the gain will become clear later on when we consider gain rates. Similarly, we define the expected entropy gain for the probability density $p_{s}(\phi)$ at step $s$,

\Delta_{s}H(\alpha,k)\equiv\sum_{\xi}\Pi_{\xi}(\alpha,k)\Big(H[p_{s-1}(\phi)]-H[p_{s}(\phi|\xi;\alpha,k)]\Big)\,, (8)

where $H[p(\phi)]$ is the differential entropy of $p(\phi)$,

H[p(\phi)]=-\int_{0}^{2\pi}\frac{d\phi}{2\pi}p(\phi)\ln\left(\frac{p(\phi)}{2\pi}\right)\,.

We note that $\Delta_{s}H(\alpha,k)$ can be seen as the expected information gain from the next measurement. It can be rewritten as (see Supplemental Material, section S.V)

\Delta_{s}H(\alpha,k)=\sum_{\xi}\Pi_{\xi}(\alpha,k)\,D_{KL}[p_{s}\|p_{s-1}]\,,

where we denoted $p_{s-1}=p_{s-1}(\phi)$ and $p_{s}=p_{s}(\phi|\xi;\alpha,k)$ for short, and $D_{KL}[p_{s}\|p_{s-1}]$ is the Kullback-Leibler divergence of the posterior from the prior [77, 78]. In ref. [50] the authors show that maximising expected entropy gain maximises the likelihood of estimating the true phase. In terms of the coefficients of the prior, this can be computed by an expression of the form

\Delta_{s}H(\alpha,k)=f(\lambda_{k},\zeta_{k})-\sum_{\xi}\Pi_{\xi}(\alpha,k)\ln\left(\Pi_{\xi}(\alpha,k)\right)
+\sum_{m=1}^{\infty}\Big(A_{m}\,\mathfrak{Re}\left\{\mathrm{e}^{i2m\alpha}c_{2mk}^{(s-1)}\right\}
+B_{m}\,\mathfrak{Re}\left\{\mathrm{e}^{i(2m-1)\alpha}c_{(2m-1)k}^{(s-1)}\right\}\Big)\,. (9)

The $m$-dependent coefficients $A_{m}$ and $B_{m}$ are generally also functions of $\lambda_{k}$ and $\zeta_{k}$. Since our knowledge $p(\phi)$ after a finite number of measurements is always described by a finite Fourier series, the sum over $m$ is always finite. The derivation of this expression as well as the exact expressions for $f$, $A_{m}$, and $B_{m}$ are given in the Supplemental Material, section S.V.
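As a sketch of how these two gains can be evaluated in practice (our own illustration, assuming the prior is stored as a dict mapping $n$ to $c_{n}$ with $c_{0}=1$; since the exact coefficients of Eq. (9) are given only in the Supplemental Material, the entropy gain below is instead evaluated numerically on a phase grid):

```python
import numpy as np

def expected_sharpness_gain(c, alpha, k, lam=1.0, zeta=1.0):
    """Expected sharpness gain of Eq. (7) from prior coefficients c."""
    get = lambda n: c.get(n, 0.0 + 0.0j)
    gain = -abs(get(-1))
    for xi in (+1, -1):
        gain += abs(0.5 * (1.0 + xi * (1.0 - lam)) * get(-1)
                    + xi * lam * zeta / 4.0
                      * (np.exp(1j * alpha) * get(-1 + k)
                         + np.exp(-1j * alpha) * get(-1 - k)))
    return gain

def expected_entropy_gain(c, alpha, k, lam=1.0, zeta=1.0, m=4096):
    """Expected entropy gain of Eq. (8), evaluated on a phase grid rather
    than via the closed form (9); additive ln(2*pi) terms cancel in the
    prior-minus-posterior difference, so they are dropped here."""
    phi = np.linspace(0.0, 2.0 * np.pi, m, endpoint=False)
    prior = np.real(sum(v * np.exp(1j * n * phi) for n, v in c.items()))
    ent = lambda p: -np.mean(p * np.log(np.maximum(p, 1e-300)))
    gain = 0.0
    for xi in (+1, -1):
        like = 0.5 * (1.0 + xi * ((1.0 - lam)
                                  + lam * zeta * np.cos(alpha - k * phi)))
        Pi = np.mean(like * prior)
        gain += Pi * (ent(prior) - ent(like * prior / Pi))
    return gain
```

For a uniform prior and a noiseless measurement, the sharpness gain is $1/2$ for $k=1$ and zero for $k>1$ (the posterior splits into $k$ peaks), while the entropy gain is $1-\ln 2\approx 0.31$ for every $k$, consistent with the contrast between the two gain functions discussed in the text.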

Before discussing in detail how to use the expected gains to choose $k$, we discuss the choice of the control phase $\alpha$ and compare the benefits of using sharpness versus entropy to quantify expected knowledge gain. For both methods described in section III.2 below, computing the expected gain for a given value of $k$ always involves an optimisation of the control phase $\alpha$ for that particular $k$-value. An example prior is shown in figure 2, and the values of the expected entropy and sharpness gains for the next measurement are plotted for $k\leq 5$. The expected entropy gain increases significantly from $k=1$ to $k=2$, while the increase for larger $k$-values is smaller. The expected sharpness gain also increases significantly from $k=1$ to $k=2$ but, contrary to the expected entropy gain, decreases for $k>2$ and becomes zero for $k>4$. This is because as $k$ increases, it is increasingly likely that the posterior will have multiple peaks. When $k>4$ the measurement leads to a posterior density that is split into multiple peaks with an envelope similar to the prior density, so that the sharpness is unchanged. Conversely, the entropy is still reduced when the probability density is split into multiple peaks, so that the expected entropy gain is large for all values of $k$.

Figure 2: An example prior (top) is plotted along with the expected entropy gain $\Delta H(\alpha,k)$ (middle) and expected sharpness gain $\Delta S(\alpha,k)$ (bottom) as a function of $\alpha$ for $k\leq 5$. $\Delta H(\alpha,k)$ and $\Delta S(\alpha,k)$ have a period of $\pi$ in $\alpha$ (if we see $\alpha$ as a change of the measurement basis, then $\alpha=\pi$ corresponds to the change $|0\rangle\rightarrow|1\rangle$, $|1\rangle\rightarrow|0\rangle$). The expected entropy gain increases to just over $0.3$ for large $k$, while the expected sharpness gain increases from $k=1$ to $2$ and then decreases, reaching zero for $k>4$. The gains for $k>1$ do not align with $p(\phi)$ since the control phase $\alpha$ is applied once per measurement, whereas the unknown phase $\phi$ is applied $k$ times.

These differences between the entropy and the sharpness make them useful for different purposes. Maximising the expected sharpness gain is useful to ensure that the probability density $p(\phi)$ has a single peak, which is important for obtaining an accurate estimate of the phase. On the other hand, the entropy allows one to quantify good strategies when there are multiple peaks in $p(\phi)$. In figure 3 we plot a prior with four peaks along with the expected entropy and sharpness gains to illustrate these differences. We see that the maximum entropy gain corresponds to a measurement with $k=2$ that will, with high probability, eliminate two of the peaks in $p(\phi)$. In contrast, the expected sharpness gain for $k=2$ is zero because a density with either four or two equally spaced peaks has zero sharpness, meaning that the strategy of eliminating two peaks is not quantified by the sharpness. The expected sharpness gain in this case is non-zero only for odd values of $k$, since these lead to a narrowing of the envelope of $p(\phi)$. The entropy quantifies strategies that sharpen the overall envelope ($k=1,3$), the strategy of eliminating two peaks ($k=2$), and the strategy of sharpening the individual peaks ($k=4$).

Figure 3: An example prior with four equally-spaced peaks (top) is plotted along with the expected entropy gain $\Delta H(\alpha,k)$ (middle) and expected sharpness gain $\Delta S(\alpha,k)$ (bottom) as a function of $\alpha$ for $k\leq 4$. $\Delta H(\alpha,k)$ quantifies three different strategies: sharpening the envelope of $p(\phi)$ ($k=1,3$), eliminating two peaks ($k=2$), and sharpening individual peaks ($k=4$). $\Delta S(\alpha,k)$ quantifies only strategies that sharpen the envelope ($k=1,3$), and $\Delta S(\alpha,k)=0$ for $k=2,4$. $\Delta S(\alpha,k)$ is independent of $\alpha$ because the envelope of the prior is flat.

III.2 Choosing $k$

We now describe the two methods for choosing $k$. As noted above, if experiments with different values of $k$ require different amounts of resources, comparing the expected knowledge gain for different $k$ does not directly provide a means for choosing $k$.

Let $\mathbf{K}=\{k_{1},k_{2},\dots,k_{L}\}$ be a vector of possible $k$-values for the next measurements, and let $\mathbf{T}=\{t_{1},t_{2},\dots,t_{L}\}$ be the vector of corresponding times; i.e. $t_{n}$ is the time required to perform an experiment with $k=k_{n}$. The methods for choosing $k$ take the vectors $\mathbf{K}$ and $\mathbf{T}$ as input. The value of the expected gain used by these methods is the maximum over possible values of $\alpha\in[0,\pi]$.

III.2.1 Multi-step gain method

The first method we study for the adaptive choice of $k$ is based on the following idea: if we compute the expected gain for more than one measurement, we can compare the expected knowledge gain for different sequences of measurements that take the same total time. Comparing many possible $k$-values in this way will generally lead to an expensive optimisation, since there can be many measurement sequences that take the same total time, and because the complexity of computing the expected knowledge gain grows exponentially with the number of measurements (see also the Supplemental Material, section S.III). Therefore we devise a technique which allows us to compare only a few $k$-values at a time.

The following example illustrates how this method proceeds. Suppose the amount of resources per $k$-value is proportional to $k$ ($\mathbf{T}\propto\mathbf{K}$) and we restrict the choice of $k$ to the set $\mathbf{K}=\{2^{n-1},\,n=1,2,\ldots,L\}$. We can then compute the expected knowledge gain for performing two measurements with $k=1$ versus the gain for performing one measurement with $k=2$. If the former is greater, we perform an experiment with $k=1$. Otherwise, we compute the expected knowledge gain for performing two measurements with $k=2$ versus that for performing one measurement with $k=4$. If the former is greater, we perform an experiment with $k=2$, and so on. Note that calculating the expected gain for multiple measurements requires optimising one control phase $\alpha$ for each measurement. This optimisation is performed sequentially, starting with the first measurement. A detailed description is given in the Supplemental Material, section S.III.

In this example we see that we need only compute the expected knowledge gain for one or two measurements and for only a few possible sequences of measurements. We expect this to converge to a locally optimal choice of $k$. This method for choosing $k$ can be generalised to other possible input vectors $\mathbf{K}$ and $\mathbf{T}$ using an algorithm which is given in the Supplemental Material, section S.III. It works with vectors $\mathbf{K}$ and $\mathbf{T}$ sorted in increasing order, under the assumption that $k_{j}>k_{i}$ implies $t_{j}\geq t_{i}$, and restricted to $t_{i}/t_{i-1}<32/7$. In this general algorithm we restrict the computation of expected knowledge gain to at most 5 measurements, to prevent the computation from becoming too expensive. To allow for general inputs $\mathbf{K}$ and $\mathbf{T}$ under this restriction, we in general compare sequences that require only approximately the same time. The detailed method is described by algorithm S.5 in the Supplemental Material, section S.III.
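The doubling comparison can be sketched as follows. This is an illustrative, grid-based implementation of our own with noiseless measurements and greedy per-step phase optimisation; it is not the paper's algorithm S.5, and the grid sizes and function names are choices made here:

```python
import numpy as np

PHI = np.linspace(0.0, 2.0 * np.pi, 1024, endpoint=False)  # phase grid
ALPHAS = np.linspace(0.0, np.pi, 16, endpoint=False)       # coarse alpha grid

def like(xi, alpha, k):
    # Noiseless outcome probability, Eq. (1) with lam = zeta = 1.
    return 0.5 * (1.0 + xi * np.cos(alpha - k * PHI))

def sharp(p):
    # Sharpness |c_{-1}| of a gridded density p (normalised to mean 1).
    return abs(np.mean(p * np.exp(1j * PHI)))

def one_step_gain(p, k):
    """Best expected sharpness gain of a single measurement with this k."""
    best = -np.inf
    for a in ALPHAS:
        g = -sharp(p)
        for xi in (+1, -1):
            Pi = np.mean(like(xi, a, k) * p)
            g += Pi * sharp(like(xi, a, k) * p / Pi)
        best = max(best, g)
    return best

def two_step_gain(p, k):
    """Greedy expected gain of two successive measurements with this k:
    optimise alpha for the first step, then add the best expected gain
    of a second step on each measurement branch."""
    best = -np.inf
    for a in ALPHAS:
        g = -sharp(p)
        for xi in (+1, -1):
            Pi = np.mean(like(xi, a, k) * p)
            post = like(xi, a, k) * p / Pi
            g += Pi * (sharp(post) + one_step_gain(post, k))
        best = max(best, g)
    return best

def choose_k_doubling(p, k):
    """With T proportional to K, two measurements at k cost the same time
    as one at 2k: keep k if the two-step gain at k wins, else move to 2k."""
    return k if two_step_gain(p, k) > one_step_gain(p, 2 * k) else 2 * k
```

For a uniform prior the comparison keeps $k=1$, since a single $k=2$ measurement leaves a two-peaked posterior with zero sharpness.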

III.2.2 Gain rate method

The second method we study for choosing $k$ uses the rate of expected knowledge gain. A similar idea has been used recently to estimate decoherence timescales in a qubit [79]. We compute a vector of expected knowledge gains $\mathbf{G}=\{g_{1},g_{2},\dots,g_{L}\}$, where $g_{n}$ is the expected knowledge gain for performing a single measurement with $k=k_{n}$, computed using either (7) or (9) depending on the choice of gain function. We then calculate the vector of expected rates of knowledge gain $\mathbf{R}=\mathbf{G}/\mathbf{T}$ (elementwise division) and choose $k=k_{n^{*}}$ with $n^{*}=\arg\max_{n}\mathbf{R}$. The value of $n^{*}$ may be found by a brute-force search over $k$-values. Since our computation runs in series, a brute-force search can take too long for many practical implementations, so we reduce the computation by performing a Fibonacci search [80] to find a good value of $k$. Since each element of $\mathbf{R}$ can be computed independently, a brute-force search could instead be performed by computing the gain rate for each $k$-value in parallel; as the brute-force search is expected to perform slightly better than the Fibonacci search used here, it could be considered for future implementations. Overall, the gain rate method has the advantage of being simpler and computationally cheaper than the multi-step method.
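The two search variants can be sketched as below. The brute-force version follows the definition $n^{*}=\arg\max_{n}\mathbf{R}$ directly; `unimodal_argmax` is a ternary-search stand-in for the Fibonacci search of ref. [80] (both exploit an assumed unimodality of the gain rate in $k$, and both use $O(\log L)$ gain evaluations).

```python
def argmax_rate(G, T):
    """Brute-force search: index n* maximising the elementwise rate R = G/T."""
    R = [g / t for g, t in zip(G, T)]
    return max(range(len(R)), key=R.__getitem__)

def unimodal_argmax(f, lo, hi):
    """Maximiser of a unimodal f on the integers [lo, hi] using ternary
    search -- a simple stand-in for the Fibonacci search used in the text."""
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if f(m1) < f(m2):
            lo = m1 + 1   # maximum lies in (m1, hi]
        else:
            hi = m2       # maximum lies in [lo, m2]
    return max(range(lo, hi + 1), key=f)
```

Here `f` would be the expected gain rate evaluated at a candidate $k$; the serial variant only evaluates `f` where the search probes, while `argmax_rate` assumes all rates were computed (e.g. in parallel).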

IV Numerical representations

The representation of phase knowledge using the Fourier series (2) means that the number of non-zero coefficients $c_{n}$ to keep track of increases with our knowledge of the phase. If the number of non-zero coefficients at step $s$ is $\Gamma$, the computation for the Bayesian update (3) requires calculating $\Gamma+k$ new coefficients. If we start with a uniform probability density, we initially have $c_{0}=1$ and $c_{n}=0$ for all $n\neq 0$ in (2). After performing some measurements with $N$ applications of $U$ in total, the last update will require calculating $\Gamma=N$ non-zero coefficients.
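The growth of $\Gamma$ by $k$ per update can be made concrete with a sketch of the Bayesian update on the coefficient array. The likelihood form used here, $P(d|\phi)=[1+(-1)^{d}\zeta_{k}\cos(k\phi+\alpha)]/2$, is our assumed reading of equation (1) (conventions for signs and the contrast $\zeta_k$ may differ from the paper's); multiplying the series by it shifts coefficient indices by $\pm k$.

```python
import numpy as np

def bayes_update(c, k, alpha, d, zeta=1.0):
    """Bayesian update of Fourier coefficients.

    c[i] holds c_n for n = i - G, with G = (len(c) - 1) // 2, normalised so
    that c_0 = 1 (a uniform prior is c = [1]).  Assumed outcome likelihood:
        P(d|phi) = [1 + (-1)**d * zeta * cos(k*phi + alpha)] / 2,  d in {0, 1}.
    The product with the series shifts indices by +-k, so Gamma grows by k."""
    G = (len(c) - 1) // 2
    new = np.zeros(len(c) + 2 * k, dtype=complex)
    w = 0.25 * (-1) ** d * zeta * np.exp(1j * alpha)
    new[k:k + len(c)] += 0.5 * c          # constant term of P(d|phi)
    new[2 * k:2 * k + len(c)] += w * c    # e^{+i(k*phi + alpha)} term
    new[:len(c)] += np.conj(w) * c        # e^{-i(k*phi + alpha)} term
    return new / new[G + k].real          # renormalise so that c_0 = 1
```

For example, updating the uniform prior with $k=1$, $\alpha=0$, outcome $d=0$ gives coefficients $(1/2,\,1,\,1/2)$ for $n=-1,0,1$, i.e. the posterior $p(\phi)\propto 1+\cos\phi$, as expected from $P(0|\phi)=(1+\cos\phi)/2$.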

If we are in a situation which results in Heisenberg scaling, then $N\propto 1/\Delta\hat{\phi}$. If the scaling of $\Delta\hat{\phi}$ is slower than Heisenberg scaling, it follows from (3) that the number of coefficients $\Gamma$ after reaching a given value of $\Delta\hat{\phi}$ will be greater than for Heisenberg scaling. However, if we assume a probability density $p(\phi)$ which on average preserves its functional form as estimation proceeds and $\Delta\hat{\phi}$ decreases (i.e. the overall shape stays the same and only the width changes), then standard Fourier analysis implies that $\Gamma$ should scale as $1/\Delta\hat{\phi}$. This follows from the fact that we expect the square root of the Holevo variance of the Bayesian probability density, $\sqrt{V[p(\phi)]}$, to scale in the same way as $\Delta\hat{\phi}$ with $N$. This suggests that when we converge more slowly than Heisenberg scaling, the description of $p(\phi)$ obtained from (3) uses disproportionately many non-zero coefficients and becomes inefficient. In this case we should seek a good representation of $p(\phi)$ with fewer coefficients to minimise computational overhead. In this sense we can expect that the number of coefficients can be made inversely proportional to the uncertainty, $\Gamma\propto 1/\Delta\hat{\phi}$, even without Heisenberg scaling.

The computation for adaptively choosing $\alpha$ and $k$ scales more favourably with $\Gamma$ than the Bayesian update. Computing the expected sharpness gain (7) does not scale with the total number of coefficients $\Gamma$, while the expected entropy gain (9) scales as $\Gamma/k$. Since we expect the optimal value of $k$ to increase with $N$ (in the noise-free case), there may exist algorithms for choosing $k$, with complexity independent of $N$, that typically maximise the expected entropy gain.

The practicality of using TAPE depends on whether the Bayesian update and the choice of control parameters ($\alpha$ and $k$) can be computed in a time shorter than or equal to the time it takes to run the next measurement. Since the update computation time scales linearly with $\Gamma$, it will become longer than the measurement time for sufficiently large $\Gamma$. In the asymptotic limit $N\rightarrow\infty$, the average measurement time, proportional to $k$ for sequential procedures, will also scale linearly with $N$ if we have Heisenberg scaling. If the slope of the linear increase in computation time is less than that of the measurement time, the computation will remain shorter than the measurement time on average for all $N$. However, this may not be the case for many experiments where TAPE could otherwise be useful, for example if decoherence prevents large-$k$ measurements from giving phase information, or if TAPE is used for parallel procedures. In these cases it is beneficial to find methods that reduce the computation time. One approach is to change representation from a truncated Fourier series to another function, such as a Gaussian, which is described by a number of parameters independent of $N$ [81]. Here we propose an alternative method that allows the truncated Fourier series representation to be used for arbitrarily large $N$ while ensuring that the number of coefficients needed in the series scales much more slowly than $1/\Delta\hat{\phi}$. This way all the procedures of sections II and III can still be used for large $N$ with minimal changes, and the user is free to trade off the generality of the description of $p(\phi)$ against computational overhead as suits their purposes.

Given a probability density $p(\phi)$, we define a contraction of $p(\phi)$ as a triple $(M,\phi_{0},q(\theta))$ such that $p(\phi_{0}+\theta/M)\approx q(\theta)$ for $0\leq\theta<2\pi$ and $p(\phi)\approx 0$ otherwise. We refer to $M\in\mathbb{N}^{+}$ as the magnification and to $\phi_{0}\in[0,2\pi)$ as the offset, and we represent $q(\theta)$ as a truncated Fourier series. Thus, by representing the density $p(\phi)$ as a contraction we use the truncated Fourier series representation for $\phi_{0}\leq\phi<\phi_{0}+2\pi/M$ and assume that $p(\phi)=0$ for values of $\phi$ outside this interval. We are effectively “zooming in” on a region around the expected value of $\phi$ and assuming that the probability of $\phi$ outside this region is negligible. Given a contraction $(M,\phi_{0},q(\theta))$ where $q(\theta)$ has coefficients $c_{n}$, $n=-\Gamma,\dots,\Gamma$, we can calculate a new contraction $(M^{\prime},\phi_{0}^{\prime},q^{\prime}(\theta^{\prime}))$, with $M^{\prime}$ a multiple of $M$ such that $m=M^{\prime}/M\in\mathbb{N}^{+}$, as

\begin{align}
\phi_{0}^{\prime} &= \phi_{0}+\hat{\theta}/M-\pi/M^{\prime}\,, \tag{10}\\
c^{\prime}_{n} &= \mathrm{e}^{i(\pi-\arg(c_{-m}))}\,c_{m\times n}\,,\quad \forall n=-\Gamma/m,\dots,\Gamma/m\,, \tag{11}
\end{align}

where $\hat{\theta}=\arg(c_{-1})$.

By using the truncated Fourier series on a reduced interval of size $2\pi/M$, a contraction $(M,\phi_{0},q(\theta))$ of $p(\phi)$ (with $\Gamma$ coefficients) uses only $\Gamma/M$ coefficients. The assumption that $p(\phi)=0$ outside the interval $[\phi_{0},\phi_{0}+2\pi/M)$ is often false with finite probability. However, by performing more measurements this probability can always be reduced arbitrarily, at the cost of increasing the number of coefficients $\Gamma$. Once this probability is small enough for the given application, we can make contractions to ensure that the number of coefficients no longer exceeds $\Gamma$. Thus, with this method we must balance the probability that a contraction fails to represent the estimated phase against the maximum computation time per measurement.
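A minimal sketch of the contraction step of eqs. (10)-(11) follows; the array layout (index $i$ holding $c_n$ with $n=i-\Gamma$) and the modular reduction of the offset are our assumptions, and only every $m$-th coefficient is kept, as eq. (11) states.

```python
import numpy as np

def contract(c, M, phi0, m):
    """Contraction (M, phi0, q) -> (M', phi0', q') per eqs. (10)-(11):
    zoom in by a factor m, keeping every m-th Fourier coefficient.
    c[i] holds c_n for n = i - G, with G = (len(c) - 1) // 2."""
    G = (len(c) - 1) // 2
    Mp = m * M
    theta_hat = np.angle(c[G - 1])                 # theta_hat = arg(c_{-1})
    phi0p = (phi0 + theta_hat / M - np.pi / Mp) % (2 * np.pi)   # eq. (10)
    Gp = G // m
    idx = G + m * np.arange(-Gp, Gp + 1)           # keep c_{m*n} only
    cp = np.exp(1j * (np.pi - np.angle(c[G - m]))) * c[idx]     # eq. (11)
    return Mp, phi0p, cp
```

The coefficient count drops from $2\Gamma+1$ to $2\lfloor\Gamma/m\rfloor+1$, while the global phase factor in eq. (11) re-centres the contracted density on the new interval.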

In the Supplemental Material, section S.IV, we show that under the assumption of a Gaussian probability density $p(\phi)$, using contractions reduces the number of Fourier coefficients required by $\mathcal{O}(\sqrt{\log(\log(\Gamma))})$, while keeping the probability that $p(\phi)\neq 0$ outside the interval $[\phi_{0},\phi_{0}+2\pi/M)$ below a constant value (which can be chosen arbitrarily small). In general, the computational complexity will depend on the particular functional form of $p(\phi)$, but the Gaussian case demonstrates that the method can sometimes greatly reduce computation. If a much poorer complexity is found when faster computation is needed, it could be helpful to modify the adaptive strategy for choosing $\alpha$ and $k$ such that $p(\phi)$ becomes closer to a Gaussian before each contraction is performed.

The expected sharpness or entropy gain relations (7) and (9) can be used for the contracted Fourier series representation $q(\theta)$ with the replacement $\phi\rightarrow\theta$, although the values of the expected sharpness gain in that case will not correspond to those for the full density $p(\phi)$ (although the differential entropy is not invariant under a change of scale, the differential entropy gain is). However, the control parameters that achieve the maximum gain for the contraction can still be used to determine the values for the next measurement. Denoting the optimal control phase thus obtained by $\beta$ and the optimal number of applications of $U$ by $j$, the corresponding values for the un-contracted density $p(\phi)$ are $k=Mj$ and $\alpha=\beta+k\phi_{0}$. In this way we can use contractions to reduce the time not only of the update computation but also of the search for optimal parameters $\alpha$ and $k$ using the expected entropy gain (this also works when using the sharpness gain, but without a speedup).

V Simulation results

In order to characterise the performance of the methods considered above we study the results of numerical simulations. We focus on estimation starting from a uniform prior (initially no information about the phase), though we note that our methods are also well suited to any prior that can be well represented with a Fourier series and the contractions described in section IV, given available computational resources and the particular timing requirements of an experiment. We first consider the case of noise-free quantum metrology. In this context the time of the measurement is assumed to be proportional to the number of applications $k$ of the unitary evolution $U$; in the following we set the time of $U$ to 1, so that the time of a single measurement is equal to $k$. In this case, the number of resources $N$ in the SQL, $\Delta\hat{\phi}>1/\sqrt{N}$, and the HL, $\Delta\hat{\phi}>\pi/N$, is the total number of applications of $U$ in the estimation procedure, and is equal to the total estimation time.

In figure 4 we plot the uncertainty in the phase estimates resulting from repeating the estimation procedure with randomly chosen values for the system phase $\phi$ (the performance of estimation may depend on the system phase when starting from a uniform prior if the control phase for the first measurement is fixed, e.g. $\alpha=0$; by choosing a random control phase $\alpha\in[0,\pi]$ for the first measurement, the performance is the same for any value of the system phase, so our results can also be interpreted as the performance for any particular value of the system phase assuming a random control phase is used in the first measurement). More specifically, we plot the uncertainty multiplied by the total estimation time, for which any phase estimation reaching HS gives a constant value independent of estimation time; for the HL, this constant is $\pi$. The uncertainty in the phase estimates $\Delta\hat{\phi}$ is calculated according to equation (5), where the sharpness is calculated as $S(\hat{\phi})=\langle\cos(\hat{\phi}-\phi)\rangle$ for a true system phase $\phi$, and the expectation value is approximated as an average over simulations. This choice can only lead to larger uncertainty values than the usual sharpness, $|\langle\mathrm{e}^{i(\hat{\phi}-\phi)}\rangle|$, since $\langle\cos(\hat{\phi}-\phi)\rangle=\mathfrak{Re}\{\langle\mathrm{e}^{i(\hat{\phi}-\phi)}\rangle\}$. For the results in figure 4 we find $\mathfrak{Im}\{\langle\mathrm{e}^{i(\hat{\phi}-\phi)}\rangle\}\ll\mathfrak{Re}\{\langle\mathrm{e}^{i(\hat{\phi}-\phi)}\rangle\}$, which shows that the estimator is unbiased and that either choice of sharpness measure leads to essentially the same uncertainty values. Nevertheless, in all simulation results presented in this manuscript we have calculated the phase uncertainty $\Delta\hat{\phi}$ using $S(\hat{\phi})=\langle\cos(\hat{\phi}-\phi)\rangle$.
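As a short sketch of this uncertainty estimate from repeated runs: equation (5) is not reproduced in this section, so the relation $\Delta\hat{\phi}=\sqrt{S^{-2}-1}$ used below is the standard Holevo form we assume it takes.

```python
import numpy as np

def phase_uncertainty(estimates, phi_true):
    """Uncertainty from repeated estimates via the sharpness
    S = <cos(phi_hat - phi)>, assuming the Holevo-type relation
    Delta phi_hat = sqrt(S**-2 - 1) for eq. (5)."""
    S = np.mean(np.cos(np.asarray(estimates) - phi_true))
    return np.sqrt(S ** -2 - 1.0)
```

Perfectly concentrated estimates give $S=1$ and hence zero uncertainty; any spread reduces $S$ below 1 and yields a positive $\Delta\hat{\phi}$.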

For the multi-step (gain rate) method, each plotted point is $(\Delta\hat{\phi})\times(\text{total estimation time})$ calculated from $6\times 10^{4}$ ($3\times 10^{6}$) realisations of estimation. We compare four adaptive methods for the choice of control parameters $\alpha$ and $k$. For the multi-step method we plot separately the results of maximising either the expected sharpness gain (7) or the expected entropy gain (9) for total estimation times $2^{m},\,m=1,\dots,12$. For the gain rate method we plot the results of maximising the expected sharpness gain (7) to determine the choices of $\alpha$ and $k$; we find that maximising the expected entropy gain (9) performs worse (not shown), which is due to the properties of entropy mentioned in section III.1. We also note that when maximising the expected entropy gain only, we tend to obtain very accurate estimates with a small number of outliers that significantly increase the phase uncertainty.

Instead of studying further the gain rate method maximising only the expected entropy gain, we plot the result of a hybrid strategy in which the choices of $\alpha$ and $k$ are determined using the expected entropy gain for at most the first half of the total estimation time, with the expected sharpness gain used for the remaining time. This approach is motivated by the fact that although maximising one's information about the phase is desirable, obtaining a precise estimate of the phase additionally requires a narrow probability density $p_{s}(\phi)$. Since multi-peak density functions can have the same entropy as single-peak functions with a broader peak, maximising the entropy does not necessarily lead to a narrow density. We choose the hybrid method to study the performance of strategies that maximise the information gain while still ensuring the narrow density $p_{s}(\phi)$ required for accurate phase estimation.

As shown by Gutiérrez-Rubio et al. [50], maximising the entropy gain leads to the maximum likelihood of estimating the correct system phase. Another way to understand the drawback of maximising only the entropy gain is that maximising the likelihood of estimating the system phase $\phi$ does not generally minimise $\Delta\hat{\phi}$, which depends on the entire distribution of phase estimates, call it $q(\hat{\phi})$, rather than on the value $q(\hat{\phi}=\phi)$ at a single point. Note that if one could maximise the expected gain over an entire measurement sequence used for estimation, maximising the expected sharpness gain would minimise $\Delta\hat{\phi}$ [33]. Since we perform the optimisation only for the next measurement, it becomes interesting to study the hybrid method.

The two plotted methods based on the gain rate are for total estimation times $2^{m},\,m=1,\dots,18$, and all methods make use of contractions (section IV) to reduce computation times: if the square root of the Holevo variance of the phase density, $\sqrt{V[p(\phi)]}$, is less than $\pi/(2^{13}M)$, where $M$ is the current magnification, then we perform at least one measurement using parameters $\alpha,k$ that maximise the expected sharpness gain, and then perform a contraction with $m=2$. The additional measurement maximising the expected sharpness gain is used in the hybrid method and in the multi-step method maximising entropy to reduce the probability of the system phase lying outside the reduced interval over which the Fourier series representation of the phase knowledge is used in the contraction. Since $k$-values are chosen adaptively, a sequence of optimal choices does not generally lead to the total times we choose to plot in figure 4. For the plotted simulations we therefore always restrict the optimisation over $k$ to values less than or equal to the total remaining time. For this reason we have performed the simulations for each plotted point independently of the other points for different total times.

Figure 4: Metrology scaling without noise. The time of all experiments in an estimation sequence is equal to the total number of applications of the unitary $U$. The phase uncertainties $\Delta\hat{\phi}$ are calculated from $6\times 10^{4}$ ($3\times 10^{6}$) realisations of estimation for the multi-step (gain rate) methods with system phase $\phi$ chosen uniformly at random. Error bars (typically too small to see) are calculated by error propagation from sample standard deviations of the estimation errors. Legend: multi-step method maximising expected sharpness gain ($\circ$), multi-step method maximising expected entropy gain ($\bullet$), gain rate method maximising expected sharpness gain ($+$), hybrid gain rate method ($\square$).

The results in figure 4 show that all methods perform better than the SQL. The gain rate method using the sharpness gain and the hybrid method perform best, and we fit their results for total estimation times from $2^{12}$ to $2^{18}$ with the functional form $\Delta\hat{\phi}=A\pi N^{-\gamma}$. The values of $A$ and $\gamma$ obtained from the fits are summarised in table 1. The hybrid method clearly reaches HS ($\gamma=1$), while the method maximising sharpness only is extremely close to HS, and both methods are close to the theoretically lowest achievable uncertainty for the values simulated. In particular, the uncertainty for a total time of $2^{18}$ (i.e. the longest time simulated) is a factor of $1.422\pm 0.015$ from the HL when maximising the sharpness gain only. For the same total time the hybrid method reaches an uncertainty within a factor of $1.429\pm 0.011$ of the HL.

Method       $A$                $\gamma$
sharpening   $1.371\pm 0.028$   $0.998\pm 0.002$
hybrid       $1.430\pm 0.036$   $1.000\pm 0.002$
Table 1: Values of the fitted parameters $A$ and $\gamma$ using the fit function $\Delta\hat{\phi}=A\pi N^{-\gamma}$ and the uncertainties obtained for estimation times from $2^{12}$ to $2^{18}$ for the two gain rate methods plotted in figure 4.

We attribute the poorer performance of the multi-step method to the fact that the compared gains require only approximately equal time and to the fact that only a local optimum is used (see Supplemental Material, section S.III). Although we expect the performance could be improved by starting the search at a $k$-value dependent on the Holevo variance of the prior (cheap to compute) and additionally using a “search down” method analogous to algorithm S.4 (see Supplemental Material, section S.III), we choose rather to focus on the gain rate method, since it demonstrates near-optimal performance in addition to being simpler and computationally more efficient.

In the Supplemental Material, section S.I, we have also compared the multi-step and gain rate methods in the case where we allow only the subset of $k$-values $k\in\{2^{n}\},\,n=1,2,3,\dots$. In this case the gains compared by the multi-step method require exactly the same time, and the method performs similarly to the gain rate method. However, even in this situation the multi-step method at best performs similarly to the hybrid method using the gain rate. For completeness, we have also simulated the gain rate methods studied here when a brute-force search for $k$ is used rather than a Fibonacci search. In that case the hybrid method performs best, reaching an uncertainty within a factor of $1.394\pm 0.010$ of the HL. A summary of simulation results for the best-performing methods we have studied is given in table S.1 in the Supplemental Material, section S.I.

In figure 5 we consider the performance of the methods using the gain rate when the measurement time is independent of $k$. This situation describes the limit where the time required to perform the unitary evolution $U$ is negligible compared to other times in the experiment, such as state preparation and readout. In this case the total time is equal to the number of measurements. We plot methods based on maximising either the sharpness gain or the entropy gain as well as a hybrid method, as in figure 4. In addition we compare the case where the gain maximisation is based on the correct rate in this context, i.e. $\mathbf{R}=\mathbf{G}/\mathbf{T}$ with $\mathbf{T}=\{1,1,\dots,1\}$ (so $\mathbf{R}=\mathbf{G}$), with the usual case in metrology, $\mathbf{T}\propto\mathbf{K}$. We plot the mean error resulting from $12\times 10^{6}$ realisations of estimation from 1 to 40 measurements.

Figure 5: Scaling without noise when the experiment time is equal to the number of measurements (independent of how many applications of $U$ are used). Plotted are the phase uncertainties $\Delta\hat{\phi}$ calculated from $12\times 10^{6}$ realisations of estimation with system phase $\phi$ chosen uniformly at random. Knowledge gains optimised with the rate from metrology scaling ($\mathbf{T}\propto\mathbf{K}$): expected sharpness gain ($\diamond$), expected entropy gain ($\circ$), hybrid method ($\star$). Knowledge gains optimised with the correct rate ($\mathbf{T}=\{1,1,\dots,1\}$): expected sharpness gain ($+$), expected entropy gain ($\bullet$), hybrid method ($\square$). Inset: closer view of results for 32 to 40 measurements.

The results in figure 5 show that the strategies using the correct rate ($\mathbf{R}=\mathbf{G}$) outperform those optimised for the usual metrology setting with $\mathbf{T}\propto\mathbf{K}$. In this case we find that the method maximising the expected entropy gain performs best, while that maximising the expected sharpness gain gives the lowest accuracy in phase estimation; the hybrid method has a performance in between the two. We fit the results from $t=10$ to $40$ measurements with an exponential decay $\Delta\hat{\phi}=A\exp(-\kappa t)$. The values of the fitted parameters $A$ and $\kappa$ for each of the plotted curves in figure 5 are listed in table 2.

Method           $A$                  $\kappa$
sharpening (M)   $0.7659\pm 0.0109$   $0.13164\pm 0.00102$
hybrid (M)       $0.7903\pm 0.0139$   $0.13536\pm 0.00127$
entropy (M)      $0.7984\pm 0.0119$   $0.14074\pm 0.00103$
sharpening       $0.8015\pm 0.0031$   $0.14149\pm 0.00028$
hybrid           $0.7739\pm 0.0018$   $0.14424\pm 0.00017$
entropy          $0.7829\pm 0.0027$   $0.14856\pm 0.00026$
Table 2: Values of the fitted parameters $A$ and $\kappa$ for the fit function $\Delta\hat{\phi}=A\exp(-\kappa t)$ for each of the methods plotted in figure 5. (M) indicates that the rate used for optimisation is the one usually considered in quantum metrology, $\mathbf{T}\propto\mathbf{K}$.

Although the performance is better when the correct rate is used, the improvement is not dramatic. This is perhaps not surprising, since the quantum phase estimation algorithm (QPEA) [85, 72] is known to be optimal even when the number of applications of $U$ is not the relevant resource (indeed this is also the case in the setting of quantum computation) [3, 4]. Methods based on the QPEA also lead to Heisenberg scaling in the context of quantum metrology [33]; that is, estimation procedures using a similar allocation of applications of $U$ per measurement have been shown to be optimal in both settings. Nevertheless, we see that in the case of the single-step optimisation we perform for sequential strategies, a slight improvement can be obtained by using the appropriate rate in the optimisation. In this setting we attribute the better performance when maximising the information gain rather than the sharpness gain to the fact that higher $k$-values are cheap compared to the metrology setting, allowing strategies that are better quantified by the richer nature of the entropy.

For the remaining simulations we return to the metrology setting where $\mathbf{T}\propto\mathbf{K}$, since in this context we can compare performance with known theoretical bounds. In particular, we study the performance in the presence of noise.

In figure 6 we model a system with dephasing. We assume perfectly prepared $|+\rangle\equiv(|0\rangle+|1\rangle)/\sqrt{2}$ states and $U=\mathrm{e}^{-iZ\phi/2}$, with $Z$ the Pauli-z operator. After each application of $U$ we additionally apply a dephasing channel described by the Kraus operators $K_{0}=\sqrt{\frac{1+\eta}{2}}\openone$ and $K_{1}=\sqrt{\frac{1-\eta}{2}}Z$. At the end we perform the measurement in the x-basis. In this case Heisenberg scaling is possible only initially, while increasing $k$ is beneficial; as $k$ is increased, dephasing eventually reduces the information available from the measurement and we are restricted to $1/\sqrt{N}$ scaling. The ultimate bound on precision can then be expressed as $c/\sqrt{N}$, where $c$ is a prefactor depending on the noise channel. In the case of dephasing this prefactor has been shown to be at least $\sqrt{1-\eta^{2}}/\eta$ (though it is not known whether this bound is tight) [39]. In figure 6 we consider the case $\eta=0.995$; the bound $c/\sqrt{N}$ is also plotted. As before, we simulate the gain rate method for the sharpness gain, the entropy gain, and the hybrid case described above. We compare these three methods when no decoherence is accounted for in the estimator model ($\zeta_{k}=1$ in equation (1)) and when the decoherence rate is already known ($\zeta_{k}=\mathrm{e}^{-(1-\eta)k}$). We plot the phase uncertainties calculated from $32\times 10^{4}$ realisations of the estimation starting from a uniform prior, with the system phase chosen uniformly at random.
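This dephasing model is small enough to simulate directly as a density matrix, which makes the origin of the contrast factor explicit: each dephasing channel multiplies the qubit coherence by $\eta$, so after $k$ rounds the outcome probability takes the form $(1+\eta^{k}\cos(k\phi-\alpha))/2$, consistent with $\zeta_{k}=\mathrm{e}^{-(1-\eta)k}\approx\eta^{k}$ for $\eta$ close to 1. The analysis phase $\alpha$ entering via the measurement basis is our convention and may differ in sign from the paper's eq. (1).

```python
import numpy as np

def measure_prob_dephasing(phi, k, alpha, eta):
    """Figure-6 model: |+> state, k rounds of U = exp(-i Z phi / 2) each
    followed by the dephasing channel {K0, K1}, then a measurement along
    the x-basis state (|0> + e^{i alpha}|1>)/sqrt(2)."""
    Z = np.diag([1.0, -1.0]).astype(complex)
    U = np.diag([np.exp(-0.5j * phi), np.exp(0.5j * phi)])
    K0 = np.sqrt((1 + eta) / 2) * np.eye(2, dtype=complex)
    K1 = np.sqrt((1 - eta) / 2) * Z
    rho = 0.5 * np.ones((2, 2), dtype=complex)     # |+><+|
    for _ in range(k):
        rho = U @ rho @ U.conj().T
        rho = K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T
    v = np.array([1.0, np.exp(1j * alpha)]) / np.sqrt(2)
    return float(np.real(v.conj() @ rho @ v))
```

For $\eta=1$ this reduces to the ideal fringe $(1+\cos(k\phi-\alpha))/2$, while $\eta=0$ destroys all coherence and returns $1/2$ regardless of $\phi$.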

Figure 6: Metrology scaling in the presence of decoherence. Plotted are the phase uncertainties $\Delta\hat{\phi}$ calculated from $32\times 10^{4}$ realisations of estimation with system phase $\phi$ chosen uniformly at random. HL: Heisenberg limit. SQL: standard quantum limit. QA: lower bound on achievable quantum advantage given by $c/\sqrt{N}$, where $c=\sqrt{1-\eta^{2}}/\eta$, $\eta=0.995$, and $N$ is the total time. Estimation with decoherence included in the estimator model, i.e. $\zeta_{k}=\mathrm{e}^{-(1-\eta)k}\,\forall k$ (without, i.e. $\zeta_{k}=1\,\forall k$): sharpness: $\diamond$ ($+$), entropy: $\circ$ ($\bullet$), hybrid: $\star$ ($\square$). $\diamond$ and $\star$ mostly overlap since the corresponding methods perform similarly.

When decoherence is not accounted for in the estimator model, maximising the entropy leads to the poorest performance, since this chooses larger $k$-values for which the system decoheres, leading to reduced information in the measurement. For all methods that do not account for decoherence in the model, estimation works initially, but as $k$ is increased, decoherence eventually introduces large random errors in the estimates. When decoherence is accounted for in the model we find that all methods perform well. When sufficiently many resources are used, the method that maximises the rate of entropy gain performs similarly to the other methods. This can be explained by the fact that decoherence eventually limits the $k$-values chosen by the strategy, and the density $p_{s}(\phi)$ is far less likely to have multiple peaks. We find that when the decoherence is accounted for, the performance after a total estimation time of $2^{15}$ is within a factor of $1.6715\pm 0.0021$ of the theoretical bound when maximising the expected entropy gain, $1.6807\pm 0.0063$ when maximising the expected sharpness gain, and $1.6734\pm 0.0021$ for the hybrid method. Our methods could also be combined with an optimised estimation of decoherence timescales, as proposed by Arshad et al. [79].

The results in figure 6 demonstrate some robustness of the estimation methods to errors not accounted for in the model. In figure 7 we present the results of a last set of simulations to examine further the robustness of the methods. In this case we assume perfectly prepared $|0\rangle$ states and $U=\mathrm{e}^{iY\phi/2}$. After the $k$ applications of $U$ we apply a bit-flip channel with Kraus operators $K_{0}=\sqrt{1-p_{b}}\openone$ and $K_{1}=\sqrt{p_{b}}X$, with $X,Y$ the Pauli-x,y operators, followed by spontaneous emission with Kraus operators

K_{0}=\begin{pmatrix}1&0\\ 0&\sqrt{1-p_{s}}\end{pmatrix},\quad K_{1}=\begin{pmatrix}0&\sqrt{p_{s}}\\ 0&0\end{pmatrix}\,.

We set $p_{b}=p/2$ and $p_{s}=p$, and simulate the hybrid gain rate method with $p=0.1,0.2,0.3$. For each value of $p$ we simulate estimation using an estimator model with errors included ($\lambda_{k}=\zeta_{k}=1-p\,\forall k$) and without ($\lambda_{k}=\zeta_{k}=1\,\forall k$).
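The figure-7 noise model can likewise be sketched as a short density-matrix simulation. The function and its structure are our illustration of the channels defined above (Pauli rotation convention $\mathrm{e}^{iY\phi/2}$, bit-flip, then spontaneous emission); the final measurement basis is assumed to be the computational basis.

```python
import numpy as np

def apply_channel(rho, kraus):
    """Apply a Kraus-operator channel: rho -> sum_i K_i rho K_i^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus)

def figure7_probability(phi, k, p):
    """Figure-7 model: |0> state, k applications of U = exp(i Y phi / 2),
    then a bit-flip channel (pb = p/2) and spontaneous emission (ps = p).
    Returns the probability of measuring |0>."""
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    c, s = np.cos(phi / 2), np.sin(phi / 2)
    U = np.array([[c, s], [-s, c]], dtype=complex)   # exp(i Y phi / 2)
    rho = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|
    for _ in range(k):
        rho = U @ rho @ U.conj().T
    pb, ps = p / 2, p
    rho = apply_channel(rho, [np.sqrt(1 - pb) * np.eye(2), np.sqrt(pb) * X])
    rho = apply_channel(rho, [np.array([[1, 0], [0, np.sqrt(1 - ps)]]),
                              np.array([[0, np.sqrt(ps)], [0, 0]])])
    return float(np.real(rho[0, 0]))
```

For $p=0$ both channels are the identity and the outcome probability is the ideal $\cos^{2}(k\phi/2)$; for $p=1$ spontaneous emission pumps everything into $|0\rangle$, so the measurement carries no phase information at all, which illustrates why large unknown errors destroy the estimates in figure 7.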

Figure 7: Metrology scaling with bit-flips and spontaneous emission. Each point is the phase uncertainty $\Delta\hat{\phi}$ calculated from $2\times 10^{6}$ realisations of estimation with system phase $\phi$ chosen uniformly at random. The hybrid method described in the main text is used for all simulations plotted here. Without noise included in the model of the estimator, i.e. $\lambda_{k}=\zeta_{k}=1\,\forall k$: $p=0.3$ ($\square$), $p=0.2$ ($\circ$), $p=0.1$ ($\bullet$). With noise included in the estimator model, i.e. $\lambda_{k}=\zeta_{k}=1-p\,\forall k$: $p=0.3$ ($\diamond$), $p=0.2$ ($+$), $p=0.1$ ($\star$).

We see that when errors are included in the model the phase uncertainty is significantly larger than in the noise-free case, but we still retain better-than-classical scaling for longer estimation times. We fit the phase uncertainty with the functional form $\Delta\hat{\phi}=A\pi N^{-\gamma}$, with fit parameters $A$ and $\gamma$. By fitting subsets of points we see that $\gamma$ tends to increase for longer times, but we would need to simulate further to see whether HS is eventually reached. When errors are not included in the model, only the $p=0.1$ case performs similarly to the SQL initially, while for larger errors the performance drops significantly. We see also that for longer estimation times the lack of errors in the model can prevent further reduction of the phase uncertainty. This suggests that it may be better to restrict to lower $k$-values in the presence of large unknown errors. Overall, these results demonstrate robustness of the method to large errors, indicating that TAPE can also be useful for calibration of quantum systems, e.g. for quantum computing, as considered in ref. [1].

In the noise-free case we find average computation times of $\sim 1\,\mathrm{ms}$ for the computation required for a single shot using a standard PC (CPU model: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz) without any parallel computation. When noise is included in the model we find that the maximisation of expected knowledge gain can take up to $\sim 3\,\mathrm{ms}$ for the entropy gain, and $\sim 1\,\mathrm{ms}$ when maximising the expected sharpness gain. This makes the method suitable for trapped-ion or neutral-atom quantum computing systems, where the time of a single shot is typically a few milliseconds. Since the computation of the expected knowledge gain for each $k$-value could be performed in parallel, it may be possible to obtain sufficient speedups for some applications in superconducting circuits or NV centres. Further speedups may also be possible when maximising the expected sharpness gain if the optimal value of $\alpha$ were determined analytically. An alternative approach to achieve faster computation with the implementation we use here is to restrict measurements to a subset of possible $k$-values; a further discussion can be found in the Supplemental Material, section S.I.

For experimental systems with sufficiently low measurement rates, TAPE thus provides a near-optimal and very flexible method for phase estimation. In the metrology setting without noise, the best performance we know of for phase estimation using a classical sequential strategy is the adaptive method of Higgins et al., inspired by QPEA from quantum computing [18, 4]. Their adaptive method reaches a phase uncertainty (as quantified by the square root of the Holevo variance of the estimates) a factor of $1.56$ larger than the HL, while they later devised a non-adaptive method, also inspired by QPEA, that performs similarly, demonstrating uncertainty less than a factor of $2.03$ larger than the HL [32, 33]. The latter non-adaptive method has also been shown to be a robust method for calibrating single-qubit gates for quantum computation [1]. Non-classical strategies for phase estimation reaching lower-uncertainty estimates are known, e.g. the method of Pezzè and Smerzi [86, 87], which provides phase estimates with uncertainty a factor of $1.27$ larger than the HL. However, classical sequential strategies are best suited to single-qubit gate calibration. In the metrology setting we have found that TAPE reaches an uncertainty a factor of $1.43$ larger than the HL using the hybrid method, demonstrating performance similar to or better than known methods. The results presented in figures 6 and 7 also show significant robustness to errors, demonstrating that TAPE can be a good method for calibrating single-qubit operations.

In contrast to algorithms inspired by QPEA, where a predetermined number of measurements is performed for values of $k\in\{2^{n}\},\,n=1,2,3,\dots$, TAPE chooses the time of phase evolution (i.e. the $k$-value) adaptively. For the non-adaptive method that is used for robust phase estimation [1, 32], the number of measurements for each $k$-value is optimised beforehand for a given total number of phase applications $N=\sum_{s}k_{s}$, assuming no prior knowledge of the phase. If noise is present the optimisation needs to be modified as shown in [64]. But as noted therein, the noise models they consider “are to be thought more as toy models” that “capture some of the key features of those scenarios”. Depending on the type of noise present in a given experiment, a further analysis and optimisation of the number of measurements to perform with each $k$-value will be required to minimise the phase uncertainty. By using the general form for the measurement probabilities (1), TAPE allows the description of a wide range of noise to be included in the model of the estimator, and directly provides near-optimal phase estimation procedures by accounting for the modelled noise in the optimisation of the control parameters $\alpha$ and $k$.

In addition, TAPE allows the exact experiment times to be easily included in the optimisation for the adaptive choice of $k$-value. This is very convenient for experiments where state preparation and readout times cannot be neglected in comparison to the time required to apply the unknown phase. With the QPEA-inspired methods above, a further optimisation of the number of measurements to perform with each $k$-value would be needed to minimise the phase uncertainty.
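A minimal sketch of this point: if the expected gain for a shot is divided by the full shot duration rather than by $k$ alone, fixed state-preparation and readout overheads shift the optimum towards larger $k$. The gain function here is a toy placeholder, not the paper's expected entropy or sharpness gain:

```python
import numpy as np

def gain_rate(gain, k, t_phase=1.0, t_prep=0.0, t_read=0.0):
    """Expected knowledge gain per unit wall-clock time for setting k.
    Including overheads only changes the denominator from k * t_phase
    to the full duration of a shot."""
    return gain(k) / (t_prep + k * t_phase + t_read)

toy_gain = lambda k: np.log2(1 + k)  # toy: diminishing returns in k
ks = np.arange(1, 1025)

best_free = ks[np.argmax([gain_rate(toy_gain, k) for k in ks])]
best_cost = ks[np.argmax([gain_rate(toy_gain, k, t_prep=50.0, t_read=50.0)
                          for k in ks])]
# With zero overhead the rate log2(1+k)/k is maximised at k = 1; a large
# fixed per-shot overhead pushes the optimum to much larger k.
```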

In the QPEA inspired methods above, choosing the value of $k$ to use for a particular step in the estimation sequence requires knowing how many measurements have been performed so far with each $k$-value. In TAPE, once the model of the noise is specified (by setting the values $\lambda_{k},\,\zeta_{k}\,\forall k$), and the experiment times required for each $k$-value are set, the values of the control parameters $\alpha$ and $k$ are determined using only the current prior knowledge density $p(\phi)$. This makes it easy to apply TAPE in situations where some prior knowledge may be available. As an example, suppose we would like to use phase estimation for calibrating single-qubit operations on a quantum computer where internal parameters of the device can drift over time. One could use either TAPE or a QPEA inspired method to initially estimate parameters. However, it would be easy to include the drift rate in the model of the phase knowledge $p(\phi)$, e.g. as a broadening of the probability density function over time. Then one could use TAPE to perform a minimal number of measurements to keep track of the parameters needed for single-qubit operations over time.
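As a sketch of how such drift could enter the model: if the periodic phase density is stored as a Fourier series (the implementation uses a series representation of $p(\phi)$, though its exact convention may differ from the one assumed here), convolving with a wrapped Gaussian of width $\sigma$ amounts to damping the coefficients:

```python
import numpy as np

def broaden_prior(c, sigma):
    """Broaden a periodic phase density by Gaussian drift of std sigma.

    For p(phi) = sum_n c[n] exp(i n phi) with n = -N..N, convolution with
    a wrapped Gaussian multiplies each coefficient by exp(-n**2 sigma**2 / 2),
    leaving the n = 0 coefficient (normalisation) untouched.
    """
    c = np.asarray(c, dtype=complex)
    N = len(c) // 2
    n = np.arange(-N, N + 1)
    return c * np.exp(-0.5 * (n * sigma) ** 2)

# A delta-like density has all coefficients equal; drift suppresses the
# high harmonics while preserving normalisation.
N = 8
c = np.full(2 * N + 1, 1.0 / (2 * np.pi))
cb = broaden_prior(c, 0.2)
```

Applied once per elapsed drift interval before the next Bayesian update, this keeps the prior honest about parameters that wander between calibration shots.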

Other proposals for phase estimation, such as [27, 67, 31] and some discussed in [25], are more similar to TAPE in that they choose the value of $k$ adaptively. However, they do not demonstrate better performance in terms of the uncertainty of phase estimates, or flexibility in terms of using potentially available prior information or accounting for experimental resources and imperfections.

VI Conclusion

Between the different forms of TAPE compared, we find that choosing the control parameters for the phase $\alpha$ and number of unknown phase applications $k$ based on the rate of knowledge gain gives near-optimal performance in several different settings, while requiring computation times that make it accessible to many experiments. In the context of noise-free quantum metrology we reach uncertainties in phase estimates within a factor of $1.43$ of the HL using a hybrid method maximising the rate of expected entropy and sharpness gains. In addition, we have found uncertainties within a factor of $1.39$ of the HL using the hybrid gain rate method with a brute-force search over the measurement setting $k$, rather than a Fibonacci search (Supplemental Material, section S.I, table S.1). Performing the computations for each $k$-value in parallel would allow this to be done in times comparable to those we find for performing the computation in series with the Fibonacci search method, or even faster.
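The trade-off between the brute-force and Fibonacci searches is between evaluation count and the risk of missing the global maximum when the gain is not exactly unimodal in $k$. As a simplified stand-in for the Fibonacci search used in the implementation, a ternary search on integers has the same logarithmic evaluation count for a unimodal objective:

```python
def search_max(f, lo, hi):
    """Return an integer maximiser of a unimodal f on [lo, hi] using
    O(log(hi - lo)) evaluations. (The Fibonacci search of the
    implementation has the same complexity with fewer evaluations;
    this ternary variant is just simpler to sketch.)"""
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if f(m1) < f(m2):
            lo = m1 + 1  # maximum lies strictly right of m1
        else:
            hi = m2      # maximum lies at or left of m2
    return max(range(lo, hi + 1), key=f)

# Example: a unimodal objective peaked at k = 37.
k_star = search_max(lambda k: -(k - 37) ** 2, 1, 1024)
```

If the objective has several local maxima, a series of such searches over sub-intervals (as done in the implementation) or a brute-force scan is needed to guarantee the global optimum.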

In a setting where experiment times are proportional to the number of measurements rather than to the number of unknown phase applications $k$, we find that maximising only the information gain leads to the best performance, while the hybrid strategy performs only slightly worse. The method is also able to find optimal strategies in the presence of different types of noise, and demonstrates significant robustness to errors. Combined with the fact that the optimisation can be easily tuned to the real times of experiments as a function of $k$ and can be used with arbitrary prior information, TAPE thus provides an extremely versatile phase estimation method that can directly give optimal performance in a wide range of experimental settings.

Code availability

The core implementation of TAPE used for all simulations in this work is available at:

The gain rate method is provided therein. The multi-step method was implemented in python using the methods from the core implementation. The code used to generate the figures and values simulated in this work is available from B.N. upon reasonable request.

Author contributions

Initial theory for phase estimation in the noise-free case for $k=1$, including derivation of an expression for the expected entropy gain, was done by A.V.L., and the resulting adaptive method was implemented by V.N. for a trapped-ion experiment in the group of J.P.H. to perform adaptive Ramsey measurements. B.N. had the idea to choose different $k$-values adaptively using the same formalism, and worked out the theory in the general case allowing for a range of noise. B.N. devised the gain rate and multi-step methods and chose to study the sharpness as a measure of knowledge in addition to the entropy. The manuscript was written by B.N. with input from all the authors.

Acknowledgements.
We acknowledge support from the Swiss National Science Foundation (SNF) under Grant No. 200020_179147, and from Intelligence Advanced Research Projects Activity (IARPA), via the US Army Research Office grant W911NF-16-1-0070. B.N. thanks Ivan Rojkov for helpful feedback on the manuscript.

References

  • Russo et al. [2021] A. E. Russo, W. M. Kirby, K. M. Rudinger, A. D. Baczewski, and S. Kimmel, Consistency testing for robust phase estimation, Phys. Rev. A 103, 042609 (2021).
  • Gebhart et al. [2023] V. Gebhart, R. Santagati, A. A. Gentile, E. M. Gauger, D. Craig, N. Ares, L. Banchi, F. Marquardt, L. Pezzè, and C. Bonato, Learning quantum systems, Nature Reviews Physics 5, 141 (2023).
  • van Dam et al. [2007] W. van Dam, G. M. D’Ariano, A. Ekert, C. Macchiavello, and M. Mosca, Optimal quantum circuits for general phase estimation, Phys. Rev. Lett. 98, 090501 (2007).
  • Wiseman et al. [2009] H. M. Wiseman, D. W. Berry, S. D. Bartlett, B. L. Higgins, and G. J. Pryde, Adaptive measurements in the optical quantum information laboratory, IEEE Journal of Selected Topics in Quantum Electronics 15, 1661 (2009).
  • Kaftal and Demkowicz-Dobrzański [2014] T. Kaftal and R. Demkowicz-Dobrzański, Usefulness of an enhanced kitaev phase-estimation algorithm in quantum metrology and computation, Phys. Rev. A 90, 062313 (2014).
  • Górecki et al. [2020] W. Górecki, R. Demkowicz-Dobrzański, H. M. Wiseman, and D. W. Berry, $\pi$-corrected heisenberg limit, Phys. Rev. Lett. 124, 030501 (2020).
  • Giovannetti et al. [2004] V. Giovannetti, S. Lloyd, and L. Maccone, Quantum-enhanced measurements: Beating the standard quantum limit, Science 306, 1330 (2004), https://www.science.org/doi/pdf/10.1126/science.1104149.
  • Luis [2002] A. Luis, Phase-shift amplification for precision measurements without nonclassical states, Phys. Rev. A 65, 025802 (2002).
  • Rudolph and Grover [2003] T. Rudolph and L. Grover, Quantum communication complexity of establishing a shared reference frame, Phys. Rev. Lett. 91, 217905 (2003).
  • de Burgh and Bartlett [2005] M. de Burgh and S. D. Bartlett, Quantum methods for clock synchronization: Beating the standard quantum limit without entanglement, Phys. Rev. A 72, 042301 (2005).
  • Giovannetti et al. [2006] V. Giovannetti, S. Lloyd, and L. Maccone, Quantum metrology, Phys. Rev. Lett. 96, 010401 (2006).
  • O’Loan [2009] C. J. O’Loan, Iterative phase estimation, Journal of Physics A: Mathematical and Theoretical 43, 015301 (2009).
  • Boixo and Heunen [2012] S. Boixo and C. Heunen, Entangled and sequential quantum protocols with dephasing, Phys. Rev. Lett. 108, 120402 (2012).
  • Maccone [2013] L. Maccone, Intuitive reason for the usefulness of entanglement in quantum metrology, Phys. Rev. A 88, 042109 (2013).
  • Note [1] This is not necessarily true when noise is considered [88].
  • Kitaev [1995] A. Y. Kitaev, Quantum measurements and the abelian stabilizer problem, Electron. Colloquium Comput. Complex. TR96 (1995).
  • Griffiths and Niu [1996] R. B. Griffiths and C.-S. Niu, Semiclassical fourier transform for quantum computation, Phys. Rev. Lett. 76, 3228 (1996).
  • Higgins et al. [2007] B. L. Higgins, D. W. Berry, S. D. Bartlett, H. M. Wiseman, and G. J. Pryde, Entanglement-free heisenberg-limited phase estimation, Nature 450, 393 (2007).
  • Teklu et al. [2009] B. Teklu, S. Olivares, and M. G. A. Paris, Bayesian estimation of one-parameter qubit gates, Journal of Physics B: Atomic, Molecular and Optical Physics 42, 035502 (2009).
  • Brivio et al. [2010] D. Brivio, S. Cialdi, S. Vezzoli, B. T. Gebrehiwot, M. G. Genoni, S. Olivares, and M. G. A. Paris, Experimental estimation of one-parameter qubit gates in the presence of phase diffusion, Phys. Rev. A 81, 012305 (2010).
  • Kimmel et al. [2015] S. Kimmel, G. H. Low, and T. J. Yoder, Robust calibration of a universal single-qubit gate set via robust phase estimation, Physical Review A 92, 062315 (2015).
  • Martínez-García et al. [2019] F. Martínez-García, D. Vodola, and M. Müller, Adaptive bayesian phase estimation for quantum error correcting codes, New Journal of Physics 21, 123027 (2019).
  • Wiseman and Killip [1997] H. M. Wiseman and R. B. Killip, Adaptive single-shot phase measurements: A semiclassical approach, Phys. Rev. A 56, 944 (1997).
  • Wiseman and Killip [1998] H. M. Wiseman and R. B. Killip, Adaptive single-shot phase measurements: The full quantum theory, Phys. Rev. A 57, 2169 (1998).
  • Berry and Wiseman [2000] D. W. Berry and H. M. Wiseman, Optimal states and almost optimal adaptive measurements for quantum interferometry, Phys. Rev. Lett. 85, 5098 (2000).
  • Berry et al. [2001] D. W. Berry, H. M. Wiseman, and J. K. Breslin, Optimal input states and feedback for interferometric phase estimation, Phys. Rev. A 63, 053804 (2001).
  • Mitchell [2005] M. W. Mitchell, Metrology with entangled states, in Quantum Communications and Quantum Imaging III, Vol. 5893, edited by R. E. Meyers and Y. Shih, International Society for Optics and Photonics (SPIE, 2005) p. 589310.
  • Boixo and Somma [2008] S. Boixo and R. D. Somma, Parameter estimation with mixed-state quantum computation, Phys. Rev. A 77, 052320 (2008).
  • Olivares and Paris [2009] S. Olivares and M. G. A. Paris, Bayesian estimation in homodyne interferometry, Journal of Physics B: Atomic, Molecular and Optical Physics 42, 055506 (2009).
  • Valeri et al. [2023] M. Valeri, V. Cimini, S. Piacentini, F. Ceccarelli, E. Polino, F. Hoch, G. Bizzarri, G. Corrielli, N. Spagnolo, R. Osellame, and F. Sciarrino, Experimental multiparameter quantum metrology in adaptive regime, Phys. Rev. Res. 5, 013138 (2023).
  • Smith et al. [2023] J. G. Smith, C. H. W. Barnes, and D. R. M. Arvidsson-Shukur, An adaptive Bayesian quantum algorithm for phase estimation, arXiv:2303.01517 [quant-ph] (2023).
  • Higgins et al. [2009] B. L. Higgins, D. W. Berry, S. D. Bartlett, M. W. Mitchell, H. M. Wiseman, and G. J. Pryde, Demonstrating heisenberg-limited unambiguous phase estimation without adaptive measurements, New Journal of Physics 11, 073023 (2009).
  • Berry et al. [2009] D. W. Berry, B. L. Higgins, S. D. Bartlett, M. W. Mitchell, G. J. Pryde, and H. M. Wiseman, How to perform the most accurate possible phase measurements, Phys. Rev. A 80, 052114 (2009).
  • Giovannetti et al. [2011] V. Giovannetti, S. Lloyd, and L. Maccone, Advances in quantum metrology, Nature Photonics 5, 222 (2011).
  • Shaji and Caves [2007] A. Shaji and C. M. Caves, Qubit metrology and decoherence, Phys. Rev. A 76, 032111 (2007).
  • Escher et al. [2011] B. M. Escher, R. L. de Matos Filho, and L. Davidovich, General framework for estimating the ultimate precision limit in noisy quantum-enhanced metrology, Nature Physics 7, 406 (2011).
  • Maccone and Giovannetti [2011] L. Maccone and V. Giovannetti, Beauty and the noisy beast, Nature Physics 7, 376 (2011).
  • Escher et al. [2012] B. M. Escher, L. Davidovich, N. Zagury, and R. L. de Matos Filho, Quantum metrological limits via a variational approach, Phys. Rev. Lett. 109, 190404 (2012).
  • Demkowicz-Dobrzański et al. [2012] R. Demkowicz-Dobrzański, J. Kołodyński, and M. GuŢă, The elusive heisenberg limit in quantum-enhanced metrology, Nature Communications 3, 1063 (2012).
  • Kołodyński and Demkowicz-Dobrzański [2013] J. Kołodyński and R. Demkowicz-Dobrzański, Efficient tools for quantum metrology with uncorrelated noise, New Journal of Physics 15, 073043 (2013).
  • Alipour et al. [2014] S. Alipour, M. Mehboudi, and A. T. Rezakhani, Quantum metrology in open systems: Dissipative cramér-rao bound, Phys. Rev. Lett. 112, 120405 (2014).
  • Macieszczak et al. [2014] K. Macieszczak, M. Fraas, and R. Demkowicz-Dobrzański, Bayesian quantum frequency estimation in presence of collective dephasing, New Journal of Physics 16, 113002 (2014).
  • Demkowicz-Dobrzański et al. [2017] R. Demkowicz-Dobrzański, J. Czajkowski, and P. Sekatski, Adaptive quantum metrology under general markovian noise, Phys. Rev. X 7, 041009 (2017).
  • Cole et al. [2006] J. H. Cole, A. D. Greentree, D. K. L. Oi, S. G. Schirmer, C. J. Wellard, and L. C. L. Hollenberg, Identifying a two-state hamiltonian in the presence of decoherence, Phys. Rev. A 73, 062333 (2006).
  • Maccone and De Cillis [2009] L. Maccone and G. De Cillis, Robust strategies for lossy quantum interferometry, Phys. Rev. A 79, 023812 (2009).
  • Dorner et al. [2009] U. Dorner, R. Demkowicz-Dobrzanski, B. J. Smith, J. S. Lundeen, W. Wasilewski, K. Banaszek, and I. A. Walmsley, Optimal quantum phase estimation, Phys. Rev. Lett. 102, 040403 (2009).
  • Kołodyński and Demkowicz-Dobrzański [2010] J. Kołodyński and R. Demkowicz-Dobrzański, Phase estimation without a priori phase knowledge in the presence of loss, Phys. Rev. A 82, 053804 (2010).
  • Kacprowicz et al. [2010] M. Kacprowicz, R. Demkowicz-Dobrzański, W. Wasilewski, K. Banaszek, and I. A. Walmsley, Experimental quantum-enhanced estimation of a lossy phase shift, Nature Photonics 4, 357 (2010).
  • Vidrighin et al. [2014] M. D. Vidrighin, G. Donati, M. G. Genoni, X.-M. Jin, W. S. Kolthammer, M. S. Kim, A. Datta, M. Barbieri, and I. A. Walmsley, Joint estimation of phase and phase diffusion for quantum metrology, Nature Communications 5, 3532 (2014).
  • Ángel Gutiérrez-Rubio et al. [2020] Ángel Gutiérrez-Rubio, P. Stano, and D. Loss, Optimal frequency estimation and its application to quantum dots, arXiv:2004.12049 [cond-mat.mes-hall] (2020).
  • Cappellaro [2012] P. Cappellaro, Spin-bath narrowing with adaptive parameter estimation, Phys. Rev. A 85, 030301 (2012).
  • Hayes and Berry [2014] A. J. F. Hayes and D. W. Berry, Swarm optimization for adaptive phase measurements with low visibility, Phys. Rev. A 89, 013838 (2014).
  • Bonato et al. [2016] C. Bonato, M. S. Blok, H. T. Dinani, D. W. Berry, M. L. Markham, D. J. Twitchen, and R. Hanson, Optimized quantum sensing with a single electron spin using real-time adaptive measurements, Nature Nanotechnology 11, 247 (2016).
  • Bonato and Berry [2017] C. Bonato and D. W. Berry, Adaptive tracking of a time-varying field with a quantum sensor, Phys. Rev. A 95, 052348 (2017).
  • Santagati et al. [2019] R. Santagati, A. A. Gentile, S. Knauer, S. Schmitt, S. Paesani, C. Granade, N. Wiebe, C. Osterkamp, L. P. McGuinness, J. Wang, M. G. Thompson, J. G. Rarity, F. Jelezko, and A. Laing, Magnetic-field learning using a single electronic spin in diamond with one-photon readout at room temperature, Phys. Rev. X 9, 021019 (2019).
  • Joas et al. [2021] T. Joas, S. Schmitt, R. Santagati, A. A. Gentile, C. Bonato, A. Laing, L. P. McGuinness, and F. Jelezko, Online adaptive quantum characterization of a nuclear spin, npj Quantum Information 7, 56 (2021).
  • McMichael et al. [2021] R. D. McMichael, S. Dushenko, and S. M. Blakley, Sequential bayesian experiment design for adaptive ramsey sequence measurements, Journal of Applied Physics 130, 144401 (2021), https://doi.org/10.1063/5.0055630.
  • Zohar et al. [2022] I. Zohar, Y. Romach, M. J. Arshad, N. Halay, N. Drucker, R. Stöhr, A. Denisenko, Y. Cohen, C. Bonato, and A. Finkler, Real-time frequency estimation of a qubit without single-shot-readout, arXiv:2210.05542 [quant-ph] (2022).
  • Note [2] Readout can also be improved by adaptive methods [89, 90].
  • Said et al. [2011] R. S. Said, D. W. Berry, and J. Twamley, Nanoscale magnetometry using a single-spin system in diamond, Phys. Rev. B 83, 125410 (2011).
  • Waldherr et al. [2012] G. Waldherr, J. Beck, P. Neumann, R. S. Said, M. Nitsche, M. L. Markham, D. J. Twitchen, J. Twamley, F. Jelezko, and J. Wrachtrup, High-dynamic-range magnetometry with a single nuclear spin in diamond, Nature Nanotechnology 7, 105 (2012).
  • Nusran et al. [2012] N. M. Nusran, M. U. Momeen, and M. V. G. Dutt, High-dynamic-range magnetometry with a single electronic spin in diamond, Nature Nanotechnology 7, 109 (2012).
  • Danilin et al. [2018] S. Danilin, A. V. Lebedev, A. Vepsäläinen, G. B. Lesovik, G. Blatter, and G. S. Paraoanu, Quantum-enhanced magnetometry by phase estimation algorithms with a single artificial atom, npj Quantum Information 4, 29 (2018).
  • Belliardo and Giovannetti [2020] F. Belliardo and V. Giovannetti, Achieving heisenberg scaling with maximally entangled states: An analytic upper bound for the attainable root-mean-square error, Phys. Rev. A 102, 042613 (2020).
  • Note [3] This is an example where entanglement can be used to convert temporal resources into spatial resources [11].
  • Granade et al. [2012] C. E. Granade, C. Ferrie, N. Wiebe, and D. G. Cory, Robust online hamiltonian learning, New Journal of Physics 14, 103013 (2012).
  • Wiebe and Granade [2016] N. Wiebe and C. Granade, Efficient Bayesian Phase Estimation, Phys. Rev. Lett. 117, 010503 (2016).
  • Paesani et al. [2017] S. Paesani, A. A. Gentile, R. Santagati, J. Wang, N. Wiebe, D. P. Tew, J. L. O’Brien, and M. G. Thompson, Experimental bayesian quantum phase estimation on a silicon photonic chip, Phys. Rev. Lett. 118, 100503 (2017).
  • Gebhart et al. [2021] V. Gebhart, A. Smerzi, and L. Pezzè, Bayesian Quantum Multiphase Estimation Algorithm, Phys. Rev. Appl. 16, 014035 (2021).
  • Demkowicz-Dobrzański [2011] R. Demkowicz-Dobrzański, Optimal phase estimation with arbitrary a priori knowledge, Phys. Rev. A 83, 061802 (2011).
  • Sergeevich et al. [2011] A. Sergeevich, A. Chandran, J. Combes, S. D. Bartlett, and H. M. Wiseman, Characterization of a qubit hamiltonian using adaptive measurements in a fixed basis, Phys. Rev. A 84, 052315 (2011).
  • Nielsen and Chuang [2010] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, 2010).
  • Note [4] While $\lambda_{k}<1$ describes a reduced probability for the outcome $\xi=-1$, the reverse situation can always be described by relabelling the outcomes.
  • Holevo [2011] A. Holevo, Probabilistic and Statistical Aspects of Quantum Theory, 2nd ed., Publications of the Scuola Normale Superiore (Edizioni della Normale Pisa, 2011).
  • Note [5] Even more generally, which resources are valuable depends on the setting and on what the experimenter wants to optimise; e.g. they may have a restricted number of qubits to measure, so that the number of measurements becomes the relevant resource. In this sense it is useful to have a method where the optimisation can be adjusted by the experimenter to best describe how they value their resources.
  • Note [6] The differential entropy is the entropy of a continuous random variable, but it lacks some important properties of the Shannon entropy for discrete random variables. See e.g. [91], chapter 8. In this manuscript we will usually write simply “entropy” when referring to the differential entropy.
  • Kullback and Leibler [1951] S. Kullback and R. A. Leibler, On Information and Sufficiency, The Annals of Mathematical Statistics 22, 79 (1951).
  • Kullback [1978] S. Kullback, Information Theory and Statistics (Dover Publications, Inc., Gloucester Mass., USA, 1978).
  • Arshad et al. [2022] M. J. Arshad, C. Bekker, B. Haylock, K. Skrzypczak, D. White, B. Griffiths, J. Gore, G. W. Morley, P. Salter, J. Smith, I. Zohar, A. Finkler, Y. Altmann, E. M. Gauger, and C. Bonato, Online adaptive estimation of decoherence timescales for a single qubit, arXiv:2210.06103 [quant-ph] (2022).
  • Avriel and Wilde [1966] M. Avriel and D. J. Wilde, Optimality proof for the symmetric Fibonacci search technique, Fibonacci Quarterly 4, 265 (1966).
  • van den Berg [2021] E. van den Berg, Efficient Bayesian phase estimation using mixed priors, Quantum 5, 469 (2021).
  • Note [7] Although the differential entropy is not invariant when changing the scale, the differential entropy gain is.
  • Note [8] This also works when using sharpness gain, but there is no speedup.
  • Note [9] The performance of estimation may be dependent on the system phase when starting from a uniform prior if the control phase for the first measurement is fixed (e.g. $\alpha=0$). By choosing a random control phase $\alpha\in[0,\pi]$ for the first measurement, the performance is the same for any value of the system phase. Thus, our results can also be interpreted as the performance for any particular value of the system phase assuming a random control phase is used in the first measurement.
  • Cleve et al. [1998] R. Cleve, A. Ekert, C. Macchiavello, and M. Mosca, Quantum algorithms revisited, Proceedings of the Royal Society of London A 454, 339–354 (1998).
  • Pezzè and Smerzi [2020] L. Pezzè and A. Smerzi, Heisenberg-Limited Noisy Atomic Clock Using a Hybrid Coherent and Squeezed State Protocol, Phys. Rev. Lett. 125, 210503 (2020).
  • Pezzè and Smerzi [2021] L. Pezzè and A. Smerzi, Quantum Phase Estimation Algorithm with Gaussian Spin States, PRX Quantum 2, 040301 (2021).
  • Demkowicz-Dobrzański and Maccone [2014] R. Demkowicz-Dobrzański and L. Maccone, Using entanglement against noise in quantum metrology, Phys. Rev. Lett. 113, 250801 (2014).
  • Myerson et al. [2008] A. H. Myerson, D. J. Szwer, S. C. Webster, D. T. C. Allcock, M. J. Curtis, G. Imreh, J. A. Sherman, D. N. Stacey, A. M. Steane, and D. M. Lucas, High-fidelity readout of trapped-ion qubits, Phys. Rev. Lett. 100, 200502 (2008).
  • D’Anjou et al. [2016] B. D’Anjou, L. Kuret, L. Childress, and W. A. Coish, Maximal adaptive-decision speedups in quantum-state readout, Phys. Rev. X 6, 011017 (2016).
  • Cover and Thomas [2005] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. (John Wiley & Sons, Inc., Hoboken, New Jersey, USA, 2005).
  • [92] Digital Library of Mathematical Functions, National Institute of Standards and Technology, https://dlmf.nist.gov/7.17, accessed: 19.01.2024.
  • Blair et al. [1976] J. M. Blair, C. A. Edwards, and J. H. Johnson, Rational Chebyshev Approximations for the Inverse of the Error Function, Mathematics of Computation 30, 827 (1976).
  • Note [10] This is possible since $|\beta|\leq 1$.
  • Boyadzhiev [2012] K. N. Boyadzhiev, Series with Central Binomial Coefficients, Catalan Numbers, and Harmonic Numbers, Journal of Integer Sequences 15 (2012).
  • Note [11] This is required by normalisation of $p_{s-1}(\phi)$.

Supplemental Material I

S.I $k$-value subsets

In order to simplify the optimisation procedure that is performed to choose $\alpha$ and $k$ adaptively for each experiment, it is interesting to consider procedures where we restrict $k$-values to certain subsets. The main advantage is that the optimisation can be performed in less time (without parallel computation), which could make TAPE accessible to experimental systems where the time of a single shot is shorter. Here we focus on the commonly studied subset containing only powers of two: $k\in\{2^{n}\},\,n=1,2,3,\dots$.

In the particular case when $\mathbf{T}\propto\mathbf{K}$, which is usually studied in quantum metrology, the subset with only powers of two is also interesting to study for the multi-step method; the results of section S.III show that the multi-step method is limited by the fact that it converges to local maxima when optimising the choice of $k$-value. This is because the expected knowledge gains compared are for sequences of experiments that only take approximately equal time. When $\mathbf{T}\propto\mathbf{K}$ and $k\in\{2^{n}\},\,n=1,2,3,\dots$, the gains compared by the multi-step method are for experiments that take exactly the same time, thereby avoiding the limitations of the multi-step method that occur in the general case.
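To see why the comparison becomes exact for powers of two: when $\mathbf{T}\propto\mathbf{K}$ the duration of a sequence is simply the sum of its $k$-values, and any power-of-two budget can be matched exactly by several candidate sequences. A small sketch enumerating such equal-time sequences (a hypothetical helper for illustration, not part of the implementation):

```python
def equal_time_sequences(total, max_exp):
    """All non-increasing sequences of k-values from {2**0, ..., 2**max_exp}
    whose k-values sum exactly to `total` (i.e. equal total time when the
    experiment time is proportional to k)."""
    if total == 0:
        return [[]]
    seqs = []
    for n in range(max_exp, -1, -1):
        k = 2 ** n
        if k <= total:
            # Restrict the tail to exponents <= n so each multiset appears once.
            seqs.extend([k] + rest for rest in equal_time_sequences(total - k, n))
    return seqs

# For a time budget of 4: [4], [2, 2], [2, 1, 1], [1, 1, 1, 1].
options = equal_time_sequences(4, 2)
```

The multi-step method can then compare expected gains across candidates of strictly identical cost, rather than only approximately equal cost as in the general case.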

Figure S.1: Metrology scaling without noise when only powers of $2$ are used as $k$-values. The time of all experiments in an estimation sequence is equal to the total number of applications of the unitary $U$. The phase uncertainties $\Delta\hat{\phi}$ are calculated from $10^{6}$ realisations of estimation with system phase $\phi$ chosen uniformly at random. For all methods plotted, contraction is used: when $\sqrt{V[p(\phi)]}$ is less than $\frac{\pi}{2^{14}M}$ with current magnification $M$, we perform at least one measurement using parameters $\alpha,k$ that maximise the expected sharpness gain, and then perform a contraction with $m=2$. Expected entropy gain, multi-step method: $\bullet$; hybrid method (uses gain rate): $\circ$; expected sharpness gain, multi-step method (gain rate method): $\square$ ($\star$).

We fit all results with a functional form $\Delta\hat{\phi}=A\pi N^{-\gamma}$. For the multi-step method, we fit for total estimation times from $2^{12}$ to $2^{14}$, and for the gain rate method from $2^{12}$ to $2^{18}$. The fit parameters as well as ratios to the HL after $N=2^{14}$ and $2^{18}$ are summarised in table S.1. Some results from the main text for cases where all $k$-values are used are also included in table S.1 for comparison.

Method | $A$ | $\gamma$ | ratio to HL, $N=2^{14}$ | ratio to HL, $N=2^{18}$
hybrid, all $k$ (BFS) | – | – | – | $1.394\pm 0.010$
sharp., all $k$ (BFS) | – | – | – | $1.400\pm 0.013$
hybrid, all $k$ (FS) | $1.430\pm 0.036$ | $1.0\pm 0.002$ | $1.420\pm 0.009$ | $1.429\pm 0.011$
sharp., all $k$ (FS) | $1.371\pm 0.028$ | $0.998\pm 0.002$ | $1.390\pm 0.010$ | $1.422\pm 0.015$
hybrid, $k\in\{2^{n}\}$ (BFS) | $1.304\pm 0.076$ | $0.983\pm 0.006$ | $1.738\pm 0.223$ | $1.617\pm 0.026$
sharp., $k\in\{2^{n}\}$ (BFS) | $0.452\pm 0.052$ | $0.862\pm 0.010$ | $1.851\pm 0.214$ | $2.528\pm 0.019$
multi-step, entropy, $k\in\{2^{n}\}$ | $2.131\pm 0.618$ | $1.0\pm 0.018$ | $2.128\pm 0.075$ | –
multi-step, sharp., $k\in\{2^{n}\}$ | $1.044\pm 0.007$ | $0.958\pm 0.001$ | $1.563\pm 0.010$ | –
Table S.1: Fit parameters for the methods plotted in figure S.1 using a functional form $\Delta\hat{\phi}=A\pi N^{-\gamma}$, and ratios to the HL after total estimation times of $N=2^{14}$ and $2^{18}$. The results for the hybrid and maximum sharpness gain rate methods using contraction from figure 4 are also included for comparison. In addition we include the results of gain rate methods when a brute-force search is used; these particular simulations are results from $2\times 10^{6}$ estimations with random system phase. BFS: brute-force search. FS: Fibonacci search. All results are fitted for total estimation times $\geq 2^{12}$ (except for BFS over all $k$, since we have only simulated the $N=2^{18}$ case).
Method | $t_{\mathrm{up}}$ | $t_{H}$ | $t_{S}$ | $t_{\mathrm{con}}$
hybrid, all $k$ (BFS) | $40\,(330)\,\mu s$ | $2.5\,(17)\,ms$ | $6\,(20)\,ms$ | $37\,(70)\,\mu s$
sharp., all $k$ (BFS) | $17\,(125)\,\mu s$ | – | $3.2\,(20)\,ms$ | $27\,(50)\,\mu s$
hybrid, all $k$ (FS) | $42\,(330)\,\mu s$ | $800\,(4300)\,\mu s$ | $800\,(1700)\,\mu s$ | $38\,(70)\,\mu s$
sharp., all $k$ (FS) | $18\,(120)\,\mu s$ | – | $450\,(1900)\,\mu s$ | $27\,(40)\,\mu s$
hybrid, $k\in\{2^{n}\}$ (BFS) | $90\,(700)\,\mu s$ | $250\,(1200)\,\mu s$ | $8\,(25)\,\mu s$ | $40\,(80)\,\mu s$
sharp., $k\in\{2^{n}\}$ (BFS) | $60\,(1000)\,\mu s$ | – | $8\,(40)\,\mu s$ | $45\,(100)\,\mu s$
Table S.2: Benchmarks. A summary of approximate times required for computation with CPU model: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz. All values are rough estimates from running $\sim 20$ repetitions with total estimation time $N=2^{16}$. In all simulations performed here we assume the experiment and modelled measurement probabilities are noise-free, and $\mathbf{T}\propto\mathbf{K}$. $t_{\mathrm{up}}$ is the time required for the Bayesian update. $t_{H}$ is the time required to determine the optimal values of $\alpha$ and $k$ by maximising the expected entropy gain rate. $t_{S}$ is the time required to determine the $\alpha$ and $k$ that maximise the expected sharpness gain rate. $t_{\mathrm{con}}$ is the time required for contraction (when $\sqrt{V[p(\phi)]}$ is less than $\frac{\pi}{2^{12}M}$, we perform at least one measurement using parameters $\alpha,k$ that maximise the expected sharpness gain, and then perform a contraction with $m=2$). Values represent rough averages over all shots and runs of estimation; in parentheses are maximum values observed on a single shot over all runs. Methods are listed for brute-force search (BFS) and Fibonacci search (FS) of the optimal $k$-value.

We see that gain rate methods optimising over all possible k-values reach HS, while those using the subset k\in\{2^{n}\},\,n=1,2,3,\dots do not. However, the hybrid method using this subset is very close to HS and performs only slightly worse than when all k-values are used. For methods allowing all k-values we performed contractions when the Holevo variance of the phase density, V[p(\phi)], was less than \frac{\pi}{2^{13}M}, while when using only a subset of k-values we had to use the condition V[p(\phi)]<\frac{\pi}{2^{14}M} to sufficiently suppress unwanted estimation errors. This suggests that the probability distribution of phase estimates can have larger tails when only subsets of k-values are used. This in turn requires a more cautious contraction criterion, leading to a larger number of coefficients in the series representation of p(\phi), and therefore longer computation times. If the goal of using subsets is to lower the computation time for determining the optimal k, the potential increase in computation due to the additional coefficients must therefore also be considered.

The results for the multi-step method show that, while it performs much better when the compared expected gains are for sequences requiring exactly the same time (rather than approximately the same time), it does not perform significantly better than the simpler and more versatile hybrid gain rate method. In particular, only the multi-step method optimising the entropy gain reaches HS, and with a notably larger pre-factor of 2.13 compared to the gain rate methods.

For the estimation sequences using a total time of 2^{16}, a summary of computation time benchmarks for the gain rate methods is given in table S.2. Since contraction is performed on a significant fraction of the later shots in a sequence, the times required for this computation are also listed. Here we study the cases where k\in\{2^{n}\},\,n=0,1,2,\dots, while the cases where all k-values (up to some maximum value) are searched using a series of Fibonacci searches were studied in the main text. For comparison, the case where a brute force search is performed over all k-values is also included.

We observe that the time required for the Bayesian update depends on the method used. We expect this to be related to the magnitude of the k-values chosen by a strategy, since this determines how many coefficients must be used in the Fourier representation. Optimising the rate of entropy gain tends to choose larger k-values than optimising the sharpness. When considering only a subset of possible k-values, we expect the algorithm to occasionally choose a larger value of k than if all values were considered. This then increases the maximum update time. We see that this is indeed the case when restricting to the subset k\in\{2^{n}\},\,n=0,1,2,\dots, and that the average update time is also slightly longer.

When considering the times required to determine the optimal values of \alpha and k, it is important to note that since the optimal k increases with phase knowledge, later shots (i.e. measurements) generally use higher k-values and require more time for the optimisation. When the phase knowledge is sufficient, contractions are performed that prevent the required computation per shot from increasing further. In the hybrid method, at most the first half of the total time (2^{15} for the values in table S.2) is used for shots with \alpha and k optimised using the entropy gain rate; since k tends to increase, this accounts for most of the shots in the estimation sequence. For the remaining time, \alpha and k are chosen to maximise the expected rate of sharpness gain. Due to these changes in the computation time with every shot, the times listed in table S.2 for the hybrid method cannot be directly compared with those where the expected rate of sharpness gain is maximised for all shots.

Using the same implementation and a similar processor in an experiment, the brute force optimisation over all k-values would be appropriate for single-shot times of at least \sim 10\;ms. Using a series of Fibonacci searches, as was done for the methods studied in the main text, would be appropriate for single-shot times of at least \sim 1\;ms for the hybrid method, and \sim 500\;\mu s when using only the sharpness gain. Using the subset k\in\{2^{n}\},\,n=0,1,2,\dots would be appropriate for experiments with single-shot times of at least \sim 400\;\mu s for the hybrid method, and \sim 100\;\mu s when using only the sharpness gain.

S.II Intermediate time-repetition regime

In the main text we discussed the performance of TAPE in different settings, distinguished by the relation between the time of an experiment and the number of repetitions k of the unknown phase (coherent evolution). In particular we considered two extreme cases: \mathbf{T}\propto\mathbf{K}, which is the usual situation in quantum metrology, and t=1\,\forall\,k, which is practically relevant for experiments where the measurement time is several orders of magnitude greater than the time of coherent evolution. Here we consider an intermediate case where the time required for state preparation and measurement (SPAM) is 100 times greater than the time required for a single application of the unknown phase (k=1). In addition we assume a system with decoherence as described in the main text, i.e. perfectly prepared |+\rangle states, U=\mathrm{e}^{-iZ\phi/2} with Z the Pauli-Z operator, and each application of the unknown phase U is followed by a dephasing channel K_{0}=\sqrt{\frac{1+\eta}{2}}\openone, K_{1}=\sqrt{\frac{1-\eta}{2}}Z with \eta=0.995.

Figure S.2: Phase uncertainty of estimates (10^{6} realisations of estimation) obtained by applying TAPE in the case that the time required for SPAM is 100 times greater than the time required for a single application of the unknown phase, and in the presence of dephasing (\eta=0.995). In particular, the time on the x-axis of the plot is related to the number of applications k of the unknown phase by t=(k+100)/101. The correct dephasing rate is included in the model of the estimators. Expected entropy gain: \bullet, expected sharpness gain: \square, hybrid method: \circ.

The results of 10^{6} realisations of estimation using the gain rate method with the entropy gain, the sharpness gain, and the hybrid method are plotted in figure S.2. For times shorter than twice the SPAM time the phase uncertainty decreases little, because only one measurement can be performed. For times slightly more than twice the SPAM time the situation is similar to the case t=1\,\forall\,k discussed in the main text, since applying large values of k does not significantly impact the experiment time unless k\gg 100. In the large-k limit we approach the usual situation in quantum metrology, \mathbf{T}\propto\mathbf{K}; however, since in this case k is limited by decoherence, we never reach this limit. All three of the plotted methods perform similarly. Initially this is consistent with the results of figure 5 for short times. In the large-k limit the situation becomes similar to that of figure 6 for longer times, where all three methods perform similarly when decoherence is included in the model of the estimator. Thus, the results in the limiting cases of the relation between \mathbf{T} and \mathbf{K} provide an indication of each method's performance in the intermediate case.

S.III Multi-step gain method

Supposing we have a prior probability density p_{s-1}(\phi) at step s-1 of a sequence of measurements, we calculate the maximum expected knowledge (sharpness or entropy) gain for step s as \Delta G(s)\equiv\max_{\alpha_{s},k_{s}}\Delta G(\alpha_{s},k_{s}) for a single measurement, using (7) or (9) for the sharpness or entropy gain, respectively. We calculate the maximum expected gain for two measurements as

\Delta G(s,s+1) \equiv \Delta G(s) + \sum_{\xi}\Pi_{\xi}(\alpha_{s+1}^{(\xi)},k_{s+1}^{(\xi)})\,\Delta G(s+1)\,.

Since \Delta G(\alpha_{s+1},k_{s+1}) depends on p_{s}(\phi|\xi;\alpha_{s},k_{s}), it implicitly depends on the outcome \xi. We therefore write \alpha_{s+1}^{(\xi)}, k_{s+1}^{(\xi)} to denote the values optimised conditioned on a particular measurement outcome at step s. We can similarly calculate the maximum expected gain for three measurements as

\Delta G(s,s+2) \equiv \Delta G(s) + \sum_{\xi}\Pi_{\xi}(\alpha_{s+1}^{(\xi)},k_{s+1}^{(\xi)})\,\Delta G_{\xi}(s+1,s+2)\,,

where the subscript \xi in \Delta G_{\xi}(s+1,s+2) is to remind us that this quantity depends on the measurement outcome at step s. Applying this recursively, we can write the maximum expected gain for j measurements as

\Delta G(s,s-1+j) = \Delta G(s) + \sum_{\xi}\Pi_{\xi}(\alpha_{s+1}^{(\xi)},k_{s+1}^{(\xi)})\,\Delta G_{\xi}(s+1,s-1+j)\,. \qquad (S.1)

Since the number of possible outcomes for the next j measurements is 2^{j}, the computation of the multi-step gain generally grows exponentially with the number of measurements. For this reason, algorithms S.1 and S.2 are written to compare measurement sequences of approximately equal time that contain at most 5 measurements. In general the gain calculations (7), (9) do not require all coefficients in (2), allowing a more efficient computation of the multi-step gain.
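The recursion (S.1) can be illustrated numerically. The sketch below is a minimal toy version, not the paper's implementation: it assumes a grid representation of the phase density, ideal two-outcome probabilities P_{\xi}=\frac{1}{2}(1+\xi\cos(\alpha-k\phi)), and the sharpness S[p]=|\langle\mathrm{e}^{i\phi}\rangle| as the knowledge measure; the grid sizes and parameter grids are arbitrary choices.

```python
import numpy as np

# Toy grid representation of the phase density (the paper uses a Fourier
# representation; a grid keeps this sketch short).
PHI = np.linspace(0.0, 2 * np.pi, 256, endpoint=False)

def sharpness(p):
    # S[p] = |<e^{i phi}>| for a density normalised so that mean(p) = 1.
    return abs(np.mean(p * np.exp(1j * PHI)))

def update(p, xi, alpha, k):
    # Bayesian update for outcome xi in {+1,-1} with ideal probabilities
    # P_xi = (1 + xi*cos(alpha - k*phi)) / 2; returns (posterior, Pi_xi).
    lik = 0.5 * (1.0 + xi * np.cos(alpha - k * PHI))
    w = np.mean(p * lik)
    return p * lik / w, w

def single_step(p, alphas, ks):
    # Best expected one-step gain Delta G(s) and its maximising (alpha, k).
    best = (-np.inf, None, None)
    for k in ks:
        for a in alphas:
            g = sum(w * sharpness(q)
                    for q, w in (update(p, xi, a, k) for xi in (1, -1)))
            g -= sharpness(p)
            if g > best[0]:
                best = (g, a, k)
    return best

def multi_step_gain(p, j, alphas, ks):
    # Recursion (S.1): the optimal single-step gain plus the
    # outcome-averaged optimal gain of the remaining j-1 steps.
    if j == 0:
        return 0.0
    g, a, k = single_step(p, alphas, ks)
    for xi in (1, -1):
        q, w = update(p, xi, a, k)
        g += w * multi_step_gain(q, j - 1, alphas, ks)
    return g
```

Because each step branches over both outcomes, the cost of `multi_step_gain` grows as 2^{j}, which is the exponential growth noted above.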

Algorithm S.1 Calculate intervals
Input: \mathbf{T}=\{t_{1},t_{2},\dots,t_{N}\},\,t_{n}\in\mathbb{R}, sorted in increasing order.
function Intervals(\mathbf{T})
  Let \mathbf{I} be a list of pairs (a,b), a,b\in\mathbb{N}.
  \triangleright \mathbf{I}.push((a,b)) denotes adding a pair (a,b) to the back of the list.
  \mathbf{I} is initially empty.
  n\leftarrow 1
  while n<N do
    m\leftarrow n
    n\leftarrow n+1
    while n\leq N and t_{n}/t_{m}<8/7 do
      n\leftarrow n+1
    end while
    \mathbf{I}.push((m,n-1))
  end while
  if n-1<N then
    \mathbf{I}.push((n,N))
  end if
  return \mathbf{I}
end function
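A direct transcription of algorithm S.1 into Python (the 1-based index pairs follow the pseudocode):

```python
def intervals(T):
    """Group 1-based indices of the sorted times T into maximal runs
    whose times lie within a factor 8/7 of the run's first time
    (algorithm S.1). Returns a list of (first, last) index pairs."""
    N = len(T)
    I = []
    n = 1
    while n < N:
        m = n
        n += 1
        while n <= N and T[n - 1] / T[m - 1] < 8 / 7:
            n += 1
        I.append((m, n - 1))
    # The last index may not yet belong to any interval.
    if n - 1 < N:
        I.append((n, N))
    return I
```

For T \propto K with k=1,\dots,8 every consecutive ratio is at least 8/7, so every index becomes its own interval; closely spaced times are grouped.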
Algorithm S.2 Compare two k-values
Requires: function gain(k,j), where k is the number of applications of the unitary U and j is the number of steps. gain(k,j) returns the expected knowledge gain for performing j steps (measurements) with U^{k}, i.e. the result of equation (S.1) with k_{s}=k_{s+1}=\dots=k_{s-1+j}=k.
Inputs:
\mathbf{K}=\{k_{1},k_{2},\dots,k_{N}\},\,k_{n}\in\mathbb{N}, \mathbf{T}=\{t_{1},t_{2},\dots,t_{N}\},\,t_{n}\in\mathbb{R},
n_{1},n_{2}\in\mathbb{N}, n_{1}<n_{2}, t_{n_{2}}/t_{n_{1}}<32/7.
function Compare(\mathbf{K}, \mathbf{T}, n_{1}, n_{2})
  r\leftarrow t_{n_{2}}/t_{n_{1}}
  if r<8/7 then j_{1}\leftarrow 1, j_{2}\leftarrow 1
  else if r<24/17 then j_{1}\leftarrow 4, j_{2}\leftarrow 3
  else if r<12/7 then j_{1}\leftarrow 3, j_{2}\leftarrow 2
  else if r<20/9 then j_{1}\leftarrow 2, j_{2}\leftarrow 1
  else if r<30/11 then j_{1}\leftarrow 5, j_{2}\leftarrow 2
  else if r<24/7 then j_{1}\leftarrow 3, j_{2}\leftarrow 1
  else if r<32/7 then j_{1}\leftarrow 4, j_{2}\leftarrow 1
  end if
  g_{1}\leftarrow gain(k_{n_{1}},j_{1}), g_{2}\leftarrow gain(k_{n_{2}},j_{2})
  if g_{1}>g_{2} then return n_{1}
  else return n_{2}
  end if
end function
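The ratio thresholds in algorithm S.2 can be collected into a lookup table. The helper below (an illustrative name, not from the paper) returns the step counts (j_{1},j_{2}) so that j_{1}t_{n_{1}}\approx j_{2}t_{n_{2}}:

```python
from fractions import Fraction

def step_counts(r):
    """Map the time ratio r = t_{n2}/t_{n1} to the step counts (j1, j2)
    compared by algorithm S.2; the compared sequences then have
    approximately equal total time, since j1/j2 approximates r."""
    table = [
        (Fraction(8, 7),   (1, 1)),
        (Fraction(24, 17), (4, 3)),
        (Fraction(12, 7),  (3, 2)),
        (Fraction(20, 9),  (2, 1)),
        (Fraction(30, 11), (5, 2)),
        (Fraction(24, 7),  (3, 1)),
        (Fraction(32, 7),  (4, 1)),
    ]
    for bound, js in table:
        if r < bound:
            return js
    raise ValueError("ratio out of range (must be < 32/7)")
```

Each threshold is roughly the midpoint between consecutive achievable ratios j_{1}/j_{2}, which is why the comparison only holds the sequence times approximately, not exactly, equal.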
Algorithm S.3 Search interval
Inputs:
\mathbf{K}=\{k_{1},k_{2},\dots,k_{N}\},\,k_{n}\in\mathbb{N}, \mathbf{T}=\{t_{1},t_{2},\dots,t_{N}\},\,t_{n}\in\mathbb{R},
n_{1},n_{2}\in\mathbb{N}, n_{1}\leq n_{2}
function searchInterval(\mathbf{K}, \mathbf{T}, n_{1}, n_{2})
  if n_{2}==n_{1} then
    if n_{1}==N then return n_{1}
    return compare(\mathbf{K},\mathbf{T},n_{1},n_{1}+1)
  else \triangleright Brute-force search for the maximum gain in the interval.
    max\leftarrow gain(k_{n_{1}},1), n\leftarrow n_{1}
    for m\leftarrow n_{1}+1,n_{2} do
      g\leftarrow gain(k_{m},1)
      if g>max then
        max\leftarrow g, n\leftarrow m
      end if
    end for
    return n
  end if
end function
Algorithm S.4 Search up
Requires: function findInterval(\mathbf{I},n), where \mathbf{I} is a list of intervals (see algorithm S.1) and n\in\mathbb{N}, that returns the index i of the unique interval (n_{1},n_{2}) in \mathbf{I} such that n_{1}\leq n\leq n_{2}.
Inputs:
\mathbf{K}=\{k_{1},k_{2},\dots,k_{N}\},\,k_{n}\in\mathbb{N}, \mathbf{T}=\{t_{1},t_{2},\dots,t_{N}\},\,t_{n}\in\mathbb{R},
intervals \mathbf{I} (see algorithm S.1), N_{I} (the number of intervals in \mathbf{I}), n\in\mathbb{N}
function searchUp(\mathbf{K}, \mathbf{T}, \mathbf{I}, n)
  i\leftarrow findInterval(\mathbf{I},n)
  while true do
    (n_{1},n_{2})\leftarrow\mathbf{I}[i]
    n\leftarrow searchInterval(\mathbf{K},\mathbf{T},n_{1},n_{2})
    if n_{2}==n_{1} then
      if n==n_{1} then return n
    else
      if n<n_{2} then return n
      if i==N_{I} then return n
      m\leftarrow compare(\mathbf{K},\mathbf{T},n_{1},n_{2}+1)
      if m==n_{1} then return n_{1}
    end if
    i\leftarrow i+1
  end while
end function
Algorithm S.5 Multi-step gain method
Inputs:
\mathbf{K}=\{k_{1},k_{2},\dots,k_{N}\},\,k_{n}\in\mathbb{N}, \mathbf{T}=\{t_{1},t_{2},\dots,t_{N}\},\,t_{n}\in\mathbb{R}
function multiStepMethod(\mathbf{K},\mathbf{T})
  \mathbf{I}\leftarrow intervals(\mathbf{T})
  n\leftarrow searchUp(\mathbf{K},\mathbf{T},\mathbf{I},1)
  return k_{n}
end function
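Algorithms S.1-S.5 can be combined into a single runnable sketch. The gain function below is a hypothetical stand-in with a single maximum (near k\sim 25), used only to exercise the control flow; everything else follows the pseudocode, with 1-based indices as above.

```python
import math

def gain(k, j):
    # Hypothetical stand-in for the j-step expected gain of measuring with
    # U^k: concave in j, rising then decaying in k (NOT the paper's gain).
    return math.sqrt(j) * (1 - math.exp(-k / 15)) * math.exp(-k / 65)

def intervals(T):
    # Algorithm S.1: group indices whose times lie within a factor 8/7.
    N, I, n = len(T), [], 1
    while n < N:
        m, n = n, n + 1
        while n <= N and T[n - 1] / T[m - 1] < 8 / 7:
            n += 1
        I.append((m, n - 1))
    if n - 1 < N:
        I.append((n, N))
    return I

# Ratio thresholds of algorithm S.2 mapping r = t_{n2}/t_{n1} to (j1, j2).
RATIOS = [(8/7, (1, 1)), (24/17, (4, 3)), (12/7, (3, 2)), (20/9, (2, 1)),
          (30/11, (5, 2)), (24/7, (3, 1)), (32/7, (4, 1))]

def compare(K, T, n1, n2):
    # Algorithm S.2: compare k-values via sequences of nearly equal time.
    r = T[n2 - 1] / T[n1 - 1]
    j1, j2 = next(js for bound, js in RATIOS if r < bound)
    return n1 if gain(K[n1 - 1], j1) > gain(K[n2 - 1], j2) else n2

def search_interval(K, T, n1, n2):
    # Algorithm S.3: brute-force single-step search inside one interval.
    if n1 == n2:
        return n1 if n1 == len(K) else compare(K, T, n1, n1 + 1)
    return max(range(n1, n2 + 1), key=lambda m: gain(K[m - 1], 1))

def search_up(K, T, I, n):
    # Algorithm S.4: climb through intervals until the gain stops improving.
    i = next(j for j, (a, b) in enumerate(I) if a <= n <= b)
    while True:
        n1, n2 = I[i]
        n = search_interval(K, T, n1, n2)
        if n1 == n2:
            if n == n1:
                return n
        else:
            if n < n2 or i == len(I) - 1:
                return n
            if compare(K, T, n1, n2 + 1) == n1:
                return n1
        i += 1

def multi_step_method(K, T):
    # Algorithm S.5: always start the search from the smallest k.
    return K[search_up(K, T, intervals(T), 1) - 1]

K = list(range(1, 201))
T = [float(k) for k in K]   # the usual case T proportional to K
best = multi_step_method(K, T)
```

As discussed below, the local search stops at a k well below the toy gain's optimum, illustrating the tendency of algorithm S.5 to return a lower k than the global maximum of the differential gains.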

To understand the effective functions maximised by the multi-step method (in the noise-free case), we calculate the multi-shot gains for sequences of approximately equal time as detailed in algorithm S.2. We assume the usual experiment times in sequential metrology experiments, i.e. \mathbf{T}\propto\mathbf{K}, and perform estimation using a total time of \sum_{s}k_{s}\approx 768 starting from a uniform prior. In figure S.3 we then plot the cumulative difference in the expected gain for the next measurement between consecutive k-values:

\sum_{m=2}^{k}\big(\mathrm{gain}(m,j_{m})-\mathrm{gain}(m-1,j_{m-1})\big)\,, \qquad (S.2)

where the values of j_{m} and j_{m-1} for all m=2,\dots,k are determined as in algorithm S.2. The gain difference calculated in (S.2) is adjusted between intervals with multiple k-values by comparing the first k-value of consecutive intervals (as in algorithm S.4). The values plotted for k=1 are set to zero. In this way the plotted values show an effective function that is maximised (locally) by algorithm S.5.

Figure S.3: Differential gains for the multi-step method. Upper (lower) plot: differential multi-shot gains for the expected sharpness (entropy) gain. The expected sharpness becomes flat for larger values of k since measurements with such large k-values would lead to multi-peaked densities for the phase knowledge, giving no increase in sharpness. Since multi-peaked densities still lead to a change in entropy, the differential gains in the lower plot continue to decrease for larger k-values. The intervals determined by algorithm S.1 are indicated by the colours. Black points correspond to intervals containing only one k-value. Intervals with more than one k-value are coloured, with a change in colour indicating the next interval.

We see that the differential gains for both the expected sharpness and entropy gains show a global maximum around k\sim 45. However, the differences in multi-shot gains oscillate because the times of the multi-step sequences are only approximately equal. Due to these oscillations, algorithm S.4 tends to stop at a local maximum. In algorithm S.5 we always start the search at k=1, so the value of k returned is typically much lower than the optimal value suggested by the plotted differential gains. Using the gain rate method to determine the choice of k for this measurement gives 50 (60) to maximise the expected sharpness (entropy) gain rate. More efficient methods to find the global maximum of the plotted functions could potentially also lead to accurate estimation procedures.

S.IV Computational complexity of contractions

In this section we derive an upper bound on the number of coefficients \Gamma needed for the Fourier series of p(\phi) when using the contraction method, under the assumption that p(\phi) is Gaussian. In particular, since p(\phi) is periodic, we assume that it has the form

p(\phi) = \sum_{n=-\infty}^{\infty}\mathrm{e}^{-\frac{1}{2}n^{2}\sigma^{2}}\mathrm{e}^{in(\phi-\mu)} = 1+2\sum_{n=1}^{\infty}\mathrm{e}^{-\frac{1}{2}n^{2}\sigma^{2}}\cos(n(\phi-\mu))\,,

where \mu is the mean phase, and the Holevo variance is V[p(\phi)]=\sigma^{2}. We have written the sum to \infty here; below we consider the number of coefficients \Gamma that would be used in an estimation sequence.

In the following we analyse the case where we perform contractions with m=2, so that after performing c contractions we go from a representation p(\phi) on an interval of size 2\pi to a representation q(\theta) on an interval of size 2\pi/2^{c}. Since p(\phi) is a measure of the probability of the system phase we are trying to estimate, we quantify the probability of the system phase being outside the interval represented by q(\theta) after a single contraction by

\varepsilon = \int_{\mu+\frac{\pi}{2}}^{\mu+\frac{3\pi}{2}}\frac{p(\phi)}{2\pi}\,d\phi
= \frac{1}{2\pi}\int_{\frac{\pi}{2}}^{\frac{3\pi}{2}}\left(1+2\sum_{n=1}^{\infty}\mathrm{e}^{-\frac{1}{2}n^{2}\sigma^{2}}\cos(n\phi)\right)d\phi
= \frac{1}{2}+\frac{1}{\pi}\sum_{n=1}^{\infty}\mathrm{e}^{-\frac{1}{2}n^{2}\sigma^{2}}\int_{\frac{\pi}{2}}^{\frac{3\pi}{2}}\cos(n\phi)\,d\phi
= \frac{1}{2}+\frac{2}{\pi}\sum_{n=0}^{\infty}\left(\frac{\mathrm{e}^{-\frac{1}{2}\sigma^{2}(4n+3)^{2}}}{4n+3}-\frac{\mathrm{e}^{-\frac{1}{2}\sigma^{2}(4n+1)^{2}}}{4n+1}\right)\,,

where in the second line we have used the fact that the result is independent of the mean phase \mu, so that we can, without loss of generality, set \mu to zero. The last line follows from solving the integral and relabelling the index of the sum to include only the odd values of n in the third line (the integral vanishes for even values of n). Although we have not found a closed form for the infinite series, we find that \varepsilon is very close to the corresponding quantity for a non-periodic Gaussian,

\varepsilon \approx 1+\mathrm{erf}\left(\frac{-\pi}{2\sqrt{2}\sigma}\right)\,,

when \sigma is less than \sim\pi/2.
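This agreement can be checked numerically against the series derived above; the number of series terms and the tolerance below are arbitrary choices:

```python
import math

def eps_periodic(sigma, terms=200):
    # Tail probability from the wrapped-Gaussian series derived above.
    s = 0.0
    for n in range(terms):
        s += (math.exp(-0.5 * sigma**2 * (4*n + 3)**2) / (4*n + 3)
              - math.exp(-0.5 * sigma**2 * (4*n + 1)**2) / (4*n + 1))
    return 0.5 + (2 / math.pi) * s

def eps_gauss(sigma):
    # Non-periodic Gaussian tail: 1 + erf(-pi / (2 sqrt(2) sigma)).
    return 1 + math.erf(-math.pi / (2 * math.sqrt(2) * sigma))
```

For example, at \sigma=0.5 both expressions give \varepsilon\approx 1.7\times 10^{-3}, and the difference stays well below 10^{-3} for \sigma up to about 1.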

If we specify an upper bound \varepsilon on the desired probability that the system phase cannot be represented after performing a contraction, then we obtain an upper bound on the standard deviation \sqrt{V[p(\phi)]}=\sigma before we should perform a contraction:

\sigma < \frac{\pi}{2\sqrt{2}\,\mathrm{erf}^{-1}(1-\varepsilon)} = \frac{\pi}{2\sqrt{2}\,\mathrm{erfc}^{-1}(\varepsilon)}\,, \qquad (S.3)

where \mathrm{erfc}^{-1} is the inverse complementary error function. As mentioned in the main text, we expect from Fourier analysis that the number of coefficients \Gamma is inversely proportional to \sqrt{V[p(\phi)]} (and therefore to \Delta\hat{\phi}). We can see this in the case of a Gaussian by assuming that any numerical representation has a finite machine precision. Let \delta be the smallest value that can be represented numerically. Then for the highest-order coefficient that can be represented, we have \mathrm{e}^{-\Gamma^{2}\sigma^{2}/2}=\delta, so

\Gamma = \frac{\sqrt{2\ln(\delta^{-1})}}{\sigma}\,. \qquad (S.4)

Substituting (S.3) into (S.4), we obtain a relation between the probability of an error, \varepsilon, and the number of coefficients:

\Gamma > \frac{4}{\pi}\sqrt{\ln(\delta^{-1})}\,\mathrm{erfc}^{-1}(\varepsilon)\,. \qquad (S.5)
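Since \mathrm{erfc}^{-1} is not in the Python standard library, a simple bisection on math.erfc suffices to evaluate (S.3) and (S.5); the function names below are illustrative:

```python
import math

def erfc_inv(y, lo=1e-12, hi=10.0):
    # Inverse complementary error function by bisection: erfc is
    # strictly decreasing on [0, inf), so the root is bracketed.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid) > y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sigma_max(eps):
    # (S.3): largest standard deviation before a contraction should be
    # performed, for contraction error probability eps.
    return math.pi / (2 * math.sqrt(2) * erfc_inv(eps))

def gamma_min(eps, delta=1e-16):
    # (S.5): corresponding lower bound on the number of coefficients.
    return 4 / math.pi * math.sqrt(math.log(1 / delta)) * erfc_inv(eps)
```

As expected, demanding a smaller error probability lowers the allowed \sigma and raises the required number of coefficients.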

If we now have a p(\phi) that is Gaussian with standard deviation \sigma^{*} given by (S.3), and that requires a representation to machine precision using \Gamma^{*} coefficients as given by (S.5), then the probability of an error after performing one contraction with m=2 will be less than \varepsilon. At this point we would have a density q(\theta) requiring only \Gamma^{*}/2 coefficients, and we could continue estimation until the standard deviation of q(\theta) reaches \sigma^{*}, at which point that of the p(\phi) represented by this contraction would be \sigma^{*}/2. Now q(\theta) would have \Gamma^{*} coefficients and we would need to perform another contraction to avoid increasing \Gamma above \Gamma^{*}. If we perform a second contraction, the total probability that either the first or the second contraction leads to an error is \varepsilon+\varepsilon(1-\varepsilon)<2\varepsilon. To keep the total error probability below the original value of \varepsilon we would then want to use slightly more coefficients to begin with, so that \varepsilon is half as large.

More generally, if we perform a total of c contractions, the total error probability \epsilon is less than c\varepsilon, and from (S.5) we should use a number of coefficients

\Gamma^{*} \equiv \left\lceil\frac{4}{\pi}\sqrt{\ln(\delta^{-1})}\,\mathrm{erfc}^{-1}\left(\frac{\epsilon}{c}\right)\right\rceil\,.

A representation without contractions would require \Gamma=2^{c}\Gamma^{*} coefficients. Let \Gamma_{1} be the number of coefficients required to keep the error probability below \epsilon for a single contraction:

\Gamma_{1} \equiv \left\lceil\frac{4}{\pi}\sqrt{\ln(\delta^{-1})}\,\mathrm{erfc}^{-1}(\epsilon)\right\rceil\,.

We have \Gamma=2^{c}\Gamma^{*}\geq 2^{c}\Gamma_{1}, so c\leq\log_{2}(\Gamma/\Gamma_{1}), and we can write an upper bound on the number of coefficients \Gamma^{*} when using contractions as a function of the number of coefficients \Gamma that would be needed without contractions:

\Gamma^{*} \leq \frac{4}{\pi}\sqrt{\ln(\delta^{-1})}\,\mathrm{erfc}^{-1}\left(\frac{\epsilon}{\log_{2}(\Gamma/\Gamma_{1})}\right)\,. \qquad (S.6)

In figure S.4 we plot the upper bound (S.6) with \epsilon=10^{-10} and \delta=10^{-16}. We see that it scales more slowly than \log(\Gamma), suggesting that we can use a number of coefficients \Gamma^{*}=\mathcal{O}(\log(\Gamma)) while keeping the total probability that an error occurs below \epsilon.

Figure S.4: Plot of the upper bound (S.6) with \epsilon=10^{-10} and \delta=10^{-16}.

To determine the asymptotic complexity of \Gamma^{*} in terms of \Gamma, we make use of an asymptotic expansion of \mathrm{erfc}^{-1}(x) for x\rightarrow 0 [92, 93]:

\mathrm{erfc}^{-1}(x) \sim u^{-1/2}\left(1+u\left(a_{2}u+a_{3}u^{2}+\dots\right)\right)\,, \qquad (S.7)

where

u = -2/\ln\!\big(\pi x^{2}\ln(1/x)\big)\,,
a_{2} = v/8\,, \qquad a_{3} = -(v^{2}+6v-6)/32\,,
v = \ln(\ln(1/x))-2+\ln(\pi)\,.

As x\rightarrow 0, v\rightarrow\infty and u\rightarrow 0, but each term a_{n+1}u^{n} in (S.7) will be \mathcal{O}\left((vu)^{n}\right). Since

\lim_{x\rightarrow 0}vu = 0\,,

we have \mathrm{erfc}^{-1}(x)\rightarrow u^{-1/2} as x\rightarrow 0. Going further, as x\rightarrow 0 we have u^{-1/2}<\sqrt{\ln(1/x)}. Since in (S.6) the argument is x=\epsilon/\log_{2}(\Gamma/\Gamma_{1}), this gives \Gamma^{*}=\mathcal{O}\left(\sqrt{\log(\log(\Gamma))}\right).
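A quick numerical check of the leading-order behaviour (the bisection helper and the 5% tolerance are arbitrary implementation choices):

```python
import math

def erfc_inv(y, lo=1e-12, hi=30.0):
    # erfc^{-1} by bisection (erfc is strictly decreasing on [0, inf)).
    for _ in range(300):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if math.erfc(mid) > y else (lo, mid)
    return 0.5 * (lo + hi)

def leading(x):
    # Leading term u^{-1/2} of the asymptotic expansion (S.7).
    u = -2 / math.log(math.pi * x * x * math.log(1 / x))
    return u ** -0.5
```

The ratio \mathrm{erfc}^{-1}(x)/u^{-1/2} approaches 1 slowly as x\rightarrow 0 (it is already within about 1% at x=10^{-4}).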

We note that the bound (S.6) is of practical interest for determining the value of \sigma that should be reached before performing a contraction with a certain error probability only if p(\phi) is Gaussian. Since we have not shown this for any of the estimation methods studied in the main text, it cannot be used there; in those cases the value of \sigma to be reached before contracting is determined empirically from simulation. The purpose of the bound (S.6) is mainly to determine the potential of the contraction method to reduce computational complexity, rather than to provide a practical way of calculating the required value of \sigma.

S.V Expected entropy gain

In this section we show the relation between the expected entropy gain (8) and the Kullback-Leibler (KL) divergence, and we derive equation (9). Starting from the definition of the expected entropy gain (8), we can rewrite the entropy of the posterior (we sometimes write p_{s}(\phi)=p_{s}(\phi|\xi;\alpha,k) for short) as

H[p_{s}(\phi|\xi;\alpha,k)] = -\int_{0}^{2\pi}\frac{d\phi}{2\pi}p_{s}(\phi)\ln\left(\frac{P_{\xi}(\alpha,k\phi)}{\Pi_{\xi}(\alpha,k)}\right)-\int_{0}^{2\pi}\frac{d\phi}{2\pi}p_{s}(\phi)\ln\left(\frac{p_{s-1}(\phi)}{2\pi}\right)
= -\int_{0}^{2\pi}\frac{d\phi}{2\pi}p_{s}(\phi)\ln\left(\frac{p_{s}(\phi)}{p_{s-1}(\phi)}\right)+H(p_{s},\,p_{s-1})
= -D_{KL}(p_{s}\|p_{s-1})+H(p_{s},\,p_{s-1})\,,

where D_{KL}(p_{s}\|p_{s-1}) is the KL divergence of the posterior from the prior, and H(p_{s},\,p_{s-1}) is the cross entropy. Noting that

\sum_{\xi}\Pi_{\xi}(\alpha,k)\Big(H[p_{s-1}(\phi)]-H\big(p_{s}(\phi|\xi;\alpha,k),\,p_{s-1}(\phi)\big)\Big) = 0\,,

we see that

\Delta_{s}H(\alpha,k) = \sum_{\xi}\Pi_{\xi}(\alpha,k)\,D_{KL}(p_{s}\|p_{s-1})\,,

i.e. the expected entropy gain is equal to the expected KL divergence. We can rewrite this as

\Delta_{s}H(\alpha,k) = \sum_{\xi}\int_{0}^{2\pi}\frac{d\phi}{2\pi}P_{\xi}(\alpha,k\phi)\,p_{s-1}(\phi)\ln\big(P_{\xi}(\alpha,k\phi)\big)-\sum_{\xi}\Pi_{\xi}(\alpha,k)\ln\big(\Pi_{\xi}(\alpha,k)\big)\,.

We would like to rewrite the first term, containing the integral, in a form that can be computed more efficiently.

S.V.1 Solving the integral

We have

\sum_{\xi}\int_{0}^{2\pi}\frac{d\phi}{2\pi}P_{\xi}(\alpha,k\phi)\,p_{s-1}(\phi)\ln\big(P_{\xi}(\alpha,k\phi)\big)
= -\ln(2)+\sum_{\xi}\int_{0}^{2\pi}\frac{d\phi}{2\pi}\,p_{s-1}(\phi)\,\frac{1}{2}\Big(1+\xi\big((1-\lambda)+\lambda\zeta\cos(\alpha-k\phi)\big)\Big)\ln\Big(1+\xi\big((1-\lambda)+\lambda\zeta\cos(\alpha-k\phi)\big)\Big)\,.

Let \gamma=\zeta\cos(\alpha-k\phi) and \beta=(1-\lambda)+\lambda\gamma. Then we have

\sum_{\xi}\int_{0}^{2\pi}\frac{d\phi}{2\pi}P_{\xi}(\alpha,k\phi)\,p_{s-1}(\phi)\ln\big(P_{\xi}(\alpha,k\phi)\big)
= -\ln(2)+\int_{0}^{2\pi}\frac{d\phi}{2\pi}\,p_{s-1}(\phi)\,\frac{1}{2}\bigg[\Big(\ln(1+\beta)+\ln(1-\beta)\Big)+\beta\Big(\ln(1+\beta)-\ln(1-\beta)\Big)\bigg]\,.

We can work out the terms in the last line using the Taylor series of \ln(1\pm\beta) (possible since |\beta|\leq 1):

\ln(1+\beta)+\ln(1-\beta) = -2\sum_{n=1}^{\infty}\frac{\beta^{2n}}{2n}\,,
\ln(1+\beta)-\ln(1-\beta) = 2\sum_{n=1}^{\infty}\frac{\beta^{2n-1}}{2n-1}\,.

So we have

\Big(\ln(1+\beta)+\ln(1-\beta)\Big)+\beta\Big(\ln(1+\beta)-\ln(1-\beta)\Big) = 2\sum_{n=1}^{\infty}\frac{\beta^{2n}}{2n(2n-1)}\,.
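This series identity is easy to verify numerically for |\beta|<1 (the number of terms and the tolerance are arbitrary choices):

```python
import math

def lhs(beta):
    # Left-hand side: the combination of logarithms derived above.
    return (math.log(1 + beta) + math.log(1 - beta)
            + beta * (math.log(1 + beta) - math.log(1 - beta)))

def rhs(beta, terms=400):
    # Right-hand side: the even-power series with coefficients 1/(2n(2n-1)).
    return 2 * sum(beta**(2 * n) / (2 * n * (2 * n - 1))
                   for n in range(1, terms + 1))
```

The agreement holds to high precision well inside the unit interval.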

Using this result, we have

\sum_{\xi}\int_{0}^{2\pi}\frac{d\phi}{2\pi}P_{\xi}(\alpha,k\phi)\,p_{s-1}(\phi)\ln\big(P_{\xi}(\alpha,k\phi)\big) = -\ln(2)+\sum_{n=1}^{\infty}\frac{1}{2n(2n-1)}\int_{0}^{2\pi}\frac{d\phi}{2\pi}\,p_{s-1}(\phi)\,\beta^{2n}\,,

and

\beta^{2n} = \big((1-\lambda)+\lambda\gamma\big)^{2n} = \sum_{u=0}^{2n}\binom{2n}{u}(1-\lambda)^{2n-u}\lambda^{u}\gamma^{u}\,,

so

\sum_{\xi}\int_{0}^{2\pi}\frac{d\phi}{2\pi}P_{\xi}(\alpha,k\phi)\,p_{s-1}(\phi)\ln\big(P_{\xi}(\alpha,k\phi)\big)
= -\ln(2)+\sum_{n=1}^{\infty}\frac{1}{2n(2n-1)}\sum_{u=0}^{2n}\binom{2n}{u}(1-\lambda)^{2n-u}\lambda^{u}\zeta^{u}\int_{0}^{2\pi}\frac{d\phi}{2\pi}\,p_{s-1}(\phi)\cos^{u}(\alpha-k\phi)\,.

We can rewrite the integral term as

\int_{0}^{2\pi}\frac{d\phi}{2\pi}\,p_{s-1}(\phi)\cos^{u}(\alpha-k\phi) = \frac{1}{2^{u}}\sum_{q=0}^{u}\binom{u}{q}\mathrm{e}^{i\alpha(u-2q)}c_{k(u-2q)}^{(s-1)}\,,

so

\sum_{\xi}\int_{0}^{2\pi}\frac{d\phi}{2\pi}P_{\xi}(\alpha,k\phi)\,p_{s-1}(\phi)\ln\big(P_{\xi}(\alpha,k\phi)\big)
= -\ln(2)+\sum_{n=1}^{\infty}\frac{1}{2n(2n-1)}\sum_{u=0}^{2n}\binom{2n}{u}(1-\lambda)^{2n-u}\left(\frac{\lambda\zeta}{2}\right)^{u}\sum_{q=0}^{u}\binom{u}{q}\mathrm{e}^{i\alpha(u-2q)}c_{k(u-2q)}^{(s-1)}\,. \qquad (S.8)

That solves the integral, but we are left with infinite series that are not very practical for computation. Ideally we would like an expression that is a sum over the coefficients c_{n}^{(s-1)} of the prior, but without any other infinite series.

S.V.2 Simplifying and solving the series

Using some results from [95], we find the following series solutions:

\sum_{n=1}^{\infty}\frac{1}{n(2n-1)}\binom{2n}{n}x^{n} = \frac{8x}{1+\sqrt{1-4x}}+2\ln(1+\sqrt{1-4x})-2\ln(2)\,, \qquad (S.9)
\sum_{n=m}^{\infty}\frac{1}{n(2n-1)}\binom{2n}{n-m}x^{n} = \left(\frac{1+2m\sqrt{1-4x}}{m(4m^{2}-1)}\right)\left(\frac{4x}{(1+\sqrt{1-4x})^{2}}\right)^{m}\,, \qquad (S.10)
\sum_{v=m}^{\infty}\frac{1}{2(v-1)(2v-1)}\binom{2v-1}{v-m}x^{2v-1} = \frac{x}{2}\left(\frac{1+(2m-1)\sqrt{1-4x^{2}}}{(m-1)m(2m-1)(1+\sqrt{1-4x^{2}})}\right)\left(\frac{4x^{2}}{(1+\sqrt{1-4x^{2}})^{2}}\right)^{m-1}\,, \qquad (S.11)
\sum_{v=2}^{\infty}\frac{1}{2(v-1)(2v-1)}\binom{2v-1}{v-1}x^{2v-1} = \frac{x}{2}\left(1+2\ln(2)-2\ln(1+\sqrt{1-4x^{2}})-\frac{2}{1+\sqrt{1-4x^{2}}}\right)\,, \qquad (S.12)

which will be used in the derivation below.
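As a quick numerical sanity check (not part of the derivation), the closed forms (S.9) and (S.10) can be compared against truncated partial sums; the helper names below are illustrative.

```python
import math

def lhs_S9(x, terms=200):
    # Truncated sum_{n>=1} C(2n, n) x^n / (n (2n-1)); converges for 0 <= x < 1/4
    return sum(math.comb(2*n, n)*x**n/(n*(2*n - 1)) for n in range(1, terms + 1))

def rhs_S9(x):
    r = math.sqrt(1 - 4*x)
    return 8*x/(1 + r) + 2*math.log(1 + r) - 2*math.log(2)

def lhs_S10(x, m, terms=200):
    # Truncated sum_{n>=m} C(2n, n-m) x^n / (n (2n-1))
    return sum(math.comb(2*n, n - m)*x**n/(n*(2*n - 1)) for n in range(m, m + terms))

def rhs_S10(x, m):
    r = math.sqrt(1 - 4*x)
    return (1 + 2*m*r)/(m*(4*m**2 - 1)) * (4*x/(1 + r)**2)**m

assert abs(lhs_S9(0.2) - rhs_S9(0.2)) < 1e-12
assert abs(lhs_S10(0.2, 3) - rhs_S10(0.2, 3)) < 1e-12
```

The checks for (S.11) and (S.12) follow the same pattern.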

We rewrite the sum over q in (S.8) separately for the cases where u is even or odd. If u is even, we have

q=0u(uq)eiα(u2q)ck(u2q)(s1)\displaystyle\sum_{q=0}^{u}\begin{pmatrix}u\\ q\end{pmatrix}\mathrm{e}^{i\alpha(u-2q)}c_{k(u-2q)}^{(s-1)} =m=u/2u/2(uu2m)eiα2mck2m(s1).\displaystyle=\sum_{m=-u/2}^{u/2}\begin{pmatrix}u\\ \frac{u}{2}-m\end{pmatrix}\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\,.

For every term in the sum with index m > 0 there is a corresponding term with index −m, and there is a single term with m = 0. So we can rewrite the sum as

(uu2)c0(s1)+m=1u/2((uu2m)eiα2mck2m(s1)+(uu2+m)eiα2mck2m(s1)).\displaystyle\begin{pmatrix}u\\ \frac{u}{2}\end{pmatrix}c_{0}^{(s-1)}+\sum_{m=1}^{u/2}\Bigg{(}\begin{pmatrix}u\\ \frac{u}{2}-m\end{pmatrix}\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}+\begin{pmatrix}u\\ \frac{u}{2}+m\end{pmatrix}\mathrm{e}^{-i\alpha 2m}c_{-k2m}^{(s-1)}\Bigg{)}\,.

We can use the property of binomial coefficients:

(nk)\displaystyle\begin{pmatrix}n\\ k\end{pmatrix} =(nnk), so we have (uu2m)=(uu2+m).\displaystyle=\begin{pmatrix}n\\ n-k\end{pmatrix}\text{, so we have }\begin{pmatrix}u\\ \frac{u}{2}-m\end{pmatrix}=\begin{pmatrix}u\\ \frac{u}{2}+m\end{pmatrix}\,.

Also noting that c_0^{(s-1)} = 1 (as required by normalisation of p_{s-1}(ϕ)) and that c_{-n}^{(s-1)} is the complex conjugate of c_n^{(s-1)} because p_{s-1}(ϕ) is real, we can rewrite the sum as

(uu2)+2m=1u/2(uu2m)𝔢{eiα2mck2m(s1)}.\displaystyle\begin{pmatrix}u\\ \frac{u}{2}\end{pmatrix}+2\sum_{m=1}^{u/2}\begin{pmatrix}u\\ \frac{u}{2}-m\end{pmatrix}\mathfrak{Re}\Big{\{}\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\Big{\}}\,.

Similarly, if u is odd, we can write

q=0u(uq)eiα(u2q)ck(u2q)(s1)\displaystyle\sum_{q=0}^{u}\begin{pmatrix}u\\ q\end{pmatrix}\mathrm{e}^{i\alpha(u-2q)}c_{k(u-2q)}^{(s-1)} =2m=1(u+1)/2(uu+12m)𝔢{eiα(2m1)ck(2m1)(s1)}.\displaystyle=2\sum_{m=1}^{(u+1)/2}\begin{pmatrix}u\\ \frac{u+1}{2}-m\end{pmatrix}\mathfrak{Re}\Big{\{}\mathrm{e}^{i\alpha(2m-1)}c_{k(2m-1)}^{(s-1)}\Big{\}}\,.
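The even- and odd-u reductions above rely on the conjugate symmetry c_{-n}^{(s-1)} = conj(c_n^{(s-1)}), which holds because p_{s-1}(ϕ) is real. As a numerical sanity check (not part of the derivation), both reductions can be verified with illustrative random coefficients:

```python
import cmath
import math
import random

random.seed(0)
alpha, k = 0.7, 2

# Illustrative prior Fourier coefficients with c_0 = 1 and c_{-n} = conj(c_n),
# as required for a real probability density p_{s-1}(phi)
c = {0: 1.0}
for n in range(1, 31):
    c[n] = complex(random.uniform(-0.5, 0.5), random.uniform(-0.5, 0.5))
    c[-n] = c[n].conjugate()

def q_sum(u):
    # Left-hand side: sum_{q=0}^{u} C(u, q) e^{i alpha (u - 2q)} c_{k(u - 2q)}
    return sum(math.comb(u, q)*cmath.exp(1j*alpha*(u - 2*q))*c.get(k*(u - 2*q), 0)
               for q in range(u + 1))

def even_form(u):
    # Reduction for even u
    return math.comb(u, u//2)*c[0] + 2*sum(
        math.comb(u, u//2 - m)*(cmath.exp(2j*m*alpha)*c.get(2*m*k, 0)).real
        for m in range(1, u//2 + 1))

def odd_form(u):
    # Reduction for odd u
    return 2*sum(
        math.comb(u, (u + 1)//2 - m)*(cmath.exp(1j*(2*m - 1)*alpha)*c.get((2*m - 1)*k, 0)).real
        for m in range(1, (u + 1)//2 + 1))

assert abs(q_sum(4) - even_form(4)) < 1e-12
assert abs(q_sum(5) - odd_form(5)) < 1e-12
```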

Now let

f(u)q=0u(uq)eiα(u2q)ck(u2q)(s1),\displaystyle f(u)\equiv\sum_{q=0}^{u}\begin{pmatrix}u\\ q\end{pmatrix}\mathrm{e}^{i\alpha(u-2q)}c_{k(u-2q)}^{(s-1)}\,,

to simplify the notation. Splitting the sum over u into even and odd parts, we have

u=02n(2nu)(1λ)2nu(λζ2)uf(u)\displaystyle\sum_{u=0}^{2n}\begin{pmatrix}2n\\ u\end{pmatrix}(1-\lambda)^{2n-u}\left(\frac{\lambda\zeta}{2}\right)^{u}f(u) =u=0n(2n2u)(1λ)2(nu)(λζ2)2uf(2u)\displaystyle=\sum_{u=0}^{n}\begin{pmatrix}2n\\ 2u\end{pmatrix}(1-\lambda)^{2(n-u)}\left(\frac{\lambda\zeta}{2}\right)^{2u}f(2u)
+v=1n(2n2v1)(1λ)2(nv)+1(λζ2)2v1f(2v1).\displaystyle+\sum_{v=1}^{n}\begin{pmatrix}2n\\ 2v-1\end{pmatrix}(1-\lambda)^{2(n-v)+1}\left(\frac{\lambda\zeta}{2}\right)^{2v-1}f(2v-1)\,.

Now we consider the even part of the sum:

u=0n(2n2u)(1λ)2(nu)(λζ2)2uf(2u)\displaystyle\sum_{u=0}^{n}\begin{pmatrix}2n\\ 2u\end{pmatrix}(1-\lambda)^{2(n-u)}\left(\frac{\lambda\zeta}{2}\right)^{2u}f(2u)
=(1λ)2n+u=1n(2n2u)(1λ)2(nu)(λζ2)2u(2uu)\displaystyle=(1-\lambda)^{2n}+\sum_{u=1}^{n}\begin{pmatrix}2n\\ 2u\end{pmatrix}(1-\lambda)^{2(n-u)}\left(\frac{\lambda\zeta}{2}\right)^{2u}\begin{pmatrix}2u\\ u\end{pmatrix}
+2u=1n(2n2u)(1λ)2(nu)(λζ2)2um=1u(2uum)𝔢{eiα2mck2m(s1)}.\displaystyle+2\sum_{u=1}^{n}\begin{pmatrix}2n\\ 2u\end{pmatrix}(1-\lambda)^{2(n-u)}\left(\frac{\lambda\zeta}{2}\right)^{2u}\sum_{m=1}^{u}\begin{pmatrix}2u\\ u-m\end{pmatrix}\mathfrak{Re}\Big{\{}\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\Big{\}}\,.

For the odd part we have

v=1n(2n2v1)(1λ)2(nv)+1(λζ2)2v1f(2v1)\displaystyle\sum_{v=1}^{n}\begin{pmatrix}2n\\ 2v-1\end{pmatrix}(1-\lambda)^{2(n-v)+1}\left(\frac{\lambda\zeta}{2}\right)^{2v-1}f(2v-1)
=2v=1n(2n2v1)(1λ)2(nv)+1(λζ2)2v1m=1v(2v1vm)𝔢{eiα(2m1)ck(2m1)(s1)}.\displaystyle=2\sum_{v=1}^{n}\begin{pmatrix}2n\\ 2v-1\end{pmatrix}(1-\lambda)^{2(n-v)+1}\left(\frac{\lambda\zeta}{2}\right)^{2v-1}\sum_{m=1}^{v}\begin{pmatrix}2v-1\\ v-m\end{pmatrix}\mathfrak{Re}\Big{\{}\mathrm{e}^{i\alpha(2m-1)}c_{k(2m-1)}^{(s-1)}\Big{\}}\,.

Combining everything, we have

ξ02πdϕ2πPξ(α,kϕ)ps1(ϕ)ln(Pξ(α,kϕ))\displaystyle\sum_{\xi}\int_{0}^{2\pi}\frac{d\phi}{2\pi}P_{\xi}(\alpha,k\phi)p_{s-1}(\phi)\ln\big{(}P_{\xi}(\alpha,k\phi)\big{)}
=ln(2)+n=1(12n(2n1))[(1λ)2n+u=1n(2n2u)(1λ)2(nu)(λζ2)2u(2uu)\displaystyle=-\ln(2)+\sum_{n=1}^{\infty}\left(\frac{1}{2n(2n-1)}\right)\Bigg{[}(1-\lambda)^{2n}+\sum_{u=1}^{n}\begin{pmatrix}2n\\ 2u\end{pmatrix}(1-\lambda)^{2(n-u)}\left(\frac{\lambda\zeta}{2}\right)^{2u}\begin{pmatrix}2u\\ u\end{pmatrix}
+2u=1n(2n2u)(1λ)2(nu)(λζ2)2um=1u(2uum)𝔢{eiα2mck2m(s1)}\displaystyle+2\sum_{u=1}^{n}\begin{pmatrix}2n\\ 2u\end{pmatrix}(1-\lambda)^{2(n-u)}\left(\frac{\lambda\zeta}{2}\right)^{2u}\sum_{m=1}^{u}\begin{pmatrix}2u\\ u-m\end{pmatrix}\mathfrak{Re}\Big{\{}\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\Big{\}}
+2v=1n(2n2v1)(1λ)2(nv)+1(λζ2)2v1m=1v(2v1vm)𝔢{eiα(2m1)ck(2m1)(s1)}].\displaystyle+2\sum_{v=1}^{n}\begin{pmatrix}2n\\ 2v-1\end{pmatrix}(1-\lambda)^{2(n-v)+1}\left(\frac{\lambda\zeta}{2}\right)^{2v-1}\sum_{m=1}^{v}\begin{pmatrix}2v-1\\ v-m\end{pmatrix}\mathfrak{Re}\Big{\{}\mathrm{e}^{i\alpha(2m-1)}c_{k(2m-1)}^{(s-1)}\Big{\}}\Bigg{]}\,. (S.13)

There are now four terms in the infinite sum over n. The first is

n=1(12n(2n1))(1λ)2n\displaystyle\sum_{n=1}^{\infty}\left(\frac{1}{2n(2n-1)}\right)(1-\lambda)^{2n} =12ln(1(1λ)2)+(1λ)tanh1(1λ).\displaystyle=\frac{1}{2}\ln\left(1-(1-\lambda)^{2}\right)+(1-\lambda)\tanh^{-1}(1-\lambda)\,. (S.14)
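This closed form follows from the partial-fraction split 1/(2n(2n−1)) = 1/(2n−1) − 1/(2n), and can be checked numerically (a sketch with illustrative names):

```python
import math

def series_S14(lam, terms=400):
    # Truncation of sum_{n>=1} (1 - lam)^(2n) / (2n (2n-1))
    y = 1 - lam
    return sum(y**(2*n)/(2*n*(2*n - 1)) for n in range(1, terms + 1))

def closed_S14(lam):
    # Right-hand side of (S.14)
    y = 1 - lam
    return 0.5*math.log(1 - y**2) + y*math.atanh(y)

assert abs(series_S14(0.3) - closed_S14(0.3)) < 1e-12
```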

The second is

n=1(12n(2n1))u=1n(2n2u)(1λ)2(nu)(λζ2)2u(2uu)\displaystyle\sum_{n=1}^{\infty}\left(\frac{1}{2n(2n-1)}\right)\sum_{u=1}^{n}\begin{pmatrix}2n\\ 2u\end{pmatrix}(1-\lambda)^{2(n-u)}\left(\frac{\lambda\zeta}{2}\right)^{2u}\begin{pmatrix}2u\\ u\end{pmatrix}
=m=1(λζ2)2m(2mm)(1λ)2mn=m(12n(2n1))(2n2m)(1λ)2n.\displaystyle=\sum_{m=1}^{\infty}\left(\frac{\lambda\zeta}{2}\right)^{2m}\begin{pmatrix}2m\\ m\end{pmatrix}(1-\lambda)^{-2m}\sum_{n=m}^{\infty}\left(\frac{1}{2n(2n-1)}\right)\begin{pmatrix}2n\\ 2m\end{pmatrix}(1-\lambda)^{2n}\,.

Using the solution to the infinite series:

n=mx2n2n(2n1)(2n2m)\displaystyle\sum_{n=m}^{\infty}\frac{x^{2n}}{2n(2n-1)}\begin{pmatrix}2n\\ 2m\end{pmatrix} =x2m4m(2m1)((1+x)(1+x)2m+(1x)(1x)2m),\displaystyle=\frac{x^{2m}}{4m(2m-1)}\Bigg{(}\frac{(1+x)}{(1+x)^{2m}}+\frac{(1-x)}{(1-x)^{2m}}\Bigg{)}\,, (S.15)

we can rewrite the second term as

(2λ)4m=1(1m(2m1))(2mm)(λζ2(2λ))2m\displaystyle\frac{(2-\lambda)}{4}\sum_{m=1}^{\infty}\left(\frac{1}{m(2m-1)}\right)\begin{pmatrix}2m\\ m\end{pmatrix}\left(\frac{\lambda\zeta}{2(2-\lambda)}\right)^{2m}
+λ4m=1(1m(2m1))(2mm)(ζ2)2m.\displaystyle+\frac{\lambda}{4}\sum_{m=1}^{\infty}\left(\frac{1}{m(2m-1)}\right)\begin{pmatrix}2m\\ m\end{pmatrix}\left(\frac{\zeta}{2}\right)^{2m}\,.

Using the result (S.9) we find this is equal to

(1λ2)F(δ)+λ2F(ζ)ln(2),\displaystyle\left(1-\frac{\lambda}{2}\right)F(\delta)+\frac{\lambda}{2}F(\zeta)-\ln(2)\,, (S.16)

where we defined

δλζ2λ,F(x)x2g1(x)+ln(g1(x)),g1(x)1+g0(x),g0(x)1x2.\displaystyle\delta\equiv\frac{\lambda\zeta}{2-\lambda},\qquad F(x)\equiv\frac{x^{2}}{g_{1}(x)}+\ln\left(g_{1}(x)\right),\qquad g_{1}(x)\equiv 1+g_{0}(x),\qquad g_{0}(x)\equiv\sqrt{1-x^{2}}\,. (S.17)
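The chain of steps leading to (S.16) can be verified numerically against the original double series (the second term in the n-sum of (S.13)); the code below is a sketch with illustrative names:

```python
import math

def g0(x): return math.sqrt(1 - x**2)
def F(x):
    # F as defined in (S.17)
    g1 = 1 + g0(x)
    return x**2/g1 + math.log(g1)

def second_term_direct(lam, zeta, N=250):
    # Truncated double series: the second term in the sum over n in (S.13)
    return sum(
        1/(2*n*(2*n - 1)) * sum(
            math.comb(2*n, 2*u)*(1 - lam)**(2*(n - u))
            * (lam*zeta/2)**(2*u)*math.comb(2*u, u)
            for u in range(1, n + 1))
        for n in range(1, N + 1))

def second_term_closed(lam, zeta):
    # Closed form (S.16)
    delta = lam*zeta/(2 - lam)
    return (1 - lam/2)*F(delta) + (lam/2)*F(zeta) - math.log(2)

assert abs(second_term_direct(0.8, 0.9) - second_term_closed(0.8, 0.9)) < 1e-9
```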

Next we consider the third term in the infinite sum in (S.13):

n=1(1n(2n1))u=1n(2n2u)(1λ)2(nu)(λζ2)2um=1u(2uum)𝔢{eiα2mck2m(s1)}\displaystyle\sum_{n=1}^{\infty}\left(\frac{1}{n(2n-1)}\right)\sum_{u=1}^{n}\begin{pmatrix}2n\\ 2u\end{pmatrix}(1-\lambda)^{2(n-u)}\left(\frac{\lambda\zeta}{2}\right)^{2u}\sum_{m=1}^{u}\begin{pmatrix}2u\\ u-m\end{pmatrix}\mathfrak{Re}\Big{\{}\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\Big{\}}
=m=1𝔢{eiα2mck2m(s1)}u=m(λζ2)2u(2uum)(1λ)2un=u(1n(2n1))(2n2u)(1λ)2n.\displaystyle=\sum_{m=1}^{\infty}\mathfrak{Re}\Big{\{}\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\Big{\}}\sum_{u=m}^{\infty}\left(\frac{\lambda\zeta}{2}\right)^{2u}\begin{pmatrix}2u\\ u-m\end{pmatrix}(1-\lambda)^{-2u}\sum_{n=u}^{\infty}\left(\frac{1}{n(2n-1)}\right)\begin{pmatrix}2n\\ 2u\end{pmatrix}(1-\lambda)^{2n}\,.

The last series over n is twice the series given in (S.15), so we have

u=m(λζ2)2u(2uum)(1λ)2un=u(1n(2n1))(2n2u)(1λ)2n\displaystyle\sum_{u=m}^{\infty}\left(\frac{\lambda\zeta}{2}\right)^{2u}\begin{pmatrix}2u\\ u-m\end{pmatrix}(1-\lambda)^{-2u}\sum_{n=u}^{\infty}\left(\frac{1}{n(2n-1)}\right)\begin{pmatrix}2n\\ 2u\end{pmatrix}(1-\lambda)^{2n}
=(2λ)2u=m(1u(2u1))(2uum)(λζ2(2λ))2u\displaystyle=\frac{(2-\lambda)}{2}\sum_{u=m}^{\infty}\left(\frac{1}{u(2u-1)}\right)\begin{pmatrix}2u\\ u-m\end{pmatrix}\left(\frac{\lambda\zeta}{2(2-\lambda)}\right)^{2u}
+λ2u=m(1u(2u1))(2uum)(ζ2)2u.\displaystyle+\frac{\lambda}{2}\sum_{u=m}^{\infty}\left(\frac{1}{u(2u-1)}\right)\begin{pmatrix}2u\\ u-m\end{pmatrix}\left(\frac{\zeta}{2}\right)^{2u}\,.

Using (S.10) we find this is equal to

(1λ2)G(δ,m)+λ2G(ζ,m),\displaystyle\left(1-\frac{\lambda}{2}\right)G(\delta,m)+\frac{\lambda}{2}G(\zeta,m)\,, (S.18)

where we defined

G(x,m)(1+2mg0(x)m(4m21))(xg1(x))2m.\displaystyle G(x,m)\equiv\left(\frac{1+2mg_{0}(x)}{m(4m^{2}-1)}\right)\left(\frac{x}{g_{1}(x)}\right)^{2m}\,. (S.19)
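For a fixed m, the result (S.18) can likewise be checked against the double series it replaces (here with the factor (1−λ)^{−2u} folded into the inner sum to avoid floating-point overflow); a numerical sketch:

```python
import math

def g0(x): return math.sqrt(1 - x**2)
def G(x, m):
    # G as defined in (S.19)
    g1 = 1 + g0(x)
    return (1 + 2*m*g0(x))/(m*(4*m**2 - 1)) * (x/g1)**(2*m)

def third_inner_direct(lam, zeta, m, U=120, N=120):
    # Truncated double series multiplying Re{e^{i 2 m alpha} c_{2mk}} in the third term
    total = 0.0
    for u in range(m, U):
        inner = sum((1 - lam)**(2*(n - u))*math.comb(2*n, 2*u)/(n*(2*n - 1))
                    for n in range(u, u + N))
        total += (lam*zeta/2)**(2*u)*math.comb(2*u, u - m)*inner
    return total

def third_inner_closed(lam, zeta, m):
    # Closed form (S.18)
    delta = lam*zeta/(2 - lam)
    return (1 - lam/2)*G(delta, m) + (lam/2)*G(zeta, m)

assert abs(third_inner_direct(0.8, 0.9, 2) - third_inner_closed(0.8, 0.9, 2)) < 1e-9
```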

Lastly, we consider the fourth term in the infinite sum in (S.13):

n=1(1n(2n1))v=1n(2n2v1)(1λ)2(nv)+1(λζ2)2v1m=1v(2v1vm)𝔢{eiα(2m1)ck(2m1)(s1)}\displaystyle\sum_{n=1}^{\infty}\left(\frac{1}{n(2n-1)}\right)\sum_{v=1}^{n}\begin{pmatrix}2n\\ 2v-1\end{pmatrix}(1-\lambda)^{2(n-v)+1}\left(\frac{\lambda\zeta}{2}\right)^{2v-1}\sum_{m=1}^{v}\begin{pmatrix}2v-1\\ v-m\end{pmatrix}\mathfrak{Re}\Big{\{}\mathrm{e}^{i\alpha(2m-1)}c_{k(2m-1)}^{(s-1)}\Big{\}}
=m=1𝔢{eiα(2m1)ck(2m1)(s1)}v=m(2v1vm)(λζ2)2v1\displaystyle=\sum_{m=1}^{\infty}\mathfrak{Re}\Big{\{}\mathrm{e}^{i\alpha(2m-1)}c_{k(2m-1)}^{(s-1)}\Big{\}}\sum_{v=m}^{\infty}\begin{pmatrix}2v-1\\ v-m\end{pmatrix}\left(\frac{\lambda\zeta}{2}\right)^{2v-1}
(1λ)(2v1)n=v(1n(2n1))(2n2v1)(1λ)2n.\displaystyle\qquad(1-\lambda)^{-(2v-1)}\sum_{n=v}^{\infty}\left(\frac{1}{n(2n-1)}\right)\begin{pmatrix}2n\\ 2v-1\end{pmatrix}(1-\lambda)^{2n}\,.

Here the last series over n is given by the following expression for m > 1 (the m = 1 case is treated separately below):

n=m(x2nn(2n1))(2n2m1)\displaystyle\sum_{n=m}^{\infty}\left(\frac{x^{2n}}{n(2n-1)}\right)\begin{pmatrix}2n\\ 2m-1\end{pmatrix} =(x2m12(m1)(2m1))((1x)(1x)2m1(1+x)(1+x)2m1),\displaystyle=\left(\frac{x^{2m-1}}{2(m-1)(2m-1)}\right)\left(\frac{(1-x)}{(1-x)^{2m-1}}-\frac{(1+x)}{(1+x)^{2m-1}}\right)\,,

and for m = 1

n=m(x2nn(2n1))(2n2m1)=n=1x2nn(2n1)2n=2n=1x2n(2n1)=2xtanh1(x),\displaystyle\sum_{n=m}^{\infty}\left(\frac{x^{2n}}{n(2n-1)}\right)\begin{pmatrix}2n\\ 2m-1\end{pmatrix}=\sum_{n=1}^{\infty}\frac{x^{2n}}{n(2n-1)}2n=2\sum_{n=1}^{\infty}\frac{x^{2n}}{(2n-1)}=2x\tanh^{-1}(x)\,,

so we have (for m > 1)

v=m(2v1vm)(λζ2)2v1(1λ)(2v1)n=v(1n(2n1))(2n2v1)(1λ)2n\displaystyle\sum_{v=m}^{\infty}\begin{pmatrix}2v-1\\ v-m\end{pmatrix}\left(\frac{\lambda\zeta}{2}\right)^{2v-1}(1-\lambda)^{-(2v-1)}\sum_{n=v}^{\infty}\left(\frac{1}{n(2n-1)}\right)\begin{pmatrix}2n\\ 2v-1\end{pmatrix}(1-\lambda)^{2n}
=λv=m(12(v1)(2v1))(2v1vm)(ζ2)2v1\displaystyle=\lambda\sum_{v=m}^{\infty}\left(\frac{1}{2(v-1)(2v-1)}\right)\begin{pmatrix}2v-1\\ v-m\end{pmatrix}\left(\frac{\zeta}{2}\right)^{2v-1}
(2λ)v=m(12(v1)(2v1))(2v1vm)(λζ2(2λ))2v1.\displaystyle-(2-\lambda)\sum_{v=m}^{\infty}\left(\frac{1}{2(v-1)(2v-1)}\right)\begin{pmatrix}2v-1\\ v-m\end{pmatrix}\left(\frac{\lambda\zeta}{2(2-\lambda)}\right)^{2v-1}\,.

Using (S.11) we find this is equal to

λζ4(J(ζ,m)J(δ,m)),\displaystyle\frac{\lambda\zeta}{4}\left(J(\zeta,m)-J(\delta,m)\right)\,, (S.20)

where we defined

J(x,m)(1+(2m1)g0(x)(m1)m(2m1)g1(x))(xg1(x))2(m1).\displaystyle J(x,m)\equiv\left(\frac{1+(2m-1)g_{0}(x)}{(m-1)m(2m-1)g_{1}(x)}\right)\left(\frac{x}{g_{1}(x)}\right)^{2(m-1)}\,. (S.21)
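Both closed forms used above for the series over n containing C(2n, 2m−1) (the m > 1 expression and the m = 1 case 2x·artanh(x)) can be checked numerically; a sketch with illustrative names:

```python
import math

def lhs_odd(x, m, terms=300):
    # Truncated sum_{n>=m} x^(2n) C(2n, 2m-1) / (n (2n-1))
    return sum(x**(2*n)*math.comb(2*n, 2*m - 1)/(n*(2*n - 1))
               for n in range(m, m + terms))

def rhs_odd(x, m):
    # Closed forms given above, for m = 1 and m > 1
    if m == 1:
        return 2*x*math.atanh(x)
    return x**(2*m - 1)/(2*(m - 1)*(2*m - 1)) * ((1 - x)**(2 - 2*m) - (1 + x)**(2 - 2*m))

assert abs(lhs_odd(0.5, 3) - rhs_odd(0.5, 3)) < 1e-10
assert abs(lhs_odd(0.5, 1) - rhs_odd(0.5, 1)) < 1e-10
```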

Finally, for the m = 1 case in the fourth term in (S.13), we have

v=m(2v1vm)(λζ2)2v1(1λ)(2v1)n=v(1n(2n1))(2n2v1)(1λ)2n\displaystyle\sum_{v=m}^{\infty}\begin{pmatrix}2v-1\\ v-m\end{pmatrix}\left(\frac{\lambda\zeta}{2}\right)^{2v-1}(1-\lambda)^{-(2v-1)}\sum_{n=v}^{\infty}\left(\frac{1}{n(2n-1)}\right)\begin{pmatrix}2n\\ 2v-1\end{pmatrix}(1-\lambda)^{2n}
=λζtanh1(1λ)\displaystyle=\lambda\zeta\tanh^{-1}(1-\lambda)
+v=2(12(v1)(2v1))(2v1v1)(λ(ζ2)2v1(2λ)(λζ2(2λ))2v1).\displaystyle+\sum_{v=2}^{\infty}\left(\frac{1}{2(v-1)(2v-1)}\right)\begin{pmatrix}2v-1\\ v-1\end{pmatrix}\left(\lambda\left(\frac{\zeta}{2}\right)^{2v-1}-(2-\lambda)\left(\frac{\lambda\zeta}{2(2-\lambda)}\right)^{2v-1}\right)\,.

Using (S.12) we can rewrite this as

λζ(tanh1(1λ)+L(δ)L(ζ)2),\displaystyle\lambda\zeta\left(\tanh^{-1}(1-\lambda)+\frac{L(\delta)-L(\zeta)}{2}\right)\,, (S.22)

where we defined

L(x)1g1(x)+ln(g1(x))\displaystyle L(x)\equiv\frac{1}{g_{1}(x)}+\ln\left(g_{1}(x)\right) (S.23)

S.V.3 Summary

Substituting the results and definitions (S.14), (S.16), (S.17), (S.18), (S.19), (S.20), (S.21), (S.22), and (S.23) into (S.13), we have

ξ02πdϕ2πPξ(α,kϕ)ps1(ϕ)ln(Pξ(α,kϕ))\displaystyle\sum_{\xi}\int_{0}^{2\pi}\frac{d\phi}{2\pi}P_{\xi}(\alpha,k\phi)p_{s-1}(\phi)\ln\big{(}P_{\xi}(\alpha,k\phi)\big{)}
=ln(2)+12ln(1(1λ)2)+(1λ)tanh1(1λ)+(1λ2)F(δ)+λ2F(ζ)ln(2)\displaystyle=-\ln(2)+\frac{1}{2}\ln\left(1-(1-\lambda)^{2}\right)+(1-\lambda)\tanh^{-1}(1-\lambda)+\left(1-\frac{\lambda}{2}\right)F(\delta)+\frac{\lambda}{2}F(\zeta)-\ln(2)
+m=1Am𝔢{eiα2mck2m(s1)}+Bm𝔢{eiα(2m1)ck(2m1)(s1)},\displaystyle+\sum_{m=1}^{\infty}A_{m}\mathfrak{Re}\left\{\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\right\}+B_{m}\mathfrak{Re}\left\{\mathrm{e}^{i\alpha(2m-1)}c_{k(2m-1)}^{(s-1)}\right\}\,,

where

Am\displaystyle A_{m} =(1λ2)G(δ,m)+λ2G(ζ,m)\displaystyle=\left(1-\frac{\lambda}{2}\right)G(\delta,m)+\frac{\lambda}{2}G(\zeta,m)
Bm\displaystyle B_{m} ={λζ2(ln(1λ2)+L(δ)ln(λ2)L(ζ)),if m=1λζ4(J(ζ,m)J(δ,m)),otherwise\displaystyle=\begin{cases}\frac{\lambda\zeta}{2}\left(\ln\left(1-\frac{\lambda}{2}\right)+L(\delta)-\ln\left(\frac{\lambda}{2}\right)-L(\zeta)\right)\,,&\text{if }m=1\\ \frac{\lambda\zeta}{4}\left(J(\zeta,m)-J(\delta,m)\right)\,,&\text{otherwise}\end{cases}

and the expected entropy gain (9) is

ΔsH(α,k)\displaystyle\Delta_{s}H(\alpha,\,k) =2ln(2)+12ln(1(1λ)2)+(1λ)2ln(2λλ)+(1λ2)F(δ)+λ2F(ζ)\displaystyle=-2\ln(2)+\frac{1}{2}\ln\left(1-(1-\lambda)^{2}\right)+\frac{(1-\lambda)}{2}\ln\left(\frac{2-\lambda}{\lambda}\right)+\left(1-\frac{\lambda}{2}\right)F(\delta)+\frac{\lambda}{2}F(\zeta)
+m=1(Am𝔢{eiα2mck2m(s1)}+Bm𝔢{eiα(2m1)ck(2m1)(s1)})ξΠξ(α,k)ln(Πξ(α,k)).\displaystyle+\sum_{m=1}^{\infty}\Big{(}A_{m}\mathfrak{Re}\left\{\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\right\}+B_{m}\mathfrak{Re}\left\{\mathrm{e}^{i\alpha(2m-1)}c_{k(2m-1)}^{(s-1)}\right\}\Big{)}-\sum_{\xi}\Pi_{\xi}(\alpha,\,k)\ln\left(\Pi_{\xi}(\alpha,\,k)\right)\,. (S.24)
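For completeness, (S.24) can be evaluated directly, truncating the m-sum once the prior coefficients have decayed. The function below is a sketch, not a reference implementation: `c` is a dictionary of prior Fourier coefficients c_n^{(s-1)} (with c[0] = 1 and c_{-n} = conj(c_n) implied), and the outcome probabilities Π_ξ(α, k) are computed from the n = 0 Fourier component of the update, consistent with (S.25); this form of Π_ξ is inferred from the derivation rather than quoted from the text.

```python
import cmath
import math

def g0(x): return math.sqrt(1 - x**2)
def g1(x): return 1 + g0(x)
def F(x): return x**2/g1(x) + math.log(g1(x))   # (S.17)
def L(x): return 1/g1(x) + math.log(g1(x))      # (S.23)
def G(x, m):                                    # (S.19)
    return (1 + 2*m*g0(x))/(m*(4*m**2 - 1)) * (x/g1(x))**(2*m)
def J(x, m):                                    # (S.21), only used for m >= 2
    return (1 + (2*m - 1)*g0(x))/((m - 1)*m*(2*m - 1)*g1(x)) * (x/g1(x))**(2*(m - 1))

def expected_entropy_gain(alpha, k, c, lam, zeta, M=50):
    """Evaluate (S.24) with the m-sum truncated at M.

    Requires 0 < lam <= 1, 0 <= zeta <= 1, and outcome probabilities
    strictly inside (0, 1). Absent indices of `c` are treated as zero.
    """
    delta = lam*zeta/(2 - lam)
    out = (-2*math.log(2)
           + 0.5*math.log(1 - (1 - lam)**2)
           + 0.5*(1 - lam)*math.log((2 - lam)/lam)
           + (1 - lam/2)*F(delta) + (lam/2)*F(zeta))
    for m in range(1, M + 1):
        A = (1 - lam/2)*G(delta, m) + (lam/2)*G(zeta, m)
        if m == 1:
            B = 0.5*lam*zeta*(math.log(1 - lam/2) + L(delta)
                              - math.log(lam/2) - L(zeta))
        else:
            B = 0.25*lam*zeta*(J(zeta, m) - J(delta, m))
        out += A*(cmath.exp(2j*m*alpha)*c.get(2*m*k, 0)).real
        out += B*(cmath.exp(1j*(2*m - 1)*alpha)*c.get((2*m - 1)*k, 0)).real
    # Subtract sum_xi Pi_xi ln(Pi_xi), with Pi_xi the prior predictive
    # probability of outcome xi (the n = 0 Fourier component, cf. (S.25))
    for xi in (1, -1):
        Pi = 0.5*(1 + xi*(1 - lam)) + 0.5*xi*lam*zeta*(cmath.exp(1j*alpha)*c.get(k, 0)).real
        out -= Pi*math.log(Pi)
    return out

# Limiting cases derived below: a flat prior with lam = zeta = 1 gives
# 1 - ln(2), while zeta = 0 gives zero expected entropy gain
assert abs(expected_entropy_gain(0.3, 2, {0: 1.0}, 1.0, 1.0) - (1 - math.log(2))) < 1e-9
assert abs(expected_entropy_gain(0.7, 3, {0: 1.0}, 0.5, 0.0)) < 1e-9
```

Maximising this quantity over α and k would implement the information-gain utility of the main text.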

S.V.4 Limiting cases

If λ → 1, then δ → ζ, and we have

ΔsH(α,k)\displaystyle\Delta_{s}H(\alpha,\,k) =2ln(2)+12ln(1)+(0)ln(1)+F(ζ)\displaystyle=-2\ln(2)+\frac{1}{2}\ln(1)+(0)\ln(1)+F(\zeta)
+m=1(Am𝔢{eiα2mck2m(s1)}+Bm𝔢{eiα(2m1)ck(2m1)(s1)})\displaystyle+\sum_{m=1}^{\infty}\Big{(}A_{m}\mathfrak{Re}\left\{\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\right\}+B_{m}\mathfrak{Re}\left\{\mathrm{e}^{i\alpha(2m-1)}c_{k(2m-1)}^{(s-1)}\right\}\Big{)}
ξΠξ(α,k)ln(Πξ(α,k)),\displaystyle-\sum_{\xi}\Pi_{\xi}(\alpha,\,k)\ln\left(\Pi_{\xi}(\alpha,\,k)\right)\,,

and A_m = G(ζ, m), B_m = 0, so

ΔsH(α,k)\displaystyle\Delta_{s}H(\alpha,\,k) =2ln(2)+F(ζ)+m=1G(ζ,m)𝔢{eiα2mck2m(s1)}ξΠξ(α,k)ln(Πξ(α,k)).\displaystyle=-2\ln(2)+F(\zeta)+\sum_{m=1}^{\infty}G(\zeta,m)\mathfrak{Re}\left\{\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\right\}-\sum_{\xi}\Pi_{\xi}(\alpha,\,k)\ln\left(\Pi_{\xi}(\alpha,\,k)\right)\,.

If we also have ζ = 1, then g_0(ζ) = 0, and we have

ΔsH(α,k)\displaystyle\Delta_{s}H(\alpha,\,k) =2ln(2)+1+m=1(1m(4m21))𝔢{eiα2mck2m(s1)}ξΠξ(α,k)ln(Πξ(α,k)).\displaystyle=-2\ln(2)+1+\sum_{m=1}^{\infty}\left(\frac{1}{m(4m^{2}-1)}\right)\mathfrak{Re}\left\{\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\right\}-\sum_{\xi}\Pi_{\xi}(\alpha,\,k)\ln\left(\Pi_{\xi}(\alpha,\,k)\right)\,.

If instead ζ = 0 (still with λ = 1), then g_0(ζ) = 1, so F(ζ) = ln(2) and G(ζ, m) = 0, and we have

ΔsH(α,k)\displaystyle\Delta_{s}H(\alpha,\,k) =2ln(2)+ln(2)ξΠξ(α,k)ln(Πξ(α,k)).\displaystyle=-2\ln(2)+\ln(2)-\sum_{\xi}\Pi_{\xi}(\alpha,\,k)\ln\left(\Pi_{\xi}(\alpha,\,k)\right)\,.

Since in this case Π_ξ(α, k) = 1/2, this is

ΔsH(α,k)\displaystyle\Delta_{s}H(\alpha,\,k) =ln(2)ln(12)=0,\displaystyle=-\ln(2)-\ln\left(\frac{1}{2}\right)=0\,,

as expected, since the measurement provides no information about ϕ.

Finally, we consider the case λ → 0. Then δ → 0, and we have

ΔsH(α,k)\displaystyle\Delta_{s}H(\alpha,\,k) =2ln(2)+12ln(0)+12ln(20)+ln(2)\displaystyle=-2\ln(2)+\frac{1}{2}\ln(0)+\frac{1}{2}\ln\left(\frac{2}{0}\right)+\ln(2)
+m=1(Am𝔢{eiα2mck2m(s1)}+Bm𝔢{eiα(2m1)ck(2m1)(s1)})\displaystyle+\sum_{m=1}^{\infty}\Big{(}A_{m}\mathfrak{Re}\left\{\mathrm{e}^{i\alpha 2m}c_{k2m}^{(s-1)}\right\}+B_{m}\mathfrak{Re}\left\{\mathrm{e}^{i\alpha(2m-1)}c_{k(2m-1)}^{(s-1)}\right\}\Big{)}
ξΠξ(α,k)ln(Πξ(α,k)),\displaystyle-\sum_{\xi}\Pi_{\xi}(\alpha,\,k)\ln\left(\Pi_{\xi}(\alpha,\,k)\right)\,,

and

Am\displaystyle A_{m} =0\displaystyle=0
Bm\displaystyle B_{m} ={0ln(20),if m=10,otherwise,\displaystyle=\begin{cases}0\ln\left(\frac{2}{0}\right)\,,&\text{if }m=1\\ 0\,,&\text{otherwise}\end{cases}\,,

and since Π_ξ(α, k) ∈ {1, 0},

ξΠξ(α,k)ln(Πξ(α,k))=0.\displaystyle\sum_{\xi}\Pi_{\xi}(\alpha,\,k)\ln\left(\Pi_{\xi}(\alpha,\,k)\right)=0\,.

But we see that the value of

12ln(1(1λ)2)+(1λ)2ln(2λλ)\displaystyle\frac{1}{2}\ln\left(1-(1-\lambda)^{2}\right)+\frac{(1-\lambda)}{2}\ln\left(\frac{2-\lambda}{\lambda}\right) 12ln(0)+12ln(20)\displaystyle\rightarrow\frac{1}{2}\ln(0)+\frac{1}{2}\ln\left(\frac{2}{0}\right)

is undefined. Recalling that this was the result of the series

n=1(12n(2n1))(1λ)2n,\displaystyle\sum_{n=1}^{\infty}\left(\frac{1}{2n(2n-1)}\right)(1-\lambda)^{2n}\,,

we can see that if we set λ = 0, we have

n=1(12n(2n1))(1λ)2n\displaystyle\sum_{n=1}^{\infty}\left(\frac{1}{2n(2n-1)}\right)(1-\lambda)^{2n} n=1(12n(2n1))\displaystyle\rightarrow\sum_{n=1}^{\infty}\left(\frac{1}{2n(2n-1)}\right)
=ln(2).\displaystyle=\ln(2)\,.

Similarly, the value of B_1 is undefined. All terms in B_1 are zero except one, which becomes undefined at λ = 0; this is the term that results from the series:

λζ(1λ)1n=1(1(2n1))(1λ)2n\displaystyle\lambda\zeta(1-\lambda)^{-1}\sum_{n=1}^{\infty}\left(\frac{1}{(2n-1)}\right)(1-\lambda)^{2n} (0)ζn=112n1.\displaystyle\rightarrow(0)\zeta\sum_{n=1}^{\infty}\frac{1}{2n-1}\,.

But since the series diverges while its prefactor vanishes, we must take the limit of the product:

limλ0λln(2λλ)\displaystyle\lim_{\lambda\rightarrow 0}\lambda\ln\left(\frac{2-\lambda}{\lambda}\right) =0.\displaystyle=0\,.

So B_1 → 0. Putting these results back together, we have

ΔsH(α,k)\displaystyle\Delta_{s}H(\alpha,\,k) =2ln(2)+ln(2)+ln(2)=0.\displaystyle=-2\ln(2)+\ln(2)+\ln(2)=0\,.

As in the ζ = 0 case, when λ = 0 the measurement gives no information about ϕ, so the expected entropy gain is zero.

S.VI Expected sharpness gain

The sharpness at step s is

S[ps(ϕ)]\displaystyle S\left[p_{s}(\phi)\right] =|eiϕs|=|02πdϕ2πps(ϕ)eiϕ|=|c1(s)|\displaystyle=\left|\langle\mathrm{e}^{i\phi}\rangle_{s}\right|=\left|\int_{0}^{2\pi}\frac{d\phi}{2\pi}p_{s}(\phi)\mathrm{e}^{i\phi}\right|=\left|c_{-1}^{(s)}\right|

We define the expected sharpness gain for step s:

ΔsS(α,k)\displaystyle\Delta_{s}S(\alpha,\,k) =ξΠξ(α,k)(S[ps(ϕ|ξ;α,k)]S[ps1(ϕ)])\displaystyle=\sum_{\xi}\Pi_{\xi}(\alpha,\,k)\Big{(}S\left[p_{s}(\phi|\xi;\alpha,k)\right]-S\left[p_{s-1}(\phi)\right]\Big{)}
=ξΠξ(α,k)S[ps(ϕ|ξ;α,k)]|c1(s1)|\displaystyle=\sum_{\xi}\Pi_{\xi}(\alpha,\,k)S\left[p_{s}(\phi|\xi;\alpha,k)\right]-\left|c_{-1}^{(s-1)}\right|
=ξ|02πdϕ2πPξ(α,kϕ)ps1(ϕ)eiϕ||c1(s1)|.\displaystyle=\sum_{\xi}\left|\int_{0}^{2\pi}\frac{d\phi}{2\pi}P_{\xi}(\alpha,k\phi)p_{s-1}(\phi)\mathrm{e}^{i\phi}\right|-\left|c_{-1}^{(s-1)}\right|\,.

We have

02πdϕ2πPξ(α,kϕ)ps1(ϕ)eiϕ\displaystyle\int_{0}^{2\pi}\frac{d\phi}{2\pi}P_{\xi}(\alpha,k\phi)p_{s-1}(\phi)\mathrm{e}^{i\phi}
=02πdϕ2πn=[12(1+ξ(1λ))cn(s1)+ξλζ4(eiαcn+k(s1)+eiαcnk(s1))]einϕeiϕ\displaystyle=\int_{0}^{2\pi}\frac{d\phi}{2\pi}\sum_{n=-\infty}^{\infty}\Bigg{[}\frac{1}{2}\big{(}1+\xi(1-\lambda)\big{)}c_{n}^{(s-1)}+\xi\lambda\frac{\zeta}{4}\left(\mathrm{e}^{i\alpha}c_{n+k}^{(s-1)}+\mathrm{e}^{-i\alpha}c_{n-k}^{(s-1)}\right)\Bigg{]}\mathrm{e}^{in\phi}\mathrm{e}^{i\phi}
=n=[12(1+ξ(1λ))cn(s1)+ξλζ4(eiαcn+k(s1)+eiαcnk(s1))]02πdϕ2πei(n+1)ϕ\displaystyle=\sum_{n=-\infty}^{\infty}\Bigg{[}\frac{1}{2}\big{(}1+\xi(1-\lambda)\big{)}c_{n}^{(s-1)}+\xi\lambda\frac{\zeta}{4}\left(\mathrm{e}^{i\alpha}c_{n+k}^{(s-1)}+\mathrm{e}^{-i\alpha}c_{n-k}^{(s-1)}\right)\Bigg{]}\int_{0}^{2\pi}\frac{d\phi}{2\pi}\mathrm{e}^{i(n+1)\phi}
=12(1+ξ(1λ))c1(s1)+ξλζ4(eiαc1+k(s1)+eiαc1k(s1)),\displaystyle=\frac{1}{2}\big{(}1+\xi(1-\lambda)\big{)}c_{-1}^{(s-1)}+\xi\lambda\frac{\zeta}{4}\left(\mathrm{e}^{i\alpha}c_{-1+k}^{(s-1)}+\mathrm{e}^{-i\alpha}c_{-1-k}^{(s-1)}\right)\,, (S.25)

Using (S.25), we have

ΔsS(α,k)\displaystyle\Delta_{s}S(\alpha,\,k) =ξ|12(1+ξ(1λ))c1(s1)+ξλζ4(eiαc1+k(s1)+eiαc1k(s1))||c1(s1)|.\displaystyle=\sum_{\xi}\bigg{|}\frac{1}{2}\big{(}1+\xi(1-\lambda)\big{)}c_{-1}^{(s-1)}+\xi\lambda\frac{\zeta}{4}\left(\mathrm{e}^{i\alpha}c_{-1+k}^{(s-1)}+\mathrm{e}^{-i\alpha}c_{-1-k}^{(s-1)}\right)\bigg{|}-\left|c_{-1}^{(s-1)}\right|\,. (S.26)
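Unlike the entropy gain, (S.26) involves only three prior coefficients and is cheap to evaluate. A sketch (illustrative names; `c` maps Fourier index n to c_n^{(s-1)}, with absent indices treated as zero):

```python
import cmath

def expected_sharpness_gain(alpha, k, c, lam, zeta):
    # Evaluate (S.26): expected sharpness after the update, minus |c_{-1}|
    gain = -abs(c.get(-1, 0))
    for xi in (1, -1):
        gain += abs(0.5*(1 + xi*(1 - lam))*c.get(-1, 0)
                    + 0.25*xi*lam*zeta*(cmath.exp(1j*alpha)*c.get(-1 + k, 0)
                                        + cmath.exp(-1j*alpha)*c.get(-1 - k, 0)))
    return gain

# Limiting cases derived below: no gain when lam = 0, or when zeta = 0 with lam = 1
c = {-1: 0.5 + 0.2j, 1: 0.5 - 0.2j}
assert abs(expected_sharpness_gain(0.4, 2, c, 0.0, 0.9)) < 1e-12
assert abs(expected_sharpness_gain(0.4, 2, c, 1.0, 0.0)) < 1e-12
```

Maximising this quantity over α and k would implement the sharpness-based utility.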

S.VI.1 Limiting cases

If λ = 1,

ΔsS(α,k)\displaystyle\Delta_{s}S(\alpha,\,k) =ξ|12c1(s1)+ξζ4(eiαc1+k(s1)+eiαc1k(s1))||c1(s1)|,\displaystyle=\sum_{\xi}\bigg{|}\frac{1}{2}c_{-1}^{(s-1)}+\xi\frac{\zeta}{4}\left(\mathrm{e}^{i\alpha}c_{-1+k}^{(s-1)}+\mathrm{e}^{-i\alpha}c_{-1-k}^{(s-1)}\right)\bigg{|}-\left|c_{-1}^{(s-1)}\right|\,,

and if additionally ζ = 0,

ΔsS(α,k)\displaystyle\Delta_{s}S(\alpha,\,k) =ξ|12c1(s1)||c1(s1)|=122|c1(s1)||c1(s1)|=0,\displaystyle=\sum_{\xi}\bigg{|}\frac{1}{2}c_{-1}^{(s-1)}\bigg{|}-\left|c_{-1}^{(s-1)}\right|=\frac{1}{2}2\bigg{|}c_{-1}^{(s-1)}\bigg{|}-\left|c_{-1}^{(s-1)}\right|=0\,,

as expected. And if λ = 0,

ΔsS(α,k)\displaystyle\Delta_{s}S(\alpha,\,k) =ξ|12(1+ξ)c1(s1)||c1(s1)|=|122c1(s1)||c1(s1)|=0,\displaystyle=\sum_{\xi}\bigg{|}\frac{1}{2}\big{(}1+\xi\big{)}c_{-1}^{(s-1)}\bigg{|}-\left|c_{-1}^{(s-1)}\right|=\bigg{|}\frac{1}{2}2c_{-1}^{(s-1)}\bigg{|}-\left|c_{-1}^{(s-1)}\right|=0\,,

as expected.