
Uncorrelated problem-specific samples of quantum states from zero-mean Wishart distributions

Rui Han^1, Weijun Li^1,a, Shrobona Bagchi^1,b, Hui Khoon Ng^2,1,3 and Berthold-Georg Englert^1,3,4

^1 Centre for Quantum Technologies, National University of Singapore, Singapore
^2 Yale-NUS College, Singapore
^3 MajuLab, CNRS-UCA-SU-NUS-NTU International Joint Research Unit, Singapore
^4 Department of Physics, National University of Singapore, Singapore

^a Now at Trinity Hall, Cambridge University, Cambridge CB3 0D2, UK ∉ EU
^b Now at Raymond and Beverly Sackler School of Physics and Astronomy, Tel-Aviv University, Tel-Aviv 69978, Israel

Email: han.rui@quantumlah.org
Abstract

Random samples of quantum states are an important resource for various tasks in quantum information science, and samples in accordance with a problem-specific distribution can be indispensable ingredients. Some algorithms generate random samples by a lottery that follows certain rules and yield samples from the set of distributions that the lottery can access. Other algorithms, which use random walks in the state space, can be tailored to any distribution, at the price of autocorrelations in the sample and with restrictions to low-dimensional systems in practical implementations. In this paper, we present a two-step algorithm for sampling from the quantum state space that overcomes some of these limitations.

We first produce a large, CPU-cheap proposal sample of uncorrelated entries by drawing from the family of complex Wishart distributions, and then reject or accept the entries in the proposal sample such that the accepted sample is strictly in accordance with the target distribution. We establish the explicit form of the induced Wishart distribution for quantum states. This enables us to generate a proposal sample that mimics the target distribution; therefore, the efficiency of the algorithm, measured by the acceptance rate, can be many orders of magnitude larger than that for a uniform sample as the proposal.

We demonstrate that this sampling algorithm is very efficient for one-qubit and two-qubit states, and reasonably efficient for three-qubit states, while it suffers from the “curse of dimensionality” when sampling from structured distributions of four-qubit states.

Keywords: sampling algorithm, quantum state space, random matrix, Wishart distribution.

Posted on the arXiv on 16 June 2021.

1 Introduction

Random states — states with random properties — play a central role in statistical mechanics because one simply cannot account for the huge number of dynamical degrees of freedom in systems composed of very many constituents. In quantum mechanics — in particular, in quantum information applications — random states also appear naturally even if there are relatively few degrees of freedom. On the one hand, there is always the noisy and uncontrollable interaction of the quantum system under study with its environment; on the other hand, our knowledge about the system is always uncertain because the state parameters cannot be known with perfect precision.

There are many instances in quantum information science where a random sample of quantum states, pure or mixed, is desirable: in the evaluation of entanglement [1, 2, 3, 4]; for the numerical testing of a noise model for state preparation or gate operation [5, 6]; when optimizing a function on the state space [4]; for determining the size and credibility of a region in the state space [7, 8, 9]; when determining the minimum output entropy of a quantum channel [4]; and in many other applications. While direct sampling, in which an easy-to-generate raw sample is converted into a sample from the desired distribution by an appropriate transformation, is an option for some particular distributions, it does not offer a flexible general strategy.

The seemingly simple requirement that the quantum state is represented by a positive unit-trace Hilbert-space operator translates into complicated conditions obeyed by the variables that parameterize the state, and these constraints must be enforced in the sampling algorithm. In one standard approach, the probability distribution of the quantum states is defined by the sampling algorithm — a lottery for pure system-and-ancilla states, for example, followed by tracing out the ancilla [10, 4, 11] — and this can yield large high-quality samples for the distributions that are accessible in this way. This includes uniform distributions for the Haar measure, the Hilbert–Schmidt measure, the Bures measure, or the measure induced by the partial trace [12, 13, 11]. (The sample used in [14] for demonstrating that local hidden variables are a lost cause is of this lottery-defined kind, too.) The major drawback is that the exact probability distributions for random states constructed this way are typically not known, and one has to rely on error-prone numerical estimation of an approximate distribution.

Another standard approach generates samples by a random walk in the quantum state space (or, rather, in the parameter space for it). These Monte Carlo (MC) algorithms can accommodate any target distribution by flexible rules for the walk. The well-known Markov chain MC algorithm can generate samples in the probability simplex, followed by an accept-reject step that enforces the constraints [15, 16]. There is also the Hamiltonian MC algorithm, in which the constraints are obeyed by design, at the price of a rather more complicated parameterization [17, 18]. (The samples of quantum channels in [5] are generated in this way.) These random-walk algorithms yield samples with autocorrelations, and the effective sample size can be a small fraction of the nominal one. Moreover, we encounter implementation problems when applying the MC algorithms to higher-dimensional systems.

In view of these observations, two directions suggest themselves for exploration. One could begin with easy-to-generate and flexible raw samples drawn from a proposal distribution, richer in structure than the uniform samples mentioned above, and then perform suitable acceptance-rejection resampling (or rejection sampling in standard terminology) to obtain a sample in accordance with the target distribution. Or one could process large uncorrelated raw samples (the term "uncorrelated" is synonymous with "independent and identically distributed" (i.i.d.), which is standard terminology in statistics) by tailored random walks that eventually enforce the constraints. In this work, we explore the first suggestion by adapting the complex Wishart distributions from the statistics literature to suitable proposal distributions of quantum states. (These distributions are so named in recognition of John Wishart's [19] seminal work of 1928 [20]; for important follow-up developments and extensive discussions of Wishart distributions as well as, more generally, of random matrices and positive random matrices as tools for multivariate statistical analysis with real-life applications in, for example, biology, physics, social science, and meteorology, see [21, 22, 23, 24, 25].) The second suggestion is the subject matter of a separate paper where we use a sequentially constrained MC algorithm [26].

The general idea of the first suggestion — generate a sample drawn from the target distribution by rejecting or accepting entries from a larger CPU-cheap raw sample — has many potential implementations, and the target distribution will inform the choice we make for the proposal distribution from which we draw the raw sample. We demonstrate the practical feasibility of the method with the example of target distributions as they arise in the context of quantum state estimation (section 3), using raw samples generated from properly adapted Wishart distributions (section 2). The target and proposal distributions used for illustration have single narrow peaks, but that is no limitation, as convex sums of several Wishart distributions are proposal distributions for target distributions with more peaks.

More specifically, we introduce the complex Wishart distribution of a quantum state in section 2 and derive its explicit probability distribution, followed by a look at its important properties, such as the peak location and peak shape; then we discuss how to get a useful proposal distribution from it. Section 3 deals with the exemplary target distributions that one meets in quantum state estimation. In section 4, then, we show how rejection sampling converts a sample drawn from the proposal distribution into a sample drawn from the target distribution. We verify the sample so generated by the procedure described in section 5, with technical details delegated to the appendix. After commenting on the limitations that follow from the “curse of dimensionality” in section 6, we present various samples of quantum states in section 7 that illustrate our sampling method. The examples demonstrate that the method is reliable and competitive but does not escape the curse of dimensionality. We close with a summary and outlook in section 8.

2 Wishart distributions of quantum states

2.1 The quantum state space

A state ρ of an m-dimensional quantum system, represented by a positive-semidefinite unit-trace m×m matrix, is of the form

\rho=\frac{1}{m}\mathbf{1}_{m}+\sum_{l=1}^{m^{2}-1}\varrho_{l}B_{l}\,, (1)

where 1_m is the m×m unit matrix, the set {B_1, B_2, …, B_{m²−1}} is an orthonormal basis in the (m²−1)-dimensional real vector space of traceless hermitian m×m matrices,

B_{l}=B_{l}^{\dagger}\,,\quad\mathrm{tr}(B_{l})=0\,,\quad\mathrm{tr}(B_{l}B_{l'})=\delta_{ll'}\,, (2)

and the ϱ_l s are the coordinates for the vector associated with ρ. The volume element in this euclidean space,

(\mathrm{d}\rho)=\prod_{l=1}^{m^{2}-1}\mathrm{d}\varrho_{l}\,, (3)

is invariant under basis changes; it is independent of the choice made for the B_l s. Since tr((ρ−ρ′)²) = ∑_l (ϱ_l − ϱ′_l)², this is the volume element induced by the Hilbert–Schmidt distance. If we use the first m−1 diagonal matrix elements ρ_jj of ρ as well as the real and imaginary parts of the off-diagonal matrix elements, ρ_jk = ρ_jk^(r) + i ρ_jk^(i) = ρ_kj^* for j < k, as integration variables, which are coordinates in a basis that is not orthonormal, we have

(\mathrm{d}\rho)=\bigl(2^{m(m-1)}m\bigr)^{\frac{1}{2}}[\mathrm{d}\rho]\quad\text{with}\quad[\mathrm{d}\rho]=\prod_{j=1}^{m-1}\mathrm{d}\rho_{jj}\prod_{k=j+1}^{m}\mathrm{d}\rho^{(\mathrm{r})}_{jk}\,\mathrm{d}\rho^{(\mathrm{i})}_{jk}\,, (4)

where the prefactor is the Jacobian determinant. The euclidean space contains quantum states (ρ ≥ 0) and also unphysical ρs that have negative eigenvalues (ρ ≱ 0). The total volume of the convex quantum state space is stated in (17).

In preparation for the following discussion of sampling ρ from a Wishart distribution, we note that the m×m matrix for a quantum state can be written as

\rho=\frac{\Psi\Psi^{\dagger}}{\mathrm{tr}(\Psi\Psi^{\dagger})}\quad\text{with}\quad\Psi=(\psi_{1}\ \psi_{2}\ \cdots\ \psi_{n})\,, (5)

where Ψ is an m×n matrix composed of n m-component columns. If we view the columns of Ψ as representations of pure states, then

\rho=\frac{\psi_{1}\psi_{1}^{\dagger}+\psi_{2}\psi_{2}^{\dagger}+\cdots+\psi_{n}\psi_{n}^{\dagger}}{\psi_{1}^{\dagger}\psi_{1}+\psi_{2}^{\dagger}\psi_{2}+\cdots+\psi_{n}^{\dagger}\psi_{n}} (6)

is a mixed state blended from n pure states, as if a pure system-and-ancilla state were marginalized by tracing out the ancilla. The rank of ρ cannot exceed min{m, n} and is usually equal to this minimum when Ψ is chosen "at random," that is: drawn from a distribution on the state space, as discussed in the subsequent sections. For the purposes of this paper, we take n ≥ m for granted.

2.2 The Wishart distributions

2.2.1 Gaussian distribution of complex matrices.

Now, to give meaning to the phrase "Ψ is chosen at random," we regard the real and imaginary parts of the matrix elements Ψ_jk = α_jk + i β_jk of Ψ as coordinates in a 2mn-dimensional euclidean space with the volume element

(\mathrm{d}\Psi)=\prod_{j=1}^{m}\prod_{k=1}^{n}\mathrm{d}\alpha_{jk}\,\mathrm{d}\beta_{jk}\,, (7)

and then have a multivariate, zero-mean, gaussian distribution on this matrix space specified by the probability element

(\mathrm{d}\Psi)\,G(\Psi|\Sigma)=(\mathrm{d}\Psi)\,(2\pi)^{-mn}\,\det(\Sigma)^{-n}\,\mathrm{e}^{-\frac{1}{2}\mathrm{tr}(\Psi^{\dagger}\Sigma^{-1}\Psi)}\,, (8)

where Σ is a positive m×m matrix — the covariance matrix for each column in Ψ. Denoting the random variable by Ψ̲ and one of its values by Ψ, we say that "Ψ̲ is drawn from the zero-mean normal distribution of m×n matrices with the m×m covariance matrix Σ" and, adopting the conventional notation from the statistics literature (see [27], for example), write Ψ̲ ~ N_mn(0, 1_n ⊗ Σ). (More general gaussian distributions have tr(Σ₂⁻¹(Ψ−Φ)†Σ₁⁻¹(Ψ−Φ)) in the exponent, with two covariance matrices Σ₁ and Σ₂ and a displacement matrix Φ, and then Ψ̲ ~ N_mn(Φ, Σ₂ ⊗ Σ₁). We only need the basic gaussian of (8) here, and explore the option of Φ ≠ 0 in a companion paper [28].) Since, in analogy with (6), we can read the n×n trace as

\mathrm{tr}(\Psi^{\dagger}\Sigma^{-1}\Psi)=\sum_{k=1}^{n}\psi_{k}^{\dagger}\Sigma^{-1}\psi_{k}\,, (9)

the exponential function in (8) is the product of n exponential functions, one factor for each column of Ψ. Having noted this factorization, we can state in which sense Σ is the covariance matrix: for each m-component random column ψ we have the expected values 𝔼(ψ) = 0 for the mean and 𝔼((ψ−𝔼(ψ))(ψ−𝔼(ψ))†) = 𝔼(ψψ†) = Σ for the covariance. It follows that 𝔼(ΨΨ†) = nΣ.

2.2.2 Wishart distribution of hermitian matrices.

The numerator in (5) or (6) is a hermitian m×m matrix R = ΨΨ† with matrix elements R_jk = x_jk + i y_jk = R_kj^*, with which we associate the m²-dimensional euclidean space with the volume element

(\mathrm{d}R)=\prod_{j=1}^{m}\mathrm{d}x_{jj}\prod_{k=j+1}^{m}\mathrm{d}x_{jk}\,\mathrm{d}y_{jk}\,. (10)

As stated in Theorem 5.1 in [21], the induced probability distribution on this R space has the probability element

(\mathrm{d}R)\,W(R|\Sigma)=(\mathrm{d}R)\int(\mathrm{d}\Psi)\,G(\Psi|\Sigma)\,\delta(R-\Psi\Psi^{\dagger})=(\mathrm{d}R)\,\frac{\det(R)^{n-m}}{2^{mn}\,\Gamma_{m}(n)\,\det(\Sigma)^{n}}\,\mathrm{e}^{-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}R)}\,, (11)

where

\Gamma_{m}(n)=\prod_{j=0}^{m-1}\pi^{j}\,\Gamma(n-j)=\pi^{\frac{1}{2}m(m-1)}\prod_{j=1}^{m}(n-j)! (12)

is a multivariate gamma function. The random variable R̲ is drawn from the centered complex Wishart (CCW) distribution of hermitian m×m matrices [20, 21, 22], R̲ ~ CCW_m(n, Σ), that derives from N_mn(0, 1_n ⊗ Σ). (When n < m, all Rs are rank deficient and the anti-Wishart distribution is obtained instead [29, 30]. We do not consider samples of rank-deficient matrices here.)

2.2.3 The quantum Wishart distribution.

The conversion of this distribution for R to a distribution for ρ = R/tr(R) requires an integration over τ = tr(R) after expressing the R_jk s in terms of τ and the ρ_jk s, omitting ρ_mm, which does not appear in (4), that is

R_{jk}=\rho_{jk}\tau\quad\text{for}\quad(j,k)\neq(m,m)\,,\qquad R_{mm}=\Bigl(1-\sum_{j=1}^{m-1}\rho_{jj}\Bigr)\tau\,, (13)

so that the volume elements in R space and ρ space are related by

(\mathrm{d}R)=[\mathrm{d}\rho]\,\mathrm{d}\tau\,\tau^{m^{2}-1}\,. (14)

Accordingly, upon evaluating the τ integral in

(\mathrm{d}\rho)\,g(\rho)=[\mathrm{d}\rho]\int_{0}^{\infty}\mathrm{d}\tau\,\tau^{m^{2}-1}\,W(R=\tau\rho|\Sigma) (15)

we arrive at the following theorem (the case of Σ ≐ 1_m, when g(ρ) ∝ det(ρ)^{n−m} accounts in full for the ρ dependence, is well known; see, for example, (15.58) in [11] or (3.5) in [31]):

For Ψ̲ ~ N_mn(0, 1_n ⊗ Σ) with n ≥ m, the m×m matrices ρ = ΨΨ†/tr(ΨΨ†) are drawn from the Wishart distribution of m-dimensional quantum states, ρ̲ ~ W_m^(Q)(n, Σ), which is specified by the probability element

(\mathrm{d}\rho)\,g(\rho)=[\mathrm{d}\rho]\,\frac{\Gamma(mn)}{\Gamma_{m}(n)}\,\frac{\det(\rho)^{n-m}}{\det(\Sigma)^{n}\,\mathrm{tr}(\Sigma^{-1}\rho)^{mn}}\,. (16)

By construction, g(ρ) = 0 for all unphysical ρs. For the sake of notational simplicity, we leave the dependence of g(ρ) on m, n, and Σ implicit, just as we do not indicate the dependence of G(Ψ|Σ) on m and n in (8).

We observe that, when drawing a Ψ from the distribution G(Ψ|Σ) to arrive at a quantum state ρ drawn from the distribution g(ρ), it does not matter if we replace Σ by λΣ with a positive scaling factor λ because the product det(λΣ) tr((λΣ)⁻¹ρ)^m does not depend on λ. We write Σ′ ≐ Σ for two Σs that are equivalent in this sense.

When Σ ≐ 1_m, the probability distribution g(ρ) is isotropic in the sense of g(UρU†) = g(ρ) for all unitary m×m matrices U. In the case of a generic Σ > 0, the distribution is invariant under ρ → UρU† if U commutes with Σ, and only then.

2.2.4 Uniformly distributed quantum states.

A particular case is that of n = m and Σ ≐ 1_m, when g(ρ) is constant and equal to the reciprocal of the volume occupied by the quantum states in the (m²−1)-dimensional euclidean space,

\int_{(\rho\geq 0)}(\mathrm{d}\rho)=\bigl(2^{m(m-1)}m\bigr)^{\frac{1}{2}}\,\frac{\Gamma_{m}(m)}{\Gamma(m^{2})}\,. (17)

Here, the ρs are uniformly distributed over the state space — uniform in the sense of the Hilbert–Schmidt distance. (It appears that, for many authors, uniformity in the Hilbert–Schmidt distance is the default meaning of "picking a quantum state at random;" an example is the marginalization performed in [32].) This provides a computationally cheap and reliable way of generating an uncorrelated sample of uniformly distributed quantum states.
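For concreteness, this special case can be coded in a few lines. The following is a minimal NumPy sketch (the function name and interface are our own illustration, not part of the paper's specification): draw the entries of an m×m matrix Ψ as i.i.d. standard complex gaussians and divide ΨΨ† by its trace.

    import numpy as np

    def uniform_states(m, size, seed=None):
        # n = m and Sigma = 1_m: rho = Psi Psi^dag / tr(Psi Psi^dag) is
        # uniform in the Hilbert-Schmidt sense; see (17).
        rng = np.random.default_rng(seed)
        Psi = rng.standard_normal((size, m, m)) + 1j * rng.standard_normal((size, m, m))
        R = Psi @ Psi.conj().transpose(0, 2, 1)
        tr = np.trace(R, axis1=1, axis2=2).real
        return R / tr[:, None, None]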

2.3 Efficient sampling

In addition to unitary transformations of ρ, we can also consider general hermitian conjugations, which invites the following question: if ρ is drawn from the distribution g(ρ), what is the corresponding distribution for

\rho'=\frac{A\rho A^{\dagger}}{\mathrm{tr}(A\rho A^{\dagger})}\,, (18)

where A is a nonsingular m×m matrix? Clearly, the sample of ρ′s can be generated by the replacement Ψ → Ψ′ = AΨ for each Ψ drawn from G(Ψ|Σ). We write A in its polar form,

A=HU\,, (19)

where U is a unitary matrix and H = (AA†)^{1/2} is a positive matrix, and note that

(\mathrm{d}\Psi')=(\mathrm{d}\Psi)\,\det(H)^{2n} (20)

for the respective volume elements in (7). Then,

(\mathrm{d}\Psi)\,G(\Psi|\Sigma)=(\mathrm{d}\Psi')\,\det(H)^{-2n}\,G(A^{-1}\Psi'|\Sigma)=(\mathrm{d}\Psi')\,G(\Psi'|A\Sigma A^{\dagger})\,, (21)

so that Ψ′ is drawn from the multivariate gaussian distribution with the covariance matrix AΣA† (this is the statement of Theorem 1.2.6 in [23]). Here, then, is the answer to the question above: ρ′ is drawn from g(ρ′) with Σ replaced by AΣA† in (16), that is ρ̲′ ~ W_m^(Q)(n, AΣA†). Note that W_m^(Q)(n, AΣA†) = W_m^(Q)(n, Σ) when AΣA† ≐ Σ, that is: A = λ^{1/2} Σ^{1/2} U Σ^{−1/2} with λ a positive number and U a unitary matrix. This includes the case, discussed above, of a unitary A that commutes with Σ.

This observation is of practical importance. We can sample Ψ from G(Ψ|1_m) — drawing both the real and the imaginary parts of all matrix elements Ψ_jk from the one-dimensional gaussian distribution with zero mean and unit variance, which can be done very efficiently — and then put ρ = AΨΨ†A†/tr(A†AΨΨ†) with A such that Σ ≐ AA†; the choice A = Σ^{1/2} suggests itself. This yields a random sample of uncorrelated ρs drawn from g(ρ).
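In code, this recipe adds one conjugation to the uniform sampler above; a sketch under the same assumptions, with Σ^{1/2} computed from an eigendecomposition:

    import numpy as np

    def wishart_states(n, Sigma, size, seed=None):
        # Draw rho ~ W_m^(Q)(n, Sigma) by sampling Psi from G(Psi|1_m)
        # and conjugating with A = Sigma^(1/2).
        rng = np.random.default_rng(seed)
        m = Sigma.shape[0]
        w, V = np.linalg.eigh(Sigma)
        A = (V * np.sqrt(w)) @ V.conj().T               # A = Sigma^(1/2)
        Psi = rng.standard_normal((size, m, n)) + 1j * rng.standard_normal((size, m, n))
        APsi = A @ Psi                                  # A Psi for each sample entry
        R = APsi @ APsi.conj().transpose(0, 2, 1)       # A Psi Psi^dag A^dag
        tr = np.trace(R, axis1=1, axis2=2).real
        return R / tr[:, None, None]                    # uncorrelated rho's from g(rho)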

2.4 Peak location

In (16), we have

g(\rho)\propto\frac{\det(\rho_{\Sigma})^{n}}{\det(\rho)^{m}}\quad\text{with}\quad\rho_{\Sigma}=\frac{\Sigma^{-\frac{1}{2}}\rho\,\Sigma^{-\frac{1}{2}}}{\mathrm{tr}(\Sigma^{-1}\rho)}\,, (22)

so that g(ρ) is peaked at the ρ that compromises between maximizing det(ρ_Σ) and minimizing det(ρ). Since the response of det(ρ) to an infinitesimal change of ρ is

\delta\det(\rho)=\det(\rho)\,\mathrm{tr}(\rho^{-1}\delta\rho)\,, (23)

the stationarity requirement g(ρ_peak + δρ) = g(ρ_peak) reads

\mathrm{tr}\biggl(\Bigl[(n-m)\rho_{\mathrm{peak}}^{-1}-\frac{mn\,\Sigma^{-1}}{\mathrm{tr}(\Sigma^{-1}\rho_{\mathrm{peak}})}\Bigr]\delta\rho\biggr)=0 (24)

for all permissible δρs. (For a hermitian m×m matrix A_ρ that is a functional of ρ, the requirement that tr(A_ρ δρ) = 0 holds for all permissible δρs implies A_ρ ρ = ρ A_ρ = ρ tr(A_ρ ρ).) It follows that we need

\Sigma\doteq\Bigl(\rho_{\mathrm{peak}}^{-1}+\frac{m^{2}}{n-m}\mathbf{1}_{m}\Bigr)^{-1} (25)

if we want g(ρ) to be largest at ρ = ρ_peak,

\max_{\rho}\{g(\rho)\}=g(\rho_{\mathrm{peak}})\,. (26)

Clearly, for every full-rank ρ_peak, there are probability densities g(ρ) that have their maximum there, one such g(ρ) for each n value larger than m.

The covariance matrix (25) applies for n > m and a full-rank ρ_peak, as g(ρ) = 0 for a rank-deficient ρ when n > m. If n = m, we get Σ ≐ 1_m for all full-rank ρ_peak s, which is the g(ρ) = constant situation noted above.

For n = m and Σ not ≐ 1_m, g(ρ) is largest when tr(Σ⁻¹ρ) is smallest and, therefore, the eigenvalue subspace of Σ for its largest eigenvalue comprises all ρ_peak s. While this enables us to design g(ρ) such that it is maximal for a rank-deficient ρ_peak, it is of little practical use; more about this in section 3.2.
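Equation (25) turns a prescribed peak into a covariance matrix directly; a brief sketch (function name ours, same caveats as for the code above):

    import numpy as np

    def sigma_for_peak(rho_peak, n):
        # Eq. (25): Sigma = (rho_peak^(-1) + m^2/(n-m) 1_m)^(-1), valid for
        # n > m and full-rank rho_peak; Sigma matters only up to a scale.
        m = rho_peak.shape[0]
        return np.linalg.inv(np.linalg.inv(rho_peak) + m**2 / (n - m) * np.eye(m))

For the qubit example of section 2.6 below (n = 5, z_peak = 4/5, so ρ_peak = diag(0.9, 0.1)), this returns a diagonal Σ with eigenvalue ratio ≈ 4.64 = e^{2ϑ}, consistent with (31) and (32).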

2.5 Peak shape

For ρs near ρ_peak,

\rho=\rho_{\mathrm{peak}}+\epsilon=\rho_{\mathrm{peak}}+\sum_{l=1}^{m^{2}-1}\varepsilon_{l}B_{l}\,, (27)

where the traceless hermitian m×m matrix ϵ and its real coordinates ε_l are small on the relevant scales, we have

\log\frac{g(\rho_{\mathrm{peak}}+\epsilon)}{g(\rho_{\mathrm{peak}})}\cong-\frac{1}{2}(n-m)\Bigl[\mathrm{tr}\bigl((\rho_{\mathrm{peak}}^{-1}\epsilon)^{2}\bigr)-\frac{n-m}{mn}\,\mathrm{tr}(\rho_{\mathrm{peak}}^{-1}\epsilon)^{2}\Bigr]=-\frac{1}{2}\sum_{l,l'}\varepsilon_{l}G_{ll'}\varepsilon_{l'} (28)

with

G_{ll'}=(n-m)\,\mathrm{tr}(\rho_{\mathrm{peak}}^{-1}B_{l}\,\rho_{\mathrm{peak}}^{-1}B_{l'})-\frac{(n-m)^{2}}{mn}\,\mathrm{tr}(\rho_{\mathrm{peak}}^{-1}B_{l})\,\mathrm{tr}(\rho_{\mathrm{peak}}^{-1}B_{l'}) (29)

upon discarding terms proportional to the third and higher powers of ϵ. For the gaussian approximation of g(ρ) in the vicinity of ρ_peak in (28), the inverse of the matrix G is the covariance matrix in the coordinate space. It follows that the distribution g(ρ) is narrower when n is larger; and since ρ_peak^{−1/2} ϵ ρ_peak^{−1/2} appears in (28), rather than ϵ by itself, the distribution will be particularly narrow in the directions associated with the smallest eigenvalues of ρ_peak.
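The gaussian approximation is straightforward to evaluate numerically. The sketch below first constructs an orthonormal basis with the properties (2) (a generalized Gell-Mann construction; any other orthonormal choice works as well) and then assembles the matrix G of (29):

    import numpy as np
    from itertools import combinations

    def traceless_hermitian_basis(m):
        # m^2 - 1 matrices B_l with B_l = B_l^dag, tr(B_l) = 0,
        # tr(B_l B_l') = delta_{ll'}.
        basis = []
        for j, k in combinations(range(m), 2):
            X = np.zeros((m, m), complex)
            X[j, k] = X[k, j] = 2**-0.5
            Y = np.zeros((m, m), complex)
            Y[j, k], Y[k, j] = -1j * 2**-0.5, 1j * 2**-0.5
            basis += [X, Y]
        for l in range(1, m):
            D = np.zeros((m, m), complex)
            D[range(l), range(l)] = 1.0
            D[l, l] = -l
            basis.append(D / np.sqrt(l * (l + 1)))
        return basis

    def peak_hessian(rho_peak, n, basis):
        # Matrix G of (29); its inverse approximates the covariance of the
        # epsilon_l coordinates near the peak, cf. (28).
        m = rho_peak.shape[0]
        rinv = np.linalg.inv(rho_peak)
        t1 = np.array([[np.trace(rinv @ B @ rinv @ Bp).real for Bp in basis]
                       for B in basis])
        t2 = np.array([np.trace(rinv @ B).real for B in basis])
        return (n - m) * t1 - (n - m)**2 / (m * n) * np.outer(t2, t2)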

We break for a quick look at one-qubit and multi-qubit examples and then pick up the story in section 2.8.

2.6 Example: Wishart samples for a qubit

The qubit case (m = 2) is the one case that we can visualize. The state space is the unit Bloch ball [33] with cartesian coordinates x, y, z introduced in the standard way,

\rho=\frac{1}{2}\begin{pmatrix}1+z&x-\mathrm{i}y\\ x+\mathrm{i}y&1-z\end{pmatrix}=\frac{1}{2}(\mathbf{1}_{2}+x\sigma_{x}+y\sigma_{y}+z\sigma_{z})=\frac{1}{2}(\mathbf{1}_{2}+\langle\boldsymbol{\sigma}\rangle\cdot\boldsymbol{\sigma})\,,\quad(\mathrm{d}\rho)=\frac{1}{\sqrt{8}}\,\mathrm{d}x\,\mathrm{d}y\,\mathrm{d}z\,,\quad\det(\rho)=\frac{1}{4}(1-x^{2}-y^{2}-z^{2})\,, (30)

where σ_x, σ_y, σ_z are the standard Pauli matrices, the cartesian components of the vector matrix 𝝈, and x, y, z are their respective expectation values, such as x = ⟨σ_x⟩ = tr(ρσ_x), with x² + y² + z² ≤ 1. Any choice of right-handed coordinate axes is fine (proper rotations in the three-dimensional euclidean space are unitary transformations of the 2×2 matrices) and, by convention, we choose the coordinate axes such that Σ is diagonal,

\Sigma\doteq\begin{pmatrix}\mathrm{e}^{\vartheta}&0\\ 0&\mathrm{e}^{-\vartheta}\end{pmatrix}\,,\qquad\det(\Sigma)\,\mathrm{tr}(\Sigma^{-1}\rho)^{2}=(\cosh\vartheta-z\sinh\vartheta)^{2}\,. (31)

Then we have

(\mathrm{d}\rho)\,g(\rho)=\mathrm{d}x\,\mathrm{d}y\,\mathrm{d}z\,\frac{1}{2\pi}\,\frac{n-1}{4^{n-1}}\,\frac{\Gamma(2n)}{\Gamma(n)^{2}}\,\frac{(1-x^{2}-y^{2}-z^{2})^{n-2}}{(\cosh\vartheta-z\sinh\vartheta)^{2n}}=\mathrm{d}s\,2(n-1)s(1-s^{2})^{n-2}\,\frac{\mathrm{d}\phi}{2\pi}\,\frac{\mathrm{d}z}{2^{2n-1}}\,\frac{\Gamma(2n)}{\Gamma(n)^{2}}\,\frac{(1-z^{2})^{n-1}}{(\cosh\vartheta-z\sinh\vartheta)^{2n}}\,,

where s, ϕ are polar coordinates in the unit disk and (x, y) = √(1−z²) (s cos ϕ, s sin ϕ). In view of the factorization of the s, ϕ, z version, the probability (dρ) g(ρ) is the product of its three marginal probabilities, which tells us that s, ϕ, and z are independent random variables. (Note that the substitution z = (z′ + tanh ϑ)/(1 + z′ tanh ϑ) yields a z′ marginal ∝ dz′ (1−z′²)^{n−1} that does not depend on ϑ. Similarly, the substitution x = x′/√((cosh ϑ)² − (x′ sinh ϑ)²) yields a ϑ-independent x′ marginal ∝ dx′ (1−x′²)^{n−1}, and likewise for y.) This g(ρ) is largest for ρ_peak with (x, y, z)_peak = (0, 0, z_peak), where z_peak is given by

\tanh\vartheta=\frac{(n-2)z_{\mathrm{peak}}}{n-2z_{\mathrm{peak}}^{2}}\,, (32)

whereas the z-marginal is largest for the z that solves

\tanh\vartheta=\frac{(n-1)z}{n-z^{2}}\,, (33)

which gives a value between z = 0 and z_peak.

For the most natural choice of orthonormal basis matrices — that is, B₁ = 2^{−1/2} σ_x, B₂ = 2^{−1/2} σ_y, and B₃ = 2^{−1/2} σ_z — we have

\epsilon=\frac{1}{\sqrt{2}}\begin{pmatrix}\varepsilon_{3}&\varepsilon_{1}-\mathrm{i}\varepsilon_{2}\\ \varepsilon_{1}+\mathrm{i}\varepsilon_{2}&-\varepsilon_{3}\end{pmatrix} (34)

here, and find

\log\frac{g(\rho_{\mathrm{peak}}+\epsilon)}{g(\rho_{\mathrm{peak}})}\cong-\frac{1}{2}\,\frac{4(n-2)}{1-z_{\mathrm{peak}}^{2}}\Bigl(\varepsilon_{1}^{2}+\varepsilon_{2}^{2}+\frac{1+\frac{2}{n}z_{\mathrm{peak}}^{2}}{1-z_{\mathrm{peak}}^{2}}\,\varepsilon_{3}^{2}\Bigr) (35)

near the peak. The longitudinal variance (coordinate ε₃) is smaller than the transverse variance by the factor (1−z_peak²)/(1 + (2/n)z_peak²); all variances decrease when n and z_peak increase.

Figure 1: Single-qubit quantum Wishart distribution. The top plots show a sample of 10 000 quantum states drawn from g(ρ) of (2.6) for n = 5 and ρ_peak = (1/2)(1₂ + (4/5)σ_z). Left: pseudo-3D scatter plot of g(ρ) inside the Bloch ball. Center: scatter plot in the x, z plane of g(ρ) integrated over y. Right: scatter plot in the x, y plane of g(ρ) integrated over z.
Bottom: histograms of the x-marginal (blue staircase) and the z-marginal (black staircase), obtained by binning a sample of 100 000 quantum states into fifty intervals of width 0.04. The impulses (cyan for x, red for z) show the expected values of counts in each bin. The arrows indicate the peak locations of the analytical marginals at x = 0 and z = 0.7223.

These matters are illustrated in figure 1 for n = 5 and z_peak = 4/5, where we have scatter plots of g(ρ) and of its x, z and x, y marginal distributions, of 10 000 quantum states each. There are also histograms, from a sample of 100 000 quantum states, for the one-dimensional marginal distributions in x and z, which we compare with the histograms computed from the analytical expressions. The sample marginals are respectively obtained by ignoring the Bloch-ball coordinate y, the coordinate z, the coordinate pair (y, z), or the coordinate pair (x, y) of the sample entries. This deliberate ignorance corresponds to integrating g(ρ) over these variables. Note, in particular, the very good agreement of the sample histograms with the analytical ones, which is partial confirmation that the sampling algorithm of sections 2.2 and 2.3 does indeed yield a sample in accordance with the Wishart distribution in (2.6).

2.7 Example: Multi-qubit analog

As one multi-qubit example, we consider an analog of (31)–(32) for n_qb qubits,

\Sigma\doteq\cosh\vartheta\,\mathbf{1}_{m}+\sinh\vartheta\,\sigma_{z}^{\otimes n_{\mathrm{qb}}}\,,\qquad\rho_{\mathrm{peak}}=\frac{1}{m}\bigl(\mathbf{1}_{m}+z_{\mathrm{peak}}\,\sigma_{z}^{\otimes n_{\mathrm{qb}}}\bigr)\,,

with m = 2^{n_qb} and ϑ related to z_peak through the analog of (32) with "2" replaced by m. As observed in (29), different values of z_peak do not just give different peak locations, they also give different shapes of the distribution. We look at g(ρ) for ρ = ρ_peak + ε m^{−1/2} σ_z^{⊗n_qb}, where ε is the coordinate in the longitudinal direction. In other words, ρ is of the form of ρ_peak in (2.7) with ε m^{1/2} added to z_peak.

We quantify the width of the distribution, as a function of ε, by the full width at half maximum (FWHM). The approximate value that follows from the gaussian approximation in (28),

\mathrm{FWHM}\cong\frac{2}{m}\bigl(1-z_{\mathrm{peak}}^{2}\bigr)\Bigl[\frac{\log(4)}{(n-m)\bigl(1+\frac{m}{n}z_{\mathrm{peak}}^{2}\bigr)}\Bigr]^{\frac{1}{2}}\,, (36)

catches the dependence on z_peak and n well. It slightly overestimates the FWHM for small z_peak values, is very close to the actual FWHM for z_peak ≃ 0.5, and slightly underestimates the FWHM for large z_peak values. The relative error is in the range −4% … 2% when n > m + 16/m; this is illustrated in figure 2.

Figure 2: Accuracy of the approximation (36). For one to seven qubits (m = 2^{n_qb} = 2, 4, 8, …, 128), the plot reports the relative error for n = ⌊m + 16/m⌋ + 1 as a function of z_peak; the error is smaller for larger n values. Note that the curves for n_qb = 4 and n_qb = 5 are indiscernible for z_peak ≲ 0.5.

2.8 Linearly shifted distributions

After choosing ρ_peak and thus Σ such that the peak shape of the Wishart distribution g(ρ) matches that of the target distribution, in the sense discussed at the end of section 3, the peak locations usually still do not agree.

The proposal distribution is, therefore, obtained by shifting the Wishart distribution g(ρ). After acquiring a sample drawn from W_m^(Q)(n, Σ), we turn each ρ′ from the sample into ρ by the linear shift map

\rho=\rho'+\Delta\rho\quad\text{with}\quad(\mathrm{d}\rho)=(\mathrm{d}\rho')\,, (37)

where Δρ is a traceless hermitian m×m matrix that we choose suitably. The distribution of the ρ sample then peaks at ρ = ρ_peak + Δρ and is characterized by the probability element

(\mathrm{d}\rho)\,g_{\mathrm{s}}(\rho)=[\mathrm{d}\rho]\,\frac{\Gamma(mn)}{\Gamma_{m}(n)}\,\frac{\det(\rho-\Delta\rho)^{n-m}}{\det(\Sigma)^{n}\,\mathrm{tr}\bigl(\Sigma^{-1}(\rho-\Delta\rho)\bigr)^{mn}}\,. (38)

Since the mapping (37) linearly shifts the entire state space, there are ρs with no preimage ρ′ in the state space, so that g_s(ρ) = 0 in the corresponding part of the state space. There are also unphysical ρs with g_s(ρ) > 0; they will be discarded during the rejection sampling discussed in section 4. The matter is illustrated in figure 3. While the fraction of physical ρs in the shifted distribution g_s(ρ) is quite large (≳ 0.6) for a single qubit even when the shift is by half of the radius of the Bloch ball, this fraction is quite small for several-qubit distributions unless the shift itself is small enough. Therefore, one needs to compromise between Σ, n, and Δρ and exploit (25) when choosing Σ and n such that Δρ is kept small.

Figure 3: Linearly shifted Wishart distributions. Left: uniform single-qubit distribution shifted by Δρ = (1/4)σ_z and partly beyond the Bloch ball. The physical ρs in the sample of 10 000 entries are marked by blue dots, the unphysical ones by red dots. Right: the fraction of physical ρs after shifting an isotropic n_qb-qubit distribution by Δρ = (1/2)Δz σ_z^{⊗n_qb} for n_qb = 1, 2, 3, and 4, for n = m = 2^{n_qb} (circles) and n = m + 1 (diamonds). Since the distribution is narrower for larger n, the fraction of physical ρs is systematically larger for n = m + 1 than for n = m.

The rejection sampling of section 4 requires a proposal distribution that is strictly positive and, therefore, we need to fill the void created by the lack of ρ′s for certain ρs. For this purpose, we supplement the sample obtained by the shift (37) with ρs drawn from the uniform distribution W_m^(Q)(m, 1_m); this has no effect on the peak shape or the peak location. The full distribution of the proposal sample is then a convex sum of g_s(ρ) and g(ρ) = constant. When 0 < κ < 1 is the weight assigned to the uniformly distributed subsample, and N is the desired sample size, we repeat the sequence

    draw Ψ from G(Ψ|1_m);
    compute ρ = Σ^{1/2} ΨΨ† Σ^{1/2} / tr(ΣΨΨ†) + Δρ; (39)

until we have (1−κ)N entries in the sample. Then we add κN ρs from the uniform distribution to complete the proposal sample.
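In code, the sequence (39) together with the uniform admixture reads as follows; this sketch reuses wishart_states and uniform_states from section 2, and delta_rho stands for the traceless hermitian shift Δρ:

    import numpy as np

    def proposal_sample(n, Sigma, delta_rho, kappa, N, seed=None):
        # (1 - kappa)N entries from the shifted Wishart distribution (39),
        # plus kappa*N entries from the uniform distribution W_m^(Q)(m, 1_m).
        rng = np.random.default_rng(seed)
        m = Sigma.shape[0]
        N_shifted = round((1 - kappa) * N)
        shifted = wishart_states(n, Sigma, N_shifted, rng) + delta_rho
        uniform = uniform_states(m, N - N_shifted, rng)
        return np.concatenate([shifted, uniform])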

The proposal sample produced in this way is drawn from a distribution with the probability element

(\mathrm{d}\rho)\,g_{\mathrm{s},\kappa}(\rho)\propto[\mathrm{d}\rho]\,(1-\kappa)\,\frac{\Gamma(mn)}{\Gamma_{m}(n)}\,\frac{\det(\rho-\Delta\rho)^{n-m}}{\det(\Sigma)^{n}\,\mathrm{tr}\bigl(\Sigma^{-1}(\rho-\Delta\rho)\bigr)^{mn}}+[\mathrm{d}\rho]\,\kappa\,\frac{\Gamma(m^{2})}{\Gamma_{m}(m)}\,, (40)

where we write "∝" rather than "=" because we are missing the overall factor that normalizes g_{s,κ}(ρ) to unit integral over the physical ρs. The proposal distributions of this kind are nonzero for rank-deficient ρs, and the samples can have many ρs near a part of the state-space boundary if ρ_peak + Δρ is close to the boundary. This is a useful feature when we need to mimic a target distribution that peaks near or on the boundary, and the proposal sample should have many points near the boundary, where the unshifted g(ρ) vanishes when n > m; see section 7 for examples. Such target distributions assign very little probability to the void region of the shift, and then we may not need many ρs from the uniform distribution in the proposal sample.

The rejection sampling discussed in section 4 has a high yield only if the proposal distribution matches the target distribution well and, therefore, a judicious choice of the parameters that define the sample is crucial for the overall efficiency of the sampling algorithm. In practice, we choose n, Σ, Δρ, κ and generate a small sample, just large enough for a good estimate of the overall yield. After trying out several choices, we generate the actual large sample for the n, Σ, Δρ, κ with the largest yield. Then we verify the sample by the procedure described in section 5.

3 The target distributions

The target distribution — the probability distribution on the quantum state space from which we want to sample — can be of any form. An important use of quantum samples is in Bayesian analysis, where integrals over the high-dimensional quantum state space are central quantities, which can often only be computed by Monte Carlo integration. The samples are then drawn from the respective prior distribution, or the posterior distribution after accounting for the knowledge provided by the experimental data. The target distributions that we use for illustration are of this kind. The sampling algorithm can, of course, also be used for other target distributions.

3.1 Target distributions that arise in quantum state estimation

To be specific, the examples used for illustration in section 7 refer to the scenario of quantum state estimation, where many uncorrelated copies of the system are measured by an apparatus and one of the K outcomes is found for each copy. The sequence of detection events constitutes the data D, and the probability of observing the actual data, if the system is prepared in the state ρ, is the likelihood

L(D|\rho)=\prod_{k=1}^{K}p_{k}^{\nu_{k}}\,, (41)

where p_k is the probability of observing the kth outcome and ν_k is the count of kth outcomes in the sequence; the counts themselves, ν = (ν₁, ν₂, ν₃, …, ν_K), are a minimal statistic. The dependence on ρ is given by the Born rule,

p_{k}=\mathrm{tr}(\rho\,\Pi_{k})\,, (42)

with the probability operator Π_k for the kth outcome. The Π_k s are nonnegative and add up to the identity,

\Pi_{k}\geq 0\,,\qquad\sum_{k=1}^{K}\Pi_{k}=1\,, (43)

but are not restricted otherwise. The permissible probabilities p = (p₁, p₂, …, p_K) are those consistent with ρ ≥ 0 and tr(ρ) = 1; they are a convex subset of the probability simplex — identified by p_k ≥ 0 and ∑_{k=1}^{K} p_k = 1 — and it can be CPU-time expensive to check whether a certain p is permissible.

The target distribution f(ρ) is the posterior distribution, the product of the prior probability density w₀(ρ) and the likelihood,

f(\rho)\propto L(D|\rho)\,w_{0}(\rho)\,, (44)

where we do not write the proportionality constant that ensures proper normalization. Usually, it is expedient to use a conjugate prior, that is: w₀(ρ) is of the product form in (41) with prechosen values for the ν_k s, which need not be integers, and then the posterior is also of this form. For the purposes of this paper, therefore, we can just use a flat prior, w₀(ρ) = constant, for which the target distribution is the likelihood in (41), possibly with noninteger values for the ν_k s, properly normalized to unit total probability when integrated over the ρ space. With the volume element (dρ) of (3) and (4), the probability element of the target distribution is then

(\mathrm{d}\rho)\,f(\rho)\propto(\mathrm{d}\rho)\,\prod_{k=1}^{K}\mathrm{tr}(\Pi_{k}\rho)^{\nu_{k}} (45)

for the physical values of the integration variables — those for which ρ ≥ 0 in (1), the convex set of quantum states — and f(ρ) = 0 for all unphysical ρs. Accordingly, the normalization integral

\int(\mathrm{d}\rho)\,f(\rho)=\int_{(\rho\geq 0)}(\mathrm{d}\rho)\,f(\rho)=1 (46)

has no contribution from unphysical ρs.
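For the accept-reject step of section 4, f(ρ) is only ever needed up to a constant factor, so it is convenient to evaluate the logarithm of (45). A sketch, with povm a list of the Π_k and counts the ν_k (the names and interface are our own):

    import numpy as np

    def log_target(rho, povm, counts):
        # log f(rho) up to an additive constant: sum_k nu_k log tr(Pi_k rho);
        # unphysical rho's (negative eigenvalues) get log f(rho) = -inf.
        if np.linalg.eigvalsh(rho)[0] < 0:
            return -np.inf
        counts = np.asarray(counts, float)
        p = np.array([np.trace(P @ rho).real for P in povm])
        mask = counts > 0
        if np.any(p[mask] <= 0):
            return -np.inf
        return float(counts[mask] @ np.log(p[mask]))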

The particular form of the target distribution, proportional to the likelihood in (41), invites us to regard it as a function on the probability simplex: a polynomial on the physical subset of permissible probabilities, vanishing outside. Indeed, the sampling algorithms in [16, 18] are random walks in the probability simplex, and these algorithms have the practical limitations mentioned above: either one needs to check whether a candidate entry for the sample is permissible, which has high CPU-time costs; or one needs a complicated parameterization of the probability space and then faces issues with the large Jacobian matrix, its determinant, and its derivatives. Therefore, alternative algorithms will be useful, such as schemes that directly generate samples from the quantum state space rather than the probability space associated with the Π_k s. It is the aim of this paper to contribute such an alternative algorithm.

We emphasize a crucial difference between the target distribution in (45) and the proposal distribution in (40): While both have analytical expressions, which will be important in section 4, the proposal distribution is defined by its sampling algorithm, whereas we do not know an analogous sampling algorithm for the target distribution. Therefore, we cannot sample from the target distribution in a direct way and must resort to processing samples drawn from the proposal distribution.

3.2 Peak location

The target distribution f(ρ) ∝ L(D|ρ) is peaked at ρ_ML, given by

\max_{\rho}\{L(D|\rho)\}=L(D|\rho_{\mathrm{ML}})\,, (47)

which is to say that ρ_ML is the maximum-likelihood estimator for the data D [34, 35]. While it is possible that L(D|ρ) is maximal for a multidimensional set of ρ_MLs, there is a unique ρ_ML for each of the target distributions that we use for illustration. If the relative frequencies ν̃_k = ν_k / ∑_{k′} ν_{k′} are permissible probabilities, then ν̃_k = tr(Π_k ρ_ML) identifies ρ_ML; otherwise one needs to determine ρ_ML numerically, perhaps by the fast algorithm of [36], and one may find a rank-deficient ρ_ML on the boundary of the state space. When ρ_ML is on, or near, the boundary, it is usually more difficult to sample from the state space in accordance with the target distribution. (One way of checking whether certain probabilities p are permissible is to regard them as relative frequencies of mock data and determine ρ_ML for these data. The probabilities are physical if p_k = tr(Π_k ρ_ML) for all k, and only then.)
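We do not reproduce the fast algorithm of [36] here; as a self-contained illustration of the numerical route, the sketch below uses the standard fixed-point iteration ρ → RρR/tr(RρR) with R = ∑_k (ν_k/p_k)Π_k / ∑_k ν_k, a slower but simple alternative (a damped, "diluted" variant of this iteration is more robust):

    import numpy as np

    def ml_state(povm, counts, m, steps=5000, tol=1e-12):
        # Fixed-point R*rho*R iteration toward the maximum-likelihood state.
        counts = np.asarray(counts, float)
        rho = np.eye(m, dtype=complex) / m      # start at the completely mixed state
        for _ in range(steps):
            p = np.array([np.trace(P @ rho).real for P in povm])
            p = np.maximum(p, 1e-12)            # guard against zero probabilities
            R = sum(nu / pk * P for nu, pk, P in zip(counts, p, povm)) / counts.sum()
            new = R @ rho @ R
            new /= np.trace(new).real
            if np.linalg.norm(new - rho) < tol:
                break
            rho = new
        return rho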

Now, harking back to the final paragraph in section 2.4, we note that the option of matching a rank-deficient ρ_ML by the ρ_peak of an n = m Wishart distribution does not usually work well, because W_m^(Q)(m, Σ) is then maximal for all ρs in the range of ρ_ML. Therefore, we cannot get a good match when the rank of ρ_ML is larger than one.

When ρ_ML has full rank, the choice Δρ = ρ_ML − ρ_peak suggests itself for the shift in section 2.8. This is not a viable option, however, when ρ_ML is rank deficient, a typical situation when the ν_k s in (41) are small numbers. Rather, we choose Δρ such that g_s(ρ) peaks for a full-rank ρ that is close to ρ_ML, as this gives a better overall yield; see the example in figure 8 in section 7.

3.3 Peak shape

Let us briefly consider the problem of maximizing the target distribution f(ρ) over all ρs of the form (1) without enforcing ρ ≥ 0, which amounts to regarding f(ρ) as a function of the coordinates ϱ_l and finding the maximum on the coordinate space. While the permissible ρs in (47) are positive unit-trace m×m matrices, we now maximize over all hermitian unit-trace matrices, including those with negative eigenvalues. This maximum is reached for ρ = ρ̂. Whenever ρ̂ is in the quantum state space, ρ̂ ≥ 0, we have ρ̂ = ρ_ML; and ρ_ML sits on the boundary of the state space whenever ρ̂ is outside the state space, ρ̂ ≱ 0. (The mock probabilities p̂_k = tr(Π_k ρ̂) are always in the probability simplex; they are in the convex set of the physical probabilities only when ρ̂ = ρ_ML.)

As noted above, we do not match the peak location of g_s(ρ) with that of f(ρ) when ρ_ML ≠ ρ̂ is rank-deficient, whereas the shift Δρ = ρ̂ − ρ_peak suggests itself when ρ̂ is a quantum state. It is then worth trying to match the peak shape of the quantum Wishart distribution g(ρ) with that of the target distribution f(ρ). The expressions corresponding to (28) and (29), now for ρs near ρ̂ = ρ_ML in the target distribution, are

\log\frac{f(\widehat{\rho}+\epsilon)}{f(\widehat{\rho})}\cong-\frac{1}{2}\sum_{k=1}^{K}\frac{\nu_{k}}{\mathrm{tr}(\Pi_{k}\widehat{\rho})^{2}}\,\mathrm{tr}(\Pi_{k}\epsilon)^{2}=-\frac{1}{2}\sum_{l,l'}\varepsilon_{l}F_{ll'}\varepsilon_{l'} (48)

with

F_{ll'}=\sum_{k}\mathrm{tr}(B_{l}\Pi_{k})\,\frac{\nu_{k}}{\mathrm{tr}(\Pi_{k}\widehat{\rho})^{2}}\,\mathrm{tr}(\Pi_{k}B_{l'})\,. (49)

When choosing the proposal distribution, one opts for ρ_peak and n such that the matrix G resembles the matrix F. It is usually not possible to get a perfect match because G derives from a Wishart distribution and is, therefore, subject to the symmetries discussed after (16), whereas F is not constrained in this way.

In practice, we resort to looking at the one-dimensional "longitudinal" slice of g(ρ) along the line from the completely mixed state m⁻¹1_m to ρ_peak, that is ϵ ∝ ρ_peak − m⁻¹1_m, and choose the parameters such that this single-parameter distribution is a bit wider than the corresponding longitudinal slice of the target distribution f(ρ), obtained for ϵ ∝ ρ̂ − m⁻¹1_m (this illustrates the rule that one should "sample from a density g with thicker tails than f" [27]); in the example of section 2.7, this slice is parameterized by the coordinate increment ε. The matter will be illustrated by more examples in section 7; see figure 8.

Regarding the shift Δρ when ρ̂ ≠ ρ_ML, we note that Δρ = r(ρ_ML − ρ_peak) with r ≲ 1 is a good first try for the trial-and-error search in the last paragraph of section 2.8. In this situation, it is more important to match the distributions well on and near the boundary of the state space than at ρ_ML, and we adjust n, Σ, κ, and Δρ in (39) for a higher overall yield.

4 Rejection sampling

We convert the proposal sample, drawn from the distribution gs,κ(ρ)g_{\mathrm{s},\kappa}(\rho), into a sample drawn from the target distribution f(ρ)f(\rho) by rejection sampling, a procedure introduced by John von Neumann in 1951 [37]. It consists of a simple accept-reject step: A ρ\rho from the proposal sample is entered into the target sample with a probability proportional to the ratio f(ρ)/gs,κ(ρ)f(\rho)/g_{\mathrm{s},\kappa}(\rho). Since we want to have the largest possible yield, while the acceptance probability Pacc(ρ)P_{\mathrm{acc}}(\rho) cannot exceed unity, we choose

Pacc(ρ)=1Cf(ρ)gs,κ(ρ)withC=maxρ{f(ρ)gs,κ(ρ)},P_{\mathrm{acc}}(\rho)=\frac{1}{C}\frac{f(\rho)}{g_{\mathrm{s},\kappa}(\rho)}\quad\mbox{with}\quad C=\max_{\rho}{\left\{\frac{f(\rho)}{g_{\mathrm{s},\kappa}(\rho)}\right\}}\,, (50)

where the maximum is evaluated over all ρ \rho\rule{1.00006pt}{0.0pt}s in the proposal sample. The unphysical ρ \rho\rule{1.00006pt}{0.0pt}s in the proposal sample, obtained when the shift in the second step of the state lottery (39) puts ρ\rho beyond the boundary of the state space, are always rejected because f(ρ)=0{f(\rho)=0} for every unphysical ρ\rho.

It is crucial here that we have the analytical expressions for f(ρ)f(\rho) and gs,κ(ρ)g_{\mathrm{s},\kappa}(\rho) in (45) and (40), and it does not matter that we often do not know the normalization factor needed in (46) or missing in (40);151515Similarly, in the context of (50) we can put aside factors that the two terms in (40) have in common, such as the proportionality factor between (dρ)({\mathrm{d}}\rho) and [dρ][{\mathrm{d}}\rho] in (4). it is permissible to replace CC by an upper bound on the maximal ratio, at the price of a lower overall yield. Note that, unless we have an independent way of computing an upper bound on CC,161616Such cases require symmetries in the target distribution that match those of the proposal distribution. While it may be permissible to choose a Bayesian prior accordingly, the data-driven posterior will lack the symmetries. the rejection sampling cannot be done while we are entering ρ \rho\rule{1.00006pt}{0.0pt}s into the proposal sample; we must first compose the whole proposal sample, then determine the value of CC, and finally perform the rejection sampling.
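
For concreteness, here is a minimal sketch of this accept-reject step (ours, in Python with NumPy, not code from the paper), assuming the unnormalized log-densities of f and gs,κ have already been evaluated on every entry of the proposal sample; unphysical entries are flagged by a log-density of negative infinity and are thus never accepted:

```python
import numpy as np

def rejection_sample(rho_proposal, log_f, log_g):
    """Accept-reject step of (50) on a complete proposal sample.

    rho_proposal : proposal states, shape (N, m, m)
    log_f, log_g : logs of the unnormalized target and proposal
                   densities at each entry; log_f = -np.inf flags
                   an unphysical state, which is always rejected.
    """
    rng = np.random.default_rng()
    log_ratio = log_f - log_g
    log_C = np.max(log_ratio)   # C of (50); needs the whole sample first
    accept = rng.random(len(log_ratio)) < np.exp(log_ratio - log_C)
    return rho_proposal[accept], accept.mean()
```

Working in logarithms avoids underflow of the likelihood products, and determining the maximum only after the whole proposal sample is in hand reflects the ordering of steps just described.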

As remarked in section 2.8, the proper choice of the parameters that specify the proposal distribution gs,κ(ρ)g_{\mathrm{s},\kappa}(\rho) is crucial for a good overall efficiency. Here is one more aspect that one should keep in mind: If κ\kappa is too small, the ratio f(ρ)/gs,κ(ρ)f(\rho)/g_{\mathrm{s},\kappa}(\rho) will be largest where both distributions are small — in the tails of the target distribution and where gs(ρ)g_{\mathrm{s}}(\rho) vanishes, in particular when ρ\rho has no preimage ρ\rho^{\prime} in (37) or when the preimage is rank deficient.171717The occurrence of rank-deficient preimages is an issue even when Δρ=0{\Delta\rho=0} and there is no shift in (39), but κ>0{\kappa>0} is still required. When this happens, the rejection sampling has a low yield. Therefore, we choose κ\kappa large enough to avoid this situation. This is an element in the overall-yield optimization mentioned in the final paragraph of section 2.8.

5 Sample verification

We exploit some tools, which were introduced in [38] for the quantification of the errors in quantum state estimation, for a verification of the target sample. In this section, f(ρ)f(\rho) refers to the ρ\rho dependence of the target distribution in (45) without assuming the proper normalization of (46), and ρML\rho^{\ }_{\mathrm{ML}} is the quantum state for which f(ρ)f(\rho) is maximal,

f(ρML)=maxρ{f(ρ)}.f(\rho^{\ }_{\mathrm{ML}})=\max_{\rho}\{f(\rho)\}\,. (51)

Then, the sets of quantum states defined by

λ={ρ|f(ρ)λf(ρML)}with0λ1\mathcal{R}_{\lambda}={\left\{\rho|f(\rho)\geq\lambda f(\rho^{\ }_{\mathrm{ML}})\right\}}\quad\mbox{with}\quad 0\leq\lambda\leq 1 (52)

are nested regions in the state space with λλ{\mathcal{R}_{\lambda}\subset\mathcal{R}_{\lambda^{\prime}}} if λ>λ{\lambda>\lambda^{\prime}}; λ=1\mathcal{R}_{\lambda=1} contains only ρML\rho^{\ }_{\mathrm{ML}} and λ=0\mathcal{R}_{\lambda=0} is the whole state space. It is easy to check whether a particular ρ\rho is in λ\mathcal{R}_{\lambda} or not.

For lack of a better alternative, we borrow the terminology from [38] and call

sλ=λ(dρ)0(dρ)andcλ=λ(dρ)f(ρ)0(dρ)f(ρ)s_{\lambda}^{\ }=\frac{\int_{\mathcal{R}_{\lambda}}({\mathrm{d}}\rho)}{\int_{\mathcal{R}_{0}}({\mathrm{d}}\rho)}\quad\mbox{and}\quad c_{\lambda}^{\ }=\frac{\int_{\mathcal{R}_{\lambda}}({\mathrm{d}}\rho)\,f(\rho)}{\int_{\mathcal{R}_{0}}({\mathrm{d}}\rho)\,f(\rho)} (53)

the size and the credibility of λ\mathcal{R}_{\lambda}, respectively, and we evaluate both integrals by a Monte Carlo integration.181818The term was coined by Stanisław Ulam [39]. The method dates back to 1949 [40]; for a comprehensive textbook exposition, see [41], for example. By counting how many ρ \rho\rule{1.00006pt}{0.0pt}s from a uniform sample are in λ\mathcal{R}_{\lambda}, we get a value for sλs_{\lambda}^{\ } as the fractional count; likewise, by counting how many ρ \rho\rule{1.00006pt}{0.0pt}s from the target sample are in λ\mathcal{R}_{\lambda}, we get a value for cλc_{\lambda}^{\ }.
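
As an illustration of these fractional counts, a short sketch (ours; the array names are hypothetical) that estimates both quantities from precomputed values of log f on a uniform sample and on a target sample:

```python
import numpy as np

def size_and_credibility(log_f_uniform, log_f_target, log_f_ml, lam):
    """Fractional-count estimates of s_lambda and c_lambda in (53).

    log_f_uniform : log f(rho) on the entries of a uniform sample
    log_f_target  : log f(rho) on the entries of the target sample
    log_f_ml      : log f(rho_ML), the maximum of log f
    """
    threshold = (np.log(lam) + log_f_ml) if lam > 0 else -np.inf
    s_lam = np.mean(log_f_uniform >= threshold)   # size of R_lambda
    c_lam = np.mean(log_f_target >= threshold)    # credibility of R_lambda
    return s_lam, c_lam
```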

The accuracy of the sλs_{\lambda}^{\ } and cλc_{\lambda}^{\ } values is determined by the sampling error and that is small if the sample is large (10610^{6} entries, say) and of good quality;191919Sampling errors are discussed in section VI A 2 in [14], for example. see section 6. Since the uniform sample has no quality issues, it provides a reference for the target sample through the identity

cλ=λsλ+λ1dλsλ01dλsλ,c_{\lambda}^{\ }=\frac{\lambda s_{\lambda}^{\ }+\int_{\lambda}^{1}{\mathrm{d}}\lambda^{\prime}\,s_{\lambda^{\prime}}^{\ }}{\int_{0}^{1}{\mathrm{d}}\lambda^{\prime}\,s_{\lambda^{\prime}}^{\ }}\,, (54)

which links the credibility as a function of λ\lambda to the size. Note that the sλs_{\lambda} values for the target distribution f(ρ)f(\rho) as well as the cλc_{\lambda} values from (54) can be computed before any sampling from the target distribution is performed.
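
The identity (54) is easily evaluated numerically from tabulated sλ values; a sketch using the trapezoid rule (ours, with a hypothetical grid of λ values):

```python
import numpy as np

def credibility_from_size(lam_grid, s_vals):
    """c_lambda of (54) from s_lambda tabulated on an increasing
    grid of lambda values that covers [0, 1]."""
    seg = 0.5 * (s_vals[1:] + s_vals[:-1]) * np.diff(lam_grid)
    cum = np.concatenate(([0.0], np.cumsum(seg)))  # integral from 0 to lambda
    tail = cum[-1] - cum                           # integral from lambda to 1
    return (lam_grid * s_vals + tail) / cum[-1]    # cum[-1]: full integral
```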

Good agreement between the reliable credibility values provided by (54) and those obtained by the Monte Carlo integration of (53) confirms that the target sample is of good quality. More specifically, the quality assessment proceeds from regarding cλc_{\lambda}^{\ } from (54) as exact202020When estimating cλc^{\ }_{\lambda} from a large uniform sample, we are accepting a small negative bias that is, however, of no consequence here. The finite difference between the estimated and the actual values of cλc^{\ }_{\lambda} do matter, however. See the Appendix for details. and the values obtained from the target sample as estimates,

 cλ ^=1Ntgtk=1Ntgtχ(f(ρk)>λf(ρML))withχ(A)={1if A true0if A false},\widehat{\rule{0.70007pt}{0.0pt}c_{\lambda}\rule{0.59998pt}{0.0pt}}=\frac{1}{N_{\mathrm{tgt}}}\sum_{k=1}^{N_{\mathrm{tgt}}}{\raisebox{1.72218pt}{\scalebox{1.2}{$\chi$}}}\mathopen{\scalebox{1.2}{$($}}f(\rho_{k})>\lambda f(\rho^{\ }_{\mathrm{ML}})\mathclose{\scalebox{1.2}{$)$}}\quad\textrm{with}\ {\raisebox{1.72218pt}{\scalebox{1.2}{$\chi$}}}(A)={\left\{\begin{array}[]{ll}1&\mbox{if $A$ true}\\ 0&\mbox{if $A$ false}\end{array}\right\}}\,, (55)

where the sum is over all ρk\rho_{k}s in the target sample, which has NtgtN_{\mathrm{tgt}} entries. This is an unbiased estimator, 𝔼( cλ ^)=cλ\mathbb{E}{\left(\widehat{\rule{0.70007pt}{0.0pt}c_{\lambda}\rule{0.59998pt}{0.0pt}}\rule{0.0pt}{0.0pt}\right)}=c_{\lambda}^{\ }, with the variance 𝔼( cλ ^2)cλ2=cλ(1cλ)/Ntgt\mathbb{E}\mathopen{\scalebox{1.2}{$($}}{\widehat{\rule{0.70007pt}{0.0pt}c_{\lambda}\rule{0.59998pt}{0.0pt}}^{2}}\mathclose{\scalebox{1.2}{$)$}}-c^{2}_{\lambda}=c_{\lambda}^{\ }(1-c^{\ }_{\lambda})/N_{\mathrm{tgt}}. For the comparison of  cλ ^\widehat{\rule{0.70007pt}{0.0pt}c_{\lambda}\rule{0.59998pt}{0.0pt}} with cλc_{\lambda}, then, we use the mean squared error

Q=01dλ( cλ ^cλ)2,Q=\int_{0}^{1}{\mathrm{d}}\lambda\,\mathopen{\scalebox{1.2}{$($}}\widehat{\rule{0.70007pt}{0.0pt}c_{\lambda}\rule{0.59998pt}{0.0pt}}-c_{\lambda}\mathclose{\scalebox{1.2}{$)$}}^{2}\,, (56)

which has the expected value

𝔼(Q)=01dλ𝔼(( cλ ^cλ)2)=1Ntgt01dλcλ(1cλ)\mathbb{E}{\left(Q\rule{0.0pt}{0.0pt}\right)}=\int_{0}^{1}{\mathrm{d}}\lambda\;\mathbb{E}\mathopen{\scalebox{1.4}{$($}}{\mathopen{\scalebox{1.2}{$($}}\widehat{\rule{0.70007pt}{0.0pt}c_{\lambda}\rule{0.59998pt}{0.0pt}}-c_{\lambda}\mathclose{\scalebox{1.2}{$)$}}^{2}}\mathclose{\scalebox{1.4}{$)$}}=\frac{1}{N_{\mathrm{tgt}}}\int_{0}^{1}{\mathrm{d}}\lambda\,c_{\lambda}(1-c_{\lambda}) (57)

and the variance

 𝔼(Q2)𝔼(Q)2=2Ntgt201dλ01dλc<2(1c>)2+1Ntgt301dλ01dλc<(1c>)(14c<2c>+6c<c>),\begin{array}[b]{rcl}\rule{10.00002pt}{0.0pt}\mathbb{E}\mathopen{\scalebox{1.2}{$($}}{Q^{2}}\mathclose{\scalebox{1.2}{$)$}}-\mathbb{E}{\left(Q\rule{0.0pt}{0.0pt}\right)}^{2}&=&\displaystyle\frac{2}{N_{\mathrm{tgt}}^{2}}\int_{0}^{1}{\mathrm{d}}\lambda\int_{0}^{1}{\mathrm{d}}\lambda^{\prime}\,c_{<}^{2}(1-c_{>}^{\ })^{2}\\ &&\displaystyle\mbox{}+\frac{1}{N_{\mathrm{tgt}}^{3}}\int_{0}^{1}{\mathrm{d}}\lambda\int_{0}^{1}{\mathrm{d}}\lambda^{\prime}\,c_{<}^{\ }(1-c_{>}^{\ })(1-4c_{<}^{\ }-2c_{>}^{\ }+6c_{<}^{\ }c_{>}^{\ })\,,\end{array} (58)

where c<=min{cλ,cλ}c_{<}^{\ }=\min\{c_{\lambda}^{\ },c_{\lambda^{\prime}}^{\ }\} and c>=max{cλ,cλ}c_{>}^{\ }=\max\{c_{\lambda}^{\ },c_{\lambda^{\prime}}^{\ }\}. We consider the target sample to be of good quality if its QQ value differs from 𝔼(Q)\mathbb{E}{\left(Q\rule{0.0pt}{0.0pt}\right)} by less than two standard deviations, and of very good quality if the difference is less than one standard deviation.212121More sophisticated tests are possible, among them the Kolmogorov–Smirnov test [42, 43]; see section 15.4 in [27], for instance. We are not exploring such other possibilities here. In practice, we calculate the QQ values with the approximate cλc_{\lambda} values obtained via (54) from the sλs_{\lambda} values that, in turn, we estimate from a large uniform sample; the difference between the approximate and the actual cλc_{\lambda} adds an additional term to 𝔼(Q)\mathbb{E}{\left(Q\rule{0.0pt}{0.0pt}\right)} of (57); see (79) in the appendix and figure 5 in section 7.
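
A sketch of this quality check (ours; the variance of (58) is omitted for brevity, and the regarded-as-exact cλ values are taken as given):

```python
import numpy as np

def trapz(y, x):                      # simple trapezoid rule
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def quality_check(lam_grid, c_hat, c_ref, n_tgt):
    """Q of (56) and its expected value (57) on a lambda grid;
    (79) adds a bias term to EQ when c_ref itself is estimated
    from a finite uniform sample."""
    Q = trapz((c_hat - c_ref) ** 2, lam_grid)
    EQ = trapz(c_ref * (1.0 - c_ref), lam_grid) / n_tgt
    return Q, EQ   # good quality: Q within two standard deviations of EQ
```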

6 The curse of dimensionality

As discussed in the preceding section, the fractional counts that give us approximate values for cλc^{\ }_{\lambda} are unbiased estimators for the actual values with standard deviations of [cλ(1cλ)/Ntgt]12[c^{\ }_{\lambda}(1-c^{\ }_{\lambda})/N_{\mathrm{tgt}}]^{\frac{1}{2}}. Likewise, the approximate values for sλs^{\ }_{\lambda} have standard deviations of [sλ(1sλ)/Nufm]12[s^{\ }_{\lambda}(1-s^{\ }_{\lambda})/N_{\mathrm{ufm}}]^{\frac{1}{2}}, where NufmN_{\mathrm{ufm}} is the total number of ρ \rho\rule{1.00006pt}{0.0pt}s in the uniform sample; we take for granted that NufmN_{\mathrm{ufm}} is large enough that the sampling error in sλs^{\ }_{\lambda} and the propagated error in cλc_{\lambda}^{\ } of (54) can be ignored. We now emphasize that the accuracy of these estimates is determined by the sample size and is independent of the dimension of the quantum state space. In this regard, then, these estimates do not fall prey to the “curse of dimensionality” that Richard Bellman observed [44]. But we cannot really escape from the curse, which is known to affect all major sampling methods,222222All samples for higher-dimensional spaces that we are aware of, the samples in the repository [45] among them, are specified by the sampling algorithm, not by a target distribution. including the rejection sampling of section 4, and also occurs in other domains, such as optimization, function approximation, numerical integration, and machine learning [46].

Let us first consider storage requirements. With 8 bytes of memory for one double precision number, we need 24 bytes for one single-qubit state, 120 bytes for one two-qubit state, 504 bytes for one three-qubit state, 2040 bytes = 2.04 kB for one four-qubit state, and so forth, picking up a factor 4\gtrsim 4 for each additional qubit: growth proportional to m2m^{2}, the dimension of the state space. This linear-in-m2m^{2} increase in memory per quantum state is, however, paired with an exponential decrease of the overall acceptance probability during the rejection sampling. Suppose we have a pretty good proposal distribution that is a 90% match in every dimension; then, roughly speaking, the overall match is 0.93=0.729{0.9^{3}=0.729} for a single qubit, 0.915=0.206{0.9^{15}=0.206} for a qubit pair, 0.963=1.31×103{0.9^{63}=1.31\times 10^{-3}} for three-qubit states, and a dismal 0.9255=2.15×1012{0.9^{255}=2.15\times 10^{-12}} for four-qubit states. Accordingly, a target sample of 10510^{5} states requires a proposal sample stored in 3.3MB3.3\,\mathrm{MB}, 58MB58\,\mathrm{MB}, 38GB38\,\mathrm{GB}, and 95EB95\,\mathrm{EB}, respectively. Even if cleverly chosen proposal distributions can gain a few powers of 1010 on these rough estimates, it is clear that the curse of dimensionality prevents us from generating useful target samples for many-qubit systems by the rejection sampling of section 4.
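
These estimates follow from simple arithmetic; a few lines that reproduce them:

```python
# Memory estimate for a target sample of 10**5 states, assuming a
# proposal distribution that is a 90% match in each of the m**2 - 1
# real dimensions of the state space.
for n_qb in range(1, 5):
    m = 2 ** n_qb
    dim = m ** 2 - 1                    # real parameters per state
    bytes_per_state = 8 * dim           # double precision
    p_acc = 0.9 ** dim                  # rough overall acceptance
    print(n_qb, bytes_per_state, p_acc, 1e5 / p_acc * bytes_per_state)
# prints about 3.3e6, 5.8e7, 3.8e10, and 9.5e19 bytes for 1 to 4 qubits
```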

Similarly, the CPU-cost increases quite dramatically with the dimension. There is the positivity check that ensures f(ρ)=0{f(\rho)=0} if ρ0{\rho\not\geq 0}, which has a CPU-cost proportional to m3m^{3} for an m×mm\times m matrix, and finding the value of CC in (50) has a CPU-cost proportional to the size of the proposal sample if we generously assume that the evaluation of gs,κ(ρ)g_{\mathrm{s},\kappa}(\rho) and f(ρ)f(\rho) is CPU-cheap. Clearly, the CPU-cost is forbiddingly large for many-qubit systems with m=2nqb{m=2^{n_{\mathrm{qb}}}}. We can, however, avoid the need for checking that ρ0{\rho\geq 0} by choosing Δρ=0{\Delta\rho=0} in (40), while accepting the price of a smaller yield of the rejection sampling or more steps of the evolution algorithm.

The examples in section 7 demonstrate that, when using the modified Wishart distribution of (40) as the proposal distribution, we manage to sample one-, two-, and three-qubit states efficiently. For four-qubit states, with our limited computational power, we can only sample a certain class of simple target distributions by the rejection sampling. Thus, although the sampling algorithm introduced here is superior to other methods (in particular, we obtain uncorrelated samples), better methods for sampling from higher-dimensional state spaces are still in demand. In a separate paper we combine Wishart distributions with sequentially constrained Monte Carlo sampling [47, 48] and so manage to sample higher-dimensional quantum systems rather efficiently [26].

7 Results

First, through examples of sampling qubit states, we illustrate in detail how to make use of the Wishart distribution for sampling quantum states with a given target distribution. Following that, we present examples of sampling two-qubit states. Although algorithms for reliably sampling such low-dimensional quantum states exist (for example, the HMC sampling method works well for systems with dimension m<6{m<6} [16, 18]), our method has the advantage of producing uncorrelated samples. Next, we apply our method to the sampling of three- and four-qubit systems, for which other sampling algorithms are notoriously inefficient; the method introduced here, while applicable, suffers from a high rejection rate.

7.1 Qubits

Assume that qubits are measured by a tetrahedral POM with the probabilities [49]

p1=14(1+xyz3),p3=14(1+zxy3),p2=14(1+yzx3),p4=14(1+x+y+z3),\begin{array}[b]{rclcrcl}p_{1}&=&\displaystyle\frac{1}{4}\Bigl{(}1+\frac{x-y-z}{\sqrt{3}}\Bigr{)}\,,&&p_{3}&=&\displaystyle\frac{1}{4}\Bigl{(}1+\frac{z-x-y}{\sqrt{3}}\Bigr{)}\,,\\[8.61108pt] p_{2}&=&\displaystyle\frac{1}{4}\Bigl{(}1+\frac{y-z-x}{\sqrt{3}}\Bigr{)}\,,&&p_{4}&=&\displaystyle\frac{1}{4}\Bigl{(}1+\frac{x+y+z}{\sqrt{3}}\Bigr{)}\,,\end{array} (59)

where ρ\rho is parameterized as in (30). Accordingly, we have K=4{K=4} in (41), (43), and (45). For the observed counts of detection events 𝝂=(ν1,ν2,ν3,ν4)\boldsymbol{\nu}=(\nu_{1},\nu_{2},\nu_{3},\nu_{4}), the probability element of the target distribution is

(dρ)f(ρ)dxdydzη(1x2y2z2)p1ν1p2ν2p3ν3p4ν4({\mathrm{d}}\rho)\,f(\rho)\propto{\mathrm{d}}x\,{\mathrm{d}}y\,{\mathrm{d}}z\,\eta\bigl{(}1-x^{2}-y^{2}-z^{2}\bigr{)}\,p_{1}^{\nu_{1}}p_{2}^{\nu_{2}}p_{3}^{\nu_{3}}p_{4}^{\nu_{4}} (60)

where the step function selects the permissible x,y,zx,y,z values, those of the unit Bloch ball. As noted above, we do not need to calculate the normalization factor, as it plays no role in (50).

Because the POM is symmetric — technically speaking, it is a 2-design [50] — in the simplest, if untypical, scenario where each outcome is observed equally often, the target distribution is peaked at the completely mixed state ρpeak=12𝟏2{\rho_{\mathrm{peak}}=\frac{1}{2}\mathbf{1}_{2}}; f(ρ)f(\rho) is not isotropic, however, as the tetrahedron has privileged directions in the Bloch ball. We use the proposal distribution of (40) with m=2{m=2}, Σ𝟏2{\Sigma\doteq\mathbf{1}_{2}}, Δρ=0{\Delta\rho=0}, and κ>0{\kappa>0}, that is: an isotropic Wishart distribution with an admixture of the uniform distribution; the uniform-distribution component ensures g(ρ)>f(ρ){g(\rho)>f(\rho)} in the tails. The yield — the acceptance rate of the rejection sampling — depends on the parameter nn that controls the width of the peak. For example, a rather high acceptance rate of Pacc=60.8%P_{\mathrm{acc}}=60.8\% is obtained with κ=0.1{\kappa=0.1} and n=14{n=14} for 𝝂={25,25,25,25}{\boldsymbol{\nu}=\{25,25,25,25\}}; by contrast, we have Pacc=33%P_{\mathrm{acc}}=33\% for n=18{n=18}, which is thus a worse choice for nn here. The top and middle rows in figure 4 refer to these nn values.
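
To show the whole pipeline at work for this symmetric example, here is a self-contained sketch (ours, not the authors' code). It assumes the known form of the induced density of the quantum Wishart distribution for Σ = 𝟏₂, proportional to det ρ^{n−2} = ((1−r²)/4)^{n−2} on the Bloch ball, as the Wishart component of gs,κ, with the normalization obtained numerically:

```python
import numpy as np

rng = np.random.default_rng()
nu = np.array([25, 25, 25, 25])     # balanced tetrahedron data
n, kappa, N = 14, 0.1, 10 ** 5      # Wishart columns, admixture, sample size
a = np.array([[1, -1, -1], [-1, 1, -1],
              [-1, -1, 1], [1, 1, 1]]) / np.sqrt(3)   # POM directions of (59)

def uniform_ball(size):             # uniform Bloch vectors, |r| <= 1
    v = rng.normal(size=(size, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return v * rng.random(size)[:, None] ** (1 / 3)

def wishart_bloch(size):            # rho = AA^dag/tr(AA^dag) as Bloch vectors
    A = rng.normal(size=(size, 2, n)) + 1j * rng.normal(size=(size, 2, n))
    W = A @ A.conj().transpose(0, 2, 1)
    rho = W / np.trace(W, axis1=1, axis2=2).real[:, None, None]
    return np.stack([2 * rho[:, 0, 1].real, -2 * rho[:, 0, 1].imag,
                     (rho[:, 0, 0] - rho[:, 1, 1]).real], axis=1)

# proposal sample from g_{s,kappa}: uniform with probability kappa
# (generating both components and picking is wasteful but keeps it short)
pick = rng.random(N) < kappa
r = np.where(pick[:, None], uniform_ball(N), wishart_bloch(N))

# densities on the Bloch ball, both components normalized
s = np.linspace(0.0, 1.0, 2001)
w = 4 * np.pi * s ** 2 * (1 - s ** 2) ** (n - 2)
norm_w = np.sum(0.5 * (w[1:] + w[:-1]) * np.diff(s))
r2 = np.sum(r ** 2, axis=1)
g = kappa * 3 / (4 * np.pi) + (1 - kappa) * (1 - r2) ** (n - 2) / norm_w
log_f = np.sum(nu * np.log(0.25 * (1 + r @ a.T)), axis=1)   # target, (60)

log_ratio = log_f - np.log(g)
accept = rng.random(N) < np.exp(log_ratio - np.max(log_ratio))
print("acceptance rate:", accept.mean())  # should come out close to 60.8%
```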


Figure 4: Line densities of the qubit target distribution f(ρ)f(\rho) in (60) (black) and the proposal distribution gs,κ(ρ)g_{\mathrm{s},\kappa}(\rho) in (40) with m=2{m=2} (blue) for ρ(t)=12(𝟏2+t𝒆𝝈){\rho(t)=\frac{1}{2}(\mathbf{1}_{2}+t\vecfont{e}\cdot\boldsymbol{\sigma})} with a unit vector 𝒆\vecfont{e}. The target distributions (solid black) are rescaled by a factor of maxt{f(ρ(t))/g(ρ(t))}\max_{t}\{f\bigl{(}\rho(t)\bigr{)}/g\bigl{(}\rho(t)\bigr{)}\} for each line, with the arrows pointing to where the blue and solid-black curves coincide; the dashed black curve in the middle-right plot shows the target distribution rescaled for matched peaks. In each plot, the value of p𝒆p_{\vecfont{e}} is the acceptance rate of (61), the ratio of the areas under the solid black and blue curves. Top row and middle row: The target distribution is for 𝝂={25,25,25,25}{\boldsymbol{\nu}=\{25,25,25,25\}}, and the isotropic proposal distributions have Σ=𝟏2{\Sigma=\mathbf{1}_{2}} and Δρ=0{\Delta\rho=0} as well as κ=0.1\kappa=0.1 and n=14n=14 or 1818, respectively. The unit vectors 𝒆\vecfont{e} are chosen at random. The overall acceptance rates are Pacc=60.8%P_{\mathrm{acc}}=60.8\% (top row) and Pacc=33%P_{\mathrm{acc}}=33\% (middle row). Bottom row: The target distribution is for 𝝂={10,20,25,45}\boldsymbol{\nu}=\{10,20,25,45\}, the proposal distribution has Σ=𝟏2\Sigma=\mathbf{1}_{2}, κ=0.2\kappa=0.2, n=13n=13, and Δρ0\Delta\rho\neq 0 such that the peaks of the target and proposal distributions are at the same location; the overall acceptance rate is Pacc=28.6%P_{\mathrm{acc}}=28.6\%. In the left plot, the unit vector 𝒆\vecfont{e} points from the center of the Bloch ball to the peak location, whereas we have a randomly chosen 𝒆\vecfont{e} in the right plot; note the smaller range of values in the right plot.

In figure 4 we exploit our knowledge of the exact forms of the target distribution f(ρ)f(\rho) and the proposal distribution g(ρ)g(\rho) for a comparison of the distributions along diameters of the Bloch ball. More specifically, the qubit states considered are ρ(t)=12(𝟏2+t𝒆𝝈)\rho(t)=\frac{1}{2}(\mathbf{1}_{2}+t\vecfont{e}\cdot\boldsymbol{\sigma}) with a unit vector 𝒆\vecfont{e} and 1t1{-1\leq t\leq 1}. Accordingly, the acceptance rate for this one-parameter family of states is

p𝒆=maxt{f(ρ(t))g(ρ(t))}111dtf(ρ(t))11dtg(ρ(t)).p_{\vecfont{e}}=\max_{t}\Biggl{\{}\frac{f\mathopen{\scalebox{1.1}{$($}}\rho(t)\mathclose{\scalebox{1.1}{$)$}}}{g\mathopen{\scalebox{1.1}{$($}}\rho(t)\mathclose{\scalebox{1.1}{$)$}}}\Biggr{\}}^{-1}\frac{\mathop{\scalebox{1.3}{$\int$}}_{\!\!\!-1}^{1}{\mathrm{d}}t\,f\mathopen{\scalebox{1.1}{$($}}\rho(t)\mathclose{\scalebox{1.1}{$)$}}}{\mathop{\scalebox{1.3}{$\int$}}_{\!\!\!-1}^{1}{\mathrm{d}}t\,g\mathopen{\scalebox{1.1}{$($}}\rho(t)\mathclose{\scalebox{1.1}{$)$}}}\,. (61)

We note in passing that suitable analogous expressions work also for higher-dimensional systems.
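
For a given diameter, (61) amounts to two one-dimensional quadratures; a sketch (ours, for the same symmetric data and isotropic proposal as above, with an arbitrarily chosen unit vector e):

```python
import numpy as np

n, kappa = 14, 0.1
nu = np.array([25, 25, 25, 25])
a = np.array([[1, -1, -1], [-1, 1, -1],
              [-1, -1, 1], [1, 1, 1]]) / np.sqrt(3)
e = np.array([0.36, 0.48, 0.80])            # some unit vector

s = np.linspace(0.0, 1.0, 2001)             # normalize the Wishart component
w = 4 * np.pi * s ** 2 * (1 - s ** 2) ** (n - 2)
norm_w = np.sum(0.5 * (w[1:] + w[:-1]) * np.diff(s))

t = np.linspace(-0.999, 0.999, 4001)        # states rho(t) along the diameter
f = np.prod((0.25 * (1 + np.outer(t, e @ a.T))) ** nu, axis=1)
g = kappa * 3 / (4 * np.pi) + (1 - kappa) * (1 - t ** 2) ** (n - 2) / norm_w
I_f = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t))
I_g = np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(t))
p_e = I_f / I_g / np.max(f / g)             # the acceptance rate of (61)
print(p_e)
```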

As the top and middle rows in figure 4 confirm, we have a better overall yield for n=14n=14 than for n=18n=18; for n=18n=18, where the width of the Wishart distribution well matches that of the target distribution, the almost perfect yields for some directions 𝒆\vecfont{e}, as in the middle-left example, do not compensate for the low yield for other diameters, as in the middle-right example. The middle-right plot also indicates why n=18n=18 is a poor choice, as the maximum of f(ρ(t))/g(ρ(t))f\mathopen{\scalebox{1.1}{$($}}\rho(t)\mathclose{\scalebox{1.1}{$)$}}/g\mathopen{\scalebox{1.1}{$($}}\rho(t)\mathclose{\scalebox{1.1}{$)$}} occurs in the tails of the distributions, not near the peak of f(ρ(t))f\mathopen{\scalebox{1.1}{$($}}\rho(t)\mathclose{\scalebox{1.1}{$)$}}; recall footnote 14 in this context.

In the typical experimental scenario, the counts of measurement clicks are not balanced among the POM outcomes, so the target distribution does not peak at the center of the Bloch ball, nor is it approximately isotropic. We use 𝝂={10,20,25,45}{\boldsymbol{\nu}=\{10,20,25,45\}} to illustrate this situation in the bottom row of figure 4; here the peak of f(ρ)f(\rho) is at distance 0.88320.8832 from the center of the Bloch ball. We can match the peak of the Wishart distribution by either changing the covariance matrix Σ\Sigma in accordance with (25), or by a suitable shift Δρ\Delta\rho, or by a combination of both. We find that Σ=𝟏2{\Sigma=\mathbf{1}_{2}} in conjunction with the trace-preserving shift Δρ=ρpeak12𝟏2{\Delta\rho=\rho_{\mathrm{peak}}-\frac{1}{2}\mathbf{1}_{2}} is very effective as this provides a high acceptance rate. For the event counts stated above and this shift, an acceptance rate of Pacc=28.6%{P_{\mathrm{acc}}=28.6\%} is achieved by using a Wishart sample with κ=0.2{\kappa=0.2} and n=13{n=13}; this is, in fact, a rather high acceptance rate in view of the many ρ\rhos in the Wishart sample that the shift renders unphysical. We use a larger admixture of the uniform sample here as we need to fill in the void left behind after the shift by Δρ\Delta\rho. The bottom row in figure 4 shows the distribution along the diameter through the peak location (left) and along another, randomly chosen, diameter (right).


Figure 5: Verification for qubit samples. Top row: Samples generated for data 𝝂={25,25,25,25}{\boldsymbol{\nu}=\{25,25,25,25\}} (left) and 𝝂={10,20,25,45}{\boldsymbol{\nu}=\{10,20,25,45\}} (right). The size sλs_{\lambda}^{\ } and the regarded-as-exact credibility cλc_{\lambda}^{\ } of (54) are evaluated with the aid of uniformly distributed samples (Nufm=107{N_{\mathrm{ufm}}=10^{7}}). The estimated credibility  cλ ^\widehat{\rule{0.70007pt}{0.0pt}c_{\lambda}\rule{0.59998pt}{0.0pt}} is obtained from target samples with Ntgt=500N_{\mathrm{tgt}}=500, 1 0001\rule{1.00006pt}{0.0pt}000, 5 0005\rule{1.00006pt}{0.0pt}000, or 10 00010\rule{1.00006pt}{0.0pt}000 entries. Middle row: Ratio of the expected and the exact value of  cλ ^\widehat{\rule{0.70007pt}{0.0pt}c_{\lambda}\rule{0.59998pt}{0.0pt}} (left, for a single target sample); standard deviation of  cλ ^\widehat{\rule{0.70007pt}{0.0pt}c_{\lambda}\rule{0.59998pt}{0.0pt}} (right, averaged over 100100 target samples); for 𝝂={10,20,25,45}\boldsymbol{\nu}=\{10,20,25,45\} and target samples with Ntgt=500{N_{\mathrm{tgt}}=500}, 1 0001\rule{1.00006pt}{0.0pt}000, 2 0002\rule{1.00006pt}{0.0pt}000, 5 0005\rule{1.00006pt}{0.0pt}000, or 10 00010\rule{1.00006pt}{0.0pt}000 entries each. There are values for λ=0.0,0.1,0.2,,1.0\lambda=0.0,0.1,0.2,\dots,1.0, connected by straight lines that guide the eye. The smooth dashed yellow curve on the right shows the square root of the variance for Ntgt=2 000{N_{\mathrm{tgt}}=2\rule{1.00006pt}{0.0pt}000}; see after (55). Bottom: Histograms of NtgtQN_{\mathrm{tgt}}Q values for 1 0001\rule{1.00006pt}{0.0pt}000 samples of Ntgt=101N_{\mathrm{tgt}}=10^{1}, 10210^{2}, 10310^{3}, 10410^{4}, and 10510^{5} for 𝝂={10,20,25,45}{\boldsymbol{\nu}=\{10,20,25,45\}}. The purple vertical lines indicate 0, 1010, 2020, or 3030 counts in the histogram bins. The circles show the mean values for each histogram, to which the blue curve is fitted in accordance with (79), with 0.155670.15567 (dashed horizontal line) and 5.245×1075.245\times 10^{-7} for the values of the two integrals.

To verify the sample we evaluate the size and credibility of the bounded likelihood regions as described in section 5. The size sλs_{\lambda} is estimated from a uniform sample with ten million entries. The regarded-as-exact cλc_{\lambda} values then obtained from (54) are used for reference (dashed black curves in figure 5, top row). We numerically evaluate cλc_{\lambda} using (55) for Ntgt=500N_{\mathrm{tgt}}=500, 1 0001\rule{1.00006pt}{0.0pt}000, 5 0005\rule{1.00006pt}{0.0pt}000, and 10 00010\rule{1.00006pt}{0.0pt}000 and compare it with the regarded-as-exact values in the top row in figure 5. The middle row in figure 5, with data for target sample sizes of Ntgt=500N_{\mathrm{tgt}}=500, 1 0001\rule{1.00006pt}{0.0pt}000, 2 0002\rule{1.00006pt}{0.0pt}000, 5 0005\rule{1.00006pt}{0.0pt}000, and 10 00010\rule{1.00006pt}{0.0pt}000, shows that our method of sampling is reliable and the standard deviation of cλc_{\lambda} is proportional to Ntgt12N_{\mathrm{tgt}}^{-\frac{1}{2}} as expected. The bottom plot in figure 5 confirms (79) in the appendix, which corrects (57) by accounting for the difference between the regarded-as-exact cλc_{\lambda} values and the actual ones. We note further that the standard deviations of the histograms are approximately equal to their mean values (indicated by circles), so that the criterion stated after (58) tells us that all samples with QQ values [cf. (56)] less than twice the mean value are of very good quality.

7.2 Qubit pairs

For simulated data for measurements on qubit pairs, we use the probabilities of a double tetrahedral measurement with the sixteen probability operators

Πk=Πl(1)Πm(2)with l,m=1,2,3,4,\Pi_{k}=\Pi_{l}^{(1)}\otimes\Pi_{m}^{(2)}\quad\mbox{with\ }l,m=1,2,3,4\,, (62)

where Πl(1)\Pi_{l}^{(1)} and Πm(2)\Pi_{m}^{(2)} make up the single-qubit tetrahedron POMs for the probabilities in (59), and the index k=4(l1)+mk=4(l-1)+m covers the integers in the range 1k16=K{1\leq k\leq 16=K}. For measurement data 𝝂={ν1,ν2,,ν16}\boldsymbol{\nu}=\{\nu_{1},\nu_{2},\dots,\nu_{16}\}, the target distribution has the form of (45), (dρ)f(ρ)(dρ)k=116tr(Πkρ)νk({\mathrm{d}}\rho)\,f(\rho)\propto({\mathrm{d}}\rho)\,\prod_{k=1}^{16}\mathrm{tr}{\left(\Pi_{k}\rho\rule{0.0pt}{0.0pt}\right)}^{\nu_{k}}.

First, we compare the performance of a proposal sample from the uniform distribution with one from the Wishart distribution for the centered, symmetric target distributions that refer to fictitious measurements with the same number of events for each outcome, that is ν1=ν2==ν16=ν¯\nu_{1}=\nu_{2}=\cdots=\nu_{16}=\overline{\nu}; see figure 6. The acceptance rate PaccP_{\mathrm{acc}} obtained from using a uniformly distributed proposal sample decreases exponentially as the measurement count ν¯\overline{\nu} increases and the target distribution becomes correspondingly narrower. By contrast, when we admix a sample drawn from the Wishart distribution, the optimal acceptance rate decreases to about 0.1%0.1\% at around ν¯=20\overline{\nu}=20 and then it increases slowly to about 1%1\% for ν¯=100\overline{\nu}=100 and remains at around this value for even larger ν¯\overline{\nu}s. For example, an acceptance rate of Pacc=0.48%P_{\mathrm{acc}}=0.48\% is obtained by adding 20%20\% of W4(Q)(6,𝟏4){\mathrm{W}}^{(\mathrm{Q})}_{4}(6,\mathbf{1}_{4}) for ν¯=10\overline{\nu}=10, Pacc=0.091%P_{\mathrm{acc}}=0.091\% is obtained by adding 50%50\% of W4(Q)(8,𝟏4){\mathrm{W}}^{(\mathrm{Q})}_{4}(8,\mathbf{1}_{4}) for ν¯=20\overline{\nu}=20, and Pacc=0.64%P_{\mathrm{acc}}=0.64\% is obtained by adding 90%90\% of W4(Q)(35,𝟏4){\mathrm{W}}^{(\mathrm{Q})}_{4}(35,\mathbf{1}_{4}) for ν¯=100\overline{\nu}=100. In comparison, the acceptance rates for the uniform proposal samples are Pacc=5.4×104P_{\mathrm{acc}}=5.4\times 10^{-4}, 2.3×1052.3\times 10^{-5}, and 1.0×1081.0\times 10^{-8} for ν¯=10\overline{\nu}=10, 2020 and 100100, respectively. The use of a proposal sample from the Wishart distribution clearly gives a much larger acceptance rate for such a peaked target distribution than the uniform proposal sample.
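
For m=4, evaluating the mixture density gs,κ requires the relative normalization of its two components. One convenient shortcut (ours, not necessarily what was done for the figures) is to estimate the normalization of det ρ^{n−m} relative to the uniform density by a Monte Carlo average over a uniform sample; a sketch:

```python
import numpy as np

rng = np.random.default_rng()
m, n, kappa = 4, 8, 0.5        # e.g. 50% of W(8, 1_4), as for nu_bar = 20

def random_states(n_cols, size):
    """rho = AA^dag/tr(AA^dag) with A an m x n_cols Ginibre matrix;
    n_cols = m gives the uniform (Hilbert-Schmidt) sample."""
    A = (rng.normal(size=(size, m, n_cols))
         + 1j * rng.normal(size=(size, m, n_cols)))
    W = A @ A.conj().transpose(0, 2, 1)
    return W / np.trace(W, axis1=1, axis2=2).real[:, None, None]

# normalization of det(rho)^(n-m) relative to the uniform density,
# estimated once from a large uniform sample
Z = np.mean(np.linalg.det(random_states(m, 10 ** 5)).real ** (n - m))

def log_g(rho):
    """log g_{s,kappa}(rho), up to the common uniform normalization."""
    det = np.linalg.det(rho).real
    return np.log(kappa + (1 - kappa) * det ** (n - m) / Z)
```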


Figure 6: Acceptance rate for sampling two-qubit states with the target distribution for the data 𝝂={νk=ν¯;k=1,2,,16}\boldsymbol{\nu}=\{\nu_{k}=\overline{\nu};k=1,2,\dots,16\}. The blue curve and the red curve show the acceptance rate as a function of ν¯\overline{\nu} for a uniformly distributed proposal sample and for the optimal admixture of a suitable Wishart distribution, respectively. Each data point of the acceptance rate is obtained from 10810^{8} proposal states.

While the use of the Wishart distribution can increase the acceptance rate, reaching the optimal acceptance rate requires finding the most suitable Wishart distribution. Fortunately, this optimization is not difficult because one observes that the optimal number of columns noptn_{\mathrm{opt}} scales linearly with the total number of counts; see figure 7. Moreover, we find that this proportionality of noptn_{\mathrm{opt}} and NN holds quite generally when one samples non-centered target distributions. When NN is large, the acceptance rate can be rather robust against the choice of nn because the peak width depends only weakly on this parameter; see section 2.5.


Figure 7: The optimal number of columns noptn_{\mathrm{opt}} of the Wishart distribution for sampling one-qubit (blue) and two-qubit (red) centered target distributions against the total number of measurements NN.

Next, we examine the performance of the method for sampling non-centered target distributions. For illustration, we use the target distribution for particular simulated data from 100100 measurements, 𝝂={10,4,6,4,7,6,5,6,5,6,10,6,5,6,8,6}\boldsymbol{\nu}=\{10,4,6,4,7,6,5,6,5,6,10,6,5,6,8,6\}. The peak of the target distribution in the probability simplex corresponds to a non-physical state, and the peak in the physical space is given by the maximum-likelihood estimator ρML\rho^{\ }_{\mathrm{ML}}, which is a rank-3 state with eigenvalues {0.5033,0.3377,0.1589,0}\{0.5033,0.3377,0.1589,0\}.

As discussed in section 2.8, we can adjust the peak location of the proposal distribution by a suitable shift. We generate the proposal distribution by the two steps in (39). First, we choose a state ρpeak=x1ρML+14(1x1)𝟏4\rho_{\mathrm{peak}}^{\prime}=x_{1}\rho^{\ }_{\mathrm{ML}}+\frac{1}{4}(1-x_{1})\mathbf{1}_{4} to be the peak of the Wishart distribution W4(Q)(n,Σ){\mathrm{W}}^{(\mathrm{Q})}_{4}(n,\Sigma) and find the corresponding Σ\Sigma to produce a preliminary proposal sample. Then, we shift the sample states by Δρ=x2(ρML14𝟏4)\Delta\rho=x_{2}(\rho^{\ }_{\mathrm{ML}}-\frac{1}{4}\mathbf{1}_{4}) as in (37), after which we have a sample that is peaked at ρpeak=(x1+x2)ρML+14(1x1x2)𝟏4\rho_{\mathrm{peak}}=(x_{1}+x_{2})\rho^{\ }_{\mathrm{ML}}+\frac{1}{4}(1-x_{1}-x_{2})\mathbf{1}_{4}; to this we admix a fraction κ\kappa from the uniform sample and arrive at a proposal sample in accordance with the distribution gs,κ(ρ)g_{\mathrm{s},\kappa}(\rho) of (40). When x1+x2=1x_{1}+x_{2}=1, the peak of the proposal distribution coincides with the maximum-likelihood estimator. However, we do not need the peaks to coincide exactly; it is enough that they are close enough to give a good acceptance rate. A stripped-down sketch of the lottery follows.
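
In this sketch (ours), we keep Σ = 𝟏₄ for brevity, thereby dropping the x₁ tailoring of Σ via (25), and we use a diagonal stand-in with the eigenvalues quoted above in place of the actual ρML:

```python
import numpy as np

rng = np.random.default_rng()
m, n_col, kappa, x2, N = 4, 5, 0.6, 0.15, 10 ** 5
rho_ml = np.diag([0.5033, 0.3377, 0.1589, 0.0])   # stand-in for rho_ML
d_rho = x2 * (rho_ml - np.eye(m) / m)             # traceless shift of (37)

def states(n_cols, size):
    A = (rng.normal(size=(size, m, n_cols))
         + 1j * rng.normal(size=(size, m, n_cols)))
    W = A @ A.conj().transpose(0, 2, 1)
    return W / np.trace(W, axis1=1, axis2=2).real[:, None, None]

pick = rng.random(N) < kappa                      # admix the uniform fraction
proposal = np.where(pick[:, None, None],
                    states(m, N),                 # uniform component
                    states(n_col, N) + d_rho)     # shifted Wishart component
# shifted entries can be unphysical; they get f(rho) = 0 and are
# always rejected in the accept-reject step of section 4
```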


Figure 8: Left: The acceptance rate PaccP_{\mathrm{acc}} against 1κ1-\kappa for a proposal sample with 10810^{8} states obtained from a shifted non-centered Wishart distribution W4(Q)(5,Σ){\mathrm{W}}^{(\mathrm{Q})}_{4}(5,\Sigma). Right: The cross-section density of the reference and target distributions for states of the form ρ(x)=xρML+14(1x)𝟏4\rho(x)=x\rho^{\ }_{\mathrm{ML}}+\frac{1}{4}(1-x)\mathbf{1}_{4}.


Figure 9: The credibility cλc_{\lambda} evaluated for different sample sizes. 10610^{6} sample states from the uniform distribution are used to evaluate the size and the regarded-as-exact values of the credibility, shown by the dashed curve.

For example, we achieve an acceptance rate of Pacc>0.5%{P_{\mathrm{acc}}>0.5\%} with a proposal distribution obtained for κ=0.6{\kappa=0.6}, n=5{n=5}, x1=0.75{x_{1}=0.75}, and x2=0.15{x_{2}=0.15}; see figure 8. For the verification of the sample, we evaluate the credibility cλc_{\lambda} for different sample sizes and compare it to the regarded-as-exact value computed from sλs_{\lambda}; see figure 9. The result shows that our method is reliable for sampling two-qubit systems and that 10 00010\rule{1.00006pt}{0.0pt}000 sample points are enough for the estimation of physical quantities, such as the credibility, with an error less than 0.1%. For this particular example, the acceptance rate for a uniformly distributed proposal distribution is about 0.06%, which is smaller by about a factor of ten. Thus, using the Wishart distribution does improve the acceptance rate, but the improvement is not significant when applied to a target distribution that is not narrowly peaked, such as the one for N=100{N=100} measurements. The gain in efficiency is much more remarkable when the target distribution is more peaked. For instance, if the target distribution is given by N=1 000{N=1\rule{1.00006pt}{0.0pt}000} measurements with the same frequency of clicks for each POM outcome, an acceptance rate larger than 10510^{-5} is achieved with a proposal distribution drawn from W4(Q)(10,Σ){\mathrm{W}}^{(\mathrm{Q})}_{4}(10,\Sigma) with x1=0.8{x_{1}=0.8} and x2=0.15{x_{2}=0.15}. This is at least 10310^{3} times more efficient than using a uniform proposal distribution, which fails to produce any accepted sample point for as many as 10810^{8} proposal states.

Table 1: CPU time taken for generating a sample of 10510^{5} states with different sampling strategies. All the tasks are CPU-parallelized and run on the same regular desktop. (a) Two-qubit sample with νk=10{\nu_{k}=10} for k=1,,16{k=1,\dots,16}; (b) two-qubit sample with 𝝂={10,4,6,4,7,6,5,6,5,6,10,6,5,6,8,6}{\boldsymbol{\nu}=\{10,4,6,4,7,6,5,6,5,6,10,6,5,6,8,6\}}; (c) two-qubit sample with each νk\nu_{k} equal to ten times the value of example (b); (d) three-qubit sample with νk=10{\nu_{k}=10} for k=1,,64{k=1,\dots,64}; (e) and (f) three- and four-qubit samples with a randomly generated count sequence of 3 0003\rule{1.00006pt}{0.0pt}000 experiments.242424The simulated data for the three-qubit example are 𝝂\boldsymbol{\nu} = {36, 13, 64, 71, 14, 16, 7, 15, 60, 10, 84, 63, 64, 9, 55, 71, 8, 12, 10, 16, 16, 48, 67, 62, 9, 64, 75, 63, 10, 74, 60, 73, 65, 14, 62, 66, 9, 57, 76, 53, 82, 78, 128, 22, 61, 44, 25, 27, 56, 12, 52, 66, 14, 76, 56, 78, 45, 47, 22, 27, 66, 68, 25, 102} with N=3 000{N=3\rule{1.00006pt}{0.0pt}000}. The 6464 counts refer to the three-qubit analog of (62). For these data, ρML\rho^{\ }_{\mathrm{ML}} is rank deficient.
Remarks: ⋆\star the acceptance rate is too low, less than 11 in 10810^{8}; – the strategy is not reliable; {}^{\dagger} the acceptance rate is about 1.2×1051.2\times 10^{-5}; {}^{{\dagger}{\dagger}} the acceptance rate is 2121 samples from 1.5×1081.5\times 10^{8} proposal states.

When sampling from the two-qubit state space, the CPU time taken when using the Wishart distribution for the proposal is significantly lower than that for other methods. Table 1 shows that for sampling a centered distribution with N=160{N=160} and a non-centered distribution with N=100{N=100} it is a few times more efficient to use the Wishart distribution with an admixture of the uniform distribution than solely the uniform distribution; see the top-two entries in columns (a) and (b). The advantage of using the Wishart distribution can become much more prominent for sampling more peaked distributions with larger NN; see column (c). Our method is also more efficient than the Hamiltonian Monte Carlo (HMC) method that is discussed in [38]; see the third row. While the efficiency of the HMC method does not depend on NN, which can give it an advantage over the sampling from a uniform distribution for large NN, the current implementation of the HMC algorithm is only reliable for systems with low dimension, owing to issues with the stability of the algorithm; note also that HMC yields correlated samples, even though the correlations are usually weaker than those in samples from other Markov chain MC methods. In the fourth row in table 1 we list the CPU time for generating samples with the sequentially constrained MC (SCMC) algorithm that we describe in [26]; it appears to outperform all other methods. If we consider the one-qubit and two-qubit situations of columns (a) and (b), however, the “Wishart-uniform method” of this paper is just as practical and requires much less effort in writing and debugging computer code; also, the “SCMC+Wishart method” of reference [26] builds on the foundations laid by the “Wishart-uniform method.”

7.3 Three and four qubits

The sampling method introduced here works for systems of any dimension — at least, there are no reasons in principle why it should not. In practice, however, the ‘curse of dimensionality’ can become a serious obstacle, owing to limitations in both CPU time and memory; see section 6.

The storage aspects discussed in section 6 are much more of a concern for three-qubit systems than for one-qubit or two-qubit systems. When sampling three-qubit states, we use up to 2.4×1082.4\times 10^{8} proposal states in each run of the algorithm, which takes up about 120 GB of storage. The first example we investigated is a centered distribution for 𝝂={νk=10fork=1,,64}\boldsymbol{\nu}=\{\nu_{k}=10\ \mathrm{for}\ k=1,\dots,64\}. No useful sample can be produced by drawing from a uniform distribution as its acceptance rate is smaller than 1/(2.4×108)1/(2.4\times 10^{8}). On the other hand, when sampling with a proposal distribution composed of 20% of the Wishart distribution W8(Q)(9,𝟏8){\mathrm{W}}^{(\mathrm{Q})}_{8}(9,\mathbf{1}_{8}) and 80% of the uniform distribution — κ=0.8{\kappa=0.8} in (40) — the acceptance rate is approximately 1.2×1051.2\times 10^{-5}. This allows us to obtain a sample of 10510^{5} states within 100 hours.

Reliable samples for non-centered and/or narrower distributions can also be generated from the Wishart-plus-uniform proposal distribution (40). For example, for the target distribution that corresponds to the simulated data in footnote 24, we obtain Pacc1.4×107{P_{\mathrm{acc}}\approx 1.4\times 10^{-7}} using 40% of W8(Q)(80,𝟏8){\mathrm{W}}^{(\mathrm{Q})}_{8}(80,\mathbf{1}_{8}), with shift parameters x1=0.6{x_{1}=0.6} and x2=0.35{x_{2}=0.35}, plus 60% of the uniform distribution. It takes about 2.75 hours to produce 21 sample states from 1.5×1081.5\times 10^{8} proposal states. Thus, to generate 10510^{5} samples it would take about 13 000 hours, as entered in column (e) of table 1.

For a four-qubit system, storage is even more of an issue; we can only store about 6×1076\times 10^{7} states in 120 GB of memory. We tried to sample a target distribution for N=3 000{N=3\rule{1.00006pt}{0.0pt}000} simulated counts from a Wishart distribution but failed to produce a reliable sample. This suggests that the acceptance rate is well below 1.6×1081.6\times 10^{-8}; therefore, to sample states of such high dimension efficiently, yet other methods are needed. One option is the adaptation of the SCMC sampler [47, 48] to the sampling of quantum states. We managed to produce uncorrelated samples of high-dimensional quantum states in this way. The “SCMC+Wishart algorithm” is, however, beyond the scope of this paper; we deal with it in [26].

8 Summary and outlook

We established the probability distribution of the unit-trace, positive square matrices that represent quantum states, generated from the gaussian distribution for nonsquare matrices and the induced intermediate Wishart distribution for positive matrices. These distributions of quantum-state matrices are shifted suitably and supplemented with an admixture of the uniform distribution so that they can serve as tailored proposal distributions for the target distribution from which we want to sample.

The target sample is generated from a proposal sample by an accept-reject step. Since the proposal sample is uncorrelated (or i.i.d.), so is the target sample. This is a notable advantage over random walk algorithms, including that of Hamiltonian Monte Carlo, which generate correlated samples with a correspondingly reduced effective sample size.

The sampling algorithm works very well for samples from the one-qubit and two-qubit state spaces, whereas the efficiency in generating three-qubit samples is low, owing to the curse of dimensionality. We could not generate a useful sample of four-qubit states drawn from a structured target distribution. (Drawing from the uniform distribution or other highly symmetric distributions is efficient.)

The discussion here is limited to the Wishart distributions that derive from zero-mean gaussian distributions. We explore the options offered by gaussian distributions with a nonzero mean in a companion paper [28] but the increase in flexibility does not overcome the curse of dimensionality.

The general strategy explored in this paper — equip yourself with a large, CPU-cheap, uncorrelated proposal sample and turn it into a smaller target sample by rejection sampling — does not require a proposal distribution of the Wishart-uniform kind for its implementation. Other proposal distributions are possible and could very well yield substantially larger acceptance rates. While we have no particular suggestions to make, we trust that others will till this field.

There is, fortunately, solid evidence that the sequentially constrained Monte Carlo sampling algorithm succeeds in overcoming the curse of dimensionality to some extent. Rather than accepting or rejecting the quantum states in the proposal sample, the sample as a whole is processed and turned into a proper target sample step by step; this method also benefits from using a tailored Wishart distribution for the proposal. We present this approach in a separate paper [26].

Acknowledgments

S.B. is supported by a PBC postdoctoral fellowship at Tel-Aviv University. The Centre for Quantum Technologies is a Research Centre of Excellence funded by the Ministry of Education and the National Research Foundation of Singapore.

Appendix

In this appendix, we elaborate on footnote 20. As the analog of (55), we have

 sλ ^=1Nufmk=1Nufmχ(f(ρk)>λf(ρML))=1Nufmk=1Nufmχ(λ<λk)\widehat{\rule{0.70007pt}{0.0pt}s_{\lambda}\rule{0.59998pt}{0.0pt}}=\frac{1}{N_{\mathrm{ufm}}}\sum_{k=1}^{N_{\mathrm{ufm}}}{\raisebox{1.72218pt}{\scalebox{1.2}{$\chi$}}}\mathopen{\scalebox{1.2}{$($}}f(\rho_{k})>\lambda f(\rho^{\ }_{\mathrm{ML}})\mathclose{\scalebox{1.2}{$)$}}=\frac{1}{N_{\mathrm{ufm}}}\sum_{k=1}^{N_{\mathrm{ufm}}}{\raisebox{1.72218pt}{\scalebox{1.2}{$\chi$}}}\bigl{(}\lambda<\lambda_{k}\bigr{)} (63)

when estimating sλs^{\ }_{\lambda} from the NufmN_{\mathrm{ufm}} entries ρk\rho_{k} in the uniform sample, with

λk=f(ρk)f(ρML),0λk1.\lambda_{k}=\frac{f(\rho_{k})}{f(\rho^{\ }_{\mathrm{ML}})}\,,\quad 0\leq\lambda_{k}\leq 1\,. (64)

The λk\lambda_{k}s are independent (uncorrelated) random variables, each with the probability element

𝔼(λ<λk<λ+dλ)=dλ(sλλ),\mathbb{E}{\left(\lambda^{\prime}<\lambda_{k}<\lambda^{\prime}+{\mathrm{d}}\lambda^{\prime}\rule{0.0pt}{0.0pt}\right)}={\mathrm{d}}\lambda^{\prime}\,\biggl{(}-\frac{\partial s_{\lambda^{\prime}}}{\partial\lambda^{\prime}}\biggr{)}\,, (65)

so that, for each λk\lambda_{k},

𝔼(χ(λ<λk))=λ1dλ(sλλ)=sλ.\mathbb{E}\mathopen{\scalebox{1.2}{$($}}{{\raisebox{1.72218pt}{\scalebox{1.2}{$\chi$}}}(\lambda<\lambda_{k})}\mathclose{\scalebox{1.2}{$)$}}=\int_{\lambda}^{1}{\mathrm{d}}\lambda^{\prime}\,\biggl{(}-\frac{\partial s_{\lambda^{\prime}}}{\partial\lambda^{\prime}}\biggr{)}=s_{\lambda}^{\ }\,. (66)

It follows that  sλ ^\widehat{\rule{0.70007pt}{0.0pt}s_{\lambda}\rule{0.59998pt}{0.0pt}} is unbiased, 𝔼( sλ ^)=sλ\mathbb{E}{\left(\widehat{\rule{0.70007pt}{0.0pt}s_{\lambda}\rule{0.59998pt}{0.0pt}}\rule{0.0pt}{0.0pt}\right)}=s_{\lambda}^{\ }, and its variance is proportional to Nufm1N_{\mathrm{ufm}}^{-1},

𝔼( sλ ^2)𝔼( sλ ^)2=1Nufmsλ(1sλ).\mathbb{E}\mathopen{\scalebox{1.2}{$($}}{\widehat{\rule{0.70007pt}{0.0pt}s_{\lambda}\rule{0.59998pt}{0.0pt}}^{2}}\mathclose{\scalebox{1.2}{$)$}}-\mathbb{E}{\left(\widehat{\rule{0.70007pt}{0.0pt}s_{\lambda}\rule{0.59998pt}{0.0pt}}\rule{0.0pt}{0.0pt}\right)}^{2}=\frac{1}{N_{\mathrm{ufm}}}s_{\lambda}^{\ }\mathopen{\scalebox{1.2}{$($}}1-s_{\lambda}^{\ }\mathclose{\scalebox{1.2}{$)$}}\,. (67)

Further, the numerator in (54) is

 λsλ+λ1dλsλ=λ1dλ(sλλ)λ=01dλ(sλλ)χ(λ<λ)λ=𝔼(χ(λ<λk)λk),\begin{array}[b]{rcl}\rule{30.00005pt}{0.0pt}\displaystyle\lambda s_{\lambda}^{\ }+\int_{\lambda}^{1}{\mathrm{d}}\lambda^{\prime}\,s_{\lambda^{\prime}}^{\ }&=&\displaystyle\int_{\lambda}^{1}{\mathrm{d}}\lambda^{\prime}\,\biggl{(}-\frac{\partial s_{\lambda^{\prime}}}{\partial\lambda^{\prime}}\biggr{)}\lambda^{\prime}=\int_{0}^{1}{\mathrm{d}}\lambda^{\prime}\,\biggl{(}-\frac{\partial s_{\lambda^{\prime}}}{\partial\lambda^{\prime}}\biggr{)}{\raisebox{1.72218pt}{\scalebox{1.2}{$\chi$}}}(\lambda<\lambda^{\prime})\lambda^{\prime}\\[8.61108pt] &=&\displaystyle\mathbb{E}\mathopen{\scalebox{1.2}{$($}}{{\raisebox{1.72218pt}{\scalebox{1.2}{$\chi$}}}(\lambda<\lambda_{k})\lambda_{k}}\mathclose{\scalebox{1.2}{$)$}}\,,\end{array} (68)

and

λ sλ ^+λ1dλsλ^=1Nufmk=1Nufmχ(λ<λk)λk,\lambda\widehat{\rule{0.70007pt}{0.0pt}s_{\lambda}\rule{0.59998pt}{0.0pt}}+\int_{\lambda}^{1}{\mathrm{d}}\lambda^{\prime}\,\widehat{s_{\lambda^{\prime}}}=\frac{1}{N_{\mathrm{ufm}}}\sum_{k=1}^{N_{\mathrm{ufm}}}{\raisebox{1.72218pt}{\scalebox{1.2}{$\chi$}}}\bigl{(}\lambda<\lambda_{k}\bigr{)}\lambda_{k}\,, (69)

is the corresponding unbiased estimator for the numerator; for λ=0{\lambda=0}, this is the estimator for the denominator,

01dλsλ^=1Nufmk=1Nufmλk.\int_{0}^{1}{\mathrm{d}}\lambda^{\prime}\,\widehat{s_{\lambda^{\prime}}}=\frac{1}{N_{\mathrm{ufm}}}\sum_{k=1}^{N_{\mathrm{ufm}}}\lambda_{k}\,. (70)

Their ratio,

 cλ ^(ufm)=λ sλ ^+λ1dλsλ^01dλsλ^=k=1Nufmχ(λ<λk)λkl=1Nufmλl,\widehat{\rule{0.70007pt}{0.0pt}c_{\lambda}\rule{0.59998pt}{0.0pt}}^{(\mathrm{ufm})}=\frac{\lambda\widehat{\rule{0.70007pt}{0.0pt}s_{\lambda}\rule{0.59998pt}{0.0pt}}+\mathop{\scalebox{0.9}{$\displaystyle\int$}}_{\!\!\lambda}^{1}{\mathrm{d}}\lambda^{\prime}\,\widehat{s_{\lambda^{\prime}}}}{\mathop{\scalebox{0.9}{$\displaystyle\int$}}_{\!\!0}^{1}{\mathrm{d}}\lambda^{\prime}\,\widehat{s_{\lambda^{\prime}}}}=\frac{\mathop{\scalebox{0.9}{$\displaystyle\sum$}}\limits_{k=1}^{N_{\mathrm{ufm}}}{\raisebox{1.72218pt}{\scalebox{1.2}{$\chi$}}}\bigl{(}\lambda<\lambda_{k}\bigr{)}\lambda_{k}}{\mathop{\scalebox{0.9}{$\displaystyle\sum$}}\limits_{l=1}^{N_{\mathrm{ufm}}}\lambda_{l}}\,, (71)

is what we get from (54) for the regarded-as-exact credibility.
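
In code, (71) is a one-liner once the λk values of the uniform sample are in hand; a sketch (ours):

```python
import numpy as np

def c_hat_ufm(lam, lam_k):
    """The regarded-as-exact credibility of (71), computed from the
    values lam_k = f(rho_k)/f(rho_ML) of a uniform sample."""
    return np.sum(lam_k * (lam_k > lam)) / np.sum(lam_k)
```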

As we shall now demonstrate,  cλ ^(ufm)\widehat{\rule{0.70007pt}{0.0pt}c_{\lambda}\rule{0.59998pt}{0.0pt}}^{(\mathrm{ufm})} has a negative bias. First, we note that

𝔼( cλ ^(ufm))=𝔼(kχ(λ<λk)λk0dαeαlλl)=k0dα𝔼(χ(λ<λk)λkeαλk)l(k)𝔼(eαλl)=Nufm0dα𝔼(χ(λ<λ)λeαλ)𝔼(eαλ)Nufm1,\begin{array}[b]{rcl}\mathbb{E}\mathopen{\scalebox{1.2}{$($}}{\widehat{\rule{0.70007pt}{0.0pt}c_{\lambda}\rule{0.59998pt}{0.0pt}}^{(\mathrm{ufm})}}\mathclose{\scalebox{1.2}{$)$}}&=&\displaystyle\mathbb{E}{\left(\sum_{k}{\raisebox{1.72218pt}{\scalebox{1.2}{$\chi$}}}\bigl{(}\lambda<\lambda_{k}\bigr{)}\lambda_{k}\int\limits_{0}^{\infty}{\mathrm{d}}\alpha\,{\mathrm{e}}^{\mbox{\footnotesize$-\alpha\mathop{\scalebox{0.9}{$\displaystyle\sum$}}_{l}\lambda_{l}$}}\rule{0.0pt}{0.0pt}\right)}\\ &=&\displaystyle\sum_{k}\int\limits_{0}^{\infty}{\mathrm{d}}\alpha\,\mathbb{E}{\left({\raisebox{1.72218pt}{\scalebox{1.2}{$\chi$}}}\bigl{(}\lambda<\lambda_{k}\bigr{)}\lambda_{k}\,{\mathrm{e}}^{\mbox{\footnotesize$-\alpha\lambda_{k}$}}\rule{0.0pt}{0.0pt}\right)}\,\prod_{l(\neq k)}\mathbb{E}{\left({\mathrm{e}}^{\mbox{\footnotesize$-\alpha\lambda_{l}$}}\rule{0.0pt}{0.0pt}\right)}\\ &=&\displaystyle N_{\mathrm{ufm}}\int\limits_{0}^{\infty}{\mathrm{d}}\alpha\,\mathbb{E}{\left({\raisebox{1.72218pt}{\scalebox{1.2}{$\chi$}}}\bigl{(}\lambda<\lambda^{\prime}\bigr{)}\lambda^{\prime}\,{\mathrm{e}}^{\mbox{\footnotesize$-\alpha\lambda^{\prime}$}}\rule{0.0pt}{0.0pt}\right)}\,\mathbb{E}{\left({\mathrm{e}}^{\mbox{\footnotesize$-\alpha\lambda^{\prime}$}}\rule{0.0pt}{0.0pt}\right)}^{N_{\mathrm{ufm}}-1}\,,\end{array} (72)

where λ\lambda^{\prime} is a random variable with the probability element of (65) and the expected values refer to this random variable. Then, we integrate by parts and arrive at

𝔼( cλ ^(ufm))=cλ+0dα𝔼(eαλ)NufmαCλ(α),\mathbb{E}\mathopen{\scalebox{1.2}{$($}}{\widehat{\rule{0.70007pt}{0.0pt}c_{\lambda}\rule{0.59998pt}{0.0pt}}^{(\mathrm{ufm})}}\mathclose{\scalebox{1.2}{$)$}}=c_{\lambda}^{\ }+\int\limits_{0}^{\infty}{\mathrm{d}}\alpha\,\mathbb{E}{\left({\mathrm{e}}^{\mbox{\footnotesize$-\alpha\lambda^{\prime}$}}\rule{0.0pt}{0.0pt}\right)}^{N_{\mathrm{ufm}}}\frac{\partial}{\partial\alpha}C_{\lambda}(\alpha)\,, (73)

where

Cλ(α)=𝔼(χ(λ<λ)λeαλ)𝔼(λeαλ)=(1+𝔼(χ(λ>λ)λeαλ)𝔼(χ(λ<λ)λeαλ))1C_{\lambda}(\alpha)=\frac{\mathbb{E}\mathopen{\scalebox{1.2}{$($}}{{\raisebox{1.72218pt}{\scalebox{1.2}{$\chi$}}}(\lambda<\lambda^{\prime})\lambda^{\prime}\,{\mathrm{e}}^{\mbox{\footnotesize$-\alpha\lambda^{\prime}$}}}\mathclose{\scalebox{1.2}{$)$}}}{\mathbb{E}{\left(\lambda^{\prime}\,{\mathrm{e}}^{\mbox{\footnotesize$-\alpha\lambda^{\prime}$}}\rule{0.0pt}{0.0pt}\right)}}=\left(1+\frac{\mathbb{E}\mathopen{\scalebox{1.2}{$($}}{{\raisebox{1.72218pt}{\scalebox{1.2}{$\chi$}}}(\lambda>\lambda^{\prime})\lambda^{\prime}\,{\mathrm{e}}^{\mbox{\footnotesize$-\alpha\lambda^{\prime}$}}}\mathclose{\scalebox{1.2}{$)$}}}{\mathbb{E}\mathopen{\scalebox{1.2}{$($}}{{\raisebox{1.72218pt}{\scalebox{1.2}{$\chi$}}}(\lambda<\lambda^{\prime})\lambda^{\prime}\,{\mathrm{e}}^{\mbox{\footnotesize$-\alpha\lambda^{\prime}$}}}\mathclose{\scalebox{1.2}{$)$}}}\right)^{\!-1} (74)

is a monotonically decreasing function of α\alpha, where Cλ(0)=𝔼(λ)1𝔼(χ(λ<λ)λ)=cλC_{\lambda}(0)=\mathbb{E}{\left(\lambda^{\prime}\rule{0.0pt}{0.0pt}\right)}^{-1}\mathbb{E}\mathopen{\scalebox{1.2}{$($}}{{\raisebox{1.72218pt}{\scalebox{1.2}{$\chi$}}}(\lambda<\lambda^{\prime})\lambda^{\prime}}\mathclose{\scalebox{1.2}{$)$}}=c^{\ }_{\lambda} has been taken into account. Accordingly, the term added to cλc^{\ }_{\lambda} in (73) is negative, that is:  cλ ^(ufm)\widehat{\rule{0.70007pt}{0.0pt}c_{\lambda}\rule{0.59998pt}{0.0pt}}^{(\mathrm{ufm})} has a negative bias. Indeed, when α>α\alpha^{\prime}>\alpha, we observe that

Cλ(α)1=1+𝔼(χ(λ>λ)λeαλe(αα)(λλ))𝔼(χ(λ<λ)λeαλe(αα)(λλ))>1+𝔼(χ(λ>λ)λeαλ)𝔼(χ(λ<λ)λeαλ)=Cλ(α)1,\begin{array}[b]{rcl}C_{\lambda}(\alpha^{\prime})^{-1}&=&\displaystyle 1+\frac{\mathbb{E}\mathopen{\scalebox{1.2}{$($}}{{\raisebox{1.72218pt}{\scalebox{1.2}{$\chi$}}}(\lambda>\lambda^{\prime})\lambda^{\prime}\,{\mathrm{e}}^{\mbox{\footnotesize$-\alpha\lambda^{\prime}$}}\,{\mathrm{e}}^{\mbox{\footnotesize$(\alpha^{\prime}-\alpha)(\lambda-\lambda^{\prime})$}}}\mathclose{\scalebox{1.2}{$)$}}}{\mathbb{E}\mathopen{\scalebox{1.2}{$($}}{{\raisebox{1.72218pt}{\scalebox{1.2}{$\chi$}}}(\lambda<\lambda^{\prime})\lambda^{\prime}\,{\mathrm{e}}^{\mbox{\footnotesize$-\alpha\lambda^{\prime}$}}\,{\mathrm{e}}^{\mbox{\footnotesize$-(\alpha^{\prime}-\alpha)(\lambda^{\prime}-\lambda)$}}}\mathclose{\scalebox{1.2}{$)$}}}\\[12.91663pt] &>&\displaystyle 1+\frac{\mathbb{E}\mathopen{\scalebox{1.2}{$($}}{{\raisebox{1.72218pt}{\scalebox{1.2}{$\chi$}}}(\lambda>\lambda^{\prime})\lambda^{\prime}\,{\mathrm{e}}^{\mbox{\footnotesize$-\alpha\lambda^{\prime}$}}}\mathclose{\scalebox{1.2}{$)$}}}{\mathbb{E}\mathopen{\scalebox{1.2}{$($}}{{\raisebox{1.72218pt}{\scalebox{1.2}{$\chi$}}}(\lambda<\lambda^{\prime})\lambda^{\prime}\,{\mathrm{e}}^{\mbox{\footnotesize$-\alpha\lambda^{\prime}$}}}\mathclose{\scalebox{1.2}{$)$}}}=C_{\lambda}(\alpha)^{-1}\,,\end{array} (75)

or Cλ(α)<Cλ(α)C_{\lambda}(\alpha^{\prime})<C_{\lambda}(\alpha).

Repeated integrations by parts express the bias in (73) as a sum of terms proportional to powers of Nufm1N_{\mathrm{ufm}}^{-1}. The leading term is exhibited here:

 0dα𝔼(eαλ)NufmαCλ(α)=1Nufm𝔼(λ2)𝔼(λ)2[cλ𝔼(χ(λ<λ)λ2)𝔼(λ2)]+\rule{10.00002pt}{0.0pt}\int\limits_{0}^{\infty}{\mathrm{d}}\alpha\,\mathbb{E}{\left({\mathrm{e}}^{\mbox{\footnotesize$-\alpha\lambda^{\prime}$}}\rule{0.0pt}{0.0pt}\right)}^{N_{\mathrm{ufm}}}\frac{\partial}{\partial\alpha}C_{\lambda}(\alpha)=\frac{1}{N_{\mathrm{ufm}}}\frac{\mathbb{E}\mathopen{\scalebox{1.2}{$($}}{{\lambda^{\prime}}^{2}}\mathclose{\scalebox{1.2}{$)$}}}{\mathbb{E}{\left(\lambda^{\prime}\rule{0.0pt}{0.0pt}\right)}^{2}}{\left[c_{\lambda}^{\ }-\frac{\mathbb{E}\mathopen{\scalebox{1.2}{$($}}{{\raisebox{1.72218pt}{\scalebox{1.2}{$\chi$}}}(\lambda<\lambda^{\prime}){\lambda^{\prime}}^{2}}\mathclose{\scalebox{1.2}{$)$}}}{\mathbb{E}\mathopen{\scalebox{1.2}{$($}}{{\lambda^{\prime}}^{2}}\mathclose{\scalebox{1.2}{$)$}}}\right]}+\cdots (76)

with

 𝔼(χ(λ<λ)λ2)=λ2sλ+2λ1dλλsλ,𝔼(λ2)=201dλλsλ.\rule{40.00006pt}{0.0pt}\mathbb{E}\mathopen{\scalebox{1.2}{$($}}{{\raisebox{1.72218pt}{\scalebox{1.2}{$\chi$}}}(\lambda<\lambda^{\prime}){\lambda^{\prime}}^{2}}\mathclose{\scalebox{1.2}{$)$}}=\lambda^{2}s_{\lambda}^{\ }+2\int_{\lambda}^{1}{\mathrm{d}}\lambda^{\prime}\,\lambda^{\prime}\,s_{\lambda^{\prime}}^{\ }\,,\quad\mathbb{E}\mathopen{\scalebox{1.2}{$($}}{{\lambda^{\prime}}^{2}}\mathclose{\scalebox{1.2}{$)$}}=\displaystyle 2\int_{0}^{1}{\mathrm{d}}\lambda^{\prime}\,\lambda^{\prime}\,s_{\lambda^{\prime}}^{\ }\,. (77)

Since both the variance of  sλ ^\widehat{\rule{0.70007pt}{0.0pt}s_{\lambda}\rule{0.59998pt}{0.0pt}} and the bias in  cλ ^(ufm)\widehat{\rule{0.70007pt}{0.0pt}c_{\lambda}\rule{0.59998pt}{0.0pt}}^{(\mathrm{ufm})} are proportional to Nufm1N_{\mathrm{ufm}}^{-1}, the bias in  cλ ^(ufm)\widehat{\rule{0.70007pt}{0.0pt}c_{\lambda}\rule{0.59998pt}{0.0pt}}^{(\mathrm{ufm})} is smaller than the standard deviation of  sλ ^\widehat{\rule{0.70007pt}{0.0pt}s_{\lambda}\rule{0.59998pt}{0.0pt}}, from which  cλ ^(ufm)\widehat{\rule{0.70007pt}{0.0pt}c_{\lambda}\rule{0.59998pt}{0.0pt}}^{(\mathrm{ufm})} is computed, by a factor of Nufm12N_{\mathrm{ufm}}^{-\frac{1}{2}}. Accordingly, this bias is of no consequence for the considerations in section 5, the more so if we recall the typical circumstances of NufmNtgt1N_{\mathrm{ufm}}\gg N_{\mathrm{tgt}}\gg 1.

In practice, we are using the estimate  cλ ^(ufm)\widehat{\rule{0.70007pt}{0.0pt}c_{\lambda}\rule{0.59998pt}{0.0pt}}^{(\mathrm{ufm})} of (71), obtained from a large uniform sample, in the place of the actual cλc_{\lambda} when computing QQ of (56), that is

Q01dλ[( cλ ^cλ)( cλ ^(ufm)cλ)]2;Q\to\int_{0}^{1}{\mathrm{d}}\lambda\,{\left[\mathopen{\scalebox{1.2}{$($}}\widehat{\rule{0.70007pt}{0.0pt}c_{\lambda}\rule{0.59998pt}{0.0pt}}-c_{\lambda}\mathclose{\scalebox{1.2}{$)$}}-\mathopen{\scalebox{1.2}{$($}}\widehat{\rule{0.70007pt}{0.0pt}c_{\lambda}\rule{0.59998pt}{0.0pt}}^{(\mathrm{ufm})}-c_{\lambda}\mathclose{\scalebox{1.2}{$)$}}\right]}^{2}\,; (78)

the expected value of  cλ ^(ufm)cλ{\widehat{\rule{0.70007pt}{0.0pt}c_{\lambda}\rule{0.59998pt}{0.0pt}}^{(\mathrm{ufm})}-c_{\lambda}} (with respect to the uniform distribution) is the bias in (73). As a consequence, the expected value of (57) (with respect to the target distribution) acquires a small additional term that does not depend on the size of the target sample,

𝔼(Q)1Ntgt01dλcλ(1cλ)+01dλ( cλ ^(ufm)cλ)2,\mathbb{E}{\left(Q\rule{0.0pt}{0.0pt}\right)}\to\frac{1}{N_{\mathrm{tgt}}}\int_{0}^{1}{\mathrm{d}}\lambda\,c_{\lambda}(1-c_{\lambda})+\int_{0}^{1}{\mathrm{d}}\lambda\,\mathopen{\scalebox{1.2}{$($}}\widehat{\rule{0.70007pt}{0.0pt}c_{\lambda}\rule{0.59998pt}{0.0pt}}^{(\mathrm{ufm})}-c_{\lambda}\mathclose{\scalebox{1.2}{$)$}}^{2}\,, (79)

and the variance in (58) also acquires a corresponding additional term.
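
To make this concrete, here is a minimal numerical sketch (our illustration, not code from this paper): plain empirical CDFs on a $\lambda$ grid stand in for $\widehat{c_{\lambda}}$ and for the estimator $\widehat{c_{\lambda}}^{(\mathrm{ufm})}$ of (71), and the beta distribution below is merely a placeholder for the actual distribution of $\lambda$; the script evaluates $Q$ of (78) and the leading term of $\mathbb{E}(Q)$ in (79).

import numpy as np

rng = np.random.default_rng(1)

N_tgt = 1_000      # size of the target sample
N_ufm = 100_000    # size of the large uniform sample, N_ufm >> N_tgt

# Placeholder samples of lambda in [0, 1]; in the paper these would come
# from the target distribution and from the large uniform sample.
lam_tgt = rng.beta(2.0, 5.0, size=N_tgt)
lam_ufm = rng.beta(2.0, 5.0, size=N_ufm)

def empirical_cdf(sample, grid):
    """Fraction of sample entries below each grid point."""
    return np.searchsorted(np.sort(sample), grid) / sample.size

grid = np.linspace(0.0, 1.0, 2001)
dlam = grid[1] - grid[0]

c_hat = empirical_cdf(lam_tgt, grid)      # stands in for c^_lambda
c_hat_ufm = empirical_cdf(lam_ufm, grid)  # stands in for c^_lambda^(ufm)

# Q of (78): Riemann sum of the squared difference over the lambda grid
Q = float(np.sum((c_hat - c_hat_ufm) ** 2) * dlam)

# Leading term of (79), with the exact CDF approximated by the large sample
EQ_leading = float(np.sum(c_hat_ufm * (1.0 - c_hat_ufm)) * dlam / N_tgt)

print(f"Q = {Q:.3e}   leading term of E(Q) = {EQ_leading:.3e}")

With $N_{\mathrm{ufm}}\gg N_{\mathrm{tgt}}$, the fluctuations of $\widehat{c_{\lambda}}^{(\mathrm{ufm})}$ are much smaller than those of $\widehat{c_{\lambda}}$, so the computed $Q$ comes out close to the first term of (79) and the second term is negligible, in line with the discussion above.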

References

  • [1] Horodecki Karol, Horodecki Michał, Horodecki Paweł, Leung Debbie and Oppenheim Jonathan 2008 Unconditional privacy over channels which cannot convey quantum information Phys. Rev. Lett. 100 110502
  • [2] Hamma Alioscia, Santra Siddhartha and Zanardi Paolo 2012 Quantum entanglement in random physical states Phys. Rev. Lett. 109 040502
  • [3] Dupuis Frédéric, Fawzi Omar and Wehner Stephanie 2015 Entanglement sampling and applications IEEE Trans. Inf. Theory 61 1093–1112
  • [4] Collins Benoît and Nechita Ion 2016 Random matrix techniques in quantum information theory J. Math. Phys. 57 015215
  • [5] Sim Jun Yan, Suzuki Jun, Englert Berthold-Georg and Ng Hui Khoon 2020 User-specified random sampling of quantum channels and its applications Phys. Rev. A 101 022307
  • [6] Lu Yiping, Sim Jun Yan, Suzuki Jun, Englert Berthold-Georg and Ng Hui Khoon 2020 Direct estimation of minimum gate fidelity Phys. Rev. A 102 022410
  • [7] Li Xikun, Shang Jiangwei, Ng Hui Khoon and Englert Berthold-Georg 2016 Optimal error intervals for properties of the quantum state Phys. Rev. A 94 062112
  • [8] Oh Changhun, Teo Yong Siah and Jeong Hyunseok 2019 Probing Bayesian Credible Regions Intrinsically: A Feasible Error Certification for Physical Systems Phys. Rev. Lett. 123 040602
  • [9] Oh Changhun, Teo Yong Siah and Jeong Hyunseok 2019 Efficient Bayesian credible-region certification for quantum-state tomography Phys. Rev. A 100 012345
  • [10] Życzkowski Karol, Penson Karol A, Nechita Ion and Collins Benoît 2011 Generating random density matrices J. Math. Phys. 52 062201
  • [11] Bengtsson Ingemar and Życzkowski Karol 2017 Geometry of Quantum States second edition (Cambridge: Cambridge University Press)
  • [12] Braunstein Samuel Leon 1996 Geometry of quantum inference Phys. Lett. A 219 169–174
  • [13] Hall Michael J W 1998 Random quantum correlations and density operator distributions Phys. Lett. A 242 123–129
  • [14] Gu Yanwu, Li Weijun, Evans Michael and Englert Berthold-Georg 2019 Very strong evidence in favor of quantum mechanics and against local hidden variables from a Bayesian analysis Phys. Rev. A 99 026104
  • [15] Liu Jun S 2008 Monte Carlo Strategies in Scientific Computing (Heidelberg: Springer)
  • [16] Shang Jiangwei, Seah Yi-Lin, Ng Hui Khoon, Nott David John and Englert Berthold-Georg 2015 Monte Carlo sampling from the quantum state space. I New J. Phys. 17 043017
  • [17] Neal Radford M 2011 MCMC using Hamiltonian dynamics (Handbook of Markov Chain Monte Carlo) ed Brooks Steve, Gelman Andrew, Jones Galin L and Meng Xiao-Li (Boca Raton: Chapman and Hall) chapter 5
  • [18] Seah Yi-Lin, Shang Jiangwei, Ng Hui Khoon, Nott David John and Englert Berthold-Georg 2015 Monte Carlo sampling from the quantum state space. II New J. Phys. 17 043018
  • [19] Pearson Egon Sharpe 1957 John Wishart 1898–1956 Biometrika 44 1–8
  • [20] Wishart John 1928 The generalized product-moment distribution in samples from a normal multivariate population Biometrika 20 32–43
  • [21] Goodman N R 1963 Statistical analysis based on a certain multivariate complex Gaussian distribution (An introduction) Ann. Math. Statist. 34 152–177
  • [22] Srivastava Muni S 1965 On the Complex Wishart Distribution Ann. Math. Statist. 36 313–315
  • [23] Muirhead Robb John 1982 Aspects of Multivariate Statistical Theory (Hoboken: Wiley)
  • [24] Johnson Richard A and Wichern Dean W 2007 Applied multivariate statistical analysis sixth edition (London: Pearson Education)
  • [25] Anderson Theodore Wilbur 2003 An Introduction to Multivariate Statistical Analysis third edition (New York: Wiley)
  • [26] Li Weijun, Han Rui, Shang Jiangwei, Ng Hui Khoon and Englert Berthold-Georg 2021 Sequentially constrained Monte Carlo sampler for quantum states (in preparation)
  • [27] Wasserman Larry 2004 All of Statistics: A Concise Course in Statistical Inference (New York: Springer Science+Business Media)
  • [28] Bagchi Shrobona, Li Weijun, Han Rui, Ng Hui Khoon and Englert Berthold-Georg 2021 Uncorrelated problem-specific samples of quantum states from nonzero-mean Wishart distributions (in preparation)
  • [29] Yu Yi-Kuo and Zhang Yi-Cheng 2002 On the anti-Wishart distribution Physica A 312 1–22
  • [30] Janik Romuald A and Nowak Maciej A 2003 Wishart and anti-Wishart random matrices J. Phys. A 36 3629–3647
  • [31] Życzkowski Karol and Sommers Hans-Jürgen 2001 Induced measures in the space of mixed quantum states J. Phys. A 34 7111–7125
  • [32] Faist Philippe and Renner Renato 2016 Practical, Reliable Error Bars in Quantum Tomography Phys. Rev. Lett. 117 010404
  • [33] Bloch Felix 1946 Nuclear induction Phys. Rev. 70 460–474
  • [34] Hradil Zdeněk 1997 Quantum-state estimation Phys. Rev. A 55 R1561–R1564
  • [35] Paris Matteo and Řeháček Jaroslav (ed) 2004 Quantum State Estimation (Lecture Notes in Physics vol 649) (Heidelberg: Springer)
  • [36] Shang Jiangwei, Zhang Zhengyun and Ng Hui Khoon 2017 Superfast maximum likelihood reconstruction for quantum tomography Phys. Rev. A 95 062336
  • [37] von Neumann John 1951 Various Techniques Used in Connection With Random Digits J. Res. Nat. Bur. Stand. Appl. Math. Series 3 36–38
  • [38] Shang Jiangwei, Ng Hui Khoon, Sehrawat Arun, Li Xikun and Englert Berthold-Georg 2013 Optimal error regions for quantum state estimation New J. Phys. 15 123026
  • [39] Metropolis Nicholas 1987 The beginning of the Monte Carlo method Los Alamos Science Special Issue 125–130
  • [40] Metropolis Nicholas and Ulam Stanisław 1949 The Monte Carlo method J. Am. Stat. Assoc. 44 335–341
  • [41] Evans Michael and Swartz Tim 2000 Approximating integrals via Monte Carlo and deterministic methods (New York: Oxford University Press)
  • [42] Kolmogorov Andrey 1933 Sulla determinazione empirica di una legge di distribuzione G. Ist. Ital. Attuari 4 83–91
  • [43] Smirnov Nikolai 1948 Table for estimating the goodness of fit of empirical distributions Ann. Math. Statist. 19 279–281
  • [44] Bellman Richard Ernest 1961 Adaptive control processes: a guided tour (Princeton: Princeton University Press)
  • [45] Shang Jiangwei, Seah Yi-Lin, Wang Boyu, Ng Hui Khoon, Nott David John and Englert Berthold-Georg 2016 Random samples of quantum states: Online resources eprint arXiv:1612.05180 [quant-ph] Repository at https://www.quantumlah.org/page/page.php?key=qsampling
  • [46] Donoho David Leigh 2000 High-dimensional data analysis: The curses and blessings of dimensionality AMS Math Challenges Lecture 1–32
  • [47] Del Moral Pierre, Doucet Arnaud and Jasra Ajay 2006 Sequential Monte Carlo samplers J. R. Statist. Soc. B 68 411–436
  • [48] Golchi Shirin and Campbell David A 2016 Sequentially Constrained Monte Carlo Comput. Stat. Data Anal. 97 98–113
  • [49] Řeháček Jaroslav, Englert Berthold-Georg and Kaszlikowski Dagomir 2004 Minimal qubit tomography Phys. Rev. A 70 052321
  • [50] Zauner Gerhard 2011 Quantum designs: Foundations of a noncommutative design theory Int. J. Quantum Inf. 9 445–507