
Efficient Postprocessing Procedure for Evaluating Hamiltonian Expectation Values in Variational Quantum Eigensolver.

Chi-Chun Chen* (r08222060@ntu.edu.tw)
Physics Division, National Center for Theoretical Sciences, Taipei 106319, Taiwan
Center for Quantum Science and Engineering, National Taiwan University, Taipei 106319, Taiwan

Hsi-Sheng Goan† (goan@phys.ntu.edu.tw)
Physics Division, National Center for Theoretical Sciences, Taipei 106319, Taiwan
Center for Quantum Science and Engineering, National Taiwan University, Taipei 106319, Taiwan
Department of Physics, National Taiwan University, Taipei 106319, Taiwan

Abstract

We propose a simple strategy to reduce the postprocessing overhead of evaluating Hamiltonian expectation values in variational quantum eigensolvers (VQEs). Observing that, for a mutually commuting observable group $G$ in a given Hamiltonian, $\langle b|G|b\rangle$ is fixed for a measurement outcome bit string $b$ in the corresponding basis, we create a measurement memory (MM) dictionary for every commuting operator group $G$ in the Hamiltonian. Once a measurement outcome bit string $b$ appears, we store $b$ and $\langle b|G|b\rangle$ as key and value; the next time the same bit string appears, we look up $\langle b|G|b\rangle$ in the memory rather than evaluating it again. We further analyze the complexity of MM and compare it with the commonly employed postprocessing procedure, finding that MM is always at least as efficient in terms of time complexity. We implement this procedure on the task of minimizing fully connected Ising Hamiltonians of up to 20 qubits, and H2, H4, LiH, and H2O molecular Hamiltonians with different grouping methods. For the Ising Hamiltonian, where all $O(N^{2})$ terms commute, our method offers an $O(N^{2})$ speedup in terms of the percentage of time saved. In the case of molecular Hamiltonians, we achieve over $O(N)$ percentage time saved, depending on the grouping method.

Index Terms:
Quantum computing, Variational quantum eigensolver, VQE, Measurement memory

I INTRODUCTION

Variational quantum eigensolvers (VQEs) [1, 2, 3] have been considered one of the main applications of quantum computers in the noisy intermediate-scale quantum (NISQ) era [4]. VQEs can be applied to a broad range of tasks, such as chemistry, optimization, and machine learning. A common goal of most of these applications is to minimize an objective function, which is usually represented by a Hamiltonian that can be decomposed into a sum of tensor products of Pauli operators, i.e.,

$$H=\sum_{i}h_{i}O_{i}, \tag{1}$$

where $O_{i}\in\{I,\sigma_{X},\sigma_{Y},\sigma_{Z}\}^{\otimes N}$ and $N$ is the number of qubits. In VQE, the goal is to minimize the Hamiltonian expectation value, obtained from a statistical average over a large number of measurements. The general process of VQEs can be summarized as follows:

  1. Design the parameterized ansatz $U(\boldsymbol{\theta})$.

  2. Choose the initial parameter vector $\boldsymbol{\theta}$.

  3. Calculate the gradient of $f(\boldsymbol{\theta})=\langle i|U^{\dagger}(\boldsymbol{\theta})HU(\boldsymbol{\theta})|i\rangle$, where $|i\rangle$ is the initial state.

  4. Update $\boldsymbol{\theta}$ according to the gradient, and check whether $f(\boldsymbol{\theta})$ has decreased.

  5. Go back to step 3 until $f(\boldsymbol{\theta})$ can no longer be reduced or the maximum number of iterations has been reached.

Assuming a gradient-based classical optimizer, the gradient can be calculated analytically with the parameter shift rule [5] for a quantum circuit. However, this process requires evaluating the expectation value twice, at $f(\boldsymbol{\theta}+\frac{\pi}{2}\hat{\theta}_{x})$ and $f(\boldsymbol{\theta}-\frac{\pi}{2}\hat{\theta}_{x})$, for each component $\theta_{x}$ of $\boldsymbol{\theta}$. This results in $2d$ evaluations of the Hamiltonian expectation value in step 3, where $d$ is the dimension of $\boldsymbol{\theta}$. We then evaluate $f(\boldsymbol{\theta})$ once more with the updated $\boldsymbol{\theta}$ in step 4, for a total of $2d+1$ evaluations in every optimization step, and thus $(2d+1)T$ evaluations for a whole VQE process with $T$ iterations. This number increases with the problem size, since the number of parameters should grow and the number of iterations may also grow. It is well known that evaluating a Hamiltonian expectation value even once already requires a large number of circuit repetitions [6]; as stated above, this task must be repeated many times to complete the whole VQE process.
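The evaluation count above can be illustrated with a small sketch. It is not the authors' code: `toy_objective` is a hypothetical stand-in for the measured expectation value (a product of cosines, for which the $\pi/2$ shift rule with a prefactor of $1/2$ is exact), and `CountingObjective`, `parameter_shift_gradient`, and `run_descent` are names introduced only for this example.

```python
import math


class CountingObjective:
    """Wraps a cost function and counts how often it is evaluated."""

    def __init__(self, f):
        self.f = f
        self.calls = 0

    def __call__(self, theta):
        self.calls += 1
        return self.f(theta)


def toy_objective(theta):
    # Hypothetical stand-in for the measured expectation value f(theta):
    # a product of cosines, for which the pi/2 shift rule is exact.
    return math.prod(math.cos(t) for t in theta)


def parameter_shift_gradient(f, theta):
    """Gradient from two shifted evaluations per parameter (2d calls)."""
    grad = []
    for x in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[x] += math.pi / 2
        minus[x] -= math.pi / 2
        grad.append(0.5 * (f(plus) - f(minus)))
    return grad


def run_descent(f, theta, iterations, learning_rate=0.1):
    """Plain gradient descent; each iteration costs 2d + 1 evaluations."""
    for _ in range(iterations):
        grad = parameter_shift_gradient(f, theta)            # step 3: 2d calls
        theta = [t - learning_rate * g for t, g in zip(theta, grad)]
        f(theta)                                             # step 4: 1 call
    return theta


objective = CountingObjective(toy_objective)
theta0 = [0.3, 1.1, -0.7]          # d = 3
run_descent(objective, theta0, iterations=5)
print(objective.calls)             # (2 * 3 + 1) * 5 = 35
```

With $d=3$ and $T=5$, the counter reaches $(2d+1)T=35$; on hardware, each of these calls would itself require many circuit repetitions.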

Many elaborate methods have been proposed to reduce the number of circuit repetitions needed to evaluate the Hamiltonian expectation value, such as efficient state tomography [7, 8, 9], measurement distribution [6, 10, 11], and Hamiltonian partitioning [1, 12, 13, 14, 15, 11]. However, the number of measurements needed to reach a specific error is still very large. For example, it still requires hundreds of millions of circuit repetitions to evaluate even small molecules [11] (e.g., H2O, NH3) to chemical accuracy. Moreover, with such a large number of measurements, the classical overhead of postprocessing becomes nontrivial.

We start from a Hamiltonian that is partitioned into commuting groups, i.e.,

$$H=\sum_{k}G_{k}. \tag{2}$$

Since commuting operators share common eigenstates, the Pauli operators in the same group can be evaluated with the same set of measurement outcomes $\{b\}$ in the corresponding measurement basis. The classical cost of evaluating a bit string depends on the number of terms in the group. For example, a fully connected Ising Hamiltonian has all of its operators in the same group, since they are all composed of Pauli $Z$, and thus the cost scales as $O(N^{2})$. A molecular Hamiltonian is known to have $O(N^{4})$ operators, and a grouping strategy with $O(N^{3})$ groups will thus result in a linearly scaling number of operators in each group, which requires $O(N)$ cost to evaluate. Observing that $\langle b|G|b\rangle$ is fixed, yet is normally recalculated every time $b$ occurs, a straightforward strategy is to memorize it, since looking up a bit string of length $N$ in a dictionary requires only $O(1\times N)$ on average. This is the main idea of measurement memory (MM).

We implemented MM on fully connected Ising Hamiltonians of up to 20 qubits, finding an $O(N^{2})$ speedup in terms of the percentage of time saved. MM is also tested on the molecular Hamiltonians of H2, H4, LiH, and H2O, each with three different grouping strategies: qubit-wise commuting (QWC) grouping [1, 13], general commuting (GC) grouping [14], and Fermion grouping (FG) [16]. Each grouping method leads to a different scaling of the number of groups and of the number of operators in each group, and thus to a different degree of improvement.

II Measurement Memory

Consider a Hamiltonian $H$ that has been divided into $K$ commuting operator groups, i.e.,

$$H=\sum_{k=1}^{K}G_{k},\qquad G_{k}=\sum_{j}h_{j}^{k}O_{j}^{k}, \tag{3}$$

where $O_{j}^{k}\in\{I,\sigma_{X},\sigma_{Y},\sigma_{Z}\}^{\otimes N}$, and

$$[O_{i}^{k},O_{j}^{k}]=0. \tag{4}$$

The operators in the same commuting group are simultaneously diagonalizable and can be evaluated with the same set of measurements in the corresponding basis. Measurement memory (MM) creates an independent dictionary $\mathcal{M}_{k}$ for each group $G_{k}$ at the beginning. From then on, every time we evaluate the Hamiltonian expectation value from quantum computer measurement results (including when calculating the gradient with the parameter shift rule), we look up the value $\langle b|G_{k}|b\rangle$ in the corresponding dictionary $\mathcal{M}_{k}$ for each group, using $b$ as the key. If it does not exist, we evaluate it as in the normal process and store $b$ and $\langle b|G_{k}|b\rangle$ in the dictionary as key and value.

The pseudocode in Algorithm 1 represents the process of evaluating $\langle H\rangle$ once. With each measurement we evaluate, $\mathcal{M}$ accumulates entries and can save more and more computational cost.

 1  $E=0$
 2  foreach $G_{k}$ in $\{G_{1},G_{2},\ldots,G_{K}\}$ do
 3      $E_{k}=0$
 4      foreach $b$ in $\{b_{1},b_{2},\ldots,b_{m}\}_{k}$ do
 5          try:
 6              $E_{k}\mathrel{+}=\mathcal{M}_{k}[b]\cdot\frac{1}{m}$
 7          except:
 8              $\mathcal{M}_{k}[b]=\langle b|G_{k}|b\rangle$
 9              $E_{k}\mathrel{+}=\mathcal{M}_{k}[b]\cdot\frac{1}{m}$
10
11      $E\mathrel{+}=E_{k}$
    return $E$
Algorithm 1 Measurement Memory
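Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: it assumes every group has already been rotated into its measurement basis, so that each operator acts as $Z$ on a known set of qubits, and the `(coefficient, qubits)` encoding, the bit ordering `b[q]`, and the function names are all hypothetical.

```python
def group_eigenvalue(terms, b):
    """Return <b|G|b> for a group that is diagonal in the measured basis.

    terms: list of (coefficient, qubits) pairs; `qubits` are the indices on
           which the rotated operator acts as Z (hypothetical encoding).
    b:     measured bit string such as "0110"; b[q] is the outcome of qubit q.
    """
    total = 0.0
    for coefficient, qubits in terms:
        ones = sum(b[q] == "1" for q in qubits)
        total += coefficient * (-1.0 if ones % 2 else 1.0)
    return total


def expectation_with_memory(groups, samples, memories):
    """Evaluate <H> once, following Algorithm 1.

    groups[k], samples[k] and memories[k] belong to the same group G_k;
    memories[k] plays the role of the dictionary M_k and is updated in place.
    """
    energy = 0.0
    for terms, bitstrings, memory in zip(groups, samples, memories):
        m = len(bitstrings)
        group_energy = 0.0
        for b in bitstrings:
            try:
                value = memory[b]                    # seen before: look up
            except KeyError:
                value = group_eigenvalue(terms, b)   # first time: evaluate
                memory[b] = value
            group_energy += value / m
        energy += group_energy
    return energy


# One group G = Z_0 + 0.5 Z_0 Z_1 and four made-up shots.
groups = [[(1.0, (0,)), (0.5, (0, 1))]]
samples = [["00", "01", "00", "11"]]
memories = [{} for _ in groups]

energy = expectation_with_memory(groups, samples, memories)
print(energy)             # 0.75
print(len(memories[0]))   # 3 distinct bit strings stored
```

Passing the same `memories` list to later calls (for example, for the shifted parameter vectors of the gradient) is what lets previously seen bit strings skip `group_eigenvalue` entirely.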

Note that, when applying MM, one does not have to sort the measured bit strings $\{b_{1},b_{2},\ldots,b_{m}\}$ into probability or count dictionaries in advance, as is usually done in the regular procedure. We discuss the complexity of sorting and of MM further in the next section. We also analyze the time complexity in the Appendix, showing that the potential performance improvement of MM is mainly determined by the scaling of the number of operators in each group. Thus, the improvement will be very limited for an MM created for an operator group with sublinear scaling. On the other hand, we should expect a substantial performance improvement for operator groups with superlinear scaling.

III Comparison with Regular Procedure

For every commuting group $G_{k}$ in a Hamiltonian, we set up the circuit in the corresponding measurement basis and repeat the circuit, say $m$ times, to obtain a collection of measured bit strings $\{b_{1},b_{2},\ldots,b_{m}\}_{k}$. Typically, one would sort the set into distinct terms with corresponding counts or probabilities, such as $\{b_{1}:\Pr(b_{1}),b_{2}:\Pr(b_{2}),\ldots,b_{L}:\Pr(b_{L})\}_{k}$, where $L\leq m$. For simplicity, we refer to the process of evaluating $\langle G\rangle$ from $\{b_{1}:\Pr(b_{1}),b_{2}:\Pr(b_{2}),\ldots,b_{L}:\Pr(b_{L})\}_{k}$ as the "sort-and-evaluate" procedure and to the process of directly evaluating every bit string in $\{b_{1},b_{2},\ldots,b_{m}\}_{k}$ as the "naive evaluation". The sort-and-evaluate procedure reduces the number of evaluations of $\langle b|G|b\rangle$ by $m-L$. However, the sorting process is not free and might waste a lot of time in the worst case (i.e., $L=m$). Nevertheless, we show in the Appendix that in general and practical cases, one will still prefer the sort-and-evaluate procedure over naive evaluation.
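The two regular procedures can be contrasted in a few lines. The sketch below is illustrative only: `sum_z_eigenvalue` is a hypothetical group $G=\sum_{i}Z_{i}$ chosen because its eigenvalue is easy to check by hand, and both helper functions return the number of eigenvalue evaluations alongside the estimate.

```python
from collections import Counter


def naive_evaluation(eigenvalue, bitstrings):
    """Evaluate every shot; returns (estimate, number of eigenvalue calls)."""
    m = len(bitstrings)
    return sum(eigenvalue(b) for b in bitstrings) / m, m


def sort_and_evaluate(eigenvalue, bitstrings):
    """Count distinct shots first, then evaluate each distinct one once."""
    m = len(bitstrings)
    counts = Counter(bitstrings)             # probability dictionary, up to 1/m
    estimate = sum(eigenvalue(b) * c / m for b, c in counts.items())
    return estimate, len(counts)


def sum_z_eigenvalue(b):
    # Hypothetical group G = sum_i Z_i: each '0' contributes +1, each '1' -1.
    return len(b) - 2 * b.count("1")


shots = ["00", "01", "00", "11", "00"]       # m = 5, L = 3
naive_value, naive_calls = naive_evaluation(sum_z_eigenvalue, shots)
sorted_value, sorted_calls = sort_and_evaluate(sum_z_eigenvalue, shots)
print(naive_calls, sorted_calls)             # 5 3
```

Both functions return the same estimate; the difference is that the sorted version calls the eigenvalue routine $L$ times instead of $m$ times, at the price of building the count dictionary.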

One thing to mention here is that the MM procedure already incorporates the process of sorting raw measurement results into probability dictionaries. Thus, we do not have to sort the measured results in advance when applying MM. The main difference is that we now store their eigenvalues ($\langle b|G|b\rangle$) instead of probabilities. While a probability dictionary is useful only for one specific evaluation of the expectation value, the information in MM persists through the whole VQE process and can save more and more cost as the iterations go on. We further prove that the computational cost of the worst-case scenario of MM, in which no eigenvalue of the measured bit strings is stored in memory, is identical to that of the sort-and-evaluate procedure. This indicates that there is no downside to adopting MM in terms of time complexity.

IV Numerical Simulation

IV-A Ising Hamiltonian

Here we implement MM on fully connected Ising Hamiltonians up to 20 qubits. A random fully connected Ising Hamiltonian can be written as

$$H_{I}=\sum^{N}_{i=1}\sum^{N}_{j>i}h_{ij}Z_{i}Z_{j}+\sum^{N}_{i=1}h_{i}Z_{i}, \tag{5}$$

where $N$ is the number of qubits and $h_{ij}\neq 0$. Hamiltonians of this type are commonly seen when solving quadratic unconstrained binary optimization (QUBO) problems [17]. The operators all consist of Pauli $Z$; thus we have $\binom{N}{2}+N$ local operator terms in the same commuting group and require only a single measurement basis $\{Z\}^{\otimes N}$.

We observed a special property of the Ising Hamiltonian: given the measured bit string set $\{b\}$,

$$\begin{aligned}
\langle H_{I}\rangle &=\sum_{i}\sum_{j>i}\left[\frac{1}{m}\sum_{b\in\{b\}}h_{ij}\langle b|Z_{i}Z_{j}|b\rangle\right]+\sum_{i}\left[\frac{1}{m}\sum_{b\in\{b\}}h_{i}\langle b|Z_{i}|b\rangle\right]\\
&=\frac{1}{m}\sum_{b\in\{b\}}\left[\sum_{i}\sum_{j>i}h_{ij}\langle b|Z_{i}Z_{j}|b\rangle+\sum_{i}h_{i}\langle b|Z_{i}|b\rangle\right]\\
&=\frac{1}{m}\sum_{b\in\{b\}}\langle b|H_{I}|b\rangle.
\end{aligned} \tag{6}$$
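The identity in Eq. (6) can be checked numerically. The following sketch uses made-up random couplings and random shots (not data from the paper), and the dictionary `h_pair` keyed by `(i, j)` with `i < j` is an assumed encoding of $h_{ij}$.

```python
import random


def ising_eigenvalue(h_pair, h_single, b):
    """<b|H_I|b> for the fully connected Ising Hamiltonian of Eq. (5)."""
    z = [1 - 2 * int(bit) for bit in b]      # '0' -> +1, '1' -> -1
    n = len(z)
    energy = sum(h_single[i] * z[i] for i in range(n))
    energy += sum(h_pair[i, j] * z[i] * z[j]
                  for i in range(n) for j in range(i + 1, n))
    return energy


def term_by_term(h_pair, h_single, shots):
    """First line of Eq. (6): average each operator over the shots."""
    m, n = len(shots), len(shots[0])
    zs = [[1 - 2 * int(bit) for bit in b] for b in shots]
    total = sum(h_single[i] * sum(z[i] for z in zs) / m for i in range(n))
    total += sum(h_pair[i, j] * sum(z[i] * z[j] for z in zs) / m
                 for i in range(n) for j in range(i + 1, n))
    return total


def per_bitstring(h_pair, h_single, shots):
    """Last line of Eq. (6): average the eigenvalue of each shot."""
    return sum(ising_eigenvalue(h_pair, h_single, b) for b in shots) / len(shots)


rng = random.Random(0)                       # arbitrary seed, made-up couplings
n = 4
h_single = [rng.uniform(-1, 1) for _ in range(n)]
h_pair = {(i, j): rng.uniform(-1, 1) for i in range(n) for j in range(i + 1, n)}
shots = ["".join(rng.choice("01") for _ in range(n)) for _ in range(50)]

difference = term_by_term(h_pair, h_single, shots) - per_bitstring(h_pair, h_single, shots)
print(abs(difference) < 1e-9)                # True
```

The per-bit-string form is the one MM exploits: `ising_eigenvalue` depends only on `b`, so its result can be cached under that key.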

This implies that the cost of post-measurement classical computation per measurement is equivalent to that of checking one eigenstate in a classical search. Due to the large number of circuit repetitions needed for evaluating the Hamiltonian expectation value, it is very likely that the number of circuit repetitions exceeds the search space of the problem ($2^{N}$) for problem sizes that are not large enough. It is thus difficult to justify the usage of VQE for Ising Hamiltonians on intermediate-scale quantum computers today and in the near future. MM provides a simple strategy to bypass this limitation.

The measurement is done after an ansatz with one CNOT layer and $2N$ parameters. We adopted an $O(N^{2})$ scaling of circuit repetitions with problem size for each evaluation of the Hamiltonian expectation value, proportional to the scaling of the number of operator terms. We take 10 initial parameter guesses and run VQE for 200 iterations. The final result is an average over the 10 runs. Note that, at every qubit size, the Hamiltonian is the same for both the normal procedure and the MM procedure, and so are the 10 initial guesses.

The results are shown in Fig. 1. MM achieves a quadratic speedup in terms of percentage time saved compared with the normal evaluation procedure. There is an additional benefit of using MM for optimization problems. Since the whole Ising Hamiltonian is one commuting group, $\langle b|G|b\rangle=\langle b|H_{I}|b\rangle$, where $b$ is the measured bit string and also an eigenstate of $H_{I}$. Thus, we are able to store the eigenvalue of every eigenstate that is ever measured during the whole VQE process. This largely increases the probability of finding a good solution (i.e., a low-energy eigenstate) due to the large number of measurements.
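Because the memory of the single Ising group maps each measured bit string to its energy, the best candidate seen so far can be read off directly. A minimal sketch, with made-up dictionary contents and a hypothetical helper name:

```python
def best_solution(memory):
    """Return the (bit string, energy) pair with the lowest stored eigenvalue."""
    return min(memory.items(), key=lambda item: item[1])


# Made-up memory contents, as they might look after some VQE iterations.
memory = {"000": 1.5, "101": -2.25, "110": 0.5}
best_bits, best_energy = best_solution(memory)
print(best_bits, best_energy)    # 101 -2.25
```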

Figure 1: (a) CPU time ($t$) for 200 steps of optimization of fully connected Ising Hamiltonians with increasing problem size. (b) Percentage time saved, i.e., $(t_{Normal}-t_{MM})/t_{Normal}$, with MM.

| Mol | QWC total | QWC groups | GC total | GC groups | FG total | FG groups |
|-----|-----------|------------|----------|-----------|----------|-----------|
| H2  | 15        | 5          | 15       | 2         | 34       | 4         |
| H4  | 185       | 67         | 185      | 9         | 317      | 11        |
| LiH | 631       | 151        | 631      | 34        | 877      | 22        |
| H2O | 1086      | 556        | 1086     | 90        | 1611     | 29        |

TABLE I: Total operator terms and commuting groups for different grouping methods.

IV-B Molecular Hamiltonian

It is well known that the number of Pauli operator terms scales as $O(N^{4})$ with system size for molecular Hamiltonians. Fortunately, several methods have been proposed to partition those operators into commuting groups [13, 14, 16]. Although grouping may not directly reduce the number of circuit repetitions without a careful selection of grouping terms [16], MM reduces more of the classical postprocessing overhead the more terms are grouped together. We test MM on three grouping strategies: qubit-wise commuting (QWC) grouping [1, 13], general commuting (GC) grouping [14], and Fermion grouping (FG) [16]. QWC results in $O(N^{4})$ groups with a constant number of terms in each group, while GC results in $O(N^{3})$ groups with a linearly scaling number of terms in each group. FG is a more special case since it permits reasonable discarding of small eigenvalues in the second-quantized Hamiltonian, resulting in $O(N)$ groups, where each group contains $O(N^{2})$ terms for small molecules, approaching $O(\log^{2}(N))$ as the system size becomes large [18].
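The difference between the QWC and GC criteria can be made concrete with two small predicates on Pauli strings. This is a sketch of the commutation tests only, not of the grouping algorithms in [13, 14, 16]; it assumes Pauli strings of equal length written over the letters `I`, `X`, `Y`, `Z`, and the function names are illustrative.

```python
def qubit_wise_commute(p, q):
    """True if, on every qubit, the two Pauli letters are equal or one is I."""
    return all(a == b or "I" in (a, b) for a, b in zip(p, q))


def commute(p, q):
    """True if the two Pauli strings commute as whole operators.

    Two Pauli strings commute exactly when the number of qubits on which
    they carry different non-identity letters is even.
    """
    clashes = sum(a != b and "I" not in (a, b) for a, b in zip(p, q))
    return clashes % 2 == 0


print(qubit_wise_commute("XI", "XZ"))   # True
print(qubit_wise_commute("XX", "YY"))   # False
print(commute("XX", "YY"))              # True
print(commute("XI", "ZI"))              # False
```

Every QWC pair also commutes in the general sense, but not the other way round (`XX` and `YY` above), which is why GC can pack more terms into each group.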

Here we demonstrate the improvement of MM on H2 (4 qubits), H4 (8 qubits), LiH (12 qubits), and H2O (14 qubits) molecules with the STO-3G basis set, transformed via the Jordan–Wigner (J-W) transformation. The comparison of MM with the normal evaluation procedure is shown in Fig. 2. As expected, the order of improvement is FG > GC > QWC, since the improvement of MM is more significant if there are more terms in the same group, and also because FG results in more Pauli operator terms in total (TABLE I). The time saved in terms of percentage is shown in Fig. 3. All three grouping methods (i.e., QWC, GC, and FG grouping) achieve linear to superlinear improvement in percentage time saved. However, for the FG case, the qubit sizes we are able to demonstrate here are at the transition point of the term scaling, i.e., $O(N^{2})$ to $O(\log^{2}(N))$, making it more difficult to estimate the scaling of the improvement of MM for larger qubit systems.

Figure 2: CPU time required for 100 steps of optimization, with different grouping methods and a number of circuit repetitions for each commuting group that scales as $N$ with the number of qubits. (a) QWC, (b) GC, (c) FG, for H2 (4 qubits), H4 (8 qubits), LiH (12 qubits), and H2O (14 qubits) molecular Hamiltonians with a minimal basis set. The shaded regions are the standard deviation over 10 predetermined initial parameter guesses.
Figure 3: Percentage time saved, i.e., $(t_{Normal}-t_{MM})/t_{Normal}$, with MM, using different grouping methods for molecular Hamiltonians.

V CONCLUSIONS and OUTLOOK

In this article, we introduced a measurement memory (MM) dictionary designed to store a measured bit string $b$ and its eigenvalue $\langle b|G|b\rangle$ for a commuting group in a Hamiltonian. Throughout the VQE process, MM accumulates records, removing the computational cost of evaluating $b$ over all terms in $G$ each time $b$ occurs after the first instance. We achieved $O(N^{2})$ percentage time savings for a fully connected Ising Hamiltonian, with the additional benefit of increasing the probability of finding a low-energy solution. MM also provides over $O(N)$ time savings for molecular Hamiltonians, depending on the grouping method. With careful selection of grouping elements, one may be able to reduce both the circuit repetitions and the classical postprocessing overhead. We further compared the time complexity of the regular sort-and-evaluate procedure with that of MM, finding that MM is a more efficient way of storing useful information for evaluating Hamiltonian expectation values in VQE, since the information persists through the whole process. Moreover, the worst-case scenario of MM is identical to the regular procedure. Thus, MM is a more efficient postprocessing procedure for evaluating Hamiltonian expectation values in VQE, and employing MM has no downside. For future applications, MM can also be applied to other deterministic measurement schemes, such as other grouping methods and derandomized shadows [9], as long as the measurement bases are fixed.

References

  • [1] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik, “The theory of variational hybrid quantum-classical algorithms,” New Journal of Physics, vol. 18, no. 2, p. 023023, 2016.
  • [2] A. Kandala, A. Mezzacapo, K. Temme, M. Takita, M. Brink, J. M. Chow, and J. M. Gambetta, “Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets,” Nature, vol. 549, no. 7671, pp. 242–246, 2017.
  • [3] M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, et al., “Variational quantum algorithms,” Nature Reviews Physics, vol. 3, no. 9, pp. 625–644, 2021.
  • [4] J. Preskill, “Quantum computing in the nisq era and beyond,” Quantum, vol. 2, p. 79, 2018.
  • [5] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, “Quantum circuit learning,” Physical Review A, vol. 98, no. 3, p. 032309, 2018.
  • [6] D. Wecker, M. B. Hastings, and M. Troyer, “Progress towards practical quantum variational algorithms,” Physical Review A, vol. 92, no. 4, p. 042303, 2015.
  • [7] S. Aaronson, “Shadow tomography of quantum states,” SIAM Journal on Computing, vol. 49, no. 5, pp. STOC18–368, 2019.
  • [8] H.-Y. Huang, R. Kueng, and J. Preskill, “Predicting many properties of a quantum system from very few measurements,” Nature Physics, vol. 16, no. 10, pp. 1050–1057, 2020.
  • [9] H.-Y. Huang, R. Kueng, and J. Preskill, “Efficient estimation of pauli observables by derandomization,” Physical review letters, vol. 127, no. 3, p. 030503, 2021.
  • [10] N. C. Rubin, R. Babbush, and J. McClean, “Application of fermionic marginal constraints to hybrid quantum algorithms,” New Journal of Physics, vol. 20, no. 5, p. 053020, 2018.
  • [11] T.-C. Yen, A. Ganeshram, and A. F. Izmaylov, “Deterministic improvements of quantum measurements with grouping of compatible operators, non-local transformations, and covariance estimates,” npj Quantum Information, vol. 9, no. 1, p. 14, 2023.
  • [12] A. Jena, S. Genin, and M. Mosca, “Pauli partitioning with respect to gate sets,” arXiv preprint arXiv:1907.07859, 2019.
  • [13] V. Verteletskyi, T.-C. Yen, and A. F. Izmaylov, “Measurement optimization in the variational quantum eigensolver using a minimum clique cover,” The Journal of chemical physics, vol. 152, no. 12, p. 124114, 2020.
  • [14] T.-C. Yen, V. Verteletskyi, and A. F. Izmaylov, “Measuring all compatible operators in one series of single-qubit measurements using unitary transformations,” Journal of chemical theory and computation, vol. 16, no. 4, pp. 2400–2409, 2020.
  • [15] P. Gokhale, O. Angiuli, Y. Ding, K. Gui, T. Tomesh, M. Suchara, M. Martonosi, and F. T. Chong, “$O(N^{3})$ measurement cost for variational quantum eigensolver on molecular Hamiltonians,” IEEE Transactions on Quantum Engineering, vol. 1, pp. 1–24, 2020.
  • [16] W. J. Huggins, J. R. McClean, N. C. Rubin, Z. Jiang, N. Wiebe, K. B. Whaley, and R. Babbush, “Efficient and noise resilient measurements for quantum chemistry on near-term quantum computers,” npj Quantum Information, vol. 7, no. 1, p. 23, 2021.
  • [17] F. Glover, G. Kochenberger, and Y. Du, “A tutorial on formulating and using qubo models,” arXiv preprint arXiv:1811.11538, 2018.
  • [18] M. Motta, E. Ye, J. R. McClean, Z. Li, A. J. Minnich, R. Babbush, and G. K.-L. Chan, “Low rank representations for quantum simulation of electronic structure,” npj Quantum Information, vol. 7, no. 1, p. 83, 2021.

Time Complexity of sort-and-evaluate
Given $\{b_{1},b_{2},\ldots,b_{m}\}$ with $m$ total terms and $L$ distinct terms, the set can be sorted into a dictionary $\{b_{1}:\Pr(b_{1}),b_{2}:\Pr(b_{2}),\ldots,b_{L}:\Pr(b_{L})\}$, where $L\leq m$. The sorting process is

$D$ = {}
for $b$ in $\{b_{1},b_{2},\ldots,b_{m}\}$ do
    if $b$ in $D$ then
        $D[b]$ = $D[b]$ + $\frac{1}{m}$
    else
        $D[b]$ = $\frac{1}{m}$
Algorithm 2 sort measurement

First we analyze the complexity of the sort-and-evaluate method. Dictionaries in Python are implemented with hash tables, so checking whether a new key $b$ is in $D$ requires calculating the hash function, which costs $\mathcal{O}(N)$ since each $b$ has length $N$. If the hash function points to an empty slot, we know $b$ is not in $D$, and we add it to $D$ at cost $\mathcal{O}(1)$. If the hash function points to an occupied slot, we need to check whether $b$ is the same bit string as the stored one to guard against hash collisions, costing an additional $\mathcal{O}(N)$. With the bit string set given above, this results in $L$ "no"s and $m-L$ "yes"s (to the question of whether the slot is occupied). Since we only have to evaluate $\langle b|G|b\rangle$ for distinct $b$, the complexity of sort-and-evaluate is

$$L\left[\mathcal{O}(N)+\mathcal{O}(G)\right]+(m-L)\,2\mathcal{O}(N), \tag{7}$$

where $\mathcal{O}(G)$ is the complexity of evaluating $\langle b|G|b\rangle$ for one bit string.

On the other hand, naive evaluation simply evaluates $\langle b|G|b\rangle$ for every $b$ in $\{b_{1},b_{2},\ldots,b_{m}\}$, making the complexity

$$m\,\mathcal{O}(G). \tag{8}$$

The condition for choosing sort-and-evaluate over naive evaluation is

$$L\left[\mathcal{O}(N)+\mathcal{O}(G)\right]+(m-L)\,2\mathcal{O}(N)\leq m\,\mathcal{O}(G), \tag{9}$$

and thus

$$\mathcal{O}(G)\geq\frac{2m-L}{m-L}\mathcal{O}(N). \tag{10}$$
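For readers who want the intermediate algebra, the step from condition (9) to condition (10) can be written out as follows, treating $\mathcal{O}(N)$ and $\mathcal{O}(G)$ as ordinary positive quantities and assuming $m>L$ so that the final division is allowed:

```latex
\begin{aligned}
L\left[\mathcal{O}(N)+\mathcal{O}(G)\right]+2(m-L)\,\mathcal{O}(N) &\leq m\,\mathcal{O}(G)\\
L\,\mathcal{O}(N)+2(m-L)\,\mathcal{O}(N) &\leq (m-L)\,\mathcal{O}(G)\\
(2m-L)\,\mathcal{O}(N) &\leq (m-L)\,\mathcal{O}(G)\\
\frac{2m-L}{m-L}\,\mathcal{O}(N) &\leq \mathcal{O}(G)
\end{aligned}
```

The prefactor $\frac{2m-L}{m-L}$ equals 2 when $L=0$ and diverges as $L\rightarrow m$, which is what separates the two extreme cases considered next.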

Now we consider two extreme cases. The first is an ansatz with a uniform probability distribution over all eigenstates. Sampling from this ansatz is equivalent to uniform random sampling. In this case, we can estimate $L$ by calculating the expected number of distinct values when drawing $m$ times at random from the space of $2^{N}$ binary strings,

$$\mathbb{E}[L]=2^{N}\left(1-\left(1-\frac{1}{2^{N}}\right)^{m}\right), \tag{11}$$

and $\mathbb{E}[L]\rightarrow m$ as $2^{N}\rightarrow\infty$. Thus, setting $L=m$ in condition 10, we end up with $\mathcal{O}(G)\geq\infty$, which implies that one should conduct naive evaluation in this case. However, although naive evaluation is always better in this case, the degree of improvement may not be significant. Comparing the complexity of the two schemes as $L\rightarrow m$, the $m\mathcal{O}(N)$ improvement, although it may be large if $m$ scales badly, is not the bottleneck if $\mathcal{O}(G)\geq\mathcal{O}(N)$. Since evaluating $\langle b|G|b\rangle$ requires looping through every Pauli word on each qubit for every operator in the group,

$$\mathcal{O}(G)=\mathcal{O}(N)\mathcal{O}(T), \tag{12}$$

where $\mathcal{O}(T)$ is the scaling of the number of operator terms in group $G$. This shows that $\mathcal{O}(G)\geq\mathcal{O}(N)$ is always true, and thus the additional cost of adopting the sort-and-evaluate scheme will not be the bottleneck. The second case is a highly concentrated ansatz, i.e., $L\ll m$. In this case, condition 10 becomes

$$\mathcal{O}(G)\geq\mathcal{O}(2N), \tag{13}$$

suggesting that if the complexity of evaluating $\langle b|G|b\rangle$ is worse than linear, which, as mentioned above, is true in general, one should conduct the sort-and-evaluate scheme.

This discussion thus concludes that one can adopt the sort-and-evaluate scheme with an advantage in most cases, and with an acceptable disadvantage in the remaining cases.
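The uniform-sampling estimate of Eq. (11) is easy to check numerically. The sketch below compares the closed form with a small Monte Carlo simulation; the function names, the seed, and the chosen sizes are arbitrary and serve only as an illustration.

```python
import random


def expected_distinct(n_qubits, m):
    """Closed form of Eq. (11): expected number of distinct strings in m draws."""
    space = 2 ** n_qubits
    return space * (1.0 - (1.0 - 1.0 / space) ** m)


def simulated_distinct(n_qubits, m, trials=2000, seed=0):
    """Monte Carlo estimate of the same quantity under uniform sampling."""
    rng = random.Random(seed)
    space = 2 ** n_qubits
    total = 0
    for _ in range(trials):
        total += len({rng.randrange(space) for _ in range(m)})
    return total / trials


print(expected_distinct(2, 2))                 # 1.75
print(expected_distinct(4, 10))                # about 7.6 of 16 strings
print(simulated_distinct(4, 10))               # close to the value above
print(expected_distinct(40, 1000) / 1000)      # close to 1: almost every shot is new
```

The last line illustrates the limit $\mathbb{E}[L]\rightarrow m$: once $2^{N}$ is much larger than $m$, a uniformly distributed ansatz almost never repeats a bit string.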

Time Complexity of MM
As shown in Algorithm 1, for every bit string $b$ in $\{b_{1},b_{2},\ldots,b_{m}\}$, we check whether $b$ is in $\mathcal{M}$. This again takes $\mathcal{O}(N)$ to calculate the hash function and see whether the corresponding slot is occupied. A "no" requires evaluating $\langle b|G|b\rangle$, and a "yes" requires checking whether it is exactly the same string. Assume the same set of bit strings as in the first part of the Appendix, with $m$ bit strings in total and $L$ distinct ones, and suppose that $\ell$ of the $L$ distinct bit strings are already stored in $\mathcal{M}$ with their corresponding eigenvalues $\langle b|G|b\rangle$. We thus get $L-\ell$ "no"s and $m-L+\ell$ "yes"s, and the complexity of the naive evaluation of $\langle G\rangle$ given $\mathcal{M}$ is

$$(L-\ell)\left[\mathcal{O}(N)+\mathcal{O}(G)\right]+(m-L+\ell)\,2\mathcal{O}(N). \tag{14}$$

Now suppose one sorts the bit strings into a probability dictionary before feeding them into MM, i.e., the $\{b_{1},b_{2},\ldots,b_{m}\}_{k}$ in the 4th step of Algorithm 1 becomes $\{b_{1}:\Pr(b_{1}),b_{2}:\Pr(b_{2}),\ldots,b_{L}:\Pr(b_{L})\}$, and every subsequent $\frac{1}{m}$ becomes $\Pr(b)$. The complexity is then

$$L\,\mathcal{O}(N)+(m-L)\,2\mathcal{O}(N)+(L-\ell)\left[\mathcal{O}(N)+\mathcal{O}(G)\right]+\ell\,2\mathcal{O}(N). \tag{15}$$

The first two terms represent the cost of sorting, and the last two terms represent the cost of looking up the $L$ sorted bit strings in $\mathcal{M}$. Comparing equation 14 with equation 15, the terms $(L-\ell)\left[\mathcal{O}(N)+\mathcal{O}(G)\right]$ and $(m-L+\ell)\,2\mathcal{O}(N)$ appear in both, so sorting before MM introduces an additional $L\,\mathcal{O}(N)$ computational cost. Although this is not the bottleneck in most cases, it is entirely unnecessary to sort in advance when applying MM.

In equation 14, we can see that in the worst case, $\ell=0$ (for example, at the first step), the complexity is equal to that of the sort-and-evaluate scheme, i.e., equation 7. For every extra state we store (i.e., $\ell\rightarrow\ell+1$), we trade a cost of $\mathcal{O}(N)+\mathcal{O}(G)$ for $2\mathcal{O}(N)$. Since $\mathcal{O}(G)\geq\mathcal{O}(N)$ in general (one has to go through the Pauli word on every qubit, including "I"), this shows explicitly that MM is at least as efficient as the original sort-and-evaluate procedure (without MM). Furthermore, given enough iterations, as $\mathcal{M}$ accumulates we expect $L-\ell$ to approach zero asymptotically, making the complexity

$$m\,\mathcal{O}(2N). \tag{16}$$
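The cost expressions of Eqs. (7), (8), and (14) can be compared with a small cost model. This is only a sketch: the numbers `n` and `g` stand in for $\mathcal{O}(N)$ and $\mathcal{O}(G)$ with all constant factors set to one, and the shot statistics are made up.

```python
def cost_sort_and_evaluate(m, L, n, g):
    """Eq. (7): L new keys, m - L repeated keys."""
    return L * (n + g) + (m - L) * 2 * n


def cost_naive(m, g):
    """Eq. (8): evaluate every shot."""
    return m * g


def cost_mm(m, L, ell, n, g):
    """Eq. (14): ell of the L distinct strings are already in memory."""
    return (L - ell) * (n + g) + (m - L + ell) * 2 * n


m, L = 1000, 200            # made-up shot statistics
n = 10                      # stands in for O(N)
g = n * 55                  # stands in for O(G) = O(N) O(T), with T = 55 terms

print(cost_naive(m, g))                     # 550000
print(cost_sort_and_evaluate(m, L, n, g))   # 128000
print(cost_mm(m, L, 0, n, g))               # 128000, same as Eq. (7)
print(cost_mm(m, L, L, n, g))               # 20000 = 2 * m * n, Eq. (16)
```

In this model the MM cost falls monotonically from the sort-and-evaluate value at $\ell=0$ to the lookup-only value of Eq. (16) at $\ell=L$, as long as `g >= n`.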