Simple and high-precision Hamiltonian simulation by compensating Trotter error with linear combination of unitary operations

Pei Zeng peizeng@uchicago.edu Pritzker School of Molecular Engineering, The University of Chicago, Illinois 60637, USA Jinzhao Sun jinzhao.sun.phys@gmail.com Clarendon Laboratory, University of Oxford, Parks Road, Oxford OX1 3PU, United Kingdom Blackett Laboratory, Imperial College London, London SW7 2AZ, United Kingdom Liang Jiang liang.jiang@uchicago.edu Pritzker School of Molecular Engineering, The University of Chicago, Illinois 60637, USA Qi Zhao zhaoqi@cs.hku.hk QICI Quantum Information and Computation Initiative, Department of Computer Science, The University of Hong Kong, Pokfulam Road, Hong Kong

(September 30, 2025)

Abstract

Trotter and linear-combination-of-unitaries (LCU) methods are two popular approaches to Hamiltonian simulation. The Trotter method is easy to implement and enjoys good system-size dependence endowed by commutator scaling, while the LCU method admits high-accuracy simulation with a smaller gate cost. We propose Hamiltonian simulation algorithms that use LCU to compensate the Trotter error and thereby enjoy the advantages of both. By adding a few gates after the $K$th-order Trotter formula, we realize a better time scaling than the $2K$th-order Trotter formula. Our first algorithm exponentially improves the accuracy scaling of the $K$th-order Trotter formula. For a generic Hamiltonian, the estimated gate counts of the first algorithm can be 2 orders of magnitude smaller than the best analytical bound of the fourth-order Trotter formula. In the second algorithm, we consider the detailed structure of Hamiltonians and construct LCU formulas for Trotter errors with commutator scaling. Consequently, for lattice Hamiltonians, the algorithm enjoys almost linear system-size dependence and quadratically improves the accuracy of the $K$th-order Trotter formula. For lattice systems, the second algorithm can achieve 3 to 4 orders of magnitude higher accuracy with the same gate cost as the optimal Trotter algorithm. These algorithms provide an easy-to-implement approach to low-cost, high-precision Hamiltonian simulation.

I INTRODUCTION

Hamiltonian simulation, i.e., to simulate the real-time evolution $U(t)=e^{-iHt}$ of a physical Hamiltonian $H=\sum_{l}H_{l}$ , is considered to be a natural and powerful application of quantum computing Feynman (1982). It can also be used as an important subroutine in many other quantum algorithms like ground-state preparation Abrams and Lloyd (1999); Aspuru-Guzik et al. (2005), optimization problems Farhi et al. (2014); Zhou et al. (2020), and quantum linear solvers Harrow et al. (2009). To pursue real-world applications of Hamiltonian simulation with near-term quantum devices, we need to design feasible algorithms with small space complexity (i.e., qubit number) and time complexity (i.e., circuit depth and gate number).

One of the most natural Hamiltonian simulation methods is based on Trotter formulas Lloyd (1996); Suzuki (1990, 1991); Berry et al. (2007); Campbell (2019); Childs et al. (2019); Childs and Su (2019); Endo et al. (2019); Heyl et al. (2019); Chen et al. (2021); Şahinoğlu and Somma (2021); Su et al. (2021); Tran et al. (2020); Childs et al. (2021); Layden (2021); Zhao et al. (2022), which approximate the real-time evolution of $H=\sum_{l=1}^{L}H_{l}$ by a product of the simple evolutions of its summands, $e^{-iH_{l}t}$. Besides their prominent advantage of simple realization without ancillas, Trotter methods were recently rigorously shown to enjoy commutator scaling Childs and Su (2019); Childs et al. (2021), i.e., the Trotter error depends only on the nested commutators of the Hamiltonian summands $\{H_{l}\}$. This is very helpful for Hamiltonians with strong locality constraints. For example, for $n$-qubit lattice Hamiltonians, the gate cost of high-order Trotter methods is almost linear in the system size $n$, which is nearly optimal Childs and Su (2019). The major drawback of Trotter methods is that their gate cost is polynomial in the inverse accuracy $1/\varepsilon$, i.e., $\mathrm{Poly}(1/\varepsilon)$. This is unfavorable in many applications where high-precision simulation is required to obtain practical advantages over existing classical algorithms Reiher et al. (2017).

In recent years, we have seen the development of “post-Trotter” algorithms with exponentially improved accuracy dependence Berry et al. (2014, 2015, 2015); Low and Chuang (2019, 2017); Low (2019). Owing to a smart choice of expansion formulas (i.e., Taylor series Berry et al. (2015) or Jacobi-Anger expansion Low and Chuang (2017)), these post-Trotter methods capture the dominant terms in the time-evolution operator $U(t)$ with polynomially increasing gate resources, leading to a logarithmic gate-number dependence on the accuracy requirement $1/\varepsilon$. Unlike Trotter methods, these advanced algorithms cannot exploit the specific structure of Hamiltonians because they lack a commutator-based error form. Consequently, for $n$-qubit lattice Hamiltonians, for instance, their gate complexities are $\mathcal{O}(n^{2})$, which is worse than the $\mathcal{O}(n^{1+o(1)})$ of Trotter algorithms. Furthermore, these post-Trotter algorithms require the implementation of linear combination of unitary (LCU) formulas Childs and Wiebe (2012); Long (2011) or block encoding of Hamiltonians Low and Chuang (2019), which often costs many ancillary qubits and multicontrolled Toffoli gates. This is still experimentally challenging on a near-term or early fault-tolerant quantum computer Lin and Tong (2022). To reduce the hardware requirement of compiling the LCU formulas, recent studies focus on a random-sampling implementation of an LCU formula Childs and Wiebe (2012); Yang et al. (2021); Wan et al. (2022); Faehrmann et al. (2022), where the elementary unitaries are sampled to realize the LCU formula statistically. In this case, the Hamiltonian simulation is not performed by coherently implementing the unitary $U=e^{-iHt}$, but is instead realized through random sampling. This method remains effective for common applications, such as estimating the properties of the final state. Similar ideas have also been studied in ground-state preparation algorithms Lin and Tong (2020); Zeng et al. (2021); Zhang et al. (2022).

Here, we propose composite algorithms that combine the inherent advantages of Trotter and LCU methods—easy implementation, high precision, and commutator scaling—by performing the Trotter method and then compensating the Trotter error with the LCU formulas we construct. We primarily focus on the random-sampling implementation of the LCU formula Childs and Wiebe (2012); Yang et al. (2021); Wan et al. (2022); Faehrmann et al. (2022), with the goal of estimating the properties of the target state after real-time evolution. We demonstrate that optimal performance can be achieved by allowing the Trotter circuit to handle the majority of the simulation, with the LCU method completing the remainder.

In Sec. II, we provide a summary of our construction and results, aimed at a general audience. We explain the key ideas behind the constructions with intuitive examples. For readers interested in the technical aspects, we introduce the necessary preliminary knowledge of Hamiltonian simulation in Sec. III to facilitate understanding of the technical results. Next, in Sec. IV and Sec. V, we present a detailed construction and gate complexity analysis of the two Trotter-LCU algorithms. Finally, in Sec. VI, we conclude our discussion and outline possible future directions.

II SUMMARY OF RESULTS

II.1 General idea

The main idea of the proposed Trotter-LCU algorithm is illustrated in Fig. 1. In a normal $K$th-order Trotter circuit, we decompose the time evolution $U(t)=e^{-iHt}$ into $\nu$ segments, each with a small evolution time $x=t/\nu$. For consistency, we denote the $0$th-order Trotter formula as $S_{0}(x)=I$. After we apply the $K$th-order Trotter formula $S_{K}(x)$ ($K=0$, $1$, or $2k$ with $k\in\mathbb{N}_{+}$), there remains a Trotter error $V_{K}(x):=U(x)S_{K}(x)^{\dagger}$, which limits the simulation accuracy. To address this problem, we introduce a random LCU formula that compensates the Trotter error using one ancilla and simple gates, achieving high-precision Hamiltonian simulation at low cost.

Figure 1: (a) In the normal $K$th-order Trotter circuit, there will be a remaining Trotter error $V_{K}(x)$ after each segment. (b) We introduce random LCU formulas to compensate $V_{K}(x)$ with a single ancilla qubit and simple gates.

Consider the following LCU formula of an operator $V$ ,

\tilde{V}=\mu\sum_{i=0}^{\Gamma-1}\Pr(i)V_{i},

(1)

such that the spectral norm distance $\|V-\tilde{V}\|\leq\varepsilon$ . Here, $\mu>0$ is the $1$ -norm (i.e., $l_{1}$ -norm) of the coefficient vector, $\Pr(i)$ is a probability distribution over different unitaries $V_{i}$ , and $\{V_{i}\}_{i=0}^{\Gamma-1}$ is a set of unitaries. There are usually two ways to implement the LCU formula: the coherent implementation Berry et al. (2014, 2015) and the random-sampling implementation Yang et al. (2021); Wan et al. (2022); Faehrmann et al. (2022). Our major focus is on the random-sampling implementation, where we can estimate the properties of the target state $\rho=U(t)\rho_{0}U(t)^{\dagger}$ with only one ancillary qubit. In Appendix H, we discuss the potential use of the coherent implementation of our algorithm. In the random-sampling implementation, we can use Eq. 1 to estimate an arbitrary observable value $O$ on $\rho$ ,

\mathrm{Tr}(O\rho)\approx\mu^{2}\sum_{i,j}\Pr(i)\Pr(j)\mathrm{Tr}(OV_{i}\rho_{0}V_{j}^{\dagger}).

(2)

As shown in Fig. 2(b), since the estimation of $\mathrm{Tr}(OV_{i}\rho_{0}V_{j}^{\dagger})$ can be implemented using Hadamard-test-type circuits Kitaev (1995), we only need to sample $V_{i}$ and $V_{j}$ based on the LCU formula in Eq. 1 to estimate $\mathrm{Tr}(O\rho)$ to accuracy $\varepsilon$ using $\mathcal{O}(\mu^{4}/\varepsilon^{2})$ samples, which is an extra $\mu^{4}$ overhead compared to normal Hamiltonian simulation algorithms Faehrmann et al. (2022). To make the algorithm efficient, we need to keep $\mu$ constant. We also provide a variant in Fig. 2(c) where the ancillary qubit is measured and reset in each segment, which is equivalent to Fig. 2(b) for observable estimation. In this case, the expectation value of $\mu^{2}X^{(tot)}_{A}\otimes O_{S}$ provides an unbiased estimate of $\mathrm{Tr}(O\rho)$, where $X^{(tot)}:=\bigotimes_{k=1}^{\nu}X_{k}$ is the product of all the ancillary measurement outcomes. This variant reduces the need to store the ancilla qubit, simplifying the implementation on a fault-tolerant quantum computer.
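The estimator in Eq. 2 can be illustrated classically. The following is a minimal sketch rather than the authors' implementation: the single-qubit toy decomposition $e^{-i\theta Z}=\cos\theta\,I+\sin\theta\,(-iZ)$, all helper names, and the exact evaluation of each trace (which stands in for a Hadamard-test measurement on hardware) are assumptions made for illustration.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def toy_lcu(theta):
    """Hypothetical toy LCU: exp(-i*theta*Z) = mu * (p0 * I + p1 * (-i Z))."""
    mu = np.cos(theta) + np.sin(theta)  # 1-norm of the coefficients, 0 < theta < pi/2
    probs = np.array([np.cos(theta), np.sin(theta)]) / mu
    unitaries = [I2, -1j * Z]
    return mu, probs, unitaries


def term(obs, v_i, rho0, v_j):
    """Tr(O V_i rho0 V_j^dagger); evaluated exactly here instead of by a Hadamard test."""
    return np.trace(obs @ v_i @ rho0 @ v_j.conj().T)


def expectation_exact(mu, probs, unitaries, rho0, obs):
    """Right-hand side of Eq. 2, summed over all pairs (i, j)."""
    total = sum(p_i * p_j * term(obs, v_i, rho0, v_j)
                for p_i, v_i in zip(probs, unitaries)
                for p_j, v_j in zip(probs, unitaries))
    return (mu ** 2) * total


def expectation_sampled(mu, probs, unitaries, rho0, obs, shots, seed=0):
    """Monte Carlo estimate: draw i and j independently from probs and average."""
    rng = np.random.default_rng(seed)
    idx_i = rng.choice(len(probs), size=shots, p=probs)
    idx_j = rng.choice(len(probs), size=shots, p=probs)
    samples = [term(obs, unitaries[i], rho0, unitaries[j]).real
               for i, j in zip(idx_i, idx_j)]
    return (mu ** 2) * float(np.mean(samples))


theta = 0.3
mu, probs, unitaries = toy_lcu(theta)
plus = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)  # rho0 = |+><+|
exact = expectation_exact(mu, probs, unitaries, plus, X)
estimate = expectation_sampled(mu, probs, unitaries, plus, X, shots=20000)
```

For this toy case the exact sum reproduces $\langle X\rangle=\cos(2\theta)$, and the sampled value fluctuates around it with a spread that grows with $\mu^{2}$, which is the origin of the $\mu^{4}$ sampling overhead.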

The construction of an appropriate LCU formula for the $K$ th-order Trotter remainder, $V_{K}(x)$ , is crucial for developing an efficient Hamiltonian simulation algorithm. In the following subsections, we briefly introduce two approaches for constructing the LCU formula for $V_{K}(x)$ . The resulting composite Trotter-LCU algorithms are referred to as paired Taylor-series compensation (PTSC) and nested-commutator compensation (NCC), respectively. Detailed analysis and performance proofs for these two algorithms can be found in Sec. IV and Sec. V.

II.2 Paired Taylor-series compensation: overview

Without loss of generality, we focus on the case of an $n$ -qubit Hamiltonian $H$ , which can be written as

H=\sum_{l=1}^{L}H_{l}=\sum_{l=1}^{L}\alpha_{l}P_{l}=\lambda\sum_{l=1}^{L}p_{l}P_{l},

(3)

where $\{P_{l}\}_{l}$ are different $n$ -qubit Pauli operators. We set all the coefficients $\{\alpha_{l}\}_{l}$ to be positive and absorb the signs into Pauli operators $\{P_{l}\}_{l}$ . $\lambda:=\sum_{l}\alpha_{l}$ is the $l_{1}$ -norm of the Hamiltonian coefficient vector. We consider the Hamiltonians where $\lambda$ increases polynomially with respect to $n$ .

We first construct the LCU formulas for $V_{K}(x)$ from the Taylor-series expansion Berry et al. (2015). In the $0$th-order case, when no Trotter formula is introduced, the Trotter remainder $V_{0}(x)$ is the short-time evolution $U(x)$ itself, which can be expanded as

\begin{aligned}
V_{0}(x) &= e^{-ixH}=\sum_{s=0}^{\infty}F_{0,s}(x)=\sum_{s=0}^{\infty}\frac{(-ix)^{s}}{s!}H^{s} \\
&= e^{\lambda x}\sum_{s=0}^{\infty}\mathrm{Poi}(s;\lambda x)\sum_{l_{1},...,l_{s}}p_{l_{1}}\cdots p_{l_{s}}(-i)^{s}P_{l_{1}}\cdots P_{l_{s}}.
\end{aligned}

(4)

Here, $\mathrm{Poi}(s;a):=e^{-a}a^{s}/s!$ is the Poisson distribution, and $F_{0,s}(x)$ denotes the $s$th-order expansion term. Eq. 4, illustrated in Fig. 3(a), is an LCU formula of $V_{0}(x)$ with $1$-norm $\mu_{x}=e^{\lambda x}$. The $1$-norm of the overall evolution $U(t)=U(x)^{\nu}$ is then $\mu=\mu_{x}^{\nu}=e^{\lambda t}$, which, unfortunately, grows exponentially with $t$ regardless of how much we increase the segment number $\nu$. This implies that the direct random-sampling implementation of the LCU formula for $V_{0}(x)$ in Eq. 4 is not feasible. In Ref. Berry et al. (2015), the authors discuss the coherent implementation of $V_{0}(x)$ instead and find that one can achieve good time and accuracy dependence in that scenario.
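To make the structure of Eq. 4 concrete, the sketch below draws one term of the expansion (an order $s$ from the Poisson distribution, then $s$ Pauli indices from the normalized coefficients) and evaluates the overall $1$-norm for different segment numbers. The function names and the example coefficients are hypothetical, and the Poisson sampler is a textbook routine chosen only to keep the example dependency-free.

```python
import math
import random


def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for the small means lambda*x used here."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_v0_term(alphas, x, rng):
    """Draw one unitary from the expansion of V_0(x) in Eq. 4.

    Returns (s, indices): the order s and the Pauli indices l_1, ..., l_s.
    The phase (-i)^s coming from (-ix)^s is attached to the sampled Pauli product.
    """
    lam = sum(alphas)  # l1-norm of the (positive) Hamiltonian coefficients
    s = sample_poisson(lam * x, rng)
    indices = rng.choices(range(len(alphas)), weights=alphas, k=s)
    return s, indices


def overall_one_norm(lam, t, nu):
    """1-norm of U(t) = U(x)^nu when each segment carries mu_x = exp(lam * x)."""
    x = t / nu
    return math.exp(lam * x) ** nu
```

Calling `overall_one_norm(2.0, 1.5, nu)` gives $e^{3}$ for every `nu`, which is the obstruction described above: slicing alone does not reduce $\mu$.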

When focusing on the random-sampling implementation, we need to suppress the $1$ -norm $\mu_{x}$ of each segment. To this end, we first consider the usage of the Trotter formula. For example, if we apply first-order Trotter formula $S_{1}(x)=\prod_{l=1}^{L}e^{-ixH_{l}}$ in each segment, the first-order remainder $V_{1}(x):=U(x)S_{1}(x)^{\dagger}$ can be expanded as

\begin{aligned}
V_{1}(x) &= e^{-ixH}\prod_{l=L}^{1}e^{ixH_{l}}=\sum_{s=0}^{\infty}F_{1,s}(x) \\
&= \sum_{r;r_{1},r_{2},...,r_{L}}\frac{(-ix)^{r}}{r!}H^{r}\prod_{l=L}^{1}\frac{(ix)^{r_{l}}}{r_{l}!}H_{l}^{r_{l}},
\end{aligned}

(5)

where $\{r_{1},...,r_{L}\}$ denotes $L$ expansion variables related to the Trotter formula and $s:=r+\sum_{l=1}^{L}r_{l}$ . $F_{1,s}$ denotes the $s$ -order expansion term of $V_{1}(x)$ . The $1$ -norm of $V_{1}(x)$ in Eq. 5 is $e^{2\lambda x}$ , which seems to be even larger than the $0$ th-order case. However, since $V_{1}(x)$ denotes the (multiplicative) Trotter error, we have $F_{1,1}(x)=0$ . Using this condition, we can rewrite $V_{1}(x)$ as

V_{1}(x)=I+\sum_{s=2}^{\infty}F_{1,s}(x),

(6)

as illustrated in Fig. 3. From the Taylor-series expansion, we can bound the $1$ -norm of the new LCU formula in Eq. 6 by $\mu_{x}=e^{2\lambda x}-(2\lambda x)\leq e^{(2\lambda x)^{2}}$ . In this way, we reduce $\mu_{x}$ from $1+\mathcal{O}(\lambda x)$ to $1+\mathcal{O}((\lambda x)^{2})$ . The $1$ -norm of the overall time evolution becomes $\mu=\mu_{x}^{\nu}\leq\exp((2\lambda t)^{2}/\nu)$ . As a result, by increasing the segment number $\nu$ — or equivalently reducing the unit evolution time $x$ — we can decrease the 1-norm $\mu$ , leading to a lower sampling cost. If we choose the segment number as $\nu=\mathcal{O}((\lambda t)^{2})$ , $\mu$ will remain constant.
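The bound above is easy to check numerically. The sketch below is illustrative only (the function names are hypothetical); it evaluates $\mu_{x}=e^{2\lambda x}-2\lambda x$ per segment, raises it to the power $\nu$, and compares the result with $\exp((2\lambda t)^{2}/\nu)$.

```python
import math


def segment_one_norm(lam, x):
    """mu_x = exp(2*lam*x) - 2*lam*x, the 1-norm of Eq. 6 after dropping F_{1,1}."""
    y = 2.0 * lam * x
    return math.exp(y) - y


def total_one_norm(lam, t, nu):
    """mu = mu_x^nu for nu equal segments of length x = t / nu."""
    return segment_one_norm(lam, t / nu) ** nu


def total_one_norm_bound(lam, t, nu):
    """Upper bound exp((2*lam*t)^2 / nu) quoted in the text."""
    return math.exp((2.0 * lam * t) ** 2 / nu)


def segments_for_constant_norm(lam, t):
    """Smallest integer nu with (2*lam*t)^2 / nu <= 1, so that mu <= e."""
    return math.ceil((2.0 * lam * t) ** 2)
```

With $\lambda=1$ and $t=5$, for instance, `segments_for_constant_norm` returns 100 and the resulting $\mu$ stays below $e$, whereas $\nu=10$ leaves $\mu$ in the hundreds.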

From the above discussion, it is clear that, to reduce the 1-norm $\mu$ of the overall LCU formula for $U=e^{-iHt}$ , our main objective is to suppress the leading-order term of the $1$ -norm remainder $\mu_{x}-1$ for each segment, which determines the number of segments $\nu$ and hence circuit depth when $\mu$ is set to be a constant.

We can further reduce the $1$-norm $\mu_{x}$ of $V_{1}(x)$ by taking advantage of the structure of Trotter errors. For an anti-Hermitian Pauli operator $\pm iP$ where $P\in\{I,X,Y,Z\}^{\otimes n}$, we have the following Euler’s formula,

I\pm iyP=\sqrt{1+y^{2}}e^{\pm i\theta P}.

(7)

Here, $\theta=\tan^{-1}(y)$ and we suppose $0<y<1$. The 1-norm of the left-hand side of Eq. 7 is $1+y$, while the 1-norm of the right-hand side is $\sqrt{1+y^{2}}<1+\frac{y^{2}}{2}=1+\mathcal{O}(y^{2})$. As a result, the exponent of $y$ in $\mu_{x}-1$ is effectively doubled. In subsection IV.2, we prove that the expansion terms $F_{1,2}$ and $F_{1,3}$ in the LCU formula Eq. 5 are anti-Hermitian. We can therefore further suppress $\mu_{x}$ by pairing the terms in $F_{1,s}(x)$ with $F_{1,0}=I$ using Euler’s formula in Eq. 7. When $y=x^{s}$, the paired formula $R_{1,s}(x)$, a summation of Pauli-rotation unitaries, has $1$-norm $\mu_{x}=1+\mathcal{O}(x^{2s})$, whose $x$ dependence is doubled, as illustrated in Fig. 3(c).
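Eq. 7 and the resulting reduction of the $1$-norm can be verified directly for a small Pauli string. This is a minimal numerical sketch; the two-qubit operator $X\otimes Z$, the value $y=0.2$, and the helper names are arbitrary choices made for illustration.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def pauli_rotation(theta, pauli):
    """exp(i*theta*P) = cos(theta) I + i sin(theta) P, valid because P^2 = I."""
    identity = np.eye(pauli.shape[0], dtype=complex)
    return np.cos(theta) * identity + 1j * np.sin(theta) * pauli


def pair_with_identity(y, pauli):
    """Rewrite I + i*y*P as weight * rotation, following Eq. 7."""
    theta = np.arctan(y)
    return np.sqrt(1 + y ** 2), pauli_rotation(theta, pauli)


y = 0.2
P = np.kron(X, Z)                    # example two-qubit Pauli string
weight, rotation = pair_with_identity(y, P)
unpaired_norm = 1 + y                # 1-norm of I + i*y*P as two separate unitaries
```

Here `weight` is about 1.0198, compared with `unpaired_norm = 1.2`, so the deviation from 1 drops from order $y$ to order $y^{2}$.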

To generalize the discussion, for the $K$th-order Trotter error $V_{K}(x)$ ($K=2k$, $k\in\mathbb{N}_{+}$), we have $F_{K,1}=\cdots=F_{K,K}=0$ Suzuki (1990); Childs et al. (2021). Moreover, we can show that $F_{K,K+1}$, $F_{K,K+2}$, …, $F_{K,2K+1}$ are anti-Hermitian. We call the terms with $s=K+1$ to $2K+1$ the leading-order terms. The algorithm that uses the $K$th-order Trotter formula and the pairing idea in the LCU construction is called the $K$th-order PTSC algorithm. We provide the detailed algorithm description and gate-complexity analysis in Sec. IV.

The PTSC algorithm is generic for the $L$-sparse Hamiltonian $H=\sum_{l=1}^{L}H_{l}=\lambda\sum_{l=1}^{L}p_{l}P_{l}$ with $\lambda=\sum_{l}\|H_{l}\|$, where $\{P_{l}\}$ are Pauli operators. It can be implemented using a simple and universal classical random-sampling procedure: first, we sample the order $s$ from the Taylor-series expansion, and then we sample a Pauli string based on the Hamiltonian coefficients. With the random-sampling implementation, we prove that by appending a few gates after the $K$th-order Trotter formula with only one ancillary qubit, one can improve the time-scaling exponent from $1+1/K$ to $1+1/(2K+1)$ and exponentially improve the accuracy scaling compared to the $K$th-order Trotter formula. We have the following theorem.

Theorem 1 (Informal, see Theorem 1 in Sec. IV) In a $K$ th-order paired Taylor-series compensation algorithm ( $K=0,1$ or $2k$ , $k\in\mathbb{N}_{+}$ ), the gate complexity in a single round is $\mathcal{O}\left((\lambda t)^{1+\frac{1}{2K+1}}(\kappa_{K}L+\frac{\log(1/\varepsilon)}{\log\log(1/\varepsilon)})\right)$ , where $\lambda=\sum_{l}\|H_{l}\|$ , $\kappa_{K}=K$ when $K=0$ or $1$ , $\kappa_{K}=2\times 5^{K/2-1}$ otherwise.

From Theorem 1, we observe that setting $K=0$ , i.e., not using Trotter formulas, still yields a valid PTSC algorithm by pairing $F_{0,1}$ with $I$ . In this case, the gate complexity is independent of the sparsity $L$ but quadratically dependent on $t$ , similar to the algorithm in Ref. Wan et al. (2022). Conversely, when using a $K$ th-order Trotter formula, the PTSC algorithm becomes $L$ dependent with an almost linear dependence on $t$ . In both cases, the PTSC algorithms achieve high simulation accuracy $\varepsilon$ . We expect the $0$ th-order algorithm to be particularly useful for quantum chemistry Hamiltonians with large $L$ , while higher-order algorithms are better suited for generic $L$ -sparse Hamiltonians with long simulation times $t$ .

II.3 Nested-commutator compensation: overview

The PTSC algorithms above are generic and applicable to any Hamiltonian. When we consider the detailed structure of Hamiltonians, we can make the compensation algorithms more efficient by taking advantage of the commutation relations among the terms in the Hamiltonian, which have also been exploited in Trotter algorithms Childs et al. (2021).

We will take the first-order Trotter remainder $V_{1}(x)$ as an illustrative example. Following the Taylor-series expansion in Eq. 5, the second-order term $F_{1,2}(x)$ in $V_{1}(x)$ can be written as

F_{1,2}(x)=\sum_{\begin{subarray}{c}r;\vec{r}\\ r+\sum\vec{r}=2\end{subarray}}\frac{(-ixH)^{r}}{r!}\prod_{l=L}^{1}\frac{(ixH_{l})^{r_{l}}}{r_{l}!}.

(8)

Since $F_{1,2}(x)$ is anti-Hermitian, all the Hermitian expansion terms in Eq. 8 will cancel out. We can then simplify $F_{1,2}(x)$ as,

\begin{aligned}
F_{1,2}(x) &= (ix)^{2}\left(\sum_{l=1}^{L}\frac{H_{l}^{2}}{2!}+\sum_{l,l^{\prime}:l>l^{\prime}}H_{l}H_{l^{\prime}}\right) \\
&\quad +(-ixH)\sum_{l=1}^{L}(ixH_{l})+\frac{(-ixH)^{2}}{2!} \\
&= \frac{x^{2}}{2}\sum_{l,l^{\prime}:l>l^{\prime}}[H_{l^{\prime}},H_{l}],
\end{aligned}

(9)

which is a summation of $L(L-1)/2$ commutators. Since commutators of Hermitian operators are always anti-Hermitian, the nested-commutator expansion of $F_{1,2}(x)$ in Eq. 9 is compact: it contains no Hermitian expansion terms.
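The simplification in Eq. 9 can be checked numerically with random Hermitian summands. The sketch below is an illustration under stated assumptions: the random $4\times 4$ matrices, the seed, and the helper names are arbitrary, and the matrix exponential is computed by eigendecomposition rather than by any quantum circuit. It compares the Taylor form with the commutator form and then confirms that $V_{1}(x)-I$ is dominated by $F_{1,2}(x)$ for small $x$.

```python
import numpy as np


def random_hermitian(dim, rng):
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (a + a.conj().T) / 2


def evolve(h, x):
    """exp(-i x h) for Hermitian h, via eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * x * w)) @ v.conj().T


def f12_taylor(terms, x):
    """First two lines of Eq. 9: all second-order terms of V_1(x)."""
    H = sum(terms)
    L = len(terms)
    squares = sum(h @ h for h in terms) / 2
    ordered = sum(terms[l] @ terms[lp] for l in range(L) for lp in range(l))
    return ((1j * x) ** 2 * (squares + ordered)
            + (-1j * x * H) @ (1j * x * H)
            + (-1j * x * H) @ (-1j * x * H) / 2)


def f12_commutator(terms, x):
    """Last line of Eq. 9: (x^2 / 2) * sum_{l > l'} [H_{l'}, H_l]."""
    L = len(terms)
    total = np.zeros_like(terms[0])
    for l in range(L):
        for lp in range(l):
            total = total + terms[lp] @ terms[l] - terms[l] @ terms[lp]
    return (x ** 2 / 2) * total


def trotter_remainder(terms, x):
    """V_1(x) = exp(-ixH) exp(ixH_L) ... exp(ixH_1), as in Eq. 5."""
    v = evolve(sum(terms), x)
    for h in reversed(terms):
        v = v @ evolve(h, -x)
    return v


rng = np.random.default_rng(7)
terms = [random_hermitian(4, rng) for _ in range(3)]
```

For three summands the double loop in `f12_commutator` visits three pairs, i.e. $L(L-1)/2$ commutators.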

For a common physical Hamiltonian with locality constraints, we can take advantage of the commutator-form expression like Eq. 9. For example, for an $n$ -qubit lattice Hamiltonian with the form,

H=\sum_{j=0}^{n-1}H_{j,j+1},

(10)

where the summand $H_{j,j+1}$ acts on the $j$th and $(j+1)$th vertices, we can split the Hamiltonian into two components $H=A+B$ with $A:=\sum_{j:\mathrm{even}}H_{j,j+1}$ and $B:=\sum_{j:\mathrm{odd}}H_{j,j+1}$, so that the summands $H_{j,j+1}$ within each component commute with each other. We denote the norm of each Hamiltonian summand as

\Lambda=\max_{j}\|H_{j,j+1}\|,\quad\Lambda_{1}=\max_{j}\|H_{j,j+1}\|_{1}.

(11)

Now, if we estimate the $1$-norm of $F_{1,2}(x)$ for the lattice Hamiltonian based on Eq. 9, we see that there are only $n$ nonzero terms: for any given Hamiltonian summand $H_{j,j+1}$, only $H_{j-1,j}$ and $H_{j+1,j+2}$ do not commute with it. Then, the $1$-norm of $F_{1,2}(x)$ is bounded by

\|F_{1,2}(x)\|_{1}\leq n\Lambda_{1}\frac{x^{2}}{2}=\mathcal{O}(nx^{2}).

(12)

Comparing with the original bound in Eq. 62, $\|F_{1,2}(x)\|_{1}\leq\eta_{2}=\mathcal{O}((\lambda x)^{2})=\mathcal{O}(n^{2}x^{2})$, we improve the system-size dependence by a factor of $n$. The improved $n$ dependence of the 1-norm $\|F_{1,2}(x)\|_{1}$ suggests a corresponding improvement in the system-size dependence of the gate complexity for the nested-commutator algorithm.
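The counting argument behind the improvement from $\mathcal{O}(n^{2})$ to $\mathcal{O}(n)$ can be sketched in a few lines. This is an illustration only: it assumes an open chain with bonds $(j,j+1)$ for $j=0,\ldots,n-1$, and it uses overlapping support as a proxy for "possibly non-commuting"; the function names are hypothetical.

```python
from itertools import combinations


def lattice_bonds(n):
    """Supports of the summands H_{j,j+1}, j = 0, ..., n-1 (open chain assumed)."""
    return [(j, j + 1) for j in range(n)]


def overlapping_pairs(bonds):
    """Pairs of summands that share a site; only these can have a nonzero commutator."""
    return [(a, b) for a, b in combinations(bonds, 2) if set(a) & set(b)]


def all_pairs(bonds):
    """Every pair of summands, which is what a structure-agnostic bound counts."""
    return list(combinations(bonds, 2))
```

For `n = 10` there are 45 pairs in total but only 9 overlapping ones, so the number of commutators that can contribute to $F_{1,2}(x)$ grows linearly rather than quadratically with $n$.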

Now, we are going to generalize the idea above. In Sec. V, we show how to expand the second- and third-order terms of $V_{1}(x)$ as a summation of nested commutators,

\begin{aligned}
F_{1,s}(x) &= \frac{x^{s}}{s!}C_{s-1}=\frac{(-ix)^{s}}{s!}\Big(\mathcal{A}_{\mathrm{com}}^{(s-1)}(H_{L},\ldots,H_{1};H) \\
&\quad -\sum_{l=1}^{L}\mathcal{A}_{\mathrm{com}}^{(s-1)}(H_{L},\ldots,H_{l+1};H_{l})\Big),\quad s=2,3,
\end{aligned}

(13)

where $\mathcal{A}_{\mathrm{com}}^{(s)}(A_{L},...,A_{1};B)$ is defined to be

\sum_{m_{1}+...+m_{L}=s}\binom{s}{m_{1},...,m_{L}}\mathrm{ad}_{A_{L}}^{m_{L}}\cdots\mathrm{ad}_{A_{1}}^{m_{1}}B.

(14)

We also use the adjoint notation $\mathrm{ad}_{A_{L}}...\mathrm{ad}_{A_{1}}B:=[A_{L},...[A_{1},B]...]$ . It is easy to check that the form of $F_{1,s}(x)$ ( $s=2$ or $3$ ) in Eq. 13 is anti-Hermitian, which is consistent with the discussion in the paired Taylor-series algorithm in the previous section. We can also generalize the method to the case of $K$ th-order Trotter remainder, that is, to express the expansion terms $F_{K,s}$ of the $K$ th-order Trotter remainder based on the nested commutators.

Based on the nested-commutator expansion, we propose the $K$th-order nested-commutator compensation (NCC) algorithm. As illustrated in Fig. 4, in the construction of NCC, we first use the nested-commutator forms of the Trotter error terms from order $K+1$ to $2K+1$, i.e., the leading-order terms; we then apply order-pairing techniques similar to those of PTSC in Fig. 3(c) to further suppress the $1$-norm $\mu_{x}$. A key difference between the NCC and PTSC algorithms is that PTSC compensates the Trotter error $V_{K}(x)$ up to arbitrary order, whereas NCC compensates $V_{K}(x)$ only for the leading-order terms, which shrinks the error from $\mathcal{O}(x^{K+1})$ to $\mathcal{O}(x^{2K+2})$ in one slice with the sampling cost $\mu=1+\mathcal{O}(\frac{\|C_{K}\|_{1}}{(K+1)!}x^{2K+2})$. The gate-complexity estimation is then reduced to the calculation of the $1$-norm of the commutator, $\|C_{K}\|_{1}$. For instance, for the $n$-qubit lattice Hamiltonian models in Eq. 10, we can prove that $\|C_{K}\|_{1}=\mathcal{O}(n)$. We can then provide the following performance guarantee for the NCC algorithms.

Theorem 2 (Informal, see Theorem 2 in Sec. V) In a $K$ th-order nested-commutator compensation (NCC) algorithm ( $K=1$ or $2k$ ) with $n$ -qubit lattice Hamiltonians, the gate complexity in a single round is $\mathcal{O}(n^{1+\frac{2}{2K+1}}t^{1+\frac{1}{2K+1}}\varepsilon^{-\frac{1}{2K+1}})$ .

Compared to the performance of the $K$th-order Trotter algorithm, $\mathcal{O}((nt)^{1+1/K}\varepsilon^{-1/K})$ Childs and Su (2019), we achieve $t$ and $\varepsilon$ dependence better than that of the $2K$th-order Trotter formula using only the $K$th-order Trotter formula with simple compensation gates of Pauli-rotation operators. To generalize the result in Theorem 2, we also study the performance of NCC algorithms when applied to a general Hamiltonian in subsection V.3.
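The comparison of exponents can be made explicit with exact fractions. The sketch below only tabulates the exponents quoted in Theorem 2 and in the Trotter bound $\mathcal{O}((nt)^{1+1/K}\varepsilon^{-1/K})$; constants and logarithmic factors are ignored, and the function names are hypothetical.

```python
from fractions import Fraction


def ncc_exponents(K):
    """Exponents of n, t and 1/epsilon in the Kth-order NCC bound of Theorem 2."""
    return {
        "n": 1 + Fraction(2, 2 * K + 1),
        "t": 1 + Fraction(1, 2 * K + 1),
        "inv_eps": Fraction(1, 2 * K + 1),
    }


def trotter_exponents(K):
    """Exponents of n, t and 1/epsilon in the Kth-order Trotter bound for lattice models."""
    return {
        "n": 1 + Fraction(1, K),
        "t": 1 + Fraction(1, K),
        "inv_eps": Fraction(1, K),
    }
```

For $K=2$, NCC gives $t^{6/5}\varepsilon^{-1/5}$ against $t^{5/4}\varepsilon^{-1/4}$ for the fourth-order Trotter formula, while its $n$ exponent of $7/5$ sits between the second-order ($3/2$) and fourth-order ($5/4$) Trotter values.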

II.4 Efficient random-sampling implementation

A simple implementation of the Trotter-LCU algorithm in Fig. 2(b) or Fig. 2(c) requires not only an easy-to-implement quantum circuit but also efficient classical random sampling of Pauli operators from the Trotter remainder $V_{K}(x)$ . We now briefly discuss how to realize an efficient classical random sampling in PTSC and NCC algorithms, that is, with a space resource of $\mathcal{O}(\kappa_{K}K)$ and time resource of $\mathcal{O}(K(\log{L}+\log\kappa_{K}))$ where $L$ is the sparsity of the Hamiltonian.

A key idea for achieving efficient sampling is to use a multistage hierarchical sampling algorithm. Rather than fully expanding the Trotter remainder into a direct summation of Pauli operators, we structure the LCU formula into multiple layers. This allows us to decompose the overall Pauli operator sampling process into a series of simpler, more manageable sampling tasks.

In the PTSC algorithm, the Trotter remainder in Eq. 5 is derived by expanding the time evolution $e^{-ixH}$ and each Hamiltonian summand term $e^{ixH_{l}}$ by Taylor series independently. As a result, the sampling can be done by first sampling the overall expansion order $s$, and then sampling the individual expansion order $r$ of the Hamiltonian $H$ and the expansion orders $\vec{r}:=[r_{1},r_{2},...,r_{L}]$ of the summands $\{H_{l}\}$. Following the analysis in Sec. IV, $r$ and $\vec{r}$ can be sampled from a multinomial distribution $\text{Mul}(r,\vec{r};\{\frac{1}{2},\frac{\vec{p}}{2}\};s)$, where $\vec{p}:=[p_{1},p_{2},...,p_{L}]$ denotes the normalized coefficient vector of $H$ defined in Eq. 3. For each sampled factor of the Hamiltonian $H$, we further sample a summand $H_{l}$ based on $\vec{p}$. We summarize the sampling algorithm in Fig. 5.
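The second and third stages of this hierarchical procedure can be sketched as follows. The sketch assumes the overall order $s$ has already been drawn (its distribution is derived in Sec. IV and is not reproduced here), it ignores the phases and the pairing with the identity, and the function names are hypothetical.

```python
import numpy as np


def sample_ptsc_orders(s, p, rng):
    """Split the order s into r (powers of H) and r_vec (powers of each H_l).

    Uses the multinomial Mul(r, r_vec; {1/2, p/2}; s) described in the text.
    """
    p = np.asarray(p, dtype=float)
    pvals = np.concatenate(([0.5], p / 2.0))
    counts = rng.multinomial(s, pvals)
    return int(counts[0]), counts[1:]


def sample_ptsc_pauli_indices(s, p, rng):
    """Return the s Pauli indices of one sampled term of Eq. 5, in operator order."""
    p = np.asarray(p, dtype=float)
    r, r_vec = sample_ptsc_orders(s, p, rng)
    # Each of the r factors of H is resolved into a single summand drawn from p.
    from_h = [int(l) for l in rng.choice(len(p), size=r, p=p)]
    # The Trotter part is ordered as H_L^{r_L} ... H_1^{r_1} (indices are 0-based here).
    from_trotter = [l for l in reversed(range(len(p))) for _ in range(r_vec[l])]
    return from_h + from_trotter
```

Because each stage draws from a distribution over at most $L+1$ outcomes, no stage needs the fully expanded list of Pauli products.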

In the NCC algorithm, the Trotter remainder is expanded based on the adjoint operators. For example, for the lattice Hamiltonian $H=A+B$ in Eq. 10, we can write the second- and third-order Trotter remainder of $V_{1}(x)$ as,

\begin{aligned}
F_{1,2}^{(nc)}(x) &= -\frac{x^{2}}{2!}\mathrm{ad}_{A}B, \\
F_{1,3}^{(nc)}(x) &= i\frac{x^{3}}{3!}(2\mathrm{ad}_{B}\mathrm{ad}_{A}B+\mathrm{ad}_{A}^{2}B).
\end{aligned}

(15)

We can first sample the expansion order $s=2$ or $3$ . If $s=3$ , we then sample the specific commutator, i.e., $\mathrm{ad}_{B}\mathrm{ad}_{A}B$ or $\mathrm{ad}_{A}^{2}B$ . For the given commutator, for example, $\mathrm{ad}_{B}\mathrm{ad}_{A}B=[B,[A,B]]$ , we first randomly sample a summand $H_{j,j+1}$ for the rightmost $B$ as the starting point of the adjoint operator. The action of the subsequent $\mathrm{ad}_{A}$ and $\mathrm{ad}_{B}$ will enlarge the support of $H_{j,j+1}$ , but within a “light-cone” region shown in Fig. 6. We then sample the Hamiltonian summand $H_{j_{1},j_{1}+1}$ and $H_{j_{2},j_{2}+1}$ for the adjoint operators $\mathrm{ad}_{B}$ and $\mathrm{ad}_{A}$ , but within the light-cone region. This will ensure our sampling to be efficient and with nested-commutator scaling.

A problem of sampling the Hamiltonian summands $H_{j,j+1}$ in the commutator is that, since the Hamiltonian $H$ may be nonhomogeneous, the $1$ -norm of $H_{j,j+1}$ with different $j$ may be different. This will complicate the sampling algorithm, since we need to first calculate the $1$ -norm of all the elementary commutators with the form of

\mathrm{ad}_{H_{j_{2},j_{2}+1}}\mathrm{ad}_{H_{j_{1},j_{1}+1}}H_{j,j+1},

(16)

for all $j,j_{1}$ and $j_{2}$. When the Trotter order $K$ and the expansion order $s=K+1,...,2K+1$ get larger, the number of elementary adjoint operators increases exponentially. Consequently, we cannot efficiently estimate the $1$-norms of all the elementary commutators.

To solve this problem while keeping the advantage of the NCC algorithm, we introduce the following Hamiltonian “padding” technique to ensure that all the elementary nested commutators have the same $1$-norm. Consider a Hamiltonian summand $H_{j,j+1}$, which can be expanded into Pauli operators,

H_{j,j+1}=\sum_{\omega}\alpha_{j}^{(\omega)}P_{j,j+1}^{(\omega)},

(17)

where $\alpha_{j}^{(\omega)}$ is a positive number and $P_{j,j+1}^{(\omega)}$ is a normalized Pauli operator whose support is on qubit $j$ and $j+1$ . Recall that $\Lambda_{1}:=\max_{j}\|H_{j,j+1}\|_{1}$ and $\|H_{j,j+1}\|_{1}:=\sum_{\omega}\alpha_{j}^{(\omega)}$ . When $\|H_{j,j+1}\|_{1}$ is smaller than $\Lambda_{1}$ , we add extra trivial terms $\pm I$ in the Pauli decomposition of $H_{j,j+1}$ ,

\begin{aligned}
\bar{H}_{j,j+1} &= \sum_{\omega}\alpha_{j}^{(\omega)}P_{j,j+1}^{(\omega)}+\frac{\delta\Lambda_{1}}{2}I+\frac{\delta\Lambda_{1}}{2}(-I) \\
&= \Lambda_{1}\sum_{\omega}\bar{p}_{j}^{(\omega)}P_{j,j+1}^{(\omega)},
\end{aligned}

(18)

where $\delta\Lambda_{1}:=\Lambda_{1}-\|H_{j,j+1}\|_{1}$ . Eq. 18 holds naturally, but now with a manually predetermined $1$ -norm value $\Lambda_{1}$ . Similarly, we can pad all the elementary commutators with the form

\mathrm{ad}_{H_{j_{1},j_{1}+1}}H_{j,j+1}=H_{j_{1},j_{1}+1}H_{j,j+1}-H_{j,j+1}H_{j_{1},j_{1}+1},

(19)

so that their $1$ -norms are all $2\Lambda_{1}^{2}$ . In this way, we ignore the commutator relationship between $H_{j,j+1}$ and $H_{j_{1},j_{1}+1}$ as long as they are in the light-cone region.
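The padding step of Eq. 18 amounts to simple bookkeeping on the Pauli coefficients. The following sketch is illustrative: the dictionary representation, the label `"II"` for the identity, and the function name are assumptions, not part of the original construction.

```python
def pad_summand(coeffs, lambda_1):
    """Pad a summand so that its 1-norm equals lambda_1, following Eq. 18.

    coeffs maps a Pauli label to its positive coefficient alpha.
    Returns a list of (probability, sign, label); the two identity entries carry
    opposite signs, so the operator itself is unchanged.
    """
    one_norm = sum(coeffs.values())
    delta = lambda_1 - one_norm
    if delta < -1e-12:
        raise ValueError("lambda_1 must be at least the 1-norm of the summand")
    terms = [(alpha, +1, label) for label, alpha in coeffs.items()]
    if delta > 0:
        terms += [(delta / 2, +1, "II"), (delta / 2, -1, "II")]
    return [(weight / lambda_1, sign, label) for weight, sign, label in terms]
```

For example, `pad_summand({"XX": 1.0, "ZZ": 0.5}, 4.0)` returns four entries whose probabilities sum to 1, with the two identity entries cancelling in the signed sum.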

After the padding procedure described above, all elementary nested commutators of the same order will have the same $1$ -norm. This property allows us to uniformly sample these commutators: the starting summand $H_{j,j+1}$ is sampled uniformly from those in $B$ , and the subsequent $H_{j_{1},j_{1}+1}$ and $H_{j_{2},j_{2}+1}$ are sampled uniformly within the light-cone region. However, when the starting summand $H_{j,j+1}$ is near the boundary, applying a few adjoint operators may cause it to touch the boundary. This reduces the number of possible elementary nested commutators compared to those starting from the center, resulting in different $1$ -norms for $\text{ad}_{A}H_{j,j+1}$ depending on $j$ . This complicates the sampling of the starting summand $H_{j,j+1}$ . To resolve this issue and ensure uniform sampling of $H_{j,j+1}$ , we introduce virtual qubits at the boundary, as illustrated in Fig. 6, and pad the virtual qubits with $0$ -summed $\pm I$ terms. Since we only perform $\pm I$ operation on the virtual qubits, we do not need to introduce it in the real experiments.
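Once every elementary nested commutator has the same $1$-norm, index sampling reduces to uniform draws. The sketch below illustrates this for one elementary term of $\mathrm{ad}_{B}\mathrm{ad}_{A}B$ on a chain; the candidate index sets $\{j-1,j+1\}$ and $\{j-2,j,j+2\}$ are the ones used in Algorithm 1, while the function names and the convention that indices outside $0,\ldots,n-1$ denote virtual (padded) summands are assumptions made for illustration.

```python
import random


def sample_elementary_commutator(n, rng):
    """Uniformly sample (j, j1, j2) for ad_{H_{j2}} ad_{H_{j1}} H_j inside the light cone."""
    odd_starts = [j for j in range(n) if j % 2 == 1]  # summands belonging to B
    j = rng.choice(odd_starts)
    j1 = rng.choice([j - 1, j + 1])                   # neighbouring summands in A
    j2 = rng.choice([j - 2, j, j + 2])                # summands in B within the light cone
    return j, j1, j2


def is_virtual(index, n):
    """True when the sampled index falls outside 0, ..., n-1 and is therefore padded."""
    return index < 0 or index > n - 1
```

Because out-of-range indices are kept rather than rejected, every starting summand has the same number of candidates, which is what makes the uniform draw of $j$ valid near the boundary.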

We remark that our padding method preserves the locality structure. As a result, the performance guarantee in Theorem 2 still holds. We summarize the sampling algorithm in Fig. 7. If we consider the Heisenberg Hamiltonian

H=\sum_{i}\vec{\sigma}_{i}\vec{\sigma}_{i+1}+\sum_{i}Z_{i},

(20)

as an example, where $\vec{\sigma}_{i}:=(X_{i},Y_{i},Z_{i})$ is the vector of Pauli operators on the $i$ th qubit, we can define the summand $H_{j,j+1}$ to be

H_{j,j+1}=4\left(\frac{1}{4}X_{j}X_{j+1}+\frac{1}{4}Y_{j}Y_{j+1}+\frac{1}{4}Z_{j}Z_{j+1}+\frac{1}{4}Z_{j}\right).

(21)

In this case, $\Lambda_{1}=\|H_{j,j+1}\|_{1}=4$ . The probability distribution $\bar{p}_{j}^{(\omega)}$ in Fig. 7 uniformly samples the $XX$ , $YY$ , $ZZ$ and $ZI$ terms. As a demonstration, Algorithm 1 explicitly presents the procedure to sample the Pauli-rotation operator in the first-order NCC algorithm for the Heisenberg Hamiltonian in Eq. 20.

Algorithm 1 Demonstration: sampling of $V_{i}$ or $V_{j}$ of the first-order NCC algorithm for the Heisenberg Hamiltonian in Eq. 20

1: Input: An $n$ -qubit Heisenberg Hamiltonian $H$ in Eq. 20; unit evolution time $0<x<1$ for each Trotter segment;
2: Output: Sampling of a Pauli-rotation operator $V_{i}$ from the Trotter remainder $\tilde{V}_{1}^{(nc)}(x)$
3: Sample the expansion order $s\in\{2,3\}$ with the probability $\{\frac{1}{1+24x},\frac{24x}{1+24x}\}$
4: if $s=3$ then
5:   Sample the adjoint form $\mathrm{ad}^{(2)}B$ from $\{\mathrm{ad}_{B}\mathrm{ad}_{A}B,\mathrm{ad}_{A}^{2}B\}$ with the probability $\{1/3,2/3\}$
6: end if
7: Sample the starting index $j$ uniformly from all odd indices in $0,...,n-1$ . Sample the Pauli operator $P_{j,j+1}$ uniformly from $\{XX,YY,ZZ,ZI\}$ on qubit $j$ and $j+1$
8: Set $W:=P_{j,j+1}$
9: Sample the adjoint index $j_{1}$ uniformly from $\{j-1,j+1\}$ . Sample the Pauli operator $P_{j_{1},j_{1}+1}$ uniformly from $\{XX,YY,ZZ,ZI\}$ on qubit $j_{1}$ and $j_{1}+1$ . Sample the multiplication order $b_{1}$ uniformly from $\{0,1\}$ . (Footnote 1: If $j-1$ or $j+1$ exceeds the index range $0,...,n-1$ , we pad extra virtual qubits similar to Fig. 6.)
10: if $b_{1}=0$ then
11:   Set $W:=P_{j_{1},j_{1}+1}W$
12: else $\triangleright$ $b_{1}=1$
13:   Set $W:=-WP_{j_{1},j_{1}+1}$
14: end if
15: if $s=3$ then
16:   if $\mathrm{ad}^{(2)}B=\mathrm{ad}_{B}\mathrm{ad}_{A}B$ then
17:     Sample the adjoint index $j_{2}$ uniformly from $\{j-2,j,j+2\}$ . Sample the Pauli operator $P_{j_{2},j_{2}+1}$ uniformly from $\{XX,YY,ZZ,ZI\}$ on qubit $j_{2}$ and $j_{2}+1$ . Sample the multiplication order $b_{2}$ uniformly from $\{0,1\}$
18:   else $\triangleright$ $\mathrm{ad}^{(2)}B=\mathrm{ad}_{A}^{2}B$
19:     Sample the adjoint index $j_{2}$ uniformly from $\{j-3,j-1,j+1\}$ . Sample the Pauli operator $P_{j_{2},j_{2}+1}$ uniformly from $\{XX,YY,ZZ,ZI\}$ on qubit $j_{2}$ and $j_{2}+1$ . Sample the multiplication order $b_{2}$ uniformly from $\{0,1\}$
20:   end if
21:   if $b_{2}=0$ then
22:     Set $W:=P_{j_{2},j_{2}+1}W$
23:   else $\triangleright$ $b_{2}=1$
24:     Set $W:=-WP_{j_{2},j_{2}+1}$
25:   end if
26: end if
27: Return $V_{i}:=\exp\left(i\theta W\right)$ as the sampled Pauli-rotation operator. Here $\theta:=\tan^{-1}(16nx^{2}(1+24x))$
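The steps of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names (`pauli_mul`, `sample_term`, `adjoint_step`, `sample_rotation`) and the data layout (a Pauli string stored as a phase plus a `{qubit: letter}` dictionary) are assumptions made here. Qubit indices outside $0,...,n-1$ are simply kept and stand for the virtual qubits of Fig. 6, and the phase of $W$ is tracked literally as the steps prescribe.

```python
import math
import random

# Single-qubit Pauli products: _MUL[(a, b)] = (phase, c) means a*b = phase*c.
_MUL = {}
for a in "IXYZ":
    _MUL[("I", a)] = (1, a)
    _MUL[(a, "I")] = (1, a)
    _MUL[(a, a)] = (1, "I")
for a, b, c in [("X", "Y", "Z"), ("Y", "Z", "X"), ("Z", "X", "Y")]:
    _MUL[(a, b)] = (1j, c)
    _MUL[(b, a)] = (-1j, c)

TERMS = ["XX", "YY", "ZZ", "ZI"]


def pauli_mul(left, right):
    """Return left*right for Pauli strings given as (phase, {qubit: letter})."""
    phase = left[0] * right[0]
    ops = dict(right[1])
    for q, a in left[1].items():
        ph, c = _MUL[(a, ops.get(q, "I"))]
        phase *= ph
        if c == "I":
            ops.pop(q, None)
        else:
            ops[q] = c
    return phase, ops


def sample_term(j, rng):
    """Uniformly sample one of XX, YY, ZZ, ZI on qubits j and j+1."""
    label = rng.choice(TERMS)
    return 1, {q: p for q, p in zip((j, j + 1), label) if p != "I"}


def adjoint_step(W, index_choices, rng):
    """Steps 9-14 (or 17-25): sample an index, a Pauli term and an order b."""
    P = sample_term(rng.choice(index_choices), rng)
    if rng.randint(0, 1) == 0:      # b = 0: W := P W
        return pauli_mul(P, W)
    ph, ops = pauli_mul(W, P)       # b = 1: W := -W P
    return -ph, ops


def sample_rotation(n, x, rng):
    """Return (theta, W) so that the sampled operator is exp(i*theta*W)."""
    s = 3 if rng.random() < 24 * x / (1 + 24 * x) else 2
    form = None
    if s == 3:
        form = "BAB" if rng.random() < 1 / 3 else "AAB"
    j = rng.choice(range(1, n, 2))  # odd starting index
    W = sample_term(j, rng)
    W = adjoint_step(W, [j - 1, j + 1], rng)
    if s == 3:
        choices = [j - 2, j, j + 2] if form == "BAB" else [j - 3, j - 1, j + 1]
        W = adjoint_step(W, choices, rng)
    theta = math.atan(16 * n * x**2 * (1 + 24 * x))
    return theta, W
```

A call such as `sample_rotation(12, 0.01, random.Random(0))` returns the rotation angle together with the phase and Pauli letters of $W$; since the routine uses only classical randomness, it can be run before or during the quantum circuit execution.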

As a final remark, the sampling procedures in both the PTSC and NCC algorithms are independent of the implementation of the quantum circuit and the measurement outcome. Thanks to this property, we can perform the classical sampling during the quantum circuit implementation or even generate the sampled Pauli matrices before the implementation of the quantum circuits.

II.5 Performance comparison

In Table 1, we compare the implementation complexity and the gate complexity in a single round of experiment of the $0$ th-order PTSC, $K$ th-order PTSC, and $K$ th-order NCC algorithms to previous Hamiltonian simulation algorithms. For a fair comparison, we set the 1-norm of all LCU formulas $\mu$ to be constant. In this case, the sample complexity of PTSC and NCC algorithms incurs a $\mu^{4}$ overhead compared to standard sampling from the $K$ th-order Trotter or post-Trotter algorithms.

We show that by inserting a few randomly sampled Pauli-rotation gates after each Trotter segment, as illustrated in Fig. 1, both PTSC and NCC achieve improved accuracy and time dependence. The gate counts for PTSC exhibit logarithmic dependence on accuracy, $\log(1/\varepsilon)$ , while the NCC gate counts show improved system-size dependence.

Algorithm	Implementation hardness	Accuracy	Size scaling (lattice Hamiltonian)	Time dependence
$K$ th-order Trotter Suzuki (1990)	Easy	$\mathcal{O}(\varepsilon^{-1/K})$	$\mathcal{O}(n^{1+\frac{1}{K}})$	$\mathcal{O}(t^{1+\frac{1}{K}})$
Post-Trotter Berry et al. (2015); Low and Chuang (2019)	Hard	$\mathcal{O}(\tilde{\log}(1/\varepsilon))$	$\mathcal{O}(n^{2})$	$\mathcal{O}(t)$
$0$ th-order PTSC	Easy	$\mathcal{O}(\tilde{\log}(1/\varepsilon))$	$\mathcal{O}(n^{2})$	$\mathcal{O}(t^{2})$
$K$ th-order PTSC	Easy	$\mathcal{O}(\tilde{\log}(1/\varepsilon))$	$\mathcal{O}(n^{2+\frac{1}{2K+1}})$	$\mathcal{O}(t^{1+\frac{1}{2K+1}})$
$K$ th-order NCC	Easy	$\mathcal{O}(\varepsilon^{-1/(2K+1)})$	$\mathcal{O}(n^{1+\frac{2}{2K+1}})$	$\mathcal{O}(t^{1+\frac{1}{2K+1}})$

Table 1: Comparison of the implementation hardness and gate complexity in a single round of the circuit for Trotter-LCU methods versus previous algorithms. Here, $\tilde{\log}(x):=\log(x)/\log\log(x)$ . The (implementation) hardness refers to whether one needs to implement multicontrolled gates with plenty of ancillary qubits. In the comparison for the system-size scaling of lattice Hamiltonians, we use the fact that $\lambda=\mathcal{O}(n)$ and $L=\mathcal{O}(n)$ .

To demonstrate how the Trotter-LCU algorithms can help reduce gate costs in practical scenarios, we estimate the single-shot gate count of random-sampling Trotter-LCU algorithms and compare it with the state-of-the-art Trotter algorithm, i.e., fourth-order Trotter Suzuki (1991); Childs et al. (2019, 2021). To ensure a fair comparison, we set the $1$ -norm of the LCU formula to be $\mu=2$ . Based on Proposition 1, this implies that the random-sampling Trotter-LCU algorithm requires an additional factor of $\mu^{4}=16$ in the sample number to estimate the properties of the target state $U\rho U^{\dagger}$ , where $U:=e^{-iHt}$ , to a given precision compared to the normal Trotter or coherent implementation of LCU algorithms.

We compile their quantum circuits to $\mathcal{CNOT}$ gates, single-qubit Clifford gates, and single-qubit $Z$ -axis rotation gates $R_{z}(\theta)=e^{i\theta Z}$ . Here, we mainly compare the number of $R_{z}(\theta)$ gates since they are the most resource-consuming part on a fault-tolerant quantum computer Litinski (2019). The $\mathcal{CNOT}$ gate-number comparison results can be found in Appendix G, which is similar to the $R_{z}(\theta)$ gate comparison. We also compare the gate counts of the Trotter-LCU algorithms to the coherent-implementation of LCU Berry et al. (2014, 2015) and QSP Low and Chuang (2017, 2019) in Appendix G.

In the first comparison, we consider the simulation of generic Hamiltonians without the usage of commutator information, for which we choose the $2$ -local Hamiltonian, $H=\sum_{i,j}X_{i}X_{j}+\sum_{i}Z_{i}$ where $X_{i}$ and $Z_{i}$ are the Pauli matrices on the $i$ th qubit. Fig. 8(a,b) show the gate counts for the fourth-order Trotter formula and the PTSC Trotter-LCU algorithms with different orders with an increasing time $t$ and increasing system size $n$ , respectively. The gate counting method for fourth-order Trotter with random permutation is based on the analytical bounds in Ref. Childs et al. (2019).

From Fig. 8(a,b), we can clearly see the advantage of composing Trotter and LCU formulas: if we do not use LCU and merely apply Trotter formulas, the gate resource of fourth-order Trotter suffers from a large overhead that is 2 orders of magnitude larger than the PTSC algorithms. Moreover, if we tighten the accuracy requirement $\varepsilon$ from $10^{-3}$ to $10^{-5}$ , we can see a clear increase of the gate resources for the Trotter algorithm. For the PTSC algorithms, however, the gate number is almost unaffected by $\varepsilon$ since they enjoy a logarithmic $\varepsilon$ -dependence.

On the other hand, if we do not use the Trotter formula, the $0$ th-order PTSC algorithm has a nearly quadratically worse $t$ -dependence ( $\mathcal{O}(t^{2})$ ) than the fourth-order Trotter ( $\mathcal{O}(t^{1.25})$ ), second-order PTSC ( $\mathcal{O}(t^{1.2})$ ) and fourth-order PTSC ( $\mathcal{O}(t^{1.11})$ ) algorithms in Fig. 8(a). For a short-time evolution $t=n$ , the system-size dependence of second- or fourth-order PTSC in Fig. 8(b) outperforms that of the $0$ th-order PTSC algorithm. When long-time Hamiltonian simulation is required, for example, when $t$ is set to $10^{3}$ for phase estimation Campbell (2019), the advantage of second- or fourth-order PTSC over $0$ th-order PTSC becomes more pronounced. The composition of Trotter and LCU formulas enables second- or fourth-order PTSC to enjoy good $t$ and $\varepsilon$ dependence and a small gate-resource overhead simultaneously.
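The time exponents quoted above follow directly from the "Time dependence" column of Table 1. The short sketch below evaluates them with exact fractions; the function names are hypothetical and introduced only for this illustration.

```python
from fractions import Fraction


def trotter_time_exponent(K):
    """Exponent of t for the K th-order Trotter formula in Table 1: 1 + 1/K."""
    return 1 + Fraction(1, K)


def ptsc_time_exponent(K):
    """Exponent of t for the K th-order PTSC algorithm in Table 1: 1 + 1/(2K+1)."""
    return 1 + Fraction(1, 2 * K + 1)


if __name__ == "__main__":
    rows = [
        ("fourth-order Trotter", trotter_time_exponent(4)),
        ("second-order PTSC", ptsc_time_exponent(2)),
        ("fourth-order PTSC", ptsc_time_exponent(4)),
    ]
    for name, exponent in rows:
        print(f"{name}: t^{float(exponent):.3f}")
```

Note that evaluating the PTSC expression at $K=0$ also gives the exponent $2$ listed for the $0$ th-order PTSC row.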

Next, we compare the gate count when simulating the lattice models, where the commutator analysis will help remarkably reduce the gate count. We consider the Heisenberg Hamiltonian $H=\sum_{i}\vec{\sigma}_{i}\vec{\sigma}_{i+1}+\sum_{i}Z_{i}$ using the nested-commutator bounds. In Fig. 8(c), we choose $n=t=12$ and $50$ and show the gate number with respect to the accuracy requirement $\varepsilon$ . The fourth-order Trotter error analysis is based on the nested-commutator bound (Proposition M.1 in Ref. Childs et al. (2021)), which is currently the tightest Trotter error analysis. The performance of our second-order NCC algorithm is based on the analytical bound in Sec. H in the Appendix. We mainly present results for the second-order NCC algorithm due to its simplicity and leave precise higher-order NCC gate count analysis for future study.

From Fig. 8(c), we can see that, while enjoying near-optimal system-size scaling similar to the fourth-order Trotter algorithm, which is currently the best one for lattice Hamiltonians Childs et al. (2021); Childs and Su (2019), the second-order NCC algorithm shows better accuracy dependence than the fourth-order Trotter algorithm. In particular, using the same gate number as the fourth-order Trotter algorithm, we are able to achieve an accuracy $\varepsilon$ that is 3 to 4 orders of magnitude higher.

III PRELIMINARIES

In this section, we review the Hamiltonian simulation algorithms based on Trotter and LCU formulas.

In all the Hamiltonian simulation algorithms discussed in this work, we divide the real-time evolution $U(t)$ into $\nu$ segments,

U(t)=e^{-iHt}=(U(x))^{\nu}=\left(e^{-iHx}\right)^{\nu},

(22)

with $x:=t/\nu$ , and consider the construction of each small segment $U(x)$ . Without loss of generality, we assume $x>0$ .

III.1 Trotter formulas

The most natural way to approximate $U(x)$ is to apply the Lie-Trotter-Suzuki formulas Suzuki (1990, 1991). Hereafter, we refer to them as Trotter formulas. The first-order Trotter formula is

\displaystyle S_{1}(x)=\prod_{l=1}^{L}e^{-ixH_{l}}=e^{-ixH_{L}}\cdots e^{-ixH_{2}}e^{-ixH_{1}}=:\prod^{\leftarrow}e^{-ix\vec{H}}.

(23)

Here, $\vec{H}:=[H_{1},H_{2},...,H_{L}]$ . In Eq. 23, we simplify the notation of the sequential products from $l=1$ to $L$ with the same form, using the arrow to denote the ascending direction of the dummy index $l$ . The Hermitian conjugation of $S_{1}(x)$ can similarly be written as $S_{1}(x)^{\dagger}=\overset{\rightarrow}{\prod}e^{ix\vec{H}}$ .

The second-order Trotter formula is

	$\displaystyle S_{2}(x)$	$\displaystyle=D_{2}(-x)^{\dagger}D_{2}(x)=S_{1}(-\frac{x}{2})^{\dagger}S_{1}(\frac{x}{2})$		(24)
		$\displaystyle=\prod^{\rightarrow}e^{-i\frac{1}{2}x\vec{H}}\prod^{\leftarrow}e^{-i\frac{1}{2}x\vec{H}},$		(24)

where $D_{2}(x):=S_{1}(\frac{x}{2})$ . We have $S_{2}(-x)^{\dagger}=S_{2}(x)$ .

The general $2k$ th-order Trotter formula is Suzuki (1991)

	$\displaystyle S_{2k}(x)$	$\displaystyle=D_{2k}(-x)^{\dagger}D_{2k}(x)$		(25)
		$\displaystyle=[S_{2k-2}(u_{k}x)]^{2}S_{2k-2}((1-4u_{k})x)[S_{2k-2}(u_{k}x)]^{2},$		(25)

with $u_{k}:=1/(4-4^{1/(2k-1)})$ for $k\geq 2$ . The operator $D_{2k}(x)$ is defined recursively,

D_{2k}(x):=D_{2k-2}((1-4u_{k})x)S_{2k-2}(u_{k}x)^{2}.

(26)

By induction from Eq. 25 and Eq. 26 we can show that $S_{2k}(-x)^{\dagger}=S_{2k}(x)$ .
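As a worked instance of Eq. 25 (a numerical evaluation added here for illustration, with decimals rounded to four digits), the fourth-order formula corresponds to $k=2$ :

```latex
\begin{aligned}
u_{2} &= \frac{1}{4-4^{1/3}} \approx 0.4145, \qquad 1-4u_{2} \approx -0.6580,\\
S_{4}(x) &= [S_{2}(u_{2}x)]^{2}\,S_{2}((1-4u_{2})x)\,[S_{2}(u_{2}x)]^{2}.
\end{aligned}
```

The five step sizes sum to $4u_{2}x+(1-4u_{2})x=x$ , and the middle factor evolves backward in time because $1-4u_{2}<0$ .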

We also denote the zeroth-order Trotter formula to be $S_{0}(x)=I$ for consistency. We denote the multiplicative remainder of the Trotter formulas as

V_{K}(x)=U(x)S_{K}(x)^{\dagger},

(27)

for $K=0,1,2k$ . In what follows, we call $V_{K}(x)$ the $K$ th-order Trotter remainder to avoid confusion with other error effects such as truncation error.

In Ref. Suzuki (1990), Suzuki proves the following order condition for the Trotter formulas,

S_{K}(x)=U(x)+\mathcal{O}(x^{K+1})=e^{-iHx+\mathcal{O}(x^{K+1})},

(28)

for $K=1$ or even positive $K$ . As a result, the remainder $V_{K}(x)$ will only contain the terms of $x^{q}$ with $q\geq K+1$ . We will use the following order condition of Trotter formulas in the later discussion.

Lemma 1 (Order condition of Trotter formulas, Theorem 4 in Childs et al. (2021)).

Let $H$ be a Hermitian operator and $U(x):=e^{-iHx}$ be the time-evolution operator with $x\in\mathbb{R}$ . For the Trotter formula $S_{K}(x)$ defined in Eq. 23 and Eq. 25, where $K=1$ or $K$ is a positive even number, we have the following:

1. Additive error: $A_{K}(x):=S_{K}(x)-U(x)=\mathcal{O}(x^{K+1})$ ,
2. Multiplicative error: $M_{K}(x):=U(x)S_{K}(x)^{\dagger}-I=\mathcal{O}(x^{K+1})$ ,
3. For the exponentiated error $E_{K}(x)$ such that $S_{K}(x)=\mathcal{T}\exp\left(\int_{0}^{x}d\tau(-iH+E_{K}(\tau))\right)$ , we have $E_{K}(x)=\mathcal{O}(x^{K})$ .
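The additive order condition can be checked numerically on a toy example. The sketch below is an illustration under stated assumptions, not part of the original analysis: it takes the single-qubit Hamiltonian $H=X+Z$ with summands $H_{1}=X$ and $H_{2}=Z$ , builds $S_{1}$ , $S_{2}$ and $S_{4}$ from Eq. 23 to Eq. 25 using plain $2\times 2$ matrices, and estimates the exponent of $x$ in the error by halving $x$ . All function names are hypothetical.

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def pauli_exp(p, angle):
    """exp(-i*angle*p) for a 2x2 matrix p satisfying p*p = I."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * I2[i][j] - 1j * s * p[i][j] for j in range(2)]
            for i in range(2)]


def exact(x):
    """U(x) = exp(-i x (X+Z)); (X+Z)/sqrt(2) squares to the identity."""
    r = math.sqrt(2)
    h = [[(X[i][j] + Z[i][j]) / r for j in range(2)] for i in range(2)]
    return pauli_exp(h, r * x)


def s1(x):
    """First-order formula: exp(-i x H_2) exp(-i x H_1)."""
    return matmul(pauli_exp(Z, x), pauli_exp(X, x))


def s2(x):
    """Second-order formula S_1(-x/2)^dagger S_1(x/2)."""
    half = pauli_exp(X, x / 2)
    return matmul(matmul(half, pauli_exp(Z, x)), half)


def s_even(order, x):
    """Recursive even-order formula of Eq. 25."""
    if order == 2:
        return s2(x)
    k = order // 2
    u = 1 / (4 - 4 ** (1 / (2 * k - 1)))
    outer = s_even(order - 2, u * x)
    outer_sq = matmul(outer, outer)
    inner = s_even(order - 2, (1 - 4 * u) * x)
    return matmul(matmul(outer_sq, inner), outer_sq)


def err(approx, x):
    """Frobenius norm of approx - U(x)."""
    u = exact(x)
    return math.sqrt(sum(abs(approx[i][j] - u[i][j]) ** 2
                         for i in range(2) for j in range(2)))


def fitted_order(formula, x=0.1):
    """Estimate q in error ~ x^q from the errors at x and x/2."""
    return math.log2(err(formula(x), x) / err(formula(x / 2), x / 2))
```

For the $K$ th-order formula, `fitted_order` should come out close to $K+1$ on this example, in line with the additive error bound $\mathcal{O}(x^{K+1})$ .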

III.2 LCU formulas

Instead of decomposing $U(x)$ as a product of elementary unitaries, another way is to decompose $U(x)$ into a sum of elementary unitaries. We now provide a formal definition of an LCU formula.

Definition 1 (Childs and Wiebe (2012)).

A $(\mu,\varepsilon)$ -LCU formula of an operator $V$ is defined to be

\tilde{V}=\sum_{i=0}^{\Gamma-1}c_{i}V_{i}=\mu\sum_{i=0}^{\Gamma-1}\Pr(i)V_{i},

(29)

such that the spectral norm distance $\|V-\tilde{V}\|\leq\varepsilon$ . Here, $\mu>0$ is the $l_{1}$ -norm of the coefficient vector, $\Pr(i)$ is a probability distribution over different unitaries $V_{i}$ , and $\{V_{i}\}_{i=0}^{\Gamma-1}$ is a set of unitaries. Here, we assume $c_{i}>0$ for all $i$ and absorb the phase into the unitaries $V_{i}$ . We call $\mu$ the $1$ -norm of this $(\mu,\varepsilon)$ -LCU formula.

In what follows, we define the $1$ -norm $\|V\|_{1}$ of an operator $V$ to be its smallest $1$ -norm over all possible $(\mu,0)$ -LCU formulas for $V$ . Note that, $\|U\|_{1}=1$ for any unitary $U$ . One can easily check the validity of this norm definition.

We may consider two ways to implement the LCU formula of the operator $V$ . In the first way, we coherently implement the LCU formula by introducing an ancillary system $A$ with the dimension $\Gamma$ , which costs $\lceil\log_{2}\Gamma\rceil$ qubits. For the LCU formula defined in Eq. 29, we define the amplitude-encoding unitary $U_{AE}$ and the select gate $\mathrm{Sel}(\tilde{V})$ to be,

	$\displaystyle U_{AE}\ket{0}_{A}$	$\displaystyle=\sum_{i=0}^{\Gamma-1}\sqrt{\Pr(i)}\ket{i},$		(30)
	$\displaystyle\mathrm{Sel}(\tilde{V})$	$\displaystyle=\sum_{i=0}^{\Gamma-1}\ket{i}\bra{i}\otimes V_{i}.$		(30)

Then, the following controlled-gate $W_{AB}$ acting on ancillary $A$ and system $B$ defines a way to realize LCU coherently when we prepare $\ket{0}$ on $A$ and measure it on computational basis to get $\ket{0}$ ,

W:=(U_{AE}^{\dagger}\otimes I)\mathrm{Sel}(\tilde{V})(U_{AE}\otimes I).

(31)

More precisely, if we set the initial state on $B$ to be $\ket{\psi}$ , then we have

W\ket{0}_{A}\ket{\psi}_{B}=\frac{1}{\mu}\ket{0}_{A}\left(\tilde{V}\ket{\psi}\right)_{B}+\sqrt{1-\frac{1}{\mu^{2}}}\ket{\Phi_{\perp}}_{AB},

(32)

where $\ket{\Phi_{\perp}}_{AB}$ is a state whose ancillary state is supported in the subspace orthogonal to $\ket{0}$ .

If we perform computation-basis measurement directly on $A$ , the successful probability to obtain $\tilde{V}\ket{\psi}$ is $\frac{1}{\mu^{2}}$ . To boost the successful probability to nearly deterministic, we can introduce the amplitude amplification techniques Berry et al. (2014). To this end, we consider the following isometry

\mathcal{V}:=-WRW^{\dagger}RW\left(\ket{0}\otimes I\right)_{AB}.

(33)

Here, $R$ is a reflection over $\ket{0}$ on the system $A$ ,

R=I-2\ket{0}_{A}\bra{0}.

(34)

Consider the case where $\mu=2$ and $\varepsilon$ is small for the LCU formula. If $\tilde{V}$ is a unitary, then we have $(\bra{0}\otimes I)\mathcal{V}=\tilde{V}$ . For a general $\tilde{V}$ , we can verify that the resulting operator

(\bra{0}\otimes I)\mathcal{V}=\frac{3}{2}\tilde{V}-\frac{1}{2}\tilde{V}\tilde{V}^{\dagger}\tilde{V},

(35)

is close to $V$ with the spectral norm distance bound $\xi:=(\varepsilon^{2}+3\varepsilon+4)\varepsilon/2$ . Also, the successful probability to project the ancilla to $\ket{0}$ is larger than $(1-\xi)^{2}$ .

In the second way to implement the LCU formula, we randomly sample the terms $\{V_{i}\}$ in Eq. 29 based on the probability $\Pr(i)$ Childs and Wiebe (2012). This approach saves the ancillary qubits and the cost of implementing the multiqubit Toffoli gates in $U_{AE}$ and $\mathrm{Sel}(\tilde{V})$ in Eq. 30, which makes it more suitable for near-term devices. Instead of implementing the operator $V$ directly, we now focus on the task of estimating the properties of the target state $\sigma=V\rho V^{\dagger}$ , where $\rho$ is the initial state of the Hamiltonian simulation task. If we want to estimate the expectation value of a given observable $O$ on $\sigma$ , we can embed the task into the Hadamard test Kitaev (1995) shown in Fig. 9.

We first prepare a $\ket{+}$ state on a single ancillary qubit. After that, we implement two controlled operations $\ket{0}_{A}\bra{0}\otimes I_{B}+\ket{1}_{A}\bra{1}\otimes(V_{i})_{B}$ and $\ket{0}_{A}\bra{0}\otimes(V_{j})_{B}+\ket{1}_{A}\bra{1}\otimes I_{B}$ , where $V_{i}$ and $V_{j}$ are sampled independently from the LCU formula in Eq. 29. The measured expectation value of $\langle\mu^{2}X_{A}\otimes O_{B}\rangle$ is then a nearly unbiased estimate of $\mathrm{Tr}(V\rho V^{\dagger}O)$ . We will use the following performance guarantee for the observable estimation.

Proposition 1 (Performance of the random-sampling LCU implementation, Theorem 2 from Faehrmann et al. (2022)).

For a target operator $V$ and its $(\mu,\varepsilon)$ -LCU formula defined in Definition 1, if we estimate the value $\braket{O}_{V}:=\mathrm{Tr}(V\rho V^{\dagger}O)$ with an initial state $\rho$ and observable $O$ using the circuit in Fig. 9 for $N$ times, then the distance between the mean estimator value $\hat{O}$ and the true value $\braket{O}_{V}$ is bounded by

|\hat{O}-\braket{O}_{V}|\leq\|O\|(3\varepsilon+\varepsilon_{n}),

(36)

with successful probability $1-\delta$ . Here, $N=2\mu^{4}\ln(2/\delta)/\varepsilon_{n}^{2}$ , $\|O\|$ is the spectral norm of $O$ .

From Proposition 1 we can see that the $1$ -norm $\mu$ of the LCU formula affects the sample complexity, while the accuracy factor $\varepsilon$ introduces extra bias in the observable estimation. To estimate $\langle O\rangle_{V}$ using Hadamard-test-type circuits with $\varepsilon$ accuracy, we need $\mathcal{O}(\mu^{4}/\varepsilon^{2})$ samples, which incurs an extra $\mu^{4}$ overhead compared to the normal Hamiltonian simulation algorithms Faehrmann et al. (2022). To make the algorithm efficient, we need to set $\mu$ to be a constant.
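To make the $\mu^{4}$ overhead concrete, the sample number $N=2\mu^{4}\ln(2/\delta)/\varepsilon_{n}^{2}$ from Proposition 1 can be evaluated directly. The helper name `sample_count` and the parameter values in the usage lines are chosen here for illustration only.

```python
import math


def sample_count(mu, eps_n, delta):
    """N = 2 * mu^4 * ln(2/delta) / eps_n^2 from Proposition 1 (not rounded)."""
    return 2 * mu**4 * math.log(2 / delta) / eps_n**2


if __name__ == "__main__":
    eps_n, delta = 0.01, 0.05   # illustrative statistical error and failure probability
    for mu in (1, 2):
        print(mu, math.ceil(sample_count(mu, eps_n, delta)))
```

With the same $\varepsilon_{n}$ and $\delta$ , moving from $\mu=1$ to $\mu=2$ multiplies the required number of samples by $2^{4}=16$ .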

In the later discussion, we will construct new LCU formulas from the product of LCU formulas. We will use the following proposition.

Proposition 2 (Product of LCU formulas).

Suppose we have a $(\mu,\varepsilon)$ -LCU formula $\tilde{V}$ for an operator $V$ with the form of Eq. 29. Then the product formula

\tilde{V}^{\nu}=\mu^{\nu}\sum_{i_{1},i_{2},...,i_{\nu}}\Pr(i_{1})\Pr(i_{2})...\Pr(i_{\nu})V_{i_{1}}V_{i_{2}}...V_{i_{\nu}},\quad\nu\in\mathbb{N},

(37)

is a $(\mu^{\prime},\varepsilon^{\prime})$ -LCU formula for $V^{\nu}$ with

\mu^{\prime}=\mu^{\nu},\quad\varepsilon^{\prime}\leq\nu\mu^{\prime}\varepsilon.

(38)

Proof.

The $1$ -norm is obvious. We now bound the distance $\varepsilon^{\prime}$ between $\tilde{V}^{\nu}$ and $V^{\nu}$ .

$\displaystyle\|V^{\nu}-\tilde{V}^{\nu}\|$	$\displaystyle=\left\|\sum_{k=1}^{\nu}V^{k-1}(V-\tilde{V})\tilde{V}^{\nu-k}\right\|$
	$\displaystyle\leq\sum_{k=1}^{\nu}\|V^{k-1}\|\|V-\tilde{V}\|\|\tilde{V}^{\nu-k}\|$
	$\displaystyle\leq\nu\|V-\tilde{V}\|\max\{\|V\|,\|\tilde{V}\|\}^{\nu-1}$	(39)
	$\displaystyle\leq\nu\varepsilon\mu^{\nu-1}\leq\nu\mu^{\prime}\varepsilon.$

∎
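The bound of Proposition 2 can be sanity-checked on a toy formula. The sketch below assumes, purely for illustration, the target $V=e^{-ixP}$ for a single Pauli operator $P$ and the two-term formula $\tilde{V}=I+x(-iP)$ with $1$ -norm $\mu=1+x$ . Because $V$ and $\tilde{V}$ are diagonal in the eigenbasis of $P$ , spectral norms reduce to maxima over the eigenvalues $\pm 1$ . The function name is hypothetical.

```python
import cmath


def check_product_bound(x, nu):
    """Return (||V^nu - V~^nu||, nu * mu^nu * eps) for the toy LCU formula."""
    mu = 1 + x                       # 1-norm of V~ = I + x * (-iP)
    eps, dist = 0.0, 0.0
    for eig in (+1, -1):             # eigenvalues of the Pauli operator P
        v = cmath.exp(-1j * x * eig)         # eigenvalue of V
        v_tilde = 1 - 1j * x * eig           # eigenvalue of V~
        eps = max(eps, abs(v - v_tilde))
        dist = max(dist, abs(v**nu - v_tilde**nu))
    return dist, nu * mu**nu * eps


if __name__ == "__main__":
    for x, nu in [(0.01, 10), (0.1, 20), (0.5, 5)]:
        dist, bound = check_product_bound(x, nu)
        print(f"x={x}, nu={nu}: distance={dist:.3e}, bound={bound:.3e}")
```

The first returned value should never exceed the second, which is the statement $\varepsilon^{\prime}\leq\nu\mu^{\prime}\varepsilon$ for this example.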

We remark that, when there are common unitary components for all $\{V_{i}\}$ in the LCU formula Eq. 29, we can simplify the circuit in Fig. 9 by removing the ancillary control on the common unitary components. Suppose we have the following LCU formula for an operator $V$ ,

\tilde{V}=\mu\sum_{i_{1},i_{2},...,i_{\nu}}\Pr(i_{1},i_{2},...,i_{\nu})V_{i_{\nu}}W_{\nu}...V_{i_{2}}W_{2}V_{i_{1}}W_{1},

(40)

such that $\|V-\tilde{V}\|\leq\varepsilon$ , where $W_{1},W_{2},...,W_{\nu}$ are some fixed unitaries. Then according to Definition 1, Eq. 40 is a $(\mu,\varepsilon)$ -LCU formula of $V$ with the elementary unitaries being $V_{\vec{i}}=V_{i_{\nu}}W_{\nu}...V_{i_{2}}W_{2}V_{i_{1}}W_{1}$ for $\vec{i}=\{i_{1},i_{2},...,i_{\nu}\}$ . Instead of naively applying the Hadamard test circuit in Fig. 9, we can introduce an equivalent circuit implementation shown in Fig. 10(b). The performance guarantee in Proposition 1 also holds for the improved implementation.

Furthermore, we notice that $\mathrm{Tr}(O\tilde{V}\rho\tilde{V}^{\dagger})$ can be written as,

\displaystyle\mathbb{E}_{i_{1},j_{1};...;i_{\nu},j_{\nu}}\mathrm{Tr}[O\,\mathcal{E}_{i_{\nu},j_{\nu}}\circ\mathcal{W}_{\nu}\circ\cdots\circ\mathcal{E}_{i_{1},j_{1}}\circ\mathcal{W}_{1}(\rho)],

(41)

where $\mathcal{W}_{k}(\bullet):=W_{k}\bullet W_{k}^{\dagger}$ and $\mathcal{E}_{i_{k},j_{k}}(\bullet):=\frac{1}{2}(V_{i_{k}}\bullet V_{j_{k}}^{\dagger}+V_{j_{k}}\bullet V_{i_{k}}^{\dagger})$ for $k=1,2,...,\nu$ . As a result, we can implement each channel $\mathcal{W}_{k}$ by a unitary and each map $\mathcal{E}_{i_{k},j_{k}}$ by a Hadamard-test-type circuit. This leads to a variant circuit shown in Fig. 10(c), where the ancillary qubit is measured and reset for every segment. While this circuit has the same gate complexity as Fig. 10(b), it is beneficial on a fault-tolerant quantum computer since we do not need to store the ancillary qubit for a long time: the ancillary qubit is activated only in the compensation stage and is quickly measured.

IV Trotter-LCU algorithms with paired Taylor-series compensation

In this section, we construct a $(\mu,\varepsilon)$ -LCU formula for the remainder $V_{K}(x)$ of the $K$ th-order Trotter formula based on the idea of performing a Taylor-series (TS) expansion on all the exponential terms in $V_{K}(x)$ . Although the TS expansion is naturally an LCU formula of $V_{K}(x)$ , it usually has a large $1$ -norm $\mu$ . To further suppress the $1$ -norm of the expansion, we introduce a “pairing” idea to combine the terms that correspond to different TS expansion orders. We first consider a simple case without a Trotter formula in subsection IV.1 to illustrate the main idea of constructing a TS-based LCU formula. Then we take the construction of the LCU formula for the first-order Trotter remainder as an example in subsection IV.2. Finally, in subsection IV.3, we discuss the random-sampling implementation of the algorithm and analyze its sample and gate complexity.

IV.1 Zeroth-order case

We begin our discussion from the case where the Trotter formula is trivial, i.e., $S_{0}(x)=I$ for each segment. In this case, the Trotter remainder is $V_{0}(x)=U(x)=e^{-ixH}$ . To construct LCU formula, we expand $V_{0}(x)$ by Taylor series Berry et al. (2015),

	$\displaystyle V_{0}(x)$	$\displaystyle=\sum_{s=0}^{\infty}\frac{(-i\lambda x)^{s}}{s!}\sum_{l_{1},...,l_{s}}p_{l_{1}}p_{l_{2}}...p_{l_{s}}P_{l_{1}}P_{l_{2}}...P_{l_{s}}$		(42)
		$\displaystyle=\sum_{s=0}^{\infty}F_{0,s}(x),$		(42)

where

F_{0,s}(x)=\frac{(-i\lambda x)^{s}}{s!}\sum_{l_{1},...,l_{s}}p_{l_{1}}p_{l_{2}}...p_{l_{s}}P_{l_{1}}P_{l_{2}}...P_{l_{s}}.

(43)

The $1$ -norm of $V_{0}(x)$ is

\|V_{0}(x)\|_{1}=\sum_{s=0}^{\infty}\|F_{0,s}(x)\|_{1}=1+(\lambda x)+\frac{1}{2!}(\lambda x)^{2}+...=e^{\lambda x}.

(44)

That is, the $1$ -norm of $V_{0}(x)$ is exponentially large with respect to $\lambda x$ . If we use $V_{0}(x)$ directly for the random-sampling implementation of LCU following Fig. 2(b), the composite LCU formula for $U(t)=V_{0}(x)^{\nu}$ is the product of $\nu$ copies of $V_{0}(x)$ . Based on Proposition 2, the $1$ -norm of the product formula is $\mu=(e^{\lambda x})^{\nu}=e^{\lambda t}$ , which increases exponentially with the simulation time $t$ . Based on Proposition 1, this implies an exponentially increasing sample cost $N=\mathcal{O}(\mu^{4})$ . To make the TS expansion practical for the random-sampling implementation, we need to reduce the $1$ -norm of $V_{0}(x)$ .

When $\lambda x$ is a small value, the major contribution to $\|V_{0}(x)\|_{1}$ comes from the low-order terms of $F_{0,s}(x)$ . Note that the first-order term $F_{0,1}(x)$ is anti-Hermitian. This allows us to utilize the following Euler’s formula on Pauli operators: for a Pauli matrix $P$ and $y\in\mathbb{R}$ ,

I+iyP=\sqrt{1+y^{2}}e^{i\theta(y)P},

(45)

where $\theta(y):=\tan^{-1}(y)$ .
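Eq. 45 can be verified numerically, together with the reduction in $1$ -norm that motivates the pairing: written as two separate unitaries, $I+iyP$ has $1$ -norm $1+|y|$ , whereas the single rotation carries the coefficient $\sqrt{1+y^{2}}$ . The sketch below uses the Pauli matrix $Y$ as an example; the helper names are assumptions made for this illustration.

```python
import math

IDENTITY = [[1, 0], [0, 1]]
PAULI_Y = [[0, -1j], [1j, 0]]


def unpaired(y, p):
    """Left-hand side of Eq. 45: I + i*y*P."""
    return [[IDENTITY[i][j] + 1j * y * p[i][j] for j in range(2)]
            for i in range(2)]


def paired(y, p):
    """Right-hand side of Eq. 45: sqrt(1+y^2) * exp(i*theta*P), theta = atan(y)."""
    theta = math.atan(y)
    norm = math.sqrt(1 + y * y)
    # exp(i*theta*P) = cos(theta) I + i sin(theta) P because P*P = I
    return [[norm * (math.cos(theta) * IDENTITY[i][j]
                     + 1j * math.sin(theta) * p[i][j]) for j in range(2)]
            for i in range(2)]


def euler_gap(y, p=PAULI_Y):
    """Largest entrywise difference between the two sides of Eq. 45."""
    a, b = unpaired(y, p), paired(y, p)
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))


def lcu_norms(y):
    """Return (1-norm of the two-term form, coefficient of the paired form)."""
    return 1 + abs(y), math.sqrt(1 + y * y)
```

For small $y$ the paired coefficient is $1+\mathcal{O}(y^{2})$ , compared with $1+|y|$ for the two-term form.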

To suppress $\|V_{0}(x)\|_{1}$ , we rewrite $V_{0}(x)$ as follows:

		$\displaystyle V_{0}(x)=\sum_{s=0}^{\infty}F_{0,s}(x)$		(46)
		$\displaystyle=I-i\lambda x\sum_{l=1}^{L}p_{l}P_{l}+\sum_{s=2}^{\infty}F_{0,s}(x)$
		$\displaystyle=\sum_{l=1}^{L}p_{l}(I-i\lambda xP_{l})+\sum_{s=2}^{\infty}F_{0,s}(x).$

Then, we can apply Eq. 45 on Eq. 46 to convert the first-order term to Pauli rotation unitaries

\displaystyle V_{0}^{(p)}(x)=\sqrt{1+(\lambda x)^{2}}\sum_{l=1}^{L}p_{l}e^{i\theta_{0}P_{l}}+\sum_{s=2}^{\infty}F_{0,s}(x),

(47)

where $\theta_{0}:=\tan^{-1}(-\lambda x)$ . We call the formula in Eq. 47 the $0$ th-order PTS formula.

In practice, to avoid the sampling in the infinitely large space, we introduce a truncation $s_{c}\geq 2$ to the expansion order of $x$ . After this truncation, the approximated LCU formula of $V_{0}^{(p)}(x)$ is

		$\displaystyle\tilde{V}_{0}^{(p)}(x)=\tilde{U}_{0}^{(p)}(x)=\sqrt{1+(\lambda x)^{2}}\sum_{l=1}^{L}p_{l}e^{i\theta_{0}P_{l}}+\sum_{s=2}^{s_{c}}F_{0,s}(x)$		(48)
		$\displaystyle=\mu_{0}^{(p)}(x)\left(\mathrm{Pr}_{0}^{(p)}(1)V_{0,1}^{(p)}+\sum_{s=2}^{s_{c}}\mathrm{Pr}_{0}^{(p)}(s)V_{0,s}^{(p)}\right),$		(48)

where

$\displaystyle\mu_{0}^{(p)}(x)$	$\displaystyle:=\sqrt{1+(\lambda x)^{2}}+\sum_{s=2}^{s_{c}}\|F_{0,s}(x)\|_{1}$
	$\displaystyle\leq\sqrt{1+(\lambda x)^{2}}+\left(e^{\lambda x}-1-\lambda x\right),$
$\displaystyle\mathrm{Pr}_{0}^{(p)}(s)$	$\displaystyle:=\frac{1}{\mu_{0}^{(p)}(x)}\begin{cases}\sqrt{1+(\lambda x)^{2}},&s=1,\\ \|F_{0,s}(x)\|_{1}=\frac{(\lambda x)^{s}}{s!},&s=2,3,...,s_{c},\end{cases}$	(49)
$\displaystyle V_{0,s}^{(p)}$	$\displaystyle:=\begin{cases}\sum_{l}p_{l}\,e^{i\theta_{0}P_{l}},&s=1,\\ \sum_{l_{1:s}}p_{l_{1:s}}^{(s)}(-i)^{s}P_{l_{1:s}}^{(s)},&s=2,3,...,s_{c}.\end{cases}$

After “pairing” the terms with $s=0$ and $s=1$ , we obtain Eq. 48 which is a new LCU formula with the $1$ -norm $\mu_{0}^{(p)}(x)$ . We have the following proposition to characterize the LCU formula in Eq. 48.

Proposition 3 (0th-order Trotter-LCU formula by paired Taylor-series compensation).

For $x\geq 0$ and $s_{c}\geq 2$ , $\tilde{V}_{0}^{(p)}(x)$ in Eq. 48 is a $(\mu_{0}^{(p)}(x),\varepsilon_{0}^{(p)}(x))$ -LCU formula of $V_{0}(x)=U(x)$ with

\mu_{0}^{(p)}(x)\leq e^{\frac{3}{2}(\lambda x)^{2}},\quad\varepsilon_{0}^{(p)}(x)\leq\left(\frac{e\lambda x}{s_{c}+1}\right)^{s_{c}+1}.

(50)

Proof.

For the normalization factor, we have

	$\displaystyle\mu_{0}^{(p)}(x)\leq\sqrt{1+(\lambda x)^{2}}+\left(e^{\lambda x}-1-\lambda x\right)$
	$\displaystyle\leq 1+\frac{1}{2}(\lambda x)^{2}+\left(e^{\lambda x}-1-\lambda x\right)$		(51)
	$\displaystyle\leq e^{(\lambda x)^{2}}+\frac{1}{2}(\lambda x)^{2}\leq e^{\frac{3}{2}(\lambda x)^{2}}.$

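The third inequality is due to $e^{x}-x\leq e^{x^{2}}$ for $x\in\mathbb{R}$ . The fourth inequality follows from $e^{\frac{3}{2}y}-e^{y}=e^{y}(e^{y/2}-1)\geq\frac{1}{2}y$ with $y=(\lambda x)^{2}\geq 0$ .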

For the distance bound, we have

		$\displaystyle\|\tilde{V}_{0}(x)-V_{0}(x)\|\leq\sum_{s>s_{c}}\|F_{0,s}(x)\|$		(52)
		$\displaystyle\leq\sum_{s>s_{c}}\frac{(\lambda x)^{s}}{s!}\|V_{0,s}(x)\|$
		$\displaystyle\leq\sum_{s>s_{c}}\frac{(\lambda x)^{s}}{s!}\leq\left(\frac{e\lambda x}{s_{c}+1}\right)^{s_{c}+1}.$

In the third inequality, we use the fact that

\|V_{0,s}\|\leq\sum_{l_{1},l_{2},...,l_{s}}p_{l_{1}}p_{l_{2}}...p_{l_{s}}\|(-i)^{s}P_{l_{1}}P_{l_{2}}...P_{l_{s}}\|\leq 1.

(53)

In the fourth inequality of Eq. 52, we apply the following Poisson tail bound formula,

\sum_{s=k+1}^{\infty}\frac{x^{s}}{s!}\leq\left(\frac{ex}{k+1}\right)^{k+1},

(54)

which can be proven from Theorem 1 in Ref. Canonne (2016). ∎

From Proposition 3 we see that the $1$ -norm of the LCU formula $V_{0}^{(p)}(x)$ in Eq. 47 is at most $\exp(\frac{3}{2}(\lambda x)^{2})\approx 1+\mathcal{O}((\lambda x)^{2})$ , whose leading term is quadratically smaller than that of $\|V_{0}(x)\|_{1}=\exp(\lambda x)\approx 1+\mathcal{O}(\lambda x)$ in Eq. 44 when $\lambda x\ll 1$ . We will later see that $V_{0}^{(p)}(x)$ provides us with an efficient random-sampling implementation of the Trotter-LCU algorithm.
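The effect of pairing on the $1$ -norm can be illustrated numerically. The sketch below evaluates $\mu_{0}^{(p)}(x)$ from Eq. 49, compares it with the bound in Eq. 50 and with the unpaired value $e^{\lambda x}$ from Eq. 44, and uses Proposition 2 to obtain the $1$ -norm of the $\nu$ -segment product. The function names and the numerical values of $\lambda t$ , $\nu$ and $s_{c}$ are illustrative assumptions.

```python
import math


def mu_paired(lam_x, s_c):
    """mu_0^(p)(x) = sqrt(1 + (lam x)^2) + sum_{s=2}^{s_c} (lam x)^s / s! (Eq. 49)."""
    tail = sum(lam_x**s / math.factorial(s) for s in range(2, s_c + 1))
    return math.sqrt(1 + lam_x**2) + tail


def mu_taylor(lam_x):
    """1-norm of the plain Taylor-series expansion, exp(lam x) (Eq. 44)."""
    return math.exp(lam_x)


def total_norms(lam_t, nu, s_c=10):
    """1-norms of the nu-segment products (Proposition 2): (paired, unpaired)."""
    lam_x = lam_t / nu
    return mu_paired(lam_x, s_c) ** nu, mu_taylor(lam_x) ** nu


if __name__ == "__main__":
    paired, unpaired = total_norms(lam_t=5.0, nu=100)
    print(f"paired total 1-norm:   {paired:.4f}")
    print(f"unpaired total 1-norm: {unpaired:.4f}")
```

For a fixed $\lambda t$ , the unpaired product stays at $e^{\lambda t}$ regardless of $\nu$ , whereas the paired product is bounded by $\exp(\frac{3}{2}(\lambda t)^{2}/\nu)$ and therefore shrinks toward $1$ as the number of segments grows.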

IV.2 First-order case

Following similar ideas to those in subsection IV.1, we now study the PTS compensation of the Trotter remainder $V_{K}(x)$ . We will take the first-order case as an example. The Trotter remainder $V_{1}(x)$ is

V_{1}(x)=U(x)S_{1}(x)^{\dagger}.

(55)

From Eq. 23 and Eq. 42 we have

$\displaystyle S_{1}(x)^{\dagger}$	$\displaystyle=\prod^{\rightarrow}e^{ix\vec{H}}=\left(\prod e^{\vec{\alpha}x}\right)\sum_{\vec{r}}\mathrm{Poi}(\vec{r};\vec{\alpha}x)\prod^{\rightarrow}(i\vec{P})^{\vec{r}}$	(56)
$\displaystyle U(x)$	$\displaystyle=\sum_{r=0}^{\infty}\frac{(\lambda x)^{r}}{r!}\sum_{l_{1},...,l_{r}}p_{l_{1}}p_{l_{2}}...p_{l_{r}}(-i)^{r}P_{l_{1}}P_{l_{2}}...P_{l_{r}}$
	$\displaystyle=e^{\lambda x}\sum_{r=0}^{\infty}\mathrm{Poi}(r;\lambda x)\sum_{l_{1:r}}p^{(r)}_{l_{1:r}}(-i)^{r}P^{(r)}_{l_{1:r}}.$

Here, we adopt the vector notations introduced in subsection III.1 to simplify the expressions. In Eq. 56, we also extend the notation of Poisson distribution,

\mathrm{Poi}(\vec{r};\vec{\alpha}x):=\prod_{l=1}^{L}\mathrm{Poi}(r_{l};\alpha_{l}x).

(57)

Based on Eq. 56, we then write the Taylor-series expansion of first-order remainder as follows:

		$\displaystyle V_{1}(x)=U(x)S_{1}(x)^{\dagger}$		(58)
		$\displaystyle=e^{2\lambda x}\sum_{r;\vec{r}}\mathrm{Poi}(r,\vec{r};\lambda x,\vec{\alpha}x)\sum_{l_{1:r}}p^{(r)}_{l_{1:r}}(-i)^{r-\sum\vec{r}}P^{(r)}_{l_{1:r}}\prod^{\rightarrow}\vec{P}^{\vec{r}},$		(58)

which is an LCU formula with $1$ -norm $e^{2\lambda x}$ . Here, $r$ and $\vec{r}$ are two groups of independent variables.

Now, we utilize the order condition of the Trotter formula in Lemma 1 to reduce the $1$ -norm of Eq. 58. To this end, we first rewrite Eq. 58 by classifying the terms based on the power of $x$ , which is determined by the value $s=r+\sum\vec{r}$ ,

\displaystyle V_{1}(x)=\sum_{s=0}^{\infty}F_{1,s}(x)=\sum_{s=0}^{\infty}\|F_{1,s}(x)\|_{1}V_{1,s}.

(59)

Here, $\|F_{1,s}(x)\|_{1}$ is the $1$ -norm of the $s$ -order expansion formula $F_{1,s}(x)$ . $V_{1,s}$ denotes the normalized LCU formula for the $s$ -order terms. We have

$\displaystyle F_{1,s}(x)$	$\displaystyle=\sum_{r+\sum\vec{r}=s}\frac{(\lambda x)^{r}}{r!}\left(\prod\frac{(\vec{\alpha}x)^{\vec{r}}}{\vec{r}!}\right)\cdot$	(60)
	$\displaystyle\quad\quad\quad\sum_{l_{1:r}}p^{(r)}_{l_{1:r}}(-i)^{2r-s}P^{(r)}_{l_{1:r}}\prod^{\rightarrow}\vec{P}^{\vec{r}},$
$\displaystyle V_{1,s}$	$\displaystyle=\sum_{r;\vec{r}}\Pr(r,\vec{r}|s)\sum_{l_{1:r}}p^{(r)}_{l_{1:r}}(-i)^{2r-s}P^{(r)}_{l_{1:r}}\prod^{\rightarrow}\vec{P}^{\vec{r}}.$

In the expression of $V_{1,s}$ , we use $r-\sum\vec{r}=r-(s-r)=2r-s$ . The conditional probability $\Pr(r,\vec{r}|s)$ indicates the probability to sample $r$ and $\vec{r}$ when their summation $s$ is given,

$\displaystyle\Pr(r,\vec{r}|s)$	$\displaystyle=\frac{\Pr\{(\hat{r}=r,\hat{\vec{r}}=\vec{r})\cap(\hat{s}=s)\}}{\Pr\{\hat{s}=s\}}$	(61)
	$\displaystyle=\frac{s!}{r!\prod\vec{r}!}\left(\frac{1}{2}\right)^{r}\prod\left(\frac{\vec{p}}{2}\right)^{\vec{r}}$
	$\displaystyle=\mathrm{Mul}(\{r,\vec{r}\};\{\frac{1}{2},\frac{\vec{p}}{2}\};s),$

which is a multinomial distribution $\mathrm{Mul}(\cdot;\cdot;s)$ with $s$ trials and $(L+1)$ outcomes. In each trial, the $(L+1)$ outcomes $\{r;r_{1},r_{2},...,r_{L}\}$ occur with the corresponding probability $\{\frac{1}{2},\frac{1}{2}p_{1},\frac{1}{2}p_{2},...,\frac{1}{2}p_{L}\}$ . Recall that $\vec{p}=\{p_{1},p_{2},...,p_{L}\}$ is the normalized Hamiltonian coefficients defined in Eq. 3.

We first estimate the normalization cost $\|F_{1,s}(x)\|_{1}$ for different $s$ orders,

	$\displaystyle\\|F_{1,s}(x)\\|_{1}$	$\displaystyle=\sum_{r;\vec{r}}\mathbbm{1}\left[r+\sum\vec{r}=s\right]\frac{(\lambda x)^{r}}{r!}\left(\prod\frac{(\vec{\alpha}x)^{\vec{r}}}{\vec{r}!}\right)$		(62)
		$\displaystyle=\frac{(\lambda x+\sum\vec{\alpha}x)^{s}}{s!}=\frac{(2\lambda x)^{s}}{s!}=:\eta_{s}.$		(62)

In the second equality, we use the following identity (applied repeatedly to the $L+1$ summation variables) together with $\sum\vec{\alpha}=\lambda$ ,

\sum_{s_{1},s_{2}=0}^{r}\mathbbm{1}[s_{1}+s_{2}=r]\frac{(x_{1})^{s_{1}}}{s_{1}!}\frac{(x_{2})^{s_{2}}}{s_{2}!}=\frac{(x_{1}+x_{2})^{r}}{r!}.

(63)

We denote $\eta_{s}:=\frac{(2\lambda x)^{s}}{s!}$ , which will be frequently used in the following discussion.
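As a numerical sanity check of Eq. 62 (not part of the original derivation), the following minimal sketch enumerates all $(r,\vec{r})$ with $r+\sum\vec{r}=s$ and compares the sum with $\eta_{s}$ . The coefficient values `alpha` and the step `x` are hypothetical, and the helper name `f1s_one_norm` is ours.

```python
from itertools import product
from math import factorial, isclose


def f1s_one_norm(s, lam_x, alpha_x):
    """First line of Eq. 62: sum over (r, r_1..r_L) with r + sum(r_l) = s of
    (lam_x)^r / r! * prod_l (alpha_x[l])^(r_l) / r_l!."""
    num_terms = len(alpha_x)
    total = 0.0
    for counts in product(range(s + 1), repeat=num_terms + 1):
        if sum(counts) != s:
            continue
        r, rest = counts[0], counts[1:]
        term = lam_x**r / factorial(r)
        for a, r_l in zip(alpha_x, rest):
            term *= a**r_l / factorial(r_l)
        total += term
    return total


alpha = [0.5, 0.3, 0.2]          # hypothetical Hamiltonian coefficients alpha_l
x = 0.4                          # hypothetical segment length
lam = sum(alpha)                 # lambda = sum_l alpha_l
alpha_x = [a * x for a in alpha]

for s in range(6):
    eta_s = (2 * lam * x) ** s / factorial(s)
    assert isclose(f1s_one_norm(s, lam * x, alpha_x), eta_s, rel_tol=1e-12)
```

The brute-force enumeration grows as $(s+1)^{L+1}$ , so it is only meant for small $s$ and $L$ .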

From Eq. 62 we can see that, similar to the expansion of $V_{0}(x)$ , the $1$ -norm of the expansion terms with different orders of $x$ in $V_{1}(x)$ follows the Poisson distribution. This motivates us to eliminate the low-order terms such as $F_{1,1}(x)$ . Based on Lemma 1, we have

F_{1,1}(x)=0.

(64)

As a result, we can directly remove the term $F_{1,1}(x)$ in Eq. 59. The resulting formula is,

\displaystyle V_{1}(x)=I+\sum_{s=2}^{\infty}\eta_{s}V_{1,s}.

(65)

After the elimination of the first-order term, we now introduce Euler’s formula to suppress higher-order terms in $V_{1,2}$ and $V_{1,3}$ . For the convenience of later discussion, we simplify the notation of $F_{1,s}(x)$ in Eq. 60 as follows:

\displaystyle F_{1,s}(x)=\eta_{s}\sum_{r,\gamma}\Pr(r,\gamma|s)P_{1}(r,\gamma),

(66)

where $\gamma$ is used to denote all the expansion variables $\{\vec{r},l_{1:r}\}$ besides $r$ . $\Pr(r,\gamma|s)$ and $P_{1}(r,\gamma)$ are then defined to be

	$\displaystyle\Pr(r,\gamma\|s)$	$\displaystyle=\Pr(r,\vec{r}\|s)p^{(r)}_{l_{1:r}},$		(67)
	$\displaystyle P_{1}(r,\gamma)$	$\displaystyle=(-i)^{2r-s}P^{(r)}_{l_{1:r}}\prod^{\rightarrow}\vec{P}^{\vec{r}}.$		(67)

To apply Euler’s formula in Eq. 45 to Eq. 65, we need to make sure that the Pauli operator $P$ is Hermitian. To this end, we classify the Pauli terms of $F_{1,s}$ in Eq. 66 into Hermitian and anti-Hermitian types,

	$\displaystyle F_{1,s}(x)$	$\displaystyle=\eta_{s}\sum_{\{r,\gamma\}\in\text{Her}}\Pr(r,\gamma\|s)P_{1,s}(r,\gamma)$		(68)
		$\displaystyle+i\eta_{s}\sum_{\{r,\gamma\}\in\text{anti-Her}}\Pr(r,\gamma\|s)(-i)P_{1,s}(r,\gamma),$		(68)

where $\{r,\gamma\}\in\text{Her}$ and $\{r,\gamma\}\in\text{anti-Her}$ indicate the sets of $\{r,\gamma\}$ such that $P_{1}(r,\gamma)$ is Hermitian and anti-Hermitian, respectively. When $\{r,\gamma\}\in\text{Her}$ , the corresponding Pauli operator has a real coefficient, which cannot be paired with $I$ based on Euler’s formula.

It seems that we cannot eliminate the Hermitian terms in Eq. 68 by Euler’s formula. However, by taking advantage of the properties of Trotter remainder $V_{1}(x)$ , we can show that $V_{1,2}$ and $V_{1,3}$ are actually anti-Hermitian. Recall that we can write the exponential form of $V_{1}(x)$ by applying the BCH formula on the definition of $V_{1}(x)$ in Eq. 27,

V_{1}(x)=\exp\left(i(E_{1,2}x^{2}+E_{1,3}x^{3}+E_{1,4}x^{4}+...)\right),

(69)

where $\{E_{1,s}\}$ are some Hermitian operators determined by the BCH formula. The first order term $E_{1,1}$ vanishes due to the order condition in Lemma 2.

Lemma 2 (Lemma 1 in Childs and Su (2019)).

Let $F(x)$ be an operator-valued function that is infinitely differentiable. Let $K\geq 1$ be a non-negative integer. The following two conditions are equivalent.

1. Asymptotic scaling: $F(x)=\mathcal{O}(x^{K+1})$ .

2. Derivative condition: $F(0)=F^{\prime}(0)=...=F^{(K)}(0)=0$ .

From Eq. 69 we then have

	$\displaystyle F_{1,2}(x)$	$\displaystyle=ix^{2}E_{1,2},$		(70)
	$\displaystyle F_{1,3}(x)$	$\displaystyle=ix^{3}E_{1,3}$		(70)

by the Taylor-series expansion on Eq. 69.

Comparing Eq. 68 and Eq. 70, we can see that

\displaystyle\sum_{\{r,\gamma\}\in\text{Her}}\Pr(r,\gamma|s)P_{1,s}(r,\gamma)=0,\quad s=2,3.

(71)

This is because $F_{1,2}(x)$ and $F_{1,3}(x)$ are anti-Hermitian from Eq. 70.

We can then modify the form of $F_{1,s}(x)$ ( $s=2$ or $3$ ) in Eq. 68 as follows:

$\displaystyle F_{1,s}(x)$	$\displaystyle=i\eta_{s}\sum_{\{r,\gamma\}\in\text{Her}}\Pr(r,\gamma\|s)P_{1,s}(r,\gamma)$	(72)
	$\displaystyle\quad+i\eta_{s}\sum_{\{r,\gamma\}\in\text{anti-Her}}\Pr(r,\gamma\|s)(-i)P_{1,s}(r,\gamma)$
	$\displaystyle=i\eta_{s}\sum_{r,\gamma}\Pr(r,\gamma\|s)(-i)^{\mathbbm{1}[P_{1,s}(r,\gamma):\text{anti-Her}]}P_{1,s}(r,\gamma).$

In Eq. 72, we intentionally add an extra phase $i$ to the Hermitian terms, which has no effect on $F_{1,s}(x)$ because these terms sum to zero. In this way, all the Pauli expansion terms in $F_{1,2}(x)$ and $F_{1,3}(x)$ have imaginary coefficients and can be paired with $I$ using Euler’s formula in Eq. 45. We call the second- and third-order terms $V_{1,s}(s=2,3)$ the leading TS expansion orders of $V_{1}(x)$ . The major reason to keep the Hermitian terms with zero summation value is to simplify the sampling procedure of $V_{1,s}$ , which will be clarified later.

Now, we are going to eliminate the leading-order terms in $V_{1}(x)$ ,

$\displaystyle V_{1}(x)$	$\displaystyle=I+\sum_{s=2}^{3}i\eta_{s}V_{1,s}^{\prime}+\sum_{s>3}\eta_{s}V_{1,s},$	(73)
$\displaystyle V_{1,s}^{\prime}$	$\displaystyle=\sum_{r,\gamma}\Pr(r,\gamma\|s)P_{1,s}^{\prime}(r,\gamma),\quad s=2,3$
$\displaystyle V_{1,s}$	$\displaystyle=\sum_{r,\gamma}\Pr(r,\gamma\|s)(-i)^{2r-s}P_{1,s}(r,\gamma),\quad s\geq 4.$

Here, the Pauli operator for the leading-order term $P_{1,s}^{\prime}(r,\gamma):=(-i)^{\mathbbm{1}[P_{1,s}(r,\gamma):\text{anti-Her}]}P_{1,s}(r,\gamma)$ always has a real coefficient, i.e., it is Hermitian. We can then pair $I$ with the Pauli operators in $F_{1,2}(x)$ and $F_{1,3}(x)$ ,

		$\displaystyle V_{1}^{(p)}(x)=I+i\eta_{2}V_{1,2}^{\prime}+i\eta_{3}V_{1,3}^{\prime}+\sum_{s=4}^{\infty}\eta_{s}V_{1,s}$		(74)
		$\displaystyle=\frac{\eta_{2}}{\eta_{\Sigma}}(I+i\eta_{\Sigma}V_{1,2}^{\prime})+\frac{\eta_{3}}{\eta_{\Sigma}}(I+i\eta_{\Sigma}V_{1,3}^{\prime})+\sum_{s=4}^{\infty}\eta_{s}V_{1,s}$
		$\displaystyle=\sqrt{1+\eta_{\Sigma}^{2}}\left(\frac{\eta_{2}}{\eta_{\Sigma}}R_{1,2}(\eta_{\Sigma})+\frac{\eta_{3}}{\eta_{\Sigma}}R_{1,3}(\eta_{\Sigma})\right)+\sum_{s=4}^{\infty}\eta_{s}V_{1,s}.$

Here, $\eta_{\Sigma}:=\eta_{2}+\eta_{3}$ . The third line of Eq. 74 is the final LCU formula used for the first-order PTSC algorithm. In the third line, we apply the following pairing procedure,

		$\displaystyle I+i\eta_{\Sigma}V_{1,s}^{\prime}=I+i\eta_{\Sigma}\sum_{r,\gamma}\Pr(r,\gamma\|s)P_{1,s}^{\prime}(r,\gamma)$		(75)
		$\displaystyle=\sum_{r,\gamma}\Pr(r,\gamma\|s)(I+i\eta_{\Sigma}P_{1,s}^{\prime}(r,\gamma))$
		$\displaystyle=\sqrt{1+\eta_{\Sigma}^{2}}\sum_{r,\gamma}\Pr(r,\gamma\|s)\exp\left(i\theta(\eta_{\Sigma})P_{1,s}^{\prime}(r,\gamma)\right)$
		$\displaystyle=:\sqrt{1+\eta_{\Sigma}^{2}}R_{1,s}(\eta_{\Sigma}),\quad s=2,3.$
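The pairing step can be illustrated on a single qubit. The sketch below is only an illustration under our own assumptions (a single Pauli $Y$ as the Hermitian operator and a hypothetical value of $\eta$ ): it checks that $I+i\eta P=\sqrt{1+\eta^{2}}\exp(i\theta P)$ with $\theta=\tan^{-1}(\eta)$ , so the $1$ -norm of the paired term is $\sqrt{1+\eta^{2}}$ rather than $1+\eta$ .

```python
import math

I2 = [[1, 0], [0, 1]]
PAULI_Y = [[0, -1j], [1j, 0]]


def lin(a, A, b, B):
    """Return a*A + b*B for 2x2 matrices stored as nested lists."""
    return [[a * A[i][j] + b * B[i][j] for j in range(2)] for i in range(2)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(A):
    return [[complex(A[j][i]).conjugate() for j in range(2)] for i in range(2)]


def max_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))


def paired_rotation(eta, P):
    """Return (norm, R) such that I + i*eta*P = norm * R, where
    R = exp(i*theta*P) and theta = atan(eta). Uses P @ P = I."""
    theta = math.atan(eta)
    R = lin(math.cos(theta), I2, 1j * math.sin(theta), P)
    return math.sqrt(1 + eta**2), R


eta = 0.3                                    # hypothetical value of eta_Sigma
norm, R = paired_rotation(eta, PAULI_Y)
unpaired = lin(1, I2, 1j * eta, PAULI_Y)     # I + i*eta*P

assert max_diff(unpaired, lin(norm, R, 0, I2)) < 1e-12   # pairing identity
assert max_diff(matmul(R, dagger(R)), I2) < 1e-12        # R is unitary
assert norm < 1 + eta                                    # smaller 1-norm
```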

In practice, we introduce a truncation $s_{c}$ on the expansion formula in Eq. 74. For the convenience of analysis, we set $s_{c}>3$ . The truncated LCU formula for $V_{1}^{(p)}(x)$ is then

\tilde{V}_{1}^{(p)}(x)=\mu_{1}^{(p)}(x)\Big{(}\sum_{s=2,3}\mathrm{Pr}_{1}^{(p)}(s)R_{1,s}(\eta_{\Sigma})+\sum_{s=4}^{s_{c}}\mathrm{Pr}_{1}^{(p)}(s)V_{1,s}\Big{)},

(76)

with the $1$ -norm $\mu_{1}^{(p)}(x)$ and probabilities $\Pr_{1}^{(p)}(s)$ to sample the $s$ -order term

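	$\displaystyle\mu_{1}^{(p)}(x)$	$\displaystyle=\sqrt{1+\eta_{\Sigma}^{2}}+\sum_{s=4}^{s_{c}}\eta_{s},$		(77)
	$\displaystyle\mathrm{Pr}_{1}^{(p)}(s)$	$\displaystyle=\frac{1}{\mu_{1}^{(p)}(x)}\begin{cases}\frac{\eta_{s}}{\eta_{\Sigma}}\sqrt{1+\eta_{\Sigma}^{2}},&s=2,3,\\ \eta_{s},&s=4,...,s_{c}.\end{cases}$		(77)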

Combined with the deterministic first-order Trotter formula, the overall LCU formula for $U(x)$ is

\tilde{U}_{1}^{(p)}(x)=\tilde{V}_{1}^{(p)}(x)S_{1}(x).

(78)

The following proposition gives the performance characterization of $\tilde{U}^{(p)}_{1}(x)$ in Eq. 78 to approximate $U(x)$ .

Proposition 4 (first-order Trotter-LCU formula by paired Taylor-series compensation).

For $0<x<1/(2\lambda)$ and $s_{c}\geq 3$ , $\tilde{V}_{1}^{(p)}(x)$ in Eq. 76 is a $(\mu_{1}^{(p)}(x),\varepsilon_{1}^{(p)}(x))$ -LCU formula of the first-order Trotter remainder $V_{1}(x)$ with

\mu_{1}^{(p)}(x)\leq e^{(e+\frac{2}{9})(2\lambda x)^{4}},\quad\varepsilon_{1}^{(p)}(x)\leq\left(\frac{2e\lambda x}{s_{c}+1}\right)^{s_{c}+1}.

(79)

As a result, $\tilde{U}_{1}^{(p)}(x)$ in Eq. 78 is a $(\mu_{1}^{(p)}(x),\varepsilon_{1}^{(p)}(x))$ -LCU formula of $U(x)$ .

Proof.

We first bound the normalization factor $\mu_{1}^{(p)}(x)$ . When $2\lambda x<1$ we have

	$\displaystyle\mu_{1}^{(p)}(x)\leq\sqrt{1+(\eta_{\Sigma})^{2}}+\sum_{s=4}^{\infty}\eta_{s}$
	$\displaystyle\leq 1+\frac{1}{2}\eta_{\Sigma}^{2}+\left(e^{2\lambda x}-\sum_{s=0}^{3}\eta_{s}\right)$
	$\displaystyle=\frac{1}{2}(2\lambda x)^{4}\left(\frac{1}{2!}+\frac{(2\lambda x)}{3!}\right)^{2}+\left(e^{2\lambda x}-\sum_{s=1}^{3}\eta_{s}\right)$		(80)
	$\displaystyle\leq\frac{1}{2}(2\lambda x)^{4}\left(\frac{1}{2}+\frac{1}{6}\right)^{2}+\left(e^{2\lambda x}-\sum_{s=1}^{3}\eta_{s}\right)$
	$\displaystyle\leq\frac{2}{9}(2\lambda x)^{4}+e^{e(2\lambda x)^{4}}\leq e^{(e+\frac{2}{9})(2\lambda x)^{4}}.$

For the distance bound, from Eq. 58 and Eq. 76 we have

		$\displaystyle\\|V_{1}(x)-\tilde{V}_{1}^{(p)}(x)\\|\leq\sum_{s>s_{c}}\eta_{s}\\|V_{1,s}\\|$		(81)
		$\displaystyle\leq\sum_{s>s_{c}}\frac{(2\lambda x)^{s}}{s!}\leq\left(\frac{2e\lambda x}{s_{c}+1}\right)^{s_{c}+1}.$		(81)

In the second line, we first use the fact that $\|V_{1,s}\|\leq 1$ and then apply Eq. 54. ∎
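To get a feeling for the numbers in Proposition 4, the following sketch evaluates the $1$ -norm and the sampling weights read off from the third line of Eq. 74 and compares them with the bounds in Eq. 79. The values of $\lambda x$ and $s_{c}$ are hypothetical, the function names are ours, and the infinite tail is approximated by a finite sum.

```python
from math import e, exp, factorial, sqrt


def eta(s, lam_x):
    """eta_s = (2*lambda*x)^s / s!."""
    return (2 * lam_x) ** s / factorial(s)


def paired_norm_and_probs(lam_x, s_c):
    """Return mu_1^(p)(x) and the probabilities of sampling order s = 2..s_c.
    Weights: sqrt(1 + eta_Sigma^2) * eta_s / eta_Sigma for s = 2, 3 (paired
    terms) and eta_s for s >= 4, as in the third line of Eq. 74."""
    eta_sigma = eta(2, lam_x) + eta(3, lam_x)
    weights = {s: sqrt(1 + eta_sigma**2) * eta(s, lam_x) / eta_sigma
               for s in (2, 3)}
    weights.update({s: eta(s, lam_x) for s in range(4, s_c + 1)})
    mu = sum(weights.values())
    probs = {s: w / mu for s, w in weights.items()}
    return mu, probs


def truncation_error(lam_x, s_c, terms=60):
    """Tail sum_{s > s_c} eta_s, approximated with a finite number of terms."""
    return sum(eta(s, lam_x) for s in range(s_c + 1, s_c + 1 + terms))


lam_x, s_c = 0.2, 6                      # hypothetical: 2*lambda*x = 0.4 < 1
mu, probs = paired_norm_and_probs(lam_x, s_c)
mu_bound = exp((e + 2 / 9) * (2 * lam_x) ** 4)
eps_bound = (2 * e * lam_x / (s_c + 1)) ** (s_c + 1)

assert abs(sum(probs.values()) - 1) < 1e-12
assert mu <= mu_bound
assert truncation_error(lam_x, s_c) <= eps_bound
print(f"mu = {mu:.6f} (bound {mu_bound:.6f})")
```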

From Proposition 4 we have shown that, by introducing the first-order Trotter formula, we can further suppress the 1-norm of the LCU formula $V_{1}^{(p)}(x)$ to $\exp(c(\lambda x)^{4})\approx 1+\mathcal{O}((\lambda x)^{4})$ where $c$ is a constant. In Appendix A, we discuss the generalized LCU formula construction of higher-order Trotter remainder $V_{K}(x)$ with $K=2k,k\in\mathbb{N}_{+}$ . Under such constructions, we have,

\|V_{K}^{(p)}(x)\|_{1}=\exp(c(\lambda x)^{2K+2}).

(82)

In subsection IV.3 we will see how this can help us to improve the time scaling of the whole simulation algorithm.

IV.3 Random-sampling implementation and performance

We have now derived the LCU formulas for Trotter remainders $V_{K}(x)$ and hence the small time evolution $U(x)=V_{K}(x)S_{K}(x)$ based on the idea to utilize the Trotter order condition and pair the leading-order terms in $V_{K}(x)$ to suppress the normalization factors. We now discuss the practical random-sampling implementation of them, taking the first-order case as an example.

Suppose we want to perform Hamiltonian evolution $e^{iHt}$ on an initial state $\rho$ . As is illustrated in Fig. 2, in each segment, we first implement the Trotter circuit $S_{1}(x)$ and then compensate the remainder $V_{1}(x)$ by LCU. In the random-sampling implementation of LCU, we embed the LCU sampling into a modified Hadamard test. In Fig. 5, we show the detailed sampling procedure of $V_{i}$ and $V_{j}$ . In stage 1), we sample the Taylor-expansion order $s$ from a finite probability distribution $\mathrm{Pr}^{(p)}_{1}(s)$ . Afterwards, in stage 2) and 3), we randomly sample Pauli string indices $\{r,\vec{r}\}$ and $l_{1},...,l_{r}$ based on the LCU formula of $F_{1,s}$ . The variables $\{r,\vec{r}\}$ obey a multinomial distribution $\text{Mul}(r,\vec{r};\{\frac{1}{2},\frac{\vec{p}}{2}\};s)$ defined in Eq. 61 while $\{l_{1},l_{2},...,l_{r}\}$ are sampled identically and independently from normalized Hamiltonian coefficients $\vec{p}$ defined in Eq. 3. Finally, in stage 4), depending on whether $s$ is the leading-order ( $s=2,3$ for $K=1$ ) or not, we determine the sampled unitary $V_{i}$ to be a Pauli-rotation unitary $e^{i\theta P}$ or just Pauli operators $P$ .
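The classical part of stages 1) to 4) can be sketched as follows for $K=1$ . This is a minimal illustration rather than the full procedure: it only draws the indices $s$ , $\{r,\vec{r}\}$ and $l_{1},...,l_{r}$ and the type of the sampled unitary, and it omits building the Pauli string itself and the phase bookkeeping of the leading-order terms. The function name, the example vector `p`, and the use of Python's `random` module are our own assumptions.

```python
import random
from math import factorial, sqrt


def sample_term(p, lam_x, s_c, rng):
    """Draw one LCU term for K = 1: returns (s, r, r_vec, l_indices, kind)."""
    def eta(s):
        return (2 * lam_x) ** s / factorial(s)

    eta_sigma = eta(2) + eta(3)
    orders = list(range(2, s_c + 1))
    weights = [sqrt(1 + eta_sigma**2) * eta(s) / eta_sigma if s <= 3 else eta(s)
               for s in orders]

    # Stage 1: sample the Taylor-expansion order s.
    s = rng.choices(orders, weights=weights)[0]

    # Stage 2: (r, r_vec) ~ Mul({1/2, p/2}; s), drawn as s categorical trials.
    outcome_probs = [0.5] + [0.5 * p_l for p_l in p]
    counts = [0] * (len(p) + 1)
    for outcome in rng.choices(range(len(p) + 1), weights=outcome_probs, k=s):
        counts[outcome] += 1
    r, r_vec = counts[0], counts[1:]

    # Stage 3: l_1, ..., l_r drawn i.i.d. from the normalized coefficients p.
    l_indices = rng.choices(range(len(p)), weights=p, k=r)

    # Stage 4: leading orders give Pauli rotations, higher orders give Paulis.
    kind = "pauli_rotation" if s in (2, 3) else "pauli"
    return s, r, r_vec, l_indices, kind


rng = random.Random(0)
p = [0.5, 0.3, 0.2]                      # hypothetical normalized coefficients
s, r, r_vec, l_indices, kind = sample_term(p, lam_x=0.2, s_c=5, rng=rng)
assert r + sum(r_vec) == s and len(l_indices) == r
```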

For the gate complexity of the random-sampling implementation, we have the following theorem.

Theorem 1 (Gate complexity of the $K$ th-order random-sampling Trotter-LCU algorithm by paired Taylor-series compensation).

To realize a (probabilistic) Hamiltonian simulation of $e^{iHt}$ with accuracy $\varepsilon$ , the gate complexity of random-sampling $K$ th-order Trotter-LCU algorithm ( $K=0,1$ or $2k$ ) based on paired Taylor-series compensation is

\mathcal{O}\left((\lambda t)^{1+\frac{1}{2K+1}}(\kappa_{K}L+\frac{\log(1/\varepsilon)}{\log\log(1/\varepsilon)})\right).

(83)

Here, $\kappa_{K}=K$ if $K=0$ or $1$ , or $\kappa_{K}=2\times 5^{K/2-1}$ if $K=2k$ .

Proof.

Without loss of generality, we focus on the case when $K=1$ . The case of $K=0$ and $K=2k,k\in\mathbb{N}_{+}$ can be analyzed similarly following Proposition 3 and Proposition 10, respectively.

For the random-sampling implementation, the overall LCU formula for $U(t)$ is to repeat the sampling of $\tilde{U}_{1}^{(p)}(x)$ for $\nu$ times, $\tilde{U}_{1,\mathcal{tot}}^{(p)}(t)=\tilde{U}_{1}^{(p)}(x)^{\nu}$ . Using Proposition 2 and Proposition 4, when $0<x<\frac{1}{2\lambda}$ and $s_{c}\geq 3$ , we conclude that $\tilde{U}_{1,\mathcal{tot}}^{(p)}(t)$ is a $(\mu_{1,\mathcal{tot}}^{(p)}(t),\varepsilon_{1,\mathcal{tot}}^{(p)}(t))$ -LCU formula of $U(t)$ with

	$\displaystyle\mu_{1,\mathcal{tot}}^{(p)}(t)$	$\displaystyle=\mu_{1}^{(p)}(x)^{\nu}\leq e^{(e+c_{1})\frac{(2\lambda t)^{4}}{\nu^{3}}},$		(84)
	$\displaystyle\varepsilon_{1,\mathcal{tot}}^{(p)}(t)$	$\displaystyle\leq\nu\mu_{1,\mathcal{tot}}^{(p)}(t)\varepsilon_{1}^{(p)}(x)\leq\nu e^{(e+c_{1})\frac{(2\lambda t)^{4}}{\nu^{3}}}\left(\frac{2e\lambda x}{s_{c}+1}\right)^{s_{c}+1}.$		(84)

Here, $c_{1}=2/9$ .

To realize a $(\mu,\varepsilon)$ -LCU formula for $U(t)$ based on $\nu$ segments of $\tilde{U}_{1}^{(p)}$ in Eq. 78, we only need to set the segment number $\nu$ and the truncation order $s_{c}$ to satisfy

	$\displaystyle\nu$	$\displaystyle\geq\max\left\{\nu_{1}^{(p)}(t),2\lambda t\right\},$		(85)
	$\displaystyle s_{c}$	$\displaystyle\geq\max\left\{\left\lceil\frac{\ln\left(\frac{\mu}{\varepsilon}\nu_{1}^{(p)}(t)\right)}{W_{0}\left(\frac{1}{2e\lambda t}\nu_{1}^{(p)}(t)\ln\left(\frac{\mu}{\varepsilon}\nu_{1}^{(p)}(t)\right)\right)}-1\right\rceil,3\right\}.$		(85)

Here, $\nu_{1}^{(p)}(t):=\left(\frac{2(e+c_{1})\lambda t}{\ln\mu}\right)^{\frac{1}{3}}2\lambda t$ . $W_{0}(y)$ is the principal branch of the Lambert $W$ function whose scaling is approximately $\ln(y)$ according to the tight bound in Lemma 7 in Appendix F. To derive the bound for $s_{c}$ in the second line of Eq. 85, we use Lemma 6 in Appendix F.
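For concrete parameter choices, the segment number and truncation order can also be found numerically. The sketch below takes $\nu$ from the first line of Eq. 85 and then, instead of evaluating the closed-form Lambert- $W$ expression, simply increases $s_{c}$ until the error bound in Eq. 84 drops below $\varepsilon$ . This direct search is our substitution for the analytic bound, and the input values are hypothetical.

```python
from math import ceil, e, exp, floor, log


def choose_parameters(lam_t, mu, eps, c1=2 / 9):
    """Return (nu, s_c) for K = 1 such that the bounds of Eq. 84 give
    total 1-norm <= mu and total error <= eps."""
    nu_p = (2 * (e + c1) * lam_t / log(mu)) ** (1 / 3) * 2 * lam_t
    # nu must be an integer with nu >= nu_p and x = t/nu < 1/(2*lambda).
    nu = max(ceil(nu_p), floor(2 * lam_t) + 1)
    lam_x = lam_t / nu
    mu_tot_bound = exp((e + c1) * (2 * lam_t) ** 4 / nu**3)

    s_c = 3
    while nu * mu_tot_bound * (2 * e * lam_x / (s_c + 1)) ** (s_c + 1) > eps:
        s_c += 1
    return nu, s_c


lam_t, mu, eps = 10.0, 2.0, 1e-3         # hypothetical lambda*t, mu, epsilon
nu, s_c = choose_parameters(lam_t, mu, eps)
print(f"nu = {nu}, s_c = {s_c}")
```

The loop terminates because $2e\lambda x/(s_{c}+1)<1$ for every $s_{c}\geq 3$ when $2\lambda x<1$ .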

The gate complexity of the random-sampling implementation of the Trotter-LCU algorithm is determined by the segment number $\nu$ , Trotter order $K$ and the gate complexity of each elementary gate in the LCU formula. As shown in Fig. 5, to construct controlled- $U(t)$ , we split it into $\nu$ segments. In each segment, we need to implement $K$ th-order Trotter circuits and LCU circuits. The gate complexity of each elementary gate in the LCU circuit is determined by the truncation order $s_{c}$ of the Taylor-series compensation. Specifically, we consider the compilation of the controlled-Pauli gate and controlled-Pauli rotation gate, as shown in Fig. 11. The number of gates is determined by the weight of the sampled Pauli matrices $\mathrm{wt}(P_{l})$ , which is upper bounded by $\mathcal{O}(s_{c})$ .

Therefore, the gate complexity of the overall algorithm using the $K$ th-order Trotter formula ( $K=0,1,2k$ ) is given by

\displaystyle N_{K}^{(RS)}=\mathcal{O}(\nu(\kappa_{K}L+s_{c}))

(86)

where

\kappa_{K}=\begin{cases}0,&K=0,\\ 1,&K=1,\\ 2\times 5^{K/2-1},&K=2k,k\in\mathbb{N}_{+}.\end{cases}

(87)

Based on Eq. 86, the gate complexity of the $2k$ th-order Trotter-LCU algorithm is then

\mathcal{O}(\nu(\kappa_{K}L+s_{c}))=\mathcal{O}\left((\lambda t)^{1+\frac{1}{4k+1}}(\kappa_{K}L+\frac{\log(1/\varepsilon)}{\log\log(1/\varepsilon)})\right).

(88)

Here, $\kappa_{K}$ is the stage number of the Trotter formula. When $K=0$ , $\kappa_{K}=0$ . When $K=2k$ , $\kappa_{K}=2\times 5^{K/2-1}$ . ∎

From Theorem 1 we can see that, by introducing the LCU compensation, the $K$ th-order Trotter-LCU algorithm improves the time scaling of the bare $K$ th-order Trotter formula from $1+1/K$ to $1+1/(2K+1)$ . Moreover, the accuracy scaling is exponentially improved. This allows us to achieve optimal gate complexity with lower-order Trotter implementation. For example, the time dependence of using first-order (respectively, second-order) Trotter formula can be improved from $t^{2}$ (respectively, $t^{1+1/2}$ ) to $t^{1+1/3}$ (respectively, $t^{1+1/5}$ ) by adding LCU compensation.

V TROTTER-LCU FORMULA WITH NESTED-COMMUTATOR COMPENSATION

In this section, we provide the detailed construction of the nested-commutator compensation Trotter-LCU algorithms and the gate complexity analysis. We first sketch the procedure to derive the nested-commutator form LCU formula in subsection V.1. Then we construct the LCU formula of the first-order Trotter remainder of the lattice Hamiltonians in subsection V.2 as an example. In subsection V.3, we describe the random-sampling implementation of the nested-commutator compensation algorithm and analyze its gate complexity.

V.1 Derivation of the nested-commutator formula

Our aim is to expand the LCU formula for the $K$ th-order Trotter remainder $V_{K}(x):=U(x)S_{K}(x)^{\dagger}$ ( $K=1$ or $2k$ , $k\in\mathbb{N}_{+}$ ) in the following form:

V_{K}(x)=I+\sum_{s=K+1}^{2K+1}F_{K,s}^{(nc)}(x)+F_{K,\mathcal{res}}^{(nc)}(x),

(89)

where the leading-order terms $F_{K,s}^{(nc)}(x)$ are written as a summation of nested commutators. We will use the following lemma of the operator-valued differential equation.

Lemma 3 (Lemma A.1 in Childs et al. (2021)).

Let $H(x)$ , $R(x)$ be continuous operator-valued functions defined for $x\in\mathbb{R}$ . Then, the first-order differential equation

\frac{d}{dx}W(x)=H(x)W(x)+R(x),\quad W(0)\text{ known},

(90)

has a unique solution given by

		$\displaystyle W(x)=\mathcal{T}\exp\left(\int_{0}^{x}d\tau H(\tau)\right)W(0)$		(91)
		$\displaystyle\quad+\int_{0}^{x}d\tau_{1}\mathcal{T}\exp\left(\int_{\tau_{1}}^{x}d\tau_{2}H(\tau_{2})\right)R(\tau_{1}).$		(91)

Here, $\mathcal{T}$ is the time-ordering operator.

In Eq. 90, if we set $W(x)$ to be the real-time evolution $U(x)=e^{-iHx}$ , we find that $H(x)=-iH$ and $R(x)=0$ . Therefore, $R(x)$ reflects the deviation of $W(x)$ from the exponential function. We are going to apply Lemma 3 to $S_{K}(x)^{\dagger}$ , i.e., we set $W(x):=S_{K}(x)^{\dagger}$ and $H(x)=iH$ . The deviation of $S_{K}(x)^{\dagger}$ from $U(x)^{\dagger}=e^{ixH}$ is characterized by the following function,

R_{K}(x)=\frac{d}{dx}S_{K}(x)^{\dagger}-iHS_{K}(x)^{\dagger}.

(92)

Applying Lemma 3, we have

S_{K}(x)^{\dagger}=e^{ixH}+\int_{0}^{x}d\tau e^{i(x-\tau)H}R_{K}(\tau).

(93)

The Trotter remainder $V_{K}(x)$ can then be expressed as

$\displaystyle V_{K}(x)$	$\displaystyle=U(x)S_{K}(x)^{\dagger}=I+\int_{0}^{x}d\tau e^{-i\tau H}R_{K}(\tau)$	(94)
	$\displaystyle=I+\int_{0}^{x}d\tau U(\tau)S_{K}(\tau)^{\dagger}S_{K}(\tau)R_{K}(\tau)$
	$\displaystyle=I+\int_{0}^{x}d\tau V_{K}(\tau)J_{K}(\tau),$

where

J_{K}(x):=S_{K}(x)R_{K}(x).

(95)

Eq. 94 provides a recurrence formula to solve the expansion terms in $V_{K}(x)$ . To be more explicit, if we expand $V_{K}(x)$ and $J_{K}(x)$ based on the operator-valued Taylor series,

V_{K}(x)=\sum_{s=0}^{\infty}G_{s}\frac{x^{s}}{s!},\quad J_{K}(x)=\sum_{s=0}^{\infty}C_{s}\frac{x^{s}}{s!},

(96)

where $G_{s}$ and $C_{s}$ denote the respective $s$ -order terms, then we have

	$\displaystyle G_{s+1}$	$\displaystyle=\frac{(s+1)!}{x^{s+1}}\int_{0}^{x}d\tau\sum_{m=0}^{s}G_{m}C_{s-m}\frac{\tau^{s}}{m!(s-m)!}$		(97)
		$\displaystyle=\sum_{m=0}^{s}\binom{s}{m}G_{m}C_{s-m}$		(97)

since Eq. 94 holds for all $x\in\mathbb{R}$ . This is a recurrence formula which can be used to solve all the expansion terms $G_{s}$ of the remainder $V_{K}(x)$ from the expansion terms $C_{r}$ of $J_{K}(x)$ .
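The recurrence in Eq. 97 can be checked in the scalar (commuting) case, where $V^{\prime}(x)=V(x)J(x)$ with $V(0)=1$ is solved by $V(x)=\exp(\int_{0}^{x}J(\tau)d\tau)$ . The sketch below is a sanity check under this simplifying assumption only; in the operator case the ordering $G_{m}C_{s-m}$ matters. The function name is ours.

```python
from math import comb


def remainder_coefficients(C, s_max):
    """Scalar version of Eq. 97: G_{s+1} = sum_m binom(s, m) G_m C_{s-m},
    with G_0 = 1. C[j] is the j-th Taylor coefficient of J (zero beyond
    the end of the list)."""
    G = [1]
    for s in range(s_max):
        G.append(sum(comb(s, m) * G[m] * (C[s - m] if s - m < len(C) else 0)
                     for m in range(s + 1)))
    return G


# J(x) = 1 (C_0 = 1): V(x) = exp(x), so every G_s equals 1.
assert remainder_coefficients([1], 5) == [1, 1, 1, 1, 1, 1]

# J(x) = x (C_1 = 1): V(x) = exp(x^2 / 2), so G_{2k} = (2k - 1)!! and the odd
# coefficients vanish. G_1 = 0 = C_0 mirrors the order condition for K = 1.
assert remainder_coefficients([0, 1], 6) == [1, 0, 1, 0, 3, 0, 15]
```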

To solve the explicit form of $G_{s}$ , we need to study the form of expansion terms $\{C_{r}\}_{r=0}^{s-1}$ for the function $J_{K}(x)$ . We will use the following proposition derived from Lemma 2.

Proposition 5 (Order condition).

For the $K$ th-order Trotter formula $S_{K}(x)$ , the multiplicative remainder $V_{K}(x)$ and derivative remainder $J_{K}(x)$ defined in Eq. 89, Eq. 92 and Eq. 95, respectively, the following statements are equivalent.

(1) $S_{K}(x)=U(x)+\mathcal{O}(x^{K+1})$ ,

(2) $(S_{K}(0))^{(j)}=(-iH)^{j}$ , for $0\leq j\leq K$ ,

(3) $J_{K}(x)=\mathcal{O}(x^{K})$ ,

(4) $(J_{K}(0))^{(j)}=0$ , for $0\leq j\leq K-1$ ,

(5) $V_{K}(x)=I+\mathcal{O}(x^{K+1})$ ,

(6) $V_{K}(0)=I$ and $(V_{K}(0))^{(j)}=0$ , for $1\leq j\leq K$ .

Here, $f(x)^{(j)}$ is the $j$ th-derivative of the function $f(x)$ .

Proof.

First, we note that $(1)\Leftrightarrow(2)$ , $(3)\Leftrightarrow(4)$ , and $(5)\Leftrightarrow(6)$ by applying Lemma 2 and setting $F(x)$ to be $S_{K}(x)-U(x)$ , $J_{K}(x)$ , and $V_{K}(x)-I$ , respectively. So we only need to prove $(2)\Leftrightarrow(4)$ and $(2)\Leftrightarrow(6)$ .

We first prove $(2)\Leftrightarrow(4)$ . From $(2)$ we also have $(S_{K}(0)^{\dagger})^{(j)}=(iH)^{j}$ for $0\leq j\leq K$ . Based on Eq. 92 and Eq. 95, the derivatives of $R_{K}(x)$ and $J_{K}(x)$ are

	$\displaystyle(R_{K}(x))^{(j)}$	$\displaystyle=(S_{K}(x)^{\dagger})^{(j+1)}-iH(S_{K}(x)^{\dagger})^{(j)},$		(98)
	$\displaystyle(J_{K}(x))^{(j)}$	$\displaystyle=\sum_{l=0}^{j}\binom{j}{l}(S_{K}(x))^{(l)}(R_{K}(x))^{(j-l)}.$		(98)

Based on $(2)$ we have $(R_{K}(0))^{(j)}=0$ for $0\leq j\leq K-1$ . Hence $(J_{K}(0))^{(j)}=0$ for $0\leq j\leq K-1$ .

For the reverse direction, we notice that $R_{K}(x)=J_{K}(x)S_{K}(x)^{\dagger}$ . This implies

\displaystyle(R_{K}(x))^{(j)}=\sum_{l=0}^{j}\binom{j}{l}(J_{K}(x))^{(l)}(S_{K}(x)^{\dagger})^{(j-l)}.

(99)

If we have $(J_{K}(0))^{(j)}=0$ for $0\leq j\leq K-1$ , then $(R_{K}(0))^{(j)}=0$ for $0\leq j\leq K-1$ . Then, from Eq. 98 we have $(S_{K}(0)^{\dagger})^{(j)}=(iH)^{j}$ for $0\leq j\leq K$ , which is equivalent to $(2)$ .

Now, we prove $(2)\Leftrightarrow(6)$ . Based on Eq. 89 we have

(V_{K}(x))^{(j)}=\sum_{l=0}^{j}\binom{j}{l}(U(x))^{(l)}(S_{K}(x)^{\dagger})^{(j-l)}.

(100)

From $(2)$ we have

(V_{K}(0))^{(j)}=\sum_{l=0}^{j}\binom{j}{l}(-iH)^{l}(iH)^{j-l}=0,

(101)

for $1\leq j\leq K$ . The reverse direction can be proven similarly based on the derivative of the formula $S_{K}(x)=V_{K}(x)^{\dagger}U(x)$ . ∎

From Proposition 5, we have the following order condition for the $K$ th-order Trotter formula and remainders,

$\displaystyle S_{K}(x)$	$\displaystyle=U(x)+\mathcal{O}(x^{K+1}),$	(102)
$\displaystyle J_{K}(x)$	$\displaystyle=\mathcal{O}(x^{K}),$
$\displaystyle V_{K}(x)$	$\displaystyle=I+\mathcal{O}(x^{K+1}).$

Comparing the recurrence formula in Eq. 94 with the order condition in Eq. 102, we obtain the following relationship for the Taylor-series expansion terms $G_{s}$ and $C_{s}$ of $V_{K}(x)$ and $J_{K}(x)$ , respectively,

$\displaystyle G_{s}$	$\displaystyle=0,\quad s=1,2,\ldots,K,$	(103)
$\displaystyle C_{s}$	$\displaystyle=0,\quad s=0,1,\ldots,K-1,$
$\displaystyle G_{s}$	$\displaystyle=C_{s-1},\quad s=K+1,K+2,\ldots,2K+1.$

Based on Eq. 103, we are going to expand $J_{K}(x)$ based on the operator-valued Taylor-series expansion with integral remainders,

J_{K}(x)=J_{K,L}(x)+J_{K,res,2K}(x)=\sum_{s=K}^{2K}C_{s}\frac{x^{s}}{s!}+J_{K,res,2K}(x),

(104)

where

		$\displaystyle C_{s}=J_{K}^{(s)}(0),\quad s=K,K+1,\ldots,2K,$		(105)
		$\displaystyle J_{K,res,s}(x)=\int_{0}^{x}d\tau\frac{(x-\tau)^{s}}{s!}J_{K}^{(s+1)}(\tau).$		(105)

Here, $J_{K,L}(x):=\sum_{s=K}^{2K}C_{s}\frac{x^{s}}{s!}$ denotes the leading-order terms in $J_{K}(x)$ . Then, from Eq. 94 and Eq. 103, the $K$ th-order Trotter remainder can be expressed as

$\displaystyle V_{K}(x)$	$\displaystyle=I+M_{K}(x)=I+\sum_{s=K+1}^{2K+1}F_{K,s}^{(nc)}(x)+F_{K,\mathcal{res}}^{(nc)}(x),$	(106)
$\displaystyle F_{K,s}^{(nc)}(x)$	$\displaystyle=C_{s-1}\frac{x^{s}}{s!},\quad s=K+1,K+2,\ldots,2K+1,$
$\displaystyle F_{K,\mathcal{res}}^{(nc)}(x)$	$\displaystyle=\int_{0}^{x}d\tau\left(M_{K}(\tau)J_{K,L}(\tau)+V_{K}(\tau)J_{K,res,2K}(\tau)\right).$

Here, $M_{K}(x):=V_{K}(x)-I$ .

From Eq. 106 we can see that the leading-order expansion terms $F_{K,s}^{(nc)}(x)$ of $V_{K}(x)$ with $s=K+1,K+2,...,2K+1$ have a simple expression in terms of $C_{s-1}:=J_{K}^{(s-1)}(0)$ . Later we will show that $C_{s}$ can be written as a summation of nested commutators of the form,

\mathrm{ad}_{H_{l_{j}}}^{m_{j}}...\mathrm{ad}_{H_{l_{2}}}^{m_{2}}\mathrm{ad}_{H_{l_{1}}}^{m_{1}}H_{l},

(107)

where $\sum_{j}m_{j}=s$ , $H_{l},H_{l_{1}},...,H_{l_{j}}$ are different summands in $H$ . Furthermore, we will show that $\{C_{s}\}$ are all anti-Hermitian. As a result, the expansion-order-pairing based on Euler’s formula introduced in Sec. IV can also be applied here. Since $\{C_{s}\}$ are anti-Hermitian, we can expand them by the Pauli operators as follows:

C_{s}=i\|C_{s}\|_{1}\sum_{\gamma}\Pr(\gamma|s)P^{(nc)}_{s}(\gamma),

(108)

where $\{P^{(nc)}_{s}(\gamma)\}$ are Hermitian Pauli operators with coefficients $1$ or $-1$ . $\|C_{s}\|_{1}$ denotes the $1$ -norm of this expansion. The leading-order expansion term $F_{K,s}^{(nc)}(x)$ can then be expressed as

	$\displaystyle F_{K,s}^{(nc)}(x)$	$\displaystyle=i\\|C_{s-1}\\|_{1}\frac{x^{s}}{s!}\sum_{\gamma}\Pr(\gamma\|s-1)P^{(nc)}_{s-1}(\gamma)$		(109)
		$\displaystyle=:\eta^{(nc)}_{s}V_{K,s}^{(nc)}.$		(109)

Here, $\eta^{(nc)}_{s}:=\|C_{s-1}\|_{1}\frac{x^{s}}{s!}$ is the $1$ -norm of the LCU formula of $F_{K,s}^{(nc)}(x)$ in Eq. 109. $V_{K,s}^{(nc)}$ is the normalized LCU formula.

The residue term $F_{K,\mathcal{res}}^{(nc)}$ in Eq. 106, however, is complicated and hard to be expressed simply using nested commutators. Due to the hardness to compensate $F_{K,\mathcal{res}}^{(nc)}$ , we will remove it in the truncated Trotter remainder formula,

		$\displaystyle\tilde{V}_{K}^{(nc)}(x)=I+\sum_{s=K+1}^{2K+1}F_{K,s}^{(nc)}(x)$		(110)
		$\displaystyle=\sum_{s=K+1}^{2K+1}\frac{\eta^{(nc)}_{s}}{\eta_{\Sigma}^{(nc)}}\left(I+\eta^{(nc)}_{\Sigma}V_{K,s}^{(nc)}\right)$
		$\displaystyle=\sqrt{1+(\eta_{\Sigma}^{(nc)})^{2}}\sum_{s=K+1}^{2K+1}\frac{\eta^{(nc)}_{s}}{\eta_{\Sigma}^{(nc)}}R_{K,s}^{(nc)}(\eta_{\Sigma}^{(nc)}).$

Here, $\eta_{\Sigma}^{(nc)}:=\sum_{s=K+1}^{2K+1}\|C_{s-1}\|_{1}\frac{x^{s}}{s!}$ . In the third line, we apply the following pairing procedure based on Euler’s formula, similar to Eq. 75,

		$\displaystyle I+\eta_{\Sigma}^{(nc)}V_{K,s}^{(nc)}=I+i\eta_{\Sigma}^{(nc)}\sum_{\gamma}\Pr(\gamma\|s-1)P^{(nc)}_{s-1}(\gamma)$		(111)
		$\displaystyle=\sqrt{1+(\eta_{\Sigma}^{(nc)})^{2}}\sum_{\gamma}\Pr(\gamma\|s-1)\exp\left(i\theta(\eta_{\Sigma}^{(nc)})P^{(nc)}_{s-1}(\gamma)\right)$
		$\displaystyle=:\sqrt{1+(\eta_{\Sigma}^{(nc)})^{2}}R_{K,s}^{(nc)}(\eta_{\Sigma}^{(nc)}),$

for $s=K+1,K+2,...,2K+1$ . Recall that $\theta(x):=\tan^{-1}(x)$ .

$\tilde{V}_{K}^{(nc)}(x)$ in Eq. 110 is the final LCU formula for the nested-commutator compensation of $V_{K}(x)$ . In what follows, we estimate the $1$ -norm $\mu_{K}^{(nc)}(x)$ and distance $\varepsilon_{K}^{(nc)}(x)$ of this LCU formula,

	$\displaystyle\mu_{K}^{(nc)}(x)$	$\displaystyle=\sqrt{1+(\eta_{\Sigma}^{(nc)})^{2}},$		(112)
	$\displaystyle\varepsilon_{K}^{(nc)}(x)$	$\displaystyle=\\|\tilde{V}_{K}^{(nc)}(x)-V_{K}(x)\\|=\\|F_{K,\mathcal{res}}^{(nc)}(x)\\|.$		(112)

Proposition 6 (Bound the 1-norm and error of nested-commutator expansion formula).

$\tilde{V}^{(nc)}_{K}(x)$ in Eq. 110 is a $(\mu_{K}^{(nc)}(x),\varepsilon_{K}^{(nc)}(x))$ -LCU formula of the $K$ th-order Trotter remainder $V_{K}(x)$ with

	$\displaystyle\mu_{K}^{(nc)}(x)$	$\displaystyle=\sqrt{1+(\eta_{\Sigma}^{(nc)})^{2}}=\sqrt{1+\left(\sum_{s=K}^{2K}\\|C_{s}\\|_{1}\frac{x^{s+1}}{(s+1)!}\right)^{2}}$		(113)
	$\displaystyle\varepsilon_{K}^{(nc)}(x)$	$\displaystyle\leq\int_{0}^{x}d\tau\left(\\|M_{K}(\tau)\\|\\|J_{K,L}(\tau)\\|+\\|J_{K,res,2K}(\tau)\\|\right),$		(113)

where

$\displaystyle\\|J_{K,L}(\tau)\\|$	$\displaystyle\leq\sum_{s=K}^{2K}\\|C_{s}\\|\frac{x^{s+1}}{(s+1)!},$	(114)
$\displaystyle\\|J_{K,res,2K}(\tau)\\|$	$\displaystyle\leq\int_{0}^{x}d\tau\frac{(x-\tau)^{s}}{s!}\\|J_{K}^{(s+1)}(\tau)\\|,$
$\displaystyle\\|M_{K}(\tau)\\|$	$\displaystyle\leq\int_{0}^{\tau}d\tau_{1}\int_{0}^{\tau_{1}}d\tau_{2}\frac{(\tau_{1}-\tau_{2})^{(K-1)}}{(K-1)!}\\|J_{K}^{(K)}(\tau_{2})\\|.$

Proof.

The value of $\mu_{K}^{(nc)}(x)$ is derived based on Eq. 109 and Eq. 110. To calculate $\varepsilon_{K}^{(nc)}(x)$ , we will use the following bound,

		$\displaystyle\varepsilon_{K}^{(nc)}(x)=\\|F_{K,\mathcal{res}}^{(nc)}(x)\\|$		(115)
		$\displaystyle\leq\int_{0}^{x}d\tau\left(\\|M_{K}(\tau)\\|\\|J_{K,L}(\tau)\\|+\\|V_{K}(\tau)\\|\\|J_{K,res,2K}(\tau)\\|\right)$
		$\displaystyle=\int_{0}^{x}d\tau\left(\\|M_{K}(\tau)\\|\\|J_{K,L}(\tau)\\|+\\|J_{K,res,2K}(\tau)\\|\right).$

In the second equality, we use the fact that $V_{K}(\tau)$ is a unitary. To bound $\|J_{K,L}(\tau)\|$ , we have

\|J_{K,L}(\tau)\|\leq\sum_{s=K}^{2K}\|C_{s}\|\frac{x^{s+1}}{(s+1)!}.

(116)

To bound $J_{K,res,2K}(\tau)$ in Eq. 115, from Eq. 105 we have

\|J_{K,res,2K}(\tau)\|\leq\int_{0}^{x}d\tau\frac{(x-\tau)^{s}}{s!}\|J_{K}^{(s+1)}(\tau)\|.

(117)

Finally, to bound $\|M_{K}(\tau)\|$ in Eq. 115, we have

$\displaystyle\\|M_{K}(\tau)\\|$	$\displaystyle=\left\\|\int_{0}^{\tau}d\tau_{1}V_{K}(\tau_{1})J_{K}(\tau_{1})\right\\|$	(118)
	$\displaystyle\leq\int_{0}^{\tau}d\tau_{1}\\|J_{K}(\tau_{1})\\|=\int_{0}^{\tau}d\tau_{1}\\|J_{K,res,K-1}(\tau_{1})\\|$
	$\displaystyle\leq\int_{0}^{\tau}d\tau_{1}\int_{0}^{\tau_{1}}d\tau_{2}\frac{(\tau_{1}-\tau_{2})^{(K-1)}}{(K-1)!}\\|J_{K}^{(K)}(\tau_{2})\\|.$

In the first line, we use the definition of $M_{K}(\tau)=V_{K}(\tau)-I$ and the recurrence formula in Eq. 94. In the second line, we use the property that $V_{K}(\tau)$ is a unitary. In the third line, we use the order condition in Proposition 5. In the final line, we apply the operator-valued Taylor-series expansion on $J_{K}(\tau)$ and set the truncation order $s_{c}=K-1$ .

∎

From Proposition 6 we can see that, to study the performance of the truncated LCU formula $\tilde{V}^{(nc)}_{K}(x)$ in Eq. 110, we only need to study the property of the derivatives of $J_{K}(x)$ , including $\{C_{s}\}$ . In the following section, we are going to derive the explicit formula of $\tilde{V}^{(nc)}_{K}(x)$ , taking the lattice Hamiltonian with first-order Trotter formulas as an example.

V.2 Example: first-order lattice Hamiltonian

Now, we focus on the lattice Hamiltonians with the form $H=A+B$ in Eq. 10. For the lattice Hamiltonian, the first-order Trotter formula is $S_{1}(x)=e^{-ixB}e^{-ixA}$ . Here, we assume that the time-evolution of each two-qubit component $e^{-ixH_{j,j+1}}$ is easy to implement on the quantum computer.

To derive the explicit form of the LCU formula $\tilde{V}_{1}^{(nc)}(x)$ in Eq. 110, we first derive $J_{1}(x)$ defined in Eq. 95 and its derivatives at the origin, $C_{s}:=J_{1}^{(s)}(0)$ . From Eq. 92 and Eq. 95 we have

$\displaystyle R_{1}(x)$	$\displaystyle=\frac{d}{dx}S_{1}(x)^{\dagger}-iHS_{1}(x)^{\dagger}=i[e^{ixA},B]e^{ixB},$	(119)
$\displaystyle J_{1}(x)$	$\displaystyle=S_{1}(x)R_{1}(x)=ie^{-ixB}e^{-ixA}[e^{ixA},B]e^{ixB}$
	$\displaystyle=i\left(B-e^{-ix\mathrm{ad}_{B}}e^{-ix\mathrm{ad}_{A}}B\right).$

Applying the general Leibniz formula to $J_{1}(x)$ we have,

J_{1}^{(s)}(x)=(-i)^{s+1}\sum_{\begin{subarray}{c}m_{1},n_{1};\\ m_{1}+n_{1}=s\end{subarray}}\binom{s}{m_{1}}e^{-ix\mathrm{ad}_{B}}\mathrm{ad}_{B}^{m_{1}}e^{-ix\mathrm{ad}_{A}}\mathrm{ad}_{A}^{n_{1}}B.

(120)

If we truncate the expansion of $J_{1}(x)$ at order $s_{c}$ and apply the following operator-valued Taylor-series expansion formula with integral remainder,

Q(x)=\sum_{s=0}^{s_{c}}\frac{x^{s}}{s!}Q^{(s)}(0)+\int_{0}^{x}d\tau\frac{(x-\tau)^{s_{c}}}{s_{c}!}Q^{(s_{c}+1)}(\tau),

(121)

we can expand $J_{1}(x)$ as

J_{1}(x)=\sum_{s=0}^{s_{c}}C_{s}\frac{x^{s}}{s!}+J_{1,res,s_{c}}(x),

(122)

where

	$\displaystyle C_{s}=J_{1}^{(s)}(0)$	$\displaystyle=(-i)^{s+1}\sum_{\begin{subarray}{c}m_{1},n_{1};\\ m_{1}+n_{1}=s\end{subarray}}\binom{s}{m_{1}}\mathrm{ad}_{B}^{m_{1}}\mathrm{ad}_{A}^{n_{1}}B,$		(123)
	$\displaystyle J_{1,res,s_{c}}(x)$	$\displaystyle=\int_{0}^{x}d\tau\frac{(x-\tau)^{s_{c}}}{s_{c}!}J_{1}^{(s_{c}+1)}(\tau).$		(123)

We can see that $C_{s}$ can be written as a sum of nested commutators of the concise form $\mathrm{ad}_{B}^{m_{1}}\mathrm{ad}_{A}^{n_{1}}B$ . Note that $(-i)^{s+1}\mathrm{ad}_{B}^{m_{1}}\mathrm{ad}_{A}^{n_{1}}B$ with $m_{1}+n_{1}=s$ is always anti-Hermitian when $A$ and $B$ are Hermitian. Moreover, the spectral norm and $1$ -norm of these nested commutators are linear in the system size $n$ . To be more specific, we have the following norm bound.

Proposition 7.

Consider a lattice Hamiltonian $H=A+B$ with the form in Eq. 10. Suppose the spectral norm and $1$ -norm of its components $H_{j,j+1}$ are upper bounded by $\Lambda$ and $\Lambda_{1}$ . Then for the nested commutators appearing in Eq. 123, we have the following bound

\displaystyle\left\|e^{-i\tau\mathrm{ad}_{B}}\mathrm{ad}_{B}^{m_{1}}e^{-i\tau\mathrm{ad}_{A}}\mathrm{ad}_{A}^{n_{1}}B\right\|\leq\frac{n}{2}3^{m_{1}}2^{n_{1}}2^{s}\Lambda^{s+1},

(124)

where $m_{1},n_{1}$ are non-negative integers satisfying $m_{1}+n_{1}=s$ . As a result, we can bound the spectral norm of $J_{1}^{(s)}$ as $\|J_{1}^{(s)}(x)\|\leq\frac{n}{2}10^{s}\Lambda^{s+1}$ . The $1$ -norm upper bound is obtained by simply replacing $\Lambda$ with $\Lambda_{1}$ .

Proof.

We first focus on one Hamiltonian term $H_{j,j+1}$ contained in $B$ and bound the norm,

\left\|e^{-i\tau\mathrm{ad}_{B}}\mathrm{ad}_{B}^{m_{1}}e^{-i\tau\mathrm{ad}_{A}}\mathrm{ad}_{A}^{n_{1}}(H_{j,j+1})\right\|\leq 3^{m_{1}}2^{n_{1}}2^{s}\Lambda^{s+1}.

(125)

To do this, we decompose the commutator into elementary nested commutators of the following form:

		$\displaystyle e^{-i\tau\mathrm{ad}_{B}}\mathrm{ad}_{H_{j_{s},j_{s}+1}}\cdots\mathrm{ad}_{H_{j_{n_{1}+1},j_{n_{1}+1}+1}}\cdot$		(126)
		$\displaystyle\;e^{-i\tau\mathrm{ad}_{A}}\mathrm{ad}_{H_{j_{n_{1}},j_{n_{1}}+1}}\cdots\mathrm{ad}_{H_{j_{1},j_{1}+1}}H_{j,j+1},$		(126)

where $j_{1},j_{2},...,j_{s}$ are the possible vertex indices. For each elementary nested commutator, the spectral norm can be bounded by $(2\Lambda)^{s}\Lambda$ by expanding all the commutators and applying the triangle inequality. Here, we use the property that the spectral norm of every exponential operator with an anti-Hermitian exponent is $1$ .

Now, we count the number of possible elementary commutators of the form in Eq. 126. We check the action of each adjoint operator $\mathrm{ad}_{A}$ or $\mathrm{ad}_{B}$ from right to left. For the first location, if we expand $\mathrm{ad}_{A}$ , there are only two possible nonzero elementary components, $\mathrm{ad}_{H_{j-1,j}}$ and $\mathrm{ad}_{H_{j+1,j+2}}$ . If the next $\mathrm{ad}$ is still $\mathrm{ad}_{A}$ , the support remains on the four qubits $j-1,j,j+1$ , and $j+2$ . As a result, there are still only two possible components, $\mathrm{ad}_{H_{j-1,j}}$ and $\mathrm{ad}_{H_{j+1,j+2}}$ . Similarly, the exponential operator $e^{-i\tau\mathrm{ad}_{A}}$ does not enlarge the support, since one can expand it in powers of $\mathrm{ad}_{A}$ . The support is enlarged when $\mathrm{ad}_{B}$ acts. In this layer, the support of the operator is expanded to six qubits, and there are three elementary components. The number of possible elementary commutators is then at most

3^{m_{1}}2^{n_{1}}.

(127)

Combining the number of elementary nested commutators and the norm bound for each commutator and applying triangle inequality, we will obtain Eq. 125. Finally, in the operator $B$ , there are $\frac{n}{2}$ possible summands. This finishes the proof of Eq. 124.

Now, we apply Eq. 124 to bound the norm of $J_{1}^{(s)}(x)$ . From Eq. 120 we have

$\displaystyle\|J_{1}^{(s)}(x)\|$	$\displaystyle\leq\sum_{\begin{subarray}{c}m_{1},n_{1};\\ m_{1}+n_{1}=s\end{subarray}}\binom{s}{m_{1}}\|e^{-ix\mathrm{ad}_{B}}\mathrm{ad}_{B}^{m_{1}}e^{-ix\mathrm{ad}_{A}}\mathrm{ad}_{A}^{n_{1}}B\|$	(128)
	$\displaystyle\leq\frac{n}{2}2^{s}\Lambda^{s+1}\sum_{\begin{subarray}{c}m_{1},n_{1};\\ m_{1}+n_{1}=s\end{subarray}}\binom{s}{m_{1}}3^{m_{1}}2^{n_{1}}$
	$\displaystyle=\frac{n}{2}10^{s}\Lambda^{s+1}.$

In the third line, we apply the binomial theorem.

Since the $1$ -norm can be estimated by the same logic, counting the number of elementary nested commutators and the $1$ -norm of each nested commutator, the derivation for the $1$ -norm is the same with $\Lambda$ replaced by $\Lambda_{1}$ . ∎
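The binomial-theorem step at the end of Eq. 128 is easy to confirm by direct evaluation. The short sketch below is only an arithmetic check; the function name `prop7_sum` is illustrative.

```python
from math import comb


def prop7_sum(s):
    """2^s * sum_{m+n=s} C(s, m) 3^m 2^n, the combinatorial factor in Eq. 128."""
    return 2**s * sum(comb(s, m) * 3**m * 2 ** (s - m) for m in range(s + 1))


for s in range(6):
    # The binomial theorem gives (3 + 2)^s = 5^s, so the factor equals 10^s.
    assert prop7_sum(s) == 10**s
```

The factor $2^{s}$ comes from the norm bound $(2\Lambda)^{s}\Lambda$ of each elementary commutator, and the factor $5^{s}$ from counting $3^{m_{1}}2^{n_{1}}$ elementary commutators weighted by $\binom{s}{m_{1}}$.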

As introduced in subsection V.1, when $K=1$ , we set $F_{1,s}^{(nc)}(x)$ with $s=2,3$ to be the leading-order terms and set the truncation order $s_{c}=3$ . That is, we only compensate the second- and third-order errors using LCU methods. While we are not able to achieve the logarithmic accuracy dependence of the PTSC algorithms, we can achieve an accuracy dependence of $\mathcal{O}(\varepsilon^{-1/3})$ , which is a cubic improvement over the bare first-order Trotter result $\mathcal{O}(\varepsilon^{-1})$ .

Based on Eq. 110, the truncated nested-commutator LCU formula for $V_{1}(x)$ can be written as

	$\displaystyle\tilde{V}_{1}(x)$	$\displaystyle=I+\sum_{s=2}^{3}F_{1,s}^{(nc)}(x),$		(129)
		$\displaystyle=\sqrt{1+(\eta_{\Sigma}^{(nc)})^{2}}\sum_{s=2}^{3}\frac{\eta_{s}^{(nc)}}{\eta_{\Sigma}^{(nc)}}R_{1,s}^{(nc)}(\eta_{\Sigma}).$		(129)

Here, $\eta_{\Sigma}^{(nc)}:=\sum_{s=2}^{3}\|C_{s-1}\|_{1}\frac{x^{s}}{s!}$ . The explicit form of $R_{1,s}^{(nc)}(\eta_{\Sigma})$ can be obtained by the definitions in Eq. 109, Eq. 111 and the Pauli operator decomposition based on the nested-commutator form in Eq. 123.

Combined with the deterministic first-order Trotter formula, the overall LCU formula for $U(x)$ is

\displaystyle\tilde{U}_{1}^{(nc)}(x)=\tilde{V}_{1}^{(nc)}(x)S_{1}(x).

(130)

Following Proposition 6 and Proposition 7, we can bound the $1$ -norm and error of the LCU formula in Eq. 129 and Eq. 130 as follows.

Proposition 8 (first-order Trotter-LCU formula by nested-commutator compensation for lattice Hamiltonians).

Consider a lattice Hamiltonian with the form in Eq. 10. For $\min\{\frac{1}{4\Lambda},\frac{3}{20\Lambda_{1}}\}>x>0$ , Eq. 129 is a $(\mu_{1}^{(nc)}(x),\varepsilon_{1}^{(nc)}(x))$ -LCU formula of $V_{1}(x)$ with

	$\displaystyle\mu_{1}^{(nc)}(x)$	$\displaystyle\leq\exp\left(10n^{2}(\Lambda_{1}x)^{4}\right),$		(131)
	$\displaystyle\varepsilon_{1}^{(nc)}(x)$	$\displaystyle\leq 5n^{2}(\Lambda x)^{4}.$		(131)

As a result, $\tilde{U}_{1}^{(nc)}(x)$ in Eq. 130 is a $(\mu_{1}^{(nc)}(x),\varepsilon_{1}^{(nc)}(x))$ -LCU formula of $U(x)$ . Here, $\Lambda_{1}$ and $\Lambda$ are defined in Eq. 11.

Proof.

We start from bounding the $1$ -norm $\mu_{1}^{(nc)}(x)$ . From Proposition 6 we have

$\displaystyle\mu_{1}^{(nc)}(x)$	$\displaystyle\leq\sqrt{1+\left(\sum_{s=1}^{2}\|C_{s}\|_{1}\frac{x^{s+1}}{(s+1)!}\right)^{2}}$	(132)
	$\displaystyle\leq 1+\frac{1}{2}\left(\sum_{s=1}^{2}\|C_{s}\|_{1}\frac{x^{s+1}}{(s+1)!}\right)^{2}$
	$\displaystyle=1+n^{2}\left(\frac{25}{8}(\Lambda_{1}x)^{4}+\frac{250}{12}(\Lambda_{1}x)^{5}+\frac{1250}{3!3!}(\Lambda_{1}x)^{6}\right)$
	$\displaystyle\leq 1+3n^{2}\frac{25}{8}(\Lambda_{1}x)^{4}\leq\exp\left(10n^{2}(\Lambda_{1}x)^{4}\right).$

In the third line, we use the fact that $C_{s}=J_{1}^{(s)}(0)$ and Proposition 7. In the fourth line, we use the assumption that $\Lambda_{1}x\leq\frac{3}{20}$ .

Now, we bound the spectral norm distance $\varepsilon_{1}^{(nc)}(x)=\|F_{1,\mathcal{res}}^{(nc)}\|$ . From Proposition 6 we know that we only need to bound $\|J_{1,L}(\tau)\|$ , $\|J_{1,res,2}(\tau)\|$ , and $\|M_{1}(\tau)\|$ based on Eq. 114. For $\|J_{1,L}(\tau)\|$ , from Proposition 7 we have

\displaystyle\|J_{1,L}(\tau)\|

\displaystyle\leq\sum_{s=1}^{2}\|C_{s}\|\frac{\tau^{s}}{s!}\leq\frac{n}{2}\sum_{s=1}^{2}10^{s}\Lambda^{s+1}\frac{\tau^{s}}{s!}.

(133)

For $\|J_{1,res,2}(\tau)\|$ and $\|M_{1}(\tau)\|$ , from Eq. 114 and Proposition 7 we have

$\displaystyle\|J_{1,res,2}(\tau)\|$	$\displaystyle\leq\int_{0}^{\tau}d\tau_{1}\frac{(\tau-\tau_{1})^{2}}{2!}\|J_{1}^{(3)}(\tau_{1})\|$	(134)
	$\displaystyle\leq\frac{n}{2}10^{3}\Lambda^{4}\frac{\tau^{3}}{3!},$
$\displaystyle\|M_{1}(\tau)\|$	$\displaystyle\leq\int_{0}^{\tau}d\tau_{1}\int_{0}^{\tau_{1}}d\tau_{2}\|J_{1}^{(1)}(\tau_{2})\|$
	$\displaystyle\leq\frac{n}{2}10\Lambda^{2}\frac{\tau^{2}}{2!}.$

Based on Eq. 133, Eq. 134 and Proposition 6,

$\displaystyle\|F_{1,\mathcal{res}}^{(nc)}\|$	$\displaystyle\leq\int_{0}^{x}d\tau\left(\|M_{1}(\tau)\|\|J_{1,L}(\tau)\|+\|J_{1,res,2}(\tau)\|\right)$	(135)
	$\displaystyle\leq\int_{0}^{x}d\tau\left(\frac{n^{2}}{4}\sum_{s=1}^{2}10^{s+1}\Lambda^{s+3}\frac{\tau^{s+2}}{s!2!}+\frac{n}{2}10^{3}\Lambda^{4}\frac{\tau^{3}}{3!}\right)$
	$\displaystyle=\frac{n^{2}}{4}\sum_{s=1}^{2}10^{s+1}\frac{(\Lambda x)^{s+3}}{s!2!(s+3)}+\frac{n}{2}10^{3}\frac{(\Lambda x)^{4}}{4!}$
	$\displaystyle\leq 2\cdot\frac{n^{2}}{4}10^{2}\frac{(\Lambda x)^{4}}{8}+\frac{n}{2}10^{3}\frac{(\Lambda x)^{4}}{4!}$
	$\displaystyle\leq 5n^{2}(\Lambda x)^{4}.$

In the fourth line, we use the assumption that $x\leq\frac{1}{4\Lambda}$ . ∎
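To get a feeling for the size of the bounds in Proposition 8, one can evaluate them directly. The sketch below is a plain transcription of Eq. 131 and of the bracket in the third line of Eq. 132; the function names and the sample parameter values are assumptions chosen for illustration.

```python
from math import exp


def prop8_bounds(n, lam, lam1, x):
    """Upper bounds (mu, eps) of Eq. 131 for one segment of length x.

    n: number of qubits; lam, lam1: spectral-norm and 1-norm bounds of the
    two-qubit components (Lambda and Lambda_1 in the text).
    """
    if not 0 < x < min(1 / (4 * lam), 3 / (20 * lam1)):
        raise ValueError("x is outside the range assumed in Proposition 8")
    mu_bound = exp(10 * n**2 * (lam1 * x) ** 4)
    eps_bound = 5 * n**2 * (lam * x) ** 4
    return mu_bound, eps_bound


def eq132_bracket(y):
    """Bracket in the third line of Eq. 132 divided by y^4, with y = Lambda_1 x."""
    return 25 / 8 + 250 / 12 * y + 1250 / 36 * y**2


# Illustrative values (assumptions): 10 qubits, unit norms, short segment.
mu_bound, eps_bound = prop8_bounds(n=10, lam=1.0, lam1=1.0, x=0.01)
print(mu_bound, eps_bound)

# The fourth line of Eq. 132 replaces the bracket by 3 * 25/8 when y <= 3/20.
print(eq132_bracket(3 / 20) <= 3 * 25 / 8)
```

Because both bounds scale as $x^{4}$ per segment, halving the segment length reduces the per-segment error bound by a factor of sixteen.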

V.3 General construction and performance

We can easily extend the first-order analysis above to the higher-order case. In Appendix B, we provide an explicit construction for second-order nested-commutator expansion, which will be used for the later numerical results. For the general LCU formula for the $K$ th-order Trotter remainder, we have the following proposition to characterize the LCU formulas $\tilde{V}_{K}(x)$ in Eq. 110.

Proposition 9 (Trotter-LCU formula by nested-commutator compensation for lattice Hamiltonians).

Consider a lattice Hamiltonian with the form in Eq. 10. We set $\beta:=2(4\kappa+5)$ where $\kappa$ is the stage number of the $K$ th-order Trotter formula ( $K=1$ or $2k$ ). For $\min\{\frac{K+1}{\beta\Lambda},\frac{K+2}{\beta\Lambda_{1}}\}>x>0$ , $\tilde{V}_{K}^{(nc)}(x)$ in Eq. 110 is a $(\mu_{K}^{(nc)}(x),\varepsilon_{K}^{(nc)}(x))$ -LCU formula of $V_{K}(x)$ with

	$\displaystyle\mu_{K}^{(nc)}$	$\displaystyle\leq\exp\left(\frac{n^{2}\kappa^{2}\beta^{2(K+1)}}{2(K!)^{2}}(\Lambda_{1}x)^{2K+2}\right),$		(136)
	$\displaystyle\varepsilon_{K}^{(nc)}$	$\displaystyle\leq(n\kappa)^{2}\frac{\beta^{2K+1}}{K!(K+1)!}(\Lambda x)^{2K+2}.$		(136)

As a result, $\tilde{U}_{K}^{(nc)}(x)$ in Eq. 110 is a $(\mu_{K}^{(nc)}(x),\varepsilon_{K}^{(nc)}(x))$ -LCU formula of $U(x)$ . Here, $\Lambda$ and $\Lambda_{1}$ are, respectively, the largest spectral norm and $1$ -norm of the lattice Hamiltonian components defined in Eq. 11.

The proof of Proposition 9 is in Appendix C.

The circuit of random-sampling implementation of the NCC algorithm is similar to the PTSC algorithm, which is illustrated in Fig. 7. The only difference is that we sample the Pauli operators based on the nested-commutator expansion formula. In Appendix D, we generalize our random-sampling algorithm to the $K$ th-order situation to demonstrate its scalability. Specifically, the space and time cost of the sampling algorithm are $\mathcal{O}(K\kappa_{K})$ and $\mathcal{O}(K(\log{\kappa_{K}}+\log{n}))$ , respectively. In practice, we need to expand the leading-order terms $\{F_{K,s}\}$ to a summation of different adjoint operators based on the methods in subsection V.1 first, and then calculate the corresponding sampling probability $\Pr(\gamma|s)$ . We have the following theorem to characterize the gate complexity of the $K$ th-order NCC algorithm with random-sampling implementation.

Theorem 2 (Gate complexity of the $K$ th-order random-sampling Trotter-LCU algorithm by nested-commutator compensation for lattice Hamiltonians).

In a $K$ th-order Trotter-LCU algorithm ( $K=1$ or $2k$ ) based on nested-commutator compensation, if the segment number $\nu$ satisfies all the requirements below,

$\displaystyle\nu$	$\displaystyle\geq\max\left\{\frac{\beta\Lambda}{K+1}t,\frac{\beta\Lambda_{1}}{K+2}t\right\},$	(137)
$\displaystyle\nu$	$\displaystyle\geq\left(\frac{\kappa^{2}}{2\ln\mu(K!)^{2}}\right)^{\frac{1}{2K+1}}\beta^{1+\frac{1}{2K+1}}n^{\frac{2}{2K+1}}(\Lambda_{1}t)^{1+\frac{1}{2K+1}},$
$\displaystyle\nu$	$\displaystyle\geq\left(\frac{\mu\kappa^{2}}{K!(K+1)!}\right)^{\frac{1}{2K+1}}\beta\varepsilon^{-\frac{1}{2K+1}}n^{\frac{2}{2K+1}}(\Lambda t)^{1+\frac{1}{2K+1}},$

we can then realize a $(\mu,\varepsilon)$ -LCU formula for $U(t)$ based on $\nu$ segments of $\tilde{U}^{(nc)}_{K}(x)$ in Eq. 110. As a result, the gate complexity of random-sampling $K$ th-order Trotter-LCU algorithm based on nested-commutator compensation for the lattice Hamiltonian is

\mathcal{O}\left(n^{1+\frac{2}{2K+1}}t^{1+\frac{1}{2K+1}}\varepsilon^{-\frac{1}{2K+1}}\right).

(138)

Here, $\beta:=2(4\kappa+5)$ where $\kappa$ is the stage number of the Trotter formula. $\Lambda$ and $\Lambda_{1}$ are, respectively, the largest spectral norm and $1$ -norm of the lattice Hamiltonian components defined in Eq. 11.

Proof.

For the random-sampling implementation, the overall LCU formula for $U(t)$ is obtained by repeating the sampling of $\tilde{U}_{K}^{(nc)}(x)$ $\nu$ times, $\tilde{U}_{K,\mathcal{tot}}^{(nc)}(t)=\tilde{U}_{K}^{(nc)}(x)^{\nu}$ . Using Propositions 2 and 9, when $0<x<\min\{\frac{K+1}{\beta\Lambda},\frac{K+2}{\beta\Lambda_{1}}\}$ , we conclude that $\tilde{U}_{K,\mathcal{tot}}^{(nc)}(t)$ is a $(\mu_{K,\mathcal{tot}}^{(nc)}(t),\varepsilon_{K,\mathcal{tot}}^{(nc)}(t))$ -LCU formula of $U(t)$ with

		$\displaystyle\mu_{K,\mathcal{tot}}^{(nc)}(t)=\mu_{K}^{(nc)}(x)^{\nu}\leq\exp\left(\frac{n^{2}\kappa^{2}\beta^{2(K+1)}}{2(K!)^{2}}\frac{(\Lambda_{1}t)^{2K+2}}{\nu^{2K+1}}\right),$		(139)
		$\displaystyle\varepsilon_{K,\mathcal{tot}}^{(nc)}(t)\leq\nu\mu_{K,\mathcal{tot}}^{(nc)}(t)\varepsilon_{K}^{(nc)}(x)$
		$\displaystyle\quad\quad\leq\nu\mu_{K,\mathcal{tot}}^{(nc)}(t)\left((n\kappa)^{2}\frac{\beta^{2K+1}}{K!(K+1)!}\frac{(\Lambda t)^{2K+2}}{\nu^{2K+2}}\right).$

To realize a $(\mu,\varepsilon)$ -LCU formula for $U(t)$ , we only need the segment number $\nu$ to satisfy all the requirements in Eq. 137. It suffices to choose

\nu=\mathcal{O}\left(n^{\frac{2}{2K+1}}t^{1+\frac{1}{2K+1}}\varepsilon^{-\frac{1}{2K+1}}\right).

(140)

Based on Eq. 86, the gate complexity of the $K$ th-order Trotter-LCU algorithm is then

\mathcal{O}(\nu(\kappa_{K}L+s_{c}))=\mathcal{O}\left(n^{1+\frac{2}{2K+1}}t^{1+\frac{1}{2K+1}}\varepsilon^{-\frac{1}{2K+1}}\right).

(141)

Here we use the fact that $s_{c}=2K+1=\mathcal{O}(1)$ and $L=\mathcal{O}(n)$ for lattice Hamiltonians. ∎
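The three requirements in Eq. 137 translate directly into a rule for choosing the segment number. The following sketch evaluates them and returns the smallest integer $\nu$ satisfying all three; the function name `segment_number`, the default target $\mu=2$, and the sample parameters are assumptions for illustration, and the stage number $\kappa$ must be supplied by the user for the chosen Trotter formula.

```python
from math import ceil, factorial, log


def segment_number(n, t, eps, K, kappa, lam, lam1, mu=2.0):
    """Smallest integer nu satisfying the three conditions of Eq. 137.

    n: qubits, t: total time, eps: target accuracy, K: Trotter order,
    kappa: stage number of the Trotter formula, lam / lam1: Lambda / Lambda_1,
    mu: target 1-norm of the overall LCU formula (must exceed 1).
    """
    beta = 2 * (4 * kappa + 5)
    p = 2 * K + 1
    kf = factorial(K)
    # Condition 1: each segment x = t / nu lies in the range of Proposition 9.
    c1 = max(beta * lam / (K + 1), beta * lam1 / (K + 2)) * t
    # Condition 2: the total 1-norm stays below mu.
    c2 = ((kappa**2 / (2 * log(mu) * kf**2)) ** (1 / p)
          * beta ** (1 + 1 / p) * n ** (2 / p) * (lam1 * t) ** (1 + 1 / p))
    # Condition 3: the total error stays below eps.
    c3 = ((mu * kappa**2 / (kf * factorial(K + 1))) ** (1 / p)
          * beta * eps ** (-1 / p) * n ** (2 / p) * (lam * t) ** (1 + 1 / p))
    return ceil(max(c1, c2, c3))


# Illustrative parameters (assumptions): second-order formula with kappa = 2.
nu = segment_number(n=20, t=10.0, eps=1e-3, K=2, kappa=2, lam=1.0, lam1=1.0)
print(nu)
```

Multiplying $\nu$ by the per-segment gate count $\mathcal{O}(\kappa_{K}L+s_{c})$ with $L=\mathcal{O}(n)$ then reproduces the scaling in Eq. 138.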

So far, we have restricted our construction of the nested-commutator expansion to lattice Hamiltonians. In practice, various physical Hamiltonians, including those for the electronic structure of quantum materials Babbush et al. (2018a), quantum chemistry Hamiltonians Lee et al. (2021); Berry et al. (2019); von Burg et al. (2021) and power-law interaction Hamiltonians Childs et al. (2021), also possess sparse properties. As a result, the methods for the nested-commutator expansion of the Trotter remainder $V_{K}(x)$ introduced in subsection V.1 can also be applied to a general Hamiltonian $H$ .

In Appendix E, we discuss how to perform the nested-commutator expansion of $V_{K}(x)$ for general Hamiltonians, and discuss the performance of the resulting LCU formula in Proposition 13. We find that the $1$ -norm of the LCU formula based on nested-commutator expansion is closely related to the following nested commutator norm of the Hamiltonian $H=\sum_{l=1}^{L}H_{l}$ ,

\tilde{\alpha}_{\mathrm{com}}(H):=\sum_{l_{s+1}=1}^{L}\cdots\sum_{l_{1}=1}^{L}\|[H_{l_{s+1}},\cdots[H_{l_{2}},H_{l_{1}}]\cdots]\|.

(142)

$\tilde{\alpha}_{\mathrm{com}}(H)$ was originally defined in Ref. Childs et al. (2021) to analyze the performance of Trotter methods. In Ref. Childs et al. (2021), the authors estimate the values of $\alpha^{(s)}_{H}$ for typical Hamiltonian models such as plane-wave-basis quantum chemistry models, $k$ -local Hamiltonians, and Hamiltonians with power-law interactions. Following similar estimation methods, we can also calculate $\tilde{\alpha}_{\mathrm{com}}(H)$ for different models and consider their explicit nested-commutator expansions. We leave the explicit evaluation for other typical Hamiltonians to future work.

VI CONCLUSION AND OUTLOOK

We study Hamiltonian simulation algorithms based on the composition of Trotter and LCU algorithms. In both theoretical and numerical studies, we show that the $0$ th-order paired Taylor-series compensation (PTSC) algorithm, the $2k$ th-order PTSC algorithm, and the $2k$ th-order nested-commutator compensation (NCC) algorithm enjoy different advantages and will be useful in different scenarios. Taking the $n$ -qubit lattice Hamiltonian as an example: the $0$ th-order PTSC algorithm performs the best when $t$ is small compared with $n$ and $1/\varepsilon$ ; the $2k$ th-order PTSC algorithm performs the best when $n$ is small compared with $t$ and $1/\varepsilon$ ; while the $2k$ th-order NCC algorithm performs the best when $1/\varepsilon$ is small compared with $n$ and $t$ . In practice, with finite system size $n$ , simulation time $t$ , and inverse accuracy $1/\varepsilon$ , we can consider a hybrid implementation of different algorithms. For example, when the sparsity $L$ of a given Hamiltonian is large, we can first split the Hamiltonian into two parts,

H=H_{1}+H_{2}=\sum_{j=1}^{L_{1}}H_{j}+\sum_{k=L_{1}+1}^{L}H_{k},

(143)

where the summands in $H_{1}$ are the few dominant terms with large coefficients. We can then perform second-order Trotter only for $H_{1}$ , and use PTS to expand the remainder

V_{2}(x;H_{1})=\prod_{j_{1}=L_{1}}^{1}e^{iH_{j_{1}}x}e^{-iHx}\prod_{j_{2}=1}^{L_{1}}e^{iH_{j_{2}}x}.

(144)

If the number of dominant terms $L_{1}$ is small, we can then reduce the $L$ dependence of the algorithm, similar to the $0$ th-order PTSC algorithm, while keeping the good $t$ dependence of the second-order PTSC algorithm. As another example, we can hybridize the second-order PTSC and NCC algorithms: we apply the nested-commutator compensation to the leading-order terms (i.e., the terms with $s=3,4$ , and $5$ ) and the normal Taylor-series compensation to the higher-order terms. In this case, we can find an optimal truncation location $s_{c}$ that fulfills the high simulation accuracy requirement and keeps the nested-commutator scaling for the leading-order compensation terms.
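The splitting in Eq. 143 can be sketched as a simple ranking of the Hamiltonian terms by coefficient magnitude. The snippet below is only an illustration: the representation of a term as a `(label, coefficient)` pair, the function name `split_dominant_terms`, and the toy coefficients are all assumptions and are not prescribed by the text.

```python
def split_dominant_terms(terms, num_dominant):
    """Split Pauli terms into (H_1, H_2) as in Eq. 143.

    terms: list of (label, coefficient) pairs (a hypothetical representation).
    num_dominant: L_1, the number of dominant terms kept in H_1.
    """
    ranked = sorted(terms, key=lambda term: abs(term[1]), reverse=True)
    return ranked[:num_dominant], ranked[num_dominant:]


def one_norm(terms):
    """Sum of absolute coefficients of a list of terms."""
    return sum(abs(coeff) for _, coeff in terms)


# Toy two-qubit Hamiltonian (assumed values, for illustration only).
toy_terms = [("ZZ", 1.0), ("IX", 0.05), ("XI", -0.8), ("YY", 0.01)]
h1, h2 = split_dominant_terms(toy_terms, num_dominant=2)
print([label for label, _ in h1], one_norm(h1))
print([label for label, _ in h2], one_norm(h2))
```

In this picture the Trotter formula would act only on the few terms in `h1`, while the many small terms in `h2` enter through the LCU compensation.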

The design of Trotter-LCU algorithms is based on a series connection of Trotter and LCU algorithms. A similar composition method can also be exploited for other Hamiltonian simulation algorithms Hagan and Wiebe (2023). For example, we may replace the deterministic Trotter formula with one based on random permutation Childs et al. (2019). Recently, Cho et al. (2024) considered a similar idea of compensating the Trotter error using randomized unitary operators. Using anticommutative cancellation Zhao and Yuan (2021), we can further reduce the compensation terms.

Acknowledgements.

We thank Xiaoming Zhang, Xiao Yuan, Min-Hsiu Hsieh, Yuan Su, Kaiwen Gui, Ming Yuan, Senrui Chen, and Ying Li for helpful discussion and suggestions. We would like to especially thank Wenjun Yu for highlighting the validity of the variant where the ancillary qubit is measured and reset for each segment. P. Z. and L. J. acknowledge support from the ARO MURI (W911NF-21-1-0325), AFOSR MURI (FA9550-19-1-0399, FA9550-21-1-0209), AFRL (FA8649-21-P-0781), NSF (OMA-1936118, ERC-1941583, OMA-2137642), NTT Research, and the Packard Foundation (2020-71479). J.S. would like to thank support from the Innovate UK (Project No.10075020) and support through Schmidt Sciences, LLC. Q. Z. acknowledges HKU Seed Fund for Basic Research for New Staff via Project No. 2201100596, Guangdong Natural Science Fund via Project No. 2023A1515012185, National Natural Science Foundation of China (NSFC) via Projects No. 12305030 and No. 12347104, Hong Kong Research Grant Council (RGC) via No. 27300823, No. N_HKU718/23, and No. R6010-23, Guangdong Provincial Quantum Science Strategic Initiative GDZX2200001.

Appendix A PAIRED TAYLOR-SERIES COMPENSATION WITH HIGHER-ORDER TROTTER FORMULAS

Following the same idea in subsection IV.2, we now generalize it to the case with $2k$ th-order Trotter formula. Expanding the $2k$ th-order Trotter remainder, we have

V_{2k}(x)=\sum_{s=0}^{\infty}F_{2k,s}(x)=\sum_{s=0}^{\infty}\eta_{s}V_{2k,s},

(145)

where

$\displaystyle F_{2k,s}(x)$	$\displaystyle=\eta_{s}\sum_{r,\gamma}\Pr(r,\gamma|s)P_{2k}(r,\gamma),$	(146)
$\displaystyle\Pr(r,\gamma|s)$	$\displaystyle=\Pr(r,\vec{r}_{1:\kappa},\vec{r}^{\prime}_{1:\kappa}|s)\sum_{l_{1:r}}p_{l_{1:r}}^{(r)},$
$\displaystyle P_{2k}(r,\gamma)$	$\displaystyle=(-i)^{2r-s}\prod^{\leftarrow}\vec{P}^{\vec{r}^{\prime}_{\kappa}}\cdots\prod^{\leftarrow}\vec{P}^{\vec{r}^{\prime}_{1}}\left(P^{(r)}_{l_{1:r}}\right)\cdot$
	$\displaystyle\quad\prod^{\rightarrow}\vec{P}^{\vec{r}_{1}}\cdots\prod^{\rightarrow}\vec{P}^{\vec{r}_{\kappa}}.$

Here, we use $\gamma$ to denote all the expansion variables $\{\vec{r}_{1:\kappa},\vec{r}^{\prime}_{1:\kappa},l_{1:r}\}$ besides $r$ .

We omit the derivation and provide the general form of the LCU formula for $\tilde{V}_{2k}^{(p)}(x)$ ,

	$\displaystyle\tilde{V}_{2k}^{(p)}(x)$	$\displaystyle=\mu_{2k}^{(p)}(x)\Big{(}\sum_{s=2k+1}^{4k+1}\mathrm{Pr}_{2k}^{(p)}(s)R_{2k,s}^{(p)}(\eta_{\Sigma})$		(147)
		$\displaystyle\quad+\sum_{s=4k+2}^{s_{c}}\mathrm{Pr}_{2k}^{(p)}(s)V_{2k,s}^{(p)}\Big{)}.$		(147)

where

	$\displaystyle\mu_{2k}^{(p)}(x)$	$\displaystyle=\sqrt{1+(\eta_{\Sigma})^{2}}+\sum_{s=4k+2}^{s_{c}}\eta_{s},\quad\eta_{\Sigma}=\sum_{s=2k+1}^{4k+1}\eta_{s},$		(148)
	$\displaystyle\mathrm{Pr}^{(p)}(s)$	$\displaystyle=\frac{1}{\mu_{2k}^{(p)}(x)}\begin{cases}&\sqrt{1+(\eta_{\Sigma})^{2}}\frac{\eta_{s}}{\eta_{\Sigma}},\\ &\quad s=2k+1,2k+2,...,4k+1,\\ &\eta_{s},\\ &\quad s=4k+2,4k+3,...,s_{c}.\\ \end{cases}$

The Pauli rotation unitary is defined as $R_{2k,2q+1}^{(p)}(y):=\exp(i\theta(y)P_{2k}^{\prime}(r,\gamma))$ , where $\theta(y):=\tan^{-1}(1+y^{2})$ and $P_{2k}^{\prime}(r,\gamma):=(-i)^{\mathbbm{1}[P_{2k,s}(r,\gamma):\text{anti-Her}]}P_{2k}(r,\gamma)$ .

Combined with the deterministic Trotter formula, the overall LCU formula for $U(x)$ is

\displaystyle\tilde{U}_{2k}^{(p)}(x)=\tilde{V}_{2k}^{(p)}(x)S_{2k}(x).

(149)

The following proposition characterizes the performance of $\tilde{U}_{2k}^{(p)}(x)$ in approximating $U(x)$ .

Proposition 10 ( $2k$ th-order Trotter-LCU formula by paired Taylor-series compensation).

For $0<x<1/(2\lambda)$ and $s_{c}\geq 4k+1$ , $\tilde{V}_{2k}^{(p)}(x)$ in Eq. 147 is a $(\mu_{2k}^{(p)}(x),\varepsilon_{2k}^{(p)}(x))$ -LCU formula of $V_{2k}(x)$ with

	$\displaystyle\mu_{2k}^{(p)}(x)$	$\displaystyle\leq e^{(e+c_{k})(2\lambda x)^{4k+2}},$		(150)
	$\displaystyle\varepsilon_{2k}^{(p)}(x)$	$\displaystyle\leq\left(\frac{2e\lambda x}{s_{c}+1}\right)^{s_{c}+1}.$		(150)

Here,

c_{k}:=\frac{1}{2}\left(\frac{e}{2k+1}\right)^{4k+2},

(151)

so that $0.3>c_{k}>0$ . As a result, $\tilde{U}_{2k}^{(p)}(x)$ in Eq. 149 is a $(\mu_{2k}^{(p)}(x),\varepsilon_{2k}^{(p)}(x))$ -LCU formula of $U(x)$ .

Proof.

We focus on the case with $k=1$ . The proof for a general $k$ is similar. We first bound the normalization factor $\mu_{2}^{(p)}(x)$ . When $2\lambda x<1$ we have

	$\displaystyle\mu_{2}^{(p)}(x)\leq\sqrt{1+(\eta_{\Sigma})^{2}}+\sum_{s=6}^{\infty}\eta_{s}$
	$\displaystyle\leq 1+\frac{1}{2}\eta_{\Sigma}^{2}+\left(e^{2\lambda x}-\sum_{s=0}^{5}\eta_{s}\right)$
	$\displaystyle=\frac{1}{2}(2\lambda x)^{6}\left(\frac{1}{3!}+\frac{2\lambda x}{4!}+\frac{(2\lambda x)^{2}}{5!}\right)^{2}+\left(e^{2\lambda x}-\sum_{s=1}^{5}\eta_{s}\right)$		(152)
	$\displaystyle\leq\frac{1}{2}(2\lambda x)^{6}\left(\sum_{s=3}^{\infty}\frac{1}{s!}\right)^{2}+\left(e^{2\lambda x}-\sum_{s=1}^{5}\eta_{s}\right)$
	$\displaystyle\leq\frac{1}{2}\left(\frac{e}{3}\right)^{6}(2\lambda x)^{6}+e^{e(2\lambda x)^{6}}\leq e^{(e+c_{1})(2\lambda x)^{6}}.$

In the fifth line, we use Corollary 1.

The distance bound can be derived in the same manner as the one in Proposition 4. ∎
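The constants in Proposition 10 are simple to evaluate. The sketch below transcribes Eq. 150 and Eq. 151; the function names are illustrative, and $\lambda$ is taken to be the quantity used in Proposition 10 (defined earlier in the paper), passed in here as a plain number.

```python
from math import e, exp


def c_k(k):
    """The constant c_k of Eq. 151."""
    return 0.5 * (e / (2 * k + 1)) ** (4 * k + 2)


def prop10_bounds(k, s_c, lam, x):
    """Upper bounds (mu, eps) of Eq. 150 for one segment of length x."""
    if not 0 < x < 1 / (2 * lam):
        raise ValueError("Proposition 10 assumes 0 < x < 1 / (2 * lambda)")
    if s_c < 4 * k + 1:
        raise ValueError("Proposition 10 assumes s_c >= 4k + 1")
    mu_bound = exp((e + c_k(k)) * (2 * lam * x) ** (4 * k + 2))
    eps_bound = (2 * e * lam * x / (s_c + 1)) ** (s_c + 1)
    return mu_bound, eps_bound


for k in range(1, 5):
    print(k, c_k(k))  # c_k decreases quickly with k and stays below 0.3

# Illustrative values (assumptions): k = 1, truncation s_c = 10, lambda = 1.
print(prop10_bounds(k=1, s_c=10, lam=1.0, x=0.1))
```

The error bound decays faster than exponentially in the truncation order $s_{c}$, which is the origin of the logarithmic accuracy dependence of the PTSC algorithms.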

Appendix B TIGHT NESTED-COMMUTATOR ANALYSIS FOR SECOND-ORDER TROTTER-LCU ALGORITHM

We can easily extend the methods for the first-order analysis in subsection V.2 to the higher-order case. Taking the second-order case as an example, the second-order Trotter formula of the lattice Hamiltonian is $S_{2}(x)=e^{-i\frac{x}{2}A}e^{-ixB}e^{-i\frac{x}{2}A}$ .

To derive the explicit LCU formula $\tilde{V}_{2}^{(nc)}(x)$ in Eq. 110, we first derive $J_{2}(x)$ defined in Eq. 95 and its derivatives at the origin, $C_{s}:=J_{2}^{(s)}(0)$ . From Eq. 92 and Eq. 95 we have

$\displaystyle S_{2}(x)^{\dagger}$	$\displaystyle=e^{ixH}+\int_{0}^{x}d\tau e^{i(x-\tau)H}R_{2}(\tau),$	(153)
$\displaystyle R_{2}(x)$	$\displaystyle=i[e^{i\frac{x}{2}A},B]e^{ixB}e^{i\frac{x}{2}A}+\frac{i}{2}[e^{i\frac{x}{2}A}e^{ixB},A]e^{i\frac{x}{2}A},$
$\displaystyle J_{2}(x)$	$\displaystyle=S_{2}(x)R_{2}(x)=\frac{i}{2}\left(A-e^{-i\frac{x}{2}\mathrm{ad}_{A}}e^{-ix\mathrm{ad}_{B}}A\right)$
	$\displaystyle+i\left(e^{-i\frac{x}{2}\mathrm{ad}_{A}}B-e^{-i\frac{x}{2}\mathrm{ad}_{A}}e^{-ix\mathrm{ad}_{B}}e^{-i\frac{x}{2}\mathrm{ad}_{A}}B\right).$

Following the approach in subsection V.1, we expand $V_{2}(x)$ and $J_{2}(x)$ by

\displaystyle V_{2}(x)=\sum_{s=0}^{\infty}G_{s}\frac{x^{s}}{s!},\quad J_{2}(x)=\sum_{s=0}^{\infty}C_{s}\frac{x^{s}}{s!},

(154)

then based on Proposition 5 and the recurrence formula Eq. 94 we have,

	$\displaystyle G_{1}=G_{2}=0,\quad C_{0}=C_{1}=0,$		(155)
	$\displaystyle G_{s}=C_{s-1},\quad s=3,4,5.$		(155)

Combining Eq. 153 and Eq. 155, we can show that

G_{s}=\mathcal{O}(n^{\lfloor\frac{s}{3}\rfloor}).

(156)

Therefore, the first three nontrivial terms $G_{3},G_{4},G_{5}=\mathcal{O}(n)$ . These terms will be set as the leading-order terms.

Based on Eq. 155, we expand $J_{2}(x)$ using the operator-valued Taylor-series expansion with integral remainder,

J_{2}(x)=J_{2,L}(x)+J_{2,res,4}(x)=\sum_{s=2}^{4}C_{s}\frac{x^{s}}{s!}+J_{2,res,4}(x),

(157)

where

	$\displaystyle C_{s}$	$\displaystyle=J_{2}^{(s)}(0),\quad s=2,3,4,$		(158)
	$\displaystyle J_{2,res,s}(x)$	$\displaystyle=\int_{0}^{x}d\tau\frac{(x-\tau)^{s}}{s!}J_{2}^{(s+1)}(\tau).$		(158)

$J_{2,L}(x):=\sum_{s=2}^{4}C_{s}\frac{x^{s}}{s!}$ denotes the leading-order terms in $J_{2}(x)$ . Then, from Eq. 94 and Eq. 155, the second-order Trotter remainder can be expressed as

$\displaystyle V_{2}(x)$	$\displaystyle=I+\sum_{s=3}^{5}F_{2,s}^{(nc)}(x)+F_{2,\mathcal{res}}^{(nc)}(x),$	(159)
$\displaystyle F_{2,s}^{(nc)}(x)$	$\displaystyle=C_{s-1}\frac{x^{s}}{s!},\quad s=3,4,5,$
$\displaystyle F_{2,\mathcal{res}}^{(nc)}(x)$	$\displaystyle=\int_{0}^{x}d\tau\left(M_{2}(\tau)J_{2,L}(\tau)+V_{2}(\tau)J_{2,res,4}(\tau)\right).$

We give the explicit nested-commutator expressions of the leading-order terms $F_{2,s}^{(nc)}(x)$ ( $s=3,4$ , or $5$ ) in Sec. B.1.

In practice, we truncate the formula with the order $s_{c}=5$ . Based on Eq. 110, the truncated nested-commutator LCU formula for $V_{2}(x)$ can be written as

	$\displaystyle\tilde{V}_{2}(x)$	$\displaystyle=I+\sum_{s=3}^{5}F_{2,s}^{(nc)}(x),$		(160)
		$\displaystyle=\sqrt{1+(\eta_{\Sigma}^{(nc)})^{2}}\sum_{s=3}^{5}\frac{\eta_{s}^{(nc)}}{\eta_{\Sigma}^{(nc)}}R_{2,s}^{(nc)}(\eta_{\Sigma}).$		(160)

Here, $\eta_{\Sigma}^{(nc)}:=\sum_{s=3}^{5}\|C_{s-1}\|_{1}\frac{x^{s}}{s!}$ . The explicit form of $R_{2,s}^{(nc)}(\eta_{\Sigma})$ can be obtained by the definitions in Eq. 109, Eq. 111 and the Pauli operator decomposition based on the nested-commutator form in Eq. 164.

For a tight numerical estimation, we seek tighter bounds on the $1$ -norm $\mu_{2}^{(nc)}(x)$ and the spectral-norm accuracy $\varepsilon_{2}^{(nc)}(x)$ of the LCU formula in Eq. 160.

B.1 Bound the 1-norm of recurrence function

Hereafter, we define $A^{\prime}:=-\frac{i}{2}A$ and $B^{\prime}:=-iB$ to simplify the notation. We have

$\displaystyle J_{2}(x)$	$\displaystyle=J_{a}(x)+J_{b}(x),$	(161)
$\displaystyle J_{a}(x)$	$\displaystyle=\left(e^{x\mathrm{ad}_{A^{\prime}}}e^{x\mathrm{ad}_{B^{\prime}}}A^{\prime}-A^{\prime}\right),$
$\displaystyle J_{b}(x)$	$\displaystyle=\left(e^{x\mathrm{ad}_{A^{\prime}}}e^{x\mathrm{ad}_{B^{\prime}}}e^{x\mathrm{ad}_{A^{\prime}}}B^{\prime}-e^{x\mathrm{ad}_{A^{\prime}}}B^{\prime}\right).$

Applying the Leibniz rule to $J_{a}(x)$ and $J_{b}(x)$ , we have

$\displaystyle J_{a}^{(s)}(x)$	$\displaystyle=\sum_{l=0}^{s}\binom{s}{l}e^{x\mathrm{ad}_{A^{\prime}}}\mathrm{ad}_{A^{\prime}}^{l}e^{x\mathrm{ad}_{B^{\prime}}}\mathrm{ad}_{B^{\prime}}^{s-l}A^{\prime},$	(162)
$\displaystyle J_{b}^{(s)}(x)$	$\displaystyle=\sum_{\begin{subarray}{c}l_{1},l_{2},l_{3}\\ l_{1}+l_{2}+l_{3}=s,l_{2}+l_{3}\neq 0\end{subarray}}\binom{s}{l_{1},l_{2},l_{3}}\cdot$
	$\displaystyle\quad\quad e^{x\mathrm{ad}_{A^{\prime}}}\mathrm{ad}_{A^{\prime}}^{l_{1}}e^{x\mathrm{ad}_{B^{\prime}}}\mathrm{ad}_{B^{\prime}}^{l_{2}}e^{x\mathrm{ad}_{A^{\prime}}}\mathrm{ad}_{A^{\prime}}^{l_{3}}B^{\prime}.$

Then,

	$\displaystyle C_{s}$	$\displaystyle=\sum_{l=0}^{s-1}\binom{s}{l}\mathrm{ad}_{A^{\prime}}^{l}\mathrm{ad}_{B^{\prime}}^{s-l}A^{\prime}$		(163)
		$\displaystyle\quad+\sum_{\begin{subarray}{c}l_{1},l_{2},l_{3}\\ l_{1}+l_{2}+l_{3}=s,l_{2}+l_{3}\neq 0\end{subarray}}\binom{s}{l_{1},l_{2},l_{3}}\mathrm{ad}_{A^{\prime}}^{l_{1}}\mathrm{ad}_{B^{\prime}}^{l_{2}}\mathrm{ad}_{A^{\prime}}^{l_{3}}B^{\prime}.$		(163)

We can solve the explicit form of the leading-order terms with $s=2,3$ , and $4$ ,

$\displaystyle C_{2}$	$\displaystyle=\left(\mathrm{ad}_{B^{\prime}}^{2}A^{\prime}+2\mathrm{ad}_{A^{\prime}}\mathrm{ad}_{B^{\prime}}A^{\prime}\right)+\left(3\mathrm{ad}_{A^{\prime}}^{2}B^{\prime}+2\mathrm{ad}_{B^{\prime}}\mathrm{ad}_{A^{\prime}}B^{\prime}\right),$	(164)
$\displaystyle C_{3}$	$\displaystyle=\left(\mathrm{ad}_{B^{\prime}}^{3}A^{\prime}+3\mathrm{ad}_{A^{\prime}}\mathrm{ad}_{B^{\prime}}^{2}A^{\prime}+3\mathrm{ad}_{A^{\prime}}^{2}\mathrm{ad}_{B^{\prime}}A^{\prime}\right)+(7\mathrm{ad}_{A^{\prime}}^{3}B^{\prime}$
	$\displaystyle\quad+3\mathrm{ad}_{B^{\prime}}\mathrm{ad}_{A^{\prime}}^{2}B^{\prime}+6\mathrm{ad}_{A^{\prime}}\mathrm{ad}_{B^{\prime}}\mathrm{ad}_{A^{\prime}}B^{\prime}+3\mathrm{ad}_{B^{\prime}}^{2}\mathrm{ad}_{A^{\prime}}B^{\prime}),$
$\displaystyle C_{4}$	$\displaystyle=\left(\mathrm{ad}_{B^{\prime}}^{4}A^{\prime}+4\mathrm{ad}_{A^{\prime}}\mathrm{ad}_{B^{\prime}}^{3}A^{\prime}+6\mathrm{ad}_{A^{\prime}}^{2}\mathrm{ad}_{B^{\prime}}^{2}A^{\prime}+4\mathrm{ad}_{A^{\prime}}^{3}\mathrm{ad}_{B^{\prime}}A^{\prime}\right)$
	$\displaystyle\quad+(15\mathrm{ad}_{A^{\prime}}^{4}B^{\prime}+4\mathrm{ad}_{B^{\prime}}\mathrm{ad}_{A^{\prime}}^{3}B^{\prime}+12\mathrm{ad}_{A^{\prime}}^{2}\mathrm{ad}_{B^{\prime}}\mathrm{ad}_{A^{\prime}}B^{\prime}$
	$\displaystyle\quad+12\mathrm{ad}_{A^{\prime}}\mathrm{ad}_{B^{\prime}}\mathrm{ad}_{A^{\prime}}^{2}B^{\prime}+6\mathrm{ad}_{B^{\prime}}^{2}\mathrm{ad}_{A^{\prime}}^{2}B^{\prime}$
	$\displaystyle\quad+12\mathrm{ad}_{A^{\prime}}\mathrm{ad}_{B^{\prime}}^{2}\mathrm{ad}_{A^{\prime}}B^{\prime}+4\mathrm{ad}_{B^{\prime}}^{3}\mathrm{ad}_{A^{\prime}}B^{\prime}).$
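The integer coefficients in Eq. 164 follow from the binomial and multinomial weights in Eq. 163 once the terms that vanish through $\mathrm{ad}_{X}X=0$ are dropped and equal words are merged. The sketch below enumerates them; it uses only this vanishing rule (no lattice structure), and the encoding of a nested commutator as a string of letters acting on a base operator is an assumption made for illustration.

```python
from collections import Counter
from math import factorial


def multinomial(*ks):
    """Multinomial coefficient (k_1 + ... + k_m)! / (k_1! ... k_m!)."""
    out = factorial(sum(ks))
    for k in ks:
        out //= factorial(k)
    return out


def c_s_coefficients(s):
    """Coefficients of C_s (s >= 1) as {(word, base): integer}.

    A word such as "ABA" with base "B" stands for ad_A' ad_B' ad_A' B';
    the rightmost letter acts first on the base operator.
    """
    coeffs = Counter()
    # J_a part: binom(s, l) ad_A'^l ad_B'^(s-l) A'
    for l in range(s + 1):
        word = "A" * l + "B" * (s - l)
        if word[-1] != "A":  # ad_A' A' = 0
            coeffs[(word, "A")] += multinomial(l, s - l)
    # J_b part: multinomial weight times ad_A'^l1 ad_B'^l2 ad_A'^l3 B'
    for l1 in range(s + 1):
        for l2 in range(s + 1 - l1):
            l3 = s - l1 - l2
            if l2 + l3 == 0:
                continue  # excluded in Eq. 163
            word = "A" * l1 + "B" * l2 + "A" * l3
            if word[-1] != "B":  # ad_B' B' = 0
                coeffs[(word, "B")] += multinomial(l1, l2, l3)
    return dict(coeffs)


for s in (2, 3, 4):
    print(s, c_s_coefficients(s))
```

For example, the weight of $\mathrm{ad}_{A^{\prime}}^{4}B^{\prime}$ in $C_{4}$ collects the tuples $(l_{1},l_{2},l_{3})=(0,0,4),(1,0,3),(2,0,2),(3,0,1)$ with multinomial weights $1+4+6+4$.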

We can then expand the operators $A^{\prime}$ and $B^{\prime}$ to Pauli operators and solve the $1$ -norm of $C_{2},C_{3}$ , and $C_{4}$ under the Pauli decomposition based on Eq. 164. The $1$ -norm of $\tilde{V}_{2}^{(nc)}(x)$ is given by

\tilde{\mu}_{2}^{(nc)}(x)=\sqrt{1+\left(\sum_{s=2}^{4}\|C_{s}\|_{1}\frac{x^{s+1}}{(s+1)!}\right)^{2}}.

(165)

We summarize the procedure to estimate the segment number $\nu$ of second-order Trotter-LCU under nested-commutator compensation as follows.

1.

For a specific lattice Hamiltonian, get the explicit Pauli expansion form of $C_{2}$ , $C_{3}$ , $C_{4}$ by Eq. 164.
2.

Get the expression of the $1$ -norm $\tilde{\mu}^{(nc)}(x)$ by Eq. 165. Calculate $x_{U}$ based on $\tilde{\mu}^{(nc)}(x)=2$ .
3.

Compute the following residual operator,

$V_{2,\mathcal{res}}(x)=V_{2}(x)-\tilde{V}_{2}(x)=U(x)S_{2}(x)^{\dagger}-\tilde{V}_{2}(x),$ (166)

and calculate its spectral norm $\|V_{2,\mathcal{res}}(x)\|$ numerically. Check whether the following requirement is satisfied:

$\|V_{2,\mathcal{res}}(x)\|\frac{t}{x}\leq\varepsilon,$ (167)

where $\varepsilon$ is a preset accuracy requirement. If not, we search by bisection for the largest $x$ in the region $0<x<x_{U}$ for which Eq. 167 is satisfied.
4.

The number of segments is given by $\nu:=t/x$ . We then calculate the number of gates based on methods in Appendix B.2.
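The search in step 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `largest_step`, the toy residual norm `toy_res_norm`, and the monotonicity assumption (that $\|V_{2,\mathcal{res}}(x)\|/x$ does not decrease with $x$) are all assumptions introduced here. In practice `res_norm` would be the numerically computed spectral norm of Eq. 166.

```python
import math
from typing import Callable


def largest_step(res_norm: Callable[[float], float], t: float, eps: float,
                 x_upper: float, iters: int = 80) -> float:
    """Largest x in (0, x_upper] with res_norm(x) * t / x <= eps (Eq. 167).

    Assumes res_norm(x) / x is non-decreasing in x, so that the feasible
    step sizes form an interval (0, x*] and bisection applies.
    """
    def feasible(x: float) -> bool:
        return res_norm(x) * t / x <= eps

    if feasible(x_upper):
        return x_upper
    lo, hi = 0.0, x_upper  # hi is infeasible; lo is 0 or a feasible point
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo


def toy_res_norm(x: float) -> float:
    """Hypothetical stand-in for the residual norm, scaling as x^6."""
    return 50.0 * x ** 6


t, eps = 10.0, 1e-3
x_star = largest_step(toy_res_norm, t, eps, x_upper=0.5)
nu = math.ceil(t / x_star)  # segment number, rounded up to an integer
print(x_star, nu)
```

Rounding $t/x$ up to an integer only shortens each segment, so the accuracy requirement remains satisfied under the stated monotonicity assumption.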

We have a tighter counting bound for the norms of $J_{a}^{(s)}(x)$ and $J_{b}^{(s)}(x)$ :

Proposition 11.

Consider a lattice Hamiltonian $H=A+B$ with the form in Eq. 10. Suppose the spectral norm and $1$ -norm of its components $H_{j,j+1}$ are bounded by $\Lambda$ and $\Lambda_{1}$ . Then for the recurrence function $J_{2}(x)$ , we have the following norm bound for its derivatives,

	$\displaystyle\|J^{(s)}_{2}(x)\|$	$\displaystyle\leq\frac{n}{2}(7^{s}+12^{s})\Lambda^{s+1},$		(168)
	$\displaystyle\|J^{(s)}_{2}(x)\|_{1}$	$\displaystyle\leq\frac{n}{2}(7^{s}+12^{s})\Lambda_{1}^{s+1}.$		(168)

Proof.

From Eq. 162 we have,

$\displaystyle\|J_{a}^{(s)}(x)\|$	$\displaystyle\leq\sum_{l=0}^{s}\binom{s}{l}\|e^{x\mathrm{ad}_{A^{\prime}}}\mathrm{ad}_{A^{\prime}}^{l}e^{x\mathrm{ad}_{B^{\prime}}}\mathrm{ad}_{B^{\prime}}^{s-l}A^{\prime}\|,$	(169)
$\displaystyle\|J_{b}^{(s)}(x)\|$	$\displaystyle\leq\sum_{\begin{subarray}{c}l_{1},l_{2},l_{3}\\ l_{1}+l_{2}+l_{3}=s,l_{2}+l_{3}\neq 0\end{subarray}}\binom{s}{l_{1},l_{2},l_{3}}\cdot$
	$\displaystyle\quad\|e^{x\mathrm{ad}_{A^{\prime}}}\mathrm{ad}_{A^{\prime}}^{l_{1}}e^{x\mathrm{ad}_{B^{\prime}}}\mathrm{ad}_{B^{\prime}}^{l_{2}}e^{x\mathrm{ad}_{A^{\prime}}}\mathrm{ad}_{A^{\prime}}^{l_{3}}B^{\prime}\|.$

To bound the norm of the nested commutators, we use the same method as in Proposition 7. We have

		$\displaystyle\|e^{x\mathrm{ad}_{A^{\prime}}}\mathrm{ad}_{A^{\prime}}^{l}e^{x\mathrm{ad}_{B^{\prime}}}\mathrm{ad}_{B^{\prime}}^{s-l}A^{\prime}\|$		(170)
		$\displaystyle\quad\quad\leq(3^{l}2^{s-l})(\Lambda^{l}(2\Lambda)^{s-l})\cdot(\frac{n}{2}\Lambda),$
		$\displaystyle\|e^{x\mathrm{ad}_{A^{\prime}}}\mathrm{ad}_{A^{\prime}}^{l_{1}}e^{x\mathrm{ad}_{B^{\prime}}}\mathrm{ad}_{B^{\prime}}^{l_{2}}e^{x\mathrm{ad}_{A^{\prime}}}\mathrm{ad}_{A^{\prime}}^{l_{3}}B^{\prime}\|$
		$\displaystyle\quad\quad\leq(4^{l_{1}}3^{l_{2}}2^{l_{3}})(\Lambda^{l_{1}}(2\Lambda)^{l_{2}}\Lambda^{l_{3}})\cdot(\frac{n}{2}\Lambda).$

Here, the first bracket of each bound counts the possible nested commutators, while the second bracket bounds the norm enlargement contributed by each nested commutator.

Combining Eq. 169 and Eq. 170, we have

$\displaystyle\|J_{a}^{(s)}(x)\|$	$\displaystyle\leq\sum_{l=0}^{s}\binom{s}{l}(3\Lambda)^{l}(2\cdot 2\Lambda)^{s-l}\cdot\left(\frac{n}{2}\Lambda\right)$	(171)
	$\displaystyle\leq(7\Lambda)^{s}\left(\frac{n}{2}\Lambda\right),$
$\displaystyle\|J_{b}^{(s)}(x)\|$	$\displaystyle\leq\sum_{\begin{subarray}{c}l_{1},l_{2},l_{3}\\ l_{1}+l_{2}+l_{3}=s,l_{2}+l_{3}\neq 0\end{subarray}}\binom{s}{l_{1},l_{2},l_{3}}\cdot$
	$\displaystyle\quad(4\Lambda)^{l_{1}}(3\cdot 2\Lambda)^{l_{2}}(2\Lambda)^{l_{3}}\cdot\left(\frac{n}{2}\Lambda\right)$
	$\displaystyle\leq(12\Lambda)^{s}\left(\frac{n}{2}\Lambda\right).$

The $1$ -norm bound can be derived in the same way. ∎

We can derive the bounds for $\tilde{\mu}_{2}^{(nc)}(x)$ and $\varepsilon_{2}^{(nc)}(x)$ as follows:

	$\displaystyle\tilde{\mu}_{2}^{(nc)}(x)=\sqrt{1+\left(\sum_{s=2}^{4}\|C_{s}\|_{1}\frac{x^{s+1}}{(s+1)!}\right)^{2}}$		(172)
	$\displaystyle\leq\sqrt{1+\frac{n^{2}}{4}\left(\sum_{s=2}^{4}(7^{s}+12^{s})\frac{(\Lambda_{1}x)^{s+1}}{(s+1)!}\right)^{2}}$		(173)
	$\displaystyle\leq\exp\left((35n)^{2}(\Lambda_{1}x)^{6}\right),$		(174)
	$\displaystyle\varepsilon_{2}^{(nc)}(x)\leq\frac{n^{2}}{4}(7^{2}+12^{2})\sum_{s=2}^{4}(7^{s}+12^{s})\frac{(\Lambda x)^{s+4}}{3!s!(s+4)}$
	$\displaystyle\quad\quad\quad+\frac{n}{2}(7^{5}+12^{5})\frac{(\Lambda x)^{6}}{6!}$		(175)
	$\displaystyle\leq(3n^{2}+185n)(\Lambda x)^{6}\leq 370n^{2}(\Lambda x)^{6},$		(176)

Here, Eq. 174 holds when $\Lambda_{1}x\leq 1/3$ , Eq. 176 holds when $\Lambda x\leq 21/72$ .
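As a sanity check on the step from Eq. 173 to Eq. 174, the two right-hand sides can be compared numerically. The sketch below works with logarithms, since Eq. 177 constrains $\nu\ln\tilde{\mu}^{(nc)}$ and the exponential in Eq. 174 overflows for large $n$. The function names and the sampled values of $n$ and $y=\Lambda_{1}x$ are illustrative assumptions.

```python
import math


def log_mu_eq173(n: int, y: float) -> float:
    """Logarithm of the right-hand side of Eq. 173, with y = Lambda_1 * x."""
    total = sum((7 ** s + 12 ** s) * y ** (s + 1) / math.factorial(s + 1)
                for s in range(2, 5))
    return 0.5 * math.log1p((n ** 2 / 4.0) * total ** 2)


def log_mu_eq174(n: int, y: float) -> float:
    """Logarithm of the right-hand side of Eq. 174, with y = Lambda_1 * x."""
    return (35 * n) ** 2 * y ** 6


# Compare both bounds on a few sample points with y <= 1/3.
checks = [(n, y, log_mu_eq173(n, y), log_mu_eq174(n, y))
          for n in (2, 10, 100) for y in (0.01, 0.1, 1.0 / 3.0)]
for n, y, tight, loose in checks:
    print(f"n={n:4d}  y={y:.3f}  ln(Eq.173)={tight:.3e}  ln(Eq.174)={loose:.3e}")
```

On these sample points the closed form of Eq. 174 is looser than Eq. 173, as expected for $\Lambda_{1}x\leq 1/3$.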

B.2 Estimating the gate counts

To estimate the performance, i.e., gate complexity of the Trotter-LCU algorithm based on nested-commutator compensation, we need to first estimate the number of segments $\nu=t/x$ in the algorithm. This is determined by the following three constraints,

$\displaystyle\Lambda_{1}x,\Lambda x\leq 1\quad\Leftarrow$	$\displaystyle\nu\geq\max\{\Lambda_{1}t,\Lambda t\},$	(177)
$\displaystyle(\tilde{\mu}^{(nc)})^{\nu}\leq 2\quad\Leftrightarrow$	$\displaystyle\nu\ln\tilde{\mu}^{(nc)}\leq\ln 2,$
$\displaystyle\nu\tilde{\mu}^{\nu}(\varepsilon^{(nc)})\leq\varepsilon\quad\Leftarrow$	$\displaystyle\nu\varepsilon^{(nc)}\leq\frac{1}{2}\varepsilon.$

After we solve the required segment number $\nu$ by Eq. 177, we can estimate the gate number accordingly. Here, we introduce the method to estimate the gate number of the LCU part. The gate number in the implementation of second-order Trotter formula can be evaluated following the methods in Ref. Childs et al. (2018). In the worst-case scenario, the Pauli weight of the gate is determined by the weight of Pauli operators contained in $C_{4}$ in Eq. 164. The largest Pauli weight is $6$ . As a result, a controlled-Pauli gate will cost at most six $\mathcal{CNOT}$ gates and no non-Clifford gate.

If we consider the paired algorithm, when we sample the third-order term, it will be a Pauli rotation unitary on four qubits. In this case, it will cost eight $\mathcal{CNOT}$ gates and two single-qubit Pauli rotation gates $R_{z}(\theta)$ .

We summarize the whole procedure to estimate the gate complexity.

1.

Input: Hamiltonian parameters: $H=A+B$ , $n$ , $\Lambda$ , $\Lambda_{1}$ . Normalization requirements $\mu=2$ , accuracy requirements $\varepsilon$ , time requirements $t$ .
2.
Normalization factor estimation.
(a)
  
  Analytical way (scalable). Using Eq. 173 to get the function $\tilde{\mu}^{(nc)}(x)$ .
(b)
  
  Numerical way (scalable). Using Eq. 164 to get the form of $C_{2}$ , $C_{3}$ , and $C_{4}$ . Then get the function $\tilde{\mu}^{(nc)}(x)$ by Eq. 172.
3.
Accuracy estimation.
(a)
  
  Analytical way (scalable). Using Eq. 175 to get the function $\varepsilon^{(nc)}(x)$ .
(b)
  
  Numerical way (unscalable). Calculate $V_{2,\mathcal{res}}(x):=U(x)S_{2}(x)^{\dagger}-I-\sum_{s=2}^{4}C_{s}\frac{x^{s+1}}{(s+1)!}$ numerically. Solve its largest singular value, which is an upper bound of $\varepsilon^{(nc)}(x)$ .
4.

Based on the constraints in Eq. 177, calculate the segment number $\nu$ .
5.

Analyze the $\mathcal{CNOT}$ and $R_{z}(\theta)$ gate number.
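For the fully analytical route (steps 2(a), 3(a), and 4), the segment number can be obtained in closed form. The sketch below assumes the closed-form bounds of Eq. 174 and Eq. 176, substitutes $x=t/\nu$ into the constraints of Eq. 177, and uses the validity ranges $\Lambda_{1}x\leq 1/3$ and $\Lambda x\leq 21/72$ stated after Eq. 176 in place of the first constraint. The function name and the example parameters are hypothetical.

```python
import math


def segments_closed_form(n: int, lam: float, lam1: float,
                         t: float, eps: float) -> int:
    """Smallest integer nu meeting the constraints of Eq. 177 under the
    closed-form bounds ln(mu) <= (35 n)^2 (lam1 x)^6 and
    eps_nc <= 370 n^2 (lam x)^6, with x = t / nu."""
    # Validity ranges of the closed-form bounds.
    nu_valid = max(3.0 * lam1 * t, (72.0 / 21.0) * lam * t)
    # nu * (35 n)^2 (lam1 t / nu)^6 <= ln 2
    nu_norm = ((35 * n) ** 2 * (lam1 * t) ** 6 / math.log(2.0)) ** 0.2
    # nu * 370 n^2 (lam t / nu)^6 <= eps / 2
    nu_acc = (740.0 * n ** 2 * (lam * t) ** 6 / eps) ** 0.2
    return max(1, math.ceil(max(nu_valid, nu_norm, nu_acc)))


n, lam, lam1, t, eps = 10, 1.0, 2.0, 1.0, 1e-3
nu = segments_closed_form(n, lam, lam1, t, eps)
x = t / nu
print(nu, x)
```

Because both the normalization and the accuracy constraints scale as $\nu^{-5}$, each yields a fifth-root expression for $\nu$, and the final answer is the largest of the three requirements.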

Appendix C EXPLICIT NESTED-COMMUTATOR COMPENSATION FOR HIGHER-ORDER TROTTER REMAINDERS

In this section, we provide detailed results for the nested-commutator compensation for higher-order Trotter remainders introduced in Sec. V. Here, we will focus on the lattice model Hamiltonians in Eq. 10. The results for general Hamiltonians will be presented in Appendix E.

As introduced in Sec. V.1, for the (asymmetric) multiplicative remainder $V_{K}(x)$ , our aim is to construct an LCU formula with the following form

V_{K}(x)=I+\sum_{s=K+1}^{2K+1}F_{K,s}^{(nc)}(x)+F_{K,\mathcal{res}}^{(nc)}(x),

(178)

where $F_{K,\mathcal{res}}^{(nc)}(x)$ denotes the term with $x$ -order $\mathcal{O}(x^{2K+2})$ . In practice, we will use the truncated LCU formula with paired leading-order terms, derived in Eq. 110,

		$\displaystyle\tilde{V}_{K}^{(nc)}(x)=I+\sum_{s=K+1}^{2K+1}F_{K,s}^{(nc)}(x)$		(179)
		$\displaystyle=I+i\sum_{s=K+1}^{2K+1}\eta^{(nc)}_{s}V_{K,s}^{(nc)}$
		$\displaystyle=\sum_{s=K+1}^{2K+1}\frac{\eta^{(nc)}_{s}}{\eta_{\Sigma}^{(nc)}}\left(I+i\eta^{(nc)}_{\Sigma}V_{K,s}^{(nc)}\right)$
		$\displaystyle=\sqrt{1+(\eta_{\Sigma}^{(nc)})^{2}}\sum_{s=K+1}^{2K+1}\frac{\eta^{(nc)}_{s}}{\eta_{\Sigma}^{(nc)}}R_{K,s}^{(nc)}(\eta_{\Sigma}^{(nc)}),$

where $\eta^{(nc)}_{\Sigma}=\sum_{s=K+1}^{2K+1}\eta^{(nc)}_{s}=\sum_{s=K}^{2K}\|C_{s}\|_{1}\frac{x^{s+1}}{(s+1)!}$ .

We carry out the following tasks:

1.

(Sec. C.1) Derive the explicit formulas for the leading-order expansion terms $F_{K,s}^{(nc)}(x)$ with $s=K+1,...,2K+1$ .
2.

(Sec. C.2) Prove the $1$ -norm $\mu_{K}^{(nc)}(x)$ and error bound $\varepsilon_{K}^{(nc)}(x)$ in Proposition 9.

C.1 Derivation of LCU formula with nested-commutator form

We first introduce the canonical expression of the $K$ th-order Trotter formula,

S_{K}(x):=W(x)=\prod_{j=1}^{\kappa}W_{j}(x)=W_{\kappa}(x)...W_{2}(x)W_{1}(x),

(180)

where

W_{j}(x):=e^{-ixb_{j}B}e^{-ixa_{j}A},

(181)

is the $j$ th stage of the Trotter formula. $\kappa$ is the stage number. We have $\kappa=1$ for $K=1$ and $\kappa\leq 2\times 5^{k-1}$ when $K=2k$ . The stage lengths $a_{j}$ and $b_{j}$ are determined based on Eq. 24 and Eq. 25. We have

	$\displaystyle 0\leq$	$\displaystyle a_{j},b_{j}\leq 1,\quad\forall j=1,2,\ldots,\kappa,$		(182)
	$\displaystyle\sum_{j=1}^{\kappa}$	$\displaystyle a_{j}=\sum_{j=1}^{\kappa}b_{j}=1.$		(182)

For example, for the second-order Trotter formula $S_{2}(x)=e^{-ix\frac{A}{2}}e^{-ixB}e^{-ix\frac{A}{2}}$ , we set the stage number $\kappa=2$ with $a_{1}=\frac{1}{2},b_{1}=1;a_{2}=\frac{1}{2},b_{2}=0$ .
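The stage assignment in this example can be checked numerically. The sketch below is an illustration on a single qubit with the assumed choice $A=X$ and $B=Z$ (Pauli matrices), for which each exponential has the closed form $e^{-i\theta P}=\cos\theta\,I-i\sin\theta\,P$. All function names are hypothetical.

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]   # assumed choice A = X
Z = [[1, 0], [0, -1]]  # assumed choice B = Z


def matmul(P, Q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def exp_pauli(P, theta):
    """exp(-i * theta * P) for a matrix P that squares to the identity."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * I2[i][j] - 1j * s * P[i][j] for j in range(2)]
            for i in range(2)]


def stage(x, a_j, b_j):
    """W_j(x) = exp(-i x b_j B) exp(-i x a_j A), as in Eq. 181."""
    return matmul(exp_pauli(Z, x * b_j), exp_pauli(X, x * a_j))


def trotter_from_stages(x, stages):
    """W(x) = W_kappa(x) ... W_2(x) W_1(x), as in Eq. 180."""
    W = I2
    for a_j, b_j in stages:  # later stages multiply from the left
        W = matmul(stage(x, a_j, b_j), W)
    return W


x = 0.37
stages = [(0.5, 1.0), (0.5, 0.0)]  # (a_1, b_1), (a_2, b_2)
W = trotter_from_stages(x, stages)
S2 = matmul(exp_pauli(X, x / 2), matmul(exp_pauli(Z, x), exp_pauli(X, x / 2)))
diff = max(abs(W[i][j] - S2[i][j]) for i in range(2) for j in range(2))
print(diff)
```

The two matrices agree up to floating-point rounding, since the second stage contributes only $e^{-ixA/2}$ when $b_{2}=0$.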

The Hermitian conjugate of $W(x)$ is

W(x)^{\dagger}=\prod_{j=\kappa}^{1}W_{j}(x)^{\dagger}=W_{1}(x)^{\dagger}W_{2}(x)^{\dagger}...W_{\kappa}(x)^{\dagger}.

(183)

As is discussed in Sec. V.1, to derive the nested-commutator form of $V_{K}(x)$ , we first solve $J_{K}(x)$ defined in Eq. 95. We have

	$\displaystyle R_{K}(x)$	$\displaystyle=\frac{d}{dx}W(x)^{\dagger}-iHW(x)^{\dagger},$		(184)
	$\displaystyle J_{K}(x)$	$\displaystyle=W(x)R_{K}(x).$		(184)

From Proposition 5, we can write the $K$ th-order Trotter remainder $V_{K}(x)$ and $J_{K}(x)$ in the following form:

	$\displaystyle V_{K}(x)$	$\displaystyle=I+M_{K}(x)=I+\sum_{s=K+1}^{2K+1}C_{s-1}\frac{x^{s}}{s!}+F_{K,\mathcal{res}}(x),$		(185)
	$\displaystyle J_{K}(x)$	$\displaystyle=\sum_{s=K}^{2K}C_{s}\frac{x^{s}}{s!}+J_{K,res,2K}(x),$		(185)

where $F_{K,\mathcal{res}}(x)=\mathcal{O}(x^{2K+2})$ and $J_{K,res,2K}(x)=\mathcal{O}(x^{2K+1})$ are the higher-order remaining terms to be analyzed later. We also denote

	$\displaystyle F_{L}(x)$	$\displaystyle=\sum_{s=K+1}^{2K+1}C_{s-1}\frac{x^{s}}{s!},$		(186)
	$\displaystyle J_{L}(x)$	$\displaystyle=\sum_{s=K}^{2K}C_{s}\frac{x^{s}}{s!},$		(186)

as the leading-order terms, whose explicit forms will be calculated in this section. We will show that $F_{L}(x)$ contains $\mathcal{O}(n)$ terms, where $n$ is the lattice size.

Now, we are going to solve the exact form of $C_{s}$ for the leading-orders. We first try to solve the succinct form of $R_{K}(x)$ based on its definition in Eq. 184. Taking the derivative for each Trotter stage, we have

		$\displaystyle R_{K}(x)=\frac{d}{dx}\left[\prod_{j=\kappa}^{1}W_{j}(x)^{\dagger}\right]-iH\prod_{j=\kappa}^{1}W_{j}(x)^{\dagger}$		(187)
		$\displaystyle=\sum_{j=\kappa}^{1}\prod_{l=j-1}^{1}W_{l}(x)^{\dagger}\frac{d}{dx}W_{j}(x)^{\dagger}\prod_{l=\kappa}^{j+1}W_{l}(x)^{\dagger}$
		$\displaystyle\quad\quad-\sum_{j=\kappa}^{1}(ia_{j}A+ib_{j}B)\prod_{l=\kappa}^{1}W_{l}(x)^{\dagger}.$

Here we assume $1\leq j-1<j+1\leq\kappa$ . When $j=1$ (or $\kappa$ ), the value of the product $\prod_{l=j-1}^{1}W_{l}(x)^{\dagger}$ (or $\prod_{l=\kappa}^{j+1}W_{l}(x)^{\dagger}$ ) will be regarded as $I$ . Recall that $W_{j}(x)^{\dagger}=e^{ixa_{j}A}e^{ixb_{j}B}$ . We further expand $\frac{d}{dx}W_{j}(x)^{\dagger}$ and merge the two terms together,

		$\displaystyle R_{K}(x)=\sum_{j=\kappa}^{1}\prod_{l=j-1}^{1}W_{l}(x)^{\dagger}\left(ia_{j}AW_{j}(x)^{\dagger}+W_{j}(x)^{\dagger}ib_{j}B\right)\cdot$		(188)
		$\displaystyle\quad\quad\prod_{l=\kappa}^{j+1}W_{l}(x)^{\dagger}-\sum_{j=\kappa}^{1}(ia_{j}A+ib_{j}B)\prod_{l=\kappa}^{1}W_{l}(x)^{\dagger}$
		$\displaystyle=\sum_{j=\kappa}^{1}\left[\prod_{l=j}^{1}W_{l}(x)^{\dagger},ia_{j+1}A+ib_{j}B\right]\prod_{l=\kappa}^{j+1}W_{l}(x)^{\dagger}.$

Here, we assume $a_{\kappa+1}=0$ . Now, we simplify the commutator by splitting the product and then change the summation order,

		$\displaystyle R_{K}(x)=\sum_{j=\kappa}^{1}\sum_{s=j}^{1}\Big{(}\prod_{m=s-1}^{1}W_{m}(x)^{\dagger}[W_{s}(x)^{\dagger},ia_{j+1}A+ib_{j}B]\cdot$		(189)
		$\displaystyle\qquad\qquad\prod_{m=j}^{s+1}W_{m}(x)^{\dagger}\Big{)}\prod_{l=\kappa}^{j+1}W_{l}(x)^{\dagger}$
		$\displaystyle=\sum_{s=\kappa}^{1}\sum_{j=\kappa}^{s}\Big{(}\prod_{m=s-1}^{1}W_{m}(x)^{\dagger}[W_{s}(x)^{\dagger},ia_{j+1}A+ib_{j}B]\cdot$
		$\displaystyle\qquad\qquad\prod_{m=j}^{s+1}W_{m}(x)^{\dagger}\Big{)}\prod_{l=\kappa}^{j+1}W_{l}(x)^{\dagger}$
		$\displaystyle=\sum_{s=\kappa}^{1}\prod_{m=s-1}^{1}W_{m}(x)^{\dagger}[W_{s}(x)^{\dagger},ic_{s+1}A+id_{s}B]\prod_{l=\kappa}^{s+1}W_{l}(x)^{\dagger},$

where

c_{s}:=\sum_{j=\kappa}^{s}a_{j},\quad d_{s}=\sum_{j=\kappa}^{s}b_{j},\quad s\leq\kappa.

(190)

We also set $c_{\kappa+1}=0$ .
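The coefficients of Eq. 190 are suffix sums of the stage lengths and are straightforward to tabulate. The sketch below uses exact rational arithmetic and the second-order stage lengths from the example above. The helper name is hypothetical, and setting $d_{\kappa+1}=0$ is a bookkeeping convenience introduced here (the text only fixes $c_{\kappa+1}=0$).

```python
from fractions import Fraction


def suffix_coefficients(a, b):
    """Return dicts c, d with c[s] = a_s + ... + a_kappa and
    d[s] = b_s + ... + b_kappa (Eq. 190), using 1-based stage indices.
    The entries at kappa + 1 are set to zero."""
    kappa = len(a)
    c = {kappa + 1: Fraction(0)}
    d = {kappa + 1: Fraction(0)}
    for s in range(kappa, 0, -1):
        c[s] = c[s + 1] + a[s - 1]
        d[s] = d[s + 1] + b[s - 1]
    return c, d


# Second-order formula: a_1 = 1/2, b_1 = 1, a_2 = 1/2, b_2 = 0.
a = [Fraction(1, 2), Fraction(1, 2)]
b = [Fraction(1), Fraction(0)]
c, d = suffix_coefficients(a, b)
print(c, d)
```

For this formula one obtains $c_{1}=1$, $c_{2}=1/2$, $c_{3}=0$ and $d_{1}=1$, $d_{2}=0$.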

Based on Eq. 189, we now derive the succinct form of $J_{K}(x)$ ,

\displaystyle J_{K}(x)=W(x)R_{K}(x)=\prod_{l=1}^{\kappa}W_{l}(x)R_{K}(x)=\sum_{s=\kappa}^{1}\prod_{l=s}^{\kappa}W_{l}(x)[W_{s}(x)^{\dagger},ic_{s+1}A+id_{s}B]\prod_{m=\kappa}^{s+1}W_{m}(x)^{\dagger}.

(191)

We expand each stage of the Trotter formula $W_{l}(x)$ in the formula,

	$\displaystyle J_{K}(x)$	$\displaystyle=i\sum_{j=\kappa}^{1}\left(\prod_{l=j+1}^{\kappa}W_{l}(x)\left(c_{j+1}A+d_{j}B\right)\prod_{l=\kappa}^{j+1}W_{l}(x)^{\dagger}-\prod_{l=j}^{\kappa}W_{l}(x)\left(c_{j+1}A+d_{j}B\right)\prod_{l=\kappa}^{j}W_{l}(x)^{\dagger}\right)$		(192)
		$\displaystyle=-i\sum_{j=\kappa}^{1}\left(\prod_{l=j}^{\kappa}e^{-ixb_{l}\mathrm{ad}_{B}}e^{-ixa_{l}\mathrm{ad}_{A}}\left(c_{j+1}A+d_{j}B\right)-\prod_{l=j+1}^{\kappa}e^{-ixb_{l}\mathrm{ad}_{B}}e^{-ixa_{l}\mathrm{ad}_{A}}\left(c_{j+1}A+d_{j}B\right)\right).$		(192)

Finally, we apply the following operator-valued Taylor expansion formula with integral form of the remainder:

\displaystyle Q(x)=\sum_{s=0}^{k}\frac{x^{s}}{s!}Q^{(s)}(0)+\int_{0}^{x}d\tau\frac{(x-\tau)^{k}}{k!}Q^{(k+1)}(\tau).

(193)

By the general Leibniz rule, we obtain the derivatives of $J_{K}(x)$ as follows:

	$\displaystyle J^{(s)}_{K}(x)$	$\displaystyle=(-i)^{s+1}\sum_{j=\kappa}^{1}\sum_{\begin{subarray}{c}m_{j},n_{j};...;m_{\kappa},n_{\kappa}\\ m_{j},n_{j}\neq 0;\sum_{l=j}^{\kappa}m_{l}+n_{l}=s\end{subarray}}\binom{s}{m_{j},n_{j},...,m_{\kappa},n_{\kappa}}\cdot$		(194)
		$\displaystyle\quad\prod_{l=j}^{\kappa}(b_{l}^{m_{l}}a_{l}^{n_{l}})\prod_{l=j}^{\kappa}\left(e^{-ixb_{l}\mathrm{ad}_{B}}\mathrm{ad}_{B}^{m_{l}}e^{-ixa_{l}\mathrm{ad}_{A}}\mathrm{ad}_{A}^{n_{l}}\right)(c_{j+1}A+d_{j}B).$		(194)

We can then expand $J_{K}(x)$ around $x=0$ as follows:

\displaystyle J_{K}(x)=\sum_{s=K}^{2K}C_{s}\frac{x^{s}}{s!}+J_{K,res,2K}(x).

(195)

Here, we use the order condition in Proposition 5 so that the terms with expansion order from $0$ to $K-1$ are all zeros. The $s$ th-order term $C_{s}$ and the $2K$ th-order residue $J_{K,res,2K}(x)$ can then be expressed as

	$\displaystyle C_{s}$	$\displaystyle=J^{(s)}_{K}(0),$		(196)
	$\displaystyle J_{K,res,2K}(x)$	$\displaystyle=\int_{0}^{x}d\tau\frac{(x-\tau)^{2K}}{(2K)!}J^{(2K+1)}_{K}(\tau).$		(196)
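The order condition implies $V_{K}(x)-I=\mathcal{O}(x^{K+1})$ , which can be observed numerically in a small example. The sketch below takes $K=2$ on a single qubit with the assumed choice $A=X$, $B=Z$, forms $V_{2}(x)=U(x)S_{2}(x)^{\dagger}$ with $U(x)=e^{-ixH}$, and compares the deviation from the identity at two step sizes. All names are hypothetical and the model is chosen only because its exponentials have closed forms.

```python
import math


def mul(P, Q):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dag(P):
    """Hermitian conjugate of a 2x2 complex matrix."""
    return [[P[j][i].conjugate() for j in range(2)] for i in range(2)]


def rot(nx, nz, theta):
    """exp(-i * theta * (nx X + nz Z)) for a unit vector (nx, nz)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c - 1j * s * nz, -1j * s * nx],
            [-1j * s * nx, c + 1j * s * nz]]


def remainder_deviation(x):
    """Frobenius norm of V_2(x) - I for H = X + Z."""
    r = 1.0 / math.sqrt(2.0)
    U = rot(r, r, math.sqrt(2.0) * x)  # exp(-i x (X + Z))
    S2 = mul(rot(1, 0, x / 2), mul(rot(0, 1, x), rot(1, 0, x / 2)))
    V = mul(U, dag(S2))
    return math.sqrt(sum(abs(V[i][j] - (1 if i == j else 0)) ** 2
                         for i in range(2) for j in range(2)))


# Halving the step should reduce the deviation by about 2^(K+1) = 8.
ratio = remainder_deviation(0.02) / remainder_deviation(0.01)
print(ratio)
```

A ratio close to $8$ is consistent with the leading term $C_{2}x^{3}/3!$ in Eq. 185 for the second-order formula.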

C.2 Norm bounds for LCU formula

In Sec. C.1 we have derived the explicit form of the LCU formula for the $K$ th-order Trotter remainder $V_{K}(x)$ . Based on Eq. 192, Eq. 194, Eq. 196 and Proposition 6, we are going to prove the $1$ -norm bound $\mu^{(nc)}_{K}(x)$ and error bound $\varepsilon^{(nc)}_{K}(x)$ in Proposition 9.

First of all, we need to estimate the norms of $J_{K}^{(s)}(x)$ . We have the following results.

Proposition 12 (Upper bound of the norm of nested commutators).

Consider a lattice Hamiltonian $H=A+B$ with the form in Eq. 10. Suppose the spectral norm and $1$ -norm of its components $H_{j,j+1}$ are bounded by $\Lambda$ and $\Lambda_{1}$ . Then for the nested commutators appearing in Eq. 196, we have the following bound

		$\displaystyle\left\|\prod_{l=j}^{\kappa}e^{-i\tau b_{l}\mathrm{ad}_{B}}\mathrm{ad}_{B}^{m_{l}}e^{-i\tau a_{l}\mathrm{ad}_{A}}\mathrm{ad}_{A}^{n_{l}}(c_{j+1}A+d_{j}B)\right\|$		(197)
		$\displaystyle\quad\leq\frac{n}{2}2^{s}\Lambda^{s+1}\Big{(}c_{j+1}\prod_{l=j}^{\kappa}(2(l-j)+2)^{m_{l}}(2(l-j)+1)^{n_{l}}$
		$\displaystyle\quad\quad+d_{j}\prod_{l=j}^{\kappa}(2(l-j)+3)^{m_{l}}(2(l-j)+2)^{n_{l}}\Big{)}$

where $\{m_{l},n_{l}\}_{l=1}^{\kappa}$ are non-negative integers satisfying $\sum_{l=j}^{\kappa}(m_{l}+n_{l})=s$ . $\kappa$ is the stage number determined by the Trotter formula in Eq. 180. $\{a_{l},b_{l}\}_{l=1}^{\kappa}$ are defined by the Trotter formula in Eq. 181. $\{c_{l},d_{l}\}_{l=1}^{\kappa}$ are defined in Eq. 190. As a result, we can bound the norm of $J^{(s)}_{K}(x)$ defined in Eq. 194 as follows:

\|J^{(s)}_{K}(x)\|\leq n\kappa(4\kappa+5)^{s}2^{s}\Lambda^{s+1}.

(198)

The $1$ -norm upper bound for $J^{(s)}_{K}(x)$ is to simply replace $\Lambda$ by $\Lambda_{1}$ .

Proof.

To begin with, we now focus on one Hamiltonian term $H_{j,j+1}$ in $A$ and bound the norm

		$\displaystyle\left\|\prod_{l=j}^{\kappa}e^{-i\tau b_{l}\mathrm{ad}_{B}}\mathrm{ad}_{B}^{m_{l}}e^{-i\tau a_{l}\mathrm{ad}_{A}}\mathrm{ad}_{A}^{n_{l}}(H_{j,j+1})\right\|$		(199)
		$\displaystyle\quad\leq 2^{s}\Lambda^{s+1}\prod_{l=j}^{\kappa}(2(l-j)+2)^{m_{l}}(2(l-j)+1)^{n_{l}}.$		(199)

To this end, we decompose the operator into elementary nested commutators of the form

		$\displaystyle e^{-i\tau\mathrm{ad}_{B}}\mathrm{ad}_{H_{j_{s},j_{s}+1}}\cdots\mathrm{ad}_{H_{j_{s-m_{\kappa}+1},j_{s-m_{\kappa}+1}+1}}\cdot$		(200)
		$\displaystyle\cdots$
		$\displaystyle e^{-i\tau\mathrm{ad}_{B}}\mathrm{ad}_{H_{j_{m_{j}+n_{j}},j_{m_{j}+n_{j}}+1}}\cdots\mathrm{ad}_{H_{j_{n_{j}+1},j_{n_{j}+1}+1}}\cdot$
		$\displaystyle\;e^{-i\tau\mathrm{ad}_{A}}\mathrm{ad}_{H_{j_{n_{j}},j_{n_{j}}+1}}\cdots\mathrm{ad}_{H_{j_{1},j_{1}+1}}H_{j,j+1},$

where $j_{1},j_{2},...,j_{s}$ are the possible vertex indices. For each elementary nested commutator, the spectral norm can be bounded by

(2\Lambda)^{s}\Lambda.

(201)

This can be done by expanding all the commutators and applying the triangle inequality. Here, we use the property that the spectral norm of every exponential operator with an anti-Hermitian exponent is 1.

Now, we count the number of possible elementary commutators of the form in Eq. 200. We check the action of the adjoint operators from right to left. For the first location, we know that $\mathrm{ad}_{A}H_{j,j+1}=0$ . For simplicity, we nevertheless keep the term $\mathrm{ad}_{H_{j,j+1}}H_{j,j+1}$ in the counting of elementary commutators. If the next $\mathrm{ad}$ is still $\mathrm{ad}_{A}$ , the support remains on the two qubits $j$ and $j+1$ . As a result, there is again only one possible elementary term, $\mathrm{ad}_{H_{j,j+1}}$ . Similarly, the exponential operator $e^{-i\tau\mathrm{ad}_{A}}$ does not enlarge the support, since one can expand it in powers of $\mathrm{ad}_{A}$ . The support is enlarged when $\mathrm{ad}_{B}$ comes into play. In this layer, the support of the operator is expanded to four qubits: $j-1,j,j+1$ , and $j+2$ . We can decompose $\mathrm{ad}_{B}$ into $2$ nonzero elementary elements, $\mathrm{ad}_{H_{j-1,j}}$ and $\mathrm{ad}_{H_{j+1,j+2}}$ . If the next operator is still $\mathrm{ad}_{B}$ or $e^{-i\tau\mathrm{ad}_{B}}$ , it does not enlarge the support. Following this logic, we can see that the number of possible elementary commutators is bounded by

		$\displaystyle(2(\kappa-j)+2)^{m_{\kappa}}(2(\kappa-j)+1)^{n_{\kappa}}\cdots 4^{m_{j+1}}3^{n_{j+1}}2^{m_{j}}1^{n_{j}}$		(202)
		$\displaystyle\quad=\prod_{l=j}^{\kappa}(2(l-j)+2)^{m_{l}}(2(l-j)+1)^{n_{l}}.$		(202)

We remark that the elementary nested commutators with $n_{j}\neq 0$ are actually $0$ . Here, we keep these commutators for simplicity of counting.

Combining the number of elementary nested commutators and the norm bound for each commutator and applying triangle inequality, we will obtain Eq. 199.
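The counting argument can be made concrete with a few lines of code. The sketch below evaluates the count in Eq. 202 and the resulting norm bound of Eq. 199 for given exponents $\{m_{l},n_{l}\}_{l=j}^{\kappa}$ , and also covers the analogous count for a term in $B$ (Eq. 203) through an offset. The function names and the sample exponents are hypothetical.

```python
def count_elementary(m, n, term_in_A=True):
    """Number of elementary nested commutators kept in the counting.

    m[r], n[r] are the exponents m_l, n_l at stage offset r = l - j.
    For a term in A the factors are (2r+2)^m (2r+1)^n, as in Eq. 202;
    for a term in B they are (2r+3)^m (2r+2)^n, as in Eq. 203.
    """
    offset = 0 if term_in_A else 1
    count = 1
    for r, (m_l, n_l) in enumerate(zip(m, n)):
        count *= (2 * r + 2 + offset) ** m_l * (2 * r + 1 + offset) ** n_l
    return count


def norm_bound(m, n, lam, term_in_A=True):
    """Count times the per-commutator bound (2 Lambda)^s Lambda of Eq. 201."""
    s = sum(m) + sum(n)
    return count_elementary(m, n, term_in_A) * (2 * lam) ** s * lam


m, n = [1, 2], [0, 1]  # two stages, s = 4
count_A = count_elementary(m, n, term_in_A=True)
count_B = count_elementary(m, n, term_in_A=False)
bound_A = norm_bound(m, n, lam=1, term_in_A=True)
print(count_A, count_B, bound_A)
```

For these exponents the count is $2\cdot 4^{2}\cdot 3=96$ for a term in $A$ and $3\cdot 5^{2}\cdot 4=300$ for a term in $B$.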

Similarly, we can check one Hamiltonian term $H_{j,j+1}$ in $B$ and bound the norm with

		$\displaystyle\left\|\prod_{l=j}^{\kappa}e^{-i\tau b_{l}\mathrm{ad}_{B}}\mathrm{ad}_{B}^{m_{l}}e^{-i\tau a_{l}\mathrm{ad}_{A}}\mathrm{ad}_{A}^{n_{l}}(H_{j,j+1})\right\|$		(203)
		$\displaystyle\quad\leq 2^{s}\Lambda^{s+1}\prod_{l=j}^{\kappa}(2(l-j)+3)^{m_{l}}(2(l-j)+2)^{n_{l}}.$		(203)

The counting logic is similar to the case for $H_{j,j+1}$ in $A$ . The only difference is that when counting the number of elementary nested commutators, the action of the first $\mathrm{ad}_{A}$ will enlarge the operator space to four qubits.

Applying Eq. 199 and Eq. 203 for all the components $H_{j,j+1}$ in $H$ , we will obtain Eq. 197.

Now, we apply Eq. 197 to bound the norm of $J^{(s)}_{K}(x)$ in Eq. 194. We have

	$\displaystyle\|J^{(s)}_{K}(x)\|\leq\sum_{j=\kappa}^{1}\sum_{\begin{subarray}{c}m_{j},n_{j};...;m_{\kappa},n_{\kappa}\\ \sum_{l=j}^{\kappa}m_{l}+n_{l}=s\end{subarray}}\binom{s}{m_{j},n_{j},...,m_{\kappa},n_{\kappa}}\prod_{l=j}^{\kappa}(b_{l}^{m_{l}}a_{l}^{n_{l}})\left\|\prod_{l=j}^{\kappa}\left(e^{-ixb_{l}\mathrm{ad}_{B}}\mathrm{ad}_{B}^{m_{l}}e^{-ixa_{l}\mathrm{ad}_{A}}\mathrm{ad}_{A}^{n_{l}}\right)(c_{j+1}A+d_{j}B)\right\|$
	$\displaystyle\leq 2^{s}\Lambda^{s+1}\frac{n}{2}\sum_{j=\kappa}^{1}\sum_{\begin{subarray}{c}m_{j},n_{j};...;m_{\kappa},n_{\kappa}\\ \sum_{l=j}^{\kappa}m_{l}+n_{l}=s\end{subarray}}\binom{s}{m_{j},n_{j},...,m_{\kappa},n_{\kappa}}\prod_{l=j}^{\kappa}(b_{l}^{m_{l}}a_{l}^{n_{l}})$
	$\displaystyle\quad\left(c_{j+1}\prod_{l=j}^{\kappa}(2(l-j)+2)^{m_{l}}(2(l-j)+1)^{n_{l}}+d_{j}\prod_{l=j}^{\kappa}(2(l-j)+3)^{m_{l}}(2(l-j)+2)^{n_{l}}\right)$
	$\displaystyle\leq 2^{s}\Lambda^{s+1}\frac{n}{2}\sum_{j=\kappa}^{1}c_{j+1}\left(\sum_{l=j}^{\kappa}b_{l}(2(l-j)+2)+a_{l}(2(l-j)+1)\right)^{s}+d_{j}\left(\sum_{l=j}^{\kappa}b_{l}(2(l-j)+3)+a_{l}(2(l-j)+2)\right)^{s}$
	$\displaystyle\leq 2^{s}\Lambda^{s+1}\frac{n}{2}\sum_{j=\kappa}^{1}c_{j+1}\left(4(\kappa-j)+3\right)^{s}+d_{j}\left(4(\kappa-j)+5\right)^{s}$		(204)
	$\displaystyle\leq 2^{s}\Lambda^{s+1}\frac{n}{2}\sum_{j=\kappa}^{1}\left(4(\kappa-j)+3\right)^{s}+\left(4(\kappa-j)+5\right)^{s}$
	$\displaystyle\leq 2^{s}\Lambda^{s+1}\frac{n}{2}\cdot 2\kappa(4\kappa+5)^{s}=n\kappa(4\kappa+5)^{s}2^{s}\Lambda^{s+1}.$

In the third line, we use the multinomial theorem. In the fourth line, we use the fact that $a_{l},b_{l}\geq 0$ and $\sum_{l}a_{l},\sum_{l}b_{l}\leq 1$ . The fifth line is due to $0\leq c_{j},d_{j}\leq 1$ for all $j$ . In the sixth line, we use the following bound:

		$\displaystyle\;\;\sum_{j=1}^{\kappa}(4(\kappa-j)+a)^{s}=\sum_{l=0}^{\kappa-1}(4l+a)^{s}$		(205)
		$\displaystyle\leq\int_{0}^{\kappa}(4x+a)^{s}dx=\frac{1}{4}\frac{(4\kappa+a)^{s+1}-a^{s+1}}{s+1}$
		$\displaystyle=\kappa\frac{1}{s+1}\sum_{m=0}^{s}(4\kappa+a)^{m}a^{s-m}\leq\kappa(4\kappa+a)^{s}.$

Here, $a$ is any non-negative real number.
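The bound in Eq. 205 is easy to confirm by brute force for the values $a=3$ and $a=5$ used in Eq. 204. The sketch below uses exact integer arithmetic, and the tested ranges of $\kappa$ and $s$ are arbitrary illustrative choices.

```python
def lhs(kappa: int, a: int, s: int) -> int:
    """Left-hand side of Eq. 205: sum over l = 0, ..., kappa - 1 of (4l + a)^s."""
    return sum((4 * l + a) ** s for l in range(kappa))


def rhs(kappa: int, a: int, s: int) -> int:
    """Right-hand side of Eq. 205: kappa * (4 kappa + a)^s."""
    return kappa * (4 * kappa + a) ** s


violations = [(kappa, a, s)
              for kappa in range(1, 11)
              for a in (3, 5)
              for s in range(0, 9)
              if lhs(kappa, a, s) > rhs(kappa, a, s)]
print(violations)  # an empty list means the bound held on every tested case
```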

Since the $1$ -norm can be estimated by the same logic, i.e., by counting the number of nested commutators and bounding the $1$ -norm of each nested commutator, the derivation for the $1$ -norm follows by replacing $\Lambda$ with $\Lambda_{1}$ .

∎

Now, we prove the $1$ -norm bound $\mu^{(nc)}_{K}(x)$ of $\tilde{V}_{K}(x)$ . From Proposition 6 we have

	$\displaystyle\|\tilde{V}_{K}(x)\|_{1}=\sqrt{1+\left(\sum_{s=K}^{2K}\|C_{s}\|_{1}\frac{x^{s+1}}{(s+1)!}\right)^{2}}$
	$\displaystyle\leq 1+\frac{1}{2}\left(\sum_{s=K}^{2K}\|C_{s}\|_{1}\frac{x^{s+1}}{(s+1)!}\right)^{2}$
	$\displaystyle\leq 1+\frac{1}{2}n^{2}\kappa^{2}\left(\sum_{s=K}^{2K}\beta^{s}\frac{(\Lambda_{1}x)^{s+1}}{(s+1)!}\right)^{2}$		(206)
	$\displaystyle\leq 1+\frac{1}{2}n^{2}\kappa^{2}\left((K+1)\beta^{K+1}\frac{(\Lambda_{1}x)^{K+1}}{(K+1)!}\right)^{2}$
	$\displaystyle\leq 1+\frac{n^{2}\kappa^{2}\beta^{2(K+1)}}{2(K!)^{2}}(\Lambda_{1}x)^{2K+2}$
	$\displaystyle\leq\exp\left(\frac{n^{2}\kappa^{2}\beta^{2(K+1)}}{2(K!)^{2}}(\Lambda_{1}x)^{2K+2}\right).$

Here, we set $\beta:=2(4\kappa+5)$ . In the third line, we use Proposition 12. In the fourth line, we use the assumption that $\beta\Lambda_{1}x\leq(K+2)$ .

Then, we prove the distance bound $\varepsilon_{K}^{(nc)}(x)=\|F_{K,\mathcal{res}}\|$ . From Proposition 6 we know that we only need to bound $\|J_{K,L}(\tau)\|$ , $\|J_{K,res,2K}(\tau)\|$ , and $\|M_{K}(\tau)\|$ based on Eq. 114.

We start from $\|J_{K,L}(\tau)\|$ . From Proposition 7 we have

	$\displaystyle\|J_{K,L}(\tau)\|$	$\displaystyle\leq\sum_{s=K}^{2K}\|C_{s}\|\frac{\tau^{s}}{s!}$		(207)
		$\displaystyle\leq n\kappa\sum_{s=K}^{2K}\beta^{s}\Lambda^{s+1}\frac{\tau^{s}}{s!}.$		(207)

Then, we bound $\|J_{K,res,2K}(\tau)\|$ and $\|M_{K}(\tau)\|$ . From Eq. 114 and Proposition 12 we have

$\displaystyle\|J_{K,res,2K}(\tau)\|$	$\displaystyle\leq\int_{0}^{\tau}d\tau_{1}\frac{(\tau-\tau_{1})^{2K}}{(2K)!}\|J_{K}^{(2K+1)}(\tau_{1})\|$	(208)
	$\displaystyle\leq n\kappa\frac{(\beta\tau)^{2K+1}}{(2K+1)!}\Lambda^{2K+2},$
$\displaystyle\|M_{K}(\tau)\|$	$\displaystyle\leq\int_{0}^{\tau}d\tau_{1}\int_{0}^{\tau_{1}}d\tau_{2}\frac{(\tau_{1}-\tau_{2})^{(K-1)}}{(K-1)!}\|J_{K}^{(K)}(\tau_{2})\|$
	$\displaystyle\leq n\kappa\frac{\beta^{K}\tau^{K+1}}{(K+1)!}\Lambda^{K+1}.$

Based on Eq. 207, Eq. 208 and Proposition 6 we have

	$\displaystyle\|F_{K,\mathcal{res}}(x)\|\leq\int_{0}^{x}d\tau\left(\|M_{K}(\tau)\|\|J_{L}(\tau)\|+\|J_{K,res,2K}(\tau)\|\right)$
	$\displaystyle\leq\int_{0}^{x}d\tau(n\kappa)^{2}\sum_{s=K}^{2K}\frac{\beta^{s+K}\Lambda^{s+K+2}\tau^{s+K+1}}{(K+1)!s!}$
	$\displaystyle\quad+(n\kappa)\frac{\beta^{2K+1}\Lambda^{2K+2}\tau^{2K+1}}{(2K+1)!}$		(209)
	$\displaystyle=(n\kappa)^{2}\sum_{s=K}^{2K}\frac{\beta^{s+K}(\Lambda x)^{s+K+2}}{(K+1)!s!(s+K+2)}+(n\kappa)\frac{\beta^{2K+1}(\Lambda x)^{2K+2}}{(2K+2)!}$
	$\displaystyle\leq(n\kappa)^{2}(K+1)\frac{\beta^{2K}(\Lambda x)^{2K+2}}{(2K+2)K!(K+1)!}+(n\kappa)\frac{\beta^{2K+1}(\Lambda x)^{2K+2}}{(2K+2)!}$
	$\displaystyle\leq(n\kappa)^{2}\frac{\beta^{2K+1}}{K!(K+1)!}(\Lambda x)^{2K+2}.$

In the fourth line, we assume $\beta\Lambda x\leq K+1$ . In the fifth line, we use the fact that $n\kappa>1$ .

To summarize, from Eq. 206 and Eq. 209, we have proven that,

$\displaystyle\mu_{K}^{(nc)}$	$\displaystyle=\|\tilde{V}_{K}(x)\|_{1}\leq\exp\left(\frac{n^{2}\kappa^{2}\beta^{2(K+1)}}{2(K!)^{2}}(\Lambda_{1}x)^{2K+2}\right),$	(210)
$\displaystyle\varepsilon_{K}^{(nc)}$	$\displaystyle=\|\tilde{V}_{K}(x)-V_{K}(x)\|=\|F_{K,\mathcal{res}}(x)\|$
	$\displaystyle\leq(n\kappa)^{2}\frac{\beta^{2K+1}}{K!(K+1)!}(\Lambda x)^{2K+2}.$

Here, $\beta:=2(4\kappa+5)$ . We assume that $\beta x\leq\min\{\frac{K+1}{\Lambda},\frac{K+2}{\Lambda_{1}}\}$ , so that both step-size assumptions used above hold. This finishes the proof of Proposition 9.
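The two bounds of Eq. 210 can be evaluated directly once $n$ , $\kappa$ , $K$ , $\Lambda$ , $\Lambda_{1}$ , and $x$ are fixed. The sketch below returns $\ln\mu_{K}^{(nc)}$ (to avoid overflow) together with $\varepsilon_{K}^{(nc)}$ , and rejects step sizes outside the two ranges $\beta\Lambda x\leq K+1$ and $\beta\Lambda_{1}x\leq K+2$ assumed in the proof. The function name and the sample parameters are hypothetical.

```python
import math


def nc_bounds(n: int, kappa: int, K: int, lam: float, lam1: float, x: float):
    """Upper bounds of Eq. 210: returns (ln mu_K, eps_K) for step size x."""
    beta = 2 * (4 * kappa + 5)
    if beta * lam * x > K + 1 or beta * lam1 * x > K + 2:
        raise ValueError("step size outside the range assumed in the proof")
    log_mu = ((n * kappa) ** 2 * beta ** (2 * (K + 1))
              / (2 * math.factorial(K) ** 2) * (lam1 * x) ** (2 * K + 2))
    eps = ((n * kappa) ** 2 * beta ** (2 * K + 1)
           / (math.factorial(K) * math.factorial(K + 1))
           * (lam * x) ** (2 * K + 2))
    return log_mu, eps


# Second-order formula (K = 2, kappa = 2) on n = 10 sites.
log_mu_1, eps_1 = nc_bounds(10, 2, 2, 1.0, 1.0, 0.01)
log_mu_2, eps_2 = nc_bounds(10, 2, 2, 1.0, 1.0, 0.005)
print(log_mu_1, eps_1, eps_1 / eps_2)
```

Halving the step size reduces both quantities by a factor of $2^{2K+2}$ , which is $64$ for $K=2$ .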

Appendix D EFFICIENT SAMPLING FOR THE HIGHER-ORDER NESTED-COMMUTATOR COMPENSATION ALGORITHM

Following the analysis in Appendix C, we now design a general sampling method for the general $K$ th-order Trotter remainder $\tilde{V}_{K}^{(nc)}(x)$ . We consider the general $K$ th-order Trotter formula in the canonical form in Eq. 180.

The truncated Trotter remainder can be expanded as

	$\displaystyle\tilde{V}_{K}^{(nc)}(x)$	$\displaystyle=I+\sum_{s=K+1}^{2K+1}F_{K,s},$		(211)
	$\displaystyle F_{K,s}$	$\displaystyle=C_{s-1}\frac{x^{s}}{s!}.$		(211)

Based on Eq. 194, we have shown that $C_{s}$ can be written as

\displaystyle C_{s}=\sum_{j=\kappa}^{1}C_{s,j}^{(A^{\prime})}+C_{s,j}^{(B^{\prime})},

(212)

where $A^{\prime}:=-iA,B^{\prime}:=-iB$ ,

$\displaystyle C_{s,j}^{(A^{\prime})}$	$\displaystyle=c_{j+1}\sum_{\begin{subarray}{c}m_{j},n_{j};...;m_{\kappa},n_{\kappa}\\ m_{j},n_{j}\neq 0;\sum_{l=j}^{\kappa}m_{l}+n_{l}=s\end{subarray}}\binom{s}{m_{j},n_{j},...,m_{\kappa},n_{\kappa}}\cdot$	(213)
	$\displaystyle\quad\quad\prod_{l=j}^{\kappa}(b_{l}^{m_{l}}a_{l}^{n_{l}})\left(\mathrm{ad}_{B^{\prime}}^{m_{l}}\mathrm{ad}_{A^{\prime}}^{n_{l}}\right)A^{\prime},$
$\displaystyle C_{s,j}^{(B^{\prime})}$	$\displaystyle=d_{j}\sum_{\begin{subarray}{c}m_{j},n_{j};...;m_{\kappa},n_{\kappa}\\ m_{j},n_{j}\neq 0;\sum_{l=j}^{\kappa}m_{l}+n_{l}=s\end{subarray}}\binom{s}{m_{j},n_{j},...,m_{\kappa},n_{\kappa}}\cdot$
	$\displaystyle\quad\quad\prod_{l=j}^{\kappa}(b_{l}^{m_{l}}a_{l}^{n_{l}})\left(\mathrm{ad}_{B^{\prime}}^{m_{l}}\mathrm{ad}_{A^{\prime}}^{n_{l}}\right)B^{\prime}.$

The values $c_{s}$ and $d_{s}$ are defined in Eq. 190.

We now construct an efficient random sampling of $C_{s,j}^{(A^{\prime})}$ and $C_{s,j}^{(B^{\prime})}$ in Eq. 213 based on LCU formulas with the $1$ -norms

	$\displaystyle\mu_{s,j}^{(A^{\prime})}$	$\displaystyle=2^{s}\Lambda_{1}^{s+1}\frac{n}{2}c_{j+1}\chi_{A^{\prime},j}^{s},$		(214)
	$\displaystyle\mu_{s,j}^{(B^{\prime})}$	$\displaystyle=2^{s}\Lambda_{1}^{s+1}\frac{n}{2}d_{j}\chi_{B^{\prime},j}^{s},$		(214)

respectively. Here,

	$\displaystyle\chi_{A^{\prime},j}$	$\displaystyle=\sum_{l=j}^{\kappa}b_{l}(2(l-j)+2)+a_{l}(2(l-j)+1),$		(215)
	$\displaystyle\chi_{B^{\prime},j}$	$\displaystyle=\sum_{l=j}^{\kappa}b_{l}(2(l-j)+3)+a_{l}(2(l-j)+2).$		(215)
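As a worked instance of Eq. 214 and Eq. 215, the sketch below evaluates $\chi_{A^{\prime},j}$ , $\chi_{B^{\prime},j}$ , and the resulting $1$ -norms for the second-order formula with $a_{1}=a_{2}=\frac{1}{2}$ , $b_{1}=1$ , $b_{2}=0$ . The helper names, the lattice size $n=10$ , and the choice $\Lambda_{1}=1$ are illustrative assumptions.

```python
from fractions import Fraction


def chi(a, b, j, term_in_A=True):
    """chi_{A',j} or chi_{B',j} of Eq. 215; a and b are 0-based lists of
    stage lengths, while j is the 1-based stage index."""
    off = 0 if term_in_A else 1
    kappa = len(a)
    return sum(b[l - 1] * (2 * (l - j) + 2 + off)
               + a[l - 1] * (2 * (l - j) + 1 + off)
               for l in range(j, kappa + 1))


def one_norms(a, b, n, lam1, s):
    """List of (mu_{s,j}^{(A')}, mu_{s,j}^{(B')}) from Eq. 214, j = 1..kappa."""
    kappa = len(a)
    pref = 2 ** s * lam1 ** (s + 1) * Fraction(n, 2)
    result = []
    for j in range(1, kappa + 1):
        c_next = sum(a[j:])   # c_{j+1}, which is 0 when j = kappa
        d_j = sum(b[j - 1:])  # d_j
        result.append((pref * c_next * chi(a, b, j, True) ** s,
                       pref * d_j * chi(a, b, j, False) ** s))
    return result


a = [Fraction(1, 2), Fraction(1, 2)]
b = [Fraction(1), Fraction(0)]
norms = one_norms(a, b, n=10, lam1=1, s=2)
total = sum(mu_a + mu_b for mu_a, mu_b in norms)
general = 10 * 2 * (4 * 2 + 5) ** 2 * 2 ** 2  # Eq. 198 with Lambda_1 = 1
print(norms, total, general)
```

For this formula $\chi_{A^{\prime},1}=4$ and $\chi_{B^{\prime},1}=6$ , so the summed $1$ -norm at $s=2$ is well below the general bound of Eq. 198.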

There are $\frac{n}{2}$ summands $\{H_{q,q+1}\}$ in $A^{\prime}$ . We now focus on a generic summand $H_{q,q+1}$ in $A^{\prime}$ and check the action of the adjoint operators on it,

		$\displaystyle\quad\mathrm{ad}_{s,j;\vec{m},\vec{n}}^{(A^{\prime})}H_{q,q+1}=$		(216)
		$\displaystyle(-i)\mathrm{ad}_{B^{\prime}}^{m_{\kappa}}\mathrm{ad}_{A^{\prime}}^{n_{\kappa}}\cdots\mathrm{ad}_{B^{\prime}}^{m_{j+1}}\mathrm{ad}_{A^{\prime}}^{n_{j+1}}\mathrm{ad}_{B^{\prime}}^{m_{j}}\mathrm{ad}_{A^{\prime}}^{n_{j}}H_{q,q+1}.$		(216)

Recall that $m_{\kappa}+n_{\kappa}+...+m_{j+1}+n_{j+1}+m_{j}+n_{j}=s$ .

We follow the construction in Proposition 12, where we first decompose the adjoint operator in Eq. 216 into elementary operators of the form

\displaystyle\mathrm{ad}_{q_{1:s}}^{(s)}H_{q,q+1}=(-i)^{s+1}\mathrm{ad}_{H_{q_{s},q_{s}+1}}\cdots\mathrm{ad}_{H_{q_{1},q_{1}+1}}H_{q,q+1}.

(217)

For each elementary nested commutator, the $1$ -norm is

\Lambda_{1}^{(s)}=(2\Lambda_{1})^{s}\Lambda_{1}.

(218)

Here, we have assumed that all the summands $\{H_{q,q+1}\}$ in $A$ or $B$ have been padded similar to Eq. 18 so that their $1$ -norms are all $\Lambda_{1}$ .

Now, we count the number of elementary nested commutators with the form of Eq. 217 in the adjoint operator $\mathrm{ad}_{s,j}^{(A^{\prime})}$ in Eq. 216. In the proof of Proposition 12, we showed that the number of possible elementary nested commutators is upper bounded by

		$\displaystyle\quad N(\mathrm{ad}_{s,j;\vec{m},\vec{n}}^{(A^{\prime})}H_{q,q+1})$		(219)
		$\displaystyle=(2(\kappa-j)+2)^{m_{\kappa}}(2(\kappa-j)+1)^{n_{\kappa}}\cdots 4^{m_{j+1}}3^{n_{j+1}}2^{m_{j}}1^{n_{j}}$
		$\displaystyle=\prod_{l=j}^{\kappa}(2(l-j)+2)^{m_{l}}(2(l-j)+1)^{n_{l}}.$

We now discuss how to achieve this bound by “padding” zero-valued elementary nested commutators into the decomposition of $\mathrm{ad}_{s,j}^{(A^{\prime})}$ . We notice that, after the sequential action of $\mathrm{ad}_{A^{\prime}}^{n_{j}},\mathrm{ad}_{B^{\prime}}^{m_{j}},\mathrm{ad}_{A^{\prime}}^{n_{j+1}},\mathrm{ad}_{B^{\prime}}^{m_{j+1}},...$ , the support of the resulting operator, i.e., the indices of the qubits on which it acts nontrivially, is given by the “light cone” of $q:(q+1),(q-1):(q+2),(q-2):(q+3),(q-3):(q+4),...$ , as illustrated in Fig. 6. We keep track of the largest possible support of the resulting adjoint operators and pad the LCU formula in the following situations:

1.

(Padding small summands) When the $1$ -norm of the summand $H_{q_{l},q_{l}+1}$ is smaller than $\Lambda_{1}$ , we add extra $\pm I$ terms to the LCU formula of $H_{q_{l},q_{l}+1}$ to raise its $1$ -norm to $\Lambda_{1}$ , similar to Eq. 18.
2.

(Padding $0$ -valued commutators) Many nested commutators with the form of Eq. 217 may be 0 due to the commutation relationship, for example $\mathrm{ad}_{H_{q+2,q+3}}\mathrm{ad}_{H_{q-1,q}}H_{q,q+1}=0$ , since the support of $\mathrm{ad}_{H_{q-1,q}}H_{q,q+1}$ is on qubits $q-1,q$ and $q+1$ which commutes with $H_{q+2,q+3}$ . However, we keep all these $0$ -valued terms as long as the support of the nested commutator is in the “light-cone” range of the operator.
3.
(Padding boundary terms) For the summands $H_{q,q+1}$ which are close to the boundary, after a few action of the adjoint operators, the “light cone” will touch the boundary of the system. In this case, we introduce virtual padding ancillary qubits and extra padding nested commutators on it:
1. (a)
  
  For the summand $H_{0}$ or $H_{n}$ which own only one-qubit support on the boundary, we redefine the Pauli terms in the LCU of them. For example, if $H_{0}=\Lambda_{1}\sum_{\omega}p_{\omega}P_{0}^{(\omega)}$ , we then redefined it to $H_{-1,0}=\sum_{\omega}p_{\omega}\left(I_{-1}\otimes P_{0}\right)^{(\omega)}$ .
2. (b)
  
  We define the “virtual” summand $H_{q_{l},q_{l}+1}$ where qubits $q_{l},q_{l}+1$ are all virtual qubits as
  
  $H_{q_{l},q_{l}+1}:=\frac{\Lambda_{1}}{2}\left(I_{q_{l},q_{l}+1}+(-I_{q_{l},q_{l}+1})\right).$ (220)
Since all the operations on the virtual qubits are $\pm I$ , we do not need to introduce these qubits in the real implementation.
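The padding of small summands can be illustrated with a short sketch. It assumes an LCU formula is stored as a list of (coefficient, Pauli label) pairs, and that the padding adds a $+I$ and a $-I$ term of equal weight so that the operator is unchanged; the function name and data layout are hypothetical.

```python
def pad_to_norm(terms, target_norm, tol=1e-12):
    """Pad an LCU formula with +I and -I terms up to a target 1-norm.

    terms: list of (coefficient, label) pairs; the 1-norm is sum(|coefficient|).
    The two added terms cancel exactly, so the represented operator is unchanged.
    """
    current = sum(abs(c) for c, _ in terms)
    if current > target_norm + tol:
        raise ValueError("1-norm already exceeds the target")
    delta = target_norm - current
    if delta <= tol:
        return list(terms)
    return list(terms) + [(delta / 2, "I"), (-delta / 2, "I")]


# A summand with 1-norm 0.5 padded to Lambda_1 = 1.0 (illustrative numbers).
padded = pad_to_norm([(0.3, "XX"), (0.2, "ZZ")], 1.0)
```

With an empty list of terms the sketch reproduces the virtual summand of Eq. 220, i.e., $\frac{\Lambda_{1}}{2}(I+(-I))$.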

After this padding, we have now constructed the LCU formula of $\mathrm{ad}_{s,j}^{(A^{\prime})}$ with the following form:

		$\displaystyle\quad\mathrm{ad}_{s,j;\vec{m},\vec{n}}^{(A^{\prime})}H_{q,q+1}$		(221)
		$\displaystyle=N(\mathrm{ad}_{s,j}^{(A^{\prime})}H_{q,q+1})\cdot$
		$\displaystyle\quad\quad\sum_{q_{1},...,q_{s}}(-i)^{s+1}\mathrm{ad}_{H_{q_{s},q_{s}+1}}\cdots\mathrm{ad}_{H_{q_{1},q_{1}+1}}H_{q,q+1}$
		$\displaystyle=:N(\mathrm{ad}_{s,j}^{(A^{\prime})}H_{q,q+1})\sum_{q_{1},...,q_{s}}\mathrm{ad}_{q_{1:s}}^{(s)}H_{q,q+1}$

where $q_{1},...,q_{s}$ are indices in the light-cone region. Recall that all the elementary operators with the form of Eq. 217 have the same $1$ -norm of $(2\Lambda_{1})^{s}\Lambda_{1}$ , independent of the qubit index $q$ and the rank numbers $\{m_{l},n_{l}\}_{l=j}^{\kappa}$ . Therefore, the $1$ -norm of $\mathrm{ad}_{s,j}^{(A^{\prime})}H_{q,q+1}$ is

$\displaystyle\mu\{\mathrm{ad}_{s,j;\vec{m},\vec{n}}^{(A^{\prime})}H_{q,q+1}\}=N\left(\mathrm{ad}_{s,j;\vec{m},\vec{n}}^{(A^{\prime})}\right)\,(2\Lambda_{1})^{s}\Lambda_{1},$ (222)

which is independent of the qubit index $q$ .

Now, based on Eq. 221, we can write $C_{s,j}^{(A^{\prime})}$ in Eq. 213 as

		$\displaystyle\frac{C_{s,j}^{(A^{\prime})}}{c_{j+1}}=\sum_{\begin{subarray}{c}1\leq q\leq n\\ q:\text{odd}\end{subarray}}\sum_{\begin{subarray}{c}m_{j},n_{j};...;m_{\kappa},n_{\kappa}\\ \sum_{l=j}^{\kappa}m_{l}+n_{l}=s\end{subarray}}\binom{s}{m_{j},n_{j},...,m_{\kappa},n_{\kappa}}\cdot$		(223)
		$\displaystyle\quad\quad\prod_{l=j}^{\kappa}(b_{l}^{m_{l}}a_{l}^{n_{l}})\mathrm{ad}_{s,j;\vec{m},\vec{n}}^{(A^{\prime})}H_{q,q+1}$
		$\displaystyle=\sum_{\begin{subarray}{c}1\leq q\leq n\\ q:\text{odd}\end{subarray}}\sum_{\begin{subarray}{c}m_{j},n_{j};...;m_{\kappa},n_{\kappa}\\ \sum_{l=j}^{\kappa}m_{l}+n_{l}=s\end{subarray}}\binom{s}{m_{j},n_{j},...,m_{\kappa},n_{\kappa}}\cdot$
		$\displaystyle\quad\prod_{l=j}^{\kappa}(b_{l}^{m_{l}}a_{l}^{n_{l}})N(\mathrm{ad}_{s,j}^{(A^{\prime})}H_{q,q+1})\sum_{q_{1},...,q_{s}}\mathrm{ad}_{q_{1:s}}^{(s)}H_{q,q+1}$
		$\displaystyle=\chi_{A^{\prime},j}^{s}\mathrm{Mul}\left(\{m_{j},n_{j},...,m_{\kappa},n_{\kappa}\};\{\vec{p}_{A^{\prime},b},\vec{p}_{A^{\prime},a}\};s\right)\cdot$
		$\displaystyle\quad\sum_{\begin{subarray}{c}1\leq q\leq n\\ q:\text{odd}\end{subarray}}\sum_{q_{1},...,q_{s}}\mathrm{ad}_{q_{1:s}}^{(s)}H_{q,q+1},$

where

$\displaystyle p_{A^{\prime},b;l}=\frac{b_{l}(2(l-j)+2)}{\chi_{A^{\prime},j}},\quad p_{A^{\prime},a;l}=\frac{a_{l}(2(l-j)+1)}{\chi_{A^{\prime},j}},$ (224)

for $l=j,j+1,...,\kappa$ , and $\mathrm{Mul}(v;\vec{p};s)$ denotes a multinomial distribution in which we sample the variable value $v$ according to the probability distribution $\vec{p}$ for $s$ times. Based on Eq. 222 and Eq. 223, we can easily check that the $1$ -norm of $C_{s,j}^{(A^{\prime})}$ is given by $\mu_{s,j}^{(A^{\prime})}$ in Eq. 214. More importantly, since the $1$ -norm of $\mathrm{ad}_{q_{1:s}}^{(s)}H_{q,q+1}$ is independent of $q$ and $q_{1},...,q_{s}$ , the sampling of $q$ from all odd qubits and of $q_{1},...,q_{s}$ from the light-cone region follows a uniform distribution.
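The multinomial sampling of the powers $\vec{m}$ and $\vec{n}$ can be sketched as $s$ independent categorical draws. The sketch below is illustrative only: it assumes each slot is weighted by the magnitude of its coefficient times the per-stage counting factor of Eq. 219, and that $\chi_{A^{\prime},j}$ is the corresponding normalization (Eq. 214 is not reproduced here). The function name and argument layout are hypothetical.

```python
import random


def sample_powers(a, b, s, rng):
    """Draw (m, n) with sum(m) + sum(n) == s from a multinomial distribution.

    a[i], b[i] are the coefficients a_l, b_l for stage l = j + i.
    Assumed weights: |b_l| * (2i + 2) for m_l and |a_l| * (2i + 1) for n_l,
    i.e. the per-stage counting factors of Eq. (219).
    """
    weights = []
    for i, (a_i, b_i) in enumerate(zip(a, b)):
        weights.append(abs(b_i) * (2 * i + 2))  # slot for m_l
        weights.append(abs(a_i) * (2 * i + 1))  # slot for n_l
    draws = rng.choices(range(len(weights)), weights=weights, k=s)
    m = [0] * len(a)
    n = [0] * len(a)
    for d in draws:
        if d % 2 == 0:
            m[d // 2] += 1
        else:
            n[d // 2] += 1
    return m, n


rng = random.Random(0)  # seeded only to make the illustration reproducible
m, n = sample_powers([0.5, 0.5], [0.5, 0.5], 6, rng)
```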

Similarly, we can decompose and pad extra terms in $C_{s,j}^{(B^{\prime})}$ to construct LCU with the following form:

$\displaystyle\frac{C_{s,j}^{(B^{\prime})}}{d_{j}}$	$\displaystyle=\chi_{B^{\prime},j}^{s}\mathrm{Mul}\left(\{m_{j},n_{j},...,m_{\kappa},n_{\kappa}\};\{\vec{p}_{B^{\prime},b},\vec{p}_{B^{\prime},a}\};s\right)\cdot$	(225)
	$\displaystyle\quad\sum_{\begin{subarray}{c}1\leq q\leq n\\ q:\text{even}\end{subarray}}\sum_{q_{1},...,q_{s}}\mathrm{ad}_{q_{1:s}}^{(s)}H_{q,q+1},$
$\displaystyle p_{B^{\prime},b;l}$	$\displaystyle=\frac{b_{l}(2(l-j)+3)}{\chi_{B^{\prime},j}},\quad p_{B^{\prime},a;l}=\frac{a_{l}(2(l-j)+2)}{\chi_{B^{\prime},j}},$

for $l=j,j+1,...,\kappa$ , so that the $1$ -norm of $C_{s,j}^{(B^{\prime})}$ is given by $\mu_{s,j}^{(B^{\prime})}$ in Eq. 214. Again, the $1$ -norm of the operator $\sum_{q_{1},...,q_{s}}\mathrm{ad}_{q_{1:s}}^{(s)}H_{q,q+1}$ is $(2\Lambda_{1})^{s}\Lambda_{1}$ , independent of $q$ and $q_{1},...,q_{s}$ . Therefore, the $1$ -norm of $C_{s}$ in Eq. 212 is

$\displaystyle\mu\{C_{s}\}=\sum_{j=\kappa}^{1}\left(\mu_{s,j}^{(A^{\prime})}+\mu_{s,j}^{(B^{\prime})}\right),$ (226)

and the $1$ -norm of $F_{K,s}$ is

$\displaystyle\mu\{F_{K,s}\}=\mu\{C_{s-1}\}\frac{x^{s}}{s!}.$ (227)

Based on the above analysis, we summarize the overall sampling procedure in Fig. 12 and Algorithm 2. We consider a multistage sampling: first, we sample the expansion order $s$ based on $\mu\{F_{K,s}\}$ ; second, for a given expansion order $s$ , we sample the terms $C_{s-1,j}^{(A^{\prime})}$ or $C_{s-1,j}^{(B^{\prime})}$ in the nested commutator $C_{s-1}$ based on Eq. 212 and Eq. 214; third, we sample the power of the adjoint operators $\vec{m}$ and $\vec{n}$ based on the multinomial distribution in Eq. 223 and Eq. 225; fourth, we uniformly sample the specific Hamiltonian summands $H_{q,q+1}$ and uniformly sample the adjoint Hamiltonian summands $\{H_{q_{1},q_{1}+1},...,H_{q_{s-1},q_{s-1}+1}\}$ , each from the light-cone region; finally, if there are multiple terms in the Hamiltonian summands, we then uniformly sample the specific Pauli terms in the summands.

Algorithm 2 Nested-commutator compensation algorithm: sampling of Pauli operators

1: Input: An $n$ -qubit Hamiltonian $H$ ; unit evolution time $0<x<1$ for each $K$ th-order Trotter segment; the canonical form of the $K$ th-order Trotter method with coefficients $\{a_{j},b_{j}\}$ for $j=1,...,\kappa$ .

2: Output: Sampling of a Pauli operator $P^{(\omega_{j};...;\omega_{j_{s}},b_{s})}$ from the Trotter remainder $\tilde{V}_{K}^{(nc)}(x)$ .

3: Calculate the $1$ -norm of the nested commutator $C_{s}$ for $s=K,...,2K$ and its $j$ th-stage components $C_{s,j}^{(A^{\prime})}$ and $C_{s,j}^{(B^{\prime})}$ for $j=1,...,\kappa$ based on Eq. 214 and Eq. 226.

4: Sample the expansion order $s\in\{K,...,2K\}$ based on the $1$ -norm $\mu\{F_{K,s+1}\}=\mu\{C_{s}\}\frac{x^{s+1}}{(s+1)!}$ . This determines the sampled nested commutator $C_{s}$ .

5: From $C_{s}$ , sample the $j$ th-stage component $C_{s,j}^{(A^{\prime})}$ or $C_{s,j}^{(B^{\prime})}$ based on the $1$ -norms $\mu_{s,j}^{(A^{\prime})}$ and $\mu_{s,j}^{(B^{\prime})}$ in Eq. 214.

6: From $C_{s,j}^{(A^{\prime})}$ (or $C_{s,j}^{(B^{\prime})}$ ) with a given $j$ , sample the powers of the sequential adjoint operators $m_{j},n_{j};...;m_{\kappa},n_{\kappa}$ based on the multinomial distribution $\mathrm{Mul}(\{\vec{m},\vec{n}\};\{\vec{p}_{A^{\prime}(B^{\prime}),b},\vec{p}_{A^{\prime}(B^{\prime}),a}\};s)$ defined in Eq. 223 and Eq. 225.

7: Sample the index $q$ of the starting Hamiltonian summand uniformly from $A^{\prime}$ ( $B^{\prime}$ ). Sample the indices $q_{1},...,q_{s}$ of the subsequent adjoint Hamiltonian summands, each from the “light-cone” region of that location.

8: For the Hamiltonian summands indexed by $q$ and $\{q_{1},...,q_{s}\}$ , sample the Pauli operators $P^{(\omega_{j})}$ and $\{P^{(\omega_{j_{1}})},...,P^{(\omega_{j_{s}})}\}$ independently based on the padded LCU formula for each Hamiltonian summand in Eq. 18. For each adjoint location $q_{1},...,q_{s}$ , uniformly and independently sample $b_{1},...,b_{s}\in\{0,1\}$ , which indicates the multiplication order of the Pauli operators.

9: Set $P:=P^{(\omega_{j})}$

10: for $l=1~\textbf{to}~s$ do $\triangleright$ Calculate the output Pauli operator

11:   if $b_{l}=0$ then

12:     Set $P:=P^{(\omega_{j_{l}})}\cdot P$

13:   else

14:     Set $P:=-P\cdot P^{(\omega_{j_{l}})}$

15:   end if

16: end for

17: Output $P$ as the sampled Pauli operator.
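The final loop of Algorithm 2 (steps 9 to 17) composes the sampled Pauli operators by left or right multiplication. The following is a minimal sketch of that step alone, assuming Pauli operators are stored as strings over `I`, `X`, `Y`, `Z` and that the accumulated phase is tracked explicitly; the helper names are hypothetical and the overall factor $(-i)^{s+1}$ from Eq. 221 is not included.

```python
# Single-qubit products P * Q = phase * R for distinct non-identity Paulis.
_TABLE = {("X", "Y"): (1j, "Z"), ("Y", "Z"): (1j, "X"), ("Z", "X"): (1j, "Y"),
          ("Y", "X"): (-1j, "Z"), ("Z", "Y"): (-1j, "X"), ("X", "Z"): (-1j, "Y")}


def mul_pauli_strings(p, q):
    """Return (phase, string) such that p * q = phase * string."""
    phase, out = 1, []
    for a, b in zip(p, q):
        if a == "I":
            ph, c = 1, b
        elif b == "I":
            ph, c = 1, a
        elif a == b:
            ph, c = 1, "I"
        else:
            ph, c = _TABLE[(a, b)]
        phase *= ph
        out.append(c)
    return phase, "".join(out)


def compose(start, adjoints, orders):
    """Steps 9-17: b = 0 gives P := P_l * P, b = 1 gives P := -P * P_l."""
    phase, current = 1, start
    for pauli, b in zip(adjoints, orders):
        if b == 0:
            ph, current = mul_pauli_strings(pauli, current)
        else:
            ph, current = mul_pauli_strings(current, pauli)
            ph = -ph
        phase *= ph
    return phase, current


# The two orderings are the two terms of a commutator [X_0, Z_0].
print(compose("ZI", ["XI"], [0]), compose("ZI", ["XI"], [1]))
```

For anticommuting Paulis the two orderings give the same signed operator, while for commuting Paulis they give opposite signs and cancel on average, which is how the $0$ -valued padded commutators contribute nothing in expectation.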

Now, we analyze the space and time cost of the whole sampling procedure in Algorithm 2. The calculation of the $1$ -norms of $C_{s}$ and its components $C_{s,j}^{(A^{\prime})}$ and $C_{s,j}^{(B^{\prime})}$ requires $\mathcal{O}(K\kappa)$ spacetime resources. With a parallel calculation, we need $\mathcal{O}(K\kappa)$ spatial resources and $\mathcal{O}(1)$ time resources. We store all the above $1$ -norm coefficients in a memory of size $\mathcal{O}(K\kappa)$ . The sampling of $F_{K,s}$ from $K$ discrete values requires $\mathcal{O}(\log{K})$ steps Bringmann and Panagiotou (2012). For a given nested commutator $C_{s-1}$ , the sampling of $C_{s,j}^{(A^{\prime})}$ and $C_{s,j}^{(B^{\prime})}$ from $\kappa$ discrete values requires $\mathcal{O}(\log{\kappa})$ steps. The multinomial sampling of the powers of the adjoint operators $\vec{m}$ and $\vec{n}$ requires $\mathcal{O}(s\log{\kappa})$ steps. Finally, the uniform sampling of the Hamiltonian summands from the light-cone region requires $\mathcal{O}(s\log{n})$ steps. To summarize, the space and time costs of Algorithm 2 are $\mathcal{O}(K\kappa)$ and $\mathcal{O}(K(\log{\kappa}+\log{n}))$ , respectively.
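The logarithmic cost of sampling from a stored discrete distribution can be illustrated with a cumulative-weight table and binary search. This is a generic sketch rather than the sampler of the cited reference; the $1$ -norm values passed in are placeholders and the function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from math import factorial
import random


def order_weights(mu_C, K, x):
    """Weights mu{F_{K,s+1}} = mu{C_s} x^{s+1}/(s+1)! for s = K, ..., 2K."""
    return {s: mu_C[s] * x ** (s + 1) / factorial(s + 1)
            for s in range(K, 2 * K + 1)}


def make_sampler(weights):
    """Precompute cumulative weights once; each draw is a binary search."""
    keys = list(weights)
    cum = list(accumulate(weights[k] for k in keys))

    def sample(rng):
        u = rng.random() * cum[-1]
        # bisect_right skips zero-weight entries; min() guards the upper edge.
        return keys[min(bisect_right(cum, u), len(keys) - 1)]

    return sample


# Placeholder 1-norms of C_s for K = 2 (illustrative values only).
weights = order_weights({2: 4.0, 3: 10.0, 4: 30.0}, K=2, x=0.5)
draw_order = make_sampler(weights)
s = draw_order(random.Random(0))
```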

Appendix E NESTED COMMUTATOR COMPENSATION FOR TROTTER FORMULAS OF GENERAL HAMILTONIANS

We now extend the methods used to analyze lattice-model Hamiltonians to a general Hamiltonian. Consider an $L$ -sparse Hamiltonian of the form $H=\sum_{l=1}^{L}H_{l}$ . The $K$ th-order Trotter formula ( $K=1$ or $2k$ , $k\in\mathbb{N}_{+}$ ) for $U(x)=e^{-ixH}$ can be written as

$\displaystyle S_{K}(x)=\prod_{j=1}^{\kappa}\prod_{l=1}^{L}e^{-ixa_{(j,l)}H_{\pi_{j}(l)}}.$ (228)

Here, $\kappa$ is the number of stages in the Trotter formula, that is, how many times each Hamiltonian component $H_{l}$ is repeated in the implementation. We have $\kappa=1$ for $K=1$ and $\kappa=2\times 5^{k-1}$ when $K=2k$ . The stage length coefficients $a_{(j,l)}$ are determined based on Eq. 24 and Eq. 25. The permutation $\pi_{j}$ indicates the ordering of the summands $\{H_{l}\}$ within the $j$ th stage in the Trotter formula. In Suzuki’s constructions Suzuki (1990) of Trotter formulas considered in this work, we alternately reverse the ordering of summands between neighboring stages.
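The stage count $\kappa$ can be checked against the standard Suzuki recursion, $S_{2k}(x)=S_{2k-2}(u_{k}x)^{2}S_{2k-2}((1-4u_{k})x)S_{2k-2}(u_{k}x)^{2}$ with $u_{k}=1/(4-4^{1/(2k-1)})$ . The sketch below assumes that Eq. 24 and Eq. 25 follow this recursion and tracks only the per-stage time fractions, not the individual $a_{(j,l)}$ ; the function names are hypothetical.

```python
def stage_count(K):
    """kappa = 1 for K = 1 and kappa = 2 * 5^(k-1) for K = 2k."""
    if K == 1:
        return 1
    if K < 2 or K % 2:
        raise ValueError("K must be 1 or a positive even integer")
    return 2 * 5 ** (K // 2 - 1)


def suzuki_stage_weights(K):
    """Time fraction carried by each stage under the assumed Suzuki recursion."""
    stage_count(K)  # validates K
    if K == 1:
        return [1.0]
    if K == 2:
        return [0.5, 0.5]  # forward sweep and reversed sweep
    k = K // 2
    u = 1.0 / (4.0 - 4.0 ** (1.0 / (2 * k - 1)))
    prev = suzuki_stage_weights(K - 2)
    outer = [u * w for w in prev]
    inner = [(1.0 - 4.0 * u) * w for w in prev]  # negative time step
    return outer + outer + inner + outer + outer


weights = suzuki_stage_weights(4)
print(len(weights), stage_count(4))  # 10 10
```

The weights sum to one at every order, and the middle block carries a negative fraction, which is why magnitudes rather than signed coefficients enter any sampling probability.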

In what follows, we omit the subscript $K$ in $S_{K}(x)$ for simplicity. To further simplify the notation, we introduce the lexicographical order Childs et al. (2021) for the pair of tuples $(j,l)$ in Eq. 228. For two pairs of tuples $(j,l)$ and $(j^{\prime},l^{\prime})$ , we have

1. $(j,l)\succeq(j^{\prime},l^{\prime})$ if $j>j^{\prime}$ , or if $j=j^{\prime}$ and $l\geq l^{\prime}$ .

2. $(j,l)\succ(j^{\prime},l^{\prime})$ if $(j,l)\succeq(j^{\prime},l^{\prime})$ and $(j,l)\neq(j^{\prime},l^{\prime})$ .

We can also define $(j,l)\preceq(j^{\prime},l^{\prime})$ and $(j,l)\prec(j^{\prime},l^{\prime})$ in the same way. We denote the number of different tuples $(j,l)$ in $S_{K}(x)$ as $\Upsilon$ , which is usually equal to $\kappa L$ . We can then express $S_{K}(x)$ and $S_{K}(x)^{\dagger}$ in the following way:
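In code, this ordering coincides with the built-in lexicographic comparison of tuples. The short sketch below enumerates the $\Upsilon=\kappa L$ tuples and assigns each a single index $\gamma$ ; the function names are hypothetical.

```python
def succeq(p, q):
    """(j, l) >= (j', l') in the lexicographic order."""
    return p >= q


def succ(p, q):
    """Strict version: succeq and not equal."""
    return p > q


def enumerate_tuples(kappa, L):
    """Map each tuple (j, l) to a single index gamma = 1, ..., kappa * L."""
    ordered = sorted((j, l) for j in range(1, kappa + 1)
                     for l in range(1, L + 1))
    return {pair: gamma for gamma, pair in enumerate(ordered, start=1)}


index = enumerate_tuples(kappa=2, L=3)
print(index[(1, 1)], index[(2, 3)])  # 1 6
```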

$\displaystyle S_{K}(x)=\prod_{(j,l)}^{\leftarrow}e^{-ixa_{(j,l)}H_{\pi_{j}(l)}},\quad S_{K}(x)^{\dagger}=\prod_{(j,l)}^{\rightarrow}e^{ixa_{(j,l)}H_{\pi_{j}(l)}}.$ (229)

Similar to the case of lattice Hamiltonians, based on Eq. 94 and Eq. 111, we can expand the Trotter remainder $V_{K}(x)$ to the following form:

$\displaystyle V_{K}(x)=I+F_{L}(x)+F_{K,\mathcal{res}}(x),$ (230)

where

	$\displaystyle F_{L}(x)$	$\displaystyle=\sum_{s=K}^{2K}C_{s}\frac{x^{s+1}}{(s+1)!},$		(231)
	$\displaystyle F_{K,\mathcal{res}}(x)$	$\displaystyle=\int_{0}^{x}d\tau\left(M_{K}(\tau)J_{L}(\tau)+V_{K}(\tau)J_{K,res,2K}(\tau)\right).$

Here, $F_{L}(x)$ denotes the leading-order terms to be compensated by the LCU formula, and $F_{K,\mathcal{res}}(x)=\mathcal{O}(x^{2K+2})$ is the higher-order part. In practice, we remove the higher-order part with order $s>2K+1$ and implement only the leading-order terms

	$\displaystyle\tilde{V}_{K}^{(nc)}(x)$	$\displaystyle=I+F_{L}(x)=I+\sum_{s=K+1}^{2K+1}C_{s-1}\frac{x^{s}}{s!}$		(232)
		$\displaystyle=\sqrt{1+(\eta_{\Sigma}^{(nc)})^{2}}\sum_{s=K+1}^{2K+1}\frac{\eta_{s}^{(nc)}}{\eta_{\Sigma}^{(nc)}}R_{2,s}^{(nc)}(\eta_{\Sigma}),$

where $\eta_{\Sigma}^{(nc)}:=\sum_{s=K+1}^{2K+1}\|C_{s-1}\|_{1}\frac{x^{s}}{s!}$ . The explicit form of $R_{2,s}^{(nc)}(\eta_{\Sigma})$ can be obtained by the definitions in Eq. 109, Eq. 111 and the Pauli operator decomposition based on the nested-commutator form in Eq. 164.

We will complete the following tasks:

1. (App. E.1) Derive the explicit formulas for the leading-order expansion terms $C_{s}$ with $s=K,...,2K$ .

2. (App. E.2) Prove the $1$ -norm $\mu_{K}^{(nc)}(x)$ and error bound $\varepsilon_{K}^{(nc)}(x)$ of the LCU formula $\tilde{V}^{(nc)}_{K}(x)$ in Eq. 232.

E.1 Derivation of nested-commutator form for general Hamiltonians

Based on the Trotter formulas in Eq. 229, we expand $R_{K}(x)$ in Eq. 92 as follows:

	$\displaystyle R_{K}(x)$	$\displaystyle=i\sum_{\gamma=1}^{\Upsilon}\prod_{\gamma^{\prime}=\gamma-1}^{1}e^{ixa_{\gamma^{\prime}}H_{\gamma^{\prime}}}\left(a_{\gamma}H_{\gamma}\right)\cdot$		(233)
		$\displaystyle\quad\prod_{\gamma^{\prime}=\Upsilon}^{\gamma+1}e^{ixa_{\gamma^{\prime}}H_{\gamma^{\prime}}}-iH\prod_{\gamma^{\prime}=\Upsilon}^{1}e^{ixa_{\gamma^{\prime}}H_{\gamma^{\prime}}},$

and $J_{K}(x)$ in Eq. 95 can then be written as

	$\displaystyle J_{K}(x)$	$\displaystyle=i\sum_{\gamma=1}^{\Upsilon}\prod_{\gamma^{\prime}=\gamma+1}^{\Upsilon}e^{-ixa_{\gamma^{\prime}}\mathrm{ad}_{H_{\gamma^{\prime}}}}(a_{\gamma}H_{\gamma})$		(234)
		$\displaystyle\quad-i\prod_{\gamma^{\prime}=1}^{\Upsilon}e^{-ixa_{\gamma^{\prime}}\mathrm{ad}_{H_{\gamma^{\prime}}}}H.$

The $s$ th-order derivative of $J_{K}(x)$ is

	$\displaystyle J^{(s)}_{K}(x)$	$\displaystyle=i\sum_{\gamma=1}^{\Upsilon}\sum_{\begin{subarray}{c}m_{\gamma+1}...m_{\Upsilon}\\ \sum_{\upsilon=\gamma+1}^{\Upsilon}m_{\upsilon}=s\end{subarray}}\binom{s}{m_{\gamma+1},...,m_{\Upsilon}}\left(\prod_{\gamma^{\prime}=\gamma+1}^{\Upsilon}(-ia_{\gamma^{\prime}}\mathrm{ad}_{H_{\gamma^{\prime}}})^{m_{\gamma^{\prime}}}e^{-ixa_{\gamma^{\prime}}\mathrm{ad}_{H_{\gamma^{\prime}}}}\right)(a_{\gamma}H_{\gamma})$		(235)
		$\displaystyle\quad-i\sum_{\begin{subarray}{c}m_{1}...m_{\Upsilon}\\ \sum_{\upsilon=1}^{\Upsilon}m_{\upsilon}=s\end{subarray}}\binom{s}{m_{1},...,m_{\Upsilon}}\left(\prod_{\gamma^{\prime}=1}^{\Upsilon}(-ia_{\gamma^{\prime}}\mathrm{ad}_{H_{\gamma^{\prime}}})^{m_{\gamma^{\prime}}}e^{-ixa_{\gamma^{\prime}}\mathrm{ad}_{H_{\gamma^{\prime}}}}\right)H.$		(235)

Using the operator-valued Taylor-series expansion in Eq. 121, we have

	$\displaystyle C_{s}$	$\displaystyle=i\sum_{\gamma=1}^{\Upsilon}\sum_{\begin{subarray}{c}\sum_{(j^{\prime},l^{\prime})}m_{(j^{\prime},l^{\prime})}=s\\ (j^{\prime},l^{\prime})\succ(j,l)\end{subarray}}\binom{s}{\{m_{(j^{\prime},l^{\prime})}\}}\left(\prod_{(j^{\prime},l^{\prime})\succ(j,l)}^{\leftarrow}(-ia_{(j^{\prime},l^{\prime})}\mathrm{ad}_{H_{\pi_{j^{\prime}}(l^{\prime})}})^{m_{(j^{\prime},l^{\prime})}}\right)(a_{(j,l)}H_{\pi_{j}(l)})$		(236)
		$\displaystyle\quad-i\sum_{\sum_{(j^{\prime},l^{\prime})}m_{(j^{\prime},l^{\prime})}=s}\binom{s}{\{m_{(j^{\prime},l^{\prime})}\}}\left(\prod_{(j^{\prime},l^{\prime})}^{\leftarrow}(-ia_{(j^{\prime},l^{\prime})}\mathrm{ad}_{H_{\pi_{j^{\prime}}(l^{\prime})}})^{m_{(j^{\prime},l^{\prime})}}\right)H$		(236)

which is the nested-commutator form. Here, $\{m_{(j^{\prime},l^{\prime})}\}$ is a set of non-negative integers that sum to $s$ . The corresponding multinomial coefficient is given by

$\displaystyle\binom{s}{\{m_{(j^{\prime},l^{\prime})}\}}:=\frac{s!}{\prod_{(j^{\prime},l^{\prime})}m_{(j^{\prime},l^{\prime})}!}.$ (237)

If we define

$\displaystyle A_{(j,l)}=A_{\gamma}=-ia_{\gamma}H_{\gamma}=-ia_{(j,l)}H_{\pi_{j}(l)},$ (238)

we can simplify Eq. 236 as

	$\displaystyle C_{s}$	$\displaystyle=-\sum_{\gamma=1}^{\Upsilon}\sum_{\begin{subarray}{c}m_{\gamma+1}...m_{\Upsilon}\\ \sum_{\upsilon=\gamma+1}^{\Upsilon}m_{\upsilon}=s\end{subarray}}\binom{s}{m_{\gamma+1},...,m_{\Upsilon}}\prod_{\gamma^{\prime}=\gamma+1}^{\Upsilon}\mathrm{ad}_{A_{\gamma^{\prime}}}^{m_{\gamma^{\prime}}}A_{\gamma}$		(239)
		$\displaystyle+\sum_{\begin{subarray}{c}m_{1}...m_{\Upsilon}\\ \sum_{\upsilon=1}^{\Upsilon}m_{\upsilon}=s\end{subarray}}\binom{s}{m_{1},...,m_{\Upsilon}}\prod_{\gamma^{\prime}=1}^{\Upsilon}\mathrm{ad}_{A_{\gamma^{\prime}}}^{m_{\gamma^{\prime}}}(-iH),$

which is the summation of nested commutators. Therefore, one can still pair the leading-order terms $F_{L}(x)$ defined in Eq. 231 with $I$ to suppress the $1$ -norm, shown in Eq. 232.

E.2 Norm bounds for the nested-commutator expansion

Now, we are going to bound the $1$ -norm $\mu^{(nc)}_{K}(x)$ and distance $\varepsilon^{(nc)}_{K}(x)$ of the truncated LCU formula $\tilde{V}_{K}^{(nc)}(x)$ in Eq. 110 for a general Hamiltonian. We will use the following formula in the derivation.

Lemma 4 (Theorem 5 in Childs et al. (2021)).

Let $A_{1},A_{2}$ , …, $A_{r}$ and $B$ be operators. Then, the conjugation has the expansion,

		$\displaystyle e^{\tau\mathrm{ad}_{A_{r}}}\cdots e^{\tau\mathrm{ad}_{A_{2}}}e^{\tau\mathrm{ad}_{A_{1}}}B$		(240)
		$\displaystyle\quad=G_{0}+G_{1}\tau+\cdots+G_{s-1}\frac{\tau^{s-1}}{(s-1)!}+G_{res,s}(\tau).$

Here, $G_{0},G_{1},...,G_{s-1}$ are operators independent of $\tau$ . The operator-valued function $G_{res,s}(\tau)$ is given by

		$\displaystyle G_{res,s}(\tau)=\sum_{\gamma=1}^{r}\sum_{\begin{subarray}{c}m_{1}+...+m_{\gamma}=s\\ m_{\gamma}\neq 0\end{subarray}}e^{\tau\mathrm{ad}_{A_{r}}}\cdots e^{\tau\mathrm{ad}_{A_{\gamma+1}}}\cdot$		(241)
		$\displaystyle\int_{0}^{\tau}d\tau_{2}\frac{(\tau-\tau_{2})^{m_{\gamma}-1}\tau^{m_{1}+...+m_{\gamma-1}}}{(m_{\gamma}-1)!m_{\gamma-1}!...m_{1}!}e^{\tau_{2}\mathrm{ad}_{A_{\gamma}}}\mathrm{ad}_{A_{\gamma}}^{m_{\gamma}}\cdots\mathrm{ad}_{A_{1}}^{m_{1}}B.$

Furthermore, we have the spectral-norm bound,

$\displaystyle\|G_{res,s}(\tau)\|\leq\alpha_{\mathrm{com}}^{(s)}(A_{r},...,A_{1};B)\frac{|\tau|^{s}}{s!}e^{2|\tau|\sum_{\gamma=1}^{r}\|A_{\gamma}\|},$ (242)

for general operators and

$\displaystyle\|G_{res,s}(\tau)\|\leq\alpha_{\mathrm{com}}^{(s)}(A_{r},...,A_{1};B)\frac{|\tau|^{s}}{s!},$ (243)

when $A_{1},...,A_{r}$ are anti-Hermitian. $\alpha_{\mathrm{com}}^{(s)}(A_{r},...,A_{1};B)$ is defined in Eq. 245.

We have the following proposition.

Proposition 13 (Trotter-LCU formula by nested-commutator compensation for general Hamiltonians).

For $x>0$ , $\tilde{V}_{K}^{(nc)}(x)$ in Eq. 232 is a $(\mu^{(nc)}(x),\varepsilon^{(nc)}(x))$ -LCU formula of the $K$ th-order Trotter remainder $V_{K}(x)=U(x)S_{K}(x)^{\dagger}$ with

$\displaystyle\mu^{(nc)}$	$\displaystyle\leq\sqrt{1+4\kappa^{2}\left(\sum_{s=K}^{2K}\alpha_{H,1}^{(s)}\frac{x^{s+1}}{(s+1)!}\right)^{2}},$	(244)
$\displaystyle\varepsilon^{(nc)}$	$\displaystyle\leq 4\kappa^{2}\alpha_{H}^{(K)}\sum_{s=K}^{2K}\alpha_{H}^{(s)}\frac{x^{s+K+2}}{s!(K+1)!(s+K+2)}$
	$\displaystyle\quad+2\kappa\frac{x^{2K+2}}{(2K+2)!}\alpha^{(2K+1)}_{H}.$

Here, we define the nested-commutator norms to be

		$\displaystyle\alpha^{(s)}_{H}=\sum_{l=1}^{L}\alpha_{\mathrm{com}}^{(s)}(\overleftarrow{\{H_{\pi_{j^{\prime}}(l^{\prime})}\}};H_{l}),$		(245)
		$\displaystyle\alpha_{\mathrm{com}}^{(s)}(A_{r},\ldots,A_{1};B)=$
		$\displaystyle\quad\sum_{m_{1}+...+m_{r}=s}\binom{s}{m_{1},...,m_{r}}\|\mathrm{ad}_{A_{r}}^{m_{r}}\cdots\mathrm{ad}_{A_{1}}^{m_{1}}B\|,$

where $\overleftarrow{\{H_{\pi_{j^{\prime}}(l^{\prime})}\}}$ indicates a sequence of $\Upsilon=\kappa L$ summands in the lexicographical order given by the $K$ th-order Trotter formula in Eq. 229. $\alpha^{(s)}_{H,1}$ is defined similarly by replacing the spectral norm with the $1$ -norm in Eq. 245.
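For small examples, the $1$ -norm variant of $\alpha_{\mathrm{com}}^{(s)}$ can be evaluated by brute force. The sketch below assumes operators are stored as dictionaries mapping Pauli strings to coefficients and takes the $1$ -norm as the sum of coefficient magnitudes after cancellation (so it does not include the padded zero-valued terms discussed for lattice models). All names are hypothetical, and the enumeration is exponential in $s$ and $r$ .

```python
from math import factorial

_T = {("X", "Y"): (1j, "Z"), ("Y", "Z"): (1j, "X"), ("Z", "X"): (1j, "Y"),
      ("Y", "X"): (-1j, "Z"), ("Z", "Y"): (-1j, "X"), ("X", "Z"): (-1j, "Y")}


def mul_pauli(p, q):
    """Return (phase, string) with p * q = phase * string."""
    phase, out = 1, []
    for a, b in zip(p, q):
        ph, c = (1, b) if a == "I" else (1, a) if b == "I" else \
                (1, "I") if a == b else _T[(a, b)]
        phase *= ph
        out.append(c)
    return phase, "".join(out)


def commutator(A, B):
    """[A, B] for operators given as {pauli_string: coefficient}."""
    out = {}
    for p, a in A.items():
        for q, b in B.items():
            ph, r = mul_pauli(p, q)
            out[r] = out.get(r, 0) + ph * a * b   # A B term
            ph, r = mul_pauli(q, p)
            out[r] = out.get(r, 0) - ph * a * b   # B A term
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


def compositions(s, r):
    """All tuples of r non-negative integers summing to s."""
    if r == 1:
        yield (s,)
        return
    for first in range(s + 1):
        for rest in compositions(s - first, r - 1):
            yield (first,) + rest


def alpha_com_1(As, B, s):
    """1-norm analogue of Eq. (245); As = [A_1, ..., A_r], A_1 acts first."""
    total = 0.0
    for ms in compositions(s, len(As)):
        coeff = factorial(s)
        for m in ms:
            coeff //= factorial(m)
        op = B
        for A, m in zip(As, ms):
            for _ in range(m):
                op = commutator(A, op)
        total += coeff * sum(abs(v) for v in op.values())
    return total


print(alpha_com_1([{"X": 1}], {"Z": 1}, 1))  # ||[X, Z]||_1 = 2.0
```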

Proof.

From Eq. 239 we can bound the $1$ -norm and spectral norm of $C_{s}$ by

$\displaystyle\|C_{s}\|$	$\displaystyle\leq\sum_{\gamma=1}^{\Upsilon}\alpha_{\mathrm{com}}^{(s)}(A_{\Upsilon},\ldots,A_{\gamma+1};A_{\gamma})$	(246)
	$\displaystyle\quad+\alpha_{\mathrm{com}}^{(s)}(A_{\Upsilon},\ldots,A_{1};H),$
$\displaystyle\|C_{s}\|_{1}$	$\displaystyle\leq\sum_{\gamma=1}^{\Upsilon}\alpha_{com,1}^{(s)}(A_{\Upsilon},\ldots,A_{\gamma+1};A_{\gamma})$
	$\displaystyle\quad+\alpha_{com,1}^{(s)}(A_{\Upsilon},\ldots,A_{1};H).$

Now, we are going to bound $\|J_{K}(x)\|$ and $\|J_{K,res,2K}(x)\|$ . From Eq. 234 and using Lemma 4 we have

	$\displaystyle J_{K}(x)$	$\displaystyle=-\sum_{\gamma=1}^{\Upsilon}\prod_{\gamma^{\prime}=\gamma+1}^{\Upsilon}e^{x\mathrm{ad}_{A_{\gamma^{\prime}}}}(A_{\gamma})+\prod_{\gamma^{\prime}=1}^{\Upsilon}e^{x\mathrm{ad}_{A_{\gamma^{\prime}}}}(-iH)$		(247)
		$\displaystyle=-\sum_{\gamma=1}^{\Upsilon}G^{(\gamma)}_{res,K}(x)+G^{(0)}_{res,K}(x).$

In the second line, we use Lemma 4 to expand all the conjugate matrix exponentials to the following form:

		$\displaystyle\prod_{\gamma^{\prime}=\gamma+1}^{\Upsilon}e^{x\mathrm{ad}_{A_{\gamma^{\prime}}}}(A_{\gamma})=G^{(\gamma)}_{0}+G^{(\gamma)}_{1}x+\cdots$		(248)
		$\displaystyle\quad\quad+G^{(\gamma)}_{K-1}\frac{x^{K-1}}{(K-1)!}+G^{(\gamma)}_{res,K}(x),\quad\gamma=1,2,\ldots,\Upsilon,$
		$\displaystyle\prod_{\gamma^{\prime}=1}^{\Upsilon}e^{x\mathrm{ad}_{A_{\gamma^{\prime}}}}(-iH)=G^{(0)}_{0}+G^{(0)}_{1}x+\cdots$
		$\displaystyle\quad\quad+G^{(0)}_{K-1}\frac{x^{K-1}}{(K-1)!}+G^{(0)}_{res,K}(x).$

The low-order terms $G_{s}^{(\gamma)}$ with $s=0,1,...,K-1$ in Eq. 247 cancel out due to the order condition in Proposition 5. Therefore, we have

		$\displaystyle\|J_{K}(x)\|\leq\sum_{\gamma=1}^{\Upsilon}\|G^{(\gamma)}_{res,K}(x)\|+\|G^{(0)}_{res,K}(x)\|$		(249)
		$\displaystyle\leq\frac{x^{K}}{K!}\Big(\sum_{\gamma=1}^{\Upsilon}\alpha_{\mathrm{com}}^{(K)}(A_{\Upsilon},\ldots,A_{\gamma+1};A_{\gamma})+\alpha_{\mathrm{com}}^{(K)}(A_{\Upsilon},\ldots,A_{1};H)\Big)$
		$\displaystyle\leq 2\frac{x^{K}}{K!}\sum_{\gamma=1}^{\Upsilon}\alpha_{\mathrm{com}}^{(K)}(A_{\Upsilon},\ldots,A_{1};H_{\gamma})$
		$\displaystyle=2\kappa\frac{x^{K}}{K!}\sum_{l=1}^{L}\alpha_{\mathrm{com}}^{(K)}(\overleftarrow{\{a_{(j^{\prime},l^{\prime})}H_{\pi_{j^{\prime}}(l^{\prime})}\}};H_{l})$
		$\displaystyle\leq 2\kappa\frac{x^{K}}{K!}\sum_{l=1}^{L}\alpha_{\mathrm{com}}^{(K)}(\overleftarrow{\{H_{\pi_{j^{\prime}}(l^{\prime})}\}};H_{l})=:2\kappa\frac{x^{K}}{K!}\alpha^{(K)}_{H}.$

In the same way, we can bound $\|C_{s}\|$ and $\|C_{s}\|_{1}$ in Eq. 246 as

	$\displaystyle\|C_{s}\|$	$\displaystyle\leq 2\kappa\alpha^{(s)}_{H},$		(250)
	$\displaystyle\|C_{s}\|_{1}$	$\displaystyle\leq 2\kappa\alpha^{(s)}_{H,1}.$

Now, we are going to bound $\|J_{K,res,2K}(x)\|$ . Similar to Eq. 247, we expand all the conjugate matrix exponentials to $2K$ th-order,

	$\displaystyle J_{K}(x)$	$\displaystyle=-\sum_{\gamma=1}^{\Upsilon}\prod_{\gamma^{\prime}=\gamma+1}^{\Upsilon}e^{x\mathrm{ad}_{A_{\gamma^{\prime}}}}(A_{\gamma})+\prod_{\gamma^{\prime}=1}^{\Upsilon}e^{x\mathrm{ad}_{A_{\gamma^{\prime}}}}(-iH)$		(251)
		$\displaystyle=\sum_{s=K}^{2K}C_{s}\frac{x^{s}}{s!}-\sum_{\gamma=1}^{\Upsilon}G^{(\gamma)}_{res,2K+1}(x)+G^{(0)}_{res,2K+1}(x),$

where, in the second line, we use the expansion of $J_{K}(x)$ in Eq. 104. Based on Eq. 249 we then have

	$\displaystyle J_{K,res,2K}(x)$	$\displaystyle=-\sum_{\gamma=1}^{\Upsilon}G^{(\gamma)}_{res,2K+1}(x)+G^{(0)}_{res,2K+1}(x)$		(252)
	$\displaystyle\Rightarrow\|J_{K,res,2K}(x)\|$	$\displaystyle\leq\sum_{\gamma=1}^{\Upsilon}\|G^{(\gamma)}_{res,2K+1}(x)\|+\|G^{(0)}_{res,2K+1}(x)\|$
		$\displaystyle\leq 2\kappa\frac{x^{2K+1}}{(2K+1)!}\alpha^{(2K+1)}_{H}.$

We omit the derivation of the last inequality since it is the same as the one in Eq. 249.

Then we can bound $\|M_{K}(x)\|$ by

$\displaystyle\|M_{K}(x)\|\leq\int_{0}^{x}d\tau\|J_{K}(\tau)\|\leq 2\kappa\frac{x^{K+1}}{(K+1)!}\alpha_{H}^{(K)}.$ (253)

Finally, by applying Proposition 6 and using Eq. 250, we can bound the $1$ -norm $\mu^{(nc)}_{K}(x)$ as follows:

$\displaystyle\mu^{(nc)}_{K}(x)\leq\sqrt{1+4\kappa^{2}\left(\sum_{s=K}^{2K}\alpha_{H,1}^{(s)}\frac{x^{s+1}}{(s+1)!}\right)^{2}},$ (254)

while the accuracy $\varepsilon^{(nc)}_{K}(x)$ can be bounded using Eq. 246, Eq. 252, and Eq. 253,

		$\displaystyle\varepsilon^{(nc)}_{K}(x)=\|F_{K,\mathcal{res}}(x)\|\leq\int_{0}^{x}d\tau\left(\|M_{K}(\tau)\|\|J_{L}(\tau)\|+\|J_{K,res,2K}(\tau)\|\right)$		(255)
		$\displaystyle\leq\int_{0}^{x}d\tau\left(4\kappa^{2}\frac{\tau^{K+1}}{(K+1)!}\alpha_{H}^{(K)}\sum_{s=K}^{2K}\alpha_{H}^{(s)}\frac{\tau^{s}}{s!}+2\kappa\frac{\tau^{2K+1}}{(2K+1)!}\alpha^{(2K+1)}_{H}\right)$
		$\displaystyle\leq 4\kappa^{2}\alpha_{H}^{(K)}\sum_{s=K}^{2K}\alpha_{H}^{(s)}\frac{x^{s+K+2}}{s!(K+1)!(s+K+2)}+2\kappa\frac{x^{2K+2}}{(2K+2)!}\alpha^{(2K+1)}_{H}.$

∎
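The bounds of Proposition 13 are straightforward to evaluate once the nested-commutator norms are known. The following is a minimal sketch of Eq. 244, assuming the norms $\alpha_{H}^{(s)}$ and $\alpha_{H,1}^{(s)}$ are supplied as dictionaries keyed by $s$ ; the function names and the numerical inputs are placeholders.

```python
from math import factorial, sqrt


def mu_bound(alpha_1, K, kappa, x):
    """Upper bound on the 1-norm mu^{(nc)} in Eq. (244)."""
    total = sum(alpha_1[s] * x ** (s + 1) / factorial(s + 1)
                for s in range(K, 2 * K + 1))
    return sqrt(1.0 + 4.0 * kappa ** 2 * total ** 2)


def eps_bound(alpha, K, kappa, x):
    """Upper bound on the accuracy eps^{(nc)} in Eq. (244)."""
    first = 4.0 * kappa ** 2 * alpha[K] * sum(
        alpha[s] * x ** (s + K + 2)
        / (factorial(s) * factorial(K + 1) * (s + K + 2))
        for s in range(K, 2 * K + 1))
    second = 2.0 * kappa * x ** (2 * K + 2) / factorial(2 * K + 2) * alpha[2 * K + 1]
    return first + second


# Placeholder norms alpha^{(s)} = 1 for K = 1, kappa = 1, x = 1:
# mu <= sqrt(1 + 4 (1/2 + 1/6)^2) = 5/3 and eps <= 4 (1/8 + 1/20) + 2/24.
alpha = {1: 1.0, 2: 1.0, 3: 1.0}
print(mu_bound(alpha, 1, 1, 1.0), eps_bound(alpha, 1, 1, 1.0))
```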

From Proposition 13 we can see that, to characterize the performance of the Trotter-LCU algorithm, we only need to estimate the values of $\alpha^{(s)}_{H}$ and $\alpha^{(s)}_{H,1}$ for a given Hamiltonian. We can further simplify the form of $\alpha_{H}^{(s)}$ by the following upper bound Childs et al. (2021),

	$\displaystyle\alpha_{H}^{(s)}$	$\displaystyle\leq\kappa^{s}\sum_{l_{s+1}=1}^{L}\cdots\sum_{l_{2}=1}^{L}\|[H_{l_{s+1}},\cdots[H_{l_{2}},H_{l}]\cdots]\|$		(256)
		$\displaystyle=:\kappa^{s}\tilde{\alpha}_{\mathrm{com}}(H),$

which holds because every commutator term on the left-hand side must be of the form appearing on the right. Moreover, if we fix one term $\|[H_{l_{s+1}},\cdots[H_{l_{2}},H_{l}]\cdots]\|$ on the right-hand side, it appears at most $\kappa^{s}$ times on the left-hand side.

Appendix F SOME USEFUL FORMULAS IN THE PROOF

Lemma 5 (Tail bound for the Poisson distribution (Theorem 1 in Canonne (2016))).

Suppose $\hat{X}$ is a random variable with Poisson distribution so that $\Pr(\hat{X}=s)=\mathrm{Poi}(s;x)=e^{-x}\frac{x^{s}}{s!}$ , where $x>0$ is the expectation value. Then, for any $\epsilon>0$ , we have,

$\displaystyle\Pr(\hat{X}\geq x+\epsilon)\leq e^{-\frac{\epsilon^{2}}{2x}h(\frac{\epsilon}{x})},$ (257)

and, for any $0<\epsilon<x$ ,

$\displaystyle\Pr(\hat{X}\leq x-\epsilon)\leq e^{-\frac{\epsilon^{2}}{2x}h(-\frac{\epsilon}{x})}.$ (258)

Here, $h(u):=2\frac{(1+u)\ln(1+u)-u}{u^{2}}$ for $u\geq-1$ .

From Lemma 5 we have the following corollaries.

Corollary 1.

For $x>0$ and positive integer $k$ such that $x<k+1$ , we have

$\displaystyle\sum_{s=k+1}^{\infty}\frac{x^{s}}{s!}\leq\left(\frac{ex}{k+1}\right)^{k+1}.$ (259)

Proof.

We set $\epsilon=(k+1)-x$ . From Lemma 5 we have

	$\displaystyle\Pr(\hat{X}\geq k+1)$	$\displaystyle\leq\exp\left(-\frac{(k+1-x)^{2}}{2x}h(\frac{k+1-x}{x})\right)$		(260)
		$\displaystyle=e^{-x}\left(\frac{ex}{k+1}\right)^{k+1}.$

Therefore,

		$\displaystyle\quad\Pr(\hat{X}\geq k+1)=e^{-x}\sum_{s=k+1}^{\infty}\frac{x^{s}}{s!}\leq e^{-x}\left(\frac{ex}{k+1}\right)^{k+1}$		(261)
		$\displaystyle\Rightarrow\sum_{s=k+1}^{\infty}\frac{x^{s}}{s!}\leq\left(\frac{ex}{k+1}\right)^{k+1}.$

∎
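Corollary 1 can be checked numerically for a few values of $x$ and $k$ . The sketch below truncates the infinite sum after a fixed number of terms, which is an approximation chosen for illustration; the function names are hypothetical.

```python
from math import e


def tail_sum(x, k, terms=200):
    """Approximate sum_{s >= k+1} x^s / s! by its first `terms` terms."""
    term = 1.0
    for s in range(1, k + 1):
        term *= x / s          # builds x^k / k! without large factorials
    total = 0.0
    for s in range(k + 1, k + 1 + terms):
        term *= x / s
        total += term
    return total


def corollary1_bound(x, k):
    """Right-hand side of Eq. (259)."""
    return (e * x / (k + 1)) ** (k + 1)


for x, k in [(0.1, 2), (1.0, 2), (2.5, 2), (2.5, 10)]:
    assert x < k + 1
    assert tail_sum(x, k) <= corollary1_bound(x, k)
```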

Corollary 2.

For $x>0$ and positive integer $k$ such that $x<k+1$ , we have

$\displaystyle e^{x}-\sum_{s=1}^{k}\frac{x^{s}}{s!}\leq e^{ex^{k+1}}.$ (262)

Proof.

When $x<k+1$ , from Corollary 1 we have

		$\displaystyle\quad e^{-x}\sum_{s=k+1}^{\infty}\frac{x^{s}}{s!}\leq e^{-x}\left(\frac{ex}{k+1}\right)^{k+1}$		(263)
		$\displaystyle\Rightarrow 1-e^{-x}-e^{-x}\sum_{s=1}^{k}\frac{x^{s}}{s!}\leq e^{-x}\left(\frac{ex}{k+1}\right)^{k+1}$
		$\displaystyle\Rightarrow e^{x}-\sum_{s=1}^{k}\frac{x^{s}}{s!}\leq 1+\left(\frac{ex}{k+1}\right)^{k+1}$
		$\displaystyle\quad\leq 1+\frac{(ex)^{k+1}}{(k+1)!}\leq e^{ex^{k+1}}.$

∎

Lemma 6 (Proposition 9 in Wan et al. (2022)).

For any $\beta>0$ , $1>\epsilon>0$ , we have $\left(\frac{e\beta}{s}\right)^{s}\leq\epsilon$ , for all $s\geq f(\beta,\epsilon):=\frac{\ln(\frac{1}{\epsilon})}{W_{0}\left(\frac{1}{e\beta}\ln(\frac{1}{\epsilon})\right)}$ . Here, $W_{0}(y)$ is the principal branch of the Lambert $W$ function.
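The threshold $f(\beta,\epsilon)$ can be evaluated with a few Newton iterations for $W_{0}$ . The sketch below is a self-contained illustration for $y\geq 0$ only; the function names, starting point, and tolerance are choices made here rather than part of the cited result.

```python
from math import ceil, e, exp, log


def lambert_w0(y, tol=1e-14, max_iter=100):
    """Principal branch W_0(y) for y >= 0, solving w * exp(w) = y by Newton."""
    if y < 0:
        raise ValueError("this sketch only handles y >= 0")
    w = log(1.0 + y)  # starts at or above the root, so Newton descends to it
    for _ in range(max_iter):
        ew = exp(w)
        step = (w * ew - y) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def f_threshold(beta, eps):
    """f(beta, eps) = ln(1/eps) / W_0(ln(1/eps) / (e * beta)) from Lemma 6."""
    t = log(1.0 / eps)
    return t / lambert_w0(t / (e * beta))


s = ceil(f_threshold(1.0, 1e-3))
print(s, (e * 1.0 / s) ** s <= 1e-3)  # 8 True
```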

Lemma 7 (Theorem 2.7 in Hoorfar and Hassani (2008)).

When $y\geq e$ we have

		$\displaystyle\ln(y)-\ln\ln(y)+\frac{1}{2}\frac{\ln\ln(y)}{\ln(y)}\leq W_{0}(y)$		(264)
		$\displaystyle\quad\leq\ln(y)-\ln\ln(y)+\frac{e}{e-1}\frac{\ln\ln(y)}{\ln(y)}.$

Appendix G ADDITIONAL NUMERICAL RESULTS

In this section, we provide more numerical results by comparing the gate costs of the Trotter-LCU algorithms with those of other typical algorithms, especially the best-performing Trotter and “post-Trotter” algorithms, i.e., the fourth-order Trotter algorithm and the quantum signal processing (QSP) algorithm. We mainly consider two scenarios: generic $L$ -sparse Hamiltonians and lattice Hamiltonians. In the former case, the previously known best result is given by the QSP algorithm Low and Chuang (2017); in the latter case, since we can take advantage of the spatial locality and commutator information, the fourth-order Trotter algorithm is known to have the best performance Childs et al. (2018); Childs and Su (2019).

For the Trotter algorithms and Trotter-LCU algorithms, we first compile the circuit to $\mathcal{CNOT}$ gates, single-qubit Clifford gates and non-Clifford $Z$ -axis rotation gate $R_{z}(\theta)=e^{i\theta Z}$ . On the other hand, the circuit compilation for the QSP algorithm is more complicated: we need to decompose the state-preparation oracles and the select- $H$ gates. We follow the qROM design in Ref. Babbush et al. (2018b) based on a sawtooth structure. The detailed gate resource analysis can be found in a companion work Sun et al. (2024). Based on the qROM design, we decompose all the state preparation oracles and the select- $H$ gates to Toffoli gates, which can be further decomposed to $\mathcal{CNOT}$ gates, single-qubit Clifford gates and $T$ gates.

For a fair comparison between the gate costs of the Trotter, Trotter-LCU, and QSP algorithms, we need to further compile the $Z$ -axis rotation gate $R_{z}(\theta)$ to $T$ gates. We follow the gate compilation work in Ref. Bocharov et al. (2015), where the expected number of $T$ gates needed to compile $R_{z}(\theta)$ with random $\theta$ is about

$\displaystyle c_{T}=1.149\log_{2}(1/\epsilon)+9.2,$ (265)

where $\epsilon$ is the compilation accuracy. We set the accuracy $\epsilon=10^{-15}$ in the later resource estimation. In this case, $c_{T}\approx 66$ .
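As a quick check of the quoted value, Eq. 265 can be evaluated directly; the function name below is hypothetical.

```python
from math import log2


def expected_t_count(eps):
    """Expected T-gate count of Eq. (265) for compiling one R_z(theta)."""
    return 1.149 * log2(1.0 / eps) + 9.2


print(round(expected_t_count(1e-15)))  # 66
```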

G.1 Generic $L$ -sparse Hamiltonians

For the generic $L$ -sparse Hamiltonian, we choose the $2$ -local Hamiltonian

$\displaystyle H=\sum_{i,j}J_{ij}X_{i}X_{j}+\sum_{i}Z_{i},$ (266)

with $J_{ij}=1$ as an example, in which case we ignore the commutator information between different Hamiltonian summands. This simple model serves as a representative of many generic Hamiltonians where the commutator information is not helpful or is too complicated to rely on, e.g., quantum chemistry Hamiltonians for molecules. Without the commutator information, the previous best Hamiltonian simulation algorithm is QSP Low and Chuang (2017).

In Fig. 13 and Fig. 14, we estimate and compare the $\mathcal{CNOT}$ and $T$ gate counts for the PTSC algorithms, QSP, and the fourth-order Trotter algorithm with respect to the evolution time. We choose the fourth-order Trotter algorithm with random permutation since it performs the best among all the Trotter algorithms. The gate counting method for fourth-order Trotter with random permutation is based on the analytical bounds in Ref. Childs et al. (2019). For the QSP algorithm, we estimate the gate count based on the qROM construction in Ref. Babbush et al. (2018b). Only the high-accuracy results $\varepsilon=10^{-5}$ are presented for our method and QSP, since both have a logarithmic dependence on the accuracy and are therefore less sensitive to it, whereas the Trotter formulas have a polynomial dependence on the accuracy.

From Fig. 13 and Fig. 14 we can see that the PTSC algorithm has a lower $\mathcal{CNOT}$ gate cost than the QSP algorithm because it does not need the qROM for classical data loading. On the other hand, the PTSC algorithms have a larger $T$ -gate cost than the QSP algorithm. This is mainly due to the cost of compiling arbitrary $Z$ -axis rotation gates $R_{z}(\theta)$ to $T$ gates: for each $R_{z}(\theta)$ gate with a random phase $\theta$ , we need $c_{T}=66$ $T$ gates on average to compile it to the accuracy of $\varepsilon=10^{-15}$ . In the Trotter or Trotter-LCU algorithms, the non-Clifford gate cost mainly originates from the $R_{z}(\theta)$ gates: in each segment, there are roughly $\kappa_{K}L$ $R_{z}(\theta)$ gates, leading to about $\kappa_{K}c_{T}L$ compiled $T$ gates. In comparison, the QSP algorithm uses only a few $R_{z}(\theta)$ gates, in the phase iteration procedure that realizes the Jacobi-Anger polynomials. Its major non-Clifford gate cost lies in the compilation of the state-preparation oracles and the select- $H$ gates, each of which can be realized by about $4L$ $T$ gates based on the qROM design in Ref. Babbush et al. (2018b).

In Fig. 15, we compare the number of qubits required to implement the QSP and the PTSC algorithms, based on the $2$-local Hamiltonian model. The PTSC algorithm shows a clear advantage over the QSP algorithm in spatial resource cost.

G.2 Lattice Hamiltonians

Now, we consider the case of lattice Hamiltonians. We consider the Heisenberg model, $H=J\sum_{i}\vec{\sigma}_{i}\vec{\sigma}_{i+1}+h\sum_{i}Z_{i}$ , where $\vec{\sigma}_{i}:=(X_{i},Y_{i},Z_{i})$ is the vector of Pauli operators on the $i$ th qubit and $J=h=1$ .
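As a concrete reference, the Heisenberg Hamiltonian above can be built as a dense matrix for small $n$. The sketch below reads $\vec{\sigma}_{i}\vec{\sigma}_{i+1}$ as the dot product $X_{i}X_{i+1}+Y_{i}Y_{i+1}+Z_{i}Z_{i+1}$ and assumes an open chain (bonds $i=1,\dots,n-1$); the boundary condition and the helper names `embed` and `heisenberg` are assumptions made for illustration.

```python
from functools import reduce

import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def embed(ops: dict, n: int) -> np.ndarray:
    """Tensor product acting with ops[i] on site i and the identity elsewhere."""
    return reduce(np.kron, [ops.get(i, I2) for i in range(n)])


def heisenberg(n: int, J: float = 1.0, h: float = 1.0) -> np.ndarray:
    """Dense Heisenberg Hamiltonian on an open chain of n qubits (assumed boundary)."""
    H = np.zeros((2**n, 2**n), dtype=complex)
    for i in range(n - 1):          # nearest-neighbour bonds (i, i+1)
        for P in (X, Y, Z):
            H += J * embed({i: P, i + 1: P}, n)
    for i in range(n):              # on-site field term
        H += h * embed({i: Z}, n)
    return H
```

For $n=2$ and $J=h=1$ the spectrum can be checked by hand: the triplet states have energies $3$, $1$, $-1$ and the singlet has energy $-3$.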

We compare the $\mathcal{CNOT}$ and $T$ gate counts of the second-order NCC algorithm, QSP, and the fourth-order Trotter algorithm in Fig. 16 and Fig. 17, respectively. The fourth-order Trotter error analysis is based on the nested-commutator bound (Proposition M.1 in Ref. Childs et al. (2021)), which is currently the tightest Trotter error bound. The performance of our second-order NCC algorithm is analyzed based on the detailed analysis in Section B. We explicitly calculate the $1$-norm of the LCU formula and use the analytical bound for the accuracy analysis. For a fair comparison, we do not use empirical error estimates here. Since the explicit evaluation of our fourth-order NCC algorithm is complicated, we mainly present the results for our second-order algorithm.

As discussed in Childs et al. (2021); Childs and Su (2019), the fourth-order Trotter formula shows a near-optimal scaling with respect to the system size for the lattice model, which is clearly visible in Fig. 16. From Fig. 16, we can see that our second-order NCC algorithm shows an advantage over the fourth-order Trotter algorithm. Furthermore, the $n$ and $t$ scaling of our second-order NCC algorithm is similar to that of the fourth-order Trotter algorithm, which is near optimal.

Appendix H COHERENT IMPLEMENTATION

In the main text, we primarily focus on the random-sampling implementation of the Trotter-LCU algorithms due to its simplicity. When the block encoding of an LCU formula of $V$ is feasible on a fault-tolerant quantum computer, we can also consider the coherent implementation of the Trotter-LCU algorithms.

The gate complexity of the coherent implementation of the Trotter-LCU algorithm is determined by the segment number $\nu$ , the Trotter order $K$ , the number of elementary unitaries $\Gamma$ and the gate complexity of each elementary unitary in the LCU formula. The value $\Gamma$ is related to the specific compensation method we use, which is usually proportional to $L$ and the truncation order $s_{c}$ . The gate complexity is

$$N_{K}=\mathcal{O}(\nu(\kappa_{K}L+\Gamma)). \tag{267}$$

To estimate the segment number $\nu$ , we first find a proper evolution time $x$ such that the $1$ -norm $\mu_{x}\leq 2$ . The segment number is then $\nu=t/x$ .

For the PTSC formula, based on the $1$-norm bound in Theorem 1, we find that to ensure $\mu_{x}\leq 2$, it is sufficient to set $x=\frac{1}{2\lambda}(\frac{\ln 2}{e+c_{k}})^{\frac{1}{2K+2}}=\mathcal{O}(\frac{1}{\lambda})$. As a result, the number of segments is $\nu=t/x=\mathcal{O}(\lambda t)$. From Eq. (267), the overall gate complexity of the algorithm is

$$N_{K}^{(p)}=\mathcal{O}(\nu(\kappa_{K}L+\Gamma))=\mathcal{O}\left(\lambda tL\frac{\log(1/\varepsilon)}{\log\log(1/\varepsilon)}\right), \tag{268}$$

which is the same as the case when no Trotter formula is applied Berry et al. (2015).
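The choice of segment length and segment number for the PTSC formula can be evaluated numerically as follows. This is a minimal sketch: the constant appearing in the $1$-norm bound is passed in as the parameter `c_K`, whose value depends on the Trotter formula and is not fixed here, and rounding $\nu$ up to an integer is an assumption of the sketch.

```python
import math


def ptsc_segment_length(lam: float, K: int, c_K: float) -> float:
    """x = (1 / (2*lam)) * (ln 2 / (e + c_K)) ** (1 / (2K + 2)), so that mu_x <= 2."""
    return (1.0 / (2.0 * lam)) * (math.log(2) / (math.e + c_K)) ** (1.0 / (2 * K + 2))


def ptsc_segment_number(t: float, lam: float, K: int, c_K: float) -> int:
    """nu = t / x, rounded up to an integer number of segments."""
    return math.ceil(t / ptsc_segment_length(lam, K, c_K))
```

Because $\ln 2/(e+c_{K})<1$ for any non-negative constant, the segment length always satisfies $x<1/(2\lambda)$, so $\nu\geq 2\lambda t$ while remaining $\mathcal{O}(\lambda t)$ for fixed $K$.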

For the NCC formula, based on the $1$-norm bound in Theorem 2, we find that to ensure $\mu_{x}\leq 2$, we need to set $x=\frac{1}{\beta\Lambda_{1}}\left(\frac{2\ln 2(K!)^{2}}{(n\kappa)^{2}}\right)^{\frac{1}{2K+2}}$, which gives $\nu=\mathcal{O}(n^{\frac{1}{K+1}}t)$. To achieve an overall simulation accuracy $\varepsilon$, we additionally need

$$\nu\varepsilon_{K}^{(nc)}(x)\leq\varepsilon\Rightarrow\nu=\mathcal{O}(n^{\frac{2}{2K+1}}t^{1+\frac{1}{2K+1}}\varepsilon^{-\frac{1}{2K+1}}). \tag{269}$$

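Since $\frac{2}{2K+1}>\frac{1}{K+1}$, the accuracy condition is the more demanding of the two for large $n$ and $t$ and small $\varepsilon$. Therefore, it suffices to choose $\nu$ to be

$$\nu=\mathcal{O}(n^{\frac{2}{2K+1}}t^{1+\frac{1}{2K+1}}\varepsilon^{-\frac{1}{2K+1}}). \tag{270}$$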

In each segment, we need to implement the $K$th-order Trotter formula and the $(K+1)$th- to $(2K+1)$th-order LCU formula. The number of elementary unitaries is $\mathcal{O}(n)$. Therefore, the overall gate complexity is

$$N_{K}^{(nc)}=\mathcal{O}(n^{1+\frac{2}{2K+1}}t^{1+\frac{1}{2K+1}}\varepsilon^{-\frac{1}{2K+1}}). \tag{271}$$

Although this complexity does not achieve the logarithmic accuracy dependence seen in standard post-Trotter algorithms Berry et al. (2015); Low and Chuang (2019), it has the unique advantage of a near-linear system-size dependence, $\mathcal{O}(n^{1+\frac{2}{2K+1}})$, which could be advantageous in large-scale coherent Hamiltonian simulations.
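To see how quickly the exponents in the NCC gate complexity approach linear scaling, they can be tabulated exactly with rational arithmetic. The helper name `ncc_exponents` is hypothetical; the sketch only restates the exponents of $n$, $t$ and $1/\varepsilon$ in $N_{K}^{(nc)}$.

```python
from fractions import Fraction


def ncc_exponents(K: int) -> dict:
    """Exponents of n, t and 1/eps in N_K^(nc) = O(n^(1+2d) t^(1+d) eps^(-d)), d = 1/(2K+1)."""
    d = Fraction(1, 2 * K + 1)
    return {"n": 1 + 2 * d, "t": 1 + d, "inv_eps": d}


if __name__ == "__main__":
    for K in (1, 2, 4):
        print(K, ncc_exponents(K))
```

For example, $K=2$ gives $n^{7/5}t^{6/5}\varepsilon^{-1/5}$, and the system-size exponent decreases toward $1$ as $K$ grows.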

References

Feynman (1982) R. P. Feynman, International Journal of Theoretical Physics 21, 467 (1982), ISSN 1572-9575, URL https://doi.org/10.1007/BF02650179.
Abrams and Lloyd (1999) D. S. Abrams and S. Lloyd, Phys. Rev. Lett. 83, 5162 (1999), URL https://link.aps.org/doi/10.1103/PhysRevLett.83.5162.
Aspuru-Guzik et al. (2005) A. Aspuru-Guzik, A. D. Dutoi, P. J. Love, and M. Head-Gordon, Science 309, 1704 (2005), URL https://www.science.org/doi/abs/10.1126/science.1113479.
Farhi et al. (2014) E. Farhi, J. Goldstone, and S. Gutmann, A quantum approximate optimization algorithm (2014), URL https://arxiv.org/abs/1411.4028.
Zhou et al. (2020) L. Zhou, S.-T. Wang, S. Choi, H. Pichler, and M. D. Lukin, Phys. Rev. X 10, 021067 (2020), URL https://link.aps.org/doi/10.1103/PhysRevX.10.021067.
Harrow et al. (2009) A. W. Harrow, A. Hassidim, and S. Lloyd, Phys. Rev. Lett. 103, 150502 (2009), URL https://link.aps.org/doi/10.1103/PhysRevLett.103.150502.
Lloyd (1996) S. Lloyd, Science 273, 1073 (1996), URL https://www.science.org/doi/abs/10.1126/science.273.5278.1073.
Suzuki (1990) M. Suzuki, Physics Letters A 146, 319 (1990), ISSN 0375-9601, URL https://www.sciencedirect.com/science/article/pii/037596019090962N.
Suzuki (1991) M. Suzuki, Journal of Mathematical Physics 32, 400 (1991), URL https://aip.scitation.org/doi/10.1063/1.529425.
Berry et al. (2007) D. W. Berry, G. Ahokas, R. Cleve, and B. C. Sanders, Communications in Mathematical Physics 270, 359 (2007), URL https://doi.org/10.1007/s00220-006-0150-x.
Campbell (2019) E. Campbell, Phys. Rev. Lett. 123, 070503 (2019), URL https://link.aps.org/doi/10.1103/PhysRevLett.123.070503.
Childs et al. (2019) A. M. Childs, A. Ostrander, and Y. Su, Quantum 3, 182 (2019), ISSN 2521-327X, URL https://doi.org/10.22331/q-2019-09-02-182.
Childs and Su (2019) A. M. Childs and Y. Su, Phys. Rev. Lett. 123, 050503 (2019), URL https://link.aps.org/doi/10.1103/PhysRevLett.123.050503.
Endo et al. (2019) S. Endo, Q. Zhao, Y. Li, S. Benjamin, and X. Yuan, Phys. Rev. A 99, 012334 (2019), URL https://link.aps.org/doi/10.1103/PhysRevA.99.012334.
Heyl et al. (2019) M. Heyl, P. Hauke, and P. Zoller, Science Advances 5, eaau8342 (2019).
Chen et al. (2021) C.-F. Chen, H.-Y. Huang, R. Kueng, and J. A. Tropp, PRX Quantum 2, 040305 (2021), URL https://link.aps.org/doi/10.1103/PRXQuantum.2.040305.
Şahinoğlu and Somma (2021) B. Şahinoğlu and R. D. Somma, npj Quantum Information 7, 1 (2021), URL https://www.nature.com/articles/s41534-021-00451-w.
Su et al. (2021) Y. Su, H.-Y. Huang, and E. T. Campbell, Quantum 5, 495 (2021), URL https://quantum-journal.org/papers/q-2021-07-05-495/.
Tran et al. (2020) M. C. Tran, S.-K. Chu, Y. Su, A. M. Childs, and A. V. Gorshkov, Phys. Rev. Lett. 124, 220502 (2020), URL https://link.aps.org/doi/10.1103/PhysRevLett.124.220502.
Childs et al. (2021) A. M. Childs, Y. Su, M. C. Tran, N. Wiebe, and S. Zhu, Phys. Rev. X 11, 011020 (2021), URL https://link.aps.org/doi/10.1103/PhysRevX.11.011020.
Layden (2021) D. Layden, arXiv preprint arXiv:2107.08032 (2021).
Zhao et al. (2022) Q. Zhao, Y. Zhou, A. F. Shaw, T. Li, and A. M. Childs, Phys. Rev. Lett. 129, 270502 (2022), URL https://link.aps.org/doi/10.1103/PhysRevLett.129.270502.
Reiher et al. (2017) M. Reiher, N. Wiebe, K. M. Svore, D. Wecker, and M. Troyer, Proceedings of the National Academy of Sciences 114, 7555 (2017), URL https://www.pnas.org/doi/abs/10.1073/pnas.1619152114.
Berry et al. (2014) D. W. Berry, A. M. Childs, R. Cleve, R. Kothari, and R. D. Somma, in Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing (Association for Computing Machinery, New York, NY, USA, 2014), STOC ’14, pp. 283–292, ISBN 9781450327107, URL https://doi.org/10.1145/2591796.2591854.
Berry et al. (2015) D. W. Berry, A. M. Childs, R. Cleve, R. Kothari, and R. D. Somma, Phys. Rev. Lett. 114, 090502 (2015), URL https://link.aps.org/doi/10.1103/PhysRevLett.114.090502.
Berry et al. (2015) D. W. Berry, A. M. Childs, and R. Kothari, in 56th Annual IEEE Symposium on Foundations of Computer Science (2015), pp. 792–809, URL https://ieeexplore.ieee.org/abstract/document/7354428.
Low and Chuang (2019) G. H. Low and I. L. Chuang, Quantum 3, 163 (2019), ISSN 2521-327X, URL https://doi.org/10.22331/q-2019-07-12-163.
Low and Chuang (2017) G. H. Low and I. L. Chuang, Phys. Rev. Lett. 118, 010501 (2017), URL https://link.aps.org/doi/10.1103/PhysRevLett.118.010501.
Low (2019) G. H. Low, in 51st Annual ACM Symposium on Theory of Computing (2019), pp. 491–502, ISBN 9781450367059, URL https://doi.org/10.1145/3313276.3316386.
Childs and Wiebe (2012) A. M. Childs and N. Wiebe, Quantum Information and Computation 12, 0901 (2012), ISSN 1533-7146, URL http://dx.doi.org/10.26421/QIC12.11-12.
Long (2011) G. L. Long, International Journal of Theoretical Physics 50, 1305 (2011), ISSN 1572-9575, URL https://doi.org/10.1007/s10773-010-0603-z.
Lin and Tong (2022) L. Lin and Y. Tong, PRX Quantum 3, 010318 (2022), URL https://link.aps.org/doi/10.1103/PRXQuantum.3.010318.
Yang et al. (2021) Y. Yang, B.-N. Lu, and Y. Li, PRX Quantum 2, 040361 (2021), URL https://link.aps.org/doi/10.1103/PRXQuantum.2.040361.
Wan et al. (2022) K. Wan, M. Berta, and E. T. Campbell, Phys. Rev. Lett. 129, 030503 (2022), URL https://link.aps.org/doi/10.1103/PhysRevLett.129.030503.
Faehrmann et al. (2022) P. K. Faehrmann, M. Steudtner, R. Kueng, M. Kieferova, and J. Eisert, Quantum 6, 806 (2022), ISSN 2521-327X, URL https://doi.org/10.22331/q-2022-09-19-806.
Lin and Tong (2020) L. Lin and Y. Tong, Quantum 4, 372 (2020), ISSN 2521-327X, URL https://doi.org/10.22331/q-2020-12-14-372.
Zeng et al. (2021) P. Zeng, J. Sun, and X. Yuan, Universal quantum algorithmic cooling on a quantum computer (2021), URL https://arxiv.org/abs/2109.15304.
Zhang et al. (2022) R. Zhang, G. Wang, and P. Johnson, Quantum 6, 761 (2022), ISSN 2521-327X, URL https://doi.org/10.22331/q-2022-07-11-761.
Kitaev (1995) A. Y. Kitaev, Quantum measurements and the abelian stabilizer problem (1995), URL https://arxiv.org/abs/quant-ph/9511026.
Litinski (2019) D. Litinski, Quantum 3, 128 (2019), ISSN 2521-327X, URL https://doi.org/10.22331/q-2019-03-05-128.
Canonne (2016) C. Canonne, A short note on poisson tail bounds (2016), URL http://www.cs.columbia.edu/~ccanonne/files/misc/2017-poissonconcentration.pdf.
Babbush et al. (2018a) R. Babbush, N. Wiebe, J. McClean, J. McClain, H. Neven, and G. K.-L. Chan, Phys. Rev. X 8, 011044 (2018a), URL https://link.aps.org/doi/10.1103/PhysRevX.8.011044.
Lee et al. (2021) J. Lee, D. W. Berry, C. Gidney, W. J. Huggins, J. R. McClean, N. Wiebe, and R. Babbush, PRX Quantum 2, 030305 (2021), URL https://link.aps.org/doi/10.1103/PRXQuantum.2.030305.
Berry et al. (2019) D. W. Berry, C. Gidney, M. Motta, J. R. McClean, and R. Babbush, Quantum 3, 208 (2019), ISSN 2521-327X, URL https://doi.org/10.22331/q-2019-12-02-208.
von Burg et al. (2021) V. von Burg, G. H. Low, T. Häner, D. S. Steiger, M. Reiher, M. Roetteler, and M. Troyer, Phys. Rev. Res. 3, 033055 (2021), URL https://link.aps.org/doi/10.1103/PhysRevResearch.3.033055.
Hagan and Wiebe (2023) M. Hagan and N. Wiebe, Quantum 7, 1181 (2023), ISSN 2521-327X, URL https://doi.org/10.22331/q-2023-11-14-1181.
Cho et al. (2024) C.-H. Cho, D. W. Berry, and M.-H. Hsieh, Phys. Rev. A 109, 062431 (2024), URL https://link.aps.org/doi/10.1103/PhysRevA.109.062431.
Zhao and Yuan (2021) Q. Zhao and X. Yuan, Quantum 5, 534 (2021), ISSN 2521-327X, URL https://doi.org/10.22331/q-2021-08-31-534.
Childs et al. (2018) A. M. Childs, D. Maslov, Y. Nam, N. J. Ross, and Y. Su, Proceedings of the National Academy of Sciences 115, 9456 (2018), URL https://www.pnas.org/content/115/38/9456.
Bringmann and Panagiotou (2012) K. Bringmann and K. Panagiotou, in Automata, Languages, and Programming, edited by A. Czumaj, K. Mehlhorn, A. Pitts, and R. Wattenhofer (Springer Berlin Heidelberg, Berlin, Heidelberg, 2012), pp. 133–144, ISBN 978-3-642-31594-7.
Hoorfar and Hassani (2008) A. Hoorfar and M. Hassani, J. Inequal. Pure and Appl. Math 9, 5 (2008), URL https://www.emis.de/journals/JIPAM/article983.html?sid=983.
Babbush et al. (2018b) R. Babbush, C. Gidney, D. W. Berry, N. Wiebe, J. McClean, A. Paler, A. Fowler, and H. Neven, Phys. Rev. X 8, 041015 (2018b), URL https://link.aps.org/doi/10.1103/PhysRevX.8.041015.
Sun et al. (2024) J. Sun, P. Zeng, T. Gur, and M. Kim, arXiv preprint arXiv:2406.04307 (2024).
Bocharov et al. (2015) A. Bocharov, M. Roetteler, and K. M. Svore, Phys. Rev. Lett. 114, 080502 (2015), URL https://link.aps.org/doi/10.1103/PhysRevLett.114.080502.

$$\begin{aligned}
\|U^{\nu}-\tilde{U}^{\nu}\| &= \left\|\sum_{k=1}^{\nu}U^{k-1}(U-\tilde{U})\tilde{U}^{\nu-k}\right\| \\
&\leq \sum_{k=1}^{\nu}\|U^{k-1}\|\|U-\tilde{U}\|\|\tilde{U}^{\nu-k}\| \\
&\leq \|U-\tilde{U}\|\sum_{k=1}^{\nu}\max\{\|U\|,\|\tilde{U}\|\}^{\nu-1} \\
&\leq \nu\varepsilon\mu^{\nu-1}\leq\nu\mu^{\prime}\varepsilon.
\end{aligned} \tag{39}$$
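The telescoping identity in the first line of Eq. (39) and the resulting bound can be checked numerically. The sketch below uses two nearby unitaries, for which $\mu=1$ and the bound reduces to $\nu\|U-\tilde{U}\|$; the helper names and the way the perturbed unitary is generated are assumptions of this illustration, and the more general case with norm $\mu>1$ is not covered.

```python
import numpy as np


def random_unitary(dim, rng):
    """Unitary taken from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(z)
    return q


def nearby_unitary(U, delta, rng):
    """U times exp(i*delta*G) for a random Hermitian G (hypothetical perturbation)."""
    a = rng.normal(size=U.shape) + 1j * rng.normal(size=U.shape)
    G = (a + a.conj().T) / 2
    w, v = np.linalg.eigh(G)
    return U @ (v * np.exp(1j * delta * w)) @ v.conj().T


def check_telescoping(nu=7, dim=4, delta=1e-3, seed=0):
    """Return (identity holds, ||U^nu - V^nu||, nu * ||U - V||) in spectral norm."""
    rng = np.random.default_rng(seed)
    U = random_unitary(dim, rng)
    V = nearby_unitary(U, delta, rng)
    mp = np.linalg.matrix_power
    diff = mp(U, nu) - mp(V, nu)
    tele = sum(mp(U, k - 1) @ (U - V) @ mp(V, nu - k) for k in range(1, nu + 1))
    lhs = np.linalg.norm(diff, 2)
    bound = nu * np.linalg.norm(U - V, 2)
    return np.allclose(diff, tele), lhs, bound
```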

$$\begin{aligned}
F_{1,s}(x) &= i\eta_{s}\sum_{\{r,\gamma\}\in\text{Her}}\Pr(r,\gamma|s)P_{1,s}(r,\gamma) \\
&\quad+i\eta_{s}\sum_{\{r,\gamma\}\in\text{anti-Her}}\Pr(r,\gamma|s)(-i)P_{1,s}(r,\gamma) \\
&= i\eta_{s}\sum_{r,\gamma}\Pr(r,\gamma|s)(-i)^{\mathbbm{1}[P_{1,s}(r,\gamma):\text{anti-Her}]}P_{1,s}(r,\gamma).
\end{aligned} \tag{72}$$

$$\begin{aligned}
I+\eta_{\Sigma}V_{1,s}^{\prime} &= I+i\eta_{\Sigma}\sum_{r,\gamma}\Pr(r,\gamma|s)P_{1,s}^{\prime}(r,\gamma) \\
&= \sum_{r,\gamma}\Pr(r,\gamma|s)\left(I+i\eta_{\Sigma}P_{1,s}^{\prime}(r,\gamma)\right) \\
&= \sqrt{1+\eta_{\Sigma}^{2}}\sum_{r,\gamma}\Pr(r,\gamma|s)\exp\left(i\theta(\eta_{\Sigma})P_{1,s}^{\prime}(r,\gamma)\right) \\
&=: \sqrt{1+\eta_{\Sigma}^{2}}R_{1,s}(\eta_{\Sigma}),\quad s=2,3.
\end{aligned} \tag{75}$$
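The step in Eq. (75) that turns each term into a rotation relies on the fact that, for a Hermitian Pauli operator $P$ with $P^{2}=I$, one has $I+i\eta P=\sqrt{1+\eta^{2}}\exp(i\theta P)$. The sketch below verifies this numerically under the assumption that $\theta(\eta)=\arctan\eta$, which is the natural choice but is not defined in this section; the function name is hypothetical.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def pauli_rotation_form(P: np.ndarray, eta: float) -> np.ndarray:
    """sqrt(1 + eta^2) * exp(i*theta*P) with theta = arctan(eta), using P @ P = I."""
    theta = np.arctan(eta)                      # assumed definition of theta(eta)
    identity = np.eye(P.shape[0], dtype=complex)
    rotation = np.cos(theta) * identity + 1j * np.sin(theta) * P
    return np.sqrt(1.0 + eta**2) * rotation


if __name__ == "__main__":
    P = np.kron(X, Z)                           # a two-qubit Pauli string
    eta = 0.37
    print(np.allclose(np.eye(4) + 1j * eta * P, pauli_rotation_form(P, eta)))
```

The prefactor $\sqrt{1+\eta^{2}}$ is what enters the $1$-norm of the LCU formula, which is why pairing the identity with the leading-order term costs only second order in $\eta$.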

$$\begin{aligned}
\|J_{K,L}(\tau)\| &\leq \sum_{s=K}^{2K}\|C_{s}\|\frac{x^{s+1}}{(s+1)!}, \\
\|J_{K,res,2K}(\tau)\| &\leq \int_{0}^{x}d\tau\frac{(x-\tau)^{s}}{s!}\|J_{K}^{(s+1)}(\tau)\|, \\
\|M_{K}(\tau)\| &\leq \int_{0}^{\tau}d\tau_{1}\int_{0}^{\tau_{1}}d\tau_{2}\frac{(\tau_{1}-\tau_{2})^{(K-1)}}{(K-1)!}\|J_{K}^{(K)}(\tau_{2})\|.
\end{aligned} \tag{114}$$

$$\begin{aligned}
\varepsilon_{K}^{(nc)}(x) &= \|F_{K,res}^{(nc)}(x)\| \\
&\leq \int_{0}^{x}d\tau\left(\|M_{K}(\tau)\|\|J_{K,L}(\tau)\|+\|V_{K}(\tau)\|\|J_{K,res,2K}(\tau)\|\right) \\
&= \int_{0}^{x}d\tau\left(\|M_{K}(\tau)\|\|J_{K,L}(\tau)\|+\|J_{K,res,2K}(\tau)\|\right).
\end{aligned} \tag{115}$$
