This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Quantum Pricing with a Smile: Implementation of Local Volatility Model on Quantum Computer

Kazuya Kaneko kazuya-kaneko@fintec.co.jp Mizuho-DL Financial Technology Co., Ltd.
2-4-1 Kojimachi, Chiyoda-ku, Tokyo, 102-0083, Japan
   Koichi Miyamoto koichi-miyamoto@fintec.co.jp Mizuho-DL Financial Technology Co., Ltd.
2-4-1 Kojimachi, Chiyoda-ku, Tokyo, 102-0083, Japan
   Naoyuki Takeda naoyuki-takeda@fintec.co.jp Mizuho-DL Financial Technology Co., Ltd.
2-4-1 Kojimachi, Chiyoda-ku, Tokyo, 102-0083, Japan
   Kazuyoshi Yoshino kazuyoshi-yoshino@fintec.co.jp Mizuho-DL Financial Technology Co., Ltd.
2-4-1 Kojimachi, Chiyoda-ku, Tokyo, 102-0083, Japan
(December 22, 2024)
Abstract

Applications of the quantum algorithm for Monte Carlo simulation to pricing of financial derivatives have been discussed in previous papers. However, up to now, the pricing model discussed in such papers is Black-Scholes model, which is important but simple. Therefore, it is motivating to consider how to implement more complex models used in practice in financial institutions. In this paper, we then consider the local volatility (LV) model, in which the volatility of the underlying asset price depends on the price and time. We present two types of implementation. One is the register-per-RN way, which is adopted in most of previous papers. In this way, each of random numbers (RNs) required to generate a path of the asset price is generated on a separated register, so the required qubit number increases in proportion to the number of RNs. The other is the PRN-on-a-register way, which is proposed in the author’s previous work. In this way, a sequence of pseudo-random numbers (PRNs) generated on a register is used to generate paths of the asset price, so the required qubit number is reduced with a trade-off against circuit depth. We present circuit diagrams for these two implementations in detail and estimate required resources: qubit number and T-count.

pacs:
Valid PACS appear here
preprint: APS/123-QED

I Introduction

With recent advances of quantum computing technologies, researchers are beginning considering how to utilize them in industries. One major target is finance (see Orus for a review). Since financial institutions are performing enormous tasks of numerical calculation in their daily works, speed-up of such tasks by quantum computer can bring significant benefits to them. One of such tasks is pricing of financial derivatives111As textbooks of financial derivatives and pricing of them, we refer to Hull ; Shreve1 ; Shreve2 . Financial derivatives, or simply derivatives, are contracts in which payoffs are determined in reference to the prices of underlying assets at some fixed times. Large banks typically have a huge number of derivatives written on various types of assets such as stock price, foreign exchange rate, interest rate, commodity and so on. Therefore, pricing of derivatives is an important issue for them.

In derivative pricing, we represent random movements of underlying asset prices using stochastic processes and calculate a derivative price as a expected value of the sum of payoffs discounted by the risk-free interest rate under some specific probability measure. In order to calculate the expected value, Monte Carlo simulation is often used. There are quantum algorithms for Monte Carlo simulationMontanaro ; Suzuki , which bring quadratic speed-up compared with that on classical computers and there already exists some works which discuss application of such quantum algorithms to derivative pricingRebentrost ; Stamatopoulos ; RamosCalderer . However, in order to bring benefits to practice in finance, previous works have some room to be extended. That is, previous works consider the Black-Scholes (BS) modelBlackScholes ; Merton . Although the BS model is the pioneering model for derivative pricing and still used in many situations in today’s financial firms, it is insufficient to consider only the BS model as an application target of Monte Carlo for practical business for some reasons. First, for various types of derivatives, market prices of derivatives are inconsistent with the BS model. This phenomenon is called volatility smile, which we will explain in Section II. In order to precisely price derivatives taking into account volatility smiles, financial firms often use models which have more degree of freedom than the BS models. Second, the BS model is so simple that analytic formulae are available for the price of some types of derivatives in the model. In such cases, Monte Carlo simulation is not necessary. Actually, banks use Monte Carlo simulation mainly for complex models which can take into account volatility smiles. Although it is the natural first step to consider Monte Carlo in the BS model, the above points motivate us to consider how to apply quantum algorithms for Monte Carlo to the advanced models.

In this paper, we will focus on one of the advanced models, that is, the local volatility (LV) modelDupire . The LV model, which we will describe later, is the model in which a volatility of an asset price depends on the price itself and time, so the BS model is included in this category as a special case. With degrees of freedom to adjust the function form of volatility, the LV model can make derivative prices consistent with volatility smiles. So this model is widely used to price derivatives, especially exotic derivatives, which have complex transaction terms such as early redemption, in many banks.

In order to price a derivative by Monte Carlo simulation, we generate paths, that is, random trajectories of time evolution of asset prices, then calculate the expectation value of the sum of discounted payoffs which arise in each path. Since we cannot generate continuous paths on computers, we usually consider evolutions on a discretized time grid, using a random number (RN) for each time step, which represents the stochastic evolution in the step. In this paper, we focus on how to implement such a time evolution in the LV model on quantum computers.

We can consider two ways to implement the time evolution. In this paper, we call them the register-per-RN way and the PRN-on-a-register way. The difference between them is how to generate RNs required to generate a path. The register-per-RN is adopted in previous papersRebentrost ; Stamatopoulos ; RamosCalderer . In this way, following the procedure described in, e.g., Grover , one creates a superposition of bit strings which correspond to binary representations of possible values of a RN, where the probability amplitude of a each bit string is the square root of the possibility that the RN take the corresponding value. The point is that one register is used for one RN, so the required qubit number is proportional to the number of RNs used to generate a path. This can be problematic in terms of qubit number when many RNs are required. The number of RNs is equal to that of time steps times and that of underlying assets222In this paper, we consider arbitrage-free and complete markets, standard assumptions for derivative pricing, so the number of stochastic factors is equal to that of assets. For details, see Shreve1 ; Shreve2 .. The number of time steps can be large for derivatives with long maturity. Maturity can be as long as 30 years, so if we take time grid points monthly, the total number of time steps is 360. The number of underlying assets can be multiple, and furthermore, there are situations where we must simultaneously consider assets concerning different derivative contracts in a portfolio, for example, XVA333XVA is the term which collectively represents various types of Value Adjustment on derivative prices, for example, credit value adjustment (CVA), price subtraction taking into account the default of the counterparty. In this paper, we ignore such technical issues. If you are interested, see Gregory .. Assuming that the number of asset is 𝒪(10)\mathcal{O}(10) and that of time steps is 𝒪(100)\mathcal{O}(100), the total number of RNs becomes 𝒪(103104)\mathcal{O}(10^{3}-10^{4}). If we use a register with 𝒪(10)\mathcal{O}(10) qubits for each RN, the total qubit number can be 𝒪(105)\mathcal{O}(10^{5}) easily. The current state-of-art quantum computers have only 𝒪(10)\mathcal{O}(10) qubitsArute . Even if we obtain large-scale fault-tolerant machines in the future, the large qubit overhead might be required to make a logical bit (see Campbell as a review and references therein). Therefore, calculations which require the large number of qubits as above might be prohibitive even in the future. This situation is similar to credit portfolio risk measurement, which is described in Miyamoto .

We are then motivated to consider the PRN-on-a-register way, which is proposed in Miyamoto . In this way, one does not create RNs on different register, but generates a sequence of pseudo-random number (PRN) on a register. At each time step, the PRN sequence is progressed and its value is used to evolve the asset price. Therefore, the required qubit number does not depend on the number of RNs and is largely reduced. The drawback is the circuit depth. Here, we define the circuit depth as the number of layers consisting of gates on distinct qubits that can be performed simultaneously, as T-depth used in Amy2 ; Selinger . Since calculations to update the PRN is sequentially performed on a register, the circuit depth is now proportional to the number of RNs. Since in the fault-tolerant computation some kinds of gates, for example T-gates in the Clifford+T gate set, can take long time to be runBravyi ; Fowler , the sequential run of such gates might be also prohibitive in terms of calculation timeEgger . At any rate, in the current stage, where it is difficult to foresee the spec of future quantum computers, we believe that it is meaningful to consider the implementation which saves qubits but consumes depth as a limit.

When it comes to the LV model, the PRN-on-a-register way becomes more motivating, since its disadvantage on the circuit depth compared with the register-per-RN way is alleviated. In the LV model, the volatility varies over time steps depending on the asset price, so the calculation for the time evolution is necessarily stepwise444In the multi-asset case, parallel computing over assets is possible in the register-per-RN way.. Therefore, the PRN-on-a-register way and the register-per-RN way are equivalent with respect to this point, that is, the circuit depth is proportional to the time step number in both ways. This is different from the situation in credit portfolio risk management Egger , where, in the register-per-RN way, a register is assigned to each random number which determine whether each obligor defaults or not and parallel processing on different registers reduces circuit depth.

In this paper, we design the quantum circuits in the above two way in the level of elementary arithmetic. In doing so, we follow the policies of the two ways to the extent possible. That is, not only with respect to RNs but also in other aspects, we try to reduce qubits accepting some additional procedures in the PRN-on-a-register way, and vice versa in the register-per-RN way. For example, in the PRN-on-a-register way, we have to intermediately output the information of the volatility used to evolve the asset price at each time step and clear it by the next step. Otherwise, we need a register to hold the information per step and the required qubit number becomes proportional to the number of time steps. It is nontrivial to implement such a procedure and we will present how to do this later. Note that such clearing procedure is unnecessary in the register-per-RN way.

We then estimate the resources to implement the proposed circuits. We focus on two metrics: qubit number and T-count. As mentioned above, we see that the qubit number in the PRN-on-a-register way is independent from the time step number and much less than the register-per-RN way. The T-count is proportional to the time step number in the both ways. We see that in some specific setting the both ways yield the T-counts of same order of magnitude, except that in the PRN-on-a-register way is larger by some 𝒪(1)\mathcal{O}(1) factor.

The rest of this paper is organized as follows. Section II and III are preliminary sections, the former and the latter briefly explain the LV model and the quantum algorithm for Monte Carlo simulation, respectively. In section IV, we present the circuit diagram in the two way. In section V, we estimate qubit number and T-count of the proposed circuits. Section VI gives a summary.

II LV model

In this paper, we consider only the single-asset case, since it is straightforward to extend the discussion in this paper to the multi-asset case.

II.1 pricing of derivatives

Consider a party A involved in a derivative contract written on some asset. We let StS_{t} denote a stochastic process which represents the asset price at time tt, which is set as t=0t=0 at the present. We assume that the payoffs arise at the multiple times tipay,i=1,2,t^{\rm pay}_{i},i=1,2,... and the ii-th payoff is given by fipay(Stipay)f^{\rm pay}_{i}\left(S_{t^{\rm pay}_{i}}\right), where fipayf^{\rm pay}_{i} is some function which maps \mathbb{R} to \mathbb{R}. The positive payoff means that A receives a money from the counterparty and the negative one means vice versa. For example, the case where A buys an European call option with the strike KK corresponds to

f1pay(St1pay)=max{St1payK,0}f^{\rm pay}_{1}(S_{t^{\rm pay}_{1}})=\max\left\{S_{t^{\rm pay}_{1}}-K,0\right\} (1)

with a single payment date t1payt^{\rm pay}_{1}. Note that this type of derivative contract is too simple to cover all trades in financial markets. For example, callable contracts, in which either of the parties has a right to terminate the contract at some times, are widely dealt in markets. We leave studies for such exotic derivatives for future works and, in this paper, concider only those which can be expressed in the above form.

Following the theory of arbitrage-free pricing, the price VV of the contract for A is given as follows Shreve1 ; Shreve2 :

V=E[ifipay(Stipay)],V=E\left[\sum_{i}f^{\rm pay}_{i}\left(S_{t^{\rm pay}_{i}}\right)\right], (2)

where E[]E[\cdot] represents the expectation value under some probability measure, the so-called risk-neutral measure. Here and hereafter, we assume that the risk-free interest rate is 0 for simplicity.

II.2 LV model

In the LV model, the evolution of the asset price is modeled by the following stochastic differential equation (SDE)

dSt=σ(t,St)dWtdS_{t}=\sigma(t,S_{t})dW_{t} (3)

in the risk-neutral measure555Note that the drift term does not exist since we are now assuming the risk-free rate is 0. WtW_{t} is the Wiener process which drives StS_{t}. dXtdX_{t} is the increment of a stochastic process XtX_{t} over an infinitesimal time interval dtdt. The deterministic function σ:[0,)[0,)\sigma:[0,\infty)\otimes\mathbb{R}\rightarrow[0,\infty) is the local volatility. Note that the BS model corresponds to the case where

σ(t,S)=σBSS,\sigma(t,S)=\sigma_{\rm BS}S, (4)

where σBS\sigma_{\rm BS} is a positive constant, which we hereafter call a BS volatility.

The LV model was proposed to explain volatility smile. In order to describe this, let us define implied volatility first. In the BS model, a price of a European call option with strike KK and maturity TT at t=0t=0 is given by the following formula:

Vcall,BS(T,K,S0,σBS)\displaystyle V_{\rm call,BS}(T,K,S_{0},\sigma_{\rm BS}) =\displaystyle= ΦSN(d1)S0ΦSN(d2)K\displaystyle\Phi_{\rm SN}(d_{1})S_{0}-\Phi_{\rm SN}(d_{2})K
d1\displaystyle d_{1} =\displaystyle= 1σBST[ln(S0K)+12σBS2T]\displaystyle\frac{1}{\sigma_{\rm BS}\sqrt{T}}\left[\ln\left(\frac{S_{0}}{K}\right)+\frac{1}{2}\sigma_{\rm BS}^{2}T\right]
d2\displaystyle d_{2} =\displaystyle= d1σBST,\displaystyle d_{1}-\sigma_{\rm BS}\sqrt{T}, (5)

where ΦSN\Phi_{\rm SN} is the cumulative distribution function of the standard normal distribution. We can price the option if we determine the BS volatility. Conversely, given the market price of the option Vcall,mkt(T,K)V_{\rm call,mkt}(T,K), we can reversely calculate the BS volatility. That is, we can define the following function of KK and TT:

σIV\displaystyle\sigma_{\rm IV} :\displaystyle: (T,K)\displaystyle(T,K)\mapsto
σIV(T,K)s.t.Vcall,BS(T,K,S0,σIV(T,K))=Vcall,mkt(T,K).\displaystyle\sigma_{\rm IV}(T,K)\ s.t.\ V_{\rm call,BS}(T,K,S_{0},\sigma_{\rm IV}(T,K))=V_{\rm call,mkt}(T,K).

We call BS volatilities drawn back from the market option prices by (LABEL:eq:IV) as implied volatilities.

If the market is described well by the BS model, implied volatilities σIV(T,K)\sigma_{\rm IV}(T,K) take a same value for any KK and TT. Although this is the case for some markets, σIV(T,K)\sigma_{\rm IV}(T,K) varies depending on KK and TT in many markets. Especially, if σIV(T,K)\sigma_{\rm IV}(T,K) depends on KK, it is said that we observe the volatility smile for the market.

Volatility smiles mean that possible scenarios of asset price evolution in the BS model do not match those which market participants consider. For example, if market participants think that extreme scenarios, big crashes or sharp rises, are more possible than the BS model predicts, the volatility smile arises. In fact, it is often said that the Black Monday, the big crash in the stock markets at 1987, was one of triggers of appearance of volatility smiles.

With the LV model, we can make European option prices given by the model consistent with any market prices, as long as there is no arbitrage in the market. This is intuitively apparent since we can expect that the degree of freedom of the local volatility σ(t,S)\sigma(t,S) as a two-dimensional function is available to reproduce the two-dimensional function Vcall,mkt(T,K)V_{\rm call,mkt}(T,K). In fact, if we can get Vcall,mkt(T,K)V_{\rm call,mkt}(T,K) as a function, that is, the market option prices for continuously infinite strikes and maturities, we can determine σ(T,K)\sigma(T,K) which reproduces Vcall,mkt(T,K)V_{\rm call,mkt}(T,K) as followsDupire :

σ2(T,K)=2TVcall,mkt(T,K)2K2Vcall,mkt(T,K).\sigma^{2}(T,K)=2\frac{\frac{\partial}{\partial T}V_{\rm call,mkt}(T,K)}{\frac{\partial^{2}}{\partial K^{2}}V_{\rm call,mkt}(T,K)}. (7)

In reality, the market option prices are available only for several strikes and maturities. Therefore, in the practical business, we usually use a specific functional form of σ(t,S)\sigma(t,S) with degrees of freedom sufficient to reproduce several available market option prices. In this paper, we use the following form. First, we set the ntn_{t} grid points in the time axis, t0=0<t1<t2<<tntt_{0}=0<t_{1}<t_{2}<...<t_{n_{t}}. Second, we set the nSn_{S} grid points in the asset price axis for each time grid point, that is, si,1,,si,nSs_{i,1},...,s_{i,n_{S}} for tit_{i}. Then, σ(t,S)\sigma(t,S) is set as follows:

σ(t,S)=ai,jS+bi,j;forsi,j1S<si,j,j=1,,nS+1\sigma(t,S)=a_{i,j}S+b_{i,j}\ ;{\rm for}\ s_{i,j-1}\leq S<s_{i,j},j=1,...,n_{S}+1 (8)

for ti1t<tit_{i-1}\leq t<t_{i}, where ai,j,bi,ja_{i,j},b_{i,j} are constants satisfying σ(t,S)>0\sigma(t,S)>0 for any tt and SS and si,0=,si,nS+1=+s_{i,0}=-\infty,s_{i,n_{S}+1}=+\infty. In other words, the two-dimensional space of (t,S)(t,S) is divided in the direction of tt and in each region σ(t,S)\sigma(t,S) is set to a function which is piecewise-linear with respect to SS. In this paper, we assume that ai,j,bi,ja_{i,j},b_{i,j} are preset to the value for which the option prices in the LV model, which we here write as Vcall,LV(T,K,S0,{ai,j},{bi,j})V_{\rm call,LV}(T,K,S_{0},\{a_{i,j}\},\{b_{i,j}\}), match the market prices by some standard. For example, they can be set to

argmin{ai,j},{bi,j}I[Vcall,LV(TI,KI,S0,{ai,j},{bi,j})Vcall,mkt(TI,KI)]2,\mathop{\rm argmin}\limits_{\{a_{i,j}\},\{b_{i,j}\}}\sum_{I}{\left[V_{\rm call,LV}(T_{I},K_{I},S_{0},\{a_{i,j}\},\{b_{i,j}\})-V_{\rm call,mkt}(T_{I},K_{I})\right]^{2}}, (9)

where (TI,KI)(T_{I},K_{I})’s are several sets of maturity and strike for which the market option prices are available.

II.3 Monte Carlo simulation in the LV model

We here describe how to calculate the derivative price (2) in the LV model by Monte Carlo simulation.

First, we have to discretize the time into sufficiently small meshes, since we can deal with the continuous time on neither classical nor quantum computers. For simplicity, we set the time grid points to the above tit_{i}’s, those for the LV function. Then, the time evolution given by (3) is approximated as follows:

ΔSti:=Sti+1Stiσ(ti,Sti)Δtiwi,Δti=ti+1ti,\Delta S_{t_{i}}:=S_{t_{i+1}}-S_{t_{i}}\approx\sigma(t_{i},S_{t_{i}})\sqrt{\Delta t_{i}}w_{i},\Delta t_{i}=t_{i+1}-t_{i}, (10)

where w1,,wntw_{1},...,w_{n_{t}} are independent standard normal random numbers (SNRNs). Among various ways to discretize the SDE, we here adopt the Euler-Maruyama method Maruyama .

Even after time discretization, we cannot consider all of continuously infinite patterns of SNRNs. One solution for this is discretized approximation of SNRNs. We can choose the finite numbers of the grid points and assign probability to each point so that standard normal distribution is approximately reproduced. Now, the patterns of discretized SNRNs are finite, so we can approximate (2) as

Vnpnifipay(Stipay(n)),V\approx\sum_{n}p_{n}\sum_{i}f^{\rm pay}_{i}\left(S^{(n)}_{t^{\rm pay}_{i}}\right), (11)

where pnp_{n} is the probability that the nn-th pattern of values of SNRNs are realized and St(n)S^{(n)}_{t} is the asset price at time tt in the nn-th pattern.

There are some possible ways to take petterns considered in (11). In the register-per-RN way, we take all patterns. If we take NN grids to discretize each of ntn_{t} SNRNs, the number of possible patterns of SNRNs is NntN^{n_{t}}. Although this is exponentially large, quantum computers can take into account all patterns with a polynomial number of qubits by quantum superposition.

On the other hand, this cannot be adopted on classical computers, since the number of the SNRN patterns are exponentially large. Usually, Monte Carlo pricing on classical computers is done in the following way, which the PRN-on-a-register way is also based on. We consider sampled patterns of SNRNs. That is, we generate finite but sufficiently many sample sets of (w1,,wnt)(w_{1},...,w_{n_{t}}) and use them to generate sample paths of the asset price which evolves according to (10). We then approximate (2) by the average of sums of payoffs in sample paths,

V1Npathn=1Npathifipay(Stipay(n)),V\approx\frac{1}{N_{\rm path}}\sum_{n=1}^{N_{\rm path}}\sum_{i}f^{\rm pay}_{i}\left(S^{(n)}_{t^{\rm pay}_{i}}\right), (12)

where St(n)S^{(n)}_{t} is the value of the asset price at time tt on the nn-th sample path and NpathN_{\rm path} is the number of sample paths.

III Quantum Algorithm for Monte Carlo Simulation

III.1 outline of the algorithm

We here review the quantum algorithm for Monte Carlo simulationMontanaro ; Suzuki . It can be divided into the following steps. First, we create a superposition of possible values of a random number used to calculate a sample value of the integrand on a register. If multiple random numbers are necessary to calculate the integrand, one register is assigned per random number. As mentioned above, continuous random numbers must be approximated in some discretized way. Second, we calculate the integrand into another register, using the random numbers. Note that the results for many patterns of random numbers are simultaneously calculated in quantum parallelism. Third, by controlled rotation, the integrand value is reflected into the amplitude of the ancilla. Finally, amplitude estimation Bassard ; Suzuki ; Nakaji on the ancilla gives the expectation value of the integrand.

The quantum state is transformed as follows:

|0|0|0\displaystyle\ket{0}\ket{0}\ket{0} (13)
\displaystyle\rightarrow (ipi|xi)|0|0\displaystyle\left(\sum_{i}{\sqrt{p_{i}}\ket{x_{i}}}\right)\ket{0}\ket{0}
\displaystyle\rightarrow (ipi|xi|f(xi))|0\displaystyle\left(\sum_{i}{\sqrt{p_{i}}\ket{x_{i}}}\ket{f(x_{i})}\right)\ket{0}
\displaystyle\rightarrow ipi|xi|f(xi)(1f(xi)|0+f(xi)|1)\displaystyle\sum_{i}{\sqrt{p_{i}}\ket{x_{i}}}\ket{f(x_{i})}\left(\sqrt{1-f(x_{i})}\ket{0}+\sqrt{f(x_{i})}\ket{1}\right)
.

Here, the first, second and third kets correspond to the random number registers, the integrand register and the ancilla, respectively. xix_{i} represents the binary representation of values of random numbers in the ii-th pattern and pip_{i} is the probability that it realizes. ff is the integrand and f(xi)f(x_{i}) is its value for xix_{i}. Note that the probability to observe 1 on the ancilla is ipif(xi)\sum_{i}{p_{i}f(x_{i})}, the integral value which we want. Although we do not explain how to estimate the probability amplitude in this paper, it is studied in many papers. For example, see Bassard ; Suzuki ; Nakaji . Using such methods, we can estimate the integral with the statistical error which decays as 𝒪(N1)\mathcal{O}(N^{-1}), where NN is the number of oracle calls. This decay rate is quadratically faster than that in the classical algorithm, 𝒪(N1/2)\mathcal{O}(N^{-1/2}).

III.2 the scheme using the PRN generator

We here briefly review the quantum way for Monte Calro simulation using the PRN generator. The calculation flow for the current problem, the time evolution of asset price in the LV model, based on this way is described in Section IV.2.

It is proposed in Miyamoto in order to reduce the required qubits to generate RNs in the application of the quantum algorithm for Monte Carlo to extremely high-dimensional integrations. When it is neccesary to generate many RNs to compute the integrand, the naive way, in which we assign a register to each RN and create a superposition of possible values, leads to the increase of qubit numbers in proportion to the number of RNs. In order to aviod this, we can adopt the following way. First, we prepare two registers, RsampR_{\rm samp} and RPRNR_{\rm PRN}. Then, we create a superposition of integers, which specify the start point of the PRN sequence, on RsampR_{\rm samp}. With the start point, we sequentially generate PRNs on RPRNR_{\rm PRN}. This is possible because a PRN sequence is a deterministic sequence whose recursion equation is explicitely given, and in Miyamoto we gave the implementation of one of PRN generators on quantum circuits. Using the PRNs, we compute the integrand step by step, which corresponds to time evolution of the asset price and calculation of payoffs in this paper. Finally, the expectation value of the integrand is estimated by quantum amplitude estimation. In this way, since we need only RsampR_{\rm samp} and RPRNR_{\rm PRN} to generate PRNs, the required qubit number is now independent from the number of RNs and much smaller than the naive way. The drawback is the increase of the circuit depth.

IV Circuit Design

Now, we present quantum circuits for time evolution of an asset price in the LV model in the two ways: PRN-on-a-register and register-per-RN.

IV.1 elementary gate

Before we present circuits, we here list up elementary gates we use.

  • Adder: |x|y|x+y|y\ket{x}\ket{y}\rightarrow\ket{x+y}\ket{y}

  • Controlled Adder:
    |c|x|y{|c|x+y|y;forc=1|c|x|y;forc=0\ket{c}\ket{x}\ket{y}\rightarrow\begin{cases}\ket{c}\ket{x+y}\ket{y}\ ;\ {\rm for}\ c=1\\ \ket{c}\ket{x}\ket{y}\ ;\ {\rm for}\ c=0\end{cases}

  • Multiplier: |x|y|z|x|y|z+xy\ket{x}\ket{y}\ket{z}\rightarrow\ket{x}\ket{y}\ket{z+xy}

  • Divider: |x|y|0|x|y|x/y\ket{x}\ket{y}\ket{0}\rightarrow\ket{x}\ket{y}\ket{x/y}

We here simply assume their existence. Actually, implementation of such elementary arithmetics are widely studied in previous works: see, for example, Vedral ; Beckman ; Draper ; Cuccaro ; Takahashi ; VanMeter ; Draper2 ; Takahashi2 ; Portugal ; AlvarezSanchez ; Takahashi3 ; Thapliyal ; Thapliyal2 ; Lin ; Babu ; Jayashree ; MunozCoreas ; Khosropour ; Jamal ; Dibbo ; Thapliyal3 . We will discuss the implementation in Section V.1.

With these, we can construct other types of arithmetic we use. For example, subtraction |x|y|xy|y\ket{x}\ket{y}\rightarrow\ket{x-y}\ket{y} can be done as addition by the method of 2’s complement. Comparison |x|y|z|x|y|z(x>y)\ket{x}\ket{y}\ket{z}\rightarrow\ket{x}\ket{y}\ket{z\oplus(x>y)} can be done as subtraction in 2’s complement method, since the most significant bit represents whether the result of subtraction is positive or negative. So a comparator is constructed as two adder, including uncomputation.

Note that the above multiplier uses two registers as operands and outputs the product into another register. Most of previously proposed multipliers are of this output-to-other type. On the other hand, we also need the self-update type of multiplier which updates either of input registers with the product, otherwise we have to add a register for each multiplication and qubit number explodes. Such a operation is realized by the following trick. When we want to multiply xx by yy, given the two registers holding xx and yy and an ancilla register, we can do:

|x|y|0|x|y|xy|xy|y|x|xy|y|0.\ket{x}\ket{y}\ket{0}\rightarrow\ket{x}\ket{y}\ket{xy}\rightarrow\ket{xy}\ket{y}\ket{x}\rightarrow\ket{xy}\ket{y}\ket{0}. (14)

Here, the first step is output-to-other multiplication. The second step is swap between the first and third registers, which is not necessary if we change our recognition on which of two register is ancillary at every multiplication. The third step is the inverse operation of division.

IV.2 the PRN-on-a-register way

IV.2.1 calculation flow

We first present the calculation flow in the PRN-on-a-register way. We consider the flow until calculation of the sum of payoffs, which corresponds to from the first to the third line in (13), since the controlled rotation in the fourth line does not depend on the problem.

In the PRN-on-a-register way, PRNs are used for evolution of the asset price (10). More concretely, we preselect some sequence of pseudo standard normal random numbers (PSNRNs) and divide it into subsequences, then evolve the asset price using them.

Before we present the calculation flow, we explain some setups. We prepare the following register:

  • RsampR_{\rm samp}

    This is a register where a superposition of integers which determine the start point of the PSNRN sequence. We write its qubit number as nsampn_{\rm samp}. Nsamp=2nsampN_{\rm samp}=2^{n_{\rm samp}} is the number of sample paths we generate.

  • RWR_{W}

    This is a register where we sequentially generate PSNRNs.

  • RSR_{S}

    This is a register where the value of the asset price is stored and which we update for each time step of the evolution, using RWR_{W}.

  • RpayoffR_{\rm payoff}

    This is a register into which the payoffs determined by RSR_{S} are added.

Note that we need some ancillary registers in addition to the above registers. We assume that the required calculation precision is ndign_{\rm dig}-bit accuracy and RW,RS,RpayoffR_{W},R_{S},R_{\rm payoff} and ancillary registers necessary to update them have ndign_{\rm dig} qubits.

We assume that the following gates are available to generate a sequence of PSNRNs.

  • PWP_{W}

    This progresses a PSNRN sequence by one step. In other words, it acts on RWR_{W} and updates xix_{i} to xi+1x_{i+1}, where xix_{i} is the ii-th element of the sequence: |xi|xi+1\ket{x_{i}}\rightarrow\ket{x_{i+1}}.

  • JWJ_{W}

    This lets the PSNRN sequence jump to the starting point. That is, it is input an integer ii on a register and outputs xint+1x_{in_{t}+1} into another register which is initially set to |0\ket{0}: |i|0|i|xint+1\ket{i}\ket{0}\rightarrow\ket{i}\ket{x_{in_{t}+1}}. Remember that ntn_{t}, the number of time steps, is equal to the number of RNs used to generate one sample path.

The concrete implementation of these gates are discussed later.

Then, the calculation flow is as follows:

  1. 1.

    Initialize all registers to |0\ket{0} except RSR_{S}, which is initialized to |St0\ket{S_{t_{0}}}.

  2. 2.

    Generate a equiprobable superposition of |0,|1,,|Nsamp1\ket{0},\ket{1},...,\ket{N_{\rm{samp}}-1}, that is, 1Nsampi=0Nsamp1|inPRN\frac{1}{\sqrt{N_{\rm{samp}}}}\sum_{i=0}^{N_{\rm{samp}}-1}{\ket{i}_{n_{\rm{PRN}}}} on RsampR_{\rm samp}. This is done by operating a Hadamard gate to each of nsampn_{\rm{samp}} qubits.

  3. 3.

    Operate JWJ_{W} to set xint+1x_{in_{t}+1} to RWR_{W}, where ii is determined by the state of RsampR_{\rm{samp}}. These are the starting points of subsequences.

  4. 4.

    Perform the time evolution (10) using the value on RWR_{W}. RSR_{S} is updated from |St0\ket{S_{t_{0}}} to |St1\ket{S_{t_{1}}}.

  5. 5.

    Calculate the payoff at time t1t_{1} and add into RpayoffR_{\rm payoff}.

  6. 6.

    Operate PPRNP_{\rm{PRN}} to update RWR_{W} from xint+1x_{in_{t}+1} to xint+2x_{in_{t}+2}.

  7. 7.

    Iterate operations similar to 4-6 for each time steps until the time reaches tntt_{n_{t}}.

  8. 8.

    Finally we obtain a superposition of states in which the value on RpayoffR_{\rm payoff} is the sum of payoffs in each sample path. Estimate the expectation value of RpayoffR_{\rm payoff} by methods like Bassard ; Suzuki . This is an estimate for (12).

Here and hereafter, we assume that a payoff arises at each time step, for simplicity.

The flow of state transformation is as follows:

|0|0|St0|0\displaystyle\ket{0}\ket{0}\ket{S_{t_{0}}}\ket{0}
2\displaystyle\xrightarrow{2} 1Nsampi=0Nsamp1\displaystyle\frac{1}{\sqrt{N_{\rm{samp}}}}\sum_{i=0}^{N_{\rm{samp}}-1} |i|0|St0|0\displaystyle{\ket{i}}\ket{0}\ket{S_{t_{0}}}\ket{0}
3\displaystyle\xrightarrow{3} 1Nsampi=0Nsamp1\displaystyle\frac{1}{\sqrt{N_{\rm{samp}}}}\sum_{i=0}^{N_{\rm{samp}}-1} |i|xint+1|St0|0\displaystyle{\ket{i}}\ket{x_{in_{t}+1}}\ket{S_{t_{0}}}\ket{0}
4\displaystyle\xrightarrow{4} 1Nsampi=0Nsamp1\displaystyle\frac{1}{\sqrt{N_{\rm{samp}}}}\sum_{i=0}^{N_{\rm{samp}}-1} |i|xint+1|St1(i)|0\displaystyle{\ket{i}}\ket{x_{in_{t}+1}}\ket{S^{(i)}_{t_{1}}}\ket{0}
5\displaystyle\xrightarrow{5} 1Nsampi=0Nsamp1\displaystyle\frac{1}{\sqrt{N_{\rm{samp}}}}\sum_{i=0}^{N_{\rm{samp}}-1} |i|xint+1|St1(i)|j=11fjpay(Stj(i))\displaystyle{\ket{i}}\ket{x_{in_{t}+1}}\ket{S^{(i)}_{t_{1}}}\ket{\sum_{j=1}^{1}{f^{\rm pay}_{j}(S^{(i)}_{t_{j}})}}
6\displaystyle\xrightarrow{6} 1Nsampi=0Nsamp1\displaystyle\frac{1}{\sqrt{N_{\rm{samp}}}}\sum_{i=0}^{N_{\rm{samp}}-1} |i|xint+2|St1(i)|j=11fjpay(Stj(i))\displaystyle{\ket{i}}\ket{x_{in_{t}+2}}\ket{S^{(i)}_{t_{1}}}\ket{\sum_{j=1}^{1}{f^{\rm pay}_{j}(S^{(i)}_{t_{j}})}}
7\displaystyle\xrightarrow{7} \displaystyle...\qquad\quad\quad\quad
7\displaystyle\xrightarrow{7} 1Nsampi=0Nsamp1\displaystyle\frac{1}{\sqrt{N_{\rm{\rm{samp}}}}}\sum_{i=0}^{N_{\rm{\rm{samp}}}-1} |i|xint+nt|Stnt(i)|j=1ntfjpay(Stj(i)),\displaystyle{\ket{i}}\ket{x_{in_{t}+n_{t}}}\ket{S^{(i)}_{t_{n_{t}}}}\ket{\sum_{j=1}^{n_{t}}{f^{\rm pay}_{j}(S^{(i)}_{t_{j}})}}, (15)

where the first, second, third and fourth kets correspond to Rsamp,RW,RSR_{\rm samp},R_{W},R_{S} and RpayoffR_{\rm payoff}, respectively.

IV.2.2 overview of the circuit

Schematically, the circuit which realizes the flow (15) is as shown in Figure 1. In the figure, the gate UjU_{j} corresponds to the jj-th step of asset price evolution, that is, the jj-th iteration of step 4-6 in the above calculation flow.

UjU_{j} is implemented as shown in Figure 2. PWP_{W} is already explained and the gate Payoffj{\rm Payoff}_{j} calculates fjpay(Stj(i))f^{\rm pay}_{j}(S^{(i)}_{t_{j}}) using RSR_{S} and adds it into RpayoffR_{\rm payoff}. In addition to these, UjU_{j} has gates V1(j),,VnS(j)V^{(j)}_{1},...,V^{(j)}_{n_{S}}, which update RSR_{S}.

The detail of Vk(j)V^{(j)}_{k} is shown in Figure 3. This gate (i) checks whether the asset price is in the kk-th interval [sj,k1,sj,k)[s_{j,k-1},s_{j,k}), (ii) if so, update RSR_{S} using the LV in the interval, (iii) clears the intermediate output. It requires ancillary registers Rcount,RSR_{\rm count},R_{S^{\prime}} and RgR_{g}. They have log2nt,ndig\lceil\log_{2}{n_{t}}\rceil,n_{\rm dig} and 1 qubits respectively. At the start of Vk(j)V^{(j)}_{k}, RcountR_{\rm count} takes |j\ket{j} or |j+1\ket{j+1} and the others take |0\ket{0}. Then the detailed calculation flow is:

  1. 1.

    If RcountR_{\rm count} is jj and RSR_{S} is in [sj,k1,sj,k)[s_{j,k-1},s_{j,k}), flip RgR_{g}.

  2. 2.

    If RgR_{g} is 1, update RSR_{S} as

    StjStj+1=Stj+(aj,kStj+bj,k)Δtjxint+jS_{t_{j}}\rightarrow S_{t_{j+1}}=S_{t_{j}}+(a_{j,k}S_{t_{j}}+b_{j,k})\sqrt{\Delta t_{j}}x_{in_{t}+j} (16)

    using the value xint+jx_{in_{t}+j} on RWR_{W} and add 1 to RcountR_{\rm count}.

  3. 3.

    Calculate

    Stj+1bj,kΔtjxint+j1+aj,kΔtjxint+j\frac{S_{t_{j+1}}-b_{j,k}\sqrt{\Delta t_{j}}x_{in_{t}+j}}{1+a_{j,k}\sqrt{\Delta t_{j}}x_{in_{t}+j}} (17)

    into RSR_{S^{\prime}}, using the value on RSR_{S} as Stj+1S_{t_{j+1}} and that on RWR_{W} as xint+jx_{in_{t}+j}.

  4. 4.

    If RcountR_{\rm count} is j+1j+1 and RSR_{S^{\prime}} is in [sj,k1,sj,k)[s_{j,k-1},s_{j,k}), flip RgR_{g}. This uncomputes RgR_{g}.

  5. 5.

    Do the inverse operation of 3.

Let us explain the meaning of this flow. First, RcountR_{\rm count} is necessary as an indicator of whether the jj-th step of evolution has been already done or not. Without this, it is possible that the asset price is doubly updated in a time step. If and only if the jj-th step has not been done, that is, RcountR_{\rm count} is jj and the asset price is in [sj,k1,sj,k)[s_{j,k-1},s_{j,k}), the update of the asset price with the LV function aj,kS+bj,ka_{j,k}S+b_{j,k} is done. To do this conditional update, the check result is intermediately output to RgR_{g} and the gate corresponding (16) is operated on RSR_{S} under control by RgR_{g}. Besides, the increment of RcountR_{\rm count} controlled by RgR_{g} is also done, so that RcountR_{\rm count} indicates completion of the jj-th step if so. Steps 3-5 is necessary to clear RgR_{g}. If the asset price has been updated in Step 2, Step 3 draws back it to the value before the update. Conversely, we can determine whether the update has been done in Step 2 from the result of Step 3. That is, for the reason mentioned soon later, the condition that RcountR_{\rm count} is j+1j+1 and RSR_{S^{\prime}} is in [sj,k1,sj,k)[s_{j,k-1},s_{j,k}) after Step 3 is equivalent to the condition that RcountR_{\rm count} is jj and RSR_{S} is in [sj,k1,sj,k)[s_{j,k-1},s_{j,k}) before Step 2. Therefore, Step 4 flip RgR_{g} if and only if it is |1\ket{1}, so it goes back to |0\ket{0}. In summary, through the sequential operation of V1(j),,VnS+1(j)V^{(j)}_{1},...,V^{(j)}_{n_{S}+1}, RSR_{S} is updated only once at the appropriate Vk(j)V^{(j)}_{k}, RcountR_{\rm count} is updated from |j\ket{j} to |j+1\ket{j+1} and all intermediate outputs on ancillary registers are cleared.

We here mention a restriction on the LV model so that it can be implemented in the PRN-on-a-register way. Note that through V1(j),,VnS+1(j)V^{(j)}_{1},...,V^{(j)}_{n_{S}+1}, the state is transformed from |j|Stj(i)\ket{j}\ket{S^{(i)}_{t_{j}}} to |j+1|Stj+1(i)\ket{j+1}\ket{S^{(i)}_{t_{j+1}}}, where the first and second kets correspond to RcountR_{\rm count} and RSR_{S} respectively and other registers are omitted since they are unchanged. This means that the map from Stj(i)S^{(i)}_{t_{j}} to Stj+1(i)S^{(i)}_{t_{j+1}} must be one-to-one correspondence, since unitarity is violated if not. Actually, this is not so strong restriction. As shown in Appendix, if we set ai,j,bi,ja_{i,j},b_{i,j} so that σ(t,S)\sigma(t,S) is continuous with respect to SS and we set Δtj\Delta t_{j} is small enough that the increment ΔStj\Delta S_{t_{j}} is much smaller than StjS_{t_{j}} itself, the above condition is satisfied.

This one-to-one correspondence lets Step 3 work. That is, since the map between Stj(i)S^{(i)}_{t_{j}} and Stj+1(i)S^{(i)}_{t_{j+1}} is one-to-one correspondence, the result of Step 3 is in [sj,k1,sj,k)[s_{j,k-1},s_{j,k}) if and only if the value on RSR_{S} before Step 3 is in the image of [sj,k1,sj,k)[s_{j,k-1},s_{j,k}) under the map.

IV.2.3 implementation of respective gates

We now consider how to implement respective gates in the PRN-on-a-register way to the level of four arithmetic operations.

(i) Vk(j)V^{(j)}_{k}

Note that most parts of Vk(j)V^{(j)}_{k} consist of only arithmetic operations, addition, subtraction, multiplication, division and comparison, which mentioned in Sec IV.1.

For example, the gate zz(x=jandyI)z\leftarrow z\oplus(x=j\ {\rm and}\ y\in I) can be divided to two parts. The first part is checking that the value on RcountR_{\rm count} is equal to jj and this can be done by the multiple control Toffoli gate, which is studied in Selinger ; Amy ; Maslov . The second part is checking that the asset price is in a given interval, which can be constructed from two comparisons. Combining these, the gate zz(x=jandyI)z\leftarrow z\oplus(x=j\ {\rm and}\ y\in I) in Figure 3 is constructed as shown in Figure 4. Note that the bitwise flips X1j0X1jnx1X^{1-j_{0}}\otimes...\otimes X^{1-j_{n_{x}-1}} are operated before the multi control Toffoli. Here, jaj_{a} is the aa-th digit of the binary representation of jj, so the aa-th qubit is flipped if and only if ja=0j_{a}=0. This convert |x\ket{x} to |1|1\ket{1}...\ket{1} if and only if x=jx=j.

The operation xx+(ax+b)yx\leftarrow x+(ax+b)y in Figure 3 can be realized as follows:

|x|y|0\displaystyle\ket{x}\ket{y}\ket{0} \displaystyle\rightarrow |x|y|1\displaystyle\ket{x}\ket{y}\ket{1} (18)
\displaystyle\rightarrow |x|y|1+ay\displaystyle\ket{x}\ket{y}\ket{1+ay}
\displaystyle\rightarrow |(1+ay)x|y|1+ay\displaystyle\ket{(1+ay)x}\ket{y}\ket{1+ay}
\displaystyle\rightarrow |(1+ay)x+by|y|1+ay\displaystyle\ket{(1+ay)x+by}\ket{y}\ket{1+ay}
\displaystyle\rightarrow |(1+ay)x+by|y|0,\displaystyle\ket{(1+ay)x+by}\ket{y}\ket{0},

where the third ket corresponds to an ancillary register. The first arrow is just setting a constant on a register. The second arrow is the multiplication by a constant aa. The third arrow is the self-update multiplication, so it needs another ancillary register. The fourth arrow is again multiplication by a constant bb and the final arrow is uncomputation of the first and second arrows. Note that this is done under control by RgR_{g}. In order for this to be controlled, it is sufficient to control only the second, fourth and final arrows, since the third arrow becomes a multiplication by 1 without the second. Also note that multiplication by a nn-bit constant aa or bb can be done by nn adders, that is, nn shift-and-add’s: ax=i=0n1ai2ixax=\sum_{i=0}^{n-1}{a_{i}2^{i}x}, where aia_{i} is the ii-th bit of aa. This saves qubits compared with the case where we use a multiplier, which needs holding aa on an ancillary register.

The operation x(xby)/(1+ay)x\leftarrow(x-by)/(1+ay) in Figure 3 is done as follows:

|x|y|0|0\displaystyle\ket{x}\ket{y}\ket{0}\ket{0} \displaystyle\rightarrow |x|y|1|0\displaystyle\ket{x}\ket{y}\ket{1}\ket{0} (19)
\displaystyle\rightarrow |x|y|1+ay|0\displaystyle\ket{x}\ket{y}\ket{1+ay}\ket{0}
\displaystyle\rightarrow |xby|y|1+ay|0\displaystyle\ket{x-by}\ket{y}\ket{1+ay}\ket{0}
\displaystyle\rightarrow |xby|y|1+ay|(xby)/(1+ay),\displaystyle\ket{x-by}\ket{y}\ket{1+ay}\ket{(x-by)/(1+ay)},

where the first, second, third and fourth kets are RS,RWR_{S},R_{W}, another ancillary register and RSR_{S^{\prime}}, respectively. Here, the first and second arrows are same as (18), the third is the multiplication by a constant b-b and the final one is division. Here, we do not have to uncompute RSR_{S} and the ancillary register, since the whole of this operation is uncomputed soon later in Vj,kV_{j,k}.

(ii) JW,PWJ_{W},P_{W}

In Miyamoto , implementation of PRN on quantum circuits is presented, using the PRN generator called PCGPCG . Note that this PRN generator generates uniform RNs. We now need PSNRNs, so we must transform uniform distribution to standard normal distribution. There are some method and we adopt the inverse transform sampling. Schematically, JWJ_{W} and PWP_{W} are implemented as shown in Figure 5. Here, JPRNJ_{PRN} is the gate to let the PRN sequence jump to the int+1in_{t}+1 and PPRNP_{PRN} is the gate to progress the PRN sequence by a step. They sequentially generate uniform RNs on the ancillary register RPRNR_{\rm PRN} and they are transformed to PSNRN on RWR_{W}.

Although we refer to Miyamoto for the detail of implementation of the PRN generator, we here briefly explain. This generator is combination of linear congruential generator (LCG) and permutation of bit string. For LCG, update of the PRN sequence is done by

xn+1=axn+cmodN,x_{n+1}=ax_{n}+c\ {\rm mod}\ N, (20)

where a,Na,N are positive integers and cc is a nonnegative integer, and the nn-th element of the sequence is computed from nn and the initial value x0x_{0} by

xn=anx0+c(an1)a1modN,.x_{n}=a^{n}x_{0}+\frac{c(a^{n}-1)}{a-1}\ {\rm mod}\ N,. (21)

According to Vedral , we can construct the modular adder using 5 plain adders. Modular multiplication by a nn-bit constant can be done as nn modular shift-and-add’s. Modular division by a constant a1a-1 can be done as modular multiplication by a constant integer β\beta such that β(a1)=1modN\beta(a-1)=1\ {\rm mod}\ N, if exists. Modular exponentiation axmodNa^{x}\ {\rm mod}\ N is computed as a sequence controlled modular multiplicationVedral . So, to summarize, we can perform (20) and (21) using only controlled adders. In (20), the state is transformed as

|x|0|0\displaystyle\ket{x}\ket{0}\ket{0} \displaystyle\rightarrow |x|axmodN|0\displaystyle\ket{x}\ket{ax\ {\rm mod}\ N}\ket{0} (22)
\displaystyle\rightarrow |0|axmodN|0\displaystyle\ket{0}\ket{ax\ {\rm mod}\ N}\ket{0}
\displaystyle\rightarrow |0|axmodN|c\displaystyle\ket{0}\ket{ax\ {\rm mod}\ N}\ket{c}
\displaystyle\rightarrow |0|ax+cmodN|c\displaystyle\ket{0}\ket{ax+c\ {\rm mod}\ N}\ket{c}
\displaystyle\rightarrow |0|ax+cmodN|0.\displaystyle\ket{0}\ket{ax+c\ {\rm mod}\ N}\ket{0}.

In the circuit we are considering, the first and second registers are RPRNR_{\rm PRN} and an ancillary register, which interchange their role at every step, and the third is another ancillary register. Each step corresponds to an elementary operation as follows. The first arrow is modular multiplication. The second arrow is the inverse modular multiplication by a integer α\alpha such that aα=1modNa\alpha=1\ {\rm mod}\ N and this clearing step is necessary to avoid increase of ancillas. The third is just loading cc on an ancillary register, the fourth is modular addition and the last is unloading. (21) progresses as follows:

|n|0|0|0\displaystyle\ket{n}\ket{0}\ket{0}\ket{0} (23)
\displaystyle\rightarrow |n|anmodN|0|0\displaystyle\ket{n}\ket{a^{n}\ {\rm mod}\ N}\ket{0}\ket{0}
\displaystyle\rightarrow |n|anmodN|(x0+ca1)anmodN|0\displaystyle\ket{n}\ket{a^{n}\ {\rm mod}\ N}\ket{\left(x_{0}+\frac{c}{a-1}\right)a^{n}\ {\rm mod}\ N}\ket{0}
\displaystyle\rightarrow |n|anmodN|(x0+ca1)anmodN|ca1\displaystyle\ket{n}\ket{a^{n}\ {\rm mod}\ N}\ket{\left(x_{0}+\frac{c}{a-1}\right)a^{n}\ {\rm mod}\ N}\ket{\frac{c}{a-1}}
\displaystyle\rightarrow |n|anmodN|(x0+ca1)anca1modN|ca1\displaystyle\ket{n}\ket{a^{n}\ {\rm mod}\ N}\ket{\left(x_{0}+\frac{c}{a-1}\right)a^{n}-\frac{c}{a-1}\ {\rm mod}\ N}\ket{\frac{c}{a-1}}
\displaystyle\rightarrow |n|0|(x0+ca1)anca1modN|0,\displaystyle\ket{n}\ket{0}\ket{\left(x_{0}+\frac{c}{a-1}\right)a^{n}-\frac{c}{a-1}\ {\rm mod}\ N}\ket{0},

where the first, third, second and fourth registers are respectively Rsamp,RPRNR_{\rm samp},R_{\rm PRN} and two ancillary registers. The first arrow is modular exponentiation, the second is modular multiplication, the third is loading, the fourth is modular addition and the last is uncomputation of the first and third.

We do not explain permutation: see Miyamoto for the detail. We just make a comment that it is implemented by a simple circuit, for example, Xorshift is implemented as a sequence of CNOT.

We also need the gate to calculate ΦSN1\Phi_{\rm SN}^{-1}, the inverse function of CDF of standard normal distribution. There are some ways to calculate this efficiently and we here adopt the method in Hormann . In the method, \mathbb{R} is divided into some intervals and ΦSN1\Phi_{\rm SN}^{-1} is approximated by a polynomial in each interval. We adopt the setting where the number of intervals is 109666if we include (,x0ICDF)(-\infty,x^{\rm ICDF}_{0}) and [xnICDFICDF,,)[x^{\rm ICDF}_{n^{\rm ICDF}},\infty,), it is 111 and polynomials are cubic, which realizes the error smaller than 10610^{-6}. Here, we write the approximated inverse CDF as

ΦSN1(x)cm,3x3+cm,2x2+cm,1x+cm,0\Phi_{\rm SN}^{-1}(x)\approx c_{m,3}x^{3}+c_{m,2}x^{2}+c_{m,1}x+c_{m,0} (24)

for xm1ICDFx<xmICDF,m=0,1,,nICDF+1x^{\rm ICDF}_{m-1}\leq x<x^{\rm ICDF}_{m},m=0,1,...,n_{\rm ICDF}+1, where x0ICDF<x1ICDF<<xnICDFICDFx^{\rm ICDF}_{0}<x^{\rm ICDF}_{1}<...<x^{\rm ICDF}_{n_{\rm ICDF}} are the end points of the intervals and nICDFn_{\rm ICDF} is the number of the interval. Consider that x1ICDF=,xnICDF+1ICDF=+x^{\rm ICDF}_{-1}=-\infty,x^{\rm ICDF}_{n_{\rm ICDF}+1}=+\infty.

Such a piecewise cubic function can be implemented as Figure 6. We here explain how it works. First, the sequence of comparators and “Load cm,isc_{m,i}^{\prime}s” gates load cm,0,,cm,3c_{m,0},...,c_{m,3} into the register Rc0,,Rc,3R_{c_{0}},...,R_{c,3} respectively as follows. The comparators compare the value xx of RPRNR_{\rm PRN} and the grid points xmICDFx^{\rm ICDF}_{m} and flip RgR_{g} if x<xmICDFx<x^{\rm ICDF}_{m}. If RgR_{g} is 1, the “Load cm,isc_{m,i}^{\prime}s” gates are activated. They are actually collections of bitwise flips, that is, XX gates. If xxnICDFICDFx\geq x^{\rm ICDF}_{n_{\rm ICDF}}, only “Load cnICDF+1,isc_{n_{\rm ICDF}+1,i}^{\prime}s” gate is performed and it loads cnICDF+1,0,,cnICDF+1,3c_{n_{\rm ICDF}+1,0},...,c_{n_{\rm ICDF}+1,3}. If xnICDF1ICDFx<xnICDFICDFx^{\rm ICDF}_{n_{\rm ICDF}-1}\leq x<x^{\rm ICDF}_{n_{\rm ICDF}}, “Load cnICDF,isc_{n_{\rm ICDF},i}^{\prime}s” and “Load cnICDF+1,isc_{n_{\rm ICDF}+1,i}^{\prime}s” are performed. So, we set “Load cnICDF,isc_{n_{\rm ICDF},i}^{\prime}s” so that it compensates flips done by “Load cnICDF+1,isc_{n_{\rm ICDF}+1,i}^{\prime}s” and cnICDF,0,,cnICDF,3c_{n_{\rm ICDF},0},...,c_{n_{\rm ICDF},3} are successfully loaded. The case where xx is in another interval is similar. The point is that if xM1ICDFx<xMICDFx^{\rm ICDF}_{M-1}\leq x<x^{\rm ICDF}_{M}, every other gates are activated. That is, the activated gates are “Load cm,isc_{m,i}^{\prime}s” of m=M,M+2,,nICDF,nICDF+1m=M,M+2,...,n_{\rm ICDF},n_{\rm ICDF}+1 if nICDFMn_{\rm ICDF}-M is even and m=M,M+2,,nICDF1,nICDF+1m=M,M+2,...,n_{\rm ICDF}-1,n_{\rm ICDF}+1 if nICDFMn_{\rm ICDF}-M is odd. This is because every comparator after the MM-th one flips RgR_{g} and RgR_{g} takes 0 and 1 alternatingly. Considering this, the XX gates in “Load cm,isc_{m,i}^{\prime}s” are set as Figure 7, so that cm,0,,cm,3c_{m,0},...,c_{m,3} for appropriate mm are loaded after the sequence of all activated gates. After load of coefficients, the cubic function is calculated in the Horner’s method

((cm,3x+cm,2)x+cm,1)x+cm,0.((c_{m,3}x+c_{m,2})x+c_{m,1})x+c_{m,0}. (25)

This is done by the sequence of adders and multipliers in the latter half of the circuit in Figure 6.

(iii) Payoff

In this paper, we do not consider gates to calculate payoffs in detail, since the resource the gates require is same in both the PRN-on-a-register way and the register-per-RN way. We here make just a short comment. In many cases, a payoff can be expressed in the following form:

fipay=min{max{aiSti+bi,fi},ci},f^{\rm pay}_{i}=\min\{\max\{a_{i}S_{t_{i}}+b_{i},f_{i}\},c_{i}\}, (26)

where ai,bi,ci,fia_{i},b_{i},c_{i},f_{i} are real constants, that is, a linear function of the asset price with the upper bound (cap) cic_{i} and the lower bound (floor) fif_{i}. For example, a payoff in an European call option (1) corresponds to ai=1,bi=K,ci=+,fi=0a_{i}=1,b_{i}=-K,c_{i}=+\infty,f_{i}=0. Payoffs expressed as (26) can be calculated easily by combination of comparators, adders and multipliers.

IV.3 the register-per-RN way

IV.3.1 calculation flow

Also for the register-per-RN way, we start from presenting the calculation flow, which is somewhat simpler than the PRN-on-a-register way. Again, we consider the flow until calculation of the payoff sum.

Before we present the calculation flow, we explain the required registers.

  • RWi,i=1,,ntR_{W_{i}},i=1,...,n_{t}

    This is a register for the ii-th SNRN. We need such a register per random number, so the total number is ntn_{t}.

  • RSi,i=0,1,,ntR_{S_{i}},i=0,1,...,n_{t}

    This is a register where the value of the asset price at time tit_{i} is held.

  • Rpayoff,i,i=1,,ntR_{{\rm payoff},i},i=1,...,n_{t}

    This is a register where the value of the sum of payoffs by tit_{i} is held.

We again omit ancillary registers here and explain them later. Besides, we again assume that these and ancillary registers necessary to update them have ndign_{\rm dig} qubits.

We here concretely define a superposition of SNRN values as the following state. In advance, we set the equally spaced NSN+1N_{\rm SN}+1 points for discretization of the distribution xSN,0<xSN,1<<xSN,NSNx_{{\rm SN},0}<x_{{\rm SN},1}<...<x_{{\rm SN},N_{\rm SN}}, where xSN,0x_{{\rm SN},0} and xSN,NSNx_{{\rm SN},N_{\rm SN}} are the upper and lower bounds of the distribution and set to, say, -4 and +4, respectively. We here assume NSN=2ndigN_{\rm SN}=2^{n_{\rm dig}} for simplicity. Then, we define |SN\ket{\rm SN} as

|SN=i=0NSN1pSN,i|i,\ket{\rm SN}=\sum_{i=0}^{N_{\rm SN}-1}{\sqrt{p_{{\rm SN},i}}\ket{i}}, (27)

where pSN,i=xSN,ixSN,i+1ϕSN(x)𝑑xp_{{\rm SN},i}=\int_{x_{{\rm SN},i}}^{x_{{\rm SN},i+1}}\phi_{\rm SN}(x)dx and ϕSN(x)\phi_{\rm SN}(x) is the probability density function of the standard normal distribution. We consider how to create such a state later. Since xSN,ix_{{\rm SN},i} can be easily calculated from the index ii by a linear function, we identify ii as xSN,ix_{{\rm SN},i}.

Then, the calculation flow is as follows:

  1. 1.

    Initialize all registers to |0\ket{0} except RS0R_{S_{0}}, which is initialized to |St0\ket{S_{t_{0}}}.

  2. 2.

    Generate superpositions of SNRNs on RW1,,RWntR_{W_{1}},...,R_{W_{n_{t}}}. That is, set each of them to |SN\ket{\rm SN}.

  3. 3.

    Perform the time evolution (10) using the value on RW1R_{W_{1}} as w1w_{1}. The result is output to RS1R_{S_{1}} as |St1\ket{S_{t_{1}}}.

  4. 4.

    Calculate the payoff at time t1t_{1} using RS1R_{S_{1}} and output the sum of it and the previous payoffs to Rpayoff,iR_{{\rm payoff},i}.

  5. 5.

    Iterate operations similar to 3-4 for each time step until the time reaches tntt_{n_{t}}.

  6. 6.

    Finally we obtain a superposition of states in which the value on Rpayoff,ntR_{{\rm payoff},n_{t}} is the sum of payoffs for each pattern of values of SNRNs. Estimate the expectation value of Rpayoff,ntR_{{\rm payoff},n_{t}} to get (11).

The flow of state transformation is as follows. Writing only RW1,,RWntR_{W_{1}},...,R_{W_{n_{t}}}, RS0,RS1,,RSntR_{S_{0}},R_{S_{1}},...,R_{S_{n_{t}}} and Rpayoff,1,,Rpayoff,ntR_{{\rm payoff},1},...,R_{{\rm payoff},n_{t}},

|0nt|St0|0nt|0nt\displaystyle\ket{0}^{\otimes n_{t}}\ket{S_{t_{0}}}\ket{0}^{\otimes n_{t}}\ket{0}^{\otimes n_{t}} (28)
2\displaystyle\xrightarrow{2} |SNnt|St0|0nt|0nt\displaystyle\ket{\rm SN}^{\otimes n_{t}}\ket{S_{t_{0}}}\ket{0}^{\otimes n_{t}}\ket{0}^{\otimes n_{t}}
3\displaystyle\xrightarrow{3} i1=0NSN1pSN,i1|i1|SNnt1|St0|St1(i1)|0nt1|0nt\displaystyle\sum_{i_{1}=0}^{N_{\rm SN}-1}\sqrt{p_{{\rm SN},i_{1}}}\ket{i_{1}}\ket{\rm SN}^{\otimes n_{t}-1}\ket{S_{t_{0}}}\ket{S^{(i_{1})}_{t_{1}}}\ket{0}^{\otimes n_{t}-1}\ket{0}^{\otimes n_{t}}
4\displaystyle\xrightarrow{4} i1=0NSN1pSN,i1|i1|SNnt1|St0|St1(i1)|0nt1|fi1pay(St1(i1))|0nt1\displaystyle\sum_{i_{1}=0}^{N_{\rm SN}-1}\sqrt{p_{{\rm SN},i_{1}}}\ket{i_{1}}\ket{\rm SN}^{\otimes n_{t}-1}\ket{S_{t_{0}}}\ket{S^{(i_{1})}_{t_{1}}}\ket{0}^{\otimes n_{t}-1}\ket{f^{\rm pay}_{i_{1}}(S^{(i_{1})}_{t_{1}})}\ket{0}^{\otimes n_{t}-1}
5\displaystyle\xrightarrow{5} \displaystyle...
5\displaystyle\xrightarrow{5} i1,,int=0NSN1pSN,i1pSN,int|i1|int|St0|St1(i1)|Stnt(i1int)|fi1pay(St1(i1))|j=1ntfjpay(Stj(i1ij)),\displaystyle\sum_{i_{1},\cdots,i_{n_{t}}=0}^{N_{\rm SN}-1}\sqrt{p_{{\rm SN},i_{1}}...p_{{\rm SN},i_{n_{t}}}}\ket{i_{1}}...\ket{i_{n_{t}}}\ket{S_{t_{0}}}\ket{S^{(i_{1})}_{t_{1}}}...\ket{S^{(i_{1}\cdots i_{n_{t}})}_{t_{n_{t}}}}\ket{f^{\rm pay}_{i_{1}}(S^{(i_{1})}_{t_{1}})}...\ket{\sum_{j=1}^{n_{t}}{f^{\rm pay}_{j}(S^{(i_{1}...i_{j})}_{t_{j}})}},

where Stj(i1ij)S^{(i_{1}...i_{j})}_{t_{j}} is the value of the asset price at time tjt_{j} evolved by w1=xSN,i1,,wj=xSN,ijw_{1}=x_{{\rm SN},i_{1}},...,w_{j}=x_{{\rm SN},i_{j}}.

IV.3.2 overview of the circuit

The outline of the circuit in the register-per-RN is as shown in Figure 8. First, |SN\ket{\rm SN} is created on each RWjR_{W_{j}} by the gate SN, which is considered in detail later. After that, the gate UjU_{j} performs jj-th step of asset price evolution and payoff calculation. For each evolution step, ancillary registers Rflg,jR_{{\rm flg},j} and RLV,jR_{{\rm LV},j}, which have 1 and 2ndig2n_{\rm dig} qubits respectively, are necessary. UjU_{j} is then implemented as Figure 9. In this gate, the sequence of comparators and “Load” gates set aj,k,bj,ka_{j,k},b_{j,k} in (8) into RLV,jR_{{\rm LV},j} by the trick similar to that in the circuit in Figure 6. Then, the operation xx+(ax+b)yx\leftarrow x+(ax+b)y updates the asset price according to (10). Since the register-per-RN way does not aim to uncompute ancillary qubits in order to reduce them, xx+(ax+b)yx\leftarrow x+(ax+b)y can be done as follows:

|x|y|a|b|0|0\displaystyle\ket{x}\ket{y}\ket{a}\ket{b}\ket{0}\ket{0} \displaystyle\rightarrow |x|y|a|b|0|x\displaystyle\ket{x}\ket{y}\ket{a}\ket{b}\ket{0}\ket{x} (29)
\displaystyle\rightarrow |x|y|a|b|xy|x\displaystyle\ket{x}\ket{y}\ket{a}\ket{b}\ket{xy}\ket{x}
\displaystyle\rightarrow |x|y|a|b|xy|x+axy\displaystyle\ket{x}\ket{y}\ket{a}\ket{b}\ket{xy}\ket{x+axy}
\displaystyle\rightarrow |x|y|a|b|xy|x+axy+by,\displaystyle\ket{x}\ket{y}\ket{a}\ket{b}\ket{xy}\ket{x+axy+by},

where the first ket is RSj1R_{S_{j-1}}, the second is RWjR_{W_{j}}, the third and fourth are RLV,jR_{{\rm LV},j}, the fifth is an ancillary register and the last is RSjR_{S_{j}}. So, this operation consists of copying a state and three multiplications. At the end of UjU_{j}, the payoff is calculated using RSjR_{S_{j}}. Here, the ”Payoffj{\rm Payoff}_{j}” gate performs the following operation

|Stj(i1ij)|k=1j1fkpay(Stk(i1ik))|0\displaystyle\ket{S^{(i_{1}\cdots i_{j})}_{t_{j}}}\ket{\sum_{k=1}^{j-1}{f^{\rm pay}_{k}(S^{(i_{1}...i_{k})}_{t_{k}})}}\ket{0} (30)
\displaystyle\rightarrow |Stj(i1ij)|k=1j1fkpay(Stk(i1ik))|k=1jfkpay(Stk(i1ik)),\displaystyle\ket{S^{(i_{1}\cdots i_{j})}_{t_{j}}}\ket{\sum_{k=1}^{j-1}{f^{\rm pay}_{k}(S^{(i_{1}...i_{k})}_{t_{k}})}}\ket{\sum_{k=1}^{j}{f^{\rm pay}_{k}(S^{(i_{1}...i_{k})}_{t_{k}})}},

where the first, second and third ket are RSjR_{S_{j}}, Rpayoff,j1R_{{\rm payoff},j-1} and Rpayoff,jR_{{\rm payoff},j}. This is done as copying Rpayoff,j1R_{{\rm payoff},j-1} to Rpayoff,jR_{{\rm payoff},j} followed by calculation and addition of fjpay(Stj(i1ij))f^{\rm pay}_{j}(S^{(i_{1}...i_{j})}_{t_{j}}) to Rpayoff,jR_{{\rm payoff},j}.

IV.3.3 implementation of the SN gate

Let us now consider implementation of the SN gate, which creates a superposition of SNRN values. The outline was presented in Grover . In addition to this, we here explain some details, which were not explicitely explained in Grover .

We construct the state in the recursive way. We assume that we have already divided [xSN,0,xSN,NSN][x_{{\rm SN},0},x_{{\rm SN},N_{\rm SN}}] by equally-spaced 2m+12^{m}+1 points xSN,0(m)=xSN,0<xSN,1(m)<<xSN,2m(m)=xSN,NSNx_{{\rm SN},0}^{(m)}=x_{{\rm SN},0}<x^{(m)}_{{\rm SN},1}<...<x^{(m)}_{{\rm SN},2^{m}}=x_{{\rm SN},N_{\rm SN}} and created

|SNm=i=02m1pSN,i(m)|i,\ket{{\rm SN}_{m}}=\sum_{i=0}^{2^{m}-1}{\sqrt{p^{(m)}_{{\rm SN},i}}\ket{i}}, (31)

where pSN,i(m)=xSN,i(m)xSN,i+1(m)ϕSN(x)𝑑xp^{(m)}_{{\rm SN},i}=\int_{x^{(m)}_{{\rm SN},i}}^{x^{(m)}_{{\rm SN},i+1}}\phi_{\rm SN}(x)dx. We also assume that we have a gate to efficiently compute θi(m)=arccosfi(m)\theta^{(m)}_{i}=\arccos\sqrt{f^{(m)}_{i}} with the input ii, where fi(m)f^{(m)}_{i} is

fi(m)=xSN,i(m)(xSN,i(m)+xSN,i+1(m))/2ϕSN(x)𝑑xxSN,i(m)xSN,i+1(m)ϕSN(x)𝑑x.f^{(m)}_{i}=\frac{\int_{x^{(m)}_{{\rm SN},i}}^{\left(x^{(m)}_{{\rm SN},i}+x^{(m)}_{{\rm SN},i+1}\right)/2}\phi_{\rm SN}(x)dx}{\int_{x^{(m)}_{{\rm SN},i}}^{x^{(m)}_{{\rm SN},i+1}}\phi_{\rm SN}(x)dx}. (32)

Then, the following state transformation is possible:

|SNm|0|0\displaystyle\ket{{\rm SN}_{m}}\ket{0}\ket{0} =\displaystyle= i=02m1pSN,i(m)|i|0|0\displaystyle\sum_{i=0}^{2^{m}-1}{\sqrt{p^{(m)}_{{\rm SN},i}}\ket{i}\ket{0}\ket{0}} (33)
\displaystyle\rightarrow i=02m1pSN,i(m)|i|0|θi(m)\displaystyle\sum_{i=0}^{2^{m}-1}{\sqrt{p^{(m)}_{{\rm SN},i}}\ket{i}\ket{0}\ket{\theta^{(m)}_{i}}}
\displaystyle\rightarrow i=02m1pSN,i(m)|i(cosθi(m)|0+sinθi(m)|1)|θi(m)\displaystyle\sum_{i=0}^{2^{m}-1}{\sqrt{p^{(m)}_{{\rm SN},i}}\ket{i}\left(\cos\theta^{(m)}_{i}\ket{0}+\sin\theta^{(m)}_{i}\ket{1}\right)\ket{\theta^{(m)}_{i}}}
=\displaystyle= i=02m+11pSN,i(m+1)|i|θi(m)\displaystyle\sum_{i=0}^{2^{m+1}-1}{\sqrt{p^{(m+1)}_{{\rm SN},i}}\ket{i}\ket{\theta^{(m)}_{i}}}
=\displaystyle= |SNm+1|0,\displaystyle\ket{{\rm SN}_{m+1}}\ket{0},

where we use the gate to compute θi(m)\theta^{(m)}_{i} at the first arrow and perform the controlled rotation at the second arrow. Repeating this until m=ndig1m=n_{\rm dig}-1, we get |SN\ket{\rm SN}.

However, as far as the authors know, neither Grover nor other papers present the gate to compute θi(m)\theta^{(m)}_{i}. Here, we propose a way to do this on the basis of a simple Taylor expansion. Consider

g(x,δ)=xx+δ/2ϕSN(x)𝑑xxx+δϕSN(x)𝑑x.g(x,\delta)=\frac{\int_{x}^{x+\delta/2}\phi_{\rm SN}(x)dx}{\int_{x}^{x+\delta}\phi_{\rm SN}(x)dx}. (34)

We can show that

g(x,δ)12+18δx+116δ2+𝒪(δ3)g(x,\delta)\approx\frac{1}{2}+\frac{1}{8}\delta x+\frac{1}{16}\delta^{2}+\mathcal{O}(\delta^{3}) (35)

by simple calculation. So, g(x,δ)g(x,\delta) is well-approximated by a linear function of xx for small δ\delta. We can use this to compute fi(m)f^{(m)}_{i}, since

fi(m)=g(xSN,i(m),Δ2m),Δ=xSN,NSNxSN,0.f^{(m)}_{i}=g\left(x^{(m)}_{{\rm SN},i},\frac{\Delta}{2^{m}}\right),\Delta=x_{{\rm SN},N_{\rm SN}}-x_{{\rm SN},0}. (36)

Given xSN,NSN,xSN,0x_{{\rm SN},N_{\rm SN}},x_{{\rm SN},0} and mm large enough that Δ/2m\Delta/2^{m} is sufficiently small, this can be approximately seen as a linear function of ii, since xSN,i(m)=xSN,0+Δ2mix^{(m)}_{{\rm SN},i}=x_{{\rm SN},0}+\frac{\Delta}{2^{m}}i. Actually, we numerically confirmed that for m7m\geq 7 the above approximation gives error smaller than 10510^{-5}.

We then reach the circuit in Figure 10 for calculation of fi(m)f^{(m)}_{i}. For m6m\leq 6, as shown in Figure 10, the sequence of comparators and “Load” gates set fi(m)f^{(m)}_{i} to the output register Rfi(m)R_{f^{(m)}_{i}}, using the trick similar to the circuit in Figure 6 again. Note that comparators now check whether the input value ii is equal to a given integer constant or not, so each of them is actually a combination of bitwise flips and a multi-controlled Toffoli gate, similar to that appeared in Figure 4. Also note that, if the input ii is II, “Load fi(m)f^{(m)}_{i}” gates are activated for iIi\geq I. Considering this point, each “Load” gate is set so that operations of it and the following gates are successfully compensated. For m7m\geq 7, the fi(m)f^{(m)}_{i} is just a linear transformation, which is implemented as bitwise flips followed by a constant multiplier. Of course, according to required accuracy, the value of mm where the two types of fi(m)f^{(m)}_{i} are switched should be adjusted and the degree of the Taylor approximation for g(x,δ)g(x,\delta) should be increased.

Using this fi(m)f^{(m)}_{i} gate, the SN gate is constructed as Figure 11. First, we operate a Hadamard gate to the most significant bit in RWjR_{W_{j}} to assign probability 1/2 to positive and negative halves of [xSN,0,xSN,NSN][x_{{\rm SN},0},x_{{\rm SN},N_{\rm SN}}] respectively. Then, we operate the sequence of the gates UmSNU^{\rm SN}_{m}, which corresponds to the mm-th step of the above recursive calculation. UmSNU^{\rm SN}_{m} is constructed as a combination of the fi(m)f^{(m)}_{i} gate, the gates to calculate square root and arc cosine and the controlled rotation gate R(θ)R(\theta).

We here comment on implementation of arccos and square root. The implementation of the inverse trigonometric function based on the piecewise polynomial approximation is proposed in Haner . Although Haner considers not arccos but arcsin, these can be easily related as arccos(x)=π2arcsin(x)\arccos(x)=\frac{\pi}{2}-\arcsin(x). We adopt a setting with the polynomial degree 3 and 2 intervals, which leads to accuracy 10510^{-5}Haner . The circuit for square root is given in MunozCoreas2 .

V Estimation of Required Resources

Then, let us estimate the machine resources required for the implementation in the PRN-on-a-register way and the register-per-RN way. We consider the two metrics, qubit number and T-count, as many papers do.

V.1 elementary gates

Table 1: Resources for various elementary gates. We here assume that operands are nn-bit. We omit subleading terms with respect to nn.
Gate Qubits T-count Reference
Total Operand Output Ancilla
Adder 2n2n 2n2n 0 (self-update) 0 14n14n Thapliyal ; Thapliyal2 ; Thapliyal3
Ctrl Adder 2n2n 2n2n 0 (self-update) 0 21n21n MunozCoreas ; Thapliyal3
Modular Adder777Since we can construct the modular adder using 5 plain addersVedral , we use 5 times the values of the adder for T-count. 2n2n 2n2n 0 (self-update) 0 70n70n Thapliyal ; Thapliyal2 ; Thapliyal3 ; Vedral
Multiplier 3n3n 2n2n nn 0 21n221n^{2} MunozCoreas
Divider 5n5n 2n2n nn 2n2n 35n235n^{2} Thapliyal3
Multi Ctrl Toffoli 2n2n nn 1 nn 8n8n Selinger ; Maslov
Square Root888The circuit given in MunozCoreas2 takes a nn-bit input and returns the n/2n/2-bit square root and the n/2n/2-bit remainder. In order to keep nn-bit accuracy, we add nn 0’s to the input and calculate the nn-bit square root of the virtual 2n2n-bit input. We treat the nn bits added to the input and the nn bits where the remainder is output as ancillas. 4n4n nn nn 2n2n 14n214n^{2} MunozCoreas2
arccos 105 3.4×1043.4\times 10^{4} 999Since the value shown in Haner is Toffoli count, we simply multiply it by 7 for converting it to T-count. Haner
controlled rotation
(with accuracy of 2n2^{-n})
2 (control and target) 3n3n Egger ; Kliuchnikov ; Amy

We first summarize the resources of the elementary gates which are necessary to construct the LV circuit. We here consider fixed-point arithmetic. Among various implementations proposed previously, we adopt one for each of the elementary gates and summarize their resources when operands are nn-bit in Table LABEL:tbl:resElem. Since we aim to estimate the orders of the above metrics, we take only the leading term with respect to nn for required resources. For example, we approximate an+ban+b as anan..

We make a comment on multiplication and the division. For these operation, we use modified versions of circuits proposed in MunozCoreas and Thapliyal3 . This is because of the following reason. In order to store the strict value of the product of two nn-bit numbers, original circuits use 2n2n-bits. This causes a problem in the situation where we have to sequentially perform multiplication as in the LV circuit, since the required qubit number doubles at every multiplication. Therefore, we have to truncate lower bits of product and keep the digit number constant. In order to do this, we modify the multiplier in MunozCoreas . We also construct the divider, which is dedicated to drawback of the modified multiplication. This is why the qubit number for divider in Table LABEL:tbl:resElem is different from that in MunozCoreas ; Thapliyal3 . We explain the details of the modified multiplier and divider in Appendix.

V.2 assumptions on registers

We assume the following points on qubit numbers of various registers, some of which have already been mentioned.

  • Registers which store numerical numbers, RW,RS,Rpayoff,RLV,jR_{W},R_{S},R_{\rm payoff},R_{{\rm LV},j} etc., and ancillary registers concerning them have ndign_{\rm dig} qubits. This is determined according to the required accuracy. We hereafter set ndig=16n_{\rm dig}=16.

  • RPRNR_{\rm PRN} in Figure 5 exceptionally has nPRNn_{\rm PRN} qubits, since this is set to so large a value that the PRN sequence has good statistical property, e.g. long period. PCG considers the PRN generators with 64 bits and we also use this value for nPRNn_{\rm PRN} hereafter. Ancillary registers necessary for calculation of PRN sequence have nPRNn_{\rm PRN} qubits too. Even if nPRN>ndign_{\rm PRN}>n_{\rm dig}, we use only ndign_{\rm dig} qubits in RPRNR_{\rm PRN} for transformation to PSNRN.

  • RsampR_{\rm samp} has nsampn_{\rm samp} qubits.

  • Other registers, e.g. RcountR_{\rm count}, have only several qubits and we neglect their contributions to the whole qubit number.

V.3 the PRN-on-a-register way

Then, let us consider the required resources in the PRN-on-a-register way.

V.3.1 qubit number

Table 2: Qubits necessary in each step in the PRN-on-a-register circuit. We neglect registers with only several qubits.
Part Register Qubit Note
Whole RsampR_{\rm samp} nsampn_{\rm samp}
RSR_{S} ndign_{\rm dig}
RpayoffR_{\rm payoff} ndign_{\rm dig}
RPRNR_{\rm PRN} nPRNn_{\rm PRN}
JPRNJ_{\rm PRN} ancilla 2nPRN2n_{\rm PRN} To hold intermediate outputs; see (21)
ΦSN1\Phi_{\rm SN}^{-1} RWR_{W} ndign_{\rm dig}
ancilla 6ndig6n_{\rm dig} To hold the coefficients of the polynomial and the intermediate outputs; see Figure 6
Vk(j)V^{(j)}_{k} RWR_{W} ndign_{\rm dig}
RSR_{S^{\prime}} ndign_{\rm dig}
ancilla 4ndig4n_{\rm dig} For xx+(ax+b)yx\leftarrow x+(ax+b)y and zz+xby1+ayz\leftarrow\frac{z+x-by}{1+ay}; see the comment in the body text.
PPRNP_{\rm PRN} ancilla 2nPRN2n_{\rm PRN} To hold intermediate outputs; see (22).

In Table LABEL:tbl:qubit, we summarize qubits necessary in each step in the circuit. Registers which hold some values throughout the circuit are as follows: Rsamp,RS,RpayoffR_{\rm samp},R_{S},R_{\rm payoff} ans RPRNR_{\rm PRN}. Except these, the following parts in the circuit can consume qubit number most heavily.

  • JPRNJ_{\rm PRN} and PPRNP_{\rm PRN}: 2nPRN2n_{\rm PRN} qubits

  • ΦSN1\Phi_{\rm SN}^{-1}: 7ndig7n_{\rm dig} qubits

Therefore, the total qubit number required in the PRN-on-a-register way is roughly

nsamp+2ndig+nPRN+max{2nPRN,7ndig}n_{\rm samp}+2n_{\rm dig}+n_{\rm PRN}+\max\{2n_{\rm PRN},7n_{\rm dig}\} (37)

Let us comment on some technical points for obtaining Table LABEL:tbl:qubit. We first make a supplementary explanation on the ancillary qubit number in Vk(j)V^{(j)}_{k}. There are two parts which require ancillas in Vk(j)V^{(j)}_{k}. First, xx+(ax+b)yx\leftarrow x+(ax+b)y needs the following ancillas: a ndign_{\rm dig}-bit register to which 1+ay1+ay is output, a ndign_{\rm dig}-bit register to which the result is temporally output in the self-update multiplication and a 2ndig2n_{\rm dig}-bit register necessary for the inverse division to clear the input xx. Second, zz+xby1+ayz\leftarrow\frac{z+x-by}{1+ay} needs the following: a ndign_{\rm dig}-bit register to which 1+ay1+ay is output and a 2ndig2n_{\rm dig}-bit register necessary for division. In total, 4ndig4n_{\rm dig} bits are sufficient101010Strictly speaking, comparisons between RSR_{S} or RSR_{S^{\prime}} and sj,ks_{j,k}’s require loading sj,ks_{j,k}’s into some register. This does not require another register, since at least one of ancillary registers used in xx+(ax+b)yx\leftarrow x+(ax+b)y and zz+xby1+ayz\leftarrow\frac{z+x-by}{1+ay} is empty at loading. .

We also comment on the ancilla number in ΦSN1\Phi_{\rm SN}^{-1}. As we can see from Figure 6, we need four registers to which coefficients are loaded and two registers for intermediate outputs. Therefore, 6ndig6n_{\rm dig} ancillas are necessary111111Although we also need a register to which xmICDFx^{\rm ICDF}_{m}’s are loaded at comparisons between them and RPRNR_{\rm PRN}, we can use RWR_{W} or intermediate output registers, which are empty at comparisons.

V.3.2 T-count

Since we are interested in only the leading contribution, we focus on multiplications, divisions and repeated additions. Besides, we do not consider the T-count of JWJ_{W}, which is used only once. For the parts in UjU_{j}, which is used repeatedly, we specify T-counts as follows:

  1. 1.

    Vk(j)V^{(j)}_{k}
    One Vk(j)V^{(j)}_{k} includes the following parts:

    • xx+(ax+b)yx\leftarrow x+(ax+b)y
      As we can see in (18), this includes one multiplication and one division, which come from one self-update multiplication, and 3ndig3n_{\rm dig} controlled additions, which comes from two controlled multiplications by constant and one inverse. In total, the T-count is 119ndig2119n_{\rm dig}^{2}.

    • zz+xby1+ayz\leftarrow\frac{z+x-by}{1+ay}
      As we can see in (19), this includes one division and 2ndig2n_{\rm dig} additions, which comes from two multiplications by constant. In total, the T-count is 63ndig263n_{\rm dig}^{2}.

    • Uncomputation of zz+xby1+ayz\leftarrow\frac{z+x-by}{1+ay}
      Similar to the above.

    Therefore, the total T-count in one Vk(j)V^{(j)}_{k} is 245ndig2245n_{\rm dig}^{2}. Since Vk(j)V^{(j)}_{k} is used nS+1n_{S}+1 times, the total T-count in them is 245ndig2nS245n_{\rm dig}^{2}n_{S} (only the leading term).

  2. 2.

    PPRNP_{\rm PRN}
    This includes two modular multiplications by constant, which comes from one self-update modular multiplication. These are decomposed into 2nPRN2n_{\rm PRN} modular additions. So the T-count is roughly 140nPRN2140n_{\rm PRN}^{2}.

  3. 3.

    ΦSN1\Phi_{\rm SN}^{-1} and its inverse
    Each of them includes 2(nICDF+1)2(n_{\rm ICDF}+1) additions (nICDF+1n_{\rm ICDF}+1 comparisons) and five multiplications. So the T-count for each is roughly 105ndig2+28ndignICDF105n_{\rm dig}^{2}+28n_{\rm dig}n_{\rm ICDF}.

Summing up these and considering UjU_{j} is used in ntn_{t} times, the T-count in the whole circuit is roughly

(245ndig2nS+140nPRN2+210ndig2+56ndignICDF)nt.(245n_{\rm dig}^{2}n_{S}+140n_{\rm PRN}^{2}+210n_{\rm dig}^{2}+56n_{\rm dig}n_{\rm ICDF})n_{t}. (38)

V.4 the register-per-RN way

Next, we consider the required resources in the register-per-RN way.

V.4.1 qubit number

Table 3: Qubits added in each time step in the register-per-RN circuit. We neglect registers with only several qubits. We only take the leading contributions.
Register Qubit Note
RSiR_{S_{i}} ndign_{\rm dig}
RWiR_{W_{i}} ndign_{\rm dig}
Rpayoff,iR_{{\rm payoff},i} ndign_{\rm dig}
RLV,iR_{{\rm LV},i} 2ndig2n_{\rm dig}
ancilla used for xx+(ax+b)yx\leftarrow x+(ax+b)y in UtiU_{t_{i}} ndign_{\rm dig} See (29)
ancilla for UmSN,m=1,,ndig1U^{\rm SN}_{m},m=1,...,n_{\rm dig}-1 output fi(m)f^{(m)}_{i} ndig2n_{\rm dig}^{2} ndign_{\rm dig} for one UmSNU^{\rm SN}_{m}
ancilla for SQRT 2ndig22n_{\rm dig}^{2} 2ndig2n_{\rm dig} for one UmSNU^{\rm SN}_{m}
qubits used for arccos
(input, output and intermediate output)
105ndig105n_{\rm dig} 105 for one UmSNU^{\rm SN}_{m}

In the register-per-RN way, registers shown in Table LABEL:tbl:qubitRN are added per time step. Note that we do not uncompute ancillas. Summing up all registers, the qubit number necessary for one time step is roughly 3ndig2+111ndig3n_{\rm dig}^{2}+111n_{\rm dig}, and for the entire circuit it is

(3ndig2+111ndig)nt.(3n_{\rm dig}^{2}+111n_{\rm dig})n_{t}. (39)

Note that the dominant part comes from the iterative calculation in the SN gates, which prepare superpositions of the values of the SNRNs.

V.4.2 T-count

Again, we focus on operations with large T-count. For each part in the circuit, we estimate the T-count as follows:

  1. 1.

    SN gate
    The mm-th iteration UmSNU^{\rm SN}_{m} in the SN gate includes the following parts:

    • square root, arccos, controlled rotation
      T-counts are 14ndig14n_{\rm dig}, 3.4×1043.4\times 10^{4} and 3ndig3n_{\rm dig}, respectively.

    • fi(m)f^{(m)}_{i}
      For 2m62\leq m\leq 6, we use 2m2^{m} mm-controlled Toffoli gates to check the value on RWiR_{W_{i}} and load fi(m)f^{(m)}_{i} which corresponds to the value. T-count for this is 2m(8m9)2^{m}(8m-9)121212Here, we use 8m98m-9, the accurate value of T-count of the mm-controlled Toffoli gatesSelinger ; Maslov , since the approximation as 8m8m is too crude for small mm.. Summing this for m=2,,6m=2,...,6 leads to about 4000. Since this is much smaller than T-count for arccos in one iteration, we neglect this. For m7m\geq 7, we do multiplication between a mm-bit variable and a ndign_{\rm dig}-bit constant, which is decomposed ndign_{\rm dig} additions of mm-bit. Then, T-count is 14mndig14mn_{\rm dig}.

    Summing up these and taking only dominant contributions, one SN gate has T-count of (7ndig2+3.4×104)ndig(7n_{\rm dig}^{2}+3.4\times 10^{4})n_{\rm dig} roughly.

  2. 2.

    UjU_{j}
    This includes 2nS2n_{S} additions (nSn_{S} comparisons) and three multiplications. So one UjU_{j} gates has T-count of 63ndig2+28nSndig63n_{\rm dig}^{2}+28n_{S}n_{\rm dig} roughly.

In total, we can estimate T-count of the entire circuit in the register-per-RN way as

(7ndig2+63ndig+28nS+3.4×104)ndignt.(7n_{\rm dig}^{2}+63n_{\rm dig}+28n_{S}+3.4\times 10^{4})n_{\rm dig}n_{t}. (40)

V.5 comparison between two ways

Table 4: Qubit number and T-count required in the PRN-on-a-register and register-per-RN ways.
PRN-on-a-register register-per-RN
qubit nsamp+2ndig+nPRN+max{2nPRN,7ndig}n_{\rm samp}+2n_{\rm dig}+n_{\rm PRN}+\max\{2n_{\rm PRN},7n_{\rm dig}\} (3ndig2+111ndig)nt(3n_{\rm dig}^{2}+111n_{\rm dig})n_{t}
T-count (245ndig2nS+140nPRN2+210ndig2+56ndignICDF)nt(245n_{\rm dig}^{2}n_{S}+140n_{\rm PRN}^{2}+210n_{\rm dig}^{2}+56n_{\rm dig}n_{\rm ICDF})n_{t} (7ndig2+63ndig+28nS+3.4×104)ndignt(7n_{\rm dig}^{2}+63n_{\rm dig}+28n_{S}+3.4\times 10^{4})n_{\rm dig}n_{t}
Table 5: Required resources to implement the PRN-on-a-register and register-per-RN ways in the valid case (41). The following values are obtained from Table LABEL:tbl:ressum.
PRN-on-a-register register-per-RN
qubit 2.4×1022.4\times 10^{2} 9.2×1059.2\times 10^{5}
T-count 3.7×1083.7\times 10^{8} 2.1×1082.1\times 10^{8}

We then compare resources necessary in two ways in Table LABEL:tbl:ressum. Naturally, qubit number is independent from ntn_{t} in the PRN-on-a-register way but proportional to ntn_{t} in the register-per-RN way and T-count is proportional to ntn_{t} in the both ways. If we take a setting, which is typically necessary for practical use in derivative pricing131313nsamp=16n_{\rm samp}=16 corresponds to 65536 sample paths. :

nsamp\displaystyle n_{\rm samp} =\displaystyle= 16,\displaystyle 16,
ndig\displaystyle n_{\rm dig} =\displaystyle= 16,\displaystyle 16,
nPRN\displaystyle n_{\rm PRN} =\displaystyle= 64,\displaystyle 64,
nICDF\displaystyle n_{\rm ICDF} =\displaystyle= 109,\displaystyle 109,
nt\displaystyle n_{t} =\displaystyle= 360,\displaystyle 360,
nS\displaystyle n_{S} =\displaystyle= 5\displaystyle 5 (41)

the values in Table LABEL:tbl:ressum becomes as Table LABEL:tbl:ressumspec. The total T-count is of same order of magnitude in the both way but larger for the PRN-on-a-register way by a factor 2 roughly. We here comment on the parts which consume T-count most heavily in each way. In the PRN-on-a-register way, there are two parts which contribute to T-count equally and dominantly. The first is the update of the asset price in Vk(j)V^{(j)}_{k}. Note that additional operations for reduction of qubits, such as inverse division in self-update multiplication and drawing back the asset price to clear RgR_{g}, increase T-count compared with the register-per-RN way. The second is modular multiplications in update of the PRN sequence. Since the PRN generator requires the large bit number, say nPRN=64n_{\rm PRN}=64, in order to it keep good statistical properties such as long period, the T-count of operations for the PRN becomes large. On the other hand, in the register-per-RN way the dominant contribution to T-count comes from arccos’s in preparing SNRNs. Because not only an arccos itself is T-count consuming but also it is used in each iteration in the SN gate, the total T-count piles up.

VI Summary

In this paper, we considered how to implement time evolution of the asset price in the LV model on quantum computers. Similar to other problems in finance, derivative pricing by Monte Carlo simulation requires a large number of random numbers, which is proportional to ntn_{t}, the number of time steps for asset price evolution, and this may cause difficulty in implementation. We then considered two ways of implementation: the PRN-on-a-register way and the register-per-RN way. In the former we sequentially generate pseudo random numbers on a register and use them to evolve the asset price. In the latter, standard normal random numbers necessary to time evolution are created as superpositions on separate registers. For both ways, we present the concrete quantum circuits in detail. For not only random number generation but also other aspects, we try to save qubit numbers permitting some additional procedures in the PRN-on-a-register way and do opposite in the register-per-RN way. We then give estimations of qubit number and T-count required in each way. In the PRN-on-a-register way, qubit number is kept constant against ntn_{t}. On the other hand, in the register-per-RN way qubit number is proportional to ntn_{t}. Each way has T-count consuming parts and the total T-counts for both ways are of same order of magnitude, expect the PRN-on-a-register way has the larger T-count by a factor about 2, in some specific setting.

Note that analyses of resources required for implementation of the LV model in this paper strongly depend on designs of elementary circuits for arithmetic. For example, in the register-per-RN way the dominant contribution to T-count comes from arccos’s in preparing SNRNs. If more efficient circuits are proposed and we replace the current choice with them, required resources may change.

Finally, we would like to notice that this study is not enough for application of quantum algorithm for Monte Carlo simulation to pricing in the LV model. Although we assumed that the LV function is given, in practice we have to calibrate the LV so that the model prices of European options fit to the market prices. Besides, we have not considered how to evaluate terms in exotic derivatives, for example, early exercise. In future works, we will consider such things and aim to present how to apply quantum computers in the whole process of exotic derivative pricing.

Appendix A Sufficient condition on σ(t,S)\sigma(t,S) in the PRN-on-a-resigter way

We here show that σ(t,S)\sigma(t,S) which is continuous with respect to SS and takes the form of (8) and sufficiently small Δtj\Delta t_{j} lead to one-to-one correspondence between Stj(i)S^{(i)}_{t_{j}} and Stj+1(i)S^{(i)}_{t_{j+1}}. We see Stj+1(i)S^{(i)}_{t_{j+1}} as a function of Stj(i)S^{(i)}_{t_{j}} and define a function ff by

f(S)=S+σ(tj,S)Δtjwjf(S)=S+\sigma(t_{j},S)\sqrt{\Delta t_{j}}w_{j} (42)

for fixed wiw_{i} so that Stj+1(i)=f(Stj(i))S^{(i)}_{t_{j+1}}=f(S^{(i)}_{t_{j}}) holds. Except for the grid points S=sj,0sj,nSS=s_{j,0}...s_{j,n_{S}}, f(S)f(S) is differentiable and

f(S)=1+aj,kΔtjwj;forsj,k1<S<sj,k,k=0,,nS+1.f^{\prime}(S)=1+a_{j,k}\sqrt{\Delta t_{j}}w_{j};\ {\rm for}\ s_{j,k-1}<S<s_{j,k},k=0,...,n_{S}+1. (43)

Since wjw_{j} is bounded in numerical computation, f(S)f^{\prime}(S) is always positive expect the grid points for sufficiently small Δtj\Delta t_{j}. Besides, if σ(tj,S)\sigma(t_{j},S) is continuous with respect to SS also at the grid points, so is f(S)f(S). Combining these, we find that f(S)f(S) is strictly increasing for small Δtj\Delta t_{j}. Thus, the correspondence between Stj(i)S^{(i)}_{t_{j}} and Stj+1(i)S^{(i)}_{t_{j+1}} is one-to-one.

Small Δtj\Delta t_{j} is required from another perspective, too. The original dynamics of the asset price is given as the continuous-time evolution (3) and (10) is the discretized approximation of it. Therefore, in order for this approximation to be accurate, the increment of asset price should be sufficiently smaller than the asset price itself: |σ(tj,Stj(i))Δtjwj||Stj(i)|\left|\sigma(t_{j},S^{(i)}_{t_{j}})\sqrt{\Delta t_{j}}w_{j}\right|\ll\left|S^{(i)}_{t_{j}}\right|. If this condition is satisfied, |aj,kΔtjwj|1\left|a_{j,k}\sqrt{\Delta t_{j}}w_{j}\right|\ll 1, so f(S)>0f^{\prime}(S)>0 is met.

Appendix B Truncated multiplier and divider

We here describe the modified version of multiplier and divider. We assume that we consider the fixed-point arithmetic with nintn_{\rm int} bits in the integer part and nfracn_{\rm frac} bits in the fractional part, n=nint+nfracn=n_{\rm int}+n_{\rm frac} bits in total. We hereafter call such numbers (nint,nfrac)(n_{\rm int},n_{\rm frac})-bit numbers. We want to keep this digit setting before and after multiplication. In order to do this, we adopt the following policy.

  • We simply truncate the digits lower than the nfracn_{\rm frac}-th fractional digit in the product. This might cause numerical errors around and the nfracn_{\rm frac}-th fractional digit and such a tiny error might accumulate, but we simply neglect this concern.

  • We assume the overflow from the nintn_{\rm int}-bit integer part never occurs.

We then approximate the product as follows. Writing a number xx in binary representation as xnint1xnint2x0.x1xnfracx_{n_{\rm int}-1}x_{n_{\rm int}-2}...x_{0}.x_{-1}...x_{-n_{\rm frac}}, where xix_{i} is the ii-th integer digit of xx and xjx_{-j} is the jj-th fractional digit of xx, we do

xy=i=nfracnint1xi2iyfnfrac,nint,ymul(x):=i=nfracnint1xi2iy~i,xy=\sum_{i=-n_{\rm frac}}^{n_{\rm int}-1}{x_{i}2^{i}y}\approx f^{\rm mul}_{n_{\rm frac},n_{\rm int},y}(x):=\sum_{i=-n_{\rm frac}}^{n_{\rm int}-1}{x_{i}2^{i}\tilde{y}_{i}}, (44)

where

y~i={ynint1y0.y1y(nfraci);fori<0y;fori0.\tilde{y}_{i}=\begin{cases}y_{n_{\rm int}-1}...y_{0}.y_{-1}...y_{-(n_{\rm frac}-i)}\ ;\ {\rm for}\ i<0\\ y\ ;\ {\rm for}\ i\geq 0\end{cases}. (45)

This truncated multiplication is implemented as a circuit in Figure 12. Note that the circuit in Figure 12 actually calculates not fnfrac,nint,ymul(x)f^{\rm mul}_{n_{\rm frac},n_{\rm int},y}(x) but

i=0nint1xi2i(ynint1iy0.y1ynfrac)+j=1nfracxj2j(ynint1y0.y1y(nfracj)).\sum_{i=0}^{n_{\rm int}-1}{x_{i}2^{i}(y_{n_{\rm int}-1-i}...y_{0}.y_{-1}...y_{-n_{\rm frac}})}+\sum_{j=1}^{n_{\rm frac}}{x_{-j}2^{-j}(y_{n_{\rm int}-1}...y_{0}.y_{-1}...y_{-(n_{\rm frac}-j)})}. (46)

This is equal to fnfrac,nint,ymul(x)f^{\rm mul}_{n_{\rm frac},n_{\rm int},y}(x) if our assumption that the overflow from the nn-bit integer part never occurs is satisfied.

We define the truncated division as the inverse of the truncated multiplication: z/yfnfrac,nint,ydiv(z):=(fnfrac,nint,ymul)1(z)z/y\approx f^{\rm div}_{n_{\rm frac},n_{\rm int},y}(z):=\left(f^{\rm mul}_{n_{\rm frac},n_{\rm int},y}\right)^{-1}(z). Given two (nint,nfrac)(n_{\rm int},n_{\rm frac})-bit numbers y,zy,z which satisfies z=fnfrac,nint,ymul(x)z=f^{\rm mul}_{n_{\rm frac},n_{\rm int},y}(x), we can find the (nint,nfrac)(n_{\rm int},n_{\rm frac})-bit number xx as follows:

  1. 1.

    Set i=nint1i=n_{\rm int}-1, d=zd=z and x=0x=0.

  2. 2.

    Update dd with d2iy~id-2^{i}\tilde{y}_{i}

  3. 3.

    If d<0d<0, update dd with d+2iy~id+2^{i}\tilde{y}_{i} (dd returns to the value before step 2) and set xi=0x_{i}=0. If d0d\geq 0, set xi=1x_{i}=1.

  4. 4.

    Decrement ii by 1.

  5. 5.

    Repeat step 2-4 until ii becomes nfrac1-n_{\rm frac}-1 at step 4.

  6. 6.

    Output xx.

Note that 2iy~i>j=nfraci12jy~j2^{i}\tilde{y}_{i}>\sum_{j=-n_{\rm frac}}^{i-1}2^{j}\tilde{y}_{j}. This ensures that sequential subtractions by 2iy~i2^{i}\tilde{y}_{i} and checking whether the difference is positive or negative lead to determining each digit of xx. The above procedure is implemented as the circuit in Figure 13. Note that we add dummy qubits which correspond to the 2nint12n_{\rm int}-1-th to nintn_{\rm int}-th integer digits of zz. Also note that this circuit transforms the dividend register from |z\ket{z} to |0\ket{0}. If we want to reserve |z\ket{z}, we can copy the state to another ancillary register by CNOT gates and use the copy as the dividend state. That is, we can do

|z|0|y|0|z|z|y|0|z|0|y|x.\ket{z}\ket{0}\ket{y}\ket{0}\rightarrow\ket{z}\ket{z}\ket{y}\ket{0}\rightarrow\ket{z}\ket{0}\ket{y}\ket{x}. (47)

Despite the trick to truncate the digits, the structures of the circuits for truncated multiplication and division are similar to that in MunozCoreas and the restoring division circuit in Thapliyal3 respectively. We therefore use the qubit number and T-count of the circuits in MunozCoreas ; Thapliyal3 as those of our truncated arithmetic circuits. The exception is the qubit number of divider, for which we use 5n5n. This is larger than the value in Thapliyal3 by 2n2n, reflecting the addition of dummy qubits and the register to which the dividend is copied141414Actually, added qubits are not 2n2n but nint+nn_{\rm int}+n, but we consider that 2n2n qubits are added for simplicity and conservativeness. .

References

  • (1) R. Orus et al. ”Quantum computing for finance: overview and prospects”, Reviews in Physics 4, 100028 (2019)
  • (2) J. C. Hull, ”Options, Futures, and Other Derivatives”, Prentice Hall (2012)
  • (3) S. Shreve, ”Stochastic Calculus for Finance I: The Binomial Asset Pricing Model”, Springer (2004)
  • (4) S. Shreve, ”Stochastic Calculus for Finance II: Continuous-Time Models”, Springer (2004)
  • (5) A. Montanaro, “Quantum speedup of Monte Carlo methods”, Proc. Roy. Soc. Ser. A, 471, 2181 (2015)
  • (6) Y. Suzuki et. al., “Amplitude Estimation without Phase Estimation”, Quantum Information Processing, 19, 75 (2020)
  • (7) P. Rebentrost et. al., ”Quantum computational finance: Monte Carlo pricing of financial derivatives”, Phys. Rev. A 98, 022321 (2018)
  • (8) N. Stamatopoulos et al., ”Option Pricing using Quantum Computers”, arXiv:1905.02666
  • (9) S. Ramos-Calderer et al., ”Quantum unary approach to option pricing”, arXiv:1912.01618
  • (10) F. Black and M. Scholes, ”The Pricing of Options and Corporate Liabilities”, Journal of Political Economy 81, 637 (1973).
  • (11) R. C. Merton, ”Theory of Rational Option Pricing”, The Bell Journal of Economics and Management Science 4, 141 (1973)
  • (12) B. Dupire, ”Pricing with a Smile”, Risk, 7, 18-20 (1994)
  • (13) L. Grover et. al., “Creating superpositions that correspond to efficiently integrable probability distributions”, arXiv:quant-ph/0208112
  • (14) J. Gregory, ”The xVA Challenge: Counterparty Credit Risk, Funding, Collateral and Capital”, Wiley (2015)
  • (15) F. Arute,et al., ”Quantum supremacy using a programmable superconducting processor”, Nature, 574, 505 (2019)
  • (16) E. T. Campbell et al., ”Roads towards fault-tolerant universal quantum computation”, Nature, 549, 172 (2017)
  • (17) K. Miyamoto and K. Shiohara, ”Reduction of Qubits in Quantum Algorithm for Monte Carlo Simulation by Pseudo-random Number Generator”, arXiv:1911.12469
  • (18) M. Amy et al., ”A meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 32(6): 818-830 (2013)
  • (19) P. Selinger, Phys. Rev. A 87, 042302 (2013)
  • (20) S. Bravyi and J. Haah, “Magic-state distillation with low overhead,” Phys. Rev. A 86, 052329 (2012)
  • (21) A. G. Fowler and C. Gidney, ”Low overhead quantum computation using lattice surgery”, arXiv:1808.06709
  • (22) D. J. Egger at al., ”Credit Risk Analysis using Quantum Computers”, arXiv:1907.03044
  • (23) G. Maruyama, ”On the Transition Probability Functions of the Markov Process”, Rendiconti del Circolo Matematico di Palermo 4, 48 (1955)
  • (24) G. Bassard et. al., “Quantum amplitude amplification and estimation”, Contemporary Mathematics, 305, 53 (2002)
  • (25) K. Nakaji, ”Faster Amplitude Estimation”, arXiv:2003.02417
  • (26) V. Vedral et. al., “Quantum Networks for Elementary Arithmetic Operations”, Phys. Rev. A 54, 147 (1996)
  • (27) D. Beckman et. al., ”Efficient networks for quantum factoring”, Phys. Rev. A, 54, 1034 (1996)
  • (28) T. G. Draper, ”Addition on a quantum computer”, arXiv:quant-ph/0008033
  • (29) S. A. Cuccaro et. al., ”A new quantum ripple-carry addition circuit”, The Eighth Workshop on Quantum Information Processing (2004)
  • (30) Y. Takahashi et. al., ”A linear-size quantum circuit for addition with no ancillary qubits”, Quantum Information and Computation, 5(6), 440–448 (2005)
  • (31) R. Van Meter et. al., ”Fast quantum modular exponentiation”, Phys. Rev. A 71(5), 052320 (2005)
  • (32) T. G. Draper et. al., ”A logarithmic-depth quantum carry-lookahead adder”, Quantum Information and Computation, 6(4), 351 (2006)
  • (33) Y. Takahashi et. al., ”Quantum addition circuits and unbounded fan-out”, Quantum Information and Computation, 10(9&10), 0872 (2010)
  • (34) R. Portugal et. al., ”Reversible Karatsubas algorithm”, Journal of Universal Computer Science, 12(5), 499 (2006)
  • (35) J. J. Alvarez-Sanchezet. al., ”A quantum architecture for multiplying signed integers”, Journal of Physics: Conference Series, 128(1), 012013 (2008)
  • (36) Y. Takahashi et. al., ”A fast quantum circuit for addition with few qubits”, Quantum Information and Computation, 8(6), 636 (2008)
  • (37) H. Thapliyal, “Mapping of subtractor and adder-subtractor circuits on reversible quantum gates,” Transactions on Computational Science XXVII. 10, (2016)
  • (38) H. Thapliyal and N. Ranganathan, “Design of Efficient Reversible Logic Based Binary and BCD Adder Circuits”, J. Emerg. Technol. Comput. Syst. 9, 17, 1 (2013)
  • (39) C.-C. Lin, et al., “Qlib: Quantum module library,” J. Emerg. Technol. Comput. Syst., 11, 1, pp. 7:1–7:20 (2014)
  • (40) H. M. H. Babu, “Cost-efficient design of a quantum multiplier–accumulator unit,” Quantum Information Processing, 16, 1, 30 (2016)
  • (41) H. V. Jayashree et al., “Ancilla-input and garbage-output optimized design of a reversible quantum integer multiplier,” The Journal of Supercomputing, 72, 4, 1477 (2016).
  • (42) E. Muñoz-Coreas and H. Thapliyal, ”Quantum Circuit Design of a T-count Optimized Integer Multiplier”, IEEE Transactions on Computers, 68, 5 (2019)
  • (43) A. Khosropour et al., “Quantum division circuit based on restoring division algorithm,” in Information Technology: New Generations (ITNG), 2011 Eighth International Conference on. IEEE, 2011, pp. 1037–1040.
  • (44) L. Jamal and H. M. H. Babu, “Efficient approaches to design a reversible floating point divider,” in 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), May 2013, pp. 3004–3007.
  • (45) S. V. Dibbo et al., “An efficient design technique of a quantum divider circuit,” in 2016 IEEE International Symposium on Circuits and Systems (ISCAS), May 2016, pp. 2102–2105.
  • (46) H. Thapliyal et al., ”Quantum Circuit Designs of Integer Division Optimizing T-count and T-depth”, IEEE Transactions on Emerging Topics in Computing (2019)
  • (47) M. Amy, D. Maslov, and M. Mosca, IEEE Trans. CAD 33(10), 1476 (2014)
  • (48) D. Maslov, ”On the advantages of using relative phase Toffolis with an application to multiple control Toffoli optimization”, Phys. Rev. A 93, 022311 (2016)
  • (49) M. E. O’Neill, “PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation”, Harvey Mudd College Computer Science Department Tachnical Report (2014); http://www.pcg-random.org/
  • (50) W. Hörmann and J. Leydold, ”Continuous random variate generation by fast numerical inversion”, ACM Transactions on Modeling and Computer Simulation 13(4):347, (2003)
  • (51) T. Haner et al., ”Optimizing Quantum Circuits for Arithmetic”, arXiv:1805.12445
  • (52) E. Muñoz-Coreas and H. Thapliyal, ”T-count and Qubit Optimized Quantum Circuit Design of the Non-Restoring Square Root Algorithm”, ACM Journal on Emerging Technologies in Computing Systems, 14, 3 (2018)
  • (53) V. Kliuchnikov et al., ”Practical approximation of single-qubit unitaries by single-qubit quantum Clifford and T circuits”, IEEE Transactions on Computers, 65, 1, 161 (2016)
  • (54) M. Amy et al., “A meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 32, 818 (2013).
Refer to caption
Figure 1: The overview of the circuit for asset price evolution in the LV model in the PRN-on-a-register way. Here and hereafter, ancillary qubits are sometimes omitted for simple display.

.

Refer to caption
Figure 2: The overview of the UjU_{j}, which performs the jj-th step of asset price evolution, in the PRN-on-a-register way.
Refer to caption
Figure 3: Vk(j)V^{(j)}_{k}, which updates RSR_{S} if the asset price is in the kk-th grid of the LV function. Here and hereafter, the wire going over a gate means that the corresponding register is not used in the operation of the gate. A formula at the center of a gate represents the operation the gate performs and ’-1’ means the inverse operation. A formula beside a wire and in a gate means the input or the output of the gate.
Refer to caption
Figure 4: The gate which outputs whether x=jx=j and y[α,β)y\in[\alpha,\beta) or not. The control by a register means the multiple control by qubits therein.
Refer to caption
(a) JWJ_{W}, the gate to let the PSNRN sequence jump to the int+1in_{t}+1 element.
Refer to caption
(b) PWP_{W}, the gate to progress the PSNRN sequence by a step.
Figure 5: The circuits to generate PSNRN sequences.
Refer to caption
Figure 6: The gate to calculate the inverse CDF of standard normal distribution by piecewise polynomial approximation.
Refer to caption
(a) The detail of the “Load cm,ic_{m,i}’s” gate.
Refer to caption
(b) The detail of the “Load cm,ic_{m,i}” gate.
Refer to caption
(c) The detail of the “Load cm,i,kc_{m,i,k}” gate.
Figure 7: Parts of the circuit in Figure 6. Here, cm,i,kc_{m,i,k} means the kk-th digit of cm,ic_{m,i}.
Refer to caption
Figure 8: The overview of the circuit for asset price evolution in the LV model in the register-per-RN way.
Refer to caption
Figure 9: UjU_{j}, which performs the jj-th step of asset price evolution, in the register-per-RN way.
Refer to caption
(a) For m6m\leq 6.
Refer to caption
(b) For m7m\geq 7.
Figure 10: Circuit to compute fi(m)f^{(m)}_{i}.
Refer to caption
(a) The overview of SN gate.
Refer to caption
(b) UmSNU^{\rm SN}_{m}, the mm-th step in the SN gate.
Figure 11: Implementation of the SN gate.
Refer to caption
Figure 12: The circuit to perform truncated multiplication.
Refer to caption
(a) The overview.
Refer to caption
(b) The TruncDivPart gate. Note that in the 2’s complement method we can check whether a number is negative or not by seeing the most significant bit so we do not have to use an adder as a comparator.
Figure 13: The circuit for the truncated division.