Quantum Pricing with a Smile: Implementation of Local Volatility Model on Quantum Computer

Kazuya Kaneko kazuya-kaneko@fintec.co.jp Mizuho-DL Financial Technology Co., Ltd.
2-4-1 Kojimachi, Chiyoda-ku, Tokyo, 102-0083, Japan Koichi Miyamoto koichi-miyamoto@fintec.co.jp Mizuho-DL Financial Technology Co., Ltd.
2-4-1 Kojimachi, Chiyoda-ku, Tokyo, 102-0083, Japan Naoyuki Takeda naoyuki-takeda@fintec.co.jp Mizuho-DL Financial Technology Co., Ltd.
2-4-1 Kojimachi, Chiyoda-ku, Tokyo, 102-0083, Japan Kazuyoshi Yoshino kazuyoshi-yoshino@fintec.co.jp Mizuho-DL Financial Technology Co., Ltd.
2-4-1 Kojimachi, Chiyoda-ku, Tokyo, 102-0083, Japan

(December 22, 2024)

Abstract

Applications of the quantum algorithm for Monte Carlo simulation to pricing of financial derivatives have been discussed in previous papers. However, up to now, the pricing model discussed in such papers is Black-Scholes model, which is important but simple. Therefore, it is motivating to consider how to implement more complex models used in practice in financial institutions. In this paper, we then consider the local volatility (LV) model, in which the volatility of the underlying asset price depends on the price and time. We present two types of implementation. One is the register-per-RN way, which is adopted in most of previous papers. In this way, each of random numbers (RNs) required to generate a path of the asset price is generated on a separated register, so the required qubit number increases in proportion to the number of RNs. The other is the PRN-on-a-register way, which is proposed in the author’s previous work. In this way, a sequence of pseudo-random numbers (PRNs) generated on a register is used to generate paths of the asset price, so the required qubit number is reduced with a trade-off against circuit depth. We present circuit diagrams for these two implementations in detail and estimate required resources: qubit number and T-count.

pacs:

Valid PACS appear here

^†^†preprint: APS/123-QED

I Introduction

With recent advances of quantum computing technologies, researchers are beginning considering how to utilize them in industries. One major target is finance (see Orus for a review). Since financial institutions are performing enormous tasks of numerical calculation in their daily works, speed-up of such tasks by quantum computer can bring significant benefits to them. One of such tasks is pricing of financial derivatives¹¹1As textbooks of financial derivatives and pricing of them, we refer to Hull ; Shreve1 ; Shreve2 . Financial derivatives, or simply derivatives, are contracts in which payoffs are determined in reference to the prices of underlying assets at some fixed times. Large banks typically have a huge number of derivatives written on various types of assets such as stock price, foreign exchange rate, interest rate, commodity and so on. Therefore, pricing of derivatives is an important issue for them.

In derivative pricing, we represent random movements of underlying asset prices using stochastic processes and calculate a derivative price as a expected value of the sum of payoffs discounted by the risk-free interest rate under some specific probability measure. In order to calculate the expected value, Monte Carlo simulation is often used. There are quantum algorithms for Monte Carlo simulationMontanaro ; Suzuki , which bring quadratic speed-up compared with that on classical computers and there already exists some works which discuss application of such quantum algorithms to derivative pricingRebentrost ; Stamatopoulos ; RamosCalderer . However, in order to bring benefits to practice in finance, previous works have some room to be extended. That is, previous works consider the Black-Scholes (BS) modelBlackScholes ; Merton . Although the BS model is the pioneering model for derivative pricing and still used in many situations in today’s financial firms, it is insufficient to consider only the BS model as an application target of Monte Carlo for practical business for some reasons. First, for various types of derivatives, market prices of derivatives are inconsistent with the BS model. This phenomenon is called volatility smile, which we will explain in Section II. In order to precisely price derivatives taking into account volatility smiles, financial firms often use models which have more degree of freedom than the BS models. Second, the BS model is so simple that analytic formulae are available for the price of some types of derivatives in the model. In such cases, Monte Carlo simulation is not necessary. Actually, banks use Monte Carlo simulation mainly for complex models which can take into account volatility smiles. Although it is the natural first step to consider Monte Carlo in the BS model, the above points motivate us to consider how to apply quantum algorithms for Monte Carlo to the advanced models.

In this paper, we will focus on one of the advanced models, that is, the local volatility (LV) modelDupire . The LV model, which we will describe later, is the model in which a volatility of an asset price depends on the price itself and time, so the BS model is included in this category as a special case. With degrees of freedom to adjust the function form of volatility, the LV model can make derivative prices consistent with volatility smiles. So this model is widely used to price derivatives, especially exotic derivatives, which have complex transaction terms such as early redemption, in many banks.

In order to price a derivative by Monte Carlo simulation, we generate paths, that is, random trajectories of time evolution of asset prices, then calculate the expectation value of the sum of discounted payoffs which arise in each path. Since we cannot generate continuous paths on computers, we usually consider evolutions on a discretized time grid, using a random number (RN) for each time step, which represents the stochastic evolution in the step. In this paper, we focus on how to implement such a time evolution in the LV model on quantum computers.

We can consider two ways to implement the time evolution. In this paper, we call them the register-per-RN way and the PRN-on-a-register way. The difference between them is how to generate RNs required to generate a path. The register-per-RN is adopted in previous papersRebentrost ; Stamatopoulos ; RamosCalderer . In this way, following the procedure described in, e.g., Grover , one creates a superposition of bit strings which correspond to binary representations of possible values of a RN, where the probability amplitude of a each bit string is the square root of the possibility that the RN take the corresponding value. The point is that one register is used for one RN, so the required qubit number is proportional to the number of RNs used to generate a path. This can be problematic in terms of qubit number when many RNs are required. The number of RNs is equal to that of time steps times and that of underlying assets²²2In this paper, we consider arbitrage-free and complete markets, standard assumptions for derivative pricing, so the number of stochastic factors is equal to that of assets. For details, see Shreve1 ; Shreve2 .. The number of time steps can be large for derivatives with long maturity. Maturity can be as long as 30 years, so if we take time grid points monthly, the total number of time steps is 360. The number of underlying assets can be multiple, and furthermore, there are situations where we must simultaneously consider assets concerning different derivative contracts in a portfolio, for example, XVA³³3XVA is the term which collectively represents various types of Value Adjustment on derivative prices, for example, credit value adjustment (CVA), price subtraction taking into account the default of the counterparty. In this paper, we ignore such technical issues. If you are interested, see Gregory .. Assuming that the number of asset is $\mathcal{O}(10)$ and that of time steps is $\mathcal{O}(100)$ , the total number of RNs becomes $\mathcal{O}(10^{3}-10^{4})$ . If we use a register with $\mathcal{O}(10)$ qubits for each RN, the total qubit number can be $\mathcal{O}(10^{5})$ easily. The current state-of-art quantum computers have only $\mathcal{O}(10)$ qubitsArute . Even if we obtain large-scale fault-tolerant machines in the future, the large qubit overhead might be required to make a logical bit (see Campbell as a review and references therein). Therefore, calculations which require the large number of qubits as above might be prohibitive even in the future. This situation is similar to credit portfolio risk measurement, which is described in Miyamoto .

We are then motivated to consider the PRN-on-a-register way, which is proposed in Miyamoto . In this way, one does not create RNs on different register, but generates a sequence of pseudo-random number (PRN) on a register. At each time step, the PRN sequence is progressed and its value is used to evolve the asset price. Therefore, the required qubit number does not depend on the number of RNs and is largely reduced. The drawback is the circuit depth. Here, we define the circuit depth as the number of layers consisting of gates on distinct qubits that can be performed simultaneously, as T-depth used in Amy2 ; Selinger . Since calculations to update the PRN is sequentially performed on a register, the circuit depth is now proportional to the number of RNs. Since in the fault-tolerant computation some kinds of gates, for example T-gates in the Clifford+T gate set, can take long time to be runBravyi ; Fowler , the sequential run of such gates might be also prohibitive in terms of calculation timeEgger . At any rate, in the current stage, where it is difficult to foresee the spec of future quantum computers, we believe that it is meaningful to consider the implementation which saves qubits but consumes depth as a limit.

When it comes to the LV model, the PRN-on-a-register way becomes more motivating, since its disadvantage on the circuit depth compared with the register-per-RN way is alleviated. In the LV model, the volatility varies over time steps depending on the asset price, so the calculation for the time evolution is necessarily stepwise⁴⁴4In the multi-asset case, parallel computing over assets is possible in the register-per-RN way.. Therefore, the PRN-on-a-register way and the register-per-RN way are equivalent with respect to this point, that is, the circuit depth is proportional to the time step number in both ways. This is different from the situation in credit portfolio risk management Egger , where, in the register-per-RN way, a register is assigned to each random number which determine whether each obligor defaults or not and parallel processing on different registers reduces circuit depth.

In this paper, we design the quantum circuits in the above two way in the level of elementary arithmetic. In doing so, we follow the policies of the two ways to the extent possible. That is, not only with respect to RNs but also in other aspects, we try to reduce qubits accepting some additional procedures in the PRN-on-a-register way, and vice versa in the register-per-RN way. For example, in the PRN-on-a-register way, we have to intermediately output the information of the volatility used to evolve the asset price at each time step and clear it by the next step. Otherwise, we need a register to hold the information per step and the required qubit number becomes proportional to the number of time steps. It is nontrivial to implement such a procedure and we will present how to do this later. Note that such clearing procedure is unnecessary in the register-per-RN way.

We then estimate the resources to implement the proposed circuits. We focus on two metrics: qubit number and T-count. As mentioned above, we see that the qubit number in the PRN-on-a-register way is independent from the time step number and much less than the register-per-RN way. The T-count is proportional to the time step number in the both ways. We see that in some specific setting the both ways yield the T-counts of same order of magnitude, except that in the PRN-on-a-register way is larger by some $\mathcal{O}(1)$ factor.

The rest of this paper is organized as follows. Section II and III are preliminary sections, the former and the latter briefly explain the LV model and the quantum algorithm for Monte Carlo simulation, respectively. In section IV, we present the circuit diagram in the two way. In section V, we estimate qubit number and T-count of the proposed circuits. Section VI gives a summary.

II LV model

In this paper, we consider only the single-asset case, since it is straightforward to extend the discussion in this paper to the multi-asset case.

II.1 pricing of derivatives

Consider a party A involved in a derivative contract written on some asset. We let $S_{t}$ denote a stochastic process which represents the asset price at time $t$ , which is set as $t=0$ at the present. We assume that the payoffs arise at the multiple times $t^{\rm pay}_{i},i=1,2,...$ and the $i$ -th payoff is given by $f^{\rm pay}_{i}\left(S_{t^{\rm pay}_{i}}\right)$ , where $f^{\rm pay}_{i}$ is some function which maps $\mathbb{R}$ to $\mathbb{R}$ . The positive payoff means that A receives a money from the counterparty and the negative one means vice versa. For example, the case where A buys an European call option with the strike $K$ corresponds to

f^{\rm pay}_{1}(S_{t^{\rm pay}_{1}})=\max\left\{S_{t^{\rm pay}_{1}}-K,0\right\}

(1)

with a single payment date $t^{\rm pay}_{1}$ . Note that this type of derivative contract is too simple to cover all trades in financial markets. For example, callable contracts, in which either of the parties has a right to terminate the contract at some times, are widely dealt in markets. We leave studies for such exotic derivatives for future works and, in this paper, concider only those which can be expressed in the above form.

Following the theory of arbitrage-free pricing, the price $V$ of the contract for A is given as follows Shreve1 ; Shreve2 :

V=E\left[\sum_{i}f^{\rm pay}_{i}\left(S_{t^{\rm pay}_{i}}\right)\right],

(2)

where $E[\cdot]$ represents the expectation value under some probability measure, the so-called risk-neutral measure. Here and hereafter, we assume that the risk-free interest rate is 0 for simplicity.

II.2 LV model

In the LV model, the evolution of the asset price is modeled by the following stochastic differential equation (SDE)

dS_{t}=\sigma(t,S_{t})dW_{t}

(3)

in the risk-neutral measure⁵⁵5Note that the drift term does not exist since we are now assuming the risk-free rate is 0. $W_{t}$ is the Wiener process which drives $S_{t}$ . $dX_{t}$ is the increment of a stochastic process $X_{t}$ over an infinitesimal time interval $dt$ . The deterministic function $\sigma:[0,\infty)\otimes\mathbb{R}\rightarrow[0,\infty)$ is the local volatility. Note that the BS model corresponds to the case where

\sigma(t,S)=\sigma_{\rm BS}S,

(4)

where $\sigma_{\rm BS}$ is a positive constant, which we hereafter call a BS volatility.

The LV model was proposed to explain volatility smile. In order to describe this, let us define implied volatility first. In the BS model, a price of a European call option with strike $K$ and maturity $T$ at $t=0$ is given by the following formula:

$\displaystyle V_{\rm call,BS}(T,K,S_{0},\sigma_{\rm BS})$	$\displaystyle=$	$\displaystyle\Phi_{\rm SN}(d_{1})S_{0}-\Phi_{\rm SN}(d_{2})K$
$\displaystyle d_{1}$	$\displaystyle=$	$\displaystyle\frac{1}{\sigma_{\rm BS}\sqrt{T}}\left[\ln\left(\frac{S_{0}}{K}\right)+\frac{1}{2}\sigma_{\rm BS}^{2}T\right]$
$\displaystyle d_{2}$	$\displaystyle=$	$\displaystyle d_{1}-\sigma_{\rm BS}\sqrt{T},$	(5)

where $\Phi_{\rm SN}$ is the cumulative distribution function of the standard normal distribution. We can price the option if we determine the BS volatility. Conversely, given the market price of the option $V_{\rm call,mkt}(T,K)$ , we can reversely calculate the BS volatility. That is, we can define the following function of $K$ and $T$ :

	$\displaystyle\sigma_{\rm IV}$	$\displaystyle:$	$\displaystyle(T,K)\mapsto$
			$\displaystyle\sigma_{\rm IV}(T,K)\ s.t.\ V_{\rm call,BS}(T,K,S_{0},\sigma_{\rm IV}(T,K))=V_{\rm call,mkt}(T,K).$

We call BS volatilities drawn back from the market option prices by (LABEL:eq:IV) as implied volatilities.

If the market is described well by the BS model, implied volatilities $\sigma_{\rm IV}(T,K)$ take a same value for any $K$ and $T$ . Although this is the case for some markets, $\sigma_{\rm IV}(T,K)$ varies depending on $K$ and $T$ in many markets. Especially, if $\sigma_{\rm IV}(T,K)$ depends on $K$ , it is said that we observe the volatility smile for the market.

Volatility smiles mean that possible scenarios of asset price evolution in the BS model do not match those which market participants consider. For example, if market participants think that extreme scenarios, big crashes or sharp rises, are more possible than the BS model predicts, the volatility smile arises. In fact, it is often said that the Black Monday, the big crash in the stock markets at 1987, was one of triggers of appearance of volatility smiles.

With the LV model, we can make European option prices given by the model consistent with any market prices, as long as there is no arbitrage in the market. This is intuitively apparent since we can expect that the degree of freedom of the local volatility $\sigma(t,S)$ as a two-dimensional function is available to reproduce the two-dimensional function $V_{\rm call,mkt}(T,K)$ . In fact, if we can get $V_{\rm call,mkt}(T,K)$ as a function, that is, the market option prices for continuously infinite strikes and maturities, we can determine $\sigma(T,K)$ which reproduces $V_{\rm call,mkt}(T,K)$ as followsDupire :

\sigma^{2}(T,K)=2\frac{\frac{\partial}{\partial T}V_{\rm call,mkt}(T,K)}{\frac{\partial^{2}}{\partial K^{2}}V_{\rm call,mkt}(T,K)}.

(7)

In reality, the market option prices are available only for several strikes and maturities. Therefore, in the practical business, we usually use a specific functional form of $\sigma(t,S)$ with degrees of freedom sufficient to reproduce several available market option prices. In this paper, we use the following form. First, we set the $n_{t}$ grid points in the time axis, $t_{0}=0<t_{1}<t_{2}<...<t_{n_{t}}$ . Second, we set the $n_{S}$ grid points in the asset price axis for each time grid point, that is, $s_{i,1},...,s_{i,n_{S}}$ for $t_{i}$ . Then, $\sigma(t,S)$ is set as follows:

\sigma(t,S)=a_{i,j}S+b_{i,j}\ ;{\rm for}\ s_{i,j-1}\leq S<s_{i,j},j=1,...,n_{S}+1

(8)

for $t_{i-1}\leq t<t_{i}$ , where $a_{i,j},b_{i,j}$ are constants satisfying $\sigma(t,S)>0$ for any $t$ and $S$ and $s_{i,0}=-\infty,s_{i,n_{S}+1}=+\infty$ . In other words, the two-dimensional space of $(t,S)$ is divided in the direction of $t$ and in each region $\sigma(t,S)$ is set to a function which is piecewise-linear with respect to $S$ . In this paper, we assume that $a_{i,j},b_{i,j}$ are preset to the value for which the option prices in the LV model, which we here write as $V_{\rm call,LV}(T,K,S_{0},\{a_{i,j}\},\{b_{i,j}\})$ , match the market prices by some standard. For example, they can be set to

\mathop{\rm argmin}\limits_{\{a_{i,j}\},\{b_{i,j}\}}\sum_{I}{\left[V_{\rm call,LV}(T_{I},K_{I},S_{0},\{a_{i,j}\},\{b_{i,j}\})-V_{\rm call,mkt}(T_{I},K_{I})\right]^{2}},

(9)

where $(T_{I},K_{I})$ ’s are several sets of maturity and strike for which the market option prices are available.

II.3 Monte Carlo simulation in the LV model

We here describe how to calculate the derivative price (2) in the LV model by Monte Carlo simulation.

First, we have to discretize the time into sufficiently small meshes, since we can deal with the continuous time on neither classical nor quantum computers. For simplicity, we set the time grid points to the above $t_{i}$ ’s, those for the LV function. Then, the time evolution given by (3) is approximated as follows:

\Delta S_{t_{i}}:=S_{t_{i+1}}-S_{t_{i}}\approx\sigma(t_{i},S_{t_{i}})\sqrt{\Delta t_{i}}w_{i},\Delta t_{i}=t_{i+1}-t_{i},

(10)

where $w_{1},...,w_{n_{t}}$ are independent standard normal random numbers (SNRNs). Among various ways to discretize the SDE, we here adopt the Euler-Maruyama method Maruyama .

Even after time discretization, we cannot consider all of continuously infinite patterns of SNRNs. One solution for this is discretized approximation of SNRNs. We can choose the finite numbers of the grid points and assign probability to each point so that standard normal distribution is approximately reproduced. Now, the patterns of discretized SNRNs are finite, so we can approximate (2) as

V\approx\sum_{n}p_{n}\sum_{i}f^{\rm pay}_{i}\left(S^{(n)}_{t^{\rm pay}_{i}}\right),

(11)

where $p_{n}$ is the probability that the $n$ -th pattern of values of SNRNs are realized and $S^{(n)}_{t}$ is the asset price at time $t$ in the $n$ -th pattern.

There are some possible ways to take petterns considered in (11). In the register-per-RN way, we take all patterns. If we take $N$ grids to discretize each of $n_{t}$ SNRNs, the number of possible patterns of SNRNs is $N^{n_{t}}$ . Although this is exponentially large, quantum computers can take into account all patterns with a polynomial number of qubits by quantum superposition.

On the other hand, this cannot be adopted on classical computers, since the number of the SNRN patterns are exponentially large. Usually, Monte Carlo pricing on classical computers is done in the following way, which the PRN-on-a-register way is also based on. We consider sampled patterns of SNRNs. That is, we generate finite but sufficiently many sample sets of $(w_{1},...,w_{n_{t}})$ and use them to generate sample paths of the asset price which evolves according to (10). We then approximate (2) by the average of sums of payoffs in sample paths,

V\approx\frac{1}{N_{\rm path}}\sum_{n=1}^{N_{\rm path}}\sum_{i}f^{\rm pay}_{i}\left(S^{(n)}_{t^{\rm pay}_{i}}\right),

(12)

where $S^{(n)}_{t}$ is the value of the asset price at time $t$ on the $n$ -th sample path and $N_{\rm path}$ is the number of sample paths.

III Quantum Algorithm for Monte Carlo Simulation

III.1 outline of the algorithm

We here review the quantum algorithm for Monte Carlo simulationMontanaro ; Suzuki . It can be divided into the following steps. First, we create a superposition of possible values of a random number used to calculate a sample value of the integrand on a register. If multiple random numbers are necessary to calculate the integrand, one register is assigned per random number. As mentioned above, continuous random numbers must be approximated in some discretized way. Second, we calculate the integrand into another register, using the random numbers. Note that the results for many patterns of random numbers are simultaneously calculated in quantum parallelism. Third, by controlled rotation, the integrand value is reflected into the amplitude of the ancilla. Finally, amplitude estimation Bassard ; Suzuki ; Nakaji on the ancilla gives the expectation value of the integrand.

The quantum state is transformed as follows:

	$\displaystyle\ket{0}\ket{0}\ket{0}$	(13)
$\displaystyle\rightarrow$	$\displaystyle\left(\sum_{i}{\sqrt{p_{i}}\ket{x_{i}}}\right)\ket{0}\ket{0}$
$\displaystyle\rightarrow$	$\displaystyle\left(\sum_{i}{\sqrt{p_{i}}\ket{x_{i}}}\ket{f(x_{i})}\right)\ket{0}$
$\displaystyle\rightarrow$	$\displaystyle\sum_{i}{\sqrt{p_{i}}\ket{x_{i}}}\ket{f(x_{i})}\left(\sqrt{1-f(x_{i})}\ket{0}+\sqrt{f(x_{i})}\ket{1}\right)$
	.

Here, the first, second and third kets correspond to the random number registers, the integrand register and the ancilla, respectively. $x_{i}$ represents the binary representation of values of random numbers in the $i$ -th pattern and $p_{i}$ is the probability that it realizes. $f$ is the integrand and $f(x_{i})$ is its value for $x_{i}$ . Note that the probability to observe 1 on the ancilla is $\sum_{i}{p_{i}f(x_{i})}$ , the integral value which we want. Although we do not explain how to estimate the probability amplitude in this paper, it is studied in many papers. For example, see Bassard ; Suzuki ; Nakaji . Using such methods, we can estimate the integral with the statistical error which decays as $\mathcal{O}(N^{-1})$ , where $N$ is the number of oracle calls. This decay rate is quadratically faster than that in the classical algorithm, $\mathcal{O}(N^{-1/2})$ .

III.2 the scheme using the PRN generator

We here briefly review the quantum way for Monte Calro simulation using the PRN generator. The calculation flow for the current problem, the time evolution of asset price in the LV model, based on this way is described in Section IV.2.

It is proposed in Miyamoto in order to reduce the required qubits to generate RNs in the application of the quantum algorithm for Monte Carlo to extremely high-dimensional integrations. When it is neccesary to generate many RNs to compute the integrand, the naive way, in which we assign a register to each RN and create a superposition of possible values, leads to the increase of qubit numbers in proportion to the number of RNs. In order to aviod this, we can adopt the following way. First, we prepare two registers, $R_{\rm samp}$ and $R_{\rm PRN}$ . Then, we create a superposition of integers, which specify the start point of the PRN sequence, on $R_{\rm samp}$ . With the start point, we sequentially generate PRNs on $R_{\rm PRN}$ . This is possible because a PRN sequence is a deterministic sequence whose recursion equation is explicitely given, and in Miyamoto we gave the implementation of one of PRN generators on quantum circuits. Using the PRNs, we compute the integrand step by step, which corresponds to time evolution of the asset price and calculation of payoffs in this paper. Finally, the expectation value of the integrand is estimated by quantum amplitude estimation. In this way, since we need only $R_{\rm samp}$ and $R_{\rm PRN}$ to generate PRNs, the required qubit number is now independent from the number of RNs and much smaller than the naive way. The drawback is the increase of the circuit depth.

IV Circuit Design

Now, we present quantum circuits for time evolution of an asset price in the LV model in the two ways: PRN-on-a-register and register-per-RN.

IV.1 elementary gate

Before we present circuits, we here list up elementary gates we use.

•

Adder: $\ket{x}\ket{y}\rightarrow\ket{x+y}\ket{y}$
•

Controlled Adder:
$\ket{c}\ket{x}\ket{y}\rightarrow\begin{cases}\ket{c}\ket{x+y}\ket{y}\ ;\ {\rm for}\ c=1\\ \ket{c}\ket{x}\ket{y}\ ;\ {\rm for}\ c=0\end{cases}$
•

Multiplier: $\ket{x}\ket{y}\ket{z}\rightarrow\ket{x}\ket{y}\ket{z+xy}$
•

Divider: $\ket{x}\ket{y}\ket{0}\rightarrow\ket{x}\ket{y}\ket{x/y}$

We here simply assume their existence. Actually, implementation of such elementary arithmetics are widely studied in previous works: see, for example, Vedral ; Beckman ; Draper ; Cuccaro ; Takahashi ; VanMeter ; Draper2 ; Takahashi2 ; Portugal ; AlvarezSanchez ; Takahashi3 ; Thapliyal ; Thapliyal2 ; Lin ; Babu ; Jayashree ; MunozCoreas ; Khosropour ; Jamal ; Dibbo ; Thapliyal3 . We will discuss the implementation in Section V.1.

With these, we can construct other types of arithmetic we use. For example, subtraction $\ket{x}\ket{y}\rightarrow\ket{x-y}\ket{y}$ can be done as addition by the method of 2’s complement. Comparison $\ket{x}\ket{y}\ket{z}\rightarrow\ket{x}\ket{y}\ket{z\oplus(x>y)}$ can be done as subtraction in 2’s complement method, since the most significant bit represents whether the result of subtraction is positive or negative. So a comparator is constructed as two adder, including uncomputation.

Note that the above multiplier uses two registers as operands and outputs the product into another register. Most of previously proposed multipliers are of this output-to-other type. On the other hand, we also need the self-update type of multiplier which updates either of input registers with the product, otherwise we have to add a register for each multiplication and qubit number explodes. Such a operation is realized by the following trick. When we want to multiply $x$ by $y$ , given the two registers holding $x$ and $y$ and an ancilla register, we can do:

\ket{x}\ket{y}\ket{0}\rightarrow\ket{x}\ket{y}\ket{xy}\rightarrow\ket{xy}\ket{y}\ket{x}\rightarrow\ket{xy}\ket{y}\ket{0}.

(14)

Here, the first step is output-to-other multiplication. The second step is swap between the first and third registers, which is not necessary if we change our recognition on which of two register is ancillary at every multiplication. The third step is the inverse operation of division.

IV.2 the PRN-on-a-register way

IV.2.1 calculation flow

We first present the calculation flow in the PRN-on-a-register way. We consider the flow until calculation of the sum of payoffs, which corresponds to from the first to the third line in (13), since the controlled rotation in the fourth line does not depend on the problem.

In the PRN-on-a-register way, PRNs are used for evolution of the asset price (10). More concretely, we preselect some sequence of pseudo standard normal random numbers (PSNRNs) and divide it into subsequences, then evolve the asset price using them.

Before we present the calculation flow, we explain some setups. We prepare the following register:

•

$R_{\rm samp}$

This is a register where a superposition of integers which determine the start point of the PSNRN sequence. We write its qubit number as $n_{\rm samp}$ . $N_{\rm samp}=2^{n_{\rm samp}}$ is the number of sample paths we generate.
•

$R_{W}$

This is a register where we sequentially generate PSNRNs.
•

$R_{S}$

This is a register where the value of the asset price is stored and which we update for each time step of the evolution, using $R_{W}$ .
•

$R_{\rm payoff}$

This is a register into which the payoffs determined by $R_{S}$ are added.

Note that we need some ancillary registers in addition to the above registers. We assume that the required calculation precision is $n_{\rm dig}$ -bit accuracy and $R_{W},R_{S},R_{\rm payoff}$ and ancillary registers necessary to update them have $n_{\rm dig}$ qubits.

We assume that the following gates are available to generate a sequence of PSNRNs.

•

$P_{W}$

This progresses a PSNRN sequence by one step. In other words, it acts on $R_{W}$ and updates $x_{i}$ to $x_{i+1}$ , where $x_{i}$ is the $i$ -th element of the sequence: $\ket{x_{i}}\rightarrow\ket{x_{i+1}}$ .
•

$J_{W}$

This lets the PSNRN sequence jump to the starting point. That is, it is input an integer $i$ on a register and outputs $x_{in_{t}+1}$ into another register which is initially set to $\ket{0}$ : $\ket{i}\ket{0}\rightarrow\ket{i}\ket{x_{in_{t}+1}}$ . Remember that $n_{t}$ , the number of time steps, is equal to the number of RNs used to generate one sample path.

The concrete implementation of these gates are discussed later.

Then, the calculation flow is as follows:

1.

Initialize all registers to $\ket{0}$ except $R_{S}$ , which is initialized to $\ket{S_{t_{0}}}$ .
2.

Generate a equiprobable superposition of $\ket{0},\ket{1},...,\ket{N_{\rm{samp}}-1}$ , that is, $\frac{1}{\sqrt{N_{\rm{samp}}}}\sum_{i=0}^{N_{\rm{samp}}-1}{\ket{i}_{n_{\rm{PRN}}}}$ on $R_{\rm samp}$ . This is done by operating a Hadamard gate to each of $n_{\rm{samp}}$ qubits.
3.

Operate $J_{W}$ to set $x_{in_{t}+1}$ to $R_{W}$ , where $i$ is determined by the state of $R_{\rm{samp}}$ . These are the starting points of subsequences.
4.

Perform the time evolution (10) using the value on $R_{W}$ . $R_{S}$ is updated from $\ket{S_{t_{0}}}$ to $\ket{S_{t_{1}}}$ .
5.

Calculate the payoff at time $t_{1}$ and add into $R_{\rm payoff}$ .
6.

Operate $P_{\rm{PRN}}$ to update $R_{W}$ from $x_{in_{t}+1}$ to $x_{in_{t}+2}$ .
7.

Iterate operations similar to 4-6 for each time steps until the time reaches $t_{n_{t}}$ .
8.

Finally we obtain a superposition of states in which the value on $R_{\rm payoff}$ is the sum of payoffs in each sample path. Estimate the expectation value of $R_{\rm payoff}$ by methods like Bassard ; Suzuki . This is an estimate for (12).

Here and hereafter, we assume that a payoff arises at each time step, for simplicity.

The flow of state transformation is as follows:

	$\displaystyle\ket{0}\ket{0}\ket{S_{t_{0}}}\ket{0}$
$\displaystyle\xrightarrow{2}$	$\displaystyle\frac{1}{\sqrt{N_{\rm{samp}}}}\sum_{i=0}^{N_{\rm{samp}}-1}$	$\displaystyle{\ket{i}}\ket{0}\ket{S_{t_{0}}}\ket{0}$
$\displaystyle\xrightarrow{3}$	$\displaystyle\frac{1}{\sqrt{N_{\rm{samp}}}}\sum_{i=0}^{N_{\rm{samp}}-1}$	$\displaystyle{\ket{i}}\ket{x_{in_{t}+1}}\ket{S_{t_{0}}}\ket{0}$
$\displaystyle\xrightarrow{4}$	$\displaystyle\frac{1}{\sqrt{N_{\rm{samp}}}}\sum_{i=0}^{N_{\rm{samp}}-1}$	$\displaystyle{\ket{i}}\ket{x_{in_{t}+1}}\ket{S^{(i)}_{t_{1}}}\ket{0}$
$\displaystyle\xrightarrow{5}$	$\displaystyle\frac{1}{\sqrt{N_{\rm{samp}}}}\sum_{i=0}^{N_{\rm{samp}}-1}$	$\displaystyle{\ket{i}}\ket{x_{in_{t}+1}}\ket{S^{(i)}_{t_{1}}}\ket{\sum_{j=1}^{1}{f^{\rm pay}_{j}(S^{(i)}_{t_{j}})}}$
$\displaystyle\xrightarrow{6}$	$\displaystyle\frac{1}{\sqrt{N_{\rm{samp}}}}\sum_{i=0}^{N_{\rm{samp}}-1}$	$\displaystyle{\ket{i}}\ket{x_{in_{t}+2}}\ket{S^{(i)}_{t_{1}}}\ket{\sum_{j=1}^{1}{f^{\rm pay}_{j}(S^{(i)}_{t_{j}})}}$
$\displaystyle\xrightarrow{7}$	$\displaystyle...\qquad\quad\quad\quad$
$\displaystyle\xrightarrow{7}$	$\displaystyle\frac{1}{\sqrt{N_{\rm{\rm{samp}}}}}\sum_{i=0}^{N_{\rm{\rm{samp}}}-1}$	$\displaystyle{\ket{i}}\ket{x_{in_{t}+n_{t}}}\ket{S^{(i)}_{t_{n_{t}}}}\ket{\sum_{j=1}^{n_{t}}{f^{\rm pay}_{j}(S^{(i)}_{t_{j}})}},$	(15)

where the first, second, third and fourth kets correspond to $R_{\rm samp},R_{W},R_{S}$ and $R_{\rm payoff}$ , respectively.

IV.2.2 overview of the circuit

Schematically, the circuit which realizes the flow (15) is as shown in Figure 1. In the figure, the gate $U_{j}$ corresponds to the $j$ -th step of asset price evolution, that is, the $j$ -th iteration of step 4-6 in the above calculation flow.

$U_{j}$ is implemented as shown in Figure 2. $P_{W}$ is already explained and the gate ${\rm Payoff}_{j}$ calculates $f^{\rm pay}_{j}(S^{(i)}_{t_{j}})$ using $R_{S}$ and adds it into $R_{\rm payoff}$ . In addition to these, $U_{j}$ has gates $V^{(j)}_{1},...,V^{(j)}_{n_{S}}$ , which update $R_{S}$ .

The detail of $V^{(j)}_{k}$ is shown in Figure 3. This gate (i) checks whether the asset price is in the $k$ -th interval $[s_{j,k-1},s_{j,k})$ , (ii) if so, update $R_{S}$ using the LV in the interval, (iii) clears the intermediate output. It requires ancillary registers $R_{\rm count},R_{S^{\prime}}$ and $R_{g}$ . They have $\lceil\log_{2}{n_{t}}\rceil,n_{\rm dig}$ and 1 qubits respectively. At the start of $V^{(j)}_{k}$ , $R_{\rm count}$ takes $\ket{j}$ or $\ket{j+1}$ and the others take $\ket{0}$ . Then the detailed calculation flow is:

1.

If $R_{\rm count}$ is $j$ and $R_{S}$ is in $[s_{j,k-1},s_{j,k})$ , flip $R_{g}$ .
2.

If $R_{g}$ is 1, update $R_{S}$ as

$S_{t_{j}}\rightarrow S_{t_{j+1}}=S_{t_{j}}+(a_{j,k}S_{t_{j}}+b_{j,k})\sqrt{\Delta t_{j}}x_{in_{t}+j}$ (16)

using the value $x_{in_{t}+j}$ on $R_{W}$ and add 1 to $R_{\rm count}$ .
3.

Calculate

$\frac{S_{t_{j+1}}-b_{j,k}\sqrt{\Delta t_{j}}x_{in_{t}+j}}{1+a_{j,k}\sqrt{\Delta t_{j}}x_{in_{t}+j}}$ (17)

into $R_{S^{\prime}}$ , using the value on $R_{S}$ as $S_{t_{j+1}}$ and that on $R_{W}$ as $x_{in_{t}+j}$ .
4.

If $R_{\rm count}$ is $j+1$ and $R_{S^{\prime}}$ is in $[s_{j,k-1},s_{j,k})$ , flip $R_{g}$ . This uncomputes $R_{g}$ .
5.

Do the inverse operation of 3.

Let us explain the meaning of this flow. First, $R_{\rm count}$ is necessary as an indicator of whether the $j$ -th step of evolution has been already done or not. Without this, it is possible that the asset price is doubly updated in a time step. If and only if the $j$ -th step has not been done, that is, $R_{\rm count}$ is $j$ and the asset price is in $[s_{j,k-1},s_{j,k})$ , the update of the asset price with the LV function $a_{j,k}S+b_{j,k}$ is done. To do this conditional update, the check result is intermediately output to $R_{g}$ and the gate corresponding (16) is operated on $R_{S}$ under control by $R_{g}$ . Besides, the increment of $R_{\rm count}$ controlled by $R_{g}$ is also done, so that $R_{\rm count}$ indicates completion of the $j$ -th step if so. Steps 3-5 is necessary to clear $R_{g}$ . If the asset price has been updated in Step 2, Step 3 draws back it to the value before the update. Conversely, we can determine whether the update has been done in Step 2 from the result of Step 3. That is, for the reason mentioned soon later, the condition that $R_{\rm count}$ is $j+1$ and $R_{S^{\prime}}$ is in $[s_{j,k-1},s_{j,k})$ after Step 3 is equivalent to the condition that $R_{\rm count}$ is $j$ and $R_{S}$ is in $[s_{j,k-1},s_{j,k})$ before Step 2. Therefore, Step 4 flip $R_{g}$ if and only if it is $\ket{1}$ , so it goes back to $\ket{0}$ . In summary, through the sequential operation of $V^{(j)}_{1},...,V^{(j)}_{n_{S}+1}$ , $R_{S}$ is updated only once at the appropriate $V^{(j)}_{k}$ , $R_{\rm count}$ is updated from $\ket{j}$ to $\ket{j+1}$ and all intermediate outputs on ancillary registers are cleared.

We here mention a restriction on the LV model so that it can be implemented in the PRN-on-a-register way. Note that through $V^{(j)}_{1},...,V^{(j)}_{n_{S}+1}$ , the state is transformed from $\ket{j}\ket{S^{(i)}_{t_{j}}}$ to $\ket{j+1}\ket{S^{(i)}_{t_{j+1}}}$ , where the first and second kets correspond to $R_{\rm count}$ and $R_{S}$ respectively and other registers are omitted since they are unchanged. This means that the map from $S^{(i)}_{t_{j}}$ to $S^{(i)}_{t_{j+1}}$ must be one-to-one correspondence, since unitarity is violated if not. Actually, this is not so strong restriction. As shown in Appendix, if we set $a_{i,j},b_{i,j}$ so that $\sigma(t,S)$ is continuous with respect to $S$ and we set $\Delta t_{j}$ is small enough that the increment $\Delta S_{t_{j}}$ is much smaller than $S_{t_{j}}$ itself, the above condition is satisfied.

This one-to-one correspondence lets Step 3 work. That is, since the map between $S^{(i)}_{t_{j}}$ and $S^{(i)}_{t_{j+1}}$ is one-to-one correspondence, the result of Step 3 is in $[s_{j,k-1},s_{j,k})$ if and only if the value on $R_{S}$ before Step 3 is in the image of $[s_{j,k-1},s_{j,k})$ under the map.

IV.2.3 implementation of respective gates

We now consider how to implement respective gates in the PRN-on-a-register way to the level of four arithmetic operations.

(i) $V^{(j)}_{k}$

Note that most parts of $V^{(j)}_{k}$ consist of only arithmetic operations, addition, subtraction, multiplication, division and comparison, which mentioned in Sec IV.1.

For example, the gate $z\leftarrow z\oplus(x=j\ {\rm and}\ y\in I)$ can be divided to two parts. The first part is checking that the value on $R_{\rm count}$ is equal to $j$ and this can be done by the multiple control Toffoli gate, which is studied in Selinger ; Amy ; Maslov . The second part is checking that the asset price is in a given interval, which can be constructed from two comparisons. Combining these, the gate $z\leftarrow z\oplus(x=j\ {\rm and}\ y\in I)$ in Figure 3 is constructed as shown in Figure 4. Note that the bitwise flips $X^{1-j_{0}}\otimes...\otimes X^{1-j_{n_{x}-1}}$ are operated before the multi control Toffoli. Here, $j_{a}$ is the $a$ -th digit of the binary representation of $j$ , so the $a$ -th qubit is flipped if and only if $j_{a}=0$ . This convert $\ket{x}$ to $\ket{1}...\ket{1}$ if and only if $x=j$ .

The operation $x\leftarrow x+(ax+b)y$ in Figure 3 can be realized as follows:

$\displaystyle\ket{x}\ket{y}\ket{0}$	$\displaystyle\rightarrow$	$\displaystyle\ket{x}\ket{y}\ket{1}$	(18)
	$\displaystyle\rightarrow$	$\displaystyle\ket{x}\ket{y}\ket{1+ay}$
	$\displaystyle\rightarrow$	$\displaystyle\ket{(1+ay)x}\ket{y}\ket{1+ay}$
	$\displaystyle\rightarrow$	$\displaystyle\ket{(1+ay)x+by}\ket{y}\ket{1+ay}$
	$\displaystyle\rightarrow$	$\displaystyle\ket{(1+ay)x+by}\ket{y}\ket{0},$

where the third ket corresponds to an ancillary register. The first arrow is just setting a constant on a register. The second arrow is the multiplication by a constant $a$ . The third arrow is the self-update multiplication, so it needs another ancillary register. The fourth arrow is again multiplication by a constant $b$ and the final arrow is uncomputation of the first and second arrows. Note that this is done under control by $R_{g}$ . In order for this to be controlled, it is sufficient to control only the second, fourth and final arrows, since the third arrow becomes a multiplication by 1 without the second. Also note that multiplication by a $n$ -bit constant $a$ or $b$ can be done by $n$ adders, that is, $n$ shift-and-add’s: $ax=\sum_{i=0}^{n-1}{a_{i}2^{i}x}$ , where $a_{i}$ is the $i$ -th bit of $a$ . This saves qubits compared with the case where we use a multiplier, which needs holding $a$ on an ancillary register.

The operation $x\leftarrow(x-by)/(1+ay)$ in Figure 3 is done as follows:

$\displaystyle\ket{x}\ket{y}\ket{0}\ket{0}$	$\displaystyle\rightarrow$	$\displaystyle\ket{x}\ket{y}\ket{1}\ket{0}$	(19)
	$\displaystyle\rightarrow$	$\displaystyle\ket{x}\ket{y}\ket{1+ay}\ket{0}$
	$\displaystyle\rightarrow$	$\displaystyle\ket{x-by}\ket{y}\ket{1+ay}\ket{0}$
	$\displaystyle\rightarrow$	$\displaystyle\ket{x-by}\ket{y}\ket{1+ay}\ket{(x-by)/(1+ay)},$

where the first, second, third and fourth kets are $R_{S},R_{W}$ , another ancillary register and $R_{S^{\prime}}$ , respectively. Here, the first and second arrows are same as (18), the third is the multiplication by a constant $-b$ and the final one is division. Here, we do not have to uncompute $R_{S}$ and the ancillary register, since the whole of this operation is uncomputed soon later in $V_{j,k}$ .

(ii) $J_{W},P_{W}$

In Miyamoto , implementation of PRN on quantum circuits is presented, using the PRN generator called PCGPCG . Note that this PRN generator generates uniform RNs. We now need PSNRNs, so we must transform uniform distribution to standard normal distribution. There are some method and we adopt the inverse transform sampling. Schematically, $J_{W}$ and $P_{W}$ are implemented as shown in Figure 5. Here, $J_{PRN}$ is the gate to let the PRN sequence jump to the $in_{t}+1$ and $P_{PRN}$ is the gate to progress the PRN sequence by a step. They sequentially generate uniform RNs on the ancillary register $R_{\rm PRN}$ and they are transformed to PSNRN on $R_{W}$ .

Although we refer to Miyamoto for the detail of implementation of the PRN generator, we here briefly explain. This generator is combination of linear congruential generator (LCG) and permutation of bit string. For LCG, update of the PRN sequence is done by

x_{n+1}=ax_{n}+c\ {\rm mod}\ N,

(20)

where $a,N$ are positive integers and $c$ is a nonnegative integer, and the $n$ -th element of the sequence is computed from $n$ and the initial value $x_{0}$ by

x_{n}=a^{n}x_{0}+\frac{c(a^{n}-1)}{a-1}\ {\rm mod}\ N,.

(21)

According to Vedral , we can construct the modular adder using 5 plain adders. Modular multiplication by a $n$ -bit constant can be done as $n$ modular shift-and-add’s. Modular division by a constant $a-1$ can be done as modular multiplication by a constant integer $\beta$ such that $\beta(a-1)=1\ {\rm mod}\ N$ , if exists. Modular exponentiation $a^{x}\ {\rm mod}\ N$ is computed as a sequence controlled modular multiplicationVedral . So, to summarize, we can perform (20) and (21) using only controlled adders. In (20), the state is transformed as

$\displaystyle\ket{x}\ket{0}\ket{0}$	$\displaystyle\rightarrow$	$\displaystyle\ket{x}\ket{ax\ {\rm mod}\ N}\ket{0}$	(22)
	$\displaystyle\rightarrow$	$\displaystyle\ket{0}\ket{ax\ {\rm mod}\ N}\ket{0}$
	$\displaystyle\rightarrow$	$\displaystyle\ket{0}\ket{ax\ {\rm mod}\ N}\ket{c}$
	$\displaystyle\rightarrow$	$\displaystyle\ket{0}\ket{ax+c\ {\rm mod}\ N}\ket{c}$
	$\displaystyle\rightarrow$	$\displaystyle\ket{0}\ket{ax+c\ {\rm mod}\ N}\ket{0}.$

In the circuit we are considering, the first and second registers are $R_{\rm PRN}$ and an ancillary register, which interchange their role at every step, and the third is another ancillary register. Each step corresponds to an elementary operation as follows. The first arrow is modular multiplication. The second arrow is the inverse modular multiplication by a integer $\alpha$ such that $a\alpha=1\ {\rm mod}\ N$ and this clearing step is necessary to avoid increase of ancillas. The third is just loading $c$ on an ancillary register, the fourth is modular addition and the last is unloading. (21) progresses as follows:

	$\displaystyle\ket{n}\ket{0}\ket{0}\ket{0}$	(23)
$\displaystyle\rightarrow$	$\displaystyle\ket{n}\ket{a^{n}\ {\rm mod}\ N}\ket{0}\ket{0}$
$\displaystyle\rightarrow$	$\displaystyle\ket{n}\ket{a^{n}\ {\rm mod}\ N}\ket{\left(x_{0}+\frac{c}{a-1}\right)a^{n}\ {\rm mod}\ N}\ket{0}$
$\displaystyle\rightarrow$	$\displaystyle\ket{n}\ket{a^{n}\ {\rm mod}\ N}\ket{\left(x_{0}+\frac{c}{a-1}\right)a^{n}\ {\rm mod}\ N}\ket{\frac{c}{a-1}}$
$\displaystyle\rightarrow$	$\displaystyle\ket{n}\ket{a^{n}\ {\rm mod}\ N}\ket{\left(x_{0}+\frac{c}{a-1}\right)a^{n}-\frac{c}{a-1}\ {\rm mod}\ N}\ket{\frac{c}{a-1}}$
$\displaystyle\rightarrow$	$\displaystyle\ket{n}\ket{0}\ket{\left(x_{0}+\frac{c}{a-1}\right)a^{n}-\frac{c}{a-1}\ {\rm mod}\ N}\ket{0},$

where the first, third, second and fourth registers are respectively $R_{\rm samp},R_{\rm PRN}$ and two ancillary registers. The first arrow is modular exponentiation, the second is modular multiplication, the third is loading, the fourth is modular addition and the last is uncomputation of the first and third.

We do not explain permutation: see Miyamoto for the detail. We just make a comment that it is implemented by a simple circuit, for example, Xorshift is implemented as a sequence of CNOT.

We also need the gate to calculate $\Phi_{\rm SN}^{-1}$ , the inverse function of CDF of standard normal distribution. There are some ways to calculate this efficiently and we here adopt the method in Hormann . In the method, $\mathbb{R}$ is divided into some intervals and $\Phi_{\rm SN}^{-1}$ is approximated by a polynomial in each interval. We adopt the setting where the number of intervals is 109⁶⁶6if we include $(-\infty,x^{\rm ICDF}_{0})$ and $[x^{\rm ICDF}_{n^{\rm ICDF}},\infty,)$ , it is 111 and polynomials are cubic, which realizes the error smaller than $10^{-6}$ . Here, we write the approximated inverse CDF as

\Phi_{\rm SN}^{-1}(x)\approx c_{m,3}x^{3}+c_{m,2}x^{2}+c_{m,1}x+c_{m,0}

(24)

for $x^{\rm ICDF}_{m-1}\leq x<x^{\rm ICDF}_{m},m=0,1,...,n_{\rm ICDF}+1$ , where $x^{\rm ICDF}_{0}<x^{\rm ICDF}_{1}<...<x^{\rm ICDF}_{n_{\rm ICDF}}$ are the end points of the intervals and $n_{\rm ICDF}$ is the number of the interval. Consider that $x^{\rm ICDF}_{-1}=-\infty,x^{\rm ICDF}_{n_{\rm ICDF}+1}=+\infty$ .

Such a piecewise cubic function can be implemented as Figure 6. We here explain how it works. First, the sequence of comparators and “Load $c_{m,i}^{\prime}s$ ” gates load $c_{m,0},...,c_{m,3}$ into the register $R_{c_{0}},...,R_{c,3}$ respectively as follows. The comparators compare the value $x$ of $R_{\rm PRN}$ and the grid points $x^{\rm ICDF}_{m}$ and flip $R_{g}$ if $x<x^{\rm ICDF}_{m}$ . If $R_{g}$ is 1, the “Load $c_{m,i}^{\prime}s$ ” gates are activated. They are actually collections of bitwise flips, that is, $X$ gates. If $x\geq x^{\rm ICDF}_{n_{\rm ICDF}}$ , only “Load $c_{n_{\rm ICDF}+1,i}^{\prime}s$ ” gate is performed and it loads $c_{n_{\rm ICDF}+1,0},...,c_{n_{\rm ICDF}+1,3}$ . If $x^{\rm ICDF}_{n_{\rm ICDF}-1}\leq x<x^{\rm ICDF}_{n_{\rm ICDF}}$ , “Load $c_{n_{\rm ICDF},i}^{\prime}s$ ” and “Load $c_{n_{\rm ICDF}+1,i}^{\prime}s$ ” are performed. So, we set “Load $c_{n_{\rm ICDF},i}^{\prime}s$ ” so that it compensates flips done by “Load $c_{n_{\rm ICDF}+1,i}^{\prime}s$ ” and $c_{n_{\rm ICDF},0},...,c_{n_{\rm ICDF},3}$ are successfully loaded. The case where $x$ is in another interval is similar. The point is that if $x^{\rm ICDF}_{M-1}\leq x<x^{\rm ICDF}_{M}$ , every other gates are activated. That is, the activated gates are “Load $c_{m,i}^{\prime}s$ ” of $m=M,M+2,...,n_{\rm ICDF},n_{\rm ICDF}+1$ if $n_{\rm ICDF}-M$ is even and $m=M,M+2,...,n_{\rm ICDF}-1,n_{\rm ICDF}+1$ if $n_{\rm ICDF}-M$ is odd. This is because every comparator after the $M$ -th one flips $R_{g}$ and $R_{g}$ takes 0 and 1 alternatingly. Considering this, the $X$ gates in “Load $c_{m,i}^{\prime}s$ ” are set as Figure 7, so that $c_{m,0},...,c_{m,3}$ for appropriate $m$ are loaded after the sequence of all activated gates. After load of coefficients, the cubic function is calculated in the Horner’s method

((c_{m,3}x+c_{m,2})x+c_{m,1})x+c_{m,0}.

(25)

This is done by the sequence of adders and multipliers in the latter half of the circuit in Figure 6.

(iii) Payoff

In this paper, we do not consider gates to calculate payoffs in detail, since the resource the gates require is same in both the PRN-on-a-register way and the register-per-RN way. We here make just a short comment. In many cases, a payoff can be expressed in the following form:

f^{\rm pay}_{i}=\min\{\max\{a_{i}S_{t_{i}}+b_{i},f_{i}\},c_{i}\},

(26)

where $a_{i},b_{i},c_{i},f_{i}$ are real constants, that is, a linear function of the asset price with the upper bound (cap) $c_{i}$ and the lower bound (floor) $f_{i}$ . For example, a payoff in an European call option (1) corresponds to $a_{i}=1,b_{i}=-K,c_{i}=+\infty,f_{i}=0$ . Payoffs expressed as (26) can be calculated easily by combination of comparators, adders and multipliers.

IV.3 the register-per-RN way

IV.3.1 calculation flow

Also for the register-per-RN way, we start from presenting the calculation flow, which is somewhat simpler than the PRN-on-a-register way. Again, we consider the flow until calculation of the payoff sum.

Before we present the calculation flow, we explain the required registers.

•

$R_{W_{i}},i=1,...,n_{t}$

This is a register for the $i$ -th SNRN. We need such a register per random number, so the total number is $n_{t}$ .
•

$R_{S_{i}},i=0,1,...,n_{t}$

This is a register where the value of the asset price at time $t_{i}$ is held.
•

$R_{{\rm payoff},i},i=1,...,n_{t}$

This is a register where the value of the sum of payoffs by $t_{i}$ is held.

We again omit ancillary registers here and explain them later. Besides, we again assume that these and ancillary registers necessary to update them have $n_{\rm dig}$ qubits.

We here concretely define a superposition of SNRN values as the following state. In advance, we set the equally spaced $N_{\rm SN}+1$ points for discretization of the distribution $x_{{\rm SN},0}<x_{{\rm SN},1}<...<x_{{\rm SN},N_{\rm SN}}$ , where $x_{{\rm SN},0}$ and $x_{{\rm SN},N_{\rm SN}}$ are the upper and lower bounds of the distribution and set to, say, -4 and +4, respectively. We here assume $N_{\rm SN}=2^{n_{\rm dig}}$ for simplicity. Then, we define $\ket{\rm SN}$ as

\ket{\rm SN}=\sum_{i=0}^{N_{\rm SN}-1}{\sqrt{p_{{\rm SN},i}}\ket{i}},

(27)

where $p_{{\rm SN},i}=\int_{x_{{\rm SN},i}}^{x_{{\rm SN},i+1}}\phi_{\rm SN}(x)dx$ and $\phi_{\rm SN}(x)$ is the probability density function of the standard normal distribution. We consider how to create such a state later. Since $x_{{\rm SN},i}$ can be easily calculated from the index $i$ by a linear function, we identify $i$ as $x_{{\rm SN},i}$ .

Then, the calculation flow is as follows:

1.

Initialize all registers to $\ket{0}$ except $R_{S_{0}}$ , which is initialized to $\ket{S_{t_{0}}}$ .
2.

Generate superpositions of SNRNs on $R_{W_{1}},...,R_{W_{n_{t}}}$ . That is, set each of them to $\ket{\rm SN}$ .
3.

Perform the time evolution (10) using the value on $R_{W_{1}}$ as $w_{1}$ . The result is output to $R_{S_{1}}$ as $\ket{S_{t_{1}}}$ .
4.

Calculate the payoff at time $t_{1}$ using $R_{S_{1}}$ and output the sum of it and the previous payoffs to $R_{{\rm payoff},i}$ .
5.

Iterate operations similar to 3-4 for each time step until the time reaches $t_{n_{t}}$ .
6.

Finally we obtain a superposition of states in which the value on $R_{{\rm payoff},n_{t}}$ is the sum of payoffs for each pattern of values of SNRNs. Estimate the expectation value of $R_{{\rm payoff},n_{t}}$ to get (11).

The flow of state transformation is as follows. Writing only $R_{W_{1}},...,R_{W_{n_{t}}}$ , $R_{S_{0}},R_{S_{1}},...,R_{S_{n_{t}}}$ and $R_{{\rm payoff},1},...,R_{{\rm payoff},n_{t}}$ ,

	$\displaystyle\ket{0}^{\otimes n_{t}}\ket{S_{t_{0}}}\ket{0}^{\otimes n_{t}}\ket{0}^{\otimes n_{t}}$	(28)
$\displaystyle\xrightarrow{2}$	$\displaystyle\ket{\rm SN}^{\otimes n_{t}}\ket{S_{t_{0}}}\ket{0}^{\otimes n_{t}}\ket{0}^{\otimes n_{t}}$
$\displaystyle\xrightarrow{3}$	$\displaystyle\sum_{i_{1}=0}^{N_{\rm SN}-1}\sqrt{p_{{\rm SN},i_{1}}}\ket{i_{1}}\ket{\rm SN}^{\otimes n_{t}-1}\ket{S_{t_{0}}}\ket{S^{(i_{1})}_{t_{1}}}\ket{0}^{\otimes n_{t}-1}\ket{0}^{\otimes n_{t}}$
$\displaystyle\xrightarrow{4}$	$\displaystyle\sum_{i_{1}=0}^{N_{\rm SN}-1}\sqrt{p_{{\rm SN},i_{1}}}\ket{i_{1}}\ket{\rm SN}^{\otimes n_{t}-1}\ket{S_{t_{0}}}\ket{S^{(i_{1})}_{t_{1}}}\ket{0}^{\otimes n_{t}-1}\ket{f^{\rm pay}_{i_{1}}(S^{(i_{1})}_{t_{1}})}\ket{0}^{\otimes n_{t}-1}$
$\displaystyle\xrightarrow{5}$	$\displaystyle...$
$\displaystyle\xrightarrow{5}$	$\displaystyle\sum_{i_{1},\cdots,i_{n_{t}}=0}^{N_{\rm SN}-1}\sqrt{p_{{\rm SN},i_{1}}...p_{{\rm SN},i_{n_{t}}}}\ket{i_{1}}...\ket{i_{n_{t}}}\ket{S_{t_{0}}}\ket{S^{(i_{1})}_{t_{1}}}...\ket{S^{(i_{1}\cdots i_{n_{t}})}_{t_{n_{t}}}}\ket{f^{\rm pay}_{i_{1}}(S^{(i_{1})}_{t_{1}})}...\ket{\sum_{j=1}^{n_{t}}{f^{\rm pay}_{j}(S^{(i_{1}...i_{j})}_{t_{j}})}},$

where $S^{(i_{1}...i_{j})}_{t_{j}}$ is the value of the asset price at time $t_{j}$ evolved by $w_{1}=x_{{\rm SN},i_{1}},...,w_{j}=x_{{\rm SN},i_{j}}$ .

IV.3.2 overview of the circuit

The outline of the circuit in the register-per-RN is as shown in Figure 8. First, $\ket{\rm SN}$ is created on each $R_{W_{j}}$ by the gate SN, which is considered in detail later. After that, the gate $U_{j}$ performs $j$ -th step of asset price evolution and payoff calculation. For each evolution step, ancillary registers $R_{{\rm flg},j}$ and $R_{{\rm LV},j}$ , which have 1 and $2n_{\rm dig}$ qubits respectively, are necessary. $U_{j}$ is then implemented as Figure 9. In this gate, the sequence of comparators and “Load” gates set $a_{j,k},b_{j,k}$ in (8) into $R_{{\rm LV},j}$ by the trick similar to that in the circuit in Figure 6. Then, the operation $x\leftarrow x+(ax+b)y$ updates the asset price according to (10). Since the register-per-RN way does not aim to uncompute ancillary qubits in order to reduce them, $x\leftarrow x+(ax+b)y$ can be done as follows:

$\displaystyle\ket{x}\ket{y}\ket{a}\ket{b}\ket{0}\ket{0}$	$\displaystyle\rightarrow$	$\displaystyle\ket{x}\ket{y}\ket{a}\ket{b}\ket{0}\ket{x}$	(29)
	$\displaystyle\rightarrow$	$\displaystyle\ket{x}\ket{y}\ket{a}\ket{b}\ket{xy}\ket{x}$
	$\displaystyle\rightarrow$	$\displaystyle\ket{x}\ket{y}\ket{a}\ket{b}\ket{xy}\ket{x+axy}$
	$\displaystyle\rightarrow$	$\displaystyle\ket{x}\ket{y}\ket{a}\ket{b}\ket{xy}\ket{x+axy+by},$

where the first ket is $R_{S_{j-1}}$ , the second is $R_{W_{j}}$ , the third and fourth are $R_{{\rm LV},j}$ , the fifth is an ancillary register and the last is $R_{S_{j}}$ . So, this operation consists of copying a state and three multiplications. At the end of $U_{j}$ , the payoff is calculated using $R_{S_{j}}$ . Here, the ” ${\rm Payoff}_{j}$ ” gate performs the following operation

			$\displaystyle\ket{S^{(i_{1}\cdots i_{j})}_{t_{j}}}\ket{\sum_{k=1}^{j-1}{f^{\rm pay}_{k}(S^{(i_{1}...i_{k})}_{t_{k}})}}\ket{0}$		(30)
		$\displaystyle\rightarrow$	$\displaystyle\ket{S^{(i_{1}\cdots i_{j})}_{t_{j}}}\ket{\sum_{k=1}^{j-1}{f^{\rm pay}_{k}(S^{(i_{1}...i_{k})}_{t_{k}})}}\ket{\sum_{k=1}^{j}{f^{\rm pay}_{k}(S^{(i_{1}...i_{k})}_{t_{k}})}},$		(30)

where the first, second and third ket are $R_{S_{j}}$ , $R_{{\rm payoff},j-1}$ and $R_{{\rm payoff},j}$ . This is done as copying $R_{{\rm payoff},j-1}$ to $R_{{\rm payoff},j}$ followed by calculation and addition of $f^{\rm pay}_{j}(S^{(i_{1}...i_{j})}_{t_{j}})$ to $R_{{\rm payoff},j}$ .

IV.3.3 implementation of the SN gate

Let us now consider implementation of the SN gate, which creates a superposition of SNRN values. The outline was presented in Grover . In addition to this, we here explain some details, which were not explicitely explained in Grover .

We construct the state in the recursive way. We assume that we have already divided $[x_{{\rm SN},0},x_{{\rm SN},N_{\rm SN}}]$ by equally-spaced $2^{m}+1$ points $x_{{\rm SN},0}^{(m)}=x_{{\rm SN},0}<x^{(m)}_{{\rm SN},1}<...<x^{(m)}_{{\rm SN},2^{m}}=x_{{\rm SN},N_{\rm SN}}$ and created

\ket{{\rm SN}_{m}}=\sum_{i=0}^{2^{m}-1}{\sqrt{p^{(m)}_{{\rm SN},i}}\ket{i}},

(31)

where $p^{(m)}_{{\rm SN},i}=\int_{x^{(m)}_{{\rm SN},i}}^{x^{(m)}_{{\rm SN},i+1}}\phi_{\rm SN}(x)dx$ . We also assume that we have a gate to efficiently compute $\theta^{(m)}_{i}=\arccos\sqrt{f^{(m)}_{i}}$ with the input $i$ , where $f^{(m)}_{i}$ is

f^{(m)}_{i}=\frac{\int_{x^{(m)}_{{\rm SN},i}}^{\left(x^{(m)}_{{\rm SN},i}+x^{(m)}_{{\rm SN},i+1}\right)/2}\phi_{\rm SN}(x)dx}{\int_{x^{(m)}_{{\rm SN},i}}^{x^{(m)}_{{\rm SN},i+1}}\phi_{\rm SN}(x)dx}.

(32)

Then, the following state transformation is possible:

$\displaystyle\ket{{\rm SN}_{m}}\ket{0}\ket{0}$	$\displaystyle=$	$\displaystyle\sum_{i=0}^{2^{m}-1}{\sqrt{p^{(m)}_{{\rm SN},i}}\ket{i}\ket{0}\ket{0}}$	(33)
	$\displaystyle\rightarrow$	$\displaystyle\sum_{i=0}^{2^{m}-1}{\sqrt{p^{(m)}_{{\rm SN},i}}\ket{i}\ket{0}\ket{\theta^{(m)}_{i}}}$
	$\displaystyle\rightarrow$	$\displaystyle\sum_{i=0}^{2^{m}-1}{\sqrt{p^{(m)}_{{\rm SN},i}}\ket{i}\left(\cos\theta^{(m)}_{i}\ket{0}+\sin\theta^{(m)}_{i}\ket{1}\right)\ket{\theta^{(m)}_{i}}}$
	$\displaystyle=$	$\displaystyle\sum_{i=0}^{2^{m+1}-1}{\sqrt{p^{(m+1)}_{{\rm SN},i}}\ket{i}\ket{\theta^{(m)}_{i}}}$
	$\displaystyle=$	$\displaystyle\ket{{\rm SN}_{m+1}}\ket{0},$

where we use the gate to compute $\theta^{(m)}_{i}$ at the first arrow and perform the controlled rotation at the second arrow. Repeating this until $m=n_{\rm dig}-1$ , we get $\ket{\rm SN}$ .

However, as far as the authors know, neither Grover nor other papers present the gate to compute $\theta^{(m)}_{i}$ . Here, we propose a way to do this on the basis of a simple Taylor expansion. Consider

g(x,\delta)=\frac{\int_{x}^{x+\delta/2}\phi_{\rm SN}(x)dx}{\int_{x}^{x+\delta}\phi_{\rm SN}(x)dx}.

(34)

We can show that

g(x,\delta)\approx\frac{1}{2}+\frac{1}{8}\delta x+\frac{1}{16}\delta^{2}+\mathcal{O}(\delta^{3})

(35)

by simple calculation. So, $g(x,\delta)$ is well-approximated by a linear function of $x$ for small $\delta$ . We can use this to compute $f^{(m)}_{i}$ , since

f^{(m)}_{i}=g\left(x^{(m)}_{{\rm SN},i},\frac{\Delta}{2^{m}}\right),\Delta=x_{{\rm SN},N_{\rm SN}}-x_{{\rm SN},0}.

(36)

Given $x_{{\rm SN},N_{\rm SN}},x_{{\rm SN},0}$ and $m$ large enough that $\Delta/2^{m}$ is sufficiently small, this can be approximately seen as a linear function of $i$ , since $x^{(m)}_{{\rm SN},i}=x_{{\rm SN},0}+\frac{\Delta}{2^{m}}i$ . Actually, we numerically confirmed that for $m\geq 7$ the above approximation gives error smaller than $10^{-5}$ .

We then reach the circuit in Figure 10 for calculation of $f^{(m)}_{i}$ . For $m\leq 6$ , as shown in Figure 10, the sequence of comparators and “Load” gates set $f^{(m)}_{i}$ to the output register $R_{f^{(m)}_{i}}$ , using the trick similar to the circuit in Figure 6 again. Note that comparators now check whether the input value $i$ is equal to a given integer constant or not, so each of them is actually a combination of bitwise flips and a multi-controlled Toffoli gate, similar to that appeared in Figure 4. Also note that, if the input $i$ is $I$ , “Load $f^{(m)}_{i}$ ” gates are activated for $i\geq I$ . Considering this point, each “Load” gate is set so that operations of it and the following gates are successfully compensated. For $m\geq 7$ , the $f^{(m)}_{i}$ is just a linear transformation, which is implemented as bitwise flips followed by a constant multiplier. Of course, according to required accuracy, the value of $m$ where the two types of $f^{(m)}_{i}$ are switched should be adjusted and the degree of the Taylor approximation for $g(x,\delta)$ should be increased.

Using this $f^{(m)}_{i}$ gate, the SN gate is constructed as Figure 11. First, we operate a Hadamard gate to the most significant bit in $R_{W_{j}}$ to assign probability 1/2 to positive and negative halves of $[x_{{\rm SN},0},x_{{\rm SN},N_{\rm SN}}]$ respectively. Then, we operate the sequence of the gates $U^{\rm SN}_{m}$ , which corresponds to the $m$ -th step of the above recursive calculation. $U^{\rm SN}_{m}$ is constructed as a combination of the $f^{(m)}_{i}$ gate, the gates to calculate square root and arc cosine and the controlled rotation gate $R(\theta)$ .

We here comment on implementation of arccos and square root. The implementation of the inverse trigonometric function based on the piecewise polynomial approximation is proposed in Haner . Although Haner considers not arccos but arcsin, these can be easily related as $\arccos(x)=\frac{\pi}{2}-\arcsin(x)$ . We adopt a setting with the polynomial degree 3 and 2 intervals, which leads to accuracy $10^{-5}$ Haner . The circuit for square root is given in MunozCoreas2 .

V Estimation of Required Resources

Then, let us estimate the machine resources required for the implementation in the PRN-on-a-register way and the register-per-RN way. We consider the two metrics, qubit number and T-count, as many papers do.

V.1 elementary gates

Table 1: Resources for various elementary gates. We here assume that operands are

n

-bit. We omit subleading terms with respect to

n

Gate

Qubits

T-count

Reference

Total

Operand

Output

Ancilla

Adder

2n

2n

0 (self-update)

14n

Thapliyal ; Thapliyal2 ; Thapliyal3

Ctrl Adder

2n

2n

0 (self-update)

21n

MunozCoreas ; Thapliyal3

Modular Adder⁷⁷7Since we can construct the modular adder using 5 plain addersVedral , we use 5 times the values of the adder for T-count.

2n

2n

0 (self-update)

70n

Thapliyal ; Thapliyal2 ; Thapliyal3 ; Vedral

Multiplier

3n

2n

n

21n^{2}

MunozCoreas

Divider

5n

2n

n

2n

35n^{2}

Thapliyal3

Multi Ctrl Toffoli

2n

n

n

8n

Selinger ; Maslov

Square Root⁸⁸8The circuit given in MunozCoreas2 takes a

n

-bit input and returns the

n/2

-bit square root and the

n/2

-bit remainder. In order to keep

n

-bit accuracy, we add

n

0’s to the input and calculate the

n

-bit square root of the virtual

2n

-bit input. We treat the

n

bits added to the input and the

n

bits where the remainder is output as ancillas.

4n

n

n

2n

14n^{2}

MunozCoreas2

arccos

105

3.4\times 10^{4}

⁹⁹9Since the value shown in Haner is Toffoli count, we simply multiply it by 7 for converting it to T-count.

Haner

controlled rotation

(with accuracy of

2^{-n}

)

2 (control and target)

3n

Egger ; Kliuchnikov ; Amy

We first summarize the resources of the elementary gates which are necessary to construct the LV circuit. We here consider fixed-point arithmetic. Among various implementations proposed previously, we adopt one for each of the elementary gates and summarize their resources when operands are $n$ -bit in Table LABEL:tbl:resElem. Since we aim to estimate the orders of the above metrics, we take only the leading term with respect to $n$ for required resources. For example, we approximate $an+b$ as $an$ ..

We make a comment on multiplication and the division. For these operation, we use modified versions of circuits proposed in MunozCoreas and Thapliyal3 . This is because of the following reason. In order to store the strict value of the product of two $n$ -bit numbers, original circuits use $2n$ -bits. This causes a problem in the situation where we have to sequentially perform multiplication as in the LV circuit, since the required qubit number doubles at every multiplication. Therefore, we have to truncate lower bits of product and keep the digit number constant. In order to do this, we modify the multiplier in MunozCoreas . We also construct the divider, which is dedicated to drawback of the modified multiplication. This is why the qubit number for divider in Table LABEL:tbl:resElem is different from that in MunozCoreas ; Thapliyal3 . We explain the details of the modified multiplier and divider in Appendix.

V.2 assumptions on registers

We assume the following points on qubit numbers of various registers, some of which have already been mentioned.

•

Registers which store numerical numbers, $R_{W},R_{S},R_{\rm payoff},R_{{\rm LV},j}$ etc., and ancillary registers concerning them have $n_{\rm dig}$ qubits. This is determined according to the required accuracy. We hereafter set $n_{\rm dig}=16$ .
•

$R_{\rm PRN}$ in Figure 5 exceptionally has $n_{\rm PRN}$ qubits, since this is set to so large a value that the PRN sequence has good statistical property, e.g. long period. PCG considers the PRN generators with 64 bits and we also use this value for $n_{\rm PRN}$ hereafter. Ancillary registers necessary for calculation of PRN sequence have $n_{\rm PRN}$ qubits too. Even if $n_{\rm PRN}>n_{\rm dig}$ , we use only $n_{\rm dig}$ qubits in $R_{\rm PRN}$ for transformation to PSNRN.
•

$R_{\rm samp}$ has $n_{\rm samp}$ qubits.
•

Other registers, e.g. $R_{\rm count}$ , have only several qubits and we neglect their contributions to the whole qubit number.

V.3 the PRN-on-a-register way

Then, let us consider the required resources in the PRN-on-a-register way.

V.3.1 qubit number

Table 2: Qubits necessary in each step in the PRN-on-a-register circuit. We neglect registers with only several qubits.

Part	Register	Qubit	Note
Whole	$R_{\rm samp}$	$n_{\rm samp}$
	$R_{S}$	$n_{\rm dig}$
	$R_{\rm payoff}$	$n_{\rm dig}$
	$R_{\rm PRN}$	$n_{\rm PRN}$
$J_{\rm PRN}$	ancilla	$2n_{\rm PRN}$	To hold intermediate outputs; see (21)
$\Phi_{\rm SN}^{-1}$	$R_{W}$	$n_{\rm dig}$
$\Phi_{\rm SN}^{-1}$	ancilla	$6n_{\rm dig}$	To hold the coefficients of the polynomial and the intermediate outputs; see Figure 6
$V^{(j)}_{k}$	$R_{W}$	$n_{\rm dig}$
	$R_{S^{\prime}}$	$n_{\rm dig}$
	ancilla	$4n_{\rm dig}$	For $x\leftarrow x+(ax+b)y$ and $z\leftarrow\frac{z+x-by}{1+ay}$ ; see the comment in the body text.
$P_{\rm PRN}$	ancilla	$2n_{\rm PRN}$	To hold intermediate outputs; see (22).

In Table LABEL:tbl:qubit, we summarize qubits necessary in each step in the circuit. Registers which hold some values throughout the circuit are as follows: $R_{\rm samp},R_{S},R_{\rm payoff}$ ans $R_{\rm PRN}$ . Except these, the following parts in the circuit can consume qubit number most heavily.

•

$J_{\rm PRN}$ and $P_{\rm PRN}$ : $2n_{\rm PRN}$ qubits
•

$\Phi_{\rm SN}^{-1}$ : $7n_{\rm dig}$ qubits

Therefore, the total qubit number required in the PRN-on-a-register way is roughly

n_{\rm samp}+2n_{\rm dig}+n_{\rm PRN}+\max\{2n_{\rm PRN},7n_{\rm dig}\}

(37)

Let us comment on some technical points for obtaining Table LABEL:tbl:qubit. We first make a supplementary explanation on the ancillary qubit number in $V^{(j)}_{k}$ . There are two parts which require ancillas in $V^{(j)}_{k}$ . First, $x\leftarrow x+(ax+b)y$ needs the following ancillas: a $n_{\rm dig}$ -bit register to which $1+ay$ is output, a $n_{\rm dig}$ -bit register to which the result is temporally output in the self-update multiplication and a $2n_{\rm dig}$ -bit register necessary for the inverse division to clear the input $x$ . Second, $z\leftarrow\frac{z+x-by}{1+ay}$ needs the following: a $n_{\rm dig}$ -bit register to which $1+ay$ is output and a $2n_{\rm dig}$ -bit register necessary for division. In total, $4n_{\rm dig}$ bits are sufficient¹⁰¹⁰10Strictly speaking, comparisons between $R_{S}$ or $R_{S^{\prime}}$ and $s_{j,k}$ ’s require loading $s_{j,k}$ ’s into some register. This does not require another register, since at least one of ancillary registers used in $x\leftarrow x+(ax+b)y$ and $z\leftarrow\frac{z+x-by}{1+ay}$ is empty at loading. .

We also comment on the ancilla number in $\Phi_{\rm SN}^{-1}$ . As we can see from Figure 6, we need four registers to which coefficients are loaded and two registers for intermediate outputs. Therefore, $6n_{\rm dig}$ ancillas are necessary¹¹¹¹11Although we also need a register to which $x^{\rm ICDF}_{m}$ ’s are loaded at comparisons between them and $R_{\rm PRN}$ , we can use $R_{W}$ or intermediate output registers, which are empty at comparisons.

V.3.2 T-count

Since we are interested in only the leading contribution, we focus on multiplications, divisions and repeated additions. Besides, we do not consider the T-count of $J_{W}$ , which is used only once. For the parts in $U_{j}$ , which is used repeatedly, we specify T-counts as follows:

1.
$V^{(j)}_{k}$
One $V^{(j)}_{k}$ includes the following parts:
- •
  
  $x\leftarrow x+(ax+b)y$
  As we can see in (18), this includes one multiplication and one division, which come from one self-update multiplication, and $3n_{\rm dig}$ controlled additions, which comes from two controlled multiplications by constant and one inverse. In total, the T-count is $119n_{\rm dig}^{2}$ .
- •
  
  $z\leftarrow\frac{z+x-by}{1+ay}$
  As we can see in (19), this includes one division and $2n_{\rm dig}$ additions, which comes from two multiplications by constant. In total, the T-count is $63n_{\rm dig}^{2}$ .
- •
  
  Uncomputation of $z\leftarrow\frac{z+x-by}{1+ay}$
  Similar to the above.
Therefore, the total T-count in one $V^{(j)}_{k}$ is $245n_{\rm dig}^{2}$ . Since $V^{(j)}_{k}$ is used $n_{S}+1$ times, the total T-count in them is $245n_{\rm dig}^{2}n_{S}$ (only the leading term).
2.

$P_{\rm PRN}$
This includes two modular multiplications by constant, which comes from one self-update modular multiplication. These are decomposed into $2n_{\rm PRN}$ modular additions. So the T-count is roughly $140n_{\rm PRN}^{2}$ .
3.

$\Phi_{\rm SN}^{-1}$ and its inverse
Each of them includes $2(n_{\rm ICDF}+1)$ additions ( $n_{\rm ICDF}+1$ comparisons) and five multiplications. So the T-count for each is roughly $105n_{\rm dig}^{2}+28n_{\rm dig}n_{\rm ICDF}$ .

Summing up these and considering $U_{j}$ is used in $n_{t}$ times, the T-count in the whole circuit is roughly

(245n_{\rm dig}^{2}n_{S}+140n_{\rm PRN}^{2}+210n_{\rm dig}^{2}+56n_{\rm dig}n_{\rm ICDF})n_{t}.

(38)

V.4 the register-per-RN way

Next, we consider the required resources in the register-per-RN way.

V.4.1 qubit number

Table 3: Qubits added in each time step in the register-per-RN circuit. We neglect registers with only several qubits. We only take the leading contributions.

Qubit

Note

R_{S_{i}}

n_{\rm dig}

R_{W_{i}}

n_{\rm dig}

R_{{\rm payoff},i}

n_{\rm dig}

R_{{\rm LV},i}

2n_{\rm dig}

ancilla used for

x\leftarrow x+(ax+b)y

U_{t_{i}}

n_{\rm dig}

See (29)

ancilla for

U^{\rm SN}_{m},m=1,...,n_{\rm dig}-1

output

f^{(m)}_{i}

n_{\rm dig}^{2}

n_{\rm dig}

for one

U^{\rm SN}_{m}

ancilla for SQRT

2n_{\rm dig}^{2}

2n_{\rm dig}

for one

U^{\rm SN}_{m}

qubits used for arccos

(input, output and intermediate output)

105n_{\rm dig}

105 for one

U^{\rm SN}_{m}

In the register-per-RN way, registers shown in Table LABEL:tbl:qubitRN are added per time step. Note that we do not uncompute ancillas. Summing up all registers, the qubit number necessary for one time step is roughly $3n_{\rm dig}^{2}+111n_{\rm dig}$ , and for the entire circuit it is

(3n_{\rm dig}^{2}+111n_{\rm dig})n_{t}.

(39)

Note that the dominant part comes from the iterative calculation in the SN gates, which prepare superpositions of the values of the SNRNs.

V.4.2 T-count

Again, we focus on operations with large T-count. For each part in the circuit, we estimate the T-count as follows:

1.
SN gate
The $m$ -th iteration $U^{\rm SN}_{m}$ in the SN gate includes the following parts:
- •
  
  square root, arccos, controlled rotation
  T-counts are $14n_{\rm dig}$ , $3.4\times 10^{4}$ and $3n_{\rm dig}$ , respectively.
- •
  
  $f^{(m)}_{i}$
  For $2\leq m\leq 6$ , we use $2^{m}$ $m$ -controlled Toffoli gates to check the value on $R_{W_{i}}$ and load $f^{(m)}_{i}$ which corresponds to the value. T-count for this is $2^{m}(8m-9)$ ¹²¹²12Here, we use $8m-9$ , the accurate value of T-count of the $m$ -controlled Toffoli gatesSelinger ; Maslov , since the approximation as $8m$ is too crude for small $m$ .. Summing this for $m=2,...,6$ leads to about 4000. Since this is much smaller than T-count for arccos in one iteration, we neglect this. For $m\geq 7$ , we do multiplication between a $m$ -bit variable and a $n_{\rm dig}$ -bit constant, which is decomposed $n_{\rm dig}$ additions of $m$ -bit. Then, T-count is $14mn_{\rm dig}$ .
Summing up these and taking only dominant contributions, one SN gate has T-count of $(7n_{\rm dig}^{2}+3.4\times 10^{4})n_{\rm dig}$ roughly.
2.

$U_{j}$
This includes $2n_{S}$ additions ( $n_{S}$ comparisons) and three multiplications. So one $U_{j}$ gates has T-count of $63n_{\rm dig}^{2}+28n_{S}n_{\rm dig}$ roughly.

In total, we can estimate T-count of the entire circuit in the register-per-RN way as

(7n_{\rm dig}^{2}+63n_{\rm dig}+28n_{S}+3.4\times 10^{4})n_{\rm dig}n_{t}.

(40)

V.5 comparison between two ways

Table 4: Qubit number and T-count required in the PRN-on-a-register and register-per-RN ways.

	PRN-on-a-register	register-per-RN
qubit	$n_{\rm samp}+2n_{\rm dig}+n_{\rm PRN}+\max\{2n_{\rm PRN},7n_{\rm dig}\}$	$(3n_{\rm dig}^{2}+111n_{\rm dig})n_{t}$
T-count	$(245n_{\rm dig}^{2}n_{S}+140n_{\rm PRN}^{2}+210n_{\rm dig}^{2}+56n_{\rm dig}n_{\rm ICDF})n_{t}$	$(7n_{\rm dig}^{2}+63n_{\rm dig}+28n_{S}+3.4\times 10^{4})n_{\rm dig}n_{t}$

Table 5: Required resources to implement the PRN-on-a-register and register-per-RN ways in the valid case (41). The following values are obtained from Table LABEL:tbl:ressum.

	PRN-on-a-register	register-per-RN
qubit	$2.4\times 10^{2}$	$9.2\times 10^{5}$
T-count	$3.7\times 10^{8}$	$2.1\times 10^{8}$

We then compare resources necessary in two ways in Table LABEL:tbl:ressum. Naturally, qubit number is independent from $n_{t}$ in the PRN-on-a-register way but proportional to $n_{t}$ in the register-per-RN way and T-count is proportional to $n_{t}$ in the both ways. If we take a setting, which is typically necessary for practical use in derivative pricing¹³¹³13 $n_{\rm samp}=16$ corresponds to 65536 sample paths. :

$\displaystyle n_{\rm samp}$	$\displaystyle=$	$\displaystyle 16,$
$\displaystyle n_{\rm dig}$	$\displaystyle=$	$\displaystyle 16,$
$\displaystyle n_{\rm PRN}$	$\displaystyle=$	$\displaystyle 64,$
$\displaystyle n_{\rm ICDF}$	$\displaystyle=$	$\displaystyle 109,$
$\displaystyle n_{t}$	$\displaystyle=$	$\displaystyle 360,$
$\displaystyle n_{S}$	$\displaystyle=$	$\displaystyle 5$	(41)

the values in Table LABEL:tbl:ressum becomes as Table LABEL:tbl:ressumspec. The total T-count is of same order of magnitude in the both way but larger for the PRN-on-a-register way by a factor 2 roughly. We here comment on the parts which consume T-count most heavily in each way. In the PRN-on-a-register way, there are two parts which contribute to T-count equally and dominantly. The first is the update of the asset price in $V^{(j)}_{k}$ . Note that additional operations for reduction of qubits, such as inverse division in self-update multiplication and drawing back the asset price to clear $R_{g}$ , increase T-count compared with the register-per-RN way. The second is modular multiplications in update of the PRN sequence. Since the PRN generator requires the large bit number, say $n_{\rm PRN}=64$ , in order to it keep good statistical properties such as long period, the T-count of operations for the PRN becomes large. On the other hand, in the register-per-RN way the dominant contribution to T-count comes from arccos’s in preparing SNRNs. Because not only an arccos itself is T-count consuming but also it is used in each iteration in the SN gate, the total T-count piles up.

VI Summary

In this paper, we considered how to implement time evolution of the asset price in the LV model on quantum computers. Similar to other problems in finance, derivative pricing by Monte Carlo simulation requires a large number of random numbers, which is proportional to $n_{t}$ , the number of time steps for asset price evolution, and this may cause difficulty in implementation. We then considered two ways of implementation: the PRN-on-a-register way and the register-per-RN way. In the former we sequentially generate pseudo random numbers on a register and use them to evolve the asset price. In the latter, standard normal random numbers necessary to time evolution are created as superpositions on separate registers. For both ways, we present the concrete quantum circuits in detail. For not only random number generation but also other aspects, we try to save qubit numbers permitting some additional procedures in the PRN-on-a-register way and do opposite in the register-per-RN way. We then give estimations of qubit number and T-count required in each way. In the PRN-on-a-register way, qubit number is kept constant against $n_{t}$ . On the other hand, in the register-per-RN way qubit number is proportional to $n_{t}$ . Each way has T-count consuming parts and the total T-counts for both ways are of same order of magnitude, expect the PRN-on-a-register way has the larger T-count by a factor about 2, in some specific setting.

Note that analyses of resources required for implementation of the LV model in this paper strongly depend on designs of elementary circuits for arithmetic. For example, in the register-per-RN way the dominant contribution to T-count comes from arccos’s in preparing SNRNs. If more efficient circuits are proposed and we replace the current choice with them, required resources may change.

Finally, we would like to notice that this study is not enough for application of quantum algorithm for Monte Carlo simulation to pricing in the LV model. Although we assumed that the LV function is given, in practice we have to calibrate the LV so that the model prices of European options fit to the market prices. Besides, we have not considered how to evaluate terms in exotic derivatives, for example, early exercise. In future works, we will consider such things and aim to present how to apply quantum computers in the whole process of exotic derivative pricing.

Appendix A Sufficient condition on $\sigma(t,S)$ in the PRN-on-a-resigter way

We here show that $\sigma(t,S)$ which is continuous with respect to $S$ and takes the form of (8) and sufficiently small $\Delta t_{j}$ lead to one-to-one correspondence between $S^{(i)}_{t_{j}}$ and $S^{(i)}_{t_{j+1}}$ . We see $S^{(i)}_{t_{j+1}}$ as a function of $S^{(i)}_{t_{j}}$ and define a function $f$ by

f(S)=S+\sigma(t_{j},S)\sqrt{\Delta t_{j}}w_{j}

(42)

for fixed $w_{i}$ so that $S^{(i)}_{t_{j+1}}=f(S^{(i)}_{t_{j}})$ holds. Except for the grid points $S=s_{j,0}...s_{j,n_{S}}$ , $f(S)$ is differentiable and

f^{\prime}(S)=1+a_{j,k}\sqrt{\Delta t_{j}}w_{j};\ {\rm for}\ s_{j,k-1}<S<s_{j,k},k=0,...,n_{S}+1.

(43)

Since $w_{j}$ is bounded in numerical computation, $f^{\prime}(S)$ is always positive expect the grid points for sufficiently small $\Delta t_{j}$ . Besides, if $\sigma(t_{j},S)$ is continuous with respect to $S$ also at the grid points, so is $f(S)$ . Combining these, we find that $f(S)$ is strictly increasing for small $\Delta t_{j}$ . Thus, the correspondence between $S^{(i)}_{t_{j}}$ and $S^{(i)}_{t_{j+1}}$ is one-to-one.

Small $\Delta t_{j}$ is required from another perspective, too. The original dynamics of the asset price is given as the continuous-time evolution (3) and (10) is the discretized approximation of it. Therefore, in order for this approximation to be accurate, the increment of asset price should be sufficiently smaller than the asset price itself: $\left|\sigma(t_{j},S^{(i)}_{t_{j}})\sqrt{\Delta t_{j}}w_{j}\right|\ll\left|S^{(i)}_{t_{j}}\right|$ . If this condition is satisfied, $\left|a_{j,k}\sqrt{\Delta t_{j}}w_{j}\right|\ll 1$ , so $f^{\prime}(S)>0$ is met.

Appendix B Truncated multiplier and divider

We here describe the modified version of multiplier and divider. We assume that we consider the fixed-point arithmetic with $n_{\rm int}$ bits in the integer part and $n_{\rm frac}$ bits in the fractional part, $n=n_{\rm int}+n_{\rm frac}$ bits in total. We hereafter call such numbers $(n_{\rm int},n_{\rm frac})$ -bit numbers. We want to keep this digit setting before and after multiplication. In order to do this, we adopt the following policy.

•

We simply truncate the digits lower than the $n_{\rm frac}$ -th fractional digit in the product. This might cause numerical errors around and the $n_{\rm frac}$ -th fractional digit and such a tiny error might accumulate, but we simply neglect this concern.
•

We assume the overflow from the $n_{\rm int}$ -bit integer part never occurs.

We then approximate the product as follows. Writing a number $x$ in binary representation as $x_{n_{\rm int}-1}x_{n_{\rm int}-2}...x_{0}.x_{-1}...x_{-n_{\rm frac}}$ , where $x_{i}$ is the $i$ -th integer digit of $x$ and $x_{-j}$ is the $j$ -th fractional digit of $x$ , we do

xy=\sum_{i=-n_{\rm frac}}^{n_{\rm int}-1}{x_{i}2^{i}y}\approx f^{\rm mul}_{n_{\rm frac},n_{\rm int},y}(x):=\sum_{i=-n_{\rm frac}}^{n_{\rm int}-1}{x_{i}2^{i}\tilde{y}_{i}},

(44)

where

\tilde{y}_{i}=\begin{cases}y_{n_{\rm int}-1}...y_{0}.y_{-1}...y_{-(n_{\rm frac}-i)}\ ;\ {\rm for}\ i<0\\ y\ ;\ {\rm for}\ i\geq 0\end{cases}.

(45)

This truncated multiplication is implemented as a circuit in Figure 12. Note that the circuit in Figure 12 actually calculates not $f^{\rm mul}_{n_{\rm frac},n_{\rm int},y}(x)$ but

\sum_{i=0}^{n_{\rm int}-1}{x_{i}2^{i}(y_{n_{\rm int}-1-i}...y_{0}.y_{-1}...y_{-n_{\rm frac}})}+\sum_{j=1}^{n_{\rm frac}}{x_{-j}2^{-j}(y_{n_{\rm int}-1}...y_{0}.y_{-1}...y_{-(n_{\rm frac}-j)})}.

(46)

This is equal to $f^{\rm mul}_{n_{\rm frac},n_{\rm int},y}(x)$ if our assumption that the overflow from the $n$ -bit integer part never occurs is satisfied.

We define the truncated division as the inverse of the truncated multiplication: $z/y\approx f^{\rm div}_{n_{\rm frac},n_{\rm int},y}(z):=\left(f^{\rm mul}_{n_{\rm frac},n_{\rm int},y}\right)^{-1}(z)$ . Given two $(n_{\rm int},n_{\rm frac})$ -bit numbers $y,z$ which satisfies $z=f^{\rm mul}_{n_{\rm frac},n_{\rm int},y}(x)$ , we can find the $(n_{\rm int},n_{\rm frac})$ -bit number $x$ as follows:

1.

Set $i=n_{\rm int}-1$ , $d=z$ and $x=0$ .
2.

Update $d$ with $d-2^{i}\tilde{y}_{i}$
3.

If $d<0$ , update $d$ with $d+2^{i}\tilde{y}_{i}$ ( $d$ returns to the value before step 2) and set $x_{i}=0$ . If $d\geq 0$ , set $x_{i}=1$ .
4.

Decrement $i$ by 1.
5.

Repeat step 2-4 until $i$ becomes $-n_{\rm frac}-1$ at step 4.
6.

Output $x$ .

Note that $2^{i}\tilde{y}_{i}>\sum_{j=-n_{\rm frac}}^{i-1}2^{j}\tilde{y}_{j}$ . This ensures that sequential subtractions by $2^{i}\tilde{y}_{i}$ and checking whether the difference is positive or negative lead to determining each digit of $x$ . The above procedure is implemented as the circuit in Figure 13. Note that we add dummy qubits which correspond to the $2n_{\rm int}-1$ -th to $n_{\rm int}$ -th integer digits of $z$ . Also note that this circuit transforms the dividend register from $\ket{z}$ to $\ket{0}$ . If we want to reserve $\ket{z}$ , we can copy the state to another ancillary register by CNOT gates and use the copy as the dividend state. That is, we can do

\ket{z}\ket{0}\ket{y}\ket{0}\rightarrow\ket{z}\ket{z}\ket{y}\ket{0}\rightarrow\ket{z}\ket{0}\ket{y}\ket{x}.

(47)

Despite the trick to truncate the digits, the structures of the circuits for truncated multiplication and division are similar to that in MunozCoreas and the restoring division circuit in Thapliyal3 respectively. We therefore use the qubit number and T-count of the circuits in MunozCoreas ; Thapliyal3 as those of our truncated arithmetic circuits. The exception is the qubit number of divider, for which we use $5n$ . This is larger than the value in Thapliyal3 by $2n$ , reflecting the addition of dummy qubits and the register to which the dividend is copied¹⁴¹⁴14Actually, added qubits are not $2n$ but $n_{\rm int}+n$ , but we consider that $2n$ qubits are added for simplicity and conservativeness. .

References

(1) R. Orus et al. ”Quantum computing for finance: overview and prospects”, Reviews in Physics 4, 100028 (2019)
(2) J. C. Hull, ”Options, Futures, and Other Derivatives”, Prentice Hall (2012)
(3) S. Shreve, ”Stochastic Calculus for Finance I: The Binomial Asset Pricing Model”, Springer (2004)
(4) S. Shreve, ”Stochastic Calculus for Finance II: Continuous-Time Models”, Springer (2004)
(5) A. Montanaro, “Quantum speedup of Monte Carlo methods”, Proc. Roy. Soc. Ser. A, 471, 2181 (2015)
(6) Y. Suzuki et. al., “Amplitude Estimation without Phase Estimation”, Quantum Information Processing, 19, 75 (2020)
(7) P. Rebentrost et. al., ”Quantum computational finance: Monte Carlo pricing of financial derivatives”, Phys. Rev. A 98, 022321 (2018)
(8) N. Stamatopoulos et al., ”Option Pricing using Quantum Computers”, arXiv:1905.02666
(9) S. Ramos-Calderer et al., ”Quantum unary approach to option pricing”, arXiv:1912.01618
(10) F. Black and M. Scholes, ”The Pricing of Options and Corporate Liabilities”, Journal of Political Economy 81, 637 (1973).
(11) R. C. Merton, ”Theory of Rational Option Pricing”, The Bell Journal of Economics and Management Science 4, 141 (1973)
(12) B. Dupire, ”Pricing with a Smile”, Risk, 7, 18-20 (1994)
(13) L. Grover et. al., “Creating superpositions that correspond to efficiently integrable probability distributions”, arXiv:quant-ph/0208112
(14) J. Gregory, ”The xVA Challenge: Counterparty Credit Risk, Funding, Collateral and Capital”, Wiley (2015)
(15) F. Arute,et al., ”Quantum supremacy using a programmable superconducting processor”, Nature, 574, 505 (2019)
(16) E. T. Campbell et al., ”Roads towards fault-tolerant universal quantum computation”, Nature, 549, 172 (2017)
(17) K. Miyamoto and K. Shiohara, ”Reduction of Qubits in Quantum Algorithm for Monte Carlo Simulation by Pseudo-random Number Generator”, arXiv:1911.12469
(18) M. Amy et al., ”A meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 32(6): 818-830 (2013)
(19) P. Selinger, Phys. Rev. A 87, 042302 (2013)
(20) S. Bravyi and J. Haah, “Magic-state distillation with low overhead,” Phys. Rev. A 86, 052329 (2012)
(21) A. G. Fowler and C. Gidney, ”Low overhead quantum computation using lattice surgery”, arXiv:1808.06709
(22) D. J. Egger at al., ”Credit Risk Analysis using Quantum Computers”, arXiv:1907.03044
(23) G. Maruyama, ”On the Transition Probability Functions of the Markov Process”, Rendiconti del Circolo Matematico di Palermo 4, 48 (1955)
(24) G. Bassard et. al., “Quantum amplitude amplification and estimation”, Contemporary Mathematics, 305, 53 (2002)
(25) K. Nakaji, ”Faster Amplitude Estimation”, arXiv:2003.02417
(26) V. Vedral et. al., “Quantum Networks for Elementary Arithmetic Operations”, Phys. Rev. A 54, 147 (1996)
(27) D. Beckman et. al., ”Efficient networks for quantum factoring”, Phys. Rev. A, 54, 1034 (1996)
(28) T. G. Draper, ”Addition on a quantum computer”, arXiv:quant-ph/0008033
(29) S. A. Cuccaro et. al., ”A new quantum ripple-carry addition circuit”, The Eighth Workshop on Quantum Information Processing (2004)
(30) Y. Takahashi et. al., ”A linear-size quantum circuit for addition with no ancillary qubits”, Quantum Information and Computation, 5(6), 440–448 (2005)
(31) R. Van Meter et. al., ”Fast quantum modular exponentiation”, Phys. Rev. A 71(5), 052320 (2005)
(32) T. G. Draper et. al., ”A logarithmic-depth quantum carry-lookahead adder”, Quantum Information and Computation, 6(4), 351 (2006)
(33) Y. Takahashi et. al., ”Quantum addition circuits and unbounded fan-out”, Quantum Information and Computation, 10(9&10), 0872 (2010)
(34) R. Portugal et. al., ”Reversible Karatsubas algorithm”, Journal of Universal Computer Science, 12(5), 499 (2006)
(35) J. J. Alvarez-Sanchezet. al., ”A quantum architecture for multiplying signed integers”, Journal of Physics: Conference Series, 128(1), 012013 (2008)
(36) Y. Takahashi et. al., ”A fast quantum circuit for addition with few qubits”, Quantum Information and Computation, 8(6), 636 (2008)
(37) H. Thapliyal, “Mapping of subtractor and adder-subtractor circuits on reversible quantum gates,” Transactions on Computational Science XXVII. 10, (2016)
(38) H. Thapliyal and N. Ranganathan, “Design of Efficient Reversible Logic Based Binary and BCD Adder Circuits”, J. Emerg. Technol. Comput. Syst. 9, 17, 1 (2013)
(39) C.-C. Lin, et al., “Qlib: Quantum module library,” J. Emerg. Technol. Comput. Syst., 11, 1, pp. 7:1–7:20 (2014)
(40) H. M. H. Babu, “Cost-efficient design of a quantum multiplier–accumulator unit,” Quantum Information Processing, 16, 1, 30 (2016)
(41) H. V. Jayashree et al., “Ancilla-input and garbage-output optimized design of a reversible quantum integer multiplier,” The Journal of Supercomputing, 72, 4, 1477 (2016).
(42) E. Muñoz-Coreas and H. Thapliyal, ”Quantum Circuit Design of a T-count Optimized Integer Multiplier”, IEEE Transactions on Computers, 68, 5 (2019)
(43) A. Khosropour et al., “Quantum division circuit based on restoring division algorithm,” in Information Technology: New Generations (ITNG), 2011 Eighth International Conference on. IEEE, 2011, pp. 1037–1040.
(44) L. Jamal and H. M. H. Babu, “Efficient approaches to design a reversible floating point divider,” in 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), May 2013, pp. 3004–3007.
(45) S. V. Dibbo et al., “An efficient design technique of a quantum divider circuit,” in 2016 IEEE International Symposium on Circuits and Systems (ISCAS), May 2016, pp. 2102–2105.
(46) H. Thapliyal et al., ”Quantum Circuit Designs of Integer Division Optimizing T-count and T-depth”, IEEE Transactions on Emerging Topics in Computing (2019)
(47) M. Amy, D. Maslov, and M. Mosca, IEEE Trans. CAD 33(10), 1476 (2014)
(48) D. Maslov, ”On the advantages of using relative phase Toffolis with an application to multiple control Toffoli optimization”, Phys. Rev. A 93, 022311 (2016)
(49) M. E. O’Neill, “PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation”, Harvey Mudd College Computer Science Department Tachnical Report (2014); http://www.pcg-random.org/
(50) W. Hörmann and J. Leydold, ”Continuous random variate generation by fast numerical inversion”, ACM Transactions on Modeling and Computer Simulation 13(4):347, (2003)
(51) T. Haner et al., ”Optimizing Quantum Circuits for Arithmetic”, arXiv:1805.12445
(52) E. Muñoz-Coreas and H. Thapliyal, ”T-count and Qubit Optimized Quantum Circuit Design of the Non-Restoring Square Root Algorithm”, ACM Journal on Emerging Technologies in Computing Systems, 14, 3 (2018)
(53) V. Kliuchnikov et al., ”Practical approximation of single-qubit unitaries by single-qubit quantum Clifford and T circuits”, IEEE Transactions on Computers, 65, 1, 161 (2016)
(54) M. Amy et al., “A meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 32, 818 (2013).