Implementing arbitrary quantum operations via quantum walks on a cycle graph

Jia-Yi Lin National Lab of Solid State Microstructure, Collaborative Innovation Center of Advanced Microstructures, and Department of Physics, Nanjing University, Nanjing 210093, China. Xin-Yu Li Institute for Brain Sciences and Kuang Yaming Honors School, Nanjing University, Nanjing 210023, China. Yu-Hao Shao Institute for Brain Sciences and Kuang Yaming Honors School, Nanjing University, Nanjing 210023, China. Wei Wang wangwei@nju.edu.cn National Lab of Solid State Microstructure, Collaborative Innovation Center of Advanced Microstructures, and Department of Physics, Nanjing University, Nanjing 210093, China. Institute for Brain Sciences and Kuang Yaming Honors School, Nanjing University, Nanjing 210023, China. Shengjun Wu sjwu@nju.edu.cn National Lab of Solid State Microstructure, Collaborative Innovation Center of Advanced Microstructures, and Department of Physics, Nanjing University, Nanjing 210093, China. Institute for Brain Sciences and Kuang Yaming Honors School, Nanjing University, Nanjing 210023, China. Hefei National Laboratory, University of Science and Technology of China, Hefei 230088, China

Abstract

The quantum circuit model is the most commonly used model for implementing quantum computers and quantum neural networks, whose essential task is to realize certain unitary operations. The circuit model usually implements a desired unitary operation $U(N)$ by a sequence of single-qubit and two-qubit unitary gates from a universal set. Although this is convenient for experimentalists, who only need to prepare a few kinds of universal gates, the number of gates required to implement an arbitrary desired unitary operation is usually large. Hence the efficiency in terms of the circuit depth or running time is not guaranteed. Here we propose an alternative approach: we use a simple discrete-time quantum walk (DTQW) on a cycle graph to model an arbitrary unitary operation $U(N)$ without the need to decompose it into a sequence of gates of smaller sizes. Our model is essentially a quantum neural network based on DTQW. Firstly, it is universal, as we show that any unitary operation $U(N)$ can be realized via an appropriate choice of coin operators. Secondly, our DTQW-based neural network can be updated efficiently via a learning algorithm, i.e., a modified stochastic gradient descent algorithm adapted to our network. By training this network, one can find approximations to arbitrary desired unitary operations. With an additional measurement on the output, the DTQW-based neural network can also implement general measurements described by positive-operator-valued measures (POVMs). We show its capacity in implementing arbitrary 2-outcome POVM measurements via numerical simulation. We further demonstrate that the network can be simplified and can overcome device noise during training, so that it becomes more friendly for laboratory implementations. Our work shows the capability of the DTQW-based neural network in quantum computation and its potential in laboratory implementations.

I Introduction

Quantum walk Aharonov , the quantum counterpart of classical random walk, has been widely applied in achieving various quantum information processing tasks Portugal . Because of its quadratic enhancement of variances, the quantum walk plays a vital role in many quantum search algorithms and provides possible exponential speedups due to the quantum interference during the walk Kempe . Moreover, various experimental implementations of quantum walks prove its feasibility in real-life circumstances of quantum information processing Kia .

On the other hand, machine learning is a core technology in the age of artificial intelligence. Since machine learning faces the challenge of the lack of computational power and quantum computing has a vast computational potential, the possibility of combining quantum computing and machine learning has been considered. Quantum neural networks (QNNs), a newer class of models in the field of quantum machine learning, operate on quantum computers and perform calculations using quantum effects like superposition, entanglement, and interference. Investigations on QNNs farhi2018classification ; zhao2019building ; mitarai2018quantum ; dallaire2018quantum ; amin2018quantum ; CZoufal ; VDunjko ; MSchuld have revealed their potential advantages, such as training and processing speedups. Despite significant developments in the growing field of quantum machine learning, the trade-offs between quantum and classical models have not been systematically studied. In particular, the question of whether quantum neural networks are more powerful than classical neural networks is still open SAaronson2015nph .

A gate-model QNN is a QNN constructed on a gate-model quantum computer using a sequence of unitaries with associated gate parameters farhi2018classification . Recent developments, such as quantum generative adversarial networks and quantum circuit learning, have more general and diverse QNN structures dallaire2018quantum ; mitarai2018quantum ; gyongyosi2019training . Researchers have already proved that typical quantum walks are universal for quantum computation childs2009universal ; lovett2010universal ; kurzynski2013quantum ; bian2015realization ; zhao2015experimental . However, these works mainly focus on state processing, and many auxiliary systems should be employed in general. In contrast, what we are attempting to achieve in this work is a universal control of the quantum system to implement arbitrary quantum operations, without any auxiliary system. For this purpose, we shall introduce a QNN based on discrete-time quantum walks (DTQW) on a cycle graph with specifically parameterized coin operators. We choose the graph to be a cycle because it is simple for laboratory implementations. We will prove that the DTQW-based QNN is indeed capable of realizing arbitrary unitary evolution of the closed system.

Determining the parameters of the DTQW-based QNN analytically is possible. However, any further adjustment of the network, such as a reduction of the circuit depth, poses extraordinary difficulties for analytical methods. In contrast, we will show that such adjustments can be made effectively with gradient descent, a well-known optimization algorithm frequently employed to train machine learning models, including both classical and quantum neural networks darken1992learning ; bengio2013advances . Another significant advantage of using gradient descent is that explicitly decomposing the desired operator into a sequence of gates from a universal set is no longer necessary. Furthermore, we shall simplify the network in various ways to facilitate laboratory implementations. For example, we shall use only rotations along the x-axis as the gates involved in the DTQW. Even in this situation, our DTQW-based QNN can still find decent approximations of the desired quantum operations.

Our work is organized as follows. We first introduce our DTQW-based neural network in Sec. II and then prove its universality for quantum control in Sec. III. We further modify gradient descent and apply it to our DTQW-based QNN in Sec. IV. Finally, we simplify the QNN in Sec. V to facilitate the laboratory implementations.

II Quantum neural network based on discrete-time quantum walk

The quantum neural network based on quantum gates, the gate-model QNN, was first introduced due to its high experimental feasibility farhi2018classification . The gate-model QNNs utilize a series of unitary operations in a certain order to process the quantum state. The unitary operations involve adjustable parameters. By optimizing these parameters and encoding information to the input and output states, the gate-model QNNs are sufficient to solve various learning tasks. In this section, we introduce the DTQW on a cycle graph. We choose the graph to be a cycle because it is simple for laboratory implementation. We will show that such DTQW also involves a series of adjustable unitary gates and is sufficient to learn quantum operations. Thus the DTQW on a cycle graph can be treated as a special type of the gate-model QNN.

The DTQW on a cycle graph involves two Hilbert spaces, namely the coin space $\mathcal{H}_{c}$ and the position space $\mathcal{H}_{p}$, which are spanned by the orthonormal bases $\{|{0}\rangle_{c},|{1}\rangle_{c}\}$ and $\{|{x}\rangle_{p}\}_{x=0}^{n-1}$ respectively, where $n$ is the number of sites in the cycle. The walker state $|{\Psi}\rangle$ is then in the space $\mathcal{H}=\mathcal{H}_{c}\otimes\mathcal{H}_{p}$. A schematic representation of the DTQW on a cycle graph is shown in Fig. 1. The process of the DTQW is an iteration of applying coin operators $\hat{C}^{(t)}$ and shift operators

\hat{S}=\sum_{c=0}^{1}\sum_{x=0}^{n-1}\left|{c,x+\delta_{c}\ (\mathrm{mod}\ {n})}\middle\rangle\middle\langle{c,x}\right|

(1)

to the walker state, i.e.,

|{\Psi^{(t+1)}}\rangle=\hat{S}\hat{C}^{(t)}|{\Psi^{(t)}}\rangle,

(2)

where $t=0,1,2,\dots$ labels the iterations and the integer $\delta_{c}$ specifies how far the walker is shifted if its coin is in the state $|{c}\rangle$. For simplicity, we choose $\delta_{c}=c$ throughout this work. To make sure that the DTQW is flexible enough to implement various quantum operations, the coin operator $\hat{C}^{(t)}$ needs to be site-dependent, i.e.,

\hat{C}^{(t)}=\sum_{x=0}^{n-1}\hat{c}_{x}^{(t)}\otimes\left|{x}\middle\rangle\middle\langle{x}\right|,

(3)

where $\hat{c}_{x}^{(t)}\in\mathrm{U}(2)$ flips the coin of the walker during the $t$ -th iteration if the walker is at the site $x$ . Since the operators $\hat{c}_{x}^{(t)}$ are applied to the coin only if the walker is at certain sites $x$ , they are called single-site coin operators.
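As a minimal numerical sketch of Eqs. (1) and (3), the operators can be written as matrices. The basis ordering $|c,x\rangle\mapsto$ index $c\,n+x$ (coin $\otimes$ position) and the helper names `shift_operator` and `coin_operator` are assumptions made for illustration; they are not taken from the original implementation.

```python
import numpy as np

def shift_operator(n, delta=(0, 1)):
    """Eq. (1): S = sum_c |c><c| (x) P^{delta_c}, with P|x> = |x+1 mod n>."""
    P = np.roll(np.eye(n), 1, axis=0)          # P[x+1, x] = 1 (cyclic)
    S = np.zeros((2 * n, 2 * n), dtype=complex)
    for c in (0, 1):
        proj = np.zeros((2, 2))
        proj[c, c] = 1.0                       # |c><c| on the coin
        S += np.kron(proj, np.linalg.matrix_power(P, delta[c] % n))
    return S

def coin_operator(coins):
    """Eq. (3): C = sum_x c_x (x) |x><x| for a list of n 2x2 unitaries."""
    n = len(coins)
    C = np.zeros((2 * n, 2 * n), dtype=complex)
    for x, c in enumerate(coins):
        site = np.zeros((n, n))
        site[x, x] = 1.0                       # |x><x| on the position
        C += np.kron(c, site)
    return C

n = 3
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard coin on site 0
X = np.array([[0, 1], [1, 0]])                 # Pauli-x coin on site 1
step = shift_operator(n) @ coin_operator([H, X, np.eye(2)])  # one step S C of Eq. (2)
```

With $\delta_{c}=c$, the coin-0 component stays in place and the coin-1 component moves one site forward, so `step` is a $2n\times 2n$ unitary.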

Figure 1: A schematic representation of the DTQW on a cycle graph with 12 sites, i.e., $n=12$. The spin-1/2 particle represents the coin system of the DTQW. The dots on the cycle represent the positions that the walker can occupy. The walker performs a $T$-step walk on the sites. The coin system and the position system together form the total quantum system of the DTQW.

Since the operations during every iteration are unitary, the total effect of a $T$ -step DTQW

\hat{U}_{T,0}=\mathcal{T}\prod_{t=0}^{T-1}\hat{S}\hat{C}^{(t)}

(4)

is also unitary, where $\mathcal{T}\prod$ denotes the time-ordered product. We define $\hat{U}_{t_{1},t_{0}}=\mathcal{T}\prod_{t=t_{0}}^{t_{1}-1}\hat{S}\hat{C}^{(t)}$ so that it is the time evolution operator, i.e., $|{\Psi^{(t_{1})}}\rangle=\hat{U}_{t_{1},t_{0}}|{\Psi^{(t_{0})}}\rangle$ . One can notice that our version of the DTQW on a cycle graph is a straightforward generalization of the conventional Hadamard walk of which $\hat{c}_{x}^{(t)}=\hat{H}$ and $\delta_{c}=1-2c$ .
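The time ordering in Eq. (4) can be made concrete with a short sketch: later steps multiply from the left. The example below uses the conventional Hadamard walk ($\hat{c}_{x}^{(t)}=\hat{H}$, $\delta_{c}=1-2c$) as the special case mentioned above; the function name `walk_unitary` and the values of $n$ and $T$ are illustrative assumptions.

```python
import numpy as np

def walk_unitary(S, coin_layers):
    """Eq. (4): U = (S C^(T-1)) ... (S C^(1)) (S C^(0)); later steps act on the left."""
    U = np.eye(S.shape[0], dtype=complex)
    for C in coin_layers:                      # t = 0, 1, ..., T-1
        U = S @ C @ U
    return U

n, T = 5, 4
P = np.roll(np.eye(n), 1, axis=0)              # P|x> = |x+1 mod n>
proj0, proj1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
# Hadamard-walk shifts: delta_0 = +1, delta_1 = -1 (P.T is the inverse shift)
S = np.kron(proj0, P) + np.kron(proj1, P.T)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
C = np.kron(H, np.eye(n))                      # the same coin on every site
layers = [C] * T

U_T0 = walk_unitary(S, layers)                 # U_{T,0}
U_20 = walk_unitary(S, layers[:2])             # U_{2,0}
U_T2 = walk_unitary(S, layers[2:])             # U_{T,2}
```

The composition rule $\hat{U}_{T,0}=\hat{U}_{T,2}\hat{U}_{2,0}$ follows directly from the definition and can be checked by comparing `U_T0` with `U_T2 @ U_20`.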

Every step of the DTQW is unitary and is parameterized by $\hat{c}_{x}^{(t)}\in\mathrm{U}(2)$. These operators $\hat{c}_{x}^{(t)}$ can be treated as the adjustable gates in a gate-model quantum neural network. By adjusting these gates $\hat{c}_{x}^{(t)}$, we can use the DTQW to implement various quantum operations. Therefore the DTQW can be seen as a special type of gate-model quantum neural network. A schematic representation of the quantum neural network based on the DTQW on a cycle graph is shown in Fig. 2. The circuit depth of this network is the number of walking steps $T$ of the DTQW. In this work, we will denote the quantum neural network based on the DTQW on a cycle graph simply as the DTQW-QNN. We call the system of the quantum walker the underlying system of the DTQW-QNN.

III Universality and complexity of DTQW-based neural network

The implementation of quantum operations via the DTQW on a cycle graph is one of the primary motivations of our work. The universality of DTQW for quantum computation has been shown in general childs2009universal ; lovett2010universal . While the previous work mainly focuses on the mapping from the initial state to the final state in a certain small subspace of the total system, in this work we take the overall effect on the total system into account. In this section, we investigate the capacity and universality of the DTQW on a cycle graph in implementing quantum operations, and show that it is universal for unitary operations, which is the main theorem of this section.

By saying that the DTQW on a cycle graph is universal, we mean that any unitary operation on the overall Hilbert space $\mathcal{H}=\mathcal{H}_{c}\otimes\mathcal{H}_{p}$ can be realized by a DTQW. Hence it is not only universal for computation but also universal for controlling the whole quantum system. To be more formal and specific, the following theorem is provided.

Theorem 1.

For any unitary operator $\hat{V}\in\mathrm{U}(2n)$ , there exists a positive integer $T$ and a family of single-site coin operators $\{\hat{c}_{x}^{(t)}\}\subset\mathrm{U}(2)$ indexed by the set $\{(x,t):0\leq x<n\mbox{ and }0\leq t<T\}$ such that the total effect of the $T$ -step DTQW is $\hat{V}$ , i.e., $\hat{U}_{T,0}=\hat{V}$ , as long as $\delta_{0}\neq\delta_{1}$ and $\gcd(|\delta_{0}-\delta_{1}|,n)=1$ .

We prove the universality of the DTQW on a cycle by decomposing an arbitrary unitary operator $\hat{V}$ into a product of two-level unitary operators, $\hat{V}=\hat{u}_{m}\cdots\hat{u}_{2}\hat{u}_{1}$, and constructing a DTQW that implements each $\hat{u}_{i}$ for $i=1,2,\dots,m$. A detailed proof is provided in Appendix A.
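The condition on the shifts in Theorem 1 is easy to check for a given walk. The following sketch (the function name `satisfies_theorem_condition` is hypothetical) evaluates it for the choice $\delta_{c}=c$ used in this work and for the Hadamard-walk shifts $\delta_{c}=1-2c$. It only tests the sufficient condition as stated in the theorem and makes no claim about walks that fail it.

```python
from math import gcd

def satisfies_theorem_condition(delta0, delta1, n):
    """True if delta0 != delta1 and gcd(|delta0 - delta1|, n) == 1."""
    return delta0 != delta1 and gcd(abs(delta0 - delta1), n) == 1

# delta_c = c: |delta_0 - delta_1| = 1, so the condition holds for every n
ok_simple = all(satisfies_theorem_condition(0, 1, n) for n in range(2, 50))

# delta_c = 1 - 2c: |delta_0 - delta_1| = 2, so the condition holds only for odd n
ok_hadamard = [n for n in range(2, 12) if satisfies_theorem_condition(1, -1, n)]
```

For $\delta_{c}=c$ the condition is met on every cycle, which is one reason this choice is convenient.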

As a demonstration of Theorem 1, we first implement the controlled NOT (CNOT) gate with a DTQW on a cycle with two sites. We can find that according to Eq. (1), the shift operator

\hat{S}=\begin{bmatrix}1&0&0&0\\ 0&1&0&0\\ 0&0&0&1\\ 0&0&1&0\end{bmatrix}

(5)

is just the CNOT gate we need. Hence a simple one-step DTQW is equivalent to the CNOT gate if we choose all the single-site coin operators $\hat{c}_{x}^{(t)}$ to be the identity operator.
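This can be confirmed numerically with a few lines, assuming the basis ordering $|c,x\rangle\mapsto$ index $2c+x$ so that the coin acts as the control qubit:

```python
import numpy as np

n = 2
P = np.roll(np.eye(n), 1, axis=0)                    # P|x> = |x+1 mod 2>, a bit flip
proj0, proj1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
S = np.kron(proj0, np.eye(n)) + np.kron(proj1, P)    # delta_0 = 0, delta_1 = 1
C = np.kron(np.eye(2), np.eye(n))                    # all single-site coins are the identity
U = S @ C                                            # one-step walk

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
```

Here `U` coincides with the matrix in Eq. (5).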

Next, let us consider a more complicated two-level unitary operator, a unitary $\hat{U}$ controlled by two qubits

\hat{V}=\begin{bmatrix}1&0&0&0&0&0&0&0\\ 0&1&0&0&0&0&0&0\\ 0&0&1&0&0&0&0&0\\ 0&0&0&1&0&0&0&0\\ 0&0&0&0&1&0&0&0\\ 0&0&0&0&0&1&0&0\\ 0&0&0&0&0&0&a&b\\ 0&0&0&0&0&0&c&d\end{bmatrix},

(6)

where $a,b,c,d$ are the four matrix elements of $\hat{U}$. Comparing this operator $\hat{V}$ with the general form of two-level operators in Eq. (17), we can find that $c_{0}=c_{1}=1$, $x_{0}=3$ and $x_{1}=4$. By substituting $c_{0},c_{1},x_{0},x_{1}$ in Eqs. (26) and (27) with their respective values, we get

\hat{c}_{x}^{(t)}=\begin{cases}\hat{\sigma}_{x}&\mbox{if $t=0$ or $4$, and $x=4$}\\ \begin{bmatrix}d&c\\ b&a\end{bmatrix}&\mbox{if $t=1$ and $x=4$}\\ \hat{I}_{c}&\mbox{otherwise}\end{cases}

(7)

where $\hat{\sigma}_{x}$ is the Pauli $x$ matrix. By choosing the single-site coin operators $\hat{c}_{x}^{(t)}$ according to Eq. (7), we can realize the unitary operator $\hat{V}$ with an eight-step DTQW on a cycle with four sites.
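The construction of Eq. (7) can be checked numerically. The sketch below assumes that the sites are stored with 0-based indices $0,\dots,3$ in the basis ordering $|c,x\rangle\mapsto 4c+x$, so that the site written as $x=4$ above is the last site, index 3 (variable `X`). This indexing is an assumption of the sketch. The block $\hat{U}$ is a random $2\times 2$ unitary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, X = 4, 8, 3                      # X: 0-based index of the site written as x = 4
# Random 2x2 unitary [[a, b], [c, d]] obtained from a QR decomposition
Q, _ = np.linalg.qr(rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)))
(a, b), (c, d) = Q

P = np.roll(np.eye(n), 1, axis=0)      # P|x> = |x+1 mod n>
S = np.kron(np.diag([1.0, 0.0]), np.eye(n)) + np.kron(np.diag([0.0, 1.0]), P)
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
W = np.array([[d, c], [b, a]])         # the coin used at t = 1 in Eq. (7)

def coin_layer(t):
    """Eq. (7): a non-identity coin only on site X at t = 0, 1, 4."""
    coin = {0: sigma_x, 1: W, 4: sigma_x}.get(t, np.eye(2))
    site = np.zeros((n, n))
    site[X, X] = 1.0
    return np.kron(coin, site) + np.kron(np.eye(2), np.eye(n) - site)

U = np.eye(2 * n, dtype=complex)
for t in range(T):                     # eight steps, each S C^(t)
    U = S @ coin_layer(t) @ U

V = np.eye(2 * n, dtype=complex)
V[6:, 6:] = Q                          # target of Eq. (6)
```

Under these assumptions `U` agrees with `V` up to floating-point error: the two $\hat{\sigma}_{x}$ coins park the relevant amplitude in the coin-0 component, which does not move, while all other basis states return to their starting site after eight shifts.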

For the most general two-level unitary operators $\hat{V}$, the calculation is essentially the same as in the above example, i.e., find the values of $c_{0},c_{1},x_{0},x_{1}$ by comparing $\hat{V}$ with Eq. (17) and then substitute them in Eqs. (18) and (19) if $c_{0}=c_{1}$ or Eqs. (26) and (27) if otherwise. Unitary operators $\hat{V}$ which are not two-level are decomposed into a product of two-level unitary operators, $\hat{V}=\hat{u}_{m}\cdots\hat{u}_{2}\hat{u}_{1}$ Nielsen2007Quantum . By concatenating the DTQWs for the $\hat{u}_{i}$ one after another, we can realize $\hat{V}$ with the final combined DTQW. As an example, the calculation to implement the Fourier transformation is provided in Appendix B.

Implementing a unitary operation with the construction in the proof of Theorem 1 as above involves numerous steps of the walk. To reduce the number of steps, we provide in Appendix C a further optimized scheme for implementations. With this scheme, no more than $2n^{2}-2n+1$ steps of the walk are needed for the DTQW-QNN to be universal, where $n$ is the number of sites in the cycle.

IV Finding approximations via gradient descent

It is sometimes cumbersome to find exact realizations of desired quantum operations analytically. However, fair approximations to desired operations are often acceptable for practical purposes. In this section, we introduce a machine-learning-style algorithm that finds such approximations by applying gradient descent to the DTQW-QNN. With this algorithm, the required depth can be further reduced when approximations are allowed.

In order to apply gradient descent to the DTQW-QNN, we have to do the following three things in advance.

1.

Parameterize the single-site coin operators with a four-dimensional real vector $\vec{\alpha}^{(x,t)}$:

\hat{c}_{x}^{(t)}=e^{i\alpha_{3}^{(x,t)}\hat{\sigma}_{3}}e^{i\alpha_{2}^{(x,t)}\hat{\sigma}_{2}}e^{i\alpha_{1}^{(x,t)}\hat{\sigma}_{1}}e^{i\alpha_{0}^{(x,t)}\hat{\sigma}_{0}},

(8)

in which $\hat{\sigma}_{j}$ is the $j$ th Pauli matrix, $\hat{\sigma}_{0}=\hat{I}$ .

2.

Introduce a state-wise loss function $L_{|{\Psi}\rangle}$ :

L_{|{\Psi}\rangle}=\frac{1}{2}\left\lVert|{\Psi^{(T)}}\rangle-|{\Phi^{(T)}}\rangle\right\rVert^{2},

(9)

where $|{\Psi^{(T)}}\rangle=\hat{U}_{T,0}|{\Psi}\rangle$ and $|{\Phi^{(T)}}\rangle=\hat{V}|{\Psi}\rangle$ are the final state and the desired final state respectively.

3.

Derive the partial derivative:

\frac{\partial{L_{|{\Psi}\rangle}}}{\partial{\alpha^{(x,t)}_{j}}}=\operatorname{Im}\left(\langle{\Phi^{(t)}}|{\hat{\Sigma}_{j}^{(x,t)}}|{\Psi^{(t)}}\rangle\right),

(10)

where $|{\Psi^{(t)}}\rangle=\hat{U}_{t,0}|{\Psi}\rangle$ and $|{\Phi^{(t)}}\rangle=\hat{U}_{T,t}^{\dagger}|{\Phi^{(T)}}\rangle$ are the forward-propagation and back-propagation states respectively, $\hat{\Sigma}_{j}^{(x,t)}=(\hat{n}_{j}^{(x,t)}\cdot{\vec{\sigma}})\otimes\left|{x}\middle\rangle\middle\langle{x}\right|+\sum_{\xi\neq x}\hat{I}_{c}\otimes\left|{\xi}\middle\rangle\middle\langle{\xi}\right|$, ${\vec{\sigma}}=\sum_{j=0}^{3}\hat{\sigma}_{j}\vec{e}_{j}$, and $\hat{n}_{0}^{(x,t)}$, $\hat{n}_{1}^{(x,t)}$, $\hat{n}_{2}^{(x,t)}$, $\hat{n}_{3}^{(x,t)}$ equal

\begin{bmatrix}1\\ 0\\ 0\\ 0\end{bmatrix},\begin{bmatrix}0\\ 1\\ 0\\ 0\end{bmatrix},\begin{bmatrix}0\\ 0\\ \cos{2\alpha_{1}^{(x,t)}}\\ \sin{2\alpha_{1}^{(x,t)}}\end{bmatrix},\begin{bmatrix}0\\ \sin{2\alpha_{2}^{(x,t)}}\\ -\cos{2\alpha_{2}^{(x,t)}}\sin{2\alpha_{1}^{(x,t)}}\\ \cos{2\alpha_{2}^{(x,t)}}\cos{2\alpha_{1}^{(x,t)}}\end{bmatrix}

respectively.
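The direction vectors $\hat{n}_{j}^{(x,t)}$ can be cross-checked against the parameterization of Eq. (8): differentiating the coin operator with respect to $\alpha_{j}$ gives $\partial\hat{c}/\partial\alpha_{j}=i\,\hat{c}\,(\hat{n}_{j}\cdot\vec{\sigma})$, because each factor to the right of $e^{i\alpha_{j}\hat{\sigma}_{j}}$ has to be commuted past $\hat{\sigma}_{j}$. The sketch below compares this expression with a central finite difference; the helper names `expi`, `coin` and `directions` and the test angles are illustrative assumptions.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULI = [I2,
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def expi(alpha, sigma):
    """exp(i*alpha*sigma) for any sigma with sigma @ sigma = I."""
    return np.cos(alpha) * I2 + 1j * np.sin(alpha) * sigma

def coin(alpha):
    """Eq. (8): factors ordered sigma_3, sigma_2, sigma_1, sigma_0 from left to right."""
    return (expi(alpha[3], PAULI[3]) @ expi(alpha[2], PAULI[2])
            @ expi(alpha[1], PAULI[1]) @ expi(alpha[0], PAULI[0]))

def directions(alpha):
    """Rows are n_0, ..., n_3 as listed after Eq. (10)."""
    c1, s1 = np.cos(2 * alpha[1]), np.sin(2 * alpha[1])
    c2, s2 = np.cos(2 * alpha[2]), np.sin(2 * alpha[2])
    return np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, c1, s1],
                     [0, s2, -c2 * s1, c2 * c1]])

alpha = np.array([0.3, -0.7, 1.1, 0.4])     # arbitrary test angles
eps = 1e-6
max_err = 0.0
for j in range(4):
    d = np.zeros(4)
    d[j] = eps
    numeric = (coin(alpha + d) - coin(alpha - d)) / (2 * eps)    # central difference
    n_dot_sigma = sum(nk * s for nk, s in zip(directions(alpha)[j], PAULI))
    analytic = 1j * coin(alpha) @ n_dot_sigma                    # i c (n_j . sigma)
    max_err = max(max_err, np.abs(numeric - analytic).max())
```

The two expressions agree to within the finite-difference error, which supports the listed form of the four vectors.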

Gradient descent iteratively moves the parameters in the opposite direction of the gradient, i.e.,

\mbox{new }\alpha_{j}^{(x,t)}\leftarrow\mbox{old }\alpha_{j}^{(x,t)}-\eta\frac{\partial{L_{|{\Psi}\rangle}}}{\partial{\alpha_{j}^{(x,t)}}},

(11)

where $\eta$ is a positive real number called learning rate. Hence, the loss gradually drops during the iteration and the approximation to $\hat{V}$ by $\hat{U}_{T,0}$ becomes better and better.

The algorithm that finds the parameters $\left\{\vec{\alpha}^{(x,t)}:0\leq x<n\mbox{ and }0\leq t<T\right\}$ of the DTQW-QNN to approximate a desired unitary operator $\hat{V}$ proceeds as follows.

1.

Set the depth $T$ to an appropriate positive integer and the learning rate $\eta$ to an appropriate positive real number.
2.

Randomly initialize all the parameters $\alpha^{(x,t)}_{j}$ .
3.

Randomly sample a state $|{\Psi}\rangle$ from the total Hilbert space $\mathcal{H}$ .
4.

Calculate the partial derivatives $\frac{\partial{L_{|{\Psi}\rangle}}}{\partial{\alpha^{(x,t)}_{j}}}$ for all $t$ , $x$ , $j$ according to Eq. (10).
5.

Update all the parameters according to Eq. (11).
6.

Repeat Steps 3 to 5 until an acceptable approximation is reached.
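The six steps above can be sketched for a two-site cycle as follows. This is a minimal illustration under stated assumptions, not the original implementation: the values of $T$, $\eta$ and the iteration count are arbitrary choices, the target is a random unitary from a QR decomposition, and the gradient is evaluated as $\operatorname{Im}\langle\Phi^{(t)}|(\hat{n}_{j}\cdot\vec{\sigma})\otimes|x\rangle\langle x|\,|\Psi^{(t)}\rangle$, i.e. the site-$x$ term obtained by differentiating Eq. (3) with Eq. (8). Treat this reading of Eq. (10) as an assumption of the sketch.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULI = [I2, np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def expi(a, s):
    return np.cos(a) * I2 + 1j * np.sin(a) * s

def coin(al):                                   # Eq. (8)
    return expi(al[3], PAULI[3]) @ expi(al[2], PAULI[2]) @ expi(al[1], PAULI[1]) @ expi(al[0], PAULI[0])

def directions(al):                             # n_0..n_3 listed after Eq. (10)
    c1, s1, c2, s2 = np.cos(2*al[1]), np.sin(2*al[1]), np.cos(2*al[2]), np.sin(2*al[2])
    return np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, c1, s1], [0, s2, -c2*s1, c2*c1]])

def site_op(op, x, n):                          # op (x) |x><x|
    proj = np.zeros((n, n))
    proj[x, x] = 1.0
    return np.kron(op, proj)

def steps(alpha, S):                            # list of S C^(t), t = 0..T-1
    T, n = alpha.shape[:2]
    return [S @ sum(site_op(coin(alpha[t, x]), x, n) for x in range(n)) for t in range(T)]

def walk(alpha, S):                             # Eq. (4)
    U = np.eye(S.shape[0], dtype=complex)
    for W in steps(alpha, S):
        U = W @ U
    return U

def gradient(alpha, S, psi, V):
    T, n = alpha.shape[:2]
    Ws = steps(alpha, S)
    fwd = [psi]                                 # fwd[t] = |Psi^(t)> (forward propagation)
    for W in Ws:
        fwd.append(W @ fwd[-1])
    bwd = [None] * T + [V @ psi]                # bwd[t] = |Phi^(t)> (back-propagation)
    for t in range(T - 1, -1, -1):
        bwd[t] = Ws[t].conj().T @ bwd[t + 1]
    grad = np.zeros_like(alpha)
    for t in range(T):
        for x in range(n):
            N = directions(alpha[t, x])
            for j in range(4):
                G = site_op(sum(N[j, k] * PAULI[k] for k in range(4)), x, n)
                grad[t, x, j] = np.imag(np.vdot(bwd[t], G @ fwd[t]))
    return grad

def distance(U, V):                             # Eq. (12)
    return np.sqrt(max(0.0, 1 - abs(np.trace(U @ V.conj().T) / U.shape[0]) ** 2))

rng = np.random.default_rng(1)
n, T, eta = 2, 8, 0.03                          # step 1 (assumed hyperparameters)
S = np.kron(np.diag([1.0, 0.0]), np.eye(n)) + np.kron(np.diag([0.0, 1.0]), np.roll(np.eye(n), 1, axis=0))
V, _ = np.linalg.qr(rng.normal(size=(2*n, 2*n)) + 1j * rng.normal(size=(2*n, 2*n)))  # random target
alpha = rng.uniform(-np.pi, np.pi, size=(T, n, 4))       # step 2
d_init = distance(walk(alpha, S), V)
for _ in range(1000):                                    # steps 3 to 5, repeated (step 6)
    psi = rng.normal(size=2*n) + 1j * rng.normal(size=2*n)
    psi /= np.linalg.norm(psi)                           # random normalized state
    alpha -= eta * gradient(alpha, S, psi, V)            # Eq. (11)
d_final = distance(walk(alpha, S), V)
```

Comparing `d_init` with `d_final` shows how far the walk has moved towards the target. The gradient routine can be validated independently by comparing one entry with a finite difference of the loss in Eq. (9).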

One can notice that our choice of the loss function leads to a form of the gradients, Eq. (10), that is convenient for numerical calculation. The states $\left|{\Psi^{(t)}}\right\rangle$ and $\left|{\Phi^{(t)}}\right\rangle$ can be calculated efficiently by a forward-propagation and a back-propagation. Moreover, the gradients can be obtained by implementing a circuit with the help of an ancillary qubit, as shown in Fig. 3. At the end of the circuit, the average value $\langle\hat{\sigma}_{3}\rangle$ of the ancillary qubit is measured. The result $\langle\hat{\sigma}_{3}\rangle$ can be used to update the parameters of the DTQW-QNN since $\langle\hat{\sigma}_{3}\rangle$ always coincides with the partial derivative $\partial{L_{|{\Psi}\rangle}}/\partial{\alpha^{(x,t)}_{j}}$ in Eq. (10). This might enable us to implement simultaneous tomography and cloning of an unknown unitary operation.

Besides, the position space $\mathcal{H}_{p}$ is commonly much larger than the coin space $\mathcal{H}_{c}$ . Theorem 1 thus indicates that one can indirectly control a large system by controlling a small two-level coin system via DTQW on a cycle graph. For example, unitary operations and general two-outcome measurements described by positive-operator-valued measures (POVMs) can be applied to the position space in this way straightforwardly according to Theorem 1. If we are only interested in the unitary operators that act on the position space $\mathcal{H}_{p}$ , we only need one arbitrary site to be allowed to assign nonidentity coin operators. The detailed content is provided in Appendix D.
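To illustrate how a unitary on $\mathcal{H}_{c}\otimes\mathcal{H}_{p}$ followed by a measurement can induce a two-outcome POVM on the position space, consider the following sketch. It assumes that the coin starts in $|0\rangle_{c}$ and that the coin is the system measured at the end; the construction in Appendix D may differ in these details. A random unitary stands in for a trained $\hat{U}_{T,0}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
# Stand-in for a walk unitary U_{T,0} on H_c (x) H_p (random, for illustration only)
U, _ = np.linalg.qr(rng.normal(size=(2*n, 2*n)) + 1j * rng.normal(size=(2*n, 2*n)))

ket0 = np.array([[1.0], [0.0]])                 # assumed initial coin state |0>
embed = np.kron(ket0, np.eye(n))                # |psi>_p -> |0>_c (x) |psi>_p
kraus = []
for c in (0, 1):
    bra_c = np.zeros((1, 2))
    bra_c[0, c] = 1.0
    # Kraus operator M_c = (<c| (x) I) U (|0> (x) I), acting on the position space
    kraus.append(np.kron(bra_c, np.eye(n)) @ U @ embed)
povm = [M.conj().T @ M for M in kraus]          # E_c = M_c^dagger M_c

psi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi /= np.linalg.norm(psi)
probs = [np.real(np.vdot(psi, E @ psi)) for E in povm]   # outcome probabilities
```

The two elements are positive semidefinite and sum to the identity on $\mathcal{H}_{p}$, so they form a valid two-outcome POVM for any choice of the unitary.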

Numerical results

We first test our algorithm by training a DTQW-QNN to learn the $\mathrm{SWAP}$ gate. Because all matrix elements of $\mathrm{SWAP}$ are either $0$ or $1$, it is visually clear whether a unitary operator is close to $\mathrm{SWAP}$ once the operator is visualized. The change of the DTQW unitary operator $\hat{U}_{T,0}$ during the training is visualized in Fig. 4. As the DTQW-QNN is trained, $\hat{U}_{T,0}$ becomes closer and closer to the desired gate $\mathrm{SWAP}$, and after the training is finished the DTQW-QNN realizes the $\mathrm{SWAP}$ gate.

To measure how well the DTQW $\hat{U}_{T,0}$ approximates the desired unitary $\hat{V}$ , we introduce the distance

d(\hat{U}_{T,0},\hat{V})=\sqrt{1-\left|\mathrm{tr}(\hat{U}_{T,0}\hat{V}^{\dagger})/{2n}\right|^{2}}

(12)

between the operators $\hat{U}_{T,0}$ and $\hat{V}$ . The smaller this distance is, the better the DTQW approximates the desired operator.
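A direct transcription of Eq. (12) is given below as a small sketch (the function name `distance` is an illustrative choice). Because only the modulus of the normalized trace enters, the distance is insensitive to a global phase of either operator.

```python
import numpy as np

def distance(U, V):
    """Eq. (12); U and V are 2n x 2n unitaries."""
    dim = U.shape[0]                                   # dim = 2n
    overlap = abs(np.trace(U @ V.conj().T) / dim)
    return np.sqrt(max(0.0, 1.0 - overlap ** 2))       # clip tiny negative round-off

CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
d_same = distance(CNOT, CNOT)                          # identical operators
d_phase = distance(np.exp(0.7j) * CNOT, CNOT)          # differ by a global phase only
d_id = distance(np.eye(4, dtype=complex), CNOT)        # tr(CNOT) = 2, so overlap = 1/2
```

For the last pair the normalized trace is $1/2$, which gives $d=\sqrt{3}/2$.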

In order to show that the DTQW-QNN can actually approximate arbitrary unitary operators, we sample 200 desired operators $\hat{V}$ from $\mathrm{U}(4)$ according to the Haar measure and train 200 DTQW-QNNs in parallel to approximate these operators $\hat{V}$ respectively. The evolution of the distance during the training is plotted in Fig. 5. After the training, the final distance between the DTQW-QNN and the desired operator is smaller than $10^{-7}$ even for the worst case of the 200 samples. For DTQW-QNNs with different numbers $n$ of sites on the cycle, Fig. 6 shows that the average distance is also always smaller than $10^{-7}$. From Fig. 6, we can also notice that with more sites on the cycle, the training of the DTQW-QNNs is faster, i.e., fewer updates are needed.
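One common way to draw Haar-distributed targets is the QR decomposition of a complex Gaussian matrix with a phase correction on the diagonal of $R$. The sketch below uses this method as an assumption; the text does not state which sampler was used for the reported results.

```python
import numpy as np

def haar_unitary(dim, rng):
    """Haar-distributed unitary from the QR decomposition of a complex Gaussian matrix."""
    z = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases            # multiply column k of q by the phase of r[k, k]

rng = np.random.default_rng(0)
targets = [haar_unitary(4, rng) for _ in range(200)]   # 200 targets in U(4)
```

Without the phase correction the output of a plain QR decomposition is still unitary but is in general not Haar-distributed.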

Training the DTQW-QNN exhibits phenomena similar to those seen when training classical machine learning models. For example, the implicit acceleration by overparameterization arora2018on also emerges in the training of the DTQW-QNN. The implicit acceleration by overparameterization is a phenomenon where the neural network training becomes faster if more layers are added to the network. For DTQW-QNNs, more layers mean more steps of the walk, i.e., a larger depth $T$. As shown in Fig. 7, when the depth $T$ is larger, the distance drops faster during the training.

To show that the algorithm also works for larger quantum systems, we apply it to a DTQW on a cycle graph with $20$ sites as a demonstration to realize the quantum Fourier transformation. As shown in Fig. 8, this DTQW-QNN with a $40$ -dimensional underlying quantum system can still be trained to implement the operator we want. For the meta parameters used to generate the numerical results throughout this work, see Appendix E.
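For reference, a target operator of this kind can be generated as the discrete Fourier matrix on the full $2n$-dimensional space. The sign convention $\omega=e^{2\pi i/N}$ and the choice to transform the whole space $\mathcal{H}_{c}\otimes\mathcal{H}_{p}$ are assumptions of this sketch.

```python
import numpy as np

def fourier_matrix(dim):
    """F[j, k] = exp(2*pi*i*j*k/dim) / sqrt(dim)."""
    idx = np.arange(dim)
    return np.exp(2j * np.pi * np.outer(idx, idx) / dim) / np.sqrt(dim)

n = 20
V = fourier_matrix(2 * n)       # 40 x 40 target on H_c (x) H_p
```

The resulting matrix is unitary and satisfies $\hat{V}^{4}=\hat{I}$, which gives a quick sanity check before it is used as a training target.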

V Making the DTQW-based neural network more friendly for implementations

In all previous parts of this work, we have assumed that the single-site coin operators $\hat{c}_{x}^{(t)}$ can take arbitrary values in $\mathrm{U}(2)$. This means that a single-site coin operator can have an arbitrary phase and an arbitrary rotation axis. However, it would be much easier to implement rotations along a fixed axis with fixed phases in laboratories. Hence, in this section, we simplify the DTQW so that it becomes easier to implement. In addition, noise is always present when DTQW-QNNs are implemented in laboratories. We therefore also test the network in the situation where noise is present in the single-site coin operators $\hat{c}_{x}^{(t)}$. Throughout this section, the numerical demonstrations are all based on DTQWs on a cycle with two sites.

V.1 Random fixed phases

Firstly, it can be observed that the phases $e^{i\alpha_{0}^{(x,t)}\hat{\sigma}_{0}}$ in Eq. (8) of single-site coin operators $\hat{c}_{x}^{(t)}$ are relative phases when $\hat{c}_{x}^{(t)}\otimes\left|{x}\middle\rangle\middle\langle{x}\right|$ are summed in Eq. (3). They are not merely a contribution to the global phase of the DTQW $\hat{U}_{T,0}$ . Hence, any change in one of the phases may cause a nontrivial change in $\hat{U}_{T,0}$ . This seemingly requires an annoying tuning of all the phase factors of single-site coin operators at different times $t$ and at different sites $x$ when the DTQW is implemented.

Fortunately, we find that these phase factors $e^{i\alpha_{0}^{(x,t)}\hat{\sigma}_{0}}$ actually need no adjustment. As shown in Fig. 9a, the DTQW-QNN can still approximate an arbitrary operator $\hat{V}$ via gradient descent even if all the phase factors assigned to different sites are random and fixed during the training, i.e.,

\hat{c}_{x}^{(t)}=e^{i\bm{a}^{(x)}}e^{i\alpha_{3}^{(x,t)}\hat{\sigma}_{3}}e^{i\alpha_{2}^{(x,t)}\hat{\sigma}_{2}}e^{i\alpha_{1}^{(x,t)}\hat{\sigma}_{1}},

(13)

where phases $\bm{a}^{(x)}$ are independent real random variables. This releases us from the cumbersome tuning of the phases of single-site coin operators.
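That the site phases are relative rather than global can be seen in the smallest example, a one-step walk on two sites with coins $e^{i\bm{a}^{(x)}}\hat{I}_{c}$. The sketch below (with an arbitrarily chosen phase $a=\pi/2$ and an illustrative helper `one_step`) compares a phase applied on both sites with the same phase applied on site 0 only, using the distance of Eq. (12).

```python
import numpy as np

def distance(U, V):                                   # Eq. (12)
    return np.sqrt(max(0.0, 1 - abs(np.trace(U @ V.conj().T) / U.shape[0]) ** 2))

n = 2
S = (np.kron(np.diag([1.0, 0.0]), np.eye(n))
     + np.kron(np.diag([0.0, 1.0]), np.roll(np.eye(n), 1, axis=0)))

def one_step(phases):
    """One step S C with coins c_x = exp(i a^(x)) * identity."""
    C = np.kron(np.eye(2), np.diag(np.exp(1j * np.asarray(phases))))
    return S @ C

a = np.pi / 2
U_ref = one_step([0.0, 0.0])
d_global = distance(one_step([a, a]), U_ref)    # same phase on every site: global phase
d_local = distance(one_step([a, 0.0]), U_ref)   # phase on site 0 only: relative phase
```

A common phase leaves the distance at zero, whereas a phase on a single site gives $d=|\sin(a/2)|$, so it changes $\hat{U}_{T,0}$ non-trivially.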

V.2 Simple rotations along x-axis only

The formalism of the single-site coin operators $\hat{c}_{x}^{(t)}$ in Eq. (8) involves three consecutive rotations, namely, $e^{i\alpha_{j}^{(x,t)}\hat{\sigma}_{j}},j=1,2,3$ , each along a different axis. To make it easier for laboratory implementations, we simplify the single-site coin operators to be simple rotations only along the $x$ -axis, i.e.,

\hat{c}_{x}^{(t)}=e^{i\bm{a}^{(x)}}e^{i\alpha^{(x,t)}\hat{\sigma}_{1}},

(14)

where $\alpha^{(x,t)}$ now is merely a real parameter. In this situation the DTQW-QNN can still realize arbitrary operators via gradient descent, as indicated by Fig. 9b.

By comparing Figs. 9a and 9b, we can notice that the DTQW-QNN in this section needs much more time to train compared with the DTQW-QNN in Sec. V.1. To reveal the cause, we have trained 200 DTQW-QNNs to approximate 200 randomly sampled operators $\hat{V}$ , respectively. We choose a threshold to be $10^{-1}$ and mark the DTQW-QNNs of which the distance after 200 iterations of training is still larger than the threshold. We find that the phase differences $\bm{a}^{(0)}-\bm{a}^{(1)}$ of these marked DTQW-QNNs are all near $0$ or $\pm\pi$ as shown in Fig. 10. Hence, we conclude that these specific differences in phases cause the DTQW-QNN to be slow to train. This result also corroborates that the phases of single-site coin operators contribute to the DTQW total effect $\hat{U}_{T,0}$ non-trivially as we have stated in Sec. V.1. Now knowing the cause, we can easily avoid these specific phase differences when implementing DTQW-QNNs.

V.3 Noise on rotation axes

When the DTQW-QNN is implemented in laboratories, it is impossible to have all the rotation axes of $\hat{c}_{x}^{(t)}$ perfectly aligned with the $x$ direction. There is always noise on the rotation axis, i.e.,

\hat{c}_{x}^{(t)}=e^{i\bm{a}^{(x)}}e^{i\alpha^{(x,t)}(\hat{\bm{n}}^{(x,t)}\cdot\vec{\sigma})},

(15)

where

\hat{\bm{n}}^{(x,t)}=\begin{bmatrix}0\\ \cos{\bm{\theta}^{(x,t)}}\\ \sin{\bm{\theta}^{(x,t)}}\cos{\bm{\varphi}^{(x,t)}}\\ \sin{\bm{\theta}^{(x,t)}}\sin{\bm{\varphi}^{(x,t)}}\end{bmatrix},

(16)

where $\bm{\theta}^{(x,t)}$ and $\bm{\varphi}^{(x,t)}$ are independent real random variables. In this situation, approximations to desired operators still can be found via gradient descent, as shown in Fig. 11.
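A single noisy coin of Eqs. (15) and (16) can be generated as in the following sketch. The noise model (a small Gaussian polar deviation $\bm{\theta}$ and a uniformly random azimuth $\bm{\varphi}$) and the numerical values are assumptions chosen for illustration; the distributions used for the reported simulations are not specified here.

```python
import numpy as np

PAULI = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def noisy_coin(alpha, phase, theta, phi):
    """Eq. (15) with the axis of Eq. (16); theta = 0 gives the x-rotation of Eq. (14)."""
    axis = np.array([np.cos(theta),
                     np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi)])
    n_sigma = sum(a * s for a, s in zip(axis, PAULI))
    # exp(i*alpha*n.sigma) = cos(alpha) I + i sin(alpha) n.sigma for a unit axis
    return np.exp(1j * phase) * (np.cos(alpha) * np.eye(2) + 1j * np.sin(alpha) * n_sigma)

rng = np.random.default_rng(3)
theta = abs(rng.normal(scale=0.05))     # assumed: small deviation from the x axis
phi = rng.uniform(0, 2 * np.pi)         # assumed: uniformly random azimuth
c_noisy = noisy_coin(0.8, 0.2, theta, phi)
c_ideal = noisy_coin(0.8, 0.2, 0.0, 0.0)
```

The noisy coin remains unitary for any axis, so the noise changes which operators a given set of angles $\alpha^{(x,t)}$ produces but not the unitarity of the walk.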

VI Conclusion

In conclusion, we have proposed a quantum neural network based on a simple DTQW on a cycle graph, and used the network to implement arbitrary quantum computation tasks, i.e., unitary operations on an arbitrary $N$ -dimensional Hilbert space.

In order to implement an arbitrary unitary operation via a circuit model, one needs to decompose the unitary into a sequence of smaller unitary operators. However, via our DTQW-QNN, we only need to update the parameters by a learning algorithm. In other words, our model is adaptive to new tasks. With a new computational task given, our network can simply evolve according to the learning algorithm, and there is no need to decompose the desired operation into a sequence of smaller gates.

Regarding the universality of our model, we presented a specific construction of realizing arbitrary two-level unitary operations on the computational basis, and proved that the DTQW-QNN is universal for all unitary operations on the overall Hilbert space of the involved quantum systems. The DTQW-QNN is not only universal for quantum computation but also universal for controlling the whole quantum system. We also provided an optimization so that the circuit depth of the DTQW-QNN does not need to exceed $2n^{2}-2n+1$ to realize an arbitrary unitary operator on a $2n$ -dimensional Hilbert space. However, this is only a theoretical limit of the network size in the worst case for the purpose of analytical proof. The appropriate number of nodes for each task may vary, and it is an open question to find this number for a given task.

Our network evolves according to a learning algorithm based on gradient descent, with the loss function carefully chosen so that the parameter updates can be efficiently calculated in a back-propagation fashion and can be, in principle, directly read out from a measurement. The algorithm performs well in updating the parameters of the neural network. We have shown good approximations of unitary operations on a Hilbert space up to $40$ dimensions, as well as arbitrary two-outcome POVMs. Finally, we have also simplified the DTQW-QNN in various aspects. For example, the rotation gates involved in the DTQW are all limited to be along the $x$ -axis. Such simplifications make the DTQW-QNN more friendly for laboratory implementations while its capability of implementing desired operations is maintained.

We have shown the capability of the DTQW-QNNs in both analytical and numerical ways. Further studies might reveal their total capacity in completing various quantum computation tasks as well as solving machine learning problems, and further experimental implementations would make them more practically useful and closer to real-life applications.

Acknowledgements.

This work is supported by the Innovation Program for Quantum Science and Technology (Grant No. 2021ZD0301701) and the National Natural Science Foundation of China (Grant No. 12175104). Part of the numerical simulations in this work involves the use of QuTiP johansson2013qutip .

References

(1) Y. Aharonov, L. Davidovich, and N. Zagury, Phys. Rev. A 48, 1687 (1993).
(2) R. Portugal, Quantum Walks and Search Algorithms (Springer, New York, 2013).
(3) J. Kempe, Contemp. Phys. 44, 307 (2003).
(4) K. Manouchehri and J. Wang, Physical Implementation of Quantum Walks (Springer Berlin, Heidelberg, 2014).
(5) E. Farhi and H. Neven, arXiv:1802.06002 (2018).
(6) J. Zhao, Y.-H. Zhang, C.-P. Shao, Y.-C. Wu, G.-C. Guo, and G.-P. Guo, Phys. Rev. A 100, 012334 (2019).
(7) K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, Phys. Rev. A 98, 032309 (2018).
(8) P.-L. Dallaire-Demers and N. Killoran, Phys. Rev. A 98, 012324 (2018).
(9) M. H. Amin, E. Andriyash, J. Rolfe, B. Kulchytskyy, and R. Melko, Phys. Rev. X 8, 021050 (2018).
(10) C. Zoufal, A. Lucchi, and S. Woerner, npj Quantum Information 5, 103 (2019).
(11) V. Dunjko and H. J. Briegel, Rep. Prog. Phys. 81, 074001 (2018).
(12) M. Schuld, I. Sinayskiy, and F. Petruccione, Quantum Information Processing 13, 2567 (2014).
(13) S. Aaronson, Nature Physics 11, 291 (2015).
(14) L. Gyongyosi and S. Imre, Sci. Rep. 9 (2019).
(15) A. M. Childs, Phys. Rev. Lett. 102, 180501 (2009).
(16) N. B. Lovett, S. Cooper, M. Everitt, M. Trevers, and V. Kendon, Phys. Rev. A 81, 042330 (2010).
(17) P. Kurzynski and A. Wojcik, Phys. Rev. Lett. 110, 200404 (2013).
(18) Z. Bian, J. Li, H. Qin, X. Zhan, R. Zhang, B. C. Sanders, and P. Xue, Phys. Rev. Lett. 114, 203602 (2015).
(19) Y.-Y. Zhao, N.-K. Yu, P. Kurzynski, G.-Y. Xiang, C.-F. Li, and G.-C. Guo, Phys. Rev. A 91, 042101 (2015).
(20) C. Darken, J. Chang, and J. Moody, in Neural Networks for Signal Processing II Proceedings of the 1992 IEEE Workshop, Vol. 2 (1992) pp. 3-12.
(21) Y. Bengio, N. Boulanger-Lewandowski, and R. Pascanu, in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (IEEE, New York, 2013) pp. 8624–8628.
(22) M. A. Nielsen and I. L. Chuang, Quantum computation and quantum information (Cambridge University Press, New York, 2010) Chap. 4, Sec. 5, p. 189.
(23) S. Arora, N. Cohen, and E. Hazan, in Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 80 (PMLR, Stockholm, 2018), pp. 244-253.
(24) J. R. Johansson, P. D. Nation, and F. Nori, Comput Phys Commun 184, 1234 (2013).

Appendix A Proof of Theorem 1

Proof of Theorem 1.

Since every unitary operator can be decomposed into a product of two-level unitary operators [22], we only need to show that Theorem 1 holds for $\hat{V}$ of the form

\sum_{i,j=0}^{1}v_{i,j}\left|{c_{i},x_{i}}\middle\rangle\middle\langle{c_{j},x_{j}}\right|+\sum_{\begin{subarray}{c}e\neq(c_{0},x_{0})\\ e\neq(c_{1},x_{1})\end{subarray}}\left|{e}\middle\rangle\middle\langle{e}\right|.

(17)

We prove this by constructing the family of single-site coin operators $\{\hat{c}_{x}^{(t)}\}$ explicitly.

If $c_{0}\neq c_{1}$, let $t_{\mbox{meet}}$ be the integer solution $t$ of

\begin{cases}x_{0}+t\delta_{c_{0}}=x_{1}+t\delta_{c_{1}}\ (\mathrm{mod}\ {n})\\ 0\leq t<n\end{cases}

(18)

and $x_{\mbox{meet}}$ be $x_{0}+t_{\mbox{meet}}\delta_{c_{0}}\ (\mathrm{mod}\ {n})$ . The solution $t_{\mbox{meet}}$ exists and is unique since $\delta_{0}\neq\delta_{1}$ and $\gcd(|\delta_{0}-\delta_{1}|,n)=1$ . Choose $T=n$ and

\hat{c}_{x}^{(t)}=\begin{cases}\sum_{i,j=0}^{1}v_{i,j}\left|{c_{i}}\middle\rangle_{c}\middle\langle{c_{j}}\right|&\mbox{if $t=t_{\mbox{meet}}$ and $x=x_{\mbox{meet}}$}\\ \hat{I}_{c}&\mbox{otherwise}\end{cases}.

(19)

We can verify that this $T$ -step quantum walk realizes the two-level unitary operator $\hat{V}$ by the following calculation

\begin{align}
\hat{U}_{T,0}|{c,x}\rangle
&=\hat{U}_{T,t_{m}+1}\hat{U}_{t_{m}+1,t_{m}}\hat{U}_{t_{m},0}|{c,x}\rangle \tag{20}\\
&=\hat{U}_{T,t_{m}+1}\hat{U}_{t_{m}+1,t_{m}}|{c,x+t_{m}\delta_{c}\ (\mathrm{mod}\ {n})}\rangle \tag{21}\\
&=\begin{cases}\hat{U}_{T,t_{m}+1}\sum_{i=0}^{1}v_{i,0}|{c_{i},x_{m}+\delta_{c_{i}}\ (\mathrm{mod}\ {n})}\rangle&\mbox{if $(c,x)=(c_{0},x_{0})$}\\ \hat{U}_{T,t_{m}+1}\sum_{i=0}^{1}v_{i,1}|{c_{i},x_{m}+\delta_{c_{i}}\ (\mathrm{mod}\ {n})}\rangle&\mbox{if $(c,x)=(c_{1},x_{1})$}\\ \hat{U}_{T,t_{m}+1}|{c,x+(t_{m}+1)\delta_{c}\ (\mathrm{mod}\ {n})}\rangle&\mbox{otherwise}\end{cases} \tag{22}\\
&=\begin{cases}\sum_{i=0}^{1}v_{i,0}|{c_{i},x_{m}+(n-t_{m})\delta_{c_{i}}\ (\mathrm{mod}\ {n})}\rangle&\mbox{if $(c,x)=(c_{0},x_{0})$}\\ \sum_{i=0}^{1}v_{i,1}|{c_{i},x_{m}+(n-t_{m})\delta_{c_{i}}\ (\mathrm{mod}\ {n})}\rangle&\mbox{if $(c,x)=(c_{1},x_{1})$}\\ |{c,x+n\delta_{c}\ (\mathrm{mod}\ {n})}\rangle&\mbox{otherwise}\end{cases} \tag{23}\\
&=\begin{cases}\sum_{i=0}^{1}v_{i,0}|{c_{i},x_{i}+n\delta_{c_{i}}\ (\mathrm{mod}\ {n})}\rangle&\mbox{if $(c,x)=(c_{0},x_{0})$}\\ \sum_{i=0}^{1}v_{i,1}|{c_{i},x_{i}+n\delta_{c_{i}}\ (\mathrm{mod}\ {n})}\rangle&\mbox{if $(c,x)=(c_{1},x_{1})$}\\ |{c,x}\rangle&\mbox{otherwise}\end{cases} \tag{24}\\
&=\begin{cases}\sum_{i=0}^{1}v_{i,0}|{c_{i},x_{i}}\rangle&\mbox{if $(c,x)=(c_{0},x_{0})$}\\ \sum_{i=0}^{1}v_{i,1}|{c_{i},x_{i}}\rangle&\mbox{if $(c,x)=(c_{1},x_{1})$}\\ |{c,x}\rangle&\mbox{otherwise}\end{cases} \tag{25}
\end{align}

where $\hat{U}_{t_{1},t_{0}}$ stands for $\mathcal{T}\prod_{t=t_{0}}^{t_{1}-1}\hat{S}\hat{C}^{(t)}$, and $t_{m}$ and $x_{m}$ stand for $t_{\mbox{meet}}$ and $x_{\mbox{meet}}$, respectively.
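The construction for $c_{0}\neq c_{1}$ can also be checked numerically. The following is a minimal NumPy sketch, not the authors' code: the basis ordering (index $cn+x$), the helper names, and the example values of $n$, $\delta_{c}$, $(c_{i},x_{i})$ and $v$ are all assumptions made for illustration. It solves the congruence for $t_{\mbox{meet}}$ with a modular inverse, builds the $T=n$ step walk with a single nontrivial coin, and compares the result with the two-level target.

```python
import numpy as np

def shift_operator(n, deltas):
    """S|c,x> = |c, (x + deltas[c]) mod n>, with basis index c*n + x (assumed ordering)."""
    S = np.zeros((2 * n, 2 * n))
    for c in (0, 1):
        for x in range(n):
            S[c * n + (x + deltas[c]) % n, c * n + x] = 1.0
    return S

def embed_coin(v, c_first, c_second):
    """Return sum_ij v[i,j] |c_i><c_j| as a 2x2 matrix in the basis (|0>, |1>)."""
    cs = (c_first, c_second)
    coin = np.zeros((2, 2), dtype=complex)
    for i in range(2):
        for j in range(2):
            coin[cs[i], cs[j]] = v[i, j]
    return coin

def walk_for_distinct_coins(n, deltas, c0, x0, c1, x1, v):
    """T = n step walk with one nontrivial coin at (t_meet, x_meet), for c0 != c1."""
    d = (deltas[c0] - deltas[c1]) % n
    # t * d = x1 - x0 (mod n); the inverse exists because gcd(d, n) = 1 (Python >= 3.8).
    t_meet = ((x1 - x0) * pow(d, -1, n)) % n
    x_meet = (x0 + t_meet * deltas[c0]) % n
    P = np.zeros((n, n))
    P[x_meet, x_meet] = 1.0
    C_meet = np.kron(embed_coin(v, c0, c1), P) + np.kron(np.eye(2), np.eye(n) - P)
    S = shift_operator(n, deltas)
    U = np.eye(2 * n, dtype=complex)
    for t in range(n):
        C = C_meet if t == t_meet else np.eye(2 * n)
        U = S @ C @ U  # each step applies the coin first, then the shift
    return U, t_meet, x_meet

def two_level_target(n, c0, x0, c1, x1, v):
    """Two-level unitary acting as v on span{|c0,x0>, |c1,x1>} and as identity elsewhere."""
    V = np.eye(2 * n, dtype=complex)
    idx = (c0 * n + x0, c1 * n + x1)
    for i in range(2):
        for j in range(2):
            V[idx[i], idx[j]] = v[i, j]
    return V

# Hypothetical example values.
n, deltas = 5, (0, 1)
v = np.array([[0.6, -0.8], [0.8j, 0.6j]])  # an arbitrary 2x2 unitary
U, t_meet, x_meet = walk_for_distinct_coins(n, deltas, 0, 3, 1, 1, v)
V = two_level_target(n, 0, 3, 1, 1, v)
print(t_meet, x_meet, np.allclose(U, V))
```

Solving the congruence with `pow(d, -1, n)` relies on the same coprimality condition that guarantees uniqueness of $t_{\mbox{meet}}$ in the proof.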

If $c_{0}=c_{1}$, let $t_{\mbox{meet}}$ be the unique integer solution $t$ of

\begin{cases}x_{0}+t\delta_{\tilde{c}_{0}}=x_{1}+t\delta_{\tilde{c}_{1}}\ (\mathrm{mod}\ {n})\\ 0<t<n\end{cases},

(26)

where $\tilde{c}_{0}=c_{0}$ and $\tilde{c}_{1}=1-c_{1}$ . Denote $x_{0}+t_{\mbox{meet}}\delta_{\tilde{c}_{0}}\ (\mathrm{mod}\ {n})$ as $x_{\mbox{meet}}$ . Choose $T=2n$ and

\hat{c}_{x}^{(t)}=\begin{cases}\hat{\sigma}_{x}&\mbox{if $t=0$ or $n$, and $x=x_{1}$}\\ \sum_{i,j=0}^{1}v_{i,j}\left|{\tilde{c}_{i}}\middle\rangle_{c}\middle\langle{\tilde{c}_{j}}\right|&\mbox{if $t=t_{\mbox{meet}}$ and $x=x_{\mbox{meet}}$}\\ \hat{I}_{c}&\mbox{otherwise}\end{cases}.

(27)

It is easy to verify that this is a realization of the two-level unitary operator $\hat{V}$ . ∎
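For readers who prefer a numerical check of the $c_{0}=c_{1}$ case, the sketch below simulates the $T=2n$ step walk of Eq. (27). It is an illustration under stated assumptions rather than the authors' implementation: the basis ordering (index $cn+x$), the function names, and the example parameters are hypothetical.

```python
import numpy as np

def shift_operator(n, deltas):
    """S|c,x> = |c, (x + deltas[c]) mod n>, with basis index c*n + x (assumed ordering)."""
    S = np.zeros((2 * n, 2 * n))
    for c in (0, 1):
        for x in range(n):
            S[c * n + (x + deltas[c]) % n, c * n + x] = 1.0
    return S

def site_coin(n, coin, x):
    """Coin operator acting as `coin` at site x and as the identity at all other sites."""
    P = np.zeros((n, n))
    P[x, x] = 1.0
    return np.kron(coin, P) + np.kron(np.eye(2), np.eye(n) - P)

def walk_for_equal_coins(n, deltas, c, x0, x1, v):
    """T = 2n step walk for c0 = c1 = c, following the coin schedule of Eq. (27)."""
    ct = (c, 1 - c)  # (c~_0, c~_1)
    d = (deltas[ct[0]] - deltas[ct[1]]) % n
    t_meet = ((x1 - x0) * pow(d, -1, n)) % n  # lies in 1..n-1 because x0 != x1
    x_meet = (x0 + t_meet * deltas[ct[0]]) % n
    coin_v = np.zeros((2, 2), dtype=complex)
    for i in range(2):
        for j in range(2):
            coin_v[ct[i], ct[j]] = v[i, j]  # sum_ij v_ij |c~_i><c~_j|
    sigma_x = np.array([[0.0, 1.0], [1.0, 0.0]])
    S = shift_operator(n, deltas)
    U = np.eye(2 * n, dtype=complex)
    for t in range(2 * n):
        if t in (0, n):
            C = site_coin(n, sigma_x, x1)    # flip the coin at x1, then undo it at t = n
        elif t == t_meet:
            C = site_coin(n, coin_v, x_meet)
        else:
            C = np.eye(2 * n)
        U = S @ C @ U
    return U

# Hypothetical example: mix |1,0> and |1,2> on a 4-cycle.
n, deltas, c, x0, x1 = 4, (0, 1), 1, 0, 2
v = np.array([[0.6, -0.8], [0.8j, 0.6j]])  # an arbitrary 2x2 unitary
U = walk_for_equal_coins(n, deltas, c, x0, x1, v)

V = np.eye(2 * n, dtype=complex)
idx = (c * n + x0, c * n + x1)
for i in range(2):
    for j in range(2):
        V[idx[i], idx[j]] = v[i, j]
print(np.allclose(U, V))
```

The two $\hat{\sigma}_{x}$ coins temporarily move the state at $x_{1}$ into the opposite coin sector so that it can meet the state starting at $x_{0}$; the second half of the walk only returns every basis state to its original site.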

Appendix B Implementing the Fourier transformation

In this section, we demonstrate the calculation for implementing the four-by-four Fourier transformation. First, we decompose the Fourier transformation as $\mathrm{QFT}=\hat{u}_{6}\hat{u}_{5}\hat{u}_{4}\hat{u}_{3}\hat{u}_{2}\hat{u}_{1}$ [22], where

\hat{u}_{1}=\frac{1}{2}\begin{bmatrix}2&0&0&0\\ 0&2&0&0\\ 0&0&\sqrt{2}&\sqrt{2}\\ 0&0&-\sqrt{2}i&\sqrt{2}i\end{bmatrix},

(28)

\hat{u}_{2}=\frac{1}{3}\begin{bmatrix}3&0&0&0\\ 0&-\sqrt{3}&-\sqrt{6}&0\\ 0&\sqrt{6}&-\sqrt{3}&0\\ 0&0&0&3\end{bmatrix},

(29)

\hat{u}_{3}=\frac{1}{4}\begin{bmatrix}4&0&0&0\\ 0&4&0&0\\ 0&0&-1+3i&\sqrt{3}(i-1)\\ 0&0&\sqrt{3}(i+1)&-1-3i\end{bmatrix},

(30)

\hat{u}_{4}=\frac{1}{2}\begin{bmatrix}1&-\sqrt{3}&0&0\\ \sqrt{3}&1&0&0\\ 0&0&2&0\\ 0&0&0&2\end{bmatrix},

(31)

\hat{u}_{5}=\frac{1}{3}\begin{bmatrix}3&0&0&0\\ 0&\sqrt{3}&-\sqrt{6}&0\\ 0&\sqrt{6}&\sqrt{3}&0\\ 0&0&0&3\end{bmatrix},

(32)

\hat{u}_{6}=\frac{1}{2}\begin{bmatrix}2&0&0&0\\ 0&2&0&0\\ 0&0&\sqrt{2}&-\sqrt{2}\\ 0&0&\sqrt{2}&\sqrt{2}\end{bmatrix}.

(33)
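The decomposition can be checked numerically. The short sketch below multiplies the six factors and compares the product with the four-by-four Fourier matrix. Two points are assumptions on our side: the lower-right entry of $\hat{u}_{1}$ is taken to be $\sqrt{2}i/2$, which is what unitarity of $\hat{u}_{1}$ requires, and the Fourier matrix is taken with the sign convention $F_{jk}=e^{-2\pi ijk/4}/2$. The helper `embed` is a hypothetical name.

```python
import numpy as np

def embed(block, i, j, dim=4):
    """Two-level unitary acting as `block` on basis states i and j, identity elsewhere."""
    U = np.eye(dim, dtype=complex)
    U[np.ix_([i, j], [i, j])] = block
    return U

s2, s3 = np.sqrt(2), np.sqrt(3)
u1 = embed(np.array([[1, 1], [-1j, 1j]]) / s2, 2, 3)  # assumed lower-right entry i/sqrt(2)
u2 = embed(np.array([[-1, -s2], [s2, -1]]) / s3, 1, 2)
u3 = embed(np.array([[-1 + 3j, s3 * (1j - 1)], [s3 * (1j + 1), -1 - 3j]]) / 4, 2, 3)
u4 = embed(np.array([[1, -s3], [s3, 1]]) / 2, 0, 1)
u5 = embed(np.array([[1, -s2], [s2, 1]]) / s3, 1, 2)
u6 = embed(np.array([[1, -1], [1, 1]]) / s2, 2, 3)

factors = [u1, u2, u3, u4, u5, u6]
product = u6 @ u5 @ u4 @ u3 @ u2 @ u1

k = np.arange(4)
F = np.exp(-2j * np.pi * np.outer(k, k) / 4) / 2  # assumed sign convention

all_unitary = all(np.allclose(u.conj().T @ u, np.eye(4)) for u in factors)
print(all_unitary, np.allclose(product, F))
```

With the opposite sign convention for the Fourier matrix, the same check applies to the complex conjugates of the six factors.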

All these $\hat{u}_{i}$ are two-level unitary operators. By comparing $\hat{u}_{i}$ with Eq. (17) we can find $c_{0},c_{1},x_{0},x_{1}$ for each $\hat{u}_{i}$. We then substitute these values into Eqs. (18) and (19) if $c_{0}\neq c_{1}$, or into Eqs. (26) and (27) if $c_{0}=c_{1}$, to find the DTQW implementing each $\hat{u}_{i}$. The DTQWs for the individual $\hat{u}_{i}$ are concatenated in the temporal order of the $\hat{u}_{i}$ to form one large DTQW. In other words, the walker first walks according to the DTQW for implementing $\hat{u}_{1}$. After this DTQW is finished, the walker continues to walk according to the DTQW for implementing $\hat{u}_{2}$, then $\hat{u}_{3}$, $\hat{u}_{4}$, etc. The single-site coin operators $\hat{c}_{x}^{(t)}$ of the final combined DTQW for implementing the quantum Fourier transformation are shown in the following table, where the rows are labeled by the site $x$, the columns by the time step $t$, X stands for the Pauli $x$ matrix, and I stands for the identity matrix.

$x\backslash t$	0	1	2	3	4	5	6	7	8	9
0	I	I	I	I	I	I	I	I	I	I
1	X	$\frac{\sqrt{2}}{2}\begin{pmatrix}i&-i\\ 1&1\end{pmatrix}$	X	I	I	$-\frac{\sqrt{3}}{3}\begin{pmatrix}1&\sqrt{2}\\ -\sqrt{2}&1\end{pmatrix}$	X	$-\frac{1}{4}\begin{pmatrix}1+3i&-\sqrt{3}(1+i)\\ \sqrt{3}(1-i)&1-3i\end{pmatrix}$	X	I
$x\backslash t$	10	11	12	13	14	15	16	17	18	19
0	I	$\frac{1}{2}\begin{pmatrix}1&-\sqrt{3}\\ \sqrt{3}&1\end{pmatrix}$	I	I	I	I	I	I	I	I
1	X	I	X	I	I	$\frac{\sqrt{3}}{3}\begin{pmatrix}1&-\sqrt{2}\\ \sqrt{2}&1\end{pmatrix}$	X	$\frac{\sqrt{2}}{2}\begin{pmatrix}1&1\\ -1&1\end{pmatrix}$	X	I
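The coin schedule can be simulated directly. The sketch below is an illustration under several assumptions, not the authors' code: the cycle has $n=2$ sites with $\delta_{0}=0$ and $\delta_{1}=1$ (the values quoted in Appendix E), the basis is ordered as $|0,0\rangle,|0,1\rangle,|1,0\rangle,|1,1\rangle$, the Fourier matrix uses the convention $F_{jk}=e^{-2\pi ijk/4}/2$, and the coins at $t=1$ and $t=7$ are entered in the unitary form that Eq. (27) yields for $\hat{u}_{1}$ (with lower-right entry $\sqrt{2}i/2$) and $\hat{u}_{3}$.

```python
import numpy as np

n, T, deltas = 2, 20, (0, 1)
s2, s3 = np.sqrt(2), np.sqrt(3)
X = np.array([[0, 1], [1, 0]], dtype=complex)

# Nontrivial coins keyed by (site x, time step t); every other coin is the identity.
coins = {
    (1, 1): np.array([[1j, -1j], [1, 1]]) / s2,
    (1, 5): -np.array([[1, s2], [-s2, 1]]) / s3,
    (1, 7): -np.array([[1 + 3j, -s3 * (1 + 1j)], [s3 * (1 - 1j), 1 - 3j]]) / 4,
    (0, 11): np.array([[1, -s3], [s3, 1]]) / 2,
    (1, 15): np.array([[1, -s2], [s2, 1]]) / s3,
    (1, 17): np.array([[1, 1], [-1, 1]]) / s2,
}
for t in (0, 2, 6, 8, 10, 12, 16, 18):
    coins[(1, t)] = X

# Shift operator S|c,x> = |c, (x + delta_c) mod n>, basis index c*n + x.
S = np.zeros((2 * n, 2 * n))
for c in (0, 1):
    for x in range(n):
        S[c * n + (x + deltas[c]) % n, c * n + x] = 1.0

U = np.eye(2 * n, dtype=complex)
for t in range(T):
    C = np.zeros((2 * n, 2 * n), dtype=complex)
    for x in range(n):
        P = np.zeros((n, n))
        P[x, x] = 1.0
        C += np.kron(coins.get((x, t), np.eye(2)), P)
    U = S @ C @ U  # coin first, then shift

k = np.arange(4)
F = np.exp(-2j * np.pi * np.outer(k, k) / 4) / 2  # assumed sign convention
print(np.allclose(U, F))
```

The schedule splits into six consecutive blocks of lengths 4, 2, 4, 4, 2 and 4 steps, one per factor $\hat{u}_{1},\dots,\hat{u}_{6}$, which is where the total depth of 20 comes from.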

Appendix C Optimization of depth required

We show in this section that any unitary operator $\hat{V}\in\mathrm{U}(2n)$ can be realized with a DTQW-based neural network of depth $2n^{2}-2n+1$ by constructing the implementation.

Before the actual construction, we first introduce the following lemma, which makes the total effect of our DTQW-based neural networks more transparent.

Lemma 1.

For any $\hat{V}\in\mathrm{U}(2n)$ , it is realizable by a $T$ -step DTQW on an $n$ -cycle if and only if

\left[\mathcal{T}\prod_{\tau=0}^{T-1}\left(\prod_{\xi=0}^{n-1}\hat{U}^{(\xi+\tau\delta_{0},\tau)}_{\left|{0,\xi}\right\rangle,\left|{1,\xi+\tau\delta}\right\rangle}\right)\right]\hat{V}^{\dagger}\hat{S}^{T}=\hat{I}

(34)

for a family of two-level unitary operators $\left\{\hat{U}^{(\xi,\tau)}_{\left|{0,\xi}\right\rangle,\left|{1,\xi+\tau\delta}\right\rangle}\right\}$ indexed by the set $\{(\xi,\tau):0\leq\xi<n\mbox{ and }0\leq\tau<T\}$ , where $\hat{U}^{(\xi,\tau)}_{\left|{0,\xi}\right\rangle,\left|{1,\xi+\tau\delta}\right\rangle}$ is a two-level unitary acting on the subspace spanned by $\{\left|{0,\xi}\right\rangle,\left|{1,\xi+\tau\delta}\right\rangle\}$ , and $\delta=\delta_{0}-\delta_{1}$ .

This lemma is proved by the following calculation:

\begin{align}
\hat{U}_{T,0}
&=\mathcal{T}\prod_{t=0}^{T-1}\hat{S}\hat{C}^{(t)} \tag{35}\\
&=\mathcal{T}\prod_{t=0}^{T-1}\Bigg[\hat{S}\cdot\prod_{x=0}^{n-1}\Bigg(\hat{c}_{x}^{(t)}\otimes|{x}\rangle\langle{x}|+\sum_{\xi\neq x}\hat{I}_{c}\otimes|{\xi}\rangle\langle{\xi}|\Bigg)\Bigg] \tag{36}\\
&=\mathcal{T}\prod_{t=0}^{T-1}\Bigg[\hat{S}\cdot\hat{S}^{t}\cdot\hat{S}^{-t}\prod_{x=0}^{n-1}\Bigg(\hat{c}_{x}^{(t)}\otimes|{x}\rangle\langle{x}|+\sum_{\xi\neq x}\hat{I}_{c}\otimes|{\xi}\rangle\langle{\xi}|\Bigg)\hat{S}^{t}\cdot\hat{S}^{-t}\Bigg] \tag{37}\\
&=\mathcal{T}\prod_{t=0}^{T-1}\Bigg[\hat{S}^{t+1}\prod_{x=0}^{n-1}\Bigg(\hat{S}^{-t}\bigg(\hat{c}_{x}^{(t)}\otimes|{x}\rangle\langle{x}|+\sum_{\xi\neq x}\hat{I}_{c}\otimes|{\xi}\rangle\langle{\xi}|\bigg)\hat{S}^{t}\Bigg)\cdot\hat{S}^{-t}\Bigg] \tag{38}\\
&=\hat{S}^{T}\cdot\mathcal{T}\prod_{t=0}^{T-1}\left(\prod_{x=0}^{n-1}\hat{U}^{(x,t)}\right), \tag{39}
\end{align}

where $\hat{U}^{(x,t)}=\hat{S}^{-t}\big(\hat{c}_{x}^{(t)}\otimes|{x}\rangle\langle{x}|+\sum_{\xi\neq x}\hat{I}_{c}\otimes|{\xi}\rangle\langle{\xi}|\big)\hat{S}^{t}$.

Notice that if $\xi+t\cdot\delta_{c}\neq x$ ,

\hat{U}^{(x,t)}|c,\xi\rangle=|c,\xi\rangle.

(40)

Hence, $\hat{U}^{(x,t)}$ is a two-level unitary that can act nontrivially only on the subspace spanned by $\left\{\left|0,x-t\delta_{0}\right\rangle,\left|1,x-t\delta_{1}\right\rangle\right\}$. Thus

\begin{align}
\hat{U}_{T,0}
&=\hat{S}^{T}\cdot\mathcal{T}\prod_{t=0}^{T-1}\left(\prod_{x=0}^{n-1}\hat{U}^{(x,t)}_{|0,x-t\delta_{0}\rangle,|1,x-t\delta_{1}\rangle}\right)\notag\\
&=\hat{S}^{T}\cdot\mathcal{T}\prod_{t=0}^{T-1}\left(\prod_{\xi=0}^{n-1}\hat{U}^{(\xi+t\delta_{0},t)}_{|0,\xi\rangle,|1,\xi+t\delta\rangle}\right), \tag{41}
\end{align}

where $\delta=\delta_{0}-\delta_{1}$ and $\xi=x-t\delta_{0}$. Moving all shift operators in Eq. (4) to the far left in this way proves the lemma.
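The two facts used in this proof, namely that the shifts can be collected on the left and that each conjugated coin $\hat{U}^{(x,t)}=\hat{S}^{-t}(\cdots)\hat{S}^{t}$ touches only the states $|0,x-t\delta_{0}\rangle$ and $|1,x-t\delta_{1}\rangle$, can be checked on random coins. The sketch below is illustrative only; the basis ordering (index $cn+x$), the helper names and the values of $n$, $T$ and $\delta_{c}$ are assumptions.

```python
import numpy as np

def shift_operator(n, deltas):
    """S|c,x> = |c, (x + deltas[c]) mod n>, with basis index c*n + x (assumed ordering)."""
    S = np.zeros((2 * n, 2 * n))
    for c in (0, 1):
        for x in range(n):
            S[c * n + (x + deltas[c]) % n, c * n + x] = 1.0
    return S

def random_coin(rng):
    """A random 2x2 unitary obtained from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    q, _ = np.linalg.qr(z)
    return q

n, T, deltas = 3, 5, (0, 1)
dim = 2 * n
rng = np.random.default_rng(0)
S = shift_operator(n, deltas)

U_walk = np.eye(dim, dtype=complex)   # time-ordered product of S C^(t)
ordered = np.eye(dim, dtype=complex)  # time-ordered product of the conjugated coins
two_level_ok = True
for t in range(T):
    S_t = np.linalg.matrix_power(S, t)
    C = np.zeros((dim, dim), dtype=complex)
    layer = np.eye(dim, dtype=complex)
    for x in range(n):
        coin = random_coin(rng)
        P = np.zeros((n, n))
        P[x, x] = 1.0
        C += np.kron(coin, P)
        K = np.kron(coin, P) + np.kron(np.eye(2), np.eye(n) - P)
        U_xt = S_t.T @ K @ S_t  # S is a real permutation matrix, so S^{-t} = (S^t)^T
        support = {(x - t * deltas[0]) % n, n + (x - t * deltas[1]) % n}
        rows = np.nonzero(np.abs(U_xt - np.eye(dim)).sum(axis=1) > 1e-12)[0]
        two_level_ok &= {int(i) for i in rows} <= support
        layer = U_xt @ layer    # operators within one layer commute
    U_walk = S @ C @ U_walk
    ordered = layer @ ordered

lemma_holds = np.allclose(U_walk, np.linalg.matrix_power(S, T) @ ordered)
print(two_level_ok, lemma_holds)
```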

With this lemma, we can finally start our construction of the implementation for arbitrary unitary operators $\hat{V}$ . Let us denote

\hat{V}_{t}=\begin{cases}\hat{V}_{t}\cdot\hat{V}_{t-1}&\mbox{if $t\geq 2$}\\ \prod_{\xi=0}^{n-1}\hat{U}_{|0,\xi\rangle,|1,\xi+t\delta\rangle}^{(\xi+t\delta_{0},t)}\cdot\hat{S}^{-1}&\mbox{if $t=1$}\\ \hat{V}_{0}\cdot\hat{S}^{-1}&\mbox{if $t=0$}\end{cases}

(42)

if $2kn\leqslant\tau<2kn+n-k$ and $\xi=(k-1)\delta\ (\mathrm{mod}\ {n})$ :

\hat{U}^{\left(\xi+\tau\delta_{0},\tau\right)}=\hat{U}_{x|1,\xi+\tau\delta\rangle}^{\left(\xi+\tau\delta_{0},\tau\right)}\left(\tilde{V}_{\tau}|0,k\delta\rangle\right),

(43)

if $2kn\leqslant\tau<2kn+n-k-1$ and $\xi=-\delta\ (\mathrm{mod}\ {n})$ :

\hat{U}^{\left(\xi+\tau\delta_{0},\tau\right)}=\hat{U}_{x|0,\xi\rangle}^{\left(\xi+\tau\delta_{0},\tau\right)}\left(\tilde{V}_{\tau}|0,k\delta\rangle\right),

(44)

if $2kn+n+k\leqslant\tau<2(k+1)n$ and $\xi=(k-1-t)\delta\ (\mathrm{mod}\ {n})$ :

\hat{U}^{\left(\xi+\tau\delta_{0},\tau\right)}=\hat{U}_{x|0,\xi\rangle}^{\left(\xi+\tau\delta_{0},\tau\right)}\left(\tilde{V}_{\tau}|1,k\delta\rangle\right),

(45)

if $(2k+1)n\leqslant\tau<2(k+1)n-k$ and $\xi=k\delta\ (\mathrm{mod}\ {n})$ :

\hat{U}^{\left(\xi+\tau\delta_{0},\tau\right)}=\hat{U}_{x|1,\xi+\tau\delta\rangle}^{\left(\xi+\tau\delta_{0},\tau\right)}\left(\tilde{V}_{\tau}|1,k\delta\rangle\right),

(46)

where $\delta=\delta_{0}-\delta_{1}$ , $k=\lfloor\frac{\tau}{2n}\rfloor$ , and $U_{x|\varphi\rangle}^{\left(\xi+\tau\delta_{0},\tau\right)}(|\psi\rangle)$ is any two-level unitary subject to $\left\langle\varphi\left|U_{x|\varphi\rangle}^{\left(\xi+\tau\delta_{0},\tau\right)}\right|\psi\right\rangle=0$ . One can easily verify that such $U_{x|\varphi\rangle}^{\left(\xi+\tau\delta_{0},\tau\right)}(|\psi\rangle)$ always exists as long as $|\psi\rangle=|0,\xi\rangle$ or $|1,\xi+\tau\delta\rangle$ .

For induction on $\lfloor\frac{t}{2n}\rfloor$, let $t=2n^{2}-2n+1$. With $c_{x}^{(t)}={}_{p}\langle{x}|\hat{S}^{t}\hat{U}^{(x,t)}\hat{S}^{-t}|{x}\rangle_{p}$, $\forall c\leqslant 1$, $\forall l<\lfloor\frac{t}{2n}\rfloor$, we have

\hat{V}_{t}|c,l\delta\rangle=|c,l\delta\rangle.

(47)

Appendix D Controlling large systems via DTQW-based neural network

In Sec. IV, we mentioned the possibility of controlling a large system via the DTQW indirectly by controlling the $2$-level coin system. The numerical results in Fig. 12 indicate that this is feasible when the desired operation on the position system is unitary.

Not only unitary operations can be realized in this indirect controlling fashion, but more general quantum operations such as POVM measurements can also be realized, as shown in Fig. 13. To apply gradient descent in this situation, the loss is defined as

L_{|{\psi}\rangle_{p}}=\frac{1}{2}\sum_{j=0}^{1}\left\lVert|{\psi^{(T)}_{j}}\rangle_{p}-|{\phi^{(T)}_{j}}\rangle_{p}\right\rVert^{2},

(48)

where $|{\psi^{(T)}_{j}}\rangle_{p}={}_{c}\langle{j}|\hat{U}_{T,0}|{0}\rangle_{c}|{\psi}\rangle_{p}$ and $|{\phi^{(T)}_{j}}\rangle_{p}=\hat{M}_{j}|{\psi}\rangle_{p}$. This loss is chosen so that the form of the partial derivatives in Eq. (10) needs no modification, i.e.,

\frac{\partial{L_{|{\Psi}\rangle}}}{\partial{\alpha^{(x,t)}_{j}}}=\operatorname{Im}\left(\langle{\Phi^{(t)}}|{\hat{\Sigma}_{j}^{(x,t)}}|{\Psi^{(t)}}\rangle\right),

(49)

where $|{\Psi^{(t)}}\rangle=\hat{U}_{t,0}|{0}\rangle_{c}|{\psi}\rangle_{p}$, $|{\Phi^{(t)}}\rangle=\hat{U}_{t,0}^{\dagger}|{\Phi^{(T)}}\rangle$, and $|{\Phi^{(T)}}\rangle=\sum_{j=0}^{1}|{j}\rangle_{c}|{\phi^{(T)}_{j}}\rangle_{p}$. The distance between the two measurements $\{\hat{N}_{j}={}_{c}\langle{j}|\hat{U}_{T,0}|{0}\rangle_{c}\}_{j=0}^{1}$ and $\{\hat{M}_{j}\}_{j=0}^{1}$ in Fig. 13 is measured by

d(\{\hat{N}_{j}\}_{j=0}^{1},\{\hat{M}_{j}\}_{j=0}^{1})=\frac{1}{2n\sqrt{2}}\sum_{j=0}^{1}\sqrt{\mathrm{tr}^{2}(\hat{M}_{j}^{\dagger}\hat{M}_{j})+\mathrm{tr}^{2}(\hat{N}_{j}^{\dagger}\hat{N}_{j})-2|\mathrm{tr}(\hat{N}_{j}^{\dagger}\hat{M}_{j})|^{2}}.

(50)
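The loss of Eq. (48) and the distance of Eq. (50) are straightforward to evaluate once the measurement operators are available as matrices. The sketch below assumes that $\mathrm{tr}^{2}(\cdot)$ denotes the square of the trace and that $n$ is the dimension of the position system; the function names and the diagonal example POVM are hypothetical and serve only to exercise the formulas.

```python
import numpy as np

def povm_loss(N_ops, M_ops, psi):
    """Eq. (48): half the summed squared distance between N_j|psi> and M_j|psi>."""
    return 0.5 * sum(np.linalg.norm(N @ psi - M @ psi) ** 2
                     for N, M in zip(N_ops, M_ops))

def povm_distance(N_ops, M_ops, n):
    """Eq. (50), reading tr^2(A) as (tr A)^2 (an assumption about the notation)."""
    total = 0.0
    for N, M in zip(N_ops, M_ops):
        tr_m = np.trace(M.conj().T @ M).real
        tr_n = np.trace(N.conj().T @ N).real
        cross = abs(np.trace(N.conj().T @ M)) ** 2
        # Clip at zero to guard against tiny negative values from rounding.
        total += np.sqrt(max(tr_m ** 2 + tr_n ** 2 - 2 * cross, 0.0))
    return total / (2 * n * np.sqrt(2))

# Hypothetical two-outcome measurement on a 3-site position system:
# M_0 = diag(cos a_k), M_1 = diag(sin a_k), so that M_0^† M_0 + M_1^† M_1 = I.
n = 3
angles = np.array([0.3, 0.7, 1.1])
M_ops = [np.diag(np.cos(angles)).astype(complex), np.diag(np.sin(angles)).astype(complex)]
swapped = [M_ops[1], M_ops[0]]

rng = np.random.default_rng(1)
z = rng.normal(size=n) + 1j * rng.normal(size=n)
psi = z / np.linalg.norm(z)

d_same = povm_distance(M_ops, M_ops, n)
d_swapped = povm_distance(swapped, M_ops, n)
d_swapped_rev = povm_distance(M_ops, swapped, n)
loss_same = povm_loss(M_ops, M_ops, psi)
loss_swapped = povm_loss(swapped, M_ops, psi)
print(d_same, d_swapped, loss_same, loss_swapped)
```

In a training run the operators $\hat{N}_{j}$ would instead be extracted from the walk unitary as ${}_{c}\langle j|\hat{U}_{T,0}|0\rangle_{c}$.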

Appendix E Meta parameters used in numerical simulation

For all numerical simulations, $\delta_{0}$ and $\delta_{1}$ for the shift operator $\hat{S}$ are set to $0$ and $1$, respectively. All the real initial parameters $\alpha^{(x,t)}_{j}$ in the coin operators before the training are sampled uniformly and independently from $[-2\pi,2\pi]$. The training sets always consist of Haar-random pure states from the appropriate Hilbert space. For the desired operator $\hat{V}$, the depth $T$, the number of sites in the cycle $n$, the learning rate $\eta$, the number of samples of DTQW-QNN trained in parallel $N_{\text{sample}}$, and other randomness involved, see the table below [where $\mathrm{U}(2)$ and $\mathrm{U}(4)$ are equipped with the corresponding Haar measures].

Figure	$\hat{V}$	$T$	$n$	$\eta$	$N_{\text{sample}}$	Other randomness
Fig. 4	$\mathrm{SWAP}$			0.1		
Fig. 5	$\mathrm{U}(2)$			0.05	200 for each $T$	
Fig. 6	$\mathrm{QFT}$	$2n^{2}$		0.05	200 for $n=2,3$; 50 for $n=4,5$	
Fig. 7	$\mathrm{QFT}$			0.01	200 for each $T$	
Fig. 8	$\mathrm{QFT}$	500		0.05		
Fig. 9a	$\mathrm{U}(4)$			0.05	200	$\bm{a}^{(x)}$ uniformly sampled from $[0,2\pi]$
Fig. 9b	$\mathrm{U}(4)$			0.05	200	$\bm{a}^{(x)}$ uniformly sampled from $[0,2\pi]$ and shared by all samples
Fig. 10	$\mathrm{U}(4)$			0.05	200	$\bm{a}^{(x)}$ uniformly sampled from $[0,2\pi]$ and independent for all samples
Fig. 11	$\mathrm{QFT}$			0.1	100	$\bm{a}^{(x)}$ uniformly sampled from $[0,2\pi]$ and shared by all samples; $\bm{\theta}^{(x,t)}$ sampled from a normal distribution with standard deviation 0.01; $\bm{\varphi}^{(x,t)}$ uniformly sampled from $[0,2\pi]$
Fig. 12	$\mathrm{U}(2)$			0.01	150	
Fig. 13	$\mathrm{U}(4)$			0.01	150	
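The sampling conventions stated at the beginning of this appendix can be sketched as follows. This is an illustration only: the number of real parameters per coin (`n_params=4`), the array layout, and the function names are assumptions, and the Haar-random state is drawn by normalizing a complex Gaussian vector, which is a standard way to sample pure states uniformly.

```python
import numpy as np

def init_coin_parameters(rng, T, n, n_params=4):
    """Initial alpha_j^{(x,t)}, drawn independently and uniformly from [-2*pi, 2*pi].

    The layout (T, n, n_params) and the value n_params=4 are illustrative assumptions.
    """
    return rng.uniform(-2 * np.pi, 2 * np.pi, size=(T, n, n_params))

def haar_random_state(rng, dim):
    """Haar-random pure state: a normalized vector of i.i.d. complex Gaussians."""
    z = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return z / np.linalg.norm(z)

rng = np.random.default_rng(42)
T, n = 8, 2                            # hypothetical depth and cycle size
alphas = init_coin_parameters(rng, T, n)
psi = haar_random_state(rng, 2 * n)    # training state on the coin-position space
print(alphas.shape, np.linalg.norm(psi))
```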