^†^†thanks: These authors contributed equally to this work and are listed in alphabetical order.^†^†thanks: These authors contributed equally to this work and are listed in alphabetical order.

Quantum convolutional neural network for classical data classification

Tak Hur Department of Physics, Imperial College London, London, SW7 2BW, UK Leeseok Kim Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM 87131, USA Daniel K. Park dkp.quantum@gmail.com Sungkyunkwan University Advanced Institute of Nanotechnology, Suwon, 16419, Republic of Korea

Abstract

With the rapid advance of quantum machine learning, several proposals for the quantum-analogue of convolutional neural network (CNN) have emerged. In this work, we benchmark fully parameterized quantum convolutional neural networks (QCNNs) for classical data classification. In particular, we propose a quantum neural network model inspired by CNN that only uses two-qubit interactions throughout the entire algorithm. We investigate the performance of various QCNN models differentiated by structures of parameterized quantum circuits, quantum data encoding methods, classical data pre-processing methods, cost functions and optimizers on MNIST and Fashion MNIST datasets. In most instances, QCNN achieved excellent classification accuracy despite having a small number of free parameters. The QCNN models performed noticeably better than CNN models under the similar training conditions. Since the QCNN algorithm presented in this work utilizes fully parameterized and shallow-depth quantum circuits, it is suitable for Noisy Intermediate-Scale Quantum (NISQ) devices.

Quantum machine learning and Convolutional neural network and Deep learning

I Introduction

Machine learning techniques with artificial neural networks are ubiquitous in modern society as the ability to make reliable predictions from the vast amount of data is essential in various domains of science and technology. A convolutional neural network (CNN) is one such example, especially for data with a large number of features. It effectively captures spatial correlation within data and learns important features [1], which is shown to be useful for many pattern recognition problems, such as image classification, signal processing, and natural language processing [2]. It has also opened the path to Generative Adversarial Networks (GANs) [3]. CNNs are also rising as a useful tool for scientific research, such as in high energy physics [4, 5], gravitational wave detection [6] and statistical physics [7]. By all means, the computational power required for the success of machine learning algorithms increases with the volume of data, which is increasing at an overwhelming rate. With the potential of quantum computers to outperform any foreseeable classical computers for solving certain computational tasks, Quantum machine learning (QML) has emerged as the potential solution to address the challenge of handling an ever-increasing amount of data. For example, several innovative quantum machine learning algorithms have been proposed to offer speedups over their classical counterparts [8, 9, 10, 11, 12, 13]. Motivated by the benefits of CNN and the potential power of QML, Quantum Convolutional Neural Network (QCNN) algorithms have been developed [14, 15, 16, 17, 18, 19, 20, 21, 22] (see Appendix A for a brief summary and comparison of other approaches to QCNN). Previous constructions of QCNN have reported success in developing efficient quantum arithmetic operations that exactly implement the basic functionalities of classical CNN or in developing parameterized quantum circuits inspired by key characteristics of CNN. While the former likely requires fault-tolerant quantum devices, the latter has been focused on quantum data classification. In particular, Cong et al. proposed a fully parameterized quantum circuit (PQC) architecture inspired by CNN and demonstrated its success for some quantum many-body problems [14]. However, the study of fully parameterized QCNN for performing pattern recognition, such as classification, on classical data is missing.

In this work, we present a fully parameterized quantum circuit model for QCNN that solves supervised classification problems on classical data. In a similar vein to [14], our model only uses two-qubit interactions throughout the entire algorithm in a systematic way. The PQC models—also known as variational quantum circuits [23]—are attractive since they are expected to be suitable for Noisy Intermediate-Scale Quantum (NISQ) hardware [24, 25]. Another advantage of QCNN models for NISQ computing is their intrinsically shallow circuit depth. Furthermore, QCNN models studied in this work exploit entanglement, which is a global property, and hence have the potential to transcend classical CNN that is only able to capture local correlations. We benchmark the performance of the parameterized QCNN with respect to several variables, such as quantum data encoding methods, structures of parameterized quantum circuits, cost functions, and optimizers using two standard datasets, namely MNIST and Fashion MNIST, on Pennylane [26]. The quantum encoding benchmark also examines classical dimensionality reduction methods, which is essential for early quantum computers with a limited number of logical qubits. The various QCNN models tested in this work employs a small number of free parameters, ranging from 12 to 51. Nevertheless, all QCNN models produced high classification accuracy, with the best case being about $99\%$ for MNIST and about $94\%$ for Fashion MNIST. Moreover, we discuss a QCNN model that only requires nearest neighbour qubit interactions, which is a desirable feature for NISQ computing. Comparing classification performances of QCNN and CNN models shows that QCNN is more favorable than CNN under the similar training conditions for both benchmarking datasets.

The remainder of the paper is organized as follows. Section II sets the theoretical framework of this work by describing the classification problem, the QCNN algorithm, and various methods for encoding classical data as a quantum state. Section III describes variables of the QCNN model, such as parameterized quantum circuits, cost functions, and classical data pre-processing methods, that are subject to our benchmarking study. Section IV compares and presents the performance of various designs of QCNN for binary classification of MNIST and Fashion MNIST datasets. Conclusions are drawn and directions for future work are suggested in Section V.

II Theoretical framework

II.1 Classification

Classification is an example of pattern recognition, which is a fundamental problem in data science that can be effectively addressed via machine learning. The goal of $L$ -class classification is to infer the class label of an unseen data point $\tilde{x}\in\mathbbm{C}^{N}$ , given a labelled data set

\mathcal{D}=\left\{(x_{1},y_{1}),\ldots,(x_{M},y_{M})\right\}\subset\mathbbm{C}^{N}\times\{0,1,\ldots,L-1\}.

The classification problem can be solved by training a parameterized quantum circuit. Hereinafter, we refer to fully parameterized quantum circuits trained for machine learning tasks as Quantum Neural Network (QNN). For this supervised classification task, a QNN is trained by optimizing the parameters of quantum gates so as to minimize the cost function

C(\boldsymbol{\theta})=\sum_{i=1}^{M}\alpha_{i}c(y_{i},f(x_{i},\boldsymbol{\theta}))

where $f(x_{i},\boldsymbol{\theta})$ is the machine learning model defined by the set of parameters $\boldsymbol{\theta}$ that predicts the label of $x_{i}$ , $c(a,b)$ quantifies the dissimilarity between $a$ and $b$ , and $\alpha_{i}$ is a weight that satisfies $\sum_{i=1}^{M}\alpha_{i}=1$ . After the training is finished, the class label for the unseen data point $\tilde{x}$ is determined as $\tilde{y}=f(\tilde{x},\boldsymbol{\theta}^{*}),$ where $\boldsymbol{\theta}^{*}=\arg\min_{\boldsymbol{\theta}}C(\boldsymbol{\theta})$ . If the problem is restricted to binary classification (i.e. $L=2$ ), the class label can be inferred from a single-qubit von Neumann measurement. For example, the sign of an expectation value of $\sigma_{z}$ observable can represent the binary label [27]. Hereinafter, we focus on the binary classification, albeit potential future work towards multi-class classification will be discussed in Sec. V.

II.2 Quantum Convolutional Neural Network

An interesting family of quantum neural networks utilizes tree-like (or hierarchical) structures [28] with which the number of qubits from a preceding layer is reduced by a factor of two for the subsequent layer. Such architectures consist of $O(\log(n))$ layers for $n$ input qubits, thereby permitting shallow circuit depth. Moreover, they can avoid one of the most critical problems in the PQC based algorithms known as “barren plateau”, thereby guaranteeing the trainability [29]. These structures also make a natural connection to the tensor network, which serves as a useful ground for exploring many-body physics, neural networks, and the interplay between them.

The progressive reduction of the number of qubits is analogous to the pooling operation in CNN. A distinct feature of the QCNN architecture is the translational invariance, which forces the blocks of parameterized quantum gates to be identical within a layer. The quantum state resulting from an $i$ th layer of QCNN can be expressed as

\outerproduct{\psi_{i}(\boldsymbol{\theta}_{i})}{\psi_{i}(\boldsymbol{\theta}_{i})}=\Tr_{B_{i}}(U_{i}(\boldsymbol{\theta}_{i})\outerproduct{\psi_{i-1}}{\psi_{i-1}}U_{i}(\boldsymbol{\theta}_{i})^{\dagger}),

(1)

where $\Tr_{B_{i}}(\cdot)$ is the partial trace operation over subsystem $B_{i}\in\mathbb{C}^{2^{n/2^{i}}}$ , $U_{i}$ is the parameterized unitary gate operation that includes quantum convolution and the gate part of pooling, and $|\psi_{0}\rangle=|0\rangle^{\otimes n}$ . Following the existing nomenclature, we refer to the structure (or template) of the parameterized quantum circuit as ansatz. In our QCNN architecture, $U_{i}$ always consists of two-qubit quantum circuit blocks, and the convolution and pooling part each uses the identical quantum circuit blocks within the given layer. Since a two-qubit gate requires 15 parameters at most [30], in $i$ th layer consisting of $l_{i}>0$ independent convolutional filter and one pooling operation the maximum number of parameters subject to optimization is $15(l_{i}+1)$ . Then the total number of parameters is at most $15\sum_{i=1}^{\log_{2}(n)}(l_{i}+1)$ if the convolution and pooling operations are iterated until only one qubit remains. One can also consider an interesting hybrid architecture in which the QCNN layers are stacked until $m$ qubits are remaining and then a classical neural network takes over from the $m$ qubit measurement outcomes. In this case, the number of quantum circuit parameters is less than the maximum number given above. Usually, $l_{i}$ is set to be a constant. Therefore, the number of parameters subject to optimization grows as $O(\log(n))$ , which is an exponential reduction compared to the general hierarchical structure discussed in Ref. [28]. This also implies that the number of parameters can be suppressed double-exponentially with the size of classical data if the exponentially large state space is fully utilized for encoding the classical data. An example quantum circuit for a QCNN algorithm with eight qubits for binary classification is depicted in Fig. 1.

Refer to caption — Figure 1: A schematic of the QCNN algorithm for an example of 8 input qubits. The quantum circuit consists of three parts: quantum data encoding (green rectangle), convolutional filters (blue rounded rectangle), and pooling (red circle). The quantum data encoding is fixed in a given structure of QCNN, while the convolutional filter and pooling use parameterized quantum gates. There are three layers in this example, and in each layer, multiple convolutional filters can be applied. The number of filters for $i$ th layer is denoted by $l_{i}$ . In each layer, the convolutional filter applies the same two-qubit ansatz to nearest neighbour qubits in a translationally-invariant way. Similarly, pooling operations within the layer use the same ansatz. In this example, the pooling operation is represented as a controlled unitary transformation, which is activated when the control qubit is $1$ . However, general controlled operations can also be considered. The measurement outcome of the quantum circuit is used to calculate the user-defined cost function. The classical computer is used to compute the new set of parameters based on the gradient, and the quantum circuit parameters are updated accordingly for the subsequent round.

Generalizing Fig. 1 to larger systems can be done simply by connecting all neighboring qubits with the two-qubit parameterized gates in the translationally invariant way.

The optimization of the gate parameters can be carried out by iteratively updating the parameters based on the gradient of the cost function until some condition for the termination is reached. The cost function gradient can be calculated classically or by using a quantum computer via parameter-shift rule [31, 32, 33]. When the parameter-shift rule is used, QCNN requires an exponentially smaller number of quantum circuit executions compared to the general hierarchical structures inspired by tensor networks (e.g. tree tensor network) in Ref. [28]. While the latter uses $O(n)$ runs, the former only uses $O(\log(n))$ runs.

II.3 Quantum data encoding

Many machine learning techniques transform input data $\mathcal{X}$ into a different space to make it easier to work with. This transformation $\phi:\mathcal{X}\rightarrow\mathcal{X}^{\prime}$ is often called a feature map. In quantum computing, the same analogy can be applied to perform a quantum feature map, which acts as $\phi:\mathcal{X}\rightarrow\mathcal{H}$ where the vector space $\mathcal{H}$ is a Hilbert space [34]. In fact, such feature mapping is mandatory when one uses quantum machine learning on classical data, since classical data must be encoded as a quantum state [11, 9, 35, 36, 37, 38]. The quantum feature map $x\in\mathcal{X}\rightarrow\ket{\phi(x)}\in\mathcal{H}$ is equivalent to applying a unitary transformation $U_{\phi}(x)$ to the initial state $\ket{0}^{\otimes{n}}$ to produce $U_{\phi}(x)\ket{0}^{\otimes{n}}=\ket{\phi(x)}$ , where $n$ is the number of qubits. This refers to the green rectangle in Fig. 1.

There exist numerous structures of $U_{\phi}(x)$ to encode the classical input data $x$ into a quantum state. In this work, we benchmark the performance of the QCNN algorithm with several different quantum data encoding techniques. These techniques are explained in detail in this section.

II.3.1 Amplitude encoding

One of the most general approaches to encode classical data as a quantum state is to associate normalized input data with probability amplitudes of a quantum state. This encoding scheme is known as the amplitude encoding (AE). The amplitude encoding represents input data of $x=(x_{1},...,x_{N})^{T}$ of dimension $N=2^{n}$ as amplitudes of an $n$ -qubit quantum state $\ket{\phi(x)}$ as

U_{\phi}(x):x\in\mathbb{R}^{N}\rightarrow\ket{\phi(x)}=\frac{1}{\|x\|}\sum_{i=1}^{N}x_{i}\ket{i},

(2)

where $\ket{i}$ is the $i$ th computational basis state. Clearly, with amplitude encoding, a quantum computer can represent exponentially many classical data. This can be of great advantage in QCNN algorithms. Since the number of parameters subject to optimization scales as $O(\log(n))$ (see Sec. II.2), the amplitude encoding reduces the number of parameters doubly-exponentially with the size (i.e. dimension) of the classical data. However, the quantum circuit depth for amplitude encoding usually grows as $O(poly(N))$ , although there exists a method to reduce it to $O(\log(N))$ at the cost of increasing the number of qubits to $O(N)$ [38].

II.3.2 Qubit encoding

The computational overhead of amplitude encoding motivates qubit encoding, which uses a constant quantum circuit depth while using $O(N)$ number of qubits. The qubit encoding embeds one classical data point $x_{i}$ , that is rescaled to lie between $0$ and $\pi$ , into a single qubit as $\ket{\phi(x_{i})}=\cos(\frac{x_{i}}{2})\ket{0}+\sin(\frac{x_{i}}{2})\ket{1}$ for $i=1,...,N$ . Hence, the qubit encoding maps input data of $x=(x_{1},\ldots,x_{N})^{T}$ to $N$ qubits as

	$\displaystyle U_{\phi}(x):x\in\mathbb{R}^{N}\rightarrow$	$\displaystyle\ket{\phi(x)}$
		$\displaystyle=\bigotimes_{i=1}^{N}(\cos(\frac{x_{i}}{2})\ket{0}+\sin(\frac{x_{i}}{2})\ket{1}),$		(3)

where $x_{i}\in[0,\pi)$ for all $i$ . The encoding circuit can be expressed with a unitary operator $U_{\phi}(x)=\bigotimes_{j=1}^{N}U_{x_{j}}$ where

U_{x_{j}}=e^{-i\frac{x_{j}}{2}\sigma_{y}}:=\begin{bmatrix}\cos(\frac{x_{j}}{2})&-\sin(\frac{x_{j}}{2})\\ \sin(\frac{x_{j}}{2})&\cos(\frac{x_{j}}{2})\end{bmatrix}.

II.3.3 Dense qubit encoding

In principle, since a quantum state of one qubit can be described with two real-valued parameters, two classical data points can be encoded in one qubit. Thus the qubit encoding described above can be generalized to encode two classical vectors per qubit by using rotations around two orthogonal axes [39]. By choosing them to be the $x$ and $y$ axes of the Bloch sphere, this method, which we refer to as dense qubit encoding, encodes $x_{j}=(x_{j_{1}},x_{j_{2}})$ into a single qubit as

\ket{\phi(x_{j})}=e^{-i\frac{x_{j_{2}}}{2}\sigma_{y}}e^{-i\frac{x_{j_{1}}}{2}\sigma_{x}}\ket{0}.

Hence, the dense qubit encoding maps an $N$ -dimensional input data $x=(x_{1},\ldots,x_{N})^{T}$ to $N/2$ qubits as

	$\displaystyle U_{\phi}(x):x\in\mathbb{R}^{N}\rightarrow$	$\displaystyle\ket{\phi(x)}$
		$\displaystyle=\bigotimes_{j=1}^{N/2}\left(e^{-i\frac{x_{N/2+j}}{2}\sigma_{y}}e^{-i\frac{x_{j}}{2}\sigma_{x}}\ket{0}\right).$		(4)

Note that there is freedom to choose which pair of classical data to be encoded in one qubit. In this work, we chose the pairing as shown in Eq. (II.3.3), but one may choose to encode $x_{2j-1}$ and $x_{2j}$ in $j$ th qubit.

II.3.4 Hybrid Encoding

As shown in previous sections, the amplitude encoding is advantageous when the quantum circuit width (i.e. the number of qubits) is considered while the qubit encoding is advantageous when the quantum circuit depth is considered. These two encoding schemes represent the extreme ends of the quantum circuit complexities for loading classical data into a quantum system. In this section, we introduce simple hybrid encoding methods to compromise the quantum circuit complexity between these two extreme ends. In essence, the hybrid encoding implements the amplitude encoding to a number of independent blocks of qubits in parallel. Let us denote the number of qubits in each independent block that amplitude-encodes classical data by $m$ . Then each block can encode $O(2^{m})$ classical data. Let us also denote that there are $b$ such blocks of $m$ qubits by $b$ . Then the quantum system of $b$ blocks contain $b2^{m}$ classical data. The first hybrid encoding, which we refer to as hybrid direct encoding (HDE), can be expressed as

U_{\phi}(x):x\in\mathbb{R}^{N}\rightarrow\ket{\phi(x)}=\bigotimes_{j=1}^{b}\left(\frac{1}{\|x\|_{j}}\sum_{i=1}^{2^{m}}x_{ij}\ket{i}_{j}\right).

(5)

Note that each block can have a different normalization constant, and hence the amplitudes may not be a faithful representation of the data unless the normalization constant have similar values. To circumvent this problem, we also introduce hybrid angle encoding (HAE), which can be expressed as

		$\displaystyle\ket{\phi(x)}$
		$\displaystyle=\bigotimes_{k=1}^{b}\left(\sum_{i=1}^{2^{m}}\prod_{j=0}^{m-1}\cos^{1-\mathtt{i}_{j}}\left(x_{g(j),k}\right)\sin^{\mathtt{i}_{j}}\left(x_{g(j),k}\right)\ket{i}_{k}\right),$		(6)

where $\mathtt{i}\in\{0,1\}^{m}$ is the binary representation of $i$ with $\mathtt{i}_{j}$ being the $j+1$ th bit of the bit string, $x_{j,k}$ represents the $j$ th element of the data assigned to the $k$ th block of qubits, and $g(j)=2^{j}+\sum_{l=0}^{j-1}\mathtt{i}_{l}2^{l}$ . In this case, having $b$ block of $m$ qubits allows $b(2^{m}-1)$ classical data to be encoded. The performance of these hybrid encoding methods will be compared in Sec. IV.

Since the hybrid methods are parallelized, the quantum circuit depth is reduced to $O(2^{m})$ where $m<N$ , while the number of qubits is $O(mN/2^{m})$ . Therefore the hybrid encoding algorithms use fewer number of qubits than the qubit encoding and use shallower quantum circuit depth than the amplitude encoding. Finding the best trade-off between the quantum circuit width and depth (i.e. the choice of $m$ ) depends on the specific details of given quantum hardware.

III Benchmark variables

III.1 Ansatz

An important step in the construction of a QCNN model is the choice of ansatz. In general, the QCNN structure is flexible to use an arbitrary two-qubit unitary operation at each convolutional filter and each pooling step. However, we constrain our design such that all convolutional filters use the same ansatz, and the same applies to all pooling operations (but differ from convolutional filters). We later show that the QCNN with fixed ansatz provides excellent results for the benchmarking datasets. While using different ansatz for all filters can be an interesting attempt for further improvements, this will increase the number of parameters to be optimized.

In the following, we introduce a set of convolutional and pooling ansatz (i.e. parameterized quantum circuit templates) used in our QCNN models.

III.1.1 Convolution filter

parameterized quantum circuits for convolutional layers in QCNN are composed of different configurations of single-qubit and two-qubit gate operations. Most circuit diagrams in Fig. 2 are inspired by past studies. For instance, circuit 1 is used as the parameterized quantum circuit for training a tree tensor network (TTN) [28]. Circuits 2, 3, 4, 5, 7, and 8 are taken from the work by Sim et al. [40] which includes the analysis on expressibility and entangling capability of four-qubit parameterized quantum circuits. We modified these quantum circuits to two-qubit forms to utilize them as building blocks of the convolutional layer, which always consists of two qubits. Circuits 7 and 8 are reduced versions of circuits that recorded the best expressibility in the study. Circuit 2 is a two-qubit version of the quantum circuit that exhibited the best entangling capability. Circuits 3, 4 and 5 are drawn from circuits that have balanced significance in both expressibility and entangling capability. Circuit 6 is developed as a proper candidate of two-body Variational Quantum Eigensolver (VQE) entangler in Ref. [41]. This circuit is also known to be able to implement an arbitrary $SO(4)$ gate [42]. In fact, a total VQE entangler can be constructed by linearly arranging the $SO(4)$ gates throughout input qubits. Since this structure is similar to the structure of convolutional layers in QCNN, the $SO(4)$ gate would be a great candidate to be used in the convolution layer. Circuit 9 represents the parameterization of an arbitrary $SU(4)$ gate [30, 20].

III.1.2 Pooling

The pooling layer applies parameterized quantum gates to two qubits and traces out one of the qubits to reduce the two-qubit states to one-qubit states. Similar to the choice of ansatz for the convolutional filter, there exists a variety of choices of two-qubit circuits of the pooling layer. In this work, we choose a simple form of a two-qubit circuit consisting of two free parameters for the pooling layer. The circuit is shown in Fig. 3.

Application of the parameterized gates in the pooling step in conjunction with the convolutional circuit 9 might be redundant since it is already an arbitrary $SU(4)$ gate. Thus for the convolutional circuit 9, we test two QCNN constructions, with and without the parameterized two-qubit circuit in the pooling layer. In the latter, the pooling layer only consists of tracing out one qubit.

III.2 Cost function

The variational parameters of the ansatz are updated to minimize the cost function calculated on the training data set. In this benchmark study, we test the performance of QCNN models with two different cost functions, namely the mean squared error and the cross-entropy loss.

III.2.1 Mean Squared Error

Before training QCNN, we map original class labels of $\{0,1\}$ to $\{1,-1\}$ respectively to associate them with the eigenvalues of the qubit observables. Then the mean squared error (MSE) between predictions and class labels becomes

C(\boldsymbol{\theta})=\frac{1}{N}\sum_{i=1}^{N}(\hat{M_{z}}(\psi_{i}(\boldsymbol{\theta}))-\tilde{y}_{i})^{2},

(7)

where $\hat{M_{z}}(\psi_{i})=\langle\psi_{i}|\sigma_{z}|\psi_{i}\rangle$ is the Pauli-Z expectation value of one qubit state extracted from QCNN for $i$ th training data, and $\tilde{y}_{i}$ $\in\{1,-1\}$ is the label of the corresponding training data (i.e. $\tilde{y}_{i}=1-2y_{i}$ ). Since QCNN performs a single qubit measurement in the $Z$ basis, the final state can be thought of as a mixed state $a_{i}\outerproduct{0}{0}+b_{i}\outerproduct{1}{1}$ . Then minimizing the cost function above with respect to $\boldsymbol{\theta}$ would correspond to forcing $a_{i}$ to be as larger as possible than $b_{i}$ if the $i$ th training data is labelled $0$ , and vice versa if it is labelled $1$ .

III.2.2 Cross-Entropy Loss

Cross-entropy loss is widely used in training classical neural networks. It measures the performance of a classification model whose output is a probability between 0 and 1. Due to the probabilistic property of quantum mechanics, one could consider the cross-entropy loss by considering probabilities of measuring computational basis states in the single-qubit measurement of QCNN. The cross-entropy loss for the $i$ th training data can be expressed as

	$\displaystyle C(\boldsymbol{\theta})=\sum_{i=1}^{N}\Big{[}$	$\displaystyle y_{i}\log\left(\Pr[\psi_{i}(\boldsymbol{\theta})=1]\right)$
		$\displaystyle+(1-y_{i})\log\left(\Pr[\psi_{i}(\boldsymbol{\theta})=0]\right)\Big{]},$		(8)

where $y_{i}\in\{0,1\}$ is the class label and $\Pr[\psi_{i}(\boldsymbol{\theta})=y_{i}]$ is the probability of measuring the computational basis state $\ket{y_{i}}$ from the QCNN circuit.

III.3 Classical data pre-processing

The size of quantum circuits that can be reliably executed on NISQ devices is limited due to the noise and technical challenges of building quantum hardware. Thus the encoding schemes for high dimensional data usually require the number of qubits that are beyond the current capabilities of quantum devices. Therefore, classical dimensionality reduction techniques will be useful in the near-term application of quantum machine learning techniques. In this work, we pre-process data with three classical dimensionality reduction techniques, namely bilinear interpolation, principal component analysis (PCA) [43] and autoencoding (AutoEnc) [44]. For the simulation presented in the following section, amplitude encoding is used only with bilinear interpolation while all other encoding schemes are tested with PCA and autoencoding. Bilinear interpolation and PCA are carried out by utilizing tf.image.resize from TensorFlow and sklearn.decomposition.PCA from scikit-learn, respectively. Autoencoders are capable of modelling complex non-linear functions, while PCA is a simple linear transformation with cheaper and faster computation. Since the pre-processing step should not produce too much computational resource overhead or result in overfitting, we train a simple autoencoder with one hidden layer. The data in the latent space (i.e. hidden layer) is then fed to quantum circuits.

IV Simulation

IV.1 QCNN results overview

This section reports the classical simulation results of the QCNN algorithm for binary classification carried out with Pennylane [26]. The test is performed with two standard datasets, namely MNIST and Fashion MNIST, under various conditions as described in the previous section. Note that the MNIST and Fashion MNIST datasets are 28 $\times$ 28 image data, each with ten classes. Our benchmark focuses on binary classification, and hence we select classes 0 and 1 for both datasets.

The variational parameters in the QCNN ansatze are optimized by minimizing the cost function with an optimizer provided in Pennylane [26]. In particular, we tested Adam [45] and Nesterov moment [46] optimization algorithms. At each iteration $t$ , we create a small batch by randomly selecting data from the training set. Compared to training on the full data set, training on the mini-batch not only reduces simulation time but also helps the gradients to escape from local minima. For both Adam and Nesterov moment optimizers, the batch size was 25 and the learning rate was 0.01. We also fixed the number of iterations to be 200 to speed up the training process. Note that the training can alternatively be stopped at different conditions, such as when the validation set accuracy does not increase for a predetermined number of consecutive runs [28]. The number of training (test) data are 12665 (2115) and 12000 (2000) for MNIST and fashion MNIST datasets, respectively.

Table 1 and 2 show the mean classification accuracy and one standard deviation obtained from five instances with random initialization of parameters. The number of random initialization is chosen to be the same as that of Ref. [28]. The results are obtained for various QCNN models of different convolutional and pooling circuits and data encoding strategies. When benchmarking with the hybrid encoding schemes (i.e. HDE and HAE), we used two blocks of four qubits, which results in having 32 and 30 features encoded in 8 qubits, respectively. For all results presented here, training is done with the cross-entropy loss. Similar results are obtained when MSE is used for training, and we present the MSE results in Appendix C. Here we only report the classification results obtained with the Nesterov optimizer, since it consistently provided better convergence. The ansatze in the table are listed in the same order as the list of convolutional circuits shown in Fig. 2. The last row of the table (i.e. Ansatz 9b) contains the results when the QCNN circuit only consists of the convolutional circuit 9 without any unitary gates in the pooling step.

Classification Accuracy

Ansatz

# of params

Amplitude

Qubit

Dense

HDE

HAE

96.8\pm 5.3

98.0\pm 0.4

91.4\pm 2.3

97.6\pm 1.1

88.4\pm 9.2

68.7\pm 5.1

88.4\pm 2.6

97.9\pm 0.3

77.7\pm 6.0

94.5\pm 3.1

98.2\pm 4.5

98.2\pm 6.6

98.2\pm 0.5

85.6\pm 4.5

62.2\pm 3.2

93.4\pm 5.4

94.7\pm 2.1

80.0\pm 4.0

93.8\pm 4.4

\mathbb{98.5}\pm 0.2

93.3\pm 3.8

96.9\pm 1.6

95.8\pm 1.7

76.4\pm 2.7

95.3\pm 3.4

98.1\pm 0.2

84.4\pm 0.7

97.8\pm 2.4

98.2\pm 0.4

\mathbb{98.5}\pm 1.2

98.2\pm 0.4

\mathbb{97.2}\pm 1.1

70.2\pm 1.3

96.6\pm 1.0

98.0\pm 0.3

\mathbb{90.4}\pm 3.8

96.7\pm 2.1

98.3\pm 0.4

94.9\pm 2.1

98.1\pm 0.5

96.0\pm 1.3

72.6\pm 5.7

93.5\pm 1.3

98.0\pm 0.1

86.5\pm 5.0

97.2\pm 2.2

98.1\pm 0.4

97.7\pm 1.0

98.1\pm 0.3

93.4\pm 0.5

77.4\pm 1.7

97.0\pm 2.0

\mathbb{98.3}\pm 0.2

86.9\pm 7.3

98.3\pm 2.2

98.2\pm 0.3

93.7\pm 4.5

\mathbb{98.7}\pm 2.4

95.1\pm 1.6

74.6\pm 3.2

97.2\pm 2.2

98.2\pm 0.1

90.2\pm 3.0

98.1\pm 0.7

98.3\pm 0.4

{96.9}\pm 2.4

\mathbb{98.7}\pm 0.1

95.4\pm 2.8

\mathbb{79.7}\pm 1.6

96.6\pm 1.7

\mathbb{98.3}\pm 0.1

89.1\pm 2.6

\mathbb{98.4}\pm 0.2

98.4\pm 0.5

96.4\pm 2.3

\mathbb{98.7}\pm 0.4

{96.7}\pm 1.4

78.1\pm 2.8

97.8\pm 2.2

{98.2}\pm 0.2

87.0\pm 5.3

98.3\pm 0.2

{97.7}\pm 0.6

{96.6}\pm 2.2

98.3\pm 0.5

96.5\pm 1.7

77.4\pm 2.5

\textbf{98.0}\pm 1.2

98.1\pm 0.1

88.5\pm 2.8

Table 1: Mean accuracy and one standard deviation of the classification for 0 and 1 in the MNIST dataset when the model is trained with the cross-entropy loss. The mean and the standard deviation are obtained from five repetitions with random initialization of parameters. The first column shows the ansatz label. The second column shows the total number of parameters that are subject to optimization. For qubit, dense and hybrid encoding, two rows indicate the values obtained with different classical data pre-processing, namely PCA and autoencoding, respectively. The best result under each quantum data encoding method is written in bold.

Classification Accuracy

Ansatz

# of params

Amplitude

Qubit

Dense

HDE

HAE

90.9\pm 2.0

83.1\pm 3.3

87.3\pm 4.2

85.0\pm 3.6

84.7\pm 7.2

64.4\pm 3.2

90.5\pm 1.3

83.9\pm 1.7

82.7\pm 4.6

88.2\pm 3.8

87.2\pm 4.6

86.6\pm 4.6

82.2\pm 2.1

86.9\pm 7.7

63.1\pm 1.4

86.0\pm 4.7

84.3\pm 4.1

82.1\pm 5.0

90.1\pm 2.7

{87.7}\pm 3.6

88.4\pm 7.4

87.2\pm 3.0

88.8\pm 2.2

65.5\pm 1.3

91.7\pm 1.5

85.5\pm 1.2

88.1\pm 4.0

89.1\pm 2.2

87.2\pm 2.4

91.6\pm 3.4

89.7\pm 1.6

{91.8}\pm 1.7

64.7\pm 1.9

{91.2}\pm 0.5

84.7\pm 1.1

{88.4}\pm 4.4

90.7\pm 1.1

86.3\pm 2.9

91.9\pm 1.4

87.6\pm 2.3

\textbf{93.7}\pm 1.4

64.1\pm 2.4

92.4\pm 1.3

84.5\pm 1.7

85.9\pm 3.9

90.4\pm 1.7

87.9\pm 3.9

93.6\pm 1.6

88.7\pm 2.4

90.2\pm 1.9

65.7\pm 0.9

94.3\pm 1.2

87.7\pm 2.6

88.5\pm 2.6

88.2\pm 1.4

87.1\pm 3.8

92.2\pm 0.5

{88.7}\pm 2.9

92.7\pm 2.2

66.3\pm 0.9

93.8\pm 0.8

86.7\pm 1.8

89.7\pm 2.9

89.9\pm 1.9

\mathbb{89.8}\pm 0.9

{91.6}\pm 3.0

88.7\pm 2.6

92.2\pm 2.6

\mathbb{66.5}\pm 0.3

93.2\pm 1.5

86.2\pm 1.5

\textbf{91.6}\pm 3.6

\textbf{91.3}\pm 2.3

88.9\pm 2.7

92.4\pm 2.6

{89.2}\pm 2.0

{92.1}\pm 1.3

65.0\pm 1.3

\textbf{94.3}\pm 1.6

88.6\pm 2.0

88.3\pm 3.7

{88.8}\pm 1.6

{88.0}\pm 2.5

\textbf{94.1}\pm 1.1

\mathbb{90.2}\pm 0.5

92.7\pm 1.5

64.7\pm 2.8

92.9\pm 1.3

\mathbb{88.9}\pm 1.6

87.1\pm 3.6

Table 2: Mean accuracy and one standard deviation of the classification for 0 (t-shirt/top) and 1 (trouser) in the Fashion MNIST dataset when the model is trained with the cross-entropy loss. The mean and the standard deviation are obtained from five repetitions with random initialization of parameters. The first column shows the ansatz label. The second column shows the total number of parameters that are subject to optimization. For qubit, dense and hybrid encoding, two rows indicate the values obtained with different classical data pre-processing, namely PCA and autoencoding, respectively. The best result under each quantum data encoding method is written in bold.

The simulation results show that all ansatze perform reasonably well, while the ones with more number of free parameters tend to produce higher score. Since all ansatze perform reasonably well, one may choose to use the ansatz with smaller number of free parameters to save the training time. For example, by choosing ansatz 4 and amplitude encoding, one can achieve $97.8\%$ classification accuracy while using only 24 free parameters total instead of achieving $98.4\%$ at the cost of increasing the number of free parameters to 51. It is also important to note that most of the results obtained with the hybrid data encoding and PCA is considerably worse than the others. This is due to the normalization problem discussed in Sec. II.3.4, which motivated the development of the hybrid angle encoding. We observed that the normalization problem is negligible in the case with autoencoding. The simulation results clearly demonstrates that the normalization problem is resolved by the hybrid angle encoding as expected, and reasonably good results can be obtained with this method. For MNIST, HAE with PCA provides the best solution among the hybrid encoding schemes on average. On the other hand, for Fashion MNIST, HDE with autoencoding provides the best solution among the hybrid encoding schemes on average.

In the following, we also compare the two classical dimensionality reduction methods by presenting the overall mean accuracy and standard deviation values obtained by averaging over all ansatze and the random initializations. The average values are presented in Tab. 3. As discussed before, the HDE with PCA does not perform well for both datasets due to the data normalization issue. Besides this case, interestingly, PCA works better than autoencoding for MNIST data, and vice versa for Fashion MNIST data, thereby suggesting that the choice of the classical pre-processing method should be data-dependent.

		Qubit	Dense	HDE	HAE
MNIST	PCA	$98.0\pm 1.5$	$98.0\pm 1.0$	$73.7\pm 3.0$	$97.8\pm 0.4$
MNIST	AutoEnc	$95.4\pm 3.6$	$94.2\pm 4.6$	$95.4\pm 2.3$	$86.1\pm 4.1$
Fashion MNIST	PCA	$87.3\pm 3.7$	$87.7\pm 3.2$	$65.0\pm 1.6$	$86.1\pm 1.9$
Fashion MNIST	AutoEnc	$91.0\pm 4.1$	$90.6\pm 4.4$	$92.0\pm 1.5$	$87.2\pm 3.8$

Table 3: Comparison of the classical dimensionality reduction methods for angle encoding, dense encoding, and hybrid encoding. For each encoding scheme, classification results from all instances (i.e. various ansatze and random initialization of parameters) are averaged out to produce the mean and standard deviation.

Finally, we also examine how classification performance improves as the number of convolutional filters in each layer increases. For simplicity, we set the number of convolutional filters in each layer to be same, i.e $l_{1}=l_{2}=l_{3}=L$ (see Fig. 1 for the definition of $l_{i}$ ). Without loss of generality, we pick two ansatze and five encodings. For ansatze, we choose the one with the smallest number of free parameters and another with arbitrary $SU(4)$ operations. These are circuit 2 and circuit 9b, and they use 12 and 45 parameters total, respectively. For data encoding, we tested amplitude, qubit, and dense encoding. The qubit and dense encoding are further grouped under two different classical dimensionality reduction techniques, PCA and autoencoding. Since the qubit and dense encoding load 8 and 16 features, respectively, we label them as PCA8, AutoEnc8, PCA16, and AutoEnc16 based on the number of features and the dimensionality reduction techniques. The classification accuracies for $L=\{1,2,3\}$ are plotted in Fig. 4. The simulation results show that in some cases, the classification accuracy can be improved by increasing the number of convolutional filters. For example, the classification accuracy for MNIST data can be improved from about $86\%$ to $96\%$ when circuit 2 and dense encoding with autoencoding are used. For Fashion MNIST, the classification accuracy is improved from about $88\%$ to $90\%$ when circuit 2 and amplitude encoding are used, and from about $86\%$ to $90\%$ when circuit 2 and qubit encoding with PCA is used. However, we do not observe general trend with respect to the number of convolutional filters. In particular, the relationship between the classification accuracy and $L$ is less obvious for circuit 9b. We speculate that this attributes to the fact that circuit 9b implements an arbitrary $SU(4)$ , which is an arbitrary two-qubit gate, and hence repetitive application of an arbitrary $SU(4)$ is redundant.

IV.2 Boundary conditions of the QCNN circuit

The general structure of QCNN shown in Fig. 1 uses two-qubit gates between the first (top) and last (bottom) qubits, which can be thought of as imposing periodic boundary condition. One may notice that all-to-all connectivity can be established even without connecting the boundaries. Thus we tested the classification performance of a QCNN architecture without having the two-qubit gates to close the loop. We refer to this case as the open boundary QCNN. Without loss of generality, we tested QCNNs with two different ansatz, the convolutional circuit 2 (Ansatz 2 in Tabs. 1 and 2) which uses the smallest number of free parameters and the convolutional circuit 9 which implements an arbitrary $SU(4)$ . In case of the latter, pooling was done without parameterized gates, and hence the ansatz is equivalent to ansatz 9b in Tabs. 1 and 2. By imposing the open-boundary condition in conjunction with the ansatz 9b, one can modify the qubit arrangement of the QCNN circuit so as to use nearest-neighbour qubit interactions only. For an example of 8-qubit QCNN circuit, the modified structure is depicted in Fig. 5. Such design is particularly advantageous for NISQ devices that have limited physical qubit connectivity. For example, if one employs the qubit or the dense encoding method, the QCNN algorithm can be implemented with a 1-dimensional chain of physical qubits.

	Ansatz	AE	PCA8	PCA16	AutoEnc8	AutoEnc16
MNIST	2	$90.4\pm{2.5}$	$98.0\pm{0.3}$	$96.3\pm{3.2}$	$97.7\pm{0.1}$	$86.4\pm{4.6}$
MNIST	9b	$98.0\pm{0.4}$	$98.1\pm{0.2}$	$97.8\pm{0.3}$	$96.8\pm{0.7}$	$94.4\pm{1.9}$
Fashion MNIST	2	$91.1\pm 1.4$	$86.2\pm 0.5$	$88.4\pm 3.0$	$83.3\pm 4.6$	$87.2\pm 5.7$
Fashion MNIST	9b	$90.1\pm 1.4$	$87.6\pm 2.4$	$89.1\pm 1.9$	$91.9\pm 1.4$	$92.6\pm 2.0$

Table 4: Mean classification accuracy and one standard deviation of the classification for 0 and 1 in the benchmarking datasets when the QCNN circuit is constructed under the open boundary condition. Each column represents the results produced under a different encoding scheme with the numbers 8 and 16 indicates the qubit and dense encoding, respectively.

The simulation results are presented in Tab. 4 for MNIST and Fashion MNIST datasets. These results are attained with one convolutional filters per layer, i.e. $l_{1}=l_{2}=l_{3}=1$ . The simulation results demonstrate that for the case of two ansatze tested the classification performance between open- and periodic-boundary QCNN circuits are similar. Although the number of free parameters are the same under these conditions, depending on the specification of the quantum hardware such as the qubit connectivity, the open-boundary QCNN circuit can have shallower depth. The open-boundary circuit with ansatz 9b is even more attractive for NISQ devices since the convolutional operations can be done with only nearest-neighbour qubit interactions as mentioned above.

IV.3 Comparison to CNN

We now compare the classification results of QCNN to that of classical CNN. Our goal is to compare the classification accuracy of the two given a similar number of parameters subject to optimization. To make a fair comparison between the two, we fix all hyperparameters of the two methods to be the same, except we used the Adam optimizer for CNN since it performed significantly better than the Nesterov moment optimizer. A detailed description of the classical CNN architecture is provided in Appendix B.

It is important to note that CNN can be trained with such small number of parameters effectively only when the number of nodes in the input layer is small. Therefore, the CNN results are only comparable to that of the qubit and dense encoding cases which requires 8 and 16 classical input nodes, respectively. We designed four different CNN models with the number of free parameters being 26, 34, 44 and 56 to make them comparable to the QCNN models. In these cases, a dimensionality reduction technique must precede. For hybrid and amplitude encoding, which require relatively simpler data pre-processing, the number of nodes in the CNN input layer is too large to be trained with a small number of parameters as in QCNN.

Comparing the values in Tab. 5 with the QCNN results, one can see that QCNN models perform better than their corresponding CNN models for the MNIST dataset. The same conclusion also holds for the Fashion dataset, except for the CNN models with 44 and 56 parameters that achieve similar performance as their corresponding QCNN models. Another noticeable result is that the QCNN models have considerably smaller standard deviations than the CNN models on average. This implies that the QCNN models not only achieve higher classification accuracy than the CNN models under similar training conditions but also are less sensitive to the random initialization of the free parameters.

			Classification Accuracy
	# of params	input size	PCA	AutoEnc
MNIST	26	8	$91.0\pm 12.7$	$82.7\pm 15.2$
	34	16	$97.0\pm 3.5$	$83.5\pm 15.5$
	44	8	$93.3\pm 13.2$	$90.4\pm 13.4$
	56	16	$93.0\pm 13.4$	$95.5\pm 2.3$
Fashion MNIST	26	8	$82.2\pm 16.6$	$86.8\pm 12.7$
	34	16	$78.8\pm 19.1$	$79.0\pm 19.0$
	44	8	$89.4\pm 3.9$	$92.4\pm 2.8$
	56	16	$91.9\pm 2.0$	$93.6\pm 2.2$

Table 5: Mean classification accuracy and one standard deviation obtained with classical CNN for classifying 0 and 1 in the MNIST and Fashion MNIST datasets. Each column is named with the pre-processing method (PCA or AutoEnc). These results directly compare to the second and third columns of Tabs. 1 and 2 denoted by Qubit and Dense.

In Fig. 6, we present two representative examples of the cross-entropy loss as a function of the number of training iterations. For simplicity, we show such data for two cases in MNIST data classification: circuit 9b and qubit encoding with autoencoding, and circuit 9b and dense encoding with PCA. Considering the number of free parameters, these cases are comparable to the CNN models with 8 inputs with autoencoding and 16 inputs with PCA, respectively. Recall that the mean classification accuracy and one standard deviation in QCNN (CNN) is $96.6\pm 2.2$ ( $90.4\pm 13.4$ ) for the first case, and $98.3\pm 0.5$ ( $93.0\pm 13.4$ ) for the second case. Figure 6 shows that in both cases, the QCNN models are trained faster than the CNN models, while the advantage manifests more clearly in the first case. Furthermore, the standard deviations in the QCNN models are significantly smaller than that of the CNN models.

V Conclusion

Fully parameterized quantum convolutional neural networks pave promising avenues for near-term applications of quantum machine learning and data science. This work presented an extensive benchmark of QCNN for solving classification problems on classical data, a fundamental task in pattern recognition. The QCNN algorithm can be tailored with many variables such as the structure of parameterized quantum circuits (i.e. ansatz) for convolutional filters and pooling operators, quantum data encoding methods, classical data pre-processing methods, cost functions and optimizers. To improve the utility of QCNN for classical data, we also introduced new data encoding schemes, namely hybrid direct encoding and hybrid angle encoding, with which the exchange between quantum circuit depth and width for state preparation can be configured. With diverse combinations of the aforementioned variables, we tested 8-qubit QCNN models for binary classification of MNIST and Fashion MNIST datasets by simulation with Pennylane. The QCNN models tested in this work operated with a small number of free parameters, ranging from 12 to 51. Despite the small number of free parameters, QCNN produced high classification accuracy for all instances, with the best case being close to $99\%$ for MNIST and $94\%$ for Fashion MNIST. We also compared QCNN results to CNN and observed that QCNN performed noticeably better than CNN given the similar training conditions for both benchmarking datasets. The comparison between QCNN and CNN is only valid for qubit and dense encoding cases in which the number of input qubits grows linearly with the dimension of the input data. With amplitude or hybrid encoding, the number of input qubits is substantially smaller than the dimension of the data, and hence there is no classical analogue. We speculate that the advantage of QCNN lies in the ability to exploit entanglement, which is a global effect, while CNN is only capable of capturing local correlations.

The QCNN architecture proposed in this work can be generalized for $L$ -class classification through one-vs-one or one-vs-all strategies. It also remains an interesting future work to examine the construction of a multi-class classifier by leaving $\lceil\log_{2}(L)\rceil$ qubits for measurement in the output layer. Another interesting future work is to optimize the data encoding via training methods provided in Ref. [47]. However, since QCNN itself can be viewed as a feature reduction technique, it is not clear whether introducing another layer of the variational quantum circuit for data encoding would help until a thorough investigation is carried out. Understanding the underlying principle for the quantum advantage demonstrated in this work also remains to be done. One way to study this is by testing QCNN models with a set of data that does not exhibit local correlation but contains some global feature while analyzing the amount of entanglement created in the QCNN circuit. Since the circuit depth grows only logarithmically with the number of input qubits and the gate parameters are learned, the QCNN model is expected to be suitable for NISQ devices. However, the verification through real-world experiments and noisy simulations remains to be done. Furthermore, testing the classification performance as the QCNN models grow bigger remains an interesting future work. Finally, the application of the proposed QCNN algorithms for other real-world datasets such as those relevant to high-energy physics and medical diagnosis is of significant importance.

Acknowledgements

This research is supported by the National Research Foundation of Korea (Grant No. 2019R1I1A1A01050161 and 2021M3H3A1038085) and Quantum Computing Development Program (Grant No. 2019M3E4A1080227). We thank Quantum Open Source Foundation as this work was initiated under the Quantum Computing Mentorship program.

Data availability

The source code used in this study is available at https://github.com/takh04/QCNN.

Conflict of interest

The authors declare that they have no conflict of interest.

Appendix A Related works

The term Quantum Convolutional Neural Network (QCNN) appears in several places, but it refers to a number of different frameworks. Several proposals have been made in the past to reproduce classical CNN on a quantum circuit by imitating the basic arithmetic of the convolutional layer for a given filter [15, 19, 21]. Although these algorithms have the potential to achieve exponential speedups against the classical counterpart in the asymptotic limit, they require an efficient means to implement quantum random access memory (QRAM), expensive subroutines such as the linear combination of unitaries or quantum phase estimation with extra qubits, and they work only for specific types of quantum data embedding. Another branch of CNN-inspired QML algorithms focuses on implementing the convolutional filter as a parameterized quantum circuit, which can be stacked by inserting a classical pooling layer in between [16, 17, 18]. Following the nomenclature provided in [17], we refer to this approach as quanvolutioanl neural network to distinguish it from QCNN. The potential quantum advantage of using quanvolutional layers lies in the fact that quantum computers can access kernel functions in high-dimensional Hilbert spaces much more efficiently than classical computers. In quanvolutional NN, a challenge is to find a good structure for the parametric quantum circuit in which the number of qubits equals the size of the filter. This approach is also limited to qubit encoding since each layer requires a quantum embedding which has a non-negligible cost. Furthermore, stacking quanvolutional layers via pooling requires each parameterized quantum circuit to be measured multiple times for the measurement statistics.

Variational quantum circuits with the hierarchical structure consisting of $O(\log(n))$ layers do not exhibit the problem of “barren plateau” [29]. In other words, the precision required in the measurement grows at most polynomially with the system size. This result guarantees the trainability of the fully parameterized QCNN models studied in this work when randomly initializing their parameters. Furthermore, numerical calculations in Ref. [29] show that the cost function gradient vanishes at a slower rate (with $n$ , the number of initial qubits) when all unitary operators in the same layer are identical as in QCNN [14]. The hierarchical structure inspired by tensor network, without translational invariance, was first introduced in Ref. [28]. The hierarchical quantum circuit can be combined with a classical neural network as demonstrated in Ref. [48].

We note in passing that there exist several works proposing the quantum version of perceptron for binary classification [49, 50, 51]. While our QCNN model differs from them as it implements the entire network as a parameterized quantum circuit, interesting future work is to investigate the alternative approach to construct a complex network of quantum artificial neurons developed in the previous works.

Appendix B Classical CNN

In order to compare the classification accuracy of CNN and QCNN in fair conditions, we fixed hyperparameters used in the optimization step to be the same, which include iteration numbers, batch size, optimizer type, and its learning rates. In addition, we modified the structure of CNN in ways that its number of parameters subject to optimization is as close to that used in QCNN as possible. For example, since QCNN attains the best results with about 40 to 50 free parameters, we adjust the CNN structure accordingly. This led us to come up with two CNN, one with the input shape of (8, 1, 1) and another with the input shape of (16, 1, 1). In order to occupy the small number of input nodes for MNIST and Fashion MNIST classification, PCA and autoencoding are used for data pre-processing as done in QCNN. The CNNs go through convolutional and pooling stages twice, followed by a fully connected layer. The number of free parameters used in the CNN models is 26 or 44 for the case of 8 input nodes and 34 or 56 for the case of 16 input nodes. The training also mimics that of QCNN. For every iteration step, 25 data are randomly selected from the training data set, and trained via Adam optimizer with the learning rate of 0.01. We also fixed the number of iterations to be 200 as done in QCNN. The number of training (test) data are 12665 (2115) and 12000 (2000) for MNIST and fashion MNIST datasets, respectively.

Appendix C QCNN simulation results for MSE loss

In Sec. IV of the main text, we presented the Pennylane simulation results of QCNN trained with the cross-entropy loss. When MSE is used as the cost function, similar results are obtained. We report classification results for MNIST and Fashion MNIST data attained from QCNN models that are trained with MSE in Tab. 6 and 7.

Classification Accuracy

Ansatz

# of params

Amplitude

Qubit

Dense

HDE

HAE

92.4\pm 3.1

96.1\pm 3.3

86.1\pm 8.8

91.4\pm 8.6

88.9\pm 7.1

60.3\pm 3.0

83.4\pm 4.0

97.8\pm 1.2

77.9\pm 6.1

88.2\pm 7.1

86.4\pm 7.1

84.2\pm 10.0

91.4\pm 0.3

88.9\pm 1.4

54.4\pm 4.6

87.5\pm 8.6

93.0\pm 2.7

78.7\pm 6.6

94.1\pm 1.7

98.0\pm 1.4

91.6\pm 8.9

98.3\pm 0.1

95.3\pm 3.2

69.8\pm 6.2

94.2\pm 3.8

98.3\pm 0.3

83.5\pm 3.9

90.1\pm 2.0

98.2\pm 0.1

88.8\pm 6.3

84.9\pm 2.5

85.8\pm 6.3

63.2\pm 3.7

94.8\pm 1.1

98.0\pm 0.2

85.5\pm 2.0

91.9\pm 1.7

98.1\pm 0.1

92.7\pm 3.5

98.3\pm 0.1

94.0\pm 2.0

65.7\pm 2.3

91.8\pm 3.5

98.0\pm 0.1

83.0\pm 4.5

96.2\pm 2.0

98.1\pm 0.1

94.4\pm 3.9

97.8\pm 0.2

92.5\pm 1.7

74.4\pm 3.9

94.8\pm 2.8

98.3\pm 0.2

\mathbb{86.6}\pm 4.2

93.2\pm 4.3

98.4\pm 0.1

95.2\pm 4.5

98.0\pm 0.3

92.4\pm 3.0

68.9\pm 3.8

94.7\pm 4.1

98.1\pm 0.2

80.2\pm 3.5

95.2\pm 1.4

\mathbb{98.5}\pm 0.1

\mathbb{97.0}\pm 2.8

98.3\pm 0.1

93.4\pm 2.0

68.5\pm 3.7

94.0\pm 2.8

98.2\pm 0.1

86.1\pm 4.5

97.0\pm 1.0

98.2\pm 1.0

96.2\pm 1.4

98.4\pm 0.1

\mathbb{96.6}\pm 1.7

\mathbb{77.4}\pm 1.8

\textbf{97.2}\pm 1.8

\textbf{98.5}\pm 0.2

85.0\pm 5.8

\mathbb{98.4}\pm 0.1

\mathbb{98.5}\pm 0.3

\mathbb{97.0}\pm 2.0

\mathbb{98.5}\pm 0.1

95.7\pm 1.7

75.9\pm 0.9

96.7\pm 1.7

98.3\pm 0.3

85.9\pm 3.8

Table 6: Mean accuracy and one standard deviation of the classification for 0 and 1 in the MNIST dataset when the QCNN model is trained with MSE. The mean and the standard deviation are obtained from five repetitions with random initialization of parameters. The first column shows the ansatz label. The second column shows the total number of parameters that are subject to optimization. For qubit, dense and hybrid encoding, two rows indicate the values obtained with different classical data pre-processing, namely PCA and autoencoding, respectively. The best result under each quantum data encoding method is written in bold.

Classification Accuracy

Ansatz

\#

of params

Amplitude

Qubit

Dense

HDE

HAE

88.1\pm 3.0

79.6\pm 10.9

80.0\pm 2.4

81.4\pm 5.5

77.8\pm 8.4

58.6\pm 2.4

89.7\pm 1.8

84.2\pm 1.3

83.1\pm 4.3

87.8\pm 3.0

80.0\pm 8.4

70.0\pm 10.0

78.0\pm 5.2

78.5\pm 8.3

54.1\pm 3.5

88.7\pm 2.8

78.8\pm 4.5

81.8\pm 5.1

87.0\pm 3.0

87.6\pm 3.6

85.1\pm 5.7

92.7\pm 2.4

84.7\pm 7.2

61.2\pm 2.9

90.1\pm 1.1

86.0\pm 2.6

86.2\pm 3.3

89.7\pm 1.3

85.4\pm 2.9

81.2\pm 2.2

90.7\pm 1.5

84.2\pm 5.6

62.5\pm 2.0

88.4\pm 4.9

84.0\pm 1.7

88.8\pm 3.6

90.7\pm 1.2

83.8\pm 4.4

81.1\pm 1.4

88.8\pm 3.0

86.2\pm 3.3

60.4\pm 0.9

90.7\pm 1.9

84.7\pm 3.5

84.9\pm 4.8

88.8\pm 3.0

86.6\pm 2.4

86.1\pm 3.0

89.4\pm 4.4

84.6\pm 5.3

63.9\pm 2.2

91.7\pm 1.8

84.4\pm 1.0

85.8\pm 4.0

90.0\pm 1.2

85.4\pm 3.2

86.4\pm 2.7

92.5\pm 3.6

87.7\pm 6.1

64.4\pm 1.8

91.9\pm 1.8

84.6\pm 0.8

89.0\pm 3.2

89.7\pm 2.8

82.3\pm 2.1

85.8\pm 2.2

90.0\pm 2.3

90.1\pm 3.6

64.9\pm 2.5

\mathbb{92.9}\pm 0.7

86.5\pm 1.9

86.3\pm 7.2

90.8\pm 1.2

\mathbb{89.9}\pm 1.9

\mathbb{92.7}\pm 0.4

{88.8}\pm 1.9

\mathbb{93.3}\pm 1.1

\mathbb{66.8}\pm 1.7

92.8\pm 1.3

88.4\pm 1.4

\textbf{90.8}\pm 2.1

\mathbb{91.0}\pm 1.1

{89.4}\pm 2.5

89.1\pm 2.0

\mathbb{93.0}\pm 1.1

{90.4}\pm 3.1

65.6\pm 2.6

\textbf{92.9}\pm 1.6

\mathbb{89.3}\pm 1.5

89.2\pm 4.6

Table 7: Mean accuracy and one standard deviation of the classification for 0 (t-shirt/top) and 1 (trouser) in the Fashion MNIST dataset when the QCNN model is trained with MSE. The mean and the standard deviation are obtained from five repetitions with random initialization of parameters. The first column shows the ansatz label. The second column shows the total number of parameters that are subject to optimization. For qubit, dense and hybrid encoding, two rows indicate the values obtained with different classical data pre-processing, namely PCA and autoencoding, respectively. The best result under each quantum data encoding method is written in bold.

Appendix D Classification with Hierarchical Quantum Classifier

The hierarchical structure inspired by tensor network named as hierarchical quantum classifier (HQC) was first introduced in Ref. [28]. The HQC therein does not enforce translational invariance, and hence the number of free parameters subject to optimization grows as $O(n)$ for a quantum circuit with $n$ input qubits. Although the simulation presented in the main manuscript aims to benchmark the classification performance of the QML model in which the number of parameters grows as $O(\log(n))$ , we also report the simulation results of HQC with the tensor tree network (TTN) structure [28] in this supplementary section for interested readers. The TTN classifier does not employ parameterized quantum gates for pooling. Thus for certain ansatz, the number of parameters differs from that of QCNN models. For example, although the convolutional circuit 2 in Fig. 2 has two free parameters, only one of them is effective since one of the qubits is traced out as soon as the parameterized gate is applied. For brevity, here we only report the results obtained with the cross-entropy loss but similar results can be obtained with MSE. As can be seen from Tab. 8 and Tab. 9, the number of effective parameters (i.e. the second column) grows faster that that of QCNN models. An interesting observation is that there is no clear trend as the number of parameters is increased beyond 42, which is close to the maximum number of parameters used in QCNN. In other words, there is no clear motivation to increase the number of free parameters beyond 42 or so when seeking to improve the classification performance. Studying overfitting under the growth of the number of parameters remains an interesting open problem.

Classification Accuracy

Ansatz

\#

of params

Amplitude

Qubit

Dense

HDE

HAE

96.5\pm 0.4

94.7\pm 2.2

91.8\pm 1.2

50.5\pm 3.5

82.1\pm 6.5

69.9\pm 1.4

94.2\pm 3.0

\mathbb{98.5}\pm 0.1

81.6\pm 2.3

55.6\pm 1.7

57.3\pm 3.2

64.5\pm 15.9

52.7\pm 1.2

74.0\pm 3.4

53.8\pm 0.2

76.4\pm 8.9

60.3\pm 0.2

56.1\pm 6.9

98.7\pm 0.2

98.2\pm 0.5

97.7\pm 1.2

97.8\pm 0.2

95.7\pm 1.8

81.2\pm 0.9

97.8\pm 1.4

98.3\pm 0.1

\mathbb{88.7}\pm 4.3

95.6\pm 2.2

87.4\pm 15.0

95.9\pm 1.1

84.2\pm 15.5

95.7\pm 1.4

77.0\pm 2.2

89.5\pm 8.0

90.0\pm 10.3

81.5\pm 3.0

97.8\pm 0.5

96.7\pm 2.0

96.0\pm 3.1

97.0\pm 1.1

96.5\pm 1.0

77.1\pm 0.8

94.1\pm 3.4

98.2\pm 0.2

88.1\pm 2.1

98.6\pm 0.2

98.0\pm 0.8

98.1\pm 1.5

97.8\pm 0.1

95.7\pm 3.1

\mathbb{81.6}\pm 0.4

98.2\pm 0.9

98.3\pm 0.3

90.2\pm 3.0

89.9\pm 13.0

88.7\pm 14.8

92.8\pm 1.5

63.0\pm 5.1

94.9\pm 1.6

72.4\pm 0.4

90.2\pm 7.1

93.0\pm 10.8

85.2\pm 1.9

98.6\pm 0.2

95.8\pm 2.5

\mathbb{98.3}\pm 1.0

97.5\pm 0.6

\mathbb{97.9}\pm 0.9

77.8\pm 1.8

96.7\pm 1.1

92.2\pm 12.2

85.7\pm 7.1

\mathbb{98.9}\pm 0.1

\mathbb{98.6}\pm 0.2

97.6\pm 0.9

\mathbb{98.6}\pm 0.2

96.9\pm 0.6

81.0\pm 5.4

\mathbb{98.6}\pm 0.2

98.3\pm 0.0

82.2\pm 5.5

Table 8: Mean accuracy and one standard deviation of the classification for 0 and 1 in the MNIST dataset when the HQC model is trained with cross-entropy loss. The mean and the standard deviation are obtained from five repetitions with random initialization of parameters. The first column shows the ansatz label. The second column shows the total number of parameters that are subject to optimization. For qubit, dense and hybrid encoding, two rows indicate the values obtained with different classical data pre-processing, namely PCA and autoencoding, respectively. The best result under each quantum data encoding method is written in bold.

Classification Accuracy

Ansatz

\#

of params

Amplitude

Qubit

Dense

HDE

HAE

90.1\pm 2.27

89.7\pm 2.3

89.0\pm 3.6

61.8\pm 12.8

76.2\pm 12.2

87.7\pm 1.9

85.6\pm 5.5

61.2\pm 1.1

83.4\pm 18.8

77.7\pm 5.7

66.5\pm 0.0

63.5\pm 9.4

52.6\pm 13.3

55.9\pm 6.8

60.3\pm 0.0

64.7\pm 10.1

49.5\pm 0.3

69.0\pm 13.4

89.5\pm 2.0

90.4\pm 2.6

93.3\pm 1.0

85.0\pm 2.3

93.7\pm 1.7

89.4\pm 1.7

91.0\pm 1.8

68.7\pm 3.1

\mathbb{93.8}\pm 1.7

91.4\pm 1.2

75.0\pm 22.7

92.2\pm 1.1

\mathbb{91.6}\pm 0.5

94.4\pm 0.4

88.0\pm 1.3

82.0\pm 5.0

63.8\pm 2.5

84.2\pm 19.1

90.1\pm 2.0

83.7\pm 18.4

93.0\pm 1.4

92.0\pm 0.9

91.0\pm 2.1

89.2\pm 2.5

86.2\pm 3.4

66.3\pm 1.5

75.9\pm 23.7

\mathbb{92.5}\pm 1.1

88.8\pm 1.8

\mathbb{94.2}\pm 1.1

86.6\pm 1.8

92.3\pm 1.5

\mathbb{89.8}\pm 1.9

\mathbb{91.4}\pm 3.4

68.9\pm 2.9

67.4\pm 23.8

90.9\pm 0.6

85.3\pm 14.0

91.6\pm 3.4

76.4\pm 2.8

92.3\pm 2.5

81.4\pm 6.9

81.4\pm 8.7

62.9\pm 1.8

67.1\pm 23.4

90.7\pm 2.4

\mathbb{90.8}\pm 0.8

94.1\pm 1.1

91.1\pm 0.4

92.8\pm 1.0

89.5\pm 1.7

88.7\pm 3.2

67.6\pm 1.9

84.9\pm 19.5

89.9\pm 1.9

92.0\pm 1.6

93.8\pm 0.7

91.3\pm 2.2

\mathbb{94.5}\pm 0.8

89.6\pm 1.0

91.0\pm 1.8

\mathbb{69.0}\pm 1.7

85.4\pm 19.8

Table 9: Mean accuracy and one standard deviation of the classification for 0 (t-shirt/top) and 1 (trouser) in the Fashion MNIST dataset when the HQC model is trained with cross-entropy loss. The mean and the standard deviation are obtained from five repetitions with random initialization of parameters. The first column shows the ansatz label. The second column shows the total number of parameters that are subject to optimization. For qubit, dense and hybrid encoding, two rows indicate the values obtained with different classical data pre-processing, namely PCA and autoencoding, respectively. The best result under each quantum data encoding method is written in bold.

References

[1] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[2] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, May 2015.
[3] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, NIPS’14, page 2672–2680, Cambridge, MA, USA, 2014. MIT Press.
[4] A. Aurisano, A. Radovic, D. Rocco, A. Himmel, M.D. Messier, E. Niner, G. Pawloski, F. Psihas, A. Sousa, and P. Vahle. A convolutional neural network neutrino event classifier. Journal of Instrumentation, 11(09):P09001–P09001, sep 2016.
[5] R. Acciarri et al. Convolutional neural networks applied to neutrino events in a liquid argon time projection chamber. Journal of Instrumentation, 12(03):P03011–P03011, mar 2017.
[6] Daniel George and E.A. Huerta. Deep learning for real-time gravitational wave detection and parameter estimation: Results with advanced ligo data. Physics Letters B, 778:64–70, 2018.
[7] Akinori Tanaka and Akio Tomiya. Detection of phase transition via convolutional neural networks. Journal of the Physical Society of Japan, 86(6):063001, 2017.
[8] Seth Lloyd, Masoud Mohseni, and Patrick Rebentrost. Quantum algorithms for supervised and unsupervised machine learning. arXiv preprint arXiv:1307.0411, 2013.
[9] Seth Lloyd, Masoud Mohseni, and Patrick Rebentrost. Quantum principal component analysis. Nature Physics, 10(9):631–633, 2014.
[10] Nathan Wiebe, Ashish Kapoor, and Krysta M. Svore. Quantum algorithms for nearest-neighbor methods for supervised and unsupervised learning. Quantum Info. Comput., 15(3–4):316–356, March 2015.
[11] Patrick Rebentrost, Masoud Mohseni, and Seth Lloyd. Quantum support vector machine for big data classification. Phys. Rev. Lett., 113:130503, Sep 2014.
[12] Iordanis Kerenidis, Jonas Landman, Alessandro Luongo, and Anupam Prakash. q-means: A quantum algorithm for unsupervised machine learning. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
[13] Carsten Blank, Daniel K Park, June-Koo Kevin Rhee, and Francesco Petruccione. Quantum classifier with tailored quantum kernel. npj Quantum Information, 6(1):1–7, 2020.
[14] Iris Cong, Soonwon Choi, and Mikhail D. Lukin. Quantum convolutional neural networks. Nature Physics, 15(12):1273–1278, December 2019.
[15] Iordanis Kerenidis, Jonas Landman, and Anupam Prakash. Quantum algorithms for deep convolutional neural networks. arXiv preprint arXiv:1911.01117, 2019.
[16] Junhua Liu, Kwan Hui Lim, Kristin L. Wood, Wei Huang, Chu Guo, and He-Liang Huang. Hybrid quantum-classical convolutional neural networks. 64(9):290311, 2021.
[17] Maxwell Henderson, Samriddhi Shakya, Shashindra Pradhan, and Tristan Cook. Quanvolutional neural networks: powering image recognition with quantum circuits. Quantum Machine Intelligence, 2(1):2, June 2020.
[18] Samuel Yen-Chi Chen, Tzu-Chieh Wei, Chao Zhang, Haiwang Yu, and Shinjae Yoo. Quantum convolutional neural networks for high energy physics data analysis. arXiv preprint arXiv:2012.12177, 2020.
[19] YaoChong Li, Ri-Gui Zhou, RuQing Xu, Jia Luo, and WenWen Hu. A quantum deep convolutional neural network for image recognition. Quantum Science and Technology, 5(4):044003, July 2020.
[20] Ian MacCormack, Conor Delaney, Alexey Galda, Nidhi Aggarwal, and Prineha Narang. Branching Quantum Convolutional Neural Networks. arXiv:2012.14439 [physics, physics:quant-ph], December 2020. arXiv: 2012.14439.
[21] ShiJie Wei, YanHu Chen, ZengRong Zhou, and GuiLu Long. A Quantum Convolutional Neutral Network on NISQ Devices. arXiv:2104.06918 [quant-ph], April 2021. arXiv: 2104.06918.
[22] S. Mangini, F. Tacchino, D. Gerace, D. Bajoni, and C. Macchiavello. Quantum computing models for artificial neural networks. EPL (Europhysics Letters), 134(1):10002, April 2021.
[23] M. Cerezo, Andrew Arrasmith, Ryan Babbush, Simon C. Benjamin, Suguru Endo, Keisuke Fujii, Jarrod R. McClean, Kosuke Mitarai, Xiao Yuan, Lukasz Cincio, and Patrick J. Coles. Variational quantum algorithms. Nature Reviews Physics, 3(9):625–644, 2021.
[24] John Preskill. Quantum Computing in the NISQ era and beyond. Quantum, 2:79, August 2018.
[25] Kishor Bharti, Alba Cervera-Lierta, Thi Ha Kyaw, Tobias Haug, Sumner Alperin-Lea, Abhinav Anand, Matthias Degroote, Hermanni Heimonen, Jakob S. Kottmann, Tim Menke, Wai-Keong Mok, Sukin Sim, Leong-Chuan Kwek, and Alán Aspuru-Guzik. Noisy intermediate-scale quantum (nisq) algorithms. arXiv preprint arXiv:2101.08448, 2021.
[26] Ville Bergholm, Josh Izaac, Maria Schuld, Christian Gogolin, M. Sohaib Alam, Shahnawaz Ahmed, Juan Miguel Arrazola, Carsten Blank, Alain Delgado, Soran Jahangiri, Keri McKiernan, Johannes Jakob Meyer, Zeyue Niu, Antal Száva, and Nathan Killoran. Pennylane: Automatic differentiation of hybrid quantum-classical computations. arXiv preprint arXiv:1811.04968, 2020.
[27] Daniel K. Park, Carsten Blank, and Francesco Petruccione. Robust quantum classifier with minimal overhead. In 2021 International Joint Conference on Neural Networks (IJCNN), pages 1–7, 2021.
[28] Edward Grant, Marcello Benedetti, Shuxiang Cao, Andrew Hallam, Joshua Lockhart, Vid Stojevic, Andrew G. Green, and Simone Severini. Hierarchical quantum classifiers. npj Quantum Information, 4(1):65, December 2018.
[29] Arthur Pesah, M. Cerezo, Samson Wang, Tyler Volkoff, Andrew T. Sornborger, and Patrick J. Coles. Absence of barren plateaus in quantum convolutional neural networks. Phys. Rev. X, 11:041011, Oct 2021.
[30] Farrokh Vatan and Colin Williams. Optimal quantum circuits for general two-qubit gates. Phys. Rev. A, 69:032315, Mar 2004.
[31] Jun Li, Xiaodong Yang, Xinhua Peng, and Chang-Pu Sun. Hybrid quantum-classical approach to quantum optimal control. Phys. Rev. Lett., 118:150503, Apr 2017.
[32] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii. Quantum circuit learning. Phys. Rev. A, 98:032309, Sep 2018.
[33] Maria Schuld, Ville Bergholm, Christian Gogolin, Josh Izaac, and Nathan Killoran. Evaluating analytic gradients on quantum hardware. Phys. Rev. A, 99:032331, Mar 2019.
[34] Maria Schuld and Nathan Killoran. Quantum machine learning in feature hilbert spaces. Phys. Rev. Lett., 122:040504, Feb 2019.
[35] Vittorio Giovannetti, Seth Lloyd, and Lorenzo Maccone. Quantum random access memory. Phys. Rev. Lett., 100:160501, Apr 2008.
[36] Daniel K. Park, Francesco Petruccione, and June-Koo Kevin Rhee. Circuit-based quantum random access memory for classical data. Scientific Reports, 9(1):3949, 2019.
[37] T. M. L. Veras, I. C. S. De Araujo, K. D. Park, and A. J. da Silva. Circuit-based quantum random access memory for classical data with continuous amplitudes. IEEE Transactions on Computers, pages 1–1, 2020.
[38] Israel F. Araujo, Daniel K. Park, Francesco Petruccione, and Adenilton J. da Silva. A divide-and-conquer algorithm for quantum state preparation. Scientific Reports, 11(1):6329, March 2021.
[39] Ryan LaRose and Brian Coyle. Robust data encodings for quantum classifiers. Phys. Rev. A, 102:032420, Sep 2020.
[40] Sukin Sim, Peter D. Johnson, and Alán Aspuru-Guzik. Expressibility and entangling capability of parameterized quantum circuits for hybrid quantum-classical algorithms. Advanced Quantum Technologies, 2(12):1900070, 2019.
[41] Robert M. Parrish, Edward G. Hohenstein, Peter L. McMahon, and Todd J. Martínez. Quantum computation of electronic transitions using a variational quantum eigensolver. Phys. Rev. Lett., 122:230401, Jun 2019.
[42] Hai-Rui Wei and Yao-Min Di. Decomposition of orthogonal matrix and synthesis of two-qubit and three-qubit orthogonal gates. Quantum Inf. Comput., 12(3-4):262–270, 2012.
[43] Ian T. Jolliffe. Principal Component Analysis. Springer Series in Statistics. Springer-Verlag New York, 2 edition, 2002.
[44] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
[45] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2017.
[46] Y. Nesterov. A method for solving the convex programming problem with convergence rate $o(1/k^{2})$ . Proceedings of the USSR Academy of Sciences, 269:543–547, 1983.
[47] Seth Lloyd, Maria Schuld, Aroosa Ijaz, Josh Izaac, and Nathan Killoran. Quantum embeddings for machine learning. arXiv preprint arXiv:2001.03622, 2020.
[48] Rui Huang, Xiaoqing Tan, and Qingshan Xu. Variational quantum tensor networks classifiers. Neurocomputing, 452:89–98, 2021.
[49] Francesco Tacchino, Chiara Macchiavello, Dario Gerace, and Daniele Bajoni. An artificial neuron implemented on an actual quantum processor. npj Quantum Information, 5(1):26, 2019.
[50] Stefano Mangini, Francesco Tacchino, Dario Gerace, Chiara Macchiavello, and Daniele Bajoni. Quantum computing model of an artificial neuron with continuously valued input data. Machine Learning: Science and Technology, 1(4):045008, oct 2020.
[51] Cláudio A. Monteiro, Gustavo I.S. Filho, Matheus Hopper J. Costa, Fernando M. de Paula Neto, and Wilson R. de Oliveira. Quantum neuron with real weights. Neural Networks, 143:698–708, 2021.