Problem-informed Graphical Quantum Generative Learning

Bence Bakó Eötvös Loránd University, Budapest, Hungary HUN-REN Wigner Research Centre for Physics, Budapest, Hungary Dániel T. R. Nagy Eötvös Loránd University, Budapest, Hungary HUN-REN Wigner Research Centre for Physics, Budapest, Hungary Péter Hága Ericsson Research, Budapest, Hungary Zsófia Kallus Ericsson Research, Budapest, Hungary Zoltán Zimborás Eötvös Loránd University, Budapest, Hungary HUN-REN Wigner Research Centre for Physics, Budapest, Hungary Algorithmiq Ltd, Kanavakatu 3C, Helsinki, 00160, Finland

Abstract

Leveraging the intrinsic probabilistic nature of quantum systems, generative quantum machine learning (QML) offers the potential to outperform classical learning models. Current generative QML algorithms mostly rely on general-purpose models that, while being very expressive, face several training challenges. A potential way to address these setbacks involves constructing problem-informed models capable of more efficient training on structured problems. In particular, probabilistic graphical models provide a flexible framework for representing structure in generative learning problems and can thus be exploited to incorporate inductive bias in QML algorithms. In this work, we propose a problem-informed quantum circuit Born machine Ansatz for learning the joint probability distribution of random variables, with independence relations efficiently represented by a Markov network (MN). We further demonstrate the applicability of the MN framework in constructing generative learning benchmarks and compare our model’s performance to previous designs, showing it outperforms problem-agnostic circuits. Based on a preliminary analysis of trainability, we narrow down the class of MNs to those exhibiting favorable trainability properties. Finally, we discuss the potential of our model to offer quantum advantage in the context of generative learning.

I Introduction

Refer to caption — Figure 1: Framework for using probabilistic graphical models to introduce inductive bias in quantum generative learning models. Most generative learning problems can be (re)formulated with binary random variables. If we have enough knowledge about the problem, we can identify structure in the form of PGMs. Having a PGM, we can construct and train the corresponding quantum model, that later can be used to draw samples from the learned probability distribution.

In recent years, the field of machine learning (ML) experienced unprecedented growth, giving rise to a wide variety of models and algorithms with the potential to revolutionize several fields [1]. Generative learning is a powerful paradigm in ML that aims to capture the underlying distribution of data in order to generate realistic samples [2]. Quantum machine learning (QML) has emerged as a promising intersection of quantum computing and machine learning in the pursuit of practical quantum advantage [3, 4].

Quantum resources, due to their inherent probabilistic nature, can be used to efficiently draw samples from probability distributions of high complexity [5, 6, 7, 8, 9]. This makes generative QML a natural pathway towards harnessing the potential of quantum computers. Recent advances in this field led to the adaptation of several successful classical generative models [10]. The resulting architectures include quantum circuit Born machines (QCBMs) [11, 12, 13], quantum generative adversarial networks (QGANs) [14, 15] and quantum Boltzmann machines (QBMs) [16, 17].

While their high expressivity makes general-purpose QML models very powerful, they also pose several challenges. Contrary to classical neural networks, variational quantum circuits are much more affected by trainability issues, such as barren plateaus and poor local minima [18, 19, 20, 21, 22]. Furthermore, the no-free-lunch theorem [23] also translates to QML, suggesting, that these problem-agnostic models, like hardware-efficient Ansätze, have poor average performance [24, 25]. The reason behind these barriers can be seen as the lack of sufficient inductive bias, i.e., assumptions about the data that could be encoded into the learning framework. Consequently, a potential way of dealing with them is by constructing problem-informed models, that can be trained more efficiently for structured problems [26, 27, 28, 29].

Probabilistic graphical models (PGMs) provide a mathematical framework for representing structure in generative learning problems defined over random variables [30] and, as such, can be exploited to construct problem-informed QML models, as depicted in Fig. 1. The two main classes of PGMs are Bayesian networks (BNs) and Markov networks (MNs). BNs provide a highly interpretable approach to graphical modelling by using a directed graph, leading to numerous real-world applications across many disciplines [31]. However, in certain domains, such as those involving spacial or relational data, MNs provide a more natural representation, since they do not define the orientation of the graph edges [32].

While there are several excellent works concerning the quantum circuit implementation of BNs [33, 34, 35], MNs are not well-studied in the context of QML. In this work, we investigate the applicability of the framework provided by MNs to generative QML with classical data. We propose a problem-informed model, that aims to learn the distribution over random variables, where the independence relations are efficiently represented by a MN. As opposed to previous problem-agnostic QCBMs, this Ansatz can capture higher order correlations between the corresponding random variables and potentially reduce the number of trainable parameters, while also increasing performance. While this construction relies on the knowledge of the MN structure, that can be hard to infer from data, the graph representation is readily available in various application domains. We argue that this new model class has the potential to demonstrate quantum advantage, since it contains the class of QAOA circuits [36], that were shown to produce classically hard probability distributions [7, 37]. Besides model design, we also show the potential of the PGM framework to construct benchmarks for generative QML models, where the problem complexity can be fine-tuned in multiple factors. We perform numerical experiments based on this benchmark proposal against both problem-agnostic and BN-based QML models. Finally, we present a preliminary analysis of trainability and define a class of efficient graphical representations, that have higher potential in this context.

It is important to note that, while we concentrate on the significance of problem-specific model construction in the context of QML, this trend is also present in classical ML, where much larger models can be implemented [38]. This further illustrates the power of problem-informed approaches, relevant not only for near-term devices, but potentially for the large-scale fault-tolerant quantum computing as well.

The structure of this paper is as follows: Sec. II offers a review of relevant concepts regarding PGMs and generative QML. In Sec. III, we present our generative QML model and benchmark proposal, along with numerical experiments. Finally, in Sec. IV, we discuss potential future directions.

II Background & Notation

II.1 Probabilistic Graphical Modelling

Explicitly encoding the probabilities of each assignment in a high-dimensional state-space is infeasible, as it scales exponentially with the number of random variables. In a space of only ten binary variables, we would need $2^{10}=1024$ numbers to represent a probability distribution. PGMs were developed to tackle this problem by using a graph representation to compactly encode a complex distribution of interacting random variables. These graphs effectively capture the independence relations between variables and enable to split the joint probability distribution into smaller factors, each over a smaller subspace. Furthermore, this framework is also useful for inference and learning tasks.

Here we give a brief description of the mathematical framework of the two main families of PGMs and their connections. Since any higher order PGM can be embedded into PGMs over binary random variables, we focus our attention on the latter. For an extensive study of PGMs, we refer the reader to [30].

II.1.1 Bayesian networks

BNs use directed acyclic graphs to represent the conditional dependencies between random variables. In these models, a variable is independent of all other variables given its parents in the graph. Consequently, the factors of the joint probability distribution can be interpreted simply as conditional probabilities.

Definition 1 (Bayesian network factorization).

A distribution $P_{\mathcal{B}}$ over the space of $n$ random variables $\mathbf{X}=\{X_{1},\dots,X_{n}\}$ factorizes according to a Bayesian network $\mathcal{B}=(\mathcal{G},P)$ , if $P_{\mathcal{B}}$ can be expressed as a product

P(X_{1},\dots,X_{n})=\prod\limits_{i=1}^{n}P(X_{i}|Pa^{\mathcal{G}}_{X_{i}}),

where $Pa^{\mathcal{G}}_{X_{i}}$ denotes the parents of the node associated to variable $X_{i}$ in graph $\mathcal{G}$ .

II.1.2 Markov networks

Markov networks, Markov random fields or undirected graphical models are defined over general undirected graphs to represent a set of random variables having the Markov property. In general, the global Markov property applies, which states, that any two subsets of variables are conditionally independent given a separating subset. As opposed to BNs, where the factors are straightforward to comprehend, here they cannot be interpreted directly. However, we can view a factor as describing “compatibilities” between different values of the variables in the corresponding subset. A factor here describes a general purpose function $\phi:\mathrm{Val}(\mathbf{D})\rightarrow\mathbb{R}$ , where $\mathrm{Val}(\mathbf{D})$ denotes all possible joint states of a set of random variables $\mathbf{D}$ . Each factor corresponds to a clique in the graph, however, the usual graphical representation does not make it clear, whether the joint probability distribution factorizes according to the maximal cliques or the subsets thereof.

Definition 2 (Markov network factorization).

We say that a distribution $P_{\Phi}$ with $\Phi=\{\phi_{1}(\mathbf{D}_{1}),\dots,\phi_{K}(\mathbf{D}_{K})\}$ factorizes over a Markov network $\mathcal{H}$ if each $\mathbf{D}_{k}$ $(k=1,\dots,K)$ is a complete subgraph of $\mathcal{H}$ and $P_{\Phi}$ is a Gibbs distribution parametrized by these factors as follows:

\begin{split}P_{\Phi}(X_{1},\dots,X_{N})&=\frac{1}{Z}\tilde{P}_{\Phi}(X_{1},\dots,X_{N})\\ &=\frac{1}{Z}\phi_{1}(\mathbf{D}_{1})\times\phi_{2}(\mathbf{D}_{2})\times\dots\times\phi_{m}(\mathbf{D}_{m}),\end{split}

where $\times$ denotes the factor product, $\tilde{P}_{\Phi}$ is the unnormalized measure and

Z=\sum\limits_{X_{1},\dots,X_{n}}\tilde{P}_{\Phi}(X_{1},\dots X_{n})

is the normalizing constant or partition function.

For the MN in Fig. 2b, with independence relations $(A\perp D\;|\;C)$ , $(B\perp D\;|\;C)$ and assuming maximal clique factorisation, the joint probability distribution takes the form $P(A,B,C,D)=\frac{1}{Z}\phi_{1}(A,B,C)\phi_{2}(C,D)$ . Here $\phi_{1}(A,B,C)$ and $\phi_{2}(C,D)$ denote the two maximal cliques of the graph.

In this picture, the factors are encoded as complete tables, however, for positive factor values, there exists an alternative parametrization, that connects MNs to energy-based models [39]. In this case, the joint probability distribution can be written as

P_{\Phi}(X_{1},\dots,X_{N})=\exp{\left[-\sum\limits_{i=1}^{m}\epsilon_{i}(\mathbf{D}_{i})\right]},

(1)

where $\epsilon_{i}(\mathbf{D}_{i})=-\ln{\phi_{i}(\mathbf{D}_{i})}$ is called the energy function. This parametrization of MNs is usually referred to as the log-linear model. We use this representation as inspiration for constructing our MN-based QML model.

II.1.3 Connection between BNs and MNs

Bayesian and Markov networks are incomparable in terms of independence relations they capture. However, we can convert one type into the other, that can represent the same probability distribution by potentially introducing new edges between the nodes of the graph.

The transformation from BNs to MNs, called moralization, follows a simple rule: if two nodes are connected by a directed edge in the BN graph or they are both parents of at least one node, then they are connected by an undirected edge in the MN graph. Having the undiercted graph, we can assign a factor to each resulting clique, to obtain a MN. This procedure usually leads to a PGM with more parameters, that can lead to a longer training. Directed graphs, in which parent nodes do not share common children are called moral. Consequently, BNs that are defined over moral graphs, can be converted to MNs without introducing additional edges and parameters.

Turning MNs into BNs is a more difficult task, both conceptually and computationally. First, the undirencted graph needs to be triangulated, meaning that we introduce chords in cycles of $4$ or more and repeat this process until there are no such cycles left. Chords connect nodes, that are in the cycle, but are not already connected. This transformation usually leads to the introduction of a much larger number of edges than in the previous case. Finally, the undirected edges have to be turned into directed ones in an acyclic manner. Since chordal graphs are also moral, PGMs that are defined over chordal graphs represent a class of graphical models, that can be treated as either BNs or MNs and the conversion between the two is straightforward.

II.1.4 Generative learning in PGMs

The two main learning tasks in the context of PGMs are structure learning and parameter estimation [30]. In this work, we restrict our attention to a version of parameter estimation in a generative learning setting. We require our model to learn how to sample the unknown probability distribution that the corresponding PGM induces. We give a formal description of this problem for MNs, but it can be formulated for BNs analogously to Prob. 1. Throughout this work, we use the phrases distribution learning and generative learning interchangeably.

Problem 1 (Distribution Learning in MNs).

Given the graph structure and the clique factorization of a Markov network $\mathcal{H}$ with an unknown joint probability distribution $P^{*}$ , a dataset $\mathcal{D}$ sampled from $P^{*}$ and $\varepsilon,\delta\in(0,1)$ , output with probability at least $1-\delta$ a representation of a distribution $P_{\mathcal{M}}$ satisfying $d(P^{*},P_{\mathcal{M}})\leq\varepsilon$ .

In the above problem formulation, $d(\cdot,\cdot)$ refers to the distance between the two distributions. In this work, we focus on the total variational (TV) distance defined as

\mathrm{TV}(P^{*},P_{\mathcal{M}})=\frac{1}{2}\sum\limits_{x\in\{0,1\}^{n}}|P^{*}(x)-P_{\mathcal{M}}(x)|.

(2)

In MNs, the use of a normalizing constant couples the parameters across the whole network, which prevents us from decomposing the problem and estimating local groups of parameters separately. One of the computational ramifications of this global coupling is, that not even the parameter estimation with complete data can be solved in closed form (except for some special cases, that can essentially be reformulated as BNs [40]). This makes the use of iterative methods, such as gradient descent, unavoidable. Luckily, the likelihood objective is convex, meaning that these methods are guaranteed to converge. However, they come with the disadvantage of having to run inference in each step to calculate the gradients and inference in MNs is #P-complete in general, which makes our distribution learning task formulated above fairly expensive or even intractable classically [41].

II.2 Generative QML Models

II.2.1 General framework of QCBMs

QCBMs, introduced in [11], are paradigmatic generative QML models, that naturally inherit the Born rule and thus can be used to generate tunable and discrete probability distributions that approximate a target distribution. In this section, we review the general components of variational quantum algorithms in the context of QCBMs.

One of the key building blocks is the Ansatz. The Ansatz refers to a parametrized family of quantum circuits used as a hypothesis or approximation for solving the problem in question by iteratively optimizing the parameters to match the desired target behavior or to minimize a cost function. We can differentiate problem-informed and problem-agnostic Ansätze.

The quantum circuit Ising Born machine (QCIBM), introduced by Coyle et al. in [13], in its most general form, is a problem-agnostic Ansatz. The structure of the QCIBM circuit is depicted in Fig. 3, where the corresponding unitaries can be written as

U_{z}(\boldsymbol{\alpha})=\prod\limits_{j}{}U_{z}(\alpha_{j},S_{j})=\prod\limits_{j}\exp{\left(i\alpha_{j}\bigotimes\limits_{k\in S_{j}}Z_{k}\right)},

(3)

U_{f}(\mathbf{\Gamma},\mathbf{\Delta},\mathbf{\Sigma})=\exp{\left(i\sum\limits_{k=1}^{n}(\Gamma_{k}X_{k}+\Delta_{k}Y_{k}+\Sigma_{k}Z_{k})\right)}.

(4)

Here each $S_{j}$ indicates a subset of qubits and the $X,Y,Z$ operators are the Pauli-matrices. The $U_{f}$ operators can also be thought of as a “parametrized measurement”, or letting the measurements be in any local basis. The authors restrict the Hamiltonian, that generates $U_{z}$ to only contain one- and two-body terms, since “only single and two-qubit gates are required for universal quantum computation” and they consider each qubit-pair (all-to-all connectivity).

The goal of generative learning is to draw samples from a model probability distribution $P_{\boldsymbol{\theta}}$ , that is sufficiently close to the target distribution $P^{*}$ , while only having access to a finite number of samples from $P^{*}$ . In the case of QCBMs, the model probability distribution is approximated by repeatedly running the circuit and measuring an observable each time. The final measurements are performed in the computational basis on each qubit, producing a binary string of variables $x_{i}\in\{0,1\}$ from the distribution

P_{\boldsymbol{\theta}}(\mathbf{x})=|\langle\mathbf{x}|U(\boldsymbol{\theta})|0\rangle^{\otimes n}|^{2},

(5)

where the $\boldsymbol{\theta}$ vector contains all circuit parameters.

There are multiple ways to characterize the distance of two distributions. The TV distance, shown in (2), is a good benchmark, but it is not feasible as a cost function. One of the most common metrics used as a cost function in generative modelling is the Kullback-Leibler (KL) divergence

D_{KL}(P_{\boldsymbol{\theta}},P^{*})=\sum_{x}P^{*}(x)\log\left(\frac{P^{*}(x)}{P(x)}\right),

(6)

where in practice $P(x)$ is replaced by an infinitesimal constant for $0$ probabilities.

Unfortunately, not even this function is optimal for data-driven training, as it requires a large number of training samples and it was shown not to be trainable for larger-scale QCBMs [42]. In [12] a more efficient cost function was introduced for gradient-based training of QCBMs. The idea was to compare the distance of the samples drawn from the target and the model distribution in a kernel feature space. This loss function is called the squared maximum mean discrepancy (MMD):

\begin{split}\mathcal{L}_{\text{MMD}}(P_{\boldsymbol{\theta}},P^{*})&=\left\|\sum_{x}P_{\boldsymbol{\theta}}(x)\phi(x)-\sum_{x}P^{*}(x)\phi(x)\right\|^{2}\\ &=\underset{x\sim P_{\boldsymbol{\theta}},y\sim P_{\boldsymbol{\theta}}}{\mathbb{E}}[K(x,y)]+\underset{x\sim P^{*},y\sim P^{*}}{\mathbb{E}}[K(x,y)]\\ &-2\underset{x\sim P_{\boldsymbol{\theta}},y\sim P^{*}}{\mathbb{E}}[K(x,y)],\end{split}

(7)

where the $\phi$ function maps $x$ into a higher dimensional space and by definition $K(x,y)=\phi(x)^{T}\phi(x)$ is the kernel function, for which they used a Gaussian kernel

K_{x,y}=\exp\left(-\frac{1}{2\sigma}|x-y|^{2}\right).

(8)

Having a family of parametrized circuits and a cost function to characterize the distance between the circuit output and the target distribution, the final building block is the optimizer, that adjusts the circuit parameters given the cost function. Both gradient-based and gradient-free optimizers have been used to train QCBMs [12, 43]. We concentrate on gradient-based optimization methods, but since our approach concentrates on Ansatz-design, it is compatible with other optimizers as well.

II.2.2 Existing quantum circuit adaptations of PGMs

Liu and Wang [12] made the first explicit connection between the paradigmatic QCBM model and PGMs. They propose a framework, in which they first construct the Chow-Liu tree of a dataset [44] based on the mutual information between all pairs of bits for training samples. Having this tree graph, they propose a QCBM Ansatz, in which, the connectivity pattern of the $CNOT$ gates respects the graph structure. The Chow-Liu tree offers an effective approach for creating a second-order product approximation of a joint probability distribution. The corresponding graph represents a BN, that can also be regarded as a pairwise MN, but being a second-order approximation, it fails to detect higher order correlations. Another explicit connection between BNs and QCBMs was formulated in [45], where the authors proposed a framework, that utilizes QCBMs for variational inference in PGMs.

Besides the framework of mostly general-purpose QCBMs, there have been several attempts to implement PGMs on a quantum computer. Bayesian networks have an equivalent formulation in the computational basis measurements of a class of quantum circuits known as Bayesian quantum circuits (BQCs) [33]. These are defined such that the probability distribution they sample from, by measuring the given qubits in the computational basis, corresponds to the distribution defined by the corresponding BN. BQCs are implemented with uniformly controlled gates, that can be decomposed into one-qubit rotations and $CNOT$ gates [34, 46]. By definition, these circuits obey certain rules in accordance with the directed, acyclic nature of the underlying graph.

In [35] the authors introduced a minimal extension to BQCs and presented unconditional proof of separation in the expressive power of BNs and the corresponding basis-enhanced BQCs (BBQCs). They showed that by letting the final measurement be in any local basis, this separation appears, that can be associated with quantum nonlocality and contextuality. They also pointed out, that both BQCs and their basis-enhanced versions can be efficiently simulated with classical tensor network methods, when the graphs are sparse enough.

The literature on the quantum circuit implementation of Markov networks is more scarce. In Ref. [47] the authors identified a novel embedding technique of MNs into unitary operators that relies on their log-linear representation. They construct a Hamiltonian composed of Pauli- $Z$ terms and give a quantum algorithm, that implements the exponential of this Hamiltonian, meaning, that measuring the output qubits of the corresponding quantum circuit is equivalent to sampling the corresponding MN. The circuit, that implements this exponentiation, uses a special point-wise polynomial approximation and real part extraction, that might fail. For this reason, they have to measure ancillary qubits in order to determine whether this extraction was successful and start everything over if not. Consequently, the success probability decreases exponentially with the number of maximal cliques. This can of course be amplified with quantum singular value transformation [48], but that further increases the required resources. Since Boltzmann machines form a subclass of pairwise MNs, their quantum circuit implementations can also be regarded as an adaptations of PGMs [17]. However, these models are quite restricted compared to general MNs, since they only consider pairwise correlations, usually in a bipartite manner.

III Quantum Circuit Markov Random Fields

In this section, we present our results, starting with the definition of our QML model, proposed for generative learning in MNs. We then introduce our novel benchmark proposal and compare our model to both problem-agnostic QCIBMs and BBQCs through a series of numerical experiments. As a a preliminary analysis of trainability, we investigate the scaling of the cost function variance for different types of graphs. Finally, we present our argument for a potential quantum advantage of our MN-based model class.

III.1 From Graphical Representation to Variational Ansatz

We propose a QCBM Ansatz for distribution learning in MNs, as described in Prob. 1. We start by constructing a parametrized many-body Ising Hamiltonian, that is inspired by the log-linear model of MNs, and consequently depends on the clique structure of the MN $\mathcal{H}$ . This Hamiltonian takes the form

H^{\prime}(\boldsymbol{\beta})=\sum_{C\in\mathcal{C}_{\mathcal{H}}}\bigotimes_{v\in C}\beta_{C,v}(I+Z_{v}),

where $Z_{v}$ is the Pauli- $Z$ operator acting on qubit $v$ , $\mathcal{C}_{\mathcal{H}}$ refers to the set of cliques and $\boldsymbol{\beta}$ is the set of parameters. Usually some of the MN cliques overlap in nonzero subsets, thus there will be reoccurring terms. Since all terms commute, we can reparametrize the Hamiltonian such that each term only appears once (and identities are excluded): $H^{\prime}(\boldsymbol{\beta})\rightarrow H(\boldsymbol{\alpha})$ .

Having formulated this parametrized Hamiltonian, that encodes the structure of the problem, we consider the unitary it generates as

U_{Z}(\boldsymbol{\alpha})=e^{-iH(\boldsymbol{\alpha})},

(9)

and implement a model similar to QCIBMs, defined in Sec. II.2.1 and Fig. 3. We call these problem-informed QCBM models quantum circuit Markov random fields (QCMRFs).

As an example, the MN shown in Fig. 2b defines a $3$ -local Hamiltonian of the following form: $H(\boldsymbol{\alpha})=\alpha_{1}Z_{A}Z_{B}Z_{C}+\alpha_{2}Z_{A}Z_{B}+\alpha_{3}Z_{B}Z_{C}+\alpha_{4}Z_{A}Z_{C}+\alpha_{5}Z_{C}Z_{D}+\alpha_{6}Z_{A}+\alpha_{7}Z_{B}+\alpha_{8}Z_{C}+\alpha_{9}Z_{D}$ .

Alternatively, one can also limit the locality of the Hamiltonian to get shallower circuits, that in turn can be worse at capturing higher-order correlations. This is ultimately equivalent to considering smaller cliques instead of the maximal clique factorization. The circuit implementation details are discussed in Appendix A.1.

III.2 Benchmark Proposal

Benchmarking generative QML models often relies on generic probability distributions, such as the bars and stripes dataset, or some Hamming weight specific target distribution [11, 42, 49]. Here we describe our proposal for constructing target distributions for these models, where the complexity is tunable in several ways. This construction relies on MNs, where the graph structure can be defined by the user. In general, the “difficulty” of the learning problem is proportional to the clique sizes of the MN. The most general case is a complete graph with a single maximal clique, that corresponds to explicitly encoding the probability to each global state.

Given an undirected graph $\mathcal{G}$ , a set of cliques $\mathcal{C}_{\mathcal{H}}$ , and a classical generator, we construct a target MN $\mathcal{H}$ through the following steps:

1.

To each clique $C\in\mathcal{C}_{\mathcal{H}}$ , we assign a factor as a set of $2^{|C|}$ numbers obtained by querrying the classical generator $2^{|C|}$ times.
2.

Next, we calculate the unnormalized measures of each global assignment by multiplying the corresponding element of each factor.
3.

To get the probability of each assignment, we then normalize the measures by the partition function.
4.

Finally, we sample this joint probability distribution classically several times to construct the training dataset $\mathcal{D}$ .

As the size of the graph increases, this procedure becomes highly inefficient, because of the exponential size of the space. In these cases, instead of calculating the target probabilities exactly, steps $2-4.$ can be replaced by the use of approximate sampling techniques to sample $\mathcal{H}$ directly [30], e. g., using Gibbs sampling.

Similar steps can be taken for BNs as well, but there one chooses $2^{|Pa^{\mathcal{G}}_{X_{i}}|}$ random probabilities for each random variable $X_{i}$ .

The main complexity of the target problem comes from the graph topology and the size of the cliques. This means, that given a graph structure and its maximal clique factorization, we can increase the complexity of the corresponding target distribution by introducing additional edges and considering the maximal cliques of the new graph. The second complexity factor comes from the classical generator, that assigns factor values to the cliques. Throughout our numerical investigations, we sample these values uniformly at random in an IID fashion from a positive range of real numbers. Alternatively, one can consider sampling the factor values from a more complex distribution, as long as classical sampling can be performed efficiently.

III.3 Numerical Experiments

Here we present two types of numerical experiments: the first kind aims to show, that our QCMRF model performs better than the problem-agnostic QCIBM on structured MNs; in the second set, we compare its performance to BBQCs. In the latter case, we consider loop graphs, that first need to be triangulated in order to implement the corresponding BBQC models.

In all experiments, trainings are carried out with two different cost functions: the KL divergence as in (6) and the MMD loss as in (7), where the kernel function is calculated as the average of $3$ different Gaussian kernels with $\sigma\in\{0.25,10,10^{3}\}$ . The KL divergence has access to the exact target probability distribution, while the MMD loss can only access a finite training set.

The quantum circuit simulations are carried out with the Pennylane software package [50], and optimized with Adam [51], with learning rate $0.1$ . We train all models for $500$ epochs.

While we use only a finite number of shots for training, the TV distance between the model and target distributions is calculated analytically in each step. We run multiple experiments with different random factor values and analyse the average performance of all models with both cost functions as measured in the exact TV distance. For better visualization, we also average over a window of $20$ training epochs.

III.3.1 Benchmarks against QCIBMs

To demonstrate the superiority of our model compared to the problem-agnostic QCIBM with all-to-all connectivity, first we present simulation results based on MNs with grid-like topology, always considering the maximal clique factorization. The number of training samples for the MMD loss along with the number of quantum circuit evaluations was set to $10^{4}$ . We consider $5$ sets of uniformly random factor values and take the average performance over these.

All parameters of both models are initialized to $0$ , as in this setting, the model starts the training in the equal superposition of all basis states. This strategy proved to be better than random initialization.

The results are shown in the top row of Fig. 4. We start with a $3\times 3$ grid, which defines a pairwise MN, meaning, that the corresponding QCMRF model incorporates only $2$ -local interactions. Here both models have similar performance, while our QCMRF model reduces the number of trainable parameters from $72$ to $48$ .

We continue by introducing additional edges to the grid. All the maximal cliques of the second graph have size $3$ , leading to a $3$ -local Hamiltonian, that can capture higher-order correlations. Here we can already see some separation in the performance, while still having less parameters ( $60$ ) than the QCIBM. In the last case, the cliques are of size $4$ , which increases the number of trainable parameters in the QCMRF circuit to $76$ . In this case, our model significantly outperforms the $2$ -local QCIBM.

This series of experiments show, that as we increase the connectivity of the graph and the sizes of its maximal cliques, the distribution becomes harder to learn, as it is reflected in the performance of the problem-agnostic QCIBM. However, the performance of our problem-specific QCMRF model is either unaffected by this change, or it performs even better, as its complexity also increases with the underlying MN. This also proves, that using higher order Hamiltonians can actually help in capturing higher order correlations between the random variables of the MN.

Next, we focus on random graphs, that are globally less structured, thus being closer to naturally occurring topologies. Here we explore the role of communities. By communities, we refer to dense subgraphs, that are sparsely connected between each-other. The results are shown in the bottom row of Fig. 4, where the training is done similarly to the previous experiments. Here our QCMRF models reach better performance than the problem-agnostic QCIBMs, while significantly reducing the number of trainable parameters, since they exploit the sparsity of the graph. For these $10$ -node graphs, the corresponding QCIBM model has $85$ parameters, while the QCMRF circuits only have $60$ , $58$ and $66$ respectively. Furthermore, in the third graph, where the communities are connected by a node with large centrality measures, the QCMRF model significantly outperforms the problem-agnostic case. These experiments further prove the usefulness and viability of our Ansatz design approach.

III.3.2 Benchmarks againts BBQCs

Next, we validate our model with BBQCs on non-chordal MNs. In these cases, the undirected graph first has to be triangulated and turned into a directed acyclic graph in order to consider the corresponding BN. Knowing the structure of this BN, one can implement the corresponding BBQC, that is supposed to be able to capture the target distribution exactly. However, this process is very costly: the triangulation of the graph itself is a hard problem and it can introduce a large number of edges, that leads to the introduction of additional variables. Consequently, the corresponding BBQC can have many more trainable parameters and the circuit depth can be significantly larger.

For this comparison, we considered loop graphs, that are easy to triangulate. Since with $0$ parameter initialization, the starting probability distribution is different for QCMRF and BBQC circuits, here we started with random values to have a more fair comparison. The number of shots was set to $10^{3}$ for all experiments along with the size of the training set for the MMD-based optimizations. We ran all experiments with $10$ sets of uniformly random factor values and visualized the average performance in Fig. 5.

All these loop-graphs can be used to implement QCMRF circuits, while to define the corresponding BBQC, the triangulation introduces $n-3$ new edges. Due to this fact, the number of trainable parameters along with the depth of the circuit is significantly increased. In all cases, our MN-based model reaches similar performance to BBQCs, while having much lower cost of implementation, which further highlights the usefulness of our model class. For a deeper comparison between the implementation costs of these models, we refer the reader to Appendix B.

III.4 Trainability

While MNs are capable of representing any probability distribution, we expect that not all types of networks lead to efficiently trainable QML models. For this reason, we conduct a preliminary analysis of the trainability properties of QCMRFs. In particular, we study numerically the scaling of the MMD cost function variance with the number of qubits (or nodes). Having fixed the graph type, for each number of nodes, we define $10$ sets of uniformly random factor values and in each case, we evaluate the variance using $10^{4}$ sets of random circuit parameters. The cost value is calculated with a training set of $10^{4}$ samples and $10^{4}$ quantum circuit evaluations.

A complete graph having maximal clique factorization corresponds to no problem-specific knowledge, since the probability distribution has $2^{n}-1$ degrees of freedom. As shown in Fig. 6, according to our numerics, the cost function variance vanishes exponentially in the number of qubits in this case, which also indicates the presence of deterministic barren plateaus [52]. To study sparser (but still dense) graphs, we concentrate first on Erdős-Rényi graphs with $p=0.5$ edge probability, since in this case, the expected size of the largest clique is $\mathcal{O}(\log n)$ with high probability. Our simulations clearly show, that the scaling of the variance is closer to polynomial. This is also true for triangle chain graphs, where we have $n-2$ cliques of size $3$ in total. While polynomial scaling of the cost variance does not prove the empirical absence of barren plateaus, it is a good first indicator of better trainability.

These investigations lead us to the definition of a subclass of MNs, that restrict the class of problems, that our problem-informed framework can be most useful for.

Definition 3 (Efficient MN representation).

Anddd A probability distribution $P_{\Phi}$ over $n$ binary random variables is said to have an efficient MN representation, if $P_{\Phi}$ factorizes according to a Markov network $\mathcal{H}$ with

\sum_{C\in\mathcal{C}_{\mathcal{H}}}2^{|C|}\in\mathrm{poly}(n),

where $\mathcal{C}_{\mathcal{H}}$ is the set of cliques of $\mathcal{H}$ .

It follows naturally from the our definitions, that QCMRF circuits corresponding to efficient MNs representations, have depth $\mathrm{poly}(n)$ .

III.5 Potential for Quantum Advantage

Quantum advantage, in the context of generative learning, can have various flavours: it can show improvement in the precision (wrt a distance metric); it can refer to faster convergence; or even an improvement in the number of training samples needed. In order to formulate all these cases in a single definition, we first need to define what we mean by a class of distributions being efficiently learnable:

Definition 4 ( $(d,\varepsilon,k,C)$ -learnable).

For a metric $d$ , $\epsilon>0$ , $k>0$ and complexity class $C$ , a class of distributions $P^{n}$ is called $(d,\varepsilon,k,C)$ -learnable, if there exists an algorithm $\mathcal{A}\in C$ , that given $0<\delta<1$ as input and having access to a dataset $\mathcal{D}$ of size $k$ sampled from any $P\in P^{n}$ , outputs with probability at least $1-\delta$ a representation of a distribution $P_{\mathcal{M}}$ satisfying $d(P,P_{\mathcal{M}})\leq\varepsilon$ . $\mathcal{A}$ should run in time $\text{poly}(1/\varepsilon,1/\delta,n)$ .

With this, we extended the definition from [13] with the sample complexity, and now formulate quantum learning advantage:

Definition 5 (Quantum Learning Advantage).

An algorithm $\mathcal{A}\in BQP$ is said to have quantum learning advantage, if there exists a class of distributions $P_{n}$ for which there exists $d$ , $\epsilon$ , $k$ such that $P^{n}$ is $(d,\varepsilon,k,BQP)$ -learnable, but not $(d,\varepsilon,k,BPP)$ -learnable.

Besides learning advantage, generative QML models also have the potential to exceed classical methods for sampling the learned distribution:

Definition 6 (Quantum Advantage in Sampling).

Given a probability distributions $P_{\mathcal{M}1}$ satisfying $d(P,P_{\mathcal{M}1})\leq\varepsilon$ (for some $d$ metric and $\epsilon>0$ ), a quantum algorithm $\mathcal{A}_{1}\in BQP$ is said to have quantum advantage in sampling from the distribution $P$ , if $\mathcal{A}_{1}$ can efficiently sample $P_{\mathcal{M}1}$ and no classical algorithm $\mathcal{A}_{2}\in BPP$ can sample $P_{\mathcal{M}2}$ efficiently, where $d(P,P_{\mathcal{M}2})\leq\varepsilon$ .

In the following, we concentrate on this second definition and argue, that since our QCMRF model class contains the class of QAOA and IQP circuits, it can also produce distributions that are thought to be classically hard. For this, we assume, that the joint distribution of a target MN is learnable by both an arbitrary classical model and a QCMRF model to a given precision $\varepsilon$ and analyze the complexity of sampling the trained models. Previous works [13, 53] presented similar arguments relying on the results of Refs. [54, 7]. We start by sketching current results regarding the hardness of sampling QAOA circuits, then based on this, we present our conjecture about the possible quantum advantage in sampling for our model.

The Quantum Approximate Optimization Algorithm (QAOA) was proposed in [36] for approximately solving combinatorial optimization problems on a quantum computer. In [7], the authors proved, that efficient classical sampling of the output distribution of the $1$ -level QAOA circuit implies the collapse of the polynomial hierarchy to the third level. While the argument was constructed with a $2$ -local QAOA circuit, the proof stands for higher order $Z$ interactions as well, all these being diagonal operators. On the other hand, this was shown for multiplicative error only, meaning, that if the target distribution (being the one defined by the QAOA circuit) $P_{QAOA}$ and the classical model distribution $P_{\mathcal{M}}$ satisfy the bound

|P_{\mathcal{M}}(x)-P_{QAOA}(x)|\leq\varepsilon P_{QAOA}(x),\forall x,

(10)

then the polynomial hierarchy collapses to its third level. In our framework, however, we mostly concentrated on distance as measured in TV, which is equivalent to additive error. Multiplicative error is a stronger constraint, thus simulating up to a bounded additive error is a harder task. These results using multiplicative error were extended in [55] from worst case to average case hardness for $1$ -level QAOA circuits. Finally, in [37], the author proved average case hardness with additive error, which, up to our knowledge, is the strongest result in connection with the weak simulation of $1$ -level QAOA circuits. Since the class of probability distributions defined by the output of QCMRF circuits contains the $1$ -level QAOA circuits, this larger class can also be hard to weakly simulate, i.e., to sample the output probability distribution efficiently classically.

The classical hardness of distributions efficiently captured by MNs and QCMRF circuits makes it reasonable to believe that these two classes have a nonzero intersection that also contains hard cases. This also means, that - provided that we can train a QCMRF model to sufficient precision - we can use the trained model to efficiently sample the distribution of the underlying MN. These facts together with our numerical findings lead to the following conjecture, that is also illustrated in Fig. 7.

Informal Conjecture.

The class of QCMRF circuits, that can learn probability distribution efficiently represented by MNs, also contains classically hard cases, yielding a quantum advantage in sampling.

IV Conclusion & Outlook

In this work, we highlighted the potential of probabilistic graphical models for generative QML. We introduced a framework for constructing quantum circuit Born machine Ansätze, that respect the structure of the Markov network describing the underlying problem. A novel problem construction process was presented for benchmarking generative QML models, where the complexity of the learning task can be tuned in various ways. This benchmarking framework is capable of constructing explicit distribution learning problems and more realistic tasks based on limited samples.

Our numerical experiments demonstrated that our model, called quantum circuit Markov random field, is capable of capturing higher order correlations between the binary random variables of the corresponding MN. This can significantly improve the performance in the case of higher order target models, while potentially reduce the number of trainable parameters on sparse graphs, compared to problem-agnostic approaches. We further validated our model with basis-enhanced Bayesian quantum circuits on non-chordal MNs, since these Bayesian network-based models are able to express the target distribution exactly. The QCMRF models reached the performance of BBQCs on small loop-graphs with less parameters and significantly shallower circuits. All these experiments were conducted using the KL divergence and MMD loss functions, to demonstrate both exact distribution learning tasks and more practical generative learning based on limited training samples.

A preliminary numerical analysis of trainability was presented, that introduced an important constraint on the sparsity of the potentential MNs of interest. We formulated two definitions of quantum advantage, relevant in the context of generative models, where the first concentrated on a learning advantage, and the second focused on efficiently sampling from the learned distribution. We presented an argument in the second setting, highlighting the potential of our model to offer improvements over classical methods, since it contains the class of QAOA circuits, that are known to be hard to sample from. We also believe that this connection between classical MNs and QAOA-type circuits opens up an interesting direction for further investigations.

While we concentrated on learning a target distribution to high accuracy, this alone is not enough to characterize the performance of generative models. Another important factor is the model’s ability to generalize, rather than memorize the training data [56]. In the context of PGMs, this can be investigated with a training dataset of limited size, assessing the trained model’s ability to generate valid but unseen samples.

We also remark, that while our QCMRF model shows significant improvement compared to problem-agnostic QCIBMs on several small examples, BBQCs show better performance on chordal graphs, where both PGM-based models have the same number of trainable parameters. This means, that for chordal graphs, BN-based models are better at capturing the target distribution, although requiring much deeper circuits.

Possible extensions of our QCMRF model could include replacing the final set of general one-qubit unitaries $U_{f}(\Gamma,\Delta,\Sigma)$ with one-parameter rotations (e.g., $R_{X}$ rotations), as well as using multiple layers to have higher resemblance to QAOA circuits and to reach the overparametrized regime. Finally, we also remark that, while our model design framework was constructed in the context of QCBMs, this idea can also help in designing other generative QML models, e.g., QGANs.

V Acknowledgements

The authors would like to thank the support of the Hungarian National Research, Development and Innovation Office (NKFIH) through the KDP-2021 and KDP-2023 funding scheme, the Quantum Information National Laboratory of Hungary and the grants TKP-2021-NVA-04 and FK 135220. The authors also acknowledge the computational resources provided by the Wigner Scientific Computational Laboratory (WSCLAB).

Appendix A Implementation details

In this section, we describe the implementation details of the PGM-based quantum models, including possible decompositions of multi-qubit operators into single-qubit and $CNOT$ gates.

A.1 Quantum Circuit Markov Random Fields

Markov networks define higher-order Ising Hamiltonians as described in III.1, which generate QCMRF circuits composed of $MultiRZ$ gates. For the MN in Fig. 2b, the corresponding parametrized circuit is shown in Fig. 8a. A $k$ -local $MultiRZ$ gate can be implemented in linear depth with $2k$ $CNOT$ gates and a single-qubit $R_{Z}$ rotation. An example for this decomposition in shown in Fig. 8b. There are several other alternative strategies for implementing these circuits, as explained in the context of QAOA in Refs. [57, 58]. For example, one could implement the circuit corresponding to a clique with $m$ random variables with significantly less $CNOT$ gates using an ancillary qubit, as shown in Fig. 8c. This approach reduces the number of gates and the depth, while adding an ancillary qubit for each clique of size $>2$ .

The implementation of the QCIBM circuit can be done similarly, only using $2$ -local $MultiRZ$ gates, that do not need any ancillary qubits.

A.2 Bayesian Quantum Circuits

Here we start by describing the general idea of BQCs introduced in [33]. This model associates a qubit to each binary random variable of the BN and then applies unitary operations in the following manner. First parametrized single-qubit $R_{Y}$ rotation gates are applied to qubits for which the corresponding nodes have no parents. Then uniformly controlled $R_{Y}$ operations are performed on each qubit, where the control qubits correspond to the parents of the given node. Note, that since a node can have multiple parents, this can lead to gates with a large number of controls. The order of application of these unitaries has to follow a two rules: every qubit can only be targeted once and after a qubit was used as a control, it cannot be targeted anymore. These rules ensure the compliance with the directed acyclic nature of the graph. The basis-enhancement introduced in [35] further applies parametrized general $U_{f}(\Gamma,\Delta,\Sigma)$ gates to all qubits. The BBQC corresponding to the BN presented in 2a is shown in Fig. 9a.

While uniformly controlled gates provide an easily interpretable mapping from the BN graph to a parametrized quantum circuit, they cannot be implemented directly. The number of parameters for such a gate is $2^{n_{c}}$ , where $n_{c}$ is the number of control qubits. In [34] and [46], the authors gave several strategies to decompose these operators. For the purpose of this work, it is easiest to think of these as the decomposition in terms of (multi-controlled) $R_{Y}$ rotations and $X$ gates. This decomposition of a uniformly controlled $R_{Y}$ rotation with $2$ controls is shown in Fig. 9b. The multi-controlled gates can be further decomposed into single qubit rotation and $CNOT$ gates with additional ancillary qubits as described in [34] or by adapting the ancilla-free strategy of [59].

Appendix B Resource estimation

Here we review the general cost of implementation, considering the number of parameters, qubits and the circuit depth. We compare these metrics for all three models, where possible.

B.1 Number of parameters

An important factor in QML Ansatz design is the number of trainable parameters. This consideration leads to a delicate balance. On one hand, we want our model to have enough expressivity to capture the target distribution. On the other hand, we want to limit the number of parameters, to provide faster training, minimize noise and potentially escape barren plateaus [20, 21].

In the case of the QCIBM and QCMRF models, the number of parameters depends mostly on the number of terms in Hamiltonian that generates the time evolution. However, these models cannot be compared directly, since in QCIBMs this number only depends on the number of qubits $n$ and in QCMRF circuits there is an explicit dependence on the graph topology and the clique factorization, but also the overlap of the cliques.

To represent a classical MN as complete factor-tables, one needs $\sum_{C\in\mathcal{C}}2^{|C|}$ parameters, $|C|$ referring to the size of clique $C$ . Consequently, our Hamiltonian before reparametrization has exactly these many parameters. This means, that the parameter count in a QCMRF Ansatz is

k_{QCMRF}\leq\sum\limits_{C\in\mathcal{C}}\sum\limits_{k=1}^{|C|}\binom{|C|}{k}+3n=\sum\limits_{C\in\mathcal{C}}(2^{|C|}-1)+3n.

(11)

We can see, that this number does not depend on the problem size, i.e. the number of binary random variables $n$ directly, only on the sizes of the cliques $l_{C}$ . This means, that assuming the clique sizes of a given graph topology is constant, without explicit dependence on the number of nodes, the parameter count is $\mathcal{O}(n)$ .

In the case of the problem-agnostic $2$ -local QCIBM Ansatz, this number is $k_{QCIBM}=n(n-1)/2+4n$ , which is $\mathcal{O}(n^{2})$ .

It is worth mentioning, that for pairwise MNs, meaning those defined by only pairwise interactions lead to two-body terms between the qubits representing connected nodes and one-body Pauli-Z terms on all qubits. These give rise to QCMRF circuits with less or equal number of parameters than the corresponding QCIBM Ansatz, where we have equality only for complete graphs.

A similar analysis can be performed for BBQC circuits as well. Here the number of parameters is exactly

k_{BBQC}=\sum\limits_{i=1}^{n}2^{|Pa^{\mathcal{G}}_{X_{i}}|}+3n,

(12)

where $|Pa^{\mathcal{G}}_{X_{i}}|$ refers to the number of parents of the $i$ -th node.

Since the conversion between BNs and MNs can introduce additional edges, this conversion can also introduce additional parameters. This means, that usually, the number of parameters in BBQCs is larger than in QCMRFs, when the underlying problem is defined by a MN. However, for chordal MNs, having maximal clique factorization, the number of parameters in the two model classes is equal.

B.2 Number of qubits

Since in all the models we consider, we map $n$ binary random variables to $n$ qubits. This means, that the number of qubits in all these cases is $n$ , but certain gate decompositions require ancillary qubits, increasing this number.

For the QCIBM models, since they only contain one- and two-qubit gates, we need no additional ancillas and the final number of qubits is $n$ . In the case of QCMRF circuits we have at least two options, we can use no ancillas leading to the same number, or we can use $n_{C}$ ancillas, where $n_{C}$ refers to the number of cliques, that are larger than $2$ . In this latter case, we end up with $n+n_{C}$ qubits. We have even more freedom for BBQC models. We can use the ancilla-free implementation of multi-controlled gates from [59], or we can use $l-1$ ancillary qubits to implement each multi-qubit gate with $l$ controls, where the ancillas can also be reused. It is also possible to interpolate between the two approaches, $n$ and $n+\sum_{i=1;|Pa^{\mathcal{G}}_{X_{i}}|\neq 0}^{n}(|Pa^{\mathcal{G}}_{X_{i}}|-1)$ . This lead to a trade-off between the number of qubits and circuit depth.

B.3 Circuit depth

Based on the implementation of multi-qubit gates, the depth of the circuit can also vary. In general, the more ancillas we use for decomposing these unitaries, the shallower the final circuit can be. For a fair comparison, here we only consider ancilla-free implementations and analyse the depth accordingly. We also assume full connectivity of qubits and parallel execution of gates that act on disjoint sets of qubits, where possible and use parametrized one-qubit $U_{f}$ and $CNOT$ gates as our basis gate set.

QCIBM circuits being problem agnostic, their depth can be estimated knowing only the number of qubits. The initial Hadamard gates along with the single-qubit $R_{Z}$ rotations and final $U_{f}$ gates can be implemented in constant depth. Since we assume parallel execution of gates and each two-qubit $\exp(-i\alpha ZZ)$ gate can be implemented in depth $3$ , the depth of the whole circuit scales $\mathcal{O}(n)$ .

For QCMRF circuits, the scaling is more complicated, since it depends on the clique-structure of the MN. Here we give an upper-bound on the depth, considering the Hamiltonian before reparametrization. For each clique $C$ , we have $2^{|C|}-1$ $MultiRZ$ gates, and there are ${|C|\choose k}$ $k$ -local gates for each $k\leq|C|$ . To minimize the depth, we can implement two gates in parallel in each step, such that the qubits they act on are complements of each other in the clique. In the ancilla-free case, each $k$ -local gate can be implemented in $2k-1$ depth. This means, that the depth required to implement such a clique is $\mathcal{O}(2^{|C|})$ . If we assume $|\mathcal{C}|$ cliques in the graph and the size of the largest maximal clique is $l_{m}$ , than the depth required is $\mathcal{O}(|\mathcal{C}|2^{l_{m}})$ . However, this is a crude upper bound in the worst case, not taking into account the parallel execution of gates of disjoint cliques. In practice, this scaling can be much better.

The analysis is similar for the BBQC models. Here for each node $X_{i}$ the number of multi-controlled $R_{Y}$ gates scales $\mathcal{O}(2^{|Pa^{\mathcal{G}}_{X_{i}}|})$ . Another difference is that we cannot implement these unitaries in parallel, which further increases the depth. By fixing the largest number of parents as $n^{p}_{m}$ , the depth scales $\mathcal{O}(n2^{n^{p}_{m}})$ in the worst case. It is easy to see, that for BNs and MNs based on chordal graphs, the size of the largest clique is equal to the maximal number of parents.

We compare these two models on the simple yet illustrative example of a graph composed of a single triangle. This is obviously a chordal graph, and the number of trainable parameters is equal in both circuit. In this case, the depth of the QCMRF circuit is $16$ . In the corresponding BN, the first node has no parents leading to depth $1$ . The second node has $1$ parent, leading to $2$ single-controlled $R_{Y}$ gates, each implemented in depth $4$ , thus depth $10$ with the $X$ gates. The final node has $2$ parents, meaning $4$ multi-controlled gates with two controls, each having depth $14$ , thus depth $60$ in total. The final depth of this circuit is $72$ with the $U_{f}$ gates before measurement. This example shows that BBQCs can be much deeper than QCMRF circuits on chordal graphs, while this is not reflected in the number of trainable parameters.

We further demonstrate this difference numerically on two types of graphs. In each case, we compiled the circuits to the given basis gate set using the transpilation method of Qiskit [60] with optimization level $3$ . First, we explore the scaling in the number of nodes for the loop graphs shown in Fig. 5. Here we can see linear dependence in the case of BBQC and $\mathcal{O}(1)$ for QCMRF, which is expected for this very special type of graph (see Fig. 10 left). While these loops capture an important class, they do not provide a fair comparison, since here the triangulation of the undirected graph introduces a significant number of additional parameters for the corresponding BBQC.

As a next comparison, we concentrate on an important class of BNs, called $k$ -gram models, used in natural language processing applications [61]. In the graph representation of a $k$ -gram model, the nodes form an ordered set, in which the parents of the $l$ -th node are nodes $\{l-k+1,\dots,l-1\}$ , where they exist. This induces the factorization

\begin{split}P(X_{1},\dots,X_{n})&=P(X_{1})P(X_{2}|X_{1})P(X_{3}|X_{1},X_{2})\dots\\ &\cdot P(X_{n}|X_{n-k+1},\dots,X_{n-1}).\end{split}

(13)

These models define chordal graphs, where the number of parameters is the same in both quantum models and the maximal number of parents in the BN is the same as the size of the maximal cliques in the corresponding MN. We can observe similar scaling in $k$ for both models, as shown in Fig. 10 (right), but the depth of BBQCs is always at least an order of magnitude higher than that of QCMRF circuits. While this is not significant in a complexity theoretic perspective, it makes a significant difference when implementing on real quantum hardware, especially on near-term devices.

References

[1] I. H. Sarker, “Machine learning: Algorithms, real-world applications and research directions,” SN Computer Science, vol. 2, no. 3, p. 160, 2021.
[2] S. Bond-Taylor, A. Leach, Y. Long, and C. G. Willcocks, “Deep generative modelling: A comparative review of VAEs, GANs, normalizing flows, energy-based and autoregressive models,” IEEE transactions on pattern analysis and machine intelligence, 2021.
[3] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, “Quantum machine learning,” Nature, vol. 549, no. 7671, pp. 195–202, 2017.
[4] M. Schuld, I. Sinayskiy, and F. Petruccione, “The quest for a quantum neural network,” Quantum Information Processing, vol. 13, pp. 2567–2586, 2014.
[5] B. M. Terhal and D. P. DiVincenzo, “Adaptive quantum computation, constant depth quantum circuits and arthur-merlin games,” Quantum Info. Comput., vol. 4, no. 2, pp. 134–145, 2004.
[6] S. Aaronson and A. Arkhipov, “The computational complexity of linear optics,” in Proceedings of the forty-third annual ACM symposium on Theory of computing, pp. 333–342, 2011.
[7] E. Farhi and A. W. Harrow, “Quantum supremacy through the quantum approximate optimization algorithm,” arXiv preprint arXiv:1602.07674, 2016.
[8] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends, R. Biswas, S. Boixo, F. G. Brandao, D. A. Buell, et al., “Quantum supremacy using a programmable superconducting processor,” Nature, vol. 574, no. 7779, pp. 505–510, 2019.
[9] L. S. Madsen, F. Laudenbach, M. F. Askarani, F. Rortais, T. Vincent, J. F. Bulmer, F. M. Miatto, L. Neuhaus, L. G. Helt, M. J. Collins, et al., “Quantum computational advantage with a programmable photonic processor,” Nature, vol. 606, no. 7912, pp. 75–81, 2022.
[10] J. Tian, X. Sun, Y. Du, S. Zhao, Q. Liu, K. Zhang, W. Yi, W. Huang, C. Wang, X. Wu, et al., “Recent advances for quantum neural networks in generative learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
[11] M. Benedetti, D. Garcia-Pintos, O. Perdomo, V. Leyton-Ortega, Y. Nam, and A. Perdomo-Ortiz, “A generative modeling approach for benchmarking and training shallow quantum circuits,” npj Quantum Information, vol. 5, no. 1, p. 45, 2019.
[12] J.-G. Liu and L. Wang, “Differentiable learning of quantum circuit Born machines,” Physical Review A, vol. 98, no. 6, p. 062324, 2018.
[13] B. Coyle, D. Mills, V. Danos, and E. Kashefi, “The Born supremacy: quantum advantage and training of an Ising Born machine,” npj Quantum Information, vol. 6, no. 1, p. 60, 2020.
[14] S. Lloyd and C. Weedbrook, “Quantum generative adversarial learning,” Physical review letters, vol. 121, no. 4, p. 040502, 2018.
[15] P.-L. Dallaire-Demers and N. Killoran, “Quantum generative adversarial networks,” Physical Review A, vol. 98, no. 1, p. 012324, 2018.
[16] M. H. Amin, E. Andriyash, J. Rolfe, B. Kulchytskyy, and R. Melko, “Quantum Boltzmann machine,” Physical Review X, vol. 8, no. 2, p. 021050, 2018.
[17] C. Zoufal, A. Lucchi, and S. Woerner, “Variational quantum Boltzmann machines,” Quantum Machine Intelligence, vol. 3, pp. 1–15, 2021.
[18] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, “Barren plateaus in quantum neural network training landscapes,” Nature communications, vol. 9, no. 1, p. 4812, 2018.
[19] Z. Holmes, K. Sharma, M. Cerezo, and P. J. Coles, “Connecting Ansatz expressibility to gradient magnitudes and barren plateaus,” PRX Quantum, vol. 3, no. 1, p. 010313, 2022.
[20] M. Ragone, B. N. Bakalov, F. Sauvage, A. F. Kemper, C. O. Marrero, M. Larocca, and M. Cerezo, “A unified theory of barren plateaus for deep parametrized quantum circuits,” arXiv preprint arXiv:2309.09342, 2023.
[21] E. Fontana, D. Herman, S. Chakrabarti, N. Kumar, R. Yalovetzky, J. Heredge, S. H. Sureshbabu, and M. Pistoia, “The adjoint is all you need: Characterizing barren plateaus in quantum Ansätze,” arXiv preprint arXiv:2309.07902, 2023.
[22] E. R. Anschuetz and B. T. Kiani, “Quantum variational algorithms are swamped with traps,” Nature Communications, vol. 13, no. 1, p. 7760, 2022.
[23] Y.-C. Ho and D. L. Pepyne, “Simple explanation of the no-free-lunch theorem and its implications,” Journal of optimization theory and applications, vol. 115, pp. 549–570, 2002.
[24] K. Poland, K. Beer, and T. J. Osborne, “No free lunch for quantum machine learning,” arXiv preprint arXiv:2003.14103, 2020.
[25] K. Sharma, M. Cerezo, Z. Holmes, L. Cincio, A. Sornborger, and P. J. Coles, “Reformulation of the no-free-lunch theorem for entangled datasets,” Physical Review Letters, vol. 128, no. 7, p. 070501, 2022.
[26] M. Larocca, F. Sauvage, F. M. Sbahi, G. Verdon, P. J. Coles, and M. Cerezo, “Group-invariant quantum machine learning,” PRX Quantum, vol. 3, no. 3, p. 030341, 2022.
[27] J. J. Meyer, M. Mularski, E. Gil-Fuster, A. A. Mele, F. Arzani, A. Wilms, and J. Eisert, “Exploiting symmetry in variational quantum machine learning,” PRX Quantum, vol. 4, no. 1, p. 010328, 2023.
[28] H. Zheng, Z. Li, J. Liu, S. Strelchuk, and R. Kondor, “Speeding up learning quantum states through group equivariant convolutional quantum ansätze,” PRX Quantum, vol. 4, no. 2, p. 020327, 2023.
[29] J. Bowles, V. J. Wright, M. Farkas, N. Killoran, and M. Schuld, “Contextuality and inductive bias in quantum machine learning,” arXiv preprint arXiv:2302.01365, 2023.
[30] D. Koller and N. Friedman, Probabilistic graphical models: principles and techniques. MIT press, 2009.
[31] D. Heckerman, A. Mamdani, and M. P. Wellman, “Real-world applications of Bayesian networks,” Communications of the ACM, vol. 38, no. 3, pp. 24–26, 1995.
[32] K. P. Murphy, Machine learning: a probabilistic perspective. MIT press, 2012.
[33] G. H. Low, T. J. Yoder, and I. L. Chuang, “Quantum inference on Bayesian networks,” Physical Review A, vol. 89, no. 6, p. 062315, 2014.
[34] S. E. Borujeni, S. Nannapaneni, N. H. Nguyen, E. C. Behrman, and J. E. Steck, “Quantum circuit representation of Bayesian networks,” Expert Systems with Applications, vol. 176, p. 114768, 2021.
[35] X. Gao, E. R. Anschuetz, S.-T. Wang, J. I. Cirac, and M. D. Lukin, “Enhancing generative models via quantum correlations,” Physical Review X, vol. 12, no. 2, p. 021037, 2022.
[36] E. Farhi, J. Goldstone, and S. Gutmann, “A quantum approximate optimization algorithm,” arXiv preprint arXiv:1411.4028, 2014.
[37] H. Krovi, “Average-case hardness of estimating probabilities of random quantum circuits with a linear scaling in the error exponent,” arXiv preprint arXiv:2206.05642, 2022.
[38] L. Von Rueden, S. Mayer, K. Beckh, B. Georgiev, S. Giesselbach, R. Heese, B. Kirsch, J. Pfrommer, A. Pick, R. Ramamurthy, et al., “Informed machine learning–a taxonomy and survey of integrating prior knowledge into learning systems,” IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 1, pp. 614–633, 2021.
[39] Y. Song and D. P. Kingma, “How to train your energy-based models,” arXiv preprint arXiv:2101.03288, 2021.
[40] V. Gogate, W. Webb, and P. Domingos, “Learning efficient Markov networks,” Advances in neural information processing systems, vol. 23, 2010.
[41] D. Roth, “On the hardness of approximate reasoning,” Artificial Intelligence, vol. 82, no. 1-2, pp. 273–302, 1996.
[42] M. S. Rudolph, S. Lerch, S. Thanasilp, O. Kiss, S. Vallecorsa, M. Grossi, and Z. Holmes, “Trainability barriers and opportunities in quantum generative modeling,” arXiv preprint arXiv:2305.02881, 2023.
[43] B. Coyle, M. Henderson, J. C. J. Le, N. Kumar, M. Paini, and E. Kashefi, “Quantum versus classical generative modelling in finance,” Quantum Science and Technology, vol. 6, no. 2, p. 024013, 2021.
[44] C. Chow and C. Liu, “Approximating discrete probability distributions with dependence trees,” IEEE transactions on Information Theory, vol. 14, no. 3, pp. 462–467, 1968.
[45] M. Benedetti, B. Coyle, M. Fiorentini, M. Lubasch, and M. Rosenkranz, “Variational inference with a quantum computer,” Physical Review Applied, vol. 16, no. 4, p. 044057, 2021.
[46] V. Bergholm, J. J. Vartiainen, M. Möttönen, and M. M. Salomaa, “Quantum circuits with uniformly controlled one-qubit gates,” Phys. Rev. A, vol. 71, p. 052330, May 2005.
[47] N. Piatkowski and C. Zoufal, “On quantum circuits for discrete graphical models,” arXiv preprint arXiv:2206.00398, 2022.
[48] A. Gilyén, Y. Su, G. H. Low, and N. Wiebe, “Quantum singular value transformation and beyond: exponential improvements for quantum matrix arithmetics,” in Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pp. 193–204, 2019.
[49] F. J. Kiwit, M. Marso, P. Ross, C. A. Riofrío, J. Klepsch, and A. Luckow, “Application-oriented benchmarking of quantum generative learning using QUARK,” in 2023 IEEE International Conference on Quantum Computing and Engineering (QCE), vol. 1, pp. 475–484, IEEE, 2023.
[50] V. Bergholm, J. Izaac, M. Schuld, C. Gogolin, S. Ahmed, V. Ajith, M. S. Alam, G. Alonso-Linaje, B. AkashNarayanan, A. Asadi, et al., “Pennylane: Automatic differentiation of hybrid quantum-classical computations,” arXiv preprint arXiv:1811.04968, 2018.
[51] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[52] A. Arrasmith, Z. Holmes, M. Cerezo, and P. J. Coles, “Equivalence of quantum barren plateaus to cost concentration and narrow gorges,” Quantum Science and Technology, vol. 7, no. 4, p. 045015, 2022.
[53] E. Y. Zhu, S. Johri, D. Bacon, M. Esencan, J. Kim, M. Muir, N. Murgai, J. Nguyen, N. Pisenti, A. Schouela, et al., “Generative quantum learning of joint probability distribution functions,” Physical Review Research, vol. 4, no. 4, p. 043092, 2022.
[54] M. J. Bremner, R. Jozsa, and D. J. Shepherd, “Classical simulation of commuting quantum computations implies collapse of the polynomial hierarchy,” Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 467, no. 2126, pp. 459–472, 2011.
[55] A. M. Dalzell, A. W. Harrow, D. E. Koh, and R. L. La Placa, “How many qubits are needed for quantum computational supremacy?,” Quantum, vol. 4, p. 264, 2020.
[56] K. Gili, M. Hibat-Allah, M. Mauri, C. Ballance, and A. Perdomo-Ortiz, “Do quantum circuit Born machines generalize?,” Quantum Science and Technology, vol. 8, no. 3, p. 035021, 2023.
[57] A. Glos, A. Krawiec, and Z. Zimborás, “Space-efficient binary optimization for variational quantum computing,” npj Quantum Information, vol. 8, no. 1, p. 39, 2022.
[58] B. Bakó, A. Glos, Ö. Salehi, and Z. Zimborás, “Prog-qaoa: Framework for resource-efficient quantum optimization through classical programs,” arXiv preprint arXiv:2209.03386, 2022.
[59] A. J. Da Silva and D. K. Park, “Linear-depth quantum circuits for multiqubit controlled gates,” Physical Review A, vol. 106, no. 4, p. 042602, 2022.
[60] Qiskit contributors, “Qiskit: An open-source framework for quantum computing,” 2023.
[61] C. Manning and H. Schutze, Foundations of statistical natural language processing. MIT press, 1999.