Application-level Benchmarking of Quantum Computers using Nonlocal Game Strategies

Jim Furches¹, Sarah Chehade², Kathleen Hamilton², Nathan Wiebe³,⁴,⁵, and Carlos Ortiz Marrero¹,⁶
¹Physical Detection Systems and Deployment Division, Pacific Northwest National Laboratory, Richland, WA 99354
²Quantum Computational Science Group, Oak Ridge National Laboratory, Oak Ridge, TN 37830
³Department of Computer Science, University of Toronto, ON M5S 1A1, Canada
⁴High Performance Computing Group, Pacific Northwest National Laboratory, Richland, WA 99354
⁵Canadian Institute for Advanced Research, Toronto, ON M5G 1M1, Canada
⁶Department of Computer Science, Colorado State University, Fort Collins, CO 80523
carlos.ortizmarrero@pnnl.gov

Abstract

In a nonlocal game, two noncommunicating players cooperate to convince a referee that they possess a strategy that does not violate the rules of the game. Quantum strategies allow players to optimally win some games by performing joint measurements on a shared entangled state, but computing these strategies can be challenging. We present a variational quantum algorithm that computes quantum strategies for nonlocal games by encoding the rules of a game into a Hamiltonian. We show how this algorithm can generate a short-depth optimal quantum strategy for a graph coloring game with a quantum advantage. We then evaluate this quantum strategy on fourteen different quantum hardware platforms to demonstrate its utility as a benchmark. Finally, we discuss potential sources of error that can explain the decreased performance observed on hardware and derive an expression for the number of samples required to accurately estimate the win rate in the presence of noise.

1 Introduction

Running even simple instances of quantum algorithms with a provable advantage is difficult given the current state of quantum hardware [1, 2]. It is therefore important to develop benchmarking tools and techniques that test and validate the uniquely quantum behavior of hardware predicted by quantum theory. In particular, recent work on quantum benchmarking has highlighted the importance of metrics that measure progress toward the quantum utility of useful algorithms [3].

Low-level benchmark metrics such as randomized benchmarking [4, 5, 6] aim to measure the average error rate of a gate set independently of the initial state or measurement scheme, but they are limited: for example, they cannot help identify sources of error in an algorithmic pipeline [7] and can overestimate gate fidelity in the presence of errors [8]. High-level benchmarks such as Quantum Volume [9] aim to measure the performance of the entire quantum computing stack, including all classical control systems, but this can be too broad a metric and does not necessarily capture the performance of useful quantum algorithms. In addition, both kinds of metrics are difficult to compute at scale and fail to capture the ability of a specific hardware platform to attain a quantum advantage.

Recent work on nonlocal games has begun to shed light on their utility for quantum hardware verification, quantum advantage, and self-testing [10, 11, 12, 13, 14]. In a nonlocal game, two noncommunicating players cooperate to convince a referee that they possess a strategy that does not violate the rules of a game. When players are allowed to use entanglement as a resource in their joint strategy, they can produce correlations that no classical strategy can replicate without communication, and can therefore win the game with higher probability. Nonlocal games have been historically important and provide a unique setting to explore the relationship between classical physics, quantum theory, and other non-signaling theories [15, 16, 17, 18]. An extensive body of research links these games to foundational problems in quantum physics, conjectures in operator algebras, and computational complexity theory [19, 20, 21]. Furthermore, advances in quantum information theory and combinatorics have revealed broad classes of games, such as graph coloring and graph homomorphism games [22, 23], with a provable quantum advantage when players incorporate quantum resources into their strategies, making them exciting experimental candidates for testing quantum hardware [24]. Finally, nonlocal games are classically verifiable: given the players' answers, a referee can check in polynomial time whether they satisfy the rules of the game.

Despite many breakthroughs in our theoretical understanding of nonlocal games, constructing optimal strategies for general nonlocal games remains a challenge. In this work, we propose a new methodology for constructing strategies using variational methods and outline the utility of the resulting strategies for benchmarking. Section 2 introduces nonlocal games and the relevant definitions. In Section 3, we propose a dual-phase optimization technique to find both the resource state and the measurement scheme of a quantum strategy for a nonlocal game. In Section 4, we demonstrate that our method finds optimal strategies for CHSH, an N-partite symmetric game, and the graph coloring game. For the graph coloring game, we find a short-depth perfect quantum strategy for a graph on $14$ vertices, which has been shown to be the smallest graph with a strict separation between classical and quantum strategies [25, 26]. We then test the performance of this short-depth strategy on $14$ superconducting quantum computing devices and highlight potential sources of error causing decreased performance on some of the devices tested. In Section 5, we outline how quantum strategies can be used to benchmark quantum devices, discuss their noise-robustness properties, and describe a procedure for estimating the win rate in the presence of shot noise.

2 Background

A nonlocal game of $N$ players $\mathcal{G}=(Q_{1},...,Q_{N},A_{1},...,A_{N},\lambda)$ (illustrated in Fig. 1) consists of a set of possible questions $Q_{j}$ that player $j$ receives from a referee and a set of answers $A_{j}$ that player $j$ is allowed to send to the referee, which the referee then evaluates against a rule function $\lambda:Q_{1}\times...\times Q_{N}\times A_{1}\times...\times A_{N}\to\{0,1\}$. The sets $Q_{j}$ and $A_{j}$ have cardinalities $m_{j}$ and $k_{j}$, respectively; in our work, we assume that every player has $m$ questions and $k$ answers. The game proceeds in the following steps:

1. The players are informed about the rules of the game $\lambda$ and, before the game starts, can collaborate to create a joint strategy, modeled as a conditional probability density of answers given questions, that maximizes their chance of satisfying the rules.
2. The players are then separated or isolated to prevent them from communicating. This is referred to as non-signaling: the distribution of each player's answer cannot depend on the questions received by the other players.
3. The referee tests the strategy by asking each player a question, $\mathbf{q}=[q_{1},...,q_{N}]$, and receiving their responses $\mathbf{a}=[a_{1},...,a_{N}]$, where $q_{i}\in Q_{i}$ and $a_{i}\in A_{i}$.
4. The players win a round if $\lambda(\mathbf{a}|\mathbf{q})=1$ and lose if $\lambda(\mathbf{a}|\mathbf{q})=0$. Multiple rounds are played with different questions to establish that the players have a valid strategy.

Figure 1: Flow of a nonlocal game. After formulating a strategy, Alice and Bob separate and cannot communicate. For a quantum strategy, they each take part of an entangled state $\rho$, and upon receiving a question $q$ from the referee, they perform a measurement on their respective subsystems, giving an answer $a\sim p(a|q)$. Finally, the referee receives their answers and verifies them against the rules $\lambda(a|q)$.

It is common for all players to share the same sets of possible questions $Q$ and answers $A$. In particular, synchronous games have rules requiring that two (or more) players give identical answers when asked the same question: $\lambda(a_{1},\dots,a_{i},\dots,a_{j},\dots,a_{N}|\tilde{\mathbf{q}})=0$ whenever $\tilde{q}_{i}=\tilde{q}_{j}$ and $a_{i}\neq a_{j}$, where $\tilde{\mathbf{q}}$ is a vector of questions. In our work, we only compute strategies for synchronous games, although the optimization procedure we propose in Section 3 applies to more general games.

Using the rules, we can define the value of the game as

$$\omega(\mathcal{G})=\sum_{q,a}\lambda(a|q)\,p(q)\,p(a|q), \tag{1}$$

where the sum is taken over all possible values of $q\in Q^{N}$ and $a\in A^{N}$ (we drop the vector notation for convenience). The distribution $p(q)$ of the questions asked is typically chosen to be uniform, and the behavior $p(a|q)$ is determined by the strategy that the players construct; this is the only term that the players can control to maximize their win rate. A strategy is said to be perfect if $\lambda(a|q)=0\implies p(a|q)=0$, in which case $\omega(\mathcal{G})=1$.

Classical strategies consist of a lookup table that indexes each player’s response to a particular question. It suffices to consider deterministic strategies since stochastic strategies involving shared randomness between the players cannot outperform deterministic strategies due to the linearity of the value of the game [17].
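To make Eq. (1) and the reduction to deterministic lookup tables concrete, here is a minimal Python sketch (ours, not the authors' code) that enumerates all 16 deterministic strategies for the two-player CHSH game of Section 4.1, using the rules of Eq. (9) and a uniform question distribution. The helper names `chsh_rule` and `game_value` are illustrative.

```python
import itertools
from fractions import Fraction

def chsh_rule(a, q):
    """CHSH rule of Eq. (9): answers must differ iff both questions are 1."""
    (qa, qb), (aa, ab) = q, a
    return int((aa ^ ab) == (qa & qb))

def game_value(rule, p_q, behavior, answers):
    """Eq. (1): omega = sum_{q,a} lambda(a|q) p(q) p(a|q)."""
    return sum(rule(a, q) * p_q[q] * behavior(a, q)
               for q in p_q for a in answers)

questions = list(itertools.product([0, 1], repeat=2))
answers = list(itertools.product([0, 1], repeat=2))
p_q = {q: Fraction(1, 4) for q in questions}  # uniform question distribution

best = Fraction(0)
# A deterministic strategy is a lookup table: each player's answer to question 0 and 1.
for alice in itertools.product([0, 1], repeat=2):
    for bob in itertools.product([0, 1], repeat=2):
        def behavior(a, q, alice=alice, bob=bob):
            # p(a|q) is 1 for the tabulated answer pair and 0 otherwise.
            return Fraction(int(a == (alice[q[0]], bob[q[1]])))
        best = max(best, game_value(chsh_rule, p_q, behavior, answers))

print(best)  # 3/4
```

The maximum, 3/4, is the classical CHSH win rate; exact `Fraction` arithmetic keeps the comparison free of rounding.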

Suppose that the players share a quantum state $\ket{\psi}\in\bigotimes_{i}\mathcal{H}_{i}$, and each player has a set of positive operator-valued measures (POVMs) with elements $\{\mathcal{M}_{a|q}\}$, which they apply to their own subsystem. With this setup, the players can generate the following conditional probability density:

$$p(a|q)=\mathrm{Tr}\left[\rho\left(\bigotimes_{i}\mathcal{M}_{a_{i}|q_{i}}\right)\right], \tag{2}$$

where $\rho=|\psi\rangle\langle\psi|$ . These densities are known as quantum strategies.
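A minimal numerical sketch of Eq. (2), assuming two players who share $\ket{\Phi^{+}}=(\ket{00}+\ket{11})/\sqrt{2}$ and measure projectively in the X-Z plane at the textbook CHSH angles (an assumption for illustration, not the strategy found later in this paper). It evaluates the CHSH win rate of Eq. (1) and checks that Alice's marginal distribution does not depend on Bob's question, i.e., that the strategy is non-signaling.

```python
import itertools
import numpy as np

Z = np.diag([1.0, -1.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)

def projector(t, a):
    """Projector onto outcome a (0 <-> +1, 1 <-> -1) of the observable cos(t) Z + sin(t) X."""
    obs = np.cos(t) * Z + np.sin(t) * X
    return (I2 + (-1) ** a * obs) / 2

# Shared state |Phi+> and rho = |psi><psi|.
psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho = np.outer(psi, psi.conj())

# Assumed measurement angles: Alice {0, pi/2}, Bob {pi/4, -pi/4}.
alice_t = [0.0, np.pi / 2]
bob_t = [np.pi / 4, -np.pi / 4]

def p(a, q):
    """Eq. (2): p(a|q) = Tr[rho (M_{a_A|q_A} (x) M_{a_B|q_B})]."""
    M = np.kron(projector(alice_t[q[0]], a[0]), projector(bob_t[q[1]], a[1]))
    return float(np.real(np.trace(rho @ M)))

pairs = list(itertools.product([0, 1], repeat=2))
# CHSH win condition: answers differ iff both questions are 1; p(q) = 1/4.
value = sum(0.25 * p(a, q) for q in pairs for a in pairs
            if (a[0] ^ a[1]) == (q[0] & q[1]))
print(round(value, 6))  # 0.853553, i.e. cos^2(pi/8)

# No-signaling: Alice's marginal does not depend on Bob's question.
for qa in (0, 1):
    marg = [sum(p((0, ab), (qa, qb)) for ab in (0, 1)) for qb in (0, 1)]
    assert np.isclose(marg[0], marg[1])
```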

Beyond the definition above, there are a variety of competing definitions of a quantum strategy, depending on the choice of axioms describing how joint measurements between two parties are performed [27]. In our work, we only consider strategies as defined above, but the study of quantum strategies is a very active area of research [28, 29, 30, 31]. Note that in [32], the authors prove that for synchronous games, a maximally entangled state is sufficient for a quantum strategy to win the graph coloring game.

A game exhibits a quantum advantage if there exists a quantum strategy that performs better than the best possible classical strategy, in which case some quantum strategies violate a Bell inequality $\mathcal{I}$. Bell inequalities have historically served as experimental tests of local realism [33]: they are constraints satisfied by all classical (local hidden-variable) models and are often linear in the probabilities. More specifically, a Bell inequality consists of a function $\mathcal{I}$ of the probabilities $\{p(a|q)\}$ such that

$$\mathcal{I}(\{p(a|q)\})\leq\xi, \tag{3}$$

for some $\xi\in\mathbb{R}$ that bounds all classical strategies. Bell inequalities are a central object in the self-testing of states [12]. Such functions $\mathcal{I}$ and constants $\xi$ are constructed as follows: for a given Bell inequality $\mathcal{I}=\sum_{q,a}w_{q,a}p(a|q)$, where $w_{q,a}$ are weights, there is a corresponding Bell operator $\mathcal{B}=\sum_{q,a}w_{q,a}\bigotimes_{i}\mathcal{M}_{a_{i}|q_{i}}$, whose expectation $\mathrm{Tr}(\mathcal{B}\rho)$ is the value of $\mathcal{I}$ attained by the quantum strategy; a violation occurs when $\mathrm{Tr}(\mathcal{B}\rho)>\xi$. Denote by $\xi_{Q}$ the maximal value achievable with quantum resources, and consider the shifted Bell operator $\xi_{Q}\mathbbm{1}-\mathcal{B}$. If the shifted Bell operator admits a decomposition

$$\xi_{Q}\mathbbm{1}-\mathcal{B}=\sum_{\gamma}P_{\gamma}^{\dagger}P_{\gamma}, \tag{4}$$

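where each $P_{\gamma}$ is a polynomial in the measurement operators $\{\mathcal{M}_{a_{i}|q_{i}}\}$, then we call the decomposition a sum of squares (SOS) for the Bell inequality. Because each term $P_{\gamma}^{\dagger}P_{\gamma}$ is positive semidefinite, an SOS certifies that $\mathrm{Tr}(\mathcal{B}\rho)\leq\xi_{Q}$ for every state $\rho$. Such decompositions are, in general, extremely hard to find [34].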
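For the CHSH Bell operator of Section 4.1, a standard SOS of this kind (well known from the literature on the Tsirelson bound, not derived in this paper) is $2\sqrt{2}\,\mathbbm{1}-\mathcal{B}=\frac{1}{\sqrt{2}}\left[\left(A_0-\tfrac{B_0+B_1}{\sqrt{2}}\right)^2+\left(A_1-\tfrac{B_0-B_1}{\sqrt{2}}\right)^2\right]$, valid for any $\pm1$-valued observables with Alice's commuting with Bob's. The sketch below checks this identity numerically for randomly chosen qubit observables, together with the resulting bound on the largest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
paulis = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]]),
          np.array([[1, 0], [0, -1]], dtype=complex)]
I2 = np.eye(2)

def random_pm1_observable():
    """A random qubit observable n . sigma with |n| = 1, so O^2 = I (outcomes +-1)."""
    n = rng.normal(size=3)
    n /= np.linalg.norm(n)
    return sum(c * s for c, s in zip(n, paulis))

# Alice acts on the first qubit, Bob on the second, so their operators commute.
A0, A1 = (np.kron(random_pm1_observable(), I2) for _ in range(2))
B0, B1 = (np.kron(I2, random_pm1_observable()) for _ in range(2))

bell = A0 @ B0 + A1 @ B0 + A0 @ B1 - A1 @ B1   # CHSH Bell operator
P1 = A0 - (B0 + B1) / np.sqrt(2)
P2 = A1 - (B0 - B1) / np.sqrt(2)
sos = (P1.conj().T @ P1 + P2.conj().T @ P2) / np.sqrt(2)
shifted = 2 * np.sqrt(2) * np.eye(4) - bell

print(np.allclose(shifted, sos))                                  # True
print(np.linalg.eigvalsh(bell).max() <= 2 * np.sqrt(2) + 1e-12)   # True
```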

3 Method

In this section, we present a variational quantum algorithm for computing quantum strategies of nonlocal games. Let $\ket{\psi}$ be the shared entangled state between the players and $\mathcal{M}_{a|q}=\bigotimes_{i}\mathcal{M}_{a_{i}|q_{i}}$ be the joint POVM applied to that state for question $q$ , returning $a$ with probability $p(a|q)=\langle\mathcal{M}_{a|q}\rangle$ . It was noted in [35] that fixing these measurement operators gives a Hamiltonian whose ground state is the optimal shared state for this measurement setting. This fact has been used with reinforcement learning to optimize measurements while selecting the optimal shared state through exact diagonalization [36].

3.1 Dual-Phase Optimization

Our approach is a dual-phase optimization (DPO) that alternates between two phases: preparing the optimal state $\ket{\psi}$ for fixed measurements $\{\mathcal{M}_{a|q}\}$, and optimizing the measurements while the shared state is held fixed. We assume that the players parameterize their state as $\ket{\psi}\rightarrow\ket{\psi(\theta)}$ and their POVMs as $\mathcal{M}_{a|q}\rightarrow\mathcal{M}_{a|q}(\phi)$. The particular choice of parameterization depends on characteristics of the game (e.g., the number of qubits required depends on the number of answers).

Algorithm 1 Dual-Phase Optimization
  Initialize $\phi$ randomly
  while $\Delta\langle H\rangle>\epsilon$ do
    Construct $H(\phi)$
    $\ket{\psi(\theta)}\leftarrow VQE(H(\phi))$
    $\phi\leftarrow GD(\langle H(\phi)\rangle)$
  end while

The Hamiltonian depends on the specific measurement scheme the players choose, which in turn depends on the game. In Section 3.2, we outline a method for constructing a Hamiltonian from arbitrary game rules $\lambda$ and measurements $\{\mathcal{M}_{a|q}\}$.

The optimal shared state is prepared in the first phase using any VQE procedure $VQE(\cdot)$. Here, we choose ADAPT-VQE [37] because it generates compact variational circuits suited to near-term quantum hardware, but any other solver can also be used (see Appendix B). The reference state $\ket{\psi_{0}}$ can be a product state, e.g., $\ket{0}$ or $\ket{+}$ on every qubit. We choose a qubit operator pool consisting of all possible Pauli strings $P$ acting on the entire system. Each operator added to the state takes the form $e^{i\theta P}$, giving $\ket{\psi(\theta)}=\prod_{j=N}^{1}e^{i\theta_{j}P_{j}}\ket{\psi_{0}}$; such operators can generate the entanglement required to win nonlocal games, provided they act non-trivially on at least 2 qubits.

The second phase uses a gradient-descent-like optimizer $GD(\cdot)$ to update the measurement parameters $\phi$. This requires computing the gradients $\nabla_{\phi}\braket{\psi(\theta)|H(\phi)|\psi(\theta)}$ on the quantum device, which can be done with parameter-shift rules [38, 39]. In Appendix F, we outline the cost of computing this gradient for larger problem instances and discuss some optimization challenges.
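As a sketch of the parameter-shift rule mentioned above: for a gate $R_y(\phi)=e^{-i\phi Y/2}$, the expectation value has the form $f(\phi)=A+B\cos\phi+C\sin\phi$, so $f'(\phi)=[f(\phi+\pi/2)-f(\phi-\pi/2)]/2$ holds exactly. The snippet below compares this with a central finite difference; the test state, observable, and angle are arbitrary choices for illustration.

```python
import numpy as np

Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)

def ry(phi):
    """R_y(phi) = exp(-i phi Y / 2) = cos(phi/2) I - i sin(phi/2) Y."""
    return np.cos(phi / 2) * np.eye(2) - 1j * np.sin(phi / 2) * Y

def expval(phi, psi, obs=Z):
    """f(phi) = <psi| R_y(phi)^dagger obs R_y(phi) |psi>."""
    U = ry(phi)
    return float(np.real(psi.conj() @ U.conj().T @ obs @ U @ psi))

def parameter_shift_grad(phi, psi):
    """Two-term parameter-shift rule for a gate generated by a Pauli / 2."""
    return (expval(phi + np.pi / 2, psi) - expval(phi - np.pi / 2, psi)) / 2

psi = np.array([np.cos(0.3), np.exp(0.7j) * np.sin(0.3)])  # arbitrary test state
phi, h = 0.4, 1e-6
finite_diff = (expval(phi + h, psi) - expval(phi - h, psi)) / (2 * h)
print(np.isclose(parameter_shift_grad(phi, psi), finite_diff, atol=1e-6))  # True
```

Unlike the finite difference, the shifted evaluations are macroscopically separated, which is why this rule is preferred when each expectation value is estimated from a finite number of shots.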

3.2 Game Hamiltonians

As mentioned above, DPO requires constructing a Hamiltonian from the players' measurements, which determine the quantum strategy. Player $i$ may measure their qubits $\rho_{i}$ in an arbitrary basis that depends on the question, so the measurement operators take the form

\begin{align}
\mathcal{M}_{a|q}&=\bigotimes_{i}U_{iq_{i}}^{\dagger}P_{a_{i}}U_{iq_{i}} \tag{5}\\
&=U_{q}^{\dagger}P_{a}U_{q}, \tag{6}
\end{align}

where $P_{a_{i}}=\outerproduct{a_{i}}{a_{i}}$ is the projector onto answer $a_{i}$ , and $U_{iq_{i}}$ acts on $\rho_{i}$ in response to question $q_{i}$ . Because $\langle\mathcal{M}_{a|q}\rangle=p(a|q)$ , we can substitute this into (1) to construct the game operator

\begin{align}
\beta&=\sum_{q,a}\lambda(a|q)\,p(q)\,\mathcal{M}_{a|q} \tag{7}\\
&=\sum_{q}p(q)\,U_{q}^{\dagger}\left(\sum_{a}\lambda(a|q)P_{a}\right)U_{q} \tag{8}
\end{align}

with the property $\braket{\beta}=\omega(\mathcal{G})$ . A VQE finds the ground state of a Hamiltonian, so to maximize the win rate, we use a value Hamiltonian $H=-\beta$ in DPO.
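A toy, fully classical sketch of Algorithm 1 for CHSH, assuming one qubit per player and measurements in the X-Z plane parameterized by four angles $\phi=[a_0,a_1,b_0,b_1]$. Exact diagonalization of $H=-\beta$ stands in for VQE (so no ADAPT operator pool is modeled), and the second phase takes plain gradient-ascent steps on $\braket{\beta}$ using the parameter-shift rule, which is exact here because each angle enters $\beta$ only through a single cosine and sine. All function names and hyperparameters are hypothetical.

```python
import itertools
import numpy as np

Z = np.diag([1.0, -1.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)

def proj(t, a):
    """Projector for outcome a of the observable cos(t) Z + sin(t) X."""
    return (I2 + (-1) ** a * (np.cos(t) * Z + np.sin(t) * X)) / 2

def chsh_rule(a, q):
    """Eq. (9): answers differ iff both questions are 1."""
    return (a[0] ^ a[1]) == (q[0] & q[1])

def beta(phi):
    """Game operator of Eqs. (7)-(8) for CHSH, uniform p(q); phi = [a0, a1, b0, b1]."""
    B = np.zeros((4, 4))
    for q in itertools.product([0, 1], repeat=2):
        for a in itertools.product([0, 1], repeat=2):
            if chsh_rule(a, q):
                B += 0.25 * np.kron(proj(phi[q[0]], a[0]), proj(phi[2 + q[1]], a[1]))
    return B

def expect(psi, phi):
    return float(psi @ beta(phi) @ psi)

def dpo(phi, outer=60, inner=10, lr=1.0, tol=1e-12):
    """Toy DPO: alternate exact ground states of H = -beta with gradient steps on phi."""
    value = -np.inf
    for _ in range(outer):
        _, vecs = np.linalg.eigh(-beta(phi))   # phase 1: stands in for VQE
        psi = vecs[:, 0]                       # ground state of H(phi)
        for _ in range(inner):                 # phase 2: psi held fixed
            grad = np.zeros_like(phi)
            for k in range(len(phi)):
                e_k = np.zeros_like(phi)
                e_k[k] = np.pi / 2
                grad[k] = (expect(psi, phi + e_k) - expect(psi, phi - e_k)) / 2
            phi = phi + lr * grad              # ascent on <beta> = descent on <H>
        new_value = expect(psi, phi)
        converged = abs(new_value - value) < tol
        value = new_value
        if converged:
            break
    return phi, value

rng = np.random.default_rng(1)
phi_opt, value = dpo(rng.uniform(-np.pi, np.pi, size=4))
print(value, np.cos(np.pi / 8) ** 2)
```

In this toy setting $\braket{\beta}$ never decreases across iterations: re-diagonalizing can only raise the value for the current $\phi$, and the step size is small relative to the curvature of $\braket{\beta}$ in $\phi$. From generic starting angles we expect the value to approach the quantum optimum $\cos^{2}(\pi/8)\approx 0.8536$, but this is not guaranteed in general; for larger games, local minima occur, as reported in Section 4.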

Proposition 3.1.

The value $\braket{\beta}=1$ if and only if the players have a perfect quantum strategy, otherwise $\braket{\beta}<1$ .

Proof 3.1.

We compute $\braket{\beta}$ directly. Write $p(q,a)=p(q)p(a|q)$ for the joint distribution of questions and answers, so that $\sum_{q,a}p(q,a)=1$. Let $P=\{(q,a)~|~\lambda(a|q)=1\}$ be the set of question-answer pairs that satisfy the rules, and let $I=\{(q,a)~|~\lambda(a|q)=0,~p(q,a)>0\}$ be the set of rule-violating pairs that the strategy produces with nonzero probability. Since $\lambda\in\{0,1\}$ and $\braket{\mathcal{M}_{a|q}}=p(a|q)$,

$$\braket{\beta}=\sum_{q,a}\lambda(a|q)\,p(q,a)=\sum_{(q,a)\in P}p(q,a)=1-\sum_{(q,a)\in I}p(q,a).$$

For a perfect quantum strategy, $\lambda(a|q)=0\implies p(a|q)=0$, so $I=\emptyset$ and $\braket{\beta}=1$.

For an imperfect strategy, $I\neq\emptyset$, and every pair in $I$ has $p(q,a)>0$ by definition, so the last sum is strictly positive and $\braket{\beta}<1$. We conclude that $\braket{\beta}\leq 1$, with $\braket{\beta}=1$ iff the strategy is perfect.

$\square$

To parameterize this Hamiltonian for DPO, a general single-qubit unitary $U_{1}$ may be decomposed into 3 parameters, leading to a parameterization of the full measurement gate $U_{q}=\bigotimes_{i,j_{i}}U_{1}(\phi_{iq_{i}j_{i}})$ , where $i$ indexes the player, $q_{i}$ indexes the question, and $j_{i}$ indexes the particular qubit of player $i$ . In measurement layers acting on multiple qubits, we expand each entry of $\phi_{iq_{i}j_{i}}$ to be the concatenated vector of parameters for all gates applied to that qubit, i.e. $U_{q}=\bigotimes_{i}U_{N_{i}}(\phi_{iq_{i}})$ , where $U_{N_{i}}$ is an $N_{i}$ -qubit unitary.

4 Experiments

To evaluate the performance of DPO, we apply it to several nonlocal games with known quantum bounds: CHSH, the N-partite symmetric (NPS) game [36], and the odd-cycle game [14]. Then, we use DPO to explicitly compute an optimal quantum strategy for the coloring game of a $14$ vertex graph called $G_{14}$ [25]. This strategy is then evaluated on quantum hardware, demonstrating that it can be used to benchmark the nonlocal capabilities of quantum devices and find sources of errors.

4.1 CHSH

The Clauser-Horne-Shimony-Holt (CHSH) scenario [33] is the simplest nonlocal game that admits a quantum advantage. CHSH features 2 players, Alice and Bob, who each receive a question $q_{i}\in Q=\{0,1\}$ and answer $a_{i}\in A=\{0,1\}$. The Bell operator takes the familiar form $\mathcal{I}=A_{0}B_{0}+A_{1}B_{0}+A_{0}B_{1}-A_{1}B_{1}$, obtained from the rules

$$\lambda(a|q)=\begin{cases}a_{a}\oplus a_{b}, & \text{if } q_{a}=q_{b}=1,\\ \delta_{a_{a},a_{b}}, & \text{otherwise,}\end{cases} \tag{9}$$

together with the substitution $\lambda=0\rightarrow\lambda=-1$ in (8). Here $A_{q}$ denotes Alice's measurement observable for question $q$, and likewise $B_{q}$ for Bob. All classical strategies satisfy $\braket{\mathcal{I}}\leq 2$, whereas quantum strategies can reach $\braket{\mathcal{I}}=2\sqrt{2}$; in the notation of (3), $\xi=2$ and $\xi_{Q}=2\sqrt{2}$. Sharing a Bell state and performing appropriate single-qubit measurements suffices to attain this bound.

We applied DPO to the CHSH game using $R_{y}(\phi)=e^{-i\frac{\phi}{2}Y}$ gates as the $U_{q}$ operators. In the first iteration, ADAPT chose $\ket{\psi(\theta)}=e^{i\frac{\pi}{4}YX}\ket{00}$, which correctly generates the Bell state $\ket{\Phi^{-}}$. In the second phase of that iteration, the measurement parameters were optimized to $\phi\approx[0,-\pi/2,\pi/4,-\pi/4]$ (with $\phi_{a0}$ constrained to 0), giving the optimal inequality value $2\sqrt{2}$.
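The reported CHSH solution can be checked directly. The sketch below assumes that $\phi$ is ordered as $[\phi_{a0},\phi_{a1},\phi_{b0},\phi_{b1}]$ and that measuring $Z$ after $R_y(\phi)$ implements the observable $R_y(\phi)^{\dagger}ZR_y(\phi)$; under these assumptions, it confirms that $e^{i\frac{\pi}{4}YX}\ket{00}=\ket{\Phi^{-}}$ and that the reported angles attain $\braket{\mathcal{I}}=2\sqrt{2}$.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1, -1]).astype(complex)

def pauli_exp(theta, P):
    """exp(i theta P) = cos(theta) I + i sin(theta) P, valid because P @ P = I."""
    return np.cos(theta) * np.eye(P.shape[0]) + 1j * np.sin(theta) * P

def observable(phi):
    """Measuring Z after U = R_y(phi) = exp(-i phi Y / 2) measures U^dagger Z U."""
    U = pauli_exp(-phi / 2, Y)
    return U.conj().T @ Z @ U

def corr(A, B, psi):
    """Correlator <psi| A (x) B |psi> for +-1-valued observables A and B."""
    return float(np.real(psi.conj() @ np.kron(A, B) @ psi))

# State chosen by ADAPT in the text: exp(i pi/4 YX)|00>.
psi = pauli_exp(np.pi / 4, np.kron(Y, X)) @ np.array([1, 0, 0, 0], dtype=complex)
phi_minus = np.array([1, 0, 0, -1]) / np.sqrt(2)
print(np.allclose(psi, phi_minus))  # True

# Assumed ordering [phi_a0, phi_a1, phi_b0, phi_b1] of the reported angles.
A0, A1, B0, B1 = (observable(t) for t in (0.0, -np.pi / 2, np.pi / 4, -np.pi / 4))
chsh = corr(A0, B0, psi) + corr(A1, B0, psi) + corr(A0, B1, psi) - corr(A1, B1, psi)
print(np.isclose(chsh, 2 * np.sqrt(2)))  # True
```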

4.2 N-partite Symmetric

The NPS scenario [40] involves correlations among $N$ players, each receiving a binary question $q_{i}\in\{0,1\}$ and returning a dichotomic answer $a_{i}=\pm 1$. The inequality is expressed in terms of one-body and symmetric two-body correlators,

$$S_{q}=\sum_{i}\braket{\mathcal{M}_{iq}},\qquad S_{qq^{\prime}}=\sum_{i\neq j}\braket{\mathcal{M}_{iq}\mathcal{M}_{jq^{\prime}}}, \tag{10}$$

where $\mathcal{M}_{iq}=U_{iq}^{\dagger}Z_{i}U_{iq}$ is player $i$'s measurement observable for question $q$. The classical bound on the correlations is

$$\mathcal{I}=-2S_{0}+\tfrac{1}{2}S_{00}-S_{01}+\tfrac{1}{2}S_{11}+2N\geq 0, \tag{11}$$

with negative values only achievable with quantum strategies [36].
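Because (11) is linear in the probabilities, its classical minimum is attained by a deterministic local strategy, in which each player answers $\pm1$ as a fixed function of their question. The brute-force sketch below (ours, for illustration) enumerates all $4^{6}=4096$ such strategies for $N=6$ and confirms that none of them drives $\mathcal{I}$ below 0. It evaluates $2\mathcal{I}$ so that all arithmetic stays in integers.

```python
import itertools

def nps_value_times_two(strategy):
    """2*I from Eq. (11) for a deterministic local strategy.

    strategy[i] = (a_i0, a_i1): player i's +-1 answer to question 0 and to question 1.
    Doubling keeps the 1/2 coefficients integral.
    """
    n = len(strategy)
    off_diag = [(i, j) for i in range(n) for j in range(n) if i != j]
    S0 = sum(s[0] for s in strategy)
    S00 = sum(strategy[i][0] * strategy[j][0] for i, j in off_diag)
    S01 = sum(strategy[i][0] * strategy[j][1] for i, j in off_diag)
    S11 = sum(strategy[i][1] * strategy[j][1] for i, j in off_diag)
    return -4 * S0 + S00 - 2 * S01 + S11 + 4 * n

N = 6
local_choices = list(itertools.product([1, -1], repeat=2))
values = [nps_value_times_two(s) for s in itertools.product(local_choices, repeat=N)]
print(min(values) / 2)  # 0.0: no deterministic strategy violates Eq. (11)
```

This agrees with a short calculation: writing $n_{+-}$, $n_{-+}$, $n_{--}$ for the numbers of players whose answers to questions $(0,1)$ are $(+1,-1)$, $(-1,+1)$, $(-1,-1)$, and $d=n_{+-}-n_{-+}$, one finds $\mathcal{I}=2d(d-1)+4n_{--}\geq 0$ for every deterministic strategy.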

We ran 50 DPO trials for the $N=6$ case (Fig. 2). Our algorithm encounters some local minima, particularly at the classical bound $\mathcal{I}=0$, but still succeeds in 19 of 50 attempts, reaching $\mathcal{I}=-0.258$, the value also found in [36]. In total, 29 of the 50 trials violated the classical bound.

4.3 Chromatic Number Game

The objective of the chromatic number game [25] is to color a graph $G=(V,E)$ in such a way that adjacent vertices are never given the same color. If this can be done using $c$ colors, we call this a $c$ -coloring of the graph. It has been shown recently that winning strategies for this game generate the set of all possible correlations for synchronous nonlocal games [41]. This differs from the other nonlocal games we mentioned, as the sets of questions and answers are much larger, and each player requires more qubits to encode their answer. The referee asks a question $q=[v_{a},v_{b}]\in\{(v,v)|v\in V\}\cup E$ , and the players respond with colors $a=[c_{a},c_{b}]\in\{1,\dots,c\}^{2}$ . The rules are given by

$$\lambda(a|q)=\begin{cases}\delta_{c_{a},c_{b}} & \text{if } v_{a}=v_{b},\\ 1-\delta_{c_{a},c_{b}} & \text{if } (v_{a},v_{b})\in E.\end{cases} \tag{12}$$

From these rules, one can encode the graph-coloring game into a Hamiltonian for DPO. For convenience, we denote a vertex question $[v,v]$ as $v$ , and an edge question $[v_{a},v_{b}]\in E$ as $e$ . Let the answers also be given by $c=[c_{a},c_{b}]$ . Then, the expression in (8) gives

$$\beta=\frac{1}{|Q|}\left[\sum_{v}U_{v}^{\dagger}P_{cc}U_{v}+\sum_{e}U_{e}^{\dagger}(I-P_{cc})U_{e}\right], \tag{13}$$

where $P_{cc}=\sum_{c}\ket{cc}\bra{cc}$ is the projector onto the subspace of matching colors, and $|Q|=|V|+|E|$ . Intuitively, the first term maximizes $p(c_{a}=c_{b}|v)$ , and the second term maximizes $p(c_{a}\neq c_{b}|e)=1-p(c_{a}=c_{b}|e)$ . Note that to ensure that all possible questions are asked to each player, $E$ contains both edges $(v_{a},v_{b})$ and their reverse $(v_{b},v_{a})$ .
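To illustrate (13) and Proposition 3.1 on the smallest possible instance, the sketch below builds $\beta$ for a hypothetical graph consisting of a single edge, with 2 colors and one qubit per player. The players share $\ket{00}$ and use a classical 2-coloring: each "measurement layer" simply flips a qubit so that the player outputs the color of their vertex, with $U_e$ on both sides of the edge term. Since this deterministic strategy never violates the rules, $\braket{\beta}=1$.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
P_cc = np.diag([1.0, 0.0, 0.0, 1.0])  # |00><00| + |11><11| for one qubit per player

vertices = [0, 1]
edges = [(0, 1), (1, 0)]          # both directions, as in the text
coloring = {0: 0, 1: 1}           # hypothetical classical 2-coloring of a single edge

def flip(c):
    """Single-qubit gate that maps |0> to |c>."""
    return X if c else I2

def U(va, vb):
    """Deterministic layer for question [va, vb]: each player outputs their vertex's color."""
    return np.kron(flip(coloring[va]), flip(coloring[vb]))

num_q = len(vertices) + len(edges)  # |Q| = |V| + |E|
beta = (sum(U(v, v).T @ P_cc @ U(v, v) for v in vertices)
        + sum(U(*e).T @ (np.eye(4) - P_cc) @ U(*e) for e in edges)) / num_q

ket00 = np.array([1.0, 0.0, 0.0, 0.0])
print(ket00 @ beta @ ket00)  # 1.0: a perfect strategy, as in Proposition 3.1
print(np.linalg.eigvalsh(beta).max())
```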

First, we consider the odd-cycle game [17], defined on a cycle graph $G(n)$ with an odd number $n$ of vertices. The players are restricted to 2 colors, $c_{a},c_{b}\in\{0,1\}$, so there is no perfect classical strategy: odd cycles are not 2-colorable (their chromatic number is 3). Indeed, the optimal classical win rate is $\omega_{c}(n)=1-1/(2n)$. The optimal quantum strategy achieves a higher rate of $\omega_{q}(n)=\cos^{2}[\pi/(4n)]$, yielding a separation, but no perfect quantum strategy exists. This game is also of particular interest because it was recently demonstrated experimentally with a spatially separated pair of trapped ions, showing a quantum advantage for cycles of up to $n=19$ vertices [14].

Fig. 3 shows the distribution of strategies discovered by our algorithm with a measurement layer of one $R_{Y}$ gate per player. We evaluated each game instance $G(n)$ with 25 trials. In all instances, the best discovered strategy was within $10^{-8}$ of the optimal quantum value. The algorithm sometimes became stuck in local minima near the classical value $\omega_{c}$, but the median values were within the algorithm tolerance, showing that it can find graph coloring strategies for the odd-cycle game.

Now we focus on the quantum chromatic game for the graph $G_{14}$ (Fig. 4). For this graph, there exists a perfect quantum strategy with $4$ colors, whereas any classical coloring strategy requires $5$ [25]; finding the smallest number of colors for a classical strategy is equivalent to finding the chromatic number of the graph [42]. This graph was conjectured to be the smallest graph with a strict separation between classical and quantum coloring strategies, and this was subsequently proved [25, 26]. In [25], a construction was provided using an orthogonal representation of $G_{14}$, that is, a map from vertices to vectors in $\mathbb{R}^{4}$ such that adjacent vertices are assigned orthogonal vectors. These vectors define a set of projective measurement operators that, acting on the maximally entangled state, give a perfect quantum strategy for coloring the graph. However, it is unclear how to obtain from these projective measurements an explicit set of ansätze for a short-depth circuit that can be executed on near-term hardware (see Appendix C). We use the DPO algorithm to generate a perfect (up to numerical precision) quantum strategy for this graph.

To simplify the search for strategies, we restrict each player to 2 qubits, since 2 qubits suffice to represent $c\in\{0,...,\chi_{q}-1\}$ in a binary encoding, where $\chi_{q}=4$ is the quantum chromatic number of $G_{14}$. We also impose a known constraint satisfied by optimal strategies for synchronous games [25, 28]: Bob's measurement operators are the complex conjugates of Alice's, which halves the number of measurement parameters $\phi$. Each player's measurement layer consists of general single-qubit $U$ gates, a CNOT from qubit 0 to qubit 1, and $R_{y}$ gates on each qubit, resulting in 8 parameters per question, or 112 in total (the U3Ry layer in our code).
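A sketch of a 2-qubit U3Ry measurement layer as described above, assuming the gate order $U$, then CNOT from qubit 0 to qubit 1, then $R_y$, and the convention that qubit 0 is the first tensor factor. The $U$ gate follows the same matrix convention as Qiskit's `UGate`. The function names and random parameters are illustrative; the actual layer and the 112 trained parameters are in the authors' repository. With one 8-parameter layer per vertex question for Alice and Bob's layers fixed to the complex conjugates, the count is $14\times 8=112$.

```python
import numpy as np

def u3(theta, phi, lam):
    """General single-qubit gate, same matrix convention as Qiskit's UGate."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -np.exp(1j * lam) * s],
                     [np.exp(1j * phi) * s, np.exp(1j * (phi + lam)) * c]])

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

# CNOT with control qubit 0 (first tensor factor) and target qubit 1.
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

def u3ry_layer(params):
    """Hypothetical 2-qubit layer with 8 parameters: U on each qubit, CNOT 0->1, R_y on each."""
    assert len(params) == 8
    first = np.kron(u3(*params[0:3]), u3(*params[3:6]))
    last = np.kron(ry(params[6]), ry(params[7]))
    return last @ CNOT @ first

rng = np.random.default_rng(0)
n_vertices, params_per_question = 14, 8
phi = rng.uniform(-np.pi, np.pi, size=(n_vertices, params_per_question))
alice = [u3ry_layer(p) for p in phi]
bob = [U.conj() for U in alice]   # Bob's layer is the complex conjugate of Alice's

print(phi.size)  # 112 measurement parameters in total
print(all(np.allclose(U.conj().T @ U, np.eye(4)) for U in alice + bob))  # True
```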

We classically simulated 500 trials of DPO, achieving a minimum energy of $E=-1.0000$. We then removed the gates added by ADAPT with $|\theta_{i}|<10^{-4}$. The resulting circuit was converted into a Qiskit circuit (Fig. 5), and evaluation with the classical AerSimulator confirmed a 100% win rate (Fig. 7(a)). The 112 parameters $\phi$ can be found in our code repository (see Appendix A).

Note that in a nonlocal game the referee cannot cross-check answers against those from previous rounds (otherwise the graph would not be quantum 4-colorable), and the players' color for each vertex changes probabilistically between runs, with entanglement coordinating their answers as required. For example, when asked $q=[A,A]$ repeatedly, the responses are nearly uniform over $[\chi_{q}]$ but always match. We also found that measurement layers consisting only of single-qubit gates were insufficient, producing imperfect strategies with $E=-0.9921$. In these cases, the errors frequently consisted of a cyclic path through some graph edges.

The operators chosen by ADAPT are nonlocal as expected, acting on 2 and 4 qubits. The shared state discovered,

\begin{align}
\ket{\psi(\boldsymbol{\theta})}&=e^{-i\frac{\pi}{4}YZYY}e^{i\frac{\pi}{4}YIZI}\ket{+}^{\otimes 4} \tag{14}\\
&=\frac{1}{2}H^{\otimes 4}\sum_{c\in[\chi_{q}]}\ket{cc}, \tag{15}
\end{align}

is the maximally entangled state followed by local Hadamard gates. Up to local unitary operations, this matches the existing strategy described in [25], which uses the maximally entangled state.
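Equations (14) and (15) can be checked numerically. The sketch assumes that Pauli labels are read left to right over qubits $(a_0,a_1,b_0,b_1)$, with Alice holding the first two. Because each Pauli string squares to the identity, $e^{i\theta P}=\cos\theta\,\mathbbm{1}+i\sin\theta\,P$.

```python
import numpy as np
from functools import reduce

PAULI = {"I": np.eye(2, dtype=complex),
         "X": np.array([[0, 1], [1, 0]], dtype=complex),
         "Y": np.array([[0, -1j], [1j, 0]]),
         "Z": np.diag([1, -1]).astype(complex)}
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def kron_all(mats):
    return reduce(np.kron, mats)

def pauli_string(label):
    """Tensor product with label[0] acting on qubit 0 (the leftmost factor)."""
    return kron_all([PAULI[ch] for ch in label])

def pauli_exp(theta, label):
    """exp(i theta P) = cos(theta) I + i sin(theta) P, since P^2 = I."""
    P = pauli_string(label)
    return np.cos(theta) * np.eye(P.shape[0]) + 1j * np.sin(theta) * P

plus = np.array([1, 1]) / np.sqrt(2)
lhs = (pauli_exp(-np.pi / 4, "YZYY") @ pauli_exp(np.pi / 4, "YIZI")
       @ kron_all([plus] * 4))

# Right-hand side of Eq. (15): qubits (a0, a1) belong to Alice, (b0, b1) to Bob.
basis = np.eye(4)
max_ent = sum(np.kron(basis[c], basis[c]) for c in range(4)) / 2
rhs = kron_all([H] * 4) @ max_ent

print(np.allclose(lhs, rhs))  # True
```

In this ordering, the first exponential prepares a Bell pair on $(a_0,b_0)$ and the second one on $(a_1,b_1)$. The state $\frac{1}{2}\sum_c\ket{cc}$ is also invariant under $H^{\otimes 4}$, since $(H\otimes H)\ket{\Phi^{+}}=\ket{\Phi^{+}}$.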

When transpiled as a ladder of CNOT gates between nearest-neighbor qubits, the circuit preparing the shared state $\ket{\psi}$ requires 8 CNOT gates. This can be reduced to 2 CNOT gates (Fig. 6) by noting that the state $\frac{1}{2}\sum_{c}\ket{cc}$ can be generated with transversal Bell pairs shared between the players on qubits $a_{0},b_{0}$ and $a_{1},b_{1}$. Applying the transversal Hadamards $H^{\otimes 4}$ in (15) flips the direction of the CNOT gates by a simple circuit identity. We refer to this version of the shared-state circuit as the "Bell pair" strategy; it uses the same measurement layer and parameters as the original strategy.
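The two ingredients of the "Bell pair" circuit can be sketched as follows, assuming qubit order $(a_0,a_1,b_0,b_1)$ with qubit 0 as the leftmost tensor factor: the identity $(H\otimes H)\,\mathrm{CNOT}_{0\to1}\,(H\otimes H)=\mathrm{CNOT}_{1\to0}$, and a two-CNOT circuit that prepares $\frac{1}{2}\sum_c\ket{cc}$. This is an illustration, not a reproduction of the circuit in Fig. 6.

```python
import numpy as np
from functools import reduce

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]])

def cnot(control, target, n):
    """CNOT on n qubits; qubit 0 is the leftmost tensor factor."""
    ops0 = [P0 if k == control else I2 for k in range(n)]
    ops1 = [P1 if k == control else (X if k == target else I2) for k in range(n)]
    return reduce(np.kron, ops0) + reduce(np.kron, ops1)

# Circuit identity: (H x H) CNOT(0->1) (H x H) = CNOT(1->0).
HH = np.kron(H, H)
print(np.allclose(HH @ cnot(0, 1, 2) @ HH, cnot(1, 0, 2)))  # True

# Bell-pair preparation with 2 CNOTs; qubit order (a0, a1, b0, b1).
n = 4
layer_h = reduce(np.kron, [H, H, I2, I2])            # H on Alice's qubits
circuit = cnot(1, 3, n) @ cnot(0, 2, n) @ layer_h    # CNOT a0->b0, then a1->b1
zero = np.zeros(2 ** n)
zero[0] = 1.0
state = circuit @ zero
basis = np.eye(4)
target = sum(np.kron(basis[c], basis[c]) for c in range(4)) / 2  # (1/2) sum_c |cc>
print(np.allclose(state, target))  # True
```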

4.4 Experiments on IBM Devices

This strategy was submitted to $11$ IBM quantum devices with $4$ or more qubits (Fig. 7). Compared with the classical simulation, performance on these devices decreased due to noise, which particularly affected the success rate of vertex questions (Figs. 7(b) and 8). There are several possible sources of noise:

1. Vertex questions are more sensitive to bit flips: any single-bit error turns a matching answer into a losing one, $X_{j}\ket{cc}\rightarrow\ket{c_{a}c_{b}}$ with $c_{a}\neq c_{b}$, whereas for edge questions a bit flip does not necessarily make the answers agree, $X_{j}\ket{c_{a}c_{b}}\centernot\rightarrow\ket{cc}$ (see Section 6). This asymmetry comes from the rules of the game.
2. Because the resource state relies on entanglement, errors in the entangling two-qubit gates disrupt the strategy.
3. Transpiling the circuit to hardware with fixed qubit connectivity incurs additional two-qubit gate overhead.
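The bit-flip asymmetry in item 1 can be quantified with a simple noise model, assumed here for illustration: each of the four output bits flips independently with probability $p$ before the referee sees it. Enumerating all 16 flip patterns gives the probability that an answer which was correct stays correct. For a vertex answer this is $((1-p)^2+p^2)^2$; for an edge answer whose colors differ in $h$ bits, it is $1-r^{h}(1-r)^{2-h}$ with $r=2p(1-p)$.

```python
import itertools

def survives(alice_bits, bob_bits, p, rule):
    """Probability that rule(alice, bob) still holds after each of the four output
    bits flips independently with probability p (hypothetical noise model)."""
    total = 0.0
    for flips in itertools.product([0, 1], repeat=4):
        weight = 1.0
        for f in flips:
            weight *= p if f else 1 - p
        alice = tuple(x ^ f for x, f in zip(alice_bits, flips[:2]))
        bob = tuple(x ^ f for x, f in zip(bob_bits, flips[2:]))
        total += weight * rule(alice, bob)
    return total

def vertex_rule(alice, bob):
    return alice == bob   # vertex question: colors must match

def edge_rule(alice, bob):
    return alice != bob   # edge question: colors must differ

p = 0.02
vertex = survives((0, 1), (0, 1), p, vertex_rule)
edge_h1 = survives((0, 0), (0, 1), p, edge_rule)   # answers differ in 1 bit
edge_h2 = survives((1, 1), (0, 0), p, edge_rule)   # answers differ in 2 bits
print(vertex, edge_h1, edge_h2)
```

For small $p$, a vertex answer loses with probability of order $4p$, an edge answer at Hamming distance 1 with probability of order $2p$, and an edge answer at distance 2 only with probability of order $4p^{2}$.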

This sensitivity suggests that measuring the win rate of the strategy for $G_{14}$ is a good benchmark of a quantum device's ability to control bit-flip errors while simultaneously performing nonlocal operations. In particular, the vertex-question win rate is very sensitive to noise and reflects the fidelity of the device's gates, whereas the edge-question win rate can confirm whether a device is using quantum resources. The optimal classical strategy with 4 colors consists of a 4-coloring of $G_{13}$, the graph obtained by removing the apex vertex $\alpha$ from $G_{14}$, with the least frequently used color assigned to $\alpha$. All vertex questions would then be answered correctly and one edge would be answered incorrectly (in both directions), giving an edge win rate of $36/37\approx 97.3\%$ and an overall win rate of $86/88\approx 97.7\%$. Any win rate higher than this requires quantum resources. In our experiments, no device exceeded this threshold (Fig. 8). Introducing an error-corrected version of our quantum strategy could improve the robustness of this test, which we leave to future work.

5 Nonlocal Games as Quantum Hardware Benchmarks

Nonlocal games with perfect strategies can serve as hardware benchmarks through the empirical win rates observed when they are executed on near-term hardware. Under certain assumptions about the structure of quantum noise, nonlocal games can exhibit a quantum advantage in shallow circuits, even with noisy qubits [43]. The "noisy entanglement" generated in shallow circuits enables correlations that classical circuits cannot reproduce: in [43], classical circuits of constant depth cannot simulate the long-range correlations.

In this section, we demonstrate the effectiveness of this benchmark by analyzing hardware noise and its strong correlation with strategy performance. We proceed backwards, from the unentangled readout measurements, to the independent Bell state measurements, to the preparation of the initial entangled resource state. By investigating the effects of hardware noise on the empirical win rates, we seek to establish a) which questions are most affected by noise, b) which components of the circuit are most affected by noise, and c) whether classical correlations or quantum noise could be misinterpreted as a winning quantum strategy.

In addition to the classical bounds provided in Section 4.4, we also consider the worst outcome on hardware: a nearly uniform distribution over all bitstrings. In the $G_{14}$ game, this would give a win rate of only $1/4=0.25$ for any vertex question and $12/16=0.75$ for any edge question, so random guessing over the $14$ vertex and $74$ edge questions would return an overall win rate of $(14\cdot 0.25+74\cdot 0.75)/88=59/88\approx 67\%$. In Figures 11, 12, 13 and 14 we include these values as a reference.
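The reference win rates for $G_{14}$ follow from simple counting under a uniform question distribution. The sketch below assumes, as implied by the fractions $36/37$ and $86/88$ in Section 4.4, that $G_{14}$ has 37 edges, each asked in both directions, so that $|Q|=14+74=88$.

```python
from fractions import Fraction

n_vertices = 14
n_edge_questions = 2 * 37  # each undirected edge is asked in both directions
n_questions = n_vertices + n_edge_questions  # |Q| = 88

def overall(vertex_wins, edge_wins):
    """Overall win rate under a uniform p(q) over all |Q| questions."""
    return Fraction(vertex_wins + edge_wins, n_questions)

# Optimal classical 4-color strategy: one undirected edge (two edge questions) is lost.
classical_edge = Fraction(n_edge_questions - 2, n_edge_questions)
classical_overall = overall(n_vertices, n_edge_questions - 2)

# Uniformly random answers over 4 colors: P(match) = 4/16, P(differ) = 12/16.
random_overall = (n_vertices * Fraction(1, 4)
                  + n_edge_questions * Fraction(12, 16)) / n_questions

print(classical_edge, classical_overall, random_overall)  # 36/37 43/44 59/88
print(float(random_overall))
```

Exact fractions make the comparison unambiguous: the optimal classical strategy reaches $43/44=86/88\approx 97.7\%$, while uniformly random answers reach $59/88\approx 67.0\%$.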

Quantum hardware is affected by many sources of noise. The noise profile is time dependent, and many strategies have been developed to optimize the scheduling and execution of quantum circuits. Using superconducting qubit platforms from IBM and Rigetti, we investigate the robustness of the original $G_{14}$ strategy and of the Bell pair strategy with respect to hardware noise fluctuations over several days, and to changes in the circuit made during the transpilation step.

5.1 Theoretical Noise Robustness

There is an asymmetric sensitivity to noise between the vertex and edge questions due to the game rules (see Section 4.4). Furthermore, performance varies across the edge questions. We hypothesize that this variation arises from the particular strategy, and its distribution of answers, found via the ADAPT algorithm. We therefore investigate the effects of bit-flip errors on the game strategies.

For edge questions, the rules only forbid matching answers, $\lambda(c,c|v_{A},v_{B})=0$ for all colors $c$, so multiple bitstrings satisfy the constraints. Players using the four-qubit strategy can win an edge question by outputting a bitstring whose two answers are at Hamming distance $H(a,b)=1$ (e.g., 0001) or at distance 2 (e.g., 1100). While both options satisfy the game rules, choosing answers with a greater Hamming distance reduces the likelihood of losing due to device noise: turning them into a matching pair requires more bit flips, and higher-weight errors occur less frequently.
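The benefit of a larger Hamming distance can be made concrete under a simple noise model that we assume purely for illustration: each of the four output bits flips independently with probability $p$. An edge answer $(a,b)$ is lost only if noise makes the two 2-bit answers equal. At each bit position where $a$ and $b$ differ, exactly one of the two bits must flip; where they agree, both or neither must flip.

```python
def collision_probability(distance: int, p: float) -> float:
    """Probability that independent bit flips (rate p per bit) turn a valid
    edge answer, whose two 2-bit colors are at Hamming distance `distance`,
    into a losing answer (identical colors).

    Assumed noise model: every output bit flips independently with probability p.
    """
    if distance not in (1, 2):
        raise ValueError("valid edge answers on 2-bit colors have distance 1 or 2")
    differ = 2 * p * (1 - p)          # a differing position becomes equal
    agree = p ** 2 + (1 - p) ** 2     # an agreeing position stays equal
    return differ ** distance * agree ** (2 - distance)


for p in (0.01, 0.05):
    print(p, collision_probability(1, p), collision_probability(2, p))
```

For small $p$ the loss probability is roughly $2p$ at distance 1 but only about $4p^{2}$ at distance 2, which is the intuition behind preferring distance-2 answers.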

To quantify the resulting noise robustness of a strategy, we introduce the expected Hamming distance (EHD),

EHD(v_{a},v_{b})=\mathbb{E}_{c_{a},c_{b}\sim p(c_{a},c_{b}|v_{a},v_{b})}\left[H(c_{a},c_{b})\right],

(16)

where $H(a,b)$ denotes the Hamming distance between the binary representations of answers $a$ and $b$ . In general, the EHD is not efficiently computable on a classical computer since it requires sampling the strategy. However, because the $G_{14}$ strategy is sufficiently small, we calculate the EHD for each circuit via classical simulation (Fig. 9).
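For a small strategy such as the one for $G_{14}$, Eq. (16) reduces to a weighted sum over the $4\times 4$ table of answer pairs. A minimal sketch, assuming the simulated answer distribution of a question is available as a $4\times 4$ array (the array layout, function names, and example distribution are hypothetical), is:

```python
import numpy as np


def hamming(a: int, b: int) -> int:
    """Hamming distance between the binary representations of a and b."""
    return bin(a ^ b).count("1")


def expected_hamming_distance(joint) -> float:
    """EHD of one question, Eq. (16).

    `joint[c_a, c_b]` is p(c_a, c_b | v_a, v_b): the probability that Alice
    answers color c_a and Bob answers c_b, with colors encoded as integers.
    """
    joint = np.asarray(joint, dtype=float)
    if not np.isclose(joint.sum(), 1.0):
        raise ValueError("joint distribution must sum to 1")
    dist = np.array([[hamming(a, b) for b in range(joint.shape[1])]
                     for a in range(joint.shape[0])])
    return float((joint * dist).sum())


# Hypothetical answer distribution for one edge question (illustration only).
p = np.zeros((4, 4))
p[0b00, 0b01] = 0.5  # answers at Hamming distance 1
p[0b00, 0b11] = 0.5  # answers at Hamming distance 2
print(expected_hamming_distance(p))  # 1.5
```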

To show that the EHD effectively predicts question performance, we also plot results collected on ibm_sherbrooke (an Eagle r3 processor, a 127-qubit superconducting platform available from IBM). We executed the strategy described in Section 4.3 multiple times over the course of a week; supplemental experimental details are given in Appendix G. The EHD and win-rate heatmaps are highly correlated ($r=0.8812$, $p<0.001$), suggesting that the answer distribution produced by the strategy strongly influences its noise robustness. The standard deviation is also presented alongside the win rate (Fig. 10), further highlighting the sensitivity of different questions to variations in device calibration. There are some outliers, particularly $(0,2)$ and $(2,0)$, which perform worse than expected, and the EHD cannot account for the variation in vertex-question performance. We leave these to future investigations.
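The correlation quoted above compares the EHD and win-rate heatmaps entry by entry. A minimal sketch of the coefficient, assuming both maps are stored as arrays indexed by question with unasked entries marked NaN (an assumed data layout), follows; the accompanying $p$-value would additionally require a significance test such as scipy.stats.pearsonr.

```python
import numpy as np


def heatmap_correlation(ehd, win_rate) -> float:
    """Pearson correlation between two per-question heatmaps of equal shape.

    Entries that are NaN in either map (e.g., question pairs that were not
    asked) are excluded before computing the coefficient.
    """
    x = np.asarray(ehd, dtype=float).ravel()
    y = np.asarray(win_rate, dtype=float).ravel()
    if x.shape != y.shape:
        raise ValueError("heatmaps must have the same shape")
    keep = ~(np.isnan(x) | np.isnan(y))
    return float(np.corrcoef(x[keep], y[keep])[0, 1])
```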

We executed circuits for the original $G_{14}$ strategy and the Bell pair strategy on Rigetti’s Ankaa-2 and Ankaa-3 devices. Both have square-lattice qubit connectivity; to take advantage of this, we prioritized running experiments on qubit subsets with cyclic connectivity. The circuits were first constructed in Qiskit, exported to Open Quantum Assembly Language (QASM) [44], and then imported and compiled into Quil using the qiskit-rigetti plugin [45]. Compilation into the native operations executable on the Rigetti platforms also allows the circuit to be optimized.

The compiled circuit can be further optimized through rewiring directives that determine how program qubits are mapped onto hardware qubits. NAIVE rewiring uses the program qubit register index as the hardware qubit index; this may require additional operations to implement interactions between non-neighboring qubits. PARTIAL rewiring attempts to find a mapping between program and physical qubits that optimizes the fidelity of the compiled circuit. We specified the rewiring strategy through pre-compilation hooks. When no hook was specified, the rewiring strategy could not be verified, and we denote it as (NOT SPECIFIED).
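For reference, Rigetti's quilc compiler reads the rewiring strategy from a pragma at the top of a Quil program, for example PRAGMA INITIAL_REWIRING "PARTIAL". The helper below is a hypothetical, dependency-free sketch of the effect of such a pre-compilation hook: it prepends (or replaces) this pragma in a Quil program string. It is not the qiskit-rigetti hook API, and it only accepts the two strategies used in this section.

```python
ALLOWED_REWIRINGS = {"NAIVE", "PARTIAL"}  # the two strategies used in this section


def with_rewiring(quil_program: str, strategy: str) -> str:
    """Return `quil_program` with an INITIAL_REWIRING pragma as its first line.

    Hypothetical helper: mimics the effect of a pre-compilation hook by
    inserting the Quil pragma that tells quilc how to map program qubits to
    hardware qubits. Any existing INITIAL_REWIRING pragma is replaced.
    """
    strategy = strategy.upper()
    if strategy not in ALLOWED_REWIRINGS:
        raise ValueError(f"unsupported rewiring strategy: {strategy}")
    body = [line for line in quil_program.splitlines()
            if not line.strip().startswith("PRAGMA INITIAL_REWIRING")]
    return "\n".join([f'PRAGMA INITIAL_REWIRING "{strategy}"', *body])


program = "DECLARE ro BIT[2]\nH 0\nCNOT 0 1\nMEASURE 0 ro[0]\nMEASURE 1 ro[1]"
print(with_rewiring(program, "naive").splitlines()[0])  # PRAGMA INITIAL_REWIRING "NAIVE"
```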

5.2 Noise Robustness of Game Components

In this section we analyze how hardware noise affects different nonlocal game circuit components, supported by results collected on superconducting qubit platforms. This extends the simulated noise results shown in Fig. 8(b) where the error rate of two-qubit gates was inferred from the hardware results reported in Section 4.4. We supplement these results with additional experiments designed to characterize key components of the strategy: readout measurement error, independent entangled measurements, and imperfect resource state preparation (shown in Fig. 15). Throughout this section we analyze and characterize each element individually. We determine the effective win rate that would be observed by the players if one of these elements failed or was replaced by randomness and use this to demonstrate the effectiveness of nonlocal games as a hardware benchmark.

The readout measurement error can be characterized by a $2^{n}\times 2^{n}$ matrix constructed row by row from the preparation and measurement of individual computational basis states: the register is prepared in $|0\rangle^{\otimes n}$, $X$ gates are applied to produce the target basis state, and the final state is projected onto the computational basis. This matrix can be used to estimate bit-flip error probabilities (independent or correlated) [46], and can also be leveraged for readout error mitigation (the results we report do not include readout error mitigation, which we reserve for future work). We collected data to characterize the readout error on ibm_sherbrooke and on Rigetti’s Ankaa-2 and Ankaa-3 platforms. In Fig. 16 we plot the Ankaa-2 and Ankaa-3 results to emphasize the connection to the EHD metric (see Section 5.1). Although the circuits executed on the hardware are very shallow, state preparation and measurement (SPAM) error can have a significant impact.
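The row-wise construction described above can be sketched as follows, assuming the calibration results are stored as a mapping from each prepared bitstring to its measured counts (a hypothetical data layout). Row $i$ is the empirical outcome distribution after preparing basis state $i$; some mitigation tools use the transposed, column-stochastic convention instead.

```python
import numpy as np


def assignment_matrix(counts_by_prepared: dict, n: int) -> np.ndarray:
    """Row-stochastic readout matrix with M[i, j] = P(measure j | prepared i).

    `counts_by_prepared[s]` maps each measured bitstring to its count after
    preparing computational basis state `s` (X gates on the 1s of |0...0>).
    Bitstrings are read as binary integers, leftmost character first.
    """
    dim = 2 ** n
    M = np.zeros((dim, dim))
    for prepared, counts in counts_by_prepared.items():
        shots = sum(counts.values())
        for measured, c in counts.items():
            M[int(prepared, 2), int(measured, 2)] = c / shots
    return M


# Hypothetical single-qubit calibration counts (illustration only).
calibration = {"0": {"0": 990, "1": 10}, "1": {"0": 40, "1": 960}}
M = assignment_matrix(calibration, n=1)
p_flip_0_to_1, p_flip_1_to_0 = M[0, 1], M[1, 0]
```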

To connect SPAM error back to the EHD, suppose a nonlocal circuit was executed correctly and errors occurred only during readout. Fig. 16 shows that vertex questions are then more likely to return incorrect answers, while edge questions can still return correct answers with high probability. For vertex questions, the all-zero bitstring is relatively unaffected by readout errors, in contrast to the remaining three bitstrings. For edge questions, bit-flip errors during readout can still produce valid edge answers; the edge answers most likely to be corrupted into invalid bitstrings are those with high Hamming weight. Thus, if a state is correctly prepared and errors occur only during readout, they mainly affect vertex questions and low-weight edge answers.
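The asymmetry among vertex answers can be illustrated with a readout model that we assume purely for illustration: each bit is read out independently, a 0 is misread as 1 with probability $p_{01}$, and a 1 is misread as 0 with a larger probability $p_{10}$, as is common when relaxation during readout dominates. A vertex answer still wins if, at each bit position, both players' bits flip or neither does. The function name and error rates are hypothetical.

```python
def vertex_survival(color: int, p01: float, p10: float) -> float:
    """Probability that a correct vertex answer (both players output `color`)
    is still a matching pair after independent, asymmetric readout errors.

    Assumed model: each bit is read out independently; 0 -> 1 with prob p01,
    1 -> 0 with prob p10. A bit position keeps the two answers equal if both
    bits flip or neither does.
    """
    ones = bin(color).count("1")          # Hamming weight of the 2-bit color
    keep0 = (1 - p01) ** 2 + p01 ** 2     # both players' bit is 0
    keep1 = (1 - p10) ** 2 + p10 ** 2     # both players' bit is 1
    return keep0 ** (2 - ones) * keep1 ** ones


# Hypothetical error rates, chosen only to show the ordering.
for color in range(4):
    print(f"{color:02b}", round(vertex_survival(color, p01=0.01, p10=0.05), 4))
```

With $p_{10}>p_{01}$, the all-zero answer survives readout most often and the all-one answer least often.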

The full quantum strategy is composed of multiple circuits, which are needed to evaluate the players’ performance on all questions posed by the referee. The construction assumes that the two players are separated in space to prohibit classical communication, and implementing the strategy requires nonlocal operations. Prior to the final qubit readout, the two players implement entangled unitaries ($\mathcal{U}_{A}\otimes\mathcal{U}_{B}$) that are assumed to be independent. We assess the ability of each player to apply these entangled measurements with high fidelity, both independently and simultaneously without corrupting each other’s operations. This is tested on four qubits (46, 47, 48, and 49) connected in a linear chain (see Appendix G). A specific Bell state is prepared by applying $\mathcal{U}_{A}\otimes 1\otimes 1$ or $1\otimes 1\otimes\mathcal{U}_{B}$, so that only two qubits prepare a Bell state while the other two remain in the $|0\rangle$ state. Then, a Bell basis measurement is applied to the prepared Bell state, and the remaining two qubits are measured in the computational basis. This is compared to the preparation of two independent Bell states, both measured by Bell state measurement. To amplify the gate noise, we construct and execute these circuits with basic unitary folding, inserting pairs of CNOT gates.
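Unitary folding with CNOT pairs relies on CNOT being its own inverse, so each inserted pair leaves the ideal circuit unchanged while adding two-qubit gate noise. The sketch below uses a hypothetical list-of-tuples circuit representation rather than any particular SDK, and the `pairs` parameter is our own; how the stretch values in Table 1 map onto inserted pairs is not specified here.

```python
Gate = tuple[str, tuple[int, ...]]  # hypothetical (name, qubits) representation


def fold_cnots(circuit: list[Gate], pairs: int) -> list[Gate]:
    """Insert `pairs` CNOT-CNOT pairs immediately after every CNOT.

    CNOT is its own inverse, so each inserted pair is logically the identity:
    the folded circuit implements the same unitary while exposing every
    original CNOT to 2 * pairs additional noisy CNOTs.
    """
    folded: list[Gate] = []
    for name, qubits in circuit:
        folded.append((name, qubits))
        if name == "cx":
            folded.extend([(name, qubits)] * (2 * pairs))
    return folded


bell = [("h", (0,)), ("cx", (0, 1))]
print(fold_cnots(bell, pairs=1))
# [('h', (0,)), ('cx', (0, 1)), ('cx', (0, 1)), ('cx', (0, 1))]
```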

The overall success probabilities are plotted in Fig. 17.

For the single Bell state preparations, we extract the marginal distribution of each two-qubit subset and plot the mean probability of observing each Bell state. The mean is evaluated over 14 executions of these experiments on ibm_sherbrooke.

The distinct separation between the success probabilities of isolated Bell state preparation on $[q_{a},q_{b}]$ versus $[q_{c},q_{d}]$ could be caused by differences in individual two-qubit gate error rates, indicating that the coupler between a particular pair of qubits may be more stable than those between neighboring qubits. Another cause could be the choice of hardware qubits combined with circuit optimization options (see Appendix G).

On Ankaa-2 we prepared the state $|\Psi^{+}\rangle\otimes|0\rangle\otimes|0\rangle$ and observed that over 75% of the measured bitstrings corresponded to the correct Bell state. The highest number of counts was returned in the all-zero bitstring, indicating that the state was prepared and measured correctly while the idle qubits remained idle. When preparing the state $|0\rangle\otimes|0\rangle\otimes|\Psi^{+}\rangle$, between 71% and 72% of the measured bitstrings corresponded to the correct Bell state. However, preparing and measuring the state $|\Psi^{+}\rangle\otimes|\Psi^{+}\rangle$ showed a sharp decline in the counts observed in the expected bitstrings.

Stretch	$|\Phi^{+}\rangle_{A}$	$|\Phi^{+}\rangle_{B}$	$|\Psi^{+}\rangle_{A}$	$|\Psi^{+}\rangle_{B}$
0	0.94(1)	0.96(2)	0.91(2)	0.96(3)
2	0.92(1)	0.94(2)	0.89(2)	0.94(3)
4	0.89(2)	0.90(2)	0.86(2)	0.92(4)

Table 1: Mean probability of measuring the target Bell state, with the standard error in parentheses.
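Table 1 and the mean values quoted in this section summarize repeated executions (14 on ibm_sherbrooke) as a mean with the standard error in parentheses. A minimal sketch of that summary, assuming the per-run success probabilities are collected in a list (the values below are hypothetical) and that the error is rounded to the last displayed digit, is:

```python
import math
import statistics


def mean_with_standard_error(values: list, digits: int = 2) -> str:
    """Mean of repeated runs with its standard error in concise notation,
    e.g. '0.94(1)' for 0.94 +/- 0.01.

    The error is rounded to the last displayed digit of the mean and shown
    as at least 1 in that digit.
    """
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    err = max(1, round(sem * 10 ** digits))
    return f"{mean:.{digits}f}({err})"


# Hypothetical per-run success probabilities (illustration only).
runs = [0.93, 0.95, 0.94, 0.94]
print(mean_with_standard_error(runs))  # 0.94(1)
```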

Connecting this characterization back to the nonlocal game as a benchmark: the game construction assumes the players are separated in space and cannot communicate classically. However, implementations on near-term hardware will likely use physical qubits that are connected via tunable couplings. If correlated noise is significant when simultaneous multi-qubit gates are executed on non-overlapping qubit subsets, it can affect the players’ win rate. In the Bell state example, we observe that this affects the ability to prepare and measure two identical states simultaneously. We believe that correlated noise may again impede performance on vertex questions.

Finally, we consider the impact of hardware noise on the resource state $|\Psi\rangle$ shared by Alice and Bob. With mirrored unitary circuits [47], we measure the probability that applying $\mathcal{U}_{R}$ followed by $\mathcal{U}^{\dagger}_{R}$ successfully returns the register to the initial all-zero state. Testing the four-qubit unitary on ibm_sherbrooke, Ankaa-2, and Ankaa-3 multiple times, we find that the success rate fluctuates depending on the hardware, the qubit subset, and the choice of resource state.
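On noiseless hardware a mirrored circuit returns to $|0\rangle^{\otimes n}$ with certainty, so any shortfall in the measured return probability is attributable to noise. The sketch below checks this with a dense statevector; the forward circuit is a stand-in built from random $16\times 16$ unitaries (an illustrative assumption, not the $G_{14}$ resource-state circuit), and the function names are hypothetical.

```python
import numpy as np


def random_unitary(dim: int, rng: np.random.Generator) -> np.ndarray:
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))  # rescale column phases so the result is Haar distributed


def mirror(layers: list) -> list:
    """Mirror a layered circuit: U_1 ... U_k followed by U_k^dag ... U_1^dag."""
    return layers + [u.conj().T for u in reversed(layers)]


def return_probability(layers: list, n_qubits: int) -> float:
    """Ideal probability of measuring |0...0> after applying `layers` to |0...0>."""
    state = np.zeros(2 ** n_qubits, dtype=complex)
    state[0] = 1.0
    for u in layers:
        state = u @ state
    return float(abs(state[0]) ** 2)


rng = np.random.default_rng(seed=7)
forward = [random_unitary(16, rng) for _ in range(3)]  # stand-in for U_R (4 qubits)
print(return_probability(forward, 4), return_probability(mirror(forward), 4))
```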

On ibm_sherbrooke, the success rate of the mirrored four-qubit unitary was 19.43%. On Ankaa-2, the mirrored four-qubit unitary of the original $G_{14}$ strategy had a success rate below 10%. Specifically, on September 29, 2024, the mirrored unitaries returned to the initial state $|0\rangle^{\otimes 4}$ with the following success rates: qubit subset (9,10,17,16), 8.06%; (2,3,10,9), 6.49%; (9,10,16,17), 5.66%; (2,9,16,23), 8.06%; and (2,3,9,10), 7.32%. The circuits on Ankaa-2 were compiled with PARTIAL rewiring. For Ankaa-3, the mirrored four-qubit unitary success rate was much higher. On September 30, 2024, the mirrored unitaries returned to $|0\rangle^{\otimes 4}$ with success rates of 55.32% on (0,1,4,3) and 33.08% on (0,1,3,4). The circuits on Ankaa-3 were compiled with NAIVE rewiring.

The mirrored circuits are much deeper than the resource state preparation alone and contain more multi-qubit operations. Since noisy hardware can better prepare shallower, sparser resource state constructions, the mirror fidelity provides a pessimistic estimate of the fidelity of the resource state preparation. Nevertheless, we find it informative to compare the mirror fidelity of the arbitrary four-qubit unitary to the mirror fidelity of the shared states used in the Bell state strategy, which we measured $14$ times during one week using ibm_sherbrooke. For these shared states, the mean success probability was $91.56\pm 0.91\%$.
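One way to see why the mirror fidelity is pessimistic is a global depolarizing model, which we assume here only as a rough illustration. If each half of the mirror circuit leaves the state intact with probability $\lambda$ and otherwise fully randomizes the four-qubit register, the mirror returns $|0000\rangle$ with probability $\lambda^{2}+(1-\lambda^{2})/16$, whereas a single pass prepares its intended state with fidelity $\lambda+(1-\lambda)/16\geq\lambda^{2}+(1-\lambda^{2})/16$. The floor of $1/16=6.25\%$ is what a fully randomized register would give, which is close to several of the Ankaa-2 values above. The function names are hypothetical, and the printed single-pass numbers are model estimates, not measurements.

```python
import math

DIM = 2 ** 4  # dimension of the four-qubit register


def mirror_success(lam: float) -> float:
    """P(return to |0000>) when each half of the mirror circuit is globally
    depolarized with polarization `lam` (assumed noise model)."""
    return lam ** 2 + (1 - lam ** 2) / DIM


def polarization_from_mirror(success: float) -> float:
    """Invert mirror_success; rates at or below the 1/DIM floor give lam = 0."""
    lam_sq = max(0.0, (success - 1 / DIM) / (1 - 1 / DIM))
    return math.sqrt(lam_sq)


def single_pass_fidelity(lam: float) -> float:
    """Fidelity of one depolarized pass with its ideal output state."""
    return lam + (1 - lam) / DIM


# Measured mirror success rates quoted above; outputs are model estimates only.
for label, success in [("ibm_sherbrooke", 0.1943), ("Ankaa-3 (0,1,4,3)", 0.5532)]:
    lam = polarization_from_mirror(success)
    print(f"{label}: mirror {success:.2%}, single-pass estimate {single_pass_fidelity(lam):.1%}")
```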

Overall, these individual characterizations show how well the hardware can generate nonlocal correlations (in the resource state preparation), how well independent qubits can be controlled (via the players’ entangled operations), and how robust the players’ answers are to readout errors. The development of a full predictive model is beyond the scope of this work, but the initial characterizations of the game components make clear that improving individual components can significantly impact the overall win rate. This is especially important in the $G_{14}$ game, where the separation between the classical and quantum strategies is small.

6 Statistical Fluctuations and Sample Complexity of Estimating the Win Rate

On near-term quantum devices, the win rate of each circuit (question) is estimated by statistical sampling, using independently drawn samples to estimate the probability that the players return a correct answer. Finite sample sizes lead to statistical fluctuations. In this section, we derive a sufficient number of individual samples (shots) to draw from a prepared state in order to estimate the win rate to a given accuracy with high probability.

In the interactive nonlocal game setup, the scenario is repeated with random questions until the referee is satisfied with the outcome. We consider how to obtain a low error estimate of the win rate with high probability using a finite number of repetitions.

Let $n$ be the number of rounds performed, where each round consists of the referee asking the players all $m$ possible questions once and checking their answers using the rule function $\lambda(a|q)$ . In the context of quantum hardware, this can be viewed as the execution of $m$ quantum circuits with $n$ shots per circuit.

Because the outcome of each question is binary, i.e., $\lambda(a|q)\in\{0,1\}$ , we model the outcome of question $q_{j}$ as a Bernoulli random variable $\lambda_{j}$ with an unknown success probability $p_{j}$ . The random variable describing the game value of a single round is $\omega=\frac{1}{m}\sum_{j}^{m}\lambda_{j}$ . We denote the empirical estimate of the win rate with $n$ rounds as $\bar{\omega}=\frac{1}{n}\sum_{i}^{n}\omega_{i}$ where $\omega_{1},\dots,\omega_{n}\sim\omega$ are i.i.d. samples. Under these mild assumptions, we derive an expression for the number of samples needed to accurately estimate the win rate within error $\epsilon$ .
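The estimator defined above can be simulated directly. The sketch below draws each round's $m$ question outcomes as independent Bernoulli variables and averages them; the per-question win rates in the example are hypothetical values chosen only for illustration.

```python
import numpy as np


def simulate_win_rate(p, n_rounds: int, rng: np.random.Generator) -> float:
    """Empirical win rate (bar-omega) after `n_rounds` rounds of all m questions.

    Each question j is modeled as an independent Bernoulli(p[j]) outcome;
    a round's value omega is the mean over the m questions, and the estimate
    averages omega over the rounds.
    """
    p = np.asarray(p, dtype=float)
    outcomes = rng.random((n_rounds, p.size)) < p   # lambda_j for every round
    omega = outcomes.mean(axis=1)                   # one game value per round
    return float(omega.mean())


# Hypothetical per-question win rates: 14 vertex and 74 edge questions.
p = np.array([0.90] * 14 + [0.97] * 74)
rng = np.random.default_rng(seed=1)
print(simulate_win_rate(p, n_rounds=2048, rng=rng), p.mean())
```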

Theorem 6.1.

Let $\bar{\omega}=\frac{1}{n}\sum_{i}^{n}\omega_{i}$ be the empirical estimate of the game win rate after $n$ rounds, where each round $\omega_{i}$ is independent and identically distributed (i.i.d.). Then, for any $\epsilon>0$ ,

P(|\bar{\omega}-\mathbb{E}[\bar{\omega}]|\geq\epsilon)\leq 2\exp\left(\frac{-n\epsilon^{2}/2}{\bar{\sigma}^{2}/m+\epsilon/3}\right),

(17)

where $m=|Q|$ is the number of questions and $\bar{\sigma}^{2}=\frac{1}{m}\sum_{j}^{m}p_{j}(1-p_{j})$ , where $p_{j}$ is the win rate of question $q_{j}$ .

Proof.

We make use of the Bernstein inequality [48, 49], which is restated here for convenience. Let $S_{n}=\sum_{i}^{n}X_{i}$ be the sum of independent, zero-mean random variables $X_{1},\dots,X_{n}$ with $|X_{i}|\leq c$ almost surely. Then, for any $\epsilon>0$,

P(|S_{n}|\geq\epsilon)\leq 2\exp\left(\frac{-\epsilon^{2}/2}{\sum_{i}^{n}\mathrm{Var}[X_{i}]+c\epsilon/3}\right).

(18)

To use the inequality, we construct the sum $S_{n}=\sum_{i}^{n}(\omega_{i}-\mathbb{E}[\omega_{i}])$, subtracting the expectation values to meet the zero-mean condition, yielding

P(|n\bar{\omega}-n\mathbb{E}[\bar{\omega}]|\geq\epsilon)\leq 2\exp\left(\frac{-\epsilon^{2}/2}{\sum_{i}^{n}\mathrm{Var}[\omega_{i}]+c\epsilon/3}\right).

(19)

The magnitude of each term is bounded, $|\omega_{i}-\mathbb{E}[\omega_{i}]|\leq 1=c$. Furthermore, because the rounds $\omega_{i}\sim\omega$ are i.i.d. and the question outcomes within a round are independent, $\sum_{i}^{n}\mathrm{Var}[\omega_{i}]=n\,\mathrm{Var}[\omega]=n\bar{\sigma}^{2}/m$, where $\bar{\sigma}^{2}$ is defined above and we have used the fact that the variance of a Bernoulli random variable is $p(1-p)$. Substituting $n\epsilon$ in place of $\epsilon$ gives (17). ∎

Corollary 6.2 (Sample complexity).

With probability at least $1-\delta$, we obtain an $\epsilon$-close estimate of the win rate $\mathbb{E}[\omega]$ using at least

n\geq 2\log(2/\delta)\left(\frac{\bar{\sigma}^{2}}{m\epsilon^{2}}+\frac{1}{3\epsilon}\right)

(20)

rounds.

Proof.

This results from setting (17) less than or equal to $\delta$ and solving for $n$ . ∎
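Corollary 6.2 translates directly into a shot budget. The sketch below evaluates Eq. (20) and checks the result against the tail bound (17). The average variance $\bar{\sigma}^{2}$ is not known in advance and must be assumed or estimated, for example from pilot runs; the inputs in the example are hypothetical.

```python
import math


def bernstein_tail(n: int, eps: float, sigma_bar_sq: float, m: int) -> float:
    """Right-hand side of Eq. (17)."""
    return 2 * math.exp(-(n * eps ** 2 / 2) / (sigma_bar_sq / m + eps / 3))


def rounds_needed(eps: float, delta: float, sigma_bar_sq: float, m: int) -> int:
    """Smallest integer number of rounds satisfying Eq. (20)."""
    bound = 2 * math.log(2 / delta) * (sigma_bar_sq / (m * eps ** 2) + 1 / (3 * eps))
    return math.ceil(bound)


# Hypothetical inputs: 88 questions, average per-question variance 0.04.
n = rounds_needed(eps=0.01, delta=0.05, sigma_bar_sq=0.04, m=88)
print(n, bernstein_tail(n, 0.01, 0.04, 88))
```

With these inputs the $1/(3\epsilon)$ term dominates the $\bar{\sigma}^{2}/(m\epsilon^{2})$ term, so the budget scales roughly as $1/\epsilon$.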

Corollary 6.3 (Confidence interval).

With $n$ rounds and with probability $1-\delta$, the error of our estimate is

|\bar{\omega}-\mathbb{E}[\bar{\omega}]|\leq\frac{2\log(2/\delta)}{3n}+\bar{\sigma}\sqrt{\frac{2\log(2/\delta)}{mn}}.

(21)

Proof.

This can be obtained by treating (20) as an equality, solving for $\epsilon$, taking the positive solution, and then applying the inequality $\sqrt{x+y}\leq\sqrt{x}+\sqrt{y}$. ∎
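For a fixed number of rounds, Eq. (21) gives the half-width of the confidence interval. The sketch below computes it together with the exact positive root of (20) that it relaxes, which shows how loose the final $\sqrt{x+y}\leq\sqrt{x}+\sqrt{y}$ step is. As before, $\bar{\sigma}$ is an input that must be estimated, and the example values are hypothetical.

```python
import math


def error_bound(n: int, delta: float, sigma_bar: float, m: int) -> float:
    """Half-width of the confidence interval in Eq. (21)."""
    L = math.log(2 / delta)
    return 2 * L / (3 * n) + sigma_bar * math.sqrt(2 * L / (m * n))


def exact_error(n: int, delta: float, sigma_bar: float, m: int) -> float:
    """Positive root of Eq. (20) taken with equality, before relaxing the square root."""
    L = math.log(2 / delta)
    a = L / (3 * n)
    return a + math.sqrt(a ** 2 + 2 * L * sigma_bar ** 2 / (m * n))


# Hypothetical inputs: 4096 rounds, 88 questions, sigma_bar = 0.2.
print(exact_error(4096, 0.05, 0.2, 88), error_bound(4096, 0.05, 0.2, 88))
```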

Year	Provider	Device	Strategy	Shots	Win rate (%)
2023	IBM	Guadalupe	Original	1024	78.1(6)
		Auckland	Original	1024	83.9(6)
		Jakarta	Original	1024	84.2(6)
		Manila	Original	1024	84.2(6)
		Cairo	Original	1024	84.3(6)
		Nairobi	Original	1024	84.5(6)
		Quito	Original	1024	84.6(6)
		Mumbai	Original	1024	85.6(6)
		Lima	Original	1024	85.8(6)
		Belem	Original	1024	87.3(6)
		Hanoi	Original	1024	92.5(5)
2024	IBM	Sherbrooke	Bell Pair	4096	94.3(2)
	Rigetti	Ankaa-2	Bell Pair	2048	83.8(4)
	Rigetti	Ankaa-3	Bell Pair	2048	91.8(3)

Table 2: Overall win rate for each device with 95% confidence intervals.

From (20), we see two possibilities to achieve asymptotic $O(\log(1/\delta)/\epsilon)$ sampling: a large number of questions $m$, or a small per-question variance, $\bar{\sigma}\simeq\epsilon$. The first case is not practical, because increasing the number of questions counterproductively increases the total number of circuit samples $mn$.

The second case is also hard to achieve at present, because it requires a near-perfect strategy on high-fidelity quantum hardware. This is because $\bar{\sigma}$ is directly linked to the win rates of the questions, which depend on both the strategy and the quantum hardware. Assuming for simplicity that all questions have equal win rate $p$, the condition $p(1-p)=\epsilon^{2}$ requires (again taking the larger solution) $p=\frac{1}{2}(1+\sqrt{1-4\epsilon^{2}})$, which is approximately $p\approx 1-\epsilon^{2}+O(\epsilon^{4})$. We expect that $O(\log(1/\delta)/\epsilon)$ sampling may become feasible for perfect strategies with improved gate fidelity, quantum error correction, or amplitude amplification. Table 2 contains the win rates of all our executed experiments, with confidence intervals derived from Corollary 6.3.
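Under the equal-win-rate simplification, the condition $\bar{\sigma}=\epsilon$ reads $p(1-p)=\epsilon^{2}$. The short check below, for a few arbitrary values of $\epsilon$, confirms that the larger root satisfies $1-p\approx\epsilon^{2}$, with the next correction of order $\epsilon^{4}$.

```python
import math


def required_win_rate(eps: float) -> float:
    """Larger root of p(1 - p) = eps**2, i.e. the per-question win rate at which sigma_bar = eps."""
    return 0.5 * (1 + math.sqrt(1 - 4 * eps ** 2))


for eps in (0.05, 0.01, 0.001):
    p = required_win_rate(eps)
    print(f"eps = {eps}: 1 - p = {1 - p:.4e}, eps^2 = {eps ** 2:.4e}")
```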

7 Conclusion

We present a variational algorithm to compute novel quantum strategies for nonlocal games by encoding the rules of a nonlocal game into a Hamiltonian and employing a two-step optimization procedure. Our key insight is to optimize separately the state preparation circuit and the measurement scheme while leveraging robust circuit initialization and general techniques, such as ADAPT, during optimization. The proposed algorithm successfully reproduces known quantum strategies and has also discovered new short-depth, perfect quantum strategies for a graph on $14$ vertices using four qubits. This demonstrates that variational techniques can be effectively used on classical computers to identify short-depth, optimal strategies for small examples of nonlocal games where analytic methods fail. Moreover, these techniques extend to a quantum setting, where sample-based gradient estimation is employed. However, the presence of barren plateaus is a known challenge with the training objective function, suggesting that “warm starts” or other techniques to mitigate vanishing gradients may be necessary for scaling these methods to larger nonlocal games.

We further illustrate how the execution of a nonlocal game strategy can serve as an application-level benchmark for quantum devices. When both vertex and edge questions are evaluated, the win rate of vertex questions reflects a device’s ability to perform nonlocal operations and maintain gate fidelity, while the win rate of edge questions can help confirm the use of entanglement across a device. Although none of the devices we tested surpassed the quantum advantage threshold, primarily due to noise in circuit execution, we believe our results can be improved by optimizing the transpilation of each circuit before execution and by controlling the device calibration schedules. It is also worth noting that although our experiments do not provide a full proof of quantum advantage, since the qubits are not spatially separated enough to guarantee that no classical communication occurs during the experiment, they do validate that the quantum hardware in question outputs results consistent with the predictions of quantum theory. Recent work has begun to outline ways of guaranteeing a “loophole-free” full verification of quantum advantage by compiling a multi-prover nonlocal game strategy into a single-prover strategy [50, 10, 11], and we leave it to future work to investigate the feasibility and implications of these schemes for the games we studied. In a recent survey [51], the authors outlined five desirable properties of a good quantum benchmark; in our work we have argued that the win rate of nonlocal game strategies satisfies all five:

• Relevant: The win rate measures the ability to prepare, control, and manipulate entangled states.
• Reproducible: Strategy and questions are fixed.
• Fair: Device independent and the executed circuits are shallow.
• Verifiable: Straightforward to calculate the win rate via sampling.
• Usable: Circuits can be made accessible via QASM files and can easily be ported to other quantum devices.

We believe that the continued study and extension of nonlocal games, in particular graph-based games, can enable the design of more appropriate quantum benchmarks as quantum devices scale and hardware architectures become more complex. Ultimately, our research not only advances the understanding of variational quantum strategies but also lays the foundation for leveraging quantum machine learning techniques to explore strategies for other nonlocal games beyond the reach of classical methods.

Acknowledgments

Thanks to David Roberson and Eleanor Rieffel for providing valuable feedback. NW, JF, and COM were funded by grants from the US Department of Energy, Office of Science, National Quantum Information Science Research Centers, Co-Design Center for Quantum Advantage under contract number DE-SC0012704. JF and COM were partially supported by the Laboratory Directed Research and Development Program and Mathematics for Artificial Reasoning for Scientific Discovery investment at the Pacific Northwest National Laboratory, a multiprogram national laboratory operated by Battelle for the U.S. Department of Energy under Contract DE-AC05-76RL01830. S. C. is supported in part by the DOE Advanced Scientific Computing Research (ASCR) Accelerated Research in Quantum Computing (ARQC) Program under field work proposal ERKJ354. K. H. was supported by the DOE Advanced Scientific Computing Research (ASCR) Pathfinder Testbed Program under FWP ERKJ418.

This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.

This manuscript has been authored in part by UT-Battelle, LLC, under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for the United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan.

References

[1] Dalzell AM, McArdle S, Berta M, Bienias P, Chen CF, Gilyén A, et al. Quantum algorithms: A survey of applications and end-to-end complexities. arXiv preprint arXiv:2310.03011. 2023.
[2] Rieffel EG, Asanjan AA, Alam MS, Anand N, Neira DEB, Block S, et al. Assessing and advancing the potential of quantum computing: A NASA case study. Future Generation Computer Systems. 2024.
[3] Proctor T, Young K, Baczewski AD, Blume-Kohout R. Benchmarking quantum computers. Nature Reviews Physics. 2025:1-14.
[4] Emerson J, Alicki R, Życzkowski K. Scalable noise estimation with random unitary operators. Journal of Optics B: Quantum and Semiclassical Optics. 2005;7(10):S347.
[5] Knill E, Leibfried D, Reichle R, Britton J, Blakestad RB, Jost JD, et al. Randomized benchmarking of quantum gates. Physical Review A—Atomic, Molecular, and Optical Physics. 2008;77(1):012307.
[6] Magesan E, Gambetta JM, Johnson BR, Ryan CA, Chow JM, Merkel ST, et al. Efficient measurement of quantum gate error by interleaved randomized benchmarking. Physical review letters. 2012;109(8):080505.
[7] Helsen J, Roth I, Onorati E, Werner AH, Eisert J. General framework for randomized benchmarking. PRX Quantum. 2022;3(2):020357.
[8] Chen YH, Baldwin CH. Randomized Benchmarking with Leakage Errors; 2025. Available from: https://arxiv.org/abs/2502.00154.
[9] Cross AW, Bishop LS, Sheldon S, Nation PD, Gambetta JM. Validating quantum computers using randomized model circuits. Physical Review A. 2019;100(3):032328.
[10] Natarajan A, Zhang T. Bounding the quantum value of compiled nonlocal games: from CHSH to BQP verification. In: 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS). IEEE; 2023. p. 1342-8.
[11] Kalai Y, Lombardi A, Vaikuntanathan V, Yang L. Quantum advantage from any non-local game. In: Proceedings of the 55th Annual ACM Symposium on Theory of Computing; 2023. p. 1617-28.
[12] Šupić I, Bowles J. Self-testing of quantum systems: a review. Quantum. 2020;4:337.
[13] Hart O, Stephen DT, Williamson DJ, Foss-Feig M, Nandkishore R. Playing nonlocal games across a topological phase transition on a quantum computer. arXiv preprint arXiv:2403.04829. 2024.
[14] Drmota P, Main D, Ainley E, Agrawal A, Araneda G, Nadlinger D, et al. Experimental Quantum Advantage in the Odd-Cycle Game. Physical Review Letters. 2025;134(7):070201.
[15] Bell JS. On the Einstein Podolsky Rosen paradox. Physics Physique Fizika. 1964;1(3):195.
[16] Clauser JF, Horne MA, Shimony A, Holt RA. Proposed experiment to test local hidden-variable theories. Physical review letters. 1969;23(15):880.
[17] Cleve R, Hoyer P, Toner B, Watrous J. Consequences and limits of nonlocal strategies. In: Proceedings. 19th IEEE Annual Conference on Computational Complexity, 2004. IEEE; 2004. p. 236-49.
[18] Reichardt BW, Unger F, Vazirani U. A classical leash for a quantum system: Command of quantum systems via rigidity of CHSH games. arXiv preprint arXiv:1209.0448. 2012.
[19] Fritz T. Tsirelson’s problem and Kirchberg’s conjecture. Reviews in Mathematical Physics. 2012;24(05):1250012.
[20] Slofstra W. Tsirelson’s problem and an embedding theorem for groups arising from non-local games. Journal of the American Mathematical Society. 2020;33(1):1-56.
[21] Ji Z, Natarajan A, Vidick T, Wright J, Yuen H. MIP*=RE. Communications of the ACM. 2021;64(11):131-8.
[22] Cameron PJ, Montanaro A, Newman MW, Severini S, Winter A. On the Quantum Chromatic Number of a Graph. The Electronic Journal of Combinatorics. 2007;14(1):R81.
[23] Mančinska L, Roberson DE. Quantum homomorphisms. Journal of Combinatorial Theory, Series B. 2016;118:228-67.
[24] Daniel AK, Zhu Y, Alderete CH, Buchemmavari V, Green AM, Nguyen NH, et al. Quantum computational advantage attested by nonlocal games with the cyclic cluster state. Physical Review Research. 2022;4(3):033068.
[25] Mančinska L, Roberson DE. Oddities of Quantum Colorings. Baltic Journal on Modern Computing. 2016;4(4):846-59.
[26] Lalonde O. On the Quantum Chromatic Numbers of Small Graphs. arXiv preprint arXiv:2311.08194. 2023. Available from: https://arxiv.org/abs/2311.08194.
[27] Lupini M, Mančinska L, Paulsen VI, Roberson DE, Scarpa G, Severini S, et al. Perfect strategies for non-local games. Mathematical Physics, Analysis and Geometry. 2020;23(1):7.
[28] Helton JW, Mousavi H, Nezhadi SS, Paulsen VI, Russell TB. Synchronous values of games. In: Annales Henri Poincaré. vol. 25. Springer; 2024. p. 4357-97.
[29] Helton JW, Meyer KP, Paulsen VI, Satriano M. Algebras, synchronous games, and chromatic numbers of graphs. New York J Math. 2019;25:328-61.
[30] Ortiz CM, Paulsen VI. Quantum graph homomorphisms via operator systems. Linear Algebra and its Applications. 2016;497:23-43.
[31] Mančinska L, Paulsen VI, Todorov IG, Winter A. Products of synchronous games. arXiv preprint arXiv:2109.12039. 2021.
[32] Cameron PJ, Montanaro A, Newman MW, Severini S, Winter A. On the quantum chromatic number of a graph. arXiv preprint quant-ph/0608016. 2006.
[33] Clauser JF, Horne MA, Shimony A, Holt RA. Proposed Experiment to Test Local Hidden-Variable Theories. Physical Review Letters. 1969 oct;23(15):880-4.
[34] Kempe J, Kobayashi H, Matsumoto K, Toner B, Vidick T. Entangled games are hard to approximate. SIAM Journal on Computing. 2011;40(3):848-77.
[35] Bharti K, Haug T, Vedral V, Kwek LC. Machine learning meets quantum foundations: A brief survey. AVS Quantum Science. 2020 jul;2(3). Available from: https://doi.org/10.1116%2F5.0007529.
[36] Bharti K, Haug T, Vedral V, Kwek LC. How to Teach AI to Play Bell Non-Local Games: Reinforcement Learning; 2019.
[37] Grimsley HR, Economou SE, Barnes E, Mayhall NJ. An adaptive variational algorithm for exact molecular simulations on a quantum computer. Nature communications. 2019;10(1):3007.
[38] Mitarai K, Negoro M, Kitagawa M, Fujii K. Quantum circuit learning. Physical Review A. 2018 sep;98(3). Available from: https://doi.org/10.1103%2Fphysreva.98.032309.
[39] Schuld M, Bergholm V, Gogolin C, Izaac J, Killoran N. Evaluating analytic gradients on quantum hardware. Physical Review A. 2019 mar;99(3). Available from: https://doi.org/10.1103%2Fphysreva.99.032331.
[40] Tura J, Augusiak R, Sainz AB, Vértesi T, Lewenstein M, Acín A. Detecting nonlocality in many-body quantum states. Science. 2014;344(6189):1256-8.
[41] Harris SJ. Universality of graph homomorphism games and the quantum coloring problem. In: Annales Henri Poincaré. Springer; 2024. p. 1-36.
[42] Paulsen VI, Todorov IG. Quantum chromatic numbers via operator systems. The Quarterly Journal of Mathematics. 2015;66(2):677-92.
[43] Bravyi S, Gosset D, König R, Tomamichel M. Quantum advantage with noisy shallow circuits. Nature Physics. 2020;16(10):1040-5.
[44] Cross AW, Bishop LS, Smolin JA, Gambetta JM. Open quantum assembly language. arXiv preprint arXiv:1707.03429. 2017.
[45] Rigetti. Qiskit-Rigetti Plugin. GitHub; 2024. https://github.com/rigetti/qiskit-rigetti.
[46] Hamilton KE, Kharazi T, Morris T, McCaskey AJ, Bennink RS, Pooser RC. Scalable quantum processor noise characterization. In: 2020 IEEE International Conference on Quantum Computing and Engineering (QCE). IEEE; 2020. p. 430-40.
[47] Mayer K, Hall A, Gatterman T, Halit SK, Lee K, Bohnet J, et al. Theory of mirror benchmarking and demonstration on a quantum computer. arXiv preprint arXiv:2108.10431. 2021.
[48] Bernstein S. On a modification of Chebyshev’s inequality and of the error formula of Laplace. Ann Sci Inst Sav Ukraine, Sect Math. 1924;1(4):38-49.
[49] Zhang H, Chen S. Concentration Inequalities for Statistical Inference. Communications in Mathematical Research. 2021;37(1):1-85.
[50] Grilo AB. A Simple Protocol for Verifiable Delegation of Quantum Computation in One Round. In: 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik; 2019.
[51] Acuaviva A, Aguirre D, Peña R, Sanz M. Benchmarking Quantum Computers: Towards a Standard Performance Evaluation Approach. arXiv preprint arXiv:2407.10941. 2024.
[52] Cerezo M, Arrasmith A, Babbush R, Benjamin SC, Endo S, Fujii K, et al. Variational quantum algorithms. Nature Reviews Physics. 2021;3(9):625-44.
[53] Warren A, Zhu L, Mayhall NJ, Barnes E, Economou SE. Adaptive variational algorithms for quantum Gibbs state preparation. arXiv preprint arXiv:2203.12757. 2022.
[54] Sherbert K, Furches J, Shirali K, Economou SE, Marrero CO. Adaptive Quantum Generative Training using an Unbounded Loss Function. In: 2024 IEEE International Conference on Quantum Computing and Engineering (QCE). vol. 1. IEEE; 2024. p. 1731-8.
[55] Childs AM, Wiebe N. Hamiltonian simulation using linear combinations of unitary operations. arXiv preprint arXiv:1202.5822. 2012.
[56] Chakraborty S. Implementing any linear combination of unitaries on intermediate-term quantum computers. Quantum. 2024;8:1496.
[57] Catli AB, Simon S, Wiebe N. Exponentially Better Bounds for Quantum Optimization via Dynamical Simulation. arXiv preprint arXiv:2502.04285. 2025.

Appendix

Appendix A Data Availability

The code used to generate the data and figures in this article can be found at https://github.com/jfurches/nonlocalgames. The data collected for noise characterization are available from the authors upon reasonable request.

Appendix B ADAPT-VQE

The Adaptive Derivative-Assembled Pseudo-Trotter ansatz Variational Quantum Eigensolver (ADAPT-VQE) is a hybrid quantum-classical algorithm designed to dynamically construct an efficient and compact ansatz for molecular simulations on quantum hardware [37]. It enhances the traditional Variational Quantum Eigensolver (VQE) by adaptively building a problem-specific ansatz for the quantum state. Unlike traditional approaches such as Unitary Coupled Cluster (UCC), which rely on pre-defined and often redundant wavefunction ansätze, ADAPT-VQE grows the ansatz iteratively by selecting operators that maximize energy recovery at each step. This adaptive approach minimizes the number of parameters and quantum gates required, making it well-suited for noisy intermediate-scale quantum (NISQ) devices.

ADAPT-VQE operates by measuring the gradient of the Hamiltonian’s expectation value with respect to each operator in a predefined operator pool. The operator with the largest gradient is added to the ansatz, and its parameter is optimized alongside previously added parameters using a classical variational optimizer. This process is repeated until the norm of the gradient vector falls below a threshold, ensuring convergence to the desired accuracy.

More concretely, given variational parameters $\boldsymbol{\mathbf{\theta}}^{(k)}=(\theta_{1},\dots,\theta_{k})$ and an operator pool $\mathcal{A}=\{A^{(1)},A^{(2)},\dots,A^{(N)}\}$, the ansatz at iteration $k+1$ of the algorithm can be written as

|\psi_{k+1}(\boldsymbol{\mathbf{\theta}}^{(k+1)})\rangle=e^{-i\theta_{k+1}A_{k+1}}|\psi_{k}(\boldsymbol{\mathbf{\theta}}^{(k)})\rangle.

Notice that the ansatz from iteration $k$ is grown by appending an operator $A_{k+1}\in\mathcal{A}$ with parameter $\theta_{k+1}$; the operator is chosen by measuring the energy gradients $\left|\left.\partial\langle H\rangle/\partial\theta_{k+1}\right|_{\theta_{k+1}=0}\right|$ for each operator in the pool and selecting the one with the largest gradient. For this step, it can be shown that

\left|\left.\partial\langle H\rangle/\partial\theta_{k+1}\right|_{\theta_{k+1}=0}\right|=\left|\langle\psi_{k}(\boldsymbol{\mathbf{\theta}}^{(k)})|\left[A_{k+1},H\right]|\psi_{k}(\boldsymbol{\mathbf{\theta}}^{(k)})\rangle\right|,

where the right-hand side is the expectation value of the Hermitian observable $i[A_{k+1},H]$, which can be measured efficiently on a quantum processor as the problem size grows. Each iteration therefore proceeds as follows: the pool gradients are measured; if the norm of the vector of pool gradients is smaller than a threshold $\varepsilon$, the calculation terminates; otherwise, the operator with the largest gradient is appended to the ansatz and all variational parameters are re-optimized with VQE.
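The iteration just described can be sketched with dense state vectors. This is a minimal, hypothetical illustration rather than the implementation in the repository listed in Appendix A: the two-qubit Hamiltonian, the Pauli-string pool, the $|00\rangle$ reference state, and the use of SciPy's BFGS for the VQE step are all assumptions chosen for brevity. Pool operators are Hermitian, matching the $e^{-i\theta A}$ convention above.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

# Pauli matrices; two-qubit operators are built with np.kron.
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Y = np.array([[0.0, -1j], [1j, 0.0]])
Z = np.diag([1.0, -1.0])

# Hypothetical toy problem: a two-qubit Hamiltonian and a pool of Hermitian Pauli strings.
H = np.kron(Z, Z) + 0.5 * (np.kron(X, I2) + np.kron(I2, X))
POOL = [np.kron(Y, I2), np.kron(I2, Y), np.kron(Y, X),
        np.kron(X, Y), np.kron(Y, Z), np.kron(Z, Y)]
REF = np.array([1.0, 0.0, 0.0, 0.0], dtype=complex)  # |00>


def prepare(thetas, ops):
    """|psi_k> = exp(-i theta_k A_k) ... exp(-i theta_1 A_1) |00>."""
    psi = REF
    for theta, A in zip(thetas, ops):
        psi = expm(-1j * theta * A) @ psi
    return psi


def energy(thetas, ops):
    psi = prepare(thetas, ops)
    return float(np.real(np.vdot(psi, H @ psi)))


def pool_gradients(psi):
    """|<psi|[A, H]|psi>| for every pool operator A, i.e. |dE/dtheta| at theta = 0."""
    return np.array([abs(np.vdot(psi, (A @ H - H @ A) @ psi)) for A in POOL])


def adapt_vqe(eps=1e-3, max_iter=20):
    ops, thetas = [], np.zeros(0)
    for _ in range(max_iter):
        grads = pool_gradients(prepare(thetas, ops))
        if np.linalg.norm(grads) < eps:             # convergence check on the gradient norm
            break
        ops.append(POOL[int(np.argmax(grads))])     # grow the ansatz
        thetas = np.append(thetas, 0.0)
        # VQE step: re-optimize all parameters, warm-started from the previous values.
        thetas = minimize(energy, thetas, args=(ops,), method="BFGS").x
    return energy(thetas, ops), ops, thetas


E, ops, thetas = adapt_vqe()
print(len(ops), E, np.linalg.eigvalsh(H)[0])
```

The pool here contains the six two-qubit Pauli strings with exactly one $Y$; each generates a real rotation, and together they span all real orthogonal rotations of two qubits, so the toy ground state is reachable.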

By tailoring the ansatz to the problem at hand, ADAPT-VQE achieves high accuracy with significantly reduced circuit depth compared to fixed-ansatz methods. Variational quantum algorithms have been studied extensively [52], and the adaptive technique has been extended to quantum Gibbs state preparation [53] and quantum generative training [54].

Appendix C Original $G_{14}$ Strategy

Here we outline the perfect quantum strategy for the graph $G_{14}$ with $4$ colors, as detailed in [25]. The authors construct this strategy from a 4-dimensional real orthogonal representation of the graph and a transformation derived from quaternion multiplication, outlined in [22]. Their construction proceeds as follows:

1. Orthogonal Representation: To each vertex $v\in V(G_{14})$ we assign a 4D real unit vector $\varphi(v)$ as follows:

• To each vertex of $G_{13}$ (see Figure 4), assign a 3-dimensional vector with entries in $\{-1,0,1\}$ such that two vertices of $G_{13}$ are adjacent if and only if their corresponding vectors are orthogonal.
• Extend each 3-dimensional vector $(x,y,z)^{T}$ to a 4D vector by appending a zero: $(x,y,z,0)^{T}$.
• Assign the apex vertex $\Omega$ of $G_{14}$ the 4D vector $(0,0,0,1)^{T}$.
• Normalize each vector to unit length and let $\varphi(v)$ be the vector corresponding to vertex $v\in G_{14}$.
• This assignment guarantees that if vertices $u$ and $v$ are adjacent ($u\sim v$), their vectors are orthogonal ($\varphi(u)^{T}\varphi(v)=0$): for pairs within $G_{13}$ this holds by construction, and the vector $(0,0,0,1)^{T}$ assigned to $\Omega$ is orthogonal to every vector of the form $(x,y,z,0)^{T}$.
• To each vector $\varphi(v)=(r_{0},r_{1},r_{2},r_{3})^{T}$ we associate four mutually orthogonal unit vectors $\{\varphi_{k}(v)\}_{k=0}^{3}$, given by the columns of the matrix

M_{v}=\begin{pmatrix}r_{0}&-r_{1}&-r_{2}&-r_{3}\\ r_{1}&r_{0}&r_{3}&-r_{2}\\ r_{2}&-r_{3}&r_{0}&r_{1}\\ r_{3}&r_{2}&-r_{1}&r_{0}\end{pmatrix}

So $\varphi_{0}(v)=(r_{0},r_{1},r_{2},r_{3})^{T}$, $\varphi_{1}(v)=(-r_{1},r_{0},-r_{3},r_{2})^{T}$, and so on. These four vectors form the measurement basis for vertex $v$.

2. State and Projectors $P_{k}(v)$: In the corresponding nonlocal game, Alice and Bob share the 4-dimensional maximally entangled state $|\Psi^{+}\rangle=\frac{1}{2}\sum_{j=0}^{3}|j\rangle_{A}\otimes|j\rangle_{B}$. Upon receiving a vertex $v$, a player measures with the projectors $\{P_{k}(v)\}_{k=0}^{3}$, each defined by the corresponding basis vector:

$P_{k}(v)=\varphi_{k}(v)\varphi_{k}(v)^{T}$

3. Joint Probabilities $P(a,b|u,v)$: The probability that Alice and Bob obtain outcomes (colors) $a$ and $b$ for questions $u$ and $v$, respectively, is

$P(a,b|u,v)=\langle\Psi^{+}|P_{a}(u)\otimes P_{b}(v)|\Psi^{+}\rangle=\frac{1}{4}\text{Tr}(P_{a}(u)P_{b}(v))=\frac{1}{4}|\varphi_{a}(u)^{T}\varphi_{b}(v)|^{2}$,

where the second equality uses $\langle\Psi^{+}|X\otimes Y|\Psi^{+}\rangle=\frac{1}{4}\text{Tr}(X^{T}Y)$ together with $P_{a}(u)^{T}=P_{a}(u)$.

This formula ensures that the winning conditions of the coloring game are met with certainty. Specifically, if $u=v$, then $P(a,b|u,u)=\frac{1}{4}\delta_{ab}$ because the columns of $M_{u}$ are orthonormal. If $u\sim v$, then $P(a,a|u,v)=0$ for all $a$, because $\varphi_{a}(u)^{T}\varphi_{a}(v)=\varphi(u)^{T}\varphi(v)=0$ for every $a$.
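The construction and both winning conditions can be checked numerically. In this minimal sketch the helper names are hypothetical, and `phi_u`, `phi_v` are illustrative stand-ins for two adjacent vertices (orthogonal unit vectors of the form $(x,y,z,0)^{T}$), not the actual $G_{13}$ vectors of Figure 4. The matrix $M_v$ is built exactly as displayed above, and the probabilities are computed directly from the shared state rather than from the closed form.

```python
import numpy as np


def quaternion_basis(r):
    """M_v for a unit vector r = (r0, r1, r2, r3); its columns are phi_0(v), ..., phi_3(v)."""
    r0, r1, r2, r3 = np.asarray(r, dtype=float)
    return np.array([
        [r0, -r1, -r2, -r3],
        [r1,  r0,  r3, -r2],
        [r2, -r3,  r0,  r1],
        [r3,  r2, -r1,  r0],
    ])


def joint_probabilities(phi_u, phi_v):
    """P[a, b] = <Psi+| P_a(u) (x) P_b(v) |Psi+> with |Psi+> = (1/2) sum_j |j>|j>."""
    Mu, Mv = quaternion_basis(phi_u), quaternion_basis(phi_v)
    psi = np.eye(4).reshape(16) / 2.0              # amplitude 1/2 on |00>, |11>, |22>, |33>
    probs = np.empty((4, 4))
    for a in range(4):
        Pa = np.outer(Mu[:, a], Mu[:, a])          # P_a(u) = phi_a(u) phi_a(u)^T
        for b in range(4):
            Pb = np.outer(Mv[:, b], Mv[:, b])      # P_b(v) = phi_b(v) phi_b(v)^T
            probs[a, b] = psi @ np.kron(Pa, Pb) @ psi
    return probs


# Hypothetical adjacent pair: orthogonal unit vectors of the form (x, y, z, 0).
phi_u = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2)
phi_v = np.array([1.0, -1.0, 1.0, 0.0]) / np.sqrt(3)
# The apex Omega is represented by (0, 0, 0, 1).
phi_apex = np.array([0.0, 0.0, 0.0, 1.0])

M = quaternion_basis(phi_u)
print(np.allclose(M.T @ M, np.eye(4)))             # columns form an orthonormal basis
print(np.diag(joint_probabilities(phi_u, phi_v)))  # P(a, a | u, v) for adjacent u, v
print(joint_probabilities(phi_u, phi_u))           # same question: identical answers
```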

One difficulty in implementing this strategy is that the measurements are specified as projections, which would need to be decomposed into implementable operations, for example using a Linear Combination of Unitaries (LCU) [55]. Standard LCU can be resource-intensive: it requires $\left\lceil\log_{2}M\right\rceil$ ancilla qubits, where $M$ is the number of unitaries in the linear combination, as well as a “prepare” unitary and a sophisticated multi-qubit controlled “select” unitary, implemented separately for each projection. Other techniques, such as ancilla-free LCU [56], might reduce this overhead, but assessing the feasibility of implementing this strategy on near-term hardware is non-trivial and outside the scope of this work.
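To give a sense of the LCU overhead, a projector $P_k(v)$ can be expanded in the 16 two-qubit Pauli strings, which is one possible choice of unitaries for the linear combination (an assumption; no particular decomposition is prescribed here). The number $M$ of nonzero terms then sets the size of the $\lceil\log_2 M\rceil$-qubit index register of standard LCU. The vector and helper names below are hypothetical.

```python
import itertools
import math

import numpy as np

PAULIS = {
    "I": np.eye(2),
    "X": np.array([[0.0, 1.0], [1.0, 0.0]]),
    "Y": np.array([[0.0, -1j], [1j, 0.0]]),
    "Z": np.diag([1.0, -1.0]),
}


def pauli_decomposition(op, tol=1e-12):
    """Return {label: c} with op = sum_label c * (two-qubit Pauli string)."""
    coeffs = {}
    for a, b in itertools.product(PAULIS, repeat=2):
        P = np.kron(PAULIS[a], PAULIS[b])
        c = np.trace(P.conj().T @ op) / 4.0   # Hilbert-Schmidt projection onto P
        if abs(c) > tol:
            coeffs[a + b] = c
    return coeffs


def lcu_ancillas(num_terms):
    """Ancilla qubits needed to index num_terms unitaries in standard LCU."""
    return math.ceil(math.log2(num_terms)) if num_terms > 1 else 0


# Hypothetical measurement vector phi_k(v); any real unit vector in R^4 works here.
phi_k = np.array([1.0, -1.0, 1.0, 0.0]) / np.sqrt(3)
projector = np.outer(phi_k, phi_k)            # P_k(v) = phi_k(v) phi_k(v)^T
coeffs = pauli_decomposition(projector)
print(len(coeffs), "Pauli terms ->", lcu_ancillas(len(coeffs)), "ancilla qubits")
```

Because each projector differs, this decomposition (and the corresponding "select" circuit) would have to be repeated for every vertex and color.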

Appendix D Measurement Parameters

Alice’s measurement parameters for the $G_{14}$ strategy are stored in data/g14_constrained_u3ry/g14_state.json under the key phi. Loading this entry into a NumPy array should yield a tensor of shape $(1,14,2,4)$, corresponding to (players, questions, qubits, parameters). This tensor can be transformed to produce the conjugated measurement angles for Bob, as done in U3RyLayer in measurement.py.
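A minimal loading sketch, assuming the entry under `phi` is stored as a nested list of numbers (the exact file layout has not been checked, and the function name is hypothetical). It only recovers Alice's angles; the transformation to Bob's conjugated angles is implemented in `U3RyLayer` and is not reproduced here.

```python
import json
from pathlib import Path

import numpy as np

EXPECTED_SHAPE = (1, 14, 2, 4)  # (players, questions, qubits, parameters)


def load_alice_angles(path="data/g14_constrained_u3ry/g14_state.json"):
    """Load Alice's G14 measurement parameters and check the expected tensor shape."""
    with Path(path).open() as f:
        state = json.load(f)
    phi = np.asarray(state["phi"], dtype=float)
    if phi.shape != EXPECTED_SHAPE:
        raise ValueError(f"expected shape {EXPECTED_SHAPE}, got {phi.shape}")
    return phi


# Usage from a checkout of the repository:
# phi = load_alice_angles()
# angles_q0 = phi[0, 0]   # shape (2, 4): four parameters per qubit for question 0
```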

Appendix E Hyperparameters

Problem	Hyperparameter	Value
CHSH	ADAPT Grad Max $\epsilon_{\theta}$	$10^{-3}$
	BFGS Grad Max $\epsilon_{\phi}$	$10^{-5}$
	DPO Tolerance $\Delta E$	$10^{-3}$
NPS	Same as CHSH
$G_{14}$	ADAPT Grad Max $\epsilon_{\theta}$	$10^{-6}$
	BFGS Grad Max $\epsilon_{\phi}$	$10^{-5}$
	DPO Tolerance $\Delta E$	$10^{-6}$

Table 3: Hyperparameters for DPO experiments

We list the algorithm hyperparameters used in our experiments. The parameter $\epsilon_{\theta}$ is the convergence criterion of ADAPT, used to prepare the shared state $\ket{\psi(\theta)}$: ADAPT finishes when the largest pool gradient falls below the threshold, $\max_{A_{i}}\left|\braket{[H,A_{i}]}\right|<\epsilon_{\theta}$. Similarly, $\epsilon_{\phi}$ controls the convergence of the second phase of DPO: the BFGS optimizer halts when $\max_{i}\left|\partial\braket{H}/\partial\phi_{i}\right|<\epsilon_{\phi}$. Finally, $\Delta E$ controls the termination of the overall DPO procedure, which ends at iteration $k$ when $\braket{H^{(k-1)}}-\braket{H^{(k)}}<\Delta E$.
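The three stopping rules can be written as simple predicates. This is a sketch that assumes the BFGS phase is run with SciPy, whose `BFGS` method stops when the gradient norm falls below `gtol` (with `norm=np.inf`, exactly the maximum-component criterion above); whether the authors' code is configured this way is an assumption. The toy objective is a hypothetical stand-in for $\braket{H}$ as a function of $\phi$.

```python
import numpy as np
from scipy.optimize import minimize

# Threshold values for G14 taken from Table 3.
EPS_THETA = 1e-6   # ADAPT: stop when max_i |<[H, A_i]>| < EPS_THETA
EPS_PHI = 1e-5     # BFGS:  stop when max_i |d<H>/d phi_i| < EPS_PHI
DELTA_E = 1e-6     # DPO:   stop when <H^(k-1)> - <H^(k)> < DELTA_E


def adapt_converged(pool_gradients, eps_theta=EPS_THETA):
    return float(np.max(np.abs(pool_gradients))) < eps_theta


def dpo_converged(energy_prev, energy_curr, delta_e=DELTA_E):
    return energy_prev - energy_curr < delta_e


def bfgs_phase(objective, gradient, phi0, eps_phi=EPS_PHI):
    # norm=np.inf makes SciPy compare max_i |grad_i| against gtol.
    return minimize(objective, phi0, jac=gradient, method="BFGS",
                    options={"gtol": eps_phi, "norm": np.inf})


# Hypothetical toy objective standing in for <H>(phi).
WEIGHTS = np.array([1.0, 2.0, 3.0])


def toy_energy(phi):
    return float(np.sum(WEIGHTS * (phi - 1.0) ** 2))


def toy_grad(phi):
    return 2.0 * WEIGHTS * (phi - 1.0)


result = bfgs_phase(toy_energy, toy_grad, np.zeros(3))
print(result.success, np.max(np.abs(toy_grad(result.x))))
```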

Appendix F Gradient Sample Complexity

In this section, we analyze the sample complexity of estimating the gradient, which bears on both the practical and the theoretical cost of implementing our algorithm. We measure complexity by the number of operator exponentials required to estimate the gradient to precision $\epsilon$.

Theorem F.1.

Let $\mathcal{E}_{j}$ be a random variable describing the error in the estimate of the $j$-th gradient component (the $j$-th experiment), with variance $\mathbb{E}[\mathcal{E}^{2}_{j}]=\epsilon_{0}^{2}$. Then the sample complexity of estimating the gradient with total variance $\epsilon^{2}$ is $N_{\mathrm{exp}}\in\mathcal{O}\left(\frac{N^{2}}{\epsilon^{2}}\right)$, where $N$ is the dimensionality of the parameter space.

Proof.

Let $\epsilon_{0}=\frac{\epsilon}{\sqrt{N}}$. By linearity of expectation, $\mathbb{E}\left[\sum\limits_{j=1}^{N}\mathcal{E}_{j}^{2}\right]=N\mathbb{E}[\mathcal{E}_{j}^{2}]=N\epsilon_{0}^{2}$. The expected squared Euclidean norm of the error vector $\mathcal{E}=(\mathcal{E}_{1},\dots,\mathcal{E}_{N})$ in the gradient estimate is therefore

\mathbb{E}\|\mathcal{E}\|^{2}=\mathbb{E}\left[\sum\limits_{j=1}^{N}\mathcal{E}_{j}^{2}\right]=N\frac{\epsilon^{2}}{N}=\epsilon^{2}.

(22)

Since each experiment requires $\mathcal{O}\left(\frac{1}{\epsilon_{0}^{2}}\right)=\mathcal{O}\left(\frac{N}{\epsilon^{2}}\right)$ operator exponentials and one experiment is needed for each of the $N$ gradient components, the total number of operator exponentials $N_{\mathrm{exp}}$ is

N_{\mathrm{exp}}\in\mathcal{O}\left(\frac{N^{2}}{\epsilon^{2}}\right)

(23)

as desired. ∎
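The scaling in Theorem F.1 can be illustrated with a small simulation. In this sketch each gradient component is estimated as the mean of $\pm1$ measurement outcomes with unit single-shot variance, an assumption made for illustration, and shots stand in for the theorem's operator exponentials. Allocating $N/\epsilon^{2}$ shots per component (i.e. $\epsilon_0=\epsilon/\sqrt{N}$) gives an expected squared error of $\epsilon^{2}$ and $N^{2}/\epsilon^{2}$ shots in total.

```python
import math

import numpy as np


def shots_per_component(n_params, eps, shot_variance=1.0):
    """Shots so that one component has variance eps_0**2 = eps**2 / N."""
    return math.ceil(shot_variance * n_params / eps**2)


def total_shots(n_params, eps, shot_variance=1.0):
    """N experiments of O(N / eps**2) shots each: O(N**2 / eps**2) in total."""
    return n_params * shots_per_component(n_params, eps, shot_variance)


def empirical_squared_error(n_params, eps, trials=2000, seed=0):
    """Monte Carlo estimate of E[sum_j E_j**2] for +/-1 outcomes with p = 1/2."""
    rng = np.random.default_rng(seed)
    n = shots_per_component(n_params, eps)
    counts = rng.binomial(n, 0.5, size=(trials, n_params))
    errors = 2.0 * counts / n - 1.0   # the true gradient component is 0 here
    return float(np.mean(np.sum(errors**2, axis=1)))


print(total_shots(8, 0.125), total_shots(16, 0.125))  # 4096 16384
print(empirical_squared_error(8, 0.125))              # close to eps**2 = 0.015625
```

Doubling the number of parameters at fixed $\epsilon$ quadruples the total cost, as the $N^{2}$ factor predicts.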

Theorem F.2.

Assume that the variational state $\ket{\psi(\theta)}$ requires $N$ parameters to specify and that we wish to minimize $F(\theta):=\langle\psi(\theta)|H|\psi(\theta)\rangle$ over $\theta$. Assume that $F$ is Lipschitz continuous with constant $C$ and that $\nabla F$ is Lipschitz continuous with constant $L$. Then the number of exponentials required to perform gradient descent with learning rate $\eta$ for $N_{\rm epoch}$ epochs, such that the final value of the objective function deviates from that of exact gradient descent by at most $\epsilon_{\rm tot}$, is in

O\left(\frac{N^{2}N_{\rm epoch}C^{2}((1+\eta L)^{N_{\rm epoch}}-1)^{2}}{\epsilon_{\rm tot}^{2}L^{2}}\right).

(24)

Proof.

The gradient descent rule with learning rate $\eta$ reads

\theta\rightarrow\theta-\eta\nabla_{\theta}\langle\psi(\theta)|H|\psi(\theta)\rangle.

(25)

Using our assumption that the gradient is Lipschitz continuous with constant $L$,

\|\nabla\langle\psi(\theta)|H|\psi(\theta)\rangle-\nabla\langle\psi(\theta+\delta)|H|\psi(\theta+\delta)\rangle\|\leq L\|\delta\|.

(26)

If we define $\tilde{G}(\theta)$ to be an approximate gradient evaluated at the parameters $\theta$, with error at most $\epsilon$ in norm, i.e. $\|\nabla\langle\psi(\theta)|H|\psi(\theta)\rangle-\tilde{G}(\theta)\|\leq\epsilon$, then the triangle inequality gives

\begin{aligned}\|\nabla\langle\psi(\theta)|H|\psi(\theta)\rangle-\tilde{G}(\theta+\delta)\|&\leq\|\nabla\langle\psi(\theta)|H|\psi(\theta)\rangle-\nabla\langle\psi(\theta+\delta)|H|\psi(\theta+\delta)\rangle\|\\&\qquad+\|\nabla\langle\psi(\theta+\delta)|H|\psi(\theta+\delta)\rangle-\tilde{G}(\theta+\delta)\|\\&\leq L\|\delta\|+\epsilon.\end{aligned}

(27)

Let $\delta_{k}$ denote the deviation, after $k$ epochs, of the parameter vector from the one produced by exact gradient descent. From the triangle inequality and the gradient update rule, we then have

\|\delta_{k}\|\leq\eta(L\|\delta_{k-1}\|+\epsilon)+\|\delta_{k-1}\|

(28)

Solving this recursion with $\delta_{0}=0$ gives

\|\delta_{k}\|\leq\eta\epsilon+(1+\eta L)\eta\epsilon+(1+\eta L)^{2}\eta\epsilon+\cdots+(1+\eta L)^{k-1}\eta\epsilon=\frac{\epsilon((1+\eta L)^{k}-1)}{L}.

(29)

Using the assumption that the objective function is Lipschitz continuous with constant $C$,

\left|\langle\psi(\theta+\delta)|H|\psi(\theta+\delta)\rangle-\langle\psi(\theta)|H|\psi(\theta)\rangle\right|\leq C\|\delta\|.

(30)

Applying this bound with $\delta=\delta_{N_{\rm epoch}}$ and using Eq. (29), it suffices to choose the error per gradient evaluation such that

\frac{C\epsilon((1+\eta L)^{N_{\rm epoch}}-1)}{L}\leq\epsilon_{\rm tot}.

(31)

Isolating $\epsilon$ yields

\epsilon\leq\frac{\epsilon_{\rm tot}L}{C((1+\eta L)^{N_{\rm epoch}}-1)}.

(32)

By Theorem F.1, with $\epsilon$ chosen as in Eq. (32), the number of exponentials needed per epoch is

N_{\exp}\in O\left(\frac{N^{2}C^{2}((1+\eta L)^{N_{\rm epoch}}-1)^{2}}{\epsilon_{\rm tot}^{2}L^{2}}\right).

(33)

Since there are $N_{\rm epoch}$ epochs, the total number of exponentials is

N_{\exp,\mathrm{tot}}\in O\left(\frac{N^{2}N_{\rm epoch}C^{2}((1+\eta L)^{N_{\rm epoch}}-1)^{2}}{\epsilon_{\rm tot}^{2}L^{2}}\right).

(34)

∎
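The recursion bound (29) can be sanity-checked numerically. The sketch below runs exact and perturbed gradient descent side by side on a hypothetical quadratic objective $F(\theta)=\frac{1}{2}\theta^{T}A\theta$, for which $L$ is the largest eigenvalue of $A$, injecting a random gradient error of norm exactly $\epsilon$ at every epoch and comparing $\|\delta_k\|$ against $\epsilon((1+\eta L)^{k}-1)/L$.

```python
import numpy as np


def deviation_vs_bound(A, theta0, eta, eps, epochs, seed=0):
    """Return (||delta_k||, bound_k) for k = 1..epochs on F(theta) = theta^T A theta / 2."""
    rng = np.random.default_rng(seed)
    L = np.max(np.abs(np.linalg.eigvalsh(A)))   # Lipschitz constant of grad F(theta) = A theta
    exact, noisy = theta0.copy(), theta0.copy()
    deviations, bounds = [], []
    for k in range(1, epochs + 1):
        err = rng.normal(size=theta0.shape)
        err *= eps / np.linalg.norm(err)              # gradient error of norm exactly eps
        exact = exact - eta * (A @ exact)             # exact gradient descent
        noisy = noisy - eta * (A @ noisy + err)       # descent with approximate gradient G~
        deviations.append(np.linalg.norm(noisy - exact))
        bounds.append(eps * ((1.0 + eta * L) ** k - 1.0) / L)
    return np.array(deviations), np.array(bounds)


A = np.diag([0.5, 1.0, 4.0])   # hypothetical positive-definite Hessian
dev, bound = deviation_vs_bound(A, np.ones(3), eta=0.1, eps=1e-2, epochs=50)
print(np.all(dev <= bound + 1e-12))
```

On a well-conditioned quadratic the actual deviation stays far below the bound; the bound is worst-case because it only uses $\|I-\eta A\|\leq 1+\eta L$.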

This shows that the sample complexity of such problems can, in general, be substantial. In particular, the bound grows exponentially with the number of epochs through the factor $(1+\eta L)^{N_{\rm epoch}}$, so when a small learning rate forces a long optimization, the number of operations needed can become exponentially large. In the strongly convex case, the learning rate $\eta$ should be chosen inversely proportional to the largest eigenvalue of the Hessian matrix, and the number of epochs then grows in proportion to the condition number, implying that the number of samples scales exponentially with the condition number. This can be prohibitive in cases where some optimization directions are vastly steeper than others, such as in the vicinity of a saddle point. The number of epochs required for optimization is similarly difficult to bound. In the strongly convex case, it scales logarithmically with the inverse of the target error in the final objective function; in general, however, such optimization problems are not necessarily strongly convex. For these reasons, we leave the parameters of the gradient descent arbitrary.
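The exponential dependence on the number of epochs can be made concrete by evaluating the right-hand side of Eq. (34) with the hidden constant set to one (an assumption, since the $O(\cdot)$ notation does not fix it). The parameter values below are arbitrary illustrations, not values from our experiments.

```python
def exponential_count_bound(n_params, n_epoch, eta, L, C, eps_tot):
    """Right-hand side of Eq. (34) with the O(.) constant set to 1."""
    growth = (1.0 + eta * L) ** n_epoch - 1.0
    return n_params**2 * n_epoch * C**2 * growth**2 / (eps_tot**2 * L**2)


# Each extra epoch multiplies the dominant factor by roughly (1 + eta*L)**2.
for n_epoch in (10, 20, 40, 80):
    bound = exponential_count_bound(10, n_epoch, eta=0.1, L=1.0, C=1.0, eps_tot=1e-2)
    print(f"N_epoch = {n_epoch:3d}: {bound:.3e} exponentials")
```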

As a final note, this suggests that variationally optimizing the parameters for a nonlocal game is not expected to be efficient in general. To make this optimization tractable at scale, we need to minimize the number of epochs as much as possible. This can be achieved by starting from a well-informed initial guess for the protocol before attempting to optimize the result. If such conditions are met, the above analysis suggests that a manageable number of operations will be needed to reach a constant distance from the locally optimized strategy. To tackle the general problem, we suggest exploring alternative optimization approaches, such as solving the variational problem with dynamical simulation-based methods [57].

Appendix G Experimental Details

The experiments on ibm_sherbrooke were conducted seven times between Sep. 27 and Oct. 1, 2024, with 4096 shots per circuit. The layout was chosen on the first run to be qubits 46-49 (a linear chain) using the dense method of the Qiskit transpiler with no optimization (level 0); subsequent runs reused the same layout. Each batch contained the SPAM characterization circuits, the independent unitary noise characterization circuits, the mirror fidelity circuits, and the Bell pair game circuits. Table 2 and Fig. 19 report the best run on ibm_sherbrooke. Calibration data for the backend were queried and saved when each circuit batch entered the queue; Table 4 reports the two-qubit (ECR) gate errors.

gate	error
ecr45_44	0.010494
ecr45_46	0.007731
ecr47_46	0.004980
ecr47_48	0.006505
ecr49_48	0.005589
ecr49_50	0.010321
ecr50_51	0.007020

Table 4: Calibration data for individual ECR gates between hardware qubits 46-49 on ibm_sherbrooke.

The reported data from Rigetti’s Ankaa-2 were collected on September 29 and September 30, 2024. Each circuit was sampled with 2048 shots, and the hardware qubits used are reported in Figs. 11 and 12.

The reported data from Rigetti’s Ankaa-3 were collected on September 30, 2024. Each circuit was sampled with 2048 shots, and the hardware qubits used are reported in Figs. 13 and 14.
