
Supplement to “Causal Strategic Linear Regression”

Yonadav Shavit    Benjamin L. Edelman    Brian Axelrod
Strategic classification, Mechanism design

1 Appendix

1.1 Agent Outcomes

Proof of Theorem 1.

Let’s walk through the steps of the algorithm, bounding the error that accumulates along the way.

In the first round we set $\omega=0$ in order to obtain an estimate of $\mathbb{E}[{\omega^{*}}^{T}x]$.

Since $\omega^{*}$ is a unit vector, the variance of ${\omega^{*}}^{T}x$ is at most $\lambda_{max}$ plus a constant (from the $1$-subgaussian noise).

By Chebyshev’s inequality, this means that $O(\lambda_{max}\epsilon^{-2}d^{2})$ samples suffice for the empirical estimator of $\mathbb{E}[{\omega^{*}}^{T}x]$ to have error at most $\frac{\epsilon}{4d}$ with failure probability at most $\frac{1}{2d}$. We call the output of this estimator $\hat{\mu}$ and let $\hat{\mu}_{d}$ be the $d$-dimensional vector with $\hat{\mu}$ in every coordinate.

Now we choose $\omega_{1},\ldots,\omega_{d}$ that form an orthonormal basis of the image of the diagonal matrix $V$. For each $\omega_{i}$ we observe the reward ${\omega^{*}}^{T}(x+G\omega_{i})+\eta$, subtract out $\hat{\mu}$, and plug the result into the empirical mean estimator; let $\hat{\nu}_{i}$ be the resulting coefficient. After $O(\epsilon^{-1}d\lambda_{max})$ samples, each coefficient has error at most $\frac{\epsilon}{4d}$ with failure probability at most $\frac{1}{2d}$. Since we have computed $d+1$ estimators, each with failure probability at most $\frac{1}{2d}$, a union bound gives a total failure probability bounded by a constant less than $1$.

We can now bound the total squared $\ell_{2}$ error between these coefficients and $G^{T}\omega^{*}$ in the $\omega_{1},\ldots,\omega_{d}$ basis (noting that the choice of orthonormal basis does not affect the magnitude of the error). We can break the error into two components using the triangle inequality: the error due to $\hat{\mu}_{d}$ and the error in the subsequent rounds. Each coordinate of $\hat{\mu}_{d}$ has error of magnitude at most $\frac{\epsilon}{4d}$, so the total magnitude of the error in $\hat{\mu}_{d}$ is at most $\frac{\epsilon}{4}$. The same argument applies to the error in the coordinate estimates, leading to a total $\ell_{2}$ error of at most $\epsilon/2$.
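The estimation phase described above can be sketched in code. The following is a minimal simulation on a hypothetical instance (Gaussian features, identity basis, and the sample size `n` are illustrative choices, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
# Hypothetical instance: unit-norm hidden parameter and effort-conversion matrix G.
omega_star = rng.normal(size=d)
omega_star /= np.linalg.norm(omega_star)
G = rng.normal(size=(d, d))

def observed_outcomes(omega, n):
    """Outcomes omega*^T (x + G omega) + eta for n agents gaming against omega."""
    x = rng.normal(size=(n, d))   # features (mean zero here for simplicity)
    eta = rng.normal(size=n)      # 1-subgaussian noise
    return (x + G @ omega) @ omega_star + eta

n = 40_000
mu_hat = observed_outcomes(np.zeros(d), n).mean()   # round 1: omega = 0
# One round per basis vector e_i: the mean outcome minus mu_hat estimates (G^T omega*)_i.
nu_hat = np.array([observed_outcomes(np.eye(d)[i], n).mean() - mu_hat
                   for i in range(d)])
omega_hat = nu_hat / np.linalg.norm(nu_hat)
```

With these sample sizes, `nu_hat` lands close to $G^{T}\omega^{*}$ and the normalized `omega_hat` incentivizes near-optimal agent outcomes.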

Recall that $\hat{\omega}=\hat{\nu}/\|\hat{\nu}\|$. Let $\nu:=G^{T}\omega^{*}$. We can now bound the gap between the agent outcomes incentivized by $\hat{\omega}$ and by $\omega_{imp}=\nu/\|\nu\|$:

$$\begin{aligned}
\operatorname{AO}(\omega_{imp})-\operatorname{AO}(\hat{\omega}) &= \nu^{T}\frac{\nu}{\|\nu\|}-\nu^{T}\frac{\hat{\nu}}{\|\hat{\nu}\|} && (1)\\
&= \|\nu\|-\nu^{T}\frac{\hat{\nu}}{\|\hat{\nu}\|} && (2)\\
&\leq \|\nu\|-\frac{\|\nu\|(\|\nu\|-\epsilon/2)}{\|\nu\|+\epsilon/2} && (3)\\
&= \frac{\|\nu\|\epsilon}{\|\nu\|+\epsilon/2}\leq\epsilon && (4)
\end{aligned}$$

where (3) uses $\nu^{T}\hat{\nu}\geq\|\nu\|(\|\nu\|-\epsilon/2)$ and $\|\hat{\nu}\|\leq\|\nu\|+\epsilon/2$, both consequences of $\|\hat{\nu}-\nu\|\leq\epsilon/2$. ∎
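This final bound can be spot-checked numerically: draw a random $\nu$, perturb it by at most $\epsilon/2$, and confirm the outcome gap never exceeds $\epsilon$. The values below are synthetic and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 0.3
gaps = []
for _ in range(2000):
    d = int(rng.integers(2, 10))
    nu = rng.normal(size=d)                  # plays the role of G^T omega*
    delta = rng.normal(size=d)
    delta *= (eps / 2) * rng.uniform() / np.linalg.norm(delta)
    nu_hat = nu + delta                      # estimate with ||nu_hat - nu|| <= eps/2
    # Gap between outcomes under nu/||nu|| and under nu_hat/||nu_hat||.
    gap = np.linalg.norm(nu) - nu @ (nu_hat / np.linalg.norm(nu_hat))
    gaps.append(gap)
max_gap = max(gaps)
```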

1.2 Prediction Risk

Proof of Lemma 1.
$$\begin{aligned}
Risk(\omega) &= \mathbb{E}_{x,a}\left[\left(\omega^{T}V\left(x+Ma\right)-{\omega^{*}}^{T}\left(x+Ma\right)\right)^{2}\right]\\
&= \mathbb{E}_{x,a}\left[\left(\left(\omega^{T}Vx-{\omega^{*}}^{T}x\right)+\left(\omega^{T}VMa-{\omega^{*}}^{T}Ma\right)\right)^{2}\right]\\
&= \mathbb{E}_{x,a}\left[\left(\omega^{T}Vx-{\omega^{*}}^{T}x\right)^{2}\right]+2\,\mathbb{E}_{x,a}\left[\left(V\omega-\omega^{*}\right)^{T}x\,(Ma)^{T}\left(V\omega-\omega^{*}\right)\right]+\mathbb{E}_{x,a}\left[\left(\omega^{T}VMa-{\omega^{*}}^{T}Ma\right)^{2}\right]\\
&= \mathbb{E}_{x}\left[\left(\omega^{T}Vx-{\omega^{*}}^{T}x\right)^{2}\right]+\mathbb{E}_{a}\left[\left(\omega^{T}VMa-{\omega^{*}}^{T}Ma\right)^{2}\right]
\end{aligned}$$

where the last line follows because MaMa and xx are uncorrelated. ∎
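A quick Monte Carlo check of this decomposition, on a synthetic instance with independent zero-mean $x$ and $a$ (the dimensions and distributions below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, n = 4, 3, 400_000
V = np.diag(rng.uniform(0.5, 1.5, size=d))   # known scaling matrix
M = rng.normal(size=(d, k))                  # action matrix
omega = rng.normal(size=d)
omega_star = rng.normal(size=d)

x = rng.normal(size=(n, d))                  # features, independent of actions
a = rng.normal(size=(n, k))                  # actions

# Per-sample prediction error split into the x-part and the Ma-part.
err_x = x @ (V @ omega - omega_star)
err_a = (a @ M.T) @ (V @ omega - omega_star)

risk = np.mean((err_x + err_a) ** 2)
decomposed = np.mean(err_x ** 2) + np.mean(err_a ** 2)
cross = np.mean(err_x * err_a)   # should vanish: x and Ma are uncorrelated
```

Empirically `cross` concentrates around zero, so `risk` matches the two-term decomposition up to sampling noise.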

1.3 Parameter Estimation

In this section we describe how we recover $\hat{\omega}_{opt}$ in $L^{2}$-distance when there exists an $\omega$ such that the post-gaming second moment matrix $\mathbb{E}[(x+G\omega)(x+G\omega)^{T}]$ is full rank. Before we proceed we make a couple of observations. When there is no way to make this matrix full rank, we cannot hope to recover the optimal $\hat{\omega}_{opt}$: if there is no natural variation in, e.g., the last two features, and furthermore no agent can act along those features, it is not possible to disentangle their potential effects on the outcome. This also suggests that parameter recovery is a more substantive demand on the decision-maker than in the standard linear regression setting. To discover this additional information, the decision-maker can incentivize the agents to take actions that help the decision-maker recover the true outcome-governing parameters.

This motivates the algorithm we present in this section. It operates in three stages. First, it recovers the information necessary to identify the decision rule that will provide the most informative agent samples after those agents have gamed. Second, it collects data while incentivizing this action. Finally, it computes an estimate of $\hat{\omega}_{opt}$ from the collected data. We present the complete procedure in Algorithm 1.

Algorithm 1 Recovering the Causal Model
1:  Let $k_{1}=\lambda_{max}(G^{T}G)$ and $k_{2}=\|\Sigma\|^{2}$
2:  Let $\kappa_{min}=\lambda_{min}(\Sigma)$
3:  Choose an $\epsilon>0$
4:  Let $n_{1}=O(\max(\frac{dk_{1}}{\kappa_{min}},\frac{d^{2}k_{2}}{\kappa_{min}}))$
5:  Collect samples $x_{1},\ldots,x_{n_{1}}$
6:  Let $\hat{\mu}=\frac{1}{n_{1}}\sum x_{i}$
7:  Let $\hat{\Sigma}=\frac{1}{n_{1}}\sum x_{i}x_{i}^{T}$
8:  Let $n_{2}=O(\max(d^{2}\|\hat{\mu}\|^{2}\mathrm{tr}(\Sigma),\,d^{3}\|G\|^{2}\mathrm{tr}(\Sigma)))$
9:  for $i=1,\ldots,d$ do
10:     $\omega=e_{i}$
11:     Sample $x^{i}_{1},\ldots,x^{i}_{n_{2}}$ and subtract $\hat{\mu}$ from each one.
12:     Let $\hat{G}_{i}=\frac{1}{n_{2}}\sum_{j=1}^{n_{2}}x^{i}_{j}$
13:  end for
14:  Let $\hat{\omega}_{opt}=\operatorname*{arg\,max}_{\omega}\lambda_{min}(\hat{\Sigma}+2\hat{\mu}\omega^{T}\hat{G}^{T}+\hat{G}\omega\omega^{T}\hat{G}^{T})$
15:  Let $n_{3}=O(\frac{d}{\epsilon\kappa_{min}})$
16:  Sample $x_{1},\ldots,x_{n_{3}}$ with $\omega=\hat{\omega}_{opt}$.
17:  Return the output of OLS on $x_{1},\ldots,x_{n_{3}}$

The procedure in Algorithm 1 can be summarized as follows:

  1. Estimate the first and second moments of the distribution of agents’ features.

  2. Estimate the Gramian of the action matrix $G$.

  3. Compute the most informative choice of $\omega$.

  4. Collect samples under the most informative $\omega$ and then return the output of OLS.
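A compact end-to-end sketch of these four steps under $V=I$ is below. The instance is synthetic, the sample sizes are illustrative rather than the $n_{1},n_{2},n_{3}$ prescribed by the algorithm, and a random search over the unit sphere stands in for a proper solver of the eigenvalue optimization:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 3
# Hypothetical instance (V = I): true parameter, action matrix, feature mean.
omega_star = rng.normal(size=d)
omega_star /= np.linalg.norm(omega_star)
G = rng.normal(size=(d, d))
mu = rng.normal(size=d)

def agents(omega, n):
    """Post-gaming features x + G omega and outcomes omega*^T (x + G omega) + eta."""
    x = mu + rng.normal(size=(n, d))
    z = x + G @ omega
    return z, z @ omega_star + rng.normal(size=n)

# Step 1: moments of ungamed features (omega = 0).
z0, _ = agents(np.zeros(d), 5000)
mu_hat = z0.mean(axis=0)
Sigma_hat = z0.T @ z0 / len(z0)

# Step 2: recover G column by column using one-hot decision rules.
G_hat = np.column_stack([agents(np.eye(d)[i], 5000)[0].mean(axis=0) - mu_hat
                         for i in range(d)])

# Step 3: unit-norm omega maximizing the minimum eigenvalue of the estimated
# post-gaming second moment. We symmetrize: the paper's shorthand
# 2*mu*(G omega)^T has symmetric part mu (G omega)^T + (G omega) mu^T.
def min_eig(omega):
    Q = Sigma_hat + 2 * np.outer(mu_hat, G_hat @ omega) + np.outer(G_hat @ omega, G_hat @ omega)
    return np.linalg.eigvalsh((Q + Q.T) / 2).min()

cands = rng.normal(size=(2000, d))
cands /= np.linalg.norm(cands, axis=1, keepdims=True)
omega_opt = max(cands, key=min_eig)

# Step 4: OLS on samples gamed under omega_opt.
Z, y = agents(omega_opt, 20_000)
omega_ols = np.linalg.lstsq(Z, y, rcond=None)[0]
```

On this instance the OLS output lands close to $\omega^{*}$, as Theorem 3 predicts.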

Before we proceed to the proof of correctness of Algorithm 1, let us build some intuition for why this procedure of choosing a single $\omega$ and collecting samples under it makes sense. As we show later, the convergence of OLS for linear regression can be controlled by the minimum eigenvalue of the second moment matrix of the samples. Our algorithm finds the value of $\omega$ that, after agents game, maximizes this minimum eigenvalue in expectation. It turns out the minimum eigenvalue of the expected second moment matrix of post-gaming samples is concave with respect to the choice of $\omega$. The concavity of the objective suggests that, a priori, when choosing $\omega$s to obtain informative samples, the optimal strategy is to choose a single specific $\omega$.

The main difficulty in the rest of the algorithm is achieving the necessary precision in the estimation to be able to set up the above optimization problem to identify such an ω\omega.

Theorem 3. When $V=I$, the output $\omega$ of Algorithm 1 run with parameter $\epsilon$ satisfies $\|\omega-\omega^{*}\|\leq\epsilon$ with probability greater than $\frac{2}{3}$.

The proof of Theorem 3 relies on several lemmas. First, in Lemma 1 we bound the $L_{2}$ error of OLS as a function of the empirical second moment matrix. Note that the usual bound for the convergence of OLS is distribution-dependent; here we need a fixed-design version, where the expectation is taken over the noise alone.

Lemma 1.

Assume $V=I$. Consider samples $x_{1},\ldots,x_{n}$ and $y_{i}={\hat{\omega}_{opt}}^{T}x_{i}+\eta_{i}$. Let $\omega$ be the output of OLS on $(x_{i},y_{i})$. Then

$$\mathbb{E}_{\eta}\left[\|\omega-\hat{\omega}_{opt}\|^{2}\right]\leq\frac{d}{n\kappa_{min}}$$

The proof is elementary and a slight modification of the standard textbook proof (see, for example, (liangstat)).
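Lemma 1 can be checked empirically: fix a design matrix, regress repeatedly against fresh noise, and compare the average squared error to $d/(n\kappa_{min})$. The sizes below are synthetic, chosen only to make the comparison visible:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n, trials = 5, 2000, 300
X = rng.normal(size=(n, d))                        # fixed design matrix (rows x_i)
kappa_min = np.linalg.eigvalsh(X.T @ X / n).min()  # min eigenvalue of second moment
w_true = rng.normal(size=d)                        # plays the role of omega_opt-hat

errs = []
for _ in range(trials):
    y = X @ w_true + rng.normal(size=n)            # labels with 1-subgaussian noise
    w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    errs.append(np.sum((w_ols - w_true) ** 2))
mean_err = float(np.mean(errs))
bound = d / (n * kappa_min)
```

The averaged squared error stays below the bound, with slack because not all eigenvalues of the second moment matrix equal $\kappa_{min}$.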

The proof also requires that the optimization used to choose the optimal $\omega$ is a concave maximization.

Lemma 2.

The minimum eigenvalue of the following matrix is concave with respect to $\omega$ for any values of $x_{i},\hat{G}$:

$$\sum_{i}(x_{i}+\hat{G}\omega)(x_{i}+\hat{G}\omega)^{T}$$

Furthermore, when the following conditions hold, the minimum eigenvalue of $\mathbb{E}[(x+G\omega)(x+G\omega)^{T}]$ at the approximate optimizer is within an additive $O(\epsilon)$ of the optimal value:

  • $\|\hat{\Sigma}-\Sigma\|^{2}\leq\epsilon$

  • $\|\mu-\hat{\mu}\|^{2}\leq\frac{\lambda_{max}(G^{T}G)\epsilon}{d}$

  • $\|\hat{G}-G\|^{2}\leq\frac{\epsilon}{d\|\mu\|^{2}}$

  • $\|\hat{G}-G\|^{2}\leq\frac{\epsilon}{d^{2}\|G\|^{2}}$

Finally, the above holds even for an $\omega$ within distance $O(\frac{1}{poly(d)})$ of the optimum.

Finally, we use a minor lemma for the recovery of a random vector via the empirical mean estimator. Note that we treat the matrix $G$ as a vector.

Lemma 3.

Assume $V=I$. Let $g^{i}_{1},\ldots,g^{i}_{n}$ be drawn from the distribution $G_{i}+\xi$, and let $\hat{G}$ be the empirical mean estimator computed from the $g^{i}_{j}$’s. Let $\Sigma$ be the expected second moment matrix of the $\xi$’s. Then

$$\mathbb{E}_{\xi}\|G-\hat{G}\|^{2}\leq\frac{d^{2}\mathrm{tr}(\Sigma)}{n}$$
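A minimal simulation of this empirical-mean recovery, with $\mu=0$, identity noise covariance, and a synthetic $G$ (on this instance the bound is loose by roughly a factor of $d$):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 4, 10_000
G = rng.normal(size=(d, d))   # unknown action matrix (synthetic)
Sigma = np.eye(d)             # second moment of the zero-mean noise xi

G_hat = np.empty((d, d))
for i in range(d):
    # Round i: omega = e_i, so each (centered) sample is G e_i + xi.
    samples = G[:, i] + rng.normal(size=(n, d))
    G_hat[:, i] = samples.mean(axis=0)

sq_frob_err = float(np.sum((G_hat - G) ** 2))   # squared Frobenius norm
bound = d ** 2 * np.trace(Sigma) / n
```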

We proceed with the proof of Theorem 3 below.

Proof.

The first step of the algorithm recovers an estimate of $\Sigma$ and $\mu$. Note that $n_{1}$ samples suffice to recover $\hat{\Sigma}$ and $\hat{\mu}$ such that:

  • $\|\hat{\Sigma}-\Sigma\|^{2}\leq\epsilon$

  • $\|\mu-\hat{\mu}\|^{2}\leq\frac{\lambda_{max}(G^{T}G)\epsilon}{d}$

The for loop recovers an estimate of $G$. Via Lemma 3, $n_{2}$ samples suffice to ensure that the following two conditions hold:

  • $\|\hat{G}-G\|^{2}\leq\frac{\epsilon}{d\|\mu\|^{2}}$

  • $\|\hat{G}-G\|^{2}\leq\frac{\epsilon}{d^{2}\|G\|^{2}}$

Then the algorithm computes an estimate of the optimal $\omega$. Via Lemma 2, the minimum eigenvalue attained by this approximate solution is within $O(\epsilon)$ of the optimum.

This choice of $\omega$ guarantees that $n_{3}$ samples suffice to recover $\omega^{*}$ to squared $L^{2}$-distance $O(\epsilon)$ in expectation.

Finally, applying Markov’s inequality to these expectations ensures the algorithm succeeds with (arbitrarily high) constant probability. ∎

Now we prove the lemmas. We begin with Lemma 1. This proof is a slight modification of the textbook proof for the convergence of OLS.

Proof.

In this section we derive a bound on the convergence of the least squares estimator when a fixed design matrix $X$ is used. Note this is exactly the case we encounter, since the choice of $\omega$ lets us affect the entries of the design matrix. This is a standard textbook result and not a main contribution of the paper.

In order to state the result more formally we introduce some notation. The goal of the procedure is to recover $\hat{\omega}_{opt}$, when given tuples $(x_{i},{\hat{\omega}_{opt}}^{T}x_{i}+\eta_{i})$ where $\eta_{i}$ is 1-subgaussian. We aim to characterize $\|\omega-\hat{\omega}_{opt}\|$ where $\omega$ is obtained from ordinary least squares. Let $X$ be the matrix with the $x_{i}$’s as its rows. Let $\kappa_{min}$ be the minimum eigenvalue of $\frac{1}{n}X^{T}X$ (the second moment matrix).

Below all expectations are taken only over the random noise. We assume the second moment matrix is full rank.

$$\begin{aligned}
\mathbb{E}[\|\omega-\hat{\omega}_{opt}\|^{2}] &\leq \mathbb{E}\left[\frac{1}{n\kappa_{min}}(\omega-\hat{\omega}_{opt})^{T}X^{T}X(\omega-\hat{\omega}_{opt})\right]\\
&= \mathbb{E}\left[\frac{1}{n\kappa_{min}}\|X(\omega-\hat{\omega}_{opt})\|^{2}\right]\\
&= \frac{1}{n\kappa_{min}}\mathbb{E}\left[\|X(X^{T}X)^{-1}X^{T}(X\hat{\omega}_{opt}+\eta)-X\hat{\omega}_{opt}\|^{2}\right]\\
&= \frac{1}{n\kappa_{min}}\mathbb{E}\left[\|X(X^{T}X)^{-1}X^{T}\eta\|^{2}\right]\\
&\leq \frac{d}{n\kappa_{min}}
\end{aligned}$$

This motivates our procedure for parameter recovery: we choose $\omega$ in a fashion that attempts to maximize $\kappa_{min}$, since it is the minimum eigenvalue that determines the convergence rate. This is due to the fact that little variation along a dimension makes it hard to disentangle that feature’s effect on the outcome via $\hat{\omega}_{opt}$ from the constant-variance noise $\eta$. ∎

Lemma 2 is somewhat more involved. It is proven in three parts. The first is that the optimization problem is concave. The second is that approximate recovery of $\Sigma,\mu,$ and $G$ suffices for approximately maximizing the original expression. The third is that an approximate solution suffices.

Proof.

In this section we describe how to choose the value of $\omega$ that maximizes the value of $\kappa_{min}$ for the samples we obtain.

To do so, we examine the expectation of the second moment matrix and make several observations. Let $\Sigma$ denote the expected second moment matrix of $x$ (i.e. $\mathbb{E}[xx^{T}]$). We have:

$$\mathbb{E}[(x+G\omega)(x+G\omega)^{T}]=\Sigma+2\mu\omega^{T}G^{T}+G\omega\omega^{T}G^{T}$$
  1. The minimum eigenvalue of the above expression is concave with respect to $\omega$. This follows because $x+G\omega$ is a linear operator in $\omega$, the minimum eigenvalue of a Gramian matrix $X^{T}X$ is concave with respect to $X$, and the expectation of a concave function is concave (boyd2004convex).

  2. Since the agent attempts to maximize their motion in the $\omega$ direction, we want to ensure that we move them toward the direction that maximizes the minimum eigenvalue of $\mathbb{E}[(x+G\omega)(x+G\omega)^{T}]$.

However, we do not operate with exact knowledge of $G$, $\Sigma$, and $\mu$. It turns out that even approximately solving this optimization problem with estimates of $G,\Sigma,\mu$ suffices for our purposes, as long as the $\omega$ we obtain from our optimization (using the estimates) results in a high value for the minimum eigenvalue of $\mathbb{E}[(x+G\omega)(x+G\omega)^{T}]$. Let $\hat{\omega}$ be the maximizing argument for the estimated optimization problem and let $\omega$ be the maximizing argument for the original optimization problem. Let $Q$ be the true maximized second moment matrix including gaming, and let $\hat{Q}$ be the maximizing second moment matrix with gaming that results from replacing the true $\Sigma,\mu,G$ with the estimates. In formal terms, we need to show that the minimum eigenvalue of $\mathbb{E}[(x+G\hat{\omega})(x+G\hat{\omega})^{T}]$ is large. We note that when $y^{T}\hat{Q}y$ is within $\epsilon$ of $y^{T}Qy$ for all $y$ in the unit ball, the minimum eigenvalues may differ by at most $\epsilon$.

$$\begin{aligned}
\|y^{T}\hat{Q}y-y^{T}Qy\|^{2} &= \|y^{T}(\hat{Q}-Q)y\|^{2}\\
&\leq \lambda_{max}^{2}(\hat{Q}-Q)\,\|y\|^{4}\\
&\leq \|\hat{Q}-Q\|^{2}
\end{aligned}$$
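The eigenvalue-perturbation step here is an instance of Weyl’s inequality: the minimum eigenvalue moves by at most the spectral norm (hence at most the Frobenius norm) of the perturbation. A randomized check on synthetic matrices:

```python
import numpy as np

rng = np.random.default_rng(6)
violations = []
for _ in range(500):
    d = int(rng.integers(2, 8))
    A = rng.normal(size=(d, d))
    Q = A @ A.T                        # a PSD "true" second moment matrix
    E = rng.normal(size=(d, d)) * 0.1
    E = (E + E.T) / 2                  # symmetric perturbation Q_hat - Q
    lam = np.linalg.eigvalsh(Q).min()
    lam_hat = np.linalg.eigvalsh(Q + E).min()
    # Weyl: |lambda_min(Q+E) - lambda_min(Q)| <= ||E||_2 (spectral norm).
    violations.append(abs(lam_hat - lam) - np.linalg.norm(E, 2))
max_violation = max(violations)
```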

And now we bound $\|\hat{Q}-Q\|^{2}$ assuming the following:

  1. $\|\hat{\Sigma}-\Sigma\|^{2}\leq\epsilon$

  2. $\|\mu-\hat{\mu}\|^{2}\leq\frac{\lambda_{max}(G^{T}G)\epsilon}{d}$

  3. $\|\hat{G}-G\|^{2}\leq\frac{\epsilon}{d\|\mu\|^{2}}$

  4. $\|\hat{G}-G\|^{2}\leq\frac{\epsilon}{d^{2}\|G\|^{2}}$

We work out the bound below.

$$\begin{aligned}
\|\hat{Q}-Q\|^{2} &= \|\Sigma+2\mu{\hat{\omega}_{opt}}^{T}G^{T}+G\hat{\omega}_{opt}{\hat{\omega}_{opt}}^{T}G^{T}-(\hat{\Sigma}+2\hat{\mu}{\hat{\omega}_{opt}}^{T}\hat{G}^{T}+\hat{G}\hat{\omega}_{opt}{\hat{\omega}_{opt}}^{T}\hat{G}^{T})\|^{2}\\
&\leq \|\Sigma-\hat{\Sigma}\|^{2}+2\|\mu{\hat{\omega}_{opt}}^{T}G^{T}-\hat{\mu}{\hat{\omega}_{opt}}^{T}\hat{G}^{T}\|^{2}+\|\ldots\|^{2}\\
&\leq \epsilon+2\|\mu{\hat{\omega}_{opt}}^{T}G^{T}+\hat{\mu}{\hat{\omega}_{opt}}^{T}G^{T}-\hat{\mu}{\hat{\omega}_{opt}}^{T}G^{T}-\hat{\mu}{\hat{\omega}_{opt}}^{T}\hat{G}^{T}\|^{2}+\|\ldots\|^{2}\\
&\leq \epsilon+d\|\mu-\hat{\mu}\|^{2}\|{\hat{\omega}_{opt}}^{T}G\|^{2}+\|\hat{G}-G\|^{2}\|\hat{\mu}{\hat{\omega}_{opt}}^{T}\|^{2}+\ldots\\
&\leq \epsilon+\epsilon+\|\hat{G}-G\|^{2}\|\hat{\mu}{\hat{\omega}_{opt}}^{T}+\mu{\hat{\omega}_{opt}}^{T}-\mu{\hat{\omega}_{opt}}^{T}\|^{2}+\ldots\\
&\leq \epsilon+\epsilon+\|\hat{G}-G\|^{2}\left(\|\hat{\mu}-\mu\|^{2}+\|\mu{\hat{\omega}_{opt}}^{T}\|\right)+\ldots\\
&\leq \epsilon+\epsilon+\|\hat{G}-G\|^{2}\,d\|\mu\|^{2}+\ldots\\
&\leq 3\epsilon+\|\hat{G}\hat{\omega}_{opt}{\hat{\omega}_{opt}}^{T}\hat{G}^{T}-G\hat{\omega}_{opt}{\hat{\omega}_{opt}}^{T}G^{T}\|^{2}\\
&\leq 3\epsilon+\|\hat{G}\hat{\omega}_{opt}{\hat{\omega}_{opt}}^{T}\hat{G}^{T}-\hat{G}\hat{\omega}_{opt}{\hat{\omega}_{opt}}^{T}G^{T}+\hat{G}\hat{\omega}_{opt}{\hat{\omega}_{opt}}^{T}G^{T}-G\hat{\omega}_{opt}{\hat{\omega}_{opt}}^{T}G^{T}\|^{2}\\
&\leq 3\epsilon+\|(\hat{G}-G)\hat{\omega}_{opt}{\hat{\omega}_{opt}}^{T}\hat{G}^{T}\|^{2}+\|(\hat{G}-G)\hat{\omega}_{opt}{\hat{\omega}_{opt}}^{T}G^{T}\|^{2}\\
&\leq 3\epsilon+\|(\hat{G}-G)\hat{\omega}_{opt}{\hat{\omega}_{opt}}^{T}\hat{G}^{T}-(\hat{G}-G)\hat{\omega}_{opt}{\hat{\omega}_{opt}}^{T}G^{T}+(\hat{G}-G)\hat{\omega}_{opt}{\hat{\omega}_{opt}}^{T}G^{T}\|^{2}+d^{2}\|\hat{G}-G\|^{2}\|G\|^{2}\\
&\leq 4\epsilon+\|(\hat{G}-G)\hat{\omega}_{opt}{\hat{\omega}_{opt}}^{T}(\hat{G}-G)^{T}\|^{2}+\|(\hat{G}-G)\hat{\omega}_{opt}{\hat{\omega}_{opt}}^{T}G^{T}\|^{2}\\
&\leq 5\epsilon+d^{2}\|\hat{G}-G\|^{4}\\
&\leq 6\epsilon
\end{aligned}$$

This means that if we find an $\epsilon$-approximate solution to the system with the estimated values, we obtain a $\kappa_{min}$ within $6\epsilon$ of the optimal. ∎

Finally, we present the proof of Lemma 3:

Proof.

Recall that when the decision-maker fixes $\omega$, it receives samples of the form $x+G\omega$. We note this can be used to recover the matrix $G$. In particular, we show how $d$ rounds, each with $O(\frac{d\,\mathrm{tr}(\Sigma)}{\epsilon})$ samples, suffice to recover the matrix to squared Frobenius norm $\epsilon$. Recall that the procedure we propose simply chooses $\omega=e_{1},\ldots,e_{d}$, the one-hot coordinate vectors, one per round. We first bound the error in $\hat{G}$ coordinate-wise: $\mathbb{E}[\|\hat{G}_{i,j}-G_{i,j}\|^{2}]\leq\frac{\mathbb{E}[x_{i}^{2}]}{n}$. Summing across coordinates shows that $O(\frac{d^{2}\mathrm{tr}(\Sigma)}{\epsilon})$ samples suffice to recover $G$ within squared Frobenius norm $\epsilon$. ∎