
Smoothed Analysis of the Komlós Conjecture

Nikhil Bansal University of Michigan, Ann Arbor. bansal@gmail.com.    Haotian Jiang University of Washington, Seattle. jhtdavid@cs.washington.edu.    Raghu Meka University of California, Los Angeles. raghum@cs.ucla.edu.    Sahil Singla Georgia Institute of Technology, Atlanta. ssingla@gatech.edu.    Makrand Sinha Simons Institute and University of California, Berkeley. makrand@berkeley.edu.
Abstract

The well-known Komlós conjecture states that given $n$ vectors in $\mathbb{R}^{d}$ with Euclidean norm at most one, there always exists a $\pm 1$ coloring such that the $\ell_{\infty}$ norm of the signed-sum vector is a constant independent of $n$ and $d$. We prove this conjecture in a smoothed analysis setting where the vectors are perturbed by adding small Gaussian noise and the number of vectors is $n=\omega(d\log d)$. The dependence of $n$ on $d$ is the best possible even in a completely random setting.

Our proof relies on a weighted second moment method, where instead of considering uniformly random colorings we apply the second moment method to an implicit distribution on colorings obtained by applying the Gram-Schmidt walk algorithm to a suitable set of vectors. The main technical idea is to use various properties of these colorings, including subgaussianity, to control the second moment.

1 Introduction

A central question in discrepancy theory is the following Komlós problem: given vectors $v_{1},\ldots,v_{n}\in\mathbb{R}^{d}$ with Euclidean length at most $1$, i.e., $\|v_{i}\|_{2}\leq 1$ for all $i\in[n]$, find signs $x_{i}\in\{-1,1\}$ for $i\in[n]$ to minimize the discrepancy $\|\sum_{i=1}^{n}x_{i}v_{i}\|_{\infty}$. The long-standing Komlós conjecture says that the discrepancy of any collection of such vectors is $O(1)$, independent of $n$ and $d$. An important special case (up to scaling by $t^{1/2}$) is the Beck-Fiala problem, where the vectors $v_{1},\ldots,v_{n}\in\{0,1\}^{d}$ and each $v_{i}$ has at most $t$ ones, so $\|v_{i}\|_{2}\leq t^{1/2}$. Here, the Komlós conjecture reduces to the Beck-Fiala conjecture [BF81], which says that the discrepancy is $O(t^{1/2})$. The question of either proving or disproving these conjectures has received a lot of attention, and after a long line of work, the current best bounds for the Komlós and the Beck-Fiala problems are $O((\log n)^{1/2})$ and $O((t\log n)^{1/2})$ respectively, due to Banaszczyk [Ban98].

Motivated by the lack of progress on general worst-case instances, there has been a lot of recent work on these problems for random instances, with several interesting results and techniques; see e.g., [EL19, BM19, HR19, FS20, Pot20, TMR20, AN21, MMPPG21]. In this work, we consider the Komlós problem in the more general setting of smoothed analysis, where the input is generated by taking an arbitrary worst-case Komlós instance and perturbing it randomly. The smoothed analysis model was first introduced by Spielman and Teng [ST04]; it interpolates nicely between worst-case and average-case analysis, and has been used extensively since then to study various problems. Recently, smoothed analysis models have also been considered in discrepancy theory in a few other works [BJM+22, HRS21]; however, the setting and focus of these results are quite different, and in particular they are not directly related to the Komlós or Beck-Fiala conjectures.

Random instances. To put our results in the proper context, we first describe the results on random instances. In general, these results depend on the different regimes of the parameters $d,n$ and $t$, and we focus here on the more interesting case of $n\gg d$.

A natural model for random Beck-Fiala instances is where each entry is $1$ with probability $p=t/d$, so that each column has $t$ ones in expectation. In a surprising result, Hoberg and Rothvoss [HR19] showed that $\mathrm{disc}(A)\leq 1$ w.h.p. if $n=\Omega(d^{2}\log d)$ (this is much better than the $O(t^{1/2})$ bound in the Beck-Fiala conjecture). Independently, Franks and Saks [FS20] showed that $\mathrm{disc}(A)\leq 2$ w.h.p. if $n=\Omega(d^{3}\log^{2}d)$, for a more general class of instances. Both these results use interesting Fourier analysis based techniques.

It is not hard to see that $n=\Omega(d\log d)$ is necessary for $O(1)$ discrepancy (provided $p$ is not too small): if we fix any coloring $x$ and consider a random instance, a fixed row has discrepancy $O(1)$ with probability $\approx(pn)^{-1/2}$, so the probability that every row has discrepancy $O(1)$ is $(pn)^{-\Omega(d)}$; as there are (only) $2^{n}$ possible colorings, a first moment argument already requires that $2^{n}(pn)^{-d}=\Omega(1)$. An important step towards obtaining this optimal dependence was made by Potukuchi [Pot18], who showed that $\mathrm{disc}(A)\leq 1$ if $n=\Omega(d\log d)$ for the dense case of $p=1/2$, using the second moment method. However, the sparse setting with $p\ll 1$ turns out to be more subtle, and was only recently resolved by Altschuler and Niles-Weed [AN21] using a more sophisticated conditional second moment method together with Stein's method of exchangeable pairs. They show that $\mathrm{disc}(A)\leq 1$ w.h.p. for $n=\Omega(d\log d)$, for every $p$.

The case of Gaussian matrices with i.i.d. $\mathcal{N}(0,1)$ entries has also been considered, where Turner, Meka and Rigollet [TMR20] give almost tight bounds for the entire regime, and in particular show that for $n=\Omega(d\log d)$ a discrepancy bound of $1/\mathrm{poly}(d)$ holds.

The smoothed Komlós model. We now define our model formally. The input matrix is of the form $A=M+R$, where $M\in\mathbb{R}^{d\times n}$ is some worst-case matrix with columns of $\ell_{2}$-norm at most $1$ and $R\in\mathbb{R}^{d\times n}$ is a random matrix with i.i.d. Gaussian entries distributed as $\mathcal{N}(0,\sigma^{2}/d)$, where $\sigma\leq 1$. The $\sigma^{2}/d$ variance ensures that each column of $R$ has $\ell_{2}$-norm roughly $\sigma$ (and hence much less than that of $M$). Our goal is to understand the discrepancy of $A$. We will only be interested in showing the existence of a low discrepancy coloring for $A$, and not in algorithmically finding it (this seems far beyond the current techniques).
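To make the model concrete, the following short Python sketch generates a smoothed instance $A=M+R$ and evaluates the discrepancy of a given coloring. It is purely illustrative: the toy matrix $M$, the parameter values, and the use of a uniformly random coloring are our own choices and not part of the analysis in this paper.

```python
import numpy as np

def smoothed_instance(M, sigma, rng):
    """Return A = M + R, where R has i.i.d. N(0, sigma^2/d) entries.

    M is a d x n worst-case Komlos instance (columns of l2-norm at most 1);
    the variance sigma^2/d keeps each column of R at l2-norm roughly sigma.
    """
    d, _ = M.shape
    R = rng.normal(scale=sigma / np.sqrt(d), size=M.shape)
    return M + R

def discrepancy(A, x):
    """l_infinity norm of the signed sum A @ x for a coloring x in {-1, +1}^n."""
    return np.abs(A @ x).max()

rng = np.random.default_rng(0)
d, n, sigma = 10, 2000, 0.5
M = np.zeros((d, n)); M[0, :] = 1.0          # toy worst-case part: every column is e_1
A = smoothed_instance(M, sigma, rng)
x = rng.choice([-1, 1], size=n)              # a uniformly random coloring (not the GS walk)
print(discrepancy(M, x), discrepancy(A, x))  # both are large for a random coloring
```

The main result (Theorem 1.1 below) asserts that for $n=\omega(d\log d)/\sigma^{4/3}$ some coloring drives this quantity down to $1/\mathrm{poly}(d)$; the point is the existence of such a coloring, not that a random one achieves it.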

1.1 Results and Techniques

Our main result is the following.

Theorem 1.1 (Smoothed Komlós).

Let $\sigma>0$ and $n=\frac{\omega(d\log d)}{\sigma^{4/3}}$. Then with probability $1-o_{d}(1)$, the discrepancy of $M+R$ is at most $1/\mathrm{poly}(d)$, where $M\in\mathbb{R}^{d\times n}$ is an arbitrary Komlós instance and $R\in\mathbb{R}^{d\times n}$ has i.i.d. $\mathcal{N}(0,\sigma^{2}/d)$ entries.

An interpretation of Theorem 1.1 is that any counter-example to the Komlós conjecture (if it exists) must be rigid, or have $n\approx d$. Also, notice that the dependence of $n$ on $d$ in Theorem 1.1 is essentially the best possible, as already evident in the very special case of $M=\mathbf{0}$ and $\sigma=1$, i.e., a random matrix with i.i.d. $\mathcal{N}(0,1)$ entries, where $n=\Omega(d\log d)$ is necessary to achieve $1/\mathrm{poly}(d)$ discrepancy, as discussed earlier.

Remark 1.2.

Our proof techniques also give a high probability bound when $n=\Omega_{\sigma}(d^{1+\epsilon})$ for any constant $\epsilon>0$. However, we do not explore this direction here. It would be interesting to know if the result also holds with high probability when $n=\omega(d\log d)$, as in the (fully) random setting.

Remark 1.3.

The dependence on the noise parameter $\sigma$ in $n=\Omega_{d}(1/\sigma)$ in Theorem 1.1 is necessary, as otherwise this would imply an $O(1)$ bound for the worst-case Komlós problem. In particular, each row of the random part $R$ must have enough $\ell_{1}$ norm to offset the discrepancy from the worst-case part $M$ (which can be $O((\log n)^{1/2})$ given the currently known results). As each entry of $R$ has magnitude about $\sigma d^{-1/2}$, we thus require $n=\Omega_{d}(1/\sigma)$ for each row of $M+R$ to have discrepancy $O(1)$.

The proof of Theorem 1.1 is based on the classical second moment method; however, it requires several additional ideas beyond those used for random instances to handle the effect of the worst-case part and its interplay with the random part. We describe these briefly next, and discuss them in more detail in Section 1.2.

  • Weighted second moment method. Instead of applying the second moment method to the uniform distribution on the $2^{n}$ colorings, we consider a distribution on low-discrepancy colorings for $M$. This is necessary as for a random coloring $x\in\{-1,1\}^{n}$, a typical entry of $Mx$ will scale as $\sqrt{n/d}$, which is very unlikely to be cancelled by the discrepancy of the random part $Rx$, which typically scales as $\sigma\sqrt{n/d}$ (note that we want to show the existence of some $x$ such that $(Rx)_{i}\approx-(Mx)_{i}$ for each coordinate $i\in[d]$).

  • Subgaussianity of colorings. To ensure that $\|Mx\|_{\infty}$ is typically small, we consider the (implicit) distribution on colorings produced by the Gram-Schmidt (GS) algorithm [BDGL19] applied to $M$, which ensures that $Mx$ is a $1$-subgaussian vector [HSSZ19] (details in Section 1.2).

    However, a priori the GS algorithm only guarantees that $Mx$ is subgaussian, and says nothing about the distribution on the colorings $x$ themselves. For instance, it could be that any two colorings in the support have their first $9n/10$ coordinates identical, and thus look very non-random. This makes the second moment bounds much worse and harder to control.

    To handle this, we use a simple but useful trick to ensure that the distribution on the colorings $x$ produced by the GS algorithm is also $O(1)$-subgaussian. Roughly, this allows us to pretend that colorings $x$ in the GS distribution behave randomly.

  • Exploiting subgaussianity to get cancellations across rows. Most importantly, due to the worst-case part $M$, a row-by-row analysis, as is typically done in second moment computations for random instances, only works when $n=\Omega(d^{2}/\sigma^{2})$ (details in Section 1.2). Roughly, the problem is that considering each coordinate of $Mx$ separately completely ignores the global properties across the different coordinates that subgaussianity of $Mx$ implies.

    To get the optimal dependence of $n$ on $d$, a key conceptual idea is to analyze all the rows together and use the subgaussianity of $Mx$ and $x$ carefully to get various cancellations across the different rows in the second moment computation. Exploiting subgaussianity also leads to various technical difficulties, as subgaussian vectors can differ from fully random Gaussian vectors in various non-trivial ways.

Notation. Throughout this paper, $\log$ denotes the natural logarithm. We use the asymptotic notation $\omega(\cdot)$ or $o(\cdot)$ where the growth is always with respect to $d$; sometimes, to emphasize this dependence, we will also write $\omega_{d}(\cdot)$ or $o_{d}(\cdot)$. We write $\mathbb{E}_{x\sim\mathcal{G}}[f(x)]$ to denote the expectation of a function $f$ where $x$ is sampled from the distribution $\mathcal{G}$, and we abbreviate this to $\mathbb{E}[f(x)]$ when the distribution is clear from the context. For reals $a,b\in\mathbb{R}$, the notation $[a\pm b]$ is used as a shorthand for the interval $[a-b,a+b]$. For a set $S\subseteq\mathbb{R}^{d}$, we write $\delta S=\{\delta x\mid x\in S\}$ to denote the $\delta$ scaling of $S$.

1.2 Overview and Preliminaries

We now give a more detailed overview of the proof and the ideas. We also briefly describe the second moment method and some concepts we need such as subgaussianity and properties of the Gram-Schmidt algorithm.

Second moment method. The second moment method (e.g., [AS16]) is based on the following Paley-Zygmund inequality. For any non-negative random variable $Z$, we have that

\[\mathbb{P}[Z>0]\geq\frac{(\mathbb{E}[Z])^{2}}{\mathbb{E}[Z^{2}]}.\]

So, if $\mathbb{E}[Z^{2}]=(1+o(1))(\mathbb{E}[Z])^{2}$, then this implies that $\mathbb{P}[Z>0]\geq 1-o(1)$.
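For completeness, the inequality follows from a single application of Cauchy-Schwarz: writing $Z=Z\cdot\mathbf{1}\{Z>0\}$,
\[\mathbb{E}[Z]=\mathbb{E}\big[Z\cdot\mathbf{1}\{Z>0\}\big]\leq\big(\mathbb{E}[Z^{2}]\big)^{1/2}\big(\mathbb{P}[Z>0]\big)^{1/2},\]
and squaring and rearranging gives the stated bound.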

For constraint satisfaction problems, a standard way to use this to show that most random instances are feasible is by defining $S=S(R)$ as the number of solutions to an instance $R$, and showing that

\[\mathbb{E}_{R}[S^{2}]=(1+o(1))(\mathbb{E}_{R}[S])^{2},\tag{1}\]

which gives that $\mathbb{P}_{R}[S(R)>0]=\mathbb{P}_{R}[S(R)\geq 1]\geq 1-o(1)$.

Let us consider (1) more closely and define $S(R,x)=1$ if $x$ is a valid solution for instance $R$, and $S(R,x)=0$ otherwise. Then $S(R)=\sum_{x}S(R,x)$ and (1) can be written as

\[\mathbb{E}_{R}\left[\mathbb{P}_{x,y\sim\mathcal{U}}\big[S(R,x)=1,\,S(R,y)=1\big]\right]=(1+o(1))\left(\mathbb{E}_{R}\left[\mathbb{P}_{x\sim\mathcal{U}}\left[S(R,x)=1\right]\right]\right)^{2},\tag{2}\]

where $\mathcal{U}$ is the uniform distribution over all candidate solutions $x$ (in our setting, over all colorings).

Second moment method for smoothed Komlós. Let $\Delta$ denote the desired discrepancy bound. In our setting, we let $S(R,x)=1$ if $x\in\{\pm 1\}^{n}$ is a feasible coloring for the smoothed Komlós instance $M+R$, that is, if $\|(M+R)x\|_{\infty}\leq\Delta$, and $S(R,x)=0$ otherwise. Roughly, this condition means that $Rx\approx-Mx$ and hence the discrepancy of the random part $R$ cancels that of the worst-case part $M$.

However, if $x$ is chosen uniformly from $\{\pm 1\}^{n}$, it is not hard to see that this cannot work. The entries $(Mx)_{i}$ will be distributed roughly as $\mathcal{N}(0,m_{i}^{2})$ where $m_{i}=(\sum_{j}M_{ij}^{2})^{1/2}$ is the $\ell_{2}$-norm of row $i$ of $M$, and in general will be much larger (around $1/\sigma\gg 1$ times) than the entries $(Rx)_{i}$.

Weighted second moment. To allow a reasonable probability of $Rx$ cancelling $Mx$, a natural idea is to consider a distribution that is mostly supported on colorings $x$ with low discrepancy on $M$. So, we will show (2) where $x,y$ are sampled from a suitable distribution $\mathcal{G}$ instead of the uniform distribution $\mathcal{U}$. Similar ideas have also been used in other contexts, such as [AP04]. Notice that this does not affect $Rx$: for any fixed $x$, the contribution of the random part $(Rx)_{i}$ is still distributed as $\mathcal{N}(0,n\sigma^{2}/d)$ (over the randomness of $R$).

A natural candidate is the distribution on colorings produced by the Gram-Schmidt (GS) walk algorithm [BDGL19]. In particular, we use the following result.

Theorem 1.4 ([HSSZ19]).

Given vectors $v_{1},\dots,v_{n}\in\mathbb{R}^{m}$ with $\|v_{j}\|_{2}\leq 1$, the Gram-Schmidt walk algorithm outputs a random coloring $x\in\{-1,1\}^{n}$ such that $\sum_{j=1}^{n}x_{j}v_{j}$ is $1$-subgaussian.

Recall that a random vector $Y\in\mathbb{R}^{m}$ is $\alpha$-subgaussian if for all test vectors $\theta\in\mathbb{R}^{m}$,

\[\mathbb{E}\big[\exp(\langle\theta,Y\rangle)\big]\leq\exp\bigg(\frac{\alpha^{2}\|\theta\|_{2}^{2}}{2}\bigg).\]

Roughly, this means that $Y$ looks like a Gaussian with variance at most $\alpha^{2}$ in every direction.
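Two standard examples may help calibrate this definition (both are classical facts). A standard Gaussian $Y\sim\mathcal{N}(0,I_{m})$ satisfies $\mathbb{E}[\exp(\langle\theta,Y\rangle)]=\exp(\|\theta\|_{2}^{2}/2)$ and is thus $1$-subgaussian, while a uniformly random coloring $x\in\{-1,1\}^{n}$ satisfies
\[\mathbb{E}[\exp(\langle\theta,x\rangle)]=\prod_{j=1}^{n}\cosh(\theta_{j})\leq\prod_{j=1}^{n}e^{\theta_{j}^{2}/2}=\exp(\|\theta\|_{2}^{2}/2),\]
so it is $1$-subgaussian as well; the point of Theorem 1.4 is that the correlated colorings produced by the GS walk retain this property for the signed sum $\sum_{j}x_{j}v_{j}$.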

Let $\mathcal{G}$ denote the (implicit) distribution over the colorings output by the GS walk algorithm. For a coloring $x$, let us denote $P_{x}:=\mathbb{P}_{R}[S(R,x)=1]$ and for two colorings $x$ and $y$, let

\[P_{x,y}:=\mathbb{P}_{R}\big[S(R,x)=1,\,S(R,y)=1\big].\]

Then changing the order of expectation in (2) and substituting, our goal is to show that

\[\mathbb{E}_{x,y\sim\mathcal{G}}[P_{x,y}]=(1+o(1))\cdot\mathbb{E}_{x,y\sim\mathcal{G}}[P_{x}P_{y}].\tag{3}\]

However, the set of low discrepancy colorings for $M$ and the distribution $\mathcal{G}$ can be quite complicated and hard to work with. Later, we will ensure that $\mathcal{G}$ is also $O(1)$-subgaussian, which will suffice for our purposes. Let us first consider (3) more closely.

The key computation. As $R$ has i.i.d. Gaussian entries, the quantities $P_{x}$ and $P_{x,y}$ can be written in a very clean way. In particular, as $(Rx)_{i}\sim\mathcal{N}(0,\sigma^{2}n/d)$ for any coloring $x$, and the $(Rx)_{i}$ are independent for $i\in[d]$, we can write

\[P_{x}=\mathbb{P}_{R}\bigg[\bigcap_{i=1}^{d}\big((Rx)_{i}\in(Mx)_{i}\pm\Delta\big)\bigg]=\prod_{i=1}^{d}\mathbb{P}\big[g_{i}\in(Mx)_{i}\pm\Delta\big],\]

where $g_{i}\sim\mathcal{N}(0,\sigma^{2}n/d)$ and the $g_{i}$'s are independent.

Similarly, for any fixed colorings $x$ and $y$, writing $g_{i}=(Rx)_{i}$ and $g_{i}^{\prime}=(Ry)_{i}$, we have

\[P_{x,y}=\prod_{i=1}^{d}\mathbb{P}\big[g_{i}\in(Mx)_{i}\pm\Delta,\,g^{\prime}_{i}\in(My)_{i}\pm\Delta\big],\]

where $g_{i}$ and $g^{\prime}_{i}$ are correlated with $\mathbb{E}[g_{i}g_{i}^{\prime}]=\langle x,y\rangle\cdot\sigma^{2}/d$.

A standard computation of $2$-dimensional Gaussian probabilities over rectangles (and ignoring some less crucial terms for the discussion here) gives

\[\frac{\mathbb{P}\big[g_{i}\in(Mx)_{i}\pm\Delta,\,g_{i}^{\prime}\in(My)_{i}\pm\Delta\big]}{\mathbb{P}\big[g_{i}\in(Mx)_{i}\pm\Delta\big]\cdot\mathbb{P}\big[g_{i}^{\prime}\in(My)_{i}\pm\Delta\big]}\approx\exp\bigg(\frac{d\langle x,y\rangle(Mx)_{i}(My)_{i}}{\sigma^{2}n^{2}}\bigg).\tag{4}\]

So to prove (3), we could try to show that for each $i\in[d]$,

\[\mathbb{E}_{x,y\sim\mathcal{G}}\bigg[\frac{d\langle x,y\rangle(Mx)_{i}(My)_{i}}{\sigma^{2}n^{2}}\bigg]=o\bigg(\frac{1}{d}\bigg).\tag{5}\]

Indeed, as $|\langle x,y\rangle|\leq n$ and $(Mx)_{i},(My)_{i}$ are typically $O(1)$ (as $Mx$ and $My$ are subgaussian), setting $n=\omega(d^{2}/\sigma^{2})$ would suffice to complete the second moment proof. However, this does not give us the optimal $d\log d$ dependence.
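To make the row-by-row bound explicit: with $|\langle x,y\rangle|\leq n$ and $|(Mx)_{i}(My)_{i}|=O(1)$, the exponent in (4) is $O(d/(\sigma^{2}n))$ for each row, so multiplying over the $d$ rows gives
\[\prod_{i=1}^{d}\exp\bigg(\frac{d\langle x,y\rangle(Mx)_{i}(My)_{i}}{\sigma^{2}n^{2}}\bigg)\leq\exp\bigg(O\bigg(\frac{d^{2}}{\sigma^{2}n}\bigg)\bigg)=1+o(1)\quad\text{when }n=\omega(d^{2}/\sigma^{2}).\]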

Next, we sketch the two ideas to obtain the optimal dependence.

Subgaussianity of the distribution $\mathcal{G}$. If $x$ and $y$ were random colorings, we would typically expect that $|\langle x,y\rangle|\approx\sqrt{n}$ instead of $n$ above. To achieve this, we apply the GS walk algorithm to the $(d+n)\times n$ matrix with $M$ in the top $d$ rows and $I_{n}$ in the bottom $n$ rows. (Note that each column still has $O(1)$ length.) This ensures that the resulting distribution $\mathcal{G}$ on the colorings $x$ is $O(1)$-subgaussian, while ensuring that $Mx$ is also $O(1)$-subgaussian.

Handling the rows together. Next, to exploit the subgaussianity of $Mx$ and $My$, we look at all the rows together in (5) and consider

\[\sum_{i}\mathbb{E}_{x,y\sim\mathcal{G}}\bigg[\frac{d\langle x,y\rangle(Mx)_{i}(My)_{i}}{\sigma^{2}n^{2}}\bigg]=\mathbb{E}_{x,y\sim\mathcal{G}}\bigg[\frac{d\langle x,y\rangle\langle Mx,My\rangle}{\sigma^{2}n^{2}}\bigg].\tag{6}\]

By the subgaussianity of the colorings $x,y$ and the discrepancy vectors $Mx,My$, we expect that $\mathbb{E}_{x,y\sim\mathcal{G}}|\langle x,y\rangle|\approx\sqrt{n}$ and $\mathbb{E}_{x,y}|\langle Mx,My\rangle|\approx\sqrt{d}$. Roughly speaking, this implies that the right side of (6) is typically $d^{3/2}/(\sigma^{2}n^{3/2})$, and hence $n\gg d/\sigma^{4/3}$ suffices.
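Spelling out the arithmetic, substituting these typical magnitudes into (6) gives
\[\frac{d\cdot|\langle x,y\rangle|\cdot|\langle Mx,My\rangle|}{\sigma^{2}n^{2}}\approx\frac{d\cdot\sqrt{n}\cdot\sqrt{d}}{\sigma^{2}n^{2}}=\frac{d^{3/2}}{\sigma^{2}n^{3/2}},\]
which is $o(1)$ exactly when $n^{3/2}\gg d^{3/2}/\sigma^{2}$, i.e., when $n\gg d/\sigma^{4/3}$.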

The formal argument needs some more care as $\langle x,y\rangle$ and $\langle Mx,My\rangle$ are correlated, and as we need to bound the exponential moment of $d\langle x,y\rangle\langle Mx,My\rangle/(\sigma^{2}n^{2})$ appearing in (4), rather than just its expectation; this is what gives the additional (necessary) logarithmic factor of $\log d$.

2 Proof of the Smoothed Komlós Conjecture

We use a weighted version of the second moment method as mentioned in the proof overview. Let $\mathcal{G}$ be a distribution over colorings that will be specified later. We define the following random variable $S$, which depends only on the randomness of $R$:

\[S=S(R):=\mathbb{E}_{x\sim\mathcal{G}}[\mathbf{1}\{\|(M+R)x\|_{\infty}\leq\Delta\}],\]

for some parameter $\Delta=1/\mathrm{poly}(d)$ to be chosen later. The purpose of this variable is that the event $\{S>0\}$ implies that there exists a coloring $x\in\mathsf{supp}(\mathcal{G})$ with discrepancy at most $\Delta$. Our goal is to show that $\mathbb{P}(S>0)=1-o(1)$. As explained in the proof overview, this would follow from the Paley-Zygmund inequality if we can establish that the first moment $\mathbb{E}_{R}[S]$ is always positive, and the second moment satisfies $\mathbb{E}_{R}[S^{2}]=(1+o(1))\cdot(\mathbb{E}_{R}[S])^{2}$. We next compute the moments.

First moment computation. We can compute

\[\mathbb{E}_{R}[S]=\mathbb{E}_{x\sim\mathcal{G}}\mathbb{E}_{R}[\mathbf{1}\{\|(M+R)x\|_{\infty}\leq\Delta\}]>0,\]

where the strict inequality follows because, fixing any outcome $x\sim\mathcal{G}$, the event $\{\|(M+R)x\|_{\infty}\leq\Delta\}$ happens with positive probability (recall that $R$ is a Gaussian random matrix with each entry distributed as $\mathcal{N}(0,\sigma^{2}/d)$).

Second moment computation. For any $i\in[d]$, denote by $m_{i}$ and $r_{i}$ the $i^{\text{th}}$ row of the matrices $M$ and $R$ respectively. The second moment is given by

\begin{align*}
\mathbb{E}_{R}[S^{2}] &=\mathbb{E}_{R}\left[\mathbb{E}_{x}[\mathbf{1}\{\|(M+R)x\|_{\infty}\leq\Delta\}]\cdot\mathbb{E}_{y}[\mathbf{1}\{\|(M+R)y\|_{\infty}\leq\Delta\}]\right]\\
&=\mathbb{E}_{R}\mathbb{E}_{x,y}\left[\mathbf{1}\left\{\|(M+R)x\|_{\infty}\leq\Delta,\,\|(M+R)y\|_{\infty}\leq\Delta\right\}\right]\\
&=\mathbb{E}_{x,y}\left[\mathbb{P}_{R}\left(\|(M+R)x\|_{\infty}\leq\Delta,\,\|(M+R)y\|_{\infty}\leq\Delta\right)\right]=\mathbb{E}_{x,y}[P_{x,y}],
\end{align*}

where we define

\[P_{x,y}:=\mathbb{P}_{R}\big(\|(M+R)x\|_{\infty}\leq\Delta,\,\|(M+R)y\|_{\infty}\leq\Delta\big).\]

Similarly, denoting

\[P_{x}:=\mathbb{P}_{R}\big(\|(M+R)x\|_{\infty}\leq\Delta\big),\]

we also have

\begin{align*}
(\mathbb{E}_{R}[S])^{2} &=\big(\mathbb{E}_{x}\mathbb{P}_{R}(\|(M+R)x\|_{\infty}\leq\Delta)\big)\cdot\big(\mathbb{E}_{y}\mathbb{P}_{R}(\|(M+R)y\|_{\infty}\leq\Delta)\big)\\
&=\mathbb{E}_{x,y}\left[\mathbb{P}_{R}(\|(M+R)x\|_{\infty}\leq\Delta)\cdot\mathbb{P}_{R}(\|(M+R)y\|_{\infty}\leq\Delta)\right]\\
&=\mathbb{E}_{x,y}[P_{x}\cdot P_{y}].
\end{align*}

To compare the quantities $\mathbb{E}_{R}[S^{2}]$ and $(\mathbb{E}_{R}[S])^{2}$, we first specify the distribution over colorings. A natural choice is the distribution on colorings derived from the Gram-Schmidt walk, which ensures that the discrepancy vector $Mx$ is $1$-subgaussian if $x$ is sampled from this distribution. However, we shall also need that the colorings $x$ themselves have a subgaussian tail, as well as some additional nice properties that will be useful in computing the second moment. In particular, we prove the following lemma in Section 2.1.

Lemma 2.1 (Truncated Gram-Schmidt Distribution).

Let $M\in\mathbb{R}^{d\times n}$ be a worst-case Komlós instance. Then, for any constant $C^{\prime}>1$ there exists a distribution $\mathcal{G}$ over colorings $x\in\{\pm 1\}^{n}$ satisfying the following properties:

  • Almost constant Euclidean norm for the discrepancy vectors: for every $x\in\mathsf{supp}(\mathcal{G})$, we have $\|Mx\|_{2}\in[r\pm\Delta]$ where $r=O(d^{1/2})$ and $\Delta=d^{-C^{\prime}}$.

  • Almost subgaussian tails for the colorings and discrepancy vectors: there exists a constant $C$ depending on $C^{\prime}$, such that for every unit vector $u$ (in $\mathbb{R}^{n}$ and $\mathbb{R}^{d}$ respectively),

    \[\mathbb{P}_{x\sim\mathcal{G}}\left[|\langle x,u\rangle|\geq t\right]\leq 2d^{C}\cdot e^{-t^{2}/8}\quad\text{ and }\quad\mathbb{P}_{x\sim\mathcal{G}}\left[|\langle Mx,u\rangle|\geq t\right]\leq 2d^{C}\cdot e^{-t^{2}/8}.\]

Since the colorings sampled from the above distribution are subgaussian, $|\langle x,y\rangle|\leq n/2$ holds with high probability. To compute the second moment to a good precision, we need a careful comparison of the ratio $P_{x,y}/(P_{x}\cdot P_{y})$ for any two colorings $x$ and $y$ where this event occurs. We show the following bound in this case (proof in Section 2.2).

Claim 2.2 (Strong bound).

For any two colorings $x,y\in\mathsf{supp}(\mathcal{G})$, denote $\epsilon=\epsilon(x,y)=\langle x,y\rangle/n$. If $|\epsilon|\leq 1/2$, then we have

\[P_{x,y}\leq P_{x}P_{y}\cdot\beta(x,y)\quad\text{ where }\quad\beta(x,y)=\exp\left(\delta_{1}+d\epsilon^{2}+d\delta^{2}\epsilon^{2}+\delta^{2}\epsilon\cdot\langle Mx,My\rangle\right),\]

where the scaling factor $\delta:=\frac{\sqrt{d}}{\sigma\sqrt{n}}$ and the error parameter $\delta_{1}\leq 1/\mathrm{poly}(d)$.

When the low probability event $|\epsilon|\geq 1/2$ occurs, we use the weak bound $P_{x,y}\leq\min\{P_{x},P_{y}\}$.

As $x$ is sampled from the truncated Gram-Schmidt distribution, the probabilities $P_{x}$ turn out to be almost constant over all colorings $x\in\mathsf{supp}(\mathcal{G})$, as the following claim shows.

Claim 2.3.

For any coloring $x\in\mathsf{supp}(\mathcal{G})$,

\[P_{x}=\exp(\delta_{x})\cdot p\quad\text{ where }\quad p:=\left(\frac{\delta\Delta}{\sqrt{2\pi}}\right)^{d}\exp\left(-\frac{\delta^{2}r^{2}}{2}\right),\tag{7}\]

with the scaling factor $\delta:=\frac{\sqrt{d}}{\sigma\sqrt{n}}$ and the error parameter $\delta_{x}$ satisfying $|\delta_{x}|\leq\delta_{1}\leq 1/\mathrm{poly}(d)$.

The proof of this claim is in Section 2.2.

We now focus on the case when $|\epsilon|\leq 1/2$. When we take $x,y\sim\mathcal{G}$, as $P_{x}$ and $P_{y}$ are essentially constant, by Claim 2.2, applying the second moment method reduces to bounding the quantity $\beta(x,y)$ defined there. To do this, we will use the properties of the underlying random variables $x$ and $Mx$ described in Lemma 2.1. The following technical lemma gives a bound on the exponential moment for such random variables.

Lemma 2.4.

Let $X$ be a non-negative random variable that satisfies

\[\mathbb{P}(X\geq t)\leq d^{C_{1}}\cdot e^{-t^{2}/8}\quad\text{ for any }t>0,\]

for some fixed constant $C_{1}>0$. Then for any $\lambda=c_{2}\sqrt{\log d}$ with $c_{2}\geq\sqrt{32C_{1}}$,

\[\mathbb{E}[\exp(X^{2}/\lambda^{2})]\leq 1+32C_{1}/c_{2}^{2}+o_{d}(1).\]

We shall prove this lemma in Section 2.1.

We can now complete the proof of Theorem 1.1 by comparing $\mathbb{E}_{R}[S^{2}]$ and $(\mathbb{E}_{R}[S])^{2}$. We show the following.

Lemma 2.5.

For $n=\omega(d\log d)\sigma^{-4/3}$, we have

\[(\mathbb{E}_{R}[S])^{2}=p^{2}(1-o_{d}(1))\quad\text{ and }\quad\mathbb{E}_{R}[S^{2}]=p^{2}(1+o_{d}(1)).\]

The above implies that $\mathbb{E}_{R}[S^{2}]=(1+o(1))(\mathbb{E}_{R}[S])^{2}$, and thus the Paley-Zygmund inequality implies Theorem 1.1, as discussed in the proof overview.

Proof of Lemma 2.5.

For the first moment, Claim 2.3 implies that

\[(\mathbb{E}_{R}[S])^{2}=\mathbb{E}_{x,y\sim\mathcal{G}}[P_{x}P_{y}]=p^{2}\,\mathbb{E}[\exp(\delta_{x}+\delta_{y})]\geq p^{2}\exp(-2\delta_{1}).\]

Since $0<\delta_{1}\leq 1/\mathrm{poly}(d)$, the bound follows.

To compute the second moment, $\mathbb{E}_{R}[S^{2}]=\mathbb{E}_{x,y\sim\mathcal{G}}[P_{x,y}]$, we define $\mathcal{E}$ to be the event that the colorings $x,y\sim\mathcal{G}$ satisfy $|\langle x,y\rangle|>n/2$, and compute the contribution to the expectation under $\mathcal{E}$ and its complement separately. In particular, using Claims 2.3 and 2.2, we have

\[\mathbb{E}_{R}[S^{2}]=\mathbb{E}_{x,y\sim\mathcal{G}}[P_{x,y}]\leq\mathbb{P}_{x,y\sim\mathcal{G}}[\mathcal{E}]\cdot p+\mathbb{E}_{x,y\sim\mathcal{G}}\big[P_{x}P_{y}\beta(x,y)\cdot\mathbf{1}[\overline{\mathcal{E}}]\big].\tag{8}\]

For the first term in (8), since $n\geq d$, Lemma 2.1 implies that

\[\mathbb{P}_{x,y\sim\mathcal{G}}[\mathcal{E}]\leq\mathrm{poly}(d)\cdot e^{-n/4}\leq e^{-n/8}.\]

Thus, using the exact bound for $p$ from Claim 2.3 and that $1/(\delta\Delta)\leq\mathrm{poly}(dn/\sigma)$, the first term satisfies

\begin{align*}
\mathbb{P}_{x,y\sim\mathcal{G}}[\mathcal{E}]\cdot p=p^{2}\cdot\mathbb{P}[\mathcal{E}]\cdot p^{-1} &\leq p^{2}\cdot e^{-n/8}\cdot\left(\frac{\sqrt{2\pi}}{\delta\Delta}\right)^{d}\exp\left(\frac{\delta^{2}r^{2}}{2}\right)\\
&\leq p^{2}\cdot e^{-n/8}\cdot\exp\left(O\left(d\log(dn/\sigma)+d^{2}/(\sigma^{2}n)\right)\right)\\
&=p^{2}\cdot o_{d}(1),\tag{9}
\end{align*}

when $n=\omega(d\log d)\sigma^{-4/3}$. In particular, as $\sigma\leq 1$, we have $n/8\gg d\log(dn/\sigma)+d^{2}/(\sigma^{2}n)$.

For the second term in (8), using Claim 2.3, we have that $P_{x}=p\cdot\exp(\delta_{x})$ where $|\delta_{x}|\leq\delta_{1}\leq 1/\mathrm{poly}(d)$. Thus,

\begin{align*}
\mathbb{E}_{x,y\sim\mathcal{G}}\big[P_{x}P_{y}\beta(x,y)\cdot\mathbf{1}[\overline{\mathcal{E}}]\big] &\leq p^{2}\cdot\exp(2\delta_{1})\cdot\mathbb{E}\big[\beta(x,y)\cdot\mathbf{1}[\overline{\mathcal{E}}]\big]\\
&\leq p^{2}\cdot\exp(2\delta_{1})\cdot\mathbb{E}\big[\beta(x,y)\big],\tag{10}
\end{align*}

since $\beta(x,y)$ is a non-negative random variable. Recall from Claim 2.2 that $\beta(x,y)\leq\exp(\delta_{1})\cdot\exp(Z)$ where $\delta_{1}\leq 1/\mathrm{poly}(d)$ and

\[Z=d\epsilon^{2}+2\delta^{2}\epsilon^{2}r^{2}+2\delta^{2}|\epsilon\langle Mx,My\rangle|.\]

Renormalizing $\overline{\epsilon}=\langle x,n^{-1/2}y\rangle$ and $\overline{\theta}=\langle Mx,r^{-1}My\rangle$ and using that $\delta=\frac{\sqrt{d}}{\sigma\sqrt{n}}$ and $r\leq\sqrt{d}$, we have

\[Z\leq\left(\frac{d}{n}+\frac{2d^{2}}{\sigma^{2}n^{2}}\right)\cdot\overline{\epsilon}^{2}+\frac{2d\sqrt{d}}{\sigma^{2}n\sqrt{n}}\cdot|\overline{\epsilon}\cdot\overline{\theta}|\leq(|\overline{\epsilon}|+|\overline{\theta}|)^{2}/\lambda_{\min}^{2},\]

where we denote

\[\lambda_{\min}=\frac{1}{3}\sqrt{\min\left\{\frac{n}{d},\,\frac{\sigma^{2}n^{2}}{2d^{2}},\,\frac{\sigma^{2}n^{1.5}}{2d^{1.5}}\right\}}.\]

Note that $\lambda_{\min}=\omega_{d}(1)\cdot\sqrt{\log d}$ when $n=\omega(d\log d)\sigma^{-4/3}$.
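To verify this, plug $n=\omega(d\log d)\sigma^{-4/3}$ into each of the three terms and use $\sigma\leq 1$:
\[\frac{n}{d}=\omega(\log d)\,\sigma^{-4/3}\geq\omega(\log d),\qquad\frac{\sigma^{2}n^{2}}{2d^{2}}=\omega(\log^{2}d)\,\sigma^{-2/3}\geq\omega(\log^{2}d),\qquad\frac{\sigma^{2}n^{1.5}}{2d^{1.5}}=\omega(\log^{1.5}d),\]
so each term inside the minimum is $\omega(\log d)$ and hence $\lambda_{\min}^{2}=\omega(\log d)$.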

We now bound the tails of $\overline{\epsilon}$ and $\overline{\theta}$, which will allow us to bound $\mathbb{E}[\exp(Z)]$. Conditioned on any outcome of $y\sim\mathcal{G}$, and as $\|n^{-1/2}y\|_{2}=1$, the second property in Lemma 2.1 gives that $\mathbb{P}_{x\sim\mathcal{G}}[|\langle x,n^{-1/2}y\rangle|\geq t]\leq 2d^{C}\exp(-t^{2}/8)$. Averaging over $y$ thus gives that

\[\mathbb{P}\left[|\overline{\epsilon}|\geq t\right]\leq 2d^{C}\cdot e^{-t^{2}/8}.\]

Similarly, as $\|r^{-1}My\|_{2}\leq 1$ for any $y$ in the support of $\mathcal{G}$, we have that $\mathbb{P}_{x\sim\mathcal{G}}[|\langle Mx,r^{-1}My\rangle|\geq t]\leq 2d^{C}\exp(-t^{2}/8)$, and averaging over $y$ gives that

\[\mathbb{P}\left[|\overline{\theta}|\geq t\right]\leq 2d^{C}\cdot e^{-t^{2}/8}.\]

By a union bound, it follows that the random variable $X:=|\overline{\epsilon}|+|\overline{\theta}|$ satisfies the tail condition of Lemma 2.4 with constant $C_{1}=2C$. So when $\lambda_{\min}=\omega_{d}(1)\cdot\sqrt{\log d}$, the parameter $c_{2}:=\lambda_{\min}/\sqrt{\log d}$ in Lemma 2.4 satisfies $32C_{1}/c_{2}^{2}=o_{d}(1)$. Therefore, Lemma 2.4 implies that

\[\mathbb{E}[\beta(x,y)]\leq\exp(\delta_{1})\cdot\mathbb{E}[\exp(Z)]\leq(1+o_{d}(1))(1+o_{d}(1))=1+o_{d}(1).\]

Plugging the above into (10), it follows that the second term satisfies

\[\mathbb{E}_{x,y\sim\mathcal{G}}\big[P_{x}P_{y}\beta(x,y)\cdot\mathbf{1}[\overline{\mathcal{E}}]\big]\leq p^{2}(1+o_{d}(1)).\]

Combining this with (8) and (9), we get that $\mathbb{E}_{R}[S^{2}]\leq p^{2}(1+o_{d}(1))$. ∎

We now prove the lemmas and claims used in the proof of Theorem 1.1 above.

2.1 Truncated Gram-Schmidt Distribution and Exponential Moments

Proof of Lemma 2.1.

Consider running the Gram-Schmidt walk algorithm on the matrix $M$ stacked with the identity matrix, i.e. $\begin{pmatrix}M\\ I_{n}\end{pmatrix}$, and let $\mathcal{G}_{0}$ be the distribution over colorings obtained as the output of the algorithm.

Since each column of the stacked matrix has Euclidean norm at most $2$, the guarantee of the Gram-Schmidt walk (Theorem 1.4) implies that the vector $(x,Mx)\in\mathbb{R}^{n+d}$, where $x\in\{\pm 1\}^{n}$ and $Mx\in\mathbb{R}^{d}$, is $2$-subgaussian. It follows that both $x$ and $Mx$ are $2$-subgaussian as well when $x\sim\mathcal{G}_{0}$.

To obtain a distribution $\mathcal{G}$ where $\|Mx\|_{2}$ is almost constant for each coloring $x\in\mathsf{supp}(\mathcal{G})$, we will truncate the distribution $\mathcal{G}_{0}$ in such a way that the tails are also preserved up to $\mathrm{poly}(d)$ factors. Towards this end, we first note that with probability $1-e^{-cd}$, we have that $\|Mx\|_{2}\leq c^{\prime}\sqrt{d}$ for constants $c$ and $c^{\prime}$. This is because for any $\sigma$-subgaussian mean-zero random vector $X$ in $\mathbb{R}^{d}$, the Euclidean norm $\|X\|_{2}$ has a subgaussian tail (e.g. Exercise 6.3.5 in [Ver18]); in particular, $\mathbb{P}[\|X\|_{2}\geq c_{1}\sigma\sqrt{d}+t]\leq e^{-c_{2}t^{2}/\sigma^{2}}$ for some universal constants $c_{1},c_{2}>0$. Now, by a pigeonhole argument, for a large enough constant $C^{\prime}$ there exists an annulus $W$ of width $\Delta=d^{-C^{\prime}}$ and inner radius $r\leq c^{\prime}\sqrt{d}$ such that $\mathbb{P}_{x\sim\mathcal{G}_{0}}(Mx\in W)\geq d^{-C}$ for a constant $C$ depending on $C^{\prime}$.

We take the distribution $\mathcal{G}$ to be the probability measure of $\mathcal{G}_{0}$ conditioned on the event that $Mx\in W$. It then follows that for any coloring $x\sim\mathcal{G}$, we have $|\|Mx\|_{2}-r|\leq\Delta$. Moreover, since $x$ and $Mx$ were $2$-subgaussian prior to conditioning, and the probability mass of the annulus is at least $d^{-C}$, conditioning can only increase the probability of any event by a factor of $d^{C}$. Thus, the tail bounds as stated in the lemma also follow. ∎
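The truncation step above is easy to picture as a procedure. The following Python sketch is only illustrative: it assumes access to a sampler gs_walk_coloring(M) for the distribution $\mathcal{G}_{0}$ (this sampler is hypothetical here and not implemented), draws colorings from it, bins the norms $\|Mx\|_{2}$ into annuli of width $d^{-C^{\prime}}$, and keeps the colorings falling in the heaviest annulus.

```python
import numpy as np

def truncate(M, gs_walk_coloring, num_samples=100000, C_prime=2):
    """Empirical analogue of the annulus truncation in Lemma 2.1.

    gs_walk_coloring(M) is assumed to return one coloring in {-1,1}^n drawn
    from the Gram-Schmidt walk applied to M stacked over I_n (hypothetical).
    C_prime should be taken large enough, as in the lemma.
    """
    d, _ = M.shape
    width = d ** (-C_prime)                          # annulus width Delta = d^{-C'}
    xs = [gs_walk_coloring(M) for _ in range(num_samples)]
    norms = np.array([np.linalg.norm(M @ x) for x in xs])
    annulus = np.floor(norms / width).astype(int)    # index of annulus containing ||Mx||_2
    heaviest = np.bincount(annulus).argmax()         # pigeonhole: the most likely annulus
    return [x for x, a in zip(xs, annulus) if a == heaviest]
```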

Proof of Lemma 2.4.

The assumption on $X$ implies that for any $t\geq 4\sqrt{C_{1}\log d}$, we have

\[\mathbb{P}(X\geq t)\leq\exp(-t^{2}/16).\tag{11}\]

We express the expectation as an integral:

\begin{align*}
\mathbb{E}[\exp(X^{2}/\lambda^{2})] &=\int_{0}^{\infty}\mathbb{P}[\exp(X^{2}/\lambda^{2})>s]\,ds=\int_{0}^{\infty}\mathbb{P}\big(X\geq\lambda\sqrt{\log s}\big)\,ds\\
&\leq 1+c_{3}+\int_{1+c_{3}}^{\infty}\mathbb{P}\big(X\geq\lambda\sqrt{\log s}\big)\,ds.
\end{align*}

Let us set $c_{3}=32C_{1}/c_{2}^{2}$, so that $c_{3}\leq 1$ as $c_{2}\geq\sqrt{32C_{1}}$. For $s\geq 1+c_{3}$, we have

\[\lambda\sqrt{\log s}\geq c_{2}\sqrt{\log d}\cdot\sqrt{c_{3}/2}\geq 4\sqrt{C_{1}\log d},\]

using that $\sqrt{\log(1+x)}\geq\sqrt{x/2}$ for $x\in[0,1]$ and that $c_{3}\leq 1$. So the condition $t\geq 4\sqrt{C_{1}\log d}$ for (11) is satisfied whenever $s\geq 1+c_{3}$, and applying (11) to the above integral gives

\begin{align*}
\int_{1+c_{3}}^{\infty}\mathbb{P}\big(X\geq\lambda\sqrt{\log s}\big)\,ds &\leq\int_{1+c_{3}}^{\infty}\exp(-\lambda^{2}\log s/16)\,ds\\
&=\left(\frac{\lambda^{2}}{16}-1\right)^{-1}\cdot(1+c_{3})^{-\lambda^{2}/16+1}\leq\exp(-c_{2}^{2}c_{3}\log d/16).
\end{align*}

By our choice of $c_{3}$, the above is at most $d^{-2C_{1}}$. Thus, it follows that $\mathbb{E}[\exp(X^{2}/\lambda^{2})]\leq 1+32C_{1}/c_{2}^{2}+d^{-2C_{1}}$. This proves the lemma. ∎
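As a quick numerical sanity check of Lemma 2.4 (illustration only, with parameter values chosen by us), take $X=|g|$ with $g\sim\mathcal{N}(0,4)$: then $\mathbb{P}(X\geq t)=2\,\mathbb{P}(\mathcal{N}(0,1)\geq t/2)\leq 2e^{-t^{2}/8}\leq d^{C_{1}}e^{-t^{2}/8}$ with $C_{1}=1$ for every $d\geq 2$, so the hypothesis holds and the exponential moment should stay below $1+32C_{1}/c_{2}^{2}$ up to the $o_{d}(1)$ term.

```python
import numpy as np

rng = np.random.default_rng(1)
d, C1 = 1000, 1.0
c2 = np.sqrt(32 * C1)                          # smallest c_2 allowed by Lemma 2.4
lam = c2 * np.sqrt(np.log(d))                  # lambda = c_2 sqrt(log d)
X = np.abs(2.0 * rng.standard_normal(10**6))   # X = |g| with g ~ N(0, 4)
lhs = np.mean(np.exp(X**2 / lam**2))           # Monte Carlo estimate of E[exp(X^2/lambda^2)]
rhs = 1 + 32 * C1 / c2**2                      # = 2 here
print(lhs, rhs)                                # lhs is comfortably below rhs
```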

2.2 Proof of Claims from Section 2

Proof of Claim 2.2.

Since the rows of $R$ are independent, to bound the ratio $P_{x,y}/(P_{x}P_{y})$ it suffices to bound the corresponding ratio for a single row of $M+R$. Fix $i\in[d]$, let $m=m_{i}$ and $r=r_{i}$ denote the $i^{\text{th}}$ rows of $M$ and $R$, and define $a=a_{i}(x):=-m^{\top}x$ and $b=b_{i}(y):=-m^{\top}y$. We want to compare $\mathbb{P}(r^{\top}x\in[a\pm\Delta],\,r^{\top}y\in[b\pm\Delta])$ to $\mathbb{P}(r^{\top}x\in[a\pm\Delta])\cdot\mathbb{P}(r^{\top}y\in[b\pm\Delta])$.

Notice that $r^{\top}x$ and $r^{\top}y$ are Gaussian random variables with mean $0$, variance $1/\delta^{2}$, and covariance $\mathbb{E}_{r}[r^{\top}x\cdot r^{\top}y]=\mathbb{E}_{r}[x^{\top}rr^{\top}y]=\epsilon/\delta^{2}$. Denoting the square $K:=[a\pm\Delta]\times[b\pm\Delta]$, we have that

\[\mathbb{P}(r^{\top}x\in[a\pm\Delta],\,r^{\top}y\in[b\pm\Delta])=\mu_{\epsilon}(\delta K),\]

where $\mu_{\epsilon}$ is the two-dimensional centered Gaussian measure with covariance matrix $\left(\begin{smallmatrix}1&\epsilon\\ \epsilon&1\end{smallmatrix}\right)$ and $\delta K$ denotes the $\delta$ scaling of $K$. Similarly, we can write $\mathbb{P}(r^{\top}x\in[a\pm\Delta])\cdot\mathbb{P}(r^{\top}y\in[b\pm\Delta])=\mu(\delta K)$, where $\mu$ is the standard two-dimensional Gaussian measure.

We will bound the ratio $\mu_{\epsilon}(\delta K)/\mu(\delta K)$ by approximating the Gaussian measure over $\delta K$ with the density at the center, and show the following bound:

\[\mu_{\epsilon}(\delta K)/\mu(\delta K)\leq\exp\left(3\alpha+\epsilon^{2}+\delta^{2}\epsilon^{2}(a^{2}+b^{2})+2\delta^{2}\epsilon ab\right),\quad\text{where }\alpha:=2\delta^{2}\Delta(|a|+|b|+2\Delta).\tag{12}\]

Since $(a_{1}(x),\ldots,a_{d}(x))=-Mx$ and $(b_{1}(y),\ldots,b_{d}(y))=-My$, using the above bound for all the rows $i\in[d]$ (note that the products $a_{i}b_{i}$ and the squares $a_{i}^{2}+b_{i}^{2}$ are unaffected by the signs), we have

\[\frac{P_{x,y}}{P_{x}P_{y}}\leq\exp\left(3d\alpha+d\epsilon^{2}+\delta^{2}\epsilon^{2}\cdot(\|Mx\|_{2}^{2}+\|My\|_{2}^{2})+\delta^{2}\epsilon\cdot\langle Mx,My\rangle\right).\]

Since $\|Mx\|_{2}^{2}\leq d+\mathrm{poly}(1/d)$ for every $x\in\mathsf{supp}(\mathcal{G})$, taking $\delta_{1}=4d\alpha$ gives the statement of the claim. To finish the proof, we now prove (12).

Abusing notation and denoting by $\mu(s,t)$ and $\mu_{\epsilon}(s,t)$ the corresponding densities at $(s,t)\in\mathbb{R}^{2}$, we have the following explicit formula for the density $\mu_{\epsilon}$:

\[\mu_{\epsilon}(s,t)=\frac{1}{2\pi\sqrt{1-\epsilon^{2}}}\cdot\exp\left(-\frac{s^{2}+t^{2}-2\epsilon st}{2(1-\epsilon^{2})}\right).\]

Since the edge length of the square $\delta K$ is $2\delta\Delta$, whenever $|\epsilon|\leq 1/2$, a direct calculation with the densities shows that

\[\frac{\sup_{(s,t)\in\delta K}\mu(s,t)}{\inf_{(s,t)\in\delta K}\mu(s,t)}=\exp\big(2\delta^{2}\Delta(|a|+|b|)\big)\leq\exp\big(2\delta^{2}\Delta(|a|+|b|+2\Delta)\big)=\exp(\alpha),\]

and that

\[\frac{\sup_{(s,t)\in\delta K}\mu_{\epsilon}(s,t)}{\inf_{(s,t)\in\delta K}\mu_{\epsilon}(s,t)}\leq\exp\big(4\delta^{2}\Delta(|a|+|b|+2\Delta)\big)=\exp(2\alpha),\]

where $\alpha$ is as defined in (12). It follows that whenever $|\epsilon|\leq 1/2$, we can use the density at the center of $\delta K$ to obtain

\begin{align*}
\frac{\mu_{\epsilon}(\delta K)}{\mu(\delta K)} &\leq\exp(3\alpha)\cdot\frac{\mu_{\epsilon}(\delta a,\delta b)}{\mu(\delta a,\delta b)}\leq\frac{1}{\sqrt{1-\epsilon^{2}}}\cdot\exp\left(3\alpha+\frac{\delta^{2}\epsilon^{2}(a^{2}+b^{2})}{2(1-\epsilon^{2})}+\frac{\delta^{2}\epsilon ab}{1-\epsilon^{2}}\right)\\
&\leq\exp\big(3\alpha+\epsilon^{2}+\delta^{2}\epsilon^{2}(a^{2}+b^{2})+2\delta^{2}\epsilon ab\big),
\end{align*}

thus proving (12). ∎
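The single-row bound (12) is easy to test numerically. The following sketch is only illustrative (the parameter values are our own choices): it computes the exact ratio $\mu_{\epsilon}(\delta K)/\mu(\delta K)$ by one-dimensional integration of the conditional Gaussian CDF, and compares it against the right-hand side of (12).

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def rect_prob(eps, s_lo, s_hi, t_lo, t_hi):
    """P[(g,g') in [s_lo,s_hi] x [t_lo,t_hi]] for a standard bivariate Gaussian
    with correlation eps, integrating the conditional CDF over the first coordinate."""
    sd = np.sqrt(1.0 - eps**2)
    integrand = lambda s: norm.pdf(s) * (norm.cdf((t_hi - eps*s)/sd) - norm.cdf((t_lo - eps*s)/sd))
    return quad(integrand, s_lo, s_hi)[0]

eps, delta, Delta = 0.3, 0.2, 1e-3           # correlation, scaling delta, half-width Delta
a, b = 0.7, -0.4                             # centers a = -(Mx)_i, b = -(My)_i for one row
s_lo, s_hi = delta*(a - Delta), delta*(a + Delta)
t_lo, t_hi = delta*(b - Delta), delta*(b + Delta)

numer = rect_prob(eps, s_lo, s_hi, t_lo, t_hi)                                   # mu_eps(delta K)
denom = (norm.cdf(s_hi) - norm.cdf(s_lo)) * (norm.cdf(t_hi) - norm.cdf(t_lo))    # mu(delta K)
alpha = 2 * delta**2 * Delta * (abs(a) + abs(b) + 2*Delta)
bound = np.exp(3*alpha + eps**2 + delta**2*eps**2*(a**2 + b**2) + 2*delta**2*eps*a*b)
print(numer/denom, bound)                    # for these values the ratio stays below the bound
```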

Proof of Claim 2.3.

We have $P_{x}=\prod_{i\in[d]}\mathbb{P}[r_{i}^{\top}x\in[a_{i}\pm\Delta]]$. For any fixed $i\in[d]$, $r_{i}^{\top}x$ is distributed as $\mathcal{N}(0,1/\delta^{2})$, so after scaling, the quantity $\mathbb{P}[r_{i}^{\top}x\in[a_{i}\pm\Delta]]=\mu(\delta I)$ where $I=[a_{i}\pm\Delta]$ and $\mu$ is the standard Gaussian measure on $\mathbb{R}$. Analogous to the proof of Claim 2.2, one can approximate the Gaussian density at any point of $I$ by its value at the center point $a_{i}$, and compute similarly that

\[P_{x}=\prod_{i\in[d]}\mathbb{P}[r_{i}^{\top}x\in[a_{i}\pm\Delta]]=\left(\frac{\delta\Delta}{\sqrt{2\pi}}\right)^{d}\exp\left(\alpha_{x}-\frac{\delta^{2}\|Mx\|_{2}^{2}}{2}\right),\]

for some small error $\alpha_{x}$ with $|\alpha_{x}|\leq 2\delta^{2}\Delta(\|Mx\|_{1}+d\Delta)$. As $\|Mx\|_{2}\in[r\pm\Delta]$ and $r=O(\sqrt{d})$, we have that $\|Mx\|_{1}=O(d)$, and the statement of the claim follows with $|\delta_{x}|\leq|\alpha_{x}|+1/\mathrm{poly}(d)\leq 1/\mathrm{poly}(d)$. ∎

3 Conclusion

For the Komlós problem studied in this paper, Gaussian noise is a natural way to model a smoothed analysis setting, since the input vectors have Euclidean norm at most one. One can wonder whether similar results can be obtained for more general noise models, for instance Bernoulli or other discrete noise. Such noise models are also more natural for smoothed analysis in other discrepancy settings, such as the Beck-Fiala problem. The weighted second moment approach used here can also handle Bernoulli noise when the number of vectors satisfies $n\gg d^{2}$, but the second moment becomes difficult to control when $n$ is smaller. It remains an interesting open problem to see if Bernoulli or other discrete noise models can be handled in the regime $n\gg d\log d$.

References

  • [AN21] Dylan J. Altschuler and Jonathan Niles-Weed. The discrepancy of random rectangular matrices. CoRR, abs/2101.04036, 2021.
  • [AP04] Dimitris Achlioptas and Yuval Peres. The threshold for random $k$-SAT is $2^{k}\log 2-o(k)$. J. Amer. Math. Soc., (4):947–973, 2004.
  • [AS16] Noga Alon and Joel H. Spencer. The Probabilistic Method. John Wiley & Sons, 2016.
  • [Ban98] Wojciech Banaszczyk. Balancing vectors and Gaussian measures of $n$-dimensional convex bodies. Random Struct. Algorithms, 12(4):351–360, 1998.
  • [BDGL19] Nikhil Bansal, Daniel Dadush, Shashwat Garg, and Shachar Lovett. The Gram-Schmidt walk: A cure for the Banaszczyk blues. Theory Comput., 15:1–27, 2019.
  • [BF81] József Beck and Tibor Fiala. “Integer-making” theorems. Discrete Appl. Math., 3(1):1–8, 1981.
  • [BJM+22] Nikhil Bansal, Haotian Jiang, Raghu Meka, Sahil Singla, and Makrand Sinha. Prefix discrepancy, smoothed analysis, and combinatorial vector balancing. In Proceedings of ITCS, pages 13:1–13:22, 2022.
  • [BM19] Nikhil Bansal and Raghu Meka. On the discrepancy of random low degree set systems. In Proceedings of SODA 2019, pages 2557–2564, 2019.
  • [EL19] Esther Ezra and Shachar Lovett. On the Beck-Fiala conjecture for random set systems. Random Struct. Algorithms, 54(4):665–675, 2019.
  • [FS20] Cole Franks and Michael Saks. On the discrepancy of random matrices with many columns. Random Struct. Algorithms, 57(1):64–96, 2020.
  • [HR19] Rebecca Hoberg and Thomas Rothvoss. A Fourier-Analytic Approach for the Discrepancy of Random Set Systems. In Proceedings of SODA, pages 2547–2556, 2019.
  • [HRS21] Nika Haghtalab, Tim Roughgarden, and Abhishek Shetty. Smoothed analysis with adaptive adversaries. In Proceedings of FOCS, 2021.
  • [HSSZ19] Christopher Harshaw, Fredrik Sävje, Daniel Spielman, and Peng Zhang. Balancing covariates in randomized experiments using the Gram-Schmidt walk. arXiv e-prints, 2019.
  • [MMPPG21] Calum MacRury, Tomáš Masařík, Leilani Pai, and Xavier Pérez-Giménez. The phase transition of discrepancy in random hypergraphs, 2021.
  • [Pot18] Aditya Potukuchi. Discrepancy in random hypergraph models. CoRR, abs/1811.01491, 2018.
  • [Pot20] Aditya Potukuchi. A spectral bound on hypergraph discrepancy. In Proceedings of ICALP, pages 93:1–93:14, 2020.
  • [ST04] Daniel A Spielman and Shang-Hua Teng. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. Journal of the ACM (JACM), 51(3):385–463, 2004.
  • [TMR20] Paxton Turner, Raghu Meka, and Philippe Rigollet. Balancing gaussian vectors in high dimension. In Proceedings of COLT, pages 3455–3486, 2020.
  • [Ver18] Roman Vershynin. High-Dimensional Probability, volume 1. Cambridge University Press, 2018.