
Structure vs. Randomness for Bilinear Maps

Alex Cohen and Guy Moshkovitz

This research was done as part of the 2020 NYC Discrete Math REU, supported by NSF awards DMS-1802059, DMS-1851420, and DMS-1953141. An earlier version of this paper appeared in the proceedings of the 53rd ACM Symposium on Theory of Computing (STOC 2021).
Abstract

We prove that the slice rank of a 3-tensor (a combinatorial notion introduced by Tao in the context of the cap-set problem), the analytic rank (a Fourier-theoretic notion introduced by Gowers and Wolf), and the geometric rank (an algebro-geometric notion introduced by Kopparty, Moshkovitz, and Zuiddam) are all equal up to an absolute constant. As a corollary, we obtain strong trade-offs on the arithmetic complexity of a biased bilinear map, and on the separation between computing a bilinear map exactly and on average. Our result settles open questions of Haramaty and Shpilka [STOC 2010], and of Lovett [Discrete Anal. 2019] for 3-tensors.

Discrete Analysis 2022, paper no. 12. Received 19 August 2021; published 3 October 2022. doi: 10.19086/da.38587.

Keywords: bilinear complexity, tensors, algebraic geometry

1 Introduction

Bilinear maps stand at the forefront of many basic questions in combinatorics and theoretical computer science. A bilinear map is, intuitively, just a collection of matrices. Formally, a bilinear map $f\colon\mathbb{F}^{n_{1}}\times\mathbb{F}^{n_{2}}\to\mathbb{F}^{m}$, where $\mathbb{F}$ is any field, is a map $f(\mathbf{x},\mathbf{y})=(f_{1}(\mathbf{x},\mathbf{y}),\ldots,f_{m}(\mathbf{x},\mathbf{y}))$ whose every component $f_{k}$ is a bilinear form $f_{k}(\mathbf{x},\mathbf{y})=\sum_{i,j}a_{i,j,k}x_{i}y_{j}$, or equivalently, $\mathbf{x}^{T}A_{k}\mathbf{y}$ for some matrix $A_{k}\in\mathbb{F}^{n_{1}\times n_{2}}$. While linear maps are thoroughly understood thanks to linear algebra, bilinear maps are—in more than one way—still very much a mystery.

1.1 Structure vs. randomness

In this paper we prove a tight relation between the slice rank and the analytic rank of bilinear maps, or 3-tensors. Our proof crucially uses the notion of geometric rank as an intermediary, enabling the use of tools from algebraic geometry to ultimately prove that these three notions of rank are in fact equivalent up to a constant.

A 3-tensor (or sometimes simply a tensor) over a field $\mathbb{F}$ is a three-dimensional matrix $(a_{i,j,k})_{i,j,k}\in\mathbb{F}^{n_{1}\times n_{2}\times n_{3}}$ with entries $a_{i,j,k}\in\mathbb{F}$. Equivalently, a tensor can be thought of as a degree-3 polynomial, namely, a trilinear form $T(\mathbf{x},\mathbf{y},\mathbf{z})=\sum_{i,j,k}a_{i,j,k}x_{i}y_{j}z_{k}$ with coefficients $a_{i,j,k}\in\mathbb{F}$, where $\mathbf{x}=(x_{1},\ldots,x_{n_{1}})$, $\mathbf{y}=(y_{1},\ldots,y_{n_{2}})$, $\mathbf{z}=(z_{1},\ldots,z_{n_{3}})$. (A trilinear form means that every monomial has exactly one variable from $\mathbf{x}$, one from $\mathbf{y}$, and one from $\mathbf{z}$, and so is linear separately in each of $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$.) Note that a tensor is just a symmetric way to think of a bilinear map $f\colon\mathbb{F}^{n_{1}}\times\mathbb{F}^{n_{2}}\to\mathbb{F}^{n_{3}}$, where $f=(f_{1},\ldots,f_{n_{3}})$ with $f_{k}(\mathbf{x},\mathbf{y})=\sum_{i,j}a_{i,j,k}x_{i}y_{j}$; indeed, each $f_{k}$ corresponds to a slice $(a_{i,j,k})_{i,j}$ of $T$. (Yet another point of view is that a 3-tensor is a member of the vector space $V_{1}\otimes V_{2}\otimes V_{3}$, where $V_{1}=\mathbb{F}^{n_{1}}$, $V_{2}=\mathbb{F}^{n_{2}}$, $V_{3}=\mathbb{F}^{n_{3}}$ are finite-dimensional vector spaces over $\mathbb{F}$.) As opposed to matrices, which have only one notion of rank, there are multiple notions of rank for 3-tensors. The notions of rank of 3-tensors we consider are defined as follows:

  • The slice rank of $T$, denoted $\operatorname{SR}(T)$, is the smallest $r\in\mathbb{N}$ such that $T$ can be decomposed as $T=\sum_{i=1}^{r}f_{i}g_{i}$ where $f_{i}$ is an $\mathbb{F}$-linear form in either the $\mathbf{x}$, $\mathbf{y}$, or $\mathbf{z}$ variables and $g_{i}$ is an $\mathbb{F}$-bilinear form in the remaining two sets of variables, for each $i$.

  • The analytic rank of $T$ over a finite field $\mathbb{F}$ is given by $\operatorname{AR}(T)=-\log_{|\mathbb{F}|}\mathbb{E}_{\mathbf{x},\mathbf{y},\mathbf{z}}\chi(T(\mathbf{x},\mathbf{y},\mathbf{z}))$, where we fix $\chi$ to be any nontrivial additive character of $\mathbb{F}$ (e.g., $\chi(x)=\exp(2\pi ix/p)$ when $\mathbb{F}=\mathbb{F}_{p}$ is prime). (We use $\mathbb{E}_{\mathbf{x}}$ to denote averaging, so $\mathbb{E}_{\mathbf{x}\in\mathbb{F}^{n}}$ stands for $|\mathbb{F}|^{-n}\sum_{\mathbf{x}\in\mathbb{F}^{n}}$.)

  • The geometric rank of $T$, viewed as a bilinear map $f$, is defined as $\operatorname{GR}(T)=\operatorname{codim}\ker f$, the codimension of the algebraic variety $\ker f=\{(\mathbf{x},\mathbf{y})\in\overline{\mathbb{F}}^{n_{1}}\times\overline{\mathbb{F}}^{n_{2}}\mid f(\mathbf{x},\mathbf{y})=\mathbf{0}\}$. (Although permuting $\mathbf{x},\mathbf{y},\mathbf{z}$ gives rise to three distinct bilinear maps corresponding to $T$, the definition of $\operatorname{GR}(T)$ is invariant under them; see Theorem 3.1 in [30].)

We note that all three notions above generalize matrix rank. Moreover, just like matrix rank, for any $T\in\mathbb{F}^{n\times n\times n}$ all three quantities lie in the range $[0,n]$. Furthermore, for the $n\times n\times n$ identity tensor $I_{n}$ we have $\operatorname{SR}(I_{n})=\operatorname{GR}(I_{n})=n$ and $\operatorname{AR}(I_{n})=(1-o_{|\mathbb{F}|}(1))n$. The slice rank was defined by Tao [40] in the context of the solution of the cap-set problem (similar notions have been considered before in other areas of research). The analytic rank was introduced by Gowers and Wolf [19] in the context of higher-order Fourier analysis. Roughly, this notion measures how close to uniform the distribution of the values of the polynomial corresponding to the tensor is. The geometric rank is an algebro-geometric notion of rank that was recently introduced in [30]. (A related notion was applied by Schmidt [37] in the context of number theory.) Intuitively, it measures the number of “independent” components of the corresponding bilinear map. We use it as a geometric analogue of the bias of the (output distribution of the) bilinear map.
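To make the analytic rank concrete, the following minimal brute-force sketch (in Python; the choices $p=3$ and $n=3$ are ours, purely for illustration) computes the bias $\mathbb{E}_{\mathbf{x},\mathbf{y},\mathbf{z}}\chi(T(\mathbf{x},\mathbf{y},\mathbf{z}))$ of the identity tensor $I_{n}$ over $\mathbb{F}_{p}$ and then its analytic rank; as the field grows the value approaches $n$, in line with $\operatorname{AR}(I_{n})=(1-o_{|\mathbb{F}|}(1))n$.

    import itertools, cmath, math

    p, n = 3, 3                                              # small example: F_3 and the 3x3x3 identity tensor
    chi = lambda t: cmath.exp(2j * cmath.pi * (t % p) / p)   # a nontrivial additive character of F_p

    def T(x, y, z):                                          # I_n as a trilinear form: sum_i x_i y_i z_i
        return sum(a * b * c for a, b, c in zip(x, y, z)) % p

    total = sum(chi(T(x, y, z))
                for x in itertools.product(range(p), repeat=n)
                for y in itertools.product(range(p), repeat=n)
                for z in itertools.product(range(p), repeat=n))
    bias = (total / p ** (3 * n)).real                       # for I_n this equals ((2p-1)/p^2)^n
    print(bias, -math.log(bias, p))                          # the second value is AR(I_n) = -log_p bias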

Understanding the structure of $d$-tensors, or $d$-dimensional matrices, that have low analytic rank is important in many applications of the structure-vs-randomness dichotomy, in additive combinatorics, coding theory and more (see, e.g., [4, 20]). A recent breakthrough, obtained independently by Milićević [33] and by Janzer [25], showed that the partition rank of a $d$-tensor, which is a generalization of slice rank to $d$-tensors, is bounded from above by roughly $\operatorname{AR}(T)^{2^{2^{{\sf poly}(d)}}}$. For fixed $d$ this is a polynomial bound, which proves a conjecture of Kazhdan and Ziegler [28]. Lovett [31], as others have, asks whether in fact a linear upper bound holds. For 3-tensors, the best known bound until this work was $\operatorname{SR}(T)\leq O(\operatorname{AR}(T)^{4})$ by Haramaty and Shpilka [22]. They write: “It is an interesting open question to decide whether we can do only with the $\sum_{j=1}^{O(\log_{|\mathbb{F}|}1/\delta)}\ell_{i}\cdot q_{i}$ part”, which refers to a linear upper bound $\operatorname{SR}(T)\leq O(\operatorname{AR}(T))$. Our main result is as follows.

Theorem 1 (Main result).

For any 3-tensor $T$ over a field $\mathbb{F}$,

\operatorname{SR}(T)\leq 3\operatorname{GR}(T)\leq 8.13\operatorname{AR}(T)

where the first inequality holds over any perfect field (commonly considered fields are perfect, including any field of characteristic zero, any algebraically closed field, and any finite field), and the second for any finite field $\mathbb{F}\neq\mathbb{F}_{2}$.

We note that the reverse inequalities are easy: $\operatorname{GR}(T)\leq\operatorname{SR}(T)$ (see Theorem 4.1 in [30]) and $\operatorname{AR}(T)\leq\operatorname{SR}(T)$ (see Lemma 2.2 of [27] or [31]). Thus, as mentioned above, an immediate—and perhaps surprising—corollary of Theorem 1 is that the combinatorial notion $\operatorname{SR}(T)$, the algebro-geometric notion $\operatorname{GR}(T)$, and the analytic notion $\operatorname{AR}(T)$ are all, up to a constant, equivalent notions of rank. In particular, if one wants to estimate the slice rank of a 3-tensor, as Tao did in a solution of the cap-set problem [40], then it is necessary and sufficient to instead estimate the bias of the tensor.

1.2 Complexity vs. bias

The importance of bilinear maps in theoretical computer science cannot be overstated. One example, in the area of algebraic algorithms, is matrix multiplication. Note that the operation of multiplying two matrices $X,Y\in\mathbb{F}^{m\times m}$ is a bilinear map ${\sf{MM}}_{n}\colon\mathbb{F}^{n}\times\mathbb{F}^{n}\to\mathbb{F}^{n}$ with $n=m^{2}$, as every entry of $XY$ is a bilinear form in the entries of $X$ and $Y$. It has been a persistent challenge to upper bound the arithmetic complexity of matrix multiplication, that is, the minimum number of $+,-,\cdot,\div$ operations over $\mathbb{F}$ required to express ${\sf{MM}}_{n}$ in terms of its variables. Current research puts the complexity of ${\sf{MM}}_{n}$ below $O(n^{1.2})$ (the state of the art is $O(n^{1.18643})$ due to Alman and Williams [2]), with the ultimate goal of getting all the way down to $n^{1+o(1)}$. For another example of the challenge of bilinear maps, this time in the area of circuit complexity, we mention that explicitly finding even a single bilinear map $f\colon\mathbb{F}^{n}\times\mathbb{F}^{n}\to\mathbb{F}^{n}$ (or, equivalently, a single degree-3 polynomial $\sum_{k=1}^{N}f_{k}(\mathbf{x},\mathbf{y})z_{k}$) with provably superlinear arithmetic complexity, say $\Omega(n^{1.001})$, would imply the first such lower bound in circuit complexity. This should be compared with the fact that almost every bilinear map $f\colon\mathbb{F}^{n}\times\mathbb{F}^{n}\to\mathbb{F}^{n}$ has arithmetic complexity $\Theta(n^{2})$. Finally, in the area of identity testing, it was shown by Valiant [41] that identity testing of formulas reduces to deciding whether a given bilinear map $f\colon\mathbb{F}^{n}\times\mathbb{F}^{n}\to\mathbb{F}^{m}$ has full commutative rank, meaning a linear combination of its components $f_{i}$ has full rank $n$. Whether this can be decided efficiently remains an open question, despite being raised by Edmonds [14] in the early days of computer science, and it has close ties with a variety of other topics, from perfect matchings in bipartite graphs to matrix scaling (see [18]).
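To make the first example concrete, here is a minimal Python sketch (ours; the choice $m=2$ is for illustration only) that builds the $4\times 4\times 4$ tensor of $2\times 2$ matrix multiplication and checks that contracting it with the flattened entries of $X$ and $Y$ reproduces $XY$, i.e., that every entry of $XY$ is a bilinear form in the entries of $X$ and $Y$.

    import numpy as np

    m = 2
    n = m * m                                  # MM_n : F^n x F^n -> F^n with n = m^2
    T = np.zeros((n, n, n), dtype=int)         # T[i, j, k] = coefficient of x_i y_j in the k-th output entry
    for a in range(m):
        for b in range(m):
            for c in range(m):
                T[a * m + b, b * m + c, a * m + c] = 1   # (XY)[a, c] += X[a, b] * Y[b, c]

    X = np.random.randint(0, 7, (m, m))
    Y = np.random.randint(0, 7, (m, m))
    out = np.einsum('i,j,ijk->k', X.reshape(-1), Y.reshape(-1), T)
    assert (out.reshape(m, m) == X @ Y).all()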

Given the importance of bilinear maps, we propose studying other foundational questions of theoretical computer science via the lens of bilinear maps. Consider Mahaney’s Theorem [32], a classical result in computational complexity. It states that, assuming ${\sf{P}}\neq{\sf{NP}}$, no ${\sf{NP}}$-hard language is sparse. Phrased differently, if a boolean function $f\colon\{0,1\}^{*}\to\{0,1\}$ is “extremely” biased in the sense that $|f^{-1}(1)\cap\{0,1\}^{n}|\leq{\sf poly}(n)$, then it is not ${\sf{NP}}$-hard. Multiple other classical results in the same vein have been proved (see [17, 24, 34, 8, 9, 21]), giving implications of such extreme bias for various complexity classes. This raises the following fundamental question.

Question.

For a given class of functions, equipped with notions of complexity and bias, what is the best complexity upper bound in terms of bias? (Contrapositively, this quantifies the phenomenon of high complexity functions exhibiting little bias.)

We make progress towards the above Question for the class of bilinear maps $f$; our notion of complexity is multiplicative complexity $\operatorname{C}^{*}(f)$, which is the number of (non-scalar) multiplications needed to compute $f$ by an arithmetic circuit; our notion of bias is the min-entropy $\operatorname{H_{\infty}}(f)$ of the output distribution of $f$. See Section 5 for a more formal discussion and the proof.

Proposition 1.1.

For any bilinear map $f\colon\mathbb{F}^{n}\times\mathbb{F}^{n}\to\mathbb{F}^{n}$ over any finite field $\mathbb{F}\neq\mathbb{F}_{2}$,

\operatorname{C}^{*}(f)=O\Big(\frac{\operatorname{H_{\infty}}(f)}{\log_{2}|\mathbb{F}|}\,n\Big).

Another closely related classical result says that, assuming ${\sf{P}}\neq{\sf{NP}}$, it is impossible to efficiently solve ${\sf{SAT}}$ on all but at most polynomially many inputs (this is sometimes phrased as saying that ${\sf{SAT}}$ is not “${\sf{P}}$-close”). Again, this raises a fundamental question: For a given class of functions, what is the best possible approximation of a hard function? Put differently, what is the best possible worst-case to average-case reduction? For bilinear maps, we give an optimal answer to this question; again, see Section 5 for a more formal discussion and the proof. We say that maps $f,g$ are $\delta$-close if $\Pr_{x}[f(x)=g(x)]=\delta$ (we use $\Pr_{x}$ to denote probability under a uniform choice of $x$), and denote by $\operatorname{SR}(f)$ the slice rank of the 3-tensor corresponding to a bilinear map $f$.

Proposition 1.2.

Let $\mathbb{F}\neq\mathbb{F}_{2}$ be a finite field. Any two bilinear maps $f,g\colon\mathbb{F}^{n}\times\mathbb{F}^{n}\to\mathbb{F}^{m}$ that are $\delta$-close satisfy

|\operatorname{SR}(f)-\operatorname{SR}(g)|\leq O(\log_{|\mathbb{F}|}(1/\delta)).

Moreover, this bound is best possible up to the implicit absolute constant.

We note that Kaufman and Lovett [26] prove such a reduction for degree-$d$ polynomials over general finite fields, improving a previous result by Green and Tao [20]. However, their reduction is qualitative in nature and the implied bounds are far from optimal (see the next subsection for more discussion on previous results and techniques).

1.3 Proof overview

Our proof for the bound $\operatorname{SR}(T)=O(\operatorname{AR}(T))$ in Theorem 1 (and ultimately for complexity-vs-bias trade-offs for bilinear maps in Section 5) goes through an algebraically closed field—despite the statement ostensibly being about polynomials over finite fields. We use the concepts of dimension and tangent spaces from algebraic geometry to obtain our slice rank decomposition, which ends up yielding the bound $\operatorname{SR}(T)=O(\operatorname{GR}(T))$ (Theorem 3.1). To finish the proof of Theorem 1, we prove a new generalization of the Schwartz-Zippel lemma appropriate for our setting, which yields $\operatorname{GR}(T)=O(\operatorname{AR}(T))$ (Proposition 4.1).

To obtain the slice rank decomposition mentioned above, we first prove a result about linear spaces of matrices in which low-rank matrices have high dimension: we show that in any such space, one can always find a somewhat large decomposable subspace (Proposition 3.4, Item (2)); following Atkinson and Lloyd [3] (see also [16]), a space of matrices is decomposable if, roughly speaking, there is a basis where all the matrices have the same block of zeros. This result is proved by looking at a tangent space to a determinantal variety at a non-singular point, which turns out to be a decomposable matrix space in the sense above (Proposition 2.4). We further show that such a matrix space can be thought of as a 3-tensor of low slice rank (Lemma 2.3). Since the above is proved by applying algebro-geometric tools on varieties, which naturally live over the algebraic closure of our finite field, the resulting slice rank decomposition has coefficients in the closure; however, using a result of Derksen [11], one can convert a small slice rank decomposition over the closure into a decomposition over the base field (Proposition 3.2). Finally, we use a result of [30] to combine the slice rank information we obtained above into a bound on the geometric rank (Fact 3.5).

We note that our arguments diverge from proofs used in previous works. In particular, we do not use results from additive combinatorics at all, nor do we use any “regularity lemma” for polynomials or notions of quasi-randomness. Instead, our arguments use a combination of algebraic and geometric ideas, which perhaps helps explain why we are able to obtain linear upper bounds.

Paper organization.

We begin Section 2 by giving a brief review of a few basic concepts from algebraic geometry, then determine the behavior of certain tangent spaces, and end by proving a slice rank upper bound related to these tangent spaces. The first and second inequalities of Theorem 1 are proved in Section 3 and in Section 4, respectively. In Section 5 we prove Proposition 1.1 and Proposition 1.2. We end with some discussion and open questions in Section 6.

2 Tangent spaces and slice rank

2.1 Algebraic geometry essentials

We will need only a very small number of basic concepts from algebraic geometry, which we quickly review next. All the material here can be found in standard textbooks (e.g., [23, 36]). A variety $\mathbf{V}$ is the set of solutions, in an algebraically closed field, of some finite set of polynomials. More formally, for a field $\mathbb{F}$ (we henceforth denote by $\overline{\mathbb{F}}$ the algebraic closure of the field $\mathbb{F}$), the variety $\mathbf{V}\subseteq\overline{\mathbb{F}}^{n}$ cut out by the polynomials $f_{1},\ldots,f_{m}\in\mathbb{F}[x_{1},\ldots,x_{n}]$ is

\mathbf{V}=\mathbb{V}(f_{1},\ldots,f_{m}):=\{\mathbf{x}\in\overline{\mathbb{F}}^{n}\mid f_{1}(\mathbf{x})=\cdots=f_{m}(\mathbf{x})=0\}.

We say that $\mathbf{V}\subseteq\overline{\mathbb{F}}^{n}$ is defined over $\mathbb{F}$ as it can be cut out by polynomials whose coefficients lie in $\mathbb{F}$. The ideal of $\mathbf{V}$ is $\operatorname{I}(\mathbf{V})=\{f\in\mathbb{F}[\mathbf{x}]\mid\forall p\in\mathbf{V}\colon f(p)=0\}$. Any variety $\mathbf{V}$ can be uniquely written as the union of irreducible varieties, where a variety is said to be irreducible if it cannot be written as the union of strictly contained varieties. The dimension of a variety $\mathbf{V}$, denoted $\dim\mathbf{V}$, is the maximal length $d$ of a chain of irreducible varieties $\emptyset\neq\mathbf{V}_{1}\subsetneq\cdots\subsetneq\mathbf{V}_{d}\subsetneq\mathbf{V}$. The codimension of $\mathbf{V}\subseteq\overline{\mathbb{F}}^{n}$ is simply $\operatorname{codim}\mathbf{V}=n-\dim\mathbf{V}$.

2.2 Notation

In the rest of the paper we will often find it convenient to identify, with a slight abuse of notation, a bilinear map $f\colon\mathbb{F}^{n_{1}}\times\mathbb{F}^{n_{2}}\to\mathbb{F}^{m}$ (or tensor) with a linear subspace of matrices, or matrix space, $\mathbf{L}\preceq\mathbb{F}^{n_{1}\times n_{2}}$. If $f=(f_{1},\ldots,f_{m})$, we will identify $f$ with the linear subspace $\mathbf{L}$ spanned by the $m$ matrices corresponding to the bilinear forms $f_{1},\ldots,f_{m}$. Note that this identification is not a correspondence, as it involves choosing a basis for $\mathbf{L}$. Importantly, however, since the notions of tensor rank that we study are invariant under the action of the general linear group $\mathrm{GL}_{n}$ on each of the axes, the choice of basis we make is immaterial in the definition of rank, meaning that $\operatorname{GR}(\mathbf{L})$, $\operatorname{SR}(\mathbf{L})$, $\operatorname{AR}(\mathbf{L})$ are nevertheless well defined.

For the reader’s convenience, we summarize below the different perspectives of tensor/bilinear map/matrix space that we use, and how they relate to each other:

  • A tensor $T=(a_{i,j,k})\in\mathbb{F}^{n_{1}\times n_{2}\times n_{3}}$, or a multilinear form $T(\mathbf{x},\mathbf{y},\mathbf{z})=\sum_{i,j,k}a_{i,j,k}x_{i}y_{j}z_{k}$.

  • A bilinear map $f=(f_{1},\ldots,f_{n_{3}})\colon\mathbb{F}^{n_{1}}\times\mathbb{F}^{n_{2}}\to\mathbb{F}^{n_{3}}$ with $f_{k}(\mathbf{x},\mathbf{y})=\sum_{i,j}a_{i,j,k}x_{i}y_{j}$.

  • A matrix space $\mathbf{L}\preceq\mathbb{F}^{n_{1}\times n_{2}}$ spanned by $\{A_{1},\ldots,A_{n_{3}}\}$ where $A_{k}=(a_{i,j,k})_{i,j}$.
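For instance (a small worked example of ours), the $2\times 2\times 2$ identity tensor $I_{2}$ reads, under these three perspectives,

T(\mathbf{x},\mathbf{y},\mathbf{z})=x_{1}y_{1}z_{1}+x_{2}y_{2}z_{2},\qquad f(\mathbf{x},\mathbf{y})=(x_{1}y_{1},\,x_{2}y_{2}),\qquad\mathbf{L}=\operatorname{span}\left\{\begin{pmatrix}1&0\\0&0\end{pmatrix},\begin{pmatrix}0&0\\0&1\end{pmatrix}\right\},

with $\operatorname{SR}(I_{2})=\operatorname{GR}(I_{2})=2$, as noted in the introduction.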

2.3 Tangent spaces of a variety

For a variety $\mathbf{V}\subseteq\mathbb{K}^{n}$, the tangent space $\mathbf{T}_{p}\mathbf{V}$ to $\mathbf{V}$ at the point $p\in\mathbf{V}$ is the linear subspace

\mathbf{T}_{p}\mathbf{V}=\Big\{\mathbf{v}\in\mathbb{K}^{n}\,\Big|\,\forall g\in\operatorname{I}(\mathbf{V})\colon\frac{\partial g}{\partial\mathbf{v}}(p)=0\Big\}.

Equivalently, for any choice of a generating set $\{g_{1},\ldots,g_{s}\}\subseteq\mathbb{K}[x_{1},\ldots,x_{n}]$ for the ideal $\operatorname{I}(\mathbf{V})$ (which is finitely generated by Hilbert’s basis theorem), the tangent space at $p\in\mathbf{V}$ is $\mathbf{T}_{p}\mathbf{V}=\ker\operatorname{\mathbf{J}}_{p}$, where $\operatorname{\mathbf{J}}_{p}$ is the Jacobian matrix

\begin{pmatrix}\frac{\partial g_{1}}{\partial x_{1}}(p)&\cdots&\frac{\partial g_{1}}{\partial x_{n}}(p)\\ \vdots&\ddots&\vdots\\ \frac{\partial g_{s}}{\partial x_{1}}(p)&\cdots&\frac{\partial g_{s}}{\partial x_{n}}(p)\end{pmatrix}_{s\times n}.
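As a small sanity check of this definition (a sketch of ours; the variety and base point are chosen only for illustration), the tangent space to the determinantal variety of singular $2\times 2$ matrices at the rank-one point $\operatorname{diag}(1,0)$ can be computed as the kernel of the Jacobian of its single defining minor:

    import sympy as sp

    x11, x12, x21, x22 = sp.symbols('x11 x12 x21 x22')
    g = x11 * x22 - x12 * x21                    # the 2x2 minor cutting out the rank-<=1 matrices in K^{2x2}
    variables = [x11, x12, x21, x22]
    J = sp.Matrix([[sp.diff(g, v) for v in variables]])      # 1 x 4 Jacobian of the generator
    A = {x11: 1, x12: 0, x21: 0, x22: 0}                     # the rank-one point A = diag(1, 0)
    print(J.subs(A), J.subs(A).nullspace())                  # Jacobian [0, 0, 0, 1]; 3-dimensional kernel

The kernel consists of all matrices with vanishing $(2,2)$ entry, which agrees with the description $\{CA+AC^{\prime}\}$ established in Proposition 2.4 below.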

We will need the following basic fact about tangent spaces (for a proof see, e.g., Theorem 2.3 in [36]).

Fact 2.1.

For any irreducible variety $\mathbf{V}$ and any $p\in\mathbf{V}$ we have $\dim\mathbf{T}_{p}\mathbf{V}\geq\dim\mathbf{V}$.

We will also need the following easy observation about the interplay between tangents and intersections.

Proposition 2.2.

For any two varieties $\mathbf{V}$ and $\mathbf{W}$, and any $p\in\mathbf{V}\cap\mathbf{W}$,

\mathbf{T}_{p}(\mathbf{V}\cap\mathbf{W})\subseteq\mathbf{T}_{p}\mathbf{V}\,\cap\,\mathbf{T}_{p}\mathbf{W}.

In particular, if $\mathbf{V}\subseteq\mathbf{W}$ then $\mathbf{T}_{p}\mathbf{V}\subseteq\mathbf{T}_{p}\mathbf{W}$.

Proof.

We have $\operatorname{I}(\mathbf{V})\subseteq\operatorname{I}(\mathbf{V}\cap\mathbf{W})$ and $\operatorname{I}(\mathbf{W})\subseteq\operatorname{I}(\mathbf{V}\cap\mathbf{W})$. Therefore, by the definition of a tangent space, for any $p\in\mathbf{V}\cap\mathbf{W}$ we have $\mathbf{T}_{p}(\mathbf{V}\cap\mathbf{W})\subseteq\mathbf{T}_{p}\mathbf{V}$ and $\mathbf{T}_{p}(\mathbf{V}\cap\mathbf{W})\subseteq\mathbf{T}_{p}\mathbf{W}$, and thus also $\mathbf{T}_{p}(\mathbf{V}\cap\mathbf{W})\subseteq\mathbf{T}_{p}\mathbf{V}\cap\mathbf{T}_{p}\mathbf{W}$, as claimed. ∎

2.4 Slice rank of tangent spaces of determinantal varieties

We henceforth denote by $\mathbf{M}_{r}=\mathbf{M}_{r}(\mathbb{K}^{m\times n})\subseteq\mathbb{K}^{m\times n}$ the variety of matrices in $\mathbb{K}^{m\times n}$ of rank at most $r$. Note that $\mathbf{M}_{r}$ is indeed a variety, as it is cut out by a finite set of polynomials: all $(r+1)\times(r+1)$ minors. It is therefore referred to in the literature as a determinantal variety.

The following crucial lemma shows that certain tangent spaces of the variety $\mathbf{M}_{r}=\mathbf{M}_{r}(\mathbb{K}^{m\times n})$, which are matrix spaces, have a small slice rank (recall Subsection 2.2 for the terminology).

Lemma 2.3 (Slice rank of tangents).

The tangent space to $\mathbf{M}_{r}=\mathbf{M}_{r}(\mathbb{K}^{m\times n})$, for any algebraically closed field $\mathbb{K}$, at any matrix $A\in\mathbf{M}_{r}$ with $\operatorname{rank}(A)=r$ satisfies

\operatorname{SR}(\mathbf{T}_{A}\mathbf{M}_{r})\leq 2r.

To prove Lemma 2.3 we will need the following result, which explicitly describes the tangent space to $\mathbf{M}_{r}$ at any matrix of rank exactly $r$. It can be deduced from Example 14.16 in [23]. We prove it below for completeness.

Proposition 2.4 (Tangents of determinantal varieties).

The tangent space to $\mathbf{M}_{r}=\mathbf{M}_{r}(\mathbb{K}^{m\times n})$, for any algebraically closed field $\mathbb{K}$, at any matrix $A\in\mathbf{M}_{r}$ with $\operatorname{rank}(A)=r$ is

\mathbf{T}_{A}\mathbf{M}_{r}=\{CA+AC^{\prime}\mid C\in\mathbb{K}^{m\times m},\,C^{\prime}\in\mathbb{K}^{n\times n}\}.
Proof.

It will be convenient to work with the following equivalent definition of a tangent space of a variety $\mathbf{V}$ at a point $p\in\mathbf{V}$:

\mathbf{T}_{p}\mathbf{V}=\{\mathbf{v}\in\mathbb{K}^{n}\mid\forall g\in\operatorname{I}(\mathbf{V})\colon g(p+t\mathbf{v})-g(p)\equiv 0\pmod{t^{2}}\}.

To see this equivalence, observe that, using the Taylor expansion of the polynomial $g$ at the point $p$, we have $g(p+t\mathbf{v})-g(p)\equiv t\frac{\partial g}{\partial\mathbf{v}}(p)\pmod{t^{2}}$.

Now, we will use the fact that the $(r+1)\times(r+1)$ minors not only cut out the variety $\mathbf{M}_{r}=\mathbf{M}_{r}(\mathbb{K}^{m\times n})$, but in fact generate the ideal $\operatorname{I}(\mathbf{M}_{r})$. Indeed, this follows from the fact that the ideal $I$ they generate is prime ([7], Theorem 2.10) and so $\sqrt{I}=I$, together with Hilbert’s Nullstellensatz which gives $\operatorname{I}(\mathbf{M}_{r})=\sqrt{I}=I$. Let $g_{I,J}$ denote the minor of the submatrix whose set of rows and columns are given by $I\subseteq[m]$ and $J\subseteq[n]$, respectively. Thus, $\operatorname{I}(\mathbf{M}_{r})=\langle g_{I,J}\mid|I|=|J|=r+1\rangle$.

Since $\operatorname{rank}(A)=r$, there are invertible matrices $P\in\mathbb{K}^{m\times m}$ and $Q\in\mathbb{K}^{n\times n}$ such that $A=PI_{r}Q$, where

I_{r}=\begin{pmatrix}1&0&\cdots&0&0&\cdots&\cdots&\cdots&0\\ 0&1&\cdots&0&0&\cdots&\cdots&\cdots&0\\ \vdots&\vdots&\ddots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots\\ 0&0&\cdots&1&0&\cdots&\cdots&\cdots&0\\ 0&0&\cdots&0&0&\cdots&\cdots&\cdots&0\\ \vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots&\vdots\\ 0&0&\cdots&0&0&\cdots&\cdots&\cdots&0\end{pmatrix}_{m\times n}

has the $r\times r$ identity matrix as the upper-left submatrix, that is, the submatrix whose set of rows $I$ and set of columns $J$ are $I=[r]$ and $J=[r]$.

Let $X\in\mathbb{K}^{m\times n}$. Put $Y=P^{-1}XQ^{-1}\in\mathbb{K}^{m\times n}$. For every $g=g_{I,J}$ with $|I|=|J|=r+1$ we have $g(A)=0$ and

g(A+tX)=g(P(I_{r}+tY)Q)=g(P)g(I_{r}+tY)g(Q).

It follows that

g(A+tX)-g(A)\equiv 0\pmod{t^{2}}\quad\text{ if and only if }\quad g(I_{r}+tY)\equiv 0\pmod{t^{2}}.

Write $Y=(y_{i,j})_{i,j}$. Observe that if $I=[r]\cup\{i\}$ and $J=[r]\cup\{j\}$ for some $i>r$ and $j>r$ then $g_{I,J}(I_{r}+tY)\equiv ty_{i,j}\pmod{t^{2}}$, and otherwise $g_{I,J}(I_{r}+tY)\equiv 0\pmod{t^{2}}$. Thus, $Y$ satisfies $g(I_{r}+tY)\equiv 0\pmod{t^{2}}$ for every $g\in\operatorname{I}(\mathbf{M}_{r})$ if and only if $y_{i,j}=0$ for every $i>r$ and $j>r$, or equivalently, $Y=Y_{1}I_{r}+I_{r}Y_{2}$ for some $Y_{1}\in\mathbb{K}^{m\times m}$ and $Y_{2}\in\mathbb{K}^{n\times n}$. We deduce

\mathbf{T}_{A}\mathbf{M}_{r}=\{X\in\mathbb{K}^{m\times n}\mid\forall g\in\operatorname{I}(\mathbf{M}_{r})\colon g(A+tX)-g(A)\equiv 0\pmod{t^{2}}\}
=\{PYQ\mid\exists Y_{1}\in\mathbb{K}^{m\times m},Y_{2}\in\mathbb{K}^{n\times n}\colon Y=Y_{1}I_{r}+I_{r}Y_{2}\}
=\{(PY_{1}P^{-1})A+A(Q^{-1}Y_{2}Q)\mid Y_{1}\in\mathbb{K}^{m\times m},Y_{2}\in\mathbb{K}^{n\times n}\}
=\{CA+AC^{\prime}\mid C\in\mathbb{K}^{m\times m},C^{\prime}\in\mathbb{K}^{n\times n}\},

completing the proof. ∎

We note that any matrix $A$ with $\operatorname{rank}(A)=r$ is a nonsingular point of $\mathbf{M}_{r}$ (i.e., $\dim\mathbf{T}_{A}\mathbf{M}_{r}=\dim\mathbf{M}_{r}$), whereas any matrix $B$ with $\operatorname{rank}(B)<r$ is a singular point, and in fact, $\mathbf{T}_{B}\mathbf{M}_{r}(\mathbb{K}^{m\times n})=\mathbb{K}^{m\times n}$.
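As a quick numerical illustration (ours, with arbitrary small parameters, and not part of the argument), one can check that at a random rank-$r$ point the span $\{CA+AC^{\prime}\}$ has dimension $r(m+n-r)$, the well-known dimension of $\mathbf{M}_{r}$, matching the nonsingularity statement above.

    import numpy as np

    m, n, r = 4, 5, 2
    A = np.random.randn(m, r) @ np.random.randn(r, n)   # a generic rank-r point of M_r

    # Columns of the linear map (C, C') -> C A + A C', one per standard basis matrix.
    cols = []
    for i in range(m):
        for j in range(m):
            C = np.zeros((m, m)); C[i, j] = 1
            cols.append((C @ A).reshape(-1))
    for i in range(n):
        for j in range(n):
            Cp = np.zeros((n, n)); Cp[i, j] = 1
            cols.append((A @ Cp).reshape(-1))
    dim_tangent = np.linalg.matrix_rank(np.stack(cols, axis=1))
    print(dim_tangent, r * (m + n - r))                  # both equal 14 here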

Proof of Lemma 2.3.

We identify $A=(a_{i,j})\in\mathbf{M}_{r}(\mathbb{K}^{m\times n})$ with the bilinear form given by $A(\mathbf{x},\mathbf{y})=\mathbf{x}^{T}A\mathbf{y}=\sum_{i,j}a_{i,j}x_{i}y_{j}$. Since $A\in\mathbb{K}^{m\times n}$ and $\operatorname{rank}(A)\leq r$, there are linear forms $f_{1}(\mathbf{x}),\ldots,f_{r}(\mathbf{x})\in\mathbb{K}[\mathbf{x}]$ and linear forms $g_{1}(\mathbf{y}),\ldots,g_{r}(\mathbf{y})\in\mathbb{K}[\mathbf{y}]$ such that

A(\mathbf{x},\mathbf{y})=\sum_{i=1}^{r}f_{i}(\mathbf{x})g_{i}(\mathbf{y}).

It follows that any matrix of the form $CA+AC^{\prime}$, with $C\in\mathbb{K}^{m\times m}$ and $C^{\prime}\in\mathbb{K}^{n\times n}$, has a corresponding bilinear form

\mathbf{x}^{T}(CA+AC^{\prime})\mathbf{y}=(C^{T}\mathbf{x})^{T}A\mathbf{y}+\mathbf{x}^{T}A(C^{\prime}\mathbf{y})=\sum_{i=1}^{r}f_{i}(C^{T}\mathbf{x})g_{i}(\mathbf{y})+\sum_{i=1}^{r}f_{i}(\mathbf{x})g_{i}(C^{\prime}\mathbf{y}).\qquad(1)

Now, let $B_{1},\ldots,B_{d}$ be any basis of $\mathbf{T}_{A}\mathbf{M}_{r}$. Then $\mathbf{T}_{A}\mathbf{M}_{r}$ corresponds to the trilinear form $T=\sum_{k=1}^{d}z_{k}B_{k}(\mathbf{x},\mathbf{y})$ in the variables $\mathbf{x},\mathbf{y},\mathbf{z}$. By Proposition 2.4, for each $k\in[d]$ we can write $B_{k}=C_{k}A+AC_{k}^{\prime}$ for some $C_{k}\in\mathbb{K}^{m\times m}$ and $C^{\prime}_{k}\in\mathbb{K}^{n\times n}$. Using the decomposition in (1), we obtain the trilinear decomposition

T=\sum_{k=1}^{d}z_{k}\cdot B_{k}(\mathbf{x},\mathbf{y})=\sum_{k=1}^{d}z_{k}\Big(\sum_{i=1}^{r}f_{i}(C_{k}^{T}\mathbf{x})g_{i}(\mathbf{y})+\sum_{i=1}^{r}f_{i}(\mathbf{x})g_{i}(C_{k}^{\prime}\mathbf{y})\Big)=\sum_{i=1}^{r}h_{i}(\mathbf{x},\mathbf{z})g_{i}(\mathbf{y})+\sum_{i=1}^{r}f_{i}(\mathbf{x})h^{\prime}_{i}(\mathbf{y},\mathbf{z})

where

h_{i}(\mathbf{x},\mathbf{z}):=\sum_{k=1}^{d}z_{k}f_{i}(C_{k}^{T}\mathbf{x}),\quad h^{\prime}_{i}(\mathbf{y},\mathbf{z}):=\sum_{k=1}^{d}z_{k}g_{i}(C_{k}^{\prime}\mathbf{y}).

Note that each $h_{i}\in\mathbb{K}[\mathbf{x},\mathbf{z}]$ and $h^{\prime}_{i}\in\mathbb{K}[\mathbf{y},\mathbf{z}]$ are bilinear forms over $\mathbb{K}$, and recall that each $f_{i}\in\mathbb{K}[\mathbf{x}]$ and $g_{i}\in\mathbb{K}[\mathbf{y}]$ are linear forms over $\mathbb{K}$. We deduce that each of the $2r$ summands in the decomposition of $T$ above is a trilinear form of slice rank at most $1$ over $\mathbb{K}$. This completes the proof. ∎

3 Slice rank vs. geometric rank

In this section we prove the core of our main result, linearly bounding the slice rank of a tensor from above by its geometric rank.

Theorem 3.1.

For any 3-tensor $T$ over any perfect field $\mathbb{F}$,

\operatorname{SR}(T)\leq 3\operatorname{GR}(T).

We in fact get the slightly better constant 2 instead of 3 in Theorem 3.1, at the price of allowing the slice rank decomposition to use coefficients from an algebraic extension.

Let $\overline{\operatorname{SR}}(T)$ denote, for a tensor $T$, the slice rank over the algebraic closure of the field of coefficients of $T$. In other words, if $T$ is a tensor over $\mathbb{F}$ then $\overline{\operatorname{SR}}(T)$ allows coefficients from the algebraic closure $\overline{\mathbb{F}}$, rather than just from $\mathbb{F}$, in the decomposition of $T$ into slice-rank one summands. Clearly, $\overline{\operatorname{SR}}(T)\leq\operatorname{SR}(T)$. We note that for matrices, $\operatorname{rank}$ and $\overline{\operatorname{rank}}$ are equal. For tensors we have the following inequality, essentially due to Derksen [11] (we include a proof sketch at the end of this section).

Proposition 3.2 ([11]).

For any 3-tensor $T$ over any perfect field, $\frac{2}{3}\operatorname{SR}(T)\leq\overline{\operatorname{SR}}(T)$.

We will also need the following properties of slice rank, which are easily deduced from definition. For convenience of application, we state them for matrix spaces.

Proposition 3.3.

The slice rank satisfies the following properties, where $\mathbf{L}$ and $\mathbf{L^{\prime}}$ are linear subspaces of matrices:

  1. (Dimension bound) $\operatorname{SR}(\mathbf{L})\leq\dim\mathbf{L}$,

  2. (Monotonicity) $\operatorname{SR}(\mathbf{L^{\prime}})\leq\operatorname{SR}(\mathbf{L})$ if $\mathbf{L^{\prime}}\preceq\mathbf{L}$,

  3. (Sub-additivity) $\operatorname{SR}(\mathbf{L}+\mathbf{L^{\prime}})\leq\operatorname{SR}(\mathbf{L})+\operatorname{SR}(\mathbf{L^{\prime}})$.

3.1 Linear sections of determinantal varieties

For a matrix space $\mathbf{L}\preceq\mathbb{K}^{m\times n}$ we define the variety $\mathbf{L}_{r}=\mathbf{L}\cap\mathbf{M}_{r}$ (here $\mathbf{M}_{r}=\mathbf{M}_{r}(\mathbb{K}^{m\times n})$) of all matrices in $\mathbf{L}$ of rank at most $r$. We next bound the slice rank of a matrix space using these linear sections of a determinantal variety. We denote by $\operatorname{codim}_{L}\mathbf{X}$ the codimension of a variety $\mathbf{X}\subseteq L$ inside a linear space $L$; that is, $\operatorname{codim}_{L}\mathbf{X}=\dim L-\dim\mathbf{X}$.

Proposition 3.4.

Let $\mathbf{L}\preceq\mathbb{K}^{m\times n}$ be a matrix space over any algebraically closed field $\mathbb{K}$. For any $r\in\mathbb{N}$,

\operatorname{SR}(\mathbf{L})\leq 2r+\operatorname{codim}_{\mathbf{L}}\mathbf{L}_{r}.
Proof.

We proceed by induction on $r$. Note that the base case $r=0$, which reads $\operatorname{SR}(\mathbf{L})\leq 0+\operatorname{codim}_{\mathbf{L}}\{\mathbf{0}\}=\dim\mathbf{L}$, follows from Proposition 3.3. We thus move to the inductive step.

Let $\mathbf{V}$ be an irreducible component of $\mathbf{L}_{r}$ with $\dim\mathbf{V}=\dim\mathbf{L}_{r}$, and let $A\in\mathbf{V}\setminus\mathbf{M}_{r-1}$. We may indeed assume $\mathbf{V}\setminus\mathbf{M}_{r-1}\neq\emptyset$, as otherwise $\mathbf{V}\subseteq\mathbf{L}_{r-1}$ and thus $\dim\mathbf{L}_{r}=\dim\mathbf{V}\leq\dim\mathbf{L}_{r-1}$, and we are done via the induction hypothesis by taking codimensions. Let $\mathbf{P}\preceq\mathbf{L}$ be the linear subspace $\mathbf{P}=\mathbf{L}\,\cap\,\mathbf{T}_{A}\mathbf{M}_{r}$. We will prove:

  1. $\operatorname{SR}(\mathbf{P})\leq 2r$,

  2. $\operatorname{codim}_{\mathbf{L}}\mathbf{P}\leq\operatorname{codim}_{\mathbf{L}}\mathbf{L}_{r}$.

To see why this would complete the inductive step, let $\mathbf{P}^{\perp}$ be a complement subspace of $\mathbf{P}$ in $\mathbf{L}$, and note that

\operatorname{SR}(\mathbf{L})\leq\operatorname{SR}(\mathbf{P})+\operatorname{SR}(\mathbf{P}^{\perp})\leq\operatorname{SR}(\mathbf{P})+\operatorname{codim}_{\mathbf{L}}\mathbf{P}\leq 2r+\operatorname{codim}_{\mathbf{L}}\mathbf{L}_{r}

where the first and second inequalities use Proposition 3.3, and the third inequality uses Items (1) and (2).

For the proof of Item (1), we have

\operatorname{SR}(\mathbf{P})=\operatorname{SR}(\mathbf{L}\cap\mathbf{T}_{A}\mathbf{M}_{r})\leq\operatorname{SR}(\mathbf{T}_{A}\mathbf{M}_{r})\leq 2r

where the first inequality uses Proposition 3.3, and the second inequality uses Lemma 2.3 as $\operatorname{rank}(A)=r$. For the proof of Item (2), we have

\dim\mathbf{L}_{r}=\dim\mathbf{V}\leq\dim\mathbf{T}_{A}\mathbf{V}\leq\dim\mathbf{T}_{A}\mathbf{L}_{r}\leq\dim(\mathbf{T}_{A}\mathbf{L}\,\cap\,\mathbf{T}_{A}\mathbf{M}_{r})=\dim(\mathbf{L}\,\cap\,\mathbf{T}_{A}\mathbf{M}_{r})=\dim\mathbf{P}

where the first inequality uses Fact 2.1, the second inequality uses Proposition 2.2 together with the fact that $\mathbf{V}\subseteq\mathbf{L}_{r}$, the third inequality uses Proposition 2.2 again, and the last equality uses $\mathbf{T}_{A}\mathbf{L}=\mathbf{L}$ since $\mathbf{L}$ is a linear subspace. As the above varieties are subvarieties of $\mathbf{L}$, we obtain $\operatorname{codim}_{\mathbf{L}}\mathbf{P}\leq\operatorname{codim}_{\mathbf{L}}\mathbf{L}_{r}$. This proves Item (2) and therefore completes the proof of the inductive step. ∎

We note that an immediate corollary of Proposition 3.4 is a slice rank upper bound of $2r$ for any subspace of matrices of rank at most $r$.

3.2 Putting everything together

To prove Theorem 3.1 we also need the following characterization of geometric rank. Recall that $\operatorname{GR}(T)=\operatorname{codim}\ker T$ where $\ker T=\{(\mathbf{x},\mathbf{y})\mid T(\mathbf{x},\mathbf{y},\cdot)=\mathbf{0}\}$.

Fact 3.5 ([30]).

For any 3-tensor $T$ over any field,

\operatorname{GR}(T)=\min_{r}\,r+\operatorname{codim}\{\mathbf{x}\mid\operatorname{rank}T(\mathbf{x},\cdot,\cdot)=r\}.

Fact 3.5 is proved via the decomposition

\ker T=\bigcup_{r}\{(\mathbf{x},\mathbf{y})\in\ker T\mid\operatorname{rank}T(\mathbf{x},\cdot,\cdot)=r\},

using a result from algebraic geometry on the dimensions of fibers, and the fact that the codimension of a finite union of varieties is the minimum of their codimensions. We refer to Theorem 3.1 in [30] for the formal proof.

We are now ready to prove the main result of this section. First, we show how to obtain Proposition 3.2 from the results in [11].

Proof sketch of Proposition 3.2.

This is obtained by combining Theorem 2.5, Corollary 3.7, and Proposition 4.9 in [11]. These results show that the “$G$-stable rank” $\operatorname{rank}^{G}_{\mathbb{F}}(T)$ over a perfect field $\mathbb{F}$ satisfies the following properties, respectively:

  • $\operatorname{rank}^{G}_{\mathbb{F}}(T)=\operatorname{rank}^{G}_{\overline{\mathbb{F}}}(T)$,

  • $\operatorname{rank}^{G}_{\mathbb{F}}(T)\leq\operatorname{SR}(T)$,

  • $\operatorname{rank}^{G}_{\mathbb{F}}(T)\geq(2/3)\operatorname{SR}(T)$.

Putting these together gives $\frac{2}{3}\operatorname{SR}(T)\leq\operatorname{rank}^{G}_{\mathbb{F}}(T)=\operatorname{rank}^{G}_{\overline{\mathbb{F}}}(T)\leq\overline{\operatorname{SR}}(T)$, as claimed. ∎

Proof of Theorem 3.1.

Suppose $T=(a_{i,j,k})_{i,j,k}\in\mathbb{F}^{n_{1}\times n_{2}\times n_{3}}$ with $\mathbb{F}$ an arbitrary field. Let $\mathbf{L}\preceq\overline{\mathbb{F}}^{n_{2}\times n_{3}}$ be the matrix space spanned by the $n_{1}$ slices $A_{1}=(a_{1,j,k})_{j,k},\ldots,A_{n_{1}}=(a_{n_{1},j,k})_{j,k}$. Note that we may assume, by acting with the general linear group $\operatorname{GL}_{n_{1}}(\mathbb{F})$ on $T$, that the first $d:=\dim\mathbf{L}$ slices $A_{1},\ldots,A_{d}$ of $T$ are linearly independent and the rest are zero matrices; indeed, this action does not change $\operatorname{GR}(T)$ (see Lemma 4.2 in [30]) nor does it change $\operatorname{SR}(T)$.

Note that for any $\mathbf{x}\in\overline{\mathbb{F}}^{n_{1}}$, the bilinear form $T(\mathbf{x},\cdot,\cdot)$ corresponds to the matrix $\sum_{i}x_{i}A_{i}$; indeed,

T(\mathbf{x},\cdot,\cdot)\colon(\mathbf{y},\mathbf{z})\mapsto\sum_{i,j,k}a_{i,j,k}x_{i}y_{j}z_{k}=\sum_{i}x_{i}\sum_{j,k}a_{i,j,k}y_{j}z_{k}=\sum_{i}x_{i}\,\mathbf{y}^{T}A_{i}\mathbf{z}=\mathbf{y}^{T}\Big(\sum_{i}x_{i}A_{i}\Big)\mathbf{z}.

Using our assumption that $A_{i}=\mathbf{0}$ for every $i>d$, let

\mathbf{X}_{r}=\{\mathbf{x}\in\overline{\mathbb{F}}^{n_{1}}\mid\operatorname{rank}T(\mathbf{x},\cdot,\cdot)\leq r\}=\{\mathbf{x}\in\overline{\mathbb{F}}^{n_{1}}\mid\operatorname{rank}\big(x_{1}A_{1}+\cdots+x_{d}A_{d}\big)\leq r\}.

We claim that $\operatorname{codim}\mathbf{X}_{r}=\operatorname{codim}_{\mathbf{L}}\mathbf{L}_{r}$. Recall that $\mathbf{L}_{r}=\{A\in\mathbf{L}\mid\operatorname{rank}A\leq r\}$. First, we show that the variety $\mathbf{X}_{r}$ is isomorphic to the variety $\mathbf{L}_{r}\times\overline{\mathbb{F}}^{n_{1}-d}$. Indeed, the polynomial map (in fact linear)

(x_{1},\ldots,x_{n_{1}})\mapsto(x_{1}A_{1}+\cdots+x_{d}A_{d},\,x_{d+1},\ldots,x_{n_{1}})

maps $\mathbf{X}_{r}$ to $\mathbf{L}_{r}\times\overline{\mathbb{F}}^{n_{1}-d}$, and is invertible via a polynomial map (in fact linear) by our assumption that $A_{1},\ldots,A_{d}$ are linearly independent. We deduce from this isomorphism the equality of dimensions $\dim\mathbf{X}_{r}=\dim(\mathbf{L}_{r}\times\overline{\mathbb{F}}^{n_{1}-d})$, or equivalently, $\operatorname{codim}\mathbf{X}_{r}=n_{1}-\dim\mathbf{X}_{r}=d-\dim\mathbf{L}_{r}=\operatorname{codim}_{\mathbf{L}}\mathbf{L}_{r}$, as claimed.

Let $r$ achieve the minimum in Fact 3.5. This implies that $\operatorname{GR}(T)=r+\operatorname{codim}\mathbf{X}_{r}$. By Proposition 3.4,

\overline{\operatorname{SR}}(T)=\operatorname{SR}(\mathbf{L})\leq 2r+\operatorname{codim}_{\mathbf{L}}\mathbf{L}_{r}=2r+\operatorname{codim}\mathbf{X}_{r}=2\operatorname{GR}(T)-\operatorname{codim}\mathbf{X}_{r}\leq 2\operatorname{GR}(T).

Assuming further that $\mathbb{F}$ is a perfect field and using Proposition 3.2, we finally obtain the bound $\operatorname{SR}(T)\leq\frac{3}{2}\overline{\operatorname{SR}}(T)\leq 3\operatorname{GR}(T)$, as desired. ∎

4 Geometric rank vs. analytic rank

Our main result in this section gives an essentially tight upper bound on the geometric rank in terms of the analytic rank.

Proposition 4.1.

For any 3-tensor $T$ over any finite field $\mathbb{F}$,

\operatorname{AR}(T)\geq(1-\log_{|\mathbb{F}|}2)\operatorname{GR}(T).

4.1 Schwartz-Zippel meet Bézout

We will need a certain generalized version of the classical Schwartz-Zippel lemma that applies to varieties. We note that there are various generalized versions of the Schwartz-Zippel lemma appearing in the literature (e.g., Lemma 14 in [6], Claim 7.2 in [13], Lemma A.3 in [15]). However, in our version below the bound goes down exponentially with the codimension of the variety as soon as the field is larger than the degrees of the polynomials cutting out the variety, which is crucial for proving Proposition 4.1.

We use the notation $\mathbf{V}(\mathbb{F}):=\mathbf{V}\cap\mathbb{F}^{n}$ for any variety $\mathbf{V}\subseteq\overline{\mathbb{F}}^{n}$ defined over $\mathbb{F}$. Recall that a variety $\mathbf{V}=\mathbb{V}(f_{1},\ldots,f_{m})$ is said to be cut out by the polynomials $f_{1},\ldots,f_{m}$.

Lemma 4.2 (Schwartz-Zippel for varieties).

Let $\mathbb{F}$ be a finite field. For any variety $\mathbf{V}\subseteq\overline{\mathbb{F}}^{n}$ cut out by polynomials of degrees at most $d$,

\frac{|\mathbf{V}(\mathbb{F})|}{|\mathbb{F}|^{n}}\leq\Big(\frac{d}{|\mathbb{F}|}\Big)^{\operatorname{codim}\mathbf{V}}.

We note that the classical Schwartz-Zippel lemma is recovered as the special case of Lemma 4.2 where $\mathbf{V}$ is cut out by a single polynomial $p$. Indeed, in this case, Lemma 4.2 says that if $p$ is a non-zero polynomial, meaning $\operatorname{codim}\mathbf{V}=1$, then $|\mathbf{V}(\mathbb{F})|/|\mathbb{F}|^{n}\leq d/|\mathbb{F}|$.
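As a quick numerical sanity check of Lemma 4.2 (our own toy example, with the codimension computed by hand), take the variety cut out by the degree-2 polynomials $x_{1}x_{2}$ and $x_{1}x_{3}$ in $\overline{\mathbb{F}}_{5}^{3}$; it is the union of a plane and a line, so it has codimension 1, and a brute-force point count over $\mathbb{F}_{5}$ indeed stays below $(d/|\mathbb{F}|)^{\operatorname{codim}\mathbf{V}}\cdot|\mathbb{F}|^{n}$.

    import itertools

    q, n, d, codim = 5, 3, 2, 1              # F_5, ambient dimension 3; V(x1*x2, x1*x3) has codimension 1
    polys = [lambda x: (x[0] * x[1]) % q,
             lambda x: (x[0] * x[2]) % q]

    points = sum(1 for x in itertools.product(range(q), repeat=n)
                 if all(p(x) == 0 for p in polys))
    bound = (d / q) ** codim * q ** n
    print(points, bound)                     # 29 <= 50.0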

Let $\mathbf{V}^{0}$ denote the union of the 0-dimensional irreducible components of a variety $\mathbf{V}$. Note that $\mathbf{V}^{0}$ is a finite set. For the proof of Lemma 4.2 we will use the overdetermined case of Bézout’s inequality, which provides an upper bound on $|\mathbf{V}^{0}|$ (see [39], Theorem 5).

Fact 4.3 (Overdetermined Bézout’s inequality).

Let $\mathbf{V}=\mathbb{V}(f_{1},\ldots,f_{m})\subseteq\mathbb{K}^{n}$ be a variety, for an algebraically closed field $\mathbb{K}$, cut out by $m\geq n$ polynomials. Write $\deg f_{1}\geq\cdots\geq\deg f_{m}\geq 1$. Then

|\mathbf{V}^{0}|\leq\prod_{i=1}^{n}\deg f_{i}.

The degree of an equidimensional variety $\mathbf{V}\subseteq\mathbb{K}^{n}$ (a variety is equidimensional if all its irreducible components have the same dimension), denoted $\deg\mathbf{V}$, is the cardinality of the intersection of $\mathbf{V}$ with a generic linear subspace in $\mathbb{K}^{n}$ of dimension $\operatorname{codim}\mathbf{V}$ (a well-defined, finite number). The degree of an arbitrary variety $\mathbf{V}$ is the sum of the degrees of its irreducible components. The proof of Lemma 4.2 will “bootstrap” the following generalization of the Schwartz-Zippel lemma.

Fact 4.4 ([6],[13]).

Let $\mathbb{F}$ be a finite field. For any variety $\mathbf{V}$ over $\overline{\mathbb{F}}$,

|\mathbf{V}(\mathbb{F})|\leq\deg\mathbf{V}\cdot|\mathbb{F}|^{\dim\mathbf{V}}.
Proof of Lemma 4.2.

We claim that the following inequality holds assuming $\mathbf{V}$ is equidimensional:

\deg\mathbf{V}\leq d^{\operatorname{codim}\mathbf{V}}.

Suppose $\mathbf{V}$ is cut out by $m$ polynomials of degree at most $d$. Note that $m\geq\operatorname{codim}\mathbf{V}$. Consider the variety obtained by intersecting $\mathbf{V}$ with a generic linear subspace in $\overline{\mathbb{F}}^{n}$ of dimension $\operatorname{codim}\mathbf{V}$, and observe that it can be embedded as a variety $\mathbf{W}\subseteq\overline{\mathbb{F}}^{n_{0}}$ with $n_{0}=\operatorname{codim}\mathbf{V}$. Then $\mathbf{W}$ satisfies the following properties:

  • $\dim\mathbf{W}=0$,

  • $\mathbf{W}$ is cut out by $m$ polynomials of degree at most $d$.

In particular, and similarly to the above, $m\geq n_{0}$. It follows that

\deg\mathbf{V}=|\mathbf{W}|=|\mathbf{W}^{0}|\leq d^{n_{0}}=d^{\operatorname{codim}\mathbf{V}},

where the first equality is by the definition of $\deg\mathbf{V}$, the second equality uses $\mathbf{W}=\mathbf{W}^{0}$ as $\dim\mathbf{W}=0$, and the inequality applies Fact 4.3 since we are in the overdetermined case $m\geq n_{0}$.

To finish the proof, we need to remove the assumption in the above inequality that $\mathbf{V}$ is equidimensional. This is immediate using the standard technique of replacing the polynomials cutting out $\mathbf{V}$ with generic linear combinations thereof, giving a variety $\mathbf{V^{\prime}}\supseteq\mathbf{V}$ over $\overline{\mathbb{F}}$ with $\dim\mathbf{V^{\prime}}=\dim\mathbf{V}$ that is an equidimensional set-theoretic complete intersection (see, e.g., Kollar [29], Theorem 1.5). (Alternatively, this can be done by defining an appropriate variant of degree for varieties that are not equidimensional.) Indeed, starting with a single polynomial (which trivially cuts out an equidimensional variety), adding each such linear combination $p$, that is, intersecting with the hypersurface $\mathbf{H}_{p}$ corresponding to $p$, reduces by one the dimension of every irreducible component $\mathbf{X}$ (since $\mathbf{X}\nsubseteq\mathbf{H}_{p}$ as long as $\mathbf{X}\nsubseteq\mathbf{V}$). Now, apply Fact 4.4 to obtain

\frac{|\mathbf{V}(\mathbb{F})|}{|\mathbb{F}|^{n}}\leq\frac{|\mathbf{V^{\prime}}(\mathbb{F})|}{|\mathbb{F}|^{n}}\leq\frac{d^{\operatorname{codim}\mathbf{V}}|\mathbb{F}|^{\dim\mathbf{V}}}{|\mathbb{F}|^{n}}=\Big(\frac{d}{|\mathbb{F}|}\Big)^{\operatorname{codim}\mathbf{V}},

as desired. ∎

4.2 Putting everything together

We now deduce the desired bound relating $\operatorname{GR}$ and $\operatorname{AR}$. We will use the following well known characterization of $\operatorname{AR}$; we include a proof for completeness.

Fact 4.5.

For any 3-tensor $T\in\mathbb{F}^{n_{1}\times n_{2}\times n_{3}}$ over any finite field $\mathbb{F}$,

\operatorname{AR}(T)=-\log_{|\mathbb{F}|}\Pr_{\mathbf{x},\mathbf{y}}[f(\mathbf{x},\mathbf{y})=\mathbf{0}]

where $f\colon\mathbb{F}^{n_{1}}\times\mathbb{F}^{n_{2}}\to\mathbb{F}^{n_{3}}$ is the bilinear map corresponding to $T$.

Proof.

Write $f=(f_{1},\ldots,f_{n_{3}})$, so that $T(\mathbf{x},\mathbf{y},\mathbf{z})=\sum_{k=1}^{n_{3}}f_{k}(\mathbf{x},\mathbf{y})z_{k}$. Denote $\operatorname{bias}(T)=\mathbb{E}_{\mathbf{x},\mathbf{y},\mathbf{z}}\chi(T(\mathbf{x},\mathbf{y},\mathbf{z}))$, where $\chi$ is an arbitrary, nontrivial additive character of $\mathbb{F}$, so that $\operatorname{AR}(T)=-\log_{|\mathbb{F}|}\operatorname{bias}(T)$. Since $f$ is bilinear, we have $\operatorname{bias}(T)=\Pr_{\mathbf{x},\mathbf{y}}[f(\mathbf{x},\mathbf{y})=\mathbf{0}]$; indeed,

\operatorname{bias}(T)=\mathbb{E}_{\mathbf{x},\mathbf{y},\mathbf{z}}\chi(T(\mathbf{x},\mathbf{y},\mathbf{z}))=\mathbb{E}_{\mathbf{x},\mathbf{y},\mathbf{z}}\chi\Big(\sum_{k}f_{k}(\mathbf{x},\mathbf{y})z_{k}\Big)=\mathbb{E}_{\mathbf{x},\mathbf{y}}\mathbb{E}_{\mathbf{z}}\prod_{k}\chi(f_{k}(\mathbf{x},\mathbf{y})z_{k})
=\mathbb{E}_{\mathbf{x},\mathbf{y}}\prod_{k}\mathbb{E}_{z\in\mathbb{F}}\chi(f_{k}(\mathbf{x},\mathbf{y})z)=\mathbb{E}_{\mathbf{x},\mathbf{y}}\prod_{k}[f_{k}(\mathbf{x},\mathbf{y})=0]=\mathbb{E}_{\mathbf{x},\mathbf{y}}[f(\mathbf{x},\mathbf{y})=\mathbf{0}]=\Pr_{\mathbf{x},\mathbf{y}}[f(\mathbf{x},\mathbf{y})=\mathbf{0}],

where $[\cdot]$ is the Iverson bracket. This completes the proof. ∎

Proof of Proposition 4.1.

Suppose $T\in\mathbb{F}^{n_{1}\times n_{2}\times n_{3}}$. Put $\mathbf{V}=\ker(T)\subseteq\overline{\mathbb{F}}^{N}$ with $N=n_{1}+n_{2}$. By Lemma 4.2, $|\mathbf{V}(\mathbb{F})|/|\mathbb{F}|^{N}\leq(2/|\mathbb{F}|)^{\operatorname{codim}\mathbf{V}}$. Using Fact 4.5, it follows that

\operatorname{AR}(T)=-\log_{|\mathbb{F}|}\frac{|\mathbf{V}(\mathbb{F})|}{|\mathbb{F}|^{N}}\geq\operatorname{codim}\mathbf{V}\cdot(1-\log_{|\mathbb{F}|}2).

As $\operatorname{GR}(T)=\operatorname{codim}\mathbf{V}$, we are done. ∎

We are finally ready to combine our various bounds and obtain the main result.

Proof of Theorem 1.

The first inequality is given by Theorem 3.1. The second inequality follows from Proposition 4.1 for any finite $\mathbb{F}\neq\mathbb{F}_{2}$, since

\operatorname{GR}(T)\leq(1-\log_{|\mathbb{F}|}2)^{-1}\operatorname{AR}(T)\leq(1-\log_{3}2)^{-1}\operatorname{AR}(T)\leq 2.71\operatorname{AR}(T). ∎

We note that, as evident from the proof of Theorem 1, we in fact obtain the bounds $\operatorname{SR}(T)\leq 3\operatorname{GR}(T)\leq 3(1+o_{|\mathbb{F}|}(1))\operatorname{AR}(T)$.

5 Some complexity results for bilinear maps

5.1 Rank vs. min-entropy

Recall that the min-entropy of a discrete random variable $X$ is

\operatorname{H_{\infty}}(X)=\min_{x}\,\log_{2}\frac{1}{\Pr[X=x]}.

With a slight abuse of notation, we define the min-entropy of a function $X\colon A\to B$, with $A$ and $B$ finite, in the same way (using the uniform measure):

\operatorname{H_{\infty}}(X)=\min_{b\in B}\,\log_{2}\frac{1}{\Pr_{a\in A}[X(a)=b]}=-\log_{2}\max_{b\in B}\frac{|X^{-1}(b)|}{|A|}.

Note that we have the trivial bounds $0\leq\operatorname{H_{\infty}}(X)\leq\log_{2}|B|$, where the lower bound holds when $X$ is constant and the upper bound when $X$ is $|A|/|B|$-to-$1$.

Recall that $\operatorname{SR}(f)$ denotes the slice rank of the 3-tensor corresponding to $f$ (which can be thought of as the “oracle complexity” of $f$, where the oracle produces any desired, arbitrarily hard matrices). Towards the proof of Proposition 1.1, we first deduce from Theorem 1 a tight relation between slice rank and min-entropy for the class of bilinear maps.

Proposition 5.1.

For any bilinear map $f\colon\mathbb{F}^{n}\times\mathbb{F}^{n}\to\mathbb{F}^{n}$ over any finite field $\mathbb{F}\neq\mathbb{F}_{2}$,

\operatorname{SR}(f)=\Theta\Big(\frac{\operatorname{H_{\infty}}(f)}{\log_{2}|\mathbb{F}|}\Big).
Proof.

As $f$ is bilinear, we claim that $\max_{\mathbf{b}}\Pr_{\mathbf{a}}[f(\mathbf{a})=\mathbf{b}]=\Pr_{\mathbf{a}}[f(\mathbf{a})=\mathbf{0}]$. Indeed, this follows from the fact that $f(\mathbf{x},\mathbf{y})$ is a linear map for any fixed $\mathbf{y}$, and thus for every $\mathbf{b}$,

\Pr_{\mathbf{x},\mathbf{y}}[f(\mathbf{x},\mathbf{y})=\mathbf{b}]=\mathbb{E}_{\mathbf{y}}\Pr_{\mathbf{x}}[f(\mathbf{x},\mathbf{y})=\mathbf{b}]\leq\mathbb{E}_{\mathbf{y}}\Pr_{\mathbf{x}}[f(\mathbf{x},\mathbf{y})=\mathbf{0}]=\Pr_{\mathbf{x},\mathbf{y}}[f(\mathbf{x},\mathbf{y})=\mathbf{0}].

Therefore,

\operatorname{H_{\infty}}(f)=\min_{\mathbf{b}}-\log_{2}\Pr_{\mathbf{x},\mathbf{y}}[f(\mathbf{x},\mathbf{y})=\mathbf{b}]=-\log_{2}\Pr_{\mathbf{x},\mathbf{y}}[f(\mathbf{x},\mathbf{y})=\mathbf{0}]=\operatorname{AR}(f)\log_{2}|\mathbb{F}|

where the last equality uses Fact 4.5. We deduce using Theorem 1 that

SR(f)=Θ(AR(f))=Θ(H(f)/log2|𝔽|),\operatorname{SR}(f)=\Theta(\operatorname{AR}(f))=\Theta(\operatorname{H_{\infty}}(f)/\log_{2}|\mathbb{F}|),

as desired. ∎
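
The two facts used in this proof, namely that the most likely output of a bilinear map is 𝟎 and that H_∞(f) = AR(f)·log_2|F|, are easy to check numerically. The following Python sketch (ours; the field size, dimension, and variable names are arbitrary small choices) does so for a random bilinear map over F_3.

import itertools, random
from collections import Counter
from math import log, log2

q, n = 3, 2
F = range(q)
# coefficient tensor of f_k(x, y) = sum_{i,j} a[i][j][k] x_i y_j
a = [[[random.randrange(q) for _ in range(n)] for _ in range(n)] for _ in range(n)]

def f(x, y):
    return tuple(sum(a[i][j][k] * x[i] * y[j] for i in range(n) for j in range(n)) % q
                 for k in range(n))

counts = Counter(f(x, y) for x in itertools.product(F, repeat=n)
                 for y in itertools.product(F, repeat=n))
total = q ** (2 * n)
assert max(counts.values()) == counts[(0,) * n]   # the most likely value is 0
H_inf = -log2(max(counts.values()) / total)
AR = -log(counts[(0,) * n] / total, q)            # analytic rank = -log_q Pr[f = 0]
print(H_inf, AR * log2(q))                        # these two numbers coincide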

Proof of Proposition 1.1.

Note that, almost directly from the definitions, C(f)nSR(f)\operatorname{C}^{*}(f)\leq n\operatorname{SR}(f). The desired bound therefore follows from Proposition 5.1,

C(f)nSR(f)=O(nH(f)/log2|𝔽|).\operatorname{C}^{*}(f)\leq n\operatorname{SR}(f)=O(n\operatorname{H_{\infty}}(f)/\log_{2}|\mathbb{F}|).

Below we show that our bound is in fact an equality (up to a constant) for almost every bilinear map. Let f:𝔽n×𝔽n𝔽nf\colon\mathbb{F}^{n}\times\mathbb{F}^{n}\to\mathbb{F}^{n} be a uniformly random bilinear map. We have C(f)=Θ(n2)\operatorname{C}^{*}(f)=\Theta(n^{2}), since the tensor rank of the corresponding tensor is Θ(n2)\Theta(n^{2}), which is equal to Θ(C(f))\Theta(\operatorname{C}^{*}(f)) for any 𝔽\mathbb{F} large enough, as shown by Strassen [38] (see also [5]). It therefore remains to show that H(f)=Θ(nlog2|𝔽|)\operatorname{H_{\infty}}(f)=\Theta(n\log_{2}|\mathbb{F}|). Observe that if L:𝔽n𝔽nL\colon\mathbb{F}^{n}\to\mathbb{F}^{n} is a uniformly random linear map then for any 𝟎𝐲𝔽n\mathbf{{0}}\neq\mathbf{{y}}\in\mathbb{F}^{n} we have that L(𝐲)L(\mathbf{{y}}) is uniformly random in 𝔽n\mathbb{F}^{n}. Fix 𝟎𝐲𝟎𝔽n\mathbf{{0}}\neq\mathbf{{y_{0}}}\in\mathbb{F}^{n}. Then for each component fi(𝐱,𝐲)=:𝐱TAi𝐲f_{i}(\mathbf{{x}},\mathbf{{y}})=:\mathbf{{x}}^{T}A_{i}\mathbf{{y}} of ff we have that fi(𝐱,𝐲𝟎)=𝐱T(Ai𝐲𝟎)f_{i}(\mathbf{{x}},\mathbf{{y_{0}}})=\mathbf{{x}}^{T}(A_{i}\mathbf{{y_{0}}}) is a uniformly random linear form in 𝐱\mathbf{{x}}. Moreover, these nn linear forms f1(𝐱,𝐲𝟎),,fn(𝐱,𝐲𝟎)f_{1}(\mathbf{{x}},\mathbf{{y_{0}}}),\ldots,f_{n}(\mathbf{{x}},\mathbf{{y_{0}}}) are independent. It follows that f(𝐱,𝐲𝟎):𝔽n𝔽nf(\mathbf{{x}},\mathbf{{y_{0}}})\colon\mathbb{F}^{n}\to\mathbb{F}^{n} is a uniformly random linear map. In particular, for each fixed 𝟎𝐲𝟎𝔽n\mathbf{{0}}\neq\mathbf{{y_{0}}}\in\mathbb{F}^{n}, the expected number of 𝐱𝔽n\mathbf{{x}}\in\mathbb{F}^{n} with f(𝐱,𝐲𝟎)=𝟎f(\mathbf{{x}},\mathbf{{y_{0}}})=\mathbf{{0}} is at most 22, one solution always being 𝐱=𝟎\mathbf{{x}}=\mathbf{{0}}. Summing over 𝐲\mathbf{{y}}, and counting all |𝔽|n|\mathbb{F}|^{n} pairs with 𝐲=𝟎\mathbf{{y}}=\mathbf{{0}}, we conclude that in expectation |f1(𝟎)|2(|𝔽|n1)+|𝔽|n<3|𝔽|n|f^{-1}(\mathbf{{0}})|\leq 2(|\mathbb{F}|^{n}-1)+|\mathbb{F}|^{n}<3|\mathbb{F}|^{n}, while trivially |f1(𝟎)||𝔽|n|f^{-1}(\mathbf{{0}})|\geq|\mathbb{F}|^{n}. By Markov's inequality, with probability 1o(1)1-o(1) we have |f1(𝟎)|n|𝔽|n|f^{-1}(\mathbf{{0}})|\leq n|\mathbb{F}|^{n}. Since, as shown in the proof of Proposition 5.1, the largest fiber of a bilinear map is the one over 𝟎\mathbf{{0}}, we conclude that with probability 1o(1)1-o(1), H(f)=log2(|𝔽|2n/|f1(𝟎)|)=Θ(log2(|𝔽|n))=Θ(nlog2|𝔽|)\operatorname{H_{\infty}}(f)=\log_{2}(|\mathbb{F}|^{2n}/|f^{-1}(\mathbf{{0}})|)=\Theta(\log_{2}(|\mathbb{F}|^{n}))=\Theta(n\log_{2}|\mathbb{F}|), as desired. ∎
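
The estimate |f^{-1}(𝟎)| = Θ(|F|^n) for a random bilinear map is easy to observe experimentally. The following Python sketch (ours; field size, dimension, and number of trials are arbitrary small choices) samples random bilinear maps over F_3 and compares the average size of f^{-1}(𝟎) to |F|^n and |F|^{2n}.

import itertools, random

q, n, trials = 3, 3, 20
F = range(q)

def random_bilinear():
    a = [[[random.randrange(q) for _ in range(n)] for _ in range(n)] for _ in range(n)]
    return lambda x, y: tuple(sum(a[i][j][k] * x[i] * y[j] for i in range(n)
                                  for j in range(n)) % q for k in range(n))

zero = (0,) * n
sizes = []
for _ in range(trials):
    f = random_bilinear()
    sizes.append(sum(1 for x in itertools.product(F, repeat=n)
                     for y in itertools.product(F, repeat=n) if f(x, y) == zero))
# the average |f^{-1}(0)| is a small constant times |F|^n, far below |F|^{2n}
print(sum(sizes) / trials, q ** n, q ** (2 * n))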

5.2 Approximating bilinear maps

Recall that maps f,g:ABf,g\colon A\to B are said to be δ\delta-close if PraA[f(a)=g(a)]=δ\Pr_{a\in A}[f(a)=g(a)]=\delta. Let us recall the classical fact that, for any 𝖭𝖯{\sf{NP}}-complete function f:{0,1}{0,1}f\colon\{0,1\}^{*}\to\{0,1\}, say f=𝖲𝖠𝖳f={\sf{SAT}}, if ff can be computed in polynomial time on all but polynomially many inputs then in fact ff can be computed in polynomial time. Phrased differently, if g:{0,1}{0,1}g\colon\{0,1\}^{*}\to\{0,1\} is such that gn:=g|{0,1}ng_{n}:=g|_{\{0,1\}^{n}} is δ\delta-close to fn:=f|{0,1}nf_{n}:=f|_{\{0,1\}^{n}} with δ=1𝗉𝗈𝗅𝗒(n)/2n\delta=1-{\sf poly}(n)/2^{n} then g𝖯g\in{\sf{P}} implies f𝖯f\in{\sf{P}}. What would be an optimal analogue of this basic fact when ff is coming from the class of bilinear maps? We note that this restriction is already a radical change of regime. For example, the Schwartz-Zippel lemma implies that if two distinct degree-dd forms are δ\delta-close then necessarily δd/|𝔽|\delta\leq d/|\mathbb{F}|. In particular, an agreement that is close to 11, as in the example above, is impossible in the bilinear setting.
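
The Schwartz-Zippel consequence quoted above is easy to verify on a toy example. In the following Python sketch (ours; the specific forms are arbitrary), two distinct bilinear forms over F_5 agree on at most a 2/|F| fraction of the inputs.

import itertools

q, n = 5, 2
F = range(q)
f = lambda x, y: (x[0] * y[0]) % q
g = lambda x, y: (x[0] * y[0] + x[1] * y[1]) % q
agree = sum(1 for x in itertools.product(F, repeat=n)
            for y in itertools.product(F, repeat=n) if f(x, y) == g(x, y))
print(agree / q ** (2 * n), 2 / q)   # observed agreement (0.36) vs. the bound d/|F| = 0.4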

Here we prove Proposition 1.2, showing that it suffices to compute ff on a surprisingly small fraction of the inputs in order to be able to compute ff on all inputs. For example, this implies that if SR(g)=O(r)\operatorname{SR}(g)=O(r) and gg agrees with ff on merely an |𝔽|O(r)|\mathbb{F}|^{-O(r)}-fraction of the inputs, then already SR(f)=O(r)\operatorname{SR}(f)=O(r). As before, Theorem 1 supplies the precise bounds we need.

Proof of Proposition 1.2.

Since SR\operatorname{SR} is subadditive by definition, we have

SR(f)=SR(g+fg)SR(g)+SR(fg).\operatorname{SR}(f)=\operatorname{SR}(g+f-g)\leq\operatorname{SR}(g)+\operatorname{SR}(f-g).

By Theorem 1 we have

SR(fg)O(AR(fg)).\operatorname{SR}(f-g)\leq O(\operatorname{AR}(f-g)).

Recall that AR(fg)=log|𝔽|bias(fg)\operatorname{AR}(f-g)=-\log_{|\mathbb{F}|}\operatorname{bias}(f-g), where bias(fg)=Pr𝐱,𝐲[(fg)(𝐱,𝐲)=𝟎]=δ\operatorname{bias}(f-g)=\Pr_{\mathbf{{x}},\mathbf{{y}}}[(f-g)(\mathbf{{x}},\mathbf{{y}})=\mathbf{{0}}]=\delta since ff and gg are δ\delta-close. Combining the above inequalities gives

SR(f)SR(g)SR(fg)O(AR(fg))=O(log|𝔽|(1/δ)).\operatorname{SR}(f)-\operatorname{SR}(g)\leq\operatorname{SR}(f-g)\leq O(\operatorname{AR}(f-g))=O(\log_{|\mathbb{F}|}(1/\delta)).

By symmetry, the same bound holds when interchanging ff and gg, which proves the desired bound.

Finally, it remains to see that our bound is sharp, for any value of SR(f),SR(g)\operatorname{SR}(f),\operatorname{SR}(g). Let r,tr,t be positive integers satisfying t=Θ(r)t=\Theta(r) with, say, tr/2t\leq r/2 (so that |rt|=Θ(r)|r-t|=\Theta(r)), and let nr+tn\geq r+t. Let f,g:𝔽n×𝔽n𝔽nf,g\colon\mathbb{F}^{n}\times\mathbb{F}^{n}\to\mathbb{F}^{n} be the bilinear maps

f(𝐱,𝐲)=(x1y1,,xryr,0,,0)f(\mathbf{{x}},\mathbf{{y}})=(x_{1}y_{1},\ldots,x_{r}y_{r},0,\ldots,0)

and

g(𝐱,𝐲)=(0,,0,xr+1yr+1,,xr+tyr+t,0,,0).g(\mathbf{{x}},\mathbf{{y}})=(0,\ldots,0,x_{r+1}y_{r+1},\ldots,x_{r+t}y_{r+t},0,\ldots,0).

Recall that for an identity tensor ImI_{m} we have SR(Im)=m\operatorname{SR}(I_{m})=m (see, e.g., [35]) and AR(Im)=Θ(m)\operatorname{AR}(I_{m})=\Theta(m). On the one hand, |SR(f)SR(g)|=|SR(Ir)SR(It)|=|rt|=Θ(r)|\operatorname{SR}(f)-\operatorname{SR}(g)|=|\operatorname{SR}(I_{r})-\operatorname{SR}(I_{t})|=|r-t|=\Theta(r). On the other hand,

δ\displaystyle\delta =Pr𝐱,𝐲[f(𝐱,𝐲)=g(𝐱,𝐲)]\displaystyle=\Pr_{\mathbf{{x}},\mathbf{{y}}}[f(\mathbf{{x}},\mathbf{{y}})=g(\mathbf{{x}},\mathbf{{y}})]
=bias(fg)=bias(f)bias(g)=|𝔽|Θ(r+t).\displaystyle=\operatorname{bias}(f-g)=\operatorname{bias}(f)\cdot\operatorname{bias}(g)=|\mathbb{F}|^{-\Theta(r+t)}.

Therefore, log|𝔽|(1/δ)=Θ(r+t)=Θ(r)\log_{|\mathbb{F}|}(1/\delta)=\Theta(r+t)=\Theta(r) as well, completing the proof. ∎
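
The sharpness example above can also be verified numerically. The following Python sketch (ours; small parameter values chosen for speed, and we take n = r + t since trailing zero coordinates do not affect the agreement) computes δ = bias(f − g) for the diagonal maps f and g and confirms that log_{|F|}(1/δ) grows linearly in r + t.

import itertools
from math import log

q, r, t = 3, 2, 1
n = r + t
F = range(q)

def f(x, y):  # (x_1 y_1, ..., x_r y_r, 0, ..., 0)
    return tuple((x[i] * y[i]) % q if i < r else 0 for i in range(n))

def g(x, y):  # (0, ..., 0, x_{r+1} y_{r+1}, ..., x_{r+t} y_{r+t})
    return tuple((x[i] * y[i]) % q if r <= i < r + t else 0 for i in range(n))

agree = sum(1 for x in itertools.product(F, repeat=n)
            for y in itertools.product(F, repeat=n) if f(x, y) == g(x, y))
delta = agree / q ** (2 * n)
# f = g exactly when x_i y_i = 0 for all i <= r + t, so delta = (2/q - 1/q^2)^(r + t)
print(delta, (2 / q - 1 / q ** 2) ** (r + t), log(1 / delta, q))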

Corollary 5.2.

Let 𝔽𝔽2\mathbb{F}\neq\mathbb{F}_{2} be a finite field. Any two bilinear maps f,g:𝔽n×𝔽n𝔽mf,g\colon\mathbb{F}^{n}\times\mathbb{F}^{n}\to\mathbb{F}^{m} that are δ\delta-close satisfy

C(f)O((SR(g)+log|𝔽|(1/δ))n).\operatorname{C}^{*}(f)\leq O((\operatorname{SR}(g)+\log_{|\mathbb{F}|}(1/\delta))n).

6 Discussion and open questions

Several problems are left open by the results in this paper. Of course, it would be interesting to extend our methods to higher-order tensors. It would also be interesting to see other instantiations of classical results of theoretical computer science in the setting of bilinear, or more generally, low-degree polynomial maps. It would be satisfying to extend our main result, Theorem 1, to 𝔽2\mathbb{F}_{2}. As of now, the best bound over 𝔽2\mathbb{F}_{2} remains SR(T)O(AR(T)4)\operatorname{SR}(T)\leq O(\operatorname{AR}(T)^{4}), and we wonder whether it might be that a linear upper bound simply does not hold over 𝔽2\mathbb{F}_{2}.

Finally, it remains open to determine the best possible constant CC such that SR(T)CGR(T)\operatorname{SR}(T)\leq C\cdot\operatorname{GR}(T). Let us show below that C3/2C\geq 3/2. Over any field 𝔽\mathbb{F}, let T𝔽3×3×3T\in\mathbb{F}^{3\times 3\times 3} denote the Levi-Civita tensor T=(εi,j,k)i,j,kT=(\varepsilon_{i,j,k})_{i,j,k}. In other words, the trilinear form corresponding to TT is the 33-by-33 determinant polynomial,

T(𝐱,𝐲,𝐳)=det(x1x2x3y1y2y3z1z2z3).T(\mathbf{{x}},\mathbf{{y}},\mathbf{{z}})=\det\begin{pmatrix}x_{1}&x_{2}&x_{3}\\ y_{1}&y_{2}&y_{3}\\ z_{1}&z_{2}&z_{3}\end{pmatrix}.

We will show that GR(T)=2\operatorname{GR}(T)=2 and SR(T)=3\operatorname{SR}(T)=3, giving the bound CSR(T)/GR(T)=3/2C\geq\operatorname{SR}(T)/\operatorname{GR}(T)=3/2. To compute GR(T)\operatorname{GR}(T), observe that the bilinear map f:𝔽3×𝔽3𝔽3f\colon\mathbb{F}^{3}\times\mathbb{F}^{3}\to\mathbb{F}^{3} corresponding to TT is f(𝐱,𝐲)=𝐱×𝐲f(\mathbf{{x}},\mathbf{{y}})=\mathbf{{x}}\times\mathbf{{y}}, that is, the cross product of the vectors 𝐱,𝐲𝔽3\mathbf{{x}},\mathbf{{y}}\in\mathbb{F}^{3}. Therefore, (𝐱,𝐲)kerf(\mathbf{{x}},\mathbf{{y}})\in\ker f if and only if 𝐱×𝐲=𝟎\mathbf{{x}}\times\mathbf{{y}}=\mathbf{{0}}, that is, 𝐱\mathbf{{x}} and 𝐲\mathbf{{y}} are linearly dependent. We deduce that GR(T)=codimkerf=2\operatorname{GR}(T)=\operatorname{codim}\ker f=2, or equivalently dimkerf=4\dim\ker f=4, since 𝐲𝔽3\mathbf{{y}}\in\mathbb{F}^{3} is completely determined by 𝐱𝔽3\mathbf{{x}}\in\mathbb{F}^{3} together with a scalar multiple in 𝔽\mathbb{F}. To compute SR(T)\operatorname{SR}(T), observe that xiyjzkx_{i}y_{j}z_{k} is a monomial of T(𝐱,𝐲,𝐳)T(\mathbf{{x}},\mathbf{{y}},\mathbf{{z}}) if and only if i,j,k[3]i,j,k\in[3] are all distinct. Let S={(i,j,k)[3]3xiyjzk is in the support of T}S=\{(i,j,k)\in[3]^{3}\mid x_{i}y_{j}z_{k}\text{ is in the support of }T\}. Observe that SS forms an antichain; indeed, i+j+k=6i+j+k=6 is constant for all (i,j,k)S(i,j,k)\in S. Thus, by Proposition 4 in [35], SR(T)\operatorname{SR}(T) is equal to the vertex cover number of SS when viewed as a (33-partite) 33-uniform hypergraph. Since the hypergraph SS has 3!=63!=6 edges and each of its vertices has degree exactly 22, any vertex cover has at least 3!/2=33!/2=3 vertices; on the other hand, {x1,x2,x3}\{x_{1},x_{2},x_{3}\} is a vertex cover of size 33. We deduce that SR(T)=3\operatorname{SR}(T)=3.
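
Both computations in the preceding paragraph are small enough to check by brute force. The following Python sketch (ours) verifies that the support hypergraph of the 3-by-3 determinant has vertex cover number 3, and that over F_3 the kernel of the cross-product map consists exactly of the pairs of linearly dependent vectors, whose number q^4 + q^3 − q grows like q^4, consistent with dim ker f = 4.

import itertools

# Support of T: one edge {x_i, y_j, z_k} per permutation (i, j, k) of (1, 2, 3).
edges = [{('x', p[0]), ('y', p[1]), ('z', p[2])} for p in itertools.permutations((1, 2, 3))]
vertices = [(axis, i) for axis in 'xyz' for i in (1, 2, 3)]
vc = min(len(C) for m in range(len(vertices) + 1)
         for C in itertools.combinations(vertices, m)
         if all(e & set(C) for e in edges))
print(vc)  # 3, matching SR(T) = 3 via the vertex-cover characterization

q = 3
F = range(q)
cross = lambda x, y: tuple((x[(i + 1) % 3] * y[(i + 2) % 3] - x[(i + 2) % 3] * y[(i + 1) % 3]) % q
                           for i in range(3))
ker = [(x, y) for x in itertools.product(F, repeat=3)
       for y in itertools.product(F, repeat=3) if cross(x, y) == (0, 0, 0)]
print(len(ker), q ** 4 + q ** 3 - q)  # both equal 105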

One can in fact obtain an infinite family of 33-tensors attaining the same ratio, showing that the lower bound C3/2C\geq 3/2 is witnessed by arbitrarily large tensors. For any kk\in\mathbb{N}, let Tk𝔽3k×3k×3kT_{k}\in\mathbb{F}^{3k\times 3k\times 3k} be the kk-fold direct sum of TT with itself. We have GR(Tk)=kGR(T)=2k\operatorname{GR}(T_{k})=k\cdot\operatorname{GR}(T)=2k by the additivity of GR\operatorname{GR} with respect to direct sums (see Lemma 4.3 in [30]). Moreover, we have SR(Tk)=kSR(T)=3k\operatorname{SR}(T_{k})=k\cdot\operatorname{SR}(T)=3k since the hypergraph corresponding to the support of TkT_{k} is a disjoint union of copies of the hypergraph corresponding to the support of TT, and thus is also a 22-regular antichain. Therefore, any vertex cover has at least 6k/2=3k6k/2=3k vertices, implying that SR(Tk)=3kSR(T_{k})=3k.

Let us end by noting a curious analogy between GR/SR\operatorname{GR}/\operatorname{SR} and two other notions of rank for 33-tensors, commutative rank/non-commutative rank. It is known that non-commutative rank is at most twice the commutative rank, which interestingly matches the constant 22 in our Theorem 3.1. Moreover, just like in this paper, constructions were given that witness a 3/23/2 lower bound, and it was conjectured that 3/23/2 might be the correct constant [16]. However, this was recently refuted by Derksen and Makam [12], whose construction achieves a ratio that is arbitrarily close to 22. It would be interesting to understand whether there is a deeper analogy between these two pairs of ranks!

Note added in proof.

Since the submission (November 6, 2020) to the 53rd ACM Symposium on Theory of Computing, bounds for higher-order tensors over large enough fields were obtained by the authors, using considerably more intricate arguments [10]. Additionally, similar results to the ones in the current paper for 33-tensors were obtained by Adiprasito, Kazhdan, and Ziegler ([1], February 6, 2021).121212Their proof relies in part on a generalization of an argument found in a number-theoretic paper of Schmidt [37]; in the case of 33-tensors, Schmidt’s argument gives an inequality analogous to SR(T)O(GR(T))\operatorname{SR}(T)\leq O(\operatorname{GR}(T)) over the complex numbers.

Acknowledgments

We thank the anonymous reviewers for a careful reading and useful suggestions.

References

  • [1] K. Adiprasito, D. Kazhdan, and T. Ziegler, On the Schmidt and analytic ranks for trilinear forms, arXiv:2102.03659 (2021).
  • [2] J. Alman and V. V. Williams, A refined Laser Method and faster matrix multiplication, arXiv:2010.05846 (2020).
  • [3] M. D. Atkinson and S. Lloyd, Large spaces of matrices of bounded rank, Q. J. Math. 31 (1980), 253–262.
  • [4] A. Bhowmick and S. Lovett, Bias vs structure of polynomials in large fields, and applications in effective algebraic geometry and coding theory, arXiv:1506.02047 (2015).
  • [5] M. Bläser, Fast Matrix Multiplication, Theory of Computing Library 5 (2013), 1–60.
  • [6] B. Bukh and J. Tsimerman, Sum-product estimates for rational functions, Proc. Lond. Math. Soc. 104 (2012), 1–26.
  • [7] W. Bruns and U. Vetter, Determinantal rings. Lecture Notes in Mathematics. 1327, Springer-Verlag, 1988.
  • [8] J. Cai and D. Sivakumar, The resolution of a Hartmanis conjecture, 36th IEEE Symposium on Foundations of Computer Science (FOCS 1995), 362–373.
  • [9] J. Cai and D. Sivakumar, Sparse hard sets for P: resolution of a conjecture of Hartmanis, J. Comput. Syst. Sci. 58 (1999), 280–296.
  • [10] A. Cohen and G. Moshkovitz, Partition and analytic rank are equivalent over large fields, Duke Math. J. (to appear), arXiv:2102.10509 (2021).
  • [11] H. Derksen, The G-stable rank for tensors, arXiv:2002.08435 (2020).
  • [12] H. Derksen and V. Makam, On non-commutative rank and tensor rank, Linear Multilinear Algebra 66 (2018), 1069–1084.
  • [13] Z. Dvir, J. Kollár, and S. Lovett, Variety evasive sets, Comput. Complexity 23 (2014), 345–362.
  • [14] J. Edmonds, Systems of distinct representatives and linear algebra, J. Res. Natl. Bur. Stand. 71 (1967), 241–245.
  • [15] J. S. Ellenberg, R. Oberlin and T. Tao, The Kakeya set and maximal conjectures for algebraic varieties over finite fields, Mathematika 56 (2010), 1–25.
  • [16] M. Fortin and C. Reutenauer, Commutative/noncommutative rank of linear matrices and subspaces of matrices of low rank, Sémin. Lothar. Comb. 52 (2004).
  • [17] S. Fortune, A note on sparse complete sets, SIAM J. Comput. 8 (1979), 431–433.
  • [18] A. Garg, L. Gurvits, R. Oliveira and A. Wigderson, Operator scaling: Theory and applications, Found. Comput. Math. 20 (2020), 223–290.
  • [19] W. T. Gowers and J. Wolf, Linear forms and higher-degree uniformity for functions on 𝔽pn\mathbb{F}^{n}_{p}, Geom. Funct. Anal. 21 (2011), 36–69.
  • [20] B. Green and T. Tao, The distribution of polynomials over finite fields, with applications to the Gowers norms, Contrib. Discrete Math. 4 (2009), 1–36.
  • [21] J. A. Grochow, NP-hard sets are not sparse unless P=NP: An exposition of a simple proof of Mahaney’s Theorem, with applications, arXiv:1610.05825 (2016).
  • [22] E. Haramaty and A. Shpilka, On the structure of cubic and quartic polynomials, 42nd ACM Symposium on Theory of Computing (STOC 2010), 331–340.
  • [23] J. Harris, Algebraic geometry: A first course, Springer-Verlag, 1992.
  • [24] J. Hartmanis, N. Immerman and V. Sewelson, Sparse Sets in 𝖭𝖯{\sf{NP}}-𝖯{\sf{P}}: 𝖤𝖷𝖯𝖳𝖨𝖬𝖤{\sf{EXPTIME}} versus 𝖭𝖤𝖷𝖯𝖳𝖨𝖬𝖤{\sf{NEXPTIME}}, Inf. Control. 65 (1985), 158–181.
  • [25] O. Janzer, Polynomial bound for the partition rank vs the analytic rank of tensors, Discrete Anal. 7 (2020), 1–18.
  • [26] T. Kaufman and S. Lovett, Worst case to average case reductions for polynomials, 49th IEEE Symposium on Foundations of Computer Science (FOCS 2008), 166–175.
  • [27] D. Kazhdan and T. Ziegler, Approximate cohomology, Selecta Mathematica 24 (2018), 499–509.
  • [28] D. Kazhdan and T. Ziegler, Properties of high rank subvarieties of affine spaces, Geom. Funct. Anal. 30 (2020), 1063–1096.
  • [29] J. Kollar, Sharp effective Nullstellensatz, J. Am. Math. Soc. 1 (1988), 963–975.
  • [30] S. Kopparty, G. Moshkovitz, and J. Zuiddam, Geometric rank of tensors and subrank of matrix multiplication, 35th Computational Complexity Conference (CCC 2020), 1–21.
  • [31] S. Lovett, The analytic rank of tensors and its applications, Discrete Anal. 7 (2019), 1–10.
  • [32] S. R. Mahaney, Sparse complete sets for 𝖭𝖯{\sf{NP}}: Solution of a conjecture of Berman and Hartmanis, J. Comput. System Sci. 25 (1982), 130–143.
  • [33] L. Milićević, Polynomial bound for partition rank in terms of analytic rank, Geom. Funct. Anal. 29 (2019), 1503–1530.
  • [34] M. Ogiwara and O. Watanabe, On polynomial time bounded truth-table reducibility of 𝖭𝖯{\sf{NP}} sets to sparse sets, SIAM J. Comput. 20 (1991), 471–483.
  • [35] W. Sawin and T. Tao, Notes on the “slice rank” of tensors, https://terrytao.wordpress.com/2016/08/24/notes-on-the-slice-rank-of-tensors/ (2016).
  • [36] I. R. Shafarevich, Basic algebraic geometry 1, Springer-Verlag (third edition), 2013.
  • [37] W. M. Schmidt, The density of integer points on homogeneous varieties, Acta Math. 154 (1985), 243–296.
  • [38] V. Strassen, Vermeidung von Divisionen, J. Reine Angew. Math. 264 (1973), 184–202.
  • [39] T. Tao, Bezout’s inequality, https://terrytao.wordpress.com/2011/03/23/bezouts-inequality/ (2011).
  • [40] T. Tao, A symmetric formulation of the Croot-Lev-Pach-Ellenberg-Gijswijt capset bound, https://terrytao.wordpress.com/2016/05/18/a-symmetric-formulation-of-the-croot-lev-pach-ellenberg-gijswijt-capset-bound (2016).
  • [41] L. Valiant, The complexity of computing the permanent, Theor. Comput. Sci. 8 (1979), 189–201.
Alex Cohen
Massachusetts Institute of Technology
Cambridge, MA, USA
alexcoh@mit.edu
https://math.mit.edu/~alexcoh

Guy Moshkovitz
Department of Mathematics
City University of New York (Baruch College)
New York, NY, USA
guymoshkov@gmail.com
https://sites.google.com/view/guy-moshkovitz