Strassen $2\times 2$ Matrix Multiplication from a 3-dimensional Volume Form

Benoit Jacob
AMD
benoit.jacob@amd.com
Abstract

The Strassen $2\times 2$ matrix multiplication algorithm arises from the volume form on the 3-dimensional quotient space of the $2\times 2$ matrices by the multiples of identity.

1 Introduction

Strassen’s $2\times 2$ matrix multiplication algorithm [1] is a formula for multiplying $2\times 2$ matrices $a$ and $b$:

\[ ab=\mathrm{tr}(a)\mathrm{tr}(b)I+\sum_{i=1}^{6}\mathrm{tr}(X_{i}a)\mathrm{tr}(Y_{i}b)Z_{i} \tag{1} \]

where $I$ is the identity matrix, $\mathrm{tr}$ is the trace, and the $X_{i},Y_{i},Z_{i}$ are constant matrices. This formula is a rank 7 decomposition of the matrix multiplication tensor, that is, a decomposition of matrix multiplication into a sum of 7 simple tensors.

This may be applied recursively to multiply $n\times n$ matrices in $O(n^{\log 7/\log 2})$ time, approximately $O(n^{2.81})$, opening a research field to which the book [2] provides an introduction. One line of research has focused on further improving this asymptotic complexity, notably [3], [4] achieving $O(n^{2.376})$ and several refinements, recently [5] and [6], approaching $O(n^{2.37})$. Another line of research has pursued decompositions of matrix multiplication tensors for other small matrix sizes such as $3\times 3$, $4\times 4$, etc. These have often involved numerical searches, such as the recent [7].

Despite these advances, matrix multiplication algorithms faster than $O(n^{2.81})$ are “almost never implemented” [8], and practical evaluations such as [9] have continued favoring the Strassen $2\times 2$ algorithm. Known algorithms for other small matrix sizes struggle to significantly improve on it: the up-to-date table [10] shows complexity exponents clustering around 2.8. For $4\times 4$ matrix multiplication, the Strassen algorithm has tensor rank $7^{2}=49$, and that remained the state of the art for over 55 years until [7] and [11] lowered that from 49 to 48, achieving a complexity exponent of $2.79$. Moreover, known algorithms with substantially lower asymptotic complexity tend to have large constants in the $O$, as discussed in [12].
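The exponents quoted above follow from a single formula: a rank $r$ decomposition for $m\times m$ blocks, applied recursively, gives a complexity exponent of $\log r/\log m$. The following minimal Python sketch evaluates it for the three cases mentioned; the helper name `exponent` is an illustrative choice, not notation from the cited works.

```python
import math

def exponent(block_size: int, rank: int) -> float:
    """Exponent w such that recursing on block_size x block_size blocks
    with `rank` block multiplications costs O(n^w)."""
    return math.log(rank) / math.log(block_size)

strassen_2x2 = exponent(2, 7)    # log 7 / log 2
strassen_4x4 = exponent(4, 49)   # Strassen applied twice: 49 = 7^2
rank_48_4x4 = exponent(4, 48)    # rank 48 for 4x4, as cited from [7] and [11]

# Prints approximately: 2.8074 2.8074 2.7925
print(f"{strassen_2x2:.4f} {strassen_4x4:.4f} {rank_48_4x4:.4f}")
```

Lowering the $4\times 4$ rank from 49 to 48 therefore moves the exponent only in the second decimal place, which is consistent with the clustering around 2.8 noted above.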

The Strassen $2\times 2$ algorithm also stands out from a theoretical perspective: its tensor rank 7 is known to be optimal [13] and it is known to be essentially unique under that constraint [14]. By contrast, the tensor rank of $n\times n$ matrix multiplication is still unknown for all $n\geq 3$. For $n=3$, it is still only known to be between 19 and 23, see [15].

Optimality and uniqueness make the Strassen $2\times 2$ algorithm a basic fact of 2-dimensional linear algebra. Such facts are expected to be simple and geometric. However, the original statement and proof of the Strassen algorithm are calculations on matrix coefficients. This has motivated a quest for geometric interpretations. The recent [16] and [8] in particular were inspirational to the present article, and [8] contains a survey of this endeavour, tracing it back to the years following the publication of the original Strassen article [1]. Other recent articles in this line of research include [17], [18], [19] and [20].

The present article offers a geometric interpretation of the Strassen algorithm by addressing a more general question: is the Strassen algorithm an independent fact in multilinear algebra, or could it be related to a known fact? We derive it from the expansion of a 3-dimensional volume form into an antisymmetrized sum of $3!=6$ simple tensors. That expansion follows from the one-dimensionality of the space of antisymmetric $n$-forms, which is an abstract version of Cavalieri’s principle, the idea that the volume of a solid is unchanged by sliding parallel slices. As to the question of why specifically $2\times 2$ matrices, the answer is that as matrix multiplication is a tensor of order 3 on matrix spaces, interpreting it as a volume form requires a 3-dimensional matrix space, and the specific case of $2\times 2$ matrices gives us such a 3-dimensional matrix space by taking the quotient by multiples of the identity matrix: $3=2^{2}-1$.

Acknowledgements. The author would like to thank Paolo d’Alberto and Zach Garvey at AMD for helpful comments.

2 Overview

Fix, for this entire article, a 2-dimensional vector space $V$ over a field $k$. Let $\mathrm{L}(V)$ denote the space of linear maps from $V$ to itself. Start by considering this trilinear form $g$ on $\mathrm{L}(V)$:

\[ g(a_{1},a_{2},a_{3})=\mathrm{tr}(a_{1}a_{2}a_{3})-\mathrm{tr}(a_{3}a_{2}a_{1}). \tag{2} \]

We notice (Lemma 12) that $g$ is a volume form on the quotient of $\mathrm{L}(V)$ by the multiples of the identity matrix, which has dimension 3. This gives (Lemma 13) rank 6 decompositions of $g$ parametrized by bases of the dual space. Our next step is to relate $g$ to this other trilinear form $h$ on $\mathrm{L}(V)$:

\[ h(a_{1},a_{2},a_{3})=\mathrm{tr}(a_{1})\mathrm{tr}(a_{2})\mathrm{tr}(a_{3})-\mathrm{tr}(a_{1}a_{2}a_{3}). \tag{3} \]

Using the natural isomorphism $\mathrm{L}(V)^{*\otimes 3}\simeq\mathrm{L}(V^{\otimes 3})$, view the trilinear forms $\mathrm{tr}(a_{1})\mathrm{tr}(a_{2})\mathrm{tr}(a_{3})$, $\mathrm{tr}(a_{1}a_{2}a_{3})$ and $\mathrm{tr}(a_{3}a_{2}a_{1})$ as respectively the permutations $\mathrm{id}$, $(123)$ and $(321)$ permuting the terms in $V^{\otimes 3}$ (Lemma 8). This allows viewing $h$ as the composition of $g$ with a linear map induced by the permutation $(321)$ (Lemma 16), which allows transporting certain rank 6 decompositions of $g$ into rank 6 decompositions of $h$ (Proposition 18, our main result), yielding (Corollary 20)

\[ \mathrm{tr}(a_{1}a_{2}a_{3})=\mathrm{tr}(a_{1})\mathrm{tr}(a_{2})\mathrm{tr}(a_{3})-\{\text{rank 6 decomposition of $h$}\}, \tag{4} \]

which is a rank 7 decomposition of $\mathrm{tr}(a_{1}a_{2}a_{3})$. Dualizing that yields a rank 7 decomposition of matrix multiplication (Corollary 21) parametrized by a choice of basis. A specific choice yields the original Strassen algorithm (Corollary 22).

3 Terminology and lemmas in tensor algebra

Throughout this article, “vector space” means finite-dimensional vector space. For any vector spaces $U$ and $W$ over a field $k$, let $\mathrm{L}(U,W)$ denote the space of linear maps from $U$ to $W$. In the case $W=U$, we write $\mathrm{L}(U)$ for $\mathrm{L}(U,U)$. In the case $W=k$, we let $U^{*}=\mathrm{L}(U,k)$ denote the dual space of $U$.

Given any vector spaces $U_{1},\ldots,U_{n}$, $W_{1},\ldots,W_{n}$, we will make the identification

\[ \mathrm{L}(U_{1},W_{1})\otimes\cdots\otimes\mathrm{L}(U_{n},W_{n})\simeq\mathrm{L}(U_{1}\otimes\cdots\otimes U_{n},W_{1}\otimes\cdots\otimes W_{n}). \]

As special cases of that, for any vector space $U$ over $k$, for any positive integer $n$, we identify $\mathrm{L}(U)^{\otimes n}\simeq\mathrm{L}(U^{\otimes n})$ and $U^{*\otimes n}\simeq(U^{\otimes n})^{*}$. The latter identification means concretely that given linear forms $\mu_{1},\ldots,\mu_{n}$ on a vector space $U$, we identify the tensor $\mu_{1}\otimes\ldots\otimes\mu_{n}$ with the $n$-linear form on $U$ given, for all vectors $u_{1},\ldots,u_{n}$ in $U$, by:

\[ (\mu_{1}\otimes\ldots\otimes\mu_{n})(u_{1},\ldots,u_{n})=\mu_{1}(u_{1})\ldots\mu_{n}(u_{n}). \]

Let us now describe a few other natural isomorphisms of tensor spaces that we will keep as named isomorphisms, refraining from making identifications.

Definition 1.

For any vector space $U$, define linear maps $\iota$, $\iota^{*}$ and $*$:

\begin{align*}
\iota &: U\otimes U^{*}\rightarrow\mathrm{L}(U), & v\otimes\lambda &\mapsto\iota(v\otimes\lambda)=(u\mapsto\lambda(u)v),\\
\iota^{*} &: U\otimes U^{*}\rightarrow\mathrm{L}(U)^{*}, & v\otimes\lambda &\mapsto\iota^{*}(v\otimes\lambda)=(a\mapsto\lambda(a(v))),\\
{*} &: \mathrm{L}(U)\rightarrow\mathrm{L}(U)^{*}, & a &\mapsto a^{*}=(b\mapsto\mathrm{tr}(ab)).
\end{align*}

Lemma 2.

The linear maps $\iota$, $\iota^{*}$ and $*$ are isomorphisms.

Proof.

These maps are injective, and when $U$ has dimension $n$, the source and destination spaces have the same dimension $n^{2}$. ∎

Lemma 3.

For any vector space $U$, for any $u,v$ in $U$ and any $\lambda,\mu$ in $U^{*}$, we have

\begin{align}
\iota(v\otimes\lambda)\iota(u\otimes\mu) &= \lambda(u)\iota(v\otimes\mu), \tag{5}\\
\iota^{*}(v\otimes\lambda)(\iota(u\otimes\mu)) &= \lambda(u)\mu(v), \tag{6}\\
\mathrm{tr}(\iota(v\otimes\lambda)) &= \lambda(v). \tag{7}
\end{align}

Proof.

For any $w$ in $U$, we have $\iota(v\otimes\lambda)\iota(u\otimes\mu)(w)=\iota(v\otimes\lambda)(\mu(w)u)=\lambda(u)\mu(w)v=\lambda(u)\iota(v\otimes\mu)(w)$, establishing Equation (5). We have $\iota^{*}(v\otimes\lambda)(\iota(u\otimes\mu))=\lambda(\iota(u\otimes\mu)(v))=\lambda(\mu(v)u)=\mu(v)\lambda(u)$, establishing Equation (6). Let $w_{1},\ldots,w_{n}$ be a basis of $U$ such that $w_{1}=v$. In that basis, the matrix of $\iota(v\otimes\lambda)$ is $\left(\begin{smallmatrix}\lambda(v)&\lambda(w_{2})&\cdots&\lambda(w_{n})\\ 0&0&\cdots&0\\ \cdots&\cdots&\cdots&\cdots\end{smallmatrix}\right)$, whose trace is $\lambda(v)$, establishing Equation (7). ∎
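As a sanity check, Equations (5), (6) and (7) can be verified on concrete data. The following minimal Python sketch assumes $U=k^{2}$ with integer coordinates, vectors as pairs and linear forms as pairs acting by the dot product; the helper names `iota`, `iota_star` and `pair`, and the sample values, are illustrative choices and not part of the original argument.

```python
def iota(v, lam):
    """Matrix of the rank-one map u -> lam(u) v (column v times row lam)."""
    return [[v[r] * lam[c] for c in range(2)] for r in range(2)]

def iota_star(v, lam):
    """Linear form a -> lam(a(v)) on 2x2 matrices."""
    return lambda a: sum(lam[r] * a[r][c] * v[c] for r in range(2) for c in range(2))

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)] for r in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

def pair(lam, u):
    """Evaluate the linear form lam on the vector u."""
    return lam[0] * u[0] + lam[1] * u[1]

# Arbitrary integer sample data (hypothetical, for illustration only).
v, u = (2, -1), (3, 5)
lam, mu = (1, 4), (-2, 7)

# Equation (5): iota(v, lam) iota(u, mu) = lam(u) iota(v, mu)
eq5 = matmul(iota(v, lam), iota(u, mu)) == [[pair(lam, u) * x for x in row] for row in iota(v, mu)]
# Equation (6): iota^*(v, lam) evaluated at iota(u, mu) is lam(u) mu(v)
eq6 = iota_star(v, lam)(iota(u, mu)) == pair(lam, u) * pair(mu, v)
# Equation (7): tr(iota(v, lam)) = lam(v)
eq7 = trace(iota(v, lam)) == pair(lam, v)

print(eq5, eq6, eq7)
```

Integer arithmetic keeps every comparison exact, so the three flags test the identities themselves rather than a floating-point approximation.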

Lemma 4.

For any vector space $U$, the following diagram commutes, justifying the notation $\iota^{*}$.

\[
\begin{array}{ccc}
U\otimes U^{*} & \xrightarrow{\;\iota\;} & \mathrm{L}(U)\\
 & {\scriptstyle\iota^{*}}\searrow & \downarrow{\scriptstyle *}\\
 & & \mathrm{L}(U)^{*}
\end{array}
\tag{8}
\]

Proof.

The claim is that for all $v\otimes\lambda$ in $U\otimes U^{*}$, we have $\iota^{*}(v\otimes\lambda)=\iota(v\otimes\lambda)^{*}$ as elements of $\mathrm{L}(U)^{*}$. By linearity, it is enough to check that these forms in $\mathrm{L}(U)^{*}$ agree on rank one elements of $\mathrm{L}(U)$, which are the $\iota(u\otimes\mu)$ with $u$ in $U$ and $\mu$ in $U^{*}$. Indeed, the equations from Lemma 3 give:

\begin{align*}
\iota^{*}(v\otimes\lambda)(\iota(u\otimes\mu)) &= \lambda(u)\mu(v) && \text{by Equation (6)}\\
&= \mathrm{tr}(\lambda(u)\iota(v\otimes\mu)) && \text{by Equation (7)}\\
&= \mathrm{tr}(\iota(v\otimes\lambda)\iota(u\otimes\mu)) && \text{by Equation (5)}\\
&= \iota(v\otimes\lambda)^{*}(\iota(u\otimes\mu)) && \text{by Definition 1.}\;\qed
\end{align*}

Definition 5.

For any vector space $U$ and any element $a$ of $\mathrm{L}(U)$, let $L_{a},R_{a}$ denote respectively the left and right multiplication-by-$a$ maps: $L_{a}(b)=ab$ and $R_{a}(b)=ba$ for all $b$ in $\mathrm{L}(U)$. Thus $L_{a}$ and $R_{a}$ are elements of $\mathrm{L}(\mathrm{L}(U))$.

Lemma 6.

For any vector space $U$ over a field $k$ and any $a,b$ in $\mathrm{L}(U)$, we have

\[ (ab)^{*}=a^{*}\circ L_{b}=b^{*}\circ R_{a} \]

where $\circ$ denotes the composition of linear maps $\mathrm{L}(U)\rightarrow\mathrm{L}(U)\rightarrow k$.

Proof.

For any $c$ in $\mathrm{L}(U)$, we have

\[ (ab)^{*}(c)=\mathrm{tr}(abc)=\mathrm{tr}(aL_{b}(c))=a^{*}(L_{b}(c))=(a^{*}\circ L_{b})(c) \]

and similarly

\[ (ab)^{*}(c)=\mathrm{tr}(abc)=\mathrm{tr}(bca)=\mathrm{tr}(bR_{a}(c))=b^{*}(R_{a}(c))=(b^{*}\circ R_{a})(c).\;\qed \]

Definition 7.

For any vector space $U$, for any permutation $\sigma$ in the symmetric group $S_{3}$, define a map $t_{\sigma}$ in $\mathrm{L}(U^{\otimes 3})$ by letting, for all $u_{1},u_{2},u_{3}$ in $U$,

\[ t_{\sigma}(u_{1}\otimes u_{2}\otimes u_{3})=u_{\sigma(1)}\otimes u_{\sigma(2)}\otimes u_{\sigma(3)}. \tag{9} \]

The following lemma is classical and is encountered around Weyl invariant tensor theory ([21], [22]), which, among other things, establishes that the $t_{\sigma}$ span the space of $\mathrm{GL}(U)$-invariant tensors in $\mathrm{L}(U^{\otimes 3})$. We will not need any of that theory, but it is still useful context.

Lemma 8.

For any vector space $U$, the images of $t_{\mathrm{id}}$, $t_{(123)}$, $t_{(321)}$ under the map $t\mapsto t^{*}$ from Definition 1 are the following trilinear forms, given by their values at any $a_{1}\otimes a_{2}\otimes a_{3}$ in $\mathrm{L}(U)^{\otimes 3}$:

\begin{align*}
t_{\mathrm{id}}^{*}(a_{1}\otimes a_{2}\otimes a_{3}) &= \mathrm{tr}(a_{1})\mathrm{tr}(a_{2})\mathrm{tr}(a_{3}),\\
t_{(123)}^{*}(a_{1}\otimes a_{2}\otimes a_{3}) &= \mathrm{tr}(a_{1}a_{2}a_{3}),\\
t_{(321)}^{*}(a_{1}\otimes a_{2}\otimes a_{3}) &= \mathrm{tr}(a_{3}a_{2}a_{1}).
\end{align*}

Proof.

By linearity, it is enough to prove these equations in the case where the $a_{i}$ are simple tensors of the form $a_{i}=\iota(v_{i}\otimes\lambda_{i})$ with $v_{i}$ in $U$ and $\lambda_{i}$ in $U^{*}$. Letting the dot ($\cdot$) denote multiplication in $\mathrm{L}(U^{\otimes 3})$, for any permutation $\sigma$ in $S_{3}$, we have

\begin{align*}
t_{\sigma}^{*}(a_{1}\otimes a_{2}\otimes a_{3}) &= \mathrm{tr}(t_{\sigma}\cdot(\iota(v_{1}\otimes\lambda_{1})\otimes\iota(v_{2}\otimes\lambda_{2})\otimes\iota(v_{3}\otimes\lambda_{3})))\\
&= \mathrm{tr}(t_{\sigma}\cdot\iota(v_{1}\otimes v_{2}\otimes v_{3}\otimes\lambda_{1}\otimes\lambda_{2}\otimes\lambda_{3}))\\
&= \mathrm{tr}(\iota(t_{\sigma}(v_{1}\otimes v_{2}\otimes v_{3})\otimes\lambda_{1}\otimes\lambda_{2}\otimes\lambda_{3}))\\
&= \mathrm{tr}(\iota(v_{\sigma(1)}\otimes v_{\sigma(2)}\otimes v_{\sigma(3)}\otimes\lambda_{1}\otimes\lambda_{2}\otimes\lambda_{3}))\\
&= \lambda_{1}(v_{\sigma(1)})\lambda_{2}(v_{\sigma(2)})\lambda_{3}(v_{\sigma(3)}).
\end{align*}

From here, the results follow for each of the three particular permutations $\sigma$ being considered. ∎
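Lemma 8 can also be checked by brute force on $U=k^{2}$, building $t_{\sigma}$ and $a_{1}\otimes a_{2}\otimes a_{3}$ as explicit $8\times 8$ matrices and taking the trace of their product. The sketch below is a minimal illustration under these assumptions: matrices on $U^{\otimes 3}$ are stored as dictionaries keyed by pairs of index triples, permutations are written as 0-based tuples $(\sigma(1)-1,\sigma(2)-1,\sigma(3)-1)$, and all helper names and sample matrices are hypothetical.

```python
from itertools import product

N = 2
TRIPLES = list(product(range(N), repeat=3))  # indexes the basis e_i (x) e_j (x) e_k

def kron3(a1, a2, a3):
    """Matrix of a1 (x) a2 (x) a3, keyed by (row triple, column triple)."""
    return {(i, j): a1[i[0]][j[0]] * a2[i[1]][j[1]] * a3[i[2]][j[2]]
            for i in TRIPLES for j in TRIPLES}

def t(sigma):
    """Matrix of t_sigma from Equation (9): the basis vector indexed by j
    is sent to the one indexed by (j[sigma[0]], j[sigma[1]], j[sigma[2]])."""
    return {(i, j): int(i == tuple(j[s] for s in sigma))
            for i in TRIPLES for j in TRIPLES}

def trace_of_product(x, y):
    return sum(x[i, j] * y[j, i] for i in TRIPLES for j in TRIPLES)

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(N)) for c in range(N)] for r in range(N)]

def trace(a):
    return sum(a[r][r] for r in range(N))

IDENTITY, CYCLE_123, CYCLE_321 = (0, 1, 2), (1, 2, 0), (2, 0, 1)

# Arbitrary integer sample matrices (hypothetical, for illustration only).
a1, a2, a3 = [[1, 2], [3, 4]], [[0, 1], [5, -2]], [[2, -1], [7, 3]]
big = kron3(a1, a2, a3)

checks = [
    trace_of_product(t(IDENTITY), big) == trace(a1) * trace(a2) * trace(a3),
    trace_of_product(t(CYCLE_123), big) == trace(matmul(matmul(a1, a2), a3)),
    trace_of_product(t(CYCLE_321), big) == trace(matmul(matmul(a3, a2), a1)),
]
print(checks)
```

The explicit construction is deliberately naive: it follows Equation (9) literally instead of using the closed form obtained at the end of the proof, so that the check is independent of that computation.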

In the next section, in the proof of our main result (Proposition 18), we will need the following Lemma 10, which is about composing the $L_{t_{\sigma}}$ with forms that are simple tensors. In the proof of Lemma 10, we will need this simpler lemma about evaluating the $L_{t_{\sigma}}$ on simple tensors:

Lemma 9.

For any vector space $U$, for any $u_{1},u_{2},u_{3}$ in $U$, any $\zeta_{1},\zeta_{2},\zeta_{3}$ in $U^{*}$, and any permutation $\sigma$ in $S_{3}$, we have the following equality between elements of $\mathrm{L}(U^{\otimes 3})$:

\[ L_{t_{\sigma}}\left(\bigotimes_{i=1,2,3}\iota(u_{i}\otimes\zeta_{i})\right)=\bigotimes_{i=1,2,3}\iota(u_{\sigma(i)}\otimes\zeta_{i}). \]

Proof.

As in the proof of Lemma 8, we have

\begin{align*}
L_{t_{\sigma}}\left(\bigotimes_{i=1,2,3}\iota(u_{i}\otimes\zeta_{i})\right) &= t_{\sigma}\cdot\bigotimes_{i=1,2,3}\iota(u_{i}\otimes\zeta_{i})\\
&= \iota(t_{\sigma}(u_{1}\otimes u_{2}\otimes u_{3})\otimes\zeta_{1}\otimes\zeta_{2}\otimes\zeta_{3})\\
&= \bigotimes_{i=1,2,3}\iota(u_{\sigma(i)}\otimes\zeta_{i}).\;\qed
\end{align*}

Lemma 10.

For any vector space $U$, for any $v_{1},v_{2},v_{3}$ in $U$, any $\lambda_{1},\lambda_{2},\lambda_{3}$ in $U^{*}$, and any permutation $\sigma$ in $S_{3}$, we have the following equality between elements of $\mathrm{L}(U^{\otimes 3})^{*}$:

\[ \left(\bigotimes_{i=1,2,3}\iota^{*}(v_{i}\otimes\lambda_{i})\right)\circ L_{t_{\sigma}}=\bigotimes_{i=1,2,3}\iota^{*}(v_{i}\otimes\lambda_{\sigma^{-1}(i)}). \]

Proof.

By linearity, it is enough to verify that both sides agree when evaluated on a rank one tensor of the form $\bigotimes_{i=1,2,3}\iota(u_{i}\otimes\zeta_{i})$ for some $u_{i}$ in $U$ and $\zeta_{i}$ in $U^{*}$. We have:

\begin{align*}
&\left(\left(\bigotimes_{i=1,2,3}\iota^{*}(v_{i}\otimes\lambda_{i})\right)\circ L_{t_{\sigma}}\right)\left(\bigotimes_{i=1,2,3}\iota(u_{i}\otimes\zeta_{i})\right)\\
&\quad= \left(\bigotimes_{i=1,2,3}\iota^{*}(v_{i}\otimes\lambda_{i})\right)\left(\bigotimes_{i=1,2,3}\iota(u_{\sigma(i)}\otimes\zeta_{i})\right) && \text{by Lemma 9}\\
&\quad= \prod_{i=1,2,3}\iota^{*}(v_{i}\otimes\lambda_{i})(\iota(u_{\sigma(i)}\otimes\zeta_{i}))\\
&\quad= \prod_{i=1,2,3}\lambda_{i}(u_{\sigma(i)})\zeta_{i}(v_{i}) && \text{by Equation (6)}\\
&\quad= \prod_{i=1,2,3}\lambda_{\sigma^{-1}(i)}(u_{i})\zeta_{i}(v_{i}) && \text{by commutativity in $k$}\\
&\quad= \prod_{i=1,2,3}\iota^{*}(v_{i}\otimes\lambda_{\sigma^{-1}(i)})(\iota(u_{i}\otimes\zeta_{i})) && \text{by Equation (6)}\\
&\quad= \left(\bigotimes_{i=1,2,3}\iota^{*}(v_{i}\otimes\lambda_{\sigma^{-1}(i)})\right)\left(\bigotimes_{i=1,2,3}\iota(u_{i}\otimes\zeta_{i})\right).\;\qed
\end{align*}

4 Main results

Let us return to the 2-dimensional vector space $V$ over a field $k$ that we had fixed in the overview. Let $I$ denote the identity in $\mathrm{L}(V)$.

Definition 11.

Let $Q=\mathrm{L}(V)/kI$ denote the quotient vector space of $\mathrm{L}(V)$ by the scalar multiples of identity.

Note that $\dim Q=(\dim V)^{2}-1=3$. The dual $Q^{*}$ is identified with the subspace of $\mathrm{L}(V)^{*}$ consisting of those forms $\mu$ that satisfy $\mu(I)=0$.

Lemma 12.

The trilinear form $g$ on $\mathrm{L}(V)$ is antisymmetric and passes to the quotient $Q$, inducing a volume form on $Q$.

Proof.

The antisymmetry follows from the definition of $g$ in Equation (2) and the cyclic property of the trace, $\mathrm{tr}(a_{1}a_{2}a_{3})=\mathrm{tr}(a_{2}a_{3}a_{1})$. The claim about passing to the quotient is that for $a_{1},a_{2},a_{3}$ in $\mathrm{L}(V)$, if any of the $a_{i}$ is a scalar multiple of identity, then $g(a_{1},a_{2},a_{3})=0$. This is verified directly, for instance if $a_{3}=I$ then $g(a_{1},a_{2},a_{3})=\mathrm{tr}(a_{1}a_{2})-\mathrm{tr}(a_{2}a_{1})=0$. Finally, as $\dim Q=3$, antisymmetric 3-forms on $Q$ are volume forms on $Q$. ∎
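Both properties used in this proof are easy to test on examples. The following minimal Python sketch, with hypothetical helper names and arbitrary integer sample matrices, evaluates $g$ from Equation (2) on every reordering of three matrices and on triples containing the identity.

```python
from itertools import permutations

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)] for r in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

def g(a1, a2, a3):
    """Equation (2): tr(a1 a2 a3) - tr(a3 a2 a1)."""
    return trace(matmul(matmul(a1, a2), a3)) - trace(matmul(matmul(a3, a2), a1))

def sign(perm):
    """Signature of a permutation given as a tuple of 0-based images."""
    inversions = sum(perm[i] > perm[j] for i in range(3) for j in range(i + 1, 3))
    return -1 if inversions % 2 else 1

I2 = [[1, 0], [0, 1]]
# Arbitrary integer sample matrices (hypothetical, for illustration only).
mats = ([[1, 2], [3, 4]], [[0, 1], [5, -2]], [[2, -1], [7, 3]])

base = g(*mats)
antisymmetric = all(g(*(mats[p] for p in perm)) == sign(perm) * base
                    for perm in permutations(range(3)))
vanishes_on_identity = all(g(*args) == 0 for args in (
    (I2, mats[0], mats[1]), (mats[0], I2, mats[1]), (mats[0], mats[1], I2)))

print(base, antisymmetric, vanishes_on_identity)
```

A nonzero value of `base` matters here: it shows that the antisymmetry check is not satisfied trivially by a form that vanishes on the sample.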

Lemma 13.

For any basis $(\mu_{1},\mu_{2},\mu_{3})$ of $Q^{*}$, letting $\varepsilon(\sigma)$ denote the signature of a permutation $\sigma$, there exists a scalar $\alpha$ such that

\[ g=\alpha\sum_{\sigma\in S_{3}}\varepsilon(\sigma)\bigotimes_{i=1,2,3}\mu_{\sigma(i)}. \tag{10} \]

Proof.

As the space of volume forms on $Q$ is one-dimensional, and by Lemma 12 we already know that $g$ is a volume form on $Q$, it is enough to check that the right-hand side is antisymmetric. That is true by construction, that expression being known as an antisymmetrized tensor product. ∎

Remark 14.

The constant $\alpha$ in Lemma 13 can be computed by picking any $c_{1},c_{2},c_{3}$ in $\mathrm{L}(V)$ such that $g(c_{1},c_{2},c_{3})=1$ and using Equation (10) as a definition of $\alpha^{-1}$:

\[ \alpha^{-1}=\sum_{\sigma\in S_{3}}\varepsilon(\sigma)\prod_{i=1,2,3}\mu_{\sigma(i)}(c_{i}). \tag{11} \]

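To illustrate Remark 14, the sketch below computes $\alpha$ from Equation (11) and then tests Equation (10) on other matrices. The basis of $Q^{*}$ used here, $\mu_{1}(a)=a^{2,1}$, $\mu_{2}(a)=a^{1,2}$, $\mu_{3}(a)=a^{1,1}-a^{2,2}$ (superscripts denoting matrix coefficients), is a hypothetical choice made only for this example, as are the elementary matrices taken for $c_{1},c_{2},c_{3}$.

```python
from fractions import Fraction
from itertools import permutations

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)] for r in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

def g(a1, a2, a3):
    """Equation (2)."""
    return trace(matmul(matmul(a1, a2), a3)) - trace(matmul(matmul(a3, a2), a1))

def sign(perm):
    inversions = sum(perm[i] > perm[j] for i in range(3) for j in range(i + 1, 3))
    return -1 if inversions % 2 else 1

def antisymmetrized(mus, a):
    """Sum over sigma of eps(sigma) * prod_i mu_{sigma(i)}(a_i)."""
    return sum(sign(p) * mus[p[0]](a[0]) * mus[p[1]](a[1]) * mus[p[2]](a[2])
               for p in permutations(range(3)))

# Hypothetical basis of Q^*: three linear forms that vanish on the identity.
mus = (lambda a: a[1][0], lambda a: a[0][1], lambda a: a[0][0] - a[1][1])

# c1, c2, c3 = e_{1,2}, e_{2,1}, e_{1,1}, chosen so that g(c1, c2, c3) = 1.
c = ([[0, 1], [0, 0]], [[0, 0], [1, 0]], [[1, 0], [0, 0]])
assert g(*c) == 1
alpha = 1 / Fraction(antisymmetrized(mus, c))  # Equation (11)

# Equation (10) evaluated on arbitrary sample matrices.
a = ([[1, 2], [3, 4]], [[0, 1], [5, -2]], [[2, -1], [7, 3]])
lemma13_holds = g(*a) == alpha * antisymmetrized(mus, a)
print(alpha, lemma13_holds)
```

With this particular basis the constant comes out as $\alpha=-1$; a different basis of $Q^{*}$ would in general give a different value.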
Lemma 15.

The following equalities hold between trilinear forms on $\mathrm{L}(V)$:

\begin{align}
g &= t_{(123)}^{*}-t_{(321)}^{*}, \tag{12}\\
h &= t_{\mathrm{id}}^{*}-t_{(123)}^{*}. \tag{13}
\end{align}

Proof.

This follows readily from Lemma 8 and the definitions of $g$ and $h$ in Equations (2, 3). ∎

Lemma 16.

The following equality holds between forms in $\mathrm{L}(V^{\otimes 3})^{*}$:

\[ h=g\circ L_{t_{(321)}}. \]

Proof.

We have

\begin{align*}
h &= (t_{\mathrm{id}}-t_{(123)})^{*} && \text{by Equation (13)}\\
&= ((t_{(123)}-t_{(321)})\cdot t_{(321)})^{*}\\
&= (t_{(123)}-t_{(321)})^{*}\circ L_{t_{(321)}} && \text{by Lemma 6}\\
&= g\circ L_{t_{(321)}} && \text{by Equation (12).}\;\qed
\end{align*}

While Lemma 13 allowed arbitrary linear forms $\mu_{i}$, Proposition 18 will need to restrict to rank one forms, meaning the $\iota^{*}(v\otimes\lambda)$ for $v$ in $V$ and $\lambda$ in $V^{*}$. The necessity of that restriction is discussed in Remark 19.

Lemma 17.

For $i=1,2,3$, let $v_{i}$ be a nonzero vector in $V$, let $\lambda_{i}$ be a nonzero linear form on $V$ such that $\lambda_{i}(v_{i})=0$, and let $\mu_{i}=\iota^{*}(v_{i}\otimes\lambda_{i})$. The following conditions are equivalent:

  1. The vectors $v_{1},v_{2},v_{3}$ are pairwise noncolinear: $i\neq j\Rightarrow v_{i}\not\in\mathrm{span}(v_{j})$.

  2. The forms $\lambda_{1},\lambda_{2},\lambda_{3}$ are pairwise noncolinear: $i\neq j\Rightarrow\lambda_{i}\not\in\mathrm{span}(\lambda_{j})$.

  3. The forms $\mu_{1},\mu_{2},\mu_{3}$ are linearly independent.

  4. The family $(\mu_{1},\mu_{2},\mu_{3})$ is a basis of $Q^{*}$.

Proof.

$3\Leftrightarrow 4$ holds because the hypothesis $\lambda_{i}(v_{i})=0$ is equivalent to $\mu_{i}\in Q^{*}$, and $\dim Q^{*}=3$. To prove the other implications, notice that for each $i$, we have

\[ \mathrm{span}(v_{i})=\ker(\lambda_{i}), \]

since $\lambda_{i}(v_{i})=0$ means that $\mathrm{span}(v_{i})\subset\ker(\lambda_{i})$, and as $\dim V=2$, we have $\dim\ker(\lambda_{i})=1$ hence the inclusion is an equality. This means in particular that for each $i,j$,

\[ \text{$v_{i},v_{j}$ are colinear $\Leftrightarrow$ $\lambda_{i},\lambda_{j}$ are colinear $\Leftrightarrow$ $\mu_{i},\mu_{j}$ are colinear}. \]

This readily proves the implications $3\Rightarrow 1\Leftrightarrow 2$. Let us prove $1\Rightarrow 3$ by contraposition. Suppose that there exist scalars $\alpha_{i}$, not all zero, such that $\sum_{i}\alpha_{i}\mu_{i}=0$. As the $\mu_{i}$ are nonzero, at most one of the $\alpha_{i}$ can be zero. Thus, for some distinct indices $i,j,l$, we have $\alpha_{i}\mu_{i}=\alpha_{j}\mu_{j}+\alpha_{l}\mu_{l}$ with $\alpha_{j}\neq 0$ and $\alpha_{l}\neq 0$. It follows that $\alpha_{j}\mu_{j}+\alpha_{l}\mu_{l}$ has rank at most one, so $\alpha_{j}\mu_{j}$ and $\alpha_{l}\mu_{l}$ are colinear, so $\mu_{j}$ and $\mu_{l}$ are colinear, so $v_{j}$ and $v_{l}$ are colinear, contradicting condition 1. ∎
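The equivalence of conditions 1, 2 and 3 can be observed on small examples. The sketch below assumes $V=k^{2}$ with integer coordinates and tests independence of the $\mu_{i}$ through a $3\times 3$ determinant of their values on $e_{1,1},e_{1,2},e_{2,1}$, whose classes form a basis of $Q$; this test relies on the hypothesis $\lambda_{i}(v_{i})=0$. All helper names and both sample configurations are illustrative assumptions.

```python
def mu_form(v, lam):
    """The form iota^*(v (x) lam): a -> lam(a(v)) on 2x2 matrices."""
    return lambda a: sum(lam[r] * a[r][c] * v[c] for r in range(2) for c in range(2))

def det2(u, w):
    return u[0] * w[1] - u[1] * w[0]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def pairwise_noncolinear(items):
    """Conditions 1 and 2: no two of the three vectors (or forms) are colinear."""
    return all(det2(items[i], items[j]) != 0 for i in range(3) for j in range(i + 1, 3))

def mus_independent(vs, lams):
    """Condition 3, assuming lam_i(v_i) = 0 so that each mu_i lies in Q^*."""
    basis_q = ([[1, 0], [0, 0]], [[0, 1], [0, 0]], [[0, 0], [1, 0]])
    rows = [[mu_form(v, lam)(q) for q in basis_q] for v, lam in zip(vs, lams)]
    return det3(rows) != 0

# A configuration satisfying the conditions, and one where v_1 and v_3 are colinear.
good_vs, good_lams = ((1, 0), (0, 1), (1, 1)), ((0, 1), (1, 0), (1, -1))
bad_vs, bad_lams = ((1, 0), (0, 1), (2, 0)), ((0, 1), (1, 0), (0, 3))

good = (pairwise_noncolinear(good_vs), pairwise_noncolinear(good_lams),
        mus_independent(good_vs, good_lams))
bad = (pairwise_noncolinear(bad_vs), pairwise_noncolinear(bad_lams),
       mus_independent(bad_vs, bad_lams))
print(good, bad)
```

In both configurations the three flags agree, as the lemma predicts: all true in the first case and all false in the second.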

Proposition 18.

For any vectors $v_{1},v_{2},v_{3}$ in $V$ and linear forms $\lambda_{1},\lambda_{2},\lambda_{3}$ on $V$ satisfying the equivalent conditions of Lemma 17, we have:

\[ h=\frac{-1}{\lambda_{1}(v_{2})\lambda_{2}(v_{3})\lambda_{3}(v_{1})}\sum_{\sigma\in S_{3}}\varepsilon(\sigma)\bigotimes_{i=1,2,3}\iota^{*}(v_{\sigma(i)}\otimes\lambda_{\sigma(123)(i)}). \tag{14} \]

Proof.

Let us first explain why the denominator $\lambda_{1}(v_{2})\lambda_{2}(v_{3})\lambda_{3}(v_{1})$ is nonzero. Because of condition 1 in Lemma 17, whenever $i\neq j$, the vector $v_{j}$ cannot belong to the one-dimensional space $\ker(\lambda_{i})=\mathrm{span}(v_{i})$, so $\lambda_{i}(v_{j})\neq 0$, so $\lambda_{1}(v_{2})\lambda_{2}(v_{3})\lambda_{3}(v_{1})\neq 0$.

Let us now prove Equation (14) up to a scalar factor $\alpha$. Let $\mu_{i}=\iota^{*}(v_{i}\otimes\lambda_{i})$. Lemma 17 says that the $\mu_{i}$ form a basis of $Q^{*}$, so we can apply Lemma 13 with that basis to obtain

\[ g=\alpha\sum_{\sigma\in S_{3}}\varepsilon(\sigma)\bigotimes_{i=1,2,3}\iota^{*}(v_{\sigma(i)}\otimes\lambda_{\sigma(i)}) \]

for some scalar $\alpha$. Lemma 16 transforms that into

\[ h=\alpha\sum_{\sigma\in S_{3}}\varepsilon(\sigma)\left(\bigotimes_{i=1,2,3}\iota^{*}(v_{\sigma(i)}\otimes\lambda_{\sigma(i)})\right)\circ L_{t_{(321)}}, \]

which Lemma 10 transforms into

\[ h=\alpha\sum_{\sigma\in S_{3}}\varepsilon(\sigma)\bigotimes_{i=1,2,3}\iota^{*}(v_{\sigma(i)}\otimes\lambda_{\sigma(123)(i)}). \tag{15} \]

It only remains to evaluate the scalar $\alpha$. Let $a_{i}=\iota(v_{i}\otimes\lambda_{i})$. Notice that $\mathrm{tr}(a_{i})=\lambda_{i}(v_{i})=0$, so

\[ h(a_{1},a_{2},a_{3})=-\mathrm{tr}(a_{1}a_{2}a_{3})=-\lambda_{1}(v_{2})\lambda_{2}(v_{3})\lambda_{3}(v_{1}). \tag{16} \]

On the other hand, evaluating Equation (15) and simplifying that using Equation (6) yields

\[ h(a_{1},a_{2},a_{3})=\alpha\sum_{\sigma\in S_{3}}\varepsilon(\sigma)\prod_{i=1,2,3}\lambda_{\sigma(123)(i)}(v_{i})\lambda_{i}(v_{\sigma(i)}). \tag{17} \]

Since $\lambda_{i}(v_{i})=0$, the product in Equation (17) vanishes whenever $\sigma$ has a fixed point or $\sigma(123)$ has a fixed point. Thus the only $\sigma$ contributing to the sum is $\sigma=(123)$, and Equation (17) simplifies to

\[ h(a_{1},a_{2},a_{3})=\alpha\prod_{i=1,2,3}\lambda_{(321)(i)}(v_{i})\lambda_{i}(v_{(123)(i)}), \tag{18} \]

which further simplifies to

\[ h(a_{1},a_{2},a_{3})=\alpha(\lambda_{1}(v_{2})\lambda_{2}(v_{3})\lambda_{3}(v_{1}))^{2}. \]

Combining that with Equation (16) yields

\[ \alpha=\frac{-1}{\lambda_{1}(v_{2})\lambda_{2}(v_{3})\lambda_{3}(v_{1})}\cdot\;\qed \]

Remark 19.

Two tensors $p,q$ in $\mathrm{L}(V)^{*\otimes 3}$ related to each other in the same way as $g$ and $h$ are related by Lemma 16, namely $q=p\circ L_{t_{(321)}}$, may still fail to have the same tensor rank if their tensor decompositions involve linear terms in $\mathrm{L}(V)^{*}$ that are not of rank one.

Proof.

Consider the counterexample of $p=t_{(123)}^{*}$ and $q=t_{\mathrm{id}}^{*}$. The same argument as in the proof of Lemma 16 yields $q=p\circ L_{t_{(321)}}$. As noted in Lemma 8, for $a_{1},a_{2},a_{3}$ in $\mathrm{L}(V)$, we have $p(a_{1},a_{2},a_{3})=\mathrm{tr}(a_{1}a_{2}a_{3})$ and $q(a_{1},a_{2},a_{3})=\mathrm{tr}(a_{1})\mathrm{tr}(a_{2})\mathrm{tr}(a_{3})$. Thus, as tensors in $\mathrm{L}(V)^{*\otimes 3}$, $q$ has rank one but $p$ does not. ∎

To elaborate on the previous remark, the linear form $a\mapsto\mathrm{tr}(a)$ does not have rank one, so even though $q$ has rank one as a tensor of order 3 in $\mathrm{L}(V)^{*\otimes 3}$, it does not have rank one as a tensor of order 6 in $(V\otimes V^{*})^{\otimes 3}$, and our tool for transporting tensor decompositions, Lemma 10, applies to tensors of order 6 in $(V\otimes V^{*})^{\otimes 3}$.

5 Strassen algorithms

Proposition 18 is already a form of Strassen’s algorithm, but that may be obscured by the tensor formalism, so let us derive a few more concrete statements as corollaries.

Corollary 20.

For any vectors $v_{1},v_{2},v_{3}$ in $V$ and linear forms $\lambda_{1},\lambda_{2},\lambda_{3}$ on $V$ satisfying the equivalent conditions of Lemma 17, for all $a_{1},a_{2},a_{3}$ in $\mathrm{L}(V)$,

\begin{multline*}
\mathrm{tr}(a_{1}a_{2}a_{3})=\mathrm{tr}(a_{1})\mathrm{tr}(a_{2})\mathrm{tr}(a_{3})\\
+\frac{1}{\lambda_{1}(v_{2})\lambda_{2}(v_{3})\lambda_{3}(v_{1})}\sum_{\sigma\in S_{3}}\varepsilon(\sigma)\prod_{i=1,2,3}\lambda_{\sigma(123)(i)}(a_{i}(v_{\sigma(i)})).
\end{multline*}

Proof.

Evaluating Equation (14) at any $a_{1},a_{2},a_{3}$ in $\mathrm{L}(V)$ gives:

\begin{multline*}
\mathrm{tr}(a_{1})\mathrm{tr}(a_{2})\mathrm{tr}(a_{3})-\mathrm{tr}(a_{1}a_{2}a_{3})=\\
\frac{-1}{\lambda_{1}(v_{2})\lambda_{2}(v_{3})\lambda_{3}(v_{1})}\sum_{\sigma\in S_{3}}\varepsilon(\sigma)\prod_{i=1,2,3}\iota^{*}(v_{\sigma(i)}\otimes\lambda_{\sigma(123)(i)})(a_{i})
\end{multline*}

and the result follows by Definition 1. ∎

Corollary 21.

For any vectors $v_{1},v_{2},v_{3}$ in $V$ and linear forms $\lambda_{1},\lambda_{2},\lambda_{3}$ on $V$ satisfying the equivalent conditions of Lemma 17, for all $a_{1},a_{2}$ in $\mathrm{L}(V)$,

\begin{multline}
a_{1}a_{2}=\mathrm{tr}(a_{1})\mathrm{tr}(a_{2})I\\
+\frac{1}{\lambda_{1}(v_{2})\lambda_{2}(v_{3})\lambda_{3}(v_{1})}\sum_{\sigma\in S_{3}}\varepsilon(\sigma)\mathrm{tr}(a_{1}c_{\sigma(1),\sigma(2)})\mathrm{tr}(a_{2}c_{\sigma(2),\sigma(3)})c_{\sigma(3),\sigma(1)} \tag{19}
\end{multline}

where $c_{i,j}$ in $\mathrm{L}(V)$ is defined by $c_{i,j}(u)=\lambda_{j}(u)v_{i}$ for all $u$ in $V$.

Proof.

Let $x$ denote the right-hand side of Equation (19). The claim is that $a_{1}a_{2}=x$. That is equivalent to the claim that $\mathrm{tr}(a_{1}a_{2}a_{3})=\mathrm{tr}(xa_{3})$ for all $a_{3}$ in $\mathrm{L}(V)$. That claim is directly verified by comparing the expression of $\mathrm{tr}(a_{1}a_{2}a_{3})$ given by Corollary 20 to the expression of $\mathrm{tr}(xa_{3})$ expanded by using the definition of $x$, noting that $c_{i,j}=\iota(v_{i}\otimes\lambda_{j})$. ∎
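Equation (19) can be evaluated directly for any admissible choice of vectors and forms. The sketch below is a minimal illustration over the rationals, assuming $V=k^{2}$ with integer coordinates; the function name `rank7_product` and the sample data, including the particular $v_{i}$ and $\lambda_{i}$, are hypothetical choices satisfying $\lambda_{i}(v_{i})=0$ and pairwise noncolinearity.

```python
from fractions import Fraction
from itertools import permutations

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)] for r in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

def pair(lam, u):
    return lam[0] * u[0] + lam[1] * u[1]

def sign(perm):
    inversions = sum(perm[i] > perm[j] for i in range(3) for j in range(i + 1, 3))
    return -1 if inversions % 2 else 1

def rank7_product(a1, a2, vs, lams):
    """Right-hand side of Equation (19), with 0-based indices."""
    # c[i, j] is the matrix of u -> lams[j](u) * vs[i].
    c = {(i, j): [[vs[i][r] * lams[j][col] for col in range(2)] for r in range(2)]
         for i in range(3) for j in range(3)}
    denom = pair(lams[0], vs[1]) * pair(lams[1], vs[2]) * pair(lams[2], vs[0])
    t = Fraction(trace(a1) * trace(a2))
    result = [[t, Fraction(0)], [Fraction(0), t]]
    for s in permutations(range(3)):
        coeff = Fraction(sign(s) * trace(matmul(a1, c[s[0], s[1]]))
                         * trace(matmul(a2, c[s[1], s[2]])), denom)
        for r in range(2):
            for col in range(2):
                result[r][col] += coeff * c[s[2], s[0]][r][col]
    return result

# Hypothetical admissible data: lams[i] vanishes on vs[i], vectors pairwise noncolinear.
vs = ((1, 2), (3, 1), (1, -1))
lams = ((2, -1), (1, -3), (1, 1))
a1, a2 = [[1, 2], [3, 4]], [[0, 1], [5, -2]]

matches = rank7_product(a1, a2, vs, lams) == matmul(a1, a2)
print(matches)
```

Only seven products involve both $a_{1}$ and $a_{2}$: the product of the two traces, and one product of traces for each of the six permutations. The sketch does not try to minimize additions or scalar multiplications by the constants $c_{i,j}$.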

Corollary 22.

The original Strassen algorithm is obtained by applying Corollary 21 to the vector space V=k2V=k^{2}, with the following choices: v1=(10)v_{1}=\left(\begin{smallmatrix}1\\ 0\end{smallmatrix}\right), λ1=(01)\lambda_{1}=\left(\begin{smallmatrix}0&1\end{smallmatrix}\right), v2=(01)v_{2}=\left(\begin{smallmatrix}0\\ 1\end{smallmatrix}\right), λ2=(10)\lambda_{2}=\left(\begin{smallmatrix}1&0\end{smallmatrix}\right), v3=(11)v_{3}=\left(\begin{smallmatrix}1\\ 1\end{smallmatrix}\right), λ3=(11)\lambda_{3}=\left(\begin{smallmatrix}1&-1\end{smallmatrix}\right).

Proof.

Applying Corollary 21, expanding the sum over all 6 permutations, and noticing that λ1(v2)λ2(v3)λ3(v1)=1\lambda_{1}(v_{2})\lambda_{2}(v_{3})\lambda_{3}(v_{1})=1, we obtain the following matrix multiplication algorithm: for any two 2×22\times 2 matrices a,ba,b,

ab\displaystyle ab =tr(a)tr(b)I\displaystyle=\mathrm{tr}(a)\mathrm{tr}(b)I
+tr(ac1,2)tr(bc2,3)c3,1\displaystyle+\mathrm{tr}(ac_{1,2})\mathrm{tr}(bc_{2,3})c_{3,1}
+tr(ac2,3)tr(bc3,1)c1,2\displaystyle+\mathrm{tr}(ac_{2,3})\mathrm{tr}(bc_{3,1})c_{1,2}
+tr(ac3,1)tr(bc1,2)c2,3\displaystyle+\mathrm{tr}(ac_{3,1})\mathrm{tr}(bc_{1,2})c_{2,3}
tr(ac2,1)tr(bc1,3)c3,2\displaystyle-\mathrm{tr}(ac_{2,1})\mathrm{tr}(bc_{1,3})c_{3,2}
tr(ac1,3)tr(bc3,2)c2,1\displaystyle-\mathrm{tr}(ac_{1,3})\mathrm{tr}(bc_{3,2})c_{2,1}
tr(ac3,2)tr(bc2,1)c1,3\displaystyle-\mathrm{tr}(ac_{3,2})\mathrm{tr}(bc_{2,1})c_{1,3}

where the ci,j=ι(viλj)=viλjc_{i,j}=\iota(v_{i}\otimes\lambda_{j})=v_{i}\lambda_{j} are:

\begin{align*}
c_{1,2} &= v_1\lambda_2 = \left(\begin{matrix}1&0\\ 0&0\end{matrix}\right), &
c_{1,3} &= v_1\lambda_3 = \left(\begin{matrix}1&-1\\ 0&0\end{matrix}\right), \\
c_{2,3} &= v_2\lambda_3 = \left(\begin{matrix}0&0\\ 1&-1\end{matrix}\right), &
c_{2,1} &= v_2\lambda_1 = \left(\begin{matrix}0&0\\ 0&1\end{matrix}\right), \\
c_{3,1} &= v_3\lambda_1 = \left(\begin{matrix}0&1\\ 0&1\end{matrix}\right), &
c_{3,2} &= v_3\lambda_2 = \left(\begin{matrix}1&0\\ 1&0\end{matrix}\right).
\end{align*}
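
The trace formula and the table of $c_{i,j}$ matrices above can be checked numerically. The following is a minimal sketch in plain Python, assuming integer entries; the helper names (`outer`, `matmul`, `trace`, `product_via_traces`) and the `TERMS` table are illustrative choices made here, not part of the original text.

```python
import random

# Vectors v_i (as columns) and linear forms lambda_i (as rows) from Corollary 22.
v = {1: (1, 0), 2: (0, 1), 3: (1, 1)}
lam = {1: (0, 1), 2: (1, 0), 3: (1, -1)}


def outer(col, row):
    """2x2 matrix of the rank-one map v * lambda (column times row)."""
    return [[col[r] * row[s] for s in range(2)] for r in range(2)]


def matmul(p, q):
    """Direct 2x2 matrix product, used as the reference."""
    return [[sum(p[r][t] * q[t][s] for t in range(2)) for s in range(2)]
            for r in range(2)]


def trace(p):
    return p[0][0] + p[1][1]


# c[i, j] = v_i * lambda_j for i != j (six matrices).
c = {(i, j): outer(v[i], lam[j]) for i in v for j in lam if i != j}

# Each entry (sign, i, j, k) encodes the term
#   sign * tr(a c_{i,j}) * tr(b c_{j,k}) * c_{k,i},
# with sign equal to the signature of the permutation (i, j, k).
TERMS = [(+1, 1, 2, 3), (+1, 2, 3, 1), (+1, 3, 1, 2),
         (-1, 2, 1, 3), (-1, 1, 3, 2), (-1, 3, 2, 1)]


def product_via_traces(a, b):
    """Evaluate the seven-term trace formula for the product a*b."""
    t = trace(a) * trace(b)
    result = [[t, 0], [0, t]]  # the term tr(a) tr(b) I
    for sign, i, j, k in TERMS:
        coeff = sign * trace(matmul(a, c[i, j])) * trace(matmul(b, c[j, k]))
        for r in range(2):
            for s in range(2):
                result[r][s] += coeff * c[k, i][r][s]
    return result


# Compare against the direct product on random integer matrices.
rng = random.Random(0)
for _ in range(100):
    a = [[rng.randint(-9, 9) for _ in range(2)] for _ in range(2)]
    b = [[rng.randint(-9, 9) for _ in range(2)] for _ in range(2)]
    assert product_via_traces(a, b) == matmul(a, b)
```

The check uses only seven products of scalars per call: one for $\mathrm{tr}(a)\mathrm{tr}(b)$ and one for each of the six permutation terms.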

Let $a^{i,j}$ and $b^{i,j}$ denote the matrix coefficients, using superscript notation to distinguish them from the subscripts used to index the $c_{i,j}$ matrices. Let $e_{i,j}$ be the elementary matrix with a 1 at position $(i,j)$ and zeros elsewhere. Using the above table of $c_{i,j}$ matrices, the above equation expands to

\begin{align*}
ab &= (a^{1,1}+a^{2,2})(b^{1,1}+b^{2,2})(e_{1,1}+e_{2,2}) \\
&+ a^{1,1}(b^{1,2}-b^{2,2})(e_{1,2}+e_{2,2}) \\
&+ (a^{1,2}-a^{2,2})(b^{2,1}+b^{2,2})\,e_{1,1} \\
&+ (a^{2,1}+a^{2,2})\,b^{1,1}(e_{2,1}-e_{2,2}) \\
&- a^{2,2}(b^{1,1}-b^{2,1})(e_{1,1}+e_{2,1}) \\
&- (a^{1,1}-a^{2,1})(b^{1,1}+b^{1,2})\,e_{2,2} \\
&- (a^{1,1}+a^{1,2})\,b^{2,2}(e_{1,1}-e_{1,2}).
\end{align*}

These bilinear forms in the $a^{i,j}$ and $b^{i,j}$ are exactly the terms I, II, III, IV, V, VI, VII introduced in the original Strassen article [1]:

\begin{align*}
ab &= \mathrm{I}\cdot(e_{1,1}+e_{2,2}) \\
&+ \mathrm{III}\cdot(e_{1,2}+e_{2,2}) \\
&+ \mathrm{VII}\cdot e_{1,1} \\
&+ \mathrm{II}\cdot(e_{2,1}-e_{2,2}) \\
&+ \mathrm{IV}\cdot(e_{1,1}+e_{2,1}) \\
&+ \mathrm{VI}\cdot e_{2,2} \\
&+ \mathrm{V}\cdot(e_{1,2}-e_{1,1}).
\end{align*}

Thus the coefficients of the product matrix $ab$ are:

\begin{align*}
(ab)^{1,1} &= \mathrm{I}+\mathrm{IV}-\mathrm{V}+\mathrm{VII} \\
(ab)^{1,2} &= \mathrm{III}+\mathrm{V} \\
(ab)^{2,1} &= \mathrm{II}+\mathrm{IV} \\
(ab)^{2,2} &= \mathrm{I}-\mathrm{II}+\mathrm{III}+\mathrm{VI}
\end{align*}

exactly as originally stated by Strassen [1]. ∎
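
The seven products and the four coefficient formulas from the proof above translate directly into code. The following is a minimal sketch in plain Python for scalar entries; the function names `strassen_2x2` and `naive_2x2` and the variables `m1` to `m7` (standing for the terms I to VII) are illustrative names chosen here, and the signs of IV and VI follow from matching the expanded equation term by term with the I to VII form.

```python
import random


def strassen_2x2(a, b):
    """Multiply two 2x2 matrices using seven scalar multiplications."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b

    m1 = (a11 + a22) * (b11 + b22)  # I
    m2 = (a21 + a22) * b11          # II
    m3 = a11 * (b12 - b22)          # III
    m4 = a22 * (b21 - b11)          # IV  = -a22 (b11 - b21)
    m5 = (a11 + a12) * b22          # V
    m6 = (a21 - a11) * (b11 + b12)  # VI  = -(a11 - a21)(b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)  # VII

    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]


def naive_2x2(a, b):
    """Reference product using eight scalar multiplications."""
    return [[sum(a[r][t] * b[t][s] for t in range(2)) for s in range(2)]
            for r in range(2)]


# Compare both products on random integer matrices.
rng = random.Random(1)
for _ in range(100):
    a = [[rng.randint(-9, 9) for _ in range(2)] for _ in range(2)]
    b = [[rng.randint(-9, 9) for _ in range(2)] for _ in range(2)]
    assert strassen_2x2(a, b) == naive_2x2(a, b)
```

Because the formulas never commute an entry of $a$ past an entry of $b$, the same pattern is what allows the entries to be replaced by matrix blocks in the recursive algorithm; this sketch only covers the scalar case.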

References

  • [1] V. Strassen, “Gaussian elimination is not optimal.” Numerische Mathematik, vol. 13, pp. 354–356, 1969. [Online]. Available: http://eudml.org/doc/131927
  • [2] P. Bürgisser, M. Clausen, and M. A. Shokrollahi, Algebraic Complexity Theory, 1st ed. Springer Publishing Company, Incorporated, 2010.
  • [3] V. Strassen, “Relative bilinear complexity and matrix multiplication.” Journal für die reine und angewandte Mathematik, vol. 1987, no. 375-376, pp. 406–443, 1987. [Online]. Available: https://doi.org/10.1515/crll.1987.375-376.406
  • [4] D. Coppersmith and S. Winograd, “Matrix multiplication via arithmetic progressions,” Journal of Symbolic Computation, vol. 9, no. 3, pp. 251–280, 1990, computational algebraic complexity editorial. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0747717108800132
  • [5] V. V. Williams, Y. Xu, Z. Xu, and R. Zhou, “New bounds for matrix multiplication: from alpha to omega,” 2023. [Online]. Available: https://arxiv.org/abs/2307.07970
  • [6] J. Alman, R. Duan, V. V. Williams, Y. Xu, Z. Xu, and R. Zhou, “More asymmetry yields faster matrix multiplication,” 2024. [Online]. Available: https://arxiv.org/abs/2404.16349
  • [7] A. Novikov, N. Vũ, M. Eisenberger, E. Dupont, P.-S. Huang, A. Z. Wagner, S. Shirobokov, B. Kozlovskii, F. J. R. Ruiz, A. Mehrabian, M. P. Kumar, A. See, S. Chaudhuri, G. Holland, A. Davies, S. Nowozin, P. Kohli, and M. Balog, “AlphaEvolve: A coding agent for scientific and algorithmic discovery,” 2025. [Online]. Available: https://arxiv.org/abs/2506.13131
  • [8] C. Ikenmeyer and V. Lysikov, “Strassen’s $2\times 2$ matrix multiplication algorithm: A conceptual perspective,” Annali dell’Università di Ferrara. Sezione 7: Scienze matematiche, vol. 65, Nov. 2019. [Online]. Available: http://arxiv.org/abs/1708.08083
  • [9] P. D’Alberto, “Strassen’s matrix multiplication algorithm is still faster,” 2023. [Online]. Available: https://arxiv.org/abs/2312.12732
  • [10] A. Sedoglavic, “Yet another catalogue of fast matrix multiplication algorithms,” 2025. [Online]. Available: https://fmm.univ-lille.fr/
  • [11] J.-G. Dumas, C. Pernet, and A. Sedoglavic, “A non-commutative algorithm for multiplying 4x4 matrices using 48 non-complex multiplications,” 2025. [Online]. Available: https://arxiv.org/abs/2506.13242
  • [12] J. Alman and H. Yu, “Improving the leading constant of matrix multiplication,” 2024. [Online]. Available: https://arxiv.org/abs/2410.20538
  • [13] S. Winograd, “On multiplication of $2\times 2$ matrices,” Linear Algebra and its Applications, vol. 4, no. 4, pp. 381–388, 1971. [Online]. Available: https://www.sciencedirect.com/science/article/pii/0024379571900097
  • [14] H. F. de Groote, “On varieties of optimal algorithms for the computation of bilinear mappings I. The isotropy group of a bilinear mapping,” Theoretical Computer Science, vol. 7, no. 1, pp. 1–24, 1978. [Online]. Available: https://www.sciencedirect.com/science/article/pii/0304397578900385
  • [15] M. Bläser, “On the complexity of the multiplication of matrices of small formats,” J. Complex., vol. 19, no. 1, pp. 43–60, 2003. [Online]. Available: https://doi.org/10.1016/S0885-064X(02)00007-9
  • [16] L. Chiantini, C. Ikenmeyer, J. M. Landsberg, and G. Ottaviani, “The geometry of rank decompositions of matrix multiplication I: $2\times 2$ matrices,” Experimental Mathematics, vol. 28, no. 3, pp. 322–327, 2019. [Online]. Available: https://doi.org/10.1080/10586458.2017.1403981
  • [17] C. Ikenmeyer and J. Moosbauer, “Strassen’s algorithm via orbit flip graphs,” 2025. [Online]. Available: https://arxiv.org/abs/2503.05467
  • [18] V. P. Burichenko, “On symmetries of the Strassen algorithm,” CoRR, vol. abs/1408.6273, 2014. [Online]. Available: http://arxiv.org/abs/1408.6273
  • [19] J. A. Grochow and C. Moore, “Matrix multiplication algorithms from group orbits,” CoRR, vol. abs/1612.01527, 2016. [Online]. Available: http://arxiv.org/abs/1612.01527
  • [20] ——, “Designing Strassen’s algorithm,” CoRR, vol. abs/1708.09398, 2017. [Online]. Available: http://arxiv.org/abs/1708.09398
  • [21] M. Markl, “GLn-invariant tensors and graphs,” Archivum Mathematicum, vol. 044, no. 5, pp. 449–463, 2008. [Online]. Available: http://eudml.org/doc/250506
  • [22] H. Weyl, The Classical Groups: Their Invariants and Representations. Princeton University Press, 1939.