
Expressiveness and Approximation Properties of Graph Neural Networks

Floris Geerts
Department of Computer Science, University of Antwerp, Belgium
floris.geerts@uantwerpen.be
Juan L. Reutter
School of Engineering, Pontificia Universidad Católica de Chile, Chile & IMFD, Chile
jreutter@ing.puc.cl
Abstract

Characterizing the separation power of graph neural networks (\mathsf{GNN}s) provides an understanding of their limitations for graph learning tasks. Results regarding separation power are, however, usually geared at specific \mathsf{GNN} architectures, and tools for understanding arbitrary \mathsf{GNN} architectures are generally lacking. We provide an elegant way to easily obtain bounds on the separation power of \mathsf{GNN}s in terms of the Weisfeiler-Leman (\mathsf{WL}) tests, which have become the yardstick to measure the separation power of \mathsf{GNN}s. The crux is to view \mathsf{GNN}s as expressions in a procedural tensor language describing the computations in the layers of the \mathsf{GNN}s. Then, by a simple analysis of the obtained expressions, in terms of the number of indices and the nesting depth of summations, bounds on the separation power in terms of the \mathsf{WL} tests readily follow. We use the tensor language to define Higher-Order Message-Passing Neural Networks (or k-\mathsf{MPNN}s), a natural extension of \mathsf{MPNN}s. Furthermore, the tensor language point of view allows for the derivation of universality results for classes of \mathsf{GNN}s in a natural way. Our approach provides a toolbox with which \mathsf{GNN} architecture designers can analyze the separation power of their \mathsf{GNN}s, without needing to know the intricacies of the \mathsf{WL} tests. We also provide insights into what is needed to boost the separation power of \mathsf{GNN}s.

1 Introduction

Graph Neural Networks (𝖦𝖭𝖭s\mathsf{GNN}\text{s}) (Merkwirth & Lengauer, 2005; Scarselli et al., 2009) cover many popular deep learning methods for graph learning tasks (see Hamilton (2020) for a recent overview). These methods typically compute vector embeddings of vertices or graphs by relying on the underlying adjacency information. Invariance (for graph embeddings) and equivariance (for vertex embeddings) of 𝖦𝖭𝖭s\mathsf{GNN}\text{s} ensure that these methods are oblivious to the precise representation of the graphs.

Separation power.

Our primary focus is on the separation power of 𝖦𝖭𝖭\mathsf{GNN} architectures, i.e., on their ability to separate vertices or graphs by means of the computed embeddings. It has become standard to characterize 𝖦𝖭𝖭\mathsf{GNN} architectures in terms of the separation power of graph algorithms such as color refinement (𝖢𝖱\mathsf{CR}) and kk-dimensional Weisfeiler-Leman tests (k-𝖶𝖫)(k\text{-}\mathsf{WL}), as initiated in Xu et al. (2019) and Morris et al. (2019). Unfortunately, understanding the separation power of any given 𝖦𝖭𝖭\mathsf{GNN} architecture requires complex proofs, geared at the specifics of the architecture. We provide a tensor language-based technique to analyze the separation power of general 𝖦𝖭𝖭s\mathsf{GNN}\text{s}.

Tensor languages.

Matrix query languages (Brijder et al., 2019; Geerts et al., 2021b) are defined to assess the expressive power of linear algebra. Balcilar et al. (2021a) observe that, by casting various 𝖦𝖭𝖭s\mathsf{GNN}\text{s} into the 𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG} (Brijder et al., 2019) matrix query language, one can use existing separation results (Geerts, 2021) to obtain upper bounds on the separation power of 𝖦𝖭𝖭s\mathsf{GNN}\text{s} in terms of 1-𝖶𝖫1\text{-}\mathsf{WL} and 2-𝖶𝖫2\text{-}\mathsf{WL}. In this paper, we considerably extend this approach by defining, and studying, a new general-purpose tensor language specifically designed for modeling 𝖦𝖭𝖭s\mathsf{GNN}\text{s}. As in Balcilar et al. (2021a), our focus on tensor languages allows us to obtain new insights about 𝖦𝖭𝖭\mathsf{GNN} architectures. First, since tensor languages can only define invariant and equivariant graph functions, any 𝖦𝖭𝖭\mathsf{GNN} that can be cast in our tensor language inherits these desired properties. More importantly, the separation power of our tensor language is as closely related to 𝖢𝖱\mathsf{CR} and k-𝖶𝖫k\text{-}\mathsf{WL} as 𝖦𝖭𝖭s\mathsf{GNN}\text{s} are. Loosely speaking, if tensor language expressions use k+1k+1 indices, then their separation power is bounded by k-𝖶𝖫k\text{-}\mathsf{WL}. Furthermore, if the maximum nesting of summations in the expression is tt, then tt rounds of k-𝖶𝖫k\text{-}\mathsf{WL} are needed to obtain an upper bound on the separation power. A similar connection is obtained for 𝖢𝖱\mathsf{CR} and a fragment of tensor language that we call “guarded” tensor language.

We thus reduce the problem of assessing the separation power of any specific \mathsf{GNN} architecture to the problem of specifying it in our tensor language, analyzing the number of indices used, and counting the summation depth. This is usually much easier than dealing with the intricacies of \mathsf{CR} and k\text{-}\mathsf{WL}, as casting \mathsf{GNN}s in our tensor language is often as simple as writing down their layer-based definition. We believe that this provides a useful toolbox for \mathsf{GNN} designers to assess the separation power of their architectures. We use this toolbox to recover known results about the separation power of specific \mathsf{GNN} architectures such as \mathsf{GIN}s (Xu et al., 2019), \mathsf{GCN}s (Kipf & Welling, 2017), Folklore \mathsf{GNN}s (Maron et al., 2019b), k-\mathsf{GNN}s (Morris et al., 2019), and several others. We also derive new results: we answer an open problem posed by Maron et al. (2019a) by showing that the separation power of Invariant Graph Networks (k\text{-}\mathsf{IGN}s), introduced by Maron et al. (2019b), is bounded by (k-1)\text{-}\mathsf{WL}. In addition, we revisit the analysis by Balcilar et al. (2021b) of \mathsf{ChebNet} (Defferrard et al., 2016), and show that \mathsf{CayleyNet} (Levie et al., 2019) is bounded by 2\text{-}\mathsf{WL}.

When writing down \mathsf{GNN}s in our tensor language, the fewer indices needed, the stronger the bounds in terms of k\text{-}\mathsf{WL} we obtain. After all, (k-1)\text{-}\mathsf{WL} is known to be strictly less separating than k\text{-}\mathsf{WL} (Otto, 2017). Thus, it is important to minimize the number of indices used in tensor language expressions. We connect this number to the notion of treewidth: expressions of treewidth k can be translated into expressions using k+1 indices. This corresponds to optimizing expressions, as done in many areas of machine learning, by reordering the summations (a.k.a. variable elimination).

Approximation and universality.

We also consider the ability of 𝖦𝖭𝖭s\mathsf{GNN}\text{s} to approximate general invariant or equivariant graph functions. Once more, instead of focusing on specific architectures, we use our tensor languages to obtain general approximation results, which naturally translate to universality results for 𝖦𝖭𝖭s\mathsf{GNN}\text{s}. We show: (k+1)(k+1)-index tensor language expressions suffice to approximate any (invariant/equivariant) graph function whose separating power is bounded by k-𝖶𝖫k\text{-}\mathsf{WL}, and we can further refine this by comparing the number of rounds in k-𝖶𝖫k\text{-}\mathsf{WL} with the summation depth of the expressions. These results provide a finer picture than the one obtained by Azizian & Lelarge (2021). Furthermore, focusing on “guarded” tensor expressions yields a similar universality result for 𝖢𝖱\mathsf{CR}, a result that, to our knowledge, was not known before. We also provide the link between approximation results for tensor expressions and 𝖦𝖭𝖭s\mathsf{GNN}\text{s}, enabling us to transfer our insights into universality properties of 𝖦𝖭𝖭s\mathsf{GNN}\text{s}. As an example, we show that k-𝖨𝖦𝖭sk\text{-}\mathsf{IGN}\text{s} can approximate any graph function that is less separating than (k1)-𝖶𝖫(k-1)\text{-}\mathsf{WL}. This case was left open in Azizian & Lelarge (2021).

In summary, we draw new and interesting connections between tensor languages, 𝖦𝖭𝖭\mathsf{GNN} architectures and classic graph algorithms. We provide a general recipe to bound the separation power of 𝖦𝖭𝖭s\mathsf{GNN}\text{s}, optimize them, and understand their approximation power. We show the usefulness of our method by recovering several recent results, as well as new results, some of them left open in previous work.

Related work.

Separation power has been studied for specific classes of 𝖦𝖭𝖭s\mathsf{GNN}\text{s} (Morris et al., 2019; Xu et al., 2019; Maron et al., 2019b; Chen et al., 2019; Morris et al., 2020; Azizian & Lelarge, 2021). A first general result concerns the bounds in terms of 𝖢𝖱\mathsf{CR} and 1-𝖶𝖫1\text{-}\mathsf{WL} of Message-Passing Neural Networks (Gilmer et al., 2017; Morris et al., 2019; Xu et al., 2019). Balcilar et al. (2021a) use the 𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG} matrix query language to obtain upper bounds on the separation power of various 𝖦𝖭𝖭s\mathsf{GNN}\text{s}. 𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG} can only be used to obtain bounds up to 2-𝖶𝖫2\text{-}\mathsf{WL} and is limited to matrices. Our tensor language is more general and flexible and allows for reasoning over the number of indices, treewidth, and summation depth of expressions. These are all crucial for our main results. The tensor language introduced resembles 𝗌𝗎𝗆\mathsf{sum}-𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG} (Geerts et al., 2021b), but with the added ability to represent tensors. Neither separation power nor guarded fragments were considered in Geerts et al. (2021b). See Section A in the supplementary material for more details. For universality, Azizian & Lelarge (2021) is closest in spirit. Our approach provides an elegant way to recover and extend their results. Azizian & Lelarge (2021) describe how their work (and hence also ours) encompasses previous works (Keriven & Peyré, 2019; Maron et al., 2019c; Chen et al., 2019). Our results use connections between k-𝖶𝖫k\text{-}\mathsf{WL} and logics (Immerman & Lander, 1990; Cai et al., 1992), and 𝖢𝖱\mathsf{CR} and guarded logics (Barceló et al., 2020). The optimization of algebraic computations and the use of treewidth relates to the approaches by Aji & McEliece (2000) and Abo Khamis et al. (2016).

2 Background

We denote sets by {}\{\} and multisets by {{}}\{\!\{\}\!\}. For nn\in\mathbb{N}, n>0n>0, [n]:={1,,n}[n]:=\{1,\ldots,n\}. Vectors are denoted by 𝒗,𝒘,{\bm{v}},{\bm{w}},\ldots, matrices by 𝑨,𝑩,{\bm{A}},{\bm{B}},\ldots, and tensors by 𝑺,𝑻,{\bm{\mathsfit{S}}},{\bm{\mathsfit{T}}},\ldots. Furthermore, vi{v}_{i} is the ii-th entry of vector 𝒗{\bm{v}}, Aij{A}_{ij} is the (i,j)(i,j)-th entry of matrix 𝑨{\bm{A}} and S𝒊{\mathsfit{S}}_{{\bm{i}}} denotes the 𝒊=(i1,,ik){\bm{i}}=(i_{1},\ldots,i_{k})-th entry of a tensor 𝑺{\bm{\mathsfit{S}}}. If certain dimensions are unspecified, then this is denoted by a “::”. For example, 𝑨i:{\bm{A}}_{i:} and 𝑨:j{\bm{A}}_{:j} denote the ii-th row and jj-th column of matrix 𝑨{\bm{A}}, respectively. Similarly for slices of tensors.

We consider undirected simple graphs G=(VG,EG,𝖼𝗈𝗅G)G=(V_{G},E_{G},\mathsf{col}_{G}) equipped with a vertex-labelling 𝖼𝗈𝗅G:VG\mathsf{col}_{G}:V_{G}\to\mathbb{R}^{\ell}. We assume that graphs have size nn, so VGV_{G} consists of nn vertices and we often identify VGV_{G} with [n][n]. For a vertex vVGv\in V_{G}, NG(v):={uVGvuEG}N_{G}(v):=\{u\in V_{G}\mid vu\in E_{G}\}. We let 𝒢\mathcal{G} be the set of all graphs of size nn and let 𝒢s\mathcal{G}_{s} be the set of pairs (G,𝒗)(G,{\bm{v}}) with G𝒢G\in\mathcal{G} and 𝒗VGs{\bm{v}}\in V_{G}^{s}. Note that 𝒢=𝒢0\mathcal{G}=\mathcal{G}_{0}.

The color refinement algorithm (𝖢𝖱)\mathsf{CR}) (Morgan, 1965) iteratively computes vertex labellings based on neighboring vertices, as follows. For a graph GG and vertex vVGv\in V_{G}, 𝖼𝗋(0)(G,v):=𝖼𝗈𝗅G(v)\mathsf{cr}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}(G,v):=\mathsf{col}_{G}(v). Then, for t0t\geq 0, 𝖼𝗋(t+1)(G,v):=(𝖼𝗋(t)(G,v),{{𝖼𝗋(t)(G,u)uNG(v)}})\mathsf{cr}^{\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}(G,v):=\left(\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,v),\{\!\{\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,u)\mid u\in N_{G}(v)\}\!\}\right). We collect all vertex labels to obtain a label for the entire graph by defining 𝗀𝖼𝗋(t)(G):={{𝖼𝗋(t)(G,v)vVG}}\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G):=\{\!\{\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,v)\mid v\in V_{G}\}\!\}. The kk-dimensional Weisfeiler-Leman algorithm (k-𝖶𝖫k\text{-}\mathsf{WL}) (Cai et al., 1992) iteratively computes labellings of kk-tuples of vertices. For a kk-tuple 𝒗{\bm{v}}, its atomic type in GG, denoted by 𝖺𝗍𝗉k(G,𝒗)\mathsf{atp}_{k}(G,{\bm{v}}), is a vector in 2(k2)+k\mathbb{R}^{2\genfrac{(}{)}{0.0pt}{4}{k}{2}+k\ell}. The first (k2)\binom{k}{2} entries are 0/10/1-values encoding the equality type of 𝒗{\bm{v}}, i.e., whether vi=vj{v}_{i}={v}_{j} for 1i<jk1\leq i<j\leq k. The second (k2)\binom{k}{2} entries are 0/10/1-values encoding adjacency information, i.e., whether vivjEG{v}_{i}{v}_{j}\in E_{G} for 1i<jk1\leq i<j\leq k. The last kk\ell real-valued entries correspond to 𝖼𝗈𝗅G(vi)\mathsf{col}_{G}({v}_{i})\in\mathbb{R}^{\ell} for 1ik1\leq i\leq k. Initially, for a graph GG and 𝒗VGk{\bm{v}}\in V_{G}^{k}, k-𝖶𝖫k\text{-}\mathsf{WL} assigns the label 𝗐𝗅k(0)(G,𝒗):=𝖺𝗍𝗉k(G,𝒗)\mathsf{wl}_{k}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}(G,{\bm{v}}):=\mathsf{atp}_{k}(G,{\bm{v}}). For t0t\geq 0, k-𝖶𝖫k\text{-}\mathsf{WL} revises the label according to 𝗐𝗅k(t+1)(G,𝒗):=(𝗐𝗅k(t)(G,𝒗),M)\mathsf{wl}_{k}^{\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}(G,{\bm{v}}):=(\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,{\bm{v}}),M) with M:={{(𝖺𝗍𝗉k+1(G,𝒗u),𝗐𝗅k(t)(G,𝒗[u/1]),,𝗐𝗅k(t)(G,𝒗[u/k]))uVG}}M:=\left\{\!\left\{\left(\mathsf{atp}_{k+1}(G,{\bm{v}}u),\allowbreak\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,{\bm{v}}[u/1]),\ldots,\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,{\bm{v}}[u/k])\right)\mid u\in V_{G}\right\}\!\right\}, where 𝒗[u/i]:=(v1,,vi1,u,vi+1,,vk){\bm{v}}[u/i]:=({v}_{1},\allowbreak\ldots,{v}_{i-1},u,{v}_{i+1},\ldots,{v}_{k}). We use k-𝖶𝖫k\text{-}\mathsf{WL} to assign labels to vertices and graphs by defining: 𝗏𝗐𝗅k(t)(G,v):=𝗐𝗅k(t)(G,(v,,v))\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,v):=\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,(v,\ldots,v)), for vertex-labellings, and 𝗀𝗐𝗅k(t):={{𝗐𝗅k(t)(G,𝒗)𝒗VGk}}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}:=\{\!\{\mathsf{wl}_{k}^{(t)}(G,{\bm{v}})\mid{\bm{v}}\in V_{G}^{k}\}\!\}, for graph-labellings. We use 𝖼𝗋()\mathsf{cr}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}, 𝗀𝖼𝗋()\mathsf{gcr}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}, 𝗏𝗐𝗅k()\mathsf{vwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}, and 𝗀𝗐𝗅k()\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}} to denote the stable labellings produced by the corresponding algorithm over an arbitrary number of rounds. Our version of 1-𝖶𝖫1\text{-}\mathsf{WL} differs from 𝖢𝖱\mathsf{CR} in that 1-𝖶𝖫1\text{-}\mathsf{WL} also uses information from non-adjacent vertices; this distinction only matters for vertex embeddings (Grohe, 2021). 
We use the "folklore" k\text{-}\mathsf{WL} of Cai et al. (1992), except that Cai et al. use 1\text{-}\mathsf{WL} to refer to \mathsf{CR}. While the folklore k\text{-}\mathsf{WL} is equivalent to the "oblivious" (k+1)\text{-}\mathsf{WL} (Grohe, 2021) used in some other works on \mathsf{GNN}s, care is needed when comparing those works to ours.
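To make the refinement step concrete, here is a minimal Python sketch of color refinement on adjacency-list graphs; the helper name, the use of hashable tuples as colors, and the final multiset comparison are illustrative choices, not part of the formal definition above.

```python
from collections import Counter

def color_refinement(adj, init_colors, rounds):
    """Run `rounds` iterations of color refinement (CR).

    adj         : dict mapping each vertex to its set of neighbors
    init_colors : dict mapping each vertex to a hashable initial label
    Returns the vertex labelling and the multiset of labels for the whole graph (gcr-style).
    """
    colors = dict(init_colors)
    for _ in range(rounds):
        colors = {
            v: (colors[v],
                tuple(sorted(Counter(colors[u] for u in adj[v]).items())))
            for v in adj
        }
    return colors, Counter(colors.values())

# Classic example: a 6-cycle versus two disjoint triangles, both with uniform labels.
cycle6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1},
                 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
init = {v: 0 for v in range(6)}
_, label_a = color_refinement(cycle6, init, rounds=3)
_, label_b = color_refinement(two_triangles, init, rounds=3)
print(label_a == label_b)  # True: the graph-level CR labels coincide for these two graphs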

Let G be a graph with V_G=[n] and let \sigma be a permutation of [n]. We denote by \sigma\star G the isomorphic copy of G obtained by applying the permutation \sigma. Similarly, for {\bm{v}}\in V_G^k, \sigma\star{\bm{v}} is the permuted version of {\bm{v}}. Let {\mathbb{F}} be some feature space. A function f:\mathcal{G}_0\to{\mathbb{F}} is called invariant if f(G)=f(\sigma\star G) for any permutation \sigma. More generally, f:\mathcal{G}_s\to{\mathbb{F}} is equivariant if f(\sigma\star G,\sigma\star{\bm{v}})=f(G,{\bm{v}}) for any permutation \sigma. The functions \mathsf{cr}^{(t)}:\mathcal{G}_1\to{\mathbb{F}} and \mathsf{vwl}_k^{(t)}:\mathcal{G}_1\to{\mathbb{F}} are equivariant, whereas \mathsf{gcr}^{(t)}:\mathcal{G}_0\to{\mathbb{F}} and \mathsf{gwl}_k^{(t)}:\mathcal{G}_0\to{\mathbb{F}} are invariant, for any t\geq 0 and k\geq 1.

3 Specifying GNNs

Many \mathsf{GNN}s use linear algebra computations on vectors, matrices or tensors, interleaved with the application of activation functions or \mathsf{MLP}s. To understand the separation power of \mathsf{GNN}s, we introduce a specification language \mathsf{TL} (for tensor language) that allows us to specify any algebraic computation in a procedural way by explicitly stating how each entry is to be computed. We gauge the separation power of \mathsf{GNN}s by specifying them as \mathsf{TL} expressions and syntactically analyzing the components of these expressions. This technique gives rise to Higher-Order Message-Passing Neural Networks (or k-\mathsf{MPNN}s), a natural extension of \mathsf{MPNN}s (Gilmer et al., 2017). For simplicity, we present \mathsf{TL} using summation aggregation only, but arbitrary aggregation functions on multisets of real values can be used as well (Section C.5 in the supplementary material).

To introduce 𝖳𝖫\mathsf{TL}, consider a typical layer in a 𝖦𝖭𝖭\mathsf{GNN} of the form 𝑭=σ(𝑨𝑭𝑾){\bm{F}}^{\prime}=\sigma({\bm{A}}\cdot{\bm{F}}\cdot{\bm{W}}), where 𝑨n×n{\bm{A}}\in\mathbb{R}^{n\times n} is an adjacency matrix, 𝑭n×{\bm{F}}\in\mathbb{R}^{n\times\ell} are vertex features such that 𝑭i:{\bm{F}}_{i:}\in\mathbb{R}^{\ell} is the feature vector of vertex ii, σ\sigma is a non-linear activation function, and 𝑾×{\bm{W}}\in\mathbb{R}^{\ell\times\ell} is a weight matrix. By exposing the indices in the matrices and vectors we can equivalently write: for i[n]i\in[n] and s[]s\in[\ell]:

F'_{is} := \sigma\Bigl(\sum_{j\in[n]} A_{ij}\cdot\bigl(\sum_{t\in[\ell]} W_{ts}\cdot F_{jt}\bigr)\Bigr).

In 𝖳𝖫\mathsf{TL}, we do not work with specific matrices or indices ranging over [n][n], but focus instead on expressions applicable to any matrix. We use index variables x1x_{1} and x2x_{2} instead of ii and jj, replace Aij{A}_{ij} with a placeholder E(x1,x2)E(x_{1},x_{2}) and Fjt{F}_{jt} with placeholders Pt(x2)P_{t}(x_{2}), for t[]t\in[\ell]. We then represent the above computation in 𝖳𝖫\mathsf{TL} by \ell expressions ψs(x1)\psi_{s}(x_{1}), one for each feature column, as follows:

\psi_s(x_1) = \sigma\Bigl(\sum_{x_2} E(x_1,x_2)\cdot\bigl(\sum_{t\in[\ell]} W_{ts}\cdot P_t(x_2)\bigr)\Bigr).

These are purely syntactic expressions. To give them a semantics, we assign to E a matrix {\bm{A}}\in\mathbb{R}^{n\times n}, to P_t the column vectors {\bm{F}}_{:t}\in\mathbb{R}^{n\times 1}, for t\in[\ell], and to x_1 an index i\in[n]. By letting the variable x_2 under the summation range over 1,2,\ldots,n, the \mathsf{TL} expression \psi_s(i) evaluates to F'_{is}. As such, {\bm{F}}'=\sigma({\bm{A}}\cdot{\bm{F}}\cdot{\bm{W}}) can be represented as a specific instance of the above \mathsf{TL} expressions. Throughout the paper we reason about expressions in \mathsf{TL} rather than specific instances thereof. Importantly, by showing that certain properties hold for expressions in \mathsf{TL}, these properties are inherited by all of their instances. We use \mathsf{TL} to enable a theoretical analysis of the separating power of \mathsf{GNN}s; it is not intended as a practical programming language for \mathsf{GNN}s.
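As a quick sanity check on the index-level rewriting, the following NumPy snippet (with arbitrary random A, F and W, and tanh standing in for the activation σ) verifies that evaluating the summation formula entrywise reproduces σ(A·F·W); it is only an illustration of the correspondence, not part of the formalism.

```python
import numpy as np

rng = np.random.default_rng(0)
n, l = 5, 3                      # number of vertices, feature dimension (arbitrary)
A = rng.integers(0, 2, (n, n))   # a 0/1 matrix standing in for the predicate E
F = rng.normal(size=(n, l))      # vertex features (the P_t columns)
W = rng.normal(size=(l, l))      # weight matrix
sigma = np.tanh                  # any activation function

# Matrix form of the layer: F' = sigma(A F W).
F_matrix = sigma(A @ F @ W)

# Index-level form: F'_{is} = sigma( sum_j A_{ij} * sum_t W_{ts} F_{jt} ).
F_index = np.empty((n, l))
for i in range(n):
    for s in range(l):
        F_index[i, s] = sigma(sum(A[i, j] * sum(W[t, s] * F[j, t] for t in range(l))
                                  for j in range(n)))

assert np.allclose(F_matrix, F_index)
```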

Syntax.

We first give the syntax of 𝖳𝖫\mathsf{TL} expressions. We have a binary predicate EE, to represent adjacency matrices, and unary vertex predicates PsP_{s}, s[]s\in[\ell], to represent column vectors encoding the \ell-dimensional vertex labels. In addition, we have a (possibly infinite) set Ω\Omega of functions, such as activation functions or 𝖬𝖫𝖯s\mathsf{MLP}\text{s}. Then, 𝖳𝖫(Ω)\mathsf{TL}(\Omega) expressions are defined by the following grammar:

\varphi := \bm{1}_{x \mathop{\mathsf{op}} y} \;\mid\; E(x,y) \;\mid\; P_s(x) \;\mid\; \varphi\cdot\varphi \;\mid\; \varphi+\varphi \;\mid\; a\cdot\varphi \;\mid\; f(\varphi,\ldots,\varphi) \;\mid\; \sum_x \varphi

where \mathop{\mathsf{op}}\in\{=,\neq\}, x,y are index variables that specify entries in tensors, s\in[\ell], a\in\mathbb{R}, and f\in\Omega. Summation aggregation is captured by \sum_x\varphi. (We can replace \sum_x\varphi by a more general aggregation construct \mathsf{aggr}_x^F(\varphi) for arbitrary functions F that assign a real value to multisets of real values; we refer to the supplementary material (Section C.5) for details.) We sometimes make explicit which functions are used in expressions in \mathsf{TL}(\Omega) by writing \mathsf{TL}(f_1,f_2,\ldots) for f_1,f_2,\ldots in \Omega. For example, the expressions \psi_s(x_1) described earlier are in \mathsf{TL}(\sigma).

The set of free index variables of an expression φ\varphi, denoted by 𝖿𝗋𝖾𝖾(φ)\mathsf{free}(\varphi), determines the order of the tensor represented by φ\varphi. It is defined inductively: 𝖿𝗋𝖾𝖾(𝟏x𝗈𝗉y)=𝖿𝗋𝖾𝖾(E(x,y)):={x,y}\mathsf{free}(\bm{1}_{x\mathop{\mathsf{op}}y})=\mathsf{free}(E(x,y)):=\{x,y\}, 𝖿𝗋𝖾𝖾(Ps(x))={x}\mathsf{free}(P_{s}(x))=\{x\}, 𝖿𝗋𝖾𝖾(φ1φ2)=𝖿𝗋𝖾𝖾(φ1+φ2):=𝖿𝗋𝖾𝖾(φ1)𝖿𝗋𝖾𝖾(φ2)\mathsf{free}(\varphi_{1}\cdot\varphi_{2})=\mathsf{free}(\varphi_{1}+\varphi_{2}):=\mathsf{free}(\varphi_{1})\cup\mathsf{free}(\varphi_{2}), 𝖿𝗋𝖾𝖾(aφ1):=𝖿𝗋𝖾𝖾(φ1)\mathsf{free}(a\cdot\varphi_{1}):=\mathsf{free}(\varphi_{1}), 𝖿𝗋𝖾𝖾(f(φ1,,φp)):=i[p]𝖿𝗋𝖾𝖾(φi)\mathsf{free}(f(\varphi_{1},\ldots,\varphi_{p})):=\cup_{i\in[p]}\mathsf{free}(\varphi_{i}), and 𝖿𝗋𝖾𝖾(xφ1):=𝖿𝗋𝖾𝖾(φ1){x}\mathsf{free}(\sum_{x}\varphi_{1}):=\mathsf{free}(\varphi_{1})\setminus\{x\}. We sometimes explicitly write the free indices. In our example expressions ψs(x1)\psi_{s}(x_{1}), x1x_{1} is the free index variable.

An important class of expressions are those that only use index variables {x1,,xk}\{x_{1},\ldots,x_{k}\}. We denote by 𝖳𝖫k(Ω)\mathsf{TL}_{k}(\Omega) the kk-index variable fragment of 𝖳𝖫(Ω)\mathsf{TL}(\Omega). The expressions ψs(x1)\psi_{s}(x_{1}) are in 𝖳𝖫2(σ)\mathsf{TL}_{2}(\sigma).

Semantics.

We next define the semantics of expressions in 𝖳𝖫(Ω)\mathsf{TL}(\Omega). Let G=(VG,EG,𝖼𝗈𝗅G)G=(V_{G},E_{G},\mathsf{col}_{G}) be a vertex-labelled graph. We start by defining the interpretation [[,ν]]G[\![\cdot,\nu]\!]_{G} of the predicates EE, PsP_{s} and the (dis)equality predicates, relative to GG and a valuation ν\nu assigning a vertex to each index variable:

[\![E(x,y),\nu]\!]_G := \text{if } \nu(x)\nu(y)\in E_G \text{ then } 1 \text{ else } 0
[\![P_s(x),\nu]\!]_G := \mathsf{col}_G(\nu(x))_s \in \mathbb{R}
[\![\bm{1}_{x \mathop{\mathsf{op}} y},\nu]\!]_G := \text{if } \nu(x)\mathop{\mathsf{op}}\nu(y) \text{ then } 1 \text{ else } 0.

In other words, EE is interpreted as the adjacency matrix of GG and the PsP_{s}’s interpret the vertex-labelling 𝖼𝗈𝗅G\mathsf{col}_{G}. Furthermore, we lift interpretations to arbitrary expressions in 𝖳𝖫(Ω)\mathsf{TL}(\Omega), as follows:

[[φ1φ2,ν]]G:=[[φ1,ν]]G[[φ2,ν]]G[[φ1+φ2,ν]]G:=[[φ1,ν]]G+[[φ2,ν]]G[[xφ1,ν]]G:=vVG[[φ1,ν[xv]]]G[[aφ1,ν]]G:=a[[φ1,ν]]G[[f(φ1,,φp),ν]]G:=f([[φ1,ν]]G,,[[φp,ν]]G)\begin{array}[]{rcl@{\hspace*{3.7mm}}rcl}[\![\varphi_{1}\cdot\varphi_{2},\nu]\!]_{G}&\!\!\!\!:=\!\!\!\!\!&[\![\varphi_{1},\nu]\!]_{G}\cdot[\![\varphi_{2},\nu]\!]_{G}\hfil\hskip 10.5275pt&\!\![\![\varphi_{1}\!+\!\varphi_{2},\nu]\!]_{G}&\!\!\!\!\!\!:=\!\!\!\!\!\!&[\![\varphi_{1},\nu]\!]_{G}+[\![\varphi_{2},\nu]\!]_{G}\\ [\![\sum_{x}\,\varphi_{1},\nu]\!]_{G}&\!\!\!\!:=\!\!\!\!\!&\sum_{v\in V_{G}}[\![\varphi_{1},\nu[x\mapsto v]]\!]_{G}\hfil\hskip 10.5275pt&[\![a\cdot\varphi_{1},\nu]\!]_{G}&\!\!\!\!:=\!\!\!\!&a\cdot[\![\varphi_{1},\nu]\!]_{G}\\ \!\![\![f(\varphi_{1},\ldots,\varphi_{p}),\nu]\!]_{G}&\!\!\!\!:=\!\!\!\!\!\!&f([\![\varphi_{1},\nu]\!]_{G},\ldots,[\![\varphi_{p},\nu]\!]_{G})\hfil\hskip 10.5275pt\end{array}

where \nu[x\mapsto v] denotes the valuation \nu updated so that the index variable x is mapped to the vertex v\in V_G. For simplicity, we identify valuations with their images. For example, [\![\varphi(x),v]\!]_G denotes [\![\varphi(x),x\mapsto v]\!]_G. To illustrate the semantics, for each v\in V_G, our example expressions satisfy [\![\psi_s,v]\!]_G=F'_{vs} for {\bm{F}}'=\sigma({\bm{A}}\cdot{\bm{F}}\cdot{\bm{W}}) when {\bm{A}} is the adjacency matrix of G and {\bm{F}} represents the vertex labels.
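To make the semantics concrete, here is a small interpreter for a fragment of TL over an explicit graph; the nested-tuple encoding of expressions and all helper names are our own illustrative choices rather than a prescribed syntax. The final lines evaluate the example expression ψ_s(x_1) in the special case ℓ = 1 with W the identity and σ = tanh.

```python
import numpy as np

def evaluate(expr, G, nu):
    """Evaluate a TL expression over graph G = (A, col) under valuation nu.

    A   : n x n adjacency matrix;  col : n x ell matrix of vertex labels;
    nu  : dict mapping index-variable names to vertices in range(n).
    Expressions are nested tuples, e.g. ('sum', 'x2', ('mul', ('E','x1','x2'), ('P', 0, 'x2'))).
    ('P', s, x) uses a 0-based column index s here.
    """
    A, col = G
    op = expr[0]
    if op == 'E':
        return float(A[nu[expr[1]], nu[expr[2]]])
    if op == 'P':
        return float(col[nu[expr[2]], expr[1]])
    if op == 'eq':
        return float(nu[expr[1]] == nu[expr[2]])
    if op == 'neq':
        return float(nu[expr[1]] != nu[expr[2]])
    if op == 'mul':
        return evaluate(expr[1], G, nu) * evaluate(expr[2], G, nu)
    if op == 'add':
        return evaluate(expr[1], G, nu) + evaluate(expr[2], G, nu)
    if op == 'scale':
        return expr[1] * evaluate(expr[2], G, nu)
    if op == 'fun':
        return expr[1](*[evaluate(e, G, nu) for e in expr[2]])
    if op == 'sum':                      # sum_x phi: x ranges over all vertices of G
        x, body = expr[1], expr[2]
        return sum(evaluate(body, G, dict(nu, **{x: v})) for v in range(A.shape[0]))
    raise ValueError(f'unknown operator {op!r}')

# psi(x1) = tanh( sum_{x2} E(x1,x2) * P_1(x2) )  (W = identity, ell = 1).
psi = ('fun', np.tanh, [('sum', 'x2', ('mul', ('E', 'x1', 'x2'), ('P', 0, 'x2')))])
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
col = np.array([[1.0], [2.0], [3.0]])
print([evaluate(psi, (A, col), {'x1': v}) for v in range(3)])
```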

kk-MPNNs.

Consider a function f:\mathcal{G}_s\to\mathbb{R}^{\ell}:(G,{\bm{v}})\mapsto f(G,{\bm{v}})\in\mathbb{R}^{\ell} for some \ell\in\mathbb{N}. We say that the function f can be represented in \mathsf{TL}(\Omega) if there exist \ell expressions \varphi_1(x_1,\ldots,x_s),\ldots,\varphi_{\ell}(x_1,\ldots,x_s) in \mathsf{TL}(\Omega) such that for each graph G and each s-tuple {\bm{v}}\in V_G^s:

f(G,{\bm{v}}) = \bigl([\![\varphi_1,{\bm{v}}]\!]_G,\ldots,[\![\varphi_{\ell},{\bm{v}}]\!]_G\bigr).

Of particular interest are k-th order \mathsf{MPNN}s (or k-\mathsf{MPNN}s), which refer to the class of functions that can be represented in \mathsf{TL}_{k+1}(\Omega). We can regard \mathsf{GNN}s as functions f:\mathcal{G}_s\to\mathbb{R}^{\ell}. Hence, a \mathsf{GNN} is a k-\mathsf{MPNN} if its corresponding functions are k-\mathsf{MPNN}s. For example, we can interpret {\bm{F}}'=\sigma({\bm{A}}\cdot{\bm{F}}\cdot{\bm{W}}) as a function f:\mathcal{G}_1\to\mathbb{R}^{\ell} such that f(G,v):={\bm{F}}'_{v:}. We have seen that for each s\in[\ell], [\![\psi_s,v]\!]_G=F'_{vs} with \psi_s\in\mathsf{TL}_2(\sigma). Hence, f(G,v)=([\![\psi_1,v]\!]_G,\ldots,[\![\psi_{\ell},v]\!]_G), and thus f is a 1-\mathsf{MPNN} and our example \mathsf{GNN} is a 1-\mathsf{MPNN}.

TL represents equivariant or invariant functions.

We make a simple observation which follows from the type of operators allowed in expressions in 𝖳𝖫(Ω)\mathsf{TL}(\Omega).

Proposition 3.1.

Any function f:𝒢sf:\mathcal{G}_{s}\to\mathbb{R}^{\ell} represented in 𝖳𝖫(Ω)\mathsf{TL}(\Omega) is equivariant (invariant if s=0s=0).

An immediate consequence is that when a 𝖦𝖭𝖭\mathsf{GNN} is a kk-𝖬𝖯𝖭𝖭\mathsf{MPNN}, it is automatically invariant or equivariant, depending on whether graph or vertex tuple embeddings are considered.

4 Separation Power of Tensor Languages

Our first main results concern the characterization of the separation power of tensor languages in terms of the color refinement and kk-dimensional Weisfeiler-Leman algorithms. We provide a fine-grained characterization by taking the number of rounds of these algorithms into account. This will allow for measuring the separation power of classes of 𝖦𝖭𝖭s\mathsf{GNN}\text{s} in terms of their number of layers.

4.1 Separation Power

We define the separation power of graph functions in terms of an equivalence relation, based on the definition from Azizian & Lelarge (2021), focusing first on their ability to separate vertices. (We differ slightly from Azizian & Lelarge (2021) in that they only define equivalence relations on graphs.)

Definition 1.

Let \mathcal{F} be a set of functions f:𝒢1ff:\mathcal{G}_{1}\to\mathbb{R}^{\ell_{f}}. The equivalence relation ρ1()\rho_{1}(\mathcal{F}) is defined by \mathcal{F} on 𝒢1\mathcal{G}_{1} as follows: ((G,v),(H,w))ρ1()f,f(G,v)=f(H,w)\bigl{(}(G,v),(H,w)\bigr{)}\in\rho_{1}(\mathcal{F})\Longleftrightarrow\forall f\in\mathcal{F},f(G,v)=f(H,w). ∎

In other words, when ((G,v),(H,w))ρ1()((G,v),(H,w))\in\rho_{1}(\mathcal{F}), no function in \mathcal{F} can separate vv in GG from ww in HH. For example, we can view 𝖼𝗋(t)\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} and 𝗏𝗐𝗅k(t)\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} as functions from 𝒢1\mathcal{G}_{1} to some \mathbb{R}^{\ell}. As such ρ1(𝖼𝗋(t))\rho_{1}(\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}) and ρ1(𝗏𝗐𝗅k(t))\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}) measure the separation power of these algorithms. The following strict inclusions are known: for all k1k\geq 1, ρ1(𝗏𝗐𝗅k+1(t))ρ1(𝗏𝗐𝗅k(t))\rho_{1}(\mathsf{vwl}_{k+1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subset\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}) and ρ1(𝗏𝗐𝗅1(t))ρ1(𝖼𝗋(t))\rho_{1}(\mathsf{vwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subset\rho_{1}(\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}) (Otto, 2017; Grohe, 2021). It is also known that more rounds (tt) increase the separation power of these algorithms (Fürer, 2001).

For a fragment \mathcal{L} of 𝖳𝖫(Ω)\mathsf{TL}(\Omega) expressions, we define ρ1()\rho_{1}(\mathcal{L}) as the equivalence relation associated with all functions f:𝒢1ff:\mathcal{G}_{1}\to\mathbb{R}^{\ell_{f}} that can be represented in \mathcal{L}. By definition, we here thus consider expressions in 𝖳𝖫(Ω)\mathsf{TL}(\Omega) with one free index variable resulting in vertex embeddings.

4.2 Main Results

We first provide a link between k-𝖶𝖫k\text{-}\mathsf{WL} and tensor language expressions using k+1k+1 index variables:

Theorem 4.1.

For each k1k\geq 1 and any collection Ω\Omega of functions, ρ1(𝗏𝗐𝗅k())=ρ1(𝖳𝖫k+1(Ω))\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\bigr{)}=\rho_{1}\bigl{(}\mathsf{TL}_{k+1}(\Omega)\bigr{)}.

This theorem gives us new insights: if we wish to understand how a new 𝖦𝖭𝖭\mathsf{GNN} architecture compares against the k-𝖶𝖫k\text{-}\mathsf{WL} algorithms, all we need to do is to show that such an architecture can be represented in 𝖳𝖫k+1(Ω)\mathsf{TL}_{k+1}(\Omega), i.e., is a kk-𝖬𝖯𝖭𝖭\mathsf{MPNN}, an arguably much easier endeavor. As an example of how to use this result, it is well known that triangles can be detected by 2-𝖶𝖫2\text{-}\mathsf{WL} but not by 1-𝖶𝖫1\text{-}\mathsf{WL}. Thus, in order to design 𝖦𝖭𝖭s\mathsf{GNN}\text{s} that can detect triangles, layer definitions in 𝖳𝖫3\mathsf{TL}_{3} rather than 𝖳𝖫2\mathsf{TL}_{2} should be used.

We can do much more, relating the rounds of k\text{-}\mathsf{WL} to the notion of summation depth of \mathsf{TL}(\Omega) expressions. We also present similar results for functions computing graph embeddings.

The summation depth 𝗌𝖽(φ)\mathsf{sd}(\varphi) of a 𝖳𝖫(Ω)\mathsf{TL}(\Omega) expression φ\varphi measures the nesting depth of the summations x\sum_{x} in the expression. It is defined inductively: 𝗌𝖽(𝟏x𝗈𝗉y)=𝗌𝖽(E(x,y))=𝗌𝖽(Ps(x)):=0\mathsf{sd}(\bm{1}_{x\mathop{\mathsf{op}}y})=\mathsf{sd}(E(x,y))=\mathsf{sd}(P_{s}(x)):=0, 𝗌𝖽(φ1φ2)=𝗌𝖽(φ1+φ2):=𝗆𝖺𝗑{𝗌𝖽(φ1),𝗌𝖽(φ2)}\mathsf{sd}(\varphi_{1}\cdot\varphi_{2})=\mathsf{sd}(\varphi_{1}+\varphi_{2}):=\mathsf{max}\{\mathsf{sd}(\varphi_{1}),\mathsf{sd}(\varphi_{2})\}, 𝗌𝖽(aφ1):=𝗌𝖽(φ1)\mathsf{sd}(a\cdot\varphi_{1}):=\mathsf{sd}(\varphi_{1}), 𝗌𝖽(f(φ1,,φp)):=𝗆𝖺𝗑{𝗌𝖽(φi)|i[p]}\mathsf{sd}(f(\varphi_{1},\ldots,\varphi_{p})):=\mathsf{max}\{\mathsf{sd}(\varphi_{i})|i\in[p]\}, and 𝗌𝖽(xφ1):=𝗌𝖽(φ1)+1\mathsf{sd}(\sum_{x}\varphi_{1}):=\mathsf{sd}(\varphi_{1})+1. For example, expressions ψs(x1)\psi_{s}(x_{1}) above have summation depth one. We write 𝖳𝖫k+1(t)(Ω)\mathsf{TL}_{k+1}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega) for the class of expressions in 𝖳𝖫k+1(Ω)\mathsf{TL}_{k+1}(\Omega) of summation depth at most tt, and use kk-𝖬𝖯𝖭𝖭(t)\mathsf{MPNN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} for the corresponding class of kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s}. We can now refine Theorem 4.1, taking into account the number of rounds used in k-𝖶𝖫k\text{-}\mathsf{WL}.
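Under the same nested-tuple encoding of expressions used in the interpreter sketch above, summation depth is a direct recursion; the helper below is again an illustrative aid, not part of TL itself. The example expression θ(x_1) is the one discussed after Proposition 4.5.

```python
def summation_depth(expr):
    """Summation depth sd(phi) of a TL expression in the nested-tuple encoding."""
    op = expr[0]
    if op in ('E', 'P', 'eq', 'neq'):
        return 0
    if op in ('mul', 'add'):
        return max(summation_depth(expr[1]), summation_depth(expr[2]))
    if op == 'scale':
        return summation_depth(expr[2])
    if op == 'fun':
        return max(summation_depth(e) for e in expr[2])
    if op == 'sum':
        return 1 + summation_depth(expr[2])
    raise ValueError(f'unknown operator {op!r}')

# theta(x1) = sum_{x2} sum_{x3} E(x1,x2) * E(x2,x3) has summation depth 2.
theta = ('sum', 'x2', ('sum', 'x3', ('mul', ('E', 'x1', 'x2'), ('E', 'x2', 'x3'))))
assert summation_depth(theta) == 2
```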

Theorem 4.2.

For all t0t\geq 0, k1k\geq 1 and any collection Ω\Omega of functions, ρ1(𝗏𝗐𝗅k(t))=ρ1(𝖳𝖫k+1(t)(Ω))\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{1}\bigl{(}\mathsf{TL}_{k+1}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}.

Guarded TL and color refinement.

As noted by Barceló et al. (2020), the separation power of vertex embeddings of simple 𝖦𝖭𝖭s\mathsf{GNN}\text{s}, which propagate information only through neighboring vertices, is usually weaker than that of 1-𝖶𝖫1\text{-}\mathsf{WL}. For these types of architectures, Barceló et al. (2020) provide a relation with the weaker color refinement algorithm, but only in the special case of first-order classifiers. We can recover and extend this result in our general setting, with a guarded version of 𝖳𝖫\mathsf{TL} which, as we will show, has the same separation power as color refinement.

The guarded fragment \mathsf{GTL}(\Omega) of \mathsf{TL}_2(\Omega) is inspired by the use of adjacency matrices in simple \mathsf{GNN}s. In \mathsf{GTL}(\Omega), only the equality predicates \bm{1}_{x_i=x_i} (constant 1) and \bm{1}_{x_i\neq x_i} (constant 0) are allowed, addition and multiplication require the component expressions to have the same (single) free index, and summation must occur in a guarded form \sum_{x_j}\bigl(E(x_i,x_j)\cdot\varphi(x_j)\bigr), for i,j\in[2]. Guardedness means that summation only happens over neighbors. In \mathsf{GTL}(\Omega), all expressions have a single free variable and thus only functions from \mathcal{G}_1 can be represented. Our example expressions \psi_s(x_1) are guarded. The fragment \mathsf{GTL}^{(t)}(\Omega) consists of expressions in \mathsf{GTL}(\Omega) of summation depth at most t. We denote by \mathsf{MPNN}s and \mathsf{MPNN}s^{(t)} the corresponding "guarded" classes of 1-\mathsf{MPNN}s. (For the connection to classical \mathsf{MPNN}s (Gilmer et al., 2017), see Section H in the supplementary material.)

Theorem 4.3.

For all t0t\geq 0 and any collection Ω\Omega of functions: ρ1(𝖼𝗋(t))=ρ1(𝖦𝖳𝖫(t)(Ω))\rho_{1}\bigl{(}\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{1}\bigl{(}\mathsf{GTL}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}.

As an application of this theorem, to detect the existence of paths of length t, \mathsf{GNN}s need enough guarded layers to admit a representation in \mathsf{GTL}(\Omega) of summation depth at least t. We recall that \rho_1(\mathsf{vwl}_1^{(t)})\subset\rho_1(\mathsf{cr}^{(t)}) which, combined with our previous results, implies that \mathsf{TL}_2^{(t)}(\Omega) (resp., 1-\mathsf{MPNN}s) is strictly more separating than \mathsf{GTL}^{(t)}(\Omega) (resp., \mathsf{MPNN}s).

Graph embeddings.

We next establish connections between the graph versions of k-𝖶𝖫k\text{-}\mathsf{WL} and 𝖢𝖱\mathsf{CR}, and 𝖳𝖫\mathsf{TL} expressions without free index variables. To this aim, we use ρ0()\rho_{0}(\mathcal{F}), for a set \mathcal{F} of functions f:𝒢ff:\mathcal{G}\to\mathbb{R}^{\ell_{f}}, as the equivalence relation over 𝒢\mathcal{G} defined in analogy to ρ1\rho_{1}: (G,H)ρ0()f,f(G)=f(H)(G,H)\in\rho_{0}(\mathcal{F})\Longleftrightarrow\forall f\in\mathcal{F},f(G)=f(H). We thus consider separation power on the graph level. For example, we can consider ρ0(𝗀𝖼𝗋(t))\rho_{0}(\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}) and ρ0(𝗀𝗐𝗅k(t))\rho_{0}(\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}) for any t0t\geq 0 and k1k\geq 1. Also here, ρ0(𝗀𝗐𝗅k+1(t))ρ0(𝗀𝗐𝗅k(t))\rho_{0}(\mathsf{gwl}_{k+1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subset\rho_{0}(\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}) but different from vertex embeddings, ρ0(𝗀𝖼𝗋(t))=ρ0(𝗀𝗐𝗅1(t))\rho_{0}(\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})=\rho_{0}(\mathsf{gwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}) (Grohe, 2021). We define ρ0()\rho_{0}(\mathcal{L}) for a fragment \mathcal{L} of 𝖳𝖫(Ω)\mathsf{TL}(\Omega) by considering expressions without free index variables.

The connection between the number of index variables in expressions and k\text{-}\mathsf{WL} continues to hold. Apart from k=1, however, no clean relationship exists between summation depth and rounds. (Indeed, the best one can obtain for general tensor language expressions is \rho_0\bigl(\mathsf{TL}_{k+1}^{(t+k)}(\Omega)\bigr)\subseteq\rho_0\bigl(\mathsf{gwl}_k^{(t)}\bigr)\subseteq\rho_0\bigl(\mathsf{TL}_{k+1}^{(t+1)}(\Omega)\bigr). This follows from Cai et al. (1992) and connections to finite variable logics.)

Theorem 4.4.

For all t0t\geq 0, k1k\geq 1 and any collection Ω\Omega of functions, we have that:

(1) \quad \rho_0\bigl(\mathsf{gcr}^{(t)}\bigr) = \rho_0\bigl(\mathsf{TL}_2^{(t+1)}(\Omega)\bigr) = \rho_0\bigl(\mathsf{gwl}_1^{(t)}\bigr) \qquad\qquad (2) \quad \rho_0\bigl(\mathsf{gwl}_k^{(\infty)}\bigr) = \rho_0\bigl(\mathsf{TL}_{k+1}(\Omega)\bigr).

Intuitively, in (1) the increase in summation depth by one is incurred by the additional aggregation needed to collect all vertex labels computed by 𝗀𝗐𝗅1(t)\mathsf{gwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}.

Optimality of number of indices.

Our results so far tell us that graph functions represented in \mathsf{TL}_{k+1}(\Omega) are at most as separating as k\text{-}\mathsf{WL}. What is left unaddressed is whether all k+1 index variables are needed for the graph functions under consideration. It may well be, for example, that there exists an equivalent expression using fewer index variables. This would imply a stronger upper bound on the separation power by \ell\text{-}\mathsf{WL} for \ell<k. We next identify a large class of \mathsf{TL}(\Omega) expressions, those of treewidth k, for which the number of index variables can be reduced to k+1.

Proposition 4.5.

Expressions in 𝖳𝖫(Ω)\mathsf{TL}(\Omega) of treewidth kk are equivalent to expressions in 𝖳𝖫k+1(Ω)\mathsf{TL}_{k+1}(\Omega).

Treewidth is defined in the supplementary material (Section G); a treewidth of k implies that the computation of tensor language expressions can be decomposed, by reordering summations, such that each local computation requires at most k+1 indices (see also Aji & McEliece (2000)). As a simple example, consider \theta(x_1)=\sum_{x_2}\sum_{x_3}E(x_1,x_2)\cdot E(x_2,x_3) in \mathsf{TL}_3^{(2)}, such that [\![\theta,v]\!]_G counts the number of paths of length two starting from v. This expression has treewidth one. Indeed, it is equivalent to the expression \tilde{\theta}(x_1)=\sum_{x_2}E(x_1,x_2)\cdot\bigl(\sum_{x_1}E(x_2,x_1)\bigr) in \mathsf{TL}_2^{(2)} (and in fact in \mathsf{GTL}^{(2)}). As a consequence, no more vertices can be separated by \theta(x_1) than by \mathsf{cr}^{(2)}, rather than by \mathsf{vwl}_2^{(2)} as the original expression in \mathsf{TL}_3^{(2)} suggests.
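The equivalence of θ and θ̃ is easy to confirm numerically: on any adjacency matrix, both evaluate at a vertex v to the number of length-two walks starting at v. The snippet below (random graph, variable names are ours) simply spells out the two summation orders.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.integers(0, 2, (n, n))
A = np.triu(A, 1); A = A + A.T          # a random undirected simple graph

# theta(x1)  = sum_{x2} sum_{x3} E(x1,x2) * E(x2,x3)
theta = np.array([sum(A[v, u] * A[u, w] for u in range(n) for w in range(n))
                  for v in range(n)])

# theta~(x1) = sum_{x2} E(x1,x2) * ( sum_{x1} E(x2,x1) )   -- x1 is reused inside
theta_tilde = np.array([sum(A[v, u] * sum(A[u, w] for w in range(n)) for u in range(n))
                        for v in range(n)])

assert np.array_equal(theta, theta_tilde)   # same vertex labelling; two index variables suffice
```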

On the impact of functions.

All separation results for \mathsf{TL}(\Omega) and fragments thereof hold regardless of the chosen functions in \Omega, including when no functions are present at all. Function applications hence do not add expressive power. While this may seem counter-intuitive, it is because the summation and multiplication already present in \mathsf{TL} suffice to separate graphs or vertices.

5 Consequences for GNNs

We next interpret the general results on the separation power from Section 4 in the context of 𝖦𝖭𝖭s\mathsf{GNN}\text{s}.

1. The separation power of any vertex embedding 𝖦𝖭𝖭\mathsf{GNN} architecture which is an 𝖬𝖯𝖭𝖭(t)\mathsf{MPNN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} is bounded by the power of tt rounds of color refinement.

We consider Graph Isomorphism Networks (\mathsf{GIN}s) (Xu et al., 2019) and show that these are \mathsf{MPNN}s. To do so, we represent them in \mathsf{GTL}(\Omega). Let \mathsf{gin} be such a network; it updates vertex embeddings as follows. Initially, \mathsf{gin}^{(0)}:\mathcal{G}_1\to\mathbb{R}^{\ell_0}:(G,v)\mapsto{\bm{F}}^{(0)}_{v:}:=\mathsf{col}_G(v)\in\mathbb{R}^{\ell_0}. For layer t>0, \mathsf{gin}^{(t)}:\mathcal{G}_1\to\mathbb{R}^{\ell_t} is given by (G,v)\mapsto{\bm{F}}^{(t)}_{v:}:=\mathsf{mlp}^{(t)}\bigl({\bm{F}}^{(t-1)}_{v:},\sum_{u\in N_G(v)}{\bm{F}}^{(t-1)}_{u:}\bigr), with {\bm{F}}^{(t)}\in\mathbb{R}^{n\times\ell_t} and where \mathsf{mlp}^{(t)}=(\mathsf{mlp}^{(t)}_1,\ldots,\mathsf{mlp}^{(t)}_{\ell_t}):\mathbb{R}^{2\ell_{t-1}}\to\mathbb{R}^{\ell_t} is an \mathsf{MLP}. We denote by \mathsf{GIN}^{(t)} the class of \mathsf{GIN}s consisting of t layers. Clearly, \mathsf{gin}^{(0)} can be represented in \mathsf{GTL}^{(0)} by considering the expressions \varphi_i^{(0)}(x_1):=P_i(x_1) for each i\in[\ell_0]. To represent \mathsf{gin}^{(t)}, assume that we have \ell_{t-1} expressions \varphi_i^{(t-1)}(x_1) in \mathsf{GTL}^{(t-1)}(\Omega) representing \mathsf{gin}^{(t-1)}. That is, we have [\![\varphi_i^{(t-1)},v]\!]_G=F^{(t-1)}_{vi} for each vertex v and i\in[\ell_{t-1}]. Then \mathsf{gin}^{(t)} is represented by \ell_t expressions \varphi_i^{(t)}(x_1) defined as:

\mathsf{mlp}^{(t)}_i\Bigl(\varphi_1^{(t-1)}(x_1),\ldots,\varphi_{\ell_{t-1}}^{(t-1)}(x_1),\ \sum_{x_2}E(x_1,x_2)\cdot\varphi_1^{(t-1)}(x_2),\ldots,\sum_{x_2}E(x_1,x_2)\cdot\varphi_{\ell_{t-1}}^{(t-1)}(x_2)\Bigr),

which are now expressions in \mathsf{GTL}^{(t)}(\Omega) where \Omega consists of \mathsf{MLP}s. We have [\![\varphi_i^{(t)},v]\!]_G=F^{(t)}_{vi} for each v\in V_G and i\in[\ell_t], as desired. Hence, Theorem 4.3 tells us that t-layered \mathsf{GIN}s cannot be more separating than t rounds of color refinement, in accordance with known results (Xu et al., 2019; Morris et al., 2019). We thus simply cast \mathsf{GIN}s in \mathsf{GTL}(\Omega) to obtain an upper bound on their separation power. In the supplementary material (Section D) we give similar analyses for GraphSage \mathsf{GNN}s with various aggregation functions (Hamilton et al., 2017), \mathsf{GCN}s (Kipf & Welling, 2017), simplified \mathsf{GCN}s (\mathsf{SGC}s) (Wu et al., 2019), Principal Neighbourhood Aggregation (\mathsf{PNA}s) (Corso et al., 2020), and we revisit the analysis of \mathsf{ChebNet} (Defferrard et al., 2016) given in Balcilar et al. (2021a).
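To illustrate how directly the GTL expressions above mirror the layer definition, here is a sketch of one GIN layer computed via guarded (neighbor-only) summation; the random two-layer perceptron standing in for mlp^{(t)}, the helper names, and all dimensions are arbitrary choices for illustration.

```python
import numpy as np

def gin_layer(A, F, mlp):
    """One GIN layer: F'_{v:} = mlp(F_{v:}, sum_{u in N(v)} F_{u:}).

    A : n x n adjacency matrix;  F : n x l feature matrix;
    mlp : function from R^{2l} to R^{l'}, applied row-wise.
    The concatenated input matches the 2*l_{t-1} arguments of the GTL expressions.
    """
    neighbor_sum = A @ F                       # guarded summation over neighbors
    return np.stack([mlp(np.concatenate([F[v], neighbor_sum[v]]))
                     for v in range(F.shape[0])])

# A toy MLP (random weights) and a toy graph, purely for illustration.
rng = np.random.default_rng(2)
W1, W2 = rng.normal(size=(8, 6)), rng.normal(size=(4, 8))
mlp = lambda z: W2 @ np.maximum(W1 @ z, 0.0)   # ReLU MLP: R^6 -> R^4
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
F = rng.normal(size=(3, 3))
print(gin_layer(A, F, mlp).shape)              # (3, 4)
```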

2. The separation power of any vertex embedding \mathsf{GNN} architecture which is a k-\mathsf{MPNN}^{(t)} is bounded by the power of t rounds of k\text{-}\mathsf{WL}.

For k=1, we consider extended Graph Isomorphism Networks (\mathsf{eGIN}s) (Barceló et al., 2020). For an \mathsf{egin}\in\mathsf{eGIN}, \mathsf{egin}^{(0)}:\mathcal{G}_1\to\mathbb{R}^{\ell_0} is defined as for \mathsf{GIN}s, but for layer t>0, \mathsf{egin}^{(t)}:\mathcal{G}_1\to\mathbb{R}^{\ell_t} is defined by (G,v)\mapsto{\bm{F}}^{(t)}_{v:}:=\mathsf{mlp}^{(t)}\bigl({\bm{F}}^{(t-1)}_{v:},\sum_{u\in N_G(v)}{\bm{F}}^{(t-1)}_{u:},\sum_{u\in V_G}{\bm{F}}^{(t-1)}_{u:}\bigr), where \mathsf{mlp}^{(t)} is now an \mathsf{MLP} from \mathbb{R}^{3\ell_{t-1}} to \mathbb{R}^{\ell_t}. The difference with \mathsf{GIN}s is the use of \sum_{u\in V_G}{\bm{F}}^{(t-1)}_{u:}, which corresponds to the unguarded summation \sum_{x_1}\varphi^{(t-1)}(x_1). This implies that \mathsf{TL} rather than \mathsf{GTL} needs to be used. In a similar way as for \mathsf{GIN}s, we can represent \mathsf{eGIN} layers in \mathsf{TL}_2^{(t)}(\Omega). That is, each \mathsf{eGIN}^{(t)} is a 1-\mathsf{MPNN}^{(t)}. Theorem 4.2 tells us that t rounds of 1\text{-}\mathsf{WL} bound the separation power of t-layered extended \mathsf{GIN}s, in accordance with Barceló et al. (2020). More generally, any \mathsf{GNN} looking to go beyond \mathsf{CR} must use non-guarded aggregations.

For k\geq 2, it is straightforward to show that t-layered "folklore" \mathsf{GNN}s (k\text{-}\mathsf{FGNN}s) (Maron et al., 2019b) are k-\mathsf{MPNN}s^{(t)} and thus, by Theorem 4.2, t rounds of k\text{-}\mathsf{WL} bound their separation power. One merely needs to cast the layer definitions in \mathsf{TL}(\Omega) and observe that k+1 indices and summation depth t are needed. We thus refine and recover the k\text{-}\mathsf{WL} bound for k\text{-}\mathsf{FGNN}s by Azizian & Lelarge (2021). We also show that the separation power of (k+1)-Invariant Graph Networks ((k+1)\text{-}\mathsf{IGN}s) (Maron et al., 2019b) is bounded by k\text{-}\mathsf{WL}, albeit with an increase in the required number of rounds.

Theorem 5.1.

For any k\geq 1, the separation power of a t-layered (k+1)\text{-}\mathsf{IGN} is bounded by the separation power of tk rounds of k\text{-}\mathsf{WL}.

We hereby answer open problem 1 in Maron et al. (2019a). The case k=1 was solved in Chen et al. (2020) by analyzing properties of 1\text{-}\mathsf{WL}. By contrast, Theorem 4.2 shows that one can focus on expressing (k+1)\text{-}\mathsf{IGN}s in \mathsf{TL}_{k+1}(\Omega) and analyzing the summation depth of the expressions. The proof of Theorem 5.1 requires non-trivial manipulations of tensor language expressions; it is a simplified proof of Geerts (2020). The additional rounds (tk) are needed because (k+1)\text{-}\mathsf{IGN}s aggregate information in one layer that only becomes accessible to k\text{-}\mathsf{WL} after k rounds. We defer details to Section E in the supplementary material, where we also identify a simple class of t-layered (k+1)\text{-}\mathsf{IGN}s that are as powerful as (k+1)\text{-}\mathsf{IGN}s but whose separation power is bounded by t rounds of k\text{-}\mathsf{WL}.

We also consider “augmented” 𝖦𝖭𝖭s\mathsf{GNN}\text{s}, which are combined with a preprocessing step in which higher-order graph information is computed. In the supplementary material (Section D.3) we show how 𝖳𝖫\mathsf{TL} encodes the preprocessing step, and how this leads to separation bounds in terms of k-𝖶𝖫k\text{-}\mathsf{WL}, where kk depends on the treewidth of the graph information used. Finally, our approach can also be used to show that the spectral 𝖢𝖺𝗒𝗅𝖾𝗒𝖭𝖾𝗍\mathsf{CayleyNet}s (Levie et al., 2019) are bounded in separation power by 2-𝖶𝖫2\text{-}\mathsf{WL}. This result complements the spectral analysis of 𝖢𝖺𝗒𝗅𝖾𝗒𝖭𝖾𝗍\mathsf{CayleyNet}s given in Balcilar et al. (2021b).

3. The separation power of any graph embedding 𝖦𝖭𝖭\mathsf{GNN} architecture which is a kk-𝖬𝖯𝖭𝖭\mathsf{MPNN} is bounded by the power of k-𝖶𝖫k\text{-}\mathsf{WL}.

Graph embedding methods are commonly obtained from vertex (tuple) embeddings methods by including a readout layer in which all vertex (tuple) embeddings are aggregated. For example, 𝗆𝗅𝗉(vV𝖾𝗀𝗂𝗇(t)(G,v))\mathsf{mlp}(\sum_{v\in V}\mathsf{egin}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,v)) is a typical readout layer for 𝖾𝖦𝖨𝖭s\mathsf{e}\mathsf{GIN}\text{s} . Since 𝖾𝗀𝗂𝗇(t)\mathsf{egin}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} can be represented in 𝖳𝖫2(t)(Ω)\mathsf{TL}_{2}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega), the readout layer can be represented in 𝖳𝖫2(t+1)(Ω)\mathsf{TL}_{2}^{\!\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}(\Omega), using an extra summation. So they are 11-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s}. Hence, their separation power is bounded by 𝗀𝗐𝗅1(t)\mathsf{gwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}, in accordance with Theorem 4.4. This holds more generally. If vertex embedding methods are kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s}, then so are their graph versions, which are then bounded by 𝗀𝗐𝗅k()\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}} by our Theorem 4.4.

4. To go beyond the separation power of k-𝖶𝖫k\text{-}\mathsf{WL}, it is necessary to use 𝖦𝖭𝖭s\mathsf{GNN}\text{s} whose layers are represented by expressions of treewidth >k>k.

Hence, to design expressive \mathsf{GNN}s one needs to define the layers such that the treewidth of the resulting \mathsf{TL} expressions is large enough. For example, to go beyond 1\text{-}\mathsf{WL}, \mathsf{TL}_3-representable linear algebra operations should be used. Treewidth also sheds light on the open problem from Maron et al. (2019a), where it was asked whether polynomial layers (in {\bm{A}}) increase the separation power. Indeed, consider a layer of the form \sigma({\bm{A}}^3\cdot{\bm{F}}\cdot{\bm{W}}), which raises the adjacency matrix {\bm{A}} to the power three. Translated into \mathsf{TL}(\Omega), the layer expressions resemble \sum_{x_2}\sum_{x_3}\sum_{x_4}E(x_1,x_2)\cdot E(x_2,x_3)\cdot E(x_3,x_4), of treewidth one. Proposition 4.5 tells us that the layer is bounded by \mathsf{wl}_1^{(3)} (and in fact by \mathsf{cr}^{(3)}) in separation power. Suppose instead that the layer is of the form \sigma({\bm{C}}\cdot{\bm{F}}\cdot{\bm{W}}), where C_{ij} holds the number of cliques containing the edge ij. Then, in \mathsf{TL}(\Omega) we get expressions containing \sum_{x_2}\sum_{x_3}E(x_1,x_2)\cdot E(x_1,x_3)\cdot E(x_2,x_3). The variables form a 3-clique, resulting in expressions of treewidth two. As a consequence, the separation power will be bounded by \mathsf{wl}_2^{(2)}. These examples show that it is not the number of multiplications (in both cases two) that gives power; it is how the variables are connected to each other.
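The contrast between the two layers can be made concrete (reading "cliques" here as 3-cliques, which is what the 3-clique pattern of variables computes): A^3 arises from chained summations of treewidth one, whereas the per-edge triangle counts C realize the treewidth-two pattern. A small NumPy illustration on an arbitrary graph:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]])          # a 4-vertex graph with two triangles

# Treewidth-one pattern: entries of A^3 come from chaining sum_{x2} sum_{x3} sum_{x4}.
A3 = A @ A @ A

# Treewidth-two pattern: C_{ij} = sum_k A_{ik} A_{jk} A_{ij}
#                               = number of triangles (3-cliques) containing the edge ij.
C = (A @ A) * A

print(A3)
print(C)   # e.g. edge (1,2) lies in two triangles: {0,1,2} and {1,2,3}
```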

6 Function Approximation

We next provide characterizations of functions that can be approximated by 𝖳𝖫\mathsf{TL} expressions, when interpreted as functions. We recover and extend results from Azizian & Lelarge (2021) by taking the number of layers of 𝖦𝖭𝖭s\mathsf{GNN}\text{s} into account. We also provide new results related to color refinement.

6.1 General TL Approximation Results

We assume that 𝒢s\mathcal{G}_{s} is a compact space by requiring that vertex labels come from a compact set K0K\subseteq\mathbb{R}^{\ell_{0}}. Let \mathcal{F} be a set of functions f:𝒢sff:\mathcal{G}_{s}\to\mathbb{R}^{\ell_{f}} and define its closure ¯\overline{\mathcal{F}} as all functions hh from 𝒢s\mathcal{G}_{s} for which there exists a sequence f1,f2,f_{1},f_{2},\ldots\in\mathcal{F} such that limi𝗌𝗎𝗉G,𝒗fi(G,𝒗)h(G,𝒗)=0\lim_{i\to\infty}\mathsf{sup}_{G,{\bm{v}}}\|f_{i}(G,{\bm{v}})-h(G,{\bm{v}})\|=0 for some norm .\|.\|. We assume \mathcal{F} to satisfy two properties. First, \mathcal{F} is concatenation-closed: if f1:𝒢spf_{1}:\mathcal{G}_{s}\to\mathbb{R}^{p} and f2:𝒢sqf_{2}:\mathcal{G}_{s}\to\mathbb{R}^{q} are in \mathcal{F}, then g:=(f1,f2):𝒢sp+q:(G,𝒗)(f1(G,𝒗),f2(G,𝒗))g:=(f_{1},f_{2}):\mathcal{G}_{s}\to\mathbb{R}^{p+q}:(G,{\bm{v}})\mapsto(f_{1}(G,{\bm{v}}),f_{2}(G,{\bm{v}})) is also in \mathcal{F}. Second, \mathcal{F} is function-closed, for a fixed \ell\in\mathbb{N}: for any ff\in\mathcal{F} such that f:𝒢spf:\mathcal{G}_{s}\to\mathbb{R}^{p}, also gf:𝒢sg\circ f:\mathcal{G}_{s}\to\mathbb{R}^{\ell} is in \mathcal{F} for any continuous function g:pg:\mathbb{R}^{p}\to\mathbb{R}^{\ell}. For such \mathcal{F}, we let \mathcal{F}_{\ell} be the subset of functions in \mathcal{F} from 𝒢s\mathcal{G}_{s} to \mathbb{R}^{\ell}. Our next result is based on a generalized Stone-Weierstrass Theorem (Timofte, 2005), also used in Azizian & Lelarge (2021).

Theorem 6.1.

For any \ell, and any set \mathcal{F} of functions, concatenation and function closed for \ell, we have: ¯={f:𝒢sρs()ρs(f)}\overline{\mathcal{F}_{\ell}}=\{f:\mathcal{G}_{s}\to\mathbb{R}^{\ell}\mid\rho_{s}(\mathcal{F})\subseteq\rho_{s}(f)\}.

This result gives us insight into which functions can be approximated by, for example, a set \mathcal{F} of functions originating from a class of \mathsf{GNN}s. In this case, \overline{\mathcal{F}_{\ell}} represents all functions approximated by instances of this class, and Theorem 6.1 tells us that this set corresponds precisely to the set of all functions that are equally or less separating than the \mathsf{GNN}s in this class. If, in addition, \mathcal{F}_{\ell} is more separating than \mathsf{CR} or k\text{-}\mathsf{WL}, then we can say more. Let \mathsf{alg}\in\{\mathsf{cr}^{(t)},\mathsf{gcr}^{(t)},\mathsf{vwl}_k^{(t)},\mathsf{gwl}_k^{(\infty)}\}.

Corollary 6.2.

Under the assumptions of Theorem 6.1 and if ρ()=ρ(𝖺𝗅𝗀)\rho(\mathcal{F}_{\ell})=\rho(\mathsf{alg}), then ¯={f:𝒢sρ(𝖺𝗅𝗀)ρ(f)}\overline{\mathcal{F}_{\ell}}=\{f:\mathcal{G}_{s}\to\mathbb{R}^{\ell}\mid\rho(\mathsf{alg})\subseteq\rho(f)\}.

The properties of being concatenation-closed and function-closed are satisfied for sets of functions representable in our tensor languages, if Ω\Omega contains all continuous functions g:pg:\mathbb{R}^{p}\to\mathbb{R}^{\ell}, for any pp, or alternatively, all 𝖬𝖫𝖯s\mathsf{MLP}\text{s} (by Lemma 32 in Azizian & Lelarge (2021)). Together with our results in Section 4, the corollary implies that 𝖬𝖯𝖭𝖭s(t)\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}, 11-𝖬𝖯𝖭𝖭s(t)\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}, kk-𝖬𝖯𝖭𝖭s(t)\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} or kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} can approximate all functions that are no more separating than 𝖼𝗋(t)\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}, 𝗀𝖼𝗋(t)\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}, 𝗏𝗐𝗅k(t)\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} or 𝗀𝗐𝗅k()\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}, respectively.

Prop. 3.1 also tells us that the closure consists of invariant (s=0s=0) and equivariant (s>0s>0) functions.
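Viewed operationally, the two closure properties amount to two simple combinators on graph functions; the following minimal sketch (function names and representation are ours, with graph functions modelled as Python callables taking a graph encoding and a vertex tuple) illustrates them.

    import numpy as np

    # Minimal sketch of the two closure properties assumed of the class F.
    # f1, f2 stand for hypothetical functions in F, mapping (G, v) to numpy vectors.

    def concatenate(f1, f2):
        # concatenation-closure: (f1, f2)(G, v) = (f1(G, v), f2(G, v))
        return lambda G, v: np.concatenate([f1(G, v), f2(G, v)])

    def post_compose(g, f):
        # function-closure: apply a continuous map g (e.g. an MLP) to the output of f
        return lambda G, v: g(f(G, v))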

6.2 Consequences for GNNs

All our results combined provide a recipe to guarantee that a given function can be approximated by 𝖦𝖭𝖭\mathsf{GNN} architectures. Indeed, suppose that your class of 𝖦𝖭𝖭s\mathsf{GNN}\text{s} is an 𝖬𝖯𝖭𝖭(t)\mathsf{MPNN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} (respectively, 11-𝖬𝖯𝖭𝖭(t)\mathsf{MPNN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}, kk-𝖬𝖯𝖭𝖭(t)\mathsf{MPNN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} or kk-𝖬𝖯𝖭𝖭\mathsf{MPNN}, for some k1k\geq 1). Then, since most classes of 𝖦𝖭𝖭s\mathsf{GNN}\text{s} are concatenation-closed and allow the application of arbitrary 𝖬𝖫𝖯s\mathsf{MLP}\text{s}, your 𝖦𝖭𝖭s\mathsf{GNN}\text{s} can only approximate functions ff that are no more separating than 𝖼𝗋(t)\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} (respectively, 𝗀𝖼𝗋(t)\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}, 𝗏𝗐𝗅k(t)\mathsf{vwl}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{k} or 𝗀𝗐𝗅k()\mathsf{gwl}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}_{k}). To guarantee that these functions can indeed be approximated, one additionally has to show that your class of 𝖦𝖭𝖭s\mathsf{GNN}\text{s} matches the corresponding labeling algorithm in separation power.

For example, 𝖦𝖭𝖭s\mathsf{GNN}\text{s} in 𝖦𝖨𝖭(t)\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} are 𝖬𝖯𝖭𝖭s(t)\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}, and thus 𝖦𝖨𝖭(t)¯\overline{\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}} contains any function f:𝒢1f:\mathcal{G}_{1}\to\mathbb{R}^{\ell} satisfying ρ1(𝖼𝗋(t))ρ1(f)\rho_{1}(\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(f). Similarly, 𝖾𝖦𝖨𝖭(t)\mathsf{e}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}s are 11-𝖬𝖯𝖭𝖭s(t)\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}, so 𝖾𝖦𝖨𝖭(t)¯\overline{\mathsf{e}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}} contains any function satisfying ρ1(𝗐𝗅1(t))ρ1(f)\rho_{1}(\mathsf{wl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(f); and when extended with a readout layer, their closures consist of functions f:𝒢0f:\mathcal{G}_{0}\to\mathbb{R}^{\ell} satisfying ρ0(𝗀𝖼𝗋(t))=ρ0(𝗏𝗐𝗅1(t))ρ0(f)\rho_{0}(\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})=\rho_{0}(\mathsf{vwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{0}(f). Finally, k-𝖥𝖦𝖭𝖭(t)k\text{-}\mathsf{FGNN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}s are kk-𝖬𝖯𝖭𝖭s(t)\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}, so k-𝖥𝖦𝖭𝖭(t)¯\overline{k\text{-}\mathsf{FGNN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}} consists of functions ff such that ρ1(𝗏𝗐𝗅k(t))ρ1(f)\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(f). We thus recover and extend results by Azizian & Lelarge (2021) by including layer information (tt) and by treating color refinement separately from 1-𝖶𝖫1\text{-}\mathsf{WL} for vertex embeddings. Furthermore, Theorem 5.1 implies that (k+1)-𝖨𝖦𝖭¯\overline{(k+1)\text{-}\mathsf{IGN}_{\ell}} consists of functions ff satisfying ρ1(𝗏𝗐𝗅k())ρ1(f)\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}})\subseteq\rho_{1}(f) and ρ0(𝗀𝗐𝗅k())ρ0(f)\rho_{0}(\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}})\subseteq\rho_{0}(f), a case left open in Azizian & Lelarge (2021).

These results follow from Corollary 6.2, from the fact that the respective classes of 𝖦𝖭𝖭s\mathsf{GNN}\text{s} can simulate 𝖢𝖱\mathsf{CR} or k-𝖶𝖫k\text{-}\mathsf{WL} on graphs with either discrete (Xu et al., 2019; Barceló et al., 2020) or continuous labels (Maron et al., 2019b), and from the fact that they are kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} of the appropriate form.

7 Conclusion

Connecting 𝖦𝖭𝖭s\mathsf{GNN}\text{s} and tensor languages allows us to use our analysis of tensor languages to understand the separation and approximation power of 𝖦𝖭𝖭s\mathsf{GNN}\text{s}. The number of indices and summation depth needed to represent the layers in 𝖦𝖭𝖭s\mathsf{GNN}\text{s} determine their separation power in terms of color refinement and Weisfeiler-Leman tests. The framework of kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} provides a handy toolbox to understand existing and new 𝖦𝖭𝖭\mathsf{GNN} architectures, and we demonstrate this by recovering several results about the power of 𝖦𝖭𝖭s\mathsf{GNN}\text{s} presented recently in the literature, as well as proving new results.

8 Acknowledgements & Disclosure of Funding

This work is partially funded by ANID–Millennium Science Initiative Program–Code ICN17_002, Chile.

References

  • Abo Khamis et al. (2016) Mahmoud Abo Khamis, Hung Q. Ngo, and Atri Rudra. FAQ: Questions Asked Frequently. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS, pp.  13–28. ACM, 2016. URL https://doi.org/10.1145/2902251.2902280.
  • Aji & McEliece (2000) Srinivas M. Aji and Robert J. McEliece. The generalized distributive law. IEEE Transactions on Information Theory, 46(2):325–343, 2000. URL https://doi.org/10.1109/18.825794.
  • Azizian & Lelarge (2021) Waiss Azizian and Marc Lelarge. Expressive power of invariant and equivariant graph neural networks. In Proceedings of the 9th International Conference on Learning Representations, ICLR, 2021. URL https://openreview.net/forum?id=lxHgXYN4bwl.
  • Balcilar et al. (2021a) Muhammet Balcilar, Pierre Héroux, Benoit Gaüzère, Pascal Vasseur, Sébastien Adam, and Paul Honeine. Breaking the limits of message passing graph neural networks. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp.  599–608. PMLR, 2021a. URL http://proceedings.mlr.press/v139/balcilar21a.html.
  • Balcilar et al. (2021b) Muhammet Balcilar, Guillaume Renton, Pierre Héroux, Benoit Gaüzère, Sébastien Adam, and Paul Honeine. Analyzing the expressive power of graph neural networks in a spectral perspective. In Proceedings of the 9th International Conference on Learning Representations, ICLR, 2021b. URL https://openreview.net/forum?id=-qh0M9XWxnv.
  • Barceló et al. (2020) Pablo Barceló, Egor V Kostylev, Mikael Monet, Jorge Pérez, Juan Reutter, and Juan Pablo Silva. The logical expressiveness of graph neural networks. In Proceedings of the 8th International Conference on Learning Representations, ICLR, 2020. URL https://openreview.net/forum?id=r1lZ7AEKvB.
  • Barceló et al. (2021) Pablo Barceló, Floris Geerts, Juan L. Reutter, and Maksimilian Ryschkov. Graph neural networks with local graph parameters. In Advances in Neural Information Processing Systems, volume 34, 2021. URL https://proceedings.neurips.cc/paper/2021/hash/d4d8d1ac7e00e9105775a6b660dd3cbb-Abstract.html.
  • Bodnar et al. (2021) Cristian Bodnar, Fabrizio Frasca, Yuguang Wang, Nina Otter, Guido F. Montúfar, Pietro Lió, and Michael M. Bronstein. Weisfeiler and Lehman go topological: Message passing simplicial networks. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp.  1026–1037. PMLR, 2021. URL http://proceedings.mlr.press/v139/bodnar21a.html.
  • Bouritsas et al. (2020) Giorgos Bouritsas, Fabrizio Frasca, Stefanos Zafeiriou, and Michael M. Bronstein. Improving graph neural network expressivity via subgraph isomorphism counting. In Graph Representation Learning and Beyond (GRL+) Workshop at the 37 th International Conference on Machine Learning, 2020. URL https://arxiv.org/abs/2006.09252.
  • Brijder et al. (2019) Robert Brijder, Floris Geerts, Jan Van den Bussche, and Timmy Weerwag. On the expressive power of query languages for matrices. ACM TODS, 44(4):15:1–15:31, 2019. URL https://doi.org/10.1145/3331445.
  • Bruna et al. (2014) Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. In Proceedings of the 2nd International Conference on Learning Representations, ICLR, 2014. URL https://openreview.net/forum?id=DQNsQf-UsoDBa.
  • Cai et al. (1992) Jin-yi Cai, Martin Fürer, and Neil Immerman. An optimal lower bound on the number of variables for graph identifications. Comb., 12(4):389–410, 1992. URL https://doi.org/10.1007/BF01305232.
  • Chen et al. (2019) Zhengdao Chen, Soledad Villar, Lei Chen, and Joan Bruna. On the equivalence between graph isomorphism testing and function approximation with GNNs. In Advances in Neural Information Processing Systems, volume 32, 2019. URL https://proceedings.neurips.cc/paper/2019/file/71ee911dd06428a96c143a0b135041a4-Paper.pdf.
  • Chen et al. (2020) Zhengdao Chen, Lei Chen, Soledad Villar, and Joan Bruna. Can graph neural networks count substructures? In Advances in Neural Information Processing Systems, volume 33, 2020. URL https://proceedings.neurips.cc/paper/2020/file/75877cb75154206c4e65e76b88a12712-Paper.pdf.
  • Corso et al. (2020) Gabriele Corso, Luca Cavalleri, Dominique Beaini, Pietro Liò, and Petar Veličković. Principal neighbourhood aggregation for graph nets. In Advances in Neural Information Processing Systems, volume 33, 2020. URL https://proceedings.neurips.cc/paper/2020/file/99cad265a1768cc2dd013f0e740300ae-Paper.pdf.
  • Csanky (1976) L. Csanky. Fast parallel matrix inversion algorithms. SIAM J. Comput., 5(4):618–623, 1976. URL https://doi.org/10.1137/0205040.
  • Curticapean et al. (2017) Radu Curticapean, Holger Dell, and Dániel Marx. Homomorphisms are a good basis for counting small subgraphs. In Proceedings of the 49th Symposium on Theory of Computing, STOC, pp.  210–223, 2017. URL http://dx.doi.org/10.1145/3055399.3055502.
  • Damke et al. (2020) Clemens Damke, Vitalik Melnikov, and Eyke Hüllermeier. A novel higher-order weisfeiler-lehman graph convolution. In Proceedings of The 12th Asian Conference on Machine Learning, ACML, volume 129 of Proceedings of Machine Learning Research, pp.  49–64. PMLR, 2020. URL http://proceedings.mlr.press/v129/damke20a.html.
  • Defferrard et al. (2016) Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, volume 30, 2016. URL https://proceedings.neurips.cc/paper/2016/file/04df4d434d481c5bb723be1b6df1ee65-Paper.pdf.
  • Fürer (2001) Martin Fürer. Weisfeiler-Lehman refinement requires at least a linear number of iterations. In Proceedings of the 28th International Colloqium on Automata, Languages and Programming, ICALP, volume 2076 of Lecture Notes in Computer Science, pp.  322–333. Springer, 2001. URL https://doi.org/10.1007/3-540-48224-5_27.
  • Geerts (2020) Floris Geerts. The expressive power of kth-order invariant graph networks. CoRR, abs/2007.12035, 2020. URL https://arxiv.org/abs/2007.12035.
  • Geerts (2021) Floris Geerts. On the expressive power of linear algebra on graphs. Theory Comput. Syst., 65(1):179–239, 2021. URL https://doi.org/10.1007/s00224-020-09990-9.
  • Geerts et al. (2021a) Floris Geerts, Filip Mazowiecki, and Guillermo A. Pérez. Let’s agree to degree: Comparing graph convolutional networks in the message-passing framework. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp.  3640–3649. PMLR, 2021a. URL http://proceedings.mlr.press/v139/geerts21a.html.
  • Geerts et al. (2021b) Floris Geerts, Thomas Muñoz, Cristian Riveros, and Domagoj Vrgoc. Expressive power of linear algebra query languages. In Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS, pp.  342–354. ACM, 2021b. URL https://doi.org/10.1145/3452021.3458314.
  • Gilmer et al. (2017) Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pp.  1263–1272, 2017. URL http://proceedings.mlr.press/v70/gilmer17a/gilmer17a.pdf.
  • Grohe (2021) Martin Grohe. The logic of graph neural networks. In Proceedings of the 36th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS, pp.  1–17. IEEE, 2021. URL https://doi.org/10.1109/LICS52264.2021.9470677.
  • Hamilton (2020) William L. Hamilton. Graph representation learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 14(3):1–159, 2020. URL https://doi.org/10.2200/S01045ED1V01Y202009AIM046.
  • Hamilton et al. (2017) William L. Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, volume 30, 2017. URL https://proceedings.neurips.cc/paper/2017/file/5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf.
  • Hammond et al. (2011) David K. Hammond, Pierre Vandergheynst, and Rémi Gribonval. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30(2):129–150, 2011. ISSN 1063-5203. doi: https://doi.org/10.1016/j.acha.2010.04.005. URL https://www.sciencedirect.com/science/article/pii/S1063520310000552.
  • Immerman & Lander (1990) Neil Immerman and Eric Lander. Describing graphs: A first-order approach to graph canonization. In Complexity Theory Retrospective: In Honor of Juris Hartmanis on the Occasion of His Sixtieth Birthday, pp.  59–81. Springer, 1990. URL https://doi.org/10.1007/978-1-4612-4478-3_5.
  • Keriven & Peyré (2019) Nicolas Keriven and Gabriel Peyré. Universal invariant and equivariant graph neural networks. In Advances in Neural Information Processing Systems, volume 32, pp.  7092–7101, 2019. URL https://proceedings.neurips.cc/paper/2019/file/ea9268cb43f55d1d12380fb6ea5bf572-Paper.pdf.
  • Kipf & Welling (2017) Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations, ICLR, 2017. URL https://openreview.net/pdf?id=SJU4ayYgl.
  • Levie et al. (2019) Ron Levie, Federico Monti, Xavier Bresson, and Michael M. Bronstein. Cayleynets: Graph convolutional neural networks with complex rational spectral filters. IEEE Trans. Signal Process., 67(1):97–109, 2019. URL https://doi.org/10.1109/TSP.2018.2879624.
  • Maron et al. (2019a) Haggai Maron, Heli Ben-Hamu, and Yaron Lipman. Open problems: Approximation power of invariant graph networks. In NeurIPS 2019 Graph Representation Learning Workshop, 2019a. URL https://grlearning.github.io/papers/31.pdf.
  • Maron et al. (2019b) Haggai Maron, Heli Ben-Hamu, Hadar Serviansky, and Yaron Lipman. Provably powerful graph networks. In Advances in Neural Information Processing Systems, volume 32, 2019b. URL https://proceedings.neurips.cc/paper/2019/file/bb04af0f7ecaee4aae62035497da1387-Paper.pdf.
  • Maron et al. (2019c) Haggai Maron, Heli Ben-Hamu, Nadav Shamir, and Yaron Lipman. Invariant and equivariant graph networks. In Proceedings of the 7th International Conference on Learning Representations, ICLR, 2019c. URL https://openreview.net/forum?id=Syx72jC9tm.
  • Merkwirth & Lengauer (2005) Christian Merkwirth and Thomas Lengauer. Automatic generation of complementary descriptors with molecular graph networks. J. Chem. Inf. Model., 45(5):1159–1168, 2005. URL https://doi.org/10.1021/ci049613b.
  • Morgan (1965) H. L. Morgan. The generation of a unique machine description for chemical structures-a technique developed at chemical abstracts service. Journal of Chemical Documentation, 5(2):107–113, 1965. URL https://doi.org/10.1021/c160017a018.
  • Morris et al. (2019) Christopher Morris, Martin Ritzert, Matthias Fey, William L. Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and Leman go neural: Higher-order graph neural networks. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, pp.  4602–4609, 2019. URL https://doi.org/10.1609/aaai.v33i01.33014602.
  • Morris et al. (2020) Christopher Morris, Gaurav Rattan, and Petra Mutzel. Weisfeiler and Leman go sparse: Towards scalable higher-order graph embeddings. In Advances in Neural Information Processing Systems, volume 33, 2020. URL https://proceedings.neurips.cc//paper/2020/file/f81dee42585b3814de199b2e88757f5c-Paper.pdf.
  • Otto (2017) Martin Otto. Bounded Variable Logics and Counting: A Study in Finite Models, volume 9 of Lecture Notes in Logic. Cambridge University Press, 2017. URL https://doi.org/10.1017/9781316716878.
  • Otto (2019) Martin Otto. Graded modal logic and counting bisimulation. ArXiv, 2019. URL https://arxiv.org/abs/1910.00039.
  • Scarselli et al. (2009) Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Trans. Neural Networks, 20(1):61–80, 2009. URL https://doi.org/10.1109/TNN.2008.2005605.
  • Timofte (2005) Vlad Timofte. Stone–Weierstrass theorems revisited. Journal of Approximation Theory, 136(1):45–59, 2005. URL https://doi.org/10.1016/j.jat.2005.05.004.
  • Velickovic et al. (2018) Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In Proceedings of the 6th International Conference on Learning Representations, ICLR, 2018. URL https://openreview.net/forum?id=rJXMpikCZ.
  • Wu et al. (2019) Felix Wu, Amauri H. Souza Jr., Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Q. Weinberger. Simplifying graph convolutional networks. In Proceedings of the 36th International Conference on Machine Learning, ICML, volume 97 of Proceedings of Machine Learning Research, pp.  6861–6871. PMLR, 2019. URL http://proceedings.mlr.press/v97/wu19e.html.
  • Xu et al. (2019) Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In Proceedings of the 7th International Conference on Learning Representations, ICLR, 2019. URL https://openreview.net/forum?id=ryGs6iA5Km.
  • Zaheer et al. (2017) Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Russ R Salakhutdinov, and Alexander J Smola. Deep sets. In Advances in Neural Information Processing Systems, volume 30, 2017. URL https://proceedings.neurips.cc/paper/2017/file/f22e4747da1aa27e363d86d40ff442fe-Paper.pdf.

Supplementary Material

Appendix A Related Work Cnt’d

We provide additional details on how the tensor language 𝖳𝖫(Ω)\mathsf{TL}(\Omega) considered in this paper relates to recent work on other matrix query languages. Closest to 𝖳𝖫(Ω)\mathsf{TL}(\Omega) is the matrix query language 𝗌𝗎𝗆\mathsf{sum}-𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG} (Geerts et al., 2021b), whose syntax is close to that of 𝖳𝖫(Ω)\mathsf{TL}(\Omega). There are, however, key differences. First, although 𝗌𝗎𝗆\mathsf{sum}-𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG} uses index variables (called vector variables), they all must occur under a summation. In other words, the concept of free index variables is missing, which implies that no general tensors can be represented. In 𝖳𝖫(Ω)\mathsf{TL}(\Omega), we can represent arbitrary tensors, and the presence of free index variables is crucial to define vertex or, more generally, kk-tuple embeddings in the context of 𝖦𝖭𝖭s\mathsf{GNN}\text{s}. Furthermore, no notion of summation depth was introduced for 𝗌𝗎𝗆\mathsf{sum}-𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG}. In 𝖳𝖫(Ω)\mathsf{TL}(\Omega), the summation depth is crucial to assess the separation power in terms of the number of rounds of color refinement and k-𝖶𝖫k\text{-}\mathsf{WL}. In fact, the separation power of 𝗌𝗎𝗆\mathsf{sum}-𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG} had not been considered before, nor had finite variable fragments of 𝗌𝗎𝗆\mathsf{sum}-𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG} and their connections to color refinement and k-𝖶𝖫k\text{-}\mathsf{WL} been studied. Finally, no other aggregation functions were considered for 𝗌𝗎𝗆\mathsf{sum}-𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG}. We detail in Section C.5 that 𝖳𝖫(Ω)\mathsf{TL}(\Omega) can be gracefully extended to 𝖳𝖫(Ω,Θ)\mathsf{TL}(\Omega,\Theta) for some arbitrary set Θ\Theta of aggregation functions.

Connections to 1-𝖶𝖫1\text{-}\mathsf{WL} and 2-𝖶𝖫2\text{-}\mathsf{WL} and the separation power of another matrix query language, 𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG} (Brijder et al., 2019) were established in Geerts (2021). Yet, the design of 𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG} is completely different in spirit than that of 𝖳𝖫(Ω)\mathsf{TL}(\Omega). Indeed, 𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG} does not have index variables or explicit summation aggregation. Instead, it only supports matrix multiplication, matrix transposition, function applications, and turning a vector into a diagonal matrix. As such, 𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG} can be shown to be included in 𝖳𝖫3(Ω)\mathsf{TL}_{3}(\Omega). Similarly as for 𝗌𝗎𝗆\mathsf{sum}-𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG}, 𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG} cannot represent general tensors, has no (free) index variables and summation depth is not considered (in view of the absence of an explicit summation).

We also emphasize that neither for 𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG} nor for 𝗌𝗎𝗆\mathsf{sum}-𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG} a guarded fragment was considered. The guarded fragment is crucial to make connections to color refinement (Theorem 4.3). Furthermore, the analysis in terms of the number of index variables, summation depth and treewidth (Theorems 4.1,4.2 and Proposition 4.5), were not considered before in the matrix query language literature. For none of these matrix query languages, approximation results were considered (Section 6.1).

Matrix query languages are used to assess the expressive power of linear algebra. Balcilar et al. (2021a) use 𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG} and the above-mentioned connections to 1-𝖶𝖫1\text{-}\mathsf{WL} and 2-𝖶𝖫2\text{-}\mathsf{WL} to assess the separation power of 𝖦𝖭𝖭s\mathsf{GNN}\text{s}. More specifically, similarly to our work, they show that several 𝖦𝖭𝖭\mathsf{GNN} architectures can be represented in 𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG}, or fragments thereof. As a consequence, bounds on their separation power easily follow. Furthermore, Balcilar et al. (2021a) propose new architectures inspired by special operators in 𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG}. The use of 𝖳𝖫(Ω)\mathsf{TL}(\Omega) can thus be seen as a continuation of their approach. We note, however, that 𝖳𝖫(Ω)\mathsf{TL}(\Omega) is more general than 𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG} (which is included in 𝖳𝖫3(Ω)\mathsf{TL}_{3}(\Omega)), allows one to represent more complex linear algebra computations by means of summation (or other) aggregation, and finally, provides insights into the number of iterations needed for color refinement and k-𝖶𝖫k\text{-}\mathsf{WL}. The connection between the number of variables (or treewidth) and k-𝖶𝖫k\text{-}\mathsf{WL} is not present in the work by Balcilar et al. (2021a), and neither is the notion of a guarded fragment, which is needed to connect to color refinement. We believe that it is precisely these latter two insights that make the tensor language approach valuable for any 𝖦𝖭𝖭\mathsf{GNN} designer who wishes to upper bound the separation power of their 𝖦𝖭𝖭\mathsf{GNN} architecture.

Appendix B Details of Section 3

B.1 Proof of Proposition 3.1

Let G=(V,E,𝖼𝗈𝗅)G=(V,E,\mathsf{col}) be a graph and let σ\sigma be a permutation of VV. As usual, we define σG=(Vσ,Eσ,𝖼𝗈𝗅σ)\sigma\star G=(V^{\sigma},E^{\sigma},\mathsf{col}^{\sigma}) as the graph with vertex set Vσ:=VV^{\sigma}:=V, edge set vwEσvw\in E^{\sigma} if and only if σ1(v)σ1(w)E\sigma^{-1}(v)\sigma^{-1}(w)\in E, and 𝖼𝗈𝗅σ(v):=𝖼𝗈𝗅(σ1(v))\mathsf{col}^{\sigma}(v):=\mathsf{col}(\sigma^{-1}(v)). We need to show that for any expression φ(𝒙)\varphi({\bm{x}}) in 𝖳𝖫(Ω)\mathsf{TL}(\Omega) either [[φ,σ𝒗]]σG=[[φ,𝒗]]G[\![\varphi,\sigma\star{\bm{v}}]\!]_{\sigma\star G}=[\![\varphi,{\bm{v}}]\!]_{G}, or when φ\varphi has no free index variables, [[φ]]σG=[[φ]]G[\![\varphi]\!]_{\sigma\star G}=[\![\varphi]\!]_{G}. We verify this by a simple induction on the structure of expressions in 𝖳𝖫(Ω)\mathsf{TL}(\Omega).

  • If φ(xi,xj)=𝟏xi𝗈𝗉xj\varphi(x_{i},x_{j})=\bm{1}_{x_{i}\mathop{\mathsf{op}}x_{j}}, then for a valuation ν\nu mapping xix_{i} to viv_{i} and xjx_{j} to vjv_{j} in VV:

    [[𝟏xi𝗈𝗉xj,ν]]G=𝟏vi𝗈𝗉vj=𝟏σ(vi)𝗈𝗉σ(vj)=[[𝟏xi𝗈𝗉xj,σν]]σG,[\![\bm{1}_{x_{i}\mathop{\mathsf{op}}x_{j}},\nu]\!]_{G}=\bm{1}_{v_{i}\mathop{\mathsf{op}}v_{j}}=\bm{1}_{\sigma(v_{i})\mathop{\mathsf{op}}\sigma(v_{j})}=[\![\bm{1}_{x_{i}\mathop{\mathsf{op}}x_{j}},\sigma\star\nu]\!]_{\sigma\star G},

    where we used that σ\sigma is a permutation.

  • If φ(xi)=P(xi)\varphi(x_{i})=P_{\ell}(x_{i}), then for a valuation ν\nu mapping xix_{i} to viv_{i} in VV:

    [[P,ν]]G=(𝖼𝗈𝗅(vi))=(𝖼𝗈𝗅σ(σ(vi)))=[[P,σν]]σG,[\![P_{\ell},\nu]\!]_{G}=(\mathsf{col}(v_{i}))_{\ell}=(\mathsf{col}^{\sigma}(\sigma(v_{i})))_{\ell}=[\![P_{\ell},\sigma\star\nu]\!]_{\sigma\star G},

    where we used the definition of 𝖼𝗈𝗅σ\mathsf{col}^{\sigma}.

  • Similarly, if φ(xi,xj)=E(xi,xj)\varphi(x_{i},x_{j})=E(x_{i},x_{j}), then for a valuation ν\nu assigning xix_{i} to viv_{i} and xjx_{j} to vjv_{j}:

    [[φ,ν]]G=𝟏vivjE=𝟏σ(vi)σ(vj)Eσ=[[φ,σν]]σG,[\![\varphi,\nu]\!]_{G}=\bm{1}_{v_{i}v_{j}\in E}=\bm{1}_{\sigma(v_{i})\sigma(v_{j})\in E^{\sigma}}=[\![\varphi,\sigma\star\nu]\!]_{\sigma\star G},

    where we used the definition of EσE^{\sigma}.

  • If φ(𝒙)=φ1(𝒙1)φ2(𝒙2)\varphi({\bm{x}})=\varphi_{1}({\bm{x}}_{1})\cdot\varphi_{2}({\bm{x}}_{2}), then for a valuation ν\nu from 𝒙{\bm{x}} to VV:

    [[φ,ν]]G=[[φ1,ν]]G[[φ2,ν]]G=[[φ1,σν]]σG[[φ2,σν]]σG=[[φ,σν]]σG,[\![\varphi,\nu]\!]_{G}=[\![\varphi_{1},\nu]\!]_{G}\cdot[\![\varphi_{2},\nu]\!]_{G}=[\![\varphi_{1},\sigma\star\nu]\!]_{\sigma\star G}\cdot[\![\varphi_{2},\sigma\star\nu]\!]_{\sigma\star G}=[\![\varphi,\sigma\star\nu]\!]_{\sigma\star G},

    where we used the induction hypothesis for φ1\varphi_{1} and φ2\varphi_{2}. The cases φ(𝒙)=φ1(𝒙1)+φ2(𝒙2)\varphi({\bm{x}})=\varphi_{1}({\bm{x}}_{1})+\varphi_{2}({\bm{x}}_{2}) and φ(𝒙)=aφ1(𝒙)\varphi({\bm{x}})=a\cdot\varphi_{1}({\bm{x}}) are dealt with in a similar way.

  • If φ(𝒙)=f(φ1(𝒙1),,φp(𝒙p))\varphi({\bm{x}})=f(\varphi_{1}({\bm{x}}_{1}),\ldots,\varphi_{p}({\bm{x}}_{p})), then

    [[φ,ν]]G\displaystyle[\![\varphi,\nu]\!]_{G} =f([[φ1,ν]]G,,[[φp,ν]]G)\displaystyle=f([\![\varphi_{1},\nu]\!]_{G},\ldots,[\![\varphi_{p},\nu]\!]_{G})
    =f([[φ1,σν]]σG,,[[φp,σν]]σG)\displaystyle=f([\![\varphi_{1},\sigma\star\nu]\!]_{\sigma\star G},\ldots,[\![\varphi_{p},\sigma\star\nu]\!]_{\sigma\star G})
    =[[φ,σν]]σG,\displaystyle=[\![\varphi,\sigma\star\nu]\!]_{\sigma\star G},

    where we used again the induction hypothesis for φ1,,φp\varphi_{1},\ldots,\varphi_{p}.

  • Finally, if φ(𝒙)=yφ1(𝒙,y)\varphi({\bm{x}})=\sum_{y}\varphi_{1}({\bm{x}},y) then for a valuation ν\nu of 𝒙{\bm{x}} to VV:

    [[φ,ν]]G\displaystyle[\![\varphi,\nu]\!]_{G} =vV[[φ1,ν[yv]]]G=vV[[φ1,σν[yv]]]σG\displaystyle=\sum_{v\in V}[\![\varphi_{1},\nu[y\mapsto v]]\!]_{G}=\sum_{v\in V}[\![\varphi_{1},\sigma\star\nu[y\mapsto v]]\!]_{\sigma\star G}
    =vVσ[[φ1,σν[yv]]]σG=[[φ,σν]]σG,\displaystyle=\sum_{v\in V^{\sigma}}[\![\varphi_{1},\sigma\star\nu[y\mapsto v]]\!]_{\sigma\star G}=[\![\varphi,\sigma\star\nu]\!]_{\sigma\star G},

    where we used the induction hypothesis for φ1\varphi_{1} and that Vσ=VV^{\sigma}=V because σ\sigma is a permutation.

We remark that when φ\varphi does not contain free index variables, then [[φ,ν]]G=[[φ]]G[\![\varphi,\nu]\!]_{G}=[\![\varphi]\!]_{G} for any valuation ν\nu; invariance then follows from the previous arguments. This concludes the proof of Proposition 3.1.
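As a small numerical sanity check of this argument (our own illustration; the graph, labels and permutation are arbitrary), one can evaluate the guarded expression \sum_{y}E(x,y)\cdot P_{1}(y) on a labelled graph and on a relabelled copy, and verify that the vector of computed values is permuted accordingly.

    import numpy as np

    # Sketch: equivariance of the TL expression phi(x) = sum_y E(x, y) * P_1(y).
    rng = np.random.default_rng(1)
    n = 5
    A = rng.integers(0, 2, size=(n, n))
    A = np.triu(A, 1); A = A + A.T               # adjacency matrix of G
    col = rng.standard_normal(n)                 # first label channel P_1

    def phi(adj, labels):
        # [[phi, v]]_G = sum over the neighbours y of v of P_1(y), for every vertex v
        return adj @ labels

    perm = rng.permutation(n)
    P = np.eye(n)[perm]                          # permutation matrix with (P @ x)[i] == x[perm[i]]
    A_perm = P @ A @ P.T                         # adjacency of the relabelled copy of G
    col_perm = P @ col                           # correspondingly relabelled vertex labels

    # The vector of values of phi on the relabelled graph equals the relabelled
    # vector of values of phi on G, i.e., phi commutes with the permutation.
    assert np.allclose(phi(A_perm, col_perm), P @ phi(A, col))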

Appendix C Details of Section 4

In the following sections we prove Theorems 4.1, 4.2, 4.3 and 4.4. More specifically, we start by showing these results in the setting in which 𝖳𝖫(Ω)\mathsf{TL}(\Omega) only supports summation aggregation (xe\sum_{x}e) and in which the vertex-labellings in graphs take values in {0,1}\{0,1\}^{\ell}. In this context, we introduce classical logics in Section C.1 and recall and extend connections between the separation power of these logics and the separation power of color refinement and k-𝖶𝖫k\text{-}\mathsf{WL} in Section C.2. We connect 𝖳𝖫(Ω)\mathsf{TL}(\Omega) and these logics in Section C.3, to finally obtain the desired proofs in Section C.4. We then show how these results can be generalized in the presence of general aggregation operators in Section C.5, and to the setting where vertex-labellings take values in \mathbb{R}^{\ell} in Section C.6.

C.1 Classical Logics

In what follows, we consider graphs G=(VG,EG,𝖼𝗈𝗅G)G=(V_{G},E_{G},\mathsf{col}_{G}) with 𝖼𝗈𝗅G:VG{0,1}\mathsf{col}_{G}:V_{G}\to\{0,1\}^{\ell}. We start by defining the kk-variable fragment 𝖢k\mathsf{C}^{k} of first-order logic with counting quantifiers, followed by the definition of the guarded fragment 𝖦𝖢\mathsf{GC} of 𝖢2\mathsf{C}^{2}. Formulae φ\varphi in 𝖢k\mathsf{C}^{k} are defined over the set {x1,,xk}\{x_{1},\ldots,x_{k}\} of variables and are formed by the following grammar:

φ:=(xi=xj)|E(xi,xj)|Ps(xi)|¬φ|φφ|mxiφ,\varphi:=(x_{i}=x_{j})\,\,|\,\,E(x_{i},x_{j})\,\,|\,\,P_{s}(x_{i})\,\,|\,\,\neg\varphi\,\,|\,\,\varphi\land\varphi\,\,|\,\,\exists^{\geq m}x_{i}\,\varphi,

where i,j[k]i,j\in[k], EE is a binary predicate, PsP_{s} for s[]s\in[\ell] are unary predicates for some \ell\in\mathbb{N}, and mm\in\mathbb{N}. The semantics of formulae in 𝖢k\mathsf{C}^{k} is defined in terms of interpretations relative to a given graph GG and a (partial) valuation μ:{x1,,xk}VG\mu:\{x_{1},\ldots,x_{k}\}\to V_{G}. Such an interpretation maps formulae, graphs and valuations to Boolean values 𝔹:={,}\mathbb{B}:=\{\bot,\top\}, in a similar way as we did for tensor language expressions.

More precisely, given a graph G=(VG,EG,𝖼𝗈𝗅G)G=(V_{G},E_{G},\mathsf{col}_{G}) and partial valuation μ:{x1,,xk}VG\mu:\{x_{1},\ldots,x_{k}\}\to V_{G}, we define [[φ,μ]]G𝔹𝔹[\![\varphi,\mu]\!]_{G}^{\mathbb{B}}\in\mathbb{B} for valuations defined on the free variables in φ\varphi. That is, we define:

[[xi=xj,μ]]G𝔹\displaystyle[\![x_{i}=x_{j},\mu]\!]_{G}^{\mathbb{B}} :=ifμ(xi)=μ(xj) then  else ;\displaystyle:=\mathrm{if~{}}\mu(x_{i})=\mu(x_{j})\text{~{}then~{}}\top\text{~{}else~{}}\bot;
[[E(xi,xj),μ]]G𝔹\displaystyle[\![E(x_{i},x_{j}),\mu]\!]_{G}^{\mathbb{B}} :=ifμ(xi)μ(xj)EG then  else ;\displaystyle:=\mathrm{if~{}}\mu(x_{i})\mu(x_{j})\in E_{G}\text{~{}then~{}}\top\text{~{}else~{}}\bot;
[[Ps(xi),μ]]G𝔹\displaystyle[\![P_{s}(x_{i}),\mu]\!]_{G}^{\mathbb{B}} :=if𝖼𝗈𝗅G(μ(xi))s=1 then  else ;\displaystyle:=\mathrm{if~{}}\mathsf{col}_{G}(\mu(x_{i}))_{s}=1\text{~{}then~{}}\top\text{~{}else~{}}\bot;
[[¬φ,μ]]G𝔹\displaystyle[\![\neg\varphi,\mu]\!]_{G}^{\mathbb{B}} :=¬[[φ,μ]]G𝔹;\displaystyle:=\neg[\![\varphi,\mu]\!]_{G}^{\mathbb{B}};
[[φ1φ2,μ]]G𝔹\displaystyle[\![\varphi_{1}\land\varphi_{2},\mu]\!]_{G}^{\mathbb{B}} :=[[φ1,μ]]G𝔹[[φ2,μ]]G𝔹;\displaystyle:=[\![\varphi_{1},\mu]\!]_{G}^{\mathbb{B}}\land[\![\varphi_{2},\mu]\!]_{G}^{\mathbb{B}};
[[mxiφ1,μ]]G𝔹\displaystyle[\![\exists^{\geq m}x_{i}\,\varphi_{1},\mu]\!]_{G}^{\mathbb{B}} := if |{vVG[[φ1,μ[xiv]]]G𝔹=}|m then  else .\displaystyle:=\text{~{}if~{}}|\{v\in V_{G}\mid[\![\varphi_{1},\mu[x_{i}\mapsto v]]\!]_{G}^{\mathbb{B}}=\top\}|\geq m\text{~{}then~{}}\top\text{~{}else~{}}\bot.

In the last expression, μ[xiv]\mu[x_{i}\mapsto v] denotes the valuation μ\mu modified such that it maps xix_{i} to vertex vv.
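To make the counting-quantifier semantics concrete, the following sketch (a hypothetical example of ours; only the constructs needed for this single formula are implemented, not the full grammar) evaluates the \mathsf{C}^{2} formula \exists^{\geq m}x_{2}\,(E(x_{1},x_{2})\land P_{1}(x_{2})) on a small labelled path.

    # Sketch: semantics of a counting quantifier, for the single C^2 formula
    #   phi(x1) = exists^{>=m} x2 ( E(x1, x2) AND P_1(x2) ).
    def eval_formula(adj, labels, v1, m):
        # True iff vertex v1 has at least m neighbours carrying label P_1
        witnesses = [v2 for v2 in range(len(adj))
                     if adj[v1][v2] == 1 and labels[v2][0] == 1]
        return len(witnesses) >= m

    # Example graph: a path 0 - 1 - 2 - 3, where vertices 0 and 2 carry label P_1.
    adj = [[0, 1, 0, 0],
           [1, 0, 1, 0],
           [0, 1, 0, 1],
           [0, 0, 1, 0]]
    labels = [[1], [0], [1], [0]]

    print(eval_formula(adj, labels, v1=1, m=2))   # True: both neighbours of 1 carry P_1
    print(eval_formula(adj, labels, v1=3, m=2))   # False: vertex 3 has only one such neighbour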

We will also need the guarded fragment 𝖦𝖢\mathsf{GC} of 𝖢2\mathsf{C}^{2} in which we only allow equality conditions of the form xi=xix_{i}=x_{i}, component expressions of conjunction and disjunction should have the same single free variable, and counting quantifiers can only occur in guarded form: mx2(E(x1,x2)φ(x2))\exists^{\geq m}x_{2}(E(x_{1},x_{2})\land\varphi(x_{2})) or mx1(E(x2,x1)φ(x1))\exists^{\geq m}x_{1}(E(x_{2},x_{1})\land\varphi(x_{1})). The semantics of formulae in 𝖦𝖢\mathsf{GC} is inherited from formulae in 𝖢2\mathsf{C}^{2}.

Finally, we will also consider 𝖢ωk\mathsf{C}^{k}_{\infty\omega}, that is, the logic 𝖢k\mathsf{C}^{k} extended with infinitary disjunctions and conjunctions. More precisely, we add to the grammar of formulae the following constructs:

αAφα and αAφα\bigvee_{\alpha\in A}\varphi_{\alpha}\text{ and }\bigwedge_{\alpha\in A}\varphi_{\alpha}

where the index set AA can be arbitrary, even containing uncountably many indices. We define 𝖦𝖢ω\mathsf{GC}_{\infty\omega} in the same way, by extending 𝖦𝖢\mathsf{GC} with these infinitary disjunctions and conjunctions. The semantics is as expected: [[αAφα,μ]]G𝔹=[\![\bigvee_{\alpha\in A}\varphi_{\alpha},\mu]\!]_{G}^{\mathbb{B}}=\top if for at least one αA\alpha\in A, [[φα,μ]]G𝔹=[\![\varphi_{\alpha},\mu]\!]_{G}^{\mathbb{B}}=\top, and [[αAφα,μ]]G𝔹=[\![\bigwedge_{\alpha\in A}\varphi_{\alpha},\mu]\!]_{G}^{\mathbb{B}}=\top if for all αA\alpha\in A, [[φα,μ]]G𝔹=[\![\varphi_{\alpha},\mu]\!]_{G}^{\mathbb{B}}=\top.

We define the free variables of formulae just as for 𝖳𝖫\mathsf{TL}, and quantifier rank is defined analogously to summation depth (only existential quantifications increase the quantifier rank). For any of the above logics \mathcal{L} we define (t)\mathcal{L}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} as the set of formulae in \mathcal{L} of quantifier rank at most tt.

To capture the separation power of logics, we define ρ1((t))\rho_{1}\bigl{(}\mathcal{L}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)} as the equivalence relation on 𝒢1\mathcal{G}_{1} defined by

((G,v),(H,w))ρ1((t))φ(x)(t):[[φ,μv]]G𝔹=[[φ,μw]]H𝔹,\bigl{(}(G,v),(H,w)\bigr{)}\in\rho_{1}\bigl{(}\mathcal{L}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\Longleftrightarrow\forall\varphi(x)\in\mathcal{L}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}:[\![\varphi,\mu_{v}]\!]_{G}^{\mathbb{B}}=[\![\varphi,\mu_{w}]\!]_{H}^{\mathbb{B}},

where μv\mu_{v} is any valuation such that μv(x)=v\mu_{v}(x)=v, and likewise for ww. The relation ρ0\rho_{0} is defined in a similar way, except that now the relation is only over pairs of graphs, and the characterization is over all formulae with no free variables (also called sentences). Finally, we also define and use the relation ρs\rho_{s}, which relates pairs from 𝒢s\mathcal{G}_{s}, each consisting of a graph and an ss-tuple of vertices. The relation is defined as

((G,𝒗),(H,𝒘))ρs((t))φ(𝒙)(t):[[φ,μ𝒗]]G𝔹=[[φ,μ𝒘]]H𝔹,\bigl{(}(G,{\bm{v}}),(H,{\bm{w}})\bigr{)}\in\rho_{s}\bigl{(}\mathcal{L}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\Longleftrightarrow\forall\varphi({\bm{x}})\in\mathcal{L}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}:[\![\varphi,\mu_{\bm{v}}]\!]_{G}^{\mathbb{B}}=[\![\varphi,\mu_{\bm{w}}]\!]_{H}^{\mathbb{B}},

where 𝒙{\bm{x}} consists of ss free variables and μ𝒗\mu_{\bm{v}} is a valuation assigning the ii-th variable of 𝒙{\bm{x}} to the ii-th value of 𝒗{\bm{v}}, for any i[s]i\in[s].

C.2 Characterization of Separation Power of Logics

We first connect the separation power of the color refinement and kk-dimensional Weisfeiler-Leman algorithms to the separation power of the logics we just introduced. Although most of these connections are known, we present them in a more fine-grained way. That is, we connect the number of rounds used in the algorithms to the quantifier rank of formulae in the above logics.

Proposition C.1.

For any t0t\geq 0, we have the following identities:

  1. (1)

    ρ1(𝖼𝗋(t))=ρ1(𝖦𝖢(t))\rho_{1}\bigl{(}\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{1}\bigl{(}\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)} and ρ0(𝗀𝖼𝗋(t))=ρ0(𝗀𝗐𝗅1(t))=ρ0(𝖢2,(t+1))\rho_{0}\bigl{(}\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{0}\bigl{(}\mathsf{gwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{0}\bigl{(}\mathsf{C}^{2,(t+1)}\bigr{)};

  2. (2)

    For k1k\geq 1, ρ1(𝗏𝗐𝗅k(t))=ρ1(𝖢k+1,(t))\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{1}\bigl{(}\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)} and

    ρ0(𝖢k+1,(t+k))ρ0(𝗀𝗐𝗅k(t))ρ0(𝖢k+1,(t+1)).\rho_{0}\bigl{(}\mathsf{C}^{k+1,(t+k)}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{C}^{k+1,(t+1)}\bigr{)}.

    As a consequence, ρ0(𝗀𝗐𝗅k())=ρ0(𝖢k+1)\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{(\infty)}\bigr{)}=\rho_{0}\bigl{(}\mathsf{C}^{k+1}\bigr{)}.

Proof.

For (1), the identity ρ1(𝖼𝗋(t))=ρ1(𝖦𝖢(t))\rho_{1}\bigl{(}\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{1}\bigl{(}\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)} is known and can be found, for example, in Theorem V.10 in Grohe (2021). The identity ρ0(𝗀𝖼𝗋(t))=ρ0(𝗀𝗐𝗅1(t))\rho_{0}\bigl{(}\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{0}\bigl{(}\mathsf{gwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)} can be found in Proposition V.4 in Grohe (2021). The identity ρ0(𝗀𝗐𝗅1(t))=ρ0(𝖢2,(t+1))\rho_{0}\bigl{(}\mathsf{gwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{0}\bigl{(}\mathsf{C}^{2,(t+1)}\bigr{)} is a consequence of the inclusion shown in (2) for k=1k=1.

For (2), we use that ρk(𝗐𝗅k(t))=ρk(𝖢k+1,(t))\rho_{k}\bigl{(}\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{k}\bigl{(}\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}, see e.g., Theorem V.8 in Grohe (2021). We argue that this identity holds for ρ1(𝗏𝗐𝗅k(t))=ρ1(𝖢k+1,(t))\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{1}\bigl{(}\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}. Indeed, suppose that (G,v)(G,v) and (H,w)(H,w) are not in ρ1(𝖢k+1,(t))\rho_{1}\bigl{(}\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}. Let φ(x1)\varphi(x_{1}) be a formula in 𝖢k+1,(t)\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}} such that [[φ,v]]G𝔹[[φ,w]]H𝔹[\![\varphi,v]\!]_{G}^{\mathbb{B}}\neq[\![\varphi,w]\!]_{H}^{\mathbb{B}}. Consider the formula φ+(x1,,xk)=φ(x1)i=1k(x1=xi)\varphi^{+}(x_{1},\ldots,x_{k})=\varphi(x_{1})\land\bigwedge_{i=1}^{k}(x_{1}=x_{i}). Then, [[φ+,(v,,v)]]G𝔹[[φ+,(w,,w)]]H𝔹[\![\varphi^{+},(v,\ldots,v)]\!]_{G}^{\mathbb{B}}\neq[\![\varphi^{+},(w,\ldots,w)]\!]_{H}^{\mathbb{B}}, and hence (G,(v,,v))(G,(v,\ldots,v)) and (H,(w,,w))(H,(w,\ldots,w)) are not in ρk(𝖢k+1,(t))\rho_{k}\bigl{(}\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)} either. This implies that 𝗐𝗅k(t)(G,(v,,v))𝗐𝗅k(t)(H,(w,,w))\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,(v,\ldots,v))\neq\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(H,(w,\ldots,w)), and thus, by definition, 𝗏𝗐𝗅k(t)(G,v)𝗏𝗐𝗅k(t)(H,w)\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,v)\neq\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(H,w). In other words, (G,v)(G,v) and (H,w)(H,w) are not in ρ1(𝗏𝗐𝗅k(t))\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}, from which the inclusion ρ1(𝗏𝗐𝗅k(t))ρ1(𝖢k+1,(t))\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)} follows. Conversely, if (G,v)(G,v) and (H,w)(H,w) are not in ρ1(𝗏𝗐𝗅k(t))\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}, then 𝗐𝗅k(t)(G,(v,,v))𝗐𝗅k(t)(H,(w,,w))\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,(v,\ldots,v))\neq\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(H,(w,\ldots,w)). As a consequence, (G,(v,,v))(G,(v,\ldots,v)) and (H,(w,,w))(H,(w,\ldots,w)) are not in ρk(𝖢k+1,(t))\rho_{k}\bigl{(}\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)} either. Let φ(x1,,xk)\varphi(x_{1},\ldots,x_{k}) be a formula in 𝖢k+1,(t)\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}} such that [[φ,(v,,v)]]G𝔹[[φ,(w,,w)]]H𝔹[\![\varphi,(v,\ldots,v)]\!]_{G}^{\mathbb{B}}\neq[\![\varphi,(w,\ldots,w)]\!]_{H}^{\mathbb{B}}. Then it is readily shown that we can convert φ(x1,,xk)\varphi(x_{1},\ldots,x_{k}) into a formula φ(x1)\varphi^{-}(x_{1}) in 𝖢k+1,(t)\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}} such that [[φ,v]]G𝔹[[φ,w]]H𝔹[\![\varphi^{-},v]\!]_{G}^{\mathbb{B}}\neq[\![\varphi^{-},w]\!]_{H}^{\mathbb{B}}, and thus (G,v)(G,v) and (H,w)(H,w) are not in ρ1(𝖢k+1,(t))\rho_{1}\bigl{(}\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}. Hence, we also have the inclusion ρ1(𝗏𝗐𝗅k(t))ρ1(𝖢k+1,(t))\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\supseteq\rho_{1}\bigl{(}\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}, from which the first identity in (2) follows.

It remains to show ρ0(𝖢k+1,(t+k))ρ0(𝗀𝗐𝗅k(t))ρ0(𝖢k+1,(t+1))\rho_{0}\bigl{(}\mathsf{C}^{k+1,(t+k)}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{C}^{k+1,(t+1)}\bigr{)}. Clearly, if (G,H)(G,H) is not in ρ0(𝗀𝗐𝗅k(t))\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)} then the multisets of labels 𝗐𝗅k(t)(G,𝒗)\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,{\bm{v}}) and 𝗐𝗅k(t)(H,𝒘)\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(H,{\bm{w}}) differ. It is known that with each label cc one can associate a formula φc\varphi^{c} in 𝖢k+1,(t)\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}} such that [[φc,𝒗]]G𝔹=[\![\varphi^{c},{\bm{v}}]\!]_{G}^{\mathbb{B}}=\top if and only if 𝗐𝗅k(t)(G,𝒗)=c\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,{\bm{v}})=c. So, if the multisets are different, there must be a cc that occurs more often in one multiset than in the other one. This can be detected by a formula of the form =m(x1,,xk)φc(x1,,xk)\exists^{=m}(x_{1},\ldots,x_{k})\varphi^{c}(x_{1},\ldots,x_{k}), which is satisfied if there are exactly mm tuples 𝒗{\bm{v}} with label cc. It is now easily verified that the latter formula can be converted into a formula in 𝖢k+1,(t+k)\mathsf{C}^{k+1,(t+k)}. Hence, the inclusion ρ0(𝖢k+1,(t+k))ρ0(𝗀𝗐𝗅k(t))\rho_{0}\bigl{(}\mathsf{C}^{k+1,(t+k)}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)} follows.

For ρ0(𝗀𝗐𝗅k(t))ρ0(𝖢k+1,(t+1))\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{C}^{k+1,(t+1)}\bigr{)}, we show that if (G,H)(G,H) is in ρ0(𝗀𝗐𝗅k(t))\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}, then [[φ,μ]]G𝔹=[[φ,μ]]H𝔹[\![\varphi,\mu]\!]_{G}^{\mathbb{B}}=[\![\varphi,\mu]\!]_{H}^{\mathbb{B}} for all formulae in 𝖢k+1,(t+1)\mathsf{C}^{k+1,(t+1)} and any valuation μ\mu (notice that μ\mu is superfluous in this definition when formulae have no free variables). Assume that (G,H)(G,H) is in ρ0(𝗀𝗐𝗅k(t))\rho_{0}(\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}). Since any formula of quantifier rank t+1t+1 is a Boolean combination of formulae of lower rank or a formula of the form φ=mxiψ\varphi=\exists^{\geq m}x_{i}\,\psi where ψ\psi is of quantifier rank tt, without loss of generality consider a formula of the latter form, and assume for the sake of contradiction that [[φ,μ]]G𝔹=[\![\varphi,\mu]\!]_{G}^{\mathbb{B}}=\top but [[φ,μ]]H𝔹=[\![\varphi,\mu]\!]_{H}^{\mathbb{B}}=\bot. Since [[φ,μ]]G𝔹=[\![\varphi,\mu]\!]_{G}^{\mathbb{B}}=\top, there must be at least mm elements satisfying ψ\psi. More precisely, let v1,,vpv_{1},\dots,v_{p} be all vertices in GG such that for each valuation μ[xvi]\mu[x\mapsto v_{i}] it holds that [[ψ,μ[xvi]]]G𝔹=[\![\psi,\mu[x\mapsto v_{i}]]\!]_{G}^{\mathbb{B}}=\top. As mentioned, it must be that pp is at least mm. Using again the fact that ρk(𝗐𝗅k(t))=ρk(𝖢k+1,(t))\rho_{k}\bigl{(}\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{k}\bigl{(}\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}, we infer that the color 𝗐𝗅k(t1)(G,(vi,,vi))\mathsf{wl}_{k}^{(t-1)}(G,(v_{i},\ldots,v_{i})) is the same, for each such viv_{i}.

Now since 𝗀𝗐𝗅k(t1)(G)=𝗀𝗐𝗅k(t1)(H)\mathsf{gwl}_{k}^{(t-1)}(G)=\mathsf{gwl}_{k}^{(t-1)}(H), it is not difficult to see that there must be exactly pp vertices w1,,wpw_{1},\dots,w_{p} in HH such that 𝗐𝗅k(t1)(G,(vi,,vi))=𝗐𝗅k(t1)(H,(wi,,wi))\mathsf{wl}_{k}^{(t-1)}(G,(v_{i},\ldots,v_{i}))=\mathsf{wl}_{k}^{(t-1)}(H,(w_{i},\ldots,w_{i})). Otherwise, the aggregation of the colors assigned by k-𝖶𝖫k\text{-}\mathsf{WL} would not be the same in GG and HH. By the connection to logic, we again know that for the valuation μ[xwi]\mu[x\mapsto w_{i}] it holds that [[ψ,μ[xwi]]]H𝔹=[\![\psi,\mu[x\mapsto w_{i}]]\!]_{H}^{\mathbb{B}}=\top. It then follows that [[φ,μ]]H𝔹=[\![\varphi,\mu]\!]_{H}^{\mathbb{B}}=\top for any valuation μ\mu, contradicting our assumption.

Finally, we remark that ρ0(𝗀𝗐𝗅k())=ρ0(𝖢k+1)\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{(\infty)}\bigr{)}=\rho_{0}\bigl{(}\mathsf{C}^{k+1}\bigr{)} follows from the preceding inclusions in (2). ∎
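Since the proof above trades rounds of the combinatorial algorithms for quantifier rank, the following hedged sketch of t rounds of color refinement may help fix ideas (the two example graphs are ours; colours are represented by nested tuples rather than hashed).

    # Sketch: t rounds of color refinement (CR) on a vertex-labelled graph.
    # The initial colour of a vertex is its label; each round refines a vertex's
    # colour with the sorted multiset of its neighbours' colours.
    def color_refinement(adj, labels, t):
        colours = {v: (labels[v],) for v in range(len(adj))}
        for _ in range(t):
            colours = {
                v: (colours[v],
                    tuple(sorted(colours[u] for u in range(len(adj)) if adj[v][u])))
                for v in range(len(adj))
            }
        return colours

    # Two labelled 4-cycles, differing only in where the labelled vertices sit,
    # are separated already after one round.
    cycle = [[0, 1, 0, 1],
             [1, 0, 1, 0],
             [0, 1, 0, 1],
             [1, 0, 1, 0]]
    labels_G = [1, 1, 0, 0]   # labelled vertices are adjacent
    labels_H = [1, 0, 1, 0]   # labelled vertices are opposite
    cr_G = color_refinement(cycle, labels_G, t=1)
    cr_H = color_refinement(cycle, labels_H, t=1)
    print(sorted(cr_G.values()) == sorted(cr_H.values()))   # False: cr^(1) separates G and H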

Before moving to tensor languages, where we will use infinitary logics to simulate expressions in 𝖳𝖫k(Ω)\mathsf{TL}_{k}(\Omega) and 𝖦𝖳𝖫(Ω)\mathsf{GTL}(\Omega), we recall that, when considering the separation power of logics, we can freely move between the logics and their infinitary counterparts:

Theorem C.2.

The following identities hold for any t0t\geq 0, k2k\geq 2 and s0s\geq 0:

  • (1)

    ρ1(𝖦𝖢ω(t))=ρ1(𝖦𝖢(t))\rho_{1}\bigl{(}\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{\infty\omega}\bigr{)}=\rho_{1}\bigl{(}\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)};

  • (2)

    ρs(𝖢ωk,(t))=ρs(𝖢k,(t))\rho_{s}\bigl{(}\mathsf{C}^{k,\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{\infty\omega}\bigr{)}=\rho_{s}\bigl{(}\mathsf{C}^{k,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}.

Proof.

For identity (1), notice that we only need to prove that ρ1(𝖦𝖢(t))ρ1(𝖦𝖢ω(t))\rho_{1}\bigl{(}\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{\infty\omega}\bigr{)}, as the other direction follows directly from the definition. We point out the well-known fact that two tuples (G,v)(G,v) and (H,w)(H,w) belong to ρ1(𝖦𝖢(t))\rho_{1}\bigl{(}\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)} if and only if the unravelling of GG rooted at vv up to depth tt is isomorphic to the unravelling of HH rooted at ww up to depth tt. Here the unravelling is the infinite tree whose root is the root node, whose children are the neighbors of the root node, and so on (see e.g. Barceló et al. (2020); Otto (2019)). Now for the connection with infinitary logic. Assume that the unravellings of GG rooted at vv and of HH rooted at ww up to level tt are isomorphic, but assume for the sake of contradiction that there is a formula φ(x)\varphi(x) in 𝖦𝖢ω(t)\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{\infty\omega} such that [[φ,μv]]G𝔹[[φ,μw]]H𝔹[\![\varphi,\mu_{v}]\!]_{G}^{\mathbb{B}}\neq[\![\varphi,\mu_{w}]\!]_{H}^{\mathbb{B}}, where μv\mu_{v} and μw\mu_{w} are any valuations mapping the variable xx to vv and ww, respectively. Now since GG and HH are finite graphs, one can construct, from the formula φ\varphi, a formula ψ\psi in 𝖦𝖢(t)\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} such that [[ψ,μv]]G𝔹[[ψ,μw]]H𝔹[\![\psi,\mu_{v}]\!]_{G}^{\mathbb{B}}\neq[\![\psi,\mu_{w}]\!]_{H}^{\mathbb{B}}. Notice that this is in contradiction with our assumption that the unravellings were isomorphic and therefore indistinguishable by formulae in 𝖦𝖢(t)\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}. To construct ψ\psi, consider an infinitary disjunction aAαa\bigvee_{a\in A}\alpha_{a}. Since GG and HH have a finite number of vertices, and the formulae have a finite number of variables, the number of different valuations from the variables to the vertices in GG or HH is also finite. Thus, one can discard any disjunct αa\alpha_{a^{\prime}} whose truth value on GG and HH coincides with that of a disjunct αa\alpha_{a} that is already kept. The final result is a finite disjunction whose truth value on GG and HH coincides with that of the original infinitary disjunction.

For identity (2) we refer to Corollary 2.4 in Otto (2017). ∎
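The unravelling characterization used for identity (1) can also be made concrete: the sketch below (helper names and the example graph are ours) computes a canonical encoding of the depth-t unravelling of a labelled graph rooted at a vertex, so that, in line with the fact recalled above, two rooted graphs agree on \mathsf{GC}^{(t)} exactly when their encodings coincide.

    # Sketch: canonical encoding of the depth-t unravelling of a labelled graph
    # rooted at vertex v (labels are single integers here, for simplicity).
    def unravel(adj, labels, v, depth):
        if depth == 0:
            return (labels[v],)
        children = sorted(unravel(adj, labels, u, depth - 1)
                          for u in range(len(adj)) if adj[v][u])
        return (labels[v], tuple(children))

    # Example: the depth-2 unravelling of a labelled 4-cycle rooted at vertex 0.
    cycle = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
    print(unravel(cycle, [1, 1, 0, 0], v=0, depth=2))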

C.3 From 𝖳𝖫(Ω)\mathsf{TL}(\Omega) to 𝖢ωk\mathsf{C}^{k}_{\infty\omega} and 𝖦𝖢ω\mathsf{GC}_{\infty\omega}

We are now finally ready to make the connection between expressions in 𝖳𝖫(Ω)\mathsf{TL}(\Omega) and the infinitary logics introduced earlier.

Proposition C.3.

For any expression φ(𝐱)\varphi({\bm{x}}) in 𝖳𝖫k(Ω)\mathsf{TL}_{k}(\Omega) and cc\in\mathbb{R}, there exists a formula φ~c(𝐱)\tilde{\varphi}^{c}({\bm{x}}) in 𝖢ωk\mathsf{C}^{k}_{\infty\omega} such that [[φ,𝐯]]G=c[\![\varphi,{\bm{v}}]\!]_{G}=c if and only if [[φ~c,𝐯]]G𝔹=[\![\tilde{\varphi}^{c},{\bm{v}}]\!]_{G}^{\mathbb{B}}=\top for any graph G=(VG,EG,𝖼𝗈𝗅G)G=(V_{G},E_{G},\mathsf{col}_{G}) in 𝒢\mathcal{G} and 𝐯VGk{\bm{v}}\in V_{G}^{k}. Furthermore, if φ(x)𝖦𝖳𝖫(Ω)\varphi(x)\in\mathsf{GTL}(\Omega) then φ~c𝖦𝖢ω\tilde{\varphi}^{c}\in\mathsf{GC}_{\infty\omega}. Finally, if φ\varphi has summation depth tt then φ~c\tilde{\varphi}^{c} has quantifier rank tt.

Proof.

We define φ~c\tilde{\varphi}^{c} inductively on the structure of expressions in 𝖳𝖫k(Ω)\mathsf{TL}_{k}(\Omega).

  • φ(xi,xj):=𝟏xi𝗈𝗉xj\varphi(x_{i},x_{j}):=\bm{1}_{x_{i}\mathop{\mathsf{op}}x_{j}}. Assume first that 𝗈𝗉\mathop{\mathsf{op}} is “==”. We distinguish between (a) iji\neq j and (b) i=ji=j. For case (a), if c=1c=1, then we define φ~1(xi,xj):=(xi=xj)\tilde{\varphi}^{1}(x_{i},x_{j}):=(x_{i}=x_{j}), if c=0c=0, then we define φ~0(xi,xj):=¬(xi=xj)\tilde{\varphi}^{0}(x_{i},x_{j}):=\neg(x_{i}=x_{j}), and if c0,1c\neq 0,1, then we define φ~c(xi,xj):=xixi\tilde{\varphi}^{c}(x_{i},x_{j}):=x_{i}\neq x_{i}. For case (b), if c=1c=1, then we define φ~1(xi,xj):=(xi=xi)\tilde{\varphi}^{1}(x_{i},x_{j}):=(x_{i}=x_{i}), and for any c1c\neq 1, we define φ~c(xi,xj):=¬(xi=xi)\tilde{\varphi}^{c}(x_{i},x_{j}):=\neg(x_{i}=x_{i}). The case when 𝗈𝗉\mathop{\mathsf{op}} is “\neq” is treated analogously.

  • φ(xi):=P(xi)\varphi(x_{i}):=P_{\ell}(x_{i}). If c=1c=1, then we define φ~1(xi):=P(xi)\tilde{\varphi}^{1}(x_{i}):=P_{\ell}(x_{i}), and if c=0c=0, then we define φ~0(xi):=¬P(xi)\tilde{\varphi}^{0}(x_{i}):=\neg P_{\ell}(x_{i}). For all other cc, we define φ~c(xi):=¬(xi=xi)\tilde{\varphi}^{c}(x_{i}):=\neg(x_{i}=x_{i}).

  • φ(xi,xj):=E(xi,xj)\varphi(x_{i},x_{j}):=E(x_{i},x_{j}). If c=1c=1, then we define φ~1(xi,xj):=E(xi,xj)\tilde{\varphi}^{1}(x_{i},x_{j}):=E(x_{i},x_{j}), if c=0c=0, then we define φ~0(xi,xj):=¬E(xi,xj)\tilde{\varphi}^{0}(x_{i},x_{j}):=\neg E(x_{i},x_{j}). For all other cc, we define φ~c(xi,xj):=¬(xi=xi)\tilde{\varphi}^{c}(x_{i},x_{j}):=\neg(x_{i}=x_{i}).

  • φ:=φ1+φ2\varphi:=\varphi_{1}+\varphi_{2}. We observe that [[φ,𝒗]]G=c[\![\varphi,{\bm{v}}]\!]_{G}=c if and only if there are c1,c2c_{1},c_{2}\in\mathbb{R} such that [[φ1,𝒗]]G=c1[\![\varphi_{1},{\bm{v}}]\!]_{G}=c_{1} and [[φ2,𝒗]]G=c2[\![\varphi_{2},{\bm{v}}]\!]_{G}=c_{2} and c=c1+c2c=c_{1}+c_{2}. Hence, it suffices to define

    φ~c:=c1,c2c=c1+c2φ~1c1φ~2c2,\tilde{\varphi}^{c}:=\bigvee_{\begin{subarray}{c}c_{1},c_{2}\in\mathbb{R}\\ c=c_{1}+c_{2}\end{subarray}}\tilde{\varphi}_{1}^{c_{1}}\land\tilde{\varphi}_{2}^{c_{2}},

    where φ~1c1\tilde{\varphi}_{1}^{c_{1}} and φ~2c2\tilde{\varphi}_{2}^{c_{2}} are the expressions such that [[φ1,𝒗]]G=c1[\![\varphi_{1},{\bm{v}}]\!]_{G}=c_{1} if and only if [[φ~1c1,𝒗]]G𝔹=[\![\tilde{\varphi}_{1}^{c_{1}},{\bm{v}}]\!]_{G}^{\mathbb{B}}=\top and [[φ2,𝒗]]G=c2[\![\varphi_{2},{\bm{v}}]\!]_{G}=c_{2} if and only if [[φ~2c2,𝒗]]G𝔹=[\![\tilde{\varphi}_{2}^{c_{2}},{\bm{v}}]\!]_{G}^{\mathbb{B}}=\top, which exist by induction.

  • φ:=φ1φ2\varphi:=\varphi_{1}\cdot\varphi_{2}. This case is analogous to the previous one. Indeed, [[φ,𝒗]]G=c[\![\varphi,{\bm{v}}]\!]_{G}=c if and only if there are c1,c2c_{1},c_{2}\in\mathbb{R} such that [[φ1,𝒗]]G=c1[\![\varphi_{1},{\bm{v}}]\!]_{G}=c_{1} and [[φ2,𝒗]]G=c2[\![\varphi_{2},{\bm{v}}]\!]_{G}=c_{2} and c=c1c2c=c_{1}\cdot c_{2}. Hence, it suffices to define

    φ~c:=c1,c2c=c1c2φ~1c1φ~2c2.\tilde{\varphi}^{c}:=\bigvee_{\begin{subarray}{c}c_{1},c_{2}\in\mathbb{R}\\ c=c_{1}\cdot c_{2}\end{subarray}}\tilde{\varphi}_{1}^{c_{1}}\land\tilde{\varphi}_{2}^{c_{2}}.
  • φ:=aφ1\varphi:=a\cdot\varphi_{1}. This case is again dealt with in a similar way. Indeed, [[φ,𝒗]]G=c[\![\varphi,{\bm{v}}]\!]_{G}=c if and only if there is a c1c_{1}\in\mathbb{R} such that [[φ1,𝒗]]G=c1[\![\varphi_{1},{\bm{v}}]\!]_{G}=c_{1} and c=ac1c=a\cdot c_{1}. Hence, it suffices to define

    φ~c:=c1c=ac1φ~1c1.\tilde{\varphi}^{c}:=\bigvee_{\begin{subarray}{c}c_{1}\in\mathbb{R}\\ c=a\cdot c_{1}\end{subarray}}\tilde{\varphi}_{1}^{c_{1}}.
  • φ:=f(φ1,,φp)\varphi:=f(\varphi_{1},\ldots,\varphi_{p}) with f:pf:\mathbb{R}^{p}\to\mathbb{R}. We observe that [[φ,𝒗]]G=c[\![\varphi,{\bm{v}}]\!]_{G}=c if and only if there are c1,,cpc_{1},\ldots,c_{p}\in\mathbb{R} such that c=f(c1,,cp)c=f(c_{1},\ldots,c_{p}) and [[φi,𝒗]]G=ci[\![\varphi_{i},{\bm{v}}]\!]_{G}=c_{i} for i[p]i\in[p]. Hence, it suffices to define

    φ~c:=c1,,cpc=f(c1,,cp)φ~1c1φ~pcp.\tilde{\varphi}^{c}:=\bigvee_{\begin{subarray}{c}c_{1},\ldots,c_{p}\in\mathbb{R}\\ c=f(c_{1},\ldots,c_{p})\end{subarray}}\tilde{\varphi}_{1}^{c_{1}}\land\cdots\land\tilde{\varphi}_{p}^{c_{p}}.
  • φ:=xiφ1\varphi:=\sum_{x_{i}}\varphi_{1}. We observe that [[φ,μ]]G=c[\![\varphi,\mu]\!]_{G}=c implies that we can partition VGV_{G} into \ell parts V1,,VV_{1},\ldots,V_{\ell}, of sizes m1,,mm_{1},\ldots,m_{\ell}, respectively, such that [[φ1,μ[xiv]]]G=ci[\![\varphi_{1},\mu[x_{i}\to v]]\!]_{G}=c_{i} for each vViv\in V_{i}, and such that all cic_{i}’s are pairwise distinct and c=i=1cimic=\sum_{i=1}^{\ell}c_{i}\cdot m_{i}. It now suffices to consider the following formula

    φ~c:=,m1,,mc1,,cc=i=1micii=1=mixiφ~1cixii=1φ~1ci,\tilde{\varphi}^{c}:=\bigvee_{\begin{subarray}{c}\ell,m_{1},\ldots,m_{\ell}\in\mathbb{N}\\ c_{1},\ldots,c_{\ell}\in\mathbb{R}\\ c=\sum_{i=1}^{\ell}m_{i}c_{i}\end{subarray}}\bigwedge_{i=1}^{\ell}\exists^{=m_{i}}x_{i}\,\tilde{\varphi}_{1}^{c_{i}}\land\forall x_{i}\,\bigvee_{i=1}^{\ell}\tilde{\varphi}_{1}^{c_{i}},

    where =mixiψ\exists^{=m_{i}}x_{i}\,\psi is shorthand notation for mixiψ¬mi+1xiψ\exists^{\geq m_{i}}x_{i}\,\psi\land\neg\exists^{\geq m_{i}+1}x_{i}\,\psi, and xiψ\forall x_{i}\,\psi denotes ¬1xi¬ψ\neg\exists^{\geq 1}x_{i}\,\neg\psi.

This concludes the construction of φ~c\tilde{\varphi}^{c}. We observe that we only introduce quantifiers when φ=xiφ1\varphi=\sum_{x_{i}}\varphi_{1}. Hence, assuming by induction that summation depth and quantifier rank are in sync, i.e., that φ1\varphi_{1} has summation depth t1t-1 and thus φ~1c\tilde{\varphi}_{1}^{c} has quantifier rank t1t-1 for any cc\in\mathbb{R}, the expression φ\varphi has summation depth tt and, as can be seen from the definition of φ~c\tilde{\varphi}^{c}, this formula has quantifier rank tt, as desired. (A small numerical illustration of this summation case is given after the proof.)

It remains to verify the claim about guarded expressions. This is again verified by induction. The only case requiring some attention is φ(x1):=x2E(x1,x2)φ1(x2)\varphi(x_{1}):=\sum_{x_{2}}\,E(x_{1},x_{2})\cdot\varphi_{1}(x_{2}), for which we can define

φ~c:=,m1,,mc1,,cc=i=1micim=i=1mi=mx2E(x1,x2)i=1=mix2E(x1,x2)φ~1ci(x2),\tilde{\varphi}^{c}:=\bigvee_{\begin{subarray}{c}\ell,m_{1},\ldots,m_{\ell}\in\mathbb{N}\\ c_{1},\ldots,c_{\ell}\in\mathbb{R}\\ c=\sum_{i=1}^{\ell}m_{i}c_{i}\\ m=\sum_{i=1}^{\ell}m_{i}\end{subarray}}\exists^{=m}x_{2}E(x_{1},x_{2})\land\bigwedge_{i=1}^{\ell}\exists^{=m_{i}}x_{2}\,E(x_{1},x_{2})\land\tilde{\varphi}_{1}^{c_{i}}(x_{2}),

which is a formula in 𝖦𝖢ω\mathsf{GC}_{\infty\omega} that again only adds one to the quantifier rank of the formulae φ~1c\tilde{\varphi}_{1}^{c} for cc\in\mathbb{R}. So also here, we have the one-to-one correspondence between summation depth and quantifier rank. ∎
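The crux of the summation case above is that the value of \sum_{x_{i}}\varphi_{1} only depends on which values \varphi_{1} attains and with which multiplicities; the following small sketch (with arbitrary illustrative values) recovers the sum from exactly the data (\ell, the c_{i}'s and the m_{i}'s) that the infinitary disjunction quantifies over.

    from collections import Counter

    # Sketch: the summation case of Proposition C.3.  The value of sum_{x_i} phi_1
    # only depends on which values c_1, ..., c_l phi_1 attains and with which
    # multiplicities m_1, ..., m_l -- exactly the data the translated formula
    # quantifies over in its infinitary disjunction.
    phi1_values = [2.0, 0.5, 2.0, 0.5, 1.0]          # hypothetical values [[phi_1, mu[x_i -> v]]]_G

    direct_sum = sum(phi1_values)                    # [[sum_{x_i} phi_1, mu]]_G

    partition = Counter(phi1_values)                 # {c_i: m_i}: attained values with multiplicities
    sum_from_partition = sum(c * m for c, m in partition.items())

    assert direct_sum == sum_from_partition          # c = sum_i m_i * c_i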

C.4 Proof of Theorems 4.1, 4.2, 4.3 and 4.4

Proposition C.4.

We have the following inclusions: For any t0t\geq 0 and any collection Ω\Omega of functions:

  • ρ1(𝖼𝗋(t))ρ1(𝖦𝖳𝖫(t)(Ω))\rho_{1}\bigl{(}\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)};

  • ρ1(𝗏𝗐𝗅k(t))ρ1(𝖳𝖫k+1(t)(Ω))\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{TL}_{k+1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}; and

  • ρ0(𝗀𝗐𝗅k(t))ρ0(𝖳𝖫k+1(t+1)(Ω))\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{TL}_{k+1}^{(t+1)}(\Omega)\bigr{)}.

Proof.

We first show the second bullet by contraposition. That is, we show that if (G,v) and (H,w) are not in \rho_{1}\bigl(\mathsf{TL}_{k+1}^{(t)}(\Omega)\bigr), then neither are they in \rho_{1}\bigl(\mathsf{vwl}_{k}^{(t)}\bigr). Indeed, suppose that there exists an expression \varphi(x_{1}) in \mathsf{TL}_{k+1}^{(t)}(\Omega) such that [\![\varphi,v]\!]_{G}=c\neq c^{\prime}=[\![\varphi,w]\!]_{H}. From Proposition C.3 we know that there exists a formula \tilde{\varphi}^{c} in \mathsf{C}^{k+1,(t)}_{\infty\omega} such that [\![\tilde{\varphi}^{c},v]\!]_{G}^{\mathbb{B}}=\top and [\![\tilde{\varphi}^{c},w]\!]_{H}^{\mathbb{B}}=\bot. Hence, (G,v) and (H,w) do not belong to \rho_{1}\bigl(\mathsf{C}^{k+1,(t)}_{\infty\omega}\bigr). Theorem C.2 implies that (G,v) and (H,w) also do not belong to \rho_{1}\bigl(\mathsf{C}^{k+1,(t)}\bigr). Finally, Proposition C.1 implies that (G,v) and (H,w) do not belong to \rho_{1}\bigl(\mathsf{vwl}_{k}^{(t)}\bigr), as desired. The third bullet is shown in precisely the same way, but using the identities for \rho_{0} rather than \rho_{1}, and \mathsf{gwl}_{k}^{(t)} rather than \mathsf{vwl}_{k}^{(t)}.

Also the first bullet is shown in the same way, using the connection between 𝖦𝖳𝖫(t)(Ω)\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega), 𝖦𝖢ω2,(t)\mathsf{GC}_{\infty\omega}^{2,\scalebox{0.6}{(}t\scalebox{0.6}{)}}, 𝖦𝖢(t)\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} and 𝖼𝗋(t)\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}, as given by Proposition C.1, Theorem C.2, and Proposition C.3. ∎

We next show that our tensor languages are at least as separating as the color refinement and k-dimensional Weisfeiler-Leman algorithms.

Proposition C.5.

We have the following inclusions: For any t0t\geq 0 and any collection Ω\Omega of functions:

  • ρ1(𝖦𝖳𝖫(t)(Ω))ρ1(𝖼𝗋(t))\rho_{1}\bigl{(}\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)};

  • ρ1(𝖳𝖫k+1(t)(Ω))ρ1(𝗏𝗐𝗅k(t))\rho_{1}\bigl{(}\mathsf{TL}_{k+1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}; and

  • ρ0(𝖳𝖫k+1(t+k)(Ω))ρ0(𝗀𝗐𝗅k(t))\rho_{0}\bigl{(}\mathsf{TL}_{k+1}^{(t+k)}(\Omega)\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}.

Proof.

To establish these inclusions for arbitrary \Omega, it suffices to show them for expressions that do not use any functions from \Omega. We again use the connections between the color refinement and k-dimensional Weisfeiler-Leman algorithms and finite variable logics as stated in Proposition C.1. More precisely, we show that for any formula \varphi({\bm{x}})\in\mathsf{C}^{k,(t)} there exists an expression \hat{\varphi}({\bm{x}})\in\mathsf{TL}_{k}^{(t)} such that for any graph G in \mathcal{G}, [\![\varphi,{\bm{v}}]\!]_{G}^{\mathbb{B}}=\top implies [\![\hat{\varphi},{\bm{v}}]\!]_{G}=1 and [\![\varphi,{\bm{v}}]\!]_{G}^{\mathbb{B}}=\bot implies [\![\hat{\varphi},{\bm{v}}]\!]_{G}=0. By appropriately selecting k and t, and by observing that when \varphi(x)\in\mathsf{GC} then \hat{\varphi}(x)\in\mathsf{GTL}, the inclusions follow.

The construction of φ^(𝒙)\hat{\varphi}({\bm{x}}) is by induction on the structure of formulae in 𝖢k\mathsf{C}^{k}.

  • φ:=(xi=xj)\varphi:=(x_{i}=x_{j}). Then, we define φ^:=𝟏xi=xj\hat{\varphi}:=\bm{1}_{x_{i}=x_{j}}.

  • φ:=P(xi)\varphi:=P_{\ell}(x_{i}). Then, we define φ^:=P(xi)\hat{\varphi}:=P_{\ell}(x_{i}).

  • φ:=E(xi,xj)\varphi:=E(x_{i},x_{j}). Then, we define φ^:=E(xi,xj)\hat{\varphi}:=E(x_{i},x_{j}).

  • φ:=¬φ1\varphi:=\neg\varphi_{1}. Then, we define φ^:=𝟏xi=xiφ^1\hat{\varphi}:=\bm{1}_{x_{i}=x_{i}}-\hat{\varphi}_{1}.

  • φ:=φ1φ2\varphi:=\varphi_{1}\land\varphi_{2}. Then, we define φ^:=φ^1φ^2\hat{\varphi}:=\hat{\varphi}_{1}\cdot\hat{\varphi}_{2}.

  • φ:=mxiφ1\varphi:=\exists^{\geq m}x_{i}\,\varphi_{1}. Consider a polynomial p(x):=jajxjp(x):=\sum_{j}a_{j}x^{j} such that p(x)=0p(x)=0 for x{0,1,,m1}x\in\{0,1,\ldots,m-1\} and p(x)=1p(x)=1 for x{m,m+1,,n}x\in\{m,m+1,\ldots,n\}. Such a polynomial exists by interpolation. Then, we define φ^:=jaj(xiφ^1)j\hat{\varphi}:=\sum_{j}a_{j}\bigl{(}\sum_{x_{i}}\,\hat{\varphi}_{1}\bigr{)}^{j}.

We remark that we here crucially rely on the assumption that \mathcal{G} contains graphs of fixed size n and that \mathsf{TL}_{k} is closed under linear combinations and products. Clearly, if \varphi\in\mathsf{GC}, then the above translation results in an expression \hat{\varphi}\in\mathsf{GTL}(\Omega). Furthermore, the quantifier rank of \varphi is in one-to-one correspondence with the summation depth of \hat{\varphi}.
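To illustrate the last case of the translation, the following Python sketch (illustrative only; it assumes graphs of a fixed size n and uses NumPy interpolation) constructs such a threshold polynomial and checks that \sum_{j}a_{j}x^{j} is 0 below the threshold m and 1 from m up to n, exactly as needed to simulate \exists^{\geq m}x_{i}\,\varphi_{1}.

```python
import numpy as np

def threshold_poly_coeffs(m, n):
    # Interpolate a polynomial p of degree n through the n+1 points
    # (x, [x >= m]) for x = 0,...,n; by construction p(x) = 0 for x < m
    # and p(x) = 1 for m <= x <= n.
    xs = np.arange(n + 1)
    ys = (xs >= m).astype(float)
    return np.polyfit(xs, ys, deg=n)

n, m = 6, 3                      # fixed graph size n and threshold m (illustrative)
coeffs = threshold_poly_coeffs(m, n)
for count in range(n + 1):       # 'count' plays the role of the inner sum over x_i
    assert abs(np.polyval(coeffs, count) - (1.0 if count >= m else 0.0)) < 1e-6
```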

We can now apply Proposition C.1. That is, if (G,v) and (H,w) are not in \rho_{1}\bigl(\mathsf{cr}^{(t)}\bigr) then, by Proposition C.1, there exists a formula \varphi(x) in \mathsf{GC}^{(t)} such that [\![\varphi,v]\!]_{G}^{\mathbb{B}}=\top\neq[\![\varphi,w]\!]_{H}^{\mathbb{B}}=\bot. We have just shown that for the corresponding expression \hat{\varphi} in \mathsf{GTL}^{(t)}, also [\![\hat{\varphi},v]\!]_{G}\neq[\![\hat{\varphi},w]\!]_{H} holds. Hence, (G,v) and (H,w) are not in \rho_{1}\bigl(\mathsf{GTL}^{(t)}(\Omega)\bigr) either, for any \Omega. Hence, \rho_{1}\bigl(\mathsf{GTL}^{(t)}(\Omega)\bigr)\subseteq\rho_{1}\bigl(\mathsf{cr}^{(t)}\bigr) holds. The other bullets are shown in the same way, again by relying on Proposition C.1 and using that we can move from \mathsf{vwl}_{k}^{(t)} and \mathsf{gwl}_{k}^{(t)} to logical formulae, and to expressions in \mathsf{TL}_{k+1}^{(t)} and \mathsf{TL}_{k+1}^{(t+k)}, respectively, separating (G,v) from (H,w) or G from H, respectively. ∎

Theorems 4.1, 4.2, 4.3 and 4.4 now follow directly from Propositions C.4 and C.5.

C.5 Other aggregation functions

As is mentioned in the main paper, our upper bound results on the separation power of tensor languages (and hence also of \mathsf{GNN}\text{s} represented in those languages) generalize easily when aggregation functions other than summation are used in \mathsf{TL} expressions.

To clarify what we understand by an aggregation function, let us first recall the semantics of summation aggregation. Let φ:=xiφ1\varphi:=\sum_{x_{i}}\varphi_{1}, where xi\sum_{x_{i}} represents summation aggregation, let G=(VG,EG,𝖼𝗈𝗅G)G=(V_{G},E_{G},\mathsf{col}_{G}) be a graph, and let ν\nu be a valuation assigning index variables to vertices in VGV_{G}. The semantics is then given by:

[[xiφ1,ν]]G:=vVG[[φ1,ν[xiv]]]G,[\![\textstyle\sum_{x_{i}}\!\varphi_{1},\nu]\!]_{G}:=\sum_{v\in V_{G}}[\![\varphi_{1},\nu[x_{i}\mapsto v]]\!]_{G},

as explained in Section 3. Semantically, we can alternatively view xiφ1\sum_{x_{i}}\varphi_{1} as a function which takes the sum of the elements in the following multiset of real values:

{{[[φ1,ν[xiv]]]GvVG}}.\{\!\{[\![\varphi_{1},\nu[x_{i}\mapsto v]]\!]_{G}\mid v\in V_{G}\}\!\}.

One can now consider, more generally, an aggregation function F as a function which assigns a single real value to any multiset of values in \mathbb{R}. For example, F could be \mathsf{max}, \mathsf{min}, \mathsf{mean}, \ldots. Let \Theta be such a collection of aggregation functions. We next incorporate general aggregation functions into the tensor language.

First, we extend the syntax of expressions in \mathsf{TL}(\Omega) by generalizing the construct \sum_{x_{i}}\varphi in the grammar of \mathsf{TL}(\Omega) expressions. More precisely, we define \mathsf{TL}(\Omega,\Theta) as the class of expressions, formed just like tensor language expressions, but in which two additional constructs, unconditional and conditional aggregation, are allowed. For an aggregation function F we define:

𝖺𝗀𝗀𝗋xjF(φ) and 𝖺𝗀𝗀𝗋xjF(φ(xj)E(xi,xj)),\mathsf{aggr}_{x_{j}}^{F}(\varphi)\ \ \ \text{ and }\ \ \ \mathsf{aggr}_{x_{j}}^{F}\bigl{(}\varphi(x_{j})\mid E(x_{i},x_{j})\bigr{)},

where in the latter construct (conditional aggregation) the expression φ(xj)\varphi(x_{j}) represents a 𝖳𝖫(Ω,Θ)\mathsf{TL}(\Omega,\Theta) expression whose only free variable is xjx_{j}. The intuition behind these constructs is that unconditional aggregation 𝖺𝗀𝗀𝗋xjF(φ)\mathsf{aggr}_{x_{j}}^{F}(\varphi) allows for aggregating, using aggregate function FF, over the values of φ\varphi where xjx_{j} ranges unconditionally over all vertices in the graph. In contrast, for conditional aggregation 𝖺𝗀𝗀𝗋xjF(φ(xj)E(xi,xj))\mathsf{aggr}_{x_{j}}^{F}\bigl{(}\varphi(x_{j})\mid E(x_{i},x_{j})\bigr{)}, aggregation by FF of the values of φ(xj)\varphi(x_{j}) is conditioned on the neighbors of the vertex assigned to xix_{i}. That is, the vertices for xjx_{j} range only among the neighbors of the vertex assigned to xix_{i}.

More specifically, the semantics of the aggregation constructs is defined as follows:

[[𝖺𝗀𝗀𝗋xjF(φ),ν]]G\displaystyle[\![\mathsf{aggr}_{x_{j}}^{F}(\varphi),\nu]\!]_{G} :=F({{[[φ,ν[xjv]]]GvVG}}).\displaystyle:=F\left(\{\!\{[\![\varphi,\nu[x_{j}\mapsto v]]\!]_{G}\mid v\in V_{G}\}\!\}\right)\in\mathbb{R}.
[[𝖺𝗀𝗀𝗋xjF(φ(xj)E(xi,xj)),ν]]G\displaystyle[\![\mathsf{aggr}_{x_{j}}^{F}\bigl{(}\varphi(x_{j})\mid E(x_{i},x_{j})\bigr{)},\nu]\!]_{G} :=F({{[[φ,ν[xjv]]]GvVG,(ν(xi),v)EG}}).\displaystyle:=F\left(\{\!\{[\![\varphi,\nu[x_{j}\mapsto v]]\!]_{G}\mid v\in V_{G},(\nu(x_{i}),v)\in E_{G}\}\!\}\right)\in\mathbb{R}.

We remark that we can also consider aggregation functions F over multisets of values in \mathbb{R}^{\ell} for some \ell\in\mathbb{N}. This requires extending the syntax with \mathsf{aggr}_{x_{j}}^{F}(\varphi_{1},\ldots,\varphi_{\ell}) for unconditional aggregation and with \mathsf{aggr}_{x_{j}}^{F}\bigl(\varphi_{1}(x_{j}),\ldots,\varphi_{\ell}(x_{j})\mid E(x_{i},x_{j})\bigr) for conditional aggregation. The semantics is as expected: F\bigl(\{\!\{([\![\varphi_{1},\nu[x_{j}\mapsto v]]\!]_{G},\ldots,[\![\varphi_{\ell},\nu[x_{j}\mapsto v]]\!]_{G})\mid v\in V_{G}\}\!\}\bigr)\in\mathbb{R} and F\bigl(\{\!\{([\![\varphi_{1},\nu[x_{j}\mapsto v]]\!]_{G},\ldots,[\![\varphi_{\ell},\nu[x_{j}\mapsto v]]\!]_{G})\mid v\in V_{G},(\nu(x_{i}),v)\in E_{G}\}\!\}\bigr)\in\mathbb{R}.

The need for considering conditional and unconditional aggregation separately is due to the use of arbitrary aggregation functions. Indeed, suppose that one uses an aggregation function FF for which 00\in\mathbb{R} is a neutral value. That is, for any multiset XX of real values, the equality F(X)=F(X{0})F(X)=F(X\uplus\{0\}) holds. For example, the summation aggregation function satisfies this property. We then observe:

[[𝖺𝗀𝗀𝗋xjF(φ(xj)E(xi,xj)),ν]]G\displaystyle[\![\mathsf{aggr}_{x_{j}}^{F}\bigl{(}\varphi(x_{j})\mid E(x_{i},x_{j})\bigr{)},\nu]\!]_{G} =F({{[[φ,ν[xjv]]]vVG,(ν(xi),v)EG}})\displaystyle=F(\{\!\{[\![\varphi,\nu[x_{j}\mapsto v]]\!]\mid v\in V_{G},(\nu(x_{i}),v)\in E_{G}\}\!\}\bigr{)}
=F({{[[φE(xi,xj),ν[xjv]]]vVG}})\displaystyle=F(\{\!\{[\![\varphi\cdot E(x_{i},x_{j}),\nu[x_{j}\mapsto v]]\!]\mid v\in V_{G}\}\!\}\bigr{)}
=[[𝖺𝗀𝗀𝗋xjF(φ(xj)E(xi,xj)),ν]]G.\displaystyle=[\![\mathsf{aggr}_{x_{j}}^{F}(\varphi(x_{j})\cdot E(x_{i},x_{j})),\nu]\!]_{G}.

In other words, unconditional aggregation can simulate conditional aggregation. In contrast, when 0 is not a neutral value of the aggregation function FF, conditional and unconditional aggregation behave differently. Indeed, in such cases 𝖺𝗀𝗀𝗋xjF(φ(xj)E(xi,xj))\mathsf{aggr}_{x_{j}}^{F}\bigl{(}\varphi(x_{j})\mid E(x_{i},x_{j})\bigr{)} and 𝖺𝗀𝗀𝗋xjF(φ(xj)E(xi,xj))\mathsf{aggr}_{x_{j}}^{F}(\varphi(x_{j})\cdot E(x_{i},x_{j})) may evaluate to different values, as illustrated in the following example.

As aggregation function FF we take the average 𝖺𝗏𝗀(X):=1|X|xXx\mathsf{avg}(X):=\frac{1}{|X|}\sum_{x\in X}x for multisets XX of real values. We remark that 0’s in XX contribute to the size of XX and hence 0 is not a neutral element of 𝖺𝗏𝗀\mathsf{avg}. Now, let us consider the expressions

φ1(xi):=𝖺𝗀𝗀𝗋xj𝖺𝗏𝗀(𝟏xj=xjE(xi,xj)) and φ2(xi):=𝖺𝗀𝗀𝗋xj𝖺𝗏𝗀(𝟏xj=xjE(xi,xj)).\varphi_{1}(x_{i}):=\mathsf{aggr}_{x_{j}}^{\mathsf{avg}}(\bm{1}_{x_{j}=x_{j}}\cdot E(x_{i},x_{j}))\text{ and }\varphi_{2}(x_{i}):=\mathsf{aggr}_{x_{j}}^{\mathsf{avg}}(\bm{1}_{x_{j}=x_{j}}\mid E(x_{i},x_{j})).

Let \nu be such that \nu(x_{i})=v. Then, [\![\varphi_{1},\nu]\!]_{G} applies the average to the multiset \{\!\{\bm{1}_{w=w}\cdot E(v,w)\mid w\in V_{G}\}\!\}, which contains the value 1 for every w\in N_{G}(v) and a 0 for every non-neighbor w\not\in N_{G}(v), and thus evaluates to |N_{G}(v)|/|V_{G}|. In contrast, [\![\varphi_{2},\nu]\!]_{G} applies the average to the multiset \{\!\{\bm{1}_{w=w}\mid w\in V_{G},(v,w)\in E_{G}\}\!\}, which only contains the value 1 for each w\in N_{G}(v) and ignores any information about the non-neighbors of v, and thus evaluates to |N_{G}(v)|/|N_{G}(v)|=1. Hence, conditional and unconditional aggregation behave differently for the average aggregation function.
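The following small Python sketch (on a toy graph chosen purely for illustration) makes this difference concrete: with summation the conditional and unconditional variants coincide, whereas with the average the unconditional variant evaluates to |N_{G}(v)|/|V_{G}| and the conditional one to 1.

```python
V = [0, 1, 2, 3]
E = {(0, 1), (1, 0), (0, 2), (2, 0)}          # vertex 0 has neighbors 1 and 2
v = 0
phi = lambda w: 1.0                           # the expression 1_{x_j = x_j}
edge = lambda u, w: 1.0 if (u, w) in E else 0.0
avg = lambda ms: sum(ms) / len(ms)

unconditional = avg([phi(w) * edge(v, w) for w in V])      # non-neighbors contribute 0
conditional   = avg([phi(w) for w in V if (v, w) in E])    # only neighbors
print(unconditional, conditional)                          # 0.5 (= 2/4) and 1.0

# For summation, 0 is neutral, so both variants coincide:
assert sum(phi(w) * edge(v, w) for w in V) == sum(phi(w) for w in V if (v, w) in E)
```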

This said, one could alternatively use a more general variant of conditional aggregation of the form \mathsf{aggr}_{x_{j}}^{F}(\varphi\mid\psi) with semantics [\![\mathsf{aggr}_{x_{j}}^{F}(\varphi\mid\psi),\nu]\!]_{G}:=F\bigl(\{\!\{[\![\varphi,\nu[x_{j}\to v]]\!]_{G}\mid v\in V_{G},[\![\psi,\nu[x_{j}\to v]]\!]_{G}\neq 0\}\!\}\bigr), where one creates a multiset only for those valuations \nu[x_{j}\to v] for which the condition \psi evaluates to a non-zero value. This general form of aggregation includes conditional aggregation, by replacing \psi with E(x_{i},x_{j}) and restricting \varphi, and unconditional aggregation, by replacing \psi with the constant function 1, e.g., \bm{1}_{x_{j}=x_{j}}. In order not to overload the syntax of \mathsf{TL} expressions, we will not discuss this general form of aggregation further.

The notion of free index variables for expressions in 𝖳𝖫(Ω,Θ)\mathsf{TL}(\Omega,\Theta) is defined as before, where now 𝖿𝗋𝖾𝖾(𝖺𝗀𝗀𝗋xjF(φ)):=𝖿𝗋𝖾𝖾(φ){xj}\mathsf{free}(\mathsf{aggr}_{x_{j}}^{F}(\varphi)):=\mathsf{free}(\varphi)\setminus\{x_{j}\}, and where 𝖿𝗋𝖾𝖾(𝖺𝗀𝗀𝗋xjF(φ(xj)E(xi,xj)):={xi}\mathsf{free}(\mathsf{aggr}_{x_{j}}^{F}\bigl{(}\varphi(x_{j})\mid E(x_{i},x_{j})):=\{x_{i}\} (recall that 𝖿𝗋𝖾𝖾(φ(xj))={xj}\mathsf{free}(\varphi(x_{j}))=\{x_{j}\} in conditional aggregation). Moreover, summation depth is replaced by the notion of aggregation depth, 𝖺𝗀𝖽(φ)\mathsf{agd}(\varphi), defined in the same way as summation depth except that 𝖺𝗀𝖽(𝖺𝗀𝗀𝗋xjF(φ)):=𝖺𝗀𝖽(φ)+1\mathsf{agd}(\mathsf{aggr}_{x_{j}}^{F}(\varphi)):=\mathsf{agd}(\varphi)+1 and 𝖺𝗀𝖽(𝖺𝗀𝗀𝗋xjF(φ(xj)E(xi,xj)):=𝖺𝗀𝖽(φ)+1\mathsf{agd}(\mathsf{aggr}_{x_{j}}^{F}(\varphi(x_{j})\mid E(x_{i},x_{j})):=\mathsf{agd}(\varphi)+1. Similarly, the fragments 𝖳𝖫k(Ω,Θ)\mathsf{TL}_{k}(\Omega,\Theta) and its aggregation depth restricted fragment 𝖳𝖫k(t)(Ω,Θ)\mathsf{TL}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega,\Theta) are defined as before, using aggregation depth rather than summation depth.

For the guarded fragment, 𝖦𝖳𝖫(Ω,Θ)\mathsf{GTL}(\Omega,\Theta), expressions are now restricted such that aggregations must occur only in the form 𝖺𝗀𝗀𝗋xjF(φ(xj)E(xi,xj))\mathsf{aggr}_{x_{j}}^{F}(\varphi(x_{j})\mid E(x_{i},x_{j})), for i,j[2]i,j\in[2]. In other words, aggregation only happens on multisets of values obtained from neighboring vertices.

We now argue that our upper bound results on the separation power remain valid for the extension 𝖳𝖫(Ω,Θ)\mathsf{TL}(\Omega,\Theta) of 𝖳𝖫(Ω)\mathsf{TL}(\Omega) with arbitrary aggregation functions Θ\Theta.

Proposition C.6.

We have the following inclusions: For any t0t\geq 0, any collection Ω\Omega of functions and any collection Θ\Theta of aggregation functions:

  • ρ1(𝖼𝗋(t))ρ1(𝖦𝖳𝖫(t)(Ω,Θ))\rho_{1}\bigl{(}\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega,\Theta)\bigr{)};

  • ρ1(𝗏𝗐𝗅k(t))ρ1(𝖳𝖫k+1(t)(Ω,Θ))\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{TL}_{k+1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega,\Theta)\bigr{)}; and

  • ρ0(𝗀𝗐𝗅k(t))ρ0(𝖳𝖫k+1(t+1)(Ω,Θ))\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{TL}_{k+1}^{(t+1)}(\Omega,\Theta)\bigr{)}.

Proof.

It suffices to show that Proposition C.3 also holds for expressions in the fragments of \mathsf{TL}(\Omega,\Theta) considered. In particular, we only need to revise the case of summation aggregation (that is, \varphi:=\sum_{x_{i}}\varphi_{1}) in the proof of Proposition C.3. Indeed, let us consider the more general case in which one of the two aggregation constructs is used.

  • φ:=𝖺𝗀𝗀𝗋xiF(φ1)\varphi:=\mathsf{aggr}_{x_{i}}^{F}(\varphi_{1}). We then define

    φ~c:=(m1,,m)(c,c1,,c)𝒞(m1,,m,F)s=1=msxiφ~1csxis=1φ~1cs,\tilde{\varphi}^{c}:=\bigvee_{\ell\in\mathbb{N}}\bigvee_{(m_{1},\ldots,m_{\ell})\in\mathbb{N}^{\ell}}\bigvee_{(c,c_{1},\ldots,c_{\ell})\in\mathcal{C}(m_{1},\ldots,m_{\ell},F)}\bigwedge_{s=1}^{\ell}\exists^{=m_{s}}x_{i}\,\tilde{\varphi}_{1}^{c_{s}}\land\forall x_{i}\,\bigvee_{s=1}^{\ell}\tilde{\varphi}_{1}^{c_{s}},

    where 𝒞(m1,,m,F)\mathcal{C}(m_{1},\ldots,m_{\ell},F) now consists of all (c,c1,,c)+1(c,c_{1},\dots,c_{\ell})\in\mathbb{R}^{\ell+1} such that

    c=F({{c1,,c1m1 times,,c,,cm times}}).c=F\Bigl{(}\{\!\{\underbrace{c_{1},\ldots,c_{1}}_{\text{$m_{1}$ times}},\ldots,\underbrace{c_{\ell},\ldots,c_{\ell}}_{\text{$m_{\ell}$ times}}\}\!\}\Bigr{)}.
  • φ:=𝖺𝗀𝗀𝗋xiF(φ1(xi)E(xj,xi))\varphi:=\mathsf{aggr}_{x_{i}}^{F}(\varphi_{1}(x_{i})\mid E(x_{j},x_{i})). We then define

    φ~c:=(m1,,m)(c,c1,,c)𝒞(m1,,m,F)=mxiE(xj,xi)s=1=msxiE(xj,xi)φ~1cs(xi)\tilde{\varphi}^{c}:=\bigvee_{\ell\in\mathbb{N}}\bigvee_{(m_{1},\ldots,m_{\ell})\in\mathbb{N}^{\ell}}\bigvee_{(c,c_{1},\ldots,c_{\ell})\in\mathcal{C}(m_{1},\ldots,m_{\ell},F)}\exists^{=m}x_{i}\,E(x_{j},x_{i})\ \land\\ \bigwedge_{s=1}^{\ell}\exists^{=m_{s}}x_{i}\,E(x_{j},x_{i})\ \land\ \tilde{\varphi}_{1}^{c_{s}}(x_{i})

    where 𝒞(m1,,m,F)\mathcal{C}(m_{1},\ldots,m_{\ell},F) again consists of all (c,c1,,c)+1(c,c_{1},\dots,c_{\ell})\in\mathbb{R}^{\ell+1} such that

    c=F({{c1,,c1m1 times,,c,,cm times}}) and m=s=1ms.c=F\Bigl{(}\{\!\{\underbrace{c_{1},\ldots,c_{1}}_{\text{$m_{1}$ times}},\ldots,\underbrace{c_{\ell},\ldots,c_{\ell}}_{\text{$m_{\ell}$ times}}\}\!\}\Bigr{)}\text{ and }m=\sum_{s=1}^{\ell}m_{s}.

It is readily verified that [\![\mathsf{aggr}_{x_{i}}^{F}(\varphi_{1}),{\bm{v}}]\!]_{G}=c iff [\![\tilde{\varphi}^{c},{\bm{v}}]\!]_{G}^{\mathbb{B}}=\top, and [\![\mathsf{aggr}_{x_{i}}^{F}(\varphi_{1}(x_{i})\mid E(x_{j},x_{i})),{\bm{v}}]\!]_{G}=c iff [\![\tilde{\varphi}^{c},{\bm{v}}]\!]_{G}^{\mathbb{B}}=\top, as desired.

For the guarded case, we note that the expression \tilde{\varphi}^{c} above yields a guarded expression as long as conditional aggregation of the form \mathsf{aggr}_{x_{i}}^{F}(\varphi(x_{i})\mid E(x_{j},x_{i})) with i,j\in[2] is used, so we can reuse the argument in the proof of Proposition C.3 for the guarded case. ∎

We will illustrate later on (Section D) that this generalization allows for assessing the separation power of 𝖦𝖭𝖭s\mathsf{GNN}\text{s} that use a variety of aggregation functions.

The choice of supported aggregation functions has, of course, an impact on the ability of 𝖳𝖫(Ω,Θ)\mathsf{TL}(\Omega,\Theta) to match color refinement or the k-𝖶𝖫k\text{-}\mathsf{WL} procedures in separation power. The same holds for 𝖦𝖭𝖭s\mathsf{GNN}\text{s}, as shown by Xu et al. (2019). And indeed, the proof of Proposition C.5 relies on the presence of summation aggregation. We note that most lower bounds on the separation power of 𝖦𝖭𝖭s\mathsf{GNN}\text{s} in terms of color refinement or the k-𝖶𝖫k\text{-}\mathsf{WL} procedures assume summation aggregation since summation suffices to construct injective sum-decomposable functions on multisets (Xu et al., 2019; Zaheer et al., 2017), which are used to simulate color refinement and k-𝖶𝖫k\text{-}\mathsf{WL}. A more in-depth analysis of lower bounding 𝖦𝖭𝖭s\mathsf{GNN}\text{s} with less expressive aggregation functions, possibly using weaker versions of color refinement and k-𝖶𝖫k\text{-}\mathsf{WL} is left as future work.

C.6 Generalization to graphs with real-valued vertex labels

We next consider the more general setting in which \mathsf{col}_{G}:V_{G}\to\mathbb{R}^{\ell} for some \ell\in\mathbb{N}. That is, vertices in a graph can carry real-valued vectors. We remark that no changes to either the syntax or the semantics of \mathsf{TL} expressions are needed, yet note that [\![P_{s}(x),\nu]\!]_{G}:=\mathsf{col}_{G}(\nu(x))_{s} is now an element of \mathbb{R} rather than 0 or 1, for each s\in[\ell].

A first observation is that the color refinement and k\text{-}\mathsf{WL} procedures treat each real value as a separate label. That is, two values that differ only by some arbitrarily small \epsilon>0 are considered different. The proofs of Theorems 4.1, 4.2, 4.3 and 4.4 rely on connections between color refinement and k\text{-}\mathsf{WL} and the finite variable logics \mathsf{GC} and \mathsf{C}^{k+1}, respectively. In the discrete context, the unary predicates P_{s}(x) used in the logical formulas indicate which label vertices have. That is, [\![P_{s},v]\!]_{G}^{\mathbb{B}}=\top iff \mathsf{col}_{G}(v)_{s}=1. To accommodate real values in the context of separation power, these logics now need to be able to differentiate between different labels, that is, different real numbers. We therefore extend the unary predicates allowed in formulas. More precisely, for each dimension s\in[\ell], we now have uncountably many predicates of the form P_{s,r}, one for each r\in\mathbb{R}. In any formula in \mathsf{GC} or \mathsf{C}^{k+1} only a finite number of such predicates may occur. The Boolean semantics of these new predicates is as expected:

[\![P_{s,r}(x),\nu]\!]_{G}^{\mathbb{B}}:=\mathrm{if~}\mathsf{col}_{G}(\nu(x))_{s}=r\text{~then~}\top\text{~else~}\bot.

In other words, in our logics, we can now detect which real-valued labels vertices have. Although, in general, the introduction of infinitely many predicates may cause problems, we here consider a specific setting in which the vertices in a graph have a unique label. This is commonly assumed in graph learning. Given this, it is easily verified that all results in Section C.2 carry over, where all logics involved now use the unary predicates P_{s,r} with s\in[\ell] and r\in\mathbb{R}.

The connection between 𝖳𝖫\mathsf{TL} and logics also carries over. First, for Proposition C.3 we now need to connect 𝖳𝖫\mathsf{TL} expressions, that use a finite number of predicates PsP_{s}, for s[]s\in[\ell], with the extended logics having uncountably many predicates Ps,rP_{s,r}, for s[]s\in[\ell] and rr\in\mathbb{R}, at their disposal. It suffices to reconsider the case φ(xi)=Ps(xi)\varphi(x_{i})=P_{s}(x_{i}) in the proof of Proposition C.3. More precisely, [[Ps(xi),ν]]G[\![P_{s}(x_{i}),\nu]\!]_{G} can now be an arbitrary value cc\in\mathbb{R}. We now simply define φ~c(xi):=Ps,c(xi)\tilde{\varphi}^{c}(x_{i}):=P_{s,c}(x_{i}). By definition [[Ps(xi),ν]]G=c[\![P_{s}(x_{i}),\nu]\!]_{G}=c if and only if [[Ps,c(xi),ν]]G𝔹=[\![P_{s,c}(x_{i}),\nu]\!]_{G}^{\mathbb{B}}=\top, as desired.

The proof of the extended version of Proposition C.5 now needs a slightly different strategy, in which we build the relevant \mathsf{TL} expression after setting up the contrapositive of the proposition. Let us first show how to construct a \mathsf{TL} expression that is equivalent to a logical formula on any graph using only labels in a specific (finite) set R of real numbers.

In other words, given a set RR of real values, we show that for any formula φ(𝒙)𝖢k,(t)\varphi({\bm{x}})\in\mathsf{C}^{k,(t)} using unary predicates Ps,rP_{s,r} such that rRr\in R, we can construct the desired φ^\hat{\varphi}. As mentioned, we only need to reconsider the case φ(xi):=Ps,r(xi)\varphi(x_{i}):=P_{s,r}(x_{i}). We define

\hat{\varphi}:=\frac{1}{\prod_{r^{\prime}\in R,r\neq r^{\prime}}(r-r^{\prime})}\prod_{r^{\prime}\in R,r\neq r^{\prime}}\bigl(P_{s}(x_{i})-r^{\prime}\bm{1}_{x_{i}=x_{i}}\bigr).

Then, [\![\hat{\varphi},\nu]\!]_{G} evaluates to

\frac{\prod_{r^{\prime}\in R,r\neq r^{\prime}}\bigl([\![P_{s}(x_{i}),\nu]\!]_{G}-r^{\prime}\bigr)}{\prod_{r^{\prime}\in R,r\neq r^{\prime}}(r-r^{\prime})}=\begin{cases}1&[\![P_{s,r},\nu]\!]=\top\\ 0&[\![P_{s,r},\nu]\!]=\bot.\end{cases}

Indeed, if [\![P_{s,r},\nu]\!]=\top, then \mathsf{col}_{G}(v)_{s}=r and hence [\![P_{s},v]\!]_{G}=r, resulting in the same numerator and denominator in the above fraction. If [\![P_{s,r},\nu]\!]=\bot, then \mathsf{col}_{G}(v)_{s}=r^{\prime} for some value r^{\prime}\in R with r\neq r^{\prime}. In this case, the numerator in the above fraction becomes zero. We remark that this revised construction still results in a guarded \mathsf{TL} expression when the input logical formula is guarded as well.
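A quick way to convince oneself of this construction is the following Python check (with an illustrative choice of the finite label set R and target value r): the product expression evaluates to 1 exactly on the label r and to 0 on every other label in R.

```python
import math

R = [0.3, 1.0, 2.5, 4.2]        # illustrative set of real-valued labels
r = 2.5

def indicator(label):
    # prod_{r' in R, r' != r} (label - r') / prod_{r' in R, r' != r} (r - r')
    num = math.prod(label - rp for rp in R if rp != r)
    den = math.prod(r - rp for rp in R if rp != r)
    return num / den

assert abs(indicator(r) - 1.0) < 1e-12
assert all(abs(indicator(rp)) < 1e-12 for rp in R if rp != r)
```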

Coming back to the proof of the extended version of Proposition C.5, let us show the proof of the fact that \rho_{1}\bigl(\mathsf{GTL}^{(t)}(\Omega)\bigr)\subseteq\rho_{1}\bigl(\mathsf{cr}^{(t)}\bigr), the other two items being analogous. Assume that there is a pair (G,v) and (H,w) which is not in \rho_{1}\bigl(\mathsf{cr}^{(t)}\bigr). Then, by Proposition C.1, applied to graphs with real-valued labels, there exists a formula \varphi(x) in \mathsf{GC}^{(t)} such that [\![\varphi,v]\!]_{G}^{\mathbb{B}}=\top\neq[\![\varphi,w]\!]_{H}^{\mathbb{B}}=\bot. We remark that \varphi(x) uses finitely many P_{s,r} predicates. Let R be the set of real values used in G, H and \varphi(x); we note that R is finite. We invoke the construction sketched above, and obtain an expression \hat{\varphi} in \mathsf{GTL}^{(t)} such that [\![\hat{\varphi},v]\!]_{G}\neq[\![\hat{\varphi},w]\!]_{H}. Hence, (G,v) and (H,w) are not in \rho_{1}\bigl(\mathsf{GTL}^{(t)}(\Omega)\bigr) either, for any \Omega, which was to be shown.

Appendix D Details of Section 5

We here provide some additional details on the encoding of layers of 𝖦𝖭𝖭s\mathsf{GNN}\text{s} in our tensor languages, and how, as a consequence of our results from Section 4, one obtains a bound on their separation power. This section showcases that it is relatively straightforward to represent 𝖦𝖭𝖭s\mathsf{GNN}\text{s} in our tensor languages. Indeed, often, a direct translation of the layers, as defined in the literature, suffices.

D.1 Color refinement

We start with 𝖦𝖭𝖭\mathsf{GNN} architectures related to color refinement, or in other words, architectures which can be represented in our guarded tensor language.

GraphSage.

We first consider a “basic” \mathsf{GNN}, that is, an instance of GraphSage (Hamilton et al., 2017) in which sum aggregation is used. The initial features are given by {\bm{F}}^{(0)}=({\bm{f}}^{(0)}_{1},\ldots,{\bm{f}}^{(0)}_{d_{0}}) where {\bm{f}}^{(0)}_{i}\in\mathbb{R}^{n\times 1} is a one-hot encoding of the ith vertex label in G. We can represent the initial embedding easily in \mathsf{GTL}^{(0)}, without the use of any summation. Indeed, it suffices to define \varphi_{i}^{(0)}(x_{1}):=P_{i}(x_{1}) for i\in[d_{0}]. We have {F}^{(0)}_{vj}=[\![\varphi_{j}^{(0)},v]\!]_{G} for j\in[d_{0}], and thus the initial features can be represented by simple expressions in \mathsf{GTL}^{(0)}.

Assume now, by induction, that we can also represent the features computed by a basic 𝖦𝖭𝖭\mathsf{GNN} in layer t1t-1. That is, let 𝑭(t1)n×dt1{\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}\in\mathbb{R}^{n\times d_{t-1}} be those features and for each i[dt1]i\in[d_{t-1}] let φi(t1)(x1)\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1}) be expressions in 𝖦𝖳𝖫(t1)(σ)\mathsf{GTL}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(\sigma) representing them. We assume that, for each i[dt1]i\in[d_{t-1}], F(t1)vi=[[φ(t1)i,v]]G{F}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{vi}=[\![\varphi^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{i},v]\!]_{G}. We remark that we assume that a summation depth of t1t-1 is needed for layer t1t-1.

Then, in layer tt, a basic 𝖦𝖭𝖭\mathsf{GNN} computes the next features as

𝑭(t):=σ(𝑭(t1)𝑽(t)+𝑨𝑭(t1)𝑾(t)+𝑩(t)),{\bm{F}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}:=\sigma\left({\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}\cdot{\bm{V}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}+{\bm{A}}\cdot{\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}\cdot{\bm{W}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}+{\bm{B}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\right),

where {\bm{A}}\in\mathbb{R}^{n\times n} is the adjacency matrix of G, {\bm{V}}^{(t)} and {\bm{W}}^{(t)} are weight matrices in \mathbb{R}^{d_{t-1}\times d_{t}}, {\bm{B}}^{(t)}\in\mathbb{R}^{n\times d_{t}} is a (constant) bias matrix consisting of n copies of {\bm{b}}^{(t)}\in\mathbb{R}^{d_{t}}, and \sigma is some activation function. We can simply use the following expressions \varphi_{j}^{(t)}(x_{1}), for j\in[d_{t}]:

σ((i=1dt1Vij(t)φi(t1)(x1))+x2(E(x1,x2)(i=1dt1Wij(t)φi(t1)(x2)))+bj(t)𝟏x1=x1).\sigma\left(\Bigl{(}\sum_{i=1}^{d_{t-1}}\!{V}_{ij}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1})\Bigr{)}+\sum_{x_{2}}\,\biggl{(}E(x_{1},x_{2})\cdot\Bigl{(}\sum_{i=1}^{d_{t-1}}\!{W}_{ij}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{2})\Bigr{)}\biggl{)}{}+b_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\cdot\bm{1}_{x_{1}=x_{1}}\right).

Here, {W}_{ij}^{(t)}, {V}_{ij}^{(t)} and b_{j}^{(t)} are real values corresponding to the entries of the weight matrices and bias vector in layer t. These are expressions in \mathsf{GTL}^{(t)}(\sigma) since the additional summation is guarded, and combined with the summation depth of t-1 of \varphi_{i}^{(t-1)}, this results in a summation depth of t for layer t. Furthermore, {F}^{(t)}_{vi}=[\![\varphi^{(t)}_{i},v]\!]_{G}, as desired. If we denote by \mathsf{b}\mathsf{GNN}^{(t)} the class of t-layered basic \mathsf{GNN}\text{s}, then our results imply

\rho_{1}\bigl(\mathsf{cr}^{(t)}\bigr)\subseteq\rho_{1}\bigl(\mathsf{GTL}^{(t)}(\Omega)\bigr)\subseteq\rho_{1}\bigl(\mathsf{b}\mathsf{GNN}^{(t)}\bigr),

and thus the separation power of basic 𝖦𝖭𝖭s\mathsf{GNN}\text{s} is bounded by the separation power of color refinement. We thus recover known results by Xu et al. (2019) and Morris et al. (2019).
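For concreteness, a single layer of such a basic \mathsf{GNN} can be sketched in a few lines of Python; the following is an illustrative NumPy implementation of the update {\bm{F}}^{(t)}=\sigma({\bm{F}}^{(t-1)}{\bm{V}}^{(t)}+{\bm{A}}{\bm{F}}^{(t-1)}{\bm{W}}^{(t)}+{\bm{B}}^{(t)}), with ReLU as an arbitrary choice of activation and random weights standing in for trained parameters.

```python
import numpy as np

def basic_gnn_layer(A, F, V, W, b, sigma=lambda x: np.maximum(x, 0.0)):
    # F' = sigma(F V + A F W + B), where B stacks the bias vector b per vertex.
    # A: (n, n) adjacency, F: (n, d_in), V, W: (d_in, d_out), b: (d_out,).
    return sigma(F @ V + A @ F @ W + b)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)   # a 3-vertex path
F0 = np.eye(3)                                                 # one-hot initial labels
V1, W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=(3, 4)), rng.normal(size=4)
print(basic_gnn_layer(A, F0, V1, W1, b1).shape)                # (3, 4)
```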

Furthermore, if one uses a readout layer in basic \mathsf{GNN}\text{s} to obtain a graph embedding, one typically applies a function \mathsf{ro}:\mathbb{R}^{d_{t}}\to\mathbb{R}^{d_{t}} in the form of \mathsf{ro}\bigl(\sum_{v\in V_{G}}{\bm{F}}_{v}^{(t)}\bigr), in which aggregation takes place over all vertices of the graph. This corresponds to an expression in \mathsf{TL}_{2}^{(t+1)}(\sigma,\mathsf{ro}): \varphi_{j}:=\mathsf{ro}_{j}\bigl(\sum_{x_{1}}\varphi_{j}^{(t)}(x_{1})\bigr), where \mathsf{ro}_{j} is the projection of the readout function on the jth coordinate. We note that this is indeed not a guarded expression anymore, and thus our results tell us that

ρ0(𝗀𝖼𝗋(t))ρ0(𝖳𝖫2(t+1)(Ω))ρ0(𝖻𝖦𝖭𝖭(t)+𝗋𝖾𝖺𝖽𝗈𝗎𝗍).\rho_{0}\bigl{(}\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{0}(\mathsf{TL}_{2}^{\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}(\Omega)\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{b}\mathsf{GNN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}+\mathsf{readout}\bigr{)}.

More generally, GraphSage allows for the use of general aggregation functions FF on the multiset of features of neighboring vertices. To cast the corresponding layers in 𝖳𝖫(Ω)\mathsf{TL}(\Omega), we need to consider the extension 𝖳𝖫(Ω,Θ)\mathsf{TL}(\Omega,\Theta) with an appropriate set Θ\Theta of aggregation functions, as described in Section C.5. In this way, we can represent layer tt by means of the following expressions φj(t)(x1)\varphi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1}), for j[dt]j\in[d_{t}].

σ((i=1dt1Vij(t)φi(t1)(x1))+i=1dt1Wij(t)𝖺𝗀𝗀𝗋x2F(φi(t1)(x2)E(x1,x2))+bj(t)𝟏x1=x1),\sigma\left(\Bigl{(}\sum_{i=1}^{d_{t-1}}\!{V}_{ij}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1})\Bigr{)}+\sum_{i=1}^{d_{t-1}}\!{W}_{ij}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\cdot\mathsf{aggr}_{x_{2}}^{F}\Bigl{(}\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{2})\mid E(x_{1},x_{2})\Bigr{)}{}+b_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\cdot\bm{1}_{x_{1}=x_{1}}\right),

which is now an expression in 𝖦𝖳𝖫(t)({σ},Θ)\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\{\sigma\},\Theta) and hence the bound in terms of tt iterations of color refinement carries over by Proposition C.6. Here, Θ\Theta simply consists of the aggregation functions used in the layers in GraphSage.

GCNs.

Graph Convolution Networks (\mathsf{GCN}\text{s}) (Kipf & Welling, 2017) operate like basic \mathsf{GNN}\text{s} except that a normalized Laplacian {\bm{D}}^{-1/2}({\bm{I}}+{\bm{A}}){\bm{D}}^{-1/2} is used to aggregate features, instead of the adjacency matrix {\bm{A}}. Here, {\bm{D}}^{-1/2} is the diagonal matrix consisting of the reciprocals of the square roots of the vertex degrees in G plus one. The initial embedding {\bm{F}}^{(0)} is just as before. We again use d_{t} to denote the number of features in layer t. In layer t>0, a \mathsf{GCN} computes {\bm{F}}^{(t)}:=\sigma({\bm{D}}^{-1/2}({\bm{I}}+{\bm{A}}){\bm{D}}^{-1/2}\cdot{\bm{F}}^{(t-1)}{\bm{W}}^{(t)}+{\bm{B}}^{(t)}). If, in addition to the activation function \sigma, we add the function \frac{1}{\sqrt{x+1}}:\mathbb{R}\to\mathbb{R}:x\mapsto\frac{1}{\sqrt{x+1}} to \Omega, we can represent the \mathsf{GCN} layer as follows. For j\in[d_{t}], we define the \mathsf{GTL}^{(t+1)}(\sigma,\frac{1}{\sqrt{x+1}}) expressions

φj(t)(x1):=σ(f1/x+1(x2E(x1,x2))(i=1dt1Wij(t)φi(t1)(x1))f1/x+1(x2E(x1,x2))+f1/x+1(x2E(x1,x2))(x2E(x1,x2)f1/x+1(x1E(x2,x1))(i=1dt1Wij(t)φi(t1)(x2))),\varphi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1}):=\sigma\Biggl{(}f_{1/\sqrt{x+1}}\bigl{(}\sum_{x_{2}}E(x_{1},x_{2})\bigr{)}\cdot\Bigl{(}\sum_{i=1}^{d_{t-1}}\!{W}_{ij}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1})\Bigr{)}\cdot f_{1/\sqrt{x+1}}\bigl{(}\sum_{x_{2}}E(x_{1},x_{2})\bigr{)}\\ {}+f_{1/\sqrt{x+1}}\bigl{(}\sum_{x_{2}}E(x_{1},x_{2})\bigr{)}\cdot\Bigl{(}\sum_{x_{2}}E(x_{1},x_{2})\cdot f_{1/\sqrt{x+1}}\bigl{(}\sum_{x_{1}}E(x_{2},x_{1})\bigr{)}\cdot\Bigl{(}\sum_{i=1}^{d_{t-1}}\!{W}_{ij}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{2})\Bigr{)}\Biggr{)},

where we omitted the bias vector for simplicity. We again observe that only guarded summations are needed. However, we remark that in every layer we now add two to the overall summation depth, since we need an extra summation to compute the degrees. In other words, a t-layered \mathsf{GCN} corresponds to expressions in \mathsf{GTL}^{(2t)}(\sigma,\frac{1}{\sqrt{x+1}}). If we denote by \mathsf{GCN}^{(t)} the class of t-layered \mathsf{GCN}\text{s}, then our results imply

ρ1(𝖼𝗋(2t))ρ1(𝖦𝖳𝖫(2t)(Ω))ρ1(𝖦𝖢𝖭(t)).\rho_{1}\bigl{(}\mathsf{cr}^{(2t)}\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{GTL}^{\!\scalebox{0.6}{(}2t\scalebox{0.6}{)}}(\Omega)\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{GCN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}.

We remark that another representation can be provided, in which the degree computation is factored out (Geerts et al., 2021a), resulting in a better upper bound ρ1(𝖼𝗋(t+1))ρ1(𝖦𝖢𝖭(t))\rho_{1}\bigl{(}\mathsf{cr}^{(t+1)}\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{GCN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}. In a similar way as for basic 𝖦𝖭𝖭s\mathsf{GNN}\text{s}, we also have ρ0(𝗀𝖼𝗋(t+1))ρ0(𝖦𝖢𝖭(t)+𝗋𝖾𝖺𝖽𝗈𝗎𝗍)\rho_{0}\bigl{(}\mathsf{gcr}^{(t+1)}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{GCN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}+\mathsf{readout}\bigr{)}.
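As a small illustration of the layer just discussed, the following NumPy sketch implements one \mathsf{GCN} update \sigma({\bm{D}}^{-1/2}({\bm{I}}+{\bm{A}}){\bm{D}}^{-1/2}{\bm{F}}{\bm{W}}+{\bm{B}}); the degree computation (the function \frac{1}{\sqrt{x+1}}) is the part that costs the extra summation depth in the \mathsf{GTL} encoding. The choice of \tanh as activation and the toy inputs are arbitrary.

```python
import numpy as np

def gcn_layer(A, F, W, b, sigma=np.tanh):
    # F' = sigma(D^{-1/2} (I + A) D^{-1/2} F W + B), with D = diag(deg + 1).
    n = A.shape[0]
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1) + 1.0)   # the function 1/sqrt(x+1) per vertex
    L = d_inv_sqrt[:, None] * (np.eye(n) + A) * d_inv_sqrt[None, :]
    return sigma(L @ F @ W + b)

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)   # a 3-vertex star
print(gcn_layer(A, np.eye(3), np.ones((3, 2)), np.zeros(2)))
```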

SGCs.

As another example, we consider a variation of Simple Graph Convolutions (\mathsf{SGC}\text{s}) (Wu et al., 2019), which use powers of the adjacency matrix and only apply a non-linear activation function at the end. That is, {\bm{F}}:=\sigma({\bm{A}}^{p}\cdot{\bm{F}}^{(0)}\cdot{\bm{W}}) for some p\in\mathbb{N} and {\bm{W}}\in\mathbb{R}^{d_{0}\times d_{1}}. We remark that \mathsf{SGC}\text{s} actually use powers of the normalized Laplacian, that is, {\bm{F}}:=\sigma\bigl(({\bm{D}}^{-1/2}({\bm{I}}+{\bm{A}}_{G}){\bm{D}}^{-1/2})^{p}\cdot{\bm{F}}^{(0)}\cdot{\bm{W}}\bigr), but this only incurs an additional summation depth, as for \mathsf{GCN}\text{s}. We focus here on our simpler version. It should be clear that we can represent the architecture in \mathsf{TL}_{p+1}^{(p)}(\Omega) by means of the expressions:

φj(t)(x1):=σ(x2xp+1k=1pE(xk,xk+1)(i=1d0Wijφi(0)(xp+1))),\varphi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1}):=\sigma\left(\sum_{x_{2}}\cdots\sum_{x_{p+1}}\prod_{k=1}^{p}E(x_{k},x_{k+1})\cdot\Bigl{(}\sum_{i=1}^{d_{0}}{W}_{ij}\cdot\varphi_{i}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}(x_{p+1})\Bigr{)}\right),

for j[d1]j\in[d_{1}]. A naive application of our results would imply an upper bound on their separation power by p-𝖶𝖫p\text{-}\mathsf{WL}. We can, however, use Proposition 4.5. Indeed, it is readily verified that these expressions have a treewidth of one, because the variables form a path. And indeed, when for example, p=3p=3, we can equivalently write φj(t)(x1)\varphi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1}) as

σ(x2E(x1,x2)(x1E(x2,x1)(x2E(x1,x2)(i=1d0Wijφi(0)(x2))))),\sigma\left(\sum_{x_{2}}E(x_{1},x_{2})\cdot\biggl{(}\sum_{x_{1}}E(x_{2},x_{1})\cdot\Bigl{(}\sum_{x_{2}}E(x_{1},x_{2})\cdot\bigl{(}\sum_{i=1}^{d_{0}}{W}_{ij}\cdot\varphi_{i}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}(x_{2})\bigr{)}\Bigr{)}\biggl{)}\right),

by reordering the summations and reusing index variables. This holds for arbitrary p. We thus obtain guarded expressions in \mathsf{GTL}^{(p)}(\sigma) and our results imply that \mathsf{SGC}\text{s} are bounded by \mathsf{cr}^{(p)} for vertex embeddings, and by \mathsf{gcr}^{(p)} for \mathsf{SGC}\text{s}+\mathsf{readout}.
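The reordering above corresponds to computing {\bm{A}}^{p}{\bm{F}}^{(0)}{\bm{W}} as p successive neighborhood aggregations rather than one big summation over p index variables, as in the following illustrative NumPy sketch.

```python
import numpy as np

def sgc(A, F0, W, p, sigma=np.tanh):
    # sigma(A^p F0 W), computed as p guarded (one-hop) aggregations:
    # each multiplication by A only sums over neighbors, mirroring the
    # reordered nested summations in the guarded GTL expression.
    F = F0 @ W
    for _ in range(p):
        F = A @ F
    return sigma(F)
```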

Principal Neighbourhood Aggregation.

Our next example is a \mathsf{GNN} in which different aggregation functions are used: Principal Neighborhood Aggregation (\mathsf{PNA}) is an architecture proposed by Corso et al. (2020) in which aggregation over neighboring vertices is done by means of \mathsf{mean}, \mathsf{stdv}, \mathsf{max} and \mathsf{min}, in parallel. In addition, after aggregation, three different scalers are applied. Scalers are diagonal matrices whose diagonal entries are a function of the vertex degrees. Given the features for each vertex v computed in layer t-1, that is, {\bm{F}}^{(t-1)}_{v:}\in\mathbb{R}^{1\times\ell}, a \mathsf{PNA} computes v's new features in layer t in the following way (see layer definition (8) in (Corso et al., 2020)). First, vectors {\bm{G}}_{v:}^{(t)}\in\mathbb{R}^{1\times 4\ell} are computed such that

Gvj(t)={𝗆𝖾𝖺𝗇({{𝗆𝗅𝗉j(Fw:(t1))wNG(v)}})for 1j𝗌𝗍𝖽𝗏({{𝗆𝗅𝗉j(Fw:(t1))wNG(v)}})for +1j2𝗆𝖺𝗑({{𝗆𝗅𝗉j(Fw:(t1))wNG(v)}})for 2+1j3𝗆𝗂𝗇({{𝗆𝗅𝗉j(Fw:(t1))wNG(v)}})for 3+1j4,{G}_{vj}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}=\begin{cases}\mathsf{mean}\left(\{\!\{\mathsf{mlp}_{j}({F}_{w:}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}})\mid w\in N_{G}(v)\}\!\}\right)&\text{for $1\leq j\leq\ell$}\\ \mathsf{stdv}\left(\{\!\{\mathsf{mlp}_{j}({F}_{w:}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}})\mid w\in N_{G}(v)\}\!\}\right)&\text{for $\ell+1\leq j\leq 2\ell$}\\ \mathsf{max}\left(\{\!\{\mathsf{mlp}_{j}({F}_{w:}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}})\mid w\in N_{G}(v)\}\!\}\right)&\text{for $2\ell+1\leq j\leq 3\ell$}\\ \mathsf{min}\left(\{\!\{\mathsf{mlp}_{j}({F}_{w:}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}})\mid w\in N_{G}(v)\}\!\}\right)&\text{for $3\ell+1\leq j\leq 4\ell$},\end{cases}

where \mathsf{mlp}_{j}:\mathbb{R}^{\ell}\to\mathbb{R} is the projection of an \mathsf{MLP} \mathsf{mlp}:\mathbb{R}^{\ell}\to\mathbb{R}^{\ell} on the jth coordinate. Then, three different scalers are applied. The first scaler is simply the identity; the other two scalers s_{1} and s_{2} depend on the vertex degrees. As such, vectors {\bm{H}}_{v:}^{(t)}\in\mathbb{R}^{12\ell} are constructed as follows:

{H}_{vj}^{(t)}=\begin{cases}{G}_{vj}^{(t)}&\text{for $1\leq j\leq 4\ell$}\\ s_{1}(\mathsf{deg}_{G}(v))\cdot{G}_{v,j-4\ell}^{(t)}&\text{for $4\ell+1\leq j\leq 8\ell$}\\ s_{2}(\mathsf{deg}_{G}(v))\cdot{G}_{v,j-8\ell}^{(t)}&\text{for $8\ell+1\leq j\leq 12\ell$},\end{cases}

where s1s_{1} and s2s_{2} are functions from \mathbb{R}\to\mathbb{R} (see (Corso et al., 2020) for details). Finally, the new vertex embedding is obtained as

𝑭(t)v:=𝗆𝗅𝗉(𝑯(t)v:){\bm{F}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{v:}=\mathsf{mlp}^{\prime}({\bm{H}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{v:})

for some 𝖬𝖫𝖯\mathsf{MLP} 𝗆𝗅𝗉:12\mathsf{mlp}^{\prime}:\mathbb{R}^{12\ell}\to\mathbb{R}^{\ell}. The above layer definition translates naturally into expressions in 𝖳𝖫(Ω,Θ)\mathsf{TL}(\Omega,\Theta), the extension of 𝖳𝖫(Ω)\mathsf{TL}(\Omega) with aggregate functions (Section C.5). Indeed, suppose that for each j[]j\in[\ell] we have 𝖳𝖫(Ω,Θ)\mathsf{TL}(\Omega,\Theta) expressions φj(t1)(x1)\varphi_{j}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1}) such that [[φj(t1),v]]G=F(t1)vj[\![\varphi_{j}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}},v]\!]_{G}={F}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{vj} for any vertex vv. Then, Gvj(t){G}_{vj}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} simply corresponds to the guarded expressions

ψ(t)j(x1):=𝖺𝗀𝗀𝗋x2𝗆𝖾𝖺𝗇(𝗆𝗅𝗉j(φ1(t1)(x2),,φ(t1)(x2))E(x1,x2)),\psi^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{j}(x_{1}):=\mathsf{aggr}_{x_{2}}^{\mathsf{mean}}(\mathsf{mlp}_{j}(\varphi_{1}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{2}),\ldots,\varphi_{\ell}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{2}))\mid E(x_{1},x_{2})),

for 1j1\leq j\leq\ell, and similarly for the other components of 𝑮v:(t){\bm{G}}_{v:}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} using the respective aggregation functions, 𝗌𝗍𝖽𝗏\mathsf{stdv}, 𝗆𝖺𝗑\mathsf{max} and 𝗆𝗂𝗇\mathsf{min}. Then, Hvj(t){H}_{vj}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} corresponds to

\xi_{j}^{(t)}(x_{1})=\begin{cases}\psi_{j}^{(t)}(x_{1})&\text{for $1\leq j\leq 4\ell$}\\ s_{1}\bigl(\mathsf{aggr}_{x_{2}}^{\mathsf{sum}}(\bm{1}_{x_{2}=x_{2}}\mid E(x_{1},x_{2}))\bigr)\cdot\psi_{j-4\ell}^{(t)}(x_{1})&\text{for $4\ell+1\leq j\leq 8\ell$}\\ s_{2}\bigl(\mathsf{aggr}_{x_{2}}^{\mathsf{sum}}(\bm{1}_{x_{2}=x_{2}}\mid E(x_{1},x_{2}))\bigr)\cdot\psi_{j-8\ell}^{(t)}(x_{1})&\text{for $8\ell+1\leq j\leq 12\ell$},\end{cases}

where we use summation aggregation to compute the degree information used in the functions in the scalers s1s_{1} and s2s_{2}. And finally,

φj(t):=𝗆𝗅𝗉j(ξ1(t)(x1),,ξ12(t)(x1))\varphi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}:=\mathsf{mlp}_{j}^{\prime}(\xi_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1}),\ldots,\xi_{12\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1}))

represents {F}^{(t)}_{vj}. We see that all expressions only use two index variables and that aggregation is applied in a guarded way. Furthermore, in each layer, the aggregation depth increases by one. As such, a t-layered \mathsf{PNA} can be represented in \mathsf{GTL}^{(t)}(\Omega,\Theta), where \Omega consists of the \mathsf{MLP}\text{s} and the functions used in the scalers, and \Theta consists of \mathsf{sum} (for computing vertex degrees), and \mathsf{mean}, \mathsf{stdv}, \mathsf{max} and \mathsf{min}. Proposition C.6 then implies a bound on the separation power by \mathsf{cr}^{(t)}.
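The following Python sketch (illustrative; it assumes every vertex has at least one neighbor and treats the two \mathsf{MLP}\text{s} and the scalers s_{1}, s_{2} as given callables) mirrors the three steps above: the four parallel neighbor aggregations, the degree-based scalers, and the final \mathsf{MLP}.

```python
import numpy as np

def pna_layer(A, F, mlp, mlp_out, s1, s2):
    # F: (n, l) features; mlp: (n, l) -> (n, l); mlp_out: (n, 12 l) -> (n, l_out).
    # Assumes every vertex has at least one neighbor.
    n = A.shape[0]
    deg = A.sum(axis=1)
    H = mlp(F)
    rows = []
    for v in range(n):
        nbrs = H[A[v] > 0]                                   # neighbor features of v
        G = np.concatenate([nbrs.mean(0), nbrs.std(0), nbrs.max(0), nbrs.min(0)])
        rows.append(np.concatenate([G, s1(deg[v]) * G, s2(deg[v]) * G]))
    return mlp_out(np.vstack(rows))
```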

Other example.

In the same way, one can also easily analyze 𝖦𝖠𝖳s\mathsf{GAT}\text{s} (Velickovic et al., 2018) and show that these can be represented in 𝖦𝖳𝖫(Ω)\mathsf{GTL}(\Omega) as well, and thus bounds by color refinement can be obtained.

D.2 kk-dimensional Weisfeiler-Leman tests

We next discuss architectures related to the kk-dimensional Weisfeiler-Leman algorithms. For k=1k=1, we discussed the extended 𝖦𝖨𝖭s\mathsf{GIN}\text{s} in the main paper. We here focus on arbitrary k2k\geq 2.

Folklore GNNs.

We first consider the “Folklore” \mathsf{GNN}\text{s} or k\text{-}\mathsf{FGNN}\text{s} for short (Maron et al., 2019b). For k\geq 2, k\text{-}\mathsf{FGNN}\text{s} compute tensors. In particular, the initial tensor {\bm{\mathsfit{F}}}^{(0)} encodes \mathsf{atp}_{k}(G,{\bm{v}}) for each {\bm{v}}\in V_{G}^{k}. We can represent this tensor by the following k^{2}(\ell+2) expressions in \mathsf{TL}_{k}^{(0)}:

φr,s,j(0)(x1,,xk):={𝟏xr=xsPj(xr)for j[]E(xr,xs)for j=+1𝟏xr=xsfor j=+2,\varphi_{r,s,j}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}(x_{1},\ldots,x_{k}):=\begin{cases}\bm{1}_{x_{r}=x_{s}}\cdot P_{j}(x_{r})&\text{for $j\in[\ell]$}\\ E(x_{r},x_{s})&\text{for $j=\ell+1$}\\ \bm{1}_{x_{r}=x_{s}}&\text{for $j=\ell+2$}\end{cases},

for r,s[k]r,s\in[k] and j[+2]j\in[\ell+2]. We note: [[φ(0)r,s,j,(v1,,vk)]]G=F(0)v1,,vk,r,s,j[\![\varphi^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}_{r,s,j},(v_{1},\ldots,v_{k})]\!]_{G}={\mathsfit{F}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}_{v_{1},\ldots,v_{k},r,s,j} for all (r,s,j)[k]2×[+2](r,s,j)\in[k]^{2}\times[\ell+2], as desired. We let τ0:=[k]2×[+2]\tau_{0}:=[k]^{2}\times[\ell+2] and set d0=k2×(+2)d_{0}=k^{2}\times(\ell+2).

Then, in layer tt, a k-𝖥𝖦𝖭𝖭k\text{-}\mathsf{FGNN} computes a tensor

𝑭(t)v1,,vk,:=𝗆𝗅𝗉0(t)(𝑭(t1)v1,,vk,,wVGs=1k𝗆𝗅𝗉s(t)(𝑭v1,,vs1,w,vs+1,,vk,(t1))),{\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{v_{1},\ldots,v_{k},\bullet}:=\mathsf{mlp}_{0}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigl{(}{\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{v_{1},\ldots,v_{k},\bullet},\sum_{w\in V_{G}}\prod_{s=1}^{k}\mathsf{mlp}_{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}({\bm{\mathsfit{F}}}_{v_{1},\ldots,v_{s-1},w,v_{s+1},\ldots,v_{k},\bullet}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}})\bigr{)},

where \mathsf{mlp}_{s}^{(t)}:\mathbb{R}^{d_{t-1}}\to\mathbb{R}^{d_{t}^{\prime}}, for s\in[k], and \mathsf{mlp}_{0}^{(t)}:\mathbb{R}^{d_{t-1}\times d_{t}^{\prime}}\to\mathbb{R}^{d_{t}} are \mathsf{MLP}\text{s}. We here use \bullet to denote combinations of indices in \tau_{d_{t}} for {\bm{\mathsfit{F}}}^{(t)} and in \tau_{d_{t-1}} for {\bm{\mathsfit{F}}}^{(t-1)}.

Let {\bm{\mathsfit{F}}}^{(t-1)}\in\mathbb{R}^{n^{k}\times d_{t-1}} be the tensor computed by a k\text{-}\mathsf{FGNN} in layer t-1. Assume that for each tuple of elements {\bm{j}} in \tau_{d_{t-1}} we have an expression \varphi_{{\bm{j}}}^{(t-1)}(x_{1},\ldots,x_{k}) in \mathsf{TL}_{k+1}^{(t-1)}(\Omega) satisfying [\![\varphi^{(t-1)}_{{\bm{j}}},(v_{1},\ldots,v_{k})]\!]_{G}={\mathsfit{F}}^{(t-1)}_{v_{1},\ldots,v_{k},{\bm{j}}}. That is, we need k+1 index variables and a summation depth of t-1 to represent layer t-1.

Then, for layer tt, for each 𝒋τdt{\bm{j}}\in\tau_{d_{t}}, it suffices to consider the expression

φ𝒋(t)(x1,,xk):=𝗆𝗅𝗉0,𝒋(t)((φ𝒊(t1)(x1,,xk))𝒊τdt1,xk+1s=1k𝗆𝗅𝗉s,𝒋(t)((φ𝒊(t1)(x1,,xs1,xk+1,xs+1,,xk))𝒊τdt1),\varphi_{\bm{j}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1},\ldots,x_{k}):=\mathsf{mlp}_{0,{\bm{j}}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\Bigl{(}\bigl{(}\varphi_{{\bm{i}}}^{(t-1)}(x_{1},\ldots,x_{k})\bigr{)}_{{\bm{i}}\in\tau_{d_{t-1}}},{}\\ \sum_{x_{k+1}}\,\prod_{s=1}^{k}\mathsf{mlp}_{s,{\bm{j}}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigl{(}(\varphi_{{\bm{i}}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1},\ldots,x_{s-1},x_{k+1},x_{s+1},\ldots,x_{k})\bigr{)}_{{\bm{i}}\in\tau_{d_{t-1}}}\Bigr{)},

where \mathsf{mlp}_{0,{\bm{j}}}^{(t)} and \mathsf{mlp}_{s,{\bm{j}}}^{(t)} are the projections of the \mathsf{MLP}\text{s} on the {\bm{j}}-coordinates. We remark that k+1 index variables and one extra summation are needed. We thus obtain expressions in \mathsf{TL}_{k+1}^{(t)}(\Omega) for the tth layer, as desired. We remark that the expressions are simple translations of the layer definitions. Also, in this case, \Omega consists of all \mathsf{MLP}\text{s}. When a k\text{-}\mathsf{FGNN} is used for vertex embeddings, we simply add to each expression a factor \prod_{s=1}^{k}\bm{1}_{x_{1}=x_{s}}. As an immediate consequence of our results, if we denote by k\text{-}\mathsf{FGNN}^{(t)} the class of t-layered k\text{-}\mathsf{FGNN}\text{s}, then for vertex embeddings:

ρ1(𝗏𝗐𝗅k(t))ρ1(𝖳𝖫k+1(t)(Ω))ρ1(k-𝖥𝖦𝖭𝖭(t))\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{TL}_{k+1}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}\subseteq\rho_{1}\bigl{(}k\text{-}\mathsf{FGNN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}

in accordance with the known results from Azizian & Lelarge (2021). When used for graph embeddings, an aggregation layer over all kk-tuples of vertices is added, followed by the application of an 𝖬𝖫𝖯\mathsf{MLP}. This results in expressions with no free index variables, and of summation depth t+kt+k, where the increase with kk stems from the aggregation process over all kk-tuples. In view of our results, for graph embeddings:

ρ0(𝗀𝗐𝗅k())ρ0(𝖳𝖫k+1(Ω))ρ0(k-𝖥𝖦𝖭𝖭)\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{TL}_{k+1}(\Omega)\bigr{)}\subseteq\rho_{0}\bigl{(}k\text{-}\mathsf{FGNN}\bigr{)}

in accordance again with Azizian & Lelarge (2021). We here emphasize that the upper bounds in terms of k-𝖶𝖫k\text{-}\mathsf{WL} are obtained without the need to know how k-𝖶𝖫k\text{-}\mathsf{WL} works. Indeed, one can really just focus on casting layers in the right tensor language!
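To make the translation tangible for k=2, the following NumPy sketch implements one 2\text{-}\mathsf{FGNN} layer on a pairwise feature tensor; the three \mathsf{MLP}\text{s} are assumed to be given callables acting on the last axis, and the single einsum contraction plays the role of the one extra summation over x_{3}.

```python
import numpy as np

def fgnn2_layer(F, mlp0, mlp1, mlp2):
    # F: (n, n, d) features, one vector per pair of vertices.
    # New feature at (v1, v2): mlp0( F[v1, v2], sum_w mlp1(F[w, v2]) * mlp2(F[v1, w]) ).
    G1 = mlp1(F)                                  # (n, n, d')
    G2 = mlp2(F)                                  # (n, n, d')
    M = np.einsum('wbf,awf->abf', G1, G2)         # sum over w (the extra index variable)
    return mlp0(np.concatenate([F, M], axis=-1))
```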

We remark that Azizian & Lelarge (2021) define vertex embedding k\text{-}\mathsf{FGNN}\text{s} in a different way. Indeed, for a vertex v, its embedding is obtained by aggregating over all (k-1)-tuples in the remaining coordinates of the tensors. They define \mathsf{vwl}_{k} accordingly. From the tensor language point of view, this corresponds to adding k-1 to the summation depth. Our results indicate that we then lose the connection between rounds and layers, as in Azizian & Lelarge (2021). This is the reason why we defined vertex embedding k\text{-}\mathsf{FGNN}\text{s} in a different way, which ensures a correspondence between rounds and layers for vertex embeddings.

Other higher-order examples.

It is readily verified that tt-layered kk-𝖦𝖭𝖭s\mathsf{GNN}\text{s} (Morris et al., 2019) can be represented in 𝖳𝖫k+1(t)(Ω)\mathsf{TL}_{k+1}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega), recovering the known upper bound by 𝗏𝗐𝗅k(t)\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} (Morris et al., 2019). It is an equally easy exercise to show that 2-𝖶𝖫2\text{-}\mathsf{WL}-convolutions (Damke et al., 2020) and Ring-𝖦𝖭𝖭s\mathsf{GNN}\text{s} (Chen et al., 2019) are bounded by 2-𝖶𝖫2\text{-}\mathsf{WL}, by simply writing their layers in 𝖳𝖫3(Ω)\mathsf{TL}_{3}(\Omega). The invariant graph networks (k-𝖨𝖦𝖭sk\text{-}\mathsf{IGN}\text{s}) (Maron et al., 2019b) will be treated in Section E, as their representation in 𝖳𝖫k+1(Ω)\mathsf{TL}_{k+1}(\Omega) requires some work.

D.3 Augmented GNNs

Higher-order \mathsf{GNN} architectures such as k\text{-}\mathsf{GNN}\text{s}, k\text{-}\mathsf{FGNN}\text{s} and k\text{-}\mathsf{IGN}\text{s} incur a substantial cost in terms of memory and computation (Morris et al., 2020). Some recent proposals infuse more efficient \mathsf{GNN}\text{s} with higher-order information by means of a pre-processing step. We next show that the tensor language approach also enables us to obtain upper bounds on the separation power of such “augmented” \mathsf{GNN}\text{s}.

We first consider \mathcal{F}\text{-}\mathsf{MPNN}\text{s} (Barceló et al., 2021) in which the initial vertex features are augmented with homomorphism counts of rooted graph patterns. More precisely, let P^{r} be a connected rooted graph (with root vertex r), and consider a graph G=(V_{G},E_{G},\mathsf{col}_{G}) and vertex v\in V_{G}. Then, \mathsf{hom}(P^{r},G^{v}) denotes the number of homomorphisms from P to G mapping r to v. We recall that a homomorphism is an edge-preserving mapping between vertex sets. Given a collection \mathcal{F}=\{P_{1}^{r},\ldots,P_{\ell}^{r}\} of rooted patterns, an \mathcal{F}\text{-}\mathsf{MPNN} runs an \mathsf{MPNN} on the augmented initial vertex features:

𝑭~(0)v::=(𝑭(0)v:,𝗁𝗈𝗆(P1r,Gv),,𝗁𝗈𝗆(Pr,Gv)).\tilde{{\bm{F}}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}_{v:}:=({\bm{F}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}_{v:},\mathsf{hom}(P_{1}^{r},G^{v}),\ldots,\mathsf{hom}(P_{\ell}^{r},G^{v})).

Now, take any 𝖦𝖭𝖭\mathsf{GNN} architecture that can be cast in 𝖦𝖳𝖫(Ω)\mathsf{GTL}(\Omega) or 𝖳𝖫2(Ω)\mathsf{TL}_{2}(\Omega) and assume, for simplicity of exposition, that a tt-layer 𝖦𝖭𝖭\mathsf{GNN} corresponds to expressions in 𝖦𝖳𝖫(t)(Ω)\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega) or 𝖳𝖫2(t)(Ω)\mathsf{TL}_{2}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega). In order to analyze the impact of the augmented features, one only needs to revise the expressions φj(0)(x1)\varphi_{j}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}(x_{1}) that represent the initial features. In the absence of graph patterns, φj(0)(x1):=Pj(x1)\varphi_{j}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}(x_{1}):=P_{j}(x_{1}), as we have seen before. By contrast, to represent 𝑭~(0)vj\tilde{{\bm{F}}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}_{vj} we need to cast the computation of 𝗁𝗈𝗆(Pir,Gv)\mathsf{hom}(P_{i}^{r},G^{v}) in 𝖳𝖫\mathsf{TL}. Assume that the graph pattern PiP_{i} consists of pp vertices and let us identify the vertex set with [p][p]. Furthermore, without loss of generality, we assume that vertex “11” is the root vertex in PiP_{i}. To obtain 𝗁𝗈𝗆(Pir,Gv)\mathsf{hom}(P_{i}^{r},G^{v}) we need to create an indicator function for the graph pattern PiP_{i} and then count how many times this indicator value is equal to one in GG. The indicator function for PiP_{i} is simply given by the expression uvEPiE(xu,xv)\prod_{uv\in E_{P_{i}}}E(x_{u},x_{v}). Then, counting just boils down to summing over all index variables except the one for the root vertex. More precisely, if we define

φPi(x1):=x2xpuvEPiE(xu,xv),\varphi_{P_{i}}(x_{1}):=\sum_{x_{2}}\cdots\sum_{x_{p}}\prod_{uv\in E_{P_{i}}}E(x_{u},x_{v}),

then [[φPi,v]]G=𝗁𝗈𝗆(Pir,Gv)[\![\varphi_{P_{i}},v]\!]_{G}=\mathsf{hom}(P_{i}^{r},G^{v}). This encoding results in an expression in 𝖳𝖫p\mathsf{TL}_{p}. However, it is well-known that we can equivalently write φPi(x1)\varphi_{P_{i}}(x_{1}) as an expression φ~Pi(x1)\tilde{\varphi}_{P_{i}}(x_{1}) in 𝖳𝖫k+1\mathsf{TL}_{k+1} where kk is the treewidth of the graph PiP_{i}. As such, our results imply that -𝖬𝖯𝖭𝖭s\mathcal{F}\text{-}\mathsf{MPNN}\text{s} are bounded in separation power by k-𝖶𝖫k\text{-}\mathsf{WL} where kk is the maximal treewidth of graphs in \mathcal{F}. We thus recover the known upper bound as given in Barceló et al. (2021) using our tensor language approach.
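To make this encoding concrete, the following minimal sketch (ours, purely illustrative and not part of the \mathcal{F}-𝖬𝖯𝖭𝖭 construction; all names are hypothetical) evaluates the expression φ_{P_i}(x_1) by brute-force summation over all index variables except the root, exactly as in the definition above.

import itertools
import numpy as np

def hom_count_rooted(pattern_edges, num_pattern_vertices, A, v):
    # Counts homomorphisms from a rooted pattern (root = pattern vertex 0)
    # into the graph with adjacency matrix A, mapping the root to vertex v.
    # This mirrors phi_{P_i}(x_1): sum, over all assignments of the non-root
    # pattern vertices, of the product of edge indicators E(x_u, x_v).
    n = A.shape[0]
    total = 0
    for assign in itertools.product(range(n), repeat=num_pattern_vertices - 1):
        image = (v,) + assign                 # pattern vertex i -> image[i]
        prod = 1
        for (a, b) in pattern_edges:          # product over the edges of P_i
            prod *= A[image[a], image[b]]
        total += prod
    return total

# Hypothetical example: a rooted triangle pattern against a 4-cycle.
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
triangle_edges = [(0, 1), (1, 2), (2, 0)]
print(hom_count_rooted(triangle_edges, 3, A, 0))   # 0: the 4-cycle is triangle-free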

Another example of augmented 𝖦𝖭𝖭\mathsf{GNN} architectures is given by the Graph Substructure Networks (𝖦𝖲𝖭s)\mathsf{GSN}\text{s}) (Bouritsas et al., 2020). By contrast to -𝖬𝖯𝖭𝖭s\mathcal{F}\text{-}\mathsf{MPNN}\text{s}, subgraph isomorphism counts rather than homomorphism counts are used to augment the initial features. At the core of a 𝖦𝖲𝖭\mathsf{GSN} thus lies the computation of 𝗌𝗎𝖻(Pr,Gv)\mathsf{sub}(P^{r},G^{v}), the number of subgraphs HH in GG isomorphic to PP (and such that the isomorphisms map rr to vv). In a similar way as for homomorphism counts, we can directly cast the computation of 𝗌𝗎𝖻(Pr,Gv)\mathsf{sub}(P^{r},G^{v}) in 𝖳𝖫\mathsf{TL}, resulting again in the use of pp index variables. A possible reduction in terms of index variables, however, can be obtained by relying on the result (Theorem 1.1) by Curticapean et al. (2017) in which it is shown that 𝗌𝗎𝖻(Pr,Gv)\mathsf{sub}(P^{r},G^{v}) can be computed in terms of homomorphism counts of graph patterns derived from PrP^{r}. More precisely, Curticapean et al. (2017) define 𝗌𝗉𝖺𝗌𝗆(Pr)\mathsf{spasm}(P^{r}) as the set of graphs consisting of all possible homomorphic images of PrP^{r}. It is then readily verified that if the maximal treewidth of the graphs in 𝗌𝗉𝖺𝗌𝗆(Pr)\mathsf{spasm}(P^{r}) is kk, then 𝗌𝗎𝖻(Pr,Gv)\mathsf{sub}(P^{r},G^{v}) can be cast as an expression in 𝖳𝖫k+1\mathsf{TL}_{k+1}. Hence, 𝖦𝖲𝖭s\mathsf{GSN}s using a pattern collection \mathcal{F} can be represented in 𝖳𝖫k+1\mathsf{TL}_{k+1}, where kk is the maximal treewidth of graphs in any of the spasms of patterns in \mathcal{F}, and thus are bounded in separation power by k-𝖶𝖫k\text{-}\mathsf{WL}, in accordance with the results by Barceló et al. (2021).
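For intuition (our illustration, not taken from Curticapean et al. (2017)): let P^r be the path a–b–c rooted at its middle vertex b. Its homomorphic images are the path itself and the single edge obtained by identifying a and c, so 𝗌𝗉𝖺𝗌𝗆(P^r) contains only trees of treewidth 1. On a simple graph G, 𝗁𝗈𝗆(P^r, G^v) = deg(v)^2 counts all walks of length two through v; subtracting the homomorphism count deg(v) of the identified edge pattern removes the non-injective maps, and dividing by the two root-preserving automorphisms of P^r gives 𝗌𝗎𝖻(P^r, G^v) = deg(v)(deg(v)−1)/2, an expression that indeed only needs two index variables.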

As a final example, we consider the recently introduced Message Passing Simplicial Networks (𝖬𝖯𝖲𝖭\mathsf{MPSN}s) (Bodnar et al., 2021). In a nutshell, 𝖬𝖯𝖲𝖭\mathsf{MPSN}s are run on simplicial complexes of graphs instead of on the original graphs. We sketch how our tensor language approach can be used to assess the separation power of 𝖬𝖯𝖲𝖭\mathsf{MPSN}s on clique complexes. We use the simplified version of 𝖬𝖯𝖲𝖭\mathsf{MPSN}s, which has the same expressive power as the full version of 𝖬𝖯𝖲𝖭\mathsf{MPSN}s (Theorem 6 in Bodnar et al. (2021)).

We recall some definitions. Let 𝖢𝗅𝗂𝗊𝗎𝖾𝗌(G)\mathsf{Cliques}(G) denote the set of all cliques in GG. Given two cliques cc and cc^{\prime} in 𝖢𝗅𝗂𝗊𝗎𝖾𝗌(G)\mathsf{Cliques}(G), define ccc\prec c^{\prime} if ccc\subset c^{\prime} and there exists no cc^{\prime\prime} in 𝖢𝗅𝗂𝗊𝗎𝖾𝗌(G)\mathsf{Cliques}(G), such that cccc\subset c^{\prime\prime}\subset c^{\prime}. We define 𝖡𝗈𝗎𝗇𝖽𝖺𝗋𝗒(c,G):={c𝖢𝗅𝗂𝗊𝗎𝖾𝗌(G)cc}\mathsf{Boundary}(c,G):=\{c^{\prime}\in\mathsf{Cliques}(G)\mid c^{\prime}\prec c\} and 𝖴𝗉𝗉𝖾𝗋(c,G):={c𝖢𝗅𝗂𝗊𝗎𝖾𝗌(G)c𝖢𝗅𝗂𝗊𝗎𝖾𝗌(G),cc and cc}\mathsf{Upper}(c,G):=\{c^{\prime}\in\mathsf{Cliques}(G)\mid\exists c^{\prime\prime}\in\mathsf{Cliques}(G),c^{\prime}\prec c^{\prime\prime}\text{ and }c\prec c^{\prime\prime}\}.

For each cc in 𝖢𝗅𝗂𝗊𝗎𝖾𝗌(G)\mathsf{Cliques}(G) we have an initial feature vector 𝑭(0)c:1×{\bm{F}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}_{c:}\in\mathbb{R}^{1\times\ell}. Bodnar et al. (2021) initialize all initial features with the same value. Then, in layer tt, for each c𝖢𝗅𝗂𝗊𝗎𝖾𝗌(G)c\in\mathsf{Cliques}(G), features are updated as follows:

𝑮(t)c:\displaystyle{\bm{G}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{c:} =FB({{𝗆𝗅𝗉B(𝑭(t1)c:,𝑭(t1)c:)c𝖡𝗈𝗎𝗇𝖽𝖺𝗋𝗒(c,G)}})\displaystyle=F_{B}(\{\!\{\mathsf{mlp}_{B}({\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{c:},{\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{c^{\prime}:})\mid c^{\prime}\in\mathsf{Boundary}(c,G)\}\!\})
𝑯(t)c:\displaystyle{\bm{H}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{c:} =FU({{𝗆𝗅𝗉U(𝑭(t1)c:,𝑭(t1)c:,𝑭(t1)cc:)c𝖴𝗉𝗉𝖾𝗋(c,G)}})\displaystyle=F_{U}(\{\!\{\mathsf{mlp}_{U}({\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{c:},{\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{c^{\prime}:},{\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{c\cup c^{\prime}:})\mid c^{\prime}\in\mathsf{Upper}(c,G)\}\!\})
𝑭(t)c:\displaystyle{\bm{F}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{c:} =𝗆𝗅𝗉(𝑭(t1)c:,𝑮(t)c:,𝑯(t)c:),\displaystyle=\mathsf{mlp}({\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{c:},{\bm{G}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{c:},{\bm{H}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{c:}),

where FBF_{B} and FUF_{U} are aggregation functions and 𝗆𝗅𝗉B\mathsf{mlp}_{B}, 𝗆𝗅𝗉U\mathsf{mlp}_{U} and 𝗆𝗅𝗉\mathsf{mlp} are 𝖬𝖫𝖯s\mathsf{MLP}\text{s}. With some effort, one can represent these computations by expressions in 𝖳𝖫p(Ω,Θ)\mathsf{TL}_{p}(\Omega,\Theta) where pp is the size of the largest clique in GG. As such, the separation power of clique-complex 𝖬𝖯𝖲𝖭\mathsf{MPSN}s on graphs of clique size at most pp is bounded by (p1)-𝖶𝖫(p-1)\text{-}\mathsf{WL}. And indeed, Bodnar et al. (2021) consider Rook’s 4×44\times 4 graph, which contains a 44-clique, and the Shrikhande graph, which does not contain a 44-clique. As such, the analysis above implies that clique-complex 𝖬𝖯𝖲𝖭\mathsf{MPSN}s are bounded by 2-𝖶𝖫2\text{-}\mathsf{WL} on the Shrikhande graph, and by 3-𝖶𝖫3\text{-}\mathsf{WL} on Rook’s graph, consistent with the observation in Bodnar et al. (2021). A more detailed analysis of 𝖬𝖯𝖲𝖭\mathsf{MPSN}s in terms of summation depth and for other simplicial complexes is left as future work.
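As an illustration of the computations involved, the following is a minimal sketch of ours (not part of Bodnar et al. (2021)), assuming the simplified update rule above; the mlp arguments are hypothetical stand-ins for the 𝖬𝖫𝖯s and the aggregators F_B, F_U are taken to be plain sums.

import itertools
import numpy as np

def cliques(A):
    # All cliques (as frozensets of vertices) of the graph with adjacency matrix A.
    n = A.shape[0]
    out = []
    for size in range(1, n + 1):
        for c in itertools.combinations(range(n), size):
            if all(A[i, j] for i, j in itertools.combinations(c, 2)):
                out.append(frozenset(c))
    return out

def mpsn_layer(F, mlp_b, mlp_u, mlp):
    # One simplified clique-complex MPSN layer; F maps each clique to its feature vector.
    cs = list(F.keys())
    new_F = {}
    for c in cs:
        # Boundary: cliques one vertex smaller that are contained in c.
        boundary = [cp for cp in cs if cp < c and len(cp) == len(c) - 1]
        # Upper adjacency: cliques of the same size sharing the common coface c | cp.
        upper = [cp for cp in cs if len(cp) == len(c) and cp != c
                 and len(cp | c) == len(c) + 1 and (cp | c) in F]
        G_c = sum(mlp_b(F[c], F[cp]) for cp in boundary) if boundary else 0
        H_c = sum(mlp_u(F[c], F[cp], F[cp | c]) for cp in upper) if upper else 0
        new_F[c] = mlp(F[c], G_c, H_c)
    return new_F

# Hypothetical usage: constant initial features on the 4-cycle.
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
F0 = {c: np.ones(2) for c in cliques(A)}
F1 = mpsn_layer(F0,
                mlp_b=lambda a, b: np.tanh(a + b),
                mlp_u=lambda a, b, ab: np.tanh(a + b + ab),
                mlp=lambda a, g, h: np.tanh(a + g + h))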

This illustrates again that our approach can be used to assess the separation power of a variety of 𝖦𝖭𝖭\mathsf{GNN} architectures in terms of k-𝖶𝖫k\text{-}\mathsf{WL}, by simply writing them as tensor language expressions. Furthermore, bounds in terms of k-𝖶𝖫k\text{-}\mathsf{WL} can be obtained for augmented 𝖦𝖭𝖭s\mathsf{GNN}\text{s}, which form a more efficient way of incorporating higher-order graph structural information than higher-order 𝖦𝖭𝖭s\mathsf{GNN}\text{s}.

D.4 Spectral GNNs

In general, spectral 𝖦𝖭𝖭s\mathsf{GNN}\text{s} are defined in terms of eigenvectors and eigenvalues of the (normalized) graph Laplacian (Bruna et al., 2014; Defferrard et al., 2016; Levie et al., 2019; Balcilar et al., 2021b). The diagonalization of the graph Laplacian is, however, avoided in practice, due to its excessive cost. Instead, by relying on approximation results in spectral graph analysis (Hammond et al., 2011), the layers of practical spectral 𝖦𝖭𝖭s\mathsf{GNN}\text{s} are defined in terms of propagation matrices consisting of functions that operate directly on the graph Laplacian. This viewpoint allows for a spectral analysis of spectral and “spatial” 𝖦𝖭𝖭s\mathsf{GNN}\text{s} in a uniform way, as shown by Balcilar et al. (2021b). In this section, we consider two specific instances of spectral 𝖦𝖭𝖭s\mathsf{GNN}\text{s}: 𝖢𝗁𝖾𝖻𝖭𝖾𝗍\mathsf{ChebNet} (Defferrard et al., 2016) and 𝖢𝖺𝗒𝗅𝖾𝗒𝖭𝖾𝗍\mathsf{CayleyNet} (Levie et al., 2019), and assess their separation power by casting them in our tensor languages. Our general results then provide bounds on their separation power in terms of color refinement and 2-𝖶𝖫2\text{-}\mathsf{WL}, respectively.

Chebnet.

The separation power of 𝖢𝗁𝖾𝖻𝖭𝖾𝗍\mathsf{ChebNet} (Defferrard et al., 2016) was already analyzed in Balcilar et al. (2021a) by representing it in the 𝖬𝖠𝖳𝖫𝖠𝖭𝖦\mathsf{MATLANG} matrix query language (Brijder et al., 2019). It was shown (Theorem 2 (Balcilar et al., 2021a)) that it is only the maximal eigenvalue λmax\lambda_{\max} of the graph Laplacian used in the layers of 𝖢𝗁𝖾𝖻𝖭𝖾𝗍\mathsf{ChebNet} that may cause the separation power of 𝖢𝗁𝖾𝖻𝖭𝖾𝗍\mathsf{ChebNet} to go beyond 1-𝖶𝖫1\text{-}\mathsf{WL}. We here revisit and refine this result by showing that, when ignoring the use of λmax\lambda_{\max}, the separation power of 𝖢𝗁𝖾𝖻𝖭𝖾𝗍\mathsf{ChebNet} is bounded already by color refinement (which, as mentioned in Section 2, is weaker than 1-𝖶𝖫1\text{-}\mathsf{WL} for vertex embeddings). In a nutshell, the layers of a 𝖢𝗁𝖾𝖻𝖭𝖾𝗍\mathsf{ChebNet} are defined in terms of Chebyshev polynomials of the normalized Laplacian 𝑳norm=𝑰𝑫1/2𝑨𝑫1/2{\bm{L}}_{\textsl{norm}}={\bm{I}}-{\bm{D}}^{-1/2}\cdot{\bm{A}}\cdot{\bm{D}}^{-1/2} and these polynomials can be easily represented in 𝖦𝖳𝖫(Ω)\mathsf{GTL}(\Omega). One can alternatively use the graph Laplacian 𝑳=𝑫𝑨{\bm{L}}={\bm{D}}-{\bm{A}} in a 𝖢𝗁𝖾𝖻𝖭𝖾𝗍\mathsf{ChebNet}, which allows for a similar analysis. The distinction between the choice of 𝑳norm{\bm{L}}_{\textsl{norm}} and 𝑳{\bm{L}} only shows up in the required summation depth (in a similar way as for the 𝖦𝖢𝖭s\mathsf{GCN}\text{s} described earlier). We only consider the normalized Laplacian here.

More precisely, following Balcilar et al. (2021a; b), in layer tt, vertex embeddings are updated in a 𝖢𝗁𝖾𝖻𝖭𝖾𝗍\mathsf{ChebNet} according to:

𝑭(t):=σ(s=1p𝑪(s)𝑭(t1)𝑾(t1,s)),{\bm{F}}^{(t)}:=\sigma\left(\sum_{s=1}^{p}{\bm{C}}^{(s)}\cdot{\bm{F}}^{(t-1)}\cdot{\bm{W}}^{(t-1,s)}\right),

with

𝑪(1):=𝑰,𝑪(2)=2λmax𝑳norm𝑰,𝑪(s)=2𝑪(2)𝑪(s1)𝑪(s2), for s3,{\bm{C}}^{(1)}:={\bm{I}},{\bm{C}}^{(2)}=\frac{2}{\lambda_{\max}}{\bm{L}}_{\textsl{norm}}-{\bm{I}},{\bm{C}}^{(s)}=2{\bm{C}}^{(2)}\cdot{\bm{C}}^{(s-1)}-{\bm{C}}^{(s-2)},\text{ for $s\geq 3$,}

and where λmax\lambda_{\max} denotes the maximum eigenvalue of 𝑳norm{\bm{L}}_{\textsl{norm}}. We next use a similar analysis as in Balcilar et al. (2021a). That is, we ignore for the moment the maximal eigenvalue λmax\lambda_{\max} and redefine 𝑪(2){\bm{C}}^{(2)} as c𝑳norm𝑰c{\bm{L}}_{\textsl{norm}}-{\bm{I}} for some constant cc. We thus see that each 𝑪(s){\bm{C}}^{(s)} is a polynomial of the form ps(c,𝑳norm):=i=0qsa(s)i(c)(𝑳norm)ip_{s}(c,{\bm{L}}_{\textsl{norm}}):=\sum_{i=0}^{q_{s}}a^{(s)}_{i}(c)\cdot({\bm{L}}_{\textsl{norm}})^{i} with scalar functions a(s)i:a^{(s)}_{i}:\mathbb{R}\to\mathbb{R} and where we interpret (𝑳norm)0=𝑰({\bm{L}}_{\textsl{norm}})^{0}={\bm{I}}. To upper bound the separation power using our tensor language approach, we can thus shift our attention entirely to representing (𝑳norm)i𝑭(t1)𝑾(t1,s)({\bm{L}}_{\textsl{norm}})^{i}\cdot{\bm{F}}^{(t-1)}\cdot{\bm{W}}^{(t-1,s)} for powers ii\in\mathbb{N}. Furthermore, since (𝑳norm)i({\bm{L}}_{\textsl{norm}})^{i} is again a polynomial of the form qi(𝑫1/2𝑨𝑫1/2):=j=0ribij(𝑫1/2𝑨𝑫1/2)jq_{i}({\bm{D}}^{-1/2}\cdot{\bm{A}}\cdot{\bm{D}}^{-1/2}):=\sum_{j=0}^{r_{i}}b_{ij}\cdot({\bm{D}}^{-1/2}\cdot{\bm{A}}\cdot{\bm{D}}^{-1/2})^{j}, we can further narrow down the problem to represent

(𝑫1/2𝑨𝑫1/2)j𝑭(t1)𝑾(t1,s)({\bm{D}}^{-1/2}\cdot{\bm{A}}\cdot{\bm{D}}^{-1/2})^{j}\cdot{\bm{F}}^{(t-1)}\cdot{\bm{W}}^{(t-1,s)}

in 𝖦𝖳𝖫(Ω)\mathsf{GTL}(\Omega), for powers jj\in\mathbb{N}. And indeed, combining our analysis for 𝖦𝖢𝖭s\mathsf{GCN}\text{s} and 𝖲𝖦𝖢s\mathsf{SGC}\text{s} results in expressions in 𝖦𝖳𝖫(Ω)\mathsf{GTL}(\Omega). As an example let us consider (𝑫1/2𝑨𝑫1/2)2𝑭(t1)𝑾(t1)({\bm{D}}^{-1/2}\cdot{\bm{A}}\cdot{\bm{D}}^{-1/2})^{2}\cdot{\bm{F}}^{(t-1)}\cdot{\bm{W}}^{(t-1)}, that is we use a power of two. It then suffices to define, for each output dimension jj, the expressions:

ψ2j(x1)=f1/x(x2E(x1,x2))x2(E(x1,x2)f1/x(x1E(x2,x1))x1(E(x2,x1)f1/x(x2E(x1,x2))(i=1dt1Wij(t1)φi(t1)(x1)))),\psi^{2}_{j}(x_{1})=f_{1/\sqrt{x}}\bigl{(}\textstyle\sum_{x_{2}}E(x_{1},x_{2})\bigr{)}\cdot\textstyle\sum_{x_{2}}\Biggl{(}E(x_{1},x_{2})\cdot f_{1/x}\bigl{(}\textstyle\sum_{x_{1}}E(x_{2},x_{1})\bigr{)}\cdot\\ \textstyle\sum_{x_{1}}\Bigl{(}E(x_{2},x_{1})\cdot f_{1/\sqrt{x}}(\textstyle\sum_{x_{2}}E(x_{1},x_{2}))\cdot\bigl{(}\textstyle\sum_{i=1}^{d_{t-1}}{W}_{ij}^{(t-1)}\varphi_{i}^{(t-1)}(x_{1})\bigr{)}\Bigr{)}\Biggr{)},

where the φi(t1)(x1)\varphi_{i}^{(t-1)}(x_{1}) are expressions representing layer t1t-1. It is then readily verified that we can use ψj2(x1)\psi_{j}^{2}(x_{1}) to cast layer tt of a 𝖢𝗁𝖾𝖻𝖭𝖾𝗍\mathsf{ChebNet} in 𝖦𝖳𝖫(Ω)\mathsf{GTL}(\Omega) with Ω\Omega consisting of f1/x::x1xf_{1/\sqrt{x}}:\mathbb{R}\to\mathbb{R}:x\mapsto\frac{1}{\sqrt{x}}, f1/x::x1xf_{1/x}:\mathbb{R}\to\mathbb{R}:x\mapsto\frac{1}{x}, and the used activation function σ\sigma. We thus recover (and slightly refine) Theorem 2 in Balcilar et al. (2021a):

Corollary D.1.

On graphs sharing the same λmax\lambda_{\max} values, the separation power of 𝖢𝗁𝖾𝖻𝖭𝖾𝗍\mathsf{ChebNet} is bounded by color refinement, both for graph and vertex embeddings.

A more fine-grained analysis of the expressions is needed when one is interested in bounding the summation depth, and thus the number of rounds needed for color refinement. Moreover, as shown by Balcilar et al. (2021a), when graphs have non-regular components with different λmax\lambda_{\max} values, 𝖢𝗁𝖾𝖻𝖭𝖾𝗍\mathsf{ChebNet} can distinguish them, whilst 1-𝖶𝖫1\text{-}\mathsf{WL} cannot. To our knowledge, λmax\lambda_{\max} cannot be computed in 𝖳𝖫k(Ω)\mathsf{TL}_{k}(\Omega) for any kk. This implies that it is not clear whether an upper bound on the separation power can be obtained for 𝖢𝗁𝖾𝖻𝖭𝖾𝗍\mathsf{ChebNet} taking λmax\lambda_{\max} into account. It is an interesting open question whether there are two graphs GG and HH which cannot be distinguished by k-𝖶𝖫k\text{-}\mathsf{WL} but can be distinguished based on λmax\lambda_{\max}. A positive answer would imply that the computation of λmax\lambda_{\max} is beyond reach for 𝖳𝖫(Ω)\mathsf{TL}(\Omega) and other techniques are needed.
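For concreteness, here is a small numpy sketch (ours, illustrative only; all names are hypothetical) of a ChebNet-style layer in which the factor 2/λmax is replaced by a fixed constant c, mirroring the analysis above; the graph is assumed to have no isolated vertices.

import numpy as np

def chebnet_layer(A, F, Ws, sigma=np.tanh, c=1.0):
    # One ChebNet-style layer: Ws is a list of weight matrices W^(t-1,s), s = 1..p,
    # and the propagation matrices C^(s) are built via the Chebyshev-style recursion.
    n = A.shape[0]
    deg = A.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L_norm = np.eye(n) - d_inv_sqrt @ A @ d_inv_sqrt
    C = [np.eye(n), c * L_norm - np.eye(n)]          # C^(1), C^(2) with 2/lambda_max -> c
    while len(C) < len(Ws):
        C.append(2 * C[1] @ C[-1] - C[-2])           # C^(s) = 2 C^(2) C^(s-1) - C^(s-2)
    return sigma(sum(Cs @ F @ W for Cs, W in zip(C, Ws)))

# Hypothetical usage on a path graph with 2-dimensional features and p = 3.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
F = np.random.randn(3, 2)
Ws = [np.random.randn(2, 2) for _ in range(3)]
print(chebnet_layer(A, F, Ws).shape)                 # (3, 2)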

CayleyNet.

We next show how the separation power of 𝖢𝖺𝗒𝗅𝖾𝗒𝖭𝖾𝗍\mathsf{CayleyNet} (Levie et al., 2019) can be analyzed. To our knowledge, this analysis is new. We show that the separation power of 𝖢𝖺𝗒𝗅𝖾𝗒𝖭𝖾𝗍\mathsf{CayleyNet} is bounded by 2-𝖶𝖫2\text{-}\mathsf{WL}. Following Levie et al. (2019) and Balcilar et al. (2021b), in each layer tt, a 𝖢𝖺𝗒𝗅𝖾𝗒𝖭𝖾𝗍\mathsf{CayleyNet} updates features as follows:

𝑭(t):=σ(s=1p𝑪(s)𝑭(t1)𝑾(t1,s)),{\bm{F}}^{(t)}:=\sigma\left(\sum_{s=1}^{p}{\bm{C}}^{(s)}\cdot{\bm{F}}^{(t-1)}{\bm{W}}^{(t-1,s)}\right),

with

𝑪(1):=𝑰,𝑪(2s):=𝖱𝖾((h𝑳ı𝑰h𝑳+ı𝑰)s),𝑪(2s+1):=𝖱𝖾(ı(h𝑳ı𝑰h𝑳+ı𝑰)s),{\bm{C}}^{(1)}:={\bm{I}},{\bm{C}}^{(2s)}:=\mathsf{Re}\Biggl{(}\Bigl{(}\frac{h{\bm{L}}-\imath{\bm{I}}}{h{\bm{L}}+\imath{\bm{I}}}\Bigr{)}^{s}\Biggr{)},{\bm{C}}^{(2s+1)}:=\mathsf{Re}\Biggl{(}\imath\Bigl{(}\frac{h{\bm{L}}-\imath{\bm{I}}}{h{\bm{L}}+\imath{\bm{I}}}\Bigr{)}^{s}\Biggr{)},

where hh is a constant, ı\imath is the imaginary unit, and 𝖱𝖾:\mathsf{Re}:\mathbb{C}\to\mathbb{C} maps a complex number to its real part. We immediately observe that a 𝖢𝖺𝗒𝗅𝖾𝗒𝖭𝖾𝗍\mathsf{CayleyNet} requires the use of complex numbers and matrix inversion. So far, we considered real numbers only, but when our separation results are concerned, the choice between real or complex numbers is insignificant. In fact, only the proof of Proposition C.3 requires a minor modification when working on complex numbers: the infinite disjunctions used in the proof now need to range over complex numbers. For matrix inversion, when dealing with separation power, one can use different expressions in 𝖳𝖫(Ω)\mathsf{TL}(\Omega) for computing the matrix inverse, depending on the input size. And indeed, it is well-known (see e.g., Csanky (1976)) that based on the characteristic polynomial of 𝑨{\bm{A}}, 𝑨1{\bm{A}}^{-1} for any matrix 𝑨n×n{\bm{A}}\in\mathbb{R}^{n\times n} can be computed as a polynomial 1cni=1n1ci𝑨n1i\frac{-1}{c_{n}}\sum_{i=1}^{n-1}c_{i}{\bm{A}}^{n-1-i} if cn0c_{n}\neq 0 and where each coefficient cic_{i} is a polynomial in 𝗍𝗋(𝑨j)\mathsf{tr}({\bm{A}}^{j}), for various jj. Here, 𝗍𝗋()\mathsf{tr}(\cdot) is the trace of a matrix. As a consequence, layers in 𝖢𝖺𝗒𝗅𝖾𝗒𝖭𝖾𝗍\mathsf{CayleyNet} can be viewed as polynomials in h𝑳ı𝑰h{\bm{L}}-\imath{\bm{I}} with coefficients polynomials in 𝗍𝗋((h𝑳ı𝑰)j)\mathsf{tr}((h{\bm{L}}-\imath{\bm{I}})^{j}). One now needs three index variables to represent the trace computations 𝗍𝗋((h𝑳ı𝑰)j)\mathsf{tr}((h{\bm{L}}-\imath{\bm{I}})^{j}). Indeed, let φ0(x1,x2)\varphi_{0}(x_{1},x_{2}) be the 𝖳𝖫2\mathsf{TL}_{2} expression representing h𝑳ı𝑰h{\bm{L}}-\imath{\bm{I}}. Then, for example, (h𝑳ı𝑰)j(h{\bm{L}}-\imath{\bm{I}})^{j} can be computed in 𝖳𝖫3\mathsf{TL}_{3} using

φj(x1,x2):=x3φ0(x1,x3)φj1(x3,x2)\varphi_{j}(x_{1},x_{2}):=\sum_{x_{3}}\varphi_{0}(x_{1},x_{3})\cdot\varphi_{j-1}(x_{3},x_{2})

and hence 𝗍𝗋((h𝑳ı𝑰)j)\mathsf{tr}((h{\bm{L}}-\imath{\bm{I}})^{j}) is represented by x1x2φj(x1,x2)𝟏x1=x2.\sum_{x_{1}}\sum_{x_{2}}\varphi_{j}(x_{1},x_{2})\cdot\bm{1}_{x_{1}=x_{2}}.. In other words, we obtain expressions in 𝖳𝖫3\mathsf{TL}_{3}. The polynomials in h𝑳ı𝑰h{\bm{L}}-\imath{\bm{I}} can be represented in 𝖳𝖫2\mathsf{TL}_{2} just as for 𝖢𝗁𝖾𝖻𝖭𝖾𝗍\mathsf{ChebNet}. This implies that each layer in 𝖢𝖺𝗒𝗅𝖾𝗒𝖭𝖾𝗍\mathsf{CayleyNet} can be represented, on graphs of fixed size, by 𝖳𝖫3(Ω)\mathsf{TL}_{3}(\Omega) expressions, where Ω\Omega includes the activation function σ\sigma and the function 𝖱𝖾\mathsf{Re}. This suffices to use our general results and conclude that 𝖢𝖺𝗒𝗅𝖾𝗒𝖭𝖾𝗍\mathsf{CayleyNet}s are bounded in separation power by 2-𝖶𝖫2\text{-}\mathsf{WL}. An interesting question is to find graphs that can be separated by a 𝖢𝖺𝗒𝗅𝖾𝗒𝖭𝖾𝗍\mathsf{CayleyNet} but not by 1-𝖶𝖫1\text{-}\mathsf{WL}. We leave this as an open problem.
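To make the layer definition concrete, here is a small numpy sketch (ours, illustrative only; names are hypothetical). It uses explicit complex matrix inversion, whereas the separation analysis above replaces the inverse by a polynomial in h𝑳 − ı𝑰.

import numpy as np

def cayleynet_layer(A, F, Ws, sigma=np.tanh, h=1.0):
    # One CayleyNet-style layer: the propagation matrices are real/imaginary parts
    # of powers of the Cayley transform of the graph Laplacian L = D - A.
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A
    cayley = (h * L - 1j * np.eye(n)) @ np.linalg.inv(h * L + 1j * np.eye(n))
    C = [np.eye(n)]                        # C^(1) = I
    power = np.eye(n, dtype=complex)
    while len(C) < len(Ws):
        power = power @ cayley
        C.append(np.real(power))           # C^(2s) = Re(cayley^s)
        if len(C) < len(Ws):
            C.append(np.real(1j * power))  # C^(2s+1) = Re(i * cayley^s)
    return sigma(sum(Cs @ F @ W for Cs, W in zip(C, Ws)))

# Hypothetical usage on a path graph with 2-dimensional features.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
F = np.random.randn(3, 2)
Ws = [np.random.randn(2, 2) for _ in range(4)]
print(cayleynet_layer(A, F, Ws).shape)     # (3, 2)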

Appendix E Proof of Theorem 5.1

We here consider another higher-order 𝖦𝖭𝖭\mathsf{GNN} proposal: the invariant graph networks or k-𝖨𝖦𝖭sk\text{-}\mathsf{IGN}\text{s} of Maron et al. (2019b). By contrast to k-𝖥𝖦𝖭𝖭sk\text{-}\mathsf{FGNN}\text{s}, k-𝖨𝖦𝖭sk\text{-}\mathsf{IGN}\text{s} are linear architectures. If we denote by k-𝖨𝖦𝖭(t)k\text{-}\mathsf{IGN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} the class of tt-layered k-𝖨𝖦𝖭sk\text{-}\mathsf{IGN}\text{s}, then the following inclusions are known (Maron et al., 2019b)

ρ1(k-𝖨𝖦𝖭(t))ρ1(𝗏𝗐𝗅k1(t)) and ρ0(k-𝖨𝖦𝖭)ρ0(𝗀𝗐𝗅k1()).\rho_{1}\bigl{(}k\text{-}\mathsf{IGN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{vwl}_{k-1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\text{ and }\rho_{0}\bigl{(}k\text{-}\mathsf{IGN}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{gwl}_{k-1}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\bigr{)}.

The reverse inclusions were posed as open problems in Maron et al. (2019a) and were shown to hold by Chen et al. (2020) for k=2k=2, by means of an extensive case analysis and by relying on properties of 1-𝖶𝖫1\text{-}\mathsf{WL}. In this section, we show that the separation power of k-𝖨𝖦𝖭sk\text{-}\mathsf{IGN}\text{s} is bounded by that of (k1)-𝖶𝖫(k-1)\text{-}\mathsf{WL}, for arbitrary k2k\geq 2. Theorem 4.2 tells us that we can entirely shift our attention to showing that the layers of k-𝖨𝖦𝖭sk\text{-}\mathsf{IGN}\text{s} can be represented in 𝖳𝖫k(Ω)\mathsf{TL}_{k}(\Omega). In other words, we only need to show that kk index variables suffice for the layers. As we will see below, this requires a bit of work since a naive representation of the layers of k-𝖨𝖦𝖭sk\text{-}\mathsf{IGN}\text{s} uses 2k2k index variables. Nevertheless, we show that this can be reduced to kk index variables only.

By inspecting the expressions needed to represent the layers of k-𝖨𝖦𝖭sk\text{-}\mathsf{IGN}\text{s} in 𝖳𝖫k(Ω)\mathsf{TL}_{k}(\Omega), we obtain that a tt-layer k-𝖨𝖦𝖭(t)k\text{-}\mathsf{IGN}^{(t)} requires expressions of summation depth tktk. In other words, the correspondence between layers and summation depth is precisely in sync. This implies, by Theorem 4.2:

ρ1(k-𝖨𝖦𝖭)=ρ1(𝗏𝗐𝗅k1()),\rho_{1}\bigl{(}k\text{-}\mathsf{IGN}\bigr{)}=\rho_{1}\bigl{(}\mathsf{vwl}_{k-1}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\bigr{)},

where we ignore the number of layers. We similarly obtain that ρ0(k-𝖨𝖦𝖭)=ρ0(𝗀𝗐𝗅k1())\rho_{0}\bigl{(}k\text{-}\mathsf{IGN}\bigr{)}=\rho_{0}\bigl{(}\mathsf{gwl}_{k-1}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\bigr{)}, hereby answering the open problem posed in Maron et al. (2019a). Finally, we observe that the k-𝖨𝖦𝖭sk\text{-}\mathsf{IGN}\text{s} used in Maron et al. (2019b) to show the inclusion ρ1(k-𝖨𝖦𝖭(t))ρ1(𝗏𝗐𝗅k1(t))\rho_{1}\bigl{(}k\text{-}\mathsf{IGN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{vwl}_{k-1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)} are of very simple form. By defining a simple class of k-𝖨𝖦𝖭sk\text{-}\mathsf{IGN}\text{s}, denoted by k-𝖦𝖨𝖭sk\text{-}\mathsf{GIN}\text{s}, we obtain

ρ1(k-𝖦𝖨𝖭(t))=ρ1(𝗏𝗐𝗅k1(t)),\rho_{1}\bigl{(}k\text{-}\mathsf{GIN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{1}\bigl{(}\mathsf{vwl}_{k-1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)},

hereby recovering the layer/round connections.

We start with the following lemma:

Lemma E.1.

For any k2k\geq 2, a tt-layer k-𝖨𝖦𝖭k\text{-}\mathsf{IGN} can be represented in 𝖳𝖫k(tk)(Ω)\mathsf{TL}_{k}^{\!\scalebox{0.6}{(}tk\scalebox{0.6}{)}}(\Omega).

Before proving this lemma, we recall k-𝖨𝖦𝖭sk\text{-}\mathsf{IGN}\text{s}. These are architectures that consist of linear equivariant layers. Such linear layers allow for an explicit description. Indeed, following Maron et al. (2019c), let \sim_{\ell} be the equality pattern equivalence relation on [n][n]^{\ell} such that for 𝒂,𝒃[n]{\bm{a}},{\bm{b}}\in[n]^{\ell}, 𝒂𝒃{\bm{a}}\sim_{\ell}{\bm{b}} if and only if ai=ajbi=bj{a}_{i}={a}_{j}\Leftrightarrow{b}_{i}={b}_{j} for all i,j[]i,j\in[\ell]. We denote by [n]/[n]^{\ell}/_{\sim_{\ell}} the equivalence classes induced by \sim_{\ell}. Let us denote by 𝑭(t1)nk×dt1{\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}\in\mathbb{R}^{n^{k}\times d_{t-1}} the tensor computed by a k-𝖨𝖦𝖭k\text{-}\mathsf{IGN} in layer t1t-1. Then, in layer tt, a new tensor in nk×dt\mathbb{R}^{n^{k}\times d_{t}} is computed, as follows. For j[dt]j\in[d_{t}] and 𝒗=(v1,,vk)[n]k{\bm{v}}=(v_{1},\ldots,v_{k})\in[n]^{k}:

F(t)v1,,vk,j:=σ(γ[n]2k/2k𝒘[n]k𝟏(𝒗,𝒘)γi[dt1]cγ,i,jF(t1)w1,,wk,i+μ[n]k/k𝟏𝒗μbμ,j){\mathsfit{F}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{v_{1},\ldots,v_{k},j}:=\sigma\Biggl{(}\sum_{\gamma\in[n]^{2k}\!/_{\!\sim_{2k}}}\sum_{{\bm{w}}\in[n]^{k}}\bm{1}_{({\bm{v}},{\bm{w}})\in\gamma}\sum_{i\in[d_{t-1}]}c_{\gamma,i,j}{\mathsfit{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{w_{1},\ldots,w_{k},i}+\sum_{\mu\in[n]^{k}\!/_{\!\sim_{k}}}\bm{1}_{{\bm{v}}\in\mu}{b}_{\mu,j}\Biggr{)} (1)

for activation function σ\sigma, constants cγ,i,jc_{\gamma,i,j} and bμ,jb_{\mu,j} in \mathbb{R} and where 𝟏(𝒗,𝒘)γ\bm{1}_{({\bm{v}},{\bm{w}})\in\gamma} and 𝟏𝒗μ\bm{1}_{{\bm{v}}\in\mu} are indicator functions for the 2k2k-tuple (𝒗,𝒘)({\bm{v}},{\bm{w}}) to be in the equivalence class γ[n]2k/2k\gamma\in[n]^{2k}\!/_{\!\sim_{2k}} and the kk-tuple 𝒗{\bm{v}} to be in class μ[n]k/k\mu\in[n]^{k}\!/_{\!\sim_{k}}. As initial tensor 𝑭(0){\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}} one defines F(0)v1,,vk,j:=𝖺𝗍𝗉k(G,𝒗)d0,{\mathsfit{F}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}_{v_{1},\ldots,v_{k},j}:=\mathsf{atp}_{k}(G,{\bm{v}})\in\mathbb{R}^{d_{0}}, with d0=2(k2)+kd_{0}={2\genfrac{(}{)}{0.0pt}{2}{k}{2}+k\ell} where \ell is the number of initial vertex labels, just as for k-𝖥𝖦𝖭𝖭sk\text{-}\mathsf{FGNN}\text{s}.
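For intuition (our illustration, not needed for the proof): the number of equality patterns is a Bell number, e.g., |[n]^2/∼_2| = 2 and |[n]^4/∼_4| = 15 (for n large enough), which for k=2 recovers the 15 basis maps and 2 bias patterns of linear equivariant layers on matrices from Maron et al. (2019c). For k=1, the two patterns w=v and w≠v reduce the sum in equation (1) to a linear combination of 𝑭(t−1)_{v,:} and ∑_{w≠v} 𝑭(t−1)_{w,:} plus a bias, i.e., a standard permutation-equivariant (Deep Sets-style) layer.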

We remark that the need for having a summation depth of tktk in the expressions in 𝖳𝖫k(Ω)\mathsf{TL}_{k}(\Omega), or equivalently for requiring tktk rounds of (k1)-𝖶𝖫(k-1)\text{-}\mathsf{WL}, can intuitively be explained by the fact that each layer of a k-𝖨𝖦𝖭k\text{-}\mathsf{IGN} aggregates more information from “neighbouring” kk-tuples than (k1)-𝖶𝖫(k-1)\text{-}\mathsf{WL} does. Indeed, in each layer, a k-𝖨𝖦𝖭k\text{-}\mathsf{IGN} can use previous tuple embeddings of all possible kk-tuples. In a single round of (k1)-𝖶𝖫(k-1)\text{-}\mathsf{WL} only previous tuple embeddings from specific sets of kk-tuples are used. It is only after an additional k1k-1 rounds that k-𝖶𝖫k\text{-}\mathsf{WL} gets to the information about arbitrary kk-tuples, whereas this information is available in a k-𝖨𝖦𝖭k\text{-}\mathsf{IGN} in one layer directly.

Proof of Lemma E.1.

We have seen how 𝑭(0){\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}} can be represented in 𝖳𝖫k(Ω)\mathsf{TL}_{k}(\Omega) when dealing with k-𝖥𝖦𝖭𝖭sk\text{-}\mathsf{FGNN}\text{s}. We assume now that also the t1t-1th layer 𝑭(t1){\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}} can be represented by dt1d_{t-1} expressions in 𝖳𝖫k((t1)k)(Ω)\mathsf{TL}_{k}^{\!\scalebox{0.6}{(}(t-1)k\scalebox{0.6}{)}}(\Omega) and show that the same holds for the ttth layer.

We first represent 𝑭(t){\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} in 𝖳𝖫2k(Ω)\mathsf{TL}_{2k}(\Omega), based on the explicit description given earlier. The expressions use index variables x1,,xkx_{1},\ldots,x_{k} and y1,,yky_{1},\ldots,y_{k}. More specifically, for j[dt]j\in[d_{t}] we consider the expressions:

φj(t)(x1,,xk)=σ(γ[n]2k/2ki=1dt1cγ,i,jy1ykψγ(x1,,xk,y1,,yk)φi(t1)(y1,,yk)+μ[n]k/kbμ,jψμ(x1,,xk)),\varphi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1},\ldots,x_{k})=\sigma\left(\sum_{\gamma\in[n]^{2k}\!/_{\!\sim_{2k}}}\sum_{i=1}^{d_{t-1}}c_{\gamma,i,j}\right.\\ \sum_{y_{1}}\cdots\sum_{y_{k}}\psi_{\gamma}(x_{1},\ldots,x_{k},y_{1},\ldots,y_{k})\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(y_{1},\ldots,y_{k})\\ \left.{}+\sum_{\mu\in[n]^{k}\!/_{\!\sim_{k}}}b_{\mu,j}\cdot\psi_{\mu}(x_{1},\ldots,x_{k})\right), (2)

where ψμ(x1,,xk)\psi_{\mu}(x_{1},\ldots,x_{k}) is a product of expressions of the form 𝟏xi𝗈𝗉xj\bm{1}_{x_{i}\mathop{\mathsf{op}}x_{j}} encoding the equality pattern μ\mu, and similarly, ψγ(x1,,xk,y1,,yk)\psi_{\gamma}(x_{1},\ldots,x_{k},y_{1},\ldots,y_{k}) is a product of expressions of the form 𝟏xi𝗈𝗉xj\bm{1}_{x_{i}\mathop{\mathsf{op}}x_{j}}, 𝟏yi𝗈𝗉yj\bm{1}_{y_{i}\mathop{\mathsf{op}}y_{j}} and 𝟏xi𝗈𝗉yj\bm{1}_{x_{i}\mathop{\mathsf{op}}y_{j}} encoding the equality pattern γ\gamma. These expressions are indicator functions for their corresponding equality patterns. That is,

[[ψγ,(𝒗,𝒘)]]G={1if (𝒗,𝒘)γ0otherwise[[ψμ,𝒗]]G={1if 𝒗μ0otherwise[\![\psi_{\gamma},({\bm{v}},{\bm{w}})]\!]_{G}=\begin{cases}1&\text{if $({\bm{v}},{\bm{w}})\in\gamma$}\\ 0&\text{otherwise}\end{cases}\quad[\![\psi_{\mu},{\bm{v}}]\!]_{G}=\begin{cases}1&\text{if ${\bm{v}}\in\mu$}\\ 0&\text{otherwise}\end{cases}
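For instance (our illustration), for k=3 and μ the equality pattern of triples whose first two entries coincide and differ from the third, ψ_μ(x1, x2, x3) = 𝟏_{x1=x2} · 𝟏_{x1≠x3} · 𝟏_{x2≠x3}.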

We remark that in the expressions φj(t)\varphi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} we have two kinds of summations: those ranging over a fixed number of elements (over equality patterns and feature dimensions), and those ranging over the index variables y1,,yky_{1},\ldots,y_{k}. The latter are the only ones contributing to the summation depth. The former are just concise representations of a long summation over a fixed number of expressions.

We now only need to show that we can equivalently write φj(t)(x1,,xk)\varphi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1},\ldots,x_{k}) as expressions in 𝖳𝖫k(Ω)\mathsf{TL}_{k}(\Omega), that is, using only indices x1,,xkx_{1},\ldots,x_{k}. As such, we can already ignore the term μ[n]k/kbμ,jψμ(x1,,xk)\sum_{\mu\in[n]^{k}\!/_{\!\sim_{k}}}b_{\mu,j}\cdot\psi_{\mu}(x_{1},\ldots,x_{k}) since this is already in 𝖳𝖫k(Ω)\mathsf{TL}_{k}(\Omega). Furthermore, this expression does not affect the summation depth.

Furthermore, as just mentioned, we can expand expression φj(t)\varphi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} into linear combinations of other simpler expressions. As such, it suffices to show that kk index variables suffice for each expression of the form:

y1ykψγ(x1,,xk,y1,,yk)φi(t1)(y1,,yk),\sum_{y_{1}}\cdots\sum_{y_{k}}\psi_{\gamma}(x_{1},\ldots,x_{k},y_{1},\ldots,y_{k})\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(y_{1},\ldots,y_{k}), (3)

obtained by fixing γ\gamma and ii in expression (2). To reduce the number of variables, as a first step we eliminate any disequality using the inclusion-exclusion principle. More precisely, we observe that ψγ(𝒙,𝒚)\psi_{\gamma}({\bm{x}},{\bm{y}}) can be written as:

(i,j)I𝟏xi=xj(i,j)I¯𝟏xixj(i,j)J𝟏yi=yj(i,j)J¯𝟏yiyj(i,j)K𝟏xi=yj(i,j)K¯𝟏xiyj\displaystyle\prod_{(i,j)\in I}\bm{1}_{x_{i}=x_{j}}\cdot\prod_{(i,j)\in\bar{I}}\bm{1}_{x_{i}\neq x_{j}}\cdot\prod_{(i,j)\in J}\bm{1}_{y_{i}=y_{j}}\cdot\prod_{(i,j)\in\bar{J}}\bm{1}_{y_{i}\neq y_{j}}\prod_{(i,j)\in K}\bm{1}_{x_{i}=y_{j}}\cdot\prod_{(i,j)\in\bar{K}}\bm{1}_{x_{i}\neq y_{j}}{}
=AI¯BJ¯CK¯(1)|A|+|B|+|C|(i,j)IA𝟏xi=xj(i,j)JB𝟏yi=yj(i,j)KC𝟏xi=yj,\displaystyle=\sum_{A\subseteq\bar{I}}\sum_{B\subseteq\bar{J}}\sum_{C\subseteq\bar{K}}(-1)^{|A|+|B|+|C|}\prod_{(i,j)\in I\cup A}\bm{1}_{x_{i}=x_{j}}\prod_{(i,j)\in J\cup B}\bm{1}_{y_{i}=y_{j}}\cdot\prod_{(i,j)\in K\cup C}\bm{1}_{x_{i}=y_{j}}, (4)

for some sets II, JJ and KK of pairs of indices in [k]2[k]^{2}, and where I¯=[k]2I\bar{I}=[k]^{2}\setminus I, J¯=[k]2J\bar{J}=[k]^{2}\setminus J and K¯=[k]2K\bar{K}=[k]^{2}\setminus K. Here we use that 𝟏xixj=1𝟏xi=xj\bm{1}_{x_{i}\neq x_{j}}=1-\bm{1}_{x_{i}=x_{j}}, 𝟏yiyj=1𝟏yi=yj\bm{1}_{y_{i}\neq y_{j}}=1-\bm{1}_{y_{i}=y_{j}} and 𝟏xiyj=1𝟏xi=yj\bm{1}_{x_{i}\neq y_{j}}=1-\bm{1}_{x_{i}=y_{j}} and use the inclusion-exclusion principle to obtain a polynomial in equality conditions only.
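The principle is as for a single disequality: 𝟏_{x1≠x2} · φ = φ − 𝟏_{x1=x2} · φ; applying this identity to every disequality occurring in ψ_γ and expanding the resulting products yields expression (4).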

In view of expression (4), we can push the summations over y1,,yky_{1},\ldots,y_{k} in expression (3) to the subexpressions that actually use y1,,yky_{1},\ldots,y_{k}. That is, we can rewrite expression (3) into the equivalent expression:

AI¯BJ¯CK¯(1)|A|+|B|+|C|(i,j)IA𝟏xi=xj(y1yk(i,j)JB𝟏yi=yj(i,j)KC𝟏xi=yjφi(t1)(y1,,yk)).\sum_{A\subseteq\bar{I}}\sum_{B\subseteq\bar{J}}\sum_{C\subseteq\bar{K}}(-1)^{|A|+|B|+|C|}\cdot\prod_{(i,j)\in I\cup A}\bm{1}_{x_{i}=x_{j}}\\ {}\cdot\left(\sum_{y_{1}}\cdots\sum_{y_{k}}\prod_{(i,j)\in J\cup B}\bm{1}_{y_{i}=y_{j}}\cdot\prod_{(i,j)\in K\cup C}\bm{1}_{x_{i}=y_{j}}\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(y_{1},\ldots,y_{k})\right). (5)

By fixing AA, BB and CC, it now suffices to argue that

(i,j)IA𝟏xi=xj(y1yk(i,j)JB𝟏yi=yj(i,j)KC𝟏xi=yjφi(t1)(y1,,yk)),\prod_{(i,j)\in I\cup A}\bm{1}_{x_{i}=x_{j}}\cdot\left(\sum_{y_{1}}\cdots\sum_{y_{k}}\prod_{(i,j)\in J\cup B}\bm{1}_{y_{i}=y_{j}}\cdot\prod_{(i,j)\in K\cup C}\bm{1}_{x_{i}=y_{j}}\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(y_{1},\ldots,y_{k})\right), (6)

can be equivalently expressed in 𝖳𝖫k(Ω)\mathsf{TL}_{k}(\Omega).

Since our aim is to reduce the number of index variables from 2k2k to kk, it is important to know which variables are the same. In expression (6), some equalities that hold between the variables may not be explicitly mentioned. For this reason, we expand IAI\cup A, JBJ\cup B and KCK\cup C with their implied equalities. That is, 𝟏xi=xj\bm{1}_{x_{i}=x_{j}} is added to IAI\cup A if, for every (𝒗,𝒘)({\bm{v}},{\bm{w}}),

[[(i,j)IA𝟏xi=xj(i,j)JB𝟏yi=yj(i,j)KC𝟏xi=yj,(𝒗,𝒘)]]G=1[[𝟏xi=xj,𝒗]]G=1[\![\prod_{(i,j)\in I\cup A}\bm{1}_{x_{i}=x_{j}}\cdot\prod_{(i,j)\in J\cup B}\bm{1}_{y_{i}=y_{j}}\cdot\prod_{(i,j)\in K\cup C}\bm{1}_{x_{i}=y_{j}},({\bm{v}},{\bm{w}})]\!]_{G}=1\Rightarrow[\![\bm{1}_{x_{i}=x_{j}},{\bm{v}}]\!]_{G}=1

holds. Similar implied equalities 𝟏yi=yj\bm{1}_{y_{i}=y_{j}} and 𝟏xi=yj\bm{1}_{x_{i}=y_{j}} are added to JBJ\cup B and KCK\cup C, respectively. Let us denote the resulting expanded sets by II^{\prime}, JJ^{\prime} and KK^{\prime}. It should be clear that we can add these implied equalities to expression (6) without changing its semantics. In other words, expression (6) can be equivalently represented by

(i,j)I𝟏xi=xj(y1yk(i,j)J𝟏yi=yj(i,j)K𝟏xi=yjφi(t1)(y1,,yk)).\prod_{(i,j)\in I^{\prime}}\bm{1}_{x_{i}=x_{j}}\cdot\left(\sum_{y_{1}}\cdots\sum_{y_{k}}\prod_{(i,j)\in J^{\prime}}\bm{1}_{y_{i}=y_{j}}\cdot\prod_{(i,j)\in K^{\prime}}\bm{1}_{x_{i}=y_{j}}\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(y_{1},\ldots,y_{k})\right). (7)

There are now two types of index variables among the y1,,yky_{1},\ldots,y_{k}: those that are equal to some xix_{i}, and those that are not. Now suppose that (j,j)J(j,j^{\prime})\in J^{\prime}, and thus yj=yjy_{j}=y_{j^{\prime}}, and that also (i,j)K(i,j)\in K^{\prime}, and thus xi=yjx_{i}=y_{j}. Since we included the implied equalities, we also have (i,j)K(i,j^{\prime})\in K^{\prime}, and thus xi=yjx_{i}=y_{j^{\prime}}. There is no reason to keep (j,j)J(j,j^{\prime})\in J^{\prime} as it is implied by (i,j)(i,j) and (i,j)K(i,j^{\prime})\in K^{\prime}. We can thus safely remove all pairs (j,j)(j,j^{\prime}) from JJ^{\prime} such that (i,j)K(i,j)\in K^{\prime} (and thus also (i,j)K(i,j^{\prime})\in K^{\prime}). We denote by JJ^{\prime\prime} the reduced set of pairs of indices obtained from JJ^{\prime} in this way. We have that expression (7) can be equivalently written as

(i,j)I𝟏xi=xj(y1yk(i,j)K𝟏xi=yj(i,j)J𝟏yi=yjφi(t1)(y1,,yk)),\prod_{(i,j)\in I^{\prime}}\bm{1}_{x_{i}=x_{j}}\cdot\left(\sum_{y_{1}}\cdots\sum_{y_{k}}\prod_{(i,j)\in K^{\prime}}\bm{1}_{x_{i}=y_{j}}\cdot\prod_{(i,j)\in J^{\prime\prime}}\bm{1}_{y_{i}=y_{j}}\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(y_{1},\ldots,y_{k})\right), (8)

where we also switched the order of equalities in JJ^{\prime\prime} and KK^{\prime}. Our construction of JJ^{\prime\prime} and KK^{\prime} ensures that none of the variables yjy_{j} with jj belonging to a pair in JJ^{\prime\prime} is equal to some xix_{i}.
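For example (our illustration), if (1,2)∈K′ (so x1 = y2) and (2,3)∈J′ (so y2 = y3), then the implied equality x1 = y3, i.e., the pair (1,3), is added to K′, after which (2,3) is dropped from J′ and hence does not occur in J′′.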

By contrast, the variables yjy_{j} occurring in pairs (i,j)K(i,j)\in K^{\prime} are equal to the corresponding xix_{i}. We observe, however, that also certain equalities among the variables {x1,,xk}\{x_{1},\ldots,x_{k}\} hold, as represented by the pairs in II^{\prime}. Let I(i):={i(i,i)I}I^{\prime}(i):=\{i^{\prime}\mid(i,i^{\prime})\in I^{\prime}\} and define ı^\hat{\imath} as a unique representative element in I(i)I^{\prime}(i). For example, one can take i^\hat{i} to be the smallest index in I(i)I^{\prime}(i). We use this representative index (and corresponding xx-variable) to simplify KK^{\prime}. More precisely, we replace each pair (i,j)K(i,j)\in K^{\prime} with the pair (ı^,j)(\hat{\imath},j). In terms of variables, we replace xi=yjx_{i}=y_{j} with xi^=yjx_{\hat{i}}=y_{j}. Let KK^{\prime\prime} be the set KK^{\prime} modified in this way. Expression (8) can thus be equivalently written as

(i,j)I𝟏xi=xj(y1yk(ı^,j)K𝟏xı^=yj(i,j)J𝟏yi=yjφi(t1)(y1,,yk)),\prod_{(i,j)\in I^{\prime}}\bm{1}_{x_{i}=x_{j}}\cdot\left(\sum_{y_{1}}\cdots\sum_{y_{k}}\prod_{(\hat{\imath},j)\in K^{\prime\prime}}\bm{1}_{x_{\hat{\imath}}=y_{j}}\cdot\prod_{(i,j)\in J^{\prime\prime}}\bm{1}_{y_{i}=y_{j}}\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(y_{1},\ldots,y_{k})\right), (9)

where the free index variables of the subexpression

y1yk(ı^,j)K𝟏xı^=yj(i,j)J𝟏yi=yjφi(t1)(y1,,yk)\sum_{y_{1}}\cdots\sum_{y_{k}}\prod_{(\hat{\imath},j)\in K^{\prime\prime}}\bm{1}_{x_{\hat{\imath}}=y_{j}}\cdot\prod_{(i,j)\in J^{\prime\prime}}\bm{1}_{y_{i}=y_{j}}\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(y_{1},\ldots,y_{k}) (10)

are precisely the index variables xı^x_{\hat{\imath}} for (ı^,j)K(\hat{\imath},j)\in K^{\prime\prime}. Recall that our aim is to reduce the variables from 2k2k to kk. We are now finally ready to do this. More specifically, we consider a bijection β:{y1,,yk}{x1,,xk}\beta:\{y_{1},\ldots,y_{k}\}\to\{x_{1},\ldots,x_{k}\} which ensures that for each ı^\hat{\imath} there is a jj such that (i^,j)K(\hat{i},j)\in K^{\prime\prime} and β(yj)=xı^\beta(y_{j})=x_{\hat{\imath}}. Furthermore, among the summations y1yk\sum_{y_{1}}\cdots\sum_{y_{k}} we can ignore those yj\sum_{y_{j}} for which β(yj)=xı^\beta(y_{j})=x_{\hat{\imath}} holds for some pair (ı^,j)K(\hat{\imath},j)\in K^{\prime\prime}. After all, due to the factor 𝟏xı^=yj\bm{1}_{x_{\hat{\imath}}=y_{j}}, such a summation only contributes for the given xı^x_{\hat{\imath}} value. Let YY consist of the remaining indices jj in [k][k], that is, those for which no such ı^\hat{\imath} exists. Then, we can equivalently write expression (9) as

(i,j)I𝟏xi=xj(β(yi),iY(ı^,j)K𝟏xı^=β(yj)(i,j)J𝟏β(yi)=β(yj)β(φi(t1)(y1,,yk))),\prod_{(i,j)\in I^{\prime}}\bm{1}_{x_{i}=x_{j}}\cdot\Biggl{(}\sum_{\beta(y_{i}),i\in Y}\prod_{(\hat{\imath},j)\in K^{\prime\prime}}\bm{1}_{x_{\hat{\imath}}=\beta(y_{j})}\cdot\prod_{(i,j)\in J^{\prime\prime}}\bm{1}_{\beta(y_{i})=\beta(y_{j})}\\ \cdot\beta(\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(y_{1},\ldots,y_{k}))\Biggr{)}, (11)

where β(φi(t1)(y1,,yk))\beta(\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(y_{1},\ldots,y_{k})) denotes the expression obtained by renaming the variables y1,,yky_{1},\ldots,y_{k} in φi(t1)(y1,,yk)\varphi_{i}^{(t-1)}(y_{1},\allowbreak\ldots,y_{k}) into xx-variables according to β\beta. This is our desired expression in 𝖳𝖫k(Ω)\mathsf{TL}_{k}(\Omega). If we analyze the summation depth of this expression, we have by induction that the summation depth of φi(t1)\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}} is at most (t1)k(t-1)k. In the above expression, we are increasing the summation depth by at most |Y||Y|. The largest size of YY is kk, which occurs when none of the yy-variables are equal to any of the xx-variables. As a consequence, we obtain an expression of summation depth at most tktk, as desired. ∎

As a consequence, when using k-𝖨𝖦𝖭s(t)k\text{-}\mathsf{IGN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} for vertex embeddings, that is, using (G,v)𝑭(t)v,,v,:(G,v)\to{\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{v,\ldots,v,:}, one simply pads the layer expression with i[k]𝟏x1=xi\prod_{i\in[k]}\bm{1}_{x_{1}=x_{i}}, which does not affect the number of variables or summation depth. When using k-𝖨𝖦𝖭s(t)k\text{-}\mathsf{IGN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} for graph embeddings, an additional invariant layer is added to obtain an embedding GdtG\to\mathbb{R}^{d_{t}}. Such invariant layers have a similar (in fact simpler) representation to the one given in equation 1 (Maron et al., 2019c), and allow for a similar analysis. One can verify that expressions in 𝖳𝖫k((t+1)k)(Ω)\mathsf{TL}_{k}^{\!\scalebox{0.6}{(}(t+1)k\scalebox{0.6}{)}}(\Omega) are needed when such an invariant layer is added to previous tt layers. Based on this, Theorem 4.2, Lemma E.1 and Theorem 1 in Maron et al. (2019b) imply that ρ1(k-𝖨𝖦𝖭)=ρ1(𝗏𝗐𝗅k1())\rho_{1}\bigl{(}k\text{-}\mathsf{IGN}\bigr{)}=\rho_{1}\bigl{(}\mathsf{vwl}_{k-1}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\bigr{)} and ρ0(k-𝖨𝖦𝖭)=ρ0(𝗀𝗐𝗅k1())\rho_{0}\bigl{(}k\text{-}\mathsf{IGN}\bigr{)}=\rho_{0}\bigl{(}\mathsf{gwl}_{k-1}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\bigr{)} hold.

kk-dimensional GINs.

We can recover a layer-based characterization for k-𝖨𝖦𝖭sk\text{-}\mathsf{IGN}\text{s} that compute vertex embeddings by considering a special subset of k-𝖨𝖦𝖭sk\text{-}\mathsf{IGN}\text{s}. Indeed, the k-𝖨𝖦𝖭sk\text{-}\mathsf{IGN}\text{s} used in Maron et al. (2019b) to show ρ1(𝗐𝗅k1(t))ρ1(k-𝖨𝖦𝖭(t))\rho_{1}(\mathsf{wl}_{k-1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(k\text{-}\mathsf{IGN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}) are of a very special form. We extract the essence of these special k-𝖨𝖦𝖭sk\text{-}\mathsf{IGN}\text{s} in the form of kk-dimensional 𝖦𝖨𝖭s\mathsf{GIN}\text{s}. That is, we define the class k-𝖦𝖨𝖭sk\text{-}\mathsf{GIN}\text{s} to consist of layers defined as follows. The initial layers are just as for k-𝖨𝖦𝖭sk\text{-}\mathsf{IGN}\text{s}. Then, for t1t\geq 1:

𝑭(t)v1,,vk,::=𝗆𝗅𝗉0(t)(𝑭(t1)v1,,vk,:,uVG𝗆𝗅𝗉1(t)(𝑭u,v2,,vk,:(t1)),uVG𝗆𝗅𝗉1(t)(𝑭v1,u,,vk,:(t1)),,uVG𝗆𝗅𝗉1(t)(𝑭v1,v2,,vk1,u,:(t1))),{\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{v_{1},\ldots,v_{k},:}:=\mathsf{mlp}_{0}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigl{(}{\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{v_{1},\ldots,v_{k},:},\sum_{u\in V_{G}}\mathsf{mlp}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}({\bm{\mathsfit{F}}}_{u,v_{2},\ldots,v_{k},:}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}),\sum_{u\in V_{G}}\mathsf{mlp}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}({\bm{\mathsfit{F}}}_{v_{1},u,\ldots,v_{k},:}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}})\\ ,\ldots,\sum_{u\in V_{G}}\mathsf{mlp}_{1}^{(t)}({\bm{\mathsfit{F}}}_{v_{1},v_{2},\ldots,v_{k-1},u,:}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}})\bigr{)},

where Fv1,v2,,vk,:(t1)dt1{\mathsfit{F}}_{v_{1},v_{2},\ldots,v_{k},:}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}\in\mathbb{R}^{d_{t-1}}, 𝗆𝗅𝗉1(t):dt1bt\mathsf{mlp}_{1}^{(t)}:\mathbb{R}^{d_{t-1}}\to\mathbb{R}^{b_{t}} and 𝗆𝗅𝗉0(t):dt1+kbtdt\mathsf{mlp}_{0}^{(t)}:\mathbb{R}^{d_{t-1}+kb_{t}}\to\mathbb{R}^{d_{t}} are 𝖬𝖫𝖯s\mathsf{MLP}\text{s}. It is now an easy exercise to show that 𝗄-𝖦𝖨𝖭(t)\mathsf{k}\textit{-}\mathsf{GIN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} can be represented in 𝖳𝖫k(t)(Ω)\mathsf{TL}_{k}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega) (note that the summations used increase the summation depth by only one in each layer). Combined with Theorem 4.2 and by inspecting the proof of Theorem 1 in Maron et al. (2019b), we obtain:

Proposition E.2.

For any k2k\geq 2 and any t0t\geq 0: ρ1(k-𝖦𝖨𝖭(t))=ρ1(𝗏𝗐𝗅k1(t))\rho_{1}(k\text{-}\mathsf{GIN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})=\rho_{1}(\mathsf{vwl}_{k-1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}).

We can define the invariant version of k-𝖦𝖨𝖭sk\text{-}\mathsf{GIN}\text{s} by adding a simple readout layer of the form

v1,,vkVG𝗆𝗅𝗉(𝑭v1,,vk,:(t)),\sum_{v_{1},\ldots,v_{k}\in V_{G}}\mathsf{mlp}({\bm{\mathsfit{F}}}_{v_{1},\ldots,v_{k},:}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}),

as is used in Maron et al. (2019b). We obtain ρ0(k-𝖦𝖨𝖭)=ρ0(𝗀𝗐𝗅k1())\rho_{0}(k\text{-}\mathsf{GIN})=\rho_{0}(\mathsf{gwl}_{k-1}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}), by simply rephrasing the readout layer in 𝖳𝖫k(Ω)\mathsf{TL}_{k}(\Omega).
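For concreteness, a minimal numpy sketch (ours, purely illustrative; the mlp arguments are hypothetical stand-ins) of a kk-𝖦𝖨𝖭 layer and the invariant readout described above:

import numpy as np

def k_gin_layer(F, mlp0, mlp1):
    # One k-GIN layer. F has shape (n, ..., n, d) with k vertex axes; for every
    # k-tuple, the new feature is mlp0 applied to the old feature concatenated
    # with, for each position j, the sum over u of mlp1 of the feature of the
    # tuple with position j replaced by u.
    k = F.ndim - 1
    M = np.apply_along_axis(mlp1, -1, F)       # mlp1 applied to every tuple feature
    parts = [F]
    for j in range(k):
        summed = M.sum(axis=j, keepdims=True)  # sum over position j
        parts.append(np.broadcast_to(summed, F.shape[:-1] + (M.shape[-1],)))
    return np.apply_along_axis(mlp0, -1, np.concatenate(parts, axis=-1))

def k_gin_readout(F, mlp):
    # Invariant readout: sum mlp over all k-tuples.
    return np.apply_along_axis(mlp, -1, F).sum(axis=tuple(range(F.ndim - 1)))

# Hypothetical usage for k = 2 on a graph with n = 3 vertices and d0 = 4.
F = np.random.randn(3, 3, 4)
mlp1 = lambda x: np.tanh(x)                           # d -> b (here b = d)
mlp0 = lambda x: np.tanh(x[:4] + x[4:8] + x[8:12])    # (d + k*b) -> d
F1 = k_gin_layer(F, mlp0, mlp1)
print(k_gin_readout(F1, lambda x: x).shape)           # (4,)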

Appendix F Details of Section 6

Let 𝒞(𝒢s,)\mathcal{C}(\mathcal{G}_{s},\mathbb{R}^{\ell}) be the class of all continuous functions from 𝒢s\mathcal{G}_{s} to \mathbb{R}^{\ell}. We always assume that 𝒢s\mathcal{G}_{s} forms a compact space. For example, when vertices are labeled with values in {0,1}0\{0,1\}^{\ell_{0}}, 𝒢s\mathcal{G}_{s} is a finite set which we equip with the discrete topology. When vertices carry labels in 0\mathbb{R}^{\ell_{0}} we assume that these labels come from a compact set K0K\subset\mathbb{R}^{\ell_{0}}. In this case, one can represent graphs in 𝒢s\mathcal{G}_{s} by elements in (0)2(\mathbb{R}^{\ell_{0}})^{2} and the topology used is the one induced by some metric .\|.\| on the reals. Similarly, we equip \mathbb{R}^{\ell} with the topology induced by some metric .\|.\|.

Consider 𝒞(𝒢s,)\mathcal{F}\subseteq\mathcal{C}(\mathcal{G}_{s},\mathbb{R}^{\ell}) and define ¯\overline{\mathcal{F}} as the closure of \mathcal{F} in 𝒞(𝒢s,)\mathcal{C}(\mathcal{G}_{s},\mathbb{R}^{\ell}) under the usual topology induced by f𝗌𝗎𝗉G,𝒗f(G,𝒗)f\mapsto\mathsf{sup}_{G,{\bm{v}}}\|f(G,{\bm{v}})\|. In other words, a continuous function h:𝒢sh:\mathcal{G}_{s}\to\mathbb{R}^{\ell} is in ¯\overline{\mathcal{F}} if there exists a sequence of functions f1,f2,f_{1},f_{2},\ldots\in\mathcal{F} such that limi𝗌𝗎𝗉G,𝒗fi(G,𝒗)h(G,𝒗)=0\lim_{i\to\infty}\mathsf{sup}_{G,{\bm{v}}}\|f_{i}(G,{\bm{v}})-h(G,{\bm{v}})\|=0. The following theorem provides a characterization of the closure of a set of functions. We state it here modified to our setting.

Theorem F.1 ((Timofte, 2005)).

Let 𝒞(𝒢s,)\mathcal{F}\subseteq\mathcal{C}(\mathcal{G}_{s},\mathbb{R}^{\ell}) such that there exists a set 𝒮𝒞(𝒢s,)\mathcal{S}\subseteq\mathcal{C}(\mathcal{G}_{s},\mathbb{R}) satisfying 𝒮\mathcal{S}\cdot\mathcal{F}\subseteq\mathcal{F} and ρ(𝒮)ρ()\rho(\mathcal{S})\subseteq\rho(\mathcal{F}). Then,

¯:={f𝒞(𝒢s,)|ρ(F)ρ(f),(G,𝒗)𝒢s,f(G,𝒗)(G,𝒗)¯},\overline{\mathcal{F}}:=\bigl{\{}f\in\mathcal{C}(\mathcal{G}_{s},\mathbb{R}^{\ell})\bigm{|}\rho(F)\subseteq\rho(f),\forall(G,{\bm{v}})\in\mathcal{G}_{s},f(G,{\bm{v}})\in\overline{\mathcal{F}(G,{\bm{v}})}\bigr{\}},

where (G,𝐯):={h(G,𝐯)h}\mathcal{F}(G,{\bm{v}}):=\{h(G,{\bm{v}})\mid h\in\mathcal{F}\}\subseteq\mathbb{R}^{\ell}. We can equivalently replace ρ()\rho(\mathcal{F}) by ρ(𝒮)\rho(\mathcal{S}) in the expression for ¯\overline{\mathcal{F}}.∎

We will use this theorem to show Theorem 6.1 in the setting that \mathcal{F} consists of functions that can be represented in 𝖳𝖫(Ω)\mathsf{TL}(\Omega), and more generally, sets of functions that satisfy two conditions, stated below. We more generally allow \mathcal{F} to consist of functions f:𝒢sff:\mathcal{G}_{s}\to\mathbb{R}^{\ell_{f}}, where the f\ell_{f}\in\mathbb{N} may depend on ff. We will require \mathcal{F} to satisfy the following two conditions:

concatenation-closed:

If f1:𝒢spf_{1}:\mathcal{G}_{s}\to\mathbb{R}^{p} and f2:𝒢sqf_{2}:\mathcal{G}_{s}\to\mathbb{R}^{q} are in \mathcal{F}, then g:=(f1,f2):𝒢sp+q:(G,𝒗)(f1(G,𝒗),f2(G,𝒗))g:=(f_{1},f_{2}):\mathcal{G}_{s}\to\mathbb{R}^{p+q}:(G,{\bm{v}})\mapsto(f_{1}(G,{\bm{v}}),f_{2}(G,{\bm{v}})) is also in \mathcal{F}.

function-closed:

For a fixed \ell\in\mathbb{N}, for any ff\in\mathcal{F} such that f:𝒢spf:\mathcal{G}_{s}\to\mathbb{R}^{p}, also hf:𝒢sh\circ f:\mathcal{G}_{s}\to\mathbb{R}^{\ell} is in \mathcal{F} for any continuous function h𝒞(p,)h\in\mathcal{C}(\mathbb{R}^{p},\mathbb{R}^{\ell}).
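Intuitively, classes of 𝖦𝖭𝖭s whose embeddings end with an 𝖬𝖫𝖯 readout satisfy both conditions: two networks can be run in parallel and their outputs concatenated, and any further continuous function (or 𝖬𝖫𝖯) can be absorbed into the final readout. This is made concrete for specific architectures after Proposition F.2 below.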

We denote by \mathcal{F}_{\ell} the subset of \mathcal{F} of functions from 𝒢s\mathcal{G}_{s} to \mathbb{R}^{\ell}. See 6.1

Proof.

The proof consists of (i) verifying the existence of a set 𝒮\mathcal{S} as mentioned in Theorem F.1; and of (ii) eliminating the pointwise convergence condition “(G,𝒗)𝒢s,f(G,𝒗)(G,𝒗)¯\forall(G,{\bm{v}})\in\mathcal{G}_{s},f(G,{\bm{v}})\in\overline{\mathcal{F}_{\ell}(G,{\bm{v}})}” in the closure characterization in Theorem F.1.

For showing (ii) we argue that (G,𝒗)¯=\overline{\mathcal{F}_{\ell}(G,{\bm{v}})}=\mathbb{R}^{\ell} so that the condition f(G,𝒗)(G,𝒗)¯f(G,{\bm{v}})\in\overline{\mathcal{F}_{\ell}(G,{\bm{v}})} is automatically satisfied for any f𝒞(𝒢s,)f\in\mathcal{C}(\mathcal{G}_{s},\mathbb{R}^{\ell}). Indeed, take an arbitrary ff\in\mathcal{F}_{\ell} and consider the constant functions gi::𝒙𝒃ig_{i}:\mathbb{R}^{\ell}\to\mathbb{R}^{\ell}:{\bm{x}}\mapsto{\bm{b}}_{i} with 𝒃i{\bm{b}}_{i}\in\mathbb{R}^{\ell} the iith basis vector. Since \mathcal{F} is function-closed for \ell, so is \mathcal{F}_{\ell}. Hence, bi:=gifb_{i}:=g_{i}\circ f\in\mathcal{F}_{\ell} as well. Furthermore, if sa::𝒙a×𝒙s_{a}:\mathbb{R}^{\ell}\to\mathbb{R}^{\ell}:{\bm{x}}\mapsto a\times{\bm{x}}, for aa\in\mathbb{R}, then safs_{a}\circ f\in\mathcal{F}_{\ell} and thus \mathcal{F}_{\ell} is closed under scalar multiplication. Finally, consider +:2:(𝒙,𝒚)𝒙+𝒚+:\mathbb{R}^{2\ell}\to\mathbb{R}^{\ell}:({\bm{x}},{\bm{y}})\mapsto{\bm{x}}+{\bm{y}}. For ff and gg in \mathcal{F}_{\ell}, h=(f,g)h=(f,g)\in\mathcal{F} since \mathcal{F} is concatenation-closed. As a consequence, the function +h:𝒢s+\circ h:\mathcal{G}_{s}\to\mathbb{R}^{\ell} is in \mathcal{F}_{\ell}, showing that \mathcal{F}_{\ell} is also closed under addition. All combined, this shows that \mathcal{F}_{\ell} is closed under taking linear combinations and since the basis vectors of \mathbb{R}^{\ell} can be attained, (G,𝒗)¯=\overline{\mathcal{F}_{\ell}(G,{\bm{v}})}=\mathbb{R}^{\ell}, as desired.

For (i), we show the existence of a set 𝒮𝒞(𝒢s,)\mathcal{S}\subseteq\mathcal{C}(\mathcal{G}_{s},\mathbb{R}) such that 𝒮\mathcal{S}\cdot\mathcal{F}_{\ell}\subseteq\mathcal{F}_{\ell} and ρs(𝒮)ρs()\rho_{s}(\mathcal{S})\subseteq\rho_{s}(\mathcal{F}_{\ell}) hold. Similarly as in Azizian & Lelarge (2021), we define

𝒮:={f𝒞(𝒢s,)|(f,f,,f) times}.\mathcal{S}:=\bigl{\{}f\in\mathcal{C}(\mathcal{G}_{s},\mathbb{R})\bigm{|}\underbrace{(f,f,\ldots,f)}_{\text{$\ell$ times}}\in\mathcal{F}_{\ell}\bigr{\}}.

We remark that for s𝒮s\in\mathcal{S} and ff\in\mathcal{F}_{\ell}, sf:𝒢s:(G,𝒗)s(G,𝒗)f(G,𝒗)s\cdot f:\mathcal{G}_{s}\to\mathbb{R}^{\ell}:(G,{\bm{v}})\mapsto s(G,{\bm{v}})\odot f(G,{\bm{v}}), with \odot being pointwise multiplication, is also in \mathcal{F}_{\ell}. Indeed, sf=((s,,s),f)s\cdot f=\odot\circ((s,\ldots,s),f) with ((s,,s),f)((s,\ldots,s),f) the concatenation of (s,,s)(s,\ldots,s)\in\mathcal{F}_{\ell} and ff, and :2:(𝒙,𝒚)𝒙𝒚\odot:\mathbb{R}^{2\ell}\to\mathbb{R}^{\ell}:({\bm{x}},{\bm{y}})\to{\bm{x}}\odot{\bm{y}} being pointwise multiplication.

It remains to verify ρs(𝒮)ρs()\rho_{s}(\mathcal{S})\subseteq\rho_{s}(\mathcal{F}_{\ell}). Assume that (G,𝒗)(G,{\bm{v}}) and (H,𝒘)(H,{\bm{w}}) are not in ρs()\rho_{s}(\mathcal{F}_{\ell}). By definition, this implies the existence of a function f^\hat{f}\in\mathcal{F}_{\ell} such that f^(G,𝒗)=𝒂𝒃=f^(H,𝒘)\hat{f}(G,{\bm{v}})={\bm{a}}\neq{\bm{b}}=\hat{f}(H,{\bm{w}}) with 𝒂,𝒃{\bm{a}},{\bm{b}}\in\mathbb{R}^{\ell}. We argue that (G,𝒗)(G,{\bm{v}}) and (H,𝒘)(H,{\bm{w}}) are not in ρs(𝒮)\rho_{s}(\mathcal{S}) either. Indeed, Proposition 1 in Maron et al. (2019b) implies that there exist natural numbers 𝜶=(α1,,α)\boldsymbol{\alpha}=(\alpha_{1},\ldots,\alpha_{\ell})\in\mathbb{N}^{\ell} such that the mapping h𝜶::𝒙i=1xiαih_{\boldsymbol{\alpha}}:\mathbb{R}^{\ell}\to\mathbb{R}:{\bm{x}}\to\prod_{i=1}^{\ell}{x}_{i}^{\alpha_{i}} satisfies h𝜶(𝒂)=ab=h𝜶(𝒃)h_{\boldsymbol{\alpha}}({\bm{a}})=a^{\prime}\neq b^{\prime}=h_{\boldsymbol{\alpha}}({\bm{b}}), with a,ba^{\prime},b^{\prime}\in\mathbb{R}. Since \mathcal{F} (and thus also )\mathcal{F}_{\ell}) is function-closed, h𝜶fh_{\boldsymbol{\alpha}}\circ f\in\mathcal{F}_{\ell} for any ff\in\mathcal{F}_{\ell}. In particular, g:=h𝜶f^g:=h_{\boldsymbol{\alpha}}\circ\hat{f}\in\mathcal{F}_{\ell} and concatenation-closure implies that (g,,g):𝒢s(g,\ldots,g):\mathcal{G}_{s}\to\mathbb{R}^{\ell} is in \mathcal{F}_{\ell} too. Hence, g𝒮g\in\mathcal{S}, by definition. It now suffices to observe that g(G,𝒗)=h𝜶(f^(G,𝒗))=ab=h𝜶(f^(H,𝒘))=g(H,𝒘)g(G,{\bm{v}})=h_{\boldsymbol{\alpha}}(\hat{f}(G,{\bm{v}}))=a^{\prime}\neq b^{\prime}=h_{\boldsymbol{\alpha}}(\hat{f}(H,{\bm{w}}))=g(H,{\bm{w}}), and thus (G,𝒗)(G,{\bm{v}}) and (H,𝒘)(H,{\bm{w}}) are not in ρs(𝒮)\rho_{s}(\mathcal{S}), as desired. ∎

When we know more about ρs()\rho_{s}(\mathcal{F}_{\ell}) we can say a bit more. In the following, we let 𝖺𝗅𝗀{𝖼𝗋(t),𝗀𝖼𝗋(t),𝗏𝗐𝗅k(t),𝗀𝗐𝗅k()}\mathsf{alg}\in\{\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}},\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}},\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}},\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\} and only consider the setting where ss is either 0 (invariant graph functions) or s=1s=1 (equivariant graph/vertex functions). See 6.2

Proof.

This is just a mere restatement of Theorem 6.1 in which ρs()\rho_{s}(\mathcal{F}_{\ell}) in the condition ρs()ρs(f)\rho_{s}(\mathcal{F}_{\ell})\subseteq\rho_{s}(f) is replaced by ρs(𝖺𝗅𝗀)\rho_{s}(\mathsf{alg}), where s=1s=1 for 𝖺𝗅𝗀{𝖼𝗋(t),𝗏𝗐𝗅k(t)}\mathsf{alg}\in\{\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}},\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\} and s=0s=0 for 𝖺𝗅𝗀{𝗀𝖼𝗋(t),𝗀𝗐𝗅k()}\mathsf{alg}\in\{\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}},\allowbreak\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\}. ∎

To relate all this to functions representable by tensor languages, we make the following observations. First, if we consider \mathcal{F} to be the set of all functions that can be represented in 𝖦𝖳𝖫(t)(Ω)\mathsf{GTL}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega), 𝖳𝖫2(t+1)(Ω)\mathsf{TL}_{2}^{\!\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}(\Omega), 𝖳𝖫k+1(t)(Ω)\mathsf{TL}_{k+1}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega) or 𝖳𝖫(Ω)\mathsf{TL}(\Omega), then \mathcal{F} will be automatically concatenation and function-closed, provided that Ω\Omega consists of all functions in p𝒞(p,)\textstyle\bigcup_{p}\mathcal{C}(\mathbb{R}^{p},\mathbb{R}^{\ell}). Hence, Theorem 6.1 applies. Furthermore, our results from Section 4 tell us that for all t0t\geq 0, and k1k\geq 1, ρ1(𝖼𝗋(t))=ρ1(𝖦𝖳𝖫(t)(Ω))\rho_{1}\bigl{(}\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{1}\bigl{(}\mathsf{GTL}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}, ρ0(𝗀𝖼𝗋(t))=ρ0(𝖳𝖫2(t+1)(Ω))=ρ0(𝗀𝗐𝗅1(t))\rho_{0}\bigl{(}\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{0}\bigl{(}\mathsf{TL}_{2}^{\!\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}(\Omega)\bigr{)}=\rho_{0}\bigl{(}\mathsf{gwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}, ρ1(𝖾𝗐𝗅k(t))=ρ1(𝖳𝖫k+1(t)(Ω))\rho_{1}\bigl{(}\mathsf{ewl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{1}\bigl{(}\mathsf{TL}_{k+1}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}, and ρ0(𝖳𝖫k+1(Ω))=ρ0(𝗀𝗐𝗅k())\rho_{0}\bigl{(}\mathsf{TL}_{k+1}(\Omega)\bigr{)}=\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\bigr{)}. As a consequence, Corollary 6.2 applies as well. We thus easily obtain the following characterizations:

Proposition F.2.

For any t0t\geq 0 and k1k\geq 1:

  • If \mathcal{F} consists of all functions representable in 𝖦𝖳𝖫(t)(Ω)\mathsf{GTL}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega), then ¯={f:𝒢1ρ1(𝖼𝗋(t))ρ1(f)}\overline{\mathcal{F}_{\ell}}=\{f:\mathcal{G}_{1}\to\mathbb{R}^{\ell}\mid\rho_{1}\bigl{(}\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}(f)\};

  • If \mathcal{F} consists of all functions representable in 𝖳𝖫k+1(t)(Ω)\mathsf{TL}_{k+1}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega), then ¯={f:𝒢1ρ1(𝗏𝗐𝗅k(t))ρ1(f)}\overline{\mathcal{F}_{\ell}}=\{f:\mathcal{G}_{1}\to\mathbb{R}^{\ell}\mid\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}(f)\};

  • If \mathcal{F} consists of all functions representable in 𝖳𝖫2(t+1)(Ω)\mathsf{TL}_{2}^{\!\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}(\Omega), then ¯={f:𝒢0ρ0(𝗀𝗐𝗅1(t))ρ0(f)}\overline{\mathcal{F}_{\ell}}=\{f:\mathcal{G}_{0}\to\mathbb{R}^{\ell}\mid\rho_{0}\bigl{(}\mathsf{gwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{0}(f)\}; and finally,

  • If \mathcal{F} consists of all functions representable in 𝖳𝖫k+1(Ω)\mathsf{TL}_{k+1}(\Omega), then ¯={f:𝒢0ρ0(𝗀𝗐𝗅k())ρ0(f)}\overline{\mathcal{F}_{\ell}}=\{f:\mathcal{G}_{0}\to\mathbb{R}^{\ell}\mid\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{0}(f)\},

provided that Ω\Omega consists of all functions in p𝒞(p,)\textstyle\bigcup_{p}\mathcal{C}(\mathbb{R}^{p},\mathbb{R}^{\ell}).

In fact, Lemma 32 in Azizian & Lelarge (2021) implies that we can equivalently populate Ω\Omega with all 𝖬𝖫𝖯s\mathsf{MLP}\text{s} instead of all continuous functions. We can thus use 𝖬𝖫𝖯s\mathsf{MLP}\text{s} and continuous functions interchangeably when considering the closure of functions.

At this point, we want to make a comparison with the results and techniques in Azizian & Lelarge (2021). Our proof strategy is very similar and is also based on Theorem F.1. The key distinguishing feature is that we consider functions f:𝒢sff:\mathcal{G}_{s}\to\mathbb{R}^{\ell_{f}} instead of functions from graphs alone. This has as great advantage that no separate proofs are needed to deal with invariant or equivariant functions. Equivariance incurs quite some complexity in the setting considered in Azizian & Lelarge (2021). A second major difference is that, by considering functions representable in tensor languages, and based on our results from Section 4, we obtain a more fine-grained characterization. Indeed, we obtain characterizations in terms of the number of rounds used in 𝖢𝖱\mathsf{CR} and k-𝖶𝖫k\text{-}\mathsf{WL}. In Azizian & Lelarge (2021), tt is always set to \infty, that is, an unbounded number of rounds is considered. Furthermore, when it concerns functions f:𝒢1ff:\mathcal{G}_{1}\to\mathbb{R}^{\ell_{f}}, we recall that 𝖢𝖱\mathsf{CR} is different from 1-𝖶𝖫1\text{-}\mathsf{WL}. Only 1-𝖶𝖫1\text{-}\mathsf{WL} is considered in Azizian & Lelarge (2021). Finally, another difference is that we define the equivariant version 𝗏𝗐𝗅k\mathsf{vwl}_{k} in a different way than is done in Azizian & Lelarge (2021), because in this way, a tighter connection to logics and tensor languages can be made. In fact, if we were to use the equivariant version of k-𝖶𝖫k\text{-}\mathsf{WL} from Azizian & Lelarge (2021), then we necessarily have to consider an unbounded number of rounds (similarly as in our 𝗀𝗐𝗅k\mathsf{gwl}_{k} case).

We conclude this section by providing a little more detail about the consequences of the above results for 𝖦𝖭𝖭s\mathsf{GNN}\text{s}. As we already mentioned in Section 6.2, many common 𝖦𝖭𝖭\mathsf{GNN} architectures are concatenation and function-closed (using 𝖬𝖫𝖯s\mathsf{MLP}\text{s} instead of continuous functions). This holds, for example, for the classes 𝖦𝖨𝖭(t)\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}, 𝖾𝖦𝖨𝖭(t)\mathsf{e}\mathsf{GIN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{\ell}, k-𝖥𝖦𝖭𝖭(t)k\text{-}\mathsf{FGNN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}, k-𝖦𝖨𝖭(t)k\text{-}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}, and k-𝖨𝖦𝖭(t)k\text{-}\mathsf{IGN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}, as described in Section 5 and further detailed in Sections D and E. Here, the subscript \ell refers to the dimension of the embedding space.

We now consider a function ff that is not more separating than 𝖼𝗋(t)\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} (respectively, 𝗀𝖼𝗋(t)\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}, 𝗏𝗐𝗅(t)k\mathsf{vwl}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{k} or 𝗀𝗐𝗅()k\mathsf{gwl}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}_{k}, for some k1k\geq 1), and want to know whether ff can be approximated by a class of 𝖦𝖭𝖭s\mathsf{GNN}\text{s}. Proposition F.2 tells us that such ff can be approximated by a class of 𝖦𝖭𝖭s\mathsf{GNN}\text{s} as long as these are at least as separating as 𝖦𝖳𝖫(t)\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} (respectively, 𝖳𝖫2(t+1)\mathsf{TL}_{2}^{\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}, 𝖳𝖫k+1(t)\mathsf{TL}_{k+1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} or 𝖳𝖫k+1()\mathsf{TL}_{k+1}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}). This, in turn, amounts to showing that the 𝖦𝖭𝖭s\mathsf{GNN}\text{s} can be represented in the corresponding tensor language fragment, and that they can match the corresponding labeling algorithm in separation power. We illustrate this for the 𝖦𝖭𝖭\mathsf{GNN} architectures mentioned above.

  • In Section 5 we showed that 𝖦𝖨𝖭(t)\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} can be represented in 𝖦𝖳𝖫(t)(Ω)\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega). Theorem 4.3 then implies that ρ1(𝖼𝗋(t))ρ1(𝖦𝖨𝖭(t))\rho_{1}(\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}). Furthermore, Xu et al. (2019) showed that ρ1(𝖦𝖨𝖭(t))ρ1(𝖼𝗋(t))\rho_{1}(\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}). As a consequence, ρ1(𝖦𝖨𝖭(t))=ρ1(𝖼𝗋(t))\rho_{1}(\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})=\rho_{1}(\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}). We note that the lower bound for 𝖦𝖨𝖭s\mathsf{GIN}\text{s} only holds when graphs carry discrete labels. The same restriction is imposed in Azizian & Lelarge (2021).

  • In Section 5 we showed that 𝖾𝖦𝖨𝖭(t)\mathsf{e}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} can be represented in 𝖳𝖫2(t)(Ω)\mathsf{TL}_{2}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega). Theorem 4.2 then implies that ρ1(𝗏𝗐𝗅1(t))ρ1(𝖾𝖦𝖨𝖭(t))\rho_{1}(\mathsf{vwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(\mathsf{e}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}). Furthermore, Barceló et al. (2020) showed that ρ1(𝖾𝖦𝖨𝖭(t))ρ1(𝗏𝗐𝗅1(t))\rho_{1}(\mathsf{e}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(\mathsf{vwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}). As a consequence, ρ1(𝖾𝖦𝖨𝖭(t))=ρ1(𝗏𝗐𝗅1(t))\rho_{1}(\mathsf{e}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})=\rho_{1}(\mathsf{vwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}). Again, the lower bound is only valid when graphs carry discrete labels.

  • In Section 5 we mentioned (see details in Section D) that k-𝖥𝖦𝖭𝖭(t)k\text{-}\mathsf{FGNN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} can be represented in 𝖳𝖫k+1(t)(Ω)\mathsf{TL}_{k+1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega). Theorem 4.2 then implies that ρ1(𝗏𝗐𝗅k(t))ρ1(k-𝖥𝖦𝖭𝖭(t))\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(k\text{-}\mathsf{FGNN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}). Furthermore, Maron et al. (2019b) showed that ρ1(k-𝖥𝖦𝖭𝖭(t))ρ1(𝗏𝗐𝗅k(t))\rho_{1}(k\text{-}\mathsf{FGNN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}). As a consequence, ρ1(k-𝖥𝖦𝖭𝖭(t))=ρ1(𝗏𝗐𝗅k(t))\rho_{1}(k\text{-}\mathsf{FGNN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})=\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}). Similarly, ρ1((k+1)-𝖦𝖨𝖭(t))=ρ1(𝗏𝗐𝗅k(t))\rho_{1}((k+1)\text{-}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})=\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}) for the special class of (k+1)-𝖨𝖦𝖭s(k+1)\text{-}\mathsf{IGN}\text{s} described in Section E. No restrictions are in place for the lower bounds and hence real-valued vertex-labelled graphs can be considered.

  • When 𝖦𝖨𝖭(t)\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} or 𝖾𝖦𝖨𝖭(t)\mathsf{e}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} are extended with a readout layer, we showed in Section 5 that these can be represented in 𝖳𝖫2(t+1)(Ω)\mathsf{TL}_{2}^{\!\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}(\Omega). Theorem 4.4 and the results by Xu et al. (2019) and Barceló et al. (2020) then imply that ρ0(𝗏𝗐𝗅1(t))\rho_{0}(\mathsf{vwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}) and ρ0(𝗀𝖼𝗋(t))\rho_{0}(\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}) coincide with the separation power of these architectures with a readout layer. Here again, discrete labels need to be considered.

  • Similarly, when k-𝖥𝖦𝖭𝖭k\text{-}\mathsf{FGNN} or (k+1)-𝖨𝖦𝖭s(k+1)\text{-}\mathsf{IGN}\text{s} are used for graph embeddings, we can represent these in 𝖳𝖫k+1(Ω)\mathsf{TL}_{k+1}(\Omega), which again implies that their separation power coincides with that of 𝗀𝗐𝗅k()\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}. Again, no restrictions are in place on the vertex labels.

So for all these architectures, Corollary 6.2 applies and we can characterize the closures of these architectures in terms of functions that are not more separating than their corresponding versions of 𝖼𝗋\mathsf{cr} or k-𝖶𝖫k\text{-}\mathsf{WL}, as described in the main paper. In summary:

Proposition F.3.

For any t0t\geq 0:

𝖦𝖨𝖭(t)¯\displaystyle\overline{\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}} ={f:𝒢1ρ1(𝖼𝗋(t))ρ1(f)}=𝖦𝖳𝖫(t)(Ω)¯\displaystyle=\{f:\mathcal{G}_{1}\to\mathbb{R}^{\ell}\mid\rho_{1}(\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(f)\}=\overline{\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)_{\ell}}
𝖾𝖦𝖨𝖭(t)¯\displaystyle\overline{\mathsf{e}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}} ={f:𝒢1ρ1(𝗏𝗐𝗅1(t))ρ1(f)}=𝖳𝖫2(t)(Ω)¯\displaystyle=\{f:\mathcal{G}_{1}\to\mathbb{R}^{\ell}\mid\rho_{1}(\mathsf{vwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(f)\}=\overline{\mathsf{TL}_{2}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)_{\ell}}
and when extended with a readout layer:
𝖦𝖨𝖭(t)¯=𝖾𝖦𝖨𝖭(t)¯\displaystyle\overline{\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}}=\overline{\mathsf{e}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}} ={f:𝒢0ρ0(𝗀𝗐𝗅1(t))ρ0(f)}=𝖳𝖫2(t+1)(Ω)¯.\displaystyle=\{f:\mathcal{G}_{0}\to\mathbb{R}^{\ell}\mid\rho_{0}(\mathsf{gwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{0}(f)\}=\overline{\mathsf{TL}_{2}^{\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}(\Omega)_{\ell}}.
Furthermore, for any k1k\geq 1
k-𝖥𝖦𝖭𝖭(t)¯=k-𝖦𝖨𝖭(t)¯\displaystyle\overline{k\text{-}\mathsf{FGNN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}}=\overline{k\text{-}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}} ={f:𝒢1ρ1(𝗏𝗐𝗅k(t))ρ1(f)}=𝖳𝖫k+1(t)(Ω)¯\displaystyle=\{f:\mathcal{G}_{1}\to\mathbb{R}^{\ell}\mid\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(f)\}=\overline{\mathsf{TL}_{k+1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)_{\ell}}
(k+1)-𝖨𝖦𝖭¯\displaystyle\overline{(k+1)\text{-}\mathsf{IGN}_{\ell}} ={f:𝒢1ρ1(𝗏𝗐𝗅k())ρ1(f)}=𝖳𝖫k+1(Ω)¯\displaystyle=\{f:\mathcal{G}_{1}\to\mathbb{R}^{\ell}\mid\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}})\subseteq\rho_{1}(f)\}=\overline{\mathsf{TL}_{k+1}(\Omega)_{\ell}}
and when converted into graph embeddings:
k-𝖥𝖦𝖭𝖭¯=k-𝖦𝖨𝖭¯\displaystyle\overline{k\text{-}\mathsf{FGNN}_{\ell}}=\overline{k\text{-}\mathsf{GIN}_{\ell}} =(k+1)-𝖨𝖦𝖭¯={f:𝒢0ρ0(𝗀𝗐𝗅k())ρ0(f)}=𝖳𝖫k+1(Ω)¯,\displaystyle=\overline{(k+1)\text{-}\mathsf{IGN}_{\ell}}=\{f:\mathcal{G}_{0}\to\mathbb{R}^{\ell}\mid\rho_{0}(\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}})\subseteq\rho_{0}(f)\}=\overline{\mathsf{TL}_{k+1}(\Omega)_{\ell}},

where the closures of the tensor languages are interpreted as the closures of the graph or vertex functions that they can represent. For results involving 𝖦𝖨𝖭s\mathsf{GIN}\text{s} or 𝖾𝖦𝖨𝖭s\mathsf{e}\mathsf{GIN}\text{s}, the graphs considered should have discretely labeled vertices.

As a side note, we remark that in order to simulate 𝖢𝖱\mathsf{CR} on graphs with real-valued labels, one can use a 𝖦𝖭𝖭\mathsf{GNN} architecture of the form 𝑭(t)v:=(𝑭v:(t1),uNG(v)𝗆𝗅𝗉(𝑭u:(t1))){\bm{F}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{v:}=\bigl{(}{\bm{F}}_{v:}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}},\sum_{u\in N_{G}(v)}\mathsf{mlp}({\bm{F}}_{u:}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}})\bigr{)}, which translates into 𝖦𝖳𝖫(t)(Ω)\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega) expressions of the form

φj(t)(x1):={φj(t1)(x1)1jdt1x2E(x1,x2)𝗆𝗅𝗉j(φ1(t1)(x2),,φdt1(t1)(x2))dt1<jdt.\varphi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1}):=\begin{cases}\varphi_{j}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1})&1\leq j\leq d_{t-1}\\ \textstyle\sum_{x_{2}}E(x_{1},x_{2})\cdot\mathsf{mlp}_{j}\bigl{(}\varphi_{1}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{2}),\ldots,\varphi_{{d_{t-1}}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{2})\bigr{)}&d_{t-1}<j\leq d_{t}.\end{cases}

The upper bound in terms of 𝖢𝖱\mathsf{CR} follows from our main results. To show that 𝖢𝖱\mathsf{CR} can be simulated, it suffices to observe that one can approximate the function used in Proposition 1 in Maron et al. (2019b) to injectively encode multisets of real vectors by means of 𝖬𝖫𝖯s\mathsf{MLP}\text{s}. As such, a continuous version of the first bullet in the previous proposition can be obtained.
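To make this concrete, the following is a minimal NumPy sketch (our own illustration under the stated assumptions, not the paper's implementation) of one such layer, with summation as the aggregation and a toy fixed-weight two-layer perceptron standing in for 𝗆𝗅𝗉\mathsf{mlp}:

```python
import numpy as np

def toy_mlp(x, W1, W2):
    """Illustrative two-layer perceptron with ReLU; the weights are hypothetical."""
    return np.maximum(W1 @ x, 0.0) @ W2

def cr_layer(adj, feats, W1, W2):
    """One layer F^(t)_v = ( F^(t-1)_v , sum_{u in N(v)} mlp(F^(t-1)_u) )."""
    n = feats.shape[0]
    msgs = np.zeros((n, W2.shape[1]))
    for v in range(n):
        for u in np.flatnonzero(adj[v]):          # neighbours of v
            msgs[v] += toy_mlp(feats[u], W1, W2)
    return np.concatenate([feats, msgs], axis=1)  # keep old features, append the aggregate

# Usage on a 4-cycle with 1-dimensional real-valued vertex labels.
adj = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)
feats = np.array([[0.1], [0.2], [0.1], [0.2]])
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 1)), rng.normal(size=(8, 4))
print(cr_layer(adj, feats, W1, W2).shape)         # (4, 5): old dimension 1 plus new dimension 4
```

The layer keeps the previous embedding and appends the summed neighbour messages, mirroring the displayed 𝖦𝖳𝖫(t)(Ω)\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega) expression.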

Appendix G Details on Treewidth and Proposition 4.5

As an extension of our main results in Section 4, we enrich the class of tensor language expressions for which connections to k-𝖶𝖫k\text{-}\mathsf{WL} exist. More precisely, instead of requiring expressions to belong to 𝖳𝖫k+1(Ω)\mathsf{TL}_{k+1}(\Omega), that is, to only use k+1k+1 index variables, we investigate when expressions in 𝖳𝖫(Ω)\mathsf{TL}(\Omega) are semantically equivalent to an expression using k+1k+1 variables. Proposition 4.5 identifies a large class of such expressions, those of treewidth kk. As a consequence, even when representing a 𝖦𝖭𝖭\mathsf{GNN} architecture requires more than k+1k+1 index variables, this number can sometimes be reduced. Our results then imply that the separation power of such an architecture is in fact upper bounded by -𝖶𝖫\ell\text{-}\mathsf{WL} for some <k\ell<k. Stated otherwise, to boost the separation power of 𝖦𝖭𝖭s\mathsf{GNN}\text{s}, the expressions representing the layers of the 𝖦𝖭𝖭s\mathsf{GNN}\text{s} must have large treewidth.

We next introduce some concepts related to treewidth. We here closely follow the exposition given in Abo Khamis et al. (2016) for introducing treewidth by means of variable elimination sequences of hypergraphs.

In this section, we restrict ourselves to summation aggregation.

G.1 Elimination sequences

We first define elimination sequences for hypergraphs. Later on, we show how to associate such hypergraphs to expressions in tensor languages, allowing us to define elimination sequences for tensor language expressions.

With a multi-hypergraph =(𝒱,)\mathcal{H}=(\mathcal{V},\mathcal{E}) we simply mean a multiset \mathcal{E} of subsets of vertices 𝒱\mathcal{V}. An elimination sequence for a hypergraph \mathcal{H} is a vertex ordering σ=v1,,vn\sigma=v_{1},\ldots,v_{n} of the vertices of \mathcal{H}. With such a sequence σ\sigma, we can associate, for j=n,n1,n2,,1j=n,n-1,n-2,\ldots,1, a sequence of nn multi-hypergraphs nσ,n1σ,,1σ\mathcal{H}_{n}^{\sigma},\mathcal{H}_{n-1}^{\sigma},\ldots,\mathcal{H}_{1}^{\sigma} as follows. We define

n\displaystyle\mathcal{H}_{n} :=(𝒱n,n):=\displaystyle:=(\mathcal{V}_{n},\mathcal{E}_{n}):=\mathcal{H}
(vn)\displaystyle\partial(v_{n}) :={FnvnF}\displaystyle:=\{F\in\mathcal{E}_{n}\mid v_{n}\in F\}
Un\displaystyle U_{n} :=F(vn)F.\displaystyle:=\bigcup_{F\in\partial(v_{n})}F.
and for j=n1,n2,,1:j=n-1,n-2,\ldots,1:
𝒱j\displaystyle\mathcal{V}_{j} :={v1,,vj}\displaystyle:=\{v_{1},\ldots,v_{j}\}
j\displaystyle\mathcal{E}_{j} :=(j+1(vj+1)){Uj+1{vj+1}}\displaystyle:=(\mathcal{E}_{j+1}\setminus\partial(v_{j+1}))\cup\{U_{j+1}\setminus\{v_{j+1}\}\}
(vj)\displaystyle\partial(v_{j}) :={FjvjF}\displaystyle:=\{F\in\mathcal{E}_{j}\mid v_{j}\in F\}
Uj\displaystyle U_{j} :=F(vj)F.\displaystyle:=\bigcup_{F\in\partial(v_{j})}F.

The induced width of σ\sigma on \mathcal{H} is defined as maxi[n]|Ui|1\max_{i\in[n]}|U_{i}|-1. We further consider the setting in which \mathcal{H} has some distinguished vertices. As we will see shortly, these distinguished vertices correspond to the free index variables of tensor language expressions. Without loss of generality, we assume that the distinguished vertices are v1,v2,,vfv_{1},v_{2},\ldots,v_{f}. When such distinguished vertices are present, an elimination sequence is just as before, except that the distinguished vertices come first in the sequence. If v1,,vfv_{1},\ldots,v_{f} are the distinguished vertices, then we define the induced width of the sequence as f+maxf+1in|Ui{v1,,vf}|1f+\max_{f+1\leq i\leq n}|U_{i}\setminus\{v_{1},\ldots,v_{f}\}|-1. In other words, we count the number of distinguished vertices and augment it with the induced width of the sequence from vf+1v_{f+1} to vnv_{n}, hereby ignoring the distinguished vertices in the UiU_{i}’s. One could, more generally, also try to reduce the number of free index variables, but we assume that this number is fixed, similarly to how 𝖦𝖭𝖭s\mathsf{GNN}\text{s} operate.
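The following small Python sketch (our own illustration, assuming multi-hypergraphs are encoded as lists of vertex sets; it is not code from the paper) computes the induced width of a given sequence by following the recursion above; distinguished vertices come first in the sequence and are never eliminated:

```python
def induced_width(edges, sigma, distinguished=frozenset()):
    """Induced width of the elimination sequence sigma on the multi-hypergraph `edges`.
    Vertices are eliminated from the last element of sigma down to the first."""
    edges = [frozenset(F) for F in edges]
    dist = set(distinguished)
    width = 0
    for v in reversed(sigma):
        if v in dist:                                   # distinguished vertices are kept
            continue
        boundary = [F for F in edges if v in F]         # the edges containing v
        U = set().union(*boundary) if boundary else {v}
        width = max(width, len(U - dist) - 1)           # |U_i \ {v_1,...,v_f}| - 1
        edges = [F for F in edges if v not in F] + [frozenset(U - {v})]
    return len(dist) + width if dist else width

# The hypergraph of a length-2 path x1 - y1 - y2, with x1 distinguished:
edges = [{"x1", "y1"}, {"y1", "y2"}]
print(induced_width(edges, ["x1", "y1", "y2"], {"x1"}))  # f + max_i |U_i \ {x1}| - 1 = 1 + 2 - 1 = 2
```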

G.2 Conjunctive 𝖳𝖫\mathsf{TL} expressions and treewidth

We start by considering a special form of 𝖳𝖫\mathsf{TL} expressions, which we refer to as conjunctive 𝖳𝖫\mathsf{TL} expressions, in analogy to conjunctive queries in database research and logic. A conjunctive 𝖳𝖫\mathsf{TL} expression is of the form

φ(𝒙)=𝒚ψ(𝒙,𝒚).\varphi({\bm{x}})=\sum_{{\bm{y}}}\psi({\bm{x}},{\bm{y}}).

where 𝒙{\bm{x}} denote the free index variables, 𝒚{\bm{y}} contains all index variables under the scope of a summation, and finally, ψ(𝒙,𝒚)\psi({\bm{x}},{\bm{y}}) is a product of base predicates in 𝖳𝖫\mathsf{TL}. That is, ψ(𝒙,𝒚)\psi({\bm{x}},{\bm{y}}) is a product of E(zi,zj)E(z_{i},z_{j}) and P(zi)P_{\ell}(z_{i}) with zi,zjz_{i},z_{j} variables in 𝒙{\bm{x}} or 𝒚{\bm{y}}. With such a conjunctive 𝖳𝖫\mathsf{TL} expression, one can associate a multi-hypergraph in a canonical way (Abo Khamis et al., 2016). More precisely, given a conjunctive 𝖳𝖫\mathsf{TL} expression φ(𝒙)\varphi({\bm{x}}) we define φ\mathcal{H}_{\varphi} as:

  • 𝒱φ\mathcal{V}_{\varphi} consists of all index variables in 𝒙{\bm{x}} and 𝒚{\bm{y}};

  • φ\mathcal{E}_{\varphi}: for each atomic base predicate τ\tau in ψ\psi we have an edge FτF_{\tau} containing the indices occurring in the predicate; and

  • the vertices corresponding to the free index variables 𝒙{\bm{x}} form the distinguished set of vertices.

We now define an elimination sequence for φ\varphi as an elimination sequence for φ\mathcal{H}_{\varphi} taking the distinguished vertices into account. The following observation ties elimination sequences of φ\varphi to the number of variables needed to express φ\varphi.
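For instance (a small example of our own, not taken from the main text), consider the conjunctive expression φ(x_1) = ∑_{y_1}∑_{y_2} E(x_1,y_1)·E(y_1,y_2)·E(y_2,x_1), which counts the triangles through the vertex assigned to x_1 (each one twice). Its hypergraph \mathcal{H}_{\varphi} has vertices {x_1, y_1, y_2}, one edge per atomic predicate, namely {x_1,y_1}, {y_1,y_2} and {y_2,x_1}, and x_1 as its only distinguished vertex.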

Proposition G.1.

Let φ(𝐱)\varphi({\bm{x}}) be a conjunctive 𝖳𝖫\mathsf{TL} expression for which an elimination sequence of induced width k1k-1 exists. Then φ(𝐱)\varphi({\bm{x}}) is equivalent to an expression φ~(𝐱)\tilde{\varphi}({\bm{x}}) in 𝖳𝖫k\mathsf{TL}_{k}.

Proof.

We show this by induction on the number of vertices in φ\mathcal{H}_{\varphi} which are not distinguished. For the base case, all vertices are distinguished and hence φ(𝒙)\varphi({\bm{x}}) does not contain any summation and is an expression in 𝖳𝖫k\mathsf{TL}_{k} itself.

Suppose that in φ\mathcal{H}_{\varphi} there are pp undistinguished vertices. That is,

φ(𝒙)=y1ypψ(𝒙,𝒚).\varphi({\bm{x}})=\sum_{y_{1}}\cdots\sum_{y_{p}}\psi({\bm{x}},{\bm{y}}).

By assumption, we have an elimination sequence of the undistinguished vertices. Assume that ypy_{p} is the first vertex to be eliminated in this sequence. Let us write

φ(𝒙)\displaystyle\varphi({\bm{x}}) =y1ypψ(𝒙,𝒚)\displaystyle=\sum_{y_{1}}\cdots\sum_{y_{p}}\psi({\bm{x}},{\bm{y}})
=y1yp1ψ1(𝒙,𝒚yp)ypψ2(𝒙,𝒚)\displaystyle=\sum_{y_{1}}\cdots\sum_{y_{p-1}}\psi_{1}({\bm{x}},{\bm{y}}\setminus y_{p})\cdot\sum_{y_{p}}\psi_{2}({\bm{x}},{\bm{y}})

where ψ1\psi_{1} is the product of predicates corresponding to the edges Fφ(yp)F\in\mathcal{E}_{\varphi}\setminus\partial(y_{p}), that is, those not containing ypy_{p}, and ψ2\psi_{2} is the product of all predicates corresponding to the edges F(yp)F\in\partial(y_{p}), that is, those containing the variable ypy_{p}. Note that, because of the induced width of k1k-1, ypψ2(𝒙,𝒚)\sum_{y_{p}}\psi_{2}({\bm{x}},{\bm{y}}) contains all indices in UpU_{p}, which is of size at most kk. We now replace the previous expression with another expression

φ(𝒙)\displaystyle\varphi^{\prime}({\bm{x}}) =y1yp1ψ1(𝒙,𝒚yp)Rp(𝒙,𝒚)\displaystyle=\sum_{y_{1}}\cdots\sum_{y_{p-1}}\psi_{1}({\bm{x}},{\bm{y}}\setminus y_{p})\cdot R_{p}({\bm{x}},{\bm{y}})

where RpR_{p} is regarded as an (|Up|1)(|U_{p}|-1)-ary predicate over the indices in UpypU_{p}\setminus y_{p}. It is now easily verified that φ\mathcal{H}_{\varphi^{\prime}} is the hypergraph p1\mathcal{H}_{p-1} corresponding to the variable ordering σ\sigma. We note that this is a hypergraph over p1p-1 undistinguished vertices. We can apply the induction hypothesis and replace φ(𝒙)\varphi^{\prime}({\bm{x}}) with its equivalent expression φ~(𝒙)\tilde{\varphi}^{\prime}({\bm{x}}) in 𝖳𝖫k\mathsf{TL}_{k}. To obtain the expression φ~(𝒙)\tilde{\varphi}({\bm{x}}) of φ(𝒙)\varphi({\bm{x}}), it now remains to replace the new predicate RpR_{p} with its defining expression. We note again that RpR_{p} contains at most k1k-1 indices, so it will occur in φ~(𝒙)\tilde{\varphi}^{\prime}({\bm{x}}) in the form Rp(𝒙,𝒛)R_{p}({\bm{x}},{\bm{z}}) where |𝒛|k1|{\bm{z}}|\leq k-1. In other words, at least one of the kk index variables does not occur in Rp(𝒙,𝒛)R_{p}({\bm{x}},{\bm{z}}), say zsz_{s}, and we can simply replace Rp(𝒙,𝒛)R_{p}({\bm{x}},{\bm{z}}) by zsψ2(𝒙,𝒛,zs)\sum_{z_{s}}\psi_{2}({\bm{x}},{\bm{z}},z_{s}), that is, by the defining expression of RpR_{p} with ypy_{p} renamed to zsz_{s}. ∎
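As a small worked example of our own (not from the main text), consider φ(x_1) = ∑_{y_1}∑_{y_2}∑_{y_3} E(x_1,y_1)·E(y_1,y_2)·E(y_2,y_3), which counts the walks of length three starting in the vertex assigned to x_1 and, as written, uses four index variables. The elimination sequence x_1, y_1, y_2, y_3 has induced width 1 + 2 − 1 = 2: the largest set U_i∖{x_1} encountered has two elements ({y_2,y_3} when eliminating y_3 and {y_1,y_2} when eliminating y_2). Proposition G.1 thus yields an equivalent expression in \mathsf{TL}_{3}; following the construction in the proof and reusing the bound variable x_2 for the innermost summation, one obtains φ~(x_1) = ∑_{x_2} E(x_1,x_2)·(∑_{x_3} E(x_2,x_3)·(∑_{x_2} E(x_3,x_2))).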

As a consequence, one way of showing that a conjunctive expression φ(𝒙)\varphi({\bm{x}}) in 𝖳𝖫\mathsf{TL} is equivalently expressible in 𝖳𝖫k\mathsf{TL}_{k}, is to find an elimination sequence of induced width k1k-1. This in turn is equivalent to φ\mathcal{H}_{\varphi} having a treewidth of k1k-1, as is shown, e.g., in Abo Khamis et al. (2016). As usual, we define the treewidth of a conjunctive expression φ(𝒙)\varphi({\bm{x}}) in 𝖳𝖫\mathsf{TL} as the treewidth of its associated hypergraph φ\mathcal{H}_{\varphi}.

We recall the definition of treewidth (modified to our setting): A tree decomposition T=(VT,ET,ξT)T=(V_{T},E_{T},\xi_{T}) of φ\mathcal{H}_{\varphi} with ξT:VT2𝒱\xi_{T}:V_{T}\to 2^{\mathcal{V}} is such that

  • For any FF\in\mathcal{E}, there is a tVTt\in V_{T} such that FξT(t)F\subseteq\xi_{T}(t); and

  • For any v𝒱v\in\mathcal{V} corresponding to a non-distinguished index variable, the set {ttVT,vξT(t)}\{t\mid t\in V_{T},v\in\xi_{T}(t)\} is not empty and forms a connected sub-tree of TT.
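For instance (a small illustration of our own), for a hypergraph with edges {x_1,y_1}, {y_1,y_2} and {y_2,x_1} and distinguished vertex x_1, the decomposition consisting of a single node t with ξ_T(t) = {x_1,y_1,y_2} satisfies both conditions.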

The width of a tree decomposition TT is given by maxtVT|ξT(t)|1\max_{t\in V_{T}}|\xi_{T}(t)|-1. Now the treewidth of φ\mathcal{H}_{\varphi}, denoted 𝗍𝗐(φ)\mathsf{tw}(\mathcal{H}_{\varphi}), is the minimum width of any of its tree decompositions, and we write 𝗍𝗐(φ)\mathsf{tw}(\varphi) for the treewidth of φ\mathcal{H}_{\varphi}. Again, similar modifications are used when distinguished vertices are in place. Referring again to Abo Khamis et al. (2016), 𝗍𝗐(φ)=k1\mathsf{tw}(\varphi)=k-1 is equivalent to having a variable elimination sequence for φ\varphi of induced width k1k-1. Hence, combining this observation with Proposition G.1 results in:

Corollary G.2.

Let φ(𝐱)\varphi({\bm{x}}) be a conjunctive 𝖳𝖫\mathsf{TL} expression of treewidth k1k-1. Then φ(𝐱)\varphi({\bm{x}}) is equivalent to an expression φ~(𝐱)\tilde{\varphi}({\bm{x}}) in 𝖳𝖫k\mathsf{TL}_{k}.

That is, we have established Proposition 4.5 for conjunctive 𝖳𝖫\mathsf{TL} expressions. We next lift this to arbitrary 𝖳𝖫(Ω)\mathsf{TL}(\Omega) expressions.

G.3 Arbitrary 𝖳𝖫(Ω)\mathsf{TL}(\Omega) expressions

First, we observe that any expression in 𝖳𝖫\mathsf{TL} can be written as a linear combination of conjunctive expressions. This readily follows from the linearity of the operations in 𝖳𝖫\mathsf{TL} and from the fact that equality and inequality predicates can be eliminated. More specifically, we may assume that φ(𝒙)\varphi({\bm{x}}) in 𝖳𝖫\mathsf{TL} is of the form

αAaαψα(𝒙,𝒚),\sum_{\alpha\in A}a_{\alpha}\psi_{\alpha}({\bm{x}},{\bm{y}}),

with AA a finite set of indices, aαa_{\alpha}\in\mathbb{R}, and ψα(𝒙,𝒚)\psi_{\alpha}({\bm{x}},{\bm{y}}) conjunctive 𝖳𝖫\mathsf{TL} expressions. We now define

𝗍𝗐(φ):=max{𝗍𝗐(ψα)αA}\mathsf{tw}(\varphi):=\max\{\mathsf{tw}(\psi_{\alpha})\mid\alpha\in A\}

for expressions in 𝖳𝖫\mathsf{TL}. To deal with expressions in 𝖳𝖫(Ω)\mathsf{TL}(\Omega) that may contain function applications, we define 𝗍𝗐(φ)\mathsf{tw}(\varphi) as the maximum treewidth of the expressions: (i) φ𝗇𝗈𝖿𝗎𝗇(𝒙)𝖳𝖫\varphi_{\mathsf{nofun}}({\bm{x}})\in\mathsf{TL} obtained by replacing each top-level function application f(φ1,,φp)f(\varphi_{1},\ldots,\varphi_{p}) by a new predicate RfR_{f} with free indices 𝖿𝗋𝖾𝖾(φ1)𝖿𝗋𝖾𝖾(φp)\mathsf{free}(\varphi_{1})\cup\cdots\cup\mathsf{free}(\varphi_{p}); and (ii) all expressions φ1,,φp\varphi_{1},\ldots,\varphi_{p} occurring in a top-level function application f(φ1,,φp)f(\varphi_{1},\ldots,\varphi_{p}) in φ\varphi. We note that these expressions either have no function applications (as in (i)) or have function applications of lower nesting depth (in φ\varphi, as in (ii)(ii)). In other words, applying this definition recursively, we end up with expressions with no function applications, for which treewidth was already defined. With this notion of treewidth at hand, Proposition 4.5 now readily follows.

Appendix H Higher-order MPNNs

We conclude the supplementary material by elaborating on kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} and by relating them to classical 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} (Gilmer et al., 2017). As underlying tensor language we use 𝖳𝖫k+1(Ω,Θ)\mathsf{TL}_{k+1}(\Omega,\Theta) which includes arbitrary functions (Ω\Omega) and aggregation functions (Θ\Theta), as defined in Section C.5.

We recall from Section 3 that kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} refer to the class of embeddings f:𝒢sf:\mathcal{G}_{s}\to\mathbb{R}^{\ell} for some \ell\in\mathbb{N} that can be represented in 𝖳𝖫k+1(Ω,Θ)\mathsf{TL}_{k+1}(\Omega,\Theta). When considering an embedding f:𝒢sf:\mathcal{G}_{s}\to\mathbb{R}^{\ell}, the notion of being represented is defined in terms of the existence of \ell expressions in 𝖳𝖫k+1(Ω,Θ)\mathsf{TL}_{k+1}(\Omega,\Theta), which together provide each of the \ell components of the embedding in \mathbb{R}^{\ell}. We remark, however, that we can alternatively include concatenation in tensor language. As such, we can concatenate \ell separate expressions into a single expression. As a positive side effect, for f:𝒢sf:\mathcal{G}_{s}\to\mathbb{R}^{\ell} to be represented in tensor language, we can then simply define it by requiring the existence of a single expression, rather than \ell separate ones. This results in a slightly more succinct way of reasoning about kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s}.

In order to reason about kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} as a class of embeddings, we can obtain an equivalent definition for the class of kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} by inductively stating how new embeddings are computed out of old embeddings. Let X={x1,,xk+1}X=\{x_{1},\ldots,x_{k+1}\} be a set of k+1k+1 distinct variables. In the following, 𝒗{\bm{v}} denotes a tuple of vertices that has at least as many components as the highest index of variables used in expressions. Intuitively, variable xjx_{j} refers to the jjth component in 𝒗{\bm{v}}. We also denote the image of a graph GG and tuple 𝒗{\bm{v}} by an expression φ\varphi, i.e., the semantics of φ\varphi given GG and 𝒗{\bm{v}}, as φ(G,𝒗)\varphi(G,{\bm{v}}) rather than by [[φ,𝒗]]G[\![\varphi,{\bm{v}}]\!]_{G}. We further simply refer to embeddings rather than expressions.

We first define “atomic” kk-𝖬𝖯𝖭𝖭\mathsf{MPNN} embeddings which extract basic information from the graph GG and the given tuple 𝒗{\bm{v}} of vertices.

  • Label embeddings of the form φ(xi):=𝖯s(xi)\varphi(x_{i}):=\mathsf{P}_{s}(x_{i}), with xiXx_{i}\in X, and defined by φ(G,𝒗):=(𝖼𝗈𝗅G(vi))s\varphi(G,{\bm{v}}):=(\mathsf{col}_{G}(v_{i}))_{s}, are kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s};

  • Edge embeddings of the form φ(xi,xj):=𝖤(xi,xj)\varphi(x_{i},x_{j}):=\mathsf{E}(x_{i},x_{j}), with xi,xjXx_{i},x_{j}\in X, and defined by

    φ(G,𝒗):={1if vivjEG0otherwise,\varphi(G,{\bm{v}}):=\begin{cases}1&\text{if $v_{i}v_{j}\in E_{G}$}\\ 0&\text{otherwise},\end{cases}

    are kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s}; and

  • (Dis-)equality embeddings of the form φ(xi,xj):=𝟏xi𝗈𝗉xj\varphi(x_{i},x_{j}):=\bm{1}_{x_{i}\mathop{\mathsf{op}}x_{j}}, with xi,xjXx_{i},x_{j}\in X, and defined by

    φ(G,𝒗):={1if vi𝗈𝗉vj0otherwise,\varphi(G,{\bm{v}}):=\begin{cases}1&\text{if $v_{i}\mathop{\mathsf{op}}v_{j}$}\\ 0&\text{otherwise},\end{cases}

    are kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s}.

We next inductively define new kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} from “old” kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s}. That is, given kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} φ1(𝒙1),,φ(𝒙)\varphi_{1}({\bm{x}}_{1}),\ldots,\varphi_{\ell}({\bm{x}}_{\ell}), the following are also kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} (a small evaluation sketch is given after this list):

  • Function applications of the form φ(𝒙):=𝐟(φ1(𝒙1),,φ(𝒙))\varphi({\bm{x}}):=\mathbf{f}(\varphi_{1}({\bm{x}}_{1}),\ldots,\varphi_{\ell}({\bm{x}}_{\ell})) are kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s}, where 𝒙=𝒙1𝒙{\bm{x}}={\bm{x}}_{1}\cup\cdots\cup{\bm{x}}_{\ell}, and defined by

    φ(G,𝒗):=𝐟(φ1(G,𝒗|𝒙1),,φ(G,𝒗|𝒙)).\varphi(G,{\bm{v}}):=\mathbf{f}\left(\varphi_{1}(G,{\bm{v}}|_{{\bm{x}}_{1}}),\ldots,\varphi_{\ell}(G,{\bm{v}}|_{{\bm{x}}_{\ell}})\right).

    Here, if φi(G,𝒗|𝒙i)di\varphi_{i}(G,{\bm{v}}|_{{\bm{x}}_{i}})\in\mathbb{R}^{d_{i}}, then 𝐟:d1××dd\mathbf{f}:\mathbb{R}^{d_{1}}\times\cdots\times\mathbb{R}^{d_{\ell}}\to\mathbb{R}^{d} for some dd\in\mathbb{N}. That is, φ\varphi generates an embedding in d\mathbb{R}^{d}. We remark that our function applications include concatenation.

  • Unconditional aggregations of the form φ(𝒙):=𝖺𝗀𝗀xj𝐅(φ1(𝒙,xj))\varphi({\bm{x}}):=\mathsf{agg}_{x_{j}}^{\mathbf{F}}(\varphi_{1}({\bm{x}},x_{j})) are kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s}, where xjXx_{j}\in X and xj𝒙x_{j}\not\in{\bm{x}}, and defined by

    φ(G,𝒗):=𝐅({{φ1(G,v1,,vj1,w,vj+1,,vk)wVG}}).\varphi(G,{\bm{v}}):=\mathbf{F}\bigl{(}\{\!\!\{\varphi_{1}(G,v_{1},\ldots,v_{j-1},w,v_{j+1},\ldots,v_{k})\mid w\in V_{G}\}\!\}\bigr{)}.

    Here, if φ1\varphi_{1} generates an embedding in d1\mathbb{R}^{d_{1}}, then 𝐅\mathbf{F} is an aggregation function assigning to multisets of vectors in d1\mathbb{R}^{d_{1}} a vector in d\mathbb{R}^{d}, for some dd\in\mathbb{N}. So, φ\varphi generates an embedding in d\mathbb{R}^{d}.

  • Conditional aggregations of the form φ(xi):=𝖺𝗀𝗀xj𝐅(φ1(xi,xj)|E(xi,xj))\varphi(x_{i}):=\mathsf{agg}_{x_{j}}^{\mathbf{F}}(\varphi_{1}(x_{i},x_{j})|E(x_{i},x_{j})) are kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s}, with xi,xjXx_{i},x_{j}\in X, and defined by

    φ(G,𝒗):=𝐅({{φ1(G,vi,w)wNG(vi)}}).\varphi(G,{\bm{v}}):=\mathbf{F}\bigl{(}\{\!\!\{\varphi_{1}(G,v_{i},w)\mid w\in N_{G}(v_{i})\}\!\}\bigr{)}.

    As before, if φ1\varphi_{1} generates an embedding in d1\mathbb{R}^{d_{1}}, then 𝐅\mathbf{F} is an aggregation function assigning to multisets of vectors in d1\mathbb{R}^{d_{1}} a vector in d\mathbb{R}^{d}, for some dd\in\mathbb{N}. So again, φ\varphi generates an embedding in d\mathbb{R}^{d}.
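The following Python sketch (our own schematic illustration, with mean as the aggregation function 𝐅\mathbf{F} and concatenation as the combining function 𝐟\mathbf{f}; it is not code from the paper) shows how the atomic embeddings and the constructors above can be evaluated, here for k=1k=1 on a single-vertex tuple:

```python
import numpy as np

def label_emb(G, v):                      # atomic label embedding P_s(x_1)
    return G["labels"][v]

def edge_emb(G, v, w):                    # atomic edge embedding E(x_1, x_2) (shown for completeness)
    return np.array([float(G["adj"][v, w])])

def cond_agg(G, v, phi1):                 # conditional aggregation over x_2, guarded by E(x_1, x_2)
    nbrs = np.flatnonzero(G["adj"][v])
    msgs = [phi1(G, v, w) for w in nbrs]
    return np.mean(msgs, axis=0) if msgs else np.zeros_like(phi1(G, v, v))

def layer(G, v):                          # function application f(phi(x_1), agg_{x_2}(...))
    own = label_emb(G, v)
    agg = cond_agg(G, v, lambda G, u, w: label_emb(G, w))
    return np.concatenate([own, agg])     # here f is simply concatenation

G = {"adj": np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]]),
     "labels": np.array([[1.0], [0.0], [0.0]])}
print([layer(G, v) for v in range(3)])    # embeddings of the three vertices of a small star graph
```

Unconditional aggregation is analogous, except that ww ranges over all of VGV_{G} instead of over the neighbours of the vertex assigned to x1x_{1}.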

As defined in the main paper, we also consider the subclass kk-𝖬𝖯𝖭𝖭s(t)\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} by only considering kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} defined in terms of expressions of aggregation depth at most tt. Our main results, phrased in terms of kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s}, are:

ρ1(𝗏𝗐𝗅k(t))=ρ1(k-𝖬𝖯𝖭𝖭s(t)) and ρ0(𝗀𝗐𝗅k)=ρ0(k-𝖬𝖯𝖭𝖭s).\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})=\rho_{1}(k\text{-}\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\text{ and }\rho_{0}(\mathsf{gwl}_{k})=\rho_{0}(k\text{-}\mathsf{MPNN}\text{s}).

Hence, if the embeddings computed by 𝖦𝖭𝖭s\mathsf{GNN}\text{s} are kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s}, one obtains an upper bound on the separation power in terms of k-𝖶𝖫k\text{-}\mathsf{WL}.

The classical 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} (Gilmer et al., 2017) are a subclass of 11-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} in which no unconditional aggregation can be used and, furthermore, function applications require input embeddings with the same single variable (x1x_{1} or x2x_{2}), and only 𝟏xi=xi\bm{1}_{x_{i}=x_{i}} and 𝟏xixi\bm{1}_{x_{i}\neq x_{i}} are allowed. In other words, they correspond to guarded tensor language expressions (Section 4.2). We denote this class of 11-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} by 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} and by 𝖬𝖯𝖭𝖭s(t)\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}} when restrictions on aggregation depth are in place. And indeed, the classical way of describing 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} as

φ(0)(x1)\displaystyle\varphi^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}(x_{1}) =(P1(x1),,P(x1))\displaystyle=(P_{1}(x_{1}),\ldots,P_{\ell}(x_{1}))
φ(t)(x1)\displaystyle\varphi^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1}) =𝐟(t)(φ(t1)(x1),𝖺𝗀𝗀𝗋x2𝐅(t)(φ(t1)(x1),φ(t1)(x2)|E(x1,x2)))\displaystyle=\mathbf{f}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\Bigl{(}\varphi^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1}),\mathsf{aggr}_{x_{2}}^{\mathbf{F}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}}\bigl{(}\varphi^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1}),\varphi^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{2})|E(x_{1},x_{2})\bigr{)}\Bigr{)}

corresponds to 11-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} that satisfy the above-mentioned restrictions. Without readouts, 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} compute vertex embeddings and hence our results imply

ρ1(𝖼𝗋(t))=ρ1(𝖬𝖯𝖭𝖭s(t)).\rho_{1}(\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})=\rho_{1}(\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}).

Furthermore, 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} with a readout function fall into the category of 11-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s}:

φ:=𝖺𝗀𝗀𝗋x1𝗋𝖾𝖺𝖽𝗈𝗎𝗍(φ(t)(x1))\varphi:=\mathsf{aggr}_{x_{1}}^{\mathsf{readout}}(\varphi^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1}))

where unconditional aggregation is used. Hence,

ρ0(𝗀𝖼𝗋(t))=ρ0(𝗀𝗐𝗅1(t))=ρ0(1-𝖬𝖯𝖭𝖭s(t+1)).\rho_{0}(\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})=\rho_{0}(\mathsf{gwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})=\rho_{0}(1\text{-}\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}).

We thus see that kk-𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} gracefully extend 𝖬𝖯𝖭𝖭s\mathsf{MPNN}\text{s} and can be used for obtaining upper bounds on the separation power of classes of 𝖦𝖭𝖭s\mathsf{GNN}\text{s}.