Expressiveness and Approximation Properties of Graph Neural Networks

Floris Geerts
Department of Computer Science, University of Antwerp, Belgium
floris.geerts@uantwerpen.be
&Juan L. Reutter
School of Engineering, Pontificia Universidad Católica de Chile, Chile & IMFD, Chile
jreutter@ing.puc.cl

Abstract

Characterizing the separation power of graph neural networks ( $\mathsf{GNN}\text{s}$ ) provides an understanding of their limitations for graph learning tasks. Results regarding separation power are, however, usually geared at specific $\mathsf{GNN}$ architectures, and tools for understanding arbitrary $\mathsf{GNN}$ architectures are generally lacking. We provide an elegant way to easily obtain bounds on the separation power of $\mathsf{GNN}\text{s}$ in terms of the Weisfeiler-Leman ( $\mathsf{WL}$ ) tests, which have become the yardstick to measure the separation power of $\mathsf{GNN}\text{s}$ . The crux is to view $\mathsf{GNN}\text{s}$ as expressions in a procedural tensor language describing the computations in the layers of the $\mathsf{GNN}\text{s}$ . Then, by a simple analysis of the obtained expressions, in terms of the number of indexes and the nesting depth of summations, bounds on the separation power in terms of the $\mathsf{WL}$ -tests readily follow. We use tensor language to define Higher-Order Message-Passing Neural Networks (or $k$ - $\mathsf{MPNN}\text{s}$ ), a natural extension of $\mathsf{MPNN}\text{s}$ . Furthermore, the tensor language point of view allows for the derivation of universality results for classes of $\mathsf{GNN}\text{s}$ in a natural way. Our approach provides a toolbox with which $\mathsf{GNN}$ architecture designers can analyze the separation power of their $\mathsf{GNN}\text{s}$ , without needing to know the intricacies of the $\mathsf{WL}$ -tests. We also provide insights in what is needed to boost the separation power of $\mathsf{GNN}\text{s}$ .

1 Introduction

Graph Neural Networks ( $\mathsf{GNN}\text{s}$ ) (Merkwirth & Lengauer, 2005; Scarselli et al., 2009) cover many popular deep learning methods for graph learning tasks (see Hamilton (2020) for a recent overview). These methods typically compute vector embeddings of vertices or graphs by relying on the underlying adjacency information. Invariance (for graph embeddings) and equivariance (for vertex embeddings) of $\mathsf{GNN}\text{s}$ ensure that these methods are oblivious to the precise representation of the graphs.

Separation power.

Our primary focus is on the separation power of $\mathsf{GNN}$ architectures, i.e., on their ability to separate vertices or graphs by means of the computed embeddings. It has become standard to characterize $\mathsf{GNN}$ architectures in terms of the separation power of graph algorithms such as color refinement ( $\mathsf{CR}$ ) and $k$ -dimensional Weisfeiler-Leman tests $(k\text{-}\mathsf{WL})$ , as initiated in Xu et al. (2019) and Morris et al. (2019). Unfortunately, understanding the separation power of any given $\mathsf{GNN}$ architecture requires complex proofs, geared at the specifics of the architecture. We provide a tensor language-based technique to analyze the separation power of general $\mathsf{GNN}\text{s}$ .

Tensor languages.

Matrix query languages (Brijder et al., 2019; Geerts et al., 2021b) are defined to assess the expressive power of linear algebra. Balcilar et al. (2021a) observe that, by casting various $\mathsf{GNN}\text{s}$ into the $\mathsf{MATLANG}$ (Brijder et al., 2019) matrix query language, one can use existing separation results (Geerts, 2021) to obtain upper bounds on the separation power of $\mathsf{GNN}\text{s}$ in terms of $1\text{-}\mathsf{WL}$ and $2\text{-}\mathsf{WL}$ . In this paper, we considerably extend this approach by defining, and studying, a new general-purpose tensor language specifically designed for modeling $\mathsf{GNN}\text{s}$ . As in Balcilar et al. (2021a), our focus on tensor languages allows us to obtain new insights about $\mathsf{GNN}$ architectures. First, since tensor languages can only define invariant and equivariant graph functions, any $\mathsf{GNN}$ that can be cast in our tensor language inherits these desired properties. More importantly, the separation power of our tensor language is as closely related to $\mathsf{CR}$ and $k\text{-}\mathsf{WL}$ as $\mathsf{GNN}\text{s}$ are. Loosely speaking, if tensor language expressions use $k+1$ indices, then their separation power is bounded by $k\text{-}\mathsf{WL}$ . Furthermore, if the maximum nesting of summations in the expression is $t$ , then $t$ rounds of $k\text{-}\mathsf{WL}$ are needed to obtain an upper bound on the separation power. A similar connection is obtained for $\mathsf{CR}$ and a fragment of tensor language that we call “guarded” tensor language.

We thus reduce problem of assessing the separation power of any specific $\mathsf{GNN}$ architecture to the problem of specifying it in our tensor language, analyzing the number of indices used and counting their summation depth. This is usually much easier than dealing with intricacies of $\mathsf{CR}$ and $k\text{-}\mathsf{WL}$ , as casting $\mathsf{GNN}\text{s}$ in our tensor language is often as simple as writing down their layer-based definition. We believe that this provides a nice toolbox for $\mathsf{GNN}$ designers to assess the separation power of their architecture. We use this toolbox to recover known results about the separation power of specific $\mathsf{GNN}$ architectures such as $\mathsf{GIN}\text{s}$ (Xu et al., 2019), $\mathsf{GCN}\text{s}$ (Kipf & Welling, 2017), Folklore $\mathsf{GNN}\text{s}$ (Maron et al., 2019b), $k$ - $\mathsf{GNN}\text{s}$ (Morris et al., 2019), and several others. We also derive new results: we answer an open problem posed by Maron et al. (2019a) by showing that the separation power of Invariant Graph Networks ( $k\text{-}\mathsf{IGN}\text{s})$ , introduced by Maron et al. (2019b), is bounded by $(k-1)\text{-}\mathsf{WL}$ . In addition, we revisit the analysis by Balcilar et al. (2021b) of $\mathsf{ChebNet}$ (Defferrard et al., 2016), and show that $\mathsf{CayleyNet}$ (Levie et al., 2019) is bounded by $2\text{-}\mathsf{WL}$ .

When writing down $\mathsf{GNN}\text{s}$ in our tensor language, the less indices needed, the stronger the bounds in terms of $k\text{-}\mathsf{WL}$ we obtain. After all, $(k-1)\text{-}\mathsf{WL}$ is known to be strictly less separating than $k\text{-}\mathsf{WL}$ (Otto, 2017). Thus, it is important to minimize the number of indices used in tensor language expressions. We connect this number to the notion of treewidth: expressions of treewidth $k$ can be translated into expressions using $k+1$ indices. This corresponds to optimizing expressions, as done in many areas in machine learning, by reordering the summations (a.k.a. variable elimination).

Approximation and universality.

We also consider the ability of $\mathsf{GNN}\text{s}$ to approximate general invariant or equivariant graph functions. Once more, instead of focusing on specific architectures, we use our tensor languages to obtain general approximation results, which naturally translate to universality results for $\mathsf{GNN}\text{s}$ . We show: $(k+1)$ -index tensor language expressions suffice to approximate any (invariant/equivariant) graph function whose separating power is bounded by $k\text{-}\mathsf{WL}$ , and we can further refine this by comparing the number of rounds in $k\text{-}\mathsf{WL}$ with the summation depth of the expressions. These results provide a finer picture than the one obtained by Azizian & Lelarge (2021). Furthermore, focusing on “guarded” tensor expressions yields a similar universality result for $\mathsf{CR}$ , a result that, to our knowledge, was not known before. We also provide the link between approximation results for tensor expressions and $\mathsf{GNN}\text{s}$ , enabling us to transfer our insights into universality properties of $\mathsf{GNN}\text{s}$ . As an example, we show that $k\text{-}\mathsf{IGN}\text{s}$ can approximate any graph function that is less separating than $(k-1)\text{-}\mathsf{WL}$ . This case was left open in Azizian & Lelarge (2021).

In summary, we draw new and interesting connections between tensor languages, $\mathsf{GNN}$ architectures and classic graph algorithms. We provide a general recipe to bound the separation power of $\mathsf{GNN}\text{s}$ , optimize them, and understand their approximation power. We show the usefulness of our method by recovering several recent results, as well as new results, some of them left open in previous work.

Related work.

Separation power has been studied for specific classes of $\mathsf{GNN}\text{s}$ (Morris et al., 2019; Xu et al., 2019; Maron et al., 2019b; Chen et al., 2019; Morris et al., 2020; Azizian & Lelarge, 2021). A first general result concerns the bounds in terms of $\mathsf{CR}$ and $1\text{-}\mathsf{WL}$ of Message-Passing Neural Networks (Gilmer et al., 2017; Morris et al., 2019; Xu et al., 2019). Balcilar et al. (2021a) use the $\mathsf{MATLANG}$ matrix query language to obtain upper bounds on the separation power of various $\mathsf{GNN}\text{s}$ . $\mathsf{MATLANG}$ can only be used to obtain bounds up to $2\text{-}\mathsf{WL}$ and is limited to matrices. Our tensor language is more general and flexible and allows for reasoning over the number of indices, treewidth, and summation depth of expressions. These are all crucial for our main results. The tensor language introduced resembles $\mathsf{sum}$ - $\mathsf{MATLANG}$ (Geerts et al., 2021b), but with the added ability to represent tensors. Neither separation power nor guarded fragments were considered in Geerts et al. (2021b). See Section A in the supplementary material for more details. For universality, Azizian & Lelarge (2021) is closest in spirit. Our approach provides an elegant way to recover and extend their results. Azizian & Lelarge (2021) describe how their work (and hence also ours) encompasses previous works (Keriven & Peyré, 2019; Maron et al., 2019c; Chen et al., 2019). Our results use connections between $k\text{-}\mathsf{WL}$ and logics (Immerman & Lander, 1990; Cai et al., 1992), and $\mathsf{CR}$ and guarded logics (Barceló et al., 2020). The optimization of algebraic computations and the use of treewidth relates to the approaches by Aji & McEliece (2000) and Abo Khamis et al. (2016).

2 Background

We denote sets by $\{\}$ and multisets by $\{\!\{\}\!\}$ . For $n\in\mathbb{N}$ , $n>0$ , $[n]:=\{1,\ldots,n\}$ . Vectors are denoted by ${\bm{v}},{\bm{w}},\ldots$ , matrices by ${\bm{A}},{\bm{B}},\ldots$ , and tensors by ${\bm{\mathsfit{S}}},{\bm{\mathsfit{T}}},\ldots$ . Furthermore, ${v}_{i}$ is the $i$ -th entry of vector ${\bm{v}}$ , ${A}_{ij}$ is the $(i,j)$ -th entry of matrix ${\bm{A}}$ and ${\mathsfit{S}}_{{\bm{i}}}$ denotes the ${\bm{i}}=(i_{1},\ldots,i_{k})$ -th entry of a tensor ${\bm{\mathsfit{S}}}$ . If certain dimensions are unspecified, then this is denoted by a “ $:$ ”. For example, ${\bm{A}}_{i:}$ and ${\bm{A}}_{:j}$ denote the $i$ -th row and $j$ -th column of matrix ${\bm{A}}$ , respectively. Similarly for slices of tensors.

We consider undirected simple graphs $G=(V_{G},E_{G},\mathsf{col}_{G})$ equipped with a vertex-labelling $\mathsf{col}_{G}:V_{G}\to\mathbb{R}^{\ell}$ . We assume that graphs have size $n$ , so $V_{G}$ consists of $n$ vertices and we often identify $V_{G}$ with $[n]$ . For a vertex $v\in V_{G}$ , $N_{G}(v):=\{u\in V_{G}\mid vu\in E_{G}\}$ . We let $\mathcal{G}$ be the set of all graphs of size $n$ and let $\mathcal{G}_{s}$ be the set of pairs $(G,{\bm{v}})$ with $G\in\mathcal{G}$ and ${\bm{v}}\in V_{G}^{s}$ . Note that $\mathcal{G}=\mathcal{G}_{0}$ .

The color refinement algorithm ( $\mathsf{CR})$ (Morgan, 1965) iteratively computes vertex labellings based on neighboring vertices, as follows. For a graph $G$ and vertex $v\in V_{G}$ , $\mathsf{cr}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}(G,v):=\mathsf{col}_{G}(v)$ . Then, for $t\geq 0$ , $\mathsf{cr}^{\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}(G,v):=\left(\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,v),\{\!\{\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,u)\mid u\in N_{G}(v)\}\!\}\right)$ . We collect all vertex labels to obtain a label for the entire graph by defining $\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G):=\{\!\{\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,v)\mid v\in V_{G}\}\!\}$ . The $k$ -dimensional Weisfeiler-Leman algorithm ( $k\text{-}\mathsf{WL}$ ) (Cai et al., 1992) iteratively computes labellings of $k$ -tuples of vertices. For a $k$ -tuple ${\bm{v}}$ , its atomic type in $G$ , denoted by $\mathsf{atp}_{k}(G,{\bm{v}})$ , is a vector in $\mathbb{R}^{2\genfrac{(}{)}{0.0pt}{4}{k}{2}+k\ell}$ . The first $\binom{k}{2}$ entries are $0/1$ -values encoding the equality type of ${\bm{v}}$ , i.e., whether ${v}_{i}={v}_{j}$ for $1\leq i<j\leq k$ . The second $\binom{k}{2}$ entries are $0/1$ -values encoding adjacency information, i.e., whether ${v}_{i}{v}_{j}\in E_{G}$ for $1\leq i<j\leq k$ . The last $k\ell$ real-valued entries correspond to $\mathsf{col}_{G}({v}_{i})\in\mathbb{R}^{\ell}$ for $1\leq i\leq k$ . Initially, for a graph $G$ and ${\bm{v}}\in V_{G}^{k}$ , $k\text{-}\mathsf{WL}$ assigns the label $\mathsf{wl}_{k}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}(G,{\bm{v}}):=\mathsf{atp}_{k}(G,{\bm{v}})$ . For $t\geq 0$ , $k\text{-}\mathsf{WL}$ revises the label according to $\mathsf{wl}_{k}^{\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}(G,{\bm{v}}):=(\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,{\bm{v}}),M)$ with $M:=\left\{\!\left\{\left(\mathsf{atp}_{k+1}(G,{\bm{v}}u),\allowbreak\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,{\bm{v}}[u/1]),\ldots,\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,{\bm{v}}[u/k])\right)\mid u\in V_{G}\right\}\!\right\}$ , where ${\bm{v}}[u/i]:=({v}_{1},\allowbreak\ldots,{v}_{i-1},u,{v}_{i+1},\ldots,{v}_{k})$ . We use $k\text{-}\mathsf{WL}$ to assign labels to vertices and graphs by defining: $\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,v):=\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,(v,\ldots,v))$ , for vertex-labellings, and $\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}:=\{\!\{\mathsf{wl}_{k}^{(t)}(G,{\bm{v}})\mid{\bm{v}}\in V_{G}^{k}\}\!\}$ , for graph-labellings. We use $\mathsf{cr}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}$ , $\mathsf{gcr}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}$ , $\mathsf{vwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}$ , and $\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}$ to denote the stable labellings produced by the corresponding algorithm over an arbitrary number of rounds. Our version of $1\text{-}\mathsf{WL}$ differs from $\mathsf{CR}$ in that $1\text{-}\mathsf{WL}$ also uses information from non-adjacent vertices; this distinction only matters for vertex embeddings (Grohe, 2021). We use the “folklore” $k\text{-}\mathsf{WL}$ of Cai et al. (1992), except Cai et al. use $1\text{-}\mathsf{WL}$ to refer to $\mathsf{CR}$ . While equivalent to “oblivious” $(k+1)\text{-}\mathsf{WL}$ (Grohe, 2021), used in some other works on $\mathsf{GNN}\text{s}$ , care is needed when comparing to our work.

Let $G$ be a graph with $V_{G}=[n]$ and let $\sigma$ be a permutation of $[n]$ . We denote by $\sigma\star G$ the isomorphic copy of $G$ obtained by applying the permutation $\sigma$ . Similarly, for ${\bm{v}}\in V_{G}^{k}$ , $\sigma\star{\bm{v}}$ is the permuted version of ${\bm{v}}$ . Let ${\mathbb{F}}$ be some feature space. A function $f:\mathcal{G}_{0}\to{\mathbb{F}}$ is called invariant if $f(G)=f(\sigma\star G)$ for any permutation $\pi$ . More generally, $f:\mathcal{G}_{s}\to{\mathbb{F}}$ is equivariant if $f(\sigma\star G,\sigma\star{\bm{v}})=f(G,{\bm{v}})$ for any permutation $\sigma$ . The functions $\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}:\mathcal{G}_{1}\to{\mathbb{F}}$ and $\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}:\mathcal{G}_{1}\to{\mathbb{F}}$ are equivariant, whereas $\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}:\mathcal{G}_{0}\to{\mathbb{F}}$ and $\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}:\mathcal{G}_{0}\to{\mathbb{F}}$ are invariant, for any $t\geq 0$ and $k\geq 1$ .

3 Specifying GNNs

Many $\mathsf{GNN}\text{s}$ use linear algebra computations on vectors, matrices or tensors, interleaved with the application of activation functions or $\mathsf{MLP}\text{s}$ . To understand the separation power of $\mathsf{GNN}\text{s}$ , we introduce a specification language, $\mathsf{TL}$ , for tensor language, that allows us to specify any algebraic computation in a procedural way by explicitly stating how each entry is to be computed. We gauge the separation power of $\mathsf{GNN}\text{s}$ by specifying them as $\mathsf{TL}$ expressions, and syntactically analyzing the components of such $\mathsf{TL}$ expressions. This technique gives rise to Higher-Order Message-Passing Neural Networks (or $k$ - $\mathsf{MPNN}\text{s}$ ), a natural extension of $\mathsf{MPNN}\text{s}$ (Gilmer et al., 2017). For simplicity, we present $\mathsf{TL}$ using summation aggregation only but arbitrary aggregation functions on multisets of real values can be used as well (Section C.5 in the supplementary material).

To introduce $\mathsf{TL}$ , consider a typical layer in a $\mathsf{GNN}$ of the form ${\bm{F}}^{\prime}=\sigma({\bm{A}}\cdot{\bm{F}}\cdot{\bm{W}})$ , where ${\bm{A}}\in\mathbb{R}^{n\times n}$ is an adjacency matrix, ${\bm{F}}\in\mathbb{R}^{n\times\ell}$ are vertex features such that ${\bm{F}}_{i:}\in\mathbb{R}^{\ell}$ is the feature vector of vertex $i$ , $\sigma$ is a non-linear activation function, and ${\bm{W}}\in\mathbb{R}^{\ell\times\ell}$ is a weight matrix. By exposing the indices in the matrices and vectors we can equivalently write: for $i\in[n]$ and $s\in[\ell]$ :

{F}_{is}^{\prime}:=\sigma\Bigl{(}\textstyle\sum_{j\in[n]}{A}_{ij}\cdot\bigl{(}\textstyle\sum_{t\in[\ell]}{W}_{ts}\cdot{F}_{jt}\bigr{)}\Bigr{)}.

In $\mathsf{TL}$ , we do not work with specific matrices or indices ranging over $[n]$ , but focus instead on expressions applicable to any matrix. We use index variables $x_{1}$ and $x_{2}$ instead of $i$ and $j$ , replace ${A}_{ij}$ with a placeholder $E(x_{1},x_{2})$ and ${F}_{jt}$ with placeholders $P_{t}(x_{2})$ , for $t\in[\ell]$ . We then represent the above computation in $\mathsf{TL}$ by $\ell$ expressions $\psi_{s}(x_{1})$ , one for each feature column, as follows:

\psi_{s}(x_{1})=\sigma\Bigl{(}\textstyle\sum_{x_{2}}E(x_{1},x_{2})\cdot\bigl{(}\textstyle\sum_{t\in[\ell]}{W}_{ts}\cdot P_{t}(x_{2})\bigr{)}\Bigr{)}.

These are pure syntactical expressions. To give them a semantics, we assign to $E$ a matrix ${\bm{A}}\in\mathbb{R}^{n\times n}$ , to $P_{t}$ column vectors ${\bm{F}}_{:t}\in\mathbb{R}^{n\times 1}$ , for $t\in[\ell]$ , and to $x_{1}$ an index $i\in[n]$ . By letting the variable $x_{2}$ under the summation range over $1,2,\ldots,n$ , the $\mathsf{TL}$ expression $\psi_{s}(i)$ evaluates to ${F}_{is}^{\prime}$ . As such, ${\bm{F}}^{\prime}=\sigma({\bm{A}}\cdot{\bm{F}}\cdot{\bm{W}})$ can be represented as a specific instance of the above $\mathsf{TL}$ expressions. Throughout the paper we reason about expressions in $\mathsf{TL}$ rather than specific instances thereof. Importantly, by showing that certain properties hold for expressions in $\mathsf{TL}$ , these properties are inherited by all of its instances. We use $\mathsf{TL}$ to enable a theoretical analysis of the separating power of $\mathsf{GNN}\text{s}$ ; It is not intended as a practical programming language for $\mathsf{GNN}\text{s}$ .

Syntax.

We first give the syntax of $\mathsf{TL}$ expressions. We have a binary predicate $E$ , to represent adjacency matrices, and unary vertex predicates $P_{s}$ , $s\in[\ell]$ , to represent column vectors encoding the $\ell$ -dimensional vertex labels. In addition, we have a (possibly infinite) set $\Omega$ of functions, such as activation functions or $\mathsf{MLP}\text{s}$ . Then, $\mathsf{TL}(\Omega)$ expressions are defined by the following grammar:

\varphi:=\bm{1}_{x\mathop{\mathsf{op}}y}\,\,|\,\,E(x,y)\,\,|\,\,P_{s}(x)\,\,|\,\,\varphi\cdot\varphi\,\,|\,\,\varphi+\varphi\,\,|\,\,a\cdot\varphi\,\,|\,\,f(\varphi,\ldots,\varphi)\,\,|\,\,\textstyle{\sum}_{x}\,\varphi

where $\mathop{\mathsf{op}}\in\{=,\neq\}$ , $x,y$ are index variables that specify entries in tensors, $s\in[\ell]$ , $a\in\mathbb{R}$ , and $f\in\Omega$ . Summation aggregation is captured by $\sum_{x}\varphi$ .¹¹1We can replace $\sum_{x}\varphi$ by a more general aggregation construct $\mathsf{aggr}_{x}^{F}(\varphi)$ for arbitrary functions $F$ that assign a real value to multisets of real values. We refer to the supplementary material (Section C.5) for details. We sometimes make explicit which functions are used in expressions in $\mathsf{TL}(\Omega)$ by writing $\mathsf{TL}(f_{1},f_{2},\ldots)$ for $f_{1},f_{2},\ldots$ in $\Omega$ . For example, the expressions $\psi_{s}(x_{1})$ described earlier are in $\mathsf{TL}(\sigma)$ .

The set of free index variables of an expression $\varphi$ , denoted by $\mathsf{free}(\varphi)$ , determines the order of the tensor represented by $\varphi$ . It is defined inductively: $\mathsf{free}(\bm{1}_{x\mathop{\mathsf{op}}y})=\mathsf{free}(E(x,y)):=\{x,y\}$ , $\mathsf{free}(P_{s}(x))=\{x\}$ , $\mathsf{free}(\varphi_{1}\cdot\varphi_{2})=\mathsf{free}(\varphi_{1}+\varphi_{2}):=\mathsf{free}(\varphi_{1})\cup\mathsf{free}(\varphi_{2})$ , $\mathsf{free}(a\cdot\varphi_{1}):=\mathsf{free}(\varphi_{1})$ , $\mathsf{free}(f(\varphi_{1},\ldots,\varphi_{p})):=\cup_{i\in[p]}\mathsf{free}(\varphi_{i})$ , and $\mathsf{free}(\sum_{x}\varphi_{1}):=\mathsf{free}(\varphi_{1})\setminus\{x\}$ . We sometimes explicitly write the free indices. In our example expressions $\psi_{s}(x_{1})$ , $x_{1}$ is the free index variable.

An important class of expressions are those that only use index variables $\{x_{1},\ldots,x_{k}\}$ . We denote by $\mathsf{TL}_{k}(\Omega)$ the $k$ -index variable fragment of $\mathsf{TL}(\Omega)$ . The expressions $\psi_{s}(x_{1})$ are in $\mathsf{TL}_{2}(\sigma)$ .

Semantics.

We next define the semantics of expressions in $\mathsf{TL}(\Omega)$ . Let $G=(V_{G},E_{G},\mathsf{col}_{G})$ be a vertex-labelled graph. We start by defining the interpretation $[\![\cdot,\nu]\!]_{G}$ of the predicates $E$ , $P_{s}$ and the (dis)equality predicates, relative to $G$ and a valuation $\nu$ assigning a vertex to each index variable:

\begin{array}[]{rcl@{\hspace*{3.7mm}}rcl}[\![E(x,y),\nu]\!]_{G}&\!\!\!\!:=\!\!\!\!&\mathrm{if~{}}\nu(x)\nu(y)\in E_{G}\text{~{}then~{}}1\text{~{}else~{}}0\hfil\hskip 10.5275pt&[\![P_{s}(x),\nu]\!]_{G}&\!\!\!\!:=\!\!\!\!&\mathsf{col}_{G}(\nu(x))_{s}\in\mathbb{R}\\ [\![\bm{1}_{x\mathop{\mathsf{op}}y},\nu]\!]_{G}&\!\!\!\!:=\!\!\!\!&\mathrm{if~{}}\nu(x)\mathop{\mathsf{op}}\nu(y)\text{~{}then~{}}1\text{~{}else~{}}0.\hfil\hskip 10.5275pt\end{array}

In other words, $E$ is interpreted as the adjacency matrix of $G$ and the $P_{s}$ ’s interpret the vertex-labelling $\mathsf{col}_{G}$ . Furthermore, we lift interpretations to arbitrary expressions in $\mathsf{TL}(\Omega)$ , as follows:

\begin{array}[]{rcl@{\hspace*{3.7mm}}rcl}[\![\varphi_{1}\cdot\varphi_{2},\nu]\!]_{G}&\!\!\!\!:=\!\!\!\!\!&[\![\varphi_{1},\nu]\!]_{G}\cdot[\![\varphi_{2},\nu]\!]_{G}\hfil\hskip 10.5275pt&\!\![\![\varphi_{1}\!+\!\varphi_{2},\nu]\!]_{G}&\!\!\!\!\!\!:=\!\!\!\!\!\!&[\![\varphi_{1},\nu]\!]_{G}+[\![\varphi_{2},\nu]\!]_{G}\\ [\![\sum_{x}\,\varphi_{1},\nu]\!]_{G}&\!\!\!\!:=\!\!\!\!\!&\sum_{v\in V_{G}}[\![\varphi_{1},\nu[x\mapsto v]]\!]_{G}\hfil\hskip 10.5275pt&[\![a\cdot\varphi_{1},\nu]\!]_{G}&\!\!\!\!:=\!\!\!\!&a\cdot[\![\varphi_{1},\nu]\!]_{G}\\ \!\![\![f(\varphi_{1},\ldots,\varphi_{p}),\nu]\!]_{G}&\!\!\!\!:=\!\!\!\!\!\!&f([\![\varphi_{1},\nu]\!]_{G},\ldots,[\![\varphi_{p},\nu]\!]_{G})\hfil\hskip 10.5275pt\end{array}

where, $\nu[x\mapsto v]$ is the valuation $\nu$ but which now maps the index $x$ to the vertex $v\in V_{G}$ . For simplicity, we identify valuations with their images. For example, $[\![\varphi(x),v]\!]_{G}$ denotes $[\![\varphi(x),x\mapsto v]\!]_{G}$ . To illustrate the semantics, for each $v\in V_{G}$ , our example expressions satisfy $[\![\psi_{s},v]\!]_{G}={F}_{vs}^{\prime}$ for ${\bm{F}}^{\prime}=\sigma({\bm{A}}\cdot{\bm{F}}\cdot{\bm{W}})$ when ${\bm{A}}$ is the adjacency matrix of $G$ and ${\bm{F}}$ represents the vertex labels.

$k$ -MPNNs.

Consider a function $f:\mathcal{G}_{s}\to\mathbb{R}^{\ell}:(G,{\bm{v}})\mapsto f(G,{\bm{v}})\in\mathbb{R}^{\ell}$ for some $\ell\in\mathbb{N}$ . We say that the function $f$ can be represented in $\mathsf{TL}(\Omega)$ if there exists $\ell$ expressions $\varphi_{1}(x_{1},\ldots,x_{s}),\ldots,\varphi_{\ell}(x_{1},\ldots,x_{s})$ in $\mathsf{TL}(\Omega)$ such that for each graph $G$ and each $s$ -tuple ${\bm{v}}\in V_{G}^{s}$ :

f(G,{\bm{v}})=\bigl{(}[\![\varphi_{1},{\bm{v}}]\!]_{G},\ldots,[\![\varphi_{\ell},{\bm{v}}]\!]_{G}\bigr{)}.

Of particular interest are $k$ th-order $\mathsf{MPNN}\text{s}$ (or $k$ - $\mathsf{MPNN}\text{s}$ ) which refers to the class of functions that can be represented in $\mathsf{TL}_{k+1}(\Omega)$ . We can regard $\mathsf{GNN}\text{s}$ as functions $f:\mathcal{G}_{s}\to\mathbb{R}^{\ell}$ . Hence, a $\mathsf{GNN}$ is a $k$ - $\mathsf{MPNN}$ if its corresponding functions are $k$ - $\mathsf{MPNN}\text{s}$ . For example, we can interpret ${\bm{F}}^{\prime}=\sigma({\bm{A}}\cdot{\bm{F}}\cdot{\bm{W}})$ as a function $f:\mathcal{G}_{1}\to\mathbb{R}^{\ell}$ such that $f(G,v):={\bm{F}}_{v:}^{\prime}$ . We have seen that for each $s\in[\ell]$ , $[\![\psi_{s},v]\!]_{G}={F}_{vs}^{\prime}$ with $\psi_{s}\in\mathsf{TL}_{2}(\sigma)$ . Hence, $f(G,v)=([\![\psi_{1},v]\!]_{G},\ldots,[\![\psi_{\ell},v]\!]_{G})$ and thus $f$ belongs to $1$ - $\mathsf{MPNN}\text{s}$ and our example $\mathsf{GNN}$ is a $1$ - $\mathsf{MPNN}$ .

TL represents equivariant or invariant functions.

We make a simple observation which follows from the type of operators allowed in expressions in $\mathsf{TL}(\Omega)$ .

Proposition 3.1.

Any function $f:\mathcal{G}_{s}\to\mathbb{R}^{\ell}$ represented in $\mathsf{TL}(\Omega)$ is equivariant (invariant if $s=0$ ).

An immediate consequence is that when a $\mathsf{GNN}$ is a $k$ - $\mathsf{MPNN}$ , it is automatically invariant or equivariant, depending on whether graph or vertex tuple embeddings are considered.

4 Separation Power of Tensor Languages

Our first main results concern the characterization of the separation power of tensor languages in terms of the color refinement and $k$ -dimensional Weisfeiler-Leman algorithms. We provide a fine-grained characterization by taking the number of rounds of these algorithms into account. This will allow for measuring the separation power of classes of $\mathsf{GNN}\text{s}$ in terms of their number of layers.

4.1 Separation Power

We define the separation power of graph functions in terms of an equivalence relation, based on the definition from Azizian & Lelarge (2021), hereby first focusing on their ability to separate vertices.²²2We differ slightly from Azizian & Lelarge (2021) in that they only define equivalence relations on graphs.

Definition 1.

Let $\mathcal{F}$ be a set of functions $f:\mathcal{G}_{1}\to\mathbb{R}^{\ell_{f}}$ . The equivalence relation $\rho_{1}(\mathcal{F})$ is defined by $\mathcal{F}$ on $\mathcal{G}_{1}$ as follows: $\bigl{(}(G,v),(H,w)\bigr{)}\in\rho_{1}(\mathcal{F})\Longleftrightarrow\forall f\in\mathcal{F},f(G,v)=f(H,w)$ . ∎

In other words, when $((G,v),(H,w))\in\rho_{1}(\mathcal{F})$ , no function in $\mathcal{F}$ can separate $v$ in $G$ from $w$ in $H$ . For example, we can view $\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ and $\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ as functions from $\mathcal{G}_{1}$ to some $\mathbb{R}^{\ell}$ . As such $\rho_{1}(\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ and $\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ measure the separation power of these algorithms. The following strict inclusions are known: for all $k\geq 1$ , $\rho_{1}(\mathsf{vwl}_{k+1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subset\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ and $\rho_{1}(\mathsf{vwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subset\rho_{1}(\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ (Otto, 2017; Grohe, 2021). It is also known that more rounds ( $t$ ) increase the separation power of these algorithms (Fürer, 2001).

For a fragment $\mathcal{L}$ of $\mathsf{TL}(\Omega)$ expressions, we define $\rho_{1}(\mathcal{L})$ as the equivalence relation associated with all functions $f:\mathcal{G}_{1}\to\mathbb{R}^{\ell_{f}}$ that can be represented in $\mathcal{L}$ . By definition, we here thus consider expressions in $\mathsf{TL}(\Omega)$ with one free index variable resulting in vertex embeddings.

4.2 Main Results

We first provide a link between $k\text{-}\mathsf{WL}$ and tensor language expressions using $k+1$ index variables:

Theorem 4.1.

For each $k\geq 1$ and any collection $\Omega$ of functions, $\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\bigr{)}=\rho_{1}\bigl{(}\mathsf{TL}_{k+1}(\Omega)\bigr{)}$ .

This theorem gives us new insights: if we wish to understand how a new $\mathsf{GNN}$ architecture compares against the $k\text{-}\mathsf{WL}$ algorithms, all we need to do is to show that such an architecture can be represented in $\mathsf{TL}_{k+1}(\Omega)$ , i.e., is a $k$ - $\mathsf{MPNN}$ , an arguably much easier endeavor. As an example of how to use this result, it is well known that triangles can be detected by $2\text{-}\mathsf{WL}$ but not by $1\text{-}\mathsf{WL}$ . Thus, in order to design $\mathsf{GNN}\text{s}$ that can detect triangles, layer definitions in $\mathsf{TL}_{3}$ rather than $\mathsf{TL}_{2}$ should be used.

We can do much more, relating the rounds of $k\text{-}\mathsf{WL}$ to the notion of summation depth of $\mathsf{TL}(\Omega)$ expressions. We also present present similar results for functions computing graph embeddings.

The summation depth $\mathsf{sd}(\varphi)$ of a $\mathsf{TL}(\Omega)$ expression $\varphi$ measures the nesting depth of the summations $\sum_{x}$ in the expression. It is defined inductively: $\mathsf{sd}(\bm{1}_{x\mathop{\mathsf{op}}y})=\mathsf{sd}(E(x,y))=\mathsf{sd}(P_{s}(x)):=0$ , $\mathsf{sd}(\varphi_{1}\cdot\varphi_{2})=\mathsf{sd}(\varphi_{1}+\varphi_{2}):=\mathsf{max}\{\mathsf{sd}(\varphi_{1}),\mathsf{sd}(\varphi_{2})\}$ , $\mathsf{sd}(a\cdot\varphi_{1}):=\mathsf{sd}(\varphi_{1})$ , $\mathsf{sd}(f(\varphi_{1},\ldots,\varphi_{p})):=\mathsf{max}\{\mathsf{sd}(\varphi_{i})|i\in[p]\}$ , and $\mathsf{sd}(\sum_{x}\varphi_{1}):=\mathsf{sd}(\varphi_{1})+1$ . For example, expressions $\psi_{s}(x_{1})$ above have summation depth one. We write $\mathsf{TL}_{k+1}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)$ for the class of expressions in $\mathsf{TL}_{k+1}(\Omega)$ of summation depth at most $t$ , and use $k$ - $\mathsf{MPNN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ for the corresponding class of $k$ - $\mathsf{MPNN}\text{s}$ . We can now refine Theorem 4.1, taking into account the number of rounds used in $k\text{-}\mathsf{WL}$ .

Theorem 4.2.

For all $t\geq 0$ , $k\geq 1$ and any collection $\Omega$ of functions, $\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{1}\bigl{(}\mathsf{TL}_{k+1}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}$ .

Guarded TL and color refinement.

As noted by Barceló et al. (2020), the separation power of vertex embeddings of simple $\mathsf{GNN}\text{s}$ , which propagate information only through neighboring vertices, is usually weaker than that of $1\text{-}\mathsf{WL}$ . For these types of architectures, Barceló et al. (2020) provide a relation with the weaker color refinement algorithm, but only in the special case of first-order classifiers. We can recover and extend this result in our general setting, with a guarded version of $\mathsf{TL}$ which, as we will show, has the same separation power as color refinement.

The guarded fragment $\mathsf{GTL}(\Omega)$ of $\mathsf{TL}_{2}(\Omega)$ is inspired by the use of adjacency matrices in simple $\mathsf{GNN}\text{s}$ . In $\mathsf{GTL}(\Omega)$ only equality predicates $\bm{1}_{x_{i}=x_{i}}$ (constant $1$ ) and $\bm{1}_{x_{i}\neq x_{i}}$ (constant $0$ ) are allowed, addition and multiplication require the component expressions to have the same (single) free index, and summation must occur in a guarded form $\sum_{x_{j}}\bigl{(}E(x_{i},x_{j})\cdot\varphi(x_{j})\bigr{)}$ , for $i,j\in[2]$ . Guardedness means that summation only happens over neighbors. In $\mathsf{GTL}(\Omega)$ , all expressions have a single free variable and thus only functions from $\mathcal{G}_{1}$ can be represented. Our example expressions $\psi_{s}(x_{1})$ are guarded. The fragment $\mathsf{GTL}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)$ consists of expressions in $\mathsf{GTL}(\Omega)$ of summation depth at most $t$ . We denote by $\mathsf{MPNN}\text{s}$ and $\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ the corresponding “guarded” classes of $1$ - $\mathsf{MPNN}\text{s}$ .³³3For the connection to classical $\mathsf{MPNN}\text{s}$ (Gilmer et al., 2017), see Section H in the supplementary material.

Theorem 4.3.

For all $t\geq 0$ and any collection $\Omega$ of functions: $\rho_{1}\bigl{(}\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{1}\bigl{(}\mathsf{GTL}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}$ .

As an application of this theorem, to detect the existence of paths of length $t$ , the number of guarded layers in $\mathsf{GNN}\text{s}$ should account for a representation in $\mathsf{GTL}(\Omega)$ of summation depth of at least $t$ . We recall that $\rho_{1}(\mathsf{vwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subset\rho_{1}(\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ which, combined with our previous results, implies that $\mathsf{TL}_{2}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)$ (resp., $1$ - $\mathsf{MPNN}\text{s}$ ) is strictly more separating than $\mathsf{GTL}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)$ (resp., $\mathsf{MPNN}\text{s}$ ).

Graph embeddings.

We next establish connections between the graph versions of $k\text{-}\mathsf{WL}$ and $\mathsf{CR}$ , and $\mathsf{TL}$ expressions without free index variables. To this aim, we use $\rho_{0}(\mathcal{F})$ , for a set $\mathcal{F}$ of functions $f:\mathcal{G}\to\mathbb{R}^{\ell_{f}}$ , as the equivalence relation over $\mathcal{G}$ defined in analogy to $\rho_{1}$ : $(G,H)\in\rho_{0}(\mathcal{F})\Longleftrightarrow\forall f\in\mathcal{F},f(G)=f(H)$ . We thus consider separation power on the graph level. For example, we can consider $\rho_{0}(\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ and $\rho_{0}(\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ for any $t\geq 0$ and $k\geq 1$ . Also here, $\rho_{0}(\mathsf{gwl}_{k+1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subset\rho_{0}(\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ but different from vertex embeddings, $\rho_{0}(\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})=\rho_{0}(\mathsf{gwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ (Grohe, 2021). We define $\rho_{0}(\mathcal{L})$ for a fragment $\mathcal{L}$ of $\mathsf{TL}(\Omega)$ by considering expressions without free index variables.

The connection between the number of index variables in expressions and $k\text{-}\mathsf{WL}$ remains to hold. Apart from $k=1$ , no clean relationship exists between summation depth and rounds, however.⁴⁴4Indeed, the best one can obtain for general tensor logic expressions is $\rho_{0}\bigl{(}\mathsf{TL}_{k+1}^{\scalebox{0.6}{(}t+k\scalebox{0.6}{)}}(\Omega)\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{TL}_{k+1}^{\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}(\Omega)\bigr{)}$ . This follows from Cai et al. (1992) and connections to finite variable logics.

Theorem 4.4.

For all $t\geq 0$ , $k\geq 1$ and any collection $\Omega$ of functions, we have that:

\text{(1)}\quad\rho_{0}\bigl{(}\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{0}\bigl{(}\mathsf{TL}_{2}^{\!\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}(\Omega)\bigr{)}=\rho_{0}\bigl{(}\mathsf{gwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\quad\quad\text{(2)}\quad\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\bigr{)}=\rho_{0}\bigl{(}\mathsf{TL}_{k+1}(\Omega)\bigr{)}.

Intuitively, in (1) the increase in summation depth by one is incurred by the additional aggregation needed to collect all vertex labels computed by $\mathsf{gwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ .

Optimality of number of indices.

Our results so far tell that graph functions represented in $\mathsf{TL}_{k+1}(\Omega)$ are at most as separating as $k\text{-}\mathsf{WL}$ . What is left unaddressed is whether all $k+1$ index variables are needed for the graph functions under consideration. It may well be, for example, that there exists an equivalent expression using less index variables. This would imply a stronger upper bound on the separation power by $\ell\text{-}\mathsf{WL}$ for $\ell<k$ . We next identify a large class of $\mathsf{TL}(\Omega)$ expressions, those of treewidth $k$ , for which the number of index variables can be reduced to $k+1$ .

Proposition 4.5.

Expressions in $\mathsf{TL}(\Omega)$ of treewidth $k$ are equivalent to expressions in $\mathsf{TL}_{k+1}(\Omega)$ .

Treewidth is defined in the supplementary material (Section G) and a treewidth of $k$ implies that the computation of tensor language expressions can be decomposed, by reordering summations, such that each local computation requires at most $k+1$ indices (see also Aji & McEliece (2000)). As a simple example, consider $\theta(x_{1})=\textstyle\sum_{x_{2}}\textstyle\sum_{x_{3}}E(x_{1},x_{2})\cdot E(x_{2},x_{3})$ in $\mathsf{TL}_{3}^{\!\scalebox{0.6}{(}2\scalebox{0.6}{)}}$ such that $[\![\theta,v]\!]_{G}$ counts the number of paths of length two starting from $v$ . This expression has a treewidth of one. And indeed, it is equivalent to the expression $\tilde{\theta}(x_{1})=\sum_{x_{2}}E(x_{1},x_{2})\cdot\bigl{(}\sum_{x_{1}}E(x_{2},x_{1})\bigr{)}$ in $\mathsf{TL}_{2}^{\!\scalebox{0.6}{(}2\scalebox{0.6}{)}}$ (and in fact in $\mathsf{GTL}^{\!\scalebox{0.6}{(}2\scalebox{0.6}{)}}$ ). As a consequence, no more vertices can be separated by $\theta(x_{1})$ than by $\mathsf{cr}^{\scalebox{0.6}{(}2\scalebox{0.6}{)}}$ , rather than $\mathsf{vwl}_{2}^{{2}}$ as the original expression in $\mathsf{TL}_{3}^{\!\scalebox{0.6}{(}2\scalebox{0.6}{)}}$ suggests.

On the impact of functions.

All separation results for $\mathsf{TL}(\Omega)$ and fragments thereof hold irregardless of the chosen functions in $\Omega$ , including when no functions are present at all. Function applications hence do not add expressive power. While this may seem counter-intuitive, it is due to the presence of summation and multiplication in $\mathsf{TL}$ that are enough to separate graphs or vertices.

5 Consequences for GNNs

We next interpret the general results on the separation power from Section 4 in the context of $\mathsf{GNN}\text{s}$ .

1. The separation power of any vertex embedding $\mathsf{GNN}$ architecture which is an $\mathsf{MPNN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ is bounded by the power of $t$ rounds of color refinement.

We consider the Graph Isomorphism Networks ( $\mathsf{GIN}\text{s})$ (Xu et al., 2019) and show that these are $\mathsf{MPNN}\text{s}$ . To do so, we represent them in $\mathsf{GTL}(\Omega)$ . Let $\mathsf{gin}$ be such a network; it updates vertex embeddings as follows. Initially, $\mathsf{gin}^{(0)}:\mathcal{G}_{1}\to\mathbb{R}^{\ell_{0}}:(G,v)\mapsto{\bm{F}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}_{v:}:=\mathsf{col}_{G}(v)\in\mathbb{R}^{\ell_{0}}$ . For layer $t>0$ , $\mathsf{gin}^{(t)}:\mathcal{G}_{1}\to\mathbb{R}^{\ell_{t}}$ is given by: $(G,v)\mapsto{\bm{F}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{v:}:=\mathsf{mlp}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigl{(}{\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{v:},\textstyle\sum_{u\in N_{G}(v)}{\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{u:}\bigr{)}$ , with ${\bm{F}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\in\mathbb{R}^{n\times\ell_{t}}$ and $\mathsf{mlp}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}=(\mathsf{mlp}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{1},\ldots,\mathsf{mlp}_{\ell_{t}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}):\mathbb{R}^{2\ell_{t-1}}\to\mathbb{R}^{\ell_{t}}$ is an $\mathsf{MLP}$ . We denote by $\mathsf{GIN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ the class of $\mathsf{GIN}\text{s}$ consisting $t$ layers. Clearly, $\mathsf{gin}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}$ can be represented in $\mathsf{GTL}^{\!\scalebox{0.6}{(}0\scalebox{0.6}{)}}$ by considering the expressions $\varphi_{i}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}(x_{1}):=P_{i}(x_{1})$ for each $i\in[\ell_{0}]$ . To represent $\mathsf{gin}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ , assume that we have $\ell_{t-1}$ expressions $\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1})$ in $\mathsf{GTL}^{\!\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(\Omega)$ representing $\mathsf{gin}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}$ . That is, we have $[\![\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}},v]\!]_{G}={F}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{vi}$ for each vertex $v$ and $i\in[\ell_{t-1}]$ . Then $\mathsf{gin}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ is represented by $\ell_{t}$ expressions $\varphi_{i}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1})$ defined as:

\mathsf{mlp}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{i}\Bigl{(}\varphi_{1}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1}),\ldots,\varphi_{\ell_{t-1}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1}),\textstyle\sum_{x_{2}}E(x_{1},x_{2})\cdot\varphi_{1}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{2}),\ldots,\textstyle\sum_{x_{2}}E(x_{1},x_{2})\cdot\varphi_{\ell_{t-1}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{2})\Bigr{)},

which are now expressions in $\mathsf{GTL}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)$ where $\Omega$ consists of $\mathsf{MLP}\text{s}$ . We have $[\![\varphi_{i}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}},v]\!]_{G}={F}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{v,i}$ for each $v\in V_{G}$ and $i\in[\ell_{t}]$ , as desired. Hence, Theorem 4.3 tells that $t$ -layered $\mathsf{GIN}\text{s}$ cannot be more separating than $t$ rounds of color refinement, in accordance with known results (Xu et al., 2019; Morris et al., 2019). We thus simply cast $\mathsf{GIN}\text{s}$ in $\mathsf{GTL}(\Omega)$ to obtain an upper bound on their separation power. In the supplementary material (Section D) we give similar analyses for GraphSage $\mathsf{GNN}\text{s}$ with various aggregation functions (Hamilton et al., 2017), $\mathsf{GCN}\text{s}$ (Kipf & Welling, 2017), simplified $\mathsf{GCN}\text{s}$ ( $\mathsf{SGC}\text{s}$ ) (Wu et al., 2019), Principled Neighborbood Aggregation ( $\mathsf{PNA}$ s) (Corso et al., 2020), and revisit the analysis of $\mathsf{ChebNet}$ (Defferrard et al., 2016) given in Balcilar et al. (2021a).

2. The separation power of any vertex embedding $\mathsf{GNN}$ architecture which is an $k$ - $\mathsf{MPNN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ is bounded by the power of $t$ rounds of $k\text{-}\mathsf{WL}$ .

For $k=1$ , we consider extended Graph Isomorphism Networks ( $\mathsf{e}\mathsf{GIN}\text{s}$ ) (Barceló et al., 2020). For an $\mathsf{egin}\in\mathsf{e}\mathsf{GIN}$ , $\mathsf{egin}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}:\mathcal{G}_{1}\to\mathbb{R}^{\ell_{0}}$ is defined as for $\mathsf{GIN}\text{s}$ , but for layer $t>0$ , $\mathsf{egin}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}:\mathcal{G}_{1}\to\mathbb{R}^{\ell_{t}}$ is defined by $(G,v)\mapsto{\bm{F}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{v:}:=\mathsf{mlp}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigl{(}{\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{v:},\textstyle\sum_{u\in N_{G}(v)}{\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{u:},\textstyle\sum_{u\in V_{G}}{\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{u:}\bigr{)}$ , where $\mathsf{mlp}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ is now an $\mathsf{MLP}$ from $\mathbb{R}^{3\ell_{t-1}}\to\mathbb{R}^{\ell_{t}}$ . The difference with $\mathsf{GIN}\text{s}$ is the use of $\sum_{u\in V_{G}}{\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{u:}$ which corresponds to the unguarded summation $\sum_{x_{1}}\varphi^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1})$ . This implies that $\mathsf{TL}$ rather than $\mathsf{GTL}$ needs to be used. In a similar way as for $\mathsf{GIN}\text{s}$ , we can represent $\mathsf{eGIN}$ layers in $\mathsf{TL}_{2}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)$ . That is, each $\mathsf{e}\mathsf{GIN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ is an $1$ - $\mathsf{MPNN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ . Theorem 4.2 tells that $t$ rounds of $1\text{-}\mathsf{WL}$ bound the separation power of $t$ -layered extended $\mathsf{GIN}\text{s}$ , conform to Barceló et al. (2020). More generally, any $\mathsf{GNN}$ looking to go beyond $\mathsf{CR}$ must use non-guarded aggregations.

For $k\geq 2$ , it is straightforward to show that $t$ -layered “folklore” $\mathsf{GNN}\text{s}$ ( $k\text{-}\mathsf{FGNN}\text{s}$ ) (Maron et al., 2019b) are $k$ - $\mathsf{MPNN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ and thus, by Theorem 4.2, $t$ rounds of $k\text{-}\mathsf{WL}$ bound their separation power. One merely needs to cast the layer definitions in $\mathsf{TL}(\Omega)$ and observe that $k+1$ indices and summation depth $t$ are needed. We thus refine and recover the $k\text{-}\mathsf{WL}$ bound for $k\text{-}\mathsf{FGNN}\text{s}$ by Azizian & Lelarge (2021). We also show that the separation power of $(k+1)$ -Invariant Graph Networks ( $(k+1)\text{-}\mathsf{IGN}\text{s})$ (Maron et al., 2019b) are bounded by $k\text{-}\mathsf{WL}$ , albeit with an increase in the required rounds.

Theorem 5.1.

For any $k\geq 1$ , the separation power of a $t$ -layered $(k+1)\text{-}\mathsf{IGN}\text{s}$ is bounded by the separation power of $tk$ rounds of $k\text{-}\mathsf{WL}$ .

We hereby answer open problem 1 in Maron et al. (2019a). The case $k=1$ was solved in Chen et al. (2020) by analyzing properties of $1\text{-}\mathsf{WL}$ . By contrast, Theorem 4.2 shows that one can focus on expressing $(k+1)\text{-}\mathsf{IGN}\text{s}$ in $\mathsf{TL}_{k+1}(\Omega)$ and analyzing the summation depth of expressions. The proof of Theorem 5.1 requires non-trivial manipulations of tensor language expressions; it is a simplified proof of Geerts (2020). The additional rounds ( $tk$ ) are needed because $(k+1)\text{-}\mathsf{IGN}\text{s}$ aggregate information in one layer that becomes accessible to $k\text{-}\mathsf{WL}$ in $k$ rounds. We defer detail to Section E in the supplementary material, where we also identify a simple class of $t$ -layered $(k+1)\text{-}\mathsf{IGN}\text{s}$ that are as powerful as $(k+1)\text{-}\mathsf{IGN}\text{s}$ but whose separation power is bounded by $t$ rounds of $k\text{-}\mathsf{WL}$ .

We also consider “augmented” $\mathsf{GNN}\text{s}$ , which are combined with a preprocessing step in which higher-order graph information is computed. In the supplementary material (Section D.3) we show how $\mathsf{TL}$ encodes the preprocessing step, and how this leads to separation bounds in terms of $k\text{-}\mathsf{WL}$ , where $k$ depends on the treewidth of the graph information used. Finally, our approach can also be used to show that the spectral $\mathsf{CayleyNet}$ s (Levie et al., 2019) are bounded in separation power by $2\text{-}\mathsf{WL}$ . This result complements the spectral analysis of $\mathsf{CayleyNet}$ s given in Balcilar et al. (2021b).

3. The separation power of any graph embedding $\mathsf{GNN}$ architecture which is a $k$ - $\mathsf{MPNN}$ is bounded by the power of $k\text{-}\mathsf{WL}$ .

Graph embedding methods are commonly obtained from vertex (tuple) embeddings methods by including a readout layer in which all vertex (tuple) embeddings are aggregated. For example, $\mathsf{mlp}(\sum_{v\in V}\mathsf{egin}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,v))$ is a typical readout layer for $\mathsf{e}\mathsf{GIN}\text{s}$ . Since $\mathsf{egin}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ can be represented in $\mathsf{TL}_{2}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)$ , the readout layer can be represented in $\mathsf{TL}_{2}^{\!\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}(\Omega)$ , using an extra summation. So they are $1$ - $\mathsf{MPNN}\text{s}$ . Hence, their separation power is bounded by $\mathsf{gwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ , in accordance with Theorem 4.4. This holds more generally. If vertex embedding methods are $k$ - $\mathsf{MPNN}\text{s}$ , then so are their graph versions, which are then bounded by $\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}$ by our Theorem 4.4.

4. To go beyond the separation power of $k\text{-}\mathsf{WL}$ , it is necessary to use $\mathsf{GNN}\text{s}$ whose layers are represented by expressions of treewidth $>k$ .

Hence, to design expressive $\mathsf{GNN}\text{s}$ one needs to define the layers such that treewidth of the resulting $\mathsf{TL}$ expressions is large enough. For example, to go beyond $1\text{-}\mathsf{WL}$ , $\mathsf{TL}_{3}$ representable linear algebra operations should be used. Treewidth also sheds light on the open problem from Maron et al. (2019a) where it was asked whether polynomial layers (in ${\bm{A}}$ ) increase the separation power. Indeed, consider a layer of the form $\sigma({\bm{A}}^{3}\cdot{\bm{F}}\cdot{\bm{W}})$ , which raises the adjacency matrix ${\bm{A}}$ to the power three. Translated in $\mathsf{TL}(\Omega)$ , layer expressions resemble $\sum_{x_{2}}\sum_{x_{3}}\sum_{x_{4}}E(x_{1},x_{2})\cdot E(x_{2},x_{3})\cdot E(x_{3},x_{4})$ , of treewidth one. Proposition 4.5 tells that the layer is bounded by $\mathsf{wl}_{1}^{\scalebox{0.6}{(}3\scalebox{0.6}{)}}$ (and in fact by $\mathsf{cr}^{\scalebox{0.6}{(}3\scalebox{0.6}{)}}$ ) in separation power. If instead, the layer is of the form $\sigma({\bm{C}}\cdot{\bm{F}}\cdot{\bm{W}})$ where ${C}_{ij}$ holds the number of cliques containing the edge $ij$ . Then, in $\mathsf{TL}(\Omega)$ we get expressions containing $\sum_{x_{2}}\sum_{x_{3}}E(x_{1},x_{2})\cdot E(x_{1},x_{3})\cdot E(x_{2},x_{3})$ . The variables form a $3$ -clique resulting in expressions of treewidth two. As a consequence, the separation power will be bounded by $\mathsf{wl}_{2}^{\scalebox{0.6}{(}2\scalebox{0.6}{)}}$ . These examples show that it is not the number of multiplications (in both cases two) that gives power, it is how variables are connected to each other.

6 Function Approximation

We next provide characterizations of functions that can be approximated by $\mathsf{TL}$ expressions, when interpreted as functions. We recover and extend results from Azizian & Lelarge (2021) by taking the number of layers of $\mathsf{GNN}\text{s}$ into account. We also provide new results related to color refinement.

6.1 General TL Approximation Results

We assume that $\mathcal{G}_{s}$ is a compact space by requiring that vertex labels come from a compact set $K\subseteq\mathbb{R}^{\ell_{0}}$ . Let $\mathcal{F}$ be a set of functions $f:\mathcal{G}_{s}\to\mathbb{R}^{\ell_{f}}$ and define its closure $\overline{\mathcal{F}}$ as all functions $h$ from $\mathcal{G}_{s}$ for which there exists a sequence $f_{1},f_{2},\ldots\in\mathcal{F}$ such that $\lim_{i\to\infty}\mathsf{sup}_{G,{\bm{v}}}\|f_{i}(G,{\bm{v}})-h(G,{\bm{v}})\|=0$ for some norm $\|.\|$ . We assume $\mathcal{F}$ to satisfy two properties. First, $\mathcal{F}$ is concatenation-closed: if $f_{1}:\mathcal{G}_{s}\to\mathbb{R}^{p}$ and $f_{2}:\mathcal{G}_{s}\to\mathbb{R}^{q}$ are in $\mathcal{F}$ , then $g:=(f_{1},f_{2}):\mathcal{G}_{s}\to\mathbb{R}^{p+q}:(G,{\bm{v}})\mapsto(f_{1}(G,{\bm{v}}),f_{2}(G,{\bm{v}}))$ is also in $\mathcal{F}$ . Second, $\mathcal{F}$ is function-closed, for a fixed $\ell\in\mathbb{N}$ : for any $f\in\mathcal{F}$ such that $f:\mathcal{G}_{s}\to\mathbb{R}^{p}$ , also $g\circ f:\mathcal{G}_{s}\to\mathbb{R}^{\ell}$ is in $\mathcal{F}$ for any continuous function $g:\mathbb{R}^{p}\to\mathbb{R}^{\ell}$ . For such $\mathcal{F}$ , we let $\mathcal{F}_{\ell}$ be the subset of functions in $\mathcal{F}$ from $\mathcal{G}_{s}$ to $\mathbb{R}^{\ell}$ . Our next result is based on a generalized Stone-Weierstrass Theorem (Timofte, 2005), also used in Azizian & Lelarge (2021).

Theorem 6.1.

For any $\ell$ , and any set $\mathcal{F}$ of functions, concatenation and function closed for $\ell$ , we have: $\overline{\mathcal{F}_{\ell}}=\{f:\mathcal{G}_{s}\to\mathbb{R}^{\ell}\mid\rho_{s}(\mathcal{F})\subseteq\rho_{s}(f)\}$ .

This result gives us insight on which functions can be approximated by, for example, a set $\mathcal{F}$ of functions originating from a class of $\mathsf{GNN}\text{s}$ . In this case, $\overline{\mathcal{F}_{\ell}}$ represent all functions approximated by instances of such a class and Theorem 6.1 tells us that this set corresponds precisely to the set of all functions that are equally or less separating than the $\mathsf{GNN}\text{s}$ in this class. If, in addition, $\mathcal{F}_{\ell}$ is more separating that $\mathsf{CR}$ or $k\text{-}\mathsf{WL}$ , then we can say more. Let $\mathsf{alg}\in\{\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}},\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}},\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}},\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\}$ .

Corollary 6.2.

Under the assumptions of Theorem 6.1 and if $\rho(\mathcal{F}_{\ell})=\rho(\mathsf{alg})$ , then $\overline{\mathcal{F}_{\ell}}=\{f:\mathcal{G}_{s}\to\mathbb{R}^{\ell}\mid\rho(\mathsf{alg})\subseteq\rho(f)\}$ .

The properties of being concatenation and function-closed are satisfied for sets of functions representable in our tensor languages, if $\Omega$ contains all continuous functions $g:\mathbb{R}^{p}\to\mathbb{R}^{\ell}$ , for any $p$ , or alternatively, all $\mathsf{MLP}\text{s}$ (by Lemma 32 in Azizian & Lelarge (2021)). Together with our results in Section 4, the corollary implies that $\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ , $1$ - $\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ , $k$ - $\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ or $k$ - $\mathsf{MPNN}\text{s}$ can approximate all functions with equal or less separation power than $\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ , $\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ , $\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ or $\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}$ , respectively.

Prop. 3.1 also tells that the closure consists of invariant ( $s=0$ ) and equivariant ( $s>0$ ) functions.

6.2 Consequences for GNNs

All our results combined provide a recipe to guarantee that a given function can be approximated by $\mathsf{GNN}$ architectures. Indeed, suppose that your class of $\mathsf{GNN}\text{s}$ is an $\mathsf{MPNN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ (respectively, $1$ - $\mathsf{MPNN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ , $k$ - $\mathsf{MPNN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ or $k$ - $\mathsf{MPNN}$ , for some $k\geq 1$ ). Then, since most classes of $\mathsf{GNN}\text{s}$ are concatenation-closed and allow the application of arbitrary $\mathsf{MLP}\text{s}$ , this implies that your $\mathsf{GNN}\text{s}$ can only approximate functions $f$ that are no more separating than $\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ (respectively, $\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ , $\mathsf{vwl}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{k}$ or $\mathsf{gwl}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}_{k}$ ). To guarantee that that these functions can indeed be approximated, one additionally has to show that your class of $\mathsf{GNN}\text{s}$ matches the corresponding labeling algorithm in separation power.

For example, $\mathsf{GNN}\text{s}$ in $\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ are $\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ , and thus $\overline{\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}}$ contains any function $f:\mathcal{G}_{1}\to\mathbb{R}^{\ell}$ satisfying $\rho_{1}(\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(f)$ . Similarly, $\mathsf{e}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ s are $1$ - $\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ , so $\overline{\mathsf{e}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}}$ contains any function satisfying $\rho_{1}(\mathsf{wl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(f)$ ; and when extended with a readout layer, their closures consist of functions $f:\mathcal{G}_{0}\to\mathbb{R}^{\ell}$ satisfying $\rho_{0}(\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})=\rho_{0}(\mathsf{vwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{0}(f)$ . Finally, $k\text{-}\mathsf{FGNN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ s are $k$ - $\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ , so $\overline{k\text{-}\mathsf{FGNN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}}$ consist of functions $f$ such that $\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(f)$ . We thus recover and extend results by Azizian & Lelarge (2021) by including layer information ( $t$ ) and by treating color refinement separately from $1\text{-}\mathsf{WL}$ for vertex embeddings. Furthermore, Theorem 5.1 implies that $\overline{(k+1)\text{-}\mathsf{IGN}_{\ell}}$ consists of functions $f$ satisfying $\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}})\subseteq\rho_{1}(f)$ and $\rho_{0}(\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}})\subseteq\rho_{0}(f)$ , a case left open in Azizian & Lelarge (2021).

These results follow from Corollary 6.2, that the respective classes of $\mathsf{GNN}\text{s}$ can simulate $\mathsf{CR}$ or $k\text{-}\mathsf{WL}$ on either graphs with discrete (Xu et al., 2019; Barceló et al., 2020) or continuous labels (Maron et al., 2019b), and that they are $k$ - $\mathsf{MPNN}\text{s}$ of the appropriate form.

7 Conclusion

Connecting $\mathsf{GNN}\text{s}$ and tensor languages allows us to use our analysis of tensor languages to understand the separation and approximation power of $\mathsf{GNN}\text{s}$ . The number of indices and summation depth needed to represent the layers in $\mathsf{GNN}\text{s}$ determine their separation power in terms of color refinement and Weisfeiler-Leman tests. The framework of $k$ - $\mathsf{MPNN}\text{s}$ provides a handy toolbox to understand existing and new $\mathsf{GNN}$ architectures, and we demonstrate this by recovering several results about the power of $\mathsf{GNN}\text{s}$ presented recently in the literature, as well as proving new results.

8 Aknowledgements & Disclosure Funding

This work is partially funded by ANID–Millennium Science Initiative Program–Code ICN17_002, Chile.

References

Abo Khamis et al. (2016) Mahmoud Abo Khamis, Hung Q. Ngo, and Atri Rudra. FAQ: Questions Asked Frequently. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS, pp. 13–28. ACM, 2016. URL https://doi.org/10.1145/2902251.2902280.
Aji & McEliece (2000) Srinivas M. Aji and Robert J. McEliece. The generalized distributive law. IEEE Transactions on Information Theory, 46(2):325–343, 2000. URL https://doi.org/10.1109/18.825794.
Azizian & Lelarge (2021) Waiss Azizian and Marc Lelarge. Expressive power of invariant and equivariant graph neural networks. In Proceedings of the 9th International Conference on Learning Representations, ICLR, 2021. URL https://openreview.net/forum?id=lxHgXYN4bwl.
Balcilar et al. (2021a) Muhammet Balcilar, Pierre Héroux, Benoit Gaüzère, Pascal Vasseur, Sébastien Adam, and Paul Honeine. Breaking the limits of message passing graph neural networks. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp. 599–608. PMLR, 2021a. URL http://proceedings.mlr.press/v139/balcilar21a.html.
Balcilar et al. (2021b) Muhammet Balcilar, Guillaume Renton, Pierre Héroux, Benoit Gaüzère, Sébastien Adam, and Paul Honeine. Analyzing the expressive power of graph neural networks in a spectral perspective. In Proceedings of the 9th International Conference on Learning Representations, ICLR, 2021b. URL https://openreview.net/forum?id=-qh0M9XWxnv.
Barceló et al. (2020) Pablo Barceló, Egor V Kostylev, Mikael Monet, Jorge Pérez, Juan Reutter, and Juan Pablo Silva. The logical expressiveness of graph neural networks. In Proceedings of the 8th International Conference on Learning Representations, ICLR, 2020. URL https://openreview.net/forum?id=r1lZ7AEKvB.
Barceló et al. (2021) Pablo Barceló, Floris Geerts, Juan L. Reutter, and Maksimilian Ryschkov. Graph neural networks with local graph parameters. In Advances in Neural Information Processing Systems, volume 34, 2021. URL https://proceedings.neurips.cc/paper/2021/hash/d4d8d1ac7e00e9105775a6b660dd3cbb-Abstract.html.
Bodnar et al. (2021) Cristian Bodnar, Fabrizio Frasca, Yuguang Wang, Nina Otter, Guido F. Montúfar, Pietro Lió, and Michael M. Bronstein. Weisfeiler and Lehman go topological: Message passing simplicial networks. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp. 1026–1037. PMLR, 2021. URL http://proceedings.mlr.press/v139/bodnar21a.html.
Bouritsas et al. (2020) Giorgos Bouritsas, Fabrizio Frasca, Stefanos Zafeiriou, and Michael M. Bronstein. Improving graph neural network expressivity via subgraph isomorphism counting. In Graph Representation Learning and Beyond (GRL+) Workshop at the 37 th International Conference on Machine Learning, 2020. URL https://arxiv.org/abs/2006.09252.
Brijder et al. (2019) Robert Brijder, Floris Geerts, Jan Van den Bussche, and Timmy Weerwag. On the expressive power of query languages for matrices. ACM TODS, 44(4):15:1–15:31, 2019. URL https://doi.org/10.1145/3331445.
Bruna et al. (2014) Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. In Proceedings of the 2nd International Conference on Learning Representations, ICLR, 2014. URL https://openreview.net/forum?id=DQNsQf-UsoDBa.
Cai et al. (1992) Jin-yi Cai, Martin Fürer, and Neil Immerman. An optimal lower bound on the number of variables for graph identifications. Comb., 12(4):389–410, 1992. URL https://doi.org/10.1007/BF01305232.
Chen et al. (2019) Zhengdao Chen, Soledad Villar, Lei Chen, and Joan Bruna. On the equivalence between graph isomorphism testing and function approximation with GNNs. In Advances in Neural Information Processing Systems, volume 32, 2019. URL https://proceedings.neurips.cc/paper/2019/file/71ee911dd06428a96c143a0b135041a4-Paper.pdf.
Chen et al. (2020) Zhengdao Chen, Lei Chen, Soledad Villar, and Joan Bruna. Can graph neural networks count substructures? In Advances in Neural Information Processing Systems, volume 33, 2020. URL https://proceedings.neurips.cc/paper/2020/file/75877cb75154206c4e65e76b88a12712-Paper.pdf.
Corso et al. (2020) Gabriele Corso, Luca Cavalleri, Dominique Beaini, Pietro Liò, and Petar Veličković. Principal neighbourhood aggregation for graph nets. In Advances in Neural Information Processing Systems, volume 33, 2020. URL https://proceedings.neurips.cc/paper/2020/file/99cad265a1768cc2dd013f0e740300ae-Paper.pdf.
Csanky (1976) L. Csanky. Fast parallel matrix inversion algorithms. SIAM J. Comput., 5(4):618–623, 1976. URL https://doi.org/10.1137/0205040.
Curticapean et al. (2017) Radu Curticapean, Holger Dell, and Dániel Marx. Homomorphisms are a good basis for counting small subgraphs. In Proceedings of the 49th Symposium on Theory of Computing, STOC, pp. 210––223, 2017. URL http://dx.doi.org/10.1145/3055399.3055502.
Damke et al. (2020) Clemens Damke, Vitalik Melnikov, and Eyke Hüllermeier. A novel higher-order weisfeiler-lehman graph convolution. In Proceedings of The 12th Asian Conference on Machine Learning, ACML, volume 129 of Proceedings of Machine Learning Research, pp. 49–64. PMLR, 2020. URL http://proceedings.mlr.press/v129/damke20a.html.
Defferrard et al. (2016) Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems, volume 30, 2016. URL https://proceedings.neurips.cc/paper/2016/file/04df4d434d481c5bb723be1b6df1ee65-Paper.pdf.
Fürer (2001) Martin Fürer. Weisfeiler-Lehman refinement requires at least a linear number of iterations. In Proceedings of the 28th International Colloqium on Automata, Languages and Programming, ICALP, volume 2076 of Lecture Notes in Computer Science, pp. 322–333. Springer, 2001. URL https://doi.org/10.1007/3-540-48224-5_27.
Geerts (2020) Floris Geerts. The expressive power of kth-order invariant graph networks. CoRR, abs/2007.12035, 2020. URL https://arxiv.org/abs/2007.12035.
Geerts (2021) Floris Geerts. On the expressive power of linear algebra on graphs. Theory Comput. Syst., 65(1):179–239, 2021. URL https://doi.org/10.1007/s00224-020-09990-9.
Geerts et al. (2021a) Floris Geerts, Filip Mazowiecki, and Guillermo A. Pérez. Let’s agree to degree: Comparing graph convolutional networks in the message-passing framework. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp. 3640–3649. PMLR, 2021a. URL http://proceedings.mlr.press/v139/geerts21a.html.
Geerts et al. (2021b) Floris Geerts, Thomas Muñoz, Cristian Riveros, and Domagoj Vrgoc. Expressive power of linear algebra query languages. In Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS, pp. 342–354. ACM, 2021b. URL https://doi.org/10.1145/3452021.3458314.
Gilmer et al. (2017) Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pp. 1263–1272, 2017. URL {http://proceedings.mlr.press/v70/gilmer17a/gilmer17a.pdf}.
Grohe (2021) Martin Grohe. The logic of graph neural networks. In Proceedings of the 36th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS, pp. 1–17. IEEE, 2021. URL https://doi.org/10.1109/LICS52264.2021.9470677.
Hamilton (2020) William L. Hamilton. Graph representation learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 14(3):1–159, 2020. URL https://doi.org/10.2200/S01045ED1V01Y202009AIM046.
Hamilton et al. (2017) William L. Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, volume 30, 2017. URL https://proceedings.neurips.cc/paper/2017/file/5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf.
Hammond et al. (2011) David K. Hammond, Pierre Vandergheynst, and Rémi Gribonval. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30(2):129–150, 2011. ISSN 1063-5203. doi: https://doi.org/10.1016/j.acha.2010.04.005. URL https://www.sciencedirect.com/science/article/pii/S1063520310000552.
Immerman & Lander (1990) Neil Immerman and Eric Lander. Describing graphs: A first-order approach to graph canonization. In Complexity Theory Retrospective: In Honor of Juris Hartmanis on the Occasion of His Sixtieth Birthday, pp. 59–81. Springer, 1990. URL https://doi.org/10.1007/978-1-4612-4478-3_5.
Keriven & Peyré (2019) Nicolas Keriven and Gabriel Peyré. Universal invariant and equivariant graph neural networks. In Advances in Neural Information Processing Systems, volume 32, pp. 7092–7101, 2019. URL https://proceedings.neurips.cc/paper/2019/file/ea9268cb43f55d1d12380fb6ea5bf572-Paper.pdf.
Kipf & Welling (2017) Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations, ICLR, 2017. URL https://openreview.net/pdf?id=SJU4ayYgl.
Levie et al. (2019) Ron Levie, Federico Monti, Xavier Bresson, and Michael M. Bronstein. Cayleynets: Graph convolutional neural networks with complex rational spectral filters. IEEE Trans. Signal Process., 67(1):97–109, 2019. URL https://doi.org/10.1109/TSP.2018.2879624.
Maron et al. (2019a) Haggai Maron, Heli Ben-Hamu, and Yaron Lipman. Open problems: Approximation power of invariant graph networks. In NeurIPS 2019 Graph Representation Learning Workshop, 2019a. URL https://grlearning.github.io/papers/31.pdf.
Maron et al. (2019b) Haggai Maron, Heli Ben-Hamu, Hadar Serviansky, and Yaron Lipman. Provably powerful graph networks. In Advances in Neural Information Processing Systems, volume 32, 2019b. URL https://proceedings.neurips.cc/paper/2019/file/bb04af0f7ecaee4aae62035497da1387-Paper.pdf.
Maron et al. (2019c) Haggai Maron, Heli Ben-Hamu, Nadav Shamir, and Yaron Lipman. Invariant and equivariant graph networks. In Proceedings of the 7th International Conference on Learning Representations, ICLR, 2019c. URL https://openreview.net/forum?id=Syx72jC9tm.
Merkwirth & Lengauer (2005) Christian Merkwirth and Thomas Lengauer. Automatic generation of complementary descriptors with molecular graph networks. J. Chem. Inf. Model., 45(5):1159–1168, 2005. URL https://doi.org/10.1021/ci049613b.
Morgan (1965) H. L. Morgan. The generation of a unique machine description for chemical structures-a technique developed at chemical abstracts service. Journal of Chemical Documentation, 5(2):107–113, 1965. URL https://doi.org/10.1021/c160017a018.
Morris et al. (2019) Christopher Morris, Martin Ritzert, Matthias Fey, William L. Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and Leman go neural: Higher-order graph neural networks. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, pp. 4602–4609, 2019. URL https://doi.org/10.1609/aaai.v33i01.33014602.
Morris et al. (2020) Christopher Morris, Gaurav Rattan, and Petra Mutzel. Weisfeiler and Leman go sparse: Towards scalable higher-order graph embeddings. In Advances in Neural Information Processing Systems, volume 33, 2020. URL https://proceedings.neurips.cc//paper/2020/file/f81dee42585b3814de199b2e88757f5c-Paper.pdf.
Otto (2017) Martin Otto. Bounded Variable Logics and Counting: A Study in Finite Models, volume 9 of Lecture Notes in Logic. Cambridge University Press, 2017. URL https://doi.org/10.1017/9781316716878.
Otto (2019) Martin Otto. Graded modal logic and counting bisimulation. ArXiv, 2019. URL https://arxiv.org/abs/1910.00039.
Scarselli et al. (2009) Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Trans. Neural Networks, 20(1):61–80, 2009. URL https://doi.org/10.1109/TNN.2008.2005605.
Timofte (2005) Vlad Timofte. Stone–Weierstrass theorems revisited. Journal of Approximation Theory, 136(1):45–59, 2005. URL https://doi.org/10.1016/j.jat.2005.05.004.
Velickovic et al. (2018) Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In Proceedings of the 6th International Conference on Learning Representations, ICLR, 2018. URL https://openreview.net/forum?id=rJXMpikCZ.
Wu et al. (2019) Felix Wu, Amauri H. Souza Jr., Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Q. Weinberger. Simplifying graph convolutional networks. In Proceedings of the 36th International Conference on Machine Learning, ICML, volume 97 of Proceedings of Machine Learning Research, pp. 6861–6871. PMLR, 2019. URL http://proceedings.mlr.press/v97/wu19e.html.
Xu et al. (2019) Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In Proceedings of the 7th International Conference on Learning Representations, ICLR, 2019. URL https://openreview.net/forum?id=ryGs6iA5Km.
Zaheer et al. (2017) Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Russ R Salakhutdinov, and Alexander J Smola. Deep sets. In Advances in Neural Information Processing Systems, volume 30, 2017. URL https://proceedings.neurips.cc/paper/2017/file/f22e4747da1aa27e363d86d40ff442fe-Paper.pdf.

Supplementary Material

Appendix A Related Work Cnt’d

We provide additional details on how the tensor language $\mathsf{TL}(\Omega)$ considered in this paper relates to recent work on other matrix query languages. Closest to $\mathsf{TL}(\Omega)$ is the matrix query language $\mathsf{sum}$ - $\mathsf{MATLANG}$ (Geerts et al., 2021b) whose syntax is close to that of $\mathsf{TL}(\Omega)$ . There are, however, key differences. First, although $\mathsf{sum}$ - $\mathsf{MATLANG}$ uses index variables (called vector variables), they all must occur under a summation. In other words, the concept of free index variables is missing, which implies that no general tensors can be represented. In $\mathsf{TL}(\Omega)$ , we can represent arbitrary tensors and the presence of free index variables is crucial to define vertex, or more generally, $k$ -tuple embeddings in the context of $\mathsf{GNN}\text{s}$ . Furthermore, no notion of summation depth was introduced for $\mathsf{sum}$ - $\mathsf{MATLANG}$ . In $\mathsf{TL}(\Omega)$ , the summation depth is crucial to assess the separation power in terms of the number of rounds of color refinement and $k\text{-}\mathsf{WL}$ . And in fact, the separation power of $\mathsf{sum}$ - $\mathsf{MATLANG}$ was not considered before, and neither are finite variable fragments of $\mathsf{sum}$ - $\mathsf{MATLANG}$ and connections to color refinement and $k\text{-}\mathsf{WL}$ studied before. Finally, no other aggregation functions were considered for $\mathsf{sum}$ - $\mathsf{MATLANG}$ . We detail in Section C.5 that $\mathsf{TL}(\Omega)$ can be gracefully extended to $\mathsf{TL}(\Omega,\Theta)$ for some arbitrary set $\Theta$ of aggregation functions.

Connections to $1\text{-}\mathsf{WL}$ and $2\text{-}\mathsf{WL}$ and the separation power of another matrix query language, $\mathsf{MATLANG}$ (Brijder et al., 2019) were established in Geerts (2021). Yet, the design of $\mathsf{MATLANG}$ is completely different in spirit than that of $\mathsf{TL}(\Omega)$ . Indeed, $\mathsf{MATLANG}$ does not have index variables or explicit summation aggregation. Instead, it only supports matrix multiplication, matrix transposition, function applications, and turning a vector into a diagonal matrix. As such, $\mathsf{MATLANG}$ can be shown to be included in $\mathsf{TL}_{3}(\Omega)$ . Similarly as for $\mathsf{sum}$ - $\mathsf{MATLANG}$ , $\mathsf{MATLANG}$ cannot represent general tensors, has no (free) index variables and summation depth is not considered (in view of the absence of an explicit summation).

We also emphasize that neither for $\mathsf{MATLANG}$ nor for $\mathsf{sum}$ - $\mathsf{MATLANG}$ a guarded fragment was considered. The guarded fragment is crucial to make connections to color refinement (Theorem 4.3). Furthermore, the analysis in terms of the number of index variables, summation depth and treewidth (Theorems 4.1,4.2 and Proposition 4.5), were not considered before in the matrix query language literature. For none of these matrix query languages, approximation results were considered (Section 6.1).

Matrix query languages are used to assess the expressive power of linear algebra. Balcilar et al. (2021a) use $\mathsf{MATLANG}$ and the above mentioned connections to $1\text{-}\mathsf{WL}$ and $2\text{-}\mathsf{WL}$ , to assess the separation power of $\mathsf{GNN}\text{s}$ . More specifically, similar to our work, they show that several $\mathsf{GNN}$ architectures can be represented in $\mathsf{MATLANG}$ , or fragments thereof. As a consequence, bounds on their separation power easily follow. Furthermore, Balcilar et al. (2021a) propose new architectures inspired by special operators in $\mathsf{MATLANG}$ . The use of $\mathsf{TL}(\Omega)$ can thus been seen as a continuation of their approach. We note, however, that $\mathsf{TL}(\Omega)$ is more general than $\mathsf{MATLANG}$ (which is included in $\mathsf{TL}_{3}(\Omega)$ ), allows to represent more complex linear algebra computations by means summation (or other) aggregation, and finally, provides insights in the number of iterations needed for color refinement and $k\text{-}\mathsf{WL}$ . The connection between the number of variables (or treewidth) and $k\text{-}\mathsf{WL}$ is not present in the work by Balcilar et al. (2021a), neither is the notion of guarded fragment, needed to connect to color refinement. We believe that it is precisely these latter two insights that make the tensor language approach valuable for any $\mathsf{GNN}$ designer who wishes to upper bound their $\mathsf{GNN}$ architecture.

Appendix B Details of Section 3

B.1 Proof of Proposition 3.1

Let $G=(V,E,\mathsf{col})$ be a graph and let $\sigma$ be a permutation of $V$ . As usual, we define $\sigma\star G=(V^{\sigma},E^{\sigma},\mathsf{col}^{\sigma})$ as the graph with vertex set $V^{\sigma}:=V$ , edge set $vw\in E^{\sigma}$ if and only if $\sigma^{-1}(v)\sigma^{-1}(w)\in E$ , and $\mathsf{col}^{\sigma}(v):=\mathsf{col}(\sigma^{-1}(v))$ . We need to show that for any expression $\varphi({\bm{x}})$ in $\mathsf{TL}(\Omega)$ either $[\![\varphi,\sigma\star{\bm{v}}]\!]_{\sigma\star G}=[\![\varphi,{\bm{v}}]\!]_{G}$ , or when $\varphi$ has no free index variables, $[\![\varphi]\!]_{\sigma\star G}=[\![\varphi]\!]_{G}$ . We verify this by a simple induction on the structure of expressions in $\mathsf{TL}(\Omega)$ .

•

If $\varphi(x_{i},x_{j})=\bm{1}_{x_{i}\mathop{\mathsf{op}}x_{j}}$ , then for a valuation $\nu$ mapping $x_{i}$ to $v_{i}$ and $x_{j}$ to $v_{j}$ in $V$ :

[\![\bm{1}_{x_{i}\mathop{\mathsf{op}}x_{j}},\nu]\!]_{G}=\bm{1}_{v_{i}\mathop{\mathsf{op}}v_{j}}=\bm{1}_{\sigma(v_{i})\mathop{\mathsf{op}}\sigma(v_{j})}=[\![\bm{1}_{x_{i}\mathop{\mathsf{op}}x_{j}},\sigma\star\nu]\!]_{\sigma\star G},

where we used that $\sigma$ is a permutation.

•

If $\varphi(x_{i})=P_{\ell}(x_{i})$ , then for a valuation $\mu$ mapping $x_{i}$ to $v_{i}$ in $V$ :

[\![P_{\ell},\mu]\!]_{G}=(\mathsf{col}(v_{i}))_{\ell}=(\mathsf{col}^{\sigma}(\sigma(v_{i}))_{\ell}=[\![P_{\ell},\sigma\star\nu]\!]_{\sigma\star G},

where we used the definition of $\mathsf{col}^{\sigma}$ .

•

Similarly, if $\varphi(x_{i},x_{j})=E(x_{i},x_{j})$ , then for a valuation $\nu$ assigning $x_{i}$ to $v_{i}$ and $x_{j}$ to $v_{j}$ :

[\![\varphi,\nu]\!]_{G}=\bm{1}_{v_{i}v_{j}\in E}=\bm{1}_{\sigma(v_{i})\sigma(v_{j})\in E^{\sigma}}=[\![\varphi,\sigma\star\nu]\!]_{\sigma\star G},

where we used the definition of $E^{\sigma}$ .

•

If $\varphi({\bm{x}})=\varphi_{1}({\bm{x}}_{1})\cdot\varphi_{2}({\bm{x}}_{2})$ , then for a valuation $\nu$ from ${\bm{x}}$ to $V$ :

[\![\varphi,\nu]\!]_{G}=[\![\varphi_{1},\nu]\!]_{G}\cdot[\![\varphi_{2},\nu]\!]_{G}=[\![\varphi_{1},\sigma\star\nu]\!]_{\sigma\star G}\cdot[\![\varphi_{2},\sigma\star\nu]\!]_{\sigma\star G}=[\![\varphi,\sigma\star\nu]\!]_{\sigma\star G},

where we used the induction hypothesis for $\varphi_{1}$ and $\varphi_{2}$ . The cases $\varphi({\bm{x}})=\varphi_{1}({\bm{x}}_{1})+\varphi_{2}({\bm{x}}_{2})$ and $\varphi({\bm{x}})=a\cdot\varphi_{1}({\bm{x}})$ are dealt with in a similar way.

•

If $\varphi({\bm{x}})=f(\varphi_{1}({\bm{x}}_{1}),\ldots,\varphi_{p}({\bm{x}}_{p}))$ , then

	$\displaystyle[\![\varphi,\nu]\!]_{G}$	$\displaystyle=f([\![\varphi_{1},\nu]\!]_{G},\ldots,[\![\varphi_{p},\nu]\!]_{G})$
		$\displaystyle=f([\![\varphi_{1},\sigma\star\nu]\!]_{\sigma\star G},\ldots,[\![\varphi_{p},\sigma\star\nu]\!]_{\sigma\star G})$
		$\displaystyle=[\![\varphi,\sigma\star\nu]\!]_{\sigma\star G},$

where we used again the induction hypothesis for $\varphi_{1},\ldots,\varphi_{p}$ .

•

Finally, if $\varphi({\bm{x}})=\sum_{y}\varphi_{1}({\bm{x}},y)$ then for a valuation $\nu$ of ${\bm{x}}$ to $V$ :

	$\displaystyle[\![\varphi,\nu]\!]_{G}$	$\displaystyle=\sum_{v\in V}[\![\varphi_{1},\nu[y\mapsto v]]\!]_{G}=\sum_{v\in V}[\![\varphi_{1},\sigma\star\nu[y\mapsto v]]\!]_{\sigma\star G}$
		$\displaystyle=\sum_{v\in V^{\sigma}}[\![\varphi_{1},\sigma\star\nu[y\mapsto v]]\!]_{\sigma\star G}=[\![\varphi,\sigma\star\nu]\!]_{\sigma\star G},$

where we used the induction hypothesis for $\varphi_{1}$ and that $V^{\sigma}=V$ because $\sigma$ is a permutation.

We remark that when $\varphi$ does not contain free index variables, then $[\![\varphi,\nu]\!]_{G}=[\![\varphi]\!]_{G}$ for any valuation $\nu$ , from which invariance follows from the previous arguments. This concludes the proof of Proposition 3.1.

Appendix C Details of Section 4

In the following sections we prove Theorem 4.1, 4.2, 4.3 and 4.4. More specifically, we start by showing these results in the setting that $\mathsf{TL}(\Omega)$ only supports summation aggregation ( $\sum_{x}e$ ) and in which the vertex-labellings in graphs take values in $\{0,1\}^{\ell}$ . In this context, we introduce classical logics in Section C.1 and recall and extend connections between the separation power of these logics and the separation power of color refinement and $k\text{-}\mathsf{WL}$ in Section C.2. We connect $\mathsf{TL}(\Omega)$ and logics in Section C.3, to finally obtain the desired proofs in Section C.4. We then show how these results can be generalized in the presence of general aggregation operators in Section C.5, and to the setting where vertex-labellings take values in $\mathbb{R}^{\ell}$ in Section C.6.

C.1 Classical Logics

In what follows, we consider graphs $G=(V_{G},E_{G},\mathsf{col}_{G})$ with $\mathsf{col}_{G}:V_{G}\to\{0,1\}^{\ell}$ . We start by defining the $k$ -variable fragment $\mathsf{C}^{k}$ of first-order logic with counting quantifiers, followed by the definition of the guarded fragment $\mathsf{GC}$ of $\mathsf{C}^{2}$ . Formulae $\varphi$ in $\mathsf{C}^{k}$ are defined over the set $\{x_{1},\ldots,x_{k}\}$ of variables and are formed by the following grammar:

\varphi:=(x_{i}=x_{j})\,\,|\,\,E(x_{i},x_{j})\,\,|\,\,P_{s}(x_{i})\,\,|\,\,\neg\varphi\,\,|\,\,\varphi\land\varphi\,\,|\,\,\exists^{\geq m}x_{i}\,\varphi,

where $i,j\in[k]$ , $E$ is a binary predicate, $P_{s}$ for $s\in[\ell]$ are unary predicates for some $\ell\in\mathbb{N}$ , and $m\in\mathbb{N}$ . The semantics of formulae in $\mathsf{C}^{k}$ is defined in terms of interpretations relative to a given graph $G$ and a (partial) valuation $\mu:\{x_{1},\ldots,x_{k}\}\to V_{G}$ . Such an interpretation maps formulae, graphs and valuations to Boolean values $\mathbb{B}:=\{\bot,\top\}$ , in a similar way as we did for tensor language expressions.

More precisely, given a graph $G=(V_{G},E_{G},\mathsf{col}_{G})$ and partial valuation $\mu:\{x_{1},\ldots,x_{k}\}\to V_{G}$ , we define $[\![\varphi,\mu]\!]_{G}^{\mathbb{B}}\in\mathbb{B}$ for valuations defined on the free variables in $\varphi$ . That is, we define:

	$\displaystyle[\![x_{i}=x_{j},\mu]\!]_{G}^{\mathbb{B}}$	$\displaystyle:=\mathrm{if~{}}\mu(x_{i})=\mu(x_{j})\text{~{}then~{}}\top\text{~{}else~{}}\bot;$
	$\displaystyle[\![E(x_{i},x_{j}),\mu]\!]_{G}^{\mathbb{B}}$	$\displaystyle:=\mathrm{if~{}}\mu(x_{i})\mu(x_{j})\in E_{G}\text{~{}then~{}}\top\text{~{}else~{}}\bot;$
	$\displaystyle[\![P_{s}(x_{i}),\mu]\!]_{G}^{\mathbb{B}}$	$\displaystyle:=\mathrm{if~{}}\mathsf{col}_{G}(\mu(x_{i}))_{s}=1\text{~{}then~{}}\top\text{~{}else~{}}\bot;$
	$\displaystyle[\![\neg\varphi,\mu]\!]_{G}^{\mathbb{B}}$	$\displaystyle:=\neg[\![\varphi,\mu]\!]_{G}^{\mathbb{B}};$
	$\displaystyle[\![\varphi_{1}\land\varphi_{2},\mu]\!]_{G}^{\mathbb{B}}$	$\displaystyle:=[\![\varphi_{1},\mu]\!]_{G}^{\mathbb{B}}\land[\![\varphi_{2},\mu]\!]_{G}^{\mathbb{B}};$
	$\displaystyle[\![\exists^{\geq m}x_{i}\,\varphi_{1},\mu]\!]_{G}^{\mathbb{B}}$	$\displaystyle:=\text{~{}if~{}}\|\{v\in V_{G}\mid[\![\varphi,\mu[x_{i}\mapsto v]]\!]_{G}^{\mathbb{B}}=\top\}\|\geq m\text{~{}then~{}}\top\text{~{}else~{}}\bot.$

In the last expression, $\mu[x_{i}\mapsto v]$ denotes the valuation $\mu$ modified such that it maps $x_{i}$ to vertex $v$ .

We will also need the guarded fragment $\mathsf{GC}$ of $\mathsf{C}^{2}$ in which we only allow equality conditions of the form $x_{i}=x_{i}$ , component expressions of conjunction and disjunction should have the same single free variable, and counting quantifiers can only occur in guarded form: $\exists^{\geq m}x_{2}(E(x_{1},x_{2})\land\varphi(x_{2}))$ or $\exists^{\geq m}x_{1}(E(x_{2},x_{1})\land\varphi(x_{1}))$ . The semantics of formulae in $\mathsf{GC}$ is inherited from formulae in $\mathsf{C}^{2}$ .

Finally, we will also consider $\mathsf{C}^{k}_{\infty\omega}$ , that is, the logic $\mathsf{C}^{k}$ extended with infinitary disjunctions and conjunctions. More precisely, we add to the grammar of formulae the following constructs:

\bigvee_{\alpha\in A}\varphi_{\alpha}\text{ and }\bigwedge_{\alpha\in A}\varphi_{\alpha}

where the index set $A$ can be arbitrary, even containing uncountably many indices. We define $\mathsf{GC}_{\infty\omega}$ in the same way by relaxing the finite variable conditions. The semantics is, as expected: $[\![\bigvee_{\alpha\in A}\varphi_{\alpha},\mu]\!]_{G}^{\mathbb{B}}=\top$ if for at least one $\alpha\in A$ , $[\![\varphi_{\alpha},\mu]\!]_{G}^{\mathbb{B}}=\top$ , and $[\![\bigwedge_{\alpha\in A}\varphi_{\alpha},\mu]\!]_{G}^{\mathbb{B}}=\top$ if for all $\alpha\in A$ , $[\![\varphi_{\alpha},\mu]\!]_{G}^{\mathbb{B}}=\top$ .

We define the free variables of formulae just as for $\mathsf{TL}$ , and similarly, quantifier rank is defined as summation depth (only existential quantifications increase the quantifier rank). For any of the above logics $\mathcal{L}$ we define $\mathcal{L}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ as the set of formulae in $\mathcal{L}$ of quantifier rank at most $t$ .

To capture the separation power of logics, we define $\rho_{1}\bigl{(}\mathcal{L}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ as the equivalence relation on $\mathcal{G}_{1}$ defined by

\bigl{(}(G,v),(H,w)\bigr{)}\in\rho_{1}\bigl{(}\mathcal{L}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\Longleftrightarrow\forall\varphi(x)\in\mathcal{L}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}:[\![\varphi,\mu_{v}]\!]_{G}^{\mathbb{B}}=[\![\varphi,\mu_{w}]\!]_{H}^{\mathbb{B}},

where $\mu_{v}$ is any valuation such that $\mu(x)=v$ , and likewise for $w$ . The relation $\rho_{0}$ is defined in a similar way, except that now the relation is only over pairs of graphs, and the characterization is over all formulae with no free variables (also called sentences). Finally, we also use, and define, the relation $\rho_{s}$ , which relates pairs from $\mathcal{G}_{s}$ : consisting of a graph and an $s$ -tuple of vertices. The relation is defined as

\bigl{(}(G,{\bm{v}}),(H,{\bm{w}})\bigr{)}\in\rho_{s}\bigl{(}\mathcal{L}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\Longleftrightarrow\forall\varphi({\bm{x}})\in\mathcal{L}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}:[\![\varphi,\mu_{\bm{v}}]\!]_{G}^{\mathbb{B}}=[\![\varphi,\mu_{\bm{w}}]\!]_{H}^{\mathbb{B}},

where ${\bm{x}}$ consist of $s$ free variables and $\mu_{\bm{v}}$ is a valuation assigning the $i$ -th variable of ${\bm{x}}$ to the $i$ -th value of ${\bm{v}}$ , for any $i\in[s]$ .

C.2 Characterization of Separation Power of Logics

We first connect the separation power of the color refinement and $k$ -dimensional Weisfeiler-Leman algorithms to the separation power of the logics we just introduced. Although most of these connections are known, we present them in a bit of a more fine-grained way. That is, we connect the number of rounds used in the algorithms to the quantifier rank of formulae in the above logics.

Proposition C.1.

For any $t\geq 0$ , we have the following identities:

(1)

$\rho_{1}\bigl{(}\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{1}\bigl{(}\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ and $\rho_{0}\bigl{(}\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{0}\bigl{(}\mathsf{gwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{0}\bigl{(}\mathsf{C}^{2,(t+1)}\bigr{)}$ ;

(2)

For $k\geq 1$ , $\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{1}\bigl{(}\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ and

\rho_{0}\bigl{(}\mathsf{C}^{k+1,(t+k)}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{C}^{k+1,(t+1)}\bigr{)}.

As a consequence, $\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{(\infty)}\bigr{)}=\rho_{0}\bigl{(}\mathsf{C}^{k+1}\bigr{)}$ .

Proof.

For (1), the identity $\rho_{1}\bigl{(}\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{1}\bigl{(}\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ is known and can be found, for example, in Theorem V.10 in Grohe (2021). The identity $\rho_{0}\bigl{(}\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{0}\bigl{(}\mathsf{gwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ can be found in Proposition V.4 in Grohe (2021). The identity $\rho_{0}\bigl{(}\mathsf{gwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{0}\bigl{(}\mathsf{C}^{2,(t+1)}\bigr{)}$ is a consequence of the inclusion shown in (2) for $k=1$ .

For (2), we use that $\rho_{k}\bigl{(}\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{k}\bigl{(}\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ , see e.g., Theorem V.8 in Grohe (2021). We argue that this identity holds for $\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{1}\bigl{(}\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ . Indeed, suppose that $(G,v)$ and $(H,w)$ are not in $\rho_{1}\bigl{(}\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ . Let $\varphi(x_{1})$ be a formula in $\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ such that $[\![\varphi,v]\!]_{G}^{\mathbb{B}}\neq[\![\varphi,w]\!]_{H}^{\mathbb{B}}$ . Consider the formula $\varphi^{+}(x_{1},\ldots,x_{k})=\varphi(x_{1})\land\bigwedge_{i=1}^{k}(x_{1}=x_{i})$ . Then, $[\![\varphi^{+},(v,\ldots,v)]\!]_{G}^{\mathbb{B}}\neq[\![\varphi^{+},(w,\ldots,w)]\!]_{H}^{\mathbb{B}}$ , and hence $(G,(v,\ldots,v))$ and $(H,(w,\ldots,w))$ are not in $\rho_{k}\bigl{(}\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ either. This implies that $\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,(v,\ldots,v))\neq\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(H,(w,\ldots,w))$ , and thus, by definition, $\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,v)\neq\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(H,w)$ . In other words, $(G,v)$ and $(H,w)$ are not in $\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ , from which the inclusion $\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ follows. Conversely, if $(G,v)$ and $(H,w)$ are not in $\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ , then $\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,(v,\ldots,v))\neq\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(H,(w,\ldots,w))$ . As a consequence, $(G,(v,\ldots,v))$ and $(H,(w,\ldots,w))$ are not in $\rho_{k}\bigl{(}\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ either. Let $\varphi(x_{1},\ldots,x_{k})$ be a formula in $\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ such that $[\![\varphi,(v,\ldots,v)]\!]_{G}^{\mathbb{B}}\neq[\![\varphi,(w,\ldots,w)]\!]_{H}^{\mathbb{B}}$ . Then it is readily shown that we can convert $\varphi(x_{1},\ldots,x_{k})$ into a formula $\varphi^{-}(x_{1})$ in $\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ such that $[\![\varphi^{-},v]\!]_{G}^{\mathbb{B}}\neq[\![\varphi^{-},w]\!]_{H}^{\mathbb{B}}$ , and thus $(G,v)$ and $(H,w)$ are not in $\rho_{1}\bigl{(}\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ . Hence, we also have the inclusion $\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\supseteq\rho_{1}\bigl{(}\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ , form which the first identity in (2) follows.

It remains to show $\rho_{0}\bigl{(}\mathsf{C}^{k+1,(t+k)}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{C}^{k+1,(t+1)}\bigr{)}$ . Clearly, if $(G,H)$ is not in $\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ then the multisets of labels $\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,{\bm{v}})$ and $\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(H,{\bm{w}})$ differ. It is known that with each label $c$ one can associate a formula $\varphi^{c}$ in $\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ such that $[\![\varphi^{c},{\bm{v}}]\!]_{G}^{\mathbb{B}}=\top$ if and only if $\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(G,{\bm{v}})=c$ . So, if the multisets are different, there must be a $c$ that occurs more often in one multiset than in the other one. This can be detected by a fomulae of the form $\exists^{=m}(x_{1},\ldots,x_{k})\varphi^{c}(x_{1},\ldots,x_{k})$ which is satisfied if there are $m$ tuples ${\bm{v}}$ with label $c$ . It is now easily verified that the latter formula can be converted into a formula in $\mathsf{C}^{k+1,(t+k)}$ . Hence, the inclusion $\rho_{0}\bigl{(}\mathsf{C}^{k+1,(t+k)}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ follows.

For $\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{C}^{k+1,(t+1)}\bigr{)}$ , we show that if $(G,H)$ is in $\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ , then this implies that $[\![\varphi,\mu]\!]_{G}^{\mathbb{B}}=[\![\varphi,\mu]\!]_{H}^{\mathbb{B}}$ for all formulae in $\mathsf{C}^{k+1,(t+1)}$ and any valuation $\mu$ (notice that $\mu$ is superfluous in this definition when formulas have no free variables). Assume that $(G,H)$ is in $\rho_{0}(\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ . Since any formula of quantifier rank $t+1$ is a Boolean combination of formulas of less rank or a formula of the form $\varphi=\exists^{\geq m}x_{i}\,\psi$ where $\psi$ is of quantifier rank $t$ , without loss of generality consider a formula of the latter form, and assume for the sake of contradiction that $[\![\varphi,\mu]\!]_{G}^{\mathbb{B}}=\top$ but $[\![\varphi,\mu]\!]_{H}^{\mathbb{B}}=\bot$ . Since $[\![\varphi,\mu]\!]_{G}^{\mathbb{B}}=\top$ , there must be at least $m$ elements satisfying $\psi$ . More precisely, let $v_{1},\dots,v_{p}$ in $G$ be all vertices in $G$ such that for each valuation $\mu[x\mapsto v_{i}]$ it holds that $[\![\psi,\mu[x\mapsto v_{i}]\!]_{G}^{\mathbb{B}}=\top$ . As mentioned, it must be that $p$ is at least $m$ . Using again the fact that $\rho_{k}\bigl{(}\mathsf{wl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{k}\bigl{(}\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ , we infer that the color $\mathsf{wl}_{k}^{(t-1)}(G,(v_{i},\ldots,v_{i}))$ is the same, for each such $v_{i}$ .

Now since $\mathsf{gwl}_{k}^{(t-1)}(G)=\mathsf{gwl}_{k}^{(t-1)}(H)$ , it is not difficult to see that there must be exactly $p$ vertices $w_{1},\dots,w_{p}$ in $H$ such that $\mathsf{wl}_{k}^{(t-1)}(G,(v_{i},\ldots,v_{i}))=\mathsf{wl}_{k}^{(t-1)}(H,(w_{i},\ldots,w_{i}))$ . Otherwise, it would simply not be the case that the aggregation step of the colors, assigned by $k\text{-}\mathsf{WL}$ is the same in $G$ and $H$ . By the connection to logic, we again know that for valuation $\mu[x\mapsto w_{i}]$ it holds that $[\![\psi,\mu[x\mapsto w_{i}]\!]_{H}^{\mathbb{B}}=\top$ . It then follows that $[\![\varphi,\mu]\!]_{H}^{\mathbb{B}}=\top$ for any valuation $\mu$ , which was to be shown.

Finally, we remark that $\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{(\infty)}\bigr{)}=\rho_{0}\bigl{(}\mathsf{C}^{k+1}\bigr{)}$ follows from the preceding inclusions in (2). ∎

Before moving to tensor languages, where we will use infinitary logics to simulate expressions in $\mathsf{TL}_{k}(\Omega)$ and $\mathsf{GTL}(\Omega)$ , we recall that, when considering the separation power of logics, we can freely move between the logics and their infinitary counterparts:

Theorem C.2.

The following identities hold for any $t\geq 0$ , $k\geq 2$ and $s\geq 0$ :

(1)

$\rho_{1}\bigl{(}\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{\infty\omega}\bigr{)}=\rho_{1}\bigl{(}\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ ;
(2)

$\rho_{s}\bigl{(}\mathsf{C}^{k,\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{\infty\omega}\bigr{)}=\rho_{s}\bigl{(}\mathsf{C}^{k,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ .

Proof.

For identity (1), notice that we only need to prove that $\rho_{1}\bigl{(}\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{\infty\omega}\bigr{)}$ , the other direction follows directly from the definition. We point out the well-known fact that two tuples $(G,v)$ and $(H,w)$ belong to $\rho_{1}\bigl{(}\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ if and only if the unravelling of $G$ rooted at $v$ up to depth $t$ is isomorphic to the unravelling of $H$ rooted at $w$ up to root $t$ . Here the unravelling is the infinite tree whose root is the root node, and whose children are the neighbors of the root node (see e.g. Barceló et al. (2020); Otto (2019). Now for the connection with infinitary logic. Assume that the unravellings of $G$ rooted at $v$ and of $H$ rooted at $w$ up to level $t$ are isomorphic, but assume for the sake of contradiction that there is a formula $\varphi(x)$ in $\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{\infty\omega}$ such that $[\![\varphi,\mu_{v}]\!]_{G}^{\mathbb{B}}\neq[\![\varphi,\mu_{w}]\!]_{H}^{\mathbb{B}}$ , where $\mu_{v}$ and $\mu_{w}$ are any valuation mapping variable $x$ to $v$ and $w$ , respectively. Now since $G$ and $H$ are finite graphs, one can construct, from formula $\phi$ , a formula $\phi^{\prime}$ in $\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ such that $[\![\psi,\mu_{v}]\!]_{G}^{\mathbb{B}}\neq[\![\psi,\mu_{w}]\!]_{H}^{\mathbb{B}}$ . Notice that this is in contradiction with our assumption that unravellings where isomorphic and therefore indistinguishable by formulae in $\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ . To construct $\psi$ , consider an infinitary disjunction $\bigvee_{a\in A}\alpha_{a}$ . Since $G$ and $H$ have a finite number of vertices, and the formulae have a finite number of variables, the number of different valuations from the variables to the vertices in $G$ or $H$ is also finite. Thus, one can replace any extra copy of $\alpha_{a}$ , $\alpha_{a^{\prime}}$ such that their value is the same in $G$ and $H$ . The final result is a finite disjunction, and the truth value over $G$ and $H$ is equivalent to the original infinitary disjunction.

For identity (2) we refer to Corollary 2.4 in Otto (2017). ∎

C.3 From $\mathsf{TL}(\Omega)$ to $\mathsf{C}^{k}_{\infty\omega}$ and $\mathsf{GC}_{\infty\omega}$

We are now finally ready to make the connection between expressions in $\mathsf{TL}(\Omega)$ and the infinitary logics introduced earlier.

Proposition C.3.

For any expression $\varphi({\bm{x}})$ in $\mathsf{TL}_{k}(\Omega)$ and $c\in\mathbb{R}$ , there exists an expression $\tilde{\varphi}^{c}({\bm{x}})$ in $\mathsf{C}^{k}_{\infty\omega}$ such that $[\![\varphi,{\bm{v}}]\!]_{G}=c$ if and only if $[\![\tilde{\varphi}^{c},{\bm{v}}]\!]_{G}^{\mathbb{B}}=\top$ for any graph $G=(V_{G},E_{G},\mathsf{col}_{G})$ in $\mathcal{G}$ and ${\bm{v}}\in V_{G}^{k}$ . Furthermore, if $\varphi(x)\in\mathsf{GTL}(\Omega)$ then $\tilde{\varphi}^{c}\in\mathsf{GC}_{\infty\omega}$ . Finally, if $\varphi$ has summation depth $t$ then $\tilde{\varphi}^{c}$ has quantifier rank $t$ .

Proof.

We define $\tilde{\varphi}^{c}$ inductively on the structure of expressions in $\mathsf{TL}_{k}(\Omega)$ .

•

$\varphi(x_{i},x_{j}):=\bm{1}_{x_{i}\mathop{\mathsf{op}}x_{j}}$ . Assume first that $\mathop{\mathsf{op}}$ is “ $=$ ”. We distinguish between (a) $i\neq j$ and (b) $i=j$ . For case (a), if $c=1$ , then we define $\tilde{\varphi}^{1}(x_{i},x_{j}):=(x_{i}=x_{j})$ , if $c=0$ , then we define $\tilde{\varphi}^{0}(x_{i},x_{j}):=\neg(x_{i}=x_{j})$ , and if $c\neq 0,1$ , then we define $\tilde{\varphi}^{c}(x_{i},x_{j}):=x_{i}\neq x_{i}$ . For case (b), if $c=1$ , then we define $\tilde{\varphi}^{1}(x_{i},x_{j}):=(x_{i}=x_{i})$ , and for any $c\neq 1$ , we define $\tilde{\varphi}^{c}(x_{i},x_{j}):=\neg(x_{i}=x_{i})$ . The case when $\mathop{\mathsf{op}}$ is “ $\neq$ ” is treated analogously.
•

$\varphi(x_{i}):=P_{\ell}(x_{i})$ . If $c=1$ , then we define $\tilde{\varphi}^{1}(x_{i}):=P_{\ell}(x_{i})$ , if $c=0$ , then we define $\tilde{\varphi}^{0}(x_{i}):=\neg P_{j}(x_{i})$ . For all other $c$ , we define $\tilde{\varphi}^{c}(x_{i},x_{j}):=\neg(x_{i}=x_{i})$ .
•

$\varphi(x_{i},x_{j}):=E(x_{i},x_{j})$ . If $c=1$ , then we define $\tilde{\varphi}^{1}(x_{i},x_{j}):=E(x_{i},x_{j})$ , if $c=0$ , then we define $\tilde{\varphi}^{0}(x_{i},x_{j}):=\neg E(x_{i},x_{j})$ . For all other $c$ , we define $\tilde{\varphi}^{c}(x_{i},x_{j}):=\neg(x_{i}=x_{i})$ .

•

$\varphi:=\varphi_{1}+\varphi_{2}$ . We observe that $[\![\varphi,{\bm{v}}]\!]_{G}=c$ if and only if there are $c_{1},c_{2}\in\mathbb{R}$ such that $[\![\varphi_{1},{\bm{v}}]\!]_{G}=c_{1}$ and $[\![\varphi_{2},{\bm{v}}]\!]_{G}=c_{2}$ and $c=c_{1}+c_{2}$ . Hence, it suffices to define

\tilde{\varphi}^{c}:=\bigvee_{\begin{subarray}{c}c_{1},c_{2}\in\mathbb{R}\\ c=c_{1}+c_{2}\end{subarray}}\tilde{\varphi}_{1}^{c_{1}}\land\tilde{\varphi}_{2}^{c_{2}},

where $\tilde{\varphi}_{1}^{c_{1}}$ and $\tilde{\varphi}_{2}^{c_{2}}$ are the expressions such that $[\![\varphi_{1},{\bm{v}}]\!]_{G}=c_{1}$ if and only if $[\![\tilde{\varphi}_{1}^{c_{1}},{\bm{v}}]\!]_{G}^{\mathbb{B}}=\top$ and $[\![\varphi_{2},{\bm{v}}]\!]_{G}=c_{2}$ if and only if $[\![\tilde{\varphi}_{2}^{c_{2}},{\bm{v}}]\!]_{G}^{\mathbb{B}}=\top$ , which exist by induction.

•

$\varphi:=\varphi_{1}\cdot\varphi_{2}$ . This is case is analogous to the previous one. Indeed, $[\![\varphi,{\bm{v}}]\!]_{G}=c$ if and only if there are $c_{1},c_{2}\in\mathbb{R}$ such that $[\![\varphi_{1},{\bm{v}}]\!]_{G}=c_{1}$ and $[\![\varphi_{2},{\bm{v}}]\!]_{G}=c_{2}$ and $c=c_{1}\cdot c_{2}$ . Hence, it suffices to define

\tilde{\varphi}^{c}:=\bigvee_{\begin{subarray}{c}c_{1},c_{2}\in\mathbb{R}\\ c=c_{1}\cdot c_{2}\end{subarray}}\tilde{\varphi}_{1}^{c_{1}}\land\tilde{\varphi}_{2}^{c_{2}}.

•

$\varphi:=a\cdot\varphi_{1}$ . This is case is again dealt with in a similar way. Indeed, $[\![\varphi,{\bm{v}}]\!]_{G}=c$ if and only if there is a $c_{1}\in\mathbb{R}$ such that $[\![\varphi_{1},{\bm{v}}]\!]_{G}=c_{1}$ and $c=a\cdot c_{1}$ . Hence, it suffices to define

$\tilde{\varphi}^{c}:=\bigvee_{\begin{subarray}{c}c_{1}\in\mathbb{R}\\ c=a\cdot c_{1}\end{subarray}}\tilde{\varphi}_{1}^{c_{1}}.$

•

$\varphi:=f(\varphi_{1},\ldots,\varphi_{p})$ with $f:\mathbb{R}^{p}\to\mathbb{R}$ . We observe that $[\![\varphi,{\bm{v}}]\!]_{G}=c$ if and only if there are $c_{1},\ldots,c_{p}\in\mathbb{R}$ such that $c=f(c_{1},\ldots,c_{p})$ and $[\![\varphi_{i},{\bm{v}}]\!]_{G}=c_{i}$ for $i\in[p]$ . Hence, it suffices to define

\tilde{\varphi}^{c}:=\bigvee_{\begin{subarray}{c}c_{1},\ldots,c_{p}\in\mathbb{R}\\ c=f(c_{1},\ldots,c_{p})\end{subarray}}\tilde{\varphi}_{1}^{c_{1}}\land\cdots\land\tilde{\varphi}_{p}^{c_{p}}.

•

$\varphi:=\sum_{x_{i}}\varphi_{1}$ . We observe that $[\![\varphi,\mu]\!]_{G}=c$ implies that we can partition $V_{G}$ into $\ell$ parts $V_{1},\ldots,V_{\ell}$ , of sizes $m_{1},\ldots,m_{\ell}$ , respectively, such that $[\![\varphi_{1},\mu[x_{i}\to v]]\!]_{G}=c_{i}$ for each $v\in V_{i}$ , and such that all $c_{i}$ ’s are pairwise distinct and $c=\sum_{i=1}^{\ell}c_{i}\cdot m_{i}$ . It now suffices to consider the following formula

\tilde{\varphi}^{c}:=\bigvee_{\begin{subarray}{c}\ell,m_{1},\ldots,m_{\ell}\in\mathbb{N}\\ c_{1},\ldots,c_{\ell}\in\mathbb{R}\\ c=\sum_{i=1}^{\ell}m_{i}c_{i}\end{subarray}}\bigwedge_{i=1}^{\ell}\exists^{=m_{i}}x_{i}\,\tilde{\varphi}_{1}^{c_{i}}\land\forall x_{i}\,\bigvee_{i=1}^{\ell}\tilde{\varphi}_{1}^{c_{i}},

where $\exists^{=m_{i}}x_{i}\,\psi$ is shorthand notation for $\exists^{\geq m_{i}}x_{i}\,\psi\land\neg\exists^{\geq m_{i}+1}x_{i}\,\psi$ , and $\forall x_{i}\,\psi$ denotes $\neg\exists^{\geq 1}x_{i}\,\neg\psi$ .

This concludes the construction of $\tilde{\varphi}^{c}$ . We observe that we only introduce a quantifiers when $\varphi=\sum_{x_{i}}\varphi_{1}$ and hence if we assume by induction that summation depth and quantifier rank are in sync, then if $\varphi_{1}$ has summation depth $t-1$ and thus $\tilde{\varphi}_{1}^{c}$ has quantifier rank $t-1$ for any $c\in\mathbb{R}$ , then $\varphi$ has summation depth $t$ , and as can be seen from the definition of $\tilde{\varphi}^{c}$ , this formula has quantifier rank $t$ , as desired.

It remains to verify the claim about guarded expressions. This is again verified by induction. The only case requiring some attention is $\varphi(x_{1}):=\sum_{x_{2}}\,E(x_{1},x_{2})\land\varphi_{1}(x_{2})$ for which we can define

\tilde{\varphi}^{c}:=\bigvee_{\begin{subarray}{c}\ell,m_{1},\ldots,m_{\ell}\in\mathbb{N}\\ c_{1},\ldots,c_{\ell}\in\mathbb{R}\\ c=\sum_{i=1}^{\ell}m_{i}c_{i}\\ m=\sum_{i=1}^{\ell}m_{i}\end{subarray}}\exists^{=m}x_{2}E(x_{1},x_{2})\land\bigwedge_{i=1}^{\ell}\exists^{=m_{i}}x_{2}\,E(x_{1},x_{2})\land\tilde{\varphi}_{1}^{c_{i}}(x_{2}),

which is a formula in $\mathsf{GC}$ again only adding one to the quantifier rank of the formulae $\tilde{\varphi}_{1}^{c}$ for $c\in\mathbb{R}$ . So also here, we have the one-to-one correspondence between summation depth and quantifier rank. ∎

C.4 Proof of Theorem 4.1, 4.2, 4.3 and 4.4

Proposition C.4.

We have the following inclusions: For any $t\geq 0$ and any collection $\Omega$ of functions:

•

$\rho_{1}\bigl{(}\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}$ ;
•

$\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{TL}_{k+1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}$ ; and
•

$\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{TL}_{k+1}^{(t+1)}(\Omega)\bigr{)}$ .

Proof.

We first show the second bullet by contraposition. That is, we show that if $(G,v)$ and $(H,w)$ are not in $\rho_{1}\bigl{(}\mathsf{TL}_{k+1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}$ , then neither are they in $\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ . Indeed, suppose that there exists an expression $\varphi(x_{1})$ in $\mathsf{TL}_{k+1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)$ such that $[\![\varphi,v]\!]_{G}=c\neq c^{\prime}=[\![\varphi,w]\!]_{H}$ . From Proposition C.3 we know that there exists a formula $\tilde{\varphi}^{c}$ in $\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{\infty\omega}$ such that $[\![\tilde{\varphi}^{c},v]\!]_{G}^{\mathbb{B}}=\top$ and $[\![\tilde{\varphi}^{c},w]\!]_{H}^{\mathbb{B}}=\bot$ . Hence, $(G,v)$ and $(H,w)$ do no belong to $\rho_{1}\bigl{(}\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{\infty\omega}\bigr{)}$ . Theorem C.2 implies that $(G,v)$ and $(H,w)$ also do not belong to $\rho_{1}\bigl{(}\mathsf{C}^{k+1,\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ . Finally, Proposition C.1 implies that $(G,v)$ and $(H,w)$ do not belong to $\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ , as desired. The third bullet is shown in precisely the same, but using the identities for $\rho_{0}$ rather than $\rho_{1}$ , and $\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ rather than $\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ .

Also the first bullet is shown in the same way, using the connection between $\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)$ , $\mathsf{GC}_{\infty\omega}^{2,\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ , $\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ and $\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ , as given by Proposition C.1, Theorem C.2, and Proposition C.3. ∎

We next show that our tensor languages are also more separating than the color refinement and $k$ -dimensional Weisfeiler-Leman algorithms.

Proposition C.5.

We have the following inclusions: For any $t\geq 0$ and any collection $\Omega$ of functions:

•

$\rho_{1}\bigl{(}\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ ;
•

$\rho_{1}\bigl{(}\mathsf{TL}_{k+1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ ; and
•

$\rho_{0}\bigl{(}\mathsf{TL}_{k+1}^{(t+k)}(\Omega)\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ .

Proof.

For any of these inclusions to hold, for any $\Omega$ , we need to show the inclusion without the use of any functions. We again use the connections between the color refinement and $k$ -dimensional Weisfeiler-Leman algorithms and finite variable logics as stated in Proposition C.1. More precisely, we show for any formula $\varphi({\bm{x}})\in\mathsf{C}^{k,\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ there exists an expression $\hat{\varphi}({\bm{x}})\in\mathsf{TL}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ such that for any graph $G$ in $\mathcal{G}$ , $[\![\varphi,{\bm{v}}]\!]_{G}^{\mathbb{B}}=\top$ implies $[\![\hat{\varphi},{\bm{v}}]\!]_{G}=1$ and $[\![\varphi,{\bm{v}}]\!]_{G}^{\mathbb{B}}=\bot$ implies $[\![\hat{\varphi},{\bm{v}}]\!]_{G}=0$ . By appropriately selecting $k$ and $t$ and by observing that when $\varphi(x)\in\mathsf{GC}$ then $\hat{\varphi}(x)\in\mathsf{GTL}$ , the inclusions follow.

The construction of $\hat{\varphi}({\bm{x}})$ is by induction on the structure of formulae in $\mathsf{C}^{k}$ .

•

$\varphi:=(x_{i}=x_{j})$ . Then, we define $\hat{\varphi}:=\bm{1}_{x_{i}=x_{j}}$ .
•

$\varphi:=P_{\ell}(x_{i})$ . Then, we define $\hat{\varphi}:=P_{\ell}(x_{i})$ .
•

$\varphi:=E(x_{i},x_{j})$ . Then, we define $\hat{\varphi}:=E(x_{i},x_{j})$ .
•

$\varphi:=\neg\varphi_{1}$ . Then, we define $\hat{\varphi}:=\bm{1}_{x_{i}=x_{i}}-\hat{\varphi}_{1}$ .
•

$\varphi:=\varphi_{1}\land\varphi_{2}$ . Then, we define $\hat{\varphi}:=\hat{\varphi}_{1}\cdot\hat{\varphi}_{2}$ .
•

$\varphi:=\exists^{\geq m}x_{i}\,\varphi_{1}$ . Consider a polynomial $p(x):=\sum_{j}a_{j}x^{j}$ such that $p(x)=0$ for $x\in\{0,1,\ldots,m-1\}$ and $p(x)=1$ for $x\in\{m,m+1,\ldots,n\}$ . Such a polynomial exists by interpolation. Then, we define $\hat{\varphi}:=\sum_{j}a_{j}\bigl{(}\sum_{x_{i}}\,\hat{\varphi}_{1}\bigr{)}^{j}$ .

We remark that we here crucially rely on the assumption that $\mathcal{G}$ contains graphs of fixed size $n$ and that $\mathsf{TL}_{k}$ is closed under linear combinations and product. Clearly, if $\varphi\in\mathsf{GC}$ , then the above translations results in an expression $\hat{\varphi}\in\mathsf{GTL}(\Omega)$ . Furthermore, the quantifier rank of $\varphi$ is in one-to-one correspondence to the summation depth of $\hat{\varphi}$ .

We can now apply Proposition C.1. That is, if $(G,v)$ and $(H,w)$ are not in $\rho_{1}\bigl{(}\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ then by Proposition C.1, there exists a formula $\varphi(x)$ in $\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ such that $[\![\varphi,v]\!]_{G}^{\mathbb{B}}=\top\neq[\![\varphi,w]\!]_{H}^{\mathbb{B}}=\bot$ . We have just shown when we consider $\tilde{\varphi}$ , in $\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ , also $[\![\tilde{\varphi},v]\!]_{G}\neq[\![\tilde{\varphi},w]\!]_{H}$ holds. Hence, $(G,v)$ and $(H,w)$ are not in $\rho_{1}\bigl{(}\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}$ either, for any $\Omega$ . Hence, $\rho_{1}\bigl{(}\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ holds. The other bullets are shown in the same way, again by relying on Proposition C.1 and using that we can move from $\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ and $\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ to logical formulae, and to expressions in $\mathsf{TL}_{k+1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ and $\mathsf{TL}_{k+1}^{(t+k)}$ , respectively, to separate $(G,v)$ from $(H,w)$ or $G$ from $H$ , respectively. ∎

Theorems 4.1, 4.2, 4.3 and 4.4 now follow directly from Propositions C.4 and C.5.

C.5 Other aggregation functions

As is mentioned in the main paper, our upper bound results on the separation power of tensor languages (and hence also of $\mathsf{GNN}\text{s}$ represented in those languages) generalize easily when other aggregation functions than summation are used in $\mathsf{TL}$ expressions.

To clarify what we understand by an aggregation function, let us first recall the semantics of summation aggregation. Let $\varphi:=\sum_{x_{i}}\varphi_{1}$ , where $\sum_{x_{i}}$ represents summation aggregation, let $G=(V_{G},E_{G},\mathsf{col}_{G})$ be a graph, and let $\nu$ be a valuation assigning index variables to vertices in $V_{G}$ . The semantics is then given by:

[\![\textstyle\sum_{x_{i}}\!\varphi_{1},\nu]\!]_{G}:=\sum_{v\in V_{G}}[\![\varphi_{1},\nu[x_{i}\mapsto v]]\!]_{G},

as explained in Section 3. Semantically, we can alternatively view $\sum_{x_{i}}\varphi_{1}$ as a function which takes the sum of the elements in the following multiset of real values:

\{\!\{[\![\varphi_{1},\nu[x_{i}\mapsto v]]\!]_{G}\mid v\in V_{G}\}\!\}.

One can now consider, more generally, an aggregation function $F$ as a function which assigns to any multiset of values in $\mathbb{R}$ a single real value. For example, $F$ could be $\mathsf{max}$ , $\mathsf{min}$ , $\mathsf{mean}$ , $\ldots$ . Let $\Theta$ be such a collection of aggregation functions. We next incorporate general aggregation function in tensor language.

First, we extend the syntax of expressions in $\mathsf{TL}(\Omega)$ by generalizing the construct $\sum_{x_{i}}\varphi$ in the grammar of $\mathsf{TL}(\Omega)$ expression. More precisely, we define $\mathsf{TL}(\Omega,\Theta)$ as the class of expressions, formed just like tensor language expressions, but in which two additional constructs, unconditional and conditional aggregation, are allowed. For an aggregation function $F$ we define:

\mathsf{aggr}_{x_{j}}^{F}(\varphi)\ \ \ \text{ and }\ \ \ \mathsf{aggr}_{x_{j}}^{F}\bigl{(}\varphi(x_{j})\mid E(x_{i},x_{j})\bigr{)},

where in the latter construct (conditional aggregation) the expression $\varphi(x_{j})$ represents a $\mathsf{TL}(\Omega,\Theta)$ expression whose only free variable is $x_{j}$ . The intuition behind these constructs is that unconditional aggregation $\mathsf{aggr}_{x_{j}}^{F}(\varphi)$ allows for aggregating, using aggregate function $F$ , over the values of $\varphi$ where $x_{j}$ ranges unconditionally over all vertices in the graph. In contrast, for conditional aggregation $\mathsf{aggr}_{x_{j}}^{F}\bigl{(}\varphi(x_{j})\mid E(x_{i},x_{j})\bigr{)}$ , aggregation by $F$ of the values of $\varphi(x_{j})$ is conditioned on the neighbors of the vertex assigned to $x_{i}$ . That is, the vertices for $x_{j}$ range only among the neighbors of the vertex assigned to $x_{i}$ .

More specifically, the semantics of the aggregation constructs is defined as follows:

	$\displaystyle[\![\mathsf{aggr}_{x_{j}}^{F}(\varphi),\nu]\!]_{G}$	$\displaystyle:=F\left(\{\!\{[\![\varphi,\nu[x_{j}\mapsto v]]\!]_{G}\mid v\in V_{G}\}\!\}\right)\in\mathbb{R}.$
	$\displaystyle[\![\mathsf{aggr}_{x_{j}}^{F}\bigl{(}\varphi(x_{j})\mid E(x_{i},x_{j})\bigr{)},\nu]\!]_{G}$	$\displaystyle:=F\left(\{\!\{[\![\varphi,\nu[x_{j}\mapsto v]]\!]_{G}\mid v\in V_{G},(\nu(x_{i}),v)\in E_{G}\}\!\}\right)\in\mathbb{R}.$

We remark that we can also consider aggregations functions $F$ over multisets of values in $\mathbb{R}^{\ell}$ for some $\ell\in\mathbb{N}$ . This requires extending the syntax with $\mathsf{aggr}_{x_{j}}^{F}(\varphi_{1},\ldots,\varphi_{\ell})$ for unconditional aggregation and with $\mathsf{aggr}_{x_{j}}^{F}\bigl{(}\varphi_{1}(x_{j}),\ldots,\varphi_{\ell}(x_{j})\mid E(x_{i},x_{j})\bigr{)}$ for conditional aggregation. The semantics is as expected: $F(\{\!\{(([\![\varphi_{1},\nu[x_{j}\mapsto v]]\!]_{G},\ldots,[\![\varphi_{\ell},\nu[x_{j}\mapsto v]]\!]_{G})\mid v\in V_{G}\}\!\})\in\mathbb{R}$ and $F(\{\!\{(([\![\varphi_{1},\nu[x_{j}\mapsto v]]\!]_{G},\ldots,[\![\varphi_{\ell},\nu[x_{j}\mapsto v]]\!]_{G})\mid v\in V_{G},(\nu(x_{i}),v)\in E_{G}\}\!\})\in\mathbb{R}$ .

The need for considering conditional and unconditional aggregation separately is due to the use of arbitrary aggregation functions. Indeed, suppose that one uses an aggregation function $F$ for which $0\in\mathbb{R}$ is a neutral value. That is, for any multiset $X$ of real values, the equality $F(X)=F(X\uplus\{0\})$ holds. For example, the summation aggregation function satisfies this property. We then observe:

	$\displaystyle[\![\mathsf{aggr}_{x_{j}}^{F}\bigl{(}\varphi(x_{j})\mid E(x_{i},x_{j})\bigr{)},\nu]\!]_{G}$	$\displaystyle=F(\{\!\{[\![\varphi,\nu[x_{j}\mapsto v]]\!]\mid v\in V_{G},(\nu(x_{i}),v)\in E_{G}\}\!\}\bigr{)}$
		$\displaystyle=F(\{\!\{[\![\varphi\cdot E(x_{i},x_{j}),\nu[x_{j}\mapsto v]]\!]\mid v\in V_{G}\}\!\}\bigr{)}$
		$\displaystyle=[\![\mathsf{aggr}_{x_{j}}^{F}(\varphi(x_{j})\cdot E(x_{i},x_{j})),\nu]\!]_{G}.$

In other words, unconditional aggregation can simulate conditional aggregation. In contrast, when $0$ is not a neutral value of the aggregation function $F$ , conditional and unconditional aggregation behave differently. Indeed, in such cases $\mathsf{aggr}_{x_{j}}^{F}\bigl{(}\varphi(x_{j})\mid E(x_{i},x_{j})\bigr{)}$ and $\mathsf{aggr}_{x_{j}}^{F}(\varphi(x_{j})\cdot E(x_{i},x_{j}))$ may evaluate to different values, as illustrated in the following example.

As aggregation function $F$ we take the average $\mathsf{avg}(X):=\frac{1}{|X|}\sum_{x\in X}x$ for multisets $X$ of real values. We remark that $0$ ’s in $X$ contribute to the size of $X$ and hence $0$ is not a neutral element of $\mathsf{avg}$ . Now, let us consider the expressions

\varphi_{1}(x_{i}):=\mathsf{aggr}_{x_{j}}^{\mathsf{avg}}(\bm{1}_{x_{j}=x_{j}}\cdot E(x_{i},x_{j}))\text{ and }\varphi_{2}(x_{i}):=\mathsf{aggr}_{x_{j}}^{\mathsf{avg}}(\bm{1}_{x_{j}=x_{j}}\mid E(x_{i},x_{j})).

Let $\nu$ be such that $\nu(x_{i})=v$ . Then, $[\![\varphi_{1},\nu]\!]_{G}$ results in applying the average to the multiset $\{\!\{\bm{1}_{w=w}\cdot E(v,w)\mid w\in V_{G}\}\!\}$ which includes the value $1$ for every $w\in N_{G}(v)$ and a $0$ for every non-neighbor $w\not\in N_{G}(v)$ . In other words, $[\![\varphi_{1},\nu]\!]_{G}$ results in $|N_{G}(v)|/|V_{G}|$ . In contrast, $[\![\varphi_{2},\nu]\!]_{G}$ results in applying the average to the multiset $\{\!\{\bm{1}_{w=w}\mid w\in V_{G},(v,w)\in E_{G}\}\!\}$ . In other words, this multiset only contains the value $1$ for each $w\in N_{G}(v)$ , ignoring any information about the non-neighbors of $v$ . In other words, $[\![\varphi_{2},\nu]\!]_{G}$ results in $|N_{G}(v)|/|N_{G}(v)|=1$ . Hence, conditional and unconditional aggregation behave differently for the average aggregation function.

This said, one could alternative use a more general variant of conditional aggregation of the form $\mathsf{aggr}_{x_{j}}^{F}(\varphi|\psi)$ with as semantics $[\![\mathsf{aggr}_{x_{j}}^{F}(\varphi|\psi),\nu]\!]_{G}:=F\bigl{(}\{\!\{[\![\varphi,\nu[x_{j}\to v]]\!]_{G}\mid v\in V_{G},[\![\psi,\nu[x_{j}\to v]\!]_{G}\neq 0\}\!\}\bigr{)}$ where one creates a multiset only for those valuations $\nu[x_{j}\to v]$ for which the condition $\psi$ evaluates to a non-zero value. This general form of aggregation includes conditional aggregation, by replacing $\psi$ with $E(x_{i},x_{j})$ and restricting $\varphi$ , and unconditional aggregation, by replacing $\psi$ with the constant function $1$ , e.g., $\bm{1}_{x_{j}=x_{j}}$ . In order not to overload the syntax of $\mathsf{TL}$ expressions, we will not discuss this general form of aggregation further.

The notion of free index variables for expressions in $\mathsf{TL}(\Omega,\Theta)$ is defined as before, where now $\mathsf{free}(\mathsf{aggr}_{x_{j}}^{F}(\varphi)):=\mathsf{free}(\varphi)\setminus\{x_{j}\}$ , and where $\mathsf{free}(\mathsf{aggr}_{x_{j}}^{F}\bigl{(}\varphi(x_{j})\mid E(x_{i},x_{j})):=\{x_{i}\}$ (recall that $\mathsf{free}(\varphi(x_{j}))=\{x_{j}\}$ in conditional aggregation). Moreover, summation depth is replaced by the notion of aggregation depth, $\mathsf{agd}(\varphi)$ , defined in the same way as summation depth except that $\mathsf{agd}(\mathsf{aggr}_{x_{j}}^{F}(\varphi)):=\mathsf{agd}(\varphi)+1$ and $\mathsf{agd}(\mathsf{aggr}_{x_{j}}^{F}(\varphi(x_{j})\mid E(x_{i},x_{j})):=\mathsf{agd}(\varphi)+1$ . Similarly, the fragments $\mathsf{TL}_{k}(\Omega,\Theta)$ and its aggregation depth restricted fragment $\mathsf{TL}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega,\Theta)$ are defined as before, using aggregation depth rather than summation depth.

For the guarded fragment, $\mathsf{GTL}(\Omega,\Theta)$ , expressions are now restricted such that aggregations must occur only in the form $\mathsf{aggr}_{x_{j}}^{F}(\varphi(x_{j})\mid E(x_{i},x_{j}))$ , for $i,j\in[2]$ . In other words, aggregation only happens on multisets of values obtained from neighboring vertices.

We now argue that our upper bound results on the separation power remain valid for the extension $\mathsf{TL}(\Omega,\Theta)$ of $\mathsf{TL}(\Omega)$ with arbitrary aggregation functions $\Theta$ .

Proposition C.6.

We have the following inclusions: For any $t\geq 0$ , any collection $\Omega$ of functions and any collection $\Theta$ of aggregation functions:

•

$\rho_{1}\bigl{(}\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega,\Theta)\bigr{)}$ ;
•

$\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{TL}_{k+1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega,\Theta)\bigr{)}$ ; and
•

$\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{TL}_{k+1}^{(t+1)}(\Omega,\Theta)\bigr{)}$ .

Proof.

It suffices to show that Proposition C.3 also holds for expressions in the fragments of $\mathsf{TL}(\Omega,\Theta)$ considered. In particular, we only need to revise the case of summation aggregation (that is, $\varphi:=\sum_{x_{i}}\varphi_{1}$ ) in the proof of Proposition C.3. Indeed, let us consider the more general case when one of the two aggregating functions are used.

•

$\varphi:=\mathsf{aggr}_{x_{i}}^{F}(\varphi_{1})$ . We then define

\tilde{\varphi}^{c}:=\bigvee_{\ell\in\mathbb{N}}\bigvee_{(m_{1},\ldots,m_{\ell})\in\mathbb{N}^{\ell}}\bigvee_{(c,c_{1},\ldots,c_{\ell})\in\mathcal{C}(m_{1},\ldots,m_{\ell},F)}\bigwedge_{s=1}^{\ell}\exists^{=m_{s}}x_{i}\,\tilde{\varphi}_{1}^{c_{s}}\land\forall x_{i}\,\bigvee_{s=1}^{\ell}\tilde{\varphi}_{1}^{c_{s}},

where $\mathcal{C}(m_{1},\ldots,m_{\ell},F)$ now consists of all $(c,c_{1},\dots,c_{\ell})\in\mathbb{R}^{\ell+1}$ such that

c=F\Bigl{(}\{\!\{\underbrace{c_{1},\ldots,c_{1}}_{\text{$m_{1}$ times}},\ldots,\underbrace{c_{\ell},\ldots,c_{\ell}}_{\text{$m_{\ell}$ times}}\}\!\}\Bigr{)}.

•

$\varphi:=\mathsf{aggr}_{x_{i}}^{F}(\varphi_{1}(x_{i})\mid E(x_{j},x_{i}))$ . We then define

\tilde{\varphi}^{c}:=\bigvee_{\ell\in\mathbb{N}}\bigvee_{(m_{1},\ldots,m_{\ell})\in\mathbb{N}^{\ell}}\bigvee_{(c,c_{1},\ldots,c_{\ell})\in\mathcal{C}(m_{1},\ldots,m_{\ell},F)}\exists^{=m}x_{i}\,E(x_{j},x_{i})\ \land\\ \bigwedge_{s=1}^{\ell}\exists^{=m_{s}}x_{i}\,E(x_{j},x_{i})\ \land\ \tilde{\varphi}_{1}^{c_{s}}(x_{i})

where $\mathcal{C}(m_{1},\ldots,m_{\ell},F)$ again consists of all $(c,c_{1},\dots,c_{\ell})\in\mathbb{R}^{\ell+1}$ such that

c=F\Bigl{(}\{\!\{\underbrace{c_{1},\ldots,c_{1}}_{\text{$m_{1}$ times}},\ldots,\underbrace{c_{\ell},\ldots,c_{\ell}}_{\text{$m_{\ell}$ times}}\}\!\}\Bigr{)}\text{ and }m=\sum_{s=1}^{\ell}m_{s}.

It is readily verified that $[\![\mathsf{aggr}_{x_{i}}^{F}(\varphi),{\bm{v}}]\!]_{G}=c$ iff $[\![\tilde{\varphi}^{c},{\bm{v}}]\!]_{G}^{\mathbb{B}}=\top$ , and $[\![\mathsf{aggr}_{x_{i}}^{F}(\varphi(x_{i})\mid E(x_{j},x_{i})),{\bm{v}}]\!]_{G}=c$ iff $[\![\tilde{\varphi}^{c},{\bm{v}}]\!]_{G}^{\mathbb{B}}=\top$ , as desired.

For the guarded case, we note that the expression $\tilde{\varphi}^{c}$ above yields a guarded expression as long conditional aggregation is used of the form $\mathsf{aggr}_{x_{i}}^{F}(\varphi(x_{i})\mid E(x_{j},x_{i}))$ with $i,j\in[2]$ , so we can reuse the argument in the proof of Proposition C.3 for the guarded case.∎

We will illustrate later on (Section D) that this generalization allows for assessing the separation power of $\mathsf{GNN}\text{s}$ that use a variety of aggregation functions.

The choice of supported aggregation functions has, of course, an impact on the ability of $\mathsf{TL}(\Omega,\Theta)$ to match color refinement or the $k\text{-}\mathsf{WL}$ procedures in separation power. The same holds for $\mathsf{GNN}\text{s}$ , as shown by Xu et al. (2019). And indeed, the proof of Proposition C.5 relies on the presence of summation aggregation. We note that most lower bounds on the separation power of $\mathsf{GNN}\text{s}$ in terms of color refinement or the $k\text{-}\mathsf{WL}$ procedures assume summation aggregation since summation suffices to construct injective sum-decomposable functions on multisets (Xu et al., 2019; Zaheer et al., 2017), which are used to simulate color refinement and $k\text{-}\mathsf{WL}$ . A more in-depth analysis of lower bounding $\mathsf{GNN}\text{s}$ with less expressive aggregation functions, possibly using weaker versions of color refinement and $k\text{-}\mathsf{WL}$ is left as future work.

C.6 Generalization to Graphs with real-valued vertex labels

We next consider the more general setting in which $\mathsf{col}_{G}:V_{G}\to\mathbb{R}^{\ell}$ for some $\ell\in\mathbb{N}$ . That is, vertices in a graph can carry real-valued vectors. We remark that no changes to neither the syntax nor the semantics of $\mathsf{TL}$ expressions are needed, yet note that $[\![P_{s}(x),\nu]\!]_{G}:=\mathsf{col}_{G}(\nu)_{s}$ is now an element in $\mathbb{R}$ rather than $0$ or $1$ , for each $s\in[\ell]$ .

A first observation is that the color refinement and $k\text{-}\mathsf{WL}$ procedures treat each real value as a separate label. That is, two values that differ only by any small $\epsilon>0$ , are considered different. The proofs of Theorem 4.1, 4.2, 4.3 and 4.4 rely on connections between color refinement and $k\text{-}\mathsf{WL}$ and the finite variable logics $\mathsf{GC}$ and $\mathsf{C}^{k+1}$ , respectively. In the discrete context, the unary predicates $P_{s}(x)$ used in the logical formulas indicate which label vertices have. That is, $[\![P_{s},v]\!]_{G}^{\mathbb{B}}=\top$ iff $\mathsf{col}_{G}(v)_{s}=1$ . To accommodate for real values in the context of separation power, these logics now need to be able to differentiate between different labels, that is, different real numbers. We therefore extend the unary predicates allowed in formulas. More precisely, for each dimension $s\in[\ell]$ , we now have uncountably many predicates of the form $P_{s,r}$ , one for each $r\in\mathbb{R}$ . In any formula in $\mathsf{GC}$ or $\mathsf{C}^{k+1}$ only a finite number of such predicates may occur. The Boolean semantics of these new predicates is as expected:

[\![P_{s,r}(x),\nu]\!]_{G}^{\mathbb{B}}:=\mathrm{if~{}}\mathsf{col}_{G}(\mu(x_{i}))_{s}=r\text{~{}then~{}}\top\text{~{}else~{}}\bot.

In other words, in our logics, we can now detect which real-valued labels vertices have. Although, in general, the introduction of infinite predicates may cause problems, we here consider a specific setting in which the vertices in a graph have a unique label. This is commonly assumed in graph learning. Given this, it is easily verified that all results in Section C.2 carry over, where all logics involved now use the unary predicates $P_{s,r}$ with $s\in[\ell]$ and $r\in\mathbb{R}$ .

The connection between $\mathsf{TL}$ and logics also carries over. First, for Proposition C.3 we now need to connect $\mathsf{TL}$ expressions, that use a finite number of predicates $P_{s}$ , for $s\in[\ell]$ , with the extended logics having uncountably many predicates $P_{s,r}$ , for $s\in[\ell]$ and $r\in\mathbb{R}$ , at their disposal. It suffices to reconsider the case $\varphi(x_{i})=P_{s}(x_{i})$ in the proof of Proposition C.3. More precisely, $[\![P_{s}(x_{i}),\nu]\!]_{G}$ can now be an arbitrary value $c\in\mathbb{R}$ . We now simply define $\tilde{\varphi}^{c}(x_{i}):=P_{s,c}(x_{i})$ . By definition $[\![P_{s}(x_{i}),\nu]\!]_{G}=c$ if and only if $[\![P_{s,c}(x_{i}),\nu]\!]_{G}^{\mathbb{B}}=\top$ , as desired.

The proof for the extended version of proposition C.5 now needs a slightly different strategy, where we build the relevant $\mathsf{TL}$ formula after we construct the contrapositive of the Proposition. Let us first show how to construct a $\mathsf{TL}$ formula that is equivalent to a logical formula on any graph using only labels in a specific (finite) set $R$ of real numbers.

In other words, given a set $R$ of real values, we show that for any formula $\varphi({\bm{x}})\in\mathsf{C}^{k,(t)}$ using unary predicates $P_{s,r}$ such that $r\in R$ , we can construct the desired $\hat{\varphi}$ . As mentioned, we only need to reconsider the case $\varphi(x_{i}):=P_{s,r}(x_{i})$ . We define

\hat{\varphi}:=\frac{1}{\prod_{r^{\prime}\in R,r\neq r^{\prime}}r-r^{\prime}}\prod_{r^{\prime}\in R,r\neq r^{\prime}}(P_{s}(x_{i})-r^{\prime}\bm{1}_{x_{i}=x_{i}}).

Then, $[\![\hat{\varphi},\nu]\!]_{G}$ evaluates to

\frac{\prod_{r^{\prime}\in R,r\neq r^{\prime}}(r-r^{\prime})}{\prod_{r^{\prime}\in R,r\neq r^{\prime}}(r-r^{\prime})}=\begin{cases}1&[\![P_{s,r},\nu]\!]=\top\\ 0&[\![P_{s,r},\nu]\!]=\bot\end{cases}.

Indeed, if $[\![P_{s,r},\nu]\!]=\top$ , then $\mathsf{col}_{G}(v)_{s}=r$ and hence $[\![P_{s},v]\!]_{G}=r$ , resulting in the same nominator and denominator in the above fraction. If $[\![P_{s,r},\nu]\!]=\bot$ , then $\mathsf{col}_{G}(v)_{s}=r^{\prime}$ for some value $r^{\prime}\in R$ with $r\neq r^{\prime}$ . In this case, the nominator in the above fraction becomes zero. We remark that this revised construction still results in a guarded $\mathsf{TL}$ expression, when the input logical formula is guarded as well.

Coming back to the proof of the extended version of Proposition C.5, let us show the proof for the the fact that $\rho_{1}\bigl{(}\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)},$ the other two items being analogous. Assume that there is a pair $(G,v)$ and $(H,w)$ which is not in $\rho_{1}\bigl{(}\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ . Then, by Proposition C.1, applied on graphs with real-valued labels, there exists a formula $\varphi(x)$ in $\mathsf{GC}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ such that $[\![\varphi,v]\!]_{G}^{\mathbb{B}}=\top\neq[\![\varphi,w]\!]_{H}^{\mathbb{B}}=\bot$ . We remark that $\varphi(x)$ uses finitely many $P_{s,r}$ predicates. Let $R$ be the set of real values used in both $G$ and $H$ (and $\varphi(x)$ ). We note that $R$ is finite. We invoke the construction sketched above, and obtain a formula $\hat{\varphi}$ in $\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ such that $[\![\tilde{\varphi},v]\!]_{G}\neq[\![\tilde{\varphi},w]\!]_{H}$ . Hence, $(G,v)$ and $(H,w)$ is not in $\rho_{1}\bigl{(}\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}$ either, for any $\Omega$ , which was to be shown.

Appendix D Details of Section 5

We here provide some additional details on the encoding of layers of $\mathsf{GNN}\text{s}$ in our tensor languages, and how, as a consequence of our results from Section 4, one obtains a bound on their separation power. This section showcases that it is relatively straightforward to represent $\mathsf{GNN}\text{s}$ in our tensor languages. Indeed, often, a direct translation of the layers, as defined in the literature, suffices.

D.1 color Refinement

We start with $\mathsf{GNN}$ architectures related to color refinement, or in other words, architectures which can be represented in our guarded tensor language.

GraphSage.

We first consider a “basic” $\mathsf{GNN}$ , that is, an instance of GraphSage (Hamilton et al., 2017) in which sum aggregation is used. The initial features are given by ${\bm{F}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}=({\bm{f}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}_{1},\ldots,{\bm{f}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}_{d_{0}})$ where ${\bm{f}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}_{i}\in\mathbb{R}^{n\times 1}$ is a hot-one encoding of the $i$ th vertex label in $G$ . We can represent the initial embedding easily in $\mathsf{GTL}^{\!\scalebox{0.6}{(}0\scalebox{0.6}{)}}$ , without the use of any summation. Indeed, it suffices to define $\varphi_{i}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}(x_{1}):=P_{i}(x_{1})$ for $i\in[d_{0}]$ . We have ${F}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}_{vj}=[\![\varphi_{j}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}},v]\!]_{G}$ for $j\in[d_{0}]$ , and thus the initial features can be represented by simple expressions in $\mathsf{GTL}^{\!\scalebox{0.6}{(}0\scalebox{0.6}{)}}$ .

Assume now, by induction, that we can also represent the features computed by a basic $\mathsf{GNN}$ in layer $t-1$ . That is, let ${\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}\in\mathbb{R}^{n\times d_{t-1}}$ be those features and for each $i\in[d_{t-1}]$ let $\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1})$ be expressions in $\mathsf{GTL}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(\sigma)$ representing them. We assume that, for each $i\in[d_{t-1}]$ , ${F}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{vi}=[\![\varphi^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{i},v]\!]_{G}$ . We remark that we assume that a summation depth of $t-1$ is needed for layer $t-1$ .

Then, in layer $t$ , a basic $\mathsf{GNN}$ computes the next features as

{\bm{F}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}:=\sigma\left({\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}\cdot{\bm{V}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}+{\bm{A}}\cdot{\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}\cdot{\bm{W}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}+{\bm{B}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\right),

where ${\bm{A}}\in\mathbb{R}^{n\times n}$ is the adjacency matrix of $G$ , ${\bm{V}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ and ${\bm{W}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ are weight matrices in $\mathbb{R}^{d_{t-1}\times d_{t}}$ , ${\bm{B}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\in\mathbb{R}^{n\times d_{t}}$ is a (constant) bias matrix consist of $n$ copies of ${\bm{b}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\in\mathbb{R}^{d_{t}}$ , and $\sigma$ is some activation function. We can simply use the following expressions $\varphi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1})$ , for $j\in[d_{t}]$ :

\sigma\left(\Bigl{(}\sum_{i=1}^{d_{t-1}}\!{V}_{ij}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1})\Bigr{)}+\sum_{x_{2}}\,\biggl{(}E(x_{1},x_{2})\cdot\Bigl{(}\sum_{i=1}^{d_{t-1}}\!{W}_{ij}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{2})\Bigr{)}\biggl{)}{}+b_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\cdot\bm{1}_{x_{1}=x_{1}}\right).

Here, ${W}_{ij}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ , ${V}_{ij}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ and $b_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ are real values corresponding the weight matrices and bias vector in layer $t$ . These are expressions in $\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\sigma)$ since the additional summation is guarded, and combined with the summation depth of $t-1$ of $\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}$ , this results in a summation depth of $t$ for layer $t$ . Furthermore, ${F}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{vi}=[\![\varphi^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{i},v]\!]_{G}$ , as desired. If we denote by $\mathsf{b}\mathsf{GNN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ the class of $t$ -layered basic $\mathsf{GNN}\text{s}$ , then our results imply

\rho_{1}\bigl{(}\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}(\bigl{(}\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{b}\mathsf{GNN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)},

and thus the separation power of basic $\mathsf{GNN}\text{s}$ is bounded by the separation power of color refinement. We thus recover known results by Xu et al. (2019) and Morris et al. (2019).

Furthermore, if one uses a readout layer in basic $\mathsf{GNN}\text{s}$ to obtain a graph embedding, one typically applies a function $\mathsf{ro}:\mathbb{R}^{d_{t}}\to\mathbb{R}^{d_{t}}$ in the form of $\mathsf{ro}\bigl{(}\sum_{v\in V_{G}}{\bm{F}}_{v}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ , in which aggregation takes places over all vertices of the graph. This corresponds to an expression in $\mathsf{TL}_{2}^{\!\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}(\sigma,\mathsf{ro})$ : $\varphi_{j}:=\mathsf{ro}_{j}\bigl{(}\sum_{x_{1}}\varphi_{j}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1})\bigr{)}$ , where $\mathsf{ro}_{j}$ is the projection of the readout function on the $j$ the coordinate. We note that this is indeed not a guarded expression anymore, and thus our results tell that

\rho_{0}\bigl{(}\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{0}(\mathsf{TL}_{2}^{\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}(\Omega)\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{b}\mathsf{GNN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}+\mathsf{readout}\bigr{)}.

More generally, GraphSage allows for the use of general aggregation functions $F$ on the multiset of features of neighboring vertices. To cast the corresponding layers in $\mathsf{TL}(\Omega)$ , we need to consider the extension $\mathsf{TL}(\Omega,\Theta)$ with an appropriate set $\Theta$ of aggregation functions, as described in Section C.5. In this way, we can represent layer $t$ by means of the following expressions $\varphi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1})$ , for $j\in[d_{t}]$ .

\sigma\left(\Bigl{(}\sum_{i=1}^{d_{t-1}}\!{V}_{ij}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1})\Bigr{)}+\sum_{i=1}^{d_{t-1}}\!{W}_{ij}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\cdot\mathsf{aggr}_{x_{2}}^{F}\Bigl{(}\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{2})\mid E(x_{1},x_{2})\Bigr{)}{}+b_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\cdot\bm{1}_{x_{1}=x_{1}}\right),

which is now an expression in $\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\{\sigma\},\Theta)$ and hence the bound in terms of $t$ iterations of color refinement carries over by Proposition C.6. Here, $\Theta$ simply consists of the aggregation functions used in the layers in GraphSage.

GCNs.

Graph Convolution Networks ( $\mathsf{GCN}\text{s}$ ) (Kipf & Welling, 2017) operate alike basic $\mathsf{GNN}\text{s}$ except that a normalized Laplacian ${\bm{D}}^{-1/2}({\bm{I}}+{\bm{A}}){\bm{D}}^{-1/2}$ is used to aggregate features, instead of the adjacency matrix ${\bm{A}}$ . Here, ${\bm{D}}^{-1/2}$ is the diagonal matrix consisting of reciprocal of the square root of the vertex degrees in $G$ plus $1$ . The initial embedding ${\bm{F}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}$ is just as before. We use again $d_{t}$ to denote the number of features in layer $t$ . In layer $t>0$ , a $\mathsf{GCN}$ computes ${\bm{F}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}:=\sigma({\bm{D}}^{-1/2}({\bm{I}}+{\bm{A}}){\bm{D}}^{-1/2}\cdot{\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}{\bm{W}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}+{\bm{B}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ . If, in addition to the activation function $\sigma$ we add the function $\frac{1}{\sqrt{x+1}}:\mathbb{R}\to\mathbb{R}:x\mapsto\frac{1}{\sqrt{x+1}}$ to $\Omega$ , we can represent the $\mathsf{GCN}$ layer, as follows. For $j\in[d_{t}]$ , we define the $\mathsf{GTL}^{\!\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}(\sigma,\frac{1}{\sqrt{x+1}})$ expressions

\varphi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1}):=\sigma\Biggl{(}f_{1/\sqrt{x+1}}\bigl{(}\sum_{x_{2}}E(x_{1},x_{2})\bigr{)}\cdot\Bigl{(}\sum_{i=1}^{d_{t-1}}\!{W}_{ij}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1})\Bigr{)}\cdot f_{1/\sqrt{x+1}}\bigl{(}\sum_{x_{2}}E(x_{1},x_{2})\bigr{)}\\ {}+f_{1/\sqrt{x+1}}\bigl{(}\sum_{x_{2}}E(x_{1},x_{2})\bigr{)}\cdot\Bigl{(}\sum_{x_{2}}E(x_{1},x_{2})\cdot f_{1/\sqrt{x+1}}\bigl{(}\sum_{x_{1}}E(x_{2},x_{1})\bigr{)}\cdot\Bigl{(}\sum_{i=1}^{d_{t-1}}\!{W}_{ij}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{2})\Bigr{)}\Biggr{)},

where we omitted the bias vector for simplicity. We again observe that only guarded summations are needed. However, we remark that in every layer we now add two the overall summation depth, since we need an extra summation to compute the degrees. In other words, a $t$ -layered $\mathsf{GCN}$ correspond to expressions in $\mathsf{GTL}^{\!\scalebox{0.6}{(}2t\scalebox{0.6}{)}}(\sigma,\frac{1}{\sqrt{x+1}})$ . If we denote by $\mathsf{GCN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ the class of $t$ -layered $\mathsf{GCN}\text{s}$ , then our results imply

\rho_{1}\bigl{(}\mathsf{cr}^{(2t)}\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{GTL}^{\!\scalebox{0.6}{(}2t\scalebox{0.6}{)}}(\Omega)\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{GCN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}.

We remark that another representation can be provided, in which the degree computation is factored out (Geerts et al., 2021a), resulting in a better upper bound $\rho_{1}\bigl{(}\mathsf{cr}^{(t+1)}\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{GCN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ . In a similar way as for basic $\mathsf{GNN}\text{s}$ , we also have $\rho_{0}\bigl{(}\mathsf{gcr}^{(t+1)}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{GCN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}+\mathsf{readout}\bigr{)}$ .

SGCs.

As an other example, we consider a variation of Simple Graph Convolutions ( $\mathsf{SGC}\text{s}$ ) (Wu et al., 2019), which use powers the adjacency matrix and only apply a non-linear activation function at the end. That is, ${\bm{F}}:=\sigma({\bm{A}}^{p}\cdot{\bm{F}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}\cdot{\bm{W}})$ for some $p\in\mathbb{N}$ and ${\bm{W}}\in\mathbb{R}^{d_{0}\times d_{1}}$ . We remark that $\mathsf{SGC}\text{s}$ actually use powers of the normalized Laplacian, that is, ${\bm{F}}:=\sigma\bigl{(}({\bm{D}}^{-1/2}({\bm{I}}+{\bm{A}}_{G}){\bm{D}}^{-1/2}))^{p}\cdot{\bm{F}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}\cdot{\bm{W}}\bigr{)}$ but this only incurs an additional summation depth as for $\mathsf{GCN}\text{s}$ . We focus here on our simpler version. It should be clear that we can represent the architecture in $\mathsf{TL}_{p+1}^{\!\scalebox{0.6}{(}p\scalebox{0.6}{)}}(\Omega)$ by means of the expressions:

\varphi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1}):=\sigma\left(\sum_{x_{2}}\cdots\sum_{x_{p+1}}\prod_{k=1}^{p}E(x_{k},x_{k+1})\cdot\Bigl{(}\sum_{i=1}^{d_{0}}{W}_{ij}\cdot\varphi_{i}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}(x_{p+1})\Bigr{)}\right),

for $j\in[d_{1}]$ . A naive application of our results would imply an upper bound on their separation power by $p\text{-}\mathsf{WL}$ . We can, however, use Proposition 4.5. Indeed, it is readily verified that these expressions have a treewidth of one, because the variables form a path. And indeed, when for example, $p=3$ , we can equivalently write $\varphi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1})$ as

\sigma\left(\sum_{x_{2}}E(x_{1},x_{2})\cdot\biggl{(}\sum_{x_{1}}E(x_{2},x_{1})\cdot\Bigl{(}\sum_{x_{2}}E(x_{1},x_{2})\cdot\bigl{(}\sum_{i=1}^{d_{0}}{W}_{ij}\cdot\varphi_{i}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}(x_{2})\bigr{)}\Bigr{)}\biggl{)}\right),

by reordering the summations and reusing index variables. This holds for arbitrary $p$ . We thus obtain guarded expressions in $\mathsf{GTL}^{\!\scalebox{0.6}{(}p\scalebox{0.6}{)}}(\sigma)$ and our results tell that $t$ -layered $\mathsf{SGC}\text{s}$ are bounded by $\mathsf{cr}^{\scalebox{0.6}{(}p\scalebox{0.6}{)}}$ for vertex embeddings, and by $\mathsf{gcr}^{\scalebox{0.6}{(}p\scalebox{0.6}{)}}$ for $\mathsf{SGC}\text{s}+\mathsf{readout}$ .

Principal Neighbourhood Aggregation.

Our next example is a $\mathsf{GNN}$ in which different aggregation functions are used: Principal Neighborhood Aggregation $(\mathsf{PNA})$ is an architecture proposed by Corso et al. (2020) in which aggregation over neighboring vertices is done by means of $\mathsf{mean}$ , $\mathsf{stdv}$ , $\mathsf{max}$ and $\mathsf{min}$ , and this in parallel. In addition, after aggregation, three different scalers are applied. Scalers are diagonal matrices whose diagonal entries are a function of the vertex degrees. Given the features for each vertex $v$ computed in layer $t-1$ , that is, ${\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{v:}\in\mathbb{R}^{1\times\ell}$ , a $\mathsf{PNA}$ computes $v$ ’s new features in layer $t$ in the following way (see layer definition (8) in (Corso et al., 2020)). First, vectors ${\bm{G}}_{v:}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\in\mathbb{R}^{1\times 4\ell}$ are computed such that

{G}_{vj}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}=\begin{cases}\mathsf{mean}\left(\{\!\{\mathsf{mlp}_{j}({F}_{w:}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}})\mid w\in N_{G}(v)\}\!\}\right)&\text{for $1\leq j\leq\ell$}\\ \mathsf{stdv}\left(\{\!\{\mathsf{mlp}_{j}({F}_{w:}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}})\mid w\in N_{G}(v)\}\!\}\right)&\text{for $\ell+1\leq j\leq 2\ell$}\\ \mathsf{max}\left(\{\!\{\mathsf{mlp}_{j}({F}_{w:}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}})\mid w\in N_{G}(v)\}\!\}\right)&\text{for $2\ell+1\leq j\leq 3\ell$}\\ \mathsf{min}\left(\{\!\{\mathsf{mlp}_{j}({F}_{w:}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}})\mid w\in N_{G}(v)\}\!\}\right)&\text{for $3\ell+1\leq j\leq 4\ell$},\end{cases}

where $\mathsf{mlp}_{j}:\mathbb{R}^{\ell}\to\mathbb{R}$ is the projection of an $\mathsf{MLP}$ $\mathsf{mlp}:\mathbb{R}^{\ell}\to\mathbb{R}^{\ell}$ on the $j$ th coordinate. Then, three different scalers are applied. The first scaler is simply the identity, the second two scalers $s_{1}$ and $s_{2}$ depend on the vertex degrees. As such, vectors ${\bm{H}}_{v:}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\in\mathbb{R}^{12\ell}$ are constructed as follows:

{H}_{vj}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}=\begin{cases}{H}_{vj}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}&\text{for $1\leq j\leq 4\ell$}\\ s_{1}(\mathsf{deg}_{G}(v))\cdot{H}_{vj}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}&\text{for $4\ell+1\leq j\leq 8\ell$}\\ s_{2}(\mathsf{deg}_{G}(v))\cdot{H}_{vj}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}&\text{for $8\ell+1\leq j\leq 12\ell$},\end{cases}

where $s_{1}$ and $s_{2}$ are functions from $\mathbb{R}\to\mathbb{R}$ (see (Corso et al., 2020) for details). Finally, the new vertex embedding is obtained as

{\bm{F}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{v:}=\mathsf{mlp}^{\prime}({\bm{H}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{v:})

for some $\mathsf{MLP}$ $\mathsf{mlp}^{\prime}:\mathbb{R}^{12\ell}\to\mathbb{R}^{\ell}$ . The above layer definition translates naturally into expressions in $\mathsf{TL}(\Omega,\Theta)$ , the extension of $\mathsf{TL}(\Omega)$ with aggregate functions (Section C.5). Indeed, suppose that for each $j\in[\ell]$ we have $\mathsf{TL}(\Omega,\Theta)$ expressions $\varphi_{j}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1})$ such that $[\![\varphi_{j}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}},v]\!]_{G}={F}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{vj}$ for any vertex $v$ . Then, ${G}_{vj}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ simply corresponds to the guarded expressions

\psi^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{j}(x_{1}):=\mathsf{aggr}_{x_{2}}^{\mathsf{mean}}(\mathsf{mlp}_{j}(\varphi_{1}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{2}),\ldots,\varphi_{\ell}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{2}))\mid E(x_{1},x_{2})),

for $1\leq j\leq\ell$ , and similarly for the other components of ${\bm{G}}_{v:}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ using the respective aggregation functions, $\mathsf{stdv}$ , $\mathsf{max}$ and $\mathsf{min}$ . Then, ${H}_{vj}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ corresponds to

\xi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1})=\begin{cases}\psi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1})&\text{for $1\leq j\leq 4\ell$}\\ s_{1}(\mathsf{aggr}_{x_{2}}^{\mathsf{sum}}(\bm{1}_{x_{2}=x_{2}}\mid E(x_{1},x_{2})))\cdot\psi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1})&\text{for $4\ell+1\leq j\leq 8\ell$}\\ s_{2}(\mathsf{aggr}_{x_{2}}^{\mathsf{sum}}(\bm{1}_{x_{2}=x_{2}}\mid E(x_{1},x_{2})))\cdot\psi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1})&\text{for $8\ell+1\leq j\leq 12\ell$},\end{cases}

where we use summation aggregation to compute the degree information used in the functions in the scalers $s_{1}$ and $s_{2}$ . And finally,

\varphi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}:=\mathsf{mlp}_{j}^{\prime}(\xi_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1}),\ldots,\xi_{12\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1}))

represents ${F}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{vj}$ . We see that all expressions only use two index variables and aggregation is applied in a guarded way. Furthermore, in each layer, the aggregation depth increases with one. As such, a $t$ -layered $\mathsf{PNA}$ can be represented in $\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega,\Theta)$ , where $\Omega$ consists of the $\mathsf{MLP}\text{s}$ and functions used in scalers, and $\Theta$ consists of $\mathsf{sum}$ (for computing vertex degrees), and $\mathsf{mean}$ , $\mathsf{stdv}$ , $\mathsf{max}$ and $\mathsf{min}$ . Proposition C.6 then implies a bound on the separation power by $\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ .

Other example.

In the same way, one can also easily analyze $\mathsf{GAT}\text{s}$ (Velickovic et al., 2018) and show that these can be represented in $\mathsf{GTL}(\Omega)$ as well, and thus bounds by color refinement can be obtained.

D.2 $k$ -dimensional Weisfeiler-Leman tests

We next discuss architectures related to the $k$ -dimensional Weisfeiler-Leman algorithms. For $k=1$ , we discussed the extended $\mathsf{GIN}\text{s}$ in the main paper. We here focus on arbitrary $k\geq 2$ .

Folklore GNNs.

We first consider the “Folklore” $\mathsf{GNN}\text{s}$ or $k\text{-}\mathsf{FGNN}\text{s}$ for short (Maron et al., 2019b). For $k\geq 2$ , $k\text{-}\mathsf{FGNN}\text{s}$ computes a tensors. In particular, the initial tensor ${\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}$ encodes $\mathsf{atp}_{k}(G,{\bm{v}})$ for each ${\bm{v}}\in V_{G}^{k}$ . We can represent this tensor by the following $k^{2}(\ell+2)$ expressions in $\mathsf{TL}_{k}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}$ :

\varphi_{r,s,j}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}(x_{1},\ldots,x_{k}):=\begin{cases}\bm{1}_{x_{r}=x_{s}}\cdot P_{j}(x_{r})&\text{for $j\in[\ell]$}\\ E(x_{r},x_{s})&\text{for $j=\ell+1$}\\ \bm{1}_{x_{r}=x_{s}}&\text{for $j=\ell+2$}\end{cases},

for $r,s\in[k]$ and $j\in[\ell+2]$ . We note: $[\![\varphi^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}_{r,s,j},(v_{1},\ldots,v_{k})]\!]_{G}={\mathsfit{F}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}_{v_{1},\ldots,v_{k},r,s,j}$ for all $(r,s,j)\in[k]^{2}\times[\ell+2]$ , as desired. We let $\tau_{0}:=[k]^{2}\times[\ell+2]$ and set $d_{0}=k^{2}\times(\ell+2)$ .

Then, in layer $t$ , a $k\text{-}\mathsf{FGNN}$ computes a tensor

{\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{v_{1},\ldots,v_{k},\bullet}:=\mathsf{mlp}_{0}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigl{(}{\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{v_{1},\ldots,v_{k},\bullet},\sum_{w\in V_{G}}\prod_{s=1}^{k}\mathsf{mlp}_{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}({\bm{\mathsfit{F}}}_{v_{1},\ldots,v_{s-1},w,v_{s+1},\ldots,v_{k},\bullet}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}})\bigr{)},

where $\mathsf{mlp}_{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}:\mathbb{R}^{d_{t-1}}\to\mathbb{R}^{d_{t}^{\prime}}$ , for $s\in[k]$ , and and $\mathsf{mlp}_{0}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}:\mathbb{R}^{d_{t-1}\times d_{t}^{\prime}}\to\mathbb{R}^{d_{t}}$ are $\mathsf{MLP}\text{s}$ . We here use $\bullet$ to denote combinations of indices in $\tau_{d}$ for ${\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ and in $\tau_{d-1}$ for ${\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}$ .

Let ${\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}\in\mathbb{R}^{n^{k}\times d_{t-1}}$ be the tensor computed by an $k\text{-}\mathsf{FGNN}$ in layer $t-1$ . Assume that for each tuple of elements ${\bm{j}}$ in $\tau_{d_{t-1}}$ we have an expression $\varphi_{{\bm{j}}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1},\ldots,x_{k})$ satisfying $[\![\varphi^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{{\bm{j}}},(v_{1},\ldots,v_{k})]\!]_{G}={\mathsfit{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{v_{1},\ldots,v_{k},{\bm{j}}}$ and such that it is an expression in $\mathsf{TL}_{k+1}^{\!\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(\Omega)$ . That is, we need $k+1$ index variables and a summation depth of $t-1$ to represent layer $t-1$ .

Then, for layer $t$ , for each ${\bm{j}}\in\tau_{d_{t}}$ , it suffices to consider the expression

\varphi_{\bm{j}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1},\ldots,x_{k}):=\mathsf{mlp}_{0,{\bm{j}}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\Bigl{(}\bigl{(}\varphi_{{\bm{i}}}^{(t-1)}(x_{1},\ldots,x_{k})\bigr{)}_{{\bm{i}}\in\tau_{d_{t-1}}},{}\\ \sum_{x_{k+1}}\,\prod_{s=1}^{k}\mathsf{mlp}_{s,{\bm{j}}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigl{(}(\varphi_{{\bm{i}}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1},\ldots,x_{s-1},x_{k+1},x_{s+1},\ldots,x_{k})\bigr{)}_{{\bm{i}}\in\tau_{d_{t-1}}}\Bigr{)},

where $\mathsf{mlp}_{o,{\bm{j}}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ and $\mathsf{mlp}_{s,{\bm{j}}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ are the projections of the $\mathsf{MLP}\text{s}$ on the ${\bm{j}}$ -coordinates. We remark that we need $k+1$ index variables, and one extra summation is needed. We thus obtain expressions in $\mathsf{TL}_{k+1}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)$ for the $t$ th layer, as desired. We remark that the expressions are simple translations of the defining layer definitions. Also, in this case, $\Omega$ consists of all $\mathsf{MLP}\text{s}$ . When a $k\text{-}\mathsf{FGNN}$ is used for vertex embeddings, we now simply add to each expression a factor $\prod_{s=1}^{k}\bm{1}_{x_{1}=x_{s}}$ . As an immediate consequence of our results, if we denote by $k\text{-}\mathsf{FGNN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ the class of $t$ -layered $k\text{-}\mathsf{FGNN}\text{s}$ , then for vertex embeddings:

\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{TL}_{k+1}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}\subseteq\rho_{1}\bigl{(}k\text{-}\mathsf{FGNN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}

in accordance with the known results from Azizian & Lelarge (2021). When used for graph embeddings, an aggregation layer over all $k$ -tuples of vertices is added, followed by the application of an $\mathsf{MLP}$ . This results in expressions with no free index variables, and of summation depth $t+k$ , where the increase with $k$ stems from the aggregation process over all $k$ -tuples. In view of our results, for graph embeddings:

\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{TL}_{k+1}(\Omega)\bigr{)}\subseteq\rho_{0}\bigl{(}k\text{-}\mathsf{FGNN}\bigr{)}

in accordance again with Azizian & Lelarge (2021). We here emphasize that the upper bounds in terms of $k\text{-}\mathsf{WL}$ are obtained without the need to know how $k\text{-}\mathsf{WL}$ works. Indeed, one can really just focus on casting layers in the right tensor language!

We remark that Azizian & Lelarge (2021) define vertex embedding $k\text{-}\mathsf{FGNN}\text{s}$ in a different way. Indeed, for a vertex $v$ , its embedding is obtained by aggregating of all $(k-1)$ tuples in the remaining coordinates of the tensors. They define $\mathsf{vwl}_{k}$ accordingly. From the tensor language point of view, this corresponds to the addition of $k-1$ to the summation depth. Our results indicate that we loose the connection between rounds and layers, as in Azizian & Lelarge (2021). This is the reason why we defined vertex embedding $k\text{-}\mathsf{FGNN}\text{s}$ in a different way and can ensure a correspondence between rounds and layers for vertex embeddings.

Other higher-order examples.

It is readily verified that $t$ -layered $k$ - $\mathsf{GNN}\text{s}$ (Morris et al., 2019) can be represented in $\mathsf{TL}_{k+1}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)$ , recovering the known upper bound by $\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ (Morris et al., 2019). It is an equally easy exercise to show that $2\text{-}\mathsf{WL}$ -convolutions (Damke et al., 2020) and Ring- $\mathsf{GNN}\text{s}$ (Chen et al., 2019) are bounded by $2\text{-}\mathsf{WL}$ , by simply writing their layers in $\mathsf{TL}_{3}(\Omega)$ . The invariant graph networks ( $k\text{-}\mathsf{IGN}\text{s}$ ) (Maron et al., 2019b) will be treated in Section E, as their representation in $\mathsf{TL}_{k+1}(\Omega)$ requires some work.

D.3 Augmented GNNs

Higher-order $\mathsf{GNN}$ architectures such as $k$ - $\mathsf{GNN}\text{s}$ , $k\text{-}\mathsf{FGNN}\text{s}$ and $k\text{-}\mathsf{IGN}\text{s}$ , incur a substantial cost in terms of memory and computation (Morris et al., 2020). Some recent proposals infuse more efficient $\mathsf{GNN}\text{s}$ with higher-order information by means of some pre-processing step. We next show that the tensor language approach also enables to obtain upper bounds on the separation power of such “augmented” $\mathsf{GNN}\text{s}$ .

We first consider $\mathcal{F}\text{-}\mathsf{MPNN}\text{s}$ (Barceló et al., 2021) in which the initial vertex features are augmented with homomorphism counts of rooted graph patterns. More precisely, let $P^{r}$ be a connected rooted graph (with root vertex $r$ ), and consider a graph $G=(V_{G},E_{G},\mathsf{col}_{G})$ and vertex $v\in V_{G}$ . Then, $\mathsf{hom}(P^{r},G^{v})$ denotes the number of homomorphism from $P$ to $G$ , mapping $r$ to $v$ . We recall that a homomorphism is an edge-preserving mapping between vertex sets. Given a collection $\mathcal{F}=\{P_{1}^{r},\ldots,P_{\ell}^{r}\}$ of rooted patterns, an $\mathcal{F}\text{-}\mathsf{MPNN}$ runs an $\mathsf{MPNN}$ on the augmented initial vertex features:

\tilde{{\bm{F}}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}_{v:}:=({\bm{F}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}_{v:},\mathsf{hom}(P_{1}^{r},G^{v}),\ldots,\mathsf{hom}(P_{\ell}^{r},G^{v})).

Now, take any $\mathsf{GNN}$ architecture that can be cast in $\mathsf{GTL}(\Omega)$ or $\mathsf{TL}_{2}(\Omega)$ and assume, for simplicity of exposition, that a $t$ -layer $\mathsf{GNN}$ corresponds to expressions in $\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)$ or $\mathsf{TL}_{2}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)$ . In order to analyze the impact of the augmented features, one only needs to revise the expressions $\varphi_{j}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}(x_{1})$ that represent the initial features. In the absence of graph patterns, $\varphi_{j}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}(x_{1}):=P_{j}(x_{1})$ , as we have seen before. By contrast, to represent $\tilde{{\bm{F}}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}_{vj}$ we need to cast the computation of $\mathsf{hom}(P_{i}^{r},G^{v})$ in $\mathsf{TL}$ . Assume that the graph pattern $P_{i}$ consists of $p$ vertices and let us identify the vertex set with $[p]$ . Furthermore, without of loss generality, we assume that vertex “ $1$ ” is the root vertex in $P_{i}$ . To obtain $\mathsf{hom}(P_{i}^{r},G^{v})$ we need to create an indicator function for the graph pattern $P_{i}$ and then count how many times this indicator value is equal to one in $G$ . The indicator function for $P_{i}$ is simply given by the expression $\prod_{uv\in E_{P_{i}}}E(x_{u},x_{v})$ . Then, counting just pours down to summation over all index variables except the one for the root vertex. More precisely, if we define

\varphi_{P_{i}}(x_{1}):=\sum_{x_{2}}\cdots\sum_{x_{p}}\prod_{uv\in E_{P_{i}}}E(x_{u},x_{v}),

then $[\![\varphi_{P_{i}},v]\!]_{G}=\mathsf{hom}(P_{i}^{r},G^{v})$ . This encoding results in an expression in $\mathsf{TL}_{p}$ . However, it is well-known that we can equivalently write $\varphi_{P_{i}}(x_{1})$ as an expression $\tilde{\varphi}_{P_{i}}(x_{1})$ in $\mathsf{TL}_{k+1}$ where $k$ is the treewidth of the graph $P_{i}$ . As such, our results imply that $\mathcal{F}\text{-}\mathsf{MPNN}\text{s}$ are bounded in separation power by $k\text{-}\mathsf{WL}$ where $k$ is the maximal treewidth of graphs in $\mathcal{F}$ . We thus recover the known upper bound as given in Barceló et al. (2021) using our tensor language approach.

Another example of augmented $\mathsf{GNN}$ architectures are the Graph Substructure Networks ( $\mathsf{GSN}\text{s})$ (Bouritsas et al., 2020). By contrast to $\mathcal{F}\text{-}\mathsf{MPNN}\text{s}$ , subgraph isomorphism counts rather than homomorphism counts are used to augment the initial features. At the core of a $\mathsf{GSN}$ thus lies the computation of $\mathsf{sub}(P^{r},G^{v})$ , the number of subgraphs $H$ in $G$ isomorphic to $P$ (and such that the isomorphisms map $r$ to $v$ ). In a similar way as for homomorphisms counts, we can either directly cast the computation of $\mathsf{sub}(P^{r},G^{v})$ in $\mathsf{TL}$ resulting again in the use of $p$ index variables. A possible reduction in terms of index variables, however, can be obtained by relying on the result (Theorem 1.1.) by Curticapean et al. (2017) in which it shown that $\mathsf{sub}(P^{r},G^{v})$ can be computed in terms of homomorphism counts of graph patterns derived from $P^{r}$ . More precisely, Curticapean et al. (2017) define $\mathsf{spasm}(P^{r})$ as the set of graphs consisting of all possible homomorphic images of $P^{r}$ . It is then readily verified that if the maximal treewidth of the graphs in $\mathsf{spasm}(P^{r})$ is $k$ , then $\mathsf{sub}(P^{r},G^{v})$ can be cast as an expression in $\mathsf{TL}_{k+1}$ . Hence, $\mathsf{GSN}s$ using a pattern collection $\mathcal{F}$ can be represented in $\mathsf{TL}_{k+1}$ , where $k$ is the maximal treewidth of graphs in any of the spams of patterns in $\mathcal{F}$ , and thus are bounded in separation power $k\text{-}\mathsf{WL}$ in accordance to the results by Barceló et al. (2021).

As a final example, we consider the recently introduced Message Passing Simplicial Networks ( $\mathsf{MPSN}$ s) (Bodnar et al., 2021). In a nutshell, $\mathsf{MPSN}$ s are run on simplicial complexes of graphs instead of on the original graphs. We sketch how our tensor language approach can be used to assess the separation power of $\mathsf{MPSN}$ s on clique complexes. We use the simplified version of $\mathsf{MPSN}$ s which have the same expressive power as the full version of $\mathsf{MPSN}$ s (Theorem 6 in Bodnar et al. (2021)).

We recall some definitions. Let $\mathsf{Cliques}(G)$ denote the set of all cliques in $G$ . Given two cliques $c$ and $c^{\prime}$ in $\mathsf{Cliques}(G)$ , define $c\prec c^{\prime}$ if $c\subset c^{\prime}$ and there exists no $c^{\prime\prime}$ in $\mathsf{Cliques}(G)$ , such that $c\subset c^{\prime\prime}\subset c^{\prime}$ . We define $\mathsf{Boundary}(c,G):=\{c^{\prime}\in\mathsf{Cliques}(G)\mid c^{\prime}\prec c\}$ and $\mathsf{Upper}(c,G):=\{c^{\prime}\in\mathsf{Cliques}(G)\mid\exists c^{\prime\prime}\in\mathsf{Cliques}(G),c^{\prime}\prec c^{\prime\prime}\text{ and }c\prec c^{\prime\prime}\}$ .

For each $c$ in $\mathsf{Cliques}(G)$ we have an initial feature vector ${\bm{F}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}_{c:}\in\mathbb{R}^{1\times\ell}$ . Bodnar et al. (2021) initialize all initial features with the same value. Then, in layer $t$ , for each $c\in\mathsf{Cliques}(G)$ , features are updated as follows:

	$\displaystyle{\bm{G}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{c:}$	$\displaystyle=F_{B}(\{\!\{\mathsf{mlp}_{B}({\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{c:},{\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{c^{\prime}:})\mid c^{\prime}\in\mathsf{Boundary}(G,c)\}\!\})$
	$\displaystyle{\bm{H}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{c:}$	$\displaystyle=F_{U}(\{\!\{\mathsf{mlp}_{U}({\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{c:},{\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{c^{\prime}:},{\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{c\cup c^{\prime}:})\mid c^{\prime}\in\mathsf{Upper}(G,c)\}\!\})$
	$\displaystyle{\bm{F}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{c:}$	$\displaystyle=\mathsf{mlp}({\bm{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{c:},{\bm{G}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{c:},{\bm{H}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{c:}),$

where $F_{B}$ and $F_{U}$ are aggregation functions and $\mathsf{mlp}_{B}$ , $\mathsf{mlp}_{U}$ and $\mathsf{mlp}$ are $\mathsf{MLP}\text{s}$ . With some effort, one can represent these computations by expressions in $\mathsf{TL}_{p}(\Omega,\Theta)$ where $p$ is largest clique in $G$ . As such, the separation power of clique-complex $\mathsf{MPSN}$ s on graphs of clique size at most $p$ is bounded by $p-1\text{-}\mathsf{WL}$ . And indeed, Bodnar et al. (2021) consider Rook’s $4\times 4$ graph, which contains a $4$ -clique, and the Shirkhande graph, which does not contain a $4$ -clique. As such, the analysis above implies that clique-complex $\mathsf{MPSN}$ s are bounded by $2\text{-}\mathsf{WL}$ on the Shrikhande graph, and by $3\text{-}\mathsf{WL}$ on Rook’s graph, consistent with the observation in Bodnar et al. (2021). A more detailed analysis of $\mathsf{MPSN}$ s in terms of summation depth and for other simplicial complexes is left as future work.

This illustrates again that our approach can be used to assess the separation power of a variety of $\mathsf{GNN}$ architectures in terms of $k\text{-}\mathsf{WL}$ , by simply writing them as tensor language expressions. Furthermore, bounds in terms of $k\text{-}\mathsf{WL}$ can be used for augmented $\mathsf{GNN}\text{s}$ which form a more efficient way of incorporating higher-order graph structural information than higher-order $\mathsf{GNN}\text{s}$ .

D.4 Spectral GNNs

In general, spectral $\mathsf{GNN}\text{s}$ are defined in terms of eigenvectors and eigenvalues of the (normalized) graph Laplacian (Bruna et al., 2014; Defferrard et al., 2016; Levie et al., 2019; Balcilar et al., 2021b)). The diagonalization of the graph Laplacian is, however, avoided in practice, due to its excessive cost. Instead, by relying on approximation results in spectral graph analysis (Hammond et al., 2011), the layers of practical spectral $\mathsf{GNN}\text{s}$ are defined in term propagation matrices consisting of functions, which operate directly on the graph Laplacian. This viewpoint allows for a spectral analysis of spectral and “spatial” $\mathsf{GNN}\text{s}$ in a uniform way, as shown by Balcilar et al. (2021b). In this section, we consider two specific instances of spectral $\mathsf{GNN}\text{s}$ : $\mathsf{ChebNet}$ (Defferrard et al., 2016) and $\mathsf{CayleyNet}$ (Levie et al., 2019), and assess their separation power in terms of tensor logic. Our general results then provide bounds on their separation power in terms color refinement and $2\text{-}\mathsf{WL}$ , respectively.

Chebnet.

The separation power of $\mathsf{ChebNet}$ (Defferrard et al., 2016) was already analyzed in Balcilar et al. (2021a) by representing them in the $\mathsf{MATLANG}$ matrix query language (Brijder et al., 2019). It was shown (Theorem 2 (Balcilar et al., 2021a)) that it is only the maximal eigenvalue $\lambda_{\max}$ of the graph Laplacian used in the layers of $\mathsf{ChebNet}$ that may result in the separation power of $\mathsf{ChebNet}$ to go beyond $1\text{-}\mathsf{WL}$ . We here revisit and refine this result by showing that, when ignoring the use of $\lambda_{\max}$ , the separation power of $\mathsf{Chebnet}$ is bounded already by color refinement (which, as mentioned in Section 2, is weaker than $1\text{-}\mathsf{WL}$ for vertex embeddings). In a nutshell, the layers of a $\mathsf{ChebNet}$ are defined in terms of Chebyshev polynomials of the normalized Laplacian ${\bm{L}}_{\textsl{norm}}={\bm{I}}-{\bm{D}}^{-1/2}\cdot{\bm{A}}\cdot{\bm{D}}^{-1/2}$ and these polynomials can be easily represented in $\mathsf{GTL}(\Omega)$ . One can alternatively use the graph Laplacian ${\bm{L}}={\bm{D}}-{\bm{A}}$ in a $\mathsf{ChebNet}$ , which allows for a similar analysis. The distinction between the choice of ${\bm{L}}_{\textsl{norm}}$ and ${\bm{L}}$ only shows in the needed summation depth (in as in similar way as for the $\mathsf{GCN}\text{s}$ described earlier). We only consider the normalized Laplacian here.

More precisely, following Balcilar et al. (2021a; b), in layer $t$ , vertex embeddings are updated in a $\mathsf{ChebNet}$ according to:

{\bm{F}}^{(t)}:=\sigma\left(\sum_{s=1}^{p}{\bm{C}}^{(s)}\cdot{\bm{F}}^{(t-1)}\cdot{\bm{W}}^{(t-1,s)}\right),

with

{\bm{C}}^{(1)}:={\bm{I}},{\bm{C}}^{(2)}=\frac{2}{\lambda_{\max}}{\bm{L}}_{\textsl{norm}}-{\bm{I}},{\bm{C}}^{(s)}=2{\bm{C}}^{(2)}\cdot{\bm{C}}^{(s-1)}-{\bm{C}}^{(s-2)},\text{ for $s\geq 3$,}

and where $\lambda_{\max}$ denotes the maximum eigenvalue of ${\bm{L}}_{\textsl{norm}}$ . We next use a similar analysis as in Balcilar et al. (2021a). That is, we ignore for the moment the maximal eigenvalue $\lambda_{\max}$ and redefine ${\bm{C}}^{(2)}$ as $c{\bm{L}}_{\textsl{norm}}-{\bm{I}}$ for some constant $c$ . We thus see that each ${\bm{C}}^{(s)}$ is a polynomial of the form $p_{s}(c,{\bm{L}}_{\textsl{norm}}):=\sum_{i=0}^{q_{s}}a^{(s)}_{i}(c)\cdot({\bm{L}}_{\textsl{norm}})^{i}$ with scalar functions $a^{(s)}_{i}:\mathbb{R}\to\mathbb{R}$ and where we interpret $({\bm{L}}_{\textsl{norm}})^{0}={\bm{I}}$ . To upper bound the separation power using our tensor language approach, we can thus shift our attention entirely to representing $({\bm{L}}_{\textsl{norm}})^{i}\cdot{\bm{F}}^{(t-1)}\cdot{\bm{W}}^{(t-1,s)}$ for powers $i\in\mathbb{N}$ . Furthermore, since $({\bm{L}}_{\textsl{norm}})^{i}$ is again a polynomial of the form $q_{i}({\bm{D}}^{-1/2}\cdot{\bm{A}}\cdot{\bm{D}}^{-1/2}):=\sum_{j=0}^{r_{i}}b_{ij}\cdot({\bm{D}}^{-1/2}\cdot{\bm{A}}\cdot{\bm{D}}^{-1/2})^{j}$ , we can further narrow down the problem to represent

({\bm{D}}^{-1/2}\cdot{\bm{A}}\cdot{\bm{D}}^{-1/2})^{j}\cdot{\bm{F}}^{(t-1)}\cdot{\bm{W}}^{(t-1,s)}

in $\mathsf{GTL}(\Omega)$ , for powers $j\in\mathbb{N}$ . And indeed, combining our analysis for $\mathsf{GCN}\text{s}$ and $\mathsf{SGC}\text{s}$ results in expressions in $\mathsf{GTL}(\Omega)$ . As an example let us consider $({\bm{D}}^{-1/2}\cdot{\bm{A}}\cdot{\bm{D}}^{-1/2})^{2}\cdot{\bm{F}}^{(t-1)}\cdot{\bm{W}}^{(t-1)}$ , that is we use a power of two. It then suffices to define, for each output dimension $j$ , the expressions:

\psi^{2}_{j}(x_{1})=f_{1/\sqrt{x}}\bigl{(}\textstyle\sum_{x_{2}}E(x_{1},x_{2})\bigr{)}\cdot\textstyle\sum_{x_{2}}\Biggl{(}E(x_{1},x_{2})\cdot f_{1/x}\bigl{(}\textstyle\sum_{x_{1}}(E(x_{2},x_{1})\bigr{)}\cdot\\ \textstyle\sum_{x_{1}}\Bigl{(}E(x_{2},x_{1})\cdot f_{1/\sqrt{x}}(\textstyle\sum_{x_{2}}E(x_{1},x_{2}))\cdot\bigl{(}\textstyle\sum_{i=1}^{d_{t-1}}{W}_{ij}^{(t-1)}\varphi_{i}^{(t-1)}(x_{1})\bigr{)}\Bigr{)}\Biggr{)},

where the $\varphi_{i}^{(t-1)}(x_{1})$ are expressions representing layer $t-1$ . It is then readily verified that we can use $\psi_{j}^{2}(x_{1})$ to cast layer $t$ of a $\mathsf{ChebNet}$ in $\mathsf{GTL}(\Omega)$ with $\Omega$ consisting of $f_{1/\sqrt{x}}:\mathbb{R}\to\mathbb{R}:x\mapsto\frac{1}{\sqrt{x}}$ , $f_{1/x}:\mathbb{R}\to\mathbb{R}:x\mapsto\frac{1}{x}$ , and the used activation function $\sigma$ . We thus recover (and slightly refine) Theorem 2 in Balcilar et al. (2021a):

Corollary D.1.

On graphs sharing the same $\lambda_{\max}$ values, the separation power of $\mathsf{ChebNet}$ is bounded by color refinement, both for graph and vertex embeddings.

A more fine-grained analysis of the expressions is needed when interested in bounding the summation depth and thus of the number of rounds needed for color refinement. Moreover, as shown by Balcilar et al. (2021a), when graphs have non-regular components with different $\lambda_{\max}$ values, $\mathsf{ChebNet}$ can distinguish them, whilst $1\text{-}\mathsf{WL}$ cannot. To our knowledge, $\lambda_{\max}$ cannot be computed in $\mathsf{TL}_{k}(\Omega)$ for any $k$ . This implies that it not clear whether an upper bound on the separation power can be obtained for $\mathsf{ChebNet}$ taking $\lambda_{\max}$ into account. It is an interesting open question whether there are two graphs $G$ and $H$ which cannot be distinguished by $k\text{-}\mathsf{WL}$ but can be distinguished based on $\lambda_{\max}$ . A positive answer would imply that the computation of $\lambda_{\max}$ is beyond reach for $\mathsf{TL}(\Omega)$ and other techniques are needed.

CayleyNet.

We next show how the separation power of $\mathsf{CayleyNet}$ (Levie et al., 2019) can be analyzed. To our knowledge, this analysis is new. We show that the separation power of $\mathsf{CayleyNet}$ is bounded by $2\text{-}\mathsf{WL}$ . Following Levie et al. (2019) and Balcilar et al. (2021b), in each layer $t$ , a $\mathsf{CayleyNet}$ updates features as follows:

{\bm{F}}^{(t)}:=\sigma\left(\sum_{s=1}^{p}{\bm{C}}^{(s)}\cdot{\bm{F}}^{(t-1)}{\bm{W}}^{(t-1,s)}\right),

with

{\bm{C}}^{(1)}:={\bm{I}},{\bm{C}}^{(2s)}:=\mathsf{Re}\Biggl{(}\Bigl{(}\frac{h{\bm{L}}-\imath{\bm{I}}}{h{\bm{L}}+\imath{\bm{I}}}\Bigr{)}^{s}\Biggr{)},{\bm{C}}^{(2s+1)}:=\mathsf{Re}\Biggl{(}\imath\Bigl{(}\frac{h{\bm{L}}-\imath{\bm{I}}}{h{\bm{L}}+\imath{\bm{I}}}\Bigr{)}^{s}\Biggr{)},

where $h$ is a constant, $\imath$ is the imaginary unit, and $\mathsf{Re}:\mathbb{C}\to\mathbb{C}$ maps a complex number to its real part. We immediately observe that a $\mathsf{CayleyNet}$ requires the use of complex numbers and matrix inversion. So far, we considered real numbers only, but when our separation results are concerned, the choice between real or complex numbers is insignificant. In fact, only the proof of Proposition C.3 requires a minor modification when working on complex numbers: the infinite disjunctions used in the proof now need to range over complex numbers. For matrix inversion, when dealing with separation power, one can use different expressions in $\mathsf{TL}(\Omega)$ for computing the matrix inverse, depending on the input size. And indeed, it is well-known (see e.g., Csanky (1976)) that based on the characteristic polynomial of ${\bm{A}}$ , ${\bm{A}}^{-1}$ for any matrix ${\bm{A}}\in\mathbb{R}^{n\times n}$ can be computed as a polynomial $\frac{-1}{c_{n}}\sum_{i=1}^{n-1}c_{i}{\bm{A}}^{n-1-i}$ if $c_{n}\neq 0$ and where each coefficient $c_{i}$ is a polynomial in $\mathsf{tr}({\bm{A}}^{j})$ , for various $j$ . Here, $\mathsf{tr}(\cdot)$ is the trace of a matrix. As a consequence, layers in $\mathsf{CayleyNet}$ can be viewed as polynomials in $h{\bm{L}}-\imath{\bm{I}}$ with coefficients polynomials in $\mathsf{tr}((h{\bm{L}}-\imath{\bm{I}})^{j})$ . One now needs three index variables to represent the trace computations $\mathsf{tr}((h{\bm{L}}-\imath{\bm{I}})^{j})$ . Indeed, let $\varphi_{0}(x_{1},x_{2})$ be the $\mathsf{TL}_{2}$ expression representing $h{\bm{L}}-\imath{\bm{I}}$ . Then, for example, $(h{\bm{L}}-\imath{\bm{I}})^{j}$ can be computed in $\mathsf{TL}_{3}$ using

\varphi_{j}(x_{1},x_{2}):=\sum_{x_{3}}\varphi_{0}(x_{1},x_{3})\cdot\varphi_{j-1}(x_{3},x_{2})

and hence $\mathsf{tr}((h{\bm{L}}-\imath{\bm{I}})^{j})$ is represented by $\sum_{x_{1}}\sum_{x_{2}}\varphi_{j}(x_{1},x_{2})\cdot\bm{1}_{x_{1}=x_{2}}.$ . In other words, we obtain expressions in $\mathsf{TL}_{3}$ . The polynomials in $h{\bm{L}}-\imath{\bm{I}}$ can be represented in $\mathsf{TL}_{2}$ just as for $\mathsf{ChebNet}$ . This implies that each layer in $\mathsf{CayleyNet}$ can be represented, on graphs of fixed size, by $\mathsf{TL}_{3}(\Omega)$ expressions, where $\Omega$ includes the activation function $\sigma$ and the function $\mathsf{Re}$ . This suffices to use our general results and conclude that $\mathsf{CayleyNet}$ s are bounded in separation power by $2\text{-}\mathsf{WL}$ . An interesting question is to find graphs that can be separated by a $\mathsf{CayleyNet}$ but not by $1\text{-}\mathsf{WL}$ . We leave this as an open problem.

Appendix E Proof of Theorem 5.1

We here consider another higher-order $\mathsf{GNN}$ proposal: the invariant graph networks or $k\text{-}\mathsf{IGN}\text{s}$ of Maron et al. (2019b). By contrast to $k\text{-}\mathsf{FGNN}\text{s}$ , $k\text{-}\mathsf{IGN}\text{s}$ are linear architectures. If we denote by $k\text{-}\mathsf{IGN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ the class of $t$ layered $k\text{-}\mathsf{IGN}\text{s}$ , then following inclusions are known (Maron et al., 2019b)

\rho_{1}\bigl{(}k\text{-}\mathsf{IGN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{vwl}_{k-1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\text{ and }\rho_{0}\bigl{(}k\text{-}\mathsf{IGN}\bigr{)}\subseteq\rho_{0}\bigl{(}\mathsf{gwl}_{k-1}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\bigr{)}.

The reverse inclusions were posed as open problems in Maron et al. (2019a) and were shown to hold by Chen et al. (2020) for $k=2$ , by means of an extensive case analysis and by relying on properties of $1\text{-}\mathsf{WL}$ . In this section, we show that the separation power of $k\text{-}\mathsf{IGN}\text{s}$ is bounded by that of $(k-1)\text{-}\mathsf{WL}$ , for arbitrary $k\geq 2$ . Theorem 4.2 tells that we can entirely shift our attention to showing that the layers of $k\text{-}\mathsf{IGN}\text{s}$ can be represented in $\mathsf{TL}_{k}(\Omega)$ . In other words, we only need to show that $k$ index variables are needed for the layers. As we will see below, this requires a bit of work since a naive representation of the layers of $k\text{-}\mathsf{IGN}\text{s}$ use $2k$ index variables. Nevertheless, we show that this can be reduced to $k$ index variables only.

By inspecting the expressions needed to represent the layers of $k\text{-}\mathsf{IGN}\text{s}$ in $\mathsf{TL}_{k}(\Omega)$ , we obtain that a $t$ layer $k\text{-}\mathsf{IGN}^{(t)}$ require expressions of summation depth of $tk$ . In other words, the correspondence between layers and summation depth is precisely in sync. This implies, by Theorem 4.2:

\rho_{1}\bigl{(}k\text{-}\mathsf{IGN}\bigr{)}=\rho_{1}\bigl{(}\mathsf{vwl}_{k-1}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\bigr{)},

where we ignore the number of layers. We similarly obtain that $\rho_{0}\bigl{(}k\text{-}\mathsf{IGN}\bigr{)}=\rho_{0}\bigl{(}\mathsf{gwl}_{k-1}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\bigr{)}$ , hereby answering the open problem posed in Maron et al. (2019a). Finally, we observe that the $k\text{-}\mathsf{IGN}\text{s}$ used in Maron et al. (2019b) to show the inclusion $\rho_{1}\bigl{(}k\text{-}\mathsf{IGN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}\bigl{(}\mathsf{vwl}_{k-1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ are of very simple form. By defining a simple class of $k\text{-}\mathsf{IGN}\text{s}$ , denoted by $k\text{-}\mathsf{GIN}\text{s}$ , we obtain

\rho_{1}\bigl{(}k\text{-}\mathsf{GIN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{1}\bigl{(}\mathsf{vwl}_{k-1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)},

hereby recovering the layer/round connections.

We start with the following lemma:

Lemma E.1.

For any $k\geq 2$ , a $t$ layer $k\text{-}\mathsf{IGN}\text{s}$ can be represented in $\mathsf{TL}_{k}^{\!\scalebox{0.6}{(}tk\scalebox{0.6}{)}}(\Omega)$ .

Before proving this lemma, we recall $k\text{-}\mathsf{IGN}\text{s}$ . These are architectures that consist of linear equivariant layers. Such linear layers allow for an explicit description. Indeed, following Maron et al. (2019c), let $\sim_{\ell}$ be the equality pattern equivalence relation on $[n]^{\ell}$ such that for ${\bm{a}},{\bm{b}}\in[n]^{\ell}$ , ${\bm{a}}\sim_{\ell}{\bm{b}}$ if and only if ${a}_{i}={a}_{j}\Leftrightarrow{b}_{i}={b}_{j}$ for all $j\in[\ell]$ . We denote by $[n]^{\ell}/_{\sim_{\ell}}$ the equivalence classes induced by $\sim_{\ell}$ . Let us denote by ${\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}\in\mathbb{R}^{n^{k}\times d_{t-1}}$ the tensor computed by an $k\text{-}\mathsf{IGN}$ in layer $t-1$ . Then, in layer $t$ , a new tensor in $\mathbb{R}^{n^{k}\times d_{t}}$ is computed, as follows. For $j\in[d_{t}]$ and $v_{1},\ldots,v_{k}\in[n]^{k}$ :

{\mathsfit{F}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{v_{1},\ldots,v_{k},j}:=\sigma\Biggl{(}\sum_{\gamma\in[n]^{2k}\!/_{\!\sim_{2k}}}\sum_{{\bm{w}}\in[n]^{k}}\bm{1}_{({\bm{v}},{\bm{w}})\in\gamma}\sum_{i\in[d_{t-1}]}c_{\gamma,i,j}{\mathsfit{F}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{w_{1},\ldots,w_{k},i}+\sum_{\mu\in[n]^{k}\!/_{\!\sim_{k}}}\bm{1}_{{\bm{v}}\in\mu}{b}_{\mu,j}\Biggr{)}

(1)

for activation function $\sigma$ , constants $c_{\gamma,i,j}$ and $b_{\mu,j}$ in $\mathbb{R}$ and where $\bm{1}_{({\bm{v}},{\bm{w}})\in\gamma}$ and $\bm{1}_{{\bm{v}}\in\mu}$ are indicator functions for the $2k$ -tuple $({\bm{v}},{\bm{w}})$ to be in the equivalence class $\gamma\in[n]^{2k}\!/_{\!\sim_{2k}}$ and the $k$ -tuple ${\bm{v}}$ to be in class $\mu\in[n]^{k}\!/_{\!\sim_{k}}$ . As initial tensor ${\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}$ one defines ${\mathsfit{F}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}_{v_{1},\ldots,v_{k},j}:=\mathsf{atp}_{k}(G,{\bm{v}})\in\mathbb{R}^{d_{0}},$ with $d_{0}={2\genfrac{(}{)}{0.0pt}{2}{k}{2}+k\ell}$ where $\ell$ is the number of initial vertex labels, just as for $k\text{-}\mathsf{FGNN}\text{s}$ .

We remark that the need for having a summation depth of $tk$ in the expressions in $\mathsf{TL}_{k}(\Omega)$ , or equivalently for requiring $tk$ rounds of $(k-1)\text{-}\mathsf{WL}$ , can intuitively be explained that each layer of a $k\text{-}\mathsf{IGN}$ aggregates more information from “neighbouring” $k$ -tuples than $(k-1)\text{-}\mathsf{WL}$ does. Indeed, in each layer, an $k\text{-}\mathsf{IGN}$ can use previous tuple embeddings of all possible $k$ -tuples. In a single round of $(k-1)\text{-}\mathsf{WL}$ only previous tuple embeddings from specific sets of $k$ -tuples are used. It is only after an additional $k-1$ rounds, that $k\text{-}\mathsf{WL}$ gets to the information about arbitrary $k$ -tuples, whereas this information is available in a $k\text{-}\mathsf{IGN}$ in one layer directly.

Proof of Lemma E.1.

We have seen how ${\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}$ can be represented in $\mathsf{TL}_{k}(\Omega)$ when dealing with $k\text{-}\mathsf{FGNN}\text{s}$ . We assume now that also the $t-1$ th layer ${\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}$ can be represented by $d_{t-1}$ expressions in $\mathsf{TL}_{k}^{\!\scalebox{0.6}{(}(t-1)k\scalebox{0.6}{)}}(\Omega)$ and show that the same holds for the $t$ th layer.

We first represent ${\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ in $\mathsf{TL}_{2k}(\Omega)$ , based on the explicit description given earlier. The expressions use index variables $x_{1},\ldots,x_{k}$ and $y_{1},\ldots,y_{k}$ . More specifically, for $j\in[d_{t}]$ we consider the expressions:

\varphi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1},\ldots,x_{k})=\sigma\left(\sum_{\gamma\in[n]^{2k}\!/_{\!\sim_{2k}}}\sum_{i=1}^{d_{t-1}}c_{\gamma,i,j}\right.\\ \sum_{y_{1}}\cdots\sum_{y_{k}}\psi_{\gamma}(x_{1},\ldots,x_{k},y_{1},\ldots,y_{k})\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(y_{1},\ldots,y_{k})\\ \left.{}+\sum_{\mu\in[n]^{k}\!/_{\!\sim_{k}}}b_{\mu,j}\cdot\psi_{\mu}(x_{1},\ldots,x_{k})\right),

(2)

where $\psi_{\mu}(x_{1},\ldots,x_{k})$ is a product of expressions of the form $\bm{1}_{x_{i}\mathop{\mathsf{op}}x_{j}}$ encoding the equality pattern $\mu$ , and similarly, $\psi_{\gamma}(x_{1},\ldots,x_{k},y_{1},\ldots,y_{k})$ is a product of expressions of the form $\bm{1}_{x_{i}\mathop{\mathsf{op}}x_{j}}$ , $\bm{1}_{y_{i}\mathop{\mathsf{op}}y_{j}}$ and $\bm{1}_{x_{i}\mathop{\mathsf{op}}y_{j}}$ encoding the equality pattern $\gamma$ . These expressions are indicator functions for the their corresponding equality patterns. That is,

[\![\psi_{\gamma},({\bm{v}},{\bm{w}})]\!]_{G}=\begin{cases}1&\text{if $({\bm{v}},{\bm{w}})\in\gamma$}\\ 0&\text{otherwise}\end{cases}\quad[\![\psi_{\mu},{\bm{v}}]\!]_{G}=\begin{cases}1&\text{if ${\bm{v}}\in\mu$}\\ 0&\text{otherwise}\end{cases}

We remark that in the expressions $\varphi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ we have two kinds of summations: those ranging over a fixed number of elements (over equality patterns, feature dimension), and those ranging over the index variables $y_{1},\ldots,y_{k}$ . The latter are the only ones contributing the summation depth. The former are just concise representations of a long summation over a fixed number of expressions.

We now only need to show that we can equivalently write $\varphi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1},\ldots,x_{k})$ as expressions in $\mathsf{TL}_{k}(\Omega)$ , that is, using only indices $x_{1},\ldots,x_{k}$ . As such, we can already ignore the term $\sum_{\mu\in[n]^{k}\!/_{\!\sim_{k}}}b_{\mu,j}\cdot\psi_{\mu}(x_{1},\ldots,x_{k})$ since this is already in $\mathsf{TL}_{k}(\Omega)$ . Furthermore, this expressions does not affect the summation depth.

Furthermore, as just mentioned, we can expand expression $\varphi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ into linear combinations of other simpler expressions. As such, it suffices to show that $k$ index variables suffice for each expression of the form:

\sum_{y_{1}}\cdots\sum_{y_{k}}\psi_{\gamma}(x_{1},\ldots,x_{k},y_{1},\ldots,y_{k})\cdot\varphi_{i}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(y_{1},\ldots,y_{k}),

(3)

obtained by fixing $\mu$ and $i$ in expression (2). To reduce the number of variables, as a first step we eliminate any disequality using the inclusion-exclusion principle. More precisely, we observe that $\psi_{\gamma}({\bm{x}},{\bm{y}})$ can be written as:

	$\displaystyle\prod_{(i,j)\in I}\bm{1}_{x_{i}=x_{j}}\cdot\prod_{(i,j)\in\bar{I}}\bm{1}_{x_{i}\neq x_{j}}\cdot\prod_{(i,j)\in J}\bm{1}_{y_{i}=y_{j}}\cdot\prod_{(i,j)\in\bar{J}}\bm{1}_{y_{i}\neq y_{j}}\prod_{(i,j)\in K}\bm{1}_{x_{i}=y_{j}}\cdot\prod_{(i,j)\in\bar{K}}\bm{1}_{x_{i}\neq y_{j}}{}$
	$\displaystyle=\sum_{A\subseteq\bar{I}}\sum_{B\subseteq\bar{J}}\sum_{C\subseteq\bar{K}}(-1)^{\|A\|+\|B\|+\|C\|}\prod_{(i,j)\in I\cup A}\bm{1}_{x_{i}=x_{j}}\prod_{(i,j)\in J\cup B}\bm{1}_{y_{i}=y_{j}}\cdot\prod_{(i,j)\in J\cup C}\bm{1}_{x_{i}=y_{j}},$		(4)

for some sets $I$ , $J$ and $K$ of pairs of indices in $[k]^{2}$ , and where $\bar{I}=[k]^{2}\setminus I$ , $\bar{J}=[k]^{2}\setminus J$ and $\bar{K}=[k]^{2}\setminus K$ . Here we use that $\bm{1}_{x_{i}\neq x_{j}}=1-\bm{1}_{x_{i}=x_{j}}$ , $\bm{1}_{y_{i}\neq y_{j}}=1-\bm{1}_{y_{i}=y_{j}}$ and $\bm{1}_{x_{i}\neq y_{j}}=1-\bm{1}_{y_{i}=y_{j}}$ and use the inclusion-exclusion principle to obtain a polynomial in equality conditions only.

In view of expression (4), we can push the summations over $y_{1},\ldots,y_{k}$ in expression (3) to the subexpressions that actually use $y_{1},\ldots,y_{k}$ . That is, we can rewrite expression (3) into the equivalent expression:

\sum_{A\subseteq\bar{I}}\sum_{B\subseteq\bar{J}}\sum_{C\subseteq\bar{K}}(-1)^{|A|+|B|+|C|}\cdot\prod_{(i,j)\in I\cup A}\bm{1}_{x_{i}=x_{j}}\\ {}\cdot\left(\sum_{y_{1}}\cdots\sum_{y_{k}}\prod_{(i,j)\in J\cup B}\bm{1}_{y_{i}=y_{j}}\cdot\prod_{(i,j)\in K\cup C}\bm{1}_{x_{i}=y_{j}}\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(y_{1},\ldots,y_{k})\right).

(5)

By fixing $A$ , $B$ and $C$ , it now suffices to argue that

\prod_{(i,j)\in I\cup A}\bm{1}_{x_{i}=x_{j}}\cdot\left(\sum_{y_{1}}\cdots\sum_{y_{k}}\prod_{(i,j)\in J\cup B}\bm{1}_{y_{i}=y_{j}}\cdot\prod_{(i,j)\in K\cup C}\bm{1}_{x_{i}=y_{j}}\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(y_{1},\ldots,y_{k})\right),

(6)

can be equivalently expressed in $\mathsf{TL}_{k}(\Omega)$ .

Since our aim is to reduced the number of index variables from $2k$ to $k$ , it is important to known which variables are the same. In expression (6), some equalities that hold between the variables may not be explicitly mentioned. For this reason, we expand $I\cup A$ , $J\cup B$ and $K\cup C$ with their implied equalities. That is, $\bm{1}_{x_{i}=x_{j}}$ is added to $I\cup A$ , if for any $({\bm{v}},{\bm{w}})$ such that

[\![\prod_{(i,j)\in I\cup A}\bm{1}_{x_{i}=x_{j}}\cdot\prod_{(i,j)\in J\cup B}\bm{1}_{y_{i}=y_{j}}\cdot\prod_{(i,j)\in K\cup C}\bm{1}_{x_{i}=y_{j}},({\bm{v}},{\bm{w}})]\!]_{G}=1\Rightarrow[\![\bm{1}_{x_{i}=x_{j}},{\bm{v}}]\!]_{G}=1

holds. Similar implied equalities $\bm{1}_{y_{i}=y_{j}}$ and $\bm{1}_{x_{i}=y_{j}}$ are added to $J\cup B$ and $K\cup C$ , respectively. let us denoted by $I^{\prime}$ , $J^{\prime}$ and $K^{\prime}$ . It should be clear that we can add these implied equalities to expression (6) without changing its semantics. In other words, expression (6) can be equivalently represented by

\prod_{(i,j)\in I^{\prime}}\bm{1}_{x_{i}=x_{j}}\cdot\left(\sum_{y_{1}}\cdots\sum_{y_{k}}\prod_{(i,j)\in J^{\prime}}\bm{1}_{y_{i}=y_{j}}\cdot\prod_{(i,j)\in K^{\prime}}\bm{1}_{x_{i}=y_{j}}\cdot\varphi_{i}^{(\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(y_{1},\ldots,y_{k})\right),

(7)

There now two types of index variables among the $y_{1},\ldots,y_{k}$ : those that are equal to some $x_{i}$ , and those that are not. Now suppose that $(j,j^{\prime})\in J^{\prime}$ , and thus $y_{j}=y_{j^{\prime}}$ , and that also $(i,j)\in K^{\prime}$ , and thus $x_{i}=y_{j}$ . Since we included the implied equalities, we also have $(i,j^{\prime})\in K^{\prime}$ , and thus $x_{i}=y_{j^{\prime}}$ . There is no reason to keep $(j,j^{\prime})\in J^{\prime}$ as it is implied by $(i,j)$ and $(i,j^{\prime})\in K^{\prime}$ . We can thus safely remove all pairs $(j,j^{\prime})$ from $J^{\prime}$ such that $(i,j)\in K^{\prime}$ (and thus also $(i,j^{\prime})\in K^{\prime}$ ). We denote by $J^{\prime\prime}$ be the reduced set of pairs of indices obtained from $J^{\prime}$ in this way. We have that expression (7) can be equivalently written as

\prod_{(i,j)\in I^{\prime}}\bm{1}_{x_{i}=x_{j}}\cdot\left(\sum_{y_{1}}\cdots\sum_{y_{k}}\prod_{(i,j)\in K^{\prime}}\bm{1}_{x_{i}=y_{j}}\cdot\prod_{(i,j)\in J^{\prime\prime}}\bm{1}_{y_{i}=y_{j}}\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(y_{1},\ldots,y_{k})\right),

(8)

where we also switched the order of equalities in $J^{\prime\prime}$ and $K^{\prime}$ . Our construction of $J^{\prime\prime}$ and $K^{\prime}$ ensures that none of the variables $y_{j}$ with $j$ belonging to a pair in $J^{\prime\prime}$ is equal to some $x_{i}$ .

By contrast, the variable $y_{j}$ occurring in $(i,j)\in K^{\prime}$ are equal to $x_{i}$ . We observe, however, that also certain equalities among the variables $\{x_{1},\ldots,x_{k}\}$ hold, as represented by the pairs in $I^{\prime}$ . let $I^{\prime}(i):=\{i^{\prime}\mid(i,i^{\prime})\in I^{\prime}\}$ and define $\hat{\imath}$ as a unique representative element in $I^{\prime}(i)$ . For example, one can take $\hat{i}$ to be smallest index in $I^{\prime}(i)$ . We use this representative index (and corresponding $x$ -variable) to simplify $K^{\prime}$ . More precisely, we replace each pair $(i,j)\in K^{\prime}$ with the pair $(\hat{\imath},j)$ . In terms of variables, we replace $x_{i}=y_{j}$ with $x_{\hat{i}}=y_{j}$ . Let $K^{\prime\prime}$ be the set $K^{\prime\prime}$ modified in that way. Expression (8) can thus be equivalently written as

\prod_{(i,j)\in I^{\prime}}\bm{1}_{x_{i}=x_{j}}\cdot\left(\sum_{y_{1}}\cdots\sum_{y_{k}}\prod_{(\hat{\imath},j)\in K^{\prime\prime}}\bm{1}_{x_{\hat{\imath}}=y_{j}}\cdot\prod_{(i,j)\in J^{\prime\prime}}\bm{1}_{y_{i}=y_{j}}\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(y_{1},\ldots,y_{k})\right),

(9)

where the free index variables of the subexpression

\sum_{y_{1}}\cdots\sum_{y_{k}}\prod_{(\hat{\imath},j)\in K^{\prime\prime}}\bm{1}_{x_{\hat{\imath}}=y_{j}}\cdot\prod_{(i,j)\in J^{\prime\prime}}\bm{1}_{y_{i}=y_{j}}\cdot\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(y_{1},\ldots,y_{k})

(10)

are precisely the index variables $x_{\hat{\imath}}$ for $(\hat{\imath},j)\in K^{\prime\prime}$ . Recall that our aim is to reduce the variables from $2k$ to $k$ . We are now finally ready to do this. More specifically, we consider a bijection $\beta:\{y_{1},\ldots,y_{k}\}\to\{x_{1},\ldots,x_{k}\}$ in which ensure that for each $\hat{\imath}$ there is a $j$ such that $(\hat{i},j)\in K^{\prime\prime}$ and $\beta(y_{j})=x_{\hat{\imath}}$ . Furthermore, among the summations $\sum_{y_{1}}\cdots\sum_{y_{k}}$ we can ignore those for which $\beta(y_{j})=x_{\hat{\imath}}$ holds. After all, they only contribute for a given $x_{\hat{\imath}}$ value. Let $Y$ be those indices in $[k]$ such that $\beta(y_{j})\neq x_{\hat{\imath}}$ for some $\hat{\imath}$ . Then, we can equivalently write expression (9) as

\prod_{(i,j)\in I^{\prime}}\bm{1}_{x_{i}=x_{j}}\cdot\Biggl{(}\sum_{\beta(y_{i}),i\in Y}\prod_{(\hat{\imath},j)\in K^{\prime}}\bm{1}_{x_{\hat{\imath}}=\beta(y_{j})}\cdot\prod_{(i,j)\in J^{\prime\prime}}\bm{1}_{\beta(y_{i})=\beta(y_{j})}\\ \cdot\beta(\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(y_{1},\ldots,y_{k}))\Biggr{)},

(11)

where $\beta(\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(y_{1},\ldots,y_{k}))$ denotes the expression obtained by renaming of variables $y_{1},\ldots,y_{j}$ in $\varphi_{i}^{(t-1)}(y_{1},\allowbreak\ldots,y_{k})$ into $x$ -variables according to $\beta$ . This is our desired expression in $\mathsf{TL}_{k}(\Omega)$ . If we analyze the summation depth of this expression, we have by induction that the summation depth of $\varphi_{i}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}$ is at most $(t-1)k$ . In the above expression, we are increasing the summation depth with at most $|Y|$ . The largest size of $Y$ is $k$ , which occurs when none of the $y$ -variables are equal to any of the $x$ -variables. As a consequence, we obtained an expression of summation depth at most $tk$ , as desired. ∎

As a consequence, when using $k\text{-}\mathsf{IGN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ for vertex embeddings, using $(G,v)\to{\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{v,\ldots,v,:}$ one simply pads the layer expression with $\prod_{i\in[k]}\bm{1}_{x_{1}=x_{i}}$ which does not affect the number of variables or summation depth. When using $k\text{-}\mathsf{IGN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ of graph embeddings, an additional invariant layer is added to obtain an embedding from $G\to\mathbb{R}^{d_{t}}$ . Such invariant layers have a similar (simpler) representation as given in equation 1 (Maron et al., 2019c), and allow for a similar analysis. One can verify that expressions in $\mathsf{TL}_{k}^{\!\scalebox{0.6}{(}(t+1)k\scalebox{0.6}{)}}(\Omega)$ are needed when such an invariant layer is added to previous $t$ layers. Based on this, Theorem 4.2, Lemma E.1 and Theorem 1 in Maron et al. (2019b), imply that $\rho_{1}\bigl{(}k\text{-}\mathsf{IGN}\bigr{)}=\rho_{1}\bigl{(}\mathsf{vwl}_{k-1}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\bigr{)}$ and $\rho_{0}\bigl{(}k\text{-}\mathsf{IGN}\bigr{)}=\rho_{0}\bigl{(}\mathsf{gwl}_{k-1}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\bigr{)}$ hold.

$k$ -dimensional GINs.

We can recover a layer-based characterization for $k\text{-}\mathsf{IGN}\text{s}$ that compute vertex embeddings by considering a special subset of $k\text{-}\mathsf{IGN}\text{s}$ . Indeed, the $k\text{-}\mathsf{IGN}\text{s}$ used in Maron et al. (2019b) to show $\rho_{1}(\mathsf{wl}_{k-1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(k\text{-}\mathsf{IGN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ are of a very special form. We extract the essence of these special $k\text{-}\mathsf{IGN}\text{s}$ in the form of $k$ -dimensional $\mathsf{GIN}\text{s}$ . That is, we define the class $k\text{-}\mathsf{GIN}\text{s}$ to consist of layers defined as follows. The initial layers are just as for $k\text{-}\mathsf{IGN}\text{s}$ . Then, for $t\geq 1$ :

{\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{v_{1},\ldots,v_{k},:}:=\mathsf{mlp}_{0}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigl{(}{\bm{\mathsfit{F}}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}_{v_{1},\ldots,v_{k},:},\sum_{u\in V_{G}}\mathsf{mlp}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}({\bm{\mathsfit{F}}}_{u,v_{2},\ldots,v_{k},:}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}),\sum_{u\in V_{G}}\mathsf{mlp}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}({\bm{\mathsfit{F}}}_{v_{1},u,\ldots,v_{k},:}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}})\\ ,\ldots,\sum_{u\in V_{G}}\mathsf{mlp}_{1}^{(t)}({\bm{\mathsfit{F}}}_{v_{1},v_{2},\ldots,v_{k-1},w,:}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}))\bigr{)},

where ${\mathsfit{F}}_{v_{1},v_{2},\ldots,v_{k},:}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}\in\mathbb{R}^{d_{t-1}}$ , $\mathsf{mlp}_{1}^{(t)}:\mathbb{R}^{d_{t-1}}\to\mathbb{R}^{b_{t}}$ and $\mathsf{mlp}_{1}^{(t)}:\mathbb{R}^{d_{t-1}+kb_{t}}\to\mathbb{R}^{d_{t}}$ are $\mathsf{MLP}\text{s}$ . It is now an easy exercise to show that $\mathsf{k}\textit{-}\mathsf{GIN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ can be represented in $\mathsf{TL}_{k}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)$ (remark that the summations used increase the summation depth with one only in each layer). Combined with Theorem 4.2 and by inspecting the proof of Theorem 1 in Maron et al. (2019b), we obtain:

Proposition E.2.

For any $k\geq 2$ and any $t\geq 0$ : $\rho_{1}(k\text{-}\mathsf{GIN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})=\rho_{1}(\mathsf{vwl}_{k-1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ .

We can define the invariant version of $k\text{-}\mathsf{IGN}\text{s}$ by adding a simple readout layer of the form

\sum_{v_{1},\ldots,v_{k}\in V_{G}}\mathsf{mlp}({\bm{\mathsfit{F}}}_{v_{1},\ldots,v_{k},:}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}),

as is used in Maron et al. (2019b). We obtain, $\rho_{0}(k\text{-}\mathsf{GIN})=\rho_{0}(\mathsf{gwl}_{k-1}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}})$ , by simply rephrasing the readout layer in $\mathsf{TL}_{k}(\Omega)$ .

Appendix F Details of Section 6

Let $\mathcal{C}(\mathcal{G}_{s},\mathbb{R}^{\ell})$ be the class of all continuous functions from $\mathcal{G}_{s}$ to $\mathbb{R}^{\ell}$ . We always assume that $\mathcal{G}_{s}$ forms a compact space. For example, when vertices are labeled with values in $\{0,1\}^{\ell_{0}}$ , $\mathcal{G}_{s}$ is a finite set which we equip with the discrete topology. When vertices carry labels in $\mathbb{R}^{\ell_{0}}$ we assume that these labels come from a compact set $K\subset\mathbb{R}^{\ell_{0}}$ . In this case, one can represent graphs in $\mathcal{G}_{s}$ by elements in $(\mathbb{R}^{\ell_{0}})^{2}$ and the topology used is the one induced by some metric $\|.\|$ on the reals. Similarly, we equip $\mathbb{R}^{\ell}$ with the topology induced by some metric $\|.\|$ .

Consider $\mathcal{F}\subseteq\mathcal{C}(\mathcal{G}_{s},\mathbb{R}^{\ell})$ and define $\overline{\mathcal{F}}$ as the closure of $\mathcal{F}$ in $\mathcal{C}(\mathcal{G}_{s},\mathbb{R}^{\ell})$ under the usual topology induced by $f\mapsto\mathsf{sup}_{G,{\bm{v}}}\|f(G,{\bm{v}})\|$ . In other words, a continuous function $h:\mathcal{G}_{s}\to\mathbb{R}^{\ell}$ is in $\overline{\mathcal{F}}$ if there exists a sequence of functions $f_{1},f_{2},\ldots\in\mathcal{F}$ such that $\lim_{i\to\infty}\mathsf{sup}_{G,{\bm{v}}}\|f_{i}(G,{\bm{v}})-h(G,{\bm{v}})\|=0$ . The following theorem provides a characterization of the closure of a set of functions. We state it here modified to our setting.

Theorem F.1 ((Timofte, 2005)).

Let $\mathcal{F}\subseteq\mathcal{C}(\mathcal{G}_{s},\mathbb{R}^{\ell})$ such that there exists a set $\mathcal{S}\subseteq\mathcal{C}(\mathcal{G}_{s},\mathbb{R})$ satisfying $\mathcal{S}\cdot\mathcal{F}\subseteq\mathcal{F}$ and $\rho(\mathcal{S})\subseteq\rho(\mathcal{F})$ . Then,

\overline{\mathcal{F}}:=\bigl{\{}f\in\mathcal{C}(\mathcal{G}_{s},\mathbb{R}^{\ell})\bigm{|}\rho(F)\subseteq\rho(f),\forall(G,{\bm{v}})\in\mathcal{G}_{s},f(G,{\bm{v}})\in\overline{\mathcal{F}(G,{\bm{v}})}\bigr{\}},

where $\mathcal{F}(G,{\bm{v}}):=\{h(G,{\bm{v}})\mid h\in\mathcal{F}\}\subseteq\mathbb{R}^{\ell}$ . We can equivalently replace $\rho(\mathcal{F})$ by $\rho(\mathcal{S})$ in the expression for $\overline{\mathcal{F}}$ .∎

We will use this theorem to show Theorem 6.1 in the setting that $\mathcal{F}$ consists of functions that can be represented in $\mathsf{TL}(\Omega)$ , and more generally, sets of functions that satisfy two conditions, stated below. We more generally allow $\mathcal{F}$ to consist of functions $f:\mathcal{G}_{s}\to\mathbb{R}^{\ell_{f}}$ , where the $\ell_{f}\in\mathbb{N}$ may depend on $f$ . We will require $\mathcal{F}$ to satisfy the following two conditions:

concatenation-closed:: If $f_{1}:\mathcal{G}_{s}\to\mathbb{R}^{p}$ and $f_{2}:\mathcal{G}_{s}\to\mathbb{R}^{q}$ are in $\mathcal{F}$ , then $g:=(f_{1},f_{2}):\mathcal{G}_{s}\to\mathbb{R}^{p+q}:(G,{\bm{v}})\mapsto(f_{1}(G,{\bm{v}}),f_{2}(G,{\bm{v}}))$ is also in $\mathcal{F}$ .
function-closed:: For a fixed $\ell\in\mathbb{N}$ , for any $f\in\mathcal{F}$ such that $f:\mathcal{G}_{s}\to\mathbb{R}^{p}$ , also $h\circ f:\mathcal{G}_{s}\to\mathbb{R}^{\ell}$ is in $\mathcal{F}$ for any continuous function $h\in\mathcal{C}(\mathbb{R}^{p},\mathbb{R}^{\ell})$ .

We denote by $\mathcal{F}_{\ell}$ be the subset of $\mathcal{F}$ of functions from $\mathcal{G}_{s}$ to $\mathbb{R}^{\ell}$ . See 6.1

Proof.

The proof consist of (i) verifying the existence of a set $\mathcal{S}$ as mentioned Theorem F.1; and of (ii) eliminating the pointwise convergence condition “ $\forall(G,{\bm{v}})\in\mathcal{G}_{s},f(G,{\bm{v}})\in\overline{\mathcal{F}_{\ell}(G,{\bm{v}})}$ in the closure characterization in Theorem F.1.

For showing (ii) we argue that $\overline{\mathcal{F}_{\ell}(G,{\bm{v}})}=\mathbb{R}^{\ell}$ such that the conditions $f(G,{\bm{v}})\in\overline{\mathcal{F}_{\ell}(G,{\bm{v}})}$ is automatically satisfied for any $f\in\mathcal{C}(\mathcal{G}_{s},\mathbb{R}^{\ell})$ . Indeed, take an arbitrary $f\in\mathcal{G}_{k}\to\mathbb{R}^{\ell}$ and consider the constant functions $b_{i}:\mathbb{R}^{\ell}\to\mathbb{R}^{\ell}:{\bm{x}}\mapsto{\bm{b}}_{i}$ with ${\bm{b}}_{i}\in\mathbb{R}^{\ell}$ the $i$ th basis vector. Since $\mathcal{F}$ is function-closed for $\ell$ , so is $\mathcal{F}_{\ell}$ . Hence, $b_{i}:=g_{i}\circ f\in\mathcal{F}_{\ell}$ as well. Furthermore, if $s_{a}:\mathbb{R}^{\ell}\to\mathbb{R}^{\ell}:{\bm{x}}\mapsto a\times{\bm{x}}$ , for $a\in\mathbb{R}$ , then $s_{a}\circ f\in\mathcal{F}_{\ell}$ and thus $\mathcal{F}_{\ell}$ is closed under scalar multiplication. Finally, consider $+:\mathbb{R}^{2\ell}\to\mathbb{R}^{\ell}:({\bm{x}},{\bm{y}})\mapsto{\bm{x}}+{\bm{y}}$ . For $f$ and $g$ in $\mathcal{F}_{\ell}$ , $h=(f,g)\in\mathcal{F}$ since $\mathcal{F}$ is concatenation-closed. As a consequence, the function $+\circ h:\mathcal{G}_{s}\to\mathbb{R}^{\ell}$ is in $\mathcal{F}_{\ell}$ , showing that $\mathcal{F}_{\ell}$ is also closed under addition. All combined, this shows that $\mathcal{F}_{\ell}$ is closed under taking linear combinations and since the basis vectors of $\mathbb{R}^{\ell}$ can be attained, $\overline{\mathcal{F}_{\ell}(G,{\bm{v}})}:=\mathbb{R}^{\ell}$ , as desired.

For (i), we show the existence of a set $\mathcal{S}\subseteq\mathcal{C}(\mathcal{G}_{s},\mathbb{R})$ such that $\mathcal{S}\cdot\mathcal{F}_{\ell}\subseteq\mathcal{F}_{\ell}$ and $\rho_{s}(\mathcal{S})\subseteq\rho_{s}(\mathcal{F}_{\ell})$ hold. Similarly as in Azizian & Lelarge (2021), we define

\mathcal{S}:=\bigl{\{}f\in\mathcal{C}(\mathcal{G}_{s},\mathbb{R})\bigm{|}\underbrace{(f,f,\ldots,f)}_{\text{$\ell$ times}}\in\mathcal{F}_{\ell}\bigr{\}}.

We remark that for $s\in\mathcal{S}$ and $f\in\mathcal{F}_{\ell}$ , $s\cdot f:\mathcal{G}_{s}\to\mathbb{R}^{\ell}:(G,{\bm{v}})\mapsto s(G,{\bm{v}})\odot f(G,{\bm{v}})$ , with $\odot$ being pointwise multiplication, is also in $\mathcal{F}_{\ell}$ . Indeed, $s\cdot f=\odot\circ(s,f)$ with $(s,f)$ the concatenation of $s$ and $f$ and $\odot:\mathbb{R}^{2\ell}\to\mathbb{R}^{\ell}:({\bm{x}},{\bm{y}})\to{\bm{x}}\odot{\bm{y}}$ being pointwise multiplication.

It remains to verify $\rho_{s}(\mathcal{S})\subseteq\rho_{s}(\mathcal{F}_{\ell})$ . Assume that $(G,{\bm{v}})$ and $(H,{\bm{w}})$ are not in $\rho_{s}(\mathcal{F}_{\ell})$ . By definition, this implies the existence of a function $\hat{f}\in\mathcal{F}_{\ell}$ such that $\hat{f}(G,{\bm{v}})={\bm{a}}\neq{\bm{b}}=\hat{f}(H,{\bm{w}})$ with ${\bm{a}},{\bm{b}}\in\mathbb{R}^{\ell}$ . We argue that $(G,{\bm{v}})$ and $(H,{\bm{w}})$ are also not in $\rho_{s}(\mathcal{S})$ either. Indeed, Proposition 1 in Maron et al. (2019b) implies that there exists natural numbers $\boldsymbol{\alpha}=(\alpha_{1},\ldots,\alpha_{\ell})\in\mathbb{N}^{\ell}$ such that the mapping $h_{\boldsymbol{\alpha}}:\mathbb{R}^{\ell}\to\mathbb{R}:{\bm{x}}\to\prod_{i=1}^{\ell}{x}_{i}^{\alpha_{i}}$ satisfies $h_{\boldsymbol{\alpha}}({\bm{a}})=a\neq b=h_{\boldsymbol{\alpha}}({\bm{b}})$ , with $a,b\in\mathbb{R}$ . Since $\mathcal{F}$ (and thus also $\mathcal{F}_{\ell})$ is function-closed, $h_{\boldsymbol{\alpha}}\circ f\in\mathcal{F}_{\ell}$ for any $f\in\mathcal{F}_{\ell}$ . In particular, $g:=h_{\boldsymbol{\alpha}}\circ\hat{f}\in\mathcal{F}_{\ell}$ and concatenation-closure implies that $(g,\ldots,g):\mathcal{G}_{s}\to\mathbb{R}^{\ell}$ is in $\mathcal{F}_{\ell}$ too. Hence, $g\in\mathcal{S}$ , by definition. It now suffices to observe that $g(G,{\bm{v}})=h_{\boldsymbol{\alpha}}(\hat{f}(G,{\bm{v}}))=a\neq b=h_{\boldsymbol{\alpha}}(\hat{f}(H,{\bm{w}}))=g(H,{\bm{w}})$ , and thus $(G,{\bm{v}})$ and $(H,{\bm{w}})$ are not in $\rho_{s}(\mathcal{S})$ , as desired. ∎

When we know more about $\rho_{s}(\mathcal{F}_{\ell})$ we can say a bit more. In the following, we let $\mathsf{alg}\in\{\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}},\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}},\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}},\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\}$ and only consider the setting where $s$ is either $0$ (invariant graph functions) or $s=1$ (equivariant graph/vertex functions). See 6.2

Proof.

This is just a mere restatement of Theorem 6.1 in which $\rho_{s}(\mathcal{F}_{\ell})$ in the condition $\rho_{s}(\mathcal{F}_{\ell})\subseteq\rho_{s}(f)$ is replaced by $\rho_{s}(\mathsf{alg})$ , where $s=1$ for $\mathsf{alg}\in\{\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}},\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\}$ and $s=0$ for $\mathsf{alg}\in\{\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}},\allowbreak\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\}$ . ∎

To relate all this to functions representable by tensor languages, we make the following observations. First, if we consider $\mathcal{F}$ to be the set of all functions that can be represented in $\mathsf{GTL}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)$ , $\mathsf{TL}_{2}^{\!\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}(\Omega)$ , $\mathsf{TL}_{k+1}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)$ or $\mathsf{TL}(\Omega)$ , then $\mathcal{F}$ will be automatically concatenation and function-closed, provided that $\Omega$ consists of all functions in $\textstyle\bigcup_{p}\mathcal{C}(\mathbb{R}^{p},\mathbb{R}^{\ell})$ . Hence, Theorem 6.1 applies. Furthermore, our results from Section 4 tell us that for all $t\geq 0$ , and $k\geq 1$ , $\rho_{1}\bigl{(}\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{1}\bigl{(}\mathsf{GTL}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}$ , $\rho_{0}\bigl{(}\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{0}\bigl{(}\mathsf{TL}_{2}^{\!\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}(\Omega)\bigr{)}=\rho_{0}\bigl{(}\mathsf{gwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}$ , $\rho_{1}\bigl{(}\mathsf{ewl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}=\rho_{1}\bigl{(}\mathsf{TL}_{k+1}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)\bigr{)}$ , and $\rho_{0}\bigl{(}\mathsf{TL}_{k+1}(\Omega)\bigr{)}=\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\bigr{)}$ . As a consequence, Corollary 6.2 applies as well. We thus easily obtain the following characterizations:

Proposition F.2.

For any $t\geq 0$ and $k\geq 1$ :

•

If $\mathcal{F}$ consists of all functions representable in $\mathsf{GTL}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)$ , then $\overline{\mathcal{F}_{\ell}}=\{f:\mathcal{G}_{1}\to\mathbb{R}^{\ell}\mid\rho_{1}\bigl{(}\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}(f)\}$ ;
•

If $\mathcal{F}$ consists of all functions representable in $\mathsf{TL}_{k+1}^{\!\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)$ , then $\overline{\mathcal{F}_{\ell}}=\{f:\mathcal{G}_{1}\to\mathbb{R}^{\ell}\mid\rho_{1}\bigl{(}\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{1}(f)\}$ ;
•

If $\mathcal{F}$ consists of all functions representable in $\mathsf{TL}_{2}^{\!\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}(\Omega)$ , then $\overline{\mathcal{F}_{\ell}}=\{f:\mathcal{G}_{0}\to\mathbb{R}^{\ell}\mid\rho_{0}\bigl{(}\mathsf{gwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{0}(f)\}$ ; and finally,
•

If $\mathcal{F}$ consists of all functions representable in $\mathsf{TL}_{k+1}(\Omega)$ , then $\overline{\mathcal{F}_{\ell}}=\{f:\mathcal{G}_{0}\to\mathbb{R}^{\ell}\mid\rho_{0}\bigl{(}\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}\bigr{)}\subseteq\rho_{0}(f)\}$ ,

provided that $\Omega$ consists of all functions in $\textstyle\bigcup_{p}\mathcal{C}(\mathbb{R}^{p},\mathbb{R}^{\ell})$ .

In fact, Lemma 32 in Azizian & Lelarge (2021) implies that we can equivalently populate $\Omega$ with all $\mathsf{MLP}\text{s}$ instead of all continuous functions. We can thus use $\mathsf{MLP}\text{s}$ and continuous functions interchangeably when considering the closure of functions.

At this point, we want to make a comparison with the results and techniques in Azizian & Lelarge (2021). Our proof strategy is very similar and is also based on Theorem F.1. The key distinguishing feature is that we consider functions $f:\mathcal{G}_{s}\to\mathbb{R}^{\ell_{f}}$ instead of functions from graphs alone. This has as great advantage that no separate proofs are needed to deal with invariant or equivariant functions. Equivariance incurs quite some complexity in the setting considered in Azizian & Lelarge (2021). A second major difference is that, by considering functions representable in tensor languages, and based on our results from Section 4, we obtain a more fine-grained characterization. Indeed, we obtain characterizations in terms of the number of rounds used in $\mathsf{CR}$ and $k\text{-}\mathsf{WL}$ . In Azizian & Lelarge (2021), $t$ is always set to $\infty$ , that is, an unbounded number of rounds is considered. Furthermore, when it concerns functions $f:\mathcal{G}_{1}\to\mathbb{R}^{\ell_{f}}$ , we recall that $\mathsf{CR}$ is different from $1\text{-}\mathsf{WL}$ . Only $1\text{-}\mathsf{WL}$ is considered in Azizian & Lelarge (2021). Finally, another difference is that we define the equivariant version $\mathsf{vwl}_{k}$ in a different way than is done in Azizian & Lelarge (2021), because in this way, a tighter connection to logics and tensor languages can be made. In fact, if we were to use the equivariant version of $k\text{-}\mathsf{WL}$ from Azizian & Lelarge (2021), then we necessarily have to consider an unbounded number of rounds (similarly as in our $\mathsf{gwl}_{k}$ case).

We conclude this section by providing a little more details about the consequences of the above results for $\mathsf{GNN}\text{s}$ . As we already mentioned in Section 6.2, many common $\mathsf{GNN}$ architectures are concatenation and function-closed (using $\mathsf{MLP}\text{s}$ instead of continuous functions). This holds, for example, for the classes $\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ , $\mathsf{e}\mathsf{GIN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{\ell}$ , $k\text{-}\mathsf{FGNN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ and $k\text{-}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ and $k\text{-}\mathsf{IGN}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ , as described in Section 5 and further detailed in Section E and D. Here, the subscript $\ell$ refers to the dimension of the embedding space.

We now consider a function $f$ that is not more separating than $\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ (respectively, $\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ , $\mathsf{vwl}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{k}$ or $\mathsf{gwl}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}_{k}$ , for some $k\geq 1$ ), and want to know whether $f$ can be approximated by a class of $\mathsf{GNN}\text{s}$ . Proposition F.2 tells that such $f$ can be approximated by a class of $\mathsf{GNN}\text{s}$ as long as these are at least as separating as $\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ (respectively, $\mathsf{TL}_{2}^{\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}$ , $\mathsf{TL}_{k+1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ or $\mathsf{TL}_{k+1}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}$ ). This, in turn, amounts showing that the $\mathsf{GNN}\text{s}$ can be represented in the corresponding tensor language fragment, and that they can match the corresponding labeling algorithm in separation power. We illustrate this for the $\mathsf{GNN}$ architectures mentioned above.

•

In Section 5 we showed that $\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ can be represented in $\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)$ . Theorem 4.3 then implies that $\rho_{1}(\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ . Furthermore, Xu et al. (2019) showed that $\rho_{1}(\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ . As a consequence, $\rho_{1}(\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})=\rho_{1}(\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ . We note that the lower bound for $\mathsf{GIN}\text{s}$ only holds when graphs carry discrete labels. The same restriction is imposed in Azizian & Lelarge (2021).
•

In Section 5 we showed that $\mathsf{e}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ can be represented in $\mathsf{TL}_{2}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)$ . Theorem 4.2 then implies that $\rho_{1}(\mathsf{vwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(\mathsf{e}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ . Furthermore, Barceló et al. (2020) showed that $\rho_{1}(\mathsf{e}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(\mathsf{vwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ . As a consequence, $\rho_{1}(\mathsf{e}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})=\rho_{1}(\mathsf{vwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ . Again, the lower bound is only valid when graphs carry discrete labels.
•

In Section 5 we mentioned (see details in Section D) that $k\text{-}\mathsf{FGNN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ can be represented in $\mathsf{TL}_{k+1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)$ . Theorem 4.2 then implies that $\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(k\text{-}\mathsf{FGNN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ . Furthermore, Maron et al. (2019b) showed that $\rho_{1}(k\text{-}\mathsf{FGNN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ . As a consequence, $\rho_{1}(k\text{-}\mathsf{FGNN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ . Similarly, $\rho_{1}((k+1)\text{-}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})=\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ for the special class of $(k+1)\text{-}\mathsf{IGN}\text{s}$ described in Section E. No restrictions are in place for the lower bounds and hence real-valued vertex-labelled graphs can be considered.
•

When $\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ or $\mathsf{e}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ are extended with a readout layer, we showed in Section 5 that these can be represented in $\mathsf{TL}_{2}^{\!\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}(\Omega)$ . Theorem 4.4 and the results by Xu et al. (2019) and Barceló et al. (2020) then imply that $\rho_{0}(\mathsf{vwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ and $\rho_{0}(\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})$ coincide with the separation power of these architectures with a readout layer. Here again, discrete labels need to be considered.
•

Similarly, when $k\text{-}\mathsf{FGNN}$ or $(k+1)\text{-}\mathsf{IGN}\text{s}$ are used for graph embeddings, we can represent these in $\mathsf{TL}_{k+1}(\Omega)$ resulting again that their separation power coincides with that of $\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}}$ . No restrictions are again in place on the vertex labels.

So for all these architectures, Corollary 6.2 applies and we can characterize the closures of these architectures in terms of functions that not more separating than their corresponding versions of $\mathsf{cr}$ or $k\text{-}\mathsf{WL}$ , as described in the main paper. In summary,

Proposition F.3.

For any $t\geq 0$ :

	$\displaystyle\overline{\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}}$	$\displaystyle=\{f:\mathcal{G}_{1}\to\mathbb{R}^{\ell}\mid\rho_{1}(\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(f)\}=\overline{\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)_{\ell}}$
	$\displaystyle\overline{\mathsf{e}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}}$	$\displaystyle=\{f:\mathcal{G}_{1}\to\mathbb{R}^{\ell}\mid\rho_{1}(\mathsf{vwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(f)\}=\overline{\mathsf{TL}_{2}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)_{\ell}}$
and when extended with a readout layer:
	$\displaystyle\overline{\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}}=\overline{\mathsf{e}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}}$	$\displaystyle=\{f:\mathcal{G}_{0}\to\mathbb{R}^{\ell}\mid\rho_{0}(\mathsf{gwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{0}(f)\}=\overline{\mathsf{TL}_{2}^{\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}(\Omega)_{\ell}}.$
Furthermore, for any $k\geq 1$
	$\displaystyle\overline{k\text{-}\mathsf{FGNN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}}=\overline{k\text{-}\mathsf{GIN}_{\ell}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}}$	$\displaystyle=\{f:\mathcal{G}_{1}\to\mathbb{R}^{\ell}\mid\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\subseteq\rho_{1}(f)\}=\overline{\mathsf{TL}_{k+1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)_{\ell}}$
	$\displaystyle\overline{(k+1)\text{-}\mathsf{IGN}_{\ell}}$	$\displaystyle=\{f:\mathcal{G}_{1}\to\mathbb{R}^{\ell}\mid\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}})\subseteq\rho_{1}(f)\}=\overline{\mathsf{TL}_{k+1}(\Omega)_{\ell}}$
and when converted into graph embeddings:
	$\displaystyle\overline{k\text{-}\mathsf{FGNN}_{\ell}}=\overline{k\text{-}\mathsf{GIN}_{\ell}}$	$\displaystyle=\overline{(k+1)\text{-}\mathsf{IGN}_{\ell}}=\{f:\mathcal{G}_{0}\to\mathbb{R}^{\ell}\mid\rho_{0}(\mathsf{gwl}_{k}^{\scalebox{0.6}{(}\infty\scalebox{0.6}{)}})\subseteq\rho_{0}(f)\}=\overline{\mathsf{TL}_{k+1}(\Omega)_{\ell}},$

where the closures of the tensor languages are interpreted as the closure of the graph or graph/vertex functions that they can represent. For results involving $\mathsf{GIN}\text{s}$ or $\mathsf{e}\mathsf{GIN}\text{s}$ , the graphs considered should have discretely labeled vertices.

As a side note, we remark that in order to simulate $\mathsf{CR}$ on graphs with real-valued labels, one can use a $\mathsf{GNN}$ architecture of the form ${\bm{F}}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}_{v:}=\bigl{(}{\bm{F}}_{v:}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}},\sum_{u\in N_{G}(v)}\mathsf{mlp}({\bm{F}}_{u:}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}})\bigr{)}$ , which translates in $\mathsf{GTL}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(\Omega)$ as expressions of the form

\varphi_{j}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1}):=\begin{cases}\varphi_{j}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1})&1\leq j\leq d_{t-1}\\ \textstyle\sum_{x_{2}}E(x_{1},x_{2})\cdot\mathsf{mlp}_{j}\bigl{(}\varphi_{1}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1}),\ldots,\varphi_{{d_{t}}}^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1})\bigr{)}&d_{t-1}<j\leq d_{t}.\end{cases}

The upper bound in terms of $\mathsf{CR}$ follows from our main results. To show that $\mathsf{CR}$ can be simulated, it suffices to observe that one can approximate the function used in Proposition 1 in Maron et al. (2019b) to injectively encode multisets of real vectors by means of $\mathsf{MLP}\text{s}$ . As such, a continuous version of the first bullet in the previous proposition can be obtained.

Appendix G Details on Treewidth and Proposition 4.5

As an extension of our main results in Section 4, we enrich the class of tensor language expressions for which connections to $k\text{-}\mathsf{WL}$ exist. More precisely, instead of requiring expressions to belong to $\mathsf{TL}_{k+1}(\Omega)$ , that is to only use $k+1$ index variables, we investigate when expressions in $\mathsf{TL}(\Omega)$ are semantically equivalent to an expression using $k+1$ variables. Proposition 4.5 identifies a large class of such expressions, those of treewidth $k$ . As a consequence, even when representing $\mathsf{GNN}$ architectures may require more than $k+1$ index variables, sometimes this number can be reduced. As a consequence of our results, this implies that their separation power is in fact upper bounded by $\ell\text{-}\mathsf{WL}$ for a smaller $\ell<k$ . Stated otherwise, to boost the separation power of $\mathsf{GNN}\text{s}$ , the treewidth of the expressions representing the layers of the $\mathsf{GNN}\text{s}$ must have large treewidth.

We next introduce some concepts related to treewidth. We here closely follow the exposition given in Abo Khamis et al. (2016) for introducing treewidth by means variable elimination sequences of hypergraphs.

In this section, we restrict ourselves to summation aggregation.

G.1 Elimination sequences

We first define elimination sequences for hypergraphs. Later on, we show how to associate such hypergraphs to expressions in tensor languages, allowing us to define elimination sequences for tensor language expressions.

With a multi-hypergraph $\mathcal{H}=(\mathcal{V},\mathcal{E})$ we simply mean a multiset $\mathcal{E}$ of subsets of vertices $\mathcal{V}$ . An elimination hypergraph sequences is a vertex ordering $\sigma=v_{1},\ldots,v_{n}$ of the vertices of $\mathcal{H}$ . With such a sequence $\sigma$ , we can associate for $j=n,n-1,n-2,\ldots,1$ a sequence of $n$ multi-hypergraphs $\mathcal{H}_{n}^{\sigma},\mathcal{H}_{n-1}^{\sigma},\ldots,\mathcal{H}_{1}^{\sigma}$ as follows. We define

	$\displaystyle\mathcal{H}_{n}$	$\displaystyle:=(\mathcal{V}_{n},\mathcal{E}_{n}):=\mathcal{H}$
	$\displaystyle\partial(v_{n})$	$\displaystyle:=\{F\in\mathcal{E}_{n}\mid v_{n}\in F\}$
	$\displaystyle U_{n}$	$\displaystyle:=\bigcup_{F\in\partial(v_{n})}F.$
and for $j=n-1,n-2,\ldots,1:$
	$\displaystyle\mathcal{V}_{j}$	$\displaystyle:=\{v_{1},\ldots,v_{j}\}$
	$\displaystyle\mathcal{E}_{j}$	$\displaystyle:=(\mathcal{E}_{j+1}\setminus\partial(v_{j+1}))\cup\{U_{j+1}\setminus\{v_{j+1}\}\}$
	$\displaystyle\partial(v_{j})$	$\displaystyle:=\{F\in\mathcal{E}_{j}\mid v_{j}\in F\}$
	$\displaystyle U_{j}$	$\displaystyle:=\bigcup_{F\in\partial(v_{j})}F.$

The induced width on $\mathcal{H}$ by $\sigma$ is defined as $\max_{i\in[n]}|U_{i}|-1$ . We further consider the setting in which $\mathcal{H}$ has some distinguished vertices. As we will see shortly, these distinguished vertices correspond to the free index variables of tensor language expressions. Without loss of generality, we assume that the distinguished vertices are $v_{1},v_{2},\ldots,v_{f}$ . When such distinguished vertices are present, an elimination sequence is just as before, except that the distinguished vertices come first in the sequence. If $v_{1},\ldots,v_{f}$ are the distinguished vertices, then we define the induced width of the sequence as $f+\max_{f+1\leq i\leq n}|U_{i}\setminus\{v_{1},\ldots,v_{f}\}|-1$ . In other words, we count the number of distinguished vertices, and then augment it with the induced width of the sequence, starting from $v_{f+1}$ to to $v_{n}$ , hereby ignoring the distinguished variables in the $U_{i}$ ’s. One could, more generally, also try to reduce the number of free index variables but we assume that this number is fixed, similarly as how $\mathsf{GNN}\text{s}$ operate.

G.2 Conjunctive $\mathsf{TL}$ expressions and treewidth

We start by considering a special form of $\mathsf{TL}$ expressions, which we refer to as conjunctive $\mathsf{TL}$ expressions, in analogy to conjunctive queries in database research and logic. A conjunctive $\mathsf{TL}$ expression is of the form

\varphi({\bm{x}})=\sum_{{\bm{y}}}\psi({\bm{x}},{\bm{y}}).

where ${\bm{x}}$ denote the free index variables, ${\bm{y}}$ contains all index variables under the scope of a summation, and finally, $\psi({\bm{x}},{\bm{y}})$ is a product of base predicates in $\mathsf{TL}$ . That is, $\psi({\bm{x}},{\bm{y}})$ is a product of $E(z_{i},z_{j})$ and $P_{\ell}(z_{i})$ with $z_{i},z_{j}$ variables in ${\bm{x}}$ or ${\bm{y}}$ . With such a conjunctive $\mathsf{TL}$ expression, one can associate a multi-hypergraph in a canonical way (Abo Khamis et al., 2016). More precisely, given a conjunctive $\mathsf{TL}$ expression $\varphi({\bm{x}})$ we define $\mathcal{H}_{\varphi}$ as:

•

$\mathcal{V}_{\varphi}$ consist of all index variables in ${\bm{x}}$ and ${\bm{y}}$ ;
•

$\mathcal{E}_{\varphi}$ : for each atomic base predicate $\tau$ in $\psi$ we have an edge $F_{\tau}$ containing the indices occurring in the predicate; and
•

the vertices corresponding to the free index variables ${\bm{x}}$ form the distinguishing set of vertices.

We now define an elimination sequence for $\varphi$ as an elimination sequence for $\mathcal{H}_{\varphi}$ taking the distinguished vertices into account. The following observation ties elimination sequences of $\varphi$ to the number of variables needed to express $\varphi$ .

Proposition G.1.

Let $\varphi({\bm{x}})$ be a conjunctive $\mathsf{TL}$ expression for which an elimination sequence of induced with $k-1$ exists. Then $\varphi({\bm{x}})$ is equivalent to an expression $\tilde{\varphi}({\bm{x}})$ in $\mathsf{TL}_{k}$ .

Proof.

We show this by induction on the number of vertices in $\mathcal{H}_{\varphi}$ which are not distinguished. For the base case, all vertices are distinguished and hence $\varphi({\bm{x}})$ does not contain any summation and is an expression in $\mathsf{TL}_{k}$ itself.

Suppose that in $\mathcal{H}_{\varphi}$ there are $p$ undistinguished vertices. That is,

\varphi({\bm{x}})=\sum_{y_{1}}\cdots\sum_{y_{p}}\psi({\bm{x}},{\bm{y}}).

By assumption, we have an elimination sequence of the undistinguished vertices. Assume that $y_{p}$ is first in this ordering. Let us write

	$\displaystyle\varphi({\bm{x}})$	$\displaystyle=\sum_{y_{1}}\cdots\sum_{y_{p}}\psi({\bm{x}},{\bm{y}})$
		$\displaystyle=\sum_{y_{1}}\cdots\sum_{y_{p-1}}\psi_{1}({\bm{x}},{\bm{y}}\setminus y_{p})\cdot\sum_{y_{p}}\psi_{2}({\bm{x}},{\bm{y}})$

where $\psi_{1}$ is the product of predicates corresponding to the edges $F\in\mathcal{E}_{\varphi}\setminus\partial(y_{p})$ , that is, those not containing $y_{p}$ , and $\psi_{2}$ is the product of all predicates corresponding to the edges $F\in\partial(y_{p})$ , that is, those containing the predicate $y_{p}$ . Note that, because of the induced width of $k-1$ , $\sum_{y_{p}}\psi_{2}({\bm{x}},{\bm{y}})$ contains all indices in $U_{p}$ which is of size $\leq k$ . We now replace the previous expression with another expression

\displaystyle\varphi^{\prime}({\bm{x}})

\displaystyle=\sum_{y_{1}}\cdots\sum_{y_{p-1}}\psi_{1}({\bm{x}},{\bm{y}}\setminus y_{p})\cdot R_{p}({\bm{x}},{\bm{y}})

Where $R_{p}$ is regarded as an $|U_{p}|-1$ -ary predicate over the indices in $U_{p}\setminus y_{p}$ . It is now easily verified that $\mathcal{H}_{\varphi^{\prime}}$ is the hypergraph $\mathcal{H}_{p-1}$ corresponding to the variable ordering $\sigma$ . We note that this is a hypergraph over $p-1$ undistinguished vertices. We can apply the induction hypothesis and replace $\varphi^{\prime}({\bm{x}})$ with its equivalent expression $\tilde{\varphi}^{\prime}({\bm{x}})$ in $\mathsf{TL}_{k}$ . To obtain the expression $\tilde{\varphi}({\bm{x}})$ of $\varphi({\bm{x}})$ , it now remains to replace the new predicate $R_{p}$ with its defining expression. We note again that $R_{p}$ contains at most $k-1$ indices, so it will occur in $\tilde{\varphi}^{\prime}({\bm{x}})$ in the form $R_{p}({\bm{x}},{\bm{z}})$ where $|{\bm{z}}|\leq k-1$ . In other words, one of the variables in ${\bm{z}}$ is not used, say $z_{s}$ , and we can simply replace $R_{p}({\bm{x}},{\bm{z}})$ by $\sum_{z_{s}}\psi_{x}({\bm{x}},{\bm{z}},z_{s})$ . ∎

As a consequence, one way of showing that a conjunctive expression $\varphi({\bm{x}})$ in $\mathsf{TL}$ is equivalently expressible in $\mathsf{TL}_{k}$ , is to find an elimination sequence of induced width $k-1$ . This in turn is equivalent to $\mathcal{H}_{\varphi}$ having a treewidth of $k-1$ , as is shown, e.g., in Abo Khamis et al. (2016). As usual, we define the treewidth of a conjunctive expression $\varphi({\bm{x}})$ in $\mathsf{TL}$ as the treewidth of its associated hypergraph $\mathcal{H}_{\varphi}$ .

We recall the definition of treewidth (modified to our setting): A tree decomposition $T=(V_{T},E_{T},\xi_{T})$ of $\mathcal{H}_{\varphi}$ with $\xi_{T}:V_{T}\to 2^{\mathcal{V}}$ is such that

•

For any $F\in\mathcal{E}$ , there is a $t\in V_{T}$ such that $F\subseteq\xi_{T}(t)$ ; and
•

For any $v\in\mathcal{V}$ corresponding to a non-distinguished index variable, the set $\{t\mid t\in V_{T},v\in\xi(t)\}$ is not empty and forms a connected sub-tree of $T$ .

The width of a tree decomposition $T$ is given by $\max_{t\in V_{T}}|\xi_{T}(t)|-1$ . Now the treewidth of $\mathcal{H}_{\varphi}$ , $\mathsf{tw}(\mathcal{H})$ is the minimum width of any of its tree decompositions. We denote by $\mathsf{tw}(\varphi)$ the treewidth of $\mathcal{H}_{\varphi}$ . Again, similar modifications are used when distinguished vertices are in place. Referring again to Abo Khamis et al. (2016), $\mathsf{tw}(\varphi)=k-1$ is equivalent to having a variable elimination sequence for $\varphi$ of an induced width of $k-1$ . Hence, combining this observation with Proposition G.1 results in:

Corollary G.2.

Let $\varphi({\bm{x}})$ be a conjunctive $\mathsf{TL}$ expression of treewidth $k-1$ . Then $\varphi({\bm{x}})$ is equivalent to an expression $\tilde{\varphi}({\bm{x}})$ in $\mathsf{TL}_{k}$ .

That is, we have established Proposition 4.5 for conjunctive $\mathsf{TL}$ expressions. We next lift this to arbitrary $\mathsf{TL}(\Omega)$ expressions.

G.3 Arbitrary $\mathsf{TL}(\Omega)$ expressions

First, we observe that any expression in $\mathsf{TL}$ can be written as a linear combination of conjunctive expressions. This readily follows from the linearity of the operations in $\mathsf{TL}$ and that equality and inequality predicates can be eliminated. More specifically, we may assume that $\varphi({\bm{x}})$ in $\mathsf{TL}$ is of the form

\sum_{\alpha\in A}a_{\alpha}\psi_{\alpha}({\bm{x}},{\bm{y}}),

with $A$ finite set of indices and $a_{\alpha}\in\mathbb{R}$ , and $\psi_{\alpha}({\bm{x}},{\bm{y}})$ conjunctive $\mathsf{TL}$ expressions. We now define

\mathsf{tw}(\varphi):=\max\{\mathsf{tw}(\psi_{\alpha})\mid\alpha\in A\}

for expressions in $\mathsf{TL}$ . To deal with expressions in $\mathsf{TL}(\Omega)$ that may contain function application, we define $\mathsf{tw}(\varphi)$ as the maximum treewidth of the expressions: (i) $\varphi_{\mathsf{nofun}}({\bm{x}})\in\mathsf{TL}$ obtained by replacing each top-level function application $f(\varphi_{1},\ldots,\varphi_{p})$ by a new predicate $R_{f}$ with free indices $\mathsf{free}(\varphi_{1})\cup\cdots\cup\mathsf{free}(\varphi_{p})$ ; and (ii) all expressions $\varphi_{1},\ldots,\varphi_{p}$ occurring in a top-level function application $f(\varphi_{1},\ldots,\varphi_{p})$ in $\varphi$ . We note that these expression either have no function applications (as in (i)) or have function applications of lower nesting depth (in $\varphi$ , as in $(ii)$ ). In other words, applying this definition recursively, we end up with expressions with no function applications, for which treewidth was already defined. With this notion of treewidth at hand, Proposition 4.5 now readily follows.

Appendix H Higher-order MPNNs

We conclude the supplementary material by elaborating on $k$ - $\mathsf{MPNN}\text{s}$ and by relating them to classical $\mathsf{MPNN}\text{s}$ (Gilmer et al., 2017). As underlying tensor language we use $\mathsf{TL}_{k+1}(\Omega,\Theta)$ which includes arbitrary functions ( $\Omega$ ) and aggregation functions ( $\Theta$ ), as defined in Section C.5.

We recall from Section 3 that $k$ - $\mathsf{MPNN}\text{s}$ refer to the class of embeddings $f:\mathcal{G}_{s}\to\mathbb{R}^{\ell}$ for some $\ell\in\mathbb{N}$ that can be represented in $\mathsf{TL}_{k+1}(\Omega,\Theta)$ . When considering an embedding $f:\mathcal{G}_{s}\to\mathbb{R}^{\ell}$ , the notion of being represented is defined in terms of the existence of $\ell$ expressions in $\mathsf{TL}_{k+1}(\Omega,\Theta)$ , which together provide each of the $\ell$ components of the embedding in $\mathbb{R}^{\ell}$ . We remark, however, that we can alternatively include concatenation in tensor language. As such, we can concatenate $\ell$ separate expressions into a single expression. As a positive side effect, for $f:\mathcal{G}_{s}\to\mathbb{R}^{\ell}$ to be represented in tensor language, we can then simply define it by requiring the existence of a single expression, rather than $\ell$ separate ones. This results in a slightly more succinct way of reasoning about $k$ - $\mathsf{MPNN}\text{s}$ .

In order to reason about $k$ - $\mathsf{MPNN}\text{s}$ as a class of embeddings, we can obtain an equivalent definition for the class of $k$ - $\mathsf{MPNN}\text{s}$ by inductively stating how new embeddings are computed out of old embeddings. Let $X=\{x_{1},\ldots,x_{k+1}\}$ be a set of $k+1$ distinct variables. In the following, ${\bm{v}}$ denotes a tuple of vertices that have at least as many components as the highest index of variables used in expressions. Intuitively, variable $x_{j}$ refers to the $j$ th component in ${\bm{v}}$ . We also denote the image of a graph $G$ and tuple ${\bm{v}}$ by an expression $\varphi$ , i.e., the semantics of $\varphi$ given $G$ and ${\bm{v}}$ , as $\varphi(G,{\bm{v}})$ rather than by $[\![\varphi,{\bm{v}}]\!]_{G}$ . We further simply refer to embeddings rather than expressions.

We first define “atomic” $k$ - $\mathsf{MPNN}$ embeddings which extract basic information from the graph $G$ and the given tuple ${\bm{v}}$ of vertices.

•

Label embeddings of the form $\varphi(x_{i}):=\mathsf{P}_{s}(x_{i})$ , with $x_{i}\in X$ , and defined by $\varphi(G,{\bm{v}}):=(\mathsf{col}_{G}(v_{i}))_{s}$ , are $k$ - $\mathsf{MPNN}\text{s}$ ;
•

Edge embeddings of the form $\varphi(x_{i},x_{j}):=\mathsf{E}(x_{i},x_{j})$ , with $x_{i},x_{j}\in X$ , and defined by

$\varphi(G,{\bm{v}}):=\begin{cases}1&\text{if $v_{i}v_{j}\in E_{G}$}\\ 0&\text{otherwise},\end{cases}$

are $k$ - $\mathsf{MPNN}\text{s}$ ; and
•

(Dis-)equality embeddings of the form $\varphi(x_{i},x_{j}):=\bm{1}_{x_{i}\mathop{\mathsf{op}}x_{j}}$ , with $x_{i},x_{j}\in X$ , and defined by

$\varphi(G,{\bm{v}}):=\begin{cases}1&\text{if $v_{i}\mathop{\mathsf{op}}v_{j}$}\\ 0&\text{otherwise},\end{cases}$

are $k$ - $\mathsf{MPNN}\text{s}$ .

We next inductively define new $k$ - $\mathsf{MPNN}\text{s}$ from “old” $k$ - $\mathsf{MPNN}\text{s}$ . That is, given $k$ - $\mathsf{MPNN}\text{s}$ $\varphi_{1}({\bm{x}}_{1}),\ldots,\varphi_{\ell}({\bm{x}}_{\ell})$ , the following are also $k$ - $\mathsf{MPNN}\text{s}$ :

•

Function applications of the form $\varphi({\bm{x}}):=\mathbf{f}(\varphi_{1}({\bm{x}}_{1}),\ldots,\varphi_{\ell}({\bm{x}}_{\ell})$ are $k$ - $\mathsf{MPNN}\text{s}$ , where ${\bm{x}}={\bm{x}}_{1}\cup\cdots\cup{\bm{x}}_{\ell}$ , and defined by

\varphi(G,{\bm{v}}):=\mathbf{f}\left(\varphi_{1}(G,{\bm{v}}|_{{\bm{x}}_{1}}),\ldots,\varphi_{\ell}(G,{\bm{v}}|_{{\bm{x}}_{\ell}})\right).

Here, if $\varphi_{i}(G,{\bm{v}}|_{{\bm{x}}_{i}})\in\mathbb{R}^{d_{i}}$ , then $\mathbf{f}:\mathbb{R}^{d_{1}}\times\cdots\times\mathbb{R}^{d_{\ell}}\to\mathbb{R}^{d}$ for some $d\in\mathbb{N}$ . That is, $\varphi$ generates an embedding in $\mathbb{R}^{d}$ . We remark that our function applications include concatenation.

•

Unconditional aggregations of the form $\varphi({\bm{x}}):=\mathsf{agg}_{x_{j}}^{\mathbf{F}}(\varphi_{1}({\bm{x}},x_{j}))$ are $k$ - $\mathsf{MPNN}\text{s}$ , where $x_{j}\in X$ and $x_{j}\not\in{\bm{x}}$ , and defined by

\varphi(G,{\bm{v}}):=\mathbf{F}\bigl{(}\{\!\!\{\varphi_{1}(G,v_{1},\ldots,v_{j-1},w,v_{j+1},\ldots,v_{k})\mid w\in V_{G}\}\!\}\bigr{)}.

Here, if $\varphi_{1}$ generates an embedding in $\mathbb{R}^{d_{1}}$ , then $\mathbf{F}$ is an aggregation function assigning to multisets of vectors in $\mathbb{R}^{d_{1}}$ a vector in $\mathbb{R}^{d}$ , for some $d\in\mathbb{N}$ . So, $\varphi$ generates an embedding in $\mathbb{R}^{d}$ .

•

Conditional aggregations of the form $\varphi(x_{i}):=\mathsf{agg}_{x_{j}}^{\mathbf{F}}(\varphi_{1}(x_{i},x_{j})|E(x_{i},x_{j}))$ are $k$ - $\mathsf{MPNN}\text{s}$ , with $x_{i},x_{j}\in X$ , and defined by

$\varphi(G,{\bm{v}}):=\mathbf{F}\bigl{(}\{\!\!\{\varphi_{1}(G,v_{i},w)\mid w\in N_{G}(v_{i})\}\!\}\bigr{)}.$

As before, if $\varphi_{1}$ generates an embedding in $\mathbb{R}^{d_{1}}$ , then $\mathbf{F}$ is an aggregation function assigning to multisets of vectors in $\mathbb{R}^{d_{1}}$ a vector in $\mathbb{R}^{d}$ , for some $d\in\mathbb{N}$ . So again, $\varphi$ generates an embedding in $\mathbb{R}^{d}$ .

As defined in the main paper, we also consider the subclass $k$ - $\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ by only considering $k$ - $\mathsf{MPNN}\text{s}$ defined in terms of expressions of aggregation depth at most $t$ . Our main results, phrased in terms of $k$ - $\mathsf{MPNN}\text{s}$ are:

\rho_{1}(\mathsf{vwl}_{k}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})=\rho_{1}(k\text{-}\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})\text{ and }\rho_{0}(\mathsf{gwl}_{k})=\rho_{0}(k\text{-}\mathsf{MPNN}\text{s}).

Hence, if the embeddings computed by $\mathsf{GNN}\text{s}$ are $k$ - $\mathsf{MPNN}\text{s}$ , one obtains an upper bound on the separation power in terms of $k\text{-}\mathsf{WL}$ .

The classical $\mathsf{MPNN}\text{s}$ (Gilmer et al., 2017) are subclass of $1$ - $\mathsf{MPNN}\text{s}$ in which no unconditional aggregation can be used and furthermore, function applications require input embeddings with the same single variable ( $x_{1}$ or $x_{2}$ ), and only $\bm{1}_{x_{i}=x_{i}}$ and $\bm{1}_{x_{i}\neq x_{i}}$ are allowed. In other words, they correspond to guarded tensor language expressions (Section 4.2). We denote this class of $1$ - $\mathsf{MPNN}\text{s}$ by $\mathsf{MPNN}\text{s}$ and by $\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}$ when restrictions on aggregation depth are in place. And indeed, the classical way of describing $\mathsf{MPNN}\text{s}$ as

	$\displaystyle\varphi^{\scalebox{0.6}{(}0\scalebox{0.6}{)}}(x_{1})$	$\displaystyle=(P_{1}(x_{1}),\ldots,P_{\ell}(x_{1}))$
	$\displaystyle\varphi^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1})$	$\displaystyle=\mathbf{f}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}\Bigl{(}\varphi^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1}),\mathsf{aggr}_{x_{2}}^{\mathbf{F}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}}\bigl{(}\varphi^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{1}),\varphi^{\scalebox{0.6}{(}t-1\scalebox{0.6}{)}}(x_{2})\|E(x_{i},x_{j})\bigr{)}\Bigr{)}$

correspond to $1$ - $\mathsf{MPNN}\text{s}$ that satisfy the above mentioned restrictions. Without readouts, $\mathsf{MPNN}\text{s}$ compute vertex embeddings and hence, our results imply

\rho_{1}(\mathsf{cr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})=\rho_{1}(\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}).

Furthermore, $\mathsf{MPNN}\text{s}$ with a readout function fall into the category of $1$ - $\mathsf{MPNN}\text{s}$ :

\varphi:=\mathsf{aggr}_{x_{1}}^{\mathsf{readout}}(\varphi^{\scalebox{0.6}{(}t\scalebox{0.6}{)}}(x_{1}))

where unconditional aggregation is used. Hence,

\rho_{0}(\mathsf{gcr}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})=\rho_{0}(\mathsf{gwl}_{1}^{\scalebox{0.6}{(}t\scalebox{0.6}{)}})=\rho_{0}(1\text{-}\mathsf{MPNN}\text{s}^{\scalebox{0.6}{(}t+1\scalebox{0.6}{)}}).

We thus see that $k$ - $\mathsf{MPNN}\text{s}$ gracefully extend $\mathsf{MPNN}\text{s}$ and can be used for obtaining upper bounds on the separation power of classes of $\mathsf{GNN}\text{s}$ .

Expressiveness and Approximation Properties of Graph Neural Networks

Abstract

1 Introduction

Separation power.

Tensor languages.

Approximation and universality.

Related work.

2 Background

3 Specifying GNNs

Syntax.

Semantics.

kk-MPNNs.

TL represents equivariant or invariant functions.

Proposition 3.1.

4 Separation Power of Tensor Languages

4.1 Separation Power

Definition 1.

4.2 Main Results

Theorem 4.1.

Theorem 4.2.

Guarded TL and color refinement.

Theorem 4.3.

Graph embeddings.

Theorem 4.4.

Optimality of number of indices.

Proposition 4.5.

On the impact of functions.

5 Consequences for GNNs

Theorem 5.1.

6 Function Approximation

6.1 General TL Approximation Results

Theorem 6.1.

Corollary 6.2.

6.2 Consequences for GNNs

7 Conclusion

8 Aknowledgements & Disclosure Funding

References

Supplementary Material

Appendix A Related Work Cnt’d

Appendix B Details of Section 3

B.1 Proof of Proposition 3.1

Appendix C Details of Section 4

C.1 Classical Logics

C.2 Characterization of Separation Power of Logics

Proposition C.1.

Proof.

Theorem C.2.

Proof.

C.3 From 𝖳𝖫​(Ω)\mathsf{TL}(\Omega) to 𝖢∞​ωk\mathsf{C}^{k}_{\infty\omega} and 𝖦𝖢∞​ω\mathsf{GC}_{\infty\omega}

Proposition C.3.

Proof.

C.4 Proof of Theorem 4.1, 4.2, 4.3 and 4.4

Proposition C.4.

Proof.

Proposition C.5.

Proof.

C.5 Other aggregation functions

Proposition C.6.

Proof.

C.6 Generalization to Graphs with real-valued vertex labels

Appendix D Details of Section 5

D.1 color Refinement

GraphSage.

GCNs.

SGCs.

Principal Neighbourhood Aggregation.

Other example.

D.2 kk-dimensional Weisfeiler-Leman tests

Folklore GNNs.

Other higher-order examples.

D.3 Augmented GNNs

D.4 Spectral GNNs

Chebnet.

Corollary D.1.

CayleyNet.

Appendix E Proof of Theorem 5.1

Lemma E.1.

Proof of Lemma E.1.

kk-dimensional GINs.

Proposition E.2.

$k$ -MPNNs.

C.3 From $\mathsf{TL}(\Omega)$ to $\mathsf{C}^{k}_{\infty\omega}$ and $\mathsf{GC}_{\infty\omega}$

D.2 $k$ -dimensional Weisfeiler-Leman tests

$k$ -dimensional GINs.

G.2 Conjunctive $\mathsf{TL}$ expressions and treewidth

G.3 Arbitrary $\mathsf{TL}(\Omega)$ expressions