
Combinatorial Regularity for Relatively Perfect Discrete Morse Gradient Vector Fields of ReLU Neural Networks

Robyn Brooks, Marissa Masden

1 Introduction

Much of the recent progress of machine learning is due to exponential growth in available computational power. This growth has enabled neural network models with trillions of parameters to be trained on extremely large datasets, with exceptional results. In contrast, environmental and economic concerns lead to the question of the minimal network architecture necessary to perform a specific task.

The theoretical characterization of the exact capabilities of fixed neural network architectures is still ongoing. To understand the classification capacity of ReLU neural network functions geometrically, researchers have investigated attributes such as the expected number of linear regions [11] and the average curvature of the decision boundary [1]. To describe the topological capacity of such networks, one measure of interest is the achievable Betti numbers of decision regions and decision boundaries for a fixed neural network architecture [3, 10].

In this vein, we continue the development of algorithmic tools for characterizing the topological behavior of fully-connected, feedforward ReLU neural network functions using piecewise linear (PL) Morse and discrete Morse theory. Previous work has established that most ReLU neural networks are PL Morse, and identified conditions for a point in the input space of a ReLU neural network to be regular or have PL Morse index k [8]. However, discrete Morse theory provides graph-theoretic algorithms for speeding up computations and narrowing down which computations even need to be performed [13]. Furthermore, discrete Morse theory has tools for canceling critical cells which allow for the simplification of sets of critical cells.

The usefulness of discrete Morse theory in algorithmic computations and understanding the topology of cellular spaces motivates us to bridge the existing gap between PL Morse functions and discrete Morse functions in the context of ReLU neural networks.

1.1 Contributions and Related Work

We follow a framework for the polyhedral decomposition of the input space of a ReLU neural network F by the canonical polyhedral complex, which we will denote as \mathcal{C}(F), as introduced in [7] and defined in Definition 14. This framework is expanded in [8], introducing a PL Morse characterization, and in [18], introducing the combinatorial characterization of the polyhedral decomposition by sign sequences.

Unfortunately, computing the canonical polyhedral complex and the topological properties of its level and sublevel sets is expensive and can only be done for small neural networks, in part because \mathcal{C}(F) has exponentially many vertices in the input dimension of F. This drawback is the main motivation for constructing a discrete function associated to F; such a function should retain the desired topological information while easing issues of high computational complexity.

We build on existing literature by providing a translation from the PL Morse characterization from [8] to a discrete Morse gradient vector field by exploiting both the combinatorics from [18] and a technique for generating relatively perfect discrete gradient vector fields from [6]. The main result of this paper (Theorem 3) provides a canonical way to directly construct a discrete Morse function (defined on the input space) which captures the same topological information as the existing PL Morse function. This paper is intended as an intermediary which should allow further computational tools to be developed.

While not the primary goal of this paper, a secondary contribution which we hope to highlight is the development of additional tools for translating from piecewise linear functions on non-simplicial complexes to discrete Morse gradient vector fields. To our knowledge, all current theory relating PL Morse functions to discrete Morse functions on the same complexes is detailed in [6]. We also provide realizability results for certain shallow networks (Theorem 2): namely, we are able to characterize the possible homotopy types of the decision boundary for a generic PL Morse ReLU neural network F with architecture (n,n+1,1) by restricting the number and possible indices of the critical points of F.

1.2 Outline

In Section 2, we review the mathematical tools we are using. In Section 3, we review the construction of the canonical polyhedral complex and share some novel realizability results for shallow neural networks. Section 4 contains our main result (Theorem 3). This theorem provides a translation from a PL Morse ReLU neural network F to a relatively perfect discrete gradient vector field on \mathcal{C}(F). In Section 5 we discuss some of the issues surrounding effective computational implementation, and in Section 6, we provide concluding remarks.

2 Background: Polyhedral Geometry and Morse Theories

A ReLU neural network is a piecewise linear function on a polyhedral complex which we call the canonical polyhedral complex. Before developing constructions on this specific object, we review some general constructions in polyhedral and piecewise linear geometry which we will use repeatedly. Readers familiar with these topics may choose to begin at a later section and refer back to this section for notation if necessary.

Notably, we treat polyhedra as intersections of finitely many closed halfspaces in \mathbb{R}^{n}, and allow unbounded polyhedra. In our setting, a polyhedral complex \mathcal{C} is a set of polyhedra in \mathbb{R}^{n} which is closed under taking faces, such that the intersection of every pair of polyhedra in \mathcal{C} is a common face of both (which may be the empty face). We denote the underlying set of a polyhedral complex by |\mathcal{C}|, and take the interior of a polyhedron to be its interior in the relative topology induced by \mathbb{R}^{n}. A function f:\mathcal{C}\to\mathcal{D} may be defined by functions with domain and codomain given by the respective underlying sets. Such a function is called piecewise linear on \mathcal{C} if the function on |\mathcal{C}| is continuous and, for each polyhedron C\in\mathcal{C}, f|_{C} is affine. We refer the reader to [7] and [9] for an additional overview of definitions in polyhedral geometry relevant to this work.

2.1 Some piecewise linear and polyhedral constructions

The terms below are defined specifically for polyhedra and polyhedral complexes. We use definitions from [22] and [6]. These definitions for polyhedral complexes are motivated by similar constructions for simplicial complexes. The additional generalizations to local versions of these terms are necessary for working in the polyhedral setting. We will use the local star and link of a vertex to provide combinatorial regularity which we will exploit in the algorithms we describe in Section 4 and Section 5.

Definition 1 (Cone, Cone Neighborhood, cf. [22, 9]).

Let p be a point in \mathbb{R}^{n} and A a set in \mathbb{R}^{n}. We define pA=\{tp+(1-t)a\mid t\in[0,1],a\in A\} and call pA a cone if each point in pA can be written uniquely as tp+(1-t)a for some t\in[0,1] and a\in A. A cone neighborhood of a point p in a polyhedral complex \mathcal{C} is a closed neighborhood of p in |\mathcal{C}| given by a cone pA, for a compact set A.

Remark 1.

Every open neighborhood of p in |\mathcal{C}| contains a cone neighborhood of p in \mathcal{C}.

Next, we create local versions of the following constructions which are well known in simplicial complexes. (1-2) are adapted from [9], (3) is adapted from [6], and (4) is, to our knowledge, new:

Definition 2 (Star, Local Star, Lower Star, Local Lower Star).
  1.

    The star \mathrm{star}(p) of a point p in a polyhedral complex \mathcal{C} is the set of all polyhedra in \mathcal{C} which contain p, that is, \mathrm{star}(p)=\{C\in\mathcal{C}:p\in C\}.

  2.

    Let L be a compact set contained in \mathrm{star}(p) such that the cone neighborhood pL satisfies pL\subset\mathrm{star}(p). The local star of p with respect to L, denoted \mathrm{star}_{L}(p), is given by the cells \{C\cap pL:C\in\mathrm{star}(p)\}.

  3.

    If f:\mathcal{C}\to\mathbb{R} is a piecewise linear function, the lower star of p relative to f is the set \mathrm{star}^{-}(p):=\{C\in\mathrm{star}(p):f(x)\leq f(p)~\forall x\in C\}.

  4.

    Finally, the local lower star of p with respect to L and relative to f, denoted \mathrm{star}^{-}_{L}(p), is given by the restriction of the lower star of p to the cone neighborhood pL: \{C\cap pL:C\in\mathrm{star}^{-}(p)\}.
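To make the star and lower-star constructions concrete, here is a minimal sketch in Python, under the assumption (ours, not the paper's) that a complex is encoded as a list of cells, each a frozenset of vertex ids, and that a PL function is recorded by its values at vertices, so that its restriction to each cell is determined by the cell's vertices.

```python
def star(cells, v):
    """star(v): all cells of the complex containing the vertex v."""
    return [c for c in cells if v in c]

def lower_star(cells, f, v):
    """Lower star of v relative to f: cells of star(v) on which f <= f(v).
    Since f is affine on each cell, checking the cell's vertices suffices."""
    return [c for c in star(cells, v) if all(f[u] <= f[v] for u in c)]

# Example: a filled triangle on vertices 0, 1, 2 with f increasing in the index.
cells = [frozenset(s) for s in ({0}, {1}, {2}, {0, 1}, {1, 2}, {0, 2}, {0, 1, 2})]
f = {0: 0.0, 1: 1.0, 2: 2.0}
```

On this example the lower star of vertex 2 is all of star(2), while the lower star of vertex 0 is the vertex alone, matching the disjointness of lower stars visible in Figure 1.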

Figure 1: Upper left: The stars of the indicated vertices overlap. Upper right: One possible polyhedral construction of the local stars of the indicated vertices. Bottom left: The lower stars of the indicated vertices, given the indicated gradient directions. Observe the lower stars are necessarily disjoint. Bottom right: The local lower stars of the indicated vertices have simpler combinatorial type, but the cells are in bijection with the cells in the lower star.
Figure 2: Upper left: The links of the indicated vertices overlap and have arbitrary combinatorial type. Upper right: The local links of the indicated vertices. Bottom: The local lower links of the indicated vertices, given the indicated \nabla F-orientations on edges.

Observe that for a given point p and any cone neighborhood pL of p, the local star and local lower star of p have a poset structure given by containment and induced by that of the star of p; this poset structure is independent of the choice of L. This justifies calling our construction the local lower star of p relative to f.

In the simplicial context, the link can be thought of intuitively as the boundary of the star. We introduce a local version of the link in the polyhedral setting. However, in this polyhedral setting, in contrast to the star, the link and the local link may be combinatorially distinct, even though the star and local star are not.

Definition 3 (Link, Local Link, Local Lower Link).

Let p be a point in |\mathcal{C}| for some polyhedral complex \mathcal{C}.

  1.

    The link of p is the set of all faces of cells in \mathrm{star}(p) that do not contain p, denoted \mathrm{link}(p).

  2.

    If L is a compact set such that pL is a cone neighborhood of p, then we call L a local link of p.

  3.

    If L is a local link of p contained in \mathrm{star}(p), then the local lower link of p is the restriction of L to the lower star of p: \{L\cap C:C\in\mathrm{star}^{-}(p)\}, denoted \mathrm{link}^{-}_{L}(p).

The local link and local lower link of a point p in \mathcal{C} have a well-defined combinatorial decomposition induced by that of the star of p, while the true (combinatorial) link in \mathcal{C} does not, as illustrated in Figures 1 and 2.

2.2 \nabla F-orientations of PL functions

In Section 4 we will translate from piecewise linear Morse functions on certain PL manifolds to discrete Morse gradient vector fields on a corresponding polyhedral complex. A useful intermediary, and furthermore a useful tool for visualization, is what we call the \nabla F-orientation on the PL manifold’s 1-skeleton.

Definition 4 (\nabla F orientation, [7]).

Let \mathcal{C} be a polyhedral complex and \mathcal{C}^{(1)} be the 1-skeleton of \mathcal{C}. Let F:|\mathcal{C}|\to\mathbb{R} be a piecewise linear function on \mathcal{C}. Then the following orientation on the 1-skeleton of \mathcal{C} is called the \nabla F-orientation (read “grad-F orientation”) on \mathcal{C}^{(1)}.

  1.

    Orient the edges of \mathcal{C} on which F is nonconstant in the direction of increase of F.

  2.

    Do not assign an orientation to those edges of \mathcal{C} on which F is constant.

We primarily consider the case where F is only constant on vertices. For ReLU neural networks, this is not always the case, but is sufficiently common to treat as a distinguished case.
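When F is affine on edges, the \nabla F-orientation of Definition 4 can be read off from vertex values alone. The sketch below uses our own illustrative encoding (edges as vertex pairs, F as a dict of vertex values) and separates out the flat edges, which receive no orientation.

```python
def grad_F_orientation(edges, F):
    """Orient each edge in the direction of increase of F; edges on which
    F is constant (flat edges) are returned separately, unoriented."""
    oriented, flat = [], []
    for u, v in edges:
        if F[u] < F[v]:
            oriented.append((u, v))   # points toward the larger value of F
        elif F[v] < F[u]:
            oriented.append((v, u))
        else:
            flat.append((u, v))       # F constant on this edge
    return oriented, flat
```

In the setting of this paper, the interesting case is when the returned list of flat edges is empty, i.e., F is constant only on vertices.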

Lemma 1.

The following are properties of a \nabla F-orientation on a polyhedral complex \mathcal{C} on which F is only constant on vertices.

  1.

    There are no directed cycles in the directed graph.

  2.

    If C is a closed, bounded polytopal cell of \mathcal{C}, then the \nabla F-orientation restricted to the boundary of the cell has a unique source and a unique sink.

  3.

    If a polytope C has a source v_{min} and a sink v_{max} induced by the \nabla F-orientation on its edges, then there is a hyperplane sweep in \mathbb{R}^{n_{0}} that hits vertex v_{min} first and vertex v_{max} last.

Proof.

These are standard results in linear programming. (3) is due to the fact that F induces a linear projection F\big|_{C}:C\to\mathbb{R} such that F\big|_{C}(v_{min}) is minimal and F\big|_{C}(v_{max}) is maximal. The preimage of each point in \mathbb{R} is a hyperplane intersected with C. ∎

Remark 2.

As evidenced by (3), two combinatorially equivalent polyhedral complexes might have different sets of admissible PL gradients on their edges depending on the location of their vertices. The question of how to classify all geometrically realizable polyhedral complexes and corresponding \nabla F-orientations by a ReLU neural network F with a fixed architecture is open.

Lemma 2 (Cf. [8] Lemma 6.1).

If v_{1} and v_{2} are two vertices connected by a flat edge e, and e_{1} and e_{2} are two edges incident to v_{1} and v_{2} respectively which bound the same cell C, then the gradients on e_{1} and e_{2} have the same relative orientation; they must both point away from v_{1} and v_{2} respectively, or both point towards them.

The above lemma implies the following corollary immediately.

Corollary 1.

If C is a (k-1)-dimensional face of a k-dimensional cell D, and F is a continuous affine linear function on C and D such that F is constant on C, then either F(x)\leq F(C) for all x\in D or F(x)\geq F(C) for all x\in D, with equality occurring only on C.

We also find the following no-zigzags lemma useful in understanding realizable \nabla F-orientations.

Lemma 3 (No-zigzags lemma).

Let C be a 2-cell of a polyhedral complex and F an affine function on C. Let e_{1} and e_{2} be unbounded edges of C with vertices v_{1} and v_{2}. Let e be an edge of C connecting v_{1} and v_{2}. If the \nabla F-orientation on e_{1} is towards v_{1} and the \nabla F-orientation on e_{2} is away from v_{2}, then the \nabla F-orientation on e is from v_{1} to v_{2}.

Proof.

First, the existence of a \nabla F-orientation on e_{1} and e_{2} implies that F is nonconstant on C. As F is affine, the level sets of F in C are lines.

Next, F is not constant on e, because if it were, then by Lemma 2 both e_{1} and e_{2} would be oriented in the same direction relative to v_{1} and v_{2}.

For the sake of contradiction, suppose the edge e were oriented from v_{2} to v_{1}, and consider a level set of F that contains a point x on the interior of e. The \nabla F-orientations on e_{1} and e_{2} imply that the level set containing x also intersects the interiors of e_{1} and e_{2} in points x_{1} and x_{2}. The points x_{1},x_{2} and x, on three different faces of C, cannot be contained in a line because C is a polyhedron, giving a contradiction. ∎

2.3 Piecewise linear Morse critical points

Smooth Morse theory describes the classical relationship between the homotopy type of sublevel sets of a smooth function f:M\to\mathbb{R} and what is called the index of its critical points, the number of negative eigenvalues of the Hessian [20]. In non-smooth contexts, alternative tools are needed. One such tool is piecewise linear (PL) Morse theory. No single such theory is generally accepted; we use the following definition, adapted from [9], due to its ease of use in this context:

Definition 5 (Piecewise Linear Morse Critical Point, Regular Point, Index).

Let M be a combinatorial d-manifold and let f:|M|\to\mathbb{R} be piecewise affine on cells. Let x\in|M|. Let St(d) be the standard cross-polytope in \mathbb{R}^{d} centered at the origin o and define f_{i}:St(d)\to\mathbb{R} by

f_{i}(x_{1},\dots,x_{d})=\sum_{k=1}^{i}-|x_{k}|+\sum_{k=i+1}^{d}|x_{k}|

If there are combinatorially equivalent link complexes for x and o contained in the stars of x and o such that f-f(x) and f_{i} have the same signs at corresponding vertices, then x is a PL critical point of f with index i.

Letting g(x_{1},\dots,x_{d})=x_{1}, we call x a PL regular point of f if, likewise, there is a combinatorially equivalent link complex for x and o contained in the stars of x and o such that f-f(x) and g have the same signs at corresponding vertices.

If f satisfies neither condition at x, then x is called a degenerate critical point of f.

Here, the standard cross-polytope St(d) in \mathbb{R}^{d} is the convex hull of the points \{(0,\dots,\pm 1,\dots,0)\}. One natural simplicial decomposition of St(d) consists of those simplices given by the convex hull of the origin, o, together with one vertex v_{i} which is nonzero in the ith coordinate direction. This simplicial decomposition is compatible with the piecewise linear structure of f_{i} for all 0\leq i\leq d.
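The reference functions f_{i} of Definition 5 are simple enough to transcribe directly; the one-liner below is a literal reading of the displayed formula, with no assumptions beyond representing a point of \mathbb{R}^{d} as a Python list.

```python
def f_index(i, x):
    """The model function f_i on the cross-polytope: contributes -|x_k| for
    the first i coordinates and +|x_k| for the remaining ones."""
    return sum(-abs(xk) for xk in x[:i]) + sum(abs(xk) for xk in x[i:])
```

At a vertex \pm e_{k} of St(d), f_{i} takes the value -1 if k\leq i and +1 otherwise, so the sign pattern of f_{i} at the vertices of the cross-polytope records the index i.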

We say that f is PL Morse if all vertices are regular or critical with index i for some 0\leq i\leq d. As in smooth Morse theory, the sublevel set topology of a PL function f only changes at PL critical points, and if f is PL Morse, the change in homotopy type at a PL critical point of index k is consistent with attaching a k-cell. Furthermore, if f(v)=c and v is the only PL critical point satisfying that condition, the rank of the relative homology H_{k}(f_{\leq c},f_{\leq c-\epsilon}) is 1 and, for i\neq k, H_{i}(f_{\leq c},f_{\leq c-\epsilon}) has rank zero [8].

In most cases, PL critical points only occur at vertices of the polyhedral complex, but this relies on the function f being nonconstant on positive-dimensional cells. In [8] and [18] it is shown that all ReLU neural networks which are nonconstant on positive-dimensional cells are PL Morse.

ReLU neural networks often have cells of their Canonical Polyhedral Complex (see Definition 14) on which they are constant (which we call flat cells, in line with [8]), and as a result not all ReLU neural networks are PL Morse. While it is a goal of the authors to extend our construction of a discrete Morse function for networks with flat cells (see Section 6 for a brief discussion on this topic), such networks are not in the scope of the current paper. Therefore, we restrict this paper to ReLU neural networks whose only flat cells are vertices.

2.4 Discrete Morse vector fields

The strength of discrete Morse theory is that it provides algorithmic tools for computing complete sublevel set topology [5].

Such tools, to our knowledge, have not been developed in generality for PL Morse theory. While the majority of discrete Morse theory has been developed in the context of simplicial complexes and CW complexes, the definitions are applicable in the context of polyhedral complexes with few changes, which we discuss at the end of this section. The main difference between polyhedral complexes and cellular complexes is the presence of unbounded polyhedra, which we address in Section 4.1.

We begin by reviewing standard definitions in discrete Morse theory; interested readers can find more details in [6].

Definition 6 (Discrete Morse Function ([5])).

Let \mathcal{C} be a simplicial [cellular] complex. A function f:\mathcal{C}\to\mathbb{R} is a discrete Morse function if, for every simplex [cell] \alpha\in\mathcal{C},

\#\{\beta<\alpha,\dim(\beta)=\dim(\alpha)-1\mid f(\beta)\geq f(\alpha)\}\leq 1

and

\#\{\gamma>\alpha,\dim(\gamma)=\dim(\alpha)+1\mid f(\gamma)\leq f(\alpha)\}\leq 1,

and at least one of the above inequalities is strict. Simplices [cells] for which both inequalities are strict are called critical.

A discrete Morse function has the property that it assigns higher values to higher-dimensional simplices, with at most one local exception for each simplex. For a given simplicial complex K, there may be many discrete Morse functions; for example, any function which assigns increasing values to simplices of increasing dimension is discrete Morse, and all simplices will be critical. However, it is usually possible to find a more “efficient” discrete Morse function, in the sense that the complex has fewer critical simplices than overall simplices (see for example [4, 13, 17]).
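Definition 6 can be checked directly on small examples. The sketch below uses our own encoding of a complex as frozensets of vertices with one function value per cell; it is an executable restatement of the definition, not an efficient algorithm.

```python
def is_discrete_morse(cells, f):
    """Check Definition 6: each cell has at most one 'wrong order' face or
    coface, and never one of each. Returns (ok, critical_cells)."""
    critical = []
    for a in cells:
        up = [b for b in cells
              if b < a and len(b) == len(a) - 1 and f[b] >= f[a]]
        down = [g for g in cells
                if a < g and len(g) == len(a) + 1 and f[g] <= f[a]]
        if len(up) > 1 or len(down) > 1 or (up and down):
            return False, []
        if not up and not down:
            critical.append(a)   # both counts are zero: a critical cell
    return True, critical
```

On a single edge \{0,1\} with values 0, 2 on the vertices and 1 on the edge, only the vertex with value 0 is critical; assigning values increasing with dimension instead makes every cell critical, illustrating the inefficient extreme mentioned above.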

Sublevel sets of discrete Morse functions are subcomplexes, and provide a way of building the simplicial complex in question by adding simplices in order of increasing function value. As in classical Morse theory, the homotopy type of the sublevel set can only change when a critical simplex is added, in which case it changes by the attachment of a cell of the same dimension as the critical simplex.

Associated to each discrete Morse function is a discrete gradient vector field. The information contained in the gradient vector field of a discrete Morse function is sufficient to determine the homotopy type of sublevel sets, and gradient vector fields have the benefit of a combinatorial formulation. To define the gradient vector field of a discrete Morse function, we first introduce the definition of a general discrete vector field.

Definition 7 (Discrete Vector Field, [6]).

Let \mathcal{C} be a simplicial [cellular] complex. A discrete vector field V on \mathcal{C} is a collection of pairs \{(C,D)\} of simplices [cells] of \mathcal{C} such that:

  1.

    \dim C=\dim D-1

  2.

    Each simplex [cell] of \mathcal{C} belongs to at most one pair in V.

The discrete gradient vector field of a discrete Morse function f on \mathcal{C} is the pairing that arises from f as follows:

  1.

    If \alpha is critical, then it is unpaired,

  2.

    Otherwise, if there exists a face \beta of \alpha with f(\beta)\geq f(\alpha), then \alpha is paired with \beta,

  3.

    Otherwise, there is a coface \gamma of \alpha with f(\alpha)\geq f(\gamma), and \alpha is paired with \gamma.

It is clear from Definitions 6 and 7 that this collection of pairs of cells of \mathcal{C} is indeed a discrete vector field.
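The pairing above can be recovered mechanically from function values. Assuming cells are encoded as frozensets of vertices with one value per cell (our illustrative convention), the sketch below scans each cell for the face on which f fails to decrease; this covers cases (2) and (3) at once, since a pair (\alpha,\gamma) from case (3) appears as a wrong-order face of \gamma.

```python
def gradient_vector_field(cells, f):
    """Pairs (C, D) with dim C = dim D - 1: D is paired with its unique
    face C satisfying f(C) >= f(D). Unpaired cells are critical."""
    V = []
    for D in cells:
        wrong = [C for C in cells
                 if C < D and len(C) == len(D) - 1 and f[C] >= f[D]]
        if wrong:  # for a discrete Morse function there is at most one
            V.append((wrong[0], D))
    return V
```

For the single-edge example with values 0, 2 on the vertices and 1 on the edge, the edge is paired with the vertex of value 2, and the vertex of value 0 is left critical.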

Often, discrete Morse functions are represented only by their gradient vector fields, as opposed to giving the function f itself. Therefore, it is valuable to be able to determine when a discrete vector field represents the gradient of a discrete Morse function. The discrete Morse analogue of gradient flow along a discrete vector field is a V-path.

Definition 8 (V-path, [6]).

Given a discrete vector field V on a simplicial [cellular] complex \mathcal{C}, a V-path is a sequence of cells:

C_{0},D_{0},C_{1},D_{1},\ldots,C_{r},D_{r}

such that for each i=0,\dots,r, (C_{i},D_{i})\in V and D_{i}>C_{i+1}\neq C_{i}.

A classical criterion for a smooth vector field to be a gradient is that it lacks circulation. The analogue in the discrete setting to “lacking circulation” is “no non-trivial closed V-paths”. The following theorem therefore indicates when a discrete vector field is the gradient of a discrete Morse function.

Theorem 1 ([5], Thm. 3.5).

A discrete vector field V on a simplicial [cellular] complex is the gradient vector field of a discrete Morse function if and only if there are no non-trivial closed V-paths.
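Theorem 1 suggests a direct acyclicity test. The sketch below (using our encoding of a vector field as a list of (C, D) pairs of frozensets) builds the directed graph whose arrows C_{i}\to C_{i+1} come from the condition D_{i}>C_{i+1}\neq C_{i}, then searches it for a directed cycle by depth-first search.

```python
def has_closed_V_path(V):
    """True iff the discrete vector field V admits a non-trivial closed
    V-path, i.e. V is NOT the gradient of a discrete Morse function."""
    pair = dict(V)                      # C -> D; each cell in at most one pair
    succ = {C: [] for C in pair}
    for C, D in V:
        for C2 in pair:
            if C2 != C and C2 < D and len(C2) == len(D) - 1:
                succ[C].append(C2)      # C, D, C2 extends a V-path
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {C: WHITE for C in pair}
    def dfs(C):                         # detect a directed cycle
        color[C] = GRAY
        for C2 in succ[C]:
            if color[C2] == GRAY or (color[C2] == WHITE and dfs(C2)):
                return True
        color[C] = BLACK
        return False
    return any(color[C] == WHITE and dfs(C) for C in pair)
```

Pairing each vertex of a triangle's boundary with the next edge around produces the classic closed V-path; dropping one pair breaks the cycle.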

The critical cells of a discrete gradient vector field V can be thought of as tracking an analog of the topology of the polyhedral complex’s sublevel sets under a particular function which would induce the discrete vector field, as seen below.

Definition 9 (Perfect DGVF, Relatively Perfect DGVF [6]).

A discrete gradient vector field V on a simplicial [cellular] complex \mathcal{C} is called perfect if the number of critical cells of V of dimension k is equal to the rank of H_{k}(|\mathcal{C}|) for all integers k.

Let f:|\mathcal{C}|\to\mathbb{R} be a piecewise linear function on cells of \mathcal{C}. Let \ell\in\mathrm{im}\,f be a value attained by f on the vertices of \mathcal{C}. For a given value of \ell, define \ell^{\prime} to be the greatest value of f on vertices strictly less than \ell.

A discrete gradient vector field V on \mathcal{C} is called relatively perfect with respect to the function f if m_{i}^{\ell}(V)=\mathrm{rk}\,H_{i}(|\mathcal{C}_{\ell}|,|\mathcal{C}_{\ell^{\prime}}|) for all i and all such \ell, where

\mathcal{C}_{\ell}=\{C\in\mathcal{C}\mid f_{\max}(C)\leq\ell\}

and m_{i}^{\ell}(V) denotes the number of discrete critical i-simplices [cells] in \mathcal{C}_{\ell}\setminus\mathcal{C}_{\ell^{\prime}}.

Ultimately, we hope the existence of a relatively perfect discrete gradient vector field will enable improved computation of topological features of the level and sublevel sets of a neural network F.

Figure 3: Left: A \nabla F-induced orientation on the edges of a polyhedral complex, with PL Morse critical points indicated. There is one index-zero critical point, two index-one critical points, and one index-two critical point. Right: One possible set of critical cells which would make a discrete gradient vector field on the polyhedral complex relatively perfect with respect to F.

2.5 Challenges in relating PL Morse and discrete Morse functions

Most known relationships between PL Morse and discrete Morse constructions are implicit. In [2, 12, 21], we see that the construction of discrete gradient vector fields for cubical complexes from function data is well-studied; this is in part due to the regularity of their local combinatorics. For cubulations of compact regions of \mathbb{R}^{n}, for example, discrete Morse functions are constructed to simplify persistent homology computations, with an implied analog of a smooth or piecewise linear function on the underlying space.

To our knowledge, there has been little work done to construct a discrete gradient vector field from a PL Morse function in a general setting. In such a setting, [16] compares the PL approximations of a scalar function defined on a triangulated surface to a discrete gradient vector field built using a greedy algorithm; they find that under certain regularity conditions, critical cells in the vector field are adjacent to critical vertices in the PL approximation. Likewise, in [6] an algorithm is presented for constructing a discrete gradient vector field which is relatively perfect with respect to a PL Morse function on a simplicial complex which is a combinatorial manifold. However, due to theoretical limitations for algorithms on n-spheres for n\geq 4, the algorithm applies only to simplicial complexes of dimension \leq 3.

As polyhedral complexes have fewer combinatorial restrictions on their structure than simplicial and cubical complexes, a general theory for creating discrete gradient vector fields on an arbitrary polyhedral complex from function values is likely to be similarly intractable.

Fortunately, due to the combinatorial regularity of the canonical polyhedral complex of a ReLU neural network, which we will describe in Section 3, we may follow an approach similar to that in [6] to constructively obtain a relatively perfect discrete gradient vector field when the network has vertices in general position.

3 Background: The canonical polyhedral complex \mathcal{C}(F)

We now discuss the specifics of the canonical polyhedral complex, beginning with its construction, followed by a brief description of the combinatorial characterization which enables its topological properties to be studied.

3.1 Construction of \mathcal{C}(F)

For n\in\mathbb{N}, define the function \textrm{ReLU}:\mathbb{R}^{n}\to\mathbb{R}^{n} as

\textrm{ReLU}(x_{1},\dots,x_{n})=(\max\{0,x_{1}\},\dots,\max\{0,x_{n}\}).
Definition 10 (ReLU Neural Network; [7], Definition 2.1, [18], Definition 3).

Let n_{0},\dots,n_{m}\in\mathbb{N}. A (fully-connected, feed-forward) ReLU neural network with architecture (n_{0},\dots,n_{m},1) is a collection \mathcal{N}=\{A_{i}\} of affine maps A_{i}:\mathbb{R}^{n_{i-1}}\to\mathbb{R}^{n_{i}} for i=1,\dots,m+1, where n_{m+1}=1. Such a collection determines a function F_{\mathcal{N}}:\mathbb{R}^{n_{0}}\to\mathbb{R}, the associated neural network map, given by the composite

\mathbb{R}^{n_{0}}\xrightarrow{F_{1}=\textrm{ReLU}\circ A_{1}}\mathbb{R}^{n_{1}}\xrightarrow{F_{2}=\textrm{ReLU}\circ A_{2}}\dots\xrightarrow{F_{m}=\textrm{ReLU}\circ A_{m}}\mathbb{R}^{n_{m}}\xrightarrow{G=A_{m+1}}\mathbb{R}.

We say that this network has depth m+1 and width \max\{n_{1},\dots,n_{m},1\}. The maps F_{k} are called the kth layer maps. By abuse of notation, we often refer to F_{\mathcal{N}} as simply F.
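Definition 10 translates directly into code: a network is a list of affine maps, with ReLU applied after all but the last. The sketch below uses plain Python lists for weights and biases; the particular layer data in the example is illustrative only.

```python
def affine(W, b, x):
    """The affine map A(x) = Wx + b, with W given as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def network_map(layers, x):
    """F_N = G . F_m . ... . F_1 with F_k = ReLU . A_k and G = A_{m+1};
    layers = [(W_1, b_1), ..., (W_{m+1}, b_{m+1})]."""
    for W, b in layers[:-1]:
        x = [max(0.0, xi) for xi in affine(W, b, x)]  # F_k = ReLU . A_k
    W, b = layers[-1]
    return affine(W, b, x)                            # G, without ReLU
```

For example, the architecture-(1, 2, 1) network with A_{1}(x)=(x,-x) and G(y_{1},y_{2})=y_{1}+y_{2} computes the absolute value function.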

The terms fully-connected and feedforward are machine learning terms which indicate that there are no restrictions on each affine function A_{i} and that there is no expectation of identical layer maps (i.e. no recurrence), respectively. We will omit these terms in the rest of this paper, but keep them in the definition to disambiguate for readers with a machine learning background.

Note that F is a piecewise linear function, and that F defines a polyhedral decomposition of its domain, \mathbb{R}^{n_{0}}. More specifically, we may decompose \mathbb{R}^{n_{0}} into a polyhedral complex by identifying the (maximal) subsets of \mathbb{R}^{n_{0}} on which each F_{i} is affine linear. We call this decomposition the canonical polyhedral complex of F, denoted \mathcal{C}(F). To give a precise definition of this complex, we first introduce notation concerning partial compositions of the layer maps. In particular, the canonical polyhedral complex can be defined using such partial composites.

Definition 11 ([18], Definition 4).

If F=G\circ F_{m}\circ\dots\circ F_{1} is a ReLU neural network with F:\mathbb{R}^{n_{0}}\to\mathbb{R}, then we denote the composition of the first k layers as F_{(k)}; i.e.

F_{(k)}=F_{k}\circ\dots\circ F_{1}.

We refer to F_{(k)} as F ending at the kth layer.

Conversely, we denote the composition of the last m+1-k layers as F^{(k)}; i.e.

F^{(k)}=G\circ F_{m}\circ\dots\circ F_{k}.

We refer to F^{(k)} as F starting at the kth layer.

Clearly, F=F^{(k)}\circ F_{(k-1)}. Furthermore, each F_{(k)} has an associated natural polyhedral decomposition of \mathbb{R}^{n_{0}}. The canonical polyhedral complex will be defined as the common refinement of these iterative polyhedral decompositions. To formalize the polyhedral decomposition induced by F_{(k)}, note that each affine linear map has an associated polyhedral complex.

Definition 12 (Notation R(i),πjR^{(i)},\pi_{j};[7], [18], Definition 4).

Let A_{i}:\mathbb{R}^{n_{i-1}}\to\mathbb{R}^{n_{i}} be an affine function for 1\leq i\leq m. Denote by R^{(i)} the polyhedral complex associated to the hyperplane arrangement in \mathbb{R}^{n_{i-1}} induced by the hyperplanes H_{ij}=\{x\in\mathbb{R}^{n_{i-1}}:\pi_{j}\circ A_{i}(x)=0\}, where \pi_{j} is the projection onto the jth coordinate in \mathbb{R}^{n_{i}}.

For k>1k>1, F(k)F_{(k)} is not affine linear, but instead is piecewise linear. Therefore, the solution sets associated to F(k)F_{(k)} are not necessarily hyperplanes. However, it is still possible to use these solution sets to determine a polyhedral decomposition of the input space, for each kk.

Definition 13 (Node maps and bent hyperplanes, [7], Definitions 8.1, 6.1, [18], Definitions 5, 6).

Given a ReLU neural network FF, the node map Fi,j:n0F_{i,j}:\mathbb{R}^{n_{0}}\to\mathbb{R} is defined by

F_{i,j}=\pi_{j}\circ A_{i}\circ F_{(i-1)}.

A bent hyperplane is the preimage of 0 under a node map, that is, F^{-1}_{i,j}(0) for fixed i,j.

A bent hyperplane is generically a piecewise linear codimension-1 submanifold of the domain (see [7] for more details). It is “bent” in that it is a union of polyhedra, and may not be contractible or even connected. For each F_{(k)}, the associated bent hyperplanes induce a polyhedral decomposition of \mathbb{R}^{n_{0}}, which we denote \mathcal{C}(F_{(k)}). The canonical polyhedral complex can then be defined iteratively, by intersecting the regions of \mathcal{C}(F_{(k)}) with the polyhedral decomposition given at F_{(k-1)}.

Definition 14 (Canonical Polyhedral Complex 𝒞(F)\mathcal{C}(F), [18], Definition 7).

Let F:n0F:\mathbb{R}^{n_{0}}\to\mathbb{R} be a ReLU neural network with mm layers. Define the canonical polyhedral complex of FF, denoted 𝒞(F)\mathcal{C}(F), as follows:

1.

Define \mathcal{C}(F_{(1)}) to be R^{(1)}.

2.

Define \mathcal{C}(F_{(k)}) in terms of \mathcal{C}(F_{(k-1)}) as the polyhedral complex consisting of the following cells:

\mathcal{C}(F_{(k)})=\left\{C\cap F^{-1}_{(k-1)}(R)\ :\ C\in\mathcal{C}(F_{(k-1)}),\ R\in R^{(k)}\right\}

    Then 𝒞(F)\mathcal{C}(F) is given by 𝒞(F(m))\mathcal{C}(F_{(m)}).

The above definition is the “Forward Construction” of \mathcal{C}(F) in [18]. Alternatively, there is a “Backward Construction” which gives the same complex. This definition originally appeared in [7].

For example, in the special case of a neural network F with architecture (2,n,1), the canonical polyhedral complex \mathcal{C}(F) is a decomposition of \mathbb{R}^{2} by n lines which, for a full-measure set of parameters, are in general position. It is immediate that \mathcal{C}(F) contains 2n unbounded edges, 2n unbounded polyhedra of dimension 2, and \binom{n}{2} vertices.
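The vertex count is easy to check numerically. Below is a hedged sketch (illustrative random weights; the names `W`, `b`, `vertices` are our own) recovering the \binom{n}{2} vertices of such an arrangement as pairwise intersections of the n lines:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n = 5
# n random affine functionals on R^2; their zero sets are n generic lines,
# the bent hyperplanes of the single hidden layer.
W, b = rng.standard_normal((n, 2)), rng.standard_normal(n)

# Vertices of C(F): solutions of the 2x2 systems W_i x = -b_i, W_j x = -b_j.
vertices = [np.linalg.solve(W[[i, j]], -b[[i, j]])
            for i, j in combinations(range(n), 2)]

assert len(vertices) == n * (n - 1) // 2   # C(n,2) vertices in general position
```

Counting the 2n unbounded edges and 2n unbounded 2-cells requires tracking the arrangement's face lattice, which we omit here.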

Figure 4: (Left) A portion of a canonical polyhedral complex 𝒞(F)\mathcal{C}(F) given as a bent hyperplane arrangement. The two “bent” hyperplanes which are not lines are given distinct colors. (Right) A plausible F\nabla F orientation on the edges of this 𝒞(F)\mathcal{C}(F), and a plausible level set, marked in red.

3.2 Local characterization of vertices and PL Morse critical points in 𝒞(F)\mathcal{C}(F)

For piecewise linear functions on general polyhedral complexes, it is not algorithmically decidable whether a vertex is PL critical or regular. However, the canonical polyhedral complex has combinatorial properties which make the question of PL criticality algorithmically decidable.

Following [18], under full-measure conditions called supertransversality and genericity, each cell CC of 𝒞(F)\mathcal{C}(F) can be labeled with a sequence in {1,0,1}N\{-1,0,1\}^{N}, where N=i=1mniN=\sum_{i=1}^{m}n_{i}. This construction is not new (for example, it also appears in [14]), but to our knowledge there is no standard reference.

Definition 15 (Sign Sequence, [18]).

The sign s_{ij}(C) is given by the sign of \pi_{j}\circ A_{i}\circ F_{(i-1)}(C), which is well-defined. The collection of all such s_{ij}(C) for a specific cell C is called its sign sequence, and is denoted s(C).
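As a sketch of how a sign sequence might be computed in practice for a top-dimensional cell, here is a minimal numpy illustration (hand-picked weights of our own, not from the paper; it assumes the sample point x is generic, so no preactivation vanishes exactly):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sign_sequence(x, Ws, bs):
    """Sign sequence of the top-dimensional cell of C(F) containing a
    generic point x: the signs of all node maps F_{i,j}(x)."""
    signs = []
    for W, b in zip(Ws, bs):
        pre = W @ x + b                      # A_i ∘ F_{(i-1)}(x)
        signs.extend(np.sign(pre).astype(int))
        x = relu(pre)                        # F_{(i)}(x)
    return tuple(signs)

# Illustrative (2, 3, 2) layers with hand-picked weights.
Ws = [np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),
      np.array([[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]])]
bs = [np.array([0.0, 0.0, -1.0]), np.array([0.5, -0.5])]

print(sign_sequence(np.array([2.0, 0.5]), Ws, bs))   # → (1, 1, 1, 1, -1)
```

Lower-dimensional cells have zero entries where the corresponding node maps vanish identically on the cell; those do not arise at a generic sample point.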

These sign sequences encode the cellular poset of 𝒞(F)\mathcal{C}(F) as follows:

Lemma 4 (Sign sequence properties, [18]).

Let FF be a supertransversal ReLU neural network. The following is true about any two cells CC and DD of 𝒞(F)\mathcal{C}(F):

Define S(C)\cdot S(D) by

(S(C)\cdot S(D))_{ij}=\begin{cases}S(C)_{ij}&\textrm{if }S(C)_{ij}\neq 0\\ S(D)_{ij}&\textrm{else}\end{cases}

Then S(C)\cdot S(D)=S(E), where E is a cell in \mathcal{C}(F). Furthermore:

1.

    CC is a face of EE (Lemma 18)

2.

    CDC\leq D if and only if S(C)S(D)=S(D)S(C)\cdot S(D)=S(D) (Lemma 19)

Finally, the cellular coboundary map in 𝒞(F)\mathcal{C}(F) can be neatly described:

Lemma 5 (Sign sequence, [18], Lem. 21).

Let F be a supertransversal, generic ReLU neural network. Let C be a [polyhedral] cell of \mathcal{C}(F). Then the cells D of which C is a facet are given by the set of cells with sign sequence satisfying s_{ij}(D)=s_{ij}(C) for all i and j except exactly one location, for which s_{ij}(C)=0.

In other words, under supertransversality and genericity conditions each k-cell of \mathcal{C}(F) has exactly n_{0}-k entries of its sign sequence equal to zero, and all incident cells can be identified by replacing each zero entry with \pm 1. Not surprisingly, this identifies the intersection combinatorics of cells in the local lower star of a vertex v with the intersection combinatorics of the coordinate planes in \mathbb{R}^{n_{0}}.
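Both the sign-sequence product and the cofacet description are simple to implement. The following sketch (sign sequences modeled as plain tuples, a convention of our own) illustrates them on a mock vertex with n_0 = 2:

```python
def sign_product(sC, sD):
    """(S(C)·S(D))_{ij}: keep S(C)_{ij} where it is nonzero, else S(D)_{ij}."""
    return tuple(c if c != 0 else d for c, d in zip(sC, sD))

def cofacets(sC):
    """Sign sequences of the cells D having C as a facet (Lemma 5):
    flip exactly one zero entry of s(C) to +1 or to -1."""
    return [sC[:i] + (s,) + sC[i + 1:]
            for i, c in enumerate(sC) if c == 0 for s in (+1, -1)]

# A mock vertex (0-cell) when n_0 = 2: exactly two zero entries.
v = (0, 0, 1)
edges = cofacets(v)                  # its four incident edges
assert len(edges) == 4
# Lemma 4(2): v ≤ e exactly when S(v)·S(e) = S(e).
assert all(sign_product(v, e) == e for e in edges)
```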

If v is a vertex of \mathcal{C}(F), we can thus create a simplicial complex whose underlying set is the union of the local star and local link of v, and which has the combinatorics of a cross-polytope in \mathbb{R}^{n_{0}}. The vertices of this simplicial complex are v together with a point selected from each edge incident to v. For each i from 1 to n_{0}, replace the ith zero entry with a 1 to obtain an edge e_{i}^{+}, then select a point v_{i}^{+} from that edge; likewise select v_{i}^{-} from the edge e_{i}^{-} obtained by replacing the ith zero entry with a -1. The n_{0}-simplices of this simplicial complex are the convex hulls of the sets consisting of exactly one of v_{i}^{+} and v_{i}^{-} for each i, together with v.

Figure 5: The union of the local star and local link of a vertex vv is a cross-polytope.

The arguments in [8] and [19] use this characterization and variants of Definition 5 to show that a vertex v of \mathcal{C}(F) is PL critical if and only if the edges (v_{i}^{-},v) and (v,v_{i}^{+}) have opposite \nabla F-orientations for all i, and if so, the index of the critical point is given by the number of these pairs which are oriented towards v ([19], Theorem 3.7.3). As a result, F is PL Morse if and only if all edges are assigned a \nabla F-orientation.
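This criterion is easy to operationalize. The sketch below (an encoding of our own: for each coordinate i we record whether e_i^- and e_i^+ are \nabla F-oriented toward v) classifies a vertex as PL regular or PL critical with its index:

```python
def pl_morse_type(pairs):
    """Classify a vertex v of C(F) from the ∇F-orientations of its edge
    pairs. pairs[i] = (toward_minus, toward_plus): booleans recording
    whether e_i^- and e_i^+ are oriented toward v.
    Returns ('regular', None) or ('critical', index)."""
    index = 0
    for toward_minus, toward_plus in pairs:
        if toward_minus != toward_plus:
            # F increases monotonically through v along this axis: regular.
            return ('regular', None)
        if toward_minus:
            # Both edges oriented toward v: contributes to the index.
            index += 1
    return ('critical', index)

assert pl_morse_type([(True, False), (True, True)]) == ('regular', None)
assert pl_morse_type([(True, True), (False, False)]) == ('critical', 1)
assert pl_morse_type([(True, True), (True, True)]) == ('critical', 2)  # local max
```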

3.3 Realizability results for PL Morse ReLU neural networks

As an initial example of the usefulness of the combinatorial description given by sign sequences, we explore realizability results for ReLU networks by classifying all possible PL Morse ReLU neural networks with an (n,n+1,1) architecture, up to \nabla F-orientation (Theorem 2). These results are new in this context, and our techniques are illustrative of sign sequence properties which will be used throughout the remainder of the paper. However, the main results of this paper do not rely on the results of this section.

Lemma 6.

If F is a PL Morse, generic ReLU neural network with an (n,n+1,1) architecture, then the \nabla F-orientations on unbounded edges which share a vertex are the same; moreover, any set of \nabla F-orientations on unbounded edges subject to this restriction is realizable.

Proof.

There is a unique generic affine hyperplane arrangement \mathcal{A} with n+1 hyperplanes in \mathbb{R}^{n}, up to affine transformations. It has a single bounded n-cell which is, in fact, an n-simplex, which we will call \Sigma. All unbounded cells in this hyperplane arrangement share a face with this n-simplex.

If FF is PL Morse, then none of the faces of Σ\Sigma is a flat cell. This means none of the nn-dimensional topes of 𝒜\mathcal{A} has the (1,,1)(-1,...,-1) sign sequence.

There are 2^{n+1} possible sign sequences in \{-1,1\}^{n+1}, and 2^{n+1}-1 topes in \mathcal{A}. In particular, the (1,1,\ldots,1) cell is present. Furthermore, it cannot be an unbounded cell, as the opposite unbounded cell would have the (-1,\ldots,-1) sign sequence, so the (1,\ldots,1) cell is \Sigma. Thus, the image F_{1}(\Sigma) is contained in the positive orthant of \mathbb{R}^{n+1}.

Next, if T is an unbounded tope of \mathcal{A} with a single vertex, then its sign sequence has exactly one entry equal to 1, corresponding to the coordinate axis to which its image is restricted. That is, F_{1}(T)\subset\mathrm{span}\{e_{i}\}^{+}, where i is the index in which the sign sequence of T is positive.

All unbounded edges of \mathcal{A} belong to exactly one tope T of this form. An unbounded edge has exactly n zero entries and one nonzero entry, which must be 1, as otherwise the edge would be a face of the all-(-1) region. The tope T may be identified in sign sequence by setting all n zero entries of the edge’s sign sequence to -1.

All edges of TT are unbounded and share the same vertex. A path away from the vertex of TT along any edge of TT maps to a path along eie_{i} away from the origin under F1F_{1}, the first layer of FF. The derivative of the restriction of F1F_{1} along any edge EE of TT pointing away from the vertex of TT is a positive multiple of eie_{i}, which we denote DE(F1)=ceiD_{E}(F_{1})=ce_{i}.

The \nabla F-orientation on an (outward-oriented) edge E of T is given by the sign of D_{E}(F_{1})\cdot\vec{v}, for \vec{v} the (n+1)-dimensional vector giving the linear part of the affine function G:\mathbb{R}^{n+1}\to\mathbb{R}.

As DE(F1)=ceiD_{E}(F_{1})=ce_{i} for all edges EE of TT, for a positive cc, then the F\nabla F-orientation on EE is outward if and only if the iith entry of v\vec{v} is positive.

This shows all edges E of T have the same \nabla F-orientation. Since there is such a tope, call it T_{i}, for each 1\leq i\leq n+1, to induce an orientation on T_{i} set the ith entry of \vec{v} to be positive or negative, as desired. This allows for all possible orientations on the unbounded edges of \mathcal{A}. ∎

Using both Lemma 6 and Lemma 3, we may further exploit properties specific to the polyhedral complex of a network FF with architecture (n,n+1,1)(n,n+1,1) to determine which vertices in 𝒞(F)\mathcal{C}(F) are PL-critical, as shown in the following lemma.

Lemma 7.

Let FF be an (n,n+1,1)(n,n+1,1) generic, PL Morse ReLU neural network. Any critical points of FF are index-0 or index-nn, and there is at most one critical point.

Proof.

Any critical points of F are vertices of \Sigma. Let v be a vertex of \Sigma and suppose it is a critical point. Let T be the unique unbounded tope of \mathcal{A} whose only vertex is v. Note that T has n (unbounded) edges, and the \nabla F-orientations on these edges of T are either all towards v or all away from v, as seen in Lemma 6. As each edge of T is opposite v from an edge of \Sigma, in order for v to be critical, all edges on \Sigma containing v must be oriented in the opposite direction from their paired edges on T [8]. As a result, the edges in \mathcal{A} incident to v are either all oriented towards v or all oriented away from v; that is, v is either index-0 or index-n.

To see that there is at most one critical point, without loss of generality, assume vv is critical of index 0. Then all edges incident to vv are oriented away from vv. If ww is another vertex of Σ\Sigma, then there is an edge ee connecting vv to ww, and of course ee must be oriented away from vv. There is a unique unbounded cell UU in 𝒜\mathcal{A} which contains vv and ww and no other vertices; it also contains ee. The unbounded edges of UU which contain vv must be oriented away from vv. By the no-zigzags lemma 3 the unbounded edges of UU which contain ww must be oriented away from ww (as each unbounded edge of UU containing ww shares a 22-cell with an unbounded edge of UU containing vv). By Lemma 6 all unbounded edges pointing away from ww are oriented away from ww. In particular, the edge opposite ee is oriented away from ww. Because ee is oriented towards ww and its opposite edge is oriented away from ww, we conclude ww is a PL-regular point. ∎

Theorem 2.

Let FF be an (n,n+1,1)(n,n+1,1) generic, PL Morse ReLU neural network. Then the decision boundary of FF is empty, has the homotopy type of a point, or has the homotopy type of an (n1)(n-1)-sphere.

Proof.

By Lemma 7, F has at most one PL critical point, of index 0 or index n. Following [8], we conclude that the topology of the decision boundary must be one of these three options. ∎

We now classify all possible PL Morse (2,3,1)(2,3,1) neural networks.

Corollary 2.

The F\nabla F-orientations depicted in Figure 6 are the only possible F\nabla F-orientations on a generic, supertransversal, PL Morse (2,3,1)(2,3,1) ReLU neural network, up to combinatorial equivalence.

Proof.

Let FF be such a network, and denote as Σ\Sigma the unique bounded 22-cell of 𝒞(F)\mathcal{C}(F). Following Lemma 6, there are 4 possible scenarios for the F\nabla F-orientations on the unbounded edges of 𝒞(F)\mathcal{C}(F):

1.

    all unbounded edges are oriented towards Σ\Sigma (Figure 6(a)),

2.

    all unbounded edges are oriented away from Σ\Sigma (Figure 6(b)),

3.

the unbounded edges of exactly one vertex of \Sigma are oriented towards \Sigma and all other unbounded edges are oriented away from \Sigma (Figure 6(c)), or

4.

the unbounded edges of exactly one vertex of \Sigma are oriented away from \Sigma and all other unbounded edges are oriented towards \Sigma (Figure 6(d)).

If not all unbounded edges have the same orientation with respect to \Sigma, then Lemma 3 determines the orientation of two of the three edges of \Sigma. Moreover, the orientations of these bounded edges ensure that none of the vertices of \Sigma can be PL-critical.

If all unbounded edges have the same orientation with respect to \Sigma, then Lemma 1 ensures that the \nabla F-orientations on the edges of \Sigma do not generate a cycle. Therefore, there must be a vertex v of \Sigma for which the \nabla F-orientation of each bounded edge adjacent to v is towards v, and a vertex w of \Sigma for which the \nabla F-orientation of each bounded edge adjacent to w is away from w.

If all unbounded edges are oriented towards \Sigma, then v is a PL-critical vertex of index 2, and all other vertices are PL-regular.

If all unbounded edges are oriented away from \Sigma, then w is a PL-critical vertex of index 0, and all other vertices are PL-regular. ∎

(a) (2,3,1) neural network with PL-critical vertex of index 2 marked in red.
(b) (2,3,1) neural network with PL-critical vertex of index 0 marked in red.
(c) (2,3,1) neural network with PL-regular vertices; the unmarked edge can have either possible orientation.
(d) (2,3,1) neural network with PL-regular vertices; the unmarked edge can have either possible orientation.
Figure 6: All possible \nabla F-orientations for a generic, supertransversal, PL Morse (2,3,1) neural network.

Given a canonical polyhedral complex (arising from a ReLU neural network), not all functions on that canonical polyhedral complex are realizable as ReLU neural network functions. From [8] we see that a ReLU neural network of the form (n,m,1) generally has at most one n-cell on which it is constant, for example, but other limitations exist as well.

Figure 7: A \nabla F-orientation on the canonical polyhedral complex consisting of a generic hyperplane arrangement of 3 hyperplanes in \mathbb{R}^{2} which is realizable as a PL function, but not as a ReLU neural network with architecture (2,3,1). Selecting distinct values on v_{1},\ldots,v_{6} determines a \nabla F-orientation on the pink polyhedral complex.

In fact, Figure 7 depicts a F\nabla F orientation on the same 3 hyperplanes which is realizable if FF is a PL function, but not if FF is a ReLU neural network of architecture (2,3,1)(2,3,1). That it cannot be a ReLU network follows immediately from Corollary 2. That it is realizable can be seen by assigning F(vi)=iF(v_{i})=i and observing that FF on each simplex in the highlighted simplicial complex determines the F\nabla F orientation on the edges of the polyhedral cell which contains it. This results in the F\nabla F orientations pictured.

4 Relatively Perfect Discrete Gradient Vector Fields for ReLU Networks

Because our setup allows us to identify PL critical vertices, we further leverage this information to constructively establish the existence of a discrete gradient vector field on the cells of 𝒞(F)\mathcal{C}(F) which are bounded above in FF. Moreover, this discrete gradient vector field has the property that critical cells are in bijection with PL critical vertices in a way which respects function values; this is a technical property introduced in [6] called relative perfectness (Definition 9). Ideally, the existence and algorithmic constructability of this discrete gradient vector field will enable faster computational measurements of the topology of ReLU neural networks’ decision regions.

In this section, we follow a similar construction to that in [6], where they establish an algorithm for finding a discrete gradient vector field which is relatively perfect to any given PL Morse function on a simplicial combinatorial manifold with dimension d3d\leq 3. In a similar vein, we construct a discrete gradient vector field by considering the lower stars of individual vertices of 𝒞(F)\mathcal{C}(F).

Some of the key differences in our algorithm are that (a) we are not dimensionally restricted, (b) 𝒞(F)\mathcal{C}(F) is not generally a simplicial complex, and (c) we do not rely on nonconstructive existence theorems to assign local gradient vector fields. Instead, we exploit specific combinatorics of 𝒞(F)\mathcal{C}(F) (as given in Section 3.2) to constructively produce the desired local pairings. To our knowledge, constructions establishing discrete gradient vector fields on polyhedral complexes with associated PL Morse functions are relatively unexplored.

4.1 Discrete Morse theory on unbounded polyhedral complexes

Before we introduce our construction, we must justify why it is reasonable to use the tools of discrete Morse theory on the polyhedral complex \mathcal{C}(F), which is not formally a cellular complex due to the presence of unbounded cells. In fact, we will construct a discrete gradient vector field on an associated CW-complex \mathcal{C}(F)^{-}_{*} with no unbounded cells.

Definition 16 (𝒞(F),𝒞(F)\mathcal{C}(F)^{-},\mathcal{C}(F)^{-}_{*}).

The Complete Lower Star Complex relative to FF is the subcomplex of 𝒞(F)\mathcal{C}(F) containing all cells which are bounded above, i.e.

\mathcal{C}(F)^{-}:=\{C\in\mathcal{C}(F)\ |\ \exists\ r\in\mathbb{R}\textrm{ such that }C\subset F^{-1}((-\infty,r])\}.

The one-point compactification of \mathcal{C}(F)^{-}, which we call the one-point compactified complete lower star complex relative to F, is denoted \mathcal{C}(F)^{-}_{*}. The distinguished point \{*\} is formally assigned the function value F(*)=-\infty.

Remark 3.

\mathcal{C}(F)^{-} is indeed a subcomplex of \mathcal{C}(F), since if C is a polyhedron satisfying F(C)\leq r, then the same is true of all of its faces. Furthermore, even though \mathcal{C}(F)^{-} contains unbounded cells, \mathcal{C}(F)^{-}_{*} is a regular CW-complex.

In Theorem 3 we will identify a discrete Morse function on \mathcal{C}(F)^{-}_{*} with a single connected component appearing at -\infty. This new discrete Morse function will be relatively perfect to the PL function on the subset of S^{n_{0}} given by \mathcal{C}(F)^{-}_{*}. That this construction does not capture gradient information on cells whose image under F is not bounded above is not a large problem: the homotopy type of the sublevel sets of \mathcal{C}(F) only changes at F(v), for v a vertex of \mathcal{C}(F).

4.2 Networks which are injective on vertices

We start our construction locally, by showing that for a given vertex v in \mathcal{C}(F), it is always possible to generate pairings in the local lower star of v which reflect the PL-criticality of v.

Lemma 8.

Let FF be a fully-connected, feedforward ReLU neural network which is injective on vertices of 𝒞(F)\mathcal{C}(F). Then for each vertex vv in 𝒞(F)\mathcal{C}(F), there is a pairing in the local lower star of vv relative to FF satisfying exactly one of the two following conditions:

  1. 1.

    If vv is a vertex of 𝒞(F)\mathcal{C}(F) that is PL regular, then there exists a choice VV of complete acyclic pairing of cells in the local lower star of vv relative to FF.

  2. 2.

    If vv is a vertex of 𝒞(F)\mathcal{C}(F) that is PL critical of index kk, then there exists a choice VV of acyclic pairings of cells in the local lower star of vv which leaves exactly one kk-cell unpaired.

Furthermore, these pairings can be constructed algorithmically.

Proof.

As F is injective on vertices, all edges have a \nabla F-orientation and F is PL Morse. For any vertex v\in\mathcal{C}(F), following Section 3.2 there exists a compact set L\subset\mathrm{star}(v) for which \mathrm{star}_{L}(v) is an n_{0}-dimensional cross-polytope, and the cells of this cross-polytope are in one-to-one correspondence with the cells in \mathrm{star}(v). Through this correspondence, a pairing on the cells of \mathrm{star}_{L}(v) may be used to induce a pairing on the cells of \mathrm{star}(v). This correspondence restricts to a correspondence between \mathrm{star}_{L}^{-}(v) and \mathrm{star}^{-}(v).

If v is regular (Figure 8). Let v be a PL regular point. Then there is at least one pair of edges (e_{i}^{+},e_{i}^{-}) for which e_{i}^{+}\notin\mathrm{star}_{L}^{-}(v) and e_{i}^{-}\in\mathrm{star}_{L}^{-}(v), or vice versa (due to [19], Theorem 3.7.3). Assume without loss of generality that it is e_{i}^{-} which lies in \mathrm{star}_{L}^{-}(v) (as local relabeling does not change the combinatorics).

Figure 8: Left: A pairing on the lower star of a PL regular point. Right: A pairing on the lower star of an index-1 PL critical point.

Let v^{-}_{i} denote the bounding vertex of e^{-}_{i} in \mathrm{star}_{L}(v) which is not v. We may view \mathrm{star}_{L}^{-}(v) as the cone of v^{-}_{i} over T(v)\cap\mathrm{star}_{L}^{-}(v), where T(v) is the (n_{0}-1)-dimensional cross-polytope obtained when the antipodal edges e^{+}_{i} and e^{-}_{i} are removed from \mathrm{star}_{L}(v): i.e.,

T(v):=\{C\in\mathrm{star}_{L}(v)\ |\ e^{+}_{i},e^{-}_{i}\notin C\}.

We may obtain a discrete vector field V on \mathrm{star}_{L}^{-}(v) by pairing each d-dimensional cell \sigma\in T(v)\cap\mathrm{star}_{L}^{-}(v) with the corresponding (d+1)-dimensional cell v^{-}_{i}\sigma. This gives a complete pairing of all cells in \mathrm{star}_{L}^{-}(v).

To see that V is acyclic, note that, by construction, any path within V contains exactly one pair. For any choice of (C_{0},D_{0}) as an initial pair in a path, we have D_{0}=v^{-}_{i}C_{0}. Any choice of C_{1}<D_{0} with C_{1}\neq C_{0} has the property that C_{1} contains v^{-}_{i}, so that C_{1} is paired with a codimension-one face in V. This ensures that any path terminates after one pair.

Figure 9: The algorithm constructing the pairing on starL(v)L\mathrm{star}^{-}_{L}(v)\cup L for an index-22 critical vertex vv. This pairing restricts to cells in star(v)\mathrm{star}^{-}(v), leaving only one two-cell unpaired.

If v is critical (Figure 9). Now suppose v is critical of index k. Let L be a local lower link of v. Then \mathrm{star}^{-}_{L}(v) has the combinatorics of the coordinate axes of \mathbb{R}^{k}, and is the cone on L with apex v. Together, \mathrm{star}^{-}_{L}(v)\cup L forms a k-dimensional simplicial complex, call it \mathcal{N}, whose underlying set is a k-dimensional cross-polytope.

Selection step. For each coordinate direction 1\leq i\leq k, there is a pair of edges e_{i}^{+} and e_{i}^{-} in \mathrm{star}^{-}(v) whose sign sequences differ by opposite signs in a single entry. The choice of labeling of e_{i}^{+} determines e_{i}^{-}. These edges intersect \mathrm{link}^{-}(v) on opposite sides of v in vertices v_{i}^{+},v_{i}^{-}. For each 1\leq i\leq k, let v_{i} be a vertex selected from \{v_{i}^{+},v_{i}^{-}\} (distinguishing a single quadrant of \mathrm{star}^{-}(v)). Observe that each v_{i} has an opposite vertex, given by the other vertex in this set, which we will call v_{i}^{op}. Once the coordinate directions have been ordered and a positive direction chosen in each, no other choices need to be made.

Pairing construction algorithm. We then recursively assign the following pairings on \mathcal{N}. For each 1\leq i\leq k in turn, and for each C in \mathrm{link}(v_{i}) (relative to \mathcal{N}), add the pairing (C,Cv_{i}) if C has not yet been paired. These pairings restrict to a pairing on \mathrm{star}^{-}(v).

This operation constructs a discrete vector field; namely, it does not attempt to pair any cell twice. Observe that at step i, if C in \mathrm{star}^{-}(v) is in \mathrm{link}_{\mathcal{N}}(v_{i}) and C has not yet been paired, then we claim Cv_{i} has also not yet been paired. This is because each step adds cells which comprise the union of the star and the link of a single vertex in \mathcal{N}, a simplicial complex; this union is always a simplicial complex. If a cell C has not been added to this union, then no cell having C as a face could have been added to the union either. Thus each cell C is included in at most one pair of cells in the pairing, as needed.

All cells in \mathrm{star}^{-}_{L}(v) get paired except one, of dimension k. By construction, the union of the cells of the resulting pairing is the union of the stars and links of the vertices v_{1},\ldots,v_{k} in \mathcal{N}. Since \mathrm{star}_{\mathcal{N}}(v_{i}^{+})\cup\mathrm{link}_{\mathcal{N}}(v_{i}^{+}) is precisely the set of cells in \mathcal{N} which do not contain v_{i}^{-} (and vice versa), we see that the only cell in \mathrm{star}^{-}_{L}(v) which is not paired is the unique cell which contains none of the v_{i} and is not in the stars of any of the v_{i}. This is the interior of the simplex given by \{v\}\cup\{v_{i}^{op}\}_{1\leq i\leq k}, which is a k-simplex, as desired.

The resulting construction is acyclic. Let (C1,D1)(C_{1},D_{1}) be a pair in VV arising from this algorithm. Then by construction D1=C1viD_{1}=C_{1}v_{i} for some ii satisfying 1ik1\leq i\leq k.

Now we consider the possibilities for the next pair in the VV-path, (C2,D2)(C_{2},D_{2}). We know C2C_{2} must be a codimension 1 element of the boundary of D1D_{1} that is not C1C_{1}. It also must be paired with a higher-dimensional coface.

Let C be a candidate for C_{2}, that is, an arbitrary codimension-1 element of the boundary of D_{1} that is not C_{1}. We observe that, as a boundary element of D_{1} which is not C_{1},

C=Bv_{i}

where BB is a codimension-one boundary element of C1C_{1}.

Consider the step at which CC was added to the pairing.

If C had not yet been added to the pairing by step i, then C contains v_{j}^{op} for all j<i (as otherwise C would be in the star or link of v_{j}). Thus B also contains v_{j}^{op} for all j<i, and so B must not have been paired by step i either. But then (B,Bv_{i}=C) is a V-pair, and C is added to the pairing at step i as the larger cell of its pair. Thus C is not the first element of a pairing and cannot be C_{2} in a V-path.

We conclude that if (C_{1},D_{1}),(C_{2},D_{2}) is a V-path within the local lower star of v, then C_{2} was paired at a strictly earlier step than C_{1}. As a result, V contains no cycles. ∎

Remark 4.

Observe that the only choice made was in the Selection Step. To make the Selection Step deterministic and dependent on the values of FF, we observe that viv_{i} can “morally” be selected to be on those edges whose opposing vertex has the lowest value in 𝒞(F)\mathcal{C}(F), or if there is no opposing vertex, whose unbounded edge has the steepest directional derivative. However, the same result follows regardless of whether we made the “moral” choice. In fact, as we discuss in Section 5, it is potentially more computationally convenient to be amoral (in this sense).
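The Selection Step and pairing construction above can be sketched combinatorially. In the following illustration (an encoding of our own: cells of \mathcal{N} are modeled as frozensets of the vertices v and (i,\pm 1), and we always select v_i = (i,+1)), the algorithm pairs every cell of the lower star except the single k-simplex \{v\}\cup\{v_{i}^{op}\}:

```python
from itertools import combinations

def cross_polytope_cone(k):
    """Cells of N = star^-_L(v) ∪ L for an index-k critical vertex v:
    simplices on {v, (1,±1), ..., (k,±1)} with no antipodal pair."""
    pts = [(i, s) for i in range(1, k + 1) for s in (+1, -1)]
    link = [frozenset(S) for r in range(1, k + 1)
            for S in combinations(pts, r)
            if len({i for i, _ in S}) == r]      # distinct coordinates only
    star = [c | {'v'} for c in link] + [frozenset({'v'})]
    return link, star

def lower_star_pairing(k):
    """Pairing construction of Lemma 8: at step i, pair each unpaired cell
    C in the link of v_i = (i,+1) with the cone cell C ∪ {v_i}."""
    link, star = cross_polytope_cone(k)
    cells = set(link) | set(star)
    paired, pairs = set(), []
    for i in range(1, k + 1):
        vi = (i, +1)
        for C in sorted(cells, key=lambda c: (len(c), sorted(map(str, c)))):
            D = C | {vi}
            if vi not in C and D in cells and C not in paired and D not in paired:
                pairs.append((C, D))
                paired |= {C, D}
    unpaired_in_star = [c for c in star if c not in paired]
    return pairs, unpaired_in_star

pairs, unpaired = lower_star_pairing(3)
# The only unpaired lower-star cell is the k-simplex {v, v_1^op, ..., v_k^op}.
assert unpaired == [frozenset({'v', (1, -1), (2, -1), (3, -1)})]
```

Cells of the link L itself may remain unpaired; only the restriction to the lower star matters for the statement of Lemma 8.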

We now are able to “stitch together” the local pairings on the lower stars of the vertices for a global discrete vector field which satisfies the desired properties, following a similar approach as in [6].

Theorem 3.

Let v be a vertex in \mathcal{C}(F) and let V(v) be an acyclic discrete vector field on \mathrm{star}^{-}(v) obtained by the construction in Lemma 8. Then V=\bigcup_{v\in\mathcal{C}(F)}V(v) is a relatively perfect discrete gradient vector field to F on \mathcal{C}(F)^{-}_{*}, with \{*\} a critical 0-cell.

Proof.

The lower stars of each of the vertices of 𝒞(F)\mathcal{C}(F) are disjoint, and the union of all the lower stars of all the vertices of 𝒞(F)\mathcal{C}(F) is 𝒞(F)\mathcal{C}(F)^{-}.

VV is a valid discrete gradient vector field. Suppose we have a VV-path (C1,D1),(C2,D2),,(Cn,Dn)(C_{1},D_{1}),(C_{2},D_{2}),...,(C_{n},D_{n}). If this path consists entirely of cells in the lower star of some fixed vertex vv then by Lemma 8 it is acyclic.

Otherwise, some pair (C_{i},D_{i}),(C_{i+1},D_{i+1}) satisfies the condition that C_{i} is in the lower star of a different vertex than C_{i+1}; call these v_{i} and v_{i+1}, respectively. (Observe we can make this statement because each V-pair is contained in the lower star of the same vertex, by construction.) As D_{i} is also in the lower star of v_{i}, F(D_{i}) is bounded above by F(v_{i}). In particular, as C_{i+1} is a face of D_{i}, F(C_{i+1}) must also be bounded above by F(v_{i}). Since C_{i+1} is not in the lower star of v_{i} (and is in the lower star of v_{i+1}), we conclude that F(v_{i+1}) is strictly less than F(v_{i}). As a result, no V-path can return to the lower star of a vertex once it has left it.

By construction V has exactly one critical k-cell for each vertex v which is a PL critical point of F with index k, and a critical 0-cell \{*\} with F-value -\infty. Each critical k-cell has maximal value given by F(v) for its corresponding vertex v. As the vertex v is an index-k PL critical point, H_{k}(\mathcal{C}(F)^{F(v)},\mathcal{C}(F)^{F(v)-\epsilon}) has rank 1 ([8]); this is precisely what is needed for V to be relatively perfect to F (Definition 9). ∎

In summary, the results of this section demonstrate a constructive algorithm for producing a relatively perfect discrete gradient vector field to FF on the cells of 𝒞(F)\mathcal{C}(F)^{-}.

5 Computational considerations

Computational implementation of the algorithm in Theorem 3 would rely upon identifying whether a vertex in 𝒞(F)\mathcal{C}(F) is PL Morse, which involves computing the sign of the gradient on each edge incident to that vertex. Gradients are a local computation, but until now, identifying edges and vertices of this polyhedral complex computationally relied upon an algorithm which globally computes the location and sign sequence of all vertices of 𝒞(F)\mathcal{C}(F). This is inefficient if, for example, we wish to follow the local gradient flow and identify candidate critical cells locally. In this section, we discuss algorithms which may be used to compute the F\nabla F-orientation of edges of 𝒞(F)\mathcal{C}(F) locally. This computation allows us to construct the pairing from Theorem 3 locally to a given vertex, including determining whether this vertex is critical.

5.1 Partial Derivatives along Edges

Here we develop an analytic description of the gradient of FF restricted to a cell CC of 𝒞(F)\mathcal{C}(F), determined by the sign sequence of CC (Lemma 9). This gradient may then be used to explicitly compute a vector in the direction vE\overrightarrow{vE}, that is, from a vertex to an incident edge for any vertex-edge pair in CC (Lemma 10). By multiplication, we may locally obtain the F\nabla F-orientation on EE (Corollary 3).

Lemma 9.

Let FF be a supertransversal neural network given by

F=GFmF1F=G\circ F_{m}\circ...\circ F_{1}

Let CC be any top-dimensional cell of 𝒞(F)\mathcal{C}(F). Then,

F(i)|C=k=1iReLU(Δk(C))WkandF|C=WGk=1mReLU(Δk(C))WkF_{(i)}^{\prime}|_{C}=\prod_{k=1}^{i}\textrm{ReLU}(\Delta_{k}(C))W_{k}\quad\textrm{and}\quad\nabla F|_{C}=W_{G}\prod_{k=1}^{m}\textrm{ReLU}(\Delta_{k}(C))W_{k}

where WkW_{k}, WGW_{G} are the linear weight matrices of AkA_{k} and GG, respectively, and Δk(C)\Delta_{k}(C) is the diagonal nk×nkn_{k}\times n_{k} matrix whose jjth diagonal entry is skj(C)s_{kj}(C), where jj ranges from 11 to nkn_{k}.

Proof.

By construction F|CF\big{|}_{C} is affine, and in fact each intermediate F(i)|CF_{(i)}|_{C} is also affine. Recall that each layer map FiF_{i} is given by

Fi=ReLUAiF_{i}=\textrm{ReLU}\circ A_{i}

where AiA_{i} is an affine function and ReLU is the max{0,x}\max\{0,x\} function applied coordinatewise.

By definition, sij(v)=0s_{ij}(v)=0 if and only if πjAiF(i1)(v)=0\pi_{j}\circ A_{i}\circ F_{(i-1)}(v)=0. This is an affine map. If Aix=Wix+biA_{i}\vec{x}=W_{i}\vec{x}+b_{i} for a weight matrix WiW_{i} and bias vector bib_{i}, then we note that Fi|F(i1)(C)F_{i}\big{|}_{F_{(i-1)}(C)} is given by

Fi|F(i1)(C)(x)=ReLU(Δi(C))(Wix+bi)F_{i}\big{|}_{F_{(i-1)}(C)}(\vec{x})=\textrm{ReLU}(\Delta_{i}(C))\left(W_{i}\vec{x}+\vec{b}_{i}\right)

where Δi(C)\Delta_{i}(C) is the diagonal ni×nin_{i}\times n_{i} matrix whose jjth diagonal entry is sij(C)s_{ij}(C), where jj ranges from 11 to nin_{i} (one for each output entry of FiF_{i}). The resulting map is affine. By composing these layer maps, we can express F(i)|CF_{(i)}|_{C} as

F(i)|C(x)=(k=1iReLU(Δk(C))Wk)x+bi(C)F_{(i)}|_{C}(\vec{x})=\left(\prod_{k=1}^{i}\textrm{ReLU}(\Delta_{k}(C))W_{k}\right)\vec{x}+\vec{b_{i}}(C)

where bi(C)\vec{b_{i}}(C) is determined by expanding the composite matrix multiplication (and will ultimately not matter). Consequently,

F(i)|C=(k=1iReLU(Δk(C))Wk)F_{(i)}^{\prime}|_{C}=\left(\prod_{k=1}^{i}\textrm{ReLU}(\Delta_{k}(C))W_{k}\right)

Now that we have an equation for F(i)|CF_{(i)}^{\prime}|_{C}, we may take i=mi=m (the last layer of FF) to obtain the total gradient of FF, given by F(m)|CF_{(m)}^{\prime}|_{C} followed by WGW_{G}:

F|C=WGk=1mReLU(Δk(C))Wk\nabla F|_{C}=W_{G}\prod_{k=1}^{m}\textrm{ReLU}(\Delta_{k}(C))W_{k}

where WGW_{G} is the weight matrix of the final affine function G:nmG:\mathbb{R}^{n_{m}}\to\mathbb{R}, as desired.
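As a sketch of how Lemma 9 might be used computationally, the following illustrates computing F|C\nabla F|_{C} from a cell's sign sequence. The list-of-matrices interface and all names are our assumptions; the paper fixes no API.

```python
import numpy as np

def gradient_on_cell(layer_weights, w_G, layer_signs):
    """Illustrative computation of nabla F|_C following Lemma 9.

    layer_weights : list of the weight matrices W_1, ..., W_m.
    w_G           : 1 x n_m weight matrix of the final affine map G.
    layer_signs   : list of sign vectors s_k(C) in {-1, +1}^{n_k}, one per layer.
    """
    J = np.eye(layer_weights[0].shape[1])  # identity on the input space R^{n_0}
    for W_k, s_k in zip(layer_weights, layer_signs):
        # ReLU(Delta_k(C)) is the diagonal 0/1 matrix zeroing out inactive neurons
        D = np.diag(np.maximum(np.asarray(s_k, dtype=float), 0.0))
        J = D @ W_k @ J  # later layers compose on the left
    return w_G @ J       # 1 x n_0 gradient of F restricted to C
```

On a cell where every neuron is active, this reduces to the ordinary product WGWmW1W_{G}W_{m}\cdots W_{1}.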

We can obtain the gradient of FF along edges of 𝒞(F)\mathcal{C}(F) by first identifying the direction of vE\vec{vE}, which involves solving an n0×n0n_{0}\times n_{0} system of equations.

Lemma 10.

Let FF be a supertransversal, generic neural network and let vv be a vertex of 𝒞(F)\mathcal{C}(F). Let EE be an edge of 𝒞(F)\mathcal{C}(F) incident to vv. Let s(v),s(E)s(v),s(E) be the sign sequences of vv and EE, respectively. Let vE\overrightarrow{vE} denote the positive ray of vectors spanned by a vector beginning at vv and ending at xx, a point in EE.

Let (v)={(ik,jk)}k=1n0\mathcal{I}(v)=\{(i_{k},j_{k})\}_{k=1}^{n_{0}}, consisting of the (i,j)(i,j) tuples for which sij(v)=0s_{ij}(v)=0. Distinguish as (i,j)(i_{*},j_{*}) the unique (i,j)(i,j) coordinate for which si,j(E)si,j(v)s_{i_{*},j_{*}}(E)\neq s_{i_{*},j_{*}}(v).

Denote by 𝒲(v,C)\mathcal{W}(v,C) the (n0×n0)(n_{0}\times n_{0}) matrix whose kkth row is given by

[𝒲(v,C)]k,=[F(ik)|C]jk,where(ik,jk)(v)[\mathcal{W}(v,C)]_{k,\cdot}=[F_{(i_{k})}^{\prime}|_{C}]_{j_{k},\cdot}\quad\textrm{where}\quad(i_{k},j_{k})\in\mathcal{I}(v)

Then

vE=csij(E)𝒲(v,C)1ej\overrightarrow{vE}=c\cdot s_{i_{*}j_{*}}(E)\cdot\mathcal{W}(v,C)^{-1}e_{j_{*}}

where cc is any positive scalar and eje_{j_{*}} is the standard basis vector in n0\mathbb{R}^{n_{0}} with a 11 in the entry indexed by the position of (i,j)(i_{*},j_{*}) in (v)\mathcal{I}(v) (that is, the row of 𝒲(v,C)\mathcal{W}(v,C) corresponding to (i,j)(i_{*},j_{*})) and 0 elsewhere.

Proof.

We begin by finding a system of equations that vv satisfies, determined by its sign sequence.

Letting s(v)s(v) be the sign sequence of vv, let CC be the cell containing vv and EE whose sign sequence is obtained from s(E)s(E) by replacing each zero entry with +1+1. By Lemma 4, CC contains EE (and vv).

Since vCv\in C, we can express the location of vv in n0\mathbb{R}^{n_{0}} as the solution of the system of n0n_{0} equations obtained from the jjth rows of the iith intermediate layer maps, one for each (i,j)(i,j) pair with sij(v)=0s_{ij}(v)=0:

[F(i)|Cv+bi(C)]j=0[F_{(i)}^{\prime}|_{C}\vec{v}+\vec{b}_{i}(C)]_{j}=0 (1)

Simultaneously, along every point xx in EE, we have (assuming without loss of generality that sij(E)=1s_{i_{*}j_{*}}(E)=1) that

[F(i)|Cx+bi(C)]j>0[F_{(i_{*})}^{\prime}|_{C}x+\vec{b}_{i_{*}}(C)]_{j_{*}}>0

with all other equations in Equation 1 satisfied exactly.

Now consider the direction xv\vec{x}-\vec{v}, that is, the direction vE\overrightarrow{vE}. In each of the (i,j)(i,j) pairs for which sij(v)=0s_{ij}(v)=0, we have:

[F(i)|C(xv)]j\displaystyle[F_{(i)}^{\prime}|_{C}(\vec{x}-\vec{v})]_{j} =[F(i)|Cx+bi(C)]j[F(i)|Cv+bi(C)]j\displaystyle=[F_{(i)}^{\prime}|_{C}\vec{x}+\vec{b}_{i}(C)]_{j}-[F_{(i)}^{\prime}|_{C}\vec{v}+\vec{b}_{i}(C)]_{j}
={cwith sign(c)=sij(E) when (i,j)=(i,j)0otherwise\displaystyle=\begin{cases}c&\textrm{with }sign(c)=s_{i_{*}j_{*}}(E)\textrm{ when }(i,j)=(i_{*},j_{*})\\ 0&\textrm{otherwise}\end{cases}

Selecting any value for cc with the appropriate sign gives an n0×n0n_{0}\times n_{0} system of equations whose solution is a vector in the same direction as xv\vec{x}-\vec{v}, that is, along vE\overrightarrow{vE}. ∎
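The linear solve at the heart of Lemma 10 might be sketched as follows; the interface (passing 𝒲(v,C)\mathcal{W}(v,C) as a prebuilt matrix, and the flipped pair by its row index) is our assumption.

```python
import numpy as np

def edge_direction(W_vC, star_row, star_sign):
    """Illustrative solve for the direction v -> E of Lemma 10.

    W_vC      : the n_0 x n_0 matrix W(v, C); its k-th row is the j_k-th row
                of F'_(i_k)|_C for the k-th pair (i_k, j_k) in I(v).
    star_row  : position of (i_*, j_*) within I(v), i.e., the row of W(v, C)
                whose corresponding sign flips along E.
    star_sign : s_{i_* j_*}(E), either +1 or -1.
    """
    rhs = np.zeros(W_vC.shape[0])
    rhs[star_row] = star_sign          # take c = 1; any positive scalar works
    return np.linalg.solve(W_vC, rhs)  # a vector pointing from v into E
```

For example, with 𝒲(v,C)\mathcal{W}(v,C) diagonal, the returned direction is a rescaled signed basis vector, as expected.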

Corollary 3.

Let FF be a supertransversal neural network and vv be a vertex of 𝒞(F)\mathcal{C}(F). Let EE be an edge incident to vv. Let s(v),s(E)s(v),s(E) be the sign sequences of vv and EE, respectively. Denote by vEF\partial_{vE}F the partial derivative of FF in the direction vE\overrightarrow{vE}.

Then,

sign(vEF)=sign(F|CvE)\operatorname{sign}\left(\partial_{vE}F\right)=\operatorname{sign}\left(\nabla F\big{|}_{C}\overrightarrow{vE}\right)

where CC is the top-dimensional cell containing vv and EE constructed as in Lemma 10, and (i,j)(i_{*},j_{*}) is the layer and neuron, respectively, for which sij(v)sij(E)s_{i*j*}(v)\neq s_{i*j*}(E).

Proof.

Multiply.

Remark 5.

Observe that obtaining the sign of vEF\partial_{vE}F involves a total of m+1m+1 matrix multiplications (with coordinatewise ReLU masking), storing n0n_{0} rows of the intermediates to obtain a system of equations to solve for vE\overrightarrow{vE}. This process is comparable in complexity to evaluating FF at a point as long as n0n_{0} is relatively low.
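The pieces of this subsection can be checked end-to-end on a toy example. The two-neuron network below is hypothetical (all numbers are ours, not the paper's); the sign computed from F|CvE\nabla F|_{C}\cdot\overrightarrow{vE} as in Corollary 3 agrees with a direct evaluation of FF along the edge.

```python
import numpy as np

# Toy check of Corollary 3: F(x) = ReLU(x_1) - 2 ReLU(x_2), with v the
# origin, C the cell where both neurons are active (the positive quadrant),
# and E the edge from v along the positive x_2-axis.
W1 = np.eye(2)
w_G = np.array([1.0, -2.0])
F = lambda x: w_G @ np.maximum(W1 @ x, 0.0)

grad_on_C = w_G @ np.diag([1.0, 1.0]) @ W1  # Lemma 9: both signs are +1 on C
v_to_E = np.array([0.0, 1.0])               # a vector in the direction vE
analytic_sign = np.sign(grad_on_C @ v_to_E)

# Direct evaluation of F a small step along the edge gives the same sign:
numeric_sign = np.sign(F(1e-3 * v_to_E) - F(np.zeros(2)))
```

Here both computations report that FF decreases along EE, so this edge would be a descending edge at vv.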

5.2 Following Gradient Flow

Here we describe an algorithm which can be used to locally compute a discrete gradient vector field at a single vertex for a fully-connected, feedforward ReLU neural network FF on 𝒞(F)\mathcal{C}(F) without having computed all cells of 𝒞(F)\mathcal{C}(F).

In other words, suppose we have a cell C𝒞(F)C\in\mathcal{C}(F) with known sign sequence S(C)S(C), and we wish to identify a pairing for CC in a vector field VV compatible with FF while relying on a minimal number of evaluations of FF at individual points or of vEF\partial_{vE}F along individual edges (as both computations are of similar complexity).

Knowing S(C)S(C) and the weights of FF gives an explicit set of linear inequalities which bound CC. The maximal (or minimal) value of FF (or any affine function) on CC can be identified via linear programming. In practice, applying a solver which uses the simplex algorithm [15] will quickly identify not only the vertex vCv\in C where the maximum (or minimum) of FF on CC is obtained, but also the n0n_{0} equations giving its precise location. Setting the signs of S(C)S(C) in the entries corresponding to those n0n_{0} equations to zero identifies the sign sequence S(v)S(v).
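For illustration only, this optimization step can be mimicked by brute-force vertex enumeration on a small cell; in practice a simplex-based LP solver [15] replaces this loop and reports the active equations directly. The inequality description of CC below is an assumed input, derived in practice from S(C)S(C) and the weights of FF.

```python
import itertools
import numpy as np

def max_vertex_on_cell(grad, A, b):
    """Find the vertex of C = {x : A @ x <= b} maximizing the affine F|_C
    with gradient `grad`, together with the indices of the n_0 inequalities
    active there (the entries of S(C) to set to zero to obtain S(v)).
    Brute-force sketch; a simplex solver does this far more efficiently.
    """
    n0 = A.shape[1]
    best_v, best_val, best_rows = None, -np.inf, None
    for rows in itertools.combinations(range(A.shape[0]), n0):
        sub = A[list(rows)]
        if abs(np.linalg.det(sub)) < 1e-12:
            continue                      # these equations do not meet in a point
        x = np.linalg.solve(sub, b[list(rows)])
        if np.all(A @ x <= b + 1e-9) and grad @ x > best_val:
            best_v, best_val, best_rows = x, grad @ x, rows
    return best_v, best_rows
```

For instance, maximizing over a square cell returns its top-right corner along with the two tight inequalities locating that vertex.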

The zero entries of S(v)S(v) have an order determined by the existing parameter order of FF; denote by sij(v)s_{ij}(v) the sign of vv at the iith layer and jjth neuron (the jjth coordinate direction in ni\mathbb{R}^{n_{i}}). Ordering the entries of s(v)s(v) lexicographically induces an ordering Γ\Gamma on the edges EE incident to vv: order first by which entry of S(E)S(E) differs from that of S(v)S(v), and second by whether that entry is negative or positive.

In order to apply Lemma 8 to vv we must evaluate the directional derivative of FF along each of the 2n02n_{0} edges incident to vv whose sign sequences can be constructed. This can be done analytically via Corollary 3. If vv is regular, then take ee_{*} to be the first edge (with respect to the ordering Γ\Gamma) for which F|e\partial F\big{|}_{e_{*}} is negative but F|eop\partial F\big{|}_{e_{*}^{op}} is positive, where eope_{*}^{op} denotes the edge opposite ee_{*} across vv; denote its sole additional nonzero entry the iji_{*}j_{*}-th entry.

Then if sij(C)=0s_{i_{*}j_{*}}(C)=0, we have that CC is paired with its coface DD obtained by replacing S(C)S(C) in index iji_{*}j_{*} with sij(e)s_{i_{*}j_{*}}(e_{*}). In the case where sij(C)=sij(e)s_{i_{*}j_{*}}(C)=s_{i_{*}j_{*}}(e_{*}) the cell CC is paired with its face DD obtained by replacing sij(C)s_{i_{*}j_{*}}(C) with sij(D)=0s_{i_{*}j_{*}}(D)=0.

If vv is critical of index kk, then take the critical kk-cell to be the cell obtained by replacing the kk zero entries of s(v)s(v) that identify the cells in star(v)\mathrm{star}^{-}(v) by 1-1. This gives an ordering and labeling of the edges eie_{i}^{-} for 1ik1\leq i\leq k via the restriction of Γ\Gamma to these edges. This completes the Selection Step, and the algorithm may proceed.

In this way, in order to identify a pairing to which CC belongs, we only need to 1) optimize FF on CC, then 2) evaluate 2n02n_{0} directional derivatives of FF. The pairings are, in this sense, determined not by the value of FF on neighbors of vv, but instead by the choice of ordering of the coordinates in each layer’s ni\mathbb{R}^{n_{i}} and the relative orientations of their corresponding bent hyperplanes at vv.
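The pairing rule of the preceding paragraphs at a regular vertex might be sketched as follows; representing sign sequences as dictionaries keyed by (i,j)(i,j), and the tuple layout of the edge data, are interface choices of ours, not the paper's.

```python
def pair_cell_locally(s_C, edge_data):
    """Sketch of the local pairing step of Section 5.2 at a regular vertex v.

    s_C       : dict (i, j) -> sign of the cell C (0 on its zero entries).
    edge_data : list, in the order Gamma, of tuples
                ((i, j), s_E, dF_fwd, dF_back): the flipped entry of the
                edge E, its sign on E, and the derivatives of F along E
                and along the opposite edge, respectively.
    Returns the sign sequence of the cell paired with C, or None.
    """
    for (i, j), s_E, d_fwd, d_back in edge_data:
        if d_fwd < 0 and d_back > 0:      # e_*: first descending edge in Gamma
            paired = dict(s_C)
            if s_C[(i, j)] == 0:
                paired[(i, j)] = s_E      # C is paired with its coface D
            elif s_C[(i, j)] == s_E:
                paired[(i, j)] = 0        # C is paired with its face D
            else:
                return None               # neither case applies to this C
            return paired
    return None                           # no descending edge: v is critical
```

For a cell with a zero entry matching the descending edge's flipped coordinate, this returns the coface obtained by copying that edge's sign into the zero entry.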

6 Conclusion

In this work, we introduce a schematic for translating between a given piecewise linear Morse function on a canonical polyhedral complex and a compatible (“relatively perfect”) discrete Morse function. Our approach is constructive, producing an algorithm that can be used to determine if a given vertex in a canonical polyhedral complex corresponds to a piecewise linear Morse critical point, and furthermore an algorithm for constructing a consistent pairing on cells in the canonical polyhedral complex which contain this vertex. However, though we discuss the principles necessary, we leave explicit computational implementation and experimental observations for future work.

As discussed in [8], not all ReLU neural networks are piecewise linear Morse, and this is a limitation of our work. Neural networks with “flat” cells (on which FF is constant) are not addressed by our algorithm. This work also defines homological tools which can be used to describe the local change in sublevel set topology at a subcomplex of flat cells; however, extensive technical work is needed to provide a direct analog between the cellular topology of 𝒞(F)\mathcal{C}(F) and the relevant sublevel set topology. We also leave this to future work.

We have reason to believe that our proposed algorithm is applicable to any setting in which the star neighborhoods of the vertices of a PL manifold with the structure of a polyhedral complex are locally combinatorially equivalent to a cross-polytope. The only broad class of functions we are aware of that satisfies these conditions is that of ReLU neural networks and their close variants (for example, leaky ReLU networks, or piecewise linear neural networks with activation functions that have several nonlinearities).

We intend that this work be used to develop further theoretical and computational tools for analyzing neural network functions from topological perspectives.

7 Acknowledgements

Many thanks to Eli Grigsby, without whom we may have never put our heads together.

References

  • [1] Randall Balestriero, Romain Cosentino, Behnaam Aazhang, and Richard Baraniuk. The geometry of deep networks: Power diagram subdivision. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
  • [2] Brian Brost, Jesper Michael Møller, and Pawel Winter. Computing Persistent Homology via Discrete Morse Theory. Technical report, 2013.
  • [3] Ekin Ergen and Moritz Grillo. Topological expressivity of ReLU neural networks, 2024.
  • [4] Robin Forman. Morse theory for cell complexes. Advances in Mathematics, 134(1):90–145, 1998.
  • [5] Robin Forman. A user’s guide to discrete Morse theory. Sém. Lothar. Combin., 48:Art. B48c, 35, 2002.
  • [6] Ulderico Fugacci, Claudia Landi, and Hanife Varlı. Critical sets of PL and discrete Morse theory: A correspondence. Computers & Graphics, 90:43–50, 2020.
  • [7] J. Elisenda Grigsby and Kathryn Lindsey. On transversality of bent hyperplane arrangements and the topological expressiveness of ReLU neural networks. SIAM Journal on Applied Algebra and Geometry, 6(2):216–242, 2022.
  • [8] J. Elisenda Grigsby, Kathryn Lindsey, and Marissa Masden. Local and global topological complexity measures of ReLU neural network functions. Preprint arXiv:2204.06062, 2022.
  • [9] Romain Grunert. Piecewise Linear Morse Theory. PhD thesis, 2017.
  • [10] William H. Guss and Ruslan Salakhutdinov. On characterizing the capacity of neural networks using algebraic topology. CoRR, abs/1802.04443, 2018.
  • [11] Boris Hanin and David Rolnick. Deep ReLU networks have surprisingly few activation patterns. In Advances in Neural Information Processing Systems (NeurIPS), 2019.
  • [12] Shaun Harker, Konstantin Mischaikow, and Kelly Spendlove. Morse Theoretic Templates for High Dimensional Homology Computation. May 2021.
  • [13] Patricia Hersh. On optimizing discrete Morse functions. Advances in Applied Mathematics, 35(3):294–322, 2005.
  • [14] Matt Jordan, Justin Lewis, and Alexandros G Dimakis. Provable certificates for adversarial examples: Fitting a ball in the union of polytopes. Advances in neural information processing systems, 32, 2019.
  • [15] Howard Karloff. The Simplex Algorithm, pages 23–47. Birkhäuser Boston, Boston, MA, 1991.
  • [16] Thomas Lewiner. Critical sets in discrete Morse theories: Relating Forman and piecewise-linear approaches. Computer Aided Geometric Design, 30(6):609–621, 2013. Foundations of Topological Analysis.
  • [17] Thomas Lewiner, Hélio Lopes, and Geovan Tavares. Toward optimality in discrete Morse theory. Experimental Mathematics, 12, 2003.
  • [18] Marissa Masden. Algorithmic determination of the combinatorial structure of the linear regions of ReLU neural networks. Preprint arXiv:2207.07696, 2022.
  • [19] Marissa Masden. Accessing the Topological Properties of Neural Network Functions. PhD thesis, University of Oregon, 2023.
  • [20] John Willard Milnor. Morse theory. Number 51. Princeton university press, 1963.
  • [21] Vanessa Robins, Peter John Wood, and Adrian P. Sheppard. Theory and algorithms for constructing discrete Morse complexes from grayscale digital images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1646–1658, 2011.
  • [22] C P Rourke and B J Sanderson. Introduction to piecewise-linear topology. Springer Study Edition. Springer, Berlin, Germany, January 1982.