
Betti Numbers of Prodsimplicial Complexes for Directed Graphs with Applications to Word Reductions

Lina Fajardo Gómez, Margherita Maria Ferrari, Nataša Jonoska, Masahico Saito
Abstract

We propose custom made cell complexes, in particular prodsimplicial complexes, in order to analyze data consisting of directed graphs. These are constructed by attaching cells that are products of simplices and are suited to study data of acyclic directed graphs, called here consistently directed graphs. We investigate possible values of the first and second Betti numbers and the types of cycles that generate nontrivial homology. We apply these tools to directed graphs associated with reductions of double occurrence words, words that are associated with DNA recombination processes in certain species of ciliates. We study the effects of word operations on the homology for these graphs.

1 Introduction

Topological Data Analysis (TDA) has been extensively used in recent years in biology and other sciences. It tries to capture the underlying structure of a given data set through properties such as connectedness, circular loops and higher dimensional holes. Topological invariants can be used to detect such “topological signatures” of the space. Typically, a given data set to analyze is a set of points in a Euclidean space, which is a priori discrete. To capture topological shapes of such data sets, simplicial complexes are formed based on the proximity of data points.

In some biological phenomena, such as DNA recombination processes, data sets are represented by graphs, which consist of vertices and edges. For example, in [16], gene interaction patterns are represented by graphs where vertices represent genes and edges represent how two genes interact, such as intersecting each other. In this paper, we present a novel graph-based model for genome rearrangement in some ciliate species. Gene assembly pathways can be represented by subword pattern deletions in double occurrence words (DOWs), words where each symbol appears exactly twice [19]. The iterated subword deletions modeling the recombination patterns can be represented by a graph, called a word graph, whose vertices are DOWs connected by a directed edge if one word can be obtained from the other through a pattern deletion. Thus, methods for performing TDA on such directed graph data are of interest.

Another shortcoming of common TDA is that its construction of complexes is often based solely on simplicial ones. In our model of word graphs for gene assembly, it seems natural to fill in squares as well as triangles to study topological aspects of the biological process. Given such a specific biological situation, we are inspired to use more general, custom built cell complexes to study topological properties of data sets. In this paper, for the purpose of specifically applying it to our word graph model, we propose the use of a prodsimplicial complex for topological studies of digraphs.

Thus, our focus of the paper is twofold: (1) defining specific cell complexes, called prodsimplicial complexes, constructed from directed graphs by attaching products of simplices for topological analysis of graph outputs, and (2) applying prodsimplicial complex homology to study the complexity of gene assembly processes through word graphs that model DNA rearrangement in certain species of ciliates.

The paper is organized as follows. In Section 2 we define prodsimplicial cell complexes for directed graphs. In Section 3 we describe some nontrivial generators of homology groups and construct digraphs with arbitrarily large Betti numbers. Section 4 introduces DOWs, their reduction pathways and word graphs, while Section 5 studies the effect of word operations on word graphs and their associated topological invariants.

2 Prodsimplicial Homology for Directed Graphs

Topological data analysis is used to determine the underlying shape of the space containing a data set by studying the space’s invariants. For example, the $n$-dimensional Betti number, denoted by $\beta_{n}$, is a topological invariant corresponding to the number of “holes” of dimension $n$. Instead of using only simplices as cells, we define a prodsimplicial complex for our goal of studying DNA recombination. In this section, we recall basic concepts from graph theory and introduce the notation that is used throughout the rest of this discussion.

2.1 Directed Graphs with Source and Target

A directed graph (digraph for short) $G$ is a 4-tuple $(V,E,\iota,\tau)$ where $V$ is a finite set, $E\subseteq (V\times V)\setminus\{(x,x)\ |\ x\in V\}$ and $\iota,\tau$ are maps from $E$ to $V$. Elements of $V$ and $E$ are called vertices and edges, respectively, and we use $V(G)$ and $E(G)$ to denote the sets of vertices and edges of a given digraph $G$. The map $\iota$ indicates the source (also called initial) vertex of each edge, and similarly $\tau$ indicates the target (also called terminal) vertex. Directly from the definition, it follows that the graphs considered have no loops or parallel edges. An edge $e$ with source $u$ and target $v$ is denoted by $e=[u,v]$.¹ A source of $G$ is a vertex $s$ such that $\tau(e)\neq s$ for every edge $e\in E$. Similarly, a target of $G$ is a vertex $t$ such that $\iota(e)\neq t$ for every edge $e\in E$.

¹ To distinguish between ordered pairs in the Cartesian product of two sets and directed edges, we use $[u,v]$ to denote the ordered pair $(u,v)$ when it corresponds to an edge, so that $[(u_{1},v_{1}),(u_{2},v_{2})]$ denotes the directed edge $(u_{1},v_{1})\rightarrow(u_{2},v_{2})$ between vertices $(u_{1},v_{1})$ and $(u_{2},v_{2})$.

For ease of notation, we refer to a digraph as $G=(V,E)$, omitting the maps $\iota$ and $\tau$ whenever clear from the context. We refer the reader to [11] for other elementary definitions in graph theory not explicitly recalled here.

Definition 2.1 (weakly directed, consistently directed).

A connected digraph $G$ is said to be weakly directed if it has a unique source $s$ and a unique target $t$. Moreover, if $G$ is weakly directed and it has no (directed) cycles, then $G$ is called consistently directed. In a weakly directed digraph $G$ we use $s(G)$ and $t(G)$ to denote the unique source and target vertices, respectively.
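Definition 2.1 can be checked mechanically. The following is a minimal sketch, assuming a digraph is given as a list of vertices and a list of `(source, target)` pairs; the function names are illustrative and not part of the paper. Connectivity is taken in the underlying undirected graph, and acyclicity is tested with Kahn's algorithm.

```python
from collections import defaultdict, deque


def sources_and_targets(vertices, edges):
    """Vertices with no incoming edge (sources) and no outgoing edge (targets)."""
    has_in = {v for (_, v) in edges}
    has_out = {u for (u, _) in edges}
    sources = [v for v in vertices if v not in has_in]
    targets = [v for v in vertices if v not in has_out]
    return sources, targets


def is_connected(vertices, edges):
    """Connectivity of the underlying undirected graph (breadth-first search)."""
    vertices = list(vertices)
    if not vertices:
        return False
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = {vertices[0]}
    queue = deque([vertices[0]])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return len(seen) == len(vertices)


def is_acyclic(vertices, edges):
    """Kahn's algorithm: acyclic iff every vertex can be removed in turn."""
    indeg = {v: 0 for v in vertices}
    out = defaultdict(list)
    for u, v in edges:
        indeg[v] += 1
        out[u].append(v)
    queue = deque(v for v in vertices if indeg[v] == 0)
    removed = 0
    while queue:
        x = queue.popleft()
        removed += 1
        for y in out[x]:
            indeg[y] -= 1
            if indeg[y] == 0:
                queue.append(y)
    return removed == len(indeg)


def is_weakly_directed(vertices, edges):
    s, t = sources_and_targets(vertices, edges)
    return is_connected(vertices, edges) and len(s) == 1 and len(t) == 1


def is_consistently_directed(vertices, edges):
    return is_weakly_directed(vertices, edges) and is_acyclic(vertices, edges)
```

For instance, a hypothetical digraph with edges `s→a, a→b, b→c, c→a, b→t` has a single source and a single target but contains the directed cycle `a→b→c→a`, so it is weakly directed without being consistently directed.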

Example 2.2.

Figure 1 illustrates a digraph that is weakly directed but not consistently directed: it has a single source (shown in green) and a single target (in red), but it contains a directed cycle.

Figure 1: A weakly directed digraph that is not consistently directed. The source is colored with green, while the target is in red.

We use the definition of Cartesian product as in [24].

Definition 2.3 (Cartesian product, prime graph).

Let $G_{1}=(V_{1},E_{1})$ and $G_{2}=(V_{2},E_{2})$ be digraphs. The Cartesian product of $G_{1}$ and $G_{2}$ is the digraph $G=G_{1}\square G_{2}=(V,E)$ where $V=V_{1}\times V_{2}$ and $[(u_{1},v_{1}),(u_{2},v_{2})]\in E$ if either $u_{1}=u_{2}$ and $[v_{1},v_{2}]\in E_{2}$, or $v_{1}=v_{2}$ and $[u_{1},u_{2}]\in E_{1}$.

A digraph $G$ is said to be prime with respect to the Cartesian product if $G=G_{1}\square G_{2}$ implies that either $G_{1}$ or $G_{2}$ is the trivial graph on one vertex, denoted $\Delta^{0}$. It is shown in [13] that the factorization of a connected directed graph into prime factors is unique with respect to the Cartesian product.

Note that $G_{1}\square G_{2}\cong G_{2}\square G_{1}$, as $[(u_{1},v_{1}),(u_{2},v_{2})]\in E(G_{1}\square G_{2})$ if and only if $[(v_{1},u_{1}),(v_{2},u_{2})]\in E(G_{2}\square G_{1})$, so that the Cartesian product is commutative up to isomorphism. Moreover, as proved in [18], it is also associative. As a consequence, when dealing with the Cartesian product of more than two graphs we write $G=((\ldots((G_{1}\square G_{2})\square G_{3})\ldots)\square G_{k})$ as $G=G_{1}\square G_{2}\square\cdots\square G_{k}$, without nested parentheses. Similarly, we write the elements of $V(G)=\{((v_{1},\ldots,v_{k-1}),v_{k})\ |\ v_{i}\in V(G_{i})\text{ for }1\leq i\leq k\}$ as $(v_{1},\ldots,v_{k})$, without nested parentheses.

The Cartesian product $G_{1}\square G_{2}$ creates one copy (referred to as a $G_{1}$-layer in [18]) of $G_{1}$ for every vertex in $G_{2}$, and vice versa. It can be shown that the Cartesian product $G_{1}\square G_{2}$ is consistently directed if and only if both factors are consistently directed [12]. By associativity, this result extends to Cartesian products of several factors.
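The edge rule of Definition 2.3 translates directly into code. The sketch below assumes a digraph is a pair `(vertices, edges)` with edges stored as `(source, target)` tuples; the function name and the two small example graphs are hypothetical.

```python
from itertools import product


def cartesian_product(g1, g2):
    """Cartesian product of two digraphs given as (vertices, edges) pairs."""
    (v1, e1), (v2, e2) = g1, g2
    vertices = list(product(v1, v2))
    edges = []
    # u1 = u2 and [v1, v2] in E2: one copy of G2 for every vertex of G1.
    for u in v1:
        for (a, b) in e2:
            edges.append(((u, a), (u, b)))
    # v1 = v2 and [u1, u2] in E1: one copy of G1 for every vertex of G2.
    for (a, b) in e1:
        for w in v2:
            edges.append(((a, w), (b, w)))
    return vertices, edges


path = (["a", "b", "c"], [("a", "b"), ("b", "c")])  # directed path of length 2
edge = (["x", "y"], [("x", "y")])                   # single directed edge

grid_v, grid_e = cartesian_product(path, edge)
print(len(grid_v), len(grid_e))  # 6 7
```

The edge count follows $|E| = |V_1||E_2| + |E_1||V_2| = 3\cdot 1 + 2\cdot 2 = 7$, and the product has the single source `("a", "x")` and the single target `("c", "y")`, as expected for a product of consistently directed factors.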

2.2 Prodsimplicial Complexes for Directed Graphs

Vertices and edges in a graph can be considered as 0-dimensional and 1-dimensional cells, respectively. A graph can therefore be endowed with a richer structure by attaching higher dimensional cells at instances of particular subgraphs.

In this section we adapt to directed graphs the main definitions of prodsimplicial complexes introduced in [20]. Other complexes, such as $p$-path complexes, have also been considered in [15, 14]. Our motivation comes from a biological process where two paths of length 2 connecting two vertices correspond to two independent pathways of DNA recombination. For such path pairs, we attach solid square faces so that the two pathways are regarded as equivalent.

Definition 2.4 (simplicial digraph).

The $n$-dimensional simplicial digraph, denoted by $\Delta^{n}$, is the digraph with vertices $V(\Delta^{n})=\{v_{0},v_{1},\ldots,v_{n}\}$ and edges $E(\Delta^{n})=\{[v_{i},v_{j}]\ |\ 0\leq i<j\leq n\}$.

Note that the source and target of the edges of $\Delta^{n}$ are induced by the total order on the set of vertices $V(\Delta^{n})=\{v_{0},v_{1},\ldots,v_{n}\}$. It follows that $\Delta^{n}$ is consistently directed, with source $v_{0}$ and target $v_{n}$. Moreover, the total order on the vertices guarantees that all subgraphs of $\Delta^{n}$ induced by a subset of $V(\Delta^{n})$ are also simplicial digraphs. Also, any two simplicial digraphs on the same number of vertices are isomorphic as directed graphs. In general, $\Delta^{n}$ can be obtained from $\Delta^{n-1}$ by adding the new vertex $v_{n}$ along with the edges $[v_{i},v_{n}]$ for $i=0,\ldots,n-1$.

In other works, simplicial digraphs are referred to as directed cliques [23, 21] or transitive tournaments [8]. Simplicial digraphs are prime with respect to the Cartesian product. Thanks to the unique prime decomposition in [13], prodsimplicial cells as described below are well defined.

Definition 2.5 (prodsimplicial cell).

An $N$-dimensional prodsimplicial cell $P$ is the $N$-cell that is a product of simplices $\prod_{i=1}^{k}\Delta_{i}^{n_{i}}=\Delta_{1}^{n_{1}}\times\cdots\times\Delta_{k}^{n_{k}}$ where $n_{i}>0$ for all $1\leq i\leq k$ and $N=\sum_{i=1}^{k}n_{i}$. Its 1-skeleton is the Cartesian product of simplicial digraphs, that is, a graph of the form

\[
\operatorname*{\square}_{i=1}^{k}\Delta_{i}^{n_{i}}=\Delta_{1}^{n_{1}}\square\cdots\square\Delta_{k}^{n_{k}}.
\]

For brevity, we call $P$ a prodsimplicial $N$-cell. When $k=1$ we call $P$ an $N$-simplex and denote $P$ by $\Delta^{N}$.

Since the Cartesian product of graphs is commutative up to isomorphism, we assume that in a prodsimplicial cell $P=\operatorname*{\square}_{i=1}^{k}\Delta^{n_{i}}_{i}$ the order of the factors is such that $n_{1}\geq n_{2}\geq\cdots\geq n_{k}$. From this point on, a simplex will refer exclusively to a prodsimplicial cell with a single simplicial digraph factor unless otherwise specified. Given the correspondence between simplicial digraph factors and simplices, we abuse the notation and write $P=\operatorname*{\square}_{i=1}^{k}\Delta^{n_{i}}_{i}$ instead of $\prod_{i=1}^{k}\Delta^{n_{i}}_{i}$. Also note that the indices are necessary because the dimensions of the factors in the definition of $P$ may not be distinct. However, we omit the subscripts for brevity where it causes no confusion.

We can naturally define the geometric realization of a simplicial digraph as for standard simplices. From this geometric realization, prodsimplicial $N$-cells can be viewed as subsets of $\mathbb{R}^{N}$ that inherit the product topology.

Example 2.6.

Prodsimplicial cells include not only simplices, but also cubes (as iterated products of $\Delta^{1}$) and prisms (as products of a cell with $\Delta^{1}$, for example the triangular prism $\Delta^{2}\square\Delta^{1}$). Figure 2 depicts the 1-skeletons of all prodsimplicial $3$-cells.

Figure 2: Directed graphs corresponding to the 1-skeletons of the prodsimplicial $3$-cells $\Delta^{2}\square\Delta^{1}$, $\Delta^{1}\square\Delta^{1}\square\Delta^{1}$ and $\Delta^{3}$.
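The 1-skeleton of a prodsimplicial cell can also be generated directly from the tuple of factor dimensions. In the sketch below (the function name and the encoding of vertices as integer tuples are assumptions made for illustration), the vertex $v_{j}$ of the $i$th factor is encoded by the integer $j$ in coordinate $i$, and an edge raises exactly one coordinate, which combines Definitions 2.3 and 2.4.

```python
from itertools import product


def prodsimplicial_skeleton(dims):
    """1-skeleton of Delta^{n_1} x ... x Delta^{n_k} for dims = (n_1, ..., n_k)."""
    vertices = list(product(*[range(n + 1) for n in dims]))
    edges = []
    for v in vertices:
        for i, n in enumerate(dims):
            # Within factor i, [v_a, v_b] is an edge exactly when a < b.
            for j in range(v[i] + 1, n + 1):
                w = v[:i] + (j,) + v[i + 1:]
                edges.append((v, w))
    return vertices, edges


for dims in [(3,), (2, 1), (1, 1, 1)]:
    verts, edges = prodsimplicial_skeleton(dims)
    print(dims, len(verts), len(edges))
# (3,) 4 6
# (2, 1) 6 9
# (1, 1, 1) 8 12
```

The three rows correspond to the tetrahedron, the triangular prism and the cube. In this encoding the source is the all-zero tuple and the target is `dims` itself.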

The notion of prodsimplicial cell was first introduced for direct products of undirected graphs by Kozlov in [20]. We choose Cartesian products to obtain a complex that combines some features of both simplicial and cubical complexes while the cells remain consistently directed. Complexes comprised of prodsimplicial cells are also used in knot theory [6, 5].

Definition 2.7 (prodsimplicial cell orientation).

Let $\Delta^{n}$ be a simplex with vertex set $V(\Delta^{n})=\{v_{0},v_{1},\ldots,v_{n}\}$ and $s(\Delta^{n})=v_{0}$. Let $N(s(\Delta^{n}))$ be the set of neighbors of the source, $V(\Delta^{n})\setminus \{s(\Delta^{n})\}$. An orientation of $\Delta^{n}$ is an equivalence class (by even permutations) of orders of $N(s(\Delta^{n}))$. An oriented simplex, denoted $[\Delta^{n}]$, is a simplex along with an orientation.

Let $P=\operatorname*{\square}_{i=1}^{k}\Delta^{n_{i}}$ be a prodsimplicial cell and let $s(P)$ be the source of $P$. Let $v_{i,0}=s(\Delta^{n_{i}})$ and $N_{i}=\{(v_{1,0},v_{2,0},\ldots,v_{i,j},\ldots,v_{k,0})\ |\ v_{i,j}\in N(s(\Delta^{n_{i}})),\ 1\leq j\leq n_{i}\}$. Given a total order on each $N_{i}$, a total order on $N_{1},\ldots,N_{k}$ determines a total order of $N(s(P))$. An orientation of $P$ is an equivalence class (by even permutations) of such orders of $N(s(P))$. A prodsimplicial cell $P$ along with an orientation is an oriented prodsimplicial cell and is denoted by $[P]$.

Example 2.8 (oriented prodsimplicial cells).

Let $\Delta^{2}$ be the 2-simplex with $V(\Delta^{2})=\{a,b,c\}$ and $\Delta^{1}$ be the 1-simplex with $V(\Delta^{1})=\{x,y\}$. Let $P=\Delta^{2}\square\Delta^{1}$. Then $s(P)=(a,x)$, $N_{1}=\{(b,x),(c,x)\}$ and $N_{2}=\{(a,y)\}$. The order $N_{1},N_{2}$ of the sets of neighbors induces the order $((b,x),(c,x),(a,y))$ on $N(s(P))$, while $N_{2},N_{1}$ induces the order $((a,y),(b,x),(c,x))$. If $N_{1}$ is ordered such that $(c,x)$ is before $(b,x)$, then the order $N_{2},N_{1}$ induces the order $((a,y),(c,x),(b,x))$. The first two orders give the prodsimplicial cell the same orientation, while the third order of the vertices defines an oppositely oriented cell.
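The claim in Example 2.8 amounts to comparing permutation parities. A minimal sketch, with a hypothetical helper `permutation_sign` that counts inversions relative to a reference order:

```python
def permutation_sign(reference, order):
    """Sign (+1 or -1) of the permutation taking `reference` to `order`."""
    position = {v: i for i, v in enumerate(reference)}
    perm = [position[v] for v in order]
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


first = [("b", "x"), ("c", "x"), ("a", "y")]   # order N1, N2
second = [("a", "y"), ("b", "x"), ("c", "x")]  # order N2, N1
third = [("a", "y"), ("c", "x"), ("b", "x")]   # N2, N1 with N1 reversed

print(permutation_sign(first, second))  # 1: same orientation
print(permutation_sign(first, third))   # -1: opposite orientation
```

The second order differs from the first by a cyclic shift of three elements, which is an even permutation, while the third differs from the second by a single transposition.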

Definition 2.9 (faces, facets, boundary set).

We say $P^{\prime}$ is a face of an $N$-dimensional prodsimplicial cell $P=\operatorname*{\square}_{i=1}^{k}\Delta^{n_{i}}$ if there exist $n_{i}^{\prime}\leq n_{i}$ for all $i=1,\ldots,k$ such that $P^{\prime}=\operatorname*{\square}_{i=1}^{k}\Delta^{n_{i}^{\prime}}$. The facets of $P$ are its $(N-1)$-dimensional faces. The boundary set of $P$, denoted by $\partial P$, is the collection of all its facets.

Definition 2.10 (prodsimplicial cell complex).

Given a digraph $G=(V,E)$, we can inductively construct a prodsimplicial cell complex $\Gamma$ associated with $G$, denoted by $\Gamma(G)$, using the following gluing process:

  • Let $\Gamma^{(0)}$ be the collection of all the vertices of $G$.

  • Let $\Gamma^{(1)}=\Gamma^{(0)}\cup E$. That is, add to $\Gamma^{(0)}$ all the edges of $G$ by attaching to the current complex all the prodsimplicial $1$-cells whose facets are vertices of $G$.

  • Let $\Gamma^{(N-1)}$ denote the complex created in the first $N-1$ steps. Let $P$ be a prodsimplicial $N$-cell. We add $P$ to $\Gamma^{(N)}$ if there exists a map $\phi:\partial P\rightarrow\Gamma^{(N-1)}$ such that $\partial P$ is homeomorphic to $\phi(\partial P)$, $\phi$ preserves the orientation of each edge, and the restriction of $\phi$ to each facet of $P$ is also a homeomorphism.

Example 2.11 (prodsimplicial cell complex).

Figure 3 shows the construction of a prodsimplicial cell complex $\Gamma^{(3)}$ with cells of dimension at most 3 including seven squares and a cube (with its six faces counted among the squares). The digraph $G$ corresponds to $\Gamma^{(1)}$, obtained by collecting vertices and edges of $G$.

Figure 3: Faces of dimension 0 are shown in (a), of dimension 1 in (b), 2 in (c) and 3 in (d) as they are added to the complex.

2.3 Prodsimplicial Homology Groups for Directed Graphs

In this section we describe a boundary operator for prodsimplicial cells, which allows us to compute homology groups for prodsimplicial cell complexes. Although a chain complex structure is defined on a prodsimplicial complex as a special case of CW-complexes, we present an explicit boundary operator on prodsimplicial cells using the product rule for computational purposes.

Given a prodsimplicial cell complex $\Gamma$, a prodsimplicial $N$-chain group on $\Gamma$, denoted $C_{N}(\Gamma)$, is the free abelian group generated by oriented $N$-dimensional prodsimplicial cells of $\Gamma$. Its typical element, a prodsimplicial $N$-chain, is a finite formal linear combination of $N$-dimensional prodsimplicial cells with integer coefficients. In the Cartesian product of a simplex and a chain, the product distributes over the sum; that is,

\[
\Delta^{n_{0}}\square\left(\sum_{i=1}^{k}\Delta^{n_{i}}\right)=\sum_{i=1}^{k}(\Delta^{n_{0}}\square\Delta^{n_{i}}).
\]

Recall that the boundary operator for simplices is defined by

\[
\partial_{n}(\Delta^{n})=\sum_{i=0}^{n}(-1)^{i}[v_{0},v_{1},\ldots,v_{i-1},\hat{v_{i}},v_{i+1},\ldots,v_{n}],
\]

where $\hat{v_{i}}$ indicates that vertex $v_{i}$ has been deleted from the simplex. For brevity, we use $[\hat{v_{i}}]$ to denote the simplex $[v_{0},v_{1},\ldots,v_{i-1},\hat{v_{i}},v_{i+1},\ldots,v_{n}]$ in computations.

Definition 2.12 (boundary operator).

Let $P=\operatorname*{\square}_{i=1}^{k}\Delta^{n_{i}}$ be a prodsimplicial $N$-cell, where $N=\sum_{i=1}^{k}n_{i}$. For $1\leq i\leq k$, we use the notation

\[
\langle\overline{\partial_{n_{i}}\Delta^{n_{i}}}\rangle=\Delta^{n_{1}}\square\Delta^{n_{2}}\square\cdots\square\Delta^{n_{i-1}}\square\partial_{n_{i}}\left(\Delta^{n_{i}}\right)\square\Delta^{n_{i+1}}\square\cdots\square\Delta^{n_{k}}.
\]

For $1\leq i<j\leq k$, let

\[
\langle\overline{\partial_{n_{i}}\Delta^{n_{i}}\partial_{n_{j}}\Delta^{n_{j}}}\rangle=\Delta^{n_{1}}\square\cdots\square\partial_{n_{i}}\left(\Delta^{n_{i}}\right)\square\cdots\square\partial_{n_{j}}\left(\Delta^{n_{j}}\right)\square\cdots\square\Delta^{n_{k}}.
\]

The $N$-dimensional boundary operator applied to $P$ is defined as

\[
\partial_{N}\left(P\right)=\partial_{N}\left(\operatorname*{\square}_{i=1}^{k}\Delta^{n_{i}}\right)=\sum_{i=1}^{k}(-1)^{\alpha(i)}[\overline{\partial_{n_{i}}\Delta^{n_{i}}}]
\]

where $\alpha(i)=\sum_{\ell=1}^{i-1}n_{\ell}$ is the sum of the dimensions of the factors preceding the $i$th factor.
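As a worked instance of Definition 2.12, consider the square $P=\Delta^{1}\square\Delta^{1}$. The vertex labels $u_{0},u_{1}$ for the first factor and $w_{0},w_{1}$ for the second are chosen here only for illustration, and the product is assumed to distribute over sums in both factors. Here $\alpha(1)=0$ and $\alpha(2)=n_{1}=1$, and $\partial_{1}([u_{0},u_{1}])=[u_{1}]-[u_{0}]$ by the simplicial formula.

```latex
\begin{align*}
\partial_{2}(\Delta^{1}\square\Delta^{1})
 &= (-1)^{0}\,\partial_{1}([u_{0},u_{1}])\square[w_{0},w_{1}]
  + (-1)^{1}\,[u_{0},u_{1}]\square\partial_{1}([w_{0},w_{1}])\\
 &= ([u_{1}]-[u_{0}])\square[w_{0},w_{1}] - [u_{0},u_{1}]\square([w_{1}]-[w_{0}])\\
 &= [u_{1}]\square[w_{0},w_{1}] - [u_{0}]\square[w_{0},w_{1}]
  - [u_{0},u_{1}]\square[w_{1}] + [u_{0},u_{1}]\square[w_{0}].
\end{align*}
```

The four terms are the four edges of the square. The two edges of the path through $(u_{1},w_{0})$ carry a positive sign and the two edges of the path through $(u_{0},w_{1})$ carry a negative sign, so applying $\partial_{1}$ once more cancels every vertex.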

Since the boundary operator can only be applied to oriented chains ($[\Delta^{n}]$ and $[P]$), and not graphs ($\Delta^{n}$ and $G$), we omit the square brackets for brevity.

Note that given a prodsimplicial cell complex $\Gamma$, $\partial_{N}$ defines a group homomorphism $\partial_{N}:C_{N}(\Gamma)\longrightarrow C_{N-1}(\Gamma)$. We verify that the above defined operator indeed defines a chain complex. Recall, for a simplex $\Delta^{n+1}$ with vertices $\{v_{0},v_{1},\ldots,v_{n+1}\}$, we have $(\partial_{n}\circ\partial_{n+1})(\Delta^{n+1})=0$.

Proposition 2.13 ($\partial^{2}=0$ on prodsimplicial cells).

Let $P=\operatorname*{\square}_{i=1}^{k}\Delta^{n_{i}}$ be a prodsimplicial $(N+1)$-cell, where $N+1=\sum_{i=1}^{k}n_{i}$. Then $(\partial_{N}\circ\partial_{N+1})(P)=0$.

Proof.

By Definition 2.12, we have that

\begin{align*}
(\partial_{N}\circ\partial_{N+1})(P) &= (\partial_{N}\circ\partial_{N+1})\left(\operatorname*{\square}_{i=1}^{k}\Delta^{n_{i}}\right)\\
&= \partial_{N}\left(\sum_{i=1}^{k}(-1)^{\alpha(i)}[\overline{\partial_{n_{i}}\Delta^{n_{i}}}]\right)\\
&= \sum_{j<i}(-1)^{\alpha(i)}(-1)^{\alpha(j)}[\overline{\partial_{n_{j}}\Delta^{n_{j}}}\,\overline{\partial_{n_{i}}\Delta^{n_{i}}}]+\sum_{j>i}(-1)^{\alpha(i)}(-1)^{\alpha(j)-1}[\overline{\partial_{n_{i}}\Delta^{n_{i}}}\,\overline{\partial_{n_{j}}\Delta^{n_{j}}}]\\
&\quad+(\partial_{n_{i}-1}\circ\partial_{n_{i}})(\Delta^{n_{i}}).
\end{align*}

The term $(\partial_{n_{i}-1}\circ\partial_{n_{i}})(\Delta^{n_{i}})$ is zero. Note that when $j>i$, the dimension of the $i$th term has been decreased by one and hence the exponent on $(-1)$ for the sum over $j>i$ is $\alpha(j)-1$.

Due to this difference of 1, the two sums (over $j<i$ and $j>i$) have the same terms with opposite signs, so the final expression equals zero. ∎

With this boundary operator we can compute cycle groups, boundary groups, homology groups, and Betti numbers for prodsimplicial complexes of directed graphs.

3 Betti Numbers and Generators for Consistently Directed Graphs

We examine generators for the first and second homology groups and investigate possible values of Betti numbers in prodsimplicial complexes of directed graphs. We adopt the notation $\beta_{n}(\Gamma(G))$ to denote the $n$th Betti number of the prodsimplicial complex associated with a digraph $G$, where $\Gamma(G)$ is built through the gluing process of Definition 2.10. For brevity, we abuse the notation and write $\beta_{n}(G)$ instead.

As is well known, $\beta_{0}$ indicates the number of connected components. Since we study consistently directed digraphs, which consist of a single connected component, we have $\beta_{0}(G)=1$ for all digraphs of interest.

By definition, consistently directed graphs admit no cycles in the graph theoretic sense. In the work that follows, where it causes no confusion, we use the term $n$-cycles to refer to elements of cycle groups.

3.1 Generators of the First Homology Groups

Figure 4 shows two connected digraphs on three vertices. The one in Figure 4(a) is a closed path of the form $(v_{0},v_{1},v_{2}=v_{0})$ and forms a 1-cycle, while simplicial digraphs, as shown in Figure 4(b), do not contribute to $\beta_{1}$. Although closed paths of the form $(v_{0},v_{1},\ldots,v_{n}=v_{0})$ contribute to $\beta_{1}$, they are absent in consistently directed graphs, so we exclude them in the search for homology group generators.

Figure 4: The only two possible connected digraphs on three vertices.

The only two consistently directed graphs on four vertices are shown in Figure 5. One is the union of an edge and a path of length three, as depicted in Figure 5(a), which results in a complex homeomorphic to the circle $S^{1}$. The other is a square cell of the form $\Delta^{1}\square\Delta^{1}$, shown in Figure 5(b). We summarize our observations as follows.

Figure 5: All consistently directed graphs on four vertices.

Lemma 3.1.

Let $G=(V,E)$ where $V=\{v_{0},v_{1},v_{2},v_{3}\}$ and $E=\{[v_{0},v_{3}],[v_{0},v_{1}],[v_{1},v_{2}],[v_{2},v_{3}]\}$, as in Figure 5(a). Then $\beta_{1}(G)=1$ and $\beta_{n}(G)=0$ for all $n>2$.

We can build graphs with arbitrarily large $\beta_{1}$ values by attaching edge-disjoint paths of length 3 from source to target as follows.

Lemma 3.2.

For all $k\geq 0$, there exists a graph $G$ such that $\beta_{1}(G)=k$ and $\beta_{n}(G)=0$ for $n>2$.

Figure 6: $k$ disjoint paths of length 3 running “parallel” to $[v_{0},v_{2k+1}]$. The source is $v_{0}$, the target is $v_{2k+1}$, and the intermediate vertices are $v_{1},\ldots,v_{k}$ and $v_{k+1},\ldots,v_{2k}$.

Proof.

Let $G$ be the graph in Figure 6. Then, as a topological space, $G$ is homotopy equivalent to a bouquet of $k$ circles, hence $\beta_{1}(G)=k$ and $\beta_{n}(G)=0$ for $n>2$. ∎
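The value of $\beta_{1}$ in Lemma 3.2 can be cross-checked by counting cells. This check assumes, reading the labels of Figure 6, that the $i$th path is $(v_{0},v_{i},v_{k+i},v_{2k+1})$; under that assumption no two paths of length 2 share both endpoints and no triangle occurs, so no 2-cells are attached and $\Gamma(G)$ is the graph itself.

```latex
\begin{align*}
|V| &= 2k+2, \qquad |E| = 3k+1,\\
\beta_{1}(G) &= |E| - |V| + \beta_{0}(G) = (3k+1)-(2k+2)+1 = k.
\end{align*}
```

For $k=1$ this recovers the graph of Lemma 3.1, with four vertices, four edges and $\beta_{1}=1$.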

Lemma 3.3.

For consistently directed graphs, all 1-cycles that represent nontrivial elements of the first homology group consist of paths with a common source and target, $p_{1}=(s=u_{0},u_{1},\ldots,u_{m}=t)$ and $p_{2}=(s=v_{0},v_{1},\ldots,v_{n}=t)$, where $\{u_{1},u_{2},\ldots,u_{m-1}\}\cap\{v_{1},v_{2},\ldots,v_{n-1}\}=\varnothing$.

3.2 Generators of the Second Homology Groups

This section will focus on directed graphs with vertices labeled $v_{i}$, so we reserve the notation $[v_{0},v_{1},\ldots,v_{n}]$ for simplices and label other low-dimensional prodsimplicial cells as sequences of vertices. For example, $[v_{0},v_{1},v_{2},v_{3}]$ denotes a tetrahedron while $v_{0}v_{1}v_{2}v_{3}$ denotes a square.

In dimension 2, the problem of finding all graphs whose corresponding prodsimplicial complex yields nontrivial 2-cycles is less straightforward. We present here a few of the simplest examples.

To find generators of the second homology groups, we consider consistently directed graphs on at least five vertices, since the consistently directed graphs on three and four vertices are accounted for above (see Figures 4 and 5). It can be checked by inspection that the only connected and consistently directed graphs with five edges and five vertices are pairs of paths as in Lemma 3.3. For an example with six edges and five vertices with $\beta_{2}(G)=1$, we consider the graph consisting of three squares shown in Figure 7.

Lemma 3.4.

Let $G$ be as shown in Figure 7. Then $\beta_{1}(G)=0$, $\beta_{2}(G)=1$ and $\beta_{n}(G)=0$ for all $n>2$.

Figure 7: Consistently directed graph on the vertices $v_{0},\ldots,v_{4}$ whose prodsimplicial complex consists of three squares, six edges and five vertices.

Proof.

We describe each chain group explicitly in terms of the vertices of $G$. Note that squares are labeled forming a cycle, following the source vertex with its lowest lexicographic order neighbor. The chain groups, $C_{n}$, in the prodsimplicial complex associated with the graph are described by generators as follows:

\begin{align*}
C_{0} &= \langle[v_{0}],[v_{1}],[v_{2}],[v_{3}],[v_{4}]\rangle\\
C_{1} &= \langle[v_{0},v_{1}],[v_{0},v_{2}],[v_{0},v_{3}],[v_{1},v_{4}],[v_{2},v_{4}],[v_{3},v_{4}]\rangle\\
C_{2} &= \langle v_{0}v_{1}v_{4}v_{2},\ v_{0}v_{2}v_{4}v_{3},\ v_{0}v_{1}v_{4}v_{3}\rangle.
\end{align*}

In the graph any two paths of length 2 form a consistently directed square, so that the resulting prodsimplicial complex is homeomorphic to a sphere. It follows that $\beta_{1}(G)=0$, $\beta_{2}(G)=1$ and $\beta_{n}(G)=0$ for all $n>2$ as desired. ∎
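The Betti numbers in Lemma 3.4 can also be recovered numerically from the chain groups listed in the proof. The sketch below is not the authors' computation: it builds the boundary matrices over the rationals and uses ranks, with vertex $v_{i}$ encoded as the integer `i`. The sign convention for a square `s a t b` is the one obtained from Definition 2.12 when the first factor runs along `s -> a`, and the helper `rank` is a plain Gaussian elimination written for this example.

```python
from fractions import Fraction


def rank(matrix):
    """Rank over the rationals by Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


vertices = [0, 1, 2, 3, 4]
edges = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 4), (3, 4)]
# Squares written as (source, a, target, b), matching v0 v1 v4 v2 etc.
squares = [(0, 1, 4, 2), (0, 2, 4, 3), (0, 1, 4, 3)]
edge_index = {e: i for i, e in enumerate(edges)}

# d1: rows are vertices, columns are edges; boundary of [u, v] is [v] - [u].
d1 = [[0] * len(edges) for _ in vertices]
for (u, v), j in edge_index.items():
    d1[u][j] -= 1
    d1[v][j] += 1

# d2: rows are edges, columns are squares;
# boundary of s a t b is [s,a] + [a,t] - [s,b] - [b,t].
d2 = [[0] * len(squares) for _ in edges]
for j, (s, a, t, b) in enumerate(squares):
    d2[edge_index[(s, a)]][j] += 1
    d2[edge_index[(a, t)]][j] += 1
    d2[edge_index[(s, b)]][j] -= 1
    d2[edge_index[(b, t)]][j] -= 1

r1, r2 = rank(d1), rank(d2)
betti0 = len(vertices) - r1
betti1 = (len(edges) - r1) - r2
betti2 = len(squares) - r2  # the complex has no 3-cells
print(betti0, betti1, betti2)  # 1 0 1
```

The third square's boundary is the sum of the first two, so the rank of `d2` is 2 and one independent 2-cycle remains, in agreement with the sphere described in the proof.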

In addition, we present a graph on six vertices whose prodsimplicial complex also has a nontrivial second homology group generated by a 2-cycle.

Lemma 3.5.

Let $G$ be the graph in Figure 8(a). Then $\beta_{1}(G)=0$, $\beta_{2}(G)=1$ and $\beta_{n}(G)=0$ for all $n>2$.

Proof.

The chain groups in the prodsimplicial complex associated with the graph are as follows:

\begin{align*}
C_{0} &= \langle[v_{0}],[v_{1}],[v_{2}],[v_{3}],[v_{4}],[v_{5}]\rangle\\
C_{1} &= \langle[v_{0},v_{1}],[v_{0},v_{2}],[v_{1},v_{3}],[v_{1},v_{4}],[v_{2},v_{3}],[v_{2},v_{4}],[v_{3},v_{5}],[v_{4},v_{5}]\rangle\\
C_{2} &= \langle v_{0}v_{1}v_{3}v_{2},\ v_{0}v_{1}v_{4}v_{2},\ v_{1}v_{3}v_{5}v_{4},\ v_{2}v_{3}v_{5}v_{4}\rangle.
\end{align*}

The four squares are connected in such a way that they form a complex homeomorphic to a sphere, from which the result follows. ∎

Remark 3.6.

It is possible to add edges connecting opposite vertices of a consistently directed square (dividing it into two simplicial digraphs) in either of the graphs depicted in Figure 7 or Figure 8. This yields graphs with complexes homeomorphic to a sphere, with different types of nontrivial polyhedral 2-cycles. For example, the graph in Figure 8(a) can be modified to obtain the graph in Figure 8(b).

Figure 8: Graphs with nontrivial 2-cycles, shown in panels (a) and (b), each on the vertices $v_{0},\ldots,v_{5}$.

3.3 Realizability of Betti Number Combinations

Using some of the graphs from previous results, we construct graphs with larger order having specific homology groups. To prove this result we use the following lemma that follows from the Mayer-Vietoris Sequence [17].

Lemma 3.7.

Let $G$ and $H$ be consistently directed graphs such that $G\cup H$ is consistently directed, $\Gamma(G\cup H)=\Gamma(G)\cup\Gamma(H)$, $H_{0}(\Gamma(G\cap H))=\mathbb{Z}$, and $H_{n}(\Gamma(G\cap H))=0$ for $n\geq 1$. Then

\[
H_{n}(\Gamma(G\cup H))\cong H_{n}(\Gamma(G))\oplus H_{n}(\Gamma(H))
\]

for $n\geq 1$.

As a direct result of this, we are able to “glue” together an arbitrary number of generator graphs to attain any combination of Betti numbers.

Corollary 3.8.

For every $k\geq 1$, there exists a graph $G$ such that $\beta_{1}(G)=0$, $\beta_{2}(G)=k$ and $\beta_{n}(G)=0$ for all $n>2$.

Figure 9: Graph on the vertices $v_{0},v_{1},\ldots,v_{3k+1}$ whose corresponding prodsimplicial complex has second homology of rank $k$.

Proof.

Let $G$ be as in Figure 9. Then, as a topological space, $G$ is equivalent to a wedge of spheres that are glued along an edge (where homology groups are trivial). As a result, we have $\beta_{1}(G)=0$, $\beta_{2}(G)=k$ and $\beta_{n}(G)=0$ for all $n>2$. ∎

Corollary 3.9.

Let $G$ be as in Lemma 3.2 with vertex set $\{v_{0},v_{1},\ldots,v_{2k+1}\}$ and $\beta_{1}(G)=k$, and let $H$ be as in Corollary 3.8 with vertex set $\{u_{0},u_{1},\ldots,u_{3\ell+1}\}$ and $\beta_{2}(H)=\ell$, identified so that $v_{0}=u_{3\ell+1}$ and $G$ and $H$ only intersect at the vertex $v_{0}$.

Let $G\cup H=(V(G)\cup V(H),E(G)\cup E(H))$. Then $G\cup H$ is consistently directed and satisfies

\begin{align*}
\beta_{1}(G\cup H) &= k\\
\beta_{2}(G\cup H) &= \ell\\
\beta_{n}(G\cup H) &= 0\ \text{ for }n\geq 3.
\end{align*}

Proof.

It can be readily checked that the unique source and target of $G\cup H$ are, respectively, $u_{0}$ and $v_{2k+1}$, so that $G\cup H$ is consistently directed. By assumption, $H_{0}(\Gamma(G\cap H))=\mathbb{Z}$ and $H_{n}(\Gamma(G\cap H))=0$ for $n\geq 1$, so that the result follows from Lemma 3.7. ∎

It is therefore possible to construct consistently directed graphs with first and second homology groups isomorphic to $\mathbb{Z}^{k}$ and $\mathbb{Z}^{\ell}$, respectively, for any positive integers $k$ and $\ell$. This construction is not minimal in the number of vertices. For example, the graph shown in Figure 10 follows a construction similar to that in Lemma 3.2 and achieves many of the same $k$ values using fewer vertices and edges.

[Figure 10 omitted: directed graph on the vertices $v_{0},v_{1},\ldots,v_{k},v_{k+1}$.]

Figure 10: Directed graph with $\binom{k-1}{2}$ 2-cycles.
Lemma 3.10.

Let $G$ be a digraph with vertices $V(G)=\{v_{0},v_{1},\ldots,v_{k+1}\}$ and edges defined by the collection of paths $(v_{0},v_{i},v_{k+1})$ for $0<i<k+1$. Then $\beta_{1}(G)=0$, $\beta_{2}(G)=\binom{k-1}{2}$, and $\beta_{n}(G)=0$ for all $n>2$.
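
The value of $\beta_{2}$ in Lemma 3.10 can be sanity-checked with an Euler characteristic count. This is only a sketch under stated assumptions: it assumes that the 2-cells of $\Gamma(G)$ are exactly the squares bounded by pairs of distinct paths $(v_{0},v_{i},v_{k+1})$ and $(v_{0},v_{j},v_{k+1})$, that $\Gamma(G)$ has no cells of dimension $3$ or higher, and it takes $\beta_{0}(G)=1$ and $\beta_{1}(G)=0$ as given. The graph has $k+2$ vertices, $2k$ edges and $\binom{k}{2}$ squares, so

$$
\begin{aligned}
\chi(\Gamma(G)) &= (k+2)-2k+\binom{k}{2},\\
\chi(\Gamma(G)) &= \beta_{0}(G)-\beta_{1}(G)+\beta_{2}(G) = 1+\beta_{2}(G),\\
\beta_{2}(G) &= 1-k+\frac{k(k-1)}{2} = \frac{(k-1)(k-2)}{2} = \binom{k-1}{2}.
\end{aligned}
$$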

4 Word Graphs of Double Occurrence Words

In this section we focus on specific biomolecular processes where consistently directed graphs appear and the prodsimplicial complexes can be applied.

Massive rearrangement processes are observed during the development of somatic nuclei in certain species of ciliates such as Oxytricha trifallax. The recombination is guided by short DNA repeats flanking the DNA segments that are rearranged and guiding their order. These short repeats can be modeled by a sequence of double occurrence words, words where each symbol appears twice [7, 25, 22]. In particular, over 90% of DNA rearrangement in these species can be described through an iterated process of deletion of repeat and return words [4]. Repeat and return words are generalizations of square and palindromic factors in words and are of interest in language theory [4, 19, 2]. In this section we describe graphs associated with double occurrence words that model these DNA rearrangement processes [3].

4.1 Double Occurrence Words

We call an ordered, countable set of symbols $\Sigma$ an alphabet. A word over $\Sigma$ is a finite sequence of the form $w=a_{1}a_{2}\ldots a_{n}$ where $a_{i}\in\Sigma$, whose length, denoted $|w|$, is $n$. We denote by $\Sigma^{*}$ the set of all words over $\Sigma$, including the empty word, denoted by $\epsilon$. The set of all symbols comprising a word $w$ is denoted by $\Sigma[w]$. The reverse of a word $w=a_{1}a_{2}\ldots a_{n}$ is $w^{R}=a_{n}a_{n-1}\ldots a_{2}a_{1}$. The word $v$ is a factor of the word $w$, denoted $v\sqsubseteq w$, if there exist $w_{1},w_{2}\in\Sigma^{*}$ such that $w=w_{1}vw_{2}$. In this presentation we set $\Sigma\subseteq[n]$ for some $n\in\mathbb{N}$. For example, $w=122313$ is a word over $[3]=\{1,2,3\}$ of length $\lvert w\rvert=6$. The reverse of $w=122313$ is the word $w^{R}=313221$.

A word $w\in\Sigma^{*}$ is called a double occurrence word (DOW) if every symbol in $\Sigma$ appears in $w$ either zero or two times. We use $\Sigma_{DOW}$ to denote the set of all DOWs over $\Sigma$. Similarly, we call $w$ a single occurrence word (SOW) if each symbol in $\Sigma$ appears either once or not at all. The set of SOWs over $\Sigma$ is denoted by $\Sigma_{SOW}$. Since a DOW of length $n$ uses $n/2$ distinct symbols, we say that the size of a DOW $w$ is $|w|/2$. When we restrict $\Sigma_{DOW}$ to DOWs of size less than or equal to $n$, we denote the set by $\Sigma_{DOW}^{\leq n}$.

A word $w\in\Sigma^{*}$ is said to be in ascending order if $a_{1}=\min(\Sigma[w])$ and the first appearance of each symbol is the immediate successor of the largest of all the preceding symbols. For example, the word $w_{1}=122313$ is a DOW in ascending order, while the DOW $w_{2}=133212$ is not.

We say that $w_{1}$ and $w_{2}$ are ascending order equivalent, and write $w_{1}\sim w_{2}$, if there exists a bijection on $\Sigma$ inducing a morphism $f$ on $\Sigma^{*}$ such that $f(w_{1})=w_{2}$. Words $w_{1}=122313$ and $w_{2}=133212$ are equivalent via the bijective map given by $1\mapsto 1$, $3\mapsto 2$, $2\mapsto 3$. Since each equivalence class contains a unique word in ascending order, we take words in ascending order as representatives of the classes determined by the relation $\sim$.
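
The choice of ascending-order representatives can be illustrated with a short Python sketch. It assumes that words are stored as tuples of integers and that representatives use the symbols $1,2,3,\ldots$; the function names `ascending_form` and `equivalent` are hypothetical and are not taken from the authors' script.

```python
def ascending_form(word):
    """Relabel symbols as 1, 2, 3, ... in order of first appearance."""
    relabel = {}
    out = []
    for symbol in word:
        if symbol not in relabel:
            # A new symbol gets the next unused label.
            relabel[symbol] = len(relabel) + 1
        out.append(relabel[symbol])
    return tuple(out)


def equivalent(w1, w2):
    """Two words are ascending order equivalent iff their representatives agree."""
    return ascending_form(w1) == ascending_form(w2)


w1 = (1, 2, 2, 3, 1, 3)
w2 = (1, 3, 3, 2, 1, 2)
print(ascending_form(w2))  # (1, 2, 2, 3, 1, 3)
print(equivalent(w1, w2))  # True
```

Comparing representatives avoids searching over bijections of the alphabet.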

From now on, we consider only equivalence classes of DOWs and abuse the notation by writing words in place of their equivalence class where no confusion arises. The following definition can be found in [9].

Definition 4.1 (repeat word, return word).

Let $x,y,z\in\Sigma^{*}$ and $u\in(\Sigma\setminus\Sigma[w])_{SOW}$. We say that

  • the word $uu$ is a repeat word in $w=xuyuz$ and the word $xyz$ is obtained from $w$ by a repeat deletion denoted $d_{u}(w)=xyz$. In this case we call $u$ a repeat factor in $w$.

  • the word $uu^{R}$ is a return word in $w=xuyu^{R}z$ and the word $xyz$ is obtained from $w$ by a return deletion, also denoted $d_{u}(w)=xyz$. In this case we call $u$ a return factor in $w$.

Repeat or return words, $uu$ or $uu^{R}$, where $\lvert u\rvert=1$ are called trivial. We say that a word $uu$ (resp. $uu^{R}$) is a maximal repeat (resp. return) word in $w$ if there are no other repeat (resp. return) factors $v$ in $w$ containing $u$ with $\lvert v\rvert>\lvert u\rvert$. Following [1], we use $M^{DOW}_{w}$ to denote the set of maximal repeat ($uu$) or return ($uu^{R}$) words in $w$. In addition, we define the set of repeat or return factors in $w$ as $M^{SOW}_{w}=\{u\sqsubseteq w \mid uu\in M^{DOW}_{w}\text{ or }uu^{R}\in M^{DOW}_{w}\}$.

Lemma 4.2.

[1] Let $w$ be a DOW of size $n$. For each $x\in\Sigma[w]$ there exists a unique $u\in M^{SOW}_{w}$ such that $x\in\Sigma[u]$.

The set of maximal repeat or return words may include trivial repeat or return words, as illustrated in Example 4.4. In ascending order, some of the factors may be equivalent so elements of $M^{DOW}_{w}$ (resp. $M^{SOW}_{w}$) are always written as DOWs (resp. SOWs) and not their equivalence classes, so that every symbol in $w$ appears in some word in $M^{SOW}_{w}$. We are interested in maximal repeat and return words, and present the following definition, which was adapted from the so-called “pattern reduction” process described in [1] and [19].

Definition 4.3 (successor, predecessor).

The set $D(w)=\bigcup_{u\in M^{SOW}_{w}}\{v \mid v\text{ is in ascending order and }v\sim d_{u}(w)\}$ is called the set of immediate successors of $w$. If there exists a sequence of words $w=w_{1},w_{2},\ldots,w_{n}=w^{\prime}$ such that $w_{i}\sim d_{u_{i}}(w_{i-1})$ for some choice of $u_{i}\in M^{SOW}_{w_{i-1}}$, $2\leq i\leq n$, we call $w^{\prime}$ a successor of $w$ and $w$ a predecessor of $w^{\prime}$. Note that the empty word $\epsilon$ is a successor of all words.

Example 4.4.

Let $w=1234523541$. The set of maximal repeat or return words in $w$ is $M^{DOW}_{w}=\{11,2323,4554\}$ and $M^{SOW}_{w}=\{1,23,45\}$. Since $d_{1}(w)=23452354\sim 12341243$, $d_{23}(w)=145541\sim 123321$, and $d_{45}(w)=123231$, we have that the set of immediate successors of $w$ is $D(w)=\{12341243,123321,123231\}$. We may continue to delete subwords from the successors as follows. The maximal repeat or return factors of $12341243$, $123321$, and $123231$ are

$$
\begin{aligned}
M^{SOW}_{12341243} &= \{12,34\}\\
M^{SOW}_{123321} &= \{123\}\\
M^{SOW}_{123231} &= \{1,23\}
\end{aligned}
$$

so that their deletions yield

$$
\begin{aligned}
d_{12}(12341243) &= 3443\sim 1221\\
d_{34}(12341243) &= 1212\\
D(12341243) &= \{1221,1212\}\\
d_{123}(123321) &= \epsilon\\
D(123321) &= \{\epsilon\}\\
d_{1}(123231) &= 2323\sim 1212\\
d_{23}(123231) &= 11\\
D(123231) &= \{1212,11\}.
\end{aligned}
$$

Repeating this process once more yields $D(1212)=D(1221)=D(11)=\{\epsilon\}$. We can therefore say that the set of all successors of $w$ is the set

$$\{12341243,123321,123231,1221,1212,11,\epsilon\}.$$
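
The computation in Example 4.4 can be reproduced with the following Python sketch. It is not the authors' script: the function names are hypothetical, words are assumed to be tuples of integers that form a DOW, and maximal factors are found by greedily extending each not-yet-covered symbol as a repeat or as a return, which relies on the uniqueness statement of Lemma 4.2.

```python
def ascending_form(word):
    """Relabel symbols as 1, 2, 3, ... in order of first appearance."""
    relabel = {}
    for symbol in word:
        relabel.setdefault(symbol, len(relabel) + 1)
    return tuple(relabel[symbol] for symbol in word)


def maximal_factors(word):
    """Maximal repeat/return factors u of a DOW, in order of first appearance."""
    n = len(word)
    positions = {}
    for pos, symbol in enumerate(word):
        positions.setdefault(symbol, []).append(pos)
    factors, covered = [], set()
    for i, symbol in enumerate(word):
        if symbol in covered:
            continue
        j = positions[symbol][1]  # second occurrence of the symbol
        # Extend as a repeat word u u: both copies read left to right.
        rep = 1
        while i + rep < j and j + rep < n and word[i + rep] == word[j + rep]:
            rep += 1
        # Extend as a return word u u^R: second copy read right to left.
        ret = 1
        while i + ret < j - ret and word[i + ret] == word[j - ret]:
            ret += 1
        u = word[i:i + max(rep, ret)]
        covered.update(u)
        factors.append(u)
    return factors


def delete_factor(word, u):
    """d_u(w): remove both copies of u (each symbol of u occurs twice in w)."""
    return tuple(s for s in word if s not in u)


def immediate_successors(word):
    """D(w), written with ascending-order representatives."""
    return {ascending_form(delete_factor(word, u)) for u in maximal_factors(word)}


def all_successors(word):
    """Every word reachable from w by iterated maximal deletions."""
    seen, stack = set(), [word]
    while stack:
        for nxt in immediate_successors(stack.pop()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


w = (1, 2, 3, 4, 5, 2, 3, 5, 4, 1)
print(maximal_factors(w))  # [(1,), (2, 3), (4, 5)]
print(sorted(immediate_successors(w)))
print(len(all_successors(w)))  # 7
```

The empty tuple plays the role of $\epsilon$; it has no factors and therefore no successors.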

4.2 Word Graphs

We refer the reader to [11] for elementary definitions in graph theory and to Section 2.1 for the definitions of a directed graph, source, and target.

Definition 4.5 (global word graph).

The global word graph $G_{n}=(V,E)$ of double occurrence words of size $n$ is the graph defined by:

  • $V(G_{n})=\Sigma_{DOW}^{\leq n}/_{\sim}$;

  • $E(G_{n})=\bigcup_{w\in V}E_{w}$, where $E_{w}=\{[w,v] \mid v\in D(w)\}$.

For a vertex $w$ in $G_{n}$, we define the word graph rooted at $w$, denoted $G_{w}$, as the induced subgraph of the global word graph containing as vertices $w$ and all of its successors.

By construction $G_{w}$ does not contain any cycles, and has unique source $w$ and unique target $\epsilon$; hence word graphs are consistently directed.

The figures in this section were computer generated. Although we do not write commas between the symbols of a word in our presentation, the labels on the vertices in the figures are separated by commas to improve readability.

Example 4.6.

The global word graph $G_{2}$ of size $2$ is shown in Figure 11, with the word graph rooted at $1122$ highlighted in blue. The vertex set is $V(G_{2})=\Sigma_{DOW}^{\leq 2}/_{\sim}=\{\epsilon,11,1221,1212,1122\}$. The word graph rooted at $w=1234523541$ whose successors are computed in Example 4.4 is depicted in Figure 12.

Figure 11: Global word graph of size $2$.
Figure 12: Word graph rooted at $1234523541$.
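
The vertex set of $G_{n}$ can be enumerated directly. The sketch below generates every DOW of a given size in ascending order by backtracking, assuming words are tuples of integers; `dows_of_size` and `global_vertices` are hypothetical names, not part of the authors' script. The number of words of size $n$ produced this way is $(2n-1)!!$, the number of ways to pair up $2n$ positions.

```python
def dows_of_size(n):
    """Yield every DOW on the symbols 1..n that is written in ascending order."""
    word = []
    count = {}

    def extend(next_new):
        if len(word) == 2 * n:
            yield tuple(word)
            return
        # Option 1: close a symbol that has appeared exactly once so far.
        for s in range(1, next_new):
            if count[s] == 1:
                count[s] = 2
                word.append(s)
                yield from extend(next_new)
                word.pop()
                count[s] = 1
        # Option 2: introduce the next unused symbol (keeps ascending order).
        if next_new <= n:
            count[next_new] = 1
            word.append(next_new)
            yield from extend(next_new + 1)
            word.pop()
            del count[next_new]

    yield from extend(1)


def global_vertices(n):
    """Representatives of all DOW classes of size at most n."""
    return [w for k in range(n + 1) for w in dows_of_size(k)]


print(global_vertices(2))
print(len(list(dows_of_size(3))))  # 15
```

For $n=2$ this returns the five representatives listed in Example 4.6, with the empty tuple standing for $\epsilon$.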

4.3 DOW Operations and Their Effect on Word Graphs

4.3.1 Operations that Result in Isomorphic Word Graphs

We present some results on the cases where insertions, substitutions, or reversal do not affect the word graphs of DOWs. A prodsimplicial complex associated to $w\in\Sigma_{DOW}$ is the prodsimplicial complex associated with $G_{w}$. We consider operations that yield classes of DOWs whose complexes have similar topological properties.

To find all predecessors of a given DOW $w$ we consider all possible insertions to $w$ up to ascending order equivalence. In [9] the insertions that yield equivalent DOWs and hence corresponding isomorphic word graphs were characterized. We present here results about DOWs that are not ascending order equivalent but yield isomorphic word graphs.

Definition 4.7.

Let $w,w^{\prime}\in\Sigma_{DOW}$ be such that $\Sigma[w]\cap\Sigma[w^{\prime}]=\varnothing$. We define the concatenation of $w$ and $w^{\prime}$ as the DOW that is ascending order equivalent to $ww^{\prime}$.

Proposition 4.8.

Let $w\in\Sigma_{DOW}$, and $u\in M^{SOW}_{w}$. Let $v\in(\Sigma\setminus\Sigma[w])_{SOW}$ and let

$$
\begin{aligned}
w^{\prime} &= xu_{1}vu_{2}yu_{1}vu_{2}z\\
w^{\prime\prime} &= xu_{1}vu_{2}yu_{2}^{R}v^{R}u_{1}^{R}z
\end{aligned}
$$

where $u_{1},u_{2}\in\Sigma^{*}$ such that $u=u_{1}u_{2}$. Then $G_{w}\cong G_{w^{\prime}}\cong G_{w^{\prime\prime}}$.

Proof.

In this case $u_{1}vu_{2}u_{1}vu_{2}$ is a maximal repeat word in $w^{\prime}$. Note that $M^{SOW}_{w^{\prime}}=(M^{SOW}_{w}\setminus\{u\})\cup\{u_{1}vu_{2}\}$ but $d_{u}(w)=d_{u_{1}vu_{2}}(w^{\prime})$. Hence, the word graphs of $w$ and $w^{\prime}$ are isomorphic: $G_{w^{\prime}}\cong G_{w}$. The case when $uu^{R}$ is a maximal return word is similar. ∎

We call $w\in\Sigma_{DOW}$ a palindrome if $w^{R}\sim w$. For example, the DOW $w=123231$ is a palindrome, as $w^{R}=132321\sim 123231$. For a palindrome $w$, both $w$ and $w^{R}$ are in the same ascending order equivalence class so that the corresponding word graphs are the same. As a result, the reversal operation has no effect on the prodsimplicial complex obtained.
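
The palindrome condition amounts to comparing ascending-order representatives of a word and its reverse. A minimal Python sketch, assuming words are tuples of integers and using the hypothetical names `reverse_class` and `is_palindrome`:

```python
def ascending_form(word):
    """Relabel symbols as 1, 2, 3, ... in order of first appearance."""
    relabel = {}
    for symbol in word:
        relabel.setdefault(symbol, len(relabel) + 1)
    return tuple(relabel[symbol] for symbol in word)


def reverse_class(word):
    """Ascending-order representative of the reversed word w^R."""
    return ascending_form(word[::-1])


def is_palindrome(word):
    """True when w^R is ascending order equivalent to w."""
    return reverse_class(word) == ascending_form(word)


print(is_palindrome((1, 2, 3, 2, 3, 1)))  # True
print(reverse_class((1, 2, 2, 1, 3, 3)))  # (1, 1, 2, 3, 3, 2)
```

The second call shows a word that is not a palindrome: the reverse of $122133$ is represented by $112332$.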

Lemma 4.9.

Let $w\in\Sigma_{DOW}$. Then $G_{w}\cong G_{w^{R}}$.

Proof.

Note that $u$ is a repeat (resp. return) word in $w$ if and only if $u^{R}$ is a repeat (resp. return) word in $w^{R}$. Then if $v$ is a successor of $u$, for each edge $[u,v]\in G_{w}$ we have a corresponding edge $[u^{R},v^{R}]\in E(G_{w^{R}})$. This bijection between the edges induces an isomorphism between the graphs. ∎

Example 4.10.

Let $w=122133$. Then $w^{R}=331221\sim 112332$. The word graphs rooted at $122133$ and $112332$ are isomorphic, as shown in Figure 13.

Figure 13: The DOW $112332$ is ascending order equivalent to $122133^{R}$, so $G_{112332}$ is isomorphic to $G_{122133}$.

For the rest of this section, we consider the effect of substituting a repeat word $uu$ by the return word $uu^{R}$ on word graphs. In some cases we may substitute a maximal repeat word in a DOW $w$ for a maximal return word in $w$ without affecting the word graph. The following properties seem to play an important role in this context.

Definition 4.11 (square repeat or return words).

Let $w\in\Sigma_{DOW}$ and let $u\in M^{SOW}_{w}$. We say that $u$ is a square factor of $w$ if there exists $v\in M^{SOW}_{w}$ such that $v\neq u$ but $\lvert v\rvert=\lvert u\rvert$. A DOW $w$ is said to be squarefree if it has no square factors.

Note that if $w$ is squarefree and $uu\in M^{DOW}_{w}$ then $vv,vv^{R}\notin M^{DOW}_{w}$ for all $v\in M^{SOW}_{w}$ with $\lvert u\rvert=\lvert v\rvert$. Since no repeat or return factors appear more than once in $w$, we also have that all deletions $d_{u}(w)$ for $u\in M^{SOW}_{w}$ result in distinct DOWs. In particular, all subwords of $w$ are also squarefree.

Example 4.12 (square factor, squarefree words).

Let $w=12123434$. Then $M^{SOW}_{w}=\{12,34\}$ and $12$ is a square factor of $w$, since $\lvert 34\rvert=\lvert 12\rvert$. Let $x,y,z\in\Sigma_{SOW}$ have distinct lengths. Then $w^{\prime}=xyxzyz$ is squarefree.

Definition 4.13 (coprime).

Given two DOWs $w$ and $w^{\prime}$ where $\Sigma[w]\cap\Sigma[w^{\prime}]=\varnothing$, we say $w$ is coprime to $w^{\prime}$ if all words of the form $uv$ where $u\in V(G_{w})$ and $v\in V(G_{w^{\prime}})$ are distinct in ascending order.

By the definition for coprime words, words $w$ and $w^{\prime}$ are coprime if for all $u,u^{\prime}\in V(G_{w})$ and all $v,v^{\prime}\in V(G_{w^{\prime}})$, $u\neq u^{\prime}$ and $v\neq v^{\prime}$ implies $uv\not\sim u^{\prime}v^{\prime}$.

Example 4.14 (coprime words).

The word $w=12234143$ has successors $\{1221,112332,123132,11,\epsilon\}$, and the word $w^{\prime}=5678978956$ has successors $\{5656,789789,\epsilon\}$. It can be checked that all concatenations are distinct, therefore $w$ and $w^{\prime}$ are coprime.
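
The check mentioned in Example 4.14 can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: words are tuples of integers forming DOWs, all function names are hypothetical, maximal factors are found greedily (relying on Lemma 4.2), and coprimality is tested by counting distinct concatenations $uv$ with $u\in V(G_{w})$ and $v\in V(G_{w^{\prime}})$.

```python
def ascending_form(word):
    """Relabel symbols as 1, 2, 3, ... in order of first appearance."""
    relabel = {}
    for symbol in word:
        relabel.setdefault(symbol, len(relabel) + 1)
    return tuple(relabel[symbol] for symbol in word)


def maximal_factors(word):
    """Maximal repeat/return factors of a DOW (greedy extension)."""
    n = len(word)
    positions = {}
    for pos, symbol in enumerate(word):
        positions.setdefault(symbol, []).append(pos)
    factors, covered = [], set()
    for i, symbol in enumerate(word):
        if symbol in covered:
            continue
        j = positions[symbol][1]
        rep = 1  # length of the factor when read as a repeat u u
        while i + rep < j and j + rep < n and word[i + rep] == word[j + rep]:
            rep += 1
        ret = 1  # length of the factor when read as a return u u^R
        while i + ret < j - ret and word[i + ret] == word[j - ret]:
            ret += 1
        u = word[i:i + max(rep, ret)]
        covered.update(u)
        factors.append(u)
    return factors


def vertices(word):
    """V(G_w): the word and all of its successors, in ascending order."""
    start = ascending_form(word)
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for u in maximal_factors(current):
            nxt = ascending_form(tuple(s for s in current if s not in u))
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def coprime(w1, w2):
    """True when all concatenations uv are distinct in ascending order."""
    v1, v2 = vertices(w1), vertices(w2)
    concatenations = set()
    for u in v1:
        offset = max(u, default=0)  # shift v onto symbols disjoint from u
        for v in v2:
            concatenations.add(u + tuple(s + offset for s in v))
    return len(concatenations) == len(v1) * len(v2)


w = (1, 2, 2, 3, 4, 1, 4, 3)
w_prime = (5, 6, 7, 8, 9, 7, 8, 9, 5, 6)
print(coprime(w, w_prime))  # True
print(coprime((1, 2, 1, 3, 2, 3), (4, 4)))  # False
```

The second call fails because $11$ is a successor of $121323$ and $44\sim 11$, so two different pairs give the same concatenation.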

Lemma 4.15.

Let $w^{\prime}$ be a successor of $w$. Then $G_{w^{\prime}}$ is an induced subgraph of $G_{w}$.

Lemma 4.16.

If $w$ is coprime to $w^{\prime}$, then $w^{\prime}$ is coprime to $w$.

Proof.

Suppose $w^{\prime}$ is not coprime to $w$ and let $u,u^{\prime}\in V(G_{w})$ (where $u\neq u^{\prime}$) and $v,v^{\prime}\in V(G_{w^{\prime}})$ (where $v\neq v^{\prime}$) be such that $uv=u^{\prime}v^{\prime}$. Note that since $\epsilon\in V(G_{w})\cap V(G_{w^{\prime}})$, if there exists a nonempty DOW $x\in V(G_{w})\cap V(G_{w^{\prime}})$ then $w$ cannot be coprime to $w^{\prime}$, as $\epsilon x=x\epsilon=x$. Without loss of generality, let $\lvert u^{\prime}\rvert>\lvert u\rvert$. Then $u^{\prime}=ux$ for some DOW $x$ (since $u\in\Sigma_{DOW}$) and $v=xv^{\prime}$, so that $M^{SOW}_{v}=M^{SOW}_{v^{\prime}}\cup M^{SOW}_{x}$. Moreover, for all $y\in M^{SOW}_{v^{\prime}}$, if $v_{1}=d_{y}(v)$ then $v_{1}=xv_{1}^{\prime}$ where $v_{1}^{\prime}=d_{y}(v^{\prime})$. Inductively, if $v^{\prime}$ reduces to $v_{1}^{\prime},v_{2}^{\prime},\ldots,\epsilon$ via the deletion of $y_{1},y_{2},\ldots,y_{k}$, then $v$ reduces to $x$ via the deletion of $y_{1},y_{2},\ldots,y_{k}$. That is, $x\in V(G_{w})$. Similarly, if $u^{\prime}=ux$ we have that $x\in V(G_{w^{\prime}})$, so that $w^{\prime}$ is not coprime to $w$. ∎

Given this symmetry, instead of saying that $w$ is coprime to $w^{\prime}$, we say that $w$ and $w^{\prime}$ are coprime.

In the following examples, $w=xyz$ and $u$ are coprime words and the word $w^{\prime}$ (resp. $w^{\prime\prime}$) resulting from the insertion of $uu$ (resp. $uu^{R}$) in $w$ is squarefree. In one instance, the substitution results in $G_{w^{\prime}}\cong G_{w^{\prime\prime}}$, while in another it does not. We conjecture that squarefree and coprime are necessary conditions for invariance of word graphs under substitution of repeat and return words.

Example 4.17 (substitution results in isomorphic word graphs).

Let $x=12$, $y=345$, $z=54312$ and $u=6789$. Then $w^{\prime}=xuyuz\sim 123456789345987612$ and $w^{\prime\prime}=xuyu^{R}z\sim 123456789543987612$. Here, $uu$ and $xyz$ are coprime, and both insertions result in squarefree words, with $G_{w^{\prime}}\cong G_{w^{\prime\prime}}$ as depicted in Figure 14.

Figure 14: The word graphs of $w^{\prime}=xuyuz$ (left) and $w^{\prime\prime}=xuyu^{R}z$ (right) are isomorphic, as they both correspond to cubes.
Example 4.18 (substitution does not result in isomorphic word graphs).

Let $x=1$, $y=12345$, $z=5432$ and $u=67$. Then $w^{\prime}=xuyuz\sim 12314567237654$ and $w^{\prime\prime}=xuyu^{R}z\sim 12314567327654$. Here we have that $uu$ is coprime to $xyz$ and the resulting insertions are squarefree but $G_{w^{\prime}}\not\cong G_{w^{\prime\prime}}$ as depicted in Figure 15.

Figure 15: The word graphs of $w^{\prime}=xuyuz$ (left) and $w^{\prime\prime}=xuyu^{R}z$ (right) are not isomorphic.

4.3.2 Doubling Effect

Concatenation of a repeat (resp. return) word $uu$ (resp. $uu^{R}$) at the end of an existing word $w$ creates two subgraphs isomorphic to $G_{w}$ within $G_{wuu}$ (resp. $G_{wuu^{R}}$). These two subgraphs may not be disjoint unless we impose additional conditions, as discussed in the examples below.

Lemma 4.19.

Let $w\in\Sigma_{DOW}$ and $uu\in(\Sigma\setminus\Sigma[w])_{DOW}$ be coprime. Then there exist two distinct (but possibly not disjoint) subgraphs $G_{1}$ and $G_{2}$ of $G_{wuu}$ isomorphic to $G_{w}$. Similarly, $G_{uuw}$ has two subgraphs isomorphic to $G_{w}$. Moreover, we have $G_{wuu}\cong G_{uuw}$.

Proof.

Without loss of generality we prove the result for $G_{wuu}$ only. We claim that for each $v\in V(G_{w})$, the word $vuu$ is ascending order equivalent to some $v^{\prime}\in V(G_{wuu})$. Indeed, if $v$ is obtained from $w$ through iterated deletions so that

$$v\sim d_{x_{1}}(d_{x_{2}}(d_{x_{3}}(\cdots d_{x_{n}}(w))\cdots))$$

where $x_{i}\not\sim u$ for all $1\leq i\leq n$, then $v^{\prime}\sim vuu$ is obtained from $wuu$ through

$$vuu\sim d_{x_{1}}(d_{x_{2}}(d_{x_{3}}(\cdots d_{x_{n}}(wuu))\cdots)).$$

Let $\iota:V(G_{w})\rightarrow V(G_{wuu})$ be the inclusion of vertices and let $G_{1}$ be the graph induced by the image of $G_{w}$ in $G_{wuu}$. Let $G_{2}$ be the induced subgraph of $G_{wuu}$ generated by the set of vertices of the form $v^{\prime}\sim vuu$ for $v\in V(G_{w})$. Define $f:V(G_{1})\rightarrow V(G_{2})$ where $v\mapsto v^{\prime}\sim vuu$. Due to the above correspondence, $f$ is a bijection on the sets of vertices. Note that if $[v_{1},v_{2}]\in G_{w}$ then $v_{2}\sim d_{x}(v_{1})$ for some $x\in M^{SOW}_{v_{1}}$. This holds if and only if $v_{2}uu\sim d_{x}(v_{1}uu)$ or $[v_{1}uu,v_{2}uu]\in G_{wuu}$ so that $f$ induces a bijection between edge sets that preserves vertex adjacency. It follows that $G_{1}\cong G_{2}$. ∎

Example 4.20.

The word graph corresponding to $12132344$ has two subgraphs isomorphic to the pentagon $G_{121323}$ in it as described in Lemma 4.19, which are not disjoint. This can be seen in Figure 16, where the two subgraphs are highlighted in different colors.

Figure 16: Word graph of $12132344$ with two highlighted subgraphs isomorphic to $G_{121323}$.

4.3.3 Product Effect

In this subsubsection we study word graphs of concatenated words.

Theorem 4.21.

Let $w,w^{\prime}\in\Sigma_{DOW}$ be coprime. Then we have $G_{ww^{\prime}}\cong G_{w}\square G_{w^{\prime}}$.

Proof.

We begin by noting that from the definition of concatenation, every vertex in $G_{ww^{\prime}}$ can be decomposed into $u_{i}v_{j}$ where $u_{i}\sim u\in V(G_{w})$ and $v_{j}\sim v\in V(G_{w^{\prime}})$. We define a map $f:V(G_{w}\square G_{w^{\prime}})\rightarrow V(G_{ww^{\prime}})$ where $(u,v)\mapsto u_{i}v_{j}$. By hypothesis, since $w$ and $w^{\prime}$ are coprime, $u\neq u^{\prime}$ and $v\neq v^{\prime}$ implies $uv\not\sim u^{\prime}v^{\prime}$ for all $u,u^{\prime}\in V(G_{w})$ and all $v,v^{\prime}\in V(G_{w^{\prime}})$. This makes $f$ one-to-one. Surjectivity follows from the fact that no successors of $ww^{\prime}$ will arise other than by concatenation of successors of $w$ and successors of $w^{\prime}$. It follows that $f$ is a bijection on the sets of vertices.

We abuse the notation and write the vertices of $G_{ww^{\prime}}$ as $uv$ where $\Sigma[u]\cap\Sigma[v]=\varnothing$, with $u\sim u^{\prime}\in V(G_{w})$ and $v\sim v^{\prime}\in V(G_{w^{\prime}})$ following the decomposition described above. Let $[(u_{i},v_{j}),(u_{i}^{\prime},v_{j}^{\prime})]\in E(G_{w}\square G_{w^{\prime}})$. Then either $u_{i}=u_{i}^{\prime}$ and $v_{j}^{\prime}\sim d_{x}(v_{j})$ for some $x\in M^{SOW}_{v_{j}}$ or $v_{j}=v_{j}^{\prime}$ and $u_{i}^{\prime}\sim d_{y}(u_{i})$ for some $y\in M^{SOW}_{u_{i}}$. Note that since $w$ and $w^{\prime}$ share no symbols we may write $M^{DOW}_{ww^{\prime}}=M^{DOW}_{w}\sqcup M^{DOW}_{w^{\prime}}$. We observe that for $u_{i}v_{j},u_{i}^{\prime}v_{j}^{\prime}\in V(G_{ww^{\prime}})$ there exists an edge $[u_{i}v_{j},u_{i}^{\prime}v_{j}^{\prime}]\in E(G_{ww^{\prime}})$ if and only if $u_{i}=u_{i}^{\prime}$ and $[v_{j},v_{j}^{\prime}]\in E(G_{w^{\prime}})$ or $[u_{i},u_{i}^{\prime}]\in E(G_{w})$ and $v_{j}=v_{j}^{\prime}$. This implies that $[(u_{i},v_{j}),(u_{i}^{\prime},v_{j}^{\prime})]\in E(G_{w}\square G_{w^{\prime}})$ if and only if $[f((u_{i},v_{j})),f((u_{i}^{\prime},v_{j}^{\prime}))]\in E(G_{ww^{\prime}})$, which indicates that $G_{ww^{\prime}}\cong G_{w}\square G_{w^{\prime}}$ as desired. ∎

In the case where $w^{\prime}$ is a repeat or return word, with word graph having two vertices connected by an edge, we have the following remark.

Remark 4.22.

Note that if $w\in\Sigma_{DOW}$ and $u\in\Sigma_{SOW}$ are such that $uu\not\sim v$ for all $v\in V(G_{w})$, then $G_{wuu}\cong G_{w}\square\Delta^{1}$. Similarly, if $uu^{R}\not\sim v$ for all $v\in V(G_{w})$ then $G_{wuu^{R}}\cong G_{w}\square\Delta^{1}$.
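
The Cartesian product $\square$ appearing in Theorem 4.21 and Remark 4.22 can be made concrete with a small Python sketch for directed graphs given as vertex and edge collections. The function name `cartesian_product` and the string vertex labels are illustrative assumptions; the pentagon below is the word graph of $121323$ as obtained by applying Definition 4.3 by hand, and the second factor is a single directed edge standing in for $\Delta^{1}$.

```python
from itertools import product


def cartesian_product(vertices_g, edges_g, vertices_h, edges_h):
    """Cartesian product of two digraphs given as vertex lists and edge lists."""
    vertices = set(product(vertices_g, vertices_h))
    edges = set()
    # Move along an edge of G while the H-coordinate stays fixed.
    for a, b in edges_g:
        for v in vertices_h:
            edges.add(((a, v), (b, v)))
    # Move along an edge of H while the G-coordinate stays fixed.
    for c, d in edges_h:
        for u in vertices_g:
            edges.add(((u, c), (u, d)))
    return vertices, edges


# Pentagon: two paths from the source to the target, of lengths 2 and 3.
pentagon_vertices = ["121323", "1212", "1122", "11", ""]
pentagon_edges = [
    ("121323", "1212"), ("1212", ""),
    ("121323", "1122"), ("1122", "11"), ("11", ""),
]
# A single directed edge; the labels "uu" and "" are placeholders.
edge_vertices = ["uu", ""]
edge_edges = [("uu", "")]

V, E = cartesian_product(pentagon_vertices, pentagon_edges, edge_vertices, edge_edges)
print(len(V), len(E))  # 10 15
```

The product has $|V(G)|\,|V(H)|$ vertices and $|E(G)|\,|V(H)|+|V(G)|\,|E(H)|$ edges, which gives a quick consistency check when comparing it with a computed word graph.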

5 Betti Numbers and Generators for Word Graphs

Having shown the result in general for arbitrary graphs, we now focus on consistently directed graphs corresponding to word graphs of DOWs to study the realizability of Betti numbers. Note that for DOWs $w$ and $w^{\prime}$, if $w\in M^{DOW}_{w^{\prime}}$ then $G_{w}\leq G_{w^{\prime}}$. To find generators for the first and second homology groups among word graphs of DOWs we therefore turn to minimal size words whose corresponding graphs attain particular $\beta_{1}$ and $\beta_{2}$ values.

A Python script (available at https://github.com/fajardogomez/dowgraphs) was used to create the prodsimplicial complex associated with any directed graph GG and compute Betti numbers.

5.1 Betti Numbers of Word Graphs

Let us consider the minimal size nontrivial 1-cycles among word graphs of DOWs. We claim that for each of the following DOWs $w$, the word graph $G_{w}$ satisfies $H_{1}(\Gamma(G_{w}))\cong\mathbb{Z}$ and all higher homology groups are trivial:

1. $121323$, 2. $122331$.

The corresponding word graphs are shown in Figure 17.

Figure 17: The word graphs for DOWs $121323$ and $122331$ are isomorphic and contain nontrivial 1-cycles in their prodsimplicial complexes.

The result follows from Lemma 3.3, as the graphs consist of parallel paths from source to target of lengths 2 and 3, five edges in total, forming a pentagon. The DOWs on the least number of symbols that result in a generator for the first homology group are $121323$ and $122331$.

Note that a $1$-cycle that is a square cannot be a generator of the first homology group because every square in a word graph consists of two paths of length 2, and therefore bounds a 2-cell. In all examples we have examined, $n$-gons representing nontrivial 1-cycles for $n\geq 6$ can be reduced to pentagons, so we conjecture that all nontrivial 1-cycles are homologous to pentagons.

Example 5.1.

Let $u_{1}$, $u_{2}$ and $\alpha$ be SOWs with $\Sigma[u_{1}]\cap\Sigma[u_{2}]\cap\Sigma[\alpha]=\varnothing$. The following word patterns result in pentagons and generalize the word graphs of $121323$ and $122331$: $\alpha u_{1}u_{1}^{R}u_{2}u_{2}^{R}\alpha^{R}$ and $u_{1}\alpha u_{1}u_{2}\alpha u_{2}$ provided that $\lvert u_{1}\rvert=\lvert u_{2}\rvert$.

On the other hand, all types of polyhedra that represent generators of nontrivial second homology discussed in Section 3.2 can be realized in word graphs.

The complex corresponding to the word graph rooted at $12323414$ is homeomorphic to that of the graph in Lemma 3.5. This can be verified by referring to Figure 18. Other word graphs have subgraphs isomorphic to those described in Lemma 3.10 and Lemma 3.4. For example, the word graph rooted at $12323144$ contains both.

Figure 18: Word graph rooted at $12323414$.
Figure 19: (a) Word graph rooted at $12323144$ with a subgraph isomorphic to that in Figure 7 highlighted. (b) Word graph rooted at $12323144$ with a subgraph isomorphic to that in Figure 8(a) highlighted.

5.2 Betti Numbers of Successors

In general, if $u$ is a successor of $w$, it does not follow that $\beta_{n}(G_{w})\geq\beta_{n}(G_{u})$. Inspired by the work in [3], we define the $n$-symbol tangled cord as the word

$$t_{n}:=1213243\cdots(n-1)(n-2)n(n-1)n.$$

Note that $d_{n}(t_{n})=t_{n-1}$ so that $t_{n-1}\in V(G_{t_{n}})$ for any $n>1$. For example, $t_{6}=121324354656$ has (the word corresponding to) the tangled cord of size five $t_{5}=1213243545$ as a successor. Based on results from the custom Python script, we have $\beta_{1}(G_{t_{6}})=1$ and $\beta_{1}(G_{t_{5}})=2$. This is the smallest pair of consecutive tangled cords where this inversion in Betti numbers is present. The next inversion occurs for $\beta_{2}$ when $n=11$, as shown in Table 1.

| $n$ | $t_{n}$ | $\beta_{1}(G_{t_{n}})$ | $\beta_{2}(G_{t_{n}})$ | $\lvert V(G_{t_{n}})\rvert$ |
|---|---|---|---|---|
| 2 | 1,2,1,2 | 0 | 0 | 2 |
| 3 | 1,2,1,3,2,3 | 1 | 0 | 5 |
| 4 | 1,2,1,3,2,4,3,4 | 1 | 2 | 8 |
| 5 | 1,2,1,3,2,4,3,5,4,5 | 2 | 6 | 13 |
| 6 | 1,2,1,3,2,4,3,5,4,6,5,6 | 1 | 27 | 21 |
| 7 | 1,2,1,3,2,4,3,5,4,6,5,7,6,7 | 1 | 54 | 34 |
| 8 | 1,2,1,3,2,4,3,5,4,6,5,7,6,8,7,8 | 1 | 86 | 55 |
| 9 | 1,2,1,3,2,4,3,5,4,6,5,7,6,8,7,9,8,9 | 1 | 111 | 89 |
| 10 | 1,2,1,3,2,4,3,5,4,6,5,7,6,8,7,9,8,10,9,10 | 1 | 126 | 144 |
| 11 | 1,2,1,3,2,4,3,5,4,6,5,7,6,8,7,9,8,10,9,11,10,11 | 1 | 116 | 233 |
| 12 | 1,2,1,3,2,4,3,5,4,6,5,7,6,8,7,9,8,10,9,11,10,12,11,12 | 1 | 112 | 377 |

Table 1: Tangled cords and invariants of their word graphs.
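
The tangled cord words in the second column of Table 1 follow a simple pattern that can be generated programmatically. The following Python sketch assumes words are tuples of integers; `tangled_cord` is a hypothetical helper and not part of the authors' script. It also checks the relation $d_{n}(t_{n})=t_{n-1}$ by deleting the symbol $n$.

```python
def tangled_cord(n):
    """Return t_n = 1 2 1 3 2 4 3 ... n (n-1) n as a tuple of integers."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    word = [1]
    for i in range(2, n + 1):
        # Each new symbol i is followed by the second copy of i - 1.
        word.extend((i, i - 1))
    word.append(n)  # second copy of the last symbol
    return tuple(word)


t6 = tangled_cord(6)
print(t6)  # (1, 2, 1, 3, 2, 4, 3, 5, 4, 6, 5, 6)

# Deleting the symbol n from t_n leaves t_{n-1}.
print(tuple(s for s in t6 if s != 6) == tangled_cord(5))  # True
```

Using tuples of integers rather than strings keeps symbols such as 10, 11 and 12 unambiguous.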

It is of interest to find conditions for a successor $w^{\prime}$ of $w$ such that $\beta_{n}(G_{w})\geq\beta_{n}(G_{w^{\prime}})$ does not hold for some $n\geq 1$.

6 Concluding Remarks

In this paper, a method to study topological properties of digraphs is proposed, by constructing a prodsimplicial complex for a given digraph. The construction provides an approach for TDA to be used on biological data that can be described with graphs. A specific example, called the word graph, is presented that models assembly pathways in a ciliate species via DOWs [19]. The proposed construction of prodsimplicial complexes is applied to word graphs, and the Betti numbers are determined. The effect on word graphs of various operations, such as concatenation of words, is studied, and types of generators of homology are examined.

A number of problems remain unsolved. Lemma 3.3 characterizes generators of the first homology groups for digraphs and Corollary 3.9 addresses the realizability of combinations of $\beta_{1}$ and $\beta_{2}$. However, we do not know all possible combinations of values of Betti numbers $\beta_{1}$ and $\beta_{2}$ over all word graphs. Though only a few specific examples of DOWs where the corresponding word graph has nontrivial 1-cycles and 2-cycles are included, more can be obtained through the DOW operations described in Section 4.3.1. For example, the concatenation of $121323$ with $4545$ corresponds to the Cartesian product of the word graphs (as shown in Remark 4.22) and therefore has no effect on Betti numbers so that $\beta_{1}(G_{1213234545})=\beta_{1}(G_{121323})=1$. However, these methods would not create an exhaustive list of all graphs attaining particular given Betti numbers. It is desirable to characterize the effect of other word operations, such as insertions that disrupt maximal repeat or return words and thus may alter the homology groups of the complex on their word graphs.

We expect that the list of types of 1- and 2-cycles that represent nontrivial classes, such as the ones in Lemma 3.3, Lemma 3.4, and Lemma 3.5, may not be exhaustive. We found pentagons as in Figure 17 as nontrivial 1-cycles in word graphs. We do not know whether there are other polygons that represent nontrivial cycles. The SageMath [10] topology package has an option to find generators for the homology groups. However, the linear combinations in the output are not necessarily optimized to produce minimal generators. For instance, a generator for the first homology group in the word graph of $t_{10}$ is listed as a linear combination of nine edges. However, when these are drawn as an induced subgraph it becomes apparent that there exist edges between the vertices in the cycle so that it is homologous to a class represented by a pentagon.

It is also of interest to classify possible polyhedra that represent nontrivial second homology classes. We gave only partial answers in Section 3. Polyhedra such as “pillows” formed by two squares sharing all four boundary edges, are not allowed in the construction of the prodsimplicial complex. All word graphs computed here with nontrivial 1-cycles or 2-cycles in their corresponding prodsimplicial complex have subgraphs isomorphic to those in Lemmas 3.1, 3.2, 3.3, 3.4, 3.10 and 3.5. The problem of finding an exhaustive list of generators using a minimal number of vertices and edges remains open.

As a note on homology groups in general, SageMath outputs full groups instead of just the Betti numbers presented here. In an analysis of all DOWs of up to 7 symbols, no DOW had a word graph whose corresponding prodsimplicial complex had torsion [12]. As the number of DOWs of each size increases superexponentially, homology groups of longer DOWs were computed only for samples of randomly generated DOWs and for special word categories. As an interesting find, the homology groups of the tangled cord on ten symbols, $t_{10}$, had 2-torsion in the first (or second) homology group. Due to the large number of vertices and edges in the generator, however, it was not feasible to identify the type of generator of the 2-torsion. It remains an open problem to find and characterize DOWs whose corresponding prodsimplicial complexes have torsion in their homology groups.
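
Betti numbers over the rationals do not see torsion, so one inexpensive check is to compare them with Betti numbers over $\mathbb{Z}/p$. By the universal coefficient theorem, a jump in degree $k$ indicates $p$-torsion in integral homology in degree $k$ or $k-1$. The Python sketch below illustrates this on the standard cell structure of the projective plane (one cell in each dimension, with the 2-cell attached by a map of degree 2), which is a textbook example and not a word graph. The function names are hypothetical and this is not the computation used for $t_{10}$.

```python
from fractions import Fraction


def rank(matrix, p=None):
    """Rank over Q (p=None) or over the field Z/p for a prime p."""
    if p is None:
        rows = [[Fraction(x) for x in row] for row in matrix]
    else:
        rows = [[x % p for x in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Inverse of the pivot entry in the chosen field.
        inv = 1 / rows[r][c] if p is None else pow(rows[r][c], -1, p)
        for i in range(r + 1, len(rows)):
            if rows[i][c] != 0:
                f = rows[i][c] * inv
                new = [a - f * b for a, b in zip(rows[i], rows[r])]
                rows[i] = new if p is None else [x % p for x in new]
        r += 1
    return r


def betti_numbers(cell_counts, boundaries, p=None):
    """beta_k = n_k - rank d_k - rank d_{k+1}, with d_0 = 0.

    cell_counts[k] is the number of k-cells; boundaries[k-1] is the
    matrix of d_k, with rows indexed by (k-1)-cells.
    """
    ranks = [0] + [rank(d, p) for d in boundaries] + [0]
    return [n - ranks[k] - ranks[k + 1] for k, n in enumerate(cell_counts)]


# Projective plane: one vertex, one edge a, one 2-cell with boundary 2a.
d1 = [[0]]
d2 = [[2]]
over_q = betti_numbers([1, 1, 1], [d1, d2])         # [1, 0, 0]
over_f2 = betti_numbers([1, 1, 1], [d1, d2], p=2)   # [1, 1, 1]
```

Here the ranks over $\mathbb{Z}/2$ exceed the rational ones in degrees 1 and 2, which reflects the 2-torsion in the first integral homology group of this example.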

The construction of prodsimplicial complexes for directed graphs presented in this paper is a first attempt to apply TDA to data sets consisting of graphs, designed specifically for the biological outputs called word graphs. Other graph outputs have been obtained in biology, and the development of TDA for such outputs in more general situations is desirable. More generally, we propose constructions of custom-built cell complexes designed for the purpose of studying individual biological problems. Polyhedral cells are to be specified to build a complex in such a way that they are closed under boundary operators and are appropriate for a given biological situation.

Word graphs of DOWs correspond to reductions of chord diagrams, and we expect applications to areas related to chord diagrams. The reductions of repeat and return words can be generalized to other rewriting systems. It is desirable to apply similar constructions of complexes, and the use of their homology, to confluent rewriting systems.

Acknowledgments

This research was (partially) supported by the grants NSF DMS-2054321 and CCF-2107267, the Simons Fellow grant from the Simons Foundation, and the W.M. Keck Foundation. In addition, this research was conducted under the auspices of the Southeast Center for Mathematics and Biology, an NSF-Simons Research Center for Mathematics of Complex Biological Systems, under National Science Foundation Grant No. DMS-1764406 and Simons Foundation Grant No. 594594.

References

  • [1] Ryan Arredondo “Reductions on Double Occurrence Words” In arXiv.org Ithaca: Cornell University Library, arXiv.org, 2013
  • [2] Ryan C. Arredondo “Properties of graphs used to model DNA recombination” ProQuest Dissertations Publishing, 2014
  • [3] Jonathan Burns et al. “Four-regular graphs with rigid vertices associated to DNA recombination” In Discrete Applied Mathematics 161.10, 2013, pp. 1378–1394
  • [4] Jonathan Burns et al. “Recurring patterns among scrambled genes in the encrypted genome of the ciliate Oxytricha trifallax” In Journal of Theoretical Biology 410, 2016, pp. 171–180
  • [5] J. Carter, Victoria Lebed and Seung Yeop Yang “A prismatic classifying space” In Nonassociative mathematics and its applications 721, Contemp. Math. Amer. Math. Soc., Providence, RI, 2019, pp. 43–68
  • [6] J. Carter, Atsushi Ishii, Masahico Saito and Kokoro Tanaka “Homology for quandles with partial group operations” In Pacific Journal of Mathematics 287.1, 2017, pp. 19–48
  • [7] Andre R. Cavalcanti and Laura F. Landweber “Insights into a Biological Computer: Detangling Scrambled Genes in Ciliates” In Nanotechnology: Science and Computation, Natural Computing Series Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 349–359
  • [8] “Classes of Directed Graphs”, Springer Monographs in Mathematics Cham: Springer International Publishing, 2018
  • [9] Daniel A. Cruz et al. “Insertions Yielding Equivalent Double Occurrence Words” In Fundamenta Informaticae 171, 2020, pp. 113–132
  • [10] The Sage Developers et al. “SageMath, version 9.0”, 2020 URL: http://www.sagemath.org
  • [11] Reinhard Diestel “Graph Theory (Graduate Texts in Mathematics)” New York: Springer, 2005
  • [12] Lina Fajardo Gómez “Methods in Discrete Mathematics to Study DNA Rearrangement Processes”, 2022
  • [13] Joan Feigenbaum “Directed cartesian-product graphs have unique factorizations that can be computed in polynomial time” In Discrete Applied Mathematics 15.1, 1986, pp. 105–110
  • [14] Alexander Grigor’yan, Yong Lin, Yuri Muranov and Shing-Tung Yau “Homologies of path complexes and digraphs”, 2013 arXiv:1207.2834 [math.CO]
  • [15] Alexander A. Grigor’yan, Yu V. Muranov, Yong Lin and Shing-Tung Yau “Path Complexes and their Homologies” In Journal of Mathematical Science 248, 2020, pp. 564–599
  • [16] Mustafa Hajij, Nataša Jonoska, Denys Kukushkin and Masahico Saito “Graph based analysis for gene segment organization in a scrambled genome” In Journal of Theoretical Biology 494, 2020, pp. 110215
  • [17] Allen Hatcher “Algebraic Topology” New York: Cambridge University Press, 2002
  • [18] Wilfried Imrich and Sandi Klavžar “Product Graphs: Structure and Recognition” New York: Wiley, 2000
  • [19] Nataša Jonoska, Lukas Nabergall and Masahico Saito “Patterns and Distances in Words Related to DNA Rearrangement” In Fundamenta Informaticae 154, 2017, pp. 225–238
  • [20] Dimitry Kozlov “Combinatorial Algebraic Topology” New York: Springer Science & Business Media, 2007
  • [21] Paolo Masulli and Alessandro E. Villa “The topology of the directed clique complex as a network invariant” In SpringerPlus 5.1 Cham: Springer International Publishing, 2016, pp. 388
  • [22] David M. Prescott and Arthur F. Greslin “Scrambled actin I gene in the micronucleus of Oxytricha nova” In Developmental genetics 13.1 Hoboken: Wiley Subscription Services, Inc., A Wiley Company, 1992, pp. 66–74
  • [23] Michael W Reimann et al. “Cliques of Neurons Bound into Cavities Provide a Missing Link between Structure and Function” In Frontiers in computational neuroscience 11 Switzerland: Frontiers Research Foundation, 2017, pp. 48
  • [24] Vadim Georgievich Vizing “The cartesian product of graphs.” In Vyčisl. Sistemy 9, 1963, pp. 30–43
  • [25] V Talya Yerlici and Laura F Landweber “Programmed Genome Rearrangements in the Ciliate Oxytricha” In Microbiology spectrum 2.6, 2014