Betti Numbers of Prodsimplicial Complexes for Directed Graphs with Applications to Word Reductions

Lina Fajardo Gómez, Margherita Maria Ferrari, Nataša Jonoska, Masahico Saito

Abstract

We propose custom made cell complexes, in particular prodsimplicial complexes, in order to analyze data consisting of directed graphs. These are constructed by attaching cells that are products of simplices and are suited to study data of acyclic directed graphs, called here consistently directed graphs. We investigate possible values of the first and second Betti numbers and the types of cycles that generate nontrivial homology. We apply these tools to directed graphs associated with reductions of double occurrence words, words that are associated with DNA recombination processes in certain species of ciliates. We study the effects of word operations on the homology for these graphs.

1 Introduction

Topological Data Analysis (TDA) has been extensively used in recent years in biology and other sciences. It tries to capture the underlying structure of a given data set through properties such as connectedness, circular loops and higher dimensional holes. Topological invariants can be used to detect such “topological signatures” of the space. Typically, a given data set to analyze is a set of points in a Euclidean space, which is a priori discrete. To capture topological shapes of such data sets, simplicial complexes are formed based on the proximity of data points.

In some biological phenomena, such as DNA recombination processes, data sets are represented by graphs, which consist of vertices and edges. For example, in [16], gene interaction patterns are represented by graphs where vertices represent genes and edges represent how two genes interact, such as intersecting each other. In this paper, we present a novel graph-based model for genome rearrangement in some ciliate species. Gene assembly pathways can be represented by subword pattern deletions in double occurrence words (DOWs), words where each symbol appears exactly twice [19]. The iterated subword deletions modeling the recombination patterns can be represented by a graph, called a word graph, whose vertices are DOWs connected by a directed edge if one word can be obtained from the other through a pattern deletion. Thus, methods for performing TDA on such directed graph data are of interest.

Another shortcoming of common TDA is that its construction of complexes is often based solely on simplicial ones. In our model of word graphs for gene assembly, it seems natural to fill in squares as well as triangles to study topological aspects of the biological process. Given such a specific biological situation, we are inspired to use more general, custom built cell complexes to study topological properties of data sets. In this paper, for the purpose of specifically applying it to our word graph model, we propose the use of a prodsimplicial complex for topological studies of digraphs.

Thus, the focus of this paper is twofold: (1) defining specific cell complexes, called prodsimplicial complexes, constructed from directed graphs by attaching products of simplices for topological analysis of graph outputs, and (2) applying prodsimplicial complex homology to study the complexity of gene assembly processes through word graphs that model DNA rearrangement in certain species of ciliates.

The paper is organized as follows. In Section 2 we define prodsimplicial cell complexes for directed graphs. In Section 3 we describe some nontrivial generators of homology groups and construct digraphs with arbitrarily large Betti numbers. Section 4 introduces DOWs, their reduction pathways and word graphs, while Section 5 studies the effect of word operations on word graphs and their associated topological invariants.

2 Prodsimplicial Homology for Directed Graphs

Topological data analysis is used to determine the underlying shape of the space containing a data set by studying the space’s invariants. For example, the $n$-dimensional Betti number, denoted by $\beta_{n}$, is a topological invariant corresponding to the number of “holes” of dimension $n$. Instead of using simplices as cells, we define a prodsimplicial complex for our goal of studying DNA recombination. In this section, we recall basic concepts from graph theory and introduce the notation that is used throughout the rest of this discussion.

2.1 Directed Graphs with Source and Target

A directed graph (digraph for short) $G$ is a 4-tuple $(V,E,\iota,\tau)$ where $V$ is a finite set, $E\subseteq V\times V\setminus\{(x,x)\ |\ x\in V\}$ and $\iota,\tau$ are maps from $E$ to $V$. Elements of $V$ and $E$ are called vertices and edges, respectively, and we use $V(G)$ and $E(G)$ to denote the sets of vertices and edges of a given digraph $G$. The map $\iota$ indicates the source (also called initial) vertex of each edge, and similarly $\tau$ indicates the target (also called terminal) vertex. Directly from the definition, it follows that the graphs considered have no loops or parallel edges. An edge $e$ with source $u$ and target $v$ is denoted by $e=[u,v]$. (To distinguish between ordered pairs in the Cartesian product of two sets and directed edges, we use $[u,v]$ to denote the ordered pair $(u,v)$ when it corresponds to an edge, so that $[(u_{1},v_{1}),(u_{2},v_{2})]$ denotes the directed edge $(u_{1},v_{1})\rightarrow(u_{2},v_{2})$ between vertices $(u_{1},v_{1})$ and $(u_{2},v_{2})$.) A source of $G$ is a vertex $s$ such that $\tau(e)\neq s$ for every edge $e\in E$. Similarly, a target of $G$ is a vertex $t$ such that $\iota(e)\neq t$ for every edge $e\in E$.

For ease of notation, we refer to a digraph as $G=(V,E)$ , omitting the maps $\iota$ and $\tau$ whenever clear from the context. We refer the reader to [11] for other elementary definitions in graph theory not explicitly recalled here.

Definition 2.1 (weakly directed, consistently directed).

A connected digraph $G$ is said to be weakly directed if it has a unique source $s$ and a unique target $t$. Moreover, if $G$ is weakly directed and it has no (directed) cycles, then $G$ is called consistently directed. In a weakly directed digraph $G$ we use $s(G)$ and $t(G)$ to denote the unique source and target vertices, respectively.
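Definition 2.1 can be checked mechanically. The following is a minimal sketch, assuming a digraph is given as a list of vertices and a list of `(source, target)` pairs; the function names and the small example graph are illustrative and are not taken from the paper.

```python
from collections import defaultdict, deque

def sources_and_targets(vertices, edges):
    """Vertices with no incoming edge (sources) and no outgoing edge (targets)."""
    has_incoming = {v for (_, v) in edges}
    has_outgoing = {u for (u, _) in edges}
    sources = [v for v in vertices if v not in has_incoming]
    targets = [v for v in vertices if v not in has_outgoing]
    return sources, targets

def is_connected(vertices, edges):
    """Connectivity of the underlying undirected graph (breadth-first search)."""
    vertices = list(vertices)
    if not vertices:
        return False
    adjacent = defaultdict(set)
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)
    seen = {vertices[0]}
    queue = deque([vertices[0]])
    while queue:
        x = queue.popleft()
        for y in adjacent[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return len(seen) == len(vertices)

def is_acyclic(vertices, edges):
    """Kahn's algorithm: True when the digraph has no directed cycle."""
    indegree = {v: 0 for v in vertices}
    successors = defaultdict(list)
    for u, v in edges:
        indegree[v] += 1
        successors[u].append(v)
    queue = deque(v for v in vertices if indegree[v] == 0)
    removed = 0
    while queue:
        x = queue.popleft()
        removed += 1
        for y in successors[x]:
            indegree[y] -= 1
            if indegree[y] == 0:
                queue.append(y)
    return removed == len(indegree)

def is_weakly_directed(vertices, edges):
    sources, targets = sources_and_targets(vertices, edges)
    return is_connected(vertices, edges) and len(sources) == 1 and len(targets) == 1

def is_consistently_directed(vertices, edges):
    return is_weakly_directed(vertices, edges) and is_acyclic(vertices, edges)

# Hypothetical example: unique source "s", unique target "t", cycle a -> b -> c -> a.
cyclic_vertices = ["s", "a", "b", "c", "t"]
cyclic_edges = [("s", "a"), ("a", "b"), ("b", "c"), ("c", "a"), ("b", "t")]

# A square: two paths of length 2 from vertex 0 to vertex 3.
square_vertices = [0, 1, 2, 3]
square_edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
```

The first example is weakly directed but not consistently directed, while the square satisfies both conditions.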

Example 2.2.

Figure 1 illustrates a digraph that is weakly directed, since it has a single source (shown in green) and a single target (in red), but not consistently directed, since it contains a directed cycle.

Figure 1: A weakly directed digraph that is not consistently directed. The source is colored green, and the target red.

We use the definition of Cartesian product as in [24].

Definition 2.3 (Cartesian product, prime graph).

Let $G_{1}=(V_{1},E_{1})$ and $G_{2}=(V_{2},E_{2})$ be digraphs. The Cartesian product of $G_{1}$ and $G_{2}$ is the digraph $G=G_{1}\square G_{2}=(V,E)$ where $V=V_{1}\times V_{2}$ and $[(u_{1},v_{1}),(u_{2},v_{2})]\in E$ if either $u_{1}=u_{2}$ and $[v_{1},v_{2}]\in E_{2}$ or $v_{1}=v_{2}$ and $[u_{1},u_{2}]\in E_{1}$ .

A digraph $G$ is said to be prime with respect to the Cartesian product if $G=G_{1}\square G_{2}$ implies that either $G_{1}$ or $G_{2}$ is the trivial graph on one vertex, denoted $\Delta^{0}$ . It is shown in [13] that the factorization of a connected directed graph into prime factors is unique with respect to the Cartesian product.

Note that $G_{1}\square G_{2}\cong G_{2}\square G_{1}$, since $[(u_{1},v_{1}),(u_{2},v_{2})]\in E(G_{1}\square G_{2})$ if and only if $[(v_{1},u_{1}),(v_{2},u_{2})]\in E(G_{2}\square G_{1})$, so the Cartesian product is commutative up to isomorphism. Moreover, as proved in [18], it is also associative. As a consequence, when dealing with the Cartesian product of more than two graphs we write $G=G_{1}\square G_{2}\square\cdots\square G_{k}$ in place of $G=((\ldots((G_{1}\square G_{2})\square G_{3})\ldots)\square G_{k})$, omitting the nested parentheses. Similarly, we write the vertices of $G$ as $k$-tuples $(v_{1},\ldots,v_{k})$ in place of $((v_{1},\ldots,v_{k-1}),v_{k})$, so that $V(G)=\{(v_{1},\ldots,v_{k})\ |\ v_{i}\in V(G_{i})\text{ for }1\leq i\leq k\}$.

The Cartesian product $G_{1}\square G_{2}$ creates one copy (referred to as a $G_{1}$ -layer in [18]) of $G_{1}$ for every vertex in $G_{2}$ , and vice versa. It can be shown that the Cartesian product $G_{1}\square G_{2}$ is consistently directed if and only if both factors are consistently directed [12]. By associativity, this result extends to Cartesian products of several factors.
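As an illustration of Definition 2.3 and of the layer description above, here is a minimal sketch of the Cartesian product, assuming each digraph is stored as a `(vertices, edges)` pair. The function name and the two single-edge example graphs are assumptions made for this sketch.

```python
from itertools import product

def cartesian_product(g1, g2):
    """Cartesian product of two digraphs given as (vertices, edges) pairs."""
    v1, e1 = g1
    v2, e2 = g2
    vertices = list(product(v1, v2))
    edges = []
    # One copy (layer) of g2 for every vertex of g1: first coordinate fixed.
    for u in v1:
        for (a, b) in e2:
            edges.append(((u, a), (u, b)))
    # One copy (layer) of g1 for every vertex of g2: second coordinate fixed.
    for v in v2:
        for (a, b) in e1:
            edges.append(((a, v), (b, v)))
    return vertices, edges

# Two single-edge digraphs; their product is a directed square.
edge_ab = (["a", "b"], [("a", "b")])
edge_xy = (["x", "y"], [("x", "y")])
square = cartesian_product(edge_ab, edge_xy)
```

The counts follow directly from the definition: the product has $|V_{1}|\cdot|V_{2}|$ vertices and $|V_{1}|\cdot|E_{2}|+|V_{2}|\cdot|E_{1}|$ edges, so the square above has four vertices and four edges.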

2.2 Prodsimplicial Complexes for Directed Graphs

Vertices and edges in a graph can be considered as 0-dimensional and 1-dimensional cells, respectively. A graph can therefore be endowed with a richer structure by attaching higher dimensional cells at instances of particular subgraphs.

In this section we present the main definitions of prodsimplicial complexes introduced in [20] for directed graphs. Other complexes, such as the $p$-path complexes, have also been considered in [15, 14]. Our motivation comes from a biological process where two paths of length 2 connecting two vertices correspond to two independent pathways of DNA recombination. For such path pairs, we attach solid square faces so that the two pathways are regarded as equivalent.

Definition 2.4 (simplicial digraph).

The $n$ -dimensional simplicial digraph, denoted by $\Delta^{n}$ , is the digraph with vertices $V(\Delta^{n})=\{v_{0},v_{1},\ldots,v_{n}\}$ and edges $E(\Delta^{n})=\{[v_{i},v_{j}]\ |\ 0\leq i<j\leq n\}$ .

Note that the source and target of the edges of $\Delta^{n}$ are induced by the total order on the set of vertices $V(\Delta^{n})=\{v_{0},v_{1},\ldots,v_{n}\}$ . It follows that $\Delta^{n}$ is consistently directed, with source $v_{0}$ and target $v_{n}$ . Moreover, the total order on the vertices guarantees that all subgraphs of $\Delta^{n}$ induced by a subset of $V(\Delta^{n})$ are also simplicial digraphs. Also, any two simplicial digraphs on the same number of vertices are isomorphic as directed graphs. In general, $\Delta^{n}$ can be obtained from $\Delta^{n-1}$ by adding the new vertex $v_{n}$ along with the edges $[v_{i},v_{n}]$ for $i=0,\ldots,n-1$ .
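The remarks above can be tested on small cases. The sketch below assumes the vertices of $\Delta^{n}$ are labeled by the integers $0,\ldots,n$; the helper names are illustrative. It builds $\Delta^{n}$ and checks whether a given digraph is a simplicial digraph by recovering the total order from out-degrees.

```python
from itertools import combinations

def simplicial_digraph(n):
    """Delta^n with vertices 0..n and an edge (i, j) whenever i < j."""
    vertices = list(range(n + 1))
    edges = list(combinations(vertices, 2))
    return vertices, edges

def induced_subgraph(graph, subset):
    """Subgraph induced by a subset of the vertices."""
    _, edges = graph
    keep = set(subset)
    return sorted(keep), [(u, v) for (u, v) in edges if u in keep and v in keep]

def is_simplicial(graph):
    """True if the edges are exactly those induced by some total order."""
    vertices, edges = graph
    edge_set = set(edges)
    outdegree = {v: 0 for v in vertices}
    for u, _ in edge_set:
        outdegree[u] += 1
    # In Delta^n the vertex v_i has out-degree n - i, so sorting by
    # decreasing out-degree recovers the only possible total order.
    order = sorted(vertices, key=lambda v: -outdegree[v])
    expected = {
        (order[i], order[j]) for i, j in combinations(range(len(order)), 2)
    }
    return edge_set == expected

delta3 = simplicial_digraph(3)
directed_triangle_cycle = ([0, 1, 2], [(0, 1), (1, 2), (2, 0)])
```

For instance, the subgraph of `delta3` induced by the vertices 0, 2 and 3 passes the test, while the directed 3-cycle does not.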

In other works, simplicial digraphs are referred to as directed cliques [23, 21] or transitive tournaments [8]. Simplicial digraphs are prime with respect to the Cartesian product. Thanks to the unique prime decomposition in [13], prodsimplicial cells as described below are well defined.

Definition 2.5 (prodsimplicial cell).

An $N$ -dimensional prodsimplicial cell $P$ is the $N$ -cell that is a product of simplices $\prod_{i=1}^{k}\Delta_{i}^{n_{i}}=\Delta_{1}^{n_{1}}\times\cdots\times\Delta_{k}^{n_{k}}$ where $n_{i}>0$ for all $1\leq i\leq k$ and $N=\sum_{i=1}^{k}n_{i}$ . Its 1-skeleton is the Cartesian product of simplicial digraphs. That is, a graph of the form

\operatorname*{{\square}}\limits_{i=1}^{k}\Delta_{i}^{n_{i}}=\Delta_{1}^{n_{1}}\square\cdots\square\Delta_{k}^{n_{k}}.

For brevity, we call $P$ a prodsimplicial $N$ -cell. When $k=1$ we call $P$ an $N$ -simplex and denote $P$ by $\Delta^{N}$ .

Since the Cartesian product of graphs is commutative up to isomorphism, we assume that in a prodsimplicial cell $P=\operatorname*{{\square}}_{i=1}^{k}\Delta^{n_{i}}_{i}$ the order of the factors is such that $n_{1}\geq n_{2}\geq\cdots\geq n_{k}$ . From this point on, a simplex will refer exclusively to a prodsimplicial cell with a single simplicial digraph factor unless otherwise specified. Given the correspondence between simplicial digraph factors and simplices, we abuse the notation and write $P=\operatorname*{{\square}}_{i=1}^{k}\Delta^{n_{i}}_{i}$ instead of $\prod_{i=1}^{k}\Delta^{n_{i}}_{i}$ . Also note that the indices are necessary because the dimensions of the factors in the definition of $P$ may not be distinct. However, we omit the subscripts for brevity where it causes no confusion.

We can naturally define the geometric realization of a simplicial digraph as for standard simplices. From this geometric realization, prodsimplicial $N$ -cells can be viewed as subsets of $\mathbb{R}^{N}$ that inherit the product topology.

Example 2.6.

Prodsimplicial cells include not only simplices, but also cubes (as iterated products of $\Delta^{1}$) and prisms (as products of any cell with $\Delta^{1}$), such as the triangular prism $\Delta^{2}\square\Delta^{1}$. Figure 2 depicts the 1-skeletons of all prodsimplicial $3$-cells.

Figure 2: Directed graphs corresponding to the 1-skeletons of prodsimplicial $3$-cells.

The notion of prodsimplicial cell was first introduced for direct products of undirected graphs by Kozlov in [20]. We choose Cartesian products to obtain a complex that combines some features of both simplicial and cubical complexes while the cells remain consistently directed. Complexes comprised of prodsimplicial cells are also used in knot theory [6, 5].

Definition 2.7 (prodsimplicial cell orientation).

Let $\Delta^{n}$ be a simplex with vertex set $V(\Delta^{n})=\{v_{0},v_{1},\ldots,v_{n}\}$ and $s(\Delta^{n})=v_{0}$. Let $N(s(\Delta^{n}))$ be the set of neighbors of the source, $V(\Delta^{n})\setminus\{s(\Delta^{n})\}$. An orientation of $\Delta^{n}$ is an equivalence class (by even permutations) of orders of $N(s(\Delta^{n}))$. An oriented simplex, denoted $[\Delta^{n}]$, is a simplex along with an orientation.

Let $P=\operatorname*{{\square}}_{i=1}^{k}\Delta^{n_{i}}$ be a prodsimplicial cell and let $s(P)$ be the source of $P$ . Let $v_{i,0}=s(\Delta^{n_{i}})$ and $N_{i}=\{(v_{1,0},v_{2,0},\ldots,v_{i,j},\ldots,v_{k,0})\ |\ v_{i,j}\in N(s(\Delta^{n_{i}})),1\leq j\leq n_{i}\}$ . Given a total order on each $N_{i}$ , a total order on $N_{1},\ldots,N_{k}$ determines a total order of $N(s(P))$ . An orientation of $P$ is an equivalence class (by even permutations) of such orders of $N(s(P))$ . A prodsimplicial cell $P$ along with an orientation is an oriented prodsimplicial cell and is denoted by $[P]$ .

Example 2.8 (oriented prodsimplicial cells).

Let $\Delta^{2}$ be the 2-simplex with $V(\Delta^{2})=\{a,b,c\}$ and $\Delta^{1}$ be the 1-simplex with $V(\Delta^{1})=\{x,y\}$ . Let $P=\Delta^{2}\square\Delta^{1}$ . Then $s(P)=(a,x)$ and $N_{1}=\{(b,x),(c,x)\}$ and $N_{2}=\{(a,y)\}$ . The order $N_{1},N_{2}$ of the sets of neighbors induces the order $((b,x),(c,x),(a,y))$ on $N(s(P))$ , while $N_{2},N_{1}$ induces the order $((a,y),(b,x),(c,x))$ . If $N_{1}$ is ordered such that $(c,x)$ is before $(b,x)$ , then the order $N_{2},N_{1}$ induces the order $((a,y),(c,x),(b,x))$ . The first two orders give the prodsimplicial cell the same orientation, while the third order of the vertices defines an oppositely oriented cell.
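The comparison of orders in Example 2.8 amounts to computing the sign of a permutation. The following sketch, with an illustrative helper name, computes that sign by cycle decomposition and applies it to the three orders of $N(s(P))$ listed above.

```python
def permutation_sign(reference, order):
    """Sign (+1 or -1) of the permutation rearranging `reference` into `order`."""
    position = {item: i for i, item in enumerate(reference)}
    perm = [position[item] for item in order]
    sign = 1
    seen = [False] * len(perm)
    for start in range(len(perm)):
        if seen[start]:
            continue
        # Walk one cycle of the permutation; a cycle of even length is odd.
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        if length % 2 == 0:
            sign = -sign
    return sign

# The three orders of the neighbors of the source (a, x) from Example 2.8.
first_order = [("b", "x"), ("c", "x"), ("a", "y")]
second_order = [("a", "y"), ("b", "x"), ("c", "x")]
third_order = [("a", "y"), ("c", "x"), ("b", "x")]
```

The second order differs from the first by a 3-cycle, which is an even permutation, whereas the third differs from the first by a single transposition.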

Definition 2.9 (faces, facets, boundary set).

We say $P^{\prime}$ is a face of an $N$-dimensional prodsimplicial cell $P=\operatorname*{{\square}}_{i=1}^{k}\Delta^{n_{i}}$ if there exist $n_{i}^{\prime}\leq n_{i}$ for all $i=1,\ldots,k$ such that $P^{\prime}=\operatorname*{{\square}}_{i=1}^{k}\Delta^{n_{i}^{\prime}}$. The facets of $P$ are its $(N-1)$-dimensional faces. The boundary set of $P$, denoted by $\partial P$, is the collection of all its facets.

Definition 2.10 (prodsimplicial cell complex).

Given a digraph $G=(V,E)$, we can inductively construct a prodsimplicial cell complex $\Gamma$ associated with $G$, denoted by $\Gamma(G)$, using the following gluing process:

• Let $\Gamma^{(0)}$ be the collection of all the vertices of $G$.

• Let $\Gamma^{(1)}=\Gamma^{(0)}\cup E$. That is, add to $\Gamma^{(0)}$ all the edges of $G$ by attaching to the current complex all the prodsimplicial $1$-cells whose facets are vertices of $G$.

• Let $\Gamma^{(N-1)}$ denote the complex created in the first $N-1$ steps. Let $P$ be a prodsimplicial $N$-cell. We add $P$ to $\Gamma^{(N)}$ if there exists a map $\phi:\partial P\rightarrow\Gamma^{(N-1)}$ such that $\partial P$ is homeomorphic to $\phi(\partial P)$, $\phi$ preserves the orientation of each edge, and the restriction of $\phi$ to each facet of $P$ is also a homeomorphism.
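For $N=2$ the gluing step can be sketched in a few lines. The code below is only an illustration of that single step, under the assumption that a triangle $\Delta^{2}$ is attached wherever edges $[a,b]$, $[b,c]$ and $[a,c]$ are all present, and a square $\Delta^{1}\square\Delta^{1}$ wherever two distinct paths of length 2 share their source and target. The function name is hypothetical, vertices are assumed to be sortable labels, and the example edge list is the one given for $C_{1}$ in the proof of Lemma 3.4.

```python
from itertools import combinations

def two_cells(edges):
    """Return (triangles, squares) that can be attached at step N = 2."""
    edge_set = set(edges)
    successors = {}
    for u, v in edge_set:
        successors.setdefault(u, set()).add(v)

    # Triangles: a -> b -> c together with the edge a -> c.
    triangles = sorted(
        (a, b, c)
        for (a, b) in edge_set
        for c in successors.get(b, ())
        if (a, c) in edge_set
    )

    # Squares: two distinct paths s -> a -> t and s -> b -> t with t != s.
    squares = []
    for s in sorted(successors):
        middles_by_target = {}
        for a in successors[s]:
            for t in successors.get(a, ()):
                if t != s:
                    middles_by_target.setdefault(t, []).append(a)
        for t, middles in sorted(middles_by_target.items()):
            for a, b in combinations(sorted(middles), 2):
                # Labeled as a cycle starting at the source: s, a, t, b.
                squares.append((s, a, t, b))
    return triangles, squares

# Edges [v_0,v_1], [v_0,v_2], [v_0,v_3], [v_1,v_4], [v_2,v_4], [v_3,v_4].
three_path_edges = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 4), (3, 4)]
triangles, squares = two_cells(three_path_edges)
```

On this edge list the sketch finds no triangles and three squares, matching the generators of $C_{2}$ listed in the proof of Lemma 3.4. Cells of dimension 3 and higher would require a further search that is not attempted here.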

Example 2.11 (prodsimplicial cell complex).

Figure 3 shows the construction of a prodsimplicial cell complex $\Gamma^{(3)}$ with cells of dimension at most 3 including seven squares and a cube (with its six faces counted among the squares). The digraph $G$ corresponds to $\Gamma^{(1)}$ , obtained by collecting vertices and edges of $G$ .

Figure 3: Faces of dimension 0 are shown in (a), of dimension 1 in (b), 2 in (c) and 3 in (d) as they are added to the complex.

2.3 Prodsimplicial Homology Groups for Directed Graphs

In this section we describe a boundary operator for prodsimplicial cells, which allows us to compute homology groups for prodsimplicial cell complexes. Although a chain complex structure is defined on a prodsimplicial complex as a special case of CW-complexes, we present an explicit boundary operator on prodsimplicial cells using the product rule for computational purposes.

Given a prodsimplicial cell complex $\Gamma$ , a prodsimplicial $N$ -chain group on $\Gamma$ , denoted $C_{N}(\Gamma)$ , is the free abelian group generated by oriented $N$ -dimensional prodsimplicial cells of $\Gamma$ . Its typical element, a prodsimplicial $N$ -chain, is a finite formal linear combination of $N$ -dimensional prodsimplicial cells with integer coefficients. In the Cartesian product of a simplex and a chain, the product distributes over the sum; that is

\Delta^{n_{0}}\square\left(\sum_{i=1}^{k}\Delta^{n_{i}}\right)=\sum_{i=1}^{k}(\Delta^{n_{0}}\square\Delta^{n_{i}}).

Recall that the boundary operator for simplices is defined by

\partial_{n}(\Delta^{n})=\displaystyle\sum_{i=0}^{n}(-1)^{i}[v_{0},v_{1},\ldots,v_{i-1},\hat{v_{i}},v_{i+1},\ldots,v_{n}],

where $\hat{v_{i}}$ indicates that vertex $v_{i}$ has been deleted from the simplex. For brevity, we use $[\hat{v_{i}}]$ to denote the simplex $[v_{0},v_{1},\ldots,v_{i-1},\hat{v_{i}},v_{i+1},\ldots,v_{n}]$ in computations.

Definition 2.12 (boundary operator).

Let $P=\operatorname*{{\square}}_{i=1}^{k}\Delta^{n_{i}}$ be a prodsimplicial $N$ -cell, where $N=\sum_{i=1}^{k}n_{i}$ . For $1\leq i\leq k$ , we use the notation

\langle\overline{\partial_{n_{i}}\Delta^{n_{i}}}\rangle=\Delta^{n_{1}}\square\Delta^{n_{2}}\square\cdots\square\Delta^{n_{i-1}}\square\partial_{n_{i}}\left(\Delta^{n_{i}}\right)\square\Delta^{n_{i+1}}\square\cdots\square\Delta^{n_{k}}.

For $1\leq i<j\leq k$ , let

\langle\overline{\partial_{n_{i}}\Delta^{n_{i}}\partial_{n_{j}}\Delta^{n_{j}}}\rangle=\Delta^{n_{1}}\square\cdots\square\partial_{n_{i}}\left(\Delta^{n_{i}}\right)\square\cdots\square\partial_{n_{j}}\left(\Delta^{n_{j}}\right)\square\ldots\square\Delta^{n_{k}}

The $N$ -dimensional boundary operator applied to $P$ is defined as

\partial_{N}\left(P\right)=\partial_{N}\left(\operatorname*{{\square}}_{i=1}^{k}\Delta^{n_{i}}\right)=\sum_{i=1}^{k}(-1)^{\alpha(i)}[\overline{\partial_{n_{i}}\Delta^{n_{i}}}],

where $\alpha(i)=\sum_{\ell=1}^{i-1}n_{\ell}$ is the sum of the dimensions of the factors preceding the $i$ th factor.

Since the boundary operator can only be applied to oriented chains ( $[\Delta^{n}]$ and $[P]$ ), and not graphs ( $\Delta^{n}$ and $G$ ), we omit the square brackets for brevity.

Note that given a prodsimplicial cell complex $\Gamma$ , $\partial_{N}$ defines a group homomorphism $\partial_{N}:C_{N}(\Gamma)\longrightarrow C_{N-1}(\Gamma)$ . We verify that the above defined operator indeed defines a chain complex. Recall, for a simplex $\Delta^{n+1}$ with vertices $\{v_{0},v_{1},\ldots,v_{n+1}\}$ , we have $(\partial_{n}\circ\partial_{n+1})(\Delta^{n+1})=0$ .

Proposition 2.13 ( $\partial^{2}=0$ on prodsimplicial cells).

Let $P=\operatorname*{{\square}}_{i=1}^{k}\Delta^{n_{i}}$ be a prodsimplicial $(N+1)$ -cell, where $N+1=\sum_{i=1}^{k}n_{i}$ . Then $(\partial_{N}\circ\partial_{N+1})(P)=0$ .

Proof.

By Definition 2.12, we have that

\begin{aligned}
(\partial_{N}\circ\partial_{N+1})(P)&=(\partial_{N}\circ\partial_{N+1})\left(\operatorname*{{\square}}_{i=1}^{k}\Delta^{n_{i}}\right)\\
&=\partial_{N}\left(\sum_{i=1}^{k}(-1)^{\alpha(i)}[\overline{\partial_{n_{i}}\Delta^{n_{i}}}]\right)\\
&=\sum_{j<i}(-1)^{\alpha(i)}(-1)^{\alpha(j)}[\overline{\partial_{n_{j}}\Delta^{n_{j}}}\overline{\partial_{n_{i}}\Delta^{n_{i}}}]+\sum_{j>i}(-1)^{\alpha(i)}(-1)^{\alpha(j)-1}[\overline{\partial_{n_{i}}\Delta^{n_{i}}}\overline{\partial_{n_{j}}\Delta^{n_{j}}}]\\
&\quad+(\partial_{n_{i}-1}\circ\partial_{n_{i}})(\Delta^{n_{i}}).
\end{aligned}

The term $(\partial_{n_{i}-1}\circ\partial_{n_{i}})(\Delta^{n_{i}})$ is zero. Note that when $j>i$ , the dimension of the $i$ th term has been decreased by one and hence the exponent on $(-1)$ for the sum over $j>i$ is $\alpha(j)-1$ .

Due to this difference of 1, the two sums (over $j<i$ and $j>i$ ) have the same terms with opposite signs, so the final expression equals zero. ∎
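Proposition 2.13 can also be checked numerically on small cells. The sketch below is an illustration under stated assumptions rather than the authors' implementation: a cell is stored as a tuple of factors, each factor a tuple of vertices listed in increasing order, a chain is a dictionary from cells to integer coefficients, and a factor that has collapsed to a single vertex is kept as a one-element tuple. The sign of each facet combines $(-1)^{\alpha(i)}$ from Definition 2.12 with the alternating sign of the simplex boundary.

```python
from collections import defaultdict

def boundary_cell(cell):
    """Boundary of a product of simplices, as a dict {facet: coefficient}."""
    result = defaultdict(int)
    alpha = 0  # sum of the dimensions of the factors preceding the current one
    for i, factor in enumerate(cell):
        dim = len(factor) - 1
        if dim > 0:
            for m in range(len(factor)):
                face = factor[:m] + factor[m + 1:]  # delete the m-th vertex
                facet = cell[:i] + (face,) + cell[i + 1:]
                result[facet] += (-1) ** (alpha + m)
        alpha += dim
    return dict(result)

def boundary_chain(chain):
    """Extend boundary_cell linearly to chains, dropping zero coefficients."""
    result = defaultdict(int)
    for cell, coefficient in chain.items():
        for facet, sign in boundary_cell(cell).items():
            result[facet] += coefficient * sign
    return {cell: c for cell, c in result.items() if c != 0}

# Example cells: a prism, a cube and a tetrahedron (vertex names are arbitrary).
prism = (("a", "b", "c"), ("x", "y"))
cube = (("a", "b"), ("x", "y"), ("p", "q"))
tetrahedron = ((0, 1, 2, 3),)
```

Applying `boundary_chain` twice to any of these cells returns the empty chain, in agreement with the proposition. The prism, for instance, has five facets: three squares and two triangles.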

With this boundary operator we can compute cycle groups, boundary groups, homology groups, and Betti numbers for prodsimplicial complexes of directed graphs.

3 Betti Numbers and Generators for Consistently Directed Graphs

We examine generators for the first and second homology groups and investigate possible values of Betti numbers in prodsimplicial complexes of directed graphs. We adopt the notation $\beta_{n}(\Gamma(G))$ to denote the $n$th Betti number of the prodsimplicial complex associated with a digraph $G$, where $\Gamma(G)$ is built through the gluing process described in Definition 2.10. For brevity, we abuse the notation and write $\beta_{n}(G)$ instead.

As is well known, $\beta_{0}$ indicates the number of connected components. Since we study consistently directed digraphs, which consist of a single connected component, we take $\beta_{0}(G)=1$ for all digraphs of interest.

By definition, consistently directed graphs admit no cycles in the graph theoretic sense. In the work that follows, where it causes no confusion, we use the term $n$ -cycles to refer to elements of cycle groups.

3.1 Generators of the First Homology Groups

Figure 4 shows two connected digraphs on three vertices. The one in Figure 4(a) is a closed path of the form $(v_{0},v_{1},v_{2}=v_{0})$ and forms a 1-cycle, while simplicial digraphs, as shown in Figure 4(b), do not contribute to $\beta_{1}$. Although closed paths of the form $(v_{0},v_{1},\ldots,v_{n}=v_{0})$ contribute to $\beta_{1}$, they are absent in consistently directed graphs, so we exclude them in the search for homology group generators.

Figure 4: The only two possible connected digraphs on three vertices.

The only two consistently directed graphs on four vertices are shown in Figure 5. One is the union of an edge and a path of length three, as depicted in Figure 5(a), which results in a complex homeomorphic to the circle $S^{1}$. The other is a square cell of the form $\Delta^{1}\square\Delta^{1}$, shown in Figure 5(b). We summarize our observations as follows.

Figure 5: All consistently directed graphs on four vertices.

Lemma 3.1.

Let $G=(V,E)$ where $V=\{v_{0},v_{1},v_{2},v_{3}\}$ and $E=\{[v_{0},v_{3}],[v_{0},v_{1}],[v_{1},v_{2}],[v_{2},v_{3}]\}$, as in Figure 5(a). Then $\beta_{1}(G)=1$ and $\beta_{n}(G)=0$ for all $n\geq 2$.

We can build graphs with arbitrarily large $\beta_{1}$ values by attaching edge-disjoint paths of length 3 from source to target as follows.

Lemma 3.2.

For all $k\geq 0$, there exists a graph $G$ such that $\beta_{1}(G)=k$ and $\beta_{n}(G)=0$ for $n\geq 2$.

Figure 6: $k$ disjoint paths of length 3 running “parallel” to $[v_{0},v_{2k+1}]$.

Proof.

Let $G$ be the graph in Figure 6. Then, as a topological space, $G$ is homotopy equivalent to a bouquet of $k$ circles, hence $\beta_{1}(G)=k$ and $\beta_{n}(G)=0$ for $n\geq 2$. ∎
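The count in this proof can be reproduced with the cycle rank $|E|-|V|+1$ of a connected graph, which equals $\beta_{1}$ when no cells of dimension 2 or higher are attached. The sketch below assumes a particular labeling of the graph in Figure 6, namely the edge $[v_{0},v_{2k+1}]$ together with the paths $(v_{0},v_{2i-1},v_{2i},v_{2k+1})$ for $1\leq i\leq k$. This labeling is a guess consistent with the caption, and it contains no triangles or squares.

```python
def parallel_paths_graph(k):
    """Assumed labeling of Figure 6: one direct edge plus k paths of length 3."""
    source, target = 0, 2 * k + 1
    vertices = list(range(2 * k + 2))
    edges = [(source, target)]
    for i in range(1, k + 1):
        a, b = 2 * i - 1, 2 * i
        edges += [(source, a), (a, b), (b, target)]
    return vertices, edges

def cycle_rank(vertices, edges):
    """|E| - |V| + 1 for a connected graph with no higher dimensional cells."""
    return len(edges) - len(vertices) + 1

# First Betti numbers for k = 0, 1, ..., 5 under the assumed labeling.
ranks = [cycle_rank(*parallel_paths_graph(k)) for k in range(6)]
```

With $2k+2$ vertices and $3k+1$ edges, the cycle rank is $(3k+1)-(2k+2)+1=k$.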

Lemma 3.3.

For consistently directed graphs, all 1-cycles that represent nontrivial elements of the first homology group consist of two paths with a common source and target, $p_{1}=(s=u_{0},u_{1},\ldots,u_{m}=t)$ and $p_{2}=(s=v_{0},v_{1},\ldots,v_{n}=t)$, where $\{u_{1},u_{2},\ldots,u_{m-1}\}\cap\{v_{1},v_{2},\ldots,v_{n-1}\}=\varnothing$.

3.2 Generators of the Second Homology Groups

This section will focus on directed graphs with vertices labeled $v_{i}$ , so we reserve the notation $[v_{0},v_{1},\ldots,v_{n}]$ for simplices and label other low-dimensional prodsimplicial cells as sequences of vertices. For example, $[v_{0},v_{1},v_{2},v_{3}]$ denotes a tetrahedron while $v_{0}v_{1}v_{2}v_{3}$ denotes a square.

In dimension 2, the problem of finding all graphs whose corresponding prodsimplicial complex yields nontrivial 2-cycles is less straightforward. We present here a few of the simplest examples.

To find generators of the second homology groups, we consider consistently directed graphs on at least five vertices, since the Betti numbers of all consistently directed graphs on three and four vertices are as shown in Figures 4 and 5. It can be checked by inspection that the only connected and consistently directed graphs with five edges and five vertices are pairs of paths as in Lemma 3.3. For an example with six edges and five vertices with $\beta_{2}(G)=1$, we consider the graph consisting of three squares as shown in Figure 7.

Lemma 3.4.

Let $G$ be as shown in Figure 7. Then $\beta_{1}(G)=0$ , $\beta_{2}(G)=1$ and $\beta_{n}(G)=0$ for all $n>2$ .

Figure 7: Consistently directed graph whose prodsimplicial complex consists of three squares, six edges and five vertices.

Proof.

We describe each chain group explicitly in terms of the vertices of $G$ . Note that squares are labeled forming a cycle, following the source vertex with its lowest lexicographic order neighbor. The chain groups, $C_{n}$ , in the prodsimplicial complex associated with the graph are described by generators as follows:

\begin{aligned}
C_{0}&=\langle[v_{0}],[v_{1}],[v_{2}],[v_{3}],[v_{4}]\rangle\\
C_{1}&=\langle[v_{0},v_{1}],[v_{0},v_{2}],[v_{0},v_{3}],[v_{1},v_{4}],[v_{2},v_{4}],[v_{3},v_{4}]\rangle\\
C_{2}&=\langle v_{0}v_{1}v_{4}v_{2},v_{0}v_{2}v_{4}v_{3},v_{0}v_{1}v_{4}v_{3}\rangle.
\end{aligned}

In the graph any two paths of length 2 form a consistently directed square, so that the resulting prodsimplicial complex is homeomorphic to a sphere. It follows that $\beta_{1}(G)=0$ , $\beta_{2}(G)=1$ and $\beta_{n}(G)=0$ for all $n>2$ as desired. ∎

In addition, we present a graph on six vertices whose prodsimplicial complex also has a nontrivial second homology group generated by a 2-cycle.

Lemma 3.5.

Let $G$ be the graph in Figure 8(a). Then $\beta_{1}(G)=0$ , $\beta_{2}(G)=1$ and $\beta_{n}(G)=0$ for all $n>2$ .

Proof.

The chain groups in the prodsimplicial complex associated with the graph are as follows:

\begin{aligned}
C_{0}&=\langle[v_{0}],[v_{1}],[v_{2}],[v_{3}],[v_{4}],[v_{5}]\rangle\\
C_{1}&=\langle[v_{0},v_{1}],[v_{0},v_{2}],[v_{1},v_{3}],[v_{1},v_{4}],[v_{2},v_{3}],[v_{2},v_{4}],[v_{3},v_{5}],[v_{4},v_{5}]\rangle\\
C_{2}&=\langle v_{0}v_{1}v_{3}v_{2},v_{0}v_{1}v_{4}v_{2},v_{1}v_{3}v_{5}v_{4},v_{2}v_{3}v_{5}v_{4}\rangle.
\end{aligned}

The four squares are connected in such a way that they form a complex homeomorphic to a sphere, from which the result follows. ∎

Remark 3.6.

It is possible to add edges connecting opposite vertices of a consistently directed square (dividing it into two simplicial digraphs) in either of the graphs depicted in Figure 7 or Figure 8. This yields graphs with complexes homeomorphic to a sphere, with different types of nontrivial polyhedral 2-cycles. For example, the graph in Figure 8(a) can be modified to obtain the graph in Figure 8(b).

Figure 8: Graphs with nontrivial 2-cycles, shown in panels (a) and (b).

3.3 Realizability of Betti Number Combinations

Using some of the graphs from previous results, we construct graphs of larger order with specified homology groups. To prove this result we use the following lemma, which follows from the Mayer-Vietoris sequence [17].

Lemma 3.7.

Let $G$ and $H$ be consistently directed graphs such that $G\cup H$ is consistently directed, $\Gamma(G\cup H)=\Gamma(G)\cup\Gamma(H)$ , $H_{0}(\Gamma(G\cap H))=\mathbb{Z}$ , and $H_{n}(\Gamma(G\cap H))=0$ for $n\geq 1$ . Then

H_{n}(\Gamma(G\cup H))\cong H_{n}(\Gamma(G))\oplus H_{n}(\Gamma(H))

for $n\geq 1$ .

As a direct result of this, we are able to “glue” together an arbitrary number of generator graphs to attain any combination of Betti numbers.

Corollary 3.8.

There exists a graph $G$ such that $\beta_{1}(G)=0$ , $\beta_{2}(G)=k$ and $\beta_{n}(G)=0$ for all $n>2$ .

Figure 9: Graph whose corresponding prodsimplicial complex has second homology of rank $k$.

Proof.

Let $G$ be as in Figure 9. Then, as a topological space, $G$ is equivalent to a wedge of spheres that are glued along an edge (where homology groups are trivial). As a result, we have $\beta_{1}(G)=0$ , $\beta_{2}(G)=k$ and $\beta_{n}(G)=0$ for all $n>2$ . ∎

Corollary 3.9.

Let $G$ be as in Lemma 3.2 with vertex set $\{v_{0},v_{1},\ldots,v_{2k+1}\}$ and $\beta_{1}(G)=k$, and let $H$ be as in Corollary 3.8 with vertex set $\{u_{0},u_{1},\ldots,u_{3\ell+1}\}$ and $\beta_{2}(H)=\ell$, so that $v_{0}=u_{3\ell+1}$ and $G$ and $H$ only intersect at the vertex $v_{0}$.

Let $G\cup H=(V(G)\cup V(H),E(G)\cup E(H))$ . Then $G\cup H$ is consistently directed and satisfies

\begin{aligned}
\beta_{1}(G\cup H)&=k\\
\beta_{2}(G\cup H)&=\ell\\
\beta_{n}(G\cup H)&=0\ \text{ for }n\geq 3.
\end{aligned}

Proof.

It can be readily checked that the unique source and target of $G\cup H$ are, respectively, $u_{0}$ and $v_{2k+1}$ , so that $G\cup H$ is consistently directed. By assumption, $H_{0}(\Gamma(G\cap H))=\mathbb{Z}$ and $H_{n}(\Gamma(G\cap H))=0$ for $n\geq 1$ so that the result follows from Lemma 3.7. ∎

It is therefore possible to construct consistently directed graphs with first and second homology groups isomorphic to $\mathbb{Z}^{k}$ and $\mathbb{Z}^{\ell}$ , respectively, for any positive integers $k$ and $\ell$ . This construction is not minimal on the number of vertices. For example, the graph shown in Figure 10 follows a construction similar to that in Lemma 3.2 and achieves many of the same $k$ values using fewer vertices and edges.

Figure 10: Directed graph with ${k-1\choose 2}$ 2-cycles.

Lemma 3.10.

Let $G$ be a digraph with vertices $V(G)=\{v_{0},v_{1},\ldots,v_{k+1}\}$ and edges defined by the collection of paths $(v_{0},v_{i},v_{k+1})$ for $0<i<k+1$ . Then $\beta_{1}(G)=0$ , $\beta_{2}(G)={k-1\choose 2}$ , and $\beta_{n}(G)=0$ for all $n>2$ .
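The Betti numbers in Lemma 3.10 can be confirmed for small $k$ by computing ranks of boundary matrices. The sketch below assumes that the only cells of dimension 2 are the ${k\choose 2}$ squares formed by pairs of paths and that no higher dimensional cells occur; the boundary of the square with middle vertices $a<b$ is taken to be $[v_{0},a]+[a,v_{k+1}]-[v_{0},b]-[b,v_{k+1}]$. The function names are illustrative, and ranks are computed over the rationals with exact arithmetic.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def rank(matrix):
    """Rank over the rationals by Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def betti_numbers(k):
    """(b0, b1, b2) for k paths of length 2 from vertex 0 to vertex k + 1."""
    source, target = 0, k + 1
    vertices = list(range(k + 2))
    middles = range(1, k + 1)
    edges = [(source, i) for i in middles] + [(i, target) for i in middles]
    squares = list(combinations(middles, 2))  # pairs of middle vertices a < b
    e_index = {e: j for j, e in enumerate(edges)}
    # d1: rows are vertices, columns are edges; boundary of [u, v] is v - u.
    d1 = [[0] * len(edges) for _ in vertices]
    for (u, v), j in e_index.items():
        d1[v][j] += 1
        d1[u][j] -= 1
    # d2: rows are edges, columns are squares.
    d2 = [[0] * len(squares) for _ in edges]
    for j, (a, b) in enumerate(squares):
        d2[e_index[(source, a)]][j] += 1
        d2[e_index[(a, target)]][j] += 1
        d2[e_index[(source, b)]][j] -= 1
        d2[e_index[(b, target)]][j] -= 1
    r1, r2 = rank(d1), rank(d2)
    return len(vertices) - r1, len(edges) - r1 - r2, len(squares) - r2
```

For $k=3$ this is the graph of Lemma 3.4, and the computation returns $\beta_{2}=1$. In general the rank of the second boundary matrix is $k-1$, which gives $\beta_{2}={k\choose 2}-(k-1)={k-1\choose 2}$.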

4 Word Graphs of Double Occurrence Words

In this section we focus on specific biomolecular processes where consistently directed graphs appear and the prodsimplicial complexes can be applied.

Massive rearrangement processes are observed during the development of somatic nuclei in certain species of ciliates such as Oxytricha trifallax. The recombination is guided by short DNA repeats that flank the rearranged DNA segments and determine their order. These short repeats can be modeled by a sequence of double occurrence words, words where each symbol appears twice [7, 25, 22]. In particular, over 90% of DNA rearrangement in these species can be described through an iterated process of deletion of repeat and return words [4]. Repeat and return words are generalizations of square and palindromic factors in words and are of interest in language theory [4, 19, 2]. In this section we describe graphs associated with double occurrence words that model these DNA rearrangement processes [3].

4.1 Double Occurrence Words

We call an ordered, countable set of symbols $\Sigma$ an alphabet. A word over $\Sigma$ is a finite sequence of the form $w=a_{1}a_{2}\ldots a_{n}$ where $a_{i}\in\Sigma$ whose length, denoted $|w|$ , is $n$ . We denote with $\Sigma^{*}$ the set of all words over $\Sigma$ , including the empty word, denoted by $\epsilon$ . The set of all symbols comprising a word $w$ is denoted by $\Sigma[w]$ . The reverse of a word $w=a_{1}a_{2}\ldots a_{n}$ is $w^{R}=a_{n}a_{n-1}\ldots a_{2}a_{1}$ . The word $v$ is a factor of the word $w$ , denoted $v\sqsubseteq w$ , if there exist $w_{1},w_{2}\in\Sigma^{\ast}$ such that $w=w_{1}vw_{2}$ . In this presentation we set $\Sigma\subseteq[n]$ for some $n\in\mathbb{N}$ . For example $w=122313$ is a word over $[3]=\{1,2,3\}$ of length $\lvert w\rvert=6$ . The reverse of $w=122313$ is the word $w^{R}=313221$ .

A word $w\in\Sigma^{*}$ is called a double occurrence word (DOW) if every symbol in $\Sigma$ appears in $w$ either zero or two times. We use $\Sigma_{DOW}$ to denote the set of all DOWs over $\Sigma$ . Similarly, we call $w$ a single occurrence word (SOW) if each symbol in $\Sigma$ appears either once or not at all. The set of SOWs over $\Sigma$ is denoted by $\Sigma_{SOW}$ . Since a DOW of length $n$ uses $n/2$ distinct symbols, we say that the size of a DOW $w$ is $|w|/2$ . When we restrict $\Sigma_{DOW}$ to DOWs of size less than or equal to $n$ , we denote the set by $\Sigma_{DOW}^{\leq n}$ .

A word $w\in\Sigma^{*}$ is said to be in ascending order if $a_{1}=\min(\Sigma[w])$ and the first appearance of each symbol is the immediate successor of the largest of all the preceding symbols. For example, the word $w_{1}=122313$ is a DOW in ascending order, while the DOW $w_{2}=133212$ is not.

We say that $w_{1}$ and $w_{2}$ are ascending order equivalent, and write $w_{1}\sim w_{2}$ , if there exists a bijection on $\Sigma$ inducing a morphism $f$ on $\Sigma^{\ast}$ such that $f(w_{1})=w_{2}$ . Words $w_{1}=122313$ and $w_{2}=133212$ are equivalent via the bijective map given by: $1\mapsto 1,\;3\mapsto 2,\;2\mapsto 3.$ Since words in ascending order are unique, up to this equivalence we consider words in ascending order as representatives of the classes determined by the relation $\sim$ .
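The ascending order representative of a word can be computed by relabeling symbols in order of first appearance. The following is a minimal sketch under two assumptions of ours: words are stored as tuples of integers, and representatives always start at the symbol 1. The names `is_dow`, `ascending`, and `equivalent` are illustrative.

```python
from collections import Counter


def is_dow(word):
    """True if every symbol that occurs in the word occurs exactly twice."""
    return all(count == 2 for count in Counter(word).values())


def ascending(word):
    """Relabel symbols as 1, 2, 3, ... in order of first appearance."""
    relabel = {}
    for symbol in word:
        relabel.setdefault(symbol, len(relabel) + 1)
    return tuple(relabel[symbol] for symbol in word)


def equivalent(word_1, word_2):
    """Ascending order equivalence: both words share the same representative."""
    return ascending(word_1) == ascending(word_2)


w1 = (1, 2, 2, 3, 1, 3)  # the word 122313
w2 = (1, 3, 3, 2, 1, 2)  # the word 133212
print(is_dow(w1), ascending(w2) == w1, equivalent(w1, w2))
```

Comparing representatives avoids searching over bijections of $\Sigma$ .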

From now on, we consider only equivalence classes of DOWs and abuse the notation by writing words in place of their equivalence class where no confusion arises. The following definition can be found in [9].

Definition 4.1 (repeat word, return word).

Let $x,y,z\in\Sigma^{*}$ and $u\in(\Sigma\setminus\Sigma[w])_{SOW}$ . We say that

•

the word $uu$ is a repeat word in $w=xuyuz$ and the word $xyz$ is obtained from $w$ by a repeat deletion denoted $d_{u}(w)=xyz$ . In this case we call $u$ a repeat factor in $w$ .
•

the word $uu^{R}$ is a return word in $w=xuyu^{R}z$ and the word $xyz$ is obtained from $w$ by a return deletion, also denoted $d_{u}(w)=xyz$ . In this case we call $u$ a return factor in $w$ .

Repeat or return words, $uu$ or $uu^{R}$ , where $\lvert u\rvert=1$ are called trivial. We say that a word $uu$ (resp. $uu^{R}$ ) is a maximal repeat (resp. return) word in $w$ if there are no other repeat (resp. return) factors $v$ in $w$ containing $u$ with $\lvert v\rvert>\lvert u\rvert$ . Following [1], we use $M^{DOW}_{w}$ to denote the set of maximal repeat ( $uu$ ) or return ( $uu^{R}$ ) words in $w$ . In addition, we define the set of repeat or return factors in $w$ as $M^{SOW}_{w}=\{u\sqsubseteq w\ |\ uu\in M^{DOW}_{w}\text{ or }uu^{R}\in M^{DOW}_{w}\}$ .

Lemma 4.2.

[1] Let $w$ be a DOW of size $n$ . For each $x\in\Sigma[w]$ there exists a unique $u\in M^{SOW}_{w}$ such that $x\in\Sigma[u]$ .
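Lemma 4.2 suggests that $M^{SOW}_{w}$ can be found with a single left-to-right scan. The following is a minimal sketch, not the implementation used for the computations in this paper. It assumes that the input is a DOW stored as a tuple of integers; the names `maximal_factors` and `delete` are illustrative.

```python
def maximal_factors(word):
    """Maximal repeat/return factors of a DOW, found by a left-to-right scan."""
    first, second = {}, {}
    for i, symbol in enumerate(word):
        if symbol in first:
            second[symbol] = i
        else:
            first[symbol] = i
    factors, i = [], 0
    while i < len(word):
        if first[word[i]] != i:  # a second occurrence never starts a factor
            i += 1
            continue
        best = 1
        for step in (1, -1):  # +1 grows a repeat uu, -1 grows a return uu^R
            k = 1
            while (i + k < len(word)
                   and first[word[i + k]] == i + k
                   and second[word[i + k]] == second[word[i]] + step * k):
                k += 1
            best = max(best, k)
        factors.append(tuple(word[i:i + best]))
        i += best
    return factors


def delete(word, factor):
    """The deletion d_u(w): remove both occurrences of every symbol of u."""
    return tuple(symbol for symbol in word if symbol not in factor)


w = (1, 2, 3, 4, 5, 2, 3, 5, 4, 1)  # the word 1234523541
print(maximal_factors(w))
print(delete(w, (2, 3)))
```

Each first occurrence is assigned to exactly one factor, in line with the uniqueness statement of Lemma 4.2.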

The set of maximal repeat or return words may include trivial repeat or return words, as illustrated in Example 4.4. In ascending order, some of the factors may be equivalent so elements of $M^{DOW}_{w}$ (resp. $M^{SOW}_{w}$ ) are always written as DOWs (resp. SOWs) and not their equivalence classes, so that every symbol in $w$ appears in some word in $M^{SOW}_{w}$ . We are interested in maximal repeat and return words, and present the following definition, which was adapted from the so-called “pattern reduction” process described in [1] and [19].

Definition 4.3 (successor, predecessor).

The set $D(w)=\bigcup_{u\in M^{SOW}_{w}}\{v\ |\ v\text{ is in ascending order and }v\sim d_{u}(w)\}$ is called the set of immediate successors of $w$ . If there exists a sequence of words $w=w_{1},w_{2},\ldots,w_{n}=w^{\prime}$ such that $w_{i}\sim d_{u_{i}}(w_{i-1})$ for some choice of $u_{i}\in M^{SOW}_{w_{i-1}}$ , we call $w^{\prime}$ a successor of $w$ and $w$ a predecessor of $w^{\prime}$ . Note that the empty word $\epsilon$ is a successor of all words.

Example 4.4.

Let $w=1234523541$ . The set of maximal repeat or return words in $w$ is $M^{DOW}_{w}=\{11,2323,4554\}$ and $M^{SOW}_{w}=\{1,23,45\}$ . Since $d_{1}(w)=23452354\sim 12341243$ , $d_{23}(w)=145541\sim 123321$ , and $d_{45}(w)=123231$ , the set of immediate successors of $w$ is $D(w)=\{12341243,123321,123231\}$ . We may continue to delete subwords from the successors as follows. The maximal repeat or return factors of $12341243,123321,$ and $123231$ are

	$\displaystyle M^{SOW}_{12341243}$	$\displaystyle=\{12,34\}$
	$\displaystyle M^{SOW}_{123321}$	$\displaystyle=\{123\}$
	$\displaystyle M^{SOW}_{123231}$	$\displaystyle=\{1,23\}$

so that their deletions yield

	$\displaystyle d_{12}(12341243)$	$\displaystyle=3443\sim 1221$
	$\displaystyle d_{34}(12341243)$	$\displaystyle=1212$
	$\displaystyle D(12341243)$	$\displaystyle=\{1221,1212\}$

	$\displaystyle d_{123}(123321)$	$\displaystyle=\epsilon$
	$\displaystyle D(123321)$	$\displaystyle=\{\epsilon\}$

	$\displaystyle d_{1}(123231)$	$\displaystyle=2323\sim 1212$
	$\displaystyle d_{23}(123231)$	$\displaystyle=11$
	$\displaystyle D(123231)$	$\displaystyle=\{1212,11\}.$

Repeating this process once more yields $D(1212)=D(1221)=D(11)=\{\epsilon\}$ . We can therefore say that the set of all successors of $w$ is the set

\{12341243,123321,123231,1221,1212,11,\epsilon\}.
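The computation in Example 4.4 can be reproduced mechanically. The sketch below is illustrative and is not the script referenced in Section 5. It assumes words are tuples of integers and that representatives are relabeled starting at 1; the helper names (`dow`, `ascending`, `maximal_factors`, `successors`, `all_successors`) are ours.

```python
def dow(text):
    """Parse a word over single-digit symbols, e.g. '1212' -> (1, 2, 1, 2)."""
    return tuple(int(c) for c in text)


def ascending(word):
    """Relabel symbols as 1, 2, 3, ... in order of first appearance."""
    relabel = {}
    for symbol in word:
        relabel.setdefault(symbol, len(relabel) + 1)
    return tuple(relabel[symbol] for symbol in word)


def maximal_factors(word):
    """Maximal repeat/return factors of a DOW, found by a left-to-right scan."""
    first, second = {}, {}
    for i, symbol in enumerate(word):
        if symbol in first:
            second[symbol] = i
        else:
            first[symbol] = i
    factors, i = [], 0
    while i < len(word):
        if first[word[i]] != i:  # a second occurrence never starts a factor
            i += 1
            continue
        best = 1
        for step in (1, -1):  # +1 grows a repeat uu, -1 grows a return uu^R
            k = 1
            while (i + k < len(word)
                   and first[word[i + k]] == i + k
                   and second[word[i + k]] == second[word[i]] + step * k):
                k += 1
            best = max(best, k)
        factors.append(tuple(word[i:i + best]))
        i += best
    return factors


def successors(word):
    """Immediate successors D(w) as ascending-order tuples."""
    return {ascending(tuple(s for s in word if s not in u))
            for u in maximal_factors(word)}


def all_successors(word):
    """All successors of a DOW, collected by iterating D until nothing is new."""
    seen, stack = set(), [ascending(word)]
    while stack:
        for v in successors(stack.pop()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


for v in sorted(all_successors(dow("1234523541")), key=len, reverse=True):
    print("".join(map(str, v)) or "epsilon")
```

The empty tuple plays the role of $\epsilon$ .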

4.2 Word Graphs

We refer the reader to [11] for elementary definitions in graph theory and to Section 2.1 for the definitions of a directed graph, source, and target.

Definition 4.5 (global word graph).

The global word graph $G_{n}=(V,E)$ of double occurrence words of size $n$ is the graph defined by:

•

$V(G_{n})=\Sigma_{DOW}^{\leq n}/_{\sim}$ ;
•

$E(G_{n})=\bigcup_{w\in V}E_{w}$ , where $E_{w}=\{[w,v]\ |v\in D(w)\}$ .

For a vertex $w$ in $G_{n}$ , we define the word graph rooted at $w$ , denoted $G_{w}$ , as the induced subgraph of the global word graph containing as vertices $w$ and all of its successors.

By construction, $G_{w}$ does not contain any cycles and has unique source $w$ and unique target $\epsilon$ ; hence word graphs are consistently directed.
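The word graph rooted at $w$ can be built by following immediate successors from $w$ . The sketch below is illustrative only; it assumes tuple-encoded words and uses helper names of our choosing. It also checks the properties just stated for the word of Example 4.4: every edge shortens the word, the only source is $w$ , and the only target is $\epsilon$ .

```python
def ascending(word):
    """Relabel symbols as 1, 2, 3, ... in order of first appearance."""
    relabel = {}
    for symbol in word:
        relabel.setdefault(symbol, len(relabel) + 1)
    return tuple(relabel[symbol] for symbol in word)


def maximal_factors(word):
    """Maximal repeat/return factors of a DOW, found by a left-to-right scan."""
    first, second = {}, {}
    for i, symbol in enumerate(word):
        if symbol in first:
            second[symbol] = i
        else:
            first[symbol] = i
    factors, i = [], 0
    while i < len(word):
        if first[word[i]] != i:  # a second occurrence never starts a factor
            i += 1
            continue
        best = 1
        for step in (1, -1):  # +1 grows a repeat uu, -1 grows a return uu^R
            k = 1
            while (i + k < len(word)
                   and first[word[i + k]] == i + k
                   and second[word[i + k]] == second[word[i]] + step * k):
                k += 1
            best = max(best, k)
        factors.append(tuple(word[i:i + best]))
        i += best
    return factors


def successors(word):
    """Immediate successors D(w) as ascending-order tuples."""
    return {ascending(tuple(s for s in word if s not in u))
            for u in maximal_factors(word)}


def word_graph(root):
    """Vertices and directed edges of the word graph rooted at the given DOW."""
    root = ascending(root)
    vertices, edges, stack = {root}, set(), [root]
    while stack:
        w = stack.pop()
        for v in successors(w):
            edges.add((w, v))
            if v not in vertices:
                vertices.add(v)
                stack.append(v)
    return vertices, edges


root = (1, 2, 3, 4, 5, 2, 3, 5, 4, 1)  # the word 1234523541
vertices, edges = word_graph(root)
sources = vertices - {head for _, head in edges}
targets = vertices - {tail for tail, _ in edges}
print(len(vertices), len(edges), sources == {root}, targets == {()})
```

Since each deletion strictly shortens the word, no directed cycle can occur.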

The figures in this section were computer generated. Though we do not include the characters in our presentation, the labels on the vertices are separated by commas to improve readability.

Example 4.6.

The global word graph $G_{2}$ of size $2$ is shown in Figure 11, with the word graph rooted at $1122$ highlighted in blue. The vertex set is $V(G_{2})=\Sigma_{DOW}^{\leq 2}/_{\sim}=\{\epsilon,11,1221,1212,1122\}$ . The word graph rooted at $w=1234523541$ whose successors are computed in Example 4.4 is depicted in Figure 12.

Figure 11: Global word graph of size $2$ .

4.3 DOW Operations and Their Effect on Word Graphs

4.3.1 Operations that Result in Isomorphic Word Graphs

We present some results on the cases where insertions, substitutions, or reversal do not affect the word graphs of DOWs. A prodsimplicial complex associated to $w\in\Sigma_{DOW}$ is the prodsimplicial complex associated with $G_{w}$ . We consider operations that yield classes of DOWs whose complexes have similar topological properties.

To find all predecessors of a given DOW $w$ we consider all possible insertions to $w$ up to ascending order equivalence. In [9] the insertions that yield equivalent DOWs and hence corresponding isomorphic word graphs were characterized. We present here results about DOWs that are not ascending order equivalent but yield isomorphic word graphs.

Definition 4.7.

Let $w,w^{\prime}\in\Sigma_{DOW}$ be such that $\Sigma[w]\cap\Sigma[w^{\prime}]=\varnothing$ . We define the concatenation of $w$ and $w^{\prime}$ as the DOW that is ascending order equivalent to $ww^{\prime}$ .

Proposition 4.8.

Let $w\in\Sigma_{DOW}$ , and $u\in M^{SOW}_{w}$ . Let $v\in(\Sigma\setminus\Sigma[w])_{SOW}$ and let

	$\displaystyle w^{\prime}$	$\displaystyle=xu_{1}vu_{2}yu_{1}vu_{2}z$
	$\displaystyle w^{\prime\prime}$	$\displaystyle=xu_{1}vu_{2}yu_{2}^{R}v^{R}u_{1}^{R}z$

where $u_{1},u_{2}\in\Sigma^{*}$ such that $u=u_{1}u_{2}$ . Then $G_{w}\cong G_{w^{\prime}}\cong G_{w^{\prime\prime}}$ .

Proof.

In this case $u_{1}vu_{2}u_{1}vu_{2}$ is a maximal repeat word in $w^{\prime}$ . Note that $M^{SOW}_{w^{\prime}}=(M^{SOW}_{w}\setminus\{u\})\cup\{u_{1}vu_{2}\}$ but $d_{u}(w)=d_{u_{1}vu_{2}}(w^{\prime})$ . Hence, the word graphs of $w$ and $w^{\prime}$ are isomorphic: $G_{w^{\prime}}\cong G_{w}$ . The case when $uu^{R}$ is a maximal return word is similar. ∎

We call $w\in\Sigma_{DOW}$ a palindrome if $w^{R}\sim w$ . For example, the DOW $w=123231$ is a palindrome, as $w^{R}=132321\sim 123231$ . For a palindrome $w$ , both $w$ and $w^{R}$ are in the same ascending order equivalence class so that the corresponding word graphs are the same. As a result, the reversal operation has no effect on the prodsimplicial complex obtained.

Lemma 4.9.

Let $w\in\Sigma_{DOW}$ . Then $G_{w}\cong G_{w^{R}}$ .

Proof.

Note that $u$ is a repeat (resp. return) word in $w$ if and only if $u^{R}$ is a repeat (resp. return) word in $w^{R}$ . Then if $v$ is a successor of $u$ , for each edge $[u,v]\in G_{w}$ we have a corresponding edge $[u^{R},v^{R}]\in E(G_{w^{R}})$ . This bijection between the edges induces an isomorphism between the graphs. ∎

Example 4.10.

Let $w=122133$ . Then $w^{R}=331221\sim 112332$ . The word graphs rooted at $122133$ and $112332$ are isomorphic, as shown in Figure 13.
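The correspondence used in the proof of Lemma 4.9 can be checked computationally on Example 4.10. The sketch below is illustrative; it assumes tuple-encoded words, and all helper names are ours. It maps every edge $[u,v]$ of $G_{w}$ to the pair of ascending order representatives of $u^{R}$ and $v^{R}$ and compares the result with the edge set of $G_{w^{R}}$ .

```python
def ascending(word):
    """Relabel symbols as 1, 2, 3, ... in order of first appearance."""
    relabel = {}
    for symbol in word:
        relabel.setdefault(symbol, len(relabel) + 1)
    return tuple(relabel[symbol] for symbol in word)


def maximal_factors(word):
    """Maximal repeat/return factors of a DOW, found by a left-to-right scan."""
    first, second = {}, {}
    for i, symbol in enumerate(word):
        if symbol in first:
            second[symbol] = i
        else:
            first[symbol] = i
    factors, i = [], 0
    while i < len(word):
        if first[word[i]] != i:  # a second occurrence never starts a factor
            i += 1
            continue
        best = 1
        for step in (1, -1):  # +1 grows a repeat uu, -1 grows a return uu^R
            k = 1
            while (i + k < len(word)
                   and first[word[i + k]] == i + k
                   and second[word[i + k]] == second[word[i]] + step * k):
                k += 1
            best = max(best, k)
        factors.append(tuple(word[i:i + best]))
        i += best
    return factors


def successors(word):
    """Immediate successors D(w) as ascending-order tuples."""
    return {ascending(tuple(s for s in word if s not in u))
            for u in maximal_factors(word)}


def word_graph(root):
    """Vertices and directed edges of the word graph rooted at the given DOW."""
    root = ascending(root)
    vertices, edges, stack = {root}, set(), [root]
    while stack:
        w = stack.pop()
        for v in successors(w):
            edges.add((w, v))
            if v not in vertices:
                vertices.add(v)
                stack.append(v)
    return vertices, edges


def reverse_class(word):
    """Ascending order representative of the reversed word."""
    return ascending(word[::-1])


w = (1, 2, 2, 1, 3, 3)  # the word 122133
vertices, edges = word_graph(w)
rev_vertices, rev_edges = word_graph(reverse_class(w))
mapped = {(reverse_class(a), reverse_class(b)) for a, b in edges}
print(reverse_class(w), mapped == rev_edges)
```

The same function detects palindromes: a DOW in ascending order is a palindrome exactly when `reverse_class` returns it unchanged, as for $123231$ .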

For the rest of this section, we consider the effect of substituting a repeat word $uu$ by the return word $uu^{R}$ on word graphs. In some cases we may substitute a maximal repeat word in a DOW $w$ for a maximal return word in $w$ without affecting the word graph. The following properties seem to play an important role in this context.

Definition 4.11 (square repeat or return words).

Let $w\in\Sigma_{DOW}$ and let $u\in M^{SOW}_{w}$ . We say that $u$ is a square factor of $w$ if there exists $v\in M^{SOW}_{w}$ such that $v\neq u$ but $\lvert v\rvert=\lvert u\rvert$ . A DOW $w$ is said to be squarefree if it has no square factors.

Note that if $w$ is squarefree and $uu\in M^{DOW}_{w}$ then $vv,vv^{R}\not\in M^{DOW}_{w}$ for all $v\in M^{SOW}_{w}$ with $\lvert u\rvert=\lvert v\rvert$ . Since no repeat or return factors appear more than once in $w$ , we also have that all deletions $d_{u}(w)$ for $u\in M^{SOW}_{w}$ result in distinct DOWs. In particular, all subwords of $w$ are also squarefree.

Example 4.12 (square factor, squarefree words).

Let $w=12123434$ . Then $M^{SOW}_{w}=\{12,34\}$ and $12$ is a square factor of $w$ . Let $x,y,z\in\Sigma_{SOW}$ have distinct lengths. Then $w^{\prime}=xyxzyz$ is squarefree.

Definition 4.13 (coprime).

Given two DOWs $w$ and $w^{\prime}$ where $\Sigma[w]\cap\Sigma[w^{\prime}]=\varnothing$ , we say $w$ is coprime to $w^{\prime}$ if all words of the form $uv$ where $u\in V(G_{w})$ and $v\in V(G_{w^{\prime}})$ are distinct in ascending order.

By the definition for coprime words, words $w$ and $w^{\prime}$ are coprime if for all $u,u^{\prime}\in V(G_{w})$ and all $v,v^{\prime}\in V(G_{w^{\prime}})$ , $u\neq u^{\prime}$ and $v\neq v^{\prime}$ implies $uv\not\sim u^{\prime}v^{\prime}$ .

Example 4.14 (coprime words).

The word $w=12234143$ has successors $\{1221,112332,123132,11,\epsilon\}$ , and the word $w^{\prime}=5678978956$ has successors $\{5656,789789,\epsilon\}.$ It can be checked that all concatenations are distinct, therefore $w$ and $w^{\prime}$ are coprime.

Lemma 4.15.

Let $w^{\prime}$ be a successor of $w$ . Then $G_{w^{\prime}}$ is an induced subgraph of $G_{w}$ .

Lemma 4.16.

If $w$ is coprime to $w^{\prime}$ , then $w^{\prime}$ is coprime to $w$ .

Proof.

Suppose $w^{\prime}$ is not coprime to $w$ and let $u,u^{\prime}\in V(G_{w})$ (where $u\neq u^{\prime}$ ) and $v,v^{\prime}\in V(G_{w^{\prime}})$ (where $v\neq v^{\prime}$ ) be such that $uv=u^{\prime}v^{\prime}$ . Note that since $\epsilon\in V(G_{w})\cap V(G_{w^{\prime}})$ , if there exists a nonempty DOW $x\in V(G_{w})\cap V(G_{w^{\prime}})$ then $w$ cannot be coprime to $w^{\prime}$ , as $\epsilon x=x\epsilon=x$ . Without loss of generality, let $\lvert u\rvert<\lvert u^{\prime}\rvert$ . Then $u^{\prime}=ux$ for some DOW $x$ (since $u\in\Sigma_{DOW}$ ) and $v=xv^{\prime}$ , so that $M^{SOW}_{v}=M^{SOW}_{v^{\prime}}\cup M^{SOW}_{x}$ . Moreover, for all $y\in M^{SOW}_{v^{\prime}}$ , if $v_{1}=d_{y}(v)$ then $v=xv_{1}^{\prime}$ where $v_{1}^{\prime}=d_{y}(v^{\prime})$ . Inductively, if $v$ reduces to $v_{1}^{\prime},v_{2}^{\prime},\ldots,\epsilon$ via the deletion of $y_{1},y_{2},\ldots,y_{k}$ , then $v$ reduces to $x$ via the deletion of $y_{1},y_{2},\ldots,y_{k}$ . That is, $x\in V(G_{w})$ . Similarly, if $u^{\prime}=ux$ we have that $x\in V(G_{w^{\prime}})$ , so that $w^{\prime}$ is not coprime to $w$ . ∎

Given this symmetry, instead of saying that $w$ is coprime to $w^{\prime}$ , we say that $w$ and $w^{\prime}$ are coprime.

In the following examples, $w=xyz$ and $u$ are coprime words and the word $w^{\prime}$ (resp. $w^{\prime\prime}$ ) resulting from the insertion of $uu$ (resp. $uu^{R}$ ) in $w$ is squarefree. In one instance, the substitution results in $G_{w^{\prime}}\cong G_{w}$ , while in another it does not. We conjecture that squarefree and coprime are necessary conditions for invariance of word graphs under substitution of repeat and return words.

Example 4.17 (substitution results in isomorphic word graphs).

Let $x=12$ , $y=345$ , $z=54312$ and $u=6789$ . Then $w^{\prime}=xuyuz\sim 123456789345987612$ and $w^{\prime\prime}=xuyu^{R}z\sim 123456789543987612$ . Here, $uu$ and $xyz$ are coprime, and both insertions result in squarefree words, with $G_{w^{\prime}}\cong G_{w^{\prime\prime}}$ as depicted in Figure 14.

Example 4.18 (substitution does not result in isomorphic word graphs).

Let $x=1$ , $y=12345$ , $z=5432$ and $u=67$ . Then $w^{\prime}=xuyuz\sim 12314567237654$ and $w^{\prime\prime}=xuyu^{R}z\sim 12314567327654$ . Here we have that $uu$ is coprime to $xyz$ and the resulting insertions are squarefree, but $G_{w^{\prime}}\not\cong G_{w^{\prime\prime}}$ , as depicted in Figure 15.

4.3.2 Doubling Effect

Concatenation of a repeat (resp. return) word $uu$ (resp. $uu^{R}$ ) at the end of an existing word $w$ creates two subgraphs isomorphic to $G_{w}$ within $G_{wuu}$ (resp. $G_{wuu^{R}}$ ). These two subgraphs may not be disjoint unless we impose additional conditions, as discussed in the examples below.

Lemma 4.19.

Let $w\in\Sigma_{DOW}$ and $uu\in(\Sigma\setminus\Sigma[w])_{DOW}$ be coprime. Then there exist two distinct (but possibly not disjoint) subgraphs $G_{1}$ and $G_{2}$ of $G_{wuu}$ isomorphic to $G_{w}$ . Similarly, $G_{uuw}$ has two subgraphs isomorphic to $G_{w}$ . Moreover, we have $G_{wuu}\cong G_{uuw}$ .

Proof.

Without loss of generality we prove the result for $G_{wuu}$ only. We claim that for each $v\in V(G_{w})$ , the word $vuu$ is ascending order equivalent to some $v^{\prime}\in V(G_{wuu})$ . Indeed, if $v$ is obtained from $w$ through iterated deletions so that

v\sim d_{x_{1}}(d_{x_{2}}(d_{x_{3}}(\cdots d_{x_{n}}(w))\cdots))

where $x_{i}\not\sim u$ for all $1\leq i\leq n$ , then $v^{\prime}\sim vuu$ is obtained from $wuu$ through

vuu\sim d_{x_{1}}(d_{x_{2}}(d_{x_{3}}(\cdots d_{x_{n}}(wuu))\cdots)).

Let $\iota:V(G_{w})\rightarrow V(G_{wuu})$ be the inclusion of vertices and let $G_{1}$ be the graph induced by the image of $G_{w}$ in $G_{wuu}$ . Let $G_{2}$ be the induced subgraph of $G_{wuu}$ generated by the set of vertices of the form $v^{\prime}\sim vuu$ for $v\in V(G_{w})$ . Define $f:V(G_{1})\rightarrow V(G_{2})$ where $v\mapsto v^{\prime}\sim vuu$ . Due to the above correspondence, $f$ is a bijection on the sets of vertices. Note that if $[v_{1},v_{2}]\in G_{w}$ then $v_{2}\sim d_{x}(v_{1})$ for some $x\in M^{SOW}_{v_{1}}$ . This holds if and only if $v_{2}uu\sim d_{x}(v_{1}uu)$ or $[v_{1}uu,v_{2}uu]\in G_{wuu}$ so that $f$ induces a bijection between edge sets that preserves vertex adjacency. It follows that $G_{1}\cong G_{2}$ . ∎

Example 4.20.

The word graph corresponding to $12132344$ has two subgraphs isomorphic to the pentagon $G_{121323}$ in it as described in Lemma 4.19, which are not disjoint. This can be seen in Figure 16, where the two subgraphs are highlighted in different colors.

4.3.3 Product Effect

In this subsubsection we study word graphs of concatenated words.

Theorem 4.21.

Let $w,w^{\prime}\in\Sigma_{DOW}$ be coprime. Then we have $G_{ww^{\prime}}\cong G_{w}\square G_{w^{\prime}}$ .

Proof.

We begin by noting that from the definition of concatenation, every vertex in $G_{ww^{\prime}}$ can be decomposed into $u_{i}v_{j}$ where $u_{i}\sim u\in V(G_{w})$ and $v_{j}\sim v\in V(G_{w^{\prime}})$ . We define a map $f:V(G_{w}\square G_{w^{\prime}})\rightarrow V(G_{ww^{\prime}})$ where $(u,v)\mapsto u_{i}v_{j}$ . By hypothesis, since $w$ and $w^{\prime}$ are coprime, $u\neq u^{\prime}$ and $v\neq v^{\prime}$ implies $uv\not\sim u^{\prime}v^{\prime}$ for all $u,u^{\prime}\in V(G_{w})$ and all $v,v^{\prime}\in V(G_{w^{\prime}})$ . This makes $f$ one-to-one. Surjectivity follows from the fact that no successors of $ww^{\prime}$ will arise other than by concatenation of successors of $w$ and successors of $w^{\prime}$ . It follows that $f$ is a bijection on the sets of vertices.

We abuse the notation and write the vertices of $G_{ww^{\prime}}$ as $uv$ where $\Sigma[u]\cap\Sigma[v]=\varnothing$ , with $u\sim u^{\prime}\in V(G_{w})$ and $v\sim v^{\prime}\in V(G_{w^{\prime}})$ following the decomposition described above. Let $[(u_{i},v_{j}),(u_{i}^{\prime},v_{j}^{\prime})]\in E(G_{w}\square G_{w^{\prime}})$ . Then either $u_{i}=u_{i}^{\prime}$ and $v_{j}^{\prime}\sim d_{x}(v_{j})$ for some $x\in M^{SOW}_{v_{j}}$ or $v_{j}=v_{j}^{\prime}$ and $u_{i}^{\prime}\sim d_{y}(u_{i})$ for some $y\in M^{SOW}_{u_{i}}$ . Note that since $w$ and $w^{\prime}$ share no symbols we may write $M^{DOW}_{ww^{\prime}}=M^{DOW}_{w}\sqcup M^{DOW}_{w^{\prime}}$ . We observe that for $u_{i}v_{j},u_{i}^{\prime}v_{j}^{\prime}\in V(G_{ww^{\prime}})$ there exists an edge $[u_{i}v_{j},u_{i}^{\prime}v_{j}^{\prime}]\in E(G_{ww^{\prime}})$ if and only if $u_{i}=u_{i}^{\prime}$ and $[v_{j},v_{j}^{\prime}]\in E(G_{w^{\prime}})$ or $[u_{i},u_{i}^{\prime}]\in E(G_{w})$ and $v_{j}=v_{j}^{\prime}$ . This implies that $[(u_{i},v_{j}),(u_{i}^{\prime},v_{j}^{\prime})]\in E(G_{w}\square G_{w^{\prime}})$ if and only if $[f((u_{i},v_{j})),f((u_{i}^{\prime},v_{j}^{\prime}))]\in E(G_{ww^{\prime}})$ , which indicates that $G_{ww^{\prime}}\cong G_{w}\square G_{w^{\prime}}$ as desired. ∎
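Theorem 4.21 can be illustrated on the coprime pair of Example 4.14. The sketch below is illustrative and assumes tuple-encoded words; all helper names are ours. It builds $G_{w}$ , $G_{w^{\prime}}$ and $G_{ww^{\prime}}$ , forms the edges of the Cartesian product, and sends each product vertex $(u,v)$ to the ascending order representative of the concatenation $uv$ , as in the proof.

```python
def ascending(word):
    """Relabel symbols as 1, 2, 3, ... in order of first appearance."""
    relabel = {}
    for symbol in word:
        relabel.setdefault(symbol, len(relabel) + 1)
    return tuple(relabel[symbol] for symbol in word)


def maximal_factors(word):
    """Maximal repeat/return factors of a DOW, found by a left-to-right scan."""
    first, second = {}, {}
    for i, symbol in enumerate(word):
        if symbol in first:
            second[symbol] = i
        else:
            first[symbol] = i
    factors, i = [], 0
    while i < len(word):
        if first[word[i]] != i:  # a second occurrence never starts a factor
            i += 1
            continue
        best = 1
        for step in (1, -1):  # +1 grows a repeat uu, -1 grows a return uu^R
            k = 1
            while (i + k < len(word)
                   and first[word[i + k]] == i + k
                   and second[word[i + k]] == second[word[i]] + step * k):
                k += 1
            best = max(best, k)
        factors.append(tuple(word[i:i + best]))
        i += best
    return factors


def successors(word):
    """Immediate successors D(w) as ascending-order tuples."""
    return {ascending(tuple(s for s in word if s not in u))
            for u in maximal_factors(word)}


def word_graph(root):
    """Vertices and directed edges of the word graph rooted at the given DOW."""
    root = ascending(root)
    vertices, edges, stack = {root}, set(), [root]
    while stack:
        w = stack.pop()
        for v in successors(w):
            edges.add((w, v))
            if v not in vertices:
                vertices.add(v)
                stack.append(v)
    return vertices, edges


def concat(u, v):
    """Representative of uv, shifting v so that the two alphabets are disjoint."""
    return ascending(u + tuple(s + len(u) for s in v))


w1 = (1, 2, 2, 3, 4, 1, 4, 3)        # the word 12234143
w2 = (5, 6, 7, 8, 9, 7, 8, 9, 5, 6)  # the word 5678978956
V1, E1 = word_graph(w1)
V2, E2 = word_graph(w2)
V12, E12 = word_graph(w1 + w2)

# Edges of the Cartesian product, pushed forward along (u, v) -> uv.
product_edges = {(concat(a, v), concat(b, v)) for a, b in E1 for v in V2}
product_edges |= {(concat(u, c), concat(u, d)) for u in V1 for c, d in E2}
product_vertices = {concat(u, v) for u in V1 for v in V2}
print(product_vertices == V12, product_edges == E12)
```

Coprimality is what makes the vertex map injective; without it, distinct pairs $(u,v)$ can collapse to the same concatenation and the product has more vertices than $G_{ww^{\prime}}$ .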

In the case where $w^{\prime}$ is a repeat or return word, with word graph having two vertices connected by an edge, we have the following remark.

Remark 4.22.

Note that if $w\in\Sigma_{DOW}$ and $u\in\Sigma_{SOW}$ are such that $uu\not\sim v$ for all $v\in V(G_{w})$ , then $G_{wuu}\cong G_{w}\square\Delta^{1}$ . Similarly, if $uu^{R}\not\sim v$ for all $v\in V(G_{w})$ then $G_{wuu^{R}}\cong G_{w}\square\Delta^{1}$ .

5 Betti Numbers and Generators for Word Graphs

Having shown the result in general for arbitrary graphs, we now focus on consistently directed graphs corresponding to word graphs of DOWs to study the realizability of Betti numbers. Note that for DOWs $w$ and $w^{\prime}$ , if $w\in M^{DOW}_{w^{\prime}}$ then $G_{w}\leq G_{w^{\prime}}$ . To find generators for the first and second homology groups among word graphs of DOWs we therefore turn to minimal size words whose corresponding graphs attain particular $\beta_{1}$ and $\beta_{2}$ values.

A Python script (available at https://github.com/fajardogomez/dowgraphs) was used to create the prodsimplicial complex associated with any directed graph $G$ and compute Betti numbers.

5.1 Betti Numbers of Word Graphs

Let us consider the minimal size nontrivial 1-cycles among word graphs of DOWs. We claim that for each of the following DOWs $w$ , the word graph $G_{w}$ satisfies $H_{1}(\Gamma(G_{w}))\cong\mathbb{Z}$ and $H_{n}(\Gamma(G_{w}))=0$ for $n\geq 2$ :

1.\quad 121323,\qquad 2.\quad 122331.

The corresponding word graphs are shown in Figure 17.

The result follows from Lemma 3.3, as the graphs consist of parallel paths from source to target of lengths 2 and 3, five edges in total, forming a pentagon. The DOWs on the least number of symbols that result in a generator for the first homology group are $121323$ and $122331$ .
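The pentagon shape of these two word graphs can be confirmed by listing the lengths of all directed paths from source to target. The sketch below is illustrative and assumes tuple-encoded words; the helper names are ours. It only inspects the graphs and does not compute homology.

```python
def ascending(word):
    """Relabel symbols as 1, 2, 3, ... in order of first appearance."""
    relabel = {}
    for symbol in word:
        relabel.setdefault(symbol, len(relabel) + 1)
    return tuple(relabel[symbol] for symbol in word)


def maximal_factors(word):
    """Maximal repeat/return factors of a DOW, found by a left-to-right scan."""
    first, second = {}, {}
    for i, symbol in enumerate(word):
        if symbol in first:
            second[symbol] = i
        else:
            first[symbol] = i
    factors, i = [], 0
    while i < len(word):
        if first[word[i]] != i:  # a second occurrence never starts a factor
            i += 1
            continue
        best = 1
        for step in (1, -1):  # +1 grows a repeat uu, -1 grows a return uu^R
            k = 1
            while (i + k < len(word)
                   and first[word[i + k]] == i + k
                   and second[word[i + k]] == second[word[i]] + step * k):
                k += 1
            best = max(best, k)
        factors.append(tuple(word[i:i + best]))
        i += best
    return factors


def successors(word):
    """Immediate successors D(w) as ascending-order tuples."""
    return {ascending(tuple(s for s in word if s not in u))
            for u in maximal_factors(word)}


def word_graph(root):
    """Vertices and directed edges of the word graph rooted at the given DOW."""
    root = ascending(root)
    vertices, edges, stack = {root}, set(), [root]
    while stack:
        w = stack.pop()
        for v in successors(w):
            edges.add((w, v))
            if v not in vertices:
                vertices.add(v)
                stack.append(v)
    return vertices, edges


def path_lengths(word):
    """Sorted lengths of all directed paths from the word to the empty word."""
    if not word:
        return [0]
    return sorted(n + 1 for v in successors(word) for n in path_lengths(v))


for w in [(1, 2, 1, 3, 2, 3), (1, 2, 2, 3, 3, 1)]:  # 121323 and 122331
    vertices, edges = word_graph(w)
    print(w, len(vertices), len(edges), path_lengths(w))
```

For both words the graph has five vertices and five edges, with one path of length 2 and one of length 3.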

Note that a $1$ -cycle that is a square cannot be a generator of the first homology group because every square in a word graph consists of two paths of length 2, and therefore bounds a 2-cell. In all examples we have examined, $n$ -gons representing nontrivial 1-cycles for $n\geq 6$ can be reduced to pentagons, so we conjecture that all nontrivial 1-cycles are homologous to pentagons.

Example 5.1.

Let $u_{1}$ , $u_{2}$ and $\alpha$ be SOWs with $\Sigma[u_{1}]\cap\Sigma[u_{2}]\cap\Sigma[\alpha]=\varnothing$ . The following word patterns result in pentagons and generalize the word graphs of $121323$ and $122331$ : $\alpha u_{1}u_{1}^{R}u_{2}u_{2}^{R}\alpha^{R}$ and $u_{1}\alpha u_{1}u_{2}\alpha u_{2}$ provided that $\lvert u_{1}\rvert=\lvert u_{2}\rvert$ .

On the other hand, all types of polyhedra that represent generators of nontrivial second homology discussed in Section 3.2 can be realized in word graphs.

The complex corresponding to the word graph rooted at $12323414$ is homeomorphic to that of the graph in Lemma 3.5. This can be verified by referring to Figure 18. Other word graphs have subgraphs isomorphic to those described in Lemma 3.10 and Lemma 3.4. For example, the word graph rooted at $12323144$ contains both.

5.2 Betti Numbers of Successors

In general, if $u$ is a successor of $w$ , it does not follow that $\beta_{n}(G_{w})\geq\beta_{n}(G_{u})$ . Inspired by the work in [3], we define the $n$ -symbol tangled cord as the word

t_{n}:=1213243\cdots(n-1)(n-2)n(n-1)n.

Note that $d_{n}(t_{n})=t_{n-1}$ so that $t_{n-1}\in V(G_{t_{n}})$ for any $n>1$ . For example, $t_{6}=121324354656$ has (the word corresponding to) the tangled cord of size five $t_{5}=1213243545$ as a successor. Based on results from the custom Python script, we have $\beta_{1}(G_{t_{6}})=1$ and $\beta_{1}(G_{t_{5}})=2$ . This is the smallest pair of consecutive tangled cords where this inversion in Betti numbers is present. The next inversion occurs for $\beta_{2}$ when $n=11$ , as shown in Table 1.

$n$	$t_{n}$	$\beta_{1}(G_{t_{n}})$	$\beta_{2}(G_{t_{n}})$	$\lvert V(G_{t_{n}})\rvert$
2	1,2,1,2	0	0	2
3	1,2,1,3,2,3	1	0	5
4	1,2,1,3,2,4,3,4	1	2	8
5	1,2,1,3,2,4,3,5,4,5	2	6	13
6	1,2,1,3,2,4,3,5,4,6,5,6	1	27	21
7	1,2,1,3,2,4,3,5,4,6,5,7,6,7	1	54	34
8	1,2,1,3,2,4,3,5,4,6,5,7,6,8,7,8	1	86	55
9	1,2,1,3,2,4,3,5,4,6,5,7,6,8,7,9,8,9	1	111	89
10	1,2,1,3,2,4,3,5,4,6,5,7,6,8,7,9,8,10,9,10	1	126	144
11	1,2,1,3,2,4,3,5,4,6,5,7,6,8,7,9,8,10,9,11,10,11	1	116	233
12	1,2,1,3,2,4,3,5,4,6,5,7,6,8,7,9,8,10,9,11,10,12,11,12	1	112	377

Table 1: Tangled cords and invariants of their word graphs.
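The tangled cords and the vertex counts in the last column of Table 1 can be regenerated for small $n$ . The sketch below is illustrative and is not the script used to produce the table. It assumes tuple-encoded words, uses helper names of our choosing, and does not compute Betti numbers.

```python
def tangled_cord(n):
    """The n-symbol tangled cord 1 2 1 3 2 4 3 ... n (n-1) n."""
    word = [1]
    for k in range(2, n + 1):
        word += [k, k - 1]
    word.append(n)
    return tuple(word)


def ascending(word):
    """Relabel symbols as 1, 2, 3, ... in order of first appearance."""
    relabel = {}
    for symbol in word:
        relabel.setdefault(symbol, len(relabel) + 1)
    return tuple(relabel[symbol] for symbol in word)


def maximal_factors(word):
    """Maximal repeat/return factors of a DOW, found by a left-to-right scan."""
    first, second = {}, {}
    for i, symbol in enumerate(word):
        if symbol in first:
            second[symbol] = i
        else:
            first[symbol] = i
    factors, i = [], 0
    while i < len(word):
        if first[word[i]] != i:  # a second occurrence never starts a factor
            i += 1
            continue
        best = 1
        for step in (1, -1):  # +1 grows a repeat uu, -1 grows a return uu^R
            k = 1
            while (i + k < len(word)
                   and first[word[i + k]] == i + k
                   and second[word[i + k]] == second[word[i]] + step * k):
                k += 1
            best = max(best, k)
        factors.append(tuple(word[i:i + best]))
        i += best
    return factors


def successors(word):
    """Immediate successors D(w) as ascending-order tuples."""
    return {ascending(tuple(s for s in word if s not in u))
            for u in maximal_factors(word)}


def vertex_count(root):
    """Number of vertices of the word graph rooted at the given DOW."""
    seen, stack = {ascending(root)}, [ascending(root)]
    while stack:
        for v in successors(stack.pop()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen)


for n in range(2, 6):
    t = tangled_cord(n)
    deleted = tuple(s for s in t if s != n)  # the deletion d_n(t_n)
    print(n, t, deleted == tangled_cord(n - 1), vertex_count(t))
```

For $n=2,\ldots,5$ this yields the vertex counts 2, 5, 8, 13 listed in Table 1.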

It is of interest to find conditions for a successor $w^{\prime}$ of $w$ such that $\beta_{n}(G_{w})\geq\beta_{n}(G_{w^{\prime}})$ does not hold for some $n\geq 1$ .

6 Concluding Remarks

In this paper, a method to study topological properties of digraphs is proposed, by constructing a prodsimplicial complex for a given digraph. The construction provides an approach for TDA to be used on biological data that can be described with graphs. A specific such example, called the word graph, is presented that models assembly pathways in a ciliate species via DOWs [19]. The proposed construction of prodsimplicial complexes is applied to word graphs, and the Betti numbers are determined. The effects of various operations, such as concatenation of words, on word graphs are studied, and types of generators of homology are examined.

A number of problems remain unsolved. Lemma 3.3 characterizes generators of the first homology groups for digraphs and Corollary 3.9 addresses the realizability of combinations of $\beta_{1}$ and $\beta_{2}$ . However, we do not know all possible combinations of values of Betti numbers $\beta_{1}$ and $\beta_{2}$ over all word graphs. Though only a few specific examples of DOWs where the corresponding word graph has nontrivial 1-cycles and 2-cycles are included, more can be obtained through the DOW operations described in Section 4.3.1. For example, the concatenation of $121323$ with $4545$ corresponds to the Cartesian product of the word graphs (as shown in Remark 4.22) and therefore has no effect on Betti numbers so that $\beta_{1}(G_{1213234545})=\beta_{1}(G_{121323})=1$ . However, these methods would not create an exhaustive list of all graphs attaining particular given Betti numbers. It is desirable to characterize the effect of other word operations, such as insertions that disrupt maximal repeat or return words and thus may alter the homology groups of the complex on their word graphs.

We expect that the list of types of 1- and 2-cycles that represent nontrivial classes, such as the ones in Lemma 3.3, Lemma 3.4, and Lemma 3.5 may not be exhausted. We found pentagons as in Figure 17 as nontrivial 1-cycles in word graphs. We do not know whether there are other polygons that represent nontrivial cycles. The SageMath [10] topology package has an option to find generators for the homology groups. However, the linear combinations in the output are not necessarily optimized to produce minimal generators. For instance, a generator for the first homology group in the word graph of $t_{10}$ is listed as a linear combination of nine edges. However, when these are drawn as an induced subgraph it becomes apparent that there exist edges between the vertices in the cycle so that it is homologous to a class represented by a pentagon.

It is also of interest to classify possible polyhedra that represent nontrivial second homology classes. We gave only partial answers in Section 3. Polyhedra such as “pillows” formed by two squares sharing all four boundary edges, are not allowed in the construction of the prodsimplicial complex. All word graphs computed here with nontrivial 1-cycles or 2-cycles in their corresponding prodsimplicial complex have subgraphs isomorphic to those in Lemmas 3.1, 3.2, 3.3, 3.4, 3.10 and 3.5. The problem of finding an exhaustive list of generators using a minimal number of vertices and edges remains open.

As a note on homology groups in general, SageMath outputs full groups instead of just Betti numbers as presented here. In an analysis of all DOWs of up to 7 symbols, no DOWs had a word graph whose corresponding prodsimplicial complex had torsion [12]. As the number of DOWs for each size increases superexponentially, homology groups were computed on longer DOWs for only samples of randomly generated DOWs and special word categories. As an interesting find, the homology groups of the tangled cord on ten symbols, $t_{10}$ , had 2-torsion in the first (or second) homology group. Due to the large number of vertices and edges in the generator, however, it is not possible to find a type of generator of the 2-torsion. It remains an open problem to find and characterize DOWs whose corresponding prodsimplicial complexes have torsion in their homology groups.

The construction of prodsimplicial complexes for directed graphs presented in this paper is a first attempt to apply TDA to data sets consisting of graphs, designed specifically for the biological outputs called word graphs. Other graph outputs have been obtained in biology, and development of TDA for such outputs in more general situations is desirable. More generally, we propose constructions of custom-built cell complexes designed for the purpose of studying individual biological problems. Polyhedral cells are to be specified to build a complex in such a way that they are closed under boundary operators and are appropriate for a given biological situation.

Word graphs of DOWs correspond to reductions of chord diagrams, and we expect applications to areas related to chord diagrams. The reductions of repeat and return words can be generalized to other rewriting systems. It is desirable to apply similar constructions of complexes and use of their homology for confluent rewriting systems.

Acknowledgments

This research was (partially) supported by the grants NSF DMS-2054321 and CCF-2107267, the Simons Fellow grant from the Simons Foundation, and the W.M. Keck Foundation. In addition, this research was conducted under the auspices of the Southeast Center for Mathematics and Biology, an NSF-Simons Research Center for Mathematics of Complex Biological Systems, under National Science Foundation Grant No. DMS-1764406 and Simons Foundation Grant No. 594594.

References

[1] Ryan Arredondo “Reductions on Double Occurrence Words” In arXiv.org Ithaca: Cornell University Library, arXiv.org, 2013
[2] Ryan C. Arredondo “Properties of graphs used to model DNA recombination” ProQuest Dissertations Publishing, 2014
[3] Jonathan Burns et al. “Four-regular graphs with rigid vertices associated to DNA recombination” In Discrete Applied Mathematics 161.10, 2013, pp. 1378–1394
[4] Jonathan Burns et al. “Recurring patterns among scrambled genes in the encrypted genome of the ciliate Oxytricha trifallax” In Journal of Theoretical Biology 410, 2016, pp. 171–180
[5] J. Carter, Victoria Lebed and Seung Yeop Yang “A prismatic classifying space” In Nonassociative mathematics and its applications 721, Contemp. Math. Amer. Math. Soc., Providence, RI, 2019, pp. 43–68
[6] J. Carter, Atsushi Ishii, Masahico Saito and Kokoro Tanaka “Homology for quandles with partial group operations” In Pacific Journal of Mathematics 287.1, 2017, pp. 19–48
[7] Andre R. Cavalcanti and Laura F. Landweber “Insights into a Biological Computer: Detangling Scrambled Genes in Ciliates” In Nanotechnology: Science and Computation, Natural Computing Series Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 349–359
[8] “Classes of Directed Graphs”, Springer Monographs in Mathematics Cham: Springer International Publishing, 2018
[9] Daniel A. Cruz et al. “Insertions Yielding Equivalent Double Occurrence Words” In Fundamenta Informaticae 171, 2020, pp. 113–132
[10] The Sage Developers et al. “SageMath, version 9.0”, 2020 URL: http://www.sagemath.org
[11] Reinhard Diestel “Graph Theory (Graduate Texts in Mathematics)” New York: Springer, 2005
[12] Lina Fajardo Gómez “Methods in Discrete Mathematics to Study DNA Rearrangement Processes”, 2022
[13] Joan Feigenbaum “Directed cartesian-product graphs have unique factorizations that can be computed in polynomial time” In Discrete Applied Mathematics 15.1, 1986, pp. 105–110
[14] Alexander Grigor’yan, Yong Lin, Yuri Muranov and Shing-Tung Yau “Homologies of path complexes and digraphs”, 2013 arXiv:1207.2834 [math.CO]
[15] Alexander A. Grigor’yan, Yu V. Muranov, Yong Lin and Shing-Tung Yau “Path Complexes and their Homologies” In Journal of Mathematical Science 248, 2020, pp. 564–599
[16] Mustafa Hajij, Nataša Jonoska, Denys Kukushkin and Masahico Saito “Graph based analysis for gene segment organization in a scrambled genome” In Journal of Theoretical Biology 494, 2020, pp. 110215
[17] Allen Hatcher “Algebraic Topology” New York: Cambridge University Press, 2002
[18] Wilfried Imrich and Sandi Klavžar “Product Graphs: Structure and Recognition” New York: Wiley, 2000
[19] Nataša Jonoska, Lukas Nabergall and Masahico Saito “Patterns and Distances in Words Related to DNA Rearrangement” In Fundamenta Informaticae 154, 2017, pp. 225–238
[20] Dimitry Kozlov “Combinatorial Algebraic Topology” New York: Springer Science & Business Media, 2007
[21] Paolo Masulli and Alessandro E. Villa “The topology of the directed clique complex as a network invariant” In SpringerPlus 5.1 Cham: Springer International Publishing, 2016, p. 388
[22] David M. Prescott and Arthur F. Greslin “Scrambled actin I gene in the micronucleus of Oxytricha nova” In Developmental genetics 13.1 Hoboken: Wiley Subscription Services, Inc., A Wiley Company, 1992, pp. 66–74
[23] Michael W. Reimann et al. “Cliques of Neurons Bound into Cavities Provide a Missing Link between Structure and Function” In Frontiers in Computational Neuroscience 11 Switzerland: Frontiers Research Foundation, 2017, p. 48
[24] Vadim Georgievich Vizing “The cartesian product of graphs” In Vyčisl. Sistemy 9, 1963, pp. 30–43
[25] V Talya Yerlici and Laura F Landweber “Programmed Genome Rearrangements in the Ciliate Oxytricha” In Microbiology spectrum 2.6, 2014
