
Exact-2-Relation Graphs

Yangjing Long (yangjing@mail.ccnu.edu.cn), Peter F. Stadler (stadler@bioinf.uni-leipzig.de)

School of Mathematics and Statistics, Center China Normal University, No. 152, Luoyu Road, Wuhan, Hubei, P. R. China
Bioinformatics Group, Department of Computer Science; Interdisciplinary Center for Bioinformatics; German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig; Competence Center for Scalable Data Services and Solutions Dresden-Leipzig; Leipzig Research Center for Civilization Diseases; and Centre for Biotechnology and Biomedicine, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
Facultad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Colombia
The Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, United States

Abstract

Pairwise compatibility graphs (PCGs) with non-negative integer edge weights recently have been used to describe rare evolutionary events and scenarios with horizontal gene transfer. Here we consider the case that vertices are separated by exactly two discrete events: Given a tree $T$ with leaf set $L$ and edge-weights $\lambda:E(T)\to\mathbb{N}_{0}$, the non-negative integer pairwise compatibility graph $\textrm{nniPCG}(T,\lambda,2,2)$ has vertex set $L$ and $xy$ is an edge whenever the sum of the non-negative integer weights along the unique path from $x$ to $y$ in $T$ equals $2$. A graph $G$ has a representation as $\textrm{nniPCG}(T,\lambda,2,2)$ if and only if its point-determining quotient $G/\!\overset{\bullet}{\sim}$ is a block graph, where two vertices are in relation $\overset{\bullet}{\sim}$ if they have the same neighborhood in $G$. If $G$ is of this type, a labeled tree $(T,\lambda)$ explaining $G$ can be constructed efficiently. In addition, we consider an oriented version of this class of graphs.

keywords:
Pairwise compatibility graphs, edge-labeled trees, thin graphs, block graphs, oriented graphs
MSC:
05C75, 05C05, 92B10

1 Introduction

Consider a tree $T$ with leaf set $V$ and a non-negative edge-weight function $\ell:E(T)\to\mathbb{R}^{+}_{0}$. Denote by $\mathcal{P}(x,y)$ the unique path between $x$ and $y$ in $T$. The canonical distance function $d_{T,\ell}:V\times V\to\mathbb{R}^{+}_{0}$ is then defined by

$$d_{T,\ell}(x,y)=\sum_{e\in\mathcal{P}(x,y)}\ell(e) \tag{1}$$

This definition is the starting point for mathematical phylogenetics, which is centered around finite additive metric spaces and their generalizations [1, 2]. It also serves as a basis for defining a large class of graphs that has recently received considerable attention. The pairwise compatibility graph $G=\mathrm{PCG}(T,\ell,d_{\min},d_{\max})$ has vertex set $V$ and edges

$$uv\in E(G)\quad\textrm{if and only if}\qquad d_{\min}\leq d_{T,\ell}(u,v)\leq d_{\max} \tag{2}$$

Originally introduced in the context of phylogenetics [3], they have received considerable interest in recent years, see [4, 5] and the references therein, as well as [6, 7, 8]. A further generalization to “multi-interval” PCGs is explored in [9]. In the setting of PCGs and most phylogenetic applications one usually stipulates that $\ell(e)>0$, measuring e.g. the time between two distinct events associated with adjacent vertices of $T$. A conceptually closely related class of graphs are the exact $k$-leaf powers [10], for which $\lambda(e)=1$ for all edges of $T$ and $d_{\min}=d_{\max}=k$.

In an alternative interpretation, $\ell(e)$ models the number of discrete evolutionary events along an edge $e$ of $T$. This is of interest in particular in the context of so-called rare genomic changes such as the gain or loss of a particular gene or gene family, or a particular genomic rearrangement [11]. Some of these convey phylogenetic information that is (nearly) free of homoplasy, i.e., the independent occurrence in independent lineages. Examples of such rare events are the emergence of novel microRNA families [12] or rearrangements of the genomic gene order [13]. Since such events are very unlikely to have occurred more than once in the same manner, they identify phylogenetic groupings that share such an innovation with very little ambiguity. This provides information to resolve also those parts of the phylogenetic tree where classical, sequence-based methods fail [14]. In this context it is necessary to allow $\ell(e)=0$ because the events of interest are by definition so rare that not all taxa will be distinguished by them. In the same vein, it is important that events are discrete, and hence that the weight function $\lambda:E(T)\to\mathbb{N}_{0}$ is integer-valued. Both conditions on $\ell$ cause subtle but important differences in comparison with the usual definition of PCGs, which requires non-zero edge lengths but otherwise allows arbitrary real values. We will denote these “non-negative integer pairwise compatibility graphs” by nniPCG to distinguish them from the better studied class of PCGs with non-zero real-valued edge weights $\lambda$.

The special case in which two leaves $x$ and $y$ of $T$ are separated by a single event, corresponding to graphs of the form $\mathrm{nniPCG}(T,\lambda,1,1)$, was explored in [15] as a model of rare events in evolution. It turns out that this graph class coincides with the forests. The graphs $\mathrm{nniPCG}(T,\lambda,1,\infty)$ requiring at least one event along the path between two leaves also have a very simple structure: they are exactly the complete multipartite graphs [16].

Considering a rooted tree $\vec{T}$ instead of an unrooted tree $T$, it is natural to consider the digraphs with edges $(x,y)$ whenever a certain number of events occurred between the last common ancestor $\mathop{lca}(x,y)$ and $y$. This construction appears naturally in the context of horizontal gene transfer (HGT), where one asks whether $\mathop{lca}(x,y)$ and $y$ are separated by at least one HGT event. (HGT refers to the import of a gene from an unrelated species, e.g., through an infection, ingestion, or acquisition via a plasmid.) This gives rise to the class of Fitch graphs [17], which form a subclass of the di-cographs introduced by [18]. Their underlying undirected graphs are exactly the $\mathrm{nniPCG}(T,\lambda,1,\infty)$, i.e., the complete multipartite graphs.

A related construction requires a certain number of events between $\mathop{lca}(x,y)$ and $y$ and excludes all events between $\mathop{lca}(x,y)$ and $x$. This class of graphs appears naturally when events are directed, i.e., when it is (in general) not possible to revert the effect of an operation in a single step. Probably the best studied type of single genomic events of this type are so-called Tandem-Duplication-Random-Loss events, during which a genomic interval is duplicated and then one of the two copies of each gene is lost at random [19]. The antisymmetric digraphs obtained by single events are characterized in [15].

Our interest in the graphs $\mathrm{nniPCG}(T,\lambda,k,k)$ for general $k\geq 1$ also stems from rare-event phylogenetic data. Since we assume an underlying tree, the distance matrix $d_{T}$ is additive and its entries are small non-negative integers. The fact that all edge lengths $\ell(e)$ are also integers of course imposes additional constraints. As demonstrated e.g. in the context of orthology assignment (a related problem with vertex-labeled trees for which the corresponding graphs turn out to be cographs), graph editing can be employed to correct empirically estimated input graphs [20]. This approach requires, however, that the constraints on the graphs that can appear are known. In the case of rare-event phylogenetics, we know that the graph with edge set $\{xy\mid d_{T}(x,y)=k\}$ must be an $\mathrm{nniPCG}(T,\lambda,k,k)$. In the rare-event scenario, the number of pairs of nodes with $d_{T}(x,y)=k$ will quickly decrease with $k$, so that the empirical input graphs will have few edges for larger values of $k$ and thus rarely reveal obstructions. Hence only small values of $k$ are of practical value for detecting measurement errors in the data. Since the $\mathrm{nniPCG}(T,\lambda,1,1)$ are forests, the corresponding graph editing problem amounts to identifying spanning forests, and possible false positive events are edges in cycles. False negatives are not detectable for $k=1$ since there are no non-tree graphs that would become trees by inserting edges. They could be detected, however, as missing edges in the empirical graph for $k=2$ compared to the most similar member of $\mathrm{nniPCG}(T,\lambda,2,2)$. In this contribution, therefore, we are interested in the characterization of the graphs $\mathrm{nniPCG}(T,\lambda,2,2)$, in which edges correspond to exactly two events between two leaves.
This graph class is very different from the exact-2-leaf power graphs, which are known to coincide with the disjoint unions of cliques [21, 10]. In contrast, we shall see below that e.g. every path also has a representation as $\mathrm{nniPCG}(T,\lambda,2,2)$.

This contribution is organized as follows: We first consider a few general properties of the slightly more general exactly-$k$-relation $\overset{k}{\sim}$ before investigating for some small graphs and simple graph families whether they can be represented with respect to the exactly-$2$-relation $\overset{2}{\sim}$ on the leaf set of some tree. Here, we first consider the case that $d_{T}(x,y)>0$, i.e., that all leaves are separated by at least one event, and then relax this constraint and characterize the entire graph class $\mathrm{nniPCG}(T,\lambda,2,2)$ for non-negative, integer $\lambda$. Our main result is that these graphs are those whose quotient with respect to the false twin (R-thinness) relation is a block graph. We then consider the oriented version of the problem and give a characterization in terms of forbidden subgraphs.

2 Simple Properties of the Exactly-$k$-Relation

We shall see that the restriction to integer edge weights on the one hand, and the admission of zero-weights on the other hand, make the graphs $\mathrm{nniPCG}(T,\lambda,k,k)$ quite different from the exact-$k$-leaf power graphs studied systematically in [10]. While it is true that every $\mathrm{PCG}(T,\lambda,d_{\min},d_{\max})$ with non-negative real weights $\lambda$ and bounds $d_{\min}$ and $d_{\max}$ also has a representation as $\mathrm{PCG}(T,\hat{\lambda},\hat{d}_{\min},\hat{d}_{\max})$ with integer weights and bounds [22, Lemma 2], the restriction to integer weights clearly affects the definition of graph classes. For instance, the PCG class with rational weights and bounds $d_{\min}=d_{\max}=1$ contains the $\mathrm{nniPCG}(T,\lambda,k,k)$ for all $k\in\mathbb{N}$. Throughout this contribution we use a notation that is inspired by related work in mathematical phylogenetics.

Definition 1.

Let $(T,\lambda)$ be an unrooted tree with leaf set $L$ and edge-labeling function $\lambda:E(T)\to\mathbb{N}_{0}$. For $x,y\in L$ we consider the exactly-$k$-relation $\overset{k}{\sim}$ defined by $x\overset{k}{\sim}y$ if the (unique) path $\mathcal{P}(x,y)$ from $x$ to $y$ in $T$ satisfies $\sum_{e\in\mathcal{P}(x,y)}\lambda(e)=k$.
Furthermore, we say $(T,\lambda)$ explains a graph $G(L,E)$ (with respect to the exactly-$k$-relation) if $\{x,y\}\in E$ if and only if $x\overset{k}{\sim}y$.
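
To make Definition 1 concrete, the following minimal Python sketch computes the graph explained by an edge-labeled tree. The function names (`leaf_distances`, `exactly_k_graph`), the dictionary-based tree encoding, and the example tree are illustrative assumptions and not part of the original text.

```python
from itertools import combinations

def leaf_distances(tree_edges, leaves):
    """Return {(x, y): total weight of the unique path x..y} for ordered leaf pairs.

    tree_edges: dict mapping (u, v) -> non-negative integer weight (assumed encoding).
    """
    adj = {}
    for (u, v), w in tree_edges.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {}
    for src in leaves:
        weight_from_src = {src: 0}
        stack = [src]
        while stack:  # depth-first traversal; paths in a tree are unique
            u = stack.pop()
            for v, w in adj.get(u, []):
                if v not in weight_from_src:
                    weight_from_src[v] = weight_from_src[u] + w
                    stack.append(v)
        for dst in leaves:
            if dst != src:
                dist[(src, dst)] = weight_from_src[dst]
    return dist

def exactly_k_graph(tree_edges, leaves, k):
    """Edge set of the graph explained by the tree w.r.t. the exactly-k-relation."""
    dist = leaf_distances(tree_edges, leaves)
    return {frozenset((x, y)) for x, y in combinations(leaves, 2)
            if dist[(x, y)] == k}

# Hypothetical example: a tree with interior vertices p, q and leaves x, y, z, u.
tree = {("x", "p"): 2, ("y", "p"): 0, ("p", "q"): 2, ("q", "z"): 0, ("q", "u"): 2}
leaves = ["x", "y", "z", "u"]
graph = exactly_k_graph(tree, leaves, 2)
print(sorted(tuple(sorted(e)) for e in graph))
```

For this example tree, the pairs at path weight exactly 2 form the path x - y - z - u.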

We consider unrooted instead of rooted trees since the distances $d_{T}(x,y)$ and thus the exactly-$k$-relation $\overset{k}{\sim}$ contain no information on the position of the root. In fact, it is well known [23, 24] that a metric $d$ of the form (1) uniquely defines an unrooted tree. Therefore, one can only hope to reconstruct the unrooted tree $T$.

Figure 1: Illustration of Definition 2 (left panel: $(T,\lambda)$; right panel: $(T',\lambda')$). The edge-labeled tree $(T',\lambda')$ on the r.h.s. is displayed by $(T,\lambda)$. It is obtained as the restriction of $T$ to the non-gray vertices. Corresponding vertices are shown in matching locations. All edges $e$ that are “merged” into single edges have their weights annotated. Edges that remain unchanged or are deleted are only shown in color (black for $\lambda(e)=0$, red for $\lambda(e)=2$, and blue for $\lambda(e)=2$) without displaying the weight explicitly.

Definition 2.

The edge-labeled tree $(T,\lambda)$ displays the edge-labeled tree $(T',\lambda')$ if $(T',\lambda')$ can be obtained from $(T,\lambda)$ by first removing every edge and vertex from $(T,\lambda)$ that is not contained in a path connecting two leaves of $T'$, and then contracting every path $\mathcal{P}(u,v)$ in the remainder of $T$ that has only interior vertices of degree $2$ to a single edge $e'$ in $T'$ with label $\lambda'(e')=\sum_{e\in\mathcal{P}(u,v)}\lambda(e)$.

In particular, it is therefore sufficient to consider phylogenetic trees, that is, trees $T$ in which every interior node $x\in V(T)\setminus L$ has degree at least $3$. Fig. 1 gives an example. A simple but important consequence of Definition 1 is the following:

Lemma 3.

Suppose that $(T,\lambda)$ displays $(T',\lambda')$, that $(T,\lambda)$ explains $G(L,\overset{k}{\sim})$, and that $(T',\lambda')$ explains $G'(L',\overset{k}{\sim})$.
Then $G'(L',\overset{k}{\sim})=G(L,\overset{k}{\sim})[L']$, the subgraph of $G(L,\overset{k}{\sim})$ induced by $L'$.

Proof.

If $(T,\lambda)$ displays $(T',\lambda')$ then $\sum_{e\in\mathcal{P}_{T}(u,v)}\lambda(e)=\sum_{e\in\mathcal{P}_{T'}(u,v)}\lambda'(e)$ for all $u,v\in L(T')\subseteq L(T)$, and thus we conclude that for all $u,v\in L(T')$, we have $u\overset{k}{\sim}_{(T,\lambda)}v$ if and only if $u\overset{k}{\sim}_{(T',\lambda')}v$, i.e., $G'(L',\overset{k}{\sim})$ is the subgraph of $G(L,\overset{k}{\sim})$ induced by $L'$. ∎

It follows that “being explained with respect to the exactly-$k$-relation” is a hereditary graph property for all $k$.

We also note the following immediate consequence of the definition.

Lemma 4.

If $(T,\lambda)$ explains $G$ with respect to $\overset{1}{\sim}$, then $(T,k\lambda)$ explains $G$ with respect to $\overset{k}{\sim}$.
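
The statement of Lemma 4 can be checked in one line. The subscripted notation $d_{(T,\lambda)}$ for the total path weight in $(T,\lambda)$ is introduced here only for this illustration, and $k\geq 1$ is assumed:

```latex
d_{(T,k\lambda)}(x,y)
  = \sum_{e\in\mathcal{P}(x,y)} k\,\lambda(e)
  = k\, d_{(T,\lambda)}(x,y),
\qquad\text{hence}\qquad
d_{(T,k\lambda)}(x,y)=k \iff d_{(T,\lambda)}(x,y)=1 .
```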

Lemma 5.

Let $G$ be a graph with connected components $G_{i}$, $i=1,\dots,N$. Then there is an edge-labeled tree $(T,\lambda)$ explaining $G$ with respect to $\overset{k}{\sim}$ if and only if there are edge-labeled trees $(T_{i},\lambda_{i})$ explaining $G_{i}$ for all $i=1,\dots,N$.

Proof.

The condition is necessary because of heredity. In order to see sufficiency, we can construct $(T,\lambda)$ from the disjoint union of the $(T_{i},\lambda_{i})$ in the following way: first we arrange them as an arbitrary tree $\mathcal{T}$. Then we replace each $(K_{2},\lambda(e)=k)$ by $S_{2}$ with the two edges $e'$ and $e''$ labeled such that $\lambda(e')+\lambda(e'')=k$. Now choose for each tree $T_{i}\neq K_{1}$ an arbitrary inner vertex $x_{i}$, and the unique vertex $x_{i}$ if $T_{i}=K_{1}$. Finally, we connect $x_{i}$ and $x_{j}$ by an edge $e_{ij}$ with $\lambda(e_{ij})=k+1$ if and only if $T_{i}$ and $T_{j}$ are adjacent in $\mathcal{T}$. To verify that $(T,\lambda)$ indeed explains $G$ we observe: (i) If $x$ and $y$ are leaves from different connected components of $G$, they are located in different subtrees $T_{i}$ and thus the path connecting them contains one of the edges labeled $k+1$; thus $x$ and $y$ are not in relation $\overset{k}{\sim}$. (ii) If $x$ and $y$ belong to the same component $G_{i}$, the path connecting them lies within $T_{i}$, and its total weight is unchanged by the construction. ∎

It is therefore sufficient to consider connected graphs.

Definition 6.

An edge-labeled tree $(T,\lambda)$ is canonical if $T$ is phylogenetic and $\lambda(e)\neq 0$ for all interior edges.

Lemma 7.

Let $(\hat{T},\hat{\lambda})$ be the edge-labeled tree obtained from $(T,\lambda)$ by (1) replacing every path $\mathcal{P}(u,v)$ in $T$ whose interior vertices have degree $2$ by a single edge $e'$ with label $\hat{\lambda}(e')=\sum_{e\in\mathcal{P}(u,v)}\lambda(e)$ and (2) contracting every interior edge with $\lambda(e)=0$. The tree $(\hat{T},\hat{\lambda})$ is uniquely defined, canonical, and explains the same graph as $(T,\lambda)$.

Proof.

The maximal paths with interior vertices of degree $2$ in $T$ are disjoint and thus can be treated independently. By construction, any such path $\mathcal{P}$ can also be replaced by edges step by step, eventually arriving at the same edge weight for the single edge that remains. Given $T$, the resulting tree $\hat{T}$ is therefore unique and contains no vertex of degree $2$. It is therefore phylogenetic. Since an interior edge with label $\lambda(e)=0$ does not contribute to the total weight of any path that runs through it, it can be contracted without changing the total path weights between leaves. Thus $(T,\lambda)$ and $(\hat{T},\hat{\lambda})$ explain the same graph. ∎
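
The two steps of Lemma 7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the name `canonicalize`, the adjacency-dictionary representation, and the example tree are hypothetical, and the input is assumed to be a tree whose leaves are exactly the listed vertices of degree 1.

```python
def canonicalize(tree_edges, leaves):
    """Sketch of Lemma 7: suppress degree-2 vertices, then contract
    zero-weight interior edges. Returns {frozenset({u, v}): weight}."""
    leaves = set(leaves)
    adj = {}
    for (u, v), w in tree_edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    # Step (1): replace each interior vertex of degree 2 by a single edge
    # whose weight is the sum of the two incident weights.
    for v in list(adj):
        if v not in leaves and len(adj[v]) == 2:
            (a, wa), (b, wb) = adj.pop(v).items()
            del adj[a][v], adj[b][v]
            adj[a][b] = adj[b][a] = wa + wb

    # Step (2): contract interior edges of weight 0 by merging v into u.
    contracted = True
    while contracted:
        contracted = False
        for u in list(adj):
            if u in leaves or u not in adj:
                continue
            for v, w in list(adj[u].items()):
                if w == 0 and v not in leaves:
                    del adj[u][v]
                    for x, wx in adj.pop(v).items():
                        if x != u:
                            del adj[x][v]
                            adj[x][u] = adj[u][x] = wx
                    contracted = True
    return {frozenset((u, v)): w for u in adj for v, w in adj[u].items()}

# Hypothetical input: m has degree 2, and the interior edge q-r has weight 0.
tree = {("a", "p"): 1, ("b", "p"): 1, ("p", "m"): 1, ("m", "q"): 1,
        ("q", "e"): 2, ("q", "r"): 0, ("r", "c"): 1, ("r", "d"): 0}
canon = canonicalize(tree, ["a", "b", "c", "d", "e"])
print(sorted((tuple(sorted(e)), w) for e, w in canon.items()))
```

Zero-weight edges incident with leaves (here the edge to `d`) are deliberately kept, since only interior edges are contracted.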

Consider two leaves $x,y\in L$ in an edge-labeled tree $(T,\lambda)$ such that $x\overset{0}{\sim}y$, i.e., $d_{T}(x,y)=0$, and another leaf $z\in L\setminus\{x,y\}$. The triangle inequalities $d_{T}(x,z)\leq d_{T}(x,y)+d_{T}(y,z)$ and $d_{T}(y,z)\leq d_{T}(y,x)+d_{T}(x,z)$ imply $d_{T}(x,z)=d_{T}(y,z)$. Thus $x$ and $y$ have the same neighbors in the graph $G$ explained by $(T,\lambda)$, i.e., $N_{G}(x)=N_{G}(y)$.

Definition 8.

Let $G$ be a graph. For each $x\in V(G)$ denote by $N(x)$ the neighbors of $x$. Two vertices $x$ and $y$ are false twins, $x\overset{\bullet}{\sim}y$, if $N(x)=N(y)$.

In contrast, true twins, which play no role here, satisfy $N(x)\cup\{x\}=N(y)\cup\{y\}$. By definition, false twins $x\overset{\bullet}{\sim}y$ are non-adjacent, while true twins are always adjacent [25].

Figure 2: The false twin classes are indicated by dashed outlines in the graph $G$ (middle) and form the vertices (in red) of the quotient graph $G/\!\overset{\bullet}{\sim}$ on the right. For comparison, the original edges between the $\overset{\bullet}{\sim}$ classes are shown. The subgraph $G[r_{1},r_{2},r_{3},r_{4},r_{5},r_{6}]$ of $G$ induced by one representative $r_{i}$ for each $\overset{\bullet}{\sim}$-class is shown on the left. It is isomorphic to $G/\!\overset{\bullet}{\sim}$ irrespective of the choice of the representative vertices $r_{i}$ for the $\overset{\bullet}{\sim}$ classes.

The false twin (R-thinness) relation $\overset{\bullet}{\sim}$ has been well studied in the literature, in particular in the context of graph products [26]. It is well known that $\overset{\bullet}{\sim}$ is an equivalence relation, see e.g. [26, sect. 8.2]. Its equivalence classes, which we denote by $R_{i}$, $i=1,\dots,h$, are totally disconnected in $G$ because, by definition, $x\notin N(x)$. Denote by $G[r_{1},r_{2},\dots,r_{h}]$ the subgraph of $G$ induced by one arbitrarily chosen representative $r_{i}\in R_{i}$ of each false twin class. Since for any $x\in R_{i}$ and $y\in R_{j}$ we have $xy\in E(G)$ if and only if $x'y'\in E(G)$ for all $x'\in R_{i}$ and all $y'\in R_{j}$, we observe that $G[r_{1},r_{2},\dots,r_{h}]$ and the quotient graph $G/\!\overset{\bullet}{\sim}$ are isomorphic. An illustration is given in Fig. 2.
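
As an illustration of the false twin classes and the quotient, here is a minimal Python sketch that groups vertices by their neighborhoods and returns the subgraph induced by one representative per class. The function names and the example graph ($C_{4}$) are assumptions made for this sketch only.

```python
def false_twin_classes(vertices, edges):
    """Group vertices with identical (open) neighborhoods N(x)."""
    nbrs = {v: set() for v in vertices}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    classes = {}
    for v in vertices:
        classes.setdefault(frozenset(nbrs[v]), []).append(v)
    return list(classes.values())

def quotient_by_representatives(vertices, edges):
    """Induced subgraph on the first member of each false twin class,
    which is isomorphic to the quotient graph."""
    classes = false_twin_classes(vertices, edges)
    reps = {c[0] for c in classes}
    rep_edges = {frozenset((u, v)) for u, v in edges if u in reps and v in reps}
    return classes, rep_edges

# Hypothetical example: the 4-cycle a-b-c-d-a has classes {a, c} and {b, d}.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
classes, rep_edges = quotient_by_representatives(vertices, edges)
print(classes, rep_edges)
```

In this example the quotient is a single edge, i.e., $K_{2}$.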

Lemma 9.

Let $(T,\lambda)$ be a canonical tree explaining a connected graph $G$ with respect to $\overset{k}{\sim}$, and let $W$ be a set of sibling leaves attached to the same parent $q$ with $\lambda(qw)=0$ for all $w\in W$. Then $W$ is contained in a false twin class for the graph $G$ explained by $(T,\lambda)$ with respect to $\overset{k}{\sim}$ for all $k>0$.

Proof.

Consider a node $y\in L\setminus W$. Then the total weight of the path between $y$ and every $w\in W$ is the same. Furthermore, the total path weight between any two vertices $w',w''\in W$ is $0$, i.e., there is no edge between $w'$ and $w''$. Thus $N(w')=N(w'')$, i.e., $w'\overset{\bullet}{\sim}w''$ for all $w',w''\in W$. ∎

A graph is called R-thin [27], point determining graph [28] or mating graph [29] if $\overset{\bullet}{\sim}$ is discrete, i.e., every false twin class consists of only a single point. Clearly, $G/\!\overset{\bullet}{\sim}$ is R-thin. R-thin graphs have also been studied from the point of view of combinatorial enumeration [30, 31]. Algorithms for prime-factorization of graphs, furthermore, often operate on $G/\!\overset{\bullet}{\sim}$, since R-thinness ensures uniqueness of the factorization and allows for highly efficient algorithms [27, 32, 26]. Below we show that it also suffices to consider $G/\!\overset{\bullet}{\sim}$, i.e., R-thin graphs, in our setting. Indeed, a simple consequence of Lemma 9 is

Corollary 10.

If $G$ is R-thin and $(T,\lambda)$ is a canonical tree explaining $G$ with respect to $\overset{k}{\sim}$, then $\overset{0}{\sim}$ is discrete.

Algorithm 1 Compute $(T,\lambda)$ from $(T^{*},\lambda^{*})$ and false twin classes $R_{i}$ with representatives $r_{i}\in R_{i}$.
Require: $(T^{*},\lambda^{*})$, $(r_{i},R_{i})$ for $i=1,\dots,h$.
1:  for all false twin classes $R_{i}$ with $|R_{i}|>1$ do
2:     $q\leftarrow$ unique neighbor of leaf $r_{i}$ in $(T^{*},\lambda^{*})$
3:     remove $r_{i}$ from $(T^{*},\lambda^{*})$
4:     if $\lambda^{*}(qr_{i})\neq k/2$ then
5:        insert all leaves $r\in R_{i}$ with edges $qr$ and $\lambda(qr)=\lambda^{*}(qr_{i})$
6:     else
7:        insert a node $q'$ and the edge $qq'$ with $\lambda(qq')=\lambda^{*}(qr_{i})$
8:        insert all leaves $r\in R_{i}$ with edges $q'r$ and $\lambda(q'r)=0$
9:     end if
10:  end for
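
The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names, the `frozenset`-keyed edge dictionary, the naming of the inserted vertex $q'$, and the example tree (chosen here so that it explains a triangle $1,2,3$ with a pendant vertex $4$ attached to $3$) are all assumptions. The helper `exactly_k_graph` is included only to check the result.

```python
def expand_false_twins(tree, classes, k):
    """tree: {frozenset({u, v}): weight} for (T*, lambda*);
    classes: {representative: list of all members of its false twin class}."""
    tree = dict(tree)
    for i, (rep, members) in enumerate(classes.items()):
        if len(members) <= 1:
            continue
        (edge,) = [e for e in tree if rep in e]  # line 2: rep is a leaf
        (q,) = edge - {rep}
        w = tree.pop(edge)                       # line 3: remove rep
        if 2 * w != k:                           # line 4
            for r in members:                    # line 5
                tree[frozenset((q, r))] = w
        else:
            q_new = f"q'_{i}"                    # hypothetical name for q'
            tree[frozenset((q, q_new))] = w      # line 7
            for r in members:                    # line 8
                tree[frozenset((q_new, r))] = 0
    return tree

def exactly_k_graph(tree, leaves, k):
    """Pairs of leaves whose path weight in the tree equals k."""
    adj = {}
    for e, w in tree.items():
        u, v = tuple(e)
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    graph = set()
    for x in leaves:
        dist, stack = {x: 0}, [x]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + w
                    stack.append(v)
        graph |= {frozenset((x, y)) for y in leaves if y != x and dist[y] == k}
    return graph

# Hypothetical (T*, lambda*) for k = 2 with interior vertices p and q.
t_star = {frozenset(("1", "p")): 1, frozenset(("2", "p")): 1,
          frozenset(("p", "q")): 1, frozenset(("q", "3")): 0,
          frozenset(("q", "4")): 2}
classes = {"1": ["1", "1'"], "2": ["2"], "3": ["3"], "4": ["4", "4'"]}
tree = expand_false_twins(t_star, classes, 2)
graph = exactly_k_graph(tree, ["1", "1'", "2", "3", "4", "4'"], 2)
print(sorted(tuple(sorted(e)) for e in graph))
```

In this example the class of `1` triggers the second alternative (its pendant edge has weight $k/2=1$), while the class of `4` uses the first alternative.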

For an illustrative example see Fig. 3.

Figure 3 (panels: $G$, $G/\!\overset{\bullet}{\sim}$, $(T^{*},\lambda^{*})$, $(T_{2},\lambda_{2})$, $(T_{2,4},\lambda_{2,4})$): The graph $G$ has two non-trivial false twin classes $R_{2}=\{2,2'\}$ and $R_{4}=\{4,4',4''\}$. Its R-thin quotient $G/\!\overset{\bullet}{\sim}$ is explained w.r.t. $\overset{2}{\sim}$ by the tree $(T^{*},\lambda^{*})$. The tree $(T_{2},\lambda_{2})$ is obtained from $(T^{*},\lambda^{*})$ using the first alternative in Alg. 1 (line 5), amounting to replacing $2$ by $|R_{2}|$ leaves with the same parent. In the second step, vertex $4$ is replaced by the subtree according to the second alternative in Alg. 1 (lines 7 and 8).

Theorem 11.

$G$ can be explained w.r.t. $\overset{k}{\sim}$ if and only if $G/\!\overset{\bullet}{\sim}$ can be explained w.r.t. $\overset{k}{\sim}$. If $(T^{*},\lambda^{*})$ is a canonical tree explaining $G/\!\overset{\bullet}{\sim}$, then a canonical tree $(T,\lambda)$ explaining $G$ is obtained by Algorithm 1.

Proof.

If $G$ can be explained, then $G/\!\overset{\bullet}{\sim}$, which is isomorphic to an induced subgraph of $G$, can also be explained w.r.t. $\overset{k}{\sim}$ by heredity. Conversely, suppose $G/\!\overset{\bullet}{\sim}$ is explained by a tree that we denote by $(T^{*},\lambda^{*})$. Let $r$ be the representative of the false twin class $R$ of $G$, and let $x\in R$. Insert $x$ into $(T^{*},\lambda^{*})$ as a sibling of $r$ and set $\lambda(x)=\lambda(r)=\lambda^{*}(r)$, where $\lambda(x)$ denotes the label of the edge incident with the leaf $x$. Then $x$ and $r$ have the same total path weights to all other vertices. This remains true if each leaf $r$ in $(T^{*},\lambda^{*})$ is replaced in this manner by the set $R$ of sibling vertices with $r\in R$. Since no two vertices in $R$ are adjacent we require that $\lambda(r)+\lambda(x)\neq k$, i.e., $\lambda^{*}(r)\neq k/2$. If this condition is satisfied, then $(T,\lambda)$ explains $G$ with respect to $\overset{k}{\sim}$.

If $\lambda^{*}(qr)=k/2\geq 1$, Alg. 1 inserts an extra vertex $q'$ adjacent to $q$ with $\lambda(qq')=\lambda^{*}(qr)\neq 0$. Since we assumed that $(T^{*},\lambda^{*})$ was canonical, $q$ has at least two more neighbors, i.e., the resulting tree is again canonical. Since $R$ is attached with edge weights $\lambda(q'r)=0$ we conclude that (i) the total path weight between $r'$ and $q$ is $k/2$ and (ii) $\lambda(r'q')+\lambda(q'r'')=0$ for all $r',r''\in R$, i.e., $r'$ and $r''$ are not adjacent in the graph explained by $(T,\lambda)$. Hence $r'$ and $r''$ have the same neighbors and thus belong to the same false twin class of $G$. Since the total path weights between all representatives of false twin classes are preserved by this construction, $(T,\lambda)$ indeed explains $G$ with respect to $\overset{k}{\sim}$. We note, finally, that $(T,\lambda)$ is again canonical because $q'$ has at least three neighbors (the parent $q$ and at least two members of $R$), and all interior edges of the resulting tree have non-zero labels as long as $(T^{*},\lambda^{*})$ was canonical. ∎

From here on we will therefore assume that $(T,\lambda)$ is canonical, i.e., it has non-zero labels for all inner edges of $T$. It is important to note, however, that we still need to consider zero weights on the edges incident with leaves. For instance, it is not difficult to check that the graph $G/\!\overset{\bullet}{\sim}$ in Fig. 3, i.e., $K_{3}+e$, cannot be explained by a tree with only non-zero edge weights.

3 Graphs Explained w.r.t. $\overset{2}{\sim}$

We will first consider the special case of edge labelings with discrete $\overset{0}{\sim}$. In this case every interior vertex of $T$ is incident with at most one zero-weight edge.

Figure 4: The two connected graphs $P_{3}$ and $K_{3}$ on three vertices, the graph $K_{3}+e$ on four vertices, and the bowtie graph $B$ on five vertices are explained by unique $\{0,1,2\}$-edge-labeled trees with respect to $\overset{2}{\sim}$ and discrete relation $\overset{0}{\sim}$.

The trivial cases $K_{1}$ and $K_{2}$ are explained by the trees $K_{1}$ and $K_{2}$ with label $\lambda(e)=k$ at the unique edge $e$, respectively. For $|L|=3$ there is only a single phylogenetic tree, the star $S_{3}$ with three leaves, and there are two connected graphs, $P_{3}$ and $K_{3}$, see Fig. 4. We denote the edges from the center to leaf $x_{i}$ by $e_{i}$, $1\leq i\leq 3$. Fig. 4 also shows that the class of graphs explained w.r.t. $\overset{2}{\sim}$ is much larger than the exact-2-leaf power graphs, which comprise only the disjoint unions of cliques [21, 10].

Lemma 12.

There are unique labelings $\lambda_{P_{3}}$ and $\lambda_{K_{3}}$ of the tree $S_{3}$ with discrete $\overset{0}{\sim}$ that explain the graphs $P_{3}$ and $K_{3}$, respectively:
$\lambda_{P_{3}}(e_{1})=\lambda_{P_{3}}(e_{3})=2$ and $\lambda_{P_{3}}(e_{2})=0$;
$\lambda_{K_{3}}(e_{1})=\lambda_{K_{3}}(e_{2})=\lambda_{K_{3}}(e_{3})=1$.

Proof.

$S_{3}$ contains three paths of length two. Adopting the notation of Fig. 4, for both cases $P_{3}$ and $K_{3}$ we need $\lambda(e_{1})+\lambda(e_{2})=2$ and $\lambda(e_{3})+\lambda(e_{2})=2$. Therefore $\lambda(e_{1})\in\{0,1,2\}$. Explicitly enumerating the three cases yields:
$\lambda(e_{1})=0$ implies $\lambda(e_{2})=2$ and thus $\lambda(e_{3})=0$, in which case $x_{1}\overset{0}{\sim}x_{3}$, contradicting the fact that $\overset{0}{\sim}$ is discrete.
$\lambda(e_{1})=1$ implies $\lambda(e_{2})=\lambda(e_{3})=1$, and thus $G(\overset{2}{\sim})=K_{3}$.
$\lambda(e_{1})=2$ implies $\lambda(e_{2})=0$ and thus $\lambda(e_{3})=2$, whence $G(\overset{2}{\sim})=P_{3}$. ∎
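
The case analysis in this proof can be double-checked by exhaustive enumeration. The sketch below is an illustration only: it assumes edge labels in $\{0,1,2\}$ (as derived in the proof), encodes the star $S_{3}$ simply as a triple of weights for $e_{1},e_{2},e_{3}$, and takes $P_{3}$ to be the path $x_{1}-x_{2}-x_{3}$ as in the proof. All names are hypothetical.

```python
from itertools import product

K = 2  # edges of the explained graph join leaves at path weight exactly 2

def star_graph(weights):
    """Graph explained by the star whose edge e_i to leaf x_i has weights[i-1]."""
    n = len(weights)
    return {(i + 1, j + 1) for i in range(n) for j in range(i + 1, n)
            if weights[i] + weights[j] == K}

def zero_relation_is_discrete(weights):
    # Two leaves of a star are at distance 0 iff both pendant edges have weight 0.
    return sum(w == 0 for w in weights) <= 1

TARGETS = {
    "K3": {(1, 2), (1, 3), (2, 3)},
    "P3": {(1, 2), (2, 3)},  # path x1 - x2 - x3
}

solutions = {name: [] for name in TARGETS}
for weights in product(range(K + 1), repeat=3):
    if not zero_relation_is_discrete(weights):
        continue
    for name, target in TARGETS.items():
        if star_graph(weights) == target:
            solutions[name].append(weights)

print(solutions)
```

Each target graph is hit by exactly one labeling, in agreement with the uniqueness claim.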

Lemma 13.

The path $P_{4}$ on four vertices $x-y-z-u$ is explained only by the tree $T=(xy)p-q(zu)$ with labels $\lambda(xp)=\lambda(qu)=\lambda(pq)=2$ and $\lambda(yp)=\lambda(qz)=0$.

Proof.

First we observe that the path $P_{4}$ on four vertices cannot be explained by any labeling of an $S_{4}$. This leaves the fully resolved tree on four vertices. Its interior edge $pq$ cannot be labeled $0$. First consider $\lambda(pq)=1$. The tree cannot contain an $S_{3}$ with all three edges labeled $1$ since this would induce a triangle, i.e., at most one neighbor of $p$, say $x$, is attached by a $1$-edge. The other neighbor of $p$, call it $y$, then must be attached by a $0$-edge, since otherwise $y$ is isolated. In order for $y$ not to be isolated, $q$ also must have a neighbor that is connected via a $1$-edge, say $\lambda(qz)=1$. The same argument implies that the remaining leaf $u$ must be connected to $q$ with $\lambda(qu)=0$. This tree, however, explains the non-connected graph $K_{2}\cup K_{2}$. Thus $\lambda(pq)=2$. Connectedness implies that at least one of the leaves attached to each of $p$ and $q$ must be attached by an edge labeled $0$, say $\lambda(py)=\lambda(qz)=0$, and thus $y$ and $z$ are adjacent in $G$. It remains to consider the possible labels for the remaining two edges, $\lambda(px)$ and $\lambda(qu)$. If $\lambda(px)=1$ then $x$ is isolated for all choices of $\lambda(qu)$. An analogous statement is true for $\lambda(qu)=1$. For $\lambda(px)=\lambda(qu)=0$ we obtain $C_{4}$. If $\lambda(px)=0$ and $\lambda(qu)=2$ we obtain $S_{3}$. The same is true for $\lambda(px)=2$ and $\lambda(qu)=0$. Thus the only remaining choice is $\lambda(px)=\lambda(qu)=2$. It indeed explains the path $x-y-z-u$, see Fig. 5. ∎
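
The uniqueness claim for $P_{4}$ can also be checked by brute force. The sketch below enumerates the star $S_{4}$ and the three fully resolved four-leaf topologies; it assumes that labels can be restricted to $\{0,1,2\}$, which should lose nothing here because every tree edge lies on the path between two adjacent vertices of $P_{4}$, and that the interior edge of a resolved tree is non-zero. The encoding (a "cherry" of two leaves attached to $p$, the others to $q$) and all names are hypothetical.

```python
from itertools import combinations, product

LEAVES = ("x", "y", "z", "u")
P4 = {frozenset(p) for p in (("x", "y"), ("y", "z"), ("z", "u"))}

def explained_graph(cherry, pendant, interior, k=2):
    """Graph explained by a four-leaf tree: the leaves in `cherry` hang off p,
    the others off q; `interior` is the weight of pq (use cherry=() and
    interior=0 for the star S_4)."""
    edges = set()
    for a, b in combinations(LEAVES, 2):
        crosses = (a in cherry) != (b in cherry)
        if pendant[a] + pendant[b] + (interior if crosses else 0) == k:
            edges.add(frozenset((a, b)))
    return edges

# Fully resolved trees: three topologies, non-zero interior edge.
solutions = []
for cherry in (("x", "y"), ("x", "z"), ("x", "u")):
    for interior in (1, 2):
        for ws in product(range(3), repeat=4):
            pendant = dict(zip(LEAVES, ws))
            if explained_graph(cherry, pendant, interior) == P4:
                solutions.append((cherry, interior, pendant))

# The star S_4.
star_solutions = [ws for ws in product(range(3), repeat=4)
                  if explained_graph((), dict(zip(LEAVES, ws)), 0) == P4]

print(solutions, star_solutions)
```

Under these assumptions the enumeration finds a single labeled tree, namely the one stated in Lemma 13, and no labeling of the star.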

The fact that $S_{n}$ is the only “exact-2-leaf root” of $K_{n}$, i.e., the only tree with unit edge weights that explains $K_{n}$, is shown in [10, Lemma 2]. It is not difficult to see that there is also no other choice of non-negative integer labels on $S_{n}$ that explains $K_{n}$:

Lemma 14.

The complete graph $K_{n}$ is explained with respect to $\overset{2}{\sim}$ by the star $S_{n}$ with the unique labeling function $\lambda(e)=1$ for all $e\in E(S_{n})$.

Proof.

It is easy to check that this construction explains $K_{n}$ for all $n\geq 3$. The trivial cases $n=1$ and $n=2$ are explained in the text. $K_{3}$ is only explained by $S_{3}$ with all edges labeled $\lambda(e)=1$. Since the star $S_{n}$ displays the $S_{3}$ corresponding to every $K_{3}$ subgraph, all edges of $S_{n}$ must be labeled by $\lambda(e)=1$. ∎

We note for later reference that the uniqueness results in Lemmas 13 and 14 do not require the precondition that $\overset{0}{\sim}$ is discrete. This observation will be important in the following section.

Figure 5: Connected graphs on four vertices that are explained (green outline) or cannot be explained (red background) with respect to $\overset{2}{\sim}$. The explaining trees are shown above the graphs. The path $P_{4}$ and the star $S_{3}$ are explained because all trees are already explained by $\overset{1}{\sim}$. The $K_{4}$ is explained by Lemma 14. The graph $K_{3}+e$ can be explained by a fully resolved phylogenetic tree with an edge labeling explicitly given in the proof of Lemma 15. In contrast, $C_{4}$ and $K_{4}-e$ cannot be explained with respect to $\overset{2}{\sim}$ by an edge labeling with discrete $\overset{0}{\sim}$ according to Lemma 15.
Lemma 15.

There is no edge-labeled tree $(T,\lambda)$ with discrete $\overset{0}{\sim}$ that explains the graphs $C_{4}$ and $K_{4}-e$ with respect to $\overset{2}{\sim}$. The graph $K_{3}+e$ is explained by a unique edge-labeled tree.

Proof.

There are two topologically distinct trees for $|L|=4$, the star $S_{4}$ and the tree $T_{4}$ with a single interior split. First consider the star $S_{4}$. In order to explain $K_{3}+e$ or $K_{4}-e$, three of the four edges must be labeled $1$ (corresponding to the induced $K_{3}$). Depending on whether $\lambda(e_{4})=1$ or $\lambda(e_{4})\neq 1$, the fourth vertex is either connected to all or none of the three other vertices. In order to explain $C_{4}$, there must be two edges with $\lambda(e_{1})=\lambda(e_{3})=2$ and one with $\lambda(e_{2})=0$ corresponding to an induced $P_{3}$. The remaining edge then must have $\lambda(e_{4})=0$. But then $x_{2}\overset{0}{\sim}x_{4}$, contradicting that $\overset{0}{\sim}$ is discrete.

Now consider the tree $T_{4}$, which can be obtained from $S_{3}$ by subdividing one of the edges and attaching an extra leaf to the subdividing vertex. Denote by $s$ the (unique) inner edge of $T_{4}$. Consider $K_{3}+e$ and $K_{4}-e$ as shown in Fig. 5. Then we must have $\lambda(e_{1})=\lambda(e_{2})=1$. If $\lambda(s)=0$ we recover the situation of $S_{4}$, since the inner edge does not contribute to $\overset{2}{\sim}$. On the other hand, if $\lambda(s)=2$, then $x_{1}$ and $x_{2}$ cannot be connected with $x_{3}$ or $x_{4}$, contradicting the existence of $K_{3}$ as induced subgraph. Thus $\lambda(s)=1$. Then $\lambda(e_{3})=0$. By assumption, $\lambda(e_{4})\neq 0$ since otherwise $x_{3}\overset{0}{\sim}x_{4}$. If $\lambda(e_{4})=1$, then $x_{4}$ is an isolated vertex in $G$. If $\lambda(e_{4})=2$, then $x_{4}\overset{2}{\sim}x_{3}$ while $x_{4}$ is not in $\overset{2}{\sim}$ relation to either $x_{1}$ or $x_{2}$. Thus $G=K_{3}+e$. The corresponding edge-labeled tree is shown in Fig. 4. Since we have already considered all cases, $K_{4}-e$ cannot be explained with respect to $\overset{2}{\sim}$.

Finally, consider $T_{4}$ and suppose that $G$ contains $P_{3}$ as induced subgraph. There are two cases: If $\lambda(e_{1})=\lambda(e_{2})=2$ then connectedness of $G$ implies that $\lambda(s)=\lambda(e_{3})=\lambda(e_{4})=0$, contradicting that $\overset{0}{\sim}$ is discrete. In the alternative case we can assume, w.l.o.g., that $\lambda(e_{1})=2$ and $\lambda(e_{2})=0$. Furthermore, in order to explain $C_{4}$ we must have $\lambda(e_{3})+\lambda(e_{4})=2$. If $\lambda(e_{3})=\lambda(e_{4})=1$, then both $\lambda(s)=0$ and $\lambda(s)=2$ yield $G=K_{2}\cup K_{2}$, and for $\lambda(s)=1$ we obtain $S_{4}$. In the remaining case we can choose $\lambda(e_{3})=2$ and $\lambda(e_{4})=0$. Now $\lambda(s)=0$ contradicts discreteness of $\overset{0}{\sim}$, and $\lambda(s)=1$ yields the edgeless graph. For $\lambda(s)=2$ we obtain $P_{4}$. Thus $C_{4}$ cannot be explained by $T_{4}$ with respect to $\overset{2}{\sim}$ by a labeling with discrete $\overset{0}{\sim}$. ∎
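The case analysis of Lemma 15 can also be cross-checked by exhaustive enumeration. The sketch below is an illustration under stated assumptions rather than part of the proof: it considers the star and the three quartet topologies on the fixed leaf set $\{a,b,c,d\}$, and it caps edge weights at $3$ because any weight larger than $2$ rules out distance $2$ along every path through that edge in the same way. The names `pair_distances`, `explained_graphs`, and `edge_set` are hypothetical.

```python
from itertools import combinations, product

LEAVES = ("a", "b", "c", "d")
PAIRS = [frozenset(p) for p in combinations(LEAVES, 2)]


def pair_distances(w, s=0, split=None):
    """Leaf-to-leaf path weights; w holds pendant weights, s the inner edge weight.

    split is the pair of leaves on one side of the inner edge (None for the star).
    """
    side = set(split) if split else set()
    dist = {}
    for pair in PAIRS:
        x, y = tuple(pair)
        crosses = split is not None and ((x in side) != (y in side))
        dist[pair] = w[x] + w[y] + (s if crosses else 0)
    return dist


def explained_graphs(max_weight=3):
    """All labeled graphs on LEAVES explained w.r.t. exactly-2 with discrete 0-relation."""
    found = set()
    splits = [("a", "b"), ("a", "c"), ("a", "d")]  # the three quartet topologies
    for weights in product(range(max_weight + 1), repeat=4):
        w = dict(zip(LEAVES, weights))
        candidates = [pair_distances(w)]  # star S_4
        for split in splits:
            for s in range(1, max_weight + 1):  # s = 0 reduces to the star
                candidates.append(pair_distances(w, s, split))
        for dist in candidates:
            if all(d > 0 for d in dist.values()):  # no two leaves at distance 0
                found.add(frozenset(p for p, d in dist.items() if d == 2))
    return found


def edge_set(*pairs):
    return frozenset(frozenset(p) for p in pairs)


C4 = edge_set("ab", "bc", "cd", "da")
DIAMOND = edge_set("ab", "ac", "bc", "bd", "cd")  # K_4 - e with missing edge ad
PAW = edge_set("ab", "ac", "bc", "cd")            # K_3 + e
P4 = edge_set("ab", "bc", "cd")
K4 = frozenset(PAIRS)

found = explained_graphs()
```

Because all leaf-labeled trees on four leaves (up to suppression of degree-2 vertices) are covered, membership of a fixed labeled graph in `found` decides whether it can be explained with discrete $\overset{0}{\sim}$ under the stated assumptions.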

Figure 6: Construction of trees that explain paths.

The fact that $K_{4}-e$ is a forbidden subgraph implies that two cliques in $G$ cannot be “glued together” along a single common edge. It is possible, however, for cliques to touch in a cut vertex, as shown by the example of the bowtie graph $B$, which is obtained by gluing together two triangles at a common vertex, see Fig. 4.

Graphs that can be represented as pairwise compatibility graphs of caterpillars have received special attention in the literature [5, 33, 34, 35]. It is not difficult to see that the path $P_{h}$, $h\geq 3$, can be represented by a caterpillar in several settings. These results cannot be directly applied in our setting, however. Any two leaves $x$ and $y$ attached to two distinct inner vertices of a caterpillar are separated by at least three edges and thus cannot be in relation $\overset{2}{\sim}$ if we assume strictly positive integer weights. It follows immediately that $P_{h}$ is not an exact-2-leaf power of a caterpillar and that $P_{h}$ cannot be explained by a caterpillar unless zero-weights are allowed. An explicit construction in [15] shows that $P_{h}$ is explained by a caterpillar with edge weights in $\{0,1\}$ with respect to the exactly-1-relation $\overset{1}{\sim}$. Lemma 4 implies that we can use the same construction to explain $P_{h}$ by a caterpillar with edge weights in $\{0,2\}$, see Fig. 6. It will be important later on that this construction is indeed unique:

Lemma 16.

The path $P_{h}$ has as its unique explaining tree the caterpillar $(T_{h},\lambda_{h})$ with all inner edges and the edges connecting to the end-points of $P_{h}$ labeled $2$ and all edges connecting to inner vertices of $P_{h}$ labeled $0$.

Proof.

We first recall that the tree $(T_{4},\lambda_{4})$ explaining $P_{4}$ is unique by Lemma 13. Now assume that for $h\geq 5$, the tree $(T_{h-1},\lambda_{h-1})$ explaining $P_{h-1}$ is unique and thus a caterpillar. Any tree $(T,\lambda)$ explaining $P_{h}$ therefore must display $(T_{h-1},\lambda_{h-1})$, i.e., $(T,\lambda)$ is obtained from $T_{h-1}$ by subdividing one edge and attaching leaf $h$ and edge $e_{h}$ to the new vertex, or by attaching $h$ and $e_{h}$ to an inner vertex of $T_{h-1}$. One easily checks that the latter yields a branched tree or a disconnected graph. The same is true if any edge other than $e_{h-1}$ and $e_{1}$, the edges adjacent to the leaves $h-1$ and $1$, is subdivided. If $e_{1}$ is subdivided, $h$ cannot be adjacent to $h-1$. In the remaining case, the edge with which $h-1$ is attached is subdivided into an interior part $s$ and the part $e_{h-1}$ incident with $h-1$. Since the interior part cannot carry a zero label, we must have $\lambda_{h}(e_{h-1})=0$, $\lambda_{h}(s)=2$, and $\lambda_{h}(e_{h})=2$. Thus the caterpillar of Fig. 6 is indeed the only choice. We emphasize that this observation remains true even if $\overset{0}{\sim}$ is not assumed to be discrete. ∎
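The caterpillar of Lemma 16 is easy to generate and verify for small $h$. The following sketch assumes that the path vertices are numbered $1,\dots,h$ and that the spine vertex carrying leaf $i$ is encoded as the tuple `("p", i)`; both conventions, as well as the function names, are illustrative choices and not taken from the original text.

```python
def path_caterpillar(h):
    """Edge-labeled caterpillar for the path 1-2-...-h (h >= 3), as described in Lemma 16."""
    edges = {}
    for i in range(2, h):                          # inner vertices 2..h-1 of the path
        edges[(i, ("p", i))] = 0                   # leaf i hangs on its spine vertex by a 0-edge
        if i < h - 1:
            edges[(("p", i), ("p", i + 1))] = 2    # spine (inner) edge
    edges[(1, ("p", 2))] = 2                       # end-point 1 of the path
    edges[(h, ("p", h - 1))] = 2                   # end-point h of the path
    return edges


def leaf_pairs_at_distance(edges, leaves, k=2):
    """Set of leaf pairs whose unique tree path has total weight k."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    pairs = set()
    for x in leaves:
        dist, stack = {x: 0}, [x]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + w
                    stack.append(v)
        pairs |= {frozenset((x, y)) for y in leaves if y != x and dist[y] == k}
    return pairs


def explains_path(h):
    """Check that the caterpillar explains exactly the edges i -- i+1 of P_h."""
    expected = {frozenset((i, i + 1)) for i in range(1, h)}
    return leaf_pairs_at_distance(path_caterpillar(h), list(range(1, h + 1))) == expected
```

For instance, `explains_path(4)` checks the tree of Lemma 13. The sketch only confirms that the caterpillar explains $P_{h}$; it does not test the uniqueness claim.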

Lemma 17.

The simple cycles $C_{p}$, $p\geq 5$, cannot be explained with respect to $\overset{2}{\sim}$ irrespective of whether $\overset{0}{\sim}$ is discrete or not.

Proof.

Every cycle $C_{p}$ contains a path $P_{p-1}$ with one vertex less as an induced subgraph. From Lemma 16 we know that $P_{p-1}$ has a unique explanation by a caterpillar for all $p\geq 5$. Any tree $(T^{*},\lambda^{*})$ explaining $C_{p}$ thus must display the caterpillar $(T_{p-1},\lambda_{p-1})$, and hence $T^{*}$ is obtained from $T_{p-1}$ by either attaching $p$ and $e_{p}$ to an inner vertex of $T_{p-1}$ or by subdividing an edge and attaching $p$ and $e_{p}$ to the newly inserted vertex. As argued above, attachment to an inner vertex or subdivision of an edge other than $e_{1}$ or $e_{p-1}$ leads to a branched tree or a disconnected graph. If $e_{p}$ is inserted by subdivision of $e_{1}$, then $p$ cannot be adjacent to $p-1$, and subdivision of $e_{p-1}$ precludes adjacency of $p$ and $1$. Thus the caterpillar $(T_{p-1},\lambda_{p-1})$ cannot be extended to a tree that explains $C_{p}$ for any $p\geq 5$. Note that this argument did not make the assumption that $\overset{0}{\sim}$ is discrete. ∎

Let us now turn to the general case. We first note that all graphs explained w.r.t. $\overset{2}{\sim}$ with discrete $\overset{0}{\sim}$ are chordal, i.e., every cycle of length greater than three has a chord. Even more stringently, every cycle of length $4$ corresponds to a clique in $G$ because $K_{4}-e$, i.e., the 4-cycle with a chord, is also a forbidden induced subgraph. We note that there is ample literature on the relationship of chordal graphs and PCGs, see e.g. [4, 5]. Due to the differences in the edge weight functions, it is not immediately pertinent to our discussion, however.

Lemma 18.

If $G$ can be explained by the exact-2-relation with discrete $\overset{0}{\sim}$ and contains a Hamiltonian cycle, then $G$ is a complete graph.

Proof.

The assertion is trivially true for $n=3$ and holds for $n=4$ because $C_{4}$ and $K_{4}-e$, the only Hamiltonian graphs on 4 vertices except $K_{4}$, are forbidden induced subgraphs. Now suppose the statement is true for all $|V| < p$ and consider a graph $G$ with $p$ vertices and a Hamiltonian cycle $C$. Since $G$ is chordal, there is in particular a planar triangulation of $C$ that is a subgraph of $G$, and thus there are three consecutive vertices $u-v-w$ along $C$ such that $u-w$ is also an edge in $G$. Thus $G\setminus v$ is Hamiltonian. As an induced subgraph of $G$ it can be explained by the exact-2-relation and thus is a complete graph by the induction hypothesis. Hence, for every further vertex $x$, $u-x-w$ is a triangle in $G\setminus v$ and $u-x-w-v$ is a cycle of length $4$ in $G$. Since $C_{4}$ and $K_{4}-e$ cannot appear as induced subgraphs of $G$, $\{u,v,w,x\}$ must form a clique in $G$, and hence $\{v,x\}\in E(G)$ for all $x\in V(G\setminus v)$. Thus $G$ is a complete graph. ∎

Lemma 19.

A graph $G$ with at least three vertices that can be explained by the exact-2-relation with discrete $\overset{0}{\sim}$ is complete if and only if it is 2-connected.

Proof.

If $G$ is Hamiltonian, it is in particular also 2-connected. Now consider the case that $G$ is 2-connected but not Hamiltonian. Let $C$ be a cycle of maximal length in $G$ and let $x$ be a vertex not in $C$. Then there is a cycle $C^{\prime}$ in $G$ that contains $x$ and at least two distinct vertices of $C$, since otherwise one of the vertices of $C$ would be a cut vertex of $G$, contradicting 2-connectedness. Starting from $x$, let $p$ and $q$ be the first and the last vertex of $C$ encountered along $C^{\prime}$. By Lemma 18, $G[C]$ is a complete graph, and hence there is another Hamiltonian cycle $C^{\prime\prime}$ on $G[C]$ such that $p$ and $q$ are consecutive along $C^{\prime\prime}$. Thus the cycle $C^{*}$ obtained by traversing $C^{\prime\prime}$ from $p$ to $q$ and then following $C^{\prime}$ from $q$ through $x$ back to $p$ is strictly longer than $C$, contradicting maximality. Thus $G$ is Hamiltonian, and hence complete. ∎

A graph $G$ is a block graph [36] if each of its biconnected components is a clique. Lemma 19 thus implies that every graph that can be explained with respect to the exact-2-relation with discrete $\overset{0}{\sim}$ is a block graph (see Thm. 21 below for a formal proof). Algorithm 2 (illustrated in Figure 7) explicitly constructs an edge-labeled tree that explains a given block graph.

Algorithm 2 Compute $(T(G),\lambda)$ for a connected block graph $G$
0:  a connected block graph $G$
1:  mark “red” all cut vertices $u\in V(G)$.
2:  for all cliques $K$ in $G$ do
3:     if  $K$ is an edge $e$  then
4:        $\lambda(e)=2$
5:     else
6:        replace $K$ by a star $S_{|V(K)|}$ with center $c_{K}$
7:        $\lambda(uc_{K})=1$ for each $u\in V(K)$
8:     end if
9:  end for
10:  for all red vertices $v$ do
11:     add a vertex $v^{\prime}$ and edge $vv^{\prime}$ with $\lambda(vv^{\prime})=0$
12:     exchange the vertex names $v$ and $v^{\prime}$
13:  end for
14:  return  $(G,\lambda)$
Lemma 20.

Algorithm 2 transforms any connected block graph $G$ into an edge-labeled tree that explains $G$ with respect to the exactly-2-relation $\overset{2}{\sim}$ with discrete $\overset{0}{\sim}$.

Proof.

The output of Alg. 2 contains no cycles since all cycles in the input $G$ are contained within a block and each block is replaced by a star. Furthermore, the replacement of a clique $K_{p}$ by a star $S_{p}$ with $p+1$ vertices preserves connectedness, hence $G$ has been transformed into a tree at this stage. Every vertex of a clique $K$ with $|V(K)|\geq 3$ that is not also contained in another block is now a leaf; all other vertices of $K$ are marked red. Every vertex in an original $K_{2}$ block is either a leaf or marked “red”. By construction, every “red” vertex has degree at least $2$ and hence is not a leaf. The final operation adds a leaf to each “red” vertex. Together with the renaming of the vertices, every vertex of the input graph is thus a leaf in $T$.

Now consider the labeling. First suppose that $u$ and $v$ are non-adjacent in the input $G$, that is, there is at least one cut vertex, say $z$, between them in $G$. The construction of $T(G)$ ensures that the unique path from $u$ to $v$ in $T(G)$ runs through a vertex $z^{\prime}$ that has $z$ as its neighbor. If the path from $u$ to $z$ in $G$ ran through an edge in a triangle, it passes through the corresponding star and hence contains two edges labeled $1$. Otherwise it runs through an unaltered $K_{2}$-block of $G$, which is labeled $2$. The same holds for the path from $z$ to $v$. In each case, therefore, $d_{T(G),\lambda}(u,v)\geq 4$. Now suppose that $u$ and $v$ are adjacent in $G$. First suppose $uv$ is contained in a triangle of $G$. If neither $u$ nor $v$ was marked “red”, they are both adjacent to the center $c_{K}$ of a star with edges labeled $1$. If $u$ was a cut vertex, i.e., marked “red”, it appears as a leaf adjacent to a vertex $u^{\prime}$ that in turn is adjacent to $c_{K}$; furthermore $\lambda(uu^{\prime})=0$ and $\lambda(u^{\prime}c_{K})=1$. Analogous reasoning applies if $v$ was a cut vertex of $G$. In all cases, thus, $d_{T(G),\lambda}(u,v)=d_{T(G),\lambda}(u,c_{K})+d_{T(G),\lambda}(c_{K},v)=1+1=2$. If the edge $uv$ is not contained in a triangle, then it is labeled $2$. If $u$ or $v$ are cut vertices, then the unique path from $u$ to $v$ is $u-u^{\prime}-v$, $u-v^{\prime}-v$, or $u-u^{\prime}-v^{\prime}-v$, with $\lambda(uu^{\prime})=\lambda(vv^{\prime})=0$ and a label $2$ for the remaining edge. Hence, $d_{T(G),\lambda}(u,v)=2$. In summary, $u\overset{2}{\sim}_{T(G),\lambda}v$ if and only if $u$ and $v$ are adjacent in $G$. Thus $G$ is explained by $(T(G),\lambda)$ with respect to the exactly-2-relation. ∎
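The construction of Algorithm 2 and the statement of Lemma 20 can be illustrated with a short sketch. It is written under simplifying assumptions that are not part of the original algorithm: the connected block graph is passed as the list of its blocks (each with at least two vertices), so that the cut vertices are exactly the vertices occurring in more than one block; the auxiliary vertices are encoded as tuples `("center", i)` and `("cut", v)`. All function names are hypothetical.

```python
from itertools import combinations


def block_graph_tree(blocks):
    """Sketch of Algorithm 2; returns a dict {(u, v): weight} describing T(G)."""
    count = {}
    for block in blocks:
        for v in block:
            count[v] = count.get(v, 0) + 1
    red = {v for v, c in count.items() if c > 1}      # cut vertices ("red")

    def inner(v):
        # After the renaming step, a cut vertex v keeps its name on the new leaf,
        # while its old position in the tree is called ("cut", v) here.
        return ("cut", v) if v in red else v

    edges = {}
    for idx, block in enumerate(blocks):
        block = list(block)
        if len(block) == 2:                           # K_2 block: keep the edge, label 2
            u, v = block
            edges[(inner(u), inner(v))] = 2
        else:                                         # larger clique: star with 1-edges
            center = ("center", idx)
            for u in block:
                edges[(inner(u), center)] = 1
    for v in red:                                     # pendant 0-edge for each cut vertex
        edges[(v, ("cut", v))] = 0
    return edges


def exact_2_graph(edges, leaves):
    """Leaf pairs at path weight exactly 2 in the edge-labeled tree."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    pairs = set()
    for x in leaves:
        dist, stack = {x: 0}, [x]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + w
                    stack.append(v)
        pairs |= {frozenset((x, y)) for y in leaves if y != x and dist[y] == 2}
    return pairs


def explains(blocks):
    """True if the constructed tree explains the block graph given by its blocks."""
    vertices = sorted({v for block in blocks for v in block})
    expected = {frozenset(p) for block in blocks for p in combinations(block, 2)}
    return exact_2_graph(block_graph_tree(blocks), vertices) == expected
```

As a usage example, `explains([[1, 2, 3], [3, 4, 5]])` checks the bowtie graph, and `explains([[1, 2, 3], [3, 4], [4, 5, 6, 7]])` checks a block graph that mixes a $K_{2}$ block with larger cliques.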

Figure 7: Illustration of Alg. 2. First all cut vertices of the input graph are marked. In the next step, edges not contained in triangles of $G$ receive label $2$ and all larger cliques are replaced by stars with all new edges labeled $1$. In the final step the original cut vertices are decorated by an additional neighbor with a 0-labeled edge.
Theorem 21.

A graph $G$ can be explained by an edge-labeled tree $(T,\lambda)$ with respect to the exact-2-relation with discrete $\overset{0}{\sim}$ if and only if it is a block graph.

Proof.

Suppose $G$ can be explained w.r.t. $\overset{2}{\sim}$ with discrete $\overset{0}{\sim}$. If $G$ is 2-connected, it is a clique by Lemma 19 and therefore also a block graph. Otherwise, we note that every 2-connected component $G^{\prime}$ of $G$ is an induced subgraph of $G$ and thus, by Lemma 3, can be explained w.r.t. $\overset{2}{\sim}$. By Lemma 19 every 2-connected component $G^{\prime}$ of $G$ therefore must be a clique, i.e., $G$ is a block graph.

Conversely, suppose that $G$ is a block graph. Since Algorithm 2 is correct by Lemma 20, every connected block graph can be explained. Since the non-connected block graphs are just disjoint unions of connected block graphs, Lemma 5 completes the characterization of the non-connected case. ∎

The main result of this section is now obtained as

Corollary 22.

A graph $G$ is explained by $\overset{2}{\sim}$ if and only if $G/\!\overset{\bullet}{\sim}$ is a block graph.

Proof.

Thm. 11 establishes that $G$ can be explained w.r.t. $\overset{2}{\sim}$ if and only if $G/\!\overset{\bullet}{\sim}$ can be explained w.r.t. $\overset{2}{\sim}$. Since the graph $G/\!\overset{\bullet}{\sim}$ is thin, Cor. 10 implies that $\overset{0}{\sim}$ is discrete for the canonical tree explaining $G/\!\overset{\bullet}{\sim}$, and thus Thm. 21 can be applied to $G/\!\overset{\bullet}{\sim}$. ∎

In the remainder of this section we consider the ambiguities in the construction of trees explaining block graphs. We start by characterizing contractible edges:

Lemma 23.

Suppose $(T_{e},\lambda_{e})$ is obtained from a phylogenetic tree $(T,\lambda)$ by contracting the edge $e$ in $T$ and setting $\lambda_{e}(e^{\prime})=\lambda(e^{\prime})$ for all $e^{\prime}\neq e$, and suppose that $G(T,\lambda)$ is connected. Then $G(T,\lambda)=G(T_{e},\lambda_{e})$ if and only if $e$ is an interior edge of $T$ and $\lambda(e)=0$.

Proof.

We have already noted that contracting an inner 0-edge does not change the graph. By definition, leaf-edges cannot be contracted, since the vertices of $G$ correspond to the leaves of $T$. Connectedness of $G$ implies that there is a pair of vertices $x,y$ whose connecting path runs through $e$ and whose distance is $d_{T,\lambda}(x,y)=2$. The contraction of $e$ leaves this distance unaffected only if $\lambda(e)=0$. Otherwise $d_{T,\lambda}(x,y)$ changes, which implies that $x$ and $y$ are no longer adjacent, and hence the graph explained by the modified tree is different from $G$. ∎

Figure 8: Three alternative, topologically different canonical trees explaining the non-connected graph $2P_{3}$. These are derived from the unique tree explaining $P_{3}$ in Fig. 4.

We note that connectedness of $G$ is necessary in Lemma 23 since for non-connected $G$, the connected components can be “glued together” with arbitrarily complex trees as long as the distances between the attachment points are at least $3$. In such examples it can be possible to contract edges without changing the explained graph. There are, for example, at least three topologically different canonical trees that explain $2P_{3}$, see Fig. 8.

Lemma 24.

Let $(T,\lambda)$ be a canonical tree explaining a connected graph $G$ and let $p$ be an interior vertex in $T$. Then all edges incident to $p$ are 1-edges or $p$ has at least one adjacent leaf $u$ with $\lambda(pu)=0$.

Proof.

Suppose $p$ has no adjacent leaf. Since $G$ is connected, for every edge $e^{\prime}$ incident with $p$ there is another edge $e^{\prime\prime}$ incident with $p$ such that $\lambda(e^{\prime})+\lambda(e^{\prime\prime})=2$. For this pair of edges we have $\lambda(e^{\prime})=\lambda(e^{\prime\prime})=1$ because no interior edge is 0-labeled. Thus $\lambda(e^{\prime})=1$ for all $e^{\prime}$ incident with $p$. On the other hand, if $p$ has a neighbor $u$ with $\lambda(pu)=2$, then connectedness of $G$ implies that there is another neighbor $y$ of $p$ with $\lambda(yp)=0$. Hence, unless $p$ has only 1-neighbors, there must be at least one incident 0-edge, which by assumption must be a leaf edge. ∎

We remark, finally, that a tree with a minimal number of vertices (or edges) that explains a graph with respect to the exactly-2-relation is necessarily canonical. Otherwise, the contraction of an edge would make it possible to decrease both the number of edges and the number of vertices.

4 Oriented Exactly-2-Relation

Generalizing the construction of the oriented exactly-1-relation in [15], we consider here an oriented version of the exactly-$k$-relation. In contrast to the previous sections, we consider here rooted trees $\overrightarrow{T}$ with leaf set $L$. For two leaves $x$ and $y$ there is a unique least common ancestor, denoted by $\mathop{lca}(x,y)$, defined as the vertex most distant from the root $r$ of $\overrightarrow{T}$ that is common to the paths connecting $r$ with $x$ and $r$ with $y$, respectively.

Definition 25.

Let $(\overrightarrow{T},\lambda)$ be a rooted tree with leaf set $L$ and edge-labeling function $\lambda:E(\overrightarrow{T})\to\mathbb{N}_{0}$. For $x,y\in L$ we consider the directed exactly-$k$-relation $\overset{k}{\rightharpoonup}$ defined by $x\overset{k}{\rightharpoonup}y$ if $\sum_{e\in\mathcal{P}(x,\mathop{lca}(x,y))}\lambda(e)=0$ and $\sum_{e\in\mathcal{P}(\mathop{lca}(x,y),y)}\lambda(e)=k$ holds for the (unique) paths $\mathcal{P}(x,\mathop{lca}(x,y))$ from $x$ to $\mathop{lca}(x,y)$ and $\mathcal{P}(\mathop{lca}(x,y),y)$ from $\mathop{lca}(x,y)$ to $y$, respectively.
The rooted tree $(\overrightarrow{T},\lambda)$ explains the directed graph $\overrightarrow{G}(L,E)$ (with respect to the directed exactly-$k$-relation) if $(x,y)\in E(\overrightarrow{G})$ if and only if $x\overset{k}{\rightharpoonup}y$.
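Definition 25 translates directly into a small computation. The sketch below assumes a rooted tree stored as a `parent` map (the root maps to `None`) together with a `weight` map giving the label of the edge from each non-root vertex to its parent; this encoding, the function name `directed_exact_k`, and the example tree are illustrative assumptions.

```python
def directed_exact_k(parent, weight, leaves, k=2):
    """Arcs (x, y) with weight 0 from x up to lca(x, y) and weight k from lca(x, y) down to y."""

    def ancestors(x):
        # Vertices from x up to the root, each paired with the path weight from x.
        chain, total = [], 0
        while x is not None:
            chain.append((x, total))
            p = parent[x]
            if p is not None:
                total += weight[x]
            x = p
        return chain

    arcs = set()
    for x in leaves:
        up_x = ancestors(x)
        for y in leaves:
            if x == y:
                continue
            up_y = dict(ancestors(y))
            for v, dx in up_x:
                if v in up_y:                  # first common ancestor is lca(x, y)
                    if dx == 0 and up_y[v] == k:
                        arcs.add((x, y))
                    break
    return arcs


# Example: root r with a leaf x attached by a 0-edge and an inner child c (label 1)
# that carries the leaves y and z by 1-edges.
parent = {"r": None, "x": "r", "c": "r", "y": "c", "z": "c"}
weight = {"x": 0, "c": 1, "y": 1, "z": 1}
arcs = directed_exact_k(parent, weight, ["x", "y", "z"])
```

In this example the only arcs are $x\to y$ and $x\to z$: the path from $x$ to the root has weight $0$ and the paths from the root to $y$ and $z$ have weight $2$, whereas $y$ and $z$ are separated from their common ancestor $c$ by 1-edges.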

By construction $\overrightarrow{G}(L,E)$ is an oriented graph, i.e., at most one of $(x,y)$ and $(y,x)$ can be an edge. As in the unrooted case, we say that a rooted tree $(\overrightarrow{T},\lambda)$ is canonical if it is a rooted phylogenetic tree and does not have an inner 0-edge. In the following we will consider the case that $\overset{0}{\sim}$ is discrete. As in the undirected case, we shall relax this requirement in the end.

As in [15], our strategy is to exploit the close relationships between the oriented and the undirected case. Therefore, we first derive some technical results regarding common properties of the oriented relation $\overset{k}{\rightharpoonup}$ and its undirected relative $\overset{k}{\sim}$.

Note that the underlying tree $(T,\lambda)$ of a rooted canonical tree $(\overrightarrow{T},\lambda)$ is not necessarily an unrooted canonical tree. By contracting all the interior 0-edges and degree 2 vertices, we get a unique unrooted canonical tree $(T^{\prime},\lambda^{\prime})$ that corresponds to $(\overrightarrow{T},\lambda)$. Conversely, for any unrooted canonical tree $(T,\lambda)$ with $|V(T)|>1$, we can create a set $\mathbb{T}(T,\lambda)$ of corresponding rooted least resolved trees as follows: (i) each interior vertex of $(T,\lambda)$ may serve as a root; (ii) each leaf attached by a 0-edge may serve as a root; and (iii) every 2-edge can be subdivided by inserting the root as a new vertex such that each of the two resulting edges is labeled $1$. The construction is detailed in Algorithm 3. An example is given in Figure 9. The following lemma formalizes this one-to-one correspondence between unrooted canonical trees $(T,\lambda)$ and their corresponding sets of rooted canonical trees.

Algorithm 3 Compute the set of canonical rooted trees $(\mathbb{T},\lambda)$
0:  unrooted canonical tree $(T,\lambda)$ with $|V(T)|>1$
1:  $(\mathbb{T},\lambda)\leftarrow\emptyset$
2:  for all interior vertices $v\in T$ do
3:     designate $v$ as root
4:     add the rooted tree to $(\mathbb{T},\lambda)$
5:  end for
6:  for all leaf vertices $v\in T$ with $\lambda(vw)>0$ where $N(v)=\{w\}$ do
7:     subdivide $vw$ to $vv^{*}w$ and designate $v^{*}$ as root
8:     relabel as $\lambda(vv^{*})\leftarrow\lambda(vw)$ and $\lambda(v^{*}w)\leftarrow 0$
9:     add the resulting rooted tree to $(\mathbb{T},\lambda)$.
10:  end for
11:  for all edges $e=uv$ with $\lambda(e)=k>1$ do
12:     subdivide the edge $e$ by inserting $v^{*}$ and designate $v^{*}$ as the root
13:     for $j=1,\dots,k-1$ do
14:        $\lambda(uv^{*})\leftarrow j$ and $\lambda(v^{*}v)\leftarrow k-j$
15:        add the resulting rooted tree to $(\mathbb{T},\lambda)$.
16:     end for
17:  end for
18:  return  $(\mathbb{T},\lambda)$
Lemma 26.

Every rooted canonical tree can be constructed from its underlying unrooted canonical tree by Algorithm 3.

Proof.

By construction, the set of canonical rooted trees corresponding to an unrooted canonical tree is well defined, i.e., the correspondence is a mapping.

Suppose there are two distinct unrooted canonical trees $(T_{1},\lambda_{1})$ and $(T_{2},\lambda_{2})$ such that both of their corresponding sets of rooted trees contain $(\overrightarrow{T},\lambda)$. By construction, $(\overrightarrow{T},\lambda)$ has an underlying tree $(T,\lambda)$ from which a unique canonical tree is obtained by contracting 0-edges and degree 2 vertices. Thus $(T_{1},\lambda_{1})=(T_{2},\lambda_{2})$, a contradiction. Hence the mapping is injective.

The mapping is also surjective, since each rooted canonical tree $(\overrightarrow{T},\lambda)$ can be constructed from its corresponding unrooted canonical tree. ∎

Lemma 27.

Suppose the unrooted canonical tree $(T,\lambda)$ explains $G$ with respect to $\overset{k}{\sim}$. Let $\overrightarrow{G}$ be a digraph explained w.r.t. $\overset{k}{\rightharpoonup}$ by a rooted tree $(\overrightarrow{T},\lambda)$ corresponding to $(T,\lambda)$. Then the underlying graph of $\overrightarrow{G}$ is a spanning subgraph of $G$.

Proof.

By construction, $(T,\lambda)$ and $(\overrightarrow{T},\lambda)$ have the same leaf set, and hence $V_{G}=V_{\overrightarrow{G}}$.

Any arc $x\to y$ in $\overrightarrow{G}$ is an edge in the underlying graph of $\overrightarrow{G}$. The fact that $(\overrightarrow{T},\lambda)$ explains $\overrightarrow{G}$ implies $\sum_{e\in\mathcal{P}(x,\mathop{lca}(x,y))}\lambda(e)=0$ and $\sum_{e\in\mathcal{P}(\mathop{lca}(x,y),y)}\lambda(e)=k$. Considering the underlying unrooted tree $(T^{\prime},\lambda^{\prime})$, we have $\sum_{e\in\mathcal{P}(x,y)}\lambda^{\prime}(e)=k$. Since $(T,\lambda)$ explains $G$, and by construction $(T^{\prime},\lambda^{\prime})$ displays $(T,\lambda)$, we conclude that $(T^{\prime},\lambda^{\prime})$ also explains $G$. Hence $(x,y)\in E(G)$. ∎

Figure 9: Construction of rooted canonical trees A through F from the unrooted canonical tree on the left. The possible positions of the root are indicated by the triangles and the corresponding six rooted trees are shown to the right.
Definition 28.

Suppose there is a tree $(T,\lambda)$ with discrete $\overset{0}{\sim}$ that explains $G$ w.r.t. $\overset{k}{\sim}$. Then we say that every subgraph $H$ of $G$ is allowed for $\overset{k}{\sim}/\overset{0}{\sim}$. Analogously, if there is a rooted tree $(\overrightarrow{T},\lambda)$ with discrete $\overset{0}{\sim}$ that explains $\overrightarrow{G}$ w.r.t. $\overset{k}{\rightharpoonup}$, we say that every subgraph $\overrightarrow{H}$ of $\overrightarrow{G}$ is allowed for $\overset{k}{\rightharpoonup}/\overset{0}{\sim}$.

In more detail, a graph $H$ is allowed for $\overset{k}{\sim}/\overset{0}{\sim}$ if there exists $(T,\lambda)$ such that for any $(x,y)\in E(H)$, we have $\sum_{e\in\mathcal{P}(x,y)}\lambda(e)=k$. If $H$ is not allowed for $\overset{k}{\sim}/\overset{0}{\sim}$, we say that it is forbidden (as a subgraph) for $\overset{k}{\sim}/\overset{0}{\sim}$. Analogously, a graph $\overrightarrow{H}$ is allowed for $\overset{k}{\rightharpoonup}/\overset{0}{\sim}$ in the rooted case if there exists $(\overrightarrow{T},\lambda)$ such that for any $(x,y)\in E(\overrightarrow{H})$, we have $\sum_{e\in\mathcal{P}(x,\mathop{lca}(x,y))}\lambda(e)=0$ and $\sum_{e\in\mathcal{P}(\mathop{lca}(x,y),y)}\lambda(e)=k$. If $\overrightarrow{H}$ is not allowed for $\overset{k}{\rightharpoonup}/\overset{0}{\sim}$, we say that $\overrightarrow{H}$ is forbidden (as a subgraph) for $\overset{k}{\rightharpoonup}/\overset{0}{\sim}$.

Lemma 29.

If $G$ is forbidden for $\overset{k}{\sim}/\overset{0}{\sim}$, then any orientation of $G$ is forbidden for $\overset{k}{\rightharpoonup}/\overset{0}{\sim}$. If $\overrightarrow{G}$ is allowed as a subgraph for $\overset{k}{\rightharpoonup}/\overset{0}{\sim}$ with rooted tree $(\overrightarrow{T},\lambda)$, then its underlying graph is allowed for $\overset{k}{\sim}/\overset{0}{\sim}$ as a subgraph with the corresponding underlying tree $(T,\lambda)$.

Proof.

Suppose, for contradiction, that $G$ is forbidden for $\overset{k}{\sim}/\overset{0}{\sim}$ but the orientation $\overrightarrow{G}$ of $G$ is allowed for $\overset{k}{\rightharpoonup}/\overset{0}{\sim}$. Then there exists a rooted tree $(\overrightarrow{T},\lambda)$ such that for any arc $x\to y$ in $\overrightarrow{G}$, we have $\sum_{e\in\mathcal{P}(x,u)}\lambda(e)=0$ and $\sum_{e\in\mathcal{P}(u,y)}\lambda(e)=k$ where $u=\mathop{lca}(x,y)$. Consider the unrooted tree $(T,\lambda)$ of $(\overrightarrow{T},\lambda)$. Since $(x,y)\in E(G)$ if and only if $x\to y$ or $y\to x$ is an arc in $\overrightarrow{G}$, we have for any $(x,y)\in E(G)$, after possibly exchanging the roles of $x$ and $y$, that $\sum_{e\in\mathcal{P}(x,u)}\lambda(e)=0$ and $\sum_{e\in\mathcal{P}(u,y)}\lambda(e)=k$ where $u=\mathop{lca}(x,y)$, and hence $\sum_{e\in\mathcal{P}(x,y)}\lambda(e)=k$. By definition, $G$ is allowed for $\overset{k}{\sim}/\overset{0}{\sim}$, i.e., we arrive at a contradiction. The second statement is a simple consequence of the first one. ∎

The technical results obtained so far will allow us to infer properties of the oriented graphs $\overrightarrow{G}$ and their explaining trees $(\overrightarrow{T},\lambda)$ from their underlying undirected graphs $G$ and unrooted trees $(T,\lambda)$. In the following we will focus on graphs $\overrightarrow{G}$ that can be explained w.r.t. $\overset{2}{\rightharpoonup}$ by a rooted tree $(\overrightarrow{T},\lambda)$ with discrete $\overset{0}{\sim}$.

Lemma 30.

Oriented cycles are forbidden as a subgraph for $\overset{2}{\rightharpoonup}/\overset{0}{\sim}$.

Proof.

Suppose $\overrightarrow{C_{n}}$ is allowed. Then, by definition, there exists an oriented graph $\overrightarrow{H}$ with vertex set $V(\overrightarrow{C_{n}})$ such that $\overrightarrow{C_{n}}$ is a subgraph of $\overrightarrow{H}$, and a rooted tree $(\overrightarrow{T_{n}},\lambda)$ that explains $\overrightarrow{H}$. W.l.o.g., we assume $(\overrightarrow{T_{n}},\lambda)$ is a rooted canonical tree.

Consider the underlying unrooted canonical tree $(T,\lambda)$ of $(\overrightarrow{T_{n}},\lambda)$. We claim that $(T,\lambda)$ must be $(S_{n},1)$. Suppose that $(T,\lambda)$ explains the graph $G$. By Lemma 27, the underlying graph $H$ of $\overrightarrow{H}$ is a subgraph of $G$ with the same vertex set. Since $H$ contains a Hamiltonian cycle, $G$ also contains a Hamiltonian cycle, and by Lemma 18 $G$ is the complete graph $K_{n}$. Lemma 14 then implies that $(T,\lambda)$ displays $(S_{n},1)$.

Next we consider all possibilities for constructing rooted canonical trees corresponding to $(S_{n},1)$, together with the oriented graphs they explain. By Algorithm 3, we can place the root either on the center vertex, which explains an empty graph, or on one of the leaves, which explains the oriented star on $n$ vertices whose arcs point to the leaves. In either case there are no cycles. By Lemma 26, we have constructed all rooted canonical trees and thus all oriented graphs they explain. Hence oriented cycles are forbidden as a subgraph for $\overset{2}{\rightharpoonup}/\overset{0}{\sim}$. ∎
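The two rootings of the star can be checked mechanically. The following Python sketch is illustrative only: the dictionary encoding of a rooted tree (`parent`, `weight`) and the function names are assumptions made here, not part of Algorithm 3. It evaluates the relation "weight 0 from x up to lca(x,y) and weight 2 from lca(x,y) down to y" on a star with four labeled vertices.

```python
from itertools import permutations

def path_to_root(parent, v):
    """Return the vertices on the path from v up to the root, starting at v."""
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def exact_k_arcs(parent, weight, labeled, k=2):
    """Arcs x -> y with weight 0 from x to lca(x, y) and weight k from lca(x, y) to y.

    weight[v] is the weight of the edge between v and parent[v] (hypothetical encoding).
    """
    arcs = set()
    for x, y in permutations(labeled, 2):
        up_x, up_y = path_to_root(parent, x), path_to_root(parent, y)
        above_x = set(up_x)
        lca = next(v for v in up_y if v in above_x)
        w_x = sum(weight[v] for v in up_x[:up_x.index(lca)])
        w_y = sum(weight[v] for v in up_y[:up_y.index(lca)])
        if w_x == 0 and w_y == k:
            arcs.add((x, y))
    return arcs

labeled = ["v1", "v2", "v3", "v4"]

# Star with all edge weights 1, rooted at its center "c".
center_parent = {"c": None, "v1": "c", "v2": "c", "v3": "c", "v4": "c"}
center_weight = {"v1": 1, "v2": 1, "v3": 1, "v4": 1}

# The same star rooted at the leaf "v1".
leaf_parent = {"v1": None, "c": "v1", "v2": "c", "v3": "c", "v4": "c"}
leaf_weight = {"c": 1, "v2": 1, "v3": 1, "v4": 1}

arcs_center = exact_k_arcs(center_parent, center_weight, labeled)
arcs_leaf = exact_k_arcs(leaf_parent, leaf_weight, labeled)
```

Under this encoding, `arcs_center` is empty and `arcs_leaf` contains exactly the arcs from `v1` to the other three vertices, matching the two cases in the proof.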

Lemma 31.

The 2-star oriented to center, $\bullet\to\bullet\leftarrow\bullet$, is forbidden as an induced subgraph for $\overset{2}{\rightharpoonup}/\overset{0}{\sim}$.

Proof.

Let $v_{1}\to v_{2}\leftarrow v_{3}$ be the 2-star oriented to center. Explicit construction shows that we obtain $v_{1}\overset{0}{\sim}v_{3}$ for each of the three triples $v_{1}v_{2}|v_{3}$, $v_{1}v_{3}|v_{2}$, and $v_{2}v_{3}|v_{1}$. This contradicts the assumption that $\overset{0}{\sim}$ is discrete. ∎
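The explicit construction can be spelled out as follows. This is a sketch for the star $v_{1}\to v_{2}\leftarrow v_{3}$; the shorthand $w(a,b)$ for the total weight along the path between $a$ and $b$ is introduced here only for illustration and is not notation from the paper.

```latex
% Assumption: w(a,b) := \sum_{e \in \mathcal{P}(a,b)} \lambda(e) \ge 0.
% The arcs v_1 \to v_2 and v_3 \to v_2 give, with u = lca(v_1,v_2) and u' = lca(v_3,v_2):
\begin{align*}
  w(v_1,u) &= 0, & w(u,v_2) &= 2, & w(v_3,u') &= 0, & w(u',v_2) &= 2.
\end{align*}
% Triple v_1 v_2 | v_3: u' is the root and lies above u, hence
\begin{align*}
  w(u',v_2) = w(u',u) + w(u,v_2) &\;\Longrightarrow\; w(u',u) = 0, \\
  w(v_1,v_3) = w(v_1,u) + w(u,u') + w(u',v_3) &= 0.
\end{align*}
% Triple v_2 v_3 | v_1: the same argument with the roles of v_1 and v_3 exchanged.
% Triple v_1 v_3 | v_2: u = u' is the root and p = lca(v_1,v_3) lies on both upward paths, hence
\begin{align*}
  w(v_1,p) \le w(v_1,u) = 0, \quad w(v_3,p) \le w(v_3,u) = 0
  &\;\Longrightarrow\; w(v_1,v_3) = w(v_1,p) + w(p,v_3) = 0.
\end{align*}
```

In all three cases $w(v_{1},v_{3})=0$, i.e., $v_{1}\overset{0}{\sim}v_{3}$.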

Lemma 32.

Every graph $\overrightarrow{G}$ that can be explained by an edge-labeled tree w.r.t. $\overset{2}{\rightharpoonup}/\overset{0}{\sim}$ is an oriented forest in which each component tree has a unique source vertex from which all arcs are directed away.

Proof.

Let $\overrightarrow{G}$ be a graph that can be explained w.r.t. $\overset{2}{\rightharpoonup}/\overset{0}{\sim}$. Since all cycles are forbidden subgraphs by Lemma 30, $\overrightarrow{G}$ is a forest. Furthermore, there is only a single source vertex in each connected component. Otherwise, if both $x$ and $y$ were sources within the same component tree, then the unique path from $x$ to $y$ would necessarily contain an induced subgraph of the form $\bullet\to\bullet\leftarrow\bullet$, which is forbidden by Lemma 31. ∎
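The structural condition of the lemma is easy to test. The sketch below is a hypothetical helper, not an algorithm from the paper. It uses the elementary observation that in an oriented forest "every component has a unique source with all arcs directed away" is equivalent to "every vertex has in-degree at most 1", which in turn is equivalent to the absence of $\bullet\to\bullet\leftarrow\bullet$.

```python
def is_out_forest(vertices, arcs):
    """Check: oriented, underlying graph acyclic, and every in-degree at most 1."""
    arcs = set(arcs)
    # Oriented graphs have neither loops nor pairs of opposite arcs.
    if any(x == y or (y, x) in arcs for x, y in arcs):
        return False

    # Union-find over the underlying undirected graph to detect cycles.
    rep = {v: v for v in vertices}

    def find(v):
        while rep[v] != v:
            rep[v] = rep[rep[v]]  # path halving
            v = rep[v]
        return v

    indegree = {v: 0 for v in vertices}
    for x, y in arcs:
        rx, ry = find(x), find(y)
        if rx == ry:
            return False  # this arc closes a cycle in the underlying graph
        rep[rx] = ry
        indegree[y] += 1

    # In-degree 2 or more would create a 2-star oriented to its center.
    return all(d <= 1 for d in indegree.values())
```

For example, `is_out_forest("abc", [("a", "b"), ("a", "c")])` holds, while the star `[("a", "b"), ("c", "b")]` and the oriented triangle are rejected.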

A canonical tree $(\overrightarrow{T},\lambda)$ with discrete $\overset{0}{\sim}$ that explains a connected oriented graph $\overrightarrow{G}$ w.r.t. $\overset{2}{\rightharpoonup}$ has a leaf that is attached to the root by a 0-edge.

Theorem 33.

$\overrightarrow{G}$ is explained w.r.t. $\overset{2}{\rightharpoonup}$ by a rooted tree $(\overrightarrow{T},\lambda)$ with discrete $\overset{0}{\sim}$ if and only if $\overrightarrow{G}$ is an oriented forest that does not contain the 2-star oriented to center, $\bullet\to\bullet\leftarrow\bullet$, as an induced subgraph.

Proof.

To show the “only if” part, suppose $\overrightarrow{G}$ can be explained. Then $\overrightarrow{G}$ is oriented, by Lemma 30 it is an oriented forest, and by Lemma 31 the 2-star oriented to center is forbidden. For the “if” part we use the construction employed in [15] for $\overset{1}{\sim}$ (with 2-edges taking the place of 1-edges): to each inner vertex $v$ of $\overrightarrow{G}$, a new vertex $v^{\prime}$ that represents $v$ in the tree is attached by a 0-edge, while the inner edges of the tree have label $2$. The theorem now follows directly from Lemma 4. ∎
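The construction used in the “if” part can be sketched in Python. Several choices below are assumptions made for illustration and are not taken from [15] or from the proof: every vertex of the forest (not only inner ones) receives a leaf attached by a 0-edge, the components are joined below an artificial vertex `"root"` by edges of weight 1, and the tree is encoded by `parent`/`weight` dictionaries.

```python
def build_tree(vertices, arcs):
    """Turn an out-forest into a rooted edge-weighted tree (hypothetical encoding).

    weight[t] is the weight of the edge between tree vertex t and parent[t].
    """
    parent, weight = {"root": None}, {}
    inner = {v: ("inner", v) for v in vertices}
    leaf_of = {v: ("leaf", v) for v in vertices}
    has_in_arc = set()
    for x, y in arcs:
        parent[inner[y]], weight[inner[y]] = inner[x], 2  # arcs become 2-edges
        has_in_arc.add(y)
    for v in vertices:
        parent[leaf_of[v]], weight[leaf_of[v]] = inner[v], 0  # leaf v' on a 0-edge
        if v not in has_in_arc:
            # Assumption: sources hang below an artificial root on weight-1 edges.
            parent[inner[v]], weight[inner[v]] = "root", 1
    return parent, weight, leaf_of

def explained_arcs(parent, weight, leaf_of, k=2):
    """Arcs x -> y with weight 0 from x' to the lca and weight k from the lca to y'."""
    def climb(t):
        dist, total = {t: 0}, 0  # ancestor -> weight of the path from t, bottom-up
        while parent[t] is not None:
            total += weight[t]
            t = parent[t]
            dist[t] = total
        return dist

    arcs = set()
    for x in leaf_of:
        for y in leaf_of:
            if x == y:
                continue
            up_x, up_y = climb(leaf_of[x]), climb(leaf_of[y])
            lca = next(t for t in up_y if t in up_x)  # dicts keep insertion order
            if up_x[lca] == 0 and up_y[lca] == k:
                arcs.add((x, y))
    return arcs

forest_vertices = ["a", "b", "c", "d", "e"]
forest_arcs = {("a", "b"), ("a", "c"), ("c", "d")}
tree = build_tree(forest_vertices, forest_arcs)
recovered = explained_arcs(*tree)
```

For this small forest with an isolated vertex `e`, `recovered` coincides with `forest_arcs`. The input is assumed to satisfy the conditions of the theorem; the sketch does not validate it.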

We can relax the condition that $(\overrightarrow{T},\lambda)$ has discrete $\overset{0}{\sim}$. To this end, we extend the false twin relation $x\mathrel{\mathrel{\ooalign{\hss\raisebox{-0.73193pt}{$\sim$}\hss\cr\hss\raisebox{3.09999pt}{\scalebox{0.75}{$\bullet$}}\hss}}}y$ to digraphs by setting $x\mathrel{\mathrel{\ooalign{\hss\raisebox{-0.73193pt}{$\sim$}\hss\cr\hss\raisebox{3.09999pt}{\scalebox{0.75}{$\bullet$}}\hss}}}y$ iff $x$ and $y$ have the same in- and out-neighbors. The quotient graph $G/\!\mathrel{\mathrel{\ooalign{\hss\raisebox{-0.73193pt}{$\sim$}\hss\cr\hss\raisebox{3.09999pt}{\scalebox{0.75}{$\bullet$}}\hss}}}$ is known as the point-determining graph of $G$.
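As an illustration of the digraph version of the false twin relation, the following Python sketch groups vertices by their pair of in- and out-neighborhoods and forms the quotient. The function names and the representation of classes as tuples are choices made here for illustration.

```python
from collections import defaultdict

def twin_classes(vertices, arcs):
    """Group vertices that have identical in-neighborhoods and out-neighborhoods."""
    out_nb = {v: set() for v in vertices}
    in_nb = {v: set() for v in vertices}
    for x, y in arcs:
        out_nb[x].add(y)
        in_nb[y].add(x)
    classes = defaultdict(list)
    for v in vertices:
        classes[(frozenset(in_nb[v]), frozenset(out_nb[v]))].append(v)
    return [tuple(members) for members in classes.values()]

def point_determining_quotient(vertices, arcs):
    """Return (classes, arcs between classes) of the quotient digraph."""
    classes = twin_classes(vertices, arcs)
    class_of = {v: c for c in classes for v in c}
    quotient_arcs = {(class_of[x], class_of[y]) for x, y in arcs}
    return set(classes), quotient_arcs

# Two sources a, b with the common out-neighbor c are false twins.
q_vertices, q_arcs = point_determining_quotient(["a", "b", "c"], [("a", "c"), ("b", "c")])
```

In this example the two sources `a` and `b` collapse into one class, so the quotient is a single arc and no longer contains $\bullet\to\bullet\leftarrow\bullet$.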

Corollary 34.

An oriented graph $\overrightarrow{G}$ is explained w.r.t. $\overset{2}{\rightharpoonup}$ if and only if $\overrightarrow{G}$ is an oriented forest whose point-determining graph does not contain $\bullet\to\bullet\leftarrow\bullet$ as an induced subgraph.

Proof.

It suffices to note that $\overset{0}{\sim}$-equivalent vertices are in the same $\mathrel{\mathrel{\ooalign{\hss\raisebox{-0.73193pt}{$\sim$}\hss\cr\hss\raisebox{3.09999pt}{\scalebox{0.75}{$\bullet$}}\hss}}}$-class and that there is a least resolved tree in which all members of a $\overset{0}{\sim}$-class are siblings. ∎

Figure 10: Two distinct rooted trees with discrete $\overset{0}{\sim}$ explaining an oriented graph on four vertices w.r.t. $\overset{2}{\rightharpoonup}$.

We note, finally, that the rooted trees with discrete $\overset{0}{\sim}$ that explain $\overrightarrow{G}$ w.r.t. $\overset{2}{\rightharpoonup}$ are not unique, as exemplified in Figure 10.

5 Concluding Remarks

The main result of this contribution is the characterization of the exactly-$2$-relations, i.e., the graphs $\textrm{nniPCG}(T,\lambda,2,2)$. They form a proper superset of the exact-2-leaf power graphs, which comprise only the disjoint unions of cliques. Section 2 suggests, however, that at least some of the structure and techniques carry over to general values of $k$. Several related problems are worth considering as well: in particular, $\textrm{nniPCG}(T,\lambda,k,\infty)$ and $\textrm{nniPCG}(T,\lambda,1,k)$ are of interest as coarse-grained models of evolutionary distances.

The oriented version of the exactly-$2$-relation is, somewhat surprisingly, much more closely related to the oriented exactly-$1$-relation of [15] than to the undirected exactly-$2$-relation. There is an alternative natural definition for a directed exactly-$2$-relation that omits the condition that $\sum_{e\in\mathcal{P}(x,\mathop{lca}(x,y))}\lambda(e)=0$. Clearly, the resulting digraphs are not necessarily oriented, i.e., they may contain double edges. We suspect that their structure is more closely related to the Fitch graph (directed at-least-$1$-relation) recently studied in [17].

Regarding the analysis of rare-event data in phylogenetics, the characterization of the exactly-$2$-relation naturally leads to the edge modification problem for block graphs and for graphs whose R-thin quotient is a block graph, respectively. These problems do not seem to have been studied so far (see e.g. [37, Tab. 1] and [38]). Since exactly-2-relation graphs are hereditary by Lemma 3, we suspect that the edge modification problem for exactly-2-relation graphs can be handled in a manner similar to the closely related edge modification problems for chordal graphs [37, 38] or cluster editing [39].

Acknowledgments

The authors gratefully acknowledge stimulating discussions with Marc Hellmuth, Manuela Geiß, and Maribel Hernández-Rosales on related classes of graphs derived from labeled trees.

References

  • [1] C. Semple, M. Steel, Phylogenetics, Oxford University Press, Oxford UK, 2003.
  • [2] A. W. M. Dress, K. T. Huber, J. Koolen, V. Moulton, A. Spillner, Basic Phylogenetic Combinatorics, Cambridge University Press, Cambridge UK, 2012.
  • [3] P. E. Kearney, J. I. Munro, D. Phillips, Efficient generation of uniform samples from phylogenetic trees, in: Algorithms in Bioinformatics (WABI 2003 Budapest), Vol. 2812 of Lect. Notes Comp. Sci., Springer, Berlin, 2003, pp. 177–189. doi:10.1007/978-3-540-39763-2_14.
  • [4] M. N. Yanhaona, K. S. M. T. Hossain, M. S. Rahman, Pairwise compatibility graphs, J. Appl. Math. Comput. 30 (2009) 479–503. doi:10.1007/s12190-008-0204-7.
  • [5] T. Calamoneri, B. Sinaimeri, Pairwise compatibility graphs: A survey, SIAM Review 58 (2016) 445–460. doi:10.1137/140978053.
  • [6] M. I. Hossain, S. A. Salma, M. S. Rahman, D. Mondal, A necessary condition and a sufficient condition for pairwise compatibility graphs, J. Graph Algorithms Appl. 21 (2017) 341–352. doi:10.1007/978-3-319-30139-6_9.
  • [7] P. Baiocchi, T. Calamoneri, A. Monti, R. Petreschi, Graphs that are not pairwise compatible: A new proof technique, in: C. Iliopoulos, H. W. Leong, W.-K. Sung (Eds.), Combinatorial Algorithms, 29th IWOCA, Vol. 10979 of Lect. Notes Comp. Sci., Springer, Berlin, Heidelberg, 2018, pp. 39–51. doi:10.1007/978-3-319-94667-2_4.
  • [8] P. Baiocchi, T. Calamoneri, A. Monti, R. Petreschi, Some classes of graphs that are not PCGs, Theor. Comp. Sci. 791 (2019) 62–75. doi:10.1016/j.tcs.2019.05.017.
  • [9] S. Ahmed, M. S. Rahman, Multi-interval pairwise compatibility graphs, in: T. V. Gopal, G. Jäger, S. Steila (Eds.), Theory and Applications of Models of Computation (14th TAMC 2017), Vol. 10185 of Lect. Notes Comp. Sci., Springer, Heidelberg, 2017, pp. 71–84. doi:10.1007/978-3-319-55911-7_6.
  • [10] A. Brandstädt, V. B. Le, D. Rautenbach, Exact leaf powers, Theor. Comp. Sci. 411 (2010) 2968–2977. doi:10.1016/j.tcs.2010.04.027.
  • [11] A. Rokas, P. W. Holland, Rare genomic changes as a tool for phylogenetics, Trends Ecol Evol 15 (2000) 454–459. doi:10.1016/S0169-5347(00)01967-4.
  • [12] J. E. Tarver, E. A. Sperling, A. Nailor, A. M. Heimberg, J. M. Robinson, B. L. King, D. Pisani, P. C. J. Donoghue, K. J. Peterson, miRNAs: Small genes with big potential in metazoan phylogenetics, Mol. Biol. Evol. 30 (2013) 2369–2382. doi:10.1093/molbev/mst133.
  • [13] H. Luo, W. Arndt, Y. Zhang, G. Shi, M. A. Alekseyev, J. Tang, A. L. Hughes, R. Friedman, Phylogenetic analysis of genome rearrangements among five mammalian orders, Mol Phylogenet Evol. 65 (2012) 871–882. doi:10.1016/j.ympev.2012.08.008.
  • [14] J. W. Waegele, T. W. Bartholomaeus (Eds.), Deep Metazoan Phylogeny: The Backbone of the Tree of Life—New Insights from Analyses of Molecules, Morphology, and Theory of Data Analysis, De Gruyter, 2014.
  • [15] M. Hellmuth, M. Hernandez-Rosales, Y. Long, P. F. Stadler, Inferring phylogenetic trees from the knowledge of rare evolutionary events, J. Math. Biol. 76 (2017) 1623–1653. doi:10.1007/s00285-017-1194-6.
  • [16] M. Hellmuth, Y. Long, M. Geiß, P. F. Stadler, A short note on undirected Fitch graphs, Art Discrete Appl. Math. (ADAM) 1 (2018) P1.08. doi:10.26493/2590-9770.1245.98c.
  • [17] M. Geiß, J. Anders, P. F. Stadler, N. Wieseke, M. Hellmuth, Reconstructing gene trees from Fitch’s xenology relation, J. Math. Biol. 77 (2017) 1459–1491. doi:10.1007/s00285-018-1260-8.
  • [18] C. Crespelle, C. Paul, Fully dynamic recognition algorithm and certificate for directed cographs, Discr. Appl. Math. 154 (2006) 1722–1741. doi:10.1016/j.dam.2006.03.005.
  • [19] K. Chaudhuri, K. Chen, R. Mihaescu, S. Rao, On the tandem duplication-random loss model of genome rearrangement, in: Proc. 17th Ann. ACM-SIAM Symp. Discrete Algorithm (SODA ’06), Soc. Industrial Appl. Math., Philadelphia, 2006, pp. 564–570. doi:10.5555/1109557.1109619.
  • [20] M. Hellmuth, M. Hernandez-Rosales, K. T. Huber, V. Moulton, P. F. Stadler, N. Wieseke, Orthology relations, symbolic ultrametrics, and cographs, J. Math. Biol. 66 (2013) 399–420. doi:10.1007/s00285-012-0525-x.
  • [21] N. Nishimura, P. Ragde, D. M. Thilikos, On graph powers for leaf-labeled trees, J. Algorithms 42 (2002) 69–108. doi:10.1006/jagm.2001.1195.
  • [22] T. Calamoneri, E. Montefusco, R. Petreschi, B. Sinaimeri, Exploring pairwise compatibility graphs, Theor. Comp. Sci. 468 (2013) 23–36. doi:10.1016/j.tcs.2012.11.015.
  • [23] P. Buneman, The recovery of trees from measures of dissimilarity, in: F. R. Hodson, D. G. Kendall, P. Tautu (Eds.), Mathematics in the Archaeological and Historical Sciences, Edinburgh University Press, Edinburgh, 1971, pp. 387–395.
  • [24] J. M. S. Simões-Pereira, A note on the tree realizability of a distance matrix, J. Combin. Theory 6 (1969) 303–310. doi:10.1016/S0021-9800(69)80092-X.
  • [25] H.-J. Bandelt, H. M. Mulder, Distance-hereditary graphs, J. Comb. Th., Ser. B 41 (1986) 182–208. doi:10.1016/0095-8956(86)90043-2.
  • [26] R. Hammack, W. Imrich, S. Klavžar, Handbook of Product Graphs, 2nd Edition, CRC Press, Boca Raton, 2011.
  • [27] R. McKenzie, Cardinal multiplication of structures with a reflexive relation, Fund. Math. 70 (1971) 59–101. doi:10.4064/fm-70-1-59-101.
  • [28] D. P. Sumner, Point determination in graphs, Discrete Math. 5 (1973) 179–187. doi:10.1016/0012-365X(73)90109-X.
  • [29] J. J. Bull, C. M. Pease, Combinatorics and variety of mating-type systems, Evolution 43 (1989) 667–671. doi:10.1111/j.1558-5646.1989.tb04263.x.
  • [30] G. Kilibarda, Enumeration of unlabelled mating graphs, Graphs Combinatorics 23 (2007) 183–199. doi:10.1007/s00373-007-0692-5.
  • [31] I. Gessel, J. Li, Enumeration of point-determining graphs, J. Comb. Th., Ser. A 118 (2011) 591–612. doi:10.1016/j.jcta.2010.03.009.
  • [32] W. Imrich, Factoring cardinal product graphs in polynomial time, Discrete Math. 192 (1998) 119–144. doi:10.1016/S0012-365X(98)00069-7.
  • [33] A. Brandstädt, C. Hundt, Ptolemaic graphs and interval graphs are leaf powers, in: E. S. Laber, C. F. Bornstein, L. T. Nogueira, F. L. (Eds.), LATIN 2008, Vol. 4957 of Lect. Notes Comp. Sci., Springer, Berlin, 2008, pp. 479–491. doi:10.1007/978-3-540-78773-0_42.
  • [34] T. Calamoneri, A. Frangioni, B. Sinaimeri, Pairwise compatibility graphs of caterpillars, Computer J. 57 (2014) 1616–1623. doi:10.1093/comjnl/bxt068.
  • [35] S. A. Salma, M. S. Rahman, M. I. Hossain, Triangle-free outerplanar 3-graphs are pairwise compatibility graphs, J. Graph Alg. Appl. 17 (2013) 81–102. doi:10.7155/jgaa.00286.
  • [36] F. Harary, A characterization of block-graphs, Canadian Math. Bull. 6 (1963) 1–6. doi:10.4153/CMB-1963-001-x.
  • [37] P. Burzyn, F. Bonomo, G. Durán, NP-completeness results for edge modification problems, Discr. Appl. Math. 154 (2006) 1824–1844. doi:10.1016/j.dam.2006.03.031.
  • [38] R. Sritharan, Graph modification problem for some classes of graphs, J. Discr. Algorithms 38-41 (2016) 32–37. doi:10.1016/j.jda.2016.06.003.
  • [39] R. Shamir, R. Sharan, D. Tsur, Cluster graph modification problems, Discr. Appl. Math. 144 (2004) 173–182. doi:10.1016/j.dam.2004.01.007.