On Finding Lekkerkerker-Boland Subgraphs
Abstract
Lekkerkerker and Boland characterized the minimal forbidden induced subgraphs for the class of interval graphs. We give a linear-time algorithm to find one in any graph that is not an interval graph. Tucker characterized the minimal forbidden submatrices for the class of matrices that have the consecutive-ones property. We give a linear-time algorithm to find one in any matrix that does not have the consecutive-ones property.
1 Introduction
A graph is an interval graph if it is the intersection graph of a set of intervals on a line. Such a set of intervals is known as an interval model of the graph. Interval graphs are an important subclass of perfect graphs [3], they have been written about extensively, and they model constraints in various combinatorial optimization and decision problems [11, 13]. They have a rich structure and history, and interesting relationships to other graph classes. For a survey, see [1].
If is a 0-1 (binary) matrix, we let denote the number of rows, columns and 1’s. Such a matrix has the consecutive-ones property if there exists a reordering of its columns such that, in every row, the 1’s are consecutive. A clique matrix of a graph is a matrix that has a row for each vertex, a column for each clique, and a 1 in row , column if vertex is contained in clique . A graph is an interval graph if and only if its clique matrices have the consecutive-ones property, see, for example [3].
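The definition of the consecutive-ones property can be made concrete with a small sketch. The function names `rows_are_consecutive` and `brute_force_c1p_order` are hypothetical, and the exhaustive search over column permutations is exponential; it only illustrates the definition on tiny matrices and is not the linear-time machinery of Booth and Lueker used later in the paper.

```python
from itertools import permutations

def rows_are_consecutive(matrix, order):
    """True if, with columns arranged in `order`, the 1's of every row are contiguous."""
    for row in matrix:
        positions = [pos for pos, col in enumerate(order) if row[col] == 1]
        if positions and positions[-1] - positions[0] + 1 != len(positions):
            return False
    return True

def brute_force_c1p_order(matrix):
    """Return some consecutive-ones column order, or None if none exists.

    Exhaustive search: suitable for illustration on very small matrices only.
    """
    num_cols = len(matrix[0]) if matrix else 0
    for order in permutations(range(num_cols)):
        if rows_are_consecutive(matrix, order):
            return list(order)
    return None
```

For instance, the matrix with rows 101 and 011 admits the column order 0, 2, 1, whereas the three-row matrix with rows 110, 011 and 101 admits none.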
In 1962, Lekkerkerker and Boland described the minimal forbidden induced subgraphs for the class of interval graphs [6], known as the LB graphs (Figure 2). Ten years later, Tucker described the minimal forbidden submatrices for consecutive-ones matrices [16]. These are depicted in Figure 1. Not surprisingly, there is a relationship between the intersection graphs of rows of Tucker matrices and the LB graphs, depicted in Figure 2.
In this paper, we give a linear time bound for finding one of the LB subgraphs when a graph is not an interval graph. As part of our algorithm, we also give a linear-time bound for finding one of Tucker’s submatrices in a matrix that does not have the consecutive-ones property. This latter problem was solved previously in time in [2], where is the number of rows of the matrix.
A graph is chordal if it has no chordless cycle on four or more vertices. Figure 2 seems to imply that the rows of a given Tucker submatrix of the clique matrix can be extended to an interval model of an LB subgraph by including at most three additional rows of the matrix, giving the rows of vertices that induce an LB subgraph. Figure 5 gives a counterexample. Fortunately, the claim is true in the case of chordal graphs, but the example illustrates the need for a proof; it does not follow from seemingly obvious considerations, such as that no clique is a subset of any other. This gives an LB subgraph if the graph is chordal, and when it is not chordal, the algorithm of [15] already gives a chordless cycle.
An interval graph is proper if there exists an interval representation where no interval is a subset of another. It is a unit interval graph if there exists an interval representation where all intervals have the same length. These graph classes are the same, and Wegner showed that a graph is a proper interval graph if and only if it does not have a chordless cycle, the special case of or for or the claw () as an induced subgraph [17]. Hell and Huang give an algorithm that produces one of them in linear time [4]. The problem of finding one of these forbidden subgraphs reduces easily to finding an LB subgraph. Each of the LB graphs is either one of Wegner’s forbidden subgraphs or contains an obvious claw, and finding a claw in linear time, given an interval model, is elementary. By itself, this approach has no obvious advantages over Hell and Huang’s elegant algorithm, but such reductions are useful when studying or programming a collection of related algorithms.
A certifying algorithm is an algorithm that provides, with each output, a simple-to-check proof that it has answered correctly [5, 9]. An interval model gives a certificate that a graph is an interval graph, and an LB subgraph gives one if the graph is not an interval graph. However, a certifying algorithm was given previously in [5]. The ability to give a consecutive-ones ordering or a Tucker submatrix in linear time gives a linear-time certifying algorithm for consecutive-ones matrices, but one was given previously in [8]. The previous certificates are easier to check, which is a desirable property for certifying algorithms. However, they are neither minimal nor uniquely characterized. Aside from the theoretical interest in LB subgraphs, it is easy to obtain a minimal certificate of the form given in [5] from an LB subgraph found by the algorithm we describe below. An open question is whether a minimal, unique, especially simple, or otherwise interesting special case of the certificate from [8] can be obtained by applying the algorithm of that paper to a Tucker submatrix obtained by the algorithm of the present paper. Tucker submatrices may be useful in heuristics for finding large submatrices that have the consecutive-ones property, small Tucker matrices, or identifying errors in biological data [14, 2]. Our techniques provide new tools for such heuristics.
2 Preliminaries
Given a graph , let denote the number of vertices and denote the number of edges. If , let denote the subgraph induced by . Standard sparse representations of 0-1 matrices take space to represent . We treat the rows and columns as sets, where each row is the set of columns where the row has a 1 and each column is the set of rows where the column has a 1. Suppose is the set of rows of a consecutive-ones ordered matrix and is the ordering of the columns. In linear time, we can find, for each row, the leftmost and rightmost column in the row. Let us call these the left endpoint and right endpoint of the row.
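The endpoint labeling can be sketched as follows, assuming rows are stored as sets of column identifiers as described above. The function name `row_endpoints` and the dictionary-based representation are illustrative choices, not part of the paper.

```python
def row_endpoints(rows, column_order):
    """Label each row with the positions of its leftmost and rightmost column.

    rows: dict mapping a row name to the set of columns where the row has a 1.
    column_order: list of columns in a consecutive-ones order.
    Runs in time proportional to the number of columns plus the number of 1's.
    """
    position = {col: i for i, col in enumerate(column_order)}
    endpoints = {}
    for name, cols in rows.items():
        if not cols:
            continue  # an all-zero row has no endpoints
        indices = [position[c] for c in cols]
        endpoints[name] = (min(indices), max(indices))
    return endpoints
```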
That interval graphs are a subclass of the class of chordal graphs follows from inclusion of among the LB subgraphs. Rose, Tarjan and Lueker give an algorithm that finds whether a graph is a chordal graph, and, if so, produces its maximal cliques [12]. Otherwise, the algorithm of [15] produces a chordless cycle () in linear time.
When a graph is chordal, the problem of deciding whether it is an interval graph reduces to the problem of deciding whether its clique matrix has the consecutive-ones property. Booth and Lueker further reduced this problem to that of finding a maximal prefix of the rows of a binary matrix that has the consecutive-ones property, in time.
Assigning a left-to-right order to children of each internal node of a rooted tree results in a unique left-to-right order of the leaves. Booth and Lueker’s algorithm produces a PQ tree, for . The PQ tree represents all possible consecutive-ones orderings of . There is one leaf for each column . The internal nodes of the PQ tree consist of P nodes and Q nodes. The consecutive-ones orderings of columns are given by the leaf orders obtainable by assigning an arbitrary left-to-right order to children of each P node, and for each Q node, assigning the given left-to-right order or its reverse.
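To illustrate how a PQ tree encodes a family of leaf orders, the following sketch enumerates them. The nested-tuple representation (`("leaf", c)`, `("P", [...])`, `("Q", [...])`) and the name `frontiers` are assumptions made for this example; the enumeration is exponential and is meant only to show the semantics of P and Q nodes.

```python
from itertools import permutations, product

def frontiers(node):
    """Enumerate every leaf order represented by a PQ tree.

    A node is ("leaf", column), ("P", children) or ("Q", children),
    where children is a list of nodes (hypothetical representation).
    """
    if node[0] == "leaf":
        return [[node[1]]]
    kind, children = node
    if kind == "P":
        arrangements = permutations(children)          # any order of the children
    else:
        arrangements = [children, children[::-1]]      # given order or its reverse
    result = []
    for arrangement in arrangements:
        for parts in product(*(frontiers(child) for child in arrangement)):
            result.append([leaf for part in parts for leaf in part])
    return result
```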
Though the PQ tree can be represented using $O(1)$ space per node, conceptually, we will consider each node of the PQ tree to be the set given by the disjoint union of its children; equivalently, it is the union of its leaf descendants.
Definition 1
Let be a collection of subsets of a set . Two elements of are in the same Venn class if they are elements of the same set of members of . The unconstrained Venn class consists of those elements of that are not in any member of ; all others are constrained. Two sets overlap if their intersection is nonempty, but neither is a subset of the other. The overlap graph of is the undirected graph whose vertices are the members of , and are adjacent if and only if and overlap.
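Definition 1 can be illustrated with a direct, unoptimized sketch. The names `venn_classes`, `overlap` and `overlap_graph` are hypothetical, and the pairwise construction of the overlap graph is quadratic in the number of sets; it is not the linear-time approach developed in Section 3.

```python
from collections import defaultdict
from itertools import combinations

def venn_classes(universe, family):
    """Group elements by the set of family members that contain them.

    The class keyed by the empty frozenset is the unconstrained Venn class.
    """
    classes = defaultdict(set)
    for x in universe:
        signature = frozenset(name for name, members in family.items() if x in members)
        classes[signature].add(x)
    return dict(classes)

def overlap(a, b):
    """Two sets overlap if they intersect and neither contains the other."""
    return bool(a & b) and not a <= b and not b <= a

def overlap_graph(family):
    """Adjacency sets of the overlap graph (naive pairwise comparison)."""
    adjacency = {name: set() for name in family}
    for u, v in combinations(family, 2):
        if overlap(family[u], family[v]):
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency
```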
Lemma 1
[10] A set of columns is a Q node of a consecutive-ones matrix if and only if it is the union of rows of a connected component of the overlap graph of rows of . The Venn classes of rows in this component are its children.
3 Breadth-first search on the overlap graph of a collection of sets, given a consecutive-ones ordering
In linear time, we may label each row of a consecutive-ones ordered matrix with its left and right endpoints. We may then label each column of a consecutive-ones ordered matrix with the set of rows that have their left endpoints in . In linear time, we can then radix sort the list of sets that have their left endpoint at in descending order of index of right endpoint, yielding a list . This is accomplished with a single radix sort that has the index of the left endpoint as its primary sort key and index of the right endpoint as the secondary sort key. By symmetry, we can construct a list of rows that have their right endpoint in each column , sorted in ascending order of index of left endpoint.
This allows us to perform a breadth-first search on the overlap graph of the rows in time linear in the size of the matrix, as follows. The lists and are represented with doubly-linked lists. We maintain the invariant that elements that have been placed in the BFS queue have been removed from these lists. When a consecutive-ones ordered set comes to the front of the queue, we traverse its list of columns. For each in the list, we remove elements from and place them in the queue, until we reach an element in whose right endpoint is no farther to the right than . All of the removed elements overlap . Since is sorted in descending order of right endpoint, all these elements are a prefix of , and any remaining elements in the list do not overlap . When we remove an element from , we remove it from any list that it is a member of, to maintain the invariant. This takes time for each element moved to the BFS queue, plus time for each column of . The lists are handled symmetrically. Summing over all rows , the time is .
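A minimal sketch of this search follows, under several stated assumptions. Rows are given by their endpoint positions; Python's built-in sort stands in for the radix sort; and lazy deletion from deques (already-queued rows are discarded when they reach the front of a list) stands in for the doubly-linked lists. The sketch also skips the column of a row's own left endpoint when scanning the left-endpoint lists, and its right endpoint when scanning the right-endpoint lists, since a row that starts at the same column and extends farther contains the current row rather than overlapping it. The function name `overlap_bfs` is hypothetical.

```python
from collections import deque

def overlap_bfs(endpoints, start):
    """BFS order of the overlap component of `start`.

    endpoints: dict row -> (left, right) positions in a consecutive-ones order.
    """
    num_cols = 1 + max(right for _, right in endpoints.values())
    by_left = [deque() for _ in range(num_cols)]    # rows starting at c, right endpoint descending
    by_right = [deque() for _ in range(num_cols)]   # rows ending at c, left endpoint ascending
    for row in sorted(endpoints, key=lambda r: -endpoints[r][1]):
        by_left[endpoints[row][0]].append(row)
    for row in sorted(endpoints, key=lambda r: endpoints[r][0]):
        by_right[endpoints[row][1]].append(row)

    queued = {start}
    queue = deque([start])
    order = []
    while queue:
        x = queue.popleft()
        order.append(x)
        left, right = endpoints[x]
        # Rows that start strictly inside x and end to the right of x overlap x.
        for c in range(left + 1, right + 1):
            bucket = by_left[c]
            while bucket and (bucket[0] in queued or endpoints[bucket[0]][1] > right):
                y = bucket.popleft()
                if y not in queued:
                    queued.add(y)
                    queue.append(y)
        # Symmetrically, rows that end strictly inside x and start to the left of x.
        for c in range(left, right):
            bucket = by_right[c]
            while bucket and (bucket[0] in queued or endpoints[bucket[0]][0] < left):
                y = bucket.popleft()
                if y not in queued:
                    queued.add(y)
                    queue.append(y)
    return order
```

Each row is popped from each list at most once, so apart from the initial sort the work is proportional to the total length of the rows visited.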
4 Finding Tucker Submatrices
4.1 Tucker matrices with at most four rows
Lemma 2
If a set of rows has the consecutive-ones property and is a row such that does not, then is one of the rows of every Tucker submatrix in .
Algorithm 4.1
(See Lemma 3.)
initialRows()
    ;
    While and M has at least rows
        Using Booth and Lueker, find the minimal prefix of rows
        of that does not have the consecutive-ones property.
        Let be the matrix whose row sequence is .
    return
Lemma 3
Suppose Algorithm 4.1 is run with parameter and a matrix that does not have the consecutive-ones property. If the returned matrix has at most rows, then these are the rows of every Tucker matrix of . Otherwise, fails to have the consecutive-ones property and every Tucker submatrix in has at least rows.
Proof
By induction on , does not have the consecutive-ones property at the end of iteration . Also, by induction on , using Lemma 2, at the end of iteration , for every Tucker submatrix of , the rows of include the first rows of . If has only rows, then the first rows of do not have the consecutive-ones property, so at the end of iteration , will have rows.
We run Algorithm 4.1 for . If it returns a matrix with rows, where , it is easy to find the columns within a linear time bound. (One way is to generate all orderings of rows and, for each, to check for the columns of each Tucker matrix of size .) Otherwise, Algorithm 4.1 returns a matrix of more than 4 rows. By Lemma 3, fails to have the consecutive-ones property and every Tucker submatrix of has at least five rows. This excludes any instances of , or the anomalous case of on three rows that does not correspond to a chordless cycle.
4.2 Matrices in which all Tucker submatrices have more than four rows
Lemma 4
The overlap graphs of , , and are simple cycles.
Definition 2
Suppose is a set of rows with the consecutive-ones property, is a Q node of its PQ tree, is the ordering of ’s children and is a row not in . Let be three children of such that . They are a 1-0-1 configuration for if and each contain a 1 of row and contains a 0 of row . They are a 0-1-0 configuration for if and each contain a 0 of row and contains a 1 of row .
Lemma 5
If is a set of rows that has the consecutive-ones property and is a row not in , then does not have the consecutive-ones property if the PQ tree of has a Q node such that either:
1. has a 1-0-1 configuration for ;
2. has a 0-1-0 configuration for and is not a subset of .
This test is implicit in Booth and Lueker’s algorithm, where it is a sufficient condition, but not a necessary one. The following is a consequence of Lemma 1.
Lemma 6
Lemma 7
If a matrix fails to have the consecutive-ones property and has no Tucker submatrix with fewer than five rows, then when Algorithm 4.1 is run on it with , at the end of one of the five iterations of its loop, the PQ tree of will have a Q node with the following properties:
- has a 1-0-1 configuration for ;
- There exist that are members of the component of the overlap graph on whose union is , and such that contains and is disjoint from and , and contains and is disjoint from and .
Proof
If is the rows of that contain a Tucker submatrix , at the end of an iteration of the loop of Algorithm 4.1, by Lemma 2. By Lemma 4, the overlap graph of is connected, so is a subset of a component of the overlap graph of , which gives rise to a Q node of the PQ tree of , by Lemma 1. Since the children of are the Venn classes of the component, no two Venn classes of , hence no two columns of , can lie in the same child of the Q node.
For each choice of a last row of a Tucker matrix on at least five rows, Figure 3 gives the possible orderings imposed on the last row by a consecutive-ones ordering of , which is unique up to reversal, by Lemmas 4 and 1. In each case, if row is chosen to go last, rows and satisfy the requirements of and . has at least five rows and no row of is contained in more than once in the five iterations, so in at least one of the iterations a row will go last.
The correctness and linear time bound for the following are the key results of this section:
Algorithm 4.2
Find the rows of a Tucker submatrix when every Tucker submatrix has at least five rows
findRows()
    Run initialRows() (Algorithm 4.1) to find , , ,
    satisfying the requirements of Lemma 7
    Let be a shortest path in the overlap graph of from to
    Let be a minimal prefix of such that the union of and the
    set of rows of does not have the consecutive-ones property.
    Let be a minimal suffix of such that the union of and the
    set of rows of does not have the consecutive-ones property.
    Return .
Lemma 8
If does not have the consecutive-ones property and every Tucker matrix of has at least five rows, then Algorithm 4.2 returns the set of rows of a Tucker matrix of .
Proof
Since and lie in the same component of the overlap graph of , exists. Since has the consecutive-ones property, so does . Because has a connected overlap graph, , is a single Q node of the PQ tree of , by Lemma 1. Because of and , , , and are contained in distinct Venn classes of , and the ones containing and are constrained. Since is consecutive, it must have a row that contains the , hence the Venn class of containing is also constrained. Therefore, does not have the consecutive-ones property by Lemma 5, and and exist. By Lemma 2, all Tucker matrices in contain , so this applies also to .
Suppose there is a proper subset of the rows on such that contains a Tucker matrix. The overlap graph of is connected, by Lemma 4. Since is a shortest path, it is a chordless path, so is a subpath of by Lemma 4. Let be the rows on , excluding the last row on . Let be the rows of , excluding the first row on . By the minimality of and , and have the consecutive-ones property. Since is a subpath of , or , so has the consecutive-ones property, contradicting our assumption that it does not. Therefore, is the set of rows of a Tucker matrix.
Figure 4 gives an example on which we illustrate some implementation details. is given by the 1’s and 0’s above the column numbers in the figure on the left, and is depicted by the intervals. The rows labeled and satisfy the requirements of and for Lemma 7, and is a shortest path from to in the overlap graph of , found using the BFS algorithm of Section 3.
Using Booth and Lueker’s terminology, we maintain labels on each class indicating whether it is full (contains only 1’s of ), empty (contains only 0’s of ), or partial (contains both 1’s and 0’s of ). The minimal prefix of whose rows, together with , do not have the consecutive-ones property, is . This is detected as follows. It is easy to verify that its sequence of constrained Venn classes is ,,, , , , , , , and their full/partial/empty labels , respectively. Selecting a 1 from a full class, a 0 from the first partial class and a 1 from the second partial class gives a 1-0-1 configuration satisfying Lemma 5. It is the minimal such prefix. A smaller prefix, has a 0-1-0 configuration, but it does not satisfy Lemma 5 because .
The minimal suffix of that satisfies Lemma 5 is , which is found in the same way by working on the reverse of . Its sequence of constrained Venn classes are , , , , , labeled , respectively. Selecting a 0 from the first partial class, a 1 from the next, and a 0 from an empty class gives a 0-1-0 configuration. It satisfies condition 2 of the lemma, because the unconstrained class, is partial, hence .
Therefore, is the set of rows of a Tucker submatrix. A minimal set of columns that illustrates that it satisfies the lemma is . On the righthand side of Figure 4 is the resulting Tucker matrix, which matches the final configuration in the sequence for in Figure 3.
This example shows that the key to finding and is maintaining the sequence of constrained Venn classes and their full/partial/empty labels as rows are added in the order in which they occur on or on the reverse of . Since they are added in an order such that every prefix of the order has a connected overlap graph, the sequence is uniquely constrained after each row is added, by Lemma 1. When a row is added, if it overlaps a constrained Venn class , must be replaced in the sequence with two Venn classes, or with , whichever is required to maintain consecutiveness of . If intersects the unconstrained class, , then must be added at one extreme end of the sequence, whichever maintains consecutiveness of . Details are given in [8].
The difference between this algorithm and that of [8] (and Booth and Lueker) is that, instead of testing at each iteration whether the next row can be added to those considered so far without undermining the consecutive-ones property, it must repeatedly perform this test on the fixed row after each row is added. We already know that can be added, since has the consecutive-ones property. Like Booth and Lueker, the previous algorithm of [8] applies the full/partial/empty labels for to facilitate the test, in time, and then removes them before considering the next row . Though we must perform the test on the fixed row at each iteration, instead of on , we must do it time, not time, in order to retain the linear time bound. To do this, we leave the full/partial/empty labelings for from one iteration to the next, so that we only have to update them, using , rather than re-creating them each time a new row is considered.
To facilitate this, we keep updated labels and on each Venn class , where denotes the cardinality of and is the number of elements of in . Labels only need to be updated when a Venn class is split. It is split into and . We may find and by counting them directly, since there are of these elements. The classes are implemented with doubly-linked lists, and these sets are removed from the list for , leaving it to represent . Subtracting and from the old labels and gives the updated labels for and in time. Each of the new classes is full if its and labels are equal, empty if its label is 0, and partial otherwise.
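The label bookkeeping for a split can be sketched as below. The dictionary layout (`members`, `size`, `ones`) and the names `classify` and `split_class` are hypothetical. One caveat: this sketch builds the remainder with a Python set difference, which costs time proportional to the whole class; the doubly-linked lists described above avoid that cost, so only the label arithmetic reflects the stated time bound.

```python
def classify(size, ones):
    """Full/partial/empty label derived from a class's size and count of 1's."""
    if ones == 0:
        return "empty"
    if ones == size:
        return "full"
    return "partial"

def split_class(venn_class, new_row, fixed_row):
    """Split a Venn class by a newly added row and update both labels.

    venn_class: dict with "members" (set of columns), "size", and "ones"
                (number of members that are 1's of the fixed row).
    Only the part inside new_row is counted directly; the labels of the other
    part are obtained by subtraction from the old labels.
    """
    inside = venn_class["members"] & new_row
    inside_ones = sum(1 for col in inside if col in fixed_row)
    part_in = {"members": inside, "size": len(inside), "ones": inside_ones}
    part_out = {
        "members": venn_class["members"] - inside,   # a linked list would avoid this copy
        "size": venn_class["size"] - len(inside),
        "ones": venn_class["ones"] - inside_ones,
    }
    return part_in, part_out
```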
To evaluate whether one of the conditions of Lemma 5 holds, it is easy to see that it suffices to keep track of transition pairs, which are consecutive pairs such that one contains a 0 and one contains a 1. This happens when their full/partial/empty labels are unequal, or else both partial. When a new transition pair forms, we have touched at least one member of the pair within our operations, so keeping track of these does not affect this time bound.
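For illustration, the configurations of Definition 2 can be detected from a sequence of full/partial/empty labels with a single scan. This sketch rescans the whole sequence rather than maintaining transition pairs incrementally, so it does not achieve the amortized bound discussed above, and it checks only for the presence of a configuration; the additional subset condition in case 2 of Lemma 5 is not tested. The names `contains_bit` and `has_configuration` are hypothetical.

```python
def contains_bit(label, bit):
    """Does a class with this label contain a column where the fixed row has `bit`?"""
    return label == "partial" or label == ("full" if bit == 1 else "empty")

def has_configuration(labels, outer_bit):
    """Detect a 1-0-1 configuration (outer_bit=1) or a 0-1-0 configuration (outer_bit=0).

    labels: the full/partial/empty labels of the classes, in left-to-right order.
    A configuration needs three distinct classes, so it exists exactly when some
    class strictly between the first and last classes containing outer_bit
    contains the opposite bit.
    """
    inner_bit = 1 - outer_bit
    outer = [i for i, label in enumerate(labels) if contains_bit(label, outer_bit)]
    if len(outer) < 2:
        return False
    first, last = outer[0], outer[-1]
    return any(contains_bit(labels[j], inner_bit) for j in range(first + 1, last))
```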
Since finding takes linear time by the BFS of Section 3, it remains only to bound the time required to find the first step, finding the elements and of Lemma 7. This is much more straightforward, since we apply it once for each iteration of the loop of Algorithm 4.1, hence we can afford to take time for the test. We can apply the entire set of full/partial/empty labels for to the PQ tree within this bound, by working from the leaves to the root.
For each Q node , the members of the overlap component whose union is are unions of more than one and fewer than all of its children. How to find them in linear time for all Q nodes has been described previously, for example in [7]. We find the rows of the overlap component that contain a Venn child of that is labeled full or partial (a “1”). Out of all such rows, let be the one with a leftmost right endpoint, and let be the one with the rightmost left endpoint. By a simple greedy swapping argument, the overlap component giving rise to contains an and a satisfying Lemma 7 if and only if and satisfy it, which happens if and only if there is a child between the right endpoint of and the left endpoint of that is labeled partial or empty (a “0”). This can also clearly be implemented so that the time bound over all Q nodes takes time.
Once the set of rows of a Tucker matrix has been found, it remains to find the columns. This is a set of columns, the removal of any one of which would undermine the conditions of Lemma 5, which are satisfied initially by . Deletion of a column undermines the lemma if and only if it does one of the following:
1. It disconnects the path in the overlap graph on ;
2. It undermines the only remaining 1-0-1 configuration, or 0-1-0 configuration with a 1 in the unconstrained class.
The second test is elementary and omitted because of space constraints. For the first test, recall that is a chordless path in the overlap graph. Let for for . Each element of is consecutive-ones ordered, the sum of cardinalities of sets in is . The overlap graph remains connected if and only if every member of contains at least one retained column. We give each column a list of members of it is contained in and keep a counter on each element of indicating the number of remaining columns it contains. When removing a column , the counters can be updated by decrementing the counters of members of in its list. A column cannot be removed if removing it would decrement a counter to 0.
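The counter scheme for the first test can be sketched as follows. The family of sets is taken as an input here, and the name `prune_columns` is hypothetical. The sketch covers only the connectivity test; a complete implementation would also apply the second test before discarding a column.

```python
def prune_columns(family, columns):
    """Greedily discard columns while every set in the family keeps a retained column.

    family: dict mapping a set name to its set of columns (assumed subsets of `columns`).
    columns: list of candidate columns, considered in the given order.
    Returns (kept, removed).
    """
    membership = {c: [] for c in columns}      # for each column, the sets containing it
    remaining = {}                             # for each set, its count of retained columns
    for name, members in family.items():
        remaining[name] = len(members)
        for c in members:
            membership[c].append(name)

    kept, removed = [], []
    for c in columns:
        if any(remaining[name] == 1 for name in membership[c]):
            kept.append(c)                     # removing c would empty some set
        else:
            for name in membership[c]:
                remaining[name] -= 1
            removed.append(c)
    return kept, removed
```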
5 Finding a Lekkerkerker-Boland Subgraph
Tucker observed that the smallest graphs whose clique matrices contain a Tucker matrix must be exactly the LB graphs [16]. However, making use of this to find an LB subgraph is not as straightforward as it appears. It is easy to believe from Figure 2 that the rows of every Tucker submatrix in a clique matrix can be extended to a clique matrix of an LB subgraph by including at most three additional rows. The rows of the result would therefore identify the vertices that induce an LB subgraph.
That this reasoning is flawed is illustrated by Figure 5. Neither does it follow from the fact that no column of the clique matrix is a subset of any other; we discovered the example in the figure when trying to prove it using only this fact. Therefore, though the conclusion is true in the case of chordal graphs, this requires proof.
If is not chordal, we may return a by the algorithm of [15]. Henceforth, we may assume that the graph is chordal. A clique tree of a chordal graph is a tree that has one node for each maximal clique, and with the property that for each vertex of , the cliques that contain induce a connected subtree. Every chordal graph has a clique tree, see for example [3]. A vertex is simplicial if its neighbors induce a complete subgraph. The following are immediate from results that appear there:
Lemma 9
Let be a clique tree for a chordal graph , and let be a leaf. Then contains a simplicial vertex of . Let be the simplicial vertices of , and let be the result of deleting leaf from . Deleting from yields an induced subgraph that has as a clique tree.
Definition 3
By shrinking a clique tree , let us denote the operation of deleting the set of simplicial vertices in a leaf of , yielding a graph with the smaller clique tree described by the lemma.
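A single shrinking step might be sketched as below, assuming the clique tree has at least two nodes and is stored as two dictionaries. The representation and the name `shrink_leaf` are illustrative. The sketch relies on the fact that, because the cliques containing any vertex form a connected subtree, the vertices of a leaf clique that are absent from its unique neighbor occur in no other clique, and these are exactly the simplicial vertices in the leaf.

```python
def shrink_leaf(cliques, tree, leaf):
    """Delete the simplicial vertices of a leaf clique and drop the leaf from the tree.

    cliques: dict node -> set of vertices of that maximal clique.
    tree: dict node -> set of adjacent nodes in the clique tree.
    Returns (deleted_vertices, new_cliques, new_tree); inputs are not modified.
    """
    if len(tree[leaf]) != 1:
        raise ValueError("node is not a leaf of a clique tree with at least two nodes")
    (neighbor,) = tree[leaf]
    simplicial = cliques[leaf] - cliques[neighbor]
    new_cliques = {k: set(v) for k, v in cliques.items() if k != leaf}
    new_tree = {k: set(adj) - {leaf} for k, adj in tree.items() if k != leaf}
    return simplicial, new_cliques, new_tree
```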
Lemma 10
If is a chordal graph, is a clique tree of , and is connected, then the cliques of that contain members of induce a connected subtree of .
Lemma 11
Let be a clique tree of a chordal graph , a collection of cliques of , and a clique . If every member of contains a vertex that does not have and is connected, then does not lie on the path in between any pair of cliques of .
Definition 4
If a Tucker matrix occurs as a submatrix induced by column set and row set of a clique matrix, let a private row for be a row that is contained in no column of other than . Let the special columns of be those that are subsets of other columns of these matrices, and let the special columns of an instance of on three rows be all three columns.
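The notion of a private row can be computed directly from the column sets. In this sketch each column of the clique matrix is stored as the set of rows it contains; the name `private_rows` and the dictionary representation are assumptions made for the example.

```python
def private_rows(columns, chosen, target):
    """Rows contained in `target` and in no other column among `chosen`.

    columns: dict column name -> set of rows (vertices) in that column (clique).
    chosen: the column set of the submatrix; target: one column in it.
    """
    others = set()
    for c in chosen:
        if c != target:
            others |= columns[c]
    return columns[target] - others
```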
The special columns are the ones that Tucker identified as “asteroidal column triples” in the bipartite incidence graph of rows and columns in [16]. However, the existence of simplicial vertices or “asteroidal vertex triples” in a sense defined in [6] was not examined. The paper dealt with arbitrary consecutive-ones matrices where such rows need not occur.
Lemma 12
Let be a chordal graph and let be a submatrix of a clique matrix that is an instance of on three vertices or an instance of . Then the clique matrix has a private row for each special column of .
Proof
Let be the set of cliques of that contain the columns of this instance of , let contain a special column of the instance, and let be a clique tree for . In each case, does not lie on the unique path in between any pair of members of , by Lemma 11. Therefore, iteratively shrinking (Definition 3) until it cannot be further shrunk without deleting a clique of results in an induced subgraph with a clique tree where is a leaf. By Lemma 9, has a simplicial vertex, which must be a private row for in .
By examination, it is easy to verify in the case of on three rows, and that adding the private rows for the special columns to the rows and columns of the submatrix gives an interval model for the corresponding LB subgraph. The resulting set of rows therefore induces an LB subgraph.
References
- [1] A. Brandstaedt, V.B. Le, and J.P. Spinrad. Graph Classes: A Survey. SIAM Monographs on Discrete Mathematics, Philadelphia, 1999.
- [2] C. Chauve, U.-U. Haus, T. Stephen, and V. P. You. Minimal conflicting sets for the consecutive ones property in ancestral genome reconstruction. Journal of Computational Biology, 17:1167–1181, 2010.
- [3] M. C. Golumbic. Algorithmic Graph Theory and Perfect Graphs. Academic Press, New York, 1980.
- [4] P. Hell and J. Huang. Certifying LexBFS recognition algorithms for proper interval graphs and proper interval bigraphs. SIAM J. Discrete Math, 18:554–570, 2004.
- [5] D. Kratsch, R.M. McConnell, K. Mehlhorn, and J.P. Spinrad. Certifying algorithms for recognizing interval graphs and permutation graphs. SIAM Journal on Computing, 36:326–353, 2006.
- [6] C. G. Lekkerkerker and J. Ch. Boland. Representation of a finite graph by a set of intervals on the real line. Fund. Math., 51:45–64, 1962.
- [7] G. S. Lueker and K. S. Booth. A linear time algorithm for deciding interval graph isomorphism. J. ACM, 26:183–195, 1979.
- [8] R. M. McConnell. A certifying algorithm for the consecutive-ones property. Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA04), 15:761–770, 2004.
- [9] R. M. McConnell, K. Mehlhorn, S. Näher, and P. Schweitzer. Certifying algorithms. Computer Science Review, 5:119–161, 2011.
- [10] J. Meidanis, O. Porto, and G.P. Telles. On the consecutive ones property. Discrete Applied Mathematics, 88:325–354, 1998.
- [11] Fred S. Roberts. Graph Theory and Its Applications to Problems of Society. Society for Industrial and Applied Mathematics, Philadelphia, 1978.
- [12] D. Rose, R. E. Tarjan, and G. S. Lueker. Algorithmic aspects of vertex elimination on graphs. SIAM J. Comput., 5:266–283, 1976.
- [13] J. Spinrad. Efficient Graph Representations. American Mathematical Society, Providence RI, 2003.
- [14] J. Stoye and R. Wittler. A unified approach for reconstructing ancient gene clusters. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 6:387–400, 2009.
- [15] R. E. Tarjan and M. Yannakakis. Addendum: Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs. SIAM Journal on Computing, 14:254–255, 1985.
- [16] A. Tucker. A structure theorem for the consecutive 1’s property. Journal of Combinatorial Theory, Series B, 12:153–162, 1972.
- [17] G. Wegner. Eigenschaften der Nerven homologisch-einfacher Familien im $\mathbb{R}^n$. PhD thesis, Universität Göttingen, 1967.