Local correlation clustering
Abstract
Correlation clustering is perhaps the most natural formulation of clustering. Given objects and a pairwise similarity measure, the goal is to cluster the objects so that, to the best possible extent, similar objects are put in the same cluster and dissimilar objects are put in different clusters.
Despite its theoretical appeal, the practical relevance of correlation clustering remains largely unexplored, mainly because correlation clustering requires the pairwise similarities as input, and for large datasets these are infeasible to compute, or even just to store.
In this paper we initiate the investigation into local algorithms for correlation clustering, laying the theoretical foundations for clustering “big data”. In local correlation clustering we are given the identifier of a single object and we want to return the cluster to which it belongs in some globally consistent near-optimal clustering, using a small number of similarity queries.
Local algorithms for correlation clustering open the door to sublinear-time algorithms, which are particularly useful when the similarity between items is costly to compute, as it is often the case in many practical application domains. They also imply distributed and streaming clustering algorithms, constant-time estimators and testers for cluster edit distance, and property-preserving parallel reconstruction algorithms for clusterability.
Specifically, we devise a local clustering algorithm attaining a $(3,\varepsilon)$-approximation (a solution with cost at most $3\cdot\mathrm{OPT}+\varepsilon n^2$, where $\mathrm{OPT}$ is the optimal cost). Its running time depends only on $\varepsilon$, independently of the dataset size. If desired, an explicit approximate clustering for all objects can be produced in time $O(n/\varepsilon)$ (which is provably optimal). We also provide a fully additive $(1,\varepsilon)$-approximation whose local query and time complexity again depend only on $\varepsilon$; the corresponding explicit clustering can be found in time linear in $n$ for fixed $\varepsilon$. The latter yields the fastest polynomial-time approximation scheme for correlation clustering known to date.
1 Introduction
In correlation clustering (sometimes called clustering with qualitative information, or cluster editing) we are given a set $V$ of $n$ objects and a pairwise similarity function $s\colon V\times V\to[0,1]$, and the goal is to cluster the items in such a way that, to the best possible extent, similar objects are put in the same cluster and dissimilar objects are put in different clusters. Assuming that cluster identifiers are represented by natural numbers, a clustering is a function $c\ell\colon V\to\mathbb{N}$. Correlation clustering aims at minimizing the following cost:
$$\mathrm{cost}(c\ell)=\sum_{\substack{x<y\\ c\ell(x)=c\ell(y)}}\bigl(1-s(x,y)\bigr)\;+\;\sum_{\substack{x<y\\ c\ell(x)\neq c\ell(y)}}s(x,y).\qquad(1)$$
The intuition underlying the above problem definition is that if two objects $x$ and $y$ are assigned to the same cluster we should pay the amount of their dissimilarity $1-s(x,y)$, while if they are assigned to different clusters we should pay the amount of their similarity $s(x,y)$.
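For concreteness, the following minimal sketch (ours, not part of the formal development) evaluates the cost of Eq. (1) for a given similarity matrix and clustering; the names `similarity` and `clustering` are illustrative.

```python
from itertools import combinations

def correlation_cost(similarity, clustering):
    """Cost of Eq. (1): for each pair, pay the dissimilarity 1 - s(x, y) if the
    two objects share a cluster, and the similarity s(x, y) otherwise."""
    n = len(clustering)
    cost = 0.0
    for x, y in combinations(range(n), 2):
        if clustering[x] == clustering[y]:
            cost += 1.0 - similarity[x][y]
        else:
            cost += similarity[x][y]
    return cost

# Toy example: three objects, the third dissimilar to the first two.
s = [[1.0, 0.9, 0.1],
     [0.9, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
print(correlation_cost(s, [0, 0, 1]))  # 0.1 + 0.1 + 0.2 = 0.4
```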
In the most widely studied setting, the similarity function is binary, i.e., $s\colon V\times V\to\{0,1\}$. This setting can be viewed very conveniently through graph-theoretic lenses: the items correspond to the vertices of a similarity graph $G$, which is a complete undirected graph with edges labelled “+” or “–”. An edge causes a disagreement (of cost 1) between the similarity graph and a clustering when it is a “+” edge connecting vertices in different clusters, or a “–” edge connecting vertices within the same cluster. If we were given a cluster graph [37] (or clusterable graph), i.e., a graph whose set of positive edges is the union of vertex-disjoint cliques, we would be able to produce a perfect (i.e., cost 0) clustering simply by computing the connected components of the positive graph. However, similarities will generally be inconsistent with one another, so incurring a certain cost is unavoidable. Correlation clustering aims at minimizing this cost. The problem can be viewed as an agnostic learning problem, where we try to approximate the adjacency function of $G$ by the hypothesis class of cluster graphs; alternatively, it is the task of finding the equivalence relation that most closely resembles a given symmetric relation.
Correlation clustering provides a general framework in which one only needs to define a suitable similarity function. This makes it particularly appealing for the task of clustering structured objects, where the similarity function is domain-specific and does not rely on an ad hoc specification of some suitable metric such as the Euclidean distance of vectors. Thanks to this generality, the technique is applicable to a multitude of problems in different domains, including duplicate detection and similarity joins [27, 17], biology [11], image segmentation [30] and social networks [12].
Another key feature of correlation clustering is that it does not require a prefixed number of clusters; instead, it automatically finds the optimal number.
Despite its appeal, correlation clustering has so far been mainly of theoretical interest. This is due to its scaling behavior with the size of the input data: given $n$ items to be clustered, building the complete similarity graph requires $\Theta(n^2)$ similarity computations. For large $n$, the similarity graph might be infeasible to construct, or even just to store. This is the main bottleneck of correlation clustering and the reason why its practical relevance still remains largely unexplored.
The high-level contribution of our work is to overcome the main drawback of correlation clustering, making it scalable. We achieve this by designing algorithms that can construct a clustering in a local and distributed manner.
The input of a local clustering algorithm is the identifier of one of the objects to be clustered, along with a short random seed. After making a small number of oracle similarity queries (probes into the pairwise similarity matrix), a local algorithm outputs the cluster to which the object belongs, in some globally consistent near-optimal clustering.
1.1 A model for local correlation clustering
In the following we focus on the binary case; we discuss the non-binary case, together with other extensions, in Section 7. We work with the adjacency matrix model, which assumes oracle access to the input graph $G$. Namely, given a pair of vertices $(u,v)$, we can ask whether $\{u,v\}$ is a positive edge of $G$; each query is charged unit cost. By explicitly finding a clustering $c\ell$ we mean storing $c\ell(v)$ for every $v\in V$. In this explicit model a running time of $\Omega(n)$ is necessary, since all $n$ values must be specified. An algorithm with complexity $O(n)$ for (approximate) correlation clustering is already a significant improvement over the $\Omega(n^2)$ complexity of most current solutions, but we take a step further and ask whether the dependence on $n$ may be avoided altogether by producing implicit representations of the cluster mapping.
It is for this reason that we define local clustering as follows. Let us fix, for each finite graph $G$, a collection $\mathcal{C}_G$ of “high-quality” clusterings of $G$.
Definition 1.1 (Local clustering algorithm)
Let $q,t\colon\mathbb{N}\to\mathbb{N}$. A clustering algorithm $\mathcal{A}$ for $\{\mathcal{C}_G\}_G$ is said to be local with time (resp., query) complexity $t$ (resp., $q$) if, having oracle access to any graph $G$ and taking as input a random seed $r$ and a vertex $v\in V(G)$, it returns a cluster label $\mathcal{A}(r,v)$ in time $t$ (resp., with $q$ queries).
Algorithm $\mathcal{A}$ implicitly defines a clustering, described by the cluster label function $v\mapsto\mathcal{A}(r,v)$, where the same sequence $r$ of random bits is used by $\mathcal{A}$ to calculate $\mathcal{A}(r,v)$ for each $v$. The success probability of $\mathcal{A}$ is the infimum (over all graphs $G$) of the probability (over $r$) that the clustering implicitly defined by $\mathcal{A}$ belongs to $\mathcal{C}_G$.
Note that $t$ does not depend on $n$: this means that the cluster label of each vertex can be computed in constant time, independently of the others. On the other hand, $t$ could have a (hopefully mild) dependence on the desired quality of the clustering produced (which defines the set $\mathcal{C}_G$ for a given $G$) and on the success probability of $\mathcal{A}$. Finally, it is important to note that, in order to define a unique “global” clustering across different vertices, the same sequence of random coin flips must be used.
Sometimes we also allow local algorithms with preprocessing $p$, meaning (when $t$ denotes time complexity) that $\mathcal{A}$ is allowed to perform computations and queries using total time $p$ before reading the input vertex $v$. This preprocessing computation/query set is common to all vertices and may depend only on the outcome of $\mathcal{A}$'s internal coin tosses and on the edges probed.
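The following minimal sketch (an illustration of the model, not of our actual algorithm) shows the shape of such a local procedure: a shared random seed, an optional preprocessing stage, and a per-vertex routine that probes the similarity oracle only a bounded number of times. The class name and the placeholder preprocessing rule are ours.

```python
import random

class LocalClusterer:
    """Skeleton of a local clustering algorithm: the seed fixes all random
    choices, so independent calls to cluster_of() are globally consistent."""

    def __init__(self, oracle, n, seed):
        self.oracle = oracle                 # oracle(u, v) -> True iff {u, v} is a "+" edge
        self.n = n
        self.rng = random.Random(seed)
        self.pivots = self._preprocess()     # shared by all vertices

    def _preprocess(self):
        # Placeholder preprocessing: pick a small random set of candidate pivots.
        return self.rng.sample(range(self.n), k=min(5, self.n))

    def cluster_of(self, v):
        # Label of v: the first pivot that is a positive neighbour of v,
        # or v itself if none is (a singleton cluster).
        for p in self.pivots:
            if p == v or self.oracle(p, v):
                return p
        return v
```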
1.2 Contributions and practical implications
We focus on approximation algorithms for local correlation clustering with sublinear time and query complexity. Since any purely multiplicative approximation needs to make $\Omega(n^2)$ queries (Section 6), we need less stringent requirements. (We remark that in a different model that uses neighborhood oracles [4], it is possible to bypass the lower bound for multiplicative approximations that holds for edge queries; in fact, from our analysis we can derive the first sublinear-time constant-factor approximation algorithm for this case, see Section 7.) One way is to allow an additional $\varepsilon$-fraction of the pairs to be violated, compared to the optimal clustering of cost $\mathrm{OPT}$. Following Parnas and Ron [33], we study $(\alpha,\varepsilon)$-approximations: solutions with at most $\alpha\cdot\mathrm{OPT}+\varepsilon n^2$ disagreements. These solutions form the set of “high-quality” clusterings in Definition 1.1. Here $\alpha$ is a small constant and $\varepsilon$ is an accuracy parameter specified by the user. Essentially $\varepsilon$ handles the trade-off between the desired accuracy and the running time: the larger $\varepsilon$, the faster the algorithm, but also the further from $\mathrm{OPT}$.
While we provide the formal statements of our results in Section 3, here we highlight the main message of this paper: there exist efficient local clustering algorithms with good approximation guarantees. Namely, in time depending only on $\varepsilon$ it is possible to obtain $(O(1),\varepsilon)$-approximations locally. (Typically we think of $\varepsilon$ as a user-defined constant.) This yields many practical contributions as by-products:
-
Explicit clustering in time $O(n)$. Given that $c\ell(v)$ can be computed in time depending only on $\varepsilon$ for each $v$, one can produce an explicit clustering in time $O(n)$ for fixed $\varepsilon$. This is linear in the number of vertices (not edges) of the graph. More generally, the complexity of finding the clusters of a subset of vertices requested by the user is proportional to the size of this subset.
-
Distributed algorithms. We can assign vertices to different processors and compute their cluster labels in parallel, provided that the same random seed is passed along to all processors.
-
Streaming algorithms. Similarly, local clustering algorithms can cluster graphs in the streaming setting, where edges arrive in arbitrary order. In this case the sublinear behaviour is lost because we still need to process every edge. However, the memory footprint of the algorithm can be brought down from $O(n^2)$ to $O(n)$ words for fixed $\varepsilon$ (the semi-streaming model [18]). Indeed, note that given a fixed random seed, for every vertex $v$ the set of all possible queries that can be made during the computation of $c\ell(v)$ has size depending only on $\varepsilon$ (this bound can in fact be reduced further for the non-adaptive algorithms we devise). This set can be computed before any edge arrives. From then on it suffices to keep in memory the edges that may ever be queried, and for fixed $\varepsilon$ there are $O(n)$ of them. In fact, the running time of the local-based algorithm will be dominated by the time it takes to discard the unneeded edges.
-
Cluster edit distance estimators and testers. We can estimate the degree of clusterability of the input data in constant time by sampling pairs of vertices and using the local clustering algorithm to see how many of them disagree with the input graph (see the sketch after this list). We believe this can be an important primitive for developing new algorithms. Moreover, estimators for cluster edit distance give (tolerant) testers for the property of being clusterable, thereby allowing us to quickly detect data instances where any attempt to obtain a good clustering is bound to fail.
-
Local clustering reconstruction. Queries of the form “are $u$ and $v$ in the same cluster?” can be answered in constant time without having to partition the whole graph: simply compute $c\ell(u)$ and $c\ell(v)$, and check for equality. This means that we can “correct” our input graph (a “corrupted” version of a clusterable graph) so that the modified graph we output is close to the input and satisfies the property of being clusterable. This fits the paradigm of local property-preserving data reconstruction of [3] and [35].
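As an illustration of the estimator mentioned in the list above, the following hedged sketch (ours) estimates the fraction of vertex pairs on which the input graph disagrees with the clustering implicitly defined by a local algorithm, by sampling random pairs. It assumes the `cluster_of` interface sketched in Section 1.1 and an edge oracle `oracle(u, v)`.

```python
import random

def estimate_fractional_cost(oracle, local, n, samples=2000, seed=0):
    """Estimate, by sampling, the fraction of pairs on which the input graph and
    the implicitly defined clustering disagree (an additive-error estimate)."""
    rng = random.Random(seed)
    disagreements = 0
    for _ in range(samples):
        u, v = rng.sample(range(n), 2)
        same_cluster = local.cluster_of(u) == local.cluster_of(v)
        positive_edge = bool(oracle(u, v))
        if same_cluster != positive_edge:
            disagreements += 1
    return disagreements / samples
```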
To the best of our knowledge, this is the first work about local algorithms for correlation clustering.
2 Background and related work
Correlation clustering. Minimizing disagreements and maximizing agreements are equivalent for exact algorithms, but the two tasks differ with regard to approximation. Following [25], we refer to these two problems as MaxAgree and MinDisagree, while MaxAgree[k] and MinDisagree[k] refer to the variants of the problem with a bound $k$ on the number of clusters. Not surprisingly, MaxAgree and MinDisagree are NP-hard [10, 37]; the same holds for their bounded counterparts, provided that $k\geq 2$. Therefore approximate solutions are of interest. For MaxAgree there is a (randomized) PTAS: the first such result was due to Bansal et al. [10], and was later improved by Giotis and Guruswami [25]. The latter also presented a PTAS for MaxAgree[k]. In contrast, MinDisagree is APX-hard [14], so we do not expect a PTAS. Nevertheless, there are constant-factor approximation algorithms [10, 14, 2]. The best factor (2.5) was given by Ailon et al. [2], who also present a simple, elegant algorithm, called QuickCluster, that achieves a slightly weaker expected approximation ratio of 3 (see Section 4). For MinDisagree[k], a PTAS appeared in [25] and [29]. There is also work on correlation clustering on incomplete graphs [10, 14, 41, 25, 17].
Sublinear clustering algorithms. Sublinear clustering algorithms for geometric data sets are known [5, 32, 9, 15, 16]. Many of these find implicit representations of the clustering they output. There is a natural implicit representation for most of these problems, e.g., the set of cluster centers. By contrast, in correlation clustering there may be no clear way to define a clustering for the whole graph based on a small set of vertices. The only sublinear-time algorithm previously known for correlation clustering is the aforementioned result of [25]; it runs in time linear in $n$, but the multiplicative constant hidden in the big-O notation has an exponential dependence on the approximation parameter.
The literature on active clustering also contains algorithms with sublinear query complexity (see, e.g., [28]); many of them are heuristic or do not apply to correlation clustering. Ailon et al. [1] obtain algorithms for MinDisagree[k] with sublinear query complexity, but the running time of their solutions is exponential.
Local algorithms. The following notion of locality is used in the distributed computing literature. Each vertex of a sparse graph is assigned a processor, and each processor can compute a certain function in a constant number of rounds by passing messages to its neighbours (see Suomela’s survey [40]). Our algorithms are also local in this sense.
Recently, Rubinfeld et al. [34] introduced a model that encompasses notions from several algorithmic subfields, such as locally decodable codes, local reconstruction and local distributed computation. Our definition fits into their framework: it corresponds to query-oblivious, parallelizable, strongly local algorithms that compute a cluster label function in constant time.
Finally, we point out the work of Spielman and Teng [39] pertaining to local graph clustering. In their papers an algorithm is “local” if, given a vertex $v$, it can output $v$'s cluster in time nearly linear in the cluster's size. Our local clustering algorithms also have this ability (assuming, as they do, that for each vertex we are given a list of its neighbours), although the results are not comparable because [39] attempt to minimize the cluster's conductance.
Testing and estimating clusterability. Our methods can also be used for quickly testing clusterability of a given input graph $G$, which is related to the task of estimating the cluster edit distance, i.e., the minimum number of edge label swaps (from “+” to “–” and vice versa) needed to transform $G$ into a cluster graph. Note that this corresponds to the optimal cost of correlation clustering for the given input $G$. Clusterability is a hereditary graph property (closed under removal and renaming of vertices), hence it can be tested with one-sided error using a constant number of queries by the powerful result of Alon and Shapira [8]. Combined with the work of Fischer and Newman [20], this also yields estimators for cluster edit distance that run in time independent of the graph size. Unfortunately, the query complexity of the algorithm given by these results would be a tower of exponentials of height polynomial in $1/\varepsilon$, where $\varepsilon$ is the approximation parameter.
Approximation algorithms for MIN-2-CSP problems [7] also give estimators for cluster edit distance. However, they provide no way of computing each variable assignment in constant time. Moreover, calculating all assignments takes them time at least linear in $n$, and hence they do not lend themselves to sublinear-time clustering algorithms.
3 Statement of results
All our graphs are undirected and simple. For a vertex $v$, $N^+(v)$ is the set of positive edges incident with $v$; $N^-(v)$ is defined similarly for negative edges. We extend this notation to sets of vertices in the obvious manner. The distance between two graphs $G$ and $G'$ on the same vertex set is the number of edge labels on which they differ. Their fractional distance is their distance divided by $\binom{n}{2}$ (note that this lies in the interval $[0,1]$). Two graphs are $\varepsilon$-close to each other if their fractional distance is at most $\varepsilon$. A $k$-clusterable graph is one whose positive edges form a union of at most $k$ vertex-disjoint cliques. A graph is clusterable if it is $k$-clusterable for some $k$.
The following folklore lemma says that approximate -clustering algorithms yield approximate clustering algorithms with an unbounded number of clusters:
Lemma 3.1
If $G$ is clusterable, then it is within distance $\varepsilon n^2$ of a $(\lceil 1/\varepsilon\rceil+1)$-clusterable graph.
Proof.
Take the optimal (perfect) clustering for $G$. Let $S$ be the set of vertices in clusters of size smaller than $\varepsilon n$. Now re-cluster the elements of $S$ arbitrarily into clusters of size $\lceil\varepsilon n\rceil$ (except possibly one). This introduces at most $|S|\cdot\varepsilon n\leq\varepsilon n^2$ additional errors. All but one of the clusters of the resulting clustering have size at least $\varepsilon n$, hence it has at most $\lceil 1/\varepsilon\rceil+1$ clusters.
Corollary 3.2
Any $(\alpha,\varepsilon)$-approximation to the optimal $(\lceil 1/\varepsilon\rceil+1)$-clustering is also an $(\alpha,O(\varepsilon))$-approximation to the optimal (unrestricted) clustering.
Proof.
Immediate from the triangle inequality for graph distances.
We are now ready to summarize our results. All our algorithms are (necessarily) randomized and succeed with constant probability (which can be amplified). Our first result concerns the standard setting where the clusters of all vertices need to be explicitly computed. We present an $(O(1),\varepsilon)$-approximation (in fact we can also produce an expected $(3,\varepsilon)$-approximation; because we insist on algorithms that work with constant success probability, we state our bounds with a somewhat larger constant, which could be replaced with any number greater than 3) that runs in time $O(n/\varepsilon)$; compare the $\Omega(n^2)$ complexity of most other clustering methods. Our algorithm is optimal up to constant factors.
Theorem 3.3
Given $\varepsilon>0$, an $(O(1),\varepsilon)$-approximate clustering for MinDisagree can be found in time $O(n/\varepsilon)$. Moreover, finding an $(O(1),\varepsilon)$-approximation with constant success probability requires $\Omega(n/\varepsilon)$ queries.
In other words, with a “budget” of $O(n/\varepsilon)$ queries we can obtain an $(O(1),\varepsilon)$-approximation. In fact, the upper bound of Theorem 3.3 can be derived from our next result. It states that the same approximation can be implicitly constructed in constant time, regardless of the size of the graph.
Theorem 3.4
Given $\varepsilon>0$, an $(O(1),\varepsilon)$-approximate clustering for MinDisagree can be found locally in time polynomial in $1/\varepsilon$ per vertex, or in time $O(1/\varepsilon)$ per vertex after preprocessing that uses $\mathrm{poly}(1/\varepsilon)$ non-adaptive queries and time. Moreover, finding an $(O(1),\varepsilon)$-approximation with constant success probability requires $\Omega(1/\varepsilon)$ adaptive queries.
As a corollary we obtain a partially tolerant tester of clusterability. We stress that the tester is efficient both in terms of query complexity and time complexity, unlike many results in property testing.
Corollary 3.5
There is a non-adaptive, two-sided error tester which accepts graphs that are $c\varepsilon$-close to clusterable (for some absolute constant $c<1$) and rejects graphs that are $\varepsilon$-far from clusterable. It runs in time depending only on $\varepsilon$.
So far these results do not allow us to obtain clusterings that are arbitrarily close to the optimal one. To overcome this issue, we also show (using different techniques) that a purely additive $\varepsilon n^2$ approximation can still be found with a number of queries depending only on $\varepsilon$, but with an exponentially larger running time.
Theorem 3.6
Given $\varepsilon>0$, there is a local clustering algorithm that achieves a $(1,\varepsilon)$-approximation to the cost of the optimal clustering. Its local time complexity is polynomial in $1/\varepsilon$ after preprocessing whose query and time complexity depend only on $\varepsilon$.
For the explicit versions we obtain the following.
Corollary 3.7
There is a $(1,\varepsilon)$-approximate clustering algorithm for MinDisagree (and hence MaxAgree too) that runs in time $n\cdot\mathrm{poly}(1/\varepsilon)$ plus an additive term depending only on $\varepsilon$. In particular there is a PTAS for MaxAgree with the same running time.
The “in particular” part follows from the observation that the optimum value for MaxAgree is $\Omega(n^2)$ (see, e.g., [25, Theorem 3.1]). In the best PTAS previously in the literature [25], the term depending on $n$ carries a multiplicative constant exponential in $1/\varepsilon$. In our result, the dominating term (depending on $n$) has an exponentially smaller multiplicative constant (polynomial in $1/\varepsilon$), and the term that is exponential in $1/\varepsilon$ is additive (and independent of $n$). As for lower bounds, observe that the $\Omega(n/\varepsilon)$ bound from Theorem 3.3 still applies, while the presence of a term super-polynomial in $1/\varepsilon$ for very small $\varepsilon$ seems hard to avoid due to the NP-hardness of the problems, since an optimal solution can be found by taking $\varepsilon<1/n^2$.
These results are established via the study of the corresponding problems with a prespecified number $k$ of clusters; such algorithms yield additive approximations to the general case upon setting $k=\lceil 1/\varepsilon\rceil+1$, in view of Lemma 3.1. For fixed $k$, the bounds for our algorithms have the same form after replacing $1/\varepsilon$ with $k$ (see Section 5). For example, we get a PTAS for MaxAgree[k] whose running time is linear in $n$ for fixed $k$ and $\varepsilon$.
Corollary 3.8
For any $k$, there is a non-adaptive, one-sided error tester which accepts graphs that are $k$-clusterable and rejects graphs that are $\varepsilon$-far from $k$-clusterable. Its query complexity and running time depend only on $k$ and $\varepsilon$.
Techniques and roadmap. Our first local algorithm (Theorem 3.4) is inspired by the QuickCluster algorithm of Ailon et al. [2], which resembles the greedy procedure for finding maximal independent sets. The main idea behind the local version is to define the clusters “in reverse”. We find a small set $P$ of “cluster centers” or “pivots” by looking at a small induced subgraph, and then we give a simple rule that defines an extended clustering for the whole graph in terms of the adjacencies of each particular vertex with $P$. As it turns out, such a $P$ can be obtained by a procedure that finds a constant-sized “almost-dominating” set of vertices that are within distance two of most other vertices in the graph, in such a way that we can combine the expected 3-approximation guarantee of [2] with an additive error term. The algorithm and its analysis are given in Section 4.
The second local algorithm (Theorem 3.6) borrows ideas from the PTAS for dense MaxCut of Frieze and Kannan [24] and uses low-rank approximations to the adjacency matrix of the graph. (Interestingly, while such approximations have been known for a long time, their implications for correlation clustering have been overlooked.) Notably, implicit descriptions of these approximations are locally computable in constant time (polynomial in the inverse of the approximation parameter). We show that in order to look for near-optimal clusterings, we can restrict the search to clusterings that “respect” a sufficiently fine weakly regular partition of the graph. Then we argue that this can be used to implicitly define a good approximate clustering: to cluster a given vertex, we first determine its piece in a regular partition, and then we look at which cluster contains this piece in the best coarsening of the partition. The details are in Section 5.
The lower bounds, proven in Section 6, are applications of Yao's lemma [42]. Broadly speaking, we give the candidate algorithm a perfect clustering of most vertices of the graph into clusters of equal size, and for each of the remaining vertices a “secret” cluster is chosen at random among these. The optimal clustering of the resulting graph has small fractional cost. We then ask the algorithm to find clusters for the remaining vertices, and show that it must make many adaptive queries if it is to output a clustering with fractional cost not much larger than the optimum.
Finally, in Section 7 we discuss several extensions, including the case of a non-binary similarity measure.
4 $(O(1),\varepsilon)$-approximations
First we describe the QuickCluster algorithm of Ailon et al. [2]. It selects a random pivot, creates a cluster consisting of the pivot and its positive neighborhood, removes this cluster, and iterates on the remaining induced subgraph. Essentially it finds a maximal independent set in the positive graph. When the graph is clusterable, it makes no errors. In [2], the authors show that the expected cost of the clustering it finds is at most three times the optimum.
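A minimal sketch of this pivoting scheme (our rendering, following [2]; `adj` is assumed to map each vertex to the set of its positive neighbours):

```python
import random

def quick_cluster(vertices, adj, seed=0):
    """QuickCluster-style pivoting: repeatedly pick a random remaining vertex,
    make it a pivot, and cluster it with its remaining positive neighbours."""
    rng = random.Random(seed)
    remaining = set(vertices)
    clusters = []
    while remaining:
        pivot = rng.choice(sorted(remaining))
        cluster = {pivot} | (adj[pivot] & remaining)
        clusters.append(cluster)
        remaining -= cluster
    return clusters
```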
Note that determining the positive neighborhood of a pivot takes $n-1$ queries to the adjacency oracle. The algorithm's worst-case complexity is $\Theta(n^2)$: consider the graph with no positive edges. In fact its time and query complexity is $\Theta(kn)$, where $k$ is the average number of clusters found. This suggests attempting to partition the data into a small number of clusters in order to minimize query complexity.
We know from Lemma 3.1 that any clustering can be approximated, up to an additive $\varepsilon n^2$, by a clustering whose pieces have size $\Omega(\varepsilon n)$. So an idea would be to modify QuickCluster so that most clusters output are sufficiently large. Fortunately, QuickCluster tends to do just that on average, provided that the graph of positive edges is sufficiently dense, because the expected size of the next cluster found is precisely one plus the average degree of the remaining graph. Once the graph becomes too sparse, a low-cost clustering of the remaining vertices can be found without even looking at the edges, for example by putting each of them into its own cluster. (Another possibility that works is to cluster all remaining vertices into clusters of size $\lceil\varepsilon n\rceil$, eliminating the need for singleton clusters.)
Another advantage of finding a small number of clusters is locality. Let $\pi$ denote the first $p$ elements of the sequence of pivots found by QuickCluster. Let us pick an arbitrary vertex $v$ contained in the positive neighbourhood of $\pi$; all other vertices can be safely ignored because, as we shall see, they usually are incident to few edges (for suitably chosen $p$). Then the pivot of $v$'s cluster is the first element of $\pi$ that is a positive neighbour of $v$: therefore it can be determined in time $O(p)$, assuming we are given the pivot sequence $\pi$.
We therefore propose the scheme whose pseudocode is given in Algorithm 1 and sketched below (the analysis is presented in the next section). Assuming we know a good pivot sequence $\pi$, an implicit clustering is defined deterministically in the way described above; two vertices $u$ and $v$ belong to the same cluster if and only if FindCluster($u,\pi$) = FindCluster($v,\pi$). Similarly to QuickCluster, we can find a set of pivots by finding an independent set of vertices in the positive graph; to keep it small we restrict the search to the subgraph induced by a small random sample. This is done by FindPivots, which can be seen as a “preprocessing stage” for the local clustering algorithm FindCluster. In the next section the following key lemma will be shown.
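The following is a minimal sketch of the scheme just described (our rendering of Algorithm 1; the sample-size parameter `m`, of order $1/\varepsilon$, and the exact tie-breaking are assumptions for illustration).

```python
import random

def find_pivots(n, oracle, m, rng):
    """FindPivots (sketch): sample m vertices and greedily keep those with no
    positive edge to an earlier kept vertex (an independent set in the sample)."""
    sample = [rng.randrange(n) for _ in range(m)]        # with replacement
    pivots = []
    for u in sample:
        if all(u != p and not oracle(u, p) for p in pivots):
            pivots.append(u)
    return pivots

def find_cluster(v, pivots, oracle):
    """FindCluster (sketch): v joins the first pivot it is positively connected
    to; otherwise it forms a singleton cluster labelled by itself."""
    for p in pivots:
        if p == v or oracle(v, p):
            return p
    return v

def local_cluster(v, n, oracle, m, seed):
    """LocalCluster (sketch): every call regenerates the same pivot sequence
    from the shared seed, so labels of different vertices are consistent."""
    pivots = find_pivots(n, oracle, m, random.Random(seed))
    return find_cluster(v, pivots, oracle)
```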
Lemma 4.1
Let $\varepsilon>0$ and let the sample size be $m=\Theta(1/\varepsilon)$. The expected cost of the clustering determined by FindPivots and FindCluster is at most $3\cdot\mathrm{OPT}+\varepsilon n^2$, and for every $c>1$ the probability that it exceeds $c\,(3\cdot\mathrm{OPT}+\varepsilon n^2)$ is less than $1/c$.
For example, setting $c=2$, we see that with probability at least $1/2$ the clustering determined by FindPivots and FindCluster is a $(6,2\varepsilon)$-approximation to the optimal one. Although this low bound on the success probability may be overly pessimistic, we can amplify it in order to obtain better theoretical guarantees. To do this with confidence we try several independent pivot samples and estimate the cost of the associated local clusterings by sampling random edges.
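A hedged sketch of this amplification step, reusing `find_pivots` and `find_cluster` from the previous sketch (the names `find_good_pivots` and the sampling parameters are ours):

```python
import random

def estimate_cost(pivots, n, oracle, pairs, rng):
    """Estimate the fractional cost of the clustering induced by a pivot sequence."""
    bad = 0
    for _ in range(pairs):
        u, v = rng.sample(range(n), 2)
        same = find_cluster(u, pivots, oracle) == find_cluster(v, pivots, oracle)
        if same != bool(oracle(u, v)):
            bad += 1
    return bad / pairs

def find_good_pivots(n, oracle, m, trials, pairs, seed):
    """Run FindPivots several times and keep the pivot sequence with the
    smallest estimated cost."""
    rng = random.Random(seed)
    candidates = [find_pivots(n, oracle, m, rng) for _ in range(trials)]
    return min(candidates, key=lambda p: estimate_cost(p, n, oracle, pairs, rng))
```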
Lemma 4.2
Let $\rho$ denote the fractional cost of the optimal clustering. With probability at least $2/3$, FindGoodPivots returns a pivot set whose associated clustering has fractional cost at most $O(\rho)+O(\varepsilon)$. Its running time depends only on $\varepsilon$.
Finally, to obtain a purely local clustering algorithm with no preprocessing we need each vertex to find the good set of pivots by itself, as in LocalCluster. Note that it is crucial here that all vertices have access to the same source of randomness, so that the same set of pivots is found on each separate call. In practical implementations this means introducing an additional short parameter, for instance a common random seed.
From Lemmas 4.1 and 4.2 we can easily deduce two of our main results.
Corollary 4.3 (Upper bound of Theorem 3.4)
LocalCluster is a local clustering algorithm for MinDisagree achieving an $(O(1),\varepsilon)$-approximation to the optimal clustering with constant probability. The preprocessing runs in time depending only on $\varepsilon$, and the clustering time per vertex is $O(1/\varepsilon)$.
Corollary 4.4 (Upper bound of Theorem 3.3)
An explicit clustering attaining an $(O(1),\varepsilon)$-approximation can be found with constant probability in time $O(n/\varepsilon)$.
By a very similar argument we can produce an expected $(3,\varepsilon)$-approximate clustering in time $O(n/\varepsilon)$. A time-efficient property tester for clusterability (Corollary 3.5) is also a simple consequence of the above.
4.1 Analysis of the local algorithm
We prove the approximation guarantees of the algorithm (Lemma 4.1) by comparing it to the clustering found by QuickCluster, which is known to achieve an expected 3-approximation [2]. In this section we consider the input graph with negative edges removed, so that $N(v)=N^+(v)$ for all $v$.
The following is a straightforward consequence of the multiplicative Chernoff bounds:
Lemma 4.5
Let , , and . Suppose are random variables in such that for all , the variables are independent with common mean . Define . Then with probability at least , the following holds:
-
•
If , then .
-
•
For all , if , then .
Corollary 4.6
Let . Let be clusterings such that at least one of them has fractional cost . Then with probability we can select such that has fractional cost at most using a total of edge queries to .
Proof of Lemma 4.2.
Use Lemma 4.1 and Corollary 4.6.
Proof of Corollary 4.4.
Call FindGoodPivots once to obtain a good pivot sequence $\pi$ with constant probability, in time depending only on $\varepsilon$. Then run FindCluster($v,\pi$) sequentially for each vertex $v$ in order to determine its cluster label $c\ell(v)$, appending $v$ to the list of vertices in the cluster labelled $c\ell(v)$. Finally output the resulting clusters. The whole process runs in time $O(n/\varepsilon)$.
A partial clustering of $G$ is a clustering of a subset $S$ of its vertices. Its partial cost is the number of disagreements on edges that have at least one endpoint in $S$.
Now consider a clustering of $V$ into $C_1,\dots,C_m$. For $j\leq m$, the $j$th partial subclustering is the partition of $C_1\cup\dots\cup C_j$ into $C_1,\dots,C_j$. Clearly the cost of a clustering upper bounds the partial cost of any of its partial subclusterings.
Lemma 4.7
Let $\pi$ denote the sequence of pivots found by FindPivots. The expected number of edge violations involving vertices within distance one of $\pi$ is at most $3\cdot\mathrm{OPT}$.
Proof. To simplify the analysis, in the proof of this lemma we modify QuickCluster and FindPivots slightly so that they run deterministically, provided that a random permutation $\sigma$ of the vertex set is chosen in advance. Concretely, we consider a deterministic version of QuickCluster, denoted QuickCluster$_\sigma$, that considers pivot candidates in ascending order of $\sigma$. Similarly, the deterministic FindPivots$_\sigma$ takes as its sample the first vertices in increasing order of $\sigma$. Clearly running FindPivots$_\sigma$ on a random permutation is the same as running the original FindPivots, and likewise for QuickCluster.
Observe that the sequence of pivots returned by FindPivots$_\sigma$ is a prefix of the sequence of pivots returned by QuickCluster$_\sigma$. Therefore the first clusters are the same as well, i.e., they define a partial subclustering of the one found by QuickCluster$_\sigma$. Hence the partial cost of the subclustering determined by FindPivots$_\sigma$ is in expectation at most $3\cdot\mathrm{OPT}$. This is equivalent to the statement of the lemma.
Next we show that FindPivots returns a small “almost-dominating” set of vertices, in the sense quantified in the following result.
Theorem 4.8
Let $G$ be a graph on $n$ vertices and let $u_1,\dots,u_m$ be an ordered sample of $m$ vertices chosen independently and uniformly with replacement from $V(G)$. Let $S=\{u_1,\dots,u_m\}$. Then the expected number of edges of $G$ not incident with an element of $S\cup N^+(S)$ is $O(n^2/m)$.
Observe that an existential result for an almost-dominating set is easy to establish by picking pivots in order of decreasing degree in the residual graph. However, doing so would invalidate the approximation guarantees of QuickCluster on which we rely. We defer the proof of Theorem 4.8. Assuming this result, we are ready to prove Lemma 4.1.
Proof of Lemma 4.1. Lemma 4.7 says that the pivots found define a partial clustering with expected cost bounded by $3\cdot\mathrm{OPT}$. Let $S$ be the random sample used by FindPivots. Theorem 4.8 is stated for sampling with replacement, but this implies the same result for sampling without replacement, so its conclusion still holds. Combining the two results and setting $m=\Theta(1/\varepsilon)$, we see that the set of disagreements in the clustering produced can be written as the union of two disjoint random sets $A$ and $B$ with $\mathbb{E}[|A|]\leq 3\cdot\mathrm{OPT}$ and $\mathbb{E}[|B|]\leq\varepsilon n^2$. By linearity of expectation, the expected cost is at most $3\cdot\mathrm{OPT}+\varepsilon n^2$, and by applying Markov's inequality to the non-negative variables $|A|$ and $|B|$ separately we conclude the claimed probability bound.
The rest of this section is devoted to proving Theorem 4.8.
For any non-empty graph $G$ and pivot $v$, let $G\diamond v$ denote the subgraph of $G$ resulting from removing all edges incident to $\{v\}\cup N(v)$ (keeping all vertices). Define a random sequence of graphs by $G_0=G$ and $G_{i+1}=G_i\diamond v_{i+1}$, where $v_1,v_2,\dots$ are chosen independently and uniformly at random from $V(G)$ (note that sometimes $G_{i+1}=G_i$).
Lemma 4.9
Let $G_i$ have average degree $\bar d$. When going from $G_i$ to $G_{i+1}$, the number of edges decreases in expectation by at least $\bar d^{\,2}/2$, and the number of degree-0 vertices increases in expectation by at least $\bar d$.
Proof. Let $v$ denote the chosen pivot, and let $d(x)$ denote the positive degree of $x$ in $G_i$. The claim on the number of degree-0 vertices is easy, so we prove the claim on the number of edges. Consider an edge $(x,y)$. It is deleted if the chosen pivot is an element of $N(x)\cup N(y)\cup\{x,y\}$ (this set contains $x$ and $y$); let $Z_{xy}$ be the 0–1 random variable associated with this event. It occurs with probability at least $(d(x)+d(y))/(2n)$. Let $D$ be the number of edges deleted. By linearity of expectation, its average is $\mathbb{E}[D]\geq\sum_{(x,y)\in E(G_i)}\frac{d(x)+d(y)}{2n}=\sum_{x}\frac{d(x)^2}{2n}$. Now we compute, by Cauchy–Schwarz, $\sum_x d(x)^2\geq n\bar d^{\,2}$, hence $\mathbb{E}[D]\geq\bar d^{\,2}/2$.
Now let and define the “actual size” of a graph by Let and define by
Lemma 4.10
For all $i\geq 0$ the following inequalities hold:
(2)
(3)
(4)
Proof. Inequality (2) is a restatement of Lemma 4.9. Inequality (3) follows from Jensen's inequality, since the function in question is concave; we have
Finally we prove for all . We know that
so the claim follows by induction on as is increasing
on and
Remark 4.1
With a finer analysis (deferred to the appendix), Equation (4) can be strengthened; this does not affect the asymptotics.
Proof of Theorem 4.8.
Note that after sampling $m$ vertices with replacement from $V$, the subgraph of $G$ resulting from removing all edges incident to the closed neighbourhoods of the sampled vertices is distributed according to $G_m$. Using Equation (4), we bound the expected number of surviving edges.
5 Fully additive approximations
Here we study $(1,\varepsilon)$-approximations. By Lemma 3.1 and its corollary, it is enough to consider $k$-clusterings for $k=\lceil 1/\varepsilon\rceil+1$.
5.1 The regularity lemma.
One of the cornerstone results in graph theory is the regularity lemma of Szemerédi, which has found a myriad of applications in combinatorics, number theory and theoretical computer science [31]. It asserts that every graph can be approximated by a small collection of random-like bipartite graphs; in fact from $G$ we can construct a small “reduced” weighted graph of constant size which inherits many properties of $G$. If we select an approximation parameter $\gamma$, it gives us an equitable partition of the vertex set of $G$ into a constant number of classes $V_1,\dots,V_s$ such that the following holds: for any two large enough sets $S,T\subseteq V$, the number of edges between $S$ and $T$ can be estimated by thinking of $G$ as a random graph where the probability of an edge between $V_i$ and $V_j$ is the edge density between those two classes. (The precise notion of approximation we need will be explained later.) Moreover, it is possible to choose a minimum partition size; often it is chosen so that “internal” edges among vertices from the same class are few enough to be ignored.
The original result was existential, but algorithms to construct a regular partition are known [6, 23, 19] which run in time polynomial in $n$ (for constant $\gamma$). This naturally suggests trying to use the partition classes in order to obtain an approximation of the optimal clustering. Nevertheless, to the best of our knowledge, the only prior attempts to exploit the regularity lemma for clustering are the papers of Sperotto and Pelillo [38] and Sárközy, Song, Szemerédi and Trivedi [36]. They use the constructive versions of the lemma to find the reduced graph, and apply standard clustering algorithms to it. Since the partition size required by the lemma grows as a tower of exponentials in the approximation parameter (and this kind of growth rate is necessary [26]), they propose heuristics to avoid this tower-exponential behaviour. However, the running time of their algorithms is at least $\Omega(n^\omega)$, where $\omega$ is the exponent of matrix multiplication. Moreover, no theoretical bounds are provided on the quality of the clustering found by working with the reduced graph, even if no heuristics were applied.
5.2 Cut decompositions of matrices
The idea of Frieze and Kannan is to take any real matrix $A$ with row set $R$, column set $C$ and bounded entries, and approximate it by a low-rank matrix of a certain form. (The case of interest for us is when $A$ is the adjacency matrix of a graph.) Let $R'\subseteq R$ and $C'\subseteq C$. Given $R'$, $C'$ and a real density $d$, the cut matrix $D=\mathrm{CUT}(R',C',d)$ is the rank-1 matrix defined by $D_{ij}=d$ if $(i,j)\in R'\times C'$, and $D_{ij}=0$ otherwise. We identify $R'$ and $C'$ with their indicator column vectors of length $|R|$ and $|C|$, respectively; then we can write $D=d\cdot R'C'^{\mathsf T}$. For a matrix $M$ we write $M(S,T)=\sum_{i\in S,j\in T}M_{ij}$. A cut decomposition of a matrix $A$ with relative error $\varepsilon$ is a set of cut matrices $D^{(1)},\dots,D^{(s)}$, where $D^{(t)}=\mathrm{CUT}(R_t,C_t,d_t)$, such that for all $S\subseteq R$, $T\subseteq C$, $\bigl|\bigl(A-\sum_t D^{(t)}\bigr)(S,T)\bigr|\leq\varepsilon\sqrt{|R||C|}\,\|A\|_F$.
Such a decomposition is said to have width $s$ and coefficient length $\bigl(\sum_t d_t^2\bigr)^{1/2}$.
Theorem 5.1 (Cut decompositions [22, Th. 1])
Suppose $A$ is a matrix with entries in $[-1,1]$ and $\varepsilon,\delta>0$ are reals. Then in time depending only on $\varepsilon$ and $\delta$ we can, with probability at least $1-\delta$, find implicitly a cut decomposition of width $O(1/\varepsilon^2)$, relative error at most $\varepsilon$, and coefficient length at most 6.
Regarding the meaning of “implicit”. By implicitly finding a cut decomposition of a matrix $A$ in time $\tau$, we mean that for any given pair $(i,j)$, we can compute all of the following in time $\tau$ by making queries to $A$:
-
the rational values $d_1,\dots,d_s$;
-
the indicator functions $R_t(i)$ and $C_t(j)$, for all $t\in[s]$;
-
the value of the entry $B_{ij}$ of the approximating matrix $B=\sum_t D^{(t)}$.
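To illustrate what “implicit” buys us, the following sketch (ours; the data layout is an assumption) evaluates a single entry of the approximating matrix from the indicator functions and densities, without ever materializing the matrix.

```python
def approx_entry(i, j, cuts):
    """Entry (i, j) of the sum of cut matrices.  Each cut is a triple
    (in_R, in_C, d): two indicator functions and a density."""
    return sum(d for (in_R, in_C, d) in cuts if in_R(i) and in_C(j))

# Toy example with two cut matrices on indices 0..5.
cuts = [
    (lambda i: i < 3, lambda j: j >= 3, 0.8),   # dense block between {0,1,2} and {3,4,5}
    (lambda i: i >= 3, lambda j: j >= 3, 0.5),  # moderately dense block inside {3,4,5}
]
print(approx_entry(1, 4, cuts))  # 0.8
print(approx_entry(4, 5, cuts))  # 0.5
```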
In Appendix LABEL:app:fk we give a sketch of how Frieze and Kannan achieve this. We also observe that their algorithm is non-adaptive.
Specifying a maximum cut-set size. Suppose we start with arbitrary equitable partitions of the row set and column set of $A$ into a bounded number of pieces. We can then find cut decompositions of the submatrices induced by the partition, and combine them into a cut decomposition of the original matrix in which every $R_t$ and $C_t$ is contained in a single piece; the reader may verify that this preserves the bound on the relative error. This process can only increase the query and time complexities by a constant factor (cf. [22, Section 5.1]).
Application to adjacency matrices. Suppose $A$ is the adjacency matrix of an unweighted graph $G$, and identify the row and column sets with $V$; then $\|A\|_F\leq n$. Let $e(S,T)$ denote the number of edges between $S$ and $T$. Then, for disjoint $S$ and $T$, $e(S,T)=A(S,T)$, and the conclusion of Theorem 5.1 can be written as:
for all $S,T\subseteq V$,
$$\Bigl|e(S,T)-\sum_{t=1}^{s}d_t\,|R_t\cap S|\,|C_t\cap T|\Bigr|\leq\varepsilon n\,\|A\|_F.\qquad(5)$$
The last term can be bounded by $\varepsilon n^2$ since $\|A\|_F\leq n$. While the standard regularity lemma supplies a much stronger notion of approximation, this bound suffices for certain applications.
Weakly regular partitions. A weakly $\gamma$-pseudo-regular partition of $G$ is a partition of $V$ into classes $V_1,\dots,V_m$ such that $\bigl|e(S,T)-\sum_{i,j}d_{ij}|V_i\cap S|\,|V_j\cap T|\bigr|\leq\gamma n^2$ for all disjoint $S,T\subseteq V$, where $d_{ij}=e(V_i,V_j)/(|V_i||V_j|)$. If, in addition, the partition is equitable, it is said to be weakly $\gamma$-regular.
Given a cut decomposition of a graph with relative error $\varepsilon$ and width $s$, we get a weakly $O(\varepsilon)$-pseudo-regular partition of size at most $2^{2s}$ by taking the classes of the Venn diagram of $R_1,C_1,\dots,R_s,C_s$ with universe $V$. So we can enforce the condition that the sets partition the vertex set of $G$, at an exponential increase in the number of such sets. Furthermore, any weakly pseudo-regular partition may be refined to obtain a weakly regular partition of slightly larger size; see [22, Section 5.1].
Often the weak regularity lemma is stated in terms of weakly regular partitions, but the formulation of Theorem 5.1 is stronger in that it allows us to estimate the number of edges between two sets in time depending only on $\varepsilon$, provided that we know the sizes of their intersections with all the $R_t$ and $C_t$, even though the weakly regular partition itself may have exponential size.
5.3 Near-optimal clusterings and the local algorithm
Intuitively, two vertices in the same class of a regular partition have roughly the same number of connections with vertices outside. Hence, for any given clustering of the remaining vertices, the cost of placing one of them into any one of the clusters is roughly the same as the cost of placing the other there, suggesting they belong together in some near-optimal clustering (if we can afford to ignore the cost due to internal edges in the regular partition). In other words, a regular partition can be “coarsened” into a good clustering; the best one can be found by considering all possible assignments of partition classes to clusters and estimating the cost of each resulting clustering.
We can make this argument rigorous by using bounds derived from the weak regularity lemma to approximate the cost of the optimal clustering by a certain quadratic program. If we ignore the terms with a single variable squared, the optimum of this program does not change by much as long as the partition is sufficiently fine. Then one can argue that the modified program attains its optimum for an assignment of variables which can be interpreted as a clustering that puts everything from the same regular partition into the same cluster.
Lemma 5.2
Let $A$ be the adjacency matrix of a graph $G$ and let $k\in\mathbb{N}$. Let $D^{(1)},\dots,D^{(s)}$ be a cut decomposition of $A$ with relative error $\varepsilon$ and bounded coefficient length. Denote by $\mathrm{OPT}_k$ the cost of the optimal $k$-clustering, and by $\mathrm{OPT}'_k$ the cost of the optimal $k$-clustering into classes that belong to the $\sigma$-algebra generated by $R_1,C_1,\dots,R_s,C_s$ over $V$. Then $\mathrm{OPT}'_k\leq\mathrm{OPT}_k+O(\varepsilon)\,n^2$.
Proof. We use Equation (LABEL:eq:weak_reg) to introduce an “idealized” cost function satisfying the following for any clustering :
-
1.
; and
-
2.
.
Taken together, these two properties imply the result.
For each -clustering into , define
(6) |
For any , , using Equation (LABEL:eq:weak_reg) it holds that
Similarly, | ||||||
Therefore
where the last inequality is by Cauchy-Schwarz.
It remains to be shown that ; in other words, that there is an almost-optimal -clustering under the cost function whose pieces are unions of the pieces of the Venn diagram of . To see this, write
Then
Therefore is a quadratic form on the intersection sizes , :
where and when .
Now remove from this expression the terms where . Among these, the terms where evaluate to zero because and are disjoint. Each of the terms where and has absolute value at most
since from the bound on the coefficient length, and . Therefore the term removal changes the value of the cost function by at most .
For , let . Let
Then we have seen that
and . Hence finding the optimal -clustering under the idealized cost function can be reduced, up to an additive error of , to solving the following integer quadratic program:
minimize | (7) | ||||
subject to | |||||
The reason is that any feasible solution for gives a clustering by assigning arbitrary elements of to the first cluster, another elements of to the second cluster, and so on.
Because , there is an optimal solution to (LABEL:eq:qp) in which for all , exactly one is equal to and the rest are zero.
Indeed, fix for all and all in a solution (which corresponds to fixing a -clustering of ). Then the objective function becomes a linear combination of , plus a constant term. Therefore it is minimized by picking the
cluster with the smallest coefficient and setting .
We sketch now our second local algorithm.
Proof of Theorem 3.6. For any $k$, we show a local algorithm that achieves a $(1,\varepsilon)$-approximation to the optimal $k$-clustering in time depending only on $k$ and $\varepsilon$, after a preprocessing stage whose query and time complexity also depend only on $k$ and $\varepsilon$. Theorem 3.6 then follows by setting $k=\lceil 1/\varepsilon\rceil+1$.
First compute a cut decomposition of $A$ that satisfies the conditions of Lemma 5.2. By Theorem 5.1, it can be computed implicitly in time depending only on $\varepsilon$. Let $W_1,\dots,W_q$ be the atoms of the $\sigma$-algebra, where $q\leq 2^{2s}$. Observe that they can also be defined implicitly: given $v$ we can compute, in time depending only on $\varepsilon$, a $2s$-bit label that determines the unique atom $W_i$ to which $v$ belongs, namely the values of the indicator functions $R_1(v),C_1(v),\dots,R_s(v),C_s(v)$.
Next we proceed to the more expensive preprocessing part. Consider a clustering all of whose classes are unions of atoms $W_i$. Any such clustering is defined by a mapping $g\colon[q]\to[k]$ that, for every $i$, identifies the cluster to which all elements of $W_i$ belong. We can try all $k^q$ possibilities for $g$, and for each of them estimate the cost of the associated clustering to within $\varepsilon n^2$, with high enough success probability, by sampling. (We omit the details.) If we select the best of them, by Lemma 5.2 it will have cost within $O(\varepsilon)\,n^2$ of the optimal one.
Now we have a “best” mapping $g$ from $[q]$ to $[k]$ that, for every $i$, tells us the cluster of the elements of $W_i$. Finally, note that for any $v$, the index $i$ such that $v\in W_i$ can be determined in time depending only on $\varepsilon$, and then we can get a cluster label for $v$ in constant time by computing $g(i)$.
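A hedged sketch of this preprocessing-and-labelling scheme (ours; `atom_of`, the sampling parameters, and the exhaustive enumeration of mappings are illustrative assumptions, the last being sensible only because the number of atoms depends on $\varepsilon$ alone):

```python
import itertools
import random

def best_atom_assignment(num_atoms, k, atom_of, oracle, n, pairs, seed):
    """Try every mapping of atoms to the k clusters and keep the one whose
    induced clustering has the smallest estimated cost on a fixed sample of pairs."""
    rng = random.Random(seed)
    sampled = [tuple(rng.sample(range(n), 2)) for _ in range(pairs)]
    best, best_cost = None, float("inf")
    for mapping in itertools.product(range(k), repeat=num_atoms):
        cost = 0
        for u, v in sampled:
            same = mapping[atom_of(u)] == mapping[atom_of(v)]
            if same != bool(oracle(u, v)):
                cost += 1
        if cost < best_cost:
            best, best_cost = mapping, cost
    return best

def cluster_label(v, mapping, atom_of):
    """Local step: the label of v is the cluster assigned to v's atom."""
    return mapping[atom_of(v)]
```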
6 Lower bounds
We show that our algorithm from Section LABEL:sec:main is optimal up to constant factors by proving a matching lower bound for obtaining -approximations. For simplicity we consider expected approximations at first; later we prove that combining upper and lower bounds for expected approximations leads to lower bounds for finding bounded approximations with high confidence.
Theorem 6.1
Let $\varepsilon>0$ and let $\alpha$ be a constant. Finding an expected $(\alpha,\varepsilon)$-approximation to the best clustering requires $\Omega(n/\varepsilon)$ queries to the similarity matrix.
In addition, any local clustering algorithm achieving this approximation has query complexity $\Omega(1/\varepsilon)$. (This remains true even if we allow preprocessing, as long as its running time is bounded by a function of $\varepsilon$ and not of $n$.)
Proof. The first part implies the second, because any $q$-query local clustering algorithm with preprocessing can be turned into an explicit $O(nq)$-query clustering algorithm. Given a lower bound of $\Omega(n/\varepsilon)$ on the complexity of finding approximate clusterings for large enough $n$, we get $nq=\Omega(n/\varepsilon)$ for all large enough $n$, which implies $q=\Omega(1/\varepsilon)$. So we prove the first claim.
By Yao’s minimax principle, it is enough to produce a distribution over graphs with the following properties:
-
•
the expected cost of the optimal clustering of is
-
•
for any deterministic algorithm making at most queries, the expected cost (over ) of the clustering produced exceeds .
Let , and . We can assume that , and are integral (here we use the fact that ). Let and . Consider the following distribution of graphs: partition the vertices of into exactly equal-sized clusters . The set of positive edges will be the union of the cliques defined by , plus an edge joining each vertex to a randomly chosen element . Define the natural clustering of a graph by the classes (). This clustering will have a few disagreements because of the negative edges between different vertices with . The cost of the optimal clustering of is bounded by that of the natural clustering , hence
We have to show that any algorithm making queries to graphs drawn from produces a clustering with expected cost larger than . This inequality holds provided that the output clustering and the natural clustering are at least -far apart. Indeed, reasoning about expected distances, is -close to , therefore any clustering that is -close to is also -close to from the triangle inequality.
Since all graphs in induce the same subgraphs on and separately, we can assume without loss of generality that the algorithm queries only edges between and . Let us analyze the distance between the natural clustering and the clustering found by the algorithm. For , let denote set of queries it makes from to and put . Clearly we can assume . The total number of queries made is .
As is independent of all edges from to , conditioning on the responses to all queries not involving we still know that the probability that all responses are negative is . When this happens, the probability that coincides with the algorithm’s choice is at most .
All in all we have that the probability that the algorithm puts into the same cluster as is bounded by . Let us associate a 0-1 random variable with this event and put . Consequently,
We will see below (Lemma LABEL:kmqi) that the last term can be bounded by , where . Therefore .
Now note that any vertex with introduces new differences with the natural clustering. Thus the expected number of differences is at least
because .
Lemma 6.2
Let with . Then
Proof. Let . Define the sets
and
Observe that . Then
Finally, we argue that similar bounds hold for algorithms that obtain good approximation with high success probability.
Lemma 6.3
Suppose finds a -approximate clustering with success probability using queries, and finds an expected -approximate clustering using queries. Then there is an algorithm that finds an expected -approximation using queries.
Proof. Algorithm does the following:
-
1.
Let .
-
2.
Run independent instantiations of to find clusterings with queries.
-
3.
Run independently to find an expected -approximate clustering with queries.
-
4.
Estimate the quality of these clusterings using random samples for each of them.
-
5.
Return the clustering with the smallest estimated error.
The query complexity bound of is as stated. When one of the clusterings found is -approximate, the probability that we fail to return a -approximation is at most . In this case we bound the error of the clustering output by . So the contribution to the expected approximation due to this kind of failure is at most . We assume from now on that this is not the case.
The probability that none of is a -approximation is at most
. In this case we output a -approximation.
On the other hand, with probability at most , we output a clustering that in expectation is a -approximation.
Therefore, the output is an expected -approximation.
Corollary 6.4
Let $\varepsilon>0$. Finding an $(O(1),\varepsilon)$-approximate clustering with constant confidence requires $\Omega(n/\varepsilon)$ queries.
Proof. We may assume that $\varepsilon$ is sufficiently small.
Take the algorithm from Corollary 4.4 and plug it into Lemma 6.3. This gives an expected $(O(1),O(\varepsilon))$-approximation using $O(n/\varepsilon)$ queries.
The result now follows from Theorem 6.1.
7 Extensions
Non-binary similarity function. In Section 1 we introduced correlation clustering in its most general form, with a pairwise similarity function $s\colon V\times V\to[0,1]$, while the case we have studied so far is that of a binary similarity function $s\colon V\times V\to\{0,1\}$. The general case can be reduced to the binary one by “rounding the graph”, i.e., by replacing each non-binary similarity score with either 0 or 1, whichever is closer (breaking ties arbitrarily): Bansal et al. [10, Thm. 23] showed that if $\mathcal{A}$ is an algorithm that produces a clustering on a graph with $\{0,1\}$-edges with approximation ratio $c$, then running $\mathcal{A}$ on the rounding of $G$ achieves an approximation ratio of $O(c)$ on $G$. Therefore our algorithms also provide approximations for correlation clustering in the more general weighted case.
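For completeness, a one-line sketch of the rounding step (ours; the tie-breaking towards “+” is an arbitrary choice):

```python
def round_graph(similarity):
    """Replace each similarity score in [0, 1] by the nearer of 0 and 1
    (ties broken towards "+"), yielding the binary instance to be clustered."""
    return [[1 if s >= 0.5 else 0 for s in row] for row in similarity]
```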
Neighborhood oracles. If, given $v$, we can obtain a linked list of the positive neighbours of $v$ (in time linear in its length), then it is possible to obtain a multiplicative $O(1)$-approximation in time $O(n^{3/2})$, which is sublinear in the input size. Indeed, Ailon and Liberty [4] argue that with a neighborhood oracle, QuickCluster runs in time $O(n+\mathrm{OPT})$; if $\mathrm{OPT}=O(n^{3/2})$ this is $O(n^{3/2})$. On the other hand, if we set $\varepsilon=1/\sqrt{n}$ in our algorithm, we obtain in time $O(n/\varepsilon)=O(n^{3/2})$ an $(O(1),\varepsilon)$-approximation, which is also a multiplicative $O(1)$-approximation when $\mathrm{OPT}=\Omega(\varepsilon n^2)=\Omega(n^{3/2})$. So we can run QuickCluster for $O(n^{3/2})$ steps and output the result; if it does not finish, we run our algorithm with $\varepsilon=1/\sqrt{n}$.
Distributed/streaming clustering. In Section 1 we mentioned that there are general transformations from local clustering algorithms into distributed/streaming algorithms. For our local algorithm from Section 4 we can do the following. Suppose that each processor $P_i$ is assigned a subset $E_i$ of the pairs of vertices, so that $P_i$ can compute (or has information about) whether there is a positive edge between $u$ and $v$ for the pairs $(u,v)\in E_i$. (The assignment of vertex pairs to processors can be arbitrary, as long as the sets $E_i$ partition the set of all pairs.) Then each processor selects the same random vertex subset $S$ of size $O(1/\varepsilon)$, and discards (or does not query/compute) the edges not incident with $S$ among those it can see. After this, each processor outputs, for each vertex $v$, the pairs in $E_i$ joining $v$ to $S$ (note that for each $v$ there are only $O(1/\varepsilon)$ such pairs). With this information the pivot set $\pi$ is the subset of $S$ whose elements have no positive neighbour smaller than themselves (in some random order), and the label of $v$'s cluster is the first element of $\pi$ adjacent to $v$. This can be computed easily in another round.
Note that the sum of the memory used by all processors is $O(n/\varepsilon)$, so for constant $\varepsilon$ we also get a (semi-)streaming algorithm that makes one pass over the data, with edges arriving in arbitrary order. In two passes we can reduce the memory usage further: first store the adjacency matrix of the subgraph induced by the random sample $S$, and compute the set of pivots. In the second pass, keep an integer for each vertex $v$ that indicates the first element of $\pi$ that has appeared as a neighbour of $v$ in the edges seen so far. At the end, this integer is $v$'s cluster label.
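A hedged sketch of this two-pass scheme (ours; it assumes vertices are numbered $0,\dots,n-1$ and that the positive edges can be streamed twice):

```python
def two_pass_streaming_cluster(n, sample, edge_stream_factory):
    """Two-pass sketch: pass 1 stores the subgraph induced by the sample and
    picks pivots greedily; pass 2 keeps, for each vertex, the first pivot seen
    among its positive neighbours.  edge_stream_factory() yields positive edges."""
    sample_set = set(sample)
    induced = {u: set() for u in sample}
    for u, v in edge_stream_factory():                    # pass 1
        if u in sample_set and v in sample_set:
            induced[u].add(v)
            induced[v].add(u)
    pivots, rank = [], {}
    for u in sample:                                      # greedy independent set in sample order
        if u in rank:
            continue
        if all(u not in induced[p] for p in pivots):
            rank[u] = len(pivots)
            pivots.append(u)
    label = {v: v for v in range(n)}                      # default: singleton / own pivot
    best = {v: float("inf") for v in range(n)}
    for u, v in edge_stream_factory():                    # pass 2
        for x, p in ((u, v), (v, u)):
            if p in rank and rank[p] < best[x]:
                best[x], label[x] = rank[p], p
    return label
```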
8 Concluding remarks
This paper initiates the investigation into local correlation clustering, devising algorithms with sublinear time and query complexity. The tradeoff between the running time of our algorithms and the quality of the solution found is close to optimal. Moreover, our solutions are amenable to simple implementations and they can also be naturally adapted to the distributed and streaming settings in order to improve their latency or memory usage.
The notion of local clustering introduced in this paper opens an interesting line of work, which might lead to various contributions in more applied scenarios. For instance, the ability of local algorithms to (among other things) quickly estimate the cost of the best clustering can provide a powerful primitive for decision-making, around which to build new data analysis frameworks. The streaming capabilities of the algorithms may also prove useful in clustering large-scale evolving graphs; for example, they might be applied to detect communities in on-line social networks.
Another intriguing question is whether one can devise other graph-querying models that allow for improved theoretical results while remaining reasonable from a practical viewpoint. The sublinear-time constant-factor approximation algorithm using neighborhood oracles that we discussed in Section 7 suggests that this may be a fruitful direction for further research. The question seems particularly relevant for applying local techniques to very sparse graphs.
References
- [1] N. Ailon, R. Begleiter, and E. Ezra. Active learning using smooth relative regret approximations with applications. In Proc. of 25th COLT, pages 19.1–19.20, 2012.
- [2] N. Ailon, M. Charikar, and A. Newman. Aggregating inconsistent information: Ranking and clustering. Journal of the ACM, 55(5), 2008.
- [3] N. Ailon, B. Chazelle, S. Comandur, and D. Liu. Property-preserving data reconstruction. Algorithmica, 51(2):160–182, 2008.
- [4] N. Ailon and E. Liberty. Correlation clustering revisited: The “true“ cost of error minimization problems. In Proc. of 36th ICALP, pages 24–36, 2009.
- [5] N. Alon, S. Dar, M. Parnas, and D. Ron. Testing of clustering. SIAM Journal on Discrete Mathematics, 16(3):393–417, 2003.
- [6] N. Alon, R. A. Duke, H. Lefmann, V. Rödl, and R. Yuster. The algorithmic aspects of the regularity lemma. Journal of Algorithms, 16(1):80–109, 1994.
- [7] N. Alon, W. Fernández de la Vega, R. Kannan, and M. Karpinski. Random sampling and approximation of MAX-CSPs. Journal of Computer and System Sciences, 67(2):212–243, 2003.
- [8] N. Alon and A. Shapira. A characterization of the (natural) graph properties testable with one-sided error. SIAM Journal on Computing, 37:1703–1727, 2008.
- [9] M.-F. Balcan, A. Blum, and S. Vempala. A discriminative framework for clustering via similarity functions. In Proc. of 40th STOC, pages 671–680, 2008.
- [10] N. Bansal, A. Blum, and S. Chawla. Correlation clustering. Machine Learning, 56(1-3):89–113, 2004.
- [11] A. Ben-Dor, R. Shamir, and Z. Yakhini. Clustering gene expression patterns. Journal of Computational Biology, 6(3/4):281–297, 1999.
- [12] F. Bonchi, A. Gionis, F. Gullo, and A. Ukkonen. Chromatic correlation clustering. In Proc. of 18th KDD, pages 1321–1329, 2012.
- [13] S. Busygin, O. A. Prokopyev, and P. M. Pardalos. Biclustering in data mining. Computers & Operations Research, 35(9):2964–2987, 2008.
- [14] M. Charikar, V. Guruswami, and A. Wirth. Clustering with qualitative information. Journal of Computer and System Sciences, 71(3):360–383, 2005.
- [15] A. Czumaj and C. Sohler. Sublinear-time approximation algorithms for clustering via random sampling. Random Structures and Algorithms, 30(1–2):226–256, 2007.
- [16] A. Czumaj and C. Sohler. Small space representations for metric min-sum k-clustering and their applications. Theoretical Computer Science, 46(3):416–442, 2010.
- [17] E. D. Demaine, D. Emanuel, A. Fiat, and N. Immorlica. Correlation clustering in general weighted graphs. Theoretical Computer Science, 361(2-3):172–187, 2006.
- [18] J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. Theoretical Computer Science, 348(2-3):207–216, 2005.
- [19] E. Fischer, A. Matsliah, and A. Shapira. Approximate hypergraph partitioning and applications. SIAM Journal on Computing, 39:3155–3185, 2010.
- [20] E. Fischer and I. Newman. Testing versus estimation of graph properties. SIAM Journal on Computing, 37(2):482–501, 2007.
- [21] A. M. Frieze and R. Kannan. The regularity lemma and approximation schemes for dense problems. In Proc. of 37th FOCS, pages 12–20, 1996.
- [22] A. M. Frieze and R. Kannan. Quick approximation to matrices and applications. Combinatorica, 19(2):175–220, 1999.
- [23] A. M. Frieze and R. Kannan. A simple algorithm for constructing Szemerédi’s regularity partition. Electronic Journal of Combinatorics, 6, 1999.
- [24] A. M. Frieze, R. Kannan, and S. Vempala. Fast Monte Carlo algorithms for finding low-rank approximations. Journal of the ACM, 51(6):1025–1041, 2004.
- [25] I. Giotis and V. Guruswami. Correlation clustering with a fixed number of clusters. Theory of Computing, 2(13):249–266, 2006.
- [26] T. Gowers. Lower bounds of tower type for Szemerédi’s uniformity lemma. Geometric and Functional Analysis, 7(2):322–337, 1997.
- [27] O. Hassanzadeh, F. Chiang, R. J. Miller, and H. C. Lee. Framework for evaluating clustering algorithms in duplicate detection. PVLDB, 2(1):1282–1293, 2009.
- [28] T. Hofmann and J. M. Buhmann. Active data clustering. In Advances in Neural Information Processing Systems 10 (NIPS), 1997.
- [29] M. Karpinski and W. Schudy. Linear time approximation schemes for the Gale-Berlekamp game and related minimization problems. In Proc. of 41st STOC, pages 313–322, 2009.
- [30] S. Kim, S. Nowozin, P. Kohli, and C. D. Yoo. Higher-order correlation clustering for image segmentation. In NIPS, pages 1530–1538, 2011.
- [31] J. Komlós, A. Shokoufandeh, M. Simonovits, and E. Szemerédi. The regularity lemma and its applications in graph theory. In Proc. of 19th STACS, pages 84–112, 2000.
- [32] N. Mishra, D. Oblinger, and L. Pitt. Sublinear time approximate clustering. In Proc. of 12th SODA, pages 439–447, 2001.
- [33] M. Parnas and D. Ron. Approximating the minimum vertex cover in sublinear time and a connection to distributed algorithms. Theoretical Computer Science, 381(1-3):183–196, 2007.
- [34] R. Rubinfeld, G. Tamir, S. Vardi, and N. Xie. Fast local computation algorithms. In Proc. of 2nd ITCS, pages 223–238, 2011.
- [35] M. E. Saks and C. Seshadhri. Local monotonicity reconstruction. SIAM Journal on Computing, 39(7):2897–2926, 2010.
- [36] G. N. Sárközy, F. Song, E. Szemerédi, and S. Trivedi. A practical regularity partitioning algorithm and its applications in clustering. Computing Research Repository, abs/1209.6540, 2012.
- [37] R. Shamir, R. Sharan, and D. Tsur. Cluster graph modification problems. Discrete Applied Mathematics, 144(1-2):173–182, 2004.
- [38] A. Sperotto and M. Pelillo. Szemerédi’s regularity lemma and its applications to pairwise clustering and segmentation. In Proc. of 6th Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 13–27, 2007.
- [39] D. A. Spielman and S.-H. Teng. A local clustering algorithm for massive graphs and its application to nearly linear time graph partitioning. SIAM Journal on Computing, 42(1):1–26, 2013.
- [40] J. Suomela. Survey of local algorithms. ACM Computing Surveys, 45(2):24:1–24:40, Mar. 2013.
- [41] C. Swamy. Correlation clustering: maximizing agreements via semidefinite programming. In Proc. of 15th SODA, pages 526–527, 2004.
- [42] A. C.-C. Yao. Probabilistic computations: Toward a unified measure of complexity. In Proc. of 18th FOCS, pages 222–227, 1977.
Appendix A A sharper bound for Lemma 4.10
Lemma A.1
Let and define a sequence by for . Then for all ,
where .
Proof. Since replacing with does not affect the terms for , we can assume , in which case the result holds also for . Set for all . Then and
In other words, for all we have
Thus , , and
Since by the integral test
we have
so
as we wished to show.
Appendix B Finding cut decompositions implicitly
We give here an overview of Frieze and Kannan’s method [22]. In essence, the process restricts attention to the submatrix induced by certain randomly chosen subsets of size , and defines and in terms of the adjacencies (matrix entries) of and with and .
We start with the following simple exponential-time algorithm for finding cut decompositions. Suppose we have found cut matrices and we want to find . Let be the residual matrix. While there exist sets with
(8)
let , and add to the decomposition. An easy computation shows that the squared Frobenius norm of the residual matrix decreases by , i.e., at least an fraction of . Therefore this process cannot go on for more than steps. This gives a non-constructive proof of existence of cut decompositions.
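To make the greedy argument concrete, here is a minimal sketch of the exponential-time procedure just described, written for a small matrix with entries in [-1, 1]. The density threshold eps * n * n used below is an assumption standing in for inequality (8), and the function names are ours; the sketch enumerates all subset pairs, which is of course only feasible for tiny inputs.

```python
import itertools
import numpy as np

def _nonempty_subsets(n):
    """Yield every nonempty subset of {0, ..., n-1} as a list of indices."""
    for r in range(1, n + 1):
        for combo in itertools.combinations(range(n), r):
            yield list(combo)

def greedy_cut_decomposition(A, eps):
    """Exponential-time cut decomposition of a small square matrix A with
    entries in [-1, 1].  Illustrative sketch only: the threshold below is
    an assumed stand-in for inequality (8).  Returns a list of cuts
    (S, T, d), each describing the cut matrix d * 1_S 1_T^T."""
    n = A.shape[0]
    W = A.astype(float).copy()          # residual matrix
    cuts = []
    while True:
        found = None
        for S in _nonempty_subsets(n):          # all row subsets
            for T in _nonempty_subsets(n):      # all column subsets
                if abs(W[np.ix_(S, T)].sum()) >= eps * n * n:
                    found = (S, T)
                    break
            if found:
                break
        if found is None:
            return cuts                 # no dense cut left; decomposition done
        S, T = found
        d = W[np.ix_(S, T)].sum() / (len(S) * len(T))
        cuts.append((S, T, d))
        W[np.ix_(S, T)] -= d            # peel off the cut matrix from the residual
```

Under the assumed threshold, each peeled cut matrix removes a fixed fraction of the squared Frobenius norm of the residual, so the loop terminates after a number of iterations independent of the matrix size, mirroring the argument above.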
How can we make this procedure run in time independent of the matrix size? We can allow ourselves some slack here by replacing with some polynomial of with a larger exponent. Frieze and Kannan pick a row set and then use a sampling-based procedure to construct a column set such that the submatrix is sufficiently dense. Provided that the entries in the matrix remain bounded and inequality (8) holds for some , they are able to find such that
which implies an -fractional decrease in the squared Frobenius norm of the residual matrix. They show that, with probability at least , we can take for the set of all with for some randomly chosen and ; and for the set of all with .
It remains to explain how to represent the sets used in the decomposition implicitly. We will write down a predicate that, given and (resp., ), tells us whether (resp., ) and can be evaluated in time by making queries to . Although the size of may be linear in , its definition makes it possible to check for membership in with one query to . The set , for its part, does not admit such a quick membership test, so Frieze and Kannan work with an approximation obtained by replacing with a -sized portion thereof in the definition of . With the new definition, membership in and can be computed in time , as we shall see. Also, the density can be estimated to within accuracy by sampling with queries to .
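As an illustration of such an implicit representation, the following sketch (hypothetical names and threshold, not the paper's exact definitions) represents a column set by a membership predicate: deciding whether a column belongs to the set inspects only a small set of sampled rows and makes one oracle query per sampled row, instead of ever materializing the set.

```python
from typing import Callable, Sequence

def make_column_membership(query: Callable[[int, int], float],
                           sampled_rows: Sequence[int],
                           threshold: float) -> Callable[[int], bool]:
    """Return a predicate deciding membership of a column index j in an
    implicitly represented column set.  Hypothetical sketch: here the set
    consists of the columns whose (residual) mass over the sampled rows
    meets a threshold; the paper's actual definition also involves the
    previously found cut matrices and a random shift, omitted here."""
    def member(j: int) -> bool:
        # one oracle query per sampled row; cost is O(len(sampled_rows)) per test
        mass = sum(query(i, j) for i in sampled_rows)
        return mass >= threshold
    return member
```

Each membership test thus touches only the sampled rows, which is what keeps the overall running time independent of the matrix size.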
Summarizing, we can build a cut decomposition in the following way. Let . At at stage , , the first cut matrices are implicitly known. Given the previous cut matrices, the residual matrix is given by
We extend the notation to sets in the obvious manner.
The set is defined in terms of a random element and a random real by
The set is defined in terms of and a random sample of size by
Finally, the density is defined in terms of another random sample of size by
Let , . We need to compute , , and for all , . This can be done in time using dynamic programming and the formulas above. This allows us to compute for all in time .
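Since the residual matrix is the input matrix minus the cut matrices found so far, its value at any stored entry can be updated incrementally as each new cut matrix is produced, which is the heart of the dynamic-programming step. The sketch below (a simplification with hypothetical names; the actual procedure also maintains row- and column-sums over the samples) performs this update on a dictionary of sampled entries.

```python
def update_residual_on_sample(residual, in_S, in_T, density):
    """residual: dict mapping sampled index pairs (i, j) to the current
    residual value at that entry.
    in_S, in_T: membership predicates for the new cut's row and column sets.
    density:    the density of the new cut matrix.
    Returns the next residual, obtained by subtracting the new cut matrix
    at every sampled entry; each update costs two membership tests."""
    return {
        (i, j): w - (density if (in_S(i) and in_T(j)) else 0.0)
        for (i, j), w in residual.items()
    }
```

Combined with the membership predicates above, all the quantities needed at the next stage can be computed by iterating over the samples once per stage.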