
Faster Approximation Algorithms for Parameterized Graph Clustering and Edge Labeling

Vedangi Bengali (vedangibengali@tamu.edu) and Nate Veldt (nveldt@tamu.edu), Texas A&M University, College Station, Texas, USA
Abstract.

Graph clustering is a fundamental task in network analysis where the goal is to detect sets of nodes that are well-connected to each other but sparsely connected to the rest of the graph. We present faster approximation algorithms for an NP-hard parameterized clustering framework called LambdaCC, which is governed by a tunable resolution parameter and generalizes many other clustering objectives such as modularity, sparsest cut, and cluster deletion. Previous LambdaCC algorithms are either heuristics with no approximation guarantees or computationally expensive approximation algorithms. We provide fast new approximation algorithms that can be made purely combinatorial. These rely on a new parameterized edge labeling problem we introduce, which generalizes previous edge labeling problems based on the principle of strong triadic closure and is of independent interest in social network analysis. Our methods are orders of magnitude more scalable than previous approximation algorithms, and our lower bounds allow us to obtain a posteriori approximation guarantees for previous heuristics that come with no guarantees of their own.

graph clustering, strong triadic closure, correlation clustering, approximation algorithms

1. Introduction

In network analysis, graph clustering is the task of partitioning a graph into well-connected sets of nodes (called communities, clusters, or modules) that are more densely connected to each other than they are to the rest of the graph (Fortunato and Hric, 2016; Schaeffer, 2007; Porter et al., 2009). This fundamental task has widespread applications across numerous domains, including detecting related genes in biological networks (Shamir et al., 2004; Ben-Dor et al., 1999), finding communities in social networks (Veldt et al., 2018; Newman and Girvan, 2004), and image segmentation (Shi and Malik, 2000), to name only a few. A standard approach for finding clusters in a graph is to optimize some type of combinatorial objective function that encodes the quality of a clustering of nodes. Just as there are many different applications and reasons why one may wish to partition the nodes of a graph into clusters, there are many different types of objective functions for graph clustering (Newman and Girvan, 2004; Shi and Malik, 2000; Shamir et al., 2004; Bohlin et al., 2014; Delvenne et al., 2010), all of which strike a different balance between the goal of making clusters dense internally and the goal of ensuring that few edges cross cluster boundaries. In order to capture many different notions of community structure within the same framework, many graph clustering optimization objectives come with tunable resolution parameters (Schaub et al., 2012; Veldt et al., 2019a; Reichardt and Bornholdt, 2006; Delvenne et al., 2010; Newman, 2016), which control the tradeoff between the internal edge density and the inter-cluster edge density resulting from optimizing the objective.

One of the biggest challenges in graph clustering is that the vast majority of clustering objectives are NP-hard. Thus, while it is often easy to define a new way to measure clustering structure, it is very hard to find optimal (or even certifiably near-optimal) clusters in practice for any given objective. There has been extensive theoretical research on approximation algorithms for different clustering objectives (Arora et al., 2009; Leighton and Rao, 1999; Veldt et al., 2018; Charikar et al., 2003), but most of these come with high computational costs and memory constraints, often because they rely on expensive convex relaxations of the NP-hard clustering objective. On the other hand, scalable graph clustering algorithms have been designed based on local node moves and greedy heuristics (Newman and Girvan, 2004; Newman, 2006; Blondel et al., 2008; Traag et al., 2019; Veldt et al., 2018; Shi et al., 2021), but these come with no theoretical approximation guarantees. As a result, it can be challenging to tell whether the structure of an output clustering depends more on the underlying objective function or on the mechanisms of the algorithm being used.

This paper focuses on an existing optimization graph clustering framework called LambdaCC (Veldt et al., 2018; Gleich et al., 2018; Shi et al., 2021; Gan et al., 2020), which comes with two key benefits. The first is that it can detect different types of community structures by tuning a resolution parameter $\lambda\in(0,1)$. Many existing clustering objectives can be recovered as special cases for specific choices of $\lambda$ (Veldt et al., 2018). The second benefit is that LambdaCC can be viewed as a special case of correlation clustering (Bansal et al., 2004), a framework for clustering based on similarity and dissimilarity scores that has been studied extensively from the perspective of approximation algorithms (Charikar et al., 2005; Demaine et al., 2006; Gleich et al., 2018). As a result, LambdaCC directly inherits an $O(\log n)$ approximation algorithm that holds for any correlation clustering problem (Charikar et al., 2005; Demaine et al., 2006) and is amenable to even better approximation guarantees in some parameter regimes. Gleich et al. (Gleich et al., 2018) showed that for very small values of $\lambda$, the $O(\log n)$ approximation is the best that can be achieved by rounding a linear programming relaxation (the most successful known approach for approximating the objective). However, a 3-approximation algorithm has been developed for the regime where $\lambda\geq 1/2$ (Veldt et al., 2018). Despite these results, LambdaCC suffers from a similar theory-practice gap as many other clustering frameworks. These previous approximation algorithms rely on expensive linear programming relaxations and are therefore not scalable. While faster heuristic algorithms do exist (Veldt et al., 2018; Shi et al., 2021), these come with no approximation guarantees.

The present work: fast approximation algorithms for parameterized graph clustering.

We develop algorithms for LambdaCC that come with rigorous approximation guarantees and are also far more scalable than existing approximation algorithms for this problem. We present new algorithms for all values of the parameter $\lambda\in(0,1)$, focusing especially on the regime $\lambda\in(\frac{1}{2},1)$, since constant-factor approximations are possible in this regime and have been a focus in previous research. This is also the regime where LambdaCC interpolates between two existing objectives known as cluster editing and cluster deletion (Shamir et al., 2004). We first design a fast combinatorial approximation algorithm that returns a 6-approximation for any value of $\lambda\in(\frac{1}{2},1)$ and runs in only $O(\sum_{v} d_{v}^{2})$ time, where $d_{v}$ is the degree of node $v$. While this is a factor of 2 worse than the best existing 3-approximation, it is orders of magnitude faster than this previous approach, which requires solving an LP relaxation with $O(n^{3})$ variables for an $n$-node graph and takes $\Omega(n^{6})$ time. Our second algorithm is an improved $7-2/\lambda$ approximation for $\lambda\geq\frac{1}{2}$ (which ranges from 3 to 5 as $\lambda\rightarrow 1$) based on rounding an LP relaxation with far fewer constraints. In numerical experiments, we confirm for a large collection of real-world networks that the number of constraints in this cheaper LP tends to be orders of magnitude smaller than the $O(n^{3})$ constraint set of the canonical LP relaxation. It can also be run on graphs that are so large that even forming the $O(n^{3})$ constraint matrix for the canonical LP relaxation leads to memory issues. Even more significantly, this cheaper LP is a covering LP, a special type of LP that can be solved using combinatorial algorithms based on the multiplicative weights update method (Fleischer, 2004; Quanrud, 2020; Garg and Khandekar, 2004).

We also adapt our techniques to obtain a $(1+1/\lambda)$ approximation by rounding the cheaper LP when $\lambda<\frac{1}{2}$. As is the case when rounding the tighter and more expensive canonical LP relaxation, this guarantee gets increasingly worse as $\lambda$ decreases. This is not surprising, given that even the canonical LP relaxation has an $O(\log n)$ integrality gap (Gleich et al., 2018). Our $(1+1/\lambda)$ approximation is in fact quite close to the $1/\lambda$ approximation for small $\lambda$ previously developed by Gleich et al. (Gleich et al., 2018) based on the canonical LP.

All of our approximation algorithms rely on a new connection between LambdaCC and an edge labeling problem that is based on the social network analysis principle of strong triadic closure (Easley and Kleinberg, 2010; Sintos and Tsaparas, 2014; Granovetter, 1973). This principle posits that if two people share strong links to a mutual friend, then they are likely to share at least a weak connection with each other. This principle has inspired a line of research on strong triadic closure (STC) labeling problems (Sintos and Tsaparas, 2014; Oettershagen et al., 2023; Veldt, 2022; Grüttemeier and Komusiewicz, 2020; Grüttemeier and Morawietz, 2020), which label edges in a graph as weak or strong (or in some cases add "missing" edges) in order to satisfy the strong triadic closure property. Previous research has shown that unweighted variants of this labeling problem are related to cluster editing and cluster deletion (Grüttemeier and Komusiewicz, 2020; Grüttemeier and Morawietz, 2020), which are special cases of LambdaCC when $\lambda=1/2$ and $\lambda\approx 1$, respectively. Recently it was shown that lower bounds and algorithms for these unweighted STC problems can be useful tools in designing faster approximation algorithms for cluster editing and cluster deletion (Veldt, 2022). We generalize this strategy by defining a new parameterized edge labeling problem we call LambdaSTC, which provides new types of lower bounds for LambdaCC. We also provide a 3-approximation algorithm for LambdaSTC that applies for every value of $\lambda\in(0,1)$. All of these constitute new results for an edge labeling problem of independent interest in social network analysis, but our primary motivation is to use them to develop faster clustering approximation algorithms.

We demonstrate in numerical experiments that our algorithms are fast and effective, far surpassing their theoretical guarantees. In our experiments, we even find that solving our cheaper LP relaxation actually tends to return a solution that can quickly be certified to be the optimal solution for the more expensive canonical LP relaxation for LambdaCC. When this happens, we can use previous rounding techniques that guarantee a 3-approximation for $\lambda\geq\frac{1}{2}$.

2. Preliminaries and Related Work

We begin with technical preliminaries on graph clustering, correlation clustering, and strong triadic closure edge labeling problems.

2.1. The LambdaCC Framework

Given an undirected graph $G=(V,E)$, the high-level goal of a graph clustering algorithm is to partition the node set $V$ into disjoint clusters in such a way that many edges are contained inside clusters and few edges cross between clusters. These two goals are often in competition with each other, and there have been many different approaches for defining and forming clusters, all of which implicitly strike a different type of tradeoff between these goals. The LambdaCC clustering objective (Veldt et al., 2018) provides one approach for implicitly controlling this tradeoff using a resolution parameter $\lambda\in(0,1)$. Formally, given $G=(V,E)$ and parameter $\lambda\in(0,1)$, LambdaCC seeks a clustering that minimizes the following objective:

(1)  $\min_{\delta}\ \sum_{(i,j)\in E}(1-\lambda)(1-\delta_{ij})+\sum_{(i,j)\notin E}\lambda\delta_{ij},$

where $\delta_{ij}$ is a binary cluster indicator for every node pair, i.e., $\delta_{ij}=1$ if $i$ and $j$ are clustered together and 0 otherwise. The number of clusters to form is not specified. Rather, the optimal number of clusters is controlled implicitly by tuning $\lambda$. Observe that the two pieces of the LambdaCC objective directly correspond to the two goals of graph clustering: the term $(1-\lambda)(1-\delta_{ij})$ for $(i,j)\in E$ is a penalty incurred if $i$ and $j$ are separated, and the term $\lambda\delta_{ij}$ for $(i,j)\notin E$ places a penalty on putting $i$ and $j$ together if they do not share an edge. The relative importance of the two competing goals of graph clustering (forming clusters that are internally dense and externally sparse) is then controlled by tuning $\lambda$. Smaller values of $\lambda$ tend to produce a smaller number of (larger) clusters, and choosing large $\lambda$ leads to a larger number of (smaller) clusters.
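To make the objective concrete, the following is a minimal Julia sketch (Julia is the language used for our implementations in Section 5) that evaluates Objective (1) for a given clustering. The function name and the representation of a clustering as a node-to-cluster-label vector are our own illustrative choices.

# Minimal sketch: evaluate Objective (1) for a clustering.
# `edges` holds node pairs (i, j) with i < j; `c[v]` is the cluster label of
# node v; `lambda` is the resolution parameter. Representation is illustrative.
function lambdacc_objective(n::Int, edges::Set{Tuple{Int,Int}}, c::Vector{Int}, lambda::Float64)
    obj = 0.0
    for i in 1:n-1, j in i+1:n
        together = (c[i] == c[j])
        if (i, j) in edges
            together || (obj += 1 - lambda)   # edge (i,j) cut between clusters
        else
            together && (obj += lambda)       # non-edge (i,j) inside a cluster
        end
    end
    return obj
end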

One of the benefits of the LambdaCC framework is that it generalizes and unifies a number of previously studied objectives for graph clustering, including the sparsest cut objective (Arora et al., 2009), cluster editing (Shamir et al., 2004; Bansal et al., 2004), and cluster deletion (Shamir et al., 2004). It is also equivalent to the popular modularity clustering objective (Newman and Girvan, 2004) under certain conditions. The definition of modularity depends on an underlying null distribution for graphs. When the Erdős-Rényi null model is used, modularity is equivalent to Objective (1) for an appropriate choice of $\lambda$. When the Chung-Lu null model is used, modularity can be viewed as a special case of a degree-weighted version of LambdaCC (Veldt et al., 2018), though we focus on Objective (1) in this paper. Finally, for an appropriate choice of $\lambda$, the LambdaCC objective is equivalent to graph clustering based on maximum likelihood inference for the popular stochastic block model (Abbe, 2018), which can be seen from LambdaCC's relationship to modularity (Newman, 2016).

2.2. Correlation Clustering

The CC in LambdaCC stands for correlation clustering (Bansal et al., 2004), a framework for clustering based on pairwise similarity and dissimilarity scores. In the most general setting, an instance of weighted correlation clustering is given by a set of vertices $V$, along with two non-negative weights $(w_{ij}^{+},w_{ij}^{-})$ for each pair of distinct vertices $i,j\in V$. If nodes $i$ and $j$ are placed in the same cluster, this incurs a disagreement penalty of $w_{ij}^{-}$, whereas if they are separated, a disagreement penalty of $w_{ij}^{+}$ is imposed. In correlation clustering, disagreements are also called mistakes. This terminology is especially natural when only one of the weights $(w_{ij}^{+},w_{ij}^{-})$ is positive and the other is zero (which is true for the most widely studied special cases). In this case, each node pair $(i,j)$ is either "similar" ($w_{ij}^{+}>0$) and wants to be clustered together, or "dissimilar" ($w_{ij}^{-}>0$) and wants to be clustered apart. A mistake or disagreement happens precisely when nodes are clustered in a way that does not match this "preference." Formally, the disagreement minimization objective for correlation clustering can be represented as the following integer linear program (ILP):

(2)  $\min\ \sum_{i,j}w_{ij}^{+}x_{ij}+w_{ij}^{-}(1-x_{ij})$
subject to  $x_{ij}\leq x_{ik}+x_{jk}$ for all $i,j,k$
$x_{ij}\in\{0,1\}$ for all $i,j$

where $x_{ij}$ is a binary distance variable between nodes $i,j$, i.e., $x_{ij}=0$ means nodes $i$ and $j$ are clustered together, and $x_{ij}=1$ means they are separated. The most well-studied special case is when $(w_{ij}^{+},w_{ij}^{-})\in\{(1,0),(0,1)\}$. This is known as complete unweighted correlation clustering, as it is often viewed as a clustering objective on a complete signed graph where each pair of nodes defines either a positive edge or a negative edge. This is equivalent to the cluster editing problem (Shamir et al., 2004), which seeks to add or delete the minimum number of edges in an unsigned graph $G=(V,E)$ to partition it into a disjoint set of cliques. This is in turn related to cluster deletion, where one can only delete edges in $G$ in order to partition it into cliques. Cluster deletion is the same as solving Objective (2) when $(w_{ij}^{+},w_{ij}^{-})\in\{(1,0),(0,\infty)\}$ for every pair of nodes $(i,j)$.

Approximation algorithms. Correlation clustering is NP-hard even for the special cases of cluster editing and cluster deletion, but many approximation algorithms have been designed (Ailon et al., 2008; Charikar et al., 2005; Chawla et al., 2015). Most of these algorithms rely on solving and rounding a linear programming (LP) relaxation of ILP (2), obtained by replacing $x_{ij}\in\{0,1\}$ with the constraint $x_{ij}\in[0,1]$. For the general weighted case, the best approximation guarantee is $O(\log n)$, which matches the integrality gap of the linear program (Charikar et al., 2005; Demaine et al., 2006). However, constant-factor approximations are possible for certain weighted cases (Ailon et al., 2008; Veldt et al., 2020; Ailon et al., 2012). Ailon et al. (Ailon et al., 2008) designed a fast randomized combinatorial algorithm called Pivot for the complete unweighted case. This algorithm repeatedly selects a uniform random pivot node and clusters it together with its unclustered neighbors that share a positive edge. It comes with an expected 3-approximation guarantee. However, for general weighted correlation clustering, it can produce poor results.
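For reference, below is a minimal Julia sketch of the randomized Pivot procedure, stated on an unsigned graph whose edges play the role of positive pairs; the adjacency-list representation and the function name are our own illustrative assumptions rather than the original implementation.

# Minimal sketch of randomized Pivot: repeatedly pick a uniform random
# unclustered node and cluster it with its unclustered neighbors.
function pivot(adj::Vector{Set{Int}})
    n = length(adj)
    cluster = zeros(Int, n)                 # 0 means not yet clustered
    unclustered = Set(1:n)
    label = 0
    while !isempty(unclustered)
        p = rand(collect(unclustered))      # uniform random pivot node
        label += 1
        cluster[p] = label
        delete!(unclustered, p)
        for u in adj[p]
            if u in unclustered             # grab unclustered neighbors of the pivot
                cluster[u] = label
                delete!(unclustered, u)
            end
        end
    end
    return cluster
end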

Deterministic pivot. A derandomized version of the standard Pivot algorithm was developed by van Zuylen and Williamson (van Zuylen and Williamson, 2009), which can be applied to a broader class of weighted correlation clustering problems. Instead of randomly choosing pivot nodes, this technique relies on solving the LP relaxation of correlation clustering, constructing a derived graph $\hat{G}$ based on the solution to this LP, and then running a pivoting procedure in $\hat{G}$ that deterministically selects pivot nodes based on the LP output. They showed that this produces a deterministic 3-approximation algorithm for the complete unweighted case, and can also be applied to other weighted cases, including the case of probability constraints (where $w_{ij}^{+}+w_{ij}^{-}=1$ for every pair $\{i,j\}$). In proving these results, van Zuylen and Williamson presented a useful theorem (Theorem 3.1 in (van Zuylen and Williamson, 2009)) that can be used as a general strategy for developing approximation algorithms for other special weighted variants. We state a version of this theorem below, as it will be a useful step for some of our results. The original theorem includes details for choosing pivot nodes deterministically; the approximation holds in expectation when choosing pivot nodes uniformly at random.

Theorem 2.1.

Consider an instance of weighted correlation clustering given by a node set $V$ and weights $\{w^{+}_{ij},w^{-}_{ij}\}_{i\neq j}$. Let $b_{ij}$ represent a budget for each node pair $\{i,j\}$ with $i\neq j$, and let $\hat{G}=(V,\hat{E})$ be a graph which, for some $\alpha>0$, satisfies the following conditions:

(1) $w_{ij}^{-}\leq\alpha b_{ij}$ for all pairs $(i,j)\in\hat{E}$, and $w_{ij}^{+}\leq\alpha b_{ij}$ for all pairs $(i,j)\notin\hat{E}$;

(2) $w_{ij}^{+}+w_{jk}^{+}+w_{ik}^{-}\leq\alpha(b_{ij}+b_{jk}+b_{ik})$ for every triplet $\{i,j,k\}$ in $\hat{G}$ where $\{i,j\}\in\hat{E}$, $\{j,k\}\in\hat{E}$, $\{i,k\}\notin\hat{E}$.

Applying Pivot to $\hat{G}$ returns a clustering whose expected weight of disagreements is bounded above by $\alpha\sum_{i<j}b_{ij}$.

Approximations for LambdaCC. The LambdaCC objective on a graph $G=(V,E)$ corresponds to a special case of Objective (2) where $(w_{ij}^{+},w_{ij}^{-})=(1-\lambda,0)$ if $(i,j)\in E$ and $(w_{ij}^{+},w_{ij}^{-})=(0,\lambda)$ if $(i,j)\notin E$. Veldt et al. (Veldt et al., 2018) previously gave a 3-approximation algorithm for the case where $\lambda\geq 1/2$, based on rounding the standard correlation clustering LP relaxation. However, because this LP has $O(n^{3})$ constraints for an $n$-node graph, in practice it is challenging to solve for graphs with even a thousand nodes. Later, Gleich et al. (Gleich et al., 2018) proved that the LP integrality gap can be $O(\log n)$ for small values of $\lambda$. They also developed approximation guarantees for smaller values of $\lambda$, but these get increasingly worse as $\lambda\rightarrow 0$. Faster heuristic algorithms for LambdaCC have also been developed (Veldt et al., 2018; Shi et al., 2021), but these come with no approximation guarantees. Thus, a limitation of this previous work is that existing LambdaCC algorithms either depend on an extremely expensive linear programming relaxation or come with no guarantees. The focus of our paper is to bridge this gap.

2.3. Strong Triadic Closure and Edge Labeling

In social network analysis, the principle of strong triadic closure (Easley and Kleinberg, 2010; Granovetter, 1973) posits that two individuals in a social network will share at least a weak connection if they both share strong connections to a common friend. This has been used to define certain types of edge labeling problems where the goal is to label the edges of a graph $G=(V,E)$ in such a way that this principle holds (Sintos and Tsaparas, 2014; Oettershagen et al., 2023; Grüttemeier and Komusiewicz, 2020; Grüttemeier and Morawietz, 2020; Adriaens et al., 2020).

Given a graph $G=(V,E)$ (which could represent an observed set of social interactions), a triplet of vertices $(i,j,k)$ is an open wedge centered at $j$ if the vertex pairs $(i,j)$ and $(j,k)$ are edges (i.e., in $E$) while $(i,k)$ is not. The strong triadic closure principle suggests that if such an open wedge exists, then either $(i,j)$ or $(j,k)$ is a weak edge, or else $(i,k)$ is a missing connection that should appear as an edge in $G$ but was not observed when $G$ was constructed. With this principle in mind, Sintos and Tsaparas (Sintos and Tsaparas, 2014) defined the strong triadic closure labeling problem (minSTC), where the goal is to label edges as weak and strong so that every open wedge contains at least one weak edge, and in such a way that the number of edges labeled weak is as small as possible. They showed that the problem is NP-hard but has a 2-approximation algorithm based on a reduction to the Vertex Cover problem. They also considered a variation of the problem that allows for edge additions (minSTC+).

In our paper, we use $\mathcal{W}_{j}$ to denote the set of open wedges centered at $j$ in $G$, and let $\mathcal{W}=\cup_{j\in V}\mathcal{W}_{j}$. We use the term STC-labeling to indicate a labeling of node pairs $(i,j)\in\binom{V}{2}$ that satisfies strong triadic closure in the following sense: for every open wedge $(i,j,k)$ centered at $j$, at least one of the edges $\{(i,j),(j,k)\}$ is labeled weak, or the non-edge $(i,k)\notin E$ is labeled as a missing edge. Such a labeling is encoded by a collection of weak edges denoted $E_{\mathit{weak}}\subseteq E$, as well as a set of missing edges denoted $E_{\mathit{miss}}\subseteq\binom{V}{2}-E$. The minSTC+ problem seeks an STC-labeling $(E_{\mathit{weak}},E_{\mathit{miss}})$ that minimizes $|E_{\mathit{weak}}|+|E_{\mathit{miss}}|$. This can be formally cast as the following ILP:

(3)  $\min\ \sum_{(i,j)\in\binom{V}{2}}z_{ij}$
s.t.  $z_{ij}+z_{jk}+z_{ik}\geq 1$ if $(i,j,k)\in\mathcal{W}_{j}$
$z_{ij}\in\{0,1\}$ for $(i,j)\in\binom{V}{2}$.

If $z_{ij}=1$, this represents the presence of either a weak edge (if $(i,j)\in E$) or a missing edge (if $(i,j)\notin E$). This problem is also NP-hard, but it can be reduced to Vertex Cover in a 3-uniform hypergraph in order to obtain a 3-approximation algorithm.

While the strong triadic closure principle and the resulting edge labeling problems are of independent interest, we are particularly interested in these problems given their relationships with certain clustering objectives. The optimal solution value for minSTC is known to lower bound the cluster deletion objective, and minSTC+ similarly lower bounds cluster editing (Grüttemeier and Komusiewicz, 2020; Grüttemeier and Morawietz, 2020; Konstantinidis et al., 2018; Veldt, 2022). The LP relaxations for these problems therefore provide lower bounds for cluster deletion and cluster editing that are cheaper and easier to compute than the standard linear programming relaxations. Veldt (Veldt, 2022) recently showed how to round these LP relaxations, as well as approximate solutions for minSTC+ and minSTC, to develop faster approximation algorithms for cluster editing and cluster deletion. We generalize these techniques in order to develop faster approximation algorithms for the full parameter regime of LambdaCC.

3. LambdaSTC Labeling

We now introduce a parameterized edge labeling problem called LambdaSTC, which generalizes previous edge labeling problems and can also be used to develop new approximations for LambdaCC.

Problem definition. Given a graph $G=(V,E)$ and a parameter $\lambda$, LambdaSTC is the problem of finding an STC-labeling $(E_{\mathit{weak}},E_{\mathit{miss}})$ that minimizes $(1-\lambda)|E_{\mathit{weak}}|+\lambda|E_{\mathit{miss}}|$. This can be formulated as:

(4)  $\min\ \sum_{(i,j)\in E}(1-\lambda)z_{ij}+\sum_{(i,j)\notin E}\lambda z_{ij}$
s.t.  $z_{ij}+z_{jk}+z_{ik}\geq 1$ for all $(i,j,k)\in\mathcal{W}_{j}$
$z_{ij}\in\{0,1\}$ for all $(i,j)\in\binom{V}{2}$.

We first note that this problem is equivalent to the minSTC+ problem when $\lambda=\frac{1}{2}$. When $\lambda$ is close enough to 1, LambdaSTC is equivalent to minSTC. To see why, note that if $\lambda>|E|(1-\lambda)$, then labeling a single non-edge as "missing" is more expensive than labeling all edges in $E$ as "weak". Since $\lambda>|E|(1-\lambda)$ rearranges to $\lambda>\frac{|E|}{|E|+1}$, we see that when $\lambda>\frac{|E|}{|E|+1}$, the optimal LambdaSTC solution will not place any non-edges in $E_{\mathit{miss}}$, but will only add edges to $E_{\mathit{weak}}$ in order to construct a valid STC-labeling, so this differs from minSTC only by a multiplicative constant factor.

Varying $\lambda$ between $\frac{1}{2}$ and 1 offers us the flexibility to interpolate between minSTC+ and minSTC. Meanwhile, the $\lambda<\frac{1}{2}$ regime corresponds to a new family of edge labeling problems where it is cheaper to label non-edges as missing. If we think of the graph $G=(V,E)$ as a (potentially noisy) representation of some social network observed from the real world, then the parameter $\lambda$ can be chosen based on a user's belief about the accuracy of the process that was used to observe edges. If the user has a strong belief that there are many friendships in the social network that were simply not directly observed (and hence are not included as edges in the graph), then a smaller value of $\lambda$ may be appropriate. If missing edges are unlikely, then a large value of $\lambda$ is appropriate.

Approximating LambdaSTC. Approximation algorithms for minSTC and minSTC+ can be obtained by reducing these problems to unweighted Vertex Cover problems (in graphs and hypergraphs, respectively) (Sintos and Tsaparas, 2014; Grüttemeier and Morawietz, 2020). We generalize this approach and design an approximation algorithm that applies to LambdaSTC for any choice of the parameter λ\lambda, based on the Local-Ratio algorithm for weighted Vertex Cover (Bar-Yehuda and Even, 1985).

Algorithm 1 CoverLabel$(G,\lambda)$
1:  Input: Undirected graph $G=(V,E)$; $\lambda\in(0,1)$
2:  Output: LambdaSTC labeling $\{E_{\mathit{weak}},E_{\mathit{miss}}\}$ of $G$.
3:  $E_{\mathit{miss}},E_{\mathit{weak}}\leftarrow\emptyset$ // Initialize empty sets
4:  $\forall(i,j)\in E$ set $r_{ij}=(1-\lambda)$;  $\forall(i,j)\notin E$ set $r_{ij}=\lambda$
5:  for each open wedge $(i,j,k)\in\mathcal{W}$ (centered at $j$) do
6:     $M=\min\{r_{ij},r_{jk},r_{ik}\}$
7:     $r_{ij}\leftarrow r_{ij}-M$;  $r_{jk}\leftarrow r_{jk}-M$;  $r_{ik}\leftarrow r_{ik}-M$
8:  $E_{\mathit{miss}}=\{(i,j)\notin E\colon r_{ij}=0\}$;  $E_{\mathit{weak}}=\{(i,j)\in E\colon r_{ij}=0\}$
9:  Return $\{E_{\mathit{miss}},E_{\mathit{weak}}\}$

Algorithm 1 gives pseudocode for our method, CoverLabel. This method "covers" all the open wedges in graph $G$ by either adding a missing edge between a pair of non-adjacent nodes or labeling at least one of the two edges as weak. This can be seen as finding a weighted vertex cover in a 3-uniform hypergraph $\mathcal{H}=(\mathcal{V},\mathcal{E})$ constructed as follows:

• Every node pair $(i,j)\in\binom{V}{2}$ is assigned a vertex $v_{ij}$ in $\mathcal{V}$ with a node weight of $(1-\lambda)$ if $(i,j)\in E$ and $\lambda$ otherwise.

• A hyperedge $\{v_{ij},v_{jk},v_{ik}\}\in\mathcal{E}$ is created for each open wedge $(i,j,k)\in\mathcal{W}$.

Nodes in $\mathcal{H}$ correspond to node pairs in $G$, and hyperedges in $\mathcal{H}$ correspond to open wedges in $G$. Therefore, a vertex cover in $\mathcal{H}$ corresponds to a labeling of node pairs in $G$ that "covers" all open wedges in a way that produces an STC-labeling. If a covered vertex is associated with an edge $(i,j)\in E$, we consider $(i,j)$ a weak edge. However, if it corresponds to a non-edge pair $(i,j)\notin E$, this is labeled as a missing edge. This provides an approximation-preserving reduction from LambdaSTC to weighted Vertex Cover in a 3-uniform hypergraph, so employing a 3-approximation algorithm for hypergraph vertex cover results in a 3-approximation for LambdaSTC. CoverLabel is equivalent to implicitly applying the Local-Ratio algorithm (Bar-Yehuda and Even, 1985) to the hypergraph $\mathcal{H}$ described above. By implicitly, we mean that we do not form $\mathcal{H}$ explicitly, but instead apply the mechanics of this algorithm directly to find an STC-labeling of $G$. The following theorem follows from the guarantee of the Local-Ratio algorithm for node-weighted 3-uniform hypergraphs.
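As a concrete reference, the following is a minimal Julia sketch of Algorithm 1 that operates directly on $G$ without forming $\mathcal{H}$; the adjacency-list representation, the pair-ordering helper, and the floating-point tolerance are our own illustrative choices.

# Minimal sketch of CoverLabel (Algorithm 1), applied directly to G without
# forming the hypergraph H. `adj[v]` is the neighbor set of node v.
pr(i, j) = i < j ? (i, j) : (j, i)   # canonical ordering for node pairs

function coverlabel(adj::Vector{Set{Int}}, lambda::Float64)
    n = length(adj)
    r = Dict{Tuple{Int,Int},Float64}()       # leftover budgets r_ij (line 4)
    budget(e) = get!(r, e) do
        e[2] in adj[e[1]] ? 1 - lambda : lambda   # (1-λ) for edges, λ for non-edges
    end
    for j in 1:n                             # enumerate open wedges centered at j
        nbrs = collect(adj[j])
        for a in 1:length(nbrs)-1, b in a+1:length(nbrs)
            i, k = nbrs[a], nbrs[b]
            (k in adj[i]) && continue        # (i,j,k) is a triangle, not an open wedge
            eij, ejk, eik = pr(i, j), pr(j, k), pr(i, k)
            M = min(budget(eij), budget(ejk), budget(eik))
            r[eij] -= M; r[ejk] -= M; r[eik] -= M   # local-ratio step (lines 6-7)
        end
    end
    E_weak = [e for (e, v) in r if v <= 1e-12 && e[2] in adj[e[1]]]
    E_miss = [e for (e, v) in r if v <= 1e-12 && !(e[2] in adj[e[1]])]
    return E_weak, E_miss                    # pairs whose budget was exhausted (line 8)
end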

Theorem 3.1.

Algorithm 1 is a 3-approximation algorithm for the LambdaSTC labeling problem.

New lower bounds for LambdaCC. The LambdaSTC objective lower bounds LambdaCC, and a solution to the LambdaSTC LP yields a new type of lower bound for LambdaCC. To see why, consider the following change of variables: $z_{ij}=x_{ij}$ if $(i,j)\in E$, and $z_{ij}=1-x_{ij}$ otherwise. This gives the following equivalent formulation for the LP relaxation of LambdaSTC:

(5)  $\min\ \sum_{(i,j)\in E}(1-\lambda)x_{ij}+\sum_{(i,j)\notin E}\lambda(1-x_{ij})$
s.t.  $x_{ij}+x_{jk}\geq x_{ik}$ if $(i,j,k)\in\mathcal{W}_{j}$
$0\leq x_{ij}\leq 1$ for all $(i,j)\in\binom{V}{2}$.

This linear program shares the same objective function as the canonical LambdaCC LP relaxation, but has a subset of its $O(n^{3})$ triangle inequality constraints. In particular, it only imposes the constraint $x_{ij}+x_{jk}\geq x_{ik}$ when $(i,j,k)$ is an open wedge, rather than for all triplets of nodes. This makes the LP relaxation easier to solve at a large scale. Furthermore, it is an example of a covering LP, which can be solved much more quickly than a generic LP using the multiplicative weights update method (Fleischer, 2004; Quanrud, 2020; Garg and Khandekar, 2004).
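For reference, the following is a minimal Julia sketch that builds and solves LP (5) with the JuMP modeling package. The solver choice (the open-source HiGHS package) and the function name are illustrative assumptions; our experiments in Section 5 use Gurobi.

using JuMP, HiGHS

# Minimal sketch: build and solve the LambdaSTC LP relaxation (5) with JuMP.
# `edges` is a set of pairs (i, j) with i < j; `wedges` lists open wedges
# (i, j, k) centered at j.
function lambdastc_lp(n, edges, wedges, lambda)
    model = Model(HiGHS.Optimizer)
    pairs = [(i, j) for i in 1:n-1 for j in i+1:n]
    @variable(model, 0 <= x[pairs] <= 1)
    @objective(model, Min,
        sum((p in edges ? (1 - lambda) * x[p] : lambda * (1 - x[p])) for p in pairs))
    pr(i, j) = i < j ? (i, j) : (j, i)
    for (i, j, k) in wedges              # one covering constraint per open wedge
        @constraint(model, x[pr(i, j)] + x[pr(j, k)] >= x[pr(i, k)])
    end
    optimize!(model)
    return Dict(p => value(x[p]) for p in pairs)   # fractional solution (a lower bound)
end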

4. Faster LambdaCC Algorithms

We now present faster algorithms for LambdaCC, using lower bounds derived from the LambdaSTC objective. For a fixed value of $\lambda$, we use the notation $\textbf{STC}_{(\lambda)}$ and $\textbf{CC}_{(\lambda)}$ to represent the optimal solution values for LambdaSTC and LambdaCC, respectively.

4.1. CoverFlipPivot algorithm

We present the first combinatorial algorithm for LambdaCC, called CoverFlipPivot (CFP), which provides a 6-approximation for every $\lambda\geq 1/2$. As outlined in Algorithm 2, CFP comprises three steps:

(1) Cover: Generate a feasible LambdaSTC labeling of $G$ by running the 3-approximate CoverLabel algorithm.

(2) Flip: Flip edges in the original graph $G=(V,E)$ to create a derived graph $\hat{G}=(V,\hat{E})$: delete the 'weak' edges $E_{\mathit{weak}}$ and add the 'missing' edges $E_{\mathit{miss}}$.

(3) Pivot: Run Pivot on the derived graph $\hat{G}$.

Algorithm 2 CoverFlipPivot$(G,\lambda)$
1:  Input: Undirected graph $G=(V,E)$; $\lambda\geq 1/2$
2:  Output: Feasible LambdaCC clustering of $G$.
3:  Cover: $\{E_{\mathit{weak}},E_{\mathit{miss}}\}$ = CoverLabel$(G,\lambda)$
4:  Flip: Construct $\hat{G}=(V,\hat{E})$ where $\hat{E}=E_{\mathit{miss}}\cup(E-E_{\mathit{weak}})$
5:  Return Pivot$(\hat{G})$
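Composing the coverlabel and pivot sketches from the previous sections gives a minimal Julia sketch of CFP; as before, names and representations are our own illustrative choices.

# Minimal sketch of CoverFlipPivot, composing the coverlabel and pivot
# sketches above: delete weak edges, add missing edges, then run Pivot.
function cover_flip_pivot(adj::Vector{Set{Int}}, lambda::Float64)
    E_weak, E_miss = coverlabel(adj, lambda)
    adj_hat = [copy(s) for s in adj]         # derived graph G-hat
    for (i, j) in E_weak                     # E-hat = E_miss ∪ (E - E_weak)
        delete!(adj_hat[i], j); delete!(adj_hat[j], i)
    end
    for (i, j) in E_miss
        push!(adj_hat[i], j); push!(adj_hat[j], i)
    end
    return pivot(adj_hat)
end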

Before proving any approximation guarantees for CFP, we begin with a more general result that sheds light on the relationship between LambdaSTC and LambdaCC. This generalizes previous results showing that the optimal cluster deletion and minSTC objectives (and similarly, the cluster editing and minSTC+ objectives) differ by at most a factor of 2 (Veldt, 2022).

Theorem 4.1.

Given an input graph $G=(V,E)$, a clustering parameter $\lambda\geq 1/2$, and an STC-labeling $(E_{\mathit{weak}},E_{\mathit{miss}})$, running Pivot on the derived graph $\hat{G}=(V,\hat{E})$ with $\hat{E}=E_{\mathit{miss}}\cup(E-E_{\mathit{weak}})$ returns a LambdaCC clustering whose expected cost of disagreements is bounded by $2((1-\lambda)|E_{\mathit{weak}}|+\lambda|E_{\mathit{miss}}|)$.

Proof.

To prove this result, we show that all conditions of Theorem 2.1 are satisfied for $\alpha=2$. Recall that for the LambdaCC framework, weights are defined as:

(6)  $(w_{ij}^{+},w_{ij}^{-})=\begin{cases}(1-\lambda,0)&\text{if }(i,j)\in E\\(0,\lambda)&\text{if }(i,j)\notin E.\end{cases}$

To bound the LambdaCC objective in terms of the STC-labeling, we define budgets based on flipped edges:

(7)  $b_{ij}=\begin{cases}(1-\lambda)&\text{if }(i,j)\in E_{\mathit{weak}}\\\lambda&\text{if }(i,j)\in E_{\mathit{miss}}\\0&\text{otherwise.}\end{cases}$

The sum of budgets can now be written as $\sum_{i<j}b_{ij}=(1-\lambda)|E_{\mathit{weak}}|+\lambda|E_{\mathit{miss}}|$. Now, we show that Condition (1) of Theorem 2.1 is satisfied for $\alpha=2$, by considering four cases:

if $(i,j)\in\hat{E}\cap E$ then $w_{ij}^{-}=0\leq 2b_{ij}$;
if $(i,j)\in\hat{E}$ but $(i,j)\notin E$ then $w_{ij}^{-}=\lambda\leq 2\lambda=2b_{ij}$;
if $(i,j)\notin\hat{E}$ and $(i,j)\notin E$ then $w_{ij}^{+}=0\leq 2b_{ij}$; and
if $(i,j)\notin\hat{E}$ but $(i,j)\in E$ then $w_{ij}^{+}=1-\lambda\leq 2(1-\lambda)=2b_{ij}$.

Next, we check Condition (2) of Theorem 2.1, i.e., we prove that $w_{ij}^{+}+w_{jk}^{+}+w_{ik}^{-}\leq\alpha(b_{ij}+b_{jk}+b_{ik})$ for every open wedge $(i,j,k)$ centered at $j$ in $\hat{G}$. The budgets and weights $\{w_{ij}^{+},w_{jk}^{+},w_{ik}^{-}\}$ depend on which of the pairs $(i,j)$, $(j,k)$, $(i,k)$ are edges in $G$. Table 1 covers all 8 cases for how a triplet of nodes in $G$ could be mapped to an open wedge centered at $j$ in $\hat{G}$. The first column indicates which of the pairs $\{(i,j),(j,k),(i,k)\}$ are edges in $G$; e.g., Y-Y-N ("yes-yes-no") means that $(i,j)$ and $(j,k)$ are in $E$, but $(i,k)$ is not. In each case, we show why $w_{ij}^{+}+w_{jk}^{+}+w_{ik}^{-}$ is bounded above by $2(b_{ij}+b_{jk}+b_{ik})$. By Theorem 2.1, the total cost of mistakes is then bounded by $\alpha\sum_{i<j}b_{ij}=2((1-\lambda)|E_{\mathit{weak}}|+\lambda|E_{\mathit{miss}}|)$.

Table 1. Proof of Theorem 2.1 Condition (2)
Edges in $E$ ($ij$-$jk$-$ik$) | $2(b_{ij}+b_{jk}+b_{ik})$ | $w_{ij}^{+}+w_{jk}^{+}+w_{ik}^{-}$
Y-Y-Y | $2(0+0+1-\lambda)=2(1-\lambda)$ | $2(1-\lambda)$
Y-Y-N | Not applicable | N/A
Y-N-N | $2(0+\lambda+0)=2\lambda$ | $1\leq 2\lambda$
Y-N-Y | $2(0+\lambda+1-\lambda)=2$ | $1-\lambda<2$
N-N-Y | $2(\lambda+\lambda+1-\lambda)=2(1+\lambda)$ | $0<2(1+\lambda)$
N-Y-Y | $2(\lambda+0+1-\lambda)=2$ | $1-\lambda<2$
N-Y-N | $2(\lambda+0+0)=2\lambda$ | $1\leq 2\lambda$
N-N-N | $2(\lambda+\lambda+0)=4\lambda$ | $\lambda<4\lambda$

The second row of Table 1 corresponds to the case where an open wedge in $G$ maps to an open wedge in $\hat{G}$. This is in fact impossible, as it would imply that none of the node pairs in $(i,j,k)$ were flipped even though $(i,j,k)$ is an open wedge in $G$, violating the assumption that $(E_{\mathit{weak}},E_{\mathit{miss}})$ is an STC-labeling. ∎
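As a sanity check on the case analysis in Table 1, the following minimal Julia snippet enumerates every edge pattern and a grid of $\lambda\geq 1/2$ values and verifies Condition (2) numerically; the helper names are our own.

# Minimal numerical check of Table 1: for each edge pattern and a grid of
# lambda >= 1/2, verify w_ij^+ + w_jk^+ + w_ik^- <= 2(b_ij + b_jk + b_ik).
wplus(isedge, lam)  = isedge ? 1 - lam : 0.0     # w^+ from Equation (6)
wminus(isedge, lam) = isedge ? 0.0 : lam         # w^- from Equation (6)
# budget from Equation (7): nonzero only for flipped pairs
bud(isedge, in_Ehat, lam) = isedge == in_Ehat ? 0.0 : (isedge ? 1 - lam : lam)

for lam in 0.5:0.05:1.0, ij in (true, false), jk in (true, false), ik in (true, false)
    (ij && jk && !ik) && continue   # Y-Y-N cannot map to a wedge in G-hat (Table 1)
    # open wedge centered at j in G-hat: (i,j),(j,k) in E-hat and (i,k) not in E-hat
    lhs = wplus(ij, lam) + wplus(jk, lam) + wminus(ik, lam)
    rhs = 2 * (bud(ij, true, lam) + bud(jk, true, lam) + bud(ik, false, lam))
    @assert lhs <= rhs + 1e-12 "Condition (2) fails at pattern ($ij,$jk,$ik), lambda=$lam"
end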

Corollary 4.2.

Let $\mathcal{A}$ be a $\beta$-approximation algorithm for LambdaSTC and fix $\lambda\geq 1/2$. Running the procedure in Theorem 4.1 on the solution $(E_{\mathit{weak}},E_{\mathit{miss}})$ obtained from $\mathcal{A}$ yields a $2\beta$-approximate solution for LambdaCC. Thus, $\textbf{STC}_{(\lambda)}\leq\textbf{CC}_{(\lambda)}\leq 2\,\textbf{STC}_{(\lambda)}$, and Algorithm 2 is a randomized 6-approximation for LambdaCC.

Proof.

The optimal LambdaCC solution provides an upper bound for the optimal LambdaSTC solution, i.e., $\textbf{STC}_{(\lambda)}\leq\textbf{CC}_{(\lambda)}$. Algorithm $\mathcal{A}$ produces a $\beta$-approximate labeling $(E_{\mathit{weak}},E_{\mathit{miss}})$ with LambdaSTC objective value $\text{STC}(\mathcal{A})=(1-\lambda)|E_{\mathit{weak}}|+\lambda|E_{\mathit{miss}}|$, so we have

$\textbf{STC}_{(\lambda)}\leq\text{STC}(\mathcal{A})\leq\beta\,\textbf{STC}_{(\lambda)}\leq\beta\,\textbf{CC}_{(\lambda)}.$

Therefore, $\frac{1}{\beta}\text{STC}(\mathcal{A})$ lower bounds $\textbf{CC}_{(\lambda)}$. Applying Theorem 4.1 to the labeling from algorithm $\mathcal{A}$, we obtain a clustering with LambdaCC objective score $\text{CC}(\mathcal{A})\leq 2((1-\lambda)|E_{\mathit{weak}}|+\lambda|E_{\mathit{miss}}|)=2\,\text{STC}(\mathcal{A})$. Thus,

$\text{CC}(\mathcal{A})\leq 2\cdot\text{STC}(\mathcal{A})\leq 2\cdot\beta\cdot\textbf{CC}_{(\lambda)},$

so $\text{CC}(\mathcal{A})$ is a $2\beta$-approximation for LambdaCC. If $\mathcal{A}$ optimally solves LambdaSTC, then $\beta=1$ and so $\textbf{STC}_{(\lambda)}\leq\textbf{CC}_{(\lambda)}\leq 2\,\textbf{STC}_{(\lambda)}$. If $\mathcal{A}$ is our 3-approximate CoverLabel algorithm for LambdaSTC, then combining it with Theorem 4.1 shows that Algorithm 2 is a $2\cdot 3=6$-approximation for LambdaCC. ∎

Algorithm 2 can be easily derandomized using the deterministic strategy for choosing pivot nodes for Theorem 2.1 (see (van Zuylen and Williamson, 2009)).

4.2. Faster LP algorithm

Algorithm 3 is an approximation algorithm for LambdaCC based on rounding the LambdaSTC LP relaxation. This LP has $|\mathcal{W}|$ constraints, whereas the canonical LP has $O(n^{3})$. In the worst case it is possible that $|\mathcal{W}|=O(n^{3})$, but our experimental results demonstrate that $|\mathcal{W}|$ is far smaller for all of the real-world graphs we consider. Even more significantly, the LambdaSTC LP is a covering LP, which makes it possible to use fast existing techniques for approximating covering LPs. This leads to much faster algorithms, at the expense of an only slightly worse approximation factor since the LP is solved approximately. The next section provides a more detailed runtime analysis.

Our approach for rounding the LambdaSTC LP follows a similar strategy as CFP: it builds a new graph $\hat{G}$ and then runs Pivot. The construction of $\hat{G}$ depends on the LP variables $\{x_{ij}\}$, the edge structure of $G$, and the value of $\lambda$. When $\lambda\geq 1/2$, we always ensure that a non-edge in $G$ maps to a non-edge in $\hat{G}$. For $\lambda<1/2$, we always ensure that an edge in $G$ maps to an edge in $\hat{G}$. We first prove that the algorithm has an approximation factor that ranges from 3 to 5 as $\lambda$ goes from $1/2$ to 1.

Algorithm 3 Rounding the LambdaSTC LP into a clustering
1:  Input: Undirected graph $G=(V,E)$; $\lambda\in(0,1)$
2:  Output: Clustering of $G$.
3:  Solve the LambdaSTC LP (5) and obtain fractional $x_{ij}$ values
4:  Construct $\hat{G}=(V,\hat{E})$ where
$\hat{E}=\begin{cases}\{(i,j):(i,j)\in E\text{ and }x_{ij}<\frac{2\lambda}{7\lambda-2}\}&\text{if }\lambda\geq 1/2\\\{(i,j):(i,j)\in E\text{ or }x_{ij}<\frac{\lambda}{1+\lambda}\}&\text{if }\lambda<1/2\end{cases}$
5:  Return Pivot$(\hat{G})$
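A minimal Julia sketch of the graph construction in line 4 of Algorithm 3 is given below; the node and pair representations are our own illustrative choices, and the fractional values `x` would come from the LP solve in line 3.

# Minimal sketch of line 4 of Algorithm 3: build E-hat from the fractional
# LP values `x` (a Dict over pairs (i, j), i < j), per the lambda regime.
function build_ghat(n::Int, edges::Set{Tuple{Int,Int}}, x::Dict{Tuple{Int,Int},Float64}, lambda::Float64)
    adj_hat = [Set{Int}() for _ in 1:n]
    addedge!(i, j) = (push!(adj_hat[i], j); push!(adj_hat[j], i))
    for i in 1:n-1, j in i+1:n
        isedge = (i, j) in edges
        if lambda >= 1/2     # keep only edges of G with small LP distance
            isedge && x[(i, j)] < 2lambda / (7lambda - 2) && addedge!(i, j)
        else                 # keep all edges of G, plus sufficiently close non-edges
            (isedge || x[(i, j)] < lambda / (1 + lambda)) && addedge!(i, j)
        end
    end
    return adj_hat           # run pivot(adj_hat) to finish Algorithm 3
end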
Theorem 4.3.

When $\lambda\geq 1/2$, Algorithm 3 is a randomized $(7-\frac{2}{\lambda})$-approximation algorithm for LambdaCC.

Proof.

To prove Theorem 4.3, we show that the conditions in Theorem 2.1 are satisfied for $\alpha=7-\frac{2}{\lambda}$. In our analysis, we define budgets $b_{ij}$ for each distinct pair of nodes $(i,j)$ based on the LP objective (5). Specifically, we set $b_{ij}=(1-\lambda)x_{ij}$ if $(i,j)\in E$, and $b_{ij}=\lambda(1-x_{ij})$ if $(i,j)\notin E$. We begin by checking Condition (1) in Theorem 2.1 for each distinct pair of nodes $(i,j)$, i.e.,

(8)  $w_{ij}^{-}\leq\alpha b_{ij}$ for all $(i,j)\in\hat{E}$
(9)  $w_{ij}^{+}\leq\alpha b_{ij}$ for all $(i,j)\notin\hat{E}$.

If $(i,j)\in\hat{E}\cap E$, then $w_{ij}^{-}=0$, so Condition (8) holds. Similarly, when $(i,j)\notin\hat{E}$ and $(i,j)\notin E$, then $w_{ij}^{+}=0$, trivially satisfying Condition (9). By the construction of $\hat{G}$, it is impossible to have $(i,j)\in\hat{E}$ if $(i,j)\notin E$. So the last case to consider is when $(i,j)\notin\hat{E}$ and $(i,j)\in E$, in which case $x_{ij}\geq\frac{2\lambda}{7\lambda-2}$, so

$w_{ij}^{+}=(1-\lambda)<\Big(7-\frac{2}{\lambda}\Big)(1-\lambda)\Big(\frac{2\lambda}{7\lambda-2}\Big)\leq\alpha(1-\lambda)x_{ij}=\alpha b_{ij}.$

Next, for every triplet $(i,j,k)$ such that $(i,j)\in\hat{E}$, $(j,k)\in\hat{E}$, and $(i,k)\notin\hat{E}$, we need to check that

(10)  $w_{ij}^{+}+w_{jk}^{+}+w_{ik}^{-}\leq\alpha(b_{ij}+b_{jk}+b_{ik}).$

By our construction of $\hat{G}$, if $(i,j)\in\hat{E}$ and $(j,k)\in\hat{E}$, then $(i,j)$ and $(j,k)$ are both edges in $G$ as well. Since $(i,k)\notin\hat{E}$, the pair $(i,k)$ may or may not be an edge in $G$, so we prove (10) by considering two cases.

Case 1: $(i,k)\notin E$. Here we have $(b_{ij},b_{jk},b_{ik})=((1-\lambda)x_{ij},(1-\lambda)x_{jk},\lambda(1-x_{ik}))$ and $(w_{ij}^{+},w_{jk}^{+},w_{ik}^{-})=(1-\lambda,1-\lambda,\lambda)$. Using the open wedge inequality $x_{ik}\leq x_{ij}+x_{jk}$ and the fact that both $x_{ij},x_{jk}<\frac{2\lambda}{7\lambda-2}$, we know $x_{ik}<\frac{4\lambda}{7\lambda-2}$. Therefore,

$\alpha(b_{ij}+b_{jk}+b_{ik})=\alpha((1-\lambda)(x_{ij}+x_{jk})+\lambda(1-x_{ik}))$
$\geq\alpha((1-\lambda)x_{ik}+\lambda-\lambda x_{ik})$
$=\alpha((1-2\lambda)x_{ik}+\lambda)$
$\geq\Big(\frac{7\lambda-2}{\lambda}\Big)\Big((1-2\lambda)\Big(\frac{4\lambda}{7\lambda-2}\Big)+\lambda\Big)$
$=2-\lambda=w^{+}_{ij}+w_{jk}^{+}+w^{-}_{ik}.$

(The second-to-last inequality uses the fact that $1-2\lambda\leq 0$ when $\lambda\geq 1/2$, so the upper bound on $x_{ik}$ can be substituted.)

Case 2: $(i,k)\in E$. In this case, $(b_{ij},b_{jk},b_{ik})=((1-\lambda)x_{ij},(1-\lambda)x_{jk},(1-\lambda)x_{ik})$ and $(w_{ij}^{+},w_{jk}^{+},w_{ik}^{-})=(1-\lambda,1-\lambda,0)$. Since $(i,k)\in E$ but $(i,k)\notin\hat{E}$, we have $x_{ik}\geq\frac{2\lambda}{7\lambda-2}$. Then,

$\alpha(b_{ij}+b_{jk}+b_{ik})=\alpha(1-\lambda)(x_{ij}+x_{jk}+x_{ik})$
$\geq\frac{7\lambda-2}{\lambda}\Big((1-\lambda)\frac{2\lambda}{7\lambda-2}\Big)$
$=2(1-\lambda)=w^{+}_{ij}+w_{jk}^{+}+w^{-}_{ik}.$ ∎

Gleich et al. (Gleich et al., 2018) showed that for small enough $\lambda$, the canonical LambdaCC LP relaxation has an $O(\log n)$ integrality gap, but showed how to round that LP to obtain a $\frac{1}{\lambda}$-approximation, which is better than $O(\log n)$ for all $\lambda=\omega(1/\log n)$. These previous results rule out the possibility of obtaining an approximation better than $O(\log n)$ for arbitrarily small $\lambda$ by rounding the (looser) LambdaSTC LP relaxation. However, we can show that Algorithm 3 still provides a $(1+1/\lambda)$-approximation, which is very close to the approximation factor obtained by Gleich et al. (Gleich et al., 2018) by rounding a much more expensive LP.

Theorem 4.4.

When $\lambda<1/2$, Algorithm 3 is a randomized $(1+1/\lambda)$-approximation algorithm for LambdaCC.

Proof.

We prove the result by showing that the conditions of Theorem 2.1 are satisfied with $\alpha=(1+\lambda)/\lambda$. Condition (1) of this theorem is easy to satisfy for a node pair $(i,j)$ if $(i,j)\in E\cap\hat{E}$, since $w_{ij}^{-}=0$. It is similarly easy to satisfy if $(i,j)\notin E$ and $(i,j)\notin\hat{E}$, since then $w_{ij}^{+}=0$. The construction of $\hat{G}$ ensures it is impossible to have $(i,j)\in E$ with $(i,j)\notin\hat{E}$. If $(i,j)\notin E$ but $(i,j)\in\hat{E}$, then we know $w_{ij}^{-}=\lambda$, $b_{ij}=\lambda(1-x_{ij})$, and $x_{ij}<\frac{\lambda}{1+\lambda}$. Thus,

$\alpha b_{ij}=\Big(\frac{1+\lambda}{\lambda}\Big)\lambda(1-x_{ij})>(1+\lambda)\Big(1-\frac{\lambda}{1+\lambda}\Big)=1>\lambda=w_{ij}^{-},$

which proves Condition (1) of Theorem 2.1.

Next, we prove Condition (2) for every triplet $\{i,j,k\}$ that defines an open wedge centered at $j$ in $\hat{G}$, i.e., $(i,j)\in\hat{E}$, $(j,k)\in\hat{E}$, and $(i,k)\notin\hat{E}$. We know that $(i,k)\notin E$ and $x_{ik}\geq\frac{\lambda}{1+\lambda}$, or else by the construction of $\hat{G}$ we would have $(i,k)\in\hat{E}$. The pairs $(i,j)$ and $(j,k)$ may or may not be edges in $G$, so we separately consider four cases in proving $w_{ij}^{+}+w_{jk}^{+}+w_{ik}^{-}\leq\alpha(b_{ij}+b_{jk}+b_{ik})$.

Case 1: When $(i,j)\in E$ and $(j,k)\in E$, we have $(b_{ij},b_{jk},b_{ik})=((1-\lambda)x_{ij},(1-\lambda)x_{jk},\lambda(1-x_{ik}))$. The triplet $(i,j,k)$ is also an open wedge in $G$, so we have

$\alpha(b_{ij}+b_{jk}+b_{ik})=\alpha((1-\lambda)(x_{ij}+x_{jk})+\lambda(1-x_{ik}))$
$\geq\alpha((1-\lambda)x_{ik}+\lambda-\lambda x_{ik})$
$=\alpha((1-2\lambda)x_{ik}+\lambda)$
$\geq\Big(\frac{1+\lambda}{\lambda}\Big)\Big((1-2\lambda)\Big(\frac{\lambda}{1+\lambda}\Big)+\lambda\Big)$
$=2-\lambda=w^{+}_{ij}+w_{jk}^{+}+w^{-}_{ik}.$

Case 2: When $(i,j)\in E$ and $(j,k)\notin E$, we have $(b_{ij},b_{jk},b_{ik})=((1-\lambda)x_{ij},\lambda(1-x_{jk}),\lambda(1-x_{ik}))$, and we know $x_{jk}<\frac{\lambda}{1+\lambda}$ implies $(1-x_{jk})>1-\frac{\lambda}{1+\lambda}=\frac{1}{1+\lambda}$. Then,

$\alpha(b_{ij}+b_{jk}+b_{ik})\geq\alpha b_{jk}=\alpha\lambda(1-x_{jk})\geq\Big(\frac{1+\lambda}{\lambda}\Big)\lambda\Big(\frac{1}{1+\lambda}\Big)=1=w_{ij}^{+}+w_{jk}^{+}+w_{ik}^{-}.$

Case 3: When $(i,j)\notin E$ and $(j,k)\in E$, the analysis is symmetric to Case 2.

Case 4: When $(i,j)\notin E$ and $(j,k)\notin E$, we have $(b_{ij},b_{jk},b_{ik})=(\lambda(1-x_{ij}),\lambda(1-x_{jk}),\lambda(1-x_{ik}))$, and both $x_{ij}$ and $x_{jk}$ are strictly less than $\frac{\lambda}{1+\lambda}$, so

$\alpha(b_{ij}+b_{jk}+b_{ik})=\alpha(\lambda(1-x_{ij})+\lambda(1-x_{jk})+\lambda(1-x_{ik}))$
$>\alpha\cdot 2\lambda\Big(1-\frac{\lambda}{1+\lambda}\Big)=2\geq\lambda=w_{ij}^{+}+w_{jk}^{+}+w_{ik}^{-}.$ ∎

4.3. A 3-approximation via an intermediate LP

Veldt et al. (Veldt et al., 2018, 2017) originally presented a 3-approximation algorithm for $\lambda\geq 1/2$ based on the canonical LP relaxation. This algorithm, however, comes with an $O(n^{3})$-size constraint matrix, since triangle inequality constraints are included for all triplets of nodes $(i,j,k)\in\binom{V}{3}$. In contrast, in the previous sections we proposed faster approximation algorithms based on the LambdaSTC objective, whose LP relaxation includes a triangle inequality constraint $x_{ik}\leq x_{jk}+x_{ij}$ only when $(i,j,k)$ is an open wedge (centered at $j$) in $G$. In this section, we show how to obtain a 3-approximation by rounding an LP relaxation whose constraint set lies somewhere between those of the LambdaSTC and canonical LambdaCC LP relaxations. In more detail, we include a triangle inequality constraint for every wedge in $G$ as well as for every triangle in $G$. This is a superset of the constraint set for the LambdaSTC LP relaxation but does not include a triangle inequality constraint for every triplet $\{i,j,k\}$. Formally, this LP relaxation is given by

(11)  $\min\ \sum_{(i,j)\in E}(1-\lambda)x_{ij}+\sum_{(i,j)\notin E}\lambda(1-x_{ij})$
s.t.  $x_{ij}+x_{jk}\geq x_{ik}$ if $(i,j,k)\in\mathcal{W}_{j}$ or $(i,j,k)\in\mathcal{T}_{j}$
$0\leq x_{ij}\leq 1$ for all $(i,j)\in\binom{V}{2}$

where $\mathcal{T}_{j}$ denotes the set of triangles that include node $j$ as one of their vertices. Algorithm 4 uses the same rounding strategy that was used for the canonical LP relaxation (Veldt et al., 2018), except that it is applied to the LP in (11) rather than the canonical LP.

Algorithm 4 Rounding LP (11) into a clustering
1:  Input: Undirected graph $G=(V,E)$; $\lambda\geq 1/2$
2:  Output: Clustering of $G$.
3:  Solve LP (11) and obtain fractional $x_{ij}$ values
4:  Construct $\hat{G}=(V,\hat{E})$ where $\hat{E}=\{(i,j):x_{ij}<1/3\}$
5:  Return Pivot$(\hat{G})$

The LP relaxation presented here has a constraint set whose size is determined by the number of wedges and triangles, $|\mathcal{W}|+|\mathcal{T}|$, in the graph. While both $|\mathcal{W}|$ and $|\mathcal{T}|$ can be $O(n^{3})$ in the worst case, this is not typical in practice: in real-world networks, the number of wedges and triangles is significantly smaller. Figure 1 illustrates this observation by comparing the number of constraints in the intermediate LP (11) against the canonical LP. Thus, solving and rounding this LP is more efficient than existing techniques, and we now prove that this can be done without a loss in the approximation factor.

Theorem 4.5.

Algorithm 4 is a randomized 3-approximation algorithm for LambdaCC when $\lambda\geq 1/2$.

Proof.

We can prove that Algorithm 4 satisfies Theorem 2.1 by making slight modifications to the proof of Theorem 6 in the work of Veldt et al. (Veldt et al., 2017). Condition (1) of Theorem 2.1 is satisfied following the proof as given in (Veldt et al., 2017). To prove Condition (2), we demonstrate that

(12)  $w_{ij}^{+}+w_{jk}^{+}+w_{ik}^{-}\leq\alpha(b_{ij}+b_{jk}+b_{ik})$

for every triplet of nodes $(i,j,k)$ that is mapped to an open wedge centered at $j$ in $\hat{G}$. This means that $x_{ij},x_{jk}<1/3$ and $x_{ik}\geq 1/3$. Note that we are only able to apply the triangle inequality $x_{ik}\leq x_{ij}+x_{jk}$ if $(i,j,k)$ is also an open wedge or a triangle in the original graph $G$. Given an arbitrary triplet $(i,j,k)$ that maps to an open wedge in $\hat{G}$, there are 8 possibilities for the edge structure in $G$, depending on which pairs of nodes in $(i,j,k)$ share an edge in $G$. Following Veldt et al. (Veldt et al., 2017), we can prove inequality (12) for each of the 8 cases separately. Note that we do not need to update the analysis for cases where $(i,j,k)$ is an open wedge or a triangle in $G$, since our new LP (11) includes triangle inequality constraints for these cases. This means that the following cases from the analysis of Veldt et al. (Veldt et al., 2017) remain unchanged:

• Case 1: $(i,j,k)$ forms a wedge centered at $j$ in $G$.

• Cases 5 and 6: $(i,j,k)$ forms a wedge centered at either $i$ or $k$.

• Case 8: $(i,j,k)$ forms a triangle.

For Case 7, where $(i,k)\in E$, $(i,j)\notin E$, and $(j,k)\notin E$, the proof is trivial since $w_{ij}^{+}+w_{jk}^{+}+w_{ik}^{-}=0$. We update the proof for the remaining cases as follows:

Case 2: When $(i,j)\in E$, $(j,k)\notin E$, $(i,k)\notin E$, we have $(b_{ij},b_{jk},b_{ik})=((1-\lambda)x_{ij},\lambda(1-x_{jk}),\lambda(1-x_{ik}))$ and $(w_{ij}^{+},w_{jk}^{+},w_{ik}^{-})=(1-\lambda,0,\lambda)$. Thus,

$\alpha(b_{ij}+b_{jk}+b_{ik})=\alpha((1-\lambda)x_{ij}+\lambda(1-x_{jk})+\lambda(1-x_{ik}))$
$\geq 3(\lambda-\lambda x_{jk})>3(\lambda-\lambda/3)$
$=2\lambda\geq 1=w_{ij}^{+}+w_{jk}^{+}+w_{ik}^{-}.$

Case 3: When $(i,j)\notin E$, $(j,k)\in E$, $(i,k)\notin E$, the case is symmetric to Case 2 and the same result holds.

Case 4: When $(i,j)\notin E$, $(j,k)\notin E$, $(i,k)\notin E$, we have $(b_{ij},b_{jk},b_{ik})=(\lambda(1-x_{ij}),\lambda(1-x_{jk}),\lambda(1-x_{ik}))$ and $(w_{ij}^{+},w_{jk}^{+},w_{ik}^{-})=(0,0,\lambda)$. Then,

$\alpha(b_{ij}+b_{jk}+b_{ik})=\alpha(\lambda(1-x_{ij})+\lambda(1-x_{jk})+\lambda(1-x_{ik}))$
$>3\lambda(2/3+2/3)=3\lambda(4/3)=4\lambda>\lambda=w_{ij}^{+}+w_{jk}^{+}+w_{ik}^{-}.$

Therefore, considering all the cases, we conclude that Theorem 2.1 holds with $\alpha=3$, which yields the 3-approximation guarantee. ∎

4.4. Runtime Analysis

For a graph $G=(V,E)$, let $m=|E|$ and $n=|V|$. When written in the form $\min_{\mathbf{A}\mathbf{x}=\mathbf{b}}\mathbf{c}^{T}\mathbf{x}$, the canonical LP relaxation for LambdaCC has $O(n^{3})$ constraints and variables. Even using recent theoretical algorithms for solving linear programs in matrix multiplication time (Cohen et al., 2021; Jiang et al., 2021), the runtime is $\Omega((n^{3})^{\omega})$, where $\omega\geq 2$ is the matrix multiplication exponent, so solving and rounding the canonical relaxation takes $\Omega(n^{6})$ time. Not only is this runtime prohibitively expensive, but in practice even forming such a large constraint matrix can lead to memory issues that make the approach infeasible at scale. Thus, although it provides the best theoretical approximation factor, it is not scalable.
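For reference, the canonical relaxation has the standard correlation-clustering form, shown here schematically with the unit LambdaCC weights used in the proof of Theorem 4.5 ($w^{+}_{ij}=1-\lambda$ for edges and $w^{-}_{ij}=\lambda$ for non-edges):

$$\min_{x}\ \sum_{i<j}\Big(w^{+}_{ij}\,x_{ij}+w^{-}_{ij}\,(1-x_{ij})\Big)\quad\text{s.t.}\quad x_{ik}\leq x_{ij}+x_{jk}\ \text{ for all triples }(i,j,k),\qquad 0\leq x_{ij}\leq 1.$$

The $\Theta(n^{3})$ triangle inequalities, one for each ordered choice of pairs within each triple, are exactly the source of the cubic constraint count; our relaxations instead enforce these inequalities only at wedges.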

Our new approximation algorithms come with good approximation guarantees and are much faster than solving the canonical relaxation, both in theory and in practice. Finding the open wedges of $G$ can be done in $O(\sum_{v\in V}d_{v}^{2})$ time by visiting each node and then visiting each pair of neighbors of that node in turn. This runtime is upper bounded by $O(mn)$. When applying the randomized Pivot algorithm, this is in fact the most expensive part of CFP, so the overall runtime for CFP is $O(\sum_{v\in V}d_{v}^{2})=O(mn)$. If we use the deterministic pivoting strategy of van Zuylen and Williamson (van Zuylen and Williamson, 2009), this step can be implemented in $O(n^{3})$ time, which is then the runtime for a derandomized version of CFP.
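As an illustration, here is a minimal Julia sketch of this enumeration, assuming the graph is stored as adjacency lists `adj` together with a set `edges` of sorted node pairs for constant-time adjacency queries; the identifiers are hypothetical, not those of our released code:

```julia
# Enumerate all open wedges (i, j, k) centered at j: nodes i and k are both
# neighbors of j but are not adjacent to each other. Visiting every pair of
# neighbors of every node costs O(sum_v d_v^2) overall.
function open_wedges(adj::Vector{Vector{Int}}, edges::Set{Tuple{Int,Int}})
    wedges = Tuple{Int,Int,Int}[]
    for j in eachindex(adj)
        nbrs = adj[j]
        for a in 1:length(nbrs), b in (a + 1):length(nbrs)
            i, k = minmax(nbrs[a], nbrs[b])
            if (i, k) ∉ edges              # neighbors of j that share no edge
                push!(wedges, (i, j, k))   # open wedge centered at j
            end
        end
    end
    return wedges
end
```

Closed wedges (triangles) can be collected in the same pass by flipping the membership test.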

The LambdaSTC LP is a covering LP, so for $\varepsilon>0$ we can find a $(1+\varepsilon)$-approximate solution in $\tilde{O}(\frac{1}{\varepsilon^{2}}|\mathcal{W}|)$ time using the multiplicative weights update method (Quanrud, 2020; Garg and Khandekar, 2004; Fleischer, 2004), where $\tilde{O}$ suppresses logarithmic factors. This assumes we already know $\mathcal{W}$; factoring in the time it takes to find all open wedges, the runtime is $\tilde{O}(\frac{1}{\varepsilon^{2}}|\mathcal{W}|+\sum_{v\in V}d_{v}^{2})$. A minor alteration to our analysis shows that a $(1+\varepsilon)$-approximate solution to the LP translates to approximation factors that are a factor $(1+\varepsilon)$ larger. Once again, applying the deterministic pivot selection adds $O(n^{3})$ to the runtime, which is still far better than $\Omega(n^{6})$.
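For context, a covering LP has the generic form

$$\min\ \mathbf{c}^{T}\mathbf{x}\quad\text{subject to}\quad \mathbf{A}\mathbf{x}\geq\mathbf{1},\ \ \mathbf{x}\geq\mathbf{0},$$

with all entries of $\mathbf{A}$ and $\mathbf{c}$ nonnegative. Each open wedge contributes one covering row, which is what makes the fast multiplicative-weights solvers cited above applicable.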

5. Experiments

This section presents an empirical evaluation of our algorithms. We conduct experiments on publicly available datasets from various domains, including the SNAP (Leskovec and Sosič, 2016) and Facebook100 (Traud et al., 2012) datasets, which are available in the SuiteSparse matrix collection (Davis and Hu, 2011). We implement the algorithms in the Julia programming language and run all experiments on a Dell XPS machine with 16 GB RAM and an Intel Core i7 processor. Both the canonical and the LambdaSTC LP relaxations are solved using Gurobi optimization software (Gurobi Optimization, 2021). We focus here on finding exact solutions for the LambdaSTC LP relaxation using existing optimization software; this is already far more scalable than forming the constraint matrix for the canonical LP relaxation and solving it with Gurobi. Finding faster approximate solutions for the LP using the multiplicative weights update method is a promising direction for future research, but is beyond the scope of the current paper. Code for our implementations and experiments is available at https://github.com/Vedangi/FastLamCC.
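To give a flavor of the setup, the following is a schematic JuMP model for an STC-style covering LP solved with Gurobi. We stress that this is a sketch under simplifying assumptions: it uses unit costs and the basic open-wedge covering constraint $x_{ij}+x_{jk}\geq 1$ of the unweighted STC LP (Veldt, 2022), not the exact weighted form of our LambdaSTC LP, and all identifiers are hypothetical.

```julia
using JuMP, Gurobi

# Schematic STC-style covering LP: one variable per edge and one covering
# constraint per open wedge (i, j, k) centered at j. `edge_list` holds
# sorted node pairs; `wedges` could come from an enumeration pass like the
# one sketched in Section 4.4.
function solve_stc_lp(edge_list::Vector{Tuple{Int,Int}},
                      wedges::Vector{Tuple{Int,Int,Int}})
    idx = Dict(e => t for (t, e) in enumerate(edge_list))
    model = Model(Gurobi.Optimizer)
    @variable(model, 0 <= x[1:length(edge_list)] <= 1)
    # Each open wedge must have at least one of its two edges labeled weak.
    for (i, j, k) in wedges
        @constraint(model, x[idx[minmax(i, j)]] + x[idx[minmax(j, k)]] >= 1)
    end
    @objective(model, Min, sum(x))
    optimize!(model)
    return value.(x), objective_value(model)
end
```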

5.1. Approximation algorithms for LambdaCC

A natural question is how well our approximation algorithms compare against previous algorithms for LambdaCC based on the canonical LP relaxation. It is worth noting first of all that even forming the full constraint matrix for the canonical LP (which has $n(n-1)(n-2)/2=O(n^{3})$ triangle inequality constraints) is infeasible for even modest-sized graphs due to memory constraints. Meanwhile, the LambdaSTC LP relaxation has one triangle inequality constraint for each open wedge. Although there exist graphs with $|\mathcal{W}|=O(n^{3})$, this is not the case in practice. Figure 1 plots $|\mathcal{W}|$ for all of the Facebook100 networks, as well as for a range of graphs of different classes from SNAP (e.g., social networks, citation networks, web networks, etc.). In all cases $|\mathcal{W}|$ is orders of magnitude smaller than $n(n-1)(n-2)/2$, illustrating that solving and rounding this LP is far more practical than using existing LP-based techniques. We also plot the number of constraints in the intermediate LP relaxation from Section 4.3, showing that it has only a slight increase in constraint size over the LambdaSTC LP.

An additional reason to use the LambdaSTC relaxation is that in practice, solving the LambdaSTC relaxation often also solves the canonical LP relaxation. This can be checked by seeing whether the optimal LP variables for the LambdaSTC relaxation are also feasible for the canonical LP. (This can also be viewed as the first step in a more memory-efficient approach for solving correlation clustering LP relaxations that has been applied elsewhere (Veldt et al., 2019b; Veldt, 2022): solve the LP over a subset of the constraints and iteratively add more constraints until the variables satisfy all triangle inequalities. Our results indicate that for these graphs and $\lambda$ values, enforcing triangle inequality constraints just at open wedges is sufficient.) Table 2 shows results for solving and rounding the LambdaSTC LP (Algorithm 3) on three graphs for a range of different $\lambda$ values. The graphs are Simmons81 (a social network), ca-GrQc (a collaboration network), and Polblogs (a political blog network). We attempted to form the full canonical LP relaxation for these graphs and solve it, but quickly ran out of memory. We were able to form and solve the LambdaSTC LP relaxation, and in almost all cases the optimal solution variables for this LP were certified as being feasible (and hence optimal) for the canonical LP. In practice, our LP-based algorithm thus far exceeded its theoretical guarantees, producing solutions within a factor of 2 of the LP lower bound. When rounding, we applied both our new approach (Algorithm 3) and the existing rounding strategy for the canonical LP relaxation, since the rounding step is very fast. We used the previous rounding strategy for the canonical LP whenever we could certify we had solved the canonical LP (since this has an improved a priori guarantee). In practice, though, the results for the two rounding strategies were nearly indistinguishable.
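The feasibility check mentioned above is simple to implement. A minimal sketch, assuming the LP solution is stored as a symmetric matrix `X` of pairwise distance variables (hypothetical names):

```julia
# Check whether the LambdaSTC LP solution satisfies every triangle
# inequality x[i,k] <= x[i,j] + x[j,k] of the canonical LP. This touches
# all O(n^3) triples but needs no memory beyond the n-by-n matrix itself,
# unlike forming the canonical constraint matrix explicitly.
function satisfies_canonical_lp(X::AbstractMatrix; tol::Float64=1e-8)
    n = size(X, 1)
    for j in 1:n, i in 1:n, k in 1:n
        if i != j && j != k && i != k && X[i, k] > X[i, j] + X[j, k] + tol
            return false
        end
    end
    return true
end
```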

Table 2 also displays results for CFP, showing that it is orders of magnitude faster still than solving and rounding the LambdaSTC LP relaxation, while producing comparable approximation ratios (the ratio between the clustering objective and the computed lower bound). While solving our LP relaxation takes up to hundreds of seconds on the three graphs, CFP takes mere fractions of a second.

Table 2. Results for CFP and rounding the LambdaSTC LP relaxation on three graphs. An asterisk indicates when solving the LambdaSTC relaxation produced the optimal solution for the canonical LP.

| Graph | λ | Lower bound (CFP) | Lower bound (LambdaSTC) | Clustering score (CFP) | Clustering score (LambdaSTC) | Ratio (CFP) | Ratio (LambdaSTC) | Runtime s (CFP) | Runtime s (LambdaSTC) |
|---|---|---|---|---|---|---|---|---|---|
| ca-GrQc (n = 5242, m = 14484) | 0.4 | 2668 | 2889.7 | 6043 ±68 | 4611 ±47 | 2.3 ±0.026 | 1.6 ±0.016 | 0.34 ±0.096 | 38.3 ±0.018 |
| | 0.55 | 2064 | 2236.5 | 4092 ±33 | 3708 ±0 | 2.0 ±0.016 | 1.7 ±0 | 0.061 ±0.0084 | 29.7 ±0.0049 |
| | 0.75 | 1179 | 1278.2 | 2373 ±20 | 2118 ±1 | 2.0 ±0.017 | 1.7 ±0.0004 | 0.058 ±0.0079 | 27.7 ±0.006 |
| | 0.95 | 239 | 259.3 | 469 ±3 | 430 ±0 | 2.0 ±0.011 | 1.7 ±0 | 0.055 ±0.0083 | 25.4 ±0.0079 |
| Simmons81 (n = 1518, m = 32988) | 0.4 | 9823 | 9893.8 | 21569 ±72 | 20674 ±110 | 2.2 ±0.0073 | 2.1 ±0.011 | 0.48 ±0.1 | 3064.3 ±0.028 |
| | 0.55 | 7392 | 7420.5 | 15797 ±52 | 14839 ±0 | 2.1 ±0.0071 | 2.0 ±0.0 | 0.25 ±0.0075 | 2935.4 ±0.0067 |
| | 0.75 | 4113 | 4122.5 | 8657 ±18 | 8244 ±0 | 2.1 ±0.0044 | 2.0 ±0.0 | 0.12 ±0.007 | 619.7 ±0.0023 |
| | 0.95 | 822 | 824.5 | 1646 ±0 | 1649 ±0 | 2.0 ±0.0005 | 2.0 ±0 | 0.098 ±0.0065 | 464.8 ±0.0028 |
| Polblogs (n = 1222, m = 16714) | 0.4 | 4960 | 5013.1 | 10591 ±59 | 9997 ±120 | 2.1 ±0.012 | 2.0 ±0.024 | 0.49 ±0.13 | 244.4 ±0.0018 |
| | 0.55 | 3745 | 3760.2 | 7883 ±23 | 7517 ±0 | 2.1 ±0.0062 | 2.0 ±0.0 | 0.21 ±0.0074 | 217.4 ±0.00035 |
| | 0.75 | 2084 | 2089.0 | 4377 ±16 | 4177 ±0 | 2.1 ±0.0075 | 2.0 ±0.0 | 0.071 ±0.01 | 187.9 ±0.0079 |
| | 0.95 | 417 | 417.8 | 837 ±0 | 835 ±0 | 2.0 ±0 | 2.0 ±0 | 0.052 ±0.0089 | 114.2 ±0.0024 |
Figure 1. Comparing the number of constraints in the canonical LP against the number of constraints in the LambdaSTC LP and the intermediate LP (from Section 4.3) for graphs from Facebook100 (left) and SNAP datasets (right). Each dot represents the number of open-wedge constraints for a graph, while each star represents the number of constraints for the intermediate LP. We use the same SNAP graphs as Veldt (Veldt, 2022), color-coded by type (location-based social, other social, web, communication, road, product, collaboration, and citation networks).

5.2. Combining CFP with Fast Heuristics

The Louvain method is a widely used heuristic clustering technique that greedily moves nodes in order to improve a clustering objective (Blondel et al., 2008). The original Louvain method was designed for maximum modularity clustering, but many variations have since been developed. These include a fast heuristic called LambdaLouvain (Veldt et al., 2018), which greedily optimizes the LambdaCC objective for a given parameter $\lambda$, as well as a parallel version of this method (Shi et al., 2021). Although these methods are fast and perform well in practice, they compute no lower bounds for the LambdaCC objective and provide no approximation guarantees. One benefit of our algorithms is that they come with lower bounds that can be used not only to design faster approximation algorithms for LambdaCC, but also to obtain a posteriori guarantees for other heuristic methods.
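To make the style of these heuristics concrete, below is a deliberately naive single-pass sketch of greedy node moves for the LambdaCC disagreement objective (a cut edge costs $1-\lambda$; a non-edge inside a cluster costs $\lambda$). This is a hypothetical illustration that recomputes move costs from scratch; it is not the LambdaLouvain implementation, which uses much faster incremental gain updates and cluster aggregation.

```julia
# Cost of placing node v in cluster c with all other labels fixed: each
# neighbor of v outside c contributes a cut-edge penalty of (1 - λ), and
# each non-neighbor inside c contributes a missing-edge penalty of λ.
function move_cost(v::Int, c::Int, label::Vector{Int},
                   adj::Vector{Vector{Int}}, λ::Float64)
    nbrs = Set(adj[v])
    cost = 0.0
    for u in eachindex(label)
        u == v && continue
        if u in nbrs
            cost += (label[u] == c) ? 0.0 : (1 - λ)
        elseif label[u] == c
            cost += λ
        end
    end
    return cost
end

# One greedy pass: move each node to the best cluster among its neighbors'
# clusters (or a fresh singleton) whenever that strictly lowers its cost.
function greedy_pass!(label::Vector{Int}, adj::Vector{Vector{Int}}, λ::Float64)
    for v in eachindex(label)
        best_c, best = label[v], move_cost(v, label[v], label, adj, λ)
        candidates = vcat([label[u] for u in adj[v]], maximum(label) + 1)
        for c in unique(candidates)
            cost = move_cost(v, c, label, adj, λ)
            cost < best && ((best_c, best) = (c, cost))
        end
        label[v] = best_c
    end
    return label
end
```

Since moving a single node only affects pairs incident to it, the change in the global objective equals the change in this local cost, which is why such local-move heuristics are effective in practice.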

Figures 2 and 3 show the combined results of LambdaLouvain with CFP lower bounds on graphs even larger than those considered in Table 2. These results demonstrate better a posteriori approximation ratios (clustering objective divided by the CFP lower bound) than running CFP by itself. Notably, as $\lambda\rightarrow 1$, the difference in approximation factors between CFP and LambdaLouvain shrinks and the two methods converge toward similar outcomes. We run both the CFP rounding procedures and the LambdaLouvain method for 15 iterations, reporting the mean and standard deviation of approximation ratios and runtimes. While CFP+LambdaLouvain yields better approximations, it comes with longer runtimes than CFP alone. Even for small values of $\lambda$, the lower bounds generated by CFP certify that LambdaLouvain produces a respectable approximation factor of around 2.

Figure 2. Approximation ratios for combining CFP and LambdaLouvain on the ca-HepPh (12K nodes, 118.5K edges) and cit-HepTh (27K nodes, 352K edges) SNAP networks, and the Auburn71 (18.4K nodes, 973.9K edges) and Michigan23 (20K nodes, 1.17M edges) Facebook graphs, for different values of $\lambda$.
Figure 3. Runtimes for combining CFP and LambdaLouvain on ca-HepPh, Auburn71, cit-HepTh, and Michigan23 for different values of $\lambda$. The time to compute the CFP lower bound (shown in blue) is the bottleneck for this algorithm.

5.3. Scalability of CoverFlipPivot

Figure 4. CFP approximations and runtimes on Texas84 (36K nodes and 1.59M edges), cit-Patents (3.6M nodes and 16.5M edges), roadNet-CA (1.97M nodes and 2.76M edges), amazon0601 (403K nodes and 2.44M edges), com-LiveJournal (3.9M nodes and 34.6M edges), and wiki-topcats (1.79M nodes and 25.4M edges).

We further test the limits of CFP by running it on much larger graphs. Figure 4 shows approximation results on a social network with 1.59 million edges (Texas84), a road network with 2.76 million edges (roadNet-CA), a citation network with 16.5 million edges (cit-Patents), an Amazon product co-purchasing network with 2.44 million edges (amazon0601), a Wikipedia web network with 25.4 million edges (wiki-topcats), and a blogging community network with 34.6 million edges (com-LiveJournal). CFP consistently outperforms its theoretical 6-approximation guarantee. For $\lambda\geq 0.55$, it produces approximations of 2.1 or better. When $\lambda=0.5$, approximation factors increase to between 2.4 and 2.8, which still beats the 6-approximation guarantee. (We omit these results from the plot in Figure 4 in order to zoom in and better display factors near 2 for $\lambda\geq 1/2$.) For each value of $\lambda$, the method takes around 58 minutes on the largest graph, com-LiveJournal. In contrast, cit-Patents, with 16.5 million edges, is processed in under 11 minutes, and the method is faster still on the remaining graphs. An intriguing observation is that as the objective transitions from cluster editing to cluster deletion ($\lambda\rightarrow 1$), both the approximation factor and the runtime improve.

6. Conclusion

We present new approximation algorithms for the LambdaCC graph clustering framework that are far more scalable than existing approximation algorithms, which rely on LP relaxations with $O(n^{3})$ constraints. We introduce the first combinatorial algorithm for LambdaCC in the parameter regime $\lambda\in(\frac{1}{2},1)$ (where the problem interpolates between cluster editing and cluster deletion), which comes with a 6-approximation guarantee. We then provide algorithms for all parameter regimes based on rounding a less expensive LP relaxation. A major theoretical benefit of these alternative LPs is that they are covering LPs, which means the multiplicative weights update method provides fast combinatorial methods for finding approximate solutions. Although in this work we focused on using existing optimization software to solve these relaxations exactly, a clear direction for future research is to implement these faster approximate solvers in order to achieve additional runtime improvements. Another direction for future work is to determine whether it is possible to develop a 3-approximation for all $\lambda\in(\frac{1}{2},1)$ by rounding the LambdaSTC LP. Though our theoretical approximation factors grow increasingly worse as $\lambda\rightarrow 1$, in practice we see no deterioration in approximation quality. Finally, a compelling open question is whether we can develop an $O(\log n)$-approximation algorithm for LambdaCC that applies to all values of $\lambda$, can be made purely combinatorial, and does not rely on the canonical LP.

References

  • Abbe (2018) Emmanuel Abbe. 2018. Community Detection and Stochastic Block Models: Recent Developments. Journal of Machine Learning Research 18, 177 (2018), 1–86.
  • Adriaens et al. (2020) Florian Adriaens, Tijl De Bie, Aristides Gionis, Jefrey Lijffijt, Antonis Matakos, and Polina Rozenshtein. 2020. Relaxing the strong triadic closure problem for edge strength inference. Data Mining and Knowledge Discovery 34 (2020), 611–651.
  • Ailon et al. (2012) Nir Ailon, Noa Avigdor-Elgrabli, Edo Liberty, and Anke van Zuylen. 2012. Improved Approximation Algorithms for Bipartite Correlation Clustering. SIAM J. Comput. 41, 5 (2012), 1110–1121.
  • Ailon et al. (2008) Nir Ailon, Moses Charikar, and Alantha Newman. 2008. Aggregating inconsistent information: ranking and clustering. Journal of the ACM (JACM) 55, 5 (2008), 23.
  • Arora et al. (2009) Sanjeev Arora, Satish Rao, and Umesh Vazirani. 2009. Expander flows, geometric embeddings and graph partitioning. Journal of the ACM (JACM) 56, 2 (2009), 1–37.
  • Bansal et al. (2004) Nikhil Bansal, Avrim Blum, and Shuchi Chawla. 2004. Correlation Clustering. Machine Learning 56 (2004), 89–113.
  • Bar-Yehuda and Even (1985) Reuven Bar-Yehuda and Shimon Even. 1985. A local-ratio theorem for approximating the weighted vertex cover problem. Annals of Discrete Mathematics 25, 27-46 (1985), 50.
  • Ben-Dor et al. (1999) Amir Ben-Dor, Ron Shamir, and Zohar Yakhini. 1999. Clustering gene expression patterns. Journal of computational biology 6, 3-4 (1999), 281–297.
  • Blondel et al. (2008) Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. 2008. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment 2008, 10 (2008), P10008.
  • Bohlin et al. (2014) Ludvig Bohlin, Daniel Edler, Andrea Lancichinetti, and Martin Rosvall. 2014. Community detection and visualization of networks with the map equation framework. In Measuring Scholarly Impact. Springer, 3–34.
  • Charikar et al. (2003) Moses Charikar, Venkatesan Guruswami, and Anthony Wirth. 2003. Clustering with qualitative information. In Foundations of Computer Science, 2003. Proceedings. 44th Annual IEEE Symposium on. IEEE, 524–533.
  • Charikar et al. (2005) Moses Charikar, Venkatesan Guruswami, and Anthony Wirth. 2005. Clustering with qualitative information. J. Comput. System Sci. 71, 3 (2005), 360 – 383. https://doi.org/10.1016/j.jcss.2004.10.012 Learning Theory 2003.
  • Chawla et al. (2015) Shuchi Chawla, Konstantin Makarychev, Tselil Schramm, and Grigory Yaroslavtsev. 2015. Near optimal LP rounding algorithm for correlation clustering on complete and complete k-partite graphs. In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing. ACM, 219–228.
  • Cohen et al. (2021) Michael B Cohen, Yin Tat Lee, and Zhao Song. 2021. Solving linear programs in the current matrix multiplication time. Journal of the ACM (JACM) 68, 1 (2021), 1–39.
  • Davis and Hu (2011) Timothy A. Davis and Yifan Hu. 2011. The University of Florida Sparse Matrix Collection. ACM Trans. Math. Softw. 38, 1, Article 1 (dec 2011), 25 pages. https://doi.org/10.1145/2049662.2049663
  • Delvenne et al. (2010) J.-C. Delvenne, Sophia N Yaliraki, and Mauricio Barahona. 2010. Stability of graph communities across time scales. Proceedings of the National Academy of Sciences 107, 29 (2010), 12755–12760.
  • Demaine et al. (2006) Erik D. Demaine, Dotan Emanuel, Amos Fiat, and Nicole Immorlica. 2006. Correlation clustering in general weighted graphs. Theoretical Computer Science 361, 2 (2006), 172 – 187. https://doi.org/10.1016/j.tcs.2006.05.008 Approximation and Online Algorithms.
  • Easley and Kleinberg (2010) David Easley and Jon Kleinberg. 2010. Networks, crowds, and markets. Vol. 8. Cambridge university press Cambridge.
  • Fleischer (2004) Lisa Fleischer. 2004. A fast approximation scheme for fractional covering problems with variable upper bounds. In Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms. 1001–1010.
  • Fortunato and Hric (2016) Santo Fortunato and Darko Hric. 2016. Community detection in networks: A user guide. Physics Reports 659 (2016), 1 – 44. https://doi.org/10.1016/j.physrep.2016.09.002 Community detection in networks: A user guide.
  • Gan et al. (2020) Junhao Gan, David F. Gleich, Nate Veldt, Anthony Wirth, and Xin Zhang. 2020. Graph Clustering in All Parameter Regimes. In 45th International Symposium on Mathematical Foundations of Computer Science (MFCS ’20, Vol. 170). 39:1–39:15. https://doi.org/10.4230/LIPIcs.MFCS.2020.39
  • Garg and Khandekar (2004) Naveen Garg and Rohit Khandekar. 2004. Fractional covering with upper bounds on the variables: Solving LPs with negative entries. In Algorithms–ESA 2004: 12th Annual European Symposium, Bergen, Norway, September 14-17, 2004. Proceedings 12. Springer, 371–382.
  • Gleich et al. (2018) David F. Gleich, Nate Veldt, and Anthony Wirth. 2018. Correlation Clustering Generalized. In 29th International Symposium on Algorithms and Computation (ISAAC 2018, Vol. 123). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 44:1–44:13. https://doi.org/10.4230/LIPIcs.ISAAC.2018.44
  • Granovetter (1973) Mark S Granovetter. 1973. The strength of weak ties. American journal of sociology 78, 6 (1973), 1360–1380.
  • Grüttemeier and Komusiewicz (2020) Niels Grüttemeier and Christian Komusiewicz. 2020. On the relation of strong triadic closure and cluster deletion. Algorithmica 82, 4 (2020), 853–880. https://doi.org/10.1007/s00453-019-00617-1
  • Grüttemeier and Morawietz (2020) Niels Grüttemeier and Nils Morawietz. 2020. On Strong Triadic Closure with Edge Insertion. Technical report (2020).
  • Gurobi Optimization (2021) LLC Gurobi Optimization. 2021. Gurobi optimizer reference manual.
  • Jiang et al. (2021) Shunhua Jiang, Zhao Song, Omri Weinstein, and Hengjie Zhang. 2021. A Faster Algorithm for Solving General LPs. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing (STOC ’21). Association for Computing Machinery, New York, NY, USA, 823–832. https://doi.org/10.1145/3406325.3451058
  • Konstantinidis et al. (2018) Athanasios L Konstantinidis, Stavros D Nikolopoulos, and Charis Papadopoulos. 2018. Strong triadic closure in cographs and graphs of low maximum degree. Theoretical Computer Science 740 (2018), 76–84.
  • Leighton and Rao (1999) Tom Leighton and Satish Rao. 1999. Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms. Journal of the ACM (JACM) 46, 6 (November 1999), 787–832.
  • Leskovec and Sosič (2016) Jure Leskovec and Rok Sosič. 2016. Snap: A general-purpose network analysis and graph-mining library. ACM Transactions on Intelligent Systems and Technology (TIST) 8, 1 (2016), 1–20.
  • Newman (2006) Mark EJ Newman. 2006. Finding community structure in networks using the eigenvectors of matrices. Physical review E 74, 3 (2006), 036104.
  • Newman (2016) Mark EJ Newman. 2016. Equivalence between modularity optimization and maximum likelihood methods for community detection. Physical Review E 94, 5 (2016), 052315.
  • Newman and Girvan (2004) Mark EJ Newman and Michelle Girvan. 2004. Finding and evaluating community structure in networks. Physical review E 69, 2 (2004), 026113.
  • Oettershagen et al. (2023) Lutz Oettershagen, Athanasios L Konstantinidis, and Giuseppe F Italiano. 2023. Inferring Tie Strength in Temporal Networks. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2022, Grenoble, France, September 19–23, 2022, Proceedings, Part II. Springer, 69–85.
  • Porter et al. (2009) Mason A Porter, Jukka-Pekka Onnela, and Peter J Mucha. 2009. Communities in networks. Notices of the AMS 56, 9 (2009), 1082–1097.
  • Quanrud (2020) Kent Quanrud. 2020. Nearly linear time approximations for mixed packing and covering problems without data structures or randomization. In Symposium on Simplicity in Algorithms. SIAM, 69–80.
  • Reichardt and Bornholdt (2006) Jörg Reichardt and Stefan Bornholdt. 2006. Statistical mechanics of community detection. Physical Review E 74, 016110 (2006).
  • Schaeffer (2007) Satu Elisa Schaeffer. 2007. Graph clustering. Computer Science Review 1, 1 (2007), 27 – 64. https://doi.org/10.1016/j.cosrev.2007.05.001
  • Schaub et al. (2012) Michael T Schaub, Renaud Lambiotte, and Mauricio Barahona. 2012. Encoding dynamics for multiscale community detection: Markov time sweeping for the map equation. Physical Review E 86, 2 (2012), 026112.
  • Shamir et al. (2004) Ron Shamir, Roded Sharan, and Dekel Tsur. 2004. Cluster graph modification problems. Discrete Applied Mathematics 144, 1-2 (2004), 173–182.
  • Shi et al. (2021) Jessica Shi, Laxman Dhulipala, David Eisenstat, Jakub Łącki, and Vahab Mirrokni. 2021. Scalable Community Detection via Parallel Correlation Clustering. Proc. VLDB Endow. 14, 11 (jul 2021), 2305–2313. https://doi.org/10.14778/3476249.3476282
  • Shi and Malik (2000) Jianbo Shi and J. Malik. 2000. Normalized cuts and image segmentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on 22, 8 (August 2000), 888–905. https://doi.org/10.1109/34.868688
  • Sintos and Tsaparas (2014) Stavros Sintos and Panayiotis Tsaparas. 2014. Using strong triadic closure to characterize ties in social networks. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD ’14). 1466–1475. https://doi.org/10.1145/2623330.2623664
  • Traag et al. (2019) Vincent A Traag, Ludo Waltman, and Nees Jan Van Eck. 2019. From Louvain to Leiden: guaranteeing well-connected communities. Scientific reports 9, 1 (2019), 5233.
  • Traud et al. (2012) Amanda L Traud, Peter J Mucha, and Mason A Porter. 2012. Social structure of facebook networks. Physica A: Statistical Mechanics and its Applications 391, 16 (2012), 4165–4180.
  • van Zuylen and Williamson (2009) Anke van Zuylen and David P. Williamson. 2009. Deterministic Pivoting Algorithms for Constrained Ranking and Clustering Problems. Mathematics of Operations Research 34, 3 (2009), 594–620. http://www.jstor.org/stable/40538434
  • Veldt (2022) Nate Veldt. 2022. Correlation Clustering via Strong Triadic Closure Labeling: Fast Approximation Algorithms and Practical Lower Bounds. In International Conference on Machine Learning. PMLR, 22060–22083.
  • Veldt et al. (2017) Nate Veldt, David Gleich, and Anthony Wirth. 2017. Unifying sparsest cut, cluster deletion, and modularity clustering objectives with correlation clustering. arXiv preprint arXiv:1712.05825 (2017).
  • Veldt et al. (2018) Nate Veldt, David F Gleich, and Anthony Wirth. 2018. A correlation clustering framework for community detection. In Proceedings of the 2018 World Wide Web Conference. 439–448.
  • Veldt et al. (2019a) Nate Veldt, David F. Gleich, and Anthony Wirth. 2019a. Learning Resolution Parameters for Graph Clustering. In Proceedings of the 28th International Conference on World Wide Web (San Francisco, CA, USA) (WWW ’19). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 11 pages. https://doi.org/10.1145/3308558.3313471
  • Veldt et al. (2019b) Nate Veldt, David F. Gleich, Anthony Wirth, and James Saunderson. 2019b. Metric-Constrained Optimization for Graph Clustering Algorithms. SIAM Journal on Mathematics of Data Science 1, 2 (2019), 333–355. https://doi.org/10.1137/18M1217152 arXiv:https://doi.org/10.1137/18M1217152
  • Veldt et al. (2020) Nate Veldt, Anthony Wirth, and David F Gleich. 2020. Parameterized correlation clustering in hypergraphs and bipartite graphs. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1868–1876.