
Finding the KT partition of a weighted graph in near-linear time

Simon Apers¹, Paweł Gawrychowski², Troy Lee³

¹ IRIF, CNRS, Paris. Email: smgapers@gmail.com
² Institute of Computer Science, University of Wrocław, Poland. Email: gawry@cs.uni.wroc.pl. Partially supported by the Bekker programme of the Polish National Agency for Academic Exchange (PPN/BEK/2020/1/00444).
³ Centre for Quantum Software and Information, University of Technology Sydney. Email: troyjlee@gmail.com. Supported in part by the Australian Research Council Grant No: DP200100950.
Abstract

In a breakthrough work, Kawarabayashi and Thorup (J. ACM'19) gave a near-linear time deterministic algorithm to compute the weight of a minimum cut in a simple graph $G=(V,E)$. A key component of this algorithm is finding the $(1+\varepsilon)$-KT partition of $G$, the coarsest partition $\{P_1,\ldots,P_k\}$ of $V$ such that for every non-trivial $(1+\varepsilon)$-near minimum cut with sides $\{S,\bar S\}$ it holds that $P_i$ is contained in either $S$ or $\bar S$, for $i=1,\ldots,k$. In this work we give a near-linear time randomized algorithm to find the $(1+\varepsilon)$-KT partition of a weighted graph. Our algorithm is quite different from that of Kawarabayashi and Thorup and builds on Karger's framework of tree-respecting cuts (J. ACM'00).

We describe a number of applications of the algorithm. (i) The algorithm makes progress towards a more efficient algorithm for constructing the polygon representation of the set of near-minimum cuts in a graph. This is a generalization of the cactus representation, and was initially described by Benczúr (FOCS'95). (ii) We improve the time complexity of a recent quantum algorithm for minimum cut in a simple graph in the adjacency list model from $\widetilde{\mathcal{O}}(n^{3/2})$ to $\widetilde{\mathcal{O}}(\sqrt{mn})$, when the graph has $n$ vertices and $m$ edges. (iii) We describe a new type of randomized algorithm for minimum cut in simple graphs with complexity $\mathcal{O}(m+n\log^6 n)$. For graphs that are not too sparse, this matches the complexity of the current best $\mathcal{O}(m+n\log^2 n)$ algorithm, which uses a different approach based on random contractions.

The key technical contribution of our work is the following. Given a weighted graph $G$ with $m$ edges and a spanning tree $T$ of $G$, consider the graph $H$ whose nodes are the edges of $T$, and where there is an edge between two nodes of $H$ iff the corresponding 2-respecting cut of $T$ is a non-trivial near-minimum cut of $G$. We give an $\mathcal{O}(m\log^4 n)$ time deterministic algorithm to compute a spanning forest of $H$.

1 Introduction

Given a weighted and undirected graph $G$ with $n$ vertices and $m$ edges,¹ the minimum cut problem is to find the minimum weight $\lambda(G)$ of a set of edges whose removal disconnects $G$. When $G$ is unweighted, this is simply the minimum number of edges whose removal disconnects $G$, also known as the edge connectivity of $G$. The minimum cut problem is a fundamental problem in theoretical computer science whose study goes back to at least the 1960s, when the first polynomial time algorithm computing edge connectivity was given by Gomory and Hu [GH61]. In the current state of the art, there are near-linear time randomized algorithms for the minimum cut problem in weighted graphs [Kar00, GMW20, MN20] and near-linear time deterministic algorithms in the case of simple graphs² [KT19, HRW20]. Very recently, Li [Li21] has given an almost-linear time (i.e. time $\mathcal{O}(m^{1+o(1)})$) deterministic algorithm for weighted graphs as well.

¹ Throughout this paper we will use $n$ and $m$ to denote the number of vertices and edges of the input graph.
² A simple graph is an unweighted graph with no self loops and at most one edge between any pair of vertices.

The best known algorithms for weighted graphs all rely on a framework developed by Karger [Kar00] which, for an input graph $G$, relies on finding $\mathcal{O}(\log n)$ spanning trees of $G$ such that with high probability one of these spanning trees will contain at most 2 edges from a minimum cut of $G$. In this case the cut is said to 2-respect the tree. A key insight of Karger is that, given a spanning tree $T$ of $G$, the problem of finding a 2-respecting cut of $T$ that has minimum weight in $G$ can be solved deterministically in near-linear time, specifically time $\mathcal{O}(m\log^2 n)$. After standing for 20 years, the bound for this minimum-weight 2-respecting cut problem was recently improved by Gawrychowski, Mozes, and Weimann [GMW20], who gave a deterministic $\mathcal{O}(m\log n)$ time algorithm, and independently by Mukhopadhyay and Nanongkai [MN20], who gave a randomized algorithm with complexity $\mathcal{O}(m\log n+n\log^4 n)$.

The best algorithms in the case of a simple graph $G$ rely on a quite different approach, pioneered by Kawarabayashi and Thorup [KT19]. This approach begins by finding the minimum degree $d$ of a vertex in $G$. Then the question becomes whether there is a non-trivial cut, i.e. a cut where both sides of the corresponding bipartition have cardinality at least 2, whose weight is less than $d$. This problem is solved by finding what we call the $(1+\varepsilon)$-KT partition of the graph. Let $\mathcal{B}_\varepsilon^{nt}(G)$ be the set of all bipartitions $\{S,\bar S\}$ of the vertex set corresponding to non-trivial cuts whose weight is at most $(1+\varepsilon)\lambda(G)$. The $(1+\varepsilon)$-KT partition of $G$ is the coarsest partition $\{P_1,\ldots,P_k\}$ of the vertex set such that for any $\{S,\bar S\}\in\mathcal{B}_\varepsilon^{nt}(G)$ it holds that $P_i$ is contained in either $S$ or $\bar S$, for each $i=1,\ldots,k$. If one considers the multigraph $G'$ formed from $G$ by identifying vertices in the same set $P_i$, then $G'$ preserves all non-trivial $(1+\varepsilon)$-near minimum cuts of $G$. Kawarabayashi and Thorup further show that for any $\varepsilon<1$ the graph $G'$ only has $\widetilde{\mathcal{O}}(n)$ edges. This bound crucially uses that the original graph is simple. The edge connectivity of $G$ is thus the minimum of $d$ and the edge connectivity of $G'$. One can use Gabow's deterministic $\mathcal{O}(\lambda m'\log n)$ edge connectivity algorithm [Gab95] for a multigraph with $m'$ edges and edge connectivity $\lambda$ to check in time $\widetilde{\mathcal{O}}(nd\log n)=\widetilde{\mathcal{O}}(m)$ whether the edge connectivity of $G'$ is less than $d$ and, if so, compute it.
In the most technical part of their work, Kawarabayashi and Thorup give a deterministic algorithm to find the $(1+\varepsilon)$-KT partition of a simple graph $G$ in time $\widetilde{\mathcal{O}}(m)$, giving an $\widetilde{\mathcal{O}}(m)$ time deterministic algorithm overall for edge connectivity. The key tool in their algorithm is the PageRank algorithm, which they use for finding low conductance cuts in the graph.

The KT partition has proven to be a very useful concept. Rubinstein, Schramm, and Weinberg [RSW18] also go through the $(1+\varepsilon)$-KT partition to give a near-optimal $\widetilde{\mathcal{O}}(n)$ randomized query algorithm determining the edge connectivity of a simple graph in the cut query model. In the cut query model one can query a subset $S$ of the vertices and receive in return the number of edges with exactly one endpoint in $S$. En route to their result, [RSW18] also improved the bound on the number of inter-component edges in the $(1+\varepsilon)$-KT partition of a simple graph to $\mathcal{O}(n)$, for any $\varepsilon<1$. In the case $\varepsilon=0$ this was independently done by Lo, Schmidt, and Thorup [LST20]. The KT partition approach is also used in the current best randomized algorithm for edge connectivity, which runs in time $\mathcal{O}(\min\{m+n\log^2 n,\, m\log n\})$ [GNT20].³

³ The bound quoted in [GNT20] is $\mathcal{O}(m+n\log^3 n)$, but the improvement to Karger's algorithm by [GMW20] reduces this to $\mathcal{O}(m+n\log^2 n)$.

1.1 Our results

In this work we give the first near-linear time randomized algorithm to find the $(1+\varepsilon)$-KT partition of a weighted graph, for $0\leq\varepsilon\leq 1/16$. An interesting aspect of our algorithm is that it uses Karger's 2-respecting cut framework to find the $(1+\varepsilon)$-KT partition, thereby combining the aforementioned major lines of work on the minimum cut problem. This makes progress on a number of problems.

  1.

    The polygon representation is a compact representation of the set of near-minimum cuts of a weighted graph, originally described by Benczúr [Ben95, Ben97] and Benczúr and Goemans [BG08]. It extends the cactus representation [DKL76], which only works for the set of exact minimum cuts, and has played a key role in recent breakthroughs on the traveling salesperson problem [GSS11, KKG21]. For a general weighted graph the polygon representation has size $\mathcal{O}(n^2)$, and Benczúr has given a randomized algorithm to construct a polygon representation of the $(1+\varepsilon)$-near mincuts of a graph in time $\mathcal{O}(n^{2(1+\varepsilon)})$ [Ben97, Section 6.3] by building on the Karger-Stein algorithm. It is an open question whether we can construct a polygon representation in time $\widetilde{\mathcal{O}}(n^2)$ for $\varepsilon>0$. In his thesis [Ben97, pg. 126], Benczúr says, "It already seems hard to directly identify the system of atoms within the $\widetilde{\mathcal{O}}(n^2)$ time bound," where the system of atoms is defined analogously to the $(1+\varepsilon)$-KT partition but for the set of all $(1+\varepsilon)$-near minimum cuts, not just the non-trivial ones. One can easily construct the set of atoms from a $(1+\varepsilon)$-KT partition, thus our KT partition algorithm gives an $\widetilde{\mathcal{O}}(m)$ time algorithm for this task as well, making progress on this open question.

  2.

    The $(1+\varepsilon)$-KT partition of a weighted graph is exactly what is needed to give an optimal quantum algorithm for minimum cut: Apers and Lee [AL21] showed that the quantum query and time complexity of minimum cut in the adjacency matrix model is $\widetilde{\Theta}(n^{3/2}\sqrt{\tau})$ for a weighted graph where the ratio of the largest to smallest edge weights is $\tau$, with the algorithm proceeding by finding a $(1+\varepsilon)$-KT partition.

    In the case where the graph is instead represented as an adjacency list, they gave an algorithm with query complexity $\widetilde{\mathcal{O}}(\sqrt{mn\tau})$ but whose running time is larger at $\widetilde{\mathcal{O}}(\sqrt{mn\tau}+n^{3/2})$. The bottleneck in the time complexity is the time taken to find a $(1+\varepsilon)$-KT partition of a weighted graph with $\widetilde{\mathcal{O}}(n)$ edges. Using the near-linear time randomized algorithm we give here to find a $(1+\varepsilon)$-KT partition improves the time complexity of this algorithm to $\widetilde{\mathcal{O}}(\sqrt{mn\tau})$, matching the query complexity. We detail the full algorithm in Section 6.1.

    Both quantum algorithms also used the following observation [AL21, Lemma 2]: if in a weighted graph $G$ the ratio of the largest edge weight to the smallest is $\tau$, then the total weight of inter-component edges in a $(1+\varepsilon)$-KT partition of $G$ for $\varepsilon<1$ is $\mathcal{O}(\tau n)$, which can be tight.

  3.

    The best randomized algorithm to compute the edge connectivity of a simple graph is the 2-out contraction approach of Ghaffari, Nowicki, and Thorup [GNT20], which has running time $\mathcal{O}(\min\{m+n\log^2 n,\, m\log n\})$. Using our algorithm to find a $(1+\varepsilon)$-KT partition in a weighted graph, we can follow Karger's 2-respecting tree approach to compute the edge connectivity of a simple graph in time $\mathcal{O}(m+n\log^6 n)$, thus also achieving the optimal bound on graphs that are not too sparse. We postpone details to Section 6.2.

Apart from these examples, we are hopeful that our near-optimal randomized algorithm for finding the KT partition of a weighted graph will find further applications.

In order to find a $(1+\varepsilon)$-KT partition in near-linear time, Apers and Lee [AL21] show that it suffices to solve the following problem in near-linear time. Let $G$ be a connected weighted graph and $T$ a spanning tree of $G$. Consider a graph $H$ whose nodes are the edges of $T$, and where two nodes $e,f$ of $H$ are connected by an edge iff the 2-respecting cut defined by $e,f$ is a non-trivial $(1+\varepsilon)$-near minimum cut of $G$. Then the problem is to find a spanning forest of $H$. Our main technical contribution is to give an $\mathcal{O}(m\log^4 n)$ time deterministic algorithm to solve this problem, where $m$ is the number of edges of the original graph $G$.

It is interesting to compare the problem of finding a spanning forest of $H$ with the original problem solved by Karger of finding a minimum-weight 2-respecting cut of $T$. To find a spanning forest of $H$ we potentially have to find $\Omega(n)$ many $(1+\varepsilon)$-near minimum cuts, which we accomplish with only an additional logarithmic overhead in the running time. The first insight into how this might be possible is to note that Karger's original algorithm to find the minimum-weight 2-respecting cut actually does something stronger than needed. Let $\mathrm{cost}(e,f)$ be the weight of the 2-respecting cut of $T$ defined by $\{e,f\}$. For every edge $e$ of $T$, Karger's algorithm attempts to find an $f^*\in\operatorname*{arg\,min}_f \mathrm{cost}(e,f)$. It does not always succeed in this task, but if the candidate $f'$ returned for edge $e$ is not such a minimizer, then for $f^*\in\operatorname*{arg\,min}_f \mathrm{cost}(e,f)$ it must be the case that the candidate $g$ returned for $f^*$ satisfies $\mathrm{cost}(f^*,g)\leq\mathrm{cost}(e,f^*)$. In this way, the algorithm still succeeds in finding a minimum-weight 2-respecting cut in the end.

In contrast, we give an algorithm that for every edge $e$ of $T$ actually finds

$$f^*\in\operatorname*{arg\,min}_f\,\{\mathrm{cost}(e,f) : \{e,f\}\text{ defines a non-trivial cut}\}.$$

We then show that this suffices to implement a round of Borůvka's spanning forest algorithm [NMN01] on $H$ in near-linear time. Borůvka's spanning forest algorithm consists of $\log n$ rounds and maintains the invariant of having a partition $\{S_1,\ldots,S_k\}$ of the vertex set and a spanning tree for each set $S_i$. The algorithm terminates when there is no outgoing edge from any set of the partition, at which point the collection of spanning trees for the sets of the partition is a spanning forest of $H$. The sets of the partition are initialized to be individual nodes of $H$.
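For concreteness, the round structure of Borůvka's algorithm can be sketched as follows. This is a minimal Python sketch over an explicit edge list; in our setting the edges of $H$ are never materialized, and each round instead discovers an outgoing edge per component via queries, but the round structure is the same.

```python
def boruvka_spanning_forest(num_nodes, edges):
    """Generic Boruvka: edges is a list of (u, v) pairs; returns a spanning forest.

    Each round picks one outgoing edge per component (H is unweighted here,
    so any outgoing edge will do) and merges components; the number of
    components at least halves per round, giving O(log n) rounds.
    """
    parent = list(range(num_nodes))

    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    while True:
        # For each component root, remember one outgoing edge (if any).
        outgoing = {}
        for (u, v) in edges:
            ru, rv = find(u), find(v)
            if ru != rv:
                outgoing.setdefault(ru, (u, v))
                outgoing.setdefault(rv, (u, v))
        if not outgoing:
            break  # no outgoing edges: the forest is complete
        for (u, v) in outgoing.values():
            ru, rv = find(u), find(v)
            if ru != rv:  # re-check to avoid creating a cycle
                parent[ru] = rv
                forest.append((u, v))
    return forest
```
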

In each round of Borůvka's algorithm the goal is to find an outgoing edge from each set $S_i$ of the partition which has one. Consider a node $e$ of $H$ with $e\in S_i$. We can find the best partner $f$ for $e$ and check whether $\{e,f\}$ indeed gives rise to a non-trivial $(1+\varepsilon)$-near minimum cut and so is an edge of $H$. The problem is that $f$ could also be in $S_i$, in which case the edge $\{e,f\}$ is not an outgoing edge of $S_i$ as desired. To handle this, we maintain a data structure that allows us to find both the best partner $f$ for $e$, and also the best partner $f'$ for $e$ that lies in a different set of the partition from $f$. We call this operation a categorical top two query. If there actually is an edge of $H$ with one endpoint $e$ and the other endpoint outside of $S_i$, then either $\{e,f\}$ or $\{e,f'\}$ will be such an edge. Following the approach of [GMW20] to the minimum-weight 2-respecting cut problem, combined with an efficient data structure for handling categorical top two queries, we are able to do this for all nodes $e$ of $H$ in near-linear time, which allows us to implement a round of Borůvka's algorithm in near-linear time.

1.2 Technical overview

We now give a more detailed description of our main result. Let $G=(V,E,w)$ be a weighted graph, where $E$ is the set of edges and $w:E\rightarrow\mathbb{R}_+$ assigns a positive weight to each edge. For a set $S\subseteq V$ let $\Delta_G(S)$ be the set of all edges of $G$ with exactly one endpoint in $S$. A cut of $G$ is a set of edges of the form $\Delta_G(S)$ for some $\emptyset\neq S\subsetneq V$. We call $S$ and $\bar S$ the shores of the cut. Let $w(\Delta_G(S))=\sum_{e\in\Delta_G(S)}w(e)$. We use $\lambda(G)=\min_{\emptyset\neq S\subsetneq V}w(\Delta_G(S))$ for the minimum weight of a cut in $G$.

We will be interested in partitions of $V$ and the partial order on partitions induced by refinement. For two partitions $\mathcal{X},\mathcal{Y}$ of $V$ we say that $\mathcal{X}\preceq\mathcal{Y}$ iff for every $X\in\mathcal{X}$ there is a $Y\in\mathcal{Y}$ with $X\subseteq Y$. In this case we say $\mathcal{X}$ is a refinement of $\mathcal{Y}$. The meet of two partitions $\mathcal{X}$ and $\mathcal{Y}$, denoted $\mathcal{X}\wedge\mathcal{Y}$, is the partition $\mathcal{Z}$ such that $\mathcal{Z}\preceq\mathcal{X}$, $\mathcal{Z}\preceq\mathcal{Y}$, and for any other partition $\mathcal{W}$ satisfying these two conditions $\mathcal{W}\preceq\mathcal{Z}$. In other words, $\mathcal{X}\wedge\mathcal{Y}$ is the greatest lower bound of $\mathcal{X}$ and $\mathcal{Y}$ under $\preceq$. For a set of partitions $\mathcal{D}=\{\mathcal{D}_1,\ldots,\mathcal{D}_K\}$ we write $\bigwedge\mathcal{D}=\mathcal{D}_1\wedge\cdots\wedge\mathcal{D}_K$.
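As a concrete illustration of the meet, the following Python sketch computes $\bigwedge\mathcal{D}$ by labelling each element with the tuple of blocks containing it, one index per partition: two elements land in the same block of the meet iff they are together in every partition. This is the straightforward method, not the faster one needed later in the paper.

```python
from collections import defaultdict

def meet(partitions, universe):
    """Meet of a list of partitions of `universe` (each partition a list of
    blocks).  Runs in O(K * n) dictionary operations for K partitions."""
    # label[x] collects, for each partition, the index of the block containing x
    label = {x: [] for x in universe}
    for part in partitions:
        for i, block in enumerate(part):
            for x in block:
                label[x].append(i)
    # elements with identical label tuples form the blocks of the meet
    groups = defaultdict(set)
    for x, sig in label.items():
        groups[tuple(sig)].add(x)
    return list(groups.values())
```

For instance, the meet of $\{\{1,2,3\},\{4\}\}$ and $\{\{1,2\},\{3,4\}\}$ is $\{\{1,2\},\{3\},\{4\}\}$.
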

For our applications we need to consider not only minimum cuts, but also near-minimum cuts. For $\varepsilon\geq 0$, let $\mathcal{B}_\varepsilon(G)=\{\{S,\bar S\} : w(\Delta_G(S))\leq(1+\varepsilon)\lambda(G)\}$ be the set of all bipartitions of $V$ corresponding to $(1+\varepsilon)$-near minimum cuts. Let $\mathcal{B}_\varepsilon^{nt}(G)\subseteq\mathcal{B}_\varepsilon(G)$ be the set of all the non-trivial cuts in $\mathcal{B}_\varepsilon(G)$. The $(1+\varepsilon)$-KT partition of $G$ is exactly $\bigwedge\mathcal{B}_\varepsilon^{nt}(G)$.

Both $\bigwedge\mathcal{B}_\varepsilon(G)$ and $\bigwedge\mathcal{B}_\varepsilon^{nt}(G)$ are important sets for understanding the structure of (near-)minimum cuts in a graph. Consider first $\bigwedge\mathcal{B}_0(G)$, the meet of the set of all bipartitions corresponding to minimum cuts. This set arises in the cactus decomposition of $G$ [DKL76], a compact representation of all minimum cuts of $G$. A cactus is a connected multigraph where every edge appears in exactly one cycle. The edge connectivity of a cactus is 2, and the minimum cuts are obtained by removing any two edges from the same cycle. A cactus decomposition of a graph $G$ is a cactus $H$ on $\mathcal{O}(n)$ vertices and a mapping $\phi:V(G)\rightarrow V(H)$ such that $\Delta_G(\phi^{-1}(S))$ is a mincut of $G$ iff $\Delta_H(S)$ is a mincut of $H$. The mapping $\phi$ does not have to be injective, so multiple vertices of $G$ can map to the same vertex of $H$. In this case, however, the cactus decomposition property means that all vertices in $\phi^{-1}(\{v\})$ must be on the same side of every minimum cut of $G$, for every $v\in V(H)$. Thus as $v$ ranges over $V(H)$ the sets $\phi^{-1}(\{v\})$ give the elements of $\bigwedge\mathcal{B}_0(G)$ (note that $\phi^{-1}(\{v\})$ can also be empty). A cactus decomposition of a weighted graph can be constructed by a randomized algorithm in near-linear time [KP09], thus this also gives a near-linear time randomized algorithm to compute $\bigwedge\mathcal{B}_0(G)$.

Lo, Schmidt, and Thorup [LST20] give a version of the cactus decomposition that only represents the non-trivial minimum cuts. In fact, they give a deterministic $\mathcal{O}(n)$ time algorithm that converts a standard cactus into one representing the non-trivial minimum cuts. Combined with the near-linear time algorithm to compute a cactus decomposition, this gives a near-linear time randomized algorithm to compute $\bigwedge\mathcal{B}_0^{nt}(G)$ as well.

The situation changes once we go to near-minimum cuts, which can no longer be represented by a cactus, but require the deformable polygon representation from [Ben95, Ben97, BG08]. This construction is fairly intricate, and the best known randomized algorithm to construct a deformable polygon representation of the $(1+\varepsilon)$-near mincuts of a graph builds on the Karger-Stein algorithm and takes time $\mathcal{O}(n^{2(1+\varepsilon)})$ [Ben97, Section 6.3]. A prerequisite to constructing a deformable polygon representation is being able to compute $\bigwedge\mathcal{B}_\varepsilon(G)$ as, analogously to the case of a cactus, these sets will be the "atoms" that label regions of the polygons.

Our main result in this work is a randomized algorithm to compute $\bigwedge\mathcal{B}_\varepsilon(G)$ and $\bigwedge\mathcal{B}_\varepsilon^{nt}(G)$ in time $\mathcal{O}(m\log^5 n)$.

Theorem 1.

Let $G=(V,E,w)$ be a graph with $n$ vertices and $m$ edges. For $0\leq\varepsilon\leq 1/16$ let $\mathcal{B}_\varepsilon=\{\{S,\bar S\} : w(\Delta(S))\leq(1+\varepsilon)\lambda(G)\}$ and let $\mathcal{B}_\varepsilon^{nt}\subseteq\mathcal{B}_\varepsilon$ be the subset of $\mathcal{B}_\varepsilon$ containing only non-trivial cuts. Both $\bigwedge\mathcal{B}_\varepsilon$ and $\bigwedge\mathcal{B}_\varepsilon^{nt}$ can be computed with high probability by a randomized algorithm with running time $\mathcal{O}(m\log^5 n)$.

In the rest of this introduction we focus on computing $\bigwedge\mathcal{B}_\varepsilon^{nt}$. It is easy to construct $\bigwedge\mathcal{B}_\varepsilon$ from $\bigwedge\mathcal{B}_\varepsilon^{nt}$ deterministically in $\mathcal{O}(n)$ time.

The first obstacle we face in designing a near-linear time algorithm to compute the meet of $\mathcal{B}_\varepsilon^{nt}$ is that the number of near-minimum cuts in $G$ can be $\Omega(n^2)$, so we cannot afford to consider all of them. An idea to get around this is to try the following:

  1.

    Efficiently find a "small" subset $\mathcal{B}'\subseteq\mathcal{B}_\varepsilon^{nt}$ such that $\bigwedge\mathcal{B}'=\bigwedge\mathcal{B}_\varepsilon^{nt}$. We call such a subset a generating set.

A greedy argument shows that such a subset $\mathcal{B}'$ of size at most $n-1$ exists. We initialize $\mathcal{B}'=\{\{S,\bar S\}\}$ for some element $\{S,\bar S\}$ of $\mathcal{B}_\varepsilon^{nt}$. We then iterate through the elements $\{T,\bar T\}$ of $\mathcal{B}_\varepsilon^{nt}$, adding $\{T,\bar T\}$ to $\mathcal{B}'$ iff $\bigwedge(\mathcal{B}'\cup\{\{T,\bar T\}\})\neq\bigwedge\mathcal{B}'$. Each bipartition added to $\mathcal{B}'$ increases the number of elements of $\bigwedge\mathcal{B}'$ by at least 1. As $\bigwedge\mathcal{B}'$ begins with 2 elements and can have at most $n$, the total number of bipartitions at termination is at most $n-1$. This shows that a small generating set exists, but there still remains the problem of finding such a generating set efficiently.
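The greedy argument can be made concrete with a short Python sketch. The helper names `meet2` and `greedy_generating_set` are ours, purely for illustration; this captures the existence argument, not an efficient algorithm.

```python
def meet2(p1, p2):
    """Meet of two partitions, each given as a list of frozensets."""
    return [a & b for a in p1 for b in p2 if a & b]

def greedy_generating_set(bipartitions, universe):
    """Keep a bipartition (S, Sbar) only if it strictly refines the running
    meet.  Each kept bipartition adds at least one block; the meet starts
    with 1 block and ends with at most n, so at most n - 1 are kept."""
    current = [frozenset(universe)]
    kept = []
    for (S, Sbar) in bipartitions:
        candidate = meet2(current, [frozenset(S), frozenset(Sbar)])
        if len(candidate) > len(current):
            kept.append((S, Sbar))
            current = candidate
    return kept, current
```

Redundant bipartitions, such as a repeat of one already processed, are discarded because they do not increase the number of blocks.
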

Assuming we get past the first obstacle, there remains a second one. The most straightforward algorithm to compute the meet of $k$ partitions of a set of size $n$ takes time $\Theta(kn\log n)$, which is again too slow if $k=\Theta(n)$. Thus we will also need to

  2.

    Exploit the structure of $\mathcal{B}'$ to compute $\bigwedge\mathcal{B}'$ efficiently.

Apers and Lee [AL21] give an approach to accomplish (1) and (2) following Karger's framework of tree-respecting cuts. Karger shows that in near-linear time one can compute a set of $K\in\mathcal{O}(\log n)$ spanning trees $T_1,\ldots,T_K$ of $G$ such that every $(1+\varepsilon)$-near minimum cut of $G$ 2-respects at least one of these trees. Let $\mathcal{B}_i\subseteq\mathcal{B}_\varepsilon^{nt}$ be the bipartitions corresponding to non-trivial near-minimum cuts that 2-respect $T_i$. To compute $\bigwedge\mathcal{B}_\varepsilon^{nt}$ it suffices to compute $\mathcal{C}_i=\bigwedge\mathcal{B}_i$ for each $i=1,\ldots,K$ and then compute $\bigwedge_{i=1}^K\mathcal{C}_i$. The latter can be done in time $\mathcal{O}(n\log^2 n)$ by the aforementioned algorithm. This leaves the problem of computing $\bigwedge\mathcal{B}_i$.

A key observation from [AL21] gives a generating set $\mathcal{B}_i'$ for $\mathcal{B}_i$ of size $\mathcal{O}(n)$. One initializes $\mathcal{B}_i'$ to be empty and then adds to $\mathcal{B}_i'$ the bipartitions in $\mathcal{B}_i$ that 1-respect $T_i$. This is a set of size $\mathcal{O}(n)$, and Karger has shown that all near-minimum cuts that 1-respect a tree can be found in time $\mathcal{O}(m)$.

Now we focus on the cuts that strictly 2-respect $T_i$. To handle these, one creates a graph $H$ whose nodes are the edges of $T_i$ and where there is an edge between nodes $e$ and $f$ iff the 2-respecting cut of $T_i$ defined by $\{e,f\}$ is a near-minimum cut in $\mathcal{B}_i$. One then adds to $\mathcal{B}_i'$ the bipartitions corresponding to a set of 2-respecting cuts that form a spanning forest of $H$. The resulting set $\mathcal{B}_i'$ has size $\mathcal{O}(n)$ and can be shown to be a generating set for $\mathcal{B}_i$.
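Ignoring efficiency, the role of the spanning forest of $H$ can be illustrated by a naive quadratic Python sketch. The predicate `is_H_edge` is a hypothetical oracle for whether $\{e,f\}$ defines a non-trivial near-minimum cut; avoiding the enumeration of all pairs, which this sketch does not, is precisely the technical challenge addressed in this paper.

```python
def spanning_forest_of_H(tree_edges, is_H_edge):
    """Naive O(n^2) construction of a spanning forest of H.

    tree_edges: edge ids of T (the nodes of H).
    is_H_edge(e, f): oracle telling whether {e, f} is an edge of H.
    Uses union-find to keep only pairs that join distinct components."""
    parent = {e: e for e in tree_edges}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    for i, e in enumerate(tree_edges):
        for f in tree_edges[i + 1:]:
            if find(e) != find(f) and is_H_edge(e, f):
                parent[find(e)] = find(f)
                forest.append((e, f))
    return forest
```
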

Apers and Lee give a quantum algorithm to find a spanning forest of $H$ with running time $\widetilde{\mathcal{O}}(n^{3/2})$. They then give a randomized algorithm to compute $\bigwedge\mathcal{B}_i'$ in time $\widetilde{\mathcal{O}}(n)$. As our main technical contribution, we give a deterministic algorithm to find a spanning forest of $H$ in time $\mathcal{O}(m\log^4 n)$. We also replace the randomization used in the algorithm to compute $\bigwedge\mathcal{B}_i'$ with an appropriate data structure, giving an $\widetilde{\mathcal{O}}(n)$ deterministic algorithm to compute the meet.

2 Preliminaries

For a natural number $n$ we use $[n]=\{1,\ldots,n\}$.

Graph notation

For a set $S$ we let $S^{(2)}$ denote the set of unordered pairs of elements of $S$. We represent an undirected edge-weighted graph as a triple $G=(V,E,w)$, where $E\subseteq V^{(2)}$ and $w:E\rightarrow\mathbb{R}_+$ gives the weight of an edge $e\in E$. We will also use $V(G)$ to denote the vertex set of $G$ and $E(G)$ to denote the set of edges. We always use $n$ for the number of vertices in $G$ and $m$ for the number of edges. We overload the function $w$ to let $w(F)=\sum_{e\in F}w(e)$ for a set of edges $F$, and for two disjoint sets $S,T\subseteq V$ we use $w(S,T)$ to denote $\sum_{e\in E : |e\cap S|=|e\cap T|=1}w(e)$, that is, the sum of the weights of edges with one endpoint in $S$ and one endpoint in $T$. For a subset $\emptyset\neq S\subsetneq V$ we let $\Delta(S)$ be the set of edges with exactly one endpoint in $S$. This is the cut defined by $S$. We let $\lambda(G)$ denote the weight of a minimum cut in $G$, i.e., $\lambda(G)=\min_{\emptyset\neq S\subsetneq V}w(\Delta(S))$.

Heavy path decomposition

We use the standard notion of heavy path decomposition of $T$ [ST83, HT84], which is a partition of the edges of $T$ into heavy paths. We define this partition recursively: first, find the heavy path starting at the root by repeatedly descending to the child of the current node with the largest subtree. This creates the topmost heavy path, starting at the root (called its head) and terminating at a leaf (called its tail). Second, remove the topmost heavy path from $T$ and repeat the reasoning on each of the obtained smaller trees. The crucial property is that, for any node $u$, the path from $u$ to the root in $T$ intersects at most $\log n$ heavy paths.
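A standard way to compute this decomposition in linear time is to record, for every node, the head of its heavy path. The following iterative Python sketch, with `children` as an adjacency map, is one way to do it; the function name and representation are ours.

```python
def heavy_path_decomposition(children, root):
    """Return head[u] = head of the heavy path containing the edge into u
    (head[root] = root).  children: dict node -> list of children."""
    # Compute subtree sizes in reverse traversal order (children before parents).
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(children.get(u, []))
    size = {}
    for u in reversed(order):
        size[u] = 1 + sum(size[c] for c in children.get(u, []))
    # Top-down: the heaviest child continues its parent's heavy path,
    # every other child starts a new heavy path (with itself as head).
    head = {root: root}
    for u in order:
        kids = children.get(u, [])
        if not kids:
            continue
        heavy = max(kids, key=lambda c: size[c])
        for c in kids:
            head[c] = head[u] if c == heavy else c
    return head
```

On a path graph every node ends up on the single topmost heavy path, while on a star only one child continues the root's path.
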

Algorithmic preliminaries

We collect here a few theorems from previous work that we will need. The first is Karger's result [Kar00] about finding $\mathcal{O}(\log n)$ many spanning trees of a graph $G$ such that every minimum cut of $G$ will 2-respect at least one of these trees. We will need the easy extension of this result to near-minimum cuts, which has been explicitly stated by Apers and Lee.

Theorem 2 ([Kar00, Theorem 4.1], [AL21, Theorem 24]).

Let $G$ be a weighted graph with $n$ vertices and $m$ edges. There is a randomized algorithm that in time $\mathcal{O}(m\log^2 n+n\log^4 n)$ constructs a set of $\mathcal{O}(\log n)$ spanning trees such that every $(1+1/16)$-near minimum cut of $G$ 2-respects $1/4$ of them with high probability.

We will also need the fact that for a weighted graph $G=(V,E,w)$ the values in $G$ of all 1-respecting cuts of a tree $T$ can be computed quickly. For a rooted spanning tree $T$ of $G$ and an edge $e\in E(T)$, let $T_e$ be the set of vertices in the component not containing the root when $e$ is removed from $T$. We use the shorthand $\mathrm{cost}(e)=w(\Delta_G(T_e))$.

Lemma 3 ([Kar00, Lemma 5.1]).

Let $G$ be a weighted graph with $n$ vertices and $m$ edges, and $T$ a spanning tree of $G$. There is a deterministic algorithm that computes $\mathrm{cost}(e)$ for every $e\in E(T)$ in time $\mathcal{O}(m+n)$.
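One way to realize the spirit of Lemma 3 is the classic deposit-and-sum trick: every graph edge $(x,y)$ of weight $w$ crosses exactly the tree edges on the tree path between $x$ and $y$, so depositing $+w$ at $x$ and $y$ and $-2w$ at their lowest common ancestor makes $\mathrm{cost}(e)$ equal a subtree sum of deposits. The Python sketch below uses a naive depth-climbing LCA for brevity, so it is not $\mathcal{O}(m+n)$; the linear-time version replaces it with a constant-time LCA structure.

```python
def one_respecting_costs(parent, graph_edges):
    """cost(e) for every tree edge e = (parent[u], u), returned as a map
    u -> cost of the edge above u.  parent[root] is None.
    graph_edges: list of (x, y, w) triples (may include the tree edges)."""
    depth = {}
    def get_depth(u):
        if u not in depth:
            depth[u] = 0 if parent[u] is None else get_depth(parent[u]) + 1
        return depth[u]
    def lca(x, y):  # naive: climb to equal depth, then climb together
        while get_depth(x) > get_depth(y):
            x = parent[x]
        while get_depth(y) > get_depth(x):
            y = parent[y]
        while x != y:
            x, y = parent[x], parent[y]
        return x
    deposit = {u: 0 for u in parent}
    for (x, y, w) in graph_edges:
        deposit[x] += w
        deposit[y] += w
        deposit[lca(x, y)] -= 2 * w
    # Subtree sums: process nodes deepest first so children are done first.
    cost = dict(deposit)
    for u in sorted(parent, key=get_depth, reverse=True):
        if parent[u] is not None:
            cost[parent[u]] += cost[u]
    return {u: cost[u] for u in parent if parent[u] is not None}
```
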

We will also make use of the improvement by Gawrychowski, Mozes, and Weimann of Karger's mincut algorithm.

Lemma 4 ([GMW20, Theorem 7]).

Let $G$ be a weighted graph with $n$ vertices and $m$ edges, and $T$ a spanning tree of $G$. A cut of minimum weight in $G$ that 2-respects $T$ can be found deterministically in $\mathcal{O}(m\log n)$ time. Using this, there is a randomized algorithm that finds a minimum cut in $G$ with high probability in time $\mathcal{O}(m\log^2 n)$.

Finally we give the formal statement of the result from [AL21] that underlies our algorithm to construct a KT partition.

Lemma 5 ([AL21, Lemma 29]).

Let $T=(V,E)$ be a tree and $\mathcal{M}\subseteq 2^V$ a family of subsets of $V$ such that $|\Delta_T(S)|=2$ for each $S\in\mathcal{M}$. Let $Q=\{\Delta_T(S) : S\in\mathcal{M}\}$ be a set of pairs of edges in $E$. Suppose $F$ is a spanning forest of the graph $L=(E,Q)$. Then the set of shores of the 2-respecting cuts defined by the edges in $E(F)$ is a generating set for $\bigwedge\mathcal{M}$.

3 Data structures

In this section we develop the data structure we will need for a fast implementation of our spanning forest algorithm. We want to maintain a tree $T$ with root $r$, in which each edge has a score and a color, so that we can support the following queries and updates. For an edge $e$ of the tree, let $T_e$ be the set of edges in the component not containing $r$ when $e$ is removed from the tree. On query of an edge $e$ we want to find the edge $f\in T_e$ with the smallest score, and the edge $f'\in T_e$ with the smallest score among edges whose color is different from that of $f$. We call such a query a categorical top two query. We want to answer these queries while allowing the addition of $\Delta$ to the score of every edge on the path between two nodes. We could use link-cut trees [ST83] to accomplish this with $\mathcal{O}(\log n)$ update and query time, using the fact that link-cut trees can be modified to support any semigroup operation under path updates. However, in our case the tree is static, and this allows for a simple and self-contained solution that requires only a well-known binary tree data structure coupled with the standard heavy path decomposition of a tree. This comes at the expense of implementing updates in $\mathcal{O}(\log^2 n)$ time instead of $\mathcal{O}(\log n)$ time. The construction can be seen as folklore and has been explicitly stated by Bhardwaj, Lovett, and Sandlund [BLS20] for the case when each edge maintains its score and there are no colors. We provide a detailed description of such an approach in Appendix A. We note that the increased update time does not affect the overall time complexity of our algorithm.

Lemma 6.

Let A[1],,A[n]A[1],\ldots,A[n] be an array where each element has two fields, a color A[i].colorA[i].\mathrm{color} and a score A[i].scoreA[i].\mathrm{score}. In 𝒪(n)\mathcal{O}(n) time we can create a data structure using 𝒪(n)\mathcal{O}(n) space and supporting the following operations in 𝒪(logn)\mathcal{O}(\log n) time per operation.

  1. 1.

    Add(Δ,i,j)\textsc{Add}(\Delta,i,j): for all ikji\leq k\leq j do A[k].scoreA[k].score+ΔA[k].\mathrm{score}\leftarrow A[k].\mathrm{score}+\Delta,

  2. 2.

    CatTopTwo(i,j)\textsc{CatTopTwo}(i,j): return (k1,k2)(k_{1},k_{2}) where k1=argmin{A[k].score:ikj}k_{1}=\operatorname*{arg\,min}\{A[k].\mathrm{score}:i\leq k\leq j\} and k2=NULLk_{2}=\mathrm{NULL} if A[k].color=A[k1].colorA[k].\mathrm{color}=A[k_{1}].\mathrm{color} for all ikji\leq k\leq j and k2=argmin{A[k].score:ikj,A[k].colorA[k1].color}k_{2}=\operatorname*{arg\,min}\{A[k].\mathrm{score}:i\leq k\leq j,A[k].\mathrm{color}\neq A[k_{1}].\mathrm{color}\} otherwise.
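As an illustration, the operations of Lemma 6 can be sketched with a standard segment tree with lazy range-addition, where each node keeps the minimum-score entry of its range together with the cheapest entry of a different color. This is a minimal sketch; the class and method names (CatTopTwoTree, add, cat_top_two) are ours, not from the paper.

```python
# Illustrative sketch: segment tree with lazy range-add. Each node stores the
# minimum (score, index) of its range and the minimum (score, index) among
# entries whose color differs from that minimum.
class CatTopTwoTree:
    def __init__(self, colors, scores):
        self.n = len(scores)
        self.colors = list(colors)
        self.best = [None] * (4 * self.n)   # (score, index): range minimum
        self.alt = [None] * (4 * self.n)    # range minimum with a different color
        self.lazy = [0] * (4 * self.n)
        self._build(1, 0, self.n - 1, scores)

    def _combine(self, cands):
        cands = [c for c in cands if c is not None]
        best = min(cands)
        diff = [c for c in cands if self.colors[c[1]] != self.colors[best[1]]]
        return best, (min(diff) if diff else None)

    def _build(self, v, lo, hi, scores):
        if lo == hi:
            self.best[v] = (scores[lo], lo)
            return
        mid = (lo + hi) // 2
        self._build(2 * v, lo, mid, scores)
        self._build(2 * v + 1, mid + 1, hi, scores)
        self.best[v], self.alt[v] = self._combine(
            [self.best[2 * v], self.alt[2 * v],
             self.best[2 * v + 1], self.alt[2 * v + 1]])

    def _apply(self, v, d):                 # add d to every score in node v's range
        self.lazy[v] += d
        if self.best[v]: self.best[v] = (self.best[v][0] + d, self.best[v][1])
        if self.alt[v]: self.alt[v] = (self.alt[v][0] + d, self.alt[v][1])

    def _push(self, v):
        if self.lazy[v]:
            self._apply(2 * v, self.lazy[v])
            self._apply(2 * v + 1, self.lazy[v])
            self.lazy[v] = 0

    def add(self, d, i, j, v=1, lo=0, hi=None):          # Add(d, i, j)
        if hi is None: hi = self.n - 1
        if j < lo or hi < i: return
        if i <= lo and hi <= j:
            self._apply(v, d)
            return
        self._push(v)
        mid = (lo + hi) // 2
        self.add(d, i, j, 2 * v, lo, mid)
        self.add(d, i, j, 2 * v + 1, mid + 1, hi)
        self.best[v], self.alt[v] = self._combine(
            [self.best[2 * v], self.alt[2 * v],
             self.best[2 * v + 1], self.alt[2 * v + 1]])

    def cat_top_two(self, i, j, v=1, lo=0, hi=None):     # CatTopTwo(i, j)
        if hi is None: hi = self.n - 1
        if j < lo or hi < i: return (None, None)
        if i <= lo and hi <= j: return (self.best[v], self.alt[v])
        self._push(v)
        mid = (lo + hi) // 2
        left = self.cat_top_two(i, j, 2 * v, lo, mid)
        right = self.cat_top_two(i, j, 2 * v + 1, mid + 1, hi)
        return self._combine([*left, *right])
```

Keeping only two candidates per node suffices because the cheapest entry of a color different from the range minimum is always among the children's stored pairs, so the summary composes; both operations then run in 𝒪(logn)\mathcal{O}(\log n) time.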

Lemma 7.

Let TT be a tree on nn nodes, with each edge eTe\in T having its color and score. In 𝒪(n)\mathcal{O}(n) time we can create a data structure using 𝒪(n)\mathcal{O}(n) space and supporting the following operations.

  1. 1.

    AddPath(Δ,p)\textsc{AddPath}(\Delta,p): add Δ\Delta to the score of every edge on a path pp in TT in 𝒪(log2n)\mathcal{O}(\log^{2}n) time.

  2. 2.

    CatTopTwo(e)\textsc{CatTopTwo}(e): categorical top-two query in TeT_{e} in 𝒪(logn)\mathcal{O}(\log n) time.

4 Spanning tree of near-minimum 2-respecting cuts in near-linear time

Let G=(V,E,w)G=(V,E,w) be a weighted undirected graph. We will assume throughout that GG is connected, and in particular that mn1m\geq n-1, as the KT partition of a disconnected graph can be easily determined from its connected components. Let TT be a spanning tree of GG. We will choose an rVr\in V with degree 1 in TT to be the root of TT. We view TT as a directed graph with all edges directed away from rr. With some abuse of notation, we will also use TT to refer to this directed version. If we remove any edge eE(T)e\in E(T) from TT then TT becomes disconnected into two components. We use eVe^{\downarrow}\subseteq V to denote the set of vertices in the component not containing the root, and TeE(T)T_{e}\subseteq E(T) to denote the set of edges in the subtree rooted at the head of ee, i.e. the edges in the subgraph of TT induced by ee^{\downarrow}. We further use the shorthand cost(e)=w(Δ(e))\mathrm{cost}(e)=w(\Delta(e^{\downarrow})) for the weight of the cut with shore ee^{\downarrow}.

Two edges e,fE(T)e,f\in E(T) define a unique cut in GG which we denote by cutT(e,f)\mathrm{cut}_{T}(e,f) (or cut(e,f)\mathrm{cut}(e,f) if it is clear from the context which TT we are referring to). The cut depends on the relationship between ee and ff. If eTfe\in T_{f} or fTef\in T_{e} then we say that ee and ff are descendant edges. Without loss of generality, say that fTef\in T_{e}. Then the cut defined by ee and ff is cut(e,f)=Δ(ef)\mathrm{cut}(e,f)=\Delta(e^{\downarrow}\setminus f^{\downarrow}). If ee and ff are not descendant edges, then we say they are independent. For independent edges we see that cut(e,f)=Δ(ef)\mathrm{cut}(e,f)=\Delta(e^{\downarrow}\cup f^{\downarrow}). In both cases we use cost(e,f)\mathrm{cost}(e,f) to denote the weight of the corresponding cut.
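For concreteness, both cases of cutT(e,f)\mathrm{cut}_{T}(e,f) can be checked by brute force on a small hypothetical example; the 5-vertex graph, its weights and the helper names below are ours, not taken from the paper.

```python
# Brute-force check of cut_T(e, f) on a hypothetical 5-vertex example.
# Tree edges are named by their lower endpoint (the head when edges point
# away from the root 0).
parent = {1: 0, 2: 1, 3: 1, 4: 3}        # spanning tree: 0->1, 1->2, 1->3, 3->4
W = {frozenset(p): w for p, w in
     [((0, 1), 1), ((1, 2), 2), ((1, 3), 3), ((3, 4), 1), ((2, 4), 5), ((0, 3), 2)]}

def below(v):                            # vertex set e-down for the edge entering v
    s = {v}
    for c, p in parent.items():
        if p == v:
            s |= below(c)
    return s

def cut_weight(S):                       # weight of the cut with shore S
    return sum(w for uv, w in W.items() if len(uv & S) == 1)

def cost(e, f):                          # weight of cut_T(e, f)
    se, sf = below(e), below(f)
    if sf < se:                          # descendant case: f in T_e
        shore = se - sf
    elif se < sf:                        # descendant case: e in T_f
        shore = sf - se
    else:                                # independent edges
        shore = se | sf
    return cut_weight(shore)
```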

In a KT partition we are only interested in non-trivial cuts. We first prove the following simple claim that characterizes when cut(e,f)\mathrm{cut}(e,f) is trivial.

Proposition 8.

Let G=(V,E,w)G=(V,E,w) be a connected graph with nn vertices and let TT be a spanning tree of GG with root rr. For e,fE(T)e,f\in E(T) if cut(e,f)\mathrm{cut}(e,f) is trivial then

  1. 1.

    If e,fe,f are independent then they must be the unique edges incident to the root.

  2. 2.

    If e,fe,f are descendant then there is a vertex vVv\in V such that ee is the edge incoming to vv and ff is the unique edge outgoing from vv, or vice versa.

Proof.

First suppose that e,fe,f are independent. Then a shore of cut(e,f)\mathrm{cut}(e,f) is S=efS=e^{\downarrow}\cup f^{\downarrow}. We have that |g|1|g^{\downarrow}|\geq 1 for any gE(T)g\in E(T), thus |S|2|S|\geq 2. Hence for cut(e,f)\mathrm{cut}(e,f) to be trivial we must have |S¯|=1|\bar{S}|=1. The root rr is not contained in gg^{\downarrow} for any gE(T)g\in E(T), thus it must be the case that S¯={r}\bar{S}=\{r\}. For this to happen, e,fe,f must be incident to rr, and rr cannot have any other outgoing edges besides ee and ff.

Now consider the case that e,fe,f are descendant and suppose without loss of generality that fTef\in T_{e}. Let S=efS=e^{\downarrow}\setminus f^{\downarrow}. In this case we have |S|<n1|S|<n-1 as ee^{\downarrow} does not contain the root and |f|1|f^{\downarrow}|\geq 1. Let us understand when |S|=1|S|=1. As all vertices on the path from the head of ee to and including the tail of ff are in SS it must be the case that the head of ee is the tail of ff. Call this vertex vv and note vSv\in S. If vv has any other child uu besides the head of ff then we would have uSu\in S as well, thus ff must be the unique outgoing edge from vv. ∎

By choosing a root rr for TT that has degree 1 we avoid the case of item 1 of Proposition 8. Thus we only have to worry about trivial cuts when e,fe,f are descendant.
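Proposition 8 can also be verified exhaustively on a small hypothetical example: with a degree-1 root, the only trivial 2-respecting cuts are those where ff is the unique edge leaving the head of ee. The tree and names below are ours.

```python
# Brute-force check of Proposition 8 on a hypothetical 5-vertex tree whose
# root has degree 1; edges are named by their lower endpoint.
from itertools import combinations

parent = {1: 0, 2: 1, 3: 1, 4: 3}        # tree rooted at 0; root has degree 1
n = 5

def below(v):
    s = {v}
    for c, p in parent.items():
        if p == v:
            s |= below(c)
    return s

def outdeg(v):
    return sum(1 for p in parent.values() if p == v)

trivial_pairs = []
for e, f in combinations(parent, 2):
    se, sf = below(e), below(f)
    if sf < se:                          # f in T_e
        shore = se - sf
    elif se < sf:                        # e in T_f: swap so that f in T_e
        shore = sf - se
        e, f = f, e
    else:                                # independent edges
        shore = se | sf
    if len(shore) in (1, n - 1):         # one side of the cut is a single vertex
        trivial_pairs.append((e, f))
        assert parent[f] == e and outdeg(e) == 1   # item 2 of Proposition 8
```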

With that out of the way, we now turn to the main theorem of this section. As outlined in Section 1.2 this theorem is the key routine in our (1+ε)(1+\varepsilon)-KT partition algorithm, which we fully describe in Section 5.

Theorem 9.

Let G=(V,E,w)G=(V,E,w) be a connected graph with nn vertices and mm edges and let TT be a spanning tree of GG. For a given parameter β\beta, define the graph HH, with V(H)=E(T)V(H)=E(T) and E(H)={{e,f}E(T)(2):cost(e,f)β and cut(e,f) non-trivial}E(H)=\{\{e,f\}\in E(T)^{(2)}:\mathrm{cost}(e,f)\leq\beta\text{ and }\mathrm{cut}(e,f)\text{ non-trivial}\}. There is a deterministic algorithm that given adjacency list access to GG and TT outputs a spanning forest of HH in 𝒪(mlog4n)\mathcal{O}(m\log^{4}n) time.

At a high level, we prove Theorem 9 by following Borůvka’s algorithm to find a spanning forest of HH. Throughout the algorithm we maintain a subgraph FF of HH that is a forest, initialized to be the empty graph on vertex set E(T)E(T). At the end of the algorithm, FF will be a spanning forest of HH. The algorithm proceeds in rounds. In each round, for every tree in the forest, we find an edge connecting it to another tree in the forest, if such an edge exists. If HH has kk connected components, then in each round the number of trees in FF minus kk goes down by at least a factor of 22, and so the algorithm terminates in 𝒪(logn)\mathcal{O}(\log n) rounds.

The main work is implementing a round of Borůvka’s algorithm. We will think of the nodes of FF as having colors, where nodes in the same tree of the forest have the same color, and nodes in distinct trees have distinct colors. The goal of a single round is to find, for each color cc, a pair of edges e,fTe,f\in T such that c=color(e)color(f)c=\mathrm{color}(e)\neq\mathrm{color}(f) and {e,f}E(H)\{e,f\}\in E(H), or detect that there is no such pair with these properties, in which case the nodes colored cc in FF already form a connected component of HH. As we need to refer to such pairs often we make the following definition.

Definition 10 (partner).

Let TT and HH be as in Theorem 9. Given an assignment of colors to the edges of TT we say that ff is a partner for ee if {e,f}E(H)\{e,f\}\in E(H) and color(e)color(f)\mathrm{color}(e)\neq\mathrm{color}(f).

We will actually do something stronger than what is required to implement a round of Borůvka’s algorithm, which we encapsulate in the following code header.

Algorithm 1 𝖱𝗈𝗎𝗇𝖽𝖤𝖽𝗀𝖾𝗌\mathsf{RoundEdges}

Input: Adjacency list access to GG, a spanning tree TT of GG, a parameter β\beta, and an assignment of colors to each eE(T)e\in E(T).
      Output: For every eE(T)e\in E(T) output a partner fE(T)f\in E(T), or report that no partner for ee exists.

The implementation of 𝖱𝗈𝗎𝗇𝖽𝖤𝖽𝗀𝖾𝗌\mathsf{RoundEdges} is our main technical contribution. Let us first see how to use 𝖱𝗈𝗎𝗇𝖽𝖤𝖽𝗀𝖾𝗌\mathsf{RoundEdges} to find a spanning forest of HH.

Lemma 11.

Let GG, TT and HH be as in Theorem 9. There is a deterministic algorithm that makes 𝒪(logn)\mathcal{O}(\log n) calls to 𝖱𝗈𝗎𝗇𝖽𝖤𝖽𝗀𝖾𝗌\mathsf{RoundEdges} and in 𝒪(nlogn)\mathcal{O}(n\log n) additional time outputs a spanning forest of HH.

Proof.

We construct a spanning forest of HH by maintaining a collection of trees FF that will be updated in rounds by Borůvka’s algorithm until it becomes a spanning forest. We initialize F=(E(T),)F=(E(T),\emptyset) and give all eE(T)e\in E(T) distinct colors. We maintain the invariants that FF is a forest and that nodes in the same tree have the same color and those in different trees have distinct colors.

Consider a generic round where FF contains qq trees. We call 𝖱𝗈𝗎𝗇𝖽𝖤𝖽𝗀𝖾𝗌\mathsf{RoundEdges} with the current color assignment. For every ee that has one, we obtain a partner ff such that {e,f}E(H)\{e,f\}\in E(H) and color(e)color(f)\mathrm{color}(e)\neq\mathrm{color}(f). For each color class cc we select one ee with color(e)=c\mathrm{color}(e)=c that has a returned partner (if any exists) and let XX be the set of selected edges. We then find a maximal subset of edges XXX^{\prime}\subseteq X that do not create a cycle among the color classes by computing a spanning forest of the graph whose supervertices are given by the color classes and edges given by XX. We add the edges in XX^{\prime} to E(F)E(F). Finally we merge the color classes of the connected components in FF by appropriately updating the color assignments, and we pass the updated forest and color assignments to the next round of the algorithm. Each of the steps in a single round can be executed in 𝒪(n)\mathcal{O}(n) time.

We have that |X|(qcc(H))/2|X^{\prime}|\geq(q-\mathrm{cc}(H))/2 where cc(H)\mathrm{cc}(H) is the number of connected components of HH. Each edge from XX^{\prime} added to FF decreases the number of trees in FF by one. Thus the number of trees in FF minus cc(H)\mathrm{cc}(H) decreases by at least a factor of 2 in each round and the algorithm terminates after 𝒪(logn)\mathcal{O}(\log n) rounds. The time spent outside of the calls to 𝖱𝗈𝗎𝗇𝖽𝖤𝖽𝗀𝖾𝗌\mathsf{RoundEdges} is 𝒪(n)\mathcal{O}(n) for each of the 𝒪(logn)\mathcal{O}(\log n) rounds. This is 𝒪(nlogn)\mathcal{O}(n\log n) overall. ∎
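The round structure of Lemma 11 can be sketched as follows. This is an illustrative stand-in in which a brute-force scan of an explicitly given edge list of HH plays the role of 𝖱𝗈𝗎𝗇𝖽𝖤𝖽𝗀𝖾𝗌\mathsf{RoundEdges}; all names are ours.

```python
# Illustrative sketch: Boruvka rounds on an explicit graph H, with a
# brute-force scan standing in for the RoundEdges oracle.
def spanning_forest_by_rounds(nodes, h_edges):
    color = {v: v for v in nodes}         # initially one color class per node

    def find(v):                          # union-find with path compression
        while color[v] != v:
            color[v] = color[color[v]]
            v = color[v]
        return v

    forest = []
    while True:
        # One round: for each color class, record some edge leaving it, if any.
        partner = {}
        for (e, f) in h_edges:
            ce, cf = find(e), find(f)
            if ce != cf:
                partner.setdefault(ce, (e, f))
                partner.setdefault(cf, (e, f))
        if not partner:                   # every class is a component of H
            break
        for (e, f) in partner.values():
            ce, cf = find(e), find(f)
            if ce != cf:                  # skip edges that would close a cycle
                forest.append((e, f))
                color[ce] = cf            # merge the two color classes
    return forest
```

Selecting one leaving edge per class and discarding those that would close a cycle mirrors the sets XX and XX^{\prime} in the proof; the number of classes minus the number of components of HH at least halves each round.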

If a node ee has a partner ff, then {e,f}\{e,f\} can either be a pair of descendant or independent edges. To implement 𝖱𝗈𝗎𝗇𝖽𝖤𝖽𝗀𝖾𝗌\mathsf{RoundEdges} we will separately handle these cases, as described in the next two subsections.

4.1 Descendant edges

We follow the approach from [GMW20] originally designed to find a single pair {e,f}\{e,f\} of descendant edges that minimizes cost(e,f)\mathrm{cost}(e,f) over all e,fE(T)e,f\in E(T) in 𝒪(mlogn)\mathcal{O}(m\log n) time. Their approach actually does something stronger (as does Karger’s original algorithm): for every eE(T)e\in E(T) it finds the best match in the subtree TeT_{e}, i.e., it returns an edge fargmin{cost(e,f)fTe}f^{*}\in\operatorname*{arg\,min}\{\mathrm{cost}(e,f)\mid f\in T_{e}\}. In order to implement the descendant edge part of 𝖱𝗈𝗎𝗇𝖽𝖤𝖽𝗀𝖾𝗌\mathsf{RoundEdges} we have three additional complications to handle:

  1. 1.

    The edge ff^{*} might have the same color as ee.

  2. 2.

    The resulting cut(e,f)\mathrm{cut}(e,f^{*}) might be a trivial cut.

  3. 3.

    Edge ee may have no partner in TeT_{e} but still have a partner ff such that eTfe\in T_{f}. This partnership may not be discovered when we are looking for partners of ff if there is another gTfg\in T_{f} with cost(f,g)cost(e,f)\mathrm{cost}(f,g)\leq\mathrm{cost}(e,f).

Item 1 can be easily solved by, in addition to finding ff^{*}, also finding gargmin{cost(e,f)fE(Te),color(f)color(f)}g^{*}\in\operatorname*{arg\,min}\{\mathrm{cost}(e,f)\mid f\in E(T_{e}),\,\mathrm{color}(f)\neq\mathrm{color}(f^{*})\}. Phrasing things in this way, rather than simply looking for the edge hh with color different from ee which minimizes cost(e,h)\mathrm{cost}(e,h), helps to limit the dependence of the query on ee and thus reduce the query time. If there is an fTef\in T_{e} with color(f)color(e)\mathrm{color}(f)\neq\mathrm{color}(e) and cost(e,f)β\mathrm{cost}(e,f)\leq\beta then at least one of f,gf^{*},g^{*} will satisfy this too.

For item 2, we use the result of Proposition 8 that descendant edges that give rise to trivial cuts have a very constrained structure. This allows us to avoid trivial cuts when looking for a partner of ee.

Item 3 is relatively subtle and does not arise in the minimum weight 2-respecting cut problem. To explain the issue we have to first say something about the high level structure of our implementation of 𝖱𝗈𝗎𝗇𝖽𝖤𝖽𝗀𝖾𝗌\mathsf{RoundEdges}. We will perform an Euler tour of TT and, when the tour visits edge ee for the first time, we will look for a partner ff for ee in TeT_{e}. The issue is the following, which we explain in the context of the very first round of Borůvka’s algorithm so we do not have to worry about nodes having different colors. Suppose that in the graph HH the only edge incident to node ee joins it to a node ff with eTfe\in T_{f}. Thus in the execution of 𝖱𝗈𝗎𝗇𝖽𝖤𝖽𝗀𝖾𝗌\mathsf{RoundEdges} we want to find ff as a partner of ee. When the Euler tour is at ee we will not find any suitable partner for ee, as there is none in TeT_{e}. We would like to identify ff as a partner for ee when the Euler tour visits ff for the first time. However, if there is a gTfg\in T_{f} with cost(f,g)<cost(f,e)\mathrm{cost}(f,g)<\mathrm{cost}(f,e) then the algorithm will return gg as a partner of ff rather than ee. To handle this we will actually make two passes over TT. In the first pass, when we visit edge ee for the first time we look for a partner ff in TeT_{e}. In the second pass, we handle the case where the partner of ee might be an ancestor of ee. To do this we need to de-activate nodes. When the Euler tour visits ff for the first time, we first find the lowest cost partner for ff in TfT_{f}. We then de-activate this node, and again find the best active partner for ff in TfT_{f}. Repeating this process, we will eventually find ee if {e,f}\{e,f\} is indeed an edge of HH and e,fe,f have different colors.

Now we turn to more specific implementation details. A key idea in [GMW20] is that we can do an Euler tour of TT while maintaining a data structure such that when we first visit an edge ee we can easily look up cost(e,f)\mathrm{cost}(e,f) for any fTef\in T_{e}. The way this is maintained can be best understood by noting that for fTef\in T_{e}:

cost(e,f)\displaystyle\mathrm{cost}(e,f) =w(Δ(ef))\displaystyle=w(\Delta(e^{\downarrow}\setminus f^{\downarrow}))
=w(ef,(e)c)+w(ef,f)\displaystyle=w(e^{\downarrow}\setminus f^{\downarrow},(e^{\downarrow})^{c})+w(e^{\downarrow}\setminus f^{\downarrow},f^{\downarrow})
=cost(e)+cost(f)2w(f,(e)c)scoree(f),\displaystyle=\mathrm{cost}(e)+\underbrace{\mathrm{cost}(f)-2w(f^{\downarrow},(e^{\downarrow})^{c})}_{\mathrm{score}_{e}(f)}\enspace, (1)

where for convenience we defined scoree(f)=cost(f)2w(f,(e)c)\mathrm{score}_{e}(f)=\mathrm{cost}(f)-2w(f^{\downarrow},(e^{\downarrow})^{c}), where the superscript cc denotes taking the complement.
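The identity (1) can be sanity-checked numerically on a small hypothetical example; the graph, weights and names below are ours, not from the paper.

```python
# Numeric check of cost(e,f) = cost(e) + cost(f) - 2 w(f-down, (e-down)^c)
# for a descendant pair f in T_e, on a hypothetical 5-vertex example.
parent = {1: 0, 2: 1, 3: 1, 4: 3}        # tree rooted at 0; edge = lower endpoint
W = {frozenset(p): w for p, w in
     [((0, 1), 1), ((1, 2), 2), ((1, 3), 3), ((3, 4), 1), ((2, 4), 5), ((0, 3), 2)]}
V = set(range(5))

def below(v):                            # vertex set e-down for the edge entering v
    s = {v}
    for c, p in parent.items():
        if p == v:
            s |= below(c)
    return s

def w_between(A, B):                     # total weight of edges between A and B
    return sum(w for uv, w in W.items()
               if len(uv & A) == 1 and len(uv & B) == 1)

def cut_w(S):
    return w_between(S, V - S)

e, f = 3, 4                              # f lies in T_e
se, sf = below(e), below(f)
lhs = cut_w(se - sf)                                     # cost(e, f)
rhs = cut_w(se) + cut_w(sf) - 2 * w_between(sf, V - se)  # cost(e) + score_e(f)
assert lhs == rhs
```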

We begin the algorithm by computing cost(e)\mathrm{cost}(e) for every eE(T)e\in E(T), which can be done in 𝒪(m)\mathcal{O}(m) time by Lemma 3. We then do an Euler tour of TT while maintaining a data structure from Lemma 7 such that, when we are considering eE(T)e\in E(T), for every fTef\in T_{e} the value of the data structure at location ff is scoree(f)\mathrm{score}_{e}(f), so that cost(e,f)\mathrm{cost}(e,f) can be read off by adding cost(e)\mathrm{cost}(e). For fTef\not\in T_{e} this will not in general be the case.

As can be seen from (1), the key to maintaining this data structure is how to update the values w(f,(e)c)w(f^{\downarrow},(e^{\downarrow})^{c}) when we descend edge ee. Consider the case where we are currently at edge e=(z,x)e^{\prime}=(z,x) and move to a descendant edge e=(x,y)e=(x,y). For two vertices u,vu,v let p(u,v)p(u,v) be the set of edges on the path from uu to vv in TT, and let lca(u,v)\mathrm{lca}(u,v) be their least common ancestor in TT. For fTef\in T_{e} we see that

w(f,(e)c)=w(f,(e)c)+{u,v}Efp(u,v),lca(u,v)=xw({u,v}).w(f^{\downarrow},(e^{\downarrow})^{c})=w(f^{\downarrow},(e^{\prime\downarrow})^{c})+\sum_{\{u,v\}\in E\atop f\in p(u,v),\mathrm{lca}(u,v)=x}w(\{u,v\})\enspace. (2)

By its definition in (1) we can compute scoree(f)\mathrm{score}_{e}(f) from scoree(f)\mathrm{score}_{e^{\prime}}(f) by subtracting 2w({u,v})2w(\{u,v\}) from it for every {u,v}E\{u,v\}\in E such that fp(u,v)f\in p(u,v) and lca(u,v)=x\mathrm{lca}(u,v)=x. The for-loop on line 2 of Algorithm 2 implements this step for all ff by looping over all {u,v}E\{u,v\}\in E with lca(u,v)=x\mathrm{lca}(u,v)=x. After this update we have that cost(e,f)=cost(e)+score(f)\mathrm{cost}(e,f)=\mathrm{cost}(e)+\mathrm{score}(f) for every fTef\in T_{e}. This shows how to descend down TT while keeping the invariant. The full tree is then explored by taking an Euler tour through TT, and whenever we go back up in the tree we revert the score updates (for-loop on line 9 of Algorithm 2). This allows us to find a candidate fTef\in T_{e} for every eE(T)e\in E(T). To bound the number of updates, note that each of the mm edges has a unique lca, and we only do an update corresponding to an edge when its lca is visited by the Euler tour. Since the Euler tour visits every vertex at most twice, the number of updates is at most 2m2m. In addition, the number of categorical top two queries is n1n-1. The data structure from Lemma 7 then yields 𝒪(mlog2n)\mathcal{O}(m\log^{2}n) time overall.

The algorithm is formalized in Algorithm 2, whose correctness we prove in the following lemma.

Lemma 12 (cf. [GMW20, Lemma 8]).

Assume that we first initialize e.scorecost(e)e.\mathrm{score}\leftarrow\mathrm{cost}(e) for every eE(T)e\in E(T), and then run Algorithm 2 (doing nothing in line 6). Then whenever an edge e=(x,y)e=(x,y) is followed on line 5 in the call to Traverse(x)(x) it holds that cost(e,f)=cost(e)+f.score\mathrm{cost}(e,f)=\mathrm{cost}(e)+f.\mathrm{score} for all fTef\in T_{e}.

Proof.

We will prove this by induction on the depth of xx. Consider the case where xx is the root rr. Before the call to Traverse(r)(r) we initialized all scores to e.scorecost(e)e.\mathrm{score}\leftarrow\mathrm{cost}(e). Then, on line 3 of Traverse(r)(r), for each {u,v}E(G)\{u,v\}\in E(G) with lca(u,v)=r\mathrm{lca}(u,v)=r we subtract 2w({u,v})2w(\{u,v\}) from the score of every edge on the uu to vv path in TT. Let us refer to scores at this point in time as “at time zero.” We first claim that at time zero for any outgoing edge e=(r,y)e=(r,y) from the root this makes cost(e,f)=cost(e)+f.score\mathrm{cost}(e,f)=\mathrm{cost}(e)+f.\mathrm{score} for all fTef\in T_{e}.

Let p(u,v)p(u,v) be the set of edges on the path from uu to vv in TT. By (1) we have cost(e,f)=cost(e)+cost(f)2w(f,(e)c)\mathrm{cost}(e,f)=\mathrm{cost}(e)+\mathrm{cost}(f)-2w(f^{\downarrow},(e^{\downarrow})^{c}) thus it suffices to show that for any fTef\in T_{e}

w(f,(e)c)={u,v}E(G)fp(u,v),lca(u,v)=rw({u,v}).w(f^{\downarrow},(e^{\downarrow})^{c})=\sum_{\{u,v\}\in E(G)\atop f\in p(u,v),\mathrm{lca}(u,v)=r}w(\{u,v\})\enspace.

This holds because by definition hE(f,(e)c)h\in E(f^{\downarrow},(e^{\downarrow})^{c}) iff one endpoint is in ff^{\downarrow} and the other endpoint is in (e)c(e^{\downarrow})^{c}, which in turn happens iff the least common ancestor of the endpoints is rr and ff lies on the path between the endpoints.

To finish the base case, we claim that at each iteration of the for loop all scores are the same as at time zero. This is because any update to the scores on line 3 performed inside a recursive call is exactly canceled out by the reverse update on line 10 when that recursive call exits.

For the inductive step, let us suppose that when an edge e=(x,y)e=(x,y) is followed on line 5 in the call to Traverse(x)(x) it holds that cost(e,f)=cost(e)+f.score\mathrm{cost}(e,f)=\mathrm{cost}(e)+f.\mathrm{score} for all fTef\in T_{e}. Let us now refer to scores at this point in time as “at time zero.” We then want to show that on line 5 in the call to Traverse(y)(y), for an outgoing edge e=(y,z)e^{\prime}=(y,z), it holds that cost(e,f)=cost(e)+f.score\mathrm{cost}(e^{\prime},f)=\mathrm{cost}(e^{\prime})+f.\mathrm{score} for all fTef\in T_{e^{\prime}}. The change in the scores from time zero to the execution of the for loop in the call to Traverse(y)(y) occurs in the update on line 3. Let us refer to scores at this point in time as “at time one.” We first show that at time one for any outgoing edge e=(y,z)e^{\prime}=(y,z) of yy it holds that cost(e,f)=cost(e)+f.score\mathrm{cost}(e^{\prime},f)=\mathrm{cost}(e^{\prime})+f.\mathrm{score} for all fTef\in T_{e^{\prime}}. The key to this is to consider the difference between cost(e,f)\mathrm{cost}(e,f) and cost(e,f)\mathrm{cost}(e^{\prime},f) for an fTef\in T_{e^{\prime}}. By (1) we have cost(e,f)=cost(e)+cost(f)2w(f,(e)c)\mathrm{cost}(e^{\prime},f)=\mathrm{cost}(e^{\prime})+\mathrm{cost}(f)-2w(f^{\downarrow},(e^{\prime\downarrow})^{c}), and by the inductive hypothesis at time zero f.score=cost(f)2w(f,(e)c)f.\mathrm{score}=\mathrm{cost}(f)-2w(f^{\downarrow},(e^{\downarrow})^{c}). Thus, to ensure cost(e,f)=cost(e)+f.score\mathrm{cost}(e^{\prime},f)=\mathrm{cost}(e^{\prime})+f.\mathrm{score}, we need to change f.scoref.\mathrm{score} by

2(w(f,(e)c)w(f,(e)c))=2{u,v}E(G)fp(u,v),lca(u,v)=yw({u,v}).2\left(w(f^{\downarrow},(e^{\downarrow})^{c})-w(f^{\downarrow},(e^{\prime\downarrow})^{c})\right)=-2\sum_{\{u,v\}\in E(G)\atop f\in p(u,v),\mathrm{lca}(u,v)=y}w(\{u,v\})\enspace. (3)

To see this, first note that E(f,(e)c)E(f,(e)c)E(f^{\downarrow},(e^{\downarrow})^{c})\subseteq E(f^{\downarrow},(e^{\prime\downarrow})^{c}). This confirms that we should subtract something to perform this update. An edge {u,v}\{u,v\} is in E(f,(e)c)E(f^{\downarrow},(e^{\prime\downarrow})^{c}) but not E(f,(e)c)E(f^{\downarrow},(e^{\downarrow})^{c}) iff one endpoint, say uu, is in ff^{\downarrow} and the other endpoint vv is in (e)c(e)c(e^{\prime\downarrow})^{c}\setminus(e^{\downarrow})^{c}. This means that vev\in e^{\downarrow} but not vev\in e^{\prime\downarrow}, and so y=lca(u,v)y=\mathrm{lca}(u,v). The condition ufu\in f^{\downarrow} is then equivalent to having ff on the path between uu and vv. This confirms that Eq. 3 performs the correct update.

To finish the proof, we claim that cost(e,f)=cost(e)+f.score\mathrm{cost}(e^{\prime},f)=\mathrm{cost}(e^{\prime})+f.\mathrm{score} not just at time one, but at the time when the for loop with ee^{\prime} is executed. This is again because the changes to the scores on line 3 that are made in a recursive call are reversed when the recursive call exits on line 10, thus every time the for loop is executed the scores are the same as the scores at time one. ∎

Algorithm 2 Euler tour maintaining cost(e,f)\mathrm{cost}(e,f) for fTef\in T_{e}
1:function Traverse(xx)
2:     for all {u,v}E(G)\{u,v\}\in E(G) such that lca(u,v)=x\mathrm{lca}(u,v)=x do
3:         AddPath(2w({u,v}),u-to-v-2w(\{u,v\}),u\text{-to-}v)
4:     end for
5:     for all yy such that e=(x,y)E(T)e=(x,y)\in E(T) do
6:         Process ee. \triangleright “Process” depends on context.
7:         Traverse(yy)
8:     end for
9:     for all {u,v}E(G)\{u,v\}\in E(G) such that lca(u,v)=x\mathrm{lca}(u,v)=x do
10:         AddPath(2w({u,v}),u-to-v2w(\{u,v\}),u\text{-to-}v)
11:     end for
12:end function
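To make Lemma 12 concrete, the following naive re-implementation of Traverse on a hypothetical 5-vertex example keeps the scores in a plain dict (standing in for the structure of Lemma 7) and asserts the invariant each time an edge is followed; the example and all names are ours.

```python
# Naive check of the Lemma 12 invariant: whenever edge e is followed,
# cost(e, f) = cost(e) + score[f] for every f in T_e.
parent = {1: 0, 2: 1, 3: 1, 4: 3}        # tree rooted at 0; edge = lower endpoint
W = {frozenset(p): w for p, w in
     [((0, 1), 1), ((1, 2), 2), ((1, 3), 3), ((3, 4), 1), ((2, 4), 5), ((0, 3), 2)]}

def depth(v):
    return 0 if v == 0 else 1 + depth(parent[v])

def lca(u, v):
    while depth(u) > depth(v): u = parent[u]
    while depth(v) > depth(u): v = parent[v]
    while u != v: u, v = parent[u], parent[v]
    return u

def path_edges(u, v):                    # tree edges on the u-to-v path
    a, out = lca(u, v), []
    for x in (u, v):
        while x != a:
            out.append(x)
            x = parent[x]
    return out

def below(v):
    s = {v}
    for c, p in parent.items():
        if p == v: s |= below(c)
    return s

def cut_w(S):
    return sum(w for uv, w in W.items() if len(uv & S) == 1)

score = {e: cut_w(below(e)) for e in parent}   # initialize score[e] = cost(e)
by_lca = {}
for uv, w in W.items():
    u, v = tuple(uv)
    by_lca.setdefault(lca(u, v), []).append((u, v, w))

def traverse(x):
    for u, v, w in by_lca.get(x, []):          # lines 2-4: subtract along paths
        for g in path_edges(u, v): score[g] -= 2 * w
    for e in [c for c, p in parent.items() if p == x]:
        # Lemma 12: cost(e, f) = cost(e) + score[f] for every f in T_e
        for f in below(e) - {e}:
            assert cut_w(below(e) - below(f)) == cut_w(below(e)) + score[f]
        traverse(e)
    for u, v, w in by_lca.get(x, []):          # lines 9-11: revert the updates
        for g in path_edges(u, v): score[g] += 2 * w

traverse(0)
```

After traverse(0) returns, every update has been reverted and each score again equals the corresponding cut cost, matching the observation used later in the proof of Theorem 13.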

Given Lemma 12 to maintain cost(e,f)\mathrm{cost}(e,f) for fTef\in T_{e} during an Euler tour of the tree, and with the data structure of Lemma 7 to handle categorical top two queries, it is now straightforward to design an algorithm to find for every edge ee a partner for ee that is a descendant or ancestor, if such a partner exists.

Algorithm 3 The descendant edge portion of 𝖱𝗈𝗎𝗇𝖽𝖤𝖽𝗀𝖾𝗌\mathsf{RoundEdges}.
1:XX\leftarrow\emptyset \triangleright XX will hold all edges found during the round
2:for eE(T)e\in E(T) do e.scorecost(e)e.\mathrm{score}\leftarrow\mathrm{cost}(e) \triangleright scores are maintained with Lemma 7
3:end for
4:Run Traverse(rr) where “Process ee” means running Below(ee).
5:Run Traverse(rr) where “Process ee” means running Above(ee).
1:function Below(ee) \triangleright Find a partner for ee in TeT_{e}.
2:     if the head of ee has outdegree 1 then
3:         Let hh be the outgoing edge of the head of ee. \triangleright In this case cut(e,h)\mathrm{cut}(e,h) is trivial
4:         h.score+=β+1h.\mathrm{score}\mathrel{+}=\beta+1
5:     end if
6:     (f,g)=(f,g)= CatTopTwo(ee)
7:     if color(f)color(e)&cost(e)+score(f)β\mathrm{color}(f)\neq\mathrm{color}(e)\And\mathrm{cost}(e)+\mathrm{score}(f)\leq\beta then
8:         XX{e,f}X\leftarrow X\cup\{e,f\}
9:     else if color(g)color(e)&cost(e)+score(g)β\mathrm{color}(g)\neq\mathrm{color}(e)\And\mathrm{cost}(e)+\mathrm{score}(g)\leq\beta then
10:         XX{e,g}X\leftarrow X\cup\{e,g\}
11:     end if
12:     if the head of ee has outdegree 1 then
13:         h.score-=β+1h.\mathrm{score}\mathrel{-}=\beta+1
14:     end if
15:end function
1:function Above(ee) \triangleright Find all ff such that fTef\in T_{e} and ee is a partner of ff.
2:     if the head of ee has outdegree 1 then
3:         Let hh be the outgoing edge of the head of ee. \triangleright In this case cut(e,h)\mathrm{cut}(e,h) is trivial
4:         h.score+=β+1h.\mathrm{score}\mathrel{+}=\beta+1
5:     end if
6:     noMore = false
7:     repeat
8:         (f,g)=(f,g)= CatTopTwo(ee)
9:         if color(f)color(e)&cost(e)+score(f)β\mathrm{color}(f)\neq\mathrm{color}(e)\And\mathrm{cost}(e)+\mathrm{score}(f)\leq\beta then
10:              XX{e,f}X\leftarrow X\cup\{e,f\}
11:              f.score+=β+1f.\mathrm{score}\mathrel{+}=\beta+1
12:         else if color(g)color(e)&cost(e)+score(g)β\mathrm{color}(g)\neq\mathrm{color}(e)\And\mathrm{cost}(e)+\mathrm{score}(g)\leq\beta then
13:              XX{e,g}X\leftarrow X\cup\{e,g\}
14:              g.score+=β+1g.\mathrm{score}\mathrel{+}=\beta+1
15:         else
16:              noMore = True
17:         end if
18:     until noMore
19:     if the head of ee has outdegree 1 then
20:         h.score-=β+1h.\mathrm{score}\mathrel{-}=\beta+1
21:     end if
22:end function
Theorem 13.

Given an assignment e.colore.\mathrm{color} for each eE(T)e\in E(T), there is a deterministic algorithm that runs in time 𝒪(mlog2n)\mathcal{O}(m\log^{2}n) and for each ee finds an ff such that

  1. 1.

    {e,f}H\{e,f\}\in H

  2. 2.

    eTfe\in T_{f} or fTef\in T_{e}

  3. 3.

    e.colorf.colore.\mathrm{color}\neq f.\mathrm{color}

if such an ff exists.

Proof.

The algorithm is given by Algorithm 3. Suppose that an edge ee has a partner ff satisfying the 3 conditions of the theorem. Then either fTef\in T_{e} or eTfe\in T_{f}. We claim that if fTef\in T_{e} then we will find a partner of ee in the call to Traverse(rr) using Below(ee) to process edge ee, and if eTfe\in T_{f} then we will find a partner of ee in the call to Traverse(rr) using Above(ee) to process edge ee.

Let us show these statements separately, starting with the case fTef\in T_{e}. Consider the time when ee is considered in the for loop on line 5 in a recursive call from Traverse(rr) using Below(ee) to process edge ee. In the call to Below(ee) we first check if the head of ee has a single outgoing edge hh. If this is the case then cut(e,h)\mathrm{cut}(e,h) is a trivial cut and thus we do not want to find hh as a partner for ee. We thus add β+1\beta+1 to the score of hh ensuring that it will never be a valid partner for ee. By Proposition 8 this is the only situation where cut(e,f)\mathrm{cut}(e,f) is trivial for fTef\in T_{e}. By Lemma 12 for all other gTeg\in T_{e} it holds that cost(e,g)=cost(e)+g.score\mathrm{cost}(e,g)=\mathrm{cost}(e)+g.\mathrm{score}. Thus the call to CatTopTwo(ee) will perform correctly, and one of the returned edges must be a valid partner for ee. We then reset the score of hh, if it was changed, to maintain the property given by Lemma 12.

Now consider the case where ee has a partner ff with eTfe\in T_{f}. Let ff be the first such partner that is encountered in an Euler tour of TT. We claim that the edge {e,f}\{e,f\} will be added to XX in the call to Traverse(rr) using Above(ee) to process edge ee. First note that after the previous call to Traverse(rr) terminates it holds that g.score=cost(g)g.\mathrm{score}=\mathrm{cost}(g) for every gg, as all changes to the scores in the recursive calls are reverted after the call returns. Thus we are again in position to apply Lemma 12, although we have to be slightly more careful this time as scores are modified within the body of the for loop on line 5 of Algorithm 2 when we run Above(ee) to process ee. We again handle the possibility of trivial cuts as in the “below” case. We also only modify the score of an edge after a partner for it has been found, at which point our job for that edge is done and we no longer need to use its score. As by assumption ff is the first potential partner for ee encountered in the Euler tour, the score of ee has not been modified at this point. Thus by Lemma 12 it holds that cost(e,f)=cost(f)+e.score\mathrm{cost}(e,f)=\mathrm{cost}(f)+e.\mathrm{score}. This means that ee will eventually be found in the repeat loop on line 7 of the function Above(ff).

Let us now analyze the running time. Computing cost(e)\mathrm{cost}(e) for each edge ee can be done in time 𝒪(m+n)\mathcal{O}(m+n) by Lemma 3. Before proceeding with the traversal, we gather, for each node xx, all edges {u,v}E(G)\{u,v\}\in E(G) such that lca(u,v)=x\mathrm{lca}(u,v)=x. This can be done in 𝒪(m)\mathcal{O}(m) time by constructing in 𝒪(n)\mathcal{O}(n) time a constant-time LCA structure [BF00], and iterating over the edges. Next consider a call to the Below function. Here we make a single CatTopTwo query which takes time 𝒪(logn)\mathcal{O}(\log n) by Lemma 7. Thus over the entire Euler tour these queries contribute 𝒪(nlogn)\mathcal{O}(n\log n) to the running time. In the Above function all CatTopTwo queries in the repeat loop except for the last one (when noMore becomes true) will result in de-activating an edge. Thus again the total query time over the Euler tour is 𝒪(nlogn)\mathcal{O}(n\log n).

Finally consider the cost of updating the scores in the Euler tour. As discussed earlier, over the course of the Euler tour this requires doing 2 calls to AddPath for every edge of GG. Each AddPath call can be done in time 𝒪(log2n)\mathcal{O}(\log^{2}n) by Lemma 7, thus the overall time for this is 𝒪(mlog2n)\mathcal{O}(m\log^{2}n), which dominates the complexity of the algorithm. ∎

4.2 Independent edges

The goal now is to find, for every edge eE(T)e\in E(T), a partner fE(T)f\in E(T) such that e,fe,f are independent, or decide that there is no such ff. As we chose the root of TT to have degree 1, by Proposition 8 we do not have to worry about trivial cuts in the independent edge case. Instead of considering all edges eE(T)e\in E(T) one-by-one, we first find a heavy path decomposition of TT and then iterate over all pairs of heavy paths h,hh,h^{\prime} to look for a partner fhf\in h^{\prime} for every ehe\in h. We cannot literally carry out this plan as the number of pairs of heavy paths can be Ω(n2)\Omega(n^{2}) and so we cannot explicitly consider every pair. We show next that many pairs h,hh,h^{\prime} result in a trivial case and that all these trivial pairs can be solved together in one batch. We then bound the number of non-trivial pairs and show that in near-linear time we can explicitly process all of them. The idea of processing pairs of heavy paths, and explicitly considering only the non-trivial ones, was introduced in the context of 2-respecting cuts by Mukhopadhyay and Nanongkai [MN20] (see also [GMW20]).

Consider two distinct heavy paths h,hh,h^{\prime}, where hh is the path u1u2uqu_{1}-u_{2}-\cdots-u_{q} and hh^{\prime} is the path v1v2vqv_{1}-v_{2}-\cdots-v_{q^{\prime}}. We let ei=(ui,ui+1)e_{i}=(u_{i},u_{i+1}) for i=1,,q1i=1,\ldots,q-1 and fi=(vi,vi+1)f_{i}=(v_{i},v_{i+1}) for i=1,,q1i=1,\ldots,q^{\prime}-1. It can be that not all pairs ei,fje_{i},f_{j} are independent, see Fig. 1. However, we can easily identify the subpaths of h,hh,h^{\prime} containing pairwise independent edges in constant time by computing the least common ancestor vv of the tails of h,hh,h^{\prime}. If v=vpv=v_{p^{\prime}} lies on hh^{\prime} then ei,fje_{i},f_{j} will be independent for 1i<q1\leq i<q and pj<qp^{\prime}\leq j<q^{\prime}, and similarly if vv lies on hh. In general we assume that p,pp,p^{\prime} have been determined so that ei,fje_{i},f_{j} are independent for all pi<qp\leq i<q and pj<qp^{\prime}\leq j<q^{\prime}, and that these pairs comprise all of the independent pairs on h,hh,h^{\prime}. We can associate to h,hh,h^{\prime} a (q1)(q-1)-by-(q1)(q^{\prime}-1) matrix M(h,h)M^{(h,h^{\prime})} where for pi<qp\leq i<q and pj<qp^{\prime}\leq j<q^{\prime}

M(h,h)[i,j]\displaystyle M^{(h,h^{\prime})}[i,j] =cost(ei,fj)\displaystyle=\mathrm{cost}(e_{i},f_{j})
=cost(ei)+cost(fj)2w(ei,fj),\displaystyle=\mathrm{cost}(e_{i})+\mathrm{cost}(f_{j})-2w(e_{i}^{\downarrow},f_{j}^{\downarrow})\enspace, (4)

and M(h,h)M^{(h,h^{\prime})} is undefined otherwise. (We could restrict M(h,h)M^{(h,h^{\prime})} to the submatrix on which it is defined, but find it notationally easier for the i,ji,j indices in M(h,h)M^{(h,h^{\prime})} to match the edge labels.) By Lemma 3, all values of cost(e)\mathrm{cost}(e) can be computed in 𝒪(m)\mathcal{O}(m) total time. To efficiently evaluate M(h,h)M^{(h,h^{\prime})}, we will prepare a list L(h,h)L(h,h^{\prime}) of all edges that contribute to w(e,f)w(e^{\downarrow},f^{\downarrow}) for independent e,fe,f with eh,fhe\in h,f\in h^{\prime}. For many h,hh,h^{\prime} the list L(h,h)L(h,h^{\prime}) will be empty, leading to the trivial case mentioned above. The following lemma bounds the size of all the non-empty lists and shows they can be constructed efficiently.
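To make the cost identity in (4) concrete, here is a small self-contained sanity check in Python on a toy weighted graph. The tree, the weights, and all function names are our own illustration; the cut weights are computed by brute force.

```python
# Sanity check of (4): cost(e, f) = cost(e) + cost(f) - 2 w(e_down, f_down)
# for independent tree edges e, f. Vertex 0 is the root and has tree degree 1.
tree = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5)]
weights = {frozenset(e): 1 for e in tree}   # graph = tree edges (weight 1) ...
weights[frozenset({3, 5})] = 2              # ... plus two non-tree edges
weights[frozenset({2, 4})] = 1
children = {0: [1], 1: [2, 4], 2: [3], 3: [], 4: [5], 5: []}

def subtree(edge):
    # Vertex set below tree edge (parent, child): e_down in the text's notation.
    stack, seen = [edge[1]], set()
    while stack:
        v = stack.pop()
        seen.add(v)
        stack.extend(children[v])
    return seen

def cut_weight(shore):
    # Total weight of graph edges with exactly one endpoint in `shore`.
    return sum(w for e, w in weights.items() if len(e & shore) == 1)

def cross_weight(a, b):
    # w(a, b): total weight of graph edges with one endpoint in a and one in b.
    return sum(w for e, w in weights.items()
               if len(e & a) == 1 and len(e & b) == 1)

e, f = (1, 2), (1, 4)           # independent: neither subtree contains the other
ed, fd = subtree(e), subtree(f)
lhs = cut_weight(ed | fd)        # cost(e, f): the shore of cut(e, f) is e_down ∪ f_down
rhs = cut_weight(ed) + cut_weight(fd) - 2 * cross_weight(ed, fd)
```

On this example both sides evaluate to the same value, as the identity predicts.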

Lemma 14.

The total length of all lists L(h,h)L(h,h^{\prime}) is 𝒪(mlog2n)\mathcal{O}(m\log^{2}n) and all non-empty lists L(h,h)L(h,h^{\prime}) can be constructed deterministically in time 𝒪(mlog2n)\mathcal{O}(m\log^{2}n).

Proof.

Observe that an edge {u,v}E\{u,v\}\in E can contribute to w(e,f)w(e^{\downarrow},f^{\downarrow}) for independent e,fe,f with eh,fhe\in h,f\in h^{\prime} only if uu is in the subtree rooted at the head of hh and vv is in the subtree rooted at the head of hh^{\prime}. There are at most logn\log n heavy paths intersecting the path from uu to the root and from vv to the root, and we can iterate over all such heavy paths in time proportional to their number (for example, by storing, for each edge of TT, a pointer to the head of the heavy path that contains it). Thus, for each edge {u,v}\{u,v\} we can iterate over all relevant pairs h,hh,h^{\prime} in 𝒪(log2n)\mathcal{O}(\log^{2}n) time, adding a triple (h,h,{u,v})(h,h^{\prime},\{u,v\}) to an auxiliary list in which the heavy paths are identified by their heads. The total size of the auxiliary list is now 𝒪(mlog2n)\mathcal{O}(m\log^{2}n) and it can be lexicographically sorted in the same time with radix sort. After sorting, each non-empty list L(h,h)L(h,h^{\prime}) constitutes a contiguous fragment of the auxiliary list. ∎
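The triple-generation step in this proof can be sketched as follows. This Python fragment hard-codes a heavy path decomposition of a small tree (the `head` and `parent` maps, and all other names, are our own illustration) and, for each graph edge, enumerates the heavy paths above both endpoints; a plain sort stands in for the radix sort of the lemma.

```python
# Heavy paths are identified by their head vertex; head[v] is the head of the
# heavy path containing v, and parent maps each vertex to its tree parent.
parent = {0: None, 1: 0, 2: 1, 3: 2, 4: 1, 5: 4}
head = {0: 0, 1: 0, 2: 0, 3: 0, 4: 4, 5: 4}   # heavy path 0-1-2-3, side path 4-5

def paths_above(v):
    # All heavy paths intersecting the path from v to the root;
    # O(log n) of them in a genuine heavy path decomposition.
    out = []
    while v is not None:
        h = head[v]
        out.append(h)
        v = parent[h]
    return out

def triples(graph_edges):
    out = []
    for u, v in graph_edges:
        for h in paths_above(u):
            for hp in paths_above(v):
                if h != hp:          # only distinct path pairs are relevant
                    out.append((h, hp, (u, v)))
    out.sort()                        # stand-in for the radix sort
    return out

L = triples([(3, 5), (2, 4)])
```

After sorting, the triples for a fixed pair of heavy paths are contiguous, which is exactly how the non-empty lists L(h,h') are read off.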

We can now describe how to find a partner ff for every ee such that e,fe,f are independent. The algorithm first solves together in one batch the case where the partner of ehe\in h is in a heavy path hh^{\prime} where L(h,h)L(h,h^{\prime}) is empty. After that we explicitly consider all h,hh,h^{\prime} with L(h,h)L(h,h^{\prime}) non-empty. We consider these two cases in the next two subsections.

4.2.1 Empty lists

Lemma 15.

There is a deterministic algorithm that in time 𝒪(m+n)\mathcal{O}(m+n) finds a partner for every edge eE(T)e\in E(T) that has a partner ff such that e,fe,f are independent and eh,fhe\in h,f\in h^{\prime} with L(h,h)L(h,h^{\prime}) empty.

Proof.

The key observation is that if L(h,h)L(h,h^{\prime}) is empty then cost(e,f)=cost(e)+cost(f)\mathrm{cost}(e,f)=\mathrm{cost}(e)+\mathrm{cost}(f) by Eq. (4). As can be seen from the cost formulas for descendant and for independent edge pairs, for any edge ff^{\prime} it always holds that cost(e,f)cost(e)+cost(f)\mathrm{cost}(e,f^{\prime})\leq\mathrm{cost}(e)+\mathrm{cost}(f^{\prime}), whether e,fe,f^{\prime} are in a descendant relationship or independent. Thus in this case it suffices to find any ff^{\prime} of color different from ee such that cost(e)+cost(f)β\mathrm{cost}(e)+\mathrm{cost}(f^{\prime})\leq\beta and cut(e,f)\mathrm{cut}(e,f^{\prime}) is non-trivial, as this ensures cost(e,f)β\mathrm{cost}(e,f^{\prime})\leq\beta. We are guaranteed that such an ff^{\prime} exists as ff itself satisfies these conditions.

By Lemma 3 we can compute cost(f)\mathrm{cost}(f^{\prime}) for every fE(T)f^{\prime}\in E(T) in time 𝒪(m)\mathcal{O}(m). Then in time 𝒪(n)\mathcal{O}(n) with one pass over E(T)E(T) we compute the edge f1f_{1} of lowest cost and the edge f2f_{2} of lowest cost that is of color different to f1f_{1}. We then repeat this categorical top two query twice more, each time excluding all previously found edges. At the end we obtain edges f1,,f6f_{1},\ldots,f_{6}. We claim that for every ee, at least one of these must be a valid partner.

Consider any particular ee. The first categorical top two query can only fail to find a valid partner for ee if one of f1,f2f_{1},f_{2} creates a trivial cut with ee. In this case, the second categorical top two query can only fail if one of f3,f4f_{3},f_{4} creates a trivial cut with ee as well. By Proposition 8, however, there are at most two possible edges that can create a trivial cut with ee, thus in this case the third categorical top two query must succeed and we find a valid partner for ee. ∎
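The three rounds of categorical top-two queries can be sketched as follows. Here `cat_top_two` is our own naive stand-in for the data-structure query (a real implementation would answer it in logarithmic time), and the edge names, colors, and costs are purely illustrative.

```python
def cat_top_two(edges, excluded):
    # Naive categorical top-two: the cheapest edge, plus the cheapest edge of
    # a different color, among edges not yet excluded.
    pool = sorted((e for e in edges if e["name"] not in excluded),
                  key=lambda e: e["cost"])
    if not pool:
        return []
    first = pool[0]
    second = next((e for e in pool if e["color"] != first["color"]), None)
    return [first] if second is None else [first, second]

def six_candidates(edges):
    # Three rounds, each excluding everything found so far. Since at most two
    # edges can form a trivial cut with a fixed e, some round must succeed.
    found, excluded = [], set()
    for _ in range(3):
        batch = cat_top_two(edges, excluded)
        found += batch
        excluded |= {e["name"] for e in batch}
    return found

edges = [{"name": i, "color": c, "cost": w}
         for i, (c, w) in enumerate([("r", 3), ("b", 1), ("r", 2),
                                     ("b", 5), ("r", 4), ("b", 6), ("r", 7)])]
cands = six_candidates(edges)
```

On this toy input the three rounds return the edges named 1, 2, 0, 3, 4, 5, in order of increasing cost within each round.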

4.2.2 Non-empty lists

The more difficult case is to find partners among pairs h,hh,h^{\prime} with L(h,h)L(h,h^{\prime}) non-empty. To solve this case we will use the special structure of M(h,h)M^{(h,h^{\prime})}. As above, say that hh is the path u1u2uqu_{1}-u_{2}-\cdots-u_{q} and hh^{\prime} is the path v1v2vqv_{1}-v_{2}-\cdots-v_{q^{\prime}}, and let ei=(ui,ui+1)e_{i}=(u_{i},u_{i+1}) for i=1,,q1i=1,\ldots,q-1 and fi=(vi,vi+1)f_{i}=(v_{i},v_{i+1}) for i=1,,q1i=1,\ldots,q^{\prime}-1. Further suppose ei,fje_{i},f_{j} are independent for all pi<q,pj<qp\leq i<q,p^{\prime}\leq j<q^{\prime}. We have that M(h,h)[i,j]=cost(ei)+cost(fj)2w(ei,fj)M^{(h,h^{\prime})}[i,j]=\mathrm{cost}(e_{i})+\mathrm{cost}(f_{j})-2w(e_{i}^{\downarrow},f_{j}^{\downarrow}) for pi<q,pj<qp\leq i<q,p^{\prime}\leq j<q^{\prime}. Recall that L(h,h)L(h,h^{\prime}) is defined precisely as the list of edges that contribute to w(e,f)w(e^{\downarrow},f^{\prime\downarrow}) for independent eh,fhe\in h,f^{\prime}\in h^{\prime}. The contribution of a specific edge {u,v}L(h,h)\{u,v\}\in L(h,h^{\prime}) can be understood as follows: let uiu_{i} be the lowest common ancestor of uu and uqu_{q}, and vjv_{j} be the lowest common ancestor of vv and vqv_{q^{\prime}}. Then the weight of {u,v}\{u,v\} contributes to M(h,h)[a,b]M^{(h,h^{\prime})}[a,b] for every paip\leq a\leq i, pbjp^{\prime}\leq b\leq j. This is depicted in Fig. 1. We will compute these indices ii and jj for every {u,v}L(h,h)\{u,v\}\in L(h,h^{\prime}). This takes constant time per edge using an appropriate LCA structure [BF00], and so total time 𝒪(|L(h,h)|)\mathcal{O}(|L(h,h^{\prime})|). Let (h,h)={(i,j){u,v}L(h,h),ui=lca(u,uq),vj=lca(v,vq)}\mathcal{L}(h,h^{\prime})=\{(i,j)\mid\{u,v\}\in L(h,h^{\prime}),u_{i}=\mathrm{lca}(u,u_{q}),v_{j}=\mathrm{lca}(v,v_{q^{\prime}})\} denote the resulting list of index pairs, each of which has an associated weight.

Refer to caption
Figure 1: Contribution of an edge {u,v}L(h,h)\{u,v\}\in L(h,h^{\prime}) (denoted in green on the left) to M(h,h)[,]M^{(h,h^{\prime})}[\cdot,\cdot] (denoted in grey on the right).
Lemma 16.

Let ={eh,h,f:eh,fh,e,f are partners and L(h,h) non-empty}\mathcal{F}=\{e\mid\exists h,h^{\prime},f:e\in h,f\in h^{\prime},e,f\text{ are partners and }L(h,h^{\prime})\text{ non-empty}\}. There is a deterministic algorithm to find a partner for every ee\in\mathcal{F} in time 𝒪(mlog3n)\mathcal{O}(m\log^{3}n).

Proof.

The algorithm is given in Algorithm 4. We describe the algorithm here and analyze its correctness and running time.

For every heavy path hh let AhA_{h} be an array with Ah[e].score=cost(e)A_{h}[e].\mathrm{score}=\mathrm{cost}(e) and Ah[e].color=color(e)A_{h}[e].\mathrm{color}=\mathrm{color}(e) for every ehe\in h. Via Lemma 6 there is a data structure that supports path updates and CatTopTwo queries to AhA_{h} in 𝒪(logn)\mathcal{O}(\log n) time. The total time for this initialization step is 𝒪(n)\mathcal{O}(n).

Let LL be an ordered list of pairs that contains (h,h)(h,h^{\prime}) and (h,h)(h^{\prime},h) for every h,hh,h^{\prime} with L(h,h)L(h,h^{\prime}) non-empty. We sort LL by the name of the first path with radix sort in 𝒪(mlog2n)\mathcal{O}(m\log^{2}n) time. We will follow LL to iterate over all h,hh,h^{\prime} with L(h,h)L(h,h^{\prime}) non-empty.

Let us describe what the algorithm does when considering pairs h,hh,h^{\prime} where hh consists of the edges e1,,eq1e_{1},\ldots,e_{q-1} and hh^{\prime} consists of the edges f1,,fq1f_{1},\ldots,f_{q^{\prime}-1}, where edges ei,fje_{i},f_{j} are independent for pi<q,pj<qp\leq i<q,p^{\prime}\leq j<q^{\prime}, and these comprise all the independent pairs in h,hh,h^{\prime}. We iterate over the columns of M(h,h)M^{(h,h^{\prime})}, starting from q1q^{\prime}-1 and going until pp^{\prime}, and maintain the invariant that, when considering column jj, it holds that Ah[i].score=M(h,h)[i,j]cost(fj)A_{h}[i].\mathrm{score}=M^{(h,h^{\prime})}[i,j]-\mathrm{cost}(f_{j}) for every active edge with index pi<qp\leq i<q (the notion of active is defined below). We postpone describing how to maintain this invariant for the moment. Then we do a CatTopTwo query on AhA_{h} which returns potential candidates ea,ebe_{a},e_{b}. If there is an edge ehe\in h for which fjf_{j} is a valid partner then fjf_{j} must be a partner for either eae_{a} or ebe_{b}. This can be checked in constant time. If fjf_{j} is not a partner for either then we move on to column j1j-1; if it is a partner for, say, eae_{a}, then we add β+1\beta+1 to Ah[a].scoreA_{h}[a].\mathrm{score} to “de-activate” eae_{a} and repeat the process by doing a CatTopTwo query again on AhA_{h} until no valid partner is returned.

The basic algorithm we have described considers every column of M(h,h)M^{(h,h^{\prime})} from q1q^{\prime}-1 to pp^{\prime}. We now show how to accelerate this algorithm by restricting our attention to a subset of the columns of M(h,h)M^{(h,h^{\prime})} in this interval. Let Kh,h=|L(h,h)|K_{h,h^{\prime}}=|L(h,h^{\prime})|. We sort the pairs in (h,h)\mathcal{L}(h,h^{\prime}) by the second coordinate in time 𝒪(Kh,hlogKh,h)=𝒪(Kh,hlogn)\mathcal{O}(K_{h,h^{\prime}}\log K_{h,h^{\prime}})=\mathcal{O}(K_{h,h^{\prime}}\log n). Let J1<<JtJ_{1}<\cdots<J_{t} be the distinct values of the second coordinate that appear in this sorted list, where tKh,ht\leq K_{h,h^{\prime}}. Set J0=p1J_{0}=p^{\prime}-1 and Jt+1=q1J_{t+1}=q^{\prime}-1. For Jk<jJk+1J_{k}<j\leq J_{k+1} we have that w(ei,fj)w(e_{i}^{\downarrow},f_{j}^{\downarrow}) is constant over jj for every pi<qp\leq i<q by the definition of (h,h)\mathcal{L}(h,h^{\prime}). We call such an interval a void interval. Thus the minimum of M(h,h)[i,j]M^{(h,h^{\prime})}[i,j] over Jk<jJk+1J_{k}<j\leq J_{k+1} necessarily occurs at j=argminJk<jJk+1cost(fj)j^{*}=\operatorname*{arg\,min}_{J_{k}<j\leq J_{k+1}}\mathrm{cost}(f_{j}). This means that if edge eihe_{i}\in h has a partner fjf_{j} with Jk<jJk+1J_{k}<j\leq J_{k+1} then one of the two edges returned by a CatTopTwo(Jk+1,Jk+1)(J_{k}+1,J_{k+1}) to AhA_{h^{\prime}} must be a partner for eie_{i}.
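The partition of the column range into void intervals can be sketched as follows; this is our own illustrative Python, taking only the second coordinates of the pairs in L(h,h') together with the boundary indices p' and q'.

```python
def void_intervals(second_coords, p_prime, q_prime):
    # second_coords: the second coordinates j of the pairs in L(h, h').
    # Returns the intervals (J_k, J_{k+1}] covering [p', q'-1]; within each,
    # w(e_i_down, f_j_down) is constant in j, so only one column per interval
    # needs to be inspected.
    Js = sorted(set(second_coords))
    bounds = [p_prime - 1] + Js           # J_0 = p' - 1
    if not Js or Js[-1] != q_prime - 1:
        bounds.append(q_prime - 1)        # J_{t+1} = q' - 1
    return [(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]

ivals = void_intervals([3, 5, 3], p_prime=2, q_prime=8)   # duplicates collapse
```

The number of intervals is at most |L(h,h')| + 1, which is what makes the accelerated sweep fast.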

We can thus amend the algorithm to the following. For J=Jt+1,Jt,,J1J=J_{t+1},J_{t},\ldots,J_{1} we iterate over the endpoints of the void intervals. When J=JkJ=J_{k} we maintain the invariant that Ah[i].score=M(h,h)[i,Jk]cost(fJk)A_{h}[i].\mathrm{score}=M^{(h,h^{\prime})}[i,J_{k}]-\mathrm{cost}(f_{J_{k}}). Thus for all Jk1<jJkJ_{k-1}<j\leq J_{k} it holds that M(h,h)[i,j]=Ah[i].score+cost(fj)M^{(h,h^{\prime})}[i,j]=A_{h}[i].\mathrm{score}+\mathrm{cost}(f_{j}). We then do a CatTopTwo query on Ah[p:q1]A_{h}[p:q-1] and a CatTopTwo query on AhA_{h^{\prime}} with the interval (Jk1,Jk](J_{k-1},J_{k}]. If any ehe\in h has a partner fjf_{j} with Jk1<jJkJ_{k-1}<j\leq J_{k} then at least one of the four pairs formed from the returned edges must be a pair of partners. We de-activate any ehe\in h which finds a partner by adding β+1\beta+1 to its score and repeat the process until no valid partners are found, at which point we move on to the next void interval. If PP partners are found then the total time spent in this void interval will be 𝒪((P+1)logn)\mathcal{O}((P+1)\log n) for the CatTopTwo queries and updates to de-activate edges.

It remains to describe how to maintain the invariant Ah[i].score=M(h,h)[i,Jk]cost(fJk)A_{h}[i].\mathrm{score}=M^{(h,h^{\prime})}[i,J_{k}]-\mathrm{cost}(f_{J_{k}}) for all pi<qp\leq i<q when J=JkJ=J_{k}. To do this, for every pair (i,j)(i,j) in (h,h)\mathcal{L}(h,h^{\prime}) with j=Jkj=J_{k} and associated weight ww we subtract 2w2w from AhA_{h} in the interval [p,i][p,i]. Each such interval update can be done in time 𝒪(logn)\mathcal{O}(\log n) by Lemma 6, so the total time for all updates is 𝒪(Kh,hlogn)\mathcal{O}(K_{h,h^{\prime}}\log n).
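As a sanity check of this column-sweep invariant, the following toy Python fragment builds a small matrix M directly from a hypothetical list of index pairs and verifies, while sweeping columns right to left and applying the range updates, that A_h[i] equals M[i,j] - cost(f_j) at every column. All names and numbers here are our own; a real implementation would perform the range updates with the data structure of Lemma 6 rather than a loop.

```python
# Rows are p..q-1 and columns p'..q'-1 in the text's notation.
p, q, pp, qq = 1, 4, 1, 4
coste = {1: 5, 2: 7, 3: 6}          # cost(e_i)
costf = {1: 4, 2: 8, 3: 3}          # cost(f_j)
pairs = {(2, 2): 1, (3, 1): 2}      # index pairs of L(h, h') with weights

def M(i, j):
    # Pair (i0, j0) contributes its weight to every cell (a, b)
    # with a <= i0 and b <= j0, matching Eq. (4).
    w = sum(wt for (i0, j0), wt in pairs.items() if i0 >= i and j0 >= j)
    return coste[i] + costf[j] - 2 * w

A = dict(coste)                      # A_h[i].score, initially cost(e_i)
ok = True
for j in range(qq - 1, pp - 1, -1):  # sweep columns from right to left
    for (i0, j0), wt in pairs.items():
        if j0 == j:                  # range update over rows [p, i0]
            for a in range(p, i0 + 1):
                A[a] -= 2 * wt
    # invariant: A[i] == M[i, j] - cost(f_j) for every row i
    ok = ok and all(A[i] == M(i, j) - costf[j] for i in range(p, q))
```

Since updates only accumulate as j decreases, reversing them at the end (as the proof does) restores A to the initial cost values.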

Once we finish processing hh^{\prime}, we reverse all of the interval updates (but not the edge de-activations) so that we again have Ah[i].score=cost(ei)A_{h}[i].\mathrm{score}=\mathrm{cost}(e_{i}) for all active edges eie_{i}. This again can be done in time 𝒪(Kh,hlogn)\mathcal{O}(K_{h,h^{\prime}}\log n). Once we finish processing all pairs hh^{\prime} associated with hh we subtract β+1\beta+1 from Ah[e].scoreA_{h}[e].\mathrm{score} for all edges ee that were de-activated to make them active again.

The total number of edge de-activations is at most nn thus this contributes 𝒪(nlogn)\mathcal{O}(n\log n) to the running time and is low order. Over all h,hh,h^{\prime} the total time spent is 𝒪(mlog3n)\mathcal{O}(m\log^{3}n). ∎

Algorithm 4 Find partners among non-empty lists
1:for all heavy paths hh do
2:     Initialize data structure AhA_{h} with scores cost(e)\mathrm{cost}(e) and colors e.colore.\mathrm{color} for all ehe\in h.
3:end for
4:Compile a list LL of all ordered pairs (h,h)(h,h^{\prime}) and (h,h)(h^{\prime},h) with L(h,h)L(h,h^{\prime}) non-empty. Sort by the name of the first path.
5:for all hh that appears as a first path in LL do
6:     for all hh^{\prime} paired with hh in LL do
7:         Compute the index list (h,h)\mathcal{L}(h,h^{\prime}) with associated weights.
8:         Partition the interval [p,q1][p^{\prime},q^{\prime}-1] into void intervals (Jk,Jk+1](J_{k},J_{k+1}] for k=0,,tk=0,\ldots,t.
9:         Set Found=\mathrm{Found}=\emptyset.
10:         for all k=tk=t to 0 do
11:              For all (i,Jk)(h,h)(i^{\prime},J_{k})\in\mathcal{L}(h,h^{\prime}), subtract twice its weight from Ah[p:i].scoreA_{h}[p:i^{\prime}].\mathrm{score}.
12:              Call CatTopTwo(Jk+1,Jk+1)(J_{k}+1,J_{k+1}) on AhA_{h^{\prime}} to obtain edges fc,fdf_{c},f_{d}.
13:              while true  do
14:                  Call CatTopTwo(p,q1)(p,q-1) on AhA_{h} to obtain edges ea,ebe_{a},e_{b}.
15:                  if fcf_{c} or fdf_{d} is a partner for eae_{a} then
16:                       Save this pair, add eae_{a} to Found\mathrm{Found}, and do Ah[ea].score+=β+1A_{h}[e_{a}].\mathrm{score}\mathrel{+}=\beta+1.
17:                  else if fcf_{c} or fdf_{d} is a partner for ebe_{b} then
18:                       Save this pair, add ebe_{b} to Found\mathrm{Found} and do Ah[eb].score+=β+1A_{h}[e_{b}].\mathrm{score}\mathrel{+}=\beta+1.
19:                  else
20:                       break
21:                  end if
22:              end while
23:         end for
24:         for all k=tk=t to 0 do
25:              For all (i,Jk)(h,h)(i^{\prime},J_{k})\in\mathcal{L}(h,h^{\prime}), add twice its weight to Ah[p:i].scoreA_{h}[p:i^{\prime}].\mathrm{score}.
26:         end for
27:     end for
28:     Subtract β+1\beta+1 from the score of all edges in Found\mathrm{Found}.
29:end for

4.3 Spanning tree algorithm

We now have all components of the spanning tree algorithm, which we can combine together to implement 𝖱𝗈𝗎𝗇𝖽𝖤𝖽𝗀𝖾𝗌\mathsf{RoundEdges}.

Lemma 17.

There is a deterministic algorithm to implement 𝖱𝗈𝗎𝗇𝖽𝖤𝖽𝗀𝖾𝗌\mathsf{RoundEdges} which runs in time 𝒪(mlog3n)\mathcal{O}(m\log^{3}n).

Proof.

Given an assignment of colors to the edges of TT, our task is to find a partner for every eE(T)e\in E(T) which has one. If ee has a partner ff such that e,fe,f are in a descendant relationship then a partner for ee can be found in time 𝒪(mlog2n)\mathcal{O}(m\log^{2}n) by Theorem 13. The other case is that ee has a partner ff such that e,fe,f are independent. This divides into two subcases. If the heavy paths h,hh,h^{\prime} containing e,fe,f respectively are such that L(h,h)L(h,h^{\prime}) is empty then we will find a partner for ee via Lemma 15 in time 𝒪(m)\mathcal{O}(m). The bottleneck of the algorithm is where L(h,h)L(h,h^{\prime}) is non-empty, in which case we use Lemma 16 to find a partner in time 𝒪(mlog3n)\mathcal{O}(m\log^{3}n). ∎

We can now prove the main theorem of this section, Theorem 9, that we can find a spanning tree of HH in time 𝒪(mlog4n)\mathcal{O}(m\log^{4}n).

Proof of Theorem 9.

Follows from Lemma 11 and Lemma 17. ∎

5 KT partition algorithm

For completeness we state here the full KT partition algorithm, including the reductions from [AL21]. At a high level, we follow Karger’s algorithm to find 𝒪(logn)\mathcal{O}(\log n) spanning trees so that with high probability every (1+ε)(1+\varepsilon)-minimum cut 2-respects at least one of them. We then use our algorithm from Theorem 9 to, for each tree TT, find a generating set for the meet of all (1+ε)(1+\varepsilon)-minimum cuts that 2-respect TT. We are then left with two problems. The first is that we still have to find the meet of the partitions in the generating set. A near-linear time randomized algorithm was given to do this in [AL21]. Here we give a deterministic algorithm to do this. Then we need to take the meet of 𝒪(logn)\mathcal{O}(\log n) partitions, one for each tree. This is simple to do and we handle this first.

Lemma 18.

Let {𝒮1,,𝒮K}\{\mathcal{S}_{1},\ldots,\mathcal{S}_{K}\} be a set of KK partitions of [n][n]. There is a deterministic algorithm to compute i=1K𝒮i\bigwedge_{i=1}^{K}\mathcal{S}_{i} in time 𝒪(Knlogn)\mathcal{O}(Kn\log n).

Proof.

In 𝒪(Knlogn)\mathcal{O}(Kn\log n) time we can assign each j[n]j\in[n] a 𝒪(Klogn)\mathcal{O}(K\log n) bit key indicating which set contains jj in each 𝒮i\mathcal{S}_{i}. Collecting together elements with the same key gives i=1K𝒮i\bigwedge_{i=1}^{K}\mathcal{S}_{i}. ∎
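A sketch of this grouping step, assuming each partition is given as a map from elements to block labels (our own representation; the keys play the role of the bit strings in the proof):

```python
from collections import defaultdict

def meet(partitions, n):
    # Each partition of [n] is a dict mapping element -> block label.
    # Elements sharing the whole K-tuple of labels form one block of the meet.
    groups = defaultdict(list)
    for j in range(n):
        groups[tuple(p[j] for p in partitions)].append(j)
    return sorted(groups.values())

P1 = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}   # blocks {0,1,2} | {3,4,5}
P2 = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}   # blocks {0,1} | {2,3} | {4,5}
M = meet([P1, P2], 6)                        # → [[0, 1], [2], [3], [4, 5]]
```

Here the keys are built in O(K) time per element; in the lemma the same grouping is done on O(K log n)-bit keys via sorting.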

Next we see how to explicitly construct the meet of all (1+ε)(1+\varepsilon)-minimum cuts that 2-respect a tree from a generating set. We follow the idea of the proof in [AL21] but make it deterministic by replacing random hashing with an appropriate data structure.

Lemma 19 (Mehlhorn, Sundar, and Uhrig [MSU97]).

A dynamic family of persistent sequences, each of length at most nn, can be maintained under the following updates. A length-1 sequence is created in constant time, and a new sequence can be obtained by joining and splitting existing sequences in 𝒪(logn(logUlogU+logn))\mathcal{O}(\log n(\log U\log^{*}U+\log n)) time, where UU is the number of updates executed so far. Each sequence ss has an associated signature sig(s)[U3]\mathrm{sig}(s)\in[U^{3}] with the property that s=ss=s^{\prime} iff sig(s)=sig(s)\mathrm{sig}(s)=\mathrm{sig}(s^{\prime}).

For the proof of the lemma it will be useful to use the following definition.

Definition 20 (separate).

Let VV be a finite set and XVX\subseteq V. For u,vVu,v\in V we say that XX separates u,vu,v if exactly one of them is in XX.

Lemma 21 (cf. [AL21, Lemma 31]).

Consider as input a tree TT on a vertex set VV of size nn, and sets of edge singletons 𝒞1E(T)\mathcal{C}_{1}\subseteq E(T) and edge pairs 𝒞2E(T)(2)\mathcal{C}_{2}\subseteq E(T)^{(2)}. These sets define sets of 1-respecting and 2-respecting cuts, respectively. There is an algorithm that in time 𝒪((n+|𝒞1|+|𝒞2|)log2nlogn)\mathcal{O}((n+|\mathcal{C}_{1}|+|\mathcal{C}_{2}|)\log^{2}n\log^{*}n) returns the meet of the bipartitions induced by these cuts.

Proof.

We root the tree at an arbitrary vertex rVr\in V. When we speak of the shore of a cut we always refer to the shore not containing rr. Arrange all elements of 𝒞1\mathcal{C}_{1} and 𝒞2\mathcal{C}_{2} in an arbitrary order to obtain a sequence of N=|𝒞1|+|𝒞2|N=|\mathcal{C}_{1}|+|\mathcal{C}_{2}| cuts. Our goal is to construct, for each node vVv\in V, a string s(v){0,1}Ns(v)\in\{0,1\}^{N} where the ii-th bit of s(v)s(v) is 11 iff the shore of the ii-th cut contains vv. Assuming that we can indeed efficiently construct such strings, the meet is obtained by grouping together nodes vv with the same string s(v)s(v). However, the difficulty is that we cannot afford to construct s(v)s(v) for all vv explicitly as this would require nNnN bits. Instead, we will use Lemma 19 for representing a collection of strings of length NN.

Consider the preorder traversal of TT starting from the root rr. By definition s(r)=0Ns(r)=0^{N}, which we create in the data structure by NN joins of 0. We then create s(v)s(v) from the string s(parent(v))s(\mathrm{parent}(v)) during the preorder traversal, where parent(v)\mathrm{parent}(v) is the parent of vv. To do this we set s(v)s(parent(v))s(v)\leftarrow s(\mathrm{parent}(v)) and then flip the bits of s(v)s(v) corresponding to cuts whose shore contains vv but not parent(v)\mathrm{parent}(v) or vice versa. Thus we need to understand when the shore of a 1- or 2-respecting cut separates vv from parent(v)\mathrm{parent}(v). The shore of a 1-respecting cut defined by edge ee is ee^{\downarrow}, and hence separates vv and parent(v)\mathrm{parent}(v) iff e={v,parent(v)}e=\{v,\mathrm{parent}(v)\}. A 2-respecting cut defined by edges {e,f}\{e,f\} separates two vertices uu and vv iff exactly one of e,fe,f is on the path from uu to vv in TT. Thus a 2-respecting cut will separate vv and parent(v)\mathrm{parent}(v) iff either e={v,parent(v)}e=\{v,\mathrm{parent}(v)\} or f={v,parent(v)}f=\{v,\mathrm{parent}(v)\}. Hence there will be at most 2N2N bit flips in total and in 𝒪(N)\mathcal{O}(N) time we can annotate the tree with which bits should be flipped at each node.

A bit flip can be implemented in the data structure by a constant number of splits, joins and the creation of a length-1 sequence. As there are 𝒪(n+N)\mathcal{O}(n+N) total operations on the data structure, the total time for all updates is 𝒪((n+N)log2nlogn)\mathcal{O}((n+N)\log^{2}n\log^{*}n) by Lemma 19.

Having obtained all the strings s(v)s(v), we can group together nodes vv with the same string s(v)s(v) by sorting their signatures sig(s(v))\mathrm{sig}(s(v)). Because each signature is a positive integer bounded by 𝒪(N3)\mathcal{O}(N^{3}) by Lemma 19, this can be implemented with radix sort in 𝒪(N)\mathcal{O}(N) time. This gives the lemma. ∎
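For intuition, here is a toy Python version of this construction in which the persistent-sequence signatures of Lemma 19 are replaced by explicit frozensets of flipped cut indices; this loses the claimed time bound but reproduces the grouping. All identifiers are our own, and a BFS order stands in for the preorder traversal (any top-down order works, since s(v) only depends on s(parent(v))).

```python
from collections import defaultdict

def meet_of_respecting_cuts(n, tree_edges, cuts, root=0):
    # cuts: each cut is a set of 1 or 2 tree edges, edges given as frozensets.
    # sig[v] plays the role of s(v): the set of cut indices whose shore
    # contains v. It differs from sig[parent(v)] exactly on the cuts that
    # contain the tree edge {v, parent(v)}.
    adj = defaultdict(list)
    for a, b in tree_edges:
        adj[a].append(b)
        adj[b].append(a)
    parent, order, seen = {root: None}, [root], {root}
    for v in order:                       # BFS gives a valid top-down order
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                parent[w] = v
                order.append(w)
    sig = {root: frozenset()}             # s(r) = 0^N
    for v in order[1:]:
        e = frozenset({v, parent[v]})
        flips = {i for i, c in enumerate(cuts) if e in c}
        sig[v] = sig[parent[v]] ^ flips   # flip the affected bits
    groups = defaultdict(list)
    for v in range(n):
        groups[sig[v]].append(v)
    return sorted(groups.values())

tree = [(0, 1), (1, 2), (1, 3)]
cuts = [{frozenset({1, 2})},                       # 1-respecting, shore {2}
        {frozenset({0, 1}), frozenset({1, 3})}]    # 2-respecting, shore {1, 2}
blocks = meet_of_respecting_cuts(4, tree, cuts)    # → [[0, 3], [1], [2]]
```

The data structure of Lemma 19 makes each such "copy and flip" update polylogarithmic instead of linear in N.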

Algorithm 5 (1+ε)(1+\varepsilon)-KT partition

Input: A weighted graph G=(V,E,w)G=(V,E,w) and a parameter 0ε1/160\leq\varepsilon\leq 1/16
      Output: (1+ε)(1+\varepsilon)-KT partition

1:Use Karger’s tree packing algorithm to construct a set of K𝒪(logn)K\in\mathcal{O}(\log n) spanning trees 𝒯={T1,,TK}\mathcal{T}=\{T_{1},\ldots,T_{K}\} so that with high probability every (1+ε)(1+\varepsilon)-minimum cut 2-respects at least one of them (Theorem 2).
2:Compute the weight of a minimum 2-respecting cut for each tree in 𝒯\mathcal{T} by Lemma 4, and let λ\lambda be the minimum value found.
3:for i=1,2,,Ki=1,2,\dots,K do
4:     Find the set 𝒜i={eE(Ti)cost(e)(1+ε)λ}\mathcal{A}_{i}=\{e\in E(T_{i})\mid\mathrm{cost}(e)\leq(1+\varepsilon)\lambda\} indexing the 1-respecting near-minimum cuts of TiT_{i} by Lemma 3.
5:     Use Theorem 9 with tree TiT_{i} to find a spanning forest THiT_{H_{i}} of the graph HiH_{i} with edge set {{e,f}:e,fE(Ti),cost(e,f)β,cut(e,f) non-trivial}\{\{e,f\}:e,f\in E(T_{i}),\mathrm{cost}(e,f)\leq\beta,\mathrm{cut}(e,f)\text{ non-trivial}\}. Set i={{e,f}E(THi)}\mathcal{B}_{i}=\{\{e,f\}\in E(T_{H_{i}})\}.
6:     Use Lemma 21 to construct the partition 𝒮i\mathcal{S}_{i} induced by the cuts indexed by 𝒜i\mathcal{A}_{i} and i\mathcal{B}_{i}.
7:end for
8:Output the meet 𝒮=i=1K𝒮i\mathcal{S}=\bigwedge_{i=1}^{K}\mathcal{S}_{i} by Lemma 18.

We are now ready to prove our main theorem, Theorem 1.

Proof.

We first prove the theorem for εnt\bigwedge\mathcal{B}_{\varepsilon}^{nt}. The algorithm for computing εnt\bigwedge\mathcal{B}_{\varepsilon}^{nt} is given in Algorithm 5. Let us first argue the correctness. Step 1 succeeds with high probability by Theorem 2, and the rest of the algorithm is deterministic. Thus if we show that the algorithm is correct assuming that Step 1 succeeds, then the algorithm will be correct with high probability.

Let us now assume that Step 1 succeeds. Then λ=λ(G)\lambda=\lambda(G) in Step 2. Let 𝒯i\mathcal{T}_{i} be the set of bipartitions of all non-trivial (1+ε)(1+\varepsilon)-minimum cuts of GG that 2-respect TiT_{i}, for i=1,,Ki=1,\ldots,K. We have that i𝒯i=εnt\cup_{i}\mathcal{T}_{i}=\mathcal{B}_{\varepsilon}^{nt}. Therefore

εnt=i=1K𝒯i.\bigwedge\mathcal{B}_{\varepsilon}^{nt}=\bigwedge_{i=1}^{K}\bigwedge\mathcal{T}_{i}\enspace.

For each ii we have (𝒜ii)=𝒯i\bigwedge(\mathcal{A}_{i}\cup\mathcal{B}_{i})=\bigwedge\mathcal{T}_{i} by the correctness of our main algorithm Theorem 9 and Lemma 5. We compute (𝒜ii)\bigwedge(\mathcal{A}_{i}\cup\mathcal{B}_{i}) via Lemma 21. Finally, we compute i=1K𝒯i\bigwedge_{i=1}^{K}\bigwedge\mathcal{T}_{i} in Step 8 by Lemma 18.

Now let us go over the time complexity. Step 1 runs in time 𝒪(mlog2(n)+nlog4(n))\mathcal{O}(m\log^{2}(n)+n\log^{4}(n)) by Theorem 2. Step 2 takes time 𝒪(mlog2n)\mathcal{O}(m\log^{2}n) by Lemma 4. In the for loop, Step 4 takes time 𝒪(m)\mathcal{O}(m) by Lemma 3; Step 5 takes time 𝒪(mlog4n)\mathcal{O}(m\log^{4}n) by Theorem 9; Step 6 takes time 𝒪(nlog2nlogn)\mathcal{O}(n\log^{2}n\log^{*}n) by Lemma 21. Thus the time in the for loop is dominated by Step 5, and the total time taken over the K=𝒪(logn)K=\mathcal{O}(\log n) iterations is 𝒪(mlog5n)\mathcal{O}(m\log^{5}n). The last step takes time 𝒪(nlog2(n))\mathcal{O}(n\log^{2}(n)). Thus the complexity overall is 𝒪(mlog5n)\mathcal{O}(m\log^{5}n).

To finish the proof of the theorem let us handle the case of ε\bigwedge\mathcal{B}_{\varepsilon}. We claim that given the value of λ(G)\lambda(G) we can compute ε\bigwedge\mathcal{B}_{\varepsilon} from εnt\bigwedge\mathcal{B}_{\varepsilon}^{nt} deterministically in 𝒪(m)\mathcal{O}(m) time. In 𝒪(m)\mathcal{O}(m) time we can identify the set Z={vV:ΔG({v})(1+ε)λ(G)}Z=\{v\in V:\Delta_{G}(\{v\})\leq(1+\varepsilon)\lambda(G)\}. Let 𝒟ε={{v,Vv}:vZ}\mathcal{D}_{\varepsilon}=\{\{v,V\setminus v\}:v\in Z\} be the corresponding set of bipartitions and note that ε=(εnt)(𝒟ε)\bigwedge\mathcal{B}_{\varepsilon}=\left(\bigwedge\mathcal{B}_{\varepsilon}^{nt}\right)\wedge\left(\bigwedge\mathcal{D}_{\varepsilon}\right). The meet 𝒟ε\bigwedge\mathcal{D}_{\varepsilon} is simply the partition consisting of the sets {v}\{v\} for vZv\in Z and VZV\setminus Z. To take the meet of this partition with 𝒫=εnt\mathcal{P}=\bigwedge\mathcal{B}_{\varepsilon}^{nt} we simply cycle through each S𝒫S\in\mathcal{P} and split SS into the sets {v}\{v\} for vSZv\in S\cap Z and SZS\setminus Z, which can be done in time 𝒪(n)\mathcal{O}(n). Thus the total time of computing ε\bigwedge\mathcal{B}_{\varepsilon} is dominated by the computation of εnt\bigwedge\mathcal{B}_{\varepsilon}^{nt}, and can be done asymptotically in the same time. ∎

6 Applications

In this section we give two applications of our main result: an improved quantum algorithm for minimum cut in weighted graphs in the adjacency list model, and a new randomized algorithm with running time 𝒪(m+nlog6n)\mathcal{O}(m+n\log^{6}n) to compute the edge connectivity of a simple graph.

6.1 Quantum algorithm for minimum cut in weighted graphs

In a recent work by Apers and Lee [AL21] the quantum complexity of the minimum cut problem was studied. They distinguish two models for querying a weighted graph as an input. In the adjacency matrix model a query is a pair of vertices i,jVi,j\in V and the answer to the query reveals whether {i,j}E\{i,j\}\in E, and if so, also returns the weight w({i,j})w(\{i,j\}). In the adjacency list model a query is a vertex iVi\in V and an integer k[n]k\in[n], and the answer to the query is the kk-th neighbor jj of vertex ii (if it exists) and the corresponding weight w({i,j})w(\{i,j\}). The main results from [AL21] depend on the edge-weight ratio τ\tau, defined as the ratio of the maximum edge weight over the minimum edge weight. These results can be summarized as follows:

  • In the adjacency matrix model, finding a minimum cut of a weighted graph with edge-weight ratio τ\tau has quantum query and time complexity Θ~(n3/2τ)\widetilde{\Theta}(n^{3/2}\sqrt{\tau}). This compares to the Θ(n2)\Theta(n^{2}) query complexity of any classical algorithm for minimum cut in this model [DHHM06].

  • In the adjacency list model, a minimum cut of a weighted graph with edge-weight ratio τ\tau can be found with quantum query complexity 𝒪~(mnτ)\widetilde{\mathcal{O}}(\sqrt{mn\tau}) and quantum time complexity 𝒪~(mnτ+n3/2)\widetilde{\mathcal{O}}(\sqrt{mn\tau}+n^{3/2}). There are also lower bounds of Ω(n3/2)\Omega(n^{3/2}) for τ>1\tau>1 and Ω(τn)\Omega(\tau n) for 1τn1\leq\tau\leq n. This compares to the Θ(m)\Theta(m) query complexity of any classical algorithm for minimum cut in this model [BGMP21].

While this fully resolves the quantum complexity of minimum cut in the adjacency matrix model, there are two apparent gaps in the adjacency list model. On the one hand there is a gap between the upper and lower bounds on the quantum query complexity. On the other hand there is a gap between the upper bounds on the quantum query complexity and the quantum time complexity. Using our new result (Theorem 1) we can close this second gap.

Let κ(n)\kappa(n) denote the (quantum) time complexity for finding a (1+ε)(1+\varepsilon)-KT partition of a weighted graph with nn vertices and 𝒪~(n)\widetilde{\mathcal{O}}(n) edges. The following lemma is proven in [AL21].

Lemma 22 ([AL21, Lemma 22]).

Let GG be a weighted graph with nn vertices, mm edges, and edge-weight ratio τ\tau. There is a quantum algorithm to compute the weight and shores of a minimum cut of GG with time complexity κ(n)+𝒪~(mnτ)\kappa(n)+\widetilde{\mathcal{O}}(\sqrt{mn\tau}) in the adjacency list model.

In [AL21] a quantum algorithm was proposed for finding the KT partition of a weighted graph with mm edges in time 𝒪~(m+n3/2)\widetilde{\mathcal{O}}(m+n^{3/2}), giving an upper bound κ(n)𝒪~(n3/2)\kappa(n)\in\widetilde{\mathcal{O}}(n^{3/2}) and an upper bound 𝒪~(mnτ+n3/2)\widetilde{\mathcal{O}}(\sqrt{mn\tau}+n^{3/2}) on the quantum time complexity. Our main result gives a classical algorithm that improves this upper bound to κ(n)𝒪~(n)\kappa(n)\in\widetilde{\mathcal{O}}(n), and hence this yields a quantum algorithm for minimum cut with time complexity 𝒪~(mnτ)\widetilde{\mathcal{O}}(\sqrt{mn\tau}).

Corollary 23.

Let GG be a weighted graph with nn vertices, mm edges, and edge-weight ratio τ\tau. There is a quantum algorithm to compute the weight and shores of a minimum cut of GG with time complexity 𝒪~(mnτ)\widetilde{\mathcal{O}}(\sqrt{mn\tau}) in the adjacency list model.

6.2 Randomized algorithm for edge connectivity

We can use our algorithm for finding the KT partition of a weighted graph to give a randomized algorithm that computes the edge connectivity of a simple graph GG with high probability in time 𝒪(m+nlog6n)\mathcal{O}(m+n\log^{6}n). For graphs that are not too sparse this equals the best known 𝒪(m+nlog2n)\mathcal{O}(m+n\log^{2}n) complexity of the random contraction based algorithm by Ghaffari, Nowicki and Thorup [GNT20].

Our new algorithm uses the key idea from Kawarabayashi and Thorup [KT19]: (i) find the KT partition of the graph and contract the components of the partition, and (ii) find a minimum cut in the contracted graph. By definition of the KT partition, this contraction will preserve the set of non-trivial minimum cuts, and so it suffices to find a minimum cut in the contracted graph and the minimum degree of a vertex. Moreover, the contracted graph has only 𝒪(n)\mathcal{O}(n) edges and so we can find a minimum cut in this graph quickly.

Our algorithm follows the same blueprint, except that in order to obtain 𝒪(m)\mathcal{O}(m) leading complexity we first find an ε\varepsilon-cut sparsifier FF of the input graph, for a small constant ε\varepsilon. For this step we can use the 𝒪(m)\mathcal{O}(m) sparsification algorithm from Fung, Hariharan, Harvey and Panigrahi [FHHP19, Theorem 1.22]. Provided that the sparsification step is successful, any minimum cut of the original simple graph GG will be a γ\gamma-near minimum cut of FF for γ=(1+ε)/(1ε)1+3ε\gamma=(1+\varepsilon)/(1-\varepsilon)\leq 1+3\varepsilon. Thus if we find a (1+3ε)(1+3\varepsilon)-KT partition of FF and contract the sets of the resulting partition in GG we obtain a multigraph GG^{\prime} which preserves all non-trivial minimum cuts of GG. In this way we only need to find the KT partition of FF, which has 𝒪(nlogn)\mathcal{O}(n\log n) edges rather than mm edges. On the other hand, the sparsifier FF will in general be weighted, and hence we cannot run the near-linear time algorithm from [KT19] to find its KT partition. This is a prime example where finding the KT partition of a weighted graph is very useful.

The next theorem fleshes out this algorithm. For this, we need the fact that for a simple graph there are only 𝒪(n)\mathcal{O}(n) inter-component edges in a KT partition. We use the version of this fact from [AL21], which gives an explicit constant in the bound.

Lemma 24 ([RSW18, Lemma 2.6],[AL21, Lemma 2]).

Let G=(V,E)G=(V,E) be a simple graph with |V|=n|V|=n. Let d=minuVdeg(u)d=\min_{u\in V}\deg(u). For a nonnegative ε<1\varepsilon<1, let 𝒯={X:|X|,|X¯|2 and |ΔG(X)|λ(G)+εd}\mathcal{T}=\{X:|X|,|\overline{X}|\geq 2\mbox{ and }|\Delta_{G}(X)|\leq\lambda(G)+\varepsilon d\} and let GG^{\prime} be the multigraph formed from GG by contracting the sets in 𝒯\bigwedge\mathcal{T}. Then

|E(G)|68n(1ε)2.|E(G^{\prime})|\leq\frac{68n}{(1-\varepsilon)^{2}}\enspace.
Algorithm 6 Randomized algorithm for edge connectivity

Input: Adjacency list access to a simple graph GG.
      Output: A minimum cut of GG.

1:Find a vertex vv with minimum degree dmind_{\min}. \triangleright 𝒪(m)\mathcal{O}(m) time.
2:Construct a 1/1001/100-cut sparsifier FF of GG with 𝒪(nlogn)\mathcal{O}(n\log n) edges. \triangleright 𝒪(m)\mathcal{O}(m) time by [FHHP19].
3:Find the (101/99)(101/99)-KT partition 𝒮={S1,,Sk}\mathcal{S}=\{S_{1},\dots,S_{k}\} of FF using Theorem 1. \triangleright 𝒪(nlog6n)\mathcal{O}(n\log^{6}n) time.
4:Contract the components S1,,SkS_{1},\dots,S_{k} and let GG^{\prime} be the resulting multigraph. If GG^{\prime} has at most 100n100n edges find a minimum cut CC of GG^{\prime}, otherwise abort. \triangleright Time 𝒪(m+nlog2n)\mathcal{O}(m+n\log^{2}n) using the minimum cut algorithm of [GMW20] from Lemma 4.
5:If dmin|C|d_{\min}\leq|C|, return the outgoing edges from vv. Otherwise, return CC.
Theorem 25.

Let GG be a simple graph with mm edges. There is a classical randomized algorithm that runs in time 𝒪(m+nlog6n)\mathcal{O}(m+n\log^{6}n) and with high probability outputs the edge connectivity of GG and a cut realizing this value.

Proof.

The algorithm is given in Algorithm 6. The time complexity of each step is given in the comments. Let us prove correctness.

The algorithm either outputs a trivial cut or a cut from a contraction GG^{\prime} of GG. As contraction cannot decrease the edge connectivity, if the edge connectivity of GG is realized by a trivial cut the algorithm will be correct. Let us now assume that the edge connectivity λ(G)\lambda(G) is realized by a non-trivial cut C=ΔG(S)C^{*}=\Delta_{G}(S). In step 2 we use the sparsification algorithm of Fung, Hariharan, Harvey and Panigrahi [FHHP19, Theorem 1.22] to find a 1/1001/100-cut sparsifier F=(V,EF,wF)F=(V,E_{F},w_{F}) of GG, which succeeds with high probability. Thus with high probability wF(ΔF(S))(1+1/100)λ(G)w_{F}(\Delta_{F}(S))\leq(1+1/100)\lambda(G). Also with high probability the weight of a minimum cut of FF is at least (11/100)λ(G)(1-1/100)\lambda(G), in which case ΔF(S)\Delta_{F}(S) will be a 101/99101/99-near minimum cut of FF. Hence with high probability the (101/99)(101/99)-KT partition of FF will be a refinement of {S,S¯}\{S,\bar{S}\}, and in the contraction GG^{\prime} it will hold that |ΔG(S)|=λ(G)|\Delta_{G^{\prime}}(S)|=\lambda(G), and so the edge connectivity of GG^{\prime} is λ(G)\lambda(G). Further, if FF is a valid 1/1001/100-cut sparsifier of GG then GG^{\prime} has at most 100n100n edges by Lemma 24, and so we can find a minimum cut CC of GG^{\prime} in time 𝒪(nlog2n)\mathcal{O}(n\log^{2}n) using the minimum cut algorithm of [GMW20] given in Lemma 4. Thus in this case with high probability CC will be a cut realizing the edge connectivity of GG and the algorithm is correct. ∎

7 Discussion

We find the (1+ε)(1+\varepsilon)-KT partition of a weighted graph in near-linear time for any 0ε1/160\leq\varepsilon\leq 1/16. The near-linear time deterministic algorithm of Kawarabayashi and Thorup [KT19] to find a KT partition of a simple graph differs from ours in an interesting way with respect to the parameters. Recall that we defined εnt(G)\mathcal{B}_{\varepsilon}^{nt}(G) to be the set of all bipartitions {S,S¯}\{S,\bar{S}\} of the vertex set corresponding to non-trivial cuts whose weight is at most (1+ε)λ(G)(1+\varepsilon)\lambda(G), and a (1+ε)(1+\varepsilon)-KT partition to be εnt\bigwedge\mathcal{B}_{\varepsilon}^{nt}. Kawarabayashi and Thorup consider the larger set of bipartitions 𝒦εnt(G)\mathcal{K}_{\varepsilon}^{nt}(G) corresponding to non-trivial cuts of weight at most λ(G)+εd\lambda(G)+\varepsilon d, where dd is the minimum degree of GG. When GG is simple they can compute 𝒦εnt(G)\bigwedge\mathcal{K}_{\varepsilon}^{nt}(G) for any ε<1\varepsilon<1 in near-linear time. Thus their result is stronger than ours with respect to the parameters in two ways: it allows any ε<1\varepsilon<1 and also lets ε\varepsilon multiply the minimum degree rather than λ(G)\lambda(G).

There is an inherent barrier to extending the 2-respecting cut framework we employ here to this parameter regime. The reason is that Karger’s tree packing lemma [Kar00, Lemma 2.3] only shows that a cut of weight <3λ(G)/2<3\lambda(G)/2 will 2-respect a positive fraction of the trees from a maximum tree packing. To handle cuts of weight at least 3λ(G)/23\lambda(G)/2 one would instead have to consider 3-respecting cuts, which seems to add a good deal of complexity. Thus while we have not tried to optimize the constant 1/161/16, there is a natural barrier to extending our methods to ε1/2\varepsilon\geq 1/2. Pushing to larger ε\varepsilon and allowing ε\varepsilon to multiply the minimum (weighted) degree rather than λ(G)\lambda(G) both seem to require new techniques, and we leave this as an open question.

References

  • [AL21] Simon Apers and Troy Lee. Quantum complexity of minimum cut. In Proceedings of the 36th Computational Complexity Conference (CCC ’21), pages 28:1–28:3. LIPIcs, 2021.
  • [Ben95] András A. Benczúr. A representation of cuts within 6/5 times the edge connectivity with applications. In Proceedings of 36th Annual Symposium on Foundations of Computer Science (FOCS ’95), pages 92–102. IEEE Computer Society, 1995.
  • [Ben97] András Benczúr. Cut structures and randomized algorithms in edge-connectivity problems. PhD thesis, MIT, 1997.
  • [BF00] Michael A. Bender and Martin Farach-Colton. The LCA problem revisited. In Proceedings of 4th Latin American Symposium on Theoretical Informatics (LATIN ’00), pages 88–94. Springer, 2000.
  • [BG08] András A. Benczúr and Michel X. Goemans. Deformable Polygon Representation and Near-Mincuts, pages 103–135. Springer Berlin Heidelberg, Berlin, Heidelberg, 2008.
  • [BGMP21] Arijit Bishnu, Arijit Ghosh, Gopinath Mishra, and Manaswi Paraashar. Query complexity of global minimum cut. In Proceedings of the 24th international conference on Approximation Algorithms for Combinatorial Optimization Problems (APPROX ’21), 2021.
  • [BLS20] Nalin Bhardwaj, Antonio Molina Lovett, and Bryce Sandlund. A simple algorithm for minimum cuts in near-linear time. In 17th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT ’20). Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2020.
  • [DHHM06] Christoph Dürr, Mark Heiligman, Peter Høyer, and Mehdi Mhalla. Quantum query complexity of some graph problems. SIAM Journal on Computing, 35(6):1310–1328, 2006.
  • [DKL76] Efim A. Dinitz, Alexander V. Karzanov, and Michael V. Lomonosov. On the structure of the system of minimum edge cuts in a graph. Issledovaniya po Diskretnoi Optimizatsii (Studies in Discrete Optimization), pages 290–306, 1976. Appeared in Russian.
  • [FHHP19] Wai-Shing Fung, Ramesh Hariharan, Nicholas J. A. Harvey, and Debmalya Panigrahi. A general framework for graph sparsification. SIAM Journal on Computing, 48(4):1196–1223, 2019.
  • [Gab95] Harold N. Gabow. A matroid approach to finding edge connectivity and packing arborescences. Journal of Computer and System Sciences, 50(2):259–273, 1995.
  • [GH61] Ralph E. Gomory and Te C. Hu. Multi-terminal network flows. Journal of the Society for Industrial and Applied Mathematics, 9(4):551–570, 1961.
  • [GMW20] Paweł Gawrychowski, Shay Mozes, and Oren Weimann. Minimum cut in O(mlog2n)O(m\log^{2}n) time. In Proceedings of the 47th International Colloquium on Automata, Languages, and Programming (ICALP ’20), volume 168 of LIPIcs, pages 57:1–57:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020.
  • [GNT20] Mohsen Ghaffari, Krzysztof Nowicki, and Mikkel Thorup. Faster algorithms for edge connectivity via random 2-out contractions. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA ’20), pages 1260–1279. SIAM, 2020.
  • [GSS11] Shayan Oveis Gharan, Amin Saberi, and Mohit Singh. A randomized rounding approach to the traveling salesman problem. In 52nd Annual IEEE Symposium on Foundations of Computer Science (FOCS ’11), pages 550–559. IEEE, 2011.
  • [HRW20] Monika Henzinger, Satish Rao, and Di Wang. Local flow partitioning for faster edge connectivity. SIAM Journal on Computing, 49(1):1–36, 2020.
  • [HT84] Dov Harel and Robert Endre Tarjan. Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing, 13(2):338–355, 1984.
  • [Kar00] David R. Karger. Minimum cuts in near-linear time. Journal of the ACM, 47(1):46–76, 2000. Announced at STOC 1996.
  • [KKG21] Anna R. Karlin, Nathan Klein, and Shayan Oveis Gharan. A (slightly) improved approximation algorithm for metric TSP. In Proceedings of the 53rd Annual ACM-SIGACT Symposium on Theory of Computing (STOC ’21), pages 32–45, 2021.
  • [KP09] David R. Karger and Debmalya Panigrahi. A near-linear time algorithm for constructing a cactus representation of minimum cuts. In Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’09), pages 246–255. SIAM, 2009.
  • [KT19] Ken-ichi Kawarabayashi and Mikkel Thorup. Deterministic edge connectivity in near-linear time. Journal of the ACM, 66(1):4:1–4:50, 2019. Announced at STOC 2015.
  • [Li21] Jason Li. Deterministic mincut in almost-linear time. In Proceedings of the 53rd Annual ACM Symposium on Theory of Computing (STOC ’21), pages 384–395. ACM, 2021.
  • [LST20] On-Hei S. Lo, Jens M. Schmidt, and Mikkel Thorup. Compact cactus representations of all non-trivial min-cuts. Discrete Applied Mathematics, 303:296–304, 2020.
  • [MN20] Sagnik Mukhopadhyay and Danupon Nanongkai. Weighted min-cut: sequential, cut-query, and streaming algorithms. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC ’20), pages 496–509. ACM, 2020.
  • [MSU97] Kurt Mehlhorn, R. Sundar, and Christian Uhrig. Maintaining dynamic sequences under equality tests in polylogarithmic time. Algorithmica, 17(2):183–198, 1997.
  • [NMN01] Jaroslav Nešetřil, Eva Milková, and Helena Nešetřilová. Otakar Borůvka on the minimum spanning tree problem: Translation of both the 1926 papers, comments, history. Discrete Mathematics, 233(1-3):3–36, 2001.
  • [RSW18] Aviad Rubinstein, Tselil Schramm, and S. Matthew Weinberg. Computing exact minimum cuts without knowing the graph. In Proceedings of the 9th Innovations in Theoretical Computer Science Conference (ITCS ’18), pages 39:1–39:16. LIPIcs, 2018.
  • [ST83] Daniel D. Sleator and Robert Endre Tarjan. A data structure for dynamic trees. Journal of Computer and System Sciences, 26(3):362–391, 1983.

Appendix A Data structures

We first show how to implement categorical top two queries on an array while allowing updates to add Δ\Delta to the scores in an interval. This can be accomplished using a well-known binary tree data structure. We will then port this construction to a tree TT by means of the heavy path decomposition of TT [ST83, HT84].

The key to the binary tree data structure is the following simple fact. For a node uu of a tree let int(u)\mathrm{int}(u) be the set of labels of leaves that are descendants of uu.

Fact 26.

Let nn be a power of 2 and TT a complete binary tree with nn leaves labeled by 1,,n1,\ldots,n. For any interval [i,j][i,j] there are 𝒪(logn)\mathcal{O}(\log n) many nodes u1,,utu_{1},\ldots,u_{t} such that [i,j]=int(u1)int(ut)[i,j]=\mathrm{int}(u_{1})\sqcup\cdots\sqcup\mathrm{int}(u_{t}). Moreover u1,,utu_{1},\ldots,u_{t} can be found in 𝒪(logn)\mathcal{O}(\log n) time, and the total number of ancestors of u1,,utu_{1},\ldots,u_{t} is 𝒪(logn)\mathcal{O}(\log n).
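Fact 26 is the canonical-decomposition property underlying segment trees. A minimal sketch (our own illustration, not from the paper, using heap-style node numbering: root 1, children of v are 2v and 2v+1, leaf k is node n+k):

```python
def canonical_nodes(n, i, j):
    """Return O(log n) nodes of a complete binary tree with n leaves
    (n a power of 2) whose leaf intervals partition [i, j] (0-based,
    inclusive), listed left to right."""
    left, right = [], []
    lo, hi = i + n, j + n + 1          # half-open range of leaf numbers
    while lo < hi:
        if lo & 1:                     # lo is a right child: take it whole
            left.append(lo)
            lo += 1
        if hi & 1:                     # hi - 1 is a left child: take it whole
            hi -= 1
            right.append(hi)
        lo //= 2
        hi //= 2
    return left + right[::-1]

def leaf_interval(n, v):
    """The interval int(v) of leaf labels below node v."""
    lo = hi = v
    while lo < n:                      # descend to the extreme leaves
        lo, hi = 2 * lo, 2 * hi + 1
    return lo - n, hi - n
```

For n = 8 and [i, j] = [1, 6] this yields four nodes whose intervals are [1,1], [2,3], [4,5], [6,6]; each level of the loop contributes at most two nodes, giving the 𝒪(log n) bound of Fact 26.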

See Lemma 6.

Proof.

By padding the array with scores of infinity and an arbitrary color we may assume that nn is a power of 22. The data structure will be a complete binary tree BB with nn leaves labeled 1,,n\ell_{1},\ldots,\ell_{n}. Each leaf stores a 3-tuple (score,index,color)(\mathrm{score},\mathrm{index},\mathrm{color}), and at leaf ii this 3-tuple is initialized to (A[i].score,i,A[i].color)(A[i].\mathrm{score},i,A[i].\mathrm{color}). Every internal node uu will store a pair of such 3-tuples. The data structure will maintain the invariant (Invariant 1) that at every internal node uu this pair of 3-tuples is the answer to the categorical top two query for the interval int(u)\mathrm{int}(u). The answer to a categorical top two query for the interval int(u)\mathrm{int}(u) can be computed in constant time from the answers to this query at the children of uu. Thus in time 𝒪(n)\mathcal{O}(n) we can propagate the answers to the categorical top two queries from the leaves to the root so that Invariant 1 holds.

Each node uu will also store an update\mathrm{update} value u.updateu.\mathrm{update}. We initialize the leaves to have i.update=A[i].score\ell_{i}.\mathrm{update}=A[i].\mathrm{score} and set the update value of all internal nodes of the tree to zero. Thus we have the property (Invariant 2) that the sum of the update values on the path from i\ell_{i} to the root is A[i].scoreA[i].\mathrm{score}, which will be maintained under the Add(Δ,i,j)\textsc{Add}(\Delta,i,j) updates. This completes the pre-processing step and the total pre-processing time is 𝒪(n)\mathcal{O}(n).

We now show that after an update we can adjust the tree to maintain Invariant 1 and Invariant 2 in 𝒪(logn)\mathcal{O}(\log n) time. If the invariants hold, then we can answer a categorical top two query for the interval [i,j][i,j] in time 𝒪(logn)\mathcal{O}(\log n). This is done by first using Fact 26 to find in 𝒪(logn)\mathcal{O}(\log n) time nodes u1,,utu_{1},\ldots,u_{t} such that int(u1),,int(ut)\mathrm{int}(u_{1}),\ldots,\mathrm{int}(u_{t}) form a partition of [i,j][i,j]. Then by building a binary tree on top of u1,,utu_{1},\ldots,u_{t} and propagating the categorical top two query answers up this tree we can answer the categorical top two query for [i,j][i,j] in time 𝒪(logn)\mathcal{O}(\log n).

To restore the invariants after Add(Δ,i,j)\textsc{Add}(\Delta,i,j), we use Fact 26 to find in 𝒪(logn)\mathcal{O}(\log n) time nodes u1,,utu_{1},\ldots,u_{t} such that int(u1),,int(ut)\mathrm{int}(u_{1}),\ldots,\mathrm{int}(u_{t}) form a partition of [i,j][i,j]. Then for each i=1,,ti=1,\ldots,t we set ui.updateui.update+Δu_{i}.\mathrm{update}\leftarrow u_{i}.\mathrm{update}+\Delta. This restores Invariant 2 under the update. To restore Invariant 1, we recompute the answers to the categorical top two query at all ancestors of u1,,utu_{1},\ldots,u_{t}. By Fact 26 there are only 𝒪(logn)\mathcal{O}(\log n) many such ancestors, thus we can perform this computation in 𝒪(logn)\mathcal{O}(\log n) time as well. ∎
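The proof above can be made concrete with a lazy segment tree. The sketch below is our own Python illustration (not the paper's code); consistent with the infinite-score padding, "top two" here means the smallest score together with the smallest score of a different color. Since a range add shifts all scores in a node's interval equally, it preserves the identity of the stored answers, which is what makes the lazy updates sound.

```python
import math

class CatTopTwoTree:
    """Maintains an array of (score, color) entries under Add(delta, i, j)
    (add delta to all scores in A[i..j]) and CatTopTwo(i, j) (the smallest
    score in A[i..j] plus the smallest score of a different color), both in
    O(log n) time.  Indices are 0-based and inclusive."""

    def __init__(self, items):
        self.size = 1
        while self.size < len(items):
            self.size *= 2
        # pad to a power of two with +infinity scores, as in the proof
        padded = list(items) + [(math.inf, None)] * (self.size - len(items))
        self.best = [None] * (2 * self.size)  # up to two (score, idx, color)
        self.lazy = [0] * (2 * self.size)     # pending adds below this node
        for k, (s, c) in enumerate(padded):
            self.best[self.size + k] = [(s, k, c)]
        for v in range(self.size - 1, 0, -1):
            self.best[v] = self._merge(self.best[2 * v], self.best[2 * v + 1])

    @staticmethod
    def _merge(a, b):
        cand = sorted(a + b)                  # ascending by score
        if not cand:
            return []
        out = [cand[0]]
        for t in cand[1:]:
            if t[2] != out[0][2]:             # first entry of another color
                out.append(t)
                break
        return out

    def _apply(self, v, delta):               # add delta to v's whole range
        self.best[v] = [(s + delta, i, c) for (s, i, c) in self.best[v]]
        self.lazy[v] += delta

    def _push(self, v):                       # push pending adds to children
        if self.lazy[v]:
            self._apply(2 * v, self.lazy[v])
            self._apply(2 * v + 1, self.lazy[v])
            self.lazy[v] = 0

    def add(self, delta, i, j, v=1, lo=0, hi=None):
        hi = self.size - 1 if hi is None else hi
        if j < lo or hi < i:
            return
        if i <= lo and hi <= j:
            self._apply(v, delta)
            return
        self._push(v)
        mid = (lo + hi) // 2
        self.add(delta, i, j, 2 * v, lo, mid)
        self.add(delta, i, j, 2 * v + 1, mid + 1, hi)
        self.best[v] = self._merge(self.best[2 * v], self.best[2 * v + 1])

    def cat_top_two(self, i, j, v=1, lo=0, hi=None):
        hi = self.size - 1 if hi is None else hi
        if j < lo or hi < i:
            return []
        if i <= lo and hi <= j:
            return list(self.best[v])
        self._push(v)
        mid = (lo + hi) // 2
        return self._merge(self.cat_top_two(i, j, 2 * v, lo, mid),
                           self.cat_top_two(i, j, 2 * v + 1, mid + 1, hi))
```

The merge at an internal node keeps only two entries, which suffices: the overall minimum over a union of intervals is one child's minimum, and the best entry of a different color is among the (at most four) entries stored at the two children.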

In order to extend this structure to a general tree TT, we first construct its heavy path decomposition. Next, we concatenate the heavy paths to form a list of all edges of TT with the property that any subtree TeT_{e} is described by a contiguous range of edges (but potentially containing many heavy paths). This is done recursively as follows. Let the topmost heavy path be h=u1u2ukh=u_{1}-u_{2}-\ldots-u_{k}. We first write down its edges (u1,u2),(u2,u3),,(uk1,uk)(u_{1},u_{2}),(u_{2},u_{3}),\ldots,(u_{k-1},u_{k}). Then, we remove them from the tree. We recurse on the trees consisting of more than one node rooted at uk,uk1,,u1u_{k},u_{k-1},\ldots,u_{1} (note that the tree rooted at uku_{k} always consists of a single node), in this order. This guarantees that, for any ehe\in h, TeT_{e} indeed consists of a contiguous range of edges, while for other edges this is guaranteed recursively.
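This recursive ordering is exactly what a DFS that always descends into the heavy (largest) child first produces. A small self-contained sketch (our own illustration; the tree is given by child lists of a rooted tree):

```python
def heavy_first_edge_order(children, root=0):
    """Concatenate the heavy paths of a rooted tree into one edge list so
    that (a) the edges of every heavy path are consecutive, and (b) the
    edges of every subtree occupy a contiguous range.  Returns (edges, span)
    where span[v] = (lo, hi) is the inclusive range of edge indices lying in
    the subtree rooted at v (lo > hi for a leaf)."""
    n = len(children)
    size = [1] * n
    order, stack = [], [root]
    while stack:                       # top-down order for size computation
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])
    for v in reversed(order):          # subtree sizes, bottom-up
        for c in children[v]:
            size[v] += size[c]
    edges, span = [], [None] * n
    def dfs(v):
        lo = len(edges)
        for c in sorted(children[v], key=lambda c: -size[c]):
            edges.append((v, c))       # heavy child first
            dfs(c)
        span[v] = (lo, len(edges) - 1)
    dfs(root)
    return edges, span
```

On the tree with child lists `[[1, 2], [3, 4], [], [], [5], []]` the heavy path 0–1–4–5 occupies the first three positions of the order, and the subtree of vertex 1 occupies the contiguous range [1, 3], matching the guarantee above.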

See Lemma 7.

Proof.

Consider a heavy path decomposition of TT, and construct the edge array A[1..(n1)]A[1..(n-1)] by concatenating the heavy paths as described above. We will use the data structure from Lemma 6 on A[1..(n1)]A[1..(n-1)]. Any path pp can be decomposed into 𝒪(logn)\mathcal{O}(\log n) infixes of heavy paths (in fact, at most one proper infix and a number of prefixes), and hence it corresponds to 𝒪(logn)\mathcal{O}(\log n) contiguous ranges of A[1..(n1)]A[1..(n-1)]. Hence we implement the first operation by making 𝒪(logn)\mathcal{O}(\log n) calls to Add(Δ,i,j)\textsc{Add}(\Delta,i,j); by Lemma 6 this takes 𝒪(log2n)\mathcal{O}(\log^{2}n) time in total. Finally, since TeT_{e} is described by a single contiguous range A[i..j]A[i..j], a categorical top-two query in TeT_{e} corresponds to the operation CatTopTwo(i,j)\textsc{CatTopTwo}(i,j), which takes time 𝒪(logn)\mathcal{O}(\log n). ∎
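The decomposition of a path into 𝒪(log n) heavy-path infixes can be sketched as follows (our own illustration, not the paper's code; `head[x]` denotes the top of x's heavy path and `pos[x]` the position of the edge (parent[x], x) in the concatenated order, so positions along a heavy path are consecutive):

```python
def build_hld(children, root=0):
    """Heavy path decomposition of a rooted tree (child lists).  Returns
    (parent, depth, head, pos): head[v] is the top vertex of v's heavy path
    and pos[v] the index of edge (parent[v], v) in the concatenated edge
    order, in which every heavy path occupies consecutive positions."""
    n = len(children)
    parent, depth, size = [-1] * n, [0] * n, [1] * n
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for c in children[v]:
            parent[c], depth[c] = v, depth[v] + 1
            stack.append(c)
    for v in reversed(order):          # subtree sizes, bottom-up
        if parent[v] != -1:
            size[parent[v]] += size[v]
    head, pos, nxt = list(range(n)), [-1] * n, 0
    stack = [root]
    while stack:                       # preorder, heavy child popped first
        v = stack.pop()
        if parent[v] != -1:
            pos[v] = nxt
            nxt += 1
        if children[v]:
            hc = max(children[v], key=lambda c: size[c])
            for c in children[v]:
                if c != hc:
                    stack.append(c)
            stack.append(hc)           # heavy child on top of the stack
            head[hc] = head[v]
    return parent, depth, head, pos

def path_ranges(parent, depth, head, pos, u, v):
    """Positions of the edges on the tree path u..v, reported as O(log n)
    contiguous ranges, each an infix of a heavy path."""
    ranges = []
    while head[u] != head[v]:
        if depth[head[u]] < depth[head[v]]:
            u, v = v, u                # jump up from the deeper heavy path
        ranges.append((pos[head[u]], pos[u]))
        u = parent[head[u]]
    if u != v:                         # final infix on a common heavy path
        if depth[u] > depth[v]:
            u, v = v, u
        ranges.append((pos[v] - (depth[v] - depth[u]) + 1, pos[v]))
    return ranges
```

Each iteration of the loop moves to a strictly higher heavy path, and any root-to-vertex path meets 𝒪(log n) heavy paths, which bounds the number of ranges.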