
© Tobias Heuer, Peter Sanders and Sebastian Schlag

Network Flow-Based Refinement for Multilevel Hypergraph Partitioning

Tobias Heuer Karlsruhe Institute of Technology, Karlsruhe, Germany
tobias.heuer@gmx.net
Peter Sanders Karlsruhe Institute of Technology, Karlsruhe, Germany
sanders@kit.edu
Sebastian Schlag Karlsruhe Institute of Technology, Karlsruhe, Germany
sebastian.schlag@kit.edu
Abstract.

We present a refinement framework for multilevel hypergraph partitioning that uses max-flow computations on pairs of blocks to improve the solution quality of a $k$-way partition. The framework generalizes the flow-based improvement algorithm of KaFFPa from graphs to hypergraphs and is integrated into the hypergraph partitioner KaHyPar. By reducing the size of hypergraph flow networks, improving the flow model used in KaFFPa, and developing techniques to improve the running time of our algorithm, we obtain a partitioner that computes the best solutions for a wide range of benchmark hypergraphs from different application areas while still having a running time comparable to that of hMetis.

Key words and phrases:
multilevel hypergraph partitioning, network flows, refinement
1991 Mathematics Subject Classification:
G.2.2 Graph Theory, G.2.3 Applications

1. Introduction

Given an undirected hypergraph $H=(V,E)$, the $k$-way hypergraph partitioning problem is to partition the vertex set into $k$ disjoint blocks of bounded size (at most $1+\varepsilon$ times the average block size) such that an objective function involving the cut hyperedges is minimized. Hypergraph partitioning (HGP) has many important applications in practice such as scientific computing [12] or VLSI design [43]. VLSI design in particular is a field where small improvements can lead to significant savings [56].

It is well known that HGP is NP-hard [38], which is why practical applications mostly use heuristic multilevel algorithms [11, 13, 25, 26]. These algorithms successively contract the hypergraph to obtain a hierarchy of smaller, structurally similar hypergraphs. After applying an initial partitioning algorithm to the smallest hypergraph, contraction is undone and, at each level, a local search method is used to improve the partition induced by the coarser level. All state-of-the-art HGP algorithms [2, 4, 7, 16, 28, 31, 32, 33, 48, 51, 52, 54] use either variations of the Kernighan-Lin (KL) [34, 49] or the Fiduccia-Mattheyses (FM) heuristic [19, 46], or simpler greedy algorithms [32, 33] for local search. These heuristics move vertices between blocks in descending order of improvements in the optimization objective (gain) and are known to be prone to getting stuck in local optima when used directly on the input hypergraph [33]. The multilevel paradigm helps to some extent, since it allows a more global view of the problem on the coarse levels and a very fine-grained view on the fine levels of the multilevel hierarchy. However, the performance of move-based approaches degrades for hypergraphs with large hyperedges. In these cases, it is difficult to find meaningful vertex moves that improve the solution quality, because large hyperedges are likely to have many vertices in multiple blocks [53]. Thus the gain of moving a single vertex to another block is likely to be zero [41].

While finding balanced minimum cuts in hypergraphs is NP-hard, a minimum cut separating two vertices can be found in polynomial time using network flow algorithms and the well-known max-flow min-cut theorem [21]. Flow algorithms find an optimal min-cut and do not suffer from the drawbacks of move-based approaches. However, they were long overlooked as heuristics for balanced partitioning due to their high complexity [40, 57]. In the context of graph partitioning, Sanders and Schulz [47] recently presented a max-flow-based improvement algorithm that is integrated into the multilevel partitioner KaFFPa and computes high-quality solutions.

Outline and Contribution.

Motivated by the results of Sanders and Schulz [47], we generalize the max-flow min-cut refinement framework of KaFFPa from graphs to hypergraphs. After introducing basic notation and giving a brief overview of related work and the techniques used in KaFFPa in Section 2, we explain how hypergraphs are transformed into flow networks and present a technique to reduce the size of the resulting hypergraph flow network in Section 3.1. In Section 3.2 we then show how this network can be used to construct a flow problem such that the min-cut induced by a max-flow computation between a pair of blocks improves the solution quality of a $k$-way partition. We furthermore identify shortcomings of the KaFFPa approach that restrict the search space of feasible solutions significantly and introduce an advanced model that overcomes these limitations by exploiting the structure of hypergraph flow networks. We implemented our algorithm in the open source HGP framework KaHyPar and therefore briefly discuss implementation details and techniques to improve the running time in Section 3.3. Extensive experiments presented in Section 4 demonstrate that our flow model yields better solutions than the KaFFPa approach for both hypergraphs and graphs. We furthermore show that using pairwise flow-based refinement significantly improves partitioning quality. The resulting hypergraph partitioner, KaHyPar-MF, performs better than all competing algorithms on all instance classes and still has a running time comparable to that of hMetis. On a large benchmark set consisting of 3222 instances from various application domains, KaHyPar-MF computes the best partitions in 2427 cases.

2. Preliminaries

2.1. Notation and Definitions

An undirected hypergraph $H=(V,E,c,\omega)$ is defined as a set of $n$ vertices $V$ and a set of $m$ hyperedges/nets $E$ with vertex weights $c:V\rightarrow\mathbb{R}_{>0}$ and net weights $\omega:E\rightarrow\mathbb{R}_{>0}$, where each net is a subset of the vertex set $V$ (i.e., $e\subseteq V$). The vertices of a net are called pins. We extend $c$ and $\omega$ to sets, i.e., $c(U):=\sum_{v\in U}c(v)$ and $\omega(F):=\sum_{e\in F}\omega(e)$. A vertex $v$ is incident to a net $e$ if $v\in e$. $\mathrm{I}(v)$ denotes the set of all incident nets of $v$. The degree of a vertex $v$ is $d(v):=|\mathrm{I}(v)|$. The size $|e|$ of a net $e$ is the number of its pins. Given a subset $V'\subset V$, the subhypergraph $H_{V'}$ is defined as $H_{V'}:=(V',\{e\cap V' \mid e\in E: e\cap V'\neq\emptyset\})$.

A $k$-way partition of a hypergraph $H$ is a partition of its vertex set into $k$ blocks $\Pi=\{V_1,\dots,V_k\}$ such that $\bigcup_{i=1}^{k}V_i=V$, $V_i\neq\emptyset$ for $1\leq i\leq k$, and $V_i\cap V_j=\emptyset$ for $i\neq j$. We call a $k$-way partition $\Pi$ $\varepsilon$-balanced if each block $V_i\in\Pi$ satisfies the balance constraint $c(V_i)\leq L_{\max}:=(1+\varepsilon)\lceil\frac{c(V)}{k}\rceil$ for some parameter $\varepsilon$. For each net $e$, $\Lambda(e):=\{V_i \mid V_i\cap e\neq\emptyset\}$ denotes the connectivity set of $e$. The connectivity of a net $e$ is $\lambda(e):=|\Lambda(e)|$. A net is called a cut net if $\lambda(e)>1$. Given a $k$-way partition $\Pi$ of $H$, the quotient graph $Q:=(\Pi,\{(V_i,V_j) \mid \exists e\in E: \{V_i,V_j\}\subseteq\Lambda(e)\})$ contains an edge between each pair of adjacent blocks. The $k$-way hypergraph partitioning problem is to find an $\varepsilon$-balanced $k$-way partition $\Pi$ of a hypergraph $H$ that minimizes an objective function over the cut nets for some $\varepsilon$. Several objective functions exist in the literature [5, 38]. The most commonly used cost functions are the cut-net metric $\text{cut}(\Pi):=\sum_{e\in E'}\omega(e)$ and the connectivity metric $(\lambda-1)(\Pi):=\sum_{e\in E'}(\lambda(e)-1)\,\omega(e)$ [1], where $E'$ is the set of all cut nets [17]. In this paper, we use the $(\lambda-1)$-metric. Optimizing either objective function is known to be NP-hard [38]. Hypergraphs can be represented as bipartite graphs [29].
In the following, we use nodes and edges when referring to graphs and vertices and nets when referring to hypergraphs. In the bipartite graph $G_*=(V\dot{\cup}E,F)$ the vertices and nets of $H$ form the node set and for each net $e\in\mathrm{I}(v)$, we add an edge $(e,v)$ to $G_*$. The edge set $F$ is thus defined as $F:=\{(e,v) \mid e\in E, v\in e\}$. Each net in $E$ therefore corresponds to a star in $G_*$.
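The two objective functions defined above can be evaluated directly from a vertex-to-block assignment. The following Python sketch is illustrative only; it assumes that nets are given as Python sets of integer vertex IDs with a parallel list of net weights, a representation chosen for this example rather than KaHyPar's data structure.

```python
from typing import Dict, List, Set


def connectivity(net: Set[int], block_of: Dict[int, int]) -> int:
    """lambda(e) = |Lambda(e)|: number of distinct blocks containing a pin of the net."""
    return len({block_of[v] for v in net})


def cut_net_metric(nets: List[Set[int]], weights: List[float],
                   block_of: Dict[int, int]) -> float:
    """cut(Pi): total weight of the cut nets, i.e. nets with lambda(e) > 1."""
    return sum(w for net, w in zip(nets, weights) if connectivity(net, block_of) > 1)


def connectivity_metric(nets: List[Set[int]], weights: List[float],
                        block_of: Dict[int, int]) -> float:
    """(lambda - 1)(Pi): sum of (lambda(e) - 1) * omega(e).
    Uncut nets contribute 0, so summing over all nets equals summing over E'."""
    return sum((connectivity(net, block_of) - 1) * w for net, w in zip(nets, weights))


# Hypothetical 3-way partition of six vertices.
nets = [{0, 1, 2}, {2, 3}, {0, 3, 4, 5}]
weights = [2, 1, 3]
block_of = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
print(cut_net_metric(nets, weights, block_of))       # 5
print(connectivity_metric(nets, weights, block_of))  # 8
```

The third net spans three blocks, so it contributes its weight once to the cut-net metric but twice to the connectivity metric.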

Let $G=(V,E,c,\omega)$ be a weighted directed graph. We use the same notation as for hypergraphs to refer to node weights $c$, edge weights $\omega$, and node degrees $d(v)$. Furthermore, $\Gamma(u):=\{v:(u,v)\in E\}$ denotes the neighbors of node $u$. A path $P=\langle v_1,\ldots,v_k\rangle$ is a sequence of nodes such that each pair of consecutive nodes is connected by an edge. A strongly connected component $C\subseteq V$ is a maximal set of nodes such that for each $u,v\in C$ there exists a path from $u$ to $v$. A topological ordering is a linear ordering $\prec$ of $V$ such that every directed edge $(u,v)\in E$ implies $u\prec v$ in the ordering. A set of nodes $B\subseteq V$ is called a closed set iff there are no outgoing edges leaving $B$, i.e., if the conditions $u\in B$ and $(u,v)\in E$ imply $v\in B$. A subset $S\subset V$ is called a node separator if its removal divides $G$ into two disconnected components.

A flow network $\mathcal{N}=(\mathcal{V},\mathcal{E},\mathpzc{c})$ is a directed graph with two distinguished nodes $\mathpzc{s}$ and $\mathpzc{t}$ in which each edge $e\in\mathcal{E}$ has a capacity $\mathpzc{c}(e)\geq 0$. An $(\mathpzc{s},\mathpzc{t})$-flow (or flow) is a function $f:\mathcal{V}\times\mathcal{V}\rightarrow\mathbb{R}$ that satisfies the capacity constraint $\forall u,v\in\mathcal{V}: f(u,v)\leq\mathpzc{c}(u,v)$, the skew symmetry constraint $\forall (u,v)\in\mathcal{V}\times\mathcal{V}: f(u,v)=-f(v,u)$, and the flow conservation constraint $\forall u\in\mathcal{V}\setminus\{\mathpzc{s},\mathpzc{t}\}: \sum_{v\in\mathcal{V}}f(u,v)=0$. The value of a flow $|f|:=\sum_{v\in\mathcal{V}}f(\mathpzc{s},v)$ is defined as the total amount of flow transferred from $\mathpzc{s}$ to $\mathpzc{t}$. The residual capacity is defined as $r_f(u,v)=\mathpzc{c}(u,v)-f(u,v)$. Given a flow $f$, $\mathcal{N}_f=(\mathcal{V},\mathcal{E}_f,r_f)$ with $\mathcal{E}_f=\{(u,v)\in\mathcal{V}\times\mathcal{V} \mid r_f(u,v)>0\}$ is the residual network. An $(\mathpzc{s},\mathpzc{t})$-cut (or cut) is a bipartition $(\mathcal{S},\mathcal{V}\setminus\mathcal{S})$ of a flow network $\mathcal{N}$ with $\mathpzc{s}\in\mathcal{S}\subset\mathcal{V}$ and $\mathpzc{t}\in\mathcal{V}\setminus\mathcal{S}$. The capacity of an $(\mathpzc{s},\mathpzc{t})$-cut is defined as $\sum_{e\in\mathcal{E}'}\mathpzc{c}(e)$, where $\mathcal{E}'=\{(u,v)\in\mathcal{E}: u\in\mathcal{S}, v\in\mathcal{V}\setminus\mathcal{S}\}$. The max-flow min-cut theorem states that the value $|f|$ of a maximum flow is equal to the capacity of a minimum cut separating $\mathpzc{s}$ and $\mathpzc{t}$ [21].
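To make the definitions of residual networks and minimum cuts concrete, here is a minimal sketch based on the textbook Edmonds-Karp algorithm [18]; it is not meant to reflect the max-flow algorithm used in KaFFPa or KaHyPar. The dictionary-based representation and the function name `max_flow_min_cut` are assumptions of this example. After the last augmentation, the nodes reachable from $\mathpzc{s}$ in $\mathcal{N}_f$ form the source side of a minimum cut, as the max-flow min-cut theorem guarantees.

```python
from collections import defaultdict, deque


def max_flow_min_cut(capacity, s, t):
    """Edmonds-Karp: augment along shortest residual paths until t is unreachable.

    capacity maps directed edges (u, v) to c(u, v) >= 0 (float("inf") allowed).
    Returns (|f|, S) where S is the set of nodes reachable from s in the
    residual network N_f; (S, V minus S) is then a minimum (s, t)-cut.
    """
    residual = defaultdict(float)  # r_f(u, v) = c(u, v) - f(u, v)
    adj = defaultdict(set)
    for (u, v), c in capacity.items():
        residual[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)  # reverse edge; its residual capacity starts at 0
    value = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS in N_f
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            # BFS was exhausted, so parent holds every node reachable from s
            return value, set(parent)
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(residual[e] for e in path)  # bottleneck residual capacity
        for u, v in path:
            residual[(u, v)] -= delta
            residual[(v, u)] += delta  # skew symmetry: f(v, u) = -f(u, v)
        value += delta


capacity = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 2, ("b", "t"): 3}
print(max_flow_min_cut(capacity, "s", "t"))  # (5.0, {'s'})
```

Here both edges leaving $\mathpzc{s}$ are saturated, so the residual BFS cannot leave $\mathpzc{s}$ and the minimum cut has capacity $3+2=5$.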

2.2. Related Work

Hypergraph Partitioning.

Driven by applications in VLSI design and scientific computing, HGP has evolved into a broad research area since the 1990s. We refer to [5, 8, 43, 50] for an extensive overview. Well-known multilevel HGP software packages with certain distinguishing characteristics include PaToH [12] (originating from scientific computing), hMetis [32, 33] (originating from VLSI design), KaHyPar [2, 28, 48] (general purpose, $n$-level), Mondriaan [54] (sparse matrix partitioning), MLPart [4] (circuit partitioning), Zoltan [16], Parkway [51] and SHP [31] (distributed), UMPa [52] (directed hypergraph model, multi-objective), and kPaToH (multiple constraints, fixed vertices) [7]. All of these tools either use variations of the Kernighan-Lin (KL) [34, 49] or the Fiduccia-Mattheyses (FM) heuristic [19, 46], or algorithms that greedily move vertices [33] or nets [32] to improve solution quality in the refinement phase.

Flows on Hypergraphs.

While flow-based approaches have not yet been considered as refinement algorithms for multilevel HGP, several works deal with flow-based hypergraph min-cut computation. The problem of finding minimum $(\mathpzc{s},\mathpzc{t})$-cuts in hypergraphs was first considered by Lawler [36], who showed that it can be reduced to computing maximum flows in directed graphs. Hu and Moerder [29] present an augmenting path algorithm to compute a minimum-weight vertex separator on the star-expansion of the hypergraph. Their vertex-capacitated network can also be transformed into an edge-capacitated network using a transformation due to Lawler [37]. Yang and Wong [57] use repeated, incremental max-flow min-cut computations on the Lawler network [36] to find $\varepsilon$-balanced hypergraph bipartitions. Solution quality and running time of this algorithm are improved by Lillis and Cheng [39] by introducing advanced heuristics to select source and sink nodes. Furthermore, they present a preflow-based [22] min-cut algorithm that implicitly operates on the star-expanded hypergraph. Pistorius and Minoux [45] generalize the algorithm of Edmonds and Karp [18] to hypergraphs by labeling both vertices and nets. Liu and Wong [40] simplify Lawler’s hypergraph flow network [36] by explicitly distinguishing between graph edges and hyperedges with three or more pins. This approach significantly reduces the size of flow networks derived from VLSI hypergraphs, since most of the nets in a circuit are graph edges. Note that the above-mentioned approaches to model hypergraphs as flow networks for max-flow min-cut computations do not contradict the negative results of Ihler et al. [30], who show that, in general, there does not exist an edge-weighted graph $G=(V,E)$ that correctly represents the min-cut properties of the corresponding hypergraph $H=(V,E)$.

Flow-Based Graph Partitioning.

Flow-based refinement algorithms for graph partitioning include Improve [6] and MQI [35], which improve the expansion or conductance of bipartitions. MQI also yields a small improvement when used as a post-processing technique on hypergraph bipartitions initially computed by hMetis [35]. FlowCutter [24] uses an approach similar to that of Yang and Wong [57] to compute graph bisections that are Pareto-optimal with regard to cut size and balance. Sanders and Schulz [47] present a flow-based refinement framework for their direct $k$-way graph partitioner KaFFPa. The algorithm works on pairs of adjacent blocks and constructs flow problems such that each min-cut in the flow network is a feasible solution with regard to the original partitioning problem.

KaHyPar.

Since our algorithm is integrated into the KaHyPar framework, we briefly review its core components. While traditional multilevel HGP algorithms contract matchings or clusterings and therefore work with a coarsening hierarchy of $\mathcal{O}(\log n)$ levels, KaHyPar instantiates the multilevel paradigm in the extreme $n$-level version, removing only a single vertex between two levels. After coarsening, a portfolio of simple algorithms is used to create an initial partition of the coarsest hypergraph. During uncoarsening, strong localized local search heuristics based on the FM algorithm [19, 46] are used to refine the solution. Our work builds on KaHyPar-CA [28], which is a direct $k$-way partitioning algorithm for optimizing the $(\lambda-1)$-metric. It uses an improved coarsening scheme that incorporates global information about the community structure of the hypergraph into the coarsening process.

2.3. The Flow-Based Improvement Framework of KaFFPa

We discuss the framework of Sanders and Schulz [47] in greater detail, since our work makes use of the techniques proposed by the authors. For simplicity, we assume $k=2$. The techniques can be applied to a $k$-way partition by repeatedly executing the algorithm on pairs of adjacent blocks. To schedule these refinements, the authors propose an active block scheduling algorithm, which schedules blocks as long as their participation in a pairwise refinement step results in some changes in the $k$-way partition.

An $\varepsilon$-balanced bipartition of a graph $G=(V,E,c,\omega)$ is improved with flow computations as follows. The basic idea is to construct a flow network $\mathcal{N}$ based on the induced subgraph $G[B]$, where $B\subseteq V$ is a set of nodes around the cut of $G$. The size of $B$ is controlled by an imbalance factor $\varepsilon':=\alpha\varepsilon$, where $\alpha$ is a scaling parameter that is chosen adaptively depending on the result of the min-cut computation. If the heuristic found an $\varepsilon$-balanced partition using $\varepsilon'$, the cut is accepted and $\alpha$ is increased to $\min(2\alpha,\alpha')$, where $\alpha'$ is a predefined upper bound. Otherwise it is decreased to $\max(\frac{\alpha}{2},1)$. This scheme continues until a maximal number of rounds is reached or a feasible partition that did not improve the cut is found.
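The adaptive choice of $\alpha$ can be replayed as a small state machine. In this hypothetical sketch, the outcome of each round (whether the min-cut heuristic returned a feasible partition, and whether that partition improved the cut) is supplied as input instead of being computed, and the starting value `alpha_start` is a parameter because the description above does not fix it.

```python
def adaptive_alpha_schedule(outcomes, alpha_start, alpha_max, max_rounds):
    """Replay the alpha update rule on given round outcomes.

    outcomes: hypothetical list of (feasible, improved) pairs, one per round, where
    feasible means an eps-balanced partition was found using eps' = alpha * eps and
    improved means that partition has a smaller cut.
    Returns the alpha used in each executed round.
    """
    alpha, used = alpha_start, []
    for feasible, improved in outcomes[:max_rounds]:
        used.append(alpha)
        if feasible and not improved:
            break  # feasible partition that did not improve the cut: stop
        if feasible:
            alpha = min(2 * alpha, alpha_max)  # accept the cut, try a larger corridor
        else:
            alpha = max(alpha / 2, 1)  # infeasible: shrink the corridor
    return used


print(adaptive_alpha_schedule([(True, True), (True, True), (False, False), (True, False)],
                              alpha_start=1, alpha_max=4, max_rounds=10))  # [1, 2, 4, 2.0]
```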

In each round, the corridor $B:=B_1\cup B_2$ is constructed by performing two breadth-first searches (BFS). The first BFS is done in the induced subgraph $G[V_1]$. It is initialized with the boundary nodes of $V_1$ and stops if $c(B_1)$ would exceed $(1+\varepsilon')\lceil\frac{c(V)}{2}\rceil-c(V_2)$. The second BFS constructs $B_2$ in an analogous fashion using $G[V_2]$. Let $\delta B:=\{u\in B \mid \exists(u,v)\in E: v\notin B\}$ be the border of $B$. Then $\mathcal{N}$ is constructed by connecting all border nodes $\delta B\cap V_1$ of $G[B]$ to the source $\mathpzc{s}$ and all border nodes $\delta B\cap V_2$ to the sink $\mathpzc{t}$ using directed edges with an edge weight of $\infty$. By connecting $\mathpzc{s}$ and $\mathpzc{t}$ to the respective border nodes, it is ensured that edges incident to border nodes, but not contained in $G[B]$, cannot become cut edges. For $\alpha=1$, the size of $B$ thus ensures that the flow network $\mathcal{N}$ has the cut property, i.e., each $(\mathpzc{s},\mathpzc{t})$-min-cut in $\mathcal{N}$ yields an $\varepsilon$-balanced partition of $G$ with a possibly smaller cut. For larger values of $\alpha$, this does not have to be the case.
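A sketch of one of the two corridor searches follows, assuming an adjacency-list graph with node weights stored in Python dicts; the helper names are hypothetical. $B_2$ would be built the same way with the roles of $V_1$ and $V_2$ swapped. The bound guarantees that even moving all of $B_1$ to $V_2$ keeps $c(V_2)+c(B_1)\leq(1+\varepsilon')\lceil\frac{c(V)}{2}\rceil$.

```python
import math
from collections import deque


def corridor_budget(total_weight, other_block_weight, eps_prime):
    """Upper bound on c(B_1): (1 + eps') * ceil(c(V) / 2) - c(V_2)."""
    return (1 + eps_prime) * math.ceil(total_weight / 2) - other_block_weight


def grow_corridor_side(adj, weight, block, boundary, budget):
    """BFS inside `block`, seeded with its boundary nodes; stops as soon as adding
    the next node would push the collected weight above `budget`."""
    side, total = set(), 0
    seen, queue = set(boundary), deque(boundary)
    while queue:
        u = queue.popleft()
        if total + weight[u] > budget:
            break  # c(B_1) would exceed the bound
        side.add(u)
        total += weight[u]
        for v in adj[u]:
            if v in block and v not in seen:  # stay inside G[V_1]
                seen.add(v)
                queue.append(v)
    return side


# Hypothetical path 0-1-2-3-4-5 with unit weights, V_1 = {0, 1, 2}, V_2 = {3, 4, 5}.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
weight = {v: 1 for v in adj}
V1 = {0, 1, 2}
print(grow_corridor_side(adj, weight, V1, [2], corridor_budget(6, 3, 0.5)))  # {2}
print(grow_corridor_side(adj, weight, V1, [2], corridor_budget(6, 3, 1.0)))  # {0, 1, 2}
```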

After computing a max-flow in $\mathcal{N}$, the algorithm tries to find a min-cut with better balance. This is done by exploiting the fact that one $(\mathpzc{s},\mathpzc{t})$-max-flow contains information about all $(\mathpzc{s},\mathpzc{t})$-min-cuts [44]. More precisely, the algorithm uses the one-to-one correspondence between $(\mathpzc{s},\mathpzc{t})$-min-cuts and closed sets containing $\mathpzc{s}$ in the Picard-Queyranne-DAG $D_{\mathpzc{s},\mathpzc{t}}$ of the residual graph $\mathcal{N}_f$ [44]. First, $D_{\mathpzc{s},\mathpzc{t}}$ is constructed by contracting each strongly connected component of the residual graph. Then the following heuristic (called most balanced minimum cuts) is repeated several times using different random seeds. Closed node sets containing $\mathpzc{s}$ are computed by sweeping through the nodes of $D_{\mathpzc{s},\mathpzc{t}}$ in reverse topological order (e.g., computed using a randomized DFS). Each closed set induces a differently balanced min-cut, and the one with the best balance (with respect to the original balance constraint) is used as the resulting bipartition.
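The sweep over closed sets can be sketched as follows. This is a simplified, deterministic version under stated assumptions: the residual network is given as a successor dict of edges with positive residual capacity, balance is measured as the distance of the source side's weight from a caller-supplied `target`, and a single Kosaraju ordering replaces the repeated randomized DFS runs. Components that can reach $\mathpzc{t}$ are skipped, since any closed set containing them would also contain $\mathpzc{t}$.

```python
def reachable(start, succ):
    """All nodes reachable from `start` along succ (iterative DFS)."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in succ.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def transpose(nodes, succ):
    """Reverse every edge of the successor dict."""
    pred = {u: [] for u in nodes}
    for u in nodes:
        for v in succ.get(u, ()):
            pred[v].append(u)
    return pred


def scc_topological(nodes, succ):
    """Kosaraju's algorithm: SCCs in topological order of the condensation DAG."""
    order, seen = [], set()

    def dfs(u):  # recursive DFS; adequate for small examples only
        seen.add(u)
        for v in succ.get(u, ()):
            if v not in seen:
                dfs(v)
        order.append(u)  # post-order, i.e. by finishing time

    for u in nodes:
        if u not in seen:
            dfs(u)
    pred, assigned, comps = transpose(nodes, succ), set(), []
    for u in reversed(order):  # decreasing finishing time
        if u in assigned:
            continue
        comp, stack = [], [u]
        assigned.add(u)
        while stack:
            x = stack.pop()
            comp.append(x)
            for y in pred[x]:
                if y not in assigned:
                    assigned.add(y)
                    stack.append(y)
        comps.append(comp)
    return comps


def most_balanced_min_cut(nodes, residual_succ, s, t, weight, target):
    """Return the source side of the min-cut whose weight is closest to `target`."""
    def size(side):
        return sum(weight.get(v, 0) for v in side)

    reaches_t = reachable(t, transpose(nodes, residual_succ))
    current = reachable(s, residual_succ)  # smallest closed set containing s
    best, best_gap = set(current), abs(size(current) - target)
    for comp in reversed(scc_topological(nodes, residual_succ)):
        if comp[0] in current or comp[0] in reaches_t:
            continue  # already inside, or adding it would pull t in
        current |= set(comp)  # all successors are already inside: still closed
        gap = abs(size(current) - target)
        if gap < best_gap:
            best, best_gap = set(current), gap
    return best


nodes = ["s", "a", "b", "t"]
residual_succ = {"a": ["s"], "b": ["a"], "t": ["b"]}
print(most_balanced_min_cut(nodes, residual_succ, "s", "t", {"a": 1, "b": 1}, 1))
```

Processing components in reverse topological order means every successor of a component is visited before the component itself, so each prefix of the sweep is a closed set and thus corresponds to a min-cut.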

3. Hypergraph Max-Flow Min-Cut Refinement

In the following, we generalize the flow-based refinement algorithm of KaFFPa to hypergraph partitioning. In Section 3.1 we first show how hypergraph flow networks $\mathcal{N}$ are constructed in general and introduce a technique to reduce their size by removing low-degree hypernodes. Given a $k$-way partition $\Pi_k=\{V_1,\dots,V_k\}$ of a hypergraph $H=(V,E)$, a pair of blocks $(V_i,V_j)$ adjacent in the quotient graph $Q$, and a corridor $B$, Section 3.2 then explains how $\mathcal{N}$ is used to build a flow problem $\mathcal{F}$ based on a $B$-induced subhypergraph $H_B=(V_B,E_B)$. The flow problem $\mathcal{F}$ is constructed such that an $(\mathpzc{s},\mathpzc{t})$-max-flow computation optimizes the cut metric of the bipartition $\Pi_2=(V_i,V_j)$ of $H_B$ and thus improves the $(\lambda-1)$-metric in $H$. Section 3.3 then discusses the integration into KaHyPar and introduces several techniques to speed up flow-based refinement. Algorithm 1 gives a pseudocode description of the entire flow-based refinement framework.

Input: Hypergraph $H$, $k$-way partition $\Pi_k=\{V_1,\dots,V_k\}$, imbalance parameter $\varepsilon$.
Algorithm MaxFlowMinCutRefinement($H,\Pi_k$)
  $Q:=$ QuotientGraph($H,\Pi_k$)
  while $\exists$ active blocks $\in Q$ do   // in the beginning all blocks are active
    foreach $\{(V_i,V_j)\in Q \mid V_i\vee V_j$ is active$\}$ do   // choose a pair of blocks
      $\Pi_{\text{old}}=\Pi_{\text{best}}:=\{V_i,V_j\}\subseteq\Pi_k$   // extract bipartition to be improved
      $\varepsilon_{\text{old}}=\varepsilon_{\text{best}}:=$ imbalance($\Pi_k$)   // imbalance of current $k$-way partition
      $\alpha:=\alpha'$   // use large $B$-corridor for first iteration
      do   // adaptive flow iterations
        $B:=$ computeB-Corridor($H,\Pi_{\text{best}},\alpha\varepsilon$)   // as described in Section 2.3
        $H_B:=$ SubHypergraph($H,B$)   // create $B$-induced subhypergraph
        $\mathcal{N}_B:=$ FlowNetwork($H_B$)   // as described in Section 3.1
        $\mathcal{F}:=$ FlowProblem($\mathcal{N}_B$)   // as described in Section 3.2
        $f:=$ maxFlow($\mathcal{F}$)   // compute maximum flow on $\mathcal{F}$
        $\Pi_f:=$ mostBalancedMinCut($f,\mathcal{F}$)   // as in Sections 2.3 & 3.1
        $\varepsilon_f:=$ imbalance($\Pi_f\cup\Pi_k\setminus\Pi_{\text{old}}$)   // imbalance of new $k$-way partition
        if $(\text{cut}(\Pi_f)<\text{cut}(\Pi_{\text{best}})\wedge\varepsilon_f\leq\varepsilon)\vee\varepsilon_f<\varepsilon_{\text{best}}$ then   // found improvement
          $\alpha:=\min(2\alpha,\alpha')$, $\Pi_{\text{best}}:=\Pi_f$, $\varepsilon_{\text{best}}:=\varepsilon_f$   // update $\alpha$, $\Pi_{\text{best}}$, $\varepsilon_{\text{best}}$
        else $\alpha:=\frac{\alpha}{2}$   // decrease size of $B$-corridor in next iteration
      while $\alpha\geq 1$
      if $\Pi_{\text{best}}\neq\Pi_{\text{old}}$ then   // improvement found
        $\Pi_k:=\Pi_{\text{best}}\cup\Pi_k\setminus\Pi_{\text{old}}$   // replace $\Pi_{\text{old}}$ with $\Pi_{\text{best}}$
        activateForNextRound($V_i,V_j$)   // reactivate blocks for next round
  return $\Pi_k$
Output: improved $\varepsilon$-balanced $k$-way partition $\Pi_k=\{V_1,\dots,V_k\}$
Algorithm 1 Flow-Based Refinement
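The two outer loops of Algorithm 1 (active block scheduling) can be sketched in Python as below. The adaptive flow iterations are abstracted into a hypothetical callback `refine_pair`, and the sketch assumes that the set of quotient-graph edges does not change during refinement, which Algorithm 1 does not state explicitly.

```python
from typing import Callable, List, Tuple


def active_block_scheduling(quotient_edges: List[Tuple[int, int]],
                            refine_pair: Callable[[int, int], bool]) -> int:
    """Refine every quotient-graph edge with at least one active endpoint; both
    blocks of a pair that changed are reactivated for the next round.

    refine_pair(i, j) is a hypothetical callback standing in for the adaptive flow
    iterations on (V_i, V_j); it returns True iff Pi_best differs from Pi_old.
    Returns the number of rounds executed.
    """
    active = {b for edge in quotient_edges for b in edge}  # initially all blocks are active
    rounds = 0
    while active:
        rounds += 1
        next_active = set()
        for i, j in quotient_edges:
            if (i in active or j in active) and refine_pair(i, j):
                next_active.update((i, j))  # activateForNextRound(V_i, V_j)
        active = next_active
    return rounds


calls = []


def refine_pair(i, j):
    calls.append((i, j))
    return len(calls) == 1  # toy stand-in: only the very first refinement improves


print(active_block_scheduling([(0, 1), (1, 2)], refine_pair))  # 2
print(calls)  # [(0, 1), (1, 2), (0, 1), (1, 2)]
```

In the second round only blocks 0 and 1 are active, but the pair $(V_1,V_2)$ is still refined because one of its endpoints is active.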

3.1. Hypergraph Flow Networks

The Liu-Wong Network [40].

Given a hypergraph $H=(V,E,c,\omega)$ and two distinct nodes $\mathpzc{s}$ and $\mathpzc{t}$, an $(\mathpzc{s},\mathpzc{t})$-min-cut can be computed by finding a minimum-capacity cut in the following flow network $\mathcal{N}=(\mathcal{V},\mathcal{E})$:

  • $\mathcal{V}$ contains all vertices in $V$.

  • For each multi-pin net $e\in E$ with $|e|\geq 3$, add two bridging nodes $e'$ and $e''$ to $\mathcal{V}$ and a bridging edge $(e',e'')$ with capacity $\mathpzc{c}(e',e'')=\omega(e)$ to $\mathcal{E}$. For each pin $p\in e$, add two edges $(p,e')$ and $(e'',p)$ with capacity $\infty$ to $\mathcal{E}$.

  • For each two-pin net $e=(u,v)\in E$, add two bridging edges $(u,v)$ and $(v,u)$ with capacity $\omega(e)$ to $\mathcal{E}$.
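The construction above translates into a few lines of Python. This sketch is illustrative: the node encoding (`("v", v)` for vertices, `("in", i)` and `("out", i)` for the bridging nodes $e'$ and $e''$ of net $i$) is an assumption, parallel two-pin nets are merged by adding their weights, and single-pin nets are skipped because they can never be cut.

```python
INF = float("inf")


def liu_wong_network(nets, weights):
    """Edge capacities of the Liu-Wong flow network of a hypergraph."""
    cap = {}
    for i, (net, w) in enumerate(zip(nets, weights)):
        pins = sorted(net)
        if len(pins) == 2:  # two-pin net: two bridging edges between its pins
            u, v = pins
            for a, b in ((u, v), (v, u)):
                key = (("v", a), ("v", b))
                cap[key] = cap.get(key, 0) + w
        elif len(pins) >= 3:  # multi-pin net: bridging edge e' -> e'' with capacity w(e)
            cap[(("in", i), ("out", i))] = w
            for p in pins:
                cap[(("v", p), ("in", i))] = INF  # p -> e'
                cap[(("out", i), ("v", p))] = INF  # e'' -> p
    return cap


cap = liu_wong_network([{0, 1}, {0, 1, 2}], [2, 3])
print(len(cap))                      # 9
print(cap[(("in", 1), ("out", 1))])  # 3
print(cap[(("v", 0), ("v", 1))])     # 2
```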

The flow network of Lawler [36] does not distinguish between two-pin and multi-pin nets. This increases the size of the network by two vertices and three edges per two-pin net. Figure 1 shows an example of the Lawler and Liu-Wong hypergraph flow networks as well as of our network described in the following paragraph.
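The size difference can be checked with a small counting function. It is a sketch that assumes every net has at least two pins and counts only the nodes and edges contributed by nets, since the $|V|$ vertex nodes are common to both networks.

```python
def network_sizes(net_sizes):
    """(nodes, edges) contributed by nets to the Lawler and the Liu-Wong network.

    Lawler: every net gets e', e'', one bridging edge and 2|e| infinite edges.
    Liu-Wong: the same for |e| >= 3, but a two-pin net only gets two bridging edges.
    """
    lawler = (2 * len(net_sizes), sum(1 + 2 * s for s in net_sizes))
    multi = [s for s in net_sizes if s >= 3]
    two_pin = len(net_sizes) - len(multi)
    liu_wong = (2 * len(multi), 2 * two_pin + sum(1 + 2 * s for s in multi))
    return lawler, liu_wong


print(network_sizes([2, 2, 3]))  # ((6, 17), (2, 11))
```

Each two-pin net accounts for a difference of two nodes and three edges: five edges in the Lawler network versus two in the Liu-Wong network.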

Removing Low Degree Hypernodes.

We further decrease the size of the network by using the observation that the problem of finding an $(\mathpzc{s},\mathpzc{t})$-min-cut of $H$ can be reduced to finding a minimum-weight $(\mathpzc{s},\mathpzc{t})$-vertex-separator in the star-expansion, where the capacity of each star-node is the weight of the corresponding net and all other nodes (corresponding to vertices in $H$) have infinite capacity [29]. Since the separator has to be a subset of the star-nodes, it is possible to replace any infinite-capacity node by adding a clique between all adjacent star-nodes without affecting the separator. The key observation now is that an infinite-capacity node $v$ with degree $d(v)$ induces $2d(v)$ infinite-capacity edges in the Lawler network [36], while a clique between star-nodes induces $d(v)(d(v)-1)$ edges. For hypernodes with $d(v)\leq 3$, it therefore holds that $d(v)(d(v)-1)\leq 2d(v)$, since $d(v)-1\leq 2$. Thus we can reduce the number of nodes and edges of the Liu-Wong network as follows. Before applying the transformation on the star-expansion of $H$, we remove all infinite-capacity nodes $v$ corresponding to hypernodes with $d(v)\leq 3$ that are not incident to any two-pin nets and add a clique between all star-nodes adjacent to $v$. In case $v$ was a source or sink node, we create a multi-source multi-sink problem by adding all adjacent star-nodes to the set of sources resp. sinks [20].
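The sparsification rule can be sketched as follows. In the directed network, every infinite path $e''\rightarrow v\rightarrow f'$ through a removed vertex $v$ is replaced by a direct edge $(e'',f')$ for each ordered pair of distinct nets $e,f\in\mathrm{I}(v)$, which gives the $d(v)(d(v)-1)$ clique edges. Bridging nodes are encoded as `("in", i)` for $e'$ and `("out", i)` for $e''$, an assumption of this sketch, and the multi-source multi-sink handling of removed terminals is omitted.

```python
from itertools import permutations


def low_degree_replacement(nets):
    """Vertices removed by the sparsification and the clique edges replacing them.

    A vertex v is removed if d(v) <= 3 and none of its nets is a two-pin net.
    Its 2*d(v) edges (v, e') and (e'', v) are replaced by the edges (e'', f')
    for all ordered pairs of distinct nets e, f in I(v).
    """
    incident = {}
    for i, net in enumerate(nets):
        for v in net:
            incident.setdefault(v, []).append(i)
    removed, clique = set(), set()
    for v, nets_of_v in incident.items():
        if len(nets_of_v) <= 3 and all(len(nets[i]) >= 3 for i in nets_of_v):
            removed.add(v)
            for e, f in permutations(nets_of_v, 2):
                clique.add((("out", e), ("in", f)))  # path e'' -> v -> f' becomes e'' -> f'
    return removed, clique


removed, clique = low_degree_replacement([{0, 1, 2}, {0, 2, 3}, {3, 4}])
print(sorted(removed))  # [0, 1, 2]
print(sorted(clique))   # [(('out', 0), ('in', 1)), (('out', 1), ('in', 0))]
```

Vertices 3 and 4 are kept because they are incident to the two-pin net $\{3,4\}$, which has no bridging nodes in the Liu-Wong network.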

Figure 1. Illustration of hypergraph flow networks. Our approach further sparsifies the flow network of Liu and Wong [40]. Thin edges have infinite capacity.

Reconstructing Min-Cuts.

After computing an $(\mathpzc{s},\mathpzc{t})$-max-flow in the Lawler or Liu-Wong network, an $(\mathpzc{s},\mathpzc{t})$-min-cut of $H$ can be computed by a BFS in the residual graph starting from $\mathpzc{s}$. Let $S$ be the set of nodes corresponding to vertices of $H$ reached by the BFS. Then $(S,V\setminus S)$ is an $(\mathpzc{s},\mathpzc{t})$-min-cut. Since our network does not contain low degree hypernodes, we use the following lemma to compute an $(\mathpzc{s},\mathpzc{t})$-min-cut of $H$:

Lemma 3.1.

Let $f$ be a maximum $(\mathpzc{s},\mathpzc{t})$-flow in the Lawler network $\mathcal{N}=(\mathcal{V},\mathcal{E})$ of a hypergraph $H=(V,E)$ and $(\mathcal{S},\mathcal{V}\setminus\mathcal{S})$ be the corresponding $(\mathpzc{s},\mathpzc{t})$-min-cut in $\mathcal{N}$. Then for each node $v\in\mathcal{S}\cap V$, the residual graph $\mathcal{N}_f=(\mathcal{V}_f,\mathcal{E}_f)$ contains at least one path $\langle\mathpzc{s},\dots,e''\rangle$ to a bridging node $e''$ of a net $e\in\mathrm{I}(v)$.

Proof 3.2.

Since $v\in\mathcal{S}$, there has to be some path $\mathpzc{s}\rightsquigarrow v$ in $\mathcal{N}_f$. By definition of the flow network, this path can either be of the form $P_1=\langle\mathpzc{s},\dots,e'',v\rangle$ or $P_2=\langle\mathpzc{s},\dots,e',v\rangle$ for some bridging nodes $e',e''$ corresponding to nets $e\in\mathrm{I}(v)$. In the former case we are done, since $e''\in P_1$. In the latter case the existence of edge $(e',v)\in\mathcal{E}_f$ implies that there is a positive flow $f(v,e')>0$ over edge $(v,e')\in\mathcal{E}$. Due to flow conservation, there exists at least one edge $(e'',v)\in\mathcal{E}$ (possibly belonging to a different net in $\mathrm{I}(v)$) with $f(e'',v)>0$, which implies that $(v,e'')\in\mathcal{E}_f$. Thus we can extend the path $P_2$ to $\langle\mathpzc{s},\dots,e',v,e''\rangle$.

Thus $(A,V\setminus A)$ is an $(\mathpzc{s},\mathpzc{t})$-min-cut of $H$, where $A:=\{v\in e \mid \exists e\in E: \langle\mathpzc{s},\dots,e''\rangle \text{ in } \mathcal{N}_f\}$. Furthermore, this allows us to search for more balanced min-cuts using the Picard-Queyranne-DAG of $\mathcal{N}_f$ as described in Section 2.3. By the definition of closed sets it follows that if a bridging node $e''$ is contained in a closed set $C$, then all nodes $v\in\Gamma(e'')$ (which correspond to vertices of $H$) are also contained in $C$. Thus we can use the respective bridging nodes $e''$ as representatives of removed low degree hypernodes.
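Reconstructing $A$ only requires a BFS in $\mathcal{N}_f$ followed by a scan over the nets. In this minimal sketch, the residual network is assumed to be given as a successor dict containing only edges with positive residual capacity, and `("out", i)` encodes the bridging node $e''$ of net $i$; both are choices of this example.

```python
from collections import deque


def source_side_vertices(residual_succ, s, nets):
    """A := union of all nets e whose bridging node e'' is reachable from s in N_f."""
    seen, queue = {s}, deque([s])
    while queue:  # BFS from s in the residual network
        u = queue.popleft()
        for v in residual_succ.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    A = set()
    for i, net in enumerate(nets):
        if ("out", i) in seen:
            A.update(net)  # e'' reached: all pins of e belong to the source side
    return A


residual_succ = {"s": [("in", 0)], ("in", 0): [("out", 0)], ("out", 1): ["t"]}
print(source_side_vertices(residual_succ, "s", [{1, 2}, {2, 3}]))  # {1, 2}
```

Because this uses only the bridging nodes, it also works when low degree hypernodes have been removed from the network.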

3.2. Constructing the Hypergraph Flow Problem

Let $H_B=(V_B,E_B)$ be the subhypergraph of $H=(V,E)$ that is induced by a corridor $B$ computed in the bipartition $\Pi_2=(V_i,V_j)$. In the following, we distinguish between the set of internal border nodes $\overrightarrow{B}:=\{v\in V_B \mid \exists e\in E: \{u,v\}\subseteq e\wedge u\notin V_B\}$ and the set of external border nodes $\overleftarrow{B}:=\{u\notin V_B \mid \exists e\in E: \{u,v\}\subseteq e\wedge v\in V_B\}$. Similarly, we distinguish between external nets ($e\cap V_B=\emptyset$) with no pins inside $H_B$, internal nets ($e\cap V_B=e$) with all pins inside $H_B$, and border nets $e\in\mathrm{I}(\overrightarrow{B})\cap\mathrm{I}(\overleftarrow{B})$ with some pins inside $H_B$ and some pins outside of $H_B$. We use $\overleftrightarrow{E_B}$ to denote the set of border nets.
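The classification can be computed in a single pass over the nets. The sketch below assumes nets are Python sets of vertex IDs and $V_B$ is a set; it returns net indices for the three classes together with the two border node sets.

```python
def classify_nets(nets, VB):
    """Classify nets w.r.t. the corridor vertex set V_B and compute border nodes.

    Returns (internal, border, external, internal_border, external_border): the
    first three are lists of net indices, the last two are the vertex sets
    ->B (pins of border nets inside V_B) and <-B (pins of border nets outside V_B).
    """
    internal, border, external = [], [], []
    for i, net in enumerate(nets):
        inside = net & VB
        if not inside:
            external.append(i)  # no pin inside H_B
        elif inside == net:
            internal.append(i)  # all pins inside H_B
        else:
            border.append(i)  # pins inside and outside H_B
    internal_border = set().union(*(nets[i] & VB for i in border))
    external_border = set().union(*(nets[i] - VB for i in border))
    return internal, border, external, internal_border, external_border


print(classify_nets([{0, 1}, {1, 2, 5}, {5, 6}], {0, 1, 2}))
# ([0], [1], [2], {1, 2}, {5})
```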

A hypergraph flow problem consists of a flow network $\mathcal{N}_B=(\mathcal{V}_B,\mathcal{E}_B)$ derived from $H_B$ and two additional nodes $\mathpzc{s}$ and $\mathpzc{t}$ that are connected to some nodes $v\in\mathcal{V}_B$. Our approach works with all flow networks presented in Section 3.1. A flow problem has the cut property if the resulting min-cut bipartition $\Pi_f$ of $H_B$ does not increase the $(\lambda-1)$-metric in $H$. Thus it has to hold that $\text{cut}(\Pi_f)\leq\text{cut}(\Pi_2)$. While external nets are not affected by a max-flow computation, the max-flow min-cut theorem [21] ensures the cut property for all internal nets. Border nets, however, require special attention. Since a border net $e$ is only partially contained in $H_B$, it will remain connected to the blocks of its external border nodes in $H$. In case external border nodes connect $e$ to both $V_i$ and $V_j$, it will remain a cut net in $H$ even if it is removed from the cut-set in $\Pi_f$. It is therefore necessary to “encode” information about external border nodes into the flow problem.

The KaFFPa Model and its Limitations.

In KaFFPa, this is done by directly connecting internal border nodes $\overrightarrow{B}$ to $\mathpzc{s}$ and $\mathpzc{t}$. This approach can also be used for hypergraphs. In the hypergraph flow problem $\mathcal{F}_G$, the source $\mathpzc{s}$ is connected to all nodes $\mathcal{S}=\overrightarrow{B}\cap V_i$ and all nodes $\mathcal{T}=\overrightarrow{B}\cap V_j$ are connected to $\mathpzc{t}$ using directed edges with infinite capacity. While this ensures that $\mathcal{F}_G$ has the cut property, applying the graph-based model to hypergraphs unnecessarily restricts the search space. Since all internal border nodes $\overrightarrow{B}$ are connected to either $\mathpzc{s}$ or $\mathpzc{t}$, every min-cut $(S,V_B\setminus S)$ will have $\mathcal{S}\subseteq S$ and $\mathcal{T}\subseteq V_B\setminus S$. The KaFFPa model therefore prevents all min-cuts in which any non-cut border net (i.e., $e\in\overleftrightarrow{E_B}$ with $\lambda(e)=1$) becomes part of the cut-set. This restricts the space of possible solutions, since corridor $B$ was computed such that even a min-cut along either side of the border would result in a feasible cut in $H_B$. Thus, ideally, all vertices $v\in B$ should be able to change their block as a result of an $(\mathpzc{s},\mathpzc{t})$-max-flow computation, not only the vertices $v\in B\setminus\overrightarrow{B}$ as in $\mathcal{F}_G$. This limitation becomes increasingly relevant for hypergraphs with large nets as well as for partitioning problems with small imbalance $\varepsilon$, since large nets are likely to be only partially contained in $H_B$ and tight balance constraints enforce small $B$-corridors. While the former is a problem only for HGP, the latter also applies to GP.
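A small sketch of which vertices the KaFFPa model $\mathcal{F}_G$ locks, assuming the set of internal border nodes $\overrightarrow{B}$ has already been computed. The helper names `kaffpa_terminals` and `movable_vertices` are hypothetical and only illustrate the sets $\mathcal{S}$, $\mathcal{T}$ and $B\setminus\overrightarrow{B}$:

```python
def kaffpa_terminals(internal_border, V_i, V_j):
    """KaFFPa model F_G: internal border nodes are tied to s (block V_i) or t (block V_j)."""
    S = internal_border & V_i  # edges s -> v with infinite capacity
    T = internal_border & V_j  # edges v -> t with infinite capacity
    return S, T


def movable_vertices(V_B, internal_border):
    """In F_G, only corridor vertices that are not internal border nodes can switch blocks."""
    return V_B - internal_border


# Bipartition V_i = {1, 2, 3}, V_j = {4, 5, 6}; corridor V_B = {2, 3, 4, 5}; internal border {2, 5}
S, T = kaffpa_terminals({2, 5}, {1, 2, 3}, {4, 5, 6})
print(sorted(S), sorted(T), sorted(movable_vertices({2, 3, 4, 5}, {2, 5})))  # [2] [5] [3, 4]
```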

A More Flexible Model.

We propose a more general model that allows an $(\mathpzc{s},\mathpzc{t})$-max-flow computation to also cut through border nets by exploiting the structure of hypergraph flow networks. Instead of directly connecting $\mathpzc{s}$ and $\mathpzc{t}$ to internal border nodes $\overrightarrow{B}$ and thus preventing all min-cuts in which these nodes switch blocks, we conceptually extend $H_B$ to contain all external border nodes $\overleftarrow{B}$ and all border nets $\overleftrightarrow{E_B}$. The resulting hypergraph is $\overleftarrow{H_B}=(V_B\cup\overleftarrow{B},\{e\in E \mid e\cap V_B\neq\emptyset\})$. The key insight now is that by using the flow network of $\overleftarrow{H_B}$ and connecting $\mathpzc{s}$ resp. $\mathpzc{t}$ to the external border nodes $\overleftarrow{B}\cap V_i$ resp. $\overleftarrow{B}\cap V_j$, we get a flow problem that does not lock any node $v\in V_B$ in its block, since none of these nodes is directly connected to either $\mathpzc{s}$ or $\mathpzc{t}$. Due to the max-flow min-cut theorem [21], this flow problem furthermore has the cut property, since all border nets of $H_B$ are now internal nets and all external border nodes $\overleftarrow{B}$ are locked inside their block. However, it is not necessary to use $\overleftarrow{H_B}$ instead of $H_B$ to achieve this result. For each vertex $v\in\overleftarrow{B}$ and each net $e$ incident to $v$, the flow network of $\overleftarrow{H_B}$ contains a path $\langle\mathpzc{s},v,e'\rangle$ (if $v\in V_i$) or a path $\langle e'',v,\mathpzc{t}\rangle$ (if $v\in V_j$) that only involves infinite-capacity edges. Therefore we can remove all nodes $v\in\overleftarrow{B}$ by directly connecting $\mathpzc{s}$ and $\mathpzc{t}$ to the corresponding bridging nodes $e'$ and $e''$ via infinite-capacity edges without affecting the maximal flow [20].
More precisely, in the hypergraph flow problem $\mathcal{F}_H$, we connect $\mathpzc{s}$ to all bridging nodes $e'$ corresponding to border nets $e\in\overleftrightarrow{E_B}$ with $e\cap(\overleftarrow{B}\cap V_i)\neq\emptyset$, and all bridging nodes $e''$ corresponding to border nets $e\in\overleftrightarrow{E_B}$ with $e\cap(\overleftarrow{B}\cap V_j)\neq\emptyset$ to $\mathpzc{t}$, using directed, infinite-capacity edges.
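The claim that external border nodes can be removed without affecting the maximal flow can be checked on a toy instance. The sketch below is an illustration under several assumptions: unit net weights, the Lawler construction with bridging nodes $e'$ (`(e, "in")`) and $e''$ (`(e, "out")`), and a plain Edmonds-Karp max-flow standing in for IBFS. It builds both the network of $\overleftarrow{H_B}$ and the reduced problem $\mathcal{F}_H$ and compares their max-flow values; all names are hypothetical.

```python
from collections import defaultdict, deque

INF = float("inf")


def max_flow(cap, s, t):
    """Edmonds-Karp on residual capacities cap[u][v]; cap is modified in place."""
    flow = 0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(cap[u][v] for u, v in path)  # bottleneck capacity
        for u, v in path:
            cap[u][v] -= delta
            cap[v][u] += delta  # residual back edge
        flow += delta


def add_net(cap, e, pins, w):
    """Lawler gadget: bridging edge e' -> e'' of capacity w, pins attached by infinite edges."""
    e_in, e_out = (e, "in"), (e, "out")
    cap[e_in][e_out] += w
    for v in pins:
        cap[v][e_in] += INF
        cap[e_out][v] += INF


def extended_problem(nets, ext_i, ext_j):
    """Network of the extended hypergraph: external border nodes stay, tied to s or t."""
    cap = defaultdict(lambda: defaultdict(int))
    for e, pins in nets.items():
        add_net(cap, e, pins, 1)  # unit net weights
    for v in ext_i:
        cap["s"][v] += INF
    for v in ext_j:
        cap[v]["t"] += INF
    return cap


def reduced_problem(V_B, nets, ext_i, ext_j):
    """F_H: external border nodes removed, terminals attached to bridging nodes instead."""
    cap = defaultdict(lambda: defaultdict(int))
    for e, pins in nets.items():
        add_net(cap, e, pins & V_B, 1)
        if pins & ext_i:
            cap["s"][(e, "in")] += INF  # s -> e'
        if pins & ext_j:
            cap[(e, "out")]["t"] += INF  # e'' -> t
    return cap


# Corridor V_B = {1, 2, 3, 4}; external border node a lies in V_i, b in V_j.
V_B = {1, 2, 3, 4}
nets = {"e1": {"a", 1, 2}, "e2": {1, 3}, "e3": {2, 3, 4}, "e4": {4, "b"}, "e5": {"a", 2, "b"}}
print(max_flow(extended_problem(nets, {"a"}, {"b"}), "s", "t"))  # 2
print(max_flow(reduced_problem(V_B, nets, {"a"}, {"b"}), "s", "t"))  # 2
```

Net e5 has external pins in both blocks, so its bridging edge lies on an $\mathpzc{s}$-$\mathpzc{t}$ path in both networks: it remains cut, as required by the cut property.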

Single-Pin Border Nets.

Furthermore, we model border nets with $|e\cap\overrightarrow{B}|=1$ more efficiently. For such a net $e$, the flow problem contains paths of the form $\langle\mathpzc{s},e',e'',v\rangle$ or $\langle v,e',e'',\mathpzc{t}\rangle$, which can be replaced by paths of the form $\langle\mathpzc{s},e',v\rangle$ or $\langle v,e'',\mathpzc{t}\rangle$ with $\mathpzc{c}(e',v)=\omega(e)$ resp. $\mathpzc{c}(v,e'')=\omega(e)$. In both cases we can thus remove one bridging node and two infinite-capacity edges. A comparison of $\mathcal{F}_G$ and $\mathcal{F}_H$ is shown in Figure 2.
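The single-pin shortcut can be sketched on top of a Lawler-style construction of $\mathcal{F}_H$. This sketch assumes the shortcut is applied only when the net is attached to exactly one terminal (a net with external pins in both blocks is left unchanged) and uses Edmonds-Karp purely to verify that the max-flow value is preserved. The names `build_flow_problem` and `num_nodes` are hypothetical.

```python
from collections import defaultdict, deque

INF = float("inf")


def max_flow(cap, s, t):
    """Edmonds-Karp on residual capacities cap[u][v]; cap is modified in place."""
    flow = 0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(cap[u][v] for u, v in path)  # bottleneck capacity
        for u, v in path:
            cap[u][v] -= delta
            cap[v][u] += delta  # residual back edge
        flow += delta


def build_flow_problem(V_B, nets, w, ext_i, ext_j, single_pin_shortcut):
    """Lawler-style F_H; optionally shortcut border nets with exactly one pin inside V_B."""
    cap = defaultdict(lambda: defaultdict(int))
    for e, pins in nets.items():
        inner = pins & V_B
        to_s, to_t = bool(pins & ext_i), bool(pins & ext_j)
        e_in, e_out = (e, "in"), (e, "out")
        if single_pin_shortcut and len(inner) == 1 and to_s != to_t:
            (v,) = inner
            if to_s:  # <s, e', e'', v> becomes <s, e', v> with c(e', v) = w(e)
                cap["s"][e_in] += INF
                cap[e_in][v] += w[e]
            else:  # <v, e', e'', t> becomes <v, e'', t> with c(v, e'') = w(e)
                cap[v][e_out] += w[e]
                cap[e_out]["t"] += INF
            continue
        cap[e_in][e_out] += w[e]
        for v in inner:
            cap[v][e_in] += INF
            cap[e_out][v] += INF
        if to_s:
            cap["s"][e_in] += INF
        if to_t:
            cap[e_out]["t"] += INF
    return cap


def num_nodes(cap):
    return len(set(cap) | {v for u in cap for v in cap[u]})


V_B = {1, 2, 3}
nets = {"e1": {"a", 1}, "e2": {1, 2}, "e3": {2, 3}, "e4": {3, "b"}, "e5": {"a", 2, 3}}
w = {"e1": 3, "e2": 2, "e3": 1, "e4": 2, "e5": 1}
plain = build_flow_problem(V_B, nets, w, {"a"}, {"b"}, single_pin_shortcut=False)
short = build_flow_problem(V_B, nets, w, {"a"}, {"b"}, single_pin_shortcut=True)
print(num_nodes(plain), num_nodes(short))  # 15 13
print(max_flow(plain, "s", "t"), max_flow(short, "s", "t"))  # 2 2
```

Nets e1 and e4 each have a single pin in $V_B$, so each loses one bridging node, while the flow value is unchanged.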

Figure 2. Comparison of the KaFFPa flow problem $\mathcal{F}_G$ and our flow problem $\mathcal{F}_H$. For clarity, the zoomed-in view is based on the Lawler network.

3.3. Implementation Details

Since KaHyPar is an $n$-level partitioner, its FM-based local search algorithms are executed each time a vertex is uncontracted. To prevent expensive recalculations, it therefore uses a cache to maintain the gain values of FM moves throughout the $n$-level hierarchy [2]. In order to combine our flow-based refinement with FM local search, we not only perform the moves induced by the max-flow min-cut computation but also update the FM gain cache accordingly.
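The gain-cache bookkeeping after a flow step could look roughly as follows. This is a simplified sketch, not KaHyPar's implementation: it assumes the standard $(\lambda-1)$ gain, i.e., moving $v$ from block $a$ to block $b$ gains $\omega(e)$ for each incident net with $\Phi(e,a)=1$ and loses $\omega(e)$ for each incident net with $\Phi(e,b)=0$. It also simply recomputes the cached gains of all pins of nets touched by a move, whereas an optimized cache would apply delta updates. All function names are hypothetical.

```python
from collections import defaultdict


def pin_counts(block_of, nets):
    """phi[e][b] = number of pins of net e in block b."""
    phi = {e: defaultdict(int) for e in nets}
    for e, pins in nets.items():
        for v in pins:
            phi[e][block_of[v]] += 1
    return phi


def km1_gain(v, to, block_of, incident, w, phi):
    """Decrease of the (lambda-1)-metric if v moves from its current block to `to`."""
    src, gain = block_of[v], 0
    for e in incident[v]:
        if phi[e][src] == 1:  # v is the last pin of e in src
            gain += w[e]
        if phi[e][to] == 0:  # e would span one more block
            gain -= w[e]
    return gain


def gains_of(v, k, block_of, incident, w, phi):
    return {b: km1_gain(v, b, block_of, incident, w, phi) for b in range(k) if b != block_of[v]}


def apply_flow_moves(moves, k, block_of, nets, incident, w, phi, cache):
    """Apply the moves (v, target block) induced by a min-cut and refresh stale gains."""
    touched = set()
    for v, to in moves:
        src = block_of[v]
        touched.add(v)
        for e in incident[v]:
            phi[e][src] -= 1
            phi[e][to] += 1
            touched |= nets[e]  # every pin of e may see a different gain now
        block_of[v] = to
    for u in touched:
        cache[u] = gains_of(u, k, block_of, incident, w, phi)


nets = {"e1": {1, 2}, "e2": {2, 3, 4}, "e3": {4, 5}, "e4": {1, 5}, "e5": {6, 7}}
w = {e: 1 for e in nets}
incident = defaultdict(set)
for e, pins in nets.items():
    for v in pins:
        incident[v].add(e)
block_of = {1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 0, 7: 1}
phi = pin_counts(block_of, nets)
cache = {v: gains_of(v, 2, block_of, incident, w, phi) for v in block_of}
apply_flow_moves([(2, 1), (5, 0)], 2, block_of, nets, incident, w, phi, cache)
```

After the call, `cache` agrees with a full recomputation, while the entries of vertices 6 and 7, which share no net with a moved vertex, are never touched.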

Since it is not feasible to execute our algorithm on every level of the $n$-level hierarchy, we use an exponentially spaced approach that performs flow-based refinements after uncontracting $i=2^j$ vertices for $j\in\mathbb{N}_+$. This way, the algorithm is executed more often on smaller flow problems than on larger ones. To further improve the running time, we introduce the following speedup techniques:

  • S1: We modify active block scheduling such that after the first round the algorithm is only executed on a pair of blocks if at least one execution using these blocks improved connectivity or imbalance of the partition on previous levels.

  • S2: For all levels except the finest level: Skip flow-based refinement if the cut between two adjacent blocks is less than ten.

  • S3: Stop resizing the corridor $B$ if the current $(\mathpzc{s},\mathpzc{t})$-cut did not improve the previously best solution.
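The exponentially spaced schedule and one possible reading of S1 to S3 can be sketched as follows. Several details are assumptions made only for this sketch: uncontractions are counted from the coarsest level, improved block pairs are tracked as frozensets, and the corridor is resized by doubling a scaling factor $\alpha$ up to 16. None of the function names are part of KaHyPar.

```python
def flow_refinement_points(num_uncontractions):
    """Uncontraction counts i = 2^j (j >= 1) after which flow-based refinement runs."""
    points, i = [], 2
    while i <= num_uncontractions:
        points.append(i)
        i *= 2
    return points


def should_refine_pair(pair, round_no, on_finest_level, cut_weight, improved_pairs, min_cut=10):
    """S1 and S2: decide whether to run flows on the block pair `pair`."""
    # S1: after the first round, only pairs that improved the partition on previous levels
    if round_no > 1 and frozenset(pair) not in improved_pairs:
        return False
    # S2: on all levels except the finest, skip pairs whose cut is less than ten
    if not on_finest_level and cut_weight < min_cut:
        return False
    return True


def resize_corridor_with_s3(initial_cut, min_cut_for_alpha, alpha_max=16):
    """Grow the corridor (alpha = 1, 2, 4, ...) and stop once the cut stops improving (S3)."""
    best, alpha, tried = initial_cut, 1, []
    while alpha <= alpha_max:
        cut = min_cut_for_alpha(alpha)
        tried.append(alpha)
        if cut >= best:  # S3: no improvement over the best solution so far
            break
        best = cut
        alpha *= 2
    return best, tried


print(flow_refinement_points(100))  # [2, 4, 8, 16, 32, 64]
print(resize_corridor_with_s3(10, {1: 9, 2: 7, 4: 7, 8: 5}.get))  # (7, [1, 2, 4])
```

For $n$ uncontractions the schedule yields $\lfloor\log_2 n\rfloor$ executions, and consecutive executions are further apart on the finer levels. In the resizing example, S3 stops before the largest corridor is tried, even though it would have yielded a smaller cut, which is exactly the quality-for-time trade-off the heuristic makes.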

4. Experimental Evaluation

We implemented the max-flow min-cut refinement algorithm in the $n$-level hypergraph partitioning framework KaHyPar (Karlsruhe Hypergraph Partitioning). The code is written in C++ and compiled using g++-5.2 with flags -O3 -march=native. The latest version of the framework is called KaHyPar-CA [28]. We refer to our new algorithm as KaHyPar-MF. Both versions use the default configuration for community-aware direct $k$-way partitioning.¹

¹ https://github.com/SebastianSchlag/kahypar/blob/master/config/km1_direct_kway_sea17.ini

Instances.

All experiments use hypergraphs from the benchmark set of Heuer and Schlag [28],² which contains 488 hypergraphs derived from four benchmark sets: the ISPD98 VLSI Circuit Benchmark Suite [3], the DAC 2012 Routability-Driven Placement Contest [55], the University of Florida Sparse Matrix Collection [15], and the international SAT Competition 2014 [9]. Sparse matrices are translated into hypergraphs using the row-net model [12], i.e., each row is treated as a net and each column as a vertex. SAT instances are converted to three different representations: For literal hypergraphs, each boolean literal is mapped to one vertex and each clause constitutes a net [43], while in the primal model each variable is represented by a vertex and each clause is represented by a net. In the dual model the opposite is the case [41]. All hypergraphs have unit vertex and net weights.

² The complete benchmark set along with detailed statistics for each hypergraph is publicly available from http://algo2.iti.kit.edu/schlag/sea2017/.

Table 1 gives an overview of the different benchmark sets used in the experiments. The full benchmark set is referred to as set A. We furthermore use the representative subset of 165 hypergraphs proposed in [28] (set B) and a smaller subset consisting of 25 hypergraphs (set C), which is used to devise the final configuration of KaHyPar-MF. Basic properties of set C can be found in Table 10 in Appendix C. Unless mentioned otherwise, all hypergraphs are partitioned into $k\in\{2,4,8,16,32,64,128\}$ blocks with $\varepsilon=0.03$. For each value of $k$, a $k$-way partition is considered to be one test instance, resulting in a total of 175 instances for set C, 1155 instances for set B, and 3416 instances for set A. Furthermore, we use 15 graphs from [42] to compare our flow model $\mathcal{F}_H$ to the KaFFPa [47] model $\mathcal{F}_G$. Table 11 in Appendix C summarizes the basic properties of these graphs, which constitute set D.

Table 1. Overview of the different benchmark sets. Set B and set C are subsets of set A.
Source # DAC ISPD98 Primal Dual Literal SPM Graphs
Set A [28] 488 10 18 92 92 92 184 -
Set B [28] 165 5 10 30 30 30 60 -
Set C new 25 - 5 5 5 5 5 -
Set D [42] 15 - - - - - - 15

System and Methodology.

All experiments are performed on a single core of a machine consisting of two Intel Xeon E5-2670 Octa-Core processors (Sandy Bridge) clocked at 2.6 GHz. The machine has 64 GB main memory, 20 MB L3- and 8x256 KB L2-Cache and is running RHEL 7.2. We compare KaHyPar-MF to KaHyPar-CA, as well as to the $k$-way (hMetis-K) and the recursive bisection variant (hMetis-R) of hMetis 2.0 (p1) [32, 33], and to PaToH 3.2 [12]. These HGP libraries were chosen because they provide the best solution quality [2, 28]. The partitioning results of these tools are already available from http://algo2.iti.kit.edu/schlag/sea2017/. For each partitioner except PaToH, the results summarize ten repetitions with different seeds for each test instance and report the arithmetic mean of the computed cut and running time as well as the best cut found. Since PaToH ignores the random seed if configured to use the quality preset, the results contain both the result of a single run of the quality preset (PaToH-Q) and the average over ten repetitions using the default configuration (PaToH-D). Each partitioner had a time limit of eight hours per test instance. We use the same number of repetitions and the same time limit for our experiments with KaHyPar-MF.

In the following, we use the geometric mean when averaging over different instances in order to give every instance a comparable influence on the final result. In order to compare the algorithms in terms of solution quality, we perform a more detailed analysis using improvement plots. For each algorithm, these plots relate the minimum connectivity of KaHyPar-MF to the minimum connectivity produced by the corresponding algorithm on a per-instance basis. For each algorithm, these ratios are sorted in decreasing order. The plots use a cube root scale for the y-axis to reduce right skewness [14] and show the improvement of KaHyPar-MF in percent (i.e., $1-(\text{KaHyPar-MF}/\text{algorithm})$) on the y-axis. A value below zero indicates that the partition of KaHyPar-MF was worse than the partition produced by the corresponding algorithm, while a value above zero indicates that KaHyPar-MF performed better than the algorithm in question. A value of zero implies that the partitions of both algorithms had the same solution quality. Values above one correspond to infeasible solutions that violated the balance constraint. In order to include instances with a cut of zero in the results, we set the corresponding cut values to one for ratio computations.
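The per-instance quantities behind the improvement plots and the geometric-mean aggregation can be written down directly. This is a minimal sketch with hypothetical helper names; it returns fractions (multiply by 100 for percent) and omits the special plotting of infeasible solutions.

```python
import math


def improvement(mf_cut, other_cut):
    """1 - (KaHyPar-MF / other); cuts of zero are set to one before forming the ratio."""
    return 1 - max(mf_cut, 1) / max(other_cut, 1)


def improvement_curve(mf_cuts, other_cuts):
    """Per-instance improvements, sorted in decreasing order as in the plots."""
    return sorted((improvement(a, b) for a, b in zip(mf_cuts, other_cuts)), reverse=True)


def cube_root_axis(y):
    """Sign-preserving cube root used to compress the y-axis."""
    return math.copysign(abs(y) ** (1 / 3), y)


def geometric_mean(values):
    return math.exp(sum(math.log(x) for x in values) / len(values))


curve = improvement_curve([90, 100, 110, 0], [100, 100, 100, 4])
print([round(100 * y, 1) for y in curve])  # [75.0, 10.0, 0.0, -10.0]
```

The fourth instance shows the zero-cut rule: KaHyPar-MF's cut of 0 is treated as 1, giving an improvement of $1-1/4$.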

Table 2. Statistics of benchmark set B. We use $\overline{x}$ to denote the mean and $\widetilde{x}$ to denote the median.
Type # $\overline{d(v)}$ $\widetilde{d(v)}$ $\overline{|e|}$ $\widetilde{|e|}$
DAC 5 3.32 3.28 3.37 3.35
ISPD 10 4.20 4.24 3.89 3.90
Primal 30 16.29 9.97 2.63 2.39
Literal 30 8.21 4.99 2.63 2.39
Dual 30 2.63 2.38 16.29 9.97
SPM 60 24.78 14.15 26.58 15.01

4.1. Evaluating Flow Networks, Models, and Algorithms

Flow Networks and Algorithms.

To analyze the effects of the different hypergraph flow networks, we compute five bipartitions for each hypergraph of set B with KaHyPar-CA using different seeds. Statistics of the hypergraphs are shown in Table 2. The bipartitions are then used to generate hypergraph flow networks for a corridor of size $|B|=25\,000$ hypernodes around the cut. Figure 3 (top) summarizes the sizes of the respective flow networks in terms of number of nodes $|\mathcal{V}|$ and number of edges $|\mathcal{E}|$ for each instance class. The flow networks of primal and literal SAT instances are the largest in terms of both numbers of nodes and edges. High average vertex degree combined with low average net sizes leads to subhypergraphs $H_B$ containing many small nets, which then induce many nodes and (infinite-capacity) edges in $\mathcal{N}_L$. Dual instances with low average degree and large average net size, on the other hand, lead to smaller flow networks. For VLSI instances (DAC, ISPD) both average degree and average net sizes are low, while for SPM hypergraphs the opposite is the case. This explains why SPM flow networks have significantly more edges, despite the number of nodes being comparable in both classes.

Figure 3. Top: Size of the flow networks when using the Lawler network $\mathcal{N}_{\text{L}}$, the Liu-Wong network $\mathcal{N}_{\text{W}}$ and our network $\mathcal{N}_{\text{Our}}$. Network $\mathcal{N}_{\text{Our}}^1$ additionally models single-pin border nets more efficiently. The dashed line indicates $25\,000$ nodes. Bottom: Speedup of BK [10] and IBFS [23] max-flow algorithms over the execution on $\mathcal{N}_L$.

As expected, the Lawler network $\mathcal{N}_L$ induces the biggest flow problems. Looking at the Liu-Wong network $\mathcal{N}_W$, we can see that distinguishing between graph edges and nets with $|e|\geq 3$ pins has an effect for all hypergraphs with many small nets (i.e., DAC, ISPD, Primal, Literal). While this technique alone does not improve dual SAT instances, we see that the combination of the Liu-Wong approach and our removal of low degree hypernodes in $\mathcal{N}_{\text{Our}}$ reduces the size of the networks for all instance classes except SPM. Both techniques only have a limited effect on these instances, since both hypernode degrees and net sizes are large on average. Since our flow problems are based on $B$-corridor induced subhypergraphs, $\mathcal{N}_{\text{Our}}^1$ additionally models single-pin border nets more efficiently as described in Section 3.2. This further reduces the network sizes significantly. As expected, the reduction in numbers of nodes and edges is most pronounced for hypergraphs with low average net sizes, because these instances are likely to contain many single-pin border nets.

To see how these reductions in network size translate into improved running times of max-flow algorithms, we use these networks to create flow problems using our flow model $\mathcal{F}_H$ and compute min-cuts using two highly tuned max-flow algorithms, namely the BK algorithm [10]³ and the incremental breadth-first search (IBFS) algorithm [23].⁴ These algorithms were chosen because they performed best in preliminary experiments [27]. We then compare the speedups of these algorithms when executed on $\mathcal{N}_W$, $\mathcal{N}_{\text{Our}}$, and $\mathcal{N}_{\text{Our}}^1$ relative to the execution on the Lawler network $\mathcal{N}_L$. As can be seen in Figure 3 (bottom), both algorithms benefit from improved network models and the speedups directly correlate with the reductions in network size. While $\mathcal{N}_W$ significantly reduces the running times for Primal and Literal instances, $\mathcal{N}_{\text{Our}}$ additionally leads to a speedup for Dual instances. By additionally considering single-pin border nets, $\mathcal{N}_{\text{Our}}^1$ results in an average speedup between 1.52 and 2.21 (except for SPM instances). Since IBFS outperformed the BK algorithm in [27], we use $\mathcal{N}_{\text{Our}}^1$ and IBFS in all following experiments.

³ Available from: https://github.com/gerddie/maxflow
⁴ Available from: http://www.cs.tau.ac.il/~sagihed/ibfs/code.html

Table 3. Comparing the KaFFPa flow model $\mathcal{F}_G$ with our model $\mathcal{F}_H$ as described in Section 3.2. The table shows the average improvement of $\mathcal{F}_H$ over $\mathcal{F}_G$ (in %) for different imbalance parameters $\varepsilon$ on hypergraphs as well as on graphs. All experiments use configuration (+F,-M,-FM).
Hypergraphs Graphs
$\alpha'$ $\varepsilon=1\%$ $\varepsilon=3\%$ $\varepsilon=5\%$ $\varepsilon=1\%$ $\varepsilon=3\%$ $\varepsilon=5\%$
1 7.7 8.1 7.6 11.7 11.3 10.5
2 7.9 6.6 4.8 11.0 9.1 7.8
4 6.9 3.9 2.7 9.9 7.3 5.4
8 5.1 2.3 1.5 8.6 5.3 3.9
16 3.4 1.3 1.2 7.0 4.1 3.5

Flow Models.

We now compare the flow model $\mathcal{F}_G$ of KaFFPa to our advanced model $\mathcal{F}_H$ described in Section 3.2. The experiments summarized in Table 3 were performed using sets C and D. To focus on the impact of the models on solution quality, we deactivated KaHyPar's FM local search algorithms and only use flow-based refinement without the most balanced minimum cut heuristic. The results confirm our hypothesis that $\mathcal{F}_G$ restricts the space of possible solutions. For all flow problem sizes and all imbalances tested, $\mathcal{F}_H$ yields better solution quality. As expected, the effects are most pronounced for small flow problems and small imbalances, where many vertices are likely to be border nodes. Since these nodes are locked inside their respective block in $\mathcal{F}_G$, they prevent all non-cut border nets from becoming part of the cut-set. Our model, on the other hand, allows all min-cuts that yield a feasible solution for the original partitioning problem. The fact that this effect also occurs for the graphs of set D indicates that our model can also be effective for traditional graph partitioning. All following experiments are performed using $\mathcal{F}_H$.

4.2. Configuring the Algorithm

We now evaluate different configurations of the max-flow min-cut based refinement framework on set C. In the following, KaHyPar-CA [28] is used as a reference. Since it neither uses (F)lows nor the (M)ost balanced minimum cut heuristic and only relies on the (FM) algorithm for local search, it is referred to as (-F,-M,+FM). This basic configuration is then successively extended with specific components. The results of our experiments are summarized in Table 4 for increasing scaling parameter $\alpha'$. The table furthermore includes a configuration Constant128. In this configuration, all components are enabled (+F,+M,+FM) and we perform flow-based refinements every 128 uncontractions. While this configuration is slow, it is used as a reference point for the quality achievable using flow-based refinement.

Table 4. Different configurations of our flow-based refinement framework for increasing $\alpha'$. Column Avg[%] reports the quality improvement relative to the reference configuration (-F,-M,+FM).

The results indicate that only using flows (+F,-M,-FM) as refinement technique is inferior to localized FM local search in regard to both running time and solution quality. Although the quality improves with increasing flow problem size (i.e., increasing $\alpha'$), the average connectivity is still worse than that of the reference configuration. Enabling the most balanced minimum cut heuristic improves partitioning quality. Configuration (+F,+M,-FM) performs better than the basic configuration for $\alpha'\geq 8$. By combining flows with the FM algorithm (+F,-M,+FM), we get a configuration that improves upon the baseline configuration even for small flow problems. However, comparing this variant with (+F,+M,-FM) for $\alpha'=16$, we see that using large flow problems together with the most balanced minimum cut heuristic yields solutions of comparable quality. Enabling all components (+F,+M,+FM) and using large flow problems performs best. Furthermore, we see that enabling FM local search slightly improves the running time for $\alpha'\geq 8$. This can be explained by the fact that the FM algorithm already produces good cuts between the blocks, such that fewer rounds of pairwise flow refinements are necessary to further improve the solution. Comparing configuration (+F,+M,+FM) with Constant128 shows that performing flows more often further improves solution quality at the cost of slowing down the algorithm by more than an order of magnitude. In all further experiments, we therefore use configuration (+F,+M,+FM) with $\alpha'=16$ for KaHyPar-MF. This configuration also performed best in the effectiveness tests presented in Appendix A. While this configuration performs better than KaHyPar-CA, its running time is still more than a factor of 3 higher.

We therefore perform additional experiments on set B and successively enable the speedup heuristics described in Section 3.3. The results are summarized in Table 5. Only executing pairwise flow refinements on block pairs that led to an improvement on previous levels (S1) reduces the running time of flow-based refinement by a factor of 1.27, while skipping flows in case of small cuts (S2) results in a further speedup of 1.19. By additionally stopping the resizing of the flow problem as early as possible (S3), we decrease the running time of flow-based improvement by a factor of 2 in total, while still computing solutions of comparable quality. Thus, in the comparisons with other systems, all heuristics are enabled.
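If the reported factors compose multiplicatively (each one measured relative to the preceding configuration) and the total factor of 2 is read as approximate, the share attributable to S3 can be back-computed. This is a derived estimate under those assumptions, not a separately reported measurement:

```latex
\underbrace{1.27}_{\text{S1}} \cdot \underbrace{1.19}_{\text{S2}} \cdot x_{\text{S3}} \approx 2
\quad\Longrightarrow\quad
x_{\text{S3}} \approx \frac{2}{1.27 \cdot 1.19} = \frac{2}{1.5113} \approx 1.32
```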

Table 5. Comparison of quality improvement and running times using speedup heuristics. Column $t_{\text{flow}}[s]$ refers to the running time of flow-based refinement, column $t[s]$ to the total partitioning time.

4.3. Comparison with other Systems

Finally, we compare KaHyPar-MF to different state-of-the-art hypergraph partitioners on the full benchmark set. We exclude the same 194 out of 3416 instances as in [28] because either PaToH-Q could not allocate enough memory or other partitioners did not finish in time. The excluded instances are listed in Appendix D. Note that KaHyPar-MF did not lead to any further exclusions. The following comparison is therefore based on the remaining 3222 instances. As can be seen in Figure 4, KaHyPar-MF outperforms all other algorithms on all benchmark sets. Comparing the best solutions of KaHyPar-MF to each partitioner individually across all instances (top left), KaHyPar-MF produced better partitions than PaToH-Q, PaToH-D, hMetis-K, KaHyPar-CA, and hMetis-R for 92.1%, 91.7%, 85.1%, 83.7%, and 75.6% of the instances, respectively.

Comparing the best solutions of all systems simultaneously, KaHyPar-MF produced the best partitions for 2427 of the 3222 instances. It is followed by hMetis-R (678), KaHyPar-CA (388), hMetis-K (352), PaToH-D (154), and PaToH-Q (146). Note that for some instances multiple partitioners computed the same best solution and that we disqualified infeasible solutions that violated the balance constraint.

Figure 5 shows that KaHyPar-MF also performs best for different values of $k$ and that pairwise flow refinements are an effective strategy to improve $k$-way partitions. As can be seen in Table 6, the improvement over KaHyPar-CA is most pronounced for hypergraphs derived from matrices of web graphs and social networks⁵ and for dual SAT instances. While the former are difficult to partition due to skewed degree and net size distributions, the latter are difficult because they contain many large nets.

⁵ Based on the following matrices: webbase-1M, ca-CondMat, soc-sign-epinions, wb-edu, IMDB, as-22july06, as-caida, astro-ph, HEP-th, Oregon-1, Reuters911, PGPgiantcompo, NotreDame_www, NotreDame_actors, p2p-Gnutella25, Stanford, cnr-2000.

Finally, Table 7 compares the running times of all partitioners. By using simplified flow networks, highly tuned flow algorithms, and several techniques to speed up the flow-based refinement framework, KaHyPar-MF is less than a factor of two slower than KaHyPar-CA and still achieves a running time comparable to that of hMetis.

Table 6. Comparing the best solutions of KaHyPar-MF with the best results of KaHyPar-CA and other partitioners for different benchmark sets (top) and different values of $k$ (bottom). All values correspond to the quality improvement of KaHyPar-MF relative to the respective partitioner (in %), except for the row of KaHyPar-MF, which lists its absolute connectivity values.

Partitioner $k=2$ $k=4$ $k=8$ $k=16$ $k=32$ $k=64$ $k=128$
KaHyPar-MF 1005.76 2985.22 5805.19 9097.31 14352.34 21537.33 31312.48
KaHyPar-CA 1.71 2.16 2.51 2.51 2.45 2.16 2.05
hMetis-R 22.25 17.62 15.63 14.29 11.94 9.80 8.01
hMetis-K 21.82 13.66 12.76 13.49 10.62 9.18 7.81
PaToH-Q 14.92 12.60 11.81 11.66 10.66 9.77 8.63
PaToH-D 8.54 10.41 13.64 14.50 12.70 12.66 11.89

Table 7. Comparing the average running times of KaHyPar-MF with KaHyPar-CA and other hypergraph partitioners for different benchmark sets (top) and different values of $k$ (bottom).

Partitioner $k=2$ $k=4$ $k=8$ $k=16$ $k=32$ $k=64$ $k=128$
KaHyPar-MF 19.75 32.89 47.52 60.38 78.51 100.34 119.15
KaHyPar-CA 12.68 17.16 23.88 31.01 41.69 57.35 76.61
hMetis-R 27.87 51.59 74.74 91.09 109.13 128.66 149.34
hMetis-K 25.47 32.27 42.50 53.41 74.00 109.12 152.92
PaToH-Q 1.93 3.61 5.44 7.01 8.40 10.06 11.44
PaToH-D 0.43 0.77 1.12 1.42 1.71 2.02 2.29

Figure 4. Min-cut improvement plots comparing KaHyPar-MF with KaHyPar-CA and other partitioners for different instance classes.
Figure 5. Min-cut improvement plots comparing KaHyPar-MF with KaHyPar-CA and other partitioners for different values of $k$.

5. Conclusion

We generalize the flow-based refinement framework of KaFFPa [47] from graph to hypergraph partitioning. We reduce the size of Liu and Wong's hypergraph flow network [40] by removing low degree hypernodes and exploiting the fact that our flow problems are built on subhypergraphs of the input hypergraph. Furthermore, we identify shortcomings of the KaFFPa [47] approach that restrict the search space of feasible solutions significantly and introduce an advanced model that overcomes these limitations by exploiting the structure of hypergraph flow networks. Lastly, we present techniques to improve the running time of the flow-based refinement framework by a factor of 2 without affecting solution quality. The resulting hypergraph partitioner KaHyPar-MF performs better than all competing algorithms on all instance classes of a large benchmark set and still has a running time comparable to that of hMetis.

Since our flow problem formulation yields significantly better solutions for both hypergraphs and graphs than the KaFFPa [47] approach, future work includes the integration of our flow model into KaFFPa and the evaluation in the context of a high quality graph partitioner. Furthermore, an approach similar to Yang and Wong [57] could be used as an alternative to the most balanced minimum cut heuristic and adaptive $B$-corridor resizing. We also plan to extend our framework to optimize other objective functions such as cut or sum of external degrees.

References

  • [1] P. Agrawal, B. Narendran, and N. Shivakumar. Multi-way partitioning of VLSI circuits. In Proceedings of 9th International Conference on VLSI Design, pages 393–399, Jan 1996.
  • [2] Y. Akhremtsev, T. Heuer, P. Sanders, and S. Schlag. Engineering a direct k-way hypergraph partitioning algorithm. In 19th Workshop on Algorithm Engineering and Experiments, (ALENEX), pages 28–42, 2017.
  • [3] C. J. Alpert. The ISPD98 Circuit Benchmark Suite. In Proceedings of the 1998 International Symposium on Physical Design, pages 80–85. ACM, 1998.
  • [4] C. J. Alpert, J.-H. Huang, and A. B. Kahng. Multilevel Circuit Partitioning. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 17(8):655–667, 1998.
  • [5] C. J. Alpert and A. B. Kahng. Recent Directions in Netlist Partitioning: a Survey. Integration, the VLSI Journal, 19(1–2):1 – 81, 1995.
  • [6] R. Andersen and K. J. Lang. An Algorithm for Improving Graph Partitions. In Proceedings of the 19th annual ACM-SIAM Symposium on Discrete Algorithms, pages 651–660. Society for Industrial and Applied Mathematics, 2008.
  • [7] C. Aykanat, B. B. Cambazoglu, and B. Uçar. Multi-level Direct K-way Hypergraph Partitioning with Multiple Constraints and Fixed Vertices. Journal of Parallel and Distributed Computing, 68(5):609–625, 2008.
  • [8] D. A. Bader, H. Meyerhenke, P. Sanders, and D. Wagner, editors. Proc. Graph Partitioning and Graph Clustering - 10th DIMACS Implementation Challenge Workshop, volume 588 of Contemporary Mathematics. AMS, 2013.
  • [9] A. Belov, D. Diepold, M. Heule, and M. Järvisalo. The SAT Competition 2014. http://www.satcompetition.org/2014/, 2014.
  • [10] Y. Boykov and V. Kolmogorov. An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(9):1124–1137, 2004.
  • [11] T. N. Bui and C. Jones. A Heuristic for Reducing Fill-In in Sparse Matrix Factorization. In SIAM Conference on Parallel Processing for Scientific Computing, pages 445–452, 1993.
  • [12] Ü. V. Catalyürek and C. Aykanat. Hypergraph-Partitioning-Based Decomposition for Parallel Sparse-Matrix Vector Multiplication. IEEE Transactions on Parallel and Distributed Systems, 10(7):673–693, Jul 1999.
  • [13] J. Cong and M. Smith. A Parallel Bottom-up Clustering Algorithm with Applications to Circuit Partitioning in VLSI Design. In 30th Conference on Design Automation, pages 755–760, June 1993.
  • [14] N. J. Cox. Stata tip 96: Cube roots. Stata Journal, 11(1):149–154, 2011. URL: http://www.stata-journal.com/article.html?article=st0223.
  • [15] T. A. Davis and Y. Hu. The University of Florida Sparse Matrix Collection. ACM Transactions on Mathematical Software, 38(1):1:1–1:25, 2011.
  • [16] K. D. Devine, E. G. Boman, R. T. Heaphy, R. H. Bisseling, and Ü. V. Catalyürek. Parallel Hypergraph Partitioning for Scientific Computing. In 20th International Conference on Parallel and Distributed Processing, IPDPS, pages 124–124. IEEE, 2006.
  • [17] W. E. Donath. Logic partitioning. Physical Design Automation of VLSI Systems, pages 65–86, 1988.
  • [18] J. Edmonds and R. M. Karp. Theoretical Improvements in Algorithmic Efficiency for Network Flow Problems. Journal of the ACM, 19(2):248–264, 1972.
  • [19] C. Fiduccia and R. Mattheyses. A Linear Time Heuristic for Improving Network Partitions. In 19th ACM/IEEE Design Automation Conf., pages 175–181, 1982.
  • [20] L. R. Ford and D. R. Fulkerson. Flows in Networks. Princeton University Press, 1962.
  • [21] L. R. Ford and D. R. Fulkerson. Maximal Flow through a Network. Canadian Journal of Mathematics, 8(3):399–404, 1956.
  • [22] A. V. Goldberg and R. E. Tarjan. A new approach to the maximum-flow problem. Journal of the ACM, 35(4):921–940, 1988.
  • [23] A. Goldberg, S. Hed, H. Kaplan, R. Tarjan, and R. Werneck. Maximum Flows by Incremental Breadth-First Search. In Proceedings of the 2011 European Symposium on Algorithms (ESA), pages 457–468, 2011.
  • [24] M. Hamann and B. Strasser. Graph Bisection with Pareto-Optimization. In Proceedings of the Eighteenth Workshop on Algorithm Engineering and Experiments, ALENEX 2016, Arlington, Virginia, USA, January 10, 2016, pages 90–102, 2016.
  • [25] S. Hauck and G. Borriello. An Evaluation of Bipartitioning Techniques. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 16(8):849–866, Aug 1997.
  • [26] B. Hendrickson and R. Leland. A Multi-Level Algorithm For Partitioning Graphs. SC Conference, 0:28, 1995.
  • [27] T. Heuer. High Quality Hypergraph Partitioning via Max-Flow-Min-Cut Computations. Master’s thesis, KIT, 2018.
  • [28] T. Heuer and S. Schlag. Improving Coarsening Schemes for Hypergraph Partitioning by Exploiting Community Structure. In 16th International Symposium on Experimental Algorithms, (SEA), pages 21:1–21:19, 2017.
  • [29] T. C. Hu and K. Moerder. Multiterminal Flows in a Hypergraph. In T.C. Hu and E.S. Kuh, editors, VLSI Circuit Layout: Theory and Design, chapter 3, pages 87–93. IEEE Press, 1985.
  • [30] E. Ihler, D. Wagner, and F. Wagner. Modeling Hypergraphs by Graphs with the Same Mincut Properties. Inf. Process. Lett., 45(4):171–175, 1993.
  • [31] I. Kabiljo, B. Karrer, M. Pundir, S. Pupyrev, A. Shalita, Y. Akhremtsev, and A. Presta. Social Hash Partitioner: A Scalable Distributed Hypergraph Partitioner. PVLDB, 10(11):1418–1429, 2017.
  • [32] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar. Multilevel Hypergraph Partitioning: Applications in VLSI Domain. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 7(1):69–79, 1999.
  • [33] G. Karypis and V. Kumar. Multilevel k-way Hypergraph Partitioning. In Proceedings of the 36th ACM/IEEE Design Automation Conference, pages 343–348. ACM, 1999.
  • [34] B. W. Kernighan and S. Lin. An Efficient Heuristic Procedure for Partitioning Graphs. The Bell System Technical Journal, 49(2):291–307, Feb 1970.
  • [35] K. Lang and S. Rao. A Flow-Based Method for Improving the Expansion or Conductance of Graph Cuts. In Proceedings of the 10th International IPCO Conference, volume 4, pages 325–337. Springer, 2004.
  • [36] E. Lawler. Cutsets and Partitions of Hypergraphs. Networks, 3(3):275–285, 1973.
  • [37] E. Lawler. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart, and Whinston, 1976.
  • [38] T. Lengauer. Combinatorial Algorithms for Integrated Circuit Layout. John Wiley & Sons, Inc., 1990.
  • [39] J. Li, J. Lillis, and C. K. Cheng. Linear decomposition algorithm for VLSI design applications. In Proceedings of IEEE International Conference on Computer Aided Design (ICCAD), pages 223–228, Nov 1995.
  • [40] H. Liu and D. F. Wong. Network-Flow-Based Multiway Partitioning with Area and Pin Constraints. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 17(1):50–59, Jan 1998.
  • [41] Z. Mann and P. Papp. Formula partitioning revisited. In Daniel Le Berre, editor, POS-14. Fifth Pragmatics of SAT workshop, volume 27 of EPiC Series in Computing, pages 41–56. EasyChair, 2014.
  • [42] H. Meyerhenke, P. Sanders, and C. Schulz. Partitioning Complex Networks via Size-Constrained Clustering. In 13th International Symposium on Experimental Algorithms, (SEA), pages 351–363, 2014.
  • [43] D. A. Papa and I. L. Markov. Hypergraph Partitioning and Clustering. In T. F. Gonzalez, editor, Handbook of Approximation Algorithms and Metaheuristics. Chapman and Hall/CRC, 2007.
  • [44] J.-C. Picard and M. Queyranne. On the Structure of all Minimum Cuts in a Network and Applications. Combinatorial Optimization II, pages 8–16, 1980.
  • [45] J. Pistorius and M. Minoux. An Improved Direct Labeling Method for the Max–Flow Min–Cut Computation in Large Hypergraphs and Applications. International Transactions in Operational Research, 10(1):1–11, 2003.
  • [46] L. A. Sanchis. Multiple-way Network Partitioning. IEEE Trans. on Computers, 38(1):62–81, 1989. doi:10.1109/12.8730.
  • [47] P. Sanders and C. Schulz. Engineering Multilevel Graph Partitioning Algorithms. In 19th European Symposium on Algorithms, volume 6942 of LNCS, pages 469–480. Springer, 2011.
  • [48] S. Schlag, V. Henne, T. Heuer, H. Meyerhenke, P. Sanders, and C. Schulz. k-way Hypergraph Partitioning via n-Level Recursive Bisection. In 18th Workshop on Algorithm Engineering and Experiments (ALENEX), pages 53–67, 2016.
  • [49] D. G. Schweikert and B. W. Kernighan. A Proper Model for the Partitioning of Electrical Circuits. In Proceedings of the 9th Design Automation Workshop, DAC, pages 57–62. ACM, 1972.
  • [50] A. Trifunović. Parallel Algorithms for Hypergraph Partitioning. PhD thesis, University of London, 2006.
  • [51] A. Trifunović and W. J. Knottenbelt. Parallel Multilevel Algorithms for Hypergraph Partitioning. Journal of Parallel and Distributed Computing, 68(5):563–581, 2008.
  • [52] Ü. V. Çatalyürek, M. Deveci, K. Kaya, and B. Uçar. UMPa: A multi-objective, multi-level partitioner for communication minimization. In Bader et al. [8], pages 53–66.
  • [53] B. Uçar and C. Aykanat. Encapsulating Multiple Communication-Cost Metrics in Partitioning Sparse Rectangular Matrices for Parallel Matrix-Vector Multiplies. SIAM Journal on Scientific Computing, 25(6):1837–1859, 2004.
  • [54] B. Vastenhouw and R. H. Bisseling. A Two-Dimensional Data Distribution Method for Parallel Sparse Matrix-Vector Multiplication. SIAM Review, 47(1):67–95, 2005.
  • [55] N. Viswanathan, C. Alpert, C. Sze, Z. Li, and Y. Wei. The DAC 2012 Routability-Driven Placement Contest and Benchmark Suite. In Proceedings of the 49th Annual Design Automation Conference, DAC ’12, pages 774–782. ACM, 2012.
  • [56] S. Wichlund. On multilevel circuit partitioning. In 1998 International Conference on Computer-aided Design, ICCAD, pages 505–511. ACM, 1998.
  • [57] H. H. Yang and D. F. Wong. Efficient Network Flow Based Min-Cut Balanced Partitioning. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 15(12):1533–1540, 1996.

Appendix A Effectiveness Tests

To evaluate the effectiveness of our configurations presented in Section 4.2, we give each configuration the same time to compute a partition. For each instance (hypergraph, $k$), we execute each configuration once and note the largest running time $t_{H,k}$. Then each configuration gets time $3t_{H,k}$ to compute a partition (i.e., we take the best partition out of several repeated runs). Whenever a new run would exceed the time budget of $3t_{H,k}$, we perform it only with a certain probability, chosen such that the expected running time is $3t_{H,k}$. The results of this procedure, which was initially proposed in [47], are presented in Table 8. We see that the combinations of flow-based refinement and FM local search perform better than repeated executions of the baseline configuration (-F,-M,+FM). The most effective configuration is (+F,+M,+FM) with $\alpha'=16$, which was chosen as the default configuration for KaHyPar-MF.
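The time-budgeted repetition can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' benchmark harness: `run` is a hypothetical callback that performs one partitioner execution and returns its cut and running time, `budget` stands for $3t_{H,k}$, and the duration $d$ of the next run is assumed to be estimated by the mean of the previous runs. If $T$ is the time already used and the remaining time is $r = 3t_{H,k} - T < d$, the next run is performed with probability $r/d$, so the expected total time is $T + (r/d)\,d = 3t_{H,k}$ whenever the estimate is exact.

```python
import random
from typing import Callable, List, Tuple


def effectiveness_test(run: Callable[[], Tuple[float, float]],
                       budget: float,
                       rng: random.Random) -> Tuple[float, float]:
    """Repeat `run` within a time budget and return (best_cut, total_time).

    `run` is a hypothetical callback returning (cut, running_time) of one
    partitioner execution; `budget` plays the role of 3 * t_{H,k}.
    """
    best, used = float("inf"), 0.0
    durations: List[float] = []
    while True:
        last = False
        if durations:
            remaining = budget - used
            if remaining <= 0:
                break
            # Assumption: estimate the next run's time by the mean of previous runs.
            expected = sum(durations) / len(durations)
            if expected > remaining:
                # Do the budget-exceeding run only with probability remaining/expected,
                # so the expected total time equals the budget if the estimate is exact.
                if rng.random() >= remaining / expected:
                    break
                last = True
        cut, t = run()
        best = min(best, cut)  # keep the best partition over all repetitions
        used += t
        durations.append(t)
        if last:
            break
    return best, used
```

The probabilistic final run keeps the comparison fair between configurations whose single runs take very different amounts of time: each configuration consumes $3t_{H,k}$ in expectation rather than systematically more or less.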

Table 8. Results of the effectiveness test for different configurations of our flow-based refinement framework for increasing $\alpha'$. The quality in column Avg[%] is relative to the baseline configuration (-F,-M,+FM).

Appendix B Average Connectivity Improvement

Table 9. Comparing the average solution quality of KaHyPar-MF with the average results of KaHyPar-CA and other partitioners for different benchmark sets (top) and different values of $k$ (bottom). All values correspond to the quality improvement of KaHyPar-MF relative to the respective partitioner (in %).

Partitioner k=2 k=4 k=8 k=16 k=32 k=64 k=128
KaHyPar-MF 1057.93 3130.20 6032.58 9362.55 14693.96 21893.59 31706.57
KaHyPar-CA 2.27 2.57 2.80 2.68 2.48 2.24 2.05
hMetis-R 21.38 15.92 14.47 13.63 11.35 9.63 7.80
hMetis-K 21.63 15.15 13.61 13.49 10.52 9.30 7.83
PaToH-Q 10.51 8.36 8.35 9.09 8.54 8.27 7.48
PaToH-D 13.26 14.24 16.07 16.59 13.87 13.61 12.66

Appendix C Properties of Benchmark Sets

Table 10. Basic properties of our parameter tuning benchmark set: number of vertices $n$, number of hyperedges $m$, and number of pins $p$.
Class Hypergraph n m p
ISPD ibm06 32,498 34,826 128,182
ibm07 45,926 48,117 175,639
ibm08 51,309 50,513 204,890
ibm09 53,395 60,902 222,088
ibm10 69,429 75,196 297,567
Dual 6s9 100,384 34,317 234,228
6s133 140,968 48,215 328,924
6s153 245,440 85,646 572,692
dated-10-11-u 629,461 141,860 1,429,872
dated-10-17-u 1,070,757 229,544 2,471,122
Literal 6s133 96,430 140,968 328,924
6s153 171,292 245,440 572,692
aaai10-planning-ipc5 107,838 308,235 690,466
dated-10-11-u 283,720 629,461 1,429,872
atco_enc2_opt1_05_21 112,732 526,872 2,097,393
Primal 6s153 85,646 245,440 572,692
aaai10-planning-ipc5 53,919 308,235 690,466
hwmcc10-timeframe 163,622 488,120 1,138,944
dated-10-11-u 141,860 629,461 1,429,872
atco_enc2_opt1_05_21 56,533 526,872 2,097,393
SPM mult_dcop_01 25,187 25,187 193,276
vibrobox 12,328 12,328 342,828
RFdevice 74,104 74,104 365,580
mixtank_new 29,957 29,957 1,995,041
laminar_duct3D 67,173 67,173 3,833,077
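Table 10 lists several SAT formulas in more than one hypergraph representation: the pin count $p$ of a formula is identical across its Primal, Dual, and Literal rows (e.g., 6s153), while $n$ and $m$ swap between the Primal and Dual rows. The sketch below illustrates why, assuming the common construction (primal: variables as vertices, clauses as hyperedges; dual: clauses as vertices, variables as hyperedges; literal: literals as vertices, clauses as hyperedges) and assuming that only variables and literals occurring in some clause become vertices. The function name `hypergraph_sizes` and the DIMACS-style clause encoding are illustrative choices, not part of the benchmark tooling.

```python
from typing import Dict, List, Tuple

# A clause is a list of DIMACS-style literals: +v or -v for variable v >= 1.
Clause = List[int]


def hypergraph_sizes(cnf: List[Clause]) -> Dict[str, Tuple[int, int, int]]:
    """Return (n, m, p) of the primal, dual, and literal hypergraph of a CNF formula.

    Assumption: only variables/literals that occur in some clause become vertices.
    """
    variables = {abs(lit) for clause in cnf for lit in clause}
    literals = {lit for clause in cnf for lit in clause}
    # Pins are vertex-hyperedge incidences. Primal and dual share the same
    # variable-clause incidences; the literal version counts distinct literals.
    var_pins = sum(len({abs(lit) for lit in clause}) for clause in cnf)
    lit_pins = sum(len(set(clause)) for clause in cnf)
    return {
        "primal": (len(variables), len(cnf), var_pins),  # vertices = variables
        "dual": (len(cnf), len(variables), var_pins),    # vertices = clauses
        "literal": (len(literals), len(cnf), lit_pins),  # vertices = literals
    }


print(hypergraph_sizes([[1, -2, 3], [-1, 2], [2, 3, -4]]))
# → {'primal': (4, 3, 8), 'dual': (3, 4, 8), 'literal': (6, 3, 8)}
```

Under this construction the literal hypergraph has at most twice as many vertices as the primal one, which matches, for example, dated-10-11-u (283,720 versus 141,860 vertices).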
Table 11. Basic properties of the graph instances.
Graph n m
p2p-Gnutella04 6,405 29,215
wordassociation-2011 10,617 63,788
PGPgiantcompo 10,680 24,316
email-EuAll 16,805 60,260
as-22july06 22,963 48,436
soc-Slashdot0902 28,550 379,445
loc-brightkite 56,739 212,945
enron 69,244 254,449
loc-gowalla 196,591 950,327
coAuthorsCiteseer 227,320 814,134
wiki-Talk 232,314 ≈1.5M
citationCiteseer 268,495 ≈1.2M
coAuthorsDBLP 299,067 977,676
cnr-2000 325,557 ≈2.7M
web-Google 356,648 ≈2.1M

Appendix D Excluded Instances

Table 12. Instances excluded from the full benchmark set evaluation. Note that using flow-based refinements did not lead to any further exclusions.
Hypergraph 2 4 8 16 32 64 128
Primal
10pipe-q0-k □ □ □ □ □ □ □
11pipe-k □ □ □ □ □ □ □
11pipe-q0-k □ □ □ □ □ □ □
9dlx-vliw-at-b-iq3 □ □ □ □ □ □ □
9vliw-m-9stages-iq3-C1-bug7 △ △ △ △ △ △
9vliw-m-9stages-iq3-C1-bug8 △ △ △ △ △ △
blocks-blocks-37-1.130-NOTKNOWN □ □ □ □ □ □ □
openstacks-p30-3.085-SAT □ □ □ □ □ □ □
openstacks-sequencedstrips-nonadl-nonnegated-os-sequencedstrips-p30-3.025-NOTKNOWN □ □ □ □ □ □ □
openstacks-sequencedstrips-nonadl-nonnegated-os-sequencedstrips-p30-3.085-SAT □ □ □ □ □ □ □
transport-transport-city-sequential-25nodes-1000size-3degree-100mindistance-3trucks-10packages-2008seed.050-NOTKNOWN □ □ □
velev-vliw-uns-2.0-uq5 □ □ □ □ □ □ □
velev-vliw-uns-4.0-9 □ □ □ □ □ □ □
Literal
11pipe-k
9vliw-m-9stages-iq3-C1-bug7 △ △ ⚫❍△ ⚫❍△ ⚫❍△ ⚫❍□△ ⚫❍□△
9vliw-m-9stages-iq3-C1-bug8 △ △ ⚫❍△ ⚫❍△ ⚫❍△ ⚫❍□△ ⚫❍□△
blocks-blocks-37-1.130 □ □ □ □ □ □
Dual
10pipe-q0-k △ △ △ △
11pipe-k △ △ △ △ △ △ △
11pipe-q0-k △ △ △
9dlx-vliw-at-b-iq3 △
9vliw-m-9stages-iq3-C1-bug7 △ ⚫❍△ ⚫❍△ ⚫❍△ ⚫❍△ ⚫❍△ ⚫❍△
9vliw-m-9stages-iq3-C1-bug8 △ ⚫❍△ ⚫❍△ ⚫❍△ ⚫❍△ ⚫❍△ ⚫❍△
blocks-blocks-37-1.130-NOTKNOWN ⚫❍ ⚫❍ ⚫❍ ⚫❍ ⚫❍△
E02F20
E02F22
q-query-3-L100-coli.sat △
q-query-3-L150-coli.sat △ △
q-query-3-L200-coli.sat △ △ △
q-query-3-L80-coli.sat △
transport-transport-city-sequential-25nodes-1000size-3degree-100mindistance-3trucks-10packages-2008seed.030-NOTKNOWN △
velev-vliw-uns-2.0-uq5 △ △ △ △ △
velev-vliw-uns-4.0-9 △ △ △
SPM
192bit □ □
appu
ESOC □ □ □ □ □
human-gene2 △ △ △
IMDB △ △ △ △
kron-g500-logn16 △ △ △ △ △ △
Rucci1 □
sls □ □ □ □ □ □ □
Trec14
△: KaHyPar-CA exceeded time limit
⚫: hMetis-R exceeded time limit
❍: hMetis-K exceeded time limit
□: PaToH-Q memory allocation error