
Tobias Heuer, Peter Sanders and Sebastian Schlag

Network Flow-Based Refinement for Multilevel Hypergraph Partitioning

Tobias Heuer, Karlsruhe Institute of Technology, Karlsruhe, Germany, tobias.heuer@gmx.net

Peter Sanders, Karlsruhe Institute of Technology, Karlsruhe, Germany, sanders@kit.edu

Sebastian Schlag, Karlsruhe Institute of Technology, Karlsruhe, Germany, sebastian.schlag@kit.edu

Abstract.

We present a refinement framework for multilevel hypergraph partitioning that uses max-flow computations on pairs of blocks to improve the solution quality of a $k$ -way partition. The framework generalizes the flow-based improvement algorithm of KaFFPa from graphs to hypergraphs and is integrated into the hypergraph partitioner KaHyPar. By reducing the size of hypergraph flow networks, improving the flow model used in KaFFPa, and developing techniques to improve the running time of our algorithm, we obtain a partitioner that computes the best solutions for a wide range of benchmark hypergraphs from different application areas while still having a running time comparable to that of hMetis.

Key words and phrases:

multilevel hypergraph partitioning, network flows, refinement

1991 Mathematics Subject Classification:

G.2.2 Graph Theory, G.2.3 Applications

1. Introduction

Given an undirected hypergraph $H=(V,E)$ , the $k$ -way hypergraph partitioning problem is to partition the vertex set into $k$ disjoint blocks of bounded size (at most $1+\varepsilon$ times the average block size) such that an objective function involving the cut hyperedges is minimized. Hypergraph partitioning (HGP) has many important applications in practice such as scientific computing [12] or VLSI design [43]. Particularly VLSI design is a field where small improvements can lead to significant savings [56].

It is well known that HGP is NP-hard [38], which is why practical applications mostly use heuristic multilevel algorithms [11, 13, 25, 26]. These algorithms successively contract the hypergraph to obtain a hierarchy of smaller, structurally similar hypergraphs. After applying an initial partitioning algorithm to the smallest hypergraph, contraction is undone and, at each level, a local search method is used to improve the partitioning induced by the coarser level. All state-of-the-art HGP algorithms [2, 4, 7, 16, 28, 31, 32, 33, 48, 51, 52, 54] either use variations of the Kernighan-Lin (KL) [34, 49] or the Fiduccia-Mattheyses (FM) heuristic [19, 46], or simpler greedy algorithms [32, 33] for local search. These heuristics move vertices between blocks in descending order of improvements in the optimization objective (gain) and are known to be prone to get stuck in local optima when used directly on the input hypergraph [33]. The multilevel paradigm helps to some extent, since it allows a more global view on the problem on the coarse levels and a very fine-grained view on the fine levels of the multilevel hierarchy. However, the performance of move-based approaches degrades for hypergraphs with large hyperedges. In these cases, it is difficult to find meaningful vertex moves that improve the solution quality because large hyperedges are likely to have many vertices in multiple blocks [53]. Thus the gain of moving a single vertex to another block is likely to be zero [41].

While finding balanced minimum cuts in hypergraphs is NP-hard, a minimum cut separating two vertices can be found in polynomial time using network flow algorithms and the well-known max-flow min-cut theorem [21]. Flow algorithms find an optimal min-cut and do not suffer the drawbacks of move-based approaches. However, they were long overlooked as heuristics for balanced partitioning due to their high complexity [40, 57]. In the context of graph partitioning, Sanders and Schulz [47] recently presented a max-flow-based improvement algorithm which is integrated into the multilevel partitioner KaFFPa and computes high quality solutions.

Outline and Contribution.

Motivated by the results of Sanders and Schulz [47], we generalize the max-flow min-cut refinement framework of KaFFPa from graphs to hypergraphs. After introducing basic notation and giving a brief overview of related work and the techniques used in KaFFPa in Section 2, we explain how hypergraphs are transformed into flow networks and present a technique to reduce the size of the resulting hypergraph flow network in Section 3.1. In Section 3.2 we then show how this network can be used to construct a flow problem such that the min-cut induced by a max-flow computation between a pair of blocks improves the solution quality of a $k$ -way partition. We furthermore identify shortcomings of the KaFFPa approach that restrict the search space of feasible solutions significantly and introduce an advanced model that overcomes these limitations by exploiting the structure of hypergraph flow networks. We implemented our algorithm in the open source HGP framework KaHyPar and therefore briefly discuss implementation details and techniques to improve the running time in Section 3.3. Extensive experiments presented in Section 4 demonstrate that our flow model yields better solutions than the KaFFPa approach for both hypergraphs and graphs. We furthermore show that using pairwise flow-based refinement significantly improves partitioning quality. The resulting hypergraph partitioner, KaHyPar-MF, performs better than all competing algorithms on all instance classes and still has a running time comparable to that of hMetis. On a large benchmark set consisting of 3222 instances from various application domains, KaHyPar-MF computes the best partitions in 2427 cases.

2. Preliminaries

2.1. Notation and Definitions

An undirected hypergraph $H=(V,E,c,\omega)$ is defined as a set of $n$ vertices $V$ and a set of $m$ hyperedges/nets $E$ with vertex weights $c:V\rightarrow\mathbb{R}_{>0}$ and net weights $\omega:E\rightarrow\mathbb{R}_{>0}$ , where each net is a subset of the vertex set $V$ (i.e., $e\subseteq V$ ). The vertices of a net are called pins. We extend $c$ and $\omega$ to sets, i.e., $c(U):=\sum_{v\in U}c(v)$ and $\omega(F):=\sum_{e\in F}\omega(e)$ . A vertex $v$ is incident to a net $e$ if $v\in e$ . $\mathrm{I}(v)$ denotes the set of all incident nets of $v$ . The degree of a vertex $v$ is $d(v):=|\mathrm{I}(v)|$ . The size $|e|$ of a net $e$ is the number of its pins. Given a subset $V^{\prime}\subset V$ , the subhypergraph $H_{V^{\prime}}$ is defined as $H_{V^{\prime}}:=(V^{\prime},\{e\cap V^{\prime}\leavevmode\nobreak\ |\leavevmode\nobreak\ e\in E:e\cap V^{\prime}\neq\emptyset\})$ .

A $k$ -way partition of a hypergraph $H$ is a partition of its vertex set into $k$ blocks $\mathrm{\Pi}=\{V_{1},\dots,V_{k}\}$ such that $\bigcup_{i=1}^{k}V_{i}=V$ , $V_{i}\neq\emptyset$ for $1\leq i\leq k$ , and $V_{i}\cap V_{j}=\emptyset$ for $i\neq j$ . We call a $k$ -way partition $\mathrm{\Pi}$ $\mathrm{\varepsilon}$ -balanced if each block $V_{i}\in\mathrm{\Pi}$ satisfies the balance constraint: $c(V_{i})\leq L_{\max}:=(1+\varepsilon)\lceil\frac{c(V)}{k}\rceil$ for some parameter $\mathrm{\varepsilon}$ . For each net $e$ , $\mathrm{\Lambda}(e):=\{V_{i}\leavevmode\nobreak\ |\leavevmode\nobreak\ V_{i}\cap e\neq\emptyset\}$ denotes the connectivity set of $e$ . The connectivity of a net $e$ is $\mathrm{\lambda}(e):=|\mathrm{\Lambda}(e)|$ . A net is called a cut net if $\mathrm{\lambda}(e)>1$ . Given a $k$ -way partition $\mathrm{\Pi}$ of $H$ , the quotient graph $Q:=(\mathrm{\Pi},\{(V_{i},V_{j})\leavevmode\nobreak\ |\leavevmode\nobreak\ \exists e\in E:\{V_{i},V_{j}\}\subseteq\mathrm{\Lambda}(e)\})$ contains an edge between each pair of adjacent blocks. The $k$ -way hypergraph partitioning problem is to find an $\varepsilon$ -balanced $k$ -way partition $\mathrm{\Pi}$ of a hypergraph $H$ that minimizes an objective function over the cut nets for some $\varepsilon$ . Several objective functions exist in the literature [5, 38]. The most commonly used cost functions are the cut-net metric $\text{cut}(\mathrm{\Pi}):=\sum_{e\in E^{\prime}}\omega(e)$ and the connectivity metric $(\mathrm{\lambda}-1)(\mathrm{\Pi}):=\sum_{e\in E^{\prime}}(\mathrm{\lambda}(e)-1)\leavevmode\nobreak\ \omega(e)$ [1], where $E^{\prime}$ is the set of all cut nets [17]. In this paper, we use the $(\lambda-1)$ -metric. Optimizing either objective function is known to be NP-hard [38]. Hypergraphs can be represented as bipartite graphs [29]. In the following, we use nodes and edges when referring to graphs and vertices and nets when referring to hypergraphs. In the bipartite graph $G_{*}(V\dot{\cup}E,F)$ the vertices and nets of $H$ form the node set and for each net $e\in\mathrm{I}(v)$ , we add an edge $(e,v)$ to $G_{*}$ . The edge set $F$ is thus defined as $F:=\{(e,v)\leavevmode\nobreak\ |\leavevmode\nobreak\ e\in E,v\in e\}$ . Each net in $E$ therefore corresponds to a star in $G_{*}$ .
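The two objective functions and the balance constraint defined above can be made concrete with a small sketch. The function names, the dictionary-based representation (`block_of`, `vertex_weight`), and the toy instance are assumptions made for illustration and are not part of KaHyPar.

```python
from math import ceil

def connectivity(net, block_of):
    """lambda(e): number of distinct blocks that contain a pin of the net."""
    return len({block_of[v] for v in net})

def cut_metric(nets, weights, block_of):
    """cut(Pi): total weight of all nets that span more than one block."""
    return sum(w for e, w in zip(nets, weights) if connectivity(e, block_of) > 1)

def connectivity_metric(nets, weights, block_of):
    """(lambda - 1)(Pi): a net pays its weight once per additional block."""
    return sum((connectivity(e, block_of) - 1) * w for e, w in zip(nets, weights))

def is_balanced(vertex_weight, block_of, k, eps):
    """Check c(V_i) <= L_max = (1 + eps) * ceil(c(V) / k) for every block."""
    l_max = (1 + eps) * ceil(sum(vertex_weight.values()) / k)
    load = {}
    for v, b in block_of.items():
        load[b] = load.get(b, 0) + vertex_weight[v]
    return all(x <= l_max for x in load.values())

# Toy instance (made up): 6 unit-weight vertices in k = 3 blocks.
nets = [{0, 1, 2}, {2, 3}, {1, 3, 5}, {4, 5}]
weights = [1, 2, 1, 3]
block_of = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
vertex_weight = {v: 1 for v in block_of}

print(cut_metric(nets, weights, block_of))           # two cut nets of weight 1 each
print(connectivity_metric(nets, weights, block_of))  # net {1, 3, 5} spans 3 blocks
```

In this instance the net $\{1,3,5\}$ touches all three blocks, so it contributes $1$ to the cut-net metric but $2$ to the connectivity metric.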

Let $G=(V,E,c,\omega)$ be a weighted directed graph. We use the same notation as for hypergraphs to refer to node weights $c$ , edge weights $\omega$ , and node degrees $d(v)$ . Furthermore $\mathrm{\Gamma(u)}:=\{v:(u,v)\in E\}$ denotes the neighbors of node $u$ . A path $P=\langle v_{1},\ldots,v_{k}\rangle$ is a sequence of nodes, such that each pair of consecutive nodes is connected by an edge. A strongly connected component $C\subseteq V$ is a set of nodes such that for each $u,v\in C$ there exists a path from $u$ to $v$ . A topological ordering is a linear ordering $\prec$ of $V$ such that every directed edge $(u,v)\in E$ implies $u\prec v$ in the ordering. A set of nodes $B\subseteq V$ is called a closed set iff there are no outgoing edges leaving $B$ , i.e., if the conditions $u\in B$ and $(u,v)\in E$ imply $v\in B$ . A subset $S\subset V$ is called a node separator if its removal divides $G$ into two disconnected components.

A flow network $\mathcal{N}=(\mathcal{V},\mathcal{E},\mathpzc{c})$ is a directed graph with two distinguished nodes $\mathpzc{s}$ and $\mathpzc{t}$ in which each edge $e\in\mathcal{E}$ has a capacity $\mathpzc{c}(e)\geq 0$ . An $(\mathpzc{s},\mathpzc{t})$ -flow (or flow) is a function $f:\mathcal{V}\times\mathcal{V}\rightarrow\mathbb{R}$ that satisfies the capacity constraint $\forall u,v\in\mathcal{V}:f(u,v)\leq\mathpzc{c}(u,v)$ , the skew symmetry constraint $\forall u,v\in\mathcal{V}:f(u,v)=-f(v,u)$ , and the flow conservation constraint $\forall u\in\mathcal{V}\setminus\{\mathpzc{s},\mathpzc{t}\}:\sum_{v\in\mathcal{V}}f(u,v)=0$ . The value of a flow $|f|:=\sum_{v\in\mathcal{V}}{f(\mathpzc{s},v)}$ is defined as the total amount of flow transferred from $\mathpzc{s}$ to $\mathpzc{t}$ . The residual capacity is defined as $r_{f}(u,v)=\mathpzc{c}(u,v)-f(u,v)$ . Given a flow $f$ , $\mathcal{N}_{f}=(\mathcal{V},\mathcal{E}_{f},r_{f})$ with $\mathcal{E}_{f}=\{(u,v)\in\mathcal{V}\times\mathcal{V}\ |\ r_{f}(u,v)>0\}$ is the residual network. An $(\mathpzc{s},\mathpzc{t})$ -cut (or cut) is a bipartition $(\mathcal{S},\mathcal{V}\setminus\mathcal{S})$ of a flow network $\mathcal{N}$ with $\mathpzc{s}\in\mathcal{S}\subset\mathcal{V}$ and $\mathpzc{t}\in\mathcal{V}\setminus\mathcal{S}$ . The capacity of an $(\mathpzc{s},\mathpzc{t})$ -cut is defined as $\sum_{e\in\mathcal{E}^{\prime}}\mathpzc{c}(e)$ , where $\mathcal{E}^{\prime}=\{(u,v)\in\mathcal{E}:u\in\mathcal{S},v\in\mathcal{V}\setminus\mathcal{S}\}$ . The max-flow min-cut theorem states that the value $|f|$ of a maximum flow is equal to the capacity of a minimum cut separating $\mathpzc{s}$ and $\mathpzc{t}$ [21].
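As an illustration of these definitions, the following sketch checks the three flow constraints on a tiny network and evaluates the flow value and residual capacities. The dictionary encoding of $\mathpzc{c}$ and $f$ (missing pairs are treated as $0$) and all names are assumptions chosen for brevity.

```python
def is_valid_flow(nodes, cap, f, s, t):
    """Check capacity, skew symmetry, and flow conservation for a flow f."""
    def flow(u, v):
        return f.get((u, v), 0)

    capacity_ok = all(flow(u, v) <= cap.get((u, v), 0) for u in nodes for v in nodes)
    skew_ok = all(flow(u, v) == -flow(v, u) for u in nodes for v in nodes)
    conserved = all(sum(flow(u, v) for v in nodes) == 0
                    for u in nodes if u not in (s, t))
    return capacity_ok and skew_ok and conserved

def flow_value(nodes, f, s):
    """|f|: net amount of flow leaving the source."""
    return sum(f.get((s, v), 0) for v in nodes)

def residual_capacity(cap, f, u, v):
    """r_f(u, v) = c(u, v) - f(u, v)."""
    return cap.get((u, v), 0) - f.get((u, v), 0)

# Toy network (made up): s -> a with capacity 3, a -> t with capacity 2.
nodes = ["s", "a", "t"]
cap = {("s", "a"): 3, ("a", "t"): 2}
f = {("s", "a"): 2, ("a", "s"): -2, ("a", "t"): 2, ("t", "a"): -2}

print(is_valid_flow(nodes, cap, f, "s", "t"), flow_value(nodes, f, "s"))
```

Note that skew symmetry makes $r_{f}(a,s)=0-(-2)=2$ positive although $(a,s)\notin\mathcal{E}$ , which is exactly how reverse edges enter the residual network.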

2.2. Related Work

Hypergraph Partitioning.

Driven by applications in VLSI design and scientific computing, HGP has evolved into a broad research area since the 1990s. We refer to [5, 8, 43, 50] for an extensive overview. Well-known multilevel HGP software packages with certain distinguishing characteristics include PaToH [12] (originating from scientific computing), hMetis [32, 33] (originating from VLSI design), KaHyPar [2, 28, 48] (general purpose, $n$ -level), Mondriaan [54] (sparse matrix partitioning), MLPart [4] (circuit partitioning), Zoltan [16], Parkway [51] and SHP [31] (distributed), UMPa [52] (directed hypergraph model, multi-objective), and kPaToH (multiple constraints, fixed vertices) [7]. All of these tools either use variations of the Kernighan-Lin (KL) [34, 49] or the Fiduccia-Mattheyses (FM) heuristic [19, 46], or algorithms that greedily move vertices [33] or nets [32] to improve solution quality in the refinement phase.

Flows on Hypergraphs.

While flow-based approaches have not yet been considered as refinement algorithms for multilevel HGP, several works deal with flow-based hypergraph min-cut computation. The problem of finding minimum $(\mathpzc{s},\mathpzc{t})$ -cuts in hypergraphs was first considered by Lawler [36], who showed that it can be reduced to computing maximum flows in directed graphs. Hu and Moerder [29] present an augmenting path algorithm to compute a minimum-weight vertex separator on the star-expansion of the hypergraph. Their vertex-capacitated network can also be transformed into an edge-capacitated network using a transformation due to Lawler [37]. Yang and Wong [57] use repeated, incremental max-flow min-cut computations on the Lawler network [36] to find $\varepsilon$ -balanced hypergraph bipartitions. Solution quality and running time of this algorithm are improved by Lillis and Cheng [39] by introducing advanced heuristics to select source and sink nodes. Furthermore, they present a preflow-based [22] min-cut algorithm that implicitly operates on the star-expanded hypergraph. Pistorius and Minoux [45] generalize the algorithm of Edmonds and Karp [18] to hypergraphs by labeling both vertices and nets. Liu and Wong [40] simplify Lawler’s hypergraph flow network [36] by explicitly distinguishing between graph edges and hyperedges with three or more pins. This approach significantly reduces the size of flow networks derived from VLSI hypergraphs, since most of the nets in a circuit are graph edges. Note that the above-mentioned approaches to model hypergraphs as flow networks for max-flow min-cut computations do not contradict the negative results of Ihler et al. [30], who show that, in general, there does not exist an edge-weighted graph $G=(V,E)$ that correctly represents the min-cut properties of the corresponding hypergraph $H=(V,E)$ .

Flow-Based Graph Partitioning.

Flow-based refinement algorithms for graph partitioning include Improve [6] and MQI [35], which improve expansion or conductance of bipartitions. MQI also yields a small improvement when used as a postprocessing technique on hypergraph bipartitions initially computed by hMetis [35]. FlowCutter [24] uses an approach similar to Yang and Wong [57] to compute graph bisections that are Pareto-optimal in regard to cut size and balance. Sanders and Schulz [47] present a flow-based refinement framework for their direct $k$ -way graph partitioner KaFFPa. The algorithm works on pairs of adjacent blocks and constructs flow problems such that each min-cut in the flow network is a feasible solution in regard to the original partitioning problem.

KaHyPar.

Since our algorithm is integrated into the KaHyPar framework, we briefly review its core components. While traditional multilevel HGP algorithms contract matchings or clusterings and therefore work with a coarsening hierarchy of $\mathcal{O}\!\left(\log n\right)$ levels, KaHyPar instantiates the multilevel paradigm in the extreme $n$ -level version, removing only a single vertex between two levels. After coarsening, a portfolio of simple algorithms is used to create an initial partition of the coarsest hypergraph. During uncoarsening, strong localized local search heuristics based on the FM algorithm [19, 46] are used to refine the solution. Our work builds on KaHyPar-CA [28], which is a direct $k$ -way partitioning algorithm for optimizing the $(\lambda-1)$ -metric. It uses an improved coarsening scheme that incorporates global information about the community structure of the hypergraph into the coarsening process.

2.3. The Flow-Based Improvement Framework of KaFFPa

We discuss the framework of Sanders and Schulz [47] in greater detail, since our work makes use of the techniques proposed by the authors. For simplicity, we assume $k=2$ . The techniques can be applied on a $k$ -way partition by repeatedly executing the algorithm on pairs of adjacent blocks. To schedule these refinements, the authors propose an active block scheduling algorithm, which schedules blocks as long as their participation in a pairwise refinement step results in some changes in the $k$ -way partition.

An $\varepsilon$ -balanced bipartition of a graph $G=(V,E,c,\omega)$ is improved with flow computations as follows. The basic idea is to construct a flow network $\mathcal{N}$ based on the induced subgraph $G[B]$ , where $B\subseteq V$ is a set of nodes around the cut of $G$ . The size of $B$ is controlled by an imbalance factor $\varepsilon^{\prime}:=\alpha\varepsilon$ , where $\alpha$ is a scaling parameter that is chosen adaptively depending on the result of the min-cut computation. If the heuristic found an $\varepsilon$ -balanced partition using $\varepsilon^{\prime}$ , the cut is accepted and $\alpha$ is increased to $\min(2\alpha,\alpha^{\prime})$ where $\alpha^{\prime}$ is a predefined upper bound. Otherwise it is decreased to $\max(\frac{\alpha}{2},1)$ . This scheme continues until a maximal number of rounds is reached or a feasible partition that did not improve the cut is found.

In each round, the corridor $B:=B_{1}\cup B_{2}$ is constructed by performing two breadth-first searches (BFS). The first BFS is done in the induced subgraph $G[V_{1}]$ . It is initialized with the boundary nodes of $V_{1}$ and stops if $c(B_{1})$ would exceed $(1+\varepsilon^{\prime})\lceil\frac{c(V)}{2}\rceil-c(V_{2})$ . The second BFS constructs $B_{2}$ in an analogous fashion using $G[V_{2}]$ . Let $\delta B:=\{u\in B\ |\ \exists(u,v)\in E:v\notin B\}$ be the border of $B$ . Then $\mathcal{N}$ is constructed by connecting all border nodes $\delta B\cap V_{1}$ of $G[B]$ to the source $\mathpzc{s}$ and all border nodes $\delta B\cap V_{2}$ to the sink $\mathpzc{t}$ using directed edges with an edge weight of $\infty$ . By connecting $\mathpzc{s}$ and $\mathpzc{t}$ to the respective border nodes, it is ensured that edges incident to border nodes, but not contained in $G[B]$ , cannot become cut edges. For $\alpha=1$ , the size of $B$ thus ensures that the flow network $\mathcal{N}$ has the cut property, i.e., each $(\mathpzc{s},\mathpzc{t})$ -min-cut in $\mathcal{N}$ yields an $\varepsilon$ -balanced partition of $G$ with a possibly smaller cut. For larger values of $\alpha$ , this does not have to be the case.
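The corridor construction for one side of the bipartition can be sketched as a weight-bounded BFS. This is a minimal illustration under assumed data structures (an adjacency dictionary `adj`, a weight dictionary, and blocks given as Python sets); it is not taken from the KaFFPa code base.

```python
from collections import deque
from math import ceil

def grow_corridor_side(adj, weight, side, other_side_weight, total_weight, eps_prime):
    """BFS restricted to one block, started at that block's boundary nodes.

    Stops before c(B_side) would exceed
    (1 + eps') * ceil(c(V) / 2) - c(V_other).
    """
    budget = (1 + eps_prime) * ceil(total_weight / 2) - other_side_weight
    boundary = [u for u in sorted(side) if any(v not in side for v in adj[u])]
    queue, seen = deque(boundary), set(boundary)
    corridor, used = [], 0
    while queue:
        u = queue.popleft()
        if used + weight[u] > budget:
            break                          # adding u would exceed the bound
        corridor.append(u)
        used += weight[u]
        for v in adj[u]:
            if v in side and v not in seen:
                seen.add(v)
                queue.append(v)
    return corridor

# Toy instance (made up): path 0 - 1 - ... - 7, split in the middle.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 7] for i in range(8)}
weight = {i: 1 for i in range(8)}
V1, V2 = {0, 1, 2, 3}, {4, 5, 6, 7}

B1 = grow_corridor_side(adj, weight, V1, 4, 8, 0.5)
B2 = grow_corridor_side(adj, weight, V2, 4, 8, 0.5)
print(B1, B2)
```

With $\varepsilon^{\prime}=0.5$ the bound evaluates to $1.5\cdot 4-4=2$ , so each BFS collects the two nodes closest to the cut. Moving all of $B_{1}$ to $V_{2}$ therefore yields a block of weight $6=(1+\varepsilon^{\prime})\lceil c(V)/2\rceil$ , which is the worst case the bound is designed to admit.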

After computing a max-flow in $\mathcal{N}$ , the algorithm tries to find a min-cut with better balance. This is done by exploiting the fact that one $(\mathpzc{s},\mathpzc{t})$ -max-flow contains information about all $(\mathpzc{s},\mathpzc{t})$ -min-cuts [44]. More precisely, the algorithm uses the 1–1 correspondence between $(\mathpzc{s},\mathpzc{t})$ -min-cuts and closed sets containing $\mathpzc{s}$ in the Picard-Queyranne-DAG $D_{\mathpzc{s},\mathpzc{t}}$ of the residual graph $\mathcal{N}_{f}$ [44]. First, $D_{\mathpzc{s},\mathpzc{t}}$ is constructed by contracting each strongly connected component of the residual graph. Then the following heuristic (called most balanced minimum cuts) is repeated several times using different random seeds. Closed node sets containing $\mathpzc{s}$ are computed by sweeping through the nodes of $D_{\mathpzc{s},\mathpzc{t}}$ in reverse topological order (e.g. computed using a randomized DFS). Each closed set induces a differently balanced min-cut and the one with the best balance (with respect to the original balance constraint) is used as resulting bipartition.
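The correspondence between min-cuts and closed sets can be illustrated on a tiny residual graph. The sketch below deliberately replaces the randomized sweep over $D_{\mathpzc{s},\mathpzc{t}}$ by a brute-force enumeration of all closed sets that contain $\mathpzc{s}$ but not $\mathpzc{t}$ . It is exponential in the number of nodes and only meant to make the definition tangible; all names are assumptions.

```python
from itertools import combinations

def is_closed(node_set, residual_edges):
    """A set is closed if no residual edge leaves it."""
    return all(v in node_set for (u, v) in residual_edges if u in node_set)

def most_balanced_min_cut(nodes, residual_edges, s, t, weight):
    """Enumerate closed sets with s inside and t outside; keep the most balanced."""
    free = [v for v in nodes if v not in (s, t)]
    total = sum(weight[v] for v in nodes)
    best, best_gap = None, None
    for r in range(len(free) + 1):
        for extra in combinations(free, r):
            side = {s, *extra}
            if not is_closed(side, residual_edges):
                continue
            gap = abs(2 * sum(weight[v] for v in side) - total)
            if best is None or gap < best_gap:
                best, best_gap = side, gap
    return best

# Residual graph of the unit-capacity path s -> a -> b -> t after pushing one
# unit of flow: every forward edge is saturated, only reverse edges remain.
nodes = ["s", "a", "b", "t"]
residual_edges = [("a", "s"), ("b", "a"), ("t", "b")]
weight = {v: 1 for v in nodes}

print(most_balanced_min_cut(nodes, residual_edges, "s", "t", weight))
```

Here the closed sets $\{\mathpzc{s}\}$ , $\{\mathpzc{s},a\}$ , and $\{\mathpzc{s},a,b\}$ correspond to the three min-cuts of the path, and $\{\mathpzc{s},a\}$ is the most balanced one. The set $\{\mathpzc{s},b\}$ is rejected because the residual edge $(b,a)$ leaves it.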

3. Hypergraph Max-Flow Min-Cut Refinement

In the following, we generalize the flow-based refinement algorithm of KaFFPa to hypergraph partitioning. In Section 3.1 we first show how hypergraph flow networks $\mathcal{N}$ are constructed in general and introduce a technique to reduce their size by removing low-degree hypernodes. Given a $k$ -way partition $\mathrm{\Pi}_{k}=\{V_{1},\dots,V_{k}\}$ of a hypergraph $H=(V,E)$ , a pair of blocks $(V_{i},V_{j})$ adjacent in the quotient graph $Q$ , and a corridor $B$ , Section 3.2 then explains how $\mathcal{N}$ is used to build a flow problem $\mathcal{F}$ based on a $B$ -induced subhypergraph $H_{B}=(V_{B},E_{B})$ . The flow problem $\mathcal{F}$ is constructed such that an $(\mathpzc{s},\mathpzc{t})$ -max-flow computation optimizes the cut metric of the bipartition $\mathrm{\Pi}_{2}=(V_{i},V_{j})$ of $H_{B}$ and thus improves the $(\lambda-1)$ -metric in $H$ . Section 3.3 then discusses the integration into KaHyPar and introduces several techniques to speed up flow-based refinement. Algorithm 1 gives a pseudocode description of the entire flow-based refinement framework.

Input: Hypergraph $H$ , $k$ -way partition $\mathrm{\Pi}_{k}=\{V_{1},\dots,V_{k}\}$ , imbalance parameter $\varepsilon$

Algorithm MaxFlowMinCutRefinement( $H,\mathrm{\Pi}_{k}$ )
    $Q:=\textnormal{QuotientGraph}(H,\mathrm{\Pi}_{k})$
    while $\exists$ active blocks $\in Q$ do // in the beginning all blocks are active
        foreach $\{(V_{i},V_{j})\in Q\leavevmode\nobreak\ |\leavevmode\nobreak\ V_{i}\vee V_{j}\leavevmode\nobreak\ \text{is active}\}$ do // choose a pair of blocks
            $\mathrm{\Pi}_{\text{old}}=\mathrm{\Pi}_{\text{best}}:=\{V_{i},V_{j}\}\subseteq\mathrm{\Pi}_{k}$ // extract bipartition to be improved
            $\varepsilon_{\text{old}}=\varepsilon_{\text{best}}:=\textnormal{imbalance}(\mathrm{\Pi}_{k})$ // imbalance of current $k$ -way partition
            $\alpha:=\alpha^{\prime}$ // use large $B$ -corridor for first iteration
            do // adaptive flow iterations
                $B:=\textnormal{computeB-Corridor}(H,\mathrm{\Pi}_{\text{best}},\alpha\varepsilon)$ // as described in Section 2.3
                $H_{B}:=\textnormal{SubHypergraph}(H,B)$ // create $B$ -induced subhypergraph
                $\mathcal{N}_{B}:=\textnormal{FlowNetwork}(H_{B})$ // as described in Section 3.1
                $\mathcal{F}:=\textnormal{FlowProblem}(\mathcal{N}_{B})$ // as described in Section 3.2
                $f:=\textnormal{maxFlow}(\mathcal{F})$ // compute maximum flow on $\mathcal{F}$
                $\mathrm{\Pi}_{f}:=\textnormal{mostBalancedMinCut}(f,\mathcal{F})$ // as in Section 2.3 & 3.1
                $\varepsilon_{f}:=\textnormal{imbalance}(\mathrm{\Pi}_{f}\cup\mathrm{\Pi}_{k}\setminus\mathrm{\Pi}_{\text{old}})$ // imbalance of new $k$ -way partition
                if $(\text{cut}(\mathrm{\Pi}_{f})<\text{cut}(\mathrm{\Pi}_{\text{best}})\wedge\varepsilon_{f}\leq\varepsilon)\vee\varepsilon_{f}<\varepsilon_{\text{best}}$ then // found improvement
                    $\alpha:=\min(2\alpha,\alpha^{\prime}),\leavevmode\nobreak\ \mathrm{\Pi}_{\text{best}}:=\mathrm{\Pi}_{f},\leavevmode\nobreak\ \varepsilon_{\text{best}}:=\varepsilon_{f}$ // update $\alpha$ , $\mathrm{\Pi}_{\text{best}}$ , $\varepsilon_{\text{best}}$
                else
                    $\alpha:=\frac{\alpha}{2}$ // decrease size of $B$ -corridor in next iteration
            while $\alpha\geq 1$
            if $\mathrm{\Pi}_{\text{best}}\neq\mathrm{\Pi}_{\text{old}}$ then // improvement found
                $\mathrm{\Pi}_{k}:=\mathrm{\Pi}_{\text{best}}\cup\mathrm{\Pi}_{k}\setminus\mathrm{\Pi}_{\text{old}}$ // replace $\mathrm{\Pi}_{\text{old}}$ with $\mathrm{\Pi}_{\text{best}}$
                $\textnormal{activateForNextRound}(V_{i},V_{j})$ // reactivate blocks for next round
    return $\mathrm{\Pi}_{k}$

Output: improved $\varepsilon$ -balanced $k$ -way partition $\mathrm{\Pi}_{k}=\{V_{1},\dots,V_{k}\}$

Algorithm 1. Flow-Based Refinement
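The adaptive do-while loop of Algorithm 1 for a single pair of blocks can be sketched as follows. The callback `refine_once`, which stands for corridor construction, flow computation, and most balanced min-cut selection, is a hypothetical placeholder, and the imbalance is evaluated on the bipartition only instead of on the whole $k$ -way partition; both are simplifying assumptions.

```python
def adaptive_flow_iterations(refine_once, cut, imbalance, pi_old, eps, alpha_max):
    """Inner do-while loop of Algorithm 1 for one pair of blocks.

    refine_once(pi, eps_prime) is a hypothetical callback that returns a
    candidate bipartition computed with corridor imbalance eps_prime.
    """
    pi_best, eps_best = pi_old, imbalance(pi_old)
    alpha = alpha_max                          # large corridor in the first iteration
    while True:
        pi_f = refine_once(pi_best, alpha * eps)
        eps_f = imbalance(pi_f)
        if (cut(pi_f) < cut(pi_best) and eps_f <= eps) or eps_f < eps_best:
            alpha = min(2 * alpha, alpha_max)  # found improvement
            pi_best, eps_best = pi_f, eps_f
        else:
            alpha = alpha / 2                  # shrink the corridor
        if alpha < 1:                          # do ... while alpha >= 1
            break
    return pi_best

# Stub for demonstration: a "partition" is represented by its cut value only,
# and each call improves the cut by one until the value 3 is reached.
calls = []

def fake_refine(pi, eps_prime):
    calls.append(eps_prime)
    return max(pi - 1, 3)

result = adaptive_flow_iterations(fake_refine, cut=lambda pi: pi,
                                  imbalance=lambda pi: 0.0,
                                  pi_old=5, eps=0.03, alpha_max=4)
print(result, len(calls))
```

In this run two improving iterations are followed by three unsuccessful ones, during which $\alpha$ is halved from $4$ to $0.5$ and the loop terminates.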

3.1. Hypergraph Flow Networks

The Liu-Wong Network [40].

Given a hypergraph $H=(V,E,c,\omega)$ and two distinct nodes $\mathpzc{s}$ and $\mathpzc{t}$ , an $(\mathpzc{s},\mathpzc{t})$ -min-cut can be computed by finding a minimum-capacity cut in the following flow network $\mathcal{N}=(\mathcal{V},\mathcal{E})$ :

• $\mathcal{V}$ contains all vertices in $V$ .

• For each multi-pin net $e\in E$ with $|e|\geq 3$ , add two bridging nodes $e^{\prime}$ and $e^{\prime\prime}$ to $\mathcal{V}$ and a bridging edge $(e^{\prime},e^{\prime\prime})$ with capacity $\mathpzc{c}(e^{\prime},e^{\prime\prime})=\omega(e)$ to $\mathcal{E}$ . For each pin $p\in e$ , add two edges $(p,e^{\prime})$ and $(e^{\prime\prime},p)$ with capacity $\infty$ to $\mathcal{E}$ .

• For each two-pin net $e=(u,v)\in E$ , add two bridging edges $(u,v)$ and $(v,u)$ with capacity $\omega(e)$ to $\mathcal{E}$ .

The flow network of Lawler [36] does not distinguish between two-pin and multi-pin nets. This increases the size of the network by two vertices and three edges per two-pin net. Figure 1 shows an example of the Lawler and Liu-Wong hypergraph flow networks as well as of our network described in the following paragraph.
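The construction rules of the Liu-Wong network translate directly into code. The sketch below builds the capacity map for a toy hypergraph and computes a maximum flow with a plain Edmonds-Karp implementation. The choice of Edmonds-Karp, the tuple encoding `("in", i)` / `("out", i)` for the bridging nodes $e^{\prime}$ / $e^{\prime\prime}$ , and the example instance are assumptions for illustration and do not reflect the max-flow algorithm used in KaHyPar.

```python
from collections import defaultdict, deque

INF = float("inf")

def liu_wong_network(nets, weights):
    """Return a capacity map {(u, v): capacity} of the Liu-Wong network."""
    cap = defaultdict(float)
    for i, (e, w) in enumerate(zip(nets, weights)):
        pins = sorted(e)
        if len(pins) == 2:                     # two-pin net: two opposing edges
            u, v = pins
            cap[(u, v)] += w
            cap[(v, u)] += w
        elif len(pins) >= 3:                   # multi-pin net: bridging edge e' -> e''
            e_in, e_out = ("in", i), ("out", i)
            cap[(e_in, e_out)] += w
            for p in pins:
                cap[(p, e_in)] = INF
                cap[(e_out, p)] = INF
    return cap

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly augment along a shortest residual path."""
    res = defaultdict(float, cap)              # residual capacities
    adj = defaultdict(set)
    for (u, v) in cap:
        adj[u].add(v)
        adj[v].add(u)
    value = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:       # BFS in the residual network
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return value
        bottleneck, v = INF, t
        while parent[v] is not None:           # smallest residual capacity on the path
            bottleneck = min(bottleneck, res[(parent[v], v)])
            v = parent[v]
        v = t
        while parent[v] is not None:           # push flow, update reverse edges
            res[(parent[v], v)] -= bottleneck
            res[(v, parent[v])] += bottleneck
            v = parent[v]
        value += bottleneck

# Toy hypergraph (made up): vertices 0..4, two multi-pin and two two-pin nets.
nets = [{0, 1, 2}, {2, 3}, {1, 3, 4}, {3, 4}]
weights = [5, 1, 2, 5]
cap = liu_wong_network(nets, weights)
print(len(cap), max_flow(cap, 0, 4))
```

For this instance the network has $2\cdot 7+2\cdot 2=18$ edges, and the flow value $3$ equals the weight of the hypergraph min-cut $(\{0,1,2\},\{3,4\})$ , which cuts the nets $\{2,3\}$ and $\{1,3,4\}$ .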

Removing Low Degree Hypernodes.

We further decrease the size of the network by using the observation that the problem of finding an $(\mathpzc{s},\mathpzc{t})$ -min-cut of $H$ can be reduced to finding a minimum-weight $(\mathpzc{s},\mathpzc{t})$ -vertex-separator in the star-expansion, where the capacity of each star-node is the weight of the corresponding net and all other nodes (corresponding to vertices in $H$ ) have infinite capacity [29]. Since the separator has to be a subset of the star-nodes, it is possible to replace any infinite-capacity node by adding a clique between all adjacent star-nodes without affecting the separator. The key observation now is that an infinite-capacity node $v$ with degree $d(v)$ induces $2d(v)$ infinite-capacity edges in the Lawler network [36], while a clique between star-nodes induces $d(v)(d(v)-1)$ edges. For hypernodes with $d(v)\leq 3$ , it therefore holds that $d(v)(d(v)-1)\leq 2d(v)$ . Thus we can reduce the number of nodes and edges of the Liu-Wong network as follows. Before applying the transformation on the star-expansion of $H$ , we remove all infinite-capacity nodes $v$ corresponding to hypernodes with $d(v)\leq 3$ that are not incident to any two-pin nets and add a clique between all star-nodes adjacent to $v$ . In case $v$ was a source or sink node, we create a multi-source multi-sink problem by adding all adjacent star-nodes to the set of sources resp. sinks [20].
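The degree threshold follows from comparing the two edge counts. A short check, with function names chosen for illustration only:

```python
def lawler_edges(d):
    """Infinite-capacity edges induced by a hypernode of degree d."""
    return 2 * d

def clique_edges(d):
    """Directed edges of a clique between the d adjacent star-nodes."""
    return d * (d - 1)

# Degrees for which replacing the hypernode by a clique does not add edges.
profitable = [d for d in range(1, 10) if clique_edges(d) <= lawler_edges(d)]
print(profitable)
```

Since $d(v)(d(v)-1)\leq 2d(v)$ is equivalent to $d(v)\leq 3$ for $d(v)\geq 1$ , the replacement never increases the number of edges for exactly these degrees, with equality ( $6$ edges each) at $d(v)=3$ .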

Figure 1. Illustration of hypergraph flow networks. Our approach further sparsifies the flow network of Liu and Wong [40]. Thin edges have infinite capacity.

Reconstructing Min-Cuts.

After computing an $(\mathpzc{s},\mathpzc{t})$ -max-flow in the Lawler or Liu-Wong network, an $(\mathpzc{s},\mathpzc{t})$ -min-cut of $H$ can be computed by a BFS in the residual graph starting from $\mathpzc{s}$ . Let $S$ be the set of nodes corresponding to vertices of $H$ reached by the BFS. Then $(S,V\setminus S)$ is an $(\mathpzc{s},\mathpzc{t})$ -min-cut. Since our network does not contain low degree hypernodes, we use the following lemma to compute an $(\mathpzc{s},\mathpzc{t})$ -min-cut of $H$ :

Lemma 3.1.

Let $f$ be a maximum $(\mathpzc{s},\mathpzc{t})$ -flow in the Lawler network $\mathcal{N}=(\mathcal{V},\mathcal{E})$ of a hypergraph $H=(V,E)$ and $(\mathcal{S},\mathcal{V}\setminus\mathcal{S})$ be the corresponding $(\mathpzc{s},\mathpzc{t})$ -min-cut in $\mathcal{N}$ . Then for each node $v\in\mathcal{S}\cap V$ , the residual graph $\mathcal{N}_{f}=(\mathcal{V}_{f},\mathcal{E}_{f})$ contains at least one path $\langle\mathpzc{s},\dots,e^{\prime\prime}\rangle$ to a bridging node $e^{\prime\prime}$ of a net $e\in\mathrm{I}(v)$ .

Proof 3.2.

Since $v\in\mathcal{S}$ , there has to be some path $\mathpzc{s}\rightsquigarrow v$ in $\mathcal{N}_{f}$ . By definition of the flow network, this path can either be of the form $P_{1}=\langle\mathpzc{s},\dots,e^{\prime\prime},v\rangle$ or $P_{2}=\langle\mathpzc{s},\dots,e^{\prime},v\rangle$ for some bridging nodes $e^{\prime},e^{\prime\prime}$ corresponding to nets $e\in\mathrm{I}(v)$ . In the former case we are done, since $e^{\prime\prime}\in P_{1}$ . In the latter case the existence of edge $(e^{\prime},v)\in\mathcal{E}_{f}$ implies that there is a positive flow $f(v,e^{\prime})>0$ over edge $(v,e^{\prime})\in\mathcal{E}$ . Due to flow conservation, there exists at least one edge $(e^{\prime\prime},v)\in\mathcal{E}$ with $f(e^{\prime\prime},v)>0$ , which implies that $(v,e^{\prime\prime})\in\mathcal{E}_{f}$ . Thus we can extend the path $P_{2}$ to $\langle\mathpzc{s},\dots,e^{\prime},v,e^{\prime\prime}\rangle$ .

Thus $(A,V\setminus A)$ is an $(\mathpzc{s},\mathpzc{t})$ -min-cut of $H$ , where $A:=\{v\in e\leavevmode\nobreak\ |\leavevmode\nobreak\ \exists e\in E:\langle\mathpzc{s},\dots,e^{\prime\prime}\rangle\leavevmode\nobreak\ \text{in}\leavevmode\nobreak\ \mathcal{N}_{f}\}$ . Furthermore this allows us to search for more balanced min-cuts using the Picard-Queyranne-DAG of $\mathcal{N}_{f}$ as described in Section 2.3. By the definition of closed sets it follows that if a bridging node $e^{\prime\prime}$ is contained in a closed set $C$ , then all nodes $v\in\mathrm{\Gamma}(e^{\prime\prime})$ (which correspond to vertices of $H$ ) are also contained in $C$ . Thus we can use the respective bridging nodes $e^{\prime\prime}$ as representatives of removed low degree hypernodes.

3.2. Constructing the Hypergraph Flow Problem

Let $H_{B}=(V_{B},E_{B})$ be the subhypergraph of $H=(V,E)$ that is induced by a corridor $B$ computed in the bipartition $\mathrm{\Pi}_{2}=(V_{i},V_{j})$ . In the following, we distinguish between the set of internal border nodes $\overrightarrow{B}:=\{v\in V_{B}\leavevmode\nobreak\ |\leavevmode\nobreak\ \exists e\in E:\{u,v\}\subseteq e\wedge u\notin V_{B}\}$ and the set of external border nodes $\overleftarrow{B}:=\{u\notin V_{B}\leavevmode\nobreak\ |\leavevmode\nobreak\ \exists e\in E:\{u,v\}\subseteq e\wedge v\in V_{B}\}$ . Similarly, we distinguish between external nets $(e\cap V_{B}=\emptyset)$ with no pins inside $H_{B}$ , internal nets $(e\cap V_{B}=e)$ with all pins inside $H_{B}$ , and border nets $e\in\mathrm{I}(\overrightarrow{B})\cap\mathrm{I}(\overleftarrow{B})$ with some pins inside $H_{B}$ and some pins outside of $H_{B}$ . We use $\overleftrightarrow{E_{B}}$ to denote the set of border nets.
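These definitions can be sketched with plain set operations. Nets and the corridor are represented as Python sets of vertex ids; the function names and the example are assumptions for illustration.

```python
def classify_nets(nets, corridor):
    """Split net indices into external, internal, and border nets w.r.t. V_B."""
    external, internal, border = [], [], []
    for i, e in enumerate(nets):
        inside = e & corridor
        if not inside:
            external.append(i)                 # no pins inside H_B
        elif inside == e:
            internal.append(i)                 # all pins inside H_B
        else:
            border.append(i)                   # pins on both sides
    return external, internal, border

def border_nodes(nets, corridor):
    """Return (internal border nodes, external border nodes)."""
    internal_border, external_border = set(), set()
    for e in nets:
        inside, outside = e & corridor, e - corridor
        if inside and outside:                 # e is a border net
            internal_border |= inside
            external_border |= outside
    return internal_border, external_border

# Toy instance (made up): 8 vertices, corridor V_B = {2, 3, 4, 5}.
nets = [{0, 1}, {1, 2, 3}, {3, 4}, {4, 5, 6}, {6, 7}]
corridor = {2, 3, 4, 5}

print(classify_nets(nets, corridor))
print(border_nodes(nets, corridor))
```

In this example every corridor vertex is a pin of some border net, so all four vertices of $V_{B}$ are internal border nodes, while vertices $1$ and $6$ are the external border nodes.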

A hypergraph flow problem consists of a flow network $\mathcal{N}_{B}=(\mathcal{V}_{B},\mathcal{E}_{B})$ derived from $H_{B}$ and two additional nodes $\mathpzc{s}$ and $\mathpzc{t}$ that are connected to some nodes $v\in\mathcal{V}_{B}$ . Our approach works with all flow networks presented in Section 3.1. A flow problem has the cut property if the resulting min-cut bipartition $\mathrm{\Pi}_{f}$ of $H_{B}$ does not increase the $(\lambda-1)$ -metric in $H$ . Thus it has to hold that $\text{cut}(\mathrm{\Pi}_{f})\leq\text{cut}(\mathrm{\Pi}_{2})$ . While external nets are not affected by a max-flow computation, the max-flow min-cut theorem [21] ensures the cut property for all internal nets. Border nets however require special attention. Since a border net $e$ is only partially contained in $H_{B}$ , it will remain connected to the blocks of its external border nodes in $H$ . In case external border nodes connect $e$ to both $V_{i}$ and $V_{j}$ , it will remain a cut net in $H$ even if it is removed from the cut-set in $\mathrm{\Pi}_{f}$ . It is therefore necessary to “encode” information about external border nodes into the flow problem.

The KaFFPa Model and its Limitations.

In KaFFPa, this is done by directly connecting internal border nodes $\overrightarrow{B}$ to $\mathpzc{s}$ and $\mathpzc{t}$ . This approach can also be used for hypergraphs. In the hypergraph flow problem $\mathcal{F}_{G}$ , the source $\mathpzc{s}$ is connected to all nodes $\mathcal{S}=\overrightarrow{B}\cap V_{i}$ and all nodes $\mathcal{T}=\overrightarrow{B}\cap V_{j}$ are connected to $\mathpzc{t}$ using directed edges with infinite capacity. While this ensures that $\mathcal{F}_{G}$ has the cut property, applying the graph-based model to hypergraphs unnecessarily restricts the search space. Since all internal border nodes $\overrightarrow{B}$ are connected to either $\mathpzc{s}$ or $\mathpzc{t}$ , every min-cut $(S,V_{B}\setminus S)$ will have $\mathcal{S}\subseteq S$ and $\mathcal{T}\subseteq V_{B}\setminus S$ . The KaFFPa model therefore prevents all min-cuts in which any non-cut border net (i.e., $e\in\overleftrightarrow{E_{B}}$ with $\lambda(e)=1$ ) becomes part of the cut-set. This restricts the space of possible solutions, since corridor $B$ was computed such that even a min-cut along either side of the border would result in a feasible cut in $H_{B}$ . Thus, ideally, all vertices $v\in B$ should be able to change their block as a result of an $(\mathpzc{s},\mathpzc{t})$ -max-flow computation on $\mathcal{F}_{G}$ – not only vertices $v\in B\setminus\overrightarrow{B}$ . This limitation becomes increasingly relevant for hypergraphs with large nets as well as for partitioning problems with small imbalance $\varepsilon$ , since large nets are likely to be only partially contained in $H_{B}$ and tight balance constraints enforce small $B$ -corridors. While the former is a problem only for HGP, the latter also applies to GP.

A more flexible Model.

We propose a more general model that allows an $(\mathpzc{s},\mathpzc{t})$ -max-flow computation to also cut through border nets by exploiting the structure of hypergraph flow networks. Instead of directly connecting $\mathpzc{s}$ and $\mathpzc{t}$ to internal border nodes $\overrightarrow{B}$ and thus preventing all min-cuts in which these nodes switch blocks, we conceptually extend $H_{B}$ to contain all external border nodes $\overleftarrow{B}$ and all border nets $\overleftrightarrow{E_{B}}$ . The resulting hypergraph is $\overleftarrow{H_{B}}=(V_{B}\cup\overleftarrow{B},\{e\in E\leavevmode\nobreak\ |\leavevmode\nobreak\ e\cap V_{B}\neq\emptyset\})$ . The key insight now is that by using the flow network of $\overleftarrow{H_{B}}$ and connecting $\mathpzc{s}$ resp. $\mathpzc{t}$ to the external border nodes $\overleftarrow{B}\cap V_{i}$ resp. $\overleftarrow{B}\cap V_{j}$ , we get a flow problem that does not lock any node $v\in V_{B}$ in its block, since none of these nodes is directly connected to either $\mathpzc{s}$ or $\mathpzc{t}$ . Due to the max-flow min-cut theorem [21], this flow problem furthermore has the cut property, since all border nets of $H_{B}$ are now internal nets and all external border nodes $\overleftarrow{B}$ are locked inside their block. However, it is not necessary to use $\overleftarrow{H_{B}}$ instead of $H_{B}$ to achieve this result. For all vertices $v\in\overleftarrow{B}$ the flow network of $\overleftarrow{H_{B}}$ contains paths $\langle\mathpzc{s},v,e^{\prime}\rangle$ and $\langle e^{\prime\prime},v,\mathpzc{t}\rangle$ that only involve infinite-capacity edges. Therefore we can remove all nodes $v\in\overleftarrow{B}$ by directly connecting $\mathpzc{s}$ and $\mathpzc{t}$ to the corresponding bridging nodes $e^{\prime},e^{\prime\prime}$ via infinite-capacity edges without affecting the maximal flow [20]. 
More precisely, in the hypergraph flow problem $\mathcal{F}_{H}$ , we connect $\mathpzc{s}$ to all bridging nodes $e^{\prime}$ corresponding to border nets $e\in\overleftrightarrow{E_{B}}:e\subset\overleftarrow{B}\cap V_{i}$ and all bridging nodes $e^{\prime\prime}$ corresponding to border nets $e\in\overleftrightarrow{E_{B}}:e\subset\overleftarrow{B}\cap V_{j}$ to $\mathpzc{t}$ using directed, infinite-capacity edges.
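The difference between the two models can be illustrated with a small sketch. It is not taken from the KaHyPar code base: the function name `terminal_sets`, the dictionary-based hypergraph representation, and the toy instance are assumptions made for illustration. The sketch treats a net as a border net if it has pins both inside and outside the corridor, and it only determines which nodes would be attached to $\mathpzc{s}$ and $\mathpzc{t}$ in $\mathcal{F}_{G}$ and $\mathcal{F}_{H}$ without running a max-flow computation.

```python
def terminal_sets(nets, block, corridor, i, j):
    """Return the terminals of both flow models for the block pair (i, j).

    nets:     dict mapping a net id to the set of its pins
    block:    dict mapping a vertex to its block id
    corridor: set of vertices V_B, assumed to lie in blocks i and j
    """
    kaffpa_sources, kaffpa_sinks = set(), set()  # vertices locked by F_G
    source_nets, sink_nets = set(), set()        # border nets attached in F_H

    for e, pins in nets.items():
        internal = pins & corridor
        external = pins - corridor
        if not internal or not external:
            continue  # internal or external net: no terminal edges needed

        # F_G: every internal border node is tied directly to s or t.
        for v in internal:
            if block[v] == i:
                kaffpa_sources.add(v)
            else:
                kaffpa_sinks.add(v)

        # F_H: only the bridging nodes of the border net are tied to s or t,
        # depending on the blocks of its external border nodes.
        external_blocks = {block[u] for u in external}
        if i in external_blocks:
            source_nets.add(e)  # edge (s, e') with infinite capacity
        if j in external_blocks:
            sink_nets.add(e)    # edge (e'', t) with infinite capacity

    return kaffpa_sources, kaffpa_sinks, source_nets, sink_nets


# Toy instance: vertices 0..2 are in block 0, vertices 3..5 in block 1.
nets = {"a": {0, 1}, "b": {1, 2, 3}, "c": {3, 4}, "d": {4, 5}}
block = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
corridor = {1, 2, 3, 4}

S, T, src_nets, snk_nets = terminal_sets(nets, block, corridor, i=0, j=1)
locked_by_kaffpa = S | T
```

In this toy instance, $\mathcal{F}_{G}$ locks vertices 1 and 4 in their blocks, whereas $\mathcal{F}_{H}$ attaches only the bridging nodes of nets `a` and `d` to the terminals, so all four corridor vertices remain free to move.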

Single-Pin Border Nets.

Furthermore, we model border nets with $|e\cap\overrightarrow{B}|=1$ more efficiently. For such a net $e$ , the flow problem contains paths of the form $\langle\mathpzc{s},e^{\prime},e^{\prime\prime},v\rangle$ or $\langle v,e^{\prime},e^{\prime\prime},\mathpzc{t}\rangle$ which can be replaced by paths of the form $\langle\mathpzc{s},e^{\prime},v\rangle$ or $\langle v,e^{\prime\prime},\mathpzc{t}\rangle$ with $\mathpzc{c}(e^{\prime},v)=\omega(e)$ resp. $\mathpzc{c}(v,e^{\prime\prime})=\omega(e)$ . In both cases we can thus remove one bridging node and two infinite-capacity edges. A comparison of $\mathcal{F}_{G}$ and $\mathcal{F}_{H}$ is shown in Figure 2.

3.3. Implementation Details

Since KaHyPar is an $n$ -level partitioner, its FM-based local search algorithms are executed each time a vertex is uncontracted. To prevent expensive recalculations, it therefore uses a cache to maintain the gain values of FM moves throughout the $n$ -level hierarchy [2]. In order to combine our flow-based refinement with FM local search, we not only perform the moves induced by the max-flow min-cut computation but also update the FM gain cache accordingly.

Since it is not feasible to execute our algorithm on every level of the $n$ -level hierarchy, we use an exponentially spaced approach that performs flow-based refinements after uncontracting $i=2^{j}$ vertices for $j\in\mathbb{N}_{+}$ . This way, the algorithm is executed more often on smaller flow problems than on larger ones. To further improve the running time, we introduce the following speedup techniques:

• S1: We modify active block scheduling such that after the first round the algorithm is only executed on a pair of blocks if at least one execution using these blocks improved connectivity or imbalance of the partition on previous levels.

• S2: For all levels except the finest level: Skip flow-based refinement if the cut between two adjacent blocks is less than ten.

• S3: Stop resizing the corridor $B$ if the current $(\mathpzc{s},\mathpzc{t})$ -cut did not improve the previously best solution.
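The exponentially spaced schedule and speedup technique S2 can be sketched as two small predicates. The function names and the interpretation of $i$ as the running count of uncontracted vertices are assumptions for illustration and do not mirror the KaHyPar implementation.

```python
def runs_flow_refinement(i):
    """True if i = 2^j for some integer j >= 1 (i counts uncontractions)."""
    return i >= 2 and (i & (i - 1)) == 0


def skips_block_pair(cut_weight, is_finest_level, threshold=10):
    """S2: skip a pair of adjacent blocks with a small cut, except on the finest level."""
    return (not is_finest_level) and cut_weight < threshold


# Uncontraction counts up to 40 that would trigger a flow-based refinement.
trigger_points = [i for i in range(1, 41) if runs_flow_refinement(i)]
```

Because consecutive trigger points double in distance, most executions happen early in the uncoarsening phase, where the hypergraph and thus the flow problems are still small.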

4. Experimental Evaluation

We implemented the max-flow min-cut refinement algorithm in the $n$ -level hypergraph partitioning framework KaHyPar (Karlsruhe Hypergraph Partitioning). The code is written in C++ and compiled using g++-5.2 with flags -O3 -march=native. The latest version of the framework is called KaHyPar-CA [28]. We refer to our new algorithm as KaHyPar-MF. Both versions use the default configuration for community-aware direct $k$ -way partitioning (https://github.com/SebastianSchlag/kahypar/blob/master/config/km1_direct_kway_sea17.ini).

Instances.

All experiments use hypergraphs from the benchmark set of Heuer and Schlag [28] (the complete benchmark set, along with detailed statistics for each hypergraph, is publicly available from http://algo2.iti.kit.edu/schlag/sea2017/), which contains $488$ hypergraphs derived from four benchmark sets: the ISPD98 VLSI Circuit Benchmark Suite [3], the DAC 2012 Routability-Driven Placement Contest [55], the University of Florida Sparse Matrix Collection [15], and the international SAT Competition 2014 [9]. Sparse matrices are translated into hypergraphs using the row-net model [12], i.e., each row is treated as a net and each column as a vertex. SAT instances are converted to three different representations: For literal hypergraphs, each boolean literal is mapped to one vertex and each clause constitutes a net [43], while in the primal model each variable is represented by a vertex and each clause is represented by a net. In the dual model the opposite is the case [41]. All hypergraphs have unit vertex and net weights.

Table 1 gives an overview of the different benchmark sets used in the experiments. The full benchmark set is referred to as set A. We furthermore use the representative subset of 165 hypergraphs proposed in [28] (set B) and a smaller subset consisting of $25$ hypergraphs (set C), which is used to devise the final configuration of KaHyPar-MF. Basic properties of set C can be found in Table 10 in Appendix C. Unless mentioned otherwise, all hypergraphs are partitioned into $k\in\{2,4,8,16,32,64,128\}$ blocks with $\varepsilon=0.03$ . For each value of $k$ , a $k$ -way partition is considered to be one test instance, resulting in a total of $175$ instances for set C, $1155$ instances for set B and $3416$ instances for set A. Furthermore, we use 15 graphs from [42] to compare our flow model $\mathcal{F}_{H}$ to the KaFFPa [47] model $\mathcal{F}_{G}$ . Table 11 in Appendix C summarizes the basic properties of these graphs, which constitute set D.

Table 1. Overview of the different benchmark sets. Set B and set C are subsets of set A.

	Source	#	DAC	ISPD98	Primal	Dual	Literal	SPM	Graphs
Set A	[28]	488	10	18	92	92	92	184	-
Set B	[28]	165	5	10	30	30	30	60	-
Set C	new	25	-	5	5	5	5	5	-
Set D	[42]	15	-	-	-	-	-	-	15
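As a quick consistency check, the instance counts stated in the text follow from the per-class columns of Table 1 and the seven values of $k$. The snippet below is only an illustrative recomputation; the variable names are chosen for this example.

```python
# Per-class hypergraph counts (DAC, ISPD98, Primal, Dual, Literal, SPM) from Table 1.
per_class = {
    "A": [10, 18, 92, 92, 92, 184],
    "B": [5, 10, 30, 30, 30, 60],
    "C": [0, 5, 5, 5, 5, 5],
}
ks = [2, 4, 8, 16, 32, 64, 128]

# Number of hypergraphs per set and number of (hypergraph, k) test instances.
hypergraphs = {name: sum(counts) for name, counts in per_class.items()}
instances = {name: total * len(ks) for name, total in hypergraphs.items()}
```

The class columns of set A sum to 488 hypergraphs, which yields the 3416 instances mentioned above; sets B and C give 1155 and 175 instances, respectively.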

System and Methodology.

All experiments are performed on a single core of a machine consisting of two Intel Xeon E5-2670 Octa-Core processors (Sandy Bridge) clocked at $2.6$ GHz. The machine has $64$ GB main memory, $20$ MB L3- and 8x256 KB L2-Cache and is running RHEL 7.2. We compare KaHyPar-MF to KaHyPar-CA, as well as to the $k$ -way (hMetis-K) and the recursive bisection variant (hMetis-R) of hMetis 2.0 (p1) [32, 33], and to PaToH 3.2 [12]. These HGP libraries were chosen because they provide the best solution quality [2, 28]. The partitioning results of these tools are already available from http://algo2.iti.kit.edu/schlag/sea2017/. For each partitioner except PaToH the results summarize ten repetitions with different seeds for each test instance and report the arithmetic mean of the computed cut and running time as well as the best cut found. Since PaToH ignores the random seed if configured to use the quality preset, the results contain both the result of a single run of the quality preset (PaToH-Q) and the average over ten repetitions using the default configuration (PaToH-D). Each partitioner had a time limit of eight hours per test instance. We use the same number of repetitions and the same time limit for our experiments with KaHyPar-MF.

In the following, we use the geometric mean when averaging over different instances in order to give every instance a comparable influence on the final result. In order to compare the algorithms in terms of solution quality, we perform a more detailed analysis using improvement plots. For each algorithm, these plots relate the minimum connectivity of KaHyPar-MF to the minimum connectivity produced by the corresponding algorithm on a per-instance basis. For each algorithm, these ratios are sorted in decreasing order. The plots use a cube root scale for the y-axis to reduce right skewness [14] and show the improvement of KaHyPar-MF in percent (i.e., $1-(\text{KaHyPar-MF}/\text{algorithm})$ ) on the y-axis. A value below zero indicates that the partition of KaHyPar-MF was worse than the partition produced by the corresponding algorithm, while a value above zero indicates that KaHyPar-MF performed better than the algorithm in question. A value of zero implies that the partitions of both algorithms had the same solution quality. Values above one correspond to infeasible solutions that violated the balance constraint. In order to include instances with a cut of zero into the results, we set the corresponding cut values to one for ratio computations.
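The quantities used in the improvement plots can be sketched as follows. The helper names and the sample values are hypothetical and serve only to illustrate the formulas described above. The sketch assumes integer connectivity values, which holds for unit net weights, and it does not model the special encoding of infeasible solutions.

```python
import math


def improvement(mf_cut, alg_cut):
    """Improvement 1 - (KaHyPar-MF / algorithm); cuts of zero are set to one."""
    mf = max(mf_cut, 1)
    alg = max(alg_cut, 1)
    return 1.0 - mf / alg


def signed_cube_root(x):
    """Cube root scale for the y-axis that also handles negative values."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (KaHyPar-MF, other algorithm) minimum connectivity per instance.
pairs = [(90, 100), (100, 100), (110, 100), (0, 0)]
improvements = sorted((improvement(mf, alg) for mf, alg in pairs), reverse=True)
```

Sorting in decreasing order places the instances on which KaHyPar-MF is better at the left of the plot and those on which it is worse at the right.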

Table 2. Statistics of benchmark set B. We use $\overline{x}$ to denote the mean and $\widetilde{x}$ to denote the median.

Type	#	$\overline{d(v)}$	$\widetilde{d(v)}$	$\overline{|e|}$	$\widetilde{|e|}$
DAC	5	3.32	3.28	3.37	3.35
ISPD	10	4.20	4.24	3.89	3.90
Primal	30	16.29	9.97	2.63	2.39
Literal	30	8.21	4.99	2.63	2.39
Dual	30	2.63	2.38	16.29	9.97
SPM	60	24.78	14.15	26.58	15.01

4.1. Evaluating Flow Networks, Models, and Algorithms

Flow Networks and Algorithms.

To analyze the effects of the different hypergraph flow networks we compute five bipartitions for each hypergraph of set B with KaHyPar-CA using different seeds. Statistics of the hypergraphs are shown in Table 2. The bipartitions are then used to generate hypergraph flow networks for a corridor of size $|B|=25\,000$ hypernodes around the cut. Figure 3 (top) summarizes the sizes of the respective flow networks in terms of number of nodes $|\mathcal{V}|$ and number of edges $|\mathcal{E}|$ for each instance class. The flow networks of primal and literal SAT instances are the largest in terms of both numbers of nodes and edges. High average vertex degree combined with low average net sizes leads to subhypergraphs $H_{B}$ containing many small nets, which then induce many nodes and (infinite-capacity) edges in $\mathcal{N}_{L}$ . Dual instances with low average degree and large average net size on the other hand lead to smaller flow networks. For VLSI instances (DAC, ISPD) both average degree and average net sizes are low, while for SPM hypergraphs the opposite is the case. This explains why SPM flow networks have significantly more edges, despite the number of nodes being comparable in both classes.

As expected, the Lawler network $\mathcal{N}_{L}$ induces the biggest flow problems. Looking at the Liu-Wong network $\mathcal{N}_{W}$ , we can see that distinguishing between graph edges and nets with $|e|\geq 3$ pins has an effect for all hypergraphs with many small nets (i.e., DAC, ISPD, Primal, Literal). While this technique alone does not improve dual SAT instances, we see that the combination of the Liu-Wong approach and our removal of low degree hypernodes in $\mathcal{N}_{\text{Our}}$ reduces the size of the networks for all instance classes except SPM. Both techniques only have a limited effect on these instances, since both hypernode degrees and net sizes are large on average. Since our flow problems are based on $B$ -corridor induced subhypergraphs, $\mathcal{N}_{\text{Our}}^{1}$ additionally models single-pin border nets more efficiently as described in Section 3.2. This further reduces the network sizes significantly. As expected, the reduction in numbers of nodes and edges is most pronounced for hypergraphs with low average net sizes because these instances are likely to contain many single-pin border nets.
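To make the size differences concrete, the following sketch counts nodes and edges of the two basic networks, excluding $\mathpzc{s}$ and $\mathpzc{t}$. It rests on assumptions about the constructions from Section 3.1: in the Lawler network every net contributes two bridging nodes, one bridging edge, and two infinite-capacity edges per pin, while the Liu-Wong network replaces each two-pin net by two directed edges between its pins. The function names and the example are illustrative only.

```python
def lawler_size(num_vertices, net_sizes):
    """Nodes and edges of the Lawler network (assumed construction)."""
    nodes = num_vertices + 2 * len(net_sizes)        # two bridging nodes per net
    edges = len(net_sizes) + 2 * sum(net_sizes)      # bridging edge + 2 edges per pin
    return nodes, edges


def liu_wong_size(num_vertices, net_sizes):
    """Nodes and edges when two-pin nets are modeled as graph edges (assumed)."""
    other = [s for s in net_sizes if s != 2]         # modeled as in the Lawler network
    graph_edges = [s for s in net_sizes if s == 2]   # no bridging nodes needed
    nodes = num_vertices + 2 * len(other)
    edges = len(other) + 2 * sum(other) + 2 * len(graph_edges)
    return nodes, edges


# Example: 4 hypernodes, two nets with 2 pins and one net with 3 pins.
lawler = lawler_size(4, [2, 2, 3])
liu_wong = liu_wong_size(4, [2, 2, 3])
```

Under these assumptions each two-pin net saves two nodes and three edges, which is in line with the observation that the savings are largest for instance classes dominated by small nets.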

To further see how these reductions in network size translate to improved running times of max-flow algorithms, we use these networks to create flow problems using our flow model $\mathcal{F}_{H}$ and compute min-cuts using two highly tuned max-flow algorithms, namely the BK-algorithm [10] (available from https://github.com/gerddie/maxflow) and the incremental breadth-first search (IBFS) algorithm [23] (available from http://www.cs.tau.ac.il/~sagihed/ibfs/code.html). These algorithms were chosen because they performed best in preliminary experiments [27]. We then compare the speedups of these algorithms when executed on $\mathcal{N}_{W}$ , $\mathcal{N}_{\text{Our}}$ , and $\mathcal{N}_{\text{Our}}^{1}$ to the execution on the Lawler network $\mathcal{N}_{L}$ . As can be seen in Figure 3 (bottom) both algorithms benefit from improved network models and the speedups directly correlate with the reductions in network size. While $\mathcal{N}_{W}$ significantly reduces the running times for Primal and Literal instances, $\mathcal{N}_{\text{Our}}$ additionally leads to a speedup for Dual instances. By additionally considering single-pin border nets, $\mathcal{N}_{\text{Our}}^{1}$ results in an average speedup between $1.52$ and $2.21$ (except for SPM instances). Since IBFS outperformed the BK algorithm in [27], we use $\mathcal{N}_{\text{Our}}^{1}$ and IBFS in all following experiments.

	Hypergraphs			Graphs
$\alpha^{\prime}$	$\varepsilon=1\%$	$\varepsilon=3\%$	$\varepsilon=5\%$	$\varepsilon=1\%$	$\varepsilon=3\%$	$\varepsilon=5\%$
$1$	$7.7$	$8.1$	$7.6$	$11.7$	$11.3$	$10.5$
$2$	$7.9$	$6.6$	$4.8$	$11.0$	$9.1$	$7.8$
$4$	$6.9$	$3.9$	$2.7$	$9.9$	$7.3$	$5.4$
$8$	$5.1$	$2.3$	$1.5$	$8.6$	$5.3$	$3.9$
$16$	$3.4$	$1.3$	$1.2$	$7.0$	$4.1$	$3.5$

Class	Hypergraph	$n$	$m$	$p$
ISPD	ibm06	32 498	34 826	128 182
	ibm07	45 926	48 117	175 639
	ibm08	51 309	50 513	204 890
	ibm09	53 395	60 902	222 088
	ibm10	69 429	75 196	297 567
Dual	6s9	100 384	34 317	234 228
	6s133	140 968	48 215	328 924
	6s153	245 440	85 646	572 692
	dated-10-11-u	629 461	141 860	1 429 872
	dated-10-17-u	1 070 757	229 544	2 471 122
Literal	6s133	96 430	140 968	328 924
	6s153	171 292	245 440	572 692
	aaai10-planning-ipc5	107 838	308 235	690 466
	dated-10-11-u	283 720	629 461	1 429 872
	atco_enc2_opt1_05_21	112 732	526 872	2 097 393
Primal	6s153	85 646	245 440	572 692
	aaai10-planning-ipc5	53 919	308 235	690 466
	hwmcc10-timeframe	163 622	488 120	1 138 944
	dated-10-11-u	141 860	629 461	1 429 872
	atco_enc2_opt1_05_21	56 533	526 872	2 097 393
SPM	mult_dcop_01	25 187	25 187	193 276
	vibrobox	12 328	12 328	342 828
	RFdevice	74 104	74 104	365 580
	mixtank_new	29 957	29 957	1 995 041
	laminar_duct3D	67 173	67 173	3 833 077

Graph	$n$	$m$
p2p-Gnutella04	6 405	29 215
wordassociation-2011	10 617	63 788
PGPgiantcompo	10 680	24 316
email-EuAll	16 805	60 260
as-22july06	22 963	48 436
soc-Slashdot0902	28 550	379 445
loc-brightkite	56 739	212 945
enron	69 244	254 449
loc-gowalla	196 591	950 327
coAuthorsCiteseer	227 320	814 134
wiki-Talk	232 314	$\approx$ 1.5M
citationCiteseer	268 495	$\approx$ 1.2M
coAuthorsDBLP	299 067	977 676
cnr-2000	325 557	$\approx$ 2.7M
web-Google	356 648	$\approx$ 2.1M

Hypergraph	2	4	8	16	32	64	128
Primal
10pipe-q0-k	$\square$	$\square$	$\square$	$\square$	$\square$	$\square$	$\square$
11pipe-k	$\square$	$\square$	$\square$	$\square$	$\square$	$\square$	❍ $\square$
11pipe-q0-k	$\square$	$\square$	$\square$	$\square$	$\square$	$\square$	$\square$
9dlx-vliw-at-b-iq3	$\square$	$\square$	$\square$	$\square$	$\square$	$\square$	$\square$
9vliw-m-9stages-iq3-C1-bug7	$\triangle$	$\triangle$		$\triangle$	❍ $\triangle$	❍ $\triangle$	❍ $\triangle$
9vliw-m-9stages-iq3-C1-bug8	$\triangle$	$\triangle$		$\triangle$	❍ $\triangle$	❍ $\triangle$	❍ $\triangle$
blocks-blocks-37-1.130-NOTKNOWN	$\square$	$\square$	$\square$	$\square$	$\square$	$\square$	$\square$
openstacks-p30-3.085-SAT	$\square$	$\square$	$\square$	$\square$	$\square$	$\square$	$\square$
openstacks-sequencedstrips-nonadl-nonnegated-os-sequencedstrips-p30-3.025-NOTKNOWN	$\square$	$\square$	$\square$	$\square$	$\square$	$\square$	$\square$
openstacks-sequencedstrips-nonadl-nonnegated-os-sequencedstrips-p30-3.085-SAT	$\square$	$\square$	$\square$	$\square$	$\square$	$\square$	$\square$
transport-transport-city-sequential-25nodes-1000size-3degree-100mindistance-3trucks-10packages-2008seed.050-NOTKNOWN	$\square$			$\square$			$\square$
velev-vliw-uns-2.0-uq5	$\square$	$\square$	$\square$	$\square$	$\square$	$\square$	$\square$
velev-vliw-uns-4.0-9	$\square$	$\square$	$\square$	$\square$	$\square$	$\square$	$\square$
Literal
11pipe-k				❍	❍	❍	❍
9vliw-m-9stages-iq3-C1-bug7	$\triangle$	$\triangle$	⚫❍ $\triangle$	⚫❍ $\triangle$	⚫❍ $\triangle$	⚫❍ $\square$ $\triangle$	⚫❍ $\square$ $\triangle$
9vliw-m-9stages-iq3-C1-bug8	$\triangle$	$\triangle$	⚫❍ $\triangle$	⚫❍ $\triangle$	⚫❍ $\triangle$	⚫❍ $\square$ $\triangle$	⚫❍ $\square$ $\triangle$
blocks-blocks-37-1.130		$\square$	$\square$	$\square$	$\square$	$\square$	$\square$
Dual
10pipe-q0-k				$\triangle$	$\triangle$	$\triangle$	❍ $\triangle$
11pipe-k	$\triangle$	❍ $\triangle$	❍ $\triangle$	❍ $\triangle$	❍ $\triangle$	❍ $\triangle$	❍ $\triangle$
11pipe-q0-k					$\triangle$	❍ $\triangle$	❍ $\triangle$
9dlx-vliw-at-b-iq3							$\triangle$
9vliw-m-9stages-iq3-C1-bug7	$\triangle$	⚫❍ $\triangle$	⚫❍ $\triangle$	⚫❍ $\triangle$	⚫❍ $\triangle$	⚫❍ $\triangle$	⚫❍ $\triangle$
9vliw-m-9stages-iq3-C1-bug8	$\triangle$	⚫❍ $\triangle$	⚫❍ $\triangle$	⚫❍ $\triangle$	⚫❍ $\triangle$	⚫❍ $\triangle$	⚫❍ $\triangle$
blocks-blocks-37-1.130-NOTKNOWN		❍	⚫❍	⚫❍	⚫❍	⚫❍	⚫❍ $\triangle$
E02F20							❍
E02F22						❍	❍
q-query-3-L100-coli.sat							$\triangle$
q-query-3-L150-coli.sat						$\triangle$	$\triangle$
q-query-3-L200-coli.sat					$\triangle$	$\triangle$	$\triangle$
q-query-3-L80-coli.sat							$\triangle$
transport-transport-city-sequential-25nodes-1000size-3degree-100mindistance-3trucks-10packages-2008seed.030-NOTKNOWN							$\triangle$
velev-vliw-uns-2.0-uq5			$\triangle$	$\triangle$	$\triangle$	$\triangle$	$\triangle$
velev-vliw-uns-4.0-9					$\triangle$	$\triangle$	$\triangle$
SPM
192bit	$\square$			$\square$
appu						❍	❍
ESOC	$\square$	$\square$			$\square$	❍ $\square$	$\square$
human-gene2					❍ $\triangle$	❍ $\triangle$	❍ $\triangle$
IMDB				$\triangle$	$\triangle$	$\triangle$	$\triangle$
kron-g500-logn16		$\triangle$	$\triangle$	$\triangle$	$\triangle$	❍ $\triangle$	❍ $\triangle$
Rucci1					$\square$
sls	$\square$	$\square$	$\square$	❍ $\square$	❍ $\square$	❍ $\square$	❍ $\square$
Trec14							❍

$\triangle$ :	KaHyPar-CA exceeded time limit
⚫ :	hMetis-R exceeded time limit
❍ :	hMetis-K exceeded time limit
$\square$ :	PaToH-Q memory allocation error
