

Uniform Sampling through the Lovász Local Lemma

Heng Guo School of Informatics, University of Edinburgh, Informatics Forum, Edinburgh EH8 9AB, United Kingdom. hguo@inf.ed.ac.uk Mark Jerrum School of Mathematical Sciences, Queen Mary, University of London, Mile End Road, London E1 4NS, United Kingdom. mj@qmul.ac.uk  and  Jingcheng Liu Department of EECS, University of California, Berkeley, CA liuexp@berkeley.edu
Abstract.

We propose a new algorithmic framework, called “partial rejection sampling”, to draw samples exactly from a product distribution, conditioned on none of a number of bad events occurring. Our framework builds new connections between the variable framework of the Lovász Local Lemma and some classical sampling algorithms such as the “cycle-popping” algorithm for rooted spanning trees. Among other applications, we discover new algorithms to sample satisfying assignments of $k$-CNF formulas with bounded variable occurrences.

1. Introduction

The Lovász Local Lemma [9] is a classical gem in combinatorics that guarantees the existence of a perfect object that avoids all events deemed to be “bad”. The original proof is non-constructive but there has been great progress in the algorithmic aspects of the local lemma. After a long line of research [3, 2, 30, 8, 34, 37], the celebrated result by Moser and Tardos [31] gives efficient algorithms to find such a perfect object under conditions that match the Lovász Local Lemma in the so-called variable framework. However, it is natural to ask whether, under the same condition, we can also sample a perfect object uniformly at random instead of merely finding one.

Roughly speaking, the resampling algorithm by Moser and Tardos [31] works as follows. We initialize all variables randomly. If bad events occur, then we arbitrarily choose a bad event and resample all the involved variables. Unfortunately, it is not hard to see that this algorithm can produce biased samples. This seems inevitable. As Bezáková et al. showed [4], sampling can be NP-hard even under conditions that are stronger than those of the local lemma. On the one hand, the symmetric Lovász Local Lemma only requires $ep\Delta\leq 1$, where $p$ is the probability of bad events and $\Delta$ is the maximum degree of the dependency graph. On the other hand, translating the result of [4] to this setting, one sees that as soon as $p\Delta^{2}\geq C$ for some constant $C$, then even approximately sampling perfect objects in the variable framework becomes NP-hard.

The starting point of our work is a new condition (see Condition 5) under which we show that the output of the Moser-Tardos algorithm is in fact uniform (see Theorem 8). Intuitively, the condition requires any two dependent bad events to be disjoint. Indeed, instances satisfying this condition are called “extremal” in the study of the Lovász Local Lemma. For these extremal instances, we can in fact resample in a parallel fashion, since the occurring bad events form an independent set in the dependency graph. We call this algorithm “partial rejection sampling”,\footnote{Despite the apparent similarity in names, our algorithm is different from “partial resampling” in [20, 21]. We resample all variables in certain sets of events whereas “partial resampling” only resamples a subset of variables from some bad event.} in the sense that it is like rejection sampling, but only resamples an appropriate subset of variables.

Our result puts some classical sampling algorithms under a unified framework, including the “cycle-popping” algorithm by Wilson [39] for sampling rooted spanning trees, and the “sink-popping” algorithm by Cohn, Pemantle, and Propp [7] for sampling sink-free orientations of an undirected graph. Indeed, Cohn et al. [7] coined the term “partial rejection sampling” and asked for a general theory, and we believe that extremal instances under the variable framework are a satisfactory answer. With our techniques, we are able to give a new algorithm to sample solutions for a special class of $k$-CNF formulas, under conditions matching the Lovász Local Lemma (see Corollary 19), which is an NP-hard task for general $k$-CNF formulas. Furthermore, we provide explicit formulas for the expected running time of these algorithms (see Theorem 13), which matches the running time upper bound given by Kolipaka and Szegedy [26] under Shearer’s condition [35].

The next natural question is thus whether we can go beyond extremal instances. Indeed, our main technical contribution is a general uniform sampler (Algorithm 6) that applies to any problem under the variable framework. The main idea is that, instead of resampling only the occurring bad events, we resample a larger set of events so that the choices made do not block any perfect assignment in the end, thereby ensuring uniformity of the final output.

As a simple example, we describe how our algorithm samples independent sets. The algorithm starts by choosing each vertex with probability $1/2$ independently. At each subsequent round, in the induced subgraph on the currently chosen vertices, the algorithm finds all the connected components of size $\geq 2$. Then it resamples all these vertices and their boundaries (which are unoccupied). And it repeats this process until there is no edge with both endpoints occupied. What seems surprising is that this simple process does yield a uniformly random independent set when it stops. Indeed, as we will show in Theorem 34, this simple process is an exact sampler for weighted independent sets (also known as the hard-core model in statistical physics). In addition, it runs in expected linear time under a condition that matches, up to a constant factor, the uniqueness threshold of the model (beyond which the problem of approximate sampling becomes NP-hard).
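For concreteness, here is a minimal Python sketch of the independent-set sampler described above, in the uniform case (each vertex occupied with probability $1/2$). The graph representation (`adj` as a dictionary mapping each vertex to its set of neighbours) and the function name are our own illustrative choices, not from the paper.

```python
import random

def sample_independent_set(adj, p=0.5, rng=random):
    # adj: dict mapping each vertex to the set of its neighbours.
    # Returns the set of occupied vertices; with p = 1/2 the output is a
    # uniformly random independent set once the loop stops.
    occupied = {v for v in adj if rng.random() < p}
    while True:
        # Vertices lying in an occupied connected component of size >= 2,
        # i.e. occupied vertices with at least one occupied neighbour.
        bad = {v for v in occupied if any(u in occupied for u in adj[v])}
        if not bad:
            return occupied
        # Resample the bad vertices together with their (unoccupied) boundary.
        to_resample = set(bad)
        for v in bad:
            to_resample |= adj[v]
        for v in to_resample:
            if rng.random() < p:
                occupied.add(v)
            else:
                occupied.discard(v)
```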

In the more general setting, we will choose the set of events to be resampled, denoted by ${\sf Res}$, iteratively. We start from the set of occurring bad events. Then we include all neighbouring events of the current set ${\sf Res}$, until there is no event $A$ on the boundary of ${\sf Res}$ such that the current assignment, projected on the common variables of $A$ and ${\sf Res}$, can be extended so that $A$ may happen. In the worst case, we will resample all events (there is no event in the boundary at all). In that scenario the algorithm is the same as a naive rejection sampling, but typically we resample fewer variables in every step. We show that this is a uniform sampler on assignments that avoid all bad events once it stops (see Theorem 24).

One interesting feature of our algorithm is that, unlike Markov chain based algorithms, ours does not require the solution space (or any augmented space) to be connected. Moreover, our sampler is exact; that is, when the algorithm halts, the final distribution is precisely the desired distribution. Prior to our work, most exact sampling algorithms were obtained by coupling from the past [32]. We also note that previous work on the Moser-Tardos output distribution, such as [19], is not strong enough to guarantee a uniform sample (or a sample $\varepsilon$-close to uniform in total variation distance).

We give sufficient conditions that guarantee a linear expected running time of our algorithm in the general setting (see Theorem 25). The first condition is that $p\Delta^{2}$ is bounded above by a constant. This is optimal up to constants in observance of the NP-hardness result in [4]. Unfortunately, the condition on $p\Delta^{2}$ alone does not make the algorithm efficient. In addition, we also need to bound the expansion from bad events to resampling events, which leads to an extra condition on intersections of bad events. Removing this extra condition seems to require substantial changes to our current algorithm.

To illustrate the result, we apply our algorithm to sample satisfying assignments of $k$-CNF formulas in which the degree of each variable (the number of clauses containing it) is at most $d$. We say that a $k$-CNF formula has intersection $s$ if any two dependent clauses share at least $s$ variables. The extra condition from our analysis naturally leads to a lower bound on $s$. Let $n$ be the number of variables. We (informally) summarize our results on $k$-CNF formulas as follows (see Corollary 30 and Theorem 32):

  • If $d\leq\frac{1}{6e}\cdot 2^{k/2}$, $dk\geq 2^{3e}$ and $s\geq\min\{\log_{2}dk,k/2\}$, then the general partial rejection resampling algorithm outputs a uniformly random solution to a $k$-CNF formula with degree $d$ and intersection $s$ in expected running time $O(n)$.

  • If $d\geq 4\cdot 2^{k/2}$ (for an even $k$), then even if $s=k/2$, it is NP-hard to approximately sample a solution to a $k$-CNF formula with degree $d$ and intersection $s$.

As shown in the hardness result, the intersection bound does not render the problem trivial.

Previously, sampling/counting satisfying assignments of $k$-CNF formulas required the formula to be monotone and $d\leq k$ to be large enough [4] (see also [5, 28]). Although our result requires an additional lower bound on intersections, not only does it improve the dependency between $k$ and $d$ exponentially, but it also achieves a matching constant $1/2$ in the exponent. Furthermore, the samples produced are exactly uniform. Thus, if the extra condition on intersections can be removed, we will have a sharp phase transition at around $d=O(2^{k/2})$ in the computational complexity of sampling solutions to $k$-CNF formulas with bounded variable occurrences. A similar sharp transition has been recently established for, e.g., sampling configurations in the hard-core model [38, 36, 12].

Simultaneously with our work, Hermon, Sly, and Zhang [24] showed that Markov chains for monotone $k$-CNF formulas are rapidly mixing if $d\leq c2^{k/2}$ for a constant $c$. In another parallel work, Moitra [29] gave a novel algorithm to sample solutions for general $k$-CNF formulas when $d\lesssim 2^{k/60}$. We note that neither result is directly comparable to ours and the techniques are very different. Both of these samplers are approximate while ours is exact. Moreover, ours does not require monotonicity (unlike [24]), and allows larger $d$ than [29], but at the cost of an extra intersection lower bound. Unfortunately, our algorithm can be exponentially slow when the intersection $s$ is not large enough. In sharp contrast, as shown by Hermon et al. [24], Markov chains mix rapidly for $d\leq c2^{k}/k^{2}$ when $s=1$.

While the study of algorithmic Lovász Local Lemma has progressed beyond the variable framework [22, 1, 23], we restrict our focus to the variable framework in this work. It is also an interesting future direction to investigate and extend our techniques of uniform sampling beyond the variable framework. For example, one may want to sample a permutation that avoids certain patterns. The classical sampling problem of perfect matchings in a bipartite graph can be formulated in this way.

Since the conference version of this paper appeared [18], a number of applications of the partial rejection sampling method have been found [17, 16, 15]. One highlight is the first fully polynomial-time randomised approximation scheme (FPRAS) for all-terminal network reliability [17]. For extremal instances, tight running time bounds have also been obtained [14]. Moreover, partial rejection sampling has been adapted to dynamic and distributed settings as well [10].

2. Partial Rejection Sampling

We first describe the “variable” framework. Let $\{X_{1},\dots,X_{n}\}$ be a set of random variables. Each $X_{i}$ can have its own distribution and range $D_{i}$. Let $\{A_{1},\dots,A_{m}\}$ be a set of “bad” events that depend on the $X_{i}$’s. For example, for a constraint satisfaction problem (CSP) with variables $X_{i}$ ($1\leq i\leq n$) and constraints $C_{j}$ ($1\leq j\leq m$), each $A_{j}$ is the set of unsatisfying assignments of $C_{j}$ for $1\leq j\leq m$. Let ${\sf var}(A_{i})$ be the (index) set of variables that $A_{i}$ depends on.

The dependency graph $G=(V,E)$ has $m$ vertices, identified with the integers $\{1,2,\ldots,m\}$, corresponding to the events $A_{i}$, and $(i,j)$ is an edge if $A_{i}$ and $A_{j}$ depend on one or more common variables and $i\neq j$. In other words, for any distinct $i,j$, $(i,j)\in E$ if ${\sf var}(A_{i})\cap{\sf var}(A_{j})\neq\emptyset$. We write $A_{i}\sim A_{j}$ if the vertices $i$ and $j$ are adjacent in $G$. The asymmetric Lovász Local Lemma [9] states the following.

Theorem 1.

If there exists a vector $\boldsymbol{x}\in[0,1)^{m}$ such that $\forall i\in[m]$,

(1) $\displaystyle\mathop{\mathrm{Pr}}\nolimits(A_{i})\leq x_{i}\prod_{(i,j)\in E}(1-x_{j}),$

then $\displaystyle\mathop{\mathrm{Pr}}\nolimits\left(\bigwedge_{i=1}^{m}\overline{A_{i}}\right)\geq\prod_{i=1}^{m}(1-x_{i})>0$.

Theorem 1 has a condition that is easy to verify, but not necessarily optimal. Shearer [35] gave the optimal condition for the local lemma to hold for a fixed dependency graph $G$. To state Shearer’s condition, we will need the following definitions. Let $p_{i}:=\mathop{\mathrm{Pr}}\nolimits(A_{i})$ for all $1\leq i\leq m$. Let $\mathcal{I}$ be the collection of independent sets of $G$. Define the following quantity:

$\displaystyle q_{I}(\textbf{p}):=\sum_{J\in\mathcal{I},\,I\subseteq J}(-1)^{|J|-|I|}\prod_{i\in J}p_{i},$

where $\textbf{p}=(p_{1},\ldots,p_{m})$. When there is no confusion we also simply write $q_{I}$ instead of the more cumbersome $q_{I}(\textbf{p})$. Moreover, to simplify the notation, let $q_{i}:=q_{\{i\}}$ for $1\leq i\leq m$. Note that if $I\notin\mathcal{I}$, then $q_{I}=0$.

Theorem 2 (Shearer [35]).

If $q_{I}\geq 0$ for all $I\subseteq V$, then $\mathop{\mathrm{Pr}}\nolimits\left(\bigwedge_{i=1}^{m}\overline{A_{i}}\,\right)\geq q_{\emptyset}$.

In particular, if the condition holds with $q_{\emptyset}>0$, then $\mathop{\mathrm{Pr}}\nolimits\left(\bigwedge_{i=1}^{m}\overline{A_{i}}\,\right)>0$.
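As a concrete aid to Shearer’s condition, the following Python sketch computes $q_{I}(\textbf{p})$ directly from the definition above by brute-force enumeration of independent sets. It is only meant for small dependency graphs, and the representation (a dictionary of neighbour sets and a dictionary of probabilities) is our own illustrative choice.

```python
from itertools import combinations

def independent_sets(neighbours):
    # All independent sets of the dependency graph, given as
    # {vertex: set of neighbours}.  Brute force: fine only for small m.
    verts = list(neighbours)
    for r in range(len(verts) + 1):
        for J in combinations(verts, r):
            if all(v not in neighbours[u] for u, v in combinations(J, 2)):
                yield frozenset(J)

def q(I, p, neighbours):
    # Shearer's quantity q_I(p) = sum over independent J containing I of
    # (-1)^{|J|-|I|} * prod_{i in J} p_i.  Returns 0 if I is not independent.
    I = frozenset(I)
    total = 0.0
    for J in independent_sets(neighbours):
        if I <= J:
            term = 1.0
            for i in J:
                term *= p[i]
            total += (-1) ** (len(J) - len(I)) * term
    return total
```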

Neither Theorem 1 nor Theorem 2 yields an efficient algorithm to find an assignment avoiding all bad events, since they only guarantee an exponentially small probability of avoiding all bad events under the product distribution. There has been a long line of research devoted to an algorithmic version of the LLL, culminating in the work of Moser and Tardos [31] under essentially the same condition as in Theorem 1. The Resample algorithm of Moser and Tardos is very simple, and is described in Algorithm 1.

  (1) Draw independent samples of all variables $X_{1},\dots,X_{n}$ from their respective distributions.

  (2) While at least one $A_{i}$ holds, pick one such $A_{i}$ arbitrarily and resample all variables in ${\sf var}(A_{i})$.

  (3) Output the current assignment.

Algorithm 1 The Resample algorithm
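The following Python sketch of Algorithm 1 fixes the interface used in the sketches throughout this paper: each variable comes with a sampler, and each bad event is a pair of the variable set it depends on and a predicate on the current assignment. All names and the representation are our own illustrative choices.

```python
def resample(variables, bad_events):
    # variables: dict {index: zero-argument function returning a fresh sample}
    # bad_events: list of (var_indices, holds) where holds(assignment) tests
    #             whether the bad event occurs under the current assignment.
    sigma = {i: draw() for i, draw in variables.items()}
    while True:
        occurring = [(vs, holds) for vs, holds in bad_events if holds(sigma)]
        if not occurring:
            return sigma
        vs, _ = occurring[0]          # pick one occurring bad event arbitrarily
        for i in vs:                  # resample all of its variables
            sigma[i] = variables[i]()
```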

In [31], Moser and Tardos showed that Algorithm 1 finds a good assignment very efficiently.

Theorem 3 (Moser and Tardos [31]).

Under the condition of Theorem 1, the expected number of resampling steps in Algorithm 1 is at most $\sum_{i=1}^{m}\frac{x_{i}}{1-x_{i}}$.

Unfortunately, the final output of Algorithm 1 is not distributed as we would like, namely as a product distribution conditioned on avoiding all bad events.

In addition, Kolipaka and Szegedy [26] showed that, up to Shearer’s condition, Algorithm 1 is efficient. Recall that $q_{i}:=q_{\{i\}}$ for $1\leq i\leq m$.

Theorem 4 (Kolipaka and Szegedy [26]).

If $q_{I}\geq 0$ for all $I\in\mathcal{I}$ and $q_{\emptyset}>0$, then the expected number of resampling steps in Algorithm 1 is at most $\sum_{i=1}^{m}\frac{q_{i}}{q_{\emptyset}}$.

On the other hand, Wilson’s cycle-popping algorithm [39] is very similar to the Resample algorithm but it outputs a uniformly random rooted spanning tree. Another similar algorithm is the sink-popping algorithm by Cohn, Pemantle, and Propp [7] to generate a sink-free orientation uniformly at random. Upon close examination of these two algorithms, we found a common feature of both problems.

Condition 5.

If $(i,j)\in E$ (or equivalently $A_{i}\sim A_{j}$), then $\mathop{\mathrm{Pr}}\nolimits(A_{i}\wedge A_{j})=0$; namely, the two events $A_{i}$ and $A_{j}$ are disjoint if they are dependent.

In other words, any two events $A_{i}$ and $A_{j}$ are either independent or disjoint. These instances have been noticed in the study of the Lovász Local Lemma. They are the ones that minimize $\mathop{\mathrm{Pr}}\nolimits\left(\bigwedge_{i=1}^{m}\overline{A_{i}}\,\right)$ given Shearer’s condition (namely $\mathop{\mathrm{Pr}}\nolimits\left(\bigwedge_{i=1}^{m}\overline{A_{i}}\,\right)=q_{\emptyset}$). Instances satisfying Condition 5 have been named extremal [26].

We will show that, given Condition 5, the final output of the Resample algorithm is a sample from a conditional product distribution (Theorem 8). Moreover, we will show that under Condition 5, the running time upper bound $\sum_{i=1}^{m}\frac{q_{i}}{q_{\emptyset}}$ given by Kolipaka and Szegedy (Theorem 4) is indeed the exact expected running time. See Theorem 13.

In fact, when Condition 5 holds, at each step of Algorithm 1 the occurring events form an independent set of the dependency graph $G$. Think of the execution of Algorithm 1 as going in rounds. In each round we find the set $I$ of bad events that occur. Due to Condition 5, ${\sf var}(A_{i})\cap{\sf var}(A_{j})=\emptyset$ for any $i,j\in I$, i.e., $I$ is an independent set in the dependency graph. Therefore, we can resample all variables involved in the occurring bad events without interfering with each other. This motivates Algorithm 2.

We call Algorithm 2 the Partial Rejection Sampling algorithm. This name was coined by Cohn, Pemantle, and Propp [7]. Indeed, they ask as an open problem how to generalize their sink-popping algorithm and Wilson’s cycle popping algorithm. We answer this question under the variable framework. Partial Rejection Sampling differs from the normal rejection sampling algorithm by only resampling “bad” events. Moreover, Algorithm 2 is uniform only on extremal instances, and is a special case of Algorithm 6 given in Section 5, which is a uniform sampler for all instances.

  (1) Draw independent samples of all variables $X_{1},\dots,X_{n}$ from their respective distributions.

  (2) While at least one bad event holds, find the independent set $I$ of occurring $A_{i}$’s. Independently resample all variables in $\bigcup_{i\in I}{\sf var}(A_{i})$.

  (3) Output the current assignment.

Algorithm 2 Partial Rejection Sampling for extremal instances
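In the same illustrative interface as the sketch of Algorithm 1 above, a minimal Python version of Algorithm 2 resamples, in each round, every variable of every occurring bad event at once.

```python
def partial_rejection_sampling(variables, bad_events):
    # Same interface as resample() above.  On extremal instances the
    # occurring bad events are variable-disjoint, so resampling all of
    # their variables in one round is conflict-free.
    sigma = {i: draw() for i, draw in variables.items()}
    while True:
        occurring = [vs for vs, holds in bad_events if holds(sigma)]
        if not occurring:
            return sigma
        for i in set().union(*occurring):
            sigma[i] = variables[i]()
```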

In fact, Algorithm 2 is the same as the parallel version of Algorithm 1 by Moser and Tardos [31] for extremal instances. Suppose each event is assigned to a processor, which determines whether the event holds by looking at the variables associated with the event. If the event holds then all associated variables are resampled. No conflict will be created due to Condition 5.

In the following analysis, we will use the resampling table idea, which has appeared in both the analysis of Moser and Tardos [31] and that of Wilson [39]. Note that we only use this idea to analyze the algorithm; the table is never actually created during the execution. Associate each variable $X_{i}$ with an infinite stack of random values $\{X_{i,1},X_{i,2},\dots\}$. This forms the resampling table, where each row represents a variable and there are infinitely many columns, as shown in Table 1. In the execution of the algorithm, when a variable needs to be resampled, the algorithm draws the top value from the stack, or equivalently moves from the current entry in the resampling table to the one on its right.

Table 1. A resampling table with $4$ variables
$X_{1}$: $X_{1,1}$, $X_{1,2}$, $X_{1,3}$, $\dots$
$X_{2}$: $X_{2,1}$, $X_{2,2}$, $X_{2,3}$, $\dots$
$X_{3}$: $X_{3,1}$, $X_{3,2}$, $X_{3,3}$, $\dots$
$X_{4}$: $X_{4,1}$, $X_{4,2}$, $X_{4,3}$, $\dots$

Let $t$ be a positive integer denoting the round of Algorithm 2. Let $j_{i,t}$ be the index of the variable $X_{i}$ in the resampling table at round $t$. In other words, at the $t$-th round, $X_{i}$ takes value $X_{i,j_{i,t}}$. Thus, the set $\sigma_{t}=\{X_{i,j_{i,t}}\mid 1\leq i\leq n\}$ is the current assignment of variables at round $t$. This $\sigma_{t}$ determines which events happen. Call the set of occurring events, viewed as a subset of the vertex set of the dependency graph, $I_{t}$. (For convenience, we shall sometimes identify the event $A_{i}$ with its index $i$; thus, we shall refer to the “events in $S$” rather than the “events indexed by $S$”.) As explained above, $I_{t}$ is an independent set of $G$ due to Condition 5. Then variables involved in any of the events in $I_{t}$ are resampled. In other words,

$\displaystyle j_{i,t+1}=\begin{cases}j_{i,t}+1&\text{if }\exists\ell\in I_{t}\text{ such that }i\in{\sf var}(A_{\ell});\\ j_{i,t}&\text{otherwise.}\end{cases}$

Therefore, any event that happens in round $t+1$ must share at least one variable with some event in $I_{t}$ (possibly itself). In other words, $I_{t+1}\subseteq\Gamma^{+}(I_{t})$, where $\Gamma^{+}(I)$ denotes the set of all neighbours of $I$ together with $I$ itself. This inspires the notion of independent set sequences (first introduced in [26]).

Definition 6.

A list $\mathcal{S}=S_{1},S_{2},\dots,S_{\ell}$ of independent sets in $G$ is called an independent set sequence if $S_{i}\neq\emptyset$ for all $1\leq i\leq\ell-1$ and, for every $1\leq i\leq\ell-1$, $S_{i+1}\subseteq\Gamma^{+}(S_{i})$.

We adopt the convention that the empty list is an independent set sequence with $\ell=0$. Note that we allow $S_{\ell}$ to be $\emptyset$.

Let $M$ be a resampling table. Suppose running Algorithm 2 on $M$ does not terminate up to some integer $\ell\geq 1$ rounds. Define the log of running Algorithm 2 on $M$ up to round $\ell$ as the sequence of independent sets $I_{1},I_{2},\dots,I_{\ell}$ created by this run. Thus, for any $M$ and $\ell\geq 1$, the log $I_{1},I_{2},\dots,I_{\ell}$ must be an independent set sequence. Moreover, if Algorithm 2 terminates at round $T$, let $\sigma_{t}:=\sigma_{T}$ for $t>T$. Denote by $\mu(\cdot)$ the product distribution of all random variables.

Lemma 7.

Suppose Condition 5 holds. Given any log $\mathcal{S}=S_{1},S_{2},\dots,S_{\ell}$ of length $\ell\geq 1$ and conditioned on seeing the log $\mathcal{S}$, $\sigma_{\ell+1}$ is a random sample from the product distribution conditioned on the event $\bigwedge_{i\in[m]\setminus\Gamma^{+}(S_{\ell})}\overline{A_{i}}$, namely from $\mu\big{(}\cdot\mid\bigwedge_{i\in[m]\setminus\Gamma^{+}(S_{\ell})}\overline{A_{i}}\big{)}$.

We remark that Lemma 7 is not true for non-extremal instances (that is, if Condition 5 fails). In particular, Lemma 7 says that given any log, every valid assignment is not only reachable, but reachable with the correct probability. This is no longer the case for non-extremal instances: some valid assignments from the desired conditional product distribution could be “blocked” under the log $\mathcal{S}$. In Section 5 we show how to instead achieve uniformity by resampling an “unblocking” set of bad events.

Proof.

The set of occurring events at round $\ell$ is $S_{\ell}$. Hence $\sigma_{\ell+1}$ does not make any of the $A_{i}$’s happen where $i\notin\Gamma^{+}(S_{\ell})$. Call an assignment $\sigma$ valid if none of the $A_{i}$’s happen where $i\notin\Gamma^{+}(S_{\ell})$. To show that $\sigma_{\ell+1}$ has the desired conditional product distribution, we will show that the probabilities of getting any two valid assignments $\sigma$ and $\sigma^{\prime}$ are proportional to their probabilities of occurrence under $\mu(\cdot)$.

Let $M$ be a resampling table so that the log of the algorithm is $\mathcal{S}$ up to round $\ell\geq 1$, and $\sigma_{\ell+1}=\sigma$. Indeed, since we only care about events up to round $\ell+1$, we may truncate the table so that $M=\{X_{i,j}\;|\;1\leq i\leq n,\;1\leq j\leq j_{i,\ell+1}\}$. Let $M^{\prime}=\{X_{i,j}^{\prime}\;|\;1\leq i\leq n,\;1\leq j\leq j_{i,\ell+1}\}$ be another table where $X_{i,j}^{\prime}=X_{i,j}$ if $j<j_{i,\ell+1}$ for any $i\in[n]$, but $\sigma_{\ell+1}=\sigma^{\prime}$. In other words, we only change the values in the final round (the $X_{i,j_{i,\ell+1}}$), and only to another valid assignment.

We claim that the algorithm running on $M^{\prime}$ generates the same log $\mathcal{S}$. The lemma then follows by the following argument. Assuming the claim holds, then conditioned on the log $\mathcal{S}$, every possible table $M$ such that $\sigma_{\ell+1}=\sigma$ corresponds one-to-one to another table $M^{\prime}$ such that $\sigma_{\ell+1}=\sigma^{\prime}$. It implies that for every pair of valid assignments $\sigma,\sigma^{\prime}$, there is a bijection between the resampling tables resulting in them. The ratio between the probabilities of two corresponding tables is exactly the ratio between the probabilities of $\sigma$ and $\sigma^{\prime}$ under $\mu(\cdot)$. Since the probability of getting a particular $\sigma$ in round $\ell+1$ is the sum over the probabilities of all resampling tables (conditioned on the log $\mathcal{S}$) leading to $\sigma$, the probability of getting $\sigma$ is also proportional to its weight under $\mu(\cdot)$.

Suppose the claim fails and the logs obtained by running the algorithm on $M$ and $M^{\prime}$ differ. Let $t_{0}\leq\ell$ be the first round where the resampling changes. Without loss of generality, let $A$ be an event that occurs in $S_{t_{0}}$ on $M^{\prime}$ but not on $M$. Moreover, there must be a non-empty set of variables $Y\subseteq{\sf var}(A)$ that have values $(X_{i,j_{i,\ell+1}})$, as otherwise the two runs would be identical. Since the resampling history does not change before $t_{0}$, in the $M^{\prime}$ run, the assignment of variables in $Y$ must be $(X^{\prime}_{i,j_{i,\ell+1}})$ at time $t_{0}$.

We claim that $Y={\sf var}(A)$. If the claim does not hold, then $Z:={\sf var}(A)\setminus Y\neq\emptyset$. Any variable in $Z$ has not reached the final round, and must be resampled in the $M$ run. Let $X_{j}\in Z$ be the first such variable being resampled at or after round $t_{0}$ in the $M$ run. (The choice of $X_{j}$ may not be unique, and we just choose an arbitrary one.) Recall that $Y\neq\emptyset$, so $A$ can no longer happen, and thus there must be some $A^{\prime}\neq A$ causing this resampling of $X_{j}$. Then ${\sf var}(A)\cap{\sf var}(A^{\prime})\neq\emptyset$. Consider any variable $X_{k}$ with $k\in{\sf var}(A)\cap{\sf var}(A^{\prime})$. It is resampled at or after time $t_{0}$ in the $M$ run due to $A^{\prime}$. Hence $X_{k}\in Z$ for any such $k$. Moreover, in the $M$ run, until $A^{\prime}$ happens, $X_{k}$ has not been resampled since time $t_{0}$, because $A^{\prime}$ is the first resampling event at or after time $t_{0}$ that involves variables in $Z$. On the other hand, in the $M^{\prime}$ run, $X_{k}$’s value causes $A$ to happen at time $t_{0}$. Hence, there exists an assignment on the variables in ${\sf var}(A)\cap{\sf var}(A^{\prime})$ such that both $A$ and $A^{\prime}$ happen. Clearly this assignment can be extended to a full assignment so that both $A$ and $A^{\prime}$ happen. However, $A\sim A^{\prime}$ as they share the variable $X_{j}$. Due to Condition 5, $A\cap A^{\prime}=\emptyset$. Contradiction! Therefore the claim holds.

We argue that the remaining case, $Y={\sf var}(A)$, is also not possible. Since $A$ occurs in the $M^{\prime}$ run, we know, by the definition of $\sigma^{\prime}$, that $A\in\Gamma^{+}(S_{\ell})$. Thus, some event whose variables intersect with those of $A$ must occur in the $M$ run. But when the algorithm attempts to update the variables shared by these two events in the $M$ run, it will access values beyond the final round of the resampling table, a contradiction. ∎

Theorem 8.

When Condition 5 holds and Algorithm 2 halts, its output is distributed as the product distribution $\mu(\cdot)$ conditioned on avoiding all bad events.

Proof.

Let an independent set sequence $\mathcal{S}$ of length $\ell$ be the log of any successful run. Then $S_{\ell}=\emptyset$. By Lemma 7, conditioned on the log $\mathcal{S}$, the output assignment $\sigma$ is distributed as $\mu\big{(}\cdot\mid\bigwedge_{i\in[m]\setminus\Gamma^{+}(S_{\ell})}\overline{A_{i}}\big{)}=\mu\big{(}\cdot\mid\bigwedge_{i\in[m]}\overline{A_{i}}\big{)}$. This is valid for any possible log, and the theorem follows. ∎

In other words, let $\Sigma$ be the set of assignments that avoid all bad events, and let $U$ be the output of Algorithm 2. In the case that all variables are sampled from the uniform distribution, we have $\mathop{\mathrm{Pr}}\nolimits(U=\sigma)=\frac{1}{|\Sigma|}$ for all $\sigma\in\Sigma$.

3. Expected running time of Algorithm 2

In this section we give an explicit formula for the running time of Algorithm 2. We assume that Condition 5 holds throughout the section.

We first give a combinatorial explanation of $q_{I}$ for any independent set $I$ of the dependency graph $G$. To simplify the notation, we denote the event $\bigwedge_{i\in S}A_{i}$, i.e., the conjunction of all events in $S$, by $A(S)$.

For any set $I$ in the dependency graph, we denote by $p_{I}$ the probability $\mathop{\mathrm{Pr}}\nolimits_{\mu}(A(I))$ that all events in $I$ happen (and possibly some other events too). If $I$ is an independent set in the dependency graph, any two events in $I$ are independent and

(2) $\displaystyle p_{I}=\prod_{i\in I}p_{i}.$

Moreover, for any set $J$ of events that is not an independent set, we have $p_{J}=0$ due to Condition 5.

On the other hand, the quantity $q_{I}$ is in fact the probability that exactly the events in $I$ happen and no others do. This can be verified using inclusion-exclusion, together with Condition 5:

(3) $\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}\left(\bigwedge_{i\in I}A_{i}\wedge\bigwedge_{i\notin I}\overline{A_{i}}\right)=\sum_{J\supseteq I}(-1)^{|J\setminus I|}p_{J}=\sum_{J\in\mathcal{I},\ J\supseteq I}(-1)^{|J\setminus I|}p_{J}=q_{I},$

where $\mathcal{I}$ denotes the collection of all independent sets of $G$. Since the events $\bigwedge_{i\in I}A_{i}\wedge\bigwedge_{i\notin I}\overline{A_{i}}$ are mutually exclusive for different $I$’s,

$\displaystyle\sum_{I\in\mathcal{I}}q_{I}=1.$

Moreover, since the event $A(I)$ is the union over $J\supseteq I$ of the events $\bigwedge_{i\in J}A_{i}\wedge\bigwedge_{i\notin J}\overline{A_{i}}$, we have

(4) $\displaystyle p_{I}=\sum_{J\in\mathcal{I},\ J\supseteq I}q_{J}.$
Lemma 9.

Assume Condition 5 holds. Let $\mathcal{S}=S_{1},\ldots,S_{\ell}$ be an independent set sequence of length $\ell>0$. Then in Algorithm 2,

$\displaystyle\mathop{\mathrm{Pr}}\nolimits\big{(}\text{the log is $\mathcal{S}$ up to round $\ell$}\big{)}=q_{S_{\ell}}\prod_{t=1}^{\ell-1}p_{S_{t}}.$
Proof.

Clearly, if $q_{S_{\ell}}=0$, then the said sequence will never happen. We assume that $q_{S_{\ell}}>0$ in the following.

Recall that $\mu$ is the product distribution of sampling all variables independently. We need to distinguish the probability space with respect to $\mu$ from that with respect to the execution of the algorithm. We write $\mathop{\mathrm{Pr}}\nolimits_{\mathrm{PRS}}(\cdot)$ to refer to the algorithm, and write $\mathop{\mathrm{Pr}}\nolimits_{\mu}(\cdot)$ to refer to the (static) space with respect to $\mu$. As noted before, to simplify the notation we will use $A(S)$ to denote the event $\bigwedge_{i\in S}A_{i}$, where $S\subseteq[m]$. In addition, $B(S)$ will be used to denote $\bigwedge_{i\in S}\overline{A_{i}}$. For $I\in\mathcal{I}$, define

$\partial I:=\Gamma^{+}(I)\setminus I,\quad I^{e}:=[m]\setminus\Gamma^{+}(I),\quad\text{and}\quad I^{c}:=[m]\setminus I=\partial I\cup I^{e}.$

So $\partial I$ is the “boundary” of $I$, comprising events that are not in $I$ but which depend on $I$, and $I^{e}$ is the “exterior” of $I$, comprising events that are independent of all events in $I$. The complement $I^{c}$ is simply the set of all events not in $I$. Note that $B(I^{c})=B(\partial I)\wedge B(I^{e})$. As examples of the notation, $\mathop{\mathrm{Pr}}\nolimits_{\mu}(A(I))=\prod_{i\in I}p_{i}=p_{I}$ is the probability that all events in $I$ occur under $\mu$, and $\mathop{\mathrm{Pr}}\nolimits_{\mu}(A(I)\wedge B(I^{c}))=q_{I}$ is the probability that exactly the events in $I$ occur.

By the definition of $I^{e}$, we have that

(5) $\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}\left(B(I^{e})\mid A(I)\right)=\mathop{\mathrm{Pr}}\nolimits_{\mu}(B(I^{e})),$

and, by Condition 5, that

(6) $\displaystyle A(I)\wedge B(\partial I)=A(I).$

Hence for any $I\in\mathcal{I}$,

(7) $\displaystyle q_{I}=\mathop{\mathrm{Pr}}\nolimits_{\mu}\big{(}A(I)\wedge B(I^{c})\big{)}=\mathop{\mathrm{Pr}}\nolimits_{\mu}\big{(}A(I)\wedge B(\partial I)\wedge B(I^{e})\big{)}=\mathop{\mathrm{Pr}}\nolimits_{\mu}\big{(}A(I)\wedge B(I^{e})\big{)}=\mathop{\mathrm{Pr}}\nolimits_{\mu}\left(A(I)\right)\mathop{\mathrm{Pr}}\nolimits_{\mu}\left(B(I^{e})\right),$

where the third equality is by (6) and the last is by (5).

We prove the lemma by induction. It clearly holds when $\ell=1$. At round $\ell\geq 2$, since we only resample variables that are involved in $S_{\ell-1}$, we have that $S_{\ell}\subseteq\Gamma^{+}(S_{\ell-1})$. Moreover, variables are not resampled in any $A_{i}$ where $i\in S_{\ell-1}^{e}$, and hence

(8) $\displaystyle B(S_{\ell}^{c})\wedge B(S_{\ell-1}^{e})=B(S_{\ell}^{c}).$

Conditioned on $S_{\ell-1}$, by Lemma 7, the distribution of $\sigma_{\ell}$ at round $\ell$ is the product distribution conditioned on none of the events outside of $\Gamma^{+}(S_{\ell-1})$ occurring; namely, it is $\mathop{\mathrm{Pr}}\nolimits_{\mu}\big{(}\cdot\mid B(S_{\ell-1}^{e})\big{)}$. Thus the probability of getting $S_{\ell}$ in round $\ell$ is

(9) $\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mathrm{PRS}}\big{(}A(S_{\ell})\wedge B(S_{\ell}^{c})\text{ holds in round $\ell$}\bigm{|}\text{prior log is }S_{1},\ldots,S_{\ell-1}\big{)}=\mathop{\mathrm{Pr}}\nolimits_{\mu}\big{(}A(S_{\ell})\wedge B(S_{\ell}^{c})\mid B(S_{\ell-1}^{e})\big{)}=\frac{\mathop{\mathrm{Pr}}\nolimits_{\mu}\big{(}A(S_{\ell})\wedge B(S_{\ell}^{c})\wedge B(S_{\ell-1}^{e})\big{)}}{\mathop{\mathrm{Pr}}\nolimits_{\mu}(B(S_{\ell-1}^{e}))}=\frac{\mathop{\mathrm{Pr}}\nolimits_{\mu}\big{(}A(S_{\ell})\wedge B(S_{\ell}^{c})\big{)}}{\mathop{\mathrm{Pr}}\nolimits_{\mu}(B(S_{\ell-1}^{e}))}=\frac{q_{S_{\ell}}}{\mathop{\mathrm{Pr}}\nolimits_{\mu}(B(S_{\ell-1}^{e}))},$

where the third equality uses (8).

By (9) and the induction hypothesis, we have

$\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mathrm{PRS}}\left(\text{the log is $\mathcal{S}$ up to round $\ell$}\right)=\frac{q_{S_{\ell}}}{\mathop{\mathrm{Pr}}\nolimits_{\mu}(B(S_{\ell-1}^{e}))}\cdot q_{S_{\ell-1}}\prod_{t=1}^{\ell-2}p_{S_{t}}=q_{S_{\ell}}\prod_{t=1}^{\ell-1}p_{S_{t}},$

where to get the last equality we used (7) on $S_{\ell-1}$. ∎

Essentially the proof above is a delayed revelation argument. At each round $1\leq t\leq\ell-1$, we only reveal variables that are involved in $S_{t}$. Thus, at round $\ell$, we have revealed all variables that are involved in $\mathcal{S}$. With respect to these variables, the sequence $\mathcal{S}$ happens with probability $p_{\mathcal{S}}$. Condition 5 guarantees that what we have revealed so far does not interfere with the final output (cf. Lemma 7). Hence the final state happens with probability $q_{S_{\ell}}$.

We write $p_{\mathcal{S}}=\prod_{i=1}^{\ell}p_{S_{i}}$ for an independent set sequence $\mathcal{S}$ of length $\ell\geq 0$, with the convention that $p_{\mathcal{S}}=1$ if $\mathcal{S}$ is empty and $\ell=0$. Then Lemma 9 implies the following equality, which was first shown by Kolipaka and Szegedy [26] in the more general (not necessarily extremal) setting of the local lemma.

Corollary 10.

Assume Condition 5 holds. If $q_{\emptyset}>0$, then

$\displaystyle\sum_{\mathcal{S}\text{ s.t.\ }S_{1}=I}p_{\mathcal{S}}q_{\emptyset}=q_{I},$

where $\mathcal{S}$ ranges over independent set sequences and $I$ is an independent set of $G$.

Proof.

First we claim that if $q_{\emptyset}>0$, then Algorithm 2 halts with probability $1$. Conditioned on any log $\mathcal{S}=S_{1},\dots,S_{\ell-1}$, by Lemma 7, the distribution of $\sigma_{\ell}$ at round $\ell$ is $\mu\big{(}\cdot\mid B(S_{\ell-1}^{e})\big{)}$. The probability of getting a desired assignment is thus $\mu\big{(}B([m])\mid B(S_{\ell-1}^{e})\big{)}=\frac{\mu(B([m]))}{\mu(B(S_{\ell-1}^{e}))}\geq\mu(B([m]))=q_{\emptyset}$. Hence the probability that the algorithm does not halt at time $t$ is at most $(1-q_{\emptyset})^{t}$, which goes to $0$ as $t$ goes to infinity.

Then we apply Lemma 9 with $\emptyset$ as the final independent set. The left hand side is the total probability of all possible halting logs whose first independent set is exactly $I$. This is equivalent to getting exactly $I$ in the first step, which happens with probability $q_{I}$. ∎

As a sanity check, the probabilities of all possible logs should sum to $1$ when $q_{\emptyset}>0$ and the algorithm halts with probability $1$. Indeed, by Corollary 10,

$\displaystyle\sum_{\mathcal{S}}p_{\mathcal{S}}q_{\emptyset}=\sum_{I\in\mathcal{I}}\;\;\sum_{\mathcal{S}\text{ s.t.\ }S_{1}=I}p_{\mathcal{S}}q_{\emptyset}=\sum_{I\in\mathcal{I}}q_{I}=1,$

where $\mathcal{S}$ ranges over independent set sequences. In other words,

(10) $\displaystyle\sum_{\mathcal{S}}p_{\mathcal{S}}=\frac{1}{q_{\emptyset}},$

where $\mathcal{S}$ ranges over independent set sequences. This fact was also observed by Knuth [25, Page 86, Theorem F] and by Harvey and Vondrák [23, Corollary 5.28] in more general settings. Our proof here gives a combinatorial explanation of this equality.

Equation (10) holds whenever $q_{\emptyset}>0$. Recall that $q_{\emptyset}$ is shorthand for $q_{\emptyset}(\textbf{p})$, which is

(11) $\displaystyle q_{\emptyset}(\textbf{p})=\sum_{I\in\mathcal{I}}(-1)^{|I|}\prod_{i\in I}p_{i},$

where $\mathcal{I}$ is the collection of independent sets of the dependency graph $G$.

Lemma 11.

Assume Condition 5 holds. If $q_{\emptyset}(\textbf{p})>0$, then $q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})>0$ for any $i\in[m]$ and $0\leq z\leq 1$.

Proof.

By (11),

$\displaystyle q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})=\sum_{I\in\mathcal{I},\;i\notin I}(-1)^{|I|}\prod_{j\in I}p_{j}+z\sum_{I\in\mathcal{I},\;i\in I}(-1)^{|I|}\prod_{j\in I}p_{j}.$

Notice that $\sum_{I\in\mathcal{I},\;i\in I}(-1)^{|I|}\prod_{j\in I}p_{j}=-q_{i}(\textbf{p})<0$ (as $q_{i}(\textbf{p})$ is the probability of exactly the event $A_{i}$ occurring). Hence $q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})\geq q_{\emptyset}(\textbf{p})>0$. ∎

Let $T_{i}$ be the number of resamplings of event $A_{i}$ and let $T$ be the total number of resampling events. Then $T=\sum_{i=1}^{m}T_{i}$.

Lemma 12.

Assume Condition 5 holds. If $q_{\emptyset}(\textbf{p})>0$, then $\mathop{\mathbb{{}E}}\nolimits T_{i}=q_{\emptyset}(\textbf{p})\left(\frac{1}{q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})}\right)^{\prime}\bigg{|}_{z=1}$.

Proof.

By Lemma 11, Equation (10) holds with $p_{i}$ replaced by $p_{i}z$ where $z\in[0,1]$. For a given independent set sequence $\mathcal{S}$, let $T_{i}(\mathcal{S})$ be the total number of occurrences of $A_{i}$ in $\mathcal{S}$. Then we have that

(12) $\displaystyle\sum_{\mathcal{S}}p_{\mathcal{S}}z^{T_{i}(\mathcal{S})}=\frac{1}{q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})}.$

Taking the derivative of (12) with respect to $z$ gives

$\displaystyle\sum_{\mathcal{S}}T_{i}(\mathcal{S})p_{\mathcal{S}}z^{T_{i}(\mathcal{S})-1}=\left(\frac{1}{q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})}\right)^{\prime}.$

Evaluating the equation above at $z=1$:

(13) $\displaystyle\sum_{\mathcal{S}}T_{i}(\mathcal{S})p_{\mathcal{S}}=\left(\frac{1}{q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})}\right)^{\prime}\bigg{|}_{z=1}.$

On the other hand, we have that

$\displaystyle\mathop{\mathbb{{}E}}\nolimits T_{i}=\sum_{\mathcal{S}}\mathop{\mathrm{Pr}}\nolimits_{\mathrm{PRS}}\left(\text{the log is $\mathcal{S}$}\right)T_{i}(\mathcal{S})=\sum_{\mathcal{S}}p_{\mathcal{S}}q_{\emptyset}(\textbf{p})T_{i}(\mathcal{S})=q_{\emptyset}(\textbf{p})\left(\frac{1}{q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})}\right)^{\prime}\bigg{|}_{z=1},$

where the second equality is by Lemma 9 and the third is by (13).

This completes the proof. ∎

Theorem 13.

Assume Condition 5 holds. If q>0q_{\emptyset}>0, then 𝔼T=i=1mqiq\mathop{\mathbb{{}E}}\nolimits T=\sum_{i=1}^{m}\frac{q_{i}}{q_{\emptyset}}.

Proof.

Clearly $\mathop{\mathbb{{}E}}\nolimits T=\sum_{i=1}^{m}\mathop{\mathbb{{}E}}\nolimits T_{i}$. By Lemma 12, all we need to show is that

(14) $\displaystyle q_{\emptyset}(\textbf{p})\left(\frac{1}{q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})}\right)^{\prime}\bigg{|}_{z=1}=\frac{q_{i}(\textbf{p})}{q_{\emptyset}(\textbf{p})}.$

This is because

$\displaystyle q_{\emptyset}^{\prime}(p_{1},\dots,p_{i}z,\dots,p_{m})=\sum_{i\in J,\;J\in\mathcal{I}}(-1)^{|J|}\prod_{j\in J}p_{j}=-q_{i}(\textbf{p}),$

and thus

$\displaystyle\left(\frac{1}{q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})}\right)^{\prime}=\frac{-q_{\emptyset}^{\prime}(p_{1},\dots,p_{i}z,\dots,p_{m})}{q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})^{2}}=\frac{q_{i}(\textbf{p})}{q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})^{2}}.$

It is easy to see that (14) follows as we set $z=1$, and the theorem is shown. ∎
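As a small numerical sanity check of Theorem 13 (reusing the `q` helper sketched after Theorem 2), consider three bad events “vertex $v$ is a sink” on a triangle, each of probability $1/4$, whose dependency graph is the triangle itself; the predicted expected number of resamplings matches $Z_{\mathrm{sink},1}/Z_{\mathrm{sink},0}=6/2=3$ for sink-free orientations of the triangle (see Section 4.1).

```python
# Triangle instance: bad events 1, 2, 3 ("vertex v is a sink"), each of
# probability (1/2)^2 = 1/4, with a triangle dependency graph.
neighbours = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
p = {1: 0.25, 2: 0.25, 3: 0.25}

q_empty = q([], p, neighbours)                        # = 1 - 3/4 = 1/4
expected_T = sum(q([i], p, neighbours) for i in p) / q_empty
print(expected_T)                                     # 3.0
```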

The quantity $\sum_{i=1}^{m}\frac{q_{i}}{q_{\emptyset}}$ is not always easy to bound. Kolipaka and Szegedy [26] have shown that when the probability vector $\textbf{p}$ satisfies Shearer’s condition with a constant “slack”, the running time is in fact linear in the number of events, even in the more general setting. We restate their result in our notation.

Theorem 14 ([26, Theorem 5]).

Let $d\geq 2$ be a positive integer and $p_{c}=\frac{(d-1)^{(d-1)}}{d^{d}}$. Let $p=\max_{i\in[m]}\{p_{i}\}$. If $G$ has maximum degree $d$ and $p<p_{c}$, then $\mathop{\mathbb{{}E}}\nolimits T\leq\frac{p}{p_{c}-p}\cdot m$.

4. Applications of Algorithm 2

4.1. Sink-free Orientations

The goal of this application is to sample a sink-free orientation. Given a graph $G=(V,E)$, an orientation of edges is a mapping $\sigma$ so that $\sigma(e)=(u,v)$ or $(v,u)$ where $e=(u,v)\in E$. A sink under orientation $\sigma$ is a vertex $v$ so that for any adjacent edge $e=(u,v)$, $\sigma(e)=(u,v)$. A sink-free orientation is an orientation so that no vertex is a sink.

  • Name:

    Sampling Sink-free Orientations

  • Instance:

    A graph $G$.

  • Output:

    A uniform sink-free orientation.

The first algorithm for this problem is given by Bubley and Dyer [6], using Markov chains and path coupling techniques.

In this application, we associate with each edge a random variable, whose possible values are $(u,v)$ or $(v,u)$. For each vertex $v$, we associate it with a bad event $A_{v}$, which happens when $v$ is a sink. Thus the graph $G$ itself is also the dependency graph. Condition 5 is satisfied, because if a vertex is a sink, then none of its neighbours can be a sink. Thus we may apply Algorithm 2, which yields Algorithm 3. This is the “sink-popping” algorithm given by Cohn, Pemantle, and Propp [7].

  (1) Orient each edge independently and uniformly at random.

  (2) While there is at least one sink, re-orient all edges that are adjacent to a sink.

  (3) Output the current assignment.

Algorithm 3 Sample Sink-free Orientations
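A minimal Python sketch of Algorithm 3 follows, with the graph given as a dictionary of neighbour sets (our own illustrative representation); each edge is stored as a sorted pair and the orientation records its head.

```python
import random

def sample_sink_free_orientation(adj, rng=random):
    # adj: dict {vertex: set of neighbours} of an undirected graph.
    # Returns a dict mapping each edge (u, v) with u < v to its head vertex.
    edges = {tuple(sorted((u, v))) for u in adj for v in adj[u]}
    orientation = {e: rng.choice(e) for e in edges}
    while True:
        sinks = [v for v in adj if adj[v] and
                 all(orientation[tuple(sorted((u, v)))] == v for u in adj[v])]
        if not sinks:
            return orientation
        # Sinks are pairwise non-adjacent (Condition 5), so re-orienting
        # all edges incident to sinks is conflict-free.
        for v in sinks:
            for u in adj[v]:
                e = tuple(sorted((u, v)))
                orientation[e] = rng.choice(e)
```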

Let $Z_{\mathrm{sink},0}$ be the number of sink-free orientations, and let $Z_{\mathrm{sink},1}$ be the number of orientations having exactly one sink. Then Theorem 13 specializes into the following.

Theorem 15.

The expected number of resampled sinks in Algorithm 3 is $\frac{Z_{\mathrm{sink},1}}{Z_{\mathrm{sink},0}}$.

It is easy to see that a graph has a sink-free orientation if and only if the graph is not a tree. The next theorem gives an explicit bound on $\frac{Z_{\mathrm{sink},1}}{Z_{\mathrm{sink},0}}$ when sink-free orientations exist.

Theorem 16.

Let $G$ be a connected graph on $n$ vertices. If $G$ is not a tree, then $\frac{Z_{\mathrm{sink},1}}{Z_{\mathrm{sink},0}}\leq n(n-1)$, where $n=\left|V(G)\right|$.

Proof.

Consider an orientation of the edges of $G$ with a unique sink at vertex $v$. We give a systematic procedure for transforming this orientation to a sink-free orientation. Since $G$ is connected and not a tree, there exists an (undirected) path $\Pi$ in $G$ of the form $v=v_{0},v_{1},\ldots,v_{\ell-1},v_{\ell}=v_{k}$, where the vertices $v_{0},v_{1},\ldots,v_{\ell-1}$ are all distinct and $0\leq k\leq\ell-2$. In other words, $\Pi$ is a simple path of length $\ell-1$ followed by a single edge back to some previously visited vertex. We will choose a canonical path of this form (depending only on $G$ and not on the current orientation) for each start vertex $v$.

We now proceed as follows. Since $v$ is a sink, the first edge on $\Pi$ is directed $(v_{1},v_{0})$. Reverse the orientation of this edge so that it is now oriented $(v_{0},v_{1})$. This operation destroys the sink at $v=v_{0}$ but may create a new sink at $v_{1}$. If $v_{1}$ is not a sink then halt. Otherwise, reverse the orientation of the second edge of $\Pi$ from $(v_{2},v_{1})$ to $(v_{1},v_{2})$. Continue in this fashion: if we reach $v_{i}$ and it is not a sink then halt; otherwise reverse the orientation of the $(i+1)$th edge from $(v_{i+1},v_{i})$ to $(v_{i},v_{i+1})$. This procedure must terminate with a sink-free graph before we reach $v_{\ell}$. To see this, note that if we reach the vertex $v_{\ell-1}$ then the final edge of $\Pi$ must be oriented $(v_{\ell-1},v_{\ell})$, otherwise the procedure would have terminated already at vertex $v_{k}(=v_{\ell})$.

The effect of the above procedure is to reverse the orientation of the edges on some initial segment $v_{0},\ldots,v_{i}$ of $\Pi$. To put the procedure into reverse, we just need to know the identity of the vertex $v_{i}$. So our procedure associates at most $n$ orientations having a single sink at vertex $v$ with each sink-free orientation. There are $n(n-1)$ choices for the pair $(v,v_{i})$, and hence $n(n-1)$ single-sink orientations associated with each sink-free orientation. This establishes the result. ∎

Remark.

The bound in Theorem 16 is optimal up to a factor of $2$. Consider a cycle of length $n$. Then there are $2$ sink-free orientations, and $n(n-1)$ single-sink orientations.

Theorem 16 and Theorem 15 together yield an $n^{2}$ bound on the expected number of resamplings that occur during a run of Algorithm 3. A cycle of length $n$ is an interesting special case. Consider the number of clockwise oriented edges during a run of the algorithm. It is easy to check that this number evolves as an unbiased lazy simple random walk on $[0,n]$. Since the walk starts close to $n/2$ with high probability, we know that it will take $\Omega(n^{2})$ steps to reach one of the sink-free states, i.e., $0$ or $n$.

On the other hand, if $G$ is a regular graph of degree $\Delta\geq 3$, then we get a much better linear bound from Theorem 14. In the case $\Delta=3$, we have $p_{c}=4/27$, $p=1/8$ and $p/(p_{c}-p)=27/5$. So the expected number of resamplings is bounded by $27n/5$. The constant in the bound improves as $\Delta$ increases. Conversely, since the expected running time is exact, we can also apply Theorem 14 to give an upper bound on $\frac{Z_{\mathrm{sink},1}}{Z_{\mathrm{sink},0}}$ when $G$ is a regular graph.

4.2. Rooted Spanning Trees

Given a graph $G=(V,E)$ with a special vertex $r$, we want to sample a uniform spanning tree with $r$ as the root.

  • Name:

    Sampling Rooted Spanning Trees

  • Instance:

    A graph $G$ with a vertex $r$.

  • Output:

    A uniform spanning tree rooted at $r$.

Of course, any given spanning tree may be rooted at any vertex $r$, so there is no real difference between rooted and unrooted spanning trees. However, since this approach to sampling generates an oriented tree, it is easier to think of the trees as being rooted at a particular vertex $r$.

For every vertex $v$ other than $r$, we randomly assign it an arrow pointing to one of its neighbours; this is the random variable associated with $v$. We will think of this random variable as an arrow $v\to s(v)$ pointing from $v$ to its successor $s(v)$. The arrows point out an oriented subgraph of $G$ with $n-1$ edges $\{\{v,s(v)\}:v\in V\setminus\{r\}\}$, directed as specified by the arrows. The constraint for this subgraph to be a tree rooted at $r$ is that it contains no directed cycles. Note that there are $2^{|E|-|V|+\kappa(G)}$ (undirected) cycles in $G$, where $\kappa(G)$ is the number of connected components of $G$. Hence, we have possibly exponentially many constraints.

Two cycles are dependent if they share at least one vertex. We claim that Condition 5 is satisfied. Suppose a cycle $C$ is present, and $C^{\prime}\neq C$ is another cycle that shares at least one vertex with $C$. If $C^{\prime}$ is also present, then we may start from any vertex $v\in C\cap C^{\prime}$ and follow the arrows $v\to v^{\prime}$. Since both $C$ and $C^{\prime}$ are present, it must be that $v^{\prime}\in C\cap C^{\prime}$ as well. Continuing this argument we see that $C=C^{\prime}$. Contradiction!

As Condition 5 is met, we may apply Algorithm 2, yielding Algorithm 4. This is exactly the “cycle-popping” algorithm by Wilson [39], as described in [33].

  (1) Let $T$ be an empty set. For each vertex $v\neq r$, randomly choose a neighbour $u\in\Gamma(v)$ and add an edge $(v,u)$ to $T$.

  (2) While there is at least one cycle in $T$, remove all edges in all cycles, and for all vertices whose edges are removed, redo step (1).

  (3) Output the current set of edges.

Algorithm 4 Sample Rooted Spanning Trees
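A minimal Python sketch of Algorithm 4 is given below; the graph is again a dictionary of neighbour sets and the arrows are stored as a successor map, both our own illustrative choices.

```python
import random

def sample_rooted_spanning_tree(adj, root, rng=random):
    # adj: dict {vertex: set of neighbours} of a connected undirected graph.
    # Each non-root vertex keeps an arrow to a uniformly random neighbour;
    # all directed cycles are popped (resampled) until none remain.
    succ = {v: rng.choice(list(adj[v])) for v in adj if v != root}
    while True:
        on_cycle = set()
        for v in succ:
            path, u = set(), v
            while u != root and u not in path and u not in on_cycle:
                path.add(u)
                u = succ[u]
            if u in path:                        # found a new directed cycle
                w = u
                while True:
                    on_cycle.add(w)
                    w = succ[w]
                    if w == u:
                        break
        if not on_cycle:
            return {(v, succ[v]) for v in succ}  # arrows of the rooted tree
        for v in on_cycle:                       # pop all cycles at once
            succ[v] = rng.choice(list(adj[v]))
```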

Let $Z_{\mathrm{tree},0}$ be the number of possible assignments of arrows to the vertices of $G$ that yield a (directed) tree with root $r$, and let $Z_{\mathrm{tree},1}$ be the number of assignments that yield a unicyclic subgraph. The next theorem gives an explicit bound on $\frac{Z_{\mathrm{tree},1}}{Z_{\mathrm{tree},0}}$.

Theorem 17.

Suppose $G$ is a connected graph on $n$ vertices, with $m$ edges. Then $\frac{Z_{\mathrm{tree},1}}{Z_{\mathrm{tree},0}}\leq mn$.

Proof.

Consider an assignment of arrows to the vertices of $G$ that forms a unicyclic subgraph. This unicyclic subgraph has two components: a directed tree with root $r$, and a directed cycle with a number of directed subtrees rooted on the cycle. This is because if we remove the unicyclic component, the rest of the graph has one fewer edge than vertices and no cycles, and hence must be a tree rooted at $r$ spanning the remaining vertices.

As $G$ is connected, there must be an edge in $G$ joining the two components; let this edge be $\{v_{0},v_{1}\}$, where $v_{0}$ is in the tree component and $v_{1}$ is in the unicyclic component. Now extend this edge to a path $v_{0},v_{1},\ldots,v_{\ell}$, by following arrows until we reach the cycle. Thus, $v_{1}\to v_{2},\,v_{2}\to v_{3},\,\ldots,\,v_{\ell-1}\to v_{\ell}$ are all arrows, and $v_{\ell}$ is the first vertex that lies on the cycle. (It may happen that $\ell=1$.) Let $v_{\ell}\to v_{\ell+1}$ be the arrow out of $v_{\ell}$. Now reassign the arrows from vertices $v_{1},\ldots,v_{\ell}$ thus: $v_{\ell}\to v_{\ell-1},\,\ldots,\,v_{2}\to v_{1},\,v_{1}\to v_{0}$. Notice that the result is a directed tree rooted at $r$.

As before, we would like to bound the number of unicyclic subgraphs associated with a given tree by this procedure. We claim that the procedure can be reversed given just two pieces of information, namely the edge $\{v_{\ell},v_{\ell+1}\}$ and the vertex $v_{0}$. Note that, even though the edge $\{v_{\ell},v_{\ell+1}\}$ is undirected, we can disambiguate the endpoints, as $v_{\ell}$ is the vertex closer to the root $r$. The vertices $v_{\ell-1},\ldots,v_{1}$ are easy to recover, as they are the vertices on the unique path in the tree from $v_{\ell}$ to $v_{0}$. To recover the unicyclic subgraph, we just need to reassign the arrows for vertices $v_{1},\ldots,v_{\ell}$ as follows: $v_{1}\to v_{2},\,\ldots,\,v_{\ell}\to v_{\ell+1}$.

As the procedure can be reversed knowing one edge and one vertex, the number of unicyclic graphs associated with each tree can be at most $mn$. ∎

Theorem 17 combined with Theorem 13 yields an $mn$ upper bound on the expected number of “popped cycles” during a run of Algorithm 4.

On the other hand, take a cycle of length $n$. There are $n$ spanning trees with a particular root $v$, and there are $\Omega(n^{3})$ unicyclic graphs (here a cycle has to be of length $2$). Thus the ratio is $\Omega(n^{2})=\Omega(mn)$ since $m=n$, matching the bound of Theorem 17. Moreover, it is known that the cycle-popping algorithm may take $\Omega(n^{3})$ time, for example on a dumbbell graph [33].

4.3. Extremal CNF formulas

A classical setting in the study of the algorithmic Lovász Local Lemma is to find satisfying assignments of $k$-CNF formulas,\footnote{As usual in the study of the Lovász Local Lemma, by “$k$-CNF” we mean that every clause has size exactly $k$.} when the number of appearances of every variable is bounded by $d$. Theorem 1 guarantees the existence of a satisfying assignment as long as $d\leq\frac{2^{k}}{ek}+1$. On the other hand, sampling is apparently harder than searching in this setting. As shown in [4, Corollary 30], it is NP-hard to approximately sample satisfying assignments when $d\geq 5\cdot 2^{k/2}$, even restricted to the special case of monotone formulas.

Meanwhile, sink-free orientations can be recast in terms of CNF formulas. Every vertex in the graph is mapped to a clause, and every edge is a variable. Thus every variable appears exactly twice, and we require that the two literals of the same variable are always opposite. Interpreting an orientation from $u$ to $v$ as making the literal in the clause corresponding to $v$ false, the “sink-free” requirement is thus “not all literals in a clause are false”. Hence a sink-free orientation is just a satisfying assignment for the corresponding CNF formula.

To apply Algorithm 2, we need to require that the CNF formula satisfies Condition 5. Such formulas are defined as follows.

Definition 18.

We call a CNF formula extremal if, for every two clauses $C_{i}$ and $C_{j}$ that share at least one variable, there exists some variable $x$ appearing in both $C_{i}$ and $C_{j}$ such that the two occurrences are one positive and one negative literal.

Let $C_{1},\dots,C_{m}$ be the clauses of a formula $\varphi$. Then define the bad event $A_{i}$ as the set of unsatisfying assignments of clause $C_{i}$. For an extremal CNF formula, these bad events satisfy Condition 5. This is because if $A_{i}\sim A_{j}$, then by Definition 18 there exists a variable $x\in{\sf var}(A_{i})\cap{\sf var}(A_{j})$ such that the unsatisfying assignments of $C_{i}$ and $C_{j}$ differ on $x$. Hence $A_{i}\cap A_{j}=\emptyset$.
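As an illustration, the Python sketch below checks Definition 18 for a clause list in DIMACS-style integer encoding and runs Algorithm 2 on an extremal formula; the encoding and function names are our own illustrative choices.

```python
import random

def is_extremal(clauses):
    # Clauses are lists of non-zero integers: literal v is positive, -v negative.
    for i, Ci in enumerate(clauses):
        for Cj in clauses[i + 1:]:
            shared = {abs(x) for x in Ci} & {abs(x) for x in Cj}
            if shared and not any((x in Ci and -x in Cj) or (-x in Ci and x in Cj)
                                  for x in shared):
                return False
    return True

def sample_satisfying_assignment(clauses, n_vars, rng=random):
    # Partial rejection sampling (Algorithm 2) specialised to an extremal CNF:
    # resample every variable of every violated clause until none is violated.
    assert is_extremal(clauses)
    sigma = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    while True:
        bad = [C for C in clauses if not any(sigma[abs(x)] == (x > 0) for x in C)]
        if not bad:
            return sigma
        for v in {abs(x) for C in bad for x in C}:
            sigma[v] = rng.random() < 0.5
```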

In this formulation, if the size of CiC_{i} is kk, then the corresponding event AiA_{i} happens with probability pi=Pr(Ai)=2kp_{i}=\mathop{\mathrm{Pr}}\nolimits(A_{i})=2^{-k}, where variables are sampled uniformly and independently.333We note that to find a satisfying assignment it is sometimes beneficial to consider non-uniform distributions. See [13]. Suppose each variable appears at most dd times. Then the maximum degree in the dependency graph is at most Δ=(d1)k\Delta=(d-1)k. Note that in Theorem 14, pc=(Δ1)(Δ1)ΔΔ1eΔp_{c}=\frac{(\Delta-1)^{(\Delta-1)}}{\Delta^{\Delta}}\geq\frac{1}{e\Delta}. Thus if d2kek+1d\leq\frac{2^{k}}{ek}+1, then pi=2k<pcp_{i}=2^{-k}<p_{c} and we may apply Theorem 14 to obtain a polynomial time sampling algorithm.

Corollary 19.

For extremal kk-CNF formulas where each variable appears in at most dd clauses, if d2kek+1d\leq\frac{2^{k}}{ek}+1, then Algorithm 2 samples satisfying assignments uniformly at random, with O(m)O(m) expected resamplings where mm is the number of clauses.

The condition in Corollary 19 essentially matches the condition of Theorem 1. On the other hand, if we only require Shearer’s condition as in Theorem 2, Algorithm 2 is not necessarily efficient. More precisely, let ZCNF,0Z_{\mathrm{CNF},0} be the number of satisfying assignments, and ZCNF,1Z_{\mathrm{CNF},1} be the number of assignments satisfying all but one clause. If we only require Shearer’s condition in Theorem 2, then the expected number of resamplings ZCNF,1ZCNF,0\frac{Z_{\mathrm{CNF},1}}{Z_{\mathrm{CNF},0}} can be exponential, as shown in the next example.

Example.

Construct an extremal CNF formula φ=C1C2C4m\varphi=C_{1}\wedge C_{2}\wedge\dots\wedge C_{4m} as follows. Let C1:=x1C_{1}:=x_{1}. Then the variable x1x_{1} is pinned to 11 to satisfy C1C_{1}. Let C2:=x¯1y1y2C_{2}:=\overline{x}_{1}\vee y_{1}\vee y_{2}, C3:=x¯1y1y¯2C_{3}:=\overline{x}_{1}\vee y_{1}\vee\overline{y}_{2}, and C4:=x¯1y¯1y2C_{4}:=\overline{x}_{1}\vee\overline{y}_{1}\vee y_{2}. Then y1y_{1} and y2y_{2} are also pinned to 11 to satisfy all C1C4C_{1}-C_{4}.

We continue this construction by letting

C4k+1\displaystyle C_{4k+1} :=y¯2k1y¯2kxk+1,\displaystyle:=\overline{y}_{2k-1}\vee\overline{y}_{2k}\vee x_{k+1},
C4k+2\displaystyle C_{4k+2} :=x¯k+1y2k+1y2k+2,\displaystyle:=\overline{x}_{k+1}\vee y_{2k+1}\vee y_{2k+2},
C4k+3\displaystyle C_{4k+3} :=x¯k+1y2k+1y¯2k+2,\displaystyle:=\overline{x}_{k+1}\vee y_{2k+1}\vee\overline{y}_{2k+2},
C4k+4\displaystyle C_{4k+4} :=x¯k+1y¯2k+1y2k+2,\displaystyle:=\overline{x}_{k+1}\vee\overline{y}_{2k+1}\vee y_{2k+2},

for all 1km11\leq k\leq m-1. It is easy to see by induction that to satisfy all of them, all xix_{i}’s and yiy_{i}’s have to be 11. Moreover, one can verify that this is indeed an extremal formula. Thus ZCNF,0=1Z_{\mathrm{CNF},0}=1. Moreover, since φ\varphi has a satisfying assignment and is extremal, Shearer’s condition is satisfied. Note also that φ\varphi is not a 33-CNF formula as C1C_{1} contains a single variable.

On the other hand, if we are allowed to ignore C1C_{1}, then x1x_{1} can be 0. In that case, there are 33 choices of y1y_{1} and y2y_{2} that allow x2x_{2} to be 0 as well. Iterating this argument, there are at least 3m3^{m} assignments violating only C1C_{1}, in which x1=x2==xm=0x_{1}=x_{2}=\dots=x_{m}=0. This implies that ZCNF,13mZ_{\mathrm{CNF},1}\geq 3^{m}, and hence ZCNF,1ZCNF,03m\frac{Z_{\mathrm{CNF},1}}{Z_{\mathrm{CNF},0}}\geq 3^{m}. Due to Theorem 13, the expected running time of Algorithm 2 on this formula φ\varphi is exponential in mm.
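
For concreteness, the example formula can be generated mechanically; the sketch below (our own encoding, with x_i as variable i and y_i as variable m+i) writes down the 4m clauses, and one can check with the is_extremal sketch above that the result is extremal.

```python
def hard_extremal_formula(m):
    """The formula C_1, ..., C_{4m} from the example above.  Variables
    x_1..x_m are numbered 1..m and y_1..y_{2m} are numbered m+1..3m;
    positive / negative integers denote positive / negative literals."""
    x = lambda i: i
    y = lambda i: m + i
    clauses = [[x(1)],
               [-x(1),  y(1),  y(2)],
               [-x(1),  y(1), -y(2)],
               [-x(1), -y(1),  y(2)]]
    for k in range(1, m):
        clauses += [[-y(2 * k - 1), -y(2 * k), x(k + 1)],
                    [-x(k + 1),  y(2 * k + 1),  y(2 * k + 2)],
                    [-x(k + 1),  y(2 * k + 1), -y(2 * k + 2)],
                    [-x(k + 1), -y(2 * k + 1),  y(2 * k + 2)]]
    return clauses

assert len(hard_extremal_formula(5)) == 20   # 4m clauses; the unique satisfying assignment is all-ones
```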

We will discuss more on sampling satisfying assignments of a kk-CNF formula in Section 7.1.

5. General Partial Rejection Sampling

In this section we give a general version of Algorithm 2 which can be applied to arbitrary instances in the variable framework, even without Condition 5.

Recall the notation introduced at the beginning of Section 2. So, {X1,,Xn}\{X_{1},\dots,X_{n}\} is a set of random variables, each with its own distribution and range DiD_{i}, and {A1,,Am}\{A_{1},\dots,A_{m}\} is a set of bad events that depend on XiX_{i}’s. The dependencies between events are encoded in the dependency graph G=(V,E)G=(V,E). As before, we will use the idea of a resampling table. Recall that σ=σt={Xi,ji,t1in}\sigma=\sigma_{t}=\{X_{i,j_{i,t}}\mid 1\leq i\leq n\} denotes the current assignment of variables at round tt, i.e., the elements of the resampling table that are active at time tt. Given σ\sigma, let 𝖡𝖺𝖽(σ){\sf Bad}(\sigma) be the set of occurring bad events; that is, 𝖡𝖺𝖽(σ)={iσAi}{\sf Bad}(\sigma)=\{i\mid\sigma\in A_{i}\}. For a subset SVS\subset V, let S\partial S be the boundary of SS; that is, S={iiS and jS,(i,j)E}\partial S=\{i\mid i\notin S\text{ and }\exists j\in S,\;(i,j)\in E\}. Moreover, let

𝗏𝖺𝗋(S):=iS𝗏𝖺𝗋(Ai).\displaystyle{\sf var}(S):=\bigcup_{i\in S}{\sf var}(A_{i}).

Let σ|S\sigma|_{S} be the partial assignment of σ\sigma restricted to 𝗏𝖺𝗋(S){\sf var}(S). For an event AiA_{i} and SVS\subseteq V, we write Aiσ|SA_{i}\perp\sigma|_{S} if either 𝗏𝖺𝗋(Ai)𝗏𝖺𝗋(S)={\sf var}(A_{i})\cap{\sf var}(S)=\emptyset, or there is no way to extend the partial assignment σ|S\sigma|_{S} to all variables so that AiA_{i} holds. Otherwise Ai⟂̸σ|SA_{i}\not\perp\sigma|_{S}.

Definition 20.

A set SVS\subseteq V is unblocking under σ\sigma if for every iSi\in\partial S, Aiσ|SA_{i}\perp\sigma|_{S}.

Given σ\sigma, our goal is to resample a set of events that is unblocking and contains 𝖡𝖺𝖽(σ){\sf Bad}(\sigma). Such a set must exist because VV is unblocking (V\partial V is empty) and 𝖡𝖺𝖽(σ)V{\sf Bad}(\sigma)\subseteq V. However, we want to resample as few events as possible.

R𝖡𝖺𝖽(σ)R\leftarrow{\sf Bad}(\sigma) ;
  // RR is the set of events that will be resampled.
NN\leftarrow\emptyset ;
  // NN is the set of events that will not be resampled.
URNU\leftarrow\partial R\setminus N;
while UU\neq\emptyset do
 for iUi\in U do
    if Ai⟂̸σ|RA_{i}\not\perp\sigma|_{R} then
       RR{i}R\leftarrow R\cup\{i\};
    else
       NN{i}N\leftarrow N\cup\{i\};
      end if
    
   end for
 URNU\leftarrow\partial R\setminus N;
 
end while
return RR
Algorithm 5 Select the resampling set 𝖱𝖾𝗌(σ){\sf Res}(\sigma) under an assignment σ\sigma

Intuitively, we start by setting the resampling set R0R_{0} to the set of bad events 𝖡𝖺𝖽(σ){\sf Bad}(\sigma). We then mark resampling events in rounds, similarly to a breadth-first search. Let RtR_{t} be the resampling set of round t0t\geq 0. In round t+1t+1, let AiA_{i} be an event on the boundary of RtR_{t} that has not been marked yet. We mark it “resampling” if the partial assignment on the variables shared by AiA_{i} and RtR_{t} can be extended so that AiA_{i} occurs; otherwise we mark it “not resampling”. We continue this process until there is no unmarked event left on the boundary of the current RR. An event outside of Γ+(R)\Gamma^{+}(R) may be left unmarked at the end of Algorithm 5. Note that once an event is marked “not resampling”, it is never added into the resampling set, since RR only grows during the algorithm.

In Algorithm 5, we are dynamically updating RR during each iteration of going through UU. This is potentially beneficial as an event AiA_{i} may become incompatible with RR after some event AjA_{j} is added, where both i,jUi,j\in U.

We fix an arbitrary ordering a priori for choosing iUi\in U in the “for” loop of Algorithm 5. The output of Algorithm 5 is then a deterministic function of σ\sigma; call it 𝖱𝖾𝗌(σ){\sf Res}(\sigma).
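
The following Python sketch mirrors Algorithm 5. It assumes the instance is described by neighbours (adjacency lists of the dependency graph), var_of (the variable set of each event) and a predicate can_occur(i, partial) deciding whether the partial assignment partial can be extended so that A_i occurs; all of these names are our own.

```python
def select_resampling_set(bad, neighbours, var_of, can_occur, sigma):
    """Algorithm 5 (sketch): grow the resampling set from the occurring bad
    events.  bad: indices of occurring bad events; sigma: current full
    assignment (dict variable -> value); var_of[i] is a set of variables."""
    R = set(bad)           # events that will be resampled
    N = set()              # events marked "not resampling"
    while True:
        U = {j for i in R for j in neighbours[i]} - R - N    # unmarked boundary of R
        if not U:
            return R
        for i in sorted(U):                                  # fixed a-priori order
            vars_R = {v for j in R for v in var_of[j]}       # R is updated dynamically
            # i is on the boundary, so it shares at least one variable with R
            partial = {v: sigma[v] for v in var_of[i] & vars_R}
            if can_occur(i, partial):    # A_i is not blocked by sigma restricted to R
                R.add(i)
            else:
                N.add(i)
```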

Lemma 21.

Let σ\sigma be an assignment. For any i𝖱𝖾𝗌(σ)i\in\partial{\sf Res}(\sigma), Aiσ|𝖱𝖾𝗌(σ)A_{i}\perp\sigma|_{{\sf Res}(\sigma)}.

Proof.

Since i𝖱𝖾𝗌(σ)i\in\partial{\sf Res}(\sigma), it must have been marked. Moreover, i𝖱𝖾𝗌(σ)i\not\in{\sf Res}(\sigma), so it must be marked as “not resampling”. Thus, there exists an intermediate set R𝖱𝖾𝗌(σ)R\subseteq{\sf Res}(\sigma) during the execution of Algorithm 5 such that Aiσ|RA_{i}\perp\sigma|_{R} and iRi\in\partial R. It implies that AiA_{i} is disjoint from the partial assignment of σ\sigma restricted to 𝗏𝖺𝗋(Ai)𝗏𝖺𝗋(R){\sf var}(A_{i})\cap{\sf var}(R). However,

𝗏𝖺𝗋(Ai)𝗏𝖺𝗋(R)𝗏𝖺𝗋(Ai)𝗏𝖺𝗋(𝖱𝖾𝗌(σ))\displaystyle{\sf var}(A_{i})\cap{\sf var}(R)\subseteq{\sf var}(A_{i})\cap{\sf var}({\sf Res}(\sigma))

as R𝖱𝖾𝗌(σ)R\subseteq{\sf Res}(\sigma). We have that Aiσ|𝖱𝖾𝗌(σ)A_{i}\perp\sigma|_{{\sf Res}(\sigma)}. ∎

If Condition 5 is met, then 𝖱𝖾𝗌(σ)=𝖡𝖺𝖽(σ){\sf Res}(\sigma)={\sf Bad}(\sigma). This is because at the first step, R=𝖡𝖺𝖽(σ)R={\sf Bad}(\sigma). By Condition 5, for any i𝖡𝖺𝖽(σ)i\in\partial{\sf Bad}(\sigma), AiA_{i} is disjoint from all AjA_{j}’s where j𝖡𝖺𝖽(σ)j\in{\sf Bad}(\sigma) and AiAjA_{i}\sim A_{j}. Since AjA_{j} occurs under σ\sigma, Aiσ|RA_{i}\perp\sigma|_{R}. Algorithm 5 halts in the first iteration. In this case, since the resampling set is just the (independent) set of occurring bad events, the later Algorithm 6 coincides with Algorithm 2.

The key property of 𝖱𝖾𝗌(σ){\sf Res}(\sigma) is that if we change the assignment outside of 𝖱𝖾𝗌(σ){\sf Res}(\sigma), then 𝖱𝖾𝗌(σ){\sf Res}(\sigma) does not change, unless the new assignment introduces a new bad event outside of 𝖱𝖾𝗌(σ){\sf Res}(\sigma). More formally, we have the following lemma.

Lemma 22.

Let σ\sigma be an assignment. Let σ\sigma^{\prime} be another assignment such that 𝖡𝖺𝖽(σ)𝖱𝖾𝗌(σ){\sf Bad}(\sigma^{\prime})\subseteq{\sf Res}(\sigma) and such that σ\sigma and σ\sigma^{\prime} agree on all variables in 𝗏𝖺𝗋(𝖱𝖾𝗌(σ))=i𝖱𝖾𝗌(σ)𝗏𝖺𝗋(Ai){\sf var}({\sf Res}(\sigma))=\bigcup_{i\in{\sf Res}(\sigma)}{\sf var}(A_{i}). Then, 𝖱𝖾𝗌(σ)=𝖱𝖾𝗌(σ){\sf Res}(\sigma^{\prime})={\sf Res}(\sigma).

Proof.

Let Rt(σ),Nt(σ)R_{t}(\sigma),N_{t}(\sigma) be the intermediate sets R,NR,N, respectively, at time tt of the execution of Algorithm 5 under σ\sigma. Thus R0(σ)=𝖡𝖺𝖽(σ)R_{0}(\sigma)={\sf Bad}(\sigma) and R0(σ)R1(σ)𝖱𝖾𝗌(σ)R_{0}(\sigma)\subseteq R_{1}(\sigma)\subseteq\dots\subseteq{\sf Res}(\sigma). Moreover, N0(σ)N1(σ)N_{0}(\sigma)\subseteq N_{1}(\sigma)\subseteq\cdots. We will show by induction that Rt(σ)=Rt(σ)R_{t}(\sigma)=R_{t}(\sigma^{\prime}) and Nt(σ)=Nt(σ)N_{t}(\sigma)=N_{t}(\sigma^{\prime}) for any t0t\geq 0.

For the base case of t=0t=0, by the condition of the lemma, for every i𝖡𝖺𝖽(σ)𝖱𝖾𝗌(σ)i\in{\sf Bad}(\sigma)\subseteq{\sf Res}(\sigma), the assignments σ\sigma and σ\sigma^{\prime} agree on 𝗏𝖺𝗋(Ai){\sf var}(A_{i}); or equivalently σ|𝖱𝖾𝗌(σ)=σ|𝖱𝖾𝗌(σ)\sigma|_{{\sf Res}(\sigma)}=\sigma^{\prime}|_{{\sf Res}(\sigma)}. Together with 𝖡𝖺𝖽(σ)𝖱𝖾𝗌(σ){\sf Bad}(\sigma^{\prime})\subseteq{\sf Res}(\sigma), it implies that 𝖡𝖺𝖽(σ)=𝖡𝖺𝖽(σ){\sf Bad}(\sigma)={\sf Bad}(\sigma^{\prime}) and R0(σ)=R0(σ)R_{0}(\sigma)=R_{0}(\sigma^{\prime}). Moreover, N0(σ)=N0(σ)=N_{0}(\sigma)=N_{0}(\sigma^{\prime})=\emptyset.

For the induction step t>0t>0, we have that Rt1(σ)=Rt1(σ)𝖱𝖾𝗌(σ)R_{t-1}(\sigma)=R_{t-1}(\sigma^{\prime})\subseteq{\sf Res}(\sigma) and Nt1(σ)=Nt1(σ)N_{t-1}(\sigma)=N_{t-1}(\sigma^{\prime}). Let R=Rt1(σ)=Rt1(σ)R=R_{t-1}(\sigma)=R_{t-1}(\sigma^{\prime}) and N=Nt1(σ)=Nt1(σ)N=N_{t-1}(\sigma)=N_{t-1}(\sigma^{\prime}). Then we will go through U=RNU=\partial R\setminus N, which is the same for both σ\sigma and σ\sigma^{\prime}. Moreover, while marking individual events “resampling” or not, it is sufficient to look at only σ|R=σ|R\sigma|_{R}=\sigma^{\prime}|_{R} since R𝖱𝖾𝗌(σ)R\subseteq{\sf Res}(\sigma). Thus the markings are exactly the same, implying that Rt(σ)=Rt(σ)𝖱𝖾𝗌(σ)R_{t}(\sigma)=R_{t}(\sigma^{\prime})\subseteq{\sf Res}(\sigma) and Nt(σ)=Nt(σ)N_{t}(\sigma)=N_{t}(\sigma^{\prime}). ∎

(1) Draw independent samples of all variables X1,,XnX_{1},\dots,X_{n} from their respective distributions.

(2) While at least one bad event occurs under the current assignment σ\sigma, use Algorithm 5 to find 𝖱𝖾𝗌(σ){\sf Res}(\sigma). Resample all variables in i𝖱𝖾𝗌(σ)𝗏𝖺𝗋(Ai)\bigcup_{i\in{\sf Res}(\sigma)}{\sf var}(A_{i}).

(3) When none of the bad events holds, output the current assignment.

Algorithm 6 General Partial Rejection Sampling
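
Putting the pieces together, a minimal sketch of the whole of Algorithm 6 could look as follows; it reuses the select_resampling_set sketch above, and occurs(i, sigma) together with sample_var(v) (a fresh draw from the distribution of variable v) are assumed helpers.

```python
def general_partial_rejection_sampling(events, variables, neighbours, var_of,
                                       can_occur, occurs, sample_var):
    """General partial rejection sampling (Algorithm 6, sketch)."""
    sigma = {v: sample_var(v) for v in variables}            # step (1)
    while True:
        bad = {i for i in events if occurs(i, sigma)}
        if not bad:                                          # step (3)
            return sigma
        res = select_resampling_set(bad, neighbours, var_of, can_occur, sigma)
        for v in {v for i in res for v in var_of[i]}:        # step (2)
            sigma[v] = sample_var(v)
```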

To prove the correctness of Algorithm 6, we will only use three properties of 𝖱𝖾𝗌(σ){\sf Res}(\sigma), which are intuitively summarized as follows:

(1) 𝖡𝖺𝖽(σ)𝖱𝖾𝗌(σ){\sf Bad}(\sigma)\subseteq{\sf Res}(\sigma);

(2) For any i𝖱𝖾𝗌(σ)i\in\partial{\sf Res}(\sigma), AiA_{i} is disjoint from the partial assignment of σ\sigma projected on 𝗏𝖺𝗋(Ai)𝗏𝖺𝗋(𝖱𝖾𝗌(σ)){\sf var}(A_{i})\cap{\sf var}({\sf Res}(\sigma)) (Lemma 21);

(3) If we fix the partial assignment of σ\sigma projected on 𝗏𝖺𝗋(𝖱𝖾𝗌(σ)){\sf var}({\sf Res}(\sigma)), then the output of Algorithm 5 is fixed, unless there are new bad events occurring outside of 𝖱𝖾𝗌(σ){\sf Res}(\sigma) (Lemma 22).

Similarly to the analysis of Algorithm 2, we call 𝒮=S1,,S\mathcal{S}=S_{1},\dots,S_{\ell} the log, if SiS_{i} is the set of resampling events in step ii of Algorithm 6. Note that for Algorithm 6, the log is not necessarily an independent set sequence. Also, recall that σi\sigma_{i} is the assignment of variables in step ii, and σt=σT\sigma_{t}=\sigma_{T} if TT is when Algorithm 6 terminates and t>Tt>T. The following lemma is an analogue of Lemma 7.

Lemma 23.

Given any log 𝒮\mathcal{S} of length 1\ell\geq 1, σ+1\sigma_{\ell+1} has the product distribution conditioned on none of AiA_{i}’s occurring where iΓ+(S)i\notin\Gamma^{+}(S_{\ell}), namely from μ(i[m]Γ+(S)Ai¯)\mu\big{(}\cdot\mid\bigwedge_{i\in[m]\setminus\Gamma^{+}(S_{\ell})}\overline{A_{i}}\big{)}.

Proof.

Suppose iΓ+(S)i\notin\Gamma^{+}(S_{\ell}). By construction, SS_{\ell} contains all occurring bad events of σ\sigma_{\ell}, and hence AiA_{i} does not occur under σ\sigma_{\ell}. In step \ell, we only resample variables that are involved in SS_{\ell}, so σ+1\sigma_{\ell+1} and σ\sigma_{\ell} agree on 𝗏𝖺𝗋(Ai){\sf var}(A_{i}). Hence AiA_{i} cannot occur under σ+1\sigma_{\ell+1}. Call an assignment σ\sigma valid if none of AiA_{i} occurs where iΓ+(S)i\notin\Gamma^{+}(S_{\ell}). To show that σ+1\sigma_{\ell+1} has the desired conditional product distribution, we will show that the probabilities of getting any two valid assignments σ\sigma and σ\sigma^{\prime} are proportional to their probabilities of occurrence under the product distribution μ()\mu(\cdot).

Let MM be the resampling table so that the log of Algorithm 6 is 𝒮\mathcal{S} up to round \ell, and σ+1=σ\sigma_{\ell+1}=\sigma. Indeed, since we only care about events up to round +1\ell+1, we may truncate the table so that M={Xi,j1in,  1jji,+1}M=\{X_{i,j}\mid 1\leq i\leq n,\;\;1\leq j\leq j_{i,\ell+1}\}. Let M={Xi,j| 1in,  1jji,+1}M^{\prime}=\{X_{i,j}^{\prime}\;|\;1\leq i\leq n,\;\;1\leq j\leq j_{i,\ell+1}\} be another table where Xi,j=Xi,jX_{i,j}^{\prime}=X_{i,j} if j<ji,+1j<j_{i,\ell+1} for any i[n]i\in[n], and σ=(Xi,ji,+1:1in)\sigma^{\prime}=(X^{\prime}_{i,j_{i,\ell+1}}:1\leq i\leq n) is a valid assignment. In other words, we only change the last assignment (Xi,ji,+1:1in)(X_{i,j_{i,\ell+1}}:1\leq i\leq n) to another valid assignment. We will use σt=(Xi,ji,t)\sigma^{\prime}_{t}=(X^{\prime}_{i,j_{i,t}}) to denote the active elements of the second resampling table at time tt; thus σ=σ+1\sigma^{\prime}=\sigma^{\prime}_{\ell+1}.

The lemma follows if Algorithm 6 running on MM^{\prime} generates the same log 𝒮\mathcal{S} up to round \ell, since, if this is the case, then conditioned on the log 𝒮\mathcal{S}, every possible table MM where σ+1=σ\sigma_{\ell+1}=\sigma is in one-to-one correspondence with another table MM^{\prime} where σ+1=σ\sigma_{\ell+1}^{\prime}=\sigma^{\prime}. Hence the probability of getting σ\sigma is proportional to its weight under μ()\mu(\cdot).

Suppose otherwise that the logs of running Algorithm 6 on MM and MM^{\prime} differ. Let t0t_{0}\leq\ell be the first round where the resampling sets differ, by which we mean that 𝖱𝖾𝗌(σt0)𝖱𝖾𝗌(σt0){\sf Res}(\sigma_{t_{0}})\neq{\sf Res}(\sigma_{t_{0}}^{\prime}). By Lemma 22, either 𝖡𝖺𝖽(σt0)𝖱𝖾𝗌(σt0){\sf Bad}(\sigma_{t_{0}}^{\prime})\not\subseteq{\sf Res}(\sigma_{t_{0}}), or σt0|𝖱𝖾𝗌(σt0)σt0|𝖱𝖾𝗌(σt0)\sigma_{t_{0}}|_{{\sf Res}(\sigma_{t_{0}})}\neq\sigma_{t_{0}}^{\prime}|_{{\sf Res}(\sigma_{t_{0}})}. In the latter case, since MM and MM^{\prime} agree on all entries except those with index ji,+1j_{i,\ell+1}, there must be a variable XiX_{i} with i𝗏𝖺𝗋(𝖱𝖾𝗌(σt0))i\in{\sf var}({\sf Res}(\sigma_{t_{0}})) whose active index satisfies ji,t0=ji,+1j_{i,t_{0}}=j_{i,\ell+1}. However, i𝗏𝖺𝗋(𝖱𝖾𝗌(σt0))i\in{\sf var}({\sf Res}(\sigma_{t_{0}})) means that XiX_{i} is resampled at least once more in the original run on MM, so its index reaches at least ji,+1+1j_{i,\ell+1}+1 by round +1\ell+1, a contradiction. Thus, σt0|𝖱𝖾𝗌(σt0)=σt0|𝖱𝖾𝗌(σt0)\sigma_{t_{0}}|_{{\sf Res}(\sigma_{t_{0}})}=\sigma_{t_{0}}^{\prime}|_{{\sf Res}(\sigma_{t_{0}})} and 𝖡𝖺𝖽(σt0)𝖱𝖾𝗌(σt0){\sf Bad}(\sigma_{t_{0}}^{\prime})\not\subseteq{\sf Res}(\sigma_{t_{0}}).

As 𝖡𝖺𝖽(σt0)𝖱𝖾𝗌(σt0){\sf Bad}(\sigma_{t_{0}}^{\prime})\not\subseteq{\sf Res}(\sigma_{t_{0}}), there must be a variable Xi0X_{i_{0}} such that ji0,t0=ji0,+1j_{i_{0},t_{0}}=j_{i_{0},\ell+1} (otherwise Xi0,ji0,t0=Xi0,ji0,t0X_{i_{0},j_{i_{0},t_{0}}}=X_{i_{0},j_{i_{0},t_{0}}}^{\prime}) and an event AkA_{k} such that i0𝗏𝖺𝗋(Ak)i_{0}\in{\sf var}(A_{k}), k𝖡𝖺𝖽(σt0)k\in{\sf Bad}(\sigma_{t_{0}}^{\prime}) but k𝖱𝖾𝗌(σt0)k\not\in{\sf Res}(\sigma_{t_{0}}). Suppose first that i𝗏𝖺𝗋(Ak)\forall i\in{\sf var}(A_{k}), ji,t0=ji,+1j_{i,t_{0}}=j_{i,\ell+1}, which means that all variables of AkA_{k} have reached their final values in the MM run at time t0t_{0}. This implies that kΓ+(St)k\notin\Gamma^{+}(S_{t}) for any tt0t\geq t_{0} as otherwise some of the variables in 𝗏𝖺𝗋(Ak){\sf var}(A_{k}) would be resampled at least once after round t0t_{0}. In particular, kΓ+(S)k\notin\Gamma^{+}(S_{\ell}). This contradicts σ\sigma^{\prime} being valid.

Otherwise there are some variables in 𝗏𝖺𝗋(Ak){\sf var}(A_{k}) that get resampled after time t0t_{0} in the MM run. Let t1t_{1} be the first such time and Y𝗏𝖺𝗋(Ak)Y\subset{\sf var}(A_{k}) be the set of variables resampled at round t1t_{1}; namely, Y=𝗏𝖺𝗋(Ak)𝗏𝖺𝗋(𝖱𝖾𝗌(σt1))Y={\sf var}(A_{k})\cap{\sf var}({\sf Res}(\sigma_{t_{1}})). We have that σt1|Y=σt0|Y\sigma_{t_{1}}|_{Y}=\sigma_{t_{0}}|_{Y} because t1t_{1} is the first time of resampling variables in YY. Moreover, as variables of YY have not reached their final values yet in the MM run, σt0|Y=σt0|Y\sigma_{t_{0}}|_{Y}=\sigma_{t_{0}}^{\prime}|_{Y}. Thus, σt1|Y=σt0|Y\sigma_{t_{1}}|_{Y}=\sigma_{t_{0}}^{\prime}|_{Y}.

Assuming k𝖱𝖾𝗌(σt1)k\in{\sf Res}(\sigma_{t_{1}}) would contradict the fact that Xi0X_{i_{0}} has reached its final value in the MM run. Hence k𝖱𝖾𝗌(σt1)k\notin{\sf Res}(\sigma_{t_{1}}), but nevertheless variables in Y𝗏𝖺𝗋(Ak)Y\subset{\sf var}(A_{k}) are resampled. This implies that k𝖱𝖾𝗌(σt1)k\in\partial{\sf Res}(\sigma_{t_{1}}). By Lemma 21, Akσt1|𝖱𝖾𝗌(σt1)A_{k}\perp\sigma_{t_{1}}|_{{\sf Res}(\sigma_{t_{1}})}. As 𝗏𝖺𝗋(Ak){\sf var}(A_{k}) cannot be disjoint from 𝗏𝖺𝗋(𝖱𝖾𝗌(σt1)){\sf var}({\sf Res}(\sigma_{t_{1}})), this means that AkA_{k} is incompatible with the partial assignment of σt1\sigma_{t_{1}} restricted to 𝗏𝖺𝗋(Ak)𝗏𝖺𝗋(𝖱𝖾𝗌(σt1))=Y{\sf var}(A_{k})\cap{\sf var}({\sf Res}(\sigma_{t_{1}}))=Y. Equivalently, Akσt1|YA_{k}\perp\sigma_{t_{1}}|_{Y}. However, we know that σt1|Y=σt0|Y\sigma_{t_{1}}|_{Y}=\sigma_{t_{0}}^{\prime}|_{Y}, so Akσt0|YA_{k}\perp\sigma_{t_{0}}^{\prime}|_{Y}, contradicting k𝖡𝖺𝖽(σt0)k\in{\sf Bad}(\sigma_{t_{0}}^{\prime}). ∎

Theorem 24.

If Algorithm 6 halts, then its output has the product distribution conditioned on none of AiA_{i}’s occurring.

Proof.

Let a sequence 𝒮\mathcal{S} of sets of events be the log of any successful run. Then S=S_{\ell}=\emptyset. By Lemma 23, conditioned on the log 𝒮\mathcal{S}, the output assignment σ\sigma is μ(i[m]Γ+(S)Ai¯)=μ(i[m]Ai¯)\mu\big{(}\cdot\mid\bigwedge_{i\in[m]\setminus\Gamma^{+}(S_{\ell})}\overline{A_{i}}\big{)}=\mu\big{(}\cdot\mid\bigwedge_{i\in[m]}\overline{A_{i}}\big{)}. This is valid for any possible log, and the theorem follows. ∎

6. Running Time Analysis of Algorithm 6

Obviously when there is no assignment avoiding all bad events, then Algorithm 6 will never halt. Thus we want to assume some conditions to guarantee a desired assignment. However, the optimal condition of Theorem 2 is quite difficult to work under. Instead, in this section we will be working under the assumption that the asymmetric LLL condition (1) holds. In fact, to make the presentation clean, we will mostly work with the simpler symmetric case.

However, as mentioned in Section 4.3, [4, Corollary 30] showed that even under the asymmetric LLL condition (1), sampling can still be NP-hard. We thus in turn look for further conditions to make Algorithm 6 efficient.

Recall that μ()\mu(\cdot) is the product distribution of sampling all variables independently. For two distinct events AiAjA_{i}\sim A_{j}, let RijR_{ij} be the event that the partial assignments on 𝗏𝖺𝗋(Ai)𝗏𝖺𝗋(Aj){\sf var}(A_{i})\cap{\sf var}(A_{j}) can be extended to an assignment making AjA_{j} true. Thus, if Ai⟂̸σ|SA_{i}\not\perp\sigma|_{S} for some event set SS, then RjiR_{ji} must hold for all AjSA_{j}\in S and AjAiA_{j}\sim A_{i}. Conversely, it is possible that each individual RjiR_{ji} is true for all AjSA_{j}\in S and AjAiA_{j}\sim A_{i}, and yet Aiσ|SA_{i}\perp\sigma|_{S}. Also note that RijR_{ij} is not necessarily the same as RjiR_{ji}. Let rij:=μ(Rij)r_{ij}:=\mu(R_{ij}).

Define p:=maxi[m]pi\displaystyle p:=\max_{i\in[m]}p_{i} and r:=maxAiAj,ijrij\displaystyle r:=\max_{A_{i}\sim A_{j},\;i\neq j}r_{ij}. Let Δ\Delta be the maximum degree of the dependency graph GG. The main result of the section is the following theorem.

Theorem 25.

Let mm be the number of events and nn be the number of variables. For any Δ2\Delta\geq 2, if 6epΔ216ep\Delta^{2}\leq 1 and 3erΔ13er\Delta\leq 1, then the expected number of resampled events of Algorithm 6 is O(m)O(m).

Moreover, when these conditions hold, the number of rounds is O(logm)O(\log m) and the number of variable resamples is O(nlogm)O(n\log m), both in expectation and with high probability.

The first condition 6epΔ216ep\Delta^{2}\leq 1 is stronger than the condition of the symmetric Lovász Local Lemma, but this seems necessary since [4, Corollary 30] implies that if pΔ2Cp\Delta^{2}\geq C for some constant CC then the sampling problem is NP-hard. Intuitively, the second condition 3erΔ13er\Delta\leq 1 bounds the expansion from bad events to resampling events at every step of Algorithm 6. We will prove Theorem 25 in the rest of the section.

Let S[m]S\subseteq[m] be a subset of vertices of the dependency graph GG. Recall that A(S)A(S) is the event iSAi\bigwedge_{i\in S}A_{i} and B(S)B(S) is the event iSAi¯\bigwedge_{i\in S}\overline{A_{i}}. Moreover, ScS^{c} is the complement of SS, namely Sc=[m]SS^{c}=[m]\setminus S, and SeS^{e} is the “exterior” of SS, namely Se=[m]Γ+(S)S^{e}=[m]\setminus\Gamma^{+}(S).

Lemma 23 implies that if we resample SS at some step tt of Algorithm 6, then at step t+1t+1 the distribution is the product measure μ\mu conditioned on none of the events in the exterior of SS holds; namely Prμ(B(Se))\mathop{\mathrm{Pr}}\nolimits_{\mu}(\cdot\mid B(S^{e})).

Let EE be an event (not necessarily one of AiA_{i}) depending on a set 𝗏𝖺𝗋(E){\sf var}(E) of variables. Let Γ(E):={ii[m],𝗏𝖺𝗋(Ai)𝗏𝖺𝗋(E)}\Gamma(E):=\{i\mid i\in[m],\;{\sf var}(A_{i})\cap{\sf var}(E)\neq\emptyset\} if EE is not one of AiA_{i}, and Γ(Ai):={jj[m],ji and 𝗏𝖺𝗋(Aj)𝗏𝖺𝗋(Ai)}\Gamma(A_{i}):=\{j\mid j\in[m],\;j\not=i\text{ and }{\sf var}(A_{j})\cap{\sf var}(A_{i})\neq\emptyset\} is defined as usual. Let S[m]S\subseteq[m] be a subset of vertices of GG. The next lemma bounds the probability of EE conditioned on none of the events in SS happening. It was first observed in [19]. We include a proof for completeness (which is a simple adaption of the ordinary local lemma proof).

Lemma 26 ([19, Theorem 2.1]).

Suppose (1) holds. For an event EE and any set S[m]S\subseteq[m],

Prμ(EB(S))Prμ(E)iΓ(E)S(1xi)1,\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}(E\mid B(S))\leq\mathop{\mathrm{Pr}}\nolimits_{\mu}(E)\prod_{i\in\Gamma(E)\cap S}(1-x_{i})^{-1},

where xix_{i}’s are from (1).

Proof.

We prove the inequality by induction on the size of SS. The base case is when SS is empty and the lemma holds trivially.

For the induction step, let S1=SΓ(E)S_{1}=S\cap\Gamma(E) and S2=SS1S_{2}=S\setminus S_{1}. If S1=S_{1}=\emptyset, then the lemma holds trivially as EE is independent from SS in this case. Otherwise S2S_{2} is a proper subset of SS. We have that

Prμ(EB(S))\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}(E\mid B(S)) =Prμ(EB(S1)B(S2))Prμ(B(S1)B(S2))\displaystyle=\frac{\mathop{\mathrm{Pr}}\nolimits_{\mu}(E\wedge B(S_{1})\mid B(S_{2}))}{\mathop{\mathrm{Pr}}\nolimits_{\mu}(B(S_{1})\mid B(S_{2}))}
Prμ(EB(S2))Prμ(B(S1)B(S2))\displaystyle\leq\frac{\mathop{\mathrm{Pr}}\nolimits_{\mu}(E\mid B(S_{2}))}{\mathop{\mathrm{Pr}}\nolimits_{\mu}(B(S_{1})\mid B(S_{2}))}
=Prμ(E)Prμ(B(S1)B(S2)),\displaystyle=\frac{\mathop{\mathrm{Pr}}\nolimits_{\mu}(E)}{\mathop{\mathrm{Pr}}\nolimits_{\mu}(B(S_{1})\mid B(S_{2}))},

where the last line is because EE is independent from B(S2)B(S_{2}). We then use the induction hypothesis to bound the denominator. Suppose S1={j1,j2,,jr}S_{1}=\{j_{1},j_{2},\dots,j_{r}\} for some r>0r>0. Then,

Prμ(B(S1)B(S2))\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}(B(S_{1})\mid B(S_{2})) =Prμ(iS1Ai¯|iS2Ai¯)\displaystyle=\mathop{\mathrm{Pr}}\nolimits_{\mu}\left(\bigwedge_{i\in S_{1}}\overline{A_{i}}\;\Bigg{|}\bigwedge_{i\in S_{2}}\overline{A_{i}}\right)
=t=1rPrμ(Ajt¯|s=1t1Ajs¯iS2Ai¯)\displaystyle=\prod_{t=1}^{r}\mathop{\mathrm{Pr}}\nolimits_{\mu}\left(\overline{A_{j_{t}}}\;\Bigg{|}\bigwedge_{s=1}^{t-1}\overline{A_{j_{s}}}\wedge\bigwedge_{i\in S_{2}}\overline{A_{i}}\right)
=t=1r(1Prμ(Ajt|s=1t1Ajs¯iS2Ai¯)).\displaystyle=\prod_{t=1}^{r}\left(1-\mathop{\mathrm{Pr}}\nolimits_{\mu}\left(A_{j_{t}}\;\Bigg{|}\bigwedge_{s=1}^{t-1}\overline{A_{j_{s}}}\wedge\bigwedge_{i\in S_{2}}\overline{A_{i}}\right)\right).

By the induction hypothesis and (1), we have that for any 1tr1\leq t\leq r,

Prμ(Ajt|s=1t1Ajs¯iS2Ai¯)\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}\left(A_{j_{t}}\;\Bigg{|}\bigwedge_{s=1}^{t-1}\overline{A_{j_{s}}}\wedge\bigwedge_{i\in S_{2}}\overline{A_{i}}\right) Prμ(Ajt)iΓ(jt)(1xi)1\displaystyle\leq\mathop{\mathrm{Pr}}\nolimits_{\mu}(A_{j_{t}})\prod_{i\in\Gamma(j_{t})}(1-x_{i})^{-1}
xjtiΓ(jt)(1xi)iΓ(jt)(1xi)1\displaystyle\leq x_{j_{t}}\prod_{i\in\Gamma(j_{t})}(1-x_{i})\prod_{i\in\Gamma(j_{t})}(1-x_{i})^{-1}
=xjt.\displaystyle=x_{j_{t}}.

Thus,

Prμ(B(S1)B(S2))\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}(B(S_{1})\mid B(S_{2})) iS1(1xi).\displaystyle\geq\prod_{i\in S_{1}}\left(1-x_{i}\right).

The lemma follows. ∎

Typically we set xi=1Δ+1x_{i}=\frac{1}{\Delta+1} in the symmetric setting. Then (1) holds if ep(Δ+1)1ep(\Delta+1)\leq 1. In this setting, Lemma 26 is specialized into the following.

Corollary 27.

If ep(Δ+1)1ep(\Delta+1)\leq 1, then

Prμ(EB(S))Prμ(E)(1+1Δ)|Γ(E)|.\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}(E\mid B(S))\leq\mathop{\mathrm{Pr}}\nolimits_{\mu}(E)\left(1+\frac{1}{\Delta}\right)^{\left|\Gamma(E)\right|}.

In particular, if ep(Δ+1)1ep(\Delta+1)\leq 1, for any event AiA_{i} where iSi\not\in S, by Corollary 27,

(15) Prμ(AiB(S))\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}(A_{i}\mid B(S)) pi(Δ+1Δ)Δep.\displaystyle\leq p_{i}\left(\frac{\Delta+1}{\Delta}\right)^{\Delta}\leq ep.

Let 𝖱𝖾𝗌t{\sf Res}_{t} be the resampling set of Algorithm 6 at round t1t\geq 1, and let 𝖡𝖺𝖽t{\sf Bad}_{t} be the set of bad events present at round tt. If Algorithm 6 has already stopped at round tt, then 𝖱𝖾𝗌t=𝖡𝖺𝖽t={\sf Res}_{t}={\sf Bad}_{t}=\emptyset. Furthermore, let 𝖡𝖺𝖽0=𝖱𝖾𝗌0=[m]{\sf Bad}_{0}={\sf Res}_{0}=[m] since in the first step all random variables are fresh.

Let C:=1pC:=1-p.

Lemma 28.

For any Δ2\Delta\geq 2, if 6epΔ216ep\Delta^{2}\leq 1 and 3erΔ13er\Delta\leq 1, then,

𝔼(|𝖱𝖾𝗌t+1|𝖱𝖾𝗌0,,𝖱𝖾𝗌t)C|𝖱𝖾𝗌t|.\displaystyle\mathop{\mathbb{{}E}}\nolimits\left(\left|{\sf Res}_{t+1}\right|\mid{\sf Res}_{0},\dots,{\sf Res}_{t}\right)\leq C\left|{\sf Res}_{t}\right|.
Proof.

Clearly for any Δ2\Delta\geq 2, the condition 6epΔ216ep\Delta^{2}\leq 1 implies that ep(Δ+1)1ep(\Delta+1)\leq 1. Therefore the prerequisite of Corollary 27 is met. Notice that, by Lemma 23, we have that

𝔼(|𝖱𝖾𝗌t+1|𝖱𝖾𝗌0,,𝖱𝖾𝗌t)=𝔼(|𝖱𝖾𝗌t+1|𝖱𝖾𝗌t).\displaystyle\mathop{\mathbb{{}E}}\nolimits\left(\left|{\sf Res}_{t+1}\right|\mid{\sf Res}_{0},\dots,{\sf Res}_{t}\right)=\mathop{\mathbb{{}E}}\nolimits\left(\left|{\sf Res}_{t+1}\right|\mid{\sf Res}_{t}\right).

We will show in the following that

𝔼(|𝖱𝖾𝗌t+1||the set of resampling events at round t is (exactly) 𝖱𝖾𝗌t)C|𝖱𝖾𝗌t|,\displaystyle\mathop{\mathbb{{}E}}\nolimits\big{(}\left|{\sf Res}_{t+1}\right|\bigm{|}\text{the set of resampling events at round $t$ is (exactly) ${\sf Res}_{t}$}\big{)}\leq C\left|{\sf Res}_{t}\right|,

where C=1pC=1-p. This implies the lemma.

Call a path i0,i1,,ii_{0},i_{1},\dots,i_{\ell} where 0\ell\geq 0 in the dependency graph GG bad if the following holds:

(1) i0𝖡𝖺𝖽t+1i_{0}\in{\sf Bad}_{t+1};

(2) the event Rik1ikR_{i_{k-1}i_{k}} holds for every 1k1\leq k\leq\ell;

(3) any iki_{k} (k[]k\in[\ell]) is not adjacent to iki_{k^{\prime}} unless k=k1k^{\prime}=k-1 or k+1k+1.

Indeed, paths having the third property are induced paths in GG. If i𝖱𝖾𝗌t+1i\in{\sf Res}_{t+1}, AiA_{i} must be added by Algorithm 5 during some iteration of the while loop. In the 0th iteration, all of 𝖡𝖺𝖽t+1{\sf Bad}_{t+1} are added. We claim that for any i𝖱𝖾𝗌t+1i\in{\sf Res}_{t+1} added in iteration 0\ell\geq 0 by Algorithm 5, there exists at least one bad path i0,i1,,ii_{0},i_{1},\dots,i_{\ell} with i=ii_{\ell}=i. We show the claim by induction on \ell.

  • The base case is that =0\ell=0, and thus i𝖡𝖺𝖽t+1i\in{\sf Bad}_{t+1}. The bad path is simply ii itself.

  • For the induction step 1\ell\geq 1, due to Algorithm 5, there must exist i1i_{\ell-1} adjacent to i=ii_{\ell}=i such that i1i_{\ell-1} has been marked “resampling” during iteration 1\ell-1, and Ri1iR_{i_{\ell-1}i_{\ell}} occurs. By the induction hypothesis, there exists a bad path i0,,i1i_{0},\dots,i_{\ell-1}. Since ii is not marked at iteration 1\ell-1, ii is not adjacent to any vertex that has been marked “resampling” up to iteration 2\ell-2. Thus ii_{\ell} is not adjacent to any iki_{k} where k2k\leq\ell-2, and the path i0,,i1,ii_{0},\dots,i_{\ell-1},i_{\ell} is bad.

We next turn to bounding the number of bad paths. It is straightforward to bound the size of 𝖡𝖺𝖽t+1Γ+(𝖱𝖾𝗌t){\sf Bad}_{t+1}\subseteq\Gamma^{+}({\sf Res}_{t}). If i𝖡𝖺𝖽t+1i\in{\sf Bad}_{t+1}, then there are two possibilities. The first scenario is that i𝖱𝖾𝗌ti\in{\sf Res}_{t} and then all of its random variables are fresh. In this case it occurs with probability pipp_{i}\leq p. Otherwise i𝖱𝖾𝗌ti\in\partial{\sf Res}_{t}. Recall that by Lemma 23, the distribution at round t+1t+1 is Prμ(B(𝖱𝖾𝗌te))\mathop{\mathrm{Pr}}\nolimits_{\mu}(\cdot\mid B({\sf Res}_{t}^{e})). By Corollary 27, for any i𝖱𝖾𝗌ti\in\partial{\sf Res}_{t},

Prμ(AiB(𝖱𝖾𝗌te))p(1+1Δ)Δep.\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}\left(A_{i}\mid B({\sf Res}_{t}^{e})\right)\leq p\left(1+\frac{1}{\Delta}\right)^{\Delta}\leq ep.

This implies that

𝔼(|𝖡𝖺𝖽t+1|the set of resampling events at round t is (exactly) 𝖱𝖾𝗌t)\displaystyle\mathop{\mathbb{{}E}}\nolimits\left(\left|{\sf Bad}_{t+1}\right|\mid\text{the set of resampling events at round $t$ is (exactly) ${\sf Res}_{t}$}\right)
(16) \displaystyle\leq p|𝖱𝖾𝗌t|+ep|𝖱𝖾𝗌t|p(1+eΔ)|𝖱𝖾𝗌t|.\displaystyle\;p\left|{\sf Res}_{t}\right|+ep\left|\partial{\sf Res}_{t}\right|\leq p(1+e\Delta)\left|{\sf Res}_{t}\right|.

Next we bound the size of |𝖱𝖾𝗌t+1𝖡𝖺𝖽t+1|\left|{\sf Res}_{t+1}\setminus{\sf Bad}_{t+1}\right|. Let P=i0,,iP=i_{0},\dots,i_{\ell} be an induced path; that is, for any k[]k\in[\ell], iki_{k} is not adjacent to iki_{k^{\prime}} unless k=k1k^{\prime}=k-1 or k+1k+1. Only induced paths are potentially bad. Moreover, PP contributes to |𝖱𝖾𝗌t+1𝖡𝖺𝖽t+1|\left|{\sf Res}_{t+1}\setminus{\sf Bad}_{t+1}\right| only if its length 1\ell\geq 1. Let DPD_{P} be the event that PP is bad. In other words, DP:=Ai0Ri0i1Ri1iD_{P}:=A_{i_{0}}\wedge R_{i_{0}i_{1}}\wedge\dots\wedge R_{i_{\ell-1}i_{\ell}}. By Lemma 23, we have that

Pr(P is bad at round t+1the set of resampling events at round t is 𝖱𝖾𝗌t)\displaystyle\;\mathop{\mathrm{Pr}}\nolimits(P\text{ is bad at round $t+1$}\mid\text{the set of resampling events at round $t$ is ${\sf Res}_{t}$})
(17) =\displaystyle= Prμ(DPB(𝖱𝖾𝗌te)),\displaystyle\;\mathop{\mathrm{Pr}}\nolimits_{\mu}(D_{P}\mid B({\sf Res}_{t}^{e})),

where we recall that we denote 𝖱𝖾𝗌te=[m]Γ+(𝖱𝖾𝗌t){\sf Res}_{t}^{e}=[m]\setminus\Gamma^{+}({\sf Res}_{t}). Applying Corollary 27 with S=𝖱𝖾𝗌teS={\sf Res}_{t}^{e}, we have that

(18) Prμ(DPB(𝖱𝖾𝗌te))Prμ(DP)(1+1Δ)|Γ(DP)|.\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}(D_{P}\mid B({\sf Res}_{t}^{e}))\leq\mathop{\mathrm{Pr}}\nolimits_{\mu}(D_{P})\left(1+\frac{1}{\Delta}\right)^{\left|\Gamma(D_{P})\right|}.

Note that Γ(Rikik+1)Γ+(Aik)\Gamma(R_{i_{k}i_{k+1}})\subseteq\Gamma^{+}(A_{i_{k}}) for all 0k10\leq k\leq\ell-1. By the definition of DPD_{P},

Γ(DP)\displaystyle\Gamma(D_{P}) Γ(Ai0)Γ(Ri0i1)Γ(Ri1i)\displaystyle\subseteq\Gamma(A_{i_{0}})\cup\Gamma(R_{i_{0}i_{1}})\cup\dots\cup\Gamma(R_{i_{\ell-1}i_{\ell}})
Γ+(Ai0)Γ+(Ai1)Γ+(Ai1),\displaystyle\subseteq\Gamma^{+}(A_{i_{0}})\cup\Gamma^{+}(A_{i_{1}})\cup\dots\cup\Gamma^{+}(A_{i_{\ell-1}}),

implying that

(19) |Γ(DP)|\displaystyle\left|\Gamma(D_{P})\right| (Δ+1),\displaystyle\leq\ell(\Delta+1),

as |Γ+(Aik)|Δ+1\left|\Gamma^{+}(A_{i_{k}})\right|\leq\Delta+1 for all 0k10\leq k\leq\ell-1.

We claim that Ai0A_{i_{0}} is independent from Rik1ikR_{i_{k-1}i_{k}} for any 2k2\leq k\leq\ell. This is because iki_{k} is not adjacent to i0i_{0} for any k2k\geq 2, implying that

𝗏𝖺𝗋(Rik1ik)𝗏𝖺𝗋(Ai0)\displaystyle{\sf var}(R_{i_{k-1}i_{k}})\cap{\sf var}(A_{i_{0}}) =𝗏𝖺𝗋(Aik1)𝗏𝖺𝗋(Aik)𝗏𝖺𝗋(Ai0)\displaystyle={\sf var}(A_{i_{k-1}})\cap{\sf var}(A_{i_{k}})\cap{\sf var}(A_{i_{0}})
𝗏𝖺𝗋(Aik)𝗏𝖺𝗋(Ai0)=.\displaystyle\subseteq{\sf var}(A_{i_{k}})\cap{\sf var}(A_{i_{0}})=\emptyset.

Moreover, any two events Rik1ikR_{i_{k-1}i_{k}} and Rik1ikR_{i_{k^{\prime}-1}i_{k^{\prime}}} are independent of each other as long as k<kk<k^{\prime}. This is also due to the third property of bad paths. Since k<kk<k^{\prime}, we see that |k(k1)|2\left|k^{\prime}-(k-1)\right|\geq 2 and iki_{k^{\prime}} is not adjacent to ik1i_{k-1}. It implies that

𝗏𝖺𝗋(Rik1ik)𝗏𝖺𝗋(Rik1ik)\displaystyle{\sf var}(R_{i_{k-1}i_{k}})\cap{\sf var}(R_{i_{k^{\prime}-1}i_{k^{\prime}}}) =𝗏𝖺𝗋(Aik1)𝗏𝖺𝗋(Aik)𝗏𝖺𝗋(Aik1)𝗏𝖺𝗋(Aik)\displaystyle={\sf var}(A_{i_{k-1}})\cap{\sf var}(A_{i_{k}})\cap{\sf var}(A_{i_{k^{\prime}-1}})\cap{\sf var}(A_{i_{k^{\prime}}})
𝗏𝖺𝗋(Aik1)𝗏𝖺𝗋(Aik)=.\displaystyle\subseteq{\sf var}(A_{i_{k-1}})\cap{\sf var}(A_{i_{k^{\prime}}})=\emptyset.

The consequence of these independences is

Prμ(DP)\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}(D_{P}) Prμ(Ai0Ri1i2Ri1i)\displaystyle\leq\mathop{\mathrm{Pr}}\nolimits_{\mu}(A_{i_{0}}\wedge R_{i_{1}i_{2}}\wedge\dots\wedge R_{i_{\ell-1}i_{\ell}})
=Prμ(Ai0)k=2Prμ(Rik1ik)\displaystyle=\mathop{\mathrm{Pr}}\nolimits_{\mu}(A_{i_{0}})\prod_{k=2}^{\ell}\mathop{\mathrm{Pr}}\nolimits_{\mu}(R_{i_{k-1}i_{k}})
(20) pr1.\displaystyle\leq pr^{\ell-1}.

Note that in the calculation above we ignore Ri0i1R_{i_{0}i_{1}} as it can be positively correlated to Ai0A_{i_{0}}.

Combining (17), (18), (19), and (20), we have that

Pr(DPthe set of resampling events at round t is (exactly) 𝖱𝖾𝗌t)\displaystyle\mathop{\mathrm{Pr}}\nolimits(D_{P}\mid\text{the set of resampling events at round $t$ is (exactly) ${\sf Res}_{t}$})
(21) \displaystyle\leq pr1(1+1Δ)(Δ+1)pr((1+1Δ)er).\displaystyle\;pr^{\ell-1}\left(1+\frac{1}{\Delta}\right)^{\ell(\Delta+1)}\leq\frac{p}{r}\left(\left(1+\frac{1}{\Delta}\right)er\right)^{\ell}.

In order to apply a union bound on all bad paths, we need to bound their number. The first vertex i0i_{0} must be in 𝖡𝖺𝖽t+1{\sf Bad}_{t+1}, implying that i0Γ+(𝖱𝖾𝗌t)i_{0}\in\Gamma^{+}({\sf Res}_{t}). Hence there are at most (Δ+1)|𝖱𝖾𝗌t|(\Delta+1)\left|{\sf Res}_{t}\right| choices. Then there are at most Δ\Delta choices of i1i_{1} and (Δ1)(\Delta-1) choices of every subsequent iki_{k} where k2k\geq 2. Hence, there are at most Δ(Δ1)1\Delta(\Delta-1)^{\ell-1} induced paths of length 1\ell\geq 1, originating from a particular i0Γ+(𝖱𝖾𝗌t)i_{0}\in\Gamma^{+}({\sf Res}_{t}). Thus, by a union bound on all potentially bad paths and (21),

𝔼(|𝖱𝖾𝗌t+1𝖡𝖺𝖽t+1||the set of resampling events at round t is (exactly) 𝖱𝖾𝗌t)\displaystyle\mathop{\mathbb{{}E}}\nolimits\big{(}\left|{\sf Res}_{t+1}\setminus{\sf Bad}_{t+1}\right|\bigm{|}\text{the set of resampling events at round $t$ is (exactly) ${\sf Res}_{t}$}\big{)}
\displaystyle\leq =1(Δ+1)|𝖱𝖾𝗌t|Δ(Δ1)1p/r((1+1Δ)er)\displaystyle\;\sum_{\ell=1}^{\infty}(\Delta+1)\left|{\sf Res}_{t}\right|\Delta(\Delta-1)^{\ell-1}p/r\left(\left(1+\frac{1}{\Delta}\right)er\right)^{\ell}
=\displaystyle= (Δ+1)Δp(Δ1)r|𝖱𝖾𝗌t|=1((Δ21Δ)er)\displaystyle\;\frac{(\Delta+1)\Delta p}{(\Delta-1)r}\left|{\sf Res}_{t}\right|\sum_{\ell=1}^{\infty}\left(\left(\frac{\Delta^{2}-1}{\Delta}\right)er\right)^{\ell}
\displaystyle\leq (Δ+1)Δp(Δ1)r|𝖱𝖾𝗌t|=1(erΔ)=(Δ+1)Δp(Δ1)rerΔ1erΔ|𝖱𝖾𝗌t|\displaystyle\;\frac{(\Delta+1)\Delta p}{(\Delta-1)r}\left|{\sf Res}_{t}\right|\sum_{\ell=1}^{\infty}\left(er\Delta\right)^{\ell}=\frac{(\Delta+1)\Delta p}{(\Delta-1)r}\cdot\frac{er\Delta}{1-er\Delta}\left|{\sf Res}_{t}\right|
(22) \displaystyle\leq Δ+1Δ132epΔ2|𝖱𝖾𝗌t|,\displaystyle\;\frac{\Delta+1}{\Delta-1}\cdot\frac{3}{2}\cdot ep\Delta^{2}\left|{\sf Res}_{t}\right|,

where we use the condition that erΔ1/3er\Delta\leq 1/3.

Combining (16) and (22), we have that

𝔼(|𝖱𝖾𝗌t+1|the set of resampling events at round t is (exactly) 𝖱𝖾𝗌t)\displaystyle\mathop{\mathbb{{}E}}\nolimits\left(\left|{\sf Res}_{t+1}\right|\mid\text{the set of resampling events at round $t$ is (exactly) ${\sf Res}_{t}$}\right)
\displaystyle\leq Δ+1Δ132epΔ2|𝖱𝖾𝗌t|+p(1+eΔ)|𝖱𝖾𝗌t|\displaystyle\;\frac{\Delta+1}{\Delta-1}\cdot\frac{3}{2}\cdot ep\Delta^{2}\left|{\sf Res}_{t}\right|+p(1+e\Delta)\left|{\sf Res}_{t}\right|
=\displaystyle= p(Δ+1Δ132eΔ2+(1+eΔ))|𝖱𝖾𝗌t|.\displaystyle\;p\left(\frac{\Delta+1}{\Delta-1}\cdot\frac{3}{2}\cdot e\Delta^{2}+(1+e\Delta)\right)\left|{\sf Res}_{t}\right|.

All that is left is to verify that

p(Δ+1Δ132eΔ2+(1+eΔ))C,\displaystyle p\left(\frac{\Delta+1}{\Delta-1}\cdot\frac{3}{2}\cdot e\Delta^{2}+(1+e\Delta)\right)\leq C,

where C=1pC=1-p. This is straightforward by the condition 6epΔ216ep\Delta^{2}\leq 1 and Δ2\Delta\geq 2, as

Cp(Δ+1Δ132eΔ2+(1+eΔ))\displaystyle C-p\left(\frac{\Delta+1}{\Delta-1}\cdot\frac{3}{2}\cdot e\Delta^{2}+(1+e\Delta)\right) 6epΔ2pp(Δ+1Δ132eΔ2+(1+eΔ))\displaystyle\geq 6ep\Delta^{2}-p-p\left(\frac{\Delta+1}{\Delta-1}\cdot\frac{3}{2}\cdot e\Delta^{2}+(1+e\Delta)\right)
p(6eΔ21Δ+1Δ132eΔ2(1+eΔ))0.\displaystyle\geq p\left(6e\Delta^{2}-1-\frac{\Delta+1}{\Delta-1}\cdot\frac{3}{2}\cdot e\Delta^{2}-(1+e\Delta)\right)\geq 0.\qed

For t1t\geq 1, by Lemma 28 and the law of iterated expectations,

𝔼|𝖱𝖾𝗌t|C𝔼|𝖱𝖾𝗌t1|.\mathop{\mathbb{{}E}}\nolimits\left|{\sf Res}_{t}\right|\leq C\mathop{\mathbb{{}E}}\nolimits\left|{\sf Res}_{t-1}\right|.

Thus, 𝔼|𝖱𝖾𝗌t|Ct|𝖱𝖾𝗌0|=Ctm\mathop{\mathbb{{}E}}\nolimits\left|{\sf Res}_{t}\right|\leq C^{t}\left|{\sf Res}_{0}\right|=C^{t}m. As C<1C<1, the expected number of resampling events is

t=0𝔼|𝖱𝖾𝗌t|t=0Ctm=11Cm.\displaystyle\sum_{t=0}^{\infty}\mathop{\mathbb{{}E}}\nolimits\left|{\sf Res}_{t}\right|\leq\sum_{t=0}^{\infty}C^{t}m=\frac{1}{1-C}\cdot m.

This implies the first part of Theorem 25. For the second part, just observe that after O(logm)O(\log m) rounds, the expected number of bad events is less than mcm^{-c} for any constant cc, and Markov's inequality applies.

The first condition of Theorem 25 requires pp to be roughly O(Δ2)O(\Delta^{-2}). This is necessary, due to the hardness result in [4] (see also Theorem 32). Moreover, in the analysis it is possible that all of 𝖡𝖺𝖽t\partial{\sf Bad}_{t} is added into 𝖱𝖾𝗌t{\sf Res}_{t}. Consider a monotone CNF formula: if a clause is unsatisfied, then all of its neighbours need to be added into the resampling set. Such behaviour eventually leads to the O(Δ2)O(\Delta^{-2}) bound. This is in contrast to the resampling algorithm of Moser and Tardos [31], which only requires p=O(Δ1)p=O(\Delta^{-1}), as in the symmetric Lovász Local Lemma.

Also, we note that monotone CNF formulas, in which all correlations are positive, seem to be the worst instances for our algorithms. In particular, Algorithm 6 is exponentially slow when the underlying hypergraph of the monotone CNF is a (hyper-)tree. This indicates that our condition on rr in Theorem 25 is necessary for Algorithm 6. In contrast, Hermon et al. [24] show that on a linear hypergraph (including the hypertree), the Markov chain mixes rapidly for degrees higher than the general bound. It is unclear how to combine the advantages from these two approaches.

7. Applications of Algorithm 6

7.1. kk-CNF Formulas

Consider a kk-CNF formula where every variable appears in at most dd clauses. Then Theorem 1 says that if d2k/(ek)+1d\leq 2^{k}/(ek)+1, then there exists a satisfying assignment. However, as mentioned in Section 4.3, [4, Corollary 30] showed that when d52k/2d\geq 5\cdot 2^{k/2}, then sampling satisfying assignments is NP-hard, even restricted to monotone formulas.

To apply Algorithm 6 in this setting, we need to bound the parameter rr in Theorem 25. A natural way is to lower bound the number of shared variables between any two dependent clauses. If this lower bound is ss, then r2sr\leq 2^{-s}, since for any two dependent clauses there is a unique assignment of their shared variables that can be extended so as to falsify a given one of them.

Definition 29.

Let d2d\geq 2 and s1s\geq 1. A kk-CNF formula is said to have degree dd if every variable appears in at most dd clauses. Moreover, it has intersection ss if for any two clauses CiC_{i} and CjC_{j} that share at least one variable, |𝗏𝖺𝗋(Ci)𝗏𝖺𝗋(Cj)|s\left|{\sf var}(C_{i})\cap{\sf var}(C_{j})\right|\geq s.

Note that by the definition if k<sk<s then the formula is simply isolated clauses. Otherwise, ksk\geq s and we have that pi=p=2kp_{i}=p=2^{-k} and r2sr\leq 2^{-s}. A simple double counting argument indicates that the maximum degree Δ\Delta in the dependency graph satisfies Δdks\Delta\leq\frac{dk}{s}.
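
Before verifying the conditions analytically, it may help to see them numerically; the small helper below (ours, not part of the analysis) plugs the worst-case bounds p = 2^{-k}, r = 2^{-s} and Δ = dk/s into the two conditions of Theorem 25.

```python
from math import e

def theorem25_holds(k, d, s):
    """Check 6*e*p*Delta^2 <= 1 and 3*e*r*Delta <= 1 with the bounds
    p = 2**-k, r = 2**-s and Delta = d*k/s used in the text."""
    p, r, delta = 2.0 ** -k, 2.0 ** -s, d * k / s
    return 6 * e * p * delta ** 2 <= 1 and 3 * e * r * delta <= 1

# e.g. k = 20, d = 60, s = 10 = k/2 satisfies both conditions
print(theorem25_holds(20, 60, 10))   # True
```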

We claim that for integers dd and kk such that d3d\geq 3 and dk23edk\geq 2^{3e}, conditions d2k/26ed\leq\frac{2^{k/2}}{6e} and smin{log2dk,k/2}s\geq\min\{\log_{2}dk,k/2\} imply the conditions of Theorem 25, namely, 6epΔ216ep\Delta^{2}\leq 1 and 3erΔ13er\Delta\leq 1. In fact, if slog2dklog2ds\geq\log_{2}dk\geq\log_{2}d, then

6epΔ26e2k(dks)26e2k(dklog2d)26e(k6e(k/2log26e))2<1,\displaystyle 6ep\Delta^{2}\leq 6e2^{-k}\left(\frac{dk}{s}\right)^{2}\leq 6e2^{-k}\left(\frac{dk}{\log_{2}d}\right)^{2}\leq 6e\left(\frac{k}{6e\left(k/2-\log_{2}6e\right)}\right)^{2}<1,

as dlog2d\frac{d}{\log_{2}d} is increasing for any d3d\geq 3. Moreover,

3erΔ3edk2ss3elog2(dk)1.\displaystyle 3er\Delta\leq\frac{3edk}{2^{s}s}\leq\frac{3e}{\log_{2}(dk)}\leq 1.

Otherwise k/2slog2dkk/2\leq s\leq\log_{2}dk, which implies that

6epΔ26e2k(dks)26e2k(dkk/2)26e2k(2k/23e)2<1,\displaystyle 6ep\Delta^{2}\leq 6e2^{-k}\left(\frac{dk}{s}\right)^{2}\leq 6e2^{-k}\left(\frac{dk}{k/2}\right)^{2}\leq 6e2^{-k}\left(\frac{2^{k/2}}{3e}\right)^{2}<1,

and

3erΔ3edk2ss6edkk2k/21.\displaystyle 3er\Delta\leq\frac{3edk}{2^{s}s}\leq\frac{6edk}{k2^{k/2}}\leq 1.

Thus by Theorem 25 we have the following result. Note that resampling a clause involves at most kk variables, and for kk-CNF formulas with degree dd, the number of clauses is linear in the number of variables.

Corollary 30.

For integers dd and kk such that d3d\geq 3 and dk23edk\geq 2^{3e}, if d16e2k/2d\leq\frac{1}{6e}\cdot 2^{k/2} and smin{log2dk,k/2}s\geq\min\{\log_{2}dk,k/2\}, then Algorithm 6 samples satisfying assignments of kk-CNF formulas with degree dd and intersection ss in O(n)O(n) time in expectation and in O(nlogn)O(n\log n) time with high probability, where nn is the number of variables.

We remark that the lower bound on intersection size ss in Corollary 30 does not make the problem trivial. Note that the lower bound min{log2dk,k/2}\min\{\log_{2}dk,k/2\} is at most k/2k/2. The “hard” instance in the proof of [4, Corollary 30] has roughly k/2k/2 shared variables for each pair of dependent clauses. For completeness, we will show that if kk is even, and d42k/2d\geq 4\cdot 2^{k/2} and s=k/2s=k/2, then the sampling problem is NP-hard. The proof is almost identical to that of [4, Corollary 30]. The case of odd kk can be similarly handled but with larger constants.

We will use the inapproximability result of Sly and Sun [36] (or equivalently, of Galanis et al. [12]) for the hard-core model. We first remind the reader of the relevant definitions. Let λ>0\lambda>0. For a graph G=(V,E)G=(V,E), the hard-core model with parameter λ>0\lambda>0 is a probability distribution over the set of independent sets of GG; each independent set II of GG has weight proportional to λ|I|\lambda^{|I|}. The normalizing factor of this distribution is the partition function ZG(λ)Z_{G}(\lambda), formally defined as ZG(λ):=Iλ|I|Z_{G}(\lambda):=\sum_{I}\lambda^{|I|} where the sum ranges over all independent sets II of GG. The hardness result we are going to use is about approximating ZG(λ)Z_{G}(\lambda), but it is standard to translate it into the sampling setting as the problem is self-reducible.

Theorem 31 ([36, 12]).

For d3d\geq 3, let λc(d):=(d1)d1/(d2)d\lambda_{c}(d):=(d-1)^{d-1}/(d-2)^{d}. For all λ>λc(d)\lambda>\lambda_{c}(d), it is NP-hard to sample an independent set II with probability proportional to λ|I|\lambda^{|I|} in a dd-regular graph.

Theorem 32.

Let kk be an even integer. If d42k/2d\geq 4\cdot 2^{k/2} and s=k/2s=k/2, then it is NP-hard to sample satisfying assignments of kk-CNF formulas with degree dd and intersection ss uniformly at random.

Proof.

Given a dd-regular graph G=(V,E)G=(V,E), we will construct a monotone kk-CNF formula CC with degree dd and intersection k/2k/2 such that satisfying assignments of CC can be mapped to independent sets of GG. Replace each vertex vVv\in V by ss variables, say v1,,vsv_{1},\dots,v_{s}. If (u,v)E(u,v)\in E, then create a monotone clause v1vsu1usv_{1}\vee\dots\vee v_{s}\vee u_{1}\vee\dots\vee u_{s}. It is easy to see that every variable appears exactly dd times since GG is dd-regular. Moreover, the number of shared variables is always ss and the clause size is 2s=k2s=k.

For each satisfying assignment, we map it to a subset of vertices of GG. If all of v1,,vsv_{1},\dots,v_{s} are false, then make vv occupied. Otherwise vv is unoccupied. Thus a satisfying assignment is mapped to an independent set of GG. Moreover, there are (2k/21)n|I|(2^{k/2}-1)^{n-\left|I\right|} satisfying assignments corresponding to an independent set II, where nn is the number of vertices in GG. Thus the weight of II is proportional to (2k/21)|I|(2^{k/2}-1)^{-\left|I\right|}; namely λ=(2k/21)1\lambda=(2^{k/2}-1)^{-1} in the hard-core model.
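
The construction in this proof is easy to write out explicitly. The sketch below uses our own encoding (vertex v gets variables sv+1, …, sv+s); the K_4 example is for illustration only, since the hardness regime needs much larger degree.

```python
def hardcore_to_monotone_cnf(edges, s):
    """Reduction from the proof of Theorem 32: vertex v gets the variable
    block s*v+1, ..., s*v+s, and each edge (u, v) becomes one monotone
    clause containing all 2s variables of u and v (all literals positive)."""
    block = lambda v: [s * v + i for i in range(1, s + 1)]
    return [block(u) + block(v) for (u, v) in edges]

# K_4 (3-regular) with s = 2: a monotone 4-CNF of degree 3 and intersection 2.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(hardcore_to_monotone_cnf(k4, 2))
```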

In order to apply Theorem 31, all we need to do is to verify that λ>λc\lambda>\lambda_{c}, or equivalently

2k/21<(d2)d(d1)d1.\displaystyle 2^{k/2}-1<\frac{(d-2)^{d}}{(d-1)^{d-1}}.

This can be done as follows,

(d2)d(d1)d1\displaystyle\frac{(d-2)^{d}}{(d-1)^{d-1}} =(d2)(11d1)d1(45)5(d2)>2k/21.\displaystyle=(d-2)\Big{(}1-\frac{1}{d-1}\Big{)}^{d-1}\geq\bigg{(}\frac{4}{5}\bigg{)}^{5}(d-2)>2^{k/2}-1.\qed

Due to Theorem 32, we see that the dependence between kk and dd in Corollary 30 is tight in the exponent, even with the further assumption on intersection ss.

7.2. Independent Sets

We may also apply Algorithm 6 to sample hard-core configurations with parameter λ\lambda. Every vertex is associated with a random variable which is occupied with probability λ1+λ\frac{\lambda}{1+\lambda}. In this case, each edge defines a bad event which holds if both endpoints are occupied. Thus p=(λ1+λ)2p=\left(\frac{\lambda}{1+\lambda}\right)^{2}. Algorithm 6 is specialized to Algorithm 7.

(1) Mark each vertex occupied with probability λ1+λ\frac{\lambda}{1+\lambda} independently.

(2) While there is at least one edge with both endpoints occupied, resample all occupied components of size at least 22 and their boundaries.

(3) Output the set of occupied vertices.

Algorithm 7 Sample Hard-core Configurations

To see this, consider a graph G=(V,E)G=(V,E) with maximum degree dd. Given a configuration σ:V{0,1}\sigma:V\rightarrow\{0,1\}, consider the subgraph G[σ]G[\sigma] of GG induced by the vertex subset {vV:σ(v)=1}\{v\in V:\sigma(v)=1\}. Then we denote by 𝖡𝖺𝖽𝖵𝗍𝗑(σ){\sf BadVtx}(\sigma) the set of vertices in any component of G[σ]G[\sigma] of size at least 22. Then the output of Algorithm 5 is

𝖱𝖾𝗌𝖵𝗍𝗑(σ):=𝖡𝖺𝖽𝖵𝗍𝗑(σ)𝖡𝖺𝖽𝖵𝗍𝗑(σ).\displaystyle{\sf ResVtx}(\sigma):={\sf BadVtx}(\sigma)\cup\partial{\sf BadVtx}(\sigma).

This is because first, all of 𝖡𝖺𝖽𝖵𝗍𝗑(σ)\partial{\sf BadVtx}(\sigma) will be resampled, since any of them has at least one occupied neighbour in 𝖡𝖺𝖽𝖵𝗍𝗑(σ){\sf BadVtx}(\sigma). Secondly, v𝖡𝖺𝖽𝖵𝗍𝗑(σ)v\in\partial{\sf BadVtx}(\sigma) is unoccupied (otherwise v𝖡𝖺𝖽𝖵𝗍𝗑(σ)v\in{\sf BadVtx}(\sigma)), and Algorithm 5 stops after adding all of 𝖡𝖺𝖽𝖵𝗍𝗑(σ)\partial{\sf BadVtx}(\sigma). This explains Algorithm 7.
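
A minimal sketch of Algorithm 7 along these lines (plain Python; adj maps each vertex to its set of neighbours, and the helper names are ours):

```python
import random

def sample_hardcore(adj, lam):
    """Algorithm 7 (sketch): exact sampling from the hard-core distribution
    with parameter lam on the graph given by adjacency sets adj."""
    occupy = lambda: random.random() < lam / (1 + lam)
    sigma = {v: occupy() for v in adj}                             # step (1)
    while True:
        # BadVtx: occupied vertices lying in an occupied component of size >= 2
        bad = {v for v in adj if sigma[v] and any(sigma[u] for u in adj[v])}
        if not bad:                                                # step (3)
            return {v for v in adj if sigma[v]}
        res = bad | {u for v in bad for u in adj[v]}               # BadVtx and its boundary
        for v in res:                                              # step (2)
            sigma[v] = occupy()

# Example: a hard-core sample on a 4-cycle with lambda = 0.2.
print(sample_hardcore({0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}, 0.2))
```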

Moreover, let 𝖡𝖺𝖽(σ){\sf Bad}(\sigma) be the set of edges whose both endpoints are occupied under σ\sigma. Let 𝖱𝖾𝗌(σ){\sf Res}(\sigma) be the set of edges whose both endpoints are in 𝖱𝖾𝗌𝖵𝗍𝗑(σ){\sf ResVtx}(\sigma). Let σt\sigma_{t} be the random configuration of Algorithm 7 at round tt if it has not halted, and 𝖡𝖺𝖽t=𝖡𝖺𝖽(σt){\sf Bad}_{t}={\sf Bad}(\sigma_{t}), 𝖱𝖾𝗌t=𝖱𝖾𝗌(σt){\sf Res}_{t}={\sf Res}(\sigma_{t}).

Lemma 33.

If ep(2d1)<1ep(2d-1)<1, then 𝔼|𝖡𝖺𝖽t+1|(4ed21)p𝔼|𝖡𝖺𝖽t|\mathop{\mathbb{{}E}}\nolimits\left|{\sf Bad}_{t+1}\right|\leq(4ed^{2}-1)p\mathop{\mathbb{{}E}}\nolimits\left|{\sf Bad}_{t}\right|.

Proof.

First note that the dependency graph is the line graph of GG and Δ=2d2\Delta=2d-2 is the maximum degree of this line graph. Thus ep(2d1)<1ep(2d-1)<1 guarantees that the prerequisite of Corollary 27 is met. Moreover, for any σ\sigma, |𝖱𝖾𝗌(σ)|(2d1)|𝖡𝖺𝖽(σ)|\left|{\sf Res}(\sigma)\right|\leq(2d-1)\left|{\sf Bad}(\sigma)\right| and |𝖱𝖾𝗌(σ)|(2d2)|𝖱𝖾𝗌(σ)|\left|\partial{\sf Res}(\sigma)\right|\leq(2d-2)\left|{\sf Res}(\sigma)\right|. Similarly to the analysis in Lemma 28, conditioned on a fixed 𝖡𝖺𝖽t{\sf Bad}_{t}, by Corollary 27 (or (15) in particular), we have that

𝔼|𝖡𝖺𝖽t+1|\displaystyle\mathop{\mathbb{{}E}}\nolimits\left|{\sf Bad}_{t+1}\right| p|𝖱𝖾𝗌(σ)|+ep|𝖱𝖾𝗌(σ)|\displaystyle\leq p\left|{\sf Res}(\sigma)\right|+ep\left|\partial{\sf Res}(\sigma)\right|
(p(2d1)+ep(2d2)(2d1))|𝖡𝖺𝖽t|\displaystyle\leq\left(p(2d-1)+ep(2d-2)(2d-1)\right)\left|{\sf Bad}_{t}\right|
<(4ed21)p|𝖡𝖺𝖽t|.\displaystyle<(4ed^{2}-1)p\left|{\sf Bad}_{t}\right|.

Since the inequality above holds for any 𝖡𝖺𝖽t{\sf Bad}_{t}, the lemma follows. ∎

Lemma 33 implies that, if 4epd214epd^{2}\leq 1, then the number of bad edges shrinks with a constant factor, and Algorithm 7 resamples O(m)O(m) edges in expectation and O(mlogm)O(m\log m) edges with high probability, where m=|E|m=\left|E\right|. A bounded degree graph is sparse and thus m=O(n)m=O(n), where nn is the number of vertices. Since p=(λ1+λ)2p=\left(\frac{\lambda}{1+\lambda}\right)^{2}, the condition 4epd214epd^{2}\leq 1 is equivalent to

λ12ed1.\displaystyle\lambda\leq\frac{1}{2\sqrt{e}d-1}.

Thus we have the following theorem, where the constants are slightly better than directly applying Theorem 25.

Theorem 34.

If λ12ed1\lambda\leq\frac{1}{2\sqrt{e}d-1}, then Algorithm 7 draws a uniform hard-core configuration with parameter λ\lambda from a graph with maximum degree dd in O(n)O(n) time in expectation and in O(nlogn)O(n\log n) time with high probability, where nn is the number of vertices.

The optimal bound of sampling hard-core configurations is λ<λced\lambda<\lambda_{c}\approx\frac{e}{d} where λc\lambda_{c} is defined in Theorem 31. The algorithm is due to Weitz [38] and the hardness is shown in [36, 12]. The condition of our Theorem 34 is more restricted than correlation decay based algorithms [38] or traditional Markov chain based algorithms. Nevertheless, our algorithm matches the correct order of magnitude λ=O(d1)\lambda=O(d^{-1}). Moreover, our algorithm has the advantage of being simple, exact, and running in linear time in expectation.

8. Distributed algorithms for sampling

An interesting feature of Algorithm 6 is that it is distributed.444See [11] for a very recent work by Feng, Sun, and Yin on distributed sampling algorithms. In particular, they show a similar lower bound in [11, Section 5]. For concreteness, consider the algorithm applied to sampling hard-core configurations on a graph GG (i.e. Algorithm 7), assumed to be of bounded maximum degree. Imagine that each vertex is assigned a processor that has access to a source of random bits. Communication is possible between adjacent processors and is assumed to take constant time. This is essentially Linial’s LOCAL model [27]. Then, in each parallel round of the algorithm, the processor at vertex vv can update the value σ(v)\sigma(v) in constant time, as this requires access only to the values of σ(u)\sigma(u) for vertices uV(G)u\in V(G) within a bounded distance rr of vv. In the case of the hard-core model, we have r=2r=2, since the value σ(v)\sigma(v) at vertex vv should be updated precisely if there are vertices uu and uu^{\prime} such that vuv\sim u and uuu\sim u^{\prime} and σ(u)=σ(u)=1\sigma(u)=\sigma(u^{\prime})=1. Note that we allow u=vu^{\prime}=v here.

In certain applications, including the hard-core model, Algorithm 6 runs in a number of rounds that is bounded by a logarithmic function of the input size with high probability. (Recall Theorem 25.) We show that this is optimal. (Although the argument is presented in the context of the hard-core model, it ought to generalise to many other applications.)

Set L=clognL=\lceil c\log n\rceil for some constant c>0c>0 to be chosen later. The instance that establishes the lower bound is a graph GG consisting of a collection of n/Ln/L disjoint paths Π1,,Πn/L\Pi_{1},\ldots,\Pi_{n/L} with LL vertices each. (Assume that nn is an exact multiple of LL; this is not a significant restriction.) The high-level idea behind the lower bound is simple, and consists of two observations. We assume first that the distributed algorithm we are considering always produces an output, say σ^:V(G){0,1}\hat{\sigma}:V(G)\to\{0,1\}, within tt rounds. It will be easy at the end to extend the argument to the situation where the running time is a possibly unbounded random variable with bounded expectation.

Focus attention on a particular path Π\Pi with endpoints uu and vv. The first observation is that if rt<L/2rt<L/2 then σ(u)\sigma(u) (respectively, σ(v)\sigma(v)) depends only on the computations performed by processors in the half of Π\Pi containing uu (respectively vv). Therefore, in the algorithm’s output, σ^(u)\hat{\sigma}(u) and σ^(v)\hat{\sigma}(v) are probabilistically independent. The second observation is that if the constant cc is sufficiently small then, in the hard-core distribution, σ(u)\sigma(u) and σ(v)\sigma(v) are significantly correlated. Since the algorithm operates independently on each of the n/Ln/L paths, these small but significant correlations combine to force to a large variation distance between the hard-core distribution and the output distribution of the algorithm.

We now quantify the second observation. Let σ:V(G){0,1}\sigma:V(G)\to\{0,1\} be a sample from the hard-core distribution on a path Π\Pi on kk vertices with endpoints uu and vv, and let Ik=ZΠ(λ)I_{k}=Z_{\Pi}(\lambda) denote the corresponding hard-core partition function (weighted sum over independent sets). Define the matrix Wk=(w00w01w10w11)W_{k}=\big{(}\begin{smallmatrix}w_{00}&w_{01}\\ w_{10}&w_{11}\end{smallmatrix}\big{)}, where wij=Pr(σ(u)=iσ(v)=j)w_{ij}=\mathop{\mathrm{Pr}}\nolimits(\sigma(u)=i\wedge\sigma(v)=j). Then

Wk=1Ik(Ik2λIk3λIk3λ2Ik4),W_{k}=\frac{1}{I_{k}}\begin{pmatrix}I_{k-2}&\lambda I_{k-3}\\ \lambda I_{k-3}&\lambda^{2}I_{k-4}\end{pmatrix},

since IkI_{k} is the total weight of independent sets in Π\Pi, Ik2I_{k-2} is the total weight of independent sets with σ(u)=σ(v)=0\sigma(u)=\sigma(v)=0, Ik3I_{k-3} is the total weight of independent sets with σ(u)=0\sigma(u)=0 and σ(v)=1\sigma(v)=1, and so on. Also note that IkI_{k} satisfies the recurrence

(23) I0=1,I1=λ+1,andIk=Ik1+λIk2,for k2.I_{0}=1,\quad I_{1}=\lambda+1,\quad\text{and}\quad I_{k}=I_{k-1}+\lambda I_{k-2},\>\text{for $k\geq 2$}.

We will use detWk\det W_{k} to measure the deviation of the distribution of (σ(u),σ(v))(\sigma(u),\sigma(v)) from a product distribution. Write

Wk=(Ik2Ik3Ik3Ik4),W_{k}^{\prime}=\begin{pmatrix}I_{k-2}&I_{k-3}\\ I_{k-3}&I_{k-4}\end{pmatrix},

and note that detWk=λ2Ik2detWk\det W_{k}=\lambda^{2}I_{k}^{-2}\det W_{k}^{\prime}. Applying recurrence (23) once to each of the four entries of WkW_{k}^{\prime}, we have

\det W_{k}^{\prime} = I_{k-2}I_{k-4}-I_{k-3}^{2}
= (I_{k-3}+\lambda I_{k-4})(I_{k-5}+\lambda I_{k-6})-(I_{k-4}+\lambda I_{k-5})^{2}
= I_{k-3}(I_{k-5}+\lambda I_{k-6})-I_{k-4}(I_{k-4}+\lambda I_{k-5})+\lambda^{2}(I_{k-4}I_{k-6}-I_{k-5}^{2})
= I_{k-3}I_{k-4}-I_{k-4}I_{k-3}+\lambda^{2}\det W_{k-2}^{\prime}
= \lambda^{2}\det W_{k-2}^{\prime},

for all $k\geq 6$. By direct calculation, $\det W_{4}^{\prime}=-\lambda^{2}$ and $\det W_{5}^{\prime}=\lambda^{3}$. Hence, by induction, $\det W_{k}^{\prime}=(-1)^{k-1}\lambda^{k-2}$, and

(24)   \det W_{k}=\frac{(-1)^{k-1}\lambda^{k}}{I_{k}^{2}},

for all $k\geq 4$.
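Identity (24) is easy to confirm numerically. The following sketch (exact rational arithmetic; the helper names are ours) computes $I_{k}$ from recurrence (23) and the matrix $W_{k}$ by brute-force enumeration of the independent sets of a $k$-vertex path, and checks the two against each other:

from fractions import Fraction
from itertools import product

def I(k, lam):
    # Hard-core partition function of a path on k vertices via recurrence (23).
    if k == 0:
        return Fraction(1)
    a, b = Fraction(1), Fraction(1) + lam      # I_0, I_1
    for _ in range(k - 1):
        a, b = b, b + lam * a                  # advance (I_j, I_{j+1}) to (I_{j+1}, I_{j+2})
    return b

def brute_force_W(k, lam):
    # Joint law of (sigma(u), sigma(v)) at the endpoints of a k-vertex path,
    # obtained by summing lam^{|S|} over all independent sets S.
    w = [[Fraction(0)] * 2 for _ in range(2)]
    for sigma in product([0, 1], repeat=k):
        if any(sigma[i] and sigma[i + 1] for i in range(k - 1)):
            continue                           # adjacent occupied pair: not independent
        w[sigma[0]][sigma[-1]] += lam ** sum(sigma)
    Z = sum(sum(row) for row in w)
    return [[x / Z for x in row] for row in w], Z

lam = Fraction(1)
for k in range(4, 12):
    W, Z = brute_force_W(k, lam)
    det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
    assert Z == I(k, lam)
    assert det == (-1) ** (k - 1) * lam ** k / I(k, lam) ** 2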

Solving the recurrence (23) gives the following formula for $I_{k}$:

I_{k}=A_{\lambda}\left(\frac{1+\sqrt{4\lambda+1}}{2}\right)^{k}+B_{\lambda}\left(\frac{1-\sqrt{4\lambda+1}}{2}\right)^{k},

where

A_{\lambda}=\frac{1}{2}+\frac{2\lambda+1}{2\sqrt{4\lambda+1}}\quad\text{and}\quad B_{\lambda}=\frac{1}{2}-\frac{2\lambda+1}{2\sqrt{4\lambda+1}}.

Asymptotically,

I_{k}=(1+o(1))\,A_{\lambda}\left(\frac{1+\sqrt{4\lambda+1}}{2}\right)^{k}.

Substituting this estimate into (24) yields $|\det W_{k}|=(1+o(1))A_{\lambda}^{-2}\alpha^{k}$, where

\alpha=\frac{2\lambda}{2\lambda+\sqrt{4\lambda+1}+1}.

Note that $0<\alpha<1$ and $\alpha$ depends only on $\lambda$.
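As a worked instance, take $\lambda=1$: then $I_{k}=F_{k+2}$ is a Fibonacci number (with $F_{1}=F_{2}=1$), $A_{1}=\frac{1}{2}+\frac{3}{2\sqrt{5}}$, and $\alpha=2/(3+\sqrt{5})=(3-\sqrt{5})/2\approx 0.382$. Thus $|\det W_{k}|$ decays geometrically at rate about $0.382$ per vertex and, ignoring the constant factor $A_{1}^{-2}$ and taking logarithms to base $e$, it remains above $n^{-1/3}$ as long as $k\leq c\log n$ with $c<1/(3\ln(1/\alpha))\approx 0.35$.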

Now let the matrix $\widehat{W}_{k}=\big(\begin{smallmatrix}\widehat{w}_{00}&\widehat{w}_{01}\\ \widehat{w}_{10}&\widehat{w}_{11}\end{smallmatrix}\big)$ be defined as for $W_{k}$, but with respect to the output distribution of the distributed sampling algorithm rather than the true hard-core distribution. Recall that we choose $L=\lceil c\log n\rceil>2rt$, which implies that $\hat{\sigma}(u)$ and $\hat{\sigma}(v)$ are independent and $\det\widehat{W}_{L}=0$. It is easy to check that if $\|\widehat{W}_{k}-W_{k}\|_{\infty}\leq\varepsilon$, where the matrix norm is entrywise, then $|\det W_{k}|\leq 2\varepsilon$ (expand $\det W_{k}-\det\widehat{W}_{k}$ and use the fact that the entries of each matrix sum to $1$). Since $|\det W_{L}|=(1+o(1))A_{\lambda}^{-2}\alpha^{L}$, choosing the constant $c$ sufficiently small (recall $L=\lceil c\log n\rceil$) ensures that $|\det W_{L}|\geq 2n^{-1/3}$, and hence that $\|\widehat{W}_{L}-W_{L}\|_{\infty}\geq n^{-1/3}$. Thus $|\widehat{w}_{ij}-w_{ij}|\geq n^{-1/3}$, for some $i,j$; for definiteness, suppose that $i=j=0$ and that $\widehat{w}_{00}>w_{00}$.

Let $Z$ (respectively, $\widehat{Z}$) be the number of paths whose endpoints are both assigned 0 in the hard-core distribution (respectively, the algorithm's output distribution). Then $Z$ (respectively, $\widehat{Z}$) is a binomial random variable with expectation $\mu=w_{00}n/L$ (respectively, $\hat{\mu}=\widehat{w}_{00}n/L$). Since $|\mathbb{E}Z-\mathbb{E}\widehat{Z}|=\Omega(n^{2/3}/\log n)$, which far exceeds the standard deviations of $Z$ and $\widehat{Z}$ (both $O(\sqrt{n/L})$), a Chernoff bound gives that $\Pr(Z\geq(\mu+\hat{\mu})/2)$ and $\Pr(\widehat{Z}\leq(\mu+\hat{\mu})/2)$ are both $\exp(-\Omega(n^{1/3}/\log n))$. It follows that the variation distance between the distributions of $\sigma$ and $\hat{\sigma}$ is $1-o(1)$.
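The distinguishing statistic just described is simple enough to state as a procedure. The following sketch (illustrative names; the threshold $(\mu+\hat{\mu})/2$ is available to the analysis, not to the sampler being tested) counts the paths whose two endpoints are both unoccupied and compares the count with the threshold:

def count_both_endpoints_empty(paths):
    # paths: list of per-path configurations, each a list of 0/1 values.
    return sum(1 for p in paths if p[0] == 0 and p[-1] == 0)

def distinguish(paths, threshold):
    # In the case treated in the text (w_hat_00 > w_00), the algorithm's output
    # concentrates above the threshold and a true hard-core sample concentrates
    # below it, so this guess errs with probability o(1) under either distribution.
    return "algorithm" if count_both_endpoints_empty(paths) >= threshold else "hard-core"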

The above argument assumes an absolute bound on the running time, whereas the running time of an exact sampling algorithm will in general be a random variable $T$. To bridge the gap, suppose $\Pr(T\leq t)\geq\frac{2}{3}$. Then

\|\hat{\sigma}-\sigma\|_{\mathrm{TV}} = \max_{A}\bigl|\Pr(\hat{\sigma}\in A)-\Pr(\sigma\in A)\bigr|
= \max_{A}\Bigl|\bigl(\Pr(\hat{\sigma}\in A\mid T\leq t)-\Pr(\sigma\in A)\bigr)\Pr(T\leq t)
\qquad\qquad+\bigl(\Pr(\hat{\sigma}\in A\mid T>t)-\Pr(\sigma\in A)\bigr)\Pr(T>t)\Bigr|
\geq \tfrac{2}{3}(1-o(1))-\tfrac{1}{3}\times 1,

where $\|\cdot\|_{\mathrm{TV}}$ denotes variation distance, and $A$ ranges over events $A\subseteq\{0,1\}^{|V(G)|}$. Thus $\|\sigma-\hat{\sigma}\|_{\mathrm{TV}}\geq\frac{1}{3}-o(1)$, which contradicts the assumption that the algorithm samples from the hard-core distribution (exactly, or even approximately). It follows that $\Pr(T\leq t)<\frac{2}{3}$, and hence $\mathbb{E}(T)\geq t\Pr(T>t)\geq\frac{1}{3}t$. Note that this argument places a lower bound on parallel time not just for exact samplers, but even for (very) approximate ones.

With only a slight increase in work, one could take the instance $G$ to be a path of length $n$, which might be considered more natural. Identify $O(n/L)$ subpaths within $G$, suitably spaced, and of length $L$. The only complication is that the hard-core distribution does not have independent marginals on distinct subpaths. However, by ensuring that the subpaths are separated by distance $n^{\delta}$, for some small constant $\delta>0$, the correlations can be controlled, and the argument proceeds, with only slight modification, as before.

Acknowledgements

We would like to thank Yumeng Zhang for pointing out a factor $k$ saving in Corollary 30. We thank Dimitris Achlioptas, Fotis Iliopoulos, Pinyan Lu, Alistair Sinclair, and Yitong Yin for their helpful comments. We also thank anonymous reviewers for their detailed comments.

HG and MJ are supported by the EPSRC grant EP/N004221/1. JL is supported by NSF grant CCF-1420934. This work was done (in part) while the authors were visiting the Simons Institute for the Theory of Computing. HG was also supported by a Google research fellowship in the Simons Institute.

References

  • [1] Dimitris Achlioptas and Fotis Iliopoulos. Random walks that find perfect objects and the Lovász Local Lemma. J. ACM, 63(3):22, 2016.
  • [2] Noga Alon. A parallel algorithmic version of the Local Lemma. Random Struct. Algorithms, 2(4):367–378, 1991.
  • [3] József Beck. An algorithmic approach to the Lovász Local Lemma. I. Random Struct. Algorithms, 2(4):343–366, 1991.
  • [4] Ivona Bezáková, Andreas Galanis, Leslie Ann Goldberg, Heng Guo, and Daniel Štefankovič. Approximation via correlation decay when strong spatial mixing fails. In ICALP, pages 45:1–45:13, 2016.
  • [5] Magnus Bordewich, Martin E. Dyer, and Marek Karpinski. Stopping times, metrics and approximate counting. In ICALP, pages 108–119, 2006.
  • [6] Russ Bubley and Martin E. Dyer. Graph orientations with no sink and an approximation for a hard case of #SAT. In SODA, pages 248–257, 1997.
  • [7] Henry Cohn, Robin Pemantle, and James G. Propp. Generating a random sink-free orientation in quadratic time. Electr. J. Comb., 9(1), 2002.
  • [8] Artur Czumaj and Christian Scheideler. Coloring nonuniform hypergraphs: A new algorithmic approach to the general Lovász Local Lemma. Random Struct. Algorithms, 17(3-4):213–237, 2000.
  • [9] Paul Erdős and László Lovász. Problems and results on 3-chromatic hypergraphs and some related questions. Infinite and finite sets, volume 10 of Colloquia Mathematica Societatis János Bolyai, pages 609–628, 1975.
  • [10] Weiming Feng, Yahui Liu, and Yitong Yin. Local rejection sampling with soft filters. CoRR, abs/1807.06481, 2018.
  • [11] Weiming Feng, Yuxin Sun, and Yitong Yin. What can be sampled locally? In PODC, pages 121–130. ACM, 2017.
  • [12] Andreas Galanis, Daniel Štefankovič, and Eric Vigoda. Inapproximability of the partition function for the antiferromagnetic Ising and hard-core models. Comb. Probab. Comput., 25(4):500–559, 2016.
  • [13] Heidi Gebauer, Tibor Szabó, and Gábor Tardos. The local lemma is asymptotically tight for SAT. J. ACM, 63(5):43:1–43:32, 2016.
  • [14] Heng Guo and Kun He. Tight bounds for popping algorithms. CoRR, abs/1807.01680, 2018.
  • [15] Heng Guo and Mark Jerrum. Approximately counting bases of bicircular matroids. CoRR, abs/1808.09548, 2018.
  • [16] Heng Guo and Mark Jerrum. Perfect simulation of the hard disks model by partial rejection sampling. In ICALP, volume 107 of LIPIcs, pages 69:1–69:10. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018.
  • [17] Heng Guo and Mark Jerrum. A polynomial-time approximation algorithm for all-terminal network reliability. In ICALP, volume 107 of LIPIcs, pages 68:1–68:12. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018.
  • [18] Heng Guo, Mark Jerrum, and Jingcheng Liu. Uniform sampling through the Lovász local lemma. In STOC, pages 342–355. ACM, 2017.
  • [19] Bernhard Haeupler, Barna Saha, and Aravind Srinivasan. New constructive aspects of the Lovász Local Lemma. J. ACM, 58(6):28:1–28:28, 2011.
  • [20] David G. Harris and Aravind Srinivasan. Constraint satisfaction, packet routing, and the Lovász Local Lemma. In STOC, pages 685–694, 2013.
  • [21] David G. Harris and Aravind Srinivasan. The Moser-Tardos framework with partial resampling. In FOCS, pages 469–478, 2013.
  • [22] David G. Harris and Aravind Srinivasan. A constructive algorithm for the Lovász Local Lemma on permutations. In SODA, pages 907–925, 2014.
  • [23] Nicholas J. A. Harvey and Jan Vondrák. An algorithmic proof of the Lovász Local Lemma via resampling oracles. In FOCS, pages 1327–1346, 2015. Full version available at abs/1504.02044.
  • [24] Jonathan Hermon, Allan Sly, and Yumeng Zhang. Rapid mixing of hypergraph independent set. CoRR, abs/1610.07999, 2016.
  • [25] Donald E. Knuth. The art of computer programming, volume 4b (draft, pre-fascicle 6a). 2015. Available at http://www-cs-faculty.stanford.edu/~uno/fasc6a.ps.gz.
  • [26] Kashyap Babu Rao Kolipaka and Mario Szegedy. Moser and Tardos meet Lovász. In STOC, pages 235–244, 2011.
  • [27] Nathan Linial. Distributive graph algorithms: Global solutions from local data. In FOCS, pages 331–335, 1987.
  • [28] Jingcheng Liu and Pinyan Lu. FPTAS for counting monotone CNF. In SODA, pages 1531–1548, 2015.
  • [29] Ankur Moitra. Approximate counting, the Lovász Local Lemma and inference in graphical models. CoRR, abs/1610.04317, 2016. STOC 2017, to appear.
  • [30] Michael Molloy and Bruce A. Reed. Further algorithmic aspects of the Local Lemma. In STOC, pages 524–529, 1998.
  • [31] Robin A. Moser and Gábor Tardos. A constructive proof of the general Lovász Local Lemma. J. ACM, 57(2), 2010.
  • [32] James G. Propp and David B. Wilson. Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Struct. Algorithms, 9(1-2):223–252, 1996.
  • [33] James G. Propp and David B. Wilson. How to get a perfectly random sample from a generic Markov chain and generate a random spanning tree of a directed graph. J. Algorithms, 27(2):170–217, 1998.
  • [34] Alexander D. Scott and Alan D. Sokal. The repulsive lattice gas, the independent-set polynomial, and the Lovász Local Lemma. J. Stat. Phys., 118(5):1151–1261, 2005.
  • [35] James B. Shearer. On a problem of Spencer. Combinatorica, 5(3):241–245, 1985.
  • [36] Allan Sly and Nike Sun. The computational hardness of counting in two-spin models on $d$-regular graphs. Ann. Probab., 42(6):2383–2416, 2014.
  • [37] Aravind Srinivasan. Improved algorithmic versions of the Lovász Local Lemma. In SODA, pages 611–620, 2008.
  • [38] Dror Weitz. Counting independent sets up to the tree threshold. In STOC, pages 140–149, 2006.
  • [39] David B. Wilson. Generating random spanning trees more quickly than the cover time. In STOC, pages 296–303, 1996.