

Uniform Sampling through the Lovász Local Lemma

Heng Guo School of Informatics, University of Edinburgh, Informatics Forum, Edinburgh EH8 9AB, United Kingdom. hguo@inf.ed.ac.uk Mark Jerrum School of Mathematical Sciences, Queen Mary, University of London, Mile End Road, London E1 4NS, United Kingdom. mj@qmul.ac.uk  and  Jingcheng Liu Department of EECS, University of California, Berkeley, CA liuexp@berkeley.edu
Abstract.

We propose a new algorithmic framework, called “partial rejection sampling”, to draw samples exactly from a product distribution, conditioned on none of a number of bad events occurring. Our framework builds new connections between the variable framework of the Lovász Local Lemma and some classical sampling algorithms such as the “cycle-popping” algorithm for rooted spanning trees. Among other applications, we discover new algorithms to sample satisfying assignments of $k$-CNF formulas with bounded variable occurrences.

1. Introduction

The Lovász Local Lemma [9] is a classical gem in combinatorics that guarantees the existence of a perfect object that avoids all events deemed to be “bad”. The original proof is non-constructive but there has been great progress in the algorithmic aspects of the local lemma. After a long line of research [3, 2, 30, 8, 34, 37], the celebrated result by Moser and Tardos [31] gives efficient algorithms to find such a perfect object under conditions that match the Lovász Local Lemma in the so-called variable framework. However, it is natural to ask whether, under the same condition, we can also sample a perfect object uniformly at random instead of merely finding one.

Roughly speaking, the resampling algorithm by Moser and Tardos [31] works as follows. We initialize all variables randomly. If bad events occur, then we arbitrarily choose a bad event and resample all the involved variables. Unfortunately, it is not hard to see that this algorithm can produce biased samples. This seems inevitable. As Bezáková et al. showed [4], sampling can be NP-hard even under conditions that are stronger than those of the local lemma. On the one hand, the symmetric Lovász Local Lemma only requires $ep\Delta\leq 1$, where $p$ is the probability of bad events and $\Delta$ is the maximum degree of the dependency graph. On the other hand, translating the result of [4] to this setting, one sees that as soon as $p\Delta^{2}\geq C$ for some constant $C$, then even approximately sampling perfect objects in the variable framework becomes NP-hard.

The starting point of our work is a new condition (see Condition 5) under which we show that the output of the Moser-Tardos algorithm is in fact uniform (see Theorem 8). Intuitively, the condition requires any two dependent bad events to be disjoint. Indeed, instances satisfying this condition are called “extremal” in the study of the Lovász Local Lemma. For these extremal instances, we can in fact resample in a parallel fashion, since the occurring bad events form an independent set in the dependency graph. We call this algorithm “partial rejection sampling”,\footnote{Despite the apparent similarity in names, our algorithm is different from “partial resampling” in [20, 21]. We resample all variables in certain sets of events whereas “partial resampling” only resamples a subset of variables from some bad event.} in the sense that it is like rejection sampling, but only resamples an appropriate subset of variables.

Our result puts some classical sampling algorithms under a unified framework, including the “cycle-popping” algorithm by Wilson [39] for sampling rooted spanning trees, and the “sink-popping” algorithm by Cohn, Pemantle, and Propp [7] for sampling sink-free orientations of an undirected graph. Indeed, Cohn et al. [7] coined the term “partial rejection sampling” and asked for a general theory, and we believe that extremal instances under the variable framework are a satisfactory answer. With our techniques, we are able to give a new algorithm to sample solutions for a special class of $k$-CNF formulas, under conditions matching the Lovász Local Lemma (see Corollary 19), which is an NP-hard task for general $k$-CNF formulas. Furthermore, we provide explicit formulas for the expected running time of these algorithms (see Theorem 13), which matches the running time upper bound given by Kolipaka and Szegedy [26] under Shearer’s condition [35].

The next natural question is thus whether we can go beyond extremal instances. Indeed, our main technical contribution is a general uniform sampler (Algorithm 6) that applies to any problem under the variable framework. The main idea is that, instead of resampling only the occurring bad events, we resample a larger set of events so that the choices made do not block any perfect assignment in the end, thereby ensuring uniformity of the final output.

As a simple example, we describe how our algorithm samples independent sets. The algorithm starts by choosing each vertex with probability $1/2$ independently. At each subsequent round, in the induced subgraph on the currently chosen vertices, the algorithm finds all the connected components of size $\geq 2$. Then it resamples all these vertices and their boundaries (which are unoccupied). And it repeats this process until there is no edge with both endpoints occupied. What seems surprising is that this simple process does yield a uniformly random independent set when it stops. Indeed, as we will show in Theorem 34, this simple process is an exact sampler for weighted independent sets (also known as the hard-core model in statistical physics). In addition, it runs in expected linear time under a condition that matches, up to a constant factor, the uniqueness threshold of the model (beyond which the problem of approximate sampling becomes NP-hard).
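For concreteness, here is a minimal Python sketch of the independent-set sampler described above, in the uniform case (each vertex occupied with probability $1/2$). The graph representation (`adj` as a dictionary mapping each vertex to its set of neighbours) and the function name are our own illustrative choices, not from the paper.

```python
import random

def sample_independent_set(adj, p=0.5, rng=random):
    # adj: dict mapping each vertex to the set of its neighbours.
    # Returns the set of occupied vertices; with p = 1/2 the output is a
    # uniformly random independent set once the loop stops.
    occupied = {v for v in adj if rng.random() < p}
    while True:
        # Vertices lying in an occupied connected component of size >= 2,
        # i.e. occupied vertices with at least one occupied neighbour.
        bad = {v for v in occupied if any(u in occupied for u in adj[v])}
        if not bad:
            return occupied
        # Resample the bad vertices together with their (unoccupied) boundary.
        to_resample = set(bad)
        for v in bad:
            to_resample |= adj[v]
        for v in to_resample:
            if rng.random() < p:
                occupied.add(v)
            else:
                occupied.discard(v)
```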

In the more general setting, we will choose the set of events to be resampled, denoted by ${\sf Res}$, iteratively. We start from the set of occurring bad events. Then we include all neighbouring events of the current set ${\sf Res}$, until there is no event $A$ on the boundary of ${\sf Res}$ such that the current assignment, projected on the common variables of $A$ and ${\sf Res}$, can be extended so that $A$ may happen. In the worst case, we will resample all events (there is no event in the boundary at all). In that scenario the algorithm is the same as a naive rejection sampling, but typically we resample fewer variables in every step. We show that this is a uniform sampler on assignments that avoid all bad events once it stops (see Theorem 24).

One interesting feature of our algorithm is that, unlike Markov chain based algorithms, ours does not require the solution space (or any augmented space) to be connected. Moreover, our sampler is exact; that is, when the algorithm halts, the final distribution is precisely the desired distribution. Prior to our work, most exact sampling algorithms were obtained by coupling from the past [32]. We also note that previous work on the Moser-Tardos output distribution, such as [19], is not strong enough to guarantee a uniform sample (or a sample $\varepsilon$-close to uniform in total variation distance).

We give sufficient conditions that guarantee a linear expected running time of our algorithm in the general setting (see Theorem 25). The first condition is that $p\Delta^{2}$ is bounded above by a constant. This is optimal up to constants in observance of the NP-hardness result in [4]. Unfortunately, the condition on $p\Delta^{2}$ alone does not make the algorithm efficient. In addition, we also need to bound the expansion from bad events to resampling events, which leads to an extra condition on intersections of bad events. Removing this extra condition seems to require substantial changes to our current algorithm.

To illustrate the result, we apply our algorithm to sample satisfying assignments of $k$-CNF formulas in which the degree of each variable (the number of clauses containing it) is at most $d$. We say that a $k$-CNF formula has intersection $s$ if any two dependent clauses share at least $s$ variables. The extra condition from our analysis naturally leads to a lower bound on $s$. Let $n$ be the number of variables. We (informally) summarize our results on $k$-CNF formulas as follows (see Corollary 30 and Theorem 32):

  • If $d\leq\frac{1}{6e}\cdot 2^{k/2}$, $dk\geq 2^{3e}$ and $s\geq\min\{\log_{2}dk,k/2\}$, then the general partial rejection resampling algorithm outputs a uniformly random solution to a $k$-CNF formula with degree $d$ and intersection $s$ in expected running time $O(n)$.

  • If $d\geq 4\cdot 2^{k/2}$ (for an even $k$), then even if $s=k/2$, it is NP-hard to approximately sample a solution to a $k$-CNF formula with degree $d$ and intersection $s$.

As shown in the hardness result, the intersection bound does not render the problem trivial.

Previously, sampling/counting satisfying assignments of $k$-CNF formulas required the formula to be monotone and $d\leq k$ to be large enough [4] (see also [5, 28]). Although our result requires an additional lower bound on intersections, not only does it improve the dependency between $k$ and $d$ exponentially, but it also achieves a matching constant $1/2$ in the exponent. Furthermore, the samples produced are exactly uniform. Thus, if the extra condition on intersections can be removed, we will have a sharp phase transition at around $d=O(2^{k/2})$ in the computational complexity of sampling solutions to $k$-CNF formulas with bounded variable occurrences. A similar sharp transition has been recently established for, e.g., sampling configurations in the hard-core model [38, 36, 12].

Simultaneously with our work, Hermon, Sly, and Zhang [24] showed that Markov chains for monotone $k$-CNF formulas are rapidly mixing if $d\leq c2^{k/2}$ for a constant $c$. In another parallel work, Moitra [29] gave a novel algorithm to sample solutions for general $k$-CNF formulas when $d\lesssim 2^{k/60}$. We note that neither result is directly comparable to ours and the techniques are very different. Both of these samplers are approximate while ours is exact. Moreover, ours does not require monotonicity (unlike [24]), and allows larger $d$ than [29], but at the cost of an extra intersection lower bound. Unfortunately, our algorithm can be exponentially slow when the intersection $s$ is not large enough. In sharp contrast, as shown by Hermon et al. [24], Markov chains mix rapidly for $d\leq c2^{k}/k^{2}$ when $s=1$.

While the study of algorithmic Lovász Local Lemma has progressed beyond the variable framework [22, 1, 23], we restrict our focus to the variable framework in this work. It is also an interesting future direction to investigate and extend our techniques of uniform sampling beyond the variable framework. For example, one may want to sample a permutation that avoids certain patterns. The classical sampling problem of perfect matchings in a bipartite graph can be formulated in this way.

Since the conference version of this paper appeared [18], a number of applications of the partial rejection sampling method have been found [17, 16, 15]. One highlight is the first fully polynomial-time randomised approximation scheme (FPRAS) for all-terminal network reliability [17]. For extremal instances, tight running time bounds have also been obtained [14]. Moreover, partial rejection sampling has been adapted to dynamic and distributed settings as well [10].

2. Partial Rejection Sampling

We first describe the “variable” framework. Let $\{X_{1},\dots,X_{n}\}$ be a set of random variables. Each $X_{i}$ can have its own distribution and range $D_{i}$. Let $\{A_{1},\dots,A_{m}\}$ be a set of “bad” events that depend on the $X_{i}$’s. For example, for a constraint satisfaction problem (CSP) with variables $X_{i}$ ($1\leq i\leq n$) and constraints $C_{j}$ ($1\leq j\leq m$), each $A_{j}$ is the set of unsatisfying assignments of $C_{j}$ for $1\leq j\leq m$. Let ${\sf var}(A_{i})$ be the (index) set of variables that $A_{i}$ depends on.

The dependency graph $G=(V,E)$ has $m$ vertices, identified with the integers $\{1,2,\ldots,m\}$, corresponding to the events $A_{i}$, and $(i,j)$ is an edge if $A_{i}$ and $A_{j}$ depend on one or more common variables and $i\neq j$. In other words, for any distinct $i,j$, $(i,j)\in E$ if ${\sf var}(A_{i})\cap{\sf var}(A_{j})\neq\emptyset$. We write $A_{i}\sim A_{j}$ if the vertices $i$ and $j$ are adjacent in $G$. The asymmetric Lovász Local Lemma [9] states the following.

Theorem 1.

If there exists a vector $\boldsymbol{x}\in[0,1)^{m}$ such that $\forall i\in[m]$,

(1) $\displaystyle\mathop{\mathrm{Pr}}\nolimits(A_{i})\leq x_{i}\prod_{(i,j)\in E}(1-x_{j}),$

then $\displaystyle\mathop{\mathrm{Pr}}\nolimits\left(\bigwedge_{i=1}^{m}\overline{A_{i}}\right)\geq\prod_{i=1}^{m}(1-x_{i})>0$.

Theorem 1 has a condition that is easy to verify, but not necessarily optimal. Shearer [35] gave the optimal condition for the local lemma to hold for a fixed dependency graph $G$. To state Shearer’s condition, we will need the following definitions. Let $p_{i}:=\mathop{\mathrm{Pr}}\nolimits(A_{i})$ for all $1\leq i\leq m$. Let $\mathcal{I}$ be the collection of independent sets of $G$. Define the following quantity:

$\displaystyle q_{I}(\textbf{p}):=\sum_{J\in\mathcal{I},\,I\subseteq J}(-1)^{|J|-|I|}\prod_{i\in J}p_{i},$

where $\textbf{p}=(p_{1},\ldots,p_{m})$. When there is no confusion we also simply write $q_{I}$ instead of the more cumbersome $q_{I}(\textbf{p})$. Moreover, to simplify the notation, let $q_{i}:=q_{\{i\}}$ for $1\leq i\leq m$. Note that if $I\notin\mathcal{I}$, then $q_{I}=0$.

Theorem 2 (Shearer [35]).

If $q_{I}\geq 0$ for all $I\subseteq V$, then $\mathop{\mathrm{Pr}}\nolimits\left(\bigwedge_{i=1}^{m}\overline{A_{i}}\,\right)\geq q_{\emptyset}$.

In particular, if the condition holds with $q_{\emptyset}>0$, then $\mathop{\mathrm{Pr}}\nolimits\left(\bigwedge_{i=1}^{m}\overline{A_{i}}\,\right)>0$.
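As a concrete aid to Shearer’s condition, the following Python sketch computes $q_{I}(\textbf{p})$ directly from the definition above by brute-force enumeration of independent sets. It is only meant for small dependency graphs, and the representation (a dictionary of neighbour sets and a dictionary of probabilities) is our own illustrative choice.

```python
from itertools import combinations

def independent_sets(neighbours):
    # All independent sets of the dependency graph, given as
    # {vertex: set of neighbours}.  Brute force: fine only for small m.
    verts = list(neighbours)
    for r in range(len(verts) + 1):
        for J in combinations(verts, r):
            if all(v not in neighbours[u] for u, v in combinations(J, 2)):
                yield frozenset(J)

def q(I, p, neighbours):
    # Shearer's quantity q_I(p) = sum over independent J containing I of
    # (-1)^{|J|-|I|} * prod_{i in J} p_i.  Returns 0 if I is not independent.
    I = frozenset(I)
    total = 0.0
    for J in independent_sets(neighbours):
        if I <= J:
            term = 1.0
            for i in J:
                term *= p[i]
            total += (-1) ** (len(J) - len(I)) * term
    return total
```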

Neither Theorem 1 nor Theorem 2 yields an efficient algorithm to find an assignment avoiding all bad events, since they only guarantee an exponentially small probability of avoiding all bad events under the product distribution. There has been a long line of research devoted to an algorithmic version of the LLL, culminating in the work of Moser and Tardos [31] under essentially the same condition as in Theorem 1. The Resample algorithm of Moser and Tardos is very simple, and is described in Algorithm 1.

  (1) Draw independent samples of all variables $X_{1},\dots,X_{n}$ from their respective distributions.

  (2) While at least one $A_{i}$ holds, pick one such $A_{i}$ arbitrarily and resample all variables in ${\sf var}(A_{i})$.

  (3) Output the current assignment.

Algorithm 1 The Resample algorithm
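The following Python sketch of Algorithm 1 fixes the interface used in the sketches throughout this paper: each variable comes with a sampler, and each bad event is a pair of the variable set it depends on and a predicate on the current assignment. All names and the representation are our own illustrative choices.

```python
def resample(variables, bad_events):
    # variables: dict {index: zero-argument function returning a fresh sample}
    # bad_events: list of (var_indices, holds) where holds(assignment) tests
    #             whether the bad event occurs under the current assignment.
    sigma = {i: draw() for i, draw in variables.items()}
    while True:
        occurring = [(vs, holds) for vs, holds in bad_events if holds(sigma)]
        if not occurring:
            return sigma
        vs, _ = occurring[0]          # pick one occurring bad event arbitrarily
        for i in vs:                  # resample all of its variables
            sigma[i] = variables[i]()
```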

In [31], Moser and Tardos showed that Algorithm 1 finds a good assignment very efficiently.

Theorem 3 (Moser and Tardos [31]).

Under the condition of Theorem 1, the expected number of resampling steps in Algorithm 1 is at most $\sum_{i=1}^{m}\frac{x_{i}}{1-x_{i}}$.

Unfortunately, the final output of Algorithm 1 is not distributed as we would like, namely as a product distribution conditioned on avoiding all bad events.

In addition, Kolipaka and Szegedy [26] showed that, up to Shearer’s condition, Algorithm 1 is efficient. Recall that $q_{i}:=q_{\{i\}}$ for $1\leq i\leq m$.

Theorem 4 (Kolipaka and Szegedy [26]).

If $q_{I}\geq 0$ for all $I\in\mathcal{I}$ and $q_{\emptyset}>0$, then the expected number of resampling steps in Algorithm 1 is at most $\sum_{i=1}^{m}\frac{q_{i}}{q_{\emptyset}}$.

On the other hand, Wilson’s cycle-popping algorithm [39] is very similar to the Resample algorithm but it outputs a uniformly random rooted spanning tree. Another similar algorithm is the sink-popping algorithm by Cohn, Pemantle, and Propp [7] to generate a sink-free orientation uniformly at random. Upon close examination of these two algorithms, we found a common feature of both problems.

Condition 5.

If $(i,j)\in E$ (or equivalently $A_{i}\sim A_{j}$), then $\mathop{\mathrm{Pr}}\nolimits(A_{i}\wedge A_{j})=0$; namely, the two events $A_{i}$ and $A_{j}$ are disjoint if they are dependent.

In other words, any two events $A_{i}$ and $A_{j}$ are either independent or disjoint. These instances have been noticed in the study of the Lovász Local Lemma. They are the ones that minimize $\mathop{\mathrm{Pr}}\nolimits\left(\bigwedge_{i=1}^{m}\overline{A_{i}}\,\right)$ given Shearer’s condition (namely $\mathop{\mathrm{Pr}}\nolimits\left(\bigwedge_{i=1}^{m}\overline{A_{i}}\,\right)=q_{\emptyset}$). Instances satisfying Condition 5 have been named extremal [26].

We will show that, given Condition 5, the final output of the Resample algorithm is a sample from a conditional product distribution (Theorem 8). Moreover, we will show that under Condition 5, the running time upper bound $\sum_{i=1}^{m}\frac{q_{i}}{q_{\emptyset}}$ given by Kolipaka and Szegedy (Theorem 4) is indeed the exact expected running time. See Theorem 13.

In fact, when Condition 5 holds, at each step of Algorithm 1 the occurring events form an independent set of the dependency graph $G$. Think of the execution of Algorithm 1 as going in rounds. In each round we find the set $I$ of bad events that occur. Due to Condition 5, ${\sf var}(A_{i})\cap{\sf var}(A_{j})=\emptyset$ for any $i,j\in I$, i.e., $I$ is an independent set in the dependency graph. Therefore, we can resample all variables involved in the occurring bad events without interfering with each other. This motivates Algorithm 2.

We call Algorithm 2 the Partial Rejection Sampling algorithm. This name was coined by Cohn, Pemantle, and Propp [7]. Indeed, they ask as an open problem how to generalize their sink-popping algorithm and Wilson’s cycle popping algorithm. We answer this question under the variable framework. Partial Rejection Sampling differs from the normal rejection sampling algorithm by only resampling “bad” events. Moreover, Algorithm 2 is uniform only on extremal instances, and is a special case of Algorithm 6 given in Section 5, which is a uniform sampler for all instances.

  (1) Draw independent samples of all variables $X_{1},\dots,X_{n}$ from their respective distributions.

  (2) While at least one bad event holds, find the independent set $I$ of occurring $A_{i}$’s. Independently resample all variables in $\bigcup_{i\in I}{\sf var}(A_{i})$.

  (3) Output the current assignment.

Algorithm 2 Partial Rejection Sampling for extremal instances
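In the same illustrative interface as the sketch of Algorithm 1 above, a minimal Python version of Algorithm 2 resamples, in each round, every variable of every occurring bad event at once.

```python
def partial_rejection_sampling(variables, bad_events):
    # Same interface as resample() above.  On extremal instances the
    # occurring bad events are variable-disjoint, so resampling all of
    # their variables in one round is conflict-free.
    sigma = {i: draw() for i, draw in variables.items()}
    while True:
        occurring = [vs for vs, holds in bad_events if holds(sigma)]
        if not occurring:
            return sigma
        for i in set().union(*occurring):
            sigma[i] = variables[i]()
```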

In fact, Algorithm 2 is the same as the parallel version of Algorithm 1 by Moser and Tardos [31] for extremal instances. Suppose each event is assigned to a processor, which determines whether the event holds by looking at the variables associated with the event. If the event holds then all associated variables are resampled. No conflict will be created due to Condition 5.

In the following analysis, we will use the resampling table idea, which has appeared in both the analysis of Moser and Tardos [31] and that of Wilson [39]. Note that we only use this idea to analyze the algorithm; the table is never actually created during the execution. Associate each variable $X_{i}$ with an infinite stack of random values $\{X_{i,1},X_{i,2},\dots\}$. This forms the resampling table, where each row represents a variable and there are infinitely many columns, as shown in Table 1. In the execution of the algorithm, when a variable needs to be resampled, the algorithm draws the top value from the stack, or equivalently moves from the current entry in the resampling table to the one on its right.

Table 1. A resampling table with $4$ variables
$X_{1}$: $X_{1,1}$, $X_{1,2}$, $X_{1,3}$, $\dots$
$X_{2}$: $X_{2,1}$, $X_{2,2}$, $X_{2,3}$, $\dots$
$X_{3}$: $X_{3,1}$, $X_{3,2}$, $X_{3,3}$, $\dots$
$X_{4}$: $X_{4,1}$, $X_{4,2}$, $X_{4,3}$, $\dots$

Let $t$ be a positive integer denoting the round of Algorithm 2. Let $j_{i,t}$ be the index of the variable $X_{i}$ in the resampling table at round $t$. In other words, at the $t$-th round, $X_{i}$ takes value $X_{i,j_{i,t}}$. Thus, the set $\sigma_{t}=\{X_{i,j_{i,t}}\mid 1\leq i\leq n\}$ is the current assignment of variables at round $t$. This $\sigma_{t}$ determines which events happen. Call the set of occurring events, viewed as a subset of the vertex set of the dependency graph, $I_{t}$. (For convenience, we shall sometimes identify the event $A_{i}$ with its index $i$; thus, we shall refer to the “events in $S$” rather than the “events indexed by $S$”.) As explained above, $I_{t}$ is an independent set of $G$ due to Condition 5. Then variables involved in any of the events in $I_{t}$ are resampled. In other words,

$\displaystyle j_{i,t+1}=\begin{cases}j_{i,t}+1&\text{if }\exists\ell\in I_{t}\text{ such that }i\in{\sf var}(A_{\ell});\\ j_{i,t}&\text{otherwise.}\end{cases}$

Therefore, any event that happens in round $t+1$ must share at least one variable with some event in $I_{t}$ (possibly itself). In other words, $I_{t+1}\subseteq\Gamma^{+}(I_{t})$, where $\Gamma^{+}(I)$ denotes the set of all neighbours of $I$ together with $I$ itself. This inspires the notion of independent set sequences (first introduced in [26]).

Definition 6.

A list $\mathcal{S}=S_{1},S_{2},\dots,S_{\ell}$ of independent sets in $G$ is called an independent set sequence if $S_{i}\neq\emptyset$ for all $1\leq i\leq\ell-1$ and, for every $1\leq i\leq\ell-1$, $S_{i+1}\subseteq\Gamma^{+}(S_{i})$.

We adopt the convention that the empty list is an independent set sequence with $\ell=0$. Note that we allow $S_{\ell}$ to be $\emptyset$.

Let $M$ be a resampling table. Suppose running Algorithm 2 on $M$ does not terminate up to some integer $\ell\geq 1$ rounds. Define the log of running Algorithm 2 on $M$ up to round $\ell$ as the sequence of independent sets $I_{1},I_{2},\dots,I_{\ell}$ created by this run. Thus, for any $M$ and $\ell\geq 1$, the log $I_{1},I_{2},\dots,I_{\ell}$ must be an independent set sequence. Moreover, if Algorithm 2 terminates at round $T$, let $\sigma_{t}:=\sigma_{T}$ for $t>T$. Denote by $\mu(\cdot)$ the product distribution of all random variables.

Lemma 7.

Suppose Condition 5 holds. Given any log $\mathcal{S}=S_{1},S_{2},\dots,S_{\ell}$ of length $\ell\geq 1$ and conditioned on seeing the log $\mathcal{S}$, $\sigma_{\ell+1}$ is a random sample from the product distribution conditioned on the event $\bigwedge_{i\in[m]\setminus\Gamma^{+}(S_{\ell})}\overline{A_{i}}$, namely from $\mu\big{(}\cdot\mid\bigwedge_{i\in[m]\setminus\Gamma^{+}(S_{\ell})}\overline{A_{i}}\big{)}$.

We remark that Lemma 7 is not true for non-extremal instances (that is, if Condition 5 fails). In particular, Lemma 7 says that given any log, every valid assignment is not only reachable, but reachable with the correct probability. This is no longer the case for non-extremal instances: some valid assignments from the desired conditional product distribution could be “blocked” under the log $\mathcal{S}$. In Section 5 we show how to instead achieve uniformity by resampling an “unblocking” set of bad events.

Proof.

The set of occurring events at round $\ell$ is $S_{\ell}$. Hence $\sigma_{\ell+1}$ does not make any of the $A_{i}$’s happen where $i\notin\Gamma^{+}(S_{\ell})$. Call an assignment $\sigma$ valid if none of the $A_{i}$’s happen where $i\notin\Gamma^{+}(S_{\ell})$. To show that $\sigma_{\ell+1}$ has the desired conditional product distribution, we will show that the probabilities of getting any two valid assignments $\sigma$ and $\sigma^{\prime}$ are proportional to their probabilities of occurrence under $\mu(\cdot)$.

Let $M$ be a resampling table so that the log of the algorithm is $\mathcal{S}$ up to round $\ell\geq 1$, and $\sigma_{\ell+1}=\sigma$. Indeed, since we only care about events up to round $\ell+1$, we may truncate the table so that $M=\{X_{i,j}\;|\;1\leq i\leq n,\;1\leq j\leq j_{i,\ell+1}\}$. Let $M^{\prime}=\{X_{i,j}^{\prime}\;|\;1\leq i\leq n,\;1\leq j\leq j_{i,\ell+1}\}$ be another table where $X_{i,j}^{\prime}=X_{i,j}$ if $j<j_{i,\ell+1}$ for any $i\in[n]$, but $\sigma_{\ell+1}=\sigma^{\prime}$. In other words, we only change the values in the final round (the $X_{i,j_{i,\ell+1}}$), and only to another valid assignment.

We claim that the algorithm running on $M^{\prime}$ generates the same log $\mathcal{S}$. The lemma then follows by the following argument. Assuming the claim holds, then conditioned on the log $\mathcal{S}$, every possible table $M$ such that $\sigma_{\ell+1}=\sigma$ corresponds one-to-one to another table $M^{\prime}$ such that $\sigma_{\ell+1}=\sigma^{\prime}$. It implies that for every pair of valid assignments $\sigma,\sigma^{\prime}$, there is a bijection between the resampling tables resulting in them. The ratio between the probabilities of two corresponding tables is exactly the ratio between the probabilities of $\sigma$ and $\sigma^{\prime}$ under $\mu(\cdot)$. Since the probability of getting a particular $\sigma$ in round $\ell+1$ is the sum over the probabilities of all resampling tables (conditioned on the log $\mathcal{S}$) leading to $\sigma$, the probability of getting $\sigma$ is also proportional to its weight under $\mu(\cdot)$.

Suppose the claim fails and the logs obtained by running the algorithm on $M$ and $M^{\prime}$ differ. Let $t_{0}\leq\ell$ be the first round where the resampling changes. Without loss of generality, let $A$ be an event that occurs in $S_{t_{0}}$ on $M^{\prime}$ but not on $M$. Moreover, there must be a non-empty set of variables $Y\subseteq{\sf var}(A)$ that have values $(X_{i,j_{i,\ell+1}})$, as otherwise the two runs would be identical. Since the resampling history does not change before $t_{0}$, in the $M^{\prime}$ run, the assignment of variables in $Y$ must be $(X^{\prime}_{i,j_{i,\ell+1}})$ at time $t_{0}$.

We claim that $Y={\sf var}(A)$. If the claim does not hold, then $Z:={\sf var}(A)\setminus Y\neq\emptyset$. Any variable in $Z$ has not reached the final round, and must be resampled in the $M$ run. Let $X_{j}\in Z$ be the first such variable being resampled at or after round $t_{0}$ in the $M$ run. (The choice of $X_{j}$ may not be unique, and we just choose an arbitrary one.) Recall that $Y\neq\emptyset$, so $A$ can no longer happen, and thus there must be some $A^{\prime}\neq A$ causing this resampling of $X_{j}$. Then ${\sf var}(A)\cap{\sf var}(A^{\prime})\neq\emptyset$. Consider any variable $X_{k}$ with $k\in{\sf var}(A)\cap{\sf var}(A^{\prime})$. It is resampled at or after time $t_{0}$ in the $M$ run due to $A^{\prime}$. Hence $X_{k}\in Z$ for any such $k$. Moreover, in the $M$ run, until $A^{\prime}$ happens, $X_{k}$ has not been resampled since time $t_{0}$, because $A^{\prime}$ is the first resampling event at or after time $t_{0}$ that involves variables in $Z$. On the other hand, in the $M^{\prime}$ run, $X_{k}$’s value causes $A$ to happen at time $t_{0}$. Hence, there exists an assignment on the variables in ${\sf var}(A)\cap{\sf var}(A^{\prime})$ such that both $A$ and $A^{\prime}$ happen. Clearly this assignment can be extended to a full assignment so that both $A$ and $A^{\prime}$ happen. However, $A\sim A^{\prime}$ as they share the variable $X_{j}$. Due to Condition 5, $A\cap A^{\prime}=\emptyset$. Contradiction! Therefore the claim holds.

We argue that the remaining case, $Y={\sf var}(A)$, is also not possible. Since $A$ occurs in the $M^{\prime}$ run, we know, by the definition of $\sigma^{\prime}$, that $A\in\Gamma^{+}(S_{\ell})$. Thus, some event whose variables intersect with those of $A$ must occur in the $M$ run. But when the algorithm attempts to update the variables shared by these two events in the $M$ run, it will access values beyond the final round of the resampling table, a contradiction. ∎

Theorem 8.

When Condition 5 holds and Algorithm 2 halts, its output is distributed as the product distribution $\mu(\cdot)$ conditioned on avoiding all bad events.

Proof.

Let an independent set sequence $\mathcal{S}$ of length $\ell$ be the log of any successful run. Then $S_{\ell}=\emptyset$. By Lemma 7, conditioned on the log $\mathcal{S}$, the output assignment $\sigma$ is distributed as $\mu\big{(}\cdot\mid\bigwedge_{i\in[m]\setminus\Gamma^{+}(S_{\ell})}\overline{A_{i}}\big{)}=\mu\big{(}\cdot\mid\bigwedge_{i\in[m]}\overline{A_{i}}\big{)}$. This is valid for any possible log, and the theorem follows. ∎

In other words, let $\Sigma$ be the set of assignments that avoid all bad events, and let $U$ be the output of Algorithm 2. In the case that all variables are sampled from the uniform distribution, we have $\mathop{\mathrm{Pr}}\nolimits(U=\sigma)=\frac{1}{|\Sigma|}$ for all $\sigma\in\Sigma$.

3. Expected running time of Algorithm 2

In this section we give an explicit formula for the running time of Algorithm 2. We assume that Condition 5 holds throughout the section.

We first give a combinatorial explanation of $q_{I}$ for any independent set $I$ of the dependency graph $G$. To simplify the notation, we denote the event $\bigwedge_{i\in S}A_{i}$, i.e., the conjunction of all events in $S$, by $A(S)$.

For any set $I$ in the dependency graph, we denote by $p_{I}$ the probability $\mathop{\mathrm{Pr}}\nolimits_{\mu}(A(I))$ that all events in $I$ happen (and possibly some other events too). If $I$ is an independent set in the dependency graph, any two events in $I$ are independent and

(2) $\displaystyle p_{I}=\prod_{i\in I}p_{i}.$

Moreover, for any set $J$ of events that is not an independent set, we have $p_{J}=0$ due to Condition 5.

On the other hand, the quantity $q_{I}$ is in fact the probability that exactly the events in $I$ happen and no others do. This can be verified using inclusion-exclusion, together with Condition 5:

(3) $\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}\left(\bigwedge_{i\in I}A_{i}\wedge\bigwedge_{i\notin I}\overline{A_{i}}\right)=\sum_{J\supseteq I}(-1)^{|J\setminus I|}p_{J}=\sum_{J\in\mathcal{I},\ J\supseteq I}(-1)^{|J\setminus I|}p_{J}=q_{I},$

where $\mathcal{I}$ denotes the collection of all independent sets of $G$. Since the events $\bigwedge_{i\in I}A_{i}\wedge\bigwedge_{i\notin I}\overline{A_{i}}$ are mutually exclusive for different $I$’s,

$\displaystyle\sum_{I\in\mathcal{I}}q_{I}=1.$

Moreover, since the event $A(I)$ is the union over $J\supseteq I$ of the events $\bigwedge_{i\in J}A_{i}\wedge\bigwedge_{i\notin J}\overline{A_{i}}$, we have

(4) $\displaystyle p_{I}=\sum_{J\in\mathcal{I},\ J\supseteq I}q_{J}.$
Lemma 9.

Assume Condition 5 holds. Let $\mathcal{S}=S_{1},\ldots,S_{\ell}$ be an independent set sequence of length $\ell>0$. Then in Algorithm 2,

$\displaystyle\mathop{\mathrm{Pr}}\nolimits\big{(}\text{the log is $\mathcal{S}$ up to round $\ell$}\big{)}=q_{S_{\ell}}\prod_{t=1}^{\ell-1}p_{S_{t}}.$
Proof.

Clearly, if $q_{S_{\ell}}=0$, then the said sequence will never happen. We assume that $q_{S_{\ell}}>0$ in the following.

Recall that $\mu$ is the product distribution of sampling all variables independently. We need to distinguish the probability space with respect to $\mu$ from that with respect to the execution of the algorithm. We write $\mathop{\mathrm{Pr}}\nolimits_{\mathrm{PRS}}(\cdot)$ to refer to the algorithm, and write $\mathop{\mathrm{Pr}}\nolimits_{\mu}(\cdot)$ to refer to the (static) space with respect to $\mu$. As noted before, to simplify the notation we will use $A(S)$ to denote the event $\bigwedge_{i\in S}A_{i}$, where $S\subseteq[m]$. In addition, $B(S)$ will be used to denote $\bigwedge_{i\in S}\overline{A_{i}}$. For $I\in\mathcal{I}$, define

$\partial I:=\Gamma^{+}(I)\setminus I,\quad I^{e}:=[m]\setminus\Gamma^{+}(I),\quad\text{and}\quad I^{c}:=[m]\setminus I=\partial I\cup I^{e}.$

So $\partial I$ is the “boundary” of $I$, comprising events that are not in $I$ but which depend on $I$, and $I^{e}$ is the “exterior” of $I$, comprising events that are independent of all events in $I$. The complement $I^{c}$ is simply the set of all events not in $I$. Note that $B(I^{c})=B(\partial I)\wedge B(I^{e})$. As examples of the notation, $\mathop{\mathrm{Pr}}\nolimits_{\mu}(A(I))=\prod_{i\in I}p_{i}=p_{I}$ is the probability that all events in $I$ occur under $\mu$, and $\mathop{\mathrm{Pr}}\nolimits_{\mu}(A(I)\wedge B(I^{c}))=q_{I}$ is the probability that exactly the events in $I$ occur.

By the definition of $I^{e}$, we have that

(5) $\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}\left(B(I^{e})\mid A(I)\right)=\mathop{\mathrm{Pr}}\nolimits_{\mu}(B(I^{e})),$

and, by Condition 5, that

(6) $\displaystyle A(I)\wedge B(\partial I)=A(I).$

Hence for any $I\in\mathcal{I}$,

(7) $\displaystyle q_{I}=\mathop{\mathrm{Pr}}\nolimits_{\mu}\big{(}A(I)\wedge B(I^{c})\big{)}=\mathop{\mathrm{Pr}}\nolimits_{\mu}\big{(}A(I)\wedge B(\partial I)\wedge B(I^{e})\big{)}=\mathop{\mathrm{Pr}}\nolimits_{\mu}\big{(}A(I)\wedge B(I^{e})\big{)}=\mathop{\mathrm{Pr}}\nolimits_{\mu}\left(A(I)\right)\mathop{\mathrm{Pr}}\nolimits_{\mu}\left(B(I^{e})\right),$

where the third equality is by (6) and the last is by (5).

We prove the lemma by induction. It clearly holds when $\ell=1$. At round $\ell\geq 2$, since we only resample variables that are involved in $S_{\ell-1}$, we have that $S_{\ell}\subseteq\Gamma^{+}(S_{\ell-1})$. Moreover, variables are not resampled in any $A_{i}$ where $i\in S_{\ell-1}^{e}$, and hence

(8) $\displaystyle B(S_{\ell}^{c})\wedge B(S_{\ell-1}^{e})=B(S_{\ell}^{c}).$

Conditioned on $S_{\ell-1}$, by Lemma 7, the distribution of $\sigma_{\ell}$ at round $\ell$ is the product distribution conditioned on none of the events outside of $\Gamma^{+}(S_{\ell-1})$ occurring; namely, it is $\mathop{\mathrm{Pr}}\nolimits_{\mu}\big{(}\cdot\mid B(S_{\ell-1}^{e})\big{)}$. Thus the probability of getting $S_{\ell}$ in round $\ell$ is

(9) $\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mathrm{PRS}}\big{(}A(S_{\ell})\wedge B(S_{\ell}^{c})\text{ holds in round $\ell$}\bigm{|}\text{prior log is }S_{1},\ldots,S_{\ell-1}\big{)}=\mathop{\mathrm{Pr}}\nolimits_{\mu}\big{(}A(S_{\ell})\wedge B(S_{\ell}^{c})\mid B(S_{\ell-1}^{e})\big{)}=\frac{\mathop{\mathrm{Pr}}\nolimits_{\mu}\big{(}A(S_{\ell})\wedge B(S_{\ell}^{c})\wedge B(S_{\ell-1}^{e})\big{)}}{\mathop{\mathrm{Pr}}\nolimits_{\mu}(B(S_{\ell-1}^{e}))}=\frac{\mathop{\mathrm{Pr}}\nolimits_{\mu}\big{(}A(S_{\ell})\wedge B(S_{\ell}^{c})\big{)}}{\mathop{\mathrm{Pr}}\nolimits_{\mu}(B(S_{\ell-1}^{e}))}=\frac{q_{S_{\ell}}}{\mathop{\mathrm{Pr}}\nolimits_{\mu}(B(S_{\ell-1}^{e}))},$

where the third equality uses (8).

By (9) and the induction hypothesis, we have

$\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mathrm{PRS}}\left(\text{the log is $\mathcal{S}$ up to round $\ell$}\right)=\frac{q_{S_{\ell}}}{\mathop{\mathrm{Pr}}\nolimits_{\mu}(B(S_{\ell-1}^{e}))}\cdot q_{S_{\ell-1}}\prod_{t=1}^{\ell-2}p_{S_{t}}=q_{S_{\ell}}\prod_{t=1}^{\ell-1}p_{S_{t}},$

where to get the last equality we used (7) on $S_{\ell-1}$. ∎

Essentially the proof above is a delayed revelation argument. At each round $1\leq t\leq\ell-1$, we only reveal variables that are involved in $S_{t}$. Thus, at round $\ell$, we have revealed all variables that are involved in $\mathcal{S}$. With respect to these variables, the sequence $\mathcal{S}$ happens with probability $p_{\mathcal{S}}$. Condition 5 guarantees that what we have revealed so far does not interfere with the final output (cf. Lemma 7). Hence the final state happens with probability $q_{S_{\ell}}$.

We write $p_{\mathcal{S}}=\prod_{i=1}^{\ell}p_{S_{i}}$ for an independent set sequence $\mathcal{S}$ of length $\ell\geq 0$, with the convention that $p_{\mathcal{S}}=1$ if $\mathcal{S}$ is empty and $\ell=0$. Then Lemma 9 implies the following equality, which was first shown by Kolipaka and Szegedy [26] in the more general (not necessarily extremal) setting of the local lemma.

Corollary 10.

Assume Condition 5 holds. If $q_{\emptyset}>0$, then

$\displaystyle\sum_{\mathcal{S}\text{ s.t.\ }S_{1}=I}p_{\mathcal{S}}q_{\emptyset}=q_{I},$

where $\mathcal{S}$ ranges over independent set sequences and $I$ is an independent set of $G$.

Proof.

First we claim that if $q_{\emptyset}>0$, then Algorithm 2 halts with probability $1$. Conditioned on any log $\mathcal{S}=S_{1},\dots,S_{\ell-1}$, by Lemma 7, the distribution of $\sigma_{\ell}$ at round $\ell$ is $\mu\big{(}\cdot\mid B(S_{\ell-1}^{e})\big{)}$. The probability of getting a desired assignment is thus $\mu\big{(}B([m])\mid B(S_{\ell-1}^{e})\big{)}=\frac{\mu(B([m]))}{\mu(B(S_{\ell-1}^{e}))}\geq\mu(B([m]))=q_{\emptyset}$. Hence the probability that the algorithm does not halt at time $t$ is at most $(1-q_{\emptyset})^{t}$, which goes to $0$ as $t$ goes to infinity.

Then we apply Lemma 9 with $\emptyset$ as the final independent set. The left hand side is the total probability of all possible halting logs whose first independent set is exactly $I$. This is equivalent to getting exactly $I$ in the first step, which happens with probability $q_{I}$. ∎

As a sanity check, the probabilities of all possible logs should sum to $1$ when $q_{\emptyset}>0$ and the algorithm halts with probability $1$. Indeed, by Corollary 10,

$\displaystyle\sum_{\mathcal{S}}p_{\mathcal{S}}q_{\emptyset}=\sum_{I\in\mathcal{I}}\;\;\sum_{\mathcal{S}\text{ s.t.\ }S_{1}=I}p_{\mathcal{S}}q_{\emptyset}=\sum_{I\in\mathcal{I}}q_{I}=1,$

where $\mathcal{S}$ ranges over independent set sequences. In other words,

(10) $\displaystyle\sum_{\mathcal{S}}p_{\mathcal{S}}=\frac{1}{q_{\emptyset}},$

where $\mathcal{S}$ ranges over independent set sequences. This fact was also observed by Knuth [25, Page 86, Theorem F] and by Harvey and Vondrák [23, Corollary 5.28] in more general settings. Our proof here gives a combinatorial explanation of this equality.

Equation (10) holds whenever $q_{\emptyset}>0$. Recall that $q_{\emptyset}$ is shorthand for $q_{\emptyset}(\textbf{p})$, which is

(11) $\displaystyle q_{\emptyset}(\textbf{p})=\sum_{I\in\mathcal{I}}(-1)^{|I|}\prod_{i\in I}p_{i},$

where $\mathcal{I}$ is the collection of independent sets of the dependency graph $G$.

Lemma 11.

Assume Condition 5 holds. If $q_{\emptyset}(\textbf{p})>0$, then $q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})>0$ for any $i\in[m]$ and $0\leq z\leq 1$.

Proof.

By (11),

$\displaystyle q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})=\sum_{I\in\mathcal{I},\;i\notin I}(-1)^{|I|}\prod_{j\in I}p_{j}+z\sum_{I\in\mathcal{I},\;i\in I}(-1)^{|I|}\prod_{j\in I}p_{j}.$

Notice that $\sum_{I\in\mathcal{I},\;i\in I}(-1)^{|I|}\prod_{j\in I}p_{j}=-q_{i}(\textbf{p})<0$ (as $q_{i}(\textbf{p})$ is the probability of exactly the event $A_{i}$ occurring). Hence $q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})\geq q_{\emptyset}(\textbf{p})>0$. ∎

Let $T_{i}$ be the number of resamplings of event $A_{i}$ and let $T$ be the total number of resampling events. Then $T=\sum_{i=1}^{m}T_{i}$.

Lemma 12.

Assume Condition 5 holds. If $q_{\emptyset}(\textbf{p})>0$, then $\mathop{\mathbb{{}E}}\nolimits T_{i}=q_{\emptyset}(\textbf{p})\left(\frac{1}{q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})}\right)^{\prime}\bigg{|}_{z=1}$.

Proof.

By Lemma 11, Equation (10) holds with $p_{i}$ replaced by $p_{i}z$ where $z\in[0,1]$. For a given independent set sequence $\mathcal{S}$, let $T_{i}(\mathcal{S})$ be the total number of occurrences of $A_{i}$ in $\mathcal{S}$. Then we have that

(12) $\displaystyle\sum_{\mathcal{S}}p_{\mathcal{S}}z^{T_{i}(\mathcal{S})}=\frac{1}{q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})}.$

Taking the derivative of (12) with respect to $z$ gives

$\displaystyle\sum_{\mathcal{S}}T_{i}(\mathcal{S})p_{\mathcal{S}}z^{T_{i}(\mathcal{S})-1}=\left(\frac{1}{q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})}\right)^{\prime}.$

Evaluating the equation above at $z=1$:

(13) $\displaystyle\sum_{\mathcal{S}}T_{i}(\mathcal{S})p_{\mathcal{S}}=\left(\frac{1}{q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})}\right)^{\prime}\bigg{|}_{z=1}.$

On the other hand, we have that

$\displaystyle\mathop{\mathbb{{}E}}\nolimits T_{i}=\sum_{\mathcal{S}}\mathop{\mathrm{Pr}}\nolimits_{\mathrm{PRS}}\left(\text{the log is $\mathcal{S}$}\right)T_{i}(\mathcal{S})=\sum_{\mathcal{S}}p_{\mathcal{S}}q_{\emptyset}(\textbf{p})T_{i}(\mathcal{S})=q_{\emptyset}(\textbf{p})\left(\frac{1}{q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})}\right)^{\prime}\bigg{|}_{z=1},$

where the second equality is by Lemma 9 and the third is by (13).

This completes the proof. ∎

Theorem 13.

Assume Condition 5 holds. If q>0q_{\emptyset}>0, then 𝔼T=i=1mqiq\mathop{\mathbb{{}E}}\nolimits T=\sum_{i=1}^{m}\frac{q_{i}}{q_{\emptyset}}.

Proof.

Clearly $\mathop{\mathbb{{}E}}\nolimits T=\sum_{i=1}^{m}\mathop{\mathbb{{}E}}\nolimits T_{i}$. By Lemma 12, all we need to show is that

(14) $\displaystyle q_{\emptyset}(\textbf{p})\left(\frac{1}{q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})}\right)^{\prime}\bigg{|}_{z=1}=\frac{q_{i}(\textbf{p})}{q_{\emptyset}(\textbf{p})}.$

This is because

$\displaystyle q_{\emptyset}^{\prime}(p_{1},\dots,p_{i}z,\dots,p_{m})=\sum_{i\in J,\;J\in\mathcal{I}}(-1)^{|J|}\prod_{j\in J}p_{j}=-q_{i}(\textbf{p}),$

and thus

$\displaystyle\left(\frac{1}{q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})}\right)^{\prime}=\frac{-q_{\emptyset}^{\prime}(p_{1},\dots,p_{i}z,\dots,p_{m})}{q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})^{2}}=\frac{q_{i}(\textbf{p})}{q_{\emptyset}(p_{1},\dots,p_{i}z,\dots,p_{m})^{2}}.$

It is easy to see that (14) follows as we set $z=1$, and the theorem is shown. ∎
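As a small numerical sanity check of Theorem 13 (reusing the `q` helper sketched after Theorem 2), consider three bad events “vertex $v$ is a sink” on a triangle, each of probability $1/4$, whose dependency graph is the triangle itself; the predicted expected number of resamplings matches $Z_{\mathrm{sink},1}/Z_{\mathrm{sink},0}=6/2=3$ for sink-free orientations of the triangle (see Section 4.1).

```python
# Triangle instance: bad events 1, 2, 3 ("vertex v is a sink"), each of
# probability (1/2)^2 = 1/4, with a triangle dependency graph.
neighbours = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
p = {1: 0.25, 2: 0.25, 3: 0.25}

q_empty = q([], p, neighbours)                        # = 1 - 3/4 = 1/4
expected_T = sum(q([i], p, neighbours) for i in p) / q_empty
print(expected_T)                                     # 3.0
```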

The quantity $\sum_{i=1}^{m}\frac{q_{i}}{q_{\emptyset}}$ is not always easy to bound. Kolipaka and Szegedy [26] have shown that when the probability vector $\textbf{p}$ satisfies Shearer’s condition with a constant “slack”, the running time is in fact linear in the number of events, even in the more general setting. We restate their result in our notation.

Theorem 14 ([26, Theorem 5]).

Let $d\geq 2$ be a positive integer and $p_{c}=\frac{(d-1)^{(d-1)}}{d^{d}}$. Let $p=\max_{i\in[m]}\{p_{i}\}$. If $G$ has maximum degree $d$ and $p<p_{c}$, then $\mathop{\mathbb{{}E}}\nolimits T\leq\frac{p}{p_{c}-p}\cdot m$.

4. Applications of Algorithm 2

4.1. Sink-free Orientations

The goal of this application is to sample a sink-free orientation. Given a graph $G=(V,E)$, an orientation of edges is a mapping $\sigma$ so that $\sigma(e)=(u,v)$ or $(v,u)$ where $e=(u,v)\in E$. A sink under orientation $\sigma$ is a vertex $v$ so that for any adjacent edge $e=(u,v)$, $\sigma(e)=(u,v)$. A sink-free orientation is an orientation so that no vertex is a sink.

  • Name:

    Sampling Sink-free Orientations

  • Instance:

    A graph $G$.

  • Output:

    A uniform sink-free orientation.

The first algorithm for this problem is given by Bubley and Dyer [6], using Markov chains and path coupling techniques.

In this application, we associate with each edge a random variable, whose possible values are $(u,v)$ or $(v,u)$. For each vertex $v$, we associate it with a bad event $A_{v}$, which happens when $v$ is a sink. Thus the graph $G$ itself is also the dependency graph. Condition 5 is satisfied, because if a vertex is a sink, then none of its neighbours can be a sink. Thus we may apply Algorithm 2, which yields Algorithm 3. This is the “sink-popping” algorithm given by Cohn, Pemantle, and Propp [7].

  (1) Orient each edge independently and uniformly at random.

  (2) While there is at least one sink, re-orient all edges that are adjacent to a sink.

  (3) Output the current assignment.

Algorithm 3 Sample Sink-free Orientations
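A minimal Python sketch of Algorithm 3 follows, with the graph given as a dictionary of neighbour sets (our own illustrative representation); each edge is stored as a sorted pair and the orientation records its head.

```python
import random

def sample_sink_free_orientation(adj, rng=random):
    # adj: dict {vertex: set of neighbours} of an undirected graph.
    # Returns a dict mapping each edge (u, v) with u < v to its head vertex.
    edges = {tuple(sorted((u, v))) for u in adj for v in adj[u]}
    orientation = {e: rng.choice(e) for e in edges}
    while True:
        sinks = [v for v in adj if adj[v] and
                 all(orientation[tuple(sorted((u, v)))] == v for u in adj[v])]
        if not sinks:
            return orientation
        # Sinks are pairwise non-adjacent (Condition 5), so re-orienting
        # all edges incident to sinks is conflict-free.
        for v in sinks:
            for u in adj[v]:
                e = tuple(sorted((u, v)))
                orientation[e] = rng.choice(e)
```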

Let $Z_{\mathrm{sink},0}$ be the number of sink-free orientations, and let $Z_{\mathrm{sink},1}$ be the number of orientations having exactly one sink. Then Theorem 13 specializes into the following.

Theorem 15.

The expected number of resampled sinks in Algorithm 3 is $\frac{Z_{\mathrm{sink},1}}{Z_{\mathrm{sink},0}}$.

It is easy to see that a graph has a sink-free orientation if and only if the graph is not a tree. The next theorem gives an explicit bound on $\frac{Z_{\mathrm{sink},1}}{Z_{\mathrm{sink},0}}$ when sink-free orientations exist.

Theorem 16.

Let $G$ be a connected graph on $n$ vertices. If $G$ is not a tree, then $\frac{Z_{\mathrm{sink},1}}{Z_{\mathrm{sink},0}}\leq n(n-1)$, where $n=\left|V(G)\right|$.

Proof.

Consider an orientation of the edges of $G$ with a unique sink at vertex $v$. We give a systematic procedure for transforming this orientation to a sink-free orientation. Since $G$ is connected and not a tree, there exists an (undirected) path $\Pi$ in $G$ of the form $v=v_{0},v_{1},\ldots,v_{\ell-1},v_{\ell}=v_{k}$, where the vertices $v_{0},v_{1},\ldots,v_{\ell-1}$ are all distinct and $0\leq k\leq\ell-2$. In other words, $\Pi$ is a simple path of length $\ell-1$ followed by a single edge back to some previously visited vertex. We will choose a canonical path of this form (depending only on $G$ and not on the current orientation) for each start vertex $v$.

We now proceed as follows. Since $v$ is a sink, the first edge on $\Pi$ is directed $(v_{1},v_{0})$. Reverse the orientation of this edge so that it is now oriented $(v_{0},v_{1})$. This operation destroys the sink at $v=v_{0}$ but may create a new sink at $v_{1}$. If $v_{1}$ is not a sink then halt. Otherwise, reverse the orientation of the second edge of $\Pi$ from $(v_{2},v_{1})$ to $(v_{1},v_{2})$. Continue in this fashion: if we reach $v_{i}$ and it is not a sink then halt; otherwise reverse the orientation of the $(i+1)$th edge from $(v_{i+1},v_{i})$ to $(v_{i},v_{i+1})$. This procedure must terminate with a sink-free graph before we reach $v_{\ell}$. To see this, note that if we reach the vertex $v_{\ell-1}$ then the final edge of $\Pi$ must be oriented $(v_{\ell-1},v_{\ell})$, otherwise the procedure would have terminated already at vertex $v_{k}(=v_{\ell})$.

The effect of the above procedure is to reverse the orientation of the edges on some initial segment $v_{0},\ldots,v_{i}$ of $\Pi$. To put the procedure into reverse, we just need to know the identity of the vertex $v_{i}$. So our procedure associates at most $n$ orientations having a single sink at vertex $v$ with each sink-free orientation. There are $n(n-1)$ choices for the pair $(v,v_{i})$, and hence $n(n-1)$ single-sink orientations associated with each sink-free orientation. This establishes the result. ∎

Remark.

The bound in Theorem 16 is optimal up to a factor of $2$. Consider a cycle of length $n$. Then there are $2$ sink-free orientations, and $n(n-1)$ single-sink orientations.

Theorem 16 and Theorem 15 together yield an $n^{2}$ bound on the expected number of resamplings that occur during a run of Algorithm 3. A cycle of length $n$ is an interesting special case. Consider the number of clockwise oriented edges during a run of the algorithm. It is easy to check that this number evolves as an unbiased lazy simple random walk on $[0,n]$. Since the walk starts close to $n/2$ with high probability, we know that it will take $\Omega(n^{2})$ steps to reach one of the sink-free states, i.e., $0$ or $n$.

On the other hand, if $G$ is a regular graph of degree $\Delta\geq 3$, then we get a much better linear bound from Theorem 14. In the case $\Delta=3$, we have $p_{c}=4/27$, $p=1/8$ and $p/(p_{c}-p)=27/5$. So the expected number of resamplings is bounded by $27n/5$. The constant in the bound improves as $\Delta$ increases. Conversely, since the expected running time is exact, we can also apply Theorem 14 to give an upper bound on $\frac{Z_{\mathrm{sink},1}}{Z_{\mathrm{sink},0}}$ when $G$ is a regular graph.

4.2. Rooted Spanning Trees

Given a graph $G=(V,E)$ with a special vertex $r$, we want to sample a uniform spanning tree with $r$ as the root.

  • Name:

    Sampling Rooted Spanning Trees

  • Instance:

    A graph $G$ with a vertex $r$.

  • Output:

    A uniform spanning tree rooted at $r$.

Of course, any given spanning tree may be rooted at any vertex $r$, so there is no real difference between rooted and unrooted spanning trees. However, since this approach to sampling generates an oriented tree, it is easier to think of the trees as being rooted at a particular vertex $r$.

For every vertex $v$ other than $r$, we randomly assign it an arrow pointing to one of its neighbours; this is the random variable associated with $v$. We will think of this random variable as an arrow $v\to s(v)$ pointing from $v$ to its successor $s(v)$. The arrows point out an oriented subgraph of $G$ with $n-1$ edges $\{\{v,s(v)\}:v\in V\setminus\{r\}\}$, directed as specified by the arrows. The constraint for this subgraph to be a tree rooted at $r$ is that it contains no directed cycles. Note that there are $2^{|E|-|V|+\kappa(G)}$ (undirected) cycles in $G$, where $\kappa(G)$ is the number of connected components of $G$. Hence, we have possibly exponentially many constraints.

Two cycles are dependent if they share at least one vertex. We claim that Condition 5 is satisfied. Suppose a cycle $C$ is present, and $C^{\prime}\neq C$ is another cycle that shares at least one vertex with $C$. If $C^{\prime}$ is also present, then we may start from any vertex $v\in C\cap C^{\prime}$ and follow the arrows $v\to v^{\prime}$. Since both $C$ and $C^{\prime}$ are present, it must be that $v^{\prime}\in C\cap C^{\prime}$ as well. Continuing this argument we see that $C=C^{\prime}$. Contradiction!

As Condition 5 is met, we may apply Algorithm 2, yielding Algorithm 4. This is exactly the “cycle-popping” algorithm by Wilson [39], as described in [33].

  (1) Let $T$ be an empty set. For each vertex $v\neq r$, randomly choose a neighbour $u\in\Gamma(v)$ and add an edge $(v,u)$ to $T$.

  (2) While there is at least one cycle in $T$, remove all edges in all cycles, and for all vertices whose edges are removed, redo step (1).

  (3) Output the current set of edges.

Algorithm 4 Sample Rooted Spanning Trees
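A minimal Python sketch of Algorithm 4 is given below; the graph is again a dictionary of neighbour sets and the arrows are stored as a successor map, both our own illustrative choices.

```python
import random

def sample_rooted_spanning_tree(adj, root, rng=random):
    # adj: dict {vertex: set of neighbours} of a connected undirected graph.
    # Each non-root vertex keeps an arrow to a uniformly random neighbour;
    # all directed cycles are popped (resampled) until none remain.
    succ = {v: rng.choice(list(adj[v])) for v in adj if v != root}
    while True:
        on_cycle = set()
        for v in succ:
            path, u = set(), v
            while u != root and u not in path and u not in on_cycle:
                path.add(u)
                u = succ[u]
            if u in path:                        # found a new directed cycle
                w = u
                while True:
                    on_cycle.add(w)
                    w = succ[w]
                    if w == u:
                        break
        if not on_cycle:
            return {(v, succ[v]) for v in succ}  # arrows of the rooted tree
        for v in on_cycle:                       # pop all cycles at once
            succ[v] = rng.choice(list(adj[v]))
```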

Let $Z_{\mathrm{tree},0}$ be the number of possible assignments of arrows to the vertices of $G$ that yield a (directed) tree with root $r$, and let $Z_{\mathrm{tree},1}$ be the number of assignments that yield a unicyclic subgraph. The next theorem gives an explicit bound on $\frac{Z_{\mathrm{tree},1}}{Z_{\mathrm{tree},0}}$.

Theorem 17.

Suppose $G$ is a connected graph on $n$ vertices, with $m$ edges. Then $\frac{Z_{\mathrm{tree},1}}{Z_{\mathrm{tree},0}}\leq mn$.

Proof.

Consider an assignment of arrows to the vertices of $G$ that forms a unicyclic subgraph. This unicyclic subgraph has two components: a directed tree with root $r$, and a directed cycle with a number of directed subtrees rooted on the cycle. This is because if we remove the unicyclic component, the rest of the graph has one fewer edge than vertices and no cycles, and hence must be a tree rooted at $r$ spanning the remaining vertices.

As $G$ is connected, there must be an edge in $G$ joining the two components; let this edge be $\{v_{0},v_{1}\}$, where $v_{0}$ is in the tree component and $v_{1}$ is in the unicyclic component. Now extend this edge to a path $v_{0},v_{1},\ldots,v_{\ell}$, by following arrows until we reach the cycle. Thus, $v_{1}\to v_{2},\,v_{2}\to v_{3},\,\ldots,\,v_{\ell-1}\to v_{\ell}$ are all arrows, and $v_{\ell}$ is the first vertex that lies on the cycle. (It may happen that $\ell=1$.) Let $v_{\ell}\to v_{\ell+1}$ be the arrow out of $v_{\ell}$. Now reassign the arrows from vertices $v_{1},\ldots,v_{\ell}$ thus: $v_{\ell}\to v_{\ell-1},\,\ldots,\,v_{2}\to v_{1},\,v_{1}\to v_{0}$. Notice that the result is a directed tree rooted at $r$.

As before, we would like to bound the number of unicyclic subgraphs associated with a given tree by this procedure. We claim that the procedure can be reversed given just two pieces of information, namely the edge $\{v_{\ell},v_{\ell+1}\}$ and the vertex $v_{0}$. Note that, even though the edge $\{v_{\ell},v_{\ell+1}\}$ is undirected, we can disambiguate the endpoints, as $v_{\ell}$ is the vertex closer to the root $r$. The vertices $v_{\ell-1},\ldots,v_{1}$ are easy to recover, as they are the vertices on the unique path in the tree from $v_{\ell}$ to $v_{0}$. To recover the unicyclic subgraph, we just need to reassign the arrows for vertices $v_{1},\ldots,v_{\ell}$ as follows: $v_{1}\to v_{2},\,\ldots,\,v_{\ell}\to v_{\ell+1}$.

As the procedure can be reversed knowing one edge and one vertex, the number of unicyclic graphs associated with each tree can be at most $mn$. ∎

Theorem 17 combined with Theorem 13 yields an $mn$ upper bound on the expected number of “popped cycles” during a run of Algorithm 4.

On the other hand, take a cycle of length $n$. There are $n$ spanning trees with a particular root $v$, and there are $\Omega(n^{3})$ unicyclic graphs (here a cycle has to be of length $2$). Thus the ratio is $\Omega(n^{2})=\Omega(mn)$ since $m=n$, matching the bound of Theorem 17. Moreover, it is known that the cycle-popping algorithm may take $\Omega(n^{3})$ time, for example on a dumbbell graph [33].

4.3. Extremal CNF formulas

A classical setting in the study of the algorithmic Lovász Local Lemma is to find satisfying assignments of $k$-CNF formulas,\footnote{As usual in the study of the Lovász Local Lemma, by “$k$-CNF” we mean that every clause has size exactly $k$.} when the number of appearances of every variable is bounded by $d$. Theorem 1 guarantees the existence of a satisfying assignment as long as $d\leq\frac{2^{k}}{ek}+1$. On the other hand, sampling is apparently harder than searching in this setting. As shown in [4, Corollary 30], it is NP-hard to approximately sample satisfying assignments when $d\geq 5\cdot 2^{k/2}$, even restricted to the special case of monotone formulas.

Meanwhile, sink-free orientations can be recast in terms of CNF formulas. Every vertex in the graph is mapped to a clause, and every edge is a variable. Thus every variable appears exactly twice, and we require that the two literals of the same variable are always opposite. Interpreting an orientation from $u$ to $v$ as making the literal in the clause corresponding to $v$ false, the “sink-free” requirement is thus “not all literals in a clause are false”. Hence a sink-free orientation is just a satisfying assignment for the corresponding CNF formula.

To apply Algorithm 2, we need to require that the CNF formula satisfies Condition 5. Such formulas are defined as follows.

Definition 18.

We call a CNF formula extremal if, for every two clauses $C_{i}$ and $C_{j}$ that share at least one variable, there exists some variable $x$ appearing in both $C_{i}$ and $C_{j}$ such that the two occurrences are one positive and one negative literal.

Let $C_{1},\dots,C_{m}$ be the clauses of a formula $\varphi$. Then define the bad event $A_{i}$ as the set of unsatisfying assignments of clause $C_{i}$. For an extremal CNF formula, these bad events satisfy Condition 5. This is because if $A_{i}\sim A_{j}$, then by Definition 18 there exists a variable $x\in{\sf var}(A_{i})\cap{\sf var}(A_{j})$ such that the unsatisfying assignments of $C_{i}$ and $C_{j}$ differ on $x$. Hence $A_{i}\cap A_{j}=\emptyset$.
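As an illustration, the Python sketch below checks Definition 18 for a clause list in DIMACS-style integer encoding and runs Algorithm 2 on an extremal formula; the encoding and function names are our own illustrative choices.

```python
import random

def is_extremal(clauses):
    # Clauses are lists of non-zero integers: literal v is positive, -v negative.
    for i, Ci in enumerate(clauses):
        for Cj in clauses[i + 1:]:
            shared = {abs(x) for x in Ci} & {abs(x) for x in Cj}
            if shared and not any((x in Ci and -x in Cj) or (-x in Ci and x in Cj)
                                  for x in shared):
                return False
    return True

def sample_satisfying_assignment(clauses, n_vars, rng=random):
    # Partial rejection sampling (Algorithm 2) specialised to an extremal CNF:
    # resample every variable of every violated clause until none is violated.
    assert is_extremal(clauses)
    sigma = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    while True:
        bad = [C for C in clauses if not any(sigma[abs(x)] == (x > 0) for x in C)]
        if not bad:
            return sigma
        for v in {abs(x) for C in bad for x in C}:
            sigma[v] = rng.random() < 0.5
```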

In this formulation, if the size of CiC_{i} is kk, then the corresponding event AiA_{i} happens with probability pi=Pr(Ai)=2kp_{i}=\mathop{\mathrm{Pr}}\nolimits(A_{i})=2^{-k}, where variables are sampled uniformly and independently.333We note that to find a satisfying assignment it is sometimes beneficial to consider non-uniform distributions. See [13]. Suppose each variable appears at most dd times. Then the maximum degree in the dependency graph is at most Δ=(d1)k\Delta=(d-1)k. Note that in Theorem 14, pc=(Δ1)(Δ1)ΔΔ1eΔp_{c}=\frac{(\Delta-1)^{(\Delta-1)}}{\Delta^{\Delta}}\geq\frac{1}{e\Delta}. Thus if d2kek+1d\leq\frac{2^{k}}{ek}+1, then pi=2k<pcp_{i}=2^{-k}<p_{c} and we may apply Theorem 14 to obtain a polynomial time sampling algorithm.

Corollary 19.

For extremal kk-CNF formulas where each variable appears in at most dd clauses, if d2kek+1d\leq\frac{2^{k}}{ek}+1, then Algorithm 2 samples satisfying assignments uniformly at random, with O(m)O(m) expected resamplings where mm is the number of clauses.

The condition in Corollary 19 essentially matches the condition of Theorem 1. On the other hand, if we only require Shearer’s condition as in Theorem 2, Algorithm 2 is not necessarily efficient. More precisely, let ZCNF,0Z_{\mathrm{CNF},0} be the number of satisfying assignments, and ZCNF,1Z_{\mathrm{CNF},1} be the number of assignments satisfying all but one clause. If we only require Shearer’s condition in Theorem 2, then the expected number of resamplings ZCNF,1ZCNF,0\frac{Z_{\mathrm{CNF},1}}{Z_{\mathrm{CNF},0}} can be exponential, as shown in the next example.

Example.

Construct an extremal CNF formula φ=C1C2C4m\varphi=C_{1}\wedge C_{2}\wedge\dots\wedge C_{4m} as follows. Let C1:=x1C_{1}:=x_{1}. Then the variable x1x_{1} is pinned to 11 to satisfy C1C_{1}. Let C2:=x¯1y1y2C_{2}:=\overline{x}_{1}\vee y_{1}\vee y_{2}, C3:=x¯1y1y¯2C_{3}:=\overline{x}_{1}\vee y_{1}\vee\overline{y}_{2}, and C4:=x¯1y¯1y2C_{4}:=\overline{x}_{1}\vee\overline{y}_{1}\vee y_{2}. Then y1y_{1} and y2y_{2} are also pinned to 11 to satisfy all C1C4C_{1}-C_{4}.

We continue this construction by letting

C4k+1\displaystyle C_{4k+1} :=y¯2k1y¯2kxk+1,\displaystyle:=\overline{y}_{2k-1}\vee\overline{y}_{2k}\vee x_{k+1},
C4k+2\displaystyle C_{4k+2} :=x¯k+1y2k+1y2k+2,\displaystyle:=\overline{x}_{k+1}\vee y_{2k+1}\vee y_{2k+2},
C4k+3\displaystyle C_{4k+3} :=x¯k+1y2k+1y¯2k+2,\displaystyle:=\overline{x}_{k+1}\vee y_{2k+1}\vee\overline{y}_{2k+2},
C4k+4\displaystyle C_{4k+4} :=x¯k+1y¯2k+1y2k+2,\displaystyle:=\overline{x}_{k+1}\vee\overline{y}_{2k+1}\vee y_{2k+2},

for all 1km11\leq k\leq m-1. It is easy to see by induction that to satisfy all of them, all xix_{i}’s and yiy_{i}’s have to be 11. Moreover, one can verify that this is indeed an extremal formula. Thus ZCNF,0=1Z_{\mathrm{CNF},0}=1. Moreover, since φ\varphi has a satisfying assignment and is extremal, Shearer’s condition is satisfied. Note also that φ\varphi is not a 33-CNF formula as C1C_{1} contains a single variable.

On the other hand, if we are allowed to ignore C1C_{1}, then x1x_{1} can be 0. In that case, there are 33 choices of y1y_{1} and y2y_{2} that allow x2x_{2} to be 0 as well. Iterating this argument, there are at least 3m3^{m} assignments violating only C1C_{1}, in which x1=x2==xm=0x_{1}=x_{2}=\dots=x_{m}=0. This implies that ZCNF,13mZ_{\mathrm{CNF},1}\geq 3^{m}, and hence ZCNF,1ZCNF,03m\frac{Z_{\mathrm{CNF},1}}{Z_{\mathrm{CNF},0}}\geq 3^{m}. Due to Theorem 13, the expected running time of Algorithm 2 on this formula φ\varphi is exponential in mm.
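
For concreteness, the example formula can be generated mechanically; the sketch below (our own encoding, with x_i as variable i and y_i as variable m+i) writes down the 4m clauses, and one can check with the is_extremal sketch above that the result is extremal.

```python
def hard_extremal_formula(m):
    """The formula C_1, ..., C_{4m} from the example above.  Variables
    x_1..x_m are numbered 1..m and y_1..y_{2m} are numbered m+1..3m;
    positive / negative integers denote positive / negative literals."""
    x = lambda i: i
    y = lambda i: m + i
    clauses = [[x(1)],
               [-x(1),  y(1),  y(2)],
               [-x(1),  y(1), -y(2)],
               [-x(1), -y(1),  y(2)]]
    for k in range(1, m):
        clauses += [[-y(2 * k - 1), -y(2 * k), x(k + 1)],
                    [-x(k + 1),  y(2 * k + 1),  y(2 * k + 2)],
                    [-x(k + 1),  y(2 * k + 1), -y(2 * k + 2)],
                    [-x(k + 1), -y(2 * k + 1),  y(2 * k + 2)]]
    return clauses

assert len(hard_extremal_formula(5)) == 20   # 4m clauses; the unique satisfying assignment is all-ones
```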

We will discuss more on sampling satisfying assignments of a kk-CNF formula in Section 7.1.

5. General Partial Rejection Sampling

In this section we give a general version of Algorithm 2 which can be applied to arbitrary instances in the variable framework, even without Condition 5.

Recall the notation introduced at the beginning of Section 2. So, {X1,,Xn}\{X_{1},\dots,X_{n}\} is a set of random variables, each with its own distribution and range DiD_{i}, and {A1,,Am}\{A_{1},\dots,A_{m}\} is a set of bad events that depend on XiX_{i}’s. The dependencies between events are encoded in the dependency graph G=(V,E)G=(V,E). As before, we will use the idea of a resampling table. Recall that σ=σt={Xi,ji,t1in}\sigma=\sigma_{t}=\{X_{i,j_{i,t}}\mid 1\leq i\leq n\} denotes the current assignment of variables at round tt, i.e., the elements of the resampling table that are active at time tt. Given σ\sigma, let 𝖡𝖺𝖽(σ){\sf Bad}(\sigma) be the set of occurring bad events; that is, 𝖡𝖺𝖽(σ)={iσAi}{\sf Bad}(\sigma)=\{i\mid\sigma\in A_{i}\}. For a subset SVS\subset V, let S\partial S be the boundary of SS; that is, S={iiS and jS,(i,j)E}\partial S=\{i\mid i\notin S\text{ and }\exists j\in S,\;(i,j)\in E\}. Moreover, let

𝗏𝖺𝗋(S):=iS𝗏𝖺𝗋(Ai).\displaystyle{\sf var}(S):=\bigcup_{i\in S}{\sf var}(A_{i}).

Let σ|S\sigma|_{S} be the partial assignment of σ\sigma restricted to 𝗏𝖺𝗋(S){\sf var}(S). For an event AiA_{i} and SVS\subseteq V, we write Aiσ|SA_{i}\perp\sigma|_{S} if either 𝗏𝖺𝗋(Ai)𝗏𝖺𝗋(S)={\sf var}(A_{i})\cap{\sf var}(S)=\emptyset, or there is no way to extend the partial assignment σ|S\sigma|_{S} to all variables so that AiA_{i} holds. Otherwise Ai⟂̸σ|SA_{i}\not\perp\sigma|_{S}.

Definition 20.

A set SVS\subseteq V is unblocking under σ\sigma if for every iSi\in\partial S, Aiσ|SA_{i}\perp\sigma|_{S}.

Given σ\sigma, our goal is to resample a set of events that is unblocking and contains 𝖡𝖺𝖽(σ){\sf Bad}(\sigma). Such a set must exist because VV is unblocking (V\partial V is empty) and 𝖡𝖺𝖽(σ)V{\sf Bad}(\sigma)\subseteq V. However, we want to resample as few events as possible.

R𝖡𝖺𝖽(σ)R\leftarrow{\sf Bad}(\sigma) ;
  // RR is the set of events that will be resampled.
NN\leftarrow\emptyset ;
  // NN is the set of events that will not be resampled.
URNU\leftarrow\partial R\setminus N;
while UU\neq\emptyset do
 for iUi\in U do
    if Ai⟂̸σ|RA_{i}\not\perp\sigma|_{R} then
       RR{i}R\leftarrow R\cup\{i\};
    else
       NN{i}N\leftarrow N\cup\{i\};
      end if
    
   end for
 URNU\leftarrow\partial R\setminus N;
 
end while
return RR
Algorithm 5 Select the resampling set 𝖱𝖾𝗌(σ){\sf Res}(\sigma) under an assignment σ\sigma

Intuitively, we start by setting the resampling set R0R_{0} to the set of bad events 𝖡𝖺𝖽(σ){\sf Bad}(\sigma). We then mark resampling events in rounds, similarly to a breadth-first search. Let RtR_{t} be the resampling set of round t0t\geq 0. In round t+1t+1, let AiA_{i} be an event on the boundary of RtR_{t} that has not been marked yet. We mark it “resampling” if the partial assignment on the variables shared by AiA_{i} and RtR_{t} can be extended so that AiA_{i} occurs; otherwise we mark it “not resampling”. We continue this process until there is no unmarked event left on the boundary of the current RR. An event outside of Γ+(R)\Gamma^{+}(R) may be left unmarked at the end of Algorithm 5. Note that once an event is marked “not resampling”, it is never added into the resampling set, since RR only grows during the algorithm.

In Algorithm 5, we are dynamically updating RR during each iteration of going through UU. This is potentially beneficial as an event AiA_{i} may become incompatible with RR after some event AjA_{j} is added, where both i,jUi,j\in U.

We fix an arbitrary ordering a priori for choosing iUi\in U in the “for” loop of Algorithm 5. The output of Algorithm 5 is then a deterministic function of σ\sigma; call it 𝖱𝖾𝗌(σ){\sf Res}(\sigma).
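
The following Python sketch mirrors Algorithm 5. It assumes the instance is described by neighbours (adjacency lists of the dependency graph), var_of (the variable set of each event) and a predicate can_occur(i, partial) deciding whether the partial assignment partial can be extended so that A_i occurs; all of these names are our own.

```python
def select_resampling_set(bad, neighbours, var_of, can_occur, sigma):
    """Algorithm 5 (sketch): grow the resampling set from the occurring bad
    events.  bad: indices of occurring bad events; sigma: current full
    assignment (dict variable -> value); var_of[i] is a set of variables."""
    R = set(bad)           # events that will be resampled
    N = set()              # events marked "not resampling"
    while True:
        U = {j for i in R for j in neighbours[i]} - R - N    # unmarked boundary of R
        if not U:
            return R
        for i in sorted(U):                                  # fixed a-priori order
            vars_R = {v for j in R for v in var_of[j]}       # R is updated dynamically
            # i is on the boundary, so it shares at least one variable with R
            partial = {v: sigma[v] for v in var_of[i] & vars_R}
            if can_occur(i, partial):    # A_i is not blocked by sigma restricted to R
                R.add(i)
            else:
                N.add(i)
```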

Lemma 21.

Let σ\sigma be an assignment. For any i𝖱𝖾𝗌(σ)i\in\partial{\sf Res}(\sigma), Aiσ|𝖱𝖾𝗌(σ)A_{i}\perp\sigma|_{{\sf Res}(\sigma)}.

Proof.

Since i𝖱𝖾𝗌(σ)i\in\partial{\sf Res}(\sigma), it must have been marked. Moreover, i𝖱𝖾𝗌(σ)i\not\in{\sf Res}(\sigma), so it must be marked as “not resampling”. Thus, there exists an intermediate set R𝖱𝖾𝗌(σ)R\subseteq{\sf Res}(\sigma) during the execution of Algorithm 5 such that Aiσ|RA_{i}\perp\sigma|_{R} and iRi\in\partial R. It implies that AiA_{i} is disjoint from the partial assignment of σ\sigma restricted to 𝗏𝖺𝗋(Ai)𝗏𝖺𝗋(R){\sf var}(A_{i})\cap{\sf var}(R). However,

𝗏𝖺𝗋(Ai)𝗏𝖺𝗋(R)𝗏𝖺𝗋(Ai)𝗏𝖺𝗋(𝖱𝖾𝗌(σ))\displaystyle{\sf var}(A_{i})\cap{\sf var}(R)\subseteq{\sf var}(A_{i})\cap{\sf var}({\sf Res}(\sigma))

as R𝖱𝖾𝗌(σ)R\subseteq{\sf Res}(\sigma). We have that Aiσ|𝖱𝖾𝗌(σ)A_{i}\perp\sigma|_{{\sf Res}(\sigma)}. ∎

If Condition 5 is met, then 𝖱𝖾𝗌(σ)=𝖡𝖺𝖽(σ){\sf Res}(\sigma)={\sf Bad}(\sigma). This is because at the first step, R=𝖡𝖺𝖽(σ)R={\sf Bad}(\sigma). By Condition 5, for any i𝖡𝖺𝖽(σ)i\in\partial{\sf Bad}(\sigma), AiA_{i} is disjoint from all AjA_{j}’s where j𝖡𝖺𝖽(σ)j\in{\sf Bad}(\sigma) and AiAjA_{i}\sim A_{j}. Since AjA_{j} occurs under σ\sigma, Aiσ|RA_{i}\perp\sigma|_{R}. Algorithm 5 halts in the first iteration. In this case, since the resampling set is just the (independent) set of occurring bad events, the later Algorithm 6 coincides with Algorithm 2.

The key property of 𝖱𝖾𝗌(σ){\sf Res}(\sigma) is that if we change the assignment outside of 𝖱𝖾𝗌(σ){\sf Res}(\sigma), then 𝖱𝖾𝗌(σ){\sf Res}(\sigma) does not change, unless the new assignment introduces a new bad event outside of 𝖱𝖾𝗌(σ){\sf Res}(\sigma). More formally, we have the following lemma.

Lemma 22.

Let σ\sigma be an assignment. Let σ\sigma^{\prime} be another assignment such that 𝖡𝖺𝖽(σ)𝖱𝖾𝗌(σ){\sf Bad}(\sigma^{\prime})\subseteq{\sf Res}(\sigma) and such that σ\sigma and σ\sigma^{\prime} agree on all variables in 𝗏𝖺𝗋(𝖱𝖾𝗌(σ))=i𝖱𝖾𝗌(σ)𝗏𝖺𝗋(Ai){\sf var}({\sf Res}(\sigma))=\bigcup_{i\in{\sf Res}(\sigma)}{\sf var}(A_{i}). Then, 𝖱𝖾𝗌(σ)=𝖱𝖾𝗌(σ){\sf Res}(\sigma^{\prime})={\sf Res}(\sigma).

Proof.

Let Rt(σ),Nt(σ)R_{t}(\sigma),N_{t}(\sigma) be the intermediate sets R,NR,N, respectively, at time tt of the execution of Algorithm 5 under σ\sigma. Thus R0(σ)=𝖡𝖺𝖽(σ)R_{0}(\sigma)={\sf Bad}(\sigma) and R0(σ)R1(σ)𝖱𝖾𝗌(σ)R_{0}(\sigma)\subseteq R_{1}(\sigma)\subseteq\dots\subseteq{\sf Res}(\sigma). Moreover, N0(σ)N1(σ)N_{0}(\sigma)\subseteq N_{1}(\sigma)\subseteq\cdots. We will show by induction that Rt(σ)=Rt(σ)R_{t}(\sigma)=R_{t}(\sigma^{\prime}) and Nt(σ)=Nt(σ)N_{t}(\sigma)=N_{t}(\sigma^{\prime}) for any t0t\geq 0.

For the base case of t=0t=0, by the condition of the lemma, for every i𝖡𝖺𝖽(σ)𝖱𝖾𝗌(σ)i\in{\sf Bad}(\sigma)\subseteq{\sf Res}(\sigma), the assignments σ\sigma and σ\sigma^{\prime} agree on 𝗏𝖺𝗋(Ai){\sf var}(A_{i}); or equivalently σ|𝖱𝖾𝗌(σ)=σ|𝖱𝖾𝗌(σ)\sigma|_{{\sf Res}(\sigma)}=\sigma^{\prime}|_{{\sf Res}(\sigma)}. Together with 𝖡𝖺𝖽(σ)𝖱𝖾𝗌(σ){\sf Bad}(\sigma^{\prime})\subseteq{\sf Res}(\sigma), it implies that 𝖡𝖺𝖽(σ)=𝖡𝖺𝖽(σ){\sf Bad}(\sigma)={\sf Bad}(\sigma^{\prime}) and R0(σ)=R0(σ)R_{0}(\sigma)=R_{0}(\sigma^{\prime}). Moreover, N0(σ)=N0(σ)=N_{0}(\sigma)=N_{0}(\sigma^{\prime})=\emptyset.

For the induction step t>0t>0, we have that Rt1(σ)=Rt1(σ)𝖱𝖾𝗌(σ)R_{t-1}(\sigma)=R_{t-1}(\sigma^{\prime})\subseteq{\sf Res}(\sigma) and Nt1(σ)=Nt1(σ)N_{t-1}(\sigma)=N_{t-1}(\sigma^{\prime}). Let R=Rt1(σ)=Rt1(σ)R=R_{t-1}(\sigma)=R_{t-1}(\sigma^{\prime}) and N=Nt1(σ)=Nt1(σ)N=N_{t-1}(\sigma)=N_{t-1}(\sigma^{\prime}). Then we will go through U=RNU=\partial R\setminus N, which is the same for both σ\sigma and σ\sigma^{\prime}. Moreover, while marking individual events “resampling” or not, it is sufficient to look at only σ|R=σ|R\sigma|_{R}=\sigma^{\prime}|_{R} since R𝖱𝖾𝗌(σ)R\subseteq{\sf Res}(\sigma). Thus the markings are exactly the same, implying that Rt(σ)=Rt(σ)𝖱𝖾𝗌(σ)R_{t}(\sigma)=R_{t}(\sigma^{\prime})\subseteq{\sf Res}(\sigma) and Nt(σ)=Nt(σ)N_{t}(\sigma)=N_{t}(\sigma^{\prime}). ∎

(1) Draw independent samples of all variables X1,,XnX_{1},\dots,X_{n} from their respective distributions.

(2) While at least one bad event occurs under the current assignment σ\sigma, use Algorithm 5 to find 𝖱𝖾𝗌(σ){\sf Res}(\sigma). Resample all variables in i𝖱𝖾𝗌(σ)𝗏𝖺𝗋(Ai)\bigcup_{i\in{\sf Res}(\sigma)}{\sf var}(A_{i}).

(3) When none of the bad events holds, output the current assignment.

Algorithm 6 General Partial Rejection Sampling
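
Putting the pieces together, a minimal sketch of the whole of Algorithm 6 could look as follows; it reuses the select_resampling_set sketch above, and occurs(i, sigma) together with sample_var(v) (a fresh draw from the distribution of variable v) are assumed helpers.

```python
def general_partial_rejection_sampling(events, variables, neighbours, var_of,
                                       can_occur, occurs, sample_var):
    """General partial rejection sampling (Algorithm 6, sketch)."""
    sigma = {v: sample_var(v) for v in variables}            # step (1)
    while True:
        bad = {i for i in events if occurs(i, sigma)}
        if not bad:                                          # step (3)
            return sigma
        res = select_resampling_set(bad, neighbours, var_of, can_occur, sigma)
        for v in {v for i in res for v in var_of[i]}:        # step (2)
            sigma[v] = sample_var(v)
```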

To prove the correctness of Algorithm 6, we will only use three properties of 𝖱𝖾𝗌(σ){\sf Res}(\sigma), which are intuitively summarized as follows:

(1) 𝖡𝖺𝖽(σ)𝖱𝖾𝗌(σ){\sf Bad}(\sigma)\subseteq{\sf Res}(\sigma);

(2) For any i𝖱𝖾𝗌(σ)i\in\partial{\sf Res}(\sigma), AiA_{i} is disjoint from the partial assignment of σ\sigma projected on 𝗏𝖺𝗋(Ai)𝗏𝖺𝗋(𝖱𝖾𝗌(σ)){\sf var}(A_{i})\cap{\sf var}({\sf Res}(\sigma)) (Lemma 21);

(3) If we fix the partial assignment of σ\sigma projected on 𝗏𝖺𝗋(𝖱𝖾𝗌(σ)){\sf var}({\sf Res}(\sigma)), then the output of Algorithm 5 is fixed, unless there are new bad events occurring outside of 𝖱𝖾𝗌(σ){\sf Res}(\sigma) (Lemma 22).

Similarly to the analysis of Algorithm 2, we call 𝒮=S1,,S\mathcal{S}=S_{1},\dots,S_{\ell} the log, if SiS_{i} is the set of resampling events in step ii of Algorithm 6. Note that for Algorithm 6, the log is not necessarily an independent set sequence. Also, recall that σi\sigma_{i} is the assignment of variables in step ii, and σt=σT\sigma_{t}=\sigma_{T} if TT is when Algorithm 6 terminates and t>Tt>T. The following lemma is an analogue of Lemma 7.

Lemma 23.

Given any log 𝒮\mathcal{S} of length 1\ell\geq 1, σ+1\sigma_{\ell+1} has the product distribution conditioned on none of AiA_{i}’s occurring where iΓ+(S)i\notin\Gamma^{+}(S_{\ell}), namely from μ(i[m]Γ+(S)Ai¯)\mu\big{(}\cdot\mid\bigwedge_{i\in[m]\setminus\Gamma^{+}(S_{\ell})}\overline{A_{i}}\big{)}.

Proof.

Suppose iΓ+(S)i\notin\Gamma^{+}(S_{\ell}). By construction, SS_{\ell} contains all occurring bad events of σ\sigma_{\ell}, and hence AiA_{i} does not occur under σ\sigma_{\ell}. In step \ell, we only resample variables that are involved in SS_{\ell}, so σ+1\sigma_{\ell+1} and σ\sigma_{\ell} agree on 𝗏𝖺𝗋(Ai){\sf var}(A_{i}). Hence AiA_{i} cannot occur under σ+1\sigma_{\ell+1}. Call an assignment σ\sigma valid if none of AiA_{i} occurs where iΓ+(S)i\notin\Gamma^{+}(S_{\ell}). To show that σ+1\sigma_{\ell+1} has the desired conditional product distribution, we will show that the probabilities of getting any two valid assignments σ\sigma and σ\sigma^{\prime} are proportional to their probabilities of occurrence under the product distribution μ()\mu(\cdot).

Let MM be the resampling table so that the log of Algorithm 6 is 𝒮\mathcal{S} up to round \ell, and σ+1=σ\sigma_{\ell+1}=\sigma. Indeed, since we only care about events up to round +1\ell+1, we may truncate the table so that M={Xi,j1in,  1jji,+1}M=\{X_{i,j}\mid 1\leq i\leq n,\;\;1\leq j\leq j_{i,\ell+1}\}. Let M={Xi,j| 1in,  1jji,+1}M^{\prime}=\{X_{i,j}^{\prime}\;|\;1\leq i\leq n,\;\;1\leq j\leq j_{i,\ell+1}\} be another table where Xi,j=Xi,jX_{i,j}^{\prime}=X_{i,j} if j<ji,+1j<j_{i,\ell+1} for any i[n]i\in[n], and σ=(Xi,ji,+1:1in)\sigma^{\prime}=(X^{\prime}_{i,j_{i,\ell+1}}:1\leq i\leq n) is a valid assignment. In other words, we only change the last assignment (Xi,ji,+1:1in)(X_{i,j_{i,\ell+1}}:1\leq i\leq n) to another valid assignment. We will use σt=(Xi,ji,t)\sigma^{\prime}_{t}=(X^{\prime}_{i,j_{i,t}}) to denote the active elements of the second resampling table at time tt; thus σ=σ+1\sigma^{\prime}=\sigma^{\prime}_{\ell+1}.

The lemma follows if Algorithm 6 running on MM^{\prime} generates the same log 𝒮\mathcal{S} up to round \ell, since, if this is the case, then conditioned on the log 𝒮\mathcal{S}, every possible table MM where σ+1=σ\sigma_{\ell+1}=\sigma is in one-to-one correspondence with another table MM^{\prime} where σ+1=σ\sigma_{\ell+1}^{\prime}=\sigma^{\prime}. Hence the probability of getting σ\sigma is proportional to its weight under μ()\mu(\cdot).

Suppose otherwise that the logs of running Algorithm 6 on MM and MM^{\prime} differ. Let t0t_{0}\leq\ell be the first round where the resampling sets differ, by which we mean that 𝖱𝖾𝗌(σt0)𝖱𝖾𝗌(σt0){\sf Res}(\sigma_{t_{0}})\neq{\sf Res}(\sigma_{t_{0}}^{\prime}). By Lemma 22, either 𝖡𝖺𝖽(σt0)𝖱𝖾𝗌(σt0){\sf Bad}(\sigma_{t_{0}}^{\prime})\not\subseteq{\sf Res}(\sigma_{t_{0}}), or σt0|𝖱𝖾𝗌(σt0)σt0|𝖱𝖾𝗌(σt0)\sigma_{t_{0}}|_{{\sf Res}(\sigma_{t_{0}})}\neq\sigma_{t_{0}}^{\prime}|_{{\sf Res}(\sigma_{t_{0}})}. In the latter case, since MM and MM^{\prime} agree on all entries except those with index ji,+1j_{i,\ell+1}, there must be a variable XiX_{i} with i𝗏𝖺𝗋(𝖱𝖾𝗌(σt0))i\in{\sf var}({\sf Res}(\sigma_{t_{0}})) whose active index satisfies ji,t0=ji,+1j_{i,t_{0}}=j_{i,\ell+1}. However, i𝗏𝖺𝗋(𝖱𝖾𝗌(σt0))i\in{\sf var}({\sf Res}(\sigma_{t_{0}})) means that XiX_{i} is resampled at least once more in the original run on MM, so its index reaches at least ji,+1+1j_{i,\ell+1}+1 by round +1\ell+1, a contradiction. Thus, σt0|𝖱𝖾𝗌(σt0)=σt0|𝖱𝖾𝗌(σt0)\sigma_{t_{0}}|_{{\sf Res}(\sigma_{t_{0}})}=\sigma_{t_{0}}^{\prime}|_{{\sf Res}(\sigma_{t_{0}})} and 𝖡𝖺𝖽(σt0)𝖱𝖾𝗌(σt0){\sf Bad}(\sigma_{t_{0}}^{\prime})\not\subseteq{\sf Res}(\sigma_{t_{0}}).

As 𝖡𝖺𝖽(σt0)𝖱𝖾𝗌(σt0){\sf Bad}(\sigma_{t_{0}}^{\prime})\not\subseteq{\sf Res}(\sigma_{t_{0}}), there must be a variable Xi0X_{i_{0}} such that ji0,t0=ji0,+1j_{i_{0},t_{0}}=j_{i_{0},\ell+1} (otherwise Xi0,ji0,t0=Xi0,ji0,t0X_{i_{0},j_{i_{0},t_{0}}}=X_{i_{0},j_{i_{0},t_{0}}}^{\prime}) and an event AkA_{k} such that i0𝗏𝖺𝗋(Ak)i_{0}\in{\sf var}(A_{k}), k𝖡𝖺𝖽(σt0)k\in{\sf Bad}(\sigma_{t_{0}}^{\prime}) but k𝖱𝖾𝗌(σt0)k\not\in{\sf Res}(\sigma_{t_{0}}). Suppose first that i𝗏𝖺𝗋(Ak)\forall i\in{\sf var}(A_{k}), ji,t0=ji,+1j_{i,t_{0}}=j_{i,\ell+1}, which means that all variables of AkA_{k} have reached their final values in the MM run at time t0t_{0}. This implies that kΓ+(St)k\notin\Gamma^{+}(S_{t}) for any tt0t\geq t_{0} as otherwise some of the variables in 𝗏𝖺𝗋(Ak){\sf var}(A_{k}) would be resampled at least once after round t0t_{0}. In particular, kΓ+(S)k\notin\Gamma^{+}(S_{\ell}). This contradicts σ\sigma^{\prime} being valid.

Otherwise there are some variables in 𝗏𝖺𝗋(Ak){\sf var}(A_{k}) that get resampled after time t0t_{0} in the MM run. Let t1t_{1} be the first such time and Y𝗏𝖺𝗋(Ak)Y\subset{\sf var}(A_{k}) be the set of variables resampled at round t1t_{1}; namely, Y=𝗏𝖺𝗋(Ak)𝗏𝖺𝗋(𝖱𝖾𝗌(σt1))Y={\sf var}(A_{k})\cap{\sf var}({\sf Res}(\sigma_{t_{1}})). We have that σt1|Y=σt0|Y\sigma_{t_{1}}|_{Y}=\sigma_{t_{0}}|_{Y} because t1t_{1} is the first time of resampling variables in YY. Moreover, as variables of YY have not reached their final values yet in the MM run, σt0|Y=σt0|Y\sigma_{t_{0}}|_{Y}=\sigma_{t_{0}}^{\prime}|_{Y}. Thus, σt1|Y=σt0|Y\sigma_{t_{1}}|_{Y}=\sigma_{t_{0}}^{\prime}|_{Y}.

Assuming k𝖱𝖾𝗌(σt1)k\in{\sf Res}(\sigma_{t_{1}}) would contradict the fact that Xi0X_{i_{0}} has reached its final value in the MM run. Hence k𝖱𝖾𝗌(σt1)k\notin{\sf Res}(\sigma_{t_{1}}), but nevertheless variables in Y𝗏𝖺𝗋(Ak)Y\subset{\sf var}(A_{k}) are resampled. This implies that k𝖱𝖾𝗌(σt1)k\in\partial{\sf Res}(\sigma_{t_{1}}). By Lemma 21, Akσt1|𝖱𝖾𝗌(σt1)A_{k}\perp\sigma_{t_{1}}|_{{\sf Res}(\sigma_{t_{1}})}. As 𝗏𝖺𝗋(Ak){\sf var}(A_{k}) cannot be disjoint from 𝗏𝖺𝗋(𝖱𝖾𝗌(σt1)){\sf var}({\sf Res}(\sigma_{t_{1}})), this means that AkA_{k} is incompatible with the partial assignment of σt1\sigma_{t_{1}} restricted to 𝗏𝖺𝗋(Ak)𝗏𝖺𝗋(𝖱𝖾𝗌(σt1))=Y{\sf var}(A_{k})\cap{\sf var}({\sf Res}(\sigma_{t_{1}}))=Y. Equivalently, Akσt1|YA_{k}\perp\sigma_{t_{1}}|_{Y}. However, we know that σt1|Y=σt0|Y\sigma_{t_{1}}|_{Y}=\sigma_{t_{0}}^{\prime}|_{Y}, so Akσt0|YA_{k}\perp\sigma_{t_{0}}^{\prime}|_{Y}, contradicting k𝖡𝖺𝖽(σt0)k\in{\sf Bad}(\sigma_{t_{0}}^{\prime}). ∎

Theorem 24.

If Algorithm 6 halts, then its output has the product distribution conditioned on none of AiA_{i}’s occurring.

Proof.

Let a sequence 𝒮\mathcal{S} of sets of events be the log of any successful run. Then S=S_{\ell}=\emptyset. By Lemma 23, conditioned on the log 𝒮\mathcal{S}, the output assignment σ\sigma is μ(i[m]Γ+(S)Ai¯)=μ(i[m]Ai¯)\mu\big{(}\cdot\mid\bigwedge_{i\in[m]\setminus\Gamma^{+}(S_{\ell})}\overline{A_{i}}\big{)}=\mu\big{(}\cdot\mid\bigwedge_{i\in[m]}\overline{A_{i}}\big{)}. This is valid for any possible log, and the theorem follows. ∎

6. Running Time Analysis of Algorithm 6

Obviously when there is no assignment avoiding all bad events, then Algorithm 6 will never halt. Thus we want to assume some conditions to guarantee a desired assignment. However, the optimal condition of Theorem 2 is quite difficult to work under. Instead, in this section we will be working under the assumption that the asymmetric LLL condition (1) holds. In fact, to make the presentation clean, we will mostly work with the simpler symmetric case.

However, as mentioned in Section 4.3, [4, Corollary 30] showed that even under the asymmetric LLL condition (1), sampling can still be NP-hard. We thus in turn look for further conditions to make Algorithm 6 efficient.

Recall that μ()\mu(\cdot) is the product distribution of sampling all variables independently. For two distinct events AiAjA_{i}\sim A_{j}, let RijR_{ij} be the event that the partial assignments on 𝗏𝖺𝗋(Ai)𝗏𝖺𝗋(Aj){\sf var}(A_{i})\cap{\sf var}(A_{j}) can be extended to an assignment making AjA_{j} true. Thus, if Ai⟂̸σ|SA_{i}\not\perp\sigma|_{S} for some event set SS, then RjiR_{ji} must hold for all AjSA_{j}\in S and AjAiA_{j}\sim A_{i}. Conversely, it is possible that each individual RjiR_{ji} is true for all AjSA_{j}\in S and AjAiA_{j}\sim A_{i}, and yet Aiσ|SA_{i}\perp\sigma|_{S}. Also note that RijR_{ij} is not necessarily the same as RjiR_{ji}. Let rij:=μ(Rij)r_{ij}:=\mu(R_{ij}).

Define p:=maxi[m]pi\displaystyle p:=\max_{i\in[m]}p_{i} and r:=maxAiAj,ijrij\displaystyle r:=\max_{A_{i}\sim A_{j},\;i\neq j}r_{ij}. Let Δ\Delta be the maximum degree of the dependency graph GG. The main result of the section is the following theorem.

Theorem 25.

Let mm be the number of events and nn be the number of variables. For any Δ2\Delta\geq 2, if 6epΔ216ep\Delta^{2}\leq 1 and 3erΔ13er\Delta\leq 1, then the expected number of resampled events of Algorithm 6 is O(m)O(m).

Moreover, when these conditions hold, the number of rounds is O(logm)O(\log m) and the number of variable resamples is O(nlogm)O(n\log m), both in expectation and with high probability.

The first condition 6epΔ216ep\Delta^{2}\leq 1 is stronger than the condition of the symmetric Lovász Local Lemma, but this seems necessary since [4, Corollary 30] implies that if pΔ2Cp\Delta^{2}\geq C for some constant CC then the sampling problem is NP-hard. Intuitively, the second condition 3erΔ13er\Delta\leq 1 bounds the expansion from bad events to resampling events at every step of Algorithm 6. We will prove Theorem 25 in the rest of the section.

Let S[m]S\subseteq[m] be a subset of vertices of the dependency graph GG. Recall that A(S)A(S) is the event iSAi\bigwedge_{i\in S}A_{i} and B(S)B(S) is the event iSAi¯\bigwedge_{i\in S}\overline{A_{i}}. Moreover, ScS^{c} is the complement of SS, namely Sc=[m]SS^{c}=[m]\setminus S, and SeS^{e} is the “exterior” of SS, namely Se=[m]Γ+(S)S^{e}=[m]\setminus\Gamma^{+}(S).

Lemma 23 implies that if we resample SS at some step tt of Algorithm 6, then at step t+1t+1 the distribution is the product measure μ\mu conditioned on none of the events in the exterior of SS holds; namely Prμ(B(Se))\mathop{\mathrm{Pr}}\nolimits_{\mu}(\cdot\mid B(S^{e})).

Let EE be an event (not necessarily one of AiA_{i}) depending on a set 𝗏𝖺𝗋(E){\sf var}(E) of variables. Let Γ(E):={ii[m],𝗏𝖺𝗋(Ai)𝗏𝖺𝗋(E)}\Gamma(E):=\{i\mid i\in[m],\;{\sf var}(A_{i})\cap{\sf var}(E)\neq\emptyset\} if EE is not one of AiA_{i}, and Γ(Ai):={jj[m],ji and 𝗏𝖺𝗋(Aj)𝗏𝖺𝗋(Ai)}\Gamma(A_{i}):=\{j\mid j\in[m],\;j\not=i\text{ and }{\sf var}(A_{j})\cap{\sf var}(A_{i})\neq\emptyset\} is defined as usual. Let S[m]S\subseteq[m] be a subset of vertices of GG. The next lemma bounds the probability of EE conditioned on none of the events in SS happening. It was first observed in [19]. We include a proof for completeness (which is a simple adaption of the ordinary local lemma proof).

Lemma 26 ([19, Theorem 2.1]).

Suppose (1) holds. For an event EE and any set S[m]S\subseteq[m],

Prμ(EB(S))Prμ(E)iΓ(E)S(1xi)1,\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}(E\mid B(S))\leq\mathop{\mathrm{Pr}}\nolimits_{\mu}(E)\prod_{i\in\Gamma(E)\cap S}(1-x_{i})^{-1},

where xix_{i}’s are from (1).

Proof.

We prove the inequality by induction on the size of SS. The base case is when SS is empty and the lemma holds trivially.

For the induction step, let S1=SΓ(E)S_{1}=S\cap\Gamma(E) and S2=SS1S_{2}=S\setminus S_{1}. If S1=S_{1}=\emptyset, then the lemma holds trivially as EE is independent from SS in this case. Otherwise S2S_{2} is a proper subset of SS. We have that

Prμ(EB(S))\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}(E\mid B(S)) =Prμ(EB(S1)B(S2))Prμ(B(S1)B(S2))\displaystyle=\frac{\mathop{\mathrm{Pr}}\nolimits_{\mu}(E\wedge B(S_{1})\mid B(S_{2}))}{\mathop{\mathrm{Pr}}\nolimits_{\mu}(B(S_{1})\mid B(S_{2}))}
Prμ(EB(S2))Prμ(B(S1)B(S2))\displaystyle\leq\frac{\mathop{\mathrm{Pr}}\nolimits_{\mu}(E\mid B(S_{2}))}{\mathop{\mathrm{Pr}}\nolimits_{\mu}(B(S_{1})\mid B(S_{2}))}
=Prμ(E)Prμ(B(S1)B(S2)),\displaystyle=\frac{\mathop{\mathrm{Pr}}\nolimits_{\mu}(E)}{\mathop{\mathrm{Pr}}\nolimits_{\mu}(B(S_{1})\mid B(S_{2}))},

where the last line is because EE is independent from B(S2)B(S_{2}). We then use the induction hypothesis to bound the denominator. Suppose S1={j1,j2,,jr}S_{1}=\{j_{1},j_{2},\dots,j_{r}\} for some r>0r>0. Then,

Prμ(B(S1)B(S2))\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}(B(S_{1})\mid B(S_{2})) =Prμ(iS1Ai¯|iS2Ai¯)\displaystyle=\mathop{\mathrm{Pr}}\nolimits_{\mu}\left(\bigwedge_{i\in S_{1}}\overline{A_{i}}\;\Bigg{|}\bigwedge_{i\in S_{2}}\overline{A_{i}}\right)
=t=1rPrμ(Ajt¯|s=1t1Ajs¯iS2Ai¯)\displaystyle=\prod_{t=1}^{r}\mathop{\mathrm{Pr}}\nolimits_{\mu}\left(\overline{A_{j_{t}}}\;\Bigg{|}\bigwedge_{s=1}^{t-1}\overline{A_{j_{s}}}\wedge\bigwedge_{i\in S_{2}}\overline{A_{i}}\right)
=t=1r(1Prμ(Ajt|s=1t1Ajs¯iS2Ai¯)).\displaystyle=\prod_{t=1}^{r}\left(1-\mathop{\mathrm{Pr}}\nolimits_{\mu}\left(A_{j_{t}}\;\Bigg{|}\bigwedge_{s=1}^{t-1}\overline{A_{j_{s}}}\wedge\bigwedge_{i\in S_{2}}\overline{A_{i}}\right)\right).

By the induction hypothesis and (1), we have that for any 1tr1\leq t\leq r,

Prμ(Ajt|s=1t1Ajs¯iS2Ai¯)\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}\left(A_{j_{t}}\;\Bigg{|}\bigwedge_{s=1}^{t-1}\overline{A_{j_{s}}}\wedge\bigwedge_{i\in S_{2}}\overline{A_{i}}\right) Prμ(Ajt)iΓ(jt)(1xi)1\displaystyle\leq\mathop{\mathrm{Pr}}\nolimits_{\mu}(A_{j_{t}})\prod_{i\in\Gamma(j_{t})}(1-x_{i})^{-1}
xjtiΓ(jt)(1xi)iΓ(jt)(1xi)1\displaystyle\leq x_{j_{t}}\prod_{i\in\Gamma(j_{t})}(1-x_{i})\prod_{i\in\Gamma(j_{t})}(1-x_{i})^{-1}
=xjt.\displaystyle=x_{j_{t}}.

Thus,

Prμ(B(S1)B(S2))\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}(B(S_{1})\mid B(S_{2})) iS1(1xi).\displaystyle\geq\prod_{i\in S_{1}}\left(1-x_{i}\right).

The lemma follows. ∎

Typically we set xi=1Δ+1x_{i}=\frac{1}{\Delta+1} in the symmetric setting. Then (1) holds if ep(Δ+1)1ep(\Delta+1)\leq 1. In this setting, Lemma 26 is specialized into the following.

Corollary 27.

If ep(Δ+1)1ep(\Delta+1)\leq 1, then

Prμ(EB(S))Prμ(E)(1+1Δ)|Γ(E)|.\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}(E\mid B(S))\leq\mathop{\mathrm{Pr}}\nolimits_{\mu}(E)\left(1+\frac{1}{\Delta}\right)^{\left|\Gamma(E)\right|}.

In particular, if ep(Δ+1)1ep(\Delta+1)\leq 1, for any event AiA_{i} where iSi\not\in S, by Corollary 27,

(15) Prμ(AiB(S))\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}(A_{i}\mid B(S)) pi(Δ+1Δ)Δep.\displaystyle\leq p_{i}\left(\frac{\Delta+1}{\Delta}\right)^{\Delta}\leq ep.

Let 𝖱𝖾𝗌t{\sf Res}_{t} be the resampling set of Algorithm 6 at round t1t\geq 1, and let 𝖡𝖺𝖽t{\sf Bad}_{t} be the set of bad events present at round tt. If Algorithm 6 has already stopped at round tt, then 𝖱𝖾𝗌t=𝖡𝖺𝖽t={\sf Res}_{t}={\sf Bad}_{t}=\emptyset. Furthermore, let 𝖡𝖺𝖽0=𝖱𝖾𝗌0=[m]{\sf Bad}_{0}={\sf Res}_{0}=[m] since in the first step all random variables are fresh.

Let C:=1pC:=1-p.

Lemma 28.

For any Δ2\Delta\geq 2, if 6epΔ216ep\Delta^{2}\leq 1 and 3erΔ13er\Delta\leq 1, then,

𝔼(|𝖱𝖾𝗌t+1|𝖱𝖾𝗌0,,𝖱𝖾𝗌t)C|𝖱𝖾𝗌t|.\displaystyle\mathop{\mathbb{{}E}}\nolimits\left(\left|{\sf Res}_{t+1}\right|\mid{\sf Res}_{0},\dots,{\sf Res}_{t}\right)\leq C\left|{\sf Res}_{t}\right|.
Proof.

Clearly for any Δ2\Delta\geq 2, the condition 6epΔ216ep\Delta^{2}\leq 1 implies that ep(Δ+1)1ep(\Delta+1)\leq 1. Therefore the prerequisite of Corollary 27 is met. Notice that, by Lemma 23, we have that

𝔼(|𝖱𝖾𝗌t+1|𝖱𝖾𝗌0,,𝖱𝖾𝗌t)=𝔼(|𝖱𝖾𝗌t+1|𝖱𝖾𝗌t).\displaystyle\mathop{\mathbb{{}E}}\nolimits\left(\left|{\sf Res}_{t+1}\right|\mid{\sf Res}_{0},\dots,{\sf Res}_{t}\right)=\mathop{\mathbb{{}E}}\nolimits\left(\left|{\sf Res}_{t+1}\right|\mid{\sf Res}_{t}\right).

We will show in the following that

𝔼(|𝖱𝖾𝗌t+1||the set of resampling events at round t is (exactly) 𝖱𝖾𝗌t)C|𝖱𝖾𝗌t|,\displaystyle\mathop{\mathbb{{}E}}\nolimits\big{(}\left|{\sf Res}_{t+1}\right|\bigm{|}\text{the set of resampling events at round $t$ is (exactly) ${\sf Res}_{t}$}\big{)}\leq C\left|{\sf Res}_{t}\right|,

where C=1pC=1-p. This implies the lemma.

Call a path i0,i1,,ii_{0},i_{1},\dots,i_{\ell} where 0\ell\geq 0 in the dependency graph GG bad if the following holds:

(1) i0𝖡𝖺𝖽t+1i_{0}\in{\sf Bad}_{t+1};

(2) the event Rik1ikR_{i_{k-1}i_{k}} holds for every 1k1\leq k\leq\ell;

(3) any iki_{k} (k[]k\in[\ell]) is not adjacent to iki_{k^{\prime}} unless k=k1k^{\prime}=k-1 or k+1k+1.

Indeed, paths having the third property are induced paths in GG. If i𝖱𝖾𝗌t+1i\in{\sf Res}_{t+1}, AiA_{i} must be added by Algorithm 5 during some iteration of the while loop. In the 0th iteration, all of 𝖡𝖺𝖽t+1{\sf Bad}_{t+1} are added. We claim that for any i𝖱𝖾𝗌t+1i\in{\sf Res}_{t+1} added in iteration 0\ell\geq 0 by Algorithm 5, there exists at least one bad path i0,i1,,ii_{0},i_{1},\dots,i_{\ell} with i=ii_{\ell}=i. We show the claim by induction on \ell.

  • The base case is that =0\ell=0, and thus i𝖡𝖺𝖽t+1i\in{\sf Bad}_{t+1}. The bad path is simply ii itself.

  • For the induction step 1\ell\geq 1, due to Algorithm 5, there must exist i1i_{\ell-1} adjacent to i=ii_{\ell}=i such that i1i_{\ell-1} has been marked “resampling” during iteration 1\ell-1, and Ri1iR_{i_{\ell-1}i_{\ell}} occurs. By the induction hypothesis, there exists a bad path i0,,i1i_{0},\dots,i_{\ell-1}. Since ii is not marked at iteration 1\ell-1, ii is not adjacent to any vertex that has been marked “resampling” up to iteration 2\ell-2. Thus ii_{\ell} is not adjacent to any iki_{k} where k2k\leq\ell-2, and the path i0,,i1,ii_{0},\dots,i_{\ell-1},i_{\ell} is bad.

We next turn to bounding the number of bad paths. It is straightforward to bound the size of 𝖡𝖺𝖽t+1Γ+(𝖱𝖾𝗌t){\sf Bad}_{t+1}\subseteq\Gamma^{+}({\sf Res}_{t}). If i𝖡𝖺𝖽t+1i\in{\sf Bad}_{t+1}, then there are two possibilities. The first scenario is that i𝖱𝖾𝗌ti\in{\sf Res}_{t} and then all of its random variables are fresh. In this case it occurs with probability pipp_{i}\leq p. Otherwise i𝖱𝖾𝗌ti\in\partial{\sf Res}_{t}. Recall that by Lemma 23, the distribution at round t+1t+1 is Prμ(B(𝖱𝖾𝗌te))\mathop{\mathrm{Pr}}\nolimits_{\mu}(\cdot\mid B({\sf Res}_{t}^{e})). By Corollary 27, for any i𝖱𝖾𝗌ti\in\partial{\sf Res}_{t},

Prμ(AiB(𝖱𝖾𝗌te))p(1+1Δ)Δep.\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}\left(A_{i}\mid B({\sf Res}_{t}^{e})\right)\leq p\left(1+\frac{1}{\Delta}\right)^{\Delta}\leq ep.

This implies that

𝔼(|𝖡𝖺𝖽t+1|the set of resampling events at round t is (exactly) 𝖱𝖾𝗌t)\displaystyle\mathop{\mathbb{{}E}}\nolimits\left(\left|{\sf Bad}_{t+1}\right|\mid\text{the set of resampling events at round $t$ is (exactly) ${\sf Res}_{t}$}\right)
(16) \displaystyle\leq p|𝖱𝖾𝗌t|+ep|𝖱𝖾𝗌t|p(1+eΔ)|𝖱𝖾𝗌t|.\displaystyle\;p\left|{\sf Res}_{t}\right|+ep\left|\partial{\sf Res}_{t}\right|\leq p(1+e\Delta)\left|{\sf Res}_{t}\right|.

Next we bound the size of |𝖱𝖾𝗌t+1𝖡𝖺𝖽t+1|\left|{\sf Res}_{t+1}\setminus{\sf Bad}_{t+1}\right|. Let P=i0,,iP=i_{0},\dots,i_{\ell} be an induced path; that is, for any k[]k\in[\ell], iki_{k} is not adjacent to iki_{k^{\prime}} unless k=k1k^{\prime}=k-1 or k+1k+1. Only induced paths are potentially bad. Moreover, PP contributes to |𝖱𝖾𝗌t+1𝖡𝖺𝖽t+1|\left|{\sf Res}_{t+1}\setminus{\sf Bad}_{t+1}\right| only if its length 1\ell\geq 1. Let DPD_{P} be the event that PP is bad. In other words, DP:=Ai0Ri0i1Ri1iD_{P}:=A_{i_{0}}\wedge R_{i_{0}i_{1}}\wedge\dots\wedge R_{i_{\ell-1}i_{\ell}}. By Lemma 23, we have that

Pr(P is bad at round t+1the set of resampling events at round t is 𝖱𝖾𝗌t)\displaystyle\;\mathop{\mathrm{Pr}}\nolimits(P\text{ is bad at round $t+1$}\mid\text{the set of resampling events at round $t$ is ${\sf Res}_{t}$})
(17) =\displaystyle= Prμ(DPB(𝖱𝖾𝗌te)),\displaystyle\;\mathop{\mathrm{Pr}}\nolimits_{\mu}(D_{P}\mid B({\sf Res}_{t}^{e})),

where we recall that we denote 𝖱𝖾𝗌te=[m]Γ+(𝖱𝖾𝗌t){\sf Res}_{t}^{e}=[m]\setminus\Gamma^{+}({\sf Res}_{t}). Applying Corollary 27 with S=𝖱𝖾𝗌teS={\sf Res}_{t}^{e}, we have that

(18) Prμ(DPB(𝖱𝖾𝗌te))Prμ(DP)(1+1Δ)|Γ(DP)|.\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}(D_{P}\mid B({\sf Res}_{t}^{e}))\leq\mathop{\mathrm{Pr}}\nolimits_{\mu}(D_{P})\left(1+\frac{1}{\Delta}\right)^{\left|\Gamma(D_{P})\right|}.

Note that Γ(Rikik+1)Γ+(Aik)\Gamma(R_{i_{k}i_{k+1}})\subseteq\Gamma^{+}(A_{i_{k}}) for all 0k10\leq k\leq\ell-1. By the definition of DPD_{P},

Γ(DP)\displaystyle\Gamma(D_{P}) Γ(Ai0)Γ(Ri0i1)Γ(Ri1i)\displaystyle\subseteq\Gamma(A_{i_{0}})\cup\Gamma(R_{i_{0}i_{1}})\cup\dots\cup\Gamma(R_{i_{\ell-1}i_{\ell}})
Γ+(Ai0)Γ+(Ai1)Γ+(Ai1),\displaystyle\subseteq\Gamma^{+}(A_{i_{0}})\cup\Gamma^{+}(A_{i_{1}})\cup\dots\cup\Gamma^{+}(A_{i_{\ell-1}}),

implying that

(19) |Γ(DP)|\displaystyle\left|\Gamma(D_{P})\right| (Δ+1),\displaystyle\leq\ell(\Delta+1),

as |Γ+(Aik)|Δ+1\left|\Gamma^{+}(A_{i_{k}})\right|\leq\Delta+1 for all 0k10\leq k\leq\ell-1.

We claim that Ai0A_{i_{0}} is independent from Rik1ikR_{i_{k-1}i_{k}} for any 2k2\leq k\leq\ell. This is because iki_{k} is not adjacent to i0i_{0} for any k2k\geq 2, implying that

𝗏𝖺𝗋(Rik1ik)𝗏𝖺𝗋(Ai0)\displaystyle{\sf var}(R_{i_{k-1}i_{k}})\cap{\sf var}(A_{i_{0}}) =𝗏𝖺𝗋(Aik1)𝗏𝖺𝗋(Aik)𝗏𝖺𝗋(Ai0)\displaystyle={\sf var}(A_{i_{k-1}})\cap{\sf var}(A_{i_{k}})\cap{\sf var}(A_{i_{0}})
𝗏𝖺𝗋(Aik)𝗏𝖺𝗋(Ai0)=.\displaystyle\subseteq{\sf var}(A_{i_{k}})\cap{\sf var}(A_{i_{0}})=\emptyset.

Moreover, any two events Rik1ikR_{i_{k-1}i_{k}} and Rik1ikR_{i_{k^{\prime}-1}i_{k^{\prime}}} are independent of each other as long as k<kk<k^{\prime}. This is also due to the third property of bad paths. Since k<kk<k^{\prime}, we see that |k(k1)|2\left|k^{\prime}-(k-1)\right|\geq 2 and iki_{k^{\prime}} is not adjacent to ik1i_{k-1}. It implies that

𝗏𝖺𝗋(Rik1ik)𝗏𝖺𝗋(Rik1ik)\displaystyle{\sf var}(R_{i_{k-1}i_{k}})\cap{\sf var}(R_{i_{k^{\prime}-1}i_{k^{\prime}}}) =𝗏𝖺𝗋(Aik1)𝗏𝖺𝗋(Aik)𝗏𝖺𝗋(Aik1)𝗏𝖺𝗋(Aik)\displaystyle={\sf var}(A_{i_{k-1}})\cap{\sf var}(A_{i_{k}})\cap{\sf var}(A_{i_{k^{\prime}-1}})\cap{\sf var}(A_{i_{k^{\prime}}})
𝗏𝖺𝗋(Aik1)𝗏𝖺𝗋(Aik)=.\displaystyle\subseteq{\sf var}(A_{i_{k-1}})\cap{\sf var}(A_{i_{k^{\prime}}})=\emptyset.

The consequence of these independences is

Prμ(DP)\displaystyle\mathop{\mathrm{Pr}}\nolimits_{\mu}(D_{P}) Prμ(Ai0Ri1i2Ri1i)\displaystyle\leq\mathop{\mathrm{Pr}}\nolimits_{\mu}(A_{i_{0}}\wedge R_{i_{1}i_{2}}\wedge\dots\wedge R_{i_{\ell-1}i_{\ell}})
=Prμ(Ai0)k=2Prμ(Rik1ik)\displaystyle=\mathop{\mathrm{Pr}}\nolimits_{\mu}(A_{i_{0}})\prod_{k=2}^{\ell}\mathop{\mathrm{Pr}}\nolimits_{\mu}(R_{i_{k-1}i_{k}})
(20) pr1.\displaystyle\leq pr^{\ell-1}.

Note that in the calculation above we ignore Ri0i1R_{i_{0}i_{1}} as it can be positively correlated to Ai0A_{i_{0}}.

Combining (17), (18), (19), and (20), we have that

Pr(DPthe set of resampling events at round t is (exactly) 𝖱𝖾𝗌t)\displaystyle\mathop{\mathrm{Pr}}\nolimits(D_{P}\mid\text{the set of resampling events at round $t$ is (exactly) ${\sf Res}_{t}$})
(21) \displaystyle\leq pr1(1+1Δ)(Δ+1)pr((1+1Δ)er).\displaystyle\;pr^{\ell-1}\left(1+\frac{1}{\Delta}\right)^{\ell(\Delta+1)}\leq\frac{p}{r}\left(\left(1+\frac{1}{\Delta}\right)er\right)^{\ell}.

In order to apply a union bound on all bad paths, we need to bound their number. The first vertex i0i_{0} must be in 𝖡𝖺𝖽t+1{\sf Bad}_{t+1}, implying that i0Γ+(𝖱𝖾𝗌t)i_{0}\in\Gamma^{+}({\sf Res}_{t}). Hence there are at most (Δ+1)|𝖱𝖾𝗌t|(\Delta+1)\left|{\sf Res}_{t}\right| choices. Then there are at most Δ\Delta choices of i1i_{1} and (Δ1)(\Delta-1) choices of every subsequent iki_{k} where k2k\geq 2. Hence, there are at most Δ(Δ1)1\Delta(\Delta-1)^{\ell-1} induced paths of length 1\ell\geq 1, originating from a particular i0Γ+(𝖱𝖾𝗌t)i_{0}\in\Gamma^{+}({\sf Res}_{t}). Thus, by a union bound on all potentially bad paths and (21),

𝔼(|𝖱𝖾𝗌t+1𝖡𝖺𝖽t+1||the set of resampling events at round t is (exactly) 𝖱𝖾𝗌t)\displaystyle\mathop{\mathbb{{}E}}\nolimits\big{(}\left|{\sf Res}_{t+1}\setminus{\sf Bad}_{t+1}\right|\bigm{|}\text{the set of resampling events at round $t$ is (exactly) ${\sf Res}_{t}$}\big{)}
\displaystyle\leq =1(Δ+1)|𝖱𝖾𝗌t|Δ(Δ1)1p/r((1+1Δ)er)\displaystyle\;\sum_{\ell=1}^{\infty}(\Delta+1)\left|{\sf Res}_{t}\right|\Delta(\Delta-1)^{\ell-1}p/r\left(\left(1+\frac{1}{\Delta}\right)er\right)^{\ell}
=\displaystyle= (Δ+1)Δp(Δ1)r|𝖱𝖾𝗌t|=1((Δ21Δ)er)\displaystyle\;\frac{(\Delta+1)\Delta p}{(\Delta-1)r}\left|{\sf Res}_{t}\right|\sum_{\ell=1}^{\infty}\left(\left(\frac{\Delta^{2}-1}{\Delta}\right)er\right)^{\ell}
\displaystyle\leq (Δ+1)Δp(Δ1)r|𝖱𝖾𝗌t|=1(erΔ)=(Δ+1)Δp(Δ1)rerΔ1erΔ|𝖱𝖾𝗌t|\displaystyle\;\frac{(\Delta+1)\Delta p}{(\Delta-1)r}\left|{\sf Res}_{t}\right|\sum_{\ell=1}^{\infty}\left(er\Delta\right)^{\ell}=\frac{(\Delta+1)\Delta p}{(\Delta-1)r}\cdot\frac{er\Delta}{1-er\Delta}\left|{\sf Res}_{t}\right|
(22) \displaystyle\leq Δ+1Δ132epΔ2|𝖱𝖾𝗌t|,\displaystyle\;\frac{\Delta+1}{\Delta-1}\cdot\frac{3}{2}\cdot ep\Delta^{2}\left|{\sf Res}_{t}\right|,

where we use the condition that erΔ1/3er\Delta\leq 1/3.

Combining (16) and (22), we have that

𝔼(|𝖱𝖾𝗌t+1|the set of resampling events at round t is (exactly) 𝖱𝖾𝗌t)\displaystyle\mathop{\mathbb{{}E}}\nolimits\left(\left|{\sf Res}_{t+1}\right|\mid\text{the set of resampling events at round $t$ is (exactly) ${\sf Res}_{t}$}\right)
\displaystyle\leq Δ+1Δ132epΔ2|𝖱𝖾𝗌t|+p(1+eΔ)|𝖱𝖾𝗌t|\displaystyle\;\frac{\Delta+1}{\Delta-1}\cdot\frac{3}{2}\cdot ep\Delta^{2}\left|{\sf Res}_{t}\right|+p(1+e\Delta)\left|{\sf Res}_{t}\right|
=\displaystyle= p(Δ+1Δ132eΔ2+(1+eΔ))|𝖱𝖾𝗌t|.\displaystyle\;p\left(\frac{\Delta+1}{\Delta-1}\cdot\frac{3}{2}\cdot e\Delta^{2}+(1+e\Delta)\right)\left|{\sf Res}_{t}\right|.

All that is left is to verify that

p(Δ+1Δ132eΔ2+(1+eΔ))C,\displaystyle p\left(\frac{\Delta+1}{\Delta-1}\cdot\frac{3}{2}\cdot e\Delta^{2}+(1+e\Delta)\right)\leq C,

where C=1pC=1-p. This is straightforward by the condition 6epΔ216ep\Delta^{2}\leq 1 and Δ2\Delta\geq 2, as

Cp(Δ+1Δ132eΔ2+(1+eΔ))\displaystyle C-p\left(\frac{\Delta+1}{\Delta-1}\cdot\frac{3}{2}\cdot e\Delta^{2}+(1+e\Delta)\right) 6epΔ2pp(Δ+1Δ132eΔ2+(1+eΔ))\displaystyle\geq 6ep\Delta^{2}-p-p\left(\frac{\Delta+1}{\Delta-1}\cdot\frac{3}{2}\cdot e\Delta^{2}+(1+e\Delta)\right)
p(6eΔ21Δ+1Δ132eΔ2(1+eΔ))0.\displaystyle\geq p\left(6e\Delta^{2}-1-\frac{\Delta+1}{\Delta-1}\cdot\frac{3}{2}\cdot e\Delta^{2}-(1+e\Delta)\right)\geq 0.\qed

For t1t\geq 1, by Lemma 28 and the law of iterated expectations,

𝔼|𝖱𝖾𝗌t|C𝔼|𝖱𝖾𝗌t1|.\mathop{\mathbb{{}E}}\nolimits\left|{\sf Res}_{t}\right|\leq C\mathop{\mathbb{{}E}}\nolimits\left|{\sf Res}_{t-1}\right|.

Thus, 𝔼|𝖱𝖾𝗌t|Ct|𝖱𝖾𝗌0|=Ctm\mathop{\mathbb{{}E}}\nolimits\left|{\sf Res}_{t}\right|\leq C^{t}\left|{\sf Res}_{0}\right|=C^{t}m. As C<1C<1, the expected number of resampling events is

t=0𝔼|𝖱𝖾𝗌t|t=0Ctm=11Cm.\displaystyle\sum_{t=0}^{\infty}\mathop{\mathbb{{}E}}\nolimits\left|{\sf Res}_{t}\right|\leq\sum_{t=0}^{\infty}C^{t}m=\frac{1}{1-C}\cdot m.

This implies the first part of Theorem 25. For the second part, just observe that after O(logm)O(\log m) rounds, the expected number of bad events is less than mcm^{-c} for any constant cc, and Markov's inequality applies.

The first condition of Theorem 25 requires pp to be roughly O(Δ2)O(\Delta^{-2}). This is necessary, due to the hardness result in [4] (see also Theorem 32). Moreover, in the analysis it is possible that all of 𝖡𝖺𝖽t\partial{\sf Bad}_{t} is added into 𝖱𝖾𝗌t{\sf Res}_{t}. Consider a monotone CNF formula: if a clause is unsatisfied, then all of its neighbours need to be added into the resampling set. Such behaviour eventually leads to the O(Δ2)O(\Delta^{-2}) bound. This is in contrast to the resampling algorithm of Moser and Tardos [31], which only requires p=O(Δ1)p=O(\Delta^{-1}), as in the symmetric Lovász Local Lemma.

Also, we note that monotone CNF formulas, in which all correlations are positive, seem to be the worst instances for our algorithms. In particular, Algorithm 6 is exponentially slow when the underlying hypergraph of the monotone CNF is a (hyper-)tree. This indicates that our condition on rr in Theorem 25 is necessary for Algorithm 6. In contrast, Hermon et al. [24] show that on a linear hypergraph (including the hypertree), the Markov chain mixes rapidly for degrees higher than the general bound. It is unclear how to combine the advantages from these two approaches.

7. Applications of Algorithm 6

7.1. kk-CNF Formulas

Consider a kk-CNF formula where every variable appears in at most dd clauses. Then Theorem 1 says that if d2k/(ek)+1d\leq 2^{k}/(ek)+1, then there exists a satisfying assignment. However, as mentioned in Section 4.3, [4, Corollary 30] showed that when d52k/2d\geq 5\cdot 2^{k/2}, then sampling satisfying assignments is NP-hard, even restricted to monotone formulas.

To apply Algorithm 6 in this setting, we need to bound the parameter rr in Theorem 25. A natural way is to lower bound the number of shared variables between any two dependent clauses. If this lower bound is ss, then r2sr\leq 2^{-s}, since for any two dependent clauses there is a unique assignment of their shared variables that can be extended so as to falsify a given one of them.

Definition 29.

Let d2d\geq 2 and s1s\geq 1. A kk-CNF formula is said to have degree dd if every variable appears in at most dd clauses. Moreover, it has intersection ss if for any two clauses CiC_{i} and CjC_{j} that share at least one variable, |𝗏𝖺𝗋(Ci)𝗏𝖺𝗋(Cj)|s\left|{\sf var}(C_{i})\cap{\sf var}(C_{j})\right|\geq s.

Note that by the definition if k<sk<s then the formula is simply isolated clauses. Otherwise, ksk\geq s and we have that pi=p=2kp_{i}=p=2^{-k} and r2sr\leq 2^{-s}. A simple double counting argument indicates that the maximum degree Δ\Delta in the dependency graph satisfies Δdks\Delta\leq\frac{dk}{s}.
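
Before verifying the conditions analytically, it may help to see them numerically; the small helper below (ours, not part of the analysis) plugs the worst-case bounds p = 2^{-k}, r = 2^{-s} and Δ = dk/s into the two conditions of Theorem 25.

```python
from math import e

def theorem25_holds(k, d, s):
    """Check 6*e*p*Delta^2 <= 1 and 3*e*r*Delta <= 1 with the bounds
    p = 2**-k, r = 2**-s and Delta = d*k/s used in the text."""
    p, r, delta = 2.0 ** -k, 2.0 ** -s, d * k / s
    return 6 * e * p * delta ** 2 <= 1 and 3 * e * r * delta <= 1

# e.g. k = 20, d = 60, s = 10 = k/2 satisfies both conditions
print(theorem25_holds(20, 60, 10))   # True
```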

We claim that for integers dd and kk such that d3d\geq 3 and dk23edk\geq 2^{3e}, conditions d2k/26ed\leq\frac{2^{k/2}}{6e} and smin{log2dk,k/2}s\geq\min\{\log_{2}dk,k/2\} imply the conditions of Theorem 25, namely, 6epΔ216ep\Delta^{2}\leq 1 and 3erΔ13er\Delta\leq 1. In fact, if slog2dklog2ds\geq\log_{2}dk\geq\log_{2}d, then

6epΔ26e2k(dks)26e2k(dklog2d)26e(k6e(k/2log26e))2<1,\displaystyle 6ep\Delta^{2}\leq 6e2^{-k}\left(\frac{dk}{s}\right)^{2}\leq 6e2^{-k}\left(\frac{dk}{\log_{2}d}\right)^{2}\leq 6e\left(\frac{k}{6e\left(k/2-\log_{2}6e\right)}\right)^{2}<1,

as dlog2d\frac{d}{\log_{2}d} is increasing for any d3d\geq 3. Moreover,

3erΔ3edk2ss3elog2(dk)1.\displaystyle 3er\Delta\leq\frac{3edk}{2^{s}s}\leq\frac{3e}{\log_{2}(dk)}\leq 1.

Otherwise k/2slog2dkk/2\leq s\leq\log_{2}dk, which implies that

6epΔ26e2k(dks)26e2k(dkk/2)26e2k(2k/23e)2<1,\displaystyle 6ep\Delta^{2}\leq 6e2^{-k}\left(\frac{dk}{s}\right)^{2}\leq 6e2^{-k}\left(\frac{dk}{k/2}\right)^{2}\leq 6e2^{-k}\left(\frac{2^{k/2}}{3e}\right)^{2}<1,

and

3erΔ3edk2ss6edkk2k/21.\displaystyle 3er\Delta\leq\frac{3edk}{2^{s}s}\leq\frac{6edk}{k2^{k/2}}\leq 1.

Thus by Theorem 25 we have the following result. Note that resampling a clause involves at most kk variables, and for kk-CNF formulas with degree dd, the number of clauses is linear in the number of variables.

Corollary 30.

For integers dd and kk such that d3d\geq 3 and dk23edk\geq 2^{3e}, if d16e2k/2d\leq\frac{1}{6e}\cdot 2^{k/2} and smin{log2dk,k/2}s\geq\min\{\log_{2}dk,k/2\}, then Algorithm 6 samples satisfying assignments of kk-CNF formulas with degree dd and intersection ss in O(n)O(n) time in expectation and in O(nlogn)O(n\log n) time with high probability, where nn is the number of variables.

We remark that the lower bound on intersection size ss in Corollary 30 does not make the problem trivial. Note that the lower bound min{log2dk,k/2}\min\{\log_{2}dk,k/2\} is at most k/2k/2. The “hard” instance in the proof of [4, Corollary 30] has roughly k/2k/2 shared variables for each pair of dependent clauses. For completeness, we will show that if kk is even, and d42k/2d\geq 4\cdot 2^{k/2} and s=k/2s=k/2, then the sampling problem is NP-hard. The proof is almost identical to that of [4, Corollary 30]. The case of odd kk can be similarly handled but with larger constants.

We will use the inapproximability result of Sly and Sun [36] (or equivalently, of Galanis et al. [12]) for the hard-core model. We first remind the reader of the relevant definitions. Let λ>0\lambda>0. For a graph G=(V,E)G=(V,E), the hard-core model with parameter λ>0\lambda>0 is a probability distribution over the set of independent sets of GG; each independent set II of GG has weight proportional to λ|I|\lambda^{|I|}. The normalizing factor of this distribution is the partition function ZG(λ)Z_{G}(\lambda), formally defined as ZG(λ):=Iλ|I|Z_{G}(\lambda):=\sum_{I}\lambda^{|I|} where the sum ranges over all independent sets II of GG. The hardness result we are going to use is about approximating ZG(λ)Z_{G}(\lambda), but it is standard to translate it into the sampling setting as the problem is self-reducible.

Theorem 31 ([36, 12]).

For d3d\geq 3, let λc(d):=(d1)d1/(d2)d\lambda_{c}(d):=(d-1)^{d-1}/(d-2)^{d}. For all λ>λc(d)\lambda>\lambda_{c}(d), it is NP-hard to sample an independent set II with probability proportional to λ|I|\lambda^{|I|} in a dd-regular graph.

Theorem 32.

Let kk be an even integer. If d42k/2d\geq 4\cdot 2^{k/2} and s=k/2s=k/2, then it is NP-hard to sample satisfying assignments of kk-CNF formulas with degree dd and intersection ss uniformly at random.

Proof.

Given a dd-regular graph G=(V,E)G=(V,E), we will construct a monotone kk-CNF formula CC with degree dd and intersection k/2k/2 such that satisfying assignments of CC can be mapped to independent sets of GG. Replace each vertex vVv\in V by ss variables, say v1,,vsv_{1},\dots,v_{s}. If (u,v)E(u,v)\in E, then create a monotone clause v1vsu1usv_{1}\vee\dots\vee v_{s}\vee u_{1}\vee\dots\vee u_{s}. It is easy to see that every variable appears exactly dd times since GG is dd-regular. Moreover, the number of shared variables is always ss and the clause size is 2s=k2s=k.

For each satisfying assignment, we map it to a subset of vertices of GG. If all of v1,,vsv_{1},\dots,v_{s} are false, then make vv occupied. Otherwise vv is unoccupied. Thus a satisfying assignment is mapped to an independent set of GG. Moreover, there are (2k/21)n|I|(2^{k/2}-1)^{n-\left|I\right|} satisfying assignments corresponding to an independent set II, where nn is the number of vertices in GG. Thus the weight of II is proportional to (2k/21)|I|(2^{k/2}-1)^{-\left|I\right|}; namely λ=(2k/21)1\lambda=(2^{k/2}-1)^{-1} in the hard-core model.
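
The construction in this proof is easy to write out explicitly. The sketch below uses our own encoding (vertex v gets variables sv+1, …, sv+s); the K_4 example is for illustration only, since the hardness regime needs much larger degree.

```python
def hardcore_to_monotone_cnf(edges, s):
    """Reduction from the proof of Theorem 32: vertex v gets the variable
    block s*v+1, ..., s*v+s, and each edge (u, v) becomes one monotone
    clause containing all 2s variables of u and v (all literals positive)."""
    block = lambda v: [s * v + i for i in range(1, s + 1)]
    return [block(u) + block(v) for (u, v) in edges]

# K_4 (3-regular) with s = 2: a monotone 4-CNF of degree 3 and intersection 2.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(hardcore_to_monotone_cnf(k4, 2))
```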

In order to apply Theorem 31, all we need to do is to verify that λ>λc\lambda>\lambda_{c}, or equivalently

2k/21<(d2)d(d1)d1.\displaystyle 2^{k/2}-1<\frac{(d-2)^{d}}{(d-1)^{d-1}}.

This can be done as follows,

(d2)d(d1)d1\displaystyle\frac{(d-2)^{d}}{(d-1)^{d-1}} =(d2)(11d1)d1(45)5(d2)>2k/21.\displaystyle=(d-2)\Big{(}1-\frac{1}{d-1}\Big{)}^{d-1}\geq\bigg{(}\frac{4}{5}\bigg{)}^{5}(d-2)>2^{k/2}-1.\qed

Due to Theorem 32, we see that the dependence between kk and dd in Corollary 30 is tight in the exponent, even with the further assumption on intersection ss.

7.2. Independent Sets

We may also apply Algorithm 6 to sample hard-core configurations with parameter λ\lambda. Every vertex is associated with a random variable which is occupied with probability λ1+λ\frac{\lambda}{1+\lambda}. In this case, each edge defines a bad event which holds if both endpoints are occupied. Thus p=(λ1+λ)2p=\left(\frac{\lambda}{1+\lambda}\right)^{2}. Algorithm 6 is specialized to Algorithm 7.

(1) Mark each vertex occupied with probability λ1+λ\frac{\lambda}{1+\lambda} independently.

(2) While there is at least one edge with both endpoints occupied, resample all occupied components of size at least 22 and their boundaries.

(3) Output the set of occupied vertices.

Algorithm 7 Sample Hard-core Configurations

To see this, consider a graph G=(V,E)G=(V,E) with maximum degree dd. Given a configuration σ:V{0,1}\sigma:V\rightarrow\{0,1\}, consider the subgraph G[σ]G[\sigma] of GG induced by the vertex subset {vV:σ(v)=1}\{v\in V:\sigma(v)=1\}. Then we denote by 𝖡𝖺𝖽𝖵𝗍𝗑(σ){\sf BadVtx}(\sigma) the set of vertices in any component of G[σ]G[\sigma] of size at least 22. Then the output of Algorithm 5 is

𝖱𝖾𝗌𝖵𝗍𝗑(σ):=𝖡𝖺𝖽𝖵𝗍𝗑(σ)𝖡𝖺𝖽𝖵𝗍𝗑(σ).\displaystyle{\sf ResVtx}(\sigma):={\sf BadVtx}(\sigma)\cup\partial{\sf BadVtx}(\sigma).

This is because first, all of 𝖡𝖺𝖽𝖵𝗍𝗑(σ)\partial{\sf BadVtx}(\sigma) will be resampled, since any of them has at least one occupied neighbour in 𝖡𝖺𝖽𝖵𝗍𝗑(σ){\sf BadVtx}(\sigma). Secondly, v𝖡𝖺𝖽𝖵𝗍𝗑(σ)v\in\partial{\sf BadVtx}(\sigma) is unoccupied (otherwise v𝖡𝖺𝖽𝖵𝗍𝗑(σ)v\in{\sf BadVtx}(\sigma)), and Algorithm 5 stops after adding all of 𝖡𝖺𝖽𝖵𝗍𝗑(σ)\partial{\sf BadVtx}(\sigma). This explains Algorithm 7.
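
A minimal sketch of Algorithm 7 along these lines (plain Python; adj maps each vertex to its set of neighbours, and the helper names are ours):

```python
import random

def sample_hardcore(adj, lam):
    """Algorithm 7 (sketch): exact sampling from the hard-core distribution
    with parameter lam on the graph given by adjacency sets adj."""
    occupy = lambda: random.random() < lam / (1 + lam)
    sigma = {v: occupy() for v in adj}                             # step (1)
    while True:
        # BadVtx: occupied vertices lying in an occupied component of size >= 2
        bad = {v for v in adj if sigma[v] and any(sigma[u] for u in adj[v])}
        if not bad:                                                # step (3)
            return {v for v in adj if sigma[v]}
        res = bad | {u for v in bad for u in adj[v]}               # BadVtx and its boundary
        for v in res:                                              # step (2)
            sigma[v] = occupy()

# Example: a hard-core sample on a 4-cycle with lambda = 0.2.
print(sample_hardcore({0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}, 0.2))
```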

Moreover, let 𝖡𝖺𝖽(σ){\sf Bad}(\sigma) be the set of edges whose both endpoints are occupied under σ\sigma. Let 𝖱𝖾𝗌(σ){\sf Res}(\sigma) be the set of edges whose both endpoints are in 𝖱𝖾𝗌𝖵𝗍𝗑(σ){\sf ResVtx}(\sigma). Let σt\sigma_{t} be the random configuration of Algorithm 7 at round tt if it has not halted, and 𝖡𝖺𝖽t=𝖡𝖺𝖽(σt){\sf Bad}_{t}={\sf Bad}(\sigma_{t}), 𝖱𝖾𝗌t=𝖱𝖾𝗌(σt){\sf Res}_{t}={\sf Res}(\sigma_{t}).

Lemma 33.

If ep(2d1)<1ep(2d-1)<1, then 𝔼|𝖡𝖺𝖽t+1|(4ed21)p𝔼|𝖡𝖺𝖽t|\mathop{\mathbb{{}E}}\nolimits\left|{\sf Bad}_{t+1}\right|\leq(4ed^{2}-1)p\mathop{\mathbb{{}E}}\nolimits\left|{\sf Bad}_{t}\right|.

Proof.

First note that the dependency graph is the line graph of GG and Δ=2d2\Delta=2d-2 is the maximum degree of this line graph. Thus ep(2d1)<1ep(2d-1)<1 guarantees that the prerequisite of Corollary 27 is met. Moreover, for any σ\sigma, |𝖱𝖾𝗌(σ)|(2d1)|𝖡𝖺𝖽(σ)|\left|{\sf Res}(\sigma)\right|\leq(2d-1)\left|{\sf Bad}(\sigma)\right| and |𝖱𝖾𝗌(σ)|(2d2)|𝖱𝖾𝗌(σ)|\left|\partial{\sf Res}(\sigma)\right|\leq(2d-2)\left|{\sf Res}(\sigma)\right|. Similarly to the analysis in Lemma 28, conditioned on a fixed 𝖡𝖺𝖽t{\sf Bad}_{t}, by Corollary 27 (or (15) in particular), we have that

𝔼|𝖡𝖺𝖽t+1|\displaystyle\mathop{\mathbb{{}E}}\nolimits\left|{\sf Bad}_{t+1}\right| p|𝖱𝖾𝗌(σ)|+ep|𝖱𝖾𝗌(σ)|\displaystyle\leq p\left|{\sf Res}(\sigma)\right|+ep\left|\partial{\sf Res}(\sigma)\right|
(p(2d1)+ep(2d2)(2d1))|𝖡𝖺𝖽t|\displaystyle\leq\left(p(2d-1)+ep(2d-2)(2d-1)\right)\left|{\sf Bad}_{t}\right|
<(4ed21)p|𝖡𝖺𝖽t|.\displaystyle<(4ed^{2}-1)p\left|{\sf Bad}_{t}\right|.

Since the inequality above holds for any 𝖡𝖺𝖽t{\sf Bad}_{t}, the lemma follows. ∎

Lemma 33 implies that, if 4epd214epd^{2}\leq 1, then the number of bad edges shrinks with a constant factor, and Algorithm 7 resamples O(m)O(m) edges in expectation and O(mlogm)O(m\log m) edges with high probability, where m=|E|m=\left|E\right|. A bounded degree graph is sparse and thus m=O(n)m=O(n), where nn is the number of vertices. Since p=(λ1+λ)2p=\left(\frac{\lambda}{1+\lambda}\right)^{2}, the condition 4epd214epd^{2}\leq 1 is equivalent to

λ12ed1.\displaystyle\lambda\leq\frac{1}{2\sqrt{e}d-1}.

Thus we have the following theorem, where the constants are slightly better than directly applying Theorem 25.

Theorem 34.

If λ12ed1\lambda\leq\frac{1}{2\sqrt{e}d-1}, then Algorithm 7 draws a uniform hard-core configuration with parameter λ\lambda from a graph with maximum degree dd in O(n)O(n) time in expectation and in O(nlogn)O(n\log n) time with high probability, where nn is the number of vertices.

The optimal bound of sampling hard-core configurations is λ<λced\lambda<\lambda_{c}\approx\frac{e}{d} where λc\lambda_{c} is defined in Theorem 31. The algorithm is due to Weitz [38] and the hardness is shown in [36, 12]. The condition of our Theorem 34 is more restricted than correlation decay based algorithms [38] or traditional Markov chain based algorithms. Nevertheless, our algorithm matches the correct order of magnitude λ=O(d1)\lambda=O(d^{-1}). Moreover, our algorithm has the advantage of being simple, exact, and running in linear time in expectation.

8. Distributed algorithms for sampling

An interesting feature of Algorithm 6 is that it is distributed.444See [11] for a very recent work by Feng, Sun, and Yin on distributed sampling algorithms. In particular, they show a similar lower bound in [11, Section 5]. For concreteness, consider the algorithm applied to sampling hard-core configurations on a graph GG (i.e. Algorithm 7), assumed to be of bounded maximum degree. Imagine that each vertex is assigned a processor that has access to a source of random bits. Communication is possible between adjacent processors and is assumed to take constant time. This is essentially Linial’s LOCAL model [27]. Then, in each parallel round of the algorithm, the processor at vertex vv can update the value σ(v)\sigma(v) in constant time, as this requires access only to the values of σ(u)\sigma(u) for vertices uV(G)u\in V(G) within a bounded distance rr of vv. In the case of the hard-core model, we have r=2r=2, since the value σ(v)\sigma(v) at vertex vv should be updated precisely if there are vertices uu and uu^{\prime} such that vuv\sim u and uuu\sim u^{\prime} and σ(u)=σ(u)=1\sigma(u)=\sigma(u^{\prime})=1. Note that we allow u=vu^{\prime}=v here.

In certain applications, including the hard-core model, Algorithm 6 runs in a number of rounds that is bounded by a logarithmic function of the input size with high probability. (Recall Theorem 25.) We show that this is optimal. (Although the argument is presented in the context of the hard-core model, it ought to generalise to many other applications.)

Set L=clognL=\lceil c\log n\rceil for some constant c>0c>0 to be chosen later. The instance that establishes the lower bound is a graph GG consisting of a collection of n/Ln/L disjoint paths Π1,,Πn/L\Pi_{1},\ldots,\Pi_{n/L} with LL vertices each. (Assume that nn is an exact multiple of LL; this is not a significant restriction.) The high-level idea behind the lower bound is simple, and consists of two observations. We assume first that the distributed algorithm we are considering always produces an output, say σ^:V(G){0,1}\hat{\sigma}:V(G)\to\{0,1\}, within tt rounds. It will be easy at the end to extend the argument to the situation where the running time is a possibly unbounded random variable with bounded expectation.

Focus attention on a particular path Π\Pi with endpoints uu and vv. The first observation is that if rt<L/2rt<L/2 then σ(u)\sigma(u) (respectively, σ(v)\sigma(v)) depends only on the computations performed by processors in the half of Π\Pi containing uu (respectively vv). Therefore, in the algorithm’s output, σ^(u)\hat{\sigma}(u) and σ^(v)\hat{\sigma}(v) are probabilistically independent. The second observation is that if the constant cc is sufficiently small then, in the hard-core distribution, σ(u)\sigma(u) and σ(v)\sigma(v) are significantly correlated. Since the algorithm operates independently on each of the n/Ln/L paths, these small but significant correlations combine to force to a large variation distance between the hard-core distribution and the output distribution of the algorithm.

We now quantify the second observation. Let σ:V(G){0,1}\sigma:V(G)\to\{0,1\} be a sample from the hard-core distribution on a path Π\Pi on kk vertices with endpoints uu and vv, and let Ik=ZΠ(λ)I_{k}=Z_{\Pi}(\lambda) denote the corresponding hard-core partition function (weighted sum over independent sets). Define the matrix Wk=(w00w01w10w11)W_{k}=\big{(}\begin{smallmatrix}w_{00}&w_{01}\\ w_{10}&w_{11}\end{smallmatrix}\big{)}, where wij=Pr(σ(u)=iσ(v)=j)w_{ij}=\mathop{\mathrm{Pr}}\nolimits(\sigma(u)=i\wedge\sigma(v)=j). Then

Wk=1Ik(Ik2λIk3λIk3λ2Ik4),W_{k}=\frac{1}{I_{k}}\begin{pmatrix}I_{k-2}&\lambda I_{k-3}\\ \lambda I_{k-3}&\lambda^{2}I_{k-4}\end{pmatrix},

since IkI_{k} is the total weight of independent sets in Π\Pi, Ik2I_{k-2} is the total weight of independent sets with σ(u)=σ(v)=0\sigma(u)=\sigma(v)=0, Ik3I_{k-3} is the total weight of independent sets with σ(u)=0\sigma(u)=0 and σ(v)=1\sigma(v)=1, and so on. Also note that IkI_{k} satisfies the recurrence

(23) I0=1,I1=λ+1,andIk=Ik1+λIk2,for k2.I_{0}=1,\quad I_{1}=\lambda+1,\quad\text{and}\quad I_{k}=I_{k-1}+\lambda I_{k-2},\>\text{for $k\geq 2$}.

We will use detWk\det W_{k} to measure the deviation of the distribution of (σ(u),σ(v))(\sigma(u),\sigma(v)) from a product distribution. Write

Wk=(Ik2Ik3Ik3Ik4),W_{k}^{\prime}=\begin{pmatrix}I_{k-2}&I_{k-3}\\ I_{k-3}&I_{k-4}\end{pmatrix},

and note that detWk=λ2Ik2detWk\det W_{k}=\lambda^{2}I_{k}^{-2}\det W_{k}^{\prime}. Applying recurrence (23) once to each of the four entries of WkW_{k}^{\prime}, we have

\det W_{k}^{\prime} = I_{k-2}I_{k-4}-I_{k-3}^{2}
= (I_{k-3}+\lambda I_{k-4})(I_{k-5}+\lambda I_{k-6})-(I_{k-4}+\lambda I_{k-5})^{2}
= I_{k-3}(I_{k-5}+\lambda I_{k-6})-I_{k-4}(I_{k-4}+\lambda I_{k-5})+\lambda^{2}(I_{k-4}I_{k-6}-I_{k-5}^{2})
= I_{k-3}I_{k-4}-I_{k-4}I_{k-3}+\lambda^{2}\det W_{k-2}^{\prime}
= \lambda^{2}\det W_{k-2}^{\prime},

for all $k\geq 6$. By direct calculation, $\det W_{4}^{\prime}=-\lambda^{2}$ and $\det W_{5}^{\prime}=\lambda^{3}$. Hence, by induction, $\det W_{k}^{\prime}=(-1)^{k-1}\lambda^{k-2}$, and

(24)   \det W_{k}=\frac{(-1)^{k-1}\lambda^{k}}{I_{k}^{2}},

for all $k\geq 4$.
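Identity (24) is easy to confirm numerically. The following sketch (exact rational arithmetic; the helper names are ours) computes $I_{k}$ from recurrence (23) and the matrix $W_{k}$ by brute-force enumeration of the independent sets of a $k$-vertex path, and checks the two against each other:

from fractions import Fraction
from itertools import product

def I(k, lam):
    # Hard-core partition function of a path on k vertices via recurrence (23).
    if k == 0:
        return Fraction(1)
    a, b = Fraction(1), Fraction(1) + lam      # I_0, I_1
    for _ in range(k - 1):
        a, b = b, b + lam * a                  # advance (I_j, I_{j+1}) to (I_{j+1}, I_{j+2})
    return b

def brute_force_W(k, lam):
    # Joint law of (sigma(u), sigma(v)) at the endpoints of a k-vertex path,
    # obtained by summing lam^{|S|} over all independent sets S.
    w = [[Fraction(0)] * 2 for _ in range(2)]
    for sigma in product([0, 1], repeat=k):
        if any(sigma[i] and sigma[i + 1] for i in range(k - 1)):
            continue                           # adjacent occupied pair: not independent
        w[sigma[0]][sigma[-1]] += lam ** sum(sigma)
    Z = sum(sum(row) for row in w)
    return [[x / Z for x in row] for row in w], Z

lam = Fraction(1)
for k in range(4, 12):
    W, Z = brute_force_W(k, lam)
    det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
    assert Z == I(k, lam)
    assert det == (-1) ** (k - 1) * lam ** k / I(k, lam) ** 2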

Solving the recurrence (23) gives the following formula for $I_{k}$:

I_{k}=A_{\lambda}\left(\frac{1+\sqrt{4\lambda+1}}{2}\right)^{k}+B_{\lambda}\left(\frac{1-\sqrt{4\lambda+1}}{2}\right)^{k},

where

A_{\lambda}=\frac{1}{2}+\frac{2\lambda+1}{2\sqrt{4\lambda+1}}\quad\text{and}\quad B_{\lambda}=\frac{1}{2}-\frac{2\lambda+1}{2\sqrt{4\lambda+1}}.

Asymptotically,

I_{k}=(1+o(1))\,A_{\lambda}\left(\frac{1+\sqrt{4\lambda+1}}{2}\right)^{k}.

Substituting this estimate into (24) yields $|\det W_{k}|=(1+o(1))A_{\lambda}^{-2}\alpha^{k}$, where

\alpha=\frac{2\lambda}{2\lambda+\sqrt{4\lambda+1}+1}.

Note that $0<\alpha<1$ and $\alpha$ depends only on $\lambda$.
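As a worked instance, take $\lambda=1$: then $I_{k}=F_{k+2}$ is a Fibonacci number (with $F_{1}=F_{2}=1$), $A_{1}=\frac{1}{2}+\frac{3}{2\sqrt{5}}$, and $\alpha=2/(3+\sqrt{5})=(3-\sqrt{5})/2\approx 0.382$. Thus $|\det W_{k}|$ decays geometrically at rate about $0.382$ per vertex and, ignoring the constant factor $A_{1}^{-2}$ and taking logarithms to base $e$, it remains above $n^{-1/3}$ as long as $k\leq c\log n$ with $c<1/(3\ln(1/\alpha))\approx 0.35$.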

Now let the matrix $\widehat{W}_{k}=\big(\begin{smallmatrix}\widehat{w}_{00}&\widehat{w}_{01}\\ \widehat{w}_{10}&\widehat{w}_{11}\end{smallmatrix}\big)$ be defined as for $W_{k}$, but with respect to the output distribution of the distributed sampling algorithm rather than the true hard-core distribution. Recall that we choose $L=\lceil c\log n\rceil>2rt$, which implies that $\hat{\sigma}(u)$ and $\hat{\sigma}(v)$ are independent and $\det\widehat{W}_{L}=0$. It is easy to check that if $\|\widehat{W}_{k}-W_{k}\|_{\infty}\leq\varepsilon$, where the matrix norm is entrywise, then $|\det W_{k}|\leq 2\varepsilon$ (expand $\det W_{k}-\det\widehat{W}_{k}$ and use the fact that the entries of each matrix sum to $1$). Since $|\det W_{L}|=(1+o(1))A_{\lambda}^{-2}\alpha^{L}$, choosing the constant $c$ sufficiently small (recall $L=\lceil c\log n\rceil$) ensures that $|\det W_{L}|\geq 2n^{-1/3}$, and hence that $\|\widehat{W}_{L}-W_{L}\|_{\infty}\geq n^{-1/3}$. Thus $|\widehat{w}_{ij}-w_{ij}|\geq n^{-1/3}$, for some $i,j$; for definiteness, suppose that $i=j=0$ and that $\widehat{w}_{00}>w_{00}$.

Let $Z$ (respectively, $\widehat{Z}$) be the number of paths whose endpoints are both assigned 0 in the hard-core distribution (respectively, the algorithm's output distribution). Then $Z$ (respectively, $\widehat{Z}$) is a binomial random variable with expectation $\mu=w_{00}n/L$ (respectively, $\hat{\mu}=\widehat{w}_{00}n/L$). Since $|\mathbb{E}Z-\mathbb{E}\widehat{Z}|=\Omega(n^{2/3}/\log n)$, which far exceeds the standard deviations of $Z$ and $\widehat{Z}$ (both $O(\sqrt{n/L})$), a Chernoff bound gives that $\Pr(Z\geq(\mu+\hat{\mu})/2)$ and $\Pr(\widehat{Z}\leq(\mu+\hat{\mu})/2)$ are both $\exp(-\Omega(n^{1/3}/\log n))$. It follows that the variation distance between the distributions of $\sigma$ and $\hat{\sigma}$ is $1-o(1)$.
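The distinguishing statistic just described is simple enough to state as a procedure. The following sketch (illustrative names; the threshold $(\mu+\hat{\mu})/2$ is available to the analysis, not to the sampler being tested) counts the paths whose two endpoints are both unoccupied and compares the count with the threshold:

def count_both_endpoints_empty(paths):
    # paths: list of per-path configurations, each a list of 0/1 values.
    return sum(1 for p in paths if p[0] == 0 and p[-1] == 0)

def distinguish(paths, threshold):
    # In the case treated in the text (w_hat_00 > w_00), the algorithm's output
    # concentrates above the threshold and a true hard-core sample concentrates
    # below it, so this guess errs with probability o(1) under either distribution.
    return "algorithm" if count_both_endpoints_empty(paths) >= threshold else "hard-core"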

The above argument assumes an absolute bound on the running time, whereas the running time of an exact sampling algorithm will in general be a random variable $T$. To bridge the gap, suppose $\Pr(T\leq t)\geq\frac{2}{3}$. Then

\|\hat{\sigma}-\sigma\|_{\mathrm{TV}} = \max_{A}\bigl|\Pr(\hat{\sigma}\in A)-\Pr(\sigma\in A)\bigr|
= \max_{A}\Bigl|\bigl(\Pr(\hat{\sigma}\in A\mid T\leq t)-\Pr(\sigma\in A)\bigr)\Pr(T\leq t)
\qquad\qquad+\bigl(\Pr(\hat{\sigma}\in A\mid T>t)-\Pr(\sigma\in A)\bigr)\Pr(T>t)\Bigr|
\geq \tfrac{2}{3}(1-o(1))-\tfrac{1}{3}\times 1,

where $\|\cdot\|_{\mathrm{TV}}$ denotes variation distance, and $A$ ranges over events $A\subseteq\{0,1\}^{|V(G)|}$. Thus $\|\sigma-\hat{\sigma}\|_{\mathrm{TV}}\geq\frac{1}{3}-o(1)$, which contradicts the assumption that the algorithm samples from the hard-core distribution (exactly, or even approximately). It follows that $\Pr(T\leq t)<\frac{2}{3}$, and hence $\mathbb{E}(T)\geq t\Pr(T>t)\geq\frac{1}{3}t$. Note that this argument places a lower bound on parallel time not just for exact samplers, but even for (very) approximate ones.

With only a slight increase in work, one could take the instance $G$ to be a path of length $n$, which might be considered more natural. Identify $O(n/L)$ subpaths within $G$, suitably spaced, and of length $L$. The only complication is that the hard-core distribution does not have independent marginals on distinct subpaths. However, by ensuring that the subpaths are separated by distance $n^{\delta}$, for some small constant $\delta>0$, the correlations can be controlled, and the argument proceeds, with only slight modification, as before.

Acknowledgements

We would like to thank Yumeng Zhang for pointing out a factor $k$ saving in Corollary 30. We thank Dimitris Achlioptas, Fotis Iliopoulos, Pinyan Lu, Alistair Sinclair, and Yitong Yin for their helpful comments. We also thank anonymous reviewers for their detailed comments.

HG and MJ are supported by the EPSRC grant EP/N004221/1. JL is supported by NSF grant CCF-1420934. This work was done (in part) while the authors were visiting the Simons Institute for the Theory of Computing. HG was also supported by a Google research fellowship in the Simons Institute.

References

  • [1] Dimitris Achlioptas and Fotis Iliopoulos. Random walks that find perfect objects and the Lovász Local Lemma. J. ACM, 63(3):22, 2016.
  • [2] Noga Alon. A parallel algorithmic version of the Local Lemma. Random Struct. Algorithms, 2(4):367–378, 1991.
  • [3] József Beck. An algorithmic approach to the Lovász Local Lemma. I. Random Struct. Algorithms, 2(4):343–366, 1991.
  • [4] Ivona Bezáková, Andreas Galanis, Leslie Ann Goldberg, Heng Guo, and Daniel Štefankovič. Approximation via correlation decay when strong spatial mixing fails. In ICALP, pages 45:1–45:13, 2016.
  • [5] Magnus Bordewich, Martin E. Dyer, and Marek Karpinski. Stopping times, metrics and approximate counting. In ICALP, pages 108–119, 2006.
  • [6] Russ Bubley and Martin E. Dyer. Graph orientations with no sink and an approximation for a hard case of #SAT. In SODA, pages 248–257, 1997.
  • [7] Henry Cohn, Robin Pemantle, and James G. Propp. Generating a random sink-free orientation in quadratic time. Electr. J. Comb., 9(1), 2002.
  • [8] Artur Czumaj and Christian Scheideler. Coloring nonuniform hypergraphs: A new algorithmic approach to the general Lovász Local Lemma. Random Struct. Algorithms, 17(3-4):213–237, 2000.
  • [9] Paul Erdős and László Lovász. Problems and results on 3-chromatic hypergraphs and some related questions. Infinite and finite sets, volume 10 of Colloquia Mathematica Societatis János Bolyai, pages 609–628, 1975.
  • [10] Weiming Feng, Yahui Liu, and Yitong Yin. Local rejection sampling with soft filters. CoRR, abs/1807.06481, 2018.
  • [11] Weiming Feng, Yuxin Sun, and Yitong Yin. What can be sampled locally? In PODC, pages 121–130. ACM, 2017.
  • [12] Andreas Galanis, Daniel Štefankovič, and Eric Vigoda. Inapproximability of the partition function for the antiferromagnetic Ising and hard-core models. Comb. Probab. Comput., 25(4):500–559, 2016.
  • [13] Heidi Gebauer, Tibor Szabó, and Gábor Tardos. The local lemma is asymptotically tight for SAT. J. ACM, 63(5):43:1–43:32, 2016.
  • [14] Heng Guo and Kun He. Tight bounds for popping algorithms. CoRR, abs/1807.01680, 2018.
  • [15] Heng Guo and Mark Jerrum. Approximately counting bases of bicircular matroids. CoRR, abs/1808.09548, 2018.
  • [16] Heng Guo and Mark Jerrum. Perfect simulation of the hard disks model by partial rejection sampling. In ICALP, volume 107 of LIPIcs, pages 69:1–69:10. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018.
  • [17] Heng Guo and Mark Jerrum. A polynomial-time approximation algorithm for all-terminal network reliability. In ICALP, volume 107 of LIPIcs, pages 68:1–68:12. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2018.
  • [18] Heng Guo, Mark Jerrum, and Jingcheng Liu. Uniform sampling through the Lovász local lemma. In STOC, pages 342–355. ACM, 2017.
  • [19] Bernhard Haeupler, Barna Saha, and Aravind Srinivasan. New constructive aspects of the Lovász Local Lemma. J. ACM, 58(6):28:1–28:28, 2011.
  • [20] David G. Harris and Aravind Srinivasan. Constraint satisfaction, packet routing, and the Lovász Local Lemma. In STOC, pages 685–694, 2013.
  • [21] David G. Harris and Aravind Srinivasan. The Moser-Tardos framework with partial resampling. In FOCS, pages 469–478, 2013.
  • [22] David G. Harris and Aravind Srinivasan. A constructive algorithm for the Lovász Local Lemma on permutations. In SODA, pages 907–925, 2014.
  • [23] Nicholas J. A. Harvey and Jan Vondrák. An algorithmic proof of the Lovász Local Lemma via resampling oracles. In FOCS, pages 1327–1346, 2015. Full version available at abs/1504.02044.
  • [24] Jonathan Hermon, Allan Sly, and Yumeng Zhang. Rapid mixing of hypergraph independent set. CoRR, abs/1610.07999, 2016.
  • [25] Donald E. Knuth. The art of computer programming, volume 4b (draft, pre-fascicle 6a). 2015. Available at http://www-cs-faculty.stanford.edu/~uno/fasc6a.ps.gz.
  • [26] Kashyap Babu Rao Kolipaka and Mario Szegedy. Moser and Tardos meet Lovász. In STOC, pages 235–244, 2011.
  • [27] Nathan Linial. Distributive graph algorithms: Global solutions from local data. In FOCS, pages 331–335, 1987.
  • [28] Jingcheng Liu and Pinyan Lu. FPTAS for counting monotone CNF. In SODA, pages 1531–1548, 2015.
  • [29] Ankur Moitra. Approximate counting, the Lovász Local Lemma and inference in graphical models. CoRR, abs/1610.04317, 2016. STOC 2017, to appear.
  • [30] Michael Molloy and Bruce A. Reed. Further algorithmic aspects of the Local Lemma. In STOC, pages 524–529, 1998.
  • [31] Robin A. Moser and Gábor Tardos. A constructive proof of the general Lovász Local Lemma. J. ACM, 57(2), 2010.
  • [32] James G. Propp and David B. Wilson. Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Struct. Algorithms, 9(1-2):223–252, 1996.
  • [33] James G. Propp and David B. Wilson. How to get a perfectly random sample from a generic Markov chain and generate a random spanning tree of a directed graph. J. Algorithms, 27(2):170–217, 1998.
  • [34] Alexander D. Scott and Alan D. Sokal. The repulsive lattice gas, the independent-set polynomial, and the Lovász Local Lemma. J. Stat. Phys., 118(5):1151–1261, 2005.
  • [35] James B. Shearer. On a problem of Spencer. Combinatorica, 5(3):241–245, 1985.
  • [36] Allan Sly and Nike Sun. The computational hardness of counting in two-spin models on $d$-regular graphs. Ann. Probab., 42(6):2383–2416, 2014.
  • [37] Aravind Srinivasan. Improved algorithmic versions of the Lovász Local Lemma. In SODA, pages 611–620, 2008.
  • [38] Dror Weitz. Counting independent sets up to the tree threshold. In STOC, pages 140–149, 2006.
  • [39] David B. Wilson. Generating random spanning trees more quickly than the cover time. In STOC, pages 296–303, 1996.