Limits of CDCL Learning via Merge Resolution
Abstract
In their seminal work, Atserias et al. and independently Pipatsrisawat and Darwiche in 2009 showed that CDCL solvers can simulate resolution proofs with polynomial overhead. However, previous work does not address the tightness of the simulation, i.e., the question of how large this overhead needs to be. In this paper, we address this question by focusing on an important property of proofs generated by CDCL solvers that employ standard learning schemes, namely that the derivation of a learned clause has at least one inference where a literal appears in both premises (aka, a merge literal). Specifically, we show that proofs of this kind can simulate resolution proofs with at most a linear overhead, and that such overhead is sometimes necessary: there exist formulas with resolution proofs of linear length that require quadratic CDCL proofs.
1 Introduction
Over the last two decades, CDCL SAT solvers have had a dramatic impact on many areas of software engineering [CGP+08], security [DVT07, XA05], and AI [BF97]. This is due to their ability to solve very large real-world formulas that contain upwards of millions of variables and clauses [MLM21]. Both theorists and practitioners have expended considerable effort in understanding the CDCL algorithm and the reasons for its unreasonable effectiveness in the context of practical applications. While considerable progress has been made, many questions remain unanswered.
Perhaps the most successful set of tools for understanding the CDCL algorithm come from proof complexity, and a highly influential result is the one that shows that idealized models of CDCL can polynomially simulate the resolution proof system, proved independently by Atserias, Fichte, and Thurley [AFT11], and Pipatsrisawat and Darwiche [PD11], building on initial results by Beame et al. [BKS04] and Hertel et al. [HBPV08]. (See also a recent alternative proof by Beyersdorff and Böhm [BB21].) Such simulation results are very useful because they reassure us that whenever a formula has a short resolution proof then CDCL with the right choice of heuristics can reproduce it.
Recent models make assumptions that are closer to real solvers, but pay for that with a polynomial overhead in the simulation. A series of papers have focused on understanding which of the assumptions are needed for these simulations to hold, often using and/or introducing refinements of resolution along the way. For instance, the question of whether restarts are needed, while still open, has been investigated at length, and the pool resolution [Van05] and RTL [BHJ08] proof systems were devised to capture proofs produced by CDCL solvers that do not restart. The importance of decision heuristics has also been explored recently, with results showing that neither static [MPR20] nor VSIDS-like [Vin20] ordering of variables are enough to simulate resolution in full generality (unless VSIDS scores are periodically erased [LFV+20]). In the case of static ordering, the (semi-)ordered resolution proof system [MPR20] was used to reason about such variants of CDCL solvers.
But even if we stay within the idealized model, it is not clear how efficient CDCL is in simulating resolution. The analysis of Pipatsrisawat and Darwiche gives a polynomial overhead: that is, if a formula over $n$ variables has a resolution refutation of length $L$, then a CDCL proof with no more than $\mathrm{poly}(n) \cdot L$ steps exists. Beyersdorff and Böhm [BB21] improved the overhead, but we do not know what the optimal overhead is. Furthermore, to the best of our knowledge, prior to our paper, we did not even know whether the overhead can be avoided altogether.
1.1 Learning Schemes in CDCL and Connection with Merges
A common feature of CDCL solvers is the use of 1-empowering learning schemes [PD08, AFT11]: that is, they only learn clauses which enable unit propagations that were not possible before. An example of a 1-empowering learning scheme is the popular 1UIP learning scheme [MS99]. To model this behavior we build upon a connection between 1-empowerment and merges [And68], i.e., resolution steps involving clauses with shared literals.
Nearly every CDCL solver nowadays uses the First Unique Implication Point (1UIP) learning scheme, where conflict analysis starts with a clause falsified by the current state of the solver and sequentially resolves it with clauses responsible for the unit propagations leading to the conflict, until the clause becomes asserting, i.e., unit immediately upon backjumping.
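To make the conflict-analysis loop concrete, here is a simplified Python sketch (our own illustration, not part of the paper): literals are signed integers, the trail is a list of (literal, reason) pairs with reason `None` for decisions, and trail position stands in for decision level, which is adequate for identifying literals at the last decision level in this toy setting.

```python
def first_uip(conflict_clause, trail, last_decision_index):
    """Walk the trail backwards, resolving the current clause with the
    reason clause of each falsified literal, until exactly one literal of
    the current clause was assigned at or after the last decision; that
    asserting clause is the 1UIP."""
    clause = set(conflict_clause)
    pos = {lit: i for i, (lit, _) in enumerate(trail)}
    for i in range(len(trail) - 1, -1, -1):
        lit, reason = trail[i]
        at_last_level = [l for l in clause
                         if -l in pos and pos[-l] >= last_decision_index]
        if len(at_last_level) == 1:
            return clause                     # asserting: this is the 1UIP
        if -lit in clause and reason is not None:
            # resolve the current clause with the reason of lit on pivot lit
            clause = (clause - {-lit}) | (set(reason) - {lit})
    return clause
```

For example, with the decision $x_1$ propagating $x_2$ and $x_3$, and the clause $\overline{x_2} \lor \overline{x_3}$ falsified, the loop resolves back to the asserting clause $\overline{x_1}$.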
Descriptions of early implementations of CDCL solvers [MS99, MMZ+01] already remark on the importance of learning an asserting clause, since that nudges the solver towards another part of the search space, and consequently early alternative learning schemes explored learning many kinds of asserting clauses. First observe that conflict analysis can be extended to produce other asserting clauses that appear after the 1UIP during conflict analysis such as intermediate UIPs and the last UIP [BS97]. The early solver GRASP can even learn multiple UIP clauses from a single conflict. While there is empirical evidence that it is often best to stop conflict analysis at the 1UIP [ZMMM01], recent work has identified conditions where it is advantageous to continue past it [FB20] (see also the discussion of learning schemes therein).
Ryan [Rya04, §2.5] also observed empirically that clause quality is negatively correlated with the length of the conflict analysis derivation and considered the opposite approach, that is, learning clauses that appear before the 1UIP during conflict analysis in addition to the 1UIP. This approach is claimed to be useful for some empirical benchmarks but, like any scheme that learns multiple clauses, slows down Boolean constraint propagation (BCP) in comparison to a scheme that learns just the 1UIP.
Later works provide a more theoretically oriented approach to understanding the strength of 1UIP and to learning clauses that appear before the 1UIP [DHN07, PD08]. In particular, and highly relevant for our discussion, Pipatsrisawat and Darwiche identified 1-empowerment as a fundamental property of asserting clauses. Furthermore they identified a connection between 1-empowering clauses and merges, and used the simplicity of checking for merges as an approximation for 1-empowerment.
An orthogonal approach is to extend the 1UIP derivation by resolving it with clauses other than those that would usually be used during conflict analysis [ABH+08]. A prominent example is clause minimization [SB09], where literals are eliminated from the 1UIP clause by resolving it with the appropriate input clauses, independently of their role in the conflict, so the resultant clause that is actually learned is a shorter and therefore stronger version of the 1UIP.
Furthermore, a relation between merges and unit-resolution completeness has also been observed in the context of knowledge compilation [dV94]. Finally, the number of merges directly inferable from a formula (i.e., in a single resolution step) has been proposed, under the name of mergeability, as a measure to help explain the hardness of a formula based on both controlled experiments and analysis of real-world instances [ZMW+18].
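One plausible way to compute such a measure is sketched below (our own illustration; the precise normalization used in [ZMW+18] may differ): count pairs of clauses that resolve on exactly one variable and whose resolvent is a merge, i.e., the premises share at least one literal besides the clashing pair.

```python
from itertools import combinations

def mergeability(clauses):
    """Count pairs of clauses (sets of signed ints) that are resolvable,
    i.e. clash on exactly one variable, and whose resolvent is a merge."""
    count = 0
    for c1, c2 in combinations(clauses, 2):
        clashing = {abs(l) for l in c1 if -l in c2}
        if len(clashing) == 1:          # resolvable, non-tautological resolvent
            p = clashing.pop()
            if (c1 - {p, -p}) & (c2 - {p, -p}):
                count += 1              # shared literal besides the pivot
    return count
```

For instance, $x_1 \lor x_2$ and $\overline{x_1} \lor x_2$ resolve into the merge $x_2$ and contribute one to the count.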
To summarize, merges are relevant in the context of CDCL learning schemes for the following reason: all practical CDCL learning schemes either produce a 1-empowering clause or extend one, and since a 1-empowering clause always contains a merge in its derivation, all practical learning schemes produce a clause that contains a merge in its derivation, which is exactly the property imposed by the proof systems we introduce below.
1.2 Our Contributions
As mentioned earlier, we build upon a connection between 1-empowerment and merges [PD08, AFT11], and introduce a proof system RMA (for “resolution with merge ancestors”) which includes CDCL with an arbitrary 1-empowering learning scheme. The “merge ancestors” in the name of this system comes from the fact that for any 1-empowering clause, at least one step in its resolution derivation must resolve two clauses that share a common literal: a merge step in the sense of [And68]. Clause minimization procedures, as long as they are applied on top of 1-empowering clauses, are also modelled by RMA.
We prove that, on the one hand, RMA is able to simulate resolution with at most a linear overhead. On the other hand, we show a quadratic separation between resolution and RMA, that is, there exist formulas with resolution proofs of linear length that require RMA proofs of quadratic length. In other words, we show that CDCL may be polynomially worse than resolution because of the properties of a standard learning scheme, but that the blow-up due to these properties is at most linear.
We also consider weaker proof systems, all of which contain 1UIP proofs (and do so with finer granularity), but not necessarily proofs produced by other asserting learning schemes. A technical point of interest is that we work with proof systems that are provably not closed under restrictions, which is unusual in proof complexity. This fact forces our proofs to exploit syntactic properties of the proof system, as opposed to relying on more convenient semantic properties.
2 Preliminaries
A literal is either a variable $x$ or its negation $\overline{x}$. A clause is a disjunction of literals, and a CNF formula is a conjunction of clauses. The support of a clause $C$ or formula $F$ is the set of variables it contains. A resolution derivation from a formula $F$ is a sequence of clauses $\pi = (C_1, \ldots, C_L)$ such that each $C_i$ is either an axiom in $F$ or it is the conclusion of applying the resolution rule
$$\frac{A \lor x \qquad B \lor \overline{x}}{A \lor B}$$
on two premises $A \lor x$ and $B \lor \overline{x}$, with $A \lor B$ the resolvent. The variable $x$ that appears with opposite signs in the premises of a resolution inference is called the pivot. If furthermore there is a literal common to $A$ and $B$, the resolvent is called a merge. If instead of being the result of a syntactic inference we allow $C_i$ to be any clause semantically implied by the two premises, even if they might not be resolvable, then we say the derivation is a semantic resolution derivation. A derivation is a refutation if its last clause is the empty clause $\Box$. We denote the length of a derivation $\pi$ by $|\pi| = L$.
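As a concrete illustration (ours, not the paper's), the resolution rule and the merge condition can be written down directly, encoding literals as signed integers in the DIMACS style:

```python
def resolve(c1, c2, pivot):
    """Resolve clauses c1, c2 (sets of signed ints, -v = negation of v)
    on the pivot variable, which must occur positively in c1 and
    negatively in c2."""
    assert pivot in c1 and -pivot in c2
    return (c1 - {pivot}) | (c2 - {-pivot})

def is_merge(c1, c2, pivot):
    """The resolvent is a merge if the premises share a literal
    besides the clashing pivot pair."""
    return len((c1 - {pivot}) & (c2 - {-pivot})) > 0
```

For example, resolving $x \lor a$ with $\overline{x} \lor a$ on $x$ yields the merge $a$, while resolving $x \lor a$ with $\overline{x} \lor b$ does not produce a merge.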
We assume that every clause in a derivation is annotated with the premises it is obtained from, which allows us to treat the proof as a DAG where vertices are clauses and edges point from premises to conclusions. When this DAG is a tree we call a derivation tree-like, and when it is a centipede (i.e., a maximally unbalanced tree) we call it input.
A derivation is unit if in every inference at least one of the premises is a unit clause consisting of a single literal. Since neither input nor unit resolution are complete proof systems, we write $F \vdash_{\mathrm{i}} C$ (respectively $F \vdash_{1} C$) to indicate that there exists an input (resp. unit) resolution derivation of $C$ from $F$.
A clause $C$ syntactically depends on an axiom $A$ with respect to a derivation $\pi$ if there is a path from $A$ to $C$ in the DAG representation of $\pi$. This does not imply that $A$ is required to derive $C$, since a different derivation might not use $A$.
A restriction to a set of variables $V$ is a mapping $\rho \colon V \to \{0, 1, *\}$, successively extended to literals, clauses, formulas, and refutations, simplifying where needed. We write $C{\restriction}_\rho$ as a shorthand for $\rho(C)$. It is well-known that if $\pi$ is a resolution derivation from $F$ and $\rho$ is a restriction, then $\pi{\restriction}_\rho$ is a semantic resolution derivation from $F{\restriction}_\rho$.
It is convenient to leave satisfied clauses in place in a derivation that is the result of applying a restriction to another derivation so that we can use the same indices to refer to both derivations. To do that we use the symbol $\top$ and treat it as a clause that always evaluates to true, is not supported on any set, does not depend on any clause, and cannot be syntactically resolved with any clause.
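A minimal sketch of applying a restriction to a clause under this convention (our own illustration; `TOP` plays the role of the always-true placeholder):

```python
TOP = "TOP"  # placeholder for satisfied clauses, kept in place

def restrict_clause(clause, rho):
    """Apply a partial assignment rho (dict: variable -> 0/1) to a clause
    given as a set of signed ints. A clause with a satisfied literal
    becomes TOP; falsified literals are dropped."""
    out = set()
    for lit in clause:
        var, positive = abs(lit), lit > 0
        if var not in rho:
            out.add(lit)                      # untouched literal
        elif rho[var] == (1 if positive else 0):
            return TOP                        # literal satisfied
        # else: literal falsified, drop it
    return out
```

For example, setting $x_1 = 1$ satisfies $x_1 \lor \overline{x_2} \lor x_3$, while setting $x_1 = 0$ and $x_2 = 1$ shrinks it to $x_3$.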
A semantic derivation can be turned into a syntactic derivation by ignoring unnecessary clauses. Formally, if $\pi = (C_1, \ldots, C_L)$ is a semantic resolution derivation, we define its syntactic equivalent $S(\pi) = (C'_1, \ldots, C'_L)$ as the following syntactic resolution derivation. Let $C_i$ be the conclusion of an inference and let $C_j$ and $C_k$ be the parents of $C_i$. If $C'_j \subseteq C_i$ we set $C'_i = C'_j$, analogously with $C'_k$. Otherwise we set $C'_i = \mathrm{Res}(C'_j, C'_k)$. It is not hard to see that for each $i$, either $C'_i \subseteq C_i$ or $C'_i = \top$.
2.1 CDCL
We need to define a few concepts from CDCL proofs. An in-depth treatment can be found in the Handbook of Satisfiability [BN21]. Fix a CNF $F$, also known as the clause database. A trail $\sigma$ is a sequence of tuples $(\ell_i, D_i)$ where $D_i$ is either a clause in $F$ or a special symbol representing a decision. We denote by $\sigma_i$ the assignment $\{\ell_1, \ldots, \ell_i\}$, and we denote by $\mathrm{dl}(i)$ the decision level at position $i$, that is the number of decisions up to $i$. We mark the position of the last decision in a trail by $\lambda$.
A trail is valid if for every position $i$ that is not a decision we have $D_i{\restriction}_{\sigma_{i-1}} = \ell_i$, and for every decision $i$ we have that for every clause $D$ such that $D{\restriction}_{\sigma_{i-1}}$ is unit, the literal it propagates appears in the trail before $i$. In particular, for every position $i$ that is not a decision we have $F \wedge \sigma_{i-1} \vdash_1 \ell_i$.
A clause $C$ is asserting if it is unit at the last decision in the trail, that is $|C{\restriction}_{\sigma_\lambda}| = 1$. It is 1-empowering if $C$ is implied by $F$ and can lead to new unit propagations after being added to $F$, that is if there exists a literal $\ell \in C$ such that $F \wedge \overline{C \setminus \{\ell\}} \nvdash_1 \ell$ and $F \wedge \overline{C \setminus \{\ell\}} \nvdash_1 \Box$. If a clause is not 1-empowering then we say it is absorbed by $F$.
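The 1-empowerment condition is directly testable with unit propagation. The sketch below (our own illustration; it does not check the side condition that the clause is implied by the formula) asserts the negations of all but one literal and checks whether the remaining literal is already forced:

```python
def unit_propagate(clauses, assignment):
    """Close `assignment` (a set of signed-int literals) under unit
    propagation over `clauses`; return (assignment, conflict_flag)."""
    asg = set(assignment)
    changed = True
    while changed:
        changed = False
        for c in clauses:
            if any(l in asg for l in c):
                continue                      # clause satisfied
            pending = [l for l in c if -l not in asg]
            if not pending:
                return asg, True              # clause falsified: conflict
            if len(pending) == 1:
                asg.add(pending[0])
                changed = True
    return asg, False

def is_empowering(clauses, clause):
    """C is 1-empowering w.r.t. F if for some literal l in C, asserting
    the negations of the other literals propagates neither l nor a
    conflict; otherwise C is absorbed (implication of C is assumed)."""
    for l in clause:
        asg, conflict = unit_propagate(clauses, {-m for m in clause if m != l})
        if not conflict and l not in asg:
            return True
    return False
```

For instance, $x_2$ is absorbed by $\{x_1, \overline{x_1} \lor x_2\}$, since unit propagation alone already derives it, but it is 1-empowering with respect to $\{x_1 \lor x_2, \overline{x_1} \lor x_2\}$.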
Given a clause $E$ falsified by a trail $\sigma$ of length $t$, the conflict derivation is an input derivation $(E_t, \ldots, E_0)$ where $E_t = E$, $E_{i-1} = \mathrm{Res}(E_i, D_i)$ if $\overline{\ell_i} \in E_i$, and $E_{i-1} = E_i$ otherwise. The first (i.e., with the largest index) asserting clause in the derivation is called the 1UIP. Note that $E_\lambda$ is always asserting (because $E_i$ is falsified by $\sigma_i$ for $i \geq \lambda$ and $E_\lambda$ is not falsified by $\sigma_{\lambda-1}$), therefore we can assume that the 1UIP always has index at least $\lambda$.
We call a sequence of input derivations input-structured if the last clause of each derivation can be used as an axiom in successive derivations. The last clause of each but the last derivation is called a lemma. A CDCL derivation is an input-structured sequence of conflict derivations, where learned clauses are lemmas. This definition is similar to that of Resolution Trees with Input Lemmas [BHJ08], with the difference that the sequence only needs to be ordered, without imposing any further tree-structure on the global proof.
The following lemmas highlight the practical relevance of merges by relating them to 1UIP, asserting, and 1-empowering clauses.
Lemma 2.1 ([PD08, Proposition 2]).
If a clause is asserting, then it is 1-empowering. (The original result does not prove 1-consistency, but the proof is analogous.)
Lemma 2.2 ([AFT11, Lemma 8]).
If $A$ and $B$ are absorbed but $C = \mathrm{Res}(A, B)$ is 1-empowering, then $C$ is a merge. In particular, if a clause is 1-empowering, then it contains a merge in its derivation.
Lemma 2.3.
The 1UIP clause is a merge.
Proof.
Let $E_i$ be the 1UIP, obtained by resolving $E_{i+1}$ with the trail clause $D_{i+1}$. On the one hand, since every clause in the trail contains at least two literals at the decision level where it appears, $D_{i+1}$ contains two literals at the last decision level. On the other hand, since the 1UIP has index at least $\lambda$ and $E_{i+1}$ is not asserting, $E_{i+1}$ also contains two literals at the last decision level.
We accounted for 4 literals at the last decision level present in the premises of , of which 2 are not present in the conclusion because they are the pivots. In order for to contain only one literal at the last decision level, the remaining two literals must be equal. ∎
3 Proof Systems
We define our proof systems in terms of the input-structured framework. Every resolution proof can be thought of as being input-structured if we consider it as a sequence of unit-length input resolutions and every clause as a lemma; it is when we impose restrictions on which clauses are permitted as lemmas that we obtain different proof systems. The diagram in Figure 1 can help keep track of the proof systems.
Andrews’ definition of merge resolution [And68] considers tree-like proofs with the additional restriction that in every inference at least one premise is an axiom or a merge. He also observes that such derivations can be made input-structured.
Observation 3.1 ([And68]).
A tree-like merge resolution derivation can be decomposed into an input-structured sequence where all the lemmas are merges.
This observation is key when working with such derivations, as is apparent in Sections 4 and A, to the point that we use it as an alternative way to define merge resolution.
Andrews’ main result is that the merge restriction does not affect tree-like resolution.
Lemma 3.2 ([And68, Lemma 5]).
If there is a tree-like resolution derivation of $C$ of length $L$ where at most the root is a merge, then there is an input resolution derivation of some $C' \subseteq C$ of length at most $L$.
Theorem 3.3 ([And68, Theorem 1]).
If there is a tree-like resolution derivation of $C$ of length $L$, then there is a tree-like merge resolution derivation of some $C' \subseteq C$ of length at most $L$.
If we lift the tree-like restriction from the input-structured view of merge resolution proofs we obtain a proof system between tree-like and DAG-like resolution where clauses can be reused (i.e., have outdegree larger than $1$) if and only if they are merges or, in other words, lemmas in the input-structured decomposition. We call this proof system Resolution with Merge Lemmas and refer to it with the acronym RML.
Definition 3.4.
An RML derivation is an input-structured sequence of input resolution derivations where all lemmas are merges.
CDCL refutations produced by solvers that use the 1UIP learning scheme are in RML form, as a consequence of Lemma 2.3. We can also generalize RML to allow reusing clauses that contain a merge anywhere in their derivation. We call this proof system Resolution with Merge Ancestors, or RMA for short.
Definition 3.5.
An RMA derivation is an input-structured sequence of input resolution derivations where all derivations but the last contain a merge.
Note that by Lemma 3.2 it does not matter if we require the sequence of derivations of an RMA derivation to be input derivations or if we allow general trees. In fact, our lower bound results hold for a more general proof system where we only ask that every clause with outdegree larger than $1$ has an ancestor that is a merge. Such a proof system does not have a simple input structure, but can rather be thought of as a sequence of tree-like resolution derivations whose roots are merges, followed by a standard resolution derivation using the roots of the previous derivations as axioms.
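The generalized reuse condition is easy to check mechanically. The sketch below (our own illustration, with an assumed encoding of proof DAGs as a dict from clause id to a (clause, premises) pair, premises being `None` for axioms) verifies that every clause used more than once has a merge among its ancestors; since the pivot occurs with opposite signs in the two premises, a step is a merge exactly when the premises share a literal.

```python
def valid_rma_reuse(proof):
    """Check that every clause with outdegree > 1 has a merge ancestor."""
    uses = {}
    for _, (_, prem) in proof.items():
        if prem:
            for p in prem:
                uses[p] = uses.get(p, 0) + 1

    def has_merge_ancestor(cid, seen=None):
        seen = set() if seen is None else seen
        if cid in seen:
            return False
        seen.add(cid)
        _, prem = proof[cid]
        if prem is None:
            return False                      # axioms are not merges
        c1, c2 = proof[prem[0]][0], proof[prem[1]][0]
        if c1 & c2:
            return True                       # shared literal: a merge step
        return any(has_merge_ancestor(p, seen) for p in prem)

    return all(has_merge_ancestor(cid) for cid, n in uses.items() if n > 1)
```

Reusing the merge $x_2 = \mathrm{Res}(x_1 \lor x_2, \overline{x_1} \lor x_2)$ is allowed, whereas reusing a plain axiom twice violates the condition.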
To make the connection back to CDCL, we can define a proof system called Resolution with Empowering Lemmas that captures CDCL refutations produced by solvers that use any asserting learning scheme or 1-empowering learning scheme.
Definition 3.6.
Let $E_1, \ldots, E_k$ be the lemmas of an input-structured sequence of input derivations. The sequence is a Resolution with Empowering Lemmas (REL) derivation of a formula $F$ if $E_i$ is 1-empowering with respect to $F \wedge E_1 \wedge \cdots \wedge E_{i-1}$ for all $i \in [k]$.
Observation 3.7.
A REL derivation is a RMA derivation.
It might seem more natural to work with the REL proof system rather than its merge-based counterparts, since REL is defined exactly through the 1-empowering property. However, while the merge property is easy to check because it is local to the derivation at hand, we can only determine whether a clause is 1-empowering by looking at the full history of the derivation, in particular at what the previous lemmas are. This makes REL too cumbersome to analyse. Furthermore, CDCL refutations produced by solvers that apply a clause minimization scheme on top of an asserting clause might not be in REL form, but they are still in RMA form.
A further property of input derivations produced by a CDCL solver is that once a variable is resolved, it does not appear later in the derivation.
Definition 3.8.
A resolution derivation is strongly regular if for every resolution step, the pivot variable is not part of the support of any later clause in the derivation. A sequence of derivations is locally regular if every derivation in the sequence is strongly regular. An LRML derivation (resp. LRMA derivation) is a locally regular RML (resp. RMA) derivation.
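A checker for strong regularity, under an assumed step encoding of our own devising (each step is `('axiom', clause)` or `('res', clause, pivot)`, clauses as sets of signed integers):

```python
def is_strongly_regular(derivation):
    """Return True iff no pivot variable reappears in the support of any
    clause after the resolution step that eliminated it."""
    used_pivots = set()
    for step in derivation:
        clause = step[1]
        if used_pivots & {abs(l) for l in clause}:
            return False                  # an eliminated pivot reappears
        if step[0] == 'res':
            used_pivots.add(step[2])      # record this step's pivot
    return True
```

This matches the intuition that an input derivation in a CDCL conflict analysis never reintroduces a resolved variable.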
Finally we can consider derivations whose lemmas are both 1-empowering and merges and that are locally regular. These still include 1UIP proofs.
Definition 3.9.
An LREML derivation is a derivation that is both LRML and REL.
It follows from the simulation of resolution by CDCL [PD11, AFT11] that all (DAG-like) proof systems we defined polynomially simulate standard resolution. In Section 4 we make this simulation more precise and prove that the simulation overhead can be made linear, and in Section 5 that the simulation is optimal because there exist formulas that have resolution refutations of linear length but require RMA refutations of quadratic length.
4 Simulation
As an auxiliary tool to simulate resolution in RML we define the input-resolution closure of a set of clauses $F$, denoted $\mathrm{cl}(F)$, as the set of clauses derivable from $F$ via input resolution plus weakening. It is well-known that, since input resolution derivations can be assumed to be strongly regular without loss of generality, we can also assume them to be at most linear in the number of variables.
Observation 4.1.
If $F$ is a CNF formula over $n$ variables and $C \in \mathrm{cl}(F)$, then there is a strongly regular input resolution derivation of some $C' \subseteq C$ from $F$ of length $O(n)$.
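Membership in the input-resolution closure can be decided cheaply thanks to the classical equivalence between input resolution and unit resolution: some subclause of $C$ has an input resolution derivation from $F$ exactly when $F$ together with the negated literals of $C$ has a unit refutation, which plain unit propagation detects. A sketch of this test (our own illustration):

```python
def unit_propagate(clauses, assignment):
    """Close `assignment` (set of signed-int literals) under unit
    propagation; return (assignment, conflict_flag)."""
    asg = set(assignment)
    changed = True
    while changed:
        changed = False
        for c in clauses:
            if any(l in asg for l in c):
                continue
            pending = [l for l in c if -l not in asg]
            if not pending:
                return asg, True              # conflict found
            if len(pending) == 1:
                asg.add(pending[0])
                changed = True
    return asg, False

def in_input_closure(clauses, clause):
    """Test whether some subclause of `clause` is derivable from `clauses`
    by input resolution, via the unit-refutation equivalence."""
    _, conflict = unit_propagate(clauses, {-l for l in clause})
    return conflict
```

For example, $x_2 \in \mathrm{cl}(\{x_1 \lor x_2,\ \overline{x_1} \lor x_2\})$, obtained by a single input resolution step, and the test confirms it.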
Combining Theorem 3.3 with the idea that in order to simulate a resolution derivation we do not need to generate each clause, but only do enough work so that in the following steps we can pretend that we had derived it [PD11, AFT11], we can prove that merge resolution simulates resolution with at most a multiplicative linear overhead in the number of variables.
Theorem 4.2.
If $F$ is a CNF formula over $n$ variables that has a resolution refutation of length $L$, then it has a RML refutation of length $O(nL)$.
Proof.
Let $\pi = (C_1, \ldots, C_L)$ be a resolution refutation. We construct a sequence of sets $F = F_0 \subseteq F_1 \subseteq \cdots \subseteq F_L$ with the following properties.

1. $F_i \setminus F$ is the set of lemmas in a RML derivation of length $O(ni)$.

2. $C_i \in \mathrm{cl}(F_i)$.

This is enough to prove the theorem: since $\Box \in \mathrm{cl}(F_L)$, we can obtain $\Box$ from $F_L$ in length $O(n)$, so the total length of the refutation is $O(nL)$.
We build the sets by induction, starting with $F_0 = F$. Assume we have built $F_{i-1}$ and let $C_i = \mathrm{Res}(A, B)$ with $A, B \in F \cup \{C_1, \ldots, C_{i-1}\}$. If $C_i \in \mathrm{cl}(F_{i-1})$ we set $F_i = F_{i-1}$ and we are done. Otherwise, by induction we have $A, B \in \mathrm{cl}(F_{i-1})$, therefore by Observation 4.1 there are input resolution derivations of $A' \subseteq A$ and $B' \subseteq B$ of length $O(n)$ each. Since neither $A' \subseteq C_i$ nor $B' \subseteq C_i$, $A'$ and $B'$ can be resolved and therefore there is a tree-like derivation of some $C' \subseteq C_i$ from $F_{i-1}$ of length $O(n)$. By Theorem 3.3 there is a tree-like merge resolution derivation of some $C'' \subseteq C_i$ from $F_{i-1}$ of length $O(n)$. By Observation 3.1 the derivation can be decomposed into a sequence of input derivations of total length $O(n)$. Let $M$ be the lemmas in that sequence and set $F_i = F_{i-1} \cup M$. We have that $C_i \in \mathrm{cl}(F_i)$, and that we can obtain $F_i$ from $F_{i-1}$ in $O(n)$ steps. Thus $F_i$ has all the required properties. ∎
We can be a bit more precise with the description of the simulation if we look at the structure of the derivation before applying Theorem 3.3. Let $A^*$ and $B^*$ be the last merges in the input derivations of $A'$ and $B'$ respectively, and let $F_i = F_{i-1} \cup \{A^*, B^*\}$.
Now consider the fragment of the input derivation of $A'$ from $A^*$ to $A'$, and analogously with $B'$. We have a tree-like derivation of $C'$ where at most the root is a merge, therefore we can apply Lemma 3.2 directly instead of Theorem 3.3 and obtain an input resolution derivation of some $C'' \subseteq C_i$ from $F_i$.
If we also make sure that the input derivations of $A'$ and $B'$ are strongly regular, we have that LRML can also simulate resolution with the same overhead as RML.
An analogous result can be obtained for LREML from the following lemma.
Lemma 4.3 ([PD11]).
If $F$ absorbs $C$ and $F \subseteq F'$, then $F'$ absorbs $C$.
Corollary 4.4.
If $F$ is a CNF formula over $n$ variables that has a resolution refutation of length $L$, then it has a LREML refutation of length $O(nL)$.
Proof.
The proof follows the general structure of Theorem 4.2, except that we use a sequence of intermediate steps in order to construct $F_i$. Our induction hypothesis is that $F_i$ can be derived from $F$ in $O(ni)$ inference steps in LREML, and that $A'$ and $B'$ can be derived from $F_i$ in $O(n)$ steps each.
The base case is trivial.
5 Separation
We prove the following separation between standard resolution and RMA.
Theorem 5.1.
There exists a family of formulas over $O(n)$ variables and $O(n)$ clauses that have resolution refutations of length $O(n)$ but every RMA refutation requires length $\Omega(n^2)$.
5.1 Formula
Let be positive integers. We have variables for and for and . For convenience we define and , which are not variables. Let , and . For each we build the following gadget:
for | (1) |
Each equality is expanded into the two clauses and , and we collectively call them . Observe that the -th gadget implies . Additionally we build the following gadget:
(2) | ||||
for | (3) | |||
(4) |
where denotes the canonical form of . Each constraint is expanded into the two clauses and , and we collectively call them . The resulting formula is called .
5.2 Upper Bound
It is not hard to see that there is a resolution refutation of of length . Indeed, we first derive the two clauses representing for each , which requires steps:
(5) |
Then we resolve each of the axioms with one of these clauses, appropriately chosen so that we obtain pairs of clauses of the form for , and resolve each pair to obtain the chain of implications in steps.
(6) |
Since we have derived a chain of implications , , …, , we can complete the refutation in more steps. Let us record our discussion.
Lemma 5.2.
has a resolution refutation of length .
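The linear-length pattern behind the upper bound, repeatedly resolving implication clauses into a chain and then closing the chain with unit clauses, can be illustrated with a toy example (ours; the concrete gadget clauses of the formula are not reproduced here). With hypothetical variables $x_1, \ldots, x_n$ encoded as integers:

```python
def resolve(c1, c2, pivot):
    assert pivot in c1 and -pivot in c2
    return (c1 - {pivot}) | (c2 - {-pivot})

def refute_chain(n):
    """Derive x1 -> xn from the clauses xi -> x(i+1), then refute it with
    the unit clauses {x1} and {-xn}. Returns the final clause and the
    number of resolution steps, which is linear in n."""
    steps = 0
    current = {-1, 2}                         # clause x1 -> x2
    for i in range(2, n):
        current = resolve(current, {-i, i + 1}, i)   # extend chain to x(i+1)
        steps += 1
    current = resolve({1}, current, 1)        # with unit x1: yields {xn}
    current = resolve(current, {-n}, n)       # with unit -xn: empty clause
    steps += 2
    return current, steps
```

Each implication clause is used once, so the whole refutation is tree-like and takes exactly $n$ steps for a chain of $n$ variables.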
Before we prove the lower bound, let us discuss informally the natural ways to refute this formula in RML, so that we understand which behaviours we need to rule out.
If we try to reproduce the previous resolution refutation, since we cannot reuse the clauses representing because they are not merges, we have to rederive them each time we need them, which means that it takes steps to derive the chain of implications . We call this approach refutation 1. This refutation has merges (over , , and ) when we produce , and (over and ) when we produce , but since we never reuse these clauses the refutation is in fact tree-like.
An alternative approach, which we call refutation 2, is to start working with the axioms instead. In this proof we clump together all of the repeated constraints of the form for every , and then resolve them out in one go. In other words, we first derive the sequence of constraints
(7) |
where can be obtained from and the pair of axioms , then resolve away the inequalities from using the axioms. However, representing any of the constraints for requires clauses, which is significantly larger than and even superpolynomial for large enough , so this refutation is not efficient either. Note that this refutation has merges (over variables) each time that we derive with .
A third and somewhat contrived way to build a refutation is to derive the pair of clauses representing using a derivation whose last step is a merge, so that they can be reused. Each of these clauses can be derived individually in steps, for a total of steps, by slightly adapting refutation 1, substituting each derivation of by a derivation of whenever so that at the end we obtain instead of the empty clause. Such a substitution clause can be obtained, e.g., by resolving with as follows
(8) |
After deriving as merges we follow the next steps of refutation 1 and complete the refutation in steps. We call this refutation 3.
Observe that the minimum length of deriving the clauses representing is only , even in RML, so if we only used the information that refutation 3 contains these clauses we would only be able to bound its length by . Therefore when we compute the hardness of deriving a clause we need to take into account not only its semantics but also how it was obtained syntactically.
5.3 Lower Bound
Before we begin proving our lower bound in earnest we make two useful observations.
Lemma 5.3.
Let be a resolution derivation that only depends on the axioms. Then does not contain any merges, and all clauses are supported on .
Proof.
We prove by induction that every clause in is of the form with . This is true for the axioms. By induction hypothesis, a generic resolution step over is of the form
(9) |
and in particular is not a merge. ∎
Lemma 5.4.
Let be a resolution derivation of a clause supported on variables that uses an axiom. Then uses at least one axiom for each .
Proof.
We prove the contrapositive and assume that there is an axiom that is used, and either both and are not used, or both and are not. In the first case the literal appears in every clause in the path from to , contradicting that is supported on variables. Analogously with literal in the second case. ∎
Our first step towards proving the lower bound is to rule out that refutations like refutation 2 can be small, and to do that we show that wide clauses allow for very little progress. This is a common theme in proof complexity, and the standard tool is to apply a random restriction to a short refutation in order to obtain a narrow refutation. However, RMA is not closed under restrictions, as we prove later in Corollary 5.12, and because of this we need to argue separately about which merges are preserved.
Let us define the class of restrictions that we use and which need to respect the structure of the formula. A restriction is an autarky [MS85] with respect to a set of clauses if it satisfies every clause that it touches; in other words for every clause either or . A restriction is -respecting if it is an autarky with respect to axioms, we have up to variable renaming, and every variable is mapped to an variable. Our definition of a narrow clause is also tailored to the formula at hand, and counts the number of different -blocks that a clause mentions. Formally .
Lemma 5.5.
Let be a resolution refutation of of length . There exists an -respecting restriction such that every clause in has .
Proof.
We use the probabilistic method. Consider the following distribution over : each coordinate is chosen independently with , . Given a random variable sampled according to this distribution, we derive a random restriction as follows: , if , and otherwise (where ).
Observe that up to variable renaming, and by a Chernoff bound we have .
We also have, for every clause with , that
(10) |
Therefore by a union bound the probability that or that any clause has is bounded away from and we conclude that there exists a restriction that satisfies the conclusion of the lemma. ∎
Note that the restricted proof is a resolution refutation of the restricted formula, but not necessarily a RMA refutation, therefore we lose control over which clauses may be reused. (Recall that $S(\pi)$ denotes the syntactic equivalent of a derivation $\pi$.) Nevertheless, we can identify a fragment of it where we still have enough information.
Lemma 5.6.
There exists an integer such that is a resolution derivation of a clause supported on variables that depends on an axiom and where no clause supported on variables is reused.
Proof.
Let be the first clause that depends on an axiom and such that is supported on , which exists because is one such clause.
By definition of , we have that every ancestor of that is supported on variables corresponds to a clause in that only depends on axioms, hence by Lemma 5.3 is not a merge. By definition of RMA is not reused, and by construction of neither is .
It remains to prove that depends on an axiom. Since depends on an axiom, at least one of its predecessors and also does, say . By definition of , is not supported on , and hence by Lemma 5.3 either depends on an axiom or . Analogously, if also depends on an axiom then so does (or it is ) and we are done. Otherwise is of the form and is either satisfied by or left untouched. In both cases we have that (trivially in the first case and because contains the pivot while does not in the second), hence depends on . ∎
Note that may be semantically implied by the axioms, and have a short derivation as in refutation 5.2, therefore we are forced to use syntactic arguments to argue that deriving using an axiom takes many resolution steps.
The next step is to break into (possibly intersecting) parts, each corresponding roughly to the part of that uses axioms with variables in an interval of length (by Lemma 5.4 we can assume that contains axioms from every interval). To do this we use the following family of restrictions defined for :
(11)
Let and note that .
Clauses in with many variables could be tricky to classify, but intuitively it should be enough to look at the smallest positive literal and the largest negative literal, since these are the hardest to eliminate. Therefore we define to be the following operation on a clause: literals over variables are left untouched, all positive literals but the smallest are removed, and all negative literals but the largest are removed. Formally,
(12)
where (resp. ) is omitted if (resp. ) is empty.
We need the following property of .
Lemma 5.7.
If and then is supported over variables.
Proof.
The hypothesis that implies that the smallest positive literal in is either not larger than or larger than , but the hypothesis that rules out the first case. Therefore all positive literals are falsified by . Analogously the largest negative literal is not larger than and all negative literals are also falsified. ∎
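The trimming operation defined above can be made concrete. The following is a minimal Python sketch under stated assumptions: the symbols elided in this rendering are modeled by DIMACS-style integer literals, `in_range` is a hypothetical predicate marking the variable interval the operation acts on, and "smallest"/"largest" are interpreted by variable index.

```python
def trim(clause, in_range):
    """Sketch of the clause-trimming operation: literals whose variables
    fall outside `in_range` are kept as-is; among the in-range positive
    literals only the smallest survives, and among the in-range negative
    literals only the one with the largest variable index survives."""
    kept = {l for l in clause if not in_range(abs(l))}
    pos = sorted(l for l in clause if l > 0 and in_range(l))
    neg = sorted((l for l in clause if l < 0 and in_range(abs(l))), key=abs)
    if pos:
        kept.add(pos[0])   # smallest in-range positive literal
    if neg:
        kept.add(neg[-1])  # in-range negative literal with largest variable index
    return kept

# For example, with the interval "variables below 10",
# trim({1, 3, -2, -5, 100}, lambda v: v < 10) keeps 1, -5,
# and the out-of-range literal 100.
```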
We define each part to consist of all clauses such that is
1. an axiom not satisfied by ; or
2. the conclusion of an inference with pivot in ; or
3. the conclusion of an inference with pivot in that depends on an axiom if contains a variable in ; or
4. the conclusion of an inference with pivot in that does not depend on axioms if the only immediate successor of is in .
This is the point in the proof where we use crucially that the original derivation is in RMA form: because clauses that do not depend on axioms are not merges, they have only one successor and the definition is well-formed.
Ideally we would like to argue that parts are pairwise disjoint. This is not quite true, but nevertheless they do not overlap too much.
Lemma 5.8.
Let and be as discussed above. Then .
Proof.
Axioms may appear in at most two different , and clauses obtained after resolving with an pivot in only one. The only other clauses that depend on an axiom and may appear in different are obtained after resolving with a pivot, but since only contains two variables, such a clause may appear in at most two different . Finally, clauses that do not depend on an axiom appear in the same as one clause of the previous types, and therefore in at most two different parts. ∎
To conclude the proof we need to argue that each is large. The intuitive reason is that must use one axiom for each , which introduces a pair of variables from each block, but since no clause contains more than such variables, we need to use enough axioms to remove the aforementioned variables. Formally the claim follows from these two lemmas.
Lemma 5.9.
For each there exists an integer such that is a resolution derivation of a clause supported on variables that depends on an axiom.
Proof.
Let be the first clause in that depends on an axiom and such that is supported on variables. We prove that is well-defined, that is a valid semantic resolution derivation, and that depends on an axiom.
Our induction hypothesis is that for (or any if does not exist), if the clause depends on an axiom and is not satisfied by , then there exists a clause with that implies modulo , that is , and depends on an axiom (over ).
If the induction hypothesis holds then is well-defined: since is not satisfied by and depends on an axiom there exists a clause that depends on an axiom and such that , which is supported on variables.
The base case is when is a non-satisfied axiom, where we can take . For the inductive case let and be the premises of in . If exactly one of the premises, say , is non-satisfied and, furthermore, depends on an axiom, then by the induction hypothesis we can take . Otherwise we need to consider a few subcases. If the pivot is an variable then both premises depend on an axiom (by Lemma 5.3), hence neither premise is satisfied. It follows that the pivot is unassigned by , and therefore we can take .
If the pivot is a variable then, because only assigns variables, neither premise is satisfied. We have two subcases: if exactly one premise depends on an axiom, say , then is present in , and by construction of the other premise is present in if and only if the conclusion is. If both premises depend on an axiom then both and are present in .
Therefore in the two latter subcases it is enough to prove that , since then we can take and we have that follows from a valid semantic resolution step. Indeed by Lemma 5.7 is a clause supported on variables, which by definition of implies that . However, since the pivot is a variable, is also supported on variables and, together with the fact that depends on an axiom, this contradicts that is the first such clause.
This finishes the first induction argument and proves that is a valid semantic derivation; it remains to prove that depends on an axiom over . We prove by a second induction argument that for every clause , if depends on an axiom then so does . The base case, when is an axiom, holds.
For the inductive case fix , , and , and let and be the premises of in . When both and depend on an axiom, then by hypothesis so do and and we are done. We only need to argue the case when one premise depends on an axiom and the other premise does not. In that case, because only affects variables, all the axioms used in the derivation of are left untouched by , therefore we have that , which contains the pivot used to derive and therefore does not imply . By construction of , depends on . ∎
Lemma 5.10.
Let be a resolution derivation from of a clause supported on variables that depends on an axiom. Then .
Proof.
By Lemma 5.4 we can assume that uses at least one axiom for each .
Let be the set of blocks mentioned by . We show that for each at least axioms over variables in appear in , which makes for at least axioms.
Fix and assume for the sake of contradiction that less than axioms over variables in appear in . Then there exists such that variable does not appear in . Rename variables as follows: for , and for . Then we can prove by induction, analogously to the proof of Lemma 5.3, that every clause derived from axiom is of the form where are literals supported outside . Since that includes , it contradicts our assumption that . ∎
To conclude the proof of Theorem 5.1 we simply need to put the pieces together.
Proof of Theorem 5.1.
We take as the formula family , for which a resolution refutation of length exists by Lemma 5.2.
To prove the lower bound we assume that an RMA refutation of length exists; otherwise the lower bound holds trivially. We apply the restriction given by Lemma 5.5 to and use Lemma 5.6 to obtain a resolution derivation of a clause supported on variables that uses an axiom. We then break into parts , each of size at least , as follows from Lemmas 5.9 and 5.10. Finally, by Lemma 5.8 we have . ∎
5.4 Structural Consequences
Theorem 5.1 immediately gives us two structural properties of RML and RMA. One is that proof length may decrease when introducing a weakening rule.
Corollary 5.11.
There exists a family of formulas over variables and clauses that have RML with weakening refutations of length but every RMA refutation requires length .
Proof.
Consider the formula , where is the formula given by Theorem 5.1 and is a new variable. If we weaken every clause to then we can derive in RML steps because each inference is a merge. However, if we cannot do weakening, then cannot be resolved with any clause in and the lower bound of Theorem 5.1 applies. ∎
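The mechanism behind this proof, that weakening both premises with a common fresh literal turns an inference into a merge, can be checked mechanically. The sketch below is illustrative only, not the construction from the proof: clauses are sets of nonzero integers, and the fresh variable index 99 is an arbitrary choice.

```python
def resolve(c1, c2, pivot):
    # Resolution on variable `pivot`, assumed positive in c1 and negative in c2.
    assert pivot in c1 and -pivot in c2
    return (c1 - {pivot}) | (c2 - {-pivot})

def is_merge(c1, c2, pivot):
    # An inference is a merge if the premises share a non-pivot literal.
    return bool((c1 - {pivot}) & (c2 - {-pivot}))

z = 99                                   # fresh variable used for weakening
c1, c2 = {1, 2}, {-1, 3}
assert not is_merge(c1, c2, 1)           # the original step is not a merge
w1, w2 = c1 | {z}, c2 | {z}              # weaken both premises with z
assert is_merge(w1, w2, 1)               # the shared literal z makes it a merge
assert resolve(w1, w2, 1) == {2, 3, z}
```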
The second property is that RML and RMA are not natural proof systems in the sense of [BKS04] because proof length may increase after a restriction.
Corollary 5.12.
There exists a restriction and a family of formulas over variables and clauses that have RML refutations of length but every RMA refutation of requires length .
6 Further Separations
We can separate the different flavours of merge resolution that we introduced using a few variations of where we add a constant number of redundant clauses for each . We consider these different clauses part of .
Upper bounds all follow the same pattern. We first show on a case-by-case basis how to obtain and as lemmas, and then proceed as in Section 5.2.
Towards proving lower bounds we are going to generalize the lower bound part of the proof of Theorem 5.1 to apply to these variations as well. Fortunately we only require a few local modifications.
First, we need to prove an equivalent of Lemma 5.3, which we do on a case-by-case basis.
Second, we need to show that -respecting restrictions can be extended to the new variables. For each block , since the new clauses are semantically subsumed by , there exists a way to map the new variables into and so that the result of the restriction is the same as if we had started with clauses and , which are already part of . That is, the formula that we work with after Lemma 5.6 is a copy of an unaltered formula.
The only part of the lower bound that depends on the specific subsystem of Resolution is Lemma 5.6; afterwards all the information we use is that no clause supported on variables is reused. Furthermore, the only property of the subsystem that we use in the proof of Lemma 5.6 is that Lemma 5.3 applies. Therefore, the modifications we just outlined are sufficient for the lower bound to go through.
6.1 Separation between RMA and LRMA
Proposition 6.1.
There exists a family of formulas over variables and clauses that have RMA refutations of length but every LRMA refutation requires length .
The separating formula is , where we add to clauses
(C1)
(C2)
(C3)
(C4)
for each . The new variables can be assigned as and to obtain the original formula back.
The upper bound follows from the following lemma.
Lemma 6.2.
Clauses and can be derived as lemmas from in length in RMA.
Proof.
We first resolve clause with (C2) and then with (C1) to obtain as a merge, then derive , which has a merge as an ancestor and so can be remembered. Analogously, starting from , (C3), and (C4) we can obtain as a lemma. ∎
The following observation is useful for the lower bound.
Lemma 6.3.
Let and be clauses with two pairs of opposite literals. Then and cannot appear in the same locally regular input derivation.
Proof.
Let and . Assume wlog that is the first of the two clauses and to appear in the derivation. If or are used as pivots before , then the locally regular condition prevents using as an axiom. Otherwise appears in the derivation from the time is used onwards, which also prevents using . ∎
The equivalent of Lemma 5.3 is the following.
Lemma 6.4.
Let be a LRMA derivation that only depends on axioms. Then no clause in can be reused.
Proof.
We can only obtain a merge using one of (C1) or (C3), assume wlog (C1) is the first of these to be used in the derivation. By Lemma 6.3 neither (C2) nor (C3) appear in the derivation. We can show by induction that we can only obtain clauses of the form or , never as a merge. ∎
6.2 Separation between RML/LRMA and LRML
Proposition 6.5.
There exists a family of formulas over variables and clauses that have RML and LRMA refutations of length but every LRML refutation requires length .
The separating formula is , where we add to clauses
(C1)
(C2)
(C3)
(C4)
for each . The new variables can be assigned as and to obtain the original formula back.
The upper bounds follow respectively from the following lemmas.
Lemma 6.6.
Clauses and can be derived as lemmas from in length in RML.
Proof.
We first resolve clauses , , …, , (C1) to obtain . We continue the input derivation resolving with (C2) to obtain . We then resolve with , , …, to obtain as a merge over . Analogously we can obtain . ∎
Lemma 6.7.
Clauses and can be derived as lemmas from in length in LRMA.
Proof.
We resolve clauses (C1) and (C2) to obtain , which is a merge, then derive , having a merge as its ancestor, so it can be used as a lemma. Analogously starting from (C3) and (C4) we can obtain as a lemma. ∎
The equivalent of Lemma 5.3 is the following.
Lemma 6.8.
Let be a LRML derivation that only depends on axioms. Then no clause in can be reused.
The proof idea is that the only merge we can obtain involves the or the variable. If we just resolve the two clauses over such a variable we obtain a clause we already had, so this is useless. Otherwise we are resolving one of away, which would be reintroduced at the time of resolving away, and that is not allowed by the SR condition.
Proof.
We can only obtain a merge by using one of the new clauses (C1)–(C4). If we resolve either pair of clauses over or over then we obtain a clause that was already present in the formula, so we may preprocess such a derivation away.
Otherwise consider the first step in the derivation where one of the new clauses is used as a premise; assume wlog it is (C1). That step resolves with a clause of the form and yields a clause of the form , which is not a merge. That clause can then be resolved over ( ) to obtain other clauses of the same form, none of which is a merge, but it cannot be resolved over , , or because that step would reintroduce variable . ∎
6.3 Separation between LRML and REL
Proposition 6.9.
There exists a family of formulas over variables and clauses that have LRML refutations of length but every REL refutation requires length .
The separating formula is , where we add to clauses
(C1)
(C2)
for each . If we assign we obtain a copy of which, even though it is technically not the same formula we started with, is enough for our purposes.
The upper bound follows from the following lemma.
Lemma 6.10.
Clauses and can be derived as lemmas from in length in LRML.
Proof.
We resolve (C1) with , …, to obtain , then with to obtain as a merge. Analogously starting from (C2) we can obtain as a lemma. ∎
The equivalent of Lemma 5.3 is the following.
Lemma 6.11.
Let be a REL derivation that only depends on axioms. Then no clause in can be reused.
Proof.
Observe that every derivable clause has width at least . Let be any derivable clause and any literal in . We have that is not empty. However, assigning any variable immediately propagates all variables, hence is not empowering. ∎
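The proof relies on the standard notion of an empowering clause: asserting the negations of all but one of its literals should make unit propagation derive the remaining literal, which propagation alone could not derive before. As a reference point, here is a minimal unit-propagation sketch over integer-literal clauses; it is a generic illustration of the mechanism, not the paper's formalism.

```python
def unit_propagate(clauses, assignment):
    """Extend `assignment` (a set of literals) by unit propagation over
    `clauses` (sets of nonzero integers) until fixpoint.
    Returns the extended assignment, or None on a falsified clause."""
    assignment = set(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if clause & assignment:
                continue                       # clause already satisfied
            unassigned = [l for l in clause if -l not in assignment]
            if not unassigned:
                return None                    # every literal falsified: conflict
            if len(unassigned) == 1:
                assignment.add(unassigned[0])  # unit clause: assert its literal
                changed = True
    return assignment

# E.g. from the formula x1 and (not x1 or x2), propagation derives both
# literals: unit_propagate([{1}, {-1, 2}], set()) returns {1, 2}.
```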
7 Concluding Remarks
In this paper, we address the question of the tightness of simulation of resolution proofs by CDCL solvers. Specifically, we show that RMA, among other flavours of DAG-like merge resolution, simulates standard resolution with at most a linear multiplicative overhead. However, contrary to what we see in the tree-like case, this overhead is necessary. While the proof systems we introduce help us explain one source of overhead in the simulation of resolution by CDCL, it is not clear if they capture it exactly. In other words, an interesting future direction would be to explore whether it is possible for CDCL to simulate some flavour of merge resolution with less overhead than what is required to simulate standard resolution.
Acknowledgements
The authors are grateful to Yuval Filmus and a long list of participants in the program Satisfiability: Theory, Practice, and Beyond at the Simons Institute for the Theory of Computing for numerous discussions. This work was done in part while the authors were visiting the Simons Institute for the Theory of Computing.
Appendix A Tree-like Merge Resolution
For completeness we informally sketch the proofs of Lemma 3.2 and Theorem 3.3, which can be found in full detail in [And68].
Lemma A.1 (Lemma 3.2, restated).
If there is a tree-like resolution derivation of of length where at most the root is a merge, then there is an input resolution derivation of some of length at most .
Proof (sketch).
We prove by induction on that for every axiom there exists an input derivation of that uses a subset of the axioms of where is the topmost axiom. As intermediate objects we allow clauses in this derivation to contain opposite literals; these are cleaned up later.
Let , and let and be the derivations used to infer and respectively. Assume wlog that . Since does not contain any merges there exists a unique path from to an axiom , where all clauses contain . Note that other clauses in might still contain or . We replace by in (and consequently remove all the occurrences of in the aforementioned path) and we obtain a valid derivation of . We apply the induction hypothesis to and to obtain two unit derivations and of and whose topmost leaves are and . We replace by in and obtain a unit derivation of . We stitch together and by observing that , which is the only axiom in not present in the original axioms, and obtain a unit derivation of that only uses original axioms.
Finally, and outside the inductive argument, we get rid of clauses that contain opposite literals by replacing any such clause by to obtain a semantic derivation . Its syntactic counterpart satisfies the conclusion of the lemma. ∎
Theorem A.2 (Theorem 3.3, restated).
If there is a tree-like resolution derivation of of length , then there is a merge resolution derivation of some of length at most .
Proof (sketch).
The proof is by induction on the number of merges. The base case when there are no merges follows by Lemma A.1. Otherwise let be a subtree where exactly the root is a merge. Let be the input resolution derivation of given by Lemma A.1, let be the last merge in , and let and be the fragments of from to and up to respectively. We replace by in to obtain a refutation that uses as an axiom (note that in replacing by we may have to prune away parts of ). Because has one less merge we can apply the induction hypothesis and obtain a merge resolution derivation . Finally we replace the axiom by the derivation . ∎
References
- [ABH+08] Gilles Audemard, Lucas Bordeaux, Youssef Hamadi, Saïd Jabbour, and Lakhdar Sais. A generalized framework for conflict analysis. In Hans Kleine Büning and Xishun Zhao, editors, Theory and Applications of Satisfiability Testing - SAT 2008, 11th International Conference, SAT 2008, Guangzhou, China, May 12-15, 2008, Proceedings, volume 4996 of Lecture Notes in Computer Science, pages 21–27. Springer, 2008. doi:10.1007/978-3-540-79719-7_3.
- [AFT11] Albert Atserias, Johannes Klaus Fichte, and Marc Thurley. Clause-learning algorithms with many restarts and bounded-width resolution. Journal of Artificial Intelligence Research, 40:353–373, January 2011. Preliminary version in SAT ’09.
- [And68] Peter B. Andrews. Resolution with merging. J. ACM, 15(3):367–381, 1968.
- [BB21] Olaf Beyersdorff and Benjamin Böhm. Understanding the relative strength of QBF CDCL solvers and QBF resolution. In James R. Lee, editor, 12th Innovations in Theoretical Computer Science Conference, ITCS 2021, January 6-8, 2021, Virtual Conference, volume 185 of LIPIcs, pages 12:1–12:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021. doi:10.4230/LIPIcs.ITCS.2021.12.
- [BF97] Avrim L. Blum and Merrick L. Furst. Fast planning through planning graph analysis. Artificial Intelligence, 90(1-2):281–300, 1997.
- [BHJ08] Samuel R. Buss, Jan Hoffmann, and Jan Johannsen. Resolution trees with lemmas: Resolution refinements that characterize DLL-algorithms with clause learning. Logical Methods in Computer Science, 4(4:13), December 2008.
- [BKS04] Paul Beame, Henry Kautz, and Ashish Sabharwal. Towards understanding and harnessing the potential of clause learning. Journal of Artificial Intelligence Research, 22:319–351, December 2004. Preliminary version in IJCAI ’03.
- [BN21] Sam Buss and Jakob Nordström. Proof complexity and SAT solving. In Armin Biere, Marijn Heule, Hans van Maaren, and Toby Walsh, editors, Handbook of Satisfiability, volume 336 of Frontiers in Artificial Intelligence and Applications, chapter 7, pages 233–350. IOS Press, 2nd edition, 2021. doi:10.3233/FAIA200990.
- [BS97] Roberto J. Bayardo Jr. and Robert Schrag. Using CSP look-back techniques to solve real-world SAT instances. In Proceedings of the 14th National Conference on Artificial Intelligence (AAAI ’97), pages 203–208, July 1997.
- [CGP+08] Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and Dawson R. Engler. EXE: Automatically generating inputs of death. ACM Transactions on Information and System Security (TISSEC), 12(2):1–38, 2008.
- [DHN07] Nachum Dershowitz, Ziyad Hanna, and Alexander Nadel. Towards a better understanding of the functionality of a conflict-driven SAT solver. In João Marques-Silva and Karem A. Sakallah, editors, Theory and Applications of Satisfiability Testing - SAT 2007, 10th International Conference, Lisbon, Portugal, May 28-31, 2007, Proceedings, volume 4501 of Lecture Notes in Computer Science, pages 287–293. Springer, 2007. doi:10.1007/978-3-540-72788-0_27.
- [dV94] Alvaro del Val. Tractable databases: How to make propositional unit resolution complete through compilation. In Jon Doyle, Erik Sandewall, and Pietro Torasso, editors, Proceedings of the 4th International Conference on Principles of Knowledge Representation and Reasoning (KR ’94), Bonn, Germany, May 24-27, 1994, pages 551–561. Morgan Kaufmann, 1994.
- [DVT07] Julian Dolby, Mandana Vaziri, and Frank Tip. Finding bugs efficiently with a SAT solver. In Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 195–204, 2007. doi:10.1145/1287624.1287653.
- [FB20] Nick Feng and Fahiem Bacchus. Clause size reduction with all-UIP learning. In Luca Pulina and Martina Seidl, editors, Theory and Applications of Satisfiability Testing - SAT 2020 - 23rd International Conference, Alghero, Italy, July 3-10, 2020, Proceedings, volume 12178 of Lecture Notes in Computer Science, pages 28–45. Springer, 2020. doi:10.1007/978-3-030-51825-7_3.
- [HBPV08] Philipp Hertel, Fahiem Bacchus, Toniann Pitassi, and Allen Van Gelder. Clause learning can effectively P-simulate general propositional resolution. In Proceedings of the 23rd National Conference on Artificial Intelligence (AAAI ’08), pages 283–290, July 2008.
- [LFV+20] Chunxiao Li, Noah Fleming, Marc Vinyals, Toniann Pitassi, and Vijay Ganesh. Towards a complexity-theoretic understanding of restarts in SAT solvers. In Proceedings of the 23rd International Conference on Theory and Applications of Satisfiability Testing (SAT ’20), pages 233–249. Springer, July 2020.
- [MLM21] João Marques-Silva, Inês Lynce, and Sharad Malik. Conflict-driven clause learning SAT solvers. In Armin Biere, Marijn Heule, Hans van Maaren, and Toby Walsh, editors, Handbook of Satisfiability, volume 336 of Frontiers in Artificial Intelligence and Applications, pages 133–182. IOS Press, 2nd edition, 2021. doi:10.3233/FAIA200987.
- [MMZ+01] Matthew W. Moskewicz, Conor F. Madigan, Ying Zhao, Lintao Zhang, and Sharad Malik. Chaff: Engineering an efficient SAT solver. In Proceedings of the 38th Design Automation Conference (DAC ’01), pages 530–535, June 2001.
- [MPR20] Nathan Mull, Shuo Pang, and Alexander A. Razborov. On CDCL-based proof systems with the ordered decision strategy. In Proceedings of the 23rd International Conference on Theory and Applications of Satisfiability Testing (SAT ’20), volume 12178 of Lecture Notes in Computer Science, pages 149–165. Springer, July 2020.
- [MS85] Burkhard Monien and Ewald Speckenmeyer. Solving satisfiability in less than 2^n steps. Discrete Applied Mathematics, 10(3):287–295, 1985. doi:10.1016/0166-218X(85)90050-2.
- [MS99] João P. Marques-Silva and Karem A. Sakallah. GRASP: A search algorithm for propositional satisfiability. IEEE Transactions on Computers, 48(5):506–521, May 1999. Preliminary version in ICCAD ’96.
- [PD08] Knot Pipatsrisawat and Adnan Darwiche. A new clause learning scheme for efficient unsatisfiability proofs. In Dieter Fox and Carla P. Gomes, editors, Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (AAAI 2008), pages 1481–1484. AAAI Press, 2008. URL: http://www.aaai.org/Library/AAAI/2008/aaai08-243.php.
- [PD11] Knot Pipatsrisawat and Adnan Darwiche. On the power of clause-learning SAT solvers as resolution engines. Artificial Intelligence, 175(2):512–525, February 2011. Preliminary version in CP ’09.
- [Rya04] Lawrence Ryan. Efficient algorithms for clause-learning SAT solvers. Master’s thesis, Simon Fraser University, 2004.
- [SB09] Niklas Sörensson and Armin Biere. Minimizing learned clauses. In Proceedings of the 12th International Conference on Theory and Applications of Satisfiability Testing (SAT ’09), volume 5584 of Lecture Notes in Computer Science, pages 237–243. Springer, July 2009.
- [Van05] Allen Van Gelder. Pool resolution and its relation to regular resolution and DPLL with clause learning. In Proceedings of the 12th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning (LPAR ’05), volume 3835 of Lecture Notes in Computer Science, pages 580–594. Springer, 2005.
- [Vin20] Marc Vinyals. Hard examples for common variable decision heuristics. In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI ’20), pages 1652–1659, February 2020.
- [XA05] Yichen Xie and Alexander Aiken. Saturn: A SAT-based tool for bug detection. In Proceedings of the 17th International Conference on Computer Aided Verification (CAV 2005), pages 139–143, 2005. doi:10.1007/11513988_13.
- [ZMMM01] Lintao Zhang, Conor F. Madigan, Matthew W. Moskewicz, and Sharad Malik. Efficient conflict driven learning in a Boolean satisfiability solver. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD ’01), pages 279–285, November 2001.
- [ZMW+18] Edward Zulkoski, Ruben Martins, Christoph M. Wintersteiger, Jia Hui Liang, Krzysztof Czarnecki, and Vijay Ganesh. The effect of structural measures and merges on SAT solver performance. In John N. Hooker, editor, Principles and Practice of Constraint Programming - 24th International Conference, CP 2018, Lille, France, August 27-31, 2018, Proceedings, volume 11008 of Lecture Notes in Computer Science, pages 436–452. Springer, 2018. doi:10.1007/978-3-319-98334-9_29.