An Efficient Algorithm for Solving the 2-MAXSAT Problem
Abstract
In the MAXSAT problem, we are given a set of variables and a collection of clauses over these variables, and we seek a truth assignment that maximizes the number of satisfied clauses. This problem is NP-complete even for its restricted version, the 2-MAXSAT problem, in which every clause contains at most 2 literals. In this paper, we discuss an efficient algorithm to solve this problem. Its worst-case time complexity is bounded by O(). This shows that the 2-MAXSAT problem can be solved in polynomial time. Thus, the paper in fact provides a proof of P = NP.
Index Terms:
satisfiability problem, maximum satisfiability problem, NP-hard, NP-complete, conjunctive normal form, disjunctive normal form.
I Introduction
The satisfiability problem is perhaps one of the most well-studied problems arising in many areas of discrete optimization, such as artificial intelligence, mathematical logic, and combinatorial optimization, to name just a few. Given a set of Boolean (true/false) variables and a collection of clauses over these variables, that is, a logic formula in CNF (Conjunctive Normal Form), the satisfiability problem is to determine whether there is a truth assignment that satisfies all the clauses [4]. The problem is NP-complete even when every clause has at most three literals [7]. The maximum satisfiability (MAXSAT) problem is an optimization version of satisfiability that seeks a truth assignment maximizing the number of satisfied clauses [10]. This problem is also NP-complete even for its restricted version, the so-called 2-MAXSAT problem, in which every clause has at most two literals [8]. Its applications can be found in an extensive bibliography [5, 8, 13, 16, 17, 18, 19, 21].
Over the past several decades, a lot of research on MAXSAT has been conducted. Almost all of it concerns approximation methods [10, 12, 20, 22, 1, 6], such as the (1-1/e)-approximation and the 3/4-approximation [22], as well as a method based on integer linear programming [11]. The only algorithms for exact solutions are discussed in [24, 23]. The worst-case time complexity of [24] is bounded by O(2m), where is the maximum number of the occurrences of any variable in the clauses of , while the worst-case time complexity of [23] is bounded by maxO(), O*(). Both algorithms use the traditional branch-and-bound method for solving the satisfiability problem, which searches for a solution by letting a variable (or a literal) be 1 or 0. As shown in [9], any algorithm based on branch-and-bound runs in O*() time with 2.
In this paper, we discuss a polynomial time algorithm to solve the 2-MAXSAT problem. Its worst-case time complexity is bounded by O(), where and are the number of clauses and the number of variables in , respectively. Thus, our algorithm is in fact a proof of P = NP.
The main idea behind our algorithm can be summarized as follows.
1. Given a collection of clauses over a set of variables with each containing at most 2 literals. Construct a formula over another set of variables , but in DNF (Disjunctive Normal Form), containing 2 conjunctions with each of them having at most 2 literals such that there is a truth assignment for that satisfies at least * clauses in if and only if there is a truth assignment for that satisfies at least * conjunctions in .
2. For each in ( 1, …, 2), construct a graph, called a *-graph to represent all those truth assignments of variables such that under evaluates to true.
3. Organize the *-graphs for all ’s into a trie-like graph . Searching bottom up, we can find a maximum subset of satisfied conjunctions in polynomial time.
The rest of this paper is organized as follows. First, in Section II, we restate the definition of the 2-MAXSAT problem and show how to reduce it to a problem that seeks a truth assignment to maximize the number of satisfied conjunctions in a formula in DNF. Then, we discuss a basic algorithm in Section III. Next, in Section IV, we discuss how to improve the basic algorithm. Section V is devoted to the analysis of the time complexity of the improved algorithm. Finally, a short conclusion is set forth in the last section.
II 2-MAXSAT Problem
We will deal solely with Boolean variables (that is, those which are either true or false), which we will denote by , , etc. A literal is defined as either a variable or the negation of a variable (e.g., , are literals). A literal is true if the variable is false. A clause is defined as the OR of some literals, written as ( …. ) for some , where each (1 ) is a literal, as illustrated in . We say that a Boolean formula is in conjunctive normal form (CNF) if it is presented as an AND of clauses: … ( 1). For example, ( ) ( ) is in CNF. In addition, a disjunctive normal form (DNF) is an OR of conjunctions: … ( 1). For instance, ( ) ( ) is in DNF.
Finally, the MAXSAT problem is to find an assignment to the variables of a Boolean formula in CNF such that the maximum number of clauses are set to true, or are satisfied. Formally:
2-MAXSAT
• Instance: A finite set of variables, a Boolean formula = … in CNF over such that each has 0 < 2 literals ( = 1, …, ), and a positive integer * .
• Question: Is there a truth assignment for that satisfies at least * clauses?
According to [8], the 2-MAXSAT problem is NP-complete.
To find a truth assignment such that the number of clauses set to is maximized under , we can try all the possible assignments, and count the satisfied clauses as discussed in [18], by which bounds are set up to cut short branches. We may also use a heuristic method to find an approximate solution to the problem as described in [10].
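The exhaustive approach mentioned above can be sketched in a few lines. This is only an illustrative baseline, not the method of [18] and not the algorithm of this paper: it enumerates every assignment without any bounding. The encoding of a literal as a signed integer (`+i` for variable `i`, `-i` for its negation) and the function names `count_satisfied` and `brute_force_maxsat` are assumptions made for this sketch.

```python
from itertools import product

# Assumed encoding: a literal is a nonzero int, +i for variable i, -i for its negation.
def count_satisfied(clauses, assignment):
    """Count the clauses with at least one true literal under `assignment` (var -> bool)."""
    return sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def brute_force_maxsat(clauses):
    """Try all 2^m assignments and keep one that satisfies the most clauses."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    best_count, best_assignment = -1, None
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        satisfied = count_satisfied(clauses, assignment)
        if satisfied > best_count:
            best_count, best_assignment = satisfied, assignment
    return best_count, best_assignment
```

Such a baseline takes exponential time in the number of variables, but it is convenient for checking the output of any other method on small instances.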
In this paper, we propose a quite different method, by which for = … , we will consider another formula in DNF constructed as follows.
Let = be a clause in , where and denote either variables in or their negations. For , define a variable . and a pair of conjunctions: , , where
= ,
= .
Let = … . Then, given an instance of the 2-MAXSAT problem defined over a variable set and a collection of clauses, we can construct a logic formula in DNF over the set in polynomial time, where = , …, . has = 2 conjunctions.
Concerning the relationship of and , we have the following proposition.
Proposition 1.
Let and be a formula in CNF and a formula in DNF defined above, respectively. No less than * clauses in can be satisfied by a truth assignment for if and only if no less than * conjunctions in can be satisfied by some truth assignment for .
Proof.
Consider every pair of conjunctions in : = and = ( 1, …, ). Clearly, under any truth assignment for the variables in , at most one of and can be satisfied. If = true, we have = and = false. If = false, we have = and = false.
"" Suppose there exists a truth assignment for that satisfies * clauses in . Without loss of generality, assume that the clauses are , , …, .
Then, similar to Theorem 1 of [13], we can find a truth assignment for , satisfying the following condition:
For each = ( = 1, …, ), if is true and is false under , (1) set both and to true for . If is false and is true under , (2) set to true, but to false for . If both and are true, do (1) or (2) arbitrarily.
Obviously, we have at least * conjunctions in satisfied under .
"" We now suppose that a truth assignment for with * conjunctions in satisfied. Again, assume that those conjunctions are , , …, , where each ( = 1, …, ) is 1 or 2.
Then, we can find a truth assignment for , satisfying the following condition:
For each ( = 1, …, ), if = 1, set to true for ; if = 2, set to true for .
Clearly, under , we have at least * clauses in satisfied.
The above discussion shows that the proposition holds. ∎
Proposition 1 demonstrates that the 2-MAXSAT problem can be transformed, in polynomial time, to a problem to find a maximum number of conjunctions in a logic formula in DNF.
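The transformation can be checked by brute force on small inputs. The sketch below assumes, following the case analysis in the proof of Proposition 1, that a clause with literals (a, b) is mapped to the pair of conjunctions (a AND x) and (b AND NOT x), where x is a fresh variable for that clause. The signed-integer literal encoding, the numbering of the fresh variables, the duplication of the literal in a one-literal clause, and the names `cnf_to_dnf` and `max_satisfied` are all assumptions of this sketch.

```python
from itertools import product

def cnf_to_dnf(clauses, num_vars):
    """Map clause i = (a, b) to the conjunctions (a AND x_i) and (b AND NOT x_i).

    x_i is a fresh variable numbered num_vars + i + 1 (an assumed numbering).
    A one-literal clause (a,) is treated as (a, a), also an assumption.
    """
    conjunctions = []
    for i, clause in enumerate(clauses):
        a, b = clause if len(clause) == 2 else (clause[0], clause[0])
        x = num_vars + i + 1
        conjunctions.append((a, x))
        conjunctions.append((b, -x))
    return conjunctions

def max_satisfied(groups, num_vars, needs_all):
    """Brute-force maximum number of satisfied groups over all assignments.

    needs_all=False reads each group as a clause (OR of literals);
    needs_all=True reads each group as a conjunction (AND of literals).
    """
    combine = all if needs_all else any
    best = 0
    for values in product([False, True], repeat=num_vars):
        truth = lambda lit: values[abs(lit) - 1] == (lit > 0)
        best = max(best, sum(combine(truth(l) for l in g) for g in groups))
    return best
```

For instance, with `clauses = [(1, 2), (-1, 2), (1, -2), (-1, -2)]`, calling `max_satisfied(clauses, 2, False)` and `max_satisfied(cnf_to_dnf(clauses, 2), 6, True)` returns the same value, as the proposition predicts.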
As an example, consider the following logic formula in CNF:
(1) |
Under the truth assignment = = 1, = 1, = 1, evaluates to true, i.e., = 1 for = 1, 2, 3. Thus, * = 3.
For , we will generate another formula , but in DNF, according to the above discussion:
(2) |
According to Proposition 1, should also have at least * = 3 conjunctions which evaluate to true under some truth assignment. Conversely, if has at least 3 satisfied conjunctions under a truth assignment, then should have at least three clauses satisfied by some truth assignment, too. In fact, it can be seen that under the truth assignment = = 1, = 1, = 1, = 1, = 1, = 1, has three satisfied conjunctions: , , and , from which the three satisfied clauses in can be immediately determined.
In the following, we will discuss a polynomial time algorithm to find a maximum set of satisfied conjunctions in any logic formula in DNF, not only restricted to the case that each conjunction contains up to 2 conjuncts.
III Algorithm description
In this section, we discuss our algorithm. First, we present the main idea in Section III-A. Then, in Section III-B, a recursive algorithm for solving the problem is described in great detail. The running time of the algorithm will be analyzed in the next section.
III-A Main idea
To develop an efficient algorithm to find a truth assignment that maximizes the number of satisfied conjunctions in formula = …, , where each ( = 1, …, ) is a conjunction of variables ( ), we need to represent each as a sequence of variables (referred to as a variable sequence). For this purpose, we introduce a new notation:
(, *) = = true,
which will be inserted into to represent any missing variable (i.e., , but not appearing in ). Obviously, the truth value of each remains unchanged.
In this way, the above can be rewritten as a new formula in DNF as follows:
(3) |
Doing this enables us to represent each as a variable sequence, but with all the negative literals being removed. This is because if the variable in a negative literal is set to true, the corresponding conjunction must be false, and our goal is to establish a graph in which each node represents a variable and each path corresponds to a truth assignment satisfying (by which any variable on is set true while all those variables not on are set false). Obviously, in such a graph, any variable appearing in a negative literal should not be involved since any path through such a variable corresponds to a truth assignment not satisfying .
See Table I for illustration.
conjunction | variable sequences | sorted variable sequences |
.(, *).(, *)...(, *) | (, *).(, *)....(, *). | |
(, *)...(, *).(, *) | ..(, *).(, *).(, *). | |
(, *)..(, *).(, *)..(, *) | .(, *).(, *).(, *)..(, *). | |
(, *).(, *).(, *).(, *) | (, *).(, *).(, *).(, *). | |
(, *).(, *)..(, *).(, *). | (, *)..(, *).(, *).(, *).. | |
(, *).(, *).(, *).(, *) | (, *).(, *).(, *).(, *). |
First, we pay attention to the variable sequence for (the second sequence in the second column of Table I), in which the negative literal (in ) is eliminated. In the same way, you can check all the other variable sequences.
Now it is easy for us to compute the appearance frequencies of different variables in the variable sequences, by which each (, *) is counted as a single appearance of while any negative literals are not considered, as illustrated in Table II, in which we show the appearance frequencies of all the variables in the above .
variables | ||||||
appearance frequencies | 5/6 | 6/6 | 5/6 | 5/6 | 5/6 | 5/6 |
According to the variable appearance frequencies, we will impose a global ordering over all variables in such that the most frequent variables appear first, but with ties broken arbitrarily. For instance, for the shown above, we can specify a global ordering like this: . Here, is most frequent and then appears first. The other variables have the same frequency. So, we simply impose a fixed order on them: .
Following this global ordering, each conjunction in can be represented as a sorted variable sequence as illustrated in the third column of Table I, where the variables in a sequence are ordered in terms of their appearance frequencies such that more frequent variables appear before less frequent ones. In addition, a start symbol and an end symbol are used as sentinels for technical convenience. In fact, any global ordering of variables works well (i.e., you can specify any global ordering of variables), based on which a graph representation of assignments can be established. However, ordering variables according to their appearance frequencies can greatly improve the efficiency when searching a graph constructed over all the variable sequences for conjunctions in to find solutions since more variables from different conjunctions can be merged together.
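The construction of variable sequences, the frequency count, and the sorting step can be illustrated as follows. This is a sketch under stated assumptions: literals are signed integers, an entry `(v, True)` plays the role of the optional pair written with a `*` in the text, `(v, False)` marks a variable from a positive literal, and ties in frequency are broken by variable number. The names `variable_sequence`, `global_order`, and `sort_sequence` are hypothetical.

```python
from collections import Counter

def variable_sequence(conjunction, variables):
    """Return entries (variable, optional) for one conjunction of signed-int literals."""
    seq = []
    for v in variables:
        if v in conjunction:
            seq.append((v, False))   # positive literal: the variable must be true
        elif -v not in conjunction:
            seq.append((v, True))    # missing variable: either value is acceptable
        # a negated variable is dropped from the sequence; it must stay false
    return seq

def global_order(sequences):
    """Order variables by appearance frequency, most frequent first."""
    freq = Counter(v for seq in sequences for v, _ in seq)
    # Ties are broken by variable number, which is an arbitrary choice.
    return sorted(freq, key=lambda v: (-freq[v], v)), freq

def sort_sequence(seq, order):
    """Reorder one variable sequence according to the global ordering."""
    rank = {v: i for i, v in enumerate(order)}
    return sorted(seq, key=lambda entry: rank[entry[0]])
```

Note that, as in the text, an optional entry counts as one appearance of its variable, while a negated occurrence is not counted at all.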
Later on, by a variable sequence, we always mean a sorted variable sequence. Also, we will use and the variable sequence for interchangeably without causing any confusion.
In addition, for our algorithm, we need to introduce a graph structure to represent all the truth assignments for each ( = 1, …, ) (called a *-graph), under which evaluates to true. In the following, however, we first define a simple concept of -graphs for ease of explanation.
Definition 1. (-graph) Let … be a variable sequence representing a in as described above (with , , and each with 1, …, is a variable or a pair of the form (, *), where is a variable). A -graph over is a directed graph, in which there is a node for each ( , …, ); and an edge for (, ) for each , , …, . In addition, for each with , …, , if it is a pair of the form (, *), an extra edge connecting to is added.
In Fig. 1(a), we show such a -graph for = = (, *).(, *)...(, *).(, *).. Beside a main path going through all the variables in , there are four off-path edges (edges not on the main path), referred to as spans attached to the main path, corresponding to (, *), (, *), (, *), and (, *), respectively. Each span is represented by the subpath covered by it. For example, we will use the subpath , , (subpath going three nodes: , , ) to stand for the span connecting and ; , , for the span connecting and ; , , for the span connecting and , and , , for the span connecting and . By using spans, the meaning of ‘*’s (it is either 0 or 1) is appropriately represented since along a span we can bypass the corresponding variable (then its value is set to 0) while along an edge on the main path we go through the corresponding variable (then its value is set to 1).
In fact, what we want is to represent, in an efficient way, all those truth assignments for each ( = 1, …, ), under which evaluates to true. However, -graphs fail to do so since when we go through from a node to another node through a span, must be selected. If represents a (, *) for some variable name , the meaning of this ‘*’ is not properly rendered. It is because (, *) indicates that is optional, but going through a span from to (, *) makes always selected. So, the notation (, *), which is used to indicate that is optional, is not correctly implemented.
For this reason, we introduce another concept, *-graphs, described as below.
Let = , …, and = , …, be two spans attached onto a same path. We say, and are overlapped, if = for some 1, …, - 1, or if = for some 1, …, - 1. For example, in Fig. 1(a), , , and , , are overlapped. , and , , are also overlapped.
Here, we notice that if we had one more span, , , , for example, it would be connected to , , , but not overlapped with , , . Being aware of this difference is important since the overlapped spans imply the consecutive ‘*’s, just like , , and , , , which correspond to two consecutive ‘*’s: (, *) and (, *). Therefore, the overlapped spans exhibit some kind of transitivity. That is, if and are two overlapped spans, their union must be a new, but bigger span. Applying this operation to all the spans over a -path, we will get a ‘transitive closure’ of overlapped spans.
Let be the set of all spans over the main path for a certain conjunction. The transitive closure of , denoted as *, is another set of spans * = , , …, for some , which contains the whole and in which each satisfies one of the following two conditions:
1. , or
2. There exist , ( ) such that and are overlapped and = .
Based on the above discussion, we give the following definition.
Definition 2. (*-graph) Let be a -graph. Let be its main path and be the set of all spans over . Denote by * the ‘transitive closure’ of . Then, the *-graph with respect to is the union of and *, denoted as * *.
As another example, consider = ..(, *).(, *).(, *).. Its -graph is shown in Fig. 1(c) and its *-graph in Fig. 1(d), in which we notice that we have span , , , (representing two consecutive ‘*’s) due to two overlapped spans: , , and , , . Further, we have span , , , , (representing three consecutive ‘*’s) due to , , , and , , . In the same way, we can check all the other spans in Fig. 1(d).
The purpose of the *-graph for a certain conjunction is to represent all the truth assignments, under each of which evaluates to true. Specifically, in * each root-to-leaf path corresponds to a truth assignment, by which each variable on is set to true while any other variables are set false.
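A small sketch may help to see why the closure of overlapped spans is needed. It assumes a sorted variable sequence given as a list of `(variable, optional)` entries, with node 0 as the start sentinel and the last node as the end sentinel; the names `star_graph` and `assignments` are hypothetical. A span is added for every run of consecutive optional entries, which is how the union of overlapped spans is read here.

```python
def star_graph(seq):
    """Build the edges of a *-graph for one sequence.

    seq is a list of (variable, optional) entries. Nodes are indexed 0..len(seq)+1,
    where 0 is the start sentinel and len(seq)+1 is the end sentinel.
    """
    n = len(seq)
    optional = [False] + [opt for _, opt in seq] + [False]
    edges = {i: set() for i in range(n + 2)}
    for i in range(n + 1):
        edges[i].add(i + 1)               # edge on the main path
        j = i + 1
        while j <= n and optional[j]:     # nodes i+1..j are all optional
            edges[i].add(j + 1)           # span bypassing the whole run
            j += 1
    return edges

def assignments(seq):
    """List, for each start-to-end path, the set of variables set to true."""
    edges, end = star_graph(seq), len(seq) + 1
    results = []

    def walk(node, chosen):
        if node == end:
            results.append(frozenset(chosen))
            return
        for nxt in sorted(edges[node]):
            extra = [seq[nxt - 1][0]] if nxt != end else []
            walk(nxt, chosen + extra)

    walk(0, [])
    return results
```

With two consecutive optional entries, the closure contributes the edge that bypasses both of them, so every subset of the optional variables is represented by exactly one path, while the variables from positive literals lie on every path.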
Concerning *-graphs, we have the following lemma.
Lemma 1.
Let * be a *-graph for a conjunction (represented as a variable sequence) in . Then, any path from to in * represents a truth assignment, under which evaluates to true.
Proof.
(1) Corresponding to any truth assignment , under which evaluates to true, there is definitely a path from to in *-path. First, we note that under such a truth assignment each variable in a positive literal must be set to 1, but with some ‘*’s set to 1 or 0. In particular, we may have more than one consecutive ‘*’ set to 0, which are represented by a span that is the union of the corresponding overlapped spans. Therefore, for we must have a path representing it.
(2) Each path from to represents a truth assignment, under which evaluates to true. To see this, we observe that each path consists of several edges on the main path and several spans. Especially, any such path must go through every variable in a positive literal since for each of them there is no span covering it. But each span stands for a ‘*’ or more than one successive ‘*’s. ∎
For example, in Fig. 1(b), the path: represents a truth assignment: = 1, = 0, = 0, = 1, = 1, = 0, under which evaluates to true. In Fig. 1(d), the path: represents another truth assignment: = 0, = 1, = 1, = 0, = 0, = 0, under which evaluates to true. We can examine all the paths in these two graphs and find that Lemma 1 always holds for them.
III-B Algorithm
To find a truth assignment to maximize the number of satisfied s in , we will first construct a trie-like structure over , and then search bottom-up to find answers.
Let *, *, …, * be all the *-graphs constructed for all ’s in , respectively. Let and * ( = 1, …, ) be the main path of * and the transitive closure over its spans, respectively. We will construct in two steps.
If = 0, () is, of course, empty. For = 1, () is a single node. If 1, is split into (possibly empty) subsets , , …, so that each ( = 1, …, ) contains all those sequences with the same first variable name. The tries: ), ), …, ) are constructed in the same way except that at the th step, the splitting of sets is based on the th variable name (along the global ordering of variables). They are then connected from their respective roots to a single node to create ().
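The recursive splitting described above produces an ordinary trie, which can also be built by inserting the sequences one at a time. The sketch below is a simplified illustration: it stores plain variable names, omits the start and end sentinels, and records on each node the set of conjunction identifiers whose sequences pass through the edge entering it. The name `build_trie` and the dictionary layout are assumptions of this sketch.

```python
def build_trie(sequences):
    """Build a trie over sorted variable sequences.

    sequences maps a conjunction id to its list of variable names, already
    sorted by the global ordering. Each node is a dict with:
      'children': label -> child node
      'ids':      ids of the sequences passing through the edge into this node
      'leaf_ids': ids of the sequences that end at this node
    """
    root = {"children": {}, "ids": set(sequences), "leaf_ids": set()}
    for cid, seq in sequences.items():
        node = root
        for label in seq:
            node = node["children"].setdefault(
                label, {"children": {}, "ids": set(), "leaf_ids": set()})
            node["ids"].add(cid)
        node["leaf_ids"].add(cid)   # this sequence ends here
    return root
```

Sequences sharing a prefix share the corresponding nodes, which is the clustering effect that the frequency-based ordering is meant to strengthen.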
In Fig. 2, we show the trie constructed for the variable sequences given in the third column of Table I. In such a trie, special attention should be paid to all the leaf nodes each labeled with , representing a conjunction (or a subset of conjunctions), which can be satisfied under the truth assignment represented by the corresponding main path. For example, the subset , , associated with is satisfiable under the truth assignment represented by the path from to . Such a path is also called a tree path.
The main advantage of tries is to cluster common parts of variable sequences together to avoid possible repeated checking. Then, if variable sequences are sorted according to their appearance frequencies, more variables will be clustered. More importantly, this idea can also be applied to the variable subsequences (as will be seen later), over which some dynamical tries can be recursively constructed, leading to a polynomial-time algorithm for solving the problem.
Each node in the trie stands for a variable , referred to as the label of and denoted as () = ; and each edge is referred to as a tree edge, labeled with a set of integers representing all the variable sequences going through , denoted as . For example, = 1, 2, 3, 4, 5, 6. This is because all the variable sequences given in Table I need to pass through this edge to reach their respective leaf nodes. In the same way, you can check all the other labels associated with tree edges.
In regard to the tree paths, we have the following lemma.
Lemma 2.
Let be a trie created over all the variable sequences in . Let = … be a root-to-leaf path in . Let be the subset of conjunctions associated with . Then, = … is satisfiable by the truth assignment represented by .
Finally, we will associate each node in the trie with a pair of numbers (pre, post) to speed up recognizing ancestor/descendant relationships of nodes in , where pre is the order number of when searching in preorder and post is the order number of when searching in postorder.
These two numbers can be used to characterize the ancestor/descendant relationships in as follows.
- Let and be two nodes in . Then, is a descendant of iff pre() > pre() and post() < post().
For the proof of this property of any tree, see Exercise 2.3.2-20 in [14].
For instance, by checking the label associated with against the label for in Fig. 2, we see that is an ancestor of in terms of this property. Specifically, ’s label is (3, 12) and ’s label is (10, 6), and we have 3 < 10 and 12 > 6. We also see that since the pairs associated with and do not satisfy the property, must not be an ancestor of and vice versa.
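The (pre, post) numbering and the resulting ancestor/descendant test can be sketched as follows. The tree is assumed to be given as a dictionary from a node to its ordered list of children; the names `number_nodes` and `is_descendant` are hypothetical.

```python
def number_nodes(tree, root):
    """Assign 1-based preorder and postorder numbers to every node.

    tree maps a node to the ordered list of its children (leaves may be absent).
    """
    pre, post = {}, {}
    counters = {"pre": 0, "post": 0}

    def visit(node):
        counters["pre"] += 1
        pre[node] = counters["pre"]
        for child in tree.get(node, []):
            visit(child)
        counters["post"] += 1
        post[node] = counters["post"]

    visit(root)
    return pre, post

def is_descendant(u, v, pre, post):
    """True iff u is a proper descendant of v."""
    return pre[u] > pre[v] and post[u] < post[v]
```

Once the two numbers are stored on the nodes, each ancestor/descendant query costs two integer comparisons, independent of the depth of the tree.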
In the second step, we will add all * ( = 1, …, ) to the trie to construct a trie-like graph , as illustrated in Fig. 3. This trie-like graph is constructed for all the variable sequences given in Table I, in which each span is associated with a set of numbers used to indicate what variable sequences the span belongs to. For example, the span , , (in Fig. 3) is associated with three numbers: 1, 5, 6, indicating that the span belongs to 3 conjunctions: , , and . In Fig. 3, however, the labels for all tree edges are not shown for a clear illustration.
In addition, each *-graph itself is considered to be a simple trie-like graph.
Concerning the paths in a trie-like graph, we have a lemma similar to Lemma 2.
Lemma 3.
Let be a trie-like graph created over all the variable sequences in . Let = … be a root-to-leaf path in , where some edges can be spans. Let be the subset of conjunctions associated with . Then, = … is satisfiable by the truth assignment represented by .
From Fig. 3, we can see that although the number of truth assignments for is exponential, they can be represented by a graph with polynomial numbers of nodes and edges. In fact, in a single *-graph, the number of edges is bounded by O(). Thus, a trie-like graph over *-graphs has at most O() edges.
In a next step, we will search bottom-up level by level to seek all the possible largest subsets of conjunctions which can be satisfied by a certain truth assignment.
First of all, we call each node in with more than one child a branching node. For instance, node with two children and shown in Fig. 3 is a branching node. For the same reason, and are another two branching nodes. Note that is not a branching node since it has only one child in (although it has more than one child in ).
Around the branching node, we have two very important concepts defined below.
Definition 3. (reachable subsets through spans) Let be a branching node. Let be a node on the tree path (in ) from root to (not including itself). A reachable subset of through spans consists of all those nodes with the same label in different subgraphs in [] (subgraph of rooted at ) and reachable from through a span, denoted as [], where is a set containing all the labels associated with the corresponding spans.
For [], node is also called its anchor node while any node in [] is called a reachable node of .
For instance, for node in Fig. 3, which is on the tree path from root to (a branching node), we have two RSs with respect to :
- [] = , ,
- [] = , .
We have [] due to two spans and going out of , respectively reaching and on two different *-graphs in [] with () = () = ‘’. We have [] due to another two spans going out of : and with () = () = ‘’.
Hence, is not only the anchor node of , , but also the anchor node of , .
In general, we are interested only in those RSs with |RS| 2 since any RS with = 1 only leads us to a leaf node in , and no larger subsets of conjunctions can be found. In fact, going through a span with the corresponding = 1, we cannot get any new answers. So, in the subsequent discussion, by an RS, we mean an RS with |RS| 2.
The definition of this concept for a branching node itself is a little bit different from any other node on the tree path (from root to ). Specifically, each of its RSs is defined to be a subset of nodes reachable from a span or from a tree edge. So, for we have:
- [] = , ,
- [] = , ,
respectively due to span and tree edge going out of with () = () = ‘’; and two spans and going out of with () = () = ‘’. Here, we notice that the label for the tree edge is 2 since this tree edge belongs to (see Fig. 2).
Concerning RSs, we have the following lemma, which is important for the construction of trie-like subgraphs.
Lemma 4.
Let be a branching node in . Let be an ancestor of on the tree path from root to . If both and exist for a certain label , then we have .
Proof.
Let * = * be a *-graph merged into . Assume that in * we have a span from a node to some other node . Then, for any descendant of on the subpath from the child of to the grandparent of , we must have a span from to due to the transitivity of spans. Assume that = . We can immediately see that . ∎
If , we say, is larger than .
Based on the concept of reachable subsets through spans, we are able to define another more important concept, upper boundaries, given below.
Definition 4. (upper boundaries) Let be a branching node. Let , , …, be all the nodes on the path from root to . An upper boundary (denoted as upBounds) with respect to is a largest subset of nodes , , …, ( > 1) with the following properties satisfied:
1. Each (1 ) appears in some [] (1 ), where is a label and [] > 1.
2. For any two nodes , ( ), they are not related by the ancestor/descendant relationship.
Fig. 4 gives an intuitive illustration of this concept.
As a concrete example, consider and in Fig. 3. They make up an upBound with respect to (a branching node), based on which we will construct a trie-like graph over two subgraphs, rooted at and , respectively. This can be done in a way similar to the construction of over all the initial *-graphs (which then hints at a recursive process to do the task). Here, we remark that is not included since it is not involved in any RS with respect to with 2. In fact, the truth assignment with being set to true satisfies only the conjunctions associated with leaf node . This has already been determined when the initial trie is built up in the first step.
Mainly, the following operations will be carried out when encountering a branching node .
• Calculate all RSs with respect to .
• Calculate the upBound in terms of RSs.
• Make a recursive call of the algorithm on a subgraph which is constructed over all the *-subgraphs each rooted at a node on the corresponding upBound.
See the following example for illustration.
Example 1.
When checking the branching node in the bottom-up search process, we will calculate all the reachable subsets through spans with respect to as described above: = , , = , , = , , and = , . In terms of these reachable subsets through spans, we will get the corresponding upBound , . Node (above the upBound) will not be involved in the recursive execution of the algorithm.
Concretely, when we make a recursive call of the algorithm, applied to two subgraphs: - rooted at , and - rooted at (see Fig. 5(a)), we will first construct a trie-like graph as shown in Fig. 5(b). It is in fact a single path, where stands for the merging of and , for the merging of and , and for the merging of and .
In addition, for technical convenience, we will add the corresponding branching node () to the trie as a virtual root, and a new edge as a virtual edge. See Fig. 5(c). Here, the virtual root, as well as the virtual edge, is added to keep the connection of the trie-like subgraph to the tree path from the root to this branching node in , which will greatly facilitate the trace of truth assignments for the corresponding satisfied conjunctions. Particularly, the label of a virtual edge is set to be the label for the largest , where is an anchor node of . If there are more than one largest RSs, choose any one of them. For example, the label for the virtual edge shown in Fig. 5(c) is set to be 2, 5. This is the label for (one of the two relevant RSs: and . Both of them are of the same size.) In this way, the trace of the truth assignment for a subset of satisfied conjunctions can be very easily performed.
Now, searching the path from to in Fig. 5(c) bottom-up, going through the virtual node to find the corresponding anchor node , and then searching the path from to in (see Fig. 3), we will figure out a path:
,
representing a truth assignment = 0, = 1, = 1, = 0, = 1, = 1, satisfying , . Here, we notice that the subset associated with the unique leaf node of the path is , , instead of , , , . It is because the label associated with the virtual edge is 2, 5 (which represent two spans: , covering the branching node ), by which and are filtered out from , , , .
We remember that when generating the trie over the main paths of the *-graphs created for the variable sequences shown in Table I, we have already found a (largest) subset of conjunctions , , , which can be satisfied by a truth assignment represented by the corresponding main path. This is larger than , . Therefore, , should not be kept around and this part of computation is in fact useless. To avoid this kind of futile work, we can simply perform a pre-checking: if the number of *-subgraphs, over which the recursive call of the algorithm will be invoked, is smaller than the size of a satisfiable subset of conjunctions already obtained, the recursive call of the algorithm should not be conducted.
In terms of the above discussion, we come up with a recursive algorithm shown below, in which a data structure is used to accommodate the result, represented as a set of triplets of the form:
<, , >,
where stands for a subset of conjunctions, for a truth assignment satisfying the conjunctions in , and is the size of . Initially, = .
The input of 2-MAXSAT( ) is a formula in CNF. First, we transform it to another formula in DNF (see line 1). Then, for each in , we will create its *-graph (see lines 4). Next, we will construct a trie-like graph over all ’s (see line 5). In the last step, we call SEARCH() to produce the result (see line 6).
The input of SEARCH( ) is a trie-like subgraph . First, we will check whether is a single *-graph. If it is the case, we must have found a largest subset of conjunctions associated with the leaf node, satisfiable by a certain truth assignment (see lines 1 - 4).
Otherwise, we will search bottom up to find all the branching nodes in . But before that, each subset of conjunctions associated with a leaf node will be first merged into (see line 5 - 7).
For each branching node encountered, we will check all the nodes on the tree path from root to and compute their RSs (see lines 8 - 12), based on which we then compute the corresponding upBound with respect to (see line 13). According to the upBound , a trie-like graph will be created over a set of subgraphs each rooted at a node on (see line 14). Then, will be added to as its root (see line 15). Here, we notice that = is a simplified representation of an operation, by which we add not only , but also the corresponding virtual edges to . Next, a recursive call of the algorithm is made over (see line 16). Finally, the result of the recursive call of the algorithm will be merged into the global answer (see line 17).
Here, the merge operation used in lines 3, 7, and 17 is defined as follows.
Let = , …, for some 0 with each = <, , >. We have = = … = . Let = , …, for some 0 with each = <, , >. We have = = … = . By merge(, ), we will do the following checks.
- If < , := .
- If > , remains unchanged.
- If = , := .
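The merge operation can be sketched as follows. This is a minimal illustration under two stated assumptions: the three cases compare the common size stored in the triplets of the two sets, and the equal-size case keeps the triplets of both sets (their union). The type alias `Triplet` and the representation of a truth assignment as a tuple of (variable, value) pairs are hypothetical choices made only for this example.

```python
from typing import FrozenSet, List, Tuple

# A triplet <subset of conjunctions, truth assignment, size of the subset>.
Assignment = Tuple[Tuple[str, int], ...]
Triplet = Tuple[FrozenSet[str], Assignment, int]


def merge(current: List[Triplet], other: List[Triplet]) -> List[Triplet]:
    """Merge two result sets whose triplets each share one common size."""
    if not other:
        return current
    if not current:
        return list(other)
    current_size = current[0][2]
    other_size = other[0][2]
    if current_size < other_size:
        # The other set holds strictly larger subsets: replace.
        return list(other)
    if current_size > other_size:
        # The current set is already better: keep it unchanged.
        return current
    # Equal sizes (assumption): keep all distinct triplets from both sets.
    return current + [t for t in other if t not in current]
```

Every triplet in the returned list has the same size, so the invariant stated for the two input sets is preserved by the merge.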
For simplicity, the heuristic discussed above is not incorporated into the algorithm, but the algorithm can easily be extended to include it.
Besides, to find a truth assignment satisfying a subset of conjunctions, we need to trace a path which may contain several spans, each corresponding to a recursive call of SEARCH( ).
We will represent a recursive call by a pair <, >, where is a branching node in , and is the upBound with respect to , over which a recursive call of SEARCH( ) is invoked.
Then, a chain of recursive calls can be described as follows:
- <, > <, > … <, >,
where is a branching node in = , ( = 2, …, ) is a branching node in , the trie-like subgraph created by executing <, >, and is the upBound with respect to in .
Denote by a leaf node in . Assume that is the subset of conjunctions associated with . We will trace a path consisting of the following subpaths and spans, satisfying a largest subset of .
- : treepaths from a child of to in ( = , …, 1), where is the anchor node of for = - 1, …, 0;
- : spans connecting and ( = , …, 1);
- : a treepath from the root of to .
See Fig. 6 for illustration.

In Fig. 6, we show a chain of three recursive calls:
- <, > <, > <, >.
Here, we assume that is a branching node in . By executing <, >, we will create . Further, assume that is a branching node in . Then, by executing <, >, we will generate . Next, assume that is a branching node in . We will create by executing <, >. We also assume that is a leaf node in , associated with a subset of conjunctions.
Then, the path shown in Fig. 6 consists of three treepaths from to for = 1, 2, 3, and three spans from to for = 0, 1, 2, and a tree path from the root of to .
This path represents a truth assignment satisfying , where is the intersection of all the edge labels on . ( can be changed to the intersection of all the labels associated with the virtual edges on since the intersection of all the tree edge labels is equal to or contains , as indicated by Lemma 3).
Example 2.
When applying SEARCH( ) to the *-graphs shown in Fig. 3, we will encounter three branching nodes: , , and .
- Initially, when creating , each subset of conjunctions associated with a leaf node is satisfiable by a certain truth assignment represented by the corresponding main path (from root to ). In particular, , , associated with (see Fig. 2) is a largest subset of conjunctions, which can be satisfied by a certain truth assignment: = 1, = 1, = 1, = 1, = 1, = 1.
- Checking . As shown in Example 1, by this checking, we will find a subset of conjunctions , satisfied by a truth assignment = 0, = 1, = 1, = 0, = 1, = 1, smaller than , , . Thus, this result will not be kept around.
- Checking . When we encounter this branching node, we will make a second recursive call of SEARCH( ) applied to a trie-like subgraph constructed over two subgraphs in [] (respectively rooted at and ), as shown in Fig. 7.
Figure 7: Two subgraphs in [] and an upBound.
First, with respect to , we will calculate all the relevant reachable subsets through spans for all the nodes on the tree path from root to in . Altogether we have five reachable subsets through spans. Among them, associated with (on the tree path from root to in Fig. 3), we have
- [] = , ,
due to the following two spans (see Fig. 3):
- , .
Associated with (the branching node itself), we have the following four reachable subsets through spans:
- [] = , ,
- [] = , , ,
- [] = , ,
- [] = , ,
respectively due to four groups of spans shown below (see Fig. 3):
- , ,
- , , ,
- , ,
- , .
Then, in terms of these reachable subsets through spans, we can recognize the corresponding upper boundary , , (which is illustrated as a thick line in Fig. 7). Next, we will determine over what subgraphs a trie-like graph should be constructed, over which the algorithm will be recursively executed.
In Fig. 8, we show the trie-like graph built over the three *-subgraphs (rooted respectively at , , on the upBound shown in Fig. 7), in which stands for the merging of and , and for the merging of and . Again, the branching node is involved as the virtual root of this trie-like subgraph. The virtual edge is labeled with 3, 5, 6 since it stands for a span (from to ) labeled with 3, 5, and a tree edge (from to ) labeled with 6 in Fig. 3. The virtual edge is labeled with 2 since it represents a span (from to ) labeled with 2. In addition, all the spans going out of in the original graph are kept around (see Fig. 3).

By the corresponding recursive call of SEARCH( ), this graph will be constructed and then searched bottom up, by which we will encounter the first branching node: . Then, another recursive call of the algorithm will be made, generating an upBound , , as shown in Fig. 9(a). Similar to the above discussion, we will construct the corresponding trie-like subgraph, which is just a single merged node as shown in Fig. 9(b). Adding the corresponding virtual root , and virtual edge (representing a span and a tree edge ), we will get a path as shown in Fig. 9(c), by which we will find a largest subset of conjunctions , , satisfiable by a certain truth assignment: = 0, = 1, = 1, = 1, = 1, = 0. This truth assignment can be found by tracing the corresponding path:
.
Special attention should be paid to the leaf node of the path shown in Fig. 9(c). It is associated with , , instead of , , , . This is because the intersection of all the labels associated with the virtual edges is $\{3, 5, 6\} \cap \{1, 3, 6\} = \{3, 6\}$ and , should be removed.

Continuing the search of the graph shown in Fig. 8, we will encounter its second branching node , by which another set of RSs will be created:
- =
(due to the span , which corresponds to two spans in Fig. 3: and ),
- [] = ,
(due to the span and the tree edge in Fig. 8),
- [] = ,
(due to the spans and in Fig. 8).
Since = 1, it will not be further considered in the subsequent computation.
However, in terms of [] and [], we will construct an upBound , (see Fig. 8), and create a trie-like graph as shown in Fig. 10(a). We then add the virtual node and the virtual edge as shown in Fig. 10(b), where the label associated with the virtual edge is set to be the same as for []. The only branching node in this graph is . With respect to , has two RSs in terms of two spans respectively to two nodes ( and ) in this subgraph (see Fig. 10(c); see also Fig. 8 for how these two spans are created):
- [] =
(due to the span in Fig. 10(c)),
- [] =
(due to the span in Fig. 10(c)).
Both of these RSs are of size 1. Therefore, they will simply be ignored.
For itself, we have the following RS:
- [] = , .

According to this RS, we will construct the corresponding trie-like graph, as shown in Fig. 10(d), in which the virtual node is and the label of the virtual edge is 1, 2, 3, 6. By tracing the corresponding path:
.
we will get a truth assignment: = 0, = 1, = 1, = 0, = 1, = 0, satisfying a subset , . This is because $\{2, 5, 6\} \cap \{1, 2, 3, 6\} = \{2, 6\}$ and , are filtered out from the subset associated with the leaf node in Fig. 10(d).
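The two label intersections used so far in this example can be checked mechanically. The sketch below assumes that each edge label is a set of conjunction indices; the helper name `common_labels` is hypothetical.

```python
from typing import Iterable, Set


def common_labels(edge_labels: Iterable[Iterable[int]]) -> Set[int]:
    """Return the indices that occur in every edge label on a path."""
    label_sets = [set(label) for label in edge_labels]
    if not label_sets:
        # No edges on the path: nothing to intersect.
        return set()
    return set.intersection(*label_sets)


# Virtual edges on the path of Fig. 9(c).
fig_9c = common_labels([{3, 5, 6}, {1, 3, 6}])
# Virtual edges on the path of Fig. 10(d).
fig_10d = common_labels([{2, 5, 6}, {1, 2, 3, 6}])
```

Only the conjunctions whose indices survive the intersection remain associated with the leaf node, which is why the subsets reported for Fig. 9(c) and Fig. 10(d) are smaller than the ones originally stored at the leaves.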
After we have returned along the chain of the recursive calls described above, we will continue to explore and encounter the last branching node in (see Fig. 3), which will be handled in a way similar to and .
Concerning the correctness of Algorithm 2, we have the following proposition.
Proposition 2.
Let be a trie-like graph established over a logic formula in DNF. Applying SEARCH( ) to , we will get a maximum subset of conjunctions satisfied by a certain truth assignment.
Proof.
To prove the proposition, we first show that any subset of conjunctions found by the algorithm must be satisfied by one and the same truth assignment. This follows from the definition of RSs and the corresponding upBounds.
We then need to show that any subset of conjunctions satisfiable by a certain truth assignment can be found by the algorithm. For this purpose, consider a subset of conjunctions = , …, ( > 1) which can be satisfied by a truth assignment represented by a path . We will prove by induction on the number of spans on that our algorithm is able to find .
Basic step. When = 0, must be a tree path in and the claim holds. When = 1, the unique span on must cover a branching node of Case 1 in . Let be such a span. Denote by the tree path from root to in . Then, by a recursive call of SEARCH( ) over the trie-like subgraph constructed with respect to we can find a sub-path ; and must be equal to the concatenation of , the span , and .
Induction step. Assume that when = , the algorithm can find .
Now, assume that contains + 1 spans , , …, , . They must correspond to a chain of + 1 nested recursive calls of SEARCH( ). Denote by the trie-like subgraph created by the ( - 1)th recursive call, where = . Let be the first span on . Denote by the sub-path from the root of to , and by the sub-path of from to the last node of . Denote by the conjunction obtained by removing variables on from ( = 1, …, ). Let = , …, . Then, the truth assignment represented by satisfies . According to the induction hypothesis, can be found by executing SEARCH( ). Therefore, can also be found by SEARCH( ). To see this, observe the first recursive call of SEARCH( ) made when we encounter the first branching node in , by which we will find satisfying . Then, the concatenation of and definitely satisfies . This completes the proof. ∎
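The quantity that Proposition 2 refers to, a maximum subset of conjunctions satisfied by one truth assignment, can be made concrete on small inputs with an exhaustive search. The sketch below is not SEARCH( ) and takes exponential time; it is only a reference point, assuming each conjunction is given as a mapping from variable name to required value. The function name and the input encoding are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Conjunction = Dict[str, int]  # variable name -> required value (0 or 1)


def brute_force_max_conjunctions(
    conjunctions: List[Conjunction],
) -> Tuple[int, Dict[str, int]]:
    """Try every truth assignment and return (best count, one best assignment)."""
    variables = sorted({v for conj in conjunctions for v in conj})
    best_count, best_assignment = -1, {}
    for values in product((0, 1), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        # A conjunction is satisfied when all of its literals agree
        # with the assignment.
        count = sum(
            all(assignment[v] == required for v, required in conj.items())
            for conj in conjunctions
        )
        if count > best_count:
            best_count, best_assignment = count, assignment
    return best_count, best_assignment


example = [{"a": 1, "b": 1}, {"a": 1, "c": 0}, {"a": 0}]
count, assignment = brute_force_max_conjunctions(example)
```

For the three conjunctions in `example`, the first two can be satisfied together, while the third conflicts with both, so the maximum is two.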
III-C Further improvement
The algorithm discussed in the previous subsection can be greatly improved in two ways. First, we can remove many useless recursive calls of SEARCH( ) by imposing some extra controls. Second, repeated recursive calls can be avoided by detecting trie-like subgraphs that are encountered more than once.
- Reducing recursive calls
Consider Fig. 11(a). In this figure, we assume that and are two branching nodes in . Then, with respect to and , their ancestor will have two identical RSs:
- [C] = [C] = , .

Thus, during the execution of SEARCH( ), the same trie-like subgraph will be created two times: one is for [C] and another is for [C], but with the same result to be produced.
However, if we create RSs only for those nodes appearing on part of a tree path, i.e., the segment between the current branching node and the lowest ancestor branching node in , this kind of redundancy can be avoided, at the possible loss of some answers. The correctness of the algorithm is not affected, since one of the maximum satisfiable subsets of conjunctions can always be found. See Fig. 11(b) for illustration. For this figure, the RS of with respect to is different from the RS with respect to . However, when checking , [C] will not be computed since is beyond the segment between and . Therefore, the corresponding result will not be generated. However, [C] must cover [C], implying a larger (or same-sized) subset of conjunctions which can be satisfied by a certain truth assignment.
- Avoiding repeated recursive calls
Now we consider Fig. 11(b) once again. Denote by the trie-like graph made over the subtrees respectively rooted at and , and by the trie-like graph made over the subtrees respectively rooted at , , and . It is possible that and contain some common branching nodes. Therefore, repeated recursive calls may be made on the same trie-like subgraphs. To avoid this kind of redundancy, we can examine, at each recursive call, whether the input subgraph has been checked before. If so, the corresponding recursive call is simply suppressed. This obviously does not impact the correctness of the algorithm, since a recursive call on the same subgraph will find only the same satisfiable subset of conjunctions (but possibly with different assignments of variables, since the trie-like subgraph may be reached through different spans). For this purpose, we will maintain a hash array with each entry used to store the result obtained by a recursive call on a certain trie-like subgraph. Specifically, for each recursive call <, > (this notation was first introduced before Example 2 to describe the chains of recursive calls), we will store the result in the address hash(). Thus, examining whether an input subgraph has been checked before requires only constant time.
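The bookkeeping for suppressing repeated calls can be sketched with an ordinary dictionary. This is an illustration only: it assumes that each recursive call can be identified by a hashable key (for instance, the branching node together with a frozen set of the nodes on its upBound), and the names `cached_search`, `cache`, and `search` are hypothetical.

```python
from typing import Any, Callable, Dict, Hashable


def cached_search(
    key: Hashable,
    cache: Dict[Hashable, Any],
    search: Callable[[Hashable], Any],
) -> Any:
    """Run `search` for `key` once and reuse the stored result afterwards."""
    if key in cache:
        # This subgraph was already processed: suppress the recursive call.
        return cache[key]
    result = search(key)
    cache[key] = result
    return result


# Usage: the second call with the same key does not invoke `search` again.
calls = []


def fake_search(key: Hashable) -> str:
    calls.append(key)
    return f"result for {key}"


cache: Dict[Hashable, Any] = {}
first = cached_search(("v", frozenset({"u1", "u2"})), cache, fake_search)
second = cached_search(("v", frozenset({"u1", "u2"})), cache, fake_search)
```

A dictionary lookup takes expected constant time once the key has been built; the cost of building the key itself depends on how the subgraph is identified, which this sketch leaves open.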
IV Time complexity analysis
The total running time of the algorithm consists of three parts.
The first part is the time for computing the frequencies of variable appearances in . Since in this process each variable in a is accessed only once, = O().
The second part is the time for constructing a trie-like graph for . This part of time can be further partitioned into three portions.
- : The time for sorting variable sequences for ’s. It is obviously bounded by O(log2 ).
- : The time for constructing *-graphs for each ( = 1, …, ). Since for each variable sequence a transitive closure over its spans should be first created and needs O() time, this part of cost is bounded by O().
- : The time for merging all *-graphs to form a trie-like graph . This part is also bounded by O().
The third part is the time for searching to find a maximum subset of conjunctions satisfied by a certain truth assignment. It is a recursive procedure.
First, we notice that in all the generated trie-like subgraphs, the number of all the branching nodes is bounded by O(). But each branching node may be involved in at most O() recursive calls (see the analysis given below) and for each recursive call at most O() time can be required to create the corresponding trie-like subgraph. Thus, the worst-case time complexity of the algorithm is bounded by O().
However, we need to make clear that each branching node can be involved at most in O() recursive calls. For this, we have the following analysis.
Consider a trie-like graph shown in Fig. 12(a), in which is a branching node. With respect to , we will have the following three RSs:
- [C] = , ,
- [D] = , , ,
- [E] = , , , ,
where , and are three label sets for the three RSs, respectively.

According to these RSs, we will construct a trie-like subgraph as shown in Fig. 12(b) and a recursive call of SEARCH( ) will be carried out. It is the first recursive call, in which is involved. During this recursive execution of SEARCH( ), will then be involved in a second recursive call, but on a smaller trie-like subgraph , whose height is one level lower than (see Fig. 12(c)). During the second recursive call, will be involved in a third recursive call. For this time, the height of the corresponding trie-like subgraph is further reduced as demonstrated in Fig. 12(d).
Together with the method discussed in the previous section to avoid repeated recursive calls on the same trie-like subgraph, the above analysis shows that any branching node can be involved in at most recursive calls of SEARCH( ). In general, we have the following proposition.
Proposition 3.
Let be a trie-like graph and be a branching node of Case 1 in the corresponding layered graph. Then, can be involved in at most recursive calls of SEARCH( ) (Algorithm 3) in the whole working process.
Proof.
Let , , …, ( 2) be a largest group of nodes appearing on the upBound with respect to satisfying the following three properties:
- Each ( = 1, …, ) has no ancestor appearing on .
- () = () = … = ().
- There is no other node with () = () that is a descendant of any node on .
Then, in the trie-like subgraph constructed for , all the nodes in this group will be merged into a single node. The same claim applies to any other largest group of nodes on satisfying the above three properties. Thus, in the next recursive call of SEARCH( ) involving , the trie-like subgraph to be constructed must be at least one level lower than , since when constructing a trie-like subgraph any RS with = 1 will not be considered. Because the height of is bounded by and any trie-like subgraph is constructed only once (using the method discussed in the previous section to avoid multiple recursive calls on the same trie-like subgraph), the proposition holds. ∎
Proposition 4.
Let be a trie-like graph over a formula in DNF containing conjunctions with variables. The time complexity of Algorithm SEARCH() is bounded by O().
Proof.
From Proposition 3, we can see that in the whole working process at most O() trie-like subgraphs can be generated. Thus, at most O() recursive calls can be carried out since any repeated recursive call on a same trie-like subgraph can be simply and effectively avoided. Therefore, the time complexity of SEARCH() is bounded by O() O() = O(). ∎
V Conclusions
In this paper, we have presented a new method to solve the 2-MAXSAT problem. The worst-case time complexity of the algorithm is bounded by O(), where and are respectively the numbers of clauses and variables of a logic formula (over a set of variables) in CNF with each clause containing at most 2 literals. The main idea behind this is to construct a different formula (over a set of variables) in DNF, according to , with the property that for a given integer * has at least * clauses satisfied by a truth assignment for if and only if has at least * conjunctions satisfied by a truth assignment for . To find a truth assignment that maximizes the number of satisfied conjunctions in , a graph structure, called *-graph, is introduced to represent each conjunction in . In this way, all the conjunctions in can be represented as a trie-like graph . Searching bottom up in a recursive way, we can find the answer efficiently.
References
- [1] J. Argelich, et al., MinSAT versus MaxSAT for Optimization Problems, International Conference on Principles and Practice of Constraint Programming, 2013, pp. 133-142.
- [2] Y. Chen, The 2-MAXSAT Problem Can Be Solved in Polynomial Time, in Proc. CSCI2022, IEEE, Dec. 14-16, 2022, Las Vegas, USA, pp. 473-480.
- [3] R.H. Connelly and F.L. Morris, A generalization of the trie data structure. Mathematical Structures in Computer Science. 5 (3). Syracuse University: 381–418. doi:10.1017/S0960129500000803. S2CID 18747244. (1993).
- [4] S. A. Cook, The complexity of theorem-proving procedures, in: Proc. of the 3rd Annual ACM Symposium on the Theory of Computing, 1971, pp. 151-158.
- [5] Y. Djenouri, Z. Habbas, D. Djenouri, Data Mining-Based Decomposition for Solving the MAXSAT Problem: Toward a New Approach, IEEE Intelligent Systems, Vol. No. 4, 2017, pp. 48-58.
- [6] C. Dumitrescu, An algorithm for MAX2SAT, International Journal of Scientific and Research Publications, Volume 6, Issue 12, December 2016.
- [7] S. Even, A. Itai, and A. Shamir, On the complexity of timetable and multicommodity flow problems, SIAM J. Comput., 5 (1976), pp. 691-703.
- [8] M. R. Garey, D. S. Johnson, and L. Stockmeyer, Some simplified NP-complete graph problems, Theoret. Comput. Sci., (1976), pp. 237-267.
- [9] R. Impagliazzo and R. Paturi, On the complexity of k-sat. J. Comput., Syst. Sci., 62(2):367–375, 2001.
- [10] D.S. Johnson, Approximation Algorithms for Combinatorial Problems, J. Computer System Sci., 9(1974), pp. 256-278.
- [11] E. Kemppainen, Incomplete Maxsat Solving by Linear Programming Relaxation and Rounding, Master thesis, University of Helsinki, 2020.
- [12] M. Krentel, The Complexity of Optimization Problems, J. Computer and System Sci., 36(1988), pp. 490-509.
- [13] R. Kohli, R. Krishnamurti, and P. Mirchandani, The Minimum Satisfiability Problem, SIAM J. Discrete Math., Vol. 7, No. 2, June 1994, pp. 275-283.
- [14] D.E. Knuth, The Art of Computer Programming, Vol.1, Addison-Wesley, Reading, 1969.
- [15] D.E. Knuth, The Art of Computer Programming, Vol.3, Addison-Wesley, Reading, 1975.
- [16] A. Kügel, Natural Max-SAT Encoding of Min-SAT, in: Proc. of the Learning and Intelligence Optimization Conf., LION 6, Paris, France, 2012.
- [17] C.M. Li, Z. Zhu, F. Manya and L. Simon, Exact MINSAT Solving, in: Proc. of 13th Intl. Conf. Theory and Application of Satisfiability Testing, Edinburgh, UK, 2010, pp. 363-368.
- [18] C.M. Li, Z. Zhu, F. Manya and L. Simon, Optimizing with minimum satisfiability, Artificial Intelligence, 190 (2012) 32-44.
- [19] A. Richard, A graph-theoretic definition of a sociometric clique, J. Mathematical Sociology, 3(1), 1974, pp. 113-126.
- [20] C. Papadimitriou, Computational Complexity, Addison-Wesley, 1994.
- [21] Y. Shang, Resilient consensus in multi-agent systems with state constraints, Automatica, Vol. 122, Dec., 2020, 109288.
- [22] V. Vazirani, Approximation Algorithms, Springer Verlag, 2001.
- [23] M. Xiao, An Exact MaxSAT Algorithm: Further Observations and Further Improvements, Proc. of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22).
- [24] H. Zhang, H. Shen, and F. Manyà, Exact Algorithms for MAX-SAT, Electronic Notes in Theoretical Computer Science 86(1):190-203, May 2003.