
An Efficient Algorithm for Solving the 2-MAXSAT Problem

Yangjun Chen*

*Y. Chen is with the Department of Applied Computer Science, University of Winnipeg, Manitoba, Canada, R3B 2E9.
The article is a modification and extension of a conference paper: Y. Chen, The 2-MAXSAT Problem Can Be Solved in Polynomial Time, in Proc. CSCI2022, IEEE, Dec. 14-16, 2022, Las Vegas, USA, pp. 473-480. This work is supported by NSERC, Canada, 239074-01 (242523).
Abstract

In the MAXSAT problem, we are given a set $V$ of $m$ variables and a collection $C$ of $n$ clauses over $V$, and we seek a truth assignment that maximizes the number of satisfied clauses. This problem is NP-complete even for its restricted version, the 2-MAXSAT problem, in which every clause contains at most 2 literals. In this paper, we discuss an efficient algorithm for solving this problem. Its worst-case time complexity is bounded by $O(n^2m^4)$. This shows that the 2-MAXSAT problem can be solved in polynomial time. Thus, the paper in fact provides a proof of P = NP.

Index Terms:
satisfiability problem, maximum satisfiability problem, NP-hard, NP-complete, conjunctive normal form, disjunctive normal form.

I Introduction

The satisfiability problem is perhaps one of the most well-studied problems; it arises in many areas of discrete optimization, such as artificial intelligence, mathematical logic, and combinatorial optimization, to name just a few. Given a set $V$ of Boolean (true/false) variables and a collection $C$ of clauses over $V$, or, equivalently, a logic formula in CNF (Conjunctive Normal Form), the satisfiability problem is to determine whether there is a truth assignment that satisfies all clauses in $C$ [4]. The problem is NP-complete even when every clause in $C$ has at most three literals [7]. The maximum satisfiability (MAXSAT) problem is an optimization version of satisfiability that seeks a truth assignment maximizing the number of satisfied clauses [10]. This problem is also NP-complete even for its restricted version, the so-called 2-MAXSAT problem, in which every clause in $C$ has at most two literals [8]. Its applications are documented in an extensive bibliography [5, 8, 13, 16, 17, 18, 19, 21].

Over the past several decades, a great deal of research on MAXSAT has been conducted. Almost all of it concerns approximation methods [10, 12, 20, 22, 1, 6], such as $(1-1/e)$-approximation and 3/4-approximation [22], as well as methods based on integer linear programming [11]. The only exact algorithms are discussed in [24, 23]. The worst-case time complexity of [24] is bounded by $O(b^{2m})$, where $b$ is the maximum number of occurrences of any variable in the clauses of $C$, while the worst-case time complexity of [23] is bounded by $\max\{O(2^m), O^*(1.2989^n)\}$. Both algorithms use the traditional branch-and-bound method for solving the satisfiability problem, which searches for a solution by setting a variable (or a literal) to 1 or 0. As shown in [9], any algorithm based on branch-and-bound runs in $O^*(c^m)$ time with $c \geq 2$.

In this paper, we discuss a polynomial-time algorithm for solving the 2-MAXSAT problem. Its worst-case time complexity is bounded by $O(n^2m^4)$, where $n$ and $m$ are the number of clauses and the number of variables in $C$, respectively. Thus, our algorithm in fact constitutes a proof of P = NP.

The main idea behind our algorithm can be summarized as follows.

  1. Given a collection $C$ of $n$ clauses over a set $V$ of variables, each clause containing at most 2 literals, construct a formula $D$ in DNF (Disjunctive Normal Form) over another set $U$ of variables, containing $2n$ conjunctions, each with at most 2 literals, such that there is a truth assignment for $V$ satisfying at least $n^* \leq n$ clauses in $C$ if and only if there is a truth assignment for $U$ satisfying at least $n^*$ conjunctions in $D$.

  2. For each $D_i$ in $D$ ($i \in \{1, \dots, 2n\}$), construct a graph, called a $p^*$-graph, that represents all the truth assignments $\sigma$ under which $D_i$ evaluates to true.

  3. Organize the $p^*$-graphs for all the $D_i$'s into a trie-like graph $G$. By searching $G$ bottom-up, we can find a maximum subset of satisfied conjunctions in polynomial time.

The rest of this paper is organized as follows. First, in Section II, we restate the definition of the 2-MAXSAT problem and show how to reduce it to a problem that seeks a truth assignment maximizing the number of satisfied conjunctions in a formula in DNF. Then, we discuss a basic algorithm in Section III. Next, in Section IV, we discuss how to improve the basic algorithm. Section V is devoted to the analysis of the time complexity of the improved algorithm. Finally, a short conclusion is set forth in Section VI.

II 2-MAXSAT Problem

We will deal solely with Boolean variables (that is, variables which are either true or false), which we denote by $c_1$, $c_2$, etc. A literal is defined as either a variable or the negation of a variable (e.g., $c_7$ and $\neg c_{11}$ are literals). A literal $\neg c_i$ is true if the variable $c_i$ is false. A clause is defined as the OR of some literals, written as $(l_1 \lor l_2 \lor \dots \lor l_k)$ for some $k$, where each $l_i$ ($1 \leq i \leq k$) is a literal, as illustrated by $\neg c_1 \lor c_{11}$. We say that a Boolean formula is in conjunctive normal form (CNF) if it is presented as an AND of clauses: $C_1 \land \dots \land C_n$ ($n \geq 1$). For example, $(\neg c_1 \lor c_7 \lor \neg c_{11}) \land (c_5 \lor \neg c_2 \lor \neg c_3)$ is in CNF. In addition, a formula in disjunctive normal form (DNF) is an OR of conjunctions: $D_1 \lor D_2 \lor \dots \lor D_m$ ($m \geq 1$). For instance, $(c_1 \land c_2) \lor (\neg c_1 \land c_{11})$ is in DNF.

Finally, the MAXSAT problem is to find an assignment to the variables of a Boolean formula in CNF that maximizes the number of clauses set to true (i.e., satisfied). Formally, its restricted version with at most two literals per clause is stated as follows:

2-MAXSAT

  • Instance: A finite set $V$ of variables, a Boolean formula $C = C_1 \land \dots \land C_n$ in CNF over $V$ such that each $C_i$ has $0 < |C_i| \leq 2$ literals ($i = 1, \dots, n$), and a positive integer $n^* \leq n$.

  • Question: Is there a truth assignment for $V$ that satisfies at least $n^*$ clauses?

According to [8], 2-MAXSAT is NP-complete.

To find a truth assignment $\sigma$ that maximizes the number of clauses set to true, we can try all possible assignments and count the satisfied clauses, as discussed in [18], where bounds are used to prune branches. We may also use a heuristic method to find an approximate solution, as described in [10].

In this paper, we propose a quite different method: for $C = C_1 \land \dots \land C_n$, we consider another formula $D$ in DNF, constructed as follows.

Let $C_i = c_{i1} \lor c_{i2}$ be a clause in $C$, where $c_{i1}$ and $c_{i2}$ denote either variables in $V$ or their negations. For $C_i$, define a new variable $x_i$ and a pair of conjunctions $D_{i1}$, $D_{i2}$, where

$D_{i1} = c_{i1} \land x_i$,

$D_{i2} = c_{i2} \land \neg x_i$.

Let $D = D_{11} \lor D_{12} \lor D_{21} \lor D_{22} \lor \dots \lor D_{n1} \lor D_{n2}$. Then, given an instance of the 2-MAXSAT problem defined over a variable set $V$ and a collection $C$ of $n$ clauses, we can construct a logic formula $D$ in DNF over the set $V \cup X$ in polynomial time, where $X = \{x_1, \dots, x_n\}$. $D$ has $2n$ conjunctions.

Concerning the relationship of CC and DD, we have the following proposition.

Proposition 1.

Let $C$ and $D$ be the formula in CNF and the formula in DNF defined above, respectively. At least $n^*$ clauses in $C$ can be satisfied by a truth assignment for $V$ if and only if at least $n^*$ conjunctions in $D$ can be satisfied by some truth assignment for $V \cup X$.

Proof.

Consider every pair of conjunctions in $D$: $D_{i1} = c_{i1} \land x_i$ and $D_{i2} = c_{i2} \land \neg x_i$ ($i \in \{1, \dots, n\}$). Clearly, under any truth assignment for the variables in $V \cup X$, at most one of $D_{i1}$ and $D_{i2}$ can be satisfied: if $x_i$ = true, then $D_{i1} = c_{i1}$ and $D_{i2}$ = false; if $x_i$ = false, then $D_{i2} = c_{i2}$ and $D_{i1}$ = false.

“$\Rightarrow$” Suppose there exists a truth assignment $\sigma$ for $V$ that satisfies $p \geq n^*$ clauses in $C$. Without loss of generality, assume that these $p$ clauses are $C_1, C_2, \dots, C_p$.

Then, similar to Theorem 1 of [13], we can find a truth assignment $\tilde{\sigma}$ for $V \cup X$ that agrees with $\sigma$ on $V$ and satisfies the following condition:

For each $C_j = c_{j1} \lor c_{j2}$ ($j = 1, \dots, p$): if $c_{j1}$ is true and $c_{j2}$ is false under $\sigma$, (1) set both $c_{j1}$ and $x_j$ to true in $\tilde{\sigma}$; if $c_{j1}$ is false and $c_{j2}$ is true under $\sigma$, (2) set $c_{j2}$ to true but $x_j$ to false in $\tilde{\sigma}$; if both $c_{j1}$ and $c_{j2}$ are true, do (1) or (2) arbitrarily.

Then each of $C_1, \dots, C_p$ contributes one satisfied conjunction ($D_{j1}$ or $D_{j2}$), so at least $n^*$ conjunctions in $D$ are satisfied under $\tilde{\sigma}$.

“$\Leftarrow$” Now suppose that there is a truth assignment $\tilde{\sigma}$ for $V \cup X$ under which $q \geq n^*$ conjunctions in $D$ are satisfied. By the observation above, these conjunctions come from $q$ distinct pairs; without loss of generality, assume that they are $D_{1b_1}, D_{2b_2}, \dots, D_{qb_q}$, where each $b_j$ ($j = 1, \dots, q$) is 1 or 2.

Then, we can find a truth assignment $\sigma$ for $V$ satisfying the following condition:

For each $D_{jb_j}$ ($j = 1, \dots, q$), if $b_j = 1$, set $c_{j1}$ to true in $\sigma$; if $b_j = 2$, set $c_{j2}$ to true in $\sigma$.

These requirements are consistent, since the restriction of $\tilde{\sigma}$ to $V$ already makes every such literal true. Clearly, under $\sigma$, at least $n^*$ clauses in $C$ are satisfied.

The above discussion shows that the proposition holds. ∎

Proposition 1 demonstrates that the 2-MAXSAT problem can be transformed, in polynomial time, into the problem of finding a truth assignment that maximizes the number of satisfied conjunctions in a logic formula in DNF.

As an example, consider the following logic formula in CNF:

$$C = C_1 \land C_2 \land C_3 = (c_1 \lor c_2) \land (c_2 \lor \neg c_3) \land (c_3 \lor \neg c_1) \qquad (1)$$

Under the truth assignment $\sigma = \{c_1 = 1, c_2 = 1, c_3 = 1\}$, $C$ evaluates to true, i.e., $C_i = 1$ for $i = 1, 2, 3$. Thus, $n^* = 3$.

For $C$, we generate another formula $D$ in DNF according to the above construction, using $c_4$, $c_5$, and $c_6$ as the new variables $x_1$, $x_2$, and $x_3$:

$$\begin{array}{ll} D = & D_{11} \lor D_{12} \lor D_{21} \lor D_{22} \lor D_{31} \lor D_{32} \\ = & (c_1 \land c_4) \lor (c_2 \land \neg c_4) \lor \\ & (c_2 \land c_5) \lor (\neg c_3 \land \neg c_5) \lor \\ & (c_3 \land c_6) \lor (\neg c_1 \land \neg c_6). \end{array} \qquad (2)$$

According to Proposition 1, $D$ should also have at least $n^* = 3$ conjunctions that evaluate to true under some truth assignment. Conversely, if $D$ has at least 3 satisfied conjunctions under a truth assignment, then $C$ should have at least three clauses satisfied by some truth assignment, too. In fact, under the truth assignment $\tilde{\sigma} = \{c_1 = 1, c_2 = 1, c_3 = 1, c_4 = 1, c_5 = 1, c_6 = 1\}$, $D$ has three satisfied conjunctions, $D_{11}$, $D_{21}$, and $D_{31}$, from which the three satisfied clauses in $C$ can be immediately determined.
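As a sanity check of Proposition 1 on this example, the following Python sketch computes both maxima by exhaustive search. It is deliberately brute force (exponential in the number of variables) and is not the algorithm of this paper; the literal encoding and the names `holds`, `best_count`, and `reduce_to_dnf` are our own assumptions.

```python
from itertools import product

Lit = tuple[int, bool]  # hypothetical encoding: (variable index, is_positive)


def holds(lit: Lit, assignment: dict[int, bool]) -> bool:
    """True if the literal is satisfied by the assignment."""
    var, positive = lit
    return assignment[var] == positive


def best_count(formula, num_vars: int, combine) -> int:
    """Largest number of satisfied terms over all 2**num_vars assignments.

    combine=any treats each term as a clause (CNF); combine=all treats it as
    a conjunction (DNF). Exponential: only suitable for tiny instances.
    """
    best = 0
    for bits in product((False, True), repeat=num_vars):
        assignment = dict(enumerate(bits, start=1))
        satisfied = sum(combine(holds(lit, assignment) for lit in term) for term in formula)
        best = max(best, satisfied)
    return best


def reduce_to_dnf(clauses, num_vars):
    """D_i1 = c_i1 AND x_i, D_i2 = c_i2 AND NOT x_i, with x_i numbered num_vars + i."""
    conjunctions = []
    for i, clause in enumerate(clauses, start=1):
        x = num_vars + i
        conjunctions += [(clause[0], (x, True)), (clause[-1], (x, False))]
    return conjunctions


C = [((1, True), (2, True)), ((2, True), (3, False)), ((3, True), (1, False))]  # formula (1)
D = reduce_to_dnf(C, num_vars=3)                                                  # formula (2)
print(best_count(C, 3, any), best_count(D, 6, all))  # prints: 3 3
```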

In the following, we discuss a polynomial-time algorithm for finding a maximum set of satisfied conjunctions in any logic formula in DNF, not only in the case where each conjunction contains at most 2 conjuncts.

III Algorithm description

In this section, we discuss our algorithm. First, we present the main idea in Section III-A. Then, in Section III-B, a recursive algorithm for solving the problem is described in detail. The running time of the algorithm will be analyzed in the next section.

III-A Main idea

To develop an efficient algorithm for finding a truth assignment that maximizes the number of satisfied conjunctions in a formula $D = D_1 \lor \dots \lor D_n$, where each $D_i$ ($i = 1, \dots, n$) is a conjunction of literals over variables $c \in V$, we need to represent each $D_i$ as a sequence of variables (referred to as a variable sequence). For this purpose, we introduce a new notation:

$(c_j, *) = c_j \lor \neg c_j$ = true,

which is inserted into $D_i$ to represent any variable $c_j$ missing from $D_i$ (i.e., $c_j \in V$ but $c_j$ does not appear in $D_i$). Obviously, the truth value of each $D_i$ remains unchanged.

In this way, the above DD can be rewritten as a new formula in DNF as follows:

$$\begin{array}{ll} D = & D_1 \lor D_2 \lor D_3 \lor D_4 \lor D_5 \lor D_6 \\ = & (c_1 \land (c_2,*) \land (c_3,*) \land c_4 \land (c_5,*) \land (c_6,*)) \lor \\ & ((c_1,*) \land c_2 \land (c_3,*) \land \neg c_4 \land (c_5,*) \land (c_6,*)) \lor \\ & ((c_1,*) \land c_2 \land (c_3,*) \land (c_4,*) \land c_5 \land (c_6,*)) \lor \\ & ((c_1,*) \land (c_2,*) \land \neg c_3 \land (c_4,*) \land \neg c_5 \land (c_6,*)) \lor \\ & ((c_1,*) \land (c_2,*) \land c_3 \land (c_4,*) \land (c_5,*) \land c_6) \lor \\ & (\neg c_1 \land (c_2,*) \land (c_3,*) \land (c_4,*) \land (c_5,*) \land \neg c_6) \end{array} \qquad (3)$$

Doing this enables us to represent each $D_i$ as a variable sequence, with all negative literals removed. The reason is as follows. Our goal is to establish a graph in which each node represents a variable and each path $p$ corresponds to a truth assignment satisfying $D_i$ (under which every variable on $p$ is set to true while all variables not on $p$ are set to false). If the variable in a negative literal is set to true, the corresponding conjunction $D_i$ must be false. Obviously, such a variable should therefore not be involved in the graph, since any path through it would correspond to a truth assignment not satisfying $D_i$.

See Table I for illustration.

TABLE I: Conjunctions represented as sorted variable sequences.

| conjunction | variable sequence | sorted variable sequence |
|---|---|---|
| $D_1$ | $c_1.(c_2,*).(c_3,*).c_4.(c_5,*).(c_6,*)$ | $\#.(c_2,*).(c_3,*).c_1.c_4.(c_5,*).(c_6,*).\$$ |
| $D_2$ | $(c_1,*).c_2.c_3.(c_5,*).(c_6,*)$ | $\#.c_2.c_3.(c_1,*).(c_5,*).(c_6,*).\$$ |
| $D_3$ | $(c_1,*).c_2.(c_3,*).(c_4,*).c_5.(c_6,*)$ | $\#.c_2.(c_3,*).(c_1,*).(c_4,*).c_5.(c_6,*).\$$ |
| $D_4$ | $(c_1,*).(c_2,*).(c_4,*).(c_6,*)$ | $\#.(c_2,*).(c_1,*).(c_4,*).(c_6,*).\$$ |
| $D_5$ | $(c_1,*).(c_2,*).c_3.(c_4,*).(c_5,*).c_6$ | $\#.(c_2,*).c_3.(c_1,*).(c_4,*).(c_5,*).c_6.\$$ |
| $D_6$ | $(c_2,*).(c_3,*).(c_4,*).(c_5,*)$ | $\#.(c_2,*).(c_3,*).(c_4,*).(c_5,*).\$$ |

First, note the variable sequence for $D_2$ (the second sequence in the second column of Table I), in which the negative literal $\neg c_4$ of $D_2$ is eliminated. All the other variable sequences can be checked in the same way.
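The step from a conjunction to its (unsorted) variable sequence can be sketched as follows, assuming a conjunction is given as a set of (variable index, is_positive) literals and $V$ as an ordered list of indices; an entry `(c, True)` stands for $(c, *)$ and `(c, False)` for a positive literal $c$. The function name is hypothetical.

```python
Lit = tuple[int, bool]  # hypothetical encoding: (variable index, is_positive)


def variable_sequence(conj: set[Lit], variables: list[int]) -> list[tuple[int, bool]]:
    """Return (variable, starred) pairs in the order of `variables`.

    starred=True encodes (c, *): c does not occur in the conjunction.
    Variables occurring negatively are dropped, because any path through
    them would set them to true and falsify the conjunction.
    """
    positive = {v for v, pos in conj if pos}
    negative = {v for v, pos in conj if not pos}
    return [(v, v not in positive) for v in variables if v not in negative]


# D_1 = c1 AND c4 from formula (2), over V = {c1, ..., c6}
print(variable_sequence({(1, True), (4, True)}, [1, 2, 3, 4, 5, 6]))
# [(1, False), (2, True), (3, True), (4, False), (5, True), (6, True)]
```

For $D_1$ and for $D_4 = \neg c_3 \land \neg c_5$, the output coincides with the corresponding rows of the second column of Table I.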

It is now easy to compute the appearance frequencies of the different variables in the variable sequences, where each $(c, *)$ counts as a single appearance of $c$ and negative literals are not counted, as illustrated in Table II, which shows the appearance frequencies of all the variables in the above $D$.

TABLE II: Appearance frequencies of variables.

| variable | $c_1$ | $c_2$ | $c_3$ | $c_4$ | $c_5$ | $c_6$ |
|---|---|---|---|---|---|---|
| appearance frequency | 5/6 | 6/6 | 5/6 | 5/6 | 5/6 | 5/6 |
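Counting appearance frequencies is a single pass over the sequences. In this sketch (names and encoding are assumptions), each sequence from the second column of Table I is a list of (variable index, starred) pairs, and a variable is counted at most once per sequence, whether it appears as $c$ or as $(c, *)$.

```python
from collections import Counter

# Second column of Table I, written as lists of (variable, starred) pairs.
TABLE_I = {
    1: [(1, False), (2, True), (3, True), (4, False), (5, True), (6, True)],
    2: [(1, True), (2, False), (3, False), (5, True), (6, True)],
    3: [(1, True), (2, False), (3, True), (4, True), (5, False), (6, True)],
    4: [(1, True), (2, True), (4, True), (6, True)],
    5: [(1, True), (2, True), (3, False), (4, True), (5, True), (6, False)],
    6: [(2, True), (3, True), (4, True), (5, True)],
}


def appearance_frequencies(sequences) -> Counter:
    """Count, for each variable, the number of sequences it appears in;
    c and (c, *) both count as one appearance of c."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update({var for var, _starred in seq})
    return counts


freq = appearance_frequencies(TABLE_I.values())
print({f"c{v}": f"{freq[v]}/{len(TABLE_I)}" for v in sorted(freq)})
# {'c1': '5/6', 'c2': '6/6', 'c3': '5/6', 'c4': '5/6', 'c5': '5/6', 'c6': '5/6'}
```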

Based on the variable appearance frequencies, we impose a global ordering over all variables in $D$ such that the most frequent variables appear first, with ties broken arbitrarily. For instance, for the $D$ shown above, we can specify the global ordering $c_2 \to c_3 \to c_1 \to c_4 \to c_5 \to c_6$. Here, $c_2$ is the most frequent variable and therefore appears first. The other variables have the same frequency, so we simply impose a fixed order on them: $c_3 \to c_1 \to c_4 \to c_5 \to c_6$.

Following this global ordering, each conjunction $D_i$ in $D$ can be represented as a sorted variable sequence, as illustrated in the third column of Table I, where the variables in a sequence are ordered by their appearance frequencies such that more frequent variables appear before less frequent ones. In addition, a start symbol $\#$ and an end symbol $\$$ are used as sentinels for technical convenience. In fact, any global ordering of variables works (i.e., one can specify any global ordering of variables), and a graph representation of assignments can be established on top of it. However, ordering variables by their appearance frequencies can greatly improve the efficiency of searching the graph constructed over the variable sequences of all conjunctions in $D$, since more variables from different conjunctions can be merged together.
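Given the frequencies, the global ordering and the sorted sequences with sentinels can be produced as below. Ties are broken by a caller-supplied list `tie_order`, which must list every variable; this parameter is our assumption, used here to reproduce the arbitrary tie-break $c_3 \to c_1 \to c_4 \to c_5 \to c_6$ chosen above. The function names are hypothetical.

```python
def global_order(freq: dict[int, int], tie_order: list[int]) -> list[int]:
    """Most frequent variables first; ties are broken by their position in
    `tie_order` (any fixed order is acceptable)."""
    return sorted(tie_order, key=lambda v: (-freq[v], tie_order.index(v)))


def sorted_sequence(seq: list[tuple[int, bool]], order: list[int]) -> list:
    """Reorder (variable, starred) pairs by the global order and add sentinels."""
    rank = {v: i for i, v in enumerate(order)}
    return ["#"] + sorted(seq, key=lambda item: rank[item[0]]) + ["$"]


freq = {1: 5, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}  # Table II
order = global_order(freq, tie_order=[3, 1, 2, 4, 5, 6])
d1 = [(1, False), (2, True), (3, True), (4, False), (5, True), (6, True)]
print(order)                      # [2, 3, 1, 4, 5, 6]
print(sorted_sequence(d1, order))
# ['#', (2, True), (3, True), (1, False), (4, False), (5, True), (6, True), '$']
```

The second printed line is the sorted sequence of $D_1$ in the third column of Table I.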

From now on, by a variable sequence we always mean a sorted variable sequence. Also, we use $D_i$ and the variable sequence for $D_i$ interchangeably when no confusion can arise.

In addition, our algorithm needs a graph structure, called a $p^*$-graph, to represent all the truth assignments for each $D_i$ ($i = 1, \dots, n$) under which $D_i$ evaluates to true. For ease of explanation, however, we first define the simpler concept of $p$-graphs.

Definition 1. ($p$-graph) Let $\alpha = d_0 d_1 \dots d_k d_{k+1}$ be a variable sequence representing a $D_i$ in $D$ as described above (with $d_0 = \#$, $d_{k+1} = \$$, and each $d_j$ with $j \in \{1, \dots, k\}$ being either a variable or a pair of the form $(c, *)$, where $c$ is a variable). A $p$-graph over $\alpha$ is a directed graph in which there is a node for each $d_j$ ($j = 0, \dots, k+1$) and an edge $(d_j, d_{j+1})$ for each $j \in \{0, 1, \dots, k\}$. In addition, for each $d_j$ with $j \in \{1, \dots, k\}$ that is a pair of the form $(c, *)$, an extra edge connecting $d_{j-1}$ to $d_{j+1}$ is added.

In Fig. 1(a), we show such a $p$-graph for $D_1 = d_0 d_1 d_2 d_3 d_4 d_5 d_6 d_7 = \#.(c_2,*).(c_3,*).c_1.c_4.(c_5,*).(c_6,*).\$$. Besides a main path going through all the variables in $D_1$, there are four off-path edges (edges not on the main path), referred to as spans, attached to the main path and corresponding to $(c_2,*)$, $(c_3,*)$, $(c_5,*)$, and $(c_6,*)$, respectively. Each span is represented by the subpath it covers. For example, we use the subpath $\langle v_0, v_1, v_2\rangle$ (the subpath through the three nodes $v_0$, $v_1$, $v_2$) to stand for the span connecting $v_0$ and $v_2$; $\langle v_1, v_2, v_3\rangle$ for the span connecting $v_1$ and $v_3$; $\langle v_4, v_5, v_6\rangle$ for the span connecting $v_4$ and $v_6$; and $\langle v_5, v_6, v_7\rangle$ for the span connecting $v_5$ and $v_7$. Using spans, the meaning of each ‘*’ (either 0 or 1) is appropriately represented: along a span we bypass the corresponding variable (its value is set to 0), while along an edge on the main path we go through the corresponding variable (its value is set to 1).
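Definition 1 translates directly into code if nodes are identified by their positions $0, \dots, k+1$ in the sorted sequence. The following Python sketch (hypothetical names; a starred entry encodes $(c, *)$) returns the main-path edges and the spans, each span written as the pair of its end positions; for $D_1$ it yields the four spans just described.

```python
def p_graph(alpha: list) -> tuple[list[tuple[int, int]], list[tuple[int, int]]]:
    """Definition 1 on node positions 0..k+1 of a sorted sequence
    alpha = ['#', d_1, ..., d_k, '$'], where each d_j is (variable, starred).

    Returns (main_path_edges, spans): edge (j, j+1) for every j, plus a span
    (j-1, j+1) for every starred d_j, i.e. every (c, *)."""
    last = len(alpha) - 1
    main = [(j, j + 1) for j in range(last)]
    spans = [(j - 1, j + 1) for j in range(1, last) if alpha[j][1]]
    return main, spans


# Sorted sequence for D_1 from Table I: #.(c2,*).(c3,*).c1.c4.(c5,*).(c6,*).$
d1 = ["#", (2, True), (3, True), (1, False), (4, False), (5, True), (6, True), "$"]
main, spans = p_graph(d1)
print(spans)  # [(0, 2), (1, 3), (4, 6), (5, 7)]
```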

Figure 1: Illustration for pp-graphs and pp*-graphs.

In fact, what we want is to represent, in an efficient way, all the truth assignments for each $D_i$ ($i = 1, \dots, n$) under which $D_i$ evaluates to true. However, $p$-graphs fail to do so, since when we go from a node $v$ to another node $u$ through a span, $u$ must be selected. If $u$ represents a $(c, *)$ for some variable $c$, the meaning of this ‘*’ is not properly rendered: $(c, *)$ indicates that $c$ is optional, but going through a span from $v$ to $(c, *)$ always selects $c$. For example, in Fig. 1(a), no path from $v_0$ to $v_3$ bypasses both $(c_2,*)$ and $(c_3,*)$, so the assignments with $c_2 = c_3 = 0$ are not represented.

For this reason, we introduce another concept, $p^*$-graphs, described below.

Let $s_1 = \langle v_1, \dots, v_k\rangle$ and $s_2 = \langle u_1, \dots, u_l\rangle$ be two spans attached to the same path. We say that $s_1$ and $s_2$ are overlapped if $u_1 = v_j$ for some $j \in \{1, \dots, k-1\}$, or if $v_1 = u_{j'}$ for some $j' \in \{1, \dots, l-1\}$. For example, in Fig. 1(a), $\langle v_0, v_1, v_2\rangle$ and $\langle v_1, v_2, v_3\rangle$ are overlapped, and so are $\langle v_4, v_5, v_6\rangle$ and $\langle v_5, v_6, v_7\rangle$.

Here, we notice that if there were one more span, say $\langle v_3, v_4, v_5\rangle$, it would be connected to $\langle v_1, v_2, v_3\rangle$ but not overlapped with it. Being aware of this difference is important, since overlapped spans imply consecutive ‘*’s, like $\langle v_0, v_1, v_2\rangle$ and $\langle v_1, v_2, v_3\rangle$, which correspond to the two consecutive ‘*’s in $(c_2, *)$ and $(c_3, *)$. Therefore, overlapped spans exhibit a kind of transitivity: if $s_1$ and $s_2$ are two overlapped spans, then $s_1 \cup s_2$ must be a new, bigger span. Applying this operation to all the spans over a main path, we obtain a ‘transitive closure’ of overlapped spans.

Let $S$ be the set of all spans over the main path $p$ for a certain conjunction. The transitive closure of $S$, denoted as $S^*$, is another set of spans $S^* = \{s_1, s_2, \dots, s_l\}$ for some $l$, which contains all of $S$ and in which each $s_i$ satisfies one of the following two conditions:

1. $s_i \in S$, or

2. There exist $j, k$ ($\neq i$) such that $s_j$ and $s_k$ are overlapped and $s_i = s_j \cup s_k$.

Based on the above discussion, we give the following definition.

Definition 2. ($p^*$-graph) Let $P$ be a $p$-graph. Let $p$ be its main path and $S$ the set of all spans over $p$. Denote by $S^*$ the ‘transitive closure’ of $S$. Then, the $p^*$-graph with respect to $P$ is the union of $p$ and $S^*$, denoted as $P^* = p \cup S^*$.

In Fig. 1(b), we show the $p^*$-graph with respect to the $p$-graph shown in Fig. 1(a).

As another example, consider $D_2 = \#.c_2.c_3.(c_1,*).(c_5,*).(c_6,*).\$$. Its $p$-graph is shown in Fig. 1(c) and its $p^*$-graph in Fig. 1(d), in which we have the span $\langle u_2, u_3, u_4, u_5\rangle$ (representing two consecutive ‘*’s) due to the two overlapped spans $\langle u_2, u_3, u_4\rangle$ and $\langle u_3, u_4, u_5\rangle$. Further, we have the span $\langle u_2, u_3, u_4, u_5, u_6\rangle$ (representing three consecutive ‘*’s) due to $\langle u_2, u_3, u_4, u_5\rangle$ and $\langle u_4, u_5, u_6\rangle$. All the other spans in Fig. 1(d) can be checked in the same way.
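The transitive closure $S^*$ can be computed by a simple fixpoint iteration: keep adding the union of any two overlapped spans until nothing changes. The sketch below uses this naive method (not tuned for efficiency) and represents a span $\langle v_a, \dots, v_b\rangle$ by the position pair $(a, b)$, so that two spans $(a, b)$ and $(c, d)$ are overlapped exactly when $a \le c < b$ or $c \le a < d$. The function name is hypothetical.

```python
def span_closure(spans: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """Transitive closure S* of spans given as (first, last) node positions on
    the main path: repeatedly add the union of two overlapped spans."""
    closure = set(spans)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if a <= c < b or c <= a < d:  # overlapped
                    union = (min(a, c), max(b, d))
                    if union not in closure:
                        closure.add(union)
                        changed = True
    return closure


print(sorted(span_closure({(0, 2), (1, 3), (4, 6), (5, 7)})))  # spans of D_1
# [(0, 2), (0, 3), (1, 3), (4, 6), (4, 7), (5, 7)]
```

For $D_2$, the single spans $(2, 4)$, $(3, 5)$, $(4, 6)$ on $u_2, \dots, u_6$ gain $(2, 5)$, $(3, 6)$, and $(2, 6)$, which include the two spans discussed above. Spans that only share an end node, such as $(1, 3)$ and $(3, 5)$, are not merged.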

The purpose of the $p^*$-graph for a conjunction $D_i$ is to represent all the truth assignments under which $D_i$ evaluates to true. Specifically, in $P^*$ each path $p$ from $\#$ to $\$$ corresponds to a truth assignment in which each variable on $p$ is set to true while all other variables are set to false.

Concerning $p^*$-graphs, we have the following lemma.

Lemma 1.

Let $P^*$ be a $p^*$-graph for a conjunction $D_i$ (represented as a variable sequence) in $D$. Then, any path from $\#$ to $\$$ in $P^*$ represents a truth assignment under which $D_i$ evaluates to true.

Proof.

(1) For any truth assignment $\sigma$ under which $D_i$ evaluates to true, there is a path from $\#$ to $\$$ in $P^*$. First, note that under such a truth assignment each variable in a positive literal must be set to 1, while each ‘*’ may be set to 1 or 0. In particular, we may have several consecutive ‘*’s that are set to 0; these are represented by a span that is the union of the corresponding overlapped spans. Therefore, $\sigma$ is represented by some path.

(2) Each path from $\#$ to $\$$ represents a truth assignment under which $D_i$ evaluates to true. To see this, observe that each path consists of several edges on the main path and several spans. In particular, any such path must go through every variable in a positive literal, since no span covers such a variable, while each span stands for a single ‘*’ or for several successive ‘*’s. Variables in negative literals never appear in the graph and are therefore set to false, which makes those literals true. ∎

For example, in Fig. 1(b), the path $v_0 \to v_3 \to v_4 \to v_5 \to v_7$ represents the truth assignment $\{c_1 = 1, c_2 = 0, c_3 = 0, c_4 = 1, c_5 = 1, c_6 = 0\}$, under which $D_1$ evaluates to true. In Fig. 1(d), the path $u_0 \to u_1 \to u_2 \to u_6$ represents another truth assignment, $\{c_1 = 0, c_2 = 1, c_3 = 1, c_4 = 0, c_5 = 0, c_6 = 0\}$, under which $D_2$ evaluates to true. Examining all the paths in these two graphs, we find that Lemma 1 holds for each of them.
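Lemma 1 can be checked mechanically on $D_1$: enumerate every path from $\#$ to $\$$ in its $p^*$-graph, turn each path into an assignment (variables on the path set to 1, all others to 0), and evaluate $D_1 = c_1 \land c_4$. In this Python sketch, the edge set of Fig. 1(b) is written out by hand as the main path plus the closure of the spans of $D_1$ (our reconstruction from Definitions 1 and 2), and the function names are hypothetical.

```python
def all_paths(num_nodes: int, edges: set[tuple[int, int]]) -> list[list[int]]:
    """All paths from node 0 (#) to node num_nodes - 1 ($); edges point forward."""
    succ: dict[int, list[int]] = {}
    for a, b in sorted(edges):
        succ.setdefault(a, []).append(b)
    paths: list[list[int]] = []

    def dfs(node: int, path: list[int]) -> None:
        if node == num_nodes - 1:
            paths.append(path)
            return
        for nxt in succ.get(node, []):
            dfs(nxt, path + [nxt])

    dfs(0, [0])
    return paths


def assignment(path: list[int], labels: list, variables: list[int]) -> dict[int, int]:
    """Variables on the path are set to 1, all others to 0 (sentinels ignored)."""
    on_path = {labels[i] for i in path}
    return {v: int(v in on_path) for v in variables}


# p*-graph of D_1: node labels, main path, and the closure of its spans.
labels = ["#", 2, 3, 1, 4, 5, 6, "$"]
edges = {(j, j + 1) for j in range(7)} | {(0, 2), (1, 3), (0, 3), (4, 6), (5, 7), (4, 7)}
paths = all_paths(len(labels), edges)
print(len(paths))                                               # 16
print(assignment([0, 3, 4, 5, 7], labels, [1, 2, 3, 4, 5, 6]))  # {1: 1, 2: 0, 3: 0, 4: 1, 5: 1, 6: 0}
```

All 16 paths satisfy $D_1$, and since $D_1$ has four starred variables, they correspond to the $2^4$ distinct assignments with $c_1 = c_4 = 1$.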

III-B Algorithm

To find a truth assignment that maximizes the number of satisfied $D_j$'s in $D$, we first construct a trie-like structure $G$ over $D$ and then search $G$ bottom-up to find answers.

Let $P_1^*, P_2^*, \dots, P_n^*$ be the $p^*$-graphs constructed for all the $D_j$'s in $D$, respectively. Let $p_j$ and $S_j^*$ ($j = 1, \dots, n$) be the main path of $P_j^*$ and the transitive closure over its spans, respectively. We construct $G$ in two steps.

In the first step, we establish a trie [3, 15], denoted as $T = trie(R)$, over $R = \{p_1, \dots, p_n\}$ as follows.

If $|R| = 0$, $trie(R)$ is, of course, empty. If $|R| = 1$, $trie(R)$ is a single node. If $|R| > 1$, $R$ is split into $r$ (possibly empty) subsets $R_1, R_2, \dots, R_r$ so that each $R_i$ ($i = 1, \dots, r$) contains all the sequences with the same first variable name. The tries $trie(R_1), trie(R_2), \dots, trie(R_r)$ are constructed in the same way, except that at the $k$th step the splitting of sets is based on the $k$th variable name (along the global ordering of variables). They are then connected from their respective roots to a single node to create $trie(R)$.

In Fig. 2, we show the trie constructed for the variable sequences given in the third column of Table I. In such a trie, special attention should be paid to the leaf nodes, each labeled with $\$$ and representing a conjunction (or a subset of conjunctions) that can be satisfied under the truth assignment represented by the corresponding main path. For example, the subset $\{D_1, D_3, D_5\}$ associated with $v_7$ is satisfiable under the truth assignment represented by the path from $v_0$ to $v_7$. Such a path is also called a tree path.

The main advantage of tries is that they cluster common parts of variable sequences together to avoid repeated checking. If the variable sequences are sorted by appearance frequency, more variables will be clustered. More importantly, this idea can also be applied to variable subsequences (as will be seen later), over which dynamic tries can be recursively constructed, leading to a polynomial-time algorithm for solving the problem.

Each node $v$ in the trie stands for a variable $c$, referred to as the label of $v$ and denoted as $l(v) = c$; each edge $e$ is referred to as a tree edge and is labeled with the set $s(e)$ of integers identifying all the variable sequences going through $e$. For example, $s(v_0, v_1) = \{1, 2, 3, 4, 5, 6\}$, because all the variable sequences given in Table I pass through this edge to reach their respective leaf nodes. The labels of all the other tree edges can be checked in the same way.
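A trie over the main paths, with the edge labels $s(e)$ and the conjunction sets at the $\$$ leaves, can be sketched as below. Instead of the recursive splitting of $R$, this sketch inserts the sequences one at a time and keeps one node per variable name on every path (so $c$ and $(c, *)$ share a node); we assume this yields the same uncompressed trie that the tree path from $v_0$ to $v_7$ above suggests. `TrieNode`, `build_trie`, and `walk` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    label: str
    children: dict = field(default_factory=dict)  # child label -> TrieNode
    ids: set = field(default_factory=set)         # s(e) of the tree edge entering this node


def build_trie(sequences: dict[int, list[str]]) -> TrieNode:
    """Insert each sorted sequence (variable names only; (c, *) and c share the
    name c) followed by '$'. Sequences with equal prefixes share trie nodes."""
    root = TrieNode("#")
    for cid, names in sequences.items():
        node = root
        for name in [*names, "$"]:
            node = node.children.setdefault(name, TrieNode(name))
            node.ids.add(cid)
    return root


def walk(root: TrieNode, path: list[str]) -> TrieNode:
    """Follow a label path downward from the root."""
    node = root
    for name in path:
        node = node.children[name]
    return node


# Variable names of the sorted sequences in the third column of Table I.
SEQS = {
    1: ["c2", "c3", "c1", "c4", "c5", "c6"],
    2: ["c2", "c3", "c1", "c5", "c6"],
    3: ["c2", "c3", "c1", "c4", "c5", "c6"],
    4: ["c2", "c1", "c4", "c6"],
    5: ["c2", "c3", "c1", "c4", "c5", "c6"],
    6: ["c2", "c3", "c4", "c5"],
}
trie = build_trie(SEQS)
print(sorted(walk(trie, ["c2"]).ids))                                    # [1, 2, 3, 4, 5, 6]
print(sorted(walk(trie, ["c2", "c3", "c1", "c4", "c5", "c6", "$"]).ids))  # [1, 3, 5]
```

Here the edge into the root's only child carries $\{1, \dots, 6\}$, matching $s(v_0, v_1)$, and the leaf reached along $c_2, c_3, c_1, c_4, c_5, c_6$ carries $\{1, 3, 5\}$, matching the subset $\{D_1, D_3, D_5\}$ associated with $v_7$.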

In regard to the tree paths, we have the following lemma.

Lemma 2.

Let $T$ be a trie created over all the variable sequences in $D$. Let $p = v_0 \xrightarrow{s_1} v_1 \dots \xrightarrow{s_k} v_k$ be a root-to-leaf path in $T$. Let $D'$ be the subset of conjunctions associated with $v_k$. Then, $R = s_1 \cap \dots \cap s_k \cap D'$ is satisfiable by the truth assignment represented by $p$.

Finally, we associate each node $v$ in the trie $T$ with a pair of numbers (pre, post) to speed up recognizing ancestor/descendant relationships of nodes in $T$, where pre is the order number of $v$ when $T$ is traversed in preorder and post is the order number of $v$ when $T$ is traversed in postorder.

Figure 2: A trie and tree encoding.

These two numbers can be used to characterize the ancestor/descendant relationships in TT as follows.

  • Let $v$ and $v'$ be two nodes in $T$. Then, $v'$ is a descendant of $v$ iff pre($v'$) > pre($v$) and post($v'$) < post($v$).

For a proof of this property, which holds for any tree, see Exercise 2.3.2-20 in [14].

For instance, by checking the label associated with $v_2$ against the label of $v_9$ in Fig. 2, we see from this property that $v_2$ is an ancestor of $v_9$. Specifically, $v_2$'s label is (3, 12) and $v_9$'s label is (10, 6), and we have 3 < 10 and 12 > 6. We also see that, since the pairs associated with $v_{14}$ and $v_6$ do not satisfy the property, $v_{14}$ is not an ancestor of $v_6$, and vice versa.
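The (pre, post) numbering and the ancestor test can be sketched as follows. The sketch rebuilds the uncompressed trie for Table I as nested dictionaries (children kept in insertion order), identifies each node by its label path from $\#$, and numbers nodes from 1. Identifying $v_2$, $v_6$, $v_9$, and $v_{14}$ of Fig. 2 with the label paths below is our assumption, and the function names are hypothetical.

```python
def build_tree(sequences: list[list[str]]) -> dict:
    """Uncompressed trie as nested dicts: label -> subtree (insertion order kept)."""
    root: dict = {}
    for names in sequences:
        node = root
        for name in [*names, "$"]:
            node = node.setdefault(name, {})
    return root


def number_nodes(root: dict) -> dict[tuple, tuple[int, int]]:
    """Map each node (identified by its label path from '#') to (pre, post),
    both counted from 1 in a single depth-first traversal."""
    numbers: dict[tuple, list[int]] = {}
    pre = post = 0

    def visit(path: tuple, children: dict) -> None:
        nonlocal pre, post
        pre += 1
        numbers[path] = [pre, 0]
        for label, sub in children.items():
            visit(path + (label,), sub)
        post += 1
        numbers[path][1] = post

    visit(("#",), root)
    return {p: (a, b) for p, (a, b) in numbers.items()}


def is_descendant(u: tuple, v: tuple, nums) -> bool:
    """u is a proper descendant of v iff pre(u) > pre(v) and post(u) < post(v)."""
    return nums[u][0] > nums[v][0] and nums[u][1] < nums[v][1]


SEQS = [
    ["c2", "c3", "c1", "c4", "c5", "c6"], ["c2", "c3", "c1", "c5", "c6"],
    ["c2", "c3", "c1", "c4", "c5", "c6"], ["c2", "c1", "c4", "c6"],
    ["c2", "c3", "c1", "c4", "c5", "c6"], ["c2", "c3", "c4", "c5"],
]
nums = number_nodes(build_tree(SEQS))
v2 = ("#", "c2", "c3")
v9 = ("#", "c2", "c3", "c1", "c5", "c6")
print(nums[v2], nums[v9], is_descendant(v9, v2, nums))  # (3, 12) (10, 6) True
```

Under this identification, the computed pairs agree with the values quoted above: (3, 12) for $v_2$ and (10, 6) for $v_9$.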

In the second step, we add all $S_i^*$ ($i = 1, \dots, n$) to the trie $T$ to construct a trie-like graph $G$, as illustrated in Fig. 3. This trie-like graph is constructed for all the variable sequences given in Table I; each span is associated with a set of numbers indicating which variable sequences the span belongs to. For example, the span $\langle v_0, v_1, v_2\rangle$ (in Fig. 3) is associated with three numbers, 1, 5, and 6, indicating that the span belongs to three conjunctions: $D_1$, $D_5$, and $D_6$. In Fig. 3, however, the labels of tree edges are not shown, for clarity of illustration.
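The second step, attaching every $S_j^*$ to the trie, can be sketched as below. Because the trie merges sequences by variable name, the node at position $i$ of sequence $j$ is identified here by its label path from $\#$, so each span $(a, b)$ of $S_j^*$ maps to a pair of trie nodes, and the conjunction number $j$ is added to that span's label set. This representation and the names `span_closure` and `overlay_spans` are our assumptions.

```python
from collections import defaultdict

# Third column of Table I: (variable name, starred) pairs, sentinels omitted.
TABLE_I_SORTED = {
    1: [("c2", True), ("c3", True), ("c1", False), ("c4", False), ("c5", True), ("c6", True)],
    2: [("c2", False), ("c3", False), ("c1", True), ("c5", True), ("c6", True)],
    3: [("c2", False), ("c3", True), ("c1", True), ("c4", True), ("c5", False), ("c6", True)],
    4: [("c2", True), ("c1", True), ("c4", True), ("c6", True)],
    5: [("c2", True), ("c3", False), ("c1", True), ("c4", True), ("c5", True), ("c6", False)],
    6: [("c2", True), ("c3", True), ("c4", True), ("c5", True)],
}


def span_closure(spans):
    """Fixpoint computation of S*: repeatedly add the union of two overlapped spans."""
    closure, changed = set(spans), True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if (a <= c < b or c <= a < d) and (min(a, c), max(b, d)) not in closure:
                    closure.add((min(a, c), max(b, d)))
                    changed = True
    return closure


def overlay_spans(sequences):
    """Map every span of every S_j* onto trie nodes: (from, to) -> set of j.
    Position i of sequence j lands on the trie node ('#', name_1, ..., name_i);
    the '$' leaf follows the last variable."""
    g = defaultdict(set)
    for cid, seq in sequences.items():
        names = [name for name, _ in seq]
        nodes = [("#", *names[:i]) for i in range(len(names) + 1)]
        nodes.append((*nodes[-1], "$"))
        single = {(i - 1, i + 1) for i, (_, starred) in enumerate(seq, start=1) if starred}
        for a, b in span_closure(single):
            g[(nodes[a], nodes[b])].add(cid)
    return g


g_spans = overlay_spans(TABLE_I_SORTED)
print(sorted(g_spans[(("#",), ("#", "c2", "c3"))]))  # [1, 5, 6]
```

With the data of Table I, the span from the root to the $c_3$ node under $c_2$ (from $v_0$ to $v_2$) receives the label set $\{1, 5, 6\}$, as stated above.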

In addition, each $p^*$-graph is itself considered a simple trie-like graph.

Concerning the paths in a trie-like graph, we have a lemma similar to Lemma 2.

Lemma 3.

Let $G$ be a trie-like graph created over all the variable sequences in $D$. Let $p = v_0 \xrightarrow{s_1} v_1 \dots \xrightarrow{s_k} v_k$ be a root-to-leaf path in $G$, where some edges may be spans. Let $D'$ be the subset of conjunctions associated with $v_k$. Then, $R = s_1 \cap \dots \cap s_k \cap D'$ is satisfiable by the truth assignment represented by $p$.

From Fig. 3, we can see that although the number of truth assignments for $D$ is exponential, they can be represented by a graph with a polynomial number of nodes and edges. In fact, in a single $p^*$-graph, the number of edges is bounded by $O(m^2)$, since the main path has at most $m + 2$ nodes and every span is determined by its two end nodes on that path. Thus, a trie-like graph over $n$ $p^*$-graphs has at most $O(nm^2)$ edges.

Figure 3: A trie-like graph GG.

Next, we search $G$ bottom-up, level by level, to seek all the largest subsets of conjunctions that can be satisfied by some truth assignment.

First of all, we call each node in $T$ with more than one child a branching node. For instance, node $v_3$ in $G$ (Fig. 3), with two children $v_4$ and $v_8$, is a branching node. For the same reason, $v_2$ and $v_1$ are two other branching nodes. Note that $v_0$ is not a branching node, since it has only one child in $T$ (although it has more than one child in $G$).

Around branching nodes, we define two very important concepts below.

Definition 3. (Reachable subsets through spans) Let $v$ be a branching node, and let $u$ be a node on the tree path (in $T$) from the root to $v$, excluding $v$ itself. A reachable subset of $u$ through spans consists of all the nodes with the same label $c$ that lie in different subgraphs of $G[v]$ (the subgraph of $G$ rooted at $v$) and are reachable from $u$ through a span. It is denoted $\textit{RS}^{v,u}_s[c]$, where $s$ is the set of all the labels associated with the corresponding spans.

For $\textit{RS}^{v,u}_s[c]$, node $u$ is called its anchor node, while any node in $\textit{RS}^{v,u}_s[c]$ is called a reachable node of $u$.

For instance, for node $v_2$ in Fig. 3, which is on the tree path from the root to the branching node $v_3$, we have two RSs with respect to $v_3$:

  • $\textit{RS}^{v_3,v_2}_{\{2,5\}}[c_5] = \{v_5, v_8\}$,

  • $\textit{RS}^{v_3,v_2}_{\{2,5\}}[c_6] = \{v_6, v_9\}$.

We have $\textit{RS}^{v_3,v_2}_{\{2,5\}}[c_5]$ because of the two spans $v_2 \xrightarrow{5} v_5$ and $v_2 \xrightarrow{2} v_8$ going out of $v_2$, which reach $v_5$ and $v_8$, respectively, on two different $p^*$-graphs in $G[v_3]$, with $l(v_5) = l(v_8) = c_5$. We have $\textit{RS}^{v_3,v_2}_{\{2,5\}}[c_6]$ because of two other spans going out of $v_2$, namely $v_2 \xrightarrow{5} v_6$ and $v_2 \xrightarrow{2} v_9$, with $l(v_6) = l(v_9) = c_6$.

Hence, $v_2$ is the anchor node not only of $\{v_5, v_8\}$ but also of $\{v_6, v_9\}$.

In general, we are interested only in RSs with $|\textit{RS}| \geq 2$: an RS with $|\textit{RS}| = 1$ only leads to a leaf node in $T$, so going through the corresponding span yields no new answers and no larger subset of conjunctions. Hence, in the subsequent discussion, an RS always means an RS with $|\textit{RS}| \geq 2$.

For a branching node $v$ itself, the definition differs slightly from that for the other nodes on the tree path from the root to $v$: each of its RSs is a subset of nodes reachable through a span or through a tree edge. So, for $v_3$ we have:

  • $\textit{RS}^{v_3,v_3}_{\{2,5\}}[c_5] = \{v_5, v_8\}$,

  • $\textit{RS}^{v_3,v_3}_{\{2,5\}}[c_6] = \{v_6, v_9\}$,

respectively due to the span $v_3 \xrightarrow{5} v_5$ and the tree edge $v_3 \rightarrow v_8$ going out of $v_3$, with $l(v_5) = l(v_8) = c_5$; and the two spans $v_3 \xrightarrow{5} v_6$ and $v_3 \xrightarrow{2} v_9$ going out of $v_3$, with $l(v_6) = l(v_9) = c_6$. Note that the label of the tree edge $v_3 \rightarrow v_8$ is 2, since this tree edge belongs to $D_2$ (see Fig. 2).
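The grouping behind an RS can be sketched as follows. The sketch assumes, hypothetically, that the outgoing spans (and, for a branching node, tree edges) of an anchor node $u$ into $G[v]$ have already been collected as `(target, label_set)` pairs, and that `label_of` maps each node to its label $l(\cdot)$; it does not check that the targets lie in different subgraphs of $G[v]$. Following the examples, the subscript $s$ is taken to be the union of the labels of the edges that produce the RS.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# One outgoing edge of an anchor node u: (target node, label set of the span or tree edge).
Edge = Tuple[str, FrozenSet[int]]


def reachable_subsets(
    out_edges: Iterable[Edge], label_of: Dict[str, str]
) -> Dict[str, Tuple[FrozenSet[int], Set[str]]]:
    """Group the targets of u's outgoing edges by node label c.

    Returns c -> (s, RS[c]), where s is the union of the labels on the edges
    reaching RS[c]; groups with |RS[c]| < 2 are dropped, as in the text.
    """
    groups: Dict[str, Tuple[Set[int], Set[str]]] = {}
    for target, labels in out_edges:
        s, nodes = groups.setdefault(label_of[target], (set(), set()))
        s.update(labels)        # accumulate the edge labels into s
        nodes.add(target)       # collect the reachable node
    return {c: (frozenset(s), nodes) for c, (s, nodes) in groups.items() if len(nodes) >= 2}
```

With the nine spans out of $v_2$ listed in Example 2, this reproduces the four RSs given there for $v_2$ itself, e.g. label $c_5$ maps to $(\{2, 5, 6\}, \{v_5, v_8, v_{12}\})$.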

Concerning RSs, we have the following lemma, which is important for the construction of trie-like subgraphs.

Lemma 4.

Let $v$ be a branching node in $G$, and let $u$ be an ancestor of $u'$ on the tree path from the root to $v$. If both $\textit{RS}^{v,u}_s[c]$ and $\textit{RS}^{v,u'}_s[c]$ exist for a certain label $c$, then $\textit{RS}^{v,u}_s[c] \subseteq \textit{RS}^{v,u'}_s[c]$.

Proof.

Let $P^* = p \cup S^*$ be a $p^*$-graph merged into $G$. Assume that $P^*$ contains a span from a node $u$ to some other node $w$. Then, for any descendant $u'$ of $u$ on the subpath from the child of $u$ to the grandparent of $w$, there must be a span from $u'$ to $w$, due to the transitivity of spans. Assume that $l(w) = c$. It follows immediately that $\textit{RS}^{v,u}_s[c] \subseteq \textit{RS}^{v,u'}_s[c]$. ∎

If $\textit{RS}^{v,u}_s[c] \subset \textit{RS}^{v,u'}_s[c]$, we say that $\textit{RS}^{v,u'}_s[c]$ is larger than $\textit{RS}^{v,u}_s[c]$.

Based on the concept of reachable subsets through spans, we can define another, more important concept: upper boundaries.

Definition 4. (Upper boundaries) Let $v$ be a branching node, and let $v_1, v_2, \ldots, v_k$ be all the nodes on the path from the root to $v$. An upper boundary (denoted upBound) with respect to $v$ is a largest subset of nodes $\{u_1, u_2, \ldots, u_f\}$ ($f > 1$) satisfying the following properties:

  1. Each $u_g$ ($1 \leq g \leq f$) appears in some $\textit{RS}^{v,v_i}_s[c]$ ($1 \leq i \leq k$), where $c$ is a label and $|\textit{RS}^{v,v_i}_s[c]| > 1$.

  2. No two nodes $u_g$, $u_{g'}$ ($g \neq g'$) are related by the ancestor/descendant relationship.

Fig.  4 gives an intuitive illustration of this concept.

Figure 4: Illustration for upBounds.

As a concrete example, consider $v_5$ and $v_8$ in Fig. 3. They form an upBound with respect to the branching node $v_3$, based on which we construct a trie-like graph over the two subgraphs rooted at $v_5$ and $v_8$, respectively. This can be done in the same way as the construction of $G$ over all the initial $p^*$-graphs (which suggests a recursive process). Note that $v_4$ is not included since it is not involved in any RS with respect to $v_3$ with $|\textit{RS}| \geq 2$. In fact, the truth assignment with $v_4$ set to true satisfies only the conjunctions associated with the leaf node $v_{10}$, and this was already determined when the initial trie was built in the first step.
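Selecting an upBound from the RSs can be sketched as below. Definition 4 only requires the chosen nodes to be pairwise unrelated by ancestry; this sketch adds the assumption, consistent with the examples for $v_3$ and $v_2$, that the topmost members of all RSs (those with no ancestor among the RS members) are chosen. `parent` is a hypothetical map from each node to its tree parent in $T$.

```python
from typing import Dict, Iterable, Optional, Set


def upper_boundary(rs_list: Iterable[Set[str]], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Topmost members of all RSs with |RS| >= 2 (pairwise not ancestor/descendant)."""
    members: Set[str] = set()
    for rs in rs_list:
        if len(rs) >= 2:          # only RSs of size >= 2 are considered
            members |= rs

    def has_member_ancestor(node: str) -> bool:
        p = parent.get(node)
        while p is not None:      # walk up the tree path towards the root
            if p in members:
                return True
            p = parent.get(p)
        return False

    return {u for u in members if not has_member_ancestor(u)}
```

With a parent map consistent with Examples 1 and 2 ($v_4$ and $v_8$ children of $v_3$; $v_3$ and $v_{11}$ children of $v_2$; $v_5 \rightarrow v_6 \rightarrow v_7$, $v_8 \rightarrow v_9 \rightarrow v_{10}$, and $v_{11} \rightarrow v_{12} \rightarrow v_{13}$ as chains), this returns $\{v_5, v_8\}$ for $v_3$ and $\{v_4, v_8, v_{11}\}$ for $v_2$.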

When a branching node $v$ is encountered, the following operations are carried out:

  • Calculate all RSs with respect to $v$.

  • Calculate the upBound in terms of the RSs.

  • Make a recursive call of the algorithm on a subgraph constructed over all the $p^*$-subgraphs, each rooted at a node on the corresponding upBound.

See the following example for illustration.

Example 1.

When checking the branching node $v_3$ in the bottom-up search, we calculate all the reachable subsets through spans with respect to $v_3$ as described above: $\textit{RS}^{v_3,v_2}_{\{2,5\}}[c_5] = \{v_5, v_8\}$, $\textit{RS}^{v_3,v_2}_{\{2,5\}}[c_6] = \{v_6, v_9\}$, $\textit{RS}^{v_3,v_3}_{\{2,5\}}[c_5] = \{v_5, v_8\}$, and $\textit{RS}^{v_3,v_3}_{\{2,5\}}[c_6] = \{v_6, v_9\}$. In terms of these reachable subsets, we obtain the upBound $\{v_5, v_8\}$. Node $v_4$ (above the upBound) is not involved in the recursive execution of the algorithm.

Concretely, when we make a recursive call of the algorithm on two subgraphs, $G_1$ rooted at $v_5$ and $G_2$ rooted at $v_8$ (see Fig. 5(a)), we first construct the trie-like graph shown in Fig. 5(b). It is in fact a single path, where $v_{5-8}$ stands for the merging of $v_5$ and $v_8$, $v_{6-9}$ for the merging of $v_6$ and $v_9$, and $v_{7-10}$ for the merging of $v_7$ and $v_{10}$.

Figure 5: Illustration for construction of trie-like subgraphs.

In addition, for technical convenience, we add the corresponding branching node ($v_3$) to the trie as a virtual root, together with a new edge $v_3 \xrightarrow{2,5} v_{5-8}$ as a virtual edge (see Fig. 5(c)). The virtual root and the virtual edge keep the connection between the trie-like subgraph and the tree path from the root to this branching node in $T$, which greatly facilitates tracing the truth assignments for the corresponding satisfied conjunctions. Specifically, the label of a virtual edge $v \rightarrow u$ is set to the label of the largest $\textit{RS}^{v,w}_s$, where $w$ is an anchor node of $u$. If there is more than one largest RS, any one of them is chosen. For example, the label of the virtual edge in Fig. 5(c) is set to $\{2, 5\}$, the label of $\textit{RS}^{v_3,v_2}_{\{2,5\}}[c_5]$, which is one of the two relevant RSs, $\textit{RS}^{v_3,v_2}_{\{2,5\}}[c_5]$ and $\textit{RS}^{v_3,v_3}_{\{2,5\}}[c_5]$; both are of the same size. In this way, the truth assignment for a subset of satisfied conjunctions can easily be traced.

Now, by searching the path from $v_{7-10}$ to $v_{5-8}$ in Fig. 5(c) bottom-up, going through the virtual node $v_3$ to find the corresponding anchor node $v_2$, and then searching the path from $v_2$ to $v_0$ in $T$ (see Fig. 3), we obtain the path

$v_0 \rightarrow v_1 \rightarrow v_2 \xrightarrow{2,5} v_{5-8} \rightarrow v_{6-9} \rightarrow v_{7-10}$,

representing the truth assignment $\{c_1 = 0, c_2 = 1, c_3 = 1, c_4 = 0, c_5 = 1, c_6 = 1\}$, which satisfies $\{D_2, D_5\}$. Note that the subset associated with the unique leaf node of the path is $\{D_2, D_5\}$ rather than $\{D_1, D_2, D_3, D_5\}$. This is because the label of the virtual edge $v_2 \rightarrow v_{5-8}$ is $\{2, 5\}$ (representing the two spans $v_2 \xrightarrow{5} v_5$ and $v_2 \xrightarrow{2} v_8$, which cover the branching node $v_3$), by which $D_1$ and $D_3$ are filtered out of $\{D_1, D_2, D_3, D_5\}$.

Recall that when generating the trie $T$ over the main paths of the $p^*$-graphs created for the variable sequences in Table I, we already found a (largest) subset of conjunctions $\{D_1, D_3, D_5\}$ that can be satisfied by the truth assignment represented by the corresponding main path. This subset is larger than $\{D_2, D_5\}$, so $\{D_2, D_5\}$ need not be kept, and this part of the computation is in fact useless. To avoid such futile work, we can perform a simple pre-check: if the number of $p^*$-subgraphs over which the recursive call would be invoked is smaller than the size of a satisfiable subset of conjunctions already obtained, the recursive call is not conducted.

Based on the above discussion, we obtain the recursive algorithm shown below, in which a data structure $R$ accommodates the result, represented as a set of triplets of the form

$\langle \alpha, \beta, \gamma \rangle$,

where $\alpha$ stands for a subset of conjunctions, $\beta$ for a truth assignment satisfying the conjunctions in $\alpha$, and $\gamma$ for the size of $\alpha$. Initially, $R = \emptyset$.

Input: a logic formula $C$ in CNF, with each clause in $C$ containing at most two literals.
Output: a largest subset of clauses satisfied by a certain truth assignment.
1 transform $C$ into another formula $D$ in DNF;
2 let $D = D_1 \lor \cdots \lor D_n$;
3 for $i = 1$ to $n$ do
4      construct a $p^*$-graph $P_i^*$ for $D_i$;
5 construct a trie-like graph $G$ over $P_1^*, \ldots, P_n^*$;
6 $R$ := SEARCH($G$);
7 return the result calculated in terms of $R$;
Algorithm 1 2-MAXSAT($C$)

The input of 2-MAXSAT( ) is a formula $C$ in CNF. First, we transform it into another formula $D$ in DNF (see line 1). Then, for each $D_i$ in $D$, we create its $p^*$-graph $P_i^*$ (see line 4). Next, we construct a trie-like graph $G$ over all the $P_i^*$'s (see line 5). In the last step, we call SEARCH($G$) to produce the result (see line 6).

Input: a trie-like subgraph $G$.
Output: a largest subset of conjunctions satisfied by a certain truth assignment.
1 if $G$ is a single $p^*$-graph then
2      $R'$ := subset associated with the leaf node;
3      $R$ := merge($R$, $R'$);
4      return $R$;
5 for each leaf node $v$ in $G$ do
6      let $R'$ be the subset associated with $v$;
7      $R$ := merge($R$, $R'$);
8 let $v_1, v_2, \ldots, v_k$ be all the branching nodes in postorder;
9 for $i = 1$ to $k$ do
10      let $P$ be the tree path from the root to $v_i$;
11      for each $u$ on $P$ do
12            calculate the RSs of $u$ with respect to $v_i$;
13      create the corresponding upBound $L$;
14      construct a trie-like graph $D$ over the subgraphs each rooted at a node on $L$;
15      $D' := \{v_i\} \cup D$;
16      $R'$ := SEARCH($D'$);
17      $R$ := merge($R$, $R'$);
18 return $R$;
Algorithm 2 SEARCH($G$)

The input of SEARCH( ) is a trie-like subgraph $G$. First, we check whether $G$ is a single $p^*$-graph. If so, the subset of conjunctions associated with its leaf node is a largest subset satisfiable by a certain truth assignment (see lines 1-4).

Otherwise, we search $G$ bottom-up to find all the branching nodes in $G$. Before that, however, each subset of conjunctions associated with a leaf node is merged into $R$ (see lines 5-7).

For each branching node $v$ encountered, we check all the nodes $u$ on the tree path from the root to $v$ and compute their RSs (see lines 8-12), based on which we compute the corresponding upBound with respect to $v$ (see line 13). According to the upBound $L$, a trie-like graph $D$ is created over a set of subgraphs, each rooted at a node on $L$ (see line 14). Then, $v$ is added to $D$ as its root (see line 15). Note that $D' = \{v\} \cup D$ is a simplified notation for an operation that adds not only $v$ but also the corresponding virtual edges to $D$. Next, a recursive call of the algorithm is made over $D'$ (see line 16). Finally, the result of the recursive call is merged into the global answer (see line 17).

The merge operation used in lines 3, 7, and 17 is defined as follows.

Let $R = \{r_1, \ldots, r_t\}$ for some $t \geq 0$, with each $r_i = \langle \alpha_i, \beta_i, \gamma_i \rangle$ and $\gamma_1 = \gamma_2 = \cdots = \gamma_t$. Let $R' = \{r'_1, \ldots, r'_s\}$ for some $s \geq 0$, with each $r'_i = \langle \alpha'_i, \beta'_i, \gamma'_i \rangle$ and $\gamma'_1 = \gamma'_2 = \cdots = \gamma'_s$. merge($R$, $R'$) performs the following checks:

  • If $\gamma_1 < \gamma'_1$, $R := R'$.

  • If $\gamma_1 > \gamma'_1$, $R$ remains unchanged.

  • If $\gamma_1 = \gamma'_1$, $R := R \cup R'$.

For simplicity, the pre-checking heuristic discussed above is not incorporated into the algorithm, but the algorithm can easily be extended to include it.
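The triplets in $R$, the merge operation, and the pre-check can be sketched in Python as follows. This is a minimal sketch: conjunctions are represented by their indices, a truth assignment by a dictionary from variable names to 0/1, and `worth_recursing` is a hypothetical name for the pre-check described before Algorithm 1. An empty $R$ or $R'$ is handled explicitly, since $\gamma_1$ is undefined when $t = 0$ or $s = 0$.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Triplet(NamedTuple):
    alpha: FrozenSet[int]   # subset of conjunctions (indices i of D_i)
    beta: Dict[str, int]    # truth assignment satisfying alpha, e.g. {"c1": 0, ...}
    gamma: int              # |alpha|


def merge(R: List[Triplet], R2: List[Triplet]) -> List[Triplet]:
    """Keep only the largest satisfiable subsets seen so far (all triplets in R share one gamma)."""
    if not R2:
        return R
    if not R:
        return list(R2)
    if R[0].gamma < R2[0].gamma:      # R' is strictly better: replace R
        return list(R2)
    if R[0].gamma > R2[0].gamma:      # R is strictly better: keep it
        return R
    return R + [t for t in R2 if t not in R]   # equal sizes: R ∪ R'


def worth_recursing(num_subgraphs: int, R: List[Triplet]) -> bool:
    """Pre-check: skip a recursive call over fewer p*-subgraphs than the best gamma so far."""
    best = R[0].gamma if R else 0
    return num_subgraphs >= best
```

For instance, in Example 1 the best subset found so far has size 3, while the recursive call at $v_3$ would run over only two $p^*$-subgraphs, so `worth_recursing(2, R)` returns `False` and the call can be skipped.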

In addition, to find a truth assignment satisfying a subset of conjunctions, we need to trace a path that may contain several spans, each corresponding to a recursive call of SEARCH( ).

We represent a recursive call by a pair $\langle v, L\rangle$, where $v$ is a branching node in $G$ and $L$ is the upBound with respect to $v$, over which a recursive call of SEARCH( ) is invoked.

Then, a chain of recursive calls can be described as below:

  • $\langle v_1, L_1\rangle \rightarrow \langle v_2, L_2\rangle \rightarrow \cdots \rightarrow \langle v_k, L_k\rangle$,

where $v_1$ is a branching node in $G_0 = G$; $v_i$ ($i = 2, \ldots, k$) is a branching node in $G_{i-1}$, the trie-like subgraph created by executing $\langle v_{i-1}, L_{i-1}\rangle$; and $L_i$ is the upBound with respect to $v_i$ in $G_{i-1}$.

Denote by $w_k$ a leaf node in $G_k$, and assume that $D'$ is the subset of conjunctions associated with $w_k$. We trace a path consisting of the following subpaths and spans, which satisfies a largest subset of $D'$:

  • $p_i$: tree paths from a child $u_i$ of $v_i$ to $w_i$ in $G_i$ ($i = k, \ldots, 1$), where $w_i$ is the anchor node of $u_{i+1}$ for $i = k-1, \ldots, 0$;

  • $e_i$: spans connecting $w_{i-1}$ and $u_i$ ($i = k, \ldots, 1$);

  • $p_0$: a tree path from the root of $G$ to $w_0$.

See Fig. 6 for illustration.

Figure 6: Illustration for tracing truth assignments for satisfied conjunctions.

Fig. 6 shows a chain of three recursive calls:

  • $\langle v_1, L_1\rangle \rightarrow \langle v_2, L_2\rangle \rightarrow \langle v_3, L_3\rangle$.

Here, we assume that $v_1$ is a branching node in $G$. Executing $\langle v_1, L_1\rangle$ creates $G_1$. Further, assume that $v_2$ is a branching node in $G_1$; executing $\langle v_2, L_2\rangle$ generates $G_2$. Next, assume that $v_3$ is a branching node in $G_2$; executing $\langle v_3, L_3\rangle$ creates $G_3$. We also assume that $w_3$ is a leaf node in $G_3$, associated with a subset $D'$ of conjunctions.

Then, the path shown in Fig. 6 consists of three tree paths from $u_i$ to $w_i$ for $i = 1, 2, 3$, three spans from $w_i$ to $u_{i+1}$ for $i = 0, 1, 2$, and a tree path from the root of $G$ to $w_0$.

This path represents a truth assignment satisfying $s \cap D'$, where $s$ is the intersection of all the edge labels on the path. ($s$ can be replaced by the intersection of the labels of the virtual edges on the path, since the intersection of all the tree-edge labels is equal to or contains $D'$, as indicated by Lemma 3.)
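The filtering by edge labels can be illustrated with a small Python sketch: starting from the subset $D'$ at the leaf, it intersects the labels of the virtual edges on the traced path. Conjunctions are represented by their indices, and the function name is hypothetical.

```python
from typing import FrozenSet, Iterable, Set


def satisfied_subset(virtual_edge_labels: Iterable[FrozenSet[int]], leaf_subset: Set[int]) -> Set[int]:
    """Return s ∩ D', where s is the intersection of the virtual-edge labels on the path."""
    result = set(leaf_subset)        # D': conjunctions associated with the leaf w_k
    for labels in virtual_edge_labels:
        result &= labels             # drop conjunctions not carried by this virtual edge
    return result
```

With the labels of Example 1, `satisfied_subset([frozenset({2, 5})], {1, 2, 3, 5})` gives `{2, 5}`, i.e. $\{D_2, D_5\}$; a path without virtual edges leaves $D'$ unchanged.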

Example 2.

When applying SEARCH( ) to the $p^*$-graphs shown in Fig. 3, we encounter three branching nodes: $v_3$, $v_2$, and $v_1$.

  • Initially, when creating $T$, each subset of conjunctions associated with a leaf node $v$ is satisfiable by the truth assignment represented by the corresponding main path (from the root to $v$). In particular, $\{D_1, D_2, D_5\}$, associated with $v_{10}$ (see Fig. 2), is a largest subset of conjunctions, which can be satisfied by the truth assignment $c_1 = 1$, $c_2 = 1$, $c_3 = 1$, $c_4 = 1$, $c_5 = 1$, $c_6 = 1$.

  • Checking $v_3$. As shown in Example 1, this check finds the subset of conjunctions $\{D_2, D_5\}$, satisfied by the truth assignment $\{c_1 = 0, c_2 = 1, c_3 = 1, c_4 = 0, c_5 = 1, c_6 = 1\}$, which is smaller than $\{D_1, D_2, D_5\}$. Thus, this result is not kept.

  • Checking $v_2$. When we encounter this branching node, we make a second recursive call of SEARCH( ), applied to a trie-like subgraph constructed over two subgraphs in $G[v_2]$ (rooted at $v_3$ and $v_{11}$, respectively), as shown in Fig. 7.

    Figure 7: Two subgraphs in $G[v_2]$ and an upBound.

    First, with respect to $v_2$, we calculate all the relevant reachable subsets through spans for all the nodes on the tree path from the root to $v_2$ in $G$. Altogether there are five such subsets. Among them, associated with $v_1$ (on the tree path from the root to $v_2$ in Fig. 3), we have

    - $\textit{RS}^{v_2,v_1}_{\{3,6\}}[c_4] = \{v_4, v_{11}\}$,

    due to the following two spans (see Fig.  3):

    - $\{v_1 \xrightarrow{3} v_4,\ v_1 \xrightarrow{6} v_{11}\}$.

    Associated with $v_2$ (the branching node itself), we have the following four reachable subsets through spans:

    - $\textit{RS}^{v_2,v_2}_{\{3,5,6\}}[c_4] = \{v_4, v_{11}\}$,

    - $\textit{RS}^{v_2,v_2}_{\{2,5,6\}}[c_5] = \{v_5, v_8, v_{12}\}$,

    - $\textit{RS}^{v_2,v_2}_{\{2,5\}}[c_6] = \{v_6, v_9\}$,

    - $\textit{RS}^{v_2,v_2}_{\{2,6\}}[\$] = \{v_{10}, v_{13}\}$,

    respectively due to four groups of spans shown below (see Fig.  3):

    - $\{v_2 \xrightarrow{3,5} v_4,\ v_2 \xrightarrow{6} v_{11}\}$,
    - $\{v_2 \xrightarrow{5} v_5,\ v_2 \xrightarrow{2} v_8,\ v_2 \xrightarrow{6} v_{12}\}$,
    - $\{v_2 \xrightarrow{5} v_6,\ v_2 \xrightarrow{2} v_9\}$,
    - $\{v_2 \xrightarrow{2} v_{10},\ v_2 \xrightarrow{6} v_{13}\}$.

Then, in terms of these reachable subsets through spans, we identify the corresponding upper boundary $\{v_4, v_8, v_{11}\}$ (illustrated as a thick line in Fig. 7). Next, we determine the subgraphs over which a trie-like graph should be constructed and the algorithm executed recursively.

Fig. 8 shows the trie-like graph built over the three $p^*$-subgraphs (rooted at $v_4$, $v_8$, and $v_{11}$, respectively, on the upBound shown in Fig. 7), in which $v_{4-11}$ stands for the merging of $v_4$ and $v_{11}$, and $v_{5-12}$ for the merging of $v_5$ and $v_{12}$. Again, the branching node $v_2$ serves as the virtual root of this trie-like subgraph. The virtual edge $v_2 \xrightarrow{3,5,6} v_{4-11}$ is labeled with $\{3, 5, 6\}$ since it stands for a span from $v_2$ to $v_4$ labeled with $\{3, 5\}$ and a tree edge from $v_2$ to $v_{11}$ labeled with $\{6\}$ in Fig. 3. The virtual edge $v_2 \xrightarrow{2} v_8$ is labeled with $\{2\}$ since it represents a span from $v_2$ to $v_8$ labeled with $\{2\}$. In addition, all the spans going out of $v_2$ in the original graph are kept (see Fig. 3).

Figure 8: A trie-like graph.

By the corresponding recursive call of SEARCH( ), this graph is constructed and then searched bottom-up, during which we encounter the first branching node, $v_{5-12}$. Then, a further recursive call of the algorithm is conducted, generating the upBound $\{v_7, v_{13}\}$, as shown in Fig. 9(a). As above, we construct the corresponding trie-like subgraph, which is just a single merged node $v_{7-13}$, as shown in Fig. 9(b). Adding the corresponding virtual root $v_{5-12}$ and the virtual edge $v_{5-12} \xrightarrow{1,3,6} v_{7-13}$ (representing a span $v_{5-12} \xrightarrow{1,3} v_7$ and a tree edge $v_{5-12} \xrightarrow{6} v_{13}$), we obtain the path shown in Fig. 9(c), by which we find a largest subset of conjunctions $\{D_3, D_6\}$, satisfiable by the truth assignment $c_1 = 0$, $c_2 = 1$, $c_3 = 1$, $c_4 = 1$, $c_5 = 1$, $c_6 = 0$. This truth assignment can be determined by tracing the corresponding path:

$v_0 \rightarrow v_1 \rightarrow v_2 \xrightarrow{3,5,6} v_{4-11} \rightarrow v_{5-12} \xrightarrow{1,3,6} v_{7-13}$.

Special attention should be paid to the leaf node of the path shown in Fig. 9(c). It is associated with $\{D_3, D_6\}$ instead of $\{D_1, D_3, D_5, D_6\}$, because the intersection of all the labels associated with the virtual edges is $\{3, 5, 6\} \cap \{1, 3, 6\} = \{3, 6\}$, so $D_1$ and $D_5$ are removed.

Figure 9: Illustration for construction of a trie-like subgraph.

Continuing the search of the graph shown in Fig. 8, we encounter its second branching node $v_2$, for which another set of RSs is created:

- $\textit{RS}^{v_2,v_1}_{\{3,6\}} = \{v_{4-11}\}$

(due to the span $v_1 \xrightarrow{3,6} v_{4-11}$, which corresponds to two spans in Fig. 3: $v_1 \xrightarrow{3} v_4$ and $v_1 \xrightarrow{6} v_{11}$),

- $\textit{RS}^{v_2,v_2}_{\{2,5,6\}}[c_5] = \{v_{5-12}, v_8\}$

(due to the span $v_2 \xrightarrow{5,6} v_{5-12}$ and the tree edge $v_2 \xrightarrow{2} v_8$ in Fig. 8),

- $\textit{RS}^{v_2,v_2}_{\{2,5\}}[c_6] = \{v_6, v_9\}$

(due to the spans $v_2 \xrightarrow{5} v_6$ and $v_2 \xrightarrow{2} v_9$ in Fig. 8).

Since $|\textit{RS}^{v_2,v_1}_{\{3,6\}}| = 1$, it is not considered further in the subsequent computation.

However, in terms of $\textit{RS}^{v_2,v_2}_{\{2,5,6\}}[c_5]$ and $\textit{RS}^{v_2,v_2}_{\{2,5\}}[c_6]$, we construct the upBound $\{v_{5-12}, v_8\}$ (see Fig. 8) and create the trie-like graph shown in Fig. 10(a). We then add the virtual node and the virtual edge, as shown in Fig. 10(b), where the label of the virtual edge is set to that of $\textit{RS}^{v_2,v_2}_{\{2,5,6\}}[c_5]$. The only branching node in this graph is $v_{5-12-8}$. With respect to $v_{5-12-8}$, $v_2$ has two RSs, given by two spans to two nodes ($v_{6-9}$ and $v_{7-10}$) in this subgraph (see Fig. 10(c); see also Fig. 8 for how these two spans are created):

- $\textit{RS}^{v_{5-12-8},v_2}_{\{2,5\}}[c_6] = \{v_{6-9}\}$

(due to the span $v_2 \xrightarrow{2,5} v_{6-9}$ in Fig. 10(c)),

- $\textit{RS}^{v_{5-12-8},v_2}_{\{2\}}[\$] = \{v_{7-10}\}$

(due to the span $v_2 \xrightarrow{2} v_{7-10}$ in Fig. 10(c)).

Both of these RSs are of size 1. Therefore, they will simply be ignored.

For $v_{5-12-8}$ itself, we have the following RS:

- $\textit{RS}^{v_{5-12-8},v_{5-12-8}}_{\{1,2,3,6\}}[\$] = \{v_{7-10}, v_{13}\}$.

Figure 10: Illustration for recursive execution of algorithm.

According to this RS, we construct the corresponding trie-like graph, shown in Fig. 10(d), in which the virtual node is $v_{5-12-8}$ and the label of the virtual edge is $\{1, 2, 3, 6\}$. By tracing the corresponding path

$v_0 \rightarrow v_1 \rightarrow v_2 \xrightarrow{2,5,6} v_{5-12-8} \xrightarrow{1,2,3,6} v_{7-10-13}$,

we obtain the truth assignment $c_1 = 0$, $c_2 = 1$, $c_3 = 1$, $c_4 = 0$, $c_5 = 1$, $c_6 = 0$, which satisfies the subset $\{D_2, D_6\}$. This is because $\{2, 5, 6\} \cap \{1, 2, 3, 6\} = \{2, 6\}$, so $D_1$, $D_3$, and $D_5$ are filtered out of the subset associated with the leaf node in Fig. 10(d).

After returning along the chain of recursive calls described above, we continue exploring $G$ and encounter the last branching node $v_1$ in $G$ (see Fig. 3), which is handled in the same way as $v_3$ and $v_2$.

Concerning the correctness of Algorithm 2, we have the following proposition.

Proposition 2.

Let $G$ be a trie-like graph established over a logic formula in DNF. Applying SEARCH( ) to $G$, we obtain a maximum subset of conjunctions satisfied by a certain truth assignment.

Proof.

To prove the proposition, we first show that any subset of conjunctions found by the algorithm is satisfied by a single truth assignment. This follows from the definitions of RSs and the corresponding upBounds.

We then need to show that any subset of conjunctions satisfiable by a certain truth assignment can be found by the algorithm. For this purpose, consider a subset of conjunctions $D' = \{D_1, \ldots, D_r\}$ ($r > 1$) that can be satisfied by a truth assignment represented by a path $P$. We prove by induction on the number $n_s$ of spans on $P$ that our algorithm can find $P$.

Basis step. When $n_s = 0$, $P$ must be a tree path in $T$, and the claim holds. When $n_s = 1$, the unique span on $P$ must cover a branching node $w$ of Case 1 in $G$. Let $u \xrightarrow{s} v$ be such a span, and denote by $P'$ the tree path from the root to $u$ in $T$. Then, by a recursive call of SEARCH( ) over the trie-like subgraph constructed with respect to $w$, we can find a subpath $P''$, and $P$ must equal the concatenation of $P'$, the span $u \xrightarrow{s} v$, and $P''$.

Induction step. Assume that the algorithm can find $P$ when $n_s = k$.

Now assume that $P$ contains $k + 1$ spans $s_1, s_2, \ldots, s_k, s_{k+1}$. They must correspond to a chain of $k + 1$ nested recursive calls of SEARCH( ). Denote by $G_i$ the trie-like subgraph created by the $(i-1)$th recursive call, where $G_0 = G$. Let $u \xrightarrow{s} v$ be the first span on $P$. Denote by $P'$ the subpath from the root of $T$ to $u$, and by $P''$ the subpath of $P$ from $v$ to the last node of $P$. Denote by $D_j \backslash P'$ the conjunction obtained by removing the variables on $P'$ from $D_j$ ($j = 1, \ldots, r$), and let $D'' = \{D_1 \backslash P', \ldots, D_r \backslash P'\}$. Then, the truth assignment represented by $P''$ satisfies $D''$. By the induction hypothesis, $P''$ can be found by executing SEARCH( ). Therefore, $P$ can also be found by SEARCH( ). To see this, consider the first recursive call of SEARCH( ), made when we encounter the first branching node in $G'$, by which we find $P''$ satisfying $D''$. Then, the concatenation of $P'$ and $P''$ satisfies $D'$. This completes the proof. ∎

III-C Further improvement

The algorithm discussed in the previous subsection can be greatly improved in two ways. First, many useless recursive calls of SEARCH( ) can be removed by imposing some extra controls. Second, repeated recursive calls can be avoided by detecting trie-like subgraphs that are encountered more than once.

- Reducing recursive calls

Consider Fig. 11(a). In this figure, we assume that $w$ and $w'$ are two branching nodes in $G$. Then, with respect to $w$ and $w'$, their common ancestor $u$ has two identical RSs:

  • $\textit{RS}^{w,u}_s[C] = \textit{RS}^{w',u}_s[C] = \{v_1, v_2\}$.

Figure 11: Illustration for redundancy.

Thus, during the execution of SEARCH( ), the same trie-like subgraph is created twice, once for $\textit{RS}^{w,u}_s[C]$ and once for $\textit{RS}^{w',u}_s[C]$, producing the same result.

However, if we create RSs only for the nodes on part of a tree path, namely the segment between the current branching node and its lowest ancestor branching node in $T$, this kind of redundancy can be avoided, possibly at the cost of losing some answers. The correctness of the algorithm is not affected, since one of the maximum satisfiable subsets of conjunctions can always be found. See Fig. 11(b) for illustration. In this figure, the RS of $u$ with respect to $w$ differs from the RS with respect to $w'$. When checking $w$, $\textit{RS}^{w,u}_s[C]$ is not computed since $u$ lies beyond the segment between $w$ and $w'$, so the corresponding result is not generated. However, $\textit{RS}^{w',u}_s[C]$ must cover $\textit{RS}^{w,u}_s[C]$, implying a larger (or same-sized) subset of conjunctions that can be satisfied by a certain truth assignment.

- Avoiding repeated recursive calls

Now consider Fig. 11(b) once again. Denote by $G_1$ the trie-like graph built over the subtrees rooted at $v_1$ and $v_2$, and by $G_2$ the trie-like graph built over the subtrees rooted at $v_1$, $v_2$, and $v_3$. $G_1$ and $G_2$ may contain common branching nodes, so repeated recursive calls on the same trie-like subgraphs may be conducted. To avoid this redundancy, each recursive call checks whether its input subgraph has been examined before; if so, the call is simply suppressed. This does not affect the correctness of the algorithm, since a recursive call on the same subgraph can only find the same satisfiable subset of conjunctions (possibly with different variable assignments, since the trie-like subgraph may be reached through different spans). For this purpose, we maintain a hash array in which each entry stores the result obtained by a recursive call on a certain trie-like subgraph. Specifically, for each recursive call $\langle v, L\rangle$ (the notation introduced before Example 2 to describe chains of recursive calls), we store the result at address hash($L$). Thus, checking whether an input subgraph has been examined before takes only constant time.
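The hash-array check can be sketched with a Python dictionary keyed by the upBound $L$, stored as a `frozenset` so that the order of its nodes does not matter. `search` stands for a hypothetical function performing the recursive call $\langle v, L\rangle$. This sketch returns the stored result on a repeated call; skipping the call entirely, as the text suggests, is equivalent, because merging the same triplets into $R$ again leaves $R$ unchanged. The dictionary lookup is expected constant time, although building and hashing the key costs $O(|L|)$.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List


def memoize_on_upbound(
    search: Callable[[str, FrozenSet[str]], List[tuple]]
) -> Callable[[str, Iterable[str]], List[tuple]]:
    """Wrap a recursive-call function so that each upBound L is searched at most once."""
    cache: Dict[FrozenSet[str], List[tuple]] = {}   # plays the role of the hash array

    def call(v: str, L: Iterable[str]) -> List[tuple]:
        key = frozenset(L)                # order-independent key standing in for hash(L)
        if key not in cache:              # first time this trie-like subgraph is seen
            cache[key] = search(v, key)
        return cache[key]                 # repeated call: reuse the stored result

    return call
```

As in the text, the key depends only on $L$, so two calls $\langle w, L\rangle$ and $\langle w', L\rangle$ with different branching nodes but the same upBound share one entry.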

IV Time complexity analysis

The total running time of the algorithm consists of three parts.

The first part $\tau_1$ is the time for computing the frequencies of variable appearances in $D$. Since in this process each variable in a $D_i$ is accessed only once, $\tau_1 = O(nm)$.
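The single pass behind $\tau_1 = O(nm)$ can be sketched as follows, assuming each conjunction $D_i$ is given as a list of variable names (the names `variable_frequencies`, `c1`, `c2`, `c3` are hypothetical).

```python
from collections import Counter
from typing import Iterable


def variable_frequencies(D: Iterable[Iterable[str]]) -> Counter:
    """Count the appearances of each variable over all conjunctions D_i."""
    freq: Counter = Counter()
    for Di in D:
        freq.update(Di)  # touches each variable of D_i once: O(nm) in total
    return freq


D = [["c1", "c2", "c3"], ["c1", "c3"], ["c2", "c3"]]
freq = variable_frequencies(D)
print(freq["c3"], freq["c1"], freq["c2"])  # 3 2 2
```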

The second part $\tau_2$ is the time for constructing a trie-like graph $G$ for $D$. It can be further partitioned into three portions.

  • $\tau_{21}$: The time for sorting the variable sequences of the $D_i$'s. It is clearly bounded by $O(nm\log_2 m)$.

  • $\tau_{22}$: The time for constructing the $p^*$-graph of each $D_i$ ($i = 1, \ldots, n$). Since for each variable sequence a transitive closure over its spans must first be created, which takes $O(m^2)$ time, this cost is bounded by $O(nm^2)$.

  • $\tau_{23}$: The time for merging all $p^*$-graphs to form a trie-like graph $G$. This part is also bounded by $O(nm^2)$.
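The sorting step behind $\tau_{21}$ and the idea of merging shared prefixes can be sketched as below. Two assumptions should be stated plainly: the sort key (descending frequency, ties broken by name) is not given in this section, and the plain prefix trie is a simplified stand-in that omits spans, transitive closures, and the $p^*$-graph merge itself.

```python
from collections import Counter
from typing import Dict, List


def sort_sequences(D: List[List[str]], freq: Counter) -> List[List[str]]:
    # Assumed order: descending frequency, ties broken by variable name.
    # Sorting each D_i costs O(m log m), so O(nm log m) in total.
    return [sorted(Di, key=lambda x: (-freq[x], x)) for Di in D]


def build_prefix_trie(sequences: List[List[str]]) -> Dict[str, dict]:
    # Plain prefix trie: a simplified stand-in, NOT the p*-graph merge.
    root: Dict[str, dict] = {}
    for seq in sequences:
        node = root
        for x in seq:
            node = node.setdefault(x, {})  # reuse a shared prefix if present
    return root


D = [["c1", "c2", "c3"], ["c1", "c3"], ["c2", "c3"]]
freq = Counter(x for Di in D for x in Di)
trie = build_prefix_trie(sort_sequences(D, freq))
print(trie)  # {'c3': {'c1': {'c2': {}}, 'c2': {}}}
```

Placing frequent variables first lets more conjunctions share a prefix, which is the usual motivation for frequency-ordered tries.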

The third part $\tau_3$ is the time for searching $G$ to find a maximum subset of conjunctions satisfied by a certain truth assignment. It is a recursive procedure.

First, we notice that in all the generated trie-like subgraphs, the number of branching nodes is bounded by $O(nm)$. Each branching node may be involved in at most $O(m)$ recursive calls (see the analysis below), and each recursive call may require up to $O(nm^2)$ time to create the corresponding trie-like subgraph. Thus, the worst-case time complexity of the algorithm is bounded by $O(nm) \cdot O(m) \cdot O(nm^2) = O(n^2m^4)$.
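Adding the three parts confirms that $\tau_3$ dominates. The worked equation below only sums the estimates stated above, using $n, m \geq 1$ so that every term is absorbed by the last one:

```latex
\begin{aligned}
\tau &= \tau_1 + (\tau_{21} + \tau_{22} + \tau_{23}) + \tau_3 \\
     &= O(nm) + O(nm\log_2 m) + O(nm^2) + O(nm^2) + O(nm)\cdot O(m)\cdot O(nm^2) \\
     &= O(n^2 m^4).
\end{aligned}
```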

It remains to make clear that each branching node can be involved in at most $O(m)$ recursive calls. For this, we give the following analysis.

Consider the trie-like graph $G$ shown in Fig. 12(a), in which $w$ is a branching node. With respect to $w$, we have the following three RSs:

- $\textit{RS}^{w,u}_{s'}[C] = \{v_1, v_2\}$,

- $\textit{RS}^{w,u}_{s''}[D] = \{v_3, v_5, v_6\}$,

- $\textit{RS}^{w,u}_{s'''}[E] = \{v_4, v_7, v_8, v_9\}$,

where $s'$, $s''$, and $s'''$ are the label sets of the three RSs, respectively.
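The three RSs of the Fig. 12(a) example can be reproduced by grouping nodes by label. This is a sketch under assumptions: it supposes, as the bracketed [C], [D], [E] suggest, that each RS collects the nodes carrying one label, and it ignores the superscripts $w,u$ and the label sets $s'$, $s''$, $s'''$. The function `group_into_rs` is hypothetical; it drops singleton groups, following the rule that an RS with $|\textit{RS}| = 1$ is not used when constructing a trie-like subgraph.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence


def group_into_rs(nodes: Sequence[str], label: Mapping[str, str]) -> Dict[str, List[str]]:
    """Group nodes by their label; singleton groups are discarded."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for v in nodes:
        groups[label[v]].append(v)
    # An RS with |RS| = 1 is not used when building a trie-like subgraph.
    return {lab: vs for lab, vs in groups.items() if len(vs) >= 2}


label = {"v1": "C", "v2": "C", "v3": "D", "v4": "E", "v5": "D",
         "v6": "D", "v7": "E", "v8": "E", "v9": "E"}
print(group_into_rs([f"v{i}" for i in range(1, 10)], label))
# {'C': ['v1', 'v2'], 'D': ['v3', 'v5', 'v6'], 'E': ['v4', 'v7', 'v8', 'v9']}
```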

Figure 12: Illustration for recursive construction of trie-like subgraphs.

According to these RSs, we construct a trie-like subgraph $G'$ as shown in Fig. 12(b), and a recursive call of SEARCH( ) is carried out. This is the first recursive call in which $w$ is involved. During this recursive execution of SEARCH( ), $w$ will be involved in a second recursive call, but on a smaller trie-like subgraph $G''$, whose height is one level lower than that of $G'$ (see Fig. 12(c)). During the second recursive call, $w$ will be involved in a third recursive call, on a trie-like subgraph whose height is further reduced, as demonstrated in Fig. 12(d).

Together with the method discussed in the previous section to avoid repeated recursive calls on the same trie-like subgraph, the above analysis shows that any branching node can be involved in at most $m$ recursive calls of SEARCH( ). In general, we have the following proposition.

Proposition 3.

Let $G$ be a trie-like graph and $w$ be a branching node of Case 1 in the corresponding layered graph. Then $w$ can be involved in at most $m$ recursive calls of SEARCH( ) (Algorithm 3) in the whole working process.

Proof.

Let $\{v_1, v_2, \ldots, v_k\}$ ($k \geq 2$) be a largest group of nodes appearing on the upBound $L$ with respect to $w$ that satisfies the following three properties:

  • Each $v_i$ ($i = 1, \ldots, k$) has no ancestor appearing on $L$.

  • $l(v_1) = l(v_2) = \cdots = l(v_k)$.

  • No other node $u$ with $l(u) = l(v_1)$ is a descendant of any node on $L$.

Then, in the trie-like subgraph $G'$ constructed for $L$, all the nodes in this group are merged into a single node. The same claim applies to any other largest group of nodes on $L$ satisfying the above three properties. Thus, in the next recursive call of SEARCH( ) involving $w$, the trie-like subgraph $G''$ to be constructed must be at least one level lower than $G'$, since any RS with $|\textit{RS}| = 1$ is not considered when constructing a trie-like subgraph. Because the height of $G$ is bounded by $m$ and any trie-like subgraph is constructed only once (using the method discussed in the previous section to avoid multiple recursive calls on the same trie-like subgraph), the proposition holds. ∎

Proposition 4.

Let $G$ be a trie-like graph over a formula in DNF containing $n$ conjunctions with $m$ variables. The time complexity of Algorithm SEARCH($G$) is bounded by $O(n^2m^4)$.

Proof.

From Proposition 3, at most $O(nm) \times m$ trie-like subgraphs can be generated in the whole working process. Thus, at most $O(nm) \times m$ recursive calls can be carried out, since any repeated recursive call on the same trie-like subgraph is avoided. Therefore, the time complexity of SEARCH($G$) is bounded by $O(nm) \times m \times O(nm^2) = O(n^2m^4)$. ∎

V Conclusions

In this paper, we have presented a new method to solve the 2-MAXSAT problem. The worst-case time complexity of the algorithm is bounded by $O(n^2m^4)$, where $n$ and $m$ are, respectively, the numbers of clauses and variables of a logic formula $C$ (over a set $V$ of variables) in CNF with each clause containing at most 2 literals. The main idea is to construct from $C$ a different formula $D$ (over a set $U$ of variables) in DNF, with the property that for a given integer $n^* \leq n$, $C$ has at least $n^*$ clauses satisfied by a truth assignment for $V$ if and only if $D$ has at least $n^*$ conjunctions satisfied by a truth assignment for $U$. To find a truth assignment that maximizes the number of satisfied conjunctions in $D$, a graph structure, called a $p^*$-graph, is introduced to represent each conjunction in $D$. In this way, all the conjunctions in $D$ can be represented as a trie-like graph $G$. By searching $G$ bottom up in a recursive way, we can find the answer efficiently.

References

  • [1] J. Argelich, et al., MinSAT versus MaxSAT for Optimization Problems, International Conference on Principles and Practice of Constraint Programming, 2013, pp. 133-142.
  • [2] Y. Chen, The 2-MAXSAT Problem Can Be Solved in Polynomial Time, in Proc. CSCI2022, IEEE, Dec. 14-16, 2022, Las Vegas, USA, pp. 473-480.
  • [3] R.H. Connelly and F.L. Morris, A generalization of the trie data structure, Mathematical Structures in Computer Science, 5(3), pp. 381-418, doi:10.1017/S0960129500000803, (1993).
  • [4] S. A. Cook, The complexity of theorem-proving procedures, in: Proc. of the 3rd Annual ACM Symposium on the Theory of Computing, 1971, pp. 151-158.
  • [5] Y. Djenouri, Z. Habbas, D. Djenouri, Data Mining-Based Decomposition for Solving the MAXSAT Problem: Toward a New Approach, IEEE Intelligent Systems, Vol. No. 4, 2017, pp. 48-58.
  • [6] C. Dumitrescu, An algorithm for MAX2SAT, International Journal of Scientific and Research Publications, Volume 6, Issue 12, December 2016.
  • [7] S. Even, A. Itai, and A. Shamir, On the complexity of timetable and multicommodity flow problems, SIAM J. Comput., 5 (1976), pp. 691-703.
  • [8] M. R. Garey, D. S. Johnson, and L. Stockmeyer, Some simplified NP-complete graph problems, Theoret. Comput. Sci., (1976), pp. 237-267.
  • [9] R. Impagliazzo and R. Paturi, On the complexity of k-SAT, J. Comput. Syst. Sci., 62(2):367-375, 2001.
  • [10] D.S. Johnson, Approximation Algorithms for Combinatorial Problems, J. Computer System Sci., 9(1974), pp. 256-278.
  • [11] E. Kemppainen, Incomplete MaxSAT Solving by Linear Programming Relaxation and Rounding, Master thesis, University of Helsinki, 2020.
  • [12] M. Krentel, The Complexity of Optimization Problems, J. Computer and System Sci., 36(1988), pp. 490-509.
  • [13] R. Kohli, R. Krishnamurti, and P. Mirchandani, The Minimum Satisfiability Problem, SIAM J. Discrete Math., Vol. 7, No. 2, June 1994, pp. 275-283.
  • [14] D.E. Knuth, The Art of Computer Programming, Vol. 1, Addison-Wesley, Reading, 1969.
  • [15] D.E. Knuth, The Art of Computer Programming, Vol. 3, Addison-Wesley, Reading, 1975.
  • [16] A. Kügel, Natural Max-SAT Encoding of Min-SAT, in: Proc. of the Learning and Intelligence Optimization Conf., LION 6, Paris, France, 2012.
  • [17] C.M. Li, Z. Zhu, F. Manya and L. Simon, Exact MINSAT Solving, in: Proc. of 13th Intl. Conf. Theory and Application of Satisfiability Testing, Edinburgh, UK, 2010, pp. 363-368.
  • [18] C.M. Li, Z. Zhu, F. Manya and L. Simon, Optimizing with minimum satisfiability, Artificial Intelligence, 190 (2012), pp. 32-44.
  • [19] A. Richard, A graph-theoretic definition of a sociometric clique, J. Mathematical Sociology, 3(1), 1974, pp. 113-126.
  • [20] C. Papadimitriou, Computational Complexity, Addison-Wesley, 1994.
  • [21] Y. Shang, Resilient consensus in multi-agent systems with state constraints, Automatica, Vol. 122, Dec., 2001, 109288.
  • [22] V. Vazirani, Approximation Algorithms, Springer Verlag, 2001.
  • [23] M. Xiao, An Exact MaxSAT Algorithm: Further Observations and Further Improvements, Proc. of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22).
  • [24] H. Zhang, H. Shen, and F. Manyà, Exact Algorithms for MAX-SAT, Electronic Notes in Theoretical Computer Science 86(1):190-203, May 2003.