
Probabilistic Deduction: an Approach to Probabilistic Structured Argumentation

Xiuyi Fan Nanyang Technological University, Singapore
Abstract

This paper introduces Probabilistic Deduction (PD) as an approach to probabilistic structured argumentation. A PD framework is composed of probabilistic rules (p-rules). As with rules in classical structured argumentation frameworks, p-rules form deduction systems. In addition, p-rules also represent conditional probabilities that define joint probability distributions. With PD frameworks, one performs probabilistic reasoning by solving Rule-Probabilistic Satisfiability. At the same time, one can obtain an argumentative reading of the probabilistic reasoning with arguments and attacks. In this work, we introduce a probabilistic version of the Closed-World Assumption (P-CWA) and prove that our probabilistic approach coincides with the complete extension in classical argumentation under P-CWA and with maximum entropy reasoning. We present several approaches to compute the joint probability distribution from p-rules for achieving a practical proof theory for PD. PD provides a framework to unify probabilistic reasoning with argumentative reasoning. This is the first work in probabilistic structured argumentation where the joint distribution is not assumed to come from external sources.

keywords:
Probabilistic Structured Argumentation, Epistemic Probabilistic Argumentation, Probabilistic Satisfiability

1 Introduction

The field of argumentation has been in rapid development in the past three decades. In argumentation, information forms arguments; one argument attacks another if the former is in conflict with the latter. As stated in Dung’s landmark paper [17], “whether or not a rational agent believes in a statement depends on whether or not the argument supporting this statement can be successfully defended against the counterarguments”, argumentation analyses statement acceptability by studying attack relations amongst arguments. As a reasoning paradigm in multi-agent settings, especially for reasoning under uncertainty or with conflicting information, argumentation has seen applications in, e.g., medical (see e.g. [9, 24, 42, 21, 41, 10, 25]), legal (see [5] for an overview), and engineering (see e.g. [60] for an overview) domains.

Arguments in Dung’s abstract argumentation (AA) are atomic without internal structure. Also, in AA, there is no specification of what is an argument or an attack, as these notions are assumed to be given. To have a more detailed formalisation of arguments than is available with AA, one turns to structured argumentation: using some form of logic, arguments are built from a formal language, which serves as a representation of information; attacks are also derived from some notion representing conflicts in the underlying logic and language [3]. Both abstract argumentation and structured argumentation are seen as powerful reasoning paradigms with extensive theoretical results and practical applications (see e.g. [2] for an overview).

As reasoning with probabilistic information is considered a pertinent issue in many application areas, several different probabilistic argumentation frameworks have been developed in the literature to join probability with argumentation. As summarised in Hunter [31], two main approaches to probabilistic argumentation exist today: the epistemic and the constellations approaches. Quoting Hunter & Thimm [37] on this distinction:

In the constellations approach, the uncertainty is in the topology of the graph [of arguments]. …In the epistemic approach, the topology of the argument graph is fixed, but there is uncertainty about whether an argument is believed.

In other words, in a constellations approach, probabilities are defined over sets (extensions) of arguments, representing the uncertainty on whether sets of arguments exist in a given context; whereas in an epistemic approach, probabilities are defined over arguments, representing uncertainty on whether arguments are true. Both approaches have seen many successful developments. For instance, [18, 43, 52, 30, 16, 49, 15, 54, 22, 23, 12] are works taking the constellations approach; and [55, 36, 40, 32, 31, 37, 35, 33] are works taking the epistemic approach.

As in non-probabilistic or classical argumentation, arguments in probabilistic argumentation can either be atomic or structured. For instance, amongst the works mentioned above, [18, 23, 52, 31, 32, 33, 12] are the ones studying structured arguments whereas the rest study non-structured arguments. Within the group of works that study probabilistic structured argumentation with the epistemic approach, i.e. [31, 32, 33], it is assumed that a probability distribution over the language is given. An implication of this assumption is that the logic component is detached from the probability component, in the sense that one first performs logic operations to form arguments, and then views them through a lens of probability. In other words, there is a “logic information” component describing some knowledge that is used to construct arguments; separately, there is “probability information” which acts as a perspective filter to augment arguments.

Table 1: Examples of Different Probabilistic Argumentation Approaches.
                  Abstract                              Structured
Constellations    [43, 30, 16, 49, 15, 54, 22, 12]      [18, 52, 23]
Epistemic         [55, 36, 40, 37, 35]                  [31, 32, 33]

This work aims to provide an alternative approach to epistemic probabilistic structured argumentation. Instead of assuming the duality of logic and probability, we consider all information to be probabilistic and represented in the form of Probabilistic Deduction (PD) frameworks composed of probabilistic rules (p-rules). Being the sole representation in our work, p-rules describe both probability and logic information at the same time, as p-rules can be read as both conditional probabilities and production rules. Instead of taking a probability distribution from some external source, p-rules define probability distributions; at the same time, when reading them as production rules, p-rules form a deduction system that can be used to build arguments and attacks as in classical structured argumentation.

Example 1.1.

Consider a hypothetical university admission example with the following information.

  • A student is likely to receive good exam scores if he studies hard.
    \mathtt{GoodExamScore}\leftarrow\mathtt{HardStudy}:[0.8]

  • A student is likely to receive good exam scores if he has high IQ.
    \mathtt{GoodExamScore}\leftarrow\mathtt{HighIQ}:[0.6]

  • A student is likely to be admitted to university if he has good exam scores.
    \mathtt{Admission}\leftarrow\mathtt{GoodExamScore}:[0.7]

  • A student is likely not to be admitted if he does not have extracurricular experience.
    \neg\mathtt{Admission}\leftarrow\neg\mathtt{ExtraExp}:[0.7]

  • A student will have extracurricular experience if he has both time and interest for it.
    \mathtt{ExtraExp}\leftarrow\mathtt{TimeForExtraExp},\mathtt{InterestInExtraExp}:[1]

  • A student may or may not have time for extracurricular experience.
    \mathtt{TimeForExtraExp}\leftarrow:[0.5]

  • A student is likely to be interested in having extracurricular experience.
    \mathtt{InterestInExtraExp}\leftarrow:[0.8]

  • A student may or may not have high IQ.
    \mathtt{HighIQ}\leftarrow:[0.5]

  • A student will not study hard if he is lazy.
    \neg\mathtt{HardStudy}\leftarrow\mathtt{Lazy}:[1]

Each of these statements is represented with a probabilistic rule (p-rule) denoting conditional probability. For instance,

\mathtt{GoodExamScore}\leftarrow\mathtt{HardStudy}:[0.8]

is read as

\Pr(\mathtt{GoodExamScore}\,|\,\mathtt{HardStudy})=0.8;

and

\mathtt{TimeForExtraExp}\leftarrow:[0.5]

is read as

\Pr(\mathtt{TimeForExtraExp})=0.5.

From these p-rules, we can build arguments and specify attacks using the approach we will describe in Section 3. Some arguments and their attacks are shown in Figure 1; and readings of these arguments are summarised in Table 2. With a probability calculation approach we will introduce in Section 2, we compute probabilities for literals as follows:

\Pr(\mathtt{GoodExamScore})=0.744, \Pr(\mathtt{HardStudy})=0.735,
\Pr(\mathtt{HighIQ})=0.5, \Pr(\mathtt{Admission})=0.521,
\Pr(\mathtt{ExtraExp})=0.315, \Pr(\mathtt{TimeForExtraExp})=0.5,
\Pr(\mathtt{InterestInExtraExp})=0.8, \Pr(\mathtt{Lazy})=0.265.

With the PD framework we will introduce in Section 3, we compute argument probabilities as follows.

\Pr(\mathtt{A})=0.411, \Pr(\mathtt{B})=0.201, \Pr(\mathtt{C})=0.5, \Pr(\mathtt{D})=0.315.
\mathtt{A}=\{\mathtt{HS,GES,Adm}\}\vdash\mathtt{Adm}, \mathtt{B}=\{\mathtt{HIQ,GES,Adm}\}\vdash\mathtt{Adm}, \mathtt{C}=\{\neg\mathtt{EE},\neg\mathtt{Adm}\}\vdash\neg\mathtt{Adm}, \mathtt{D}=\{\mathtt{TFEE,IIEE,EE}\}\vdash\mathtt{EE}
Figure 1: Some arguments and attacks in Example 1.1. In this example, \mathtt{HS}, \mathtt{GES}, \mathtt{Adm}, \mathtt{HIQ}, \mathtt{EE}, \mathtt{TFEE} and \mathtt{IIEE} are shorthand for \mathtt{HardStudy}, \mathtt{GoodExamScore}, \mathtt{Admission}, \mathtt{HighIQ}, \mathtt{ExtraExp}, \mathtt{TimeForExtraExp} and \mathtt{InterestInExtraExp}, respectively.
Table 2: Arguments and their readings in Figure 1.
Argument     Reading
\mathtt{A}   With hard study, a student will score well in exams.
             Thus they will be admitted to university.
\mathtt{B}   With high IQ, a student will score well in exams.
             Thus they will be admitted to university.
\mathtt{C}   Without extracurricular experience, a student will not
             be admitted to university.
\mathtt{D}   With time and interest, a student will have extracurricular experience.

As illustrated in Example 1.1, one can view PD as a representation for probabilistic information supported by a well defined probability semantics. (We will show in Section 2 that the probability semantics is developed from Nilsson’s probabilistic satisfiability (PSAT) [47].) At the same time, there is an argumentative interpretation to PD frameworks in which information can be arranged for presentation with the notions of arguments and attacks. This spirit is in line with contemporary approaches on argumentation for explainable AI (see e.g., [13] for a survey). In these works, there is a “computational layer” for carrying out the computation using any suitable (numerical) techniques, such as machine learning or optimization, and an “argumentation layer” built on top of the “computational layer”, in which the argumentation layer is responsible for producing explanations with argumentation notions such as arguments and attacks. A key property of our work, as we will show in Section 3, is that the two layers reconcile with each other in the sense that when there is no uncertainty with p-rules, i.e. when all p-rules derived from classical argumentation have probability 1, the probability computation coincides with the complete semantics [17] in classical abstract argumentation.

As we intend PD frameworks to have practical value, effort has been put into proof theories of PD, as we will present in Section 4. In a nutshell, our approach works by viewing each p-rule as a constraint imposed on the probability space defined by the language. We then find a solution in the feasible region as the joint probability distribution. For each literal, we define its probability as the marginal probability computed from the joint probability distribution. For each argument, we define its probability as the sum of probabilities of all models entailing all literals in the argument. The core of our probability computation is finding the joint probability distribution. To this end, we have developed approaches using linear programming, quadratic programming and stochastic gradient descent.

A quick summary of the rest of this paper is as follows.

We introduce the notion of probabilistic rules (p-rules) in Section 2.1, describing their syntax and how p-rules define joint probability distributions. We introduce probability computation of literals in Section 2.2 with two key concepts, the probabilistic open-world assumption (P-OWA) and the probabilistic closed-world assumption (P-CWA), mimicking their counterparts in non-probabilistic logic. We introduce Maximum Entropy Reasoning in Section 2.3, which gives a unique joint probability distribution for each set of p-rules. As maximum entropy solutions distribute probability “as equally as possible”, we also show an important result (Lemma 2.1) that the probability of a possible world is not zero unless zero is the only value it can take. As both P-CWA and maximum entropy reasoning impose constraints on the joint distribution, we clarify their relations in Section 2.4.

In Section 3, we first give an overview of AA in Section 3.1. We then formally introduce the PD framework in Section 3.2, presenting definitions of arguments and attacks in PD frameworks. We connect PD with AA in Section 3.3, showing how AA frameworks can be mapped to PD frameworks in which all p-rules have probability 1. We also present the result that under such a mapping, the probability semantics of PD (under P-CWA and Maximum Entropy Reasoning) coincides with the complete extension (Theorem 3.1).

Section 4 presents proof theories of PD frameworks, focusing on the joint probability distribution calculation. Section 4.1 gives the basic approach using linear programming. This is modelled after Nilsson’s PSAT approach [47, 20]. Sections 4.2 and 4.3 present approaches for computing joint distributions under P-CWA and with maximum entropy reasoning, respectively. The chief contributions are, respectively, an algorithm that computes P-CWA “locally” (Theorem 4.2) and the use of linear entropy in place of von Neumann entropy. Section 4.4 presents the stochastic gradient descent (SGD) approach for computing joint probabilities. As we show in the performance study in Section 4.5, SGD (with its GPU implementation) is the most practical approach for reasoning with PD frameworks.

This paper builds upon our prior work [20] as follows. The concepts of p-rules and Rule-PSAT (Section 2.1) as well as the linear programming approach for calculating joint probability (Section 4.1) have been presented in [20]. In this paper, we have significantly expanded the theoretical presentation of [20] by introducing the PD framework with a probabilistic version of the closed-world assumption and connecting PD to existing argumentation frameworks. We have also presented several scalable techniques for calculating joint probabilities, which pave the way to practical probabilistic structured argumentation.

Proofs of all theoretical results are given in Appendix A.

2 Probabilistic Rules

In this section, we introduce probabilistic rules (p-rules) and their satisfiability as the cornerstone of this work. We introduce probabilistic versions of the closed-world and open-world assumptions and show how probabilities of literals can be computed from p-rules under these two assumptions. We also introduce the concept of maximum entropy reasoning in the context of p-rules.

2.1 Probabilistic Rules and Satisfiability

Given n atoms \sigma_{0},\ldots,\sigma_{n-1} forming a language \mathcal{L}=\{\sigma_{0},\ldots,\sigma_{n-1}\}, we let \mathcal{L}^{c} be the closure of \mathcal{L} under the classical negation \neg (namely, if \sigma\in\mathcal{L}, then \sigma,\neg\sigma\in\mathcal{L}^{c}). (In this work, the symbols \neg, \wedge, \vee and \models take their standard meaning as in classical logic.) The core representation of this work, the probabilistic rule (p-rule), is defined as follows.

Definition 2.1.

[20] Given a language \mathcal{L}, a probabilistic rule (p-rule) is of the form

\sigma_{0}\leftarrow\sigma_{1},\ldots,\sigma_{k}:[\theta]

for k\geq 0, \sigma_{i}\in\mathcal{L}^{c}, and 0\leq\theta\leq 1.

\sigma_{0} is referred to as the head of the p-rule, \sigma_{1},\ldots,\sigma_{k} the body, and \theta the probability.

The p-rule in Definition 2.1 states that the probability of \sigma_{0}, when \sigma_{1},\ldots,\sigma_{k} all hold, is \theta. In other words, this rule states that \Pr(\sigma_{0}|\sigma_{1},\ldots,\sigma_{k})=\theta. Without loss of generality, we only consider \theta>0 in this work. In other words, for \sigma_{0}\leftarrow\_:[0], one writes \neg\sigma_{0}\leftarrow\_:[1]. (Throughout, \_ stands for an anonymous variable as in Prolog.)
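To fix notation for the computational sketches used for illustration in the rest of this section, a p-rule can be stored as a small data structure. The Python sketch below is ours and not part of the formal development; the names PRule, head, body and theta are illustrative only.

```python
from dataclasses import dataclass
from typing import Tuple

# A literal is a signed atom: ("HardStudy", True) stands for HardStudy,
# ("HardStudy", False) for its negation.
Literal = Tuple[str, bool]

@dataclass(frozen=True)
class PRule:
    """A p-rule  head <- body : [theta],  read as Pr(head | body) = theta."""
    head: Literal
    body: Tuple[Literal, ...]  # the empty tuple encodes an empty body
    theta: float               # 0 < theta <= 1

# The first p-rule of Example 1.1: GoodExamScore <- HardStudy : [0.8].
r = PRule(("GoodExamScore", True), (("HardStudy", True),), 0.8)
```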

Definition 2.2.

[28] Given a language \mathcal{L} with n atoms, the Complete Conjunction Set (CC Set) \Omega of \mathcal{L} is the set of 2^{n} conjunctions of literals such that each conjunction contains n distinct atoms.

Each \omega\in\Omega is referred to as an atomic conjunction.

\Omega represents the set of all possible worlds and each \omega\in\Omega is one of them. For instance, for \mathcal{L}=\{\sigma_{0},\sigma_{1}\}, the CC set of \mathcal{L} is

\Omega=\{\neg\sigma_{0}\wedge\neg\sigma_{1},\neg\sigma_{0}\wedge\sigma_{1},\sigma_{0}\wedge\neg\sigma_{1},\sigma_{0}\wedge\sigma_{1}\}.

The four atomic conjunctions are \neg\sigma_{0}\wedge\neg\sigma_{1}, \neg\sigma_{0}\wedge\sigma_{1}, \sigma_{0}\wedge\neg\sigma_{1}, and \sigma_{0}\wedge\sigma_{1}.
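A minimal sketch of Definition 2.2: the CC set can be enumerated by assigning every combination of truth values to the atoms. Representing a possible world as a dictionary from atoms to Booleans is our own encoding choice.

```python
from itertools import product

def cc_set(atoms):
    """Return the Complete Conjunction Set of a language: one possible world
    per combination of truth values, encoded as a dict atom -> bool."""
    return [dict(zip(atoms, values))
            for values in product([False, True], repeat=len(atoms))]

# For L = {sigma0, sigma1} this yields the four atomic conjunctions above.
for world in cc_set(["sigma0", "sigma1"]):
    print(world)
```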

Definition 2.3.

[20] Given a language \mathcal{L} and a set of p-rules \mathcal{R}, let \Omega be the CC set of \mathcal{L}. A function \pi:\Omega\rightarrow[0,1] is a consistent probability distribution with respect to \mathcal{R} on \mathcal{L} for \Omega if and only if:

  1. For all \omega_{i}\in\Omega,

     0\leq\pi(\omega_{i})\leq 1; (1)

  2. It holds that:

     \sum_{\omega_{i}\in\Omega}\pi(\omega_{i})=1. (2)

  3. For each p-rule \sigma_{0}\leftarrow:[\theta]\in\mathcal{R}, it holds that:

     \theta=\sum_{\omega_{i}\in\Omega,\omega_{i}\models\sigma_{0}}\pi(\omega_{i}). (3)

  4. For each p-rule \sigma_{0}\leftarrow\sigma_{1},\ldots,\sigma_{k}:[\theta]\in\mathcal{R} (k>0), it holds that:

     \theta=\frac{\sum_{\omega_{i}\in\Omega,\omega_{i}\models\sigma_{0}\wedge\ldots\wedge\sigma_{k}}\pi(\omega_{i})}{\sum_{\omega_{i}\in\Omega,\omega_{i}\models\sigma_{1}\wedge\ldots\wedge\sigma_{k}}\pi(\omega_{i})}. (4)

Our notion of consistency as given in Definition 2.3 consists of two parts. Equations 1 and 2 assert that \pi is a probability distribution over the CC set of \mathcal{L}: each \pi(\omega_{i}) is between 0 and 1, and the sum of all \pi(\omega_{i}) is 1. Equations 3 and 4 assert that each p-rule should be viewed as defining a conditional probability, in which the probability of the head of the p-rule conditioned on its body equals the probability annotating the rule. When the body is empty, the head is conditioned on the universe. In other words, Equation 3 asserts \Pr(\sigma_{0})=\theta, whereas Equation 4 asserts \Pr(\sigma_{0}|\sigma_{1},\ldots,\sigma_{k})=\theta.
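As an illustration of Definition 2.3, the sketch below checks Equations 1-4 for a candidate distribution \pi, given as a list aligned with the CC set. It reuses the hypothetical PRule and world encodings introduced above; treating a zero-probability body as a violation is our own simplification.

```python
def entails(world, literal):
    """world |= literal, for a world encoded as a dict atom -> bool."""
    atom, positive = literal
    return world[atom] == positive

def is_consistent(pi, worlds, rules, tol=1e-9):
    """Check whether pi (a list aligned with `worlds`) satisfies
    Equations 1-4 of Definition 2.3 for the given p-rules."""
    # Equations 1 and 2: pi is a probability distribution over the CC set.
    if any(p < -tol or p > 1 + tol for p in pi) or abs(sum(pi) - 1.0) > tol:
        return False
    for rule in rules:
        head_and_body = (rule.head,) + rule.body
        num = sum(p for p, w in zip(pi, worlds)
                  if all(entails(w, l) for l in head_and_body))
        if not rule.body:
            # Equation 3: Pr(head) = theta.
            if abs(num - rule.theta) > tol:
                return False
        else:
            # Equation 4: Pr(head | body) = theta.
            den = sum(p for p, w in zip(pi, worlds)
                      if all(entails(w, l) for l in rule.body))
            if den <= tol or abs(num / den - rule.theta) > tol:
                return False
    return True
```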

Example 2.1.

Let \mathcal{L}=\{\sigma_{0},\sigma_{1}\} and \mathcal{R}=\{\sigma_{0}\leftarrow\sigma_{1}:[\alpha];\ \sigma_{1}\leftarrow:[\beta]\}. The CC set is \Omega=\{\neg\sigma_{0}\wedge\neg\sigma_{1},\sigma_{0}\wedge\neg\sigma_{1},\neg\sigma_{0}\wedge\sigma_{1},\sigma_{0}\wedge\sigma_{1}\}. From \sigma_{0}\leftarrow\sigma_{1}:[\alpha], applying Equation 4, we have

\alpha=\frac{\pi(\sigma_{0}\wedge\sigma_{1})}{\pi(\neg\sigma_{0}\wedge\sigma_{1})+\pi(\sigma_{0}\wedge\sigma_{1})}. (5)

From \sigma_{1}\leftarrow:[\beta], applying Equation 3, we have

\beta=\pi(\neg\sigma_{0}\wedge\sigma_{1})+\pi(\sigma_{0}\wedge\sigma_{1}). (6)

Applying Equation 2 on \Omega, we have

\pi(\neg\sigma_{0}\wedge\neg\sigma_{1})+\pi(\sigma_{0}\wedge\neg\sigma_{1})+\pi(\neg\sigma_{0}\wedge\sigma_{1})+\pi(\sigma_{0}\wedge\sigma_{1})=1. (7)

Lastly, we check the inequalities given in Equation 1:

0\leq\pi(\neg\sigma_{0}\wedge\neg\sigma_{1}),\pi(\sigma_{0}\wedge\neg\sigma_{1}),\pi(\neg\sigma_{0}\wedge\sigma_{1}),\pi(\sigma_{0}\wedge\sigma_{1})\leq 1. (8)

\pi is a consistent probability distribution if and only if there is at least one solution to Equations 5-8.

With consistency defined, we are ready to define Rule-PSAT as follows.

Definition 2.4.

[20] The Rule Probabilistic Satisfiability (Rule-PSAT) problem is to determine for a set of p-rules \mathcal{R} on a language \mathcal{L}, whether there exists a consistent probability distribution for the CC set of \mathcal{L} with respect to \mathcal{R}.

If a consistent probability distribution exists, then \mathcal{R} is Rule-PSAT; otherwise, it is not.

We illustrate Rule-PSAT with Example 2.2.

Example 2.2.

(Example 2.1 continued.) To test whether \mathcal{R} is Rule-PSAT on \mathcal{L}, we need to solve Equations 5-8 for \pi, as \mathcal{R} is Rule-PSAT if and only if a solution exists. It is easy to see that this is the case, as:

\pi(\sigma_{0}\wedge\sigma_{1}) = \alpha\beta
\pi(\neg\sigma_{0}\wedge\sigma_{1}) = \beta-\alpha\beta
\pi(\sigma_{0}\wedge\neg\sigma_{1})+\pi(\neg\sigma_{0}\wedge\neg\sigma_{1}) = 1-\beta

Since 0\leq\alpha,\beta\leq 1, we have 0\leq\pi(\sigma_{0}\wedge\sigma_{1}),\pi(\neg\sigma_{0}\wedge\sigma_{1})\leq 1. We can let \pi(\sigma_{0}\wedge\neg\sigma_{1})=0, \pi(\neg\sigma_{0}\wedge\neg\sigma_{1})=1-\beta and obtain one solution for \pi. As the system is underdetermined, with four unknowns and three equations, there are infinitely many solutions with \pi(\sigma_{0}\wedge\neg\sigma_{1}) and \pi(\neg\sigma_{0}\wedge\neg\sigma_{1}) in the range [0, 1-\beta].

The next example gives a set of p-rules that is not Rule-PSAT.

Example 2.3.

Let \mathcal{R} be a set of three p-rules:

\{\sigma_{0}\leftarrow\sigma_{1}:[0.9],\ \sigma_{0}\leftarrow:[0.8],\ \sigma_{1}\leftarrow:[0.9]\}.

From \sigma_{0}\leftarrow\sigma_{1}:[0.9] and Equation 4, we have

0.9=\frac{\pi(\sigma_{0}\wedge\sigma_{1})}{\pi(\sigma_{0}\wedge\sigma_{1})+\pi(\neg\sigma_{0}\wedge\sigma_{1})}. (9)

From \sigma_{1}\leftarrow:[0.9], we have

0.9=\pi(\sigma_{0}\wedge\sigma_{1})+\pi(\neg\sigma_{0}\wedge\sigma_{1}). (10)

Substituting (10) into (9), we have \pi(\sigma_{0}\wedge\sigma_{1})=0.81.

From \sigma_{0}\leftarrow:[0.8], we have

0.8=\pi(\sigma_{0}\wedge\sigma_{1})+\pi(\sigma_{0}\wedge\neg\sigma_{1}).

Thus, \pi(\sigma_{0}\wedge\neg\sigma_{1})=-0.01, which does not satisfy 0\leq\pi(\omega_{i})\leq 1.
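Checks like the one in Example 2.3 can be mechanised. Section 4.1 develops the joint-distribution computation via linear programming after Nilsson's PSAT; as a preview only, the sketch below tests Rule-PSAT by linear feasibility, using the standard linearisation of Equation 4 (\theta\cdot\Pr(body)=\Pr(head\wedge body)) and scipy.optimize.linprog. Rules are given as (head, body, theta) triples for self-containment; this is not the algorithm of Section 4.

```python
import numpy as np
from itertools import product
from scipy.optimize import linprog

def rule_psat(rules, atoms):
    """Rule-PSAT as linear feasibility: every p-rule head <- body : [theta]
    becomes  sum_{w |= head & body} pi(w) - theta * sum_{w |= body} pi(w) = 0.
    (As usual, this linear form also admits solutions with a zero-probability body.)"""
    worlds = [dict(zip(atoms, v)) for v in product([False, True], repeat=len(atoms))]
    def entails(w, lit):
        return w[lit[0]] == lit[1]
    A_eq, b_eq = [], []
    for head, body, theta in rules:
        row = [(float(entails(w, head)) - theta) if all(entails(w, l) for l in body) else 0.0
               for w in worlds]
        A_eq.append(row)
        b_eq.append(0.0)
    A_eq.append([1.0] * len(worlds))   # Equation 2: the probabilities sum to 1
    b_eq.append(1.0)
    res = linprog(c=np.zeros(len(worlds)), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0.0, 1.0)] * len(worlds))
    return res.success

# The p-rules of Example 2.3 are not Rule-PSAT.
rules = [(("s0", True), (("s1", True),), 0.9),
         (("s0", True), (), 0.8),
         (("s1", True), (), 0.9)]
print(rule_psat(rules, ["s0", "s1"]))   # expected: False
```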

Note that there is no restriction imposed on the form of p-rules other than the ones given in Definition 2.1. As illustrated in the next two examples, Examples 2.4 and 2.5, a set of p-rules can be consistent even if it contains rules forming cycles or two rules with the same head.

Example 2.4.

Consider a set of p-rules

\{\sigma_{0}\leftarrow\sigma_{1}:[0.7],\ \sigma_{1}\leftarrow\sigma_{0}:[0.6],\ \sigma_{1}\leftarrow:[0.5]\}.

We can see that there is a cycle between \sigma_{1} and \sigma_{0}, as one can deduce \sigma_{0} from \sigma_{1} and deduce \sigma_{1} from \sigma_{0}. However, we can still compute a (unique) solution for \pi over the CC set of \{\sigma_{0},\sigma_{1}\}. Using Equations 2 to 4, we have:

0.7 = \pi(\sigma_{0}\wedge\sigma_{1})/(\pi(\sigma_{0}\wedge\sigma_{1})+\pi(\neg\sigma_{0}\wedge\sigma_{1})),
0.6 = \pi(\sigma_{0}\wedge\sigma_{1})/(\pi(\sigma_{0}\wedge\sigma_{1})+\pi(\sigma_{0}\wedge\neg\sigma_{1})),
0.5 = \pi(\sigma_{0}\wedge\sigma_{1})+\pi(\neg\sigma_{0}\wedge\sigma_{1}),
1 = \pi(\neg\sigma_{0}\wedge\neg\sigma_{1})+\pi(\neg\sigma_{0}\wedge\sigma_{1})+\pi(\sigma_{0}\wedge\neg\sigma_{1})+\pi(\sigma_{0}\wedge\sigma_{1}).

A solution is: \pi(\neg\sigma_{0}\wedge\neg\sigma_{1})=0.27, \pi(\sigma_{0}\wedge\neg\sigma_{1})=0.23, \pi(\neg\sigma_{0}\wedge\sigma_{1})=0.15, \pi(\sigma_{0}\wedge\sigma_{1})=0.35.

Example 2.5.

Consider a set of p-rules

\{\sigma_{0}\leftarrow\sigma_{1}:[0.6],\ \sigma_{0}\leftarrow\sigma_{2}:[0.5],\ \sigma_{1}\leftarrow:[0.7],\ \sigma_{2}\leftarrow:[0.6]\}.

There are two p-rules with head \sigma_{0}, namely,

\sigma_{0}\leftarrow\sigma_{1}:[0.6] and \sigma_{0}\leftarrow\sigma_{2}:[0.5].

These two p-rules have different bodies and probabilities. We set up equations as follows. (To simplify the presentation, Boolean values are used as shorthand for the literals. E.g., 111, 011, and 001 denote \sigma_{0}\wedge\sigma_{1}\wedge\sigma_{2}, \neg\sigma_{0}\wedge\sigma_{1}\wedge\sigma_{2}, and \neg\sigma_{0}\wedge\neg\sigma_{1}\wedge\sigma_{2}, respectively.)

0.6 = (\pi(111)+\pi(110))/(\pi(010)+\pi(011)+\pi(110)+\pi(111)),
0.5 = (\pi(101)+\pi(111))/(\pi(001)+\pi(011)+\pi(101)+\pi(111)),
0.7 = \pi(010)+\pi(011)+\pi(110)+\pi(111),
0.6 = \pi(001)+\pi(011)+\pi(101)+\pi(111),
1 = \pi(000)+\pi(001)+\pi(010)+\pi(011)+\pi(100)+\pi(101)+\pi(110)+\pi(111).

Solving these, one solution is as follows:

\pi(000)=0, \pi(001)=0.02, \pi(010)=0, \pi(011)=0.28,
\pi(100)=0.15, \pi(101)=0.13, \pi(110)=0.25, \pi(111)=0.17.

2.2 Probability of Literals

So far, we have defined probability distributions over the CC set of a language. To discuss probabilities of literals in the language, there are two distinct views we can take: the probabilistic open-world assumption (P-OWA) and the probabilistic closed-world assumption (P-CWA), explained as follows.

With P-OWA, from a set of p-rules, we take the stand that:

The probability of a literal is determined by the p-rules deducing the literal, in conjunction with some unspecified factors that are not described by the known p-rules.

P-CWA is the opposite of P-OWA, such that:

The probability of a literal is determined by the known p-rules deducing the literal.

P-OWA and P-CWA can be viewed as probabilistic counterparts to Reiter’s classic OWA and CWA [50] in the following way.

  • P-OWA and OWA assume that things which cannot be deduced from known information can still be true;

  • P-CWA and CWA both assume that the information available is “complete” for reasoning.

We start our discussion with P-OWA. To define literal probability with P-OWA, we need the following result.

Proposition 2.1.

Given a set of p-rules \mathcal{R} over a language \mathcal{L}, if there is a consistent probability distribution \pi for \Omega with respect to \mathcal{R}, then for any \sigma\in\mathcal{L}^{c}, it is the case that:

\sum_{\omega_{i}\in\Omega,\omega_{i}\models\sigma}\pi(\omega_{i}) \geq 0, (11)
\sum_{\omega_{i}\in\Omega,\omega_{i}\models\sigma}\pi(\omega_{i})+\sum_{\omega_{i}\in\Omega,\omega_{i}\models\neg\sigma}\pi(\omega_{i}) = 1. (12)

With Proposition 2.1, we can define the probability of literals under P-OWA. Given a set of p-rules \mathcal{R}, if there is a consistent probability distribution \pi for \Omega with respect to \mathcal{R}, then for any \sigma\in\mathcal{L}^{c}, the probability of \sigma under P-OWA is \mathrm{Pr_{o}}(\sigma) such that:

\mathrm{Pr_{o}}(\sigma)=\sum_{\omega_{i}\in\Omega,\omega_{i}\models\sigma}\pi(\omega_{i}). (13)

Under P-OWA, the literal probability is as defined in [20]. From a Rule-PSAT solution, which characterises a probability distribution over the CC set, one can compute literal probabilities by summing up \pi(\omega_{i}). We illustrate literal probability computation under P-OWA in the example below.
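A small sketch of Equation 13: under P-OWA the probability of a literal is a marginal of the joint distribution, obtained by summing \pi over the worlds that entail the literal. The world and literal encodings are the hypothetical ones used in the earlier sketches.

```python
def literal_probability(sigma, pi, worlds):
    """Equation 13: Pr_o(sigma) = sum of pi(w) over all worlds w with w |= sigma."""
    atom, sign = sigma
    return sum(p for p, w in zip(pi, worlds) if w[atom] == sign)

# The solution of Example 2.5, with worlds written as bits s0 s1 s2 (111 = s0 & s1 & s2).
bits = ["000", "001", "010", "011", "100", "101", "110", "111"]
worlds = [{"s0": b[0] == "1", "s1": b[1] == "1", "s2": b[2] == "1"} for b in bits]
pi = [0.0, 0.02, 0.0, 0.28, 0.15, 0.13, 0.25, 0.17]
print(literal_probability(("s0", True), pi, worlds))   # 0.7, as noted in Example 2.12
```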

Example 2.6.

Consider the set of p-rules

\{\sigma_{0}\leftarrow\neg\sigma_{1}:[1],\ \sigma_{1}\leftarrow:[1]\},

which states that \sigma_{0} holds if \neg\sigma_{1} does, and that \sigma_{1} holds. With P-OWA, we have the following equations:

\pi(\sigma_{0}\wedge\neg\sigma_{1})/(\pi(\sigma_{0}\wedge\neg\sigma_{1})+\pi(\neg\sigma_{0}\wedge\neg\sigma_{1})) = 1 (14)
\pi(\sigma_{0}\wedge\sigma_{1})+\pi(\neg\sigma_{0}\wedge\sigma_{1}) = 1 (15)
\pi(\neg\sigma_{0}\wedge\neg\sigma_{1})+\pi(\neg\sigma_{0}\wedge\sigma_{1})+\pi(\sigma_{0}\wedge\neg\sigma_{1})+\pi(\sigma_{0}\wedge\sigma_{1}) = 1 (16)

Solving these, a solution to the joint distribution is

\pi(\neg\sigma_{0}\wedge\neg\sigma_{1})=0, \pi(\neg\sigma_{0}\wedge\sigma_{1})=0.5,
\pi(\sigma_{0}\wedge\neg\sigma_{1})=0, \pi(\sigma_{0}\wedge\sigma_{1})=0.5.

Using Equation 13 to calculate literal probabilities, we have

\mathrm{Pr_{o}}(\sigma_{0}) = \pi(\sigma_{0}\wedge\sigma_{1})+\pi(\sigma_{0}\wedge\neg\sigma_{1})=0.5,
\mathrm{Pr_{o}}(\sigma_{1}) = \pi(\sigma_{0}\wedge\sigma_{1})+\pi(\neg\sigma_{0}\wedge\sigma_{1})=1.

Thus, we read these as:

  1. \sigma_{1} holds, as we have asserted with the p-rule

     \sigma_{1}\leftarrow:[1];

  2. yet there is a 50/50 chance that \sigma_{0} holds as well, despite the fact that what we know about \sigma_{0} is that it would hold if \sigma_{1} does not, captured with the p-rule

     \sigma_{0}\leftarrow\neg\sigma_{1}:[1].

Thus, we see that the world is “open”: \sigma_{0} has a chance to hold, even though we have no way to deduce it with the p-rules we have.

P-CWA asserts more constraints on literal probabilities than P-OWA. With P-CWA, the probability of a literal is determined by all ways of deducing the literal. To define literal probability under P-CWA, we formalize deduction with p-rules using the same notion defined in Assumption-based Argumentation (ABA) [56], as follows.

Definition 2.5.

Given a language \mathcal{L} and a set of p-rules \mathcal{R}, a deduction for \sigma\in\mathcal{L}^{c} with S\subseteq\mathcal{L}^{c}, denoted S\vdash_{\mathtt{D}}\sigma, is a finite tree with nodes labelled by literals in \mathcal{L}^{c} or by \tau (where \tau\not\in\mathcal{L} represents “true” and stands for the empty body of rules; each rule \sigma\leftarrow can be interpreted as \sigma\leftarrow\tau for the purpose of presenting deductions as trees), such that the root is labelled by \sigma, the leaves are either \tau or literals in S, and the children of a non-leaf node labelled by \sigma' are the elements of the body of some rule in \mathcal{R} with head \sigma'.

With deduction defined, we can define literal probabilities under P-CWA. Formally, let \{\Sigma_{1}\vdash_{\mathtt{D}}\sigma,\ldots,\Sigma_{m}\vdash_{\mathtt{D}}\sigma\} be all maximal deductions for \sigma (a deduction S\vdash_{\mathtt{D}}\sigma is maximal when there is no S'\vdash_{\mathtt{D}}\sigma such that S\subset S'), where \Sigma_{1}=\{\sigma_{1}^{1},\ldots,\sigma_{k_{1}}^{1}\},\ldots,\Sigma_{m}=\{\sigma_{1}^{m},\ldots,\sigma_{k_{m}}^{m}\}. Let

S=\bigwedge\limits_{i=1}^{k_{1}}\sigma_{i}^{1}\vee\ldots\vee\bigwedge\limits_{i=1}^{k_{m}}\sigma_{i}^{m}. (17)

Then,

\mathrm{Pr_{c}}(\sigma)=\sum_{\omega_{i}\in\Omega,\omega_{i}\models\sigma}\pi(\omega_{i})=\sum_{\omega_{i}\in\Omega,\omega_{i}\models S}\pi(\omega_{i}). (18)

The difference between P-OWA and P-CWA is illustrated in the following example.

Example 2.7.

(Example 2.6 continued.) There are maximal deductions

\{\neg\sigma_{1},\sigma_{0}\}\vdash_{\mathtt{D}}\sigma_{0} and \{\sigma_{1}\}\vdash_{\mathtt{D}}\sigma_{1}

for \sigma_{0} and \sigma_{1}, respectively. With P-CWA, in addition to Equations 14-16, we have an additional constraint derived from \{\neg\sigma_{1},\sigma_{0}\}\vdash_{\mathtt{D}}\sigma_{0}:

\pi(\sigma_{0}\wedge\sigma_{1})+\pi(\sigma_{0}\wedge\neg\sigma_{1})=\pi(\sigma_{0}\wedge\neg\sigma_{1}). (19)

This is the case as the LHS sums up the probabilities of the atomic conjunctions that entail \sigma_{0}, and the RHS sums up the probabilities of the atomic conjunctions that entail S=\sigma_{0}\wedge\neg\sigma_{1}. The (unique) solution to \pi is

\pi(\neg\sigma_{0}\wedge\neg\sigma_{1})=0, \pi(\neg\sigma_{0}\wedge\sigma_{1})=1,
\pi(\sigma_{0}\wedge\neg\sigma_{1})=0, \pi(\sigma_{0}\wedge\sigma_{1})=0.

With these, we have \mathrm{Pr_{c}}(\sigma_{0})=0 and \mathrm{Pr_{c}}(\sigma_{1})=1.

Such results match our intuition:

  • With P-CWA, we assume that the only way to obtain \sigma_{0} is by having \neg\sigma_{1} (with the p-rule \sigma_{0}\leftarrow\neg\sigma_{1}:[1]). However, since we know \sigma_{1} without any doubt (from the p-rule \sigma_{1}\leftarrow:[1]), there is no room to believe \neg\sigma_{1}; thus we cannot deduce \sigma_{0}.

  • On the other hand, with P-OWA, we assume that although we can obtain \sigma_{0} from \neg\sigma_{1}, there may be other ways of deriving \sigma_{0} that we are unaware of; thus not having \neg\sigma_{1} does not allow us to rule out the possibility of \sigma_{0}.

Another way to look at P-OWA and P-CWA is from Equation 18. Given \sigma\in\mathcal{L}^{c}, let S be as defined in Equation 17; then for any \omega_{i}\in\Omega such that \omega_{i}\models\sigma and \omega_{i}\not\models S, it holds that:

\pi(\omega_{i})=0. (20)

From Equation 20, as demonstrated in Example 2.7, it is clear that P-CWA imposes additional constraints on the joint distribution \pi. Thus, for a set of p-rules that is Rule-PSAT, literal probabilities under P-CWA may be undefined. Consider the following example.

Example 2.8.

Consider the set of p-rules

R=\{\sigma_{0}\leftarrow\neg\sigma_{1}:[1],\ \sigma_{1}\leftarrow:[1],\ \sigma_{0}\leftarrow:[0.5]\}.

Clearly, R is Rule-PSAT with the solution given in Example 2.6, computed using Equations 14-16. However, if Equation 19 is added to assert P-CWA, then there is no solution to \pi. Thus, literal probabilities under P-CWA such as \mathrm{Pr_{c}}(\sigma_{0}) and \mathrm{Pr_{c}}(\sigma_{1}) are undefined in this example.
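The additional P-CWA constraints used in Examples 2.7 and 2.8 can be generated mechanically from Equation 20. The sketch below (ours) computes which worlds a P-CWA constraint forces to probability 0, taking the support sets of the maximal deductions as given rather than computing them; encodings are as in the earlier sketches.

```python
def pcwa_forced_zero(sigma, supports, worlds):
    """Equation 20: under P-CWA, every world that entails sigma but none of the
    maximal support conjunctions must receive probability 0.  `supports` is a
    list of literal sets, one per maximal deduction of sigma."""
    def entails(w, lit):
        return w[lit[0]] == lit[1]
    return [i for i, w in enumerate(worlds)
            if entails(w, sigma)
            and not any(all(entails(w, l) for l in S) for S in supports)]

# Example 2.7: the only maximal deduction for s0 has support {s0, ~s1}, so the
# world s0 & s1 is forced to probability 0 (the constraint behind Equation 19).
worlds = [{"s0": a, "s1": b} for a in (False, True) for b in (False, True)]
print(pcwa_forced_zero(("s0", True), [{("s0", True), ("s1", False)}], worlds))  # [3]
```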

We introduce the concept of P-CWA consistency as follows.

Definition 2.6.

A set of p-rules defined with a language \mathcal{L} is P-CWA consistent if and only if it is consistent and for each \sigma\in\mathcal{L}^{c}, 0\leq\mathrm{Pr_{c}}(\sigma)\leq 1.

P-CWA differs from several standard assumptions made in handling probabilistic systems, such as independence (discussed in e.g., [28]) or mutual exclusivity (discussed in e.g., [59]), as illustrated in Example 2.9 below.

Example 2.9.

Consider a p-rule:

\sigma_{0}\leftarrow\sigma_{1}:[\theta].

By the definition of p-rules, it holds that

\Pr(\sigma_{0}|\sigma_{1})=\frac{\Pr(\sigma_{0}\wedge\sigma_{1})}{\Pr(\sigma_{1})}=\theta.

  • With P-CWA, from the deduction \{\sigma_{0},\sigma_{1}\}\vdash_{\mathtt{D}}\sigma_{0}, we have

    \Pr(\sigma_{0})=\Pr(\sigma_{0}\wedge\sigma_{1}).

    Thus, \theta=\Pr(\sigma_{0})/\Pr(\sigma_{1}).

  • With the independence assumption, assuming that \sigma_{0} and \sigma_{1} are independent, we have

    \Pr(\sigma_{0}\wedge\sigma_{1})=\Pr(\sigma_{0})\Pr(\sigma_{1}).

    Thus, \theta=\Pr(\sigma_{0}).

  • With the mutual exclusivity assumption, assuming that \sigma_{0} and \sigma_{1} are mutually exclusive, we have

    \Pr(\sigma_{0}\wedge\sigma_{1})=0.

    Thus, \theta=0.

It is easy to see that P-CWA also differs from conditional independence [14], which is the main assumption enabling Bayesian networks [53], as illustrated in Example 2.10 below.

Example 2.10.

Consider two p-rules:

\sigma_{0}\leftarrow\sigma_{1}:[\theta_{1}],     \sigma_{1}\leftarrow\sigma_{2}:[\theta_{2}].

By the definition of p-rules, we have

\Pr(\sigma_{0}|\sigma_{1})=\theta_{1},\ \Pr(\sigma_{1}|\sigma_{2})=\theta_{2}.

Using the chain rule,

\Pr(\sigma_{0}\wedge\sigma_{1}\wedge\sigma_{2})=\Pr(\sigma_{0}|\sigma_{1}\wedge\sigma_{2})\Pr(\sigma_{1}|\sigma_{2})\Pr(\sigma_{2}).

With conditional independence, assuming that \sigma_{0} and \sigma_{2} are conditionally independent given \sigma_{1}, we have \Pr(\sigma_{0}|\sigma_{1}\wedge\sigma_{2})=\Pr(\sigma_{0}|\sigma_{1}). Thus,

\Pr(\sigma_{0}\wedge\sigma_{1}\wedge\sigma_{2})=\Pr(\sigma_{0}|\sigma_{1})\Pr(\sigma_{1}|\sigma_{2})\Pr(\sigma_{2})=\theta_{1}\theta_{2}\Pr(\sigma_{2}). (21)

With P-CWA, from the deduction \{\sigma_{1},\sigma_{2}\}\vdash_{\mathtt{D}}\sigma_{1} we have

\Pr(\sigma_{1})=\Pr(\sigma_{1}\wedge\sigma_{2}).

From \Pr(\sigma_{1}|\sigma_{2})=\theta_{2}, we have

\frac{\Pr(\sigma_{1}\wedge\sigma_{2})}{\Pr(\sigma_{2})}=\theta_{2}.

Thus, \Pr(\sigma_{1})=\theta_{2}\Pr(\sigma_{2}). From \Pr(\sigma_{0}|\sigma_{1})=\theta_{1}, we have

\frac{\Pr(\sigma_{0}\wedge\sigma_{1})}{\Pr(\sigma_{1})}=\frac{\Pr(\sigma_{0}\wedge\sigma_{1}\wedge\sigma_{2})+\Pr(\sigma_{0}\wedge\sigma_{1}\wedge\neg\sigma_{2})}{\theta_{2}\Pr(\sigma_{2})}=\theta_{1}.

Since \Pr(\sigma_{1})=\Pr(\sigma_{1}\wedge\sigma_{2}),

\Pr(\sigma_{1}\wedge\neg\sigma_{2})=0.

Since \Pr(\sigma_{0}\wedge\sigma_{1}\wedge\neg\sigma_{2})\leq\Pr(\sigma_{1}\wedge\neg\sigma_{2}), we have

\Pr(\sigma_{0}\wedge\sigma_{1}\wedge\neg\sigma_{2})=0.

Therefore, with P-CWA we also obtain

\Pr(\sigma_{0}\wedge\sigma_{1}\wedge\sigma_{2})=\theta_{1}\theta_{2}\Pr(\sigma_{2})

as with the conditional independence assumption.

However, with P-CWA, from the deduction \{\sigma_{0},\sigma_{1},\sigma_{2}\}\vdash_{\mathtt{D}}\sigma_{0}, we also have

\Pr(\sigma_{0})=\Pr(\sigma_{0}\wedge\sigma_{1}\wedge\sigma_{2}),

which implies that

\Pr(\sigma_{0}\wedge\sigma_{1}\wedge\neg\sigma_{2})=\Pr(\sigma_{0}\wedge\neg\sigma_{1}\wedge\sigma_{2})=\Pr(\sigma_{0}\wedge\neg\sigma_{1}\wedge\neg\sigma_{2})=0.

These do not hold in general under the conditional independence assumption.

2.3 Maximum Entropy Solutions

As we are solving systems derived from p-rules to compute the joint distribution \pi, when the system is underdetermined, multiple solutions exist, as illustrated in the next example.

Example 2.11.

(Example 2.5 continued.) This example shows a system with five equations and eight unknowns. Thus the system is underdetermined. In addition to the solution shown previously, the following is another solution:

\pi(\omega_{1})=0.14, \pi(\omega_{2})=0.16, \pi(\omega_{3})=0.14, \pi(\omega_{4})=0.14,
\pi(\omega_{5})=0, \pi(\omega_{6})=0, \pi(\omega_{7})=0.12, \pi(\omega_{8})=0.3.

If a set of p-rules \mathcal{R} is satisfiable (or P-CWA consistent) but the solution to \pi is not unique, then the range of the probability of any literal in \mathcal{L}^{c} can be found with optimization. The upper bound of the probability of a literal \sigma\in\mathcal{L}^{c} can be found by maximising \Pr(\sigma) as defined in Equation 13 (with P-OWA) or 18 (with P-CWA), subject to the constraints given by the systems derived from the p-rules. The lower bound of \Pr(\sigma) can be found by minimising these equations accordingly.
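Under P-OWA the objective \mathrm{Pr_{o}}(\sigma) is linear in \pi, so this optimisation can reuse the linearised constraints of the earlier Rule-PSAT sketch. The following is a hedged sketch only, with rules as (head, body, theta) triples; the bound values quoted in the comment come from Examples 2.5, 2.11 and 2.12.

```python
import numpy as np
from itertools import product
from scipy.optimize import linprog

def literal_probability_bounds(sigma, rules, atoms):
    """Lower and upper bounds of Pr_o(sigma) over all consistent distributions,
    using the linearised p-rule constraints."""
    worlds = [dict(zip(atoms, v)) for v in product([False, True], repeat=len(atoms))]
    def entails(w, lit):
        return w[lit[0]] == lit[1]
    A_eq, b_eq = [], []
    for head, body, theta in rules:
        A_eq.append([(float(entails(w, head)) - theta) if all(entails(w, l) for l in body)
                     else 0.0 for w in worlds])
        b_eq.append(0.0)
    A_eq.append([1.0] * len(worlds))
    b_eq.append(1.0)
    c = np.array([1.0 if entails(w, sigma) else 0.0 for w in worlds])
    lo = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0.0, 1.0)] * len(worlds))
    hi = linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=[(0.0, 1.0)] * len(worlds))
    return lo.fun, -hi.fun

# Example 2.12: for the p-rules of Example 2.5, Pr(s0) ranges from 0.42 to 0.7.
rules = [(("s0", True), (("s1", True),), 0.6),
         (("s0", True), (("s2", True),), 0.5),
         (("s1", True), (), 0.7),
         (("s2", True), (), 0.6)]
print(literal_probability_bounds(("s0", True), rules, ["s0", "s1", "s2"]))
```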

Example 2.12.

(Example 2.11 continued.) The solution shown in Example 2.5 maximises \Pr(\sigma_{0}) (\Pr(\sigma_{0})=0.7), whereas the solution in Example 2.11 minimises it (\Pr(\sigma_{0})=0.42).

In addition to choosing a solution that maximizes or minimizes the probability of a literal, we can also choose the solution that maximizes the entropy of the joint distribution. The principle of maximum entropy is commonly used in probabilistic reasoning [39, 48], including in probabilistic argumentation as discussed in e.g., [32] and [55]. It states that

amongst the set of distributions that characterize the known information equally well, the distribution with the maximum entropy should be chosen [38].

The entropy of a discrete probability distribution \{p_{1},p_{2},\ldots\} is

H(p_{1},p_{2},\ldots)=-\sum_{i}p_{i}\log(p_{i}).

In our context, given a language with n atoms, the maximum entropy distribution can be found by maximising

H(\pi_{1},\ldots,\pi_{2^{n}})=-\sum_{i=1}^{2^{n}}\pi(\omega_{i})\log(\pi(\omega_{i})), (22)

subject to the system derived from p-rules.
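A small numerical sketch of this optimisation, maximising Equation 22 subject to the linearised p-rule constraints with scipy's general-purpose SLSQP solver; the dedicated quadratic-programming and SGD methods of Section 4 are the scalable options, so this is an illustration only, again with rules as (head, body, theta) triples.

```python
import numpy as np
from itertools import product
from scipy.optimize import minimize

def max_entropy_distribution(rules, atoms):
    """Maximise H(pi) = -sum_i pi_i log pi_i (Equation 22) subject to the
    linearised p-rule constraints and the normalisation constraint."""
    worlds = [dict(zip(atoms, v)) for v in product([False, True], repeat=len(atoms))]
    def entails(w, lit):
        return w[lit[0]] == lit[1]
    rows, rhs = [], []
    for head, body, theta in rules:
        rows.append([(float(entails(w, head)) - theta) if all(entails(w, l) for l in body)
                     else 0.0 for w in worlds])
        rhs.append(0.0)
    rows.append([1.0] * len(worlds))
    rhs.append(1.0)
    A, b = np.array(rows), np.array(rhs)
    def neg_entropy(pi):
        p = np.clip(pi, 1e-12, 1.0)      # avoid log(0)
        return float(np.sum(p * np.log(p)))
    x0 = np.full(len(worlds), 1.0 / len(worlds))
    res = minimize(neg_entropy, x0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * len(worlds),
                   constraints=[{"type": "eq", "fun": lambda pi: A @ pi - b}])
    return res.x

# The p-rules of Example 2.5; compare the result with the values in Example 2.13.
rules = [(("s0", True), (("s1", True),), 0.6),
         (("s0", True), (("s2", True),), 0.5),
         (("s1", True), (), 0.7),
         (("s2", True), (), 0.6)]
print(np.round(max_entropy_distribution(rules, ["s0", "s1", "s2"]), 3))
```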

Example 2.13.

(Example 2.12 continued.) The maximum entropy solution found in this example is:

\pi(\omega_{1})=0.058, \pi(\omega_{2})=0.114, \pi(\omega_{3})=0.094, \pi(\omega_{4})=0.186,
\pi(\omega_{5})=0.058, \pi(\omega_{6})=0.07, \pi(\omega_{7})=0.19, \pi(\omega_{8})=0.23.

With this \pi, \Pr(\sigma_{0})=0.55.

By the definitions of Rule-PSAT and P-CWA consistency, it is easy to see that the maximum entropy solution exists and is unique for a satisfiable set of p-rules. Formally,

Proposition 2.2.

Given a set of consistent p-rules, the maximum entropy solution \pi^{m} exists and is uniquely determined.

A maximum entropy solution is as unbiased as possible amongst all solutions [55]. A useful result on maximum entropy solutions that we use in Section 3 is the following.

Lemma 2.1.

Given a set of consistent p-rules, for each \omega\in\Omega, consider some constant \alpha_{\omega} such that [0,\alpha_{\omega}] is the feasible region for \pi(\omega). Let \pi^{m} be the maximum entropy solution. If \alpha_{\omega}>0, then \pi^{m}(\omega)>0.

Lemma 2.1 sanctions that a maximum entropy solution assigns a non-zero probability to each \omega\in\Omega whenever the constraints given by the p-rules allow such an allocation. In other words, with maximum entropy reasoning, \pi^{m}(\omega) is 0 only if no other solution for \pi(\omega) exists. Consequently, with the maximum entropy solution, literal probabilities are not 0 unless this is explicitly forced by the p-rules. Formally,

Corollary 2.1.

Given a set of consistent p-rules, for each \sigma\in\mathcal{L}, let \alpha_{\sigma}\in[0,1] be a constant such that \Pr(\sigma)=\Pr_{x}(\sigma)\leq\alpha_{\sigma} (for x\in\{o,c\}). With the maximum entropy solution \pi, if \alpha_{\sigma}>0, then \Pr(\sigma)>0.

2.4 Relation between P-CWA and Maximum Entropy Solutions

Although both P-CWA and maximum entropy reasoning restrict the joint probability distribution we can take on the CC set, they are orthogonal concepts. In other words, one can choose to apply P-CWA or maximum entropy reasoning individually, or both at the same time, and these choices can lead to different distributions. We illustrate this with the following example.

Example 2.14.

Given a language \mathcal{L}=\{\sigma_{0},\sigma_{1},\sigma_{2}\} and a set of two p-rules:

\{\sigma_{0}\leftarrow\sigma_{1}:[0.5],\ \neg\sigma_{1}\leftarrow\sigma_{2}:[0.5]\}.

With P-OWA, to compute the joint probability distribution on the CC set, we set up three equations over the 2^{3}=8 unknowns:

0.5 = (\pi(110)+\pi(111))/(\pi(010)+\pi(011)+\pi(110)+\pi(111)),
0.5 = (\pi(001)+\pi(101))/(\pi(001)+\pi(011)+\pi(101)+\pi(111)),
1 = \pi(000)+\pi(001)+\pi(010)+\pi(011)+\pi(100)+\pi(101)+\pi(110)+\pi(111).

With P-CWA, we must consider two deductions:

\{\sigma_{0},\sigma_{1}\}\vdash_{\mathtt{D}}\sigma_{0} and \{\neg\sigma_{1},\sigma_{2}\}\vdash_{\mathtt{D}}\neg\sigma_{1}.

These assert that:

\sum_{\omega\in\Omega,\omega\models\sigma_{0}}\pi(\omega) = \sum_{\omega\in\Omega,\omega\models\sigma_{0}\wedge\sigma_{1}}\pi(\omega),
\sum_{\omega\in\Omega,\omega\models\neg\sigma_{1}}\pi(\omega) = \sum_{\omega\in\Omega,\omega\models\neg\sigma_{1}\wedge\sigma_{2}}\pi(\omega),

which translate to

\sum_{\omega\in\Omega,\omega\models\sigma_{0}\wedge\neg\sigma_{1}}\pi(\omega) = \pi(101)+\pi(100)=0,
\sum_{\omega\in\Omega,\omega\models\neg\sigma_{1}\wedge\neg\sigma_{2}}\pi(\omega) = \pi(000)+\pi(100)=0.

The resulting distributions over the CC set are summarised in Table 3. Note that, unlike maximum entropy reasoning, which gives us unique solutions, the two systems used to compute P-OWA and P-CWA without maximum entropy reasoning are underdetermined, so infinitely many solutions exist.

Table 3: P-CWA and Maximum Entropy (ME) Solutions for the P-Rules in Example 2.14.
                  \pi(000)  \pi(001)  \pi(010)  \pi(011)  \pi(100)  \pi(101)  \pi(110)  \pi(111)
P-OWA without ME  1         0         0         0         0         0         0         0
P-OWA with ME     0.125     0.125     0.125     0.125     0.125     0.125     0.125     0.125
P-CWA without ME  0         0.5       0         0.25      0         0         0         0.25
P-CWA with ME     0         0.293     0.207     0.146     0         0         0.207     0.146

In this section, we have introduced the p-rule as the core building block of this work. Its probability semantics is defined with the joint probability distribution over the CC set of the language. We have introduced Rule-PSAT to describe consistent p-rule sets. Literal probability is defined as the sum of probabilities of the conjunctions that are models of the literal.

We have then introduced the Probabilistic Open-World and Closed-World assumptions, P-OWA and P-CWA, respectively, modelling their counterparts in non-monotonic logic. We have shown how literal probabilities can be computed with respect to both P-OWA and P-CWA, and remarked that P-CWA differs from other common probabilistic assumptions such as independence, mutual exclusivity and conditional independence. We have finished this section with an introduction to maximum entropy solutions, showing that when the joint distribution is computed with maximum entropy reasoning, literals will not take probability 0 unless that is the only solution they have.

3 Argumentation with P-Rules

Thus far, we have introduced p-rules as the basic building block of probabilistic deduction. In this section, we show how probabilistic arguments can be built with p-rules and how attacks can be defined between arguments. To this end, we formally define the Probabilistic Deduction (PD) framework composed of p-rules. We then show how PD admits Abstract Argumentation [17] frameworks as instances.

Note that in this section we assume that the p-rules under discussion are Rule-PSAT. Thus there exists a consistent joint probability distribution \pi for the CC set of \mathcal{L}. We will discuss several methods for computing joint distributions \pi from a set of p-rules in Section 4. We also assume that P-CWA can be imposed. Thus, unless specified otherwise, we use \Pr(\_) to denote \mathrm{Pr_{c}}(\_) in this section.

3.1 Background: Abstract Argumentation

We briefly review concepts from abstract argumentation (AA).

An Abstract Argumentation (AA) framework [17] is a pair \langle\mathcal{A},\mathcal{T}\rangle, consisting of a set of abstract arguments, \mathcal{A}, and a binary attack relation, \mathcal{T}. Given an AA framework AF=\langle\mathcal{A},\mathcal{T}\rangle, a set of arguments (or extension) E\subseteq\mathcal{A} is

  • admissible (in AF) if and only if \forall\mathtt{A},\mathtt{B}\in E, (\mathtt{A},\mathtt{B})\not\in\mathcal{T} (i.e. E is conflict-free) and for any \mathtt{A}\in E, if (\mathtt{C},\mathtt{A})\in\mathcal{T}, then there exists some \mathtt{B}\in E such that (\mathtt{B},\mathtt{C})\in\mathcal{T};

  • complete if and only if E is admissible and contains all arguments it defends, where E defends some \mathtt{A}'\in\mathcal{A} iff E attacks all arguments that attack \mathtt{A}'.

Given \langle\mathcal{A},\mathcal{T}\rangle and a set of labels \Lambda=\{\mathtt{in},\mathtt{out},\mathtt{undec}\}, a labelling is a total function \mathcal{A}\mapsto\Lambda. Given a labelling on some argumentation framework,

  • an \mathtt{in}-labelled argument is said to be legally \mathtt{in} if and only if all its attackers are labelled \mathtt{out};

  • an \mathtt{out}-labelled argument is said to be legally \mathtt{out} if and only if it has at least one attacker that is labelled \mathtt{in};

  • an \mathtt{undec}-labelled argument is said to be legally \mathtt{undec} if and only if not all its attackers are labelled \mathtt{out} and it does not have an attacker that is labelled \mathtt{in}.

A complete labelling is a labelling where every \mathtt{in}-labelled argument is legally \mathtt{in}, every \mathtt{out}-labelled argument is legally \mathtt{out} and every \mathtt{undec}-labelled argument is legally \mathtt{undec}. An important result connecting complete extensions and complete labellings is that the arguments labelled \mathtt{in} by a complete labelling form a complete extension [1].
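To make these labelling conditions concrete, below is a small Python sketch (ours, not taken from [17] or [1]) that checks whether a given labelling of an AA framework is complete; the argument names and the set-of-pairs attack encoding are illustrative choices.

```python
def is_complete_labelling(args, attacks, label):
    """Check the complete-labelling conditions of Section 3.1.
    `attacks` is a set of pairs (attacker, attacked); `label` maps each
    argument to 'in', 'out' or 'undec'."""
    attackers = {a: {c for (c, d) in attacks if d == a} for a in args}
    for a in args:
        atk_labels = {label[c] for c in attackers[a]}
        if label[a] == "in" and atk_labels - {"out"}:
            return False      # some attacker is not labelled out
        if label[a] == "out" and "in" not in atk_labels:
            return False      # no attacker is labelled in
        if label[a] == "undec" and ("in" in atk_labels or atk_labels <= {"out"}):
            return False      # either an attacker is in, or all attackers are out
    return True

# A mutual attack between A and B: labelling both undec is complete.
print(is_complete_labelling({"A", "B"}, {("A", "B"), ("B", "A")},
                            {"A": "undec", "B": "undec"}))   # True
```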

3.2 Probabilistic Deduction Framework

We define the Probabilistic Deduction framework as a set of P-CWA consistent p-rules constructed on a language, as follows.

Definition 3.1.

A Probabilistic Deduction (PD) framework is a pair \langle\mathcal{L},\mathcal{R}\rangle where \mathcal{L} is the language and \mathcal{R} is a set of p-rules such that

  • for all \rho\in\mathcal{R}, literals in \rho are in \mathcal{L}^{c},

  • \mathcal{R} is P-CWA consistent.

With a PD framework, we can build arguments as deductions.

Definition 3.2.

Given a PD framework \langle\mathcal{L},\mathcal{R}\rangle, an argument for \sigma\in\mathcal{L}^{c} supported by S\subseteq\mathcal{L}^{c} and R\subseteq\mathcal{R}, denoted S\vdash\sigma, is such that there is a deduction \mathtt{A}=S\vdash_{\mathtt{D}}\sigma in which for each leaf node N in \mathtt{A}, either

  1. N is labelled by \tau, or

  2. N is labelled by some \sigma'\in\mathcal{L}^{c}, \neg\sigma'\leftarrow\_:[\cdot]\in\mathcal{R} and |S|>1.

The condition |S|>1 in Definition 3.2 is put in place to remove “negative singleton arguments” resulting from p-rules. For instance, consider a PD framework \langle\mathcal{L},\mathcal{R}\rangle with \mathcal{L}=\{\sigma_{0}\} and \mathcal{R}=\{\sigma_{0}\leftarrow:[1]\}. Without this condition, we would admit \{\neg\sigma_{0}\}\vdash_{\mathtt{D}}\neg\sigma_{0} as an argument, because

  1. \neg\sigma_{0}\in\mathcal{L}^{c} forms a tree by itself in which the root and the leaf are both \neg\sigma_{0}, and

  2. \neg\neg\sigma_{0}=\sigma_{0} and \sigma_{0}\leftarrow:[1] is a p-rule in \mathcal{R}.

Intuitively, with this definition of argument, we want to assert that

  1. if there is only a single literal \sigma in the deduction, then there must exist a p-rule \sigma\leftarrow:[\cdot] in the set of rules; otherwise,

  2. there must be some reason to acknowledge each leaf, either directly through a rule without body, or through the existence of some information about the negation of the leaf.

Example 3.1 presents a few deductions for illustration.

Example 3.1.

Consider a language \mathcal{L}=\{\sigma_{0},\sigma_{1},\sigma_{2},\sigma_{3}\} and four sets of p-rules R_{1},\ldots,R_{4}, where

  • R_{1}=\{\sigma_{0}\leftarrow:[0.7]\},

  • R_{2}=\{\sigma_{1}\leftarrow:[0.8],\ \sigma_{0}\leftarrow\sigma_{1}:[0.4]\},

  • R_{3}=\{\sigma_{1}\leftarrow:[0.7],\ \neg\sigma_{2}\leftarrow:[0.8],\ \sigma_{0}\leftarrow\sigma_{1},\neg\sigma_{2}:[0.8]\},

  • R_{4}=\{\neg\sigma_{2}\leftarrow\sigma_{3}:[0.7],\ \sigma_{1}\leftarrow\sigma_{2}:[0.8],\ \neg\sigma_{0}\leftarrow\sigma_{1}:[0.8]\}.

Figure 2 shows some examples of arguments built with these sets of p-rules.

Figure 2: Argument examples: \{\sigma_{0}\}\vdash\sigma_{0} (built with R_{1}), \{\sigma_{0},\sigma_{1}\}\vdash\sigma_{0} (built with R_{2}), \{\sigma_{0},\sigma_{1},\neg\sigma_{2}\}\vdash\sigma_{0} (built with R_{3}) and \{\sigma_{2},\sigma_{1},\neg\sigma_{0}\}\vdash\neg\sigma_{0} (built with R_{4}) in Example 3.1. Each argument is shown as a deduction tree whose leaves are \tau or supported literals.
Example 3.2.

(Example 2.14 continued.) With these two p-rules,

\{\sigma_{0}\leftarrow\sigma_{1}:[0.5],\ \neg\sigma_{1}\leftarrow\sigma_{2}:[0.5]\},

although both \{\sigma_{0},\sigma_{1}\}\vdash_{\mathtt{D}}\sigma_{0} and \{\neg\sigma_{1},\sigma_{2}\}\vdash_{\mathtt{D}}\neg\sigma_{1} are deductions, only \{\sigma_{0},\sigma_{1}\}\vdash\sigma_{0} is a PD argument.

Definition 3.3.

For two arguments \mathtt{A}=\_\vdash\sigma and \mathtt{B}=\Sigma\vdash\_ in some PD framework, \mathtt{A} attacks \mathtt{B} if \neg\sigma\in\Sigma.

Example 3.3.

Consider a PD framework \langle\mathcal{L},\mathcal{R}\rangle with

\mathcal{R}=\{\sigma_{0}\leftarrow:[0.8],\ \sigma_{1}\leftarrow\neg\sigma_{0}:[0.9]\}.

Two arguments \mathtt{A}=\{\sigma_{0}\}\vdash\sigma_{0} and \mathtt{B}=\{\neg\sigma_{0},\sigma_{1}\}\vdash\sigma_{1} can be built with \mathcal{R} such that \mathtt{A} attacks \mathtt{B}, as illustrated in Figure 3.

Figure 3: Argument \mathtt{A}=\{\sigma_{0}\}\vdash\sigma_{0} attacks \mathtt{B}=\{\neg\sigma_{0},\sigma_{1}\}\vdash\sigma_{1} in Examples 3.3 and 3.8.

At the core of PD semantics is the argument probability, defined as follows.

Definition 3.4.

Given an argument 𝙰=Sσ\mathtt{A}=S\vdash\sigma, in which S={s1,,sk}S=\{s_{1},\ldots,s_{k}\}, the probability of 𝙰\mathtt{A} is:

Pr(𝙰)=Pr(s1sk)=ωiΩ,ωis1skπ(ωi).\Pr(\mathtt{A})=\Pr(s_{1}\wedge\ldots\wedge s_{k})=\sum_{\omega_{i}\in\Omega,\omega_{i}\models s_{1}\wedge\ldots\wedge s_{k}}\pi(\omega_{i}). (23)

Trivially, 0Pr(𝙰)10\leq\Pr(\mathtt{A})\leq 1. We illustrate argument probability in Example 3.4.

Example 3.4.

(Example 3.3 continued.) Consider the following joint distribution computed from the p-rules:

π(00)=0.02,π(01)=0.18,π(10)=0.8,π(11)=0.\pi(00)=0.02,\pi(01)=0.18,\pi(10)=0.8,\pi(11)=0.

Note that the joint distribution is unique. With P-CWA, from {σ1,¬σ0}𝙳σ1\{\sigma_{1},\neg\sigma_{0}\}\vdash_{\mathtt{D}}\sigma_{1}, we have

ωΩ,ωσ1π(ω)=ωΩ,ωσ1¬σ0π(ω).\sum_{\omega\in\Omega,\omega\models\sigma_{1}}\pi(\omega)=\sum_{\omega\in\Omega,\omega\models\sigma_{1}\wedge\neg\sigma_{0}}\pi(\omega).

This implies π(σ0σ1)=π(11)=0\pi(\sigma_{0}\wedge\sigma_{1})=\pi(11)=0.

From the joint distribution, we compute literal and argument probabilities using Equations 13 and 23, respectively, as follows.

Pr(σ0)\displaystyle\Pr(\sigma_{0}) =π(10)+π(11)=0.8,\displaystyle=\pi(10)+\pi(11)=0.8,
Pr(¬σ0)\displaystyle\Pr(\neg\sigma_{0}) =π(00)+π(01)=0.2,\displaystyle=\pi(00)+\pi(01)=0.2,
Pr(σ1)\displaystyle\Pr(\sigma_{1}) =π(01)+π(11)=0.18,\displaystyle=\pi(01)+\pi(11)=0.18,
Pr(¬σ1)\displaystyle\Pr(\neg\sigma_{1}) =π(00)+π(10)=0.82,\displaystyle=\pi(00)+\pi(10)=0.82,
Pr({σ0}σ0)\displaystyle\Pr(\{\sigma_{0}\}\vdash\sigma_{0}) =π(10)+π(11)=0.8,\displaystyle=\pi(10)+\pi(11)=0.8,
Pr({¬σ0,σ1}σ1)\displaystyle\Pr(\{\neg\sigma_{0},\sigma_{1}\}\vdash\sigma_{1}) =π(01)=0.18.\displaystyle=\pi(01)=0.18.
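
To make Equation 23 concrete, the following minimal Python sketch (ours, for illustration only; the bit-string world encoding over (σ0, σ1) and the helper name prob are not part of the formal development) recomputes the literal and argument probabilities of Example 3.4 from the joint distribution above.

```python
# Illustrative sketch of Equation 23 on Example 3.4 (not part of the paper's
# formalism). Worlds are bit strings "b0b1" giving the truth values of
# sigma0 and sigma1; pi is the joint distribution computed above.
pi = {"00": 0.02, "01": 0.18, "10": 0.80, "11": 0.00}

def prob(literals):
    """Sum pi over the worlds satisfying every literal in `literals`.
    A literal is (index, sign): (0, True) is sigma0, (0, False) is ~sigma0."""
    return sum(p for world, p in pi.items()
               if all((world[i] == "1") == sign for i, sign in literals))

print(prob([(0, True)]))               # Pr({sigma0} |- sigma0)          = 0.8
print(prob([(0, False), (1, True)]))   # Pr({~sigma0, sigma1} |- sigma1) = 0.18
print(prob([(1, True)]))               # Pr(sigma1)                      = 0.18
```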

Probabilities of arguments that form attack cycles can be computed without any special treatment, as illustrated in the following example.

Example 3.5.

Consider a PD framework with a set of p-rules

{σ0¬σ1:[0.9],σ1¬σ0:[0.6]}.\{\sigma_{0}\leftarrow\neg\sigma_{1}:[0.9],\sigma_{1}\leftarrow\neg\sigma_{0}:[0.6]\}.

Two arguments 𝙰={¬σ1,σ0}σ0\mathtt{A}=\{\neg\sigma_{1},\sigma_{0}\}\vdash\sigma_{0} and 𝙱={¬σ0,σ1}σ1\mathtt{B}=\{\neg\sigma_{0},\sigma_{1}\}\vdash\sigma_{1} can be built such that 𝙰\mathtt{A} attacks 𝙱\mathtt{B} and 𝙱\mathtt{B} attacks 𝙰\mathtt{A}, as illustrated in Figure 4.

We compute the joint probability distribution as:

π(00)=0.087,π(01)=0.13,π(10)=0.783,π(11)=0.\pi(00)=0.087,\pi(01)=0.13,\pi(10)=0.783,\pi(11)=0.

Note that the joint probability distribution is again unique. With P-CWA, we have

ωΩ,ωσ0π(ω)=ωΩ,ωσ0¬σ1π(ω).\sum_{\omega\in\Omega,\omega\models\sigma_{0}}\pi(\omega)=\sum_{\omega\in\Omega,\omega\models\sigma_{0}\wedge\neg\sigma_{1}}\pi(\omega).

and

ωΩ,ωσ1π(ω)=ωΩ,ωσ1¬σ0π(ω).\sum_{\omega\in\Omega,\omega\models\sigma_{1}}\pi(\omega)=\sum_{\omega\in\Omega,\omega\models\sigma_{1}\wedge\neg\sigma_{0}}\pi(\omega).

Both imply π(σ0σ1)=π(11)=0\pi(\sigma_{0}\wedge\sigma_{1})=\pi(11)=0. Thus there are four equations and four unknowns, so the solution is unique.

With these, we compute literal and argument probabilities:

Pr(𝙰)=π(10)=0.783,\Pr(\mathtt{A})=\pi(10)=0.783,

Pr(𝙱)=π(01)=0.13.\Pr(\mathtt{B})=\pi(01)=0.13.

Figure 4: Arguments 𝙰={¬σ1,σ0}σ0\mathtt{A}=\{\neg\sigma_{1},\sigma_{0}\}\vdash\sigma_{0} and 𝙱={¬σ0,σ1}σ1\mathtt{B}=\{\neg\sigma_{0},\sigma_{1}\}\vdash\sigma_{1} attack each other in Example 3.5.

A few observations can be made with our notions of arguments and attacks in PD frameworks as follows.

  • Arguments are defined syntactically in that arguments are deductions, which are trees with nodes being literals and edges defined with p-rules.

  • Attacks are also defined syntactically, without referring to either literal or argument probabilities, such that argument 𝙰\mathtt{A} attacks argument 𝙱\mathtt{B} if and only if the claim of 𝙰\mathtt{A} is the negation of some literal in 𝙱\mathtt{B}. In this process, we make no distinction between “undercut” and “rebuttal” as done in some other constructions (see e.g., [19] for some discussion on these concepts).

  • The probability semantics of PD framework given in Equation 23 is based on solving the joint distribution π\pi over the CC set of the language \mathcal{L} under the P-CWA assumption. Thus, it is by design a “global” semantics in that it requires a sense of “global consistency” as given in Definition 2.6. We give a brief discussion in B.1 about reasoning with PD frameworks that are not P-CWA consistent.

  • Given a PD framework, since its joint distribution π\pi may not be unique (unless maximum entropy reasoning is enforced), argument probabilities may not be unique either. We discuss the calculation of the joint distribution in detail in Section 4.

A few results concerning the argument probability semantics are as follows. Arguments containing both a literal and its negation have 0 probability.

Proposition 3.1.

For any argument 𝙰=Σ_\mathtt{A}=\Sigma\vdash\_, if σ,¬σΣ\sigma,\neg\sigma\in\Sigma, then Pr(𝙰)=0\Pr(\mathtt{A})=0.

Self-attacking arguments have 0 probability.

Proposition 3.2.

For any argument 𝙰=Σσ\mathtt{A}=\Sigma\vdash\sigma, if ¬σΣ\neg\sigma\in\Sigma, then Pr(𝙰)=0\Pr(\mathtt{A})=0.

An argument’s probability is no higher than the probability of its claim.

Proposition 3.3.

For any argument 𝙰=Σσ\mathtt{A}=\Sigma\vdash\sigma, Pr(𝙰)Pr(σ)\Pr(\mathtt{A})\leq\Pr(\sigma).

If an argument’s probability equals the probability of its claim, then there is one and only one argument for the claim.

Proposition 3.4.

For any argument 𝙰=Σσ\mathtt{A}=\Sigma\vdash\sigma, Pr(𝙰)=Pr(σ)\Pr(\mathtt{A})=\Pr(\sigma) if and only if there is no 𝙱𝙰\mathtt{B}\neq\mathtt{A} such that 𝙱=Σσ\mathtt{B}=\Sigma^{\prime}\vdash\sigma and Pr(𝙱)0\Pr(\mathtt{B})\neq 0.

If an argument is attacked by another argument, then the sum of the probability of the two arguments is no more than 1.

Proposition 3.5.

For any two arguments 𝙰\mathtt{A} and 𝙱\mathtt{B}, if 𝙰\mathtt{A} attacks 𝙱\mathtt{B}, then Pr(𝙰)+Pr(𝙱)1\Pr(\mathtt{A})+\Pr(\mathtt{B})\leq 1.

Proposition 3.5 is the first of the two conditions of p-justifiable introduced in [55] (Definition 4), also known as the “coherence criterion” introduced in [36]. Note that the second condition of p-justifiable, the sum of probabilities of an argument and its attackers must be no less than 1 (also known as the “optimistic criterion” in [36]), does not hold in PD in general, as illustrated in the following example.

Example 3.6.

Consider a PD framework with two p-rules

σ0¬σ1:[0.1],σ1:[0.1].\sigma_{0}\leftarrow\neg\sigma_{1}:[0.1],\sigma_{1}\leftarrow:[0.1].

Let 𝙰={σ1}σ1\mathtt{A}=\{\sigma_{1}\}\vdash\sigma_{1} and 𝙱={σ0,¬σ1}σ0\mathtt{B}=\{\sigma_{0},\neg\sigma_{1}\}\vdash\sigma_{0}. We compute Pr(𝙰)=0.1\Pr(\mathtt{A})=0.1 and Pr(𝙱)=0.09\Pr(\mathtt{B})=0.09. Clearly, 𝙰\mathtt{A} attacks 𝙱\mathtt{B}, yet Pr(𝙰)+Pr(𝙱)<1\Pr(\mathtt{A})+\Pr(\mathtt{B})<1. This example represents a case where a weak argument attacks another weak argument, and the sum of the probabilities of the two is less than 1.

With p-rules, PD naturally supports reasoning with both uncertain knowledge and “hard” facts in a single framework. The “Nixon diamond”, a classic example in defeasible reasoning introduced by Reiter and Criscuolo [51], can be modelled with a PD framework as shown in Table 4. We can see that both arguments

𝙰={p,q,n}p\mathtt{A}=\{p,q,n\}\vdash p and 𝙱={¬p,r,n}¬p\mathtt{B}=\{\neg p,r,n\}\vdash\neg p

can be drawn such that they attack each other. Calculating their probabilities, we have Pr(𝙰)=Pr(𝙱)=0.5\Pr(\mathtt{A})=\Pr(\mathtt{B})=0.5.

Table 4: Modelling the Nixon diamond example with PD.
Defeasible Knowledge
usually, Quakers are pacifist: p\leftarrow q:[0.5]
usually, Republicans are not pacifist: \neg p\leftarrow r:[0.5]
“Hard” Facts
Richard Nixon is a Quaker: q\leftarrow n:[1]
Richard Nixon is a Republican: r\leftarrow n:[1]
Nixon exists: n\leftarrow:[1]

3.3 PD Frameworks and Abstract Argumentation

To show relations between PD and AA [17], we first explore a few classic examples studied with AA to illustrate PD’s probabilistic semantics. We then present a few intermediate results (Propositions 3.6-3.10). With them, we show that PD generalises AA (Theorem 3.1).

We start by presenting Examples 3.7-3.11. The arguments and attacks shown in these examples are used in [1] to illustrate differences between several classical (non-probabilistic) argumentation semantics. The characteristics of these examples are summarised in Table 5.

Table 5: Example Argumentation Frameworks Characteristics.
Example Description Source
Example 3.7 Three arguments and two attacks Figure 2 in [1]
Example 3.8 Two arguments attack each other Figure 3 in [1]
Example 3.9 “Floating Acceptance” example Figure 5 in [1]
Example 3.10 “Cycle of three attacking arguments” example Figure 6 in [1]
Example 3.11 Stable extension example Figure 8 in [1]
Example 3.7.

Let FF be a PD framework with three p-rules:

{σ0:[1],σ1¬σ0:[1],σ2¬σ1:[1]}.\{\sigma_{0}\leftarrow:[1],\sigma_{1}\leftarrow\neg\sigma_{0}:[1],\sigma_{2}\leftarrow\neg\sigma_{1}:[1]\}.

Let 𝙰={σ0}σ0\mathtt{A}=\{\sigma_{0}\}\vdash\sigma_{0}, 𝙱={σ1,¬σ0}σ1\mathtt{B}=\{\sigma_{1},\neg\sigma_{0}\}\vdash\sigma_{1}, 𝙲={σ2,¬σ1}σ2\mathtt{C}=\{\sigma_{2},\neg\sigma_{1}\}\vdash\sigma_{2}. Arguments and attacks are shown in Figure 5. The joint distribution π\pi is such that

π(000)=0\pi(000)=0, π(001)=0\pi(001)=0, π(010)=0\pi(010)=0, π(011)=0\pi(011)=0,
π(100)=0\pi(100)=0, π(101)=1\pi(101)=1, π(110)=0\pi(110)=0, π(111)=0\pi(111)=0.

With these, we have Pr(𝙰)=1\Pr(\mathtt{A})=1, Pr(𝙱)=0\Pr(\mathtt{B})=0, and Pr(𝙲)=1\Pr(\mathtt{C})=1.

Figure 5: The PD framework shown in Example 3.7
Example 3.8.

Let FF be a PD framework with two p-rules:

{σ0¬σ1:[1],σ1¬σ0:[1]}.\{\sigma_{0}\leftarrow\neg\sigma_{1}:[1],\sigma_{1}\leftarrow\neg\sigma_{0}:[1]\}.

Let 𝙰={σ0,¬σ1}σ0\mathtt{A}=\{\sigma_{0},\neg\sigma_{1}\}\vdash\sigma_{0}, and 𝙱={σ1,¬σ0}σ1\mathtt{B}=\{\sigma_{1},\neg\sigma_{0}\}\vdash\sigma_{1}. The arguments and attacks are shown in Figure 3. A joint distribution π\pi is such that

π(00)=0,π(01)=0.5,π(10)=0.5,π(11)=0.\pi(00)=0,\pi(01)=0.5,\pi(10)=0.5,\pi(11)=0.

With these, we have Pr(𝙰)=Pr(𝙱)=0.5\Pr(\mathtt{A})=\Pr(\mathtt{B})=0.5.

Example 3.9.

Let FF be a PD framework with four p-rules:

{σ0¬σ1:[1]\{\sigma_{0}\leftarrow\neg\sigma_{1}:[1], σ1¬σ0:[1]\sigma_{1}\leftarrow\neg\sigma_{0}:[1], σ2¬σ0,¬σ1:[1]\sigma_{2}\leftarrow\neg\sigma_{0},\neg\sigma_{1}:[1], σ3¬σ2:[1]}\sigma_{3}\leftarrow\neg\sigma_{2}:[1]\}.

Let 𝙰={σ0,¬σ1}σ0\mathtt{A}=\{\sigma_{0},\neg\sigma_{1}\}\vdash\sigma_{0}, 𝙱={σ1,¬σ0}σ1\mathtt{B}=\{\sigma_{1},\neg\sigma_{0}\}\vdash\sigma_{1}, 𝙲={σ2,¬σ0,¬σ1}σ2\mathtt{C}=\{\sigma_{2},\neg\sigma_{0},\neg\sigma_{1}\}\vdash\sigma_{2}, and 𝙳={σ3,¬σ2}σ3\mathtt{D}=\{\sigma_{3},\neg\sigma_{2}\}\vdash\sigma_{3}. The arguments and attacks are shown in Figure 6. The joint distribution π\pi is such that

π(ωi)={0.5if ωi=¬σ0σ1¬σ2σ3,0.5else if ωi=σ0¬σ1¬σ2σ3,0otherwise.\pi(\omega_{i})=\begin{cases}0.5&\quad\text{if }\omega_{i}=\neg\sigma_{0}\wedge\sigma_{1}\wedge\neg\sigma_{2}\wedge\sigma_{3},\\ 0.5&\quad\text{else if }\omega_{i}=\sigma_{0}\wedge\neg\sigma_{1}\wedge\neg\sigma_{2}\wedge\sigma_{3},\\ 0&\quad\text{otherwise.}\end{cases}

Computing the argument probabilities, we have Pr(𝙰)=Pr(𝙱)=0.5\Pr(\mathtt{A})=\Pr(\mathtt{B})=0.5, Pr(𝙲)=0\Pr(\mathtt{C})=0, and Pr(𝙳)=1\Pr(\mathtt{D})=1.

Figure 6: The 0/1 PD framework shown in Example 3.9.
Example 3.10.

Let FF be a PD framework with three p-rules:

{σ0¬σ1:[1],σ1¬σ2:[1],σ2¬σ0:[1]}.\{\sigma_{0}\leftarrow\neg\sigma_{1}:[1],\sigma_{1}\leftarrow\neg\sigma_{2}:[1],\sigma_{2}\leftarrow\neg\sigma_{0}:[1]\}.

Let 𝙰={σ0,¬σ1}σ0\mathtt{A}=\{\sigma_{0},\neg\sigma_{1}\}\vdash\sigma_{0}, 𝙱={σ1,¬σ2}σ1\mathtt{B}=\{\sigma_{1},\neg\sigma_{2}\}\vdash\sigma_{1}, and 𝙲={σ2,¬σ0}σ2\mathtt{C}=\{\sigma_{2},\neg\sigma_{0}\}\vdash\sigma_{2}. Arguments and attacks are shown in Figure 7. Although FF is Rule-PSAT, it is not P-CWA consistent. Thus, there is no consistent joint distribution over the CC set. Pr(𝙰)\Pr(\mathtt{A}), Pr(𝙱)\Pr(\mathtt{B}) and Pr(𝙲)\Pr(\mathtt{C}) are undefined.

Figure 7: The PD framework shown in Example 3.10.
Example 3.11.

Let FF be a PD framework with five p-rules:

{σ0¬σ1:[1],\{\sigma_{0}\leftarrow\neg\sigma_{1}:[1], σ1¬σ0:[1],\sigma_{1}\leftarrow\neg\sigma_{0}:[1], σ2¬σ1,¬σ4:[1],\sigma_{2}\leftarrow\neg\sigma_{1},\neg\sigma_{4}:[1],
σ3¬σ2:[1],\sigma_{3}\leftarrow\neg\sigma_{2}:[1], σ4¬σ3:[1]}.\sigma_{4}\leftarrow\neg\sigma_{3}:[1]\}.

Let 𝙰={σ0,¬σ1}σ0\mathtt{A}=\{\sigma_{0},\neg\sigma_{1}\}\vdash\sigma_{0}, 𝙱={σ1,¬σ0}σ1\mathtt{B}=\{\sigma_{1},\neg\sigma_{0}\}\vdash\sigma_{1}, 𝙲={σ2,¬σ1,¬σ4}σ2\mathtt{C}=\{\sigma_{2},\neg\sigma_{1},\neg\sigma_{4}\}\vdash\sigma_{2}, 𝙳={σ3,¬σ2}σ3\mathtt{D}=\{\sigma_{3},\neg\sigma_{2}\}\vdash\sigma_{3}, and 𝙴={σ4,¬σ3}σ4\mathtt{E}=\{\sigma_{4},\neg\sigma_{3}\}\vdash\sigma_{4}. Arguments and attacks are shown in Figure 8. The joint distribution π\pi is such that

π(ωi)={1if ωi=¬σ0σ1¬σ2σ3¬σ4,0otherwise.\pi(\omega_{i})=\begin{cases}1&\quad\text{if }\omega_{i}=\neg\sigma_{0}\wedge\sigma_{1}\wedge\neg\sigma_{2}\wedge\sigma_{3}\wedge\neg\sigma_{4},\\ 0&\quad\text{otherwise.}\end{cases}

Computing the argument probabilities, we have Pr(𝙰)=Pr(𝙲)=Pr(𝙴)=0\Pr(\mathtt{A})=\Pr(\mathtt{C})=\Pr(\mathtt{E})=0, and Pr(𝙱)=Pr(𝙳)=1\Pr(\mathtt{B})=\Pr(\mathtt{D})=1.

Figure 8: The PD framework shown in Example 3.11.

We make two observations from these examples.

  • Syntactically, it is straightforward to map AA frameworks to PD frameworks such that a one-to-one mapping exists between arguments and attacks in an AA framework and their counterparts in the mapped PD framework. Thus, for any AA framework FF, there is a counterpart of it represented as a PD framework.

  • Semantically, the probability semantics of PD frameworks in these examples behaves intuitively, in the sense that:

    • winning arguments have probability 1;

    • losing arguments have probability 0;

    • arguments that can either win or lose have probability between 0 and 1; and

    • arguments that cannot be labelled either winning or losing result in inconsistency.

    More formally, as we show below, when argument probabilities are viewed as a labelling, they represent a complete labelling.

Starting with the syntactical aspect, we define a mapping from AA frameworks to PD frameworks as follows.

Definition 3.5.

The function 𝙰𝙰𝟸𝙿𝙳\mathtt{AA2PD} is a mapping from AA frameworks to PD frameworks such that for an AA framework 𝒜,𝒯\langle\mathcal{A},\mathcal{T}\rangle, 𝙰𝙰𝟸𝙿𝙳(𝒜,𝒯)=,\mathtt{AA2PD}(\langle\mathcal{A},\mathcal{T}\rangle)=\langle\mathcal{L},\mathcal{R}\rangle, where:

  • =𝒜\mathcal{L}=\mathcal{A},

  • ={σ¬σ1,,¬σm:[1]|σ𝒜\mathcal{R}=\{\sigma\leftarrow\neg\sigma_{1},\ldots,\neg\sigma_{m}:[1]|\sigma\in\mathcal{A}, {σ1,,σm}={σi|(σi,σ)𝒯}}\{\sigma_{1},\ldots,\sigma_{m}\}=\{\sigma_{i}|(\sigma_{i},\sigma)\in\mathcal{T}\}\}.
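
As an illustration of Definition 3.5, the sketch below (ours; the tuple-based encoding of p-rules and the function name aa2pd are only illustrative) builds the p-rules of the mapped PD framework from a set of argument names and an attack relation, reproducing the p-rules of Example 3.7.

```python
# Illustrative sketch of the AA2PD mapping in Definition 3.5 (encoding is ours).
# A p-rule is a triple (head, body, probability); a negated literal is
# written as ("not", name); every mapped rule has probability 1.
def aa2pd(arguments, attacks):
    language = set(arguments)
    rules = []
    for sigma in sorted(arguments):
        attackers = sorted(a for (a, b) in attacks if b == sigma)
        body = [("not", a) for a in attackers]   # sigma <- not a1, ..., not am : [1]
        rules.append((sigma, body, 1.0))
    return language, rules

# Example 3.7 viewed as an AA framework: s0 attacks s1, s1 attacks s2.
language, rules = aa2pd({"s0", "s1", "s2"}, {("s0", "s1"), ("s1", "s2")})
for head, body, theta in rules:
    print(head, "<-", body, ":", theta)
# s0 <- [] : 1.0;  s1 <- [("not","s0")] : 1.0;  s2 <- [("not","s1")] : 1.0
```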

Proposition 3.6 below sanctions that the arguments and attacks in AA frameworks are mapped to their counterparts in PD frameworks unambiguously.

Proposition 3.6.

Given an AA framework F=𝒜,𝒯F=\langle\mathcal{A},\mathcal{T}\rangle, there exists a function 𝚐\mathtt{g} that maps arguments in FF to arguments in 𝙰𝙰𝟸𝙿𝙳(F)\mathtt{AA2PD}(F) such that for any (𝙰,𝙱)𝒯(\mathtt{A},\mathtt{B})\in\mathcal{T}, 𝚐(𝙰)\mathtt{g}(\mathtt{A}) attacks 𝚐(𝙱)\mathtt{g}(\mathtt{B}) in 𝙰𝙰𝟸𝙿𝙳(F)\mathtt{AA2PD}(F).

To establish semantics connections between AA frameworks and PD frameworks, we first introduce AA-PD frameworks as the set of PD frameworks that are mapped from AA frameworks, i.e., let \mathcal{F} be the set of AA frameworks, then the set of AA-PD frameworks is {𝙰𝙰𝟸𝙿𝙳(F)|F}\{\mathtt{AA2PD}(F)|F\in\mathcal{F}\}. We observe the following with AA-PD frameworks:

  1. 1.

    unattacked arguments have probability 1 (note that this is the “founded” criterion in [36]), and

  2. 2.

    if an argument 𝙰\mathtt{A} with probability 1 attacks another argument 𝙱\mathtt{B}, 𝙱\mathtt{B} will have probability 0.

Formally, we have Propositions 3.7 and 3.8 as follows.

Proposition 3.7.

Given an AA-PD framework FF, for an argument 𝙰=Sσ\mathtt{A}=S\vdash\sigma in FF, if 𝙰\mathtt{A} is not attacked in FF, then Pr(𝙰)=1\Pr(\mathtt{A})=1.

Proposition 3.8.

Given an AA-PD framework FF, for two arguments 𝙰\mathtt{A} and 𝙱\mathtt{B} in FF such that 𝙰\mathtt{A} attacks 𝙱\mathtt{B}, if Pr(𝙰)=1\Pr(\mathtt{A})=1 then Pr(𝙱)=0\Pr(\mathtt{B})=0.

Extending Propositions 3.7 and 3.8, we can show that if all attackers of an argument have probability 0, then the argument has probability 1; moreover, if an argument that has been attacked has probability 1, then all of its attackers must have probability 0. Formally,

Proposition 3.9.

Given an AA-PD framework FF, let 𝙰\mathtt{A} be an argument in FF and 𝙰𝚜\mathtt{As} the set of arguments attacking 𝙰\mathtt{A}, 𝙰𝙰𝚜\mathtt{A}\not\in\mathtt{As}.

  1. 1.

    If for all 𝙱𝙰𝚜\mathtt{B}\in\mathtt{As}, Pr(𝙱)=0\Pr(\mathtt{B})=0, then Pr(𝙰)=1\Pr(\mathtt{A})=1.

  2. 2.

    If Pr(𝙰)=1\Pr(\mathtt{A})=1, then for all 𝙱𝙰𝚜\mathtt{B}\in\mathtt{As}, Pr(𝙱)=0\Pr(\mathtt{B})=0.

An argument has probability 0 if and only if it has an attacker with probability 1.

Proposition 3.10.

Given an AA-PD framework FF, let 𝙰\mathtt{A} be an argument in FF and 𝙰𝚜\mathtt{As} the set of arguments attacking 𝙰\mathtt{A}, 𝙰𝙰𝚜\mathtt{A}\not\in\mathtt{As}. With maximum entropy reasoning,

  1. 1.

    if Pr(𝙰)=0\Pr(\mathtt{A})=0, then there exists 𝙱𝙰𝚜\mathtt{B}\in\mathtt{As}, such that Pr(𝙱)=1\Pr(\mathtt{B})=1;

  2. 2.

    if there exists 𝙱𝙰𝚜\mathtt{B}\in\mathtt{As}, such that Pr(𝙱)=1\Pr(\mathtt{B})=1, then Pr(𝙰)=0\Pr(\mathtt{A})=0.

Maximum entropy reasoning is a key condition for Proposition 3.10. This proposition does not hold without it, as illustrated in the following example.

Example 3.12.

Consider an AA-PD framework with five p-rules:

σ1¬σ2\sigma_{1}\leftarrow\neg\sigma_{2}:[1], σ2¬σ1\sigma_{2}\leftarrow\neg\sigma_{1}:[1], σ3¬σ1,¬σ4\sigma_{3}\leftarrow\neg\sigma_{1},\neg\sigma_{4}:[1],
σ4¬σ5\sigma_{4}\leftarrow\neg\sigma_{5}:[1], σ5¬σ4\sigma_{5}\leftarrow\neg\sigma_{4}:[1].

The arguments and attacks are shown in Figure 9. Calculating the joint probability distribution with maximum entropy reasoning, we have the solution

π(ωi)={0.25if ωi{01011,01100,10010,10100},0otherwise.\pi(\omega_{i})=\begin{cases}0.25&\quad\text{if }\omega_{i}\in\{01011,01100,10010,10100\},\\ 0&\quad\text{otherwise.}\end{cases}

This solution gives Pr(𝙰)=Pr(𝙱)=Pr(𝙲)=Pr(𝙳)=0.5\Pr(\mathtt{A})=\Pr(\mathtt{B})=\Pr(\mathtt{C})=\Pr(\mathtt{D})=0.5 and Pr(𝙴)=0.25\Pr(\mathtt{E})=0.25.

Without maximum entropy reasoning, a possible solution is

π(ωi)={0.33if ωi{01100,10010,10100},0otherwise.\pi(\omega_{i})=\begin{cases}0.33&\quad\text{if }\omega_{i}\in\{01100,10010,10100\},\\ 0&\quad\text{otherwise.}\end{cases}

This joint distribution gives Pr(𝙰)=Pr(𝙲)=0.67,Pr(𝙱)=Pr(𝙳)=0.33\Pr(\mathtt{A})=\Pr(\mathtt{C})=0.67,\Pr(\mathtt{B})=\Pr(\mathtt{D})=0.33 and Pr(𝙴)=0.\Pr(\mathtt{E})=0. Thus, Pr(𝙴)=0\Pr(\mathtt{E})=0 even though neither of the two arguments (𝙰\mathtt{A} and 𝙲\mathtt{C}) attacking 𝙴\mathtt{E} has probability 1.

Figure 9: The AA-PD framework shown in Example 3.12.

Propositions 3.7 - 3.10 describe attack relations between PD arguments that behave like attacks in AA (or other non-probabilistic argumentation frameworks). For instance, if we label arguments with probability 0 as 𝚘𝚞𝚝\mathtt{out} and arguments with probability 1 as 𝚒𝚗\mathtt{in}, then we obtain a labelling-based semantics as shown in [1]. Formally,

Theorem 3.1.

Given an AA-PD framework FF and the set 𝙰𝚜\mathtt{As} of arguments in FF, with maximum entropy reasoning, the Probabilistic Labelling function Ξ:𝙰𝚜{𝚒𝚗,𝚘𝚞𝚝,𝚞𝚗𝚍𝚎𝚌}\Xi:\mathtt{As}\mapsto\{\mathtt{in},\mathtt{out},\mathtt{undec}\} defined as

Ξ(𝙰)={𝚒𝚗if Pr(𝙰)=1,𝚘𝚞𝚝if Pr(𝙰)=0,𝚞𝚗𝚍𝚎𝚌otherwise.\Xi(\mathtt{A})=\begin{cases}\mathtt{in}&\quad\text{if }\Pr(\mathtt{A})=1,\\ \mathtt{out}&\quad\text{if }\Pr(\mathtt{A})=0,\\ \mathtt{undec}&\quad\text{otherwise.}\end{cases}

in which 𝙰𝙰𝚜\mathtt{A}\in\mathtt{As}, is a complete labelling.

Theorem 3.1 bridges PD and AA semantically as relations between the complete labelling and argument extensions have been studied extensively in e.g., [1, 7, 8]. In short, labelling can be mapped to extensions in the way that given some semantics ss, arguments that are labelled 𝚒𝚗\mathtt{in} with ss-labelling belong to an ss-extension. Moreover, the complete labelling can be viewed at the centre of defining labellings for other semantics [57]. For instance, a grounded labelling is a complete labelling such that the set of arguments labelled in is minimal with respect to set inclusion among all complete labellings; a stable labelling is a complete labelling such that the set of undecided arguments is empty; a preferred labelling is a complete labelling such that the set of arguments labelled in is maximal with respect to set inclusion among all complete labellings [57].

One last result we would like to present on AA-PD frameworks is the following.

Proposition 3.11.

Given an AA-PD framework FF, let 𝙰\mathtt{A} be an argument in FF and 𝙰𝚜\mathtt{As} the set of arguments attacking 𝙰\mathtt{A}, 𝙰𝙰𝚜\mathtt{A}\not\in\mathtt{As}. Pr(𝙰)1𝙱𝙰𝚜Pr(𝙱)\Pr(\mathtt{A})\geq 1-\sum_{\mathtt{B}\in\mathtt{As}}\Pr(\mathtt{B}).

This is the “optimistic criterion” introduced in [36, 37]. We will discuss more on the relation between AA-PD and probabilistic abstract argumentation in Section 5.1.

In this section, we have introduced arguments built with p-rules and attacks between arguments in PD frameworks. In PD frameworks, arguments are deductions, as they are in ABA frameworks [11]. Attacks are defined syntactically such that an argument 𝙰\mathtt{A} attacks another argument 𝙱\mathtt{B} if the claim of 𝙰\mathtt{A} is the negation of some literal in 𝙱\mathtt{B}. We have compared PD with AA and shown that AA frameworks can be mapped to PD frameworks containing only rules assigned probability 1. The key insight is that the probability semantics given by PD can be viewed as a complete labelling as defined in AA.

4 Probability Calculation

So far we have introduced the probability semantics of p-rules in Section 2 and argument construction in Section 3. In both sections, we have assumed that the joint probability distribution π\pi for the CC set can be computed. In this section, we study methods for computing π\pi from p-rules. We look at methods for computing exact solutions as well as their approximations.

4.1 Compute Joint Distribution with Linear Programming

We begin with methods for computing exact solutions. Given a set of p-rules ={ρ1,,ρm}\mathcal{R}=\{\rho_{1},\ldots,\rho_{m}\} such that \mathcal{L} contains nn literals, to test whether \mathcal{R} is Rule-PSAT, we set up a linear system

Aπ=B,A\pi=B, (24)

where AA is an (m+1)(m+1)-by-2n2^{n} matrix, π=[π(ω1),,π(ω2n)]T\pi=[\pi(\omega_{1}),\ldots,\pi(\omega_{2^{n}})]^{T}, and BB an (m+1)(m+1)-by-11 matrix. (Footnote 7: We let {ω1,,ω2n}\{\omega_{1},\ldots,\omega_{2^{n}}\} be the CC set of \mathcal{L}. We consider the elements in this set to be ordered by their Boolean values. E.g., for ={σ0,σ1}\mathcal{L}=\{\sigma_{0},\sigma_{1}\}, the four elements in the CC set are ordered such that {ω1=¬σ0¬σ1,ω2=¬σ0σ1,ω3=σ0¬σ1,ω4=σ0σ1}\{\omega_{1}=\neg\sigma_{0}\wedge\neg\sigma_{1},\omega_{2}=\neg\sigma_{0}\wedge\sigma_{1},\omega_{3}=\sigma_{0}\wedge\neg\sigma_{1},\omega_{4}=\sigma_{0}\wedge\sigma_{1}\}.) We construct AA and BB in a way such that \mathcal{R} is Rule-PSAT if and only if π\pi has a solution in [0,1]2n[0,1]^{2^{n}}, as follows.

For each rule ρi\rho_{i}\in\mathcal{R}, if ρi=σ0:[θ]\rho_{i}=\sigma_{0}\leftarrow:[\theta] has an empty body, then

A[i,j]={1,if ωjσ0;0,otherwise;A[i,j]=\begin{cases}1,&\mbox{if }\omega_{j}\models\sigma_{0}\mbox{;}\\ 0,&\mbox{otherwise};\end{cases} (25)

and

B[i]=θ.B[i]=\theta. (26)

Otherwise, if ρi=σ0σ1,,σk:[θ]\rho_{i}=\sigma_{0}\leftarrow\sigma_{1},\ldots,\sigma_{k}:[\theta], then

A[i,j]={θ1,if ωjσ0σ1σk;θ,if ωj¬σ0σ1σk;0,otherwise;A[i,j]=\begin{cases}\theta-1,&\mbox{if }\omega_{j}\models\sigma_{0}\wedge\sigma_{1}\wedge\ldots\wedge\sigma_{k}\mbox{;}\\ \theta,&\mbox{if }\omega_{j}\models\neg\sigma_{0}\wedge\sigma_{1}\wedge\ldots\wedge\sigma_{k}\mbox{;}\\ 0,&\mbox{otherwise};\end{cases} (27)
B[i]=0.B[i]=0. (28)

Row m+1m+1 of AA is the all-ones row and the corresponding entry of BB is 1, which enforces that the probabilities sum to 1.

Example 4.1.

Consider a p-rule set containing two p-rules, ρ0,ρ1\rho_{0},\rho_{1}. Let

ρ0=σ0σ1:[α],ρ1=σ1:[β].\rho_{0}=\sigma_{0}\leftarrow\sigma_{1}:[\alpha],\rho_{1}=\sigma_{1}\leftarrow:[\beta].

Here, m=2m=2, n=2n=2. From Equations 24 to 28, we have

A=\begin{bmatrix}0&\alpha&0&\alpha-1\\ 0&1&0&1\\ 1&1&1&1\end{bmatrix},

π=[π(¬σ0¬σ1),π(¬σ0σ1),π(σ0¬σ1),π(σ0σ1)]T\pi=[\pi(\neg\sigma_{0}\wedge\neg\sigma_{1}),\pi(\neg\sigma_{0}\wedge\sigma_{1}),\pi(\sigma_{0}\wedge\neg\sigma_{1}),\pi(\sigma_{0}\wedge\sigma_{1})]^{T}, and B=[0,β,1]T.B=[0,\beta,1]^{T}. It is easy to see that π\pi has solutions as shown in Example 2.2.
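
The construction in Equations 24-28 can be implemented directly. The sketch below (ours; it assumes NumPy and SciPy are available, uses linprog with a zero objective purely as a feasibility check, and instantiates Example 4.1 at α = 0.4 and β = 0.8) builds AA and BB and tests Rule-PSAT as stated in Theorem 4.1 below.

```python
# Illustrative sketch of Equations 24-28 (assumes NumPy/SciPy; the literal
# encoding is ours). A literal is (index, sign); a p-rule is (head, body, theta).
from itertools import product
import numpy as np
from scipy.optimize import linprog

n = 2                                      # language {sigma0, sigma1}
worlds = list(product([0, 1], repeat=n))   # ordered by Boolean value (footnote 7)
rules = [((0, True), [(1, True)], 0.4),    # sigma0 <- sigma1 : [0.4]
         ((1, True), [], 0.8)]             # sigma1 <-        : [0.8]

def holds(world, lit):
    idx, sign = lit
    return (world[idx] == 1) == sign

m = len(rules)
A = np.zeros((m + 1, 2 ** n))
B = np.zeros(m + 1)
for i, (head, body, theta) in enumerate(rules):
    for j, w in enumerate(worlds):
        if not body:                                   # Equations 25 and 26
            A[i, j] = 1.0 if holds(w, head) else 0.0
        elif all(holds(w, lit) for lit in body):       # Equations 27 and 28
            A[i, j] = theta - 1 if holds(w, head) else theta
    B[i] = theta if not body else 0.0
A[m, :] = 1.0                                          # last row: probabilities
B[m] = 1.0                                             # sum to 1

# Rule-PSAT iff A pi = B has a solution in [0,1]^{2^n}; a zero objective
# turns the LP into a pure feasibility check.
res = linprog(np.zeros(2 ** n), A_eq=A, b_eq=B, bounds=[(0, 1)] * (2 ** n))
print(res.status == 0, res.x)
```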

Theorem 4.1.

[20] Given a set of p-rules \mathcal{R} on some language \mathcal{L}, \mathcal{R} is Rule-PSAT if and only if Equation 24 has a solution for π\pi in [0,1]2n[0,1]^{2^{n}}.

4.2 Compute Joint Distribution under P-CWA

To reason with P-CWA, additional equations must be introduced as constraints. To this end, we revise the construction of matrices AA and BB in Equation 24 as given in Equations 25 & 27 and Equations 26 & 28, respectively, to meet the requirement given in Equation 18.

In revising constructions of these two matrices, one useful observation we can make is that the P-CWA constraint

ωiΩ,ωixπ(ωi)=ωiΩ,ωiSπ(ωi)\sum_{\omega_{i}\in\Omega,\omega_{i}\models x}\pi(\omega_{i})=\sum_{\omega_{i}\in\Omega,\omega_{i}\models S}\pi(\omega_{i})

can be computed “locally” in the sense that one does not need to explicitly identify SS, the disjunction of conjunctions of literals that are in deductions for xx (Equation 17), when computing Pr(x)\Pr(x) for each literal xx. This is important as, if we were to identify SS explicitly upon computing Pr(x)\Pr(x) for each xx, we would need to compute all deductions of xx, which is both repetitive and computationally expensive. We first illustrate the “local computation” idea with two examples and then present the algorithm along with a formal proof.

Consider three p-rules:

σ0σ1:[],σ1σ2:[],σ2:[]\sigma_{0}\leftarrow\sigma_{1}:[\cdot],\sigma_{1}\leftarrow\sigma_{2}:[\cdot],\sigma_{2}\leftarrow:[\cdot].

Directly applying the definition of P-CWA, we have

  • Pr(σ0)=π(σ0σ1σ2)\Pr(\sigma_{0})=\pi(\sigma_{0}\wedge\sigma_{1}\wedge\sigma_{2}) from the deduction {σ0,σ1,σ2}σ0\{\sigma_{0},\sigma_{1},\sigma_{2}\}\vdash\sigma_{0} (Footnote 8: To simplify the presentation, we use π(s)\pi(s) to denote ωΩ,ωsπ(ω)\sum_{\omega\in\Omega,\omega\models s}\pi(\omega). E.g., π(σ0σ1σ2)\pi(\sigma_{0}\wedge\sigma_{1}\wedge\sigma_{2}) denotes ωΩ,ωσ0σ1σ2π(ω).\sum_{\omega\in\Omega,\omega\models\sigma_{0}\wedge\sigma_{1}\wedge\sigma_{2}}\pi(\omega).), and

  • Pr(σ1)=π(σ1σ2)\Pr(\sigma_{1})=\pi(\sigma_{1}\wedge\sigma_{2}) from the deduction {σ1,σ2}σ1\{\sigma_{1},\sigma_{2}\}\vdash\sigma_{1}.

However, if we were to take the “global” view and directly encode

Pr(σ0)=π(σ0σ1σ2)\Pr(\sigma_{0})=\pi(\sigma_{0}\wedge\sigma_{1}\wedge\sigma_{2})

with the equation

ωΩ,ωσ0,ω⊧̸σ0σ1σ2π(ω)=0\sum_{\omega\in\Omega,\omega\models\sigma_{0},\omega\not\models\sigma_{0}\wedge\sigma_{1}\wedge\sigma_{2}}\pi(\omega)=0 (29)

then we must traverse all three rules to find the deduction {σ0,σ1,σ2}𝙳σ0\{\sigma_{0},\sigma_{1},\sigma_{2}\}\vdash_{\mathtt{D}}\sigma_{0}. Instead of doing this traversal, we can simply encode

  • Pr(σ0)=π(σ0σ1)\Pr(\sigma_{0})=\pi(\sigma_{0}\wedge\sigma_{1}) from the p-rule σ0σ1:[]\sigma_{0}\leftarrow\sigma_{1}:[\cdot] with

    ωΩ,ωσ0,ω⊧̸σ0σ1π(ω)=0,\sum_{\omega\in\Omega,\omega\models\sigma_{0},\omega\not\models\sigma_{0}\wedge\sigma_{1}}\pi(\omega)=0, (30)

    and

  • Pr(σ1)=π(σ1σ2)\Pr(\sigma_{1})=\pi(\sigma_{1}\wedge\sigma_{2}) from the p-rule σ1σ2:[]\sigma_{1}\leftarrow\sigma_{2}:[\cdot] with

    ωΩ,ωσ1,ω⊧̸σ1σ2π(ω)=0.\sum_{\omega\in\Omega,\omega\models\sigma_{1},\omega\not\models\sigma_{1}\wedge\sigma_{2}}\pi(\omega)=0. (31)

The new Equations 30 and 31 are “local”: given a rule headbodyhead\leftarrow body, we simply assert Pr(head)=π(headbody)\Pr(head)=\pi(head\wedge body). There is no deduction construction or multi-rule traversal, as would be needed for constructing Equation 29. To see their equivalence, we show that they assign the same set of ωΩ\omega\in\Omega to 0.

Example 4.2.

We examine the ω\omega assigned to 0 from each equation. To simplify the presentation, we again use the Boolean string representation introduced in Example 2.5 for literals. E.g., “110” denotes σ0σ1¬σ2\sigma_{0}\wedge\sigma_{1}\wedge\neg\sigma_{2}.

  • Pr(σ0)=π(σ0σ1σ2)\Pr(\sigma_{0})=\pi(\sigma_{0}\wedge\sigma_{1}\wedge\sigma_{2}) asserts that π(100)=π(101)=π(110)=0\pi(100)=\pi(101)=\pi(110)=0. (Footnote 9: These are easy to see as the first bit in the three-bit string must be 1 so that the conjunction represented by the string satisfies σ0\sigma_{0}; the remaining two bits cannot both be 1, as that would make the conjunction satisfy σ0σ1σ2\sigma_{0}\wedge\sigma_{1}\wedge\sigma_{2}. So we have 100, 101, and 110 produced in this case.)

  • Pr(σ1)=π(σ1σ2)\Pr(\sigma_{1})=\pi(\sigma_{1}\wedge\sigma_{2}) asserts that π(010)=π(110)=0\pi(010)=\pi(110)=0. (Footnote 10: Similarly, in this case the second bit must be 1 to satisfy σ1\sigma_{1}, and the third bit must be 0 to not satisfy σ1σ2\sigma_{1}\wedge\sigma_{2}. There is no constraint on the first bit, so we produce 010 and 110 in this case.)

  • Pr(σ0)=π(σ0σ1)\Pr(\sigma_{0})=\pi(\sigma_{0}\wedge\sigma_{1}) asserts that π(100)=π(101)=0\pi(100)=\pi(101)=0.

We see that the only difference between

Pr(σ0)=π(σ0σ1σ2)\Pr(\sigma_{0})=\pi(\sigma_{0}\wedge\sigma_{1}\wedge\sigma_{2}) and Pr(σ0)=π(σ0σ1)\Pr(\sigma_{0})=\pi(\sigma_{0}\wedge\sigma_{1})

is on setting π(110)=0\pi(110)=0. Yet, this is asserted by Pr(σ1)=π(σ1σ2)\Pr(\sigma_{1})=\pi(\sigma_{1}\wedge\sigma_{2}), which is available in both the “global” and the “local” versions of constraint.

The above example illustrates the case where p-rules with different heads are chained. When there are two p-rules with the same head, e.g., there exist

σ0σ1\sigma_{0}\leftarrow\sigma_{1}:[\cdot] and σ0σ2\sigma_{0}\leftarrow\sigma_{2}:[\cdot],

then we assert

Pr(σ0)=π((σ0σ1)(σ0σ2))=ωΩ,ω(σ0σ1)(σ0σ2)π(ω).\Pr(\sigma_{0})=\pi((\sigma_{0}\wedge\sigma_{1})\vee(\sigma_{0}\wedge\sigma_{2}))=\sum_{\omega\in\Omega,\omega\models(\sigma_{0}\wedge\sigma_{1})\vee(\sigma_{0}\wedge\sigma_{2})}\pi(\omega).
Example 4.3.

Consider five p-rules:

σ0σ1\sigma_{0}\leftarrow\sigma_{1}:[\cdot], σ0σ2\sigma_{0}\leftarrow\sigma_{2}:[\cdot], σ1σ3\sigma_{1}\leftarrow\sigma_{3}:[\cdot], σ2\sigma_{2}\leftarrow:[\cdot], σ3\sigma_{3}\leftarrow:[\cdot].

With a direct application of P-CWA definition, we have

Pr(σ0)=π((σ0σ1σ3)(σ0σ2))\Pr(\sigma_{0})=\pi((\sigma_{0}\wedge\sigma_{1}\wedge\sigma_{3})\vee(\sigma_{0}\wedge\sigma_{2})) and Pr(σ1)=π(σ1σ3)\Pr(\sigma_{1})=\pi(\sigma_{1}\wedge\sigma_{3}).

We show that this is the same as asserting

Pr(σ0)=π((σ0σ1)(σ0σ2))\Pr(\sigma_{0})=\pi((\sigma_{0}\wedge\sigma_{1})\vee(\sigma_{0}\wedge\sigma_{2})) and Pr(σ1)=π(σ1σ3)\Pr(\sigma_{1})=\pi(\sigma_{1}\wedge\sigma_{3}).

Using the Boolean string representation, where, e.g., “1001” denotes σ0¬σ1¬σ2σ3\sigma_{0}\wedge\neg\sigma_{1}\wedge\neg\sigma_{2}\wedge\sigma_{3}:

  • with Pr(σ0)=π((σ0σ1σ3)(σ0σ2))\Pr(\sigma_{0})=\pi((\sigma_{0}\wedge\sigma_{1}\wedge\sigma_{3})\vee(\sigma_{0}\wedge\sigma_{2})), we assert

    π(1000)=π(1001)=π(1100)=0.\pi(1000)=\pi(1001)=\pi(1100)=0.
  • With Pr(σ1)=π(σ1σ3)\Pr(\sigma_{1})=\pi(\sigma_{1}\wedge\sigma_{3}), we assert

    π(0100)=π(0110)=π(1100)=π(1110)=0.\pi(0100)=\pi(0110)=\pi(1100)=\pi(1110)=0.
  • With Pr(σ0)=π((σ0σ1)(σ0σ2))\Pr(\sigma_{0})=\pi((\sigma_{0}\wedge\sigma_{1})\vee(\sigma_{0}\wedge\sigma_{2})), we assert

    π(1000)=π(1001)=0.\pi(1000)=\pi(1001)=0.

We can see that the difference between the “global” constraint

Pr(σ0)=π((σ0σ1σ3)(σ0σ2))\Pr(\sigma_{0})=\pi((\sigma_{0}\wedge\sigma_{1}\wedge\sigma_{3})\vee(\sigma_{0}\wedge\sigma_{2}))

and the “local” one

Pr(σ0)=π((σ0σ1)(σ0σ2))\Pr(\sigma_{0})=\pi((\sigma_{0}\wedge\sigma_{1})\vee(\sigma_{0}\wedge\sigma_{2}))

is on asserting π(1100)=0\pi(1100)=0. However, this is asserted by Pr(σ1)=π(σ1σ3)\Pr(\sigma_{1})=\pi(\sigma_{1}\wedge\sigma_{3}). Thus the “global” constraint is indeed satisfied by the “local” version.

Summarising these two examples, additional rows of matrices AA and BB describing P-CWA can be constructed as follows.

Given a set of p-rules \mathcal{R} such that Σ={σ1,,σm}\Sigma=\{\sigma_{1},\ldots,\sigma_{m}\} are heads of p-rules in \mathcal{R}, for each σΣ\sigma\in\Sigma, let

σσ11,,σl11:[],,σσ1m,,σlmm:[]\sigma\leftarrow\sigma_{1}^{1},\ldots,\sigma_{l1}^{1}:[\cdot],\ldots,\sigma\leftarrow\sigma_{1}^{m},\ldots,\sigma_{lm}^{m}:[\cdot]

be the p-rules in \mathcal{R} with head σ\sigma. We construct

s=(σσ11σl11)(σσ1mσlmm).s=(\sigma\wedge\sigma_{1}^{1}\wedge\ldots\wedge\sigma_{l1}^{1})\vee\ldots\vee(\sigma\wedge\sigma_{1}^{m}\wedge\ldots\wedge\sigma_{lm}^{m}). (32)

For each σΣ\sigma\in\Sigma, append a new row ii to AA such that

A[i,j]={1,if ωjσ and ωj⊧̸s,0,otherwise;A[i,j]=\begin{cases}1,&\mbox{if }\omega_{j}\models\sigma\mbox{ and }\omega_{j}\not\models s\mbox{,}\\ 0,&\mbox{otherwise};\end{cases} (33)

and

B[i]=0,B[i]=0, (34)

where j=12nj=1\ldots 2^{n} ranges over the column indices of AA and ωj\omega_{j} over the atomic conjunctions in Ω\Omega. Here, we again consider Ω\Omega to be ordered by the Boolean values of its elements, as in footnote 7.
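
The additional P-CWA rows can be generated in the same way; the sketch below (ours, reusing the rule, world and holds encoding from the previous sketch) returns one row per distinct head following Equations 32 and 33, with the corresponding entries of BB all set to 0 as in Equation 34.

```python
# Illustrative sketch of the P-CWA rows in Equations 32-34 (encoding as in
# the previous sketch). For each head sigma, a world that satisfies sigma
# but none of "sigma and body" for sigma's rules gets coefficient 1; the
# right-hand side of every such row is 0.
import numpy as np

def pcwa_rows(rules, worlds, holds):
    heads = {head for head, _, _ in rules}
    rows = []
    for sigma in heads:
        bodies = [body for head, body, _ in rules if head == sigma]
        row = np.zeros(len(worlds))
        for j, w in enumerate(worlds):
            satisfies_s = any(all(holds(w, lit) for lit in [sigma] + body)
                              for body in bodies)            # Equation 32
            if holds(w, sigma) and not satisfies_s:          # Equation 33
                row[j] = 1.0
        rows.append(row)
    return np.vstack(rows) if rows else np.zeros((0, len(worlds)))

# Appending pcwa_rows(...) to A (and zeros to B) before solving yields a
# P-CWA solution whenever the extended system is feasible (Theorem 4.2 below).
```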

Theorem 4.2.

Given a set of p-rules \mathcal{R} on nn literals, if there is a solution π[0,1]2n\pi\in[0,1]^{2^{n}} to

Aπ=BA\pi=B

with AA, BB constructed with Equations 25, 27, 33 and 26, 28, 34, respectively, then \mathcal{R} is P-CWA consistent and π\pi is a P-CWA solution.

Theorem 4.2 sanctions the correctness of coding the P-CWA criterion with local constraints. The proof of this theorem, given in A, is long-winded; however, the idea is simple. We first observe that the “global” constraints given by the P-CWA definition, which are defined with respect to deductions, require setting Pr(ωG)=0\Pr(\omega_{G})=0 for some ωGΩ\omega_{G}\in\Omega; the “local” constraints, given by Equations 33 and 34, which only need information at the level of p-rules, also set Pr(ωL)=0\Pr(\omega_{L})=0 for some ωLΩ\omega_{L}\in\Omega. The theorem states that the ωG\omega_{G}s and the ωL\omega_{L}s are the same set of atomic conjunctions. This is shown by our induction proof as follows:

  1. 1.

    when each deduction contains a single p-rule, it is easy to see that the set of ωL\omega_{L}s is the same as the set of ωG\omega_{G}s;

  2. 2.

    when a deduction contains multiple p-rules, assume it is the case that the ωL\omega_{L}s equal the ωG\omega_{G}s; then introducing any new p-rule will not break the equality. This is the case because for any ω\omega that is set to Pr(ω)=0\Pr(\omega)=0 by the global constraint but not by the local constraint defined by the new p-rule, we can find an existing p-rule that sets Pr(ω)=0\Pr(\omega)=0 for the same ω\omega.

    To see this, we observe that for a new p-rule of the form

    ρ=σ0σ1,,σn:[]\rho=\sigma_{0}\leftarrow\sigma_{1},\ldots,\sigma_{n}:[\cdot]

    with deduction

    𝙰={σ0,,σn,σn+1,,σn+m}𝙳σ0,\mathtt{A}=\{\sigma_{0},\ldots,\sigma_{n},\sigma_{n+1},\ldots,\sigma_{n+m}\}\vdash_{\mathtt{D}}\sigma_{0},

    the ω\omega that is set to Pr(ω)=0\Pr(\omega)=0 by 𝙰\mathtt{A} but not ρ\rho is of the form

    ω=σ0σn¬σk\omega=\sigma_{0}\wedge\ldots\wedge\sigma_{n}\wedge\ldots\wedge\neg\sigma_{k}\wedge\ldots

    in which σk{σn+1,,σn+m}\sigma_{k}\in\{\sigma_{n+1},\ldots,\sigma_{n+m}\}. However, Pr(ω)=0\Pr(\omega)=0 will be set for such ω\omega by the p-rule

    ρ=σσk,:[].\rho^{\prime}=\sigma^{*}\leftarrow\sigma_{k},\ldots:[\cdot].

    σ\sigma^{*} and ρ\rho^{\prime} must both exist in 𝙰\mathtt{A} as without them, there would not be σk𝙰\sigma_{k}\in\mathtt{A}. With the local constraint, ρ\rho^{\prime} will set Pr(σ¬σk)=0\Pr(\sigma^{*}\wedge\neg\sigma_{k}\wedge\ldots)=0.

4.3 Compute Maximum Entropy Solutions

To compute the maximum entropy distribution introduced in Section 2.3, we use

Maximize:

H(π1,,π2n)=i=12nπilog(πi),H(\pi_{1},\ldots,\pi_{2^{n}})=-\sum_{i=1}^{2^{n}}\pi_{i}\log(\pi_{i}), (35)

subject to:

Aπ\displaystyle A\pi =B,\displaystyle=B,
0\displaystyle 0 πi.\displaystyle\leq\pi_{i}.

As the objective function HH is nonlinear, the problem can no longer be solved with linear programming techniques. Thus, one possibility is to use a quadratic programming (QP) style approach, such as a trust-region method [61], which can optimise such objective functions with linear constraints and allows the specification of variable bounds.
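
For concreteness, the sketch below (ours; it assumes SciPy's trust-constr optimiser as a stand-in for the QP-style solver used in our experiments) maximises HH subject to Aπ=B and the bound constraints, on the two p-rules of Example 4.4 below.

```python
# Illustrative sketch (assumes SciPy; a stand-in for the QP-style solver used
# in our experiments) of maximising the entropy in Equation 35 subject to
# A pi = B and 0 <= pi <= 1, on the two p-rules of Example 4.4 below.
import numpy as np
from scipy.optimize import minimize, LinearConstraint, Bounds
from scipy.special import xlogy

def max_entropy(A, B):
    N = A.shape[1]
    def neg_entropy(p):
        return float(np.sum(xlogy(p, p)))                 # -H(p), with 0 log 0 = 0
    x0 = np.full(N, 1.0 / N)                              # start from uniform
    res = minimize(neg_entropy, x0, method="trust-constr",
                   constraints=[LinearConstraint(A, B, B)],  # equality A pi = B
                   bounds=Bounds(0.0, 1.0))
    return res.x

A = np.array([[0.0, 0.9, 0.0, -0.1],     # sigma0 <- sigma1 : [0.9]
              [0.0, 1.0, 0.0, 1.0],      # sigma1 <-        : [0.8]
              [1.0, 1.0, 1.0, 1.0]])     # probabilities sum to 1
B = np.array([0.0, 0.8, 1.0])
print(max_entropy(A, B))                 # approx [0.1, 0.08, 0.1, 0.72]
```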

Alternatively, to reduce the problem to a linear system and hence reduce the complexity, instead of maximising the von Neumann entropy HH (Equation 35), we can use the linear entropy [6], which approximates log(x)\log(x) with x1x-1 (the first-order term in the Taylor series of log(x)\log(x) around 1), and maximise

Hl(π1,,π2n)=i=12nπi(πi1),H_{l}(\pi_{1},\ldots,\pi_{2^{n}})=-\sum_{i=1}^{2^{n}}\pi_{i}(\pi_{i}-1), (36)

with Lagrange multipliers [4] as follows.

Consider AA as a column vector of mm row vectors

A=[𝒂𝟏𝒂𝒎],A=\begin{bmatrix}\bm{a_{1}}\\ \vdots\\ \bm{a_{m}}\end{bmatrix},

and B=[b1,,bm]TB=[b_{1},\ldots,b_{m}]^{T}. Define an auxiliary function LL:

L(π1,,π2n,λ1,,λm)=Hl(π1,,π2n)i=1mλi(𝒂𝒊πbi).L(\pi_{1},\ldots,\pi_{2^{n}},\lambda_{1},\ldots,\lambda_{m})=H_{l}(\pi_{1},\ldots,\pi_{2^{n}})-\sum_{i=1}^{m}\lambda_{i}(\bm{a_{i}}\cdot\pi-b_{i}).

λ1,,λm\lambda_{1},\ldots,\lambda_{m} are the Lagrange multipliers. We need to solve

π1,,π2n,λ1,,λmL(π1,,π2n,λ1,,λm)=0.\nabla_{\pi_{1},\ldots,\pi_{2^{n}},\lambda_{1},\ldots,\lambda_{m}}L(\pi_{1},\ldots,\pi_{2^{n}},\lambda_{1},\ldots,\lambda_{m})=0.

This amounts to solving the following m+2nm+2^{n} equations:

For i=1mi=1\ldots m:

𝒂𝒊πbi=0.\bm{a_{i}}\cdot\pi-b_{i}=0. (37)

For j=12nj=1\ldots 2^{n}:

πji=1mλi𝒂𝒊,𝒋=0.\pi_{j}-\sum_{i=1}^{m}\lambda_{i}\bm{a_{i,j}}=0. (38)

Together with the remaining mm equations (Equation 37), we have a new system that gives a maximum linear entropy solution to Aπ=BA\pi=B.

In a matrix form, we have

\begin{bmatrix}A&0\\ I_{2^{n}}&-A^{T}\end{bmatrix}\begin{bmatrix}\pi\\ \lambda\end{bmatrix}=\begin{bmatrix}B\\ 0\end{bmatrix}, (39)

where I2nI_{2^{n}} is the 2n2^{n}-by-2n2^{n} identity matrix, and λ=[λ1,,λm]\lambda=[\lambda_{1},\ldots,\lambda_{m}].

Example 4.4.

Consider a p-rule set with two p-rules

σ0σ1:[0.9]\sigma_{0}\leftarrow\sigma_{1}:[0.9] and σ1:[0.8]\sigma_{1}\leftarrow:[0.8].

To find the maximum linear entropy solution of π\pi using Lagrange multipliers, we start by constructing matrices AA and BB as:

A=\begin{bmatrix}0&0.9&0&-0.1\\ 0&1&0&1\\ 1&1&1&1\end{bmatrix},

B=[0,0.8,1]TB=[0,0.8,1]^{T}. Thus, m=3m=3 and

𝒂𝟏=[0,0.9,0,0.1],𝒂𝟐=[0,1,0,1],𝒂𝟑=[1,1,1,1].\bm{a_{1}}=[0,0.9,0,-0.1],\bm{a_{2}}=[0,1,0,1],\bm{a_{3}}=[1,1,1,1].

Using Equations 38 and 37, we have a linear system with 7 equations to solve:

0.9π20.1π4\displaystyle 0.9\pi_{2}-0.1\pi_{4} =0\displaystyle=0
π2+π4\displaystyle\pi_{2}+\pi_{4} =0.8\displaystyle=0.8
π1+π2+π3+π4\displaystyle\pi_{1}+\pi_{2}+\pi_{3}+\pi_{4} =1\displaystyle=1
π1λ3\displaystyle\pi_{1}-\lambda_{3} =0\displaystyle=0
π20.9λ1λ2λ3\displaystyle\pi_{2}-0.9\lambda_{1}-\lambda_{2}-\lambda_{3} =0\displaystyle=0
π3λ3\displaystyle\pi_{3}-\lambda_{3} =0\displaystyle=0
π4+0.1λ1λ2λ3\displaystyle\pi_{4}+0.1\lambda_{1}-\lambda_{2}-\lambda_{3} =0\displaystyle=0

In matrix form, we have:

\begin{bmatrix}0&0.9&0&-0.1&0&0&0\\ 0&1&0&1&0&0&0\\ 1&1&1&1&0&0&0\\ 1&0&0&0&0&0&-1\\ 0&1&0&0&-0.9&-1&-1\\ 0&0&1&0&0&0&-1\\ 0&0&0&1&0.1&-1&-1\end{bmatrix}\begin{bmatrix}\pi_{1}\\ \pi_{2}\\ \pi_{3}\\ \pi_{4}\\ \lambda_{1}\\ \lambda_{2}\\ \lambda_{3}\end{bmatrix}=\begin{bmatrix}0\\ 0.8\\ 1\\ 0\\ 0\\ 0\\ 0\end{bmatrix}.

Solving these, we find:

π1=0.1,\pi_{1}=0.1, π2=0.08,\pi_{2}=0.08, π3=0.1,\pi_{3}=0.1, π4=0.72\pi_{4}=0.72,
λ1=0.64,\lambda_{1}=-0.64, λ2=0.556,\lambda_{2}=0.556, λ3=0.1.\lambda_{3}=0.1.

Dropping the auxiliary variables λ1\lambda_{1}, λ2\lambda_{2} and λ3\lambda_{3}, from π1\pi_{1} to π4\pi_{4} we find Pr(σ0)=0.82\Pr(\sigma_{0})=0.82 and Pr(σ1)=0.8\Pr(\sigma_{1})=0.8.
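
The 7-by-7 system above can be verified numerically; a minimal sketch (assuming NumPy's dense solver) is:

```python
# Illustrative numerical check of Example 4.4 (assumes NumPy): solve the
# 7x7 Lagrange system and read off the maximum linear entropy solution.
import numpy as np

M = np.array([[0.0, 0.9, 0.0, -0.1,  0.0,  0.0,  0.0],
              [0.0, 1.0, 0.0,  1.0,  0.0,  0.0,  0.0],
              [1.0, 1.0, 1.0,  1.0,  0.0,  0.0,  0.0],
              [1.0, 0.0, 0.0,  0.0,  0.0,  0.0, -1.0],
              [0.0, 1.0, 0.0,  0.0, -0.9, -1.0, -1.0],
              [0.0, 0.0, 1.0,  0.0,  0.0,  0.0, -1.0],
              [0.0, 0.0, 0.0,  1.0,  0.1, -1.0, -1.0]])
rhs = np.array([0.0, 0.8, 1.0, 0.0, 0.0, 0.0, 0.0])

x = np.linalg.solve(M, rhs)
pi = x[:4]                      # pi1..pi4; x[4:] are the Lagrange multipliers
print(pi)                       # approx [0.1, 0.08, 0.1, 0.72]
print(pi[2] + pi[3])            # Pr(sigma0) = 0.82
print(pi[1] + pi[3])            # Pr(sigma1) = 0.8
```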

Figure 10: Linear Entropy vs. von Neumann Entropy Illustration for a language containing a single literal.

As illustrated in Figure 10, the linear entropy, HlH_{l}, gives a lower bound to the von Neumann entropy HH. More importantly, both HH and HlH_{l} attain their maxima when the probabilities are most equally distributed. It is easy to see that the distribution that maximizes linear entropy also maximizes von Neumann entropy; thus we do not lose accuracy while maximizing linear entropy.

4.4 Compute Solutions with Stochastic Gradient Descent

So far, all discussions on calculating the joint distribution have centered on solving the linear system

Aπ=BA\pi=B

with different constructions of AA and BB. Although linear programming with linprog computes exact solutions, it is computationally expensive. To obtain a more efficient approach, we consider a stochastic gradient descent (SGD) method for solving Aπ=BA\pi=B as follows.

Let A=[𝒂𝟏,,𝒂𝒎]TA=[\bm{a_{1}},\ldots,\bm{a_{m}}]^{T}, for i=1,,mi=1,\ldots,m, define

hπ(𝒂i)=𝒂iπ.h_{\pi}(\bm{a}_{i})=\bm{a}_{i}\cdot\pi.

Consider the squared loss function LL:

L(hπ(𝒂i))=i=1m(𝒂iπbi)2.L(h_{\pi}(\bm{a}_{i}))=\sum^{m}_{i=1}(\bm{a}_{i}\cdot\pi-b_{i})^{2}.

Then, solving Aπ=BA\pi=B amounts to finding π\pi^{*} such that

π\displaystyle\pi^{*} =argminπi=1mL(hπ(𝒂i))\displaystyle=\operatorname*{arg\,min}_{\pi}\sum_{i=1}^{m}L(h_{\pi}(\bm{a}_{i}))
=argminπi=1m(𝒂iπbi)2.\displaystyle=\operatorname*{arg\,min}_{\pi}\sum^{m}_{i=1}(\bm{a}_{i}\cdot\pi-b_{i})^{2}.

We use SGD to find the minimum point. Starting from some initial π=[π1,,π2n]\pi=[\pi_{1},\ldots,\pi_{2^{n}}], we loop over ii in 1,,m1,\ldots,m and update each πj\pi_{j} in π\pi iteratively with

πjmin(1,max(0,πj+Δπj))\pi_{j}\Leftarrow\min(1,\max(0,\pi_{j}+\Delta\pi_{j})) (40)

in which

\Delta\pi_{j} = -\eta\times\frac{\partial}{\partial\pi_{j}}(\bm{a}_{i}\cdot\pi-b_{i})^{2} = -\eta\times 2(\bm{a}_{i}\cdot\pi-b_{i})\frac{\partial}{\partial\pi_{j}}(\bm{a}_{i}\cdot\pi-b_{i}) = -\eta\times 2(\bm{a}_{i}\cdot\pi-b_{i})\bm{a}_{ij},

where η\eta is the learning rate (a small positive number).

Such a root-finding process can be viewed as training an unthresholded perceptron model [53] without an activation function using SGD. Each πi\pi_{i} is bounded in [0,1][0,1] in the updating step (Equation 40). Note that this is a generic method for solving linear systems; it does not rely on any specific construction of AA or BB. Thus, to compute maximum entropy solutions, we can use this approach to solve the linear system composed of Equations 37 and 38.

A prominent advantage of the SGD-based approach is the ability to control the error ζ=AπB\zeta=A\pi-B at run time, so the gradient descent loop terminates when |ζ||\zeta| is “small enough”. Moreover, as SGD can easily be parallelized on a GPU, its performance can be improved significantly.
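
A minimal sketch of this SGD solver (ours, not the implementation used in the experiments; the learning rate passed in the demo call is illustrative) applies the clipped update of Equation 40 row by row and stops once |ζ| falls below a tolerance.

```python
# Illustrative SGD solver for A pi = B (not the implementation used in our
# experiments). Applies the clipped update of Equation 40 row by row.
import numpy as np

def sgd_solve(A, B, eta=None, epochs=5000, tol=1e-3, seed=0):
    m, N = A.shape
    eta = 1.0 / N if eta is None else eta        # default 1/2^n as in Section 4.5
    pi = np.random.default_rng(seed).uniform(0.0, 1.0, size=N)
    for _ in range(epochs):
        for i in range(m):
            err = A[i] @ pi - B[i]
            pi = np.clip(pi - eta * 2.0 * err * A[i], 0.0, 1.0)   # Equation 40
        if np.linalg.norm(A @ pi - B) < tol:     # run-time control of |zeta|
            return pi
    return pi

# Example 4.1 with alpha = 0.4, beta = 0.8:
A = np.array([[0.0, 0.4, 0.0, -0.6],
              [0.0, 1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0, 1.0]])
B = np.array([0.0, 0.8, 1.0])
pi = sgd_solve(A, B, eta=0.05)
print(pi, np.linalg.norm(A @ pi - B))            # residual below the tolerance
```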

4.5 Computational Performance Studies

To study the performance of the presented Rule-PSAT algorithms, we run them on randomly generated p-rules. Given a language \mathcal{L}, to ensure that the p-rule sets defined over \mathcal{L} are Rule-PSAT, we first generate a random distribution π\pi over the CC set of \mathcal{L} by drawing samples from a discrete uniform distribution. Then we generate p-rules by randomly selecting literals from \mathcal{L} to be the head and body of each p-rule. The length of each p-rule is randomly selected between 1 and the size of the language. The probability of each generated p-rule is then computed from π\pi with Equations 3 and 4.

We separate our experiments into two groups: approaches that find a solution to Aπ=BA\pi=B and approaches that find maximum entropy solutions. Figure 11 shows the solver performance for the first group: LP, SGD (CPU) and SGD (GPU). In these experiments, the size of the language ranges from n=6n=6 to n=20n=20; the sizes of the p-rule sets are 64, 128, 256 and 512. The termination condition for SGD (with and without GPU) is |ζ|<103|\zeta|<10^{-3}. All experiments are conducted on a desktop PC with a Ryzen 2700 CPU (16 cores), 64GB RAM and an Nvidia 3090 GPU (24GB RAM). From this figure, we observe that although LP is faster than SGD when the size of the language is smaller than 10, SGD is significantly faster as the size of \mathcal{L} grows, especially when the GPU implementation is used.

Figure 11: Solver Performance for algorithms that find a solution to Aπ=BA\pi=B.

To study the performance of approaches that compute maximum entropy solutions, we first compare the entropies of the solutions found by these approaches, with results shown in Table 6. We observe that the approaches that optimize for linear entropy find the same solutions as the QP approach that maximizes von Neumann entropy, as expected; the small differences between these methods are likely caused by numerical errors. For these experiments, the size of the p-rule sets is 16. Figure 12 presents the solver performance in terms of speed. We see that although SGD is slower than LP and QP initially, it overtakes LP and QP as the size of \mathcal{L} grows.

Table 6: Entropies of Solutions found by different Approaches with Languages of difference Sizes.
Method |\mathcal{L}| = 6 |\mathcal{L}| = 7 |\mathcal{L}| = 8 |\mathcal{L}| = 9 |\mathcal{L}| = 10
QP maximize πilog(πi)-\sum\pi_{i}\log(\pi_{i}) 4.101 4.820 5.529 6.235 6.928
LP maximize πi(πi1)-\sum\pi_{i}(\pi_{i}-1) 4.103 4.821 5.529 6.235 6.928
SGD maximize πi(πi1)-\sum\pi_{i}(\pi_{i}-1) 4.101 4.824 5.529 6.235 6.929
Figure 12: Solver Performance for algorithms that find maximum entropy solution to Aπ=BA\pi=B.

In all experiments, the learning rate η\eta in our SGD implementations is set to 1/2n1/2^{n}, where nn is the size of the language. Momentum [45], a common technique in training neural networks, has been applied to speed up the SGD convergence rate. Namely, in each iteration, Δπj\Delta\pi_{j} is updated with

\Delta\pi_{j}\Leftarrow-\eta\times 2(\bm{a}_{i}\cdot\pi-b_{i})\bm{a}_{ij}+\alpha\Delta\pi_{j},

where α=0.99\alpha=0.99 is the momentum used in all experiments, and the Δπj\Delta\pi_{j} on the right-hand side is the value calculated in the previous iteration.

In summary, Table 7 presents characteristics of the Rule-PSAT solving approaches studied in this work. We see that SGD approaches with GPU implementation significantly outperform LP and QP methods in terms of scalability.

Table 7: Characteristics of Rule-PSAT Solving Approaches introduced in this section.
Method Exact Solution Maximum Entropy Solution
LP solve Aπ=BA\pi=B Yes No
QP maximize πilog(πi)-\sum\pi_{i}\log(\pi_{i}) Yes Yes (von Neumann)
LP maximize πi(πi1)-\sum\pi_{i}(\pi_{i}-1) Yes Yes (Linear)
SGD solve Aπ=BA\pi=B No No
SGD maximize πi(πi1)-\sum\pi_{i}(\pi_{i}-1) No Yes (Linear)
SGD solve Aπ=BA\pi=B (GPU) No No
SGD maximize πi(πi1)-\sum\pi_{i}(\pi_{i}-1) (GPU) No Yes (Linear)

5 Discussion and Conclusion

In this work, we have presented a novel probabilistic structured argumentation framework, Probabilistic Deduction (PD). Syntactically, PD frameworks are defined with probabilistic rules (p-rules) in the form of

σ0σ1,,σn:[θ],\sigma_{0}\leftarrow\sigma_{1},\ldots,\sigma_{n}:[\theta],

which is read as the conditional probability Pr(σ0|σ1,,σn)=θ\Pr(\sigma_{0}|\sigma_{1},\ldots,\sigma_{n})=\theta. To reason with p-rules, we solve the rule probabilistic satisfiability problem to find the joint probability distribution over the language defining the p-rules and then compute probabilities for the literals in the language. We have introduced two different formulations for this process, the probabilistic open-world assumption (P-OWA) and the probabilistic closed-world assumption (P-CWA). With P-OWA, the joint distribution is solved based only on the conditional probabilities defined by the p-rules; with P-CWA, additional constraints are introduced to assert that the probability of a literal is the sum of the probabilities of all possible worlds that contain a deduction for the literal.

From p-rules, we build arguments as deductions in such a way that the leaves of a deduction are either literals that are heads of p-rules with empty bodies or literals for which there are p-rules for their negations. One argument attacks another when the claim of the former is the negation of some literal in the latter. The main technical achievement in this part is that we prove that, with maximum entropy reasoning, our probability semantics with P-CWA coincides with the complete semantics defined for non-probabilistic argumentation. We prove this for abstract argumentation via the mapping from AA frameworks to PD frameworks presented in Section 3.3.

Solving Rule-PSAT is at the core of reasoning with PD. We have investigated several different approaches for doing this using linear programming, quadratic programming and stochastic gradient descent. We have conducted experiments with these approaches on p-rule sets built on different sizes of languages and with different numbers of p-rules. We observe that stochastic gradient descent with GPU implementation outperforms all other approaches as the size of the language grows.

5.1 Relations with some Existing Works

As discussed in [20], Rule-PSAT is a variation of the probabilistic satisfiability (PSAT) problem introduced by Nilsson in [47]. Nilsson considered knowledge bases in Conjunctive Normal Form. A modus ponens example (Footnote 11: This example is used in [47]. The figure on the left hand side of Table 8 is a reproduction of Figure 2 in [47].),

If σ1\sigma_{1}, then σ0\sigma_{0}. σ1\sigma_{1}. Therefore, σ0\sigma_{0}.

is shown in Table 8. The probability of the conditional claim is α\alpha, that of the antecedent is β\beta, and that of the consequent is γ\gamma. With Nilsson’s probabilistic logic, this is interpreted as:

¬σ1σ0:[α]\neg\sigma_{1}\vee\sigma_{0}:[\alpha], σ1:[β]\sigma_{1}:[\beta], σ0:[γ]\sigma_{0}:[\gamma],

which gives rise to equations

π(¬σ1σ0)+π(σ1σ0)+π(¬σ1¬σ0)\displaystyle\pi(\neg\sigma_{1}\wedge\sigma_{0})+\pi(\sigma_{1}\wedge\sigma_{0})+\pi(\neg\sigma_{1}\wedge\neg\sigma_{0}) =α,\displaystyle=\alpha, (41)
π(σ1σ0)+π(σ1¬σ0)\displaystyle\pi(\sigma_{1}\wedge\sigma_{0})+\pi(\sigma_{1}\wedge\neg\sigma_{0}) =β,\displaystyle=\beta, (42)
π(σ1σ0)+π(¬σ1σ0)\displaystyle\pi(\sigma_{1}\wedge\sigma_{0})+\pi(\neg\sigma_{1}\wedge\sigma_{0}) =γ.\displaystyle=\gamma. (43)

With the probabilistic rules discussed in this work, the interpretation of modus ponens is given by the three p-rules as follows.

σ0σ1:[α]\sigma_{0}\leftarrow\sigma_{1}:[\alpha], σ1:[β]\sigma_{1}\leftarrow:[\beta], σ0:[γ]\sigma_{0}\leftarrow:[\gamma],

which gives rise to equations

π(σ0σ1)π(¬σ0σ1)+π(σ0σ1)\displaystyle\frac{\pi(\sigma_{0}\wedge\sigma_{1})}{\pi(\neg\sigma_{0}\wedge\sigma_{1})+\pi(\sigma_{0}\wedge\sigma_{1})} =α,\displaystyle=\alpha, (44)

together with Equations 42 and 43. The two shaded polyhedra shown in Table 8 illustrate the probabilistically consistent regions for α,β\alpha,\beta and γ\gamma, with probabilistic logic and with probabilistic rules, respectively, as defined by their corresponding equations together with Equations 1 and 2. The consistent region in the probabilistic logic case is a tetrahedron, with vertices (0,0,1), (1,0,0), (1,1,0) and (1,1,1). The consistent region in the probabilistic rule case is an octahedron, with vertices (0,0,0), (0,0,1), (0,1,0), (1,0,0), (1,1,0) and (1,1,1). It is argued in [46] that the conditional probability interpretation of modus ponens is more reasonable than the probabilistic logic interpretation in practical settings.

Table 8: Comparison of Consistent Probability Regions between Nilsson’s Probabilistic Logic and Probabilistic Rules on a modus ponens instance. [20]
Probabilistic Logic Probabilistic Rule
¬σ1σ0:[α]\neg\sigma_{1}\vee\sigma_{0}:[\alpha],     σ1:[β]\sigma_{1}:[\beta],     σ0:[γ]\sigma_{0}:[\gamma]. σ0σ1:[α]\sigma_{0}\leftarrow\sigma_{1}:[\alpha], σ1:[β]\sigma_{1}\leftarrow:[\beta], σ0:[γ]\sigma_{0}\leftarrow:[\gamma].

From this example, we observe that both methods are nothing but imposing constraints on the feasible regions of the spaces defined by clauses (in the case of probabilistic logic) or p-rules (in the case of probabilistic rules). In this sense, reasoning with such combined probability and logic formalisms is about identifying the feasible regions determined by solutions to π\pi in Aπ=BA\pi=B. (Footnote 12: Constructions of AA differ between Nilsson’s probabilistic logic and this work. However, both are designed for solving the joint probability distribution over the CC set.)

Hunter and Liu [34] make an interesting observation on representing scientific knowledge by combining probabilistic reasoning with logical reasoning. Quoting from [34]:

A key shortcoming of extending classical logic in order to handle probabilistic or statistical information, either by using a possible worlds approach or by adding a probability distribution to each model, is the computational complexity that it involves.

They suggest one may circumvent the computation of the joint probability distribution by considering approaches such as Bayesian networks. We certainly agree that computational difficulty is a major challenge. On the other hand, since [26] has shown that PSAT is NP-complete, there does not exist a shortcut that performs probabilistic reasoning “correctly” in general cases. Thus, any probabilistic reasoning approach that does not require the computation of the joint probability distribution either imposes probabilistic assumptions in the underlying model, such as independence, e.g., [44, 28], or topological constraints, e.g., conditional independence [14], as in Bayesian networks. In this work, we choose not to make such assumptions and study computational approaches with optimization techniques.

In the landscape of probabilistic argumentation, [55, 36, 37] give a detailed account of probabilistic abstract argumentation with the epistemic approach. They have described some “desirable” properties of probability semantics, which can be viewed as properties imposed on the joint probability distribution. As discussed in Section 1, the main difference distinguishing this work from existing ones is that we do not assume a given joint probability distribution. Yet, with our approach, we can still compute argument probabilities and thus compare against some of the properties they have proposed, as follows.

  • COH A probability distribution P is coherent if for arguments 𝙰 and 𝙱, if 𝙰 attacks 𝙱, then Pr(𝙰) + Pr(𝙱) ≤ 1.

    As shown in Proposition 3.5, COH holds in PD frameworks in general.

  • SFOU P is semi-founded if Pr(𝙰) ≥ 0.5 for every un-attacked 𝙰.

    This is not true in general PD frameworks, as one can use a p-rule

    σ0 ← : [0.2]

    to build an un-attacked argument 𝙰 = {σ0} ⊢ σ0 with Pr(𝙰) = 0.2 < 0.5. However, SFOU holds for AA-PD frameworks, as shown by Proposition 3.7.

  • FOU P is founded if every un-attacked argument has probability 1.

    This is not true in general PD frameworks, but it is true for AA-PD frameworks, as shown by Proposition 3.7.

  • SOPT P is semi-optimistic if Pr(𝙰) ≥ 1 − ∑_{𝙱 ∈ 𝙱𝚜} Pr(𝙱), where 𝙱𝚜 ≠ {} is the set of arguments attacking 𝙰.

    This is not true in general PD frameworks, as demonstrated in Example 3.6. It is true for AA-PD frameworks, as shown by Proposition 3.11.

  • OPT P is optimistic if Pr(𝙰) ≥ 1 − ∑_{𝙱 ∈ 𝙱𝚜} Pr(𝙱), where 𝙱𝚜 is the set of arguments attacking 𝙰.

    As in the previous case, this is not true in general PD frameworks but is true for AA-PD frameworks, by Proposition 3.11.

  • JUS P is justifiable if P is coherent and optimistic.

    Since PD frameworks are not optimistic in general, they are not justifiable in general; AA-PD frameworks are justifiable.

  • TER P is ternary if Pr(𝙰) ∈ {0, 0.5, 1} for all 𝙰.

    This is not true in general PD frameworks, as shown in e.g. Example 1.1. It is not true for AA-PD frameworks either, as illustrated in Example 3.12.

This comparison is encouraging, as one can take these properties introduced by Hunter and Thimm as a benchmark for probabilistic argumentation semantics. Observing that AA-PD frameworks, a subset of PD frameworks, conform to these properties helps us see the underlying connection between PD frameworks and existing work on probabilistic argumentation. At the same time, since general PD frameworks do not conform to the founded and optimistic properties, we observe the hierarchical structure shown in Figure 13.

Figure 13: The hierarchical structure between p-rules, PD frameworks and AA-PD frameworks.
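Because the properties above are stated purely in terms of argument probabilities and the attack relation, they can be checked mechanically once Pr(·) has been computed for all arguments. The sketch below is our own illustration (the dictionary encoding, argument names and tolerance are assumptions, not part of the PD formalism); it tests COH, FOU and OPT for a given assignment.

```python
def coherent(prob, attacks, tol=1e-9):
    """COH: Pr(A) + Pr(B) <= 1 whenever A attacks B."""
    return all(prob[a] + prob[b] <= 1 + tol for (a, b) in attacks)

def founded(prob, attacks, tol=1e-9):
    """FOU: every un-attacked argument has probability 1."""
    attacked = {b for (_, b) in attacks}
    return all(abs(prob[a] - 1) <= tol for a in prob if a not in attacked)

def optimistic(prob, attacks, tol=1e-9):
    """OPT: Pr(A) >= 1 - sum of Pr(B) over the arguments B attacking A."""
    return all(prob[a] >= 1 - sum(prob[b] for (b, x) in attacks if x == a) - tol
               for a in prob)

# The SFOU/FOU counterexample above: a single un-attacked argument with Pr = 0.2.
prob, attacks = {"A": 0.2}, set()
print(coherent(prob, attacks), founded(prob, attacks), optimistic(prob, attacks))
# -> True False False
```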

PD frameworks share some syntactic similarities with Probabilistic Assumption-based Argumentation (PABA) [18, 29, 12]. [20] has presented the differences between p-rules and PABA: PABA disallows rules forming cycles and requires that if two rules have the same head, the body of one must be a subset of the body of the other, whereas p-rules have no such constraints. Accordingly, the PD frameworks and AA-PD frameworks introduced in this work do not have these constraints either. More fundamentally, PABA is a constellations approach to probabilistic argumentation [12], whereas PD is an epistemic approach.

5.2 Future Work

Moving forward, there are three main research directions we will explore in the future. Firstly, as briefly explained in Section B.1, the current approaches for computing literal probabilities require either a Rule-PSAT solution (for reasoning with P-OWA) or a P-CWA consistent solution (for reasoning with P-CWA) found on the joint probability distribution. However, such a requirement renders "local" reasoning impossible, in the sense that one cannot deduce the probability of any literal in an inconsistent set of p-rules, even if the literal of interest is independent of the subset of p-rules that is inconsistent. (This is not much different from observing inconsistency in classical logic, in the sense that, with a classical logic knowledge base, having both p and ¬p co-exist trivializes the knowledge base.) In the future, we would like to explore probability semantics for such inconsistent sets of p-rules as well as their computational counterparts.

Secondly, even though we have shown that solving Rule-PSAT with SGD is a promising direction when compared with other approaches such as LP and QP, we are aware that the number of unknowns grows exponentially with respect to the size of the language. Thus, we would like to explore techniques that do not explicitly compute the 2^n unknowns defining the joint probability distribution, as suggested by e.g. [34]. To this end, there are two directions we will explore: (1) inspired by the column generation method commonly used in optimization, we would like to see whether a similar technique can be developed for reasoning with p-rules; and (2) we will investigate the existence of equivalent "local" semantics, in addition to the "global" semantics given in this work, for literal probability computation, especially in cases where the given PD framework can be assumed to be P-CWA consistent and P-CWA can be adopted.

Lastly, we would like to explore applications of PD. As a generic structured probabilistic argumentation framework, we believe the practical limits of PD can best be understood by applying it to problems from different domains. Just as ABA has seen applications in areas such as decision making and planning, we believe PD, with its ability to handle probabilistic information, could be suitable for solving problems in such domains. We would like to explore these potentials in the future.

References

  • [1] P. Baroni, M. Caminada, and M. Giacomin. An introduction to argumentation semantics. Knowl. Eng. Rev., 26(4):365–410, 2011.
  • [2] P. Baroni, D. Gabbay, M. Giacomin, and L. Van der Torre. Handbook of formal argumentation. College Publications, 2018.
  • [3] P. Besnard, A. Garcia, A. Hunter, S. Modgil, H. Prakken, G. Simari, and F. Toni. Special issue: Tutorials on structured argumentation. Argument & Computation, 5(1), 2014.
  • [4] G.S.G. Beveridge, S.G. Beveridge, and R.S. Schechter. Optimization: Theory and Practice. Chemical Engineering Series. McGraw-Hill, 1970.
  • [5] G. Bongiovanni, G. Postema, A. Rotolo, G. Sartor, C. Valentini, and D. Walton. Handbook of Legal Reasoning and Argumentation. Springer Netherlands, 2018.
  • [6] F. Buscemi, P. Bordone, and A. Bertoni. Linear entropy as an entanglement measure in two-fermion systems. Physical Review A, 75(3), mar 2007.
  • [7] M. Caminada. Argumentation semantics as formal discussion. FLAP, 4(8), 2017.
  • [8] M. Caminada and D. Gabbay. A Logical Account of Formal Argumentation. Studia Logica, 93(2):109–145, December 2009.
  • [9] R. Craven, F. Toni, C. Cadar, A. Hadad, and M. Williams. Efficient argumentation for medical decision-making. In Proc. of AAAI. AAAI Press, 2012.
  • [10] K. Čyras, B. Delaney, D. Prociuk, F. Toni, M. Chapman, J. Domínguez, and V. Curcin. Argumentation for explainable reasoning with conflicting medical recommendations. In Proc. of MedRACER 2018, pages 14–22. CEUR-WS.org, 2018.
  • [11] K. Čyras, X. Fan, C. Schulz, and F. Toni. Assumption-based argumentation: Disputes, explanations, preferences. IfCoLog JLTA, 4(8), 2017.
  • [12] K. Čyras, Q. Heinrich, and F. Toni. Computational complexity of flat and generic assumption-based argumentation, with and without probabilities. Artificial Intelligence, 293:103449, 2021.
  • [13] K. Čyras, A. Rago, E. Albini, P. Baroni, and F. Toni. Argumentative XAI: A survey. In Proc. of IJCAI, pages 4392–4399. ijcai.org, 2021.
  • [14] A. P. Dawid. Conditional independence in statistical theory. Journal of the Royal Statistical Society: Series B (Methodological), 41(1):1–15, 1979.
  • [15] D. Doder and S. Woltran. Probabilistic argumentation frameworks - A logical approach. In Proc. of SUM, pages 134–147. Springer, 2014.
  • [16] P. Dondio. Multi-valued and probabilistic argumentation frameworks. In Proc. of COMMA, volume 266, pages 253–260. IOS Press, 2014.
  • [17] P. M. Dung. On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence, 77(2):321–357, 1995.
  • [18] P.M. Dung and P.M. Thang. Towards (probabilistic) argumentation for jury-based dispute resolution. In Proc. of COMMA, pages 171–182. IOS Press, 2010.
  • [19] F. H. Van Eemeren and B. Verheij. Argumentation theory in formal and computational perspective. FLAP, 4(8), 2017.
  • [20] X. Fan. Rule-psat: Relaxing rule constraints in probabilistic assumption-based argumentation. In Proc. of COMMA, 2022.
  • [21] X. Fan, R. Craven, R. Singer, F. Toni, and M. Williams. Assumption-based argumentation for decision-making with preferences: A medical case study. In Proc. of CLIMA, pages 374–390, 2013.
  • [22] B. Fazzinga, S. Flesca, and F. Parisi. On efficiently estimating the probability of extensions in abstract argumentation frameworks. Int. J. Approx. Reason., 69:106–132, 2016.
  • [23] B. Fazzinga, S. Flesca, F. Parisi, and A. Pietramala. Computing or estimating extensions’ probabilities over structured probabilistic argumentation frameworks. FLAP, 3(2):177–200, 2016.
  • [24] J. Fox, D. Glasspool, D. Grecu, S. Modgil, M. South, and V. Patkar. Argumentation-based inference and decision making–a medical perspective. IEEE Intelligent Systems, 22(6):34–41, 2007.
  • [25] J. Gainsburg, J. Fox, and L. M. Solan. Argumentation and decision making in professional practice. Theory Into Practice, 55(4):332–341, 2016.
  • [26] G. F. Georgakopoulos, D. J. Kavvadias, and C. H. Papadimitriou. Probabilistic satisfiability. Journal of Complexity, 4(1):1–11, 1988.
  • [27] P. Hansen and B. Jaumard. Algorithms for the maximum satisfiability problem. Computing, 44(4):279–303, 1990.
  • [28] T. C. Henderson, R. Simmons, B. Serbinowski, M. Cline, D. Sacharny, X. Fan, and A. Mitiche. Probabilistic sentence satisfiability: An approach to PSAT. Artificial Intelligence, 278, 2020.
  • [29] N. D. Hung. Inference procedures and engine for probabilistic argumentation. International Journal of Approximate Reasoning, 90:163–191, 2017.
  • [30] A. Hunter. Some foundations for probabilistic abstract argumentation. In Proc. of COMMA, volume 245, pages 117–128. IOS Press, 2012.
  • [31] A. Hunter. A probabilistic approach to modelling uncertain logical arguments. International Journal of Approximate Reasoning, 2013.
  • [32] A. Hunter. Reasoning with inconsistent knowledge using the epistemic approach to probabilistic argumentation. In Proc. of KR, pages 496–505, 2020.
  • [33] A. Hunter. Argument strength in probabilistic argumentation based on defeasible rules. Int. J. Approx. Reason., 146:79–105, 2022.
  • [34] A. Hunter and W. Liu. A survey of formalisms for representing and reasoning with scientific knowledge. Knowl. Eng. Rev., 25(2):199–222, 2010.
  • [35] A. Hunter, S. Polberg, and M. Thimm. Epistemic graphs for representing and reasoning with positive and negative influences of arguments. Artificial Intelligence, 281:103236, 2020.
  • [36] A. Hunter and M. Thimm. Probabilistic argumentation with incomplete information. In Proc. of ECAI, pages 1033–1034. IOS Press, 2014.
  • [37] A. Hunter and M. Thimm. Probabilistic reasoning with abstract argumentation frameworks. J. Artif. Intell. Res., 59:565–611, 2017.
  • [38] E. T. Jaynes. Information theory and statistical mechanics. Phys. Rev., 106:620–630, May 1957.
  • [39] E.T. Jaynes and G.L. Bretthorst. Probability Theory: The Logic of Science. Cambridge University Press, 2003.
  • [40] N. Käfer, C. Baier, M. Diller, C. Dubslaff, S. Alice Gaggl, and H. Hermanns. Admissibility in probabilistic argumentation. J. Artif. Intell. Res., 74, 2022.
  • [41] N. Kökciyan, I. Sassoon, E. Sklar, S. Modgil, and S. Parsons. Applying metalevel argumentation frameworks to support medical decision making. IEEE Intelligent Systems, 36(2):64–71, 2021.
  • [42] N. Labrie and P. J. Schulz. Does argumentation matter? a systematic literature review on the role of argumentation in doctor–patient communication. Health communication, 29(10):996–1008, 2014.
  • [43] H. Li, N. Oren, and T. Norman. Probabilistic argumentation frameworks. In Proc. of TAFA, 2011.
  • [44] J. Ma, W. Liu, and A. Hunter. Inducing probability distributions from knowledge bases with (in)dependence relations. In Proc. of AAAI. AAAI Press, 2010.
  • [45] T. M. Mitchell. Machine Learning. McGraw-Hill, Inc., New York, NY, USA, 1 edition, 1997.
  • [46] N. Nilsson. Probabilistic logic revisited. Artificial Intelligence, 59(1-2):39–42, 1993.
  • [47] N. J. Nilsson. Probabilistic logic. Artificial Intelligence, 28(1):71–87, 1986.
  • [48] J. B. Paris. The uncertain reasoner’s companion: a mathematical perspective. Cambridge University Press, 1994.
  • [49] S. Polberg and D. Doder. Probabilistic abstract dialectical frameworks. In Eduardo Fermé and João Leite, editors, Proc. of JELIA, volume 8761, pages 591–599. Springer, 2014.
  • [50] R. Reiter. On closed world data bases. In Logic and Data Bases, Symposium on Logic and Data Bases, Centre d’études et de recherches de Toulouse, France, 1977, pages 55–76, New York, 1977. Plenum Press.
  • [51] R. Reiter and G. Criscuolo. On interacting defaults. In Proc. of IJCAI, pages 270–276. William Kaufmann, 1981.
  • [52] T. Rienstra. Towards a probabilistic dung-style argumentation system. In Proc. AT, volume 918, pages 138–152. CEUR-WS.org, 2012.
  • [53] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall Press, Upper Saddle River, NJ, USA, 3rd edition, 2009.
  • [54] X. Sun and B. Liao. Probabilistic argumentation, a small step for uncertainty, a giant step for complexity. In Proc. of EUMAS, pages 279–286. Springer, 2015.
  • [55] M. Thimm. A probabilistic semantics for abstract argumentation. In Proc. of ECAI, 2012.
  • [56] F. Toni. A tutorial on assumption-based argumentation. Argument & Computation, Special Issue: Tutorials on Structured Argumentation, 5(1):89–117, 2014.
  • [57] L. V. D. Torre and S. Vesic. The principle-based approach to abstract argumentation semantics. FLAP, 4(8), 2017.
  • [58] M. Ulbricht and R. Baumann. If nothing is accepted - repairing argumentation frameworks. J. Artif. Intell. Res., 66:1099–1145, 2019.
  • [59] J. Williamson. Handbook of the Logic of Argument and Inference: the Turn Toward the Practical, chapter Probability Logic, pages 397–424. Elsevier, 2002.
  • [60] A. Wilson-Lopez, A. R. Strong, C. M. Hartman, J. Garlick, K. H. Washburn, A. Minichiello, S. Weingart, and J. Acosta-Feliz. A systematic review of argumentation related to the engineering-designed world. Journal of Engineering Education, 109(2):281–306, 2020.
  • [61] Y. Yuan. Recent advances in trust region algorithms. Math. Program., 151(1):249–281, 2015.

Appendix A Proofs for the Results

Proof of Proposition 2.1

Since π is a consistent probability distribution, from Equation 1 we know that

0 ≤ ∑_{ω_i ∈ Ω, ω_i ⊨ σ} π(ω_i).

From Equation 2, we know that

∑_{ω_i ∈ Ω, ω_i ⊨ σ} π(ω_i) ≤ 1.

For each ω_i ∈ Ω, either ω_i ⊨ σ or ω_i ⊨ ¬σ; since

∑_{ω_i ∈ Ω} π(ω_i) = 1,

Equation 12 holds. ∎

Proof of Proposition 2.2

π^m exists as the set of p-rules is consistent: by Definition 2.3, there exists at least one solution π. It is unique because the set of feasible solutions is convex and the entropy function is strictly concave, so the entropy maximiser over this set is unique.
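To illustrate, π^m can be computed numerically by maximising entropy subject to the linear constraints induced by a consistent p-rule set. The sketch below is only an illustration on a two-atom toy instance: the rows of A encode Pr(σ1) = 0.7 and Pr(σ0 | σ1) = 0.8 (i.e. Pr(σ0 ∧ σ1) − 0.8·Pr(σ1) = 0) plus the total-probability row; the choice of scipy.optimize.minimize with SLSQP is ours and is not the solver studied in this work.

```python
import numpy as np
from scipy.optimize import minimize

# Worlds over (s0, s1): w1=(s0,s1), w2=(s0,~s1), w3=(~s0,s1), w4=(~s0,~s1).
A = np.array([[1.0, 0.0, 1.0, 0.0],    # Pr(s1) = 0.7
              [0.2, 0.0, -0.8, 0.0],   # Pr(s0 & s1) - 0.8 * Pr(s1) = 0
              [1.0, 1.0, 1.0, 1.0]])   # total probability = 1
B = np.array([0.7, 0.0, 1.0])

def neg_entropy(pi):
    p = np.clip(pi, 1e-12, 1.0)        # guard against log(0)
    return float(np.sum(p * np.log(p)))

res = minimize(neg_entropy, x0=np.full(4, 0.25), method="SLSQP",
               constraints=[{"type": "eq", "fun": lambda p: A @ p - B}],
               bounds=[(0.0, 1.0)] * 4)
print(res.x.round(3))  # the unique entropy maximiser over the (convex) solution set
```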

Proof of Lemma 2.1

Since α_ω > 0, there are ω'_1, …, ω'_k ∈ Ω such that

Pr(ω'_1) + … + Pr(ω'_k) = α_ω + β,

for some β ≥ 0, and H(ω, ω'_1, …, ω'_k) attains its maximum value when

Pr(ω) = Pr(ω'_1) = … = Pr(ω'_k) = (α_ω + β) / (k+1).

Since α_ω > 0,

(α_ω + β) / (k+1) > 0.

Proof of Corollary 2.1

This follows directly from Lemma 2.1 and the definitions of Pr_c(σ) and Pr_o(σ) (Equations 13 and 18).

Proof of Proposition 3.1

Let Σ = {s_1, …, s_k}. From Equation 23, the probability of an argument is the sum of Pr(ω_i) over the ω_i such that ω_i ⊨ s_1 ∧ … ∧ s_k. However, since Σ contains both σ_i and ¬σ_i, there is no ω_i such that ω_i ⊨ s_1 ∧ … ∧ s_k. Therefore Pr(𝙰) = 0. ∎

Proof of Proposition 3.2

This is a special case of Proposition 3.1. Since both σ and ¬σ are in Σ, there is no ω_i such that ω_i ⊨ … ∧ σ ∧ ¬σ ∧ …. Therefore Pr(𝙰) = 0.

Proof of Proposition 3.3

This is the case by Equations 18 and 23. From Equation 18, we see that

Pr(σ) = ∑_{ω_i ∈ Ω, ω_i ⊨ σ} π(ω_i).

Let Σ = {σ_1, …, σ_n} (note that σ ∈ Σ). Since any ω_i with ω_i ⊨ ⋀_{j=1}^{n} σ_j also satisfies ω_i ⊨ σ, we have

∑_{ω_i ∈ Ω, ω_i ⊨ ⋀_{j=1}^{n} σ_j} π(ω_i) ≤ ∑_{ω_i ∈ Ω, ω_i ⊨ σ} π(ω_i).

Therefore Pr(𝙰) ≤ Pr(σ). ∎

Proof of Proposition 3.4

Assume that such a 𝙱 exists; let

𝙱 = {σ, σ^b_1, …, σ^b_n} ⊢ σ.

Since Pr(𝙱) ≠ 0, Pr(σ ∧ σ^b_1 ∧ … ∧ σ^b_n) ≠ 0. Let

𝙰 = {σ, σ^a_1, …, σ^a_m} ⊢ σ.

Since 𝙰 ≠ 𝙱,

{σ^b_1, …, σ^b_n} ≠ {σ^a_1, …, σ^a_m}.

Thus, there exists some ω* ∈ Ω such that

ω* ⊨ σ ∧ σ^b_1 ∧ … ∧ σ^b_n, ω* ⊭ σ ∧ σ^a_1 ∧ … ∧ σ^a_m and Pr(ω*) ≠ 0.

Since

Pr(𝙰) = ∑_{ω ∈ S_1} Pr(ω),

for some S_1 ⊂ Ω such that ω* ∉ S_1, and

Pr(σ) = ∑_{ω ∈ S_2} Pr(ω),

for some S_2 ⊆ Ω such that ω* ∈ S_2, we have

Pr(𝙰) ≠ Pr(σ).

Contradiction. ∎

Proof of Proposition 3.5

Let

𝙰 = {σ^a_0, …, σ^a_{ka}} ⊢ σ^a_0

and

𝙱 = {σ^b_0, …, σ^b_{kb}} ⊢ σ^b_0.

Since 𝙰 attacks 𝙱, σ^a_0 = ¬σ^b_j for some j ∈ {0, …, kb}. Thus,

Pr(𝙰) = Pr(σ^a_0 ∧ … ∧ σ^a_{ka})

and

Pr(𝙱) = Pr(… ∧ ¬σ^a_0 ∧ …).

No ω ∈ Ω satisfies both

σ^a_0 ∧ … ∧ σ^a_{ka} and … ∧ ¬σ^a_0 ∧ ….

Since ∑_{ω ∈ Ω} Pr(ω) = 1,

Pr(𝙰) + Pr(𝙱) ≤ 1.

Proof of Proposition 3.6

This is easy to see: by Definition 3.5, the size of ℒ is the same as the size of 𝒜. In other words, an argument exists in 𝒜 if and only if it has a counterpart in 𝙰𝙰𝟸𝙿𝙳(F). Specifically, for each argument σ ∈ 𝒜, let {σ_1, …, σ_m} be the set of arguments attacking σ in F; then there is {σ, ¬σ_1, …, ¬σ_m} ⊢ σ in 𝙰𝙰𝟸𝙿𝙳(F).

Furthermore, for two arguments such that σ_a attacks σ_b in F, meaning that (σ_a, σ_b) ∈ 𝒯, we have arguments 𝙰 = {σ_a, …} ⊢ σ_a and 𝙱 = {σ_b, ¬σ_a, …} ⊢ σ_b in 𝙰𝙰𝟸𝙿𝙳(F). Clearly, 𝙰 attacks 𝙱. ∎

Proof of Proposition 3.7

This is trivially true, as arguments in an AA-PD framework take one of the following two forms:

𝙰 = {σ_0} ⊢ σ_0 or 𝙱 = {σ_0, ¬σ_1, …} ⊢ σ_0.

Arguments of the form 𝙰 are not attacked, whereas arguments of the form 𝙱 are attacked by some arguments. Since an argument of the form 𝙰 is composed of a single p-rule σ_0 ← : [1], we have Pr(𝙰) = 1.

Proof of Proposition 3.8

Let 𝙰 = {σ^a_1, …, σ^a_n} ⊢ σ^a_1 and 𝙱 = {σ^b_1, …, σ^b_m} ⊢ σ^b_1.

Since Pr(𝙰) = Pr(σ^a_1 ∧ … ∧ σ^a_n) = 1, we have

Pr(σ^a_i) = 1

for every σ^a_i ∈ {σ^a_1, …, σ^a_n}. Since 𝙰 attacks 𝙱, ¬σ^a_1 ∈ {σ^b_1, …, σ^b_m}.

Let ¬σ^a_1 = σ^b_k, for some k ∈ {1, …, m}. We have

Pr(σ^b_k) = 1 − Pr(σ^a_1) = 0.

Since Pr(𝙱) = Pr(… ∧ σ^b_k ∧ …) ≤ Pr(σ^b_k), we have

Pr(𝙱) = 0.

Proof of Proposition 3.9

(Sketch.) Let

𝙰 = {σ^a_1, …, σ^a_n} ⊢ σ^a_1 and 𝙱 = {σ^b_1, …, σ^b_m} ⊢ σ^b_1.

Since 𝙱 attacks 𝙰, there is some σ^a_k in 𝙰 such that σ^b_1 = ¬σ^a_k.

To show the first part: since 𝙰𝚜 contains all arguments attacking 𝙰, 𝙰𝚜 contains all arguments of the form _ ⊢ σ^b_1, of which 𝙱 is one. As all of them have probability 0, with P-CWA, Pr(σ^b_1) = 0. Thus, we have Pr(σ^a_k) = 1. This reasoning applies to every σ^a in 𝙰 such that ¬σ^a is the claim of some argument in 𝙰𝚜; for all such σ^a, we have Pr(σ^a) = 1.

To show the second part, again we note that

𝙰 = {σ^a_1, ¬σ^b_{1b}, …, ¬σ^b_{nb}} ⊢ σ^a_1,

such that {_ ⊢ σ^b_{1b}, …, _ ⊢ σ^b_{nb}} = 𝙰𝚜. Since Pr(𝙰) = 1, we have Pr(¬σ^b_{1b}) = … = Pr(¬σ^b_{nb}) = 1, and therefore Pr(σ^b_{1b}) = … = Pr(σ^b_{nb}) = 0. Thus, for all 𝙱 ∈ 𝙰𝚜, we have Pr(𝙱) = 0. ∎

Proof of Proposition 3.10

To show (1): from Proposition 3.7, we know that 𝙰 is attacked by some argument, thus 𝙰𝚜 is not empty. We show that if Pr(𝙱) < 1 for all 𝙱 ∈ 𝙰𝚜, then Pr(𝙰) > 0.

Let

𝙰 = {σ_0, ¬σ'_1, …, ¬σ'_n} ⊢ σ_0,

such that 𝙰𝚜 = {_ ⊢ σ'_1, …, _ ⊢ σ'_n}. We see that

Pr(𝙰) ≤ min(Pr(¬σ'_1), …, Pr(¬σ'_n)).

Since Pr(𝙱) < 1 for all 𝙱 ∈ 𝙰𝚜, min(Pr(¬σ'_1), …, Pr(¬σ'_n)) > 0. By Lemma 2.1, we know that Pr(𝙰) > 0. Therefore, Pr(𝙰) = 0 only if there exists at least one 𝙱 ∈ 𝙰𝚜 such that Pr(𝙱) = 1.

(2) follows from Proposition 3.8 directly. ∎

Proof of Theorem 3.1

This follows directly from Proposition 2 in [1] together with Propositions 3.9 and 3.10. ∎

Proof of Proposition 3.11

Let 𝙰 = {σ^a, ¬σ^b_1, …, ¬σ^b_n} ⊢ σ^a and 𝙰𝚜 = {𝙱_1 = _ ⊢ σ^b_1, …, 𝙱_n = _ ⊢ σ^b_n}, with F containing the following p-rules:

σ^a ← ¬σ^b_1, …, ¬σ^b_n : [1], σ^b_1 ← _ : [1], …, σ^b_n ← _ : [1].

This proposition is to show that

Pr(𝙰) + Pr(𝙱_1) + … + Pr(𝙱_n) ≥ 1.

Assume otherwise; then there exists ω ∈ Ω with Pr(ω) > 0 such that

ω ⊭ 𝙰, ω ⊭ 𝙱_1, …, ω ⊭ 𝙱_n.

Through a case analysis, we show this is not possible.

  • Case 1: ω ⊨ σ^b_i for some i ∈ {1, …, n}. By P-CWA, either ω ⊨ 𝙱_i or Pr(ω) = 0. Both contradict the assumption.

  • Case 2: ω ⊨ ¬σ^b_i for all i ∈ {1, …, n}. Then there are two sub-cases, as follows.

    • Case 2a: if ω ⊨ σ^a, then, by P-CWA, either ω ⊨ 𝙰 or Pr(ω) = 0.

    • Case 2b: if ω ⊭ σ^a, then either

      * Case 2b(i): ω ⊨ ¬σ^a ∧ ¬σ^b_1 ∧ … ∧ ¬σ^b_n; then Pr(ω) = 0 since, from the p-rule

        σ^a ← ¬σ^b_1, …, ¬σ^b_n : [1],

        we have

        Pr(σ^a ∧ ¬σ^b_1 ∧ … ∧ ¬σ^b_n) / Pr(¬σ^b_1 ∧ … ∧ ¬σ^b_n) = 1,

        so Pr(¬σ^a ∧ ¬σ^b_1 ∧ … ∧ ¬σ^b_n) = 0; or

      * Case 2b(ii): there exists σ^b_j such that ω ⊨ ¬σ^a ∧ … ∧ σ^b_j ∧ …. In this case, ω ⊨ σ^b_j, which is Case 1 with j in place of i.

Therefore, in all cases, either ω ⊨ 𝙰, or ω ⊨ 𝙱_i for some 𝙱_i ∈ 𝙰𝚜, or Pr(ω) = 0. ∎

Proof of Theorem 4.1

(Sketch.) Equations 1 to 4 are satisfied by a solution π in [0,1]^{2^n} as follows.

  1. If π ∈ [0,1]^{2^n}, then 0 ≤ π(ω_i) ≤ 1 for all ω_i.

  2. Since row m+1 of A consists of 1s and the (m+1)-th entry of B is 1, the sum of all the π(ω_i) is 1.

  3. For each p-rule σ0 ← : [θ], Equations 25 and 26 ensure that Equation 3 is satisfied.

  4. For each p-rule σ0 ← σ1, …, σk : [θ], Equations 27 and 28 ensure that Equation 4 is satisfied with simple algebra.

Thus, Equation 24, Aπ = B, is a linear system representation of Equations 1-4, which characterise probability distributions over the CC set of ℒ0 with conditionals. ∎
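As a concrete (toy) reading of this linear-system view, the sketch below assembles A and B for a two-atom language with p-rules σ1 ← : [0.7] and σ0 ← σ1 : [0.8]. The row encodings simply restate Pr(head) = θ for unconditional p-rules and Pr(head ∧ body) − θ·Pr(body) = 0 for conditional ones, rather than reproducing Equations 25-28 verbatim; the world ordering, helper names and the least-squares call are our own choices.

```python
import itertools
import numpy as np

atoms = ["s0", "s1"]
# The CC set: all 2^n truth assignments over the atoms.
worlds = list(itertools.product([True, False], repeat=len(atoms)))

def holds(world, literal):
    atom, positive = literal           # a literal is (atom, sign)
    return world[atoms.index(atom)] == positive

# p-rules as (head, body, theta); an empty body means an unconditional p-rule.
rules = [(("s1", True), [], 0.7),                    # s1 <-    : [0.7]
         (("s0", True), [("s1", True)], 0.8)]        # s0 <- s1 : [0.8]

rows, rhs = [], []
for head, body, theta in rules:
    if not body:                       # Pr(head) = theta
        rows.append([float(holds(w, head)) for w in worlds])
        rhs.append(theta)
    else:                              # Pr(head & body) - theta * Pr(body) = 0
        rows.append([float(holds(w, head) and all(holds(w, l) for l in body))
                     - theta * float(all(holds(w, l) for l in body)) for w in worlds])
        rhs.append(0.0)
rows.append([1.0] * len(worlds))       # the all-ones row m+1: probabilities sum to 1
rhs.append(1.0)

A, B = np.array(rows), np.array(rhs)
pi = np.linalg.lstsq(A, B, rcond=None)[0]  # one solution of A pi = B
print(A.shape, pi.round(3))            # non-negativity must still be checked separately
```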

Proof of Theorem 4.2

The key to this proof is showing that the "local" constraints given in Equations 33 and 34 are correct. In other words, we must generalize Examples 4.2 and 4.3 to p-rule sets composed of arbitrary numbers of p-rules. To this end, we use proof by induction.

Consider a language ℒ and a set of p-rules ℛ defined over ℒ. View ℛ as the union of m sets of p-rules,

ℛ = ⋃_{i=1}^{m} R^{(i)},

in which each set R^{(i)} = {ρ^{(i)}_1, …, ρ^{(i)}_r} contains r p-rules with the same head σ^{(i)} (although r is parametrized on (i), this is omitted to simplify the notation; we do not require all literals that are heads of rules to be heads of the same number of rules):

ρ^{(i)}_1 = σ^{(i)} ← σ_1^{(i),1}, …, σ_{r1}^{(i),1} : [·],

…,

ρ^{(i)}_r = σ^{(i)} ← σ_1^{(i),r}, …, σ_{rr}^{(i),r} : [·].

From p-rules, we define

L(σ^{(i)}) = (σ^{(i)} ∧ ⋀_{j=1}^{r1} σ_j^{(i),1}) ∨ … ∨ (σ^{(i)} ∧ ⋀_{j=1}^{rr} σ_j^{(i),r}),

f_L(σ^{(i)}) = {ω ∈ Ω | ω ⊨ σ^{(i)}, ω ⊭ L(σ^{(i)})},

Ω_L^{(i)} = ⋃_{j=1}^{i} f_L(σ^{(j)}).

L(σ^{(i)}) describes the local constraint defined by the p-rules with head σ^{(i)}; f_L(σ^{(i)}) is the set of atomic conjunctions ω that are set to have Pr(ω) = 0 by considering the p-rules with head σ^{(i)}; and Ω_L^{(i)} is the set of ωs that are set to have Pr(ω) = 0 by considering all p-rules with heads σ^{(1)}, …, σ^{(i)}.

Considering the "global" constraints, we have deductions built from p-rules. For each literal σ^{(i)} that is the head of a rule, we define a set of deductions D^{(i)} = {δ_1^{(i)}, …, δ_d^{(i)}}, in which

δ_1^{(i)} = {σ^{(i)}, σ_1^{(i),1}, …, σ_{d1}^{(i),1}} ⊢_D σ^{(i)},

…,

δ_d^{(i)} = {σ^{(i)}, σ_1^{(i),d}, …, σ_{dd}^{(i),d}} ⊢_D σ^{(i)}

(as in the p-rule case, although d is parametrized on (i), this is omitted to simplify the notation; we do not require all literals that are claims of deductions to have the same number of deductions).

From δ_1^{(i)}, …, δ_d^{(i)}, we define

G(σ^{(i)}) = (σ^{(i)} ∧ ⋀_{j=1}^{d1} σ_j^{(i),1}) ∨ … ∨ (σ^{(i)} ∧ ⋀_{j=1}^{dd} σ_j^{(i),d}),

f_G(σ^{(i)}) = {ω ∈ Ω | ω ⊨ σ^{(i)}, ω ⊭ G(σ^{(i)})},

Ω_G^{(i)} = ⋃_{j=1}^{i} f_G(σ^{(j)}).

f_G(σ^{(i)}) is the set of atomic conjunctions ω that are set to have Pr(ω) = 0 by considering the deductions for σ^{(i)}, and Ω_G^{(i)} is the set of ωs that are set to have Pr(ω) = 0 by considering all deductions for σ^{(1)}, …, σ^{(i)}. These are the atomic conjunctions obtained from the "global" constraints.

For the base case, i = 1, by the constructions of R^{(1)} and D^{(1)}, which give L(σ^{(1)}) and G(σ^{(1)}), we see that

Ω_L^{(1)} = Ω_G^{(1)}.

Assume that Ω_L^{(n)} = Ω_G^{(n)}; we show that Ω_L^{(n+1)} = Ω_G^{(n+1)}. We have

L(σ^{(n+1)}) = (σ^{(n+1)} ∧ ⋀_{j=1}^{r1} σ_j^{(n+1),1}) ∨ … ∨ (σ^{(n+1)} ∧ ⋀_{j=1}^{rr} σ_j^{(n+1),r}),

G(σ^{(n+1)}) = (σ^{(n+1)} ∧ ⋀_{j=1}^{d1} σ_j^{(n+1),1}) ∨ … ∨ (σ^{(n+1)} ∧ ⋀_{j=1}^{dd} σ_j^{(n+1),d}),

f_L(σ^{(n+1)}) = {ω ∈ Ω | ω ⊨ σ^{(n+1)}, ω ⊭ L(σ^{(n+1)})},

f_G(σ^{(n+1)}) = {ω ∈ Ω | ω ⊨ σ^{(n+1)}, ω ⊭ G(σ^{(n+1)})}.

By the definition of deduction, r ≤ d, r1 ≤ d1, r2 ≤ d2, and so on. It is easy to see that, for each σ^{(n+1)}, it holds that

f_L(σ^{(n+1)}) ⊆ f_G(σ^{(n+1)}).

Thus, to show Ω_L^{(n+1)} = Ω_G^{(n+1)}, we need to show that when f_L(σ^{(n+1)}) ⊂ f_G(σ^{(n+1)}), i.e., when there exists ω ∈ f_G(σ^{(n+1)}) with ω ∉ f_L(σ^{(n+1)}), there exists σ⁺ ∈ {σ^{(1)}, …, σ^{(n)}} such that

ω ∈ f_L(σ⁺) and therefore ω ∈ Ω_L^{(n)}.

We observe that any ω* ∈ f_G(σ^{(n+1)}) ∖ f_L(σ^{(n+1)}) is of the form

ω* = σ^{(n+1)} ∧ σ_1^{(n+1),k} ∧ … ∧ σ_{rk}^{(n+1),k} ∧ ¬σ*_1 ∧ … ∧ ¬σ*_e ∧ …

such that

  1. σ^{(n+1)} ← σ_1^{(n+1),k}, …, σ_{rk}^{(n+1),k} : [·] ∈ R^{(n+1)}, and

  2. for all δ ∈ D^{(n+1)}, there exists σ* ∈ {σ*_1, …, σ*_e} such that σ* is in δ.

In other words, ω* ⊨ σ^{(n+1)}, ω* ⊨ L(σ^{(n+1)}), and ω* ⊭ G(σ^{(n+1)}). Therefore, there does not exist Σ ⊢_D σ^{(n+1)} ∈ D^{(n+1)} such that

ω* ⊨ ⋀_{σ ∈ Σ} σ.

However, for such an ω*, there must exist σ⁺ ∈ {σ_1^{(n+1),k}, …, σ_{rk}^{(n+1),k}} such that the following two conditions C1 and C2 are met:

  • (C1): ω* ⊨ σ⁺, and

  • (C2): ω* ⊭ L(σ⁺).

Therefore ω* ∈ f_L(σ⁺). Moreover, since {σ_1^{(n+1),k}, …, σ_{rk}^{(n+1),k}} ⊆ {σ^{(1)}, …, σ^{(n)}}, we have σ⁺ ∈ {σ^{(1)}, …, σ^{(n)}}. Thus,

ω* ∈ Ω_L^{(n)}.

To see that such a σ⁺ exists, assume otherwise, i.e.,

ω* ⊨ L(σ_1^{(n+1),k}), …, ω* ⊨ L(σ_{rk}^{(n+1),k});

then there exist deductions

Σ_1 ⊢_D σ_1^{(n+1),k}, …, Σ_k ⊢_D σ_{rk}^{(n+1),k}

such that

ω* ⊨ ⋀_{σ ∈ Σ_1 ∪ … ∪ Σ_k} σ.

Since ω* ⊨ σ^{(n+1)}, we also have

ω* ⊨ ⋀_{σ ∈ Σ_1 ∪ … ∪ Σ_k ∪ {σ^{(n+1)}}} σ.

Since there is a p-rule σ^{(n+1)} ← σ_1^{(n+1),k}, …, σ_{rk}^{(n+1),k} : [·] ∈ R^{(n+1)}, we have

Σ_1 ∪ … ∪ Σ_k ∪ {σ^{(n+1)}} ⊢_D σ^{(n+1)} ∈ D^{(n+1)}.

Contradiction.

Therefore, there exists σ⁺ ∈ {σ_1^{(n+1),k}, …, σ_{rk}^{(n+1),k}} meeting conditions C1 and C2, so ω* ∈ Ω_L^{(n)} and Ω_L^{(n+1)} = Ω_G^{(n+1)}. ∎

Appendix B Miscellaneous

B.1 Reasoning under Inconsistency

As a "global" semantics, to reason with any literal in a set of p-rules R, we require R to be Rule-PSAT for computing literal and argument probabilities. However, such a consistency requirement may be unjustified if the literal or argument of interest is independent of the part of R that is inconsistent.

Consider the following example.

Example B.1.

Consider a p-rule set RR with three p-rules:

σ0 ← : [0.5], σ0 ← : [0.6], σ1 ← : [1].

Clearly, R is inconsistent, as we cannot have both Pr(σ0) = 0.5 and Pr(σ0) = 0.6. However, if one queries Pr(σ1), one would expect Pr(σ1) = 1 to still be returned from this set of inconsistent p-rules.

Designing a comprehensive solution to this problem of reasoning under inconsistency is beyond the scope of this work. However, a quick yet still reasonable approach is to directly relax the condition Aπ = B. Specifically, we formulate the following optimization problem.

minimize:

‖Aπ − B‖₁    (45)

subject to:

∑_{i=1}^{2^n} π_i = 1,
0 ≤ π_i.

Clearly, a π obtained in this way is a probability distribution over the CC set, from which one can compute literal probabilities as defined in Equation 13.
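The 1-norm objective above can be cast as a linear program by introducing one slack variable per row of A. The sketch below is a minimal illustration on the instance of Example B.1 (the world ordering, the use of scipy.optimize.linprog and the variable layout are our own choices): the two conflicting rules for σ0 cannot both be met, but the hard constraints keep π a distribution and Pr(σ1) = 1 is recovered.

```python
import numpy as np
from scipy.optimize import linprog

# Worlds over (s0, s1): w1=(s0,s1), w2=(s0,~s1), w3=(~s0,s1), w4=(~s0,~s1).
# Rows of A encode Pr(s0)=0.5, Pr(s0)=0.6, Pr(s1)=1 (the inconsistent Example B.1).
A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0]], dtype=float)
B = np.array([0.5, 0.6, 1.0])
m, n = A.shape

# Variables x = [pi_1..pi_n, t_1..t_m]; minimise sum(t) subject to |A pi - B| <= t,
# sum(pi) = 1, pi >= 0, t >= 0.
c = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.block([[A, -np.eye(m)],      #  A pi - t <=  B
                 [-A, -np.eye(m)]])    # -A pi - t <= -B
b_ub = np.concatenate([B, -B])
A_eq = np.concatenate([np.ones(n), np.zeros(m)]).reshape(1, -1)
b_eq = [1.0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (n + m))

pi = res.x[:n]
print("residual:", res.fun)      # 0.1: the two rules for s0 cannot both be satisfied
print("Pr(s1):", pi[0] + pi[2])  # 1.0 is recovered despite the inconsistency
```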

Alternatively, one can also (1) find some maximal subset R′ (with respect to ⊆) of R such that R′ is Rule-PSAT and compute literal probabilities on R′ (this is not unlike solving the MAXSAT problem [27] in a probabilistic setting), or (2) find new consistent probabilities θ for the p-rules in R such that they are "close" to the original probabilities (this is not unlike works that "fix" argumentation frameworks, e.g., [58]). We will explore these approaches in the future.