An ASP-Based Approach to Counterfactual Explanations for Classification
Abstract
We propose answer-set programs that specify and compute counterfactual interventions as a basis for causality-based explanations of decisions produced by classification models. They can be applied with black-box models and with models that can be specified as logic programs, such as rule-based classifiers. The main focus is on the specification and computation of maximum-responsibility causal explanations. The use of additional semantic knowledge is also investigated.
1 Introduction
Providing explanations for results obtained from machine-learning models has been recognized as critical in many applications, and has become an active research direction in the broader area of explainable AI, and explainable machine learning in particular [23]. This becomes particularly relevant when decisions are automatically made by those models, possibly with serious consequences for stakeholders. Since most of those models are algorithms learned from training data, providing explanations may not be easy or possible. These models are, or can be seen as, black-box models.
In AI, explanations have been investigated in several areas, and in particular under actual causality [16], where counterfactual interventions on a causal model are central [24]. These are hypothetical updates of the model's variables, used to explore whether and how the outcome of the model changes. In this way, explanations for an original output are defined and computed. Counterfactual interventions have been used with ML models, in particular with classification models [21, 27, 26, 17, 10, 20, 6].
In this work we introduce the notion of causal explanation as a set of feature values for the entity under classification that is most responsible for the outcome. The responsibility score is adopted and adapted from the general notion of responsibility used in actual causality [9]. Experimental results with the responsibility score, and comparisons with other scores, are reported in [6]. We also introduce answer-set programs (ASPs) that specify counterfactual interventions and causal explanations, and allow us to specify and compute the responsibility score. The programs can be applied with black-box models, and with rule-based classification models.
As we show in this work, our declarative approach to counterfactual interventions is particularly appropriate for bringing additional, declaratively specified semantic knowledge into the game, which is much more complicated to do with purely procedural approaches. In this way, we can combine logic-based specifications with the generic and optimized solvers behind ASP implementations.
This paper is structured as follows. Section 2 introduces the background, and the notions of counterfactual intervention and causal explanation; and the explanatory responsibility score, x-resp, on their basis. Section 3 introduces ASPs that specify causal explanations, the counterfactual ASPs. Section 4 argues for the need to include semantic domain knowledge in the specification of causal explanations. Section 5 discusses several issues raised by this work and possible extensions.
2 Counterfactual Explanations
We consider classification models, $\mathcal{C}$, that are represented by an input/output relation. Inputs are the so-called entities, $\mathbf{e}$, each of which is represented by a record (or vector) $\mathbf{e} = \langle F_1(\mathbf{e}), \ldots, F_n(\mathbf{e})\rangle$, where $F_i(\mathbf{e})$ is the value taken on by a feature $F_i \in \mathcal{F} = \{F_1, \ldots, F_n\}$, a set of functions. The output is represented by a label function $L$ that maps entities to $0$ or $1$, the binary result of the classification. That is, to simplify the presentation, we concentrate here on binary classifiers, but this is not essential. We also concentrate on features whose domains, $\mathit{Dom}(F_i)$, contain a finite number of categorical values. Cf. Section 4 for the transformation of numerical domains into categorical ones.
Building a classifier, $\mathcal{C}$, from a set of training data $T$, i.e. a set of pairs $\langle \mathbf{e}, L(\mathbf{e})\rangle$, with $L(\mathbf{e}) \in \{0,1\}$, is one of the most common tasks in machine learning [13]. It is about learning the label function $L$ for the entire domain of values, beyond $T$. We say that $L$ "represents" the classifier $\mathcal{C}$.
Classifiers may take many different internal forms. They could be decision trees, random forests, rule-based classifiers, logistic regression models, neural-network-based (or deep) classifiers, etc. [13]. Some of them are more "opaque" than others, i.e. they have a more complex and less interpretable internal structure and results [25]. Hence the need for explanations of their classification outcomes. In this work, we are not assuming that we have an explicit classification model, and we do not need one. All we need is to be able to invoke and use it. It could be a "black-box" model.
The problem is the following: Given an entity $\mathbf{e}$ that has received the label $L(\mathbf{e}) = 1$, provide an "explanation" for this outcome. In order to simplify the presentation, and without loss of generality, we assume that label $1$ is the one that has to be explained. It is the "negative" outcome one has to justify, such as the rejection of a loan application.
Causal explanations are defined in terms of counterfactual interventions that simultaneously change feature values in $\mathbf{e}$ in such a way that the updated record gets a new label. A causal explanation for the classification of $\mathbf{e}$ is then a set of its original feature values that are affected by a minimal counterfactual intervention. Such minimal explanations are assumed to be more informative than non-minimal ones. Minimality can be defined in different ways, and we adopt an abstract approach, assuming a partial order relation $\preceq$ on counterfactual interventions.
Definition 1
Consider a binary classifier represented by its label function $L$, and a fixed input record $\mathbf{e} = \langle F_1(\mathbf{e}), \ldots, F_n(\mathbf{e})\rangle$, with $F_i \in \mathcal{F}$, $F_i(\mathbf{e}) \in \mathit{Dom}(F_i)$, and $L(\mathbf{e}) = 1$.
(a) An intervention on $\mathbf{e}$ is a set of the form $\iota = \{\langle F_{i_1}, v_{i_1}\rangle, \ldots, \langle F_{i_k}, v_{i_k}\rangle\}$, with $v_{i_j} \in \mathit{Dom}(F_{i_j})$ and $v_{i_j} \neq F_{i_j}(\mathbf{e})$, for $j = 1, \ldots, k$. We denote with $\iota(\mathbf{e})$ the record obtained by applying to $\mathbf{e}$ the intervention $\iota$, i.e. by replacing in $\mathbf{e}$ every $F_{i_j}(\mathbf{e})$, with $F_{i_j}$ appearing in $\iota$, by $v_{i_j}$.
(b) A counterfactual intervention on $\mathbf{e}$ is an intervention $\iota$ on $\mathbf{e}$ such that $L(\iota(\mathbf{e})) = 0$. A $\preceq$-minimal counterfactual intervention $\iota$ is such that there is no counterfactual intervention $\iota'$ on $\mathbf{e}$ with $\iota' \prec \iota$ (i.e. $\iota' \preceq \iota$, but not $\iota \preceq \iota'$).
(c) A causal explanation for $L(\mathbf{e}) = 1$ is a set of feature values of the form $\varepsilon = \{F_{i_1}(\mathbf{e}), \ldots, F_{i_k}(\mathbf{e})\}$ for which there is a counterfactual intervention $\iota = \{\langle F_{i_1}, v_{i_1}\rangle, \ldots, \langle F_{i_k}, v_{i_k}\rangle\}$ for $\mathbf{e}$. Sometimes, to emphasize the intervention, we denote the explanation with $\varepsilon^{\iota}$.
(d) A causal explanation for $L(\mathbf{e}) = 1$ is $\preceq$-minimal if it is of the form $\varepsilon^{\iota}$ for a $\preceq$-minimal counterfactual intervention $\iota$ on $\mathbf{e}$.
Several minimality criteria can be expressed in terms of partial orders, such as: (a) $\iota_1 \preceq_s \iota_2$ iff $\pi_1(\iota_1) \subseteq \pi_1(\iota_2)$, with $\pi_1(\iota)$ the projection of $\iota$ on the first position. (b) $\iota_1 \preceq_c \iota_2$ iff $|\iota_1| \leq |\iota_2|$. That is, minimality under set inclusion and cardinality, resp. In the following, we will consider only these; and mostly the second.
Example 1
Consider three binary features, i.e. $\mathcal{F} = \{F_1, F_2, F_3\}$, which take values $0$ or $1$; and the input/output relation of a classifier $\mathcal{C}$ shown in Table 1. Let the entity under classification be $\mathbf{e}_1 = \langle 0, 1, 1\rangle$ in the table. We want causal explanations for its label $L(\mathbf{e}_1) = 1$. Any other record in the table can be seen as the result of an intervention on $\mathbf{e}_1$. However, only $\mathbf{e}_4$, $\mathbf{e}_7$ and $\mathbf{e}_8$ are (results of) counterfactual interventions, in that they switch the label to $0$.
entity (id) | $F_1$ | $F_2$ | $F_3$ | $L$
---|---|---|---|---
$\mathbf{e}_1$ | 0 | 1 | 1 | 1
$\mathbf{e}_2$ | 1 | 1 | 1 | 1
$\mathbf{e}_3$ | 1 | 1 | 0 | 1
$\mathbf{e}_4$ | 1 | 0 | 1 | 0
$\mathbf{e}_5$ | 1 | 0 | 0 | 1
$\mathbf{e}_6$ | 0 | 1 | 0 | 1
$\mathbf{e}_7$ | 0 | 0 | 1 | 0
$\mathbf{e}_8$ | 0 | 0 | 0 | 0
Table 1
For example, $\mathbf{e}_4$ corresponds to the intervention $\iota_1 = \{\langle F_1, 1\rangle, \langle F_2, 0\rangle\}$, in that $\mathbf{e}_4$ is obtained from $\mathbf{e}_1$ by changing the values of $F_1$ and $F_2$ into $1$ and $0$, resp. For $\mathbf{e}_7$, $\iota_2 = \{\langle F_2, 0\rangle\}$. From $\iota_1$ we obtain the causal explanation $\varepsilon_1 = \{F_1(\mathbf{e}_1), F_2(\mathbf{e}_1)\}$, telling us that the values $F_1(\mathbf{e}_1) = 0$ and $F_2(\mathbf{e}_1) = 1$ are the joint cause for $\mathbf{e}_1$ to have been classified as $1$. There are three causal explanations: $\varepsilon_1 = \{F_1(\mathbf{e}_1), F_2(\mathbf{e}_1)\}$, $\varepsilon_2 = \{F_2(\mathbf{e}_1)\}$, and $\varepsilon_3 = \{F_2(\mathbf{e}_1), F_3(\mathbf{e}_1)\}$. Here, $\varepsilon_1$ and $\varepsilon_3$ are incomparable under set inclusion, $\varepsilon_2 \subseteq \varepsilon_1$ and $\varepsilon_2 \subseteq \varepsilon_3$, and $\varepsilon_2$ turns out to be both $\preceq_s$- and $\preceq_c$-minimal (actually, minimum).
Notice that, by taking a projection, the partial order $\preceq_s$ does not care about the values that replace the original feature values, as long as the latter are changed. Furthermore, given $\mathbf{e}$, it would be good enough to indicate the features whose values are relevant, e.g. $\{F_2\}$ instead of $\{F_2(\mathbf{e}_1)\}$ in the previous example. However, the introduced notation emphasizes the fact that the original values are those we concentrate on when providing explanations.
Clearly, every $\preceq_c$-minimal explanation is also $\preceq_s$-minimal. However, it is easy to produce an example showing that a $\preceq_s$-minimal explanation may not be $\preceq_c$-minimal.
Notation: An s-explanation for $L(\mathbf{e}) = 1$ is a $\preceq_s$-minimal causal explanation for it. A c-explanation is a $\preceq_c$-minimal causal explanation for it.
This definition characterizes explanations as sets of (interventions on) features. However, it is common that one wants to quantify the "causal strength" of a single feature value in a record representing an entity [20, 6], or of a single tuple in a database (as a cause for a query answer) [22], or of a single attribute value in a database tuple [3, 4], etc. Different scores have been proposed in this direction, e.g. SHAP in [20] and Resp in [6]. The latter has its origin in actual causality [16], as the responsibility of an actual cause [9], which we adapt to our setting.
Definition 2
Consider $\mathbf{e}$ to be an entity represented as a record of feature values $\langle F_1(\mathbf{e}), \ldots, F_n(\mathbf{e})\rangle$, with $L(\mathbf{e}) = 1$.
(a) A feature value $v = F_i(\mathbf{e})$, with $F_i \in \mathcal{F}$, is a value-explanation for $L(\mathbf{e}) = 1$ if there is an s-explanation $\varepsilon$ for $L(\mathbf{e}) = 1$ such that $v \in \varepsilon$.
(b) The explanatory responsibility of a value-explanation $v = F_i(\mathbf{e})$ is: $\ \text{x-resp}_{\mathbf{e}}(v) := \frac{1}{|\varepsilon|}$, where $\varepsilon$ is an s-explanation of minimum cardinality among those containing $v$.
(c) If $F_i(\mathbf{e})$ is not a value-explanation, then $\text{x-resp}_{\mathbf{e}}(F_i(\mathbf{e})) := 0$.
Notice that (b) can be stated as $\text{x-resp}_{\mathbf{e}}(v) = \frac{1}{1 + |\Gamma|}$, with $\Gamma := \varepsilon \setminus \{v\}$, a contingency set for $v$ of minimum cardinality.
Adopting the usual terminology in actual causality [16], a counterfactual value-explanation for $\mathbf{e}$'s classification is a value-explanation $v$ with $\text{x-resp}_{\mathbf{e}}(v) = 1$, that is, it suffices, without company of other feature values in $\mathbf{e}$, to justify the classification. Similarly, an actual value-explanation for $\mathbf{e}$'s classification is a value-explanation $v$ with $\text{x-resp}_{\mathbf{e}}(v) > 0$. That is, $v$ appears in an s-explanation $\varepsilon$, say with $v = F_i(\mathbf{e})$, but possibly in company of other feature values. In this case, $\Gamma := \varepsilon \setminus \{v\}$ is called a contingency set for $v$ [22]. It turns out that maximum-responsibility value-explanations appear in c-explanations.
Example 2
(ex. 1 cont.) $\varepsilon_2 = \{F_2(\mathbf{e}_1)\}$ is the only c-explanation for entity $\mathbf{e}_1$'s classification. Its value for feature $F_2$, namely $F_2(\mathbf{e}_1) = 1$, is a value-explanation, and its explanatory responsibility is $\text{x-resp}_{\mathbf{e}_1}(F_2(\mathbf{e}_1)) = 1$.
3 Specifying Causal Explanations in ASP
Entities will be represented by a predicate, say $E$, with $n+2$ arguments. The first one holds a record (or entity) id (which may not be needed when dealing with single entities). The next $n$ arguments hold the feature values. (For performance-related reasons, it might be more convenient to use 3-ary predicates to represent an entity with an identifier, but the presentation here would be more complicated.) The last argument holds an annotation constant from the set $\{\mathsf{o}, \mathsf{do}, \mathsf{tr}, \mathsf{s}\}$. Their semantics will be specified below, by the generic program that uses them.
Initially, a record has not been subject to interventions, and the corresponding entry in predicate $E$ is of the form $E(\mathbf{e}, \bar{x}, \mathsf{o})$, with $\bar{x}$ an abbreviation for $F_1(\mathbf{e}), \ldots, F_n(\mathbf{e})$, and constant o standing for "original entity".
When the classifier gives label $1$ to $\mathbf{e}$, the idea is to start changing feature values, one at a time. The intervened entity then becomes annotated with constant do in the last argument. When the resulting intervened entities are classified, we may not have the classifier specified within the program. For this reason, the program uses a special predicate, say $C$, whose first argument takes (a representation of) an entity under classification, and whose second argument returns the binary label. We will assume this predicate can be invoked by an ASP as an external procedure, much in the spirit of HEX-programs [11, 12]. Since the original instance may have to go through several interventions until reaching one that switches the label to $0$, the intermediate entities get the "transition" annotation tr. This is achieved by a generic program.
The Counterfactual Intervention Program:
- P1. The facts of the program are all the atoms of the form $\mathit{dom}_i(c)$, with $c \in \mathit{Dom}(F_i)$, for $i = 1, \ldots, n$; plus the initial entity $E(\mathbf{e}, \bar{x}, \mathsf{o})$, where $\bar{x}$ is the initial vector of feature values.
- P2. The transition entities are obtained as initial, original entities, or as the result of an intervention (here, $e$ is a variable standing for a record id): $\ E(e, \bar{x}, \mathsf{tr}) \leftarrow E(e, \bar{x}, \mathsf{o})$; $\ E(e, \bar{x}, \mathsf{tr}) \leftarrow E(e, \bar{x}, \mathsf{do})$.
- P3. The next rule specifies that, every time the entity at hand (the original one, or one obtained after a "previous" intervention) is classified with label $1$, a new value has to be picked from a feature domain and made to replace the current value for that feature. The new value is chosen via the non-deterministic "choice operator", a well-established mechanism in ASP [15]; in this case, the values are chosen from the feature domains, and are subject to the condition of not being the same as the current value. The rule has a disjunctive head, with one disjunct per feature that may be intervened; for each fixed combination of the remaining values, the choice operator chooses a unique new value, subject to the other conditions in the same rule body. The use of the choice operator can be eliminated by replacing each choice atom by a corresponding "chosen" atom, and defining each such predicate by means of "classical" rules [15], as recalled in the sketch below.
- P4. The following rule specifies that we can "stop", hence the annotation s, when we reach an entity that gets label $0$: $\ E(e, \bar{x}, \mathsf{s}) \leftarrow E(e, \bar{x}, \mathsf{do}), \ C(\bar{x}, 0)$.
- P5. We add a program constraint specifying that we prohibit going back to the original entity via local interventions: $\ \leftarrow E(e, \bar{x}, \mathsf{o}), \ E(e, \bar{x}, \mathsf{do})$.
- P6. The causal explanations can be collected by means of explanation predicates, defined by rules that compare the original entity (annotation o) with a stopped, intervened entity (annotation s), and return the original values of the features on which the two differ. Actually, each of the values collected in this way is a value-explanation.
The program will have several stable models due to the disjunctive rule and the choice operator. Each model will hold intervened versions of the original entity, and hopefully versions for which the label is switched, i.e. those with annotation s. If the classifier never switches the label, despite the fact that local interventions are not restricted (and this would be quite an unusual classifier), we will not find a model with a version of the initial entity annotated with s. Due to the program constraint in P5., none of the models will have the original entity annotated with do, because those models would be discarded [19].
Notice that the use of the choice operator hides occurrences of non-stratified negation [15]. In relation to the use of disjunction in a rule head, the semantics of ASP, which involves model minimality, makes only one of the atoms in the disjunction true (unless forced otherwise by the program itself).
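To illustrate, the following is a minimal sketch, in clingo-style syntax, of this classical elimination of the choice operator via a "chosen" predicate defined through rules with non-stratified negation, along the lines of [15]. The predicate names (cur, chosen_val, diff_choice) and the toy domain are ours, for illustration only.

```
% Sketch (our names): for a record R whose current value for some feature is X,
% choose exactly one new value Y != X from the feature's domain.
% chosen_val(R,X,Y) plays the role of a choice atom choice((R,X),(Y)).

dom(0). dom(1). dom(2).     % toy domain
cur(r1, 0).                 % record r1 currently holds value 0

% Y is chosen unless a different value has already been chosen for (R,X).
chosen_val(R,X,Y) :- cur(R,X), dom(Y), Y != X, not diff_choice(R,X,Y).

% Y differs from the chosen value if some other Y' was chosen for (R,X).
diff_choice(R,X,Y) :- chosen_val(R,X,Yp), dom(Y), Y != Yp.

#show chosen_val/3.
```

Each stable model of this sketch contains exactly one chosen_val atom for r1, with new value 1 or 2, mirroring the non-deterministic behavior of the choice operator.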
Example 3
(ex. 1 cont.) Most of the Counterfactual Intervention Program above is generic. In this particular example, we have the following facts: $\mathit{dom}_i(0)$ and $\mathit{dom}_i(1)$, for $i = 1, 2, 3$; and $E(\mathbf{e}_1, 0, 1, 1, \mathsf{o})$, with $\mathbf{e}_1$ a constant, the record id of the first row in Table 1.
In this very particular situation, the classifier is explicitly given by Table 1. Then, predicate $C$ can be specified with a set of additional facts, one per row of Table 1: $C(\langle 0,1,1\rangle, 1)$, $\ldots$, $C(\langle 0,0,0\rangle, 0)$.
The stable models of the program will contain all the facts above. One of them, say $M$, will contain (among others) the facts $E(\mathbf{e}_1, 0, 1, 1, \mathsf{o})$ and $E(\mathbf{e}_1, 0, 1, 1, \mathsf{tr})$. The presence of the last atom activates rule P3., because $C(\langle 0,1,1\rangle, 1)$ is true (for $\mathbf{e}_1$ in Table 1). New facts are produced for $\mathbf{e}_1$ (the new value due to an intervention is underlined): $E(\mathbf{e}_1, \underline{1}, 1, 1, \mathsf{do})$ and $E(\mathbf{e}_1, 1, 1, 1, \mathsf{tr})$. Due to the last fact and the true $C(\langle 1,1,1\rangle, 1)$, rule P3. is activated again. Choosing the value $\underline{0}$ for the second disjunct, the atoms $E(\mathbf{e}_1, 1, \underline{0}, 1, \mathsf{do})$ and $E(\mathbf{e}_1, 1, 0, 1, \mathsf{tr})$ are generated. For the latter, $C(\langle 1,0,1\rangle, 0)$ is true (coming from $\mathbf{e}_4$ in Table 1), switching the label to $0$. Rule P3. is no longer activated, and we can apply rule P4., obtaining $E(\mathbf{e}_1, 1, 0, 1, \mathsf{s})$.
From the rules in P6., we obtain as explanations the original values $F_1(\mathbf{e}_1) = 0$ and $F_2(\mathbf{e}_1) = 1$, i.e. the values in $\mathbf{e}_1$ that were changed. All this in model $M$. There are other models, and one of them contains $E(\mathbf{e}_1, 0, 0, 1, \mathsf{s})$, the minimally intervened version of $\mathbf{e}_1$, i.e. $\mathbf{e}_7$.
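For concreteness, the following is a small, self-contained sketch in clingo-style syntax for the example of Table 1. It is a deliberate simplification of the generic program above: instead of the step-by-step do/tr interventions, it guesses a counterfactual version of $\mathbf{e}_1$ in one shot and then collects the changed original values. The predicate names (ent, cls, dom, expl) are ours, and the classifier is encoded extensionally by the rows of Table 1.

```
% Simplified sketch (our names) for the example of Table 1.

dom(0;1).                          % binary feature domains

ent(e1, 0, 1, 1, o).               % original entity e1 = <0,1,1>, annotation o

% Classifier given extensionally by Table 1; the last argument is the label.
cls(0,1,1,1). cls(1,1,1,1). cls(1,1,0,1). cls(1,0,1,0).
cls(1,0,0,1). cls(0,1,0,1). cls(0,0,1,0). cls(0,0,0,0).

% Guess one candidate counterfactual version (annotation s is used directly
% here for the stopped, intervened entity).
1 { ent(E,X,Y,Z,s) : dom(X), dom(Y), dom(Z) } 1 :- ent(E,_,_,_,o).

% The counterfactual version must switch the label to 0.
:- ent(E,X,Y,Z,s), cls(X,Y,Z,1).

% Collect the original values that were changed: they form the explanation.
expl(E, f1, X) :- ent(E,X,_,_,o), ent(E,Xp,_,_,s), X != Xp.
expl(E, f2, Y) :- ent(E,_,Y,_,o), ent(E,_,Yp,_,s), Y != Yp.
expl(E, f3, Z) :- ent(E,_,_,Z,o), ent(E,_,_,Zp,s), Z != Zp.

#show expl/3.
```

Each of the three stable models of this sketch corresponds to one of the counterfactual versions $\mathbf{e}_4$, $\mathbf{e}_7$, $\mathbf{e}_8$, and its expl atoms to one of the causal explanations $\varepsilon_1$, $\varepsilon_2$, $\varepsilon_3$ of Example 1.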
3.1 C-explanations and maximum responsibility
There is no guarantee that the intervened entities will correspond to c-explanations, which are the main focus of this work. In order to obtain them (and only them), we add weak program constraints (WCs) to the program. They can be violated by a stable model of the program (as opposed to (strong) program constraints, which have to be satisfied), but they have to be violated in a minimal way. We use WCs whose number of violations has to be minimized: for each $i = 1, \ldots, n$, a WC that is violated exactly when the value of feature $F_i$ in the s-annotated entity differs from its original value.
Only the stable models representing an intervened version of $\mathbf{e}$ with a minimum number of value discrepancies with $\mathbf{e}$ will be kept.
In each of these "minimum-cardinality" stable models $M$, we can collect the corresponding c-explanation for $\mathbf{e}$'s classification as the set $\varepsilon(M)$ of original feature values appearing in its explanation atoms. This can be done within an ASP system such as DLV, which allows set construction and aggregation, in particular counting [1, 19]. Actually, counting comes in handy to obtain the cardinality of $\varepsilon(M)$. The responsibility of a value-explanation $v \in \varepsilon(M)$ will then be $\text{x-resp}_{\mathbf{e}}(v) = 1/|\varepsilon(M)|$.
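Continuing with the simplified sketch given after Example 3 (the predicate names remain ours), the minimum-cardinality models, and the cardinality needed for the responsibility score, can be obtained in clingo-style syntax with a weak constraint and a counting aggregate:

```
% One weak-constraint violation per changed feature value: the optimal stable
% models are those with a minimum number of changes, i.e. the c-explanations.
:~ expl(E,F,_). [1@1, E,F]

% Cardinality of the explanation in the model at hand; the responsibility of
% each value-explanation occurring in an optimal model is then 1/N.
expl_size(E,N) :- ent(E,_,_,_,o), N = #count{ F : expl(E,F,_) }.

#show expl_size/2.
```

With clingo, all optimal models can be enumerated (e.g. with --opt-mode=optN); for the running example, the single optimal model changes only the value of $F_2$, yielding expl_size(e1,1) and, hence, x-resp $= 1$, as in Example 2.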
4 Semantic Knowledge
Counterfactual interventions in the presence of semantic conditions require special consideration. As the following example shows, not every intervention, or combination of interventions, may be admissible [5]. It is in this kind of situation that declarative approaches to counterfactual interventions, like the one presented here, become particularly useful.
Example 4
A moving company makes automated hiring decisions based on feature values in applicants' records, which contain, among others, the applicant's gender, weight, age, and ability to lift heavy weights. Mary, represented by one such record, applies, but is denied the job, i.e. the classifier returns the negative outcome. To explain the decision, we can hypothetically change Mary's gender, from female into male, obtaining an updated record for which we now observe the positive outcome. Thus, her value for gender can be seen as a counterfactual explanation for the initial decision.
As an alternative, we might keep the value for gender, and counterfactually change other feature values. However, we might be constrained or guided by an ontology containing, e.g., a denial semantic constraint (formulated on the age and lift positions in the record) that prohibits someone over 80 from being qualified as fit to lift. We could also have a rule specifying that men who weigh over 100 pounds and are younger than 70 are automatically qualified to lift weight.
In situations like this, we could add to the ASP we had before: (a) program constraints that prohibit certain models, e.g. one discarding intervened records that violate the denial constraint above; (b) additional rules, e.g. one that sets the lift value when the gender, weight and age conditions above are satisfied, and that may thereby automatically generate additional interventions. In a similar way, one could accommodate certain preferences by using weak program constraints.
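For instance, under the hypothetical assumption that intervened applicant records are stored in a predicate app(Id, Gender, Weight, Lift, Age, do), with Lift = 1 standing for "fit to lift heavy weights", constraints and rules of this kind could be sketched in clingo-style syntax as follows:

```
% Hypothetical intervened record (our names and values, for illustration).
app(a1, m, 120, 0, 45, do).

% (a) Denial constraint: no intervened record may qualify someone over 80
%     as fit to lift.
:- app(Id, _, _, 1, Age, do), Age > 80.

% (b) Rule generating an additional intervention: men weighing over 100 pounds
%     and younger than 70 are automatically qualified to lift.
app(Id, m, W, 1, Age, do) :- app(Id, m, W, _, Age, do), W > 100, Age < 70.
```

Here the rule fires on the sample record and adds the corresponding intervened version with Lift = 1.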
Another situation where not all interventions are admissible occurs when features take continuous values, and their domains have to be discretized. The common way of doing this, namely the combination of bucketization and one-hot-encoding, leads to the natural and necessary imposition of additional constraints on interventions, as we will show. Through bucketization, a feature's range is discretized by splitting it into a finite number, say $m$, of usually non-overlapping intervals. This makes the feature basically categorical (each interval becoming a categorical value). Next, through one-hot-encoding, the original feature is represented as a vector of length $m$ of indicator functions, one for each categorical value (here, each interval) [6]. In this way, the original feature gives rise to $m$ binary features. For example, if we have a continuous feature "External Risk Estimate" (ERE), its range could be split into, say, four buckets. Accordingly, if for an entity $\mathbf{e}$ the ERE value falls into the second bucket, then, after one-hot-encoding, this value is represented as the vector $\langle 0, 1, 0, 0\rangle$.
In a case like this, it is clear that counterfactual interventions are constrained by the assumptions behind bucketization and one-hot-encoding. For example, the vector $\langle 0,1,0,0\rangle$ cannot be updated into, say, $\langle 1,1,0,0\rangle$, which would mean that the feature value for the entity falls both in the first and in the second interval. Bucketization and one-hot-encoding can thus make good use of program constraints prohibiting such combinations. Of course, admissible interventions on the predicate ERE could be easily handled with a disjunctive rule like that in P3., but without the "transition" annotation tr. However, the ERE record is commonly a component of a larger record containing all the feature values for an entity [6]. Hence the need for a more general and uniform form of specification.
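For instance, for a hypothetical predicate ere(Id, B1, B2, B3, B4, do) holding the intervened one-hot vector of a feature split into four buckets, constraints of this kind could be sketched as follows:

```
% Hypothetical intervened one-hot vector: the value falls in the second bucket.
ere(e1, 0, 1, 0, 0, do).

% No two indicators can be 1 simultaneously ...
:- ere(Id, 1, 1, _, _, do).
:- ere(Id, 1, _, 1, _, do).
:- ere(Id, 1, _, _, 1, do).
:- ere(Id, _, 1, 1, _, do).
:- ere(Id, _, 1, _, 1, do).
:- ere(Id, _, _, 1, 1, do).

% ... and at least one indicator must be 1.
:- ere(Id, 0, 0, 0, 0, do).
```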
5 Discussion
This work is about interacting with possibly external classifiers and reasoning with their results and potential inputs. That is, the classifier is supposed to have been learned by means of some other methodology. In particular, this is not about learning ASPs, which goes in a different direction [18].
We have treated classifiers as black boxes that are represented by external predicates in the ASP. However, in some cases the classifier may be given by a set of rules which, if compatible with ASPs, could be appended to the program to define the classification predicate $C$. The domains used by the programs can be given explicitly. However, they can also be specified and extracted from other sources. For example, for the experiments in [6], the domains were built from the training data, a process that can be specified and implemented in ASP.
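As an illustration of the latter point, the following is a minimal sketch, in clingo-style syntax and with hypothetical predicate names, of how active feature domains can be collected from a set of labeled records:

```
% Hypothetical labeled records: train(Id, X1, X2, X3, Label).
train(t1, 0, 1, 1, 1).
train(t2, 1, 0, 1, 0).

% Active domain of each feature, extracted from the data.
dom1(X) :- train(_, X, _, _, _).
dom2(X) :- train(_, _, X, _, _).
dom3(X) :- train(_, _, _, X, _).
```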
The ASPs we have used are inspired by repair programs that specify and compute the repairs of a database that fails to satisfy the intended integrity constraints [8]. Actually, the connection between database repairs and actual causality for query answers was established and exploited in [3]. ASPs that compute attribute-level causes for query answering were introduced in [4]. They are much simpler than those presented here because, in that scenario, changing attribute values into nulls is good enough to invalidate the query answer (the "equivalent" in that scenario of switching the classification label here). Once a null is introduced, there is no need to take it into account anymore, and a single "step" of interventions is good enough.
Here we have considered only s- and c-explanations, especially the latter. Both embody specific and different, but related, minimization conditions. However, counterfactual explanations can be cast in terms of different optimization criteria [17, 26]. One could investigate in this setting other forms of preference, i.e. other instantiations of the generic partial order $\preceq$ in Definition 1, by using ASPs such as those introduced in [14]. These programs could also be useful to compute (a subclass of) s-explanations, when c-explanations are, for some reason, not useful or interesting enough. The ASPs, as introduced in this work, are meant to compute c-explanations, but extending them is natural and useful.
This article reports on preliminary work that is part of longer term and ongoing research. In particular, we are addressing the following: (a) multi-task classification. (b) inclusion of rule-based classifiers. (c) scores associated to more than one intervention at a time [6], in particular, to full causal explanations. (d) experiments with this approach and comparisons with other forms of explanations. However, the most important direction to explore, and that is a matter of ongoing work, is described next.
5.1 From ideal to more practical explanations
The approach to the specification of causal explanations described so far in this paper is in some sense ideal, in that the whole product space of the feature domains is considered, together with the applicability of the classifier over that space. This may be impractical or unrealistic. However, we see our proposal as a conceptual and specification basis that can be adapted in order to include more specific practices and mechanisms, hopefully keeping a clear declarative semantics. One way to go consists in restricting the product space, and this can be done in different manners. For instance, one can use constraints or additional conditions in rule bodies. An extreme case of this approach consists in replacing the product space with the entities in a data sample $S$. We could even assume that this sample already comes with classification labels, i.e. that its elements are pairs $\langle \mathbf{e}', L(\mathbf{e}')\rangle$. Actually, this dataset does not have to be disjoint from the training dataset mentioned earlier in Section 2. The definition of causal explanation and the counterfactual ASPs could be adapted to this new setting without major difficulties.
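Under the assumptions just described, and reusing the hypothetical predicate names of the earlier sketches, restricting the counterfactual versions of an entity to an already labeled sample could be sketched as follows:

```
% Hypothetical labeled sample: smp(Id, X1, X2, X3, Label).
smp(s1, 1, 0, 1, 0).  smp(s2, 0, 0, 1, 0).  smp(s3, 1, 1, 0, 1).

ent(e1, 0, 1, 1, o).               % the original entity, classified 1

% Pick exactly one sample entity with label 0 as the counterfactual version.
1 { ent(E,X,Y,Z,s) : smp(_,X,Y,Z,0) } 1 :- ent(E,_,_,_,o).

% Explanations are collected as before, by comparing the o- and s-versions.
expl(E, f1, X) :- ent(E,X,_,_,o), ent(E,Xp,_,_,s), X != Xp.
expl(E, f2, Y) :- ent(E,_,Y,_,o), ent(E,_,Yp,_,s), Y != Yp.
expl(E, f3, Z) :- ent(E,_,_,Z,o), ent(E,_,_,Zp,s), Z != Zp.

#show expl/3.
```

The weak constraint of Section 3.1 can then be added unchanged to keep only the sample entities closest to the original one.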
An alternative and more sophisticated approach consists in using knowledge about the underlying population of entities, such as a probability distribution, and using it to define causal explanations and explanation scores for them. This is the case of the Resp and SHAP explanation scores mentioned in Section 2 [6, 20]. In these cases, it is natural to explore the applicability of probabilistic extensions of ASP [2]. In most cases, the underlying distribution is not known, and has to be estimated from the available data, e.g. a sample as above, and the scores have to be redefined (or estimated) by appealing to this sample. This was done in [6] for both Resp and SHAP. In these cases, counterfactual ASPs could be used, with extensions for set building and aggregation to compute the empirical scores, hopefully in interaction with a database containing the sample.
Acknowledgements: The thorough and useful comments provided by anonymous reviewers are greatly appreciated.
References
- [1] Alviano, M., Calimeri, F., Dodaro, C., Fuscà, D., Leone, N., Perri, S., Ricca, F., Veltri, P. and Zangari, J. The ASP System DLV2. Proc. LPNMR, Springer LNCS 10377, 2017, pp. 215-221.
- [2] Baral, C., Gelfond, M. and Rushton, N. Probabilistic Reasoning with Answer Sets. Theory and Practice of Logic Programming, 2009, 9(1):57-144.
- [3] Bertossi, L. and Salimi, B. From Causes for Database Queries to Repairs and Model-Based Diagnosis and Back. Theory of Computing Systems, 2017, 61(1):191-232.
- [4] Bertossi, L. Characterizing and Computing Causes for Query Answers in Databases from Database Repairs and Repair Programs. Proc. FoIKS, 2018, Springer LNCS 10833, pp. 55-76. Revised and extended version in CoRR arXiv: cs.DB/1712.01001.
- [5] Bertossi, L. and Geerts, F. Data Quality and Explainable AI. ACM Journal of Data and Information Quality, 2020, 12(2):1-9.
- [6] Bertossi, L., Li, J., Schleich, M., Suciu, D. and Vagena, Z. Causality-Based Explanation of Classification Outcomes. Proc. 4th International Workshop on "Data Management for End-to-End Machine Learning" (DEEM) at ACM SIGMOD, 2020. Posted as CoRR arXiv:2003.0686.
- [7] Calimeri, F., Faber, W., Gebser, M., Ianni, G., Kaminski, R., Krennwallner, T., Leone, N., Maratea, M., Ricca, F. and Schaub, T. ASP-Core-2 Input Language Format. Theory and Practice of Logic Programming, 2020, 20(2):294-309.
- [8] Caniupan, M. and Bertossi, L. The Consistency Extractor System: Answer Set Programs for Consistent Query Answering in Databases. Data & Knowledge Engineering, 2010, 69(6):545-572.
- [9] Chockler, H. and Halpern, J. Y. Responsibility and Blame: A Structural-Model Approach. Journal of Artificial Intelligence Research, 2004, 22:93-115.
- [10] Datta, A., Sen, S. and Zick, Y. Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems. Proc. IEEE Symposium on Security and Privacy, 2016.
- [11] Eiter, T., Germano, S., Ianni, G., Kaminski, T., Redl, C., Schüller, P. and Weinzierl, A. The DLVHEX System. Künstliche Intelligenz, 2019, 32(2-3):187-189.
- [12] Eiter, T., Kaminski, T., Redl, C., Schüller, P. and Weinzierl, A. Answer Set Programming with External Source Access. Reasoning Web, Springer LNCS 10370, 2017, pp. 204-275.
- [13] Flach, P. Machine Learning. Cambridge Univ. Press, 2012.
- [14] Gebser, M., Kaminski, R. and Schaub, T. Complex Optimization in Answer Set Programming. Theory and Practice of Logic Programming, 2011, 11(4-5):821-839.
- [15] Giannotti, F., Greco, S., Sacca, D. and Zaniolo, C. Programming with Non-Determinism in Deductive Databases. Annals of Mathematics and Artificial Intelligence, 1997, 19(1-2):97-125.
- [16] Halpern, J. and Pearl, J. Causes and Explanations: A Structural-Model Approach: Part 1. The British Journal for the Philosophy of Science, 2005, 56:843-887.
- [17] Karimi, A. H., Barthe, G., Balle, B. and Valera, I. Model-Agnostic Counterfactual Explanations for Consequential Decisions. Proc. International Conference on Artificial Intelligence and Statistics (AISTATS), 2020. arXiv:1905.11190.
- [18] Law, M., Russo, A. and Broda K. Logic-Based Learning of Answer Set Programs. In Reasoning Web. Explainable Artificial Intelligence, Springer LNCS 11810, 2019, pp. 196-231.
- [19] Leone, N., Pfeifer, G., Faber, W., Eiter, T., Gottlob, G., Koch, C., Mateis, C., Perri, S. and Scarcello, F. The DLV System for Knowledge Representation and Reasoning. ACM Transactions on Computational Logic, 2006, 7(3):499-562.
- [20] Lundberg, S. and Lee, S.-I. A Unified Approach to Interpreting Model Predictions. Proc. NIPS 2017, pp. 4765-4774.
- [21] Martens, D. and Provost, F. J. Explaining Data-Driven Document Classifications. MIS Quarterly, 2014, 38(1):73-99.
- [22] Meliou, A., Gatterbauer, W., Moore, K. F. and Suciu, D. The Complexity of Causality and Responsibility for Query Answers and Non-Answers. Proc. VLDB, 2010, pp. 34-41.
- [23] Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book, 2020.
- [24] Pearl, J. Causality: Models, Reasoning and Inference. Cambridge University Press, 2nd edition, 2009.
- [25] Rudin, C. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. Nature Machine Intelligence, 2019, 1:206-215.
- [26] Russell, Ch. Efficient Search for Diverse Coherent Explanations. Proc. FAT 2019, pp. 20-28. arXiv:1901.04909.
- [27] Wachter, S., Mittelstadt, B. D. and Russell, C. Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR. CoRR abs/1711.00399, 2017.