\AtBeginEnvironment

algorithmic

Causal Identification of Sufficient, Contrastive and Complete Feature Sets
in Image Classification

David A. Kelly¹, Hana Chockler¹

Abstract

Existing algorithms for explaining the outputs of image classifiers are based on a variety of approaches and produce explanations that lack formal rigour. On the other hand, logic-based explanations are formally and rigorously defined but their computability relies on strict assumptions about the model that do not hold on image classifiers.

In this paper, we show that causal explanations, in addition to being formally and rigorously defined, enjoy the same formal properties as logic-based ones, while still lending themselves to black-box algorithms and being a natural fit for image classifiers. We prove formal properties of causal explanations and introduce contrastive causal explanations for image classifiers. Moreover, we augment the definition of explanation with confidence awareness and introduce complete causal explanations: explanations that are classified with exactly the same confidence as the original image.

We implement our definitions, and our experimental results demonstrate that different models have different patterns of sufficiency, contrastiveness, and completeness. Our algorithms are efficiently computable, taking on average $6$ s per image on a ResNet50 model to compute all types of explanations, and are totally black-box, needing no knowledge of the model, no access to model internals, no access to gradient, nor requiring any properties, such as monotonicity, of the model.

1 Introduction

(a) Ladybug

(b) Causal explanation

(d) Contrastive explanation

(e) Complete explanation

Figure 1:

4

types of explanation for “ladybug” with

0.46

confidence on a ResNet50. Figure 1(b) shows pixels sufficient to obtain class “ladybug” with more than the original confidence (

0.5

). Figure 1(c) shows multiple explanations. In this paper we introduce ‘contrastive’ explanations, which are subsets of pixels that are by themselves sufficient for “ladybug”, but removing these pixels results in “leaf beetle” (Figure 1(d)), and ‘complete’ explanations (Figure 1(e)), which are subsets of pixels required to arrive at the original confidence of

0.46

Recent progress in artificial intelligence and the ever increasing deployment of AI systems has highlighted the need to understand better why some decisions are made by such systems. For example, one may need to know why a classifier decides that an MRI scan showed evidence of a tumor (Blake et al. 2023). Answering such questions is the province of causality. A causal explanation for an image classification is a special case of explanations in actual causality (Halpern 2019) and identifies a minimal set of pixels which, by themselves, are sufficient to re-create the original decision (Chockler and Halpern 2024).

Logic-based approaches provide formal guarantees for explanations, but their framework assumes that the model is given explicitly as a function. Their formal abductive explanations, or prime implicants (PI), are defined as sets of features such that, if they take the given values, always lead to the same decision (Shih, Choi, and Darwiche 2018). Logic-based methods can also compute contrastive explanations, that is, those features which, if altered, change the original decision. These abductive and contrastive explanations require a model to be monotonic or linear to be effectively computable (Marques-Silva et al. 2021), and therefore are not suitable for image classifiers.

In this paper, we show that causal explanations enjoy all the formal properties of logic-based explanations, while not putting any restrictions on the model and being efficiently computable for black-box image classifiers. We prove equivalence between abductive explanations and causal explanations and introduce a variation of causal explanations equivalent to contrastive explanations. Furthermore, we augment the actual causality framework with the model’s confidence in the classification, introducing $\delta$ -confident explanations, and use these to produce more fine-grained results. We also introduce a causal version of the completeness property for explanations, following (Srinivas and Fleuret 2019). The differences between explanations of the classification of the same image reveal important information about the decision process of the model. In particular, we call the the difference between a complete explanation and a sufficient one adjustment pixels.

We examine the relationship between contrastive and adjustment pixel sets by examining the semantic distance between the original classification and the classifications of these pixel sets, as illustrated in Figure 1. Our approach allows us to formally subdivide an image into its sufficient pixels, contrastive pixels, and adjustment pixels. We prove complexity results for our definitions, giving a justification to efficient approximation algorithms.

Our algorithms are based on rex (Chockler et al. 2024). In Section 4 we introduce black-box approximation algorithms to compute contrastive, complete, and $\delta$ -confident causal explanations for image classifiers. Our algorithms do not require any knowledge of the model, no access to the model internals, nor do they require any specific properties of the model. We implemented our algorithms and present experimental results on three state-of-the-art models and three standard benchmark datasets.

We use the word ‘explanation’ for the remainder of the paper as a convenience. We do not intend, nor seek to imply, that causal explanations are more or less interpretable than other form of explanation. Instead, we are interested in the formal partitioning of pixels in an image into different functional sets. These functional sets reveal important information about the inner workings of the model.

Due to the lack of space, some theoretical and evaluation results are deferred to the supplementary material.

Related Work

Broadly speaking, the field of XAI can be split between formal and informal methods (Izza et al. 2024). The majority of methods belong in the informal camp, including well-known model agnostic methods (Lundberg and Lee 2017a; Ribeiro, Singh, and Guestrin 2016) and saliency methods (Selvaraju et al. 2017; Bach et al. 2015). Formal explanation work has been dominated by logic-based methods (Shih, Choi, and Darwiche 2018; Ignatiev, Narodytska, and Marques-Silva 2019). Logic-based explanations use abductive reasoning to find the simplest or most likely explanation ‘a’ for a (set of) observations ‘b’. Logic-based explanations provide formal guarantees of feature sufficiency Definitions 5 and 6, but usually require strong assumptions of monotonicity or linearity for reasons of computability. The scalability of such an approach is open to question. Logic-based methods are a black-box XAI method, in that they do not require access to a model’s internals, or even its gradient.

Constraint-driven black-box explanations (Shrotri et al. 2022) build on the LIME (Ribeiro, Singh, and Guestrin 2016) framework but include user-provided boolean constraints over the search space. In particular, for image explanations, these constraints could dictate the nature of the perturbations the XAI tool generates. Of course, knowing which constraints to use is a hard problem and assumes at least some knowledge of how the model works and what the explanation should be. While such methods are technically black-box, this is because model-dependent information has been separately encoded by the user.

Causal explanations (Chockler and Halpern 2024) belong in the camp of formal XAI, as they provide mathematically rigorous guarantees in much the same manner as logic-based explanations. rex (Chockler et al. 2024) is a black-box causal explainability tool which makes no assumptions about the classifier. It computes an approximation to minimal, sufficient pixels sets against a baseline value.

2 Background

There are many different definitions of explanation in the literature, some are saliency-based (Selvaraju et al. 2017), some are gradient-based (Srinivas and Fleuret 2019), Shapley-based (Lundberg and Lee 2017b) or train locally interpretable models (Ribeiro, Singh, and Guestrin 2016). Logic-based explanations are rather different in having a mathematically precise definition of explanation. Causal explanations, as defined in (Chockler and Halpern 2024) (and their approximations computed by rex (Chockler et al. 2024)), have much more in common with the rigorous definitions of logic-based explanations.

(a) Wash basin/sink (

0.6

confidence)

(b) Sufficient explanation (

0.04

confidence)

(0.757

confidence)

(d) Pixels for contrastive explanation

(e) Complete explanation (adjustment pixels in red)

Figure 2: A ‘washbasin’ partitioned into sufficient, contrastive, and adjustment pixel sets. Completeness required

82\%

of the image for a ResNet50 model. The causal explanation, Figure 2(b) is very small, with low confidence. The contrastive explanation (Figure 2(c) has higher confidence than the original image. Masking out Figure 2(c) to get Figure 2(d) ResNet50 gives us a classification of ‘toilet seat’. Interestingly, the adjustment pixels (Figure 2(e)) reduce model confidence from

0.75

0.6

, even though the they are also classified as ‘wash basin’.

2.1 Actual causality

In what follows, we briefly introduce the relevant definitions from the theory of actual causality. The reader is referred to (Halpern 2019) for further reading. We assume that the world is described in terms of variables and their values. Some variables may have a causal influence on others. This influence is modeled by a set of structural equations. It is conceptually useful to split the variables into two sets: the exogenous variables ${\mathcal{}U}$ , whose values are determined by factors outside the model, and the endogenous variables ${\mathcal{}V}$ , whose values are ultimately determined by the exogenous variables. The structural equations ${\mathcal{}F}$ describe how these values are determined. A causal model $M$ is described by its variables and the structural equations. We restrict the discussion to acyclic (recursive) causal models. A context $\vec{u}$ is a setting for the exogenous variables ${\mathcal{}U}$ , which then determines the values of all other variables. We call a pair $(M,\vec{u})$ consisting of a causal model $M$ and a context $\vec{u}$ a (causal) setting. An intervention is defined as setting the value of some variable $X$ to $x$ , and essentially amounts to replacing the equation for $X$ in ${\mathcal{}F}$ by $X=x$ . A probabilistic causal model is a pair $(M,\Pr)$ , where $\Pr$ is a probability distribution on contexts.

A causal formula $\psi$ is true or false in a setting. We write $(M,\vec{u})\models\psi$ if the causal formula $\psi$ is true in the setting $(M,\vec{u})$ . Finally, $(M,\vec{u})\models[\vec{Y}\leftarrow\vec{y}]\varphi$ if $(M_{\vec{Y}=\vec{y}},\vec{u})\models\varphi$ , where $M_{\vec{Y}\leftarrow\vec{y}}$ is the causal model that is identical to $M$ , except that the variables in $\vec{Y}$ are set to $Y=y$ for each $Y\in\vec{Y}$ and its corresponding value $y\in\vec{y}$ .

A standard use of causal models is to define actual causation: that is, what it means for some particular event that occurred to cause another particular event. There have been a number of definitions of actual causation given for acyclic models (e.g., (Beckers 2021; Glymour and Wimberly 2007; Hall 2007; Halpern and Pearl 2005; Halpern 2019; Hitchcock 2001, 2007; Weslake 2015; Woodward 2003)). In this paper, we focus on what has become known as the modified Halpern–Pearl definition and some related definitions introduced in (Halpern 2019). We briefly review the relevant definitions below. The events that can be causes are arbitrary conjunctions of primitive events.

Definition 1 (Actual cause).

$\vec{X}=\vec{x}$ is an actual cause of $\varphi$ in $(M,\vec{u})$ if the following three conditions hold:

AC1.: $(M,\vec{u})\models(\vec{X}=\vec{x})$ and $(M,\vec{u})\models\varphi$ .
AC2.: There is a a setting $\vec{x}^{\prime}$ of the variables in $\vec{X}$ , a (possibly empty) set $\vec{W}$ of variables in ${\mathcal{}V}-\vec{X}^{\prime}$ , and a setting $\vec{w}$ of the variables in $\vec{W}$ such that $(M,\vec{u})\models\vec{W}=\vec{w}$ and $(M,\vec{u})\models[\vec{X}\leftarrow\vec{x}^{\prime},\vec{W}\leftarrow\vec{w}]\neg{\varphi}$ , and moreover
AC3.: $\vec{X}$ is minimal; there is no strict subset $\vec{X}^{\prime}$ of $\vec{X}$ such that $\vec{X}^{\prime}=\vec{x}^{\prime\prime}$ can replace $\vec{X}=\vec{x}^{\prime}$ in AC2, where $\vec{x}^{\prime\prime}$ is the restriction of $\vec{x}^{\prime}$ to the variables in $\vec{X}^{\prime}$ .

In the special case that $\vec{W}=\emptyset$ , we get the but-for definition. A variable $x$ in an actual cause $\vec{X}$ is called a part of a cause. In what follows, we adopt the convention of Halpern and state that part of a cause is a cause.

The notion of explanation taken from Halpern (2019) is relative to a set of contexts.

Definition 2 (Explanation).

$\vec{X}=\vec{x}$ is an explanation of $\varphi$ relative to a set ${\mathcal{}K}$ of contexts in a causal model $M$ if the following conditions hold:

EX1a.: If $\vec{u}\in{\mathcal{}K}$ and $(M,\vec{u})\models(\vec{X}=\vec{x})\wedge\varphi$ , then there exists a conjunct $X=x$ of $\vec{X}=\vec{x}$ and a (possibly empty) conjunction $\vec{Y}=\vec{y}$ such that $X=x\wedge\vec{Y}=\vec{y}$ is an actual cause of $\varphi$ in $(M,\vec{u})$ .
EX1b.: $(M,\vec{u}^{\prime})\models[\vec{X}=\vec{x}]\varphi$ for all contexts $\vec{u}^{\prime}\in{\mathcal{}K}$ .
EX2.: $\vec{X}$ is minimal; there is no strict subset $\vec{X}^{\prime}$ of $\vec{X}$ such that $\vec{X}^{\prime}=\vec{x}^{\prime}$ satisfies EX1, where $\vec{x}^{\prime}$ is the restriction of $\vec{x}$ to the variables in $\vec{X}^{\prime}$ .
EX3.: $(M,u)\models\vec{X}=\vec{x}\wedge\varphi$ for some $u\in{\mathcal{}K}$ .

2.2 Actual causality in image classification

The material here is taken from (Chockler and Halpern 2024), and the reader is referred to that paper for a complete overview of causal models for black-box image classifiers. We view an image classifier (a neural network) as a probabilistic causal model, with the set of endogenous variables being the set $\vec{V}$ of pixels that the image classifier gets as input, together with an output variable that we call $O$ . The variable $V_{i}\in\vec{V}$ describes the color and intensity of pixel $i$ ; its value is determined by the exogenous variables. The equation for $O$ determines the output of the neural network as a function of the pixel values. Thus, the causal network has depth $2$ , with the exogenous variables determining the feature variables, and the feature variables determining the output variable.

We assume causal independence between the feature variables $\vec{V}$ . While feature independence does not hold in general, pixel independence is a common assumption in explainability tools. We argue that it is, in fact, a reasonable and accurate assumption on images. Pixels are not concepts: a $2$ D image is a projection of a scene in the $3$ D real world on a plane; concepts that are present in the object in a real world can be invisible on this projection, hence the pixel independence. Moreover, for each setting $\vec{v}$ of the feature variables, there is a setting of the exogenous variables such that $\vec{V}=\vec{v}$ . That is, the variables in $\vec{V}$ are causally independent and determined by the context. Moreover, all the parents of the output variable $O$ are contained in $\vec{V}$ . Given these assumptions, the probability on contexts directly corresponds to the probability on seeing various images (which the neural network presumably learns during training).

The following definition is proved in (Chockler and Halpern 2024) to be equivalent to Definition 2 (their proof is for a partial explanation, which is a generalization of the notion of explanation).

Definition 3 (Explanation for image classifiers).

For a depth- $2$ causal model $M$ corresponding to an image classifier $\mathcal{N}$ , $\vec{X}=\vec{x}$ is an explanation of $O=o$ relative to a set of contexts ${\mathcal{}K}$ , if the following conditions hold:

EXIC1.: $(M,\vec{u})[\vec{X}=\vec{x}]\models O=o$ for all $\vec{u}\in{\mathcal{}K}$ .
EXIC2.: $\vec{X}$ is minimal; there is no strict subset $\vec{X}^{\prime}$ of $\vec{X}$ such that $\vec{X}^{\prime}=\vec{x}^{\prime}$ satisfies EXIC1, where $\vec{x}^{\prime}$ is the restriction of $\vec{x}$ to the variables in $\vec{X}^{\prime}$ .
EXIC3.: There exists a context $\vec{u}^{\prime\prime}\in{\mathcal{}K}$ and a setting $\vec{x}^{\prime\prime}$ of $\vec{X}$ , such that $(M,\vec{u}^{\prime\prime})\models(\vec{X}=\vec{x})\land(O=o)$ and $(M,\vec{u}^{\prime\prime})\models[\vec{X}=\vec{x}^{\prime\prime}](O\neq o)$ .

2.3 Logic-based Explanations

We now briefly review some relevant definitions for the world of logic-based explanations.

A classification problem is characterized by a set of features $\mathcal{F}=\{1\dots m\}$ and a set of classes $K=\{c_{1},\dots c_{k}\}$ . Each feature $i\in\mathcal{F}$ has a domain $D_{i}$ , resulting in a feature space $\mathbb{F}=D_{1}\times D_{2}\dots\times D_{m}$ . The classifier $\mathcal{N}$ cannot be a constant function: there must be at least two different points in the feature space that have different classifications. The most important assumption underlying the computability of logic-based explanations is monotonicity.

Definition 4.

[Monotonic Classifier (Marques-Silva et al. 2021)] Given feature domains and a set of classes assumed to be totally ordered, a classifier $\mathcal{N}$ is fully monotonic if $a\leq b\Rightarrow\mathcal{N}(a)\leq\mathcal{N}(b)$ (where, given two feature vectors $a$ and $b$ , we say that $a\leq b$ if $a_{i}\leq b_{i}(i=1,\dots n))$ .

Definition 5.

[Abductive Explanation (Marques-Silva et al. 2021)] An abductive, or prime-implicant (PI), explanation is a subset-minimal set of features $\mathcal{X}\subseteq\mathcal{F}$ , which, if assigned the values $v$ dictated by the instance $(v,c)$ are sufficient for the prediction $c$ .

\forall(x\in\mathbb{F})\big{[}\bigwedge_{i\in\mathcal{X}}(x_{i}=v_{i})\big{]}\rightarrow(\mathcal{N}(x)=c).

(1)

The other common definition in logic-based explanations relevant to our discussion is contrastive explanations. A contrastive explanation answers the question “why did this happen, and not that?” (Miller 2019).

Definition 6.

[Contrastive Explanation (Ignatiev et al. 2020)] A contrastive explanation is a subset-minimal set $\mathcal{Y}\subseteq\mathcal{F}$ which, if the features in $\mathcal{F}\setminus\mathcal{Y}$ are assigned the values dictated by the instance $(v,c)$ then there is an assignment to the features in $\mathcal{Y}$ that changes the prediction.

\exists(x\in\mathbb{F})\big{[}\bigwedge_{i\in\mathcal{F}-\mathcal{Y}}(x_{i}=v_{i})\big{]}\land(\mathcal{N}(x)\not=c).

(2)

3 Theoretical Results

In this section we prove equivalences between definitions and properties satisfied by causal explanations. We start with formalization of logic-based explanations in the actual causality framework. For a given classification problem as defined in Section 2.3, we define a depth $2$ causal model $M$ as follows. The set of endogenous input variables is the set of features $\mathcal{F}$ , with each variable $i\in\mathcal{F}$ having a domain $D_{i}$ . The output of the classifier is the output variable $O$ of the model, with the domain $K$ . An instance $(v,c)$ corresponds to a context $\vec{u}$ for $M$ that assigns to $\mathcal{F}$ the values defined by $v$ , and the output is $c$ . The set $\mathcal{K}$ of contexts is defined as the feature space $\mathbb{F}=D_{1}\times D_{2}\dots\times D_{m}$ . As the classifier is not constant, there exist two inputs $v$ and $v^{\prime}$ such that $(M,v)\models O=c$ , $(M,v^{\prime})\models O=c^{\prime}$ , and $c\neq c^{\prime}$ . It is easy to see that $M$ is a depth- $2$ causal model with all input variables being causally independent.

3.1 Equivalences between definitions

Lemma 1.

For image classifiers, causal explanations (Definition 3) are abductive (Definition 5).

The following is an easy corollary from Lemma 1 when we observe that the proof does not use any unique characteristics of image classifiers.

Corollary 3.1.

Explanations (Definition 2) in general causal models of depth $2$ with all input variables being independent are abductive.

An abductive explanation is defined relative to the set of all possible assignments, whereas a causal explanation is always relative to a set $\mathcal{K}$ of contexts. Monotonicity does not hold in general for image classifiers, where increasing or decreasing the pixels values of some pixels set can change the classification. This is essentially the definition of causal explanation given in Definition 2, as long as we assume that the instance $(v,c)$ is an ‘actual’ instance (AC1). We need only account for $\mathcal{K}$ in this definition.

Lemma 2.

Contrastive explanations (Definition 6) always exist and are equivalent to actual causes (Definition 1) in the same setting.

3.2 Contrastive and complete explanations with confidence

In this section we extend the definition of causal explanation to explicitly include model confidence and apply this extension to contrastive and complete explanations.

Causal models with confidence

Logic-based explanations do not consider model confidence. While causal explanations are general and can, in principle, include model confidence as a part of its output, (Chockler and Halpern 2024) consider only the output label of the classifier. However, while a pixel set may be sufficient for the classification, the model confidence may be very low, leading to low-quality explanations, as shown in (Kelly, Chanchal, and Blake 2025). A more useful definition of an explanation therefore should take into account model confidence as well, so that the pixels set is sufficient to obtain the required classification with at least the required degree of confidence.

Definition 7.

[ $\delta$ -confident explanation] For a depth- $2$ causal model $M$ corresponding to an image classifier $\mathcal{N}$ and $0\leq\delta\leq 1$ , $\vec{X}=\vec{x}$ is a $\delta$ -confident explanation of $O=o$ with a confidence $C=c$ relative to a set of contexts ${\mathcal{}K}$ , if it is an explanation of $O=o$ relative to a set of contexts ${\mathcal{}K}$ according to Definition 3 and for all $\vec{u}\in{\mathcal{}K}$ , $(M,\vec{u})[\vec{X}=\vec{x}]\models O=o$ with the confidence at least $\delta\times c$ . If the confidence is exactly $\delta\times c$ , we call it a $\delta$ -exact explanation. We call $1$ -exact explanations complete explanations.

Given that, in the worst case, the entire image already has the required confidence, it follows that there is always a complete explanation.

Lemma 3.

A complete explanation always exists.

Contrastive explanations

Multiple sufficient explanations are common in images (Chockler, Kelly, and Kroening 2025). Detecting them depends on the set of contexts $\mathcal{K}$ . We have seen that a logic contrastive explanation (Definition 6) corresponds to an actual cause. The actual causality framework allows us to introduce and compute different versions of contrastive explanations for different sets of contexts, as it is not limited to the set of contexts $\mathcal{K}$ including all possible combinations of values of variables.

Completeness and adjustment pixels

Completeness is a property found in work on saliency methods (Srinivas and Fleuret 2019). There, the intuition is, if the saliency map $S(x)$ completely encodes the computational information as performed by $\mathcal{N}$ , then it is possible to recover $\mathcal{N}(x)$ by using $S(x)$ and $x$ using some function $\phi$ . In effect, this means that, in addition to recovering the original model decision, we also need to recover the model’s confidence in its decision. As a causal explanation is not a saliency map, but rather a set of pixels, we introduce a property of complete explanations (Definition 7). Informally, a complete explanation for an image $I$ is a subset-minimal set of pixels which have both the same class and confidence as $\mathcal{N}(I)$ for some model $\mathcal{N}$ .

A complete explanation for an image classification can be broken down into $3$ sets of pixels, pixels sufficient for the classification, pixels which are contrastive for the classification, and the (possibly empty) set of pixels which bring the confidence to the desired level. We call this last set of pixels the adjustment pixels.

3.3 Input invariance

Finally, we draw attention to a useful property of causal explanations in general. Input invariance (Kindermans et al. 2022) is a property first defined for saliency-based methods. It is a stronger form of the property introduced by Srinivas and Fleuret (2019) as weak dependence. Simply stated, given two models, $\mathcal{N}_{1}(x)$ and $\mathcal{N}_{2}(x)$ , which are identical other than the $\mathcal{N}_{2}$ has had its first neuron layer before non-linearity altered in a manner that does not affect the gradient (e.g. means-shifting) from $\mathcal{N}_{1}(x)$ , there should be no different in saliency map, i.e. $S(\mathcal{N}_{1}(x))=S(\mathcal{N}_{2}(x))$ .

Some methods, such as LRP (Bach et al. 2015), do not satisfy input invariance (Kindermans et al. 2022). Other methods, notably LIME (Ribeiro, Singh, and Guestrin 2016) train an additional model on local data perturbations: it is difficult to make a general statement regarding LIME and input invariance due to local model variability.

As a causal explanation is independent of the exact value of $x$ , and depends only on the output of $\mathcal{N}$ , a causal explanation is invariant in the face of such alterations. The only subtlety being that the baseline value over which an explanation is extract needs also to be means-shifted.

The following lemma follows from the observation that Definition 3 depends only on the properties of $x$ and not on its value.

Lemma 4.

A causal explanation is input invariant.

4 Algorithm

Refer to caption — (a) Original image, misclassified as ox ( $0.2919$ confidence)

We use rex (Chockler et al. 2024) as a basis for computing the definitions provided in this paper. rex uses an approximation of causal responsibility to rank features. Responsibility is a quantitative measure of causality and, roughly speaking, measures the amount of causal influence on the classification (Chockler and Halpern 2004). Algorithm 1 details the greedy algorithm for approximation of $\delta$ -sufficient-contrastive explanations, that is, explanations which are both sufficient and contrastive. Unfortunately, the precise computation of contrastive explanations is intractable, as follows from the intractability of actual causes in depth- $2$ causal models with independent inputs (Chockler et al. 2024).

We use a responsibility map created by rex (Algorithm 1, 4). A full description of rex can be found in Chockler et al. (2024). rex approximates the causal responsibility of pixels in an image $I$ by performing an initial partition of $I$ . Intuitively, a part of $I$ has the highest causal responsibility if it, by itself, is sufficient for the classification. A part of $I$ is lower in causal responsibility if it is not sufficient by itself, but it is sufficient together with several other parts. A part is completely irrelevant if it is neither. rex starts from large parts and then refines the important parts iteratively.

We use the resulting responsibility map to rank pixels by their responsibility for the desired classification. We use $2$ different sets of contexts, $\mathcal{K}^{+}$ and $\mathcal{K}^{-}$ (Algorithm 1, 5) to discover explanations. $\mathcal{K}^{+}$ is created by inserting pixels into an image created from the baseline occlusion value, as dictated by their approximate causal responsibility. In the case of a typical ImageNet model, which accepts an input tensor $224\times 224$ , the total size of $\mathcal{K}^{+}$ is $50176$ . In order to calculate the contrast of this, we also consider the opposite: the context $\mathcal{K}^{-}$ , which replaces pixels from the original image with the baseline value, in accordance with their ranking by responsibility. The procedure is similar to the one used for insertion/deletion curves (Petsiuk, Das, and Saenko 2018). $\mathcal{K}^{+}$ ( $\mathcal{K}^{-}$ ) is the set of contexts of all images produced by inserting (occluding) pixels, based on their pixel ranks, over a baseline value. In our experiments, we use the same baseline value for both $\mathcal{K}^{+}$ and $\mathcal{K}^{-}$ . Our algorithm computes explanations which are both sufficient and contrastive.

Algorithm 1

\delta

-Sufficient-Contrastive Explanation

(I,\delta,\mathcal{N})

INPUT: an image $I$ , a confidence scalar $\delta$ between $0$ and $1$ and a model $\mathcal{N}$

OUTPUT: a sufficient explanation $s$ , a contrastive explanation $c$ , a sorted responsibility ranking $\mathcal{R}$

s,c\leftarrow

initialize to

\emptyset

l\leftarrow\mathcal{N}(x)

\tau\leftarrow\sigma(\mathcal{N}(x))\times\delta

(model confidence)

\mathcal{R}\leftarrow\mathit{pixel\_ranking}(I,\mathcal{N})

sorted high to low

\mathcal{K}^{+},\mathcal{K}_{-}\leftarrow

initialize

6: for

p\in\mathcal{R}

\mathcal{K}^{+},\mathcal{K}_{-}\leftarrow

update with

p

8: if

\mathcal{N}(\mathcal{K}^{+})=l

and

\mathcal{N}(\mathcal{K}_{-})\not=l

and

\sigma(\mathcal{N}(\mathcal{K^{+}}))\geq\tau

then

9: s

\leftarrow\mathcal{K}^{+}

10: c

\leftarrow\mathcal{K}_{-}

11: return

s,c,\mathcal{R}

12: end if

13: end for

Algorithm 2

\mathit{Adjustment\;Discovery}(I,\mathcal{N},\mathcal{R},e

)

INPUT: an image $I$ , a model $\mathcal{N}$ , a responsibility landscape $\mathcal{R}$ , a contrast explanation $e$

OUTPUT: a set of adjustment pixels $a$

c\leftarrow\sigma(\mathcal{N}(x))

(model confidence)

\mathit{pixel\_ranking}\leftarrow

pixels from

\mathcal{R}

ordered from low to high

a\leftarrow\emptyset

4: for each pixel

p_{i}\in\mathit{pixel\_ranking}

a\leftarrow a\cup\{p_{i}\}

e\leftarrow e\cup a

x^{\prime}\leftarrow

mask pixels of

x

that are not in e

8: if

\mathit{confidence}(\mathcal{N}(x^{\prime}))=c

then

9: return

a

10: end if

11: end for

Algorithm 2 details the procedure for extracting the adjustment pixels, given an image and a $\delta$ -sufficient-contrastive explanation. In practice, the equality of 8 is difficult to achieve. Our implementation allows for a user-provided degree of precision. In our experiments, we set this to $4$ decimal places.

Figure 4: Results on ImageNet for

\delta=1.0

. Outliers have been removed for clarity. Both Swin_t and ResNet50 have very low requirements for sufficiency compared to MobileNet. ResNet50 also has the lowest requirement for contrast with confidence at least as great as the original image.

Figure 5: A contrastive explanation with a low ImageNet distance. This colobus monkey has guenon monkey as its contrast classification. The model is clearly relying on the muzzle or snout to refine its classification to colobus.

5 Experimental Results

Figure 6: Shortest path between the original classification and its inverse in the ImageNet hierarchy over

3

different datasets. There is remarkable similarity across all three models, a similarity which is consistent over the different datasets.

In this section, we present an analysis of various models and datasets viewed through the lens of contrast and completeness. To the best of our knowledge, there are no other tools that compute sufficient, contrastive and complete explanations for image classifiers. As shown in Section 3.2, these causal explanations have the same formal properties as logic-based explanations, but are also efficiently computable. Our algorithms have been included in the publicly available XAI tool rex ¹¹1https://github.com/ReX-XAI/ReX. rex was used with default parameters for all experiments, in particular, the random seed is $0$ , baseline value $0$ , and the number of iterations was $20$ .

To the best of our knowledge, no-one has previously investigated the relationship of original classifications to contrastive explanations. Due to the hierarchical nature of the ImageNet dataset—it forms a tree—we can calculate the shortest path between the original classification and its contrastive class. We do the same with the completeness requirements, isolating the adjustment pixels, classifying them and calculating the shortest path to the original classification. For reasons of space, we include only a representative selection of results here. Complete results may be found in the supplementary material.

We examine $3$ models, all from TorchVision: a ResNet50, a MobileNet and Swin_t. All models were used with their default preprocessing procedures as provided by TorchVision. We also used $3$ different datasets, ImageNet-1K validation (approximately $4000$ images) (Russakovsky et al. 2015), PascalVoc20212 (Everingham et al. 2012) and a dataset of ‘complex’ images, ECSSD (Shi et al. 2016). We chose these datasets due to their being publicly available and well-studied. On a NVidia A100 GPU running Ubuntu LTS 22.04 we found that runtime varied greatly depending on the model under test. The ResNet50 and MobileNet were both very quick, taking approximately $6$ seconds per image. The Swin_t model was slower, taking approximately $16$ seconds per image. Our implementation makes extensive use of batched inference for efficiency, so models which do not accept batches will likely be slower.

Figure 4 shows the relative sizes of sufficient, contrastive and adjustment pixel sets for the $3$ different models on ImageNet. These are $\delta$ -explanations where $\delta=1$ . See supplementary material for other models and settings of $\delta$ . In general, ResNet50 requires the fewest pixels for both sufficiency and contrast, and also has very low adjustment pixel size. MobileNet and Swin_t appear to be much more similar in their behavior, though Swin_t has slightly greater contrastive requirements in general.

Figure 6 shows the shortest path between the original classification and its contrast class, according to the ImageNet hierarchy. In general, across all models, the distance between the two classes is not great, given that the maximum distance is $24$ . This is not always the case, however: Figure 3(a) shows an example of a (mis-)classification where the contrast classification is ‘moped’. The adjustment pixels are classified as ‘picket fence’. It is worth noting that the initial confidence was already low on this image. At the other extreme Figure 6 reveals a few cases where the distance between the original classification and its contrast was very small. Manual inspection reveals that these cases represent refinements inside a larger class. Figure 5, for example, reveals the ResNet50 model required the highlighted pixels to refine the classification to colobus monkey. Without them, the classification is still monkey, but a different subclass – guenon.

6 Conclusions

We have demonstrated the concordances between logic-based and causal explanations. We have also introduced definitions for causal contrastive explanations and causal complete explanations. We have created algorithms for approximating these definitions and incorporated them into the tool rex. We have computed contrastive and complete explanations on $3$ different datasets with $3$ different models. We find that different models have different sufficiency, contrastive and adjustment requirements. Finally, we have examined the relationship between the original, contrastive and adjustment pixel predictions.

References

Bach et al. (2015) Bach, S.; Binder, A.; Montavon, G.; Klauschen, F.; Müller, K.-R.; and Samek, W. 2015. On Pixel-wise Explanations for Non-linear Classifier Decisions by Layer-wise Relevance Propagation. PLOS One, 10(7).
Beckers (2021) Beckers, S. 2021. Causal sufficiency and actual causation. Journal of Philosophical Logic, 50: 1341–1374.
Blake et al. (2023) Blake, N.; Chockler, H.; Kelly, D. A.; Pena, S. C.; and Chanchal, A. 2023. MRxaI: Black-Box Explainability for Image Classifiers in a Medical Setting. arXiv preprint arXiv:2311.14471.
Chockler and Halpern (2004) Chockler, H.; and Halpern, J. Y. 2004. Responsibility and Blame: A Structural-Model Approach. J. Artif. Intell. Res., 22: 93–115.
Chockler and Halpern (2024) Chockler, H.; and Halpern, J. Y. 2024. Explaining Image Classifiers. In Proceedings of the 21st International Conference on Principles of Knowledge Representation and Reasoning, KR.
Chockler, Kelly, and Kroening (2025) Chockler, H.; Kelly, D. A.; and Kroening, D. 2025. Multiple Different Explanations for Image Classifiers. In ECAI European Conference on Artificial Intelligence.
Chockler et al. (2024) Chockler, H.; Kelly, D. A.; Kroening, D.; and Sun, Y. 2024. Causal Explanations for Image Classifiers. arXiv preprint arXiv:2411.08875.
Everingham et al. (2012) Everingham, M.; Van Gool, L.; Williams, C. K. I.; Winn, J.; and Zisserman, A. 2012. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html.
Glymour and Wimberly (2007) Glymour, C.; and Wimberly, F. 2007. Actual causes and thought experiments. In Campbell, J.; O’Rourke, M.; and Silverstein, H., eds., Causation and Explanation, 43–67. Cambridge, MA: MIT Press.
Griffin, Holub, and Perona (2022) Griffin, G.; Holub, A.; and Perona, P. 2022. Caltech 256.
Hall (2007) Hall, N. 2007. Structural equations and causation. Philosophical Studies, 132: 109–136.
Halpern (2019) Halpern, J. Y. 2019. Actual Causality. The MIT Press.
Halpern and Pearl (2005) Halpern, J. Y.; and Pearl, J. 2005. Causes and explanations: a structural-model approach. Part I: causes. British Journal for Philosophy of Science, 56(4): 843–887.
Hitchcock (2001) Hitchcock, C. 2001. The intransitivity of causation revealed in equations and graphs. Journal of Philosophy, XCVIII(6): 273–299.
Hitchcock (2007) Hitchcock, C. 2007. Prevention, preemption, and the principle of sufficient reason. Philosophical Review, 116: 495–532.
Ignatiev et al. (2020) Ignatiev, A.; Narodytska, N.; Asher, N.; and Marques-Silva, J. 2020. On Relating ’Why?’ and ’Why Not?’ Explanations. CoRR, abs/2012.11067.
Ignatiev, Narodytska, and Marques-Silva (2019) Ignatiev, A.; Narodytska, N.; and Marques-Silva, J. 2019. Abduction-based explanations for Machine Learning models. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, AAAI’19/IAAI’19/EAAI’19. AAAI Press. ISBN 978-1-57735-809-1.
Izza et al. (2024) Izza, Y.; Ignatiev, A.; Stuckey, P. J.; and Marques-Silva, J. 2024. Delivering Inflated Explanations. Proceedings of the AAAI Conference on Artificial Intelligence, 38(11): 12744–12753.
Kelly, Chanchal, and Blake (2025) Kelly, D. A.; Chanchal, A.; and Blake, N. 2025. I Am Big, You Are Little; I Am Right, You Are Wrong. In IEEE/CVF International Conference on Computer Vision, ICCV. IEEE.
Kindermans et al. (2022) Kindermans, P.-J.; Hooker, S.; Adebayo, J.; Alber, M.; Schütt, K. T.; Dähne, S.; Erhan, D.; and Kim, B. 2022. The (Un)reliability of Saliency Methods, 267–280. Berlin, Heidelberg: Springer-Verlag. ISBN 978-3-030-28953-9.
Lundberg and Lee (2017a) Lundberg, S. M.; and Lee, S.-I. 2017a. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems (NeurIPS), volume 30, 4765–4774.
Lundberg and Lee (2017b) Lundberg, S. M.; and Lee, S.-I. 2017b. A Unified Approach to Interpreting Model Predictions. In Guyon, I.; Luxburg, U. V.; Bengio, S.; Wallach, H.; Fergus, R.; Vishwanathan, S.; and Garnett, R., eds., Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc.
Marques-Silva et al. (2021) Marques-Silva, J.; Gerspacher, T.; Cooper, M. C.; Ignatiev, A.; and Narodytska, N. 2021. Explanations for Monotonic Classifiers. In Meila, M.; and Zhang, T., eds., Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, 7469–7479. PMLR.
Miller (2019) Miller, T. 2019. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267: 1–38.
Papadimitriou and Yannakakis (1982) Papadimitriou, C. H.; and Yannakakis, M. 1982. The complexity of facets (and some facets of complexity). JCSS, 28(2): 244–259.
Petsiuk, Das, and Saenko (2018) Petsiuk, V.; Das, A.; and Saenko, K. 2018. RISE: Randomized Input Sampling for Explanation of Black-box Models. In British Machine Vision Conference (BMVC). BMVA Press.
Ribeiro, Singh, and Guestrin (2016) Ribeiro, M. T.; Singh, S.; and Guestrin, C. 2016. “Why Should I Trust You?” Explaining the Predictions of Any Classifier. In Knowledge Discovery and Data Mining (KDD), 1135–1144. ACM.
Russakovsky et al. (2015) Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; Berg, A. C.; and Fei-Fei, L. 2015. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3): 211–252.
Selvaraju et al. (2017) Selvaraju, R. R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; and Batra, D. 2017. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. In International Conference on Computer Vision (ICCV), 618–626. IEEE.
Shi et al. (2016) Shi, J.; Yan, Q.; Xu, L.; and Jia, J. 2016. Hierarchical Image Saliency Detection on Extended CSSD. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(4): 717–729.
Shih, Choi, and Darwiche (2018) Shih, A.; Choi, A.; and Darwiche, A. 2018. A symbolic approach to explaining Bayesian network classifiers. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI’18, 5103–5111. AAAI Press. ISBN 9780999241127.
Shrotri et al. (2022) Shrotri, A. A.; Narodytska, N.; Ignatiev, A.; Meel, K. S.; Marques-Silva, J.; and Vardi, M. Y. 2022. Constraint-Driven Explanations for Black-Box ML Models. Proceedings of the AAAI Conference on Artificial Intelligence, 36(8): 8304–8314.
Srinivas and Fleuret (2019) Srinivas, S.; and Fleuret, F. 2019. Full-gradient representation for neural network visualization. In Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates Inc.
Weslake (2015) Weslake, B. 2015. A partial theory of actual causation. British Journal for the Philosophy of Science, To appear.
Woodward (2003) Woodward, J. 2003. Making Things Happen: A Theory of Causal Explanation. Oxford, U.K.: Oxford University Press.

Appendix A Code and Data

Our algorithms have been included in the open source XAI tool rex. rex is available at https://github.com/ReX-XAI/ReX. Complete data and analysis code can be found at the following anonymous link https://figshare.com/s/7d822f952abcbe54ca93.

Appendix B Proofs

All definition references are to the main paper, unless explicitly stated otherwise.

Lemma 5.

For image classifiers, causal explanations (Definition 3) are abductive (Definition 5).

Proof.

For an instance $(v,c)$ , a set of variables $\mathcal{X}\subseteq\mathcal{F}$ and their values as defined by $v$ satisfy Definition 5 iff they satisfy EXIC1 in Definition 3. Moreover, $\mathcal{X}$ is subset-minimal in $\mathcal{F}$ iff it satisfies EXIC2 in Definition 3. Note that EXIC3 does not have an equivalent clause in the definition of abductive explanations, hence the other direction does not necessarily hold. ∎

Lemma 6.

Contrastive explanations (Definition 6) are equivalent to actual causes (Definition 1) in the same setting.

Proof.

Recall that the framework of logic-based explanations translates to causal models of depth $2$ with all input variables being independent. An instance $(v,c)$ in Definition 6 exists iff for a context $\vec{u}$ defined by $v$ , $(M,\vec{u})\models(\mathcal{Y}=\vec{y})\wedge(O=c)$ (AC1), where $\vec{y}$ is the restriction of $v$ to the variables in $\mathcal{Y}$ . In this type of causal models, $\vec{W}$ in AC2 is empty, hence Definition 6 holds for $\mathcal{Y}=\vec{y}$ iff AC2 holds. Finally, $\mathcal{Y}$ is minimal iff AC3 holds. ∎

Lemma 7.

A contrastive explanation always exists.

Proof.

Complete replacement of the pixels in an image $I$ is a contrastive explanation if no smaller subset is a contrastive explanation. ∎

This introduces a small complexity in the practical computation of contrastive explanations. It requires that the chosen baseline value does not that the same classification as an image $I$ , otherwise the baseline value does not introduce a contrastive element. rex checks for this problem automatically.

The following result follows from Lemma 6 and DP-completeness of actual causes in depth- $2$ causal models with independent inputs (Chockler et al. 2024). The class DP consists of those languages $L$ for which there exist a language $L_{1}$ in NP and a language $L_{2}$ in co-NP such that $L=L_{1}\cap L_{2}$ (Papadimitriou and Yannakakis 1982).

Lemma 8.

Given an input image, the decision problem of a contrastive explanation is DP-complete.

Proof.

Proof follows Chockler, Kelly, and Kroening (2025) ∎

Appendix C Experimental Results

Figure 7: Results on all

3

datasets for confidence threshold

\geq 1.0

Figure 7 shows the results across all $3$ datasets and models with $\delta=1.0$ . In general, all models follow a fairly similar pattern. It is interesting to note that MobileNet requires more pixels in general for a causal explanation, but also the lowest number of pixels for adjustment. This suggests that, for MobileNet at least, the contrastive explanation encodes nearly all of its completeness into the contrastive explanation.

Figure 8: Shortest path between the original classification and its adjustment pixels in the ImageNet hierarchy over

3

different datasets. The distance between adjustment and target class on ImageNet-1K is obviously different from both PascalVOC and ECSSD.

Figure 8 shows the length of the shortest path between the original classification and the adjustment pixels.

Appendix D Different $\delta$

Figure 9: Results on all

3

datasets for confidence threshold

\geq 0.5

Figure 9 shows results for all datasets when the $\delta$ -confident explanation threshold is $0.5$ . If minimality is taken as a quality indicatory, this setting of $\delta$ sees a general deterioration of quality. The adjustment pixels sets are larger in general across all models and datasets. This suggests that this is a payoff between contrast and completeness: forcing the contrast explanation to have a higher confidence reduces the size of the adjustment pixel set, whereas a lower $\delta$ leads to a smaller contrast explanation and larger adjustment set. Users should bear this in mind when decided what aspect of a model’s behavior they wish to explore.

Figure 10: Shortest path between the original classification and its inverse in the ImageNet hierarchy over

3

different datasets, with

\delta=0.5

. There is remarkable similarity across all three models, a similarity which is consistent over the different datasets.

Figure 11: Shortest path between the original classification and its adjustment pixels in the ImageNet hierarchy over

3

different datasets, with

\delta=0.5

. The distance between adjustment and target class on ImageNet-1K is obviously different from both PascalVOC and ECSSD.

Figure 11 shows the distance in the ImageNet hierarchy across $3$ datasets and $3$ models, given a $\delta=0.5$ for contrastive explanations, i.e. the contrastive explanation must contain at least $50\%$ of the original confidence.

Appendix E Caltech-256

Figure 12: Results on

3

classes from Caltech-256. The general pattern here is the same as for the other datasets.

Figure 12 shows a small study on $3$ different classes from Caltech-256 (Griffin, Holub, and Perona 2022). These are, in general, simple images compared to ImageNet. The general pattern seen in Figure 7 does not change.

(a) Shortest path the original classification and its contrast in the ImageNet hierarchy over a subset of Caltech-256. There is remarkable similarity across all three models, a similarity which is consistent over the different datasets.

(b) Shortest path between the original classification and its adjustment pixels in the ImageNet hierarchy over a subset of caltech-256. There is a lot of similarity across all three models, a similarity which is consistent over the different datasets.

Figure 13: Contrast and Adjustment distance on Caltech-256.

Figure 13 shows the contrast and adjustment distance in the ImageNet hierarchy on a small study of $3$ different classes from Caltech-256. This follows a similar pattern to Figure 6 in the main paper.

Causal Identification of Sufficient, Contrastive and Complete Feature Sets in Image Classification

Abstract

1 Introduction

Related Work

2 Background

2.1 Actual causality

Definition 1 (Actual cause).

Definition 2 (Explanation).

2.2 Actual causality in image classification

Definition 3 (Explanation for image classifiers).

2.3 Logic-based Explanations

Definition 4.

Definition 5.

Definition 6.

3 Theoretical Results

3.1 Equivalences between definitions

Lemma 1.

Corollary 3.1.

Lemma 2.

3.2 Contrastive and complete explanations with confidence

Causal models with confidence

Definition 7.

Lemma 3.

Contrastive explanations

Completeness and adjustment pixels

3.3 Input invariance

Lemma 4.

4 Algorithm

5 Experimental Results

6 Conclusions

References

Appendix A Code and Data

Appendix B Proofs

Lemma 5.

Proof.

Lemma 6.

Proof.

Lemma 7.

Proof.

Lemma 8.

Proof.

Appendix C Experimental Results

Appendix D Different δ\delta

Appendix E Caltech-256

Causal Identification of Sufficient, Contrastive and Complete Feature Sets
in Image Classification

Appendix D Different $\delta$