# Axiomatic Foundations of Explainability

Leila Amgoud, Jonathan Ben-Naim
CNRS, IRIT, France
{amgoud, bennaim}@irit.fr

**Abstract.** Improving trust in decisions made by classification models is becoming crucial for the acceptance of automated systems, and an important way of doing so is to provide explanations for the behaviour of the models. Different explainers have been proposed in the recent literature for that purpose; however, their formal properties are under-studied. This paper investigates theoretically explainers that provide reasons behind decisions independently of instances. Its contributions are fourfold. The first is to lay the foundations of such explainers by proposing key axioms, i.e., desirable properties they should satisfy. Two axioms are incompatible, leading to two subsets of axioms. The second contribution consists of demonstrating that the first subset of axioms characterizes a family of explainers that return sufficient reasons, while the second characterizes a family that provides necessary reasons. This sheds light on the axioms which distinguish the two types of reasons. As a third contribution, the paper introduces various explainers of both families and fully characterizes some of them. Those explainers make use of the whole feature space. The fourth contribution is a family of explainers that generate explanations from finite datasets (subsets of the feature space). This family, seen as an abstraction of Anchors and LIME, violates some axioms, including one which prevents incorrect explanations.

## 1 Introduction

Recent progress in data-driven AI has been largely due to machine learning and in particular deep learning models. However, the predictions of these models resist analysis due to their inherent non-linear behaviour and their vast number of interacting parameters. This opacity impedes the relevance of those models from a theoretical point of view, since their properties are difficult to investigate, and from a practical point of view, as many applications, such as healthcare or embedded systems, need guarantees to be deployed, and others, e.g., in the legal or financial domains, require transparency to be accepted. Explanations help human users understand why a decision was reached. Explaining the functionality of classification systems and their rationale thus becomes a vital need. This has generated a lot of effort; see [Cyras et al., 2021; Guidotti et al., 2019; Miller, 2019; Biran and Cotton, 2017] for surveys on explainers of machine learning models.

Existing explainers can be classified in two different ways. The first distinguishes explainers that provide local explanations for individual instances (e.g., [Ribeiro et al., 2016; Ribeiro et al., 2018; Dhurandhar et al., 2018; Ignatiev et al., 2019; Darwiche and Hirth, 2020]) from explainers that provide global explanations for classes independently of instances (e.g., [Ignatiev et al., 2019; Amgoud, 2021a]). The second way of classifying existing explainers is based on the information used for generating explanations. Explainers like Anchors and LIME [Ribeiro et al., 2016; Ribeiro et al., 2018; Amgoud, 2021b] use datasets, while others, like those studied in [Ignatiev et al., 2019; Ignatiev et al., 2020; Darwiche and Hirth, 2020], use the whole set of instances.

Despite the popularity of existing explainers, their formal properties are under-studied. This makes their comparison difficult.
Some explainers have been analysed against a set of metrics and have been shown to be efficient. However, some counter-intuitive results have been detected in [Narodytska et al., 2019] for Anchors and LIME. This shows that the existing metrics are not sufficient for analysing the quality of an explainer and for guiding the definition of novel ones. They are also not sufficient for an accurate comparison of explainers.

The present paper bridges this gap by investigating the theoretical foundations of explainers that provide global explanations (i.e., reasons behind assigning classes independently of instances). Foundations are important not only for a better understanding of the explanation process in general, but also for clarifying the basic assumptions underlying every explainer, and for comparing different (families of) explainers.

The paper contains four contributions. The first is to lay the foundations of explainers by proposing key axioms, i.e., desirable properties they should satisfy. Two axioms are shown to be incompatible, leading to two subsets of axioms. The second contribution consists of demonstrating that the first subset of axioms characterizes the family of explainers that are based on abductive reasoning, hence producing sufficient reasons, and the second subset characterizes the family of explainers that are based on counterfactual reasoning, i.e., returning necessary reasons. These characterisations shed light on the properties that distinguish the two types of reasons. As a third contribution, the paper introduces various explainers of both families, each of them generating explanations under complete information, i.e., using the whole feature space. It fully characterizes some of them, including the one which returns the so-called prime implicants, studied in [Ignatiev et al., 2019; Darwiche and Hirth, 2020; Audemard et al., 2020]. The fourth contribution is a family of explainers that generate reasons from finite datasets (subsets of the feature space). This family, seen as an abstraction of Anchors [Ribeiro et al., 2018] and LIME [Ribeiro et al., 2016], violates some axioms, including one which prevents incorrect explanations.

## 2 Classification

We start by introducing the initial material needed to classify, i.e., classes as well as attributes and their domains.

**Definition 1 (Theory).** A *classification theory* is a tuple $T = \langle A, d, C\rangle$ such that the following holds:
- $A$ is a non-empty finite set of attributes (or features);
- $d$ is a function on $A$ such that, for every $a \in A$, $d(a)$ is countable (discrete domains) with $|d(a)| > 1$;
- $C$ is a finite set of classes such that $|C| > 1$.

Next, we need to define the notion of literal, i.e., an assignment of a value to an attribute.

**Definition 2 (Literal).** Let $T = \langle A, d, C\rangle$ be a classification theory. A *literal* on $T$ is a pair $\langle a, v\rangle$ such that $a \in A$ and $v \in d(a)$. We denote by $\mathit{Lit}_T$ the set of all literals on $T$. A subset $L \subseteq \mathit{Lit}_T$ is *consistent* iff, for any two elements $l = \langle a, v\rangle$ and $l' = \langle a', v'\rangle$ of $L$, if $a = a'$, then $v = v'$.

We turn to the notion of instance, i.e., an assignment of values to all attributes.

**Definition 3 (Instance).** Let $T = \langle A, d, C\rangle$ be a classification theory. An *instance* on $T$ is a subset $I$ of $\mathit{Lit}_T$ such that every attribute $a \in A$ appears exactly once in $I$. We denote by $\mathit{Inst}_T$ the set of all instances on $T$.

Notice that every instance is consistent, and every proper subset of an instance is also consistent (Property 1 below).
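As an illustration of Definitions 1–3, here is a minimal Python sketch (ours, not from the paper) that enumerates the literals and instances of a small, hypothetical two-attribute theory; all names are ours.

```python
from itertools import product

# A tiny hypothetical classification theory T = <A, d, C> (Definitions 1-3).
d = {"a1": [0, 1], "a2": ["red", "blue"]}     # attributes and their domains d(a)
C = {"pos", "neg"}                            # classes, |C| > 1

# Lit_T: all literals <a, v>
Lit_T = {(a, v) for a, vals in d.items() for v in vals}

def consistent(L):
    """A set of literals is consistent iff no attribute receives two values."""
    return len({a for a, _ in L}) == len(L)

# Inst_T: subsets of Lit_T in which every attribute appears exactly once
Inst_T = [frozenset(zip(d.keys(), vals)) for vals in product(*d.values())]

assert all(consistent(I) for I in Inst_T)     # every instance is consistent
print(len(Lit_T), len(Inst_T))                # 4 literals and 2 * 2 = 4 instances
```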
**Property 1.** Let $T = \langle A, d, C\rangle$ be a classification theory and $I \in \mathit{Inst}_T$. Then:
- $I$ is consistent;
- for any $I' \subseteq I$, $I'$ is consistent.

We are ready to define the notion of classifier. It is a function which assigns a single class to every instance. Furthermore, every class is assigned to at least one instance.

**Definition 4 (Classifier).** Let $T = \langle A, d, C\rangle$ be a classification theory. A *classifier* on $T$ is a surjective function $R$ from $\mathit{Inst}_T$ to $C$.

**Notation ($\mathit{Inst}_T^R(\cdot)$).** We denote by $\mathit{Inst}_T^R(x)$ the set of all instances of a class $x$ in $T$ and $R$, i.e., $\mathit{Inst}_T^R(x) = \{I \in \mathit{Inst}_T : R(I) = x\}$.

We show that every class is assigned to at least one instance and not assigned to at least one other instance.

**Property 2.** Let $T = \langle A, d, C\rangle$ be a classification theory and $R$ a classifier on $T$. For any $x \in C$, the following holds: $\mathit{Inst}_T^R(x) \neq \emptyset$ and $\mathit{Inst}_T^R(x) \neq \mathit{Inst}_T$.

Let us now analyse the relation of a literal with a class. It may be irrelevant to the class, i.e., it has no impact on the class; or relevant to the class, in which case its absence may prevent the class from being assigned to an instance; or core to the class, i.e., its absence automatically discards the class.

**Notation ($\mathit{Dif}_T(\cdot)$).** Let $T = \langle A, d, C\rangle$ be a classification theory, $I \in \mathit{Inst}_T$, and $a \in A$. We denote by $\mathit{Dif}_T(I, a)$ the set of all instances on $T$ that differ from $I$ only with regard to $a$, i.e., $\mathit{Dif}_T(I, a)$ is the set of every $J \in \mathit{Inst}_T \setminus \{I\}$ such that, $\forall b \in A \setminus \{a\}$, $\forall v \in d(b)$, if $\langle b, v\rangle \in I$, then $\langle b, v\rangle \in J$.

A literal $\langle a, v\rangle$ is relevant to a class $x$ under a theory $T = \langle A, d, C\rangle$ and a classifier $R$ iff there exists another value $v' \in d(a)$ which leads to another class than $x$. It is core to the class if the class is not proposed by $R$ when the literal is absent.

**Definition 5 (Relevance/Coreness).** Let $T = \langle A, d, C\rangle$ be a classification theory, $R$ a classifier on $T$, $x \in C$, and $l = \langle a, v\rangle \in \mathit{Lit}_T$.
- $l$ is *relevant* to $x$ in $T$ and $R$ iff $\exists I \in \mathit{Inst}_T^R(x)$ with $l \in I$ such that $\exists I' \in \mathit{Dif}_T(I, a)$, $I' \notin \mathit{Inst}_T^R(x)$.
- $l$ is *core* to $x$ in $T$ and $R$ iff $\forall I \in \mathit{Inst}_T^R(x)$, $l \in I$.

Note that relevant literals exist since $\mathit{Inst}_T$ contains all the possible instances that can be built from a theory, i.e., all instances are assumed to be reasonable cases.

Let us illustrate the above notions with a classical example borrowed from [Darwiche and Hirth, 2020].

**Example 1.** Consider the task of college admission. There are four binary attributes: Entrance exam (E), First time entrance (F), Work experience (W) and GPA. The decision is binary: a candidate is either admitted or denied. Consider a binary classifier represented by the following rules:
- If E = 1 and F = 0, then Admit
- If E = 1, F = 1 and W = 1, then Admit
- If E = 1, F = 1, W = 0 and GPA = 1, then Admit
- If E = 1, F = 1, W = 0 and GPA = 0, then Deny
- If E = 0, then Deny

Note that $\langle E, 1\rangle$ is core to the class Admit while $\langle GPA, 1\rangle$ is only relevant to Admit. However, there is no core literal for the class Deny.

Obviously, if a literal is core to a class, then it is also relevant to that class. The converse does not hold.

**Proposition 1.** Let $T = \langle A, d, C\rangle$ be a classification theory and $R$ a classifier on $T$. For any $x \in C$ and any $l \in \mathit{Lit}_T$, if $l$ is core to $x$, then $l$ is relevant to $x$.
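The following Python sketch (ours, not from the paper) encodes Example 1 and tests relevance and coreness by brute force; the formalisation of Definition 5 used here is our reconstruction, so treat it as illustrative only.

```python
from itertools import product

# Example 1 encoded by hand, with brute-force checks of Definition 5 (as reconstructed).
ATTRS = ["E", "F", "W", "GPA"]
Inst_T = [frozenset(zip(ATTRS, vals)) for vals in product([0, 1], repeat=4)]
Lit_T = {(a, v) for a in ATTRS for v in (0, 1)}

def R(I):                                   # the rule-based classifier of Example 1
    v = dict(I)
    if v["E"] == 0:
        return "Deny"
    return "Admit" if (v["F"] == 0 or v["W"] == 1 or v["GPA"] == 1) else "Deny"

def inst_of(x):                             # Inst_T^R(x)
    return [I for I in Inst_T if R(I) == x]

def dif(I, a):                              # Dif_T(I, a): instances differing from I only on a
    return [J for J in Inst_T if J != I and {l for l in I if l[0] != a} <= J]

def relevant(l, x):                         # l occurs in an x-instance whose class flips
    a, _ = l                                # when only attribute a is changed
    return any(l in I and any(R(J) != x for J in dif(I, a)) for I in inst_of(x))

def core(l, x):                             # l belongs to every instance of x
    return all(l in I for I in inst_of(x))

print(core(("E", 1), "Admit"))                                    # True
print(relevant(("GPA", 1), "Admit"), core(("GPA", 1), "Admit"))   # True False
print(any(core(l, "Deny") for l in Lit_T))                        # False: no core literal for Deny
```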
## 3 Explanation Functions and Axioms

Explaining a classifier amounts either to describing its global behaviour, namely how it assigns classes independently of instances, or to locally justifying its prediction for an instance. However, the latter is generally based on the former. Indeed, an explanation of an instance describes why the classifier assigned the class of the instance. Hence, in this paper we focus on explaining classes. An explanation answers the question: why is a class $x$ assigned by $R$?

There are different categories of explanations, as reviewed in [Schneider and Handali, 2019]. In this paper, however, we focus exclusively on explanations that are made of literals, since they are easy to interpret by humans. Indeed, research in cognitive science revealed that, in practice, humans expect an explanation to point out the key factors that caused the given output. Furthermore, most existing explanation functions (rule-based explanations, prime implicants, examples) are based on literals. Other categories (e.g., conversation-based) are beyond the scope of this paper. Note that there may be several reasons for assigning a class.

**Definition 6.** A *class question* is a tuple $Q = \langle T, R, x\rangle$ such that $T = \langle A, d, C\rangle$ is a classification theory, $R$ is a classifier on $T$, and $x$ is an element of $C$.

Formally, an explanation for a class is a set of subsets of literals. Every subset of literals, which may be the empty set, is one reason behind predicting the class. Hence, what we call a class explanation is the complete set of reasons.

**Definition 7.** Let $T = \langle A, d, C\rangle$ be a classification theory. A *class explanation* on $T$ is a set of subsets of $\mathit{Lit}_T$. Every such subset is called a *reason*.

A class explainer, or explanation function, is a function which assigns to every class question a class explanation.

**Definition 8.** A *class explainer* is a function $F$ mapping every class question $Q = \langle T, R, x\rangle$ into a class explanation on $T$.

We provide below some formal properties that a reasonable class explainer could satisfy. Such properties are important for assessing the quality of an explanation function and for comparing pairs of functions.

The first property states that an explainer should always provide explanations. It is important to provide explanations for humans (e.g., a customer for whom a loan has been refused).

**Axiom 1 (Success).** A class explainer $F$ satisfies *Success* iff for any class question $Q$, $F(Q) \neq \emptyset$.

The second property states that an explainer should provide informative explanations, and thus empty reasons are not recommended.

**Axiom 2 (Explainability).** A class explainer $F$ satisfies *Explainability* iff for any class question $Q$, $\forall L \in F(Q)$, $L \neq \emptyset$.

The next property states that reasons in an explanation should not contain unnecessary information.

**Axiom 3 (Irreducibility).** A class explainer $F$ satisfies *Irreducibility* iff for any class question $Q = \langle T, R, x\rangle$, $\forall L \in F(Q)$, $\forall l \in L$, $\exists I \in \mathit{Inst}_T \setminus \mathit{Inst}_T^R(x)$ such that $L \setminus \{l\} \subseteq I$.

The next property states that every reason is a subset of at least one instance of the class. This ensures the feasibility of reasons. Recall that the latter represent causes; when they occur, the classes they explain are suggested for instances.

**Axiom 4 (Feasibility).** A class explainer $F$ satisfies *Feasibility* iff for every class question $Q = \langle T, R, x\rangle$, $\forall L \in F(Q)$, $\exists I \in \mathit{Inst}_T^R(x)$ such that $L \subseteq I$.

Class explanations are the basis for explaining individual instances. Indeed, explaining an instance amounts to justifying its class. The next axiom states that class explanations should be sufficient not only for explaining instances but also for reproducing the predictions of the classifier. The latter property makes it possible to use explanations on unseen data.

**Axiom 5 (Representativity).** A class explainer $F$ satisfies *Representativity* iff for every class question $Q = \langle T, R, x\rangle$, $\forall I \in \mathit{Inst}_T^R(x)$, $\exists L \in F(Q)$ such that $L \subseteq I$.
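To make Axioms 1, 2, 4 and 5 concrete, here is a small brute-force checker (ours, not from the paper), applied to a candidate explanation of Admit in Example 1, namely the preconditions of the three Admit rules.

```python
from itertools import product

# Brute-force tests of Axioms 1, 2, 4 and 5 on Example 1 (helper names are ours).
ATTRS = ["E", "F", "W", "GPA"]
Inst_T = [frozenset(zip(ATTRS, vals)) for vals in product([0, 1], repeat=4)]

def R(I):
    v = dict(I)
    if v["E"] == 0:
        return "Deny"
    return "Admit" if (v["F"] == 0 or v["W"] == 1 or v["GPA"] == 1) else "Deny"

inst_x = [I for I in Inst_T if R(I) == "Admit"]     # Inst_T^R(Admit)

def success(E):            # Axiom 1: at least one reason
    return len(E) > 0

def explainability(E):     # Axiom 2: no empty reason
    return all(len(L) > 0 for L in E)

def feasibility(E):        # Axiom 4: every reason is included in some instance of the class
    return all(any(L <= I for I in inst_x) for L in E)

def representativity(E):   # Axiom 5: every instance of the class includes some reason
    return all(any(L <= I for L in E) for I in inst_x)

# Candidate explanation of Admit: the preconditions of the three Admit rules.
E = [frozenset({("E", 1), ("F", 0)}),
     frozenset({("E", 1), ("F", 1), ("W", 1)}),
     frozenset({("E", 1), ("F", 1), ("W", 0), ("GPA", 1)})]
print(success(E), explainability(E), feasibility(E), representativity(E))   # True True True True
```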
The following property states that an explanation should only contain information that impacts a prediction.

**Axiom 6 (Relevance).** A class explainer $F$ satisfies *Relevance* iff for every class question $Q = \langle T, R, x\rangle$, $\forall L \in F(Q)$, $\forall l \in L$, $l$ is relevant to $x$.

We saw previously that some literals can be more than relevant to a class: they are core, as their absence in an instance prevents the class from being assigned by the classifier. The next axiom is more demanding than the previous one, and requires that an explanation contains only core literals.

**Axiom 7 (Coreness).** A class explainer $F$ satisfies *Coreness* iff for every class question $Q = \langle T, R, x\rangle$, $\forall L \in F(Q)$, $\forall l \in L$, $l$ is core to $x$.

The next property ensures that information that is not part of the reasons of a class is irrelevant to the class. This ensures the exhaustivity of the explanation provided for the class.

**Axiom 8 (Exhaustivity).** A class explainer $F$ satisfies *Exhaustivity* iff for every class question $Q = \langle T, R, x\rangle$, $\forall l \in \mathit{Lit}_T$, if $l$ is relevant to $x$, then $\exists L \in F(Q)$ such that $l \in L$.

The following property ensures that every literal that is core to a class appears in the explanation of that class.

**Axiom 9 (Completeness).** A class explainer $F$ satisfies *Completeness* iff for every class question $Q = \langle T, R, x\rangle$, $\forall l \in \mathit{Lit}_T$, if $l$ is core to $x$, then $\exists L \in F(Q)$ such that $l \in L$.

The previous axioms describe properties of a single class explanation. The last axiom is about the set of all such explanations that can be generated from a theory. It ensures their compatibility, thus avoiding erroneous explanations. The axiom states that the union of two reasons supporting different classes should be inconsistent. To illustrate the idea, consider an explainer that provides respectively $L = \{\langle a, v\rangle\}$ and $L' = \{\langle b, v'\rangle\}$ for the classes $x$ and $y$. If $L \cup L'$ is consistent, then there exists an instance $I$ that contains $L \cup L'$; the two explanations would then support contradictory predictions for $I$.

**Axiom 10 (Coherence).** A class explainer $F$ satisfies *Coherence* iff for any two class questions $Q = \langle T, R, x\rangle$ and $Q' = \langle T', R', x'\rangle$ such that $T = T'$, $R = R'$, and $x \neq x'$, $\forall L \in F(Q)$, $\forall L' \in F(Q')$, $L \cup L'$ is inconsistent.

Feasibility guarantees the consistency of every reason.

**Property 3.** Let $F$ be a class explainer that satisfies Feasibility and $Q$ a class question. For any $L \in F(Q)$, $L$ is consistent.

From a couple of axioms, it follows that a reason causes the class it explains. Indeed, its appearance in any instance leads the classifier to assign that class to it.

**Proposition 2.** Let $F$ be a class explainer that satisfies Feasibility, Representativity and Coherence, and $Q = \langle T, R, x\rangle$ a class question. The following holds: $\forall L \in F(Q)$, $\forall I \in \mathit{Inst}_T$ such that $L \subseteq I$, $I \in \mathit{Inst}_T^R(x)$.

Exhaustivity and Relevance ensure that the literals used in the explanation of a class are exactly all those that are relevant to the class. Likewise, Completeness and Coreness ensure that explanations are based on all and only core literals.

**Theorem 1.** Let $F$ be a class explainer and $Q = \langle T, R, x\rangle$ a class question. The following two points hold:
- $F$ satisfies Exhaustivity and Relevance iff $\bigcup_{L \in F(Q)} L = \{l \in \mathit{Lit}_T : l \text{ is relevant to } x\}$;
- $F$ satisfies Completeness and Coreness iff $\bigcup_{L \in F(Q)} L = \{l \in \mathit{Lit}_T : l \text{ is core to } x\}$.
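As a concrete check of Axiom 10 (Coherence) on Example 1, the short sketch below (ours, not from the paper) verifies that every reason built from an Admit rule precondition is jointly inconsistent with every reason built from a Deny rule precondition.

```python
# Brute-force check of Axiom 10 (Coherence) on Example 1: the rule preconditions of
# Admit and those of Deny are pairwise jointly inconsistent (helper names are ours).
def consistent(L):
    return len({a for a, _ in L}) == len(L)

admit_reasons = [frozenset({("E", 1), ("F", 0)}),
                 frozenset({("E", 1), ("F", 1), ("W", 1)}),
                 frozenset({("E", 1), ("F", 1), ("W", 0), ("GPA", 1)})]
deny_reasons  = [frozenset({("E", 1), ("F", 1), ("W", 0), ("GPA", 0)}),
                 frozenset({("E", 0)})]

coherent = all(not consistent(L | M) for L in admit_reasons for M in deny_reasons)
print(coherent)   # True: no instance can contain a reason of Admit and a reason of Deny
```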
The above axioms are not all independent: some of them follow from others. We consider them all in the paper since they allow us to discriminate between explainers. Some explainers may satisfy only an implied axiom, while others may satisfy the axiom that does not follow from any other.

**Proposition 3.** Let $F$ be a class explainer.
- If $F$ satisfies Representativity, then $F$ satisfies Success.
- If $F$ satisfies Coreness, then $F$ satisfies Relevance.
- If $F$ satisfies Exhaustivity, then $F$ satisfies Completeness.
- If $F$ satisfies Feasibility, Coherence and Representativity, then $F$ satisfies Explainability and Exhaustivity.

Most of the axioms are compatible, i.e., there exists at least one explanation function that satisfies them all together (obviously for any classifier and any theory). It is no surprise that Coreness and Exhaustivity are incompatible, since they express diverging strategies that may be followed by explainers. Finally, since core literals may not exist, the three axioms Success, Explainability and Coreness are incompatible.

**Proposition 4.** The following holds:
- Success, Explainability, Irreducibility, Feasibility, Representativity, Relevance, Exhaustivity, Completeness, and Coherence are compatible;
- Success, Irreducibility, Feasibility, Representativity, Relevance, Coreness, and Completeness are compatible;
- Explainability, Irreducibility, Feasibility, Relevance, Coreness, and Completeness are compatible;
- Coreness and Exhaustivity are incompatible;
- Success, Explainability and Coreness are incompatible.

## 4 Explainers Based on Abductive Reasoning

One of the most studied explainers is based on abductive reasoning. It looks for sets of literals that are sufficient for assigning a class to a given instance. It thus explains instances instead of classes. Its explanations are called minimal sufficient subsets in [Camburu et al., 2020], prime implicants in [Shih et al., 2018; Darwiche and Hirth, 2020], or abductive explanations in [Ignatiev, 2020]. In [Amgoud, 2021a], abductive reasoning is used for explaining classes. The idea is to highlight the factors that caused a class. In that spirit, we investigate a family of class explainers based on abductive reasoning, which we call the *sufficiency explainers*. Such explainers generate explanations under complete information (i.e., the whole set of instances is available, which is reasonable for explaining some quite simple classifiers like decision trees) and adopt the following abductive principle: if a class $x$ is assigned whenever a literal $l$ is observed, then we extrapolate that $l$ is a reason for $x$.

Let us formally define the sufficiency explainers. As a preliminary, we need a notation for the set of all those subsets of literals that are sufficient to force a certain class.

**Definition 9.** Let $Q = \langle T, R, x\rangle$ be a class question. We denote by $\mathit{Suff}_Q$ the set of every $L \subseteq \mathit{Lit}_T$ such that:
- $L$ is consistent;
- $\forall I \in \mathit{Inst}_T$, if $L \subseteq I$, then $I \in \mathit{Inst}_T^R(x)$.

We are ready to define our family of explainers based on complete information and abductive reasoning.

**Definition 10 (Sufficiency).** A *sufficiency class explainer* is a class explainer $F$ such that, for every class question $Q = \langle T, R, x\rangle$:
- $F(Q) \subseteq \mathit{Suff}_Q$;
- $\forall I \in \mathit{Inst}_T^R(x)$, $\exists L \in F(Q)$ such that $L \subseteq I$.

Next, we characterize the sufficiency explainers with three axioms, namely Feasibility, Representativity, and Coherence. As a preliminary, we first show that every class explainer satisfying the three aforementioned axioms returns explanations which are subsets of those generated by $\mathit{Suff}_Q$.

**Theorem 2.** If a class explainer $F$ satisfies Feasibility, Representativity and Coherence, then, for any class question $Q = \langle T, R, x\rangle$, the inclusion $F(Q) \subseteq \mathit{Suff}_Q$ holds.

We are ready for the characterization.

**Theorem 3.** A class explainer $F$ satisfies Feasibility, Representativity and Coherence iff $F$ is a sufficiency class explainer.
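The set $\mathit{Suff}_Q$ of Definition 9 (and hence the explainer aAbd introduced next) can be computed by brute force; the sketch below (ours) does so for the class Admit, assuming the same hand-coded encoding of Example 1 as in the earlier sketches.

```python
from itertools import product, combinations

# Brute-force computation of Suff_Q (Definition 9) for Q = <T, R, Admit> in Example 1.
ATTRS = ["E", "F", "W", "GPA"]
Inst_T = [frozenset(zip(ATTRS, vals)) for vals in product([0, 1], repeat=4)]
Lit_T = [(a, v) for a in ATTRS for v in (0, 1)]

def R(I):
    v = dict(I)
    if v["E"] == 0:
        return "Deny"
    return "Admit" if (v["F"] == 0 or v["W"] == 1 or v["GPA"] == 1) else "Deny"

def consistent(L):
    return len({a for a, _ in L}) == len(L)

def suff(x):
    """All consistent L such that every instance containing L is classified as x."""
    return [L for k in range(len(Lit_T) + 1)
              for L in map(frozenset, combinations(Lit_T, k))
              if consistent(L) and all(R(I) == x for I in Inst_T if L <= I)]

reasons = suff("Admit")                                             # aAbd(Q) = Suff_Q
print(len(reasons))                                                 # 19 sufficient reasons
print(frozenset({("E", 1), ("F", 0)}) in reasons)                   # True
print(frozenset({("E", 1), ("F", 0), ("GPA", 1)}) in reasons)       # True
```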
It is worth mentioning that a sufficiency class explainer violates Relevance, Coreness and Irreducibility (see Table 1). In what follows, we provide two specific explainers of this family. The first one, called the *all-abductive explainer* (aAbd), returns all sufficient reasons for a class.

| Axiom | Sufficiency | aAbd | mAbd | aCtf | mCtf | xCtf | f-rAbd |
|---|---|---|---|---|---|---|---|
| Success | ✓ | ✓ | ✓ | ✓ | | ✓ | ✓ |
| Explainability | ✓ | ✓ | ✓ | | ✓ | | ✓ |
| Irreducibility | | | ✓ | ✓ | ✓ | ✓ | ✓ |
| Feasibility | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Representativity | ✓ | ✓ | ✓ | ✓ | | ✓ | |
| Relevance | | | ✓ | ✓ | ✓ | ✓ | |
| Coreness | | | | ✓ | ✓ | ✓ | |
| Exhaustivity | ✓ | ✓ | ✓ | | | | |
| Completeness | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| Coherence | ✓ | ✓ | ✓ | | | | |

Table 1: The symbol ✓ stands for "the axiom is satisfied by the explainer".

**Definition 11 (aAbd).** We denote by aAbd the class explainer transforming every class question $Q$ into $\mathit{Suff}_Q$.

**Example 1 (Cont.)** Examples of reasons for Admit are $\{\langle E,1\rangle, \langle F,0\rangle\}$ and $\{\langle E,1\rangle, \langle F,0\rangle, \langle GPA,1\rangle\}$.

The following result shows that the class explainer aAbd satisfies most of the axioms.

**Theorem 4.** The following properties hold:
- aAbd satisfies Success, Explainability, Feasibility, Representativity, Exhaustivity, Completeness and Coherence;
- aAbd violates Irreducibility, Relevance and Coreness.

We turn to a second specific sufficiency explainer, called the *min-abductive explainer* (mAbd), which returns the minimal sufficient reasons for a class.

**Definition 12 (mAbd).** The min-abductive class explainer (mAbd) is the class explainer transforming every class question $Q = \langle T, R, x\rangle$ into the set of every $L \subseteq \mathit{Lit}_T$ such that:
- $L$ is consistent;
- $\forall I \in \mathit{Inst}_T$ such that $L \subseteq I$, $I \in \mathit{Inst}_T^R(x)$;
- there is no $L' \subsetneq L$ satisfying the point above.

**Example 1 (Cont.)** The class Admit has three reasons, which correspond to the preconditions of the three Admit rules; the same holds for Deny.

The explainer mAbd refines aAbd by keeping only the minimal (for set inclusion) explanations.

**Proposition 5.** For any class question $Q = \langle T, R, x\rangle$, $\mathit{mAbd}(Q) = \{L \in \mathit{aAbd}(Q) : \forall L' \subsetneq L,\ L' \notin \mathit{aAbd}(Q)\}$.

The min-abductive explainer satisfies all our axioms except Coreness. Due to the minimality condition, mAbd ensures that every literal in an explanation is relevant to the explained class. Furthermore, it keeps only the minimal subsets of literals that are sufficient for causing a class.

**Theorem 5.** mAbd satisfies Success, Explainability, Irreducibility, Feasibility, Representativity, Relevance, Exhaustivity, Completeness, and Coherence, but violates Coreness.

We now present a representation theorem which characterizes the abductive explainer mAbd. We show that mAbd is the only explainer satisfying all axioms except Coreness (recall that some axioms imply others).

**Theorem 6.** A class explainer $F$ satisfies Irreducibility, Feasibility, Representativity, and Coherence iff $F = \mathit{mAbd}$.
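Under the same assumptions as the earlier sketches, mAbd can be obtained on Example 1 by filtering $\mathit{Suff}_Q$ for subset-minimality, in line with Proposition 5 (code and names are ours).

```python
from itertools import product, combinations

# Minimal sufficient reasons (mAbd, Definition 12 / Proposition 5) for Example 1.
ATTRS = ["E", "F", "W", "GPA"]
Inst_T = [frozenset(zip(ATTRS, vals)) for vals in product([0, 1], repeat=4)]
Lit_T = [(a, v) for a in ATTRS for v in (0, 1)]

def R(I):
    v = dict(I)
    if v["E"] == 0:
        return "Deny"
    return "Admit" if (v["F"] == 0 or v["W"] == 1 or v["GPA"] == 1) else "Deny"

def consistent(L):
    return len({a for a, _ in L}) == len(L)

def suff(x):                                   # Suff_Q, as in the previous sketch
    return [L for k in range(len(Lit_T) + 1)
              for L in map(frozenset, combinations(Lit_T, k))
              if consistent(L) and all(R(I) == x for I in Inst_T if L <= I)]

def m_abd(x):                                  # keep only the subset-minimal reasons
    S = suff(x)
    return [L for L in S if not any(M < L for M in S)]

print(sorted(map(sorted, m_abd("Admit"))))
# three reasons: {E=1, F=0}, {E=1, GPA=1}, {E=1, W=1}: the Admit rule preconditions
# with redundant literals removed
print(sorted(map(sorted, m_abd("Deny"))))
# two reasons: {E=0} and {F=1, GPA=0, W=0}
```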
## 5 Explainers Based on Counterfactual Reasoning

We turn to a second family of explainers, called the *necessity explainers*. It is based on complete information and the following counterfactual principle: if a literal $l$ is observed whenever a class $x$ is assigned, then we extrapolate that $l$ is a reason for assigning $x$. Put differently, if $l$ were not observed, then $x$ would not have been assigned, hence the word counterfactual. As a preliminary to defining the necessity explainers, we need a notation for those subsets of literals that are necessary for a certain class.

**Definition 13.** Let $Q = \langle T, R, x\rangle$ be a class question. We denote by $\mathit{Nec}_Q$ the set of every $L \subseteq \mathit{Lit}_T$ such that:
- $L$ is consistent;
- $\forall I \in \mathit{Inst}_T$, if $L \nsubseteq I$, then $I \notin \mathit{Inst}_T^R(x)$.

Note that the necessary subsets of literals for a class $x$ constitute the power set of the intersection of all instances of $x$.

**Proposition 6.** Let $Q = \langle T, R, x\rangle$ be a class question. Then $\mathit{Nec}_Q = \mathit{Pow}\!\left[\bigcap \mathit{Inst}_T^R(x)\right]$.

We are ready to define our family of explainers based on complete information and counterfactual reasoning.

**Definition 14 (Necessity).** A *necessity class explainer* is a class explainer $F$ such that, for every class question $Q$, $F(Q) \subseteq \mathit{Nec}_Q$.

Let us investigate a specific member of the family, which returns all necessary subsets of literals.

**Definition 15 (aCtf).** The *all-counterfactual explainer* (aCtf) is the function transforming every class question $Q$ into $\mathit{Nec}_Q$.

**Example 1 (Cont.)** Let $Q$ be the question centered on Admit and $Q'$ the question centered on Deny. We have $\bigcap \mathit{Inst}_T^R(\text{Admit}) = \{\langle E, 1\rangle\}$. Thus, $\mathit{Nec}_Q = \mathit{Pow}(\{\langle E, 1\rangle\}) = \{\emptyset, \{\langle E, 1\rangle\}\}$, and so $\mathit{aCtf}(\langle T, R, \text{Admit}\rangle) = \{\emptyset, \{\langle E, 1\rangle\}\}$. Similarly, $\bigcap \mathit{Inst}_T^R(\text{Deny}) = \emptyset$. Thus, $\mathit{Nec}_{Q'} = \mathit{Pow}(\emptyset) = \{\emptyset\}$, and so $\mathit{aCtf}(\langle T, R, \text{Deny}\rangle) = \{\emptyset\}$.

We axiomatically analyse aCtf.

**Theorem 7.** aCtf satisfies Success, Irreducibility, Feasibility, Representativity, Relevance, Coreness and Completeness. It violates Explainability, Exhaustivity and Coherence.

We turn to a second specific explainer, which minimizes the necessary subsets.

**Definition 16 (mCtf).** The *min-counterfactual explainer* (mCtf) is the function transforming every class question $Q = \langle T, R, x\rangle$ into the set of every subset $L$ of $\mathit{Lit}_T$ such that:
- $L \neq \emptyset$;
- $L$ is consistent;
- $\forall I \in \mathit{Inst}_T$, if $L \nsubseteq I$, then $I \notin \mathit{Inst}_T^R(x)$;
- there is no $L' \subsetneq L$ satisfying the points above.

**Example 1 (Cont.)** We have $\mathit{mCtf}(\langle T, R, \text{Admit}\rangle) = \{\{\langle E, 1\rangle\}\}$ and $\mathit{mCtf}(\langle T, R, \text{Deny}\rangle) = \emptyset$.

We axiomatically analyze mCtf. Note that we lose Success and Representativity, but we gain Explainability.

**Theorem 8.** mCtf satisfies Explainability, Irreducibility, Feasibility, Relevance, Coreness and Completeness. It violates Success, Representativity, Exhaustivity and Coherence.

Finally, we introduce a third specific explainer, which maximizes the necessary subsets.

**Definition 17 (xCtf).** The *max-counterfactual explainer* (xCtf) is the function transforming every class question $Q = \langle T, R, x\rangle$ into the set of every subset $L$ of $\mathit{Lit}_T$ such that:
- $L$ is consistent;
- $\forall I \in \mathit{Inst}_T$, if $L \nsubseteq I$, then $I \notin \mathit{Inst}_T^R(x)$;
- there is no $L' \supsetneq L$ satisfying the above two points.

Notice that xCtf returns only one reason, namely the intersection of all instances of the class in question.

**Proposition 7.** Let $Q = \langle T, R, x\rangle$ be a class question. Then $\mathit{xCtf}(Q) = \left\{\bigcap \mathit{Inst}_T^R(x)\right\}$.

**Example 1 (Cont.)** We have $\mathit{xCtf}(\langle T, R, \text{Admit}\rangle) = \{\{\langle E, 1\rangle\}\}$ and $\mathit{xCtf}(\langle T, R, \text{Deny}\rangle) = \{\emptyset\}$.

We axiomatically analyze xCtf and observe that it satisfies exactly the same axioms as aCtf. So, returning all necessary subsets or only the largest one (i.e., the intersection of the instances of the class in question) leads to the same set of satisfied axioms.

**Theorem 9.** xCtf satisfies Success, Irreducibility, Feasibility, Representativity, Relevance, Coreness and Completeness. It violates Explainability, Exhaustivity and Coherence.
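Thanks to Propositions 6 and 7, the counterfactual explainers reduce to simple set operations. The sketch below (ours, reusing the Example 1 encoding from the earlier sketches) computes aCtf, mCtf and xCtf from the intersection of the instances of a class.

```python
from itertools import product, combinations

# Counterfactual explainers of Section 5 on Example 1, via Propositions 6 and 7:
# Nec_Q is the power set of the intersection of the instances of the class.
ATTRS = ["E", "F", "W", "GPA"]
Inst_T = [frozenset(zip(ATTRS, vals)) for vals in product([0, 1], repeat=4)]

def R(I):
    v = dict(I)
    if v["E"] == 0:
        return "Deny"
    return "Admit" if (v["F"] == 0 or v["W"] == 1 or v["GPA"] == 1) else "Deny"

def common(x):                               # intersection of all instances of class x
    return frozenset.intersection(*[I for I in Inst_T if R(I) == x])

def a_ctf(x):                                # Nec_Q = Pow(common(x))
    c = sorted(common(x))
    return [frozenset(L) for k in range(len(c) + 1) for L in combinations(c, k)]

def m_ctf(x):                                # non-empty minimal necessary sets: the singletons
    return [frozenset({l}) for l in common(x)]

def x_ctf(x):                                # the single maximal necessary set
    return [common(x)]

print(a_ctf("Admit"))                        # [frozenset(), frozenset({('E', 1)})]
print(m_ctf("Admit"), m_ctf("Deny"))         # [frozenset({('E', 1)})] []
print(x_ctf("Deny"))                         # [frozenset()]
```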
## 6 Explaining Under Incomplete Information

In this section, we investigate explanations under incomplete information (i.e., not all instances are available, which is typically the case with the dataset a classifier has been trained on, or the dataset generated by existing explainers like Anchors and LIME). Working with incomplete information makes sense, in particular, for complex classifiers which cannot reasonably be queried on all instances. Note that our abductive and counterfactual explainers (defined in the previous sections) work with the whole set of instances. However, in practice only a subset of instances (a dataset) is available. The question is: do our previous results still hold if reasons are generated from a proper subset of $\mathit{Inst}_T$? The answer is unfortunately negative.

We define a parameterized family of explainers that provide minimally sufficient reasons from a dataset. The parameter is a function which selects the dataset to be used. Such a definition abstracts Anchors and LIME since they both use datasets, generated in different ways.

**Definition 18 (Fragment).** Let $T = \langle A, d, C\rangle$ be a classification theory, $R$ a classifier on $T$, and $S \subseteq \mathit{Inst}_T$. We say that $S$ is a *fragment* in $T$ and $R$ iff, for every $x \in C$, we have $\mathit{Inst}_T^R(x) \cap S \neq \emptyset$.

**Definition 19.** A *fragment selector* is a function $f$ transforming every pair $(T, R)$, where $T$ is a classification theory and $R$ a classifier on $T$, into a fragment in $T$ and $R$.

We are now ready to introduce the novel family.

**Definition 20 (f-rAbd).** Let $f$ be a fragment selector. The *f-relaxed abductive explainer* (f-rAbd) is the function transforming every class question $Q = \langle T, R, x\rangle$ into the set of every subset $L$ of $\mathit{Lit}_T$ such that:
- $\exists I \in f(T, R)$ such that $L \subseteq I$;
- $\forall I \in f(T, R)$ such that $L \subseteq I$, $I \in \mathit{Inst}_T^R(x)$;
- there is no $L' \subsetneq L$ satisfying the points above.

**Property 4.** Let $f$ be a fragment selector and $Q = \langle T, R, x\rangle$ a class question. For any $L \in \textit{f-rAbd}(Q)$, $L$ is consistent.

We show that, for every fragment selector $f$, f-rAbd satisfies Success, Explainability, Feasibility and Irreducibility, and violates the remaining axioms. This is not surprising since it generates explanations from a subset of instances.

**Theorem 10.** Let $f$ be a fragment selector. f-rAbd satisfies Success, Explainability, Feasibility and Irreducibility. It violates Coreness, Relevance, Completeness, Exhaustivity, Representativity and Coherence.

The following result shows that f-rAbd satisfies a weak version of Representativity: every instance of the set $f(T, R)$ is a superset of at least one reason of its class.

**Proposition 8.** Let $f$ be a fragment selector. f-rAbd satisfies *Weak Representativity*, i.e., for every class question $Q = \langle T, R, x\rangle$ and every $I \in f(T, R) \cap \mathit{Inst}_T^R(x)$, there exists $L \in \textit{f-rAbd}(Q)$ such that $L \subseteq I$.

Existing heuristic explanation functions like Anchors and LIME violate Coherence, leading to incorrect outcomes in some cases. Recall that both Anchors and LIME are not class explainers; they are instance explainers, i.e., they provide reasons for assigning $R(I)$ to an instance $I$.
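To see why relaxing to a fragment loses the correctness guarantees, here is a small sketch (ours) of f-rAbd on Example 1 with a hand-picked, hypothetical two-instance fragment; it returns $\{\langle F, 1\rangle\}$ as a reason for Admit even though that set is not sufficient over the whole feature space.

```python
from itertools import product, combinations

# f-rAbd (Definition 20) on Example 1 with a hypothetical two-instance fragment,
# showing how reasons computed from a dataset can be wrong on the full space.
ATTRS = ["E", "F", "W", "GPA"]
Inst_T = [frozenset(zip(ATTRS, vals)) for vals in product([0, 1], repeat=4)]
Lit_T = [(a, v) for a in ATTRS for v in (0, 1)]

def R(I):
    v = dict(I)
    if v["E"] == 0:
        return "Deny"
    return "Admit" if (v["F"] == 0 or v["W"] == 1 or v["GPA"] == 1) else "Deny"

# A fragment with one instance per class (Definition 18); the selector is fixed by hand.
S = [frozenset({("E", 1), ("F", 1), ("W", 1), ("GPA", 1)}),   # classified Admit
     frozenset({("E", 0), ("F", 0), ("W", 0), ("GPA", 0)})]   # classified Deny

def fr_abd(x):
    cand = []
    for k in range(len(Lit_T) + 1):
        for comb in combinations(Lit_T, k):
            L = frozenset(comb)
            if any(L <= I for I in S) and all(R(I) == x for I in S if L <= I):
                cand.append(L)                                 # sufficient on the fragment
    return [L for L in cand if not any(M < L for M in cand)]   # keep the minimal ones

print(sorted(map(sorted, fr_abd("Admit"))))   # the four singletons {E=1}, {F=1}, {GPA=1}, {W=1}

# <F,1> is reported as a reason for Admit, yet it is not sufficient on the whole space:
bad = frozenset({("F", 1)})
print(all(R(I) == "Admit" for I in Inst_T if bad <= I))        # False: incorrect explanation
```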
## 7 Related Work

There have not been many axiomatic approaches to explainability. Most existing works propose instances of explainers and analyse them either experimentally (e.g., [Ignatiev et al., 2019]) or formally (e.g., [Darwiche and Hirth, 2020]). None of these works has discussed axioms. In [Wolf et al., 2019], some axioms have been proposed for instance explainers; our axioms concern class explainers.

Contrastive explanations are widely studied. They describe what should be modified in order to avoid a class. It has been shown in [Amgoud, 2021a] that they are dual to the reasons generated by mAbd; hence, they represent the same concept. That is why, in this paper, we investigated only one of them.

## 8 Conclusion

This paper studied the foundations of explainers that justify classes. It provided key axioms that an explainer should satisfy and characterised various explainers that satisfy them. It highlighted the key axioms that separate sufficient reasons from necessary ones (i.e., counterfactuals). Another important result of the paper concerns the family of explainers that generate reasons from a subset of instances: we showed that they violate several axioms, including Coherence, which leads to erroneous explanations. As future work, we plan to extend our axioms to deal with other types of explanations, like conversational ones.

## Acknowledgments

Support from the ANR-3IA Artificial and Natural Intelligence Toulouse Institute (ANITI) is gratefully acknowledged.

## References

- [Amgoud, 2021a] Leila Amgoud. Explaining black-box classification models with arguments. In 33rd IEEE International Conference on Tools with Artificial Intelligence (ICTAI), pages 791–795, 2021.
- [Amgoud, 2021b] Leila Amgoud. Non-monotonic explanation functions. In Proceedings of the 16th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU), volume 12897 of Lecture Notes in Computer Science, pages 19–31, 2021.
- [Audemard et al., 2020] Gilles Audemard, Frédéric Koriche, and Pierre Marquis. On tractable XAI queries based on compiled representations. In Proceedings of the 17th International Conference on Principles of Knowledge Representation and Reasoning (KR), pages 838–849, 2020.
- [Biran and Cotton, 2017] Or Biran and Courtenay Cotton. Explanation and justification in machine learning: A survey. In IJCAI Workshop on Explainable Artificial Intelligence (XAI), pages 1–6, 2017.
- [Camburu et al., 2020] Oana-Maria Camburu, Eleonora Giunchiglia, Jakob N. Foerster, Thomas Lukasiewicz, and Phil Blunsom. The struggles of feature-based explanations: Shapley values vs. minimal sufficient subsets. arXiv:2009.11023, 2020.
- [Cyras et al., 2021] Kristijonas Cyras, Antonio Rago, Emanuele Albini, Pietro Baroni, and Francesca Toni. Argumentative XAI: A survey. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI), pages 4392–4399, 2021.
- [Darwiche and Hirth, 2020] Adnan Darwiche and Auguste Hirth. On the reasons behind decisions. In 24th European Conference on Artificial Intelligence (ECAI), volume 325, pages 712–720. IOS Press, 2020.
- [Dhurandhar et al., 2018] Amit Dhurandhar, Pin-Yu Chen, Ronny Luss, Chun-Chen Tu, Pai-Shun Ting, Karthikeyan Shanmugam, and Payel Das. Explanations based on the missing: Towards contrastive explanations with pertinent negatives. In Annual Conference on Neural Information Processing Systems (NeurIPS), pages 590–601, 2018.
- [Guidotti et al., 2019] Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. A survey of methods for explaining black box models. ACM Computing Surveys, 51(5):93:1–93:42, 2019.
- [Ignatiev et al., 2019] Alexey Ignatiev, Nina Narodytska, and João Marques-Silva. On relating explanations and adversarial examples. In NeurIPS, pages 15857–15867, 2019.
- [Ignatiev et al., 2020] Alexey Ignatiev, Nina Narodytska, Nicholas Asher, and João Marques-Silva. From contrastive to abductive explanations and back again. In XIXth International Conference of the Italian Association for Artificial Intelligence, volume 12414 of Lecture Notes in Computer Science, pages 335–355, 2020.
- [Ignatiev, 2020] Alexey Ignatiev. Towards trustable explainable AI. In Christian Bessiere, editor, Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI), pages 5154–5158, 2020.
- [Miller, 2019] Tim Miller. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267:1–38, 2019.
- [Narodytska et al., 2019] Nina Narodytska, Aditya A. Shrotri, Kuldeep S. Meel, Alexey Ignatiev, and João Marques-Silva. Assessing heuristic machine learning explanations with model counting. In Proceedings of the 22nd International Conference on Theory and Applications of Satisfiability Testing (SAT), pages 267–278, 2019.
- [Ribeiro et al., 2016] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016.
- [Ribeiro et al., 2018] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. Anchors: High-precision model-agnostic explanations. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), pages 1527–1535, 2018.
- [Schneider and Handali, 2019] Johannes Schneider and Joshua Peter Handali. Personalized explanation for machine learning: A conceptualization. In 27th European Conference on Information Systems (ECIS), 2019.
- [Shih et al., 2018] Andy Shih, Arthur Choi, and Adnan Darwiche. A symbolic approach to explaining Bayesian network classifiers. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI), pages 5103–5111, 2018.
- [Wolf et al., 2019] Lior Wolf, Tomer Galanti, and Tamir Hazan. A formal approach to explainability. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES), pages 255–261, 2019.