# Axiomatic Foundations of Explainability

Leila Amgoud, Jonathan Ben-Naim
CNRS, IRIT, France
{amgoud, bennaim}@irit.fr

**Abstract.** Improving trust in decisions made by classification models is becoming crucial for the acceptance of automated systems, and an important way of doing so is to provide explanations for the behaviour of the models. Different explainers have been proposed in the recent literature for that purpose; however, their formal properties are under-studied. This paper investigates theoretically explainers that provide reasons behind decisions independently of instances. Its contributions are fourfold. The first is to lay the foundations of such explainers by proposing key axioms, i.e., desirable properties they should satisfy. Two axioms are incompatible, leading to two subsets of axioms. The second contribution consists of demonstrating that the first subset of axioms characterizes a family of explainers that return sufficient reasons, while the second characterizes a family that provides necessary reasons. This sheds light on the axioms which distinguish the two types of reasons. As a third contribution, the paper introduces various explainers of both families and fully characterizes some of them. Those explainers make use of the whole feature space. The fourth contribution is a family of explainers that generate explanations from finite datasets (subsets of the feature space). This family, seen as an abstraction of Anchors and LIME, violates some axioms, including one which prevents incorrect explanations.

## 1 Introduction

Recent progress in data-driven AI has been largely due to machine learning and in particular deep learning models. However, the predictions of these models resist analysis due to their inherent non-linear behaviour and their vast number of interacting parameters. This opacity impedes the relevance of those models from a theoretical point of view, since their properties are difficult to investigate, and from a practical point of view, as many applications, such as healthcare or embedded systems, need guarantees to be deployed, and others, e.g., in the legal or financial domains, require transparency to be accepted. Explanations help human users understand why a decision was reached. Explaining the functionality of classification systems and their rationale thus becomes a vital need. This has generated a lot of effort; see [Cyras et al., 2021; Guidotti et al., 2019; Miller, 2019; Biran and Cotton, 2017] for surveys on explainers of machine learning models.

Existing explainers can be classified in two different ways. The first distinguishes explainers that provide local explanations for individual instances (e.g., [Ribeiro et al., 2016; Ribeiro et al., 2018; Dhurandhar et al., 2018; Ignatiev et al., 2019; Darwiche and Hirth, 2020]) from explainers that provide global explanations for classes independently of instances (e.g., [Ignatiev et al., 2019; Amgoud, 2021a]). The second way of classifying existing explainers is based on the information used for generating explanations. Explainers like Anchors and LIME [Ribeiro et al., 2016; Ribeiro et al., 2018; Amgoud, 2021b] use datasets, while others, like those studied in [Ignatiev et al., 2019; Ignatiev et al., 2020; Darwiche and Hirth, 2020], use the whole set of instances.

Despite the popularity of existing explainers, their formal properties are under-studied. This makes their comparison difficult.
Some explainers have been analysed against a set of metrics and have been shown to be efficient. However, some counter-intuitive results have been detected in [Narodytska et al., 2019] for Anchors and LIME. This shows that the existing metrics are not sufficient for analysing the quality of an explainer and for guiding the definition of novel ones. They are also not sufficient for an accurate comparison of explainers.

The present paper bridges this gap by investigating the theoretical foundations of explainers that provide global explanations (i.e., reasons behind assigning classes independently of instances). Foundations are important not only for a better understanding of the explanation process in general, but also for clarifying the basic assumptions underlying every explainer, and for comparing different (families of) explainers.

The paper contains four contributions. The first is to lay the foundations of explainers by proposing key axioms, i.e., desirable properties they should satisfy. Two axioms are shown to be incompatible, leading to two subsets of axioms. The second contribution consists of demonstrating that the first subset of axioms characterizes the family of explainers that are based on abductive reasoning, hence producing sufficient reasons, and the second subset characterizes the family of explainers that are based on counterfactual reasoning, i.e., returning necessary reasons. These characterisations shed light on the properties that distinguish the two types of reasons. As a third contribution, the paper introduces various explainers of both families, each of them generating explanations under complete information, i.e., using the whole feature space. It fully characterizes some of them, including the one which returns the so-called prime implicants, studied in [Ignatiev et al., 2019; Darwiche and Hirth, 2020; Audemard et al., 2020]. The fourth contribution is a family of explainers that generate reasons from finite datasets (subsets of the feature space). This family, seen as an abstraction of Anchors [Ribeiro et al., 2018] and LIME [Ribeiro et al., 2016], violates some axioms, including one which prevents incorrect explanations.

## 2 Classification

We start by introducing the initial material needed to classify, i.e., classes as well as attributes and their domains.

**Definition 1 (Theory).** A *classification theory* is a tuple $T = \langle A, d, C\rangle$ such that the following holds:
- $A$ is a non-empty finite set of attributes (or features);
- $d$ is a function on $A$ such that, for every $a \in A$, $d(a)$ is countable (discrete domains) with $|d(a)| > 1$;
- $C$ is a finite set of classes such that $|C| > 1$.

Next, we need to define the notion of literal, i.e., an assignment of a value to an attribute.

**Definition 2 (Literal).** Let $T = \langle A, d, C\rangle$ be a classification theory. A *literal* on $T$ is a pair $\langle a, v\rangle$ such that $a \in A$ and $v \in d(a)$. We denote by $\mathit{Lit}_T$ the set of all literals on $T$. A subset $L \subseteq \mathit{Lit}_T$ is *consistent* iff, for any two elements $l = \langle a, v\rangle$ and $l' = \langle a', v'\rangle$ of $L$, if $a = a'$, then $v = v'$.

We turn to the notion of instance, i.e., an assignment of values to all attributes.

**Definition 3 (Instance).** Let $T = \langle A, d, C\rangle$ be a classification theory. An *instance* on $T$ is a subset $I$ of $\mathit{Lit}_T$ such that every attribute $a \in A$ appears exactly once in $I$. We denote by $\mathit{Inst}_T$ the set of all instances on $T$.

Notice that every instance is consistent, and every proper subset of an instance is also consistent (Property 1 below).
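As an illustration of Definitions 1–3, here is a minimal Python sketch (ours, not from the paper) that enumerates the literals and instances of a small, hypothetical two-attribute theory; all names are ours.

```python
from itertools import product

# A tiny hypothetical classification theory T = <A, d, C> (Definitions 1-3).
d = {"a1": [0, 1], "a2": ["red", "blue"]}     # attributes and their domains d(a)
C = {"pos", "neg"}                            # classes, |C| > 1

# Lit_T: all literals <a, v>
Lit_T = {(a, v) for a, vals in d.items() for v in vals}

def consistent(L):
    """A set of literals is consistent iff no attribute receives two values."""
    return len({a for a, _ in L}) == len(L)

# Inst_T: subsets of Lit_T in which every attribute appears exactly once
Inst_T = [frozenset(zip(d.keys(), vals)) for vals in product(*d.values())]

assert all(consistent(I) for I in Inst_T)     # every instance is consistent
print(len(Lit_T), len(Inst_T))                # 4 literals and 2 * 2 = 4 instances
```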
**Property 1.** Let $T = \langle A, d, C\rangle$ be a classification theory and $I \in \mathit{Inst}_T$. Then:
- $I$ is consistent;
- for any $I' \subseteq I$, $I'$ is consistent.

We are ready to define the notion of classifier. It is a function which assigns a single class to every instance. Furthermore, every class is assigned to at least one instance.

**Definition 4 (Classifier).** Let $T = \langle A, d, C\rangle$ be a classification theory. A *classifier* on $T$ is a surjective function $R$ from $\mathit{Inst}_T$ to $C$.

**Notation ($\mathit{Inst}_T^R(\cdot)$).** We denote by $\mathit{Inst}_T^R(x)$ the set of all instances of a class $x$ in $T$ and $R$, i.e., $\mathit{Inst}_T^R(x) = \{I \in \mathit{Inst}_T : R(I) = x\}$.

We show that every class is assigned to at least one instance and not assigned to at least one other instance.

**Property 2.** Let $T = \langle A, d, C\rangle$ be a classification theory and $R$ a classifier on $T$. For any $x \in C$, the following holds: $\mathit{Inst}_T^R(x) \neq \emptyset$ and $\mathit{Inst}_T^R(x) \neq \mathit{Inst}_T$.

Let us now analyse the relation of a literal with a class. It may be irrelevant to the class, i.e., it has no impact on the class; or relevant to the class, in which case its absence may prevent the class from being assigned to an instance; or core to the class, i.e., its absence automatically discards the class.

**Notation ($\mathit{Dif}_T(\cdot)$).** Let $T = \langle A, d, C\rangle$ be a classification theory, $I \in \mathit{Inst}_T$, and $a \in A$. We denote by $\mathit{Dif}_T(I, a)$ the set of all instances on $T$ that differ from $I$ only with regard to $a$, i.e., $\mathit{Dif}_T(I, a)$ is the set of every $J \in \mathit{Inst}_T \setminus \{I\}$ such that, $\forall b \in A \setminus \{a\}$, $\forall v \in d(b)$, if $\langle b, v\rangle \in I$, then $\langle b, v\rangle \in J$.

A literal $\langle a, v\rangle$ is relevant to a class $x$ under a theory $T = \langle A, d, C\rangle$ and a classifier $R$ iff there exists another value $v' \in d(a)$ which leads to another class than $x$. It is core to the class if the class is not proposed by $R$ when the literal is absent.

**Definition 5 (Relevance/Coreness).** Let $T = \langle A, d, C\rangle$ be a classification theory, $R$ a classifier on $T$, $x \in C$, and $l = \langle a, v\rangle \in \mathit{Lit}_T$.
- $l$ is *relevant* to $x$ in $T$ and $R$ iff $\exists I \in \mathit{Inst}_T^R(x)$ with $l \in I$ such that $\exists I' \in \mathit{Dif}_T(I, a)$, $I' \notin \mathit{Inst}_T^R(x)$.
- $l$ is *core* to $x$ in $T$ and $R$ iff $\forall I \in \mathit{Inst}_T^R(x)$, $l \in I$.

Note that relevant literals exist since $\mathit{Inst}_T$ contains all the possible instances that can be built from a theory, i.e., all instances are assumed to be reasonable cases.

Let us illustrate the above notions with a classical example borrowed from [Darwiche and Hirth, 2020].

**Example 1.** Consider the task of college admission. There are four binary attributes: Entrance exam (E), First time entrance (F), Work experience (W) and GPA. The decision is binary: a candidate is either admitted or denied. Consider a binary classifier represented by the following rules:
- If E = 1 and F = 0, then Admit
- If E = 1, F = 1 and W = 1, then Admit
- If E = 1, F = 1, W = 0 and GPA = 1, then Admit
- If E = 1, F = 1, W = 0 and GPA = 0, then Deny
- If E = 0, then Deny

Note that $\langle E, 1\rangle$ is core to the class Admit while $\langle GPA, 1\rangle$ is only relevant to Admit. However, there is no core literal for the class Deny.

Obviously, if a literal is core to a class, then it is also relevant to that class. The converse does not hold.

**Proposition 1.** Let $T = \langle A, d, C\rangle$ be a classification theory and $R$ a classifier on $T$. For any $x \in C$ and any $l \in \mathit{Lit}_T$, if $l$ is core to $x$, then $l$ is relevant to $x$.
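The following Python sketch (ours, not from the paper) encodes Example 1 and tests relevance and coreness by brute force; the formalisation of Definition 5 used here is our reconstruction, so treat it as illustrative only.

```python
from itertools import product

# Example 1 encoded by hand, with brute-force checks of Definition 5 (as reconstructed).
ATTRS = ["E", "F", "W", "GPA"]
Inst_T = [frozenset(zip(ATTRS, vals)) for vals in product([0, 1], repeat=4)]
Lit_T = {(a, v) for a in ATTRS for v in (0, 1)}

def R(I):                                   # the rule-based classifier of Example 1
    v = dict(I)
    if v["E"] == 0:
        return "Deny"
    return "Admit" if (v["F"] == 0 or v["W"] == 1 or v["GPA"] == 1) else "Deny"

def inst_of(x):                             # Inst_T^R(x)
    return [I for I in Inst_T if R(I) == x]

def dif(I, a):                              # Dif_T(I, a): instances differing from I only on a
    return [J for J in Inst_T if J != I and {l for l in I if l[0] != a} <= J]

def relevant(l, x):                         # l occurs in an x-instance whose class flips
    a, _ = l                                # when only attribute a is changed
    return any(l in I and any(R(J) != x for J in dif(I, a)) for I in inst_of(x))

def core(l, x):                             # l belongs to every instance of x
    return all(l in I for I in inst_of(x))

print(core(("E", 1), "Admit"))                                    # True
print(relevant(("GPA", 1), "Admit"), core(("GPA", 1), "Admit"))   # True False
print(any(core(l, "Deny") for l in Lit_T))                        # False: no core literal for Deny
```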
## 3 Explanation Functions and Axioms

Explaining a classifier amounts either to describing its global behaviour, namely how it assigns classes independently of instances, or to locally justifying its prediction for an instance. However, the latter is generally based on the former. Indeed, an explanation of an instance describes why the classifier assigned the class of the instance. Hence, in this paper we focus on explaining classes. An explanation answers the question: why is a class $x$ assigned by $R$?

There are different categories of explanations, as reviewed in [Schneider and Handali, 2019]. In this paper, however, we focus exclusively on explanations that are made of literals, since they are easy to interpret by humans. Indeed, research in cognitive science revealed that, in practice, humans expect an explanation to point out the key factors that caused the given output. Furthermore, most existing explanation functions (rule-based explanations, prime implicants, examples) are based on literals. Other categories (e.g., conversation-based) are beyond the scope of this paper. Note that there may be several reasons for assigning a class.

**Definition 6.** A *class question* is a tuple $Q = \langle T, R, x\rangle$ such that $T = \langle A, d, C\rangle$ is a classification theory, $R$ is a classifier on $T$, and $x$ is an element of $C$.

Formally, an explanation for a class is a set of subsets of literals. Every subset of literals, which may be the empty set, is one reason behind predicting the class. Hence, what we call a class explanation is the complete set of reasons.

**Definition 7.** Let $T = \langle A, d, C\rangle$ be a classification theory. A *class explanation* on $T$ is a set of subsets of $\mathit{Lit}_T$. Every such subset is called a *reason*.

A class explainer, or explanation function, is a function which assigns to every class question a class explanation.

**Definition 8.** A *class explainer* is a function $F$ mapping every class question $Q = \langle T, R, x\rangle$ into a class explanation on $T$.

We provide below some formal properties that a reasonable class explainer could satisfy. Such properties are important for assessing the quality of an explanation function and for comparing pairs of functions.

The first property states that an explainer should always provide explanations. It is important to provide explanations for humans (e.g., a customer for whom a loan has been refused).

**Axiom 1 (Success).** A class explainer $F$ satisfies *Success* iff for any class question $Q$, $F(Q) \neq \emptyset$.

The second property states that an explainer should provide informative explanations, and thus empty reasons are not recommended.

**Axiom 2 (Explainability).** A class explainer $F$ satisfies *Explainability* iff for any class question $Q$, $\forall L \in F(Q)$, $L \neq \emptyset$.

The next property states that reasons in an explanation should not contain unnecessary information.

**Axiom 3 (Irreducibility).** A class explainer $F$ satisfies *Irreducibility* iff for any class question $Q = \langle T, R, x\rangle$, $\forall L \in F(Q)$, $\forall l \in L$, $\exists I \in \mathit{Inst}_T \setminus \mathit{Inst}_T^R(x)$ such that $L \setminus \{l\} \subseteq I$.

The next property states that every reason is a subset of at least one instance of the class. This ensures the feasibility of reasons. Recall that the latter represent causes; when they occur, the classes they explain are suggested for instances.

**Axiom 4 (Feasibility).** A class explainer $F$ satisfies *Feasibility* iff for every class question $Q = \langle T, R, x\rangle$, $\forall L \in F(Q)$, $\exists I \in \mathit{Inst}_T^R(x)$ such that $L \subseteq I$.

Class explanations are the basis for explaining individual instances. Indeed, explaining an instance amounts to justifying its class. The next axiom states that class explanations should be sufficient not only for explaining instances but also for reproducing the predictions of the classifier. The latter property makes it possible to use explanations on unseen data.

**Axiom 5 (Representativity).** A class explainer $F$ satisfies *Representativity* iff for every class question $Q = \langle T, R, x\rangle$, $\forall I \in \mathit{Inst}_T^R(x)$, $\exists L \in F(Q)$ such that $L \subseteq I$.
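To make Axioms 1, 2, 4 and 5 concrete, here is a small brute-force checker (ours, not from the paper), applied to a candidate explanation of Admit in Example 1, namely the preconditions of the three Admit rules.

```python
from itertools import product

# Brute-force tests of Axioms 1, 2, 4 and 5 on Example 1 (helper names are ours).
ATTRS = ["E", "F", "W", "GPA"]
Inst_T = [frozenset(zip(ATTRS, vals)) for vals in product([0, 1], repeat=4)]

def R(I):
    v = dict(I)
    if v["E"] == 0:
        return "Deny"
    return "Admit" if (v["F"] == 0 or v["W"] == 1 or v["GPA"] == 1) else "Deny"

inst_x = [I for I in Inst_T if R(I) == "Admit"]     # Inst_T^R(Admit)

def success(E):            # Axiom 1: at least one reason
    return len(E) > 0

def explainability(E):     # Axiom 2: no empty reason
    return all(len(L) > 0 for L in E)

def feasibility(E):        # Axiom 4: every reason is included in some instance of the class
    return all(any(L <= I for I in inst_x) for L in E)

def representativity(E):   # Axiom 5: every instance of the class includes some reason
    return all(any(L <= I for L in E) for I in inst_x)

# Candidate explanation of Admit: the preconditions of the three Admit rules.
E = [frozenset({("E", 1), ("F", 0)}),
     frozenset({("E", 1), ("F", 1), ("W", 1)}),
     frozenset({("E", 1), ("F", 1), ("W", 0), ("GPA", 1)})]
print(success(E), explainability(E), feasibility(E), representativity(E))   # True True True True
```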
The following property states that an explanation should only contain information that impacts a prediction.

**Axiom 6 (Relevance).** A class explainer $F$ satisfies *Relevance* iff for every class question $Q = \langle T, R, x\rangle$, $\forall L \in F(Q)$, $\forall l \in L$, $l$ is relevant to $x$.

We saw previously that some literals can be more than relevant to a class: they are core, as their absence in an instance prevents the class from being assigned by the classifier. The next axiom is more demanding than the previous one, and requires that an explanation contains only core literals.

**Axiom 7 (Coreness).** A class explainer $F$ satisfies *Coreness* iff for every class question $Q = \langle T, R, x\rangle$, $\forall L \in F(Q)$, $\forall l \in L$, $l$ is core to $x$.

The next property ensures that information that is not part of the reasons of a class is irrelevant to the class. This ensures the exhaustivity of the explanation provided for the class.

**Axiom 8 (Exhaustivity).** A class explainer $F$ satisfies *Exhaustivity* iff for every class question $Q = \langle T, R, x\rangle$, $\forall l \in \mathit{Lit}_T$, if $l$ is relevant to $x$, then $\exists L \in F(Q)$ such that $l \in L$.

The following property ensures that every literal that is core to a class appears in the explanation of that class.

**Axiom 9 (Completeness).** A class explainer $F$ satisfies *Completeness* iff for every class question $Q = \langle T, R, x\rangle$, $\forall l \in \mathit{Lit}_T$, if $l$ is core to $x$, then $\exists L \in F(Q)$ such that $l \in L$.

The previous axioms describe properties of a single class explanation. The last axiom is about the set of all such explanations that can be generated from a theory. It ensures their compatibility, thus avoiding erroneous explanations. The axiom states that the union of two reasons supporting different classes should be inconsistent. To illustrate the idea, consider an explainer that provides respectively $L = \{\langle a, v\rangle\}$ and $L' = \{\langle b, v'\rangle\}$ for the classes $x$ and $y$. If $L \cup L'$ is consistent, then there exists an instance $I$ that contains $L \cup L'$; the two explanations would then support contradictory predictions for $I$.

**Axiom 10 (Coherence).** A class explainer $F$ satisfies *Coherence* iff for any two class questions $Q = \langle T, R, x\rangle$ and $Q' = \langle T', R', x'\rangle$ such that $T = T'$, $R = R'$, and $x \neq x'$, $\forall L \in F(Q)$, $\forall L' \in F(Q')$, $L \cup L'$ is inconsistent.

Feasibility guarantees the consistency of every reason.

**Property 3.** Let $F$ be a class explainer that satisfies Feasibility and $Q$ a class question. For any $L \in F(Q)$, $L$ is consistent.

From a couple of axioms, it follows that a reason causes the class it explains. Indeed, its appearance in any instance leads the classifier to assign that class to it.

**Proposition 2.** Let $F$ be a class explainer that satisfies Feasibility, Representativity and Coherence, and $Q = \langle T, R, x\rangle$ a class question. The following holds: $\forall L \in F(Q)$, $\forall I \in \mathit{Inst}_T$ such that $L \subseteq I$, $I \in \mathit{Inst}_T^R(x)$.

Exhaustivity and Relevance ensure that the literals used in the explanation of a class are exactly all those that are relevant to the class. Likewise, Completeness and Coreness ensure that explanations are based on all and only core literals.

**Theorem 1.** Let $F$ be a class explainer and $Q = \langle T, R, x\rangle$ a class question. The following two points hold:
- $F$ satisfies Exhaustivity and Relevance iff $\bigcup_{L \in F(Q)} L = \{l \in \mathit{Lit}_T : l \text{ is relevant to } x\}$;
- $F$ satisfies Completeness and Coreness iff $\bigcup_{L \in F(Q)} L = \{l \in \mathit{Lit}_T : l \text{ is core to } x\}$.
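As a concrete check of Axiom 10 (Coherence) on Example 1, the short sketch below (ours, not from the paper) verifies that every reason built from an Admit rule precondition is jointly inconsistent with every reason built from a Deny rule precondition.

```python
# Brute-force check of Axiom 10 (Coherence) on Example 1: the rule preconditions of
# Admit and those of Deny are pairwise jointly inconsistent (helper names are ours).
def consistent(L):
    return len({a for a, _ in L}) == len(L)

admit_reasons = [frozenset({("E", 1), ("F", 0)}),
                 frozenset({("E", 1), ("F", 1), ("W", 1)}),
                 frozenset({("E", 1), ("F", 1), ("W", 0), ("GPA", 1)})]
deny_reasons  = [frozenset({("E", 1), ("F", 1), ("W", 0), ("GPA", 0)}),
                 frozenset({("E", 0)})]

coherent = all(not consistent(L | M) for L in admit_reasons for M in deny_reasons)
print(coherent)   # True: no instance can contain a reason of Admit and a reason of Deny
```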
The above axioms are not all independent: some of them follow from others. We consider them all in the paper since they allow us to discriminate between explainers. Some explainers may satisfy only an implied axiom, while others may satisfy the axiom that does not follow from any other.

**Proposition 3.** Let $F$ be a class explainer.
- If $F$ satisfies Representativity, then $F$ satisfies Success.
- If $F$ satisfies Coreness, then $F$ satisfies Relevance.
- If $F$ satisfies Exhaustivity, then $F$ satisfies Completeness.
- If $F$ satisfies Feasibility, Coherence and Representativity, then $F$ satisfies Explainability and Exhaustivity.

Most of the axioms are compatible, i.e., there exists at least one explanation function that satisfies them all together (obviously for any classifier and any theory). It is no surprise that Coreness and Exhaustivity are incompatible, since they express diverging strategies that may be followed by explainers. Finally, since core literals may not exist, the three axioms Success, Explainability and Coreness are incompatible.

**Proposition 4.** The following holds:
- Success, Explainability, Irreducibility, Feasibility, Representativity, Relevance, Exhaustivity, Completeness, and Coherence are compatible;
- Success, Irreducibility, Feasibility, Representativity, Relevance, Coreness, and Completeness are compatible;
- Explainability, Irreducibility, Feasibility, Relevance, Coreness, and Completeness are compatible;
- Coreness and Exhaustivity are incompatible;
- Success, Explainability and Coreness are incompatible.

## 4 Explainers Based on Abductive Reasoning

One of the most studied explainers is based on abductive reasoning. It looks for sets of literals that are sufficient for assigning a class to a given instance. It thus explains instances instead of classes. Its explanations are called minimal sufficient subsets in [Camburu et al., 2020], prime implicants in [Shih et al., 2018; Darwiche and Hirth, 2020], or abductive explanations in [Ignatiev, 2020]. In [Amgoud, 2021a], abductive reasoning is used for explaining classes. The idea is to highlight the factors that caused a class. In that spirit, we investigate a family of class explainers based on abductive reasoning, which we call the *sufficiency explainers*. Such explainers generate explanations under complete information (i.e., the whole set of instances is available, which is reasonable for explaining some quite simple classifiers like decision trees) and adopt the following abductive principle: if a class $x$ is assigned whenever a literal $l$ is observed, then we extrapolate that $l$ is a reason for $x$.

Let us formally define the sufficiency explainers. As a preliminary, we need a notation for the set of all those subsets of literals that are sufficient to force a certain class.

**Definition 9.** Let $Q = \langle T, R, x\rangle$ be a class question. We denote by $\mathit{Suff}_Q$ the set of every $L \subseteq \mathit{Lit}_T$ such that:
- $L$ is consistent;
- $\forall I \in \mathit{Inst}_T$, if $L \subseteq I$, then $I \in \mathit{Inst}_T^R(x)$.

We are ready to define our family of explainers based on complete information and abductive reasoning.

**Definition 10 (Sufficiency).** A *sufficiency class explainer* is a class explainer $F$ such that, for every class question $Q = \langle T, R, x\rangle$:
- $F(Q) \subseteq \mathit{Suff}_Q$;
- $\forall I \in \mathit{Inst}_T^R(x)$, $\exists L \in F(Q)$ such that $L \subseteq I$.

Next, we characterize the sufficiency explainers with three axioms, namely Feasibility, Representativity, and Coherence. As a preliminary, we first show that every class explainer satisfying the three aforementioned axioms returns explanations which are subsets of those generated by $\mathit{Suff}_Q$.

**Theorem 2.** If a class explainer $F$ satisfies Feasibility, Representativity and Coherence, then, for any class question $Q = \langle T, R, x\rangle$, the inclusion $F(Q) \subseteq \mathit{Suff}_Q$ holds.

We are ready for the characterization.

**Theorem 3.** A class explainer $F$ satisfies Feasibility, Representativity and Coherence iff $F$ is a sufficiency class explainer.
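The set $\mathit{Suff}_Q$ of Definition 9 (and hence the explainer aAbd introduced next) can be computed by brute force; the sketch below (ours) does so for the class Admit, assuming the same hand-coded encoding of Example 1 as in the earlier sketches.

```python
from itertools import product, combinations

# Brute-force computation of Suff_Q (Definition 9) for Q = <T, R, Admit> in Example 1.
ATTRS = ["E", "F", "W", "GPA"]
Inst_T = [frozenset(zip(ATTRS, vals)) for vals in product([0, 1], repeat=4)]
Lit_T = [(a, v) for a in ATTRS for v in (0, 1)]

def R(I):
    v = dict(I)
    if v["E"] == 0:
        return "Deny"
    return "Admit" if (v["F"] == 0 or v["W"] == 1 or v["GPA"] == 1) else "Deny"

def consistent(L):
    return len({a for a, _ in L}) == len(L)

def suff(x):
    """All consistent L such that every instance containing L is classified as x."""
    return [L for k in range(len(Lit_T) + 1)
              for L in map(frozenset, combinations(Lit_T, k))
              if consistent(L) and all(R(I) == x for I in Inst_T if L <= I)]

reasons = suff("Admit")                                             # aAbd(Q) = Suff_Q
print(len(reasons))                                                 # 19 sufficient reasons
print(frozenset({("E", 1), ("F", 0)}) in reasons)                   # True
print(frozenset({("E", 1), ("F", 0), ("GPA", 1)}) in reasons)       # True
```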
It is worth mentioning that a sufficiency class explainer violates Relevance, Coreness and Irreducibility (see Table 1). In what follows, we provide two specific explainers of this family. The first one, called the *all-abductive explainer* (aAbd), returns all sufficient reasons for a class.

| Axiom | Sufficiency | aAbd | mAbd | aCtf | mCtf | xCtf | f-rAbd |
|---|---|---|---|---|---|---|---|
| Success | ✓ | ✓ | ✓ | ✓ | | ✓ | ✓ |
| Explainability | ✓ | ✓ | ✓ | | ✓ | | ✓ |
| Irreducibility | | | ✓ | ✓ | ✓ | ✓ | ✓ |
| Feasibility | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Representativity | ✓ | ✓ | ✓ | ✓ | | ✓ | |
| Relevance | | | ✓ | ✓ | ✓ | ✓ | |
| Coreness | | | | ✓ | ✓ | ✓ | |
| Exhaustivity | ✓ | ✓ | ✓ | | | | |
| Completeness | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| Coherence | ✓ | ✓ | ✓ | | | | |

Table 1: The symbol ✓ stands for "the axiom is satisfied by the explainer".

**Definition 11 (aAbd).** We denote by aAbd the class explainer transforming every class question $Q$ into $\mathit{Suff}_Q$.

**Example 1 (Cont.)** Examples of reasons for Admit are $\{\langle E,1\rangle, \langle F,0\rangle\}$ and $\{\langle E,1\rangle, \langle F,0\rangle, \langle GPA,1\rangle\}$.

The following result shows that the class explainer aAbd satisfies most of the axioms.

**Theorem 4.** The following properties hold:
- aAbd satisfies Success, Explainability, Feasibility, Representativity, Exhaustivity, Completeness and Coherence;
- aAbd violates Irreducibility, Relevance and Coreness.

We turn to a second specific sufficiency explainer, called the *min-abductive explainer* (mAbd), which returns the minimal sufficient reasons for a class.

**Definition 12 (mAbd).** The min-abductive class explainer (mAbd) is the class explainer transforming every class question $Q = \langle T, R, x\rangle$ into the set of every $L \subseteq \mathit{Lit}_T$ such that:
- $L$ is consistent;
- $\forall I \in \mathit{Inst}_T$ such that $L \subseteq I$, $I \in \mathit{Inst}_T^R(x)$;
- there is no $L' \subsetneq L$ satisfying the point above.

**Example 1 (Cont.)** The class Admit has three reasons, which correspond to the preconditions of the three Admit rules; the same holds for Deny.

The explainer mAbd refines aAbd by keeping only the minimal (for set inclusion) explanations.

**Proposition 5.** For any class question $Q = \langle T, R, x\rangle$, $\mathit{mAbd}(Q) = \{L \in \mathit{aAbd}(Q) : \forall L' \subsetneq L,\ L' \notin \mathit{aAbd}(Q)\}$.

The min-abductive explainer satisfies all our axioms except Coreness. Due to the minimality condition, mAbd ensures that every literal in an explanation is relevant to the explained class. Furthermore, it keeps only the minimal subsets of literals that are sufficient for causing a class.

**Theorem 5.** mAbd satisfies Success, Explainability, Irreducibility, Feasibility, Representativity, Relevance, Exhaustivity, Completeness, and Coherence, but violates Coreness.

We now present a representation theorem which characterizes the abductive explainer mAbd. We show that mAbd is the only explainer satisfying all axioms except Coreness (recall that some axioms imply others).

**Theorem 6.** A class explainer $F$ satisfies Irreducibility, Feasibility, Representativity, and Coherence iff $F = \mathit{mAbd}$.
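Under the same assumptions as the earlier sketches, mAbd can be obtained on Example 1 by filtering $\mathit{Suff}_Q$ for subset-minimality, in line with Proposition 5 (code and names are ours).

```python
from itertools import product, combinations

# Minimal sufficient reasons (mAbd, Definition 12 / Proposition 5) for Example 1.
ATTRS = ["E", "F", "W", "GPA"]
Inst_T = [frozenset(zip(ATTRS, vals)) for vals in product([0, 1], repeat=4)]
Lit_T = [(a, v) for a in ATTRS for v in (0, 1)]

def R(I):
    v = dict(I)
    if v["E"] == 0:
        return "Deny"
    return "Admit" if (v["F"] == 0 or v["W"] == 1 or v["GPA"] == 1) else "Deny"

def consistent(L):
    return len({a for a, _ in L}) == len(L)

def suff(x):                                   # Suff_Q, as in the previous sketch
    return [L for k in range(len(Lit_T) + 1)
              for L in map(frozenset, combinations(Lit_T, k))
              if consistent(L) and all(R(I) == x for I in Inst_T if L <= I)]

def m_abd(x):                                  # keep only the subset-minimal reasons
    S = suff(x)
    return [L for L in S if not any(M < L for M in S)]

print(sorted(map(sorted, m_abd("Admit"))))
# three reasons: {E=1, F=0}, {E=1, GPA=1}, {E=1, W=1}: the Admit rule preconditions
# with redundant literals removed
print(sorted(map(sorted, m_abd("Deny"))))
# two reasons: {E=0} and {F=1, GPA=0, W=0}
```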
## 5 Explainers Based on Counterfactual Reasoning

We turn to a second family of explainers, called the *necessity explainers*. It is based on complete information and the following counterfactual principle: if a literal $l$ is observed whenever a class $x$ is assigned, then we extrapolate that $l$ is a reason for assigning $x$. Put differently, if $l$ were not observed, then $x$ would not have been assigned, hence the word counterfactual. As a preliminary to defining the necessity explainers, we need a notation for those subsets of literals that are necessary for a certain class.

**Definition 13.** Let $Q = \langle T, R, x\rangle$ be a class question. We denote by $\mathit{Nec}_Q$ the set of every $L \subseteq \mathit{Lit}_T$ such that:
- $L$ is consistent;
- $\forall I \in \mathit{Inst}_T$, if $L \nsubseteq I$, then $I \notin \mathit{Inst}_T^R(x)$.

Note that the necessary subsets of literals for a class $x$ constitute the power set of the intersection of all instances of $x$.

**Proposition 6.** Let $Q = \langle T, R, x\rangle$ be a class question. Then $\mathit{Nec}_Q = \mathit{Pow}\!\left[\bigcap \mathit{Inst}_T^R(x)\right]$.

We are ready to define our family of explainers based on complete information and counterfactual reasoning.

**Definition 14 (Necessity).** A *necessity class explainer* is a class explainer $F$ such that, for every class question $Q$, $F(Q) \subseteq \mathit{Nec}_Q$.

Let us investigate a specific member of the family, which returns all necessary subsets of literals.

**Definition 15 (aCtf).** The *all-counterfactual explainer* (aCtf) is the function transforming every class question $Q$ into $\mathit{Nec}_Q$.

**Example 1 (Cont.)** Let $Q$ be the question centered on Admit and $Q'$ the question centered on Deny. We have $\bigcap \mathit{Inst}_T^R(\text{Admit}) = \{\langle E, 1\rangle\}$. Thus, $\mathit{Nec}_Q = \mathit{Pow}(\{\langle E, 1\rangle\}) = \{\emptyset, \{\langle E, 1\rangle\}\}$, and so $\mathit{aCtf}(\langle T, R, \text{Admit}\rangle) = \{\emptyset, \{\langle E, 1\rangle\}\}$. Similarly, $\bigcap \mathit{Inst}_T^R(\text{Deny}) = \emptyset$. Thus, $\mathit{Nec}_{Q'} = \mathit{Pow}(\emptyset) = \{\emptyset\}$, and so $\mathit{aCtf}(\langle T, R, \text{Deny}\rangle) = \{\emptyset\}$.

We axiomatically analyse aCtf.

**Theorem 7.** aCtf satisfies Success, Irreducibility, Feasibility, Representativity, Relevance, Coreness and Completeness. It violates Explainability, Exhaustivity and Coherence.

We turn to a second specific explainer, which minimizes the necessary subsets.

**Definition 16 (mCtf).** The *min-counterfactual explainer* (mCtf) is the function transforming every class question $Q = \langle T, R, x\rangle$ into the set of every subset $L$ of $\mathit{Lit}_T$ such that:
- $L \neq \emptyset$;
- $L$ is consistent;
- $\forall I \in \mathit{Inst}_T$, if $L \nsubseteq I$, then $I \notin \mathit{Inst}_T^R(x)$;
- there is no $L' \subsetneq L$ satisfying the points above.

**Example 1 (Cont.)** We have $\mathit{mCtf}(\langle T, R, \text{Admit}\rangle) = \{\{\langle E, 1\rangle\}\}$ and $\mathit{mCtf}(\langle T, R, \text{Deny}\rangle) = \emptyset$.

We axiomatically analyze mCtf. Note that we lose Success and Representativity, but we gain Explainability.

**Theorem 8.** mCtf satisfies Explainability, Irreducibility, Feasibility, Relevance, Coreness and Completeness. It violates Success, Representativity, Exhaustivity and Coherence.

Finally, we introduce a third specific explainer, which maximizes the necessary subsets.

**Definition 17 (xCtf).** The *max-counterfactual explainer* (xCtf) is the function transforming every class question $Q = \langle T, R, x\rangle$ into the set of every subset $L$ of $\mathit{Lit}_T$ such that:
- $L$ is consistent;
- $\forall I \in \mathit{Inst}_T$, if $L \nsubseteq I$, then $I \notin \mathit{Inst}_T^R(x)$;
- there is no $L' \supsetneq L$ satisfying the above two points.

Notice that xCtf returns only one reason, namely the intersection of all instances of the class in question.

**Proposition 7.** Let $Q = \langle T, R, x\rangle$ be a class question. Then $\mathit{xCtf}(Q) = \left\{\bigcap \mathit{Inst}_T^R(x)\right\}$.

**Example 1 (Cont.)** We have $\mathit{xCtf}(\langle T, R, \text{Admit}\rangle) = \{\{\langle E, 1\rangle\}\}$ and $\mathit{xCtf}(\langle T, R, \text{Deny}\rangle) = \{\emptyset\}$.

We axiomatically analyze xCtf and observe that it satisfies exactly the same axioms as aCtf. So, returning all necessary subsets or only the largest one (i.e., the intersection of the instances of the class in question) leads to the same set of satisfied axioms.

**Theorem 9.** xCtf satisfies Success, Irreducibility, Feasibility, Representativity, Relevance, Coreness and Completeness. It violates Explainability, Exhaustivity and Coherence.
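Thanks to Propositions 6 and 7, the counterfactual explainers reduce to simple set operations. The sketch below (ours, reusing the Example 1 encoding from the earlier sketches) computes aCtf, mCtf and xCtf from the intersection of the instances of a class.

```python
from itertools import product, combinations

# Counterfactual explainers of Section 5 on Example 1, via Propositions 6 and 7:
# Nec_Q is the power set of the intersection of the instances of the class.
ATTRS = ["E", "F", "W", "GPA"]
Inst_T = [frozenset(zip(ATTRS, vals)) for vals in product([0, 1], repeat=4)]

def R(I):
    v = dict(I)
    if v["E"] == 0:
        return "Deny"
    return "Admit" if (v["F"] == 0 or v["W"] == 1 or v["GPA"] == 1) else "Deny"

def common(x):                               # intersection of all instances of class x
    return frozenset.intersection(*[I for I in Inst_T if R(I) == x])

def a_ctf(x):                                # Nec_Q = Pow(common(x))
    c = sorted(common(x))
    return [frozenset(L) for k in range(len(c) + 1) for L in combinations(c, k)]

def m_ctf(x):                                # non-empty minimal necessary sets: the singletons
    return [frozenset({l}) for l in common(x)]

def x_ctf(x):                                # the single maximal necessary set
    return [common(x)]

print(a_ctf("Admit"))                        # [frozenset(), frozenset({('E', 1)})]
print(m_ctf("Admit"), m_ctf("Deny"))         # [frozenset({('E', 1)})] []
print(x_ctf("Deny"))                         # [frozenset()]
```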
## 6 Explaining Under Incomplete Information

In this section, we investigate explanations under incomplete information (i.e., not all instances are available, which is typically the case with the dataset a classifier has been trained on, or the dataset generated by existing explainers like Anchors and LIME). Working with incomplete information makes sense, in particular, for complex classifiers which cannot reasonably be queried on all instances. Note that our abductive and counterfactual explainers (defined in the previous sections) work with the whole set of instances. However, in practice only a subset of instances (a dataset) is available. The question is: do our previous results still hold if reasons are generated from a proper subset of $\mathit{Inst}_T$? The answer is unfortunately negative.

We define a parameterized family of explainers that provide minimally sufficient reasons from a dataset. The parameter is a function which selects the dataset to be used. Such a definition abstracts Anchors and LIME since they both use datasets, generated in different ways.

**Definition 18 (Fragment).** Let $T = \langle A, d, C\rangle$ be a classification theory, $R$ a classifier on $T$, and $S \subseteq \mathit{Inst}_T$. We say that $S$ is a *fragment* in $T$ and $R$ iff, for every $x \in C$, we have $\mathit{Inst}_T^R(x) \cap S \neq \emptyset$.

**Definition 19.** A *fragment selector* is a function $f$ transforming every pair $(T, R)$, where $T$ is a classification theory and $R$ a classifier on $T$, into a fragment in $T$ and $R$.

We are now ready to introduce the novel family.

**Definition 20 (f-rAbd).** Let $f$ be a fragment selector. The *f-relaxed abductive explainer* (f-rAbd) is the function transforming every class question $Q = \langle T, R, x\rangle$ into the set of every subset $L$ of $\mathit{Lit}_T$ such that:
- $\exists I \in f(T, R)$ such that $L \subseteq I$;
- $\forall I \in f(T, R)$ such that $L \subseteq I$, $I \in \mathit{Inst}_T^R(x)$;
- there is no $L' \subsetneq L$ satisfying the points above.

**Property 4.** Let $f$ be a fragment selector and $Q = \langle T, R, x\rangle$ a class question. For any $L \in \textit{f-rAbd}(Q)$, $L$ is consistent.

We show that, for every fragment selector $f$, f-rAbd satisfies Success, Explainability, Feasibility and Irreducibility, and violates the remaining axioms. This is not surprising since it generates explanations from a subset of instances.

**Theorem 10.** Let $f$ be a fragment selector. f-rAbd satisfies Success, Explainability, Feasibility and Irreducibility. It violates Coreness, Relevance, Completeness, Exhaustivity, Representativity and Coherence.

The following result shows that f-rAbd satisfies a weak version of Representativity: every instance of the set $f(T, R)$ is a superset of at least one reason of its class.

**Proposition 8.** Let $f$ be a fragment selector. f-rAbd satisfies *Weak Representativity*, i.e., for every class question $Q = \langle T, R, x\rangle$ and every $I \in f(T, R) \cap \mathit{Inst}_T^R(x)$, there exists $L \in \textit{f-rAbd}(Q)$ such that $L \subseteq I$.

Existing heuristic explanation functions like Anchors and LIME violate Coherence, leading to incorrect outcomes in some cases. Recall that both Anchors and LIME are not class explainers; they are instance explainers, i.e., they provide reasons for assigning $R(I)$ to an instance $I$.
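To see why relaxing to a fragment loses the correctness guarantees, here is a small sketch (ours) of f-rAbd on Example 1 with a hand-picked, hypothetical two-instance fragment; it returns $\{\langle F, 1\rangle\}$ as a reason for Admit even though that set is not sufficient over the whole feature space.

```python
from itertools import product, combinations

# f-rAbd (Definition 20) on Example 1 with a hypothetical two-instance fragment,
# showing how reasons computed from a dataset can be wrong on the full space.
ATTRS = ["E", "F", "W", "GPA"]
Inst_T = [frozenset(zip(ATTRS, vals)) for vals in product([0, 1], repeat=4)]
Lit_T = [(a, v) for a in ATTRS for v in (0, 1)]

def R(I):
    v = dict(I)
    if v["E"] == 0:
        return "Deny"
    return "Admit" if (v["F"] == 0 or v["W"] == 1 or v["GPA"] == 1) else "Deny"

# A fragment with one instance per class (Definition 18); the selector is fixed by hand.
S = [frozenset({("E", 1), ("F", 1), ("W", 1), ("GPA", 1)}),   # classified Admit
     frozenset({("E", 0), ("F", 0), ("W", 0), ("GPA", 0)})]   # classified Deny

def fr_abd(x):
    cand = []
    for k in range(len(Lit_T) + 1):
        for comb in combinations(Lit_T, k):
            L = frozenset(comb)
            if any(L <= I for I in S) and all(R(I) == x for I in S if L <= I):
                cand.append(L)                                 # sufficient on the fragment
    return [L for L in cand if not any(M < L for M in cand)]   # keep the minimal ones

print(sorted(map(sorted, fr_abd("Admit"))))   # the four singletons {E=1}, {F=1}, {GPA=1}, {W=1}

# <F,1> is reported as a reason for Admit, yet it is not sufficient on the whole space:
bad = frozenset({("F", 1)})
print(all(R(I) == "Admit" for I in Inst_T if bad <= I))        # False: incorrect explanation
```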
## 7 Related Work

There have not been many axiomatic approaches to explainability. Most existing works propose instances of explainers and analyse them either experimentally (e.g., [Ignatiev et al., 2019]) or formally (e.g., [Darwiche and Hirth, 2020]). None of these works has discussed axioms. In [Wolf et al., 2019], some axioms have been proposed for instance explainers; our axioms concern class explainers.

Contrastive explanations are widely studied. They describe what should be modified in order to avoid a class. It has been shown in [Amgoud, 2021a] that they are dual to the reasons generated by mAbd; hence, they represent the same concept. That is why, in this paper, we investigated only one of them.

## 8 Conclusion

This paper studied the foundations of explainers that justify classes. It provided key axioms that an explainer should satisfy and characterised various explainers that satisfy them. It highlighted the key axioms that separate sufficient reasons from necessary ones (i.e., counterfactuals). Another important result of the paper concerns the family of explainers that generate reasons from a subset of instances: we showed that they violate several axioms, including Coherence, which leads to erroneous explanations. As future work, we plan to extend our axioms to deal with other types of explanations, like conversational ones.

## Acknowledgments

Support from the ANR-3IA Artificial and Natural Intelligence Toulouse Institute (ANITI) is gratefully acknowledged.

## References

- [Amgoud, 2021a] Leila Amgoud. Explaining black-box classification models with arguments. In 33rd IEEE International Conference on Tools with Artificial Intelligence (ICTAI), pages 791–795, 2021.
- [Amgoud, 2021b] Leila Amgoud. Non-monotonic explanation functions. In Proceedings of the 16th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU), volume 12897 of Lecture Notes in Computer Science, pages 19–31, 2021.
- [Audemard et al., 2020] Gilles Audemard, Frédéric Koriche, and Pierre Marquis. On tractable XAI queries based on compiled representations. In Proceedings of the 17th International Conference on Principles of Knowledge Representation and Reasoning (KR), pages 838–849, 2020.
- [Biran and Cotton, 2017] Or Biran and Courtenay Cotton. Explanation and justification in machine learning: A survey. In IJCAI Workshop on Explainable Artificial Intelligence (XAI), pages 1–6, 2017.
- [Camburu et al., 2020] Oana-Maria Camburu, Eleonora Giunchiglia, Jakob N. Foerster, Thomas Lukasiewicz, and Phil Blunsom. The struggles of feature-based explanations: Shapley values vs. minimal sufficient subsets. arXiv:2009.11023, 2020.
- [Cyras et al., 2021] Kristijonas Cyras, Antonio Rago, Emanuele Albini, Pietro Baroni, and Francesca Toni. Argumentative XAI: A survey. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI), pages 4392–4399, 2021.
- [Darwiche and Hirth, 2020] Adnan Darwiche and Auguste Hirth. On the reasons behind decisions. In 24th European Conference on Artificial Intelligence (ECAI), volume 325, pages 712–720. IOS Press, 2020.
- [Dhurandhar et al., 2018] Amit Dhurandhar, Pin-Yu Chen, Ronny Luss, Chun-Chen Tu, Pai-Shun Ting, Karthikeyan Shanmugam, and Payel Das. Explanations based on the missing: Towards contrastive explanations with pertinent negatives. In Annual Conference on Neural Information Processing Systems (NeurIPS), pages 590–601, 2018.
- [Guidotti et al., 2019] Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. A survey of methods for explaining black box models. ACM Computing Surveys, 51(5):93:1–93:42, 2019.
- [Ignatiev et al., 2019] Alexey Ignatiev, Nina Narodytska, and João Marques-Silva. On relating explanations and adversarial examples. In NeurIPS, pages 15857–15867, 2019.
- [Ignatiev et al., 2020] Alexey Ignatiev, Nina Narodytska, Nicholas Asher, and João Marques-Silva. From contrastive to abductive explanations and back again. In XIXth International Conference of the Italian Association for Artificial Intelligence, volume 12414 of Lecture Notes in Computer Science, pages 335–355, 2020.
- [Ignatiev, 2020] Alexey Ignatiev. Towards trustable explainable AI. In Christian Bessiere, editor, Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI), pages 5154–5158, 2020.
- [Miller, 2019] Tim Miller. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267:1–38, 2019.
- [Narodytska et al., 2019] Nina Narodytska, Aditya A. Shrotri, Kuldeep S. Meel, Alexey Ignatiev, and João Marques-Silva. Assessing heuristic machine learning explanations with model counting. In Proceedings of the 22nd International Conference on Theory and Applications of Satisfiability Testing (SAT), pages 267–278, 2019.
- [Ribeiro et al., 2016] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016.
- [Ribeiro et al., 2018] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. Anchors: High-precision model-agnostic explanations. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), pages 1527–1535, 2018.
- [Schneider and Handali, 2019] Johannes Schneider and Joshua Peter Handali. Personalized explanation for machine learning: A conceptualization. In 27th European Conference on Information Systems (ECIS), 2019.
- [Shih et al., 2018] Andy Shih, Arthur Choi, and Adnan Darwiche. A symbolic approach to explaining Bayesian network classifiers. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI), pages 5103–5111, 2018.
- [Wolf et al., 2019] Lior Wolf, Tomer Galanti, and Tamir Hazan. A formal approach to explainability. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES), pages 255–261, 2019.