# faithshap_the_faithful_shapley_interaction_index__30b78261.pdf

Journal of Machine Learning Research 24 (2023) 1-42 Submitted 2/22; Revised 9/22; Published 3/23

Faith-Shap: The Faithful Shapley Interaction Index

Che-Ping Tsai chepingt@cs.cmu.edu Department of Machine Learning Carnegie Mellon University PA 15213, USA

Chih-Kuan Yeh cjyeh@cs.cmu.edu Department of Machine Learning Carnegie Mellon University PA 15213, USA

Pradeep Ravikumar pradeepr@cs.cmu.edu Department of Machine Learning Carnegie Mellon University PA 15213, USA

Editor: Tommi Jaakkola

Shapley values, which were originally designed to assign attributions to individual players in coalition games, have become a commonly used approach in explainable machine learning to provide attributions to input features for black-box machine learning models. A key attraction of Shapley values is that they uniquely satisfy a very natural set of axiomatic properties. However, extending the Shapley value to assigning attributions to interactions rather than individual players, an interaction index, is non-trivial: as the natural set of axioms for the original Shapley values, extended to the context of interactions, no longer specify a unique interaction index. Many proposals thus introduce additional less natural axioms, while sacriﬁcing the key axiom of eﬃciency, in order to obtain unique interaction indices. In this work, rather than introduce additional conﬂicting axioms, we adopt the viewpoint of Shapley values as coeﬃcients of the most faithful linear approximation to the pseudo-Boolean coalition game value function. By extending linear to ℓ-order polynomial approximations, we can then deﬁne the general family of faithful interaction indices. We show that by additionally requiring the faithful interaction indices to satisfy interactionextensions of the standard individual Shapley axioms (dummy, symmetry, linearity, and eﬃciency), we obtain a unique Faithful Shapley Interaction index, which we denote Faith Shap, as a natural generalization of the Shapley value to interactions. We then provide some illustrative contrasts of Faith-Shap with previously proposed interaction indices, and further investigate some of its interesting algebraic properties. We further show the computational eﬃciency of computing Faith-Shap, together with some additional qualitative insights, via some illustrative experiments.

Keywords: Feature Attribution, Feature Interaction, Interaction Index, Shapley Value, Explainable AI

c 2023 Che-Ping Tsai, Chih-Kuan Yeh and Pradeep Ravikumar.

License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v24/22-0202.html.

Tsai, Yeh and Ravikumar

1. Introduction

Explaining the prediction of a black-box machine learning model via attributions to its features is an increasingly important task. Most approaches have focused on attributions to individual features, which does not always suﬃce to provide insight into the model when there are heavy feature interactions. For instance, when explaining models with text input, we might also ask for attributions to phrases and sequences of words rather than just individual words. Similarly, in Question Answering (QA) (Ye et al., 2021), it is of interest to measure attributions to query answer tuples, rather than just individual entities associated with answers. Such feature interactions are also salient with images as input, where instead of attributions to individual pixels, we might prefer attributions to groups of pixels. A large class of recent approaches for individual feature attributions reduces the task to a cooperative game theory problem. Given a machine learning model, a test point, and the underlying data distribution, one can devise a set value function that takes as input a set of features and outputs the value of that set of features. There are many choices for such a reduction to a set function (Lundberg and Lee, 2017; Sundararajan and Najmi, 2019; Frye et al., 2020; Chen et al., 2020). We can then relate this to a cooperative game theory problem where the features are players, the set function above is the value function of the coalition game that speciﬁes the value of various player coalitions, and we wish to derive feature attributions given such a value function. This meta-approach has led to a slew of explanation approaches when the goal is to obtain individual feature attributions. The key question we focus on in this paper is to obtain attributions to feature interactions instead. In this setting, any feature interactions (up to a given order), along with each individual feature, should get some attribution score. This question has attracted some attention in the cooperative game theory and the explainable AI literature, with the broad strategy of extending popular approaches for individual feature attributions, such as Shapley and Banzhaf values (Shapley, 1953; Harsanyi, 1963), to the interaction context. But these existing proposals come with many caveats. Part of the attraction of the cooperative game theory based explanations above is that for the case of individual feature attributions, if we stipulate some natural axioms such as linearity, symmetry, dummy, and eﬃciency (detailed in a later section), there exist unique attributions such as Shapley and Banzhaf (depending on the notion of eﬃciency). Thus we have both a strong axiomatic foundation to the explanations, as well as a very compelling uniqueness result that there can exist no other explanations that satisfy these axioms. These have thus led to an explosion of Shapley value based explanations in the XAI literature that assign attributions to features, data, and even concepts (Lundberg and Lee, 2017; Grömping, 2007; Lindeman, 1980; Owen, 2014; Owen and Prieur, 2017; Datta et al., 2016; Ghorbani and Zou, 2019; Jia et al., 2019; Yeh et al., 2020). However, when we move to the context of feature interactions, while the axioms above have natural extensions from the individual feature to the feature interaction context, they no longer result in a unique feature attribution value. Approaches to address this have thus focused on adding additional less natural axioms to ensure uniqueness. One set of unique feature attributions Shapley interaction and Banzhaf interaction indices (Grabisch and Roubens, 1999) derive unique attributions via a recursive axiom, which speciﬁes how higher-order feature attributions be derived from

Faith-Shap: The Faithful Shapley Interaction Index

lower order feature interaction attributions (all the way to individual feature attributions). Thus, given the uniqueness at the level of individual feature attributions, we in turn get uniqueness at all levels of interaction attributions. One major caveat of these Shapley interaction and Banzhaf interaction indices is that they do not satisfy the eﬃciency axiom for interaction feature attributions, and hence can no longer be viewed as distributing the total contribution of the model prediction among all feature interactions. The other caveat is that the recursive axiom, while convenient to extend uniqueness from individual to interaction feature attributions, is much less natural when compared with the original Shapley axioms, which speciﬁcally deﬁned the forms of ﬁrst-order indices for certain value functions. To address these caveats, Sundararajan et al. (2020) proposed the interaction distribution axiom that entails distributing higher-order interactions to the topmost interaction indices at the expense of impoverished lower-order interactions. This makes the interaction attributions unique for unanimity games (Shapley, 1953), and since these act as a basis for set value functions, by linearity this ensures uniqueness of interaction attributions for general games. The caveat however is that the speciﬁed attribution distribution inordinately favors the topmost interactions, which in turn aﬀects the usefulness of both the lower and highestorder interactions as we show in our examples. And arguably, the interaction distribution axioms too are much less natural when compared to the original Shapley axioms. Thus, there remains an open problem to specify a natural restriction or axiom that allows for unique interaction attributions. An additional desideratum is that the feature interaction attributions be cognizant of the maximum interaction order of the interaction attributions we require. For instance, with individual feature attributions, the maximum interaction order is one, while with pairwise feature attributions, the maximum interaction order is two. This would allow the explanations to be tailored to the set of possible interactions and satisfy the relevant axioms with respect to just these interactions, instead of all possible subsets of feature interactions. In this work, rather than devising potentially less natural axioms to ensure uniqueness, we work from yet another viewpoint of Shapley values, that they are faithful to the set value function: for all subsets, the sum of individual feature attributions over a subset should approximate the set value function evaluated on that subset. When formalized as a weighted regression problem, this yields Shapley and Banzhaf values depending on the weights in the weighted regression (Banzhaf III, 1964; Ruiz et al., 1996). We then extend the above weighted regression to feature interactions up to a given maximum interaction order, which then yields what we call Faith-Interaction indices. We show that when restricting to the class of Faith-Interaction indices, together with the (interaction extensions of the individual) Shapley axioms, we obtain a unique interaction index, which we term the Faith-Shap (for Faithful Shapley Interaction) index, which reduces to the individual feature Shapley values when the top interaction order is one. We thus posit Faith-Shap as the natural extension of Shapley values from individual features to interaction indices. Similarly, when the eﬃciency axiom is replaced by the generalized 2-eﬃciency axiom, we obtain a unique interaction index, which we term Faith-Banzhaf (for Faithful Banzhaf Interaction) index. The latter has also appeared in other guises in prior work (Hammer and Holzman, 1992; Grabisch et al., 2000). Unlike the other restrictive axioms discussed earlier, here we only require that the explanations be faithful to the model, which has always been a big attraction of Shapley values in the explainable AI (XAI) context. We corroborate the usefulness of these

Tsai, Yeh and Ravikumar

Faith-Interaction indices by contrasting them with prior indices in two illustrative coalition games, as well as real-world XAI applications. We then discuss the algebraic properties of Faithful Shapley Interaction index by relating them to cardinal indices, i.e. indices that can be expressed as a linear combination of marginal contributions, as well as in terms of approximations to multilinear extensions of the coalition set value function. An additional beneﬁt of the Faith Interaction indices is that the estimation becomes much more eﬃcient via leveraging the weighted linear regression formulation, which we validate in our experiments.

2. Preliminaries

2.1 Notations

Suppose we are given a black-box model f : X 7 R, with input domain X Rd; and suppose we wish to explain its prediction at a given test point x X. Suppose also given the tuple f, x (and possibly with additional information about the underlying data distribution on which f is trained on, and from which x is drawn), there is a well-deﬁned set function vx : 2d R. We can interpret such a set function as specifying the value of a subset of the set of d features. Many popular explanations employ such a reduction of the model and its prediction context to set value functions; see Ribeiro et al. (2016); Lundberg and Lee (2017); Sundararajan et al. (2020) for many examples. When clear from the context, and for notational simplicity, we will often omit x and simply use v to denote the set function. Such a reduction allows us to leverage results from cooperative game theory, by relating the set of features to a set of players, and the set function above as specifying the values of coalitions of players. We are then interested in quantifying the importance of interactions between diﬀerent features up to some order ℓ [d]. Note that in this context, when we mean interactions between features, we mean non-self interactions between distinct features, since self-self interactions could simply be identiﬁed with the individual features. In other words, we require an importance function E which for each coalition S [d] where 0 |S| ℓ, outputs a scalar ES(v, ℓ). Let Sℓdenote the set of all subsets of [d] with size less than or

equal to ℓ; the size of this set can be seen to be dℓ def = Pℓ j=0 d j . We then use the shorthand E(v, ℓ) = (ES(v, ℓ))S Sℓ Rdℓ. To simplify notation, we omit braces for small sets and write T i to represent T {i}.

2.2 Deﬁnitions

We begin by recalling the concept of discrete derivatives.

Deﬁnition 1 (Discrete Derivative) Given a set function v : 2d 7 R and two ﬁnite disjoint coalitions S, T [d] with S T = Ø, the S-derivative of v at T, S(v(T)), is deﬁned recursively as follows:

iv(T) = v(T i) v(T), i [d], and (1)

S(v(T)) = i[ S\i(v(T))] = X

L S ( 1)|S| |L|v(T L), i S. (2)

Faith-Shap: The Faithful Shapley Interaction Index

The second equality in Eqn. (2) can be shown via induction on S (Fujimoto et al., 2006). As an illustration of discrete derivatives, for a subset S of size 2, the discrete derivative can be written as

{i,j}v(T) = v(T {i, j}) v(T j) v(T i) + v(T).

{i,j}v(T) captures the joint eﬀect of features i and j co-occurring compared to the individual eﬀects of i and j. If {i,j}v(T) > 0 (resp. < 0), we say i and j have positive (resp. negative) interaction eﬀect in the presence of T since the presence of i increases (resp. decreases) the marginal contribution of j to coalition T. Following the intuition from the two features example, the discrete derivative S(v(T)) can be viewed as a measurement of the marginal interaction of S in the presence of T. When a set of features have a positive (negative) interaction eﬀect, the discrete derivative is positive (negative). Discrete derivatives play a fundamental role in measurement of interaction eﬀects. As we will see in the following section, the Shapley and Banzhaf interaction indices can be viewed as a weighted average of S-derivatives over all subsets T [d]\S.

Next, let us recall the concept of the Möbius transform.

Deﬁnition 2 (Möbius transform) Given set function v : 2d 7 R, the Möbius transform of v( ) is

a(v, S) = X

T S ( 1)|S| |T|v(T) for all S [d]. (3)

An important property (Shapley, 1953) of the Möbius transform is that any set function v( ) can be expressed as:

R [d] a(v, R) v R, (4)

where v R for any R [d] has the form v R(S) = 1 if S R and 0 otherwise; and is also known as a unanimity game value function in game theory. Eqn. (4) states that any set function can be expressed as a linear combination of these unanimity game value functions (so that {v R}R [d] form a basis for real-valued set value functions), with the Möbius transforms a(v, R) as their coeﬃcients. Note that if an interaction index satisﬁes the interaction linearity axiom (to be discussed in the sequel), the interaction index for general set value functions can be expressed as a linear combination of the interaction indices for unanimity games.

3. Background: Axioms for Interaction Indices

In this section, we present natural extensions of Shapley axioms for individual features to the feature interactions (Grabisch and Roubens, 1999; Sundararajan et al., 2020). We then discuss the key interaction indices proposed so far in the literature the Shapley interaction index, Banzhaf interaction index and Shapley-Taylor interaction index with respect to these axioms. In all these axioms, we allow for dependence on the maximum interaction order ℓ [d]. A summarization of axioms that these interaction indices satisfy is in Table 1.

Tsai, Yeh and Ravikumar

Indices Interaction linearity

Interaction symmetry

Interaction dummy

Interaction eﬃciency

Interaction recursive

Generalized Interaction 2-eﬃciency

Interaction distribution

Is Faith-Interaction Index Shapley Interaction

Banzhaf Interaction

Shapley Taylor

Faithful Shapley

Faithful Banzhaf

Table 1: A table of axioms that diﬀerent interaction indices satisfy.

Axiom 3 (Interaction Linearity): For any maximum interaction order ℓ [d], and for any two set functions v1 and v2, and any two scalars α1, α2 R, the interaction index satisﬁes: E(α1v1 + α2v2, ℓ) = α1E(v1, ℓ) + α2E(v2, ℓ).

The interaction linearity axiom states that the feature interaction index is a linear functional of the set function v( ). It ensures that the corresponding indices scale with the value function v( ).

Axiom 4 (Interaction Symmetry): For any maximum interaction order ℓ [d], and for any set function v : 2d 7 R that is symmetric with respect to elements i, j [d], so that v(S i) = v(S j)for any S [d]\{i, j}, the interaction index satisﬁes: ET i(v, ℓ) = ET j(v, ℓ) for any T [d]\{i, j} with |T| < ℓ.

The interaction symmetry axiom entails that if the value function treats two features the same, their corresponding feature interaction index values should be the same as well.

Axiom 5 (Interaction Dummy): For any maximum interaction order ℓ [d], and for any set function v : 2d 7 R such that v(S i) = v(S) for some i [d] and for all S [d]\{i}, the interaction index satisﬁes: ET (v, ℓ) = 0 for all T Sℓwith i T.

The interaction dummy axiom entails that a dummy feature i [d] that has no inﬂuence on the function v should have no interaction eﬀect with the other features.

Axiom 6 (Interaction Eﬃciency): For any maximum interaction order ℓ [d], and for any set function v : 2d R, the interaction index satisﬁes: P S Sℓ\Ø ES(v, ℓ) = v([d]) v(Ø) and EØ(v, ℓ) = v(Ø).

The interaction eﬃciency ensures that the interaction index distributes the total value v([d]) among the diﬀerent subsets in Sℓ. This axiom lends itself a natural explanation of ES(v, ℓ): it represents the marginal contribution that the group S makes to the total value, which has also been considered by Sundararajan et al. (2020). As we will detail in the sequel, some of the recently proposed interaction indices do not satisfy such an eﬃciency axiom. For instance, the chaining interaction and Shapley interaction indices only require the total sum of individual feature importances to sum to v([d]) v(Ø), without consideration of the higher-order interaction importances.

Faith-Shap: The Faithful Shapley Interaction Index

Challenge: Lack of Uniqueness: These axioms are natural extensions to the interaction setting of classical axioms for individual feature attributions; see Fujimoto et al. (2006); Grabisch and Roubens (1999) for a counterpart of these interaction axioms without consideration of the maximum interaction order ℓ [d]. As Sundararajan et al. (2020) note, though the linearity, symmetry, dummy, and eﬃciency axioms uniquely specify a feature attribution when the maximum interaction order ℓ= 1 (i.e. for individual feature attributions), they no longer do when ℓ> 1. In other words, there could exist many interaction indices that all satisfy the axioms speciﬁed above. A big attraction of the individual Shapley value was its uniqueness given the corresponding individual attribution axioms. Accordingly, a line of work has focused on specifying additional axioms that together specify a unique interaction index.

Axiom 7 (Recursive Interaction): For any maximum interaction order 2 ℓ d, and for any set function v : 2d R, and for any j [d], let the reduced set functions v[d]\j, v[d]\j j : 2d 1 R be deﬁned as:

for all T [d]\j, v[d]\j(T) = v(T), and v[d]\j j (T) = v(T j) v(j).

Then the interaction index satisﬁes: ES(v, ℓ) = ES\j(v[d]\j j , ℓ) ES\j(v[d]\j, ℓ), S Sℓwith |S| 2.

The recursive axiom above is an extension of the recursive axiom of Grabisch and Roubens (1999) to account for arbitrary maximum interaction orders. The axiom can be informally interpreted as how does the presence or absence of feature j inﬂuence the share of feature set S . But more importantly (and the reason it is termed the recursive axiom) is that it speciﬁes how higher-order interaction scores are uniquely determined given lower-order interaction indices. By recursion, the higher-order interaction indices are thus uniquely speciﬁed given just the singleton feature attributions. The reason this helps with uniqueness is that so long as the axioms entail unique singleton attributions, together with this recursive axiom, they would entail unique interaction attributions. Thus, we argue that the recursive axiom is less natural compared to previously introduced axioms since the recursive axiom only ensures the uniqueness property, at the potential expense of other axiomatic properties.

Shapley Interaction Index: Grabisch and Roubens (1999) thus show that there is a unique interaction index that satisﬁes the interaction linearity, symmetry, dummy, and the recursive axioms (but not the interaction eﬃciency axiom), and whose restrictions to singleton sets correspond to Shapley values. They term this interaction index Shapley interaction index. This Shapley interaction index has the following closed form:

EShap S (v, ℓ) = X

|T|!(d |S| |T|)!

(d |S| + 1)! S(v(T)), S Sℓ. (5)

A critical caveat of the resulting Shapley interaction value is that it no longer satisﬁes the interaction eﬃciency axiom when the maximum interaction order ℓ> 1. Indeed, simply summing the contributions to singleton sets (i.e. the classical individual attribution Shapley values) is already equal to v([d]) v(Ø), so the only way for the interaction eﬃciency axiom to be satisﬁed if all the other interaction attributions sum to zero, which they do not.

Tsai, Yeh and Ravikumar

Banzhaf Interaction Index: Grabisch and Roubens (1999) further show that there is a unique interaction index that satisﬁes the interaction linearity, symmetry, dummy, and recursive axioms (but not the interaction eﬃciency axiom), and whose restrictions to singleton sets correspond to the Banzhaf values. They term this interaction index Banzhaf interaction index, which has the following closed form:

EBzf S (v, ℓ) = X

1 2d |S| S(v(T)), S Sℓ. (6)

It can be again shown that the Banzhaf interaction index does not satisfy the interaction eﬃciency axiom even when ℓ= 1; though they do satisfy the generalized 2-eﬃciency axiom, which can be stated as follows.

Axiom 8 (Generalized Interaction 2-Eﬃciency): Deﬁne the reduced function v[ij] : 2d 1 R given any i, j [d] as v[ij](S) = v(S) for all sets S containing both i and j, and v[ij](S [ij]) = v(S {i, j}) for all S containing neither i nor j. That is, the reduced function considers features i and j together as a group [ij]. Then the interaction index satisﬁes: ES [ij](v[ij], ℓ) = ES i(v, ℓ) + ES j(v, ℓ) for all S [d]\{i, j}, and ℓ= |S| + 1.

The generalized interaction 2-eﬃciency axiom above is an extension of the generalized 2-eﬃciency axiom of Grabisch and Roubens (1999) to account for arbitrary maximum interaction orders. It states that when features i, j form a group in the set function v[ij] with d 1 features, the importance of S [ij] equals the sum of importances of S i and S j with respect to the original set value function. When S = Ø and ℓ= 1, it reduces to the classical 2-eﬃciency axiom (Harsanyi, 1963) that indicates that the importance of [ij] as a group should be equal to the sum of importances of individual features i and j.

Shapley Taylor Interaction Index: Sundararajan et al. (2020) stipulate an additional interaction distribution (ID) axiom, which can be stated as follows.

Axiom 9 (Interaction distribution (Sundararajan et al., 2020)): Deﬁne v T parameterized by a set T [d] as v T (S) = 0 if T S and v T (S) = 1 otherwise. Then for all ℓ [d], and for all S with S T and |S| < ℓ, the interaction index satisﬁes: ES(v T , ℓ) = 0.

The key idea behind the ID axiom is to uniquely specify an interaction index for unanimity games {v T }T [d], given the interaction linearity, symmetry, dummy, and eﬃciency axioms. Since unanimity games form a basis for the set of all games, in the presence of interaction linearity axiom, we then get unique interaction indices. They thus show that there exists a unique interaction index that satisﬁes interaction linearity, symmetry, dummy, eﬃciency, and interaction distribution axioms and which they term Shapley Taylor index (for reasons which will become clearer in a later section when we discuss algebraic properties of various interaction indices). The Shapley Taylor interaction index has the following closed form:

ETaylor S (v, ℓ) =

( S(v(Ø)) , if |S| < ℓ. P T [d]/S |T|!(d |T| 1)!|S|

d! S(v(T)) , if |S| = ℓ. (7)

A key advantage of this interaction index is that it depends on the maximum interaction order ℓ, in contrast to previously proposed interaction indices such as the Shapley interaction and Banzhaf interaction indices. Indeed, in order for an interaction index to satisfy the

Faith-Shap: The Faithful Shapley Interaction Index

interaction eﬃciency axiom for maximum interaction order ℓ, it has to distribute the contributions among subsets in Sℓ, and hence has to be cognizant of the maximum interaction order ℓ. However, a key caveat of the interaction distribution axiom is that the speciﬁed attribution distribution inordinately favors the topmost interaction. As can be seen from Eqn.(7), the importance of a set S with |S| < ℓis only speciﬁed by the marginal contribution of S in the presence of the empty set, and not the presence of other subsets T [d]\S. This impoverishes lower-order interactions, which in turn hurts the meaningfulness of both lower and highest-order interactions as we will show in Section 5. Thus a key open question that this section has made salient is: how do we more naturally constrain interaction indices beyond interaction linearity, symmetry, dummy, and eﬃciency axioms, so as to obtain a unique interaction index?

4. Faith-Interaction Indices

In this section, in contrast to additional axioms, we draw from another viewpoint of singleton Shapley feature attributions: that they are faithful to the underlying value function.

Faithfulness of Singleton Shapley Values: Given singleton feature attributions {Ei}i [d], we can require that:

i S Ei, S [d].

Note that we can only ask for approximate rather than exact equality for all sets S, since exact equality would entail we solve 2d linear equalities (corresponding to the subsets of [d]) with d variables (corresponding to the d singleton feature attributions {Ei}i [d]), which may not always have a feasible solution. One approach to formalize such approximate equality is via weighted regression:

min E Rd+1 X

where µ : 2[d] 7 R+ { } is some weighting over the subsets S [d] which can be interpreted as the importance of diﬀerent coalitions. Note that the range of µ is the extended positive reals. When µ(S) = for some sets S, we can interpret the above as solving the constrained problem:

min E Rd+1 X

S [d] : µ(S)< µ(S)

!2 s.t. v(S) = X

i S Ei, S : µ(S) = .

It has been shown that we can recover the singleton Shapley values as the solution of the weighted regression problem above by setting µ(S) d 1 ( d |S|) |S| (d |S|) and µ(Ø) = µ([d]) =

(Charnes et al., 1988). And we can recover singleton Banzhaf values by using the uniform distribution µ(S) = 1/2d (Hammer and Holzman, 1992).

Tsai, Yeh and Ravikumar

From Singleton Attributions to Interaction Indices: In this section, we consider the generalization of the above to interaction indices, so that we now require:

T S,|T| ℓ ET (v, ℓ), S [d].

Again here we ask for approximate rather than exact equality since when the order of interactions is less than the number of features, so that ℓ< d, the latter would entail we solve 2d linear equalities with dℓvariables, which may not always have a feasible solution. Accordingly, we consider the following weighted regression problem as a formalization of the above:

E(v, ℓ) = arg min E Rdℓ

T S,|T| ℓ ET (v, ℓ)

where µ : 2d R+ { } is a coalition weighting function. And as before of µ(S) = for some sets S, we can interpret above as solving the constrained problem:

E(v, ℓ) = arg min E Rdℓ

S [d] : µ(S)< µ(S)

T S,|T| ℓ ET (v, ℓ)

s.t. v(S) = X

T S,|T| ℓ ET (v, ℓ), S : µ(S) = . (10)

We note that the range of the weighting function µ is not allowed to include zero since it is a necessary condition to ensure that there exists a unique minimizer (See Proposition 26 in the Appendix). This is not an issue in practice since we can always choose an arbitrary small positive value instead of zero to approximate the intended constraint that µ(S) = 0 for some S [d]. We can also see from Eqn. (9) that when the weighting function is inﬁnite for many subsets, this entails corresponding equality constraints on the interaction index, which may not have a feasible solution. We thus consider the following set of what we term proper weighting functions.

Deﬁnition 10 (Proper weighting function) We say that a weighting function µ : 2d 7 R+ { } is proper if µ(S) is ﬁnite for all S [d] with 1 S d 1.

This then leads to our deﬁnition of Faith-interaction indices.

Deﬁnition 11 (Faith-Interaction Indices): We say that E is a Faith-Interaction index, given any set value function v : 2d R and any maximum interaction order ℓ [d], if there exists a proper weighting function µ : 2d R+ { } such that E(v, ℓ) minimizes the corresponding weighted regression objective in Eqn.(10).

When the coalition weighting function µ is fully ﬁnite so that µ(S) are ﬁnite for all sets S [d], Faith-interaction indices have a simple closed-form expression as detailed in the following proposition.

Faith-Shap: The Faithful Shapley Interaction Index

Proposition 12 Any Faith-Interaction index E(v, ℓ) with respect to a ﬁnite weighting function µ( ) has the form:

S [d] µ(S)p(S)p(S)T

S [d] µ(S)v(S)p(S), (11)

where p : 2[d] {0, 1}dℓis speciﬁed as: p(S)[T] = 1[(T S)] for any T Sℓ.

When the coalition weighting function µ( ) is not fully ﬁnite, we have a linearly constrained least squares problem that does not have a closed form, but whose solution can be characterized via its Lagrangian (see more details in Proposition 29 in the Appendix).

4.1 Axiomatic Characterization of Faith-Interaction Indices

In this section, we investigate the axiomatic properties of our class of Faith-Interaction indices. We ﬁrst show that all faith-interaction indices satisfy the interaction linearity axiom.

Proposition 13 Faith-Interaction indices E satisfy the interaction linearity axiom.

For Faith-Interaction indices corresponding to ﬁnite coalition, weighting functions µ( ), this result easily follows from Proposition 12 that these are linear functionals of the set value function v( ). For Faith-Interaction indices where the weighting function is no longer ﬁnite for some sets S {Ø, [d]}, they solve a linearly constrained least squares problem which does not have a closed-form solution. But by a more nuanced analysis of its Lagrangian, we can again show that the interaction indices are linear functionals of the set value function v( ). We next show that Faith-Interaction indices also satisfy the interaction symmetry axiom provided that the weighting functions are permutation invariant ( symmetric ), and hence the weighting functions only depend on the size of the set.

Proposition 14 Faith-Interaction indices E satisfy the interaction symmetry axiom if and only if the weighting functions are permutation invariant, and hence only depend on the size of the set so that µ(S) is only a function of |S|.

We next consider the dummy axiom.

Proposition 15 Faith-Interaction indices E satisfy the interaction dummy axiom if the features behave independently of each other when forming coalitions in the weighting function so that the coalition weighting functions can be expressed as µ(S) Q i S pi Q j S(1 pj) for all S [d], where 0 < pi < 1 is the probability of the feature i to be present.

Proposition 15 implies that a dummy feature has no impact on other features when the weighting function treats features independently. So far, we have analyzed when Faith-Interaction indices satisfy the interaction linearity, symmetry, and dummy axioms. When they satisfy all three simultaneously, and the coalition weighting function is ﬁnite, then we can show that the latter has a speciﬁc algebraic form.

Tsai, Yeh and Ravikumar

Theorem 16 Faith-Interaction indices E with a ﬁnite weighting function satisfy the interaction linearity, symmetry, and dummy axioms if and only if the weighting function µ( ) has the following form:

d |S| i |S|

( 1)i |S|g(a, b, i), where g(a, b, i) =

( 1 if i = 0 Qj=i 1 j=0 a(a b)+j(b a2)

a b+j(b a2) if 1 i d,

for some a, b R+ with a > b such that µ(S) > 0 for all S [d].

Theorem 16 shows the surprising fact that Faith-Interaction indices satisfying the interaction linearity, symmetry, and dummy axioms with ﬁnite weighting functions have only two degrees of freedom: a, b R. Given these, we can fully specify the weighting function, and hence the corresponding Faith-Interaction indices. In Appendix D, we additionally show that the condition 1 > a > b a2 > 0 ensures that µ( ) is positive everywhere and also provides generalized guidance on setting the values a, b.

Faith-Banzhaf Interaction Index: As a ﬁrst application of this theorem, suppose in addition to the three axioms above, we additionally require the Faith-Interaction indices to satisfy generalized 2-eﬃciency. The following theorem shows that there is a unique Faith Interaction index satisfying these four axioms, which we term the Faith-Banzhaf index.

Theorem 17 (Faith-Banzhaf) For any d 3, there is a unique Faith-Interaction index that satisﬁes the interaction linearity, symmetry, dummy, and generalized 2-eﬃciency axioms, with its coalition weighting function given as µ(S) 1 2d for all S [d]. We term this unique interaction index as Faithful Banzhaf Interaction index (Faith-Banzhaf), which has the form:

EF-Bzf S (v, ℓ) = a(v, S) + ( 1)ℓ |S| X

|T| |S| |T| |S| 1 ℓ |S|

a(v, T), S Sℓ,

(13) where a(v, ) is the Möbius transform of v( ). Moreover, its highest-order interaction terms coincide with corresponding interaction terms from the Banzhaf interaction index introduced earlier:

EF-Bzf S (v, ℓ) = X

1 2d |S| S(v(T)) for all S Sℓwith |S| = ℓ. (14)

Our derivation of Faith-Banzhaf indices follows the pseudo-Boolean function approximation results from Grabisch et al. (2000).

Faith-Shapley Interaction Index: When moving from generalized 2-eﬃciency to the more natural interaction eﬃciency axiom, we have the following proposition.

Proposition 18 Faith-Interaction indices satisfy the interaction eﬃciency axiom if and only if the weighting functions satisfy µ(Ø) = µ([d]) = .

Faith-Shap: The Faithful Shapley Interaction Index

That the condition in the proposition is suﬃcient is a straight-forward consequence of the fact that µ(Ø) = µ([d]) = entails that the corresponding linear constraint be exactly satisﬁed, so that: P S SℓES(v, ℓ) = v([d]) and EØ(v, ℓ) = v(Ø), which is precisely the interaction eﬃciency axiom. We now have the machinery to present our main result on the unique Faith-Interaction index that satisﬁes the four (interaction counterparts of the) standard axioms that the singleton Shapley value satisﬁes.

Theorem 19 (Faith-Shap) There is a unique Faith-Interaction index that satisﬁes the interaction linearity, symmetry, dummy, and eﬃciency axioms, with its coalition weighting function given as:

µ(S) d 1 d |S| |S| (d |S|) for all S [d] with 1 |S| d 1, and µ(Ø) = µ([d]) = .

(15) We term this unique interaction index as the Faithful Shapley Interaction index (Faith Shap), which has the form:

EF-Shap S (v, ℓ) = a(v, S) + ( 1)ℓ |S| |S| ℓ+ |S|

|T|+ℓ 1 ℓ+|S| a(v, T), S Sℓ, (16)

where a(v, ) is the Möbius transform of v( ). Moreover, its highest-order interaction terms can be expressed as a weighted average of discrete derivatives:

EF-Shap S (v, ℓ) = (2ℓ 1)!

((ℓ 1)!)2 X

(ℓ+ |T| 1)!(d |T| 1)!

(d + ℓ 1)! S(v(T)) for all S Sℓwith |S| = ℓ.

When the maximum interaction order ℓ= 1, so that we only require singleton feature contributions, the explanation coincides with the classical singleton Shapley values. Thus for larger orders with ℓ> 1, Faith-Shap can be seen to be a natural generalization of the ﬁrst-order Shapley value. Note that the set of axioms it satisﬁes are (interaction extensions of) the classical linearity, symmetry, dummy, and eﬃciency axioms. As noted before in an interaction context these axioms alone do not uniquely specify an interaction index. In contrast to the less intuitive axioms such as recursive and interaction distribution axioms, we merely require an interaction extension of the faithfulness property of singleton Shapley values: that the interaction Shapley values approximate the given set value function for all possible subsets.

5. Contrasting Faith-Interaction with other Interaction Indices

In this section, we compare our Faith-Interaction indices, speciﬁcally Faith-Shap, with the other interaction indices introduced earlier.

Comparison with Shapley Interaction and Banzhaf Interaction Indices: As noted earlier, the Shapley interaction and Banzhaf interaction indices do not satisfy the interaction eﬃciency axiom, which states that the sum of interaction weights should equal the diﬀerence

Tsai, Yeh and Ravikumar

between the value function evaluated over the complete and empty sets. A critical advantage of the interaction eﬃciency axiom is that it forces the interaction index to distribute a ﬁxed contribution (diﬀerence between the value function evaluated over the complete and empty sets) among the diﬀerent interactions; without such a distributive requirement, the resulting weights can become quite non-intuitive. For instances of such non-intuitive behaviors, we refer to Sundararajan et al. (2020), who provided many simple examples where the sum of Shapley interaction values over all subsets diverges as the number of features increases, even when the value function is bounded and v([d]) = 1. Another caveat with these two interaction indices is that they are not cognizant of the maximum interaction order, and hence we cannot compute Shapley values that diﬀer with varying maximum interaction orders.

Comparison with Shapley Taylor index: The Shapley Taylor index does satisfy the four axioms of interaction linearity, symmetry, dummy, and eﬃciency. However, as noted earlier these four axioms do not uniquely determine interaction indices. The ﬁfth axiom Shapley Taylor index then imposes for uniqueness is the interaction distribution axiom, which has caveats of imbalanced distributions of values to coalitions of diﬀerent orders, namely, inordinately favoring the maximum interaction order. In particular, the interaction distribution axiom states that higher than max-order interaction values (order > ℓ) be distributed to the max-order interactions (order = ℓ), but these max-order terms end up unable to solely explain all higher-order interactions. On the other hand, it entails lower than max order interactions (order < ℓ) that do not take into account sub-coalitions other than the empty set, which can be contrasted for instance with singleton Shapley value that explicitly takes into account even higher order coalitions that contain the single feature. Thus the interaction distribution has the consequence of making both lower and max-order interactions less faithful to the model. In contrast, in our Faith-Interaction indices, even lower-order interaction weights take into account all possible coalitions, and where the weights are balanced so that the overall set of interaction indices optimally approximates the behavior of the underlying value function.

5.1 Examples

Example 1: We illustrate the diﬀerence between these interaction indices using a function with diminishing marginal utility. Consider the following value function with 11 features:

( 0 , if |S| 1. |S| p |S| 2 , otherwise. (18)

This function represents the payoﬀwhen any subset of 11 people work on a task. Each person contributes 1 unit to the overall payoﬀ, and the task requires at least 2 people. However, the marginal utility is diminishing in nature, since any two people also have a probability of p of being non-cooperative. Given this payoﬀfunction, it is worth reﬂecting on what the attributions to individuals should be. While it might seem that zero is a good value since at least two people are needed for the task, this attribution would only correspond to the marginal contribution of an individual player i.e. how much a player would contribute when they are by themselves. Whereas we would like our attributions to also take into account

Faith-Shap: The Faithful Shapley Interaction Index

larger coalitions, and marginal contributions to such larger coalitions: this is one of the motivations for considering coalitional game-theoretic indices. Once we do so, then it can be seen that an individual eﬀect of one is much more reasonable. Similarly, we would expect that the pairwise interaction eﬀects be close to p. In Table 2, we list the values for diﬀerent interaction indices for p = 0.1, 0.2. When the maximum interaction order ℓ= 1, all indices are similar since their restrictions to singleton are the Shapley/Banzhaf values. When the maximum interaction order ℓ= 2, our Faith-Shap accurately captures individual contribution and pairwise interaction eﬀects by assigning 0.95/0.95 and 0.091/ 0.191 for order 1 and 2 and for p = 0.1/0.2 respectively, which are very close to the intuitions we outlined earlier. However, the Shapley Taylor index assigns the individual eﬀect of i by using the marginal v({i}) v(Ø), which can be highly inaccurate since such a marginal contribution does not take into account marginal contributions to larger coalitions. For p = 0.1, Shapley Taylor along with Interaction Shapley assigns a positive/zero value to the interaction eﬀect, which suggests that forming groups has complimentary/no eﬀects. On the contrary, Banzhaf interaction and Faith-Banzhaf give negative values for interaction between players, which correctly reﬂects the decrease in the marginal utility of this game. For p = 0.2, the Shapley Taylor index is uniformly zero for any order. This highlights the other drawback of the Shapley Taylor index: the impoverished lower-order interaction indices make the max-order indices less faithful to the model. Speciﬁcally, for p = 0.2, and with d = 11 players, we can see that v([d]) = v( ) = 0. We have already seen that ETaylor {i} (v, ℓ) = 0, for i [d]. For ℓ= 2, we then have that the summation of the max-

order (i.e. order two) indices equals v([d]) v( ) Pd i=1 ETaylor {i} (v, ℓ) = 0 by the eﬃciency axiom. Since all max-order indices have the same value by the symmetry axiom, the maxorder indices are uniformly zero. In this case, the Shapley Taylor indices do not take into account the function values v(S) with ℓ |S| < d, and can be arbitrarily unfaithful to these orders. Here, the Banzhaf interaction and Faith-Banzhaf again correctly reﬂect the negative interaction between players. However, the Banzhaf interaction value gives a value close to 0 for the ﬁrst-order indices. Taken together with its negative interaction eﬀects, it might seem that coalitions can only be hurtful to the payoﬀ, which is misleading since the total utility is positive when 2 to 10 players are present. On the other hand, our Faith-Banzhaf gives a positive value close to 1 for individual eﬀects of order 1. Taken together with its negative interaction eﬀects, the value given by the Faith-Banzhaf seems more intuitive: every single player contributes to the utility, while each pair of players hurts the utility. Another instructive viewpoint for interaction values is by inspecting their utility for approximating the overall payoﬀfunction. In Figure 1 and 2, we approximate the function v(S) using P T S,|T| 2 ET (v, ℓ) for diﬀerent interaction indices. We can see that our Faith Shap/Faith-Banzhaf are (almost) faithful to all orders except for |S| = 1. However, the Shapley Taylor index is only fully faithful to the model when the order is 0, 1, 11, and curves for other interaction indices are unfaithful.

Example 2: We provide another example, this time with increasing marginal utility. Consider a family who is in the wind energy business, with d = 11 family members. Currently, the family owns 1 wind turbine, and they can get 3 units of revenue per wind turbine they own. Now, each family member is considering whether to manage a wind turbine. To build

Tsai, Yeh and Ravikumar

p = 0.1 p = 0.2 ℓ= 1 ℓ= 2 ℓ= 1 ℓ= 2 Order 1 Order 1 Order 2 Order 1 Order 1 Order 2 Faith-Shap 0.5 0.95 -0.091 0 0.95 -0.191 Shapley Taylor 0.5 0 0.1 0 0 0 Interaction Shapley 0.5 0.5 0 0 0 -0.1 Banzhaf Interaction 0.51 0.51 -0.113 0.009 0.009 -0.213 Faith-Banzhaf 0.51 1.08 -0.113 0.009 1.08 -0.213

Table 2: Values for diﬀerent interaction indices of diﬀerent orders for p = 0.1, 0.2 with diﬀerent maximum interaction orders. Note that the value function is symmetric with respect to players, so we use order 1 and 2 to denote importance scores of any single player and interaction of any two players. Note that EF-Shap Ø (v, ℓ) = 0 and EF-Bzf Ø (v, ℓ) = 0.24 for both p = 0.1 and p = 0.2.

Figure 1: Function approximation of Eqn.(18) using diﬀerent interaction indices for p = 0.1 with the maximum interaction order ℓ= 2.

Figure 2: Function approximation of Eqn.(18) using diﬀerent interaction indices for p = 0.2 with the maximum interaction order ℓ= 2.

x wind turbines, the cost is described by the function cost(x) = x + 2 log(x + 1), as they may get a discount from the constructor to build more wind turbines at the same time. If exactly one member chooses to manage a wind turbine, the building cost will be 0 since the family already owns one wind turbine. The total revenue for the family when S is the set of members that participate in building new wind turbines can be described by the following function:

0 , if |S| = 0. 3 , if |S| 1. 3|S| (|S| 2 log(|S| + 1)) , if 2 |S| 11.

Faith-Shap: The Faithful Shapley Interaction Index

This function has an increasing marginal utility since the marginal cost is decreasing. Therefore, we would expect the interaction eﬀect to be positive. However, from Table 3, only Faith-Shap, Faith-Banzhaf and Banzhaf interaction indices capture this eﬀect. Moreover, the Faith-Shap and Faith-Banzhaf indices have the following intuitive interpretation: Having one more member joining the family business increases the total revenue by 1.20/1.19 unit, with 0.07/0.09 additional unit of revenue when two members join together since they are cooperative. In contrast, we can not interpret the Banzhaf interaction index for orders 1 and 2 jointly since it is not cognizant of the maximum interaction order ℓ.

Indices ℓ= 1 ℓ= 2 Order 1 Order 1 Order 2 Faith-Shap 1.55 1.20 0.07 Shapley Taylor 1.55 3 -0.29 Shapley Interaction 1.55 1.55 -0.12 Faith-Banzhaf 1.65 1.19 0.09 Banzhaf Interaction 1.65 1.65 0.09

Table 3: Values for diﬀerent interaction indices of diﬀerent orders with the maximum interaction order ℓ= 2. Note that the value function is symmetric with respect to players, so we use order 1 and 2 to denote importance scores of any single player and interaction of any two players. Note that EF-Shap Ø (v, ℓ) = 0 and EF-Bzf Ø (v, ℓ) = 0.48 for the indices corresponding to empty sets.

Figure 3: Function approximation of Eqn.(19) using diﬀerent interaction indices with the maximum interaction order ℓ= 2.

6. Computation of Faithful Shapley Interaction Index

The exact computation of Faith-Shap indices via Eqn.(16) is intractable in general since it involves computations of O(2d) Möbius transforms of diﬀerent subsets. However, when the value function is lower-order, the Faith-Shap interaction indices can be computed in polynomial time.

Deﬁnition 20 (Order of value functions) A value function v has order ℓv N if ℓv is the smallest integer such that a(v, R) = 0 for all R [d] with |R| > ℓv.

From Eqn.(4), this entails the value function with order ℓv can be written as v = P R [d],|R| ℓv a(v, R)v R. When a game only involves the cooperation of a small number of players, its value function can usually be written as a summation of lower-order basis functions. For instance, a basis function v R has order 1, example value functions in Section 5.1 have order 2, the value function (Garg et al., 2012) deﬁned as summations of pairwise distances within a coalition used in clustering also have order 2. When ℓv = O(ℓ) > ℓ, by using Eqn.(16), the exact computation of Faith-Shap indices of a subset S only requires time complexity O(d O(ℓ)) since we only need to consider d ℓv |S| = O(d O(ℓ)) Möbius transforms of subsets.

Tsai, Yeh and Ravikumar

For the computation of the Faith-Shap for general functions, calculating the exact Faith Shap values requires 2d model evaluations. We sample each coalition S [d] with probability d 1 ( d |S|)|S|(d |S|), and solve

arg min E Rdℓ

T S,|T| ℓ ET (v, ℓ)

T [d],|T| ℓ ET (v, ℓ) = v([d]), and EØ(v, ℓ) = v(Ø).

We empirically show the sampling approach provides more accurate estimates with fewer model evaluations in Section 8.1. We defer deriving approximation results for sampling as well as other approximation methods for computing Faith-Shap to future work. There is a rich body of work on developing such approximation results for the ﬁrst-order Shapley values (see Mitchell et al. (2022); Covert and Lee (2020), which could be extended to the Faith-Shap setting.

7. Algebraic Properties of Faith-Interaction Indices

In the following two sub-sections, we discuss how Faith-Shap can be represented as a cardinal index, as well as through the lens of a multilinear approximation.

7.1 Cardinal Indices

Grabisch and Roubens (1999) show that any interaction index (they only consider the classical case with maximum interaction order ℓ= d) that satisﬁes the linearity, dummy, and symmetry axioms necessarily has the following form:

ES(v, d) = X

T [d]\S p|S| |T| Sv(T), S [d], (21)

and for some family of constants {ps t}s [0:d],t [0:d s]. They term this class of interaction indices as cardinal interaction indices. Of course this is a large class, and it is not apriori clear how to further constrain the indices so as to get speciﬁc values for the constants {ps t}. We remark in passing that Shapley and Banzhaf interaction indices impose additional structure on the constants {ps t}. We can also consider the class of probabilistic interaction indices:

ES(v, d) = X

T [d]\S p S T Sv(T),

where for any S [d], the constants {p S T }T [d]\S form a probability distribution on [d]\S. We can then deﬁne cardinal-probabilistic indices as those indices that are both cardinal and probabilistic interaction indices, so that p S T = p|S| |T|, for some family of constants {ps t}s [1:d],t [0:d s] that satisfy: d s X

Faith-Shap: The Faithful Shapley Interaction Index

Fujimoto et al. (2006) shows that indices that satisfy certain additivity, monotonicity, symmetry, and dummy partnership axioms are necessarily cardinal probabilistic indices. As Fujimoto et al. (2006) shows, Shapley and Banzhaf interaction indices do fall into this class. One could of course extend these notions of cardinal, probabilistic, and cardinal-probabilistic indices to be cognizant of the maximum interaction order ℓ [d]. It is an interesting open question to investigate extensions of results of Fujimoto et al. (2006) to such a sub-class of cardinal-probabilistic indices cognizant of the max-interaction order. In this section, we provide a modest initial result along these lines, focusing on the top interaction level of the interaction index.

Proposition 21 For any maximum interaction order 1 ℓ d, and for any set value function v : 2d 7 R, the top level of the Faithful Shapley Interaction index can be expressed as a cardinal-probabilistic index:

EF-Shap S (v, ℓ) = X

T [d]\S pℓ |T| S(v(T)), S [d] with |S| = ℓ, (22)

where pℓ t = (2ℓ 1)!(ℓ+t 1)!(d t 1)!

((ℓ 1)!)2(d+ℓ 1)! . Moreover, it satisﬁes Pd ℓ t=0 d ℓ t pℓ t = 1.

Therefore, the top level of the Faithful Shapley Interaction index captures the interactions of features in S in the presence of all subsets T [d]\S.

7.2 Multilinear Formulation

Any set value function v : 2[d] 7 R has a unique multi-linear extension g : [0, 1]d 7 R, also referred to the Owen multilinear extension (Owen, 1972), given as:

T [d] v(T) Y

i T (1 xi), x [0, 1]d.

For any set S [d], with S = {i1, . . . , is}, denote its S-derivative as Sg(x) := sg(x) xi1... xis .

7.2.1 Path Integrals

Grabisch et al. (2000) show that Shapley interaction index can be written as:

EShap S (v, d) = Z 1

x=0 Sg(x, . . . , x)dx, S [d].

That is, we can obtain the Shapley interaction index by integrating the S-derivative along the diagonal of the unit hypercube. On the other hand, the Banzhaf interaction index can be written as:

EBzf S (v, d) = Z

x [0,1]d Sg(x)dx, S Sd.

That is, we can obtain the Banzhaf interaction index by integrating the S-derivative over the entire unit hypercube. In this case, it also has the closed form: Sg(1/2, . . . , 1/2).

Tsai, Yeh and Ravikumar

Fujimoto et al. (2006) show that any cardinal probabilistic index E has the form:

ES(v, d) = Z 1

x=0 Sg(x, . . . , x)d F|S|(x), S Sd,

for some family of CDFs {Fs}s [d]. That is, we can obtain any cardinal probabilistic index by integrating the S-derivative along the diagonal of the unit hypercube with respect to some distribution over [0, 1]. It is an interesting open question whether we could extend these results from Grabisch et al. (2000) and Fujimoto et al. (2006) to interaction indices that are cognizant of the maximum interaction order ℓ [d]. In this section, we provide a modest initial result along these lines, focusing on the top interaction level of the interaction index.

Proposition 22 For any maximum interaction order 1 ℓ d, and for any set function v : 2d 7 R, the top level of the Faithful Shapley Interaction index value can be expressed as:

EF-Shap S (v, ℓ) = Z 1

x=0 Sg(x, , x)d Ix(ℓ, ℓ), S Sℓwith |S| = ℓ, (23)

where Ix(ℓ, ℓ) is cumulative distribution function of the beta distribution B(x; ℓ, ℓ).

7.2.2 Taylor Expansion

In contrast to path integrals, Sundararajan et al. (2020) use the Taylor expansion of g(1) = v([d]) around g(0) = v(Ø) Taylor derivations to derive their interaction index. Speciﬁcally, they show that Shapley Taylor index ETaylor S (v, ℓ) is equal to the |S|th term of the (ℓ 1)th

order Taylor expansion of g( ) with Lagrange remainder:

j! g(0) + Z 1

(ℓ 1)! g(ℓ)(x, , x)dx

|S|=j Sg(0) + X

x=0 ℓ(1 x)ℓ 1 Sg(x, , x)dx

(Sundararajan et al., 2020, Theorem 3)

|S|=j ETaylor S (v, ℓ) + X

|S|=ℓ ETaylor S (v, ℓ),

where g(j)(x) is the jth derivative of the function g(x, , x), ETaylor S (v, ℓ) = Sg(0) for |S| < ℓand ETaylor S (v, ℓ) = R 1 x=0 ℓ(1 x)ℓ 1 Sg(x, , x)dx with |S| = ℓ. This can be seen to result in impoverished lower-order subset interactions, which now no longer take into account higher-order coalitions that include that subset.

7.2.3 Pseudo-Boolean Function Approximation

While we have so far discussed the continuous multi-linear extension of a set value function v : 2[d] 7 R, we can also simply consider its equivalent pseudo-Boolean counterpart g F

Faith-Shap: The Faithful Shapley Interaction Index

with F = {g : {0, 1}d 7 R}:

T [d] v(T) Y

i T (1 xi), x {0, 1}d.

One can also derive the pseudo-Boolean function g E corresponding to interaction indices E, and ask for interaction indices with pseudo-Boolean counterparts g E that best approximate the pseudo-Boolean counterpart g of the set value function. Speciﬁcally, given a maximum interaction order ℓ [d] and an interaction index E Rdℓ, its pseudo-Boolean counterpart g E F is deﬁned as:

g E(x) := X

T [d],|T| ℓ ET (v, ℓ) Y

i T xi, x {0, 1}d.

Hammer and Holzman (1992) and Grabisch et al. (2000) consider solving for the best ℓ2-norm approximation by the function g E( ) with degree up to ℓ. That is, g g E 2 = q P x {0,1}d(g(x) g E(x))2. Using this perspective, we can see that Faith-Banzhaf interaction indices can in turn be related to such a function approximation:

EF-Bzf(v, ℓ) = min E Rdℓ g(x) g E(x) 2 = min E Rdℓ

T S,|T| ℓ ET

and where the solution has the closed-form expression we detail in Theorem 16. For the singleton attribution case, with max order ℓ= 1, Ding et al. (2008) and Ruiz et al. (1998) consider µ-norm function approximations g(x) g E(x) µ = q P x {0,1}d µ(x)(g(x) g E(x))2,

but where µ only depends on x 1, and where µ(0) and µ(1) can both be inﬁnity. Ding et al. (2008) provide a closed-form expression for g E(x), while Ruiz et al. (1998) analyze its axiomatic properties. For the speciﬁc case where the probability of coalition S can be expressed as µ(x) = Q i:xi=1 pi Q j:xj=0(1 pj) for some 0 < pi < 1 indicating the probability of the feature i being present, Ding et al. (2010) and Marichal and Mathonet (2011) considers solving the best ℓth order polynomial approximation under µ norm. In contrast to the above work, our developments could be cast as pseudo-Boolean approximations for the general weighted norm case µ, for general weighting functions µ( ) without stringent structural assumptions, and while allowing for arbitrary maximum interaction orders ℓ [d].

8. Experiments

We ﬁrst provide some experiments validating the relative computational eﬃciency of computing our Faith-Interaction indices, followed by quantitative and qualitative demonstrations of their use as explanations of ML models over a language dataset. The language dataset we use throughout the experiment is the simpliﬁed IMDB (Maas et al., 2011) dataset, where the model only uses the ﬁrst two sentences of movie reviews as input, and predicts the probability of the reviews being positive. The model being explained is a BERT language model (Devlin et al., 2018) with 0.82 accuracy on the test set.

Tsai, Yeh and Ravikumar

8.1 Computational Eﬃciency

Exact computation of interaction indices that aggregate over all possible feature subsets exactly typically requires 2d model evaluations (with d features) which is impractical in most machine learning applications. A key advantage of our Faith-Interaction indices, as compared to other recently proposed interaction indices such as the Shapley Taylor index and Shapley Interaction index, is that they can be computed by solving a weighted least squares problem. As we empirically show in this section, this enables us to provide more accurate estimates with fewer model evaluations, compared to the other recent approaches that employ permutation-based sampling methods.

Methods Simpliﬁed IMDB Bank marketing

Shapley Taylor 2781.2 7368.3

Shapley Interaction 3960.4 10421.7

Faith-Shap 887.4 893.7

Table 4: Run-time comparison of diﬀerent Shapley values. Each value represents the number of evaluations required to achieve averaged squared distance less than 10 3.

Setup: To demonstrate the computational eﬃciency of Faith-Interaction indices, we compare our proposed Faith Shap with Shapley interaction and Shapley Taylor interaction indices using diﬀerent estimation methods. For the Faith Shap interaction index, We use Eqn.(20) and solve the corresponding linear regression problem with ℓ1 regularization, and regularization parameter α = 10 3 and α = 10 6 for the simpliﬁed IMDB dataset and the bank dataset. For the Shapley Taylor interaction and Shapley Interaction indices, we use the permutation-based sampling methods ( see exact algorithms in the Appendix B). We compare these indices in two datasets: (1) language dataset: we randomly choose 50 samples with d = 15 words from the test set of simpliﬁed IMDB and set ℓ= 2. We treat each word in each text sentence as an input feature and set the baseline to empty text so that we simply remove a word if it does not appear in a coalition. We use BERT prediction scores as outputs of the value function. (2) Portuguese marketing dataset (Moro et al., 2014): this is a tabular dataset with d = 17 features. We train an xgboost model (Chen and Guestrin, 2016) with 90.5% accuracy on the test set. We also compare them on synthetic sparse functions in the Appendix. To measure how close an interaction index is to its ground-truth value, we use two evaluation metrics: (1) averaged squared distance of all top indices, E Eest 2 2 / d ℓ , and (2) precision at 10, which we measure the proportion of top-10 feature interactions (with respect to absolute value) in the top-10 ground-truth interactions as top interactions are more critical when these indices are used in XAI. (3) We also report run-time, as measured by the number of model evaluations required to achieve averaged squared distance to be smaller than 10 3. We also note that we drop the lower-order indices and only compare top-order indices (order=ℓ) since computing lower-order Shapley Taylor indices are trivial. We sampled all 2d diﬀerent coalitions to compute the ground truth of each index. Each evaluation metric is reported by averaging 50 diﬀerent inputs with 20 diﬀerent random seeds.

Faith-Shap: The Faithful Shapley Interaction Index

Results: From Figure 4 and Table 4, we see that Faith-Shap can be estimated more accurately and uses fewer model evaluations: in both language data and sparse settings, as well as in terms of all evaluation metrics.

(a) Averaged squared distance (language).

(b) Precision@10 (language).

(c) Average squared distance (bank).

(d) Precision@10 (bank).

Figure 4: Comparison of Faith-Shap, Shapley Taylor and Shapley interaction indices in terms of computational eﬃciency in language data and synthetic sparse functions. The shaded area indicates the total area between one standard deviation above and below.

8.2 Explanations on a Language Dataset

In this section, we use our Faith-Shap interaction index to explain the BERT model on the simpliﬁed IMDB dataset. The experimental setting is the same as described in Section 8.1 except that we set the maximum interaction order ℓ= 2, the regularization parameter α = 10 3 for Lasso, and sample 4000 coalitions for each text in the simpliﬁed IMDB. Table 5 shows some of the interesting interactions we found.

Index Sentences (bold words are the interactions with the highest (absolute) importance values)

Model Prediction

Interaction score

1 I have Never forgot this movie. All these years and it has remained in my life. Positive 0.818

2 TWINS EFFECT is a poor ﬁlm in so many respects. The only good element is that it doesn t take itself seriously.. Negative -0.375

3 I rented this movie to get an easy, entertained view of the history of Texas. I got a headache instead. Negative 0.396

4 Truly appalling waste of space. Me and my friend tried to watch this ﬁlm to its conclusion but had to switch it oﬀabout 30 minutes from the end. Negative 0.357

5 I still remember watching Satya for the ﬁrst time. I was completely blown away. Positive 0.283

Table 5: Top interactions of diﬀerent examples on IMDB. See more results in Appendix B.

In the ﬁrst two examples, we see non-complementary interaction eﬀects. In the ﬁrst example, while the importance values of the individual words Never and forgot are negative (as shown in Tables 8, 10 in the Appendix), their joint eﬀect as shown in the table here is

Tsai, Yeh and Ravikumar

extremely positive. Similarly, for the second, the words only and good are individually positive, while their joint eﬀect is strongly negative. The fourth and ﬁfth examples show more subtle non-complementarity eﬀects. In the fourth example, while the individual words headache and instead have negative importance scores, their joint eﬀect is positive, since the total eﬀect of the phrase is less than the sum of the individual importance of these two words. The last example shows the eﬀect of complementarity: words in a phrase are only meaningful when all words are present, and hence have a positive interaction eﬀect.

In Appendix B, we further show the top-15 important interactions and compare them to those from Shapley Taylor index, Shapley interaction index, Integrated Hessian (Janizek et al., 2021) and Archipelago (Michael et al., 2020).

We ﬁnd that although Faith-Shap, Shapley Taylor index, and Shapley interaction index capture similar feature interactions, the later two methods are not able to ﬁnd meaningful singleton features. The reasons are (1) the ﬁrst-order terms of the Shapley Taylor indices are trivial, which is the diﬀerence between predicted probabilities of a sentence containing only one word and an empty sentence (a baseline) (2) importance scores for the ﬁrst order Shapley value and Shapley interaction index are not comparable since the Shapley interaction index does not satisfy the eﬃciency axiom. For integrated Hessian, we empirically ﬁnd that the BERT model assigns higher values for self-interactions and punctuation marks.

9. Related work

Related work in cooperative theory: in cooperative game theory, a set function v( ) with v( ) = 0 corresponds to a transferable utility game (TU-game), and a set function with order ℓis called an ℓ-additive TU-game (Grabisch et al., 2016). Therefore, our approach can be viewed as a least squares approximation of a TU-game by an ℓ-additive TU-game; see for instance Eqn. (10). Variants and special cases of this least squares approximation problem have been studied in the cooperative game theory ﬁeld. For ℓ= 1, Charnes et al. (1988) ﬁrst give general solutions when the weighting function is symmetric and positive, and show that the Shapley value results from a particular choice of the weighting function. Ruiz et al. (1996, 1998) consider the same setting, and study the axiomatic properties of the solutions of the least squares problems. Ding et al. (2008) further generalizes the previous results by considering the cases where some weights are allowed to be zero. For the case where maximum interaction order ℓ> 1, Hammer and Holzman (1992) and Grabisch et al. (2000) solve the least squares problem when the weighting function is a constant, and show that the top-level coeﬃcients coincide with those of the Banzhaf interaction indices of order ℓ. Ding et al. (2010) and Marichal and Mathonet (2011) consider a certain weighted version of the problem, and propose weighted Banzhaf interaction indices. Grabisch and Rusinowska (2020) consider the approximation problem under the constraints that both TU-games yield the same Shapley value. Marichal and Roubens (1999) extend the Shapley value and propose the chaining interaction index whose deﬁnition is based on maximal chains of ordered sets. For more details on this line of work, see the recent book (Grabisch et al., 2016). From the lens of TU-game approximation, our work could be viewed as allowing for general weighting functions µ( ) without stringent structural assumptions, as well as arbitrary maximum interaction orders ℓ [d].

Faith-Shap: The Faithful Shapley Interaction Index

While the Shapley value focuses on a fair allocation among players, there exist other solution concepts in cooperative game theory that have diﬀerent purposes. For example, core (Gillies, 1953) allocates the total payoﬀin a stable manner, nucleolus (Schmeidler, 1969) is a solution lying in the core with unique axiomatic properties, and the Nash bargaining solution (Nash Jr, 1950) focuses on two-player bargaining problems. Extending these concepts to interaction contexts may lead to diﬀerent solutions with diﬀerent properties, and are interesting topics for future work.

Feature attribution in XAI: When Faith-Shap is used in XAI, it can be seen as a local and post-hoc approach that extracts singleton features and feature interaction importances for a given prediction. It can be viewed as an order ℓpolynomial model, with desired axiomatic properties, that explains how a black-box model behaves locally. Explaining complex models with an interpretable local surrogate model has been substantially studied in XAI. LIME (Ribeiro et al., 2016) use a local linear model to describe a prediction made by the model being explained. Model Agnostic Supervised Local Explanations (MAPLE) (Plumb et al., 2018) utilize local linear modeling and dual interpretation of random forests. Anchor LIME (Ribeiro et al., 2018) uses IF-THEN rules to generate explanations. Model Understanding through Subspace Explanations (MUSE) (Lakkaraju et al., 2019) explains how the model behaves in subspaces characterized by certain features of interest. Kernel SHAP (Lundberg and Lee, 2017) can be viewed as ﬁrst-order Faith-Shap. These approaches assign credits to each individual feature based on how much it inﬂuences the models prediction and do not aim to explain how feature interactions aﬀect the model.

Feature interactions in XAI: Feature interactions have also been investigated in the machine learning community. Tsang et al. (2017) detect feature interactions by examining weight matrices of DNNs. Tsang et al. (2018) disentangle complex feature interactions within DNNs by forcing the weights matrices to be block-diagonal. Singh et al. (2018) build hierarchical explanations within a feed-forward neural network using hierarchical clustering of features. Cui et al. (2019) and Janizek et al. (2021) explain pairwise interactions in neural networks, and Bayesian neural networks respectively via second-order derivatives. Lundberg et al. (2020) quantify feature interactions in tree-based models using the Shapley interaction index. Tsang et al. (2020) proposes Archipelago, which quantiﬁes the interaction within a feature group S via the marginal importance v(S) v( ). While these approaches have taken signiﬁcant steps towards understanding feature interactions, they are limited to a certain kind of model architecture. Tsang et al. (2017) and Tsang et al. (2018) can only be applied to feed-forward neural network architectures, but not LSTMs and CNNs. While Singh et al. (2018) can be applied to LSTMs and CNNs, it is unclear how to apply it to recent innovations such as transformers. The approach of Cui et al. (2019) can only be applied to Bayesian neural networks, and Janizek et al. (2021) can only be applied to models where second-order derivatives exist everywhere. Lundberg et al. (2020) only study tree-based models. While Archipelago (Tsang et al., 2020) is a post-hoc explanation approach that can be applied to any model, Archipelago measures the importance of a feature group as a whole, while Faith-Shap measures the marginal eﬀects of interaction among feature groups. Also, the Archipelago does not obey the dummy axiom and satisﬁes the eﬃciency axiom only for certain kinds of functions.

Tsai, Yeh and Ravikumar

10. Conclusion

Deriving unique interaction indices that satisfy the interaction extensions of the individual Shapley axioms has been a long-standing open problem. Existing approaches introduce additional less natural axioms, with some even sacriﬁcing natural ones such as eﬃciency, in order to specify unique interaction indices. In this work, we take the alternate route of considering the family of what we term faithful interaction indices, which similar to individual Shapley values, aim to approximate the given set value function for all feature subsets. We show that when restricting to the class of faithful interaction indices, we obtain a unique interaction index that satisﬁes the interaction extensions of the individual Shapley axioms, which we term the Faithful Shapley Interaction Index (Faith-Shap). We show the beneﬁts of the faithful Shapley interaction index via speciﬁc games of interest where there is diminishing return and increasing return and connect the Faith-Shap to cardinal probabilistic indices and multilinear approximations. Finally, we show that Faith-Shap is eﬃcient to estimate thanks to its connection to weighted linear regression in sparse settings, and provide some qualitative results for their use as explanations of machine learning models on a real language dataset.

Acknowledgments

The authors would like to thank Michel Grabisch for his generous feedback and thank Hung Hsun Yu for providing assistance in deriving the closed-form solution of Faith-Shap. The authors also acknowledge the support of NSF via IIS-1909816 and IIS-1955532.

John F Banzhaf III. Weighted voting doesn t work: A mathematical analysis. Rutgers L. Rev., 19:317, 1964.

A Charnes, B Golany, M Keane, and J Rousseau. Extremal principle solutions of games in characteristic function form: core, chebychev and shapley value generalizations. In Econometrics of planning and eﬃciency, pages 123 133. Springer, 1988.

Hugh Chen, Joseph D Janizek, Scott Lundberg, and Su-In Lee. True to the model or true to the data? ar Xiv preprint ar Xiv:2006.16234, 2020.

Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pages 785 794, 2016.

Ian Covert and Su-In Lee. Improving kernelshap: Practical shapley value estimation via linear regression. ar Xiv preprint ar Xiv:2012.01536, 2020.

Tianyu Cui, Pekka Marttinen, and Samuel Kaski. Learning global pairwise interactions with bayesian neural networks. ar Xiv preprint ar Xiv:1901.08361, 2019.

Faith-Shap: The Faithful Shapley Interaction Index

Anupam Datta, Shayak Sen, and Yair Zick . Algorithmic transparency via quantitative input inﬂuence: Theory and experiments with learning systems. In Security and Privacy (SP), 2016 IEEE Symposium on, pages 598 617. IEEE, 2016.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pretraining of deep bidirectional transformers for language understanding. ar Xiv preprint ar Xiv:1810.04805, 2018.

Guoli Ding, Robert F Lax, Jianhua Chen, and Peter P Chen. Formulas for approximating pseudo-boolean random variables. Discrete Applied Mathematics, 156(10):1581 1597, 2008.

Guoli Ding, Robert F Lax, Jianhua Chen, Peter P Chen, and Brian D Marx. Transforms of pseudo-boolean random variables. Discrete Applied Mathematics, 158(1):13 24, 2010.

Christopher Frye, Damien de Mijolla, Laurence Cowton, Megan Stanley, and Ilya Feige. Shapley-based explainability on the data manifold. ar Xiv preprint ar Xiv:2006.01272, 2020.

Katsushige Fujimoto, Ivan Kojadinovic, and Jean-Luc Marichal. Axiomatic characterizations of probabilistic and cardinal-probabilistic interaction indices. Games and Economic Behavior, 55(1):72 99, 2006.

Vikas K Garg, Y Narahari, and M Narasimha Murty. Novel biobjective clustering (bigc) based on cooperative game theory. IEEE Transactions on Knowledge and Data Engineering, 25(5):1070 1082, 2012.

Amirata Ghorbani and James Zou. Data shapley: Equitable valuation of data for machine learning. In International Conference on Machine Learning, pages 2242 2251. PMLR, 2019.

Donald B Gillies. Some theorems on n-person games. princeton university. Unpublished doctoral dissertation.)[a AMC], 1953.

Michel Grabisch and Marc Roubens. An axiomatic approach to the concept of interaction among players in cooperative games. International Journal of Game Theory, 28(4):547 565, 1999.

Michel Grabisch and Agnieszka Rusinowska. k-additive upper approximation of tu-games. Operations Research Letters, 48(4):487 492, 2020.

Michel Grabisch, Jean-Luc Marichal, and Marc Roubens. Equivalent representations of set functions. Mathematics of Operations Research, 25(2):157 178, 2000.

Michel Grabisch et al. Set functions, games and capacities in decision making, volume 46. Springer, 2016.

Ulrike Grömping. Estimators of relative importance in linear regression based on variance decomposition. The American Statistician, 61(2):139 147, 2007.

Tsai, Yeh and Ravikumar

Peter L Hammer and Ron Holzman. Approximations of pseudo-boolean functions; applications to game theory. Zeitschrift für Operations Research, 36(1):3 21, 1992.

John C Harsanyi. A simpliﬁed bargaining model for the n-person cooperative game. International Economic Review, 4(2):194 220, 1963.

Joseph D Janizek, Pascal Sturmfels, and Su-In Lee. Explaining explanations: Axiomatic feature interactions for deep networks. Journal of Machine Learning Research, 22(104): 1 54, 2021.

Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gürel, Bo Li, Ce Zhang, Dawn Song, and Costas J Spanos. Towards eﬃcient data valuation based on the shapley value. In The 22nd International Conference on Artiﬁcial Intelligence and Statistics, pages 1167 1176. PMLR, 2019.

Himabindu Lakkaraju, Ece Kamar, Rich Caruana, and Jure Leskovec. Faithful and customizable explanations of black box models. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, pages 131 138, 2019.

Richard Harold Lindeman. Introduction to bivariate and multivariate analysis. Scott Foresman & Co, 1980.

Scott M Lundberg and Su-In Lee. A uniﬁed approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pages 4765 4774, 2017.

Scott M Lundberg, Gabriel Erion, Hugh Chen, Alex De Grave, Jordan M Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, and Su-In Lee. From local explanations to global understanding with explainable ai for trees. Nature machine intelligence, 2(1): 56 67, 2020.

Andrew L Maas, Raymond E Daly, Peter T Pham, Dan Huang, Andrew Y Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologiesvolume 1, pages 142 150. Association for Computational Linguistics, 2011.

Jean-Luc Marichal and Pierre Mathonet. Weighted banzhaf power and interaction indexes through weighted approximations of games. European journal of operational research, 211 (2):352 358, 2011.

Jean-Luc Marichal and Marc Roubens. The chaining interaction index among players in cooperative games. In Advances in Decision Analysis, pages 69 85. Springer, 1999.

Tsang Michael, Cheng Dehua, Liu Hanpeng, Feng Xue, Zhou Eric, and Yan Liu. Extracting and leveraging feature interaction interpretations. In International Conference on Learning Representations, 2020.

Rory Mitchell, Joshua Cooper, Eibe Frank, and Geoﬀrey Holmes. Sampling permutations for shapley value estimation. Journal of Machine Learning Research, 23(43):1 46, 2022.

Faith-Shap: The Faithful Shapley Interaction Index

Sérgio Moro, Paulo Cortez, and Paulo Rita. A data-driven approach to predict the success of bank telemarketing. Decision Support Systems, 62:22 31, 2014.

John F Nash Jr. The bargaining problem. Econometrica: Journal of the econometric society, pages 155 162, 1950.

Art B Owen. Sobol indices and shapley value. SIAM/ASA Journal on Uncertainty Quantiﬁcation, 2(1):245 251, 2014.

Art B Owen and Clémentine Prieur. On shapley value for measuring importance of dependent inputs. SIAM/ASA Journal on Uncertainty Quantiﬁcation, 5(1):986 1002, 2017.

Guillermo Owen. Multilinear extensions of games. Management Science, 18(5-part-2):64 79, 1972.

Gregory Plumb, Denali Molitor, and Ameet S Talwalkar. Model agnostic supervised local explanations. In Advances in Neural Information Processing Systems, pages 2515 2524, 2018.

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Why should i trust you?: Explaining the predictions of any classiﬁer. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135 1144. ACM, 2016.

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Anchors: High-precision modelagnostic explanations. In Thirty-Second AAAI Conference on Artiﬁcial Intelligence, 2018.

Luis M Ruiz, Federico Valenciano, and Jose M Zarzuelo. The least square prenucleolus and the least square nucleolus. two values for tu games based on the excess vector. International Journal of Game Theory, 25(1):113 134, 1996.

Luis M Ruiz, Federico Valenciano, and Jose M Zarzuelo. The family of least square values for transferable utility games. Games and Economic Behavior, 24(1-2):109 130, 1998.

David Schmeidler. The nucleolus of a characteristic function game. SIAM Journal on applied mathematics, 17(6):1163 1170, 1969.

Lloyd S Shapley. A value for n-person games. Contributions to the Theory of Games, 2(28): 307 317, 1953.

Chandan Singh, W James Murdoch, and Bin Yu. Hierarchical interpretations for neural network predictions. ar Xiv preprint ar Xiv:1806.05337, 2018.

Mukund Sundararajan and Amir Najmi. The many shapley values for model explanation. ar Xiv preprint ar Xiv:1908.08474, 2019.

Mukund Sundararajan, Kedar Dhamdhere, and Ashish Agarwal. The shapley taylor interaction index. In International Conference on Machine Learning, pages 9259 9268. PMLR, 2020.

Tsai, Yeh and Ravikumar

Michael Tsang, Dehua Cheng, and Yan Liu. Detecting statistical interactions from neural network weights. ar Xiv preprint ar Xiv:1705.04977, 2017.

Michael Tsang, Hanpeng Liu, Sanjay Purushotham, Pavankumar Murali, and Yan Liu. Neural interaction transparency (nit): Disentangling learned interactions for improved interpretability. Advances in Neural Information Processing Systems, 31:5804 5813, 2018.

Michael Tsang, Sirisha Rambhatla, and Yan Liu. How does this interaction aﬀect me? interpretable attribution for feature interactions. ar Xiv preprint ar Xiv:2006.10965, 2020.

Xi Ye, Rohan Nair, and Greg Durrett. Connecting attributions and qa model behavior on realistic counterfactuals. ar Xiv preprint ar Xiv:2104.04515, 2021.

Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pﬁster, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. Advances in Neural Information Processing Systems, 33, 2020.

Faith-Shap: The Faithful Shapley Interaction Index

Appendix A. Organization

The Appendices contain additional technical content and are organized as follows: In Appendix B, we provide details for sampling algorithms for diﬀerent indices and supplementary results for diﬀerent setups for the computational eﬃciency experiment in Section 8.1. In Appendix C, we give experimental details and show the detailed results of Faithful Shapley Interaction value and Shapley Taylor indices. In Appendix D, we provide additional guidance on Theorem, where we clarify how to choose the parameters a, b to design Faith-Interaction indices. In Appendix E, we provide auxiliary theoretical results of the Faith-Interaction indices, which will be subsequently used in our proof of main theorems. We leave proof of propositions and theorems to arxiv (https://arxiv.org/abs/2203.00870).

Appendix B. Experimental Details and Supplementary Results of Computational Eﬃciency

In this section, we provide implementation details of the sampling algorithms for diﬀerent indices as well as supplementary experimental results for computational eﬃciency experiments. The sampling algorithms for the Shapley Taylor and Shapley interaction indices are shown in Algorithm 1 and 2. These algorithms are based on the fact that these two indices are the expected value of discrete derivatives over diﬀerent ordering processes (Sundararajan et al., 2020, Section 2.2). These algorithms are more eﬃcient since they may use v(S) of the same coalition S to compute indices of diﬀerent subsets. Also, to measure the run-time of each index, we measure its average squared distance every 200/300 model evaluation. We also measure computation eﬃciency on the synthetic sparse functions, which is constructed as follows: we parameterize the synthetic sparse function v : {0, 1}d R with PN i=1 ai Q j Si xj, where x = {x1, , xd} {0, 1}d are the input of the value function, S1, S2, , SN are subsets of [d] and a1, , a N are coeﬃcients. We set d = 70, N = 30, ℓ= 2 and d = 90, N = 10, ℓ= 2, sample each ai uniformly over [ i

10]. Each Si is uniformly sampled over subsets of |S| with sizes 5 and 10, respectively. We use Eqn.(16) to compute the ground truth of interaction indices for sparse synthetic functions. The results are shown in Figure 5. We can see that Faith-Shap is more eﬃcient in terms of all evaluation metrics.

Tsai, Yeh and Ravikumar

(a) Averaged squared distance for d = 70, ℓ= 2, N = 30, |S| 5.

(b) Precision@10 for d = 70, ℓ= 2, N = 30, |S| 5.

(c) Averaged squared distance for d = 90, ℓ= 2, N = 20, |S| 10.

(d) Precision@10 for d = 90, ℓ= 2, N = 20, |S| 10.

Figure 5: Comparison of Faith-Shap, Shapley Taylor and Shapley interaction indices in terms of computational eﬃciency on synthetic sparse functions for d = 70, ℓ= 2, N = 30, |S| 5, and d = 90, ℓ= 2, N = 20, |S| 10.

Algorithm 1: Permutation-based sampling algorithm for the top-order Shapley Taylor index

input : a value function v : 2d 7 R, maximum order ℓ. begin

sum[S] 0 for all sets S [d] with size ℓ. count[S] 0 for all sets S [d] with size ℓ. for t = 1, 2, ... do

π {i1, , id} be a random ordering of {1, 2, , d}. for all set S [d] with size ℓdo

ik the leftmost element of S in the ordering π. T {i1, , ik 1} the set of predecessors of ik in π. sum[S] sum[S] + S(v(T)). count[S] = count[S] + 1. end end indices[S] sum[S]/count[S] for all sets S [d] with size ℓ. return indices end

Appendix C. Experimental details for Language Dataset

For the dataset, the Internet Movie Review Dataset (IMDb) (Maas et al., 2011) consists of 50,000 binary labeled movie reviews. Each review is annotated as a positive or negative review. We used 25,000 reviews for training and 25,000 reviews for evaluation. Here, the set function v(x) represents the predicted probability of input texts being positive sentiment, which is between 0 and 1. We remove a word in a text sequence if the corresponding entry of the word in a binary perturbation variable x is 0. we use 4000 samples to estimate both Faithful Shapley Interaction indices and Shapley Taylor indices. We use Lasso with regularization parameter α = 0.001 to estimate Faithful Shapley Interac-

Faith-Shap: The Faithful Shapley Interaction Index

Algorithm 2: Permutation-based sampling algorithm for the Shapley Interaction index.

input : a value function v : 2d 7 R, maximum order ℓ. begin

sum[S] 0 for all sets S [d] with size ℓ. count[S] 0 for all sets S [d] with size ℓ. for t = 1, 2, ... do

π {i1, , id} be a random ordering of {1, 2, , d}. for k = 1, , d ℓ+ 1 do

S {ik, , ik+ℓ 1}. T {i1, , ik 1} the set of predecessors of ik in π. sum[S] sum[S] + S(v(T)). count[S] = count[S] + 1. end end indices[S] sum[S]/count[S] for all sets S [d] with size ℓ. return indices end

tion indices and permutation-based sampling method to estimate the highest order Shapley Taylor indices (ℓ= 2). For comparison with other feature interactions methods in XAI, we provide top-15 important features (interactions) for Faith-Shap, Shapley Taylor interaction indices, Shapley Interaction indices, Integrated Hessian (Janizek et al., 2021), and Archipelago (Michael et al., 2020) in Table 6 to 10. We note that Archipelago is run with the number of interactions k = 3, and its usage is slightly diﬀerent than our methods: it constructs a feature hierarchy as an explanation rather than measuring the importance scores of feature interactions.

Tsai, Yeh and Ravikumar

Index Sentences Predicted Prob.

I have Never forgot this movie. All these years and it has remained in my life. 0.992

Faithful Shapley indices Shapley Taylor indices

Feature (interactions) Scores Feature (interactions) Scores

Never, forgot 0.818 Never, forgot 1.077

life 0.383 Never, life -0.211

forgot -0.254 remained, movie -0.177

and 0.168 Never, this -0.160

it 0.168 forgot, life -0.149

Never -0.163 and, forgot -0.149

years 0.156 in, life -0.143

All 0.132 Never, it -0.122

my 0.126 Never, movie -0.114

has 0.120 have, Never -0.110

have 0.112 I, have 0.106

Never, life -0.106 forgot, in -0.105

forgot, it -0.096 Never, All -0.104

my, life -0.086 years, life -0.101

this 0.081 it, forgot -0.101

Shapley interaction indices Integrated Hessian

Never, forgot 1.166 this, this 1.796

these, life -0.194 in, in -1.406

have, Never -0.164 my, my -1.357

Never, it -0.154 ., . 1.123

it, forgot -0.149 have, this 1.050

forgot, life -0.148 movie, movie 1.038

forgot, remained 0.146 it, it -0.972

have, forgot -0.139 never, this 0.820

and, Never -0.136 in, my 0.792

it, life -0.135 in, . 0.719

and, it 0.131 ., . -0.602

forgot, in -0.129 this, in 0.554

I, it -0.125 remained, remained -0.532

years, Never -0.121 this, life 0.526

forgot, my -0.119 remained, my 0.526

Archipalego

Table 6: Top-15 important feature (interactions) for diﬀerent methods for language dataset. The predicted probability is the out probability of the sentence having positive sentiment. The archipalego algorithm is run with k = 3 interactions.

Faith-Shap: The Faithful Shapley Interaction Index

Index Sentences Predicted Prob.

TWINS EFFECT is a poor ﬁlm in so many respects. The only good element is that it doesn t take itself seriously. 0.012

Faithful Shapley indices Shapley Taylor indices

Feature (interactions) Scores Feature (interactions) Scores

poor -0.341 only, good -0.450

respects 0.297 EFFECT, good -0.182

only, good -0.243 good, is -0.171

poor, only 0.206 poor, ﬁlm -0.169

good 0.176 only, element -0.168

poor, respects -0.173 doesn t, poor 0.151

doesn t -0.169 only, poor 0.150

poor, good 0.122 respects, poor -0.149

only, doesn t 0.115 itself, poor -0.142

poor, doesn t 0.111 respects, good -0.137

many 0.095 it, doesn t -0.108

it 0.084 it, only -0.098

itself 0.083 take, seriously 0.095

element 0.076 doesn t, good -0.094

poor, many -0.070 doesn t, only 0.093

Shapley interaction indices Integrated Hessian

only, good -0.280 ., . 19.441

a, only -0.259 is, . 3.506

only, poor 0.223 the, . 2.802

poor, good 0.171 only, . 2.374

doesn t, poor 0.159 take, . 2.004

respects, poor -0.154 it, . 1.463

good, is -0.150 good, . 1.462

The, good -0.146 seriously, . 1.372

poor, element -0.146 doesn, . 1.282

doesn t, only 0.142 is, . 1.226

doesn t, take -0.130 that, . 1.169

respects, only -0.120 , 1.143

good, element -0.119 ., . -1.001

it, take -0.117 is, is 0.996

so, that -0.112 itself, . 0.914

Archipalego

Table 7: Top-15 important feature (interactions) for diﬀerent methods for language dataset. The predicted probability is the out probability of the sentence having positive sentiment. The archipalego algorithm is run with k = 3 interactions.

Tsai, Yeh and Ravikumar

Index Sentences Predicted Prob.

I rented this movie to get an easy, entertained view of the history of Texas. I got a headache instead. 0.026

Faithful Shapley indices Shapley Taylor indices

Feature (interactions) Scores Feature (interactions) Scores

instead -0.321 headache, instead 0.268

headache, instead 0.252 view, instead -0.178

headache -0.205 headache, Texas -0.139

easy 0.158 rented, instead 0.137

view 0.130 instead, easy -0.125

history 0.123 got, headache -0.118

rented -0.122 entertained, instead -0.115

Texas 0.101 rented, headache 0.109

entertained 0.095 got, easy -0.108

rented, instead 0.085 got, history -0.105

Texas, headache -0.069 a, I -0.100

history, instead -0.064 view, history 0.100

the 0.059 got, rented -0.100

entertained, instead -0.057 got, a -0.099

this 0.052 history, an 0.094

Shapley interaction indices Integrated Hessian

headache, instead 0.333 ., . -13.363

easy, I -0.248 i, . -2.441

movie, instead -0.226 ., a -2.117

history, to -0.162 ., . -1.420

Texas, an -0.135 texas, . 1.171

rented, easy -0.135 this, . -1.087

entertained, easy 0.130 ., i -1.056

to, easy -0.115 to, . -1.035

view, instead -0.114 of, . 0.843

entertained, instead -0.112 entertained, entertained -0.761

of, instead -0.102 headache, headache 0.753

an, easy 0.095 history, history -0.688

instead, easy -0.093 ., got -0.673

of, easy -0.087 view, . 0.657

get, easy -0.085 to, to -0.599

Archipalego

Table 8: Top-15 important feature (interactions) for diﬀerent methods for language dataset. The predicted probability is the out probability of the sentence having positive sentiment. The archipalego algorithm is run with k = 3 interactions.

Faith-Shap: The Faithful Shapley Interaction Index

Index Sentences Predicted Prob.

Truly appalling waste of space. Me and my friend tried to watch this ﬁlm to its conclusion but had to switch it oﬀabout 30 minutes from the end. 0.002

Faithful Shapley indices Shapley Taylor indices

Feature (interactions) Scores Feature (interactions) Scores

waste -0.345 appalling, waste 0.298

appalling, waste 0.257 Truly, waste -0.296

appalling -0.251 switch, it -0.248

Truly 0.169 tried, waste 0.230

waste, tried 0.167 but, watch -0.210

friend 0.162 friend, waste -0.184

space 0.149 friend, tried -0.172

tried -0.134 friend, but -0.169

Truly, waste -0.118 Truly, but -0.145

watch 0.087 but, waste 0.145

oﬀ -0.086 waste, watch -0.140

and 0.078 waste, oﬀ 0.138

waste, friend -0.074 had, space -0.128

waste, space -0.058 Truly, ﬁlm 0.126

of 0.055 30, waste 0.124

Shapley interaction indices Integrated Hessian

tried, watch -0.365 the, the -31.568

appalling, waste 0.293 the, end -13.784

tried, waste 0.259 the, . 9.472

from, end 0.230 end, end -5.719

conclusion, its 0.228 end, . 3.522

Truly, waste -0.210 from, the 3.390

waste, space -0.202 ., . 2.616

space, oﬀ -0.191 had, the 1.540

to, its -0.180 from, end 1.441

to, space 0.166 30, the -0.959

Me, its 0.162 its, the 0.941

Truly, of 0.155 minutes, . 0.821

the, Me -0.154 oﬀ, the 0.796

but, waste 0.148 ., . 0.779

had, space -0.146 to, the 0.737

Archipalego

Table 9: Top-15 important feature (interactions) for diﬀerent methods for language dataset. The predicted probability is the out probability of the sentence having positive sentiment. The archipalego algorithm is run with k = 3 interactions.

Tsai, Yeh and Ravikumar

Index Sentences Predicted Prob.

I still remember watching Satya for the ﬁrst time. I was completely blown away. 0.994

Faithful Shapley indices Shapley Taylor indices

Feature (interactions) Scores Feature (interactions) Scores

remember 0.337 blown, away 0.345

blown, away 0.293 the, ﬁrst 0.191

time 0.281 time, ﬁrst 0.182

Satya 0.208 watching, for -0.169

remember, blown -0.158 time, away -0.167

watching 0.153 time, Satya -0.151

blown 0.146 time, still -0.145

time, away -0.127 still, watching -0.144

completely, away -0.101 I, watching -0.131

Satya, time -0.091 watching, ﬁrst -0.128

remember, time -0.073 remember, away 0.118

I, watching -0.071 Satya, away -0.118

completely, blown 0.063 was, watching -0.115

ﬁrst, blown -0.053 remember, blown -0.110

ﬁrst 0.049 completely, away -0.107

Shapley interaction indices Integrated Hessian

blown, away 0.318 ., . 4.759

was, remember 0.237 was, . 1.866

remember, blown -0.180 blown, blown 1.552

time, Satya -0.167 i, . 1.185

the, ﬁrst 0.144 was, was 1.105

blown, ﬁrst -0.133 i, . 1.063

completely, blown 0.126 blown, away 0.889

time, away -0.119 satya, . 0.857

I, was 0.093 for, for -0.763

watching, blown 0.087 remember, remember -0.745

I, watching -0.083 for, time -0.745

time, blown 0.080 i, i -0.727

watching, away -0.078 ., . -0.616

remember, watching -0.076 watching, satya 0.592

remember, was -0.074 completely, . 0.579

Archipalego

Table 10: Top-15 important feature (interactions) for diﬀerent methods for language dataset. The predicted probability is the out probability of the sentence having positive sentiment. The archipalego algorithm is run with k = 3 interactions.

Faith-Shap: The Faithful Shapley Interaction Index

Appendix D. Additional Guidance on Theorem 16

In this section, we clarify how to use Theorem 16 to design Faith-Interaction indices satisfying interaction linearity, symmetry, and dummy axioms by ﬁrst explaining Theorem 16 and then providing some examples. Theorem 16 states that the ﬁnite weighting function must be in the following form:

d |S| i |S|

( 1)i |S|g(a, b, i), where g(a, b, i) =

( 1 , if i = 0. Qj=i 1 j=0 a(a b)+j(b a2)

a b+j(b a2) , if 1 i d.

for some a, b R+ with a > b such that µ(S) > 0 for all S [d]. To better understand this formula, some questions need to be answered: (1) What kind of a, b makes µ(S) > 0 for all S [d]? (2) What is the physical meaning of the parameters a and b? To answer (1), we show that a simple condition 1 a > b a2 > 0 suﬃces to make µ(S) > 0 for all S [d].

Proposition 23 When a, b R+ such that 1 a > b a2 > 0, we have

d |S| i |S|

( 1)i |S|g(a, b, i) > 0 for all S [d],

where g(a, b, i) is deﬁned in Eqn.(12).

We delayed the proof of this proposition to arxiv. We note that it is only a suﬃcient condition for selecting a and b: For some small d N, we may have some a, b such that 1 > a2 > b > 0 but makes µ(S) > 0 for all S [d]. However, if a = µ1 and b = µ2 need to make the weighting function positive for all d N, we must have the condition 1 a > b a2 > 0. For question (2), we show that µi = g(a, b, i) = P L S µ(L) for subsets S [d] with |S| = i. Here, µi is deﬁned as the total weight of coalitions containing a group of features of size i (any group with size i will work due to the interaction symmetry axiom). By plugging in i = 1, 2, we get a = µ1 and b = µ2 are the total weights of coalitions containing a single feature and a pair of features. In the following, we give some special cases with particularly chosen a and b to provide an intuition of Theorem 16,

Example 1 When a = 0.5 and b = 0.25, the weighting function µ( ) with respect to Theorem 16 is µ(S) = 1/2d for all S [d]. In this case, the explanations ET (v, ℓ) equals the Banzhaf Interaction value up to order ℓfor all |T| = ℓ, which has the form ET (v, ℓ) = P S [d]\T T v(S)/2d |S|.

In this example, the Banzhaf interaction value satisﬁes interaction linearity, symmetry, and dummy axioms Fujimoto et al. (2006), which coincides with Theorem 16. We also provide another guideline to design the values of a, b based on the desired µd µd 1 and µd 1

µd 2 , where µi = µ(S) when |S| = i. 1

1. µi can be deﬁned since the interaction symmetry axiom ensures that all coalitions with equal size have equal weights.

Tsai, Yeh and Ravikumar

Proposition 24

Let µd µd 1 = r1 and µd 1 µd 2 = r2 with r1 > r2 > (d 2)r1

d 1 + r1 > 0,

then a and b can be represented as functions of r1 and r2:

a = r1(r2 + 1) (d 1)(r1 r2) (r1 + 1)(r2 + 1) (d 1)(r1 r2) and b = r1(r2 + 1) (d 2)(r1 r2) (r1 + 1)(r2 + 1) (d 2)(r1 r2)a.

(24) In this case, a and b satisfy 1 > a > b a2 > 0, which implies µi > 0 for all 0 i d .

This proposition provides a guideline to design a unique interaction value that satisﬁes interaction linearity, symmetry, dummy axioms based on given values of µd µd 1 and µd 1

µd 2 . For example, if the coalition µt has a higher probability to form when t is large, such as the case when the features of an image are explained. As an example, we may set µd µd 1 = 10. We

then have 10 > r2 > d 2

d+910, and we can set r2 = 9 when d < 101. This narrows down a unique interaction value that satisﬁes these three axioms and the conditions of µd µd 1 = 10 and µd 1

Faith-Shap: The Faithful Shapley Interaction Index

Appendix E. Auxiliary Theoretical Results

In this section, we provide auxiliary theoretical results of the Faith-Interaction indices. These properties are useful in the proof of our main theorems. The proof are delayed to Appendix arxiv. First of all, we show that if the coalition weighting function µ( ) is ﬁnite, Eqn.(9) is strictly convex.

Proposition 25 If the coalition weighting function µ( ) is ﬁnite such that µ(S) R+ for all S [d], Eqn.(9) is strictly convex.

Given that Eqn.(9) is strictly convex, we next show that the minimization problems have a unique minimizer.

Proposition 26 The (constrained) regression problems deﬁned in Eqn.(10) with a proper weighting function µ ( Deﬁnition 10) have a unique minimizer.

This proposition is a straightforward application of the following fact: For a minimization problem with linear constraints, if the objective is strictly convex, then it has a unique minimizer. Also, we note that having a positive measure for all subsets of [d] on the weighting function µ( ) is necessary to ensure the uniqueness of the minimizer. Consider the case when the maximum interaction order equals the number of features, i.e. ℓ= d, there are 2d

variables with 2d equalities. That is, v(S) P T S ES(v, d) = 0 for all S [d]. In this case, we can not have any S [d] such that µ(S) = 0 due to the lack of equations. In this special case of ℓ= d, we have the following closed-form expression. We note that these results are independent of the weighting function as long as we have µ(S) > 0 for all S [d].

Proposition 27 When the maximum interaction order ℓ= d, the minimizer of Eqn.(10) the Möbius transform of v, i.e. ES(v, d) = a(v, S) = P T S( 1)|S| |T|v(T) for all subsets S [d].

Then we provide the expression of partial derivatives of the objective in Eqn.(9) with respect to each variable EA(v, ℓ) for all A [d] with |A| ℓ.

Proposition 28 The partial derivative of Eqn.(9) with respect to EA(v, ℓ) is

S:S A, µ(S)<

µ(S)v(S) + 2 X

S Sℓ ES(v, ℓ) X

L:L S A, µ(L)<

µ(L) for all A Sℓ. (25)

This proposition is frequently used in our proof as we solve the minimization problem. Next, the following proposition illustrates how to solve the constrained regression problem via Lagrangian.

Tsai, Yeh and Ravikumar

Proposition 29 Any Faith-Interaction index E(v, ℓ) with respect to a proper weighting function µ( ) with µ(Ø) = µ([d]) = has the form:

λØ λ[d] EØ(v, ℓ) ES(v, ℓ) ET (v, ℓ)

0, 0, 1, , 0, 0, 0, 0, 1, , 1, 1, 1

2, µ(Ø), , µ(S), µ(T), , , , , , , , 0, 1

2, µ(S), , µ(S), µ(S T), 0, 1

2, µ(T), , µ(S T), µ(T), , , , , , ,

v(Ø) v([d]) v(Ø) v(S) v(T)

where λØ and λ[d] are Lagrange multipliers with respect to the constraints on the empty set and the full set, µ(S) = P L S,µ(L)< µ(L), and v(S) = P L S,µ(L)< µ(L)v(L). Formally, the matrix M R(dℓ+2) (dℓ+2) and the vector y Rdℓ+2 have the following definitions: we overuse the notations λØ, λ[d] and let the rows and columns of M are indexed by {λØ, λ[d], Ø, , S, T, }, which are corresponding to variables λØ, λ[d], EØ(v,ℓ), ,ES(v,ℓ),ET (v,ℓ)

1 if S = (λØ) (T = Ø). 0 if (S = λØ) (T = Ø). 1 if (S = λ[d]) (T Sℓ). 0 if (S = λ[d]) (T {λØ, λ[d]}). 1

2 if (S = ) (T = λ[d]). 0 if (S Sℓ\Ø) (T = λØ). 1

2 if (S Sℓ) (T = λ[d]). µ(S T) , otherwise.

, and y S =

v(Ø) if S = λØ. v([d]) if S = λ[d] v(S) otherwise.

where we use MS,T to denote the entry of the intersection of Sth row and T th column.