Journal of Machine Learning Research 26 (2025) 1-35. Submitted 10/22; Revised 7/25; Published 7/25.

EMaP: Explainable AI with Manifold-based Perturbations

Minh Nhat Vu, mvu@lanl.gov, Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA

Huy Quang Mai, huyqmai1@gmail.com, Hanoi, 10000, Vietnam

My T. Thai, mythai@cise.ufl.edu, Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL 32611, USA

Editor: Florence d'Alché-Buc

Abstract

In the last few years, many explanation methods based on perturbations of the input data have been introduced to shed light on the predictions generated by black-box models. The goal of this work is to introduce a novel perturbation scheme so that more faithful and robust explanations can be obtained. Our study focuses on the impact of perturbing directions on the data topology. We show that perturbing along the orthogonal directions of the input manifold better preserves the data topology, both in the worst-case analysis of the discrete Gromov-Hausdorff distance and in the average-case analysis via persistent homology. From those results, we introduce the EMaP algorithm, which realizes the orthogonal perturbation scheme. Our experiments show that EMaP not only improves the explainers' performance but also helps them overcome a recently developed attack against perturbation-based explanation methods.

Keywords: black-box explanations, explainability, topological data analysis, adversarial robustness

1. Introduction

In recent years, many attempts to explain predictions of deep learning models have been conducted, which resulted in various explanation methods called explainers (Lipton, 2018; Murdoch et al., 2019).
A common technique used by many explainers (Štrumbelj and Kononenko, 2013; Ribeiro et al., 2016; Lundberg and Lee, 2017) is first to generate some perturbations in the input space, then forward them through the model, and later provide an explanation based on the captured outputs. For that reason, these methods are also known as perturbation-based explainers. Even though the perturbation-generating step has a strong influence on the performance of explainers (Ribeiro et al., 2016; Lundberg and Lee, 2017), very few works have closely examined this step. Current perturbation schemes often ignore the data topology and, as a result, distort it significantly. These distortions may considerably degrade explainers' performance since models are not trained to operate on the deformed topology. Additionally, the difference between the perturbations and the original data creates opportunities for malicious intent. For example, the work of Slack et al. (2020) demonstrates that a discriminator trained to recognize the explainer's perturbations can be exploited to fool the explainer.

© 2025 Minh Nhat Vu, Huy Quang Mai, and My Tra Thai. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v26/22-1157.html.

Figure 1: Visualization of perturbations with the same magnitude generated from a point cloud of a 2-dimensional spiral. Perturbations along the orthogonal directions of the data subspace (far right) result in lower topological distortion, i.e., smaller Bottleneck distances H0 and H1 (the Bottleneck distance is discussed in Section 5).

Motivated by that lack of study, our work aims to redesign the perturbation step in the explanation process. We begin with the observation that there exists important topological information of the data that has a strong influence on the model trained on that data.
For example, training a neural network to distinguish points from two separated point clouds is generally easier than from two overlapping or connected point clouds. Since perturbing the data can significantly change this topological information, perturbation methods can significantly change the model's behavior and influence the subsequent explaining tasks. As such, we aim to generate perturbations so that the topological structure of the original data is better preserved. Our key result is that, assuming the input data is embedded in an affine subspace whose dimension is significantly smaller than the data dimension, eliminating the perturbations' components along that affine subspace better preserves the topological integrity of the original manifold. An illustration of that result is provided in Fig. 1, which shows that perturbing along the orthogonal directions (i.e., with no components along the subspace's directions) results in smaller distortion of the topological structure of the original data. This phenomenon is also reflected in the smaller Bottleneck distances in dimensions 0 and 1, denoted by H0 and H1. Based on that result, we further propose a novel manifold-based perturbation method aiming to preserve the topological structure of the original data, called EMaP. The high-level operations of EMaP are shown in Fig. 2. Given some sampled data, EMaP first learns a function mapping the samples to their low-dimensional representations in the data subspace. Then, that function is used to approximate a local affine subspace, shortened to local-subspace, containing the data in the neighborhood of the input to be explained. Finally, the EMaP perturbations are generated by adding noise vectors that are orthogonal to that local-subspace to the data.

Contributions.
(a) We provide theoretical results showing that the worst-case discrete Gromov-Hausdorff distance between the data and the perturbations along the manifold's directions is larger than that along the orthogonal directions. (b) The worst-case analysis suggests that eliminating the perturbation's components along the manifold's directions can generally better maintain the topological integrity of the original manifold, i.e., in the average case. We then provide synthetic and real-world experiments based on persistent homology and the Bottleneck distance to support that hypothesis. (c) We propose EMaP, an algorithm generating perturbations along the manifold's orthogonal directions for explainers. EMaP first approximates the input's manifold locally at some given data points, called pivots, and at the explained data point. The perturbations are then generated along the orthogonal directions of these local subspaces. EMaP also computes the low-dimensional distances from the perturbations to the explained data point so that the explainers can better explain the model. (d) Finally, we provide experiments on four text data sets, two tabular data sets, and two image data sets, showing that EMaP can improve the explainers' performance and protect explainers from adversarial discriminators.

Figure 2: EMaP's perturbation: Assuming the data is embedded in a low-dimensional affine subspace (middle figure), EMaP approximates that subspace locally at some given data points and performs perturbation along orthogonal directions of that subspace (right figure).

Organization. The remainder of the paper is structured as follows. Sections 2 and 3 briefly discuss related work and preliminaries. Section 4 presents our analysis of the discrete Gromov-Hausdorff distances of different perturbation directions, which suggests orthogonal directions are preferable. We strengthen that result with a persistent homology analysis in Section 5.
Sections 6 and 7 describe our proposed EMaP algorithm and its experimental results. Section 8 concludes the paper.

2. Related work

This work intersects several emerging research fields, including explainers and their attack/defense techniques. Our approach also uses recent results in topological data analysis. We provide an overview of the related work below.

Perturbation-based explanation methods. Perturbation-based explainers are becoming more popular among explanation methods for black-box models since they hardly require any knowledge of the explained model. Notable ones are LIME (Ribeiro et al., 2016), SHAP (Lundberg and Lee, 2017), and some others (Štrumbelj and Kononenko, 2013; Zeiler and Fergus, 2014; Sundararajan et al., 2017; Chang et al., 2018; Schwab and Karlen, 2019; Lundberg et al., 2020; Ying et al., 2019; Vu and Thai, 2020). While they share the same goal of explaining the model's predictions, they differ not only in their objectives but also in their perturbation schemes: some zero out features (Zeiler and Fergus, 2014; Schwab and Karlen, 2019) or replace features with neutral values (Ribeiro et al., 2016; Sundararajan et al., 2017), while others marginalize over some distributions on the data (Ribeiro et al., 2016; Lundberg and Lee, 2017; Lundberg et al., 2020). There also exist methods relying on separate models to generate perturbations (Štrumbelj and Kononenko, 2013; Chang et al., 2018). The work of Covert et al. (2021) provides a comprehensive survey of those perturbation-based explanation methods and how they perturb the data.

Adversarial attacks on explainers. We focus on the attack framework of Slack et al. (2020), in which the adversary intentionally hides a biased model from the explainer by training a discriminator to recognize its queries. The framework will be discussed in detail in Section 3.
There are other emerging attacks on explainers focusing on modifying the model's weights and tampering with the input data (Ghorbani et al., 2019; Dombrowski et al., 2019; Heo et al., 2019; Dimanov et al., 2020).

Defense techniques for perturbation-based explainers. Since most attacks on perturbation-based explainers were only developed recently, defense techniques against them are quite limited. Existing defenses generate perturbations either by carefully sampling the training data (Chakraborty et al., 2020) or by learning some generative models (Saito et al., 2020; Vreš and Robnik-Šikonja, 2021). The advantage of EMaP is that it does not require any generative model, which not only reduces the attack surface but also allows a theoretical study of the perturbations.

Topological Data Analysis. Topological data analysis (TDA) is an emerging field in mathematics, applying the techniques of topology (which was traditionally very theoretical) to real-world problems. Notable applications are in data science, robotics, and neuroscience. TDA uses deep and powerful mathematical tools in algebraic topology to explore topological structures in data and to provide insights that normal metric-based methods fail to discern. The most common tool in the TDA arsenal is persistent homology, developed in the early 2000s by Gunnar Carlsson and his collaborators. We refer readers to Ghrist (2014) and Edelsbrunner and Harer (2010) for an overview of both persistent homology and TDA as a whole.

3. Preliminaries

This section discusses the notation and some preliminaries for this work. In particular, we use the standard setting of learning tasks where the input set X is sampled from a distribution on R^N. X is also assumed to lie on a manifold embedded in an affine subspace R^V, where V is much smaller than N. We also consider a black-box classifier f mapping each input x ∈ X to a prediction y, a local explainer g, an (adversarial) discriminator D, and a masking model f′.
In the following, we describe how explainers and the explanation process are formulated. After that, we provide two examples demonstrating how perturbations along different directions can have significant impacts on the explaining process. We end this section with an attack framework on perturbation-based explainers and describe how better perturbations can help defend against that class of attacks.

Explainers. An explanation of the prediction y = f(x) can be obtained by running an explainer g on x and f. We denote such an explanation by g(x). In additive feature attribution methods, the range of g(x) is a set of feature importance scores. We focus our analysis on the class of perturbation-based explanations, i.e., the importance scores are computed based on the model's predictions on some perturbations of the input data. Typically, the perturbations are generated by perturbing x and/or some samples in X. We denote the perturbations by X_r, where r ≥ 0 specifies the amount of perturbation. A more rigorous definition of this notation will be provided in Section 4. Given the perturbations, the explainer typically finds the explanation g such that its behavior on the perturbations is similar to that of the explained function f. That is why such explainers are also known as surrogate methods. For example, the LIME explanation is a linear model whose coefficients are the importance scores of the features:

arg min_{g ∈ G} L(f, g, π_x) + Ω(g),    (1)

where L is a loss between the explained function f and the explanation function g, G is the class of linear models, π_x is an exponential kernel defined on some distance function, and Ω(g) is a function measuring the complexity of g. For the SHAP explainer, the main differences are the choices of the kernel π_x and the complexity measure Ω(g), chosen so that the explanations satisfy certain desired properties from game theory.
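Objective (1) can be instantiated, for instance, as a weighted ridge regression over the perturbations. The sketch below illustrates this; the function name, kernel width, and ridge penalty (standing in for Ω) are our own illustrative choices, not LIME's actual implementation:

```python
import numpy as np

def lime_style_explanation(f, x, perturb, n_samples=500, width=1.0, alpha=0.01):
    """Minimal surrogate explanation in the spirit of Eq. (1).

    f        : black-box prediction function R^N -> R
    x        : input to explain, shape (N,)
    perturb  : function returning one perturbation of x
    width    : kernel width for the exponential kernel pi_x
    alpha    : ridge penalty standing in for the complexity term Omega(g)
    """
    X_r = np.stack([perturb(x) for _ in range(n_samples)])   # perturbations
    y = np.array([f(z) for z in X_r])                        # black-box outputs
    d = np.linalg.norm(X_r - x, axis=1)                      # distances to x
    w = np.exp(-(d ** 2) / width ** 2)                       # exponential kernel pi_x
    # Weighted ridge regression: solve (X^T W X + alpha I) coef = X^T W y.
    # The coefficients play the role of the feature importance scores.
    A = X_r * w[:, None]
    coef = np.linalg.solve(X_r.T @ A + alpha * np.eye(x.size), A.T @ y)
    return coef
```

On a model that is already linear, such a surrogate recovers the model's own coefficients up to sampling and ridge bias, which is a quick way to sanity-check any perturbation scheme plugged into `perturb`.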
For details of the two methods, we refer readers to the original papers (Ribeiro et al., 2016; Lundberg and Lee, 2017).

Topology-aware perturbations for explaining tasks. We now provide two examples demonstrating how leveraging topological information of the data can be beneficial for explanation tasks. The examples aim to demonstrate two key observations. The first is that using only projecting perturbations, i.e., perturbations within the manifold, might lead to high explanation errors. The second example heuristically shows that orthogonal perturbations result in a more desirable model response for the explaining task.

Figure 3: A 1-D example showing the impact of the perturbation's direction. (a) Explaining using projecting perturbations. (b) Explaining using orthogonal perturbations.

In the first example, we consider 2-dimensional data lying in a 1-dimensional manifold (Fig. 3). The task of the model is to differentiate the blue data from the green data. The decision boundary of the model is a line intersecting the manifold (the red dashed line). The goal of perturbation-based explainers is to determine that decision boundary. Fig. 3a demonstrates the scenario when only projecting perturbations are used. Since the perturbations and the data lie in a 1-dimensional manifold, they cannot help the explainer differentiate the decision boundary from any other line going through the intersection. Thus, the resulting explanation (purple dotted line) might have high errors. On the other hand, as depicted in Fig. 3b, orthogonal perturbations can differentiate lines in the 2-dimensional space and help the explainer learn solutions with lower errors.

Figure 4a: A C-shaped data set in a regression task (left) and its perturbations (right). The projection perturbation of the data is more likely to destroy the gap at the two ends of the original C-shaped data.
Figure 4b: The normalized change |f(x + δ) − f(x)| / ‖δ‖_{L2} for x along the C-shaped data (located by the angle θ), where the perturbation x + δ belongs to the projection point cloud, the orthogonal point cloud, or the original data (in distribution).

Figure 4: An example showing how leveraging topological information can be beneficial for the explanation task.

Our second example is the experiment shown in Fig. 4. The experiment consists of a C-shaped data set and a regression model f mapping the data set uniformly to the range (0, 1), i.e., f(x) = θ_x/(2π − 0.5) for θ_x ∈ [0, 2π − 0.5], where θ_x is the angular coordinate of the point x on the shape. That angle θ_x, the mapping f, and the data set are illustrated in Fig. 4a. The first observation is that the optimal linear local explanation (1) of this regression model is the tangent line at the point of prediction. To obtain that explanation, one necessary condition on the perturbations is that the quantity

f(x + δ) − f(x) = (θ_{x+δ} − θ_x)/(2π − 0.5)

should depend only on δ and not on x. In other words, the change f(x + δ) − f(x) should be almost constant as x moves along the shape. In particular, Fig. 4b demonstrates the low variance of that quantity (black line) when the perturbation x + δ is sampled from the original C-shaped data, i.e., in distribution.¹ For projection and orthogonal perturbations, we can observe that f(x + δ) − f(x) matches that of the in-distribution perturbation, except for data at the two ends of the C-shape. The reason is that the two ends carry important topological information about the data, and that information also has a strong impact on the model: if the data is slightly perturbed at those ends, the data might become connected (a circle appears) and the two extreme values of the regression model will also be connected.
Thus, the evaluations of the model on the perturbations connecting the two ends, i.e., on those that destroy the associated topological feature, will result in unstable responses. Consequently, explaining the model based on those perturbations is not desirable. In fact, the distortion f(x + δ) − f(x) caused by projection perturbations has significantly higher fluctuation than that of the orthogonal perturbations. The reason is that projection perturbations near the two ends of the C-shape have a higher chance of connecting the ends and destroying the shape.

1. To sample the perturbation x + δ, we first fix a perturbation magnitude, then randomly select the point x̃ in the specified point cloud (in-data, projection, or orthogonal perturbations) such that the L2 distance between x̃ and x equals that magnitude, i.e., ‖δ‖_{L2} equals the chosen magnitude. The magnitude for this experiment is 0.25.

Attack framework. We study the discriminator-based attack framework introduced by Slack et al. (2020), which is illustrated in Fig. 5. The framework focuses on an adversary with an incentive to deploy a biased model f. This adversary can bypass the detection of the explainer by forwarding the explainer's perturbations X_r to a masking model f′. The decision whether to forward an input to the masking model is made by a discriminator D. Thus, the success of the attack is dictated by the capability of the discriminator D to distinguish the perturbations X_r from the actual data X. Intuitively, if the explainer can craft an X_r similar to X, it not only improves the explainer's performance but also prevents the adversary from hiding its bias.

Figure 5: The discriminator-based attack framework: By recognizing and forwarding the perturbations X_r generated by an explainer to the masking model f′, the biased model f can be deployed without detection.

4.
Analysis of discrete Gromov-Hausdorff distances of perturbations

We consider the following perturbation problem: given a manifold embedded in R^N, how do we perturb it so that we preserve as much topological information as possible? More concretely, given a finite set of points sampled from such a manifold, is there a consistent method to perturb the original data set while preserving some notion of topology? To begin talking about differences between (metric) spaces, we need to introduce a notion of distance between them. One commonly used distance is the Gromov-Hausdorff distance, which is derived from the Hausdorff distance. Intuitively, a small Gromov-Hausdorff distance means that the two spaces are very similar as metric spaces. Thus, we can focus our study on the Gromov-Hausdorff distances between the data and different perturbation schemes. However, as it is infeasible to compute this distance in practice, we instead study an approximation of it, the discrete Gromov-Hausdorff distance. Specifically, we show that, when the perturbation is significantly small, the worst-case discrete Gromov-Hausdorff distance resulting from orthogonal perturbation is smaller than that of projection perturbation, i.e., perturbation along the manifold (Theorem 3). The proof of that claim relies on Lemma 4, which states that, for a small perturbation, the discrete Gromov-Hausdorff distance between the original point cloud and the perturbed point cloud equals half the largest change in the distances of any pair of points in the original point cloud. With the Lemma, the problem of comparing point clouds is reduced to the problem of comparing the changes in pairwise distances. The structure of this section is as follows: first, we recall the definitions of the Hausdorff distance and the Gromov-Hausdorff distance. We then introduce the discrete Gromov-Hausdorff distance, which will be the main focus for computational purposes.
Finally, using such a discrete distance, we prove Lemma 4 and Theorem 3.

We now state the formal definitions. Let (M, d) be a metric space. For a subset S ⊆ M and a point y ∈ M, the distance between S and y is given by d(S, y) := inf_{x∈S} d(x, y).

Definition 1 (Hausdorff distance) (Gromov, 1981) Let S and S′ be two non-empty subsets of a metric space (M, d). The Hausdorff distance between S and S′, denoted by d_H(S, S′), is:

d_H(S, S′) := max{ sup_{x∈S} d(S′, x), sup_{y∈S′} d(S, y) }.

Definition 2 (Gromov-Hausdorff distance) (Gromov, 1981) Let X, Y be two compact metric spaces. The Gromov-Hausdorff distance between X and Y is given by:

d_GH(X, Y) = inf_{f,g} d_H(f(X), g(Y)),

where the infimum is taken over all metric spaces M and all isometric embeddings f : X → M, g : Y → M.

Even though the Gromov-Hausdorff distance is mathematically desirable, it is practically non-computable, since the above infimum is taken over all possible metric spaces. In particular, this includes the computation of the Gromov-Hausdorff distance between any two point clouds. In 2004, Mémoli and Sapiro (2004) addressed this problem by using a discrete approximation of the Gromov-Hausdorff distance, which looks at the distortion of pairwise distances over all possible matchings between the two point clouds. Formally, given two finite sets of points X = {x_1, ..., x_n} and Y = {y_1, ..., y_n} in a metric space (M, d), the discrete Gromov-Hausdorff distance between X and Y is given by

d_J(X, Y) = min_{π∈S_n} max_{i,j} (1/2) |d(x_i, x_j) − d(y_{π(i)}, y_{π(j)})|,    (2)

where S_n is the set of all n-permutations.

Let X = {x_1, ..., x_k} ⊂ R^V ⊂ R^N be a point cloud contained in some affine subspace R^V of R^N. We say X is generic if the pairwise distances between the points in X are not all equal, i.e., there exist some points x_{i1}, x_{i2}, x_{i3}, x_{i4} ∈ X such that d(x_{i1}, x_{i2}) ≠ d(x_{i3}, x_{i4}). Let X_r be a finite set of points in R^N such that for every x_i ∈ X, there exists a unique x̃_i ∈ X_r such that d(x_i, x̃_i) = r.
X_r realizes a perturbation of X with the radius of perturbation equal to r. We also denote by X_r^⊥ (resp. X_r^Proj) a finite set of points in R^N such that for every x_i ∈ X, there exists a unique x̃_i ∈ X_r^⊥ (resp. X_r^Proj) such that d(x_i, x̃_i) = r and x_i x̃_i ⊥ R^V (resp. x_i x̃_i ⊂ R^V), where x_i x̃_i denotes the line connecting the points x_i and x̃_i. We are now ready to state the following key theorem:

Theorem 3 Given a generic point cloud X = {x_1, ..., x_k} ⊂ R^V ⊂ R^N, there exists an r_0 > 0 such that for any r < r_0 and for any instance of X_r^⊥, there exists an X_r^Proj such that:

d_J(X, X_r^⊥) ≤ d_J(X, X_r^Proj).

The proof of Theorem 3 is based on the following lemma:

Lemma 4 There exists an ϵ > 0 such that for any r < ϵ, we have

d_J(X, X_r) = (1/2) max_{i,j} |d(x_i, x_j) − d(x̃_i, x̃_j)|

for any X_r.

To prove Lemma 4, we show that, for a small enough ϵ, the optimal permutation in Eq. (2) is the identity π(i) = i. Thus, the minimization in the computation of d_J can be eliminated. The details are shown in the following.

Proof of Lemma 4. Given a permutation π ∈ S_n and two point clouds X and X_r of the same cardinality, denote:

D_π(X, X_r) = (1/2) max_{i,j} |d(x_i, x_j) − d(x̃_{π(i)}, x̃_{π(j)})|.

Let N_0 be the set of permutations π ∈ S_n such that D_π(X, X) = 0, and let N_1 = S_n \ N_0. Since X is generic, N_0 does not include all of S_n and N_1 ≠ ∅. Let δ = min_{π∈N_1} D_π(X, X) > 0. We claim that choosing ϵ = δ/4 proves the lemma. To be more precise, the radius of perturbation r is chosen such that r < ϵ = δ/4, i.e., 4r < δ. Given an X_r, for any π ∈ S_n, we consider two cases:

1. If π ∈ N_0, then d(x_i, x_j) = d(x_{π(i)}, x_{π(j)}) for all (i, j). Note that the identity permutation belongs to N_0. Thus:

D_π(X, X_r) = (1/2) max_{i,j} |d(x_i, x_j) − d(x̃_{π(i)}, x̃_{π(j)})| = (1/2) max_{i,j} |d(x_{π(i)}, x_{π(j)}) − d(x̃_{π(i)}, x̃_{π(j)})| = (1/2) max_{i,j} |d(x_i, x_j) − d(x̃_i, x̃_j)|.    (3)

Since d(x_i, x̃_i) = d(x_j, x̃_j) = r, from the triangle inequality, we have:

d(x̃_i, x̃_j) ≤ d(x_i, x_j) + d(x_i, x̃_i) + d(x_j, x̃_j) = d(x_i, x_j) + 2r,
d(x̃_i, x̃_j) ≥ d(x_i, x_j) − d(x_i, x̃_i) − d(x_j, x̃_j) = d(x_i, x_j) − 2r.
Therefore, for all i, j, we have:

|d(x̃_i, x̃_j) − d(x_i, x_j)| ≤ 2r.    (4)

This implies D_π(X, X_r) ≤ r for π ∈ N_0.

2. If π ∈ N_1, without loss of generality, we assume the pair (x_1, x_2) maximizes |d(x_i, x_j) − d(x_{π(i)}, x_{π(j)})|. For convenience, we denote π(1) = 3 and π(2) = 4. From the fact that π ∈ N_1, we have |d(x_1, x_2) − d(x_3, x_4)| ≥ δ. On the other hand, from (4), |d(x̃_3, x̃_4) − d(x_3, x_4)| ≤ 2r. Thus, from the triangle inequality, we obtain:

|d(x̃_3, x̃_4) − d(x_1, x_2)| ≥ |d(x_1, x_2) − d(x_3, x_4)| − |d(x̃_3, x̃_4) − d(x_3, x_4)| ≥ δ − 2r > 2r.

Since D_π(X, X_r) ≥ (1/2)|d(x_1, x_2) − d(x̃_3, x̃_4)|, we establish D_π(X, X_r) > r for π ∈ N_1.

From the above analysis, we can conclude D_π(X, X_r) ≤ D_τ(X, X_r) for all π ∈ N_0 and τ ∈ N_1. Combining this with (3), we have that the identity permutation is a solution of (2), which proves the Lemma.

With the Lemma, Theorem 3 can be proved by choosing a specific projection perturbation whose discrete Gromov-Hausdorff distance is always at least the upper bound on that distance for any orthogonal perturbation. The proof of the Theorem is shown below:

Proof of Theorem 3. Applying Lemma 4 to the orthogonal perturbation Y = X_r^⊥ and the projection perturbation Z = X_r^Proj, for any r less than or equal to the minimum of the ϵ corresponding to each perturbation specified in Lemma 4, we obtain:

d_J(X, Y) = (1/2) max_{i,j} |d(x_i, x_j) − d(y_i, y_j)| and d_J(X, Z) = (1/2) max_{i,j} |d(x_i, x_j) − d(z_i, z_j)|.

From the triangle inequality (similar to how we show Eq. (4) in the proof of Lemma 4), we have:

(1/2) max_{i,j} |d(x_i, x_j) − d(y_i, y_j)| < r and (1/2) max_{i,j} |d(x_i, x_j) − d(z_i, z_j)| ≤ r,

where the first inequality is strict due to the orthogonality of the perturbation. Given such an r, consider the following perturbation Z = X_r^Proj where x_1, x_2, z_1, z_2 are collinear with |d(x_1, x_2) − d(z_1, z_2)| = 2r, and z_i = x_i for i = 3, ..., n. The d_J distance between such a Z and X is greater than or equal to r, which proves our claim.
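As a sanity check, the quantities in Theorem 3 can be computed directly for a small point cloud: a brute-force implementation of the discrete Gromov-Hausdorff distance of Eq. (2), applied to an orthogonal and a projection perturbation of a planar point cloud embedded in R^3. The point cloud, radius, and perturbation directions below are illustrative choices of ours, not part of the paper's experiments:

```python
import itertools
import numpy as np

def discrete_gh(X, Y):
    """Brute-force discrete Gromov-Hausdorff distance of Eq. (2).
    Enumerates all n! matchings, so it is feasible only for small n."""
    DX = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise d(x_i, x_j)
    DY = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)  # pairwise d(y_i, y_j)
    return min(
        0.5 * np.abs(DX - DY[np.ix_(p, p)]).max()
        for p in map(list, itertools.permutations(range(len(X))))
    )

# A generic planar point cloud, embedded in R^3 (the subspace V is the z = 0 plane).
X = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0],
              [1.5, 1.2, 0], [0.3, 2, 0], [2, 0.5, 0]], dtype=float)
r = 1e-3  # perturbation radius, small enough for Lemma 4 to apply
signs = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])

# Orthogonal perturbation: move each point by +/- r along z (normal to V).
X_orth = X.copy(); X_orth[:, 2] += r * signs
# Projection perturbation: move each point by +/- r along x (inside V).
X_proj = X.copy(); X_proj[:, 0] += r * signs

# The orthogonal distortion is of order r^2 (far below r), while x_1 and x_2
# are collinear with the in-plane perturbation and drift together by 2r,
# giving a projection distortion of exactly r.
print(discrete_gh(X, X_orth), discrete_gh(X, X_proj))
```

The orthogonal perturbation only changes pairwise distances at second order in r, whereas the projection perturbation realizes the collinear worst case used in the proof of Theorem 3.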
This theoretical result suggests the following perturbation scheme: given a manifold embedded in an affine subspace of R^N and a fixed amplitude r of perturbation, perturbing the manifold in the directions orthogonal to the affine subspace is preferable to random perturbation, as it minimizes the topological difference between the perturbed manifold and the original.

5. Persistent homology analysis with the Bottleneck distance

The results from the previous section show that, on a worst-case basis, the orthogonal perturbation is preferable to the projection perturbation. However, when we apply them to actual data sets, how do they compare on average? Since the discrete Gromov-Hausdorff distance is still computationally intractable for a Monte-Carlo analysis, we choose a different approach: persistent homology. For the last 30 years, there have been new developments in the field of algebraic topology, which was classically very abstract and theoretical, toward real-world applications. The newly discovered field, commonly referred to as applied topology or topological data analysis, is centered around a concept called persistent homology. Interested readers can refer to (Ghrist, 2014; Edelsbrunner and Harer, 2010; Ghrist, 2008) for an overview of the subject. For any topological space, homology is a topological invariant that counts the number of holes or voids in the space. Intuitively, given any space, the 0th homology counts the number of connected components, the 1st homology counts the number of loops, the 2nd homology counts the number of 2-dimensional voids, and so on. The homology groups of dimension i are denoted by Hi. Given a point cloud sampled from a manifold, we want to recapture the homological features of the original manifold from these discrete points.
The idea is to construct a sequence of topological spaces along some timeline and track the evolution of the topological features across time. The longer the features persist (hence the name persistent homology), the more likely they are actual features of the original manifold. Given a point cloud X and a dimension i, the persistence diagram D_i(X) is the set of points (b, d) ∈ R^2 corresponding to the birth and death times of these features in the aforementioned timeline. For two point clouds, and likewise for their two persistence diagrams, there are several notions of distance between them, representing how similar they are as topological spaces. The most commonly used distance in practice is the Bottleneck distance.

Definition 5 (Bottleneck distance) Let X and Y be two persistence diagrams. The Bottleneck distance W(X, Y) is given by

W(X, Y) = inf_{ϕ:X→Y} sup_{x∈X} ‖x − ϕ(x)‖_∞,

where the infimum is taken over all matchings ϕ : X → Y (which allow matchings to points with equal birth and death time).

For simplicity, we shorten the Bottleneck distance between the persistence diagrams of X and Y in dimension i to Hi(X, Y) instead of W(D_i(X), D_i(Y)). As this notation takes two parameters X and Y, it is not to be confused with the homology group of a specified space. Note that two point clouds with a small Bottleneck distance can be considered topologically similar. Notably, the Bottleneck distance is closely related to the Gromov-Hausdorff distance (Section 4), as the Bottleneck distance of two persistence diagrams of the same dimension is bounded above by the Gromov-Hausdorff distance of the underlying point clouds (Chazal et al., 2009):

W(D_i(X), D_i(Y)) ≤ d_GH(X, Y), for every dimension i.

Monte-Carlo simulations. The Bottleneck distance is much more computable than the Gromov-Hausdorff distance, and there are available software packages depending on the use cases.
As such, we run Monte-Carlo simulations to compute the Bottleneck distances on 5 synthetic data sets, 3 real-world tabular data sets, and 2 real-world image data sets to confirm our hypothesis that the orthogonal perturbation preserves the topology better than the projection perturbation on average. The synthetic data sets are noisy point clouds of certain 2-dimensional shapes in 3-dimensional space. The tabular data sets are COMPAS (Larson and Angwin, 2016), German Credit (Hofmann, 1994), and Communities and Crime (Redmond, 2011). The image data sets are MNIST (LeCun and Cortes, 2010) and Fashion-MNIST (Xiao et al., 2017). Tables 1 and 2 provide more details about those data sets. We use the Ripser Python library (Tralie et al., 2018) to compute the Bottleneck distances in our experiments. All reported Bottleneck distances are normalized by the noise added to the point clouds for more intuitive visualization.

Table 3 reports the mean H0 and H1 Bottleneck distances of the perturbations on the synthetic data sets. The number of data points and the noise level are chosen mainly for nice visualizations (shown in Appendix A). The results show that orthogonal perturbation consistently results in lower H0 distances for line-shaped data and lower H1 distances for cycle-shaped data. Note that, in general, H1 is a better topological indicator than H0 for cycle-shaped data sets, since cycles or holes (detected by H1) are harder to replicate than connected components (detected by H0). For the real-world data sets, we conduct the experiments with perturbations of different noise levels and report the results in Figs. 6 and 7. It can be observed that both the H0 and H1 Bottleneck distances of the persistence diagrams of the orthogonal perturbation are significantly smaller than those of the projection perturbation in all experiments.

6.
EMaP Algorithm

We now discuss our perturbation-generating algorithm, called EMaP, which leverages the topological information of the input data. To generate better perturbations for the explaining task, EMaP not only exploits the perturbation direction but also uses low-dimensional distances to better capture the notion of locality. In particular, predictions on the perturbations do not necessarily hold local information about the explained inputs, which is one main reason for the usage of distance or kernel functions measuring how similar the perturbations are to the explained input. Note that those distances and kernels are normally functions of other distances, such as the L1 and L2 distances in the input space R^N, which might not correctly capture the notion of similarity. By operating in the low-dimensional manifold, EMaP can overcome those issues. First, if topological similarity implies similarity in the model's predictions, maintaining the topological structure of the original data should improve the relevance of the model's predictions on the perturbations. Therefore, explaining the model with orthogonal perturbations, which better preserve the topology, should be more beneficial. Furthermore, the manifold provides a natural way to improve the similarity measurement among data points. Fig. 8 illustrates the issue of similarity measurement based on Euclidean distance in the input space R^N. As that distance ignores the layout of the data, points far apart on the manifold might receive the same similarity measure. The low-dimensional distances computed on the manifold, in contrast, take the layout into account and can overcome that issue.

Figure 6: The normalized H0 and H1 Bottleneck distances for orthogonal and projection perturbations on 3 real-world data sets at different noise levels. The x-axis shows the average perturbation radius applied to each data point (log-scale).
Figure 7: The normalized H0 and H1 Bottleneck distances for orthogonal and projection perturbations on 2 image data sets at different noise levels. The x-axis shows the average perturbation radius applied to each data point (log-scale).

Data set                 Parameter       No. points   Data noise   Perturbation   No. runs   EMaP dim
Line                     Length: 10      100          0.1          0.15           100        1
Circle                   Radius: 1       400          0.1          0.1            100        2
2 intersecting circles   Radius: 1       400          0.01         0.1            100        2
2 concentric circles     Radius: 1       400          0.01         0.1            100        2
Spiral                   Radius: [0,2]   1000         0.02         0.05           100        2

Table 1: The parameters of the Bottleneck distance experiments on the synthetic data. The perturbation column shows the average magnitude of the perturbation on the data.

Experiment      No. data points   No. feats   Feature values   No. runs   EMaP dim
COMPAS          7214              100         {0, 1}           100        2
German Credit   1000              28          {0, 1}           100        2
CC              2215              100         {0, 1}           100        2
MNIST           60000             28 x 28     [0, 1]           100        2 and 3
Fashion-MNIST   60000             28 x 28     [0, 1]           100        2 and 3

Table 2: The parameters of the Bottleneck distance experiments on the real-world data.

Table 3: The normalized H0 and H1 Bottleneck distances for Gaussian (G), projection (P), and orthogonal (O) perturbations on synthetic data. Visualizations of the actual perturbations are provided in Appendix A.

Figure 8: Euclidean distances computed in the input space might not capture the actual distances between the data points (left). Distances in low-dimensional space can help with the issue (right).

Algorithm overview. Given an input to be explained, the outputs of EMaP are the perturbations along the manifold's orthogonal directions, Xr, and their low-dimensional distances, Dr, to that input. The pseudo-code of EMaP is shown in Alg. 1. The first step is to learn an embedding function, i.e., a mapper, transforming the data to a low dimension (line 2). Then, p samples from each label are selected and combined with the
explained input x0 into a set, called the pivots (lines 3 to 7). After that, EMaP generates perturbations along the orthogonal directions of the manifold from each pivot (line 10). Pivots are used to provide the explainer with a broader range of perturbations, leading to improved performance. Specifically, since the manifold learned by UMAP is optimized for capturing global structure, orthogonal directions computed directly on this manifold can be highly inaccurate. By introducing pivots, EMaP can learn multiple local subspaces, enabling the generation of more accurate orthogonal perturbations. In the next paragraphs, we describe those key steps in more detail.

Algorithm 1 EMaP
Input: Data to explain x0, a subset of training data (X, y), number of pivots per label p, number of perturbations per pivot k, lower dimension V, and noise level r.
Output: Xr and Dr. Xr contains k(pl + 1) orthogonal perturbations locally around x0 and points in X. Dr contains the low-dimensional distances of points in Xr to x0 (l is the number of unique labels in y).
1: Initialize an EMaP sampler object M
2: M.mapper <- mapper to the manifold of dimension V of X
3: M.pivots <- empty set
4: Include x0 in M.pivots
5: for each class l in y do
6:    Include p samples of class l in M.pivots
7: end for
8: Xr <- empty set, Dr <- empty set
9: for each data point x in M.pivots do
10:    X', D' <- M.gen_perturbation(x, k, r)
11:    Include X' in Xr and include D' in Dr
12: end for
13: return Xr and Dr

The mapper. In our implementation of EMaP, the mapper is learned from a manifold approximated by UMAP (McInnes et al., 2018). Since the manifold learned by UMAP is optimized for global information, the orthogonal directions computed on top of that manifold at local data points are prone to high error. This can degrade the correctness of orthogonal perturbations significantly. To overcome this issue, EMaP learns a local subspace for each pivot and generates the orthogonal perturbations on top of that subspace.
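The pivot-selection step (lines 3 to 7 of Algorithm 1) can be sketched as follows; `select_pivots` is a hypothetical helper written for illustration, not the authors' implementation:

```python
import numpy as np

def select_pivots(x0, X, y, p, seed=0):
    """Build the pivot set of Algorithm 1: the explained input x0 plus
    p samples drawn from each class label in y."""
    rng = np.random.default_rng(seed)
    pivots = [np.asarray(x0, dtype=float)]
    for label in np.unique(y):
        idx = np.flatnonzero(np.asarray(y) == label)
        chosen = rng.choice(idx, size=p, replace=False)
        pivots.extend(np.asarray(X, dtype=float)[chosen])
    return np.stack(pivots)  # shape: (p * n_labels + 1, N)

# Toy usage: 30 points in 3 classes, 3 pivots per class plus x0.
X = np.random.default_rng(1).random((30, 4))
y = np.repeat([0, 1, 2], 10)
pivots = select_pivots(X[0], X, y, p=3)
```

Each pivot then gets its own local subspace and its own batch of orthogonal perturbations.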
Intuitively, the local subspace is a local affine approximation of the manifold, so the resulting orthogonal directions are more finely tuned for the local points. We denote by Gx the N x V matrix characterizing the local subspace at x. Since Gx is a linear approximation of data points near x, by denoting \omega : R^N \to R^V the function embedding input data into the manifold, we have

z_{low} := \omega(z) \approx G_x^\top z \quad \text{and} \quad z \approx G_x z_{low},

where z in R^N are points near x and z_{low} are their embeddings in R^V. In our current implementation, the set of pivots contains the explained data point and p data points sampled from each class label l (see Algorithm 1).

EMaP orthogonal perturbations. The key step of EMaP is the generation of an orthogonal perturbation \tilde{x} from x (line 10, Alg. 1), which can be expressed as:

\tilde{x} = x + \text{noise} - \text{Proj}_{G_x}(\text{noise}), \tag{5}

where the noise is sampled from a multivariate normal distribution and \text{Proj}_{G_x}(\text{noise}) is the projection of the noise onto the local subspace characterized by Gx. Upon obtaining the orthogonal perturbations, we can compute their low-dimensional embeddings using the mapper's transform function \omega. The pseudo-code for this computation is in Algorithm 2.

Algorithm 2 gen_perturbation
Input: Input x, number of perturbations k, and noise level r.
Output: k orthogonal perturbations of x and their low-dimensional distances to x.
1: Gx <- self.get_local_subspace(x)
2: X' <- empty set
3: for 1 <= i <= k do
4:    noise <- N(0, \Sigma_r)
5:    \tilde{x} <- x + noise - Proj_{Gx}(noise)
6:    Include \tilde{x} in X'
7: end for
8: X'_low <- self.mapper.transform(X')
9: x_low <- self.mapper.transform(x)
10: D' <- distances of each member of X'_low to x_low
11: return X' and D'

Local-subspace approximation. The correctness of the orthogonal perturbations, i.e. whether the perturbations actually lie in the orthogonal subspace, depends heavily on the correctness of the computation of Gx. We now discuss how EMaP learns the local-subspace matrix Gx.
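Under the simplifying assumption that Gx has orthonormal columns (so that Proj_{Gx}(noise) = Gx Gx^T noise), Eq. (5) is a few lines of numpy; the helper name is ours, not from the EMaP code:

```python
import numpy as np

def orthogonal_perturbation(x, G, noise):
    """Eq. (5): keep only the component of the noise orthogonal to the
    local subspace spanned by the columns of G. Assumes orthonormal
    columns; a non-orthonormal G would need a pseudoinverse instead."""
    proj = G @ (G.T @ noise)   # Proj_G(noise) for orthonormal G
    return x + noise - proj

rng = np.random.default_rng(1)
N, V = 10, 2
G, _ = np.linalg.qr(rng.normal(size=(N, V)))   # orthonormal N x V basis
x = rng.normal(size=N)
x_pert = orthogonal_perturbation(x, G, 1e-2 * rng.normal(size=N))
```

By construction, the displacement x_pert - x has no component inside the local subspace, i.e. G.T @ (x_pert - x) vanishes.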
Ideally, given a set Z of data near x on the manifold, we could solve the following optimization for Gx:

G_x = \arg\min_G \sum_{z \in Z} \| G\omega(z) - z \|, \tag{6}

where \omega is the transform function embedding input data into the manifold learned in the previous step. The intuition is that, for all z on the manifold and near x, we search for a matrix G such that the inverse mapping G\omega(z) \approx G z_{low} approximately recovers the original value z. Note that if the embedding \omega is exact on Z and the manifold is affine, the optimization (6) achieves its optimal value 0 for some G. Since it is not trivial to obtain a set Z belonging to the manifold, EMaP perturbs around x with some random noise and solves the following optimization instead:

\hat{G}_x = \arg\min_G \sum_{r \in B} \| G\omega(x + r) - (x + r) \|, \tag{7}

where B is a ball centered at 0 with noise radius r. EMaP solves (7) for an approximation of the local subspace. Further details of this step are provided in Algorithm 3.

Algorithm 3 get_local_subspace
Input: Data point z. Hyper-parameters: number of training samples k_T and noise level for training r_T.
Output: Matrix G characterizing the local subspace at z.
1: Z' <- empty set
2: for 1 <= i <= k_T do
3:    noise <- N(0, \Sigma_{r_T})
4:    \tilde{z} <- z + noise
5:    Include \tilde{z} in Z'
6: end for
7: Z'_low <- self.mapper.transform(Z')
8: G <- \arg\min_W \| Z' - W Z'_low \|_2
9: return G

We now discuss the gap between the ideal solution Gx and the approximation \hat{G}_x used by EMaP, i.e., between the solutions of (6) and (7). This gap can be characterized by bounding the error between \{G_x\omega(z)\} and \{\hat{G}_x\omega(z)\}, i.e. the signals in R^N reconstructed using Gx and \hat{G}_x, respectively. Lemma 6 provides a bound on that reconstruction error where the set Z in (6) is the projection of a ball B onto the data manifold. The bound holds under a mild assumption that the optimal objective of (6) is not larger than that of (7).
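The least-squares step in line 8 of Algorithm 3 can be sketched with a linear stand-in for the mapper \omega (in EMaP, \omega is UMAP's transform; the linear map below is an assumption that keeps the example self-contained and exactly affine):

```python
import numpy as np

rng = np.random.default_rng(2)
N, V, k_T, r_T = 8, 2, 2000, 1e-2

# Stand-in mapper: an exact linear embedding omega(z) = z A, where the
# "manifold" is the column space of the orthonormal N x V matrix A.
A, _ = np.linalg.qr(rng.normal(size=(N, V)))
omega = lambda Z: Z @ A

x = A @ rng.normal(size=V)                    # a point on the manifold
Z_bar = x + r_T * rng.normal(size=(k_T, N))   # noisy samples around x
Z_low = omega(Z_bar)                          # their embeddings in R^V

# Line 8 of Algorithm 3: G = argmin_W || Z_bar - Z_low W^T ||, an
# ordinary least-squares fit of the inverse mapping R^V -> R^N.
W, *_ = np.linalg.lstsq(Z_low, Z_bar, rcond=None)
G_hat = W.T                                   # shape (N, V)

# Reconstructing x through the estimated local subspace is accurate.
recon_err = np.linalg.norm(G_hat @ (A.T @ x) - x)
```

With a nonlinear mapper such as UMAP the same least-squares fit yields only a local affine approximation, which is exactly the role of Gx in the paper.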
We find this assumption reasonable and intuitive: as the set Z lies in a subspace of dimension V while the set of points x + r is a ball in R^N, finding a subspace of dimension V approximating the subspace containing Z should give a much lower error.

Lemma 6 Assume that all data points x belong to the same affine space R^V. Let Proj be the projection onto R^V. Then, under the above assumption on the optimizations (6) and (7), the reconstruction error on perturbed data points is upper bounded by:

\sum_{r \in B} \| \hat{G}_x\omega(x + r) - G_x\omega(\text{Proj}(x + r)) \| \le 2F_B(\omega) + \sum_{r \in B} \| r_\perp \|,

where r_\perp is the orthogonal component of r and F_B(\omega) := \min_G \sum_{r \in B} \| G\omega(x + r) - (x + r) \|.

Proof For simplicity, we rewrite:

G_1 = \arg\min_G \sum_{r \in B} \| G\omega(x + r) - (x + r) \|,
G_2 = \arg\min_G \sum_{r \in B} \| G\omega(\text{Proj}(x + r)) - \text{Proj}(x + r) \|.

The assumption regarding the objectives of (6) and (7) mentioned in the Lemma can be rewritten as:

\sum_{r \in B} \| G_2\omega(\text{Proj}(x + r)) - \text{Proj}(x + r) \| \le \sum_{r \in B} \| G_1\omega(x + r) - (x + r) \|.

We find this assumption reasonable since its left-hand side equals 0 in the ideal scenario, i.e., when x \in R^V for all x in our data set. With that, we have:

2 \sum_{r \in B} \| G_1\omega(x + r) - (x + r) \|
\ge \sum_{r \in B} \| G_1\omega(x + r) - (x + r) \| + \sum_{r \in B} \| G_2\omega(\text{Proj}(x + r)) - \text{Proj}(x + r) \|
\ge \sum_{r \in B} \| G_1\omega(x + r) - G_2\omega(\text{Proj}(x + r)) - ((x + r) - \text{Proj}(x + r)) \|
\ge \sum_{r \in B} \left( \| G_1\omega(x + r) - G_2\omega(\text{Proj}(x + r)) \| - \| (x + r) - \text{Proj}(x + r) \| \right)
= \sum_{r \in B} \left( \| G_1\omega(x + r) - G_2\omega(\text{Proj}(x + r)) \| - \| r - \text{Proj}(r) \| \right),

where the last two inequalities follow from the triangle inequality. The last equality is due to the fact that (x + r) - \text{Proj}(x + r) is the orthogonal component of x + r and that x has no orthogonal component.

Note that F_B(\omega) is small if the manifold is affine in the neighborhood B of x and \omega is good, i.e. \hat{G}_x is a good estimator for Gx under the above assumption.

7. Experiments

We evaluate EMaP on two main objectives: the explainer's performance and the perturbations' robustness. Our experiments are conducted on 3 tabular data sets, 2 image data sets, and 4 text data sets of reviews in multiple domains.
They are COMPAS (Larson et al., 2016), Communities and Crime (Redmond, 2011), German Credit (Hofmann, 1994), MNIST (LeCun and Cortes, 2010), Fashion-MNIST (Xiao et al., 2017), and 4 review data sets from the Multi-Domain Sentiment collection (Blitzer et al., 2007).

7.1 Experimental settings

We first describe how the data sets and the testing models are processed in our experiments. Then, we provide details of the implementation of EMaP along with the baselines.

Data sets and model hyper-parameters. All reported results for real-world data sets involve at least 2000 data points and 100 runs, except for German Credit, where the data consist of only 1000 samples. The text data sets are tokenized into vectors of 1000 input features, while the MNIST and Fashion-MNIST data sets are of dimension 1 x 28 x 28. The model inputs in all experiments are normalized between 0 and 1. The experimental model for the text data sets is the L1 logistic regression used by LIME (Ribeiro et al., 2016). The testing models for the two image data sets are 2-layer convolutional networks implemented in PyTorch (Paszke et al., 2019), with test-set accuracies of 98% for MNIST and 93% for Fashion-MNIST.

Technical implementation of EMaP and baselines. Generally, EMaP perturbations can be used to leverage any model-agnostic perturbation-based method; however, the modification may require significant changes to the existing implementations of those methods. In our experiments, we use EMaP to leverage LIME (Ribeiro et al., 2016) as a proof of concept, i.e., we show that EMaP can improve LIME in terms of performance. We chose LIME to demonstrate the usage of EMaP since it requires only a few modifications to integrate EMaP's perturbations into LIME. This helps demonstrate fairly the gain of applying EMaP to explanation methods.
We use the notation EMaP-LIME (or EMaP for short) to indicate LIME with EMaP's perturbations in the following experimental results. To apply EMaP to LIME, we simply choose the loss function (defined in Eq. (1)) as:

L(f, g, \pi_x) = \sum_{\tilde{x} \in X_r} \pi_x(\tilde{x}) \left( f(\tilde{x}) - g(\tilde{x}) \right)^2,

where \pi_x(\tilde{x}) = \exp(-D(x, \tilde{x})^2 / \sigma^2), with the distance D computed as in Algorithm 2 and g a linear function of the features of the perturbation. Besides LIME, we also heuristically compare EMaP-LIME to some other black-box and white-box explanation methods. The following are brief descriptions of those methods:

LIME zero: LIME with perturbations whose perturbed features are set to zero. This scheme is used by LIME in explaining text data.

LIME+: LIME with perturbations whose perturbed features have Gaussian noise added. This scheme is used by LIME in explaining image data.

LIME*: LIME with perturbations whose perturbed features are multiplied by uniform noise between 0 and 1. This can be considered a smoother version of LIME zero. We include this variant of LIME because it provides nice explanations in some specific experiments (see Appendix A for some examples).

Kernel SHAP: a black-box method based on the Shapley value (Lundberg and Lee, 2017), whose perturbed features are set to the average of some background data.

Gradient SHAP: a white-box method based on the Shapley value (Lundberg and Lee, 2017), which relies on the gradient of the model.

DeepLIFT: a white-box method based on back-propagating the model (Shrikumar et al., 2017).

Parzen (Baehrens et al., 2010): an explanation method that approximates the model globally and uses the gradients of that approximating model to explain the predictions.

Greedy: features contributing the most to the predicted class are removed and selected until the prediction changes. The implementation is taken from (Ribeiro et al., 2016).
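The weighted loss above can be sketched directly; `emap_lime_weights` and `fit_local_linear` are hypothetical helpers (the real pipeline would take D from Algorithm 2 and interpretable features from LIME):

```python
import numpy as np

def emap_lime_weights(D, sigma=0.25):
    """Kernel pi_x(x~) = exp(-D(x, x~)^2 / sigma^2), with D the
    low-dimensional distances of Algorithm 2. sigma is illustrative."""
    D = np.asarray(D, dtype=float)
    return np.exp(-(D ** 2) / sigma ** 2)

def fit_local_linear(F, Phi, w):
    """Weighted least squares for the surrogate g: minimize
    sum_i w_i * (f(x_i) - g(x_i))^2 with g linear in the features Phi."""
    sw = np.sqrt(np.asarray(w, dtype=float))
    coef, *_ = np.linalg.lstsq(sw[:, None] * Phi, sw * F, rcond=None)
    return coef

# Toy usage: a black-box f that happens to be linear is recovered exactly.
rng = np.random.default_rng(3)
Phi = rng.normal(size=(50, 2))           # interpretable features
F = Phi @ np.array([1.0, -2.0])          # black-box outputs
w = emap_lime_weights(rng.random(50))    # distances -> kernel weights
coef = fit_local_linear(F, Phi, w)
```

Closer perturbations get weights near 1 and dominate the fit, which is how the low-dimensional distances encode locality.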
The noise vectors used to generate perturbations for all applicable explainers have radius 10^-3 for text data and 10^-4 for image data, which are in the range analyzed in the previous Bottleneck distance experiments in Figs. 6 and 7. The noise radius r_T used to approximate the local subspaces of EMaP (Algorithm 3) is chosen equal to the noise radius r for perturbation. For UMAP's hyper-parameters, we use n_components in {2, 3} and min_dist = 0.1 (the default value). Our source code is attached in the supplementary material of this submission. Finally, for a fair comparison, the number of perturbations used to generate the explanation of any reported method is 1000.

Figure 9: Examples of explanations for different methods on MNIST and Fashion-MNIST: altering white pixels to dark or vice versa in the most red (or blue) regions would weaken (or reinforce) the original prediction.

7.2 Explaining performance of explanation methods

For illustrative purposes, Fig. 9 shows the resulting explanations on MNIST and Fashion-MNIST generated by EMaP and some other black-box methods (more examples are in Appendix A). All methods use 1000 perturbations to generate the corresponding explanations. We can see that, while EMaP and SHAP capture relevant features, LIME includes many noisy features in the background. Note that the reason LIME* does not include the background is that the background of the original image has a value of 0; through multiplication, those features are not perturbed, which makes their weights 0 in the explaining model g. Regarding the low quality of LIME zero and LIME+, the main reason is that these methods would require a much larger number of perturbations to generate good results at the pixel level. We want to point out that the only difference between EMaP and LIME is in the perturbations and in how the distances between the perturbations and the original data points are computed.
This clearly demonstrates that optimizing the perturbations can greatly improve the explanation's quality. Another interesting observation is that the red areas on top of the digit 4 and at the bottom of the Trouser (the second and third rows of Fig. 9) are identified only by EMaP. In fact, those areas are essential to the predictions since they differentiate the samples from other classes, i.e., the digit 9 and the Dress, respectively. Those areas, on the other hand, are neglected by all other explainers.

Figure 10: The precision and recall of explanations returned by Greedy, LIME, Parzen, and EMaP-LIME (the higher the better). The dots are in increasing order of the number of features in the explanations (left to right).

Text data sets. We report our experimental results for the sentiment classification task on the Multi-Domain Sentiment data set (Blitzer et al., 2007). We follow the experimental setup in (Ribeiro et al., 2016), in which the ground-truth explanatory features of the L1 logistic regression model are known. In particular, a regression model is trained for each data set, and the ground-truth explanatory features are its non-zero coefficients. For each explanatory budget, i.e., number of features included in the explanations, the fraction of these ground-truth features recovered by the explanations is computed. The performance of an explainer is evaluated by the precision and the recall of the features in its explanations. Intuitively, the more features included in the explanation, the higher the recall and the lower the precision. Fig. 10 shows the scatter plot of precision vs. recall of LIME and EMaP-LIME on the 4 review data sets: books, DVDs, kitchen, and electronics. Similar to how LIME is evaluated on these data sets in its original paper, we compare EMaP-LIME to LIME, Greedy, and the Parzen explanation method (Baehrens et al., 2010).
In Greedy, the features contributing the most to the predicted class are removed until the prediction changes. Parzen, on the other hand, approximates the model globally with Parzen windows, and the explanation is the gradient of the prediction. The results clearly show that EMaP consistently improves the faithfulness of LIME.

Figure 11: RDT-fidelity scores of different perturbation-based methods on MNIST and Fashion-MNIST (the higher the better).

Image data sets. Since there are no ground-truth explanations for the MNIST and Fashion-MNIST image data sets, we evaluate explanations using the RDT-fidelity (Funke et al., 2023), the infidelity score (Yeh et al., 2019), and the log-odds score (Shrikumar et al., 2017). The RDT-fidelity of an explanation S with respect to a model f is the probability that the model's prediction does not change if the features in the explanation remain unchanged:

\text{RDT}(S) = \mathbb{E}\left[ \mathbb{1}\left( f(x) = f(x_S) \right) \right],

where x_S is x with noise added to the features not in S, and the expectation is taken over the noise. The infidelity score measures the expected error between the explanation multiplied by a meaningful perturbation and the difference between the predictions at the input and at the perturbation; the metric can be considered a generalized notion of Sensitivity-n (Ancona et al., 2018). Intuitively, explanations with lower infidelity are more desirable. Finally, given the importance weights on the features (i.e., the explanation), the log-odds score measures the difference in prediction between the image and a modified image whose pixels are erased based on their importance weights. Intuitively, the higher the log-odds score, the better the explanation.

Figure 12: Infidelity scores of different perturbation-based methods on MNIST and Fashion-MNIST (the lower the better).

Figure 13: Log-odds scores of different perturbation-based methods on MNIST and Fashion-MNIST (the higher the better).
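The RDT-fidelity expectation can be estimated by Monte Carlo. The sketch below uses a toy classifier and helper names of our own choosing (the paper's experiments apply this to the trained convolutional networks):

```python
import numpy as np

def rdt_fidelity(f, x, S, noise_std=0.3, n_samples=2000, seed=0):
    """Monte-Carlo estimate of RDT(S) = E[ 1(f(x) = f(x_S)) ]: keep the
    features in the explanation S fixed, add noise to the rest, and
    count how often the predicted label survives. f maps a batch of
    inputs to predicted class labels."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    keep = np.zeros(x.size, dtype=bool)
    keep[list(S)] = True                     # features left unchanged
    X_s = np.tile(x, (n_samples, 1))
    X_s[:, ~keep] += noise_std * rng.normal(size=(n_samples, (~keep).sum()))
    return float(np.mean(f(X_s) == f(x[None, :])[0]))

# Toy model: the class is the sign of the first feature only.
f = lambda X: (X[:, 0] > 0).astype(int)
x = np.array([0.1, 0.0, 0.0])

fid_keep_decisive = rdt_fidelity(f, x, S=[0])     # decisive feature fixed
fid_drop_decisive = rdt_fidelity(f, x, S=[1, 2])  # decisive feature noised
```

Explanations that keep the decisive feature fixed achieve fidelity 1, while explanations that expose it to noise score noticeably lower.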
The evaluations using the RDT-fidelity are computed on explanations of K = 40 and 80 features of the explained image (Fig. 11). The results clearly demonstrate the high fidelity of EMaP on both image data sets. Fig. 12 shows the infidelity scores. It is clear that EMaP has the lowest infidelity score among all black-box methods. Even though the white-box methods, Gradient SHAP and DeepLIFT, have more information about the explained models than EMaP, they only outperform EMaP on Fashion-MNIST. Additionally, Fig. 13 shows the log-odds scores of EMaP with low dimension d = 2 and d = 3, along with those of other explanation methods and other perturbation schemes. In these experiments, the scores are computed with 20% of the pixels erased. On MNIST, we can see that EMaP does not degrade the explainer's performance compared to LIME in terms of log-odds (note that the default setting of LIME for image data is LIME+). On Fashion-MNIST, EMaP improves the log-odds significantly. More examples of EMaP and other explanation methods are provided in Appendix A.

7.3 Perturbation robustness

We evaluate the robustness of a perturbation scheme based on a discriminator's performance in differentiating its perturbations from the original data. Following the setup in (Slack et al., 2020), the discriminator is trained with full knowledge of the explainer's parameters. This discriminator has been shown to be able to recognize the perturbations of the LIME and SHAP explainers. Note that, as DeepLIFT and Gradient SHAP are white-box methods and do not use perturbations, we only evaluate the robustness of LIME's and Kernel SHAP's perturbations. Since there is no ambiguity, we use SHAP to denote Kernel SHAP in the following results. Our experimental results show that EMaP's perturbations are more robust to the discriminator. Specifically, Figs.
14, 15, 16, and 17 show the True-Positive (TP) and True-Negative (TN) rates of discriminators on the perturbations of 2 image data sets and 2 tabular data sets. For the image data sets, the discriminators can easily recognize the perturbations generated by LIME and SHAP. EMaP perturbations, on the other hand, significantly lower the discriminators' success rates in recognizing the perturbations. While the explainers' perturbation schemes prove to be slightly more robust on the tabular data sets than on the image data sets, EMaP still improves the perturbations' robustness remarkably.

7.4 Computational resources and run time

Our experiments are conducted on a single GPU-assisted compute node running a 64-bit Linux operating system. The allocated resources include 32 CPU cores (AMD EPYC 7742) with 2 threads per core and 100GB of RAM. The node is also equipped with 8 GPUs (NVIDIA DGX A100 SuperPod), with 80GB of memory per GPU.

                EMaP initialization   LIME    EMaP (d=2)   EMaP (d=3)
MNIST           240-260               0.763   1.311        1.493
Fashion-MNIST   240-260               0.726   1.502        1.467

Table 4: Run time of EMaP and LIME in generating explanations using 1000 perturbations per explanation. The reported numbers are seconds (for the initialization column) and seconds per explanation (for the other columns).

The run time of EMaP is mostly dictated by the learning of the embedding function (line 2 of Algorithm 1). That initialization step takes less than 2 minutes on the tabular data sets of about 2000 data points, and between 240 and 260 seconds for all 60000 MNIST/Fashion-MNIST images. Note that the manifold and the local subspaces can be computed before deployment since they do not depend on the explained inputs; thus, this overhead of EMaP can be mitigated at deployment. Table 4 reports the actual run times of EMaP and LIME on the image data sets. 8.
Conclusion, limitations, and future research

From our theoretical and experimental results, we exploit the data manifold to preserve the topological information of the data in its perturbations. We implement the EMaP algorithm to realize this idea and demonstrate its benefits for the explanation task. The main limitation of EMaP lies in its requirement for low-dimensional representations of the data and for the local affine subspaces; for more complex data, computing these correctly can be very challenging. There are several interesting open questions on EMaP that we leave for future work. For instance, it is important to study the impact of the underlying manifold-learning algorithm, i.e., UMAP, on the perturbations and the explanations. It is also interesting to examine the behavior of EMaP with a wider range of explainers and applications.

Acknowledgement

This work is partially supported by the National Science Foundation under Grants No. FAI-1939725 and SCH-2123809.

Figure 14: True-Positive and True-Negative rates of the discriminators on perturbations of different methods on the MNIST data set (the lower the better).

Figure 15: True-Positive and True-Negative rates of the discriminators on perturbations of different methods on the Fashion-MNIST data set (the lower the better).

Figure 16: True-Positive and True-Negative rates of the discriminators on perturbations of different methods on the Communities and Crime data set (the lower the better).

Figure 17: True-Positive and True-Negative rates of the discriminators on perturbations of different methods on the German Credit data set (the lower the better).

Appendix A. Visualizations of synthetic data and explanations generated with EMaP

This appendix provides visualizations of our synthetic data (from the experiments of Table 3) and of explanations generated with and without EMaP.
The explanations of EMaP shown in this appendix are those that appeared in the experiments reported in Section 7 of the main manuscript.

Figure 18: Visualization of some synthetic data in the experiments of Table 3.

In Fig. 18, we visualize the synthetic data of different shapes and their perturbations in three dimensions. We also report the H0 and H1 Bottleneck distances between the perturbations and the original data in that figure.

Figure 19: Visualization of EMaP-LIME explanations on the Multi-polarity-books review data sets (Blitzer et al., 2007).

Figure 20: Visualization of EMaP-LIME explanations on the Multi-polarity-kitchen review data sets (Blitzer et al., 2007).

Figs. 19 and 20 compare the actual explanations returned by LIME and EMaP-LIME (EMaP) on the book reviews in the Multi-Domain Sentiment data sets. We can see that the weights of the features included in the explanations are quite similar between the two methods. Fig. 21 shows the explanations of LIME with EMaP for all classes in the MNIST data set. The red (blue) areas mean that, if the features in those areas are unchanged (changed), the prediction for that class will be stronger (weaker). Finally, Figs. 22 and 23 provide some explanations of EMaP along with explanations of other methods on MNIST and Fashion-MNIST, respectively. The color code has the same meaning as in Fig. 21.

Figure 21: Visualization of EMaP-LIME explanations on the MNIST data set. The column indicates the class label to be explained.

Figure 22: Examples of explanations on MNIST. Modifying the reddest (bluest) area would negate (strengthen) the original prediction.

Figure 23: Examples of explanations on Fashion-MNIST. Modifying the reddest (bluest) area would negate (strengthen) the original prediction.

References

M. Ancona, E. Ceolini, C. Öztireli, and M. Gross.
Towards better understanding of gradient-based attribution methods for deep neural networks. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=Sy21R9JAW.

D. Baehrens, T. Schroeter, S. Harmeling, M. Kawanabe, K. Hansen, and K.-R. Müller. How to explain individual classification decisions. Journal of Machine Learning Research, 11(61):1803-1831, 2010. URL http://jmlr.org/papers/v11/baehrens10a.html.

J. Blitzer, M. Dredze, and F. Pereira. Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 440-447, Prague, Czech Republic, June 2007. Association for Computational Linguistics. URL https://aclanthology.org/P07-1056.

J. Chakraborty, K. Peng, and T. Menzies. Making fair ML software using trustworthy explanation. CoRR, abs/2007.02893, 2020. URL https://arxiv.org/abs/2007.02893.

C.-H. Chang, E. Creager, A. Goldenberg, and D. Duvenaud. Explaining image classifiers by adaptive dropout and generative in-filling. arXiv preprint arXiv:1807.08024, 2018.

F. Chazal, D. Cohen-Steiner, L. Guibas, F. Mémoli, and S. Oudot. Gromov-Hausdorff stable signatures for shapes using persistence. Comput. Graph. Forum, 28:1393-1403, 07 2009. doi: 10.1111/j.1467-8659.2009.01516.x.

I. C. Covert, S. Lundberg, and S.-I. Lee. Explaining by removing: A unified framework for model explanation. J. Mach. Learn. Res., 22(1), Jan 2021. ISSN 1532-4435.

B. Dimanov, U. Bhatt, M. Jamnik, and A. Weller. You shouldn't trust me: Learning models which conceal unfairness from multiple explanation methods. In SafeAI@AAAI, 2020.

A.-K. Dombrowski, M. Alber, C. J. Anders, M. Ackermann, K.-R. Müller, and P. Kessel. Explanations can be manipulated and geometry is to blame. In H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. B. Fox, and R. Garnett, editors, NeurIPS, pages 13567-13578, 2019.
URL http://dblp.uni-trier.de/db/conf/nips/nips2019.html#DombrowskiAAAMK19.

H. Edelsbrunner and J. Harer. Computational Topology: An Introduction. Applied Mathematics. American Mathematical Society, 2010. ISBN 9780821849255. URL https://books.google.com.vn/books?id=MDXa6gFRZuIC.

T. Funke, M. Khosla, M. Rathee, and A. Anand. Zorro: Valid, sparse, and stable explanations in graph neural networks. IEEE Transactions on Knowledge & Data Engineering, 35(08):8687-8698, Aug 2023. ISSN 1558-2191. doi: 10.1109/TKDE.2022.3201170.

A. Ghorbani, A. Abid, and J. Zou. Interpretation of neural networks is fragile. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01):3681-3688, Jul. 2019. doi: 10.1609/aaai.v33i01.33013681. URL https://ojs.aaai.org/index.php/AAAI/article/view/4252.

R. Ghrist. Barcodes: The persistent topology of data. Bulletin (New Series) of the American Mathematical Society, 45, 02 2008. doi: 10.1090/S0273-0979-07-01191-3.

R. Ghrist. Elementary Applied Topology. CreateSpace Independent Publishing Platform, 2014. ISBN 9781502880857. URL https://books.google.com.vn/books?id=Z5ATogEACAAJ.

M. L. Gromov. Groups of polynomial growth and expanding maps. Publ. Math., Inst. Hautes Étud. Sci., pages 53-73, 1981.

J. Heo, S. Joo, and T. Moon. Fooling neural network interpretations via adversarial model manipulation. In NeurIPS, 2019.

H. Hofmann. UCI machine learning repository, 1994. URL http://archive.ics.uci.edu/ml.

J. Larson, S. Mattu, L. Kirchner, and J. Angwin. How we analyzed the COMPAS recidivism algorithm, 2016.

Y. LeCun and C. Cortes. MNIST handwritten digit database. 2010. URL http://yann.lecun.com/exdb/mnist/.

Z. C. Lipton. The mythos of model interpretability. Queue, 16(3):31-57, June 2018. ISSN 1542-7730. doi: 10.1145/3236386.3241340. URL https://doi.org/10.1145/3236386.3241340.

S. M. Lundberg and S.-I. Lee. A unified approach to interpreting model predictions.
In Advances in Neural Information Processing Systems 30, pages 4765 4774. 2017. S. M. Lundberg, G. Erion, H. Chen, A. De Grave, J. M. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal, and S.-I. Lee. From local explanations to global understanding with explainable ai for trees. Nature Machine Intelligence, 2(1):56 67, Jan 2020. ISSN 2522-5839. doi: 10.1038/s42256-019-0138-9. URL https://doi.org/10.1038/ s42256-019-0138-9. L. Mc Innes, J. Healy, N. Saul, and L. Grossberger. Umap: Uniform manifold approximation and projection. The Journal of Open Source Software, 3(29):861, 2018. W. J. Murdoch, C. Singh, K. Kumbier, R. Abbasi-Asl, and B. Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071 22080, 2019. ISSN 0027-8424. doi: 10.1073/pnas.1900654116. URL https://www.pnas.org/content/116/44/22071. F. M emoli and G. Sapiro. Comparing point clouds. In SGP 04: Proceedings of the 2004 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, volume 71, pages 33 42, 01 2004. doi: 10.1145/1057432.1057436. Vu, Mai, and Thai A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. De Vito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pages 8026 8037. 2019. M. Redmond. UCI machine learning repository, 2011. URL http://archive.ics.uci. edu/ml. M. T. Ribeiro, S. Singh, and C. Guestrin. Why should i trust you? : Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, page 1135 1144, 2016. ISBN 9781450342322. doi: 10.1145/2939672.2939778. S. Saito, E. Chua, N. Capel, and R. Hu. Improving lime robustness with smarter locality sampling. 
Ar Xiv, abs/2006.12302, 2020. P. Schwab and W. Karlen. CXPlain: Causal Explanations for Model Interpretation under Uncertainty. In Advances in Neural Information Processing Systems (Neur IPS), 2019. A. Shrikumar, P. Greenside, and A. Kundaje. Learning important features through propagating activation differences. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 3145 3153, 06 11 Aug 2017. D. Slack, S. Hilgard, E. Jia, S. Singh, and H. Lakkaraju. Fooling lime and shap: Adversarial attacks on post hoc explanation methods. In AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES), 2020. URL https://arxiv.org/pdf/1911.02508. pdf. E. ˇStrumbelj and I. Kononenko. Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 41:647 665, 2013. M. Sundararajan, A. Taly, and Q. Yan. Axiomatic attribution for deep networks. In D. Precup and Y. W. Teh, editors, Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, volume 70 of Proceedings of Machine Learning Research, pages 3319 3328. PMLR, 2017. URL http: //proceedings.mlr.press/v70/sundararajan17a.html. C. Tralie, N. Saul, and R. Bar-On. Ripser.py: A lean persistent homology library for python. The Journal of Open Source Software, 3(29):925, Sep 2018. doi: 10.21105/joss.00925. URL https://doi.org/10.21105/joss.00925. D. Vres and M. Robnik-Sikonja. Better sampling in explanation methods can prevent dieselgate-like deception. Co RR, abs/2101.11702, 2021. URL https://arxiv.org/abs/ 2101.11702. M. Vu and M. T. Thai. Pgm-explainer: Probabilistic graphical model explanations for graph neural networks. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages EMa P: Explainable AI with Manifold-based Perturbations 12225 12235. Curran Associates, Inc., 2020. 
URL https://proceedings.neurips.cc/ paper/2020/file/8fb134f258b1f7865a6ab2d935a897c9-Paper.pdf. H. Xiao, K. Rasul, and R. Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. Ar Xiv, abs/1708.07747, 2017. C.-K. Yeh, C.-Y. Hsieh, A. S. Suggala, D. I. Inouye, and P. Ravikumar. On the (in)fidelity and sensitivity of explanations. In Neur IPS, 2019. Z. Ying, D. Bourgeois, J. You, M. Zitnik, and J. Leskovec. GNNexplainer: Generating explanations for graph neural networks. In Advances in Neural Information Processing Systems 32, pages 9244 9255. 2019. M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, editors, Computer Vision ECCV 2014, pages 818 833, Cham, 2014. Springer International Publishing. ISBN 978-3-31910590-1.