# Causal Explanations for Sequential Decision Making

SAMER B. NASHED, University of Massachusetts Amherst, USA
SAADUDDIN MAHMUD, University of Massachusetts Amherst, USA
CLAUDIA V. GOLDMAN, Hebrew University, Israel
SHLOMO ZILBERSTEIN, University of Massachusetts Amherst, USA

Stochastic sequential decision-making systems such as Markov decision processes (MDPs) and their variants are increasingly used in areas such as transportation, healthcare, and communication. However, the ability to explain these systems' outputs to non-technical end users has not kept pace with their widespread adoption. This paper addresses that gap by extending prior work and presenting a unified framework for generating causal explanations of agent behavior in sequential decision-making settings, grounded in the structural causal model (SCM) paradigm. Our framework supports the generation of multiple, semantically distinct explanations for agent actions, capabilities that were previously unattainable. In addition to introducing a novel taxonomy of explanations for MDPs to guide empirical investigation, we develop both exact and approximate causal inference methods within the SCM framework. We analyze their applicability and derive run-time bounds for each. This leads to the proposed algorithm, Mean RESP, which operates flexibly across a spectrum of approximations tailored to external constraints. We further analyze the sample complexity and error rates of approximate Mean RESP, and provide a detailed comparison of its outputs under varying definitions of responsibility with those of popular Shapley-value-based methods. Empirically, we performed a series of experiments to evaluate the practicality and effectiveness of the proposed system, focusing on real-world computational demands and on the validity and reliability of metrics for comparing approximate and exact causal methods.
Finally, we present two user studies that reveal user preferences for certain types of explanations and demonstrate a strong preference for explanations generated by our framework compared to those from other state-of-the-art systems.

JAIR Associate Editor: Scott Sanner

JAIR Reference Format: Samer B. Nashed, Saaduddin Mahmud, Claudia V. Goldman, and Shlomo Zilberstein. 2025. Causal Explanations for Sequential Decision Making. Journal of Artificial Intelligence Research 83, Article 17 (July 2025), 62 pages. doi: 10.1613/jair.1.18126

Authors' Contact Information: Samer B. Nashed (corresponding author), snashed@cs.umass.edu, ORCID: 0009-0008-3010-2071, University of Massachusetts Amherst, Amherst, Massachusetts, USA; Saaduddin Mahmud, smahmud@cs.umass.edu, ORCID: 0000-0001-8767-0450, University of Massachusetts Amherst, Amherst, Massachusetts, USA; Claudia V. Goldman, claudia.goldman@mail.huji.ac.il, Hebrew University, Jerusalem, Israel; Shlomo Zilberstein, shlomo@cs.umass.edu, ORCID: 0000-0001-9817-7848, University of Massachusetts Amherst, Amherst, Massachusetts, USA.

This work is licensed under a Creative Commons Attribution International 4.0 License. Copyright 2025 held by the owner/author(s). doi: 10.1613/jair.1.18126

1 Introduction

Systems for automated sequential decision making are now used in a variety of applications, including autonomous driving [159], healthcare [14], and communication networks [5], among many others. Moreover, their rate of proliferation is predicted to increase [95, 136, 52]. Whether designed for the general public or as aids in specific industries or applications, these systems often interface primarily with people who are not experts in the science of decision making and AI, or even well-versed in the principles and concepts of automation more broadly. Thus, understanding, managing, and mitigating the risks of misuse, disuse, and abuse of autonomous systems has been a topic of considerable interest for some time [121]. Among other findings, researchers have established that developing trust is required for the adoption and proficient use of AI systems [82, 143, 162, 66], and it is widely accepted that autonomous agents that can explain their decisions help promote trust [23, 51, 101]. However, there are many challenges in generating such explanations.

Consider, for example, an autonomous vehicle (AV) that has stopped behind a truck for a long time. The passenger may wonder whether the AV is waiting for the truck to move, waiting for an opportunity to pass the truck, or dealing with some technical problem. Generating suitable explanations of such a system is hard due to the complexity of planning, which may involve large state spaces, stochastic actions, imperfect observations, and complicated objectives. Furthermore, useful explanations must somehow reduce the internal reasoning process to a form understandable by a user who likely does not know all of the algorithmic details. Another challenging aspect is the heterogeneity of possible operational contexts and of interactions with users whose expectations and prior knowledge may vary substantially. For example, in the above AV scenario, the explanation provided to a driver deciding whether or not to intervene and take control of the vehicle may differ from that given to a passenger. In addition, different planning, learning, and decision-making algorithms may not admit universal mechanisms for explanation due to fundamental differences in the available information. The debate on the definition, taxonomy, and purpose of explanations has long been represented in the cognitive science, psychology, and philosophy literature.
While the debate is still active, there are several insights for which there is broad consensus [104, 105], and we use this knowledge to motivate our approach. Scholars studying explanations largely agree that requests for explanations are often motivated by a mismatch between the mental model of the requester and a logical conclusion based on observation [54, 55, 58, 57, 86, 157], which creates a form of generalized model reconciliation problem [22]. Researchers also agree that explanations often require counterfactual analysis [92, 83, 56, 85], which in turn requires causal determination [158, 130, 84]. There are several computational paradigms for causal analysis, including those based on conditional logic [80, 39] and statistics [36]. Among the most well-studied paradigms is the structural causal model (SCM), which has been popularized in computer science by Halpern and Pearl [44, 45, 43]. SCMs, also known as structural equation models (SEMs), are an object of study in their own right [13], as they form an important generalization of causal Bayesian networks while retaining the critical benefit of producing interpretable directed graphs. Their expressive power, relative simplicity of interpretation, and popularity in the broader community make them an attractive tool for analyzing the output of sequential decision-making systems. In particular, we study one significant class of decision-making models, the Markov decision process (MDP). MDPs and their derivatives have several properties that make them a popular and effective method for tackling a variety of real-world decision-making problems, most notably their ability to encode uncertainty in action outcomes as well as their generality and modeling flexibility. Thus, there is an obvious desire to generate explanations for the output of these models.
This work extends our previous work [114], in which we proposed a framework, based on SCMs, for applying causal analysis to the behavior of sequential decision-making agents that use MDPs as their decision-making models. Our framework creates an SCM representation of the computation needed to derive an optimal policy for an MDP and applies causal inference to identify variables whose current values cause certain agent behavior. These variables can then be used to generate explanations, for example, by completing natural language templates. Our framework provides two main benefits. First, it is theoretically sound, based on concepts and formalisms from the causality literature, while many existing approaches use heuristics [31, 71, 154, 67]. Second, it is more flexible, allowing us to identify multiple semantically distinct types of explanans, whereas existing approaches often explain events in terms of a single set of variables in the decision-making model. For example, they may use only state factors or only reward variables, whereas we may use any set, increasing our framework's applicability. Our analysis culminates in an algorithm, Mean RESP, and a process for translating MDP models into structures that facilitate causal analysis. Mean RESP may be applied to settings beyond the analysis of MDPs, is compatible with several definitions of cause and responsibility [24], and admits several approximate techniques for problems where exact inference is not tractable. We also propose several measures for comparing approximate Mean RESP outputs to exact methods. These measures capture a range of differences and underscore the difficulty of devising a single measure for evaluating objects as complicated, nuanced, and context-dependent as explanations.
We present both theoretical and empirical results covering a range of questions regarding our framework's applicability, efficiency, and efficacy. Theoretically, we determine the domain of problems for which our framework is exact and provide worst-case run-time bounds for the algorithms presented. We also include results on the correctness and sampling error rates for causal and responsibility determination by approximate Mean RESP. Empirically, we follow up on these theoretical results, measuring computation use, sampling error rates, and convergence rates in practice, and testing the proposed measures. We also measure disagreement between Mean RESP and Shapley-value-based methods when applied to neural networks representing both learned MDP policies and classifiers. Furthermore, we present results from two user studies. The first compares the proposed method to existing state-of-the-art heuristic methods, and we find statistically significant preferences in favor of explanations generated via causal reasoning. The second investigates whether and how user preferences for different explanations may change depending on the user's level of agency with respect to the agent providing the explanation in an autonomous driving scenario. In all, these results establish important properties of the proposed framework, highlight several of its benefits compared to existing approaches, and outline promising directions for future research on explaining stochastic planning systems.

2 Related Work

Automatically generating explanations for different AI systems is a rapidly growing field of research. At a high level, two primary characteristics create a natural taxonomy of these works: first, whether the system is used for planning or for prediction (either discrete or continuous), and second, whether the computation is performed in an explicitly derived model or in a learned, frequently black-box, model.
Although the two are not perfectly correlated, the prevalence of model-based planning and the popularity of black-box regression models have given rise to two distinct areas of focus, each with its own techniques and philosophies. Work on explaining the output of black-box machine learning algorithms [108, 88, 70] often uses the terms explainable or interpretable machine learning (XML). Many XML efforts focus on feature attribution, and these concepts have been applied to a large number of specific applications, including the natural sciences [127, 163], finance [17], industry [40, 41], and healthcare [90, 124], with a common approach being to compute and analyze Shapley values¹ [137, 128] and their approximations [2]. Many similar algorithms have been shown to be variations of Shapley value algorithms [89], and this insight has both popularized the application of Shapley values to the generation of explanations and encouraged many follow-up works, including modifications of the original algorithm so that user goals can inform and constrain the solution set [156] and new methods for constructing the background dataset [3]. Due to the black-box nature of state-of-the-art deep reinforcement learning techniques, there have also been several applications of Shapley values to explain the policies learned by these systems [53]. Although the networks ultimately represent MDP policies, elements of the decision-making model such as the reward, transition, and value functions are not explicitly represented and thus are not accessible to Shapley value analysis or any other explanation generation system. Recent work has highlighted additional shortcomings of these approaches for explanation generation.
Notably, these include the numerous plausible interpretations of how to apply these concepts to specific systems [145, 102], as well as challenges in handling conditional effects on the distribution of certain features, and potential ambiguity in interpreting the output, particularly in the context of actionable recourse [75]. Moreover, systems like SHAP assign values to single input variables, and do not capture the simultaneous effect of multiple counterfactual variable assignments. Overall, Shapley-value-based methods may apply favorably to some machine learning settings, which is why we include them in part of our analysis, but they do not seem appropriate for understanding and communicating the reasoning that occurs within stochastic sequential decision-making systems, such as agents that use MDPs.

¹Named after their inventor, economist Lloyd Shapley, Shapley values assign a distribution of partial payoffs (or parts of the total surplus) generated by a coalition of players in a cooperative game. They have since been adopted for explaining mostly black-box models in machine learning by reinterpreting "surplus" or "payoff" as "model output" and substituting "players" for "input features", leading to a high-level formalization of the form: the Shapley value for feature $x_i$ in model $\pi$ is $\frac{1}{n} \sum_{X \,:\, x_i \notin X} \frac{\pi(X \cup \{x_i\}) - \pi(X)}{N_{|X|}}$, where $n$ is the total number of features and $N_{|X|}$ is the number of subsets of features that have size $|X|$ and do not contain $x_i$. Their popularity has been driven in part by the existence of several nice properties, including efficiency, symmetry, linearity, and null player, which align with common intuitions regarding how Shapley values should be interpreted under certain conditions.
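To make the footnote's formalization concrete, the following sketch computes exact Shapley values for a small model. The function names and the convention of replacing an "absent" feature with a fixed baseline value are our own illustrative assumptions, not from the paper:

```python
from itertools import combinations
from math import comb

def shapley_values(features, model, baseline):
    """Exact Shapley values for `model`, where a feature absent from a
    coalition is replaced by its baseline value (an assumed convention)."""
    n = len(features)
    values = {}
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            for subset in combinations(others, k):
                present = set(subset)
                with_i = [features[j] if j in present or j == i else baseline[j]
                          for j in range(n)]
                without_i = [features[j] if j in present else baseline[j]
                             for j in range(n)]
                # Average the marginal contribution over subsets of equal
                # size, then average over sizes (the 1/n factor).
                total += (model(with_i) - model(without_i)) / comb(n - 1, k)
        values[i] = total / n
    return values
```

For a linear model, each feature's Shapley value reduces to its weight times its deviation from the baseline, and the values sum to the difference between the model's output on the input and on the baseline, which is the efficiency property mentioned in the footnote.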
While it is possible to apply Mean RESP to black-box systems, and we do compare the approximation error of an early Shapley-value-based feature attribution method [142] with that of different versions of approximate Mean RESP, our primary focus is on the analysis of planning models. Work on explainable planning (XAIP) has been somewhat more varied and generally aims either to explain the outputs of planning models and algorithms or to modify the planning algorithms so that they produce plans that are inherently more explainable [35, 21, 7]. These concepts have been captured in a variety of metrics and definitions [20], which measure different aspects of how a plan's execution intuitively aligns with user expectations. There has also been substantial work on observer-aware systems, where plans are constructed, executed, or implicitly communicated with the explicit notion of the presence of an observer, typically in a collaborative setting [28, 140, 106, 107, 42, 148]. While valuable, we view the philosophies motivating these approaches as largely orthogonal to, and potentially compatible with, our own. To date, most XAIP research has been devoted to deterministic planners, analyzing plans either during planning or after they have been executed. More closely aligned with the efforts described in this paper, there have been promising advances in incorporating counterfactual and contrastive reasoning into classical XAIP problems [74] and in applying XAIP formalisms to classification models [97]. Such methods, for example, may analyze plans with respect to higher-level properties, determining mutually exclusive properties or sets of properties that different plans could satisfy [29]. Other methods propose generating feature-based (or state-factor-based) explanations, but with respect to action sequences in deterministic planning problems [135].
In general, these approaches have the potential to use more interpretable planning models to produce more user-friendly explanations. However, many applications operate in stochastic domains or require explanations in real time. On this front, research on explanations of stochastic planners is relatively sparse; however, there are several notable existing efforts. Exact analysis or explanation, regardless of the framework, is often prohibitively expensive. Generally speaking, there are three alternatives: direct approximation of the underlying exact computation, use of heuristics, and explanation via summarization. We spend most of this paper understanding and comparing the first two approaches, but we briefly cover some summarization approaches for MDP policies first. Summarization methods are typically more complex than their alternatives. These methods either simplify (summarize) the original problem and then provide an exact explanation of the simplified reasoning problem, or summarize the solution (e.g., the policy) post hoc and explain the simplified policy. For example, Brázdil et al. [2015], and later Russell and Santos [2019], used decision trees to approximate a given policy and analyzed the decision nodes to determine which state factors are most influential for immediate reward. Panigutti et al. [2020] used similar methods to explain classifiers. Bustin and Goldman [2024] summarize MCTS trees by estimating the entropy of subtrees. Pouget et al. [2020] identify key state-action pairs via spectrum-based fault localization, wherein they repeatedly compare trajectories from a given policy to trajectories from counterfactual policies and try to summarize the initial policy in terms of a smaller number of key states or decision points that have a disproportionate impact on the quality of the trajectories. Linear temporal logic [6] and the HIGHLIGHTS approach [63] have also been used to create summaries of policies for explanation purposes.
Other approaches use the power of summarization implicitly, for example to create alternate (possibly smaller) MDPs in which the expected and observed actions are the same [33]. It is theoretically possible to combine summarization techniques with our framework by applying Mean RESP to an abstract [81] version of an MDP, although we do not explore this in detail in this paper. Summarization methods are appealing because they parallel our intuitions about simplification in a number of other settings, such as analogizing during an explanation [27], science communication [132], and even other AI tools, such as automated text simplification [116, 141] or summarization [4]. However, these methods are often driven by heuristics and may be hard to generalize to different planners and models. Methods for deriving explanations of the behavior of MDP agents through a more direct application of heuristics are better represented in the literature. Many of these methods apply the spirit of counterfactual reasoning without performing any sort of formal causal analysis. For example, Elizalde et al. [2009] identify important state factors by generating counterfactual states (states in which some factors are assigned new values) and then analyzing how the value function changes given perturbations to different state factors. State factor perturbations that result in large changes in the value function are said to be more important, even if it is not possible to transition from the current state to the counterfactual state in the MDP. Wang et al. [2016] try to explain policies of partially observable MDPs (POMDPs) by communicating the relative likelihoods of different events or levels of belief. However, research clearly indicates that humans are not good at using this kind of numerical information [104].
A more common heuristic approach is to analyze, and thus produce explanations that reference, the reward function. Khan et al. [2009] present a technique to explain policies for factored MDPs by analyzing the expected occupancy frequency of states with extreme reward values. Instead of looking at how the policy is affected by the reward function overall, Juozapaitis et al. [2019] analyze how extreme reward values impact action selection in decomposed-reward RL agents, and Bertram and Wei [2018] examine reward sources in deterministic MDPs. Later, Sukkerd et al. [2020] proposed explaining factored MDPs by annotating them with "quality attributes" (QAs) related to independent, measurable cost functions. The explanations describe the QA objectives, the expected consequences of the QA values given a policy, and how those values contribute to the expected cost of the policy. The system also explains whether the policy achieves the best possible QA values simultaneously, or whether there are competing objectives that required reconciliation, and proposes counterfactual alternatives. Thus, it explains entire policies, not individual actions, using custom graphics and natural language templates, the latter of which have become the de facto standard for automatic explanations. Overall, while they are computationally cheap and easy to implement, heuristic methods have limited scope in the explanations they provide, as they typically analyze only a single component of the MDP model, and they have few, if any, theoretical advantages. Recently, Madumal et al. [2020] proposed the use of structural causal models for explaining MDPs, using SCMs to encode the influence of particular actions available to the agent. This approach was used in a model-free reinforcement learning setting to learn the structural equations as multivariate regression models during training.
However, it requires several strong assumptions, including the prior availability of a graph representing causal direction between variables, discrete actions, and the existence of sink states. Triantafyllou et al. [2022] also construct SCMs representing decentralized partially observable MDPs during policy execution, using variables that represent observations and rewards, among other things, that occur during deployment. The counterfactual analysis then concerns alternative actions. Here, the focus is on extending the definitions of cause and responsibility to the multi-agent setting in light of an agent's own ability to manipulate its level of responsibility. These concepts are loosely related to recent, more comprehensive work on causality in multi-agent settings, and games in particular [48]. In contrast, our proposed framework focuses on allowing causal analysis of all the components of MDPs using a single set of algorithms, concentrates on the more widely applicable vanilla MDP model, and establishes a more rigorous experimental justification for causal explanations of stochastic planners. Moreover, it remains theoretically well-justified, as it rests on a concrete theory of causality and can be easily extended for cases where approximate reasoning is required, including model-free planners.

In summary, there has been a large body of work on explainable AI systems in general, but relatively little on explainable stochastic planning. Moreover, most of those limited efforts use heuristics, making Mean RESP one of the few causality-based methods for the automatic generation of explanations of stochastic planners. Those who do propose using SCMs for explaining MDPs and their variants in both planning and learning scenarios have likewise based their analysis on formal definitions of causality and responsibility [46, 24]. However, our method and its approximate variants offer an increased level of generality, simplicity, and efficiency by building upon an existing responsibility attribution method called RESP, introduced by Bertossi et al. [2020] to explain classification outcomes, making it a particularly compelling framework for generating explanations of stochastic sequential decision-making systems.

Table 1. Important notations, summarized from Halpern and Pearl [2005].

| Notation | Meaning |
| --- | --- |
| X | A set of decision variables, X = {X₁, X₂, X₃} |
| x | An assignment of values to the set X, {X₁ = x₁, X₂ = x₂, X₃ = x₃} |
| P(X) | Power set of X |
| D(X) | Domain of the joint assignments of the variables in X |
| x′ = x∣X′ | The restriction of x to X′ ⊆ X; e.g., if X′ = {X₁} and x = {X₁ = x₁, X₂ = x₂, X₃ = x₃}, then x′ = {X₁ = x₁} |
| x[x ← x′] | Replace values of x with values from x′; e.g., if x = {X₁ = x₁, X₂ = x₂} and x′ = {X₁ = b}, then x = {X₁ = b, X₂ = x₂} |

3 Background

Here, we review some concepts and notation relevant to the three main formalisms this paper builds upon: Markov decision processes, structural causal models, and our working definitions of weak and actual causes and responsibility. Table 1 provides a reference for common notation.

3.1 Markov Decision Processes

A Markov decision process is a model for reasoning in fully observable, stochastic environments [9]. That is, environments in which the agent knows precisely the current state of the world, but has some uncertainty over the resulting state of the world conditioned on some action it may take. Formally, we define an MDP as a tuple M = ⟨S, A, T, R, d, γ⟩, where:

- S is a finite set of states. A state s ∈ S may be expressed in terms of a set of state factors f₁, f₂, . . . , f_N, such that s indexes a unique assignment of values to the factors f;
- A is a finite set of actions;
- T : S × A × S → [0, 1] is a transition function, representing the probability of reaching state s′ ∈ S after performing action a ∈ A in state s ∈ S;
- R : S × A × S → ℝ is a reward function, representing the expected immediate reward for reaching state s′ ∈ S after performing action a ∈ A in state s ∈ S;
- d : S → [0, 1] is a start state distribution, representing the probability of starting in state s ∈ S;
- γ is a discount factor, representing the degree of preference for immediate rewards relative to future rewards, with 0 ≤ γ < 1.

A solution to an MDP is a policy π : S → A indicating that an action π(s) ∈ A should be performed in a state s ∈ S. In some types of MDPs policies may be stochastic, and thus solutions may be written as π(a|s), the probability of choosing action a when in state s. A policy π induces a value function V^π : S → ℝ representing the expected discounted cumulative reward V^π(s) ∈ ℝ for each state s ∈ S. The objective of an MDP solver is to find an optimal policy, π*, that maximizes the expected discounted cumulative reward for every state s ∈ S. This is equivalent to satisfying the Bellman optimality equation:

$$V^*(s) = \max_{a \in A} \sum_{s' \in S} T(s, a, s')\,[R(s, a, s') + \gamma V^*(s')]. \tag{1}$$

There are several important properties of Equation (1). First, it is a recurrence relation, in which the value of state s depends on the values of possible successor states s′. Second, given a value function, a policy may be derived; if the value function is optimal, the resulting policy will also be optimal. For any given state, the process of calculating V*(s), and thus π*(s), can be represented by a directed graph, which we will exploit later in the construction of data structures to facilitate causal analysis.
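As an illustration of how Equation (1) is typically solved, the following is a minimal value iteration sketch, not the paper's implementation; the data layout, in which `T[s][a]` is a list of `(next_state, probability)` pairs, is our own assumption:

```python
def value_iteration(S, A, T, R, gamma, eps=1e-8):
    """Solve the Bellman optimality equation by iterating the recurrence.
    T[s][a]: list of (next_state, probability); R(s, a, s2): reward."""
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            # Q-value of each action: sum_s' T(s,a,s')[R(s,a,s') + gamma*V(s')]
            best = max(sum(p * (R(s, a, s2) + gamma * V[s2])
                           for s2, p in T[s][a]) for a in A)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            # Derive a greedy policy from the (near-)optimal value function.
            return V, {s: max(A, key=lambda a: sum(p * (R(s, a, s2) + gamma * V[s2])
                                                   for s2, p in T[s][a]))
                       for s in S}
```

This also illustrates the second property noted above: once the value function has converged, the optimal policy falls out of a single greedy pass over the actions.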
3.2 Structural Causal Models

Structural causal models [45, 46] are a framework for describing systems in a way that captures the causal influence of some variables on other variables. Like Bayesian networks, they offer a directed graphical representation of these influences, breaking causality or attribution problems down into three components. Formally, SCMs model scenarios, defined as tuples S = ⟨U, V, M⟩, where U is a set of exogenous variables, called the context, which are required to define the scenario but should not be identified as causal. These variables describe some condition of the world and are considered fixed for a given scenario. V is a set of variables, known as the endogenous variables, which may be causes. M is a set of equations modeling how the variables in U and V affect the variables in V. All variables in the world are either elements of U or elements of V, and U ∩ V = ∅. In our case, V contains variables internal to the MDP reasoning process of the agent, such as rewards or transitions, which represent potential causes for the resultant policy, that is, what we observe as agent behavior. The decision of which variables to assign to U and V is a design choice that we discuss in detail later; it usually concerns application relevance and computational cost. For example, though the presence of oxygen in the atmosphere is required for combustion, and may be necessary for a complete model describing a house fire, we typically would not want to identify its presence as one of the causes of the fire, and thus may decide to assign this variable to the context. In the above example, we would call the assignment of a value to the is-there-a-house-fire variable (say H) an event. We use φ to denote an event. Thus, we could equally well describe, for a given point in time, either the existence (φ = [H = True]) or non-existence (φ = [H = False]) of a house fire as an event.
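As a minimal sketch of these components, the house-fire scenario can be encoded with structural equations as ordinary functions, where an intervention overrides an equation (the do-operator); all variable names here are illustrative, not from the paper:

```python
def evaluate(U, interventions=None):
    """Compute the endogenous variables V from the context U, applying any
    interventions by overriding the corresponding structural equation."""
    do = interventions or {}
    V = {}
    # M: structural equations, evaluated in topological order.
    V["short_circuit"] = do.get("short_circuit", U["faulty_wiring"])
    V["fire"] = do.get("fire", V["short_circuit"] and U["oxygen"])
    return V

context = {"oxygen": True, "faulty_wiring": True}
actual = evaluate(context)                                    # the fire occurs
counterfactual = evaluate(context, {"short_circuit": False})  # it does not
```

Note that oxygen sits in U, so it is held fixed and is never a candidate cause, mirroring the design choice discussed above.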
In general, events may refer to assignments of more than one variable simultaneously, and the main focus of this paper is identifying the causes of events given some scenario S. We now describe how to encode the components of an SCM within a directed graph. SCMs may represent directed graphs of arbitrary topology. However, most inference requires causal graphs that are directed acyclic graphs, or DAGs, where nodes are variables and edges denote cause-effect relations. Further improvements can be made if the causal graph is layered [30]. A layered causal graph (LCG) is defined given an event φ, for which we want to determine the causes, and a set of variables X ⊆ V, which we would like to evaluate as causal or not (Fig. 1). An LCG is a DAG whose nodes are partitioned into non-intersecting layers (S_k, . . . , S_0), where for every edge A → B there exists some i ∈ {1, . . . , k} such that A ∈ S_i and B ∈ S_{i−1}. Further, X ⊆ S_k and φ ∈ S_0.

3.3 Weak Causes, Actual Causes, and Responsibility

Here, we review our working definitions of weak cause, actual cause, and responsibility. Note that setting values for the context variables U = u induces values for all endogenous variables V. That is, absent intervention, the assignment U = u completely determines V = v.

Fig. 1. A layered causal graph.

Definition 1. Let X ⊆ V be a subset of the endogenous variables, and let x be a specific assignment of values to those variables. Given an event φ, defined as a logical expression, for instance φ = (¬a ∧ b), a weak cause of φ satisfies the following conditions:
(1) Given the context U = u, X = x and φ hold.
(2) Some W ⊆ (V \ X), W = w, and X = x′ exist such that:
(A) given U = u, X = x′, and W = w, ¬φ holds;
(B) for all W′ ⊆ W and Z ⊆ V \ (X ∪ W), given U = u, W′ = w|W′, and X = x, φ holds for any Z = z.
Here, π‘₯ is simply any assignment of variables in 𝑋that is not identical to π‘₯. For a single, Boolean variable this would be its negation; in general, as few as one element of 𝑋may deviate from its original value. Thus, π‘₯ represents some possible alternate or counterfactual state. Similarly, assignments to the witness set π‘Š= 𝑀 (sometimes also called the contingency set) can be interpreted similarly. The main distinction between 𝑋andπ‘Šis simply that we are testing for the causal strength of variables in 𝑋specifically, and not π‘Š. The same is true for 𝑍. Condition 1) is asserting that, given some context π‘ˆ= 𝑒, the variable assignment 𝑋= π‘₯and the event πœ™ actually take place. Condition 2A) ensures that the variables in 𝑋have influence on the event πœ™through their ability to falsify the event in some setting, while Condition 2B) says that, given context π‘ˆ= 𝑒, 𝑋= π‘₯alone is sufficient to cause πœ™, independent of some other variables π‘Š. This and similar definitions of cause are often called but-for" definitions. There is a related definition [44] in which condition 2B) is replaced with the following, simpler statement: for all 𝑍 V \ (𝑋 π‘Š), where π‘Š= 𝑀and 𝑍= 𝑧given π‘ˆ= 𝑒, πœ™holds when 𝑋= π‘₯. There is also a more restrictive form of cause, the actual cause [45] (see Definition 2). For a more extensive treatment of Definition 1, as well as a number of interesting examples, we recommend reading Halpern and Pearl [2005]. Definition 2. Let 𝑋be a weak cause. 𝑋is an actual cause if it is minimal. That is, if there is no such 𝑋 such that 𝑋 𝑋and 𝑋 is also a weak cause. In practice, we find restricting explanations to contain only actual causes a helpful, though not necessarily sufficient, filtering mechanism. In addition to actual causality, we also use the following notion of responsibility [24]. Definition 3. The responsibility, 𝜌, of a weak or actual cause 𝑋with witness set π‘Šis 𝜌= 1 1+|π‘Š| . 
The responsibility score of the set of variables 𝑋 is a measure of their ability to affect an outcome and is inversely proportional to the number of counterfactual variables required to achieve scenarios in which 𝑋 satisfies the definition of weak cause. Intuitively, as fewer contingency variables are required for 𝑋 to meet Definition 1, the responsibility score increases. That is, variables in 𝑋 make up a larger fraction of the counterfactual scenario.

4 A General Procedure for Automated Explanation Generation

Before explaining our system in detail, we present a general, abstract algorithm for automatically generating explanations that is often implicitly acknowledged, or used in part, in the design of such systems, but which we have not seen explicitly described anywhere in the XAIP literature. Without such grounding, we feel it is easy for different work to be applied, rather ambiguously, to potentially many sub-problems for which it may be unintended, incomplete, or ill-suited. The following is inspired by a close reading of Miller [2019], although such steps are never explicitly mentioned there. In furnishing a high-quality, automatically generated explanation, a system must:
(1) Determine the user's why-question. Specifically, determine the fact and the foil of a counterfactual scenario. For example, a fact π‘Ž and foil π‘Žβ€² represent the why-question "Why was action π‘Ž taken and not action π‘Žβ€²?"
(2) Given the fact and foil, find the causes of the event in question, of which there may be many plausible or correct ones, possibly using multiple types of information or reasons.
(3) Select a subset of causal variables from Step 2 to communicate to the user.
(4) Produce human-interpretable output (e.g., speech, text, images) based on the selected variables from Step 3.
We emphasize here that all of these steps represent difficult open research questions, especially considering the diversity of systems and scenarios for which we may want to generate explanations. This work focuses primarily on Step 2 and provides some results of significance related to Step 3. We assume that Step 1 has been addressed either through an external module or through system design. Although we use natural language templates to perform Step 4, this is primarily done to facilitate user study participation. We make no claims as to their efficacy relative to other potential modes of communication or even other possible text template designs. MDPs, like other model-based planners, have a distinct advantage in explainability relative to their black-box counterparts, as the key variables in the decision-making process are typically represented explicitly within the model. The primary goal of the following sections is to understand how best to exploit this structure while 1) remaining theoretically solid, 2) remaining general and flexible with respect to the components of the model under analysis, and 3) not sacrificing the ability to apply this technique in some form to black-box systems. We first focus on the underlying theory behind modeling MDP decision making with SCMs (§5-6) before introducing Mean RESP, the algorithm we use for most of our experiments, in §7. Readers seeking only an implementation should skip directly there. Figure 2 provides an overview of all algorithms we discuss in this paper and their respective roles in generating explanations for systems with different properties.

5 Structural Causal Models for MDPs

At a high level, we construct a causal model of the computation that solves for the policy of an MDP and then use this model to determine causes for agent actions, which can later be used online for explanation.
Our goal is to explain agent behavior, which, if the agent is using an MDP for reasoning, is represented by partial policies executed by the agent during deployment. Thus, the most natural choice of πœ™ consistent with this goal is a set of Boolean variables of the form πœ‹π‘ π‘Ž = [πœ‹(𝑠) = π‘Ž]. Here, we use Iverson brackets to denote the Boolean evaluation of the statement πœ‹(𝑠) = π‘Ž. For example, if πœ™ represents the event of taking action π‘Ž in state 𝑠 and not taking action π‘Žβ€², we have πœ™ = ⟨[πœ‹(𝑠) = π‘Ž], [πœ‹(𝑠) = π‘Žβ€²]⟩ = ⟨True, False⟩. This representation is somewhat redundant for deterministic policies, which are what we focus on in this paper. However, we note that it would in theory allow us to represent many types of queries on stochastic policies cleanly, for example given a constrained MDP, as action probabilities are not always mutually exclusive. Collectively, the variables πœ‹π‘ π‘Ž represent the policy of the MDP agent. Overall, there are |𝑆||𝐴| variables, one for each state-action combination. We denote this set by Ξ . In the LCG representation, all such variables reside in the 𝑆0 layer, although only a small fraction will be included in πœ™ for any particular query. In order to construct the LCG, we also need to determine U and V. Depending on our choices, layers 𝑆1-π‘†π‘˜ will represent different parts of the MDP. In all variations below we can derive the agent's action for some state 𝑠,
given the LCG for its MDP and the values of all nodes without incoming edges, by passing values along edges and computing variables in subsequent layers of the graph until we reach layer 𝑆0.

Fig. 2. Process flow diagram for generating explanations of a sequential decision-making agent. Algorithms referenced here are presented throughout the rest of the paper. Theoretically, all computation may be done offline, as the policy learned by a network or through value iteration does not change. However, since the number of fact-foil combinations is typically very large, at least some computation will likely occur online. When the causal structure of the problem is relatively simple and known in advance (as is the case for an MDP), it may be possible to replace Algorithm 5 with a more efficient, customized algorithm. The notion of a causal graph is presented as a useful abstraction which can be applied generally; as we will see, it is not strictly necessary in all cases.

In the general case, our causal analysis follows four steps.
(1) A causal graph is generated from the relevant MDP components.
(2) The resulting graph is converted into a layered causal graph.
(3) The layered graph is pruned to remove any irrelevant nodes and edges, given 𝑋 and πœ™.
(4) A recursive algorithm identifies sets of causal variables in the pruned graph.
This approach provides a principled, general framework for causal inference on MDPs while simultaneously supporting several types of explanations. We first detail this process for layered MDPs (§5) and then discuss approximate methods for MDPs of arbitrary topology (§6).
Further approximation and algorithms applicable to value function approximators such as neural networks are covered in §7.

5.1 Causal Models for Layered MDPs

We begin with the special case of layered MDPs, which include both tree MDPs and finite-horizon MDPs, and for which our methods are exact (when the state space is discrete, up to discretization). Tree MDPsΒ² are MDPs for which a tree may be built to describe all possible trajectories from any given state. Layered MDPs are a class of MDPs that we introduce subsequently in order to make our claims more precise. Although it is possible to create a single, monolithic causal graph that simultaneously represents all components of the MDP tuple, this is not helpful, since it does not afford any additional types of inference, is less computationally efficient, and requires substantial bookkeeping to maintain the layered property. Thus, we analyze two causal models that, together, can answer causal queries about all parts of MDPs considered in the previous literature.

Β²Not to be confused with a recent Tree MDP description from reinforcement learning [131].

Fig. 3. A layered causal graph generated from an MDP representing the influence of state factors on the action.

Definition 4. A layered Markov decision process is an MDP where, for all states 𝑠 ∈ 𝑆 and all horizons β„Ž ∈ N, the state transition graph rooted at state 𝑠 of horizon β„Ž is a layered graph.

State Factors. One of the most natural questions to ask about an action prescribed by an MDP policy is its dependence on a particular state or state factors. Once the appropriate SCM is constructed, it represents a fixed, possibly sub-optimal policyΒ³. While we cannot change state factors to produce a different policy, we can understand how state factors affect action selection for a given policy.
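The layered property of Definition 4 is easy to test mechanically. The helper below is our own sketch, not from the paper: it assigns each node of a DAG the length of its longest path to a sink and accepts the graph only if every edge drops exactly one layer, which is equivalent to the partition (π‘†π‘˜, …, 𝑆0) existing.

```python
from functools import lru_cache

# Sketch (ours): test whether a DAG is layered in the sense used here,
# i.e., nodes partition into layers S_k, ..., S_0 and every edge goes
# from some S_i to S_{i-1}.

def is_layered(nodes, edges):
    succ = {n: [] for n in nodes}
    for a, b in edges:
        succ[a].append(b)

    @lru_cache(maxsize=None)
    def layer(n):
        # Layer index = longest path length from n to a sink (sinks lie in S_0).
        return 0 if not succ[n] else 1 + max(layer(m) for m in succ[n])

    # Layered iff every edge drops exactly one layer.
    return all(layer(a) == layer(b) + 1 for a, b in edges)

# A three-layer chain is layered; adding a "skip" edge breaks the property.
nodes = ["x", "y", "phi"]
assert is_layered(nodes, [("x", "y"), ("y", "phi")]) is True
assert is_layered(nodes, [("x", "y"), ("y", "phi"), ("x", "phi")]) is False
```

Applied to the state transition graph of an MDP unrolled from 𝑠 for β„Ž steps, the same check decides membership in the layered MDP class.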
We construct this SCM by letting U consist of all variables related to the reward function 𝑅, transition function 𝑇, start distribution 𝑑, and discount factor 𝛾. Then, V can be defined as 𝐹 βˆͺ 𝑆 βˆͺ Ξ , where 𝐹 is the set of variables representing state factors, 𝐹 = {𝑓1, 𝑓2, . . . , 𝑓𝑛}. Formally, V = 𝐹 βˆͺ 𝑆 βˆͺ ⋃_{𝑠 ∈ 𝑆, π‘Ž ∈ 𝐴} {πœ‹π‘ ,π‘Ž}, where 𝑓𝑖 denotes the 𝑖th state factor. Finally, M is composed of the following three sets.

M_𝐹 := {[𝑓𝑖 = 𝑓_𝑖^𝑑]}, 𝑖 ∈ {1, . . . , 𝑛}. Here 𝑓_𝑖^𝑑 is the value of state factor 𝑖 at time 𝑑. A given assignment of state factors 𝑓1, . . . , 𝑓𝑛 determines the state 𝑠 ∈ 𝑆.

M_𝑆 := {[𝑠 = 𝑠𝑗]} = {[𝑓_1^𝑑 ∈ 𝑠_𝑗^{𝑓1}] ∧ . . . ∧ [𝑓_𝑛^𝑑 ∈ 𝑠_𝑗^{𝑓𝑛}]}, βˆ€π‘ π‘— ∈ 𝑆, where 𝑠_𝑗^{𝑓𝑖} indicates the possible values for feature 𝑓𝑖 given state 𝑠𝑗.

Finally, we have equations representing action selection.

M_𝐴 := {[πœ‹(𝑠) = π‘Ž]} = {πœ‹π‘ π‘Ž given 𝑠}, βˆ€π‘  ∈ 𝑆, π‘Ž ∈ 𝐴.

Thus we define M := M_𝐹 βˆͺ M_𝑆 βˆͺ M_𝐴. Figure 3 shows the causal graph represented by this SCM. This definition of SCMs for state factors creates layered graphs with exactly three layers, and when the state space is discrete, it permits exact inference regardless of the underlying MDP topology. Importantly, while this representation has some redundancy when used for vanilla MDPs, the redundancy is required for other models, for example, when there is uncertainty in state factor and state values, leading to a lack of mutual exclusivity in variable values.

Β³That is, we can only reason counterfactually about states with respect to one policy at a time. This SCM can represent any policy, as the variables in 𝑆0 and their function of state may be set arbitrarily.

Rewards, Transitions, and Values. The second causal model that we present is used to analyze how reward, transition, and value functions causally influence action selection.
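The three-layer structure of Fig. 3 can be sketched directly. The example below is ours, not the paper's code: two Boolean factors determine one of four states (M_𝑆), and a fixed policy determines the πœ‹π‘ π‘Ž variables in layer 𝑆0 (M_𝐴); all state and action names are hypothetical.

```python
from itertools import product

# Sketch (ours) of the three-layer state-factor LCG: factors -> state -> pi_{s,a}.
factors = ["f1", "f2"]                                    # layer S2
state_of = {(False, False): "s0", (False, True): "s1",
            (True, False): "s2", (True, True): "s3"}      # layer S1
policy = {"s0": "a0", "s1": "a0", "s2": "a1", "s3": "a1"}
actions = ["a0", "a1"]

# Edges of the LCG: every factor feeds every state node (M_S), and every
# state node feeds every pi_{s,a} node (M_A).
edges = [(f, s) for f in factors for s in state_of.values()]
edges += [(s, f"pi_{s}_{a}") for s in state_of.values() for a in actions]

# Forward evaluation of the SCM given factor values f1 = True, f2 = False:
f_vals = (True, False)
s = state_of[f_vals]                                      # M_S: state from factors
pi = {f"pi_{s2}_{a}": (policy[s2] == a)                   # M_A: action selection
      for s2, a in product(state_of.values(), actions)}

assert s == "s2"
assert pi["pi_s2_a1"] is True and pi["pi_s2_a0"] is False
assert len(edges) == 2 * 4 + 4 * 2                        # |F||S| + |S||A| edges
```

Changing a factor value and re-evaluating is exactly the counterfactual query the state-factor SCM supports: the policy stays fixed while the selected πœ‹π‘ π‘Ž variables change.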
Here, we let U = {𝛾}, since the discount factor is essential for computing the effect of other variables in the system, but we are unlikely to consider it a direct cause of any behavior we want to explain. Further, we let

V = ⋃_{𝑠,𝑠′ ∈ 𝑆, π‘Ž ∈ 𝐴} {𝑇(𝑠, π‘Ž, 𝑠′)} βˆͺ ⋃_{𝑠,𝑠′ ∈ 𝑆, π‘Ž ∈ 𝐴} {𝑅(𝑠, π‘Ž, 𝑠′)} βˆͺ ⋃_{𝑠 ∈ 𝑆} {𝑉(𝑠)} βˆͺ ⋃_{𝑠 ∈ 𝑆, π‘Ž ∈ 𝐴} {πœ‹π‘ ,π‘Ž}.

Finally, we let M be the set of equations needed to solve for a policy, for instance by value iteration. The first two sets do not depend on other endogenous variables:

M_𝑅 := {𝑅(𝑠, π‘Ž, 𝑠′) = 𝑅_{𝑠𝑠′}^π‘Ž}, βˆ€π‘ , 𝑠′ ∈ 𝑆, π‘Ž ∈ 𝐴;
M_𝑇 := {𝑇(𝑠, π‘Ž, 𝑠′) = 𝑇_{𝑠𝑠′}^π‘Ž}, βˆ€π‘ , 𝑠′ ∈ 𝑆, π‘Ž ∈ 𝐴.

The set of equations for the value at each state 𝑠 ∈ 𝑆 is

M_𝑉 := {𝑉(𝑠) = max_π‘Ž Ξ£_{𝑠′ ∈ 𝑆} 𝑇_{𝑠𝑠′}^π‘Ž [𝑅_{𝑠𝑠′}^π‘Ž + 𝛾𝑉(𝑠′)]}, βˆ€π‘  ∈ 𝑆.

Finally, we have the set of equations for action selection:

M_𝐴 := {(πœ‹(𝑠) = π‘Žπ‘˜) = [Ξ£_{𝑠′ ∈ 𝑆} 𝑇_{𝑠𝑠′}^{π‘Žπ‘˜} 𝑉(𝑠′) = 𝐴_π‘šπ‘Žπ‘₯^𝑠]}, βˆ€π‘  ∈ 𝑆, where
𝐴_π‘šπ‘Žπ‘₯^𝑠 = max{Ξ£_{𝑠′ ∈ 𝑆} 𝑇_{𝑠𝑠′}^{π‘Ž1} 𝑉(𝑠′), . . . , Ξ£_{𝑠′ ∈ 𝑆} 𝑇_{𝑠𝑠′}^{π‘Žπ‘š} 𝑉(𝑠′)}.

Thus we define M := M_𝑅 βˆͺ M_𝑇 βˆͺ M_𝑉 βˆͺ M_𝐴. The resultant LCG, shown in Figure 4, is built conditioned on the agent's current state in order to focus computation on the state of interest for the query. If the agent moves to a new state, a new graph is built, since reward and transition variables associated with successor states may change. Here, we also see that some layers contain value variables conditioned on particular actions (π‘‰π‘Ž(𝑠𝑖)). Though we do not do so in our experiments, it is also possible to use this structure to ask questions like "Why does the agent take action π‘Ž in state 𝑠 given that it is required to take action π‘Žβ€² in state 𝑠𝑖?" This is done by fixing values or collapsing subsets of the graph such that πœ‹_{𝑠𝑖,π‘Žβ€²} = True (or more generally, enforcing any arbitrary partial policy).
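The equations M_𝑉 and M_𝐴 can be sketched as a few lines of value iteration. The MDP below is our own two-state example, not from the paper; its reward depends only on the current state, so selecting the action that maximizes Ξ£_{𝑠′} 𝑇_{𝑠𝑠′}^π‘Ž 𝑉(𝑠′), as M_𝐴 prescribes, is exact.

```python
# Sketch (ours): value iteration realizing M_V, then action selection per M_A.
# s1 is absorbing and rewarding; reward depends only on the current state.
S, A, gamma = [0, 1], [0, 1], 0.9
T = [[[1.0, 0.0], [0.0, 1.0]],   # from s0: a0 stays, a1 moves to s1
     [[0.0, 1.0], [0.0, 1.0]]]   # s1 absorbing under both actions
R = [[[0.0, 0.0], [0.0, 0.0]],   # R(s0, ., .) = 0
     [[1.0, 1.0], [1.0, 1.0]]]   # R(s1, ., .) = 1

V = [0.0, 0.0]
for _ in range(2000):            # M_V: Bellman backups to (near) convergence
    V = [max(sum(T[s][a][sp] * (R[s][a][sp] + gamma * V[sp]) for sp in S)
             for a in A) for s in S]

def pi(s):
    # M_A: pick a_k whose score sum_s' T_ss'^{a_k} V(s') attains A^s_max.
    return max(A, key=lambda a: sum(T[s][a][sp] * V[sp] for sp in S))

assert pi(0) == 1                           # move toward the rewarding state
assert abs(V[1] - 1 / (1 - gamma)) < 1e-6   # V(s1) = 1/(1 - gamma) = 10
```

In the SCM view, every entry of `T` and `R` is an endogenous variable; rerunning this computation under an altered entry is precisely the counterfactual evaluation the causal model performs.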
In the extreme, all such variables can be collapsed into the existing value variables (𝑉(𝑠𝑖)), effectively removing the max operation and reducing the graph to reproduce policy evaluation, at which point we can no longer generate counterfactual scenarios consistent with the problem statement. Most importantly, it is also possible to move variables from V to the context U, reducing complexity at the cost of eliminating those variables from causal analysis. For example, as we show in our experiments, we could move all reward variables to the context to generate explanations using only transition variables.

Exactness Results. Layered MDPs are well-behaved since the SCMs formed by our construction naturally form LCGs. Given M and a state 𝑠0, we can construct an LCG using the layered structure of the MDP. The following theorem and lemmas give a class of MDPs for which LCGs may be constructed and analyzed exactly.

Theorem 1. Let 𝑀 be a finite-horizon MDP. If 𝐺 is a layered causal graph representing 𝑀 at state 𝑠, then 𝐺 preserves the cause-effect relationships in the reasoning process for action selection in 𝑀 at 𝑠.

Lemma 1.1. Given a finite-horizon MDP with horizon β„Ž and start state 𝑠0, there exists an equivalent layered MDP.

Fig. 4. A layered causal graph representing the effect of rewards, transitions, and values on action selection. π‘‰π‘Ž(𝑠) is the value of taking action π‘Ž at state 𝑠. Depending on the type of analysis being done, zero entries in the transition matrix may be represented by node omissions from the graph. Here, for example, 𝑇(𝑠, π‘Žβ€², 𝑠0) = 0 and 𝑇(𝑠, π‘Ž, 𝑠3) = 0. We have also written reward nodes as 𝑅(𝑠, π‘Ž) = Ξ£_{𝑠′} 𝑇(𝑠, π‘Ž, 𝑠′) 𝑅(𝑠, π‘Ž, 𝑠′) for readability, but in general the graph could be expanded more explicitly. Note also that this figure shows only 4 successor states in 𝑆2, but in general there may be up to |𝑆|.
Proof of Lemma 1.1: We prove this via an algorithm for constructing layered MDPs from finite-horizon MDPs. For any start state 𝑠0, let Ξ“ be the state transition graph rooted at 𝑠0 such that a directed edge from node 𝑠𝑖 to 𝑠𝑗 exists if and only if βˆƒπ‘Ž ∈ 𝐴 s.t. 𝑇(𝑠𝑖, π‘Ž, 𝑠𝑗) > 0. Next, we run Breadth First Search on Ξ“ starting at 𝑠0 without an explored list until all paths of length β„Ž have been explored and recorded. From these recorded paths, create a tree with root node 𝑠0, appending "π‘˜" to state IDs at the π‘˜th level of the tree. After the tree is built, aggregate any duplicate nodes, preserving their edges. Such nodes will only occur within the same layer of the tree. Thus, after aggregation, the resulting state transition graph may not be a tree, but it will be layered. Finally, for all actions π‘Ž ∈ 𝐴, states 𝑠 ∈ 𝑆, and layers π‘˜ ∈ {0, . . . , β„Ž} in the original MDP, set 𝑇_Ξ“(𝑠_𝑖^π‘˜, π‘Ž, 𝑠_𝑗^{π‘˜β€²}) = 𝑇(𝑠𝑖, π‘Ž, 𝑠𝑗) iff π‘˜ + 1 = π‘˜β€²; set 𝑅_Ξ“(𝑠_𝑖^π‘˜, π‘Ž, 𝑠_𝑗^{π‘˜β€²}) = 𝑅(𝑠𝑖, π‘Ž, 𝑠𝑗) iff π‘˜ + 1 = π‘˜β€². Construct the layered MDP 𝑀𝐿 = βŸ¨π‘†_{1:π‘˜}, 𝐴, 𝑇_Ξ“, 𝑅_Ξ“, 𝑑_{𝑆_{1:π‘˜}}, π›ΎβŸ©.

Lemma 1.2. If 𝐺 and 𝐻 are causal graphs of finite Bayesian networks, and there exists a homomorphism 𝐺 β†’ 𝐻, then 𝐺 and 𝐻 preserve cause-effect relationships.

Proof of Lemma 1.2: This result follows from Jacobs et al. [2019] and Otsuka and Saigo [2022].

Proof of Theorem 1: Let 𝐻 be the causal graph representing action selection in 𝑀 (similar to Figure 4, but with arbitrary cycles due to the topology of the state transition graph for 𝑀). Since 𝑀 is finite-horizon, by Lemma 1.1 we can create an equivalent layered MDP, 𝑀𝐿. 𝑀𝐿 has a causal graph, 𝐺 (e.g., Figure 4), representing how actions are selected for any state 𝑠 via policy derivation.
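The construction in the proof of Lemma 1.1 can be sketched compactly: unroll the transition graph from 𝑠0 for β„Ž steps, tag each state copy with its depth π‘˜, and let keying on the (state, depth) pair perform the duplicate aggregation. This rendering is ours, under the assumption of a dictionary-based transition representation.

```python
from collections import defaultdict

# Sketch (ours) of the Lemma 1.1 construction: depth-tagged unrolling.
def unroll_layered(S, A, T, s0, h):
    """T[(s, a)] -> dict of successor probabilities. Returns layered
    transitions T_G[((s, k), a)] -> {(s', k+1): p} over depth-tagged states."""
    T_G = defaultdict(dict)
    frontier = {s0}
    for k in range(h):
        nxt = set()
        for s in frontier:
            for a in A:
                for sp, p in T.get((s, a), {}).items():
                    if p > 0.0:
                        # Duplicate (s', k+1) nodes are merged automatically
                        # because we key on the (state, depth) pair.
                        T_G[((s, k), a)][(sp, k + 1)] = p
                        nxt.add(sp)
        frontier = nxt
    return dict(T_G)

# Two-state loop MDP, horizon 2: each layer holds copies of reachable states.
T = {(0, "a"): {1: 1.0}, (1, "a"): {0: 0.5, 1: 0.5}}
L = unroll_layered([0, 1], ["a"], T, s0=0, h=2)
assert L[((0, 0), "a")] == {(1, 1): 1.0}
assert L[((1, 1), "a")] == {(0, 2): 0.5, (1, 2): 0.5}
# Every edge goes from layer k to layer k+1, so the unrolled graph is layered.
assert all(k2 == k1 + 1
           for ((s, k1), a), succ in L.items() for (sp, k2) in succ)
```

Cycles in the original MDP (here the self-loop on state 1) become repeated, layer-indexed copies, which is exactly why the aggregated graph is layered rather than a tree.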
We can construct a function πœ“ that induces a homomorphism πœ“: 𝐺 β†’ 𝐻 in the following way: map all nodes for variables 𝑉(𝑠𝑖) or 𝑇(𝑠, π‘Ž, 𝑠𝑖) in 𝐺 to the node representing 𝑠𝑖 in 𝐻.⁴ Next, map all nodes in 𝐺 for variables 𝑅(𝑠, π‘Ž, 𝑠′) to any node 𝑛 ∈ 𝐻 such that πœ“(𝑇(𝑠, π‘Ž, 𝑠𝑖)) = 𝑛. Since the homomorphism πœ“: 𝐺 β†’ 𝐻 exists, and since MDPs may be represented as Bayesian networks, then by Lemma 1.2, 𝐺 captures all cause-effect relationships for action selection.

5.2 Causal Inference for Layered MDPs

Given an LCG, we can perform causal inference to determine causal variables with respect to an event πœ™. Given 𝑋 βŠ† V and πœ™, such that πœ™ ∩ 𝑋 = βˆ…, we would like to check whether setting 𝑋 = π‘₯ causes πœ™. That is, we would like to test whether 𝑋 = π‘₯ satisfies Definition 1. We now develop a naive, exact algorithm to determine weak causality. At a high level, Algorithm 1 proceeds up the causal chain in the LCG, recursively identifying causes one layer at a time. That is, sets of variables in 𝑆1 are identified as causes of events in 𝑆0. Those variables then assume the role of the event(s), and their causes are identified in 𝑆2. This process repeats until the algorithm reaches layer π‘†π‘˜, the last layer of the LCG. This algorithm borrows mathematical structures and notation developed by Eiter and Lukasiewicz [2006], which they use to establish a theorem (reproduced below) on the identification of weak causes. We first introduce these two additional constructs before proceeding to the algorithm description. Here, πœ™_{π‘₯𝑀}(𝑒) is the value of πœ™ given context π‘ˆ = 𝑒 and the assignments of variables 𝑋 = π‘₯ and π‘Š = 𝑀. Loosely, elements 𝑝 ∈ p and π‘ž ∈ q represent satisfying assignments w.r.t. conditions 2A and 2B of Definition 1, respectively. The conditions placed on the domains of 𝐹 and 𝑀 also have analogs in Definition 1.
These constructs are general and do not have particular meaning with respect to MDPs, other than that the variables within 𝑝, π‘ž, 𝐹, or 𝑀 will be parts of the transition function, reward function, or state factors, depending on the desired type of explanation. In the case of reward or transition explanations, the relationship between variables in 𝑅𝑖 and those in 𝑅_{𝑖+1} is that those in 𝑅_{𝑖+1} are relevant for calculating optimal actions one timestep farther into the future.

𝑅0 = {(p, q, 𝐹) | 𝐹 βŠ† 𝑆0; p, q βŠ† D(𝐹); βˆƒπ‘€ ∈ D(𝑆0 \ 𝐹) such that, for all 𝑝, π‘ž ∈ D(𝐹): 𝑝 ∈ p iff ¬πœ™_{𝑝𝑀}(𝑒), and π‘ž ∈ q iff πœ™_{[π‘ž βˆͺ 𝑍(𝑒)]𝑀′}(𝑒) for all π‘Šβ€² βŠ† 𝑆0 \ 𝐹 with 𝑀′ = 𝑀|π‘Šβ€² and all 𝑍 βŠ† 𝐹 \ π‘†π‘˜}, and

𝑅𝑖 = {(p, q, 𝐹) | 𝐹 βŠ† 𝑆𝑖; p, q βŠ† D(𝐹); βˆƒπ‘€ ∈ D(𝑆𝑖 \ 𝐹) and (pβ€², qβ€², 𝐹′) ∈ 𝑅_{π‘–βˆ’1} such that, for all 𝑝, π‘ž ∈ D(𝐹): 𝑝 ∈ p iff 𝐹′_{𝑝𝑀}(𝑒) ∈ pβ€², and π‘ž ∈ q iff 𝐹′_{[π‘ž βˆͺ 𝑍(𝑒)]𝑀′}(𝑒) ∈ qβ€² for all π‘Šβ€² βŠ† 𝑆𝑖 \ 𝐹 with 𝑀′ = 𝑀|π‘Šβ€² and all 𝑍 βŠ† 𝐹 \ π‘†π‘˜}, for 𝑖 > 0.

Theorem 2. (From Eiter and Lukasiewicz [2006]) Let S = (U, V, M) be a causal model. Let 𝑋 βŠ† V, π‘₯ ∈ D(𝑋), 𝑒 ∈ D(π‘ˆ), and let πœ™ be an event. Let (𝑆0, . . . , π‘†π‘˜) be a layering of 𝐺(S) relative to 𝑋 and πœ™, and let π‘…π‘˜ be defined as above. Then, 𝑋 = π‘₯ is a weak cause of πœ™ under 𝑒 in S iff (1) given π‘ˆ = 𝑒, 𝑋 = π‘₯ and πœ™ hold, and (2) there exists (p, q, 𝑋) ∈ π‘…π‘˜ such that p β‰  βˆ… and π‘₯ ∈ q.

⁴Figure 4 is an expanded version of the graph 𝐺, where the odd layers (representing max(Β·) operations) have been explicitly factored out to illustrate the possibility of modeling different policies.

Algorithm 1 Determine Weak Causes
1: Input: Layered causal graph 𝐺, context π‘ˆ = 𝑒, variables 𝑋, event πœ™
2: Output: Sets of weak causal variables C_W of πœ™, each a subset of 𝑋
3: 𝑅0 , 𝑆0,π‘†π‘˜ layers of 𝐺containing πœ™,𝑋 4: for all 𝐹 P(𝑆0) do 5: p, q 6: for all 𝑀 D(𝑆0 \ 𝐹) do 7: for all 𝑝 D(𝐹) do 8: if πœ™given 𝑝and 𝑀then 9: p 𝑝 p 10: for all π‘ž D(𝐹) do 11: 𝑏 True 12: for all π‘Š P(𝑆0 \ 𝐹) do 13: 𝑀 𝑀|π‘Š 14: for all 𝑍 P(𝐹\ π‘†π‘˜) do 15: 𝑧 𝑍(𝑒) 16: if πœ™given π‘ž, 𝑧, and 𝑀 then 17: 𝑏 False 18: break 19: if 𝑏then 20: break 21: if 𝑏then 22: q π‘ž q 23: 𝑅0 (p, q, 𝐹) 𝑅0 24: 𝑅 𝑅0, 𝑙 1 25: while 𝑙 π‘˜do 26: 𝑅 Recurrence Step(𝑆𝑙,π‘†π‘˜, 𝑅 ,𝑒) 27: 𝑅 𝑅, 𝑙 𝑙+ 1 28: CW 29: for all (p, q, 𝐹) 𝑅 do 30: if p and π‘₯ q then 31: CW π‘₯ CW 32: return CW Algorithm 1 is split into an initial step and a recurrence step that together compute 𝑅0, . . . , π‘…π‘˜. Lines 7-9 check condition 2A from Definition 1, while lines 10-22 check condition 2B. Condition 1 is always satisfied, as πœ™ represents the agent s actual policy. The recurrence step (Algorithm 2) applies the same reasoning to the output of the initial step. Specifically, the outer loop (line 4) looks at all possible subsets of variables, 𝐹, in the 𝑖th layer. Variables not in 𝐹are assigned values 𝑀one at a time, eventually looping over all possible sets of values (line 6). Then, for every tuple 𝑅 from layer 𝑖 1 (line 7), we check the conditions for 𝑝 p (lines 8-10) and π‘ž q (lines 11-23). Finally, for a given set 𝐹, we add all the qualifying 𝑝,π‘žto the tuple 𝑅(line 24). The final result is a family of sets of causal variables CW, where C𝑖 W 𝑋, each of which satisfies Definition 1 with respect to the original event πœ™. We direct interested readers to Eiter and Lukasiewicz [2006] for a detailed treatment of Theorem 2 and the definitions of 𝑅0 and π‘…π‘˜. Thus, Algorithm 1 supports analysis of LCGs as in Figure 4 and, if we restrict 𝑋to the state factors in the MDP, LCGs as in Figure 3. However, it is not efficient and does not exploit any structure in MDPs. 
To increase efficiency, we can often prune some irrelevant nodes and edges from the graph, based on the members of event πœ™ and variables 𝑋.

Algorithm 2 Recurrence Step
1: Input: Layers 𝑆𝑖, π‘†π‘˜, tuples 𝑅′, context π‘ˆ = 𝑒
2: Output: Set of tuples 𝑅
3: 𝑅 ← βˆ…
4: for all 𝐹 ∈ P(𝑆𝑖) do
5:   p ← βˆ…; q ← βˆ…
6:   for all 𝑀 ∈ D(𝑆𝑖 \ 𝐹) do
7:     for all (pβ€², qβ€², 𝐹′) ∈ 𝑅′ do
8:       for all 𝑝 ∈ D(𝐹) do
9:         if 𝐹′, given 𝑝 and 𝑀, is in pβ€² then
10:          p ← {𝑝} βˆͺ p
11:      for all π‘ž ∈ D(𝐹) do
12:        𝑏 ← True
13:        for all π‘Šβ€² ∈ P(𝑆𝑖 \ 𝐹) do
14:          𝑀′ ← 𝑀|π‘Šβ€²
15:          for all 𝑍 ∈ P(𝐹 \ π‘†π‘˜) do
16:            𝑧 ← 𝑍(𝑒)
17:            if 𝐹′, given π‘ž, 𝑧, and 𝑀′, is not in qβ€² then
18:              𝑏 ← False
19:              break
20:          if ¬𝑏 then
21:            break
22:        if 𝑏 then
23:          q ← {π‘ž} βˆͺ q
24:  𝑅 ← {(p, q, 𝐹)} βˆͺ 𝑅
25: return 𝑅

Graphs absent such variables are called strongly reduced. Eiter and Lukasiewicz [2006] provide an inclusive disjunction over the following conditions for removing a variable 𝑋𝑖 from an LCG.
(1) 𝑋𝑖 ∈ 𝑋 is not connected via variables in V \ 𝑋 to πœ™.
(2) 𝑋𝑖 is neither a direct parent of a variable in πœ™ nor part of a chain connecting 𝑋 to πœ™.
Algorithm 3 applies these criteria to an LCG 𝐺_{𝑠0}, producing a strongly reduced LCG 𝐺^{πœ™π‘‹}_{𝑠0}. Depending on the structure of the MDP, such reductions may be substantial. In particular, as most MDPs have relatively sparse transition functions, the number of states that may be reached within β„Ž actions could be significantly below the theoretical worst case of |𝑆|, resulting in a much smaller strongly reduced LCG. After generating the set of weak causes C_W, we can determine actual causes by checking the minimality condition (Definition 2), using Algorithm 4. Algorithm 4 iterates through weak causal sets from smallest to largest, finding common subsets and eliminating supersets, similar to basic prime-finding algorithms. In line 4, the family of weak causal sets is sorted in ascending order of size.
In lines 5 and 6, an indicator is set that is False if a particular weak cause has not been checked and True if it has. In line 7, the family of causal sets is iterated over from smallest to largest. Line 8 checks whether we have already either processed the set or determined it to be non-minimal. If neither of these is true, we proceed. In lines 9 and 10, we add the set to the family of actual causes and mark it as having been checked. In line 11, we iterate over all strictly larger weak causal sets. If we find a larger weak causal set containing the current actual cause as a subset, we know that it is not an actual cause, and we eliminate it from the analysis by marking it as having already been checked.

Algorithm 3 Reduce Causal Graph
1: Input: Layered causal graph 𝐺_{𝑠0}, explanans 𝑋, event πœ™
2: Output: Strongly reduced layered causal graph 𝐺^{πœ™π‘‹}_{𝑠0}
3: 𝐺^{πœ™π‘‹}_{𝑠0} ← 𝐺_{𝑠0}
4: for all π‘₯ ∈ 𝑋 do
5:   if no path from π‘₯ to some 𝑦 ∈ πœ™ exists then
6:     remove π‘₯ and its edges from 𝐺^{πœ™π‘‹}_{𝑠0}
7:   if, for all paths from π‘₯ to πœ™, some π‘₯β€² ∈ 𝑋 lies along the path then
8:     remove π‘₯ and its edges from 𝐺^{πœ™π‘‹}_{𝑠0}
9: for all 𝑣 ∈ V \ (𝑋 βˆͺ πœ™) do
10:  if no 𝑦 ∈ πœ™ exists such that 𝑣 is a direct parent of 𝑦, and no π‘₯ ∈ 𝑋, 𝑦 ∈ πœ™ exist such that 𝑣 lies on a chain from π‘₯ to 𝑦, then
11:    remove 𝑣 and its edges from 𝐺^{πœ™π‘‹}_{𝑠0}
12: return 𝐺^{πœ™π‘‹}_{𝑠0}

Algorithm 4 Determine Actual Causes from Weak Causes
1: Input: Set of weak causes C_W
2: Output: Set of actual causes C_𝐴
3: C_𝐴 ← βˆ…
4: C_Wβ€² ← Sort(C_W) // by size, in ascending order
5: 𝐡 ← 0^{|C_W|} // 𝐡 ∈ B^{|C_W|}, a Boolean indicator vector
6: for all 𝐹 ∈ C_Wβ€² do
7:   if ¬𝐡(𝐹) then
8:     C_𝐴 ← {𝐹} βˆͺ C_𝐴
9:     𝐡(𝐹) ← True
10:    for all 𝐹′ ∈ C_Wβ€² such that |𝐹′| > |𝐹| do
11:      if 𝐹 βŠ† 𝐹′ then
12:        𝐡(𝐹′) ← True
13: return C_𝐴

Overall, this collection of algorithms is powerful. However, real-valued variables such as rewards and transitions must be converted to discrete variables. For this, we assume a discretization scheme.
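The minimality filter of Algorithm 4 is short enough to render directly. The sketch below is our own version of that behavior; the variable names (`T1`, `R1`, etc.) are hypothetical placeholders for transition and reward variables.

```python
# Sketch (ours) of Algorithm 4's minimality filter: keep only weak causes
# with no strict subset that is also a weak cause (Definition 2).

def actual_causes(weak_causes):
    """weak_causes: iterable of sets of variable names. Returns minimal ones."""
    sorted_causes = sorted((frozenset(c) for c in weak_causes), key=len)
    checked, actual = set(), []
    for F in sorted_causes:
        if F in checked:
            continue
        actual.append(F)          # smallest unseen set: minimal, hence actual
        checked.add(F)
        for G in sorted_causes:   # mark strict supersets as non-minimal
            if len(G) > len(F) and F < G:
                checked.add(G)
    return actual

weak = [{"T1", "R1"}, {"R1"}, {"T1", "T2"}, {"R1", "T1", "T2"}]
assert sorted(map(sorted, actual_causes(weak))) == [["R1"], ["T1", "T2"]]
```

Because the family is sorted by size, any set reached while unmarked cannot have a smaller weak-cause subset, mirroring the indicator-vector argument in the text.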
For example, reward variables could have discrete domains bounded by the min and max of the original reward function. While this offers an actionable speed-versus-accuracy tradeoff, it can be hard to know exactly how to set up such a scheme. Furthermore, the variables in 𝑋 must be located in the same layer of the causal graph. Although this restriction complicates the analysis somewhat, it synergizes well with the sequential nature of the decision-making problem by naturally representing the flow of value from proximal rewarding states to the current state.

6 Generalization and Approximation

Although layered MDPs encompass a large class of MDPs, Algorithm 1 has two key limitations.
(1) It cannot represent infinite-horizon problems.
(2) While the graph itself is straightforward to build for finite-horizon problems, very large problems or problems with large horizons may still be prohibitively expensive to analyze.
In this section, we develop additional approximate algorithms to handle both finite- and infinite-horizon MDPs of arbitrary size and topology, either by constructing smaller, approximate causal models or by approximating more expensive inference processes.

6.1 Approximate Causal Models for MDPs

Here, we address limitation (1). There are many methods for building approximate causal graphs of arbitrary MDPs, depending on the available information. We assume the original graph is built using a generic, uninformed algorithm, such as Algorithm 5, based on Iwasaki and Simon [1986]. Algorithm 5 generates a causal graph from a structural causal model by first constructing a bipartite graph, where variables (V) and equations (M) are nodes, and edges exist between a variable node and an equation node if the equation contains that variable. Given the bipartite graph, Hopcroft-Karp is run to produce a perfect matching.
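One possible discretization scheme of the kind assumed above can be sketched in a few lines. This is our own illustration, not the paper's scheme: each real-valued reward variable is restricted to k evenly spaced levels between the reward function's min and max, and actual values snap to the nearest level.

```python
# Sketch (ours) of a min/max-bounded discretization for reward variables.

def levels(r_min, r_max, k):
    """k evenly spaced counterfactual values spanning [r_min, r_max]."""
    step = (r_max - r_min) / (k - 1)
    return [r_min + i * step for i in range(k)]

def snap(r, lv):
    """Map an actual reward to its nearest discrete level."""
    return min(lv, key=lambda v: abs(v - r))

lv = levels(-1.0, 1.0, 5)                 # discrete domain D(R) of size 5
assert lv == [-1.0, -0.5, 0.0, 0.5, 1.0]
assert snap(0.3, lv) == 0.5               # actual reward mapped onto the grid
```

Larger k enlarges the counterfactual domains that Algorithms 1 and 2 enumerate, which is exactly the speed-versus-accuracy tradeoff the text describes.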
Note that a perfect matching keeps the vertex set and selects a subset of the edges such that each vertex has exactly one incident edge. This perfect matching is then used to build a directed (causal) graph containing only variables. Such an algorithm is not required if there is prior information regarding the causal structure. The resulting causal graph may not be unique if M contains circular dependencies [64, 149]. Since the Bellman update equation (Equation (1)) is a recurrence relation between MDP state values, non-layered structures are highly likely in general.

Algorithm 5 Construct Causal Graph
1: Input: Set of variables V, set of equations M
2: Output: Causal graph 𝐺
3: B ← Construct Bipartite(V, M)
4: 𝐸_{𝑃𝑀} ← Hopcroft-Karp(B)
5: 𝑉 ← V; 𝐸 ← βˆ…
6: for all 𝑣 ∈ V do
7:   // 𝑄 is a node in B representing an equation.
8:   for all 𝑒(𝑣, 𝑄) ∈ Edges(𝑣) do
9:     if 𝑒 ∈ 𝐸_{𝑃𝑀} then
10:      // 𝑉_𝑄 is the set of variables in equation 𝑄.
11:      for all 𝑣′ ∈ 𝑉_𝑄, 𝑣′ β‰  𝑣 do
12:        𝐸 ← 𝐸 βˆͺ Edge(𝑣′, 𝑣)
13:    else
14:      for all 𝑣′ ∈ 𝑉_𝑄, 𝑣′ β‰  𝑣 do
15:        𝐸 ← 𝐸 βˆͺ Edge(𝑣, 𝑣′)
16: 𝐺 ← {𝐸, 𝑉}
17: return 𝐺

Thus, we develop Algorithm 6, which, given state 𝑠0 and causal graph 𝐺, removes these structures to produce an LCG 𝐺_{𝑠0} for causal analysis whenever the agent is in state 𝑠0. We consider a horizon β„Ž and let variables associated with states not reachable within β„Ž actions form causal leaves by removing their incoming causal edges. Remaining non-layered structures are corrected by removing edges such that states farther from 𝑠0 causally influence states nearer to 𝑠0, forming a simplified, finite-horizon version of the original MDP. These operations are executed simultaneously in Algorithm 6. In a sense, the causal influence within the planner flows from the horizon at time 𝑑 + β„Ž back in time to the present time 𝑑. Lines 4-8 label nodes, while lines 9-15 remove edges. Once Algorithm 6 is run, the output is guaranteed to be an LCG. Thus, we can immediately run Algorithm 3 given some 𝑋 and πœ™, followed by Algorithm 1.
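The orientation step of Algorithm 5 can be sketched independently of the matching computation. In this illustration of ours, the perfect matching is supplied by hand rather than computed with Hopcroft-Karp: each equation is matched to the variable it determines, and every other variable appearing in the equation becomes a causal parent of that variable.

```python
# Sketch (ours) of Algorithm 5's orientation step, with the matching given.

def orient(equations, matching):
    """equations: {eq_name: set of variables appearing in it};
    matching: {eq_name: the variable it is matched to (and determines)}.
    Returns the directed causal edge set over variables only."""
    edges = set()
    for q, vars_in_q in equations.items():
        v = matching[q]                        # equation q determines v
        edges |= {(vp, v) for vp in vars_in_q if vp != v}
    return edges

# Equations of a tiny SCM: q1 relates {x, y}, q2 relates {y, z}.
eqs = {"q1": {"x", "y"}, "q2": {"y", "z"}}
m = {"q1": "y", "q2": "z"}                     # a perfect matching
assert orient(eqs, m) == {("x", "y"), ("y", "z")}   # causal chain x -> y -> z
```

Choosing a different matching (e.g., matching q1 to x) would orient the edges differently, which is precisely the non-uniqueness under circular dependencies noted above.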
Algorithm 6 produces an approximate causal model because it both disconnects some variables from the event entirely (to create an LCG with a finite number of layers) and also removes some of the causal pathways between variables that remain in the graph (to maintain the layered property). Figure 5 illustrates the transformation from an arbitrary causal graph to an LCG.

Algorithm 6 Construct Layered Causal Graph
1: Input: Causal graph G, state transition graph H, current state s0, horizon h
2: Output: Layered causal graph G_s0
3: G_s0(E', V') ← G(E, V)
4: for all V_si βŠ† V' where V_si represents variables for state si do
5:   if si is reachable on H in ΞΊ ≀ h actions from s0 then
6:     label all v ∈ V_si with ΞΊ
7:   else
8:     label all v ∈ V_si with ∞
9: for all v ∈ V' do
10:  if label(v) = ∞ then
11:    E' ← E' \ { all edges to/from v }
12:    V' ← V' \ v
13:    continue
14:  for all v' ∈ V' do
15:    if label(v) ≀ label(v') then
16:      E' ← E' \ { directed edges from v to v' }
17: return G_s0

Fig. 5. Algorithm 6 operating on a causal graph G with start state s0 and horizon h = 2 (state transition graph H not shown). (Left) Arrows in G denote causal influence flowing backwards in time, from distal states to proximal states, according to their reachability. (Center) States are labeled corresponding to their distance from s0, and edges are pruned based on the labels. (Right) The final LCG is constructed from variables associated with states and connections remaining in the graph.

6.2 Approximate Causal Inference for MDPs

Regardless of topology, many MDPs are simply too expensive to analyze exactly, either because of the density of the transition function, the number of states, the length of the planning horizon, or the use of a high-fidelity discretization scheme for transition probabilities, rewards, or continuous state factors.
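Before turning to inference, the labeling-and-pruning step of Algorithm 6 (Section 6.1) can be sketched in a few lines. This is a minimal sketch under simplifying assumptions: one causal variable per state, a dictionary-based transition graph, and the convention that surviving causal edges point from distal states back toward proximal ones:

```python
from collections import deque

# Sketch of Algorithm 6: prune an arbitrary causal graph into a layered
# causal graph (LCG) around start state s0 with horizon h.
# Assumes one variable per state; `trans` maps state -> successor states.

def bfs_labels(trans, s0, h):
    """Label each state with its action-distance from s0 (dropped if > h)."""
    label = {s0: 0}
    frontier = deque([s0])
    while frontier:
        s = frontier.popleft()
        if label[s] == h:
            continue  # do not expand past the horizon
        for t in trans.get(s, []):
            if t not in label:
                label[t] = label[s] + 1
                frontier.append(t)
    return label

def layered_causal_graph(causal_edges, trans, s0, h):
    label = bfs_labels(trans, s0, h)
    nodes = set(label)              # unreachable (infinite-label) states go
    edges = set()
    for u, v in causal_edges:
        if u not in nodes or v not in nodes:
            continue                # edge touches a removed state
        if label[u] > label[v]:     # keep only distal -> proximal influence
            edges.add((u, v))
    return nodes, edges

# Toy chain s0 -> s1 -> s2 -> s3 with causal influence flowing backwards.
trans = {"s0": ["s1"], "s1": ["s2"], "s2": ["s3"]}
causal = {("s1", "s0"), ("s2", "s1"), ("s3", "s2"), ("s0", "s1")}
nodes, edges = layered_causal_graph(causal, trans, "s0", h=2)
```

With h = 2, state s3 falls outside the horizon and is dropped along with its edge, and the forward edge s0 β†’ s1 is pruned to keep the result layered, matching the behavior illustrated in Figure 5.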
Depending on the root cause of the complexity, there are several strategies for simplification, some of which produce approximate results while others maintain exactness. In real-world problems, individual reward and transition variables are often not independent, but instead depend on a high-level rule. For example, reward may be proportional to the value of a state factor, or the transition function may encode identical slipping probabilities regardless of location, as in classic grid-world domains. These rules can constrain the transition and reward function to relatively low-dimensional manifolds, and we can discretize these manifolds to gain efficiency without sacrificing important possible worlds.

Formally, let Ξ”^|S|_{i,j} be a simplex representing the space of possible transition functions for state i and action j, that is, the possible values for T(i, j, :). This space is infinite, so in order to enumerate counterfactual scenarios we will apply a discretization represented by Ξ©, thus restricting possible values of T(i, j, :) to a finite set of points on Ξ”^|S|_{i,j}. The space of possible transition functions for a given MDP is then T(:, :, :) ∈ Ξ©(Ξ”^|S|_{1,1}) Γ— . . . Γ— Ξ©(Ξ”^|S|_{|S|,|A|}). Of course, this is likely still an intractably large space. However, if the values of T(i, j, :) and T(m, n, :), for example, are not independent, then we can enumerate counterfactual scenarios conditioned on whatever rules relate the two values, which we call R. In the classic grid-world domain, we can write

T(s, a, s') = 1 βˆ’ P(slip) if s' is the intended direction of a given s; (1/2) P(slip) if s' is left or right of the intended direction of a given s; 0 otherwise. (2)

Here, R represents the mapping P(slip) β†’ T(:, :, :).
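The rule-based discretization around Equation (2) can be sketched as follows. Rather than discretizing the full simplex for every T(s, a, :), counterfactual transition functions are generated from the single slip parameter of the rule R; the 1-D toy corridor and all helper names are illustrative assumptions:

```python
# Sketch: generate counterfactual transition functions from one
# parameter, P(slip), via the grid-world rule R of Equation (2).
# NOTE: `intended`, `lateral`, and the toy corridor are illustrative.

def make_transition(p_slip, intended, lateral):
    """Build T(s, a, s') from a single slip parameter via the rule R."""
    def T(s, a, s_next):
        if s_next == intended(s, a):
            return 1.0 - p_slip
        if s_next in lateral(s, a):
            return 0.5 * p_slip
        return 0.0
    return T

# Toy corridor: action "right" from cell i intends i + 1 and can slip
# into staying put (i) or moving back (i - 1).
intended = lambda s, a: s + 1
lateral = lambda s, a: (s, s - 1)

# Omega discretizes the one-parameter rule, not the whole simplex.
omega = [0.0, 0.1, 0.2]
worlds = [make_transition(p, intended, lateral) for p in omega]

# Probability of reaching cell 4 from cell 3 in each counterfactual world.
probs = [T(3, "right", 4) for T in worlds]
```

Enumerating `omega` visits three counterfactual worlds instead of a grid over the full |S|-dimensional simplex for every (s, a) pair, which is the source of the exponential savings described above.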
Iterating over counterfactual versions of R is exponentially cheaper, while still visiting all possible worlds under R, up to discretization, written R_Ξ©. The same technique can be applied to the reward function, which might naively be represented in R^{|S||A||S|} discretized by Ξ©, but in practice is often generated according to similar high-level rules. Given such a structure, we can replace loops over, for example, all possible values of T(s, a, s'), and instead loop over the domain of the parameters in R, discretized by Ξ©, which is much, much smaller. Often, this may be done while retaining exact solutions. It may also be combined with other methods to approximate this manifold representation and eliminate less interesting or less probable cases.

A more direct approach is to limit the sizes of W and Z, the benefit of which is a reduction in problem complexity at the cost of omitting potential weak causes. These restrictions, of course, diverge from Definition 1, but can be made in a principled way that preserves an order over the possible results. In particular, one may set Z = βˆ… and/or |W| ≀ Ξ² for some Ξ² β‰ͺ |V|. Limiting |W| roughly corresponds to filtering weak causes based on their upper bound on responsibility from Definition 3. These strategies and their theoretical implications are covered in greater detail during our presentation and discussion of Mean RESP in Section 7.

Finally, if we want to look far into the future, or the branching factor of the state transition graph is large, the resulting LCG may be too large to analyze, even when limiting domains or |W| and |Z|. Therefore, we may choose to perform causal analysis on value function variables of future states, as these variables are very efficient to analyze since they summarize the reward and transition dynamics. However, intervening directly on them breaks their consistency with respect to the Bellman optimality equation (Equation (1)), and thus results in inherently approximate inference.
Instead of trying to fit this analysis to match Definition 1, we opt for a different approximate algorithm altogether. We can use a form of beam search to limit the intermediate events represented at each layer of the LCG. The idea is to measure the influence of variables on Ο† and then keep only the m most influential variables as the search progresses, where a formal definition of influence has been replaced with a heuristic. Beam search also requires the beam width m and the search depth limit h. Algorithm 7 presents an overview.

Generally, there are many reasonable definitions for the influence function. If the MDP has a strictly non-positive or non-negative value function, one straightforward definition for influence I is I_Ο€(s, s') = |V_Ο€(s') T(s, Ο€(s), s')| / |V_Ο€(s)|, which captures the portion of the value function at the current state s for which state s' is responsible under policy Ο€. If the value function has both positive and negative values, we can still use the above equation without the absolute values, but the interpretation becomes slightly more complex.

As written, influential future states are assumed to be at the same level of the planning graph, h actions away. This may be an issue if, for example, the policy prescribes an action which both serves to avoid a low-value state s' reachable in t' actions and also reach a high-value state s'' in t'' actions, where t' < t'' < h. Executing Algorithm 7 as is may not identify s' as being influential. However, saving the intermediate beams for each value of t ∈ {0, . . . , h βˆ’ 1} and analyzing this family of sets after beam search has terminated allows identification of such scenarios and thus identification of influential future states within horizon h rather than at exactly horizon h.
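The beam search just described (Algorithm 7) might look like the following minimal sketch. The toy tree, the value function, folding the policy and transition into `succ` and `influence`, and taking the max when a state is reachable from several beam members are all illustrative assumptions:

```python
# Sketch of beam search over influential future states.
# NOTE: toy values and helper names are illustrative, not the paper's.

def influential_states(succ, influence, s0, m, h):
    beam = [s0]
    for _ in range(h):
        scored = {}
        for s in beam:
            for s_next in succ(s):
                # combine by max if s_next is reached from several states
                scored[s_next] = max(scored.get(s_next, 0.0),
                                     influence(s, s_next))
        beam = sorted(scored, key=scored.get, reverse=True)[:m]  # keep top m
    return beam

# Toy binary tree over states 0..6 with a non-negative value function.
V = {0: 8.0, 1: 6.0, 2: 2.0, 3: 5.0, 4: 1.0, 5: 1.5, 6: 0.5}
succ = lambda s: [c for c in (2 * s + 1, 2 * s + 2) if c in V]
influence = lambda s, s_next: abs(V[s_next]) / abs(V[s])

top = influential_states(succ, influence, s0=0, m=1, h=2)
```

Keeping the intermediate `beam` for each depth, as suggested above, is a one-line change (append a copy of `beam` to a list inside the loop) and recovers influential states within the horizon rather than only at exactly depth h.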
Algorithm 7 Determine Influential Future States
1: Input: State transition graph from current state G_s0, current state s0, policy Ο€, beam width m, horizon h.
2: Output: Set of important future states B.
3: B ← {s0}, t ← 0
4: while t < h do
5:   B' ← βˆ…
6:   for all s ∈ B do
7:     for all successors s' of s do
8:       B'[s'] ← Influence(s, Ο€(s), s')
9:   B' ← Sort(B')
10:  B ← B'[1 : m]
11:  t ← t + 1
12: return B

In practice, an effective way to generate explanations using this method is to find one or more highly influential states and then use some of their common state factors as explanans. In our experiments, we differentiate explanans generated using Algorithm 7 (state factors from future states) from those that use the state factors of the current state by using a different text template that describes their relation to the current action differently. More details can be found in Section 8.3 and Appendix A. In domains which are much larger, this method may also be used as a pre-processing step to further prune the LCG, although we do not provide empirical results using this technique. We should also note that there are many reasonable definitions of influence beyond the expression we use. Furthermore, there may be ways to generate counterfactual scenarios by altering the value variables directly that maintain some level of consistency with the Bellman optimality equation, although we leave this for future exploration.

6.3 Metrics for Approximate Explanations

Explanations generated by exact methods are difficult to evaluate, and the same holds true for those generated via approximate algorithms. While user studies and in situ evaluations unquestionably remain the gold standard for evaluating explanations [62], these experiments are frequently expensive and time consuming.
Given the large volume of potential approximation strategies and the resulting explanations they produce, it is natural to consider automated measures that, while imperfect, can nevertheless indicate large deviations from the explanations produced by exact methods. That is, we hypothesize that, while we cannot know the ultimate quality of an explanation, exact or approximate, via automated metrics alone, we can at least compare the similarity of approximate results along several dimensions to their exact counterparts using automated techniques.

Often, there exist multiple weak (or actual) causal sets for a given event. Thus, it is natural to restrict the output of, for example, Algorithm 1, to a maximum of k sets, creating top k causal queries. There are three complementary pieces of potential information: 1) the existence or absence of a particular variable (or set) within the top k, 2) the rank of a particular set within the top k, and 3) a real-valued responsibility score associated with each set. Given (3), both (1) and (2) may be derived, and given (2), (1) may be derived, but not (3). Measuring the quality and similarity of top k results has been a topic of interest in databases and recommender systems for some time, with a variety of conceptualizations and implementations [32, 103, 147, 151]. Here, we propose several measures for comparing top k results in the context of automatic explanations, depending on the available information provided along with the top k causal sets. Some measures have been represented in different forms in the literature, and some may not be common in top k comparisons. In practice, as we show in the later evaluation, we expect a combination of these measures to yield the most insight.
We denote a function computing the similarity between the exact top k weak causal sets, kC*_W, and the approximate top k weak causal sets, kC^Ξ±_W, as ΞΌ(kC*_W, kC^Ξ±_W). Similarly, we denote the exact and approximate top k rankings, or orderings, over weak causal sets as kO* and kO^Ξ±, respectively. Last, the sets of exact and approximate responsibility values are kρ* and kρ^Ξ±, respectively. All measures ΞΌ are identical for actual causal sets.

6.3.1 Causal Set Contents Only. The simplest Boolean comparison we can make is to check if kC*_W = kC^Ξ±_W. Thus,

ΞΌ_B(kC*_W, kC^Ξ±_W) = 0 if x*_i = x^Ξ±_i for all i ∈ {1, . . . , k}; 1 otherwise, (3)

where x*_i ∈ kC*_W and x^Ξ±_i ∈ kC^Ξ±_W are the i-th ranked exact and approximate causal sets. Obviously, ΞΌ_B misses out on significant nuance. However, we should note that, because this is a top k result, it does admit many results which are not completely identical, but whose discrepancies occur outside the top k items, making it a weak but relevant baseline. Next, we consider the presence or absence of different variables within the top k causal sets, since in many cases the user will see only the causes and not their relative importance. Let I* = βˆͺ_{i=1}^{k} x*_i and I^Ξ± = βˆͺ_{i=1}^{k} x^Ξ±_i. Then we have

ΞΌ_I(kC*_W, kC^Ξ±_W) = |I^Ξ± Ξ” I*| / |I^Ξ± βˆͺ I*|, (4)

where Ξ” denotes symmetric difference. This measure represents a slightly generalized form of Hamming distance, equivalent to some definitions of recall, providing a basis to understand what, if anything, has been erroneously included or omitted in an approximate family of weak causal sets. ΞΌ_I will approach 0 as the number of samples increases, but not necessarily monotonically.

6.3.2 Ranking. Given kO* and kO^Ξ±, where o*_i ∈ kO* is the rank (1, . . . , k) of x_i within kC_W, we can calculate the rank correlation coefficients between kO* and kO^Ξ±.
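The content-only measures ΞΌ_B and ΞΌ_I above can be sketched directly; here ΞΌ_I is read as a symmetric-difference (Jaccard-style) distance over the variables appearing anywhere in the top k, and the causal sets and variable names are illustrative:

```python
# Sketch of the content-only measures mu_B and mu_I.
# Causal sets are frozensets of variable names (illustrative choice).

def mu_b(exact, approx):
    """0 if the ranked top-k lists of causal sets are identical, else 1."""
    return 0 if exact == approx else 1

def mu_i(exact, approx):
    """Share of variables erroneously included in or omitted from the top k."""
    i_star = set().union(*exact)
    i_alpha = set().union(*approx)
    return len(i_alpha ^ i_star) / len(i_alpha | i_star)

exact = [frozenset({"speed"}), frozenset({"lane", "signal"})]
approx = [frozenset({"speed"}), frozenset({"lane"})]

b = mu_b(exact, approx)   # lists differ, so 1
i = mu_i(exact, approx)   # "signal" missing out of three variables
```

Because ΞΌ_B collapses everything to a single bit while ΞΌ_I counts individual variables, the pair already distinguishes "entirely wrong" approximations from ones that merely drop a weak variable.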
The two most obvious choices are Kendall's Ο„ and Spearman's ρ, the latter of which we denote using the subscript S, ρ_S, to differentiate it from responsibility.

ΞΌ_Ο„(kO*, kO^Ξ±) = (2 / (k(k βˆ’ 1))) Ξ£_{i<j} sgn(o*_i βˆ’ o*_j) sgn(o^Ξ±_i βˆ’ o^Ξ±_j), (5)

ΞΌ_{ρ_S}(kO*, kO^Ξ±) = 1 βˆ’ (6 Ξ£_{i=1}^{k} (o*_i βˆ’ o^Ξ±_i)Β²) / (k(kΒ² βˆ’ 1)). (6)

In our system such rankings would be generated using responsibility scores, so the existence of kO* without kρ* does not occur naturally. However, this is likely not the case for systems in general. Both ΞΌ_Ο„ and ΞΌ_{ρ_S} will converge to 1 as the number of samples increases, though again not monotonically. In both this section and the next, since we do not know if the order (or the order of the responsibility scores) for exact and approximate methods are identical, we need to ensure that, when analyzing ordinal positions o*_i and o^Ξ±_j (or scores ρ*_i and ρ^Ξ±_j), they refer to the same underlying sets, that is, x*_i = x^Ξ±_j.

6.3.3 Responsibility. Having access to a score for each causal set, in our case the raw responsibility scores, provides an opportunity for capturing additional nuance in our measures. Here, we present several options. The first and perhaps most basic option is to calculate the Euclidean distance between the vectors representing kρ* and kρ^Ξ±,

ΞΌ_E(kρ*, kρ^Ξ±) = ( Ξ£_{i=1, j | x*_i = x^Ξ±_j}^{k} (ρ*_i βˆ’ ρ^Ξ±_j)Β² )^{1/2}. (7)

As the number of samples increases, ΞΌ_E approaches 0. However, this measure is not scale-invariant, either with respect to the magnitude of the scores or the size of k. So, depending on the application, quality judgments using this measure may be too permissive or too restrictive.
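The ranking measures of Equations (5) and (6) and the Euclidean score distance of Equation (7) can be sketched with pure-Python stand-ins for `scipy.stats.kendalltau` and `spearmanr`; inputs are aligned so that index i refers to the same underlying causal set in both lists, and ties are assumed absent:

```python
from itertools import combinations

# Sketch of mu_tau (Kendall), mu_rho_S (Spearman), and mu_E (Euclidean).
# Rank and score vectors below are illustrative toy data.

def sgn(x):
    return (x > 0) - (x < 0)

def mu_tau(o_star, o_alpha):
    k = len(o_star)
    total = sum(sgn(o_star[i] - o_star[j]) * sgn(o_alpha[i] - o_alpha[j])
                for i, j in combinations(range(k), 2))
    return 2.0 * total / (k * (k - 1))

def mu_rho_s(o_star, o_alpha):
    k = len(o_star)
    d2 = sum((a - b) ** 2 for a, b in zip(o_star, o_alpha))
    return 1.0 - 6.0 * d2 / (k * (k ** 2 - 1))

def mu_e(rho_star, rho_alpha):
    return sum((a - b) ** 2 for a, b in zip(rho_star, rho_alpha)) ** 0.5

# Ranks assigned to four causal sets by exact vs. approximate analysis.
o_star, o_alpha = [1, 2, 3, 4], [1, 3, 2, 4]
tau = mu_tau(o_star, o_alpha)
rho_s = mu_rho_s(o_star, o_alpha)
dist = mu_e([0.5, 0.3, 0.2], [0.45, 0.35, 0.10])
```

A single swap in the middle of the ranking already separates the two rank statistics (Ο„ = 2/3 vs. ρ_S = 0.8), illustrating why the evaluation later combines several measures.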
Second, similar to ΞΌ_Ο„ and ΞΌ_{ρ_S}, we can compute the correlation between kρ* and kρ^Ξ± using Pearson's r,

ΞΌ_r(kρ*, kρ^Ξ±) = ( k Ξ£_{i=1, j | x*_i = x^Ξ±_j}^{k} ρ*_i ρ^Ξ±_j βˆ’ Ξ£_{i=1}^{k} ρ*_i Ξ£_{i=1}^{k} ρ^Ξ±_i ) / ( √(k Ξ£_{i=1}^{k} (ρ*_i)Β² βˆ’ (Ξ£_{i=1}^{k} ρ*_i)Β²) √(k Ξ£_{i=1}^{k} (ρ^Ξ±_i)Β² βˆ’ (Ξ£_{i=1}^{k} ρ^Ξ±_i)Β²) ). (8)

Here, as the number of samples increases, we expect ΞΌ_r to approach 1. This measure, as with others in this section, will generally take much longer to converge completely than measures based on ranking alone, since it is unlikely that approximate solutions will produce exactly the same scores as their exact counterparts, even if the variables ultimately highlighted remain the same. Depending on how these potential explanans are processed, differences in such scores could result in fundamentally different explanations being generated for users.

Similar in spirit to ΞΌ_r, we may understand the similarity between kρ* and kρ^Ξ± in a more geometric manner. Let ρ̄* = (1/k) Ξ£_{i=1}^{k} ρ*_i and ρ̄^Ξ± = (1/k) Ξ£_{i=1}^{k} ρ^Ξ±_i be the mean scores for the exact and approximate causal sets, respectively. We can find the slope of a least-squares line of best fit, where approximate values are plotted as functions of their exact values, as

ΞΌ_m(kρ*, kρ^Ξ±) = ( Ξ£_{i=1, j | x*_i = x^Ξ±_j}^{k} (ρ*_i βˆ’ ρ̄*)(ρ^Ξ±_j βˆ’ ρ̄^Ξ±) ) / ( Ξ£_{i=1}^{k} (ρ*_i βˆ’ ρ̄*)Β² ). (9)

As the number of samples increases, ΞΌ_m will approach 1. This measure is very sensitive to errors at the extremes (ranks 1 and k), but is less so to systematic errors that result in a more uniform shift of approximate scores with respect to their exact counterparts. So far, we have focused the more informed measures primarily on how well the top k approximate sets match the exact solution in terms of order and, to a lesser degree, score scale. All of these measures converge more slowly as k grows larger.
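Pearson's r (Equation (8)) and the least-squares slope ΞΌ_m (Equation (9)) can be sketched as follows, again with the score lists aligned so that index i refers to the same causal set in both; the toy scores are illustrative:

```python
# Sketch of mu_r (Pearson correlation) and mu_m (least-squares slope)
# between exact and approximate responsibility scores.

def mu_r(rho_star, rho_alpha):
    k = len(rho_star)
    sx, sy = sum(rho_star), sum(rho_alpha)
    sxy = sum(a * b for a, b in zip(rho_star, rho_alpha))
    sxx = sum(a * a for a in rho_star)
    syy = sum(b * b for b in rho_alpha)
    num = k * sxy - sx * sy
    den = ((k * sxx - sx ** 2) ** 0.5) * ((k * syy - sy ** 2) ** 0.5)
    return num / den

def mu_m(rho_star, rho_alpha):
    k = len(rho_star)
    mx, my = sum(rho_star) / k, sum(rho_alpha) / k
    num = sum((a - mx) * (b - my) for a, b in zip(rho_star, rho_alpha))
    den = sum((a - mx) ** 2 for a in rho_star)
    return num / den

rho_star = [0.5, 0.3, 0.2]      # exact responsibility scores
rho_alpha = [0.45, 0.35, 0.10]  # approximate scores, same set order

r = mu_r(rho_star, rho_alpha)
slope = mu_m(rho_star, rho_alpha)
```

Note that a slope above 1 (as here) indicates the approximation exaggerates score differences, while r near 1 says the scores still move together, which is exactly the complementary information the text describes.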
Moreover, many of them are particularly susceptible to localized discrepancies of one form or another, including scale. Here, we present a final measure that offers an alternative emphasis. Let ||ρ*||₁ and ||ρ^Ξ±||₁ be the sum of all exact and approximate scores, respectively, and let ||kρ*||₁ and ||kρ^Ξ±||₁ be this quantity for the top k items. Then, we have

ΞΌ_p(ρ*, ρ^Ξ±) = ( ||kρ^Ξ±||₁ / ||ρ^Ξ±||₁ ) / ( ||kρ*||₁ / ||ρ*||₁ ). (10)

As the number of samples increases, ΞΌ_p will approach 1. This measures the relative score mass in the top k between the exact and approximate results, making it scale-invariant. Many scores rely either directly or indirectly on the number of analyses performed in their calculation. Since one of the most common strategies for approximation is to skip some of these analyses, this can decrease the effectiveness of scale-sensitive measures.

The measures presented here (evaluated further in Section 8.2.2) represent only a fraction of plausible choices, and many avenues remain unexplored. What is clear is that providing scores alongside causal sets can be helpful in determining what to show a user or even when to terminate anytime approximate computation. To this end, we next present Mean RESP, an algorithm for causal analysis that operates on a continuum of approximation, provides principled responsibility scores alongside identifying weak causes, and which may be applied to both MDPs as well as black-box models such as neural networks.

7 Mean RESP

The previous sections outline a general pattern of analysis and a clear need for both approximate algorithms and the provision of additional ranking or scoring information to supplement causal determination.
Moreover, we would like the scoring to be calculated as a by-product of causal determination and behave as follows:

(1) Property 1: A set of variables X βŠ† F that is not a cause of the event Ο† should have ρ = 0. A set of variables X βŠ† F that is a cause of the event Ο† should have ρ > 0.
(2) Property 2: If the cause admits a set of witness variables W, ρ should divide the causal responsibility among the cause X and witness W in a principled manner. That is, the larger the witness set required to identify X as causal, the lower ρ should be.
(3) Property 3: A relatively higher value of ρ for a cause X βŠ† F should indicate the event Ο† is relatively more affected by the assignment X = x.

To meet these criteria, we propose an algorithm similar to that presented by Bertossi et al. [2020], based on the concepts of responsibility and blame from Chockler and Halpern [2004]. Algorithm 8, which we call Mean RESP as it essentially computes a mean responsibility score over sets of potential variables, iterates directly through possible weak causal sets (line 4), and then progressively checks larger sets W for assignments w that satisfy Definition 1. Lines 11-15 and 19-22 check conditions 2B and 2A, respectively. Finally, lines 23-27 compute the responsibility score ρ, used to determine whether X is weakly causal. In addition to finding weak causal sets consistent with Definition 1, ρ provides a ranking over causal sets. Note that Mean RESP returns a value greater than 0 whenever both 2A and 2B hold, ensuring Property 1. The first condition in Definition 1 always holds for a policy or classifier, and therefore is not explicitly checked. Additionally, accumulating responsibility scores in line 23, where the size of the witness set appears in the denominator, provides Property 2. Due to the accumulation in line 22, ρ retains proportionality to the fraction of assignments to X that satisfy the definition of weak cause. This gives Mean RESP Property 3.
7.1 Monte Carlo Mean RESP

Often, models are too large for exact inference. Moreover, we may wish to apply more restrictive versions of Definition 1, or extend the analysis to real-valued variables without applying discretization. We can address these problems by modifying Mean RESP. If variable domains are real-valued or finite but very large, we can approximate inference by sampling rather than iterating over the sets in lines 4, 5, 7, and 8, as shown in the pseudocode comments in Algorithm 8. The condition that |W| = Ξ² on line 7 is retained. Thus, sampling may be constrained along several dimensions independently, based on the most expensive features of the problem. Counterfactual variable assignment and event pairs are constructed and counted in the same way, and non-zero responsibility scores still indicate weak causality. Monte Carlo Mean RESP recovers the exact solution in the limit, but the practical challenge becomes determining sampling domains that efficiently cover important counterfactual scenarios.

This approach turns out to produce a scaled form of the Shapley value in expectation; however, we note that the scores produced via Mean RESP are also applicable to sets of variables. If we assume witness set samples are equally divided among all allowed Ξ²-values^5, then we have the following general expression for the expected responsibility score:

E_{Ξ² ∼ U(0, |F\X|), W ∼ P_Ξ²(F\X), w ∼ D(W), x' ∼ D(X)} [ (Ο†(x_p) / (1 + Ξ²)) (Ο†(x) βˆ’ Ο†(x_m)) ]. (11)

^5 This is often a valid assumption, given small enough (though still, practically speaking, large) values of ΞΊ and Ξ·. More generally, the assumption of equiprobable W under different Ξ² values does not hold, since most W ∈ P(F \ X) have size close to |F \ X|/2.
Algorithm 8 Mean RESP
1: Input: Potential causal variables F, policy Ο€, initial setting s0
2: Output: Set of weak causes C_W, responsibility scores R.
3: C_W ← βˆ…, R ← βˆ…
4: for all X ∈ P(F) do    // MC: 1 . . . ΞΊ: sample X ∼ P(F)
5:   for all Ξ² = 0 . . . |F \ X| do    // MC: Ξ² = 0 . . . Ξ²_UB < |F \ X|
6:     Οƒ, t ← 0
7:     for all W ∈ P(F \ X) such that |W| = Ξ² do    // MC: 1 . . . Ξ·: sample W ∼ P(F \ X)
8:       for all w ∈ D(W) do    // MC: 1 . . . Ο…: sample w ∼ D(W)
9:         b ← True
10:        t ← t + 1
11:        for all W' ∈ P(W) do
12:          w' ← w|_W'
13:          s' ← [s0 ← w']
14:          if Ο€(s') β‰  Ο€(s0) then
15:            b ← False
16:            break
17:        if Β¬b then
18:          continue
19:        for all x' ∈ D(X) such that s0|_X β‰  x' do    // MC: 1 . . . Ο‡: sample x' ∼ D(X)
20:          s' ← [s0 ← (x' βˆͺ w)]
21:          if Ο€(s') β‰  Ο€(s0) then
22:            Οƒ ← Οƒ + 1 / |D(X)|
23:    ρ ← Οƒ / (t(1 + Ξ²))
24:    if ρ > 0 then
25:      C_W ← C_W βˆͺ X
26:      R.Append(ρ)
27:      break
28: return C_W, R

Here, x_p = s' = [s0 ← w'] and x_m = s' = [s0 ← (x' βˆͺ w)] from lines 13 and 20 in Algorithm 8, respectively^6. Ο†(Ξ±) = 1 if the event Ο† is true given Ξ± (e.g., Ο†(x_p) = Ο†_{x_w} and Ο†(x_m) = Ο†_{x'_w}). Here, Ο†(x_p) = 1 if X satisfies condition 2B from Definition 1, and Ο†(x_m) = 0 if X satisfies condition 2A from Definition 1. We can see in Equation (11) that, whenever Ο†(x_p) = 1, Ο†(x) = 1, and when Ο†(x_p) = 0, the entire expression will be zero, regardless of the value of Ο†(x). Thus, we can replace Ο†(x) with Ο†(x_p), yielding

E_{Ξ² ∼ U(0, |F\X|), W ∼ P_Ξ²(F\X), w ∼ D(W), x' ∼ D(X)} [ (Ο†(x_p) / (1 + Ξ²)) (Ο†(x_p) βˆ’ Ο†(x_m)) ]. (12)

Removing the multiplicand, we recover a Monte Carlo approximation of the expected Shapley value:

E_{Ξ² ∼ U(0, |F\X|), W ∼ P_Ξ²(F\X), w ∼ D(W), x' ∼ D(X)} [ Ο†(x_p) βˆ’ Ο†(x_m) ]. (13)

Intuitively, responsibility can be thought of as a distance-weighted Shapley value, where 1 + Ξ² captures the difference between the original input x and x_m, and Ο†(x_p) captures the difference in output between Ο€(x) and Ο€(x_p).
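To make Algorithm 8 concrete, the following minimal sketch runs Mean RESP on a tiny Boolean policy. The loops the paper marks as Monte Carlo sampling points are left exhaustive so the example stays deterministic, and the policy, variables, and event are toy stand-ins for a state-factor analysis:

```python
from itertools import chain, combinations

# Sketch of Mean RESP (Algorithm 8) on a toy Boolean policy.
# NOTE: the policy and domains are illustrative assumptions.

def powerset(xs):
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def assignments(vars_, domain):
    """Enumerate all assignments to the given variables."""
    vars_ = list(vars_)
    if not vars_:
        yield {}
        return
    head, tail = vars_[0], vars_[1:]
    for val in domain[head]:
        for rest in assignments(tail, domain):
            yield {head: val, **rest}

def mean_resp(F, domain, policy, s0, X):
    """Responsibility of the set X for the event 'policy chose policy(s0)'."""
    a0 = policy(s0)
    n_x = 1
    for v in X:
        n_x *= len(domain[v])                 # |D(X)| for normalization
    rest = [f for f in F if f not in X]
    for beta in range(len(rest) + 1):         # growing witness sizes
        sigma, t = 0.0, 0
        for W in (W for W in powerset(rest) if len(W) == beta):
            for w in assignments(W, domain):
                t += 1
                # Condition 2B: no sub-witness alone may change the action.
                if any(policy({**s0, **{k: w[k] for k in Wp}}) != a0
                       for Wp in powerset(W)):
                    continue
                # Condition 2A: some reassignment of X flips the action.
                for x in assignments(X, domain):
                    if any(x[k] != s0[k] for k in X) and \
                            policy({**s0, **w, **x}) != a0:
                        sigma += 1.0 / n_x
        rho = sigma / (t * (1 + beta))
        if rho > 0:
            return rho    # X is a weak cause; beta is the witness size used
    return 0.0

F = ["a", "b", "c"]
domain = {f: [0, 1] for f in F}
policy = lambda s: "stop" if s["a"] or (s["b"] and s["c"]) else "go"
s0 = {"a": 1, "b": 0, "c": 0}   # the agent stops because a = 1

rho_a = mean_resp(F, domain, policy, s0, ["a"])
rho_b = mean_resp(F, domain, policy, s0, ["b"])
```

Here {a} comes out as a weak cause with ρ = 0.5 (one of the two values in D(a) flips the action, with an empty witness), while {b} gets ρ = 0: with a = 1 fixed, b can only matter if a is changed, and changing a alone already flips the action, violating condition 2B, which matches Property 1.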
The simplicity and efficiency of Monte Carlo Mean RESP make it an attractive option for many problems, and in theory it may be applied to other, more complicated variants of MDPs, such as partially observable MDPs.

^6 We use the notation x_p (for x+) and x_m (for xβˆ’) to match notation commonly used in Shapley value calculations. The terms Ο†(x_p) and Ο†(x_m) are similar to Ο€(X βˆͺ x_i) and Ο€(X) from footnote 1, respectively.

In Section 8.1, we will further theoretically analyze this algorithm, and in Section 8.2 we show a variety of empirical results to this effect. First, however, we give an overview of a generalized version of Mean RESP compatible with multiple related but distinct versions of weak cause, essentially allowing us more flexibility beyond Definition 1.

7.2 Generalized Mean RESP

Although we have so far presented a relatively rigid framework for identifying causes of agent behavior, built on Definitions 1-3, there are several plausible versions of Mean RESP which all detect sets of variables that satisfy related definitions of weak cause. Before introducing a novel, generalized version of Mean RESP, we outline the most important axes of variation over which we would like to generalize.

First, there are at least three different definitions of weak cause proposed by Halpern, and we show compatibility with both the updated definition [45] (RESP-UC, Algorithm 10) and the original [44] (RESP-OC, Algorithm 11). We omit a more recent, modified definition [43]. Not only will the choice of definition for weak cause affect the resultant responsibility scores, but it will also change what is identified as a cause. Some sets of variables will have non-zero responsibility scores under only one definition. Second, the mean responsibility score can be calculated in two ways.
It may be tallied over only the witness sets of size Ξ²_min, where Ξ²_min is the smallest Ξ² for which there exists a satisfying witness set (as in [114]). Or, it may be tallied over all witness sets, regardless of Ξ², as in Algorithm 8. Actual causes with at least some small witness sets will receive lower responsibility scores under the latter design. Third, as responsibility incrementally accrues with respect to an actual causal set, these increments can either be counted equally, or can be normalized by the size of the domain of the actual cause. We refer to this as the option to perform domain normalization, and the theory behind it is that with a larger domain the chance that some assignment X = x' will meet the conditions of Definition 1 increases, and thus the responsibility should correspondingly decrease. This option is represented as a comment within the pseudocode.

None of these choices interfere with Properties 1-3 outlined earlier, but they may subtly alter the relative responsibility assigned to different weak or actual causes. As there is no clear reason based on first principles to prefer one choice over another, these decisions involve tradeoffs. For example, short-circuiting after finding a single witness set of size Ξ² that satisfies Definition 1 will save compute time, but may give a slightly higher or lower responsibility score depending on whether the variables of interest are important under many counterfactual scenarios or only a few. Similarly, foregoing domain normalization may downplay the importance of singleton causes relative to causes composed of multiple variables or other singleton variables with larger domains.

Algorithm 9 Generalized Mean RESP
1: Input: All variables F, variables of interest (VoI) X, event Ο†, initial variable settings s0, initial VoI assignment x, responsibility function RESP
2: Output: Mean responsibility scores ρ.
3: ρ ← 0
4: for all Ξ² = 0 . . . |F \ X| do
5:   Οƒ, T ← 0
6:   for all W ∈ P(F \ X) such that |W| = Ξ² do
7:     for all w ∈ D(W) do
8:       T ← T + 1
9:       Οƒ ← Οƒ + RESP(Ο†, X, W, w, s0)
10:  ρ ← ρ + Οƒ/T
11: return ρ / |D(X)|    // division by |D(X)| (domain normalization) is optional

Algorithm 10 RESP-UC
1: Input: Ο†, X, W, w, s0
2: Output: Score Οƒ
3: Οƒ ← 0
4: for all W' ∈ P(W) do
5:   w' ← w|_W'
6:   x_p ← [s0 ← w']
7:   if Β¬Ο†(x_p) then
8:     return Οƒ
9: for all x' ∈ D(X) such that s0|_X β‰  x' do
10:  x_m ← [s0 ← (x' βˆͺ w)]
11:  if Β¬Ο†(x_m) then
12:    Οƒ ← Οƒ + 1
13: return Οƒ / (1 + |W|)

Algorithm 11 RESP-OC
1: Input: Ο†, X, W, w, s0
2: Output: Score Οƒ
3: Οƒ ← 0
4: x_p ← [s0 ← w]
5: if Β¬Ο†(x_p) then
6:   return Οƒ
7: for all x' ∈ D(X) such that s0|_X β‰  x' do
8:   x_m ← [s0 ← (x' βˆͺ w)]
9:   if Β¬Ο†(x_m) then
10:    Οƒ ← Οƒ + 1
11: return Οƒ / (1 + |W|)

Algorithm 9 thus represents generalized Mean RESP. After fixing a witness W = w (lines 6-7), the responsibility score is calculated (lines 9-10). Domain normalization on line 11 is optional, and in Monte Carlo Mean RESP the divisor is the number of instances of x' sampled, not the (potentially infinite) size of the domain. In RESP-UC (Algorithm 10), if condition 2B holds from Definition 1 (lines 4-8), then we check for condition 2A (lines 9-12).
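Stepping back to Algorithms 9-11, the generalized scheme can be sketched with RESP-OC plugged in as the responsibility function; RESP-UC would differ only by additionally checking every sub-witness before scoring. The Boolean policy and event are toy stand-ins, and a `domain` argument is threaded through where the pseudocode uses D(Β·):

```python
from itertools import chain, combinations

# Sketch of generalized Mean RESP (Algorithm 9) with RESP-OC
# (Algorithm 11) as the pluggable responsibility function.
# NOTE: signatures, policy, and domains are illustrative assumptions.

def powerset(xs):
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def assignments(vars_, domain):
    vars_ = list(vars_)
    if not vars_:
        yield {}
        return
    head, tail = vars_[0], vars_[1:]
    for val in domain[head]:
        for rest in assignments(tail, domain):
            yield {head: val, **rest}

def resp_oc(phi, X, W, w, s0, domain):
    """Original-cause score: witness applied whole, no sub-witness loop."""
    x_p = {**s0, **w}
    if not phi(x_p):            # witness alone already breaks the event
        return 0.0
    sigma = sum(1 for x in assignments(X, domain)
                if any(x[k] != s0[k] for k in X) and not phi({**x_p, **x}))
    return sigma / (1 + len(W))

def generalized_mean_resp(F, X, phi, s0, domain, resp):
    rho = 0.0
    rest = [f for f in F if f not in X]
    for beta in range(len(rest) + 1):
        sigma, T = 0.0, 0
        for W in (W for W in powerset(rest) if len(W) == beta):
            for w in assignments(W, domain):
                T += 1
                sigma += resp(phi, X, W, w, s0, domain)
        rho += sigma / T
    n = 1                        # optional domain normalization by |D(X)|
    for v in X:
        n *= len(domain[v])
    return rho / n

F = ["a", "b", "c"]
domain = {f: [0, 1] for f in F}
policy = lambda s: "stop" if s["a"] or (s["b"] and s["c"]) else "go"
phi = lambda s: policy(s) == "stop"     # the event: the agent stops
s0 = {"a": 1, "b": 0, "c": 0}

r_a = generalized_mean_resp(F, ["a"], phi, s0, domain, resp_oc)
r_b = generalized_mean_resp(F, ["b"], phi, s0, domain, resp_oc)
```

Because this variant tallies over all witness sizes rather than short-circuiting at the smallest satisfying Ξ², the score for {a} differs from the Algorithm 8 value on the same toy problem, illustrating the design tradeoffs discussed above.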
Most importantly, we need to highlight that explanations generated using our framework, or any of the others we refer to or compare against, do not and cannot explain action outcomes. That is, they cannot provide reasons for why an action with stochastic outcomes resulted in a particular outcome. They can, however, provide reasons for why a particular action was or was not chosen.

When explaining the nature of such a partial policy, there are two broad classes of counterfactual scenarios. The first references a fixed world model and a counterfactual scenario. In an MDP, this corresponds to fixed transition and reward functions and thus a fixed policy, and the counterfactual scenario corresponds to the agent being in a different state. Explanations generated in this way will be based on reasons related to the state factors of the MDP and their possible alternate values. Generally, such explanations will rely on reasoning similar to "I took action a because X is true in my current state." The second class references a fixed scenario and a counterfactual model of the world. In an MDP, this corresponds to altering the transition or reward functions while fixing the state factors. Explanations generated using this class of counterfactual scenarios will provide reasons for action selection based on alternate possible worlds. Generally, they will rely on reasoning similar to "I took action a because X is true about my current world."

Table 2. Comparison of method applicability

Method                     | F   | R   | T   | V   | Causal?
Elizalde et al. [2009]     | Yes | -   | -   | -   | No
Russell and Santos [2019]  | Yes | -   | -   | -   | No
Khan et al. [2009]         | -   | Yes | -   | -   | No
Juozapaitis et al. [2019]  | -   | Yes | -   | -   | No
Bertram and Wei [2018]     | -   | Yes | -   | -   | No
Wang et al. [2016]         | -   | -   | Yes | -   | No
Madumal et al. [2020]      | Yes | Yes | -   | -   | Yes
Mean RESP                  | Yes | Yes | Yes | Yes | Yes
When using a policy approximator such as a neural network, this type of counterfactual query is not possible, as the agent's world model is not exposed in an interpretable manner. Technically, the calculations can be run, but it is unlikely that the results can be translated fruitfully back into terms a human user could understand. Given these choices and the potential for further differentiation, such as between transition and reward variables, it is not at all clear how best to define the potential explanans, X. Intuitively, all previous work defines X as being, for example, the set of all state factors or the set of all reward variables. That is, we tend to define X according to some semantic type. While many other methods implement forms of analysis specific to certain subsets of variables, thus requiring such a definition, our framework does not. However, while it may be difficult to justify from first principles, we still find it useful to follow this intuition, albeit with additional flexibility, as our framework allows us to furnish explanations using many definitions for X, as seen in Table 2. Broadly, we have identified four types of explanation in the literature, each focusing on one component of the MDP tuple: state factors (F) (Elizalde et al. [2009]; Russell and Santos [2019]), rewards (R) (Khan et al. [2009]; Juozapaitis et al. [2019]; Bertram and Wei [2018]), transitions (T) (Wang et al. [2016]), and future states and values (V) [125]. These papers define metrics, algorithms, and definitions particular to their type, and lead us to define the following.

Definition 5. Y-type explanations use explanans X ⊆ Y. For example, R-type explanations use reward variables.

In this section, we cover a range of results, both theoretical and empirical, that address questions regarding the practical, effective use of our framework to generate explanations of AI reasoning systems.
Theoretically, we focus primarily on run-time bounds, convergence properties, and error rates of Mean RESP and Monte Carlo Mean RESP. Empirically, we cover some comparisons with Shapley values, provide some insight into effective measures of similarity between top-k lists of explanans, and study an array of user preferences, as well as more basic results that confirm earlier theoretical claims. Though our primary example is of an autonomous vehicle, we also present some results on other types of domains, including both discrete and continuous planning domains as well as some classification systems. This set of experiments represents a much more complete picture of system performance than has previously been available for any other MDP explanation system, and overall, our results suggest that Mean RESP and its variants are an extremely competitive option for automatic explanation generation for a wide variety of AI systems, and especially model-based planners, such as MDPs.

8.1 Theoretical Analysis

Here, we cover several properties of Monte Carlo Mean RESP, as well as provide some preliminary run-time and memory bounds for some of the algorithms presented earlier. We also briefly discuss the tightness of these bounds in practice.

8.1.1 On the Performance of Monte Carlo Mean RESP: Errors and Convergence. We are most interested in the correctness, one-sidedness of errors, error rate, and sample efficiency, and below we present several propositions exploring these topics. The first concerns the adherence of Mean RESP to the definition of weak cause given in Definition 1.

Proposition 2.1. Given a set of variables X and an event φ, X is a weak cause of φ according to Definition 1 if and only if the responsibility score, ρ, output by Mean RESP (Algorithm 8) is greater than zero.
Proof: If ρ > 0, the condition on line 21 in Algorithm 8 must be true. This condition is True when there exist assignments X = x′ and W = w such that φ(x′, w) is False. The existence of such assignments satisfies condition 2A from Definition 1. Line 21 is only executed when b is True, corresponding to the variable assignment X = x′ satisfying condition 2B from Definition 1 w.r.t. event φ and all alternative contingency sets W = w. Condition 1 from Definition 1 is always trivially met, as the event φ represents an existing partial policy. Thus, ρ > 0 iff there exist X = x′ and W = w that satisfy Definition 1.

Proposition 2.2. The false positive rate of Monte Carlo Mean RESP is 0.

Proof: Estimates for ρ*, denoted ρ̂, are initialized to 0. As shown in the proof of Proposition 2.1, the only way to increment ρ̂ is to satisfy Definition 1. Therefore, it is not possible to attain a positive ρ̂ estimate without being a weak cause, and thus Monte Carlo Mean RESP has a false positive rate of 0.

We also see that the error rate in Monte Carlo Mean RESP is bounded.

Proposition 2.3. Let ρ* be the true responsibility score for the set of potential causal variables X. The expected false negative rate of Monte Carlo Mean RESP, ε, after n samples satisfies (1 − ρ*|F \ X|)^n ≤ ε ≤ (1 − ρ*)^n.

Proof: ρ* represents the number of times a contingency set W exists such that X satisfies Definition 1, divided by |W|. Thus, the probability of not classifying X as a weak cause (i.e., a false negative) given 1 sample will be at least 1 − ρ*|F \ X| and at most 1 − ρ*, since 1 ≤ |W| ≤ |F \ X|. If sample contingency sets are drawn independently, then the false negative rate using n samples is at least (1 − ρ*|F \ X|)^n and at most (1 − ρ*)^n. If using domain normalization, as in Algorithm 8, then the same proof works after replacing ρ* with ρ*|D|.

This proposition essentially shows us that when responsibility is high, expected sample efficiency is high (the expected error rate upper bound is low).
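As a concrete illustration of Proposition 2.3, the following small helper (ours, with hypothetical values ρ* = 0.1 and k = |F \ X| = 5, not part of the paper's implementation) evaluates the two sides of the bound and the sample count needed to drive the upper bound below a target rate:

```python
import math

def fn_rate_bounds(rho_star, k, n):
    """Proposition 2.3 bounds on the expected false negative rate
    after n independent samples, where k = |F \\ X|. The lower bound
    is clipped at 0 when rho_star * k exceeds 1."""
    lower = max(0.0, 1.0 - rho_star * k) ** n
    upper = (1.0 - rho_star) ** n
    return lower, upper

def samples_for_fn_rate(rho_star, delta):
    """Smallest n for which the upper bound (1 - rho_star)^n <= delta."""
    return math.ceil(math.log(delta) / math.log(1.0 - rho_star))

# Hypothetical values: rho_star = 0.1, k = 5.
lo, hi = fn_rate_bounds(0.1, 5, 20)       # bounds after 20 samples
n_needed = samples_for_fn_rate(0.1, 0.01)  # n for a 1% upper bound
```

With these numbers the lower bound after 20 samples is already below 10^-6 while the upper bound is still above 0.12, illustrating how loose the gap can be when |W| is unknown.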
This makes sense since we expect sets of variables with high responsibility scores to be more easily identifiable as weak causes. When |F \ X| is large, the maximum size of possible contingency sets |W| is relatively large as well, and thus in some cases the probability of picking some assignment W = w that satisfies Definition 1 increases, decreasing the lower bound on the expected false negative rate. Moreover, we may also establish probabilistic bounds on the error of our responsibility score estimates themselves (rather than just the false negative rate) as a function of the number of samples.

Proposition 2.4. Let n be the number of samples examined by Monte Carlo Mean RESP, ρ* be the true responsibility score for the set of potential causal variables X, and k = |F \ X|. Then the probability that the estimate ρ̂ deviates from ρ* by √(ερ*) or more is at most 2e^(−εn/3k). That is, P(|ρ̂ − ρ*| ≥ √(ερ*)) ≤ 2e^(−εn/3k).

Proof: According to the Chernoff bound, we can write

P(|ρ̂ − ρ*/k| ≥ δρ*/k) ≤ 2e^(−δ²ρ*n/3k). (14)

Substituting δ = √(ε/ρ*), we get

P(|ρ̂ − ρ*/k| ≥ √(ερ*)/k) ≤ 2e^(−εn/3k). (15)

This is equivalent to

P(|ρ̂ − ρ*| ≥ √(ερ*)) ≤ 2e^(−εn/3k). (16)

Largely for similar reasons as outlined regarding Proposition 2.3, the accuracy of estimates for ρ* is sensitive to the size of possible contingency sets, as the exact solution requires enumerating all such sets. Finally, we show a simple relation between Shapley values and responsibility scores.

Proposition 2.5. The exact Shapley value for a singleton set always upper bounds the exact responsibility score for the same set.

Proof: In Equation (12), 0 ≤ 1/(1 + β) ≤ 1. Thus, since the sum of the terms (φ(x_p) − φ(x_m)) is just the Shapley value, ρ* is always less than or equal to the Shapley value.

8.1.2 Preliminary Run Time and Memory Bounds. Table 3 shows some preliminary bounds on resource use.
Here, |S| and |A| are the sizes of the state and action spaces, respectively. |S0| is the size of the event layer of the LCG representing the MDP. |S0| ≤ |S||A|, and every variable in S0 represents a binary value of True or False indicating whether or not a policy may perform an action in a particular state⁷. Thus, as S0 may represent any possible counterfactual, partial policy, including full policies, |D(S0)| ≤ 2^(|S||A|). Below, we also use the fact that a power set of Q, P(Q), contains 2^|Q| elements, on average a set drawn from P(Q) contains |Q|/2 elements, and the average domain size of all elements of the power set is upper bounded by |D(Q)| ≤ 2^(|Q|/2), as each ground element q has a 1/2 chance of existing in any given element of the power set. Intermediate layers of the LCG may have different structure depending on design choices, so our bounds for Algorithm 2 compared to Algorithm 1 are much looser. V denotes all other endogenous variables, not including those in layer S0. For an MDP, this may include up to the entire transition function |T| = |S|²|A| and the entire reward function |R| = |S|²|A|, thus upper bounded⁸ by |V| ≤ 2|S|²|A|. We let |X| be the number of variables that are checked at any point for weak causality. All variables in an MDP are tied to specific states. Thus, the path-finding-between-variables operation in Algorithm 3 can be converted to a smaller, path-finding-between-states problem. Using BFS on a fully connected MDP results in |S|³ operations.

⁷ This exact definition could change given a stochastic policy. For example, one may want to know whether actions have above or below some probability of being selected rather than being strictly required or prohibited. The following results hold for any such binary categorization of action probabilities.

⁸ In this paper we use R(s, a) to denote the reward function's dependence on both the current action and current state.
Other popular notations include R(s, a, s′) and R(s), which offer more or less fine-grained control of the reward signal, but do not fundamentally affect the conclusions of this paper. Here we write |R| = |S|²|A| as it is the worst case.

Table 3. Worst-case resource bounds. We assume nothing about the structure of the MDP, and thus the bounds appear completely intractable. In practice, we find that simple approximations create tractable problems.

Alg. | Resource | Complexity                              | Bottleneck
1    | time     | 2^(2|S||A|) + 2^(7|S||A|/2) + k (Alg. 2)| Enumerating causal / contingency sets
     | space    | 3|S||A| 2^(|S||A|)                      | Storing R, R
2    | time     | |D(V)|² (2^(3|V|/2) + 2^(3|V|))         | Enumerating causal / contingency sets
     | space    | 3|V| 2^(|V|)                            | Storing R
3    | time     | |S| e(|S|-2)! (|X||S|³ + |X|² + |V|)    | Finding / checking all paths
     | space    | |S|                                     | Storing one path in the LCG
4    | time     | |CW| (log(|CW|) + |CW||X|)              | Subset checks
     | space    | |CW||V|                                 | Storing weak causal sets
6    | time     | |S|² (h + 1)                            | Connectivity checks + label comparisons
     | space    | |S|²                                    | Storing transition / reachability matrix
8    | time     | 2^(|V|/2) |D(V)| (2^(3|V|/4) + |D(V)|)  | Enumerating causal / contingency sets
     | space    | |V| 2^(|V|-1)                           | Storing weak causal sets
8MC  | time     | κητ β_UB (2^(β_UB) + χ)                 | Enumerating causal / contingency sets
     | space    | κ|V|                                    | Storing weak causal sets

Checking all e(|S|-2)! paths for all variables in X takes |X||S| e(|S|-2)! operations⁹. Also recall that CW is the set of all weak causes, from which actual causes may be determined. In practice, worst-case bounds are very loose. In Algorithms 1 and 2, we check all assignments of X, and X ⊆ V. However, usually |X| ≪ |V|, especially after reducing the LCG. Bounds for Algorithm 8 are also poor estimates of in-practice cost, since it uses short-circuiting. Some bounds' tightness depends on the connectivity of the MDP.
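The path-count term e(|S| − 2)! used in the Algorithm 3 bound (see footnote 9) can be checked by brute force for small complete graphs; the sketch below (illustrative, not part of our implementation) enumerates simple paths between two fixed vertices and compares against the closed form:

```python
from itertools import permutations
from math import e, factorial, floor

def count_simple_paths(n):
    """Brute-force count of simple paths between two fixed vertices
    of the complete graph K_n: pick k ordered intermediate vertices
    from the remaining n - 2, for k = 0 .. n - 2."""
    inner = range(n - 2)
    return sum(1 for k in range(n - 1) for _ in permutations(inner, k))

def closed_form(n):
    """The footnote's sum: sum_{k=1}^{n-1} (n-2)! / (n-1-k)!."""
    return sum(factorial(n - 2) // factorial(n - 1 - k) for k in range(1, n))
```

For K_4, for example, both routines give 5, which equals ⌊2! · e⌋ = 5.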
For example, in Algorithm 6, the bounds assume fully-connected MDPs, but most MDPs are sparse and thus the number of edges E ≪ |S|². Moreover, if h is small compared to the width of the MDP, run time will decrease, since labeled nodes are handled in linear time. There are other possible improvements, since theoretically every explanation can be pre-computed, but this is impractical due to the number of possible explanations. Notably, constructing LCGs for each state, regardless of how φ and X are specified, and computing connectivity and reachability allows reductions and causal model approximations to be applied quickly online, given X and φ. Empirical run-time results are presented in Section 8.2.4.

8.2 Empirical Analysis

Our empirical analysis covers several topics, including run-time, analysis of metrics, comparison of weak cause definitions and Shapley values, and two user studies comparing our framework to state-of-the-art MDP explanation methods. However, we begin with a case study highlighting the generality and flexibility of our framework.

8.2.1 Case Study: Explanation Diversity. The purpose of this study is to show how (1) our approach can handle semantically different types of causal queries, corresponding to different conceptions of MDP explanation in the literature, and (2) formal definitions of causality identify sensible explanans. Here, we qualitatively examine the correctness of causal attribution in the following simplified MDP domain. Consider a robot navigating the environment depicted in Figure 6. The agent knows: its (x, y) location; time to failure, c; and whether its location is normal, ideal for repairs, or hazardous, d. Thus,

⁹ Given a complete graph (worst case) of size n with vertices s and t, there are ∑_{k=1}^{n−1} (n−2)!/(n−1−k)! total simple paths from s to t. Here, k ranges over path lengths as measured in edges. This sum can be written as (n−2)!(1 + 1/1! + 1/2! + ... + 1/(n−2)!), which can ultimately be simplified to ⌊(n−2)! · e⌋.
Fig. 6. Example domain.

Table 4. List of scenarios.

ID | Y | s0 (x, y, d, c) | R1 | R2  | R3  | R4  | RC | Action
1  | F | (5,1), R, 5     | 80 | -50 | -40 | 100 | 2  | Repair
2  | F | (5,2), N, 4     | 80 | -50 | -40 | 100 | 2  | Up
3  | F | (5,3), N, 3     | 80 | -50 | -40 | 100 | 2  | Left
4  | F | (4,3), N, 2     | 80 | 90  | -40 | 100 | -1 | Left
5  | R | (5,3), N, 3     | 80 | -50 | -40 | 100 | -1 | Left
6  | R | (5,3), N, 3     | 80 | -50 | -40 | 100 | 2  | Left
7  | T | (5,3), N, 3     | 80 | 80  | 70  | 100 | -1 | Left
8  | T | (5,3), N, 3     | 80 | -50 | -40 | 100 | -1 | Left
9  | V | (5,3), N, 3     | 80 | -50 | -40 | 100 | -1 | Left

the state factors are x ∈ {1, ..., 9}, y ∈ {1, ..., 6}, c ∈ {0, ..., 5}, and d ∈ {Normal (N), Repair (R), Danger (D)}. The actions are A = {Up, Left, Right, Repair}. If the agent breaks down and cannot be repaired or visits a hazardous state, it gets a reward of −10. Repairing has a reward of RC, and reaching the i-th goal state yields reward Ri. All other state-action pairs have a reward of −1. Last, transitions are deterministic with two exceptions. When taking action Left at (5, 3), the agent transitions to (4, 3) (with probability TL = 0.6) or (4, 4) (with probability 1 − TL = 0.4). When taking action Right at (5, 3), the agent transitions to (6, 3) (with probability TR = 0.01) or (6, 4) (with probability 1 − TR = 0.99). For all examples we considered event φ = π(s0), and for explanation type Y we set X = Y and apply Mean RESP. Table 4 shows the value of all state factors s0, rewards, and actions for each scenario. Below, we contextualize Mean RESP's output.

Scenario 1: The cause for action Repair is d = R (ρ = 0.64). Since RC > 0, it is optimal to repair, as it prevents failure later and is better than the default reward of −1.

Scenarios 2 and 3: There are two causes of action Up in scenario 2: d = N (ρ = 0.50) and (x, y) = (5, 2) (ρ = 0.26). Scenario 3 has the same causes, but π(s0) = Left and (x, y) has ρ = 0.60.
This is due to the topology at (5, 3) compared to (5, 2). In both cases, d is a cause since if d = R, π(s0) = Repair.

Scenario 4: The causes for action Left are c = 2 (ρ = 0.85), (x, y) = (4, 3) (ρ = 0.68), and d = N (ρ = 0.60). Note that unlike in previous scenarios, time to failure is both a cause and has the highest responsibility score. If the agent were to go directly to goal 2, it would break down at (4, 5).

Scenario 5: There are two causes of action Left: R1 (ρ = 0.30) and R3 (ρ = 0.55). Since we bound R = [−100, 100], no values for R2 or R4 can change the outcome. R4 is already at the maximum, and R2 alone is not a cause due to its relatively weak effect on the expected value of that subtree (transition variables are in U, not V).

Scenario 6: Here, only R3 (ρ = 0.55) is causal. Since RC > 0, the agent exploits this by repeatedly taking service at (3, 3). Thus, R1 alone cannot affect the agent's policy since goals 1 and 4 will never be visited. This is a good example of the proposed approach identifying a possibly poorly specified objective.

Scenarios 7 and 8: Since R1 = R2 in scenario 7, the only cause of the action Left is TR (ρ = 0.66). However, in scenario 8, both TR (ρ = 0.66) and TL (ρ = 0.56) are causes.

Scenario 9: Using Mean RESP with beam search, we find that the most influential set of trajectories leads to goal 1. That is, its value contributes most to the expected value, even though the most likely (p = 0.6) outcome of taking action Left at (5, 3) reaches goal 2.

Summary: These scenarios show how different sets of explanans provide semantically distinct insights into variables' effects on actions. This underscores the utility of flexibility in generating explanations, since the best explanans may be unknown pre-deployment.
Because existing methods only analyze one component of the MDP, they cannot produce most of these explanations (Table 2).

8.2.2 Evaluating Measures for Comparing Approximate Explanations. To better understand how different measures of similarity between two top-k lists of explanans might capture different qualitative judgements or distinctions from users, we compare their evolution over time as Monte Carlo Mean RESP takes additional samples. In our experiments, 60 states are sampled from the Lunar Lander¹⁰ MDP, from OpenAI Gym [16], and then both exact and Monte Carlo Mean RESP are run, the latter for a total of 5,000 samples. After each sample, the top-k explanans along with their rank and responsibility score are recorded. The proposed measures are then run 5,000 times for each state, measuring the difference between the top-k explanans after the i-th sample in Monte Carlo Mean RESP and the top-k explanans according to exact Mean RESP. In particular, we are not just interested in the behavior of individual measures, but rather, whether or not any of them capture unique qualities of the top-k results that others do not, suggesting increased utility when used in combination. To highlight this, we plot all metrics against each other, bilaterally across several plots. Figure 7 illustrates some of these comparisons. In each sub-figure, two measures are plotted against each other, each comparing the result of Monte Carlo Mean RESP after every sample to the result of exact Mean RESP. Each point in the scatter plots represents a pair of measures comparing an approximate explanation from some state against the exact explanation for that state. The colors correspond to sample order, with dark blue representing samples near the beginning, and light yellow representing samples near 5,000. Because Monte Carlo Mean RESP converges so quickly, the linear color map shown here actually operates on log(n/20), where n is the sample number, rather than a direct linear map.
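As an illustration of the measures being compared, here are minimal stdlib implementations (ours, for exposition only; our experiments use standard routines) applied to two hypothetical score vectors whose top two entries swap order:

```python
import math

def spearman_rho(a, b):
    """Spearman rank correlation for tie-free score lists."""
    def ranks(xs):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0] * len(xs)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall_tau(a, b):
    """Kendall's tau (tau-a) over all index pairs."""
    n = len(a)
    c = d = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (a[i] - a[j]) * (b[i] - b[j])
            if s > 0:
                c += 1
            elif s < 0:
                d += 1
    return (c - d) / (n * (n - 1) / 2)

def euclidean(a, b):
    """Plain l2 distance between two score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical exact vs. approximate responsibility scores: the top
# two causes swap rank, but the magnitudes barely differ.
exact = [0.60, 0.55, 0.30, 0.10]
approx = [0.55, 0.60, 0.30, 0.10]
```

Here euclidean(exact, approx) is about 0.07 while Spearman's ρ drops to 0.8 and Kendall's τ to 2/3, mirroring the regime in which magnitudes have converged but ordinal structure has not.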
We can see from these figures that some measures, such as the two popular rank correlation measures in Figure 7a, are very similar, likely to the point of being redundant with respect to identifying meaningful changes in explanation quality. However, this is not the case for the other two pairs of measures. For example, in Figure 7b, Euclidean distance rapidly converges to 0, while Spearman's ρ remains well below 1 for some time. This suggests that the exact responsibility scores are quite close in magnitude and thus their order can be changed by small deviations. Depending on the model, this may indicate either an option to terminate earlier than expected if responsibility values are more important, or a need to increase the number of samples if ordinal relationships between causes are more important. Moreover, some pairs of metrics, such as those in Figure 7c, appear to measure orthogonal, equally-scaled phenomena, converging at roughly the same rate but not necessarily in a correlated manner. Using such a pair of metrics may offer a much more robust condition for control decisions like early termination. For additional experimental details and a comprehensive illustration of all comparisons, please see Appendix B.

¹⁰ https://gymnasium.farama.org/environments/box2d/lunar_lander/

Fig. 7. Selection of measure comparisons: (a) Kendall's τ v. Spearman's ρ; (b) ℓ2-norm v. Spearman's ρ; (c) Pearson's R v. top-k responsibility. Quantization in sub-figure (a) is due to Kendall's τ operating on small lists.

These qualitative results suggest that there do not exist single measures that concisely capture the evolution of approximate top-k causal sets over time and open several new directions for research, which we propose as interesting open problems.
First, more research is needed to understand if and how these measures, others proposed [26], or combinations thereof, might be used to automatically reject obviously bad approximation settings that yield results far from ground truth. This would allow automated parameter tuning, and vastly increase the efficiency and impact of user studies by reducing wasted user study resources on clearly flawed systems. Second, such measures may also detect favorable conditions for early termination of Monte Carlo Mean RESP or similar algorithms, creating a more flexible type of "anytime explanation" system. To the best of our knowledge, no such system yet exists. Last, although some of these measures appear in other user-centric applications, such as recommender systems, they are primarily used to analyze data already labeled by users. Here, we propose understanding essentially the opposite direction: performing user evaluation of approximate explanations to understand if, when, and how these measures capture differences that are important to humans. That is, given the difference between two explanations according to one or more measures, can we determine whether they are satisfactory for a user?

8.2.3 Mean RESP and Shapley Values. While our framework was originally conceived to explain agent behavior that was the result of planning using MDPs, it may in principle also be applied to black-box policy approximators and even function approximators designed for prediction or classification. Given this potential application, it is important to understand how, if at all, our framework differs from those built upon the concept of Shapley values, which is commonly applied in such domains for the purpose of explanation. To this end, we design several experiments to compare the two approaches.
In these experiments, summarized in Figure 8, we find weak causal sets for 60 randomly selected states in four environments: Lunar Lander, Taxi¹¹, Black Jack¹², and a version of Highway Env¹³ (highway-fast-v0; Kinematic Observation). To compare with a Shapley-value method we implement a representative method [142], and focus on identifying causal variables from the set of state factors. Policies were learned via either value iteration or deep Q-learning; please see Appendix B for more experimental details. Given the Shapley values and responsibility scores generated for each method, we compare their similarity. Figures 8a, 8b, and 8c show the average Pearson, Kendall, and Spearman correlations, respectively, between the outputs of each method. Note that there is no better or best score. These graphs show only the correlation between outputs of approximate attribution algorithms. Somewhat surprisingly, we can see that the results from different responsibility methods are sometimes less similar than those from Shapley-value methods, and this is true both for ordinal and real-valued correlation metrics. However, if we look at Figure 8d, we can see that, predictably, OC and UC are more similar when a scale-sensitive measure like Euclidean distance is used, which makes sense given expressions (12) and (13). Perhaps most importantly, it is clear from Figure 8e that the set difference can be substantial even for relatively small, seemingly simple domains. That is, choosing the top-k items as explanans will, on average, yield substantially different results.

¹¹ https://gymnasium.farama.org/environments/toy_text/taxi/
¹² https://gymnasium.farama.org/environments/toy_text/blackjack/
¹³ https://github.com/Farama-Foundation/HighwayEnv
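The set-difference measure has a simple form; a sketch with hypothetical scores follows (the exact normalization used in our plots may differ):

```python
def top_k(scores, k):
    """Indices of the k highest-scoring variables."""
    return set(sorted(range(len(scores)), key=lambda i: -scores[i])[:k])

def set_difference_fraction(scores_a, scores_b, k):
    """Fraction of the top-k explanans on which two attribution
    methods disagree: symmetric difference normalized by 2k.
    Illustrative measure definition."""
    a, b = top_k(scores_a, k), top_k(scores_b, k)
    return len(a ^ b) / (2 * k)

# Hypothetical responsibility scores vs. Shapley values over four
# state factors: the methods agree on only one of the top-2 explanans.
resp = [0.6, 0.1, 0.4, 0.3]
shapley = [0.5, 0.4, 0.1, 0.3]
```

Here set_difference_fraction(resp, shapley, 2) yields 0.5: each top-2 set contains one variable the other lacks, so an explanation built from either top-2 list would differ in half its contents.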
Here, some of the difference between domains can be explained by our need to set different values of k due to the varying complexity of the domains. For example, the Black Jack domain has only three features, so we use k = 1, while k = 2 for Taxi, k = 4 for Lunar Lander, and k = 7 for Highway Env. Nevertheless, these differences are not small, suggesting an average difference in causal set contents of approximately 10-20% across all domains. These results have mixed implications. On the one hand, it appears that there are likely many acceptable technical solutions and definitions for establishing cause or attribution that remain relatively consistent across many domains. On the other hand, when we consider the necessity in most cases of choosing a subset of causes to present in an explanation (either setting k or choosing explanans from within the top k directly, corresponding to Step 3 from the general procedure outlined in Section 4), these differences may make these choices more difficult in that a larger number of tradeoffs must be considered. For more results on how different definitions of cause affect potential similarity measures as functions of the number of samples taken, please see Appendix B.

8.2.4 Run-Time Analysis and Approximation. A major concern for the practical application of any method based on causal or counterfactual analysis is whether it remains tractable as the problem size increases. Here, we show that the exact versions of Mean RESP indeed suffer from the curse of dimensionality in enumerating possible counterfactual scenarios. However, Monte Carlo Mean RESP retains remarkable efficiency and accuracy even as problem size increases. Figures 9a and 9b show how the running time of both methods scales as a function of the domain size of the variables and the number of features present, respectively. Lines plotted represent mean run times over 60 trials, and the 95% confidence interval falls within the width of the lines on the plots.
All data was generated using different versions of the Lunar Lander domain. For more details on the experimental setup, please see Appendix B.

8.3 User Studies

Explanations are inherently multi-agent and human-centered. That is, all explanations have the common purpose of communicating information from one agent to another, and frequently the effectiveness of the explanation depends on the internal or hidden states of both agents. Moreover, the recipient is nearly always a human. Therefore, automated metrics alone, even those that consider information beyond purely an explanation's contents, are not sufficient for understanding whether explanations generated using our proposed framework, or any other framework, will meet their deployment needs. Given the diversity of potential applications, it is natural that there may be a variety of target effects of an explanation depending on the system. To understand whether our system achieves these effects we ran two user studies, SOTA and CONTEXT. While not designed using Hoffman et al. [2018] as a specific reference, our studies align primarily with two of the four main categories of evaluation they outline: goodness and satisfaction, although our experiments also include some measures beyond these criteria. There are many other possible studies measuring more specifically a user's internal model of the planning algorithm or their ability to perform tasks cooperatively with the agent being explained, where appropriate, but we leave those for future work. To the best of our knowledge, we present the largest user studies to date measuring both absolute and relative performance of algorithms for explaining MDPs. In particular, we investigate the following hypotheses:
Fig. 8. Comparison of different weak cause (responsibility) definitions and Shapley value attribution across the Highway, Lunar, Taxi, and Black Jack environments: (a) Pearson's R; (b) Kendall's τ; (c) Spearman's ρ; (d) ℓ2-norm; (e) average set difference, each comparing OC vs. UC, OC vs. Shapley, and UC vs. Shapley. In sub-figures (a)-(c) we show correlation measures, thus a higher value indicates more similarity. In sub-figures (d) and (e) we show difference measures, thus a lower value indicates more similarity.

H1: Users prefer explanations generated using causal reasoning to those generated using heuristics. (SOTA)

H2: Users prefer explanations supported by explanans representing specific types of information. (SOTA)

H3: User preferences for explanations composed of specific types of information depend on the context within which they receive the explanation. (CONTEXT)

Fig. 9. Timing results for exact and Monte Carlo Mean RESP for original (OC) and updated (UC) definitions: (a) run time as a function of the domain size; (b) run time as a function of the number of features. Note the log scale on both vertical axes and 95% CI error bars.
H4: Users differentiate meaningfully between concepts of trust, understanding, and necessity in the context of an explanation. (CONTEXT)

H5: User preferences for explanation methods or explanan types correlate with demographic or lifestyle indicators. (SOTA and CONTEXT)

8.3.1 Study Descriptions and Administration Overview. Roughly 200 participants, recruited via the crowd-sourcing platform Prolific¹⁴, participated in each study. Participants were fluent in English, aged between 18 and 65, and consisted of roughly 50% men and 50% women. Both studies had a similar high-level structure: participants were asked for some basic demographic information, followed by study-specific questions, and finally questions about their patterns of use and attitude towards different forms of AI technologies, which we put at the end of the study to avoid biasing participants prior to their evaluation of the explanations in the middle portion of the studies. Both studies were conducted using the same virtual driving domain [79]; see Figure 10. Participants were shown short clips of simulated driving scenarios in a randomized order, where a car drives on a highway and changes speeds and lanes based on a policy from an MDP solved offline. After watching each clip, participants were shown one or more automatically generated explanations referring to the behavior of the vehicle shown in the clip, and asked questions about the explanation(s). Clip order and explanation order (for questions involving multiple explanations) were randomized. The predominant method for communicating automatically generated explanations to humans, Step 4 from the procedure in Section 4, is via text. While we use natural language templates for conducting our studies, we make no claims about the relative effectiveness of this method compared to other options. To present as little bias towards different explanations as possible, every explanation was presented using the same basic template: "The car ⟨action⟩ because ⟨explanan 1⟩, ..., ⟨explanan n⟩."
Each action and explanan in the MDP was mapped to a custom phrase, signifying both what the explanan represented and its value, or the nature of the action. Although many of the explanans generated by other methods do not assert causality, at least not theoretically, we still use the conjunction "because" in order to maintain consistency. For a more comprehensive description of administration details, specific questions, and participant demographics for both studies, please see Appendix A.

14 www.prolific.co

17:38 Nashed, Mahmud, Goldman & Zilberstein

Fig. 10. A screen capture from an example scenario in which the ego car (green) makes a left lane change.

8.3.2 The SOTA Study. Design. Following demographic questions, participants were shown three clips, one at a time. After each clip, showing the agent (car) taking a particular action, such as the left lane change shown in Fig. 10, participants were shown 7 different explanations in a randomized order and asked to rank them relative to each other to produce a strict preference ordering. Specifically, they were asked to rank the explanations according to "how well you feel they would help someone understand the behavior of the vehicle". Each explanation was generated using a different method for automatic explanation, including three baselines15: Elizalde (F-type) [31], Khan (R-type) [71], and Wang (T-type) [154], as well as all four types of explanation generated by our proposed method. We find remarkably strong evidence in support of H1 and H2. Below we review the results in detail.

H1: Preferences for Causal Explanations. Figure 11 summarizes our findings on user preferences for explanation methods. The most important observation is that, for every explanation type (F, R, T), users prefer the explanations generated via causal reasoning over those generated via heuristic methods.
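The pairwise preference likelihoods visualized in Figure 11 can be estimated directly from participants' strict rankings. Below is a minimal sketch of this aggregation, using hypothetical rankings over four methods; all data here is illustrative, not from the study:

```python
import numpy as np

# Hypothetical data: rankings[p][m] is the rank participant p assigned to
# method m (0 = most preferred); each row is one strict preference ordering.
rankings = np.array([
    [0, 1, 2, 3],
    [0, 2, 1, 3],
    [1, 0, 3, 2],
])

n = rankings.shape[1]
pref = np.zeros((n, n))
for row in rankings:
    for i in range(n):
        for j in range(n):
            if i != j and row[i] < row[j]:  # lower rank = more preferred
                pref[i, j] += 1
pref /= len(rankings)  # pref[i, j]: P(method i preferred to method j)
```

Because every ranking is strict, `pref[i, j] + pref[j, i] = 1` for every pair of methods, so a matrix like the one in Figure 11 is fully determined by its upper triangle.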
We applied the Mann-Whitney U test [96] to each pair of generation methods (21 in total), using an initial α-value of 0.05 and a Bonferroni-corrected α-value of 0.0024 [12]. We detected the following preference ordering with p-values below 0.0001:

Prop-F ≻ Prop-V ≻ Elizalde ∼ Prop-R ∼ Prop-T ≻ Khan ∼ Wang (1)

Here, A ≻ B denotes a strict preference for A over B, and ∼ denotes preference equality. We believe the overall preference for causal explanations is due to their consistent relevance across all scenarios. For example, although both heuristic and causal F-type methods have access to the same potential explanans, and occasionally produce the same explanations, there are some cases where the heuristic methods do not produce sensible explanations. In the following case, the heuristic method fails to provide both relevant and complete information to explain the event: "The car changed lanes to the right because the car was in the left lane". In contrast, the method based on causal reasoning more reliably produces explanations that capture more completely the underlying reasons for the observed behavior: "The car changed lanes to the right because the car was in the left lane, the estimated time to collision in the left lane was 2 seconds, and the right lane was empty".

15 Some of the state-of-the-art methods we compare against [71, 154] were originally designed to explain certain sub-types of MDPs or POMDPs. In order to run our experiments, we modified these methods as little as possible from their original implementations.

Fig. 11. Preference likelihoods for MDP explanation methods. The color of cell (row, column) indicates the probability that explanations generated using method row are preferred to explanations generated using method column. Fig. 12.
Preference likelihoods for MDP explanations based on explanan type.

H2: Preferences for Explanan Types. Figure 12 shows a similar analysis with respect to different explanan types (F, R, T, V), where we can see that F-type explanations are, on average, preferred over R-type and T-type. For this analysis, if there were multiple explanations based on the same explanans, we used the most preferred rank. For example, if a user preferred A ≻ B ≻ C, where A and C were F-type and B was R-type, then we would record F-type as being ranked highest, followed by R-type. We apply the same pair-wise Mann-Whitney analysis as before, now with a total of 6 pairs and a Bonferroni-corrected α-value of 0.0083. Following this analysis, we obtain the following preference ordering: F-type ≻ V-type ≻ R-type ≻ T-type, with p-value 0.00001.

8.3.3 The CONTEXT Study. As is clear from the results of the previous study, given identical scenarios and identical methods for identifying multiple explanans, humans may exhibit strong preferences over explanations supported by semantically different explanans. This is not surprising given the number of different preferences for explanations already captured in the literature [104]. However, one piece crucially missing from our understanding of these preferences is whether or not they are context dependent [152], and if so, to what degree. Given the generality of our framework, which allows us to pick explanans from virtually any set of variables within a model, we have in a sense bypassed one of the typical gating or filtering mechanisms humans are theorized to use when simulating or constructing hypothetical or counterfactual worlds [68]. This begets a new problem (Step 3 from the procedure outlined in Section 4): picking only the most preferred explanans for use in the explanation.
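The significance-testing procedure used for both orderings above (pairwise Mann-Whitney U tests under a Bonferroni-corrected threshold) can be sketched as follows. The method names and rank data here are hypothetical placeholders, not the study data:

```python
from itertools import combinations
from scipy.stats import mannwhitneyu

# Hypothetical per-participant preference ranks (lower = more preferred).
ranks = {
    "Prop-F": [1, 1, 2, 1, 1, 2, 1, 1],
    "Prop-V": [2, 3, 1, 2, 3, 1, 2, 2],
    "Wang":   [7, 6, 7, 7, 6, 7, 7, 6],
}

pairs = list(combinations(ranks, 2))
alpha = 0.05 / len(pairs)  # Bonferroni correction over all tested pairs

for a, b in pairs:
    _, p = mannwhitneyu(ranks[a], ranks[b], alternative="two-sided")
    verdict = "significant" if p < alpha else "not significant"
    print(f"{a} vs. {b}: p = {p:.4g} ({verdict})")
```

With the 21 method pairs of the SOTA study the corrected threshold becomes 0.05 / 21 ≈ 0.0024, matching the α-value reported above; the 6 explanan-type pairs give 0.05 / 6 ≈ 0.0083.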
Given the possibility for context-dependence and the relatively wide scope of possible contextual variables, we consider the following family of hypotheses: there are factors, such as inherent task risk, level of complexity, power differential, amount of independence, and amount of cooperation, that affect relative preferences for the types of information (or reasons) referenced in an explanation. Here, we choose to focus on just one small subset of such hypotheses, the context of an autonomous driving scenario. Design. This study split participants into two groups. Group A were told that they were a passenger, with no ability to change the behavior of the car, communicate with the car, or intervene in the navigation or driving process in any way. Group B were told that they were to play the role of a driver. They were not currently in control of the vehicle, but if they wished they could signal their intent to take over control from the autonomous vehicle at any point by hitting a designated key. This design was intended to create two groups of users who were (possibly) interested in different forms of information about the car's operation and thus would exhibit different evaluations of the explanations presented subsequently. For the exact prompts, please see Appendix A. Following demographic questions, participants were shown several clips, one at a time. After each clip, they were shown a single explanation and asked to rate on a 5-point Likert scale how the explanation affected their trust of the system, understanding of the system, and the necessity of the explanation itself given what was occurring in the simulated scenario. All explanations were generated using our proposed method, and were equally distributed between different types of explanans (F-, R-, T-, and V-type). We find evidence in support of H3 for V-type explanations, and H4, but not H5. Below we review the results in detail. H3: Context-dependent Preferences for Explanan Types.
Figure 13 shows the distribution of Likert scores related to trust, understanding, and necessity on all scenarios, conditioned on the type of explanation presented and on whether the user was given the passenger prompt (passive user) or the driver prompt (active user). To each of the four pairs of distributions, we applied the two-sample Kolmogorov-Smirnov test [73, 139], and thus have a Bonferroni-corrected α-value of 0.0125. With p-value 0.00530, we find that for V-type explanations, passive and active users exhibited a difference in their evaluations. We also saw some effect for R-type explanations (p-value 0.06026), but we cannot make additional conclusions at this time. Interestingly, both F-type and V-type explanations make use of the same underlying types of variables (state factors), but differ in how they present them. F-type explanations reference state factor variables and values that are currently true in the scenario, and V-type explanations reference state factor variable values that may or may not be true at the current time. For more example explanations, please see Appendix A. At a high level, these results seem to parallel the psychological theory of agency. Identification of agentive entities is theorized to be a key capability for both humans and some animals [38], and one of the primary ways we identify other agents is through observing their actions [19]. Moreover, it has been shown that robots that do not look human can still elicit agency attribution in humans [109]. Such agents are called instrumental agents [37]. So-called communicative agents have a further ability to express their intent via explicit communication [37]. An interesting hypothesis for future consideration is whether V-type explanations elicit context-dependent
Fig. 13. Aggregate Likert distributions for different explanan types: (a) F-type, (b) R-type, (c) T-type, and (d) V-type explanations, for passive and active users.

preferences from users due to their increased ability to communicate intent and therefore signal agency. In this case, the causal grounding of the explanations may serve to maintain rationality of the action with respect to the goal, which we know to be another key factor in agency attribution [91]. Another notable psychological theory, prominence-interpretation theory [34], breaks trust-affecting events into two stages: an observation stage, based on the prominence of the feature or event with respect to the user, and an interpretation stage, where the user assesses the effect of their observation on their trust of the system. Whether active users react differently to V-type explanations due to differences in prominence or interpretation is thus an open question, although we hypothesize, given the focused nature of the study, that interpretation plays a larger role. Other studies have looked at the impact of proactive explanations, but do not compare directly to post-hoc explanations [164]. Thus, the utility of proactive explanations to positively affect trust in general is still largely unexplored. We should note also that there is a large volume of existing work related to context-dependent trust and the antecedents of trust under many different conditions. However, to the best of our knowledge, this is the first such study to consider these phenomena with respect to different explanations.
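The per-type comparison of passive and active users above rests on the two-sample Kolmogorov-Smirnov test. A minimal sketch follows, using fabricated Likert responses rather than the study data, with the same Bonferroni-corrected threshold of 0.05 / 4 = 0.0125:

```python
from scipy.stats import ks_2samp

# Hypothetical 5-point Likert responses for one explanan type,
# split by prompt condition (fabricated for illustration).
passive = [4, 4, 5, 3, 4, 5, 4, 4, 4, 5, 4, 5]
active  = [2, 3, 3, 2, 2, 3, 2, 3, 3, 2, 3, 2]

stat, p = ks_2samp(passive, active)  # two-sample KS test
different = p < 0.05 / 4             # Bonferroni over 4 explanan types
```

The KS statistic is the maximum gap between the two empirical CDFs, which makes it a natural choice for ordinal Likert data where a t-test's normality assumption is doubtful.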
H4: Trust, Necessity, and Understanding. Tables 5 and 6 summarize the results of our correlation tests. Our primary finding is that ratings of increased understanding, increased trust, and the necessity of the explanation are highly correlated, with minimum and maximum r-values of 0.5034 and 0.7729, respectively, over all combinations of user and explanation types (see Appendix B for figures).

Table 5. Correlation of understanding, trust, and necessity across all user types.

Type   Trust v. Und.   Trust v. Nec.   Und. v. Nec.
F      0.6682          0.6643          0.7296
R      0.6086          0.6084          0.6082
T      0.7247          0.6985          0.7480
V      0.6748          0.5256          0.6202
ALL    0.6796          0.6405          0.6931

Table 6. Correlation of understanding, trust, and necessity across all explanation types.

User Type   Trust v. Und.   Trust v. Nec.   Und. v. Nec.
Active      0.7000          0.6298          0.7128
Passive     0.6619          0.6497          0.6754

Moreover, the mode response across all questions was "somewhat agree", indicating a positive effect on trust and understanding given the explanations, and a modest desire (necessity) for the explanations provided. These results are expected, given the previously established connections between desire for explanations and their typical effects on understanding and trust. Our second and more interesting finding is that explanations that increase understanding may not always be deemed necessary and may not always increase trust to the same degree that they do understanding. That is, Figures 14a and 14b show strong correlation but have a slope that clearly differs from 1, indicating that response magnitudes frequently differed, while Figure 14c shows data representing the null hypothesis, that measures of trust and necessity for different explanations scale equally.
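The r-values in Tables 5 and 6 can be computed as Pearson correlations over paired Likert ratings. A minimal sketch with fabricated ratings (the variable names and values are illustrative, not the study data):

```python
import numpy as np

# Hypothetical paired Likert ratings (1-5) from the same participants.
trust         = np.array([4, 4, 3, 5, 4, 2, 3, 4, 5, 3])
understanding = np.array([4, 5, 3, 5, 4, 3, 3, 4, 4, 3])

# Pearson's r between the two rating series.
r = np.corrcoef(trust, understanding)[0, 1]
```

A strong r with a regression slope far from 1, as discussed above for Figures 14a and 14b, would indicate that the two ratings move together but with systematically different magnitudes.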
We see these results as additional support for a number of conclusions established within a large body of work on the antecedents of trust [69]. Principally, these include the importance of understanding, predictability, and competence in the formation of trust [77, 99], and that these results hold for both passive and active users [160]. There have also been results suggesting that different types of explanations may impact different beliefs about trust [155]. Trustor models of trustee motivations may also drive development of trust [49], and there is an important distinction made between a decrease in trust due to lack of competence (forgiven more easily) versus a lack of benevolence or honesty (not as easily forgiven) [117]. Although the types of explanations in these studies do not match exactly with Definition 5, our framework and results offer a promising initial direction to study more advanced hypotheses about the effect of explanations on trust under a variety of conditions, beyond the established wisdom that they have a generally positive impact. H5: Demographic and Lifestyle Non-Impact. We found no instance in which we could reject the null hypothesis with respect to H5. That is, the results presented with respect to H1-H4 are consistent across genders, age groups, and rates of technology use. We also, somewhat surprisingly, found these results to be consistent regardless of the frequency with which participants operated motor vehicles. We also tried several unsupervised methods to check for more complex correlations between user demographics and preferences for different types of explanations, including principal component analysis (PCA) [123] and t-distributed stochastic neighbor embedding (t-SNE) [59], but did not find anything significant. This should give practitioners confidence that these results will hold in many settings. These results primarily index particularized trust [133]. That is, trust established between specific entities.
This is generally accepted to be distinct from generalized trust [25], which is considered to be an individual's natural predisposition to trust, and which has been repeatedly established to be contingent on many demographic factors including age [161], gender [153], religion [118], and socioeconomic class [47]. We should also note for completeness that sociologists have developed many understandings and models of trust in addition to those proposed by psychologists, and the particularized-generalized trust dichotomy is not universally accepted [133].

Fig. 14. Distributions of Likert responses on the effect of explanations: (a) Understanding vs. Necessity, (b) Trust vs. Understanding, and (c) Trust vs. Necessity, for all scenarios.

9 Discussion and Conclusion

9.1 Study Limitations

Overall, the results, especially for H1 and H2, are exceptionally strong. However, we should address some key limitations of this study. First, self-reporting, and specifically self-reporting of trust, has known weaknesses as an experimental practice [72]. Second, the simulations lacked significant realism. While more immersive driving simulators exist, recruiting a similar number of participants to do in-person studies was not feasible from a cost perspective. It is possible some effects were not detectable in this study that would be under more realistic autonomous driving conditions.
9.2 Inference and Conceptual Limitations

Because they do not encompass repeated interaction with our system, our experiments offer limited evidence for or against popular models from psychology for the dynamics of trust formation between humans and automated systems. While there are key differences among the most established models, there is some level of consensus that 1) trust is established over time, and 2) reasons for trust in a specific interaction can come from several qualitatively different sources. For example, one model, advanced by Marsh and Dibben [2003], identifies dispositional, situational, and learned trust as distinct sources. These represent one's natural propensity to trust, the context of an interaction (including both environment variables and transient mood or attitude variables), and past experiences with similar agents. See Hoff and Bashir [2015] for a more extensive discussion. Others theorize that the underlying reasons for trust shift over time. In particular, it has been proposed that between humans, trust is initially justified by the predictability of the trustee, is later based on the trustee's dependability, and is finally based on faith in the trustee [126]. Zuboff [1988] and Muir [1987, 1994] have proposed a similar evolution of trust between humans and technology, beginning with experience, followed by understanding, and finally, faith. However, later studies have contended that this order is actually reversed for some human-automation trust relationships [110, 115, 93]. Lee and Moray [1992] proposed reasons supported by performance, process, and purpose as the foundation of trust, representing different types of information. For example, performance estimates may be established from direct observation, process understanding can be developed through familiarity with the underlying mechanisms, and purpose may be imputed from the system's intended use.
This may explain why users with no experience but an understanding of the purpose of a system initially trust based on faith [60]. This model, although not directly mapping onto the different types of explanations outlined in Definition 5, is a promising theoretical meeting point. These models are not mutually exclusive, and regardless of the specifics, they all suggest a complex set of dynamics governing trust formation. While studies like the ones we present make some progress on understanding the effectiveness of causal explanations on trust and understanding in an isolated interaction, to confidently support or reject any of the theories summarized above, longitudinal studies are likely required. Fortunately, embodied agents such as autonomous vehicles, or other robotic systems operating in the open world, afford a rich experimental domain. Previous work has identified that automated information acquisition, information analysis, decision selection, and action implementation play an important role in the dynamics of trust in automation [122], and such systems perform all of these functions at different levels of natural transparency, in addition to eliciting some of the strongest effects [69]. Moreover, there are many different definitions of trust [138], and it is likely that study participants will vary in their theoretical construction of "trust". To what extent this variance affects experimental results that rely on self-reporting is, to the best of our knowledge, unknown. In practice, it may be possible to avoid forming consensus on abstract notions of trust by disentangling it from certain observable behavior that is more straightforward to label. For example, Ajzen [1980] developed a framework in which beliefs and perceptions (available information) inform attitudes, which, in turn, affect intentions and finally behaviors. As this framework distinguishes between beliefs, attitudes, intentions, and behavior, it can model the influence of trust on reliance.
In this model, trust is used as a heuristic to aid in establishing an appropriate level of reliance on a system with which we may lack experience. Given that trust is the belief and reliance is the action, trust is thus an antecedent of reliance; additionally, reliance as the behavior can sometimes be observed directly. However, in order to gain experience and grow trust, some form of reliance must initially exist. Lee and See [2004] discuss this apparent contradiction and many other central ideas relevant to trust in automation research in their excellent survey paper. Others from philosophy have argued that trust and reliance may be distinguished since trust may be betrayed while reliance may only be disappointed [8]. For example, reliance on a clock does not produce feelings of betrayal should it break [100], and thus it is the violation of trust alone which elicits a feeling of betrayal [50]. This raises a question for future research on trust in any form in automated systems. Many academic conceptions of trust require competing objectives, often captured as different theoretical constructions of rational action; without competing objectives, there is no need for trust. Moreover, trust requires that an agent may choose to prioritize one objective (its own) over another. Robots occupy a strange middle ground: they clearly have objectives and pursue them rationally to the extent they are capable, and thus are obviously agents. However, they are designed specifically for their ability to aid people. Thus, their objectives are necessarily aligned with our own to the extent that we can model and program accurately16. They have no ability to choose based on emotion or other motivations.
Their success or lack thereof in pursuit of their objective can be attributed purely to their level of competence, and thus reliance seems to be the more appropriate construct to study, whether influenced by attitudes of trust or not. One might try to differentiate between a bug in the software due to programmer incompetence, a flaw in the design or model due to developer oversight, or an aspect of the behavior that is intentionally malicious. In this case, trust may be placed in the human agents responsible for designing, developing, and deploying the system, but this distinction ought to be made clearly, and at the moment is often left implicit or unquestioned.

9.3 Conclusion and Future Work

We present a novel framework for causal analysis of MDPs using SCMs, motivated by generating explanations of MDP agent behavior. Principally, this framework provides (1) a theoretical foundation for explainable sequential decision making, and (2) simultaneous support for causal queries using different decision problem components, which has previously not been possible. Beyond the proposed framework, this paper presented a set of theoretical and empirical analyses regarding several important properties of approximate Mean RESP related to convergence rates, error rates, similarity to and distinction from other methods, and quality of approximation. We also presented two different user studies investigating several hypotheses, the most striking of which is users' overwhelming preference for explanations generated using causal analysis compared to heuristic methods. We see several promising directions for future work. These include extending this framework to other decision-making models, running longer or more realistic studies of autonomous driving, and using metric information online to produce anytime explanation systems for models which are much larger than the ones tested in this paper.
Additionally, further research is needed to understand context-dependent preferences for different types of explanations, particularly by exploring additional contextual variables such as risk level, degree of cooperation, and task complexity.

Acknowledgments

This work was supported in part by the United States National Science Foundation grants #1954782, #2205153, and #2321786. Claudia Goldman was partially supported by the David Goldman Data-Driven Innovation Research Center at the Hebrew University Business School and was affiliated with General Motors, Technical Center Israel, when this research was performed.

16 This is, of course, an important qualification. Here, we are avoiding possible complications where a robot may be attempting to meet the goals of multiple users that may conflict. There is significant research on how systems ought to reason about such conflicts and about the consequences of their actions, including in decision-making models like MDPs [146, 113, 87].

[1] I. Ajzen. 1980. Understanding attitudes and predicting social behavior. Prentice-Hall.
[2] M. S. Alam and Y. Xie. 2022. Appley: Approximate Shapley value for model explainability in linear time. In IEEE International Conference on Big Data (Big Data), 95-100.
[3] E. Albini, J. Long, D. Dervovic, and D. Magazzeni. 2022. Counterfactual Shapley additive explanations. In ACM Conference on Fairness, Accountability, and Transparency, 1054-1070.
[4] M. Allahyari, S. Pouriyeh, M. Assefi, S. Safaei, E. D. Trippe, J. B. Gutierrez, and K. Kochut. 2017. Text summarization techniques: A brief survey. In arXiv preprint arXiv:1707.02268.
[5] E. Altman. 2000. Applications of Markov decision processes in communication networks: A survey. Tech. rep. INRIA.
[6] Y. Amitai, G. Avni, and O. Amir. 2022. Interactive explanations of agent behavior.
In ICAPS 2022 Workshop on Explainable AI Planning.
[7] S. Anjomshoae, A. Najjar, D. Calvaresi, and K. Främling. 2019. Explainable agents and robots: Results from a systematic literature review. In 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 1078-1088.
[8] A. Baier. 2014. Trust and antitrust. In Feminist Social Thought. Routledge, 604-629.
[9] R. Bellman. 1952. On the theory of dynamic programming. National Academy of Sciences of the United States of America, 38, 8, 716.
[10] L. Bertossi, J. Li, M. Schleich, D. Suciu, and Z. Vagena. 2020. Causality-based explanation of classification outcomes. In arXiv preprint arXiv:2003.06868.
[11] J. Bertram and P. Wei. 2018. Explainable deterministic MDPs. In arXiv preprint arXiv:1806.03492.
[12] C. Bonferroni. 1936. Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze, 8, 3-62.
[13] S. Bongers, P. Forré, J. Peters, and J. M. Mooij. 2021. Foundations of structural causal models with cycles and latent variables. The Annals of Statistics, 49, 5, 2885-2915.
[14] R. J. Boucherie and N. M. Van Dijk. 2017. Markov decision processes in practice. Vol. 248. Springer.
[15] T. Brázdil, K. Chatterjee, M. Chmelík, A. Fellner, and J. Křetínský. 2015. Counterexample explanation by learning small strategies in Markov decision processes. In 27th International Conference on Computer Aided Verification (CAV), 158-177.
[16] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba. 2016. OpenAI Gym. arXiv, abs/1606.01540.
[17] N. Bussmann, P. Giudici, D. Marinelli, and J. Papenbrock. 2021. Explainable machine learning in credit risk management. Computational Economics, 57, 203-216.
[18] R. Bustin and C. V. Goldman. 2024. Structure and reduction of MCTS for explainable-AI. In arXiv preprint arXiv:2408.05488.
[19] S. Carey and E. Spelke. 1994.
Domain-specific knowledge and conceptual change. Mapping the mind: Domain specificity in cognition and culture, 169, 200.
[20] T. Chakraborti, A. Kulkarni, S. Sreedharan, D. E. Smith, and S. Kambhampati. 2019. Explicability? Legibility? Predictability? Transparency? Privacy? Security? The emerging landscape of interpretable agent behavior. In International Conference on Automated Planning and Scheduling (ICAPS), 86-96.
[21] T. Chakraborti, S. Sreedharan, and S. Kambhampati. 2020. The emerging landscape of explainable automated planning & decision making. In International Joint Conference on Artificial Intelligence (IJCAI), 4803-4811.
[22] T. Chakraborti, S. Sreedharan, Y. Zhang, and S. Kambhampati. 2017. Plan explanations as model reconciliation: Moving beyond explanation as soliloquy. In arXiv preprint arXiv:1701.08317.
[23] J. Y. Chen, S. G. Lakhmani, K. Stowers, A. R. Selkowitz, J. L. Wright, and M. Barnes. 2018. Situation awareness-based agent transparency and human-autonomy teaming effectiveness. Theoretical Issues in Ergonomics Science, 19, 3, 259-282.
[24] H. Chockler and J. Y. Halpern. 2004. Responsibility and blame: A structural-model approach. Journal of Artificial Intelligence Research, 22, 93-115.
[25] L. L. Couch and W. H. Jones. 1997. Measuring levels of trust. Journal of Research in Personality, 31, 3, 319-336.
[26] S. Dasgupta, N. Frost, and M. Moshkovitz. 2022. Framework for evaluating faithfulness of local explanations. In International Conference on Machine Learning, 4794-4815.
[27] E. David Wong. 1993. Understanding the generative capacity of analogies as a tool for explanation. Journal of Research in Science Teaching, 30, 10, 1259-1272.
[28] A. D. Dragan, K. C. Lee, and S. S. Srinivasa. 2013. Legibility and predictability of robot motion. In International Conference on Human-Robot Interaction (HRI), 301-308.
[29] R. Eifler, M. Steinmetz, A. Torralba, and J. Hoffmann. 2021.
Plan-space explanation via plan-property dependencies: Faster algorithms & more powerful properties. In International Joint Conference on Artificial Intelligence (IJCAI), 4091-4097.
[30] T. Eiter and T. Lukasiewicz. 2006. Causes and explanations in the structural-model approach: Tractable cases. Artificial Intelligence, 170, 6-7, 542-580.
[31] F. Elizalde, E. Sucar, J. Noguez, and A. Reyes. 2009. Generating explanations based on Markov decision processes. In Mexican International Conference on Artificial Intelligence.
[32] R. Fagin, R. Kumar, and D. Sivakumar. 2003. Comparing top k lists. SIAM Journal on Discrete Mathematics, 17, 1, 134-160.
[33] M. Finkelstein, L. Liu, Y. Kolumbus, D. C. Parkes, J. S. Rosenschein, S. Keren, et al. 2022. Explainable reinforcement learning via model transforms. Advances in Neural Information Processing Systems, 35, 34039-34051.
[34] B. J. Fogg. 2003. Prominence-interpretation theory: Explaining how people assess credibility online. In CHI 03 Extended Abstracts on Human Factors in Computing Systems, 722-723.
[35] M. Fox, D. Long, and D. Magazzeni. 2017. Explainable planning. In arXiv preprint arXiv:1709.10256.
[36] D. A. Freedman. 2007. Statistical models for causation. In The SAGE Handbook of Social Science Methodology, 127-146.
[37] G. Gergely. 2010. Kinds of agents: The origins of understanding instrumental and communicative agency. Wiley Online Library, 76-105.
[38] G. Gergely and G. Csibra. 2003. Teleological reasoning in infancy: The naïve theory of rational action. Trends in Cognitive Sciences, 7, 7, 287-292.
[39] L. Giordano and C. Schwind. 2004. Conditional logic of actions and causation. Artificial Intelligence, 157, 1-2, 239-279.
[40] C. V. Goldman and M. Baltaxe. 2021. Why are you predicting this class? In 2021 IEEE Intelligent Vehicles Symposium (IV), 415-420.
[41] C.
V. Goldman, M. Baltaxe, D. Chakraborty, J. Arinez, and C. E. Diaz. 2023. Interpreting learning models in manufacturing processes: Towards explainable AI methods to improve trust in classifier predictions. Journal of Industrial Information Integration, 33. [42] A. Grastien, C. Benn, and S. ThiΓ©baux. 2021. Computing plans that signal normative compliance. In AAAI/ACM Conference on AI, Ethics, and Society (AIES), 509 518. [43] J. Halpern. 2015. A modification of the Halpern-Pearl definition of causality. In International Joint Conference on Artificial Intelligence (IJCAI), 3022 3033. [44] J. Y. Halpern and J. Pearl. 2001. Causes and explanations: A structural-model approach Part I: Causes. In Seventeenth Conference on Uncertainty in Artificial Intelligence, 194 202. [45] J. Y. Halpern and J. Pearl. 2005. Causes and explanations: A structural-model approach. Part I: Causes. The British Journal for the Philosophy of Science, 52, 3. [46] J. Y. Halpern and J. Pearl. 2005. Causes and explanations: A structural-model approach. Part II: Explanations. The British Journal for the Philosophy of Science, 56, 4, 889 911. [47] T. Hamamura. 2012. Social class predicts generalized trust but only in wealthy societies. Journal of Cross-Cultural Psychology, 43, 3, 498 509. [48] L. Hammond, J. Fox, T. Everitt, R. Carey, A. Abate, and M. Wooldridge. 2023. Reasoning about causality in games. In ar Xiv preprint ar Xiv:2301.02324. [49] R. Hardin. 2002. Trust and trustworthiness. Russell Sage Foundation. [50] K. Hawley. 2014. Trust, distrust and commitment. NoΓ»s, 48, 1, 1 20. [51] B. Hayes and J. A. Shah. 2017. Improving robot controller transparency through autonomous policy explanation. In ACM/IEEE International Conference on Human-Robot Interaction (HRI), 303 312. [52] J. He, S. L. Baxter, J. Xu, J. Xu, X. Zhou, and K. Zhang. 2019. The practical implementation of artificial intelligence technologies in medicine. Nature Medicine, 25, 1, 30 36. [53] L. He, N. Aouf, and B. Song. 2021. 
Explainable deep reinforcement learning for UAV autonomous path planning. Aerospace Science and Technology, 118, 107052. [54] F. Heider. 1958. The psychology of interpersonal relations. Wiley, New York. [55] G. Hesslow. 1988. The problem of causal selection. In Contemporary Science and Natural Explanation: Commonsense Conceptions of Causality, 11 32. [56] D. J. Hilton. 1990. Conversational processes and causal explanation. Psychological Bulletin, 107, 1, 65. [57] D. J. Hilton. 1996. Mental models and causal explanation: Judgements of probable cause and explanatory relevance. Thinking & Reasoning, 2, 4, 273 308. [58] D. J. Hilton and B. R. Slugoski. 1986. Knowledge-based causal attribution: The abnormal conditions focus model. Psychological Review, 93, 1, 75. [59] G. E. Hinton and S. Roweis. 2002. Stochastic neighbor embedding. Advances in Neural Information Processing Systems, 15. [60] J.-M. Hoc. 2000. From human machine interaction to human machine cooperation. Ergonomics, 43, 7, 833 843. [61] K. A. Hoff and M. Bashir. 2015. Trust in automation: Integrating empirical evidence on factors that influence trust. Human Factors, 57, 3, 407 434. [62] R. R. Hoffman, S. T. Mueller, G. Klein, and J. Litman. 2018. Metrics for explainable AI: Challenges and prospects. In ar Xiv preprint ar Xiv:1812.04608. [63] T. Huber, K. Weitz, E. AndrΓ©, and O. Amir. 2021. Local and global explanations of agent behavior: Integrating strategy summaries with saliency maps. Artificial Intelligence, 301. [64] Y. Iwasaki and H. A. Simon. 1986. Causality in device behavior. Artificial Intelligence, 29, 1, 3 32. [65] B. Jacobs, A. Kissinger, and F. Zanasi. 2019. Causal inference by string diagram surgery. In International Conference on Foundations of Software Science and Computation Structures, 313 329. Journal of Artificial Intelligence Research, Vol. 83, Article 17. Publication date: July 2025. 17:48 Nashed, Mahmud, Goldman & Zilberstein [66] A. Jacovi, A. MarasoviΔ‡, T. Miller, and Y. 
Goldberg. 2021. Formalizing trust in artificial intelligence: Prerequisites, causes and goals of human trust in AI. In ACM Cconference on Fairness, Accountability, and Transparency, 624 635. [67] Z. Juozapaitis, A. Koul, A. Fern, M. Erwig, and F. Doshi-Velez. 2019. Explainable reinforcement learning via reward decomposition. In IJCAI/ECAI Workshop on Explainable Artificial Intelligence. [68] D. Kahneman and A. Tversky. 1981. The simulation heuristic. National Technical Information Service. [69] A. D. Kaplan, T. T. Kessler, J. C. Brill, and P. Hancock. 2023. Trust in artificial intelligence: Meta-analytic findings. Human Factors, 65, 2, 337 359. [70] A.-H. Karimi, B. SchΓΆlkopf, and I. Valera. 2021. Algorithmic recourse: From counterfactual explanations to interventions. In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, 353 362. [71] O. Khan, P. Poupart, and J. Black. 2009. Minimal sufficient explanations for factored Markov decision processes. In International Conference on Automated Planning and Scheduling (ICAPS). [72] S. C. Kohn, E. J. de Visser, E. Wiese, Y.-C. Lee, and T. H. Shaw. 2021. Measurement of trust in automation: A narrative review and reference guide. Frontiers in Psychology, 12, 604977. [73] A. N. Kolmogorov. 1933. Sulla determinazione empirica di una legge didistribuzione. Giorn Dell inst Ital Degli Att, 4, 89 91. [74] B. Krarup, M. Cashmore, D. Magazzeni, and T. Miller. 2019. Model-based contrastive explanations for explainable planning. In ICAPS Workshop on Explainable Planning. [75] I. E. Kumar, S. Venkatasubramanian, C. Scheidegger, and S. Friedler. 2020. Problems with Shapley-value-based explanations as feature importance measures. In International Conference on Machine Learning, 5491 5500. [76] J. Lee and N. Moray. 1992. Trust, control strategies and allocation of function in human-machine systems. Ergonomics, 35, 10, 1243 1270. [77] J. D. Lee and N. Moray. 1994. 
Trust, self-confidence, and operators adaptation to automation. International Journal of Human-Computer Studies, 40, 1, 153 184. [78] J. D. Lee and K. A. See. 2004. Trust in automation: Designing for appropriate reliance. Human Factors, 46, 1, 50 80. [79] E. Leurent. 2018. An environment for autonomous driving decision-making. https://github.com/eleurent/highway-env. (2018). [80] D. Lewis. 1974. Causation. The Journal of Philosophy, 70, 17, 556 567. [81] L. Li, T. J. Walsh, and M. L. Littman. 2006. Towards a unified theory of state abstraction for MDPs. In International Symposium on Artificial Intelligence and Mathematics. [82] M. P. Linegang, H. A. Stoner, M. J. Patterson, B. D. Seppelt, J. D. Hoffman, Z. B. Crittendon, and J. D. Lee. 2006. Human-automation collaboration in dynamic mission planning: A challenge requiring an ecological approach. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 50, 23, 2482 2486. [83] P. Lipton. 1990. Contrastive explanation. Royal Institute of Philosophy Supplements, 27, 247 266. [84] T. Lombrozo. 2010. Causal-explanatory pluralism: How intentions, functions, and mechanisms influence causal ascriptions. Cognitive Psychology, 61, 4, 303 332. [85] T. Lombrozo. 2012. Explanation and abductive inference. In The Oxford Handbook of Thinking and Reasoning. K. J. Holyoak and R. G. Morrison, (Eds.), 260 276. [86] T. Lombrozo. 2006. The structure and function of explanations. Trends in Cognitive Sciences, 10, 10, 464 470. [87] Q. Lu, J. Svegliato, S. B. Nashed, S. Zilberstein, and S. Russell. 2024. Ethically compliant autonomous systems under partial observability. In International Conference on Robotics and Automation (ICRA), 16229 16235. [88] A. Lucic, H. Haned, and M. de Rijke. 2020. Why does my model fail? Contrastive local explanations for retail forecasting. In ACM Conference on Fairness, Accountability, and Transparency, 90 98. [89] S. M. Lundberg and S.-I. Lee. 2017. 
A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, 4765 4774. [90] S. M. Lundberg et al. 2018. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nature Biomedical Engineering, 2, 10. [91] Y. Luo and R. Baillargeon. 2005. Can a self-propelled box have a goal? Psychological reasoning in 5-month-old infants. Psychological Science, 16, 8, 601 608. [92] J. L. Mackie. 1980. The cement of the universe: A study of causation. Clarendon Press. [93] P. Madhavan and D. A. Wiegmann. 2007. Similarities and differences between human human and human automation trust: An integrative review. Theoretical Issues in Ergonomics Science, 8, 4, 277 301. [94] P. Madumal, T. Miller, L. Sonenberg, and F. Vetere. 2020. Explainable reinforcement learning through a causal lens. In AAAI Conference on Artificial Intelligence, 2493 2500. [95] S. Makridakis. 2017. The forthcoming artificial intelligence (AI) revolution: Its impact on society and firms. Futures, 90, 46 60. [96] H. B. Mann and D. R. Whitney. 1947. On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics, 18, 1, 50 60. Journal of Artificial Intelligence Research, Vol. 83, Article 17. Publication date: July 2025. Causal Explanations for Sequential Decision Making 17:49 [97] J. Marques-Silva and A. Ignatiev. 2022. Delivering trustworthy AI through formal XAI. In AAAI Conference on Artificial Intelligence, 12342 12350. [98] S. Marsh and M. R. Dibben. 2003. The role of trust in information science and technology. Annual Review of Information Science and Technology (ARIST), 37, 465 98. [99] A. J. Masalonis and R. Parasuraman. 2003. Effects of situation-specific reliability on trust and usage of automated air traffic control decision aids. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting. [100] C. Mc Leod. 2021. Trust. 
In The Stanford Encyclopedia of Philosophy. (Fall 2021 ed.). Metaphysics Research Lab, Stanford University. [101] J. E. Mercado, M. A. Rupp, J. Y. Chen, M. J. Barnes, D. Barber, and K. Procci. 2016. Intelligent agent transparency in human-agent teaming for Multi-Ux V management. Human Factors, 58, 3, 401 415. [102] L. Merrick and A. Taly. 2020. The explanation game: Explaining machine learning models using Shapley values. In Machine Learning and Knowledge Extraction, 17 38. [103] S. Michel, P. Triantafillou, and G. Weikum. 2005. KLEE: A framework for distributed top-k query algorithms. In International Conference on Very Large Data Bases. [104] T. Miller. 2019. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267, 1 38. [105] B. Mittelstadt, C. Russell, and S. Wachter. 2019. Explaining explanations in AI. In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, 279 288. [106] S. Miura, A. L. Cohen, and S. Zilberstein. 2021. Maximizing legibility in stochastic environments. In International Conference on Robot & Human Interactive Communication (RO-MAN), 1053 1059. [107] S. Miura and S. Zilberstein. 2021. A unifying framework for observer-aware planning and its complexity. In Uncertainty in Artificial Intelligence, 610 620. [108] R. K. Mothilal, A. Sharma, and C. Tan. 2020. Explaining machine learning classifiers through diverse counterfactual explanations. In ACM Conference on Fairness, Accountability, and Transparency, 607 617. [109] J. R. Movellan and J. S. Watson. 2002. The development of gaze following as a Bayesian systems identification problem. In International Conference on Development and Learning (ICDL), 34 40. [110] B. Muir and N. Moray. 1996. Trust in automation: Part II. Experimental studies of trust and human intervention in automated systems. Ergonomics, 37, 1905 1922. [111] B. M. Muir. 1987. Trust between humans and machines, and the design of decision aids. 
International Journal of Man-Machine Studies, 27, 5-6, 527 539. [112] B. M. Muir. 1994. Trust in automation: Part I. Theoretical issues in the study of trust and human intervention in automated systems. Ergonomics, 37, 11, 1905 1922. [113] S. Nashed, J. Svegliato, and S. Zilberstein. 2021. Ethically compliant planning within moral communities. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, 188 198. [114] S. B. Nashed, S. Mahmud, C. V. Goldman, and S. Zilberstein. 2023. Causal explanations for sequential decision making under uncertainty. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2307 2309. [115] C. Nass and Y. Moon. 2000. Machines and mindlessness: Social responses to computers. Journal of Social Issues, 56, 1, 81 103. [116] S. Nisioi, S. Ε tajner, S. P. Ponzetto, and L. P. Dinu. 2017. Exploring neural text simplification models. In 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 85 91. [117] B. Nooteboom. 2002. Trust: Forms, foundations, functions, failures and figures. Edward Elgar Publishing. [118] D. V. Olson and M. Li. 2015. Does a nation s religious composition affect generalized trust? The role of religious heterogeneity and the percent religious. Journal for the Scientific Study of Religion, 54, 4, 756 773. [119] J. Otsuka and H. Saigo. 2022. On the equivalence of causal models: A category-theoretic approach. In ar Xiv preprint ar Xiv:2201.06981. [120] C. Panigutti, A. Perotti, and D. Pedreschi. 2020. Doctor XAI: An ontology-based approach to black-box sequential data classification explanations. In ACM Conference on Fairness, Accountability, and Transparency, 629 639. [121] R. Parasuraman and V. Riley. 1997. Humans and automation: Use, misuse, disuse, abuse. Human Factors, 39, 2, 230 253. [122] R. Parasuraman, T. B. Sheridan, and C. D. Wickens. 2000. A model for types and levels of human interaction with automation. 
IEEE Transactions on systems, man, and cybernetics-Part A: Systems and Humans, 30, 3, 286 297. [123] K. Pearson. 1901. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2, 11, 559 572. [124] J. Petch, S. Di, and W. Nelson. 2022. Opening the black box: The promise and limitations of explainable machine learning in cardiology. Canadian Journal of Cardiology, 38, 2. [125] H. Pouget, H. Chockler, Y. Sun, and D. Kroening. 2020. Ranking policy decisions. In ar Xiv preprint ar Xiv:2008.13607. [126] J. K. Rempel, J. G. Holmes, and M. P. Zanna. 1985. Trust in close relationships. Journal of Personality and Social Psychology, 49, 1, 95. [127] R. Roscher, B. Bohn, M. F. Duarte, and J. Garcke. 2020. Explainable machine learning for scientific insights and discoveries. IEEE Access, 8, 42200 42216. Journal of Artificial Intelligence Research, Vol. 83, Article 17. Publication date: July 2025. 17:50 Nashed, Mahmud, Goldman & Zilberstein [128] B. Rozemberczki, L. Watson, P. Bayer, H.-T. Yang, O. Kiss, S. Nilsson, and R. Sarkar. 2022. The Shapley value in machine learning. In ar Xiv preprint ar Xiv:2202.05594. [129] J. Russell and E. Santos. 2019. Explaining reward functions in Markov decision processes. In Thirty-Second International FLAIRS Conference, 56 61. [130] W. C. Salmon. 2006. Four decades of scientific explanation. University of Pittsburgh Press. [131] L. Scavuzzo, F. Chen, D. ChΓ©telat, M. Gasse, A. Lodi, N. Yorke-Smith, and K. Aardal. 2022. Learning to branch with tree MDPs. Advances in Neural Information Processing Systems, 35, 18514 18526. [132] L. Scharrer, R. Bromme, M. A. Britt, and M. Stadtler. 2012. The seduction of easiness: How science depictions influence laypeople s reliance on their own evaluation of scientific information. Learning and Instruction, 22, 3, 231 243. [133] O. Schilke, M. Reimann, and K. S. Cook. 2021. Trust in social relations. 
Annual Review of Sociology, 47, 239 259. [134] J. C. Schlimmer. 1987. Concept Acquisition through Representational Adjustment. University of California, Irvine. [135] R. Selvey, A. Grastien, and S. ThiΓ©baux. 2023. Formal explanations of neural network policies for planning. In International Joint Conference on Artificial Intelligence (IJCAI), 5446 5456. [136] S. Semmler and Z. Rose. 2017. Artificial intelligence: Application today and implications tomorrow. Duke L. & Tech. Rev., 16, 85. [137] L. S. Shapley. 1953. A value for n-person games. In Contributions to the Theory of Games. H. W. Kuhn and A. W. Tucker, (Eds.) Princeton University Press. Chap. 7, 307 317. [138] T. B. Sheridan. 2019. Individual differences in attributes of trust in automation: Measurement and application to system design. Frontiers in Psychology, 10, 1117. [139] N. V. Smirnov. 1939. On the estimation of the discrepancy between empirical curves of distribution for two independent samples. Bull. Math. Univ. Moscou, 2, 2, 3 14. [140] S. Sreedharan, T. Chakraborti, and S. Kambhampati. 2017. Balancing explicability and explanation in human-aware planning. In 2017 AAAI Fall Symposium Series, 61 68. [141] N. Srikanth and J. J. Li. 2020. Elaborative simplification: Content addition and explanation generation in text simplification. In ar Xiv preprint ar Xiv:2010.10035. [142] E. Ε trumbelj and I. Kononenko. 2014. Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 41, 647 665. [143] K. Stubbs, P. J. Hinds, and D. Wettergreen. 2007. Autonomy and common ground in human-robot interaction: A field study. IEEE Intelligent Systems, 22, 2, 42 50. [144] R. Sukkerd, R. Simmons, and D. Garlan. 2020. Tradeoff-focused contrastive explanation for MDP planning. In 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 1041 1048. [145] M. Sundararajan and A. Najmi. 2020. 
The many Shapley values for model explanation. In International Conference on Machine Learning, 9269 9278. [146] J. Svegliato, S. B. Nashed, and S. Zilberstein. 2021. Ethically compliant sequential decision making. In Proceedings of the AAAI Conference on Artificial Intelligence, 11657 11665. [147] Y.-M. Tamm, R. Damdinov, and A. Vasilev. 2021. Quality metrics in recommender systems: Do we calculate metrics consistently? In Proceedings of the 15th ACM Conference on Recommender Systems, 708 713. [148] A. V. Taylor, E. Mamantov, and H. Admoni. 2022. Observer-aware legibility for social navigation. In 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 1115 1122. [149] L. Trave-Massuyes and R. Pons. 1997. Causal ordering for multiple mode systems. In 11th International Workshop on Qualitative Reasoning, 203 214. [150] S. Triantafyllou, A. Singla, and G. Radanovic. 2022. Actual causality and responsibility attribution in decentralized partially observable Markov decision processes. In ar Xiv preprint ar Xiv:2204.00302. [151] A. Trotman and V. Kitchen. 2022. Quality metrics for search engine deterministic sort orders. Information Processing & Management, 59, 6, 103102. [152] A. Tversky and I. Simonson. 1993. Context-dependent preferences. Management Science, 39, 10, 1179 1189. [153] F. Wang and T. Yamagishi. 2005. Group-based trust and gender differences in China. Asian Journal of Social Psychology, 8, 2, 199 210. [154] N. Wang, D. V. Pynadath, and S. G. Hill. 2016. The impact of POMDP-generated explanations on trust and performance in human-robot teams. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 997 1005. [155] W. Wang and I. Benbasat. 2007. Recommendation agents for electronic commerce: Effects of explanation facilities on trusting beliefs. Journal of Management Information Systems, 23, 4, 217 246. [156] D. Watson. 2022. Rational Shapley values. 
In ACM Conference on Fairness, Accountability, and Transparency, 1083 1094. [157] J. J. Williams, T. Lombrozo, and B. Rehder. 2013. The hazards of explanation: Overgeneralization in the face of exceptions. Journal of Experimental Psychology: General, 142, 4, 1006. [158] J. Woodward. 2005. Making things happen: A theory of causal explanation. Oxford university press. Journal of Artificial Intelligence Research, Vol. 83, Article 17. Publication date: July 2025. Causal Explanations for Sequential Decision Making 17:51 [159] K. H. Wray, S. J. Witwicki, and S. Zilberstein. 2017. Online decision-making for scalable autonomous systems. In International Joint Conference on Artificial Intelligence, (IJCAI), 4768 4774. [160] J. Xu, K. Le, A. Deitermann, and E. Montague. 2014. How different types of users develop trust in technology: A qualitative analysis of the antecedents of active and passive user trust in a shared technology. Applied Ergonomics, 45, 6, 1495 1503. [161] R. Zeffane. 2018. Do age, work experience and gender affect individuals propensity to trust others? An exploratory study in the United Arab Emirates. International Journal of Sociology and Social Policy, 38, 3/4, 210 223. [162] Y. Zhang, Q. V. Liao, and R. K. Bellamy. 2020. Effect of confidence and explanation on accuracy and trust calibration in AI-assisted decision making. In ACM Conference on Fairness, Accountability, and Transparency, 295 305. [163] X. Zhong, B. Gallagher, S. Liu, B. Kailkhura, A. Hiszpanski, and T. Y.-J. Han. 2022. Explainable machine learning in materials science. NPG Computational Materials, 8, 1, 204. [164] L. Zhu and T. Williams. 2020. Effects of proactive explanations by robots on human-robot trust. In 12th International Conference on Social Robotics (ICSR), 85 95. [165] S. Zuboff. 1988. In the age of the smart machine: The future of work and power. Basic Books, Inc. Journal of Artificial Intelligence Research, Vol. 83, Article 17. Publication date: July 2025. 
A Additional User Study Details

A.1 Common Demographic and Lifestyle Questions

To begin both surveys, we asked three basic demographic questions, shown on the left in Figure 15. At the conclusion of both surveys, we asked the participants about their general use of AI technologies, as well as their overall level of trust in such systems, shown on the right in Figure 15.

Fig. 15. Demographic, technology use, and attitudes questions common to both surveys.

A.2 Study Demographics

In total, 189 (199) participants from the United States and Canada were recruited via the crowd-sourcing platform Prolific (www.prolific.co) for the SOTA (CONTEXT) study. All participants were fluent in English. Figure 16 summarizes participant demographics by gender (16a), age (16b), driving frequency (16c), and rates of AI technology use (16d) for both studies.

A.3 SOTA Details

For the SOTA study, after answering the basic demographic questions, every participant was shown the instructions at the top of Figure 17. The participants were then shown 3 clips of simulated driving behavior (not shown here) in a random order. After each clip, participants were asked to rank the explanations using the prompt shown at the bottom of Figure 17.

A.4 CONTEXT Details

For the CONTEXT study, after answering basic demographic questions, each participant was randomly shown one of the two prompts in Figure 18. If they saw the top prompt, participants were placed in the passive group. If they saw the bottom prompt, they were placed in the active group. After reading the prompt, the participants were shown 8 different clips of simulated driving behavior in a random order and presented with an explanation

Fig. 16.
Demographic summary for the SOTA and CONTEXT surveys.

of the behavior shown in the clip, which was generated using our framework; all of the explanations are shown in Figure 19. After each clip, the participants were asked the 3 questions shown in Figure 20. In an effort to keep the survey short, and thus participant attentiveness and data quality high, we showed each participant a random subset (8) of all (12) possible scenario-explanation pairs. The survey software we used, Qualtrics (www.qualtrics.com), allowed us to balance the questions shown so that all 12 scenario-explanation pairs occurred equally frequently in the data set as a whole. Finally, for all participants in the active group, we asked an additional sequence of questions: for each scenario, we asked whether they would initiate a transfer of control upon seeing different explanations, as shown in Figure 21.

Fig. 17. Instructions (top) and prompt (bottom) for the SOTA survey.

Fig. 18. Passive (top) and active (bottom) prompts in the CONTEXT survey.

Fig. 19. Example clip, explanation, and prompt for the CONTEXT survey.

A.5 Additional Results

Figure 22 summarizes the results of the question asked in Figure 21. Here we can see that, regardless of the scenario, F-type explanations cause the lowest percentage of transfer of control requests, and R-type explanations seem to produce the highest. T-type and V-type explanation rates seem to vary depending on the specific scenario. As an example, we also include the partitions of data that lead to the minimum and maximum correlation values. Figure 23a, while still showing a relatively correlated relationship, depicts very uncontroversial V-type
explanations, with almost all participants responding neutrally or mildly positively. Figure 23b, on the other hand, shows a highly correlated and highly polarizing reaction to T-type explanations, with roughly equal fractions of participants reporting strong agreement and strong disagreement with respect to the explanation's effect on trust and understanding. Moreover, almost no participants responded neutrally to these explanations.

Fig. 20. Questions regarding trust, necessity, and understanding.

Fig. 21. Transfer of control questions for active drivers only.

B Additional Experiments and Experimental Details

In this section we provide more extensive details regarding the setup of the empirical experiments. We also provide some additional results on convergence rates and similarity measure behavior, and present an additional example.

Fig. 22. Transfer of control (ToC) rates by explanan type (F, R, T, V): (a) Scenario 1, (b) Scenario 2, (c) Scenario 3, (d) aggregate.

B.1 Timing Benchmarks

For timing experiments, we use the Lunar Lander domain, which has up to 8 state factors (features) and 4 actions. To vary the feature size, we simply consider a subset of the features as possibly causal. Features that are not chosen stay static throughout the sampling process.
Since the original domain is continuous with minimum and maximum bounds on each feature, we discretize the domains linearly prior to sampling and vary the discretization fidelity from 2 to 20 in increments of 1. The policy being explained is generated via deep Q-learning using a multi-layer perceptron and Stable Baselines3. Sampling was performed, with replacement, using the occupancy statistics of the policy in order to get a representative picture of explanations likely to be encountered or requested during operation. In this experiment, we only generate singleton (|X| = 1) explanations; however, we note that this offers the most favorable scaling for exact Mean RESP compared to Monte Carlo Mean RESP. Because of the non-monotonic behavior of error measures with respect to sample number in Monte Carlo Mean RESP, we run 10,000 samples for every problem and then find the last point (highest sample number) at which the sum of the absolute differences in responsibility score between all the singletons of the sampled and exact explanations drops below 0.01. This corresponds to the final, stable convergence point, and using this time as the completion time for Monte Carlo Mean RESP ensures that it never gets lucky by estimating a very accurate responsibility score before it has truly converged. All of our experiments were conducted on a Dell XPS 13 9310 laptop with an 11th Gen Intel(R) Core(TM) i7-1185G7 3.00GHz processor and 16GB 4267MHz LPDDR4x RAM.

Fig. 23. Minimum and maximum Pearson r (correlation) values: (a) trust and necessity for active users in all scenarios with V-type explanations, r = 0.5034; (b) trust and understanding for active users in all scenarios with T-type explanations, r = 0.7729.
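The stopping criterion described above — taking the last sample count at which the summed absolute responsibility error drops below 0.01 — can be sketched as follows. This is an illustrative reconstruction under our own naming, not the benchmark code; `errors` is assumed to hold one summed-error value per sample count.

```python
import numpy as np

def stable_convergence_index(errors, threshold=0.01):
    """Return the index of the final, stable crossing below `threshold`,
    i.e., the first index of the last run in which `errors` stays below
    the threshold. Returns None if the run never settles below it."""
    errors = np.asarray(errors, dtype=float)
    above = np.nonzero(errors >= threshold)[0]
    if len(above) == 0:
        return 0                 # below the threshold from the start
    last_above = above[-1]
    if last_above == len(errors) - 1:
        return None              # still above the threshold at the end
    return last_above + 1        # start of the final stable run

# Example: error dips below 0.01 at index 2, pops back up, then settles.
errs = [0.30, 0.05, 0.009, 0.02, 0.008, 0.006, 0.004]
print(stable_convergence_index(errs))  # -> 4
```

Scanning for the last above-threshold index (rather than the first below-threshold one) is what guards against the "got lucky early" case mentioned above.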
B.2 Experimental Details for Shapley Comparison

Experiments comparing Monte Carlo Mean RESP to Shapley value methods are run on 4 domains: Lunar Lander, Highway, Blackjack, and Taxi. For the Highway and Lunar Lander policies, we use deep Q-learning with a multi-layer perceptron and Stable Baselines3. Policies for Taxi and Blackjack were computed with value iteration. As in the timing experiments, 60 states were sampled with replacement from each domain, proportional to the states' occupancy frequency under the given policies. Empirically, we found that after roughly 2000 samples there were very few significant changes. Thus, we capture results from Monte Carlo Mean RESP after 1000 samples. The Shapley-value-based method is also sample based, and we used 10,000 samples.

B.3 Error Rates versus Samples

We now show some additional results comparing the behavior of Monte Carlo Mean RESP under alternate definitions of cause (Figures 24 and 25). In these experiments, we use the data from the Shapley comparison experiments and evaluate some of the similarity measures. Mean values over all 60 explanations are shown in bold, while 95% confidence intervals are represented by the shaded regions. Here, we can see that convergence rates seem more dependent on the problem than on the similarity measure, which reinforces the idea that the original and updated definitions of cause may highlight very different explanans and thus may be sensitive to different causal structures within problems. The exceptions to this seem to be Euclidean distance, for which no real difference is observable, and correlation (Pearson's r), which seems to converge more slowly, or at a comparable rate, in the updated version across all the domains.
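Most of the similarity measures used in these comparisons are available off the shelf. The sketch below is our own illustration (the average set difference measure is omitted, and the function name is ours); it computes Pearson's r, Kendall's τ, Spearman's ρ, and Euclidean distance between exact and approximate responsibility score vectors using SciPy and NumPy.

```python
import numpy as np
from scipy import stats

def compare_scores(exact, approx):
    """Compare exact and approximate responsibility scores with the
    value- and rank-based measures used in these experiments (a sketch)."""
    exact = np.asarray(exact, dtype=float)
    approx = np.asarray(approx, dtype=float)
    return {
        "pearson_r": stats.pearsonr(exact, approx)[0],     # linear agreement
        "kendall_tau": stats.kendalltau(exact, approx)[0], # pairwise rank order
        "spearman_rho": stats.spearmanr(exact, approx)[0], # rank correlation
        "euclidean": float(np.linalg.norm(exact - approx)),
    }

exact = [0.90, 0.70, 0.40, 0.20, 0.10, 0.05]
approx = [0.85, 0.72, 0.35, 0.25, 0.08, 0.04]
print(compare_scores(exact, approx))
```

Because the approximate vector here preserves the exact ranking, the two rank correlations come out at 1.0 while the value-based measures reflect the small numeric deviations.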
Causal Explanations for Sequential Decision Making 17:59 0 500 1000 1500 2000 Number of Samples Correlation Mean RESP-OC Mean RESP-UC (a) Pearson s π‘Ÿ 0 500 1000 1500 2000 Number of Samples Kendall's Tau Mean RESP-OC Mean RESP-UC (b) Kendall s 𝜏 0 500 1000 1500 2000 Number of Samples Spearman's Rho Mean RESP-OC Mean RESP-UC (c) Spearman s 𝜌 0 500 1000 1500 2000 Number of Samples Euclidean Distance Mean RESP-OC Mean RESP-UC (d) Euclidean Distance 0 500 1000 1500 2000 Number of Samples Average Set Difference Mean RESP-OC Mean RESP-UC (e) Average Set Difference 0 500 1000 1500 2000 Number of Samples Correlation Mean RESP-OC Mean RESP-UC (f) Pearson s π‘Ÿ 0 500 1000 1500 2000 Number of Samples Kendall's Tau Mean RESP-OC Mean RESP-UC (g) Kendall s 𝜏 0 500 1000 1500 2000 Number of Samples Spearman's Rho Mean RESP-OC Mean RESP-UC (h) Spearman s 𝜌 0 500 1000 1500 2000 Number of Samples Euclidean Distance Mean RESP-OC Mean RESP-UC (i) Euclidean Distance 0 500 1000 1500 2000 Number of Samples Average Set Difference Mean RESP-OC Mean RESP-UC (j) Average Set Difference Fig. 24. Error curves vs. sample numbers for the Taxi domain (top, (a)-(e)), and Lunar Lander domain (bottom, (f)-(j)). Journal of Artificial Intelligence Research, Vol. 83, Article 17. Publication date: July 2025. 
Fig. 25. Error curves vs. number of samples for the Blackjack domain (top, (a)-(e)) and the Highway domain (bottom, (f)-(j)). In each row, the panels plot Pearson's 𝑟, Kendall's 𝜏, Spearman's 𝜌, Euclidean distance, and average set difference for Mean RESP-OC and Mean RESP-UC against the number of samples (0-2000).

Fig. 26. Measure comparisons for the Lunar Lander domain as a function of sample number. All sub-figures share horizontal and vertical axis labels with the corresponding label at the margin of the figure.

Table 7. The top 𝑘 = 5 causes determined by exact (ground truth) Mean RESP and Monte Carlo Mean RESP after 𝑁 = 10, 50, 95 samples. While responsibility score estimates fluctuate with additional samples, some highly influential variables are easily identified (ranks 1 and 2). Moreover, other weakly influential variables appear frequently (anti-satellite test ban), although not always at the correct rank.
This table appears to show that the most influential variables stabilize first, which makes sense given that a higher responsibility score indicates a larger fraction of samples will identify the variable as a cause.

Rank | Ground Truth | N = 10 | N = 50 | N = 95
1 | Adoption of the budget resolution | Adoption of the budget resolution | Adoption of the budget resolution | Adoption of the budget resolution
2 | Duty-free exports | Duty-free exports | Duty-free exports | Duty-free exports
3 | Education spending | Anti-satellite test ban | Education spending | Education spending
4 | Anti-satellite test ban | Aid to Nicaraguan Contras | Superfund right to sue | Anti-satellite test ban
5 | Export administration act South Africa | Superfund right to sue | Anti-satellite test ban | Export administration act South Africa

Fig. 27. Traces of six measures over time as they compare exact and approximate responsibility estimates: (a) Spearman's 𝜌, (b) Kendall's 𝜏, (c) Euclidean distance, (d) Pearson's 𝑟, (e) LSRL slope, (f) top-𝑘 ratio. The solid blue line represents the mean (10 runs total), and the blue shaded regions represent one standard deviation. All values were generated for one particular input. Clearly, some measures are more sensitive than others. Moreover, some shifts appear to be detected universally, for example, near 45 samples, while at other points some measures respond to updated estimates while others do not.

B.4 Experimental Details for Metrics for Approximate Explanations

For the metrics experiments, we again use the Lunar Lander domain and a policy trained using deep Q-learning, and we generate explanations for each of the 60 states using up to 5000 samples. Here, we let 𝑘 = 4.
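Two of the measures traced in Figure 27, the top-𝑘 ratio and the LSRL slope, can be sketched as follows. The exact definitions used in our experiments are not reproduced here, so the normalization of the top-𝑘 ratio and the window size for the slope should be read as assumptions for illustration.

```python
def top_k_ratio(exact, approx, k):
    """Fraction of the exact top-k most responsible variables that also
    appear in the approximate top-k (1.0 = perfect agreement)."""
    top = lambda v: set(sorted(range(len(v)), key=lambda i: -v[i])[:k])
    return len(top(exact) & top(approx)) / k

def lsrl_slope(trace, window=10):
    """Least-squares regression line slope over the last `window` points
    of a measure trace; a slope near zero suggests the approximate
    estimate has stabilized."""
    ys = trace[-window:]
    n = len(ys)
    xs = list(range(n))
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

A near-zero LSRL slope over a recent window is one simple, online-computable signal that further sampling is unlikely to change the explanation, which is the practical question these metrics are meant to answer.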
B.5 Additional Results for Metrics for Approximate Explanations

In this appendix, we present some additional results of our framework generating explanations of a random forest classifier we trained on the Congressional Voting Records Data Set [134]. The data set contains 435 instances, each with 16 input features and one of 2 labels. Figure 26 shows the complete set of bilateral metric comparisons. Table 7 shows several snapshots of the top 𝑘 most responsible variables for a given classification outcome in the Congressional Voting Records Data Set, for both exact Mean RESP and several points during the sampling process of Monte Carlo Mean RESP. Figure 27 summarizes the behavior of these measures throughout the sampling process.

Received 15 January 2025; revised 10 May 2025; accepted 15 June 2025
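As a concrete illustration of the black-box setting in B.5 above, the sketch below trains a random forest on synthetic binary data of the same shape as the Congressional Voting Records Data Set (435 instances, 16 binary features, 2 labels) and estimates a crude per-feature responsibility score by counting how often flipping a feature, within a randomly perturbed context, changes the classifier's output. This is an illustration only, not the paper's Mean RESP algorithm; the data, the perturbation rate, and the scoring rule are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the Congressional Voting Records data:
# 435 instances, 16 binary features, 2 labels.
X = rng.integers(0, 2, size=(435, 16))
y = (X[:, 0] ^ X[:, 3]).astype(int)  # toy labels for illustration

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

def flip_responsibility(clf, x, n_samples=50, rng=rng):
    """Estimate each feature's responsibility for clf's prediction at x
    as the fraction of sampled contexts in which flipping that feature
    changes the predicted label."""
    d = len(x)
    base = clf.predict(x.reshape(1, -1))[0]
    scores = np.zeros(d)
    for _ in range(n_samples):
        ctx = x.copy()
        # Perturb a random subset of features to form an alternate context.
        mask = rng.random(d) < 0.2
        ctx[mask] = rng.integers(0, 2, size=int(mask.sum()))
        # Flip each feature in turn (one batched predict call per context).
        Z = np.tile(ctx, (d, 1))
        Z[np.arange(d), np.arange(d)] = 1 - ctx[np.arange(d)]
        scores += (clf.predict(Z) != base)
    return scores / n_samples

scores = flip_responsibility(clf, X[0])
top5 = np.argsort(-scores)[:5]  # analogous to Table 7's top-k causes
```

Ranking `scores` and taking the top 𝑘 indices mirrors how Table 7 is produced from responsibility estimates, with the highest-scoring features stabilizing first as more contexts are sampled.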