# multiobjective_causal_bayesian_optimization__2f4de9b6.pdf

Multi-Objective Causal Bayesian Optimization

Shriya Bhatija 1 2 Paul-David Zuercher 2 3 Jakob Thumm 1 Thomas Bohn e 2

In decision-making problems, the outcome of an intervention often depends on the causal relationships between system components and is highly costly to evaluate. In such settings, causal Bayesian optimization (CBO) exploits the causal relationships between the system variables and sequentially performs interventions to approach the optimum with minimal data. Extending CBO to the multi-outcome setting, we propose multi-objective causal Bayesian optimization (MO-CBO), a paradigm for identifying Paretooptimal interventions within a known multi-target causal graph. Our methodology first reduces the search space by discarding sub-optimal interventions based on the structure of the given causal graph. We further show that any MO-CBO problem can be decomposed into several traditional multi-objective optimization tasks. Our proposed MO-CBO algorithm is designed to identify Pareto-optimal interventions by iteratively exploring these underlying tasks, guided by relative hypervolume improvement. Experiments on synthetic and real-world causal graphs demonstrate the superiority of our approach over non-causal multi-objective Bayesian optimization in settings where causal information is available.

1. Introduction

Decision-making problems arise in various domains, such as healthcare, manufacturing, and public policy, and involve manipulating variables to obtain an outcome of interest. In many such domains, interventions are inherently costly, and practical applications are subject to budgetary constraints. Moreover, these systems are often governed

1Department of Computer Engineering, Technical University of Munich, Munich, Germany 2Department of Engineering, University of Cambridge, Cambridge, United Kingdom 3The Alan Turing Institute, London, United Kingdom. Correspondence to: Shriya Bhatija <shriya.bhatija@tum.de>.

Proceedings of the 42 nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025. Copyright 2025 by the author(s).

(a) MO-CBO problem (b) MO-CBO solution

feasible region

Pareto front

Figure 1. MO-CBO problem in healthcare. (a) Causal graph where red, orange, and grey nodes depict outcome, manipulative, and non-manipulative variables, respectively. (b) The solution consists of interventions that yield optimal trade-offs between the targets.

by causal mechanisms, which can be exploited to approach optimal outcomes in a targeted and cost-efficient manner. A well-established strategy for optimizing such expensiveto-evaluate black-box functions is Bayesian optimization (Shahriari et al., 2016), but it cannot leverage the causal structure between its input variables. To this end, causal Bayesian optimization (CBO) (Aglietti et al., 2020) was introduced to generalize Bayesian optimization to settings where causal information is available. While existing CBO variants focus on optimizing a single objective, real-world systems often require the simultaneous optimization of multiple outcome variables. Here, the aim is to establish optimal trade-offs between these variables instead of identifying the global optimum of a single objective. As an example, consider the graph in Figure 1 (a), which depicts the causal relationships between prostate-specific antigen (PSA) and its risk factors (Ferro et al., 2015). For patients sensitive to Statin medications, the aim is to manipulate these risk factors to simultaneously minimize both the required Statin intake and PSA levels. Figure 1 (b) shows the Pareto front, i.e., the optimal trade-offs between Statin and PSA.

We propose multi-objective causal Bayesian optimization (MO-CBO) to generalize CBO to problems with multiple outcome variables. Figure 2 gives a high-level overview of our proposed methodology. Our key contributions are:

1. We formally define MO-CBO as a new class of optimization problems.

Multi-Objective Causal Bayesian Optimization

The MO-CBO problem [Sec. 3] Reducing the search space

Solving the local problems

Building the Pareto front

Full search space has |P(X)| = 2|X| local problems

local problem 1

local problem 1

local problem 2

our MO-CBO algorithm

local problem 1 1. Select

2. Optimize

(DGEMO) Pareto front

Pareto fronts of local problems

Figure 2. Overview of our MO-CBO methodology.

2. We present a mathematical framework to reduce the search space of MO-CBO problems based on the topology of the causal graph. It allows us to discard sub-optimal interventions and focus exploration on possibly-optimal strategies.

3. We propose an algorithm for the parallel exploration of these possibly-optimal intervention strategies, guided by a custom acquisition function.

4. We experimentally demonstrate on both synthetic and real-world MO-CBO problems that our method can surpass traditional multi-objective Bayesian optimization in scenarios with known causal structures, achieving more cost-effective, diverse, and accurate solutions.

To our knowledge, no other multi-objective optimization method exists in the literature that can consider the causal structure. We prove that MO-CBO s reduced search space retains all solutions achievable by traditional multi-objective optimization, while in some cases containing superior, otherwise unattainable solutions. The empirical results confirm that our MO-CBO algorithm consistently matches, and in some scenarios exceeds, the performance of standard baselines.

1.1. Related Work

We combine multi-objective Bayesian optimization (MOBO) with techniques from causal inference to achieve MO-CBO. Our method lies within the field of causal decision-making, seeking to leverage known causal structures to enable causally-informed decisions.

MOBO Multi-objective Bayesian optimization aims to efficiently optimize multiple, often conflicting, objective functions simultaneously. The existing algorithms can be roughly categorized by their selection strategy: Single-point methods select and evaluate one candidate solution at each

iteration, while batch methods select multiple solutions simultaneously for parallel evaluation. One of the most prominent single-point algorithms is Par EGO (Knowles, 2006), which randomly scalarizes the multi-objective problem into a single-objective one and chooses a sample that maximizes the expected improvement. As for batch methods, MOEA/D-EGO (Zhang et al., 2010) builds on Par EGO to incorporate multiple scalarization weights and perform batch evaluation through MOEA/D (Zhang & Li, 2007). Moreover, TSEMO (Bradford et al., 2018) adopts Thompson sampling on the Gaussian process posterior as an acquisition function, optimizes multiple objectives with NSGA-II (Deb et al., 2002), and selects the next batch of samples by maximizing the hypervolume improvement. Recently, q NEHVI (Daulton et al., 2021) was proposed as a robust method that scales to highly parallel evaluations of noisy objectives. DGEMO (Konakovic Lukovic et al., 2020) is most relevant for our work, and employs a novel batch selection strategy maintaining sample diversity in the input space. Specifically, it partitions the input space into so-called diversity regions to guide the selection of diverse points in each batch. We use DGEMO in our MO-CBO algorithm to explore the possibly-optimal intervention strategies.

Causal Decision-Making Within this field, there is a line of work focusing on multi-armed bandit problems and reinforcement learning settings. Here, actions or arms correspond to interventions on an arbitrary causal graph with existing links between the agent s decisions and the received rewards. Lee & Bareinboim (2018) identify a set of possiblyoptimal arms that an agent should explore to maximize its expected reward in a multi-armed bandit problem. Moreover, Lee & Bareinboim (2019b) extend their previous work to scenarios with non-manipulative variables. Collectively, their findings represent the single-objective counterpart of our search space reduction.

Causal decision-making also encompasses a growing body of research specifically focused on advancing CBO (Aglietti

Multi-Objective Causal Bayesian Optimization

et al., 2020). These advancements include extensions such as constrained CBO (Aglietti et al., 2023), time-dynamic

CBO (Aglietti et al., 2021), and various other variants (Branchini et al., 2023; Gultchin et al., 2023; Sussex et al., 2023; 2024; Ren & Qian, 2024; Zeitler & Astudillo, 2024). However, these methods are designed to optimize a single target variable, rendering them infeasible for applications with multiple objectives.

2. Preliminaries

In this paper, random variables and their realizations are denoted in the upper and lower case Latin letters, respectively. Sets and vectors are written in bold. For a set X, its power set is denoted as P(X).

2.1. MOBO Notation

MOBO simultaneously minimizes (or maximizes) a set of black-box objectives f1, . . . , fm : X R, where X is an arbitrary input space. It is designed to rely only on a small number of function evaluations. Due to potential conflicts between objectives, MOBO aims to find trade-off solutions, known as Pareto optima (Miettinen, 1999):

Definition 2.1 (Pareto optimality). A point x X is called Pareto-optimal if there is no other x X such that fi(x) fi(x ) for all 1 i m and fi(x) > fi(x ) for at least one 1 i m. The set of Pareto-optimal points in X is called Pareto set, denoted Ps. The Pareto front is the image of the Pareto set under the objective functions, given by Pf = {f(x) = (f1(x), . . . , fm(x)) | x Ps}.

At each iteration of a MOBO algorithm, prior data is used to fit a surrogate model of the objectives, for which Gaussian processes (Rasmussen, 2004) are predominantly used. Based on the surrogates, an approximation Pf of the Pareto front is computed. To select which point, or batch of points, to evaluate next, an acquisition function is used to assess the utility of those evaluations. The most commonly used acquisition function in MOBO is based on the hypervolume indicator H (Zitzler & Thiele, 1999). The larger the hypervolume, the better Pf approximates the true Pareto front. The hypervolume improvement determines how much the hypervolume would increase if a batch of samples B X was added to the current approximation, and is given by

HVI(f(B), Pf) = H( Pf f(B)) H( Pf) . (1)

Since DGEMO is the backbone of our MO-CBO algorithm, we briefly describe its batch selection strategy. It considers hypervolume improvement as well as sample diversity in the input space. To this end, the so called diversity regions R1, . . . , RK X are constructed by using the current Pareto front approximation to group the optimal points

based on their performance properties in the input space. Formally, a batch is chosen as follows:

B = arg max B X,|B|=B HVI(f(B), Pf)

s.t. max 1 k Kδk(B) min 1 k Kδk(B) 1, (2)

where B denotes the batch size and the functions δk( ) are defined as the number of elements from B that belong to Rk. We refer to Konakovic Lukovic et al. (2020) for the complete selection algorithm.

2.2. Causality

Graph Notation A graph G = (V, E) is defined by a finite vertex set V and an edge set E V V, containing ordered pairs of distinct vertices. The subgraph of G restricted to V V is given by G[V ] = (V , E[V ]), where E[V ] = {(i, j) E | i, j V }. For V V, the set of its parents, ancestors and descendants in G is denoted as pa(V )G, an(V )G, and de(V )G, respectively. Here, no vertex is a parent, an ancestor, or a descendant of itself. Conversely, with a capital letter, this notation is extended to include the argument in the result, i.e., Pa(V )G = pa(V )G {V }. Moreover, we define these relations for sets of variables V V, i.e., pa(V )G = S

V V pa(V )G and Pa(V )G = S

V V Pa(V )G. Equivalent conventions apply to the ancestor and descendant relationships.

Structural Causal Models Let V, U, F, P(U) be a structural causal model (SCM) (Pearl, 2000) and G its associated acyclic graph that encodes the underlying causal mechanisms. Specifically, U is a set of independent exogenous random variables distributed according to the probability distribution P(U), V is a set of endogenous random variables, and F = {f V }V V is a set of deterministic functions such that V = f V (pa(V )G, UV ), where UV U is the set of exogenous variables affecting V V. The set UV UW

consists of unobserved confounders between V, W V, which are the exogenous variables influencing both V and W. Within V, there are three different types of variables to be distinguished: Non-manipulative variables C that cannot be modified, treatment variables X which can be set to specific values, and output variables Y = {Y1, . . . , Ym} which represent the outcome of interest. We consider only real-valued SCMs, where all endogenous variables have continuous domains. For Xs X, CC(Xs)G refers to the c-component of G (Tian & Pearl, 2002), which, in this context, is the maximal set of variables that includes Xs and is connected via unobserved confounders. The joint distribution of V, which is determined by P(U), is referred to as observational distribution and denoted P(V).

Interventions A set Xs P(X) is called an intervention set. The interventional domain of an intervention set is

Multi-Objective Causal Bayesian Optimization

given as D(Xs) = X Xs D(X) and describes the feasible values of Xs. An intervention on Xs involves replacing the structural equations f X with a constant intervention value x, for all X Xs. This action is denoted with the dooperator do(Xs = xs), where the vector of intervention values is xs D(Xs). The graph GXs represents this intervention and is obtained by removing the incoming edges into Xs. The observational distribution of GXs is denoted as P(V|do(Xs = xs)) and called interventional distribution. For Xs = , no intervention is performed and the observational and interventional distributions coincide. The tuple (Xs, xs) is referred to as an intervention set-value pair. Given two sets Xs, X s X and xs D(Xs), we write by xs[X s] the values of xs corresponding to Xs X s.

3. The MO-CBO Problem

In our setting, we assume that the causal relationships encoded in G are known while the underlying parametrizations, i.e., F and P(U), can be unknown. This restricted information is denoted as G, Y, X, C . The assumption is common within the CBO line of work and allows generalization across systems with the same causal structure.

A MO-CBO problem aims to identify intervention set-value pairs (Xs, xs) that offer optimal trade-offs in minimizing (or, maximizing) all target variables in Y. The outcomes of an intervention do(Xs = xs) are captured as the expected values

µi(Xs, xs) := EP (Yi|do(Xs=xs))[Yi], (3)

where P(Yi|do(Xs = xs)) denotes the interventional distribution of Yi, for all i = 1, . . . , m. We write µ(Xs, xs) = (µ1(Xs, xs), . . . , µm(Xs, xs)) for the vector notation. Next, we adopt the notion of Pareto optimality to intervention set-value pairs:

Definition 3.1 (Pareto-optimal intervention set-value pair). Given S P(X), an intervention set-value pair (Xs, xs) with Xs S, xs D(Xs) is called Pareto-optimal for S, if there is no other intervention set-value pair (X s, x s) with X s S, x s D(X s) such that µi(X s, x s) µi(Xs, xs) for all 1 i m and µi(X s, x s) < µi(Xs, xs) for at least one 1 i m.

Definition 3.2 (Pareto front for S). The space of all Paretooptimal intervention set-value pairs for a given S P(X) is called the Pareto set for S, denoted Pc s(S). The corresponding Pareto front for S, denoted Pc f(S), is the m-dimensional image of Pc s(S) under the objectives µi, 1 i m.

We define MO-CBO problems as identifying the Pareto set Pc s(P(X)) which yields the optimal trade-offs among the objectives, represented by the Pareto front Pc f(P(X)).

3.1. Decomposition of MO-CBO Problems

We aim to simplify the search space to navigate the discovery of Pareto-optimal intervention set-value pairs.

Definition 3.3 (Local MO-CBO problem). Let Xs P(X) be an intervention set. Then, the multi-objective optimization problem defined by the objective functions µi(Xs, ) : D(Xs) R, xs 7 µi(Xs, xs), 1 i m, is called the local MO-CBO problem w.r.t. Xs.

The Pareto set of the local MO-CBO problem w.r.t. Xs P(X) is denoted as Pl s(Xs) and the associated Pareto front as Pl f(Xs). Each local MO-CBO problem corresponds to a standard multi-objective optimization task, solvable with existing methods. The following proposition decomposes MO-CBO problems into such local problems.

Proposition 3.4. Given G, Y, X, C , let S P(X) be a non-empty collection of intervention sets. Then, it holds

s=1 Pl f(Xs). (4)

Proof. See Appendix A. Core idea: We exploit that the space of all intervention set-value pairs is the union of the input spaces of each local problem.

Proposition 3.4 allows to match the Pareto-optimal intervention set-value pairs to the Pareto-optimal solutions from the local problems where the intervention set is fixed. Therefore, discovering Pc f(P(X)) requires identifying Pareto-optimal solutions of local MO-CBO problems with respect to all intervention sets Xs P(X).

4. Solving MO-CBO Problems

In this section, we propose our methodology for solving

MO-CBO problems, which has been outlined in Figure 2. In summary, we reduce the search space to a subset S P(X), solve the corresponding local MO-CBO problems w.r.t. each element in S, and extract only Pareto-optimal intervention set-value pairs to construct the Pareto front Pc f(P(X)).

4.1. Reducing the Search Space

The complexity of solving the local MO-CBO problems w.r.t. all Xs P(X) rises exponentially with the number of treatment variables, making this strategy impracticable for most tasks. This section proposes a method to exploit the graph topology to identify a minimal subset S P(X) with Pc f(P(X)) = Pc f(S). Hereby, we generalize the results from Lee & Bareinboim (2018) to the multi-objective setting. All proofs and derivations are given Appendix B. For now, we assume that there are no non-manipulative vari-

Multi-Objective Causal Bayesian Optimization

ables, i.e., C = . At the end of the section, we discuss the general case C = .

We first reduce the search space by disregarding intervention sets where some variables do not affect the targets:

Definition 4.1 (Minimal intervention set). A set Xs P(X) is called a minimal intervention set if there exists no subset X s Xs such that for all xs D(Xs) it holds µi(Xs, xs) = µi(X s, xs[X s]), 1 i m, for every SCM conforming to G.

We denote the set of minimal intervention sets with MG,Y. The following proposition characterizes such sets in a given causal graph G.

Proposition 4.2. Xs P(X) is a minimal intervention set if and only if it holds Xs an(Y)GXs .

Proof. See Appendix B.1. Core idea: The if direction shows by contradiction that any non-minimal intervention set cannot consist solely of the ancestors of Y. The only if direction is straightforward to prove since variables without an ancestral relationship to Y are redundant to intervene upon.

We adapt the notion of possibly-optimal minimal intervention sets (Lee & Bareinboim, 2018) for Pareto-optimality. Intuitively, a minimal intervention set is called possibly Pareto-optimal if it includes a Pareto-optimal intervention set-value pair whose outcome is unattainable with any other intervention set, for at least one SCM conforming to G.

Definition 4.3 (Possibly Pareto-optimal minimal intervention set). A set Xs MG,Y is called possibly Paretooptimal if, for at least one SCM conforming to G, there exists xs D(Xs) such that (Xs, xs) is Pareto-optimal for P(X), and for no X s MG,Y\Xs, x s D(X s) it holds µi(X s, x s) µi(Xs, xs), for all 1 i m.

We denote the set of all possibly Pareto-optimal minimal intervention sets by OG,Y. Next, we establish graphtheoretical criteria to identify such sets in a given causal graph. First, the proposition below considers a special case:

Proposition 4.4. If no Yi is confounded with an(Yi)G via unobserved confounders, then pa(Y)G is the only possibly Pareto-optimal minimal intervention set.

Proof. See Appendix B.2. Core idea: In the absence of unobserved confounding between any Yi and its ancestors an(Yi)G, the average effect of any intervention do(Xs = xs) can be matched by intervening on pa(Y)G.

To characterize possibly Pareto-optimal minimal intervention sets in arbitrary graphs, we extend the following two

Figure 3. Two causal graphs with X = {X1, X2, X3, X4}, Y = {Y1, Y2}. (a) No unobserved confounders. (b) The dashed bidirected edge depicts an unobserved confounder between X4 and Y1.

definitions from Lee & Bareinboim (2018) to the multiobjective setting. They aim to identify a region, starting from Y, that is governed by unobserved confounders, along with its outside border that determines the realization of variables within the region.

Definition 4.5 (Minimal unobserved confounders territory). Let H = G[An(Y)G]. A set of variables T in H, with Y T, is called a UC-territory for G w.r.t. Y if De(T)H = T and CC(T)H = T. The UC-territory T is said to be minimal, denoted T = MUCT(G, Y), if no T T is a UC-territory.

Definition 4.6 (Interventional border). Let us denote T = MUCT(G, Y). Then, B = pa(T)G\T is called the interventional border for G w.r.t. Y, which we write as IB(G, Y).

Example We illustrate these two concepts with the causal graphs from Figure 3. In Figure 3 (a), there are no unobserved confounders and thus, it holds CC(Y)G = Y and De(Y)G = Y. It follows MUCT(G, Y) = {Y1, Y2} and IB(G, Y) = {X1, X2}. In Figure 3 (b), we construct the minimal UC-territory, starting from T = Y, as follows: Since Y1 has an unobserved confounder with X4, we update T = CC(Y)G = {Y1, Y2, X4}, and thereafter add all the descendants of X4, obtaining T = {Y1, Y2, X4, X1}. Since there are no more unobserved confounders between T and An(Y)G\T, the minimal UC-territory has been found and is given by MUCT(G, Y) = {Y1, Y2, X1, X4} along with IB(G, Y) = {X2, X3}.

Interventional borders can fully determine possibly Paretooptimal minimal intervention sets, which are described with the following two results.

Proposition 4.7. IB(GXs, Y) is a possibly Pareto-optimal minimal intervention set for any Xs P(X).

Proof. See Appendix B.2. Core idea: We first prove in Proposition B.2 that IB(G, Y) is a possibly Pareto-optimal minimal intervention set by constructing an SCM where do(IB(G, Y) = 0) is the single best intervention. This construction then easily extends to show that IB(GXs, Y) can also represent the single optimal intervention.

Multi-Objective Causal Bayesian Optimization

Next, we can finally characterize possibly Pareto-optimal minimal intervention sets:

Theorem 4.8. A set Xs P(X) is a possibly Paretooptimal minimal intervention set if and only if it holds IB(GXs, Y) = Xs.

Proof. See Appendix B.2. Core idea: The if statement is a special case of Proposition 4.7. We prove the only if direction by showing that intervening on IB(GXs, Y) is at least as optimal as intervening on Xs,

Corollary 4.9. Let Xs P(X) and X s = IB(GXs, Y). For any xs D(Xs) there exist x s D(X s) such that it holds µ(X s, x s) µ(Xs, xs), for all 1 i m.

Corollary 4.9 is a direct result from the proof of Theorem 4.8. In this setting, it is easy to construct an SCM, conforming to G, for which strict inequality holds in at least one component. Finally, we show that it suffices to only consider possibly Pareto-optimal minimal intervention sets to solve MO-CBO problems.

Theorem 4.10. It holds Pc f(P(X)) = Pc f(OG,Y).

Proof. : Assume Pc f(P(X)) Pc f(OG,Y). Then, there exists z Rm, with z = µ(Xs, xs) for some intervention set-value pair (Xs, xs), such that z Pc f(P(X)) and z Pc f(OG,Y). If Xs OG,Y, it follows that (Xs, xs) is not Pareto-optimal for OG,Y, which is a contradiction since it is Pareto-optimal for P(X). Conversely, if Xs P(X)\OG,Y, we set X s = IB(GXs, Y) and from Corollary 4.9, we infer that, for some SCM conforming to G, there exists x s D(X s) with µ(X s, x s) µ(Xs, xs), for all 1 i m, and µ(X s, x s) < µ(Xs, xs), for at least one 1 i m. This results in z Pc f(P(X)), which is a contradiction.

: Assume Pc f(OG,Y) Pc f(P(X)). Then, there exists z Rm, with z = µ(Xs, xs), such that z Pc f(OG,Y) and z Pc f(P(X)). There exists some X s P(X)\OG,Y, xs D(X s) such that (X s, x s) is Pareto optimal and for which it holds µ(X s, x s) µ(Xs, xs), for all 1 i m, and µ(X s, x s) < µ(Xs, xs), for at least one 1 i m. Since X s is not possibly Pareto-optimal, we infer from Corollary 4.9 that for X s = IB(G, Y) there exists x s D(X s) such that µ(X s, x s) µ(X s, x s), for all 1 i m. Hence, it holds µ(X s, x s) Pc f(P(X)), which is a contradiction to z Pc f(OG,Y) since X s OG,Y.

Using Theorem 4.10, we reduce the search space of MOCBO problems to S = OG,Y.

Example We illustrate the search space reduction with the causal graphs from Figure 3. Note that in both cases it holds P(X) = 2|X| = 16. In Figure 3 (a), there are no unobserved confounders, and it follows OG,Y = pa(Y)G = {X1, X2}.

Algorithm 1 Our MO-CBO algorithm

Input: G, Y, X, C , S {OG,Y, MG,Y, P(X)}, data D, batch size B, number of iterations N Output: Pc s(S), Pc f(S) Initialize the dataset D0 = D for s = 1 to |S| do

Fit surrogates µi(Xs, ) with D0, i = 1, . . . , m Approximate Pl s(Xs) and Pl f(Xs) using µ1, . . . , µm end for for n = 1 to N do

for s = 1 to |S| do

Select batch Bs = {xb s}B b=1 via Equation (2) end for Select batch Bˆs from {B1, . . . , B|S|} via Equation (6) Intervene on Xˆs with Bˆs Augment Dn = Dn 1 {(Xˆs, xb ˆs), µ(Xˆs, xb ˆs))}B b=1 Update surrogates µi(Xˆs, ) with Dn, i = 1, . . . , m Approximate Pl s(Xˆs) and Pl f(Xˆs) using µ1, . . . , µm end for Compute Pl s(Xs), Pl f(Xs) from DN, s = 1, . . . , |S| Compute Pc s(S) and Pc f(S)

In Figure 3 (b), the intervention sets {X1, X2, X3} and {X2, X3} satisfy the condition from Theorem 4.8, and thus, OG,Y = {{X2, X3}, {X1, X2, X3}}.

We now consider the more general case with C = , where non-manipulative variables can be present. The definitions for the minimal intervention set and the possibly Pareto-optimal minimal intervention set are a straightforward extension. Lee & Bareinboim (2019a) propose a projection G G[V\C] which preserves the distribution of the underlying SCM. Given such a projection, we can identify the possibly Pareto-optimal minimal intervention sets in G, Y, X, C by applying Theorem 4.8 to G[V\C], Y, X .

4.2. Solving the Local Problems

We propose our algorithm to solve MO-CBO problems1, for which the procedure is summarized in Algorithm 1. It assumes a known causal graph G, Y, X, C , prior data D, and a set S {OG,Y, MG,Y, P(X)} that specifies which local problems to consider. The idea is to alternately solve the local MO-CBO problems using the MOBO algorithm DGEMO.

More specifically, the algorithm operates as follows: For each local MO-CBO problem w.r.t. Xs S, it first fits the surrogate model to the objectives µi(Xs, ), 1 i m, via independent Gaussian processes. Based on the means of

1The full implementation of our algorithm is available at https://github.com/Shriya Bhatija/MO-CBO

Multi-Objective Causal Bayesian Optimization

Table 1. The generational distances, averaged across 10 random seeds. Lower values indicate closer approximation to Pc f(P(X)).

Problem Par EGO MOEA/D-EGO TSEMO q NEHVI DGEMO MO-CBO (ours)

SYNTHETIC-1 0.30 0.38 0.16 0.27 0.14 0.14 SYNTHETIC-2 12.43 11.45 8.65 7.98 6.79 2.80 HEALTH 0.09 0.20 0.08 0.12 0.07 0.06 CREDIT APPROVAL 0.14 0.12 0.06 0.08 0.06 0.05

Table 2. The inverted generational distances, averaged across 10 random seeds. Lower values indicate a more diverse coverage of solutions approximate to Pc f(P(X)).

Problem Par EGO MOEA/D-EGO TSEMO q NEHVI DGEMO MO-CBO (ours)

SYNTHETIC-1 3.63 3.24 2.82 2.45 2.57 1.40 SYNTHETIC-2 7.43 7.70 4.78 5.49 4.50 0.87 HEALTH 0.15 0.16 0.05 0.10 0.05 0.02 CREDIT APPROVAL 0.24 0.28 0.08 0.21 0.09 0.08

the Gaussian process posteriors, approximations of Pl s(Xs) and Pl f(Xs) are computed utilizing the Pareto discovery approach from DGEMO. After this initial step, the most promising intervention set is selected for batch evaluation at each iteration. The dataset is then augmented with the newly evaluated batch of samples. For the corresponding local problem, we again update the surrogate model and the Pareto set and front approximations. After completing all iterations, the algorithm identifies the final Pareto sets and fronts for each local MO-CBO problem using the collected objective function evaluations DN. Thereafter, it is easy to construct Pc s(S) and Pc f(S), see Section 4.3.

Batch Selection For the local MO-CBO problem w.r.t. Xs S, let R1(Xs), . . . , RK(Xs) D(Xs) denote the identified diversity regions from DGEMO, discussed in Section 2.1. Our algorithm seeks to balance the exploration of Pareto fronts from multiple local MO-CBO problems, but evaluating all Bs, s = 1, . . . , |S|, during a single iteration, is an inefficient strategy. Instead, we select the batch with the most promising hypervolume improvement at each iteration. To this end, we introduce the term relative hypervolume improvement, defined as

RHVI(µ(Xs, Bs), Pl f(Xs)) = HVI(µ(Xs, Bs), Pl f(Xs))

H(Pl f(Xs)) .

As the name suggests, relative hypervolume improvement is a normalized measure of improvement, enabling the assessment of batch evaluations across different intervention sets. Given B1, . . . , B|S|, we propose the following batch selection strategy for our MO-CBO algorithm:

Bˆs = arg max Bs {B1,...,B|S|} RHVI(µ(Xs, Bs), Pl f(Xs)). (6)

Overall, the proposed batch selection is designed to alternately advance the Pareto fronts Pl f(X1), . . . , Pl f(X|S|).

4.3. Building the Pareto Front

After Algorithm 1 has computed the Pareto sets and Pareto fronts of the local problems, its final step is to simply extract Pareto-optimal points from S|S| s=1 Pl s(Xs), as justified by Proposition 3.4. This yields Pareto-optimal intervention set-value pairs Pc s(P(X)) and their corresponding Pareto front Pc f(P(X)).

5. Experiments

We evaluate our MO-CBO algorithm with S = OG,Y on the causal graphs shown in Figure 3 (a) (SYNTHETIC-1), Figure 3 (b) (SYNTHETIC-2), Figure 1 (HEALTH), and an additional CREDIT APPROVAL example. We cover both synthetic and real-world scenarios. The full description of the underlying SCMs is given in Appendix C. We assume to have an initial dataset D = {((Xs, xk s), µ(Xs, xk s))}K,|S| k=1,s=1 with K = 5 samples per intervention set. The batch size is set to 5. For reproducibility, all experiments are run across 10 random seeds, resulting in varying initializations of D.

Baselines To the best of our knowledge, there exists no other multi-objective optimization method in the literature that can leverage the causal structure. As baselines, we therefore apply some of the most prominent MOBO algorithms such as Par EGO, MOEA/D-EGO, TSEMO, q NEHVI, and DGEMO (see the literature review). They intervene on all treatment variables simultaneously and thus the objective functions are µi(X, ) : D(X) R, x 7 µ(X, x), i = 1, . . . , m. Notably, Pc f(P(X)) contains at least as optimal outcomes as Pl f(X).

Multi-Objective Causal Bayesian Optimization

MO-CBO (ours)

Figure 4. SYNTHETIC-1. Pareto front approximations from MO-

CBO (ours) and DGEMO. Our method offers a higher coverage of Pc f(P(X)).

Evaluation We assess the quality of the resulting Pareto fronts by measuring their proximity to Pc f(P(X)) using the generational distance (GD) (Schutze et al., 2012). The GD is defined as the average distance from any point in the approximated front to its closest point on the groundtruth front. Moreover, we calculate the diversity of the identified solutions using the inverted generational distance (IGD) (Schutze et al., 2012). The IGD represents the average distance from any point in the ground-truth front to its closest point on the approximated front. The mathematical definitions are given in Appendix D. We present the performance metrics from our experiments in Table 1 and Table 2.

We run each experiment until a predefined cost budget is exhausted, assuming each intervention comes at a certain cost. The cost structure is detailed in Appendix D. In this section, we will only present the visual results from the experiments with DGEMO, with corresponding plots for the other baselines provided in Appendix D.

5.1. Synthetic Problems

SYNTHETIC-1 As previously discussed in Section 4.1, it holds OG,Y = {{X1, X2}}. We observe that the generational distances in Table 1 are negligible for MO-CBO and all baseline algorithms. This is also supported by the Pareto front approximations shown in Figure 4 where solutions found by both MO-CBO and DGEMO closely match Pc f(P(X)). Similar results are observed for the other baselines and are presented in Appendix D. Theoretically, these findings are expected, as µ(X, x) = µ(OG,Y, x[OG,Y]) guarantees that the baselines can reach the same solutions as MO-CBO. Furthermore, we observe that our method offers a better coverage of Pc f(P(X)), a result confirmed by its lower inverted generational distance in Table 2. The improvement likely stems from avoiding unnecessary interventions on X3 and X4, allowing for more exploratory interventions on OG,Y within the same budget.

MO-CBO (ours)

Figure 5. SYNTHETIC-2. Pareto front approximations from MO-

CBO (ours) and DGEMO. Our approach tightly fits Pc f(P(X)).

SYNTHETIC-2 In this setting, an unobserved confounder exists between Y1 and X4, placing SYNTHETIC-2 in the general case where hidden confounders may influence target variables through their ancestors. Consequently, it holds OG,Y = {{X2, X3}, {X1, X2, X3}}. The Pareto front approximations are illustrated in Figure 5, demonstrating that while the baseline method DGEMO fails to identify solutions on Pc f(P(X)), MO-CBO does indeed discover them. Similar observations are seen for the other baselines, see Appendix D. Further experiments reveal that only interventions on {X2, X3} can yield Pareto-optimal solutions in Pc f(P(X)). This can be explained as follows: The baseline strategy disrupts the causal path X4 X1 Y1, letting the unobserved confounder influence Y1 without propagating through the aforementioned path. In contrast, our approach allows interventions on {X2, X3}, preserving this causal structure. This distinction is crucial as the structural assignment of Y1 includes the term X1 X2 U/2, with U denoting the unobserved confounder (all structural equations are specified in Appendix C). Not intervening on X1, causes this term to always be negative, yielding lower function values for Y1. However, if we do intervene on X1, it is positive with probability 0.5, causing higher values for Y1 in the averaged outcomes.

5.2. Real-World Problems

HEALTH We revisit the causal graph from Figure 1, which is based on real-world causal relationships in the healthcare setting (Ferro et al., 2015). For patients sensitive to Statin medication, one might aim to minimize both Statin usage and PSA levels simultaneously. There is only one possibly Pareto-optimal minimal intervention set, OG,Y = {{BMI, Aspirin}}. Similarly to SYNTHETIC-1, both our MO-CBO algorithm and the MOBO baselines identify Pareto-optimal solutions, while the baselines tend to produce sparser approximations of Pc f(P(X)). See Figure 6 for the results obtained using DGEMO.

Multi-Objective Causal Bayesian Optimization

MO-CBO (ours)

Figure 6. HEALTH. Pareto front approximations from MO-CBO (ours) and DGEMO. Our approach yields a better coverage of Pc f(P(X)).

loan duration

MO-CBO (ours)

Figure 7. CREDIT APPROVAL. Pareto front approximations from

MO-CBO (ours) and DGEMO. Our approach yields a better coverage of Pc f(P(X)).

CREDIT APPROVAL We consider a causal system which models the credit approval probability as a function of demographic and financial variables, see Appendix D for the SCM specifications. The model is inspired by the German Credit UCI dataset (Murphy, 1994), with causal dependencies adapted from Karimi et al. (2020). Our objective is to maximize the probability of credit approval and the received loan duration (measured as a deviation from the mean). There are no unobserved confounders, resulting in OG,Y = {{loan amount, income, savings}}. Similarly to before, we observe that our MO-CBO algorithm can yield a more dense representation of Pc f(P(X)) compared to some baselines.

6. Conclusion

This paper introduces MO-CBO as a new problem class for optimizing multiple target variables within a known causal graph. We show that any MO-CBO problem can be decomposed into local problems, and propose theoretical analyses to identify a minimal collection of such local problems guaranteed to contain all Pareto-optimal solutions. Finally, we present our MO-CBO algorithm that explores this reduced

search space to identify such solutions. Notably, we have observed that traditional multi-objective optimization is simply misspecified for causal systems and can therefore yield suboptimal outcomes. More specifically, our experimental results reveal two distinct scenarios: In the absence of unobserved confounders between the targets and their ancestors, both our MO-CBO algorithm and the MOBO baselines recover optimal solutions, albeit our method can offer a higher solution diversity. In the contrasting scenario, the

MOBO baselines can lead to suboptimal solutions, while our method remains effective. These observations align with our theoretical findings.

The search space reduction in Section 4.1 requires prior causal knowledge, which is a notable limitation of our approach. In some domains, such knowledge is accessible through experimental studies (Blomqvist et al., 2020), and when unavailable, could potentially be inferred using methods from causal discovery (Zanga et al., 2022). Moreover, in our current implementation, the surrogate model assumes independent outcomes, overlooking shared endogenous confounders. Future work could enhance sample efficiency by integrating multi-task Gaussian processes to capture shared information across treatment variables. Other directions for future research include the adaptation of existing CBO variants to the multi-objective setting.

Acknowledgements

This research was supported by the Bavarian State Ministry of Education, Science and the Arts, the Engineering and Physical Sciences Research Council [EP/S023917/1], and the Alan Turing Institute s studentship scheme.

Impact Statement

This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none which we feel must be specifically highlighted here. We further emphasize that the health-related causal graph is a simplified, illustrative example with no ethical concerns or implications for clinical practice.

Aglietti, V., Lu, X., Paleyes, A., and Gonz alez, J. Causal Bayesian optimization. In Proc. of the Int. Conf. on Artificial Intelligence and Statistics (AISTATS), pp. 3155 3164, 2020.

Aglietti, V., Dhir, N., Gonz alez, J., and Damoulas, T. Dynamic causal Bayesian optimization. In Proc. of the Int. Conf. on Neural Information Processing Systems (Neur IPS), pp. 10549 10560, 2021.

Multi-Objective Causal Bayesian Optimization

Aglietti, V., Malek, A., Ktena, I., and Chiappa, S. Constrained causal Bayesian optimization. In Proc. of the Int. Conf. on Machine Learning (ICML), pp. 304 321, 2023.

Blomqvist, E., Alirezaie, M., and Santini, M. Towards causal knowledge graphs-position paper. In KDH@ ECAI, pp. 58 62, 2020.

Bradford, E., Schweidtmann, A. M., and Lapkin, A. Efficient multiobjective optimization employing Gaussian processes, spectral sampling and a genetic algorithm. J. of Global Optimization, 71(2):407 438, 2018.

Branchini, N., Aglietti, V., Dhir, N., and Damoulas, T. Causal entropy optimization. In Proc. of the Int. Conf. on Artificial Intelligence and Statistics (AISTATS), pp. 8586 8605, 2023.

Daulton, S., Balandat, M., and Bakshy, E. Parallel Bayesian optimization of multiple noisy objectives with expected hypervolume improvement. In Proc. of the Int. Conf. on Neural Information Processing Systems (Neur IPS), pp. 2187 2200, 2021.

Deb, K., Pratap, A., Agarwal, S., and Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2): 182 197, 2002.

Ferro, A., Pina, F., Severo, M., Dias, P., Botelho, F., and Lunet, N. Use of statins and serum levels of prostate specific antigen. Acta Urol ogica Portuguesa, 32, 2015.

Gultchin, L., Aglietti, V., Bellot, A., and Chiappa, S. Functional causal Bayesian optimization. In Proc. of the Conf. on Uncertainty in Artificial Intelligence, pp. 756 765, 2023.

Hansen, N. The cma evolution strategy: A tutorial, 2023. URL https://arxiv.org/abs/1604.00772.

Karimi, A.-H., Von K ugelgen, J., Sch olkopf, B., and Valera, I. Algorithmic recourse under imperfect causal knowledge: a probabilistic approach. Advances in neural information processing systems, 33:265 277, 2020.

Knowles, J. D. Parego: a hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems. IEEE Transactions on Evolutionary Computation, pp. 50 66, 2006.

Konakovic Lukovic, M., Tian, Y., and Matusik, W. Diversity-guided multi-objective Bayesian optimization with batch evaluations. In Proc. of the Int. Conf. on Neural Information Processing Systems (Neur IPS), pp. 17708 17720, 2020.

Lee, S. and Bareinboim, E. Structural causal bandits: Where to intervene? In Proc. of the Int. Conf. on Neural Information Processing Systems (Neur IPS), 2018.

Lee, S. and Bareinboim, E. Structural causal bandits with non-manipulable variables. In Proc. of the AAAI Conf. on Artificial Intelligence (AAAI), volume 33, pp. 4164 4172, 2019a.

Lee, S. and Bareinboim, E. Structural causal bandits with non-manipulable variables. In Proc. of the AAAI Conf. on Artificial Intelligence (AAAI), pp. 4164 4172, 2019b.

Miettinen, K. Nonlinear multiobjective optimization, volume 12. Springer Science & Business Media, 1999.

Murphy, P. M. UCI repository of machine learning databases, 1994. URL ftp://ftp.ics.uci.edu/ pub/machine-learning-databases/.

Pearl, J. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.

Rasmussen, C. E. Gaussian Processes in Machine Learning, pp. 63 71. Springer Berlin Heidelberg, 2004.

Ren, S. and Qian, X. Causal Bayesian optimization via exogenous distribution learning, 2024.

Schutze, O., Esquivel, X., Lara, A., and Coello, C. A. C. Using the averaged hausdorff distance as a performance measure in evolutionary multiobjective optimization. IEEE Transactions on Evolutionary Computation, pp. 504 522, 2012.

Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., and de Freitas, N. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104: 148 175, 2016.

Sussex, S., Makarova, A., and Krause, A. Model-based causal Bayesian optimization. In Proc. of the Int. Conf. on Learning Representations (ICLR), 2023.

Sussex, S., Sessa, P. G., Makarova, A., and Krause, A. Adversarial causal Bayesian optimization. In Proc. of the Int. Conf. on Learning Representations (ICLR), 2024.

Tian, J. and Pearl, J. On the testable implications of causal models with hidden variables. In Proc. of the Conf. on Uncertainty in Artificial Intelligence, pp. 519 527, 2002.

Zanga, A., Ozkirimli, E., and Stella, F. A survey on causal discovery: Theory and practice. International Journal of Approximate Reasoning, 151:101 129, 2022.

Zeitler, J. and Astudillo, R. Causal elicitation for Bayesian optimization. In Causal Inference Workshop at UAI, 2024.

Multi-Objective Causal Bayesian Optimization

Zhang, Q. and Li, H. Moea/d: A multiobjective evolutionary algorithm based on decomposition. IEEE Transactions on Evolutionary Computation, 11(6):712 731, 2007.

Zhang, Q., Liu, W., Tsang, E., and Virginas, B. Expensive multiobjective optimization by moea/d with Gaussian process model. IEEE Transactions on Evolutionary Com-

putation, pp. 456 474, 2010.

Zitzler, E. and Thiele, L. Multiobjective evolutionary algorithms: a comparative case study and the strength pareto approach. IEEE Transactions on Evolutionary Computation, pp. 257 271, 1999.

Multi-Objective Causal Bayesian Optimization

A. Decomposition of MO-CBO Problems

Recall the definition of the local MO-CBO problems.

Definition 3.3 (Local MO-CBO problem). Let Xs P(X) be an intervention set. Then, the multi-objective optimisation problem defined by the objective functions µi(Xs, ) : D(Xs) R, xs 7 µ(Xs, xs), 1 i m, is called local MO-CBO problem w.r.t. Xs.

For the local MO-CBO problem w.r.t. Xs P(X), we denote its Pareto set as Pl s(Xs) and the associated Pareto front as Pl f(Xs).

Proposition 3.4. Given G, Y, X, C , let S P(X) be a non-empty collection of intervention sets. Then, it holds

s=1 Pl f(Xs). (7)

Proof. Assume for contradiction that Pc f(S) S|S| s=1 Pl f(Xs). Then, there exists some z Rm such that z Pc f(S) and z Pl f(Xs) for all s = 1, . . . , |S|. For some intervention set X s S and intervention value x s D(X s), it holds z = (µ1(X s, x s), . . . , µm(X s, x s)). Let Pl s(X1), . . . , Pl s(X|S|) be the Pareto sets of the associated local MO-CBO problems w.r.t. X1, . . . , X|S|, respectively. Since z Pl f(X s), it follows x s Pl s(X s), i.e. x s is not Pareto-optimal in the local

MO-CBO problem w.r.t. X s. Thus, there exists another intervention value x s D(X s) such that µi(X s, x s) µi(X s, x s) for all i and µi(X s, x s) > µi(X s, x s) for at least one 1 i m. In other words, the intervention set-value pair (X s, x s) is not Pareto-optimal for S since it is dominated by (X s, x s). Therefore, z Pc f(S) which is a contradiction.

B. Reducing the Search Space

Lee & Bareinboim (2018) leverage the graph topology of an SCM to identify intervention sets that are redundant to consider in any optimisation scheme. Their formalism exploits the rules of do-calculus to identify invariances and partial-orders among intervention sets, in order to obtain those sets that could potentially yield optimal outcomes for a given graph. To take advantage of their ideas for this paper, the relevant concepts and their theoretical properties must be extended to accommodate multi-target settings, which will be the focus of this section.

Let V, U, F, P(U) denote an SCM and G its associated acyclic graph that encodes the underlying causal mechanisms. Recall that we assume C = , i.e., there are no non-manipulative variables. In this section, we require the notation EP (W|do(Xs=xs))[W] := E[W|do(Xs = xs)] for sets Xs X, W V.

B.1. Equivalence of Intervention Sets

As a first step, we establish invariances within P(X) in regards to the effects of intervention sets on the target variables. Recall the following definition from the main part of the paper.

Definition 4.1 (Minimal intervention set). A set Xs P(X) is called a minimal intervention set if, for every SCM conforming to G, there exists no subset X s Xs such that for all xs D(Xs) it holds µ(Xs, xs) = µ(X s, xs[X s]), for all 1 i m.

We denote the set of minimal intervention sets with MG,Y. In other words, no subset of a minimal intervention set can achieve the same expected outcome on Y. Intervention sets, that are not minimal in the sense of Definition 4.1, are redundant to consider in any optimization task.

Proposition 4.2. Xs P(X) is a minimal intervention set if and only if it holds Xs an(Y)GXs .

Proof. (If) Let xs D(Xs) be any intervention value. Assume that there is a subset X s Xs such that for all SCMs conforming to G it holds E[Yi|do(Xs = xs)] = E[Yi|do(X s = xs[X s])] for all 1 i m. For the sake of contradiction, assume Xs an(Y)GXs. Consider an SCM with real-valued variables where each V V is associated with its own binary exogenous variable UV with P(UV = 1) = 0.5. Let the function of an endogenous variable be the sum of values of its parents. Then, there exists a directed path from Xs\X s to some Yi without passing X s.

Multi-Objective Causal Bayesian Optimization

Hence, setting W = Xs\X s to the values w = E[W|do(X s = xs[X s])] + 1 yields E[Yi|do(W = w, X s = xs[X s])] > E[Yi|do(X s = xs[X s])], contradicting the assumption.

(Only if) Assume that Xs an(Y)GXs. Then, for X s = Xs an(Y)GXs it holds X s Xs and by the third rule of do-calculus, for every xs D(Xs) it holds E[Yi|do(Xs = xs)] = E[Yi|do(X s = xs[X s])], 1 i m. This is a contradiction because Xs was assumed to be a minimal intervention set.

B.2. Partial-Orders among Intervention Sets

Recall the definition of possibly Pareto-optimal minimal intervention sets.

Definition 4.3 (Possibly Pareto-optimal minimal intervention set). A set Xs MG,Y is called possibly Pareto-optimal if, for at least one SCM conforming to G, there exists xs D(Xs) such that (Xs, xs) is Pareto-optimal for P(X), and for no X s MG,Y\Xs, x s D(X s) it holds µ(X s, x s) µ(Xs, xs), for all 1 i m.

Characterizing such sets is the aim of this section. For simplicity, we first consider the special case in which G exhibits no unobserved confounders between Yi and any of its ancestors.

Proposition 4.4. If no Yi is confounded with an(Yi)G via unobserved confounders, then pa(Y)G is the only possibly Pareto-optimal minimal intervention set.

Proof. Let Xs = pa(Y)G, and let pa(Y)G = X s be another minimal intervention set with x s D(X s). Define Z = Xs\(X s Xs) and W = X s\(X s Xs). Moreover, we choose an intervention value x s D(Xs) such that it dominates xs D(Xs) which is given by xs[X s] = x s[Xs] and xs[Z] = E[Z|do(X s = x s)]. If xs is non-dominated, define x s = xs. Then, for all i = 1, . . . , m it holds

E[Yi|do(Xs = x s)] = E[Yi|do(Xs X s = x s[X s], Z = x s[Z])] (8)

E[Yi|do(Xs X s = xs[X s], Z = xs[Z])] (9)

= E[Yi|do(Xs X s = xs[X s], Z = xs[Z], W = x s[W])] (10)

= E[Yi|do(Xs X s = xs[X s], W = x s[W]), Z = xs[Z]] (11)

= E[Yi|do(Xs X s = xs[X s], W = x s[W])] (12)

= E[Yi|do(X s = x s)], (13)

where the inequality holds because xs (weakly) dominates x s. Note that the second and third equalities are derived through the third and second rules of do-calculus, respectively. The second rule of do-calculus assumes that Yi is not confounded with an(Yi)G via unobserved confounders. For Xs = pa(Y)G, it is possible to construct an SCM, conforming to G, such that strict inequality holds for some Yi, see the proof of Theorem B.2. This shows that pa(Y)G is the only possibly Pareto-optimal minimal intervention set.

We continue and study the more general case where unobserved confounders can be present between Yi and any of its ancestors. For this intent, we extend two existing concepts, called minimal unobserved-confounders territory and interventional border (Lee & Bareinboim, 2018), to the multi-objective setting. Using these notions, we derive results which can fully characterize possibly Pareto-optimal minimal intervention sets in the aforementioned scenario.

Definition 4.5 (Minimal unobserved confounders territory). Let H = G[An(Y)G]. A set of variables T in H, with Y T, is called a UC-territory for G w.r.t. Y if De(T)H = T and CC(T)H = T. The UC-territory T is said to be minimal, denoted T = MUCT(G, Y), if no T T is a UC-territory.

A minimal UC-territory for G w.r.t. Y can be constructed by extending a set of variables, starting from Y, and iteratively updating the set with the c-component and descendants of the set. More intuitively, it is the minimal subset of An(Y)G that is governed by unobserved confounders, where at least one target Yi is adjacent to an unobserved confounder.

Definition 4.6 (Interventional border). Let T = MUCT(G, Y). Then, B = pa(T)G\T is called the interventional border for G w.r.t. Y, denoted as IB(G, Y).

We have already described these concepts in the main part. Before connecting the notion of minimal UC-territory and interventional border to possibly Pareto-optimal minimal intervention sets, we require the following proposition:

Multi-Objective Causal Bayesian Optimization

Proposition B.1. Let T be a minimal UC-territory and B an interventional border for G w.r.t. Y. Let Xs X be an intervention set and S = (T Xs) B. Then, for any xs D(Xs) there exists s D(S) such that E[Yi|do(S = s)] E[Yi|do(Xs = xs)], for all i = 1, . . . , m.

Proof. (Case B Xs) Let xs D(Xs) be an intervention value. Then, by the third rule of do-calculus, it holds E[Yi|do(Xs = xs)] = E[Yi|do(Xs (T B) = xs[T B])], 1 i m. Since Xs (T B) = S, by setting s = xs[T B], it follows E[Yi|do(Xs = xs)] = E[Yi|do(S = s)].

(Case B Xs) Let xs D(Xs) be an intervention value. We define B = S\(Xs S) = S\(Xs (T B)) = B\(Xs B) and W = Xs\(Xs S) = Xs\(Xs (T B)). Moreover, let s D(S) such that it dominates s D(S), which is given by s[B ] = E[B |do(Xs = xs)] and s[Xs] = xs[T B]. If s is non-dominated, we set s = s. Then, for all i = 1, . . . , m it holds

E[Yi|do(S = s )] = E[Yi|do(Xs (T B) = s [Xs], B = s [B ])] (14)

E[Yi|do(Xs (T B) = s[Xs], B = s[B ])] (15)

= E[Yi|do(Xs (T B) = s[Xs], B = s[B ], W = xs[W])] (16)

= E[Yi|do(Xs (T B) = s[Xs], W = xs[W]), B = s[B ]] (17)

= E[Yi|do(Xs (T B) = s[Xs], W = xs[W])] (18)

= E[Yi|do(Xs = xs)], (19)

where the inequality holds because s is (weakly) dominated by s . Furthermore, the second and third equalities are derived through the third and second rules of do-calculus, respectively.

The following proposition is a building block for characterizing possibly Pareto-optimal minimal intervention sets via interventional borders. The proof is similar to the one given by Lee & Bareinboim (2018)

Proposition B.2. IB(G, Y) is a possibly Pareto-optimal minimal intervention set.

Proof. The intuition of this proof is to construct an SCM, conforming to G, for which the single best strategy involves intervening on IB(G, Y). Let T and B denote MUCT(G, Y) and IB(G, Y), respectively. Every exogenous variable in U shall be a binary variable with its domain being {0, 1}. Let denote the exclusive-or function and W the logical OR operator.

(Case T = Y) In this case, B corresponds to the parents of Y. Therefore, no target variable Yi is confounded with an(Yi)G via unobserved confounders. Define an SCM such that

Each endogenous variable V V is influenced by an exogenous variable UV V;

f Yi = W u Yi W pa Yi with P(UYi = 0) 1, for all i = 1, . . . , m;

f X = (L u X) (L pa X) for X X and P(U = 0) = 0.5 for every U U\(Sm i=1 UYi).

By the third rule of do-calculus and by taking conditional expectations, it holds

E[Yi|do(B = 0)] = E[Yi|do(pa(Yi)G = 0)] (20)

= E[Yi|do(pa(Yi)G = 0), UYi = 0]P(UYi = 0) + E[Yi|do(pa(Yi)G = 0), UYi = 0]P(UYi = 0) (21)

for every 1 i m. Meanwhile, all other interventions yield expectations greater than or equal to 0.5 in at least one component. Therefore, B is a possibly Pareto-optimal minimal intervention set.

(Case T Y) In this case, at least one target variable Yi has an unobserved confounder with its ancestors. As a first step, it will be shown that there exists an SCM, conforming to H = G[T B], where the intervention do(B = 0) is the single best strategy. To achieve this, we first define individual SCMs for each unobserved confounder in H[T], and merge them into a single SCM where do(B = 0) is indeed the best strategy. Let U = {Uj}k j=1 be the set of unobserved confounders in H[T].

Multi-Objective Causal Bayesian Optimization

Figure 8. Original causal graph G and its color-coded subgraphs for each unobserved confounder.

Table 3. Values for M1, M2 and M given X4 = X5 = 0. The target variables are shown as bit sequences, Y 1 and Y 2, as well as binary values, Y1 and Y2.

Given Uj U , let B(j) and R(j) denote its two children. We define an SCM Mj, where the graph structure is given by

Hj = H h De n B(j), R(j)o

H B pa De n B(j), R(j)o

and all bidirected edges, except for Uj, are removed. In order to set the structural equations for variables in Hj, the vertices will be labelled via colour coding: Let vertices in De B(j)

H be labelled as blue, De R(j)

H as red, and De B(j)

H as purple. All target variables are coloured as purple as well. Moreover, B(j) and R(j) shall perceive Uj as a parent coloured as blue with value Uj and red with value 1 Uj, respectively. The blue-, redand purple-coloured variables are set to 3 if any of their parents in B is not 0. Otherwise, their values are determined as follows. For every blue and red vertex, the associated structural equation returns the common value of its parents of the same colour and returns 3 if coloured parents values are not homogeneous. For every purple vertex, its corresponding equation returns 2 if every blue, red and purple parent is 0,1, and 2, respectively, and returns 1 if 1,0,1, respectively.

Next, the SCMs M1, . . . , Mk will be merged into one single SCM, that conforms to H, and for which do(B = 0) is the single best intervention. Note that in Mj all variables can be represented with just two bits. To construct a unified SCM, variables in T are represented with 2k bits, where Mj takes the 2j 1th and 2jth bits. Every target variable Yi is represented as a sequence of bits and binarised as follows. Yi is set to 0 if its 2j 1th and 2jth bits are 00, 01 or 10 for every 1 j k, and 1 otherwise. Let P(Uj = 1) = 0.5 for Uj U . Therefore, it holds Yi = 0 if do(B = 0) and Yi = 1 if do(B = 0). If any variable in T is intervened, then at least one SCM Mj will be disrupted, resulting in an expectation larger than or equal to 0.5 for at least one target variable. In the multi-target setting, it may happen that some target variables do not occur in any of the Mj s. This happens if a target Yi has no parents in T, but only in B. For all such Yi s, we set f Yi = u Yi W pa Yi with P(UYi = 0) 1. As such, the newly constructed SCM enforces E[Yi|do(B = 0)] 0. Meanwhile, all other interventions yield expectations greater than or equal to 0.5

As a last step, the previously defined SCM for H = G[T B], will be extended to an SCM for G. However, we can ignore joint probability distributions for any exogenous variables only affecting endogenous variables outside of H. Setting structural equations for endogenous variables outside of H is redundant as well. For V An(Y)G\T, we define the structural equations as f V = (L u V ) (L pa V ). For U U\U , we set P(U = 0) = 0.5 if U s child(ren) is disjoint to T, and P(U = 0) 1 otherwise. Note that do(B = 0) is still the single optimal intervention. Therefore, B is a possibly

Multi-Objective Causal Bayesian Optimization

Pareto-optimal minimal intervention set.

In order to illustrate the construction of an SCM where do(IB(G, Y) = 0) is the single best strategy, consider Figure 8, showing an exemplary graph and its colour-coded subgraphs, H1 and H2, for each unobserved confounder. Table 3 presents the associated values for M1 and M2, as well as values for the target variables in the final SCM M. The next proposition generalizes the previous one.

Proposition 4.7. IB(GXs, Y) is a possibly Pareto-optimal minimal intervention set for any Xs P(X).

Proof. Let Xs be an intervention set. Let us denote T = MUCT(GXs, Y), B = IB(GXs, Y) and T0 = MUCT(G, Y). Using the strategy from Theorem 4.7, we construct an SCM for G[T B] while ignoring unobserved confounders between T and T0\T. Let U be the set of such unobserved confounders. Now, the SCM needs to be modified to ensure that do(B = 0) is the single best intervention. Every U U shall flip (i.e., 0 1) the value of its endogenous child in T whenever U = 1. Let P(U = 0) 1, so that it holds E[Yi|do(B = 0)] 0. Intervening on B = 0 or on any variable in T results in expectations around 0.5 or above.

Notably, Proposition 4.7 extends Proposition B.2 when Xs = . Note that, by iterating over all intervention sets Xs P(X), we can discover possibly Pareto-optimal minimal intervention sets in a given graph. The following theorem is an extension of the main result by Lee & Bareinboim (2018) to the scenario where multiple target variables are present. It shows that the aforementioned strategy suffices to find not some, but all, such sets.

Theorem 4.8. A set Xs is a possibly Pareto-optimal minimal intervention set if and only if it holds IB(GXs, Y) = Xs.

Proof. (If) This is a special case of Proposition 4.7.

(Only if) Let Xs be a minimal intervention set and xs D(Xs) an intervention value. Denote T = MUCT(GXs, Y), B = IB(GXs, Y), T0 = MUCT(G, Y) and B0 = IB(G, Y). From Theorem B.1, we know that no POMIS intersects with An(B0)G\B0 and thus, it is possible to conclude Xs T0 B0\Y. Note that it holds Xs An(B)G since otherwise it would follow Xs T = , which contradicts that Xs is neither a descendant of some variable nor confounded in GXs. Let B = B\(Xs B) and W = Xs\(Xs B). Moreover, we define an intervention value b D(B) such that it dominates b D(B), which is given by b[B ] = E[B |do(Xs = xs)] and b[Xs] = xs[B]. If b is non-dominated, we set b = b. Then, for all i = 1, . . . , m, it holds

E[Yi|do(B = b )] = E[Yi|do(B Xs = b [Xs], B = b [B ])] (24)

E[Yi|do(B Xs = b[Xs], B = b[B ])] (25)

= E[Yi|do(B Xs = b[Xs], B = b[B ], W = xs[W])] (26)

= E[Yi|do(B Xs = b[Xs], W = xs[W]), B = b[B ]] (27)

= E[Yi|do(B Xs = b[Xs], W = xs[W])] (28)

= E[Yi|do(Xs = xs)], (29)

where the inequality holds because b is (weakly) dominated by b . Furthermore, the second and third equalities are derived through the third and second rules of do-calculus, repectively.

Theorem 4.8 provides a necessary and sufficient condition for a set of variables to be a possibly Pareto-optimal minimal intervention set. The proof of the theorem gives the following corollary:

Corollary 4.9. Let Xs P(X) and X s = IB(GXs, Y). For any xs D(Xs) there exist x s D(X s) such that it holds µ(X s, x s) µ(Xs, xs), for all 1 i m.

Multi-Objective Causal Bayesian Optimization

C. MO-CBO Problems

In this section, we present the collection of synthetic and real-world causal graphs used to evaluate the performance of

MO-CBO in comparison to standard MOBO baselines.

Cost of Interventions The MO-CBO algorithm requires evaluating the objective functions which inherently involves implementing interventions on the system. However, in practical scenarios, such interventions can be costly, making it important to prioritize sample efficiency and explicitly account for the cost of an intervention do(Xs = xs), denoted as cost(Xs, xs). We use the convention cost(Xs, xs) = P

Xi Xs,xi=xs[Xi] cost(Xi, xi). Each experiment is conducted under a fixed cost budget, terminating once the budget is exhausted. To reflect practical constraints, every MO-CBO scenario includes a tailored cost structure that accurately represents the varying difficulty or expense of different interventions.

C.1. SYNTHETIC-1

We introduce the first synthetic MO-CBO problem in our experimental study, referred to as SYNTHETIC-1, which is defined by the causal graph G and associated structural assignments presented in Figure 9. The interventional domains are specified as D(X1), D(X2) = [ 1, 2] and D(X3), D(X4) = [ 1, 1]. Moreover, all exogenous variables follow the standard normal distribution, and there are no unobserved confounders. All treatment variables Xi, 1 i 4, shall have fixed unit cost of cost(Xi, xi) = 1 for all xi D(Xi).

X1 = exp((X3 X4)/2) + UX1

X2 = ((X3 X4)/2)3 + UX2 X3 = UX3

Y1 = (X1 + X2)2 + UY1

Y2 = (X1 + X2 10)2 + UY2

UXi, UYi N(0, 1)

Figure 9. SYNTHETIC-1. An SCM consisting of four treatment and two output variables, depicted with grey and red nodes, respectively. There are no unobserved confounders.

C.2. SYNTHETIC-2

SYNTHETIC-2 is the next MO-CBO problem of our experimental study, defined by the causal graph G and associated structural equations in Figure 10. The interventional domains are D(X1) = [ 2, 5], D(X4) = [ 4, 5] and D(Xi) = [0, 5] for i = 1, 2. Moreover, the exogenous variables UXi, UYi follow a Gaussian distribution, and there is an unobserved confounder U influencing the target variable Y1 and its ancestor X4. All treatment variables Xi, 1 i 4, have fixed unit cost of cost(Xi, xi) = 1 for all xi D(Xi).

X1 = X4/2 + U 2 X1 X2 = U 2 X2 X3 = U 2 X3 X4 = U + U 2 X4

Y1 = ln(1 + X2 1) + 2 X2 2 X1 X2 U/2 + U 3 Y1 Y2 = sin(X2 2) X2 3 X2 X3 + 50 + U 3 Y2

U { 4, 4}, P(U = 4) = P(U = 4) = 0.5

UXi, UYi N(0, 0.5)

Figure 10. SYNTHETIC-2. An SCM consisting of four treatment and two output variables, depicted with grey and red nodes, respectively. It includes an unobserved confounder, denoted via the dashed bi-directed edge, affecting one output and its ancestor.

Multi-Objective Causal Bayesian Optimization

C.3. HEALTH

The MO-CBO problem HEALTH is defined by the causal graph and structural equations in Figure 11. This model originates from previous works of Ferro et al. (2015), and is based on real-world causal relationships. It captures prostate-specificantigen (PSA) levels in causal relation to its risk factors, such as BMI, calorie intake (CI) and aspirin usage. The variable Aspirin indicates the daily aspirin regimen while Statin denotes a subject statin medication. Additionally, PSA represents the total antigen level circulating in a subject s blood, measured in ng/m L. For patients sensitive to Statin medications, the aim is to determine how to manipulate relevant risk factors to minimize both Statin and PSA. To this end, we treat both Statin and PSA as target variables. The treatment variables include BMI, Weight, CI, and Aspirin usage with interventional domains D(BMI) = [20, 30], D(Weight) = [50, 100], D(CI) = [ 100, 100] and D(Aspirin) = [0, 1]. We choose to consider a specific age groups of interest, and define Uage as a Gaussian random variable with mean 65 and standard deviation 1, focusing on individuals close to the age of 65. The variables CI and Aspirin are set to have fixed unit cost, i.e. cost(X, x) = 1 for X {CI, Aspirin}, x D(X). Since BMI and weight are significantly harder to treat, we increase their cost to cost(X, x) = 3 for X {BMI, weight}, x D(X).

The single-objective version of HEALTH, aiming to minimize only PSA, has previously been used to demonstrate the applicability of CBO (Aglietti et al., 2020), as well as for several of its variants (e.g. Gultchin et al. (2023) and Aglietti et al. (2023)).

CI = UCI, UCI U( 100, 100)

BMR = 1500 + 10UBMR, Uheight t N( 1, 2)

height = 175 + 10Uheight, Uheight t N( 0.5, 0.5)

age = Uage, Uage N(65, 1)

weight = (BMR + 6.8age 5height)/(13.7 + CI150/7716)

BMI = weight/(height/100)2

aspirin = σ( 8 + 0.1age + 0.03BMI), σ(x) = 1 1+e x statin = σ( 13 + 0.10age + 0.20BMI)

cancer = σ(2.2 0.05age + 0.01BMI 0.04statin + 0.02aspirin)

PSA = 6.8 + 0.04age 0.15BMI 0.6statin + 0.55aspirin + cancer + UPSA

UPSA N(0, 0.4)

Figure 11. HEALTH. An SCM with relations between variables such as age, BMI, aspirin and statin usage, and their effects on PSA levels (Gultchin et al., 2023). U( , ) denotes a uniform distribution and t N(a, b) a standard Gaussian distribution truncated between a and b. Red, orange, and grey nodes depict target, manipulative, and non-manipulative variables, respectively.

C.4. CREDIT APPROVAL

The final MO-CBO problem in our experiments is called CREDIT APPROVAL, and is specified by the causal graph and structural equations shown in Figure 12. This problem models the probability of credit approval as a function of various demographic and financial variables, including age, gender, education, loan amount, loan duration, income, and savings. It is based on the German Credit UCI dataset (Murphy, 1994) and causal relationships adapted from in (Karimi et al., 2020). We treat both approval probability and loan duration as target variables. The treatment variables are education, loan amount, income and savings, with interventional domains given as: D(education) = [ 0.5, 0.5], D(loan amount) = [ 1, 2], D(income) = [ 2, 1], and D(savings) = [ 5, 1]. Note that the variables age, education, loan amount, loan duration, income and savings are modelled as a deviations from their means. To reduce complexity and focus our analysis, we fix the gender variable to 0 (e.g., male), diverging from the original specification by Karimi et al. (2020), where gender is defined as Bernoulli(0.5). Additionally, we assume that all other variables remain close to their mean values and reduce the level of observational noise compared to Karimi et al. (2020). The variables loan amount and income are set to have fixed unit cost, i.e. cost(X, x) = 1 for X {loan amount, income}, x D(X). For education and savings, we increase the cost to cost(X, x) = 2 for X {education, savings}, x D(X).

Multi-Objective Causal Bayesian Optimization

amount edu.

duration income

savings approval

age = 35 + Uage, Uage Gamma(10, 3.5) edu. = 0.5 + σ 1( 1 + 0.5gender + σ 1(0.1age) + Uedu.) 1, Uedu. N(0, 0.25) amount = 1 + 0.1(age 5)(5 age) + gender + Uamount, Uamount N(0, 4) duration = 1 + 0.01age + 2gender + amount + Uduration, Uduration N(0, 9) income = 4 + 0.1(age + 35) + 2gender + gender edu. + Uincome, Uincome N(0, 4) savings = σ( 4 + 1.5 income>0income + Usavings, Usavings N(0, 25) approval = σ(0.3( amount duration + income + savings + income savings))

Figure 12. CREDIT APPROVAL. An SCM which models the probability of credit approval as a function of various demographic and financial variables (Karimi et al., 2020). Red, orange, and grey nodes depict target, manipulative, and non-manipulative variables, respectively.

D. Experiments

D.1. Hyperparameters

MO-CBO algorithm The DGEMO backbone of our MO-CBO algorithm has mostly the same hyperparameters as its original implementation from Konakovic Lukovic et al. (2020). The batch size is set to 5.

Par EGO We adopt a batch variant of Par EGO by using b random scalarization weights in each iteration, with b being the batch size. Moreover, Chebyshev scalarization (Miettinen, 1999) and the CMA-ES algorithm (Hansen, 2023) are used to solve the scalarized single-objective problems with σ = 0.5 as initial standard deviation.

MOEA/D-EGO Following Konakovic Lukovic et al. (2020), our implementation of the MOEA/D-EGO baseline follows its original framework, with the key difference being the removal of Fuzzy CM. Given the current computational efficiency of training Gaussian process models, they opt to use them directly for prediction instead of relying on faster but less accurate approximation techniques. As a result, this version may offer improved performance due to the enhanced predictive accuracy. We employ simulated binary crossover and polynomial mutation for MOEA/D, with the remaining hyperparameters detailed in Konakovic Lukovic et al. (2020).

TSEMO For TSEMO, we largely adopt the same hyperparameter settings as in the original implementation. Specifically, we use 100 points for spectral sampling.

q NEHVI We implement q NEHVI using the botorch library. We use 10 optimization restarts, and 64 raw samples for acquisition maximization. Moreover, the acquisition function uses a Sobol QMC sampler with 128 samples.

DGEMO For DGEMO, we retain the hyperparameter configuration from its original implementation.

D.2. Performance Metrics

We require metrics to assess the quality of a given Pareto front approximation. These metrics evaluate both the convergence of the approximated front to the true front and the diversity of the solutions across the performance space. For a given optimisation problem, let A be the set of points from an approximated Pareto front. If the ground-truth Pareto front is known, it is possible to evaluate how well A approximates it given the following two metrics. Let Z be the set of points on the true Pareto front.

Generational distance (GD) A common performance indicator for evaluating a given Pareto front approximation is the so-called generational distance (Schutze et al., 2012). It is the average distance from any point ai A to its closest point in the Pareto front Z. Formally,

GD(A) = 1 |A|

Multi-Objective Causal Bayesian Optimization

where dp i is the Euclidean distance from ai to its nearest point in Z. We set p = 2 in our experiments.

Inverted generational distance (IGD) The inverted generational distance measures the distance from any point zi Z to its closest point in A (Schutze et al., 2012). Thereby, it can serve as an indicator of the coverage given by the approximated front. Formally,

i=1 ˆdi p 1/p , (31)

where dp i is the Euclidean distance from zi to its nearest point in A. We set p = 2 in our experiments.

The GD evaluates the convergence of the approximated Pareto front to the true front, whereas the IGD measures the diversity of the solutions across the output space. For both metrics, smaller values indicate a better approximation of the true Pareto front.

D.3. Runtime

All experiments were executed on a machine equipped with an Apple M2 processor and 8GB of RAM. The average runtimes are reported in Table 4.

Table 4. Algorithm runtime comparison (seconds per iteration), averaged across 10 seeds.

Problem Par EGO MOEA/D-EGO TSEMO q NEHVI DGEMO MO-CBO (ours)

SYNTHETIC-1 14.39 0.62 0.20 14.23 4.35 3.43 SYNTHETIC-2 9.29 0.74 0.35 12.09 4.47 3.82 HEALTH 8.65 4.19 3.81 18.98 6.26 6.12 CREDIT APPROVAL 10.13 0.77 0.34 20.87 26.45 10.55

Multi-Objective Causal Bayesian Optimization

D.4. Results

SYNTHETIC-1 We present the experimental results for the MO-CBO problem SYNTHETIC-1, see Figure 13. We observe that our MO-CBO algorithm consistently outperforms all MOBO baselines, yielding more diverse and well-distributed solutions across the target Pareto front Pc f(P(X)). All methods were run until a fixed cost budget of 150 was exhausted.

Pc f(P(X)) MO-CBO (ours)

Par EGO MOEA/D-EGO

TSEMO q NEHVI

Figure 13. SYNTHETIC-1. Comparison of Pareto front approximations produced by our MO-CBO algorithm and various MOBO baselines: Par EGO, MOEA/D-EGO, TSEMO, q NEHVI, and DGEMO. The x-axis corresponds to objective Y1, and the y-axis to Y2.

Multi-Objective Causal Bayesian Optimization

SYNTHETIC-2 We present the experimental results for the MO-CBO problem SYNTHETIC-2, see Figure 14. We observe that the solutions identified by our MO-CBO algorithm tightly fit the target Pareto front Pc f(P(X)), and strictly dominate those found by the MOBO baselines. All methods were run until a fixed cost budget of 200 was exhausted.

Pc f(P(X)) MO-CBO (ours)

Par EGO MOEA/D-EGO

TSEMO q NEHVI

Figure 14. SYNTHETIC-2. Comparison of Pareto front approximations produced by MO-CBO (ours) and various MOBO baselines: Par EGO, MOEA/D-EGO, TSEMO, q NEHVI, and DGEMO. The x-axis corresponds to objective Y1, and the y-axis to Y2.

Multi-Objective Causal Bayesian Optimization

HEALTH We present the experimental results for the MO-CBO problem HEALTH, see Figure 15. We observe that the solutions identified by our MO-CBO algorithm tightly fit the target Pareto front Pc f(P(X)), and strictly dominate those found by the MOBO baselines. All methods were run until a fixed cost budget of 120 was exhausted.

Pc f(P(X)) MO-CBO (ours)

Par EGO MOEA/D-EGO

TSEMO q NEHVI

Figure 15. HEALTH. Comparison of Pareto front approximations produced by MO-CBO (ours) and various MOBO baselines: Par EGO, MOEA/D-EGO, TSEMO, q NEHVI, and DGEMO. The x-axis corresponds to objective Y1, and the y-axis to Y2.

Multi-Objective Causal Bayesian Optimization

CREDIT APPROVAL We present experimental results for the MO-CBO problem CREDIT APPROVAL in Figure 16. Our

MO-CBO algorithm consistently outperforms the MOBO baselines, producing solutions that are both more diverse and better distributed across the target Pareto front. The experiments are executed until a cost budget of 300 is reached.

Pc f(P(X)) MO-CBO (ours)

Par EGO MOEA/D-EGO

TSEMO q NEHVI

Figure 16. CREDIT APPROVAL. Comparison of Pareto front approximations produced by MO-CBO (ours) and various MOBO baselines: Par EGO, MOEA/D-EGO, TSEMO, q NEHVI, and DGEMO. The x-axis corresponds to objective loan duration, and the y-axis to the approval probability.