# multiobjective_influence_diagrams_with_possibly_optimal_policies__2e039d6a.pdf

Multi-Objective Inﬂuence Diagrams with Possibly Optimal Policies

Radu Marinescu IBM Research Ireland radu.marinescu@ie.ibm.com

Abdul Razak and Nic Wilson Insight Centre for Data Analytics University College Cork, Ireland {abdul.razak,nic.wilson}@insight-centre.org

The formalism of multi-objective inﬂuence diagrams has recently been developed for modeling and solving sequential decision problems under uncertainty and multiple objectives. Since utility values representing the decision maker s preferences are only partially ordered (e.g., by the Pareto order) we no longer have a unique maximal value of expected utility, but a set of them. Computing the set of maximal values of expected utility and the corresponding policies can be computationally very challenging. In this paper, we consider alternative notions of optimality, one of the most important one being the notion of possibly optimal, namely optimal in at least one scenario compatible with the inter-objective tradeoffs. We develop a variable elimination algorithm for computing the set of possibly optimal expected utility values, prove formally its correctness, and compare variants of the algorithm experimentally.

Introduction

Multi-Objective Inﬂuence Diagrams (MOID) (Marinescu, Razak, and Wilson 2012) are a recent extension of inﬂuence diagrams (ID) (Howard and Matheson 1984) that provide a general framework for modeling and solving sequential decision making problems under uncertainty and multiple objectives. They involve both chance variables, where the outcome is determined randomly based on the values assigned to other variables, and decision variables, which the decision maker can choose the value of, based on observations of some other variables. Uncertainty is represented (like in Bayesian networks) by a collection of conditional probability distributions, one for each chance variable. To cope with multiple and non-commensurate utility scales on which the decision maker s preferences are expressed, it is natural to consider multi-objective or multi-attribute utility functions (White, Sage, and Dozono 1984; Keeney and Raiffa 1993; Roy 1996; Ehrgott 1999). Therefore, the utility values are vectors in IRp, with p being the number of objectives, and the overall value of an outcome is given by the sum of a set of local multi-objective utility functions. Since the utility values are only partially ordered (for instance by the Pareto ordering) we no longer have a unique maximal value of expected utility, but a set of them. In general, given a MOID, the Pareto ordering on multi-objective

Copyright c 2017, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved.

utility is a rather weak one; the effect of this is that the set of maximal values of expected utility and the corresponding set of decision policies can often become huge. Therefore, the main drawback of MOIDs is that in most practical situations the set of undominated decision policies is often too large to be meaningful to the decision maker. Sometimes, the decision maker may be happy to provide additional tradeoffs between objectives, and these can be used to reduce further the set of undominated expected utility values and their policies (but typically not to singleton sets) as was recently shown in (Marinescu, Razak, and Wilson 2012).

Contribution In this paper we take a different approach and focus on ﬁnding optimal values of expected utility with respect to alternative notions of optimality, such as the notion of possibly optimal solutions considered in (Wilson, Razak, and Marinescu 2015a) for multi-objective constraint optimization problems. More speciﬁcally, we extend this approach to multi-objective inﬂuence diagrams and look to compute the set of possibly optimal values of expected utility and the corresponding decision policies. A possibly optimal policy is a policy that is optimal in at least one scenario that is compatible with the inter-objective tradeoffs. We develop a variable elimination algorithm for computing this set, and give the key formal properties that ensure the correctness of the computation. Quite often, the decision maker is happy to provide some preferences for one multi-objective utility vector over another; these may arise, for example, from an iterative elicitation technique. We use such inputs to infer other preferences and use them to eliminate dominated utility vectors during the computation. Our numerical experiments demonstrate the efﬁciency of the proposed algorithm both in terms of solution quality (as the cardinality of the solution sets obtained) and the solution times. Proofs and additional experimental results are contained in a longer version of the paper (Marinescu, Razak, and Wilson 2017).

Background Multi-objective Utility Values Let p be the number of objectives (or attributes). A multi-objective utility value is deﬁned to be a vector u = (u1, . . . , up) IRp, where ui represents the utility with respect to objective i {1, . . . , p}. We assume the standard pointwise arithmetic operations, namely u + v = (u1 + v1, . . . , up + vp) and q u = (q u1, . . . , q up),

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)

where q IR. For ﬁnite sets A, B IRp, we deﬁne A + B = { u + v : u A, v B}.

DEFINITION 1 (weak Pareto order) Let u, v IRp so that u = (u1, . . . , up) and v = (v1, . . . , vp). We deﬁne the binary relation on IRp by u v i = 1, . . . , p, ui vi.

Let be a partial order on IRp and let u, v IRp. If u v then we say that u dominates v (as usual, the symbol refers to the asymmetric part of ). Given ﬁnite sets U, V IRp, we say that U dominates V , denoted U V , if v V u U such that u v. Furthermore, given a ﬁnite set U IRp, we deﬁne the maximal set max (U) = { v U | v U, v u}. When is the weak Pareto ordering , we call max (U) the Pareto set.

Imprecise Tradeoffs

We assume that the decision maker (DM) provides additional preferences i.e., a set Θ of pairs of the form ( u, v) meaning that they prefer multi-objective vector u to v. These can be viewed as a general kind of tradeoff between the objectives. We will use this input information to deduce further preferences, based on the assumption that the user s model is a weighted sum of the objectives (thus being a simple form of a Multi-attribute Utility Theory (MAUT) model (Figueira, Greco, and Ehrgott 2005)). The treatment of imprecise tradeoffs is based on the formalism introduced recently in (Marinescu, Razak, and Wilson 2012; 2013; Wilson, Razak, and Marinescu 2015a). Let W be the set of vectors in IRp with all components non-negative and summing to 1. Elements of W are known as weights vectors. Each weights vector w W induces a total pre-order w on IRp by, for u, v IRp, u w v if and only if w u w v, where e.g., w u = p i=1 wi ui. We assume that the user s preference ordering over multiobjective vectors IRp is equal to w for some weights vector w W (which is unknown to us). For w W and preference pair ( u, v), we say that w satisﬁes ( u, v) if u w v, i.e., if w u w v. We also say, for set of pairs Θ, that w satisﬁes Θ if w satisﬁes each pair in Θ; let W(Θ), the set of (consistent) scenarios, be all such w. If we knew the weights vector w, the problem would reduce to a single-objective problem. However, all we know is that w W(Θ). This leads to the induced preference relation Θ deﬁned as follows: u Θ v if and only if u w v for all w W(Θ); hence, Θ is equal to the intersection of relations w over all w W(Θ). Therefore, u is preferred to v if and only if the preference holds for every consistent scenario. The associated strict relation Θ is given by u Θ v u Θ v and v Θ u. When Θ is empty, Θ is just the Pareto ordering. We make the weak assumption that Θ is a partial order.

Operators PO, CSD and POCSD Let D be a ﬁnite set of utility vectors. There is more than one way of deﬁning the optimal utility vectors in D. Given input preferences Θ, deﬁne the set CSDΘ(D) to be the Θ-undominated vectors in D. (CSD stands for Can Strictly Dominate, since u CSDΘ(D) if and only if for all non-equivalent t D

there exists a consistent scenario w in which u strictly dominates t, i.e., u w t (Wilson and O Mahony 2011).) For w W(Θ), we deﬁne O w(D) to be the vectors u in D that are optimal in scenario w, i.e., that maximize w u. We deﬁne POΘ(D) to be the set of possibly optimal utility vectors, i.e., the vectors u D that are optimal in some consistent scenario, i.e., such that there exists w W(Θ) with O w(D) u. Thus, POΘ(D) is equal to the union of O w(D) over all w W(Θ). We also deﬁne operator POCSDΘ by POCSDΘ(D) = POΘ(D) CSDΘ(D). We will sometimes abbreviate POΘ to PO, and similarly for CSDΘ and POCSDΘ. One can also deﬁne the set of necessarily optimal utility vectors NOΘ, i.e., vectors that are optimal in every scenario. This will usually be empty. However, if NOΘ is non-empty, then it is equal to CSDΘ (Wilson and O Mahony 2011), so it sufﬁces to compute the latter.

Multi-objective Inﬂuence Diagrams

A Multi-objective inﬂuence diagrams (MOID) is deﬁned by a tuple C, D, P, U , where C = {C1, . . . , Cn} is a set of chance variables which specify the uncertain decision environment and D = {D1, . . . , Dm} is a set of decision variables which specify the possible decisions to be made. Ω(Y ) denotes the domain of variable Y C D, and for a set S C D such that S = {Yi1, . . . , Yik}, we deﬁne Ω(S) = Ω(Yi1) Ω(Yik). As in Bayesian networks (Pearl 1988), each chance variable Ci C is associated with a conditional probability table (CPT) Pi = P(Ci|pa(Ci)), where Pi P and pa(Ci) C D \ {Ci}. Each decision variable Dk D has a parent set pa(Dk) C D\{Dk}, denoting the variables whose values will be known at the time of the decision and may affect directly the decision. The multi-objective utility functions U = {U1, . . . , Ur} are deﬁned over subsets of variables Q = {Q1, . . . , Qr}, where Qi C D is called the scope of Ui, and represent the preferences of the decision maker, namely Ui : Ω(Qi) IRp. Utility function Ui can be identiﬁed in the obvious way with a function to 2IR p, by mapping each element of Ω(Qi) to a singleton subset of IRp; we abuse notation by using Ui to refer to both functions. The local utility functions deﬁne a global multi-objective utility function that is additively decomposable, as follows: U(C D) = r j=1 Uj(Qj). A policy for a MOID is a set of decision rules Δ = (δ1, . . . , δm) consisting of one rule for each decision variable. A decision rule for the decision Dk D is a mapping δk : Ω(pa(Dk)) Ω(Dk). Therefore, a policy Δ determines a value for each decision variable Dk (which depends on the parents set pa(Dk)). We assume acyclicity of the dependency relation on decision variables. Given a utility function U, a policy Δ yields a utility function [U]Δ that involves no decision variables, by assigning their values using Δ. The expected utility of policy Δ is EUΔ =

C[( n i=1 Pi r j=1 Uj)]Δ. We therefore have EUΔ IRp. Solving a MOID means ﬁnding the set of policies that

Figure 1: A bi-objective inﬂuence diagram.

generate maximal values of expected utility, i.e., values of utility in the set max {EUΔ | policies Δ}. We say that a policy Δ is undominated if the corresponding expected utility vector EUΔ is undominated. Example 1 In Figure 1 we show the inﬂuence diagram of the oil wildcatter problem (adapted from (Raiffa 1968)). An oil wildcatter must decide either to drill or not to drill for oil at a speciﬁc site. Before drilling, a seismic test could help determine the geological structure of the site. The test results can show a closed reﬂection pattern (indication of signiﬁcant oil), an open pattern (indication of some oil), or a diffuse pattern (almost no hope of oil). The special value notest is used if no seismic test is performed. There are therefore two decision variables, T (Test) and D (Drill), and two chance variables S (Seismic results) and O (Oil contents). The probabilistic knowledge consists of the conditional probability tables P(O) and P(S|O, T). We consider a utility function with two attributes representing the testing/drilling payoff and damage to the environment, respectively. The utility of testing (represented by the component U1(T)) is ( 10, 10), whereas the utility of drilling (represented by the component U2(O, D)) is ( 70, 18), (50, 12), (200, 8) for a dry, wet and soaking hole, respectively. Therefore, the utility function is the sum of U1(T) and U2(O, D). The aim is to ﬁnd optimal policies that maximize the payoff and minimize the damage to environment. The dominance relation is deﬁned in this case by u v u1 v1 and u2 v2 (e.g., (10, 2) (8, 4) and (10, 2) (8, 1)). The Pareto set max {EUΔ | policies Δ} contains 4 elements, i.e., {(22.5, 17.56), (20, 14.2), (11, 12.78), (0, 0)}, corresponding to the four optimal (undominated) policies shown below:

Δ1 Δ2 Δ3 Δ4

δT yes no yes no δD yes (S = closed) yes (S = notest) yes (S = closed) no (S = notest) yes (S = open) no (S = open) no (S = diffuse) no (S = diffuse) EUΔi {(22.5, 17.56)} {(20, 14.2)} {(11, 12.78)} {(0, 0)}

Optimality Operators We want to show that MOID computations can be correctly performed with respect to optimality deﬁnitions PO and

POCSD, as well as CSD. In this section, building on work by (Wilson, Razak, and Marinescu 2015a), we consider a more general and abstract notion of optimality, deﬁned axiomatically. We show that the operators satisfy properties that ensure correctness of a variable elimination computation.

Axioms and Properties For set of alternatives or outcomes A, OPT(A) is intended to be the set of optimal ones, with respect to some particular notion of optimality. Thus, operator OPT is a function that has input some set of outcomes A (which is a subset of some universal set D of outcomes), and returns a subset of A.

DEFINITION 2 (optimality operator) Let D be a ﬁnite set. We say that OPT is an optimality operator over D if OPT is a function from 2D to 2D satisfying the following three conditions, for arbitrary A, B D.

(1) OPT(A) A; (2) If A B then OPT(B) A OPT(A); (3) If OPT(B) A B then OPT(A) = OPT(B).

The properties entail that OPT is idempotent (Lemma 1 of (Wilson, Razak, and Marinescu 2015b)). The following property, known as Path Independence in the social choice function literature (Plott 1973; Aizerman and Malishevski 1981; Moulin 1985), is another important consequence of the axioms, see Proposition 1 of (Wilson, Razak, and Marinescu 2015b):

PROPOSITION 1 Let OPT be an optimality operator over D. Then OPT satisﬁes the Union Decomposition property, i.e., for all A, B D, OPT(A B) = OPT(OPT(A) OPT(B)).

Additive Decomposition Suppose that D is a subset of IRp for some p = 1, 2, . . .. We say that OPT : 2D 2D satisﬁes the Additive Decomposition property if for any A, B D such that A + B D, OPT(A + B) = OPT(OPT(A) + OPT(B)). This property is necessary for the correctness of our variable elimination approach.

PO, CSD and POCSD as Optimality Operators Proposition 4 of (Wilson, Razak, and Marinescu 2015a) shows that POΘ, CSDΘ and POCSDΘ are optimality operators and satisfy Union Decomposition and Additive Decomposition. Inﬂuence diagram computation involves multiplication of a set of utility vectors by a positive real (probability value). The Scaling property relates to this. We say that OPT satisﬁes Scaling if for any A D and λ IR with λ > 0, we have OPT(λA) = OPT(λOPT(A)), whenever λA and λOPT(A) are in D. The fact that t w u if and only if λ t w λ u, leads to OPT(λA) = λOPT(A), for OPT {PO, CSD, POCSD}, which, applying OPT to both sides and using idempotence, implies Scaling.

PROPOSITION 2 Operators POΘ, CSDΘ and POCSDΘ satisfy Scaling.

Example 2 Continuing Example 1, suppose that we have the input tradeoff Θ = {((3, 0), (6, 4))}, where the 4 represents 4 units of environmental damage. Then, CSD = {(11, 12.78), (20, 14.2), (0, 0)} (where the minus sign indicates that the corresponding objective value is to be minimized) because (20, 14.2) Θ (22.5, 17.56). We have that PO = {(20, 14.2), (0, 0)} because the utility value (11, 12.78) PO since w W(Θ) satisfying the linear inequalities 1.5w1 + 2w2 0, 9w1 + 1.42w2 0, 11w1 12.78w2 0, w1 + w2 = 1 and w1 0, w2 0. Similarly, we can check that the utility values (20, 14.2) and (0, 0) are possibly optimal with respect to the scenarios (0.5, 0.5) and (0.4, 0.6), respectively. Consequently, we also have that POCSD = {(20, 14.2), (0, 0)}.

Equivalences using Convex Closure

For sets of utility vectors we consider a form of equivalence based on convex closure, i.e., two sets are equivalent if and only if they have same convex closure: A C B if and only if C(A) = C(B), where C(A) is the convex hull of A. We also make use of another form of equivalence , given partial order = Θ deﬁned above. For U, V IRp, we say that U V if every element of V is (weakly) dominated by some element of U, and deﬁne U V if and only if U V and V U. Given U, V IRp, we deﬁne the equivalence relation by U V if and only if C(U) C(V). Therefore, two sets of utility vectors are considered equivalent if, for every convex combination of elements of one, there is a convex combination of elements of the other which is at least as good (with respect to the partial order on IRp). Note that if A C B then A B. We have that:

PROPOSITION 3 Suppose that A, B D are such that A C B. Then PO(A) C PO(B).

Variable Elimination

We now describe the main contribution of this paper, that is a variable elimination algorithm that eliminates chance variables and decision variables, respectively, using marginalization operators X and

X, to compute the set of OPToptimal values of expected utility and their corresponding decision policies, where OPT is an optimality operator. The decision variables are assumed to be temporally ordered. This property which is known as regularity induces a strict partial order over all the variables C D, given by I0 D1 I1 Dm Im, where I0, . . . , Im are disjoint sets of chance variables, and for each k, 0 < k < m, Ik are the chance variables observed between Dk and Dk+1, I0 are the initial evidence variables, while Im are the unobserved variables. Consequently, the variables must be processed along a legal elimination ordering that respects the partial order induced by the decision variables, namely the reverse of the elimination ordering is some extension of to a total order (Jensen, Jensen, and Dittmer 1994; Dechter 2000; Marinescu, Razak, and Wilson 2012). The set of OPT-optimal expected utility values can be found by

Algorithm 1: MOVE(OPT)

Data: A MOID M = C, D, P, U with p > 1 objectives, a legal elimination ordering τ = (Y1, . . . , Yt) of the variables C D, optimality operator OPT Result: Set of OPT-optimal expected utility values and corresponding decision policies

1 Let Φ = {φ|φ P}, Ψ = {ψ|ψ U};

2 foreach variable Y in τ = (Y1, . . . , Yt) do

3 Create bucket BY and its associated sets ΦY and ΨY ;

4 Let ΦY = {φ|φ Φ, Y sc(φ)} and Φ Φ \ ΦY ;

5 Let ΨY = {ψ|ψ Ψ, Y sc(ψ)} and Ψ Ψ \ ΨY ;

6 foreach variable Y in ordering τ = (Y1, . . . , Yt) do

7 if Y C is a chance variable then

9 foreach ψ ΨY do

Y ( φ ΦY φ) ψ;

11 else if Y D is a decision variable then

12 foreach φ ΦY do

13 Let S = sc(φ) \ {Y };

14 Compute φY as follows: x Ω(S), φY (x) = φ(xy) for any value y Ω(Y ) ;

16 Place each of the φY (resp. ψY ) to the set ΦZ (resp. ΨZ) of the highest bucket BZ corresponding to a variable Z in sc(φY ) (resp. sc(ψY )); If Y is the last variable then add φY to Φ0 and ψY to Ψ0;

17 Let E = OPT((

18 foreach variable Y in reversed ordering τ = (Yt, . . . , Y1) do

19 if Y D is a decision variable then

21 Let S = sc(ψ) \ {Y }, δ : Ω(S) 2Ω(Y );

22 foreach x Ω(S) do

23 y = {y|ψ(xy) OPT( y Ω(Y ) ψ(xy))};

24 δ (x) = y ;

25 ΔY = {δY |δY (x) = y, x Ω(S), y δ (x)}

26 Let Δ = {(δ1, . . . , δm)| δY ΔY , Y D};

27 return (E, Δ )

For a real valued function f deﬁned over a set of variables S, f : Ω(S) IR, and a variable X S, the function ( X f) is deﬁned over U = S \ {X} as follows. For every u Ω(U), ( X f)(u) = x Ω(X) f(u, x).

Similarly, for g : Ω(S) 2IR p, the functions (

X g) are deﬁned as: for every u Ω(U), (

X g)(u) = OPT(

x Ω(X) g(u, x)), and (

X g)(u) = OPT(

x Ω(X) g(u, x)) (here

x denotes the repeated application of the + operator deﬁned for sets of utility vectors).

Algorithm MOVE Algorithm 1 describes our variable elimination procedure, called MOVE, for solving MOIDs. It takes as input an opti-

mality operator OPT (e.g., PO), a legal elimination ordering τ = Y1, . . . , Yt of the variables, and partitions the input functions into a set of buckets, such that each bucket BY is labeled by a single variable Y and is associated with sets ΦY and ΨY containing all input probability and utility functions whose highest variable in their scope is Y . MOVE processes each bucket, top-down from the ﬁrst to the last, by a variable elimination procedure that computes new probability (denoted by φ) and utility (denoted by ψ) components which are then placed in corresponding buckets (lines 6-16). For a chance variable Y , the φY component is generated by multiplying all probability functions in that bucket and eliminating Y by summation (line 8). The ψY component is computed as the average utility in that bucket, normalized by the bucket s compiled φ (line 10). Most importantly, the operator

Y uses OPT during elimination to ensure that only OPT-optimal utility values are kept in ψY . For a decision variable Y , we compute the ψY component by summing all the utility functions in that bucket and eliminating Y by operator

Y which prunes away non OPToptimal utility values (line 15). Each of the probability functions in this bucket is a constant when viewed as a function of the bucket s decision variable and therefore the compiled φY is a constant as well (lines 12-14) (Jensen, Jensen, and Dittmer 1994; Wilson and Marinescu 2012). The set E of OPT-optimal expected utility values is obtained after eliminating the last variable, by combining all remaining constant utility and probability functions. In a second, bottom-up step, MOVE generates all policies that produce expected utility values in set E (i.e., OPToptimal policies). The decision buckets are processed in reversed order, from the last to the ﬁrst. Let Y be the current decision variable. The algorithm computes the set ΔY of OPT-optimal decision rules associated with Y as follows. It ﬁrst combines all utility and probability functions residing in bucket BY into a temporary function ψ which is deﬁned on Y and its parent-set pa(Y ). Then, for each conﬁguration x of the parent-set pa(Y ), it records the subset of domain values y Ω(Y ) that correspond to OPT-optimal values of ψ (lines 21-24). The sets y are used to generate the decision rules δY ΔY , by δY (x) = y, for each conﬁguration x Ω(pa(Y )) and y y (line 25). Finally, the algorithm enumerates all OPT-optimal policies by selecting iteratively for each decision variable Y an OPT-optimal decision rule δY from the corresponding set ΔY (line 26).

Complexity Based on previous work (Dechter 2000; Marinescu, Razak, and Wilson 2012), the complexity of MOVE can be bounded (time and space) by O(K N kw ), where N is the total number of variables, k bounds the domain size, and w is the induced width of the legal elimination ordering. Since the utility values are vectors in IRp, it is not easy to bound K, the size of the set of OPT-optimal expected utility values.

Correctness of the MOVE Computation We will prove the correctness of the variable elimination algorithm, showing that the set of expected utility values computed is equivalent to the set of OPT-optimal utility values,

where OPT is an alternative optimality operator.

Algorithm MOVE(I): MOVE without Pruning We ﬁrst consider the version MOVE(I) of the algorithm which uses the identity optimality operator I given by I(A) = A, for all A. This is equivalent to removing the OPT operator from the marginalization operators

X, and lines 17 and 23 of Algorithm 1. Given an MOID M, let HM be the set of all values of (expected) utility generated by all possible policies, so that u is in HM if and only if there exists some policy with expected utility of u. The set of possibly optimal values of utility is equal to POΘ(HM). The set of undominated values of utility is equal to CSDΘ(HM), which equals max Θ(HM). The algorithm in (Marinescu, Razak, and Wilson 2012) computes CSDΘ(HM) up to -equivalence. The following result follows directly from Theorem 4 and Proposition 5 of (Wilson and Marinescu 2012) (using equalling =, since then in Theorem 4 equals convex closure equivalence).

THEOREM 1 Let G be the output of MOVE(I). Then, G C HM.

Using Proposition 3, this immediately implies the following corollary, stating that if we apply the PO operator to the result of the no-pruning algorithm, then we obtain, up to convex-closure-based equivalence, the set of possibly optimal values of utility.

COROLLARY 1 Let G be the output of MOVE(I). Then PO(G) C PO(HM).

However, using the no-pruning algorithm to generate G ﬁrst will tend to be very computationally expensive. Instead one would like to be able to apply the PO operator at intermediate stages of the algorithm. The algorithm involves applying the operations of union, addition and scalar multiplication to sets of utility vectors. The fact that the PO operator satisﬁes Union Decomposition, Additive Decomposition and Scaling implies that, if we apply PO to sets of utility vectors generated during the algorithm, the result is equal to PO(G). This leads to the theorem below, which shows that MOVE(PO) computes PO(HM), the set of possibly optimal utility vectors, up to convex closure equivalence. A related result holds for MOVE(POCSD) (see (Marinescu, Razak, and Wilson 2017) for details).

THEOREM 2 Let F be the set of utility vectors generated by MOVE(PO). Then F C PO(HM), i.e., F is convexclosure-equivalent to the set of possibly optimal utility vectors.

Experiments We evaluate empirically the performance of the proposed variable elimination algorithms on random multi-objective inﬂuence diagrams with tradeoffs. The problem instances were generated using the random problem generator from (Marinescu, Razak, and Wilson 2012) using 2, 3 and 5 objectives, respectively. We consider algorithms MOVE(PO) and MOVE(POCSD), which compute sets of PO and

size w MOVE(PARETO) MOVE(CSD) MOVE(PO) (C,D,O) # time avg stdev med # time avg stdev med # time avg stdev med

K = 1; T = 0 (15,5,2) 9 9 139.58 2,714 2,864 1,673 100 0.1 22 93 1 100 0.21 3 6 1 (25,5,2) 11 7 18.25 2,344 2,614 269 97 35.87 75 296 1 100 0.29 4 7 1 (35,5,2) 14 2 283.29 9,115 8,934 9,024 96 52.34 105 537 1 100 15.09 29 122 1 (45,5,2) 16 2 397.72 7,596 7,536 7,566 90 133.68 159 561 1 100 9.40 13 31 1 (55,5,2) 18 4 717.68 10,422 5,851 14,896 94 98.68 136 537 1 97 45.19 8 21 1 K = 2; T = 1 (15,5,3) 9 6 10.69 4,889 4,069 5,830 98 25.85 224 824 1 98 30.37 52 232 1 (25,5,3) 11 2 4.42 4,000 3,938 3,969 84 216.88 582 2170 1 85 197.31 149 699 1 (35,5,3) 14 0 95 73.81 203 623 1 99 17.47 15 33 2 (45,5,3) 16 2 242.68 15,431 1,729 8,580 78 270.45 266 989 4 86 181.96 23 55 4 (55,5,3) 18 0 77 243.52 73 334 2 85 196.51 13 35 2 K = 6; T = 3 (15,5,5) 9 3 65.73 15,062 12,229 14,741 98 24.13 36 143 2 97 36.56 8 25 2 (25,5,5) 11 1 50.35 21,074 0 21,074 95 63.40 155 659 4 97 42.05 14 34 4 (35,5,5) 14 0 88 152.95 299 1499 2 92 118.76 29 92 2 (45,5,5) 16 0 79 217.52 230 996 2 88 150.42 23 47 3 (55,5,5) 18 0 82 245.84 191 569 2 93 107.75 22 44 3 K = 1; T = 0 (15,10,2) 12 5 91.82 6,856 8,280 3,175 88 154.35 304 975 1 89 182.40 207 848 1 (25,10,2) 17 1 808.94 4,964 0 4,964 85 220.18 15 49 1 91 138.48 26 149 1 (35,10,2) 20 0 61 510.36 154 721 1 82 338.56 24 65 1 (45,10,2) 22 0 20 816.32 287 653 6 22 783.43 10 17 3 (55,10,2) 26 0 7 1004.2 45 87 1 6 1028.83 16 18 7 K = 2; T = 1 (15,10,3) 12 0 99 12.52 17 113 2 99 15.09 3 5 2 (25,10,3) 17 0 49 627.47 146 936 1 54 576.97 10 27 1 (35,10,3) 20 0 30 823.64 521 1597 2 39 747.22 46 138 2 (45,10,3) 22 0 20 962.45 46 165 3 23 636.289 60 131 4 (55,10,3) 26 0 0 9 841.95 34 52 5 K = 6; T = 3 (15,10,5) 12 0 99 17.37 1340 868 3 97 42.83 19 107 2 (25,10,5) 17 0 65 415.73 222 1362 1 70 412.24 26 94 1 (35,10,5) 20 0 33 732.06 553 1137 16 42 728.14 40 62 17 (45,10,5) 22 0 19 468.08 7 14 1 22 705.79 3 3 1 (55,10,5) 26 0 0 1 1195.74 97 0 97

Table 1: Results comparing algorithms MOVE(PARETO), MOVE(CSD), and MOVE(PO). Time limit 20 minutes.

POCSD expected utility values. We also ran MOVE(CSD) for computing sets of CSD expected utility values. We generated consistent random tradeoffs between the objectives of a given problem instance using the approach from (Marinescu, Razak, and Wilson 2012). The random tradeoffs are mainly characterized by the parameters K and T, where K is the number of pairwise or binary tradeoffs and T is the number of 3-way tradeoffs, respectively. For reference, we also ran MOVE(PARETO) for computing Pareto optimal sets without any tradeoffs (Marinescu, Razak, and Wilson 2012). All algorithms were written in C++ and the experiments were run on a 2.6GHz processor with 4GB of RAM.

Comparison between the Pareto, CSDs and POs Table 1 compares cardinalities of the sets of Pareto, CSD and PO expected utility values for random inﬂuence diagrams with 2, 3 and 5 objectives, respectively. The tradeoff instances were generated with (K = 1, T = 0) for 2 objectives, (K = 2, T = 1) for 3 objectives, and (K = 6, T = 3) for 5 objectives, respectively. For each problem size (characterized by the number of chance variables C, number of decision variables D, number of objectives O), we generated 100 random instances. We report the average cardinality of these expected utility sets (together with the standard deviation and median), the CPU time in seconds, as well as the number of problem instances solved. All algorithms were allotted a 20 minute time limit per problem instance. An empty cell in the table denotes that the respective algorithm could not solve any of the problem instances. For computing CSDs we take the approach from (Marinescu, Razak, and Wilson 2012), where we replaced the faster dominance checks between the multi-objective utility values using the

Figure 2: Number of solved instances as a function of pairwise tradeoffs. Time limit 20 minutes.

matrix multiplication presented in (Marinescu, Razak, and Wilson 2013). We can see that the POs become much smaller than the corresponding CSDs, especially when the difﬁculty of the problems increases. For example, for problems with 5 objectives, 25 chance variables and 5 decision, POs are more than ten times smaller than CSDs. In addition, MOVE(PO) is able to solve slightly more problem instances than MOVE(CSD). Although applying the PO operator is relatively expensive, because of the repeated checking of the optimality condition using a linear programming solver, the algorithm for computing POs is faster than the algorithm for computing CSDs, which implies that the former prunes more effectively than the latter. We also observed that MOVE(PO) and MOVE(POCSD) performed very similarly, in many cases solving the same number of instances with exactly the same cardinality of the corresponding sets, and therefore, the computed set PO being a subset of the computed set CSD. It may well be the case that the randomly generated tradeoffs lead to the set PO almost always being a subset of CSD.

Impact of Increasing the Number of Tradeoffs Figure 2 shows the impact of the number of pairwise tradeoffs K for random inﬂuence diagram instances with 5 objectives, 25 chance variables and 5 decision variables. We ﬁxed the number of 3-way tradeoffs to T = 3. Results were taken over 100 problem instances with a 20 minute time limit per instance. We can see that, as K increases, the PO and POCSD algorithms solve more problems and consequently the running time decreases substantially.

Related Work Our notion of possible optimality is related to the convex coverage sets introduced by (Roijers, Whiteson, and Oliehoek 2013; 2014) in the context of multi-objective coordination graphs. The algorithms proposed therein are also based on variable elimination and therefore time and space exponential in the induced widths of the underlying graphs. More recently, (Benabbou and Perny 2015) proposed a multi-objective A* search approach for computing possibly optimal solutions in the context of multi-objective combinatorial optimization. An incremental elicitation mechanism is also embedded into the search so as to reduce

the set of admissible scenarios until a nearly-optimal solution is found. For soft constraint problems with missing preferences, (Gelain et al. 2010) describes a branch and bound search algorithm augmented with a preference elicitation scheme for computing possibly optimal solutions. Similarly, in computational social choice (Xia and Conitzer 2011) presents methods to determine possible and necessary winners when incomplete preferences are available. The work that is closest in spirit with ours is that by (Roijers, Whiteson, and Oliehoek 2015) who introduce the notion of optimistic linear support for multi-objective POMDPs. The basic idea is to solve a series of scalarized single-objective POMDPs, each corresponding to a different weighting of the objectives. The output is a set of policies that are optimal for some weighting (or scenario), which is very similar to our approach. However, their solving method is an iterative one which looks at different weightings one at a time (reusing results from previous iterations), whereas ours is based on dynamic programming and uses a linear programming formulation for checking possibly optimal solutions.

Conclusion We introduced a variable elimination algorithm for computing the set of OPT-optimal values of expected utility (and the corresponding decision policies) for multi-objective inﬂuence diagrams with tradeoffs, where OPT is an optimality operator. We also gave the key formal properties and proved the correctness of the computation. Our experimental results indicate that computing PO and POCSD sets of expected utility values is generally faster and can scale up to more difﬁcult problems involving more decisions and objectives. Moreover, the PO/POCSD sets are often considerably smaller than the corresponding CSD ones, thus yielding fewer policies. This is important because the decision maker can be presented with the PO/POCSD policies ﬁrst before making any decision or eliciting more tradeoffs.

Acknowledgements This work was supported in part by the Science Foundation Ireland grant SFI/12/RC/2289. We are grateful to the reviewers for their helpful comments.

Aizerman, M., and Malishevski, A. 1981. General theory of best variants choice: Some aspects. IEEE Transactions on Automatic Control 26:1030 1040. Benabbou, N., and Perny, P. 2015. Incremental weight elicitation for multiobjective state space search. In Proc. AAAI-2015, 1093 1099. Dechter, R. 2000. A new perspective on algorithms for optimizing policies under uncertainty. In Artiﬁcial Intelligence Planning Systems (AIPS), 72 81. Ehrgott, M. 1999. Multicriteria optimization. Springer. Figueira, J.; Greco, S.; and Ehrgott, M. 2005. Multiple Criteria Decision Analysis State of the Art Surveys. Springer International Series in Operations Research and Management Science Volume 76.

Gelain, M.; Pini, M. S.; Rossi, F.; Venable, K. B.; and Walsh, T. 2010. Elicitation strategies for soft constraint problems with missing preferences: Properties, algorithms and experimental studies. Artif. Intell. 174(3-4):270 294. Howard, R., and Matheson, J. 1984. Inﬂuence diagrams. In Readings on the Principles and Applications of Decision Analyis, 721 762. Jensen, F.; Jensen, V.; and Dittmer, S. 1994. From inﬂuence diagrams to junction trees. In Proc. UAI-94, 367 363. Keeney, R., and Raiffa, H. 1993. Decisions with multiple objectives: preferences and value tradeoffs. Cambridge University Press. Marinescu, R.; Razak, A.; and Wilson, N. 2012. Multi-objective inﬂuence diagrams. In Proc. UAI-2012, 574 583. Marinescu, R.; Razak, A.; and Wilson, N. 2013. Multi-objective constraint optimization with tradeoffs. In Proc. CP-2013, 497 512. Marinescu, R.; Razak, A.; and Wilson, N. 2017. Multiobjective Inﬂuence Diagrams with Possibly Optimal Policies (extended version available online). http://ucc.insightcentre.org/nwilson/POIDAAAI17longer.pdf. Moulin, H. 1985. Choice functions over a ﬁnite set: a summary. Social Choice and Welfare 2(2):147 160. Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann. Plott, C. R. 1973. Path independence, rationality, and social choice. Econometrica 41:1075 1091. Raiffa, H. 1968. Decision analysis. Addison-Wesley. Roijers, D.; Whiteson, S.; and Oliehoek, F. 2013. Computing convex coverage sets for multi-objective coordination graphs. In Proc. ADT-2013, 309 323. Roijers, D.; Whiteson, S.; and Oliehoek, F. 2014. Linear support for multi-objective coordination graphs. In Proc. AAMAS-2014, 1297 1304. Roijers, D.; Whiteson, S.; and Oliehoek, F. 2015. Point-based planning for multi-objective POMDPs. In Proc. IJCAI-2015, 1666 1672. Roy, B. 1996. Multicriteria methodology for decision analysis. Kluwer Academic Press. White, C.; Sage, A. P.; and Dozono, S. 1984. A model of multiattribute decision-making and trade-off weight determination under uncertainty. IEEE Transactions on Systems, Man, and Cybernetics 14(2):223 229. Wilson, N., and Marinescu, R. 2012. An axiomatic framework for inﬂuence diagram computation with partially ordered utilities. In Proc. KR-2012, 210 220. Wilson, N., and O Mahony, C. 2011. The relationships between qualitative notions of optimality for decision making under logical uncertainty. In Proc. AICS-2011. Wilson, N.; Razak, A.; and Marinescu, R. 2015a. Computing possibly optimal solutions for multi-objective constraint optimisation with tradeoffs. In Proc. IJCAI-2015. Wilson, N.; Razak, A.; and Marinescu, R. 2015b. Computing Possibly Optimal Solutions for Multi-Objective Constraint Optimisation with Tradeoffs (extended version of IJCAI-15 paper available online). http://ucc.insightcentre.org/nwilson/POIJCAI2015longer.pdf. Xia, L., and Conitzer, V. 2011. Determining possible and necessary winners under common voting rules given partial orders. Journal of Artiﬁcial Intelligence Research 41(1):25 67.