Decomposition Strategies for Constructive Preference Elicitation

Paolo Dragone, University of Trento, Italy and TIM-SKIL, Trento, Italy (paolo.dragone@unitn.it)
Stefano Teso, KU Leuven, Belgium (stefano.teso@cs.kuleuven.be)
Mohit Kumar, KU Leuven, Belgium (mohit.kumar@cs.kuleuven.be)
Andrea Passerini, University of Trento, Italy (andrea.passerini@unitn.it)

PD is a fellow of TIM-SKIL Trento and is supported by a TIM scholarship. ST and MK are supported by the ERC Advanced Grant SYNTH "Synthesising inductive data models". Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

We tackle the problem of constructive preference elicitation, that is, the problem of learning user preferences over very large decision problems involving a combinatorial space of possible outcomes. In this setting, the suggested configuration is synthesized on-the-fly by solving a constrained optimization problem, while the preferences are learned iteratively by interacting with the user. Previous work has shown that Coactive Learning is a suitable method for learning user preferences in constructive scenarios. In Coactive Learning the user provides feedback to the algorithm in the form of an improvement to a suggested configuration. When the problem involves many decision variables and constraints, this type of interaction poses a significant cognitive burden on the user. We propose a decomposition technique for large preference-based decision problems relying exclusively on inference and feedback over partial configurations. This has the clear advantage of drastically reducing the user cognitive load. Additionally, part-wise inference can be (up to exponentially) less computationally demanding than inference over full configurations. We discuss the theoretical implications of working with parts and present promising empirical results on one synthetic and two realistic constructive problems.

Introduction

In constructive preference elicitation (CPE) the recommender aims at suggesting a custom or novel product to a customer (Teso, Passerini, and Viappiani 2016). The product is assembled on-the-fly from components or synthesized anew by solving a combinatorial optimization problem. The suggested products should of course satisfy the customer's preferences, which however are unobserved and must be learned interactively (Pigozzi, Tsoukiàs, and Viappiani 2016). Learning proceeds iteratively: the learner presents one or more candidate recommendations to the customer, and employs the obtained feedback to estimate the customer's preferences. Applications include recommending custom PCs or cars, suggesting touristic travel plans, designing room and building layouts, and producing recipe modifications, among others.

A major weakness of existing CPE methods (Teso, Passerini, and Viappiani 2016; Teso, Dragone, and Passerini 2017) is that they require the user to provide feedback on complete configurations. In real-world constructive problems such as trip planning and layout design, configurations can be large and complex. When asked to evaluate or manipulate a complex product, the user may become overwhelmed and confused, compromising the reliability of the obtained feedback (Mayer and Moreno 2003). Human decision makers can revert to a potentially uninformative prior when problem solving exceeds their available resources.
This effect was observed in users tasked with solving simple SAT instances (three variables and eight clauses) (Ortega and Stocker 2016). In comparison, even simple constructive problems can involve tens of categorical variables and features, in addition to hard feasibility constraints. On the computational side, working with complete configurations poses scalability problems as well. The reason is that, in order to select recommendations and queries, constructive recommenders employ constraint optimization techniques. Clearly, optimization of complete configurations in large constructive problems can become computationally impractical as the problem size increases.

Here we propose to exploit factorized utility functions (Braziunas and Boutilier 2009), which occur very naturally in constructive problems, to work with partial configurations. In particular, we show how to generalize Coactive Learning (CL) (Shivaswamy and Joachims 2015) to part-wise inference and learning. CL is a simple, theoretically grounded algorithm for online learning and preference elicitation. It employs a very natural interaction protocol: at each iteration the user is presented with a single, appropriately chosen candidate configuration and asked to improve it (even slightly). In (Teso, Dragone, and Passerini 2017), it was shown that CL can be lifted to constructive problems by combining it with a constraint optimization solver to efficiently select the candidate recommendation. Notably, the theoretical guarantees of CL remain intact in the constructive case. Our part-wise generalization of CL, dubbed PCL, solves the two aforementioned problems in one go: (i) by presenting the user with partial configurations, it is possible to (substantially) lessen her cognitive load, improving the reliability of the feedback and enabling learning in larger constructive tasks; (ii) in combinatorial constructive problems, performing inference on partial configurations can be exponentially faster than on complete ones. Further, despite being limited to working with partial configurations, PCL can be shown to still provide local optimality guarantees in theory, and to perform well in practice.

This paper is structured as follows. In the next section we overview the relevant literature. We present PCL in the Method section, followed by a theoretical analysis. The performance of PCL is then illustrated empirically on one synthetic and two realistic constructive problems. We close the paper with some concluding remarks.

Related Work

Generalized additive independent (GAI) utilities have been thoroughly explored in the decision making literature (Fishburn 1967). They define a clear factorization mechanism, and offer a good trade-off between expressiveness and ease of elicitation (Chajewska, Koller, and Parr 2000; Gonzales and Perny 2004; Braziunas and Boutilier 2009). Most of the early work on GAI utility elicitation is based on graphical models, e.g. UCP and GAI networks (Gonzales and Perny 2004; Boutilier, Bacchus, and Brafman 2001). These approaches aim at eliciting the full utility function and rely on the comparison of full outcomes. Both of these are infeasible when the utility involves many attributes and features, as in realistic constructive problems. Like our method, more recent alternatives (Braziunas and Boutilier 2005; 2007) handle both partial elicitation, i.e.
the ability of providing recommendations without full utility information, and local queries, i.e. elicitation of preference information by comparing only (mostly) partial outcomes. There exist both Bayesian (Braziunas and Boutilier 2005) and regret-based (Braziunas and Boutilier 2007; Boutilier et al. 2006) approaches, which have different shortcomings. Bayesian methods do not scale to even small size constructive problems (Teso, Passerini, and Viappiani 2016), such as those occurring when reasoning over individual parts in constructive settings. On the other hand, regret-based methods require the user feedback to be strictly self-consistent, an unrealistic assumption when interacting with non-experts. Our approach instead is specifically designed to scale to larger constructive problems and, being derived from Coactive Learning, natively handles inconsistent feedback. Crucially, unlike PCL, these local elicitation methods also require a number of queries over complete configurations to calibrate the learned utility function. In larger constructive domains this is both impractical (on the user side) and computationally infeasible (on the learner side).

Our work is based on Coactive Learning (CL) (Shivaswamy and Joachims 2015), a framework for learning utility functions over structured domains, which has been successfully applied to CPE (Teso, Dragone, and Passerini 2017; Dragone et al. 2016). When applied to constructive problems, a crucial limitation of CL is that the learner and the user interact by exchanging complete configurations. Alas, inferring a full configuration in a constructive problem can be computationally demanding, thus preventing the elicitation procedure from being real-time. This can be partially addressed by performing approximate inference, as in (Raman, Shivaswamy, and Joachims 2012), at the cost of weaker learning guarantees. A different approach has been taken in (Goetschalckx, Fern, and Tadepalli 2014), where the exchanged (complete) configurations are only required to be locally optimal, for improved efficiency. Like PCL, this method guarantees the local optimality of the recommended configuration. All of the previous approaches, however, require the user to improve a potentially large complete configuration. This is a cognitively demanding task which can become prohibitive in large constructive problems, even for domain experts, thus hindering feedback quality and effective elicitation. By dealing with parts only, PCL avoids this issue entirely.

Method

Notation. We use rather standard notation: scalars $x$ are written in italics and column vectors $\mathbf{x}$ in bold. The inner product of two vectors is indicated as $\langle \mathbf{a}, \mathbf{b} \rangle = \sum_i a_i b_i$, the Euclidean norm as $\|\mathbf{a}\| = \sqrt{\langle \mathbf{a}, \mathbf{a} \rangle}$ and the max-norm as $\|\mathbf{a}\|_\infty = \max_i |a_i|$. We abbreviate $\{1, \ldots, n\}$ to $[n]$, and indicate the complement of $I \subseteq [n]$ as $\bar{I} := [n] \setminus I$.

Setting. Let $\mathcal{X}$ be the set of candidate structured products. Contrary to what happens in standard preference elicitation, in the constructive case $\mathcal{X}$ is defined by a set of hard constraints rather than explicitly enumerated (in this paper, hard constraints refer to the constraints delimiting the space of feasible configurations, as opposed to soft constraints, which determine preferences over feasible configurations (Meseguer, Rossi, and Schiex 2006)). Products are represented by a function $\phi(\cdot)$ that maps them to an $m$-dimensional feature space. While the feature map can be arbitrary, in practice we will stick to features that can be encoded as constraints in a mixed-integer linear programming problem, for efficiency; see the Empirical Analysis section for details. We only assume that the features are bounded, i.e. for all $x \in \mathcal{X}$ it holds that $\|\phi(x)\|_\infty \le D$ for some fixed $D$.
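As a minimal illustration of this setting, the following sketch enumerates a toy space $\mathcal{X}$ carved out by a single hard constraint and checks that a toy feature map is bounded. The variables, constraint and features are invented for illustration only; in a real constructive task $\mathcal{X}$ and $\phi$ would be encoded as a MILP rather than enumerated. The linear utility defined over these features is introduced next.

```python
import itertools
import numpy as np

def feasible():
    """Toy constructive space: 4 binary variables, at most two set to 1.

    In a real constructive problem this space would be described by hard
    constraints inside a MILP model, not enumerated explicitly.
    """
    for x in itertools.product([0, 1], repeat=4):
        if sum(x) <= 2:                  # hard feasibility constraint
            yield np.array(x, dtype=float)

def phi(x):
    """Toy feature map phi: X -> R^5 (the variables plus one interaction)."""
    interaction = 1.0 if x[0] != x[1] else -1.0
    return np.concatenate([x, [interaction]])

# Boundedness assumption: ||phi(x)||_inf <= D for every feasible x.
D = max(np.linalg.norm(phi(x), ord=np.inf) for x in feasible())
print(D)  # 1.0 for this toy map
```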
As is common in multi-attribute decision theory (Keeney and Raiffa 1976), we assume the desirability of a product $x$ to be given by a utility function $u^* : \mathcal{X} \to \mathbb{R}$ that is linear in the features, i.e., $u^*(x) := \langle w^*, \phi(x) \rangle$. Here the weights $w^* \in \mathbb{R}^m$ encode the true user preferences, and may be positive, negative, or zero (which means that the corresponding feature is irrelevant for the user). Utilities of this kind can naturally express rich constructive problems (Teso, Passerini, and Viappiani 2016; Teso, Dragone, and Passerini 2017).

Parts. Here we formalize what parts and partial configurations are, and how they can be manipulated. We assume to be given a set of $n$ basic parts $p \in \mathcal{P}$. A part is any subset $P \subseteq \mathcal{P}$ of the set of basic parts. Given a part $P$ and an object $x$, $x_P \in \mathcal{X}_P$ indicates the partial configuration corresponding to $P$. We require that the union of the basic parts reconstructs the whole object, i.e. $x_{\mathcal{P}} = x$ for all $x \in \mathcal{X}$. The proper semantics of the decomposition into basic parts is task-specific. For instance, in a scheduling problem a month may be decomposed into days, while in interior design a house may be decomposed into rooms. Analogously, the non-basic parts could then be weeks or floors, respectively. In general, any combination of basic parts is allowed. We capture the notion of combination of partial configurations with the part combination operator $\circ : \mathcal{X}_P \times \mathcal{X}_Q \to \mathcal{X}_{P \cup Q}$, so that $x_P \circ x_Q = x_{P \cup Q}$. We denote the complement of part $P$ as $\bar{P} = \mathcal{P} \setminus P$, which satisfies $x = x_P \circ x_{\bar{P}}$ for all $x \in \mathcal{X}$.

Each basic part $p \in \mathcal{P}$ is associated to a feature subset $I_p \subseteq [m]$, which contains all those features that depend on $p$ (and only those). In general, the sets $I_p$ may overlap, but we do require each basic part $p$ to be associated to some features that do not depend on any other basic part $q$, i.e. that $I_p \setminus \bigcup_{q \neq p} I_q \neq \emptyset$ for all $p \in \mathcal{P}$. The features associated to a part $P \subseteq \mathcal{P}$ are defined as $I_P := \bigcup_{p \in P} I_p$. Since the union of the basic parts makes up the full object, we also have that $\bigcup_{p \in \mathcal{P}} I_p = [m]$.

GAI utility decomposition. In the previous section we introduced a decomposition of configurations into parts. In order to elicit the user preferences via part-wise interaction, which is our ultimate goal, we need to decompose the utility function as well. Given a part $P$ and its feature subset $I_P$, let its partial utility be:

$$u[I_P](x) := \langle w_{I_P}, \phi_{I_P}(x) \rangle = \sum_{i \in I_P} w_i \phi_i(x) \quad (1)$$

If the basic parts have no shared features, the utility function is additive: it is easy to verify that $u(x) = \sum_{p \in \mathcal{P}} u[I_p](x)$. In this case, each part can be managed independently of the others, and the overall configuration maximizing the utility can be obtained by separately maximizing each partial utility and combining the resulting part-wise configurations. However, in many applications of interest the feature subsets do overlap. In a travel plan, for instance, one can be interested in alternating cultural and leisure activities in consecutive days, in order to make the experience more diverse and enjoyable. In this case, the above decomposition does not apply anymore as the basic parts may depend on each other through the shared features. Nonetheless, it can be shown that our utility function is generalized additive independent (GAI) over the feature subsets $I_p$ of the basic parts.
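Before formalizing the GAI case, here is a concrete illustration of Eq. 1 with toy weights, a toy feature map, and two hypothetical basic parts "a" and "b" (all invented for this sketch): it computes partial utilities over feature index sets and checks that, when the subsets are disjoint, they sum to the full utility.

```python
import numpy as np

m = 6
w = np.array([0.5, -1.0, 2.0, 0.3, 0.0, 1.5])        # toy user weights w in R^m

def phi(x):
    """Toy feature map for a configuration made of two basic parts."""
    a, b = x                                          # partial configurations x_a, x_b
    return np.array([a[0], a[1], a[0] * a[1],         # features depending on part a
                     b[0], b[1], b[0] * b[1]])        # features depending on part b

I = {"a": [0, 1, 2], "b": [3, 4, 5]}                  # disjoint feature subsets I_p

def u(x, idx=None):
    """Partial utility u[I](x) = <w_I, phi_I(x)>; full utility when idx is None."""
    f = phi(x)
    if idx is None:
        return float(w @ f)
    return float(w[idx] @ f[idx])

x = ((1, 0), (1, 1))
# With disjoint feature subsets the utility is additive over the basic parts.
assert abs(u(x) - (u(x, I["a"]) + u(x, I["b"]))) < 1e-9
```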
Formally, a utility $u(x)$ is GAI if and only if, given $n$ feature subsets $I_1, \ldots, I_n \subseteq [m]$, it can be decomposed into $n$ independent subutilities (Braziunas and Boutilier 2005):

$$u(x) = \bar{u}_{I_1}(x) + \cdots + \bar{u}_{I_n}(x) \quad (2)$$

where each subutility $\bar{u}_{I_k}$ can only depend on the features in $I_k$ (but does not need to depend on all of them). This decomposition enables applying ideas from the GAI literature to produce a well-defined part-wise elicitation protocol. Intuitively, we will assign features to subutilities so that whenever a feature is shared by multiple parts, only the subutility corresponding to one of them will depend on that feature.

Algorithm 1 An example of ordering selection procedure using a GAI network (Gonzales and Perny 2004).
1: procedure SELECTORDERING($\mathcal{P}$)
2:   Build a GAI network $G$ from $I_p$, $p \in \mathcal{P}$
3:   Produce sequence $p_1, \ldots, p_n$ by sorting the nodes
4:     in $G$ in ascending order of node degree
5:   return $p_1, \ldots, p_n$

We will now construct a suitable decomposition of $u(x)$ into $n$ independent subutilities. Fix some order of the basic parts $p_1, \ldots, p_n$, and let:

$$J_k := I_k \setminus \Big( \bigcup_{j=k+1}^{n} I_j \Big) \quad (3)$$

for all $k \in [n]$. We define the subutilities as $\bar{u}_{I_k}(x) := u[J_k](x)$ for all $k \in [n]$. By summing up the subutilities for all parts, we obtain a utility where each feature is computed exactly once, thus recovering the full utility $u(x)$:

$$u(x) = \sum_{k=1}^{n} \bar{u}_{I_k}(x) = \sum_{k=1}^{n} u\Big[I_k \setminus \bigcup_{j=k+1}^{n} I_j\Big](x) = \sum_{i \in [m]} w_i \phi_i(x)$$

The GAI decomposition allows us to elicit each subutility $\bar{u}_{I_k}$ separately. By doing so, however, we end up ignoring some of the dependencies between parts, namely the features in $I_k \setminus J_k$. This is the price to pay in order to achieve decomposition and part-wise elicitation, and it may lead to suboptimal solutions if too many dependencies are ignored. It is therefore important to minimize the broken dependencies by an appropriate ordering of the parts. Going back to the travel planning example with diversifying features, consider a multi-day trip. Here the parts may refer to individual days, and $I_k$ includes all features of day $k$, including the features relating it to the other days, e.g. the alternation of cultural and leisure activities. Note that the $I_k$'s overlap. On the other hand, the $J_k$'s are subsets of features chosen so that every feature only appears once. A diversifying feature relating days 3 and 4 of the trip is either assigned to $J_3$ or $J_4$, but not both.

One way to control the ignored dependencies is by leveraging GAI networks (Gonzales and Perny 2004). A GAI network is a graph whose nodes represent the subsets $I_k$ and whose edges connect nodes sharing at least one feature. Algorithm 1 presents a simple and effective solution to provide an ordering. It builds a GAI network from $\mathcal{P}$ and sorts the basic parts in ascending order of node degree (number of incoming and outgoing edges). By ordering last the subsets having intersections with many other parts, this ordering attempts to minimize the lost dependencies in the above decomposition (Eq. 3). This is one possible way to order the parts, which we use as an example; more informed or task-specific approaches could be devised.

The PCL algorithm. The pseudocode of our algorithm, PCL, is listed in Algorithm 2.

Algorithm 2 The PCL algorithm.
1: procedure PCL($\mathcal{P}$, $T$)
2:   $p_1, \ldots, p_n \leftarrow$ SELECTORDERING($\mathcal{P}$)
3:   $w^1 \leftarrow \mathbf{0}$, $x^1 \leftarrow$ initial configuration
4:   for $t = 1, \ldots, T$ do
5:     $p_t \leftarrow$ SELECTPART($\mathcal{P}$)
6:     $x^t_{p_t} \leftarrow \mathrm{argmax}_{x_{p_t} \in \mathcal{X}_{p_t}} \; u^t[J_t](x_{p_t} \circ x^{t-1}_{\bar{p}_t})$
7:     $x^t \leftarrow x^t_{p_t} \circ x^{t-1}_{\bar{p}_t}$
8:     User provides improvement $\hat{x}^t_{p_t}$ of $x^t_{p_t}$, keeping $x^t_{\bar{p}_t}$ fixed
9:     if $u^t[I_t](\hat{x}^t_{p_t} \circ x^t_{\bar{p}_t}) - u^t[I_t](x^t_{p_t} \circ x^t_{\bar{p}_t}) \le 0$ then $Q_t \leftarrow I_t$ else $Q_t \leftarrow J_t$
10:    $w^{t+1}_{\bar{Q}_t} \leftarrow w^t_{\bar{Q}_t}$
11:    $w^{t+1}_{Q_t} \leftarrow w^t_{Q_t} + \phi_{Q_t}(\hat{x}^t_{p_t} \circ x^t_{\bar{p}_t}) - \phi_{Q_t}(x^t_{p_t} \circ x^t_{\bar{p}_t})$

PCL starts off by sorting the basic parts, producing an ordering $p_1, \ldots, p_n$. Algorithm 1 could be employed, or any other (e.g. task-specific) sorting solution. Then it loops for $T$ iterations, maintaining an estimate $w^t$ of the user weights as well as a complete configuration $x^t$. The starting configuration $x^1$ should be a reasonable initial guess, depending on the task. At each iteration $t \in [T]$, the algorithm selects a part $p_t$ using the procedure SELECTPART (see below). Then it updates the object $x^t$ by inferring a new partial configuration $x^t_{p_t}$ while keeping the rest of $x^t$ fixed, that is $x^{t-1}_{\bar{p}_t}$. The inferred partial configuration $x^t_{p_t}$ is optimal with respect to the local subutility $\bar{u}^t_{I_t}(\cdot)$ given $x^{t-1}_{\bar{p}_t}$. Note that inference is over the partial configuration $x^t_{p_t}$ only, and therefore can be exponentially faster than inference over full configurations. Next, the algorithm presents the inferred partial configuration $x^t_{p_t}$, as well as some contextual information (see below), to the user. The user is asked to produce an improved partial configuration $\hat{x}^t_{p_t}$ according to her own preferences, while the rest of the object $x^t_{\bar{p}_t}$ is kept fixed. We assume that a user is satisfied with a partial configuration $x^t_{p_t}$ if she cannot improve it further, or equivalently when the object $x^t$ is conditionally optimal with respect to part $p_t$ given the rest of the object $x^t_{\bar{p}_t}$ (the formal definition of conditional optimality is given in the Analysis section). When a user is satisfied with a partial configuration, she returns $\hat{x}^t_{p_t} = x^t_{p_t}$, thereby implying no change in the weights $w^{t+1}$.

After receiving an improvement, if the user is not satisfied, the weights are updated through a perceptron step. The subset $Q_t$ of weights that are actually updated depends on whether $u^t[I_t](\hat{x}^t_{p_t} \circ x^t_{\bar{p}_t}) - u^t[I_t](x^t_{p_t} \circ x^t_{\bar{p}_t})$ is negative or (strictly) positive. Since we perform inference on $u^t[J_t](\cdot)$, we have that $u^t[J_t](\hat{x}^t_{p_t} \circ x^t_{\bar{p}_t}) - u^t[J_t](x^t_{p_t} \circ x^t_{\bar{p}_t}) \le 0$. The user improvement can, however, potentially change all the features in $I_t$. Intuitively, the weights associated to a subset of features should change only if the utility computed on this subset ranks $\hat{x}^t_{p_t}$ lower than $x^t_{p_t}$. The algorithm therefore checks whether $u^t[I_t](\hat{x}^t_{p_t} \circ x^t_{\bar{p}_t}) \le u^t[I_t](x^t_{p_t} \circ x^t_{\bar{p}_t})$, in which case the weights associated to the whole subset $I_t$ are updated. If this condition is not met, instead, the algorithm can only safely update the weights associated to $J_t$, which, as said, meet this condition by construction.

As for the SELECTPART procedure, we experimented with several alternative implementations, including prioritizing parts with a large feature overlap ($|I_k \setminus J_k|$) and bandit-based strategies aimed at predicting a surrogate of the utility gain (namely, a variant of the UCB1 algorithm (Auer, Cesa-Bianchi, and Fischer 2002)). Preliminary experiments have shown that informed strategies do not yield a significant performance improvement over the random selection strategy; hence we stick with the latter in all our experiments. The algorithm stops either when the maximum number of iterations $T$ is reached or when a local optimum has been found.
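To make the update step concrete, the following minimal sketch applies one weight update in the spirit of lines 9-11 of Algorithm 2. The NumPy representation, the function name and the toy numbers are ours and not taken from the paper's released implementation; it only assumes that the feature vectors of the recommended and improved objects (the partial configurations completed with the fixed context) are available as arrays.

```python
import numpy as np

def pcl_update(w, phi_rec, phi_imp, I_t, J_t):
    """One PCL perceptron step (sketch of Algorithm 2, lines 9-11).

    w        -- current weight estimate, array of length m
    phi_rec  -- features of the recommended object x^t_{p_t} o x^t_{bar p_t}
    phi_imp  -- features of the user-improved object hat x^t_{p_t} o x^t_{bar p_t}
    I_t, J_t -- integer index arrays: the part's features and its exclusive subset
    """
    # If the current model ranks the improvement at or below the recommendation
    # on the full subset I_t, it is safe to update all of I_t; otherwise only
    # the subset J_t used for inference is updated.
    gain_It = w[I_t] @ (phi_imp[I_t] - phi_rec[I_t])
    Q_t = I_t if gain_It <= 0 else J_t
    w_next = w.copy()
    w_next[Q_t] += phi_imp[Q_t] - phi_rec[Q_t]        # perceptron step on Q_t only
    return w_next

# Toy usage with made-up numbers.
w = np.zeros(4)
I_t, J_t = np.array([0, 1, 2]), np.array([0, 1])
phi_rec = np.array([0.0, 1.0, -1.0, 0.5])
phi_imp = np.array([1.0, 1.0,  0.0, 0.5])
print(pcl_update(w, phi_rec, phi_imp, I_t, J_t))      # -> [1. 0. 1. 0.]
```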
For ease of exposition we left out the latter case from Algorithm 2, but we explain what a local optimum is in the following Analysis section; the stopping criterion will follow directly from Proposition 1.

Interacting through parts. In order for the user to judge the quality of a suggested partial configuration $x_p$, some contextual information may have to be provided. The reason is that, if $p$ depends on other parts via shared features, these have to be somehow communicated to the user, otherwise his/her improvement will not be sufficiently informed. We distinguish two cases, depending on whether the features of $p$ are local or global. Local features only depend on small, localized portions of $x$. This is for instance the case for features that measure the diversity of consecutive activities in a touristic trip plan, which depend on consecutive time slots or days only. Here the context amounts to those other portions of $x$ that share local features with $p$. For instance, the user may interact over individual days only. If the features are local, the context is simply the time slots before and after the selected day. The user is free to modify the activities scheduled that day based on the context, which is kept fixed. On the other hand, global features depend on all of $x$ (or large chunks of it). For instance, in house furnishing one may have features that measure whether the total cost of the furniture is within a given budget, or how much the cost surpasses the budget. A naive solution would be that of showing the user the whole furniture arrangement $x$, which can be troublesome when $x$ is large. A better alternative is to present the user with a summary of the global features, in this case the percentage of the used budget. Such a summary would be sufficient for producing an informed improvement, independently of the actual size of $x$. Of course, the best choice of context format is application specific. We only note that, while crucial, the context only provides auxiliary information to the user, and does not affect the learning algorithm directly.

Analysis

In preference elicitation, it is common to measure the quality of a recommended (full) configuration in terms of the regret:

$$\mathrm{REG}(x) := \max_{\hat{x} \in \mathcal{X}} u^*(\hat{x}) - u^*(x) = \langle w^*, \phi(x^*) - \phi(x) \rangle$$

where $u^*(\cdot)$ is the true, unobserved user utility and $x^* = \mathrm{argmax}_{x \in \mathcal{X}} u^*(x)$ is a truly optimal configuration. In PCL, interaction with the user occurs via partial configurations, namely $x^t_{p_t}$ and $\hat{x}^t_{p_t}$. Since the regret is defined in terms of complete configurations, it is difficult to analyze it directly based on information about the partial configurations alone, making it hard to prove convergence to globally optimal recommendations. The aim of this analysis is, however, to show that our algorithm converges to a locally optimal configuration, which is in line with the guarantees offered by other Coactive Learning variants (Goetschalckx, Fern, and Tadepalli 2014); the latter, however, still relies on interaction via complete configurations. Here a configuration $x$ is a local optimum for $u^*(\cdot)$ if no part-wise modification can improve $x$ with respect to $u^*(\cdot)$. Formally, $x$ is a local optimum for $u^*(\cdot)$ if and only if:

$$\forall p \in \mathcal{P} \;\; \nexists\, x'_p \in \mathcal{X}_p \;:\; u^*(x'_p \circ x_{\bar{p}}) > u^*(x_p \circ x_{\bar{p}}) \quad (4)$$

To measure the local quality of a configuration $x$ with respect to a part $p$, we introduce the concept of conditional regret of the partial configuration $x_p$ given the rest of the object $x_{\bar{p}}$:

$$\mathrm{CREG}_p(x) := u^*(x^*_p \circ x_{\bar{p}}) - u^*(x_p \circ x_{\bar{p}})$$

where $x^*_p = \mathrm{argmax}_{\hat{x}_p \in \mathcal{X}_p} u^*(\hat{x}_p \circ x_{\bar{p}})$.
Notice that:

$$\mathrm{CREG}_p(x) = u^*[I_p](x^*_p \circ x_{\bar{p}}) - u^*[I_p](x_p \circ x_{\bar{p}})$$

since $u^*[\bar{I}_p](x^*_p \circ x_{\bar{p}}) - u^*[\bar{I}_p](x_p \circ x_{\bar{p}}) = 0$. We say that a partial configuration $x_p$ is conditionally optimal with respect to part $p$ if $\mathrm{CREG}_p(x) = 0$. The following lemma gives sufficient and necessary conditions for local optimality of a configuration $x$.

Lemma 1. A configuration $x$ is locally optimal with respect to $u^*(\cdot)$ if and only if $x$ is conditionally optimal for $u^*(\cdot)$ with respect to all basic parts $p \in \mathcal{P}$.

Proof. By contradiction. (i) Assume that $x$ is locally optimal but not conditionally optimal with respect to some $p \in \mathcal{P}$. Then $\mathrm{CREG}_p(x) > 0$, and thus there exists a partial configuration $x'_p$ such that $u^*(x'_p \circ x_{\bar{p}}) > u^*(x_p \circ x_{\bar{p}})$. This violates the local optimality of $x$ (Eq. 4). (ii) Assume that all partial configurations $x_p$, $p \in \mathcal{P}$, are conditionally optimal but $x$ is not locally optimal. Then there exists a part $q \in \mathcal{P}$ and a partial configuration $x'_q$ such that $u^*(x'_q \circ x_{\bar{q}}) > u^*(x_q \circ x_{\bar{q}})$. This in turn means that $\mathrm{CREG}_q(x) > 0$, which violates the conditional optimality of $x$ with respect to $q$.

The above lemma gives us a part-wise measurable criterion to determine if a configuration $x$ is a local optimum, through the conditional regret of $x$ for all the provided parts. The rest of the analysis is devoted to deriving an upper bound on the conditional regret incurred by the algorithm and to proving that PCL eventually reaches a local optimum. In order to derive the bound, we rely on the concept of $\alpha$-informativeness from (Shivaswamy and Joachims 2015), adapting it to part-wise interaction (here we adopt the definition of strict $\alpha$-informativeness for simplicity; our results can be directly extended to the more general notions of informativeness described in (Shivaswamy and Joachims 2015)). A user is conditionally $\alpha$-informative if, when presented with a partial configuration $x^t_{p_t}$, he/she provides a partial configuration $\hat{x}^t_{p_t}$ that is at least some fraction $\alpha \in (0, 1]$ better than $x^t_{p_t}$ in terms of conditional regret, or more formally:

$$u^*[I_t](\hat{x}^t_{p_t} \circ x^t_{\bar{p}_t}) - u^*[I_t](x^t_{p_t} \circ x^t_{\bar{p}_t}) \;\ge\; \alpha \left( u^*[I_t](x^*_{p_t} \circ x^t_{\bar{p}_t}) - u^*[I_t](x^t_{p_t} \circ x^t_{\bar{p}_t}) \right) \quad (5)$$

In the rest of the paper we will use the notation $u[I_t](\hat{x}^t)$ to mean $u[I_t](\hat{x}^t_{p_t} \circ x^t_{\bar{p}_t})$, i.e. drop the complement, when no ambiguity can arise. At each iteration $t$, the algorithm updates the weights specified by $Q_t$, producing a new estimate $w^{t+1}$ of $w^*$. The actual indices $Q_t$ depend on the condition at line 9 of Algorithm 2: at some iterations $Q_t$ includes all of $I_t$, while at others $Q_t$ is restricted to $J_t$. We distinguish between these cases by:

$$\mathcal{I} = \{t \in [T] : u^t[I_t](\hat{x}^t) - u^t[I_t](x^t) \le 0\}$$
$$\mathcal{J} = \{t \in [T] : u^t[I_t](\hat{x}^t) - u^t[I_t](x^t) > 0\}$$

so that if $t \in \mathcal{I}$ then $Q_t = I_t$, and $Q_t = J_t$ if $t \in \mathcal{J}$. For all $t \in [T]$, the quality of $w^{t+1}$ is:

$$\langle w^*, w^{t+1} \rangle = \langle w^*_{Q_t}, w^{t+1}_{Q_t} \rangle + \langle w^*_{\bar{Q}_t}, w^{t+1}_{\bar{Q}_t} \rangle = \langle w^*_{Q_t}, w^t_{Q_t} \rangle + \langle w^*_{\bar{Q}_t}, w^t_{\bar{Q}_t} \rangle + \langle w^*_{Q_t}, \phi_{Q_t}(\hat{x}^t) - \phi_{Q_t}(x^t) \rangle = \langle w^*, w^t \rangle + u^*[Q_t](\hat{x}^t) - u^*[Q_t](x^t)$$

Therefore, if the second summand in the last equation, the utility gain $u^*[Q_t](\hat{x}^t) - u^*[Q_t](x^t)$, is positive, the update produces a better weight estimate $w^{t+1}$. Since the user is conditionally $\alpha$-informative, the improvement $\hat{x}^t$ always satisfies $u^*[I_t](\hat{x}^t) - u^*[I_t](x^t) \ge 0$. When $t \in \mathcal{I}$, we have $Q_t = I_t$, and thus the utility gain is guaranteed to be positive. On the other hand, when $t \in \mathcal{J}$ we have $Q_t = J_t$ and the utility gain reduces to $u^*[J_t](\hat{x}^t) - u^*[J_t](x^t)$. In this case the update ignores the weights in $I_t \setminus J_t$, missing out a factor of $u^*[I_t \setminus J_t](\hat{x}^t) - u^*[I_t \setminus J_t](x^t)$.
We compactly quantify the missing utility gain as:

$$\zeta_t = \begin{cases} 0 & \text{if } t \in \mathcal{I} \\ u^*[I_t \setminus J_t](\hat{x}^t) - u^*[I_t \setminus J_t](x^t) & \text{if } t \in \mathcal{J} \end{cases}$$

Note that $\zeta_t$ can be positive, null or negative for $t \in \mathcal{J}$. When $\zeta_t$ is negative, making the update on $J_t$ only actually avoids a loss in utility gain. We now prove that PCL minimizes the average conditional regret $\overline{\mathrm{CREG}}_T := \frac{1}{T} \sum_{t=1}^{T} \mathrm{CREG}_{p_t}(x^t)$ as $T \to \infty$ for conditionally $\alpha$-informative users.

Theorem 1. For a conditionally $\alpha$-informative user, the average conditional regret of PCL after $T$ iterations is upper bounded by:

$$\frac{1}{T} \sum_{t=1}^{T} \mathrm{CREG}_{p_t}(x^t) \;\le\; \frac{2DS\|w^*\|}{\alpha \sqrt{T}} + \frac{1}{\alpha T} \sum_{t \in \mathcal{J}} \zeta_t$$

Proof. The following is a sketch (the complete proof can be found in the arXiv version of the paper). We start by splitting the iterations into the $\mathcal{I}$ and $\mathcal{J}$ sets defined above, and bound the norm $\|w^{T+1}\|^2$. In both cases we find that $\|w^{T+1}\|^2 \le 4D^2S^2T$. We then expand the term $\langle w^*, w^{T+1} \rangle$ for iterations in both $\mathcal{I}$ and $\mathcal{J}$, obtaining:

$$\langle w^*, w^{T+1} \rangle = \sum_{t \in \mathcal{I}} \langle w^*_{I_t}, \phi_{I_t}(\hat{x}^t) - \phi_{I_t}(x^t) \rangle + \sum_{t \in \mathcal{J}} \langle w^*_{J_t}, \phi_{J_t}(\hat{x}^t) - \phi_{J_t}(x^t) \rangle$$

With a few algebraic manipulations we obtain:

$$\sum_{t=1}^{T} \langle w^*_{I_t}, \phi_{I_t}(\hat{x}^t) - \phi_{I_t}(x^t) \rangle \;\le\; \langle w^*, w^{T+1} \rangle + \sum_{t \in \mathcal{J}} \zeta_t$$

which we then bound using the Cauchy-Schwarz inequality:

$$\sum_{t=1}^{T} \langle w^*_{I_t}, \phi_{I_t}(\hat{x}^t) - \phi_{I_t}(x^t) \rangle \;\le\; 2DS\sqrt{T}\,\|w^*\| + \sum_{t \in \mathcal{J}} \zeta_t$$

Applying the conditional $\alpha$-informative feedback assumption (Eq. 5) and rearranging proves the claim.

Theorem 1 ensures that the average conditional regret suffered by our algorithm decreases as $O(1/\sqrt{T})$. This alone, however, does not prove that the algorithm will eventually arrive at a local optimum, even if $\sum_{t=\tau_0}^{T} \mathrm{CREG}_{p_t}(x^t) = 0$ for some $\tau_0 \in [T]$. This is due to the fact that partial inference is performed keeping the rest of the object $x^t_{\bar{p}_t}$ fixed. Between iterations an inferred part may change as a result of a change of the other parts in previous iterations. The object could, in principle, keep changing at every iteration, even if $\mathrm{CREG}_{p_t}(x^t)$ is always equal to 0. The next proposition, however, shows that this is not the case thanks to the utility decomposition we employ.

Proposition 1. Let $\tau_1 < \ldots < \tau_n < \hat{\tau}_1 < \ldots < \hat{\tau}_n \le T$ be such that $p_{\tau_k} = p_{\hat{\tau}_k} = p_k$ and $\mathrm{CREG}_{p_t}(x^t) = 0$ for all $t \ge \tau_1$. Then the configuration $x^T$ is a local optimum.

Proof. Sketch. The proof proceeds by strong induction. We first show that $x^t_{p_1} = x^{\tau_1}_{p_1}$ for all $t > \tau_1$, as $\bar{u}^t_{I_1}(\cdot)$ only depends on the features in $J_1$ and by assumption $\mathrm{CREG}_{p_t}(x^t) = 0$ for all $t \ge \tau_1$. By strong induction, assuming that $x^t_{p_j} = x^{\tau_j}_{p_j}$ for all $j = 1, \ldots, k-1$ and all $t > \tau_{k-1}$, we can easily show that $x^t_{p_k} = x^{\tau_k}_{p_k}$ as well. Now, $x^t_{p_k} = x^{\tau_k}_{p_k}$ for all $t > \tau_n$, therefore $x^{\hat{\tau}_k} = x^{\tau_n}$ for all $k \in [n]$. Since $\mathrm{CREG}_{p_k}(x^{\hat{\tau}_k}) = 0$ for all $k \in [n]$ by assumption, $x^T = x^{\hat{\tau}_n} = x^{\tau_n}$ is a local optimum and will not change for all $t > \tau_n$.

The algorithm actually reaches a local optimum at $t = \tau_n$, but it needs to double check all the parts in order to be sure that the configuration is actually a local optimum. This justifies a termination criterion that we use in practice: if the algorithm completes two full runs over all the parts, and the user can never improve any of the recommended partial configurations, then the full configuration is guaranteed to be a local optimum, and the algorithm can stop. As mentioned, we employ this criterion in our implementation but we left it out from Algorithm 2 for simplicity. To recap, Theorem 1 guarantees that $\overline{\mathrm{CREG}}_T \to 0$ as $T \to \infty$, therefore $\mathrm{CREG}_{p_t}(x^t)$ approaches 0 as well. Combining this fact with Proposition 1, we proved that our algorithm approaches a local optimum for $T \to \infty$.

Empirical Analysis

We ran PCL on three constructive preference elicitation tasks of increasing complexity, comparing different degrees of user informativeness.
According to our experiments, informativeness is the most critical factor. The three problems involve rather large configurations, which cannot be handled by coactive interaction via complete configurations. For instance, in (Ortega and Stocker 2016) the user is tasked to solve relatively simple SAT instances over three variables and (at most) eight clauses; in some cases users were observed to show signs of cognitive overload. In comparison, our simplest realistic problem involves 35 categorical variables (with 8 possible values) and 74 features, plus additional hard constraints. As a consequence, Coactive Learning cannot be applied as-is, and part-wise interaction is necessary. In all of these settings, part-wise inference is cast as a mixed integer linear problem (MILP) and solved with Gecode (http://www.gecode.org/). Despite being NP-hard in general, MILP solvers can be very efficient on practical instances. Efficiency is further improved by inferring only partial configurations. Our experimental setup is available at https://github.com/unitn-sml/pcl.

We employed a user simulation protocol similar to that of (Teso, Dragone, and Passerini 2017). First, for each problem, we sampled 20 vectors $w^*$ at random from a standard normal distribution. Then, upon receiving a recommendation $x^t_{p_t}$, an improvement $\hat{x}^t_{p_t}$ is generated by solving the following problem:

$$\begin{aligned} \mathrm{argmin}_{x_{p_t} \in \mathcal{X}_{p_t}} \;\; & u^*[I_t](x_{p_t} \circ x^t_{\bar{p}_t}) \\ \text{s.t.} \;\; & u^*[I_t](x_{p_t} \circ x^t_{\bar{p}_t}) - u^*[I_t](x^t_{p_t} \circ x^t_{\bar{p}_t}) \ge \alpha \left( u^*[I_t](x^*_{p_t} \circ x^t_{\bar{p}_t}) - u^*[I_t](x^t_{p_t} \circ x^t_{\bar{p}_t}) \right) \end{aligned}$$

This formulation clearly satisfies the conditional $\alpha$-informativeness assumption (Eq. 5).

Synthetic setting. We designed a simple synthetic problem inspired by spin glass models; see Figure 1 for a depiction. In this setting, a configuration $x$ consists of a $4 \times 4$ grid. Each node in the grid is a binary 0-1 variable. Adjacent nodes are connected by an edge, and each edge is associated to an indicator feature that evaluates to $1$ if the incident nodes have different values (green in the figure), and to $-1$ otherwise (red in the figure). The utility of a configuration is simply the weighted sum of the values of all features (edges). The basic parts $p$ consist of all the non-overlapping $2 \times 2$ sub-grids of $x$, for a total of 4 basic parts (indicated by dotted lines in the figure).

Figure 1: Example grid configuration.

Since the problem is small enough for inference of complete configurations to be practical, we compared PCL to standard Coactive Learning, using the implementation of (Teso, Dragone, and Passerini 2017). In order to keep the comparison as fair as possible, the improvements fed to CL were chosen to match the utility gain obtained by PCL. We further report the performance of three alternative part selection strategies: random, smallest (most independent) part first, and UCB1.

The results can be found in the first column of Figure 2. We report both the regret (over complete configurations) and the cumulative runtime of all algorithms, averaged over all users, as well as their standard deviation. The regret plot shows that, despite being restricted to work with $2 \times 2$ configurations, PCL does recommend complete configurations of quality comparable to CL after enough queries are made. Out of the three part selection strategies, random performs best, with the other two more informed alternatives (especially smallest first) quite close. The runtime gap between full and part-wise inference is already clear in this small synthetic problem; complete inference quickly becomes impractical as the problem size increases.
Training planning. Generating personalized training plans based on performance and health monitoring has received a lot of attention recently in sport analytics (see e.g. (Fister et al. 2015)). Here we consider the problem of synthesizing a week-long training plan $x$ from information about the target athlete. Each day includes 5 time slots (two for the morning, two for the afternoon, one for the evening), for 35 slots total. We assume to be given a fixed number of training activities (7 in our experiments: walking, running, swimming, weight lifting, push-ups, squats, abs), as well as knowledge of the slots in which the athlete is available. The training plan $x$ associates an activity to each slot where the athlete is available. Our formulation tracks the amount of improvement (e.g. power increase) and fatigue over five different body parts (arms, torso, back, legs, and heart) induced by performing an activity for one time slot. Each day defines a basic part. The mapping between training activity and improvement/fatigue over each body part is assumed to be provided externally. It can be provided by the athlete or by medical personnel monitoring his/her status. The features of $x$ include, for each body part, the total performance gain and fatigue, computed over the recommended training plan according to the aforementioned mapping. We further include inter-part features to capture activity diversity in consecutive days. The fatigue accumulated in 3 consecutive time slots in any body part must not exceed a given threshold, to prevent injuries.

In this setting, CL is impractical from both the cognitive and computational points of view. We ran PCL and evaluated the impact of user informativeness by progressively increasing $\alpha$ from 0.1, to 0.3, to 0.5. The results can be seen in Figure 2. The plots show clearly that, despite the complexity of the configuration and constraints, PCL can still produce very low-regret configurations after about 50 iterations or less. Understandably, the degree of improvement $\alpha$ plays an important role in the performance of PCL and, consequently, in its runtime (users at convergence do not contribute to the runtime), at least up to $\alpha = 0.5$. Recall, however, that the improvements are part-wise, and hence $\alpha$ quantifies the degree of local improvement: part improvements may be very informative on their own, but only give a modest amount of information about the full configuration. However, it is not unreasonable to expect users to be very informative when presented with reasonably sized (and simple) parts.

Figure 2: Regret over complete configurations (top) and cumulative runtime (bottom) of PCL and CL on our three constructive problems: synthetic (left), training planning (middle), and hotel planning (right). The x-axis is the number of iterations, while the shaded areas represent the standard deviation. Best viewed in color.

Crucially, PCL allows the system designer to define the parts appropriately depending on the application.

Hotel planning. Finally, we considered a complex furniture allocation problem: furnishing an entire hotel. The problem is encoded as follows. The hotel is represented by a graph: nodes are rooms and edges indicate which rooms are adjacent. Rooms can be of three types: normal rooms, suites, and dorms. Each room can hold a maximum number of furniture pieces, each associated to a cost. Additional, fixed nodes represent bathrooms and bars. The type of a room is decided dynamically based on its position and furniture.
For instance, a normal room must contain at most three single or double beds, no bunk beds, and a table, and must be close to a bathroom. A suite must contain one bed, a table and a sofa, and must be close to a bathroom and a bar. Each room is a basic part, and there are 15 rooms to be allocated. The feature vector contains 20 global features plus 8 local features per room. The global features include different functions of the number of different types of rooms, the total cost of the furniture and the total number of guests. The local features include, instead, characteristics of the current room, such as its type or the amount of furniture, and other features shared by adjacent rooms, e.g. whether two rooms have the same type. These can encode preferences like "suites and dorms should not be too close", or "the hotel should maintain high quality standards while still being profitable". Given the graph structure, room capacities, and total budget, the goal is to furnish all rooms according to the user's preferences.

This problem is hard to solve to optimality with current solvers; part-based inference alleviates this issue by focusing on individual rooms. There are 15 rooms in the hotel, so that at each iteration only 1/15 of the configuration is affected. Furthermore, the presence of the global features implies dependencies between all rooms. Nonetheless, the algorithm manages to reduce the regret by an order of magnitude in around 100 iterations, starting from a completely uninformed prior. Note also that, as for the training planning scenario, an $\alpha$ of 0.3 achieves basically the same results as $\alpha$ equal to 0.5.

Conclusion

In this work we presented an approach to constructive preference elicitation able to tackle large constructive domains, beyond the reach of previous approaches. It is based on Coactive Learning (Shivaswamy and Joachims 2015), but only requires inference of partial configurations and partial improvement feedback, thereby significantly reducing the cognitive load of the user. We presented an extensive theoretical analysis demonstrating that, despite working only with partial configurations, the algorithm converges to a locally optimal solution. The algorithm has been evaluated empirically on three constructive scenarios of increasing complexity, and shown to perform well in practice. Possible future work includes improving part-based interaction by exchanging additional contextual information (e.g. features (Teso, Dragone, and Passerini 2017) or explanations) with the user, and applying PCL to large layout synthesis problems (Dragone et al. 2016).

References

Auer, P.; Cesa-Bianchi, N.; and Fischer, P. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2-3):235-256.

Boutilier, C.; Bacchus, F.; and Brafman, R. I. 2001. UCP-networks: A directed graphical representation of conditional utilities. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, 56-64. Morgan Kaufmann Publishers Inc.

Boutilier, C.; Patrascu, R.; Poupart, P.; and Schuurmans, D. 2006. Constraint-based optimization and utility elicitation using the minimax decision criterion. Artificial Intelligence 170(8-9):686-713.

Braziunas, D., and Boutilier, C. 2005. Local utility elicitation in GAI models. In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence, 42-49. AUAI Press.

Braziunas, D., and Boutilier, C. 2007. Minimax regret based elicitation of generalized additive utilities. In UAI, 25-32.

Braziunas, D., and Boutilier, C. 2009. Elicitation of factored utilities. AI Magazine 29(4):79.

Chajewska, U.; Koller, D.; and Parr, R. 2000. Making rational decisions using adaptive utility elicitation. In AAAI/IAAI, 363-369.

Dragone, P.; Erculiani, L.; Chietera, M. T.; Teso, S.; and Passerini, A. 2016. Constructive layout synthesis via coactive learning. In Constructive Machine Learning Workshop, NIPS.

Fishburn, P. C. 1967. Interdependence and additivity in multivariate, unidimensional expected utility theory. International Economic Review 8(3):335-342.

Fister, I.; Rauter, S.; Yang, X.-S.; and Ljubič, K. 2015. Planning the sports training sessions with the bat algorithm. Neurocomputing 149:993-1002.

Goetschalckx, R.; Fern, A.; and Tadepalli, P. 2014. Coactive learning for locally optimal problem solving. In Proceedings of AAAI.

Gonzales, C., and Perny, P. 2004. GAI networks for utility elicitation. KR 4:224-234.

Keeney, R. L., and Raiffa, H. 1976. Decisions with Multiple Objectives: Preferences and Value Tradeoffs.

Mayer, R. E., and Moreno, R. 2003. Nine ways to reduce cognitive load in multimedia learning. Educational Psychologist 38(1):43-52.

Meseguer, P.; Rossi, F.; and Schiex, T. 2006. Soft constraints. Foundations of Artificial Intelligence 2:281-328.

Ortega, P. A., and Stocker, A. A. 2016. Human decision-making under limited time. In Advances in Neural Information Processing Systems, 100-108.

Pigozzi, G.; Tsoukiàs, A.; and Viappiani, P. 2016. Preferences in artificial intelligence. Ann. Math. Artif. Intell. 77(3-4):361-401.

Raman, K.; Shivaswamy, P.; and Joachims, T. 2012. Online learning to diversify from implicit feedback. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 705-713. ACM.

Shivaswamy, P., and Joachims, T. 2015. Coactive learning. JAIR 53:1-40.

Teso, S.; Dragone, P.; and Passerini, A. 2017. Coactive critiquing: Elicitation of preferences and features. In AAAI.

Teso, S.; Passerini, A.; and Viappiani, P. 2016. Constructive preference elicitation by setwise max-margin learning. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2067-2073. AAAI Press.