# constructive_preference_elicitation_by_setwise_maxmargin_learning__5e7fce2f.pdf

Constructive Preference Elicitation by Setwise Max-Margin Learning

Stefano Teso University of Trento

Trento, Italy teso@disi.unitn.it

Andrea Passerini University of Trento

Trento, Italy passerini@disi.unitn.it

Paolo Viappiani Sorbonne Universit es UPMC Univ Paris 06 CNRS, LIP6 UMR 7606

Paris, France paolo.viappiani@lip6.fr

In this paper we propose an approach to preference elicitation that is suitable to large conﬁguration spaces beyond the reach of existing state-of-theart approaches. Our setwise max-margin method can be viewed as a generalization of max-margin learning to sets, and can produce a set of diverse items that can be used to ask informative queries to the user. Moreover, the approach can encourage sparsity in the parameter space, in order to favor the assessment of utility towards combinations of weights that concentrate on just few features. We present a mixed integer linear programming formulation and show how our approach compares favourably with Bayesian preference elicitation alternatives and easily scales to realistic datasets.

1 Introduction Preferences [Peintner et al., 2008] play an important role in a variety of artiﬁcial intelligence applications and the task of eliciting or learning preferences is a crucial one; typically only limited information about the user s preferences will be available and the cost (cognitive or computational) of obtaining additional preference information will be high.

The automated assessment of preferences has received considerable attention, starting with pioneering works in the OR community, such as [White III et al., 1984] and especially the UTA methodology [Jacquet-Lagr eze and Siskos, 1982] giving rise to a wide variety of extensions [Jacquet Lagreze and Siskos, 2001; Greco et al., 2008]. Within AI, a number researchers have proposed interactive methods that elicit preferences in an adaptive way [Chajewska et al., 2000; Boutilier, 2002; Wang and Boutilier, 2003; Boutilier et al., 2006; Guo and Sanner, 2010; Viappiani and Boutilier, 2010], observing that, by asking informative questions, it is often possible to make near-optimal decisions with only partial preference information.

While most works assume that items or decisions are available in a (possibly large) dataset, in this paper we propose an adaptive elicitation framework that takes a constructive view on preference elicitation, enlarging its scope from the selection of items among a set of candidates to the synthesis of entirely novel instances. Instances are solutions to a given

optimization problem; they are represented as combinations of basic elements (e.g. the components of a laptop) subject to a set of constraints (e.g. the laptop model determines the set of available CPUs). A utility function is learned over the feature representation of an instance, as customary in many preference elicitation approaches. The recommendation is then made by solving a constrained optimization problem in the space of feasible instances, guided by the learned utility.

Preference elicitation in conﬁguration problems has been previously tackled with regret-based elicitation [Boutilier et al., 2006; Braziunas and Boutilier, 2007], where minimax regret is used both as a robust recommendation criterion and as a technique to drive elicitation. The main limitation of their approach is the lack of tolerance with respect to user inconsistency. Indeed, learning a user utility function requires to deal with uncertain and possibly inconsistent user feedback.

Bayesian preference elicitation approaches deal with this problem by building a probability distribution on candidate functions (endowed with a response or error model to be used for inference) and asking queries maximizing informativeness measures such as expected value of information (EVOI) [Chajewska et al., 2000; Guo and Sanner, 2010; Viappiani and Boutilier, 2010]. These approaches are however computationally expensive and can not scale to fully constructive scenarios, as shown in our experimental results.

We take a space decomposition perspective and jointly learn a set of weight vectors, each representing a candidate utility function, maximizing diversity between the vectors and consistency with the available feedback. These two conﬂicting objectives tend to generate equally plausible alternative hypotheses for the unknown utility. Our approach to elicitation works by combining weight vector learning with instance generation, so that each iteration of the algorithm produces two outcomes: a set of weight vectors and a set of instances, each maximizing its score according to one of the weight vectors. We evaluate the effectiveness of our approach by testing our elicitation method in both synthetic and realworld problems, and comparing it to state-of-the-art methods.

2 Background We ﬁrst introduce some notation. We use boldface letters x to indicate vectors, uppercase letters X for matrices, and calligraphic capital letters X for sets. We abbreviate the set {xi}n

i=1 as {xi} whenever the range of the index i is clear

Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16)

from the context, and use [n] as a shorthand for {1, . . . , n}. We write kxk1 := P

z |xz| to indicate the 1 vector norm, h , i for the usual dot product, X0 for matrix transposition.

We assume to have a multi-attribute feature space X of conﬁgurations x = (x1, . . . , xm) over m features. For the sake of simplicity we focus on binary features only, i.e. xz 2 {0, 1} for all z 2 [m], assuming a one-hot encoding of categorical features. This is a common choice for preference elicitation methods [Guo and Sanner, 2010; Viappiani and Boutilier, 2010]. Support for linearly dependent continuous features will be discussed later on.

We further assume that the set of feasible conﬁgurations, denoted by Xfeasible, is expressed as a conjunction of linear constraints. This allows to formulate both arithmetic and logical constraints, e.g. under the canonical mapping of True to 1 and False to 0, the Boolean disjunction of two binary variables x1 _ x2 can be rewritten as x1 + x2 1.

Consistently with the experimental settings of previous work [Guo and Sanner, 2010; Viappiani and Boutilier, 2010], we model users with additive utility functions [Keeney and Raiffa, 1976]; the user s preferences are represented by a weight vector w 2 Rm and the utility of a conﬁguration x is given by hw, xi = Pm

z=1 wzxz. In the remainder of the paper we require all weights to be non-negative and bounded: the per-attribute weights wz must lie in a (constant but otherwise arbitrary) interval [w?

z ], with w?

z 0. Both requirements are quite natural1, and enable the translation of our core optimization problem into a mixed-integer linear problem (as done in Section 3).

During learning, the actual weight vector w is unknown to the learning system, and must be estimated by interacting with the user. We mostly focus on pairwise comparison queries, which are the simplest of the comparative queries. These can be extended to choice sets of more than two options [Viappiani and Craig, 2009; Viappiani and Boutilier, 2010] and are common in conjoint analysis [Louviere et al., 2000; Toubia et al., 2004]. For a pairwise comparison between two conﬁgurations x and x0: either x is preferred to x0 (written x x0), x0 is preferred to x (x x0), or there is no clear preference between the two items (x x0). We write D to denote the set of preferences (answers to comparison queries) elicited from the user.

In the next Section we describe how informative queries can be generated using our setwise maxmargin learning.

3 Setwise Max-margin Learning

Non-linear Formulation. We ﬁrst introduce the problem formulation as a non-linear optimization problem, and then show how to reduce it to a mixed integer linear program.

The goal of our setwise max-margin approach is twofold. First, for any given set size k 1, we want to ﬁnd a set of k weight vectors w1, . . . , wk, chosen so that all userprovided preferences are satisﬁed by the largest possible margin (modulo inconsistencies) and so that they are maximally

1Utility values are deﬁned on an interval scale, thus it is always possible to scale the values appropriately (see for instance [Torra and Narukawa, 2007] and [Keeney and Raiffa, 1976]).

diverse. Second, we want to construct a set of k conﬁgurations x1, . . . , xk, so that each conﬁguration xi is the best possible option when evaluated according to the corresponding wi and conﬁgurations are maximally diverse among each other. These options will be later used to formulate queries.

The ﬁrst goal is achieved by translating all pairwise preferences D into ranking constraints: preferences of the form yh

become linear inequalities of the form hwi, yh

i µ, where µ is the margin variable (which we aim at maximizing) and h ranges over the responses. Non-separable datasets, which occur in practice due to occasional inconsistencies in user feedback, are handled by introducing slack variables (whose sum we aim at minimizing) in a way similar to UTA and its extensions [Jacquet-Lagr eze and Siskos, 1982; Greco et al., 2008]. When augmented with the slacks, the above inequalities take the form hwi, yh

h is the penalty incurred by weight vector wi for violating the margin separation of pair h. Indifference preferences, i.e. yh

2, are translated as |hwi, yh

h; the slack increases with the difference between the estimated utility of the two options.

The second goal requires to jointly maximize the utility of each xi according to its corresponding weight vector wi and its scoring difference with respect to the other conﬁgurations xj in the set. We achieve this by maximizing the sum of utilities Pk

i=1hwi, xii and adding ranking constraints of the form hwi, xi xji µ for all i, j 2 [k], i 6= j.

A straightforward encoding of the above desiderata leads to the following mixed integer non-linear optimization problem over the non-negative margin µ 2 R 0 and vectors {wi 2 Rm}, {xi 2 {0, 1}m}:

s.t. 8 i 2 [k], 8 h 2 [n] hwi, yh

8 i, j 2 [k], i 6= j hwi, xi xji µ (2)

8 i 2 [k] w? wi w> (3)

8 i 2 [k] xi 2 Xfeasible , "i 0 (4) Let us illustrate the above piece by piece. The objective is composed of four parts: we maximize the shared margin µ (ﬁrst part) and minimize the total sum of the ranking errors "i incurred by each weight vector wi (second part), while at the same time regularizing the magnitude of the weights (third part) and the quality of the conﬁgurations {xi} (last part). The non-negative hyperparameters , β, γ control the inﬂuence of the various components. The weight regularization term copes with the common scenario in which the user has strong preferences about some attributes, but is indifferent to most of them. The 1 penalty is frequently used to improve the sparsity of learned models [Tibshirani, 1996; Zhang and Huang, 2008; Hensinger et al., 2010], with consequent gains in generalization ability and efﬁciency, as conﬁrmed by our empirical ﬁndings (see Section 4). Constraint (1) enforces the correct ranking of the observed user preferences, while (2) ensures that the generated conﬁgurations are diverse in terms of the weight vectors they maximize. Constraints (3) and (4) ensure that the weights and

Figure 1: Optimization of setwise max-margin; the black lines corresponds to preference constraints D, the red points are utility vectors w1 and w2, the red line corresponds to the hyperplane hw, x1 x2i = 0.

conﬁgurations are feasible and guarantees the non-negativity of the slacks. Since we require w? (0, . . . , 0), Eq. (3) also enforces the weights to be non-negative.

Note that we are choosing the conﬁgurations {xi} and the weight vectors {wi} simultaneously. We look for wi so that the utility loss (see constraint 2) of choosing xj instead of xi, j 6= i, is large (at least µ). Look at Figure 1, where, for simplicity, we need to choose a pair (k = 2). Eq. 2 is represented by a red line, that partitions the space of feasible utility weights in two parts (in general, there will be k subregions). Since we maximize the margin µ, the optimizer will prefer a set of conﬁgurations {x1, x2} that partitions the weight space in an even way.2 In each subregion, we have corresponding wi lying close to its centre. If, for example, the user indicates a preference for x1 over x2, the feasible region will then become the part of the polytope to the left of the red line; moreover the vector w1 will maximize the margin in the classic (k = 1) sense in the new feasible region.

MILP Formulation. This initial formulation is problematic to solve, as Eq. (2) involves quadratic terms over mixed continuous integer variables. However, the problem can be reformulated as a mixed integer linear program (MILP) by a suitable transformation. This technique is rather common in operational research, see e.g. [Boutilier et al., 2006].

Our goal is to replace Eq. (2) with a set of linear constraints. In order to do so, we introduce a set of fresh variables pi,j

z for every i, j 2 [k] and z 2 [m]. Assuming for the time being that the new variables do satisfy the equation pi,j

z, we rewrite the fourth component of the objective function in terms of the new variables as:

and, similarly, Eq. (2) as:

8 i, j 2 [k], i 6= j .

2This bears similarity with volumetric approaches [Iyengar et al., 2001], but there are important differences: ﬁrst here we consider real items to ﬁnd the best separator, second the margin is also expressed in utility terms, third the query is found via an optimization process.

The fact that pi,j

z is achieved by setting the following additional constraints. We distinguish between two cases: (i) pi,i

z and (ii) pi,j

z for i 6= j. Recall that we are maximizing the margin µ. Now, due to Eq. (5), the optimizer will try to keep pi,i

z as large as possible and pi,j

z as small as possible. (Case i) We add an explicit upper bound: pi,i

z min{wmaxxi

z}, where wmax is a sufﬁciently large constant. On one hand, if xi

z = 0 the product wi

z evaluates to 0, and so does the upper bound wmaxxi

z = 0. On the other hand, if xi

z = 1 then the product wi

z amounts to wi

z, while the upper bound reduces to min{wmax, wi

z}. By taking a sufﬁciently large constant wmax (e.g. wmax := maxz w>

z ) the upper bound simpliﬁes to wi

z. Since pi,i

z is being maximized, in both cases it will attain the upper bound, and thus satisfy pi,j

z. (Case ii) We add an explicit lower bound: pi,j

z max{0, wi

z wmax(1 xj

z = 1 the lower bound simpliﬁes to max{0, wi

z, due to the non-negativity of wi

z. Otherwise, if xj

z = 0 then the lower bound becomes max{0, wi

z wmax}, where the second term is at most 0. Since pi,j

z is being minimized, in both cases it will attain the lower bound, and thus satisfy pi,j

z.3 We thus obtain the following mixed-integer linear problem:

s.t. 8 i 2 [k], 8 h 2 [n] hwi, yh

8 i, j 2 [k], i 6= j

8 i, j 2 [k], i 6= j, 8 z 2 [m]

z min{wmaxxi

z max{0, wi

z wmax(1 xj

8 i 2 [k] w? wi w> (9)

8 i 2 [k] xi 2 Xfeasible , "i 0

which can be solved by any suitable MILP solver.

Set-wise max-margin. The full SETMARGIN algorithm follows the usual preference elicitation loop. Starting from an initially empty set of user responses D, it repeatedly solves the MILP problem above using D to enforce ranking constraints on the weight vectors {wi}. The generated conﬁgurations {xi}, which are chosen to be as good as possible with respect to the estimated user preferences, and as diverse as possible, are then employed to formulate a set of user queries. The new replies are added to D and the whole procedure is repeated. Termination can be after a ﬁxed number of iterations, when the difference between utility vectors is very small, or might be left to the user to decide (e.g. [Reilly et al., 2007]).

The procedure is sketched in Algorithm 1. Note that at the end of the preference elicitation procedure, a ﬁnal recommendation is made by solving the MILP problem for k = 1.

3Since µ is upper-bounded by Eq. (1), in some cases the pi,j

z variables do not attain the lower bound. As a consequence, the MILP reformulation of Eq. (2) is a (tight) approximation of the original one. This has no impact on the quality of the solutions.

Algorithm 1 The SETMARGIN algorithm. Here k is the set size, , β, γ are the hyperparameters, and T is the maximum number of iterations. The values of Xfeasible, w> and w? are left implicit.

1: procedure SETMARGIN(k, , β, γ, T) 2: D ; 3: for t = 1, . . . , T do 4: {wi, xi}k

i=1 SOLVE(D, k, , β, γ) 5: for xi, xj 2 {x1, . . . , xk} s.t. i < j do 6: D D [ QUERYUSER(xi, xj) 7: end for 8: end for 9: w , x SOLVE(D, 1, , β, γ) 10: return w , x

11: end procedure

Linearly dependent real attributes. In many domains of interest, items are composed of both Boolean and real-valued attributes, where the latter depend linearly on the former. This is for instance the case for the price, weight and power consumption of a laptop, which depend linearly on the choice of components. In this setting, conﬁgurations are composed of two parts: x = (x B; x R), where x B is Boolean and x R is real-valued and can be written as x R = Cx B for an appropriately sized non-negative cost matrix C. It is straightforward to extend the MILP formulation to this setting. We rewrite the weight vector as w = (w B; w R). The utility becomes:

hw, xi = hw B, x Bi + hw R, Cx Bi = hw B + C0w R, x Bi

The generalized problem is obtained by substituting wi with vi := wi

R. All constraints remain the same. The only notable change occurs in Eq. (9), which becomes:

8 i 2 [k] . (w?

4 Experiments

We implemented the SETMARGIN algorithm using Python, leveraging Gurobi 6.5.0 for solving the core MILP problem. Both the SETMARGIN source code and the full experimental setup are available at https://github.com/stefanoteso/setmargin.

We compare SETMARGIN against three state-of-the-art Bayesian approaches: i) the Bayesian approach from [Guo and Sanner, 2010], selecting queries according to restricted informed VOI (RIVOI), a computationally efﬁcient heuristic approximation of value-of-information, and inference using True Skill T M [R. et al., 2006] (based on expectation propagation [Minka, 2001]); ii) the Bayesian framework of [Viappiani and Boutilier, 2010] using Monte Carlo methods (with 50,000 particles) for Bayesian inference and asking choice queries (i.e. selection of the most preferred item in a set) selected using a greedy optimization of Expected Utility of a Selection (a tight approximation of EVOI, hereafter just called EUS); iii) Query Iteration (referred as QI below), also from [Viappiani and Boutilier, 2010], an even faster query selection method based on sampling sets of utility vectors.

We adopt the indifference-augmented Bradley-Terry user response model introduced in [Guo and Sanner, 2010]. The

probability that a user prefers conﬁguration xi over xj is deﬁned according to the classical (without indifference) Bradley-Terry model [Bradley and Terry, 1952] as (1 + exp( λ1hw, xi xji)) 1, where w is the weight vector of the true underlying user utility. Support for indifference is modelled as an exponential distribution over the closeness of the two utilities, i.e. exp( λ2|hw, xi xji|). The parameters λ1 and λ2 were set to one for all simulations, as in [Guo and Sanner, 2010].

In all experiments SETMARGIN uses an internal 5-fold cross-validation procedure to update the hyperparameters , β, and γ after every 5 iterations. The hyperparameters are chosen as to minimize the ranking loss over the user responses collected so far. is taken in {20, 10, 5, 1}, while β and γ are taken in {10, 1, 0.1, 0.001}.4

Synthetic Dataset. Following the experimental protocol in [Guo and Sanner, 2010] and [Viappiani and Boutilier, 2010], in the ﬁrst experiment we evaluate the behavior of the proposed method in an artiﬁcial setting with increasingly complex problems. We developed synthetic datasets with r attributes, for increasing values of r. Each attribute takes one of r possible values, so that the one-hot encoding of attributes results in m = r2 features. In terms of space of conﬁgurations, for r = 3 the synthetic dataset corresponds to Xfeasible = [3] [3] [3], for r = 4 to Xfeasible = [4] [4] [4] [4], and so on. The cardinality of Xfeasible is rr, and grows (super) exponentially with r. For r = 3, the dataset is comparable in size to the synthetic one used in [Guo and Sanner, 2010] and [Viappiani and Boutilier, 2010]. For larger r the size of the space grows much larger than the ones typically used in the Bayesian preference elicitation literature, and as such represents a good testbed for comparing the scalability of the various methods. The feasible conﬁguration space was encoded in SETMARGIN through appropriate MILP constraints, while the other methods require all datasets to be explicitly grounded. Users were simulated by drawing 20 random utility vectors from each of four different distributions. The ﬁrst two mimic those used in [Guo and Sanner, 2010] : (1) a uniform distribution over [1, 100] for each individual weight, and (2) a normal distribution with mean 25 and standard deviation 25

3 (each attribute is sampled i.i.d). We further produced two novel sparse versions of the uniform and normal distributions setting to zero 80% of the entries (sampled uniformly at random). We set a maximum budget of 100 iterations for all methods for simplicity.

In Figure 2 we report solution quality and timing values for increasing number of collected user responses, for the different competitors on each of the four different utility vector distributions and datasets r = 3 and r = 4. Solution quality is measured in terms of utility loss maxx2Xfeasible (u(x) u(x )), where u( ) is the true unknown user utility, and x is the solution recommended to the user after the elicitation phase (see Algorithm 1). Computa-

4Note that for k = 1 Eq. (7) and Eq. (8) disappear, so can not be taken to be less than 1, as in this case, the objective can be increased arbitrarily while keeping the right-hand side of Eq. (1) constant, rendering the problem unbounded.

SPARSE UNIFORM

SPARSE NORMAL

Figure 2: Comparison between SETMARGIN (orange), RIVOI (blue), QI (green) and EUS (gray) on the r = 3 (left) and r = 4 (right) datasets. Each row represents a different sampling distribution for user utility. The number of iterations is plotted against the utility loss (ﬁrst and third columns) and the cumulative time (second and fourth columns). Thick lines indicate median values over users, while standard deviations are shown as shaded areas.

tional cost is measured in terms of cumulative time. Given that RIVOI, QI and EUS are single-threaded, we disabled multi-threading when running our algorithm in these comparisons. All experiments were run on a 2.8 GHz Intel Xeon CPU with 8 cores and 32 Gi B of RAM. For all algorithms, one iteration corresponds to a single pairwise query (we used SETMARGIN with k = 2). For dense weight vector distributions (ﬁrst two rows), our approach achieves results which are indistinguishable from the competitors in a fraction of their time. Indeed, all Bayesian approaches become quickly impractical for growing values of r, while our algorithm can easily scale to much larger datasets, as will be shown later on. For sparse weight vector distributions (last two rows) our approach, in addition to being substantially faster on each iteration, requires less queries in order to reach optimal solutions. This is an expected result as the sparsiﬁcation norm in our formulation (kwk1) is enforcing sparsity in the weights, while none of the other approaches is designed to do this.

In order to study the effect of increasing the number of weight vectors in our formulation, we also ran SETMARGIN varying the parameter k. Figure 3 reports utility loss results

on r=4 and r=5 datasets for the uniform and sparse normal distributions (the toughest and the simplest, for space limitations). The ﬁrst and third columns report results in terms of number of iterations. It can be seen that increasing the number of weight vectors tends to favour earlier convergence, especially for the more complex dataset (r = 5). However, as in each iteration the user is asked to compare k items, different values of k imply a different cognitive effort for the user. The second and fourth columns report results in terms of number of queries, where we count all

pairs of queries when comparing k items. In this case, k = 2 seems to be the best option. The cognitive cost for the user will likely lay in between these two extremes, but formalizing this concept in an efﬁcient query ordering strategy needs to face the effect of noise. A modiﬁed sorting algorithm asking only O(k log k) queries to the user resulted in a performance worsening, likely because of a cascading effect of inconsistent feedback (but could be beneﬁcial with different noise levels).

Constructive dataset. Next, we tested SETMARGIN on a truly constructive setting. We developed a constructive ver-

SPARSE NORMAL

Figure 3: Comparison for SETMARGIN with k = 2, 3, 4, in orange, blue and green respectively for the r = 4 (left) and r = 5 (right) datasets using uniform (top row) and sparse normal (bottom row) distributions. Median and standard deviation utility loss values are reported for increasing number of iterations (1st and 3rd columns) and pairwise queries (2nd and 4th columns).

SPARSE UNIFORM

Figure 4: Results for SETMARGIN for k = 2 (red), k = 3 (blue) and k = 4 (green) on the constructive PC dataset for the sparse uniform distribution. Utility loss and time are plotted against number of iterations (left) and number of queries (right).

sion of the PC dataset used in [Guo and Sanner, 2010]: instead of explicitly enumerating all possible PC items, we deﬁned the set of feasible conﬁgurations with MILP constraints.

A PC conﬁguration is deﬁned by eight attributes: computer type (laptop, desktop, or tower), manufacturer (8 choices), CPU model (37), monitor size (8), RAM amount (10), storage (10) size, and price. The price attribute is deﬁned as a linear combination of the other attributes: this is a fair modeling choice, as often the price of a PC is well approximated by the sum of the price of its components plus a bias due to branding. Interactions between attributes are expressed as Horn clauses (e.g. a certain manufacturer implies a set of possible CPUs). The dataset includes 16 Horn constraints (the full list is omitted for space limitations). Note that the search space is of the order of hundreds of thousands of candidate conﬁgurations, and is far beyond reach of existing Bayesian approaches.

Figure 4 reports results of SETMARGIN varying k using the sparse uniform distribution (the more complex of the sparse ones, dense distributions being unrealistic in this scenario). The ﬁrst and third column report utility loss for increasing number of iterations and queries respectively, showing a behaviour which is similar to the one in Figure 3. Overall, between 50 and 70 queries on average are needed in order to ﬁnd a solution which is only 10% worse than the optimal one, out of the more than 700,000 thousands available. Note that a vendor may ensure a considerably smaller number of queries by cleverly constraining the feasible conﬁguration space; since our primary aim is benchmarking, we

chose not to pursue this direction further. The second and fourth columns report cumulative times. Note that in some cases, standard deviations have a bump; this is due to cases in which some of the hyperparameters of the internal cross validation result in ill-conditioned optimization problems which are hard to solve. These exceptions can be easily dealt with by setting an appropriate timeout on the cross validation without affecting the results, as these hyperparameters typically end up having bad performance and being discarded.

5 Conclusion

We presented a max-margin approach for efﬁcient preference elicitation in large conﬁguration spaces.5 Our approach relies on an extension of max-margin learning to sets, and is effective in the generation of a diverse set of conﬁgurations that can be used to ask informative preference queries. The main advantages of this elicitation method are 1) ability to provide recommendations in large conﬁguration problems 2) robustness with respect to erroneous feedback and 3) ability to encourage sparse utility functions. Experimental comparisons against state-of-the-art Bayesian preference elicitation strategies conﬁrm these advantages. Future work includes extending the approach to truly hybrid scenarios (where real valued attributes do not depend on categorical ones) and studying its

5Note that max-margin learning has been proposed before [Gajos and Weld, 2005] for preference elicitation, but with rudimental methods for query selection.

applicability to other problems, as the identiﬁcation of Choquet models [Ah-Pine et al., 2013].

Acknowledgments ST was supported by the Caritro Foundation through project E62I15000530007. PV was supported by the Idex Sorbonne Universit es under grant ANR11-IDEX-0004-02. We thank Craig Boutilier for motivating discussion on the topic.

References [Ah-Pine et al., 2013] J. Ah-Pine, B. Mayag, and A. Rolland.

Identiﬁcation of a 2-additive bi-capacity by using mathematical programming. In Algorithmic Decision Theory, pages 15 29. 2013. [Boutilier et al., 2006] C. Boutilier, R. Patrascu, P. Poupart,

and D. Schuurmans. Constraint-based Optimization and Utility Elicitation using the Minimax Decision Criterion. Artiﬁcal Intelligence, 170:686 713, 2006. [Boutilier, 2002] C. Boutilier. A POMDP Formulation of Preference Elicitation Problems. In Proceedings of AAAI 02, pages 239 246, 2002. [Bradley and Terry, 1952] R. A. Bradley and M. E. Terry.

Rank analysis of incomplete block designs: I. the method of paired comparisons. Biometrika, 39(3/4):324 345, 1952. [Braziunas and Boutilier, 2007] D. Braziunas and C. Boutilier. Minimax regret based elicitation of generalized additive utilities. In Proceedings of UAI 07, pages 25 32, 2007. [Chajewska et al., 2000] U. Chajewska, D. Koller, and R. Parr. Making rational decisions using adaptive utility elicitation. In Proceedings of AAAI 00, pages 363 369, 2000. [Gajos and Weld, 2005] K. Gajos and D. Weld. Preference elicitation for interface optimization. In UIST, pages 173 182, 2005. [Greco et al., 2008] S. Greco, V. Mousseau, and R. Słowi nski.

Ordinal regression revisited: multiple criteria ranking using a set of additive value functions. European Journal of Operational Research, 191(2):416 436, 2008. [Guo and Sanner, 2010] S. Guo and S. Sanner. Real-time multiattribute bayesian preference elicitation with pairwise comparison queries. In Proceedings of AISTAT 10, pages 289 296, 2010. [Hensinger et al., 2010] E. Hensinger, I. Flaounas, and N. Cristianini. Learning the preferences of news readers with svm and lasso ranking. In Artiﬁcial Intelligence Applications and Innovations, pages 179 186. 2010. [Iyengar et al., 2001] V. S. Iyengar, J. Lee, and M. Campbell.

Q-Eval: Evaluating multiple attribute items using queries. In Proceedings of the Third ACM Conference on Electronic Commerce, pages 144 153, 2001. [Jacquet-Lagr eze and Siskos, 1982] E. Jacquet-Lagr eze and

Y. Siskos. Assessing a set of additive utility functions for multicriteria decision making: the UTA method. European Journal of Operational Research, 10:151 164, 1982.

[Jacquet-Lagreze and Siskos, 2001] E. Jacquet-Lagreze and

Y. Siskos. Preference disaggregation: 20 years of mcda experience. European Journal of Operational Research, 130(2):233 245, 2001. [Keeney and Raiffa, 1976] R. L. Keeney and H. Raiffa. De-

cisions with Multiple Objectives: Preferences and Value Tradeoffs. John Wiley and Sons, New York, 1976. [Louviere et al., 2000] J. J. Louviere, Hensher D. A., and J. D.

Swait. Stated Choice Methods: Analysis and Application. Cambridge University Press, Cambridge, 2000. [Minka, 2001] T. Minka. Expectation propagation for approx-

imate bayesian inference. In Proceedings of UAI 01, pages 362 369, 2001. [Peintner et al., 2008] B. Peintner, P. Viappiani, and N. Yorke-

Smith. Preferences in interactive systems: Technical challenges and case studies. AI Magazine, 29(4):13 24, 2008. [R. et al., 2006] Herbrich R., Minka T., and Graepel T.

Trueskilltm: A bayesian skill rating system. In Proceedings of NIPS 06, pages 569 576, 2006. [Reilly et al., 2007] J. Reilly, J. Zhang, L. Mc Ginty, P. Pu, and

B. Smyth. Evaluating compound critiquing recommenders: a real-user study. In Proceedings of the 8th ACM conference on Electronic commerce, pages 114 123. ACM, 2007. [Tibshirani, 1996] T. Tibshirani. Regression shrinkage and se-

lection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1):267 288, 1996. [Torra and Narukawa, 2007] V. Torra and Y. Narukawa. Mod-

eling decisions - information fusion and aggregation operators. Springer, 2007. [Toubia et al., 2004] O. Toubia, J. R. Hauser, and D. I.

Simester. Polyhedral methods for adaptive choicebased conjoint analysis. Journal of Marketing Research, 41(1):116 131, 2004. [Viappiani and Boutilier, 2010] P. Viappiani and C. Boutilier.

Optimal bayesian recommendation sets and myopically optimal choice query sets. In Proceedings of NIPS 10, pages 2352 2360, 2010. [Viappiani and Craig, 2009] P. Viappiani and B. Craig. Regret-based optimal recommendation sets in conversational recommender systems. In Proceedings of Rec Sys 09, pages 101 108, 2009. [Wang and Boutilier, 2003] T. Wang and C. Boutilier. Incre-

mental Utility Elicitation with the Minimax Regret Decision Criterion. In Proceedings of IJCAI 03, pages 309 316, 2003. [White III et al., 1984] C. C. White III, A. P. Sage, and S. Do-

zono. A model of multiattribute decisionmaking and tradeoff weight determination under uncertainty. IEEE Transactions on Systems, Man, and Cybernetics, 14(2):223 229, 1984. [Zhang and Huang, 2008] C.-H. Zhang and J. Huang. The sparsity and bias of the lasso selection in high-dimensional linear regression. Ann. Statist., 36(4):1567 1594, 08 2008.