# Learning the Valuations of a k-demand Agent

Hanrui Zhang¹, Vincent Conitzer¹

We study problems where a learner aims to learn the valuations of an agent by observing which goods he buys under varying price vectors. More specifically, we consider the case of a k-demand agent, whose valuation over the goods is additive when receiving up to k goods, but who has no interest in receiving more than k goods. We settle the query complexity for the active-learning (preference elicitation) version, where the learner chooses the prices to post, by giving a biased binary search algorithm, generalizing the classical binary search procedure. We complement our query complexity upper bounds by lower bounds that match up to lower-order terms. We also study the passive-learning version, in which the learner does not control the prices; instead, they are sampled from some distribution. We show that in the PAC model for passive learning, any empirical risk minimizer has a sample complexity that is optimal up to a factor of Õ(k).

1. Introduction

The active learning of agents' preferences is also known as preference elicitation. Depending on the setting, we may wish to model and represent preferences differently. For example, if there is a set of alternatives to choose from, and agents cannot make payments, then it is natural to represent an agent's preferences by a weak ordering. If agents also express preferences over distributions over alternatives, we may wish to model an agent's preferences by a utility function u(·) and assume the agent is maximizing expected utility. We may learn agents' preferences by asking them queries, for example which of two (distributions over) alternatives is preferred.

¹Department of Computer Science, Duke University, Durham, USA. Correspondence to: Hanrui Zhang, Vincent Conitzer.

Proceedings of the 37th International Conference on Machine Learning, Online, PMLR 119, 2020.
Copyright 2020 by the author(s).

In other contexts, such as the allocation of goods (or bads, e.g., tasks), agents are often able to make payments (or receive them as compensation). In this context, it is natural to model the agent's preferences by a valuation function v(·), and assume that utility is valuation minus payment made. Depending on the setting, different types of query may be possible. A value query would ask directly for the valuation that the agent has for a specific bundle of goods. But it is not always feasible to ask value queries, for example because the agent finds the query hard to answer, is reluctant to answer it out of fear of exploitation, or because there are simply exogenous restrictions on the type of query we can make. For example, if we are running a grocery store, the only way in which we may learn about an agent's valuation is by setting prices and seeing what he buys. This is what is known as a demand query: given these prices, what would you buy? Such queries will be the focus of our paper.

The very simplest setting involves only a single good. In this case, active learning of the agent's valuation is equivalent to the binary search problem: if we quote a price p that is above the valuation we get a "no" answer, and otherwise a "yes" answer.¹ If there are multiple goods but valuations are additive, so that an agent's valuation for a bundle S of items is simply v(S) = ∑_{j ∈ S} v({j}), then the agent's decision on one good is independent of that on the other goods, and we can simply parallelize the binary searches for the individual goods. More interesting is the case of unit demand, where there are multiple goods but the agent will buy at most one good, namely the good j that maximizes v(j) - p(j), if this value is nonnegative. Here, the active learning problem can be thought of as the following simple abstract problem.
There is a vector of unknown numbers v; a query consists of subtracting from it an arbitrary other vector p, and learning the index of the maximum element of v - p, but not its value. (Note that it makes no difference whether we add, subtract, and/or allow negative numbers.) Given the simplicity of this problem, it is likely to have applications outside of economics as well. For example, imagine a physical system, each possible state of which has a baseline energy that we wish to learn. We can arbitrarily modify the energy of each state, after which the system will go to the lowest-energy state, which we then observe. This is the same problem.²

¹Throughout the paper we assume consistent tie-breaking. That is, whenever v(j) - p(j) = 0, the agent either always wants the item, or always does not want the item. Similarly, whenever two items i and j provide the same utility, i.e., v(i) - p(i) = v(j) - p(j), one of the two is always preferred to the other.

Surprisingly, to our knowledge, how many queries are needed for this very basic problem has not yet been analyzed. In this paper, we settle this up to lower-order terms for the generalization of a k-demand agent, who will buy at most k goods, namely the top k goods j as measured by v(j) - p(j) (unless there are fewer than k for which v(j) - p(j) ≥ 0, in which case only those will be bought). We also study the passive-learning version where we do not control the price vectors; instead, they are generated from some distribution. (This would correspond to the case where the energy modifications are the result of an external random process.)

1.1. Our Results

In Section 2 we study the active elicitation problem, where the learner chooses the price vectors to post, observes the purchased sets, and aims to learn the exact valuations of the agent.
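As a warm-up for the algorithms summarized next, the single-good case from the introduction reduces to binary search over {0, …, W}. Below is a minimal sketch (our illustration, not the paper's code), assuming a hypothetical oracle `buys(p)` that answers a single-good demand query, with the consistent tie-breaking of footnote 1 resolved so that the agent buys whenever v ≥ p:

```python
def elicit_single_good(buys, W):
    """Learn an integer valuation v in {0, ..., W} from demand queries.

    `buys(p)` is a hypothetical demand-query oracle for one good:
    it returns True iff the agent buys at price p (here, iff v >= p).
    Takes ceil(log2(W + 1)) queries, as in classical binary search.
    """
    lo, hi = 0, W  # invariant: lo <= v <= hi
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if buys(mid):      # "yes" at price mid  =>  v >= mid
            lo = mid
        else:              # "no" at price mid   =>  v < mid
            hi = mid - 1
    return lo

v_true = 13
assert elicit_single_good(lambda p: v_true >= p, 100) == 13
```

In the additive multi-good case, this same search runs independently for each good, which is why the binary searches can be parallelized.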
We show that when there are n items, and the value of each item is an integer between 0 and W, there is an algorithm that learns the agent's valuations in (1 + o(1)) · n log W / (k log(n/k)) + n rounds, when k is not too large. We complement this upper bound by showing that both first-order terms of our upper bound are necessary. More specifically, we give adversarial distributions over valuations under which any algorithm needs (1 - o(1)) · n log W / (k log(n/k)) and n - 1 - k rounds, respectively. Our algorithm is therefore optimal in a strong sense.

In Section 3, we study the passive learning problem. We consider a PAC setting, where price vectors are drawn from a distribution; the learner observes the price vectors as well as the agent's choices, and aims to predict the agent's future choices. We establish sample complexity upper and lower bounds for the passive learning problem by settling the Natarajan dimension of the corresponding concept class. We also give efficient algorithms for the empirical risk minimization (ERM) problem; by solving this problem, our upper bound is achieved. Our bounds for the passive learning task are only approximately tight in a worst-case sense, which means that in practice, our learning algorithm is likely to outperform the theoretical upper bound. In Section 4, we experimentally evaluate the performance of ERM algorithms.

²As a more specific example, suppose there is a set S of nearby natural structures in a lightning-prone area. We are interested in determining the electrical resistance of each structure. To do so, we can place lightning rods of varying heights on the structures, which will reduce the resistance of the electrical path through each structure by a known amount, and see where lightning strikes, which will reveal which of the paths has the lowest resistance for the given lightning rods.
Our findings show that when prices are i.i.d., the empirical sample complexity of ERM algorithms depends much more mildly on the number of items n and the demand k than the theoretical bound suggests.

1.2. Related Work

In economics, there is a long line of work on revealed preference theory, initiated by Samuelson (1938). Here, the idea is to infer consumers' utility functions based on the choices that they make. However, most of this work concerns divisible goods and consumers that optimize given a monetary budget. Some work concerns the construction of valuations that explain the observed choices of the agent. In particular, Afriat (1967) shows that a sequence of observations can be explained by a utility function if and only if it can be explained by a utility function that is piecewise linear, monotone, and concave. While the proof is constructive, the representation of the constructed utility function grows in complexity with the number of observed choices, so in general the construction fails to be predictive.

In computer science, researchers have worked on both active and passive learning models. Some of the earliest work focuses on preference elicitation (Sandholm & Boutilier, 2006), an active-learning variant in which a party such as an auctioneer asks the agents queries about their valuations, and the agents respond. Typically, the goal is to determine the final outcome (say, the allocation of resources) with as few queries as possible, though sometimes this is done by simply learning each agent's valuation function precisely and then computing the allocation.³ Early work established close connections between preference elicitation and the active learning of valuation functions (Blum et al., 2004; Lahaie & Parkes, 2004). Multiple types of queries are studied in this line of work. One is a value query, where an agent is asked for his valuation for a specific bundle.
Another is a demand query, where an agent is asked which items he would buy at specific prices for the items. The latter type of query is the one of interest in this paper. Passive variants, where in each round prices are sampled from a distribution (or are otherwise outside our control) and we see what the agent buys under these prices, also fit the demand-query model: every round corresponds to a demand query that we do not control. This is what we study towards the end of this paper. (There are also passive-learning variants corresponding to value queries (Balcan & Harvey, 2011; Balcan et al., 2012), but we will not discuss those here.) Various iterative mechanisms (Parkes, 2006), such as ascending combinatorial auctions, require the agent to indicate a preferred bundle while prices adjust; these mechanisms are thus also implemented as sequences of demand queries. In other contexts, it is natural to ask yet different types of queries: in voting (Conitzer, 2009; Procaccia et al., 2009), one may ask a comparison query (which of these two alternatives do you prefer?), and in cake cutting (Brams & Taylor, 1996; Procaccia & Wang, 2017), one may ask how far the knife needs to be moved for the agent to become indifferent between the two parts. However, in this paper we only consider the setting where agents have valuations for items and respond to demand queries.

³One may worry about incentives, e.g., an agent pretending to have a low valuation in order to be quoted a low price at the end. However, if VCG pricing is used, then it is an ex-post equilibrium for all agents to respond truthfully to every query (Sandholm & Boutilier, 2006; Nisan et al., 2007). This insight also applies to our work here.

To study the prediction aspect of revealed preference theory, Beigman & Vohra (2006) consider a PAC-learning model.
They introduce a complexity measure of classes of utility functions and, based on this measure, characterize the learnability of different classes. Following Beigman and Vohra, Zadimoghaddam & Roth (2012) give efficient learning algorithms for linearly separable concave utility functions in the PAC model. Their bound was later improved by Balcan et al. (2014), who also give generalizations to other classes of utility functions, misspecified models, and non-linear prices. Slightly departing from the PAC setting, Amin et al. (2015) study profit maximization in an online learning model. Bei et al. (2016) extend the results of Balcan et al. to Fisher and exchange markets. All these papers study divisible goods and monetary budgets. In this paper, in contrast, we consider indivisible goods and k-demand agents without a monetary budget constraint. Our results are therefore of a combinatorial nature. Basu & Echenique (2018) study the learnability of preference models of choice under uncertainty, and Chase & Prasad (2018) study the learnability of time-dependent choices. Their models are intrinsically different from ours; in particular, they aim to learn binary relations, as opposed to predicting combinatorial outcomes. Blum et al. (2018) consider a setting where a seller has unknown priority among the buyers, according to which they are allocated items. They give algorithms that reconstruct, with few mistakes, both the buyers' valuations and the seller's priority, whenever the buyers have additive, unit-demand, or single-minded valuations.
These results are incomparable to ours, since (1) they consider an online model where the goal is to minimize the number of mistakes, whereas we give algorithms that operate either with active querying or in the PAC model, and (2) even in their online model, when there are variable prices, their results apply only to additive or unit-demand buyers, and the mistake bound depends on that of the ellipsoid algorithm. The main complexity of their model comes from the fact that there are multiple agents affecting each other.

There is various research on similar, but less closely related, topics (Besbes & Zeevi, 2009; Babaioff et al., 2015; Roth et al., 2016; Brero et al., 2017; Roth et al., 2017; Brero et al., 2018; Balcan et al., 2018; Ji et al., 2018).

2. Active Preference Elicitation

In this section, we study the following active learning model: there is a single k-demand buyer in the market, to whom the learning agent (the seller) may pose demand queries, each consisting of a vector of prices. The buyer values the i-th item at v_i, where it is common knowledge that v_i ∈ {0, 1, 2, . . . , W}. The actual values of the buyer, however, are unknown to the seller, and are for the seller to learn. The seller repeatedly posts prices on individual items. The buyer then buys the k (or fewer) items that maximize his quasilinear utility, and the seller observes the buyer's choice of the k (or fewer) items to buy. The question we are interested in is the following: what is the minimum number of rounds (i.e., demand queries) needed for the seller to acquire enough information to be sure of (v_i)_i, and what algorithm achieves this number?

2.1. The Biased Binary Search Algorithm

We present an algorithm based on biased binary search, Algorithm 1. The algorithm, generalizing the classical binary search procedure, works in the following way: first, we fix an item (item 1) as the reference item, and learn its valuation using binary search.
Then, throughout the execution, the algorithm keeps track of the possible range [v_i^-, v_i^+] of each item i's value. We maintain A as the set of items for which we have not yet learned the exact valuation. If, for a given demand query, the reference item is chosen, then we know that each item i that is not chosen gives utility at most that of the reference item, allowing us to update v_i^+. If the reference item is not chosen, then we know that each item i that is chosen gives utility at least that of the reference item, allowing us to update v_i^-. The algorithm sets prices in such a way that no matter what the chosen set is, the information gain (as measured by a potential function) from shrinking the ranges is always about the same. The word "biased" in the name indicates that the ranges do not necessarily shrink by a factor of 1/2. For example, in the unit-demand case, if the reference item is chosen, we get to shrink all the other items' ranges, but only by a little; whereas if another item is chosen, we get to shrink only that item's range, but by a lot. This ensures the information gain is (roughly) invariant. When we learn an item i's valuation and drop it from A, we update the number of items n' = |A| whose valuation we still need to learn.

Algorithm 1 Biased Binary Search
 1: Input: number of items n, range of values W
 2: Output: (v_i)_i
 3: Post price p_i = ∞ for i ≥ 2, and binary search for v_1.
 4: Let v_1^- = v_1^+ = v_1, p_1 = v_1 - 0.5, A = {2, . . . , n}, n' = n.
 5: For each i ∈ A, let v_i^- = 0, v_i^+ = W.
 6: while true do
 7:   for i ∈ A do
 8:     Set p_i = v_i^+ - (v_i^+ - v_i^-) · k log(n'/k) / n' - 0.5.
 9:   end for
10:   Ask a query at these prices; let S be the winning set.
11:   if 1 ∈ S then
12:     for i ∈ A \ S do
13:       Let v_i^+ = p_i + 0.5.
14:     end for
15:   else
16:     for i ∈ S do
17:       Let v_i^- = p_i + 0.5.
18:     end for
19:   end if
20:   for i ∈ A do
21:     if v_i^+ - v_i^- < 1 then
22:       Let A = A \ {i}, n' = n' - 1, p_i = ∞.
23:     end if
24:   end for
25:   Break if 2k > n'.
26: end while
27: Let B be any subset of A of cardinality min(n', k).
28: Post price p_i = ∞ for all i ∈ A \ B, and binary search in parallel for (v_i)_{i ∈ B}.
29: Post price p_i = ∞ for all i ∈ B, and binary search in parallel for (v_i)_{i ∈ A \ B}.
30: for i ∈ [n] do
31:   Let v_i = v_i^+.
32: end for
33: Output (v_i)_i.

If n' becomes less than twice as large as k, we divide the remaining items in A into two groups of size not exceeding k; for each group, we perform binary search for all items in the group, while posting price ∞ for items in the other group to ensure they are never chosen. Because the size of neither group exceeds k, an item will be chosen if and only if its value exceeds the price, independently of the prices of the other items in the group. Hence, we can learn the values of all items in the group in parallel, via binary search.

We now bound the query complexity of the algorithm.

Theorem 1. Algorithm 1 computes the values (v_i)_i of the buyer, and has query complexity (1 + o(1)) · n log W / (k log(n/k)) + n + O(log W).

Before proceeding to the proof, we note that in the more interesting case where k is not too large compared to n (i.e., k = o(n)), the term O(log W) is dominated by the other terms of the bound.

Proof of Theorem 1. We first prove correctness. We show that throughout the while loop, we always have v_i ∈ [v_i^-, v_i^+]. Consider the update procedure from Line 10 to Line 18. When the reference item, item 1, is among the chosen ones, we know that for any unchosen item i, v_i - p_i ≤ v_1 - p_1 = 0.5. Therefore, v_i ≤ p_i + 0.5, and the right-hand side is what v_i^+ is updated to in this case. When item 1 is not chosen, we know that for any chosen item i, v_i - p_i ≥ v_1 - p_1 = 0.5. Therefore, v_i ≥ p_i + 0.5, and the right-hand side is what v_i^- is updated to in this case.

Now we prove the query complexity upper bound. The binary search for v_1 takes log W demand queries. To analyze the dominant part of the complexity, let us define the following potential function: Φ(((v_i^-, v_i^+))_i) = ∑
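To keep the query model above concrete, here is a small sketch (our illustration; the function name and tie-breaking convention are our assumptions, not the paper's code) of the buyer's side of a single demand query: a k-demand buyer returns the at most k items maximizing v_i - p_i, restricted to items with nonnegative utility, with ties broken consistently by index as required by footnote 1.

```python
def demand_query(values, prices, k):
    """Simulate a k-demand buyer's response to posted prices: the set of
    at most k items maximizing utility v_i - p_i, restricted to items
    with nonnegative utility. Ties broken consistently (smaller index)."""
    ranked = sorted(range(len(values)),
                    key=lambda i: (-(values[i] - prices[i]), i))
    return {i for i in ranked[:k] if values[i] - prices[i] >= 0}

# A unit-demand (k = 1) buyer with values (5, 3, 4): at these prices,
# item 1 yields the largest utility (3 - 0.5 = 2.5), so it is bought.
assert demand_query([5, 3, 4], [4.5, 0.5, 1.5], k=1) == {1}
```

Posting price ∞ (or any price above W) for an item, as Algorithm 1 does for items outside the current group, guarantees that item is never in the returned set.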