# Multiple-Profile Prediction-of-Use Games

Andrew Perrault and Craig Boutilier
Department of Computer Science, University of Toronto
{perrault, cebly}@cs.toronto.edu

Prediction-of-use (POU) games [Robu et al., 2017] address the mismatch between energy supplier costs and the incentives imposed on consumers by fixed-rate electricity tariffs. However, the framework does not address how consumers should coordinate to maximize social welfare. To address this, we develop MPOU games, an extension of POU games in which agents report multiple acceptable electricity use profiles. We show that MPOU games share many attractive properties with POU games (e.g., convexity). Despite this, MPOU games introduce new incentive issues that prevent the consequences of convexity from being exploited directly, a problem we analyze and resolve. We validate our approach with experimental results using utility models learned from real electricity use data.

## 1 Introduction

Prediction-of-use games were developed by Robu et al. [2017], hereafter RVRJ, to address the mismatch between the cost structure of energy suppliers and the incentive structure induced by the fixed-rate tariff faced by consumers. In most countries, energy suppliers face a two-stage market: they purchase energy at lower rates in anticipation of future consumer demand, and then reconcile supply and demand exactly, at a higher rate, at the time of realization through a balancing market [Team, 2011]. The cost to energy suppliers is thus highly dependent on their ability to predict future consumption. Since consumers typically have little incentive to consume predictably, suppliers use past behavior to predict consumption. The resulting prediction uncertainty incurs some additional cost for suppliers.
One way to improve supplier predictions is to incentivize consumers to report predictions of their own consumption, thus offering access to their private information about the future. RVRJ analyze mechanisms where flat tariffs are replaced with prediction-of-use (POU) tariffs, in which consumers make a payment based on both their actual consumption and the accuracy of their prediction. Similar tariffs have been deployed in practice [Braithwait et al., 2007]. RVRJ analyze the cooperative game induced by POU tariffs, in which consumers form buying coalitions that reduce (aggregate) consumption uncertainty, and find that, under normally distributed prediction error, the game is convex, a powerful property that significantly reduces the complexity of important problems in cooperative games.

Author note: Author now at Google Research, Mountain View, CA.

While attractive, the POU model has a significant shortcoming. Though the POU model could be adapted to model how consumers change their consumption in reaction to price changes, it gives consumers no way to coordinate their consumption choices. A consumer's optimal consumption profile (a random variable representing possible behaviors or patterns of energy consumption) depends on the profiles others use. In POU games, the only consumer choice is what coalition to join; consumer demand is represented by a single prediction, reflecting just one selected (or average) consumption profile. In essence, consumers predict their behavior without knowing anything about others in the game. While the POU model can offer social welfare gains when the profiles are selected optimally, we show that it can result in significant welfare loss when profiles are uncoordinated.

We introduce multiple-profile POU (MPOU) games, which extend POU games to admit multiple consumer profiles (or "bids"). This allows consumers to coordinate the behaviors that change their predictions, facilitating the full realization of the benefits of the POU model.
We show that MPOU games have many of the same properties that make the POU model tractable, e.g., convexity, which makes the stable distribution of the benefits of cooperation easy to compute. In addition, we show that MPOU games are individually rational and that consumer utility is monotone increasing as the number of truthfully reported profiles increases. However, MPOU games also present a new challenge in coalitional allocation: since one can only observe an agent's (stochastic) consumption, not their underlying behavior, determining stabilizing payments for coalitional coordination requires novel techniques. We introduce separating functions, which incentivize agents to take a specific action in settings where actions are only partially observable.

We experimentally validate our techniques, using household utility functions that we learn (via structured prediction) from publicly available electricity use data. We find that the MPOU model provides a gain of 3-5% over a fixed-rate tariff across several test scenarios, while a POU tariff without consumer coordination can result in losses of up to 30%. These experiments represent the first study of the welfare consequences of POU tariffs.

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

Sec. 2 reviews cooperative games, the POU model and related work. Sec. 3 introduces MPOU games and Sec. 4 proves their convexity. Sec. 5 outlines the new class of incentive problems that arises when the mechanism designer cannot (directly) observe an agent's selected profile, and develops a general solution to that problem. In Sec. 6, we describe an approach for learning consumer utility models from real-world electricity usage data, and in Sec. 7 we experimentally validate the value of MPOU games using these learned models.

## 2 Background

Cooperative Games: We briefly overview the relevant aspects of cooperative game theory.
A prediction-of-use game is an instance of a cooperative game with transferable utility [Osborne and Rubinstein, 1994], where agents can make arbitrary monetary payments to each other. In a cooperative game, the set $N$ of agents divides into a set of coalitions, i.e., a disjoint partitioning of the agents. In a profit game, the characteristic function $v : 2^N \to \mathbb{R}$ represents the value that any subset of agents can achieve by cooperating. A profit game is a tuple $\langle N, v \rangle$. The agents in a coalition distribute the benefits of cooperation however they choose. An allocation is a payment function $t : N \to \mathbb{R}$ that assigns some payment (which may be negative) to each agent. An allocation is efficient if it distributes the entire value, i.e., $\sum_{i \in N} t(i) = v(N)$. Agents receive no individual value under this model: all value is redistributed via coalitional payments. In practice, the individual value accrued by an agent may be deducted from its payment in order to reduce total transfers.

A major goal of cooperative game theory is to find allocations that prevent agents from defecting from their coalition, thus achieving stability. An allocation that stabilizes the grand coalition of all agents is in the core:

Definition 1. Allocation $t$ is in the core of profit game $\langle N, v \rangle$ if it is efficient and $\sum_{i \in S} t(i) \ge v(S)$ for all $S \subseteq N$.

The core is a strong stability concept that may not exist for general games. Another central solution concept is the Shapley value $s_C(i)$ of an agent $i$ in coalition $C$, which emphasizes fairness and always exists. It values each agent according to their marginal value contribution when averaged over all join orders (i.e., the order in which agents are added to $C$):

$$s_C(i) = \sum_{S \subseteq C \setminus \{i\}} \frac{|S|!\,(|C| - |S| - 1)!}{|N|!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr) \tag{1}$$

A convex game is one where the value contributed by an agent to a coalition never decreases as more agents are added to that coalition:

Definition 2. Profit game $\langle N, v \rangle$ is convex if $v(T \cup \{i\}) - v(T) \ge v(S \cup \{i\}) - v(S)$ for all $i \in N$ and $S \subseteq T \subseteq N \setminus \{i\}$.
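The definitions above can be checked by brute force on a tiny game. The following is a minimal sketch, not the paper's implementation: it averages marginal contributions over every join order (which is exact only for very small agent sets), tests the convexity inequality for all nested subsets, and verifies the core conditions. The names `shapley`, `is_convex`, `in_core` and the toy value function `toy_value` are hypothetical and introduced only for illustration.

```python
from itertools import combinations, permutations


def subsets(agents):
    """Yield every subset of `agents` as a frozenset (including the empty set)."""
    for r in range(len(agents) + 1):
        for combo in combinations(agents, r):
            yield frozenset(combo)


def shapley(agents, v):
    """Average each agent's marginal contribution over all join orders."""
    totals = {i: 0.0 for i in agents}
    orders = list(permutations(agents))
    for order in orders:
        members = frozenset()
        for i in order:
            totals[i] += v(members | {i}) - v(members)
            members = members | {i}
    return {i: totals[i] / len(orders) for i in agents}


def is_convex(agents, v):
    """Check v(T+i) - v(T) >= v(S+i) - v(S) for all S subset of T, i not in T."""
    for i in agents:
        others = [a for a in agents if a != i]
        for t in subsets(others):
            for s in subsets(sorted(t)):
                if v(t | {i}) - v(t) < v(s | {i}) - v(s) - 1e-12:
                    return False
    return True


def in_core(agents, v, t):
    """Efficiency plus the stability condition for every coalition S."""
    if abs(sum(t.values()) - v(frozenset(agents))) > 1e-9:
        return False
    return all(sum(t[i] for i in s) >= v(s) - 1e-9 for s in subsets(agents))


# Hypothetical toy game: coalition value grows with the square of its size,
# so marginal contributions increase with coalition size (a convex game).
AGENTS = ["a", "b", "c"]


def toy_value(coalition):
    return float(len(coalition) ** 2)


PAYMENTS = shapley(AGENTS, toy_value)
```

In this symmetric toy game each agent receives an equal share of $v(N)$, and the resulting allocation passes the core check, as expected for a convex game.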
Convex games have important properties [Shapley, 1971]: the grand coalition maximizes social welfare, the Shapley value is in the core, and a core allocation must exist and is computable in polynomial time in the number of agents.

Prediction-of-Use Games: A prediction-of-use (POU) game is a tuple $\langle N, \Pi, \tau \rangle$. $N$ is a set of agents, where each $i \in N$ uses electricity according to a consumption profile in $\Pi$, a normal random variable with mean $\mu_i$ and standard deviation $\sigma_i$, say, in kilowatt-hours (kWh). Let $x_i$ denote $i$'s realized consumption, $x_i \sim \mathcal{N}(\mu_i, \sigma_i)$. Agents are assumed to truthfully report their profiles to the coalition (we do not address elicitation or estimation of consumption here, but see below).

A POU tariff has the form $\tau = \langle p, \bar{p}, \underline{p} \rangle$, and is intended to better align the incentives of the consumer and the electricity supplier, whose costs are greatly influenced by how predictable demands are. Each agent $i$ is asked to predict a baseline consumption $b_i$, and is charged $p$ for each unit of $x_i$, plus a penalty that depends on the accuracy of their prediction: $\bar{p}$ for each unit by which the realized $x_i$ exceeds the baseline, and $\underline{p}$ for each unit by which it falls short:

$$\psi(x_i, b_i, \tau) = \begin{cases} p\,x_i + \bar{p}\,(x_i - b_i) & \text{if } b_i \le x_i \\ p\,x_i + \underline{p}\,(b_i - x_i) & \text{if } b_i > x_i \end{cases} \tag{2}$$

To ensure agents have no incentive to artificially inflate consumption, we require $0 \le \bar{p}$ and $0 \le \underline{p} \le p$ [Robu et al., 2017]. An agent $i$ should report a baseline that minimizes her expected payment. RVRJ show that $i$ does this by predicting $b_i = \mu_i + \sigma_i \Phi^{-1}\!\left(\frac{\bar{p}}{\bar{p} + \underline{p}}\right)$, where $\Phi^{-1}$ is the inverse normal CDF. They also show that $i$'s expected payment under the optimal baseline is $\mu_i\,p + \sigma_i\,L(\bar{p}, \underline{p})$, where $L(\bar{p}, \underline{p}) = \int_0^{\bar{p}/(\bar{p} + \underline{p})} \Phi^{-1}(y)\,dy$.

To be more predictable in aggregate, agents may form a coalition $C$, where $C$ reports its aggregate demand and is charged as if it were a single agent. $C$'s aggregate consumption is the sum of the normal random variables corresponding to the members' profiles, itself normal with mean $\mu(C) = \sum_{i \in C} \mu_i$ and std. dev. $\sigma(C) = \sqrt{\sum_{i \in C} \sigma_i^2}$.
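To make the tariff mechanics concrete, here is a minimal sketch of the optimal baseline report and of the benefit of pooling. It does not evaluate $L(\bar{p}, \underline{p})$ through the integral in the text. Instead, as an assumption of this sketch, the expected penalty at the optimal baseline is computed with the standard normal loss-function identity $\sigma(\bar{p} + \underline{p})\,\varphi(z^*)$, where $z^* = \Phi^{-1}(\bar{p}/(\bar{p} + \underline{p}))$ and $\varphi$ is the standard normal PDF. The function names and all numeric values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, std. dev. 1


def optimal_baseline(mu, sigma, p_over, p_under):
    """Baseline minimizing the expected penalty: the p_over/(p_over+p_under) quantile."""
    quantile = p_over / (p_over + p_under)
    return mu + sigma * STD_NORMAL.inv_cdf(quantile)


def expected_penalty(sigma, p_over, p_under):
    """Expected over/under-consumption penalty at the optimal baseline.

    Assumption of this sketch: closed form from the normal loss function,
    sigma * (p_over + p_under) * pdf(z*), rather than the integral L in the text.
    """
    z_star = STD_NORMAL.inv_cdf(p_over / (p_over + p_under))
    return sigma * (p_over + p_under) * STD_NORMAL.pdf(z_star)


def pooled(profiles):
    """Mean and std. dev. of the sum of independent normal profiles (mu, sigma)."""
    mu = sum(m for m, _ in profiles)
    sigma = sqrt(sum(s * s for _, s in profiles))
    return mu, sigma


# Hypothetical tariff and two hypothetical agents (mu, sigma) in kWh.
P, P_OVER, P_UNDER = 0.10, 0.05, 0.05
MEMBERS = [(10.0, 3.0), (20.0, 4.0)]

SEPARATE = sum(expected_penalty(s, P_OVER, P_UNDER) for _, s in MEMBERS)
MU_C, SIGMA_C = pooled(MEMBERS)
TOGETHER = expected_penalty(SIGMA_C, P_OVER, P_UNDER)
```

Because the pooled standard deviation is smaller than the sum of the members' standard deviations, the coalition's expected penalty is lower than the total the members would pay when acting separately. With a larger over-consumption penalty, the optimal baseline moves above the mean.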
This aggregate prediction generally has lower variance relative to its mean, thus reducing the total penalty payments facing $C$ under POU tariffs (compared to members acting individually). RVRJ analyze ex-ante POU games, where agents make all decisions and internal payments are based on expected consumption (realized consumption plays no role). This approach is justified when agents are risk-neutral expected-utility maximizers and coalitions form at the time of consumption prediction, not at realization. The characteristic value of coalition $C$ is $v(C) = -\mu(C)\,p - \sigma(C)\,L(\bar{p}, \underline{p})$, and they show that the ex-ante POU game is convex [fn. 1].

Footnote 1: Technically, they define the game as a cost game and show that the game is concave, while we use a profit game, but results from the two perspectives translate directly.

Related Work: POU games are closely related to newsvendor games [Müller et al., 2002], where a supplier must purchase inventory in advance of demand and faces a penalty for oversupply (storage costs) and undersupply (lost profit). Unlike POU games, the players are the suppliers, the demand distribution is known, and the primary object of study is the value that suppliers can gain by pooling their inventory.

In addition to POU games, others have proposed the formation of cooperatives or coalitions among electricity consumers. Rose et al. [2012] develop a similar mechanism for truthfully eliciting consumer demand. Kota et al. [2012] and Akasiadis & Chalkiadakis [2013] propose using coalitions to improve reliability and shift peak power loads. Perrault et al. [2015] focus on the formation of groups of consumers with multiple profiles to reduce peak loads. None of this work offers the theoretical guarantees of RVRJ. Beyond electricity markets, several authors have studied the problem of group purchasing in an AI context.
Lu and Boutilier [2012] study a restrictive class of buyer preferences (unit demand, only the supplier affects utility) and seller price functions (volume discounts), which has strong theoretical guarantees. Similarly, optimally matching a group of cooperative buyers to sellers has been studied [Sarne and Kraus, 2005; Manisterski et al., 2008].

## 3 Multiple-Profile POU Games

We extend POU games by allowing agents to report multiple profiles, each reflecting different behaviors or consumption patterns, and each with an inherent utility or value reflecting comfort, convenience, flexibility or other factors. This will allow an agent, when joining or bargaining with a coalition, to trade off cost (especially the cost of predictability) against her inherent utility.

A multiple-profile POU (MPOU) game is a tuple $\langle N, \Pi, V, \tau \rangle$. Given a set of agents $N$, each agent $i \in N$ has a non-empty set of demand profiles $\Pi_i$, where each profile $\pi_k = \langle \mu_k, \sigma_k \rangle \in \Pi_i$ reflects a consumption pattern (as in a POU model). Agent $i$'s valuation function $V_i : \Pi_i \to \mathbb{R}$ indicates her value or relative preference (in dollars) for her demand profiles [fn. 2]. Admitting multiple profiles allows us to reason about an agent's response to the incentives that emerge with POU tariffs and in coalitional bargaining. We use the definition of POU tariffs and agent baselines as in POU games above. Notice that the optimal baseline report for an agent is now defined relative to the profile they use.

As in POU games, agents are motivated to form coalitions to reduce the relative variance in their predictions. However, for a coalition $C$ to accurately report its aggregate demand, its members must select and commit to a specific usage profile. We denote an assignment of profiles to agents as $A : N \to \times_{i \in N} \Pi_i$. Under such an assignment, $C$'s consumption is normal, with mean $\mu(C, A) = \sum_{i \in C} \mu(A(i))$ and std. dev. $\sigma(C, A) = \sqrt{\sum_{i \in C} \sigma^2(A(i))}$.
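The aggregate statistics above are what a coalition needs in order to score a candidate assignment. As a minimal sketch, the code below enumerates every joint assignment for a two-agent coalition and keeps the one maximizing total inherent value minus the expected tariff payment, which follows the POU expected payment form $\mu p + \sigma L$ applied to the coalition. Exhaustive enumeration is a stand-in chosen for this illustration and is only viable for tiny instances. The constant `penalty_coef` is a hypothetical positive number standing in for $L(\bar{p}, \underline{p})$, and all profile values are invented.

```python
from itertools import product
from math import sqrt


def coalition_value(assignment, p, penalty_coef):
    """Score one joint assignment.

    assignment: one chosen profile (mu, sigma, value) per agent.
    penalty_coef: hypothetical stand-in for the penalty factor L.
    """
    mu = sum(m for m, _, _ in assignment)
    sigma = sqrt(sum(s * s for _, s, _ in assignment))
    value = sum(val for _, _, val in assignment)
    return value - mu * p - sigma * penalty_coef


def best_assignment(profile_sets, p, penalty_coef):
    """Enumerate all joint assignments and return the highest-scoring one."""
    return max(product(*profile_sets),
               key=lambda a: coalition_value(a, p, penalty_coef))


# Hypothetical agents: a base profile and a lower-variance, lower-value profile.
AGENT_1 = [(10.0, 3.0, 5.0), (10.0, 1.0, 4.8)]
AGENT_2 = [(20.0, 4.0, 9.0), (20.0, 3.0, 8.9)]

BEST = best_assignment([AGENT_1, AGENT_2], p=0.1, penalty_coef=0.2)
BASE_BASE = coalition_value((AGENT_1[0], AGENT_2[0]), 0.1, 0.2)
LOW_BASE = coalition_value((AGENT_1[1], AGENT_2[0]), 0.1, 0.2)
```

In this example, agent 1 switching to her low-variance profile lowers the coalition's value if agent 2 stays on her base profile, yet the best joint assignment has both agents on their low-variance profiles. Because the square-root term couples the agents' choices, profiles cannot be chosen one agent at a time.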
The aggregate value accrued by the coalition (prior to supplier payments) is the sum of its members' values: $V(C, A) = \sum_{i \in C} V_i(A(i))$.

Footnote 2: Such profiles and values may be explicitly elicited or estimated using past consumption data (see Sec. 6).

As in RVRJ, we begin by analyzing ex-ante MPOU games, where agents make decisions and payments before consumption is realized. The characteristic value $v$ of a coalition $C$ is the maximum value that coalition can achieve in expectation under full cooperation, i.e., an optimal profile assignment and baseline report, namely $v(C) = \max_A v(C, A)$, where

$$v(C, A) = V(C, A) - \mu(C, A)\,p - \sigma(C, A)\,L(\bar{p}, \underline{p}) \tag{3}$$

Notice that profile selection does not arise in the POU setting. In our MPOU model, coalition value is non-concave, even if integrality of the assignment variables is relaxed, because the last term is a negative square root: $-\sigma(C, A) = -\sqrt{\sum_{i \in C} \sigma^2(A(i))}$. We can perform the optimization using a mixed integer program by replacing the negative square root with a piecewise linear upper bound, which requires two binary variables per segment. As in other matching problems, we can relax the assignment variables: in practice, relaxed solutions are very close to integral.

In the following sections, we present a mechanism for MPOU games with which the grand coalition organizes the individual consumption behavior of its members and the payments that flow among them. The mechanism proceeds as follows:

1. Agents report their consumption profiles to the mechanism (we assume this report is truthful).
2. The mechanism calculates an assignment $A$ of agents to profiles that maximizes social welfare.
3. The mechanism calculates an ex-ante core-stable payment $t(i)$ for each agent $i$ that is based on all agents using their assigned profiles (Sec. 4).
4. For each agent with an incentive to defect from their assigned profile, the mechanism calculates a separating function $D_i$ (Sec. 5).
5. At realization time, each agent $i$ receives $t(i)$.
Each agent $i$ that has a separating function receives $D_i(x_i)$, where $x_i$ is his/her realized consumption.

## 4 Properties of MPOU Games

It is natural to ask whether, like POU games, ex-ante MPOU games are convex, since convexity simplifies the analysis of stability and fairness. We show that this is the case [fn. 3].

Footnote 3: Proofs of all results are provided in an online appendix: cs.toronto.edu/~perrault/mpou-appendix.pdf.

Theorem 1. The ex-ante MPOU game is convex.

Since the ex-ante MPOU game is convex, the Shapley value is in the core, hence we can compute a core allocation by averaging the payments from any number of join orders. In our experiments, we approximate the Shapley value by sampling [Castro et al., 2009].

It is important that agents are incentivized to participate in the mechanism. We show that MPOU games are individually rational: no agent receives less utility than the best outside option. To achieve this, we augment an instance of the game by adding a dummy profile to each agent with value equal to that of their (best) outside option.

Theorem 2. Let $G$ be an MPOU game where each agent has a profile $\pi^{(i)}_{\text{out}}$ with $V(\pi^{(i)}_{\text{out}}) = \theta_i$ and $\sigma(\pi^{(i)}_{\text{out}}) = \mu(\pi^{(i)}_{\text{out}}) = 0$, where $\theta_i$ is the value of $i$'s outside option. Then, $G$ is ex-ante individually rational if core payments are used.

Proof. Core payments exist because $G$ is an MPOU game, hence convex. Suppose, by way of contradiction, that agent $i$ receives an expected payment less than $\theta_i$. The stability condition of core payments requires that $t(i) \ge v(\{i\})$. However, this contradicts the fact that $v(\{i\}) \ge \theta_i$.

While we do not address the general problem of truthful reporting of profiles, we can show that a related, weaker condition holds: agents are not incentivized to strategically withhold information if they otherwise report truthfully.

Theorem 3.
Let $G$ be an MPOU game, and let $G'$ be identical to $G$ except that agent $i$ reports an additional profile $\pi^{(i)}_{\text{extra}}$. Let all of $i$'s reported profiles be truthful. Then, $t_{G'}(i)$, agent $i$'s payoff in $G'$, is greater than or equal to $t_G(i)$, its payoff in $G$, if core payments are used that average marginal contributions over the same join orders.

Proof. Each time agent $i$ is added to a coalition $S$ in a join order, agent $i$'s marginal contribution to $v(S \cup \{i\})$ with the extra profile is greater than or equal to its contribution with its original profiles. Thus, $t_{G'}(i) \ge t_G(i)$.

## 5 Incentives in MPOU Games

MPOU games introduce a new coordination problem for coalitions that does not occur in POU games. In a fully cooperative MPOU game, a coalition $C$ agrees on a joint consumption profile prior to reporting its (aggregate) predicted demand. Despite this agreement, an agent $i \in C$ may have an incentive to use a profile that differs from the one agreed to. For instance, suppose agent $i$ has two profiles, $\pi_0$ and $\pi_1$, with $V_i(\pi_0) > V_i(\pi_1)$, and that to maximize the social welfare of $C$, $i$ should use $\pi_1$ (and receive coalitional payment $t(i)$). By deviating from her agreed-upon profile, $i$ can increase her net utility (from $t(i)$ to $V_i(\pi_0) - V_i(\pi_1) + t(i)$). Typically, a penalty should be imposed for such a deviation to ensure that $C$'s welfare is maximized.

Unfortunately, $i$'s profile cannot be directly observed. Only her realized consumption $x_i$ is observable, and it is related only stochastically to her underlying behavior (adopted profile). As such, any such transfer or penalty in the coalitional allocation must depend on $x_i$, showing that an ex-ante analysis is insufficient for MPOU games (in stark contrast to POU games). Furthermore, since $x_i$ is stochastic, it could have arisen from $i$ using either profile (i.e., we have no direct signal of $i$'s chosen profile), which makes the design of such transfers even more difficult.
Finally, a poor choice of transfer function may compromise the convexity of the ex-ante game, undermining our ability to compute core payments.

To address these challenges, we use a separating function $D_i(x_i)$. For each agent $i$, $D_i$ maps $i$'s realized consumption to an additional ex-post separating payment.

Definition 3. $D_i$ is a separating function (SF) for $i$ under assignment $A$ if it satisfies the incentive and zero-expectation conditions.
Incentive: $\mathbb{E}_{x_i \sim A(i)}[D_i(x_i)] > \mathbb{E}_{x_i \sim \pi}[D_i(x_i)] + V_i(\pi) - V_i(A(i))$ for any $\pi \in \Pi_i$ such that $\pi \ne A(i)$.
Zero-expectation: $\mathbb{E}_{x_i \sim A(i)}[D_i(x_i)] = 0$.

Intuitively, given an SF $D_i$, the expected separating payment is large enough to prevent $i$ from using any profile other than the one it is assigned, while ensuring that its expected payment is 0. Since agents are assumed to be risk neutral, each agent's payoffs are unaffected by the addition of an SF as long as the agent uses the profile assigned by the coalition. Thus, payments remain in the core after the addition of an SF [fn. 4]. The rest of this section describes how to find SFs.

We begin by showing that a weaker form of separating function can trivially be transformed into an SF.

Definition 4. $D_i$ is a weak separating function (WSF) for $i$ under assignment $A$ if $\mathbb{E}_{x_i \sim A(i)}[D_i(x_i)] > \mathbb{E}_{x_i \sim \pi}[D_i(x_i)]$ for any $\pi \in \Pi_i$ such that $\pi \ne A(i)$.

Observation 1. Let $D_i$ be a WSF for $i$ under assignment $A$. Then, $D'_i = w_0 D_i - w_1$ is an SF, where

$$w_0 = \max_{\pi \in \Pi_i,\ \pi \ne A(i)} \frac{V_i(\pi) - V_i(A(i))}{\mathbb{E}_{x_i \sim A(i)}[D_i(x_i)] - \mathbb{E}_{x_i \sim \pi}[D_i(x_i)]} \quad \text{and} \quad w_1 = \mathbb{E}_{x_i \sim A(i)}[w_0 D_i(x_i)].$$

Thus, it is sufficient to find a WSF. When an agent has only two profiles, this is straightforward: we let $D_i$ be the PDF of the assigned profile minus the PDF of the unassigned profile. The proof of this statement is algebraic, using the fact that $\mathcal{N}(x; \mu_0, \sigma_0) \cdot \mathcal{N}(x; \mu_1, \sigma_1)$ has a closed form that is proportional to a normal PDF in $x$ (see online appendix).

Theorem 4. Let $i$ be an agent with two profiles $\pi_0$ and $\pi_1$ and let $A(i) = \pi_0$. Then, w.l.o.g., $D_i(x_i) = \mathcal{N}(x_i; \mu_0, \sigma_0) - \mathcal{N}(x_i; \mu_1, \sigma_1)$ is a WSF for $i$ under $A$.
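The two-profile construction can be traced numerically. The sketch below builds the PDF-difference WSF and rescales it into an SF following Observation 1. Two things are assumptions of this sketch rather than statements from the text: expectations are evaluated with the Gaussian identity $\mathbb{E}_{x \sim \mathcal{N}(\mu_a, \sigma_a)}[\mathcal{N}(x; \mu_b, \sigma_b)] = \mathcal{N}(\mu_a; \mu_b, \sqrt{\sigma_a^2 + \sigma_b^2})$, and $w_0$ is set slightly above the ratio in Observation 1 (the hypothetical `margin` factor) so that the incentive inequality holds strictly. The sketch also assumes the deviation is tempting, i.e., `value_gap` is positive. Profile values are invented.

```python
from math import sqrt
from statistics import NormalDist


def expected_pdf(actual, pdf_profile):
    """E_{x ~ actual}[N(x; pdf_profile)] for normal profiles given as (mu, sigma)."""
    (mu_a, s_a), (mu_b, s_b) = actual, pdf_profile
    return NormalDist(mu_b, sqrt(s_a * s_a + s_b * s_b)).pdf(mu_a)


def expected_wsf(actual, assigned, other):
    """Expected value of D(x) = N(x; assigned) - N(x; other) when x ~ actual."""
    return expected_pdf(actual, assigned) - expected_pdf(actual, other)


def separating_function(assigned, other, value_gap, margin=1.01):
    """Build D'(x) = w0 * D(x) - w1 from the two-profile WSF.

    value_gap = V(other) - V(assigned), assumed positive in this sketch.
    margin > 1 is a choice of this sketch to make the incentive strict.
    """
    gap = (expected_wsf(assigned, assigned, other)
           - expected_wsf(other, assigned, other))
    w0 = margin * value_gap / gap
    w1 = w0 * expected_wsf(assigned, assigned, other)
    pdf_assigned, pdf_other = NormalDist(*assigned), NormalDist(*other)

    def d_prime(x):
        return w0 * (pdf_assigned.pdf(x) - pdf_other.pdf(x)) - w1

    return d_prime, w0, w1


# Hypothetical profiles (mu, sigma); deviating to OTHER is worth 0.5 more.
ASSIGNED, OTHER, VALUE_GAP = (10.0, 2.0), (12.0, 3.0), 0.5
D_PRIME, W0, W1 = separating_function(ASSIGNED, OTHER, VALUE_GAP)

# Expected separating payment under each behavior.
EXP_ASSIGNED = W0 * expected_wsf(ASSIGNED, ASSIGNED, OTHER) - W1
EXP_OTHER = W0 * expected_wsf(OTHER, ASSIGNED, OTHER) - W1
```

Under the assigned profile the expected separating payment is zero, while under the deviation it is negative by more than the value the agent would gain, so the deviation no longer pays in expectation.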
With more than two profiles, this approach does not always work. Instead, we can use a linear program (LP) to find coefficients of a linear combination of the profile PDFs. Formally, denote the PDFs of the profiles as $\mathcal{N}_i(x_i) = \langle \mathcal{N}(x_i; \mu_0, \sigma_0), \ldots, \mathcal{N}(x_i; \mu_{|\Pi_i|-1}, \sigma_{|\Pi_i|-1}) \rangle$ and their weights as $y_i$, and search over $y_i \in \mathbb{R}^{|\Pi_i|}$ for a separating function of the form $D_i(x_i, y_i) = y_i \cdot \mathcal{N}_i(x_i)$. We use an LP that minimizes the $L_1$-norm of $y_i$ subject to $\mathbb{E}_{x_i \sim A(i)}[D_i(x_i, y_i)] > \mathbb{E}_{x_i \sim \pi}[D_i(x_i, y_i)]$ for all $\pi \in \Pi_i$, $\pi \ne A(i)$. Ideally, we would also like to minimize the variance of the separating payment, giving agents maximal certainty w.r.t. this payment; however, this objective is not tractable in an LP (we leave this question to future work). In our experiments below, we do, however, assess the variance of the separating payment.

A feasible $y_i$ corresponds to a linear combination of vectors whose sum has only positive entries. We call these the difference vectors of $D_i$. While we cannot prove that a feasible $y_i$ always exists, viewing the problem in terms of difference vectors suggests why they exist in practice:

Definition 5. Let $A(i)$ be $\pi_0$ (w.l.o.g.). For each profile $\pi_k \in \Pi_i$, the difference vector is

$$d_k = \mathbb{E}_{x \sim \pi_k}[\mathcal{N}(x; \mu_0, \sigma_0)] - \bigl\langle \mathbb{E}_{x \sim \pi_k}[\mathcal{N}(x; \mu_1, \sigma_1)], \ldots, \mathbb{E}_{x \sim \pi_k}[\mathcal{N}(x; \mu_{|\Pi_i|-1}, \sigma_{|\Pi_i|-1})] \bigr\rangle,$$

where the scalar first term is subtracted from each entry of the vector. Note that these vectors do not depend on $y_i$.

We can restate the LP constraints using difference vectors:

Theorem 5. Let $i$ have profiles $\Pi_i$ and let $A$ assign a profile to $i$. There exists $y_i \in \mathbb{R}^{|\Pi_i|}$ that makes $D_i(x_i, y_i)$ a WSF if and only if there is a linear combination of the difference vectors of $D_i(x_i, y_i)$ that has only positive entries.

Corollary 1. Let $d_k$ be the difference vectors for agent $i$. If the difference vectors are linearly independent, a setting of $y_i$ exists that makes $D_i(x_i, y_i)$ a WSF.

Footnote 4: Our use of zero-expectation payments for risk-neutral agents is mechanically similar to Cremer and McClean's [1988] revenue-optimal auction for bidders with correlated valuations.
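The WSF constraints for more than two profiles can be evaluated in closed form for any candidate weight vector. The sketch below does only that: it checks hand-picked weights and does not solve the LP, which would need an LP solver. Expectations again rely on the identity $\mathbb{E}_{x \sim \mathcal{N}(\mu_a, \sigma_a)}[\mathcal{N}(x; \mu_b, \sigma_b)] = \mathcal{N}(\mu_a; \mu_b, \sqrt{\sigma_a^2 + \sigma_b^2})$, an assumption of this sketch. The function names, the three profiles, and the candidate weights are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def expected_pdf(actual, pdf_profile):
    """E_{x ~ actual}[N(x; pdf_profile)] for normal profiles given as (mu, sigma)."""
    (mu_a, s_a), (mu_b, s_b) = actual, pdf_profile
    return NormalDist(mu_b, sqrt(s_a * s_a + s_b * s_b)).pdf(mu_a)


def wsf_margins(profiles, assigned_idx, weights):
    """For each alternative profile, E_assigned[D] - E_alternative[D],
    where D(x) = sum_j weights[j] * N(x; profiles[j])."""
    def expected_d(actual):
        return sum(w * expected_pdf(actual, prof)
                   for w, prof in zip(weights, profiles))

    base = expected_d(profiles[assigned_idx])
    return [base - expected_d(prof)
            for k, prof in enumerate(profiles) if k != assigned_idx]


def is_wsf(profiles, assigned_idx, weights):
    """True if every WSF constraint holds strictly for these weights."""
    return all(m > 0 for m in wsf_margins(profiles, assigned_idx, weights))


# Three hypothetical profiles (mu, sigma); profile 0 is the assigned one.
PROFILES = [(10.0, 2.0), (12.0, 3.0), (9.0, 1.0)]

PAIRWISE_1 = is_wsf(PROFILES, 0, [1.0, -1.0, 0.0])   # assigned PDF minus PDF 1
PAIRWISE_2 = is_wsf(PROFILES, 0, [1.0, 0.0, -1.0])   # assigned PDF minus PDF 2
COMBINED = is_wsf(PROFILES, 0, [1.0, -0.5, -0.5])    # a weighted combination
```

For these profiles, neither pairwise PDF difference separates the assigned profile from both alternatives, while the weighted combination does. This is the kind of weight vector an LP over $y_i$ would search for.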
Figure 1: The learned valuation model. NN(10) denotes a neural network with 10 hidden units.

Figure 2: Translating the valuation function to pass through the origin (x-axis: consumption in kWh; curves: translated function and tangent).

We generally expect a random set of vectors to be linearly independent when their entries are drawn from the reals. We have yet to encounter an instance in our experiments where a separating function does not exist. It is an open question whether a separating function of this form always exists.

## 6 Learning Utility Models

To empirically test the MPOU framework and our separating functions, we require consumer utility functions. As we know of no data set with such utility functions, we learn household (agent) utility models from real electricity usage data from Pecan Street Inc. [Rhodes et al., 2014] [fn. 5]. We define our prediction period as 4-7 pm each day, when electricity usage typically peaks in Austin, Texas, where the data was collected. We decompose utility into two parts: $V^{(\mu)}_i(w, \mu)$ describes the value an agent $i$ derives from her mean consumption given a vector $w$ of weather conditions; and $V^{(\sigma)}_i(\sigma, \mu)$ represents utility derived from variance in consumption behavior. Agent $i$'s utility is $V_i(w, \mu, \sigma) = V^{(\mu)}_i(w, \mu) + V^{(\sigma)}_i(\sigma, \mu)$.

Estimating $V^{(\mu)}_i$ is difficult, since we lack data for some aspects of the problem. Thus, we make some simplifying assumptions: (i) consuming 0 kWh yields value \$0; and (ii) $V^{(\mu)}_i(w, \mu)$ is concave and increasing. We learn a model for each of 25 households that have complete data from 2013 to 2015 (about 1100 data points per household), using select weather conditions $w$ and mean consumption between 4-7 pm as input, and outputting value (in dollars).
We use this valuation function to predict consumption by maximizing an agent's net utility under the observed price:

$$V^{(\mu)}_i(w, \mu) = z^{(0)}_i(w)\,\bigl(\mu - z^{(1)}_i(w)\bigr)^{z^{(2)}_i(w)} + z^{(3)}_i(w) \tag{4}$$

constraining $z^{(0)}_i > 0$, $z^{(1)}_i > 0$, $0 < z^{(2)}_i < 1$, $z^{(3)}_i(w) \ge 0$ (Figure 1 depicts the utility model). We use a homogeneous function to represent utility [Simon and Blume, 1994]. The term $z^{(3)}_i(w)$ has no influence on predictions: it can be viewed as inherent value due to weather, and accounts for the flexibility provided by the $z^{(1)}_i$ term, which may create valuations where consumption 0 yields negative value (violating our assumptions). To prevent this, we set $z^{(3)}_i(w)$ to ensure that the tangent at the predicted consumption for \$0.64 (the largest price in the data set) passes through $(0, 0)$ (see Figure 2). When this tangent crosses the y-axis above 0, we set $z^{(3)}_i(w) = 0$ and splice in an exponential $a x^b$ that passes through $(0, 0)$ and matches the derivative at the splice point.

Footnote 5: Publicly available at pecanstreet.org.

| Model | Mean train RMSE | Std. dev. train RMSE | Mean test RMSE | Std. dev. test RMSE |
|---|---|---|---|---|
| Valuation | 0.137 | 0.0168 | 0.148 | 0.0194 |
| Unstructured | 0.142 | 0.0226 | 0.144 | 0.0284 |
| Constant | 0.204 | 0.0345 | 0.205 | 0.0411 |

Table 1: Comparison of model prediction accuracy by root-mean-square error (RMSE). We divide each household's consumption amounts by their largest observed consumption.

Figure 3: Learned value models for three of the 25 households with consumption mean (kWh) on the x-axis and value (\$) on the y-axis. The red line represents the median weather conditions. The dotted line represents the median day with 90th percentile or higher temperature. The dashed and green lines are the same for sunshine and humidity, respectively.
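The net-utility maximization and the choice of the vertical shift can be sketched as follows. This is a hedged illustration only: it assumes the valuation has the form $z^{(0)}(\mu - z^{(1)})^{z^{(2)}} + z^{(3)}$ (in particular, the minus sign inside the parentheses is an assumption of this sketch), treats the $z$ terms as plain numbers rather than neural network outputs, and omits the spliced $a x^b$ segment. The function names and parameter values are hypothetical.

```python
def value(mu, z0, z1, z2, z3=0.0):
    """Assumed valuation form: z0 * (mu - z1) ** z2 + z3, defined for mu >= z1."""
    return z0 * (mu - z1) ** z2 + z3


def predicted_consumption(p, z0, z1, z2):
    """Maximizer of value(mu) - p * mu from the first-order condition
    z0 * z2 * (mu - z1) ** (z2 - 1) = p, valid for 0 < z2 < 1."""
    return (p / (z0 * z2)) ** (1.0 / (z2 - 1.0)) + z1


def tangent_intercept(p, z0, z1, z2, z3=0.0):
    """Where the tangent at the predicted consumption (slope p) meets the y-axis."""
    mu_star = predicted_consumption(p, z0, z1, z2)
    return value(mu_star, z0, z1, z2, z3) - p * mu_star


def origin_shift(p_max, z0, z1, z2):
    """Shift z3 that moves the tangent at the p_max prediction through (0, 0).

    Returns 0 when the unshifted tangent already crosses at or above 0; the text
    then splices in a * x ** b instead, which this sketch does not implement.
    """
    return max(0.0, -tangent_intercept(p_max, z0, z1, z2))


# Hypothetical parameters for two households and a hypothetical price.
PRICE = 0.25
MU_STAR = predicted_consumption(PRICE, z0=1.0, z1=0.5, z2=0.5)
SHIFT_SMALL_Z1 = origin_shift(PRICE, z0=1.0, z1=0.5, z2=0.5)
SHIFT_LARGE_Z1 = origin_shift(PRICE, z0=1.0, z1=5.0, z2=0.5)
```

With the larger offset, the unshifted tangent crosses the y-axis below zero, so a positive shift is needed. With the smaller offset, the tangent already crosses above zero and the shift stays at zero.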
For training, we use the model to predict consumption by solving the net utility maximization problem, $\max_\mu \bigl(V_i(w, \mu) - \mu p\bigr)$, yielding:

$$\mu = \left(\frac{p}{z^{(0)}_i(w)\,z^{(2)}_i(w)}\right)^{\frac{1}{z^{(2)}_i(w) - 1}} + z^{(1)}_i(w) \tag{5}$$

We represent $z^{(0)}_i$, $z^{(1)}_i$ and $z^{(2)}_i$ as fully connected single-layer neural networks, each with 10 hidden units and ReLU activations, and train the model with backpropagation. We implement the model in TensorFlow [Abadi et al., 2015] using the squared error loss function and the Adam optimizer [Kingma and Ba, 2015]. We use Dropout [Srivastava et al., 2014] with a probability of 0.7 on each hidden unit. We split the data into 80% train and 20% test for each household.

Table 1 compares the prediction accuracy of our model ("valuation") to (i) an unstructured neural network, and (ii) the best constant prediction for each household. The unstructured net learns a mapping from $\langle w, p \rangle$ to $\mu$ directly using 10 hidden units, without an intervening utility model [fn. 6]. The best constant prediction disregards weather and price data, and simply predicts average consumption for that household. Table 1 shows that the valuation model overfits somewhat, but that its predictive accuracy is on par with the unstructured model. This shows that our constraints on the form of the valuation function are not unduly restrictive and validates the value predictions produced by these learned models. However, we believe these value functions significantly underestimate value because we lack consumption observations when the price is higher than \$0.64.

Footnote 6: Our other implementation choices are the same as the valuation model, except we use Dropout of 0.5.

Figure 3 shows the learned valuation functions for three of the 25 households (see online appendix for the other learned valuations). Each line represents a household's response to different weather conditions.
While temperature is the most significant predictor of power usage, different households appear to exhibit sensitivity to different factors (e.g., the household on the right is highly sensitive to humidity).

Modeling Unpredictable Consumption: Unfortunately, we do not have access to electricity usage data where consumers are charged differently depending on the accuracy of their predictions. Our model of the value of unpredictable consumption is thus speculative, but uses the Pecan Street data as a starting point. We assume that each household chooses the $\sigma$ that maximizes its utility (since they are not being charged for $\sigma$), and that it has an optimal fraction $\beta_i$ of $\sigma / \mu$ that does not depend on other conditions. We estimate $\beta_i$ from the data by treating each data point as having an observed $\sigma$ equal to the absolute error in the consumption prediction made by the learned valuation model. We assume no value is gained by increasing $\sigma$ above the optimal ratio, and use an exponential to represent the loss in value when $\sigma$ is reduced,

$$V^{(\sigma)}_i(\sigma, \mu) = -\max\left(\frac{\mu}{\sigma}\,\beta_i,\ 1\right)^{\gamma_i}, \tag{6}$$

where $\gamma_i$ is a constant representing $i$'s cost for being predictable. A higher $\gamma_i$ means that consumer $i$ values variance more highly. In our experiments, we sample $\gamma_i$ from the uniform distribution over the interval $[0.1, 2]$.

## 7 Experiments

The questions we study experimentally are: (i) how important is consumer coordination under POU tariffs; (ii) what is the overall social welfare gain from using an MPOU model vs. a flat tariff; (iii) how important is an agent's choice of reported profiles; and (iv) what are the variances of the payments introduced by the separating functions.

We first describe the experimental setup: how we select agents, profiles and tariffs. For each trial, we select weather conditions $w$ uniformly at random from the Pecan Street data.
To generate agents, we sample from our 25 learned household utility models, using $w$ as input and adding a small amount of zero-mean noise to the model parameters. We sample $\gamma_i$ from the uniform distribution over $[0.1, 2]$ for each agent $i$. Each data point is an average of 100 trials with 5000 agents, unless otherwise noted.

One of the goals of our experiments is to study the consequences of different choices of reported profile. To do this, we vary the way profiles are generated. Each agent has four profiles: a base profile (predicted to be optimal under a flat-rate tariff with rate equal to the fixed rate $p$ of the POU tariff), and three others reflecting reduced consumption mean or variance. The first reduces the base profile mean by the amount required to reduce value by $u$%, which we call the profile spacing. The second reduces variance to reduce value by $u$%. The third reduces both. We vary $u$ throughout the experiments.

To generate tariffs, we vary the amount of emphasis each puts on accurate predictions vs. the amount consumed. We let the predictivity emphasis (PE) of a tariff w.r.t. a group of agents be the fraction of the expected total cost paid for prediction penalties when each agent uses her base profile. In practice, PE should be set to match the properties of the reserve power generation capacity that is available: a higher PE corresponds to more expensive reserves. A tariff is revenue-equivalent to another with respect to a specific set of profiles if the revenue of the two is the same for that set. All of our tariffs are revenue-equivalent with respect to the set of base profiles. To find a revenue-equivalent tariff with a certain PE, we use a numerical solver to find a tariff of the form $\langle p, r, r \rangle$ with the appropriate total cost. Intuitively, a higher PE should result in larger benefits from POU tariffs, and we find that to be the case in our experiments.
To generate Shapley values, we sampled a number of join orders equal to the logarithm of the number of agents in the instance. Shapley values were very close to linear in the std. dev. of the assigned profile. The average Shapley payment for prediction was $0.41 per kWh of uncertainty across trials with PE 10%, and $0.82 per kWh with PE 20% (footnote 7). Within a single trial, the std. dev. of this ratio was less than 0.01 on average, suggesting that it is not necessary to optimize the choice of profiles every time an agent is added in a join order; it is sufficient to fix each agent's profile to the assigned one. We exploit this fact to run larger experiments.

Results: We first address the question of how important it is for agents to coordinate their consumption under a POU tariff. We define the uncoordinated POU setting as the scenario where agents are subject to a POU tariff, but do not coordinate their consumption behavior, i.e., each agent uses the profile that individually maximizes her net utility relative to that POU tariff. Then, as is standard in that setting, the grand coalition forms and makes the optimal baseline prediction. Figure 4 shows the social welfare derived by agents in the uncoordinated POU setting as a percentage of their social welfare under a revenue-equivalent fixed-rate tariff. We see that the average social welfare achieved in the uncoordinated POU setting is less than that of the fixed-rate setting for all profile spacings. Individual agents react to the POU tariff by increasing their predictivity, and thus decreasing their realized value, but they do not account for the predictivity discount that results from being part of a coalition. As profile spacing increases, more agents shift away from their base profile and social welfare decreases, reaching 70% when spacing is 25%.
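The join-order sampling used for the Shapley values earlier in this section can be illustrated with a generic permutation-sampling estimator. This is a sketch, not the experimental code: the characteristic function `coalition_cost` is a toy stand-in (the expected prediction penalty of independent normal agents with each profile fixed in advance), the natural logarithm in `default_num_orders` is an assumption since the base is not stated, and all names are hypothetical.

```python
import math
import random


def coalition_cost(sigmas, r=1.0):
    """Toy cost of a coalition: expected penalty r * sqrt(2/pi) * sqrt(sum sigma_i^2)."""
    return r * math.sqrt(2.0 / math.pi) * math.sqrt(sum(s * s for s in sigmas))


def default_num_orders(n):
    """Number of sampled join orders: logarithm of the number of agents (base e assumed)."""
    return max(1, math.ceil(math.log(n)))


def sampled_shapley(sigmas, num_orders, rng, r=1.0):
    """Estimate each agent's Shapley cost share by averaging marginal costs
    over randomly sampled join orders, keeping every profile fixed."""
    n = len(sigmas)
    totals = [0.0] * n
    scale = r * math.sqrt(2.0 / math.pi)
    for _ in range(num_orders):
        order = list(range(n))
        rng.shuffle(order)
        var_sum = 0.0    # running sum of variances of agents that have joined
        prev_cost = 0.0  # cost of the coalition before the current agent joins
        for i in order:
            var_sum += sigmas[i] ** 2
            cost = scale * math.sqrt(var_sum)
            totals[i] += cost - prev_cost  # marginal cost of agent i in this order
            prev_cost = cost
    return [t / num_orders for t in totals]


if __name__ == "__main__":
    rng = random.Random(0)
    sigmas = [rng.uniform(0.5, 2.0) for _ in range(1000)]
    shares = sampled_shapley(sigmas, default_num_orders(len(sigmas)), rng)
    print(sum(shares), coalition_cost(sigmas))
```

Because the marginal costs along any single join order telescope to the cost of the grand coalition, the estimated shares always sum to that cost regardless of how many orders are sampled; sampling more orders only reduces the variance of the individual shares.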
These results underscore the need for a way for agents to coordinate their profile choices under POU tariffs and highlight one of the main challenges of successfully implementing a POU tariff in practice.

Footnote 7: This and other tariffs in this section have 0.2 ≤ p p ≤ 1.5.
Footnote 8: Each instance took around 3 minutes on a single thread of a 2.6 GHz Intel i7 with 8 GB RAM.

Figure 4: Profile spacing vs. % of social welfare of fixed-rate tariff for uncoordinated POU setting and % of agents that change profile.
Figure 5: Profile spacing vs. social welfare % gain from fixed-rate tariff and % of agents that change profile.
Figure 6: Comparison of the standard deviation of the separating function payment to the ex-ante payment for prediction accuracy.

Next, we study the social welfare gain that can be achieved by a POU tariff when agents coordinate optimally under the MPOU framework. Figure 5 shows the effect of profile spacing (u) on the welfare gained by switching from a fixed-rate tariff to a revenue-equivalent POU tariff (footnote 8). Overall welfare gains are moderate, around 3.13% for PE of 10% and 4.4-4.9% for PE of 20%. A higher PE results in a larger social welfare gain because agents only benefit from cooperating when trading off predictivity for inherent utility. Profile spacing appears to have limited impact on social welfare gain, suggesting that most of the gain is achieved by the effective reduction in fixed-rate price under a POU tariff. We note that these experiments are the first to study end-to-end social welfare gain from a POU tariff. Figure 5 appears to indicate that personalizing profile spacing based on each agent's value for predictivity would increase social welfare further.
We can see this because increasing profile spacing increases welfare up to a spacing of 15% for both PE levels, but the number of agents that shift profiles decreases as spacing is increased (shown on the right-side axis). Thus, we hypothesize that welfare could be further increased if agents with higher γ spaced their profiles farther apart than those with lower γ.

Next, we address the question of uncertainty introduced by separating payments. Recall that while separating payments have expectation zero, they introduce additional uncertainty to agent payments. We find that the amount of uncertainty introduced is, in fact, minimal, and decreases with instance size and increased PE. The std. dev. of the separating payment is on average 15-20% of the predictivity payment for PE of 10% and 7.5-10% for PE of 20%, and increases slightly as profile spacing increases. Note that only agents that actually require a separating function are taken into account: around 1-2% of all agents for PE of 10% and 5-10% for PE of 20%, on average. More agents require separating payments as PE increases, but the uncertainty introduced by each decreases. Note that these are uncertainties for a single instance of the game; if the game is played repeatedly (e.g., every day), the aggregate uncertainty relative to the total payment will decrease as the independent random variables are added.

Figure 6 shows the same uncertainty ratio for a single large instance versus the predictivity flexibility (γ) of each agent. This instance has PE of 20%, 100,000 agents, and profile spacing of 15%, and takes 90 min. to solve. The ratio is shown for the 4876 agents that require separating functions. The magnitude of the introduced uncertainty is smaller in this larger instance, with an average of 2.07% (and not exceeding 3% for any agent). In addition, predictivity flexibility has little effect on the introduced uncertainty: the linear least-squares fit (red line) has a slope of less than 10^{-4}.
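The remark above about repeated play can be made concrete with a short calculation. It assumes, purely for illustration, that an agent plays T independent rounds, that the separating payment X_t in each round has mean zero and std. dev. s, and that the expected predictivity payment per round is a constant m; the symbols T, s and m are ours, not the paper's.

```latex
% Assumption: T independent rounds; each separating payment X_t has mean 0
% and std. dev. s; the expected predictivity payment per round is m.
\[
\frac{\mathrm{sd}\!\left(\sum_{t=1}^{T} X_t\right)}{\sum_{t=1}^{T} m}
  \;=\; \frac{s\sqrt{T}}{m\,T}
  \;=\; \frac{s/m}{\sqrt{T}},
\qquad \text{e.g. } \frac{s}{m} = 0.15,\; T = 30
  \;\Rightarrow\; \frac{0.15}{\sqrt{30}} \approx 0.027 .
\]
```

So although the absolute std. dev. of the summed separating payments grows as √T, its size relative to the total predictivity payment shrinks as 1/√T under these assumptions.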
8 Conclusion

We have introduced multiple-profile POU (MPOU) games, a framework for coordinating agent behavior under POU tariffs. MPOU games allow agents to express their consumption utility functions, while maintaining the convexity of the basic POU model. MPOU games introduce a new class of incentive problems because agent actions are only partially observable; we introduce separating payments to restore proper incentives. Our experimental utility models are learned from historical electricity usage data in a novel way.

Our experiments show that, while the social welfare gains from introducing the MPOU model (w.r.t. a fixed-rate tariff) appear moderate, the gains relative to an uncoordinated POU tariff are substantial. The gains over a fixed-rate tariff may be worthwhile in a large system and may be further enhanced by more sophisticated agent utility and behavior profile models. They depend both on the predictivity emphasis (PE) of reserve generation and on consumers' value for consuming unpredictably, which are both areas where more real-world data is needed. We find that the uncertainty introduced by separating payments decreases as instance size increases, and decreases in aggregate as more iterations of the game are played. Increased PE increases the number of agents that need separating functions, but the uncertainty introduced for each decreases.

Interesting future directions for POU/MPOU games remain. Greater access to household utility data, especially for variance of consumption, and data about the PE of generation mixes would allow us to test social welfare gains more precisely. In addition, it would be desirable to allow agents to make predictions contingent on intermediate predictions (e.g., of weather), thus reducing the need for agents to make accurate weather forecasts.
While our discussion of POU and MPOU games has focused on electricity markets, we believe the approach may be more widely applicable in other cases where agents are contending for a scarce resource, e.g., cloud computing.

Acknowledgments

Perrault was supported by OGS. We gratefully acknowledge the support of NSERC. We thank Valentin Robu, Meritxell Vinyals, Marek Janicki, Jake Snell, and the anonymous reviewers for their helpful suggestions.

References

[Abadi et al., 2015] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

[Akasiadis and Chalkiadakis, 2013] Charilaos Akasiadis and Georgios Chalkiadakis. Agent cooperatives for effective power consumption shifting. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence (AAAI-13), pages 1263–1269, Bellevue, WA, 2013.

[Braithwait et al., 2007] Steven Braithwait, Dan Hansen, and Michael O'Sheasy. Retail electricity pricing and rate design in evolving markets. Edison Electric Institute, pages 1–57, 2007.

[Castro et al., 2009] Javier Castro, Daniel Gómez, and Juan Tejada. Polynomial calculation of the Shapley value based on sampling. Computers & Operations Research, 36(5):1726–1730, 2009.
[Cremer and McLean, 1988] Jacques Crémer and Richard P. McLean. Full extraction of the surplus in Bayesian and dominant strategy auctions. Econometrica, pages 1247–1257, 1988.

[Kingma and Ba, 2015] Diederik P. Kingma and Jimmy Lei Ba. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR-15), San Diego, 2015.

[Kota et al., 2012] Ramachandra Kota, Georgios Chalkiadakis, Valentin Robu, Alex Rogers, and Nicholas R. Jennings. Cooperatives for demand side management. In Proceedings of the Twenty-First European Conference on Artificial Intelligence (ECAI-12), pages 969–974, Montpellier, France, 2012.

[Lu and Boutilier, 2012] Tyler Lu and Craig Boutilier. Matching models for preference-sensitive group purchasing. In Proceedings of the Thirteenth ACM Conference on Electronic Commerce (EC'12), pages 723–740, Valencia, Spain, 2012.

[Manisterski et al., 2008] Efrat Manisterski, David Sarne, and Sarit Kraus. Enhancing cooperative search with concurrent interactions. Journal of Artificial Intelligence Research, 32(1):1–36, 2008.

[Müller et al., 2002] Alfred Müller, Marco Scarsini, and Moshe Shaked. The newsvendor game has a nonempty core. Games and Economic Behavior, 38(1):118–126, 2002.

[Osborne and Rubinstein, 1994] Martin J. Osborne and Ariel Rubinstein. A Course in Game Theory. MIT Press, Cambridge, 1994.

[Perrault and Boutilier, 2015] Andrew Perrault and Craig Boutilier. Approximately stable pricing for coordinated purchasing of electricity. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI-15), Buenos Aires, 2015.

[Rhodes et al., 2014] Joshua D. Rhodes, Charles R. Upshaw, Chioke B. Harris, Colin M. Meehan, David A. Walling, Paul A. Navrátil, Ariane L. Beck, Kazunori Nagasawa, Robert L. Fares, Wesley J. Cole, et al. Experimental and data collection methods for a large-scale smart grid deployment: Methods and first results. Energy, 65:462–471, 2014.

[Robu et al., 2017] Valentin Robu, Meritxell Vinyals, Alex Rogers, and Nicholas Jennings. Efficient buyer groups with prediction-of-use electricity tariffs. IEEE Transactions on Smart Grid, 2017.

[Rose et al., 2012] Harry Rose, Alex Rogers, and Enrico H. Gerding. A scoring rule-based mechanism for aggregate demand prediction in the smart grid. In Proceedings of the Eleventh International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS-12), pages 661–668, Valencia, Spain, 2012.

[Sarne and Kraus, 2005] David Sarne and Sarit Kraus. Cooperative exploration in the electronic marketplace. In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI-05), pages 158–163, Pittsburgh, 2005.

[Shapley, 1971] Lloyd S. Shapley. Cores of convex games. International Journal of Game Theory, 1:11–26, 1971.

[Simon and Blume, 1994] Carl P. Simon and Lawrence Blume. Mathematics for Economists, volume 7. Norton, New York, 1994.

[Srivastava et al., 2014] Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958, 2014.

[Team, 2011] G. M. Team. Electricity and gas supply market report. Technical Report 176/11, The Office of Gas and Electricity Markets (Ofgem), December 2011.

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)