# preventing_arbitrage_from_collusion_when_eliciting_probabilities__6119acfd.pdf The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20) Preventing Arbitrage from Collusion When Eliciting Probabilities Rupert Freeman,1 David M. Pennock,1 Dominik Peters,2 Bo Waggoner3 1Microsoft Research, 2Carnegie Mellon University, 3CU Boulder {rupert.freeman, dpennock}@microsoft.com, dominikp@cs.cmu.edu, bwag@colorado.edu We consider the design of mechanisms to elicit probabilistic forecasts when agents are strategic and may collude with one another. Chun and Shachter (2011) have shown that when agents may form coalitions, many known mechanisms for elicitation permit arbitrage, allowing the coalition members to guarantee themselves higher payments by misreporting their beliefs. We consider two approaches to protect against colluding agents. First, we present a novel strictly proper mechanism that does not admit arbitrage provided that the reports of the agents are bounded away from 0 and 1, a common assumption in many settings. Second, we discover strictly arbitrage-free mechanisms that satisfy an intermediate guarantee between weak and strict properness. 1 Introduction Suppose that a decision maker, the principal, asks an agent to estimate the probability that it will rain tomorrow. The principal would like the agent to respond with their true belief. The principal can incentivize truthfulness by asking the agent for the prediction today, then paying the agent an amount of money tomorrow that depends on whether it actually rained and how accurate the prediction was. Payment schemes in this setting are called scoring rules. For example, using Brier s (1950) famous quadratic scoring rule, the principal would pay the agent $1 (1 ˆp)2 if it rains, and $1 ˆp2 if it does not, where ˆp is the probability that the agent reported. For example, if ˆp = 0.6, then the agent earns $0.84 if it rains and $0.64 otherwise. Calculus shows that if the agent wishes to maximize their expected payment, it is uniquely optimal for the agent to report their truthful subjective belief. Scoring rules with this property are called strictly proper. The principal may wish to obtain an estimate from several agents. If each agent is paid their quadratic score, every agent is incentivized to report truthfully, just as in the single-agent case. However, French (1985) observed that the quadratic score may not be incentive compatible if agents are able to collude. Assume that the agents know each other, can communicate their beliefs, and can transfer money among themselves. Suppose three agents were to truthfully report ˆp = (0.2, 0.4, 0.6). Then the quadratic score pays them a total of $0.36 + $0.64 + $0.84 = $1.84 Copyright c 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. if it rains, and $0.96 + $0.84 + $0.64 = $2.44 otherwise. If the group instead decided to calculate their mean belief (0.4) and report ˆp = (0.4, 0.4, 0.4), then quadratic score payments are $0.64 + $0.64 + $0.64 = $1.92 if it rains, and $0.84 + $0.84 + $0.84 = $2.52 otherwise. In both cases, the group obtains strictly more money than before. They could then distribute their gains in some way so that each member is strictly better off, no matter the outcome. This example is general: groups with at least some disagreement are always strictly better off if they report their mean belief. French (1985) shows that this is true not only for Brier s quadratic rule, but for every concave scoring rule. Chun and Shachter (2011) termed this phenomenon arbitrage, since it allows groups of colluding agents to risklessly increase their payoff. They showed that all strictly proper scoring rules, whether concave or not, permit arbitrage, and that more complicated multi-agent rules are also vulnerable, including market scoring rules (Hanson 2003) and competitive scoring rules (Kilgour and Gerchak 2004; Lambert et al. 2008). We may hope that collusion of this type is uncommon due to coordination and communication difficulties. However, it is often the case that forecasters are assigned to small groups for the purposes of deliberation before reporting their probabilities.1 While deliberation of this type can improve forecast accuracy by encouraging information sharing, it also provides an opportunity for collusion. Further, the profit opportunity might encourage third parties to act as hidden intermediaries, by collecting individual reports, reporting aggregate beliefs to the principal, and then keeping the gain from arbitrage. Wide-spread arbitrage can come at a high cost to the principal s goals. For example, if the principal plans to pool the reports into an aggregate forecast, the result will be distorted.2 In our example, all colluding agents report the same probability, thereby hiding the true level of disagreement among 1For example, the Good Judgment Project (Tetlock and Gardner 2015), the winning competitor in the recent IARPA ACE forecasting competititon, employed this technique. 2The unweighted arithmetic mean does not change if colluding agents report their group mean, as in our example. However, the principal may wish to compute a weighted mean, a geometric mean (Genest and Zidek 1986), or another aggregate measure such as a supra-Bayesian inference (Morris 1977). agents, which may cause the principal to have too much confidence in an aggregate forecast. Misreports due to arbitrage also obscure the forecasters relative accuracy, which is problematic if the principal wants to identify the most accurate forecasters. Fundamentally, colluding agents are extracting additional payment, at the expense of either the principal or some other agents, without contributing correspondingly valuable information. Chun and Shachter (2011) hoped to develop a mechanism resistant to collusion, but concluded that it is still an open question whether there is any strictly proper mechanism that does not admit arbitrage, but it seems unlikely. The question remains open. However, if either strict properness or noarbitrage is slightly weakened, we show that collusion-proof mechanisms do exist. First, in Section 3, we design a strictly proper mechanism that does not admit arbitrage as long as there is an ϵ > 0 such that each agent report is between ϵ and 1 ϵ. In practice, this is a mild restriction. For instance, the prediction market Predict It3 (which operates by continuous double auction) does not allow trades at less than 1c or more than 99c, equivalent to setting ϵ = 0.01. Systems based on the popular logarithmic scoring rule often impose a similar bound on reports in order to avoid infinite agent losses, which occur whenever an agent assigns probability zero to the outcome that is eventually realized. Each of the mechanisms studied by Chun and Shachter (2011) provides arbitrage possibilities for every possible profile of agent reports with at least some disagreement. In contrast, our mechanism avoids arbitrage in almost all cases, except for agents reporting extremal beliefs. A drawback of our mechanism is that, when choosing ϵ small, very large payments may be required. If payments on extreme instances are not high enough, then there exist instances where the average report is close to 0.5 on which all agents receive almost indistinguishable scores. While this may limit the practical viability of our scheme, the existence of a no-arbitrage mechanism for a wide class of agent reports sheds a new optimistic light on Chun and Shachter s conjecture, prompting hope for the existence of a fully arbitrage-free, strictly proper mechanism. Second, in Section 4, we design weakly proper mechanisms that fully avoid arbitrage for unrestricted agent reports. We begin by studying mechanisms where an agent s payoff depends only on their own report. We fully characterize the set of arbitrage-free and weakly proper scoring rules as those that offer at most two sets of conditional payments, for example paying $1 if and only if the agent s report is greater than 0.5 for the true outcome. Under these mechanisms, truthtelling is an undominated strategy, but not uniquely. By combining these rules with a rule that pays each agent the score of the median report, we obtain arbitrage-free mechanisms that are weakly dominant proper: truth-telling is the unique undominated strategy. Weakly dominant proper mechanisms are close to strictly proper: incentives are not strict only if some profiles are impossible. We consider these weakly dominant proper mechanisms to be compelling practical options for a principal who wants to avoid collusion. 3https://www.predictit.org/ 2 Preliminaries Let X be a Boolean4 random variable: an event with possible outcomes 0 and 1. Let N be a set of n agents. Each agent has a belief pi [0, 1] as to the probability that X = 1, and reports ˆpi [0, 1]. Following the nomenclature of Chun and Shachter (2011), a contract function Π : [0, 1]n {0, 1} Rn maps agent reports and the event outcome to payments. A positive payment is a payment from the mechanism to the agent. We refer to agent i s payment as Πi. Contract functions are a general class of mechanisms that include standard proper scoring rules (Brier 1950; Savage 1971; Gneiting and Raftery 2007), market scoring rules (Hanson 2003), competitive scoring rules (Kilgour and Gerchak 2004), and wagering mechanisms (Lambert et al. 2008). A contract function Π is weakly proper if for every agent i with belief pi, every vector of other agents reports ˆp i, and every ˆpi, piΠi((pi, ˆp i), 1) + (1 pi)Πi((pi, ˆp i), 0) piΠi((ˆpi, ˆp i), 1) + (1 pi)Πi((ˆpi, ˆp i), 0). We say that Π is strictly proper if the inequality is strict whenever ˆpi = pi. Thus, for a strictly proper Π, the expected payoff to i is uniquely maximized when i reports the true belief pi. A contract function Π admits arbitrage if there exists a coalition C N and vectors q = (qi)i N and r = (ri)i N with qi = ri for all i C for which i C Πi(q, x) i C Πi(r, x) for each x {0, 1}, and the inequality is strict for some x. This definition is adapted from Chun and Shachter (2011), who define a version that requires that arbitrage is possible whenever agents disagree. We say that a contract function that does not admit arbitrage is arbitrage free. A scoring rule, s : [0, 1] {0, 1} R, is a function that assigns a real-valued score to an agent based only on the agent s report and the realized outcome. We say that s is weakly (resp. strictly) proper if the corresponding contract function Π that pays each agent Πi(ˆp, x) = s(ˆpi, x) is weakly (resp. strictly) proper. It is easy to see that a contract function is weakly (resp. strictly) proper if, and only if, for all fixed reports of the other agents ˆp i, the payment to agent i takes the form of a weakly (resp. strictly) proper scoring rule. Scoring rule characterization. We will utilize the wellknown characterization of proper scoring rules (Mc Carthy 1956; Savage 1971; Schervish 1989; Gneiting and Raftery 2007). 4In the full version we show how to extend our results to any finite discrete outcome space. 0 0.2 0.4 0.6 0.8 1 0 Figure 1: Illustration of Theorem 1 for an agent with belief p = 0.4 who reports ˆp = 0.6. The horizontal axis is belief and the vertical axis is score. The line segment is s(0.6; p), the expected score for report 0.6 under belief p. Its endpoints are the scores s(0.6, 0) and s(0.6, 1) in outcomes 0 and 1, respectively. Theorem 1. Scoring rule s is (strictly) proper if and only if there exists a (strictly) convex function G : [0, 1] R with s(ˆp, x) = G(ˆp) + d G(ˆp) (x ˆp), where d G(ˆp) is a subgradient of G at ˆp. Furthermore, G(p) is the expected score for truthul reporting with belief p. Write s(ˆp; p) for the expected score of report ˆp under belief p. From Theorem 1 and linearity of expectation, we obtain s(ˆp; p) = G(ˆp) + d G(ˆp) (p ˆp). Note that s(p; p) = G(p) as claimed. A pictorial representation (Gneiting and Raftery 2007) is shown in Figure 1. Given G and a fixed report ˆp = 0.6, the function s(0.6; p) traces a subtangent line of G, tangent at p = 0.6. Its slope is equal to the subgradient d G(ˆp). The agent s score in outcome 0 or 1 is the height of this subtangent at the vertical line p = 0 or p = 1, respectively. An agent who believes p = 0.4 and reports ˆp = 0.6 expects to receive payment equal to the height of the subtangent at the vertical line p = 0.4, an amount strictly worse than G(0.4), their expected payment when reporting truthfully. 3 An Arbitrage-Free Rule Under Bounded Reports In this section we present a strictly proper contract function. It does not permit arbitrage when the reports of the agents are bounded in (ϵ, 1 ϵ), for some ϵ > 0. We denote our contract function M k, where k is a parameter that is tuned depending on ϵ. Payments are defined by M k i (ˆp, x) = ( n j=1 ˆpj n 2 )k + k(x ˆpi)( n j=1 ˆpj n 2 )k 1, where k is an even integer. We first verify that M k is strictly proper.5 5While we may want to bound the reports away from extremes, we do not require any assumptions on the agents beliefs. If beliefs fall outside of the range of allowable reports, strict properness implies that agents will prefer to report the allowable report closest to their belief. Lemma 2. M k is strictly proper. Proof. The result follows from Theorem 1, setting G(ˆpi) = ( n j=1 ˆpj n To gain intuition, let us describe M k in words, with the assistance of Figure 2. The blue curve is the function G : [0, n] R defined by G(ˆp) = ( n j=1 ˆpj n 2 )k, which defines payoffs over the space of all possible reports. For fixed reports of other agents ˆp i summing to c, agent i is faced with a scoring rule derived from G(ˆpi) = (c+ ˆpi n 2 )k, corresponding to a width-1 interval of the function G. An interesting feature of M k is that each agent receives the same expected payment from truthful reporting: Pictorially, each agent s payments are determined by a subtangent of G taken at the same point. We now show that if it is possible to bound agents reports away from 0 and 1, then there exists a k for which M k is arbitrage free. The proof proceeds by first showing that the payment a group of agents receives is a function only of the sum of their reports, and that, while all reports are within (ϵ, 1 ϵ), this function is increasing for X = 1 and decreasing for X = 0. Theorem 3. Suppose there exists an ϵ > 0 such that ˆpj (ϵ, 1 ϵ) for all agents j. Then M k is arbitrage free whenever k > n+2 Proof. Consider reports ˆp and a coalition of agents C = {1, . . . , |C|}, where |C| 2. Let us examine the total payment to agents in C when X = 1. For conciseness, given a subset S N, we use the notation ˆp S = j S ˆpj to denote the sum of reports of agents in S. j C M k j (ˆp, 1)) j C ((ˆp N n 2 )k + k(1 ˆpj)(ˆp N n = |C|(ˆp N n j C (1 ˆpj) = |C|(ˆp N n 2 )k 1(ˆp N\C n 2 + k + (1 k |C|)ˆp C) Note that this is a function of ˆp C and ˆp N\C only. Now we differentiate with respect to ˆp C = j C ˆpj: j C M k j (ˆp, 1) = (k 1)|C|(ˆp N n 2 )k 2(ˆp N\C n 2 + k + (1 k + (1 k |C|)|C|(ˆp N n = |C|(ˆp N n 2 )k 2 (k 1)(ˆp N\C n 2 + k + (1 k + (1 k |C|)(ˆp N n = k|C|(ˆp N n 2 )k 2 k 1 + (1 1 |C|)ˆp N\C + (1 k |C|)ˆp C n 2 (1 1 |C|) 0 0.5 1 1.5 2 2.5 3 3.5 4 4 ˆp2 + ˆp3 + ˆp4 = 0.25 ˆp2 + ˆp3 + ˆp4 = 1.5 ˆp2 + ˆp3 + ˆp4 = 3 G Figure 2: Illustration of contract function M k, k = 2, for agent 1 of four. The horizontal axis is the sum of all reports ˆpi and the vertical axis is score. We consider three examples: the sum of the reports of agents 2, 3, and 4 is 0.25 (left), 1.5 (middle), and 3 (right). Each case induces a strictly proper scoring rule for agent 1, as in Figure 1: the expected score function G is shown in solid blue along with subtangent s(0.6; p) in orange. We show that the derivative is positive whenever k > n+2 2ϵ . The multiplicative factor k|C|( n i=1 ˆpi n 2 )k 2 is always positive because k is even, so we can ignore it and focus on the remaining part of the expression: |C|)ˆp N\C + (1 k |C||C|(1 ϵ) n The first inequality comes from removing positive terms and the fact that ˆpj < 1 ϵ for all j, and the final inequality from the definition of k. In particular, the positive derivative implies that any coordination in which the colluding agents decrease the sum of their reports results in them collectively receiving a strictly lower payment when X = 1. Therefore, any successful arbitrage must have the agents in C increase the sum of their reports. But, because M k is symmetric in the possible outcomes, an identical argument shows that if the agents in C increase the sum of their reports, their total payment for outcome X = 0 would strictly decrease. Hence, arbitrage is not possible if the colluding agents change the sum of their reports. However, recall that the expression for j C M k j (ˆp, 1) is a function of ˆp C and ˆp N\C. Therefore, any group deviation that does not change the sum ˆp C does not change the total payment for X = 1 or, symmetrically, for X = 0, and so cannot constitute arbitrage. This completes the proof. Our next example demonstrates the failure of M k to be fully arbitrage free. Example 1. Consider M k with k = 2 (for higher k, a similar example works but requires more extreme reports). Let p = (0, 0, 0, 0.98, 0.98). If X = 0, each of the last two agents receives payment (1.96 2.5)2 0.98 (1.96 2.5) = 0.8208. If X = 1, then each of them receives (1.96 2.5)2 + 0.02 (1.96 2.5) = 0.2808. Suppose they both instead report 0.97 (so ˆp = (0, 0, 0, 0.97, 0.97)). Now, each of the last two agents receives payment (1.94 2.5)2 0.97 (1.94 2.5) = 0.8568 if X = 0 and (1.94 2.5)2 + 0.03 (1.94 2.5) = 0.2968 if X = 1. Under the manipulation, agents 4 and 5 receive higher payments in both outcomes, constituting arbitrage. We end this section by elaborating on the relationship between k, n, and ϵ. Even for n = 5 agents and ϵ = 0.1 (reports bounded between 0.1 and 0.9), our bound on k requires k > 35. Since payments are on the order of ( n i=1 ˆpi n 2 )k, the principal may have to make payments as large as (5 5 2)35 1013. Of course, these payments can be scaled down by any constant while maintaining strict properness, as is standard in elicitation settings to deal with real-world budget constraints, but then it becomes possible that agents in other profiles (say, when n i=1 ˆpi n 2 ) receive payments that are very close to 0, no matter what they report. Designing a rule that provides reasonable incentives to agents while also permitting the principal a reasonable loss in the worst case is an interesting question that we leave open. 4 Weakly Proper Arbitrage-Free Rules In the previous section, we presented a contract function that is strictly proper and arbitrage free when reports are bounded away from 0 and 1. In this section, we relax strict properness to weak properness, with the goal of achieving fully arbitrage-free contract functions. Weak properness on its own is a very weak guarantee. Paying each agent a constant amount Πi(ˆp, x) = d is weakly proper, because no agent can profit by reporting ˆpi = pi. It is also arbitrage free, since the total payment to any group of agents is also fixed by the mechanism in advance. However, this is not a compelling contract function; a constant score can hardly be called a scoring function at all. Agents paid a fixed amount have no incentive to gather or process information if there is even a minimal cost to do so. We will consider two properties to avoid such degenerate contract functions. First, we want a contract function that distinguishes between good and bad reports. To this end, we say that a contract function Π satisfies weak distinguishability if, whenever there exist agents i, j with ˆpi = 0 and ˆpj = 1, we have Πi(ˆp, 0) > Πj(ˆp, 0) and Πi(ˆp, 1) < Πj(ˆp, 1). An agent with a maximally accurate report should receive a strictly higher payment than an agent with a maximally inaccurate report. While this may seem like a trivial property, it turns out that mechanism M k from Section 3 actually fails weak distinguishability. Second, observe that under the constant payment rule, agents can misreport safely: even without knowing the reports of the other agents, an agent can misreport and be guaranteed to not regret their misreport once the other reports are revealed. If a contract function is weakly proper and does not permit safe misreports, we say that it is weakly dominant proper. Formally, a weakly proper contract function Π is weakly dominant proper if, for any agent i with belief pi and any report ˆpi = pi, there exist reports ˆp i of the other agents such that piΠi((pi, ˆp i), 1) + (1 pi)Πi((pi, ˆp i), 0) > piΠi((ˆpi, ˆp i), 1) + (1 pi)Πi((ˆpi, ˆp i), 0). Thus, for any misreport, there is a possible configuration of others reports such that the expected value of the misreport is strictly lower than the expected value of reporting truthfully. Stated in game-theoretic terms, reporting truthfully is a weakly dominant strategy.6 Weakly dominant properness was implicitly defined, but not named, by Freeman, Pennock, and Wortman Vaughan (2017). 4.1 Weakly Distinguishing Scoring Rules Let us more closely examine contract functions that pay all agents according to a fixed scoring rule. Chun and Shachter (2011) showed that all such rules admit arbitrage whenever s is strictly proper. What if s is only weakly proper? We know from our discussion above that there exists at least one such rule the constant payment rule that is arbitrage free. In this section, we provide a complete characterization of the class of arbitrage-free weakly proper scoring rules. For any convex function G, write G(q) = maxi I Si(q), for some index set I, where each Si is an affine function defining contingent payments Si 0 and Si 1. Reporting ˆp is equivalent to choosing Si where i = arg maxi I Si(ˆp), the affine function tangent to G at ˆp. As a technicality, we assume no redundant choices; that is, each Si is the unique optimal choice in expectation for at least one belief. Say that G consists of ℓchoices if it is piecewise linear with ℓ 1 linear pieces, and call the corresponding weakly proper scoring rule s an ℓ-choice scoring rule. Theorem 4. Suppose that a contract function pays all agents according to a single proper scoring rule s. Then the contract function is arbitrage free if and only if s is a 1-choice or 2choice scoring rule. Proof. If s is 1-choice, then the corresponding contract function is clearly arbitrage free since payments do not depend on reports. If s is 2-choice, then the report space [0, 1] is divided 6Translated to notions of dominance in game theory (Shoham and Leyton-Brown (2008) provide an introduction), strict properness corresponds to truthtelling being a strictly dominant strategy, weakly dominant properness corresponds to weak dominance, and weak properness corresponds to very weak dominance. 0 0.2 0.4 0.6 0.8 1 Figure 3: A generic scoring rule that is not 1-choice or 2choice, with three particular choices highlighted. into a low and a high region, offering one affine function for each. Call these affine functions Sℓand Sh. By convexity, it must be the case that Sℓ 0 > Sh 0 and Sℓ 1 < Sh 1 . Any rearrangement of the colluding agents reports in which more agents receive contingent payments Sh decreases their total payment if X = 0, while the reverse is true if the rearrangement results in more agents with contingent payments Sℓ. If the rearrangement does not change the number of colluding agents receiving contingent payments Sh and Sℓ, then the total payment remains unchanged for both X = 0 and X = 1. Therefore no arbitrage opportunity exists. For the reverse direction, suppose that s is not 1-choice or 2-choice. So G is a maximum over at least three affine functions Sa, Sb, Sc, each of which is the unique maximizer for at least some q [0, 1]. Generically, the situation must be as pictured in Figure 3: convexity of G dictates that Sa 0 > Sb 0 > Sc 0 and Sc 1 > Sb 1 > Sa 1. We will show that some fractions of agents with contingent payments Sa and Sc can arbitrage by all switching to Sb. Let p be a belief for which Sb is uniquely optimal (note 0 < p < 1 by construction); then in particular p Sb 1 + (1 p)Sb 0 > p Sa 1 + (1 p)Sa 0 p Sb 1 + (1 p)Sb 0 > p Sc 1 + (1 p)Sc 0 implying that, for all α [0, 1], p Sb 1 + (1 p)Sb 0 (1) > p (αSa 1 + (1 α)Sc 1) + (1 p) (αSa 0 + (1 α)Sc 0) Suppose an α fraction of agents are choosing Sa and a 1 α fraction choose Sc. Consider the conditions for which they have a higher average payoff than if they all chose Sb, under outcomes X = 0 and X = 1 respectively. Because the payoffs are strictly monotone affine functions of α, there exist unique thresholds α, α such that: αSa 0 + (1 α)Sc 0 Sb 0 α α (2) αSa 1 + (1 α)Sc 1 Sb 1 α α (3) Now we claim α < α: if there exists α for which the inequalities (2) and (3) hold simultaneously, we get a contradiction with inequality (1). Intuitively, if the mixture of Sa and Sc is better on average both when X = 0 and when X = 1, then the mixture would be better on average for belief p. This implies α < α. So let α be a rational number satisfying α < α < α. Write α = n n+m where n, m N and construct an instance with n agents initially choosing Sa while m agents initially choose Sc. Now suppose all agents switch to selecting Sb. By (2) and (3) and the choice of α , their average payoff strictly improves both when X = 0 and X = 1. This proves an arbitrage opportunity. If s is 1-choice then every agent receives the same payment, and so the contract function fails weak distinguishability. However, when s is 2-choice, agents are separated into two groups based on their reports. Those with higher reports (including ˆpi = 1) get paid more than those with lower reports (including ˆpi = 0) when X = 1, and less when X = 0. Therefore these rules satisfy weak distinguishability. They are not weakly dominant proper since an agent is always indifferent between any two reports that induce the same contingent payments. As a natural example of a 2-choice scoring rule, consider the 0-1 score paying $1 iff the agent s report favors the true outcome: ˆp 0.5 and X = 1 or ˆp < 0.5 and X = 0. This rule corresponds to the 2-piecewise-linear function G(ˆp) = max {ˆp, 1 ˆp}. 4.2 Weakly Dominant Properness and the Median Rule We now define a contract function that is arbitrage free and weakly dominant proper. Suppose for convenience that n is odd and fix a strictly-proper scoring rule s. We will simply pay each agent the score (according to s) of the median report.7 Formally, Πi(ˆp, x) = s(med(ˆp), x). The median rule, like any rule that pays according to an aggregate statistic of all reports, is clearly arbitrage free. In order for a group of colluding agents to change the payments made by the mechanism, they must change the value of the aggregate statistic, which strictly decreases the total payment for some outcome. Additionally, the median mechanism is weakly dominant proper. For weak properness, consider an agent i with belief pi and report ˆpi. It is easy to see that either med(ˆp) med((pi, ˆp i)) pi or med(ˆp) med((pi, ˆp i)) pi. In either case, i achieves a (weakly) higher expected payment by reporting pi than ˆpi.8 To show that the median mechanism satisfies the additional requirement for weakly dominant properness, suppose that agent i reports ˆpi > pi (the case ˆpi < pi can be handled symmetrically). We must exhibit reports ˆp i of the other agents for which i would have strictly preferred to report pi. To do so, suppose that n 1 2 of the other agents report ˆpj < pi 7If n is even, we can use the left or right median instead. 8That an agent s expected score is a single-peaked function of their report follows easily from Theorem 1. 2 of them report ˆpi < ˆpj. If i had truthfully reported pi, the median report would have been her report pi and she would receive payment s(pi, x). Instead, the median report is her misreport ˆpi and she receives payment s(ˆpi, x). By strict properness of s, agent i strictly prefers the former. We have proven the following theorem. Theorem 5. The median payment rule is arbitrage free and weakly dominant proper. The median payment rule does not satisfy weak distinguishability, because it pays all agents identically. 4.3 Weakly Distinguishing and Weakly Dominant Proper Contract Functions We now combine the mechanisms from each of the previous subsections to obtain a contract function that preserves the desirable properties of both of them. Let s be a 2-choice proper scoring rule, with corresponding convex function G consisting of affine functions Sℓand Sh, defining contingent payments Sh 0 < Sℓ 0 and Sh 1 > Sℓ 1. Let c = min{Sℓ 0 Sh 0 , Sh 1 Sℓ 1}. Further, let r be a strictly-proper scoring rule bounded in the interval [0, 1]. Define contract function s+ by s+ i (ˆp, x) = s(ˆpi, x) + c n + 1 r(med(ˆp), x). That is, s+ pays each agent according to the sum of a 2piecewise-linear proper scoring rule, and some sufficiently small fraction of the median rule. In doing so, it inherits the weak distinguishability of scoring rule s and the weakly dominant properness of the median rule, while preserving arbitrage-freeness. Theorem 6. The contract function s+ is weakly distinguishing, weakly dominant proper, and arbitrage free. Proof. Weak distinguishability follows immediately from the definition. For weakly dominant properness, we note that s+ is weakly proper since it is the sum of two weakly proper rules. For any report ˆpi = pi, there exist reports of the other agents for which i would be better off reporting pi than ˆpi under the median rule. Therefore, i would be better off reporting pi than ˆpi under s+ too. For arbitrage freeness, consider some colluding coalition C, and two sets of reports q and q with qi = q i for all i C. Consider scoring rule s from the definition of s+. Denote by L [0, 1] the low region of the report space, in which agents receive contingent payments defined by Sℓ. Let n L = |{i C : qi L}| and n L = |{i C : q i L}|. Let H = [0, 1] \ L denote the high region, where agents receive contingent payments defined by Sh, with n H = |C| n L and n H = |C| n L. We consider two cases. Case 1: Suppose that n L = n L, which implies n H = n H. Then for each x {0, 1} we have i C s+ i (q, x) = n LSℓ x + n HSh x + |C| c n+1r(med(q), x) = n LSℓ x + n HSh x + |C| c n+1r(med(q), x) i C s+ i (q , x)+|C| c n+1(r(med(q), x) r(med(q ), x)) By properness of r, if r(med(q), x) > r(med(q ), x) for some x {0, 1}, then it must be the case that r(med(q), 1 x) < r(med(q ), 1 x). Therefore, no arbitrage opportunity exists. Case 2: Suppose that n L n L = n H n H > 0. The opposite case is symmetric. Then we have i C s+ i (q, 0) = n LSℓ 0 + n HSh 0 + |C| c n+1r(med(q), 0) = n LSℓ 0 + n HSh 0 + (n L n L)(Sℓ 0 Sh 0 ) + |C| c n+1r(med(q), 0) n LSℓ 0 + n HSh 0 + c + |C| c n+1r(med(q), 0) n LSℓ 0 + n HSh 0 + c + |C| c n+1 (r(med(q ), 0) 1) > n LSℓ 0 + n HSh 0 + |C| c n+1r(med(q ), 0) = i C s+ i (q , 0) The first inequality holds because n L n L 1 and Sℓ o Sh 0 c, the second inequality because r is bounded in [0, 1], and the final inequality because |C| < n + 1. That i C s+ i (q, 1) < i C s+ i (q , 1) follows similarly, implying that no arbitrage opportunity exists. Note that arbitrage freeness does not in general follow from combining two arbitrage-free rules, and that the subtlety in the definition of s+ is necessary. Suppose that we had instead defined s+ without a small enough scaling of the median rule. In particular, suppose that s is the 0-1 score defined earlier and r is the quadratic score (s(ˆpi, x) = 1 (ˆpi x)2). Consider the contract function that pays each agent s(ˆpi, x)+ r(med(ˆp), x). If p = (1, 1, 0) then the total payment to all agents is $1 if X = 0 and $5 if X = 1. But if instead the agents report ˆp = (0.5, 0.5, 0.5), their total payment is $2.25 if X = 0 and $5.25 if X = 1. 5 Conclusion We have explored mechanisms for truthfully eliciting probabilistic forecasts from individuals who may collude with one another. When strict incentives for truthtelling are required, the principal can avoid arbitrage provided that reports are bounded away from extreme probabilities. If incentives can be relaxed, then arbitrage can be avoided with a mechanism in which truth-telling remains the unique undominated strategy. So far, we have only considered binary events. In the full version, we provide an inductive construction that extends contract functions defined for a binary random variable to any random variable X with finitely many values. The constructed contract function first pays agents based on their prediction for the event X = 0, and, if the event X = 0 occurs, pays agents based on their predictions conditioned on the event X = 0. Our construction preserves properness of various strengths and arbitrage freeness. The mechanisms we have presented can therefore be extended in a straightforward manner to elicit predictions for any discrete random variable. Compelling and challenging open questions remain. In particular, we still do not know whether a strictly proper and fully arbitrage-free contract function exists. In Section 3, we discussed an easier open problem: Can a mechanism with similar guarantees to M k be designed that has stronger truth-telling incentives in practice? Towards an impossibility result, one could imagine adding additional constraints to the mechanism design problem such as budget balance, requiring the mechanism to pay out the same amount in total regardless of the reports. Acknowledgments. Dominik Peters was supported by ERC under grant 639945 (ACCORD). Work conducted while Bo Wagonner was at Microsoft Research. Brier, G. 1950. Verification of forecasts expressed in terms of probability. Monthly Weather Review 78:1 3. Chun, S., and Shachter, R. D. 2011. Strictly proper mechanisms with cooperating players. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI), 125 134. Freeman, R.; Pennock, D. M.; and Wortman Vaughan, J. 2017. The double clinching auction for wagering. In Proceedings of the 18th ACM Conference on Economics and Computation (EC), 43 60. French, S. 1985. Group consensus probability distributions: A critical survey. Bayesian statistics 2. Genest, C., and Zidek, J. V. 1986. Combining probability distributions: A critique and an annotated bibliography. Statistical Science 1(1):114 148. Gneiting, T., and Raftery, A. E. 2007. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102(477):359 378. Hanson, R. 2003. Combinatorial information market design. Information Systems Frontiers 5(1):107 119. Kilgour, D. M., and Gerchak, Y. 2004. Elicitation of probabilities using competitive scoring rules. Decision Analysis 1(2):108 113. Lambert, N. S.; Langford, J.; Wortman, J.; Chen, Y.; Reeves, D. M.; Shoham, Y.; and Pennock, D. M. 2008. Self-financed wagering mechanisms for forecasting. In Proceedings of the 9th ACM Conference on Electronic Commerce (EC), 170 179. Mc Carthy, J. 1956. Measures of the value of information. Proceedings of the National Academy of Sciences 42(9):654 655. Morris, P. A. 1977. Combining expert judgments: A Bayesian approach. Management Science 23(7):679 693. Savage, L. J. 1971. Elicitation of personal probabilities and expectations. Journal of the American Statistical Association 66(336):783 801. Schervish, M. J. 1989. A general method for comparing probability assessors. The Annals of Statistics 17(4):1856 1879. Shoham, Y., and Leyton-Brown, K. 2008. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations. Cambridge University Press. Tetlock, P. E., and Gardner, D. 2015. Superforecasting: The Art and Science of Prediction. New York, NY, USA: Crown Publishing Group.