# Causal Inference from Competing Treatments

Ana-Andreea Stoica (1), Vivian Yvonne Nastl (1, 2), Moritz Hardt (1)

(1) Social Foundations of Computation, Max Planck Institute for Intelligent Systems, Tübingen, Germany, and Tübingen AI Center, Germany. (2) Department of Mathematics, ETH Zürich, Zürich, Switzerland. Correspondence to: Ana-Andreea Stoica, Vivian Yvonne Nastl.

Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

Abstract

Many applications of RCTs, from field experiments to online advertising, involve the presence of multiple treatment administrators who compete for the subjects' attention. In the face of competition, estimating a causal effect becomes difficult, as the position at which a subject sees a treatment influences their response, and thus the treatment effect. In this paper, we build a game-theoretic model of agents who wish to estimate causal effects in the presence of competition, through a bidding system and a utility function that minimizes estimation error. Our main technical result establishes an approximation with a tractable objective that maximizes the sample value obtained through strategically allocating budget on subjects. This allows us to find an equilibrium in our model: we show that the tractable objective has a pure Nash equilibrium, and that any Nash equilibrium is an approximate equilibrium for our general objective that minimizes estimation error under broad conditions. Conceptually, our work combines elements from causal inference and game theory to shed light on the equilibrium behavior of experimentation under competition.

1. Introduction

Randomized controlled trials (RCTs) have become ubiquitous in a variety of fields, to the point that they are not so much the gold standard as just a standard tool in the toolbox (Banerjee et al., 2016). Application areas range from A/B testing in industry to informing policy choices at a large scale (Levy, 2007; Alatas et al., 2012; de Souza Leão & Eyal, 2019).

Given the abundance of experimentation, an agent who wishes to measure the effect of their treatment through an RCT may not be alone in doing so. Competing treatments arise when multiple agents share the population on which they wish to experiment. Subjects may then receive multiple treatments sequentially, in an order determined by some form of competition between the treatment administrators. Treatments that came before may modify the effectiveness of subsequent treatments. Precisely estimating effects therefore becomes challenging under competition: sequential treatments may pose the issue of external validity (Duflo, 2006) or mislead companies in their assessment of products (e.g., as the click-through rate plummets for low-position content).

Our concrete running example is online advertising. Multiple advertisers wish to estimate the effectiveness of their campaign. However, due to competition between advertisers, their ad will be displayed at a certain position on screen. The lower the rank of the ad, the lesser its effectiveness (Agarwal et al., 2011). Running the campaign, an advertiser collects data on impressions and clicks on ads displayed at different ranks. From this data, the advertiser wishes to answer the question: what would have been the effect of the treatment had it been the first treatment applied? In other words, how effective is the ad if it is displayed in the top rank?
Equipped with knowledge of this effect, an advertiser can distinguish between two scenarios: (1) an ad was not effective because it was mostly displayed at lower ranks, or (2) an ad was not effective because of its design. An advertiser can then take appropriate action to increase its effectiveness in subsequent campaigns.

The advertiser faces two problems: one game-theoretic, the other statistical. First, how should the advertiser bid on subjects, anticipating that other advertisers will bid strategically, too? Second, how should data from different ranks be weighted optimally to obtain the best possible estimate of the causal effect? Specifically, we show what estimation error an advertiser can optimally achieve at equilibrium in a game that models the competition between advertisers. In other words, our work shows how to spend an experimental budget rationally on subjects when the goal is to estimate a causal effect optimally.

1.1. Our Contribution

We introduce a novel model of causal inference under competing treatments. Our model combines elements from game theory and statistics to find an answer that neither toolkit can provide on its own. Specifically, how should treatment administrators allocate their budget so as to minimize statistical estimation error, while facing competition?

We assume that the causal effect of interest is a bounded quantity $\tau$ that we interpret as the effect of the treatment had it come first. A number $k$ of treatment administrators compete over a pool of $n$ subjects. Administrators have budgets that they can spend by making bids on subjects. A probabilistic allocation rule determines the order in which each subject receives the $k$ treatments. We assume that the treatment effectiveness decays with the order in which the treatment is applied: a treatment applied at position $r$ has a causal effect $\tau_r < \tau$. We assume that $\tau_r = \alpha_r \tau$, where $\alpha_r$ is a known structural parameter of our model.

In our model, administrators bid strategically so as to achieve the smallest possible estimation error. The optimal estimation error depends on what we call the sample profile, that is, the number of samples $n_r$ at each rank $r$. One of our key contributions is to approximate the optimal error objective by a more tractable objective that we call sample value. The sample value corresponds to the weighted sum $S = \sum_r n_r \alpha_r^2$. The sample value quantifies the effective utility of the sample, where each rank is discounted appropriately. We prove that, in the typical regime where budgets are larger than $k^2$, any Nash equilibrium in the sample value objective is also an approximate Nash equilibrium with respect to the optimal error objective.

Using this approximation result, we can understand the competition over optimal error by instead analyzing the competition over sample value. This objective has many nice properties. We show that it has a pure Nash equilibrium, by finding a class of equilibria in closed form in which rational administrators spend their budget exhaustively in a way that maximally avoids competition.

Conceptually, we find a fruitful bridge between causal inference and game theory. On the one hand, the causal inference literature tells us how to account for known position effects in an optimal way. On the other hand, game theory provides solution concepts for dealing with competition. A novelty is that we apply game theory not only to budget allocation strategies but also to the estimation of causal effects.
Our work opens the door to numerous interesting questions, which we discuss in Section 5, alongside current limitations of our framework.

1.2. Related Work

Several fields provide evidence that the position at which content is seen affects engagement, from early theories in experimental psychology (Ebbinghaus, 1885) to survey design (McFarland, 1981). Because of position effects, randomized experiments in development economics run the risk of reducing the external validity of sequential studies run on the same population (Duflo, 2006). Perhaps the most extensive work studies the digital space, where search results displayed lower on a webpage receive less engagement (Ansari & Mela, 2003; Teevan, 2008) and low-positioned ads are recalled less (Varian, 2007; Agarwal et al., 2011). In particular, Narayanan & Kalyanam (2015) measure the effect of positioning ads under selection biases such as competition using regression discontinuity, without modeling the competition for ranks explicitly. Moreover, Hardt et al. (2022) define a framework for measuring the effect of position on market power acquisition. Our work draws on this literature and models position effects that emerge from competition: agents who wish to run randomized controlled trials on a population encounter a position effect when competing with each other over winning the attention of subjects.

Such competition has been studied in game-theoretic settings. In particular, in applications like online advertising, pricing mechanisms have been developed in order to achieve stability in the market (Aggarwal et al., 2006; Edelman et al., 2007). A related line of work is that of position-based auctions. Early works in this research line studied equilibria and truthfulness through designs that encode positions in the agents' valuations for individual users (Varian, 2007; Athey & Ellison, 2011; Börgers et al., 2013). Another recent line of work studies optimal platform design choices under proxy bidding (or autobidding), where agents' strategy space is limited to budget choices and targeting mechanisms (Aggarwal et al., 2019; Balseiro et al., 2021; Conitzer et al., 2022). Ghosh & Sayedi (2010) study alternative designs for auctions with negative externalities generated by competing advertisers in online conversion-based auctions, without an explicit position-based model. Our work differs from this literature in the following ways: in our setup, agents do not have an intrinsic value for particular individuals (a common modeling choice in early position-based auctions) but rather a sample value that determines the estimation error of their causal effect estimate. This brings our problem closer to the autobidding literature, yet it differs from it in that our optimization problem includes both bidding strategies and the choice of an optimal estimator for precise estimation of causal effects.

Our game-theoretic design is inspired by and directly generalizes the work of Maehara et al. (2015). Their design optimizes the sample size of customers obtained by competing advertisers, in a line of work that bridges influence maximization with budget allocation (Alon et al., 2012). Whereas in the design of Maehara et al. (2015) there is no explicit rank (an advertiser either wins a customer or not, maximizing the size of their sample), our model generalizes theirs by including multiple positions in which advertisers may be shown to users.
This generalized game may be of independent research interest, as we can characterize particular Nash equilibria.

Estimating treatment effects under equilibrium conditions has been studied in the case of unit interference (Wager & Xu, 2021) and two-sided markets (Johari et al., 2022). These works consider dynamic models in which the treatment effect depends on the available market supply and the units' interactions, assuming a single treatment coming from a decision-maker (often a platform designer). Adaptive experimentation is a rapidly growing area in estimating causal effects under sequential treatments. Hadad et al. (2021) and Dwivedi et al. (2022) provide statistical inference guarantees for the problem of estimating counterfactual means, without explicitly modeling multiple ranks or strategic interactions. Methods such as multi-armed bandits have been employed to provide guarantees on the efficacy of treatments through optimal data collection in the presence of position effects (Lagrée et al., 2016; Zhou et al., 2023). They primarily assume a single learner. While bearing similarities in objectives (as our objective also directly optimizes for estimation error), our work differs from these by modeling data acquisition through competition that emerges from multiple learners.

2. Games and Objectives

We formally introduce a game that captures the setting of $k$ treatment administrators (also called admins) who compete over a set of $n$ subject slots. Linking back to our online advertising example, the different admins are represented by different advertisers, and the subject slots are positions at which users will see the displayed ads on an online platform. Conceptually, subject slots are placeholders for i.i.d. samples drawn from a data-generating distribution. Administrators have varying budgets that they can spend exhaustively by bidding on the subject slots. The outcome of the game is an allocation of the subject slots to the $k$ administrators. Thus, each admin receives up to $n$ slots for displaying their ad. For each slot, they can run a treatment-control experiment and measure the effect of an ad. The formal definition of the data-generating process is given in Section 3.

We assume that the causal effect of the treatment at rank 1 is given by a bounded quantity $\tau$ with $|\tau| \le 1$. (The effect just needs to be a bounded quantity; we assume it is bounded by 1 without loss of generality.) We also assume that the treatment at any rank $r > 1$ is less effective, so that $\tau_r = \tau \cdot \alpha_r$ with $0 < \alpha_r < 1$. The discount factors $\{\alpha_r\}$ are known structural parameters of our model. We discuss the case where the discount factors are not a priori known in Section 3. Samples at rank 1 are most valuable, but samples from lower ranks can still be useful. A treatment administrator bids strategically so as to minimize their mean squared error in estimating the causal effect $\tau$. To fully specify the game, we need the following ingredients:

- We can represent the bids as a nonnegative integer matrix $x \in \mathbb{N}_0^{n \times k}$, where $x_{ai}$ is the bid of admin $a$ on subject slot $i$. We denote the budget of admin $a$ by $B(a) = \sum_i x_{ai}$.
- An admin's strategy is a potential bid allocation over user slots. We denote the set of all possible strategies of an admin $a$ by $D_a$.
- An allocation rule $A$ maps bids to an allocation of subject slots. That is, given a set of bids, the allocation rule determines an assignment of ranks for each admin. A rank assignment of a particular admin is a tuple $\mathbf{r} = (r_1, r_2, \dots, r_n)$, where $r_i$ is the rank assigned for subject slot $i$. Under a probabilistic rule, we draw the rank assignment from the distribution given by the randomized allocation; the specific allocation rule we define later is randomized.
- The sample profile is just the count of the number of slots at each rank: $\mathbf{n} = (n_1, n_2, \dots, n_k)$, where $n_r = \sum_i \mathbf{1}\{r_i = r\}$. We denote the total number of slots obtained by admin $a$ by $n(a) = \sum_{r=1}^{k} n_r$.
- The utility function $f_a : D \to \mathbb{R}$ of admin $a$ maps a tuple of strategies from $D = D_1 \times D_2 \times \dots \times D_k$ to the utility of admin $a$ given the strategies of all admins.

Our goal is to understand the Nash equilibria of the game. These are the strategies in which each admin is unilaterally maximizing her utility. In other words, admins are simultaneously best responding to each other. Specifically, we want to understand what estimation error admins can hope to achieve at equilibrium.

2.1. Objective Design

Given an assignment of ranks $\mathbf{r}$ drawn from the distribution induced by the allocation function $A$, an administrator has to decide how to use the samples so as to optimally estimate the causal effect $\tau$. This corresponds to solving the following optimization problem, from the perspective of a single admin:

$$\inf_{\hat{\tau}} \; \mathbb{E}_{Z \sim X^{(\mathbf{r})}} \left[ \big(\hat{\tau}(Z) - \tau\big)^2 \right] \tag{1}$$

over all possible estimators $\hat{\tau}$. Here, $X^{(\mathbf{r})}$ denotes the data-generating distribution under the realized rank assignment $\mathbf{r}$, defined as follows: given a population $X$ and a rank assignment $\mathbf{r}$, a subject is drawn independently at each subject slot $i$ from the interventional distribution $X^{do(\mathrm{rank}=r_i)}$, where $r_i$ is the rank that the admin obtained at subject slot $i$ (coordinate $i$ in the rank assignment vector $\mathbf{r}$). The data $Z$ drawn from $X^{(\mathbf{r})}$ is the collection $Z_i = (r_i, T_i, Y_i)$, $i \in [n(a)]$, where $T_i$ is the treatment assignment and $Y_i$ denotes the outcome of a subject sampled for slot $i$ at rank $r_i$. We formalize the data-generating process, together with the modeling choices for the outcome and treatment assignment, in Section 3. Recall that treatment effects vary with the rank at which an admin's campaign is run, and therefore the data-generating distribution depends on the rank assignment.

Estimation error objective. Solving objective (1) directly is a difficult task. We therefore replace the optimal error with a minimax lower bound on the error. We prove (in Section 3) that for any estimator $\hat{\tau}$, there exists a model instance $M$ of the data-generating distribution $X^{(\mathbf{r})}$ such that the estimation error is lower bounded by

$$\min\left\{ c \left( \sum_{r=1}^{k} n_r \alpha_r^2 \right)^{-1},\; 1 \right\} \tag{2}$$

for a constant $c > 0$. In our subsequent analysis, we set $c = 1$ without loss of generality. (The value of the constant $c$ does not change the results: as all admins minimize the error by selecting the best sample distribution, the constant $c$ can be ignored in this objective. We obtain identical results using the expression $\min\{(\sum_{r=1}^{k} n_r \alpha_r^2)^{-1}, 1/c\}$ in our objectives; our results only rely on this expression being bounded by a constant.) When the number of samples obtained by an admin approaches 0, the first term in the bound in equation (2) tends to infinity. However, in this case, since the causal effect is bounded, $|\tau| \le 1$, the admin's mean squared error is also bounded by 1 (this bound is achieved by the constant estimator, for example). We provide a detailed discussion of this property in the Appendix.

We show (in Section 3) that the minimax lower bound in (2) is attainable up to constant factors. For example, we can use an optimally weighted average of Horvitz-Thompson treatment effect estimators at each rank, which renders an estimation error of $O\big( (\sum_{r=1}^{k} n_r \alpha_r^2)^{-1} \big)$. This motivates defining the estimation error objective as

$$f_a(x) = -\,\mathbb{E}_{\mathbf{r} \sim A(x)} \left[ \min\left\{ \mathbb{E}_{X^{(\mathbf{r})}} \left[ \left( \sum_{r=1}^{k} n_r \alpha_r^2 \right)^{-1} \right],\; 1 \right\} \right]. \tag{3}$$

The negative sign serves the purpose of turning an estimation error minimization objective into a maximization objective.
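To make these quantities concrete, the following Python sketch computes a sample profile, the sample value $\sum_r n_r \alpha_r^2$, and the bound-based quantity $\min\{S^{-1}, 1\}$ from a realized rank assignment. It is purely illustrative: the function and variable names are ours, and the discount factors in the example are made up.

```python
from collections import Counter

def sample_profile(ranks, k):
    """Count how many subject slots the admin won at each rank 1..k."""
    counts = Counter(ranks)
    return [counts.get(r, 0) for r in range(1, k + 1)]

def sample_value(profile, alphas):
    """Sample value S = sum_r n_r * alpha_r^2 (with alpha_1 = 1)."""
    return sum(n_r * a_r ** 2 for n_r, a_r in zip(profile, alphas))

def error_bound(profile, alphas):
    """Error bound min{1/S, 1}; equals 1 when the admin wins no samples."""
    s = sample_value(profile, alphas)
    return 1.0 if s == 0 else min(1.0 / s, 1.0)

# Example: an admin won rank 1 on three slots, rank 2 on two, and rank 3 on one.
ranks = [1, 1, 1, 2, 2, 3]
alphas = [1.0, 0.6, 0.3]                 # assumed discount factors, alpha_1 = 1
profile = sample_profile(ranks, k=3)     # -> [3, 2, 1]
print(profile, sample_value(profile, alphas), error_bound(profile, alphas))
```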
In essence, the estimation error objective captures an admin's goal of minimizing their estimation error (up to constant factors) in expectation over the sample profile distribution given by the allocation rule.

Sample value objective. Even the estimation error objective is not necessarily tractable. In particular, it appears difficult to reason about Nash equilibria with respect to the estimation error objective. One of our main contributions is to relate the estimation error objective to a tractable objective. Given a sample profile, we call the weighted sum $\sum_r n_r \alpha_r^2$ the sample value given by a rank allocation $\mathbf{r}$. The sample value objective is defined as

$$f_a(x) = \mathbb{E}_{\mathbf{r} \sim A(x)}\, \mathbb{E}_{X^{(\mathbf{r})}} \left[ \sum_{r=1}^{k} n_r \alpha_r^2 \right]. \tag{4}$$

Intuitively, the sample value objective and the estimation error objective are linked, as they are both a function of the sample value $\sum_{r=1}^{k} n_r \alpha_r^2$ obtained by an admin through a strategy choice. Formally, we can show that for typical parameter settings, any Nash equilibrium in the sample value objective is an approximate Nash equilibrium in the estimation error objective. Moreover, we can characterize a set of Nash equilibria in the sample value objective.

2.2. Main Results

We denote by $G_{MSE}$ and $G_{SV}$ the games associated with the estimation error objective and the sample value objective, respectively. Figure 1 illustrates our setup and results: through a causal inference analysis, we find an explicit expression for the minimax lower bound of an estimator as a function of the sample value (Section 3); the game associated with the sample value objective can be solved by finding a Nash equilibrium (Theorem 2.1), described in detail in Section 4; and any such solution approximates the error minimization objective under mild assumptions (Theorem 2.2), which is the main result of our paper.

Figure 1. Illustration of the objective set-up and results: the sample value objective game admits a pure Nash equilibrium (Theorem 2.1), which, under bounded bids and the position effect assumptions, yields an approximate Nash equilibrium for the estimation error objective game (Theorem 2.2); the two games are connected through the minimax lower bound.

Theorem 2.1. A budget allocation game $G_{SV}$ with multiple treatment administrators has a pure Nash equilibrium for all sequences $(\alpha_r)_r$ such that $\alpha_1 = 1$ and $\alpha_r \in (0, 1)$ for $r > 1$. Moreover, for $B(a) \le n$, an allocation in which all treatment administrators split their entire budget uniformly over subjects and maximally avoid competition on subjects is always a Nash equilibrium.

Theorem 2.2. Any Nash equilibrium for the $G_{SV}$ game with $k$ treatment administrators and $n$ subjects is an $(\epsilon, \eta)$-approximate Nash equilibrium for the $G_{MSE}$ game with $\eta = O\big(\exp(-B/k^2)\big)$, where $B = \min_a B(a)$ and the maximum bid per subject slot is bounded. In particular, if all administrators have budget $B(a) = \omega(k^2)$, we get a $(o(1), o(1))$-approximate Nash equilibrium.

The proofs for all results are detailed in the Appendix.

Nash equilibrium. We assume that players act rationally and selfishly in maximizing their utilities. We assume this is a complete information game (i.e., players know each other's utility functions).
This set-up is common in game designs applicable to online advertising settings, as motivated by prior works (Maehara et al., 2015; Varian, 2007). We employ the well-known notions of pure and approximate Nash equilibrium. In short, a strategy set $x$ is a pure Nash equilibrium (Nash Jr, 1950) if no deviation from it can increase utility for any admin (i.e., every admin's strategy is a best response). The concept of $(\epsilon, \eta)$-approximate equilibria has been widely used in finding approximate solutions (Alon et al., 2012; Roughgarden, 2016). In essence, it assumes that an admin will not change their strategy if the current one is an approximate best response.

Definition 2.3. A strategy set $x \in D$ is an $(\epsilon, \eta)$-approximate Nash equilibrium for a utility-maximization game $G$ with non-positive utility function $f_a : D \to \mathbb{R}$ if for all players $a$,

$$f_a(x_a, x_{-a}) \;\ge\; (1 + \epsilon)\, f_a(x'_a, x_{-a}) - \eta, \qquad \forall\, x'_a \in D_a, \tag{5}$$

where $D_a$ denotes the strategy set of player $a$ and $\epsilon, \eta > 0$. An equivalent definition for a utility-minimization game with non-negative utilities is obtained for $\eta < 0$.

Allocation rule: We use a probabilistic allocation rule that generalizes a model proposed by Maehara et al. (2015). We defer a formal definition to Section 4. Intuitively, each administrator has a chance to win the first rank of any given subject slot with a probability that is an increasing function of the administrator's bid for the slot. After determining a winner for rank $r$, the allocation rule proceeds to the next rank until all ranks have been allocated. The main properties of the allocation rule are that the probability of an admin winning a rank is component-wise concave and increasing in the bid, which facilitates characterizing a Nash equilibrium. Furthermore, the allocation rule has the property that the sample value does not grow in expectation slower than $B(a)/k$: $\mathbb{E}\big[\sum_{r=1}^{k} n_r \alpha_r^2\big] = \Omega(B(a)/k)$ under a Nash equilibrium for $G_{SV}$. This allows the sample value variable to concentrate around its mean in the regime where budgets are larger than $k^2$.

Proof sketch for Theorem 2.2: The intuition behind the proof relies on a few properties of the model and the allocation rule. Mainly, the allocation allows the sample value objective to concentrate around its expectation with high probability in the bounded parameter regime. This allows us to show that if an allocation is indeed an optimal solution for the sample value objective, it will be approximately optimal for the estimation error objective. The probability of failing to concentrate determines a distance $\eta$ from the utility of any other allocation; through a market-efficiency interpretation, it determines the necessary overhead above which an admin would be incentivized to change their strategy. We discuss alternative allocation rules and connections to common auction designs in Section 5, noting that our analysis is not contingent on the specific choice of an allocation rule, but rather on the properties mentioned in the proof sketch.
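As a concrete reading of Definition 2.3, the sketch below checks the $(\epsilon, \eta)$-approximate best-response condition for every admin and every unilateral deviation. It is our own illustration, not the paper's code; the utilities and strategy sets are passed in as hypothetical callables and lists.

```python
def is_approx_nash(utilities, strategies, profile, eps, eta):
    """Check the (eps, eta)-approximate Nash condition of Definition 2.3.

    utilities: dict admin -> function mapping a full strategy profile
               (dict admin -> strategy) to that admin's (non-positive) utility.
    strategies: dict admin -> iterable of alternative strategies for that admin.
    profile: dict admin -> current strategy x_a.
    Returns True iff, for every admin a and every unilateral deviation x'_a,
    f_a(x_a, x_-a) >= (1 + eps) * f_a(x'_a, x_-a) - eta.
    """
    for admin, f_a in utilities.items():
        current = f_a(profile)
        for alt in strategies[admin]:
            deviated = {**profile, admin: alt}   # unilateral deviation by one admin
            if current < (1 + eps) * f_a(deviated) - eta:
                return False
    return True
```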
3. A Causal Inference Analysis of the Error Minimization Objective

We treat the estimation problem from the perspective of a particular admin $a$. For simplicity, we drop the index $a$ from the notation when there is no confusion. We define a population distribution $X$ from which subjects will be independently drawn and used in the estimation problem for the admin. The admin runs an RCT on each subject and uses the data they gathered to estimate a causal effect. We employ classic modeling choices from the causal inference literature to model the outcomes and treatment assignments of subjects under an RCT.

Data-generating process. We detail the data-generating process for an admin $a$, briefly described in Section 2. The process consists of two parts, sketched in Figure 2 in the Appendix:

1. Admin $a$ competes over the $n$ subject slots with the other admins by placing bids. A set of bids $x$ placed by the admins induces a distribution over rank allocations $A(x)$, where $A$ is the rank allocation function. We sample a rank assignment $\mathbf{r}$ from $A(x)$ for admin $a$. The rank assignment $\mathbf{r}$ is a vector containing the assigned rank of admin $a$ for each subject slot. The sample profile $(n_1, n_2, \dots, n_k)$ gives the number of subjects drawn at each rank, on whom the admin will run their campaign. In total, the admin obtains $n(a) = \sum_{r=1}^{k} n_r$ slots out of the total $n$ slots. (Each admin may obtain a different number of data slots. For example, if an admin bids 0 on a slot, they will not obtain any rank for that slot; an admin bidding all 0s will not obtain any slots, and therefore $n(a) = 0$ for them.)

2. The subjects are drawn independently at each slot from the interventional distribution $X^{do(\mathrm{rank}=r_i)}$, where $r_i$ is the rank that admin $a$ obtained at subject slot $i$, and treatment or control is allocated in a randomized manner (through an RCT). The data the admin collects consists of the tuples $(r_i, T_i, Y_i)$: rank $r_i$, treatment assignment $T_i$, and outcome $Y_i$ of each subject $i \in [n(a)]$.

As the RCT eliminates the influence of any potential confounding factors, the treatment effect at a specific rank $r$ is identifiable (Pearl, 2009). We point out that the budget allocation happens prior to the sampling process. In particular, the individuals for each slot are not yet realized when the rank assignment occurs. This process ensures that the individuals are independently drawn. We use potential outcomes to model the outcome of each individual sampled after the second step of the data-generating process occurs, described as follows.

Potential outcomes model. We model the outcome of a subject $i \in [n(a)]$ sampled from $X^{do(\mathrm{rank}=r_i)}$ as

$$Y_i = c_{i,0} + c^{(r_i)}_{i,1} T_i + \varepsilon_i, \qquad r_i \in [k], \tag{6}$$

where $T_i$ denotes the treatment assignment of subject $i$ by the admin. We interpret $T_i = 1$ as assigning subject $i$ to the treatment group, and $T_i = 0$ to the control group. We model $T_i \sim \mathrm{Ber}(q_i)$, with $q_i \in [q, 1 - q]$ for a constant $q \in (0, 0.5]$, for all $i \in [n(a)]$. Essentially, we assume the admin implements a non-uniform Bernoulli randomized design. The observational noise $\varepsilon_i$ is drawn from a normal distribution and does not depend on the subject's rank, $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$. We denote the potential outcomes of subject $i$ under treatment and control by $Y_i(1)$ and $Y_i(0)$, respectively.

Estimand: ATE, but only for the first rank. We are interested in the treatment effect an administrator would have obtained if they had experienced no competition from other administrators. In reality, competition causes an admin to show up at lower positions for some slots. The lower the rank for a slot, the smaller the treatment effect. The causal effect of interest is thus the average treatment effect had it been the first treatment applied. We formally define the estimand as

$$\tau = \mathbb{E}\left[ Y_i(1) - Y_i(0) \,\middle|\, do(\mathrm{rank} = 1) \right], \tag{7}$$

where the expectation is taken over the intervened population $X^{do(\mathrm{rank}=1)}$ that always encounters the content of admin $a$ first.
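The data-generating process can be simulated directly. The sketch below is a minimal illustration (not the authors' code): it draws Bernoulli($q$) treatment assignments and Gaussian noise and, as a simplification, uses a constant baseline $c_{i,0}$ and constant per-rank slopes $c^{(r)}_{i,1} = \alpha_r \tau$, so that the average effect at rank $r$ equals $\tau_r = \alpha_r \tau$. All numeric values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_outcomes(ranks, alphas, tau=0.8, q=0.5, sigma=0.1, baseline=0.2):
    """Draw (rank, treatment, outcome) tuples following Y_i = c_{i,0} + c_i^{(r_i)} T_i + eps_i.

    Simplification: constant baseline c_{i,0} and per-rank slope c_i^{(r)} = alpha_r * tau,
    so that the average treatment effect at rank r is tau_r = alpha_r * tau.
    """
    data = []
    for r in ranks:
        t = rng.binomial(1, q)                 # Bernoulli(q) treatment assignment
        slope = alphas[r - 1] * tau            # discounted effect at rank r
        y = baseline + slope * t + rng.normal(0, sigma)
        data.append((r, t, y))
    return data

# Hypothetical rank assignment for one admin across six subject slots.
data = simulate_outcomes([1, 1, 1, 2, 2, 3], alphas=[1.0, 0.6, 0.3])
```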
Assumption 3.1 (Bounded effect). We assume that the treatment effect at the first rank, $\tau$, is bounded, with $|\tau| \le 1$.

Given that a treatment administrator may acquire data from subjects at different ranks, we also define the average treatment effect of an administrator at rank $r \in [k]$ as

$$\tau_r = \mathbb{E}\left[ Y_i(1) - Y_i(0) \,\middle|\, do(\mathrm{rank} = r) \right]. \tag{8}$$

Note that we identify the treatment effect at rank $r$ by $\tau_r = \frac{1}{n_r} \sum_{i : r_i = r} c^{(r)}_{i,1}$ if $n_r \ge 1$.

Assumption 3.2 (Position effect). We formalize the assumption that the treatment effect at a lower rank is lower than at the first rank as

$$\tau_r = \alpha_r \, \tau, \qquad r \in [k], \tag{9}$$

with $(\alpha_r)_r$ satisfying $\alpha_1 = 1$ and $\alpha_r \in (0, 1)$ for all $r > 1$.

An admin wishing to estimate $\tau$ may of course just use the data from samples at rank 1 ($n_1$ samples). However, the position effect assumption allows the admin to identify the treatment effect at the first rank from the treatment effect at rank $r \in [k]$ and thus to use the entire sample profile. We refer to $\tau^{(r)}_1 := \tau_r / \alpha_r$ as the $r$-estimand.

Remark 3.3. Estimating the treatment effect at rank 1 is a powerful primitive, as it allows one to extrapolate the effect at any distribution over ranks, given the discount factors $(\alpha_r)_r$. As mentioned in the introduction, this knowledge is important in order for an admin to gain insight into the quality of their treatment: was a lower effect observed because of a lower rank, or because of the treatment itself? Equipped with this knowledge, an admin can choose the best course of action: increase their bid or improve the treatment design (e.g., in the case of online advertising, change the design of an ad). We emphasize that the admin has prior knowledge of the parameters $(\alpha_r)_r$ in this analysis. We discuss the case of jointly estimating $\tau$ and $(\alpha_r)_r$ later in this section.

3.1. An Optimality Argument

The problem an admin is solving is now that of optimally using their collected data in order to achieve a minimum estimation error. Exactly how an admin should combine their samples from different ranks towards minimizing the estimation error is non-trivial. We show, however, that the estimation error of any possible estimator is bounded below by a function of the sample value $\sum_{r=1}^{k} n_r \alpha_r^2$. To formalize this result, we denote the set of our potential outcomes model instances by $\mathcal{M}$. In particular,

$$\mathcal{M} = \left\{ \left( c_{i,0},\, c^{(r_i)}_{i,1} \right)_{i \in [n(a)]} \;:\; |\tau| \le 1,\; \tau_r = \alpha_r \tau \right\}. \tag{11}$$

Theorem 3.4 (Minimax Lower Bound). For any estimator $\hat{\tau}$, there exists an instance $M \in \mathcal{M}$ such that the squared error is bounded below by

$$\mathbb{E}_M\left[ (\hat{\tau} - \tau)^2 \right] \;\ge\; \min\left\{ \left( 16\, (1 - q) \sum_{r=1}^{k} n_r \alpha_r^2 \right)^{-1},\; 1 \right\}.$$

We provide the detailed proof in the Appendix. The proof relies on Le Cam's method (Le Cam, 1973) for hypothesis testing and on leveraging the assumption of independence between individuals. The particular choice of an optimal estimator for $\tau$ is beyond the scope of this paper. Instead, we showcase an estimator that achieves the minimax lower bound (up to a constant).

Estimator examples. We find examples of estimators that aggregate the effect at each rank and obtain an error upper bound of $O\big( (\sum_{r=1}^{k} n_r \alpha_r^2)^{-1} \big)$. Estimators that aggregate the effect at each rank are often used in meta-analysis studies or stratified experiments (Hartung et al., 2011; de Chaisemartin, 2022). We refer to these as decomposable estimators. Formally, a decomposable estimator can be written as $\hat{\tau} = \sum_{r=1}^{k} \omega_r \hat{\tau}^{(r)}_1$, with $\hat{\tau}^{(r)}_1$ an estimator for the $r$-estimand, for $r \in [k]$.
For unbiased estimators $\hat{\tau}^{(r)}_1$, the well-known inverse variance estimator (Markowitz, 1952; 1959; Cochran, 1954; Shahar, 2017) provides a closed-form solution for optimally weighing the $r$-estimators while preserving unbiasedness. The optimal weights $\omega^*$ are obtained by solving a variance minimization problem through Lagrange multipliers,

$$\omega^*_r = \frac{1/\mathrm{Var}\big(\hat{\tau}^{(r)}_1\big)}{\sum_{r'=1}^{k} 1/\mathrm{Var}\big(\hat{\tau}^{(r')}_1\big)}. \tag{12}$$

An immediate corollary is that the minimum MSE achieved with the weights $\omega^*$ is

$$\mathbb{E}\left[ \big( \tau_1 - \hat{\tau}_{\omega^*} \big)^2 \right] = \frac{1}{\sum_{r=1}^{k} 1/\mathrm{Var}\big(\hat{\tau}^{(r)}_1\big)}. \tag{13}$$

Within our potential outcomes model, there are unbiased estimators for the $r$-estimand that achieve an error $\mathbb{E}\big[(\hat{\tau}_r - \tau_r)^2\big] = O(n_r^{-1})$. Any such estimators, weighted optimally, attain a total error bounded by $O\big( (\sum_{r=1}^{k} n_r \alpha_r^2)^{-1} \big)$, as we will further argue. Given the parametric assumption on the sequence $(\alpha_r)_r$, we know that $\alpha_r$ amplifies the variance of the $r$-estimator by a squared term, $\mathrm{Var}\big(\hat{\tau}^{(r)}_1\big) = \mathrm{Var}(\hat{\tau}_r)/\alpha_r^2$, and thus $\mathbb{E}\big[ \big( \hat{\tau}^{(r)}_1 - \tau^{(r)}_1 \big)^2 \big] = O\big( (n_r \alpha_r^2)^{-1} \big)$. It then immediately follows from equation (13) that the estimation error of the inverse variance estimator $\hat{\tau}_{\omega^*}$ is

$$\mathbb{E}\left[ \big( \tau_1 - \hat{\tau}_{\omega^*} \big)^2 \right] = O\left( \left( \sum_{r=1}^{k} n_r \alpha_r^2 \right)^{-1} \right).$$

As examples of unbiased estimators for the $r$-estimand, the OLS estimator and the Horvitz-Thompson estimator (Horvitz & Thompson, 1952) have variance of order $O(n_r^{-1})$. The OLS estimator attains the lowest actual MSE, including constants, among unbiased linear estimators (see the Gauss-Markov theorem in Johnson & Wichern (2007)), while the Horvitz-Thompson estimator is preferred in more general models that include unequal treatment probabilities (unequal $q_i$ across subjects). We do not reproduce the bound computation for the OLS estimator, as it is a known result, but we detail, for the interested reader, properties of the Horvitz-Thompson estimator in our potential outcomes model. We define the Horvitz-Thompson estimator for the effect at rank $r$ formally as

$$\hat{\tau}_r = \frac{1}{n_r} \sum_{i : r_i = r} \left( \frac{Y_i\, \mathbf{1}\{T_i = 1\}}{\mathbb{P}(T_i = 1)} - \frac{Y_i\, \mathbf{1}\{T_i = 0\}}{\mathbb{P}(T_i = 0)} \right).$$

Similar to the OLS estimator, it is unbiased and has a variance of order $O(n_r^{-1})$ (albeit with a larger constant factor). We show these properties in Propositions 3.5 and 3.6.

Proposition 3.5. The estimator $\hat{\tau}_r$ is an unbiased estimator for the average treatment effect $\tau_r$ at rank $r$.

Proposition 3.6. The error of the rank-$r$ estimator $\hat{\tau}_r$, $\mathbb{E}\big[(\tau_r - \hat{\tau}_r)^2\big]$, can be upper bounded by a quantity of order $O(n_r^{-1})$ whose constant depends on $q(1-q)$, $(1-q)^2 + q^2$, and $Y_{\max}^2$, with $Y_{\max} := \max_{i \in [n(a)]} \big( c_{i,0} + c^{(r_i)}_{i,1} \big)$.

These results show that we can find an optimal estimator up to constants. While we do not require decomposable estimators in order to obtain the minimax lower bound on the error in Theorem 3.4, they provide good intuition for the form of the sample value expression. The intuition is that, whereas in a classical non-competitive causal inference task the sample value is just a function of the sample size, the $\alpha_r$ downgrade in treatment effect from rank 1 to rank $r$ scales down the sample value of rank $r$ by $\alpha_r^2$. Thus, we interpret the sum $\sum_{r=1}^{k} n_r \alpha_r^2$ as the sample value that an admin has in order to estimate a treatment effect. The choice of different strategies offers an admin the possibility of optimizing the sample value.
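As a rough illustration of how a decomposable estimator can be assembled in practice, the sketch below combines per-rank estimates with inverse-variance-style weights proportional to $n_r \alpha_r^2$. It uses a simple difference-in-means estimate at each rank rather than the OLS or Horvitz-Thompson estimators discussed above, and all names are ours; it is a sketch under these assumptions, not the paper's implementation.

```python
import numpy as np

def rank1_effect_estimate(data, alphas):
    """Aggregate per-rank estimates into an estimate of the rank-1 effect tau.

    data: list of (rank, treatment, outcome) tuples collected by one admin.
    For each rank r we form a difference-in-means estimate of tau_r, rescale it
    by 1/alpha_r so that it targets the rank-1 effect, and weight ranks
    proportionally to n_r * alpha_r^2 (the inverse of the rescaled estimator's
    variance, up to constants).
    """
    estimates, weights = [], []
    for r, alpha in enumerate(alphas, start=1):
        treated = [y for (rr, t, y) in data if rr == r and t == 1]
        control = [y for (rr, t, y) in data if rr == r and t == 0]
        if not treated or not control:
            continue                                  # this rank offers no usable contrast
        tau_r_hat = np.mean(treated) - np.mean(control)
        n_r = len(treated) + len(control)
        estimates.append(tau_r_hat / alpha)           # rescaled to the rank-1 effect
        weights.append(n_r * alpha ** 2)              # inverse-variance-style weight
    return float(np.average(estimates, weights=weights)) if estimates else None
```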
3.2. Estimating the Discount Factors

In our analysis so far, we assumed that the discount factors $(\alpha_r)_r$ are known by the admins. The parameters are perhaps given by a central platform or estimated in previous studies. (For example, Google reports the decay rate in click-through rates at lower positions in online advertising (Bailyn, 2024).) In settings where the discount factors $(\alpha_r)_r$ are not a priori known, they may be estimated from historical data. Related set-ups have utilized the EM algorithm to estimate position bias in online advertisement problems (Chuklin et al., 2016; Dempster et al., 1977). In many applications of multiple treatments, such as online advertisement, the estimated discount factors do not depend on the particular admin but rather on the display position (Lagrée et al., 2016). In these cases, a pilot experiment is usually run by randomizing results or performing pairwise comparisons (Joachims et al., 2017; Wang et al., 2018), in order to get good estimates of the discount factors. The estimated discount factors are then used in subsequent campaigns, under the assumption that they do not need to be re-estimated.

For completeness, we show that estimating the discount factors jointly with the treatment effect through data splitting does not reduce the estimation error, even in the simplest case: take a pilot experiment in which an online platform has collected data on an admin's treatments at different ranks, with $n_r$ samples at each rank. Without prior knowledge of the position effect parameters, the admin has two choices in estimating $\tau$: (1) use just the data where their treatment was shown at the first rank, or (2) split the data at all positions, using part of it for estimating the position effect parameters $(\alpha_r)_r$, and then using these estimates together with the rest of the data for estimating the treatment effect $\tau$. Even if the admin has to split the data available in (2), they might still have more samples for the treatment estimation task than in scenario (1). We show with a simple argument that the treatment effect estimation error is not reduced in (2) as compared to (1), despite the fact that the admin is using more data in (2), and no matter how the split is done (Lemma A.6 in the Appendix).

4. Nash Equilibria for the Sample Value Game

We define a probabilistic allocation rule $A_{prob}$ and prove Theorem 2.1 by characterizing a set of pure Nash equilibria for the game $G_{SV}$.

Allocation rule $A_{prob}$: We define a probabilistic allocation rule called $A_{prob}$, generalizing the model proposed in Maehara et al. (2015), that allocates ranks to the $k$ admins as follows.

Activation probability: When an admin $a$ allocates budget to a subject slot, we define their probability of winning that subject slot at a particular rank (determined by the order in which admins play) as

$$P_{A_a}(x_a, t) := 1 - (1 - p)^{x_a(t)}, \tag{18}$$

where $p$ is a parameter governing how relevant admin $a$ is for a subject slot $t$. (In this first analysis, we take $p$ to be the same for all admins and subjects; future work can model a different $p_a$ for each admin $a$, noting that the equilibrium becomes more complicated to characterize in closed form.) We use the notation $x_a(t)$ and $x_{at}$ interchangeably for the budget that treatment administrator $a$ allocates to slot $t$. We note that the relevance parameter $p$ cannot depend on the specific subject drawn at subject slot $t$, since the admin does not know a priori which subject is realized. Thus, we can think of $p$ as an expected value over the population distribution. If a treatment administrator does not bid on a slot, they do not get a sample from that slot (at any rank).

Random ordering: Next, we must define the way in which administrators compete over the subject slots. We propose the following procedure for allocating the ranks at which administrators show their content for each subject slot: for each subject slot $t \in [n]$ and for each rank $r \in [k]$, let $S_{k-r}$ denote the set of administrators who have not yet been allocated a rank $r' < r$ for this subject slot $t$. Then, draw a permutation of the elements of $S_{k-r}$, $\sigma \sim \mathcal{P}(S_{k-r})$, uniformly at random (where $\mathcal{P}(S)$ denotes the set of all permutations of the elements of a set $S$). Now, in the order defined by the permutation $\sigma$, each administrator tries to win rank $r$ for subject slot $t$, based on the budget they allocated to this subject slot (note that an administrator allocates budget to a subject, not to each rank). See Algorithm 1 for a formal description.

Algorithm 1: Winning ranks for a subject slot $t$ through the probabilistic allocation rule $A_{prob}$.

Denote by $S_{k-1} = [k]$ the set of all administrators.
for rank $r \in [k]$ do
    Draw a random permutation $\sigma \sim \mathcal{P}(S_{k-r})$
    for $i = 1, \dots, |S_{k-r}|$ do
        Admin $\sigma(i)$ tries to win subject $t$ at rank $r$ with probability $P_{A_{\sigma(i)}}(x_{\sigma(i)}, t)$
        if success then
            Define $S_{k-r-1} = S_{k-r} \setminus \{\sigma(i)\}$
            break and move on to the next rank
        end if
    end for
end for
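The allocation rule $A_{prob}$ can be simulated directly from Algorithm 1. The following Python sketch is our own illustration, with an assumed relevance parameter $p$ and hypothetical bids; it allocates ranks for a single subject slot.

```python
import random

def allocate_ranks_for_slot(bids, p, rng=random):
    """Sketch of the probabilistic allocation rule A_prob for one subject slot t.

    bids: dict admin -> nonnegative integer bid x_a(t) on this slot.
    p: relevance parameter; admin a activates with probability 1 - (1 - p)**x_a(t).
    Mirrors Algorithm 1: ranks are filled in order, each time over a fresh random
    ordering of the admins not yet assigned a rank for this slot. Admins who bid 0
    (or never activate within k attempts) receive no rank for the slot.
    """
    remaining = [a for a, x in bids.items() if x > 0]
    k = len(bids)
    assignment = {}
    for rank in range(1, k + 1):
        order = remaining[:]
        rng.shuffle(order)                              # uniformly random ordering
        for admin in order:
            if rng.random() < 1 - (1 - p) ** bids[admin]:
                assignment[admin] = rank
                remaining.remove(admin)
                break                                   # rank taken; move to the next rank
    return assignment

# Example: three admins with assumed bids on one slot and relevance p = 0.3.
print(allocate_ranks_for_slot({"a": 2, "b": 1, "c": 0}, p=0.3))
```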
We characterize a set of Nash equilibria for the case $B(a) \le n$ in the following lemma, thus proving Theorem 2.1. The results generalize easily to $B(a) > n$, by considering allocations in which treatment administrators split their budget uniformly over subjects.

Lemma 4.1. An allocation $x$ with the following properties is always a Nash equilibrium for $G_{SV}$:

- Admins prefer to split their budget uniformly over subjects: $x_a(t) \in \{0, 1\}$ for all $a \in [k]$, $t \in [n]$;
- Admins prefer to spend all their budget: $\sum_{t=1}^{n} x_a(t) = B(a)$ for all $a \in [k]$;
- Admins prefer to minimize competition with each other: $\big| \sum_a x_a(t) - \sum_a x_a(t') \big| \le 1$ for all $t, t' \in [n]$. This means that an admin will prioritize spending budget on the subjects with the fewest bids.

The proof of Lemma 4.1 is detailed in the Appendix and relies on the following properties of an allocation $x$ as described: an admin $a$ (1) cannot increase their utility by permuting their bids (re-arranging their allocation $x_a$ of budget units over subjects), and (2) prefers to split their budget in units of 1 over subjects rather than aggregating it over fewer subjects. The intuition behind proving these properties relies on the form of the utility function $f_a(\cdot)$, as $P_{A_a}(\cdot)$ is component-wise concave and increasing: an admin gains more from uniform bidding, since they then have a chance of winning a first rank over multiple subjects, than from increasing their bid on a single subject. Although increasing the bid on a single subject increases the chance of winning the first rank there, it does not increase it by much in comparison. Moreover, we note that an allocation in which an admin has unused budget cannot be an equilibrium, as the admin can increase their sample value by bidding on new subjects, given the probabilistic nature of the allocation rule.

The nature of this equilibrium is quite subtle: it is not clear from the start whether the equilibrium is always the equal-bids strategy, or whether admins should instead split the subject slots (bidding high on a few subject slots and zero on the rest, thus splitting the subjects between different admins). We conjecture that the allocations $x$ described in Lemma 4.1 are the only Nash equilibria that can occur when the relevance probability $p$ is the same for all admins:

Conjecture 4.2. The only pure Nash equilibria for the game $G_{SV}$ are allocations $x$ that satisfy the properties stated in Lemma 4.1, when the activation probability $p_a := p$ is the same for all admins $a \in [k]$.
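For concreteness, an allocation with the three properties of Lemma 4.1 can be constructed greedily by water-filling unit bids onto the currently least-bid subjects. The sketch below is illustrative only, assumes $B(a) \le n$, and uses hypothetical budgets; it is not taken from the paper.

```python
def competition_avoiding_allocation(budgets, n):
    """Construct a bid profile with the properties in Lemma 4.1 (assuming B(a) <= n).

    Each admin spends their budget in unit bids, always choosing the subjects that
    currently carry the fewest total bids, so that per-subject bid totals differ by
    at most one. Returns a dict admin -> list of 0/1 bids over the n subject slots.
    """
    bids = {a: [0] * n for a in budgets}
    load = [0] * n                                   # total bids currently on each slot
    for admin, budget in budgets.items():
        # pick the `budget` least-loaded slots for this admin (ties broken by index)
        slots = sorted(range(n), key=lambda t: load[t])[:budget]
        for t in slots:
            bids[admin][t] = 1
            load[t] += 1
    return bids

# Example with three admins and six subject slots.
example = competition_avoiding_allocation({"a": 4, "b": 3, "c": 5}, n=6)
```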
We can formally show this property for two admins, k = 2, and any number of subjects n: Proposition 4.3. The only pure Nash equilibria for the game GSV for two admins (k = 2) are allocations x that satisfy the properties stated in Lemma 4.1, when the activation probability pa := p is the same for all admins a [k]. 5. Discussion and Future Directions While our analysis uses the probabilistic allocation rule Aprob, it is not limited to it. As the approximation proof sketch suggests, a main ingredient is that the inherent randomness in the allocation rule renders the sample value objective S close to its mean, allowing for concentration inequalities to apply and extend to the estimation error objective. Albeit the random ordering of the agents, the allocation rule has the property that higher bids increase the probability of winning higher ranks, due to the sequential rank allocation. The Nash equilibria in the sample value optimization game bear a noteworthy resemblance to Feldman et al. (2007) s randomized solution of the associated optimization problem under a generalized second-price auction. In their setting, the authors show that the optimization problem induced by the utility function of agents with position-based value over subjects is intractable, yet randomized strategies find nearly optimal solutions. Thus, future directions could analyze our objectives with other allocation rules similar to those used in generalized second-price or first-price auctions (Vickrey, 1961). Another promising future avenue of research includes studying the quality of Nash equilibria within the class of estimation error games, in terms of bounding the Price of Anarchy (Po A) (Koutsoupias & Papadimitriou, 2009). In the GSV game, this becomes more challenging to study as it is not necessarily a potential game like its simpler version in Maehara et al. (2015), although with the added novelty that we can fully characterize a Nash equilibrium. Furthermore, it would be interesting to study the distribution of utility across admins through the lens of inequality, perhaps through related solution concepts such as Nash social welfare, often employed to study fairness issues in resource allocation games (Brânzei et al., 2017). Causal Inference from Competing Treatments Acknowledgements We thank Florian Dorner for his insightful feedback on the manuscript. We are also grateful to Claire Vernade, Aaron Schein, Kevin Jamieson, Lalit Jain, Robert Nowak, and Scott Sievert for helpful discussions on earlier versions of the project. Finally, we thank the anonymous reviewers for their helpful feedback on the paper. This project has received funding from the Max Planck ETH Center for Learning Systems (CLS). Impact Statement This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none which we feel must be specifically highlighted here. Agarwal, A., Hosanagar, K., and Smith, M. D. Location, location, location: An analysis of profitability of position in online advertising markets. Journal of marketing research, 48(6):1057 1073, 2011. Aggarwal, G., Goel, A., and Motwani, R. Truthful auctions for pricing search keywords. In Proceedings of the 7th ACM Conference on Electronic Commerce, pp. 1 7, 2006. Aggarwal, G., Badanidiyuru, A., and Mehta, A. Autobidding with constraints. In Web and Internet Economics: 15th International Conference, WINE 2019, New York, NY, USA, December 10 12, 2019, Proceedings 15, pp. 17 30. Springer, 2019. 
Alatas, V., Banerjee, A., Hanna, R., Olken, B. A., and Tobias, J. Targeting the poor: evidence from a field experiment in Indonesia. American Economic Review, 102(4):1206 1240, 2012. Alon, N., Gamzu, I., and Tennenholtz, M. Optimizing budget allocation among channels and influencers. In Proceedings of the 21st International Conference on World Wide Web, pp. 381 388, 2012. Ansari, A. and Mela, C. E-customization. Journal of Marketing Research, 40(2):131 145, 2003. Athey, S. and Ellison, G. Position auctions with consumer search. The Quarterly Journal of Economics, 126(3): 1213 1270, 2011. Bailyn, E. Google click-through rates (CTRs) by ranking position. 2024. URL https://firstpagesage.com/ reports/google-click-through-ratesctrs-by-ranking-position/. Balseiro, S., Deng, Y., Mao, J., Mirrokni, V., and Zuo, S. Robust auction design in the auto-bidding world. Advances in Neural Information Processing Systems, 34: 17777 17788, 2021. Banerjee, A. V., Duflo, E., and Kremer, M. The influence of randomized controlled trials on development economics research and on development policy. The state of Economics, the state of the world, pp. 482 488, 2016. Börgers, T., Cox, I., Pesendorfer, M., and Petricek, V. Equilibrium bids in sponsored search auctions: Theory and evidence. American economic Journal: microeconomics, 5(4):163 187, 2013. Brânzei, S., Gkatzelis, V., and Mehta, R. Nash social welfare approximation for strategic agents. In Proceedings of the 2017 ACM Conference on Economics and Computation, pp. 611 628, 2017. Charles, D., Chakrabarty, D., Chickering, M., Devanur, N. R., and Wang, L. Budget smoothing for Internet ad auctions: a game theoretic approach. In Proceedings of the fourteenth ACM conference on Electronic commerce, pp. 163 180, 2013. Chuklin, A., Markov, I., and de Rijke, M. Click Models for Web Search and their Applications to IR. In WSDM 16: Proceedings of the 9th ACM International Conference on Web search and data mining, pp. 689 690, New York, New York, USA, 2016. ACM Press. Cochran, W. G. The combination of estimates from different experiments. Biometrics, 10(1):101 129, 1954. Conitzer, V., Kroer, C., Panigrahi, D., Schrijvers, O., Stier Moses, N. E., Sodomka, E., and Wilkens, C. A. Pacing equilibrium in first price auction markets. Management Science, 68(12):8515 8535, 2022. Cortez-Rodriguez, M., Eichhorn, M., and Yu, C. L. Exploiting neighborhood interference with low-order interactions under unit randomized design. Journal of Causal Inference, 11(1):20220051, 2023. de Chaisemartin, C. Trading-off bias and variance in stratified experiments and in staggered adoption designs, under a boundedness condition on the magnitude of the treatment effect. Technical report, National Bureau of Economic Research, 2022. de Souza Leão, L. and Eyal, G. The rise of randomized controlled trials (RCTs) in international development in historical perspective. Theory and Society, 48:383 418, 2019. Dempster, A. P., Laird, N. M., and Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm. Causal Inference from Competing Treatments Journal of the royal statistical society: series B (methodological), 39(1):1 22, 1977. Duflo, E. Field experiments in Development Economics. Econometric Society Monographs, 42:322, 2006. Dwivedi, R., Tian, K., Tomkins, S., Klasnja, P., Murphy, S., and Shah, D. Counterfactual inference for sequential experiments. ar Xiv preprint ar Xiv:2202.06891, 2022. Ebbinghaus, H. Memory: A contribution to experimental psychology. Dover, 1885. 
Edelman, B., Ostrovsky, M., and Schwarz, M. Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords. American economic review, 97(1):242 259, 2007. Feldman, J., Muthukrishnan, S., Pal, M., and Stein, C. Budget optimization in search-based advertising auctions. In Proceedings of the 8th ACM conference on Electronic commerce, pp. 40 49, 2007. Ghosh, A. and Sayedi, A. Expressive auctions for externalities in online advertising. In Proceedings of the 19th International Conference on World Wide Web, pp. 371 380, 2010. Hadad, V., Hirshberg, D. A., Zhan, R., Wager, S., and Athey, S. Confidence intervals for policy evaluation in adaptive experiments. Proceedings of the National Academy of Sciences, 118(15):e2014602118, 2021. Hardt, M., Jagadeesan, M., and Mendler-Dünner, C. Performative power. Advances in Neural Information Processing Systems, 35:22969 22981, 2022. Hartung, J., Knapp, G., and Sinha, B. K. Statistical metaanalysis with applications. John Wiley & Sons, 2011. Horvitz, D. G. and Thompson, D. J. A generalization of sampling without replacement from a finite universe. Journal of the American statistical Association, 47(260):663 685, 1952. Joachims, T., Swaminathan, A., and Schnabel, T. Unbiased learning-to-rank with biased feedback. In Proceedings of the tenth ACM International Conference on Web search and data mining, pp. 781 789, 2017. Johari, R., Li, H., Liskovich, I., and Weintraub, G. Y. Experimental design in two-sided platforms: An analysis of bias. Management Science, 68(10):7069 7089, 2022. Johnson, R. A. and Wichern, D. W. Applied multivariate statistical analysis. Prentice hall Upper Saddle River, NJ, 2007. Koutsoupias, E. and Papadimitriou, C. Worst-case equilibria. Computer science review, 3(2):65 69, 2009. Lagrée, P., Vernade, C., and Cappe, O. Multiple-play bandits in the position-based model. Advances in Neural Information Processing Systems, 29, 2016. Le Cam, L. Convergence of estimates under dimensionality restrictions. The Annals of Statistics, 1(1):38 53, 1973. Levy, S. Progress against poverty: sustaining Mexico s Progresa-Oportunidades program. Rowman & Littlefield, 2007. Maehara, T., Yabe, A., and Kawarabayashi, K.-i. Budget allocation problem with multiple advertisers: A game theoretic view. In International Conference on Machine Learning, pp. 428 437. PMLR, 2015. Markowitz, H. Portfolio selection. The Journal of Finance, 7(1):77 91, 1952. ISSN 00221082, 15406261. URL http://www.jstor.org/stable/2975974. Markowitz, H. M. Portfolio Selection: Efficient Diversification of Investments. Yale University Press, 1959. ISBN 9780300013726. URL http://www.jstor.org/ stable/j.ctt1bh4c8h. Mc Diarmid, C. et al. On the method of bounded differences. Surveys in combinatorics, 141(1):148 188, 1989. Mc Farland, S. G. Effects of question order on survey responses. Public Opinion Quarterly, 45(2):208 215, 1981. Narayanan, S. and Kalyanam, K. Position effects in search advertising and their moderators: A regression discontinuity approach. Marketing Science, 34(3):388 407, 2015. Nash Jr, J. F. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences, 36(1): 48 49, 1950. Pearl, J. Causality. Cambridge University Press, 2009. Roughgarden, T. Twenty lectures on algorithmic game theory. Cambridge University Press, 2016. Shahar, D. J. Minimizing the variance of a weighted average. Open Journal of Statistics, 7(2):216 224, 2017. Teevan, J. How people recall, recognize, and reuse search results. 
ACM Transactions on Information Systems (TOIS), 26(4):Article No. 19., 2008. Varian, H. R. Position auctions. International Journal of Industrial Organization, 25(6):1163 1178, 2007. Vickrey, W. Counterspeculation, auctions, and competitive sealed tenders. The Journal of finance, 16(1):8 37, 1961. Causal Inference from Competing Treatments Wager, S. and Xu, K. Experimenting in equilibrium. Management Science, 67(11):6694 6715, 2021. Wang, X., Golbandi, N., Bendersky, M., Metzler, D., and Najork, M. Position bias estimation for unbiased learning to rank in personal search. In Proceedings of the eleventh ACM International Conference on Web search and data mining, pp. 610 618, 2018. Zhou, T., Liu, J., Jiao, Y., Dong, C., Chen, Y., Gao, Y., and Sun, Y. Bandit learning to rank with position-based click models: Personalized and equal treatments. ar Xiv preprint ar Xiv:2311.04528, 2023. Causal Inference from Competing Treatments A. Appendix A.1. Figure of Data-generating Process in Section 3 We sketch the data-generating process for an admin in Figure 2. slot 1 slot 2 slot 3 slot 4 slot 5 estimation error game collected data Part 1: Ranks assigned to admins Part 2: Sampling process for admin treatment control treatment control rank 3 rank 2 rank 3 game results show treatment or control Figure 2. Illustration of the data-generating process for an admin. In Part 1, the admins compete with each other to obtain different ranks for subject slots. After placing the bids, a rank allocation rule is applied. Each admin receives a rank for all slots for which they have bid. For example, the admin depicted in yellow received rank 1 for subject slots 1, rank 3 for subject slots 2 and 3, no rank for subject slot 4 (which happens if an admin does not bid on a slot), and rank 2 for subject slot 5. In Part 2, individuals are sampled at the allocated ranks, with treatment and control assigned in a randomized manner. The collected data for the admin depicted in yellow consists of tuples (ri, Ti, Yi) for all individuals, where ri represents the rank, Ti the binary treatment-control variable, and Yi the outcome. Causal Inference from Competing Treatments A.2. Proofs for Section 2 Results Proof of Theorem 2.2. We assume that every admin a can only bid up to some constant C units of its budget B(a) on any particular subject slot t. This assumption is motivated by throttling behavior on online advertising markets (Charles et al., 2013) or advertising channel constraints in media (Maehara et al., 2015). We also assume that the budget of any admin is at most the number of subject slots: B(a) n. We note that this assumption is not restrictive, as results hold easily for larger budgets. We aim to show that for x a Nash equilibrium for the GSV game, x is an (ϵ, η)-approximate Nash equilibrium for the GMSE game. For ease of notation, denote the sample value by a variable S: r=1 nr α2 r. (19) The sample value S is determined by the allocation x. Note that S is a positive random variable, as as the variables (nr)r counts of the rank allocation r, nr = P i 1{ri = r} and the variables (αr)r are nonnegative. The rank allocation r is sampled from the distribution given by the allocation rule. We aim to show the following result, as a stepping stone in proving the existence of an ϵ-approximate Nash equilibrium for the game GMSE: Proposition A.1. For allocations x D in which all admins use all their budget, n P t=1 xa(t) = B(a), a [k], with high probability S [(1 ϵ) E[S], (1 + ϵ) E[S]], with ϵ > 0. Proof of Proposition A.1. 
The intuition is that the sample value concentrates around its expectation with high probability given that it satisfies a bounded differences property, as further shown. Our aim is to show that with high probability S [(1 ϵ) E[S], (1 + ϵ) E[S]] (1 + ϵ) E[S] S (1 ϵ) E[S] ϵ E[S] S E[S] ϵ E[S] |S E[S]| ϵ E[S] In order to show this, we upper bound the probability P (|S E[S]| ϵ E[S]) using Mc Diarmid s inequality by writing the random variable S as the output of a function that satisfies the bounded differences property. An admin a only obtains samples from subject slots on which they bid, i.e. if they bid 0 budget on a subject slot, they do not get any rank for that subject slot, as per the definition of the allocation rule. Given the assumption that an admin can only bid up to some constant C on each subject, there are at least B(a)/C slots where the admin gets any rank; hence, there are at least B(a)/C of the ri variables. We denote by m the number of the subject slots on which an admin gets ranks, noting that B(a)/C m B(a). For each sample profile (nr)r, we can write S as the output of a function of the random variables (ri)i, g : [k]m R, with g(r1, , rm) = k P i=1 1(ri = r) α2 r (in this notation, nr becomes nr = m P i=1 1(ri = r)). We next show that the function g satisfies the bounded differences property in the variables (ri)i : constants ci, i [n] such that i [n] and ri [k], the following property is satisfied: sup r i [k] |g(r1, , ri, , rm) g(r1, , r i, , rm)| ci (21) To show this, we denote g(r1, , ri, , rm) = r=1 nr α2 r, g(r1, , r i, , rm) = r=1 n r α2 r, Causal Inference from Competing Treatments where n ri = nri 1, n r i = nr i + 1, and for r = ri, r i, nr = n r, since changing one variable ri to r i only changes two counts of the ranks, nri and nr i, leaving the rest unchanged. Thus, we get that sup r i [k] |g(r1, , ri, , rm) g(r1, , r i, , rm)| = sup r i [k] α2 ri α2 r i since αr [0, 1], r [k]. Thus, the function g satisfies the bounded differences property with right-handside constants equal to 1 for all i [n]. From Mc Diarmid s inequality (Mc Diarmid et al., 1989), we get that for independent random variables Ri [k], i [m], P (|g(R1, , Rm) E [g(R1, , Rm)] | η) 2 exp 2η2 P (|S E [S] | η) 2 exp 2η2 Thus, the probability that the even in equation 20 happens is bounded below: P (|S E[S]| ϵ E[S]) 1 2 exp 2ϵ2E[S]2 As ϵ 0, the right hand side of equation 25 approaches 1. Next, we will use the properties of the allocation rule Aprob that allow us to bound E[S]: Proposition A.2. If B(a) n for an admin a, then under the allocation rule Aprob and a Nash equilibrium x, r=1 nr α2 r = Ω(B(a)/k). Corollary A.3. If B(a) n for an admin a, then under the allocation rule Aprob and an allocation x with n P t=1 xa(t) = B(a), r=1 nr α2 r = Ω(B(a)/C k). Proof of Proposition A.2. First, we note by linearity of expectation and the fact that all variables are non-negative that r=1 nr α2 r r=1 E [nr] α2 r E [n1] α2 1 = E [n1], as we set α1 = 1. The allocation rule Aprob gives us a closed form formula for E [n1], t=1 PAa(xa, t) 1 σ P([k]) PAa(xa, t) Y j<σa (1 PAj(xj, t)) , (26) where P(K) denotes the set of permutation of elements of a set K. 
We note that each summand in equation (26) is the probability that admin a wins rank 1 for a subject slot t, which we write as

$$p_{a,t}(x) = \frac{1}{k!} \sum_{\sigma \in P([k])} P_{A_a}(x_a, t) \prod_{j <_\sigma a} \left(1 - P_{A_j}(x_j, t)\right), \quad (27)$$

thus giving

$$\mathbb{E}[n_1] = \sum_{t=1}^{n} p_{a,t}(x). \quad (28)$$

As $p_{a,t}$ is essentially averaging over all possible orderings of the admins, we note that

$$p_{a,t}(x) = \sum_{s=0}^{k-1} p_{a,t}\left(x \mid \text{admin } a \text{ appears after } s \text{ admins in } \sigma\right) P\left(\text{admin } a \text{ appears after } s \text{ admins in } \sigma\right), \quad (29)$$

where $\sigma$ is a random ordering of the admins. Thus,

$$p_{a,t}(x) \ge p_{a,t}\left(x \mid \text{admin } a \text{ appears first in } \sigma\right) P\left(\text{admin } a \text{ appears first in } \sigma\right). \quad (30)$$

Since $\sigma$ is a random ordering of the admins, the probability that admin a is the first one in $\sigma$ is simply 1/k, whereas the probability that admin a wins rank 1 for subject slot t, conditioned on being the first to play, is just $P_{A_a}(x_a, t) = 1 - (1-p)^{x_a(t)}$ under the allocation rule $A_{prob}$. Thus, we obtain

$$\mathbb{E}[n_1] \ge \frac{1}{k} \sum_{t=1}^{n} P_{A_a}(x_a, t). \quad (31)$$

Under a Nash equilibrium x, we know from Lemma 4.1 that admin a has exhausted their budget B(a) and has bid only 1s and 0s. Thus,

$$\mathbb{E}[n_1] \ge \frac{1}{k} \left[ \sum_{t \in [n],\, x_a(t) = 1} P_{A_a}(x_a, t) + \sum_{t \in [n],\, x_a(t) = 0} P_{A_a}(x_a, t) \right]. \quad (32)$$

Since $P_{A_a}(x_a, t) = 1 - (1-p)^{x_a(t)}$, we get that $P_{A_a}(x_a, t) = p$ for $x_a(t) = 1$ and $P_{A_a}(x_a, t) = 0$ for $x_a(t) = 0$, thus giving

$$\mathbb{E}[n_1] \ge \frac{1}{k} \sum_{t \in [n],\, x_a(t) = 1} p = \frac{B(a)\, p}{k}. \quad (33)$$

Since p is defined as a constant with respect to all other model parameters, we get that $\mathbb{E}[n_1] = \Omega\left(\frac{B(a)}{k}\right)$. We conclude $\mathbb{E}\left[\sum_{r=1}^{k} n_r\, \alpha_r^2\right] = \Omega(B(a)/k)$, which follows from the simple property that if a function f is $\Omega(n)$, then any function g with $g(x) \ge f(x)$ for all x in the common domain of f and g is also $\Omega(n)$. We note that there may be other allocation rules that satisfy this property; our analysis is therefore not contingent on the specific choice of allocation rule, as long as it preserves this property.

Proof of Corollary A.3. Corollary A.3 follows easily from the above proof with a minimal modification of equations (32)-(33), knowing that an admin can bid at most C budget units on every subject slot and that they have exhausted their budget:

$$\mathbb{E}[n_1] \ge \frac{1}{k} \left[ \sum_{t \in [n],\, x_a(t) \ge 1} P_{A_a}(x_a, t) + \sum_{t \in [n],\, x_a(t) = 0} P_{A_a}(x_a, t) \right]. \quad (34)$$

Since $P_{A_a}(x_a, t) = 1 - (1-p)^{x_a(t)}$, we get that $P_{A_a}(x_a, t) \ge p$ for $x_a(t) \ge 1$ and $P_{A_a}(x_a, t) = 0$ for $x_a(t) = 0$, thus giving

$$\mathbb{E}[n_1] \ge \frac{1}{k} \sum_{t \in [n],\, x_a(t) \ge 1} p \ge \frac{B(a)}{C} \cdot \frac{p}{k}, \quad (35)$$

under the assumption that $B(a) \le n$. Just as before, we get that $\mathbb{E}[n_1] = \Omega\left(\frac{B(a)}{C\,k}\right)$ and conclude that $\mathbb{E}\left[\sum_{r=1}^{k} n_r\, \alpha_r^2\right] = \Omega(B(a)/(C\,k))$.

Corollary A.3 simply states that the expectation of the variable S grows at least at the rate of B(a)/k (since C is a constant) when an admin uses their entire budget, whether in a Nash equilibrium or not (intuitively, subject to using all budget, for any permutation of how the budget is allocated across subject slots, the expectation of S does not get too small). Take a constant c > 0 such that $\mathbb{E}[S] \ge c\,\frac{B(a)}{k}$. Since $B(a)/C \le m \le B(a)$, we also know that there exists a constant $c' > 0$ such that

$$\mathbb{E}[S] \ge c'\,\frac{m}{k}. \quad (36)$$

We take $\eta$ in equation (24) to be equal to $\eta = c_0 \sqrt{m}$, with $c_0$ to be determined. Then, equation (24) becomes

$$P\left( |S - \mathbb{E}[S]| \le \eta \right) \ge 1 - 2 \exp\left( -\frac{2\eta^2}{m} \right), \quad\text{i.e.,}\quad P\left( |S - \mathbb{E}[S]| \le \sqrt{m}\, c_0 \right) \ge 1 - 2 \exp\left( -2 c_0^2 \right). \quad (37)$$

Taking $c_0 = \epsilon\, c'\, \frac{\sqrt{m}}{k}$, we get from equation (36):

$$P\left( |S - \mathbb{E}[S]| \le \epsilon\, c'\, \frac{m}{k} \right) \ge 1 - 2 \exp\left( -2\epsilon^2 c'^2 m / k^2 \right), \quad\text{and thus}\quad P\left( |S - \mathbb{E}[S]| \le \epsilon\,\mathbb{E}[S] \right) \ge 1 - 2 \exp\left( -2\epsilon^2 c'^2 m / k^2 \right). \quad (38)$$

The right-hand side tends to 1 when the exponential vanishes, i.e., in the regime where $B(a) = \omega(k^2)$. Various modeling assumptions satisfy this requirement, for example when the budget and the number of administrators grow sublinearly in n, with $B(a) = \Omega(n^b)$ and $k = O(n^{b/2 - \delta})$, for $b \in (0, 1)$ and $\delta \in (0, b/2)$. A simple corollary follows from Proposition A.1:

Corollary A.4.
For allocations $x \in D$ in which all admins use all their budget, $\sum_{t=1}^{n} x_a(t) = B(a)$ for all $a \in [k]$, with high probability $\frac{1}{S} \in \left[ (1-\epsilon_0)\,\frac{1}{\mathbb{E}[S]},\; (1+\epsilon_0)\,\frac{1}{\mathbb{E}[S]} \right]$, with $\epsilon_0 > 0$.

Proof. The proof follows easily by constructing $\epsilon$ appropriately and using Proposition A.1. Since we know that with high probability $S \ge (1-\epsilon)\,\mathbb{E}[S]$ and $S \le (1+\epsilon)\,\mathbb{E}[S]$ for an allocation x, and S is positive,

$$S \ge (1-\epsilon)\,\mathbb{E}[S] \iff \frac{1}{S} \le \frac{1}{(1-\epsilon)\,\mathbb{E}[S]}, \qquad S \le (1+\epsilon)\,\mathbb{E}[S] \iff \frac{1}{S} \ge \frac{1}{(1+\epsilon)\,\mathbb{E}[S]}. \quad (39)$$

Set $\frac{1}{1-\epsilon} = 1 + \epsilon_0$, equivalently $\epsilon = \frac{\epsilon_0}{1+\epsilon_0}$; then $\frac{1}{1+\epsilon} = 1 - \frac{\epsilon_0}{1+2\epsilon_0}$. Choosing the corollary's constant to be $\max\left( \epsilon_0,\, \frac{\epsilon_0}{1+2\epsilon_0} \right) = \epsilon_0$, we get that

$$P\left( \left| \frac{1}{S} - \frac{1}{\mathbb{E}[S]} \right| \ge \epsilon_0\, \frac{1}{\mathbb{E}[S]} \right) \le P\left( |S - \mathbb{E}[S]| \ge \epsilon\,\mathbb{E}[S] \right) \le 2 \exp\left( -2\epsilon^2 c'^2 m / k^2 \right), \quad (40)$$

concluding our proof. For $m/k^2 \to \infty$, the right-hand side vanishes.

We return to the proof of Theorem 2.2. We denote by $f^{MSE}_a$ the utility function of the game $G_{MSE}$ and by $f^{SV}_a$ the utility function of the game $G_{SV}$, for each admin a. Note that $f^{SV}_a$ is a non-negative function on its entire domain for all admins a, whereas $f^{MSE}_a$ is a non-positive function on its entire domain for all admins a. Note that $f^{MSE}_a(x) = -\mathbb{E}\left[ \min\left( \frac{1}{S}, 1 \right) \right]$ and $f^{SV}_a(x) = \mathbb{E}\left[ \sum_{r=1}^{k} n_r\, \alpha_r^2 \right]$, by definition of the random variable S from equation (19). According to Proposition A.1, we know that S concentrates around its expectation. For $\epsilon > 0$, denote by E the event that $S \in [(1-\epsilon)\,\mathbb{E}[S],\, (1+\epsilon)\,\mathbb{E}[S]]$, for ease of notation. Thus,

$$\mathbb{E}\left[ \min\left( \frac{1}{S}, 1 \right) \right] = \mathbb{E}\left[ \min\left( \frac{1}{S}, 1 \right) \,\middle|\, E \right] P[E] + \mathbb{E}\left[ \min\left( \frac{1}{S}, 1 \right) \,\middle|\, E^c \right] P[E^c], \quad (41)$$

where $A^c$ denotes the complement of the event A. We know that, conditioned on E, $\frac{1}{S}$ is concentrated around $\frac{1}{\mathbb{E}[S]}$, whereas conditioned on $E^c$, the expectation of the objective is bounded by 1, given that the error estimation objective outputs $\min\left( \frac{1}{S}, 1 \right)$. Denote by $\eta$ the probability of $E^c$ (from Proposition A.1, $\eta = 2e^{-2\epsilon^2 c'^2 m / k^2}$). Thus, we obtain

$$\mathbb{E}\left[ \min\left( \frac{1}{S}, 1 \right) \right] \le \mathbb{E}\left[ \frac{1}{S} \,\middle|\, E \right] + \eta \le (1+\epsilon)\, \frac{1}{\mathbb{E}[S]} + \eta, \quad\text{i.e.,}\quad -f^{MSE}_a(x) \le (1+\epsilon)\, \frac{1}{f^{SV}_a(x)} + \eta, \quad (42)$$

which holds for all allocations x such that all admins have allocated all their budget, under the assumption that an admin can bid at most C budget units on any subject slot. Furthermore, applying Jensen's inequality to the function $f(x) = 1/x$, we get that for all allocations $x_a$,

$$-f^{MSE}_a(x_a, x_{-a}) \ge \frac{1}{f^{SV}_a(x_a, x_{-a})}. \quad (43)$$

Now, take x to be a Nash equilibrium for the game $G_{SV}$; we will show that x has to be an $(\epsilon, \eta)$-approximate Nash equilibrium for the game $G_{MSE}$. Thus, we would like to show that for all $x'_a \in D_a$,

$$f^{SV}_a(x_a, x_{-a}) \ge f^{SV}_a(x'_a, x_{-a}) \;\Longrightarrow\; f^{MSE}_a(x_a, x_{-a}) \ge (1+\epsilon)\, f^{MSE}_a(x'_a, x_{-a}) - \eta, \quad (44)$$

with $\epsilon, \eta > 0$, for all $a \in [k]$.

We will first show that the above statement holds for all $x'_a \in D_a$ such that $\sum_{t=1}^{n} x'_a(t) = B(a)$ (in other words, admin a has exhausted their budget with strategy $x'_a$). Given that x is a Nash equilibrium for $G_{SV}$, we get

$$f^{SV}_a(x_a, x_{-a}) \ge f^{SV}_a(x'_a, x_{-a}) \;\Longrightarrow\; \frac{1}{f^{SV}_a(x_a, x_{-a})} \le \frac{1}{f^{SV}_a(x'_a, x_{-a})} \;\Longrightarrow\;$$
$$f^{MSE}_a(x_a, x_{-a}) \ge -(1+\epsilon)\, \frac{1}{f^{SV}_a(x_a, x_{-a})} - \eta \ge -(1+\epsilon)\, \frac{1}{f^{SV}_a(x'_a, x_{-a})} - \eta \ge (1+\epsilon)\, f^{MSE}_a(x'_a, x_{-a}) - \eta, \quad (45)$$

where the last inequality follows from Jensen's inequality, as described above. Since $\epsilon, \eta > 0$, we get the $(\epsilon, \eta)$-approximation condition. Thus, we are left to show inequality (44) for allocations $x'_a$ that may not exhaust the entire budget B(a). For each such allocation, create another allocation $x^{full}_a$ by spending the rest of the budget randomly across subject slots that received no bid under $x'_a$. Thus, $x^{full}_a$ satisfies the conditions necessary for Proposition A.1.

We note that the utility function $f^{SV}_a$ can only increase from bidding a unit of budget or more on additional subjects that had a previous bid of 0, since $f^{SV}_a$ is a summation of the utility coming from all subjects $t \in [n]$ with all terms non-negative, and $P_{A_a}(x_a, t) = 0$ if $x_a(t) = 0$ while $P_{A_a}(x_a, t) \ge p > 0$ if $x_a(t) \ge 1$. Thus, we get that

$$f^{SV}_a(x^{full}_a, x_{-a}) \ge f^{SV}_a(x'_a, x_{-a}) \;\Longrightarrow\; \frac{1}{f^{SV}_a(x^{full}_a, x_{-a})} \le \frac{1}{f^{SV}_a(x'_a, x_{-a})}. \quad (46)$$

From equations (45) and (43), and using Jensen's inequality again, we obtain that

$$f^{MSE}_a(x^{full}_a, x_{-a}) \ge -(1+\epsilon)\, \frac{1}{f^{SV}_a(x^{full}_a, x_{-a})} - \eta \ge -(1+\epsilon)\, \frac{1}{f^{SV}_a(x'_a, x_{-a})} - \eta \ge (1+\epsilon)\, f^{MSE}_a(x'_a, x_{-a}) - \eta, \quad (47)$$

thus proving that a Nash equilibrium for the game $G_{SV}$ is an $(\epsilon, \eta)$-approximate Nash equilibrium for the game $G_{MSE}$. We note the orders of $\epsilon$ and $\eta$: for any fixed $\epsilon > 0$,

$$\eta = O\left( \exp(-B/k^2) \right), \quad (48)$$

where $B = \min_a B(a)$ and the maximum bid per subject slot is bounded. These orders come from equation (38), in which the concentration bound links the concentration closeness $\epsilon$ with the concentration failure probability $\eta$.
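As a numerical sanity check on the two quantities driving Theorem 2.2, namely the concentration of the sample value S (Proposition A.1) and the lower bound on $\mathbb{E}[n_1]$ under $A_{prob}$ (Corollary A.3), the following sketch simulates exhaustive 0/1 bids. The uniform-random-order reading of the allocation rule and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_S(bids, alpha, p=0.3):
    """One draw of admin 0's sample value S = sum_r n_r * alpha_r^2 under an
    A_prob-style rule: on each slot, each bidder is activated with probability
    1-(1-p)**bid and activated bidders are ranked in a uniformly random order."""
    k, n = bids.shape
    S = 0.0
    for t in range(n):
        activated = rng.random(k) < 1 - (1 - p) ** bids[:, t]
        order = rng.permutation(k)
        rank = 0
        for a in order:
            if activated[a]:
                rank += 1
                if a == 0:
                    S += alpha[rank - 1] ** 2
    return S

k, n, B, p = 4, 40, 20, 0.3
alpha = np.array([1.0, 0.7, 0.4, 0.2])

# Exhaustive 0/1 bids: each admin bids 1 on B slots chosen at random.
bids = np.zeros((k, n), dtype=int)
for a in range(k):
    bids[a, rng.choice(n, size=B, replace=False)] = 1

draws = np.array([sample_S(bids, alpha, p) for _ in range(2000)])
ES = draws.mean()
eps = 0.2
inside = np.mean((draws >= (1 - eps) * ES) & (draws <= (1 + eps) * ES))
print(f"E[S] ~ {ES:.2f}, fraction of draws within (1±{eps})E[S]: {inside:.3f}")
print(f"B*p/k = {B * p / k:.2f}  (E[S] should be at least of this order)")
```

The empirical fraction inside the $(1 \pm \epsilon)\,\mathbb{E}[S]$ band and the comparison with $B\,p/k$ illustrate, but of course do not prove, the asymptotic statements above.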
A.3. Proofs for Section 3 Results

We define a population distribution X on the subjects, which are fully described by the random variables (rank, T, Y). Individuals are drawn independently for each slot, according to a fixed rank allocation r. In particular, we sample the $n_r$ individuals in slots at rank r from the do-interventional distribution $X \mid do(\text{rank} = r)$. We highlight that the outcomes $Y_i$ for $i \in [n(a)]$ are not identically distributed. Their distribution differs according to their associated ranks $r_i$. The ranks $r_i$ are assigned by the rank allocation r and are therefore deterministic in i. They merely indicate from which do-interventional distribution the individual was drawn. We denote the joint distribution of the outcome and treatment assignment variables $Z_i = (Y_i, T_i)$, $i \in [n(a)]$, by P. We explicitly define the set of considered model instances M. For a given sequence $(\alpha_r)_{r=1,\ldots,k}$, we set

$$M = \left\{ \left( c_{i,0},\, c^{(r_i)}_{i,1} \right)_{i \in [n(a)]} \;:\; |\tau| \le 1,\; \tau_r = \alpha_r\, \tau \right\}, \quad (49)$$

with the quantities $\tau = \frac{1}{n(a)} \sum_{i=1}^{n(a)} \frac{c^{(r_i)}_{i,1}}{\alpha_{r_i}}$ and

$$\tau_r = \begin{cases} \frac{1}{n_r} \sum_{i : r_i = r} c^{(r)}_{i,1} & \text{for } n_r \ge 1, \\ \alpha_r\, \tau & \text{for } n_r = 0, \end{cases} \qquad \text{for } r \in [k].^6$$

Moreover, note that we also identify the causal effect of interest in equation (7) through the potential outcomes model as $\tau = \frac{1}{n(a)} \sum_{i=1}^{n(a)} \frac{c^{(r_i)}_{i,1}}{\alpha_{r_i}}$.

Proof of Theorem 3.4. We follow the proof of Theorem 2 in Cortez-Rodriguez et al. (2023) and apply Le Cam's method (Le Cam, 1973) to obtain the minimax bound. For the computation, we make use of the independence between individuals, and of the fact that we can sort them according to their assigned ranks.

Lemma A.5 (Le Cam's method). Let M be a set of model instances. Then, for any two instances $M_1, M_2 \in M$ such that $|\tau(M_1) - \tau(M_2)| \ge 2\delta$ with $\delta > 0$, we have:

$$\inf_{\hat{\tau}} \sup_{M \in \mathcal{M}} \mathbb{E}\left[ (\hat{\tau} - \tau)^2 \right] \ge \frac{\delta^2}{2} \left( 1 - \sqrt{\frac{D_{KL}(P_1 \| P_2)}{2}} \right), \quad (50)$$

where $P_1$ and $P_2$ denote the population distributions with respect to $M_1$ and $M_2$, respectively, and $D_{KL}(\cdot \| \cdot)$ denotes the Kullback-Leibler divergence.

We begin the proof by constructing instances $M_1$ and $M_2$. We set $c_{i,0} = 0$ and $c^{(r_i)}_{i,1} = \delta\, \alpha_{r_i}$ for $M_1$, and $c_{i,0} = 0$ and $c^{(r_i)}_{i,1} = -\delta\, \alpha_{r_i}$ for $M_2$. Note that $\alpha_1 = 1$ by definition. Hence, we get

$$\tau(M_1) = \frac{1}{n(a)} \sum_{i=1}^{n(a)} \frac{c^{(r_i)}_{i,1}}{\alpha_{r_i}} = \frac{1}{n(a)} \sum_{i=1}^{n(a)} \delta = \delta, \quad (51)$$
$$\tau(M_2) = -\delta. \quad (52)$$

So we have $|\tau(M_1) - \tau(M_2)| = 2\delta$. Thus, the instances $M_1$ and $M_2$ satisfy the coverage condition in Lemma A.5.

⁶ We note that when there are no data points at rank r, we cannot identify the effect directly, but we can identify it from data at other ranks, since the structural parameters $(\alpha_r)_r$ are known.
Next, we compute $D_{KL}(P_1 \| P_2)$, where $P_1$ and $P_2$ are the joint distributions of $(Y_i, T_i)$, $i \in [n(a)]$, for the instances $M_1$ and $M_2$. We will choose $\delta$ to upper bound $D_{KL}(P_1 \| P_2)$ by 1/2. As all samples are independently drawn, the joint distributions $P_1$ and $P_2$ factorize as:

$$P_1\left( \{(Y_i, T_i)\}_{i=1}^{n(a)} \right) = \prod_{i=1}^{n(a)} P_1(Y_i, T_i), \qquad P_2\left( \{(Y_i, T_i)\}_{i=1}^{n(a)} \right) = \prod_{i=1}^{n(a)} P_2(Y_i, T_i).$$

Note that the outcomes $Y_i$ and $Y_j$ are differently distributed when the subjects i and j are sampled from different ranks, $r_i \ne r_j$. Hence, we have in general that $P_1(Y_i, T_i) \ne P_1(Y_j, T_j)$ and $P_2(Y_i, T_i) \ne P_2(Y_j, T_j)$ for $i \ne j$. We obtain that

$$D_{KL}(P_1 \| P_2) = \mathbb{E}_{M_1}\left[ \log \frac{p_1\left( \{(Y_i, T_i)\}_{i=1}^{n(a)} \right)}{p_2\left( \{(Y_i, T_i)\}_{i=1}^{n(a)} \right)} \right] = \sum_{i=1}^{n(a)} \mathbb{E}_{M_1}\left[ \log \frac{p_1(Y_i, T_i)}{p_2(Y_i, T_i)} \right],$$

with $p_1$ and $p_2$ the densities of $P_1$ and $P_2$, respectively. We sort the samples according to their allocated ranks and apply the law of total expectation:

$$D_{KL}(P_1 \| P_2) = \sum_{r=1}^{k} \sum_{j : r_j = r} \left( P_1(T_j = 1)\, \mathbb{E}_{M_1}\left[ \log \frac{p_1(Y_j, T_j)}{p_2(Y_j, T_j)} \,\middle|\, T_j = 1 \right] + P_1(T_j = 0)\, \mathbb{E}_{M_1}\left[ \log \frac{p_1(Y_j, T_j)}{p_2(Y_j, T_j)} \,\middle|\, T_j = 0 \right] \right). \quad (53)$$

Focusing on samples drawn from a fixed rank r, the outcomes $Y_j$ conditional on the treatment $T_j$ are distributed as

$$Y_j \mid T_j \sim \mathcal{N}\left( c^{(r)}_{j,1}\, T_j,\; \sigma^2 \right), \quad (54)$$

as $Y_j = c_{j,0} + c^{(r)}_{j,1} T_j + \varepsilon_j$ with $c_{j,0} = 0$ and $\varepsilon_j \sim \mathcal{N}(0, \sigma^2)$. Hence, we have $Y_j \mid T_j \sim \mathcal{N}(\delta\, \alpha_r\, T_j, \sigma^2)$ for $M_1$ and $Y_j \mid T_j \sim \mathcal{N}(-\delta\, \alpha_r\, T_j, \sigma^2)$ for $M_2$. We have $Y_j \mid (T_j = 0) \sim \mathcal{N}(0, \sigma^2)$ for both instances $M_1$ and $M_2$. Therefore, the second term in equation (53) simplifies to zero. We remain with

$$D_{KL}(P_1 \| P_2) = \sum_{r=1}^{k} \sum_{j : r_j = r} P_1(T_j = 1)\, \mathbb{E}_{M_1}\left[ \log \frac{p_1(Y_j, T_j)}{p_2(Y_j, T_j)} \,\middle|\, T_j = 1 \right] = \sum_{r=1}^{k} \sum_{j : r_j = r} P_1(T_j = 1)\, \mathbb{E}_{M_1}\left[ \log \frac{\varphi_{\delta \alpha_r, \sigma^2}(Y_j)}{\varphi_{-\delta \alpha_r, \sigma^2}(Y_j)} \,\middle|\, T_j = 1 \right],$$

where $\varphi_{\mu, \sigma^2}$ denotes the density function of $\mathcal{N}(\mu, \sigma^2)$. Note that the difference in distribution across ranks r is now made explicit in the densities, e.g., $\varphi_{\delta \alpha_r, \sigma^2}$ for model instance $M_1$. We input the explicit formula for the normal density and use that $T_j \sim \text{Ber}(q_j)$, for $q_j \in [q, 1-q]$:

$$D_{KL}(P_1 \| P_2) = \sum_{r=1}^{k} \sum_{j : r_j = r} q_j\, \mathbb{E}_{M_1}\left[ -\frac{1}{2\sigma^2} \left( (Y_j - \delta \alpha_r)^2 - (Y_j + \delta \alpha_r)^2 \right) \,\middle|\, T_j = 1 \right] \quad (55)$$
$$= \sum_{r=1}^{k} \sum_{j : r_j = r} q_j\, \frac{2}{\sigma^2}\, \delta \alpha_r\, \mathbb{E}_{M_1}[Y_j \mid T_j = 1] \quad (56)$$
$$= \sum_{r=1}^{k} \sum_{j : r_j = r} q_j\, \frac{2}{\sigma^2}\, \delta \alpha_r\, (0 + \delta \alpha_r + \mathbb{E}[\varepsilon_j]) \quad (57)$$
$$= \sum_{r=1}^{k} \sum_{j : r_j = r} q_j\, \frac{2}{\sigma^2}\, \delta^2 \alpha_r^2 \quad (58)$$
$$\le \frac{2\,(1-q)\,\delta^2}{\sigma^2} \sum_{r=1}^{k} n_r\, \alpha_r^2. \quad (59)$$

We can bound $D_{KL}(P_1 \| P_2)$ by 1/2 if we set

$$\delta^2 = \frac{\sigma^2}{4\,(1-q)\, \sum_{r=1}^{k} n_r\, \alpha_r^2}. \quad (60)$$

Then, Le Cam's method gives

$$\inf_{\hat{\tau}} \sup_{M \in \mathcal{M}} \mathbb{E}\left[ (\hat{\tau} - \tau)^2 \right] \ge \frac{\delta^2}{2} \left( 1 - \sqrt{\frac{D_{KL}(P_1 \| P_2)}{2}} \right) \ge \frac{\sigma^2}{16\,(1-q)\, \sum_{r=1}^{k} n_r\, \alpha_r^2}. \quad (61)$$

Finally, note that the sample value $\sum_r n_r\, \alpha_r^2$ is lower bounded by one if the admin attains at least one slot at rank one, i.e., $n_1 \ge 1$. However, it might happen that no slot is won at rank one (in fact, it may happen that $n_r = 0$ for some other rank r too). In this case, the sample value can be lower than one, and as it decreases, the inverse $\left( \sum_{r=1}^{k} n_r\, \alpha_r^2 \right)^{-1}$ may explode. However, the minimum error is still upper bounded by 1, since the admin is better off using a constant estimator, as the effect is bounded, $|\tau| \le 1$. Therefore, the minimax lower bound is the minimum of the error achieved using the samples and the error obtained by a constant estimator. See Figure 3 for an illustration.

Figure 3. Example of the minimax lower bound with q = 0.5 and σ² = 4. The plot shows, as a function of the sample value, the curve σ²/(16(1-q) · sample value), the constant one, and their minimum, which is the minimax lower bound.
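The truncated bound in equation (61) is easy to tabulate; the following short snippet reproduces the curve illustrated in Figure 3 (q = 0.5 and σ² = 4 as in the caption; the grid of sample values is an arbitrary choice).

```python
import numpy as np

q, sigma2 = 0.5, 4.0
sample_value = np.linspace(0.1, 5.0, 50)          # S = sum_r n_r * alpha_r^2
curve = sigma2 / (16 * (1 - q) * sample_value)    # bound from equation (61)
lower_bound = np.minimum(curve, 1.0)              # truncated at the constant-estimator error

for S, lb in zip(sample_value[::10], lower_bound[::10]):
    print(f"S = {S:4.1f}   minimax lower bound = {lb:.3f}")
```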
Proof of Proposition 3.5. We note that $\tau_r = \frac{1}{n_r} \sum_{i : r_i = r} c^{(r)}_{i,1}$ from the potential outcomes model. Then, we compute the expectation of our estimator as:

$$\mathbb{E}[\hat{\tau}_r] = \mathbb{E}\left[ \frac{1}{n_r} \sum_{i : r_i = r} \left( \frac{Y_i\, \mathbb{1}\{T_i = 1\}}{P(T_i = 1)} - \frac{Y_i\, \mathbb{1}\{T_i = 0\}}{P(T_i = 0)} \right) \right] = \frac{1}{n_r} \sum_{i : r_i = r} \mathbb{E}\left[ \frac{Y_i\, \mathbb{1}\{T_i = 1\}}{P(T_i = 1)} - \frac{Y_i\, \mathbb{1}\{T_i = 0\}}{P(T_i = 0)} \right]. \quad (62)$$

For every i, note that, by linearity of expectation and the potential outcomes model for subjects i at rank $r_i = r$,

$$\mathbb{E}\left[ \frac{Y_i\, \mathbb{1}\{T_i = 1\}}{P(T_i = 1)} - \frac{Y_i\, \mathbb{1}\{T_i = 0\}}{P(T_i = 0)} \right] = c_{i,0}\, \mathbb{E}\left[ \frac{\mathbb{1}\{T_i = 1\}}{P(T_i = 1)} - \frac{\mathbb{1}\{T_i = 0\}}{P(T_i = 0)} \right] + c^{(r)}_{i,1}\, \mathbb{E}\left[ T_i \left( \frac{\mathbb{1}\{T_i = 1\}}{P(T_i = 1)} - \frac{\mathbb{1}\{T_i = 0\}}{P(T_i = 0)} \right) \right] + \mathbb{E}[\varepsilon_i]\, \mathbb{E}\left[ \frac{\mathbb{1}\{T_i = 1\}}{P(T_i = 1)} - \frac{\mathbb{1}\{T_i = 0\}}{P(T_i = 0)} \right]$$
$$= c_{i,0}\, \mathbb{E}\left[ \frac{\mathbb{1}\{T_i = 1\}}{P(T_i = 1)} - \frac{\mathbb{1}\{T_i = 0\}}{P(T_i = 0)} \right] + c^{(r)}_{i,1}\, \mathbb{E}\left[ T_i \left( \frac{\mathbb{1}\{T_i = 1\}}{P(T_i = 1)} - \frac{\mathbb{1}\{T_i = 0\}}{P(T_i = 0)} \right) \right]. \quad (63)$$

Assuming that the actual treatment $T_i$ is administered through a Bernoulli randomized design, $T_i \sim \text{Ber}(q_i)$ with $q_i \in [q, 1-q]$, the first term of equation (63) reduces to 0 and the second one to $c^{(r)}_{i,1}$. To see this briefly, note that the first term is equal to

$$c_{i,0}\, \mathbb{E}\left[ \frac{\mathbb{1}\{T_i = 1\}}{P(T_i = 1)} - \frac{\mathbb{1}\{T_i = 0\}}{P(T_i = 0)} \right] = c_{i,0} \left( \frac{P(T_i = 1)}{P(T_i = 1)} - \frac{P(T_i = 0)}{P(T_i = 0)} \right) = 0, \quad (64)$$

using the law of total expectation. Similarly, the second term is equal to

$$c^{(r)}_{i,1}\, \mathbb{E}\left[ T_i \left( \frac{\mathbb{1}\{T_i = 1\}}{P(T_i = 1)} - \frac{\mathbb{1}\{T_i = 0\}}{P(T_i = 0)} \right) \right] = c^{(r)}_{i,1} \left( \frac{\mathbb{E}[T_i\, \mathbb{1}\{T_i = 1\}]}{P(T_i = 1)} - \frac{\mathbb{E}[T_i\, \mathbb{1}\{T_i = 0\}]}{P(T_i = 0)} \right) = c^{(r)}_{i,1}\, \frac{\mathbb{E}[\mathbb{1}\{T_i = 1\}]}{P(T_i = 1)} = c^{(r)}_{i,1}, \quad (65)$$

as $T_i = \mathbb{1}\{T_i = 1\}$ by definition, and we know that $\mathbb{1}\{T_i = 1\} \cdot \mathbb{1}\{T_i = 1\} = \mathbb{1}\{T_i = 1\}$ and $\mathbb{1}\{T_i = 1\} \cdot \mathbb{1}\{T_i = 0\} = 0$. Coming back to the sum over all subjects i at rank r, we get that

$$\mathbb{E}[\hat{\tau}_r] = \frac{1}{n_r} \sum_{i : r_i = r} c^{(r)}_{i,1} = \tau_r. \quad (66)$$

Proof of Proposition 3.6. Denote by

$$\hat{\tau}_{r,d} = \frac{1}{n_r} \sum_{i : r_i = r} \frac{Y_i\, \mathbb{1}\{T_i = d\}}{P(T_i = d)}, \quad (67)$$

for $d \in \{0, 1\}$. Then, we have $\hat{\tau}_r = \hat{\tau}_{r,1} - \hat{\tau}_{r,0}$.

Step 1: We begin by computing the variance of $\hat{\tau}_{r,d}$ for $d \in \{0, 1\}$:

$$\text{Var}(\hat{\tau}_{r,d}) = \frac{1}{n_r^2} \sum_{i, j : r_i = r_j = r} \frac{\text{Cov}\left( Y_i\, \mathbb{1}\{T_i = d\},\; Y_j\, \mathbb{1}\{T_j = d\} \right)}{P(T_i = d)\, P(T_j = d)}. \quad (68)$$

Let us first take the case d = 1.

Case 1: For i = j,

$$\text{Cov}\left( Y_i\, \mathbb{1}\{T_i = 1\},\; Y_j\, \mathbb{1}\{T_j = 1\} \right) = \text{Var}\left( Y_i\, \mathbb{1}\{T_i = 1\} \right), \quad (69)$$

and

$$\text{Var}\left( Y_i\, \mathbb{1}\{T_i = 1\} \right) = \text{Var}(\varepsilon_i)\, \text{Var}(\mathbb{1}\{T_i = 1\}) + \text{Var}(\varepsilon_i)\, \mathbb{E}[\mathbb{1}\{T_i = 1\}]^2 + \mathbb{E}[Y_i \mid T_i = 1]^2\, \text{Var}(\mathbb{1}\{T_i = 1\})$$
$$= \sigma^2\, q_i (1 - q_i) + \sigma^2\, q_i^2 + \left( c_{i,0} + c^{(r)}_{i,1} + \mathbb{E}[\varepsilon_i] \right)^2 q_i (1 - q_i) \le \sigma^2\, q_i + Y^2_{max}\, q_i (1 - q_i), \quad (70)$$

where the last inequality holds under the bounded effect assumption from equation (3.1).

Case 2: For $i \ne j$, under the Bernoulli design, treatment assignments are independent, and so no treatment affects both units i and j. Hence, $\text{Cov}\left( Y_i\, \mathbb{1}\{T_i = 1\},\; Y_j\, \mathbb{1}\{T_j = 1\} \right) = 0$. Putting the two cases together,

$$\text{Var}(\hat{\tau}_{r,1}) \le \frac{1}{n_r^2} \sum_{i : r_i = r} \frac{\sigma^2\, q_i + Y^2_{max}\, q_i (1 - q_i)}{q_i^2} = \frac{1}{n_r^2} \sum_{i : r_i = r} \frac{\sigma^2 + Y^2_{max} (1 - q_i)}{q_i} \le \frac{1}{n_r}\, \frac{\sigma^2 + Y^2_{max} (1 - q)}{q}, \quad (71)$$

given that the summand is a decreasing function of $q_i$ and $q_i \in [q, 1-q]$. In a similar way, we obtain a bound for d = 0:

$$\text{Var}(\hat{\tau}_{r,0}) \le \frac{1}{n_r}\, \frac{\sigma^2 + Y^2_{max}\, q}{1 - q}. \quad (72)$$

Step 2: Next, we bound the covariance between $\hat{\tau}_{r,0}$ and $\hat{\tau}_{r,1}$:

$$\text{Cov}(\hat{\tau}_{r,1}, \hat{\tau}_{r,0}) = \mathbb{E}[\hat{\tau}_{r,1}\, \hat{\tau}_{r,0}] - \mathbb{E}[\hat{\tau}_{r,1}]\, \mathbb{E}[\hat{\tau}_{r,0}]. \quad (73)$$

A simple calculation shows that

$$\mathbb{E}[\hat{\tau}_{r,d}] = \frac{1}{n_r} \sum_{i : r_i = r} Y_i(T_i = d), \quad (74)$$

$$\mathbb{E}[\hat{\tau}_{r,1}\, \hat{\tau}_{r,0}] = \frac{1}{n_r^2}\, \mathbb{E}\left[ \sum_{i : r_i = r} \frac{Y_i\, \mathbb{1}(T_i = 1)}{P(T_i = 1)} \sum_{j : r_j = r} \frac{Y_j\, \mathbb{1}(T_j = 0)}{P(T_j = 0)} \right], \quad (75)$$

where $Y_i(T_i = d)$ denotes the potential outcome of subject i under treatment value d. Using equations (74) and (75) in the covariance expression and simplifying, we get

$$\text{Cov}(\hat{\tau}_{r,1}, \hat{\tau}_{r,0}) = \frac{1}{n_r^2} \left( \sum_{i \ne j} \frac{P(T_i = 1 \,\&\, T_j = 0)}{P(T_i = 1)\, P(T_j = 0)}\, Y_i(T_i = 1)\, Y_j(T_j = 0) - \sum_{i : r_i = r} Y_i(T_i = 1) \sum_{j : r_j = r} Y_j(T_j = 0) \right)$$
$$= \frac{1}{n_r^2} \left( \sum_{i \ne j} Y_i(T_i = 1)\, Y_j(T_j = 0) - \sum_{i, j} Y_i(T_i = 1)\, Y_j(T_j = 0) \right) = -\frac{1}{n_r^2} \sum_{i : r_i = r} Y_i(T_i = 1)\, Y_i(T_i = 0). \quad (76)$$

Given the bounded effect assumption from equation (3.1), we can bound

$$\left| \text{Cov}(\hat{\tau}_{r,1}, \hat{\tau}_{r,0}) \right| \le \frac{Y^2_{max}}{n_r}. \quad (77)$$

Step 3: Putting equations (71), (72), and (77) together, we get

$$\text{Var}(\hat{\tau}_r) = \text{Var}(\hat{\tau}_{r,1}) + \text{Var}(\hat{\tau}_{r,0}) - 2\, \text{Cov}(\hat{\tau}_{r,1}, \hat{\tau}_{r,0}) \le \frac{1}{n_r} \left( \frac{\sigma^2}{q(1-q)} + Y^2_{max}\, \frac{(1-q)^2 + q^2}{q(1-q)} + 2\, Y^2_{max} \right). \quad (78)$$
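A quick Monte Carlo check of Propositions 3.5 and 3.6 at a single rank: the inverse-propensity estimator below should be unbiased for $\tau_r$, and its variance should shrink at the $1/n_r$ rate. The constant propensity q (rather than subject-specific $q_i \in [q, 1-q]$), the Gaussian noise, and all parameter values are illustrative assumptions consistent with the outcome model used above.

```python
import numpy as np

rng = np.random.default_rng(2)

def tau_r_hat(Y, T, q):
    """Inverse-propensity estimator at a fixed rank:
    (1/n_r) * sum_i [ Y_i 1{T_i=1}/q - Y_i 1{T_i=0}/(1-q) ]."""
    return np.mean(Y * (T == 1) / q - Y * (T == 0) / (1 - q))

alpha_r, tau, q, sigma, c0 = 0.6, 0.8, 0.4, 1.0, 0.2
tau_r = alpha_r * tau                      # true rank-r effect

for n_r in (50, 200, 800):
    estimates = []
    for _ in range(5000):
        T = (rng.random(n_r) < q).astype(float)
        Y = c0 + tau_r * T + rng.normal(0.0, sigma, n_r)
        estimates.append(tau_r_hat(Y, T, q))
    estimates = np.array(estimates)
    print(f"n_r={n_r:4d}  mean={estimates.mean():+.4f} (true {tau_r:.4f})  "
          f"n_r*Var={n_r * estimates.var():.3f}")
```

The mean should match $\tau_r$ at every sample size, and $n_r \cdot \text{Var}$ should stay roughly constant, illustrating the $O(1/n_r)$ variance bound of equation (78).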
A.4. A Discussion on Estimating the Discount Factors

When the discount factors are not known, an admin is put in the following position: should they discard the (potentially useful) data at lower ranks and use only the data at rank 1 for estimating the treatment effect, or should they try to use the lower-rank data by devoting part of it to estimating the discount factors and part of it to estimating the treatment effect? At first glance, the latter might sound appealing, especially if there is significantly more data at lower ranks than at rank 1. However, this latter option does not actually decrease the estimation error, even in idealized cases. We showcase this claim in an ideal situation below, where the lower-rank data exhibits no error in the $\tau_r$ estimator (i.e., $\mathbb{E}[(\tau_r - \hat{\tau}_r)^2] = 0$ for r > 1). This result uses a naïve estimator for the discount parameters $(\alpha_r)_r$, leaving the problem of using more complex estimators for future work. To formalize our set-up, we define:

Scenario (1): an admin uses just the data where their treatment was shown at the first rank, and thus estimates $\tau = \tau_1$. We denote an estimator for this scenario by $\hat{\tau}^{(1)}$, noting that it uses $n_1$ samples.

Scenario (2): an admin splits the data at all positions, using part of it for estimating the position effect parameters $(\alpha_r)_r$, and then using these estimates together with the rest of the data for estimating the treatment effect $\tau$. We denote an estimator for this scenario by $\hat{\tau}^{(2)}$.

Lemma A.6. Assume a naïve estimator for the discount factors: $\hat{\alpha}_r = \hat{\tau}_r / \hat{\tau}_1$ for all r. For all valid ways of splitting the data between estimating $(\alpha_r)_r$ and estimating $\tau$,

$$\mathbb{E}\left[ (\hat{\tau}^{(1)} - \tau)^2 \right] \le \mathbb{E}\left[ (\hat{\tau}^{(2)} - \tau)^2 \right].$$

Proof. First, splitting the data in a valid way means using some number $n'$ of samples from both the rank-1 data and the rank-r data to estimate $\alpha_r$ with the naïve estimator, and using the remaining (non-overlapping) samples to estimate $\tau$. One must have $n' \le n_1$ and $n' \le n_r$. Since we have k - 1 such estimation problems, one for every rank $r \in [k]$ except the first, one must use k - 1 disjoint sample sets from the rank-1 data, totalling at most the number of samples at rank 1 (since we cannot use more data than available).

In scenario (1), the admin can only use the data at rank 1 in order to estimate $\tau$ (since it is the treatment effect had the ads been shown only at rank 1 to all subjects). Without any knowledge of $(\alpha_r)_r$, the admin cannot use the data at ranks 2, 3, and so on to identify the treatment effect through $\tau_r$. Thus, the admin estimates $\tau = \tau_1$, which, as we showed in Theorem 3.4 and Proposition 3.6, has error $\mathbb{E}[(\tau - \hat{\tau})^2] = \Theta(n_1^{-1})$ (recall that $n_1$ is the sample size at position 1).

In scenario (2), the admin uses the samples at all positions, recovering the effect $\tau$ from the data at rank r through the $\hat{\alpha}_r$ estimate: $\hat{\tau}^{(r)} = \hat{\tau}_r / \hat{\alpha}_r$. Note that the data used for estimating $\hat{\alpha}_r$ must be separate from the data used for estimating $\hat{\tau}^{(r)}$. Say that an admin uses $n'_r$ of the $n_r$ samples at each rank for estimating $\hat{\alpha}_r$. The remaining $n_r - n'_r$ samples will be used to estimate the r-estimand. In an ideal case where the estimator at rank r is a perfect estimator, we have $\hat{\tau}_r = \tau_r$. In this case, we only have to worry about the contribution of the $\hat{\alpha}_r$ estimates to the estimation error. For example, one can imagine that we have an abundance of data at lower ranks, $n_r \gg 1$ for r > 1, so that the estimation error of $\tau_r$ vanishes with the increased number of samples. Note that $\hat{\tau}^{(2)}$ depends on the values $(n'_r)_r$. However, the result holds no matter what these values are, i.e., no matter how many samples the admin uses from each rank.
This is at first unintuitive. The final estimator in scenario (2) is computed from samples at all ranks, potentially many more than $n_1$, and yet that does not help in reducing the estimation error. At a closer look, we will see that the error contributed by the estimation of the discount factors $(\hat{\alpha}_r)_r$ is the dominant term in the error of the treatment effect. Take as a naïve estimator for $\alpha_r$ the ratio of the treatment effects at rank r and rank 1:

$$\hat{\alpha}_r = \frac{\hat{\tau}_r}{\hat{\tau}_1}, \quad r > 1,$$

using $n'_r$ of the $n_r$ samples from rank r and $n'_r$ of the $n_1$ samples at rank 1; i.e., we use $2 n'_r$ samples to estimate $\alpha_r$, split equally between rank-1 and rank-r data. The remaining $n_r - n'_r$ samples from the data at rank r are kept for contributing to the estimation of $\tau$. To better keep track of the number of samples used, we denote an estimator $\hat{\theta}$ based on n samples by $\hat{\theta}(n)$. Then,

$$\hat{\alpha}_r = \frac{\hat{\tau}_r(n'_r)}{\hat{\tau}_1(n'_r)}, \quad r > 1, \qquad\text{so}\qquad \frac{1}{\hat{\alpha}_r} = \frac{\hat{\tau}_1(n'_r)}{\hat{\tau}_r} = \frac{\hat{\tau}_1(n'_r)}{\tau_r}.$$

Using the remaining samples, the r-estimand can be estimated through

$$\frac{\hat{\tau}_r(n_r - n'_r)}{\hat{\alpha}_r} = \frac{\tau_r}{\hat{\alpha}_r} = \hat{\tau}_1(n'_r).$$

Thus, the r-estimand effectively uses only the $n'_r$ samples that entered the estimator of the discount factor. The $n_r - n'_r$ samples used in the $\tau_r$ estimation do not factor in, because we assumed it is a perfect estimator.

Thus, the available data at all ranks can be used to estimate $\tau$ through the identified effects at each rank, using $n_1 - \sum_r n'_r$ samples for rank 1, $n'_2$ samples for rank 2, and so on until $n'_k$ samples for rank k. Just as in the main text, $\alpha_1 = 1$, so we can use all the remaining samples at rank 1 directly, i.e., $n'_1 = 0$. Theorem 3.4 computes a lower bound based on the combination of the r-estimands. The lower bound obtained is a summation over the errors each of them contributes, namely over the sequence $(n_r\, \alpha_r^2)_r$. This connects to the sample value definition from Section 2.1. In our case, the r-estimands contribute a sequence of effective sample sizes $\left(n_1 - \sum_r n'_r,\, n'_2,\, \ldots,\, n'_k\right)$. Applying Theorem 3.4 with the same proof but this updated sequence, we obtain an error lower bound of

$$\Omega\left( \left( \left(n_1 - \sum_r n'_r\right) + n'_2 + \cdots + n'_k \right)^{-1} \right) = \Omega\left( n_1^{-1} \right).$$

(Note that $n'_1 = 0$ since $\alpha_1 = 1$.) The constants are, in fact, the same as in the original Theorem 3.4. This updated bound does not depend on the number of samples used to estimate $\alpha_r$, namely, on $(n'_r)_r$. Thus, the lower bound remains the same no matter how many or how few samples an admin has used to estimate the discount factors.

To see this exemplified, we use the Horvitz-Thompson estimator for $\hat{\tau}_1$, and define the estimator as a weighted sum of the r-estimands: $\hat{\tau} = \sum_r w_r\, \hat{\tau}^{(r)}$. As the $\hat{\tau}^{(r)}$ are estimated through $\hat{\tau}_1$ applied on either $n_1 - \sum_r n'_r$ or $n'_r$ samples for each rank, the estimator is still unbiased. We can then apply the optimal weighting argument from the main text, which dictates that the error achieved by $\hat{\tau}$ through optimal weighting is $O\left( \left( \sum_r 1/\text{Var}(\hat{\tau}^{(r)}) \right)^{-1} \right)$. We know from Proposition 3.6 that $\text{Var}(\hat{\tau}^{(r)}) = C\, \frac{1}{n_r}$ when there are $n_r$ data samples, for a constant C. We thus obtain an upper bound on the treatment effect estimation error of

$$O\left( \left( \sum_r 1/\text{Var}(\hat{\tau}^{(r)}) \right)^{-1} \right) = O\left( \left( \left(n_1 - \sum_r n'_r\right) + n'_2 + \cdots + n'_k \right)^{-1} \right) = O\left( n_1^{-1} \right), \quad (79)$$

matching the lower bound (up to constants). One might have imagined that using more samples from ranks 2, 3, and so on to estimate the treatment effect would have helped reduce the estimation error. However, even in this idealized scenario where one has perfect estimation of $\tau_r$ at each rank r, the contribution of the discount-factor error to the treatment effect error trumps using more low-ranked samples.
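The following sketch illustrates Lemma A.6 numerically in the idealized case discussed above ($\hat{\tau}_r = \tau_r$ exactly for r > 1, simulated by plugging in the true $\tau_r$): the scenario-(2) estimator built from the naïve ratio $\hat{\alpha}_r = \hat{\tau}_r / \hat{\tau}_1$ is no more accurate than simply using the rank-1 data, regardless of how much lower-rank data is available. The difference-in-means estimator, the specific weighting, and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def diff_in_means(n, effect, q=0.5, sigma=1.0):
    """Difference-in-means estimate of `effect` from n Bernoulli(q)-randomized samples."""
    T = rng.random(n) < q
    Y = effect * T + rng.normal(0.0, sigma, n)
    if T.all() or (~T).all():      # guard against degenerate splits in tiny samples
        return 0.0
    return Y[T].mean() - Y[~T].mean()

tau, alpha = 0.8, [1.0, 0.6, 0.3]
n1, n_prime = 100, 30              # rank-1 sample size; rank-1 samples diverted per discount factor
reps = 4000

err1, err2 = [], []
for _ in range(reps):
    # Scenario (1): use all n1 rank-1 samples directly.
    err1.append((diff_in_means(n1, tau) - tau) ** 2)

    # Scenario (2): idealized lower-rank data (tau_r known exactly), so each r-estimand
    # tau_r / alpha_hat_r collapses to a tau_1 estimate on n_prime rank-1 samples.
    estimates = [diff_in_means(n_prime, tau) for _ in alpha[1:]]
    weights = [n_prime] * len(estimates)
    n1_left = n1 - n_prime * len(estimates)     # rank-1 samples left after the splits
    estimates.append(diff_in_means(n1_left, tau))
    weights.append(n1_left)
    tau_hat2 = np.average(estimates, weights=weights)   # weights proportional to sample sizes
    err2.append((tau_hat2 - tau) ** 2)

print(f"scenario (1) MSE: {np.mean(err1):.4f}")
print(f"scenario (2) MSE: {np.mean(err2):.4f}   (no smaller than scenario (1))")
```

The two mean squared errors come out essentially equal, matching the $\Theta(n_1^{-1})$ rate derived above: diverting rank-1 samples into discount-factor estimation buys nothing, and any imperfection in $\hat{\tau}_r$ would only make scenario (2) worse.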
In realistic scenarios where $\hat{\tau}_r$ is not a perfect estimator for r > 1, one would expect its error to also contribute to the estimation error in scenario (2). In this case, an admin can only increase their error by attempting to estimate $(\alpha_r)_r$ from an ad campaign, and is better off (up to constants) just using the $n_1$ samples they have from rank 1, as in scenario (1). By how much an admin increases their estimation error is a complex question, depending on the modeling assumptions and the estimators chosen for the discount factors. We leave this for future work. Related work has proposed various methods for estimating the discount factors (Chuklin et al., 2016; Joachims et al., 2017), but to our knowledge, none answers this particular question of when it is actually worth doing so to reduce the treatment effect estimation error.

A.5. Proofs for Section 4 Results

Proof of Lemma 4.1. Our goal is to show that if an allocation x satisfies the conditions of Lemma 4.1, i.e., $x_a(t) \in \{0, 1\}$ for all $a \in [k]$, $t \in [n]$; $\sum_{t=1}^{n} x_a(t) = B(a)$ for all $a \in [k]$; and $\left| \sum_{a=1}^{k} x_a(t) - \sum_{a=1}^{k} x_a(t') \right| \le 1$ for all $t, t' \in [n]$, then it is an equilibrium. We show this through two properties:

Property 1: an admin a cannot increase their utility by permuting the bids in x.

Property 2: an admin a cannot increase their utility by aggregating their budget (merging bids from multiple subjects in x).

In other words, we show that an admin cannot increase their utility by deviating from x, by analyzing the two possible types of deviations: permutations or aggregations of bids.

Notation: We introduce simplifying notation in order to prove the two aforementioned properties. As the utility function of an admin is the expected sample value, by linearity of expectation we can write

$$f_a(x_a, x_{-a}) = \sum_{r=1}^{k} \mathbb{E}[n_r]\, \alpha_r^2. \quad (80)$$

Given our allocation rule, we can write the utility function in combinatorial form, by finding the closed form of $\mathbb{E}[n_r]$, $r \in [k]$:

$$f_a(x_a, x_{-a}) = \sum_{r=1}^{k} \alpha_r^2 \sum_{t=1}^{n} \frac{1}{\binom{k-1}{k-r}} \sum_{s \in S^{(a)}_{k-r}} \frac{1}{(k-r+1)!} \sum_{\sigma \in P(s \cup \{a\})} P_{A_a}(x_a, t) \prod_{j <_\sigma a} \left(1 - P_{A_j}(x_j, t)\right), \quad (81)$$

where $S^{(a)}_{k-r}$ is the set of all choices of k - r elements of $[k] \setminus \{a\}$ and $j <_\sigma a$ means that element j appears before element a in the permutation $\sigma$. The summand is the probability of admin a winning rank r for slot t, obtained by summing over all possible permutations at each rank and over the possible remaining admins (who have not yet won a rank): the probability of admin a winning a rank is the activation probability, times the probability that no one else before a has won, times the probability that a has not won a previous rank. To make things a bit more readable, take the summand for a particular subject t outside of the factor $\alpha_r^2$ and denote it by

$$g_{r,a,t}(x_a, x_{-a}) := P_{A_a}(x_a, t)\, \frac{1}{\binom{k-1}{k-r}} \sum_{s \in S^{(a)}_{k-r}} \frac{1}{(k-r+1)!} \sum_{\sigma \in P(s \cup \{a\})} \prod_{j <_\sigma a} \left(1 - P_{A_j}(x_j, t)\right). \quad (82)$$

Note that the factor multiplying $P_{A_a}(x_a, t)$ in equation (82) depends on the rank r but does not contain $x_a$ or $P_{A_a}(x_a, t)$. Since we are looking at $f_a(x_a, x_{-a})$ and $g_{r,a,t}(x_a, x_{-a})$ as functions of $x_a$, we treat this factor as a constant in $x_a$. For an even more simplifying notation, let

$$C_{r,a,t} := \frac{1}{\binom{k-1}{k-r}} \sum_{s \in S^{(a)}_{k-r}} \frac{1}{(k-r+1)!} \sum_{\sigma \in P(s \cup \{a\})} \prod_{j <_\sigma a} \left(1 - P_{A_j}(x_j, t)\right), \quad (83)$$

thus giving

$$g_{r,a,t}(x_a, x_{-a}) = P_{A_a}(x_a, t)\, C_{r,a,t}. \quad (84)$$

Coming back to the utility function, we get

$$f_a(x_a, x_{-a}) = \sum_{r=1}^{k} \alpha_r^2 \sum_{t=1}^{n} g_{r,a,t}(x_a, x_{-a}) \prod_{r' < r} \left(1 - g_{r',a,t}(x_a, x_{-a})\right). \quad (85)$$

Proposition A.7, applied with $l = k$, $\omega_r = \alpha_r^2$ for $r \in [k]$, and $h_r(x) = g_{r,a,t}(x_a, x_{-a}) \prod_{r' < r} \left(1 - g_{r',a,t}(x_a, x_{-a})\right)$, then gives that $\tilde{x}_a$ has better utility than $x_a$ for any $\alpha_r \in [0, 1]$.

Now consider an allocation x in which an admin bids more than one unit of budget on a set of subjects T, i.e., $x_{at} > 1$ for $t \in T$, for an admin a. Without loss of generality, assume that a = 2 (the second admin).
We have two cases:

Case 1: there is a subject $t \in T$ such that $x_{1t} = 0$.

Case 2: there is no subject $t \in T$ such that $x_{1t} = 0$.

Let us first tackle Case 1. In this case, we know that there is a subject $t \in T$ such that $x_{1t} = 0$. Since $x_{2t} > 1$ for $t \in T$ and $B_2 \le n$, there exists a subject $t' \notin T$ such that $x_{2t'} = 0$. Denote $x' := x_{2t}$ and $x'' := x_{1t'}$, so that on the two slots $(t, t')$ the bids are $x_1 = (0, x'')$ and $x_2 = (x', 0)$. We will show that this configuration is not a Nash equilibrium for $x', x'' \ge 1$. If one of $x'$ or $x''$ is equal to 1, then we already know this is not a Nash equilibrium because of Property 2 in the proof of Lemma 4.1, so assume $x', x'' > 1$. Assume without loss of generality that $x'' \ge x'$. We will show that admin 1 improves its utility by moving one unit of budget from slot $t'$ to slot $t$, i.e., by deviating to $\tilde{x}_1 = (1, x''-1)$:

$$f_1(\tilde{x}_1, x_{-1}) > f_1(x_1, x_{-1}). \quad (110)$$

We want to show this for any sequence $(\alpha_r)_r$. However, due to Proposition A.7, we argue that it is sufficient to show it for $\alpha_r = 1$ for all $r \in [k]$. To see why Proposition A.7 applies, we take as the functions $h_r$ the terms corresponding to $\alpha_r^2$, for all $r \in [k]$. We also compute the terms corresponding to $\alpha_1^2$ in the utility function, denoting $\bar{p} := 1 - p$ for ease of notation. Using the combinatorial formula for the utility, the coefficient of $\alpha_1^2$ in $f_1(x_1, x_{-1})$ is $1 - \bar{p}^{x''}$, and the coefficient of $\alpha_1^2$ in $f_1(\tilde{x}_1, x_{-1})$ is $1 - \bar{p}^{x''-1} + \frac{1}{2}(1-\bar{p})\left(1 + \bar{p}^{x'}\right)$. Since $x'' \ge x'$ and all terms are non-negative, it is sufficient to show

$$1 - \bar{p}^{x''} < 1 - \bar{p}^{x''-1} + \frac{1}{2}(1-\bar{p})\left(1 + \bar{p}^{x''}\right) \iff \bar{p}^{x''-1}(1-\bar{p}) < \frac{1}{2}(1-\bar{p})\left(1 + \bar{p}^{x''}\right) \iff 2\,\bar{p}^{x''-1} < 1 + \bar{p}^{x''},$$

which is true for $x'' > 1$. We have used that $p \in (0, 1)$. Thus, the conditions for Proposition A.7 are satisfied. We are left to show that inequality (110) holds for $\alpha_r = 1$, $r \in [k]$. Writing out the utility functions in their combinatorial form, we get

$$f_1(x_1, x_{-1}) = \left(1 - \bar{p}^{x''}\right)\left(1 + \bar{p}^{x'}\right), \quad (112)$$
$$f_1(\tilde{x}_1, x_{-1}) = \left(1 - \bar{p}^{x''-1}\right)\left(1 + \bar{p}^{x''-1}\right) + (1-\bar{p})\left(1 + \frac{1}{2}\,\bar{p}\,\left(1 + \bar{p}^{x'}\right)\right). \quad (113)$$

Again, since $x'' \ge x'$ and all terms are non-negative, it is sufficient to show

$$\left(1 - \bar{p}^{x''}\right)\left(1 + \bar{p}^{x''}\right) < \left(1 - \bar{p}^{x''-1}\right)\left(1 + \bar{p}^{x''-1}\right) + (1-\bar{p})\left(1 + \frac{1}{2}\,\bar{p}\,\left(1 + \bar{p}^{x''}\right)\right). \quad (114)$$

Equation (114) simplifies to

$$\bar{p}^{2x''-2}(1 + \bar{p}) < 1 + \frac{1}{2}\,\bar{p}\,\left(1 + \bar{p}^{x''}\right). \quad (115)$$

This is true for $x'' > 2$, since $\bar{p}^{2x''-1} < 1$, $\bar{p}^{2x''-2} < \bar{p}$, and $\bar{p}^{2x''-2} < \bar{p}^{x''+1}$. For $x'' = 2$, inequality (115) reads $\bar{p}^2(1+\bar{p}) < 1 + \frac{1}{2}\bar{p}(1+\bar{p}^2)$, which also holds for $\bar{p} \in (0, 1)$.

We next tackle Case 2, for which we know that there is no subject $t \in T$ such that $x_{1t} = 0$. Now, if there exists a subject $t' \notin T$ such that $x_{1t'} > 1$, there must also exist a subject $t'' \notin T$ such that $x_{1t''} = 0$, since $B_1 \le n$. However, we know that $x_{2t'}, x_{2t''} \in \{0, 1\}$ as $t', t'' \notin T$, by definition of the set T. However, none of the configurations

$$\begin{pmatrix} x_{1t'} & 0 \\ 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} x_{1t'} & 0 \\ 1 & 1 \end{pmatrix}, \quad \begin{pmatrix} x_{1t'} & 0 \\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} x_{1t'} & 0 \\ 0 & 1 \end{pmatrix}$$

is an equilibrium for $x_{1t'} > 1$, from Property 2 of the proof of Lemma 4.1 (when the other bids are 0 or 1, an admin always wants to split their bid equally). That means that for all subjects $t \notin T$, $x_{1t} \in \{0, 1\}$. We will also show that for all subjects $t \in T$, $x_{1t} \in \{0, 1\}$. By contradiction, if there is a subject $t \in T$ with $x_{1t} > 1$, take a subject $t' \notin T$ such that $x_{2t'} = 0$ (we know it exists since admin 2's bids in T are greater than 1 and $B_2 \le n$). Then, the configurations

$$\begin{pmatrix} x_{1t} & x_{1t'} \\ x_{2t} & 0 \end{pmatrix}$$

are not equilibria for $x_{1t}, x_{2t} > 1$ and $x_{1t'} \in \{0, 1\}$; this is easy to show by computing the utility function in its combinatorial form and using $p \in (0, 1)$. Thus, we have shown that for all subjects t, $x_{1t} \in \{0, 1\}$. However, we now have a contradiction, since by Property 2, $x_{2t} \in \{0, 1\}$ as well, which we assumed otherwise for $t \in T$.
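Both Property 2 of Lemma 4.1 and the case analysis above rest on the fact that, under $A_{prob}$, spreading budget over more slots yields a higher expected sample value than concentrating it. The sketch below checks this numerically for two slots and two admins; the uniform-random-order reading of the allocation rule, the tie-breaking coin flip, and the parameter values are illustrative assumptions rather than the paper's exact rule.

```python
import numpy as np

rng = np.random.default_rng(4)

def expected_sample_value(bids_a, bids_b, alpha, p=0.3, reps=100_000):
    """Monte Carlo estimate of admin A's expected sample value over two slots
    under an A_prob-style rule (activation 1-(1-p)**bid, random order among
    activated admins)."""
    total = 0.0
    for _ in range(reps):
        for t in range(2):
            act_a = rng.random() < 1 - (1 - p) ** bids_a[t]
            act_b = rng.random() < 1 - (1 - p) ** bids_b[t]
            if act_a:
                # A's rank is 1 unless B is also activated and wins the coin flip.
                rank = 2 if (act_b and rng.random() < 0.5) else 1
                total += alpha[rank - 1] ** 2
    return total / reps

alpha = [1.0, 0.5]
competitor = [1, 1]            # admin B bids one unit on each slot
print("A bids (2, 0):", expected_sample_value([2, 0], competitor, alpha))
print("A bids (1, 1):", expected_sample_value([1, 1], competitor, alpha))
# Splitting the same two units across both slots should give the larger value.
```

The split allocation comes out ahead, consistent with the equilibrium structure of Lemma 4.1 in which admins spend exhausted budgets in 0/1 bids spread as evenly as possible.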