The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

Online Pandora's Boxes and Bandits

Hossein Esfandiari,¹ Mohammad Taghi Hajiaghayi,² Brendan Lucier,³ Michael Mitzenmacher⁴
¹Google Research, ²University of Maryland, ³Microsoft Research, ⁴Harvard University
esfandiari@google.com, hajiagha@cs.umd.edu, brlucier@microsoft.com, michaelm@eecs.harvard.edu

Abstract

We consider online variations of the Pandora's box problem (Weitzman 1979), a standard model for understanding issues related to the cost of acquiring information for decision-making. Our problem generalizes both the classic Pandora's box problem and the prophet inequality framework. Boxes are presented online, each with a random value and cost drawn jointly from some known distribution. Pandora chooses online whether to open each box given its cost, and then chooses irrevocably whether to keep the revealed prize or pass on it. We aim for approximation algorithms against adversaries that can choose the largest prize over any opened box, and use optimal offline policies to decide which boxes to open (without knowledge of the value inside).¹ We consider variations where Pandora can collect multiple prizes subject to feasibility constraints, such as cardinality, matroid, or knapsack constraints. We also consider variations related to classic multi-armed bandit problems from reinforcement learning. Our results use a reduction-based framework in which we separate the issue of the cost of acquiring information from the online decision process of which prizes to keep. Our work shows that in many scenarios, Pandora can achieve a good approximation to the best possible performance.

¹See Section 2 for formal definitions.

1 Introduction

Information learning costs play a large role in a variety of markets and optimization tasks. For example, in the academic job market, obtaining information about a potential match is a costly investment for both sides of the market. Economizing on information costs is an important component of efficiency in such settings.

A classic model for information learning costs is the Pandora's box problem, attributed to Weitzman (Weitzman 1979), which has the following form. Pandora has n boxes, where the i-th box contains a prize of value $v_i$ that has a known cumulative distribution function $F_i$. It costs $c_i$ to open the i-th box and reveal the actual value $v_i$. Pandora may open as many boxes as she likes, in any order. The payoff is the maximum-valued prize, minus the cost of the opened boxes. That is, if S is the subset of opened boxes, then the payoff Pandora seeks to maximize is
\[
  \max_{i \in S} v_i - \sum_{i \in S} c_i .
\]

The Pandora's box problem incorporates two key decision aspects: the ordering in which to open boxes, and when to stop. It has been proposed for applications such as buying or selling a house and searching for a job.

The original Pandora's box problem has a simple and elegant solution. The reservation price $v_i^*$ associated with an unopened box i is the value at which Pandora would be indifferent between taking a prize with that value and opening box i. That is,
\[
  v_i^* = \inf\{y : y \ge -c_i + E[\max(v_i, y)]\} = \inf\{y : c_i \ge E[\max(v_i - y, 0)]\} .
\]
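For a concrete feel for this formula, consider a small illustrative instance (the numbers below are chosen only for this example and are not from the paper): box i pays $v_i = 2$ with probability $1/2$ and $v_i = 0$ otherwise, and costs $c_i = 1/2$ to open.

```latex
% Illustrative reservation-price computation (assumed two-point prize distribution).
% For 0 <= y <= 2:
\[
  E[\max(v_i - y, 0)] = \tfrac{1}{2}(2 - y),
\]
% and setting this equal to the opening cost c_i = 1/2 gives
\[
  \tfrac{1}{2}(2 - y) = \tfrac{1}{2} \;\Longrightarrow\; v_i^* = 1 .
\]
% Pandora is indifferent between holding a prize worth 1 and paying 1/2 to open box i.
```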
Weitzman's result says that if Pandora is allowed to choose the ordering, she should keep opening boxes in order of decreasing reservation price, and should stop searching as soon as the largest prize value obtained exceeds the reservation price of every unopened box. An alternative to Weitzman's proof (Weitzman 1979) of this result was recently provided by Kleinberg, Waggoner, and Weyl (Kleinberg and Weyl 2016), who also present additional applications, including to auctions. Very recently, Singla (Singla 2018) generalized the approach of Kleinberg et al. (Kleinberg and Weyl 2016) to further applications in offline combinatorial problems such as matching, set cover, facility location, and prize-collecting Steiner tree.

In other similar problems, the ordering is chosen adversarially and adaptively. For example, in the prophet inequality setting, first introduced in 1977 by Garling, Krengel, and Sucheston (Krengel and Sucheston 1978; 1977), the boxes have no cost and the prize distributions are known, but the decision-maker has to decide after each successive box whether to stop the process and keep the corresponding prize; if not, the prize cannot be claimed later. It is known that there exists a threshold-based algorithm that in expectation obtains a prize value within a factor of two of the expected maximum prize, and the factor of two is tight (Krengel and Sucheston 1978; 1977). There have subsequently been many generalizations of the prophet inequality setting, especially to applications in online auctions (see, e.g., (Mohammad Taghi Hajiaghayi and Sandholm 2007; Alaei, Hajiaghayi, and Liaghat 2012; Hajiaghayi, Kleinberg, and Parkes 2004; Alaei 2014; Yan 2011; Kleinberg and Weinberg 2012; Babaioff, Immorlica, and Kleinberg 2007; Lachish 2014; Feldman, Svensson, and Zenklusen 2015; Goel and Mehta 2008; Korula and Pál 2009; Mahdian and Yan 2011; Karande, Mehta, and Tripathi 2011; Kesselheim et al. 2013; Guruganesh and Singla 2017; Esfandiari et al. 2017; Duetting, Feldman, and Lucier 2017)).

Another related and well-studied theme includes multi-armed bandit problems and, more generally, reinforcement learning (see, e.g., (Gittins and Jones 1974)). In this setting, each box corresponds to a strategy, or arm, that has a payoff in each round. An online algorithm chooses one arm from a set of arms in each round over n rounds. Viewed in the language of selection problems, this translates to a feasibility constraint on the set of boxes that can be opened. Multi-armed bandit problems have applications including online auctions, adaptive routing, and the theory of learning in games.

In this paper, we consider a class of problems that combine the cost considerations of Pandora's box with the online nature of prophet inequality problems. Again boxes are presented online, here with random values and costs drawn jointly from some distribution. Pandora chooses online whether to open each box, and then whether to keep it or pass on it. We aim for approximation algorithms against adversaries that can choose the largest prize over any opened box, and use optimal offline policies in deciding which boxes to open, without knowledge of the values inside. We consider variations where Pandora can collect multiple prizes subject to sets of constraints. For example, Pandora may be able to keep at most k prizes, the selected prizes may be required to form an independent set in a matroid, or the prizes might have associated weights that form a knapsack constraint.
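To keep the classic baseline concrete before turning to these online variants, the following is a minimal sketch of Weitzman's reservation-price policy for boxes with finite prize distributions; the function names and the discrete-distribution representation are illustrative assumptions rather than code from the original paper.

```python
import random

def reservation_price(support, probs, cost, iters=60):
    """Return inf{y >= 0 : cost >= E[max(v - y, 0)]} for a finite prize distribution."""
    def excess(y):
        return sum(p * max(v - y, 0.0) for v, p in zip(support, probs))
    lo, hi = 0.0, max(support)
    if excess(lo) <= cost:        # opening never pays off, even against an empty hand
        return 0.0
    for _ in range(iters):        # E[(v - y)+] is continuous and non-increasing in y
        mid = (lo + hi) / 2.0
        if excess(mid) <= cost:
            hi = mid
        else:
            lo = mid
    return hi

def weitzman(boxes, rng):
    """boxes: list of (support, probs, cost). Returns the realized payoff of one run."""
    sigmas = [reservation_price(s, p, c) for (s, p, c) in boxes]
    order = sorted(range(len(boxes)), key=lambda i: sigmas[i], reverse=True)
    best, paid = 0.0, 0.0
    for i in order:                  # open boxes in decreasing reservation price ...
        if best >= sigmas[i]:        # ... and stop once the best prize in hand beats
            break                    #     every remaining reservation price
        support, probs, cost = boxes[i]
        paid += cost
        best = max(best, rng.choices(support, weights=probs, k=1)[0])
    return best - paid

# Example: the two-point box from the worked example above, plus a longer-shot box.
print(weitzman([([0.0, 2.0], [0.5, 0.5], 0.5), ([0.0, 10.0], [0.9, 0.1], 0.2)],
               random.Random(0)))
```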
We also introduce variations related to classic multi-armed bandit problems and reinforcement learning, where there are feasibility constraints on the set of boxes that can be opened. Our work shows that in many scenarios, even without the power of ordering choices, Pandora can achieve a good approximation to the best possible performance.

Our main result is a reduction from this general class of problems, which we refer to as online Pandora's box problems, to the problem of finding threshold-based algorithms for the associated prophet inequality problems where all costs are zero. Our reduction is constructive, and results in a polynomial-time policy, given a polynomial-time algorithm for constructing thresholds in the reduced problem. We first describe the reduction in Section 2. Then, in Section 3, we show how to use known results from the prophet inequality literature to directly infer solutions to online Pandora's box problems. Finally, in Section 4, we establish an algorithm for a multi-armed bandit variant of the online Pandora's box problem by proving a novel multi-armed prophet inequality.

2 Pandora's Boxes Under General Constraints

In this section we consider a very general version of an online Pandora's box problem, with the goal of showing that, if there is a suitable corresponding prophet inequality algorithm, we can use it in a way that yields good approximation ratios for the Pandora's box problem.

We define the problem as follows. There is a sequence of boxes that arrive online, in an order chosen by an adversary (i.e., in worst-case order). Each box has a cost $c_i$, a value $v_i$, and a type $t_i$. The tuple $(c_i, v_i, t_i)$ is drawn from a joint distribution $F_i$. The distributions are known in advance. When a box is presented, we observe its type $t_i$. We can then choose whether to open the box. We note that, given the type $t_i$, the values $v_i$ and $c_i$ have conditional distributions depending on $t_i$. There is a set of constraints dictating which combinations of boxes can be opened; these constraints can depend on the indices of the boxes and their types. If we open the box, then $v_i$ and $c_i$ are revealed, and we pay $c_i$ for opening the box. We must then choose (irrevocably) whether to keep and collect $v_i$. There is also a set of constraints dictating which combinations of values can be kept; these constraints can depend on the indices of the boxes and their types. We denote the set of opened boxes by $S \subseteq \{1, \ldots, n\}$ and the set of kept boxes by $R \subseteq S$. The goal is to maximize the expected utility
\[
  E\Big[\sum_{i \in R} v_i - \sum_{i \in S} c_i\Big].
\]

One might want to consider an adversary that obtains $E[\max_{R,S} (\sum_{i \in R} v_i - \sum_{i \in S} c_i)]$, that is, a fully clairvoyant adversary. However, it is not possible to provide any competitive algorithm against such an adversary, even for the simple classical Pandora's box problem.²

We denote the expected utility of an algorithm Alg by $u_{Alg}$. We compare our algorithms against the (potentially exponential-time) optimal offline algorithm Opt that maximizes the expected utility. Specifically, Opt can see all the types $t_i$ of all the boxes, and Opt can choose to open boxes in any order. However, Opt does not learn the resulting cost and value $c_i$ and $v_i$ of a box until it is opened. Opt iteratively and adaptively opens boxes, and at the end chooses a subset of the opened boxes to keep. Of course, Opt must respect the constraints on opened boxes and kept boxes.
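To pin down the timing of information and decisions in this model, here is a minimal sketch of the online interaction just described; the class and method names are illustrative, and the feasibility checks are left abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class OnlinePandoraRun:
    """One pass over boxes arriving in adversarial order (illustrative skeleton)."""
    can_open: Callable[[List[int]], bool]   # feasibility of the opened set S
    can_keep: Callable[[List[int]], bool]   # feasibility of the kept set R (a subset of S)
    opened: List[int] = field(default_factory=list)
    kept: List[int] = field(default_factory=list)
    utility: float = 0.0

    def step(self, i, t_i, open_rule, keep_rule, draw: Callable[[], Tuple[float, float]]):
        # 1. Observe only the type t_i, then decide whether to open box i.
        if not open_rule(i, t_i) or not self.can_open(self.opened + [i]):
            return
        v_i, c_i = draw()            # 2. Opening reveals (v_i, c_i); the cost is paid now.
        self.opened.append(i)
        self.utility -= c_i
        # 3. Irrevocably decide whether to keep v_i, subject to the keeping constraint.
        if keep_rule(i, t_i, v_i) and self.can_keep(self.kept + [i]):
            self.kept.append(i)
            self.utility += v_i
```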
We first prove some fundamental lemmas that capture important structure for this Pandora's box problem. We then use these lemmas to provide a strong connection between the online Pandora's box problem and prophet inequalities. Our results allow us to translate several prophet inequality algorithms, such as prophet inequalities under capacity constraints, matroid constraints, or knapsack constraints, into algorithms for the online Pandora's box problem under the same constraints.

²Consider the following example with n identical (and independent) boxes, where we have no constraint on opening boxes and can accept exactly one box at the end. The value of each box is 0 with probability 1/2 and 2 with probability 1/2; the cost of each box is 1. Note that the cost of each box is equal to its expected value. Hence, the expected utility of any online algorithm is upper bounded by 0. However, with probability $1 - \frac{1}{2^n}$ at least one of the boxes has value 2. A fully clairvoyant adversary opens only a box with value 2 and obtains utility $2\big(1 - \frac{1}{2^n}\big) - 1 = 1 - \frac{1}{2^{n-1}}$. This is a positive utility when $n \ge 2$.

Fundamental Lemmas

Our lemmas allow us flexibility in considering the distribution of costs, and show how we can preserve approximation ratios.

Definition 1. Let $F_1, \ldots, F_n$ and $F'_1, \ldots, F'_n$ be two sequences of boxes. Denote the outcomes of $F_i$ and $F'_i$ by $(v_i, c_i, t_i)$ and $(v'_i, c'_i, t'_i)$, respectively. We say the two sequences are cost-equivalent if (a) they can be coupled so that for all i we have $v_i = v'_i$ and $t_i = t'_i$, and (b) $E[c_i \mid t_i] = E[c'_i \mid t_i]$ for all $t_i$.

Lemma 2. Let $F_1, \ldots, F_n$ and $F'_1, \ldots, F'_n$ be two cost-equivalent sequences of boxes. Let Alg' be an online (resp., offline) algorithm that achieves an expected utility $u_{Alg'}$ on boxes $F'_1, \ldots, F'_n$. Then there exists an online (resp., offline) algorithm Alg that achieves the same expected utility on boxes $F_1, \ldots, F_n$.

Proof: We will first suppose that Alg' is online, so that the order of arrival is predetermined and the types are revealed online. We define the algorithm Alg using the run of algorithm Alg' on a simulated set of boxes $F'_1, \ldots, F'_n$. When Alg' attempts to open a box $F'_i$ we do the following. We open $F_i$, and let $(v_i, c_i, t_i)$ be the outcome. Then we draw a triple $(v'_i, c'_i, t'_i)$ from $F'_i$, conditioning on $v'_i = v_i$ and $t'_i = t_i$, and report it to Alg'. Thus Alg opens a box if and only if Alg' does, and likewise keeps a box if and only if Alg' does.

Let $Y_i$ be a binary random variable that is 1 if Alg opens box i and 0 otherwise. Also, let $X_i$ be a binary random variable that is 1 if Alg keeps box i and 0 otherwise. Note that for any particular i, at the time the algorithm decides about $Y_i$, the cost $c_i$ is unknown to the algorithm. Moreover, $c_i$ may be correlated with $t_i$, but is independent of all observations of the algorithm from prior rounds. Therefore, after conditioning on $t_i$, $Y_i$ is independent of $c_i$. We have
\begin{align*}
u_{Alg'} &= E\Big[\sum_{i=1}^n X_i v'_i\Big] - \sum_{i=1}^n E[Y_i c'_i] && \text{definition of utility; linearity of expectation}\\
&= E\Big[\sum_{i=1}^n X_i v'_i\Big] - \sum_{i=1}^n E_{t'_i}\big[E[Y_i c'_i \mid t'_i]\big] && \text{draw $t'_i$ first}\\
&= E\Big[\sum_{i=1}^n X_i v'_i\Big] - \sum_{i=1}^n E_{t'_i}\big[E[Y_i \mid t'_i]\, E[c'_i \mid t'_i]\big] && \text{$Y_i$ indep. of $c'_i$ given $t'_i$}\\
&= E\Big[\sum_{i=1}^n X_i v_i\Big] - \sum_{i=1}^n E_{t_i}\big[E[Y_i \mid t_i]\, E[c_i \mid t_i]\big] && \text{cost-equivalence and the coupling}\\
&= E\Big[\sum_{i=1}^n X_i v_i\Big] - \sum_{i=1}^n E_{t_i}\big[E[Y_i c_i \mid t_i]\big] && \text{$Y_i$ indep. of $c_i$ given $t_i$}\\
&= E\Big[\sum_{i=1}^n X_i v_i - \sum_{i=1}^n Y_i c_i\Big] && \text{linearity of expectation}\\
&= u_{Alg}. && \text{Alg opens and keeps the same sets as Alg'}
\end{align*}

The case where Alg' is an offline algorithm is similar. The only difference is that the full profile of types is known to the algorithm in advance, and hence $Y_i$ and $X_i$ can depend on this profile.
We therefore fix the type profile, interpret the variables $X_i$ and $Y_i$ as being conditioned on this realization of the types, interpret all expectations with respect to this conditioning, and the argument proceeds as before (noting that the distribution of $c'_i$ can depend on $t'_i$, but is independent of the other types). Note that this actually simplifies the chain of equalities above, as the conditioning on $t'_i$ in the second line is trivial and unnecessary when $t'_i$ is fixed.

Lemma 3. Let $F_1, \ldots, F_n$ and $F'_1, \ldots, F'_n$ be two cost-equivalent sequences of boxes. Let Alg' be an online (resp., offline) α-approximation algorithm on boxes $F'_1, \ldots, F'_n$. Then there exists an online (resp., offline) α-approximation algorithm Alg on boxes $F_1, \ldots, F_n$.

Proof: Let Opt and Opt' be the optimal (offline) algorithms for boxes $F_1, \ldots, F_n$ and $F'_1, \ldots, F'_n$, respectively. Let Alg be the algorithm of Lemma 2 applied to Alg'. Moreover, note that applying Lemma 2 to Opt implies that there is some offline algorithm $\widetilde{Alg}$ on boxes $F'_1, \ldots, F'_n$ such that $u_{Opt} = u_{\widetilde{Alg}}$. We bound the approximation factor of Alg as follows:
\begin{align*}
\frac{u_{Alg}}{u_{Opt}} &= \frac{u_{Alg'}}{u_{Opt}} && \text{by Lemma 2}\\
&= \frac{u_{Alg'}}{u_{\widetilde{Alg}}} && u_{Opt} = u_{\widetilde{Alg}}\\
&\ge \frac{u_{Alg'}}{u_{Opt'}} && \text{definition of Opt'}\\
&\ge \alpha. && \text{Alg' is an α-approximation algorithm}
\end{align*}

Next we define the commitment Pandora's box problem on boxes $F''_1 = (v''_1, c''_1, t''_1), \ldots, F''_n = (v''_n, c''_n, t''_n)$. The commitment Pandora's box problem is similar to the Pandora's box problem with the following two restrictions, which we refer to as freeness and commitment, respectively.

Freeness: Opening any box is free, i.e., for all i we have $c''_i = 0$.

Commitment: If a box $F''_i$ is opened and the value $v''_i$ is the maximum possible value of $F''_i$, then $v''_i$ is kept.

Note that the commitment constraint is without loss of generality for an online algorithm, but is a non-trivial restriction for an offline algorithm.

For each i, and for each type $t_i$, we define a threshold $\sigma_i$ so that we have $E[v_i - \min(v_i, \sigma_i) \mid t_i] = E[c_i \mid t_i]$. If $c_i = 0$, we set $\sigma_i$ to the supremum of $v_i$. (It is possible for $\sigma_i$ to be infinite, with the natural interpretation.) We use the boxes $F''_i = (\min(v_i, \sigma_i), 0, t_i)$ in the following theorem.

Theorem 4. Let Alg'' be an α-approximation algorithm for the commitment Pandora's box problem on boxes $F''_1, \ldots, F''_n$. Then there exists an α-approximation algorithm Alg for the Pandora's box problem on $F_1, \ldots, F_n$.

Proof: First we define a sequence of boxes $F'_1, \ldots, F'_n$, where for all i we have $F'_i = (v_i, v_i - \min(v_i, \sigma_i), t_i)$. That is, we set $v'_i = v_i$, $t'_i = t_i$, and $c'_i = v_i - \min(v_i, \sigma_i)$. Note that by the definition of $\sigma_i$ we have $E[v_i - \min(v_i, \sigma_i) \mid t_i] = E[c_i \mid t_i]$ for all $t_i$, so the two sequences are cost-equivalent. Thus, by Lemma 3, an (online) α-approximation algorithm for the Pandora's box problem on $F'_1, \ldots, F'_n$ implies an α-approximation algorithm for the Pandora's box problem on $F_1, \ldots, F_n$, as desired.

Next, we construct the algorithm Alg' required by Lemma 3 using Alg''. To construct Alg', whenever Alg'' attempts to open a box $F''_i$, we open $F'_i$ and report $(v'_i - c'_i, 0, t_i) = (\min(v_i, \sigma_i), 0, t_i) = F''_i$ to Alg''. Alg' keeps the same set of boxes as Alg''.

Let $Y_i$ be a binary random variable that is 1 if Alg' opens box i and 0 otherwise. Also, let $X_i$ be a binary random variable that is 1 if Alg' keeps box i and 0 otherwise. Note that $v''_i = \min(v_i, \sigma_i) \le \sigma_i$, and $v''_i$ achieves its maximum value whenever $v_i \ge \sigma_i$; in this case, by the commitment constraint, we have $X_i = Y_i$.
Therefore we either have $v_i - \min(v_i, \sigma_i) = 0$ or $X_i = Y_i$, which gives us
\[
  Y_i \big(v_i - \min(v_i, \sigma_i)\big) = X_i \big(v_i - \min(v_i, \sigma_i)\big). \qquad (1)
\]
Then for any fixed profile of types, and taking expectations conditional on those type realizations, we have
\begin{align*}
u_{Alg'} &= E\Big[\sum_{i=1}^n X_i v_i - \sum_{i=1}^n Y_i \big(v_i - \min(v_i, \sigma_i)\big)\Big]\\
&= E\Big[\sum_{i=1}^n X_i v_i - \sum_{i=1}^n X_i \big(v_i - \min(v_i, \sigma_i)\big)\Big]\\
&= E\Big[\sum_{i=1}^n X_i \min(v_i, \sigma_i)\Big]\\
&= u_{Alg''},
\end{align*}
where the first equality holds since Alg' opens and keeps the same sets as Alg'' (so the indicators $X_i$ and $Y_i$ are shared), and the second equality uses (1).

Similarly, we can show $u_{Opt'} = u_{Opt''}$, where Opt' is the optimal algorithm for the Pandora's box problem on $F'_1, \ldots, F'_n$ and Opt'' is the optimal algorithm for the commitment Pandora's box problem on $F''_1, \ldots, F''_n$. Therefore Alg' is an α-approximation algorithm for the Pandora's box problem on $F'_1, \ldots, F'_n$, as promised.

A Reduction for Online Pandora's Box Problems

In this section we use Theorem 4 to provide a strong connection between the online Pandora's box problem under general constraints and prophet inequalities. This leads to our main result: we prove that a threshold-based algorithm for a prophet inequality problem, under any given feasibility constraints on the boxes that can be opened and/or the prizes that can be kept, immediately translates into an algorithm for the online Pandora's box problem. The reduction preserves the approximation factor of the threshold-based algorithm. This implies several approximation algorithms for the online Pandora's box problem under different constraints, which we discuss in Section 3.

Recall that, in the online Pandora's box problem, the algorithm is permitted to keep a set $R \subseteq S$ of boxes and collect the corresponding rewards, where R and S are restricted to come from arbitrary predefined collections of feasible sets. In the associated prophet inequality problem, the costs of all boxes are known to be 0. In Theorem 5 below, we use the notion of threshold-based algorithms for the prophet inequality problem, defined as follows. We say an algorithm is threshold-based if for every $i \in \{1, \ldots, n\}$ it has a threshold $\tau_i(t_i)$ (where the threshold can depend on the type as well as the index), and the algorithm keeps a box if and only if its value is not less than $\tau_i(t_i)$. The threshold $\tau_i(t_i)$ may be adaptive; that is, it may depend on any observation made prior to observing the i-th box.

Theorem 5. Let $Alg_\tau$ be a threshold-based α-approximation algorithm for the prophet inequality problem under a collection of constraints. Then there exists an α-approximation algorithm for the online Pandora's box problem under the same constraints.

Proof: We define $F''_i = (v''_i = \min(v_i, \sigma_i), c''_i = 0, t_i)$. Next we give an α-approximation algorithm Alg'' for the commitment Pandora's box problem on boxes $F''_1, \ldots, F''_n$; this together with Theorem 4 proves the theorem.

Let $\tau_i(\cdot)$ be the threshold function used by $Alg_\tau$ given the observed values $v''_1, \ldots, v''_n$. We define Alg'' as follows. Upon arrival of box $F''_i$, we first check whether $\sigma_i \ge \tau_i(t_i)$; note that this implicitly requires that the box is acceptable according to the constraints. If $\sigma_i \ge \tau_i(t_i)$ we open the box, and otherwise we skip it. This ensures the commitment constraint is respected. If we opened the box and $v''_i \ge \tau_i(t_i)$, we keep it; otherwise we ignore it and continue. It is easy to observe that $Alg_\tau$ and Alg'' produce the same outcome and hence have the same approximation factor.
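The reduction in the proof of Theorem 5 is constructive. The following sketch (with illustrative interface and helper names, not taken from the paper) shows its shape: compute $\sigma_i$ from the known conditional distributions, feed the capped values $\min(v_i, \sigma_i)$ to any threshold rule for the zero-cost prophet problem, and open a box only when $\sigma_i$ clears the current threshold.

```python
def online_pandora_from_thresholds(boxes, threshold_rule):
    """Theorem 5 reduction, sketched for boxes arriving online (illustrative names).

    boxes: iterable of (i, t_i, sigma_i, draw), where draw() -> (v_i, c_i) and
           sigma_i solves E[v_i - min(v_i, sigma_i) | t_i] = E[c_i | t_i]
           (computable, e.g., by the bisection used for reservation prices above).
    threshold_rule(i, t_i, history) -> tau_i, a threshold for the zero-cost
           prophet problem on the surrogate values min(v_i, sigma_i); it is
           assumed to encode the feasibility constraints (returning float('inf')
           for boxes that may not be kept).
    """
    history, utility = [], 0.0
    for i, t_i, sigma_i, draw in boxes:
        tau_i = threshold_rule(i, t_i, history)
        if sigma_i < tau_i:              # surrogate value can never reach tau_i: skip the box
            continue
        v_i, c_i = draw()                # open: pay the (random) cost, observe the value
        utility -= c_i
        surrogate = min(v_i, sigma_i)
        history.append((i, t_i, surrogate))
        if surrogate >= tau_i:           # keep exactly when the prophet rule would accept
            utility += v_i
    return utility
```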
3 Algorithms For Online Pandora's Box via Prophet Inequalities

Here we use the tools from the previous section to provide algorithms for the online Pandora's box problem under different kinds of constraints. First, as a warm-up, in Theorem 6 we show a 1/2-approximation algorithm for a simple version of the problem in which there are no constraints on the set of boxes that we can open, and we can keep the value of only one box. Note that this is the online version of the classical Pandora's box problem. Indeed, it is known that there is no $(1/2 + \epsilon)$-approximation algorithm for this problem even if all of the costs are 0, in which case the problem is equivalent to a basic prophet inequality problem (Kleinberg and Weinberg 2012). We will prove Theorem 6 directly, without appealing to Theorem 5, to provide insight into how the given thresholds translate into a policy for the Pandora's box problem; a small numerical companion to this policy appears at the end of this section.

Theorem 6. There exists a 1/2-approximation algorithm for the online Pandora's box problem with no constraints on opening boxes, where the value of exactly one box is kept.

Proof: We define $F''_i = (v''_i = \min(v_i, \sigma_i), c''_i = 0, t_i)$. We give a simple 1/2-approximation algorithm Alg'' for the commitment Pandora's box problem on boxes $F''_1, \ldots, F''_n$; this together with Theorem 4 proves the theorem.

Set $\tau$ such that $\Pr[\max_{i=1}^n v''_i \ge \tau] = \frac{1}{2}$. Let j be a random variable that indicates the first index such that $v''_j \ge \tau$. Let $v^* = v''_j$ if such an index j exists, and let $v^* = 0$ otherwise. It is known that $E[v^*] \ge \frac{1}{2}\, E[\max_{i=1}^n v''_i]$ (Samuel-Cahn 1984).

We define Alg'' as follows. Upon arrival of box $F''_i$, we first check whether $\sigma_i \ge \tau$. If it is, we open the box; otherwise we skip it. This ensures the commitment constraint; that is, if we observe $\sigma_i$, we will accept it. Next, if we opened the box and $v''_i \ge \tau$, we keep it and terminate. Otherwise we continue to the next box. It is easy to observe that Alg'' keeps $v^*$, and hence is a 1/2-approximation algorithm.

We now explore other applications of our reduction. Theorem 5, together with previously known approximation algorithms for prophet inequality problems with various types of constraints, implies the existence of approximation algorithms for variations of the online Pandora's box problem. Specifically, we consider the following variations.

Online k-Pandora's box problem: we are given a cardinality bound k, and at most k boxes can be kept.

Online knapsack Pandora's box problem: we have a capacity C, and the type $t_i$ of each box corresponds to a size. The total size of the boxes that are kept must be at most C.

Online matroid Pandora's box problem: we have a matroid constraint on the set of boxes, and the boxes that are kept must form an independent set of the matroid.

We note that all of the variations above place no constraints on opening boxes; in what follows, however, we study a variation of the problem with constraints on opening boxes as well. As we have mentioned, prophet inequality approximation algorithms exist for the settings of cardinality constraints (Alaei 2014), knapsack constraints (Duetting, Feldman, and Lucier 2017), and matroid constraints (Kleinberg and Weinberg 2012). Combining these results with Theorem 5 implies the following corollary.

Corollary 7. There is a $\big(1 - \frac{1}{\sqrt{k+3}}\big)$-approximation algorithm for the online k-Pandora's box problem, a 1/5-approximation algorithm for the online knapsack Pandora's box problem, and a 1/2-approximation algorithm for the online matroid Pandora's box problem.
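As the numerical companion promised above, the warm-up policy of Theorem 6 needs only the single threshold $\tau$ with $\Pr[\max_i \min(v_i, \sigma_i) \ge \tau] = 1/2$. A minimal sketch for estimating it by Monte Carlo, under assumed sampler and name conventions, is:

```python
def median_threshold(samplers, sigmas, trials=100_000):
    """Estimate tau with Pr[max_i min(v_i, sigma_i) >= tau] ~ 1/2 by Monte Carlo.

    samplers[i]() draws a value v_i and sigmas[i] is the cap sigma_i from Section 2;
    both are assumed to be given here, and the names are illustrative.
    """
    maxima = sorted(max(min(draw(), sig) for draw, sig in zip(samplers, sigmas))
                    for _ in range(trials))
    return maxima[len(maxima) // 2]      # empirical median of max_i min(v_i, sigma_i)
```

With this $\tau$ in hand, the policy of Theorem 6 opens box i only if $\sigma_i \ge \tau$ and keeps the first opened box whose capped value $\min(v_i, \sigma_i)$ reaches $\tau$; plugged into the reduction sketch above, it is simply a constant threshold rule that shuts off after the first acceptance.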
4 Pandora's Box with Multiple Arms

In the context of reinforcement learning, we next consider a multi-armed version of the Pandora's box problem. In each of J rounds, m boxes are presented; there are therefore mJ boxes in total. At most one box can be opened in each round. The m boxes presented in a given round are ordered; we can think of each box as having a type labeled 1 through m. All boxes of type t have the same cost $c_t$, and also have the same value distribution $F_t$. For notational convenience we will write $v_{tj}$ for the value in the box of type t presented at time j; for convenience we assume $v_{tj} > 0$. At the end of the J rounds, the player can keep at most one prize for each type of box. That is, if we write $S_t$ for the subset of boxes of type t opened by the player, then the objective is to maximize
\[
  \sum_t \Big( \max_{j \in S_t} v_{tj} - c_t\,|S_t| \Big).
\]
If no box of type t is opened, we define $\max_{j \in S_t} v_{tj}$ (i.e., the prize for that type) to be 0.

This is a variant of the online Pandora's box problem with a (partition) matroid constraint on the set of prizes that can be kept, and also a constraint on the subsets of boxes that can be opened. We can think of this problem as presenting boxes one at a time, where the boxes from a round are presented sequentially, with the additional constraint that at most one box per round can be opened. Note that types here are not random, but depend on the index of the box.

Our previous reduction applies, so solving this multi-armed Pandora's box problem reduces to developing the related prophet inequality. In this prophet inequality, boxes can be opened at no cost, but we must irrevocably choose whether or not to keep any given prize as it is revealed. We can still keep at most one prize of each type, and we can still open at most one box in each time period. Our question becomes: can we develop a threshold policy that achieves a constant-factor prophet inequality for this setting? In this case, a threshold policy corresponds to choosing a threshold $\tau_t$ for each box type t, and accepting a prize $v_{tj}$ from an opened box if and only if $v_{tj} > \tau_t$.

What is an appropriate benchmark for the prophet inequality? Note that the sum of the best prizes of each type, ex post, might not be achievable by any policy, due to the restriction on which boxes can be opened. We will therefore compare against the following weaker benchmark. We consider a prophet who must choose, in an online fashion, one box to open in each round, given knowledge of the prizes in previously opened boxes. Then, after having opened one box in each of the J rounds, the prophet can select the largest observed prize of each type. In other words, the prophet has the advantage of being able to choose from among the opened boxes in retrospect, but must still open boxes in an online fashion. Our goal is to obtain a constant approximation to the expected value enjoyed by such a prophet.

We begin with some observations about the choice of which box to open. First, the optimal policy for the prophet is to open boxes greedily; in particular, this policy can be implemented in time linear in m each round. We sketch this greedy rule below, and formalize it in Lemma 8.
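A minimal sketch of the greedy opening rule follows; the discrete representation of the value distributions and the function name are illustrative assumptions, not the paper's notation.

```python
def prophet_greedy_round(dists, best_so_far):
    """One round of the prophet's greedy opening rule (formalized in Lemma 8 below).

    dists[t]: list of (value, probability) pairs for type t (a discrete stand-in
    for F_t); best_so_far[t]: largest type-t value observed so far, or 0.0 if none.
    Returns the type whose opening maximizes the expected marginal gain
    E[(w_t - best_so_far[t])^+], i.e. the expected value as if the game ended now.
    """
    def marginal_gain(t):
        return sum(p * max(v - best_so_far[t], 0.0) for v, p in dists[t])
    return max(range(len(dists)), key=marginal_gain)
```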
Lemma 8. In each round j, it is optimal for the prophet to open a box of type t that maximizes his expected value, as if the game were to end after time j.

Proof: Note that the expected marginal gain of opening a box of type t can only decrease over time, and only as more boxes of that type are opened. Suppose t is the type with maximum expected marginal value at time j. Suppose further that the prophet does not open a box of type t at time j, and furthermore opens no box of type t until the final round J. Then t will be the type with maximum expected marginal value in round J, and therefore it would be optimal to open a box of type t in the last round. This implies that it is optimal to open at least one box of type t at some point between round j and the end of the game. The prophet is therefore at least as well off opening a box of type t immediately, since doing so does not affect the distribution of the revealed value, and it can only provide more information for determining which other boxes to open. It is therefore (weakly) optimal to open a box of type t at time j.

Similarly, once thresholds are fixed, an identical argument implies that the optimal threshold-based policy behaves greedily. In particular, the optimal policy can be implemented in polynomial time, given an arbitrary set of thresholds.

Lemma 9. Suppose the player's policy is committed to selecting a prize of type t, from an opened box of type t, if and only if its value exceeds $\tau_t$. Then, in each round j, it is optimal to open a box of a type $t^*$, from among those types for which a prize has not yet been accepted, that maximizes the expected value as if the game were to end after time j. That is, $t^* \in \arg\max_t \{ E[v_{tj} \mid v_{tj} > \tau_t] \cdot \Pr[v_{tj} > \tau_t] \}$.

We now claim that there are thresholds that yield a 2-approximate prophet inequality for this setting. First, some notation. By the principle of deferred randomness, we can think of the value of the prize in any given box as being determined only at the moment the box is opened. With this in mind, we will write $w_{tk}$ for the value observed in the k-th box of type t opened by the decision-maker. For example, $w_{t1}$ is the value contained in whichever box of type t is opened first, regardless of the exact time at which it is opened. Note that the behavior of any online policy is fully described by the profile of values $w = (w_{tk})$, and each $w_{tk}$ is a value drawn independently from distribution $F_t$. For such a profile w, write $y^*_{tk}(w)$ for the indicator variable that is 1 if the prophet opens at least k boxes of type t and keeps the k-th one opened. Then the expected value enjoyed by the prophet is
\[
  E_w\Big[ \sum_t \sum_k w_{tk}\, y^*_{tk}(w) \Big].
\]
For each box type t, we will set the threshold
\[
  \tau_t = \frac{1}{2}\, E_w\Big[ \sum_k w_{tk}\, y^*_{tk}(w) \Big].
\]
That is, $\tau_t$ is half of the expected value obtained by the prophet from boxes of type t.

To prove that these thresholds achieve a good approximation to the prophet's welfare, it will be useful to analyze the possible correlation between the number of boxes of type t opened by the prophet and whether any prize of type t is kept by the threshold algorithm. To this end, write $z^*_{tk}(w)$ for the indicator that the prophet opens at least k boxes of type t. Note that $z^*_{tk}(w) \ge y^*_{tk}(w)$ for all t, k, and w. We will also write $Q_t(w)$ for the indicator variable that is 1 if $w_{tk}\, z^*_{tk}(w) \le \tau_t$ for all k. That is, $Q_t(w) = 1$ if no box of type t opened by the prophet has value greater than $\tau_t$, and hence the threshold algorithm does not keep any prize of type t. The following lemma shows that $Q_t$ is positively correlated with $z^*_{tk}$, for every t and k.

Lemma 10. For all t and k, $E_w[Q_t(w)\, z^*_{tk}(w)] \ge E_w[Q_t(w)]\, E_w[z^*_{tk}(w)]$.

Proof: Fix the values of $w_{\ell k}$ for all $\ell \ne t$. Note that this also fixes the choice of which box the prophet will open on any round in which the prophet chooses not to open a box of type t. Due to the prophet's greedy method of opening boxes, in every round the prophet will choose to open the box of type t if and only if the maximum value from a box of type t seen so far (or 0 if no box of type t has been opened yet) is below a threshold determined by the values observed from the other boxes. In other words, the values $w_{\ell k}$ for $\ell \ne t$ define a sequence of non-decreasing thresholds
$h_0, h_1, h_2, \ldots, h_J$ with the following property. Suppose that, at time j, the prophet has previously opened $s < j$ boxes of type t, and has therefore opened $j - s - 1$ boxes of types other than t. Then the prophet will open a box of type t at time j if and only if
\[
  \max_{r \le s} \{ w_{tr} \} < h_{j-s-1}. \qquad (2)
\]
Here and below, we will take the maximum of an empty set to be 0.

We now claim that the prophet opens k or more boxes of type t, at or before time J, if and only if $\max_{r < k} \{ w_{tr} \} < h_{J-k}$. We prove this claim by induction on J. For the base case J = 1 we must have k = 1, and by (2) the prophet opens the box of type t if and only if $h_0 > 0$. Now suppose J > 1. If $\max_{r < k} \{ w_{tr} \}$