# Buying Information for Stochastic Optimization

Mingchen Ma 1   Christos Tzamos 1

*Equal contribution. 1Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA. Correspondence to: Mingchen Ma. Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).

Abstract

Stochastic optimization is one of the central problems in Machine Learning and Theoretical Computer Science. In the standard model, the algorithm is given a fixed distribution known in advance. In practice though, one may acquire at a cost extra information to make better decisions. In this paper, we study how to buy information for stochastic optimization and formulate this question as an online learning problem. Assuming the learner has an oracle for the original optimization problem, we design a 2-competitive deterministic algorithm and an e/(e-1)-competitive randomized algorithm for buying information. We show that this ratio is tight, as the problem is equivalent to a robust generalization of the ski-rental problem, which we call super-martingale stopping. We also consider an adaptive setting where the learner can choose to buy information after taking some actions for the underlying optimization problem. We focus on the classic optimization problem Min-Sum Set Cover, where the goal is to quickly find an action that covers a given request drawn from a known distribution. We provide an 8-competitive algorithm running in polynomial time that chooses actions and decides when to buy information about the underlying request.

1. Introduction

1.1. Offline and Adaptive Stochastic Optimization

Stochastic optimization is one of the core problems in machine learning and theoretical computer science. In stochastic optimization, the input parameters of the problem are random variables drawn from a known distribution. Given the distribution of the parameters, a learner constructs a feasible solution in advance (offline stochastic optimization) or adaptively (adaptive stochastic optimization) to optimize the objective function in expectation. Formally, the two types of stochastic optimization problems can be defined in the following way.

Definition 1.1 (Offline Stochastic Optimization). Let S be a set of scenarios and X be a set of actions. Let ℓ : 2^X × S → R_+ be a loss function. An offline stochastic optimization problem (X, S, ℓ, D) is to find a set of actions A ⊆ X that minimizes E_{s∼D} ℓ(A, s), where D is a distribution over S.

Definition 1.2 (Adaptive Stochastic Optimization). Let S be a set of scenarios and X be a set of actions. Initially, a random scenario s is drawn according to a distribution D. Then, the learner sequentially chooses actions a_1, a_2, . . . and after the t-th action observes a (possibly randomized) outcome r((a_1, a_2, . . . , a_t), s) ∈ R. The goal of the learner is to take a sequence of actions A that minimizes E_{s∼D} ℓ(A, s) for a given loss function ℓ(A, s), possibly exploiting the information gained about s along the way.

A huge body of work among different communities such as machine learning, theoretical computer science, statistics, and operations research has studied stochastic optimization problems given their numerous applications.
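To make Definition 1.1 concrete, here is a minimal sketch (ours, not from the paper) of a finite offline stochastic optimization instance: it enumerates candidate action sets by brute force and returns one minimizing E_{s∼D} ℓ(A, s). The loss function and the instance at the bottom are hypothetical and only for illustration.

```python
from itertools import chain, combinations

def solve_offline(actions, scenarios, prior, loss):
    """Brute-force solver for Definition 1.1: pick A subset of X minimizing E_{s~D} loss(A, s).

    actions:   list of actions X
    scenarios: list of scenarios S
    prior:     dict scenario -> probability (the distribution D)
    loss:      function (frozenset_of_actions, scenario) -> nonnegative float
    """
    def powerset(xs):
        return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

    best_set, best_cost = None, float("inf")
    for A in powerset(actions):
        A = frozenset(A)
        expected = sum(prior[s] * loss(A, s) for s in scenarios)
        if expected < best_cost:
            best_set, best_cost = A, expected
    return best_set, best_cost

if __name__ == "__main__":
    # Hypothetical toy instance: pay 1 per action taken, plus a penalty of 10
    # if the drawn scenario is not covered by any of its acceptable actions.
    actions = ["a", "b", "c"]
    scenarios = ["s1", "s2"]
    prior = {"s1": 0.7, "s2": 0.3}
    covers = {"s1": {"a", "b"}, "s2": {"c"}}
    loss = lambda A, s: len(A) + (0 if A & covers[s] else 10)
    print(solve_offline(actions, scenarios, prior, loss))
```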
For example, methods of offline stochastic optimization have been widely applied to problems such as training machine learning models (Shalev-Shwartz et al., 2009; Bottou, 2010; Kingma & Ba, 2014) and mechanism design (Nisan & Ronen, 1999; Hartline, 2013; Roughgarden, 2016). On the other hand, many adaptive stochastic optimization problems such as the Pandora's Box problem (Weitzman, 1979; Chawla et al., 2020), active learning (Dasgupta, 2004; Settles, 2012) and optimal decision tree (Adler & Heeringa, 2012; Li et al., 2020) have also been applied to areas like artificial intelligence, microeconomics, and operations research.

A common assumption in these works is that the distribution D of the scenario s is considered as given. However, such an assumption is often unrealistic in practice. A learner in practice has many ways to gain extra knowledge about the optimization problem he is going to solve. With this extra knowledge, it is reasonable that the learner updates the prior distribution D to some posterior distribution D' and uses a better strategy to solve the problem. As a concrete example, consider n bidders that compete over an item in an auction. Classic auction theory assumes that the auctioneer only knows a prior distribution D over the buyer values and wants to design an auction to optimize a target objective such as welfare or revenue. In practice though, there are a number of information sources available to the auctioneer that provide information about the bidders, such as their demographics, their preferences, or their purchase history. Such information can be very useful. However, this information does not come for free. It may cost significant amounts of money or time, and it is not clear in advance how helpful this information will be. In the example, the auctioneer may pay an information provider only to receive irrelevant pieces of information or information already known.

1.2. Our Contribution and Techniques

In this paper, we study the problem of buying information for stochastic optimization. We consider a learner that wants to minimize the total cost spent on solving the optimization problem and the cost of acquiring information. We model the information acquisition process using a signaling scheme (Emek et al., 2014). A signaling scheme is a (randomized) function f from the set of scenarios S to a signal space R. If a learner asks for feedback from f, he receives a signal y and the prior distribution can be updated to the posterior D|_{f(s)=y}. In our model, we assume there is a sequence of signaling schemes F = {f_t}_{t=0}^∞ arriving online. At any timestep t, based on the signals received so far, the learner has the choice to continue purchasing the next signal given by f_t or to stop. Our goal is to construct a learner who is competitive with the cost of a prophet who knows the structure of F in advance and can take optimal actions.

For offline optimization, all signals must be purchased before taking any actions in the underlying stochastic optimization problem. We assume the learner is able to compute an (approximate) optimal solution for the underlying problem given the available information at any point in time. The goal of the learner is then to adaptively decide when to stop buying feedback. Our main results in this setting are summarized below:

Theorem 1.3 (Informal Version of Theorem 3.6 and Theorem 3.8).
There exist a 2-competitive deterministic learner and an e/(e-1)-competitive randomized learner for buying information for offline stochastic optimization.

We show that both learners can be implemented efficiently and have competitive ratios that are information theoretically optimal. Thus, we give a comprehensive understanding of buying feedback for offline stochastic optimization. To solve the problem, we formulate it as a super-martingale stopping problem: there is an unknown sequence of random variables (X_0, X_1, . . . ) satisfying E(X_{i+1} | X_i) ≤ X_i. The realizations of the random variables arrive online and an algorithm outputs a stopping index i* adaptively to minimize E(i* + X_{i*}). The super-martingale stopping problem can be seen as a generalization of the classic ski-rental problem introduced in (Karlin et al., 1994), where all X_i ∈ {0, B}, and of its variant introduced in (Chawla et al., 2020), where (X_0, X_1, . . . ) are monotone decreasing constants. In the more general setting of super-martingale stopping, however, the values of X_i need not be monotone; they are only monotone in expectation. This makes the problem significantly more challenging, and as we show in Appendix C, natural algorithms for ski-rental problems are not competitive for our problem.

For adaptive stochastic optimization, it is also natural to intertwine purchasing information with taking actions. For example, several actions may be taken first in the problem and then information may be purchased conditional on their outcome. As this setting is more problem dependent, we focus on a paradigmatic case of adaptive stochastic optimization, where there is a random set of good actions and the learner takes actions in each round until a good action is chosen. Such a problem is called Min Sum Set Cover (MSSC), a well-studied adaptive stochastic optimization problem (Bar-Noy et al., 1998; 1999; Feige et al., 2004). In our model, the learner has an extra action at each round to buy information, obtaining a better estimate of the probability that an action is good. We provide an algorithm for this problem that is competitive with a prophet that knows the sequence of signaling schemes in advance:

Theorem 1.4 (Informal Version of Theorem 4.6). There is a poly-time learner that is 8-competitive for buying information for Min Sum Set Cover.

We achieve this in two steps. In the first step, we show we can shrink the action space so that we don't need to consider when to buy feedback. We introduce a simpler model called adaptive stochastic optimization with time dependent feedback, where a learner takes an action in each round, and feedback arrives for free after an action is taken. We show in Theorem 4.3 that if there is a learner that is α-competitive for adaptive stochastic optimization with time dependent feedback, then we can use it to construct a 2α-competitive learner to buy information for adaptive stochastic optimization. Our second step is to prove the following technical theorem, which is of independent interest.

Theorem 1.5 (Informal Version of Theorem 4.4). The greedy algorithm is 4-competitive for MSSC with time dependent feedback.

There is a large body of work analyzing greedy algorithms for the min sum coverage objective under different settings (Feige et al., 2004; Streeter & Golovin, 2008; Golovin & Krause, 2011). The analysis is usually based on an elegant histogram approach proposed in (Feige et al., 2004).
However, in our model, the decision made by the learner is fully adaptive and it is hard to adapt such an analysis directly. Instead, we bypass such difficulty and use an interesting linear programming dual approach to analyze the greedy algorithm. Besides algorithmic results, we also present hard instances to build information theoretic lower bound for MSSC under our models. 1.3. Applications of our Model Buying information is very common in practice. In fact, our model fits well in both theory and practical applications. In this section, we give several applications of our model. We first give a typical example of buying information for offline stochastic optimization. Selling One Item with Feedback There is a seller who wants to sell an item to a buyer. The seller sets a price p for the item. The buyer has a value v for the item and would like to pay the price p for the item if p v. However, if p > v, the buyer will not buy the item. Given a pair of (v, p), denote by P(v, p) = p1p v the payment of the buyer. The value of the buyer may depend on his nationality, education, or other factors. The information can be collected from the historic trade and thus the seller has a prior distribution D of the value v. The goal of the learner is to set up the price p to minimize Ev D (v P(v, p)). However, instead of setting the price immediately, the seller may pay some money to collect more information about the buyer. This can help the seller update the prior distribution of the value v. In practice, it is hard to predict the quality of the information. The question for the seller is how much information is sufficient for him to set up a good price. Our second example is on buying information for adaptive stochastic optimization. Optimal Decision Tree with Feedback A doctor wants to diagnose the disease of a patient. There are X = [n] different tests that can be performed by the doctor and S = [m] different possible diseases. If the patient has a disease s S and a test a X is performed, then the doctor will receive an outcome r(s, a). The doctor has a prior distribution D of the disease s based on the symptom of the patient. In the standard optimal decision tree problem, based on the knowledge of D, the goal of the doctor is to perform a sequence of tests adaptively to identify the disease while minimizing the expected cost of the tests. In practice, the doctor may choose not to run tests but instead send the patient home to see whether the symptoms worsen. However, this is also costly and it may be challenging to predict what symptoms will appear and how much time it will take for them to appear. Combined with an algorithm for computing approximately optimal decision trees, our work shows how to incorporate the symptom monitoring component to efficiently identify the disease. Beyond these applications, our model fits well with many existing theoretical frameworks in learning theory. Here we take adaptive submodular optimization, a recently popular research direction in the field of machine learning as our example. Adaptive Submodularity with Feedback Motivated by applications on artificial intelligence, (Golovin & Krause, 2011) introduces the notion of adaptive submodularity, which was a popular research topic in the last decade. A function f(A, s) of a set of actions A and a random scenario s is adaptive submodular if Esf(A, s) is a submodular function. After an action a is taken, the learner will see an outcome s(a). 
Given the distribution of s, the learner will construct the action set A adaptively to optimize classic objectives for submodular functions (Fujishige, 2005) such as submodular maximization, min submodular coverage, and min sum submodular coverage. Many natural questions arise when feedback is involved in this framework. For example, if feedback is costly, how can we buy feedback to help us make adaptive decisions? If the feedback is free and time dependent, are existing policies still competitive? 1.4. Organization of paper In Section 2, we formally introduce the model studied by the paper. In Section 3, we introduce the super-martingale stopping problem to study buying information for offline stochastic optimization. We give a tight deterministic algorithm and a tight randomized algorithm for the supermartingale stopping problem. Furthermore, we will discuss the robustness of these algorithms. In Section 4, we focus on buying information for adaptive stochastic optimization. We introduce the model of time dependent feedback and build a connection between adaptive stochastic optimization with time dependent feedback and buying information for adaptive stochastic optimization in Section 4.1. In Section 4.2, we show a simple greedy learner is 4-competitive for Min Sum Set Cover with time dependent feedback. And in Section 4.3, we design an 8-competitive algorithm for buying information for Min Sum Set Cover. Furthermore, we discuss the information theoretic lower bound for Min Sum Set Cover under both settings. 2. Stochastic Optimization with Feedback 2.1. Feedback Signals for Stochastic Optimization Let S be a set of scenarios with a distribution D over S and let Y be a set of random variables over R. A randomized Buying Information for Stochastic Optimization signaling scheme f : S Y is a map from S to Y. Let s be a scenario drawn from D. A signal received from f is a realization y R of the random variable f(s). Similarly, a deterministic signaling scheme f : S R is a function from S to R. When a scenario s is drawn, a signal received from f is defined by y = f(s) R. In particular, any deterministic signaling scheme gives a partition of S. Given the definition of a signaling scheme, we are able to define feedback for stochastic optimization problems. Definition 2.1 (Feedback). Let (X, S, ℓ, D) be a stochastic optimization problem. A sequence of feedback F = {ft} t=0 is a sequence of unknown randomized (deterministic) signaling scheme. The tth feedback received by a learner is the pair (yt, D |ft(s)=yt,ft 1(s)=yy 1,...,f0(s)=y0), where yt is the signal from ft. For convenience, we assume f0 is a constant for every scenario, throughout the paper. Such an assumption is used to reflect the fact that the learner has no extra knowledge at time 0. In fact, for our model, randomized signaling schemes are equivalent to deterministic ones. We leave a discussion for this in Appendix A. In this paper, we consider deterministic signaling schemes. A deterministic signaling scheme can simplify our analysis and provide more intuition. In particular, if each signaling scheme f F is deterministic, then F can be represented as a tree. For such feedback F, we define a feedback tree T(F) as follows. Definition 2.2 (Feedback Tree). Let feedback F = {ft} t=0 be a set of deterministic signaling schemes. The feedback tree T(F) for F is a tree that is defined as follows. Each node v T(F) contains a set of scenarios and the children of v form a partition of the set of scenarios contained in v. 
The root of T(F) contains all scenarios. For every s S, let P(s) = (v0, v1, . . . , vn) be the longest path in T(F) such that every node in P(s) contains s. Then the set of scenarios contained in vi is defined by {s vi 1 | fi(s ) = fi(s)}. 2.2. Problem Formulation Although feedback is helpful for a learner to make better decisions for stochastic optimization problems, obtaining feedback always requires some cost. The cost can be either time or money. Thus, it is natural for a learner to consider how to balance the cost of asking for feedback and the cost of solving the optimization problem. We consider formulating this problem in an online fashion for offline and adaptive stochastic optimization problems. Definition 2.3 (Buying Information for Offline Stochastic Optimization). Let (X, S0, ℓ, D0) be an offline stochastic optimization problem and F = {ft} t=0 be a sequence of unknown feedback. Let C = {ct} t=0 be a sequence of cost for receiving a signal from ft+1 F. Here, ct : R Z+ is a nonnegative function that depends on the last received signal. In each time round t 0, a learner receives an offline stochastic optimization problem (X, St, ℓ, Dt) and a cost ct(yt) to obtain a signal from ft+1, where yt is the signal received from ft. Here, St = {s St 1 | yt dom(ft(s))} and Dt = Dt 1 |ft(s)=yt for t 1. The learner can either stop and pay Pt 1 j=0 cj(yj) + min A X Es Dtℓ(A, s) or enter the next time round. An offline stochastic optimization with feedback (X, S, ℓ, D, F, C) is to decide a stopping time T adaptively to minimize ET PT 1 j=0 cj(yj) + min A X Es DT ℓ(A, s) . Let I = (X, S, ℓ, D, F, C) be an instance of offline stochastic optimization with feedback, denote by cost(A, I) the cost of the stopping time output by a learner A for the given instance. A learner is α-competitive if for every instance (X, S, ℓ, D, F, C), cost(A, I) αOPT(I) = α min A cost(A, I). We can describe the problem in a more intuitive way in terms of the feedback tree. Let (X, S, ℓ, D) be a stochastic optimization problem and T(F) be a feedback tree. Each node v of T(F) represents a new stochastic optimization problem (X, Sv, ℓ, Dv), where Sv is the set of scenarios contained in v and Dv = D |s Sv. Solving this optimization problem needs a cost min A X Es Dvℓ(A, s). Each node also has a cost cv to move down for one step. The stochastic optimization problem and the cost will be revealed to the learner when the learner reaches v. T(F) is unknown to the learner and a path of T(F) is selected according to D initially. The learner will keep moving along the path by paying the cost cv and will decide when to stop and solve the optimization problem. The benchmark we want to compare is a learner who knows the whole feedback tree in advance and thus can compute the optimal stopping time. Definition 2.4 (Buying Information for Adaptive Stochastic Optimization). Let (X, S, ℓ, D) be an adaptive stochastic optimization problem. F = {ft} t=0 be a sequence of unknown feedback. Let C = {ct} t=0 be a sequence of cost for receiving a signal from ft+1 F. Here, ct : R Z+ is a nonnegative function that depends on the last received signal. Initially, a scenario s is drawn according to D. In each time round t, a learner first adaptively receives an arbitrary number of signals y(s) from the sequence F by paying the corresponding cost, then selects an action at X. Let T(s) be the number of signals received by the learner if s is drawn. 
An adaptive stochastic optimization problem with feedback is to make decisions to ask for feedback and to take actions adaptively in each time round so as to minimize E_{s∼D}[ℓ(A, s) + Σ_{j=0}^{T(s)-1} c_j(y_j(s))]. Let I = (X, S, ℓ, D, F, C) be an instance of adaptive stochastic optimization with feedback and denote by cost(A, I) the expected cost of the decisions made by a learner A on the given instance. A learner A is α-competitive if for every instance I = (X, S, ℓ, D, F, C), cost(A, I) ≤ α·OPT(I) = α·min_{A'} cost(A', I).

3. Buying Information for Offline Stochastic Optimization and the Super-Martingale Stopping Problem

Let (X, S, ℓ, D) be an offline stochastic optimization problem and f be a signaling scheme. Denote by D_y the posterior distribution of D after receiving signal y from f. Although it is possible that min_{A⊆X} E_{s∼D} ℓ(A, s) < min_{A⊆X} E_{s∼D_y} ℓ(A, s), it is always true that E_y[min_{A⊆X} E_{s∼D_y} ℓ(A, s)] ≤ min_{A⊆X} E_{s∼D} ℓ(A, s). That is to say, feedback is always helpful in expectation. This implies that the sequence of minimum values of the stochastic optimization problems is a super-martingale. Formally, given a sequence of feedback F, denote by D_i the posterior distribution after receiving signals from f_0, f_1, . . . , f_i. Let the random variable X_i = min_{A⊆X} E_{s∼D_i} ℓ(A, s). Then for every i ≥ 0, we have E(X_{i+1} | X_i) ≤ X_i. This motivates us to formulate the problem of buying information as the following super-martingale stopping problem. As we discuss in Appendix B, super-martingale stopping is equivalent to buying information for offline stochastic optimization.

3.1. Super-Martingale Stopping Problem

Definition 3.1 (Super-Martingale Stopping Problem). Let X_0, X_1, . . . , X_n be a sequence of nonnegative random variables unknown to the learner. Assume that for every i, E(X_{i+1} | X_i) ≤ X_i. The problem has n + 1 rounds. In the i-th round, given an observed realization of X_0, . . . , X_i, a learner decides either to stop and pay i + X_i, or to obtain the realization of X_{i+1} and go to the next round. The goal of the learner is to compute a decision rule that outputs a stopping time i* based only on the observed realization of the sequence, so as to minimize E(i* + X_{i*}).

For convenience, we assume X_0 is a constant throughout the paper. Suppose each random variable X_i has finite support; then the sequence can be represented by a tree T, where a node v with depth i stores a realization of X_i. To simplify the notation, we use v to denote both the node and the value stored at the node. When we make a single movement from node v, we reach a child v' of v with probability Pr(X_{i+1} = v' | X_i = v). An optimal learner knows the tree T in advance and can decide in advance at which nodes to stop so as to optimize the expected cost. Formally, a set of stopping nodes S is feasible for T if every path of T with length n contains one and only one stopping node. The cost of S is Σ_{v∈S} Pr(v)(depth(v) + v). We denote by OPT(T) the minimum cost among all feasible sets of stopping nodes of T. An algorithm is α-competitive if for every instance of the super-martingale stopping problem with representation T, the expected cost of the algorithm satisfies ALG(T) = E(i* + X_{i*}) ≤ α·OPT(T). In the ski-rental problem studied in (Karlin et al., 1994), there is a pair of positive numbers (B, T) such that X_i = B if i < T and X_i = 0 if i ≥ T. This implies that the ski-rental problem is a special case of the super-martingale stopping problem.
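To make the tree representation and the benchmark OPT(T) concrete, here is a small self-contained sketch (our illustration, not code from the paper). It stores realizations of the X_i at tree nodes and evaluates the benchmark with the recursion OPT(T) = min{v, 1 + Σ_i Pr(v_i)·OPT(T_i)} used in the proof of Theorem D.1; the ski-rental style instance at the bottom is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    value: float                      # realization of X_i stored at this node
    children: List[Tuple[float, "Node"]] = field(default_factory=list)  # (transition probability, child)

def opt(node: Node) -> float:
    """OPT for the subtree rooted at `node`: either stop here and pay the stored value,
    or pay 1 to move one level down, via OPT(T) = min{ v, 1 + sum_i Pr(v_i) * OPT(T_i) }."""
    if not node.children:
        return node.value
    move_down = 1.0 + sum(p * opt(child) for p, child in node.children)
    return min(node.value, move_down)

if __name__ == "__main__":
    # Hypothetical ski-rental style instance (B = 4, T = 3): X_0 = X_1 = X_2 = 4 and X_3 = 0,
    # encoded as a single path, i.e. a degenerate tree with one child per node.
    tree = Node(4.0, [(1.0, Node(4.0, [(1.0, Node(4.0, [(1.0, Node(0.0))]))]))])
    print(opt(tree))   # 3.0: walking down three steps to the free node beats stopping at cost 4
```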
Since the ski-rental problem is a special case of super-martingale stopping, we have the following information theoretic lower bounds.

Theorem 3.2. For every ϵ > 0, no randomized algorithm is (e/(e-1) - ϵ)-competitive for the super-martingale stopping problem.

Theorem 3.3. For every ϵ > 0, no deterministic algorithm is (2 - ϵ)-competitive for the super-martingale stopping problem.

Recall that the key idea in the design of algorithms for the ski-rental problem is to balance the payment X_i and the index i. However, this idea cannot simply be applied to the super-martingale stopping problem. There are two difficulties. First, since any algorithm can only get information from one path of the tree, it is hard to estimate the expected stopping time for the whole tree. Second, unlike most ski-rental type problems, the value X_i is not necessarily decreasing. It is possible that an algorithm moves for one step but sees an X_i with a very large value. We show in Appendix C that some natural algorithms that work for ski-rental problems are not competitive for the super-martingale stopping problem. On the other hand, in Appendix D, we establish a simple randomized 2-competitive algorithm for the super-martingale stopping problem using a completely novel idea.

Although the algorithm we present in Appendix D shows that competitive algorithms do exist for the super-martingale stopping problem, its competitive ratio doesn't match the information theoretic lower bounds in Theorem 3.2 and Theorem 3.3. In the following sections, we give a tight deterministic algorithm and a tight randomized algorithm for the super-martingale stopping problem. Furthermore, we also discuss the robustness of these algorithms when the input is not a super-martingale.

The key idea for designing our algorithms is to maintain the following estimator Q_p(t) throughout the execution of the algorithms. Let (X_0, . . . , X_n) be an instance of super-martingale stopping and let T be its tree representation. Initially, a path p = (v_0, v_1, . . . , v_n) of T is drawn randomly according to the joint distribution of (X_0, . . . , X_n). We define a function v_p(t) = v_i if t ∈ [i, i + 1). Furthermore, we define Q_p(t) = ∫_0^t 1/v_p(τ) dτ. In particular, Q_p(t) only depends on our observed realization and doesn't depend on the realization of the random variables we have not yet seen. We notice that Q_p(t) is strictly increasing with respect to t, and thus for every s ≥ 0 we can define its inverse function Q_p^{-1}(s) = t, where Q_p(t) = s. The power of Q is that it can be used to upper bound and lower bound the optimal stopping time, which is summarized by the following two lemmas that we will use frequently in our proofs. The proof of Lemma 3.5 can be found in Appendix E.1 due to a lack of space.

Lemma 3.4. Let T be a tree representation of an instance of the super-martingale stopping problem and let p be a path of T. Then for every s > r > 0, Q_p^{-1}(s) - Q_p^{-1}(r) = ∫_r^s v_p(Q_p^{-1}(w)) dw.

Proof. The proof follows from a change of variables. Writing w = Q_p(t), so that dw = dt / v_p(t), we have ∫_r^s v_p(Q_p^{-1}(w)) dw = ∫_{Q_p^{-1}(r)}^{Q_p^{-1}(s)} v_p(t) dQ_p(t) = ∫_{Q_p^{-1}(r)}^{Q_p^{-1}(s)} (v_p(t)/v_p(t)) dt = Q_p^{-1}(s) - Q_p^{-1}(r).

Lemma 3.5. Let v ∈ T be a node with depth i and let {p_j}_{j=1}^k be the set of paths that pass through v. For every ρ' ∈ [Q_{p_j}(i), Q_{p_j}(i + 1)] and for every ρ ≥ ρ', Σ_{j=1}^k Pr(p_j)(Q_{p_j}^{-1}(ρ) - Q_{p_j}^{-1}(ρ')) ≤ Pr(v)(ρ - ρ')·v.

3.2. A Tight Deterministic Algorithm for Martingale Stopping

In this section, we propose a simple deterministic 2-competitive algorithm for the super-martingale stopping problem. The competitive ratio is tight according to Theorem 3.3.
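The estimator Q_p(t) is piecewise linear, so it is easy to maintain online from the observed prefix (v_0, . . . , v_i). The sketch below (ours, assuming strictly positive observed values) computes Q_p at integer times, inverts it to find where a threshold ρ is reached, and samples a random threshold with density e^ρ/(e-1); with ρ = 1 this is the stopping rule of the deterministic algorithm in this subsection, while the random threshold previews the randomized algorithm of Section 3.3.

```python
import math
import random

def q_values(prefix):
    """Q_p at integer times for the observed prefix (v_0, ..., v_i): Q_p(j) = sum_{i < j} 1/v_i.

    Assumes all observed values are strictly positive, so Q_p is strictly increasing
    with slope 1/v_j on the interval [j, j+1)."""
    q = [0.0]
    for v in prefix:
        q.append(q[-1] + 1.0 / v)
    return q

def crossing_time(prefix, rho):
    """Smallest t with Q_p(t) = rho, or None if the observed prefix has not reached rho yet."""
    q = q_values(prefix)
    for j, v in enumerate(prefix):
        if q[j + 1] >= rho:                       # the threshold is crossed during [j, j+1)
            return j + (rho - q[j]) * v           # invert the linear piece Q_p(t) = q[j] + (t - j)/v_j
    return None

def random_threshold():
    """Draw rho in [0, 1] with density e^rho / (e - 1) by inverting its CDF (e^rho - 1)/(e - 1)."""
    u = random.random()
    return math.log(1.0 + u * (math.e - 1.0))

if __name__ == "__main__":
    observed = [4.0, 1.5, 2.0]                    # hypothetical realizations v_0, v_1, v_2
    print(q_values(observed))                     # [0.0, 0.25, 0.916..., 1.416...]
    print(crossing_time(observed, 1.0))           # 2.166...: "stop when Q_p reaches 1" fires at round 2
    print(crossing_time(observed, random_threshold()))
```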
We leave the proof for Appendix E.2 due to the space limit. Theorem 3.6. There is a deterministic poly-time algorithm that is 2-competitive for the super-martingale stopping problem. Algorithm 1 DETERMINISTICSTOPPING (2-competitive deterministic algorithm for super-martingale stopping) for i=0,1,2,. . . do Observe Xi = vi and compute Qp(t) for t [i, i + 1] based on the realization of X0, . . . , Xi. if t (i, i + 1] such that Qp(t) = 1 then Stop at time i and pay i + vi. end if end for In particular, if the sequence of random variables is monotone decreasing, then our algorithm can even compete against a prophet who knows the realization of the sequence in advance. Corollary 3.7. Let I be an instance of the super-martingale stopping problem and (X0, X1, . . . ) be the input sequence. Denote by ALG(I) the cost of Algorithm 1 over instance I. If (X0, X1, . . . ) is monotone decreasing, then ALG(I) 2E mini (i + Xi). Proof. Let x = (x0, x1, . . . ) be a realization of (X0, X1, . . . ) and denote by ALG(x) the cost of Algorithm 1 if the realization is x. Since x is monotone decreasing, we have ALG(x) 2 mini (i + xi). Thus, ALG(I) Ex2 min i (i + xi) = 2E min i (i + Xi) . 3.3. A Tight Randomized Algorithm for Martingale Stopping In this section, we extend the idea of Theorem 3.6 to obtain a e e 1-competitive randomized algorithm for the supermartingale stopping problem. Notice that according to Theorem 3.2, the competitive ratio is tight. Recall that in the Algorithm 1, we maintain an estimator QP (t) throughout the execution of the algorithm and stop when QP (t) = 1. To obtain a better randomized algorithm, we select a random threshold ρ initially, and stop when QP (t) exceeds this threshold. The proof of Theorem 3.8 can be found in Appendix E.3. Theorem 3.8. There is a randomized poly-time algorithm for the super-martingale stopping problem that is e e 1competitive. Algorithm 2 RANDOMIZEDSTOPPING ( e e 1-competitive algorithm for super-martingale stopping) Randomly draw a threshold ρ [0, 1] with a probability density function p(ρ) = eρ e 1. for i=0,1,2,... do Observe Xi = vi and compute Qp(t) for t [i, i + 1] based on the realization of X0, . . . , Xi. if t [i, i + 1] such that Qp(t) = ρ then Stop at time t and pay i + vi. {Every time we stop, i t.} end if end for Similarly, we have the following corollary, when the input sequence is monotone decreasing. Corollary 3.9. Let I be an instance of the super-martingale stopping problem and (X0, X1, . . . ) be the input sequence. Denote by ALG(I) the cost of Algorithm 2 over instance I. If (X0, X1, . . . ) is monotone decreasing, then ALG(I) e e 1E mini (i + Xi). Proof. Let x = (x0, x1, . . . ) be a realization of (X0, X1, . . . ) and denote by ALG(x) the cost of Algo- Buying Information for Stochastic Optimization rithm 2 if the realization is x. Since x is monotone decreasing, we have ALG(x) e e 1 mini (i + xi). Thus, ALG(I) Ex e e 1 min i (i + xi) = e e 1E min i (i + Xi) . 3.4. A Discussion on Benchmark In this section, we discuss the benchmark of the supermartingale stopping problem. According to Corollary 3.7 and Corollary 3.9, if the input sequence is monotone decreasing, then our algorithms can compete with a prophet who knows the realization of the sequence in advance. However, in general, it is not possible to compete against such a strong benchmark, since the gap between the two benchmarks can be arbitrarily large. Thus, it is only reasonable to compete with an algorithm that knows the structure of the feedback in advance. 
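As a quick numerical illustration of this gap (our sketch, following the hard instance in Appendix E.4 that drives the formal statement below): the sequence starts at X_0 = N and at each step either grows by a factor e^N with probability e^{-N} or drops to 0, so every X_i has mean N. A learner that knows the tree can do no better than paying N immediately, while the realization benchmark E[min_i (i + X_i)] stays O(1). The parameters N and the trial count are hypothetical.

```python
import math
import random

def sample_min_benchmark(N, horizon=200):
    """One sample of min_i (i + X_i) for the hard instance in Appendix E.4.

    X_0 = N, and X_{i+1} = e^N * X_i with probability e^{-N}, otherwise X_{i+1} = 0,
    so E[X_i] = N for every i while a realized path typically hits 0 within a step or two."""
    x, best = float(N), float(N)              # best tracks min over i of (i + X_i), starting at i = 0
    for i in range(1, horizon + 1):
        x = x * math.exp(N) if random.random() < math.exp(-N) else 0.0
        best = min(best, i + x)
        if x == 0.0:                           # once X_i = 0, i + X_i can only grow, so stop sampling
            break
    return best

if __name__ == "__main__":
    N, trials = 10, 10000
    est = sum(sample_min_benchmark(N) for _ in range(trials)) / trials
    # The benchmark that knows the tree pays N = 10, yet E[min_i (i + X_i)] is close to 1 here.
    print(f"tree-aware benchmark ~ {N}, realization benchmark ~ {est:.3f}")
```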
We formalize the discussion as the following theorem, whose proof is in Appendix E.4. Theorem 3.10. No algorithm is competitive against E mini (i + Xi) for the super-martingale stopping problem. 3.5. On the Robustness of Algorithm 1 and Algorithm 2 In this part, we consider the robustness of Algorithm 1 and Algorithm 2. Back to our motivation, buying information for offline stochastic optimization. In the model of buying information for offline stochastic optimization, we assume that given a stochastic optimization problem, the learner can solve the problem exactly. However, since most stochastic optimization problems are NP-hard, usually, the learner might only have an α-approximate algorithm to solve it. If Xi is the optimal value of the stochastic optimization problem after receiving the ith feedback, then the cost to solve the problem for the learner is instead Xi, where Xi [Xi, αXi]. That is to say, if the learner stops at Xi, he will pay i + Xi. We remark that in this case, X0, . . . , Xn may not satisfies the super-martingale property anymore, thus we cannot apply the analysis of Algorithm 1 and Algorithm 2 directly. However, we will show that the two algorithms are robust under such perturbation. In other words, Algorithm 1 is 2α-competitive and Algorithm 2 is e e 1α-competitive. Formally, we have the following theorem, whose proof is deferred to Appendix E.5. Theorem 3.11. Let T be a tree representation of an instance of the super-martingale stopping problem. Let T be any tree constructed by changing the value of every leaf v T by some value v [v, αv]. If we run Algorithm 1 over T, then ALG( T) 2αOPT(T) and if we run Algorithm 2 over T, then ALG( T) e e 1αOPT(T). 4. Buying Information for Adaptive Stochastic Optimization and Prophet Inequality Unlike offline stochastic optimization with feedback, buying information for adaptive stochastic optimization is much more problem-dependent. For this reason, we consider designing competitive learners to buy information for specific problems. We choose Min Sum Set Cover, an extreme case of the adaptive stochastic optimization problem as the first problem studied under the feedback setting. Definition 4.1 (Min Sum Set Cover). Let B = [n] be a set of boxes, each box i contains an unknown number bi {0, 1}. A learner can know bi by querying box i, i.e. the action space X = B. A scenario s {0, 1}n is a binary vector that represents the number contained in each box. If scenario s is realized, then for every box i B, si = bi. A scenario s is covered if a box i such that si = 1 is queried. Let S be a set of scenarios and D be a probability distribution over S. Let f be a sequence of feedback. A scenario s is drawn from D initially. In each round t, a learner takes an action at X to query the box at and observes the number contained in that box. Given an instance (B, S, D) of Min Sum Set Cover, the goal of a learner is to construct the sequence of boxes A to query to minimize Es Dℓ(A, s), where ℓ(A, s) is the number of boxes in A to query until the drawn scenario s is covered. The main contribution of this section can be broken down into two parts. In the first part, we give a general strategy to shrink the action space of buying information for a broad class of stochastic optimization problems. 
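Before describing the two parts, the following sketch (ours, with a hypothetical instance) spells out the objective of Definition 4.1 for a fixed query order: ℓ(A, s) is the position of the first queried box that covers s, and the goal is to minimize its expectation under D.

```python
def mssc_loss(order, s):
    """Number of boxes queried until scenario s is covered (Definition 4.1).

    order: sequence of boxes to query; s: binary tuple with s[i] = 1 iff box i covers s."""
    for t, box in enumerate(order, start=1):
        if s[box] == 1:
            return t
    return len(order)  # guard: every scenario is assumed to be covered by some box

def expected_cost(order, prior):
    """E_{s~D} of the MSSC loss for a fixed (non-adaptive) query order."""
    return sum(p * mssc_loss(order, s) for s, p in prior.items())

if __name__ == "__main__":
    # Hypothetical instance with 3 boxes and 3 scenarios.
    prior = {(1, 0, 0): 0.6, (0, 1, 0): 0.3, (0, 1, 1): 0.1}
    print(expected_cost((0, 1, 2), prior))   # 1.4: query box 0, then 1, then 2
    print(expected_cost((1, 0, 2), prior))   # 1.6: querying box 1 first is worse here
```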
For such a class of problems, we show that if an α-prophet inequality exists for an adaptive stochastic optimization problem with time dependent feedback, which we will define later, then there is a 2α-competitive learner for buying information for adaptive stochastic optimization. In the second part, using such an idea, we construct an 8-competitive learner to buy information for Min Sum Set Cover(MSSC) by showing a 4-prophet inequality for MSSC with time dependent feedback. Furthermore, we will establish information theoretic lower bounds for MSSC under both settings. 4.1. Time Dependent Feedback and Prophet Inequality A prophet inequality for an adaptive stochastic optimization is established when a signal arrives from ft for free in each round. Formally, we have the following model. Definition 4.2 (Adaptive Stochastic Optimization with Time Dependent Feedback). Let (X, S, ℓ, D) be an adaptive stochastic optimization problem. F = {ft} t=0 be a sequence of feedback. Initially, a scenario s is drawn according to D. In each time round t, a learner receives a signal yt(s) from ft(s), then takes an action at X. An adaptive stochastic optimization problem with time dependent Buying Information for Stochastic Optimization feedback (X, S, ℓ, D, F) is to make decisions to construct a sequence of actions A adaptively to minimize Es Dℓ(A, s). If we denote by cost(A, I) be the expected cost of a learner A at a given instance I, then a learner is α-competitive if for every instance I, cost(A, I) min A cost(A, I). In particular, here we are competing with a learner who knows F in advance. We say a stochastic optimization (X, S, ℓ, D) satisfies an α-prophet inequality if there is an α-competitive learner for the corresponding stochastic optimization problem with time dependent feedback. We have the following theorem to establish the relation between the two problems. Theorem 4.3. If there is an α-competitive learner for Min Sum Set Cover with Time Dependent Feedback, then there is a 2α-competitive learner for Buying Information for Min Sum Set Cover. Although the statement of Theorem 4.3 is on MSSC here, the same results actually hold for a broader class of problems, where the loss function can be written as a covering function. Due to space limitations, we leave the general statement and the proof of Theorem 4.3 for Appendix F.1. 4.2. Min Sum Set Cover with Time Dependent Feedback In this part, we establish a 4-prophet inequality for MSSC with time dependent feedback via the following theorem. Theorem 4.4. Algorithm 3, a simple greedy learner is 4competitive for Min Sum Set Cover with Time Dependent Feedback. Algorithm 3 GREEDY (4-competitive Learner for MSSC with Time Dependent Feedback) for t=0,1,2,. . . do Receive scenario set St that are consistent with the feedback and outcomes received so far. Compute Pr(i) := P s St:si=1 Pr(s | St) Query any box i arg max{Pr(i) | i B}. if s i = 1 then return end if end for Here we give an overview of our proof, the whole proof is deferred to Appendix F.2. Our proof is based on a linear programming approach. Assume the feedback is known in advance, then the problem becomes to assign a box for each node of the feedback tree T(F) to minimize the average number of boxes used to cover the drawn scenario. This problem can be naturally lower bounded by a linear program, and thus every feasible solution to the dual of the linear program gives a lower bound for OPT. 
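Before turning to how the dual solution is constructed, here is a small runnable sketch of the greedy learner itself (our illustration of Algorithm 3; the time dependent feedback is modeled as one deterministic signaling scheme per round, and the instance at the bottom is hypothetical). In each round the learner conditions the surviving scenarios on the new signal, queries the box with the largest posterior probability of containing a 1, and otherwise discards the scenarios inconsistent with the observed 0.

```python
import random

def greedy_mssc(prior, signals, scenario):
    """Sketch of Algorithm 3 (GREEDY) for MSSC with time dependent feedback.

    prior:    dict mapping scenario (tuple of 0/1 entries, one per box) -> probability
    signals:  list of deterministic signaling schemes; signals[t](s) is the signal seen at round t
    scenario: the scenario actually drawn from the prior
    Returns the number of boxes queried until the drawn scenario is covered."""
    surviving = dict(prior)                       # scenarios consistent with all signals and outcomes so far
    n = len(scenario)
    for t in range(n):
        if t < len(signals):                      # free time dependent feedback for this round
            y = signals[t](scenario)
            surviving = {s: p for s, p in surviving.items() if signals[t](s) == y}
        total = sum(surviving.values())
        pr = [sum(p for s, p in surviving.items() if s[i] == 1) / total for i in range(n)]
        box = max(range(n), key=lambda i: pr[i])  # query the box most likely to contain a 1
        if scenario[box] == 1:
            return t + 1                          # covered after t + 1 queries
        surviving = {s: p for s, p in surviving.items() if s[box] == 0}
    return n

if __name__ == "__main__":
    prior = {(1, 0, 0): 0.5, (0, 1, 0): 0.3, (0, 0, 1): 0.2}
    signals = [lambda s: s[0]]                    # hypothetical feedback: round 0 reveals whether box 0 has a 1
    s = random.choices(list(prior), weights=list(prior.values()))[0]
    print(s, greedy_mssc(prior, signals, s))
```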
We will show that a simple greedy algorithm with no knowledge of T(F) can be used to construct a feasible solution to the dual program such that the cost of the greedy algorithm is at most a quarter times the dual objective of the solution it constructs. By Theorem 13 in (Feige et al., 2004), we know that for every ϵ > 0, it is NP-hard to approximate MSSC within a ratio of 4 ϵ. MSSC is a very special case of MSSC with Time Dependent Feedback, thus the result given by Theorem 4.4 is tight if we only consider learners that can be implemented in poly-time. However, in the classic MSSC, if we allow a learner to be implemented in super-polynomial time, then we can simply compute the optimal order of box to query using a brute force method. This gives a natural question. Is the knowledge of F useful? We show that such knowledge is indeed useful by giving the following information theoretical lower bound for MSSC with Time Dependent Feedback. That is to say, we consider all learners regardless of their running time. We establish the following information theoretic lower bound for MSSC with time dependent feedback. The proof is deferred to Appendix F.3. Theorem 4.5. For every ϵ > 0, there is no deterministic learner that is 2 ϵ-competitive for Min Sum Set Cover with Time Dependent Feedback. 4.3. Buying Information for Min Sum Set Cover In the last section, we establish a prophet inequality for MSSC. In this section, we go back to the original motivation of buying feedback for adaptive stochastic optimization to discuss the upper bound and information theoretic lower bound for MSSC when asking for feedback requires some cost. The model of the problem is given as follows. According to Theorem 4.4 and Theorem 4.3, we can immediately obtain an efficient competitive learner to buy feedback for Min Sum Set Cover, which is described in Algorithm 4. Theorem 4.6. There is a poly-time learner that is 8competitive for buying information for Min Sum Set Cover. The main goal of this section is to obtain an information theoretical lower bound for buying information for MSSC. We establish the information theoretic lower bound via the following theorem, whose proof is in Appendix F.4. Theorem 4.7. For every ϵ > 0, there is no deterministic algorithm that is 2 ϵ-competitive for buying information for Min Sum Set Cover. 5. Acknowledgements This work was supported by the NSF Award CCF-2144298 (CAREER). Buying Information for Stochastic Optimization Algorithm 4 GREEDYBUYING (8-competitive learner for MSSC with Feedback) for t = 0, 1, 2, . . . do Receive scenario set St that are consistent with the feedback and outcomes received so far. Receive the cost ct to receive a signal yt+1 from ft+1 for j = 1 . . . , ct do Compute Pr(i) := P s St:si=1 Pr(s | St). Query any box i arg max{Pr(i) min i B}. if s i = 1 then return else Update St St {s S | si = 0}. end if end for Pay ct to obtain signal yt+1 from ft+1. end for Adler, M. and Heeringa, B. Approximating optimal binary decision trees. Algorithmica, 62(3):1112 1121, 2012. Bar-Noy, A., Bellare, M., Halld orsson, M. M., Shachnai, H., and Tamir, T. On chromatic sums and distributed resource allocation. Information and Computation, 140 (2):183 202, 1998. Bar-Noy, A., Halld orsson, M. M., and Kortsarz, G. A matched approximation bound for the sum of a greedy coloring. Information Processing Letters, 71(3-4):135 140, 1999. Bottou, L. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT 2010, pp. 177 186. Springer, 2010. 
Chawla, S., Gergatsouli, E., Teng, Y., Tzamos, C., and Zhang, R. Pandora s box with correlations: Learning and approximation. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pp. 1214 1225. IEEE, 2020. Dasgupta, S. Analysis of a greedy active learning strategy. Advances in neural information processing systems, 17, 2004. Emek, Y., Feldman, M., Gamzu, I., Paes Leme, R., and Tennenholtz, M. Signaling schemes for revenue maximization. ACM Transactions on Economics and Computation (TEAC), 2(2):1 19, 2014. Feige, U., Lov asz, L., and Tetali, P. Approximating min sum set cover. Algorithmica, 40(4):219 234, 2004. Fujishige, S. Submodular functions and optimization. Elsevier, 2005. Golovin, D. and Krause, A. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. Journal of Artificial Intelligence Research, 42:427 486, 2011. Hartline, J. D. Mechanism design and approximation. Book draft. October, 122(1), 2013. Karlin, A. R., Manasse, M. S., Mc Geoch, L. A., and Owicki, S. Competitive randomized algorithms for nonuniform problems. Algorithmica, 11(6):542 571, 1994. Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. ar Xiv preprint ar Xiv:1412.6980, 2014. Langley, P. Crafting papers on machine learning. In Langley, P. (ed.), Proceedings of the 17th International Conference on Machine Learning (ICML 2000), pp. 1207 1216, Stanford, CA, 2000. Morgan Kaufmann. Li, R., Liang, P., and Mussmann, S. A tight analysis of greedy yields subexponential time approximation for uniform decision tree. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 102 121. SIAM, 2020. Nisan, N. and Ronen, A. Algorithmic mechanism design. In Proceedings of the thirty-first annual ACM symposium on Theory of computing, pp. 129 140, 1999. Roughgarden, T. Twenty lectures on algorithmic game theory. Cambridge University Press, 2016. Settles, B. Active learning. Synthesis lectures on artificial intelligence and machine learning, 6(1):1 114, 2012. Shalev-Shwartz, S., Shamir, O., Srebro, N., and Sridharan, K. Stochastic convex optimization. In COLT, pp. 5, 2009. Streeter, M. and Golovin, D. An online algorithm for maximizing submodular functions. Advances in Neural Information Processing Systems, 21, 2008. Weitzman, M. L. Optimal search for the best alternative. Econometrica: Journal of the Econometric Society, pp. 641 654, 1979. Buying Information for Stochastic Optimization A. Equivalence of Randomized and Deterministic Signaling Schemes In our model, it is sufficient to study the case when each signaling scheme is deterministic. In this part, we give a brief discussion on the equivalence of randomized and deterministic signaling schemes. Given a set of scenarios S, a distribution D over S, and a randomized signaling scheme f. We show we can construct a modified triple (S , D , f ) such that f is a deterministic signaling scheme and (S , D , f ) is equivalent to (S, D, f). The triple is constructed in the following way. S contains multiple copies for each s S. D is a uniform distribution over S . For every s S, assume the range of f(s) is {y1(s), . . . , yk(s)} and the set of copies is {Y1(s), . . . , Yk(s)} accordingly. The sizes of the copies are made such that if we draw a scenario according to D , the probability that it is a copy of s is equal to the probability of obtaining s from D. Furthermore, if we uniformly draw a copy from {Y1(s), . . . 
, Yk(s)} the probability that we obtain a copy from Yi(s) is equal to the probability that we receive yi(s) from f(s). In this way, we define f (s ) = yi(s) if s Yi(s). Thus, we obtain an equivalent triple (S , D , f ) with a deterministic signaling scheme. B. Equivalence of Super-Martingale Stopping and Buying Information for Offline Stochastic Optimization In this part, we give a brief discussion on the equivalence of the super-martingale stopping problem and buying information for offline stochastic optimization problems. We have seen that the super-martingale stopping problem is a special case of buying information for offline stochastic optimization. To see the other direction, it remains to see that given an instance I = (X, S, ℓ, D, F, C) of buying information for offline stochastic optimization, we can assume each ct = 1 C. We give the intuition here via the definition of feedback tree. Let T(F) be a feedback tree. Assume a learner arrives at a node v of T(F), the posterior distribution of the stochastic optimization problem at v is Dv and the cost to move to the next node v is cv. Then we can add cv 1 virtual nodes between v and v such that the posterior distribution at each node is Dv and the cost to move to the next node is 1. After the modification, we can run any algorithm for the super-martingale stopping problem over the modified instance. We pay cv to move to v if and only if we reach v in the modified instance. In this way, any α-competitive algorithm for the super-martingale stopping problem can be used to construct an α-competitive learner to buy information for offline stochastic optimization problems. C. Natural Algorithms Fail for Martingale Stopping Problem In this section, we show some natural algorithms that work for ski-rental problems but fail for the super-martingale stopping problem. According to (Karlin et al., 1994), it is well-known that the following algorithm is 2-competitive for the ski-rental problem. Algorithm 5 CLASSICSKIRENTAL (2-competitive deterministic algorithm for ski-rental) for i = 0, 1, 2, . . . do Observe Xi = vi if vi i then Stop and pay i + vi. end if end for Theorem C.1. Algorithm 5 is not competitive for the super-martingale stopping problem. Proof. We construct a sequence of instance In of the super-martingale stopping problem. Denote by ALG(In) the cost of Algorithm 5 over instance In and denote by OPT(In) the optimal cost of In. We will show that ALG(In) Hn OPT(In), where Hn is the nth harmonic number. Let (X0, . . . , Xn) be the sequence of random variables for instance In. Define X0 = 1 to be a constant. For every i 1, Xi can take two possible values. Given Xi 1, Xi = 0 with probability 1 i+1 and with probability i i+1, Xi = i+1 i Xi 1. That is to say, Xi is either 0 or i + 1 and EXi = 1. Assume we run Algorithm 5 over instance In. Suppose we just observe Xi. If Xi = 0, then we stop and pay i right away. Buying Information for Stochastic Optimization If Xi = i + 1, then Algorithm 5 will keep querying Xi+1. Denote by i the random variable of the stopping time of Algorithm 5. Then, we have 1 i + 1 = Hn 1. On the other hand, we know from the construction of the instance that EXi = 1, since EXi = 1 for every i. Thus the total cost of the algorithm is ALG(In) = Ei + Xi = Hn. On the other hand, we have OPT(In) 1, since it can simply stop at the beginning. This gives ALG(In) Hn OPT(In), which implies that Algorithm 5 is not competitive. The reason why Algorithm 5 fails is that (X1, . . . 
, Xn) might be an increasing sequence, which forces the algorithm to keep querying the next box forever. To avoid keeping querying boxes forever, a natural idea is to change the stopping rule by looking at the smallest value we have seen so far. However, it turns out that such a stopping rule still fails. We consider the following algorithm. Algorithm 6 REVISEDSKIRENTAL (2-competitive deterministic algorithm for ski-rental) for i = 0, 1, 2, . . . do Observe Xi = vi and set v = min{v0, . . . , vi}. if v i then Stop and pay i + vi. end if end for Theorem C.2. Algorithm 6 is not competitive for the super-martingale stopping problem. Proof. We construct a sequence of instance In of the super-martingale stopping problem. Denote by ALG(In) the cost of Algorithm 6 over instance In and denote by OPT(In) the optimal cost of In. We will show that ALG(In) Ω(n)OPT(In). Let (X0, . . . , Xn, Xn+1) be the sequence of random variables of instance In of the super-martingale stopping problem. Define X0 = n and Xn+1 = 0. For i [n], Xi can take two possible values. Given Xi 1, Xi = en Xi 1 with probability e n and Xi = 0 with probability 1 e n. That is to say for i [n], EXn = n. Notice that according to the stopping rule of Algorithm 6, Xn+1 will never be queried by the algorithm. Thus, we have ALG(In) EXi = n. On the other hand, we consider an algorithm that keeps querying Xi+1 if Xi = 0. Denote by i the stopping time of this algorithm. We know that EXi = 0. Furthermore, we have i=1 i(1 e n) j=1 e n e n n+1 X i=1 i O(1). This implies that OPT(In) Ei + Xi O(1), while ALG(In) Ω(n). Thus, Algorithm 6 is not competitive. D. A Simple Randomized Algorithm for Martingale Stopping Problem In this section, we give a simple randomized 2-competitive algorithm for the super-martingale stopping problem. Algorithm 7 THROWCOIN (Simple 2-competitive randomized algorithm for super-martingale stopping) for i = 0, 1, 2, . . . do Observe Xi = vi. Stop and pay i + vi with probability min{1, 1/vi}. end for Buying Information for Stochastic Optimization Theorem D.1. Algorithm 7 is 2-competitive for the super-martingale stopping problem. Proof. Let (X0, . . . , Xn) be a sequence of random variables, and let T be the tree representation of the sequence. Denote by OPT(T) the optimal cost of the instance and denote by ALG(T) the cost of Algorithm 7 over the instance. We prove the theorem using inductions on the number of random variables, which is also the depth of T. If depth(T) = 0, which means there is only one random variable X0 in the sequence, the cost of any algorithm is X0 and the theorem holds trivially. Assuming the theorem holds for any tree with depth n k, we show the theorem holds for any tree with depth n k 1. Let T be a tree of an instance of super-martingale stopping problem such that depth(T) = n k 1. Let v be the root of T and let v1, . . . , vk be the children of v. Denote by T i the subtree rooted at vi. By a dynamic programming approach, we know that OPT(T) = min{v, 1 + i=1 Pr(vi)OPT(T i)}. We consider two cases. In the first case, OPT(T) = v. Without loss of generality, we assume v > 1, otherwise, the algorithm will simply stop at v. The cost of Algorithm 7 is i=1 Pr(vi)ALG(T i)) i=1 Pr(vi)OPT(T i)) i=1 Pr(vi)vi) v + 2v 2 2v. Here, in the first inequality, we use the induction hypothesis, in the third inequality, we use the super-martingale property. In the second case, OPT(T) = 1 + Pk i=1 Pr(vi)OPT(T i). 
Similarly, we have i=1 Pr(vi)ALG(T i)) i=1 Pr(vi)OPT(T i)) i=1 Pr(vi)OPT(T i) This shows that for every instance with a tree representation T, ALG(T) 2OPT(T). This implies Algorithm 7 is 2-competitive. E. Miss Proof in Section 3 E.1. Proof of Lemma 3.5 Proof. We prove this lemma using induction on the depth of v. If v has a depth of n (v is a leaf), then Lemma 3.5 follows directly by Lemma 3.4, since Q 1 pj (ρ) Q 1 pj (ρ ) = R ρ ρ vdw = (ρ ρ )v. Assume Lemma 3.5 holds for every node v with depth n k, we show this for a node v with depth n k 1. We notice that if ρ Qpj(n k), then this is correct by Lemma 3.4. So in the rest of the proof, we assume ρ > Qpj(n k). Let {uj}ℓ j=1 be the set of children of v and let Buying Information for Stochastic Optimization D(uj) {pj}k j=1 be the set of paths that passes uj. then we have j=1 Pr(pj)(Q 1 pj (ρ) Q 1 pj (ρ )) = j=1 Pr(pj)(n k Q 1 pj (ρ )) + p D(uj) Pr(p)(Q 1 pj (ρ) (n k)) j=1 Pr(pj)(Qpj(n k) ρ )v + j=1 Pr(uj)(ρ (Qp(n k)))uj = Pr(v)(Qp(n k) ρ )v + j=1 Pr(uj)(ρ (Qp(n k)))uj Pr(v)(Qp(n k) ρ )v + Pr(v)(ρ (Qp(n k)))v = Pr(v)(ρ ρ )v. Here, in the first inequality, we use the assumption of induction and in the second inequality, we use the fact that Pℓ j=1 Pr(uj | v)uj uj. E.2. Proof of Theorem 3.6 Proof. We show Algorithm 1 is 2-competitive. Let T be a tree representation of an instance of the super-martingale stopping problem. We maintain two sets of nodes O and A in the following way. For each path p T. We travel down p from the root of T and stop traveling at a node v of p if either v is a stopping node of OPT(T) or it is a stopping node of Algorithm 1. In the first case, we add v to O, otherwise, we add it to A. We denote by T the subtree of T with the set of leaves A O. Furthermore, let P(v) be the path of T that ends at v O A. Then we have the following lower bound for OPT(T). v O Pr(v) (v + depth(v)) + X v A Pr(v)depth(v) To upper bound ALG(T), we will need to establish the following inequality and claim. Let P be a path such that there is some v P O, then there must be some stopping node f P (v) P of ALG(T) that has v as its ancestor. Let D(v) be the set of paths that passes v. We know from the stopping rule of Algorithm 1 that for every P D(v), QP (depth(f P (v))) 1. By Lemma 3.5, we have P D(v) Pr(P)(depth(f P (v)) depth(v)) X P D(v) Pr(P)(Q 1 P (1) depth(v)) Pr(v) (1 QP (v)) v Pr(v)v. (1) Furthermore, we next prove the following claim. Claim 1. Let T be a subtree of T with the same root of T. Let S be the set of leaves of T . For each v S, denote by P(v) the path from the root to v. If every path of T has a node in S, then P v S Pr(v)v QP (v)(depth(v)) P v S Pr(v)depth(v). Proof of Claim. Let v S be a leave of T . Assume that depth(v) = i and P(v) = (v0, . . . , vi). Then v QP (v)(i) = vi Now we prove this claim by induction on the depth of T . If T has a depth of 0, then the claim holds trivially. Now we assume the claim for any tree with depth k, we show this holds for a tree T with depth k + 1. We remove the nodes with depth k + 1 in T and denote by the remaining tree T. Denote by S the leaves of T and denote by K the set of leaves of T Buying Information for Stochastic Optimization with depth k. For every node v, let N(v) be the set of children of v. 
Then we have v S p(v)v QP (v)(depth(v)) = X v S Pr(v)v QP (v)(depth(v)) v K Pr(v) X Pr(u | v)(u QP (u)(k + 1) v QP (v)(k)) v S Pr(v)depth(v) + X v K Pr(v) X Pr(u | v)(u QP (u)(k + 1) v QP (v)(k)) v S Pr(v)depth(v) + X v K Pr(v) X Pr(u | v)u(QP (u)(k + 1) QP (v)(k)) v S Pr(v)depth(v) + X v K Pr(v) = X v S Pr(v)depth(v). Here, the first inequality follows by our induction, the second inequality follows by the super-martingale property and the second equality follows by (2). This gives the following upper bound for ALG(T). Pr(v) (v + depth(v)) + X P D(v) Pr(P)(depth(f P (v)) depth(v)) v A Pr(v) (v + depth(v)) v O Pr(v) (depth(v) + 2v) + X v A Pr(v) (v + depth(v)) v O Pr(v) (depth(v) + 2v) + X v A Pr(v) v(QP (v)(depth(v)) + 1) + depth(v) v O Pr(v) (depth(v) + 2v) + X v A Pr(v) v(QP (v)(depth(v)) + 1) + depth(v) v O Pr(v) v QP (v)(depth(v)) v O Pr(v)v + 2 X v A Pr(v)v + 2 X v A Pr(v)depth(v) + 2 X v A Pr(v)depth(v) Here, in the first inequality, we used the super-martingale property of T. In the second inequality, we use (1). In the third inequality, we use the stopping rule of Algorithm 1. The second last inequality follows by Claim 1. E.3. Proof of Theorem 3.8 Proof. We show Algorithm 2 is e/(e 1)-competitive. Let T be a representation of an instance of the super-martingale stopping problem. Let S = {v j }k j=1 be the set of stopping nodes of OPT(T). Let u S and let P(u) T be the path from the root to u. We notice that we can assume the depth of u is at most Q 1 P (u)(1). Since the cost of Algorithm 2 only depends on the value of nodes with depth strictly less than Q 1 P (u)(1), we can assume every node with a depth larger than Q 1 P (u)(1) has a value of 0. This assumption doesn t affect the cost of Algorithm 2 but will force every u S has depth at most Q 1 P (u)(1) . Under this assumption, if a node u has depth exactly Q 1 P (u)(1) , we can furthermore assume the contribution of u to the cost of OPT(T) is Q 1 P (u)(1). This will only decrease the cost of OPT(T). So in the rest of the proof, every u in S has a depth at most Q 1 P (u)(1). In particular, this implies for every u S, there exists some ρu [0, 1] such Buying Information for Stochastic Optimization that QP (u)(depth(u)) = ρu. Thus, we can write u S Pr(u) u + Q 1 P (u)(ρu) . On the other hand, we can decompose the cost of the algorithm according to S. For every u S, we define D(u) to be the set of paths in T from the root to a leaf that passes u. Then we can write the cost of the algorithm u S Pr(u) X p D(u) Pr(p | u) Z 1 vp (Q 1 p (ρ)) + Q 1 p (ρ) p(ρ)dρ, where we use the fact that when the algorithm stops at time t, the depth of the stopping node is at most t. This implies ALG(T) OPT(T) X u S Pr(u) Z 1 p D(u) Pr(p | u) h vp (Q 1 p (ρ)) + Q 1 p (ρ) u + Q 1 P (u)(ρu) i p(ρ)dρ u S Pr(u) Z ρu p D(u) Pr(p | u) h vp (Q 1 p (ρ)) + Q 1 p (ρ) u + Q 1 P (u)(ρu) i p(ρ)dρ u S Pr(u) Z 1 p D(u) Pr(p | u) h u + Q 1 P (u)(ρu) vp (Q 1 p (ρ)) + Q 1 p (ρ) i p(ρ)dρ u S Pr(u) Z ρu p D(u) Pr(p | u) h vp (Q 1 p (ρ)) + Q 1 p (ρ) u + Q 1 P (u)(ρu) i p(ρ)dρ u S Pr(u) Z 1 p D(u) Pr(p | u) Q 1 p (ρ) Q 1 P (u)(ρu) p(ρ)dρ u S Pr(u) Z ρu p D(u) Pr(p | u) h vp (Q 1 p (ρ)) + Q 1 p (ρ) u + Q 1 P (u)(ρu) i p(ρ)dρ u S Pr(u) Z 1 ρu (ρ ρu)up(ρ)dρ u S Pr(u) Z ρu v P (u)(Q 1 P (u)(ρ)) + Q 1 P (u)(ρ) Q 1 P (u)(ρu) p(ρ)dρ u S Pr(u) Z 1 ρu (ρ ρu)up(ρ)dρ Z ρu 0 up(ρ)dρ . Here, the second inequality follows the super-martingale property of the sequence of random variables starting from node u. The second inequality follows by Lemma 3.5. 
E.3. Proof of Theorem 3.8

Proof. We show that Algorithm 2 is $e/(e-1)$-competitive. Let $T$ be a tree representation of an instance of the super-martingale stopping problem. Let $S = \{v_j^*\}_{j=1}^{k}$ be the set of stopping nodes of $\mathrm{OPT}(T)$. Let $u \in S$ and let $P(u) \subseteq T$ be the path from the root to $u$. We first argue that we may assume the depth of $u$ is at most $Q_{P(u)}^{-1}(1)$. Since the cost of Algorithm 2 only depends on the values of nodes with depth strictly less than $Q_{P(u)}^{-1}(1)$, we can assume every node of depth larger than $Q_{P(u)}^{-1}(1)$ has value $0$. This assumption does not affect the cost of Algorithm 2, but it forces every $u \in S$ to have depth at most $Q_{P(u)}^{-1}(1)$. Under this assumption, if a node $u$ has depth exactly $Q_{P(u)}^{-1}(1)$, we can furthermore assume the contribution of $u$ to the cost of $\mathrm{OPT}(T)$ is $Q_{P(u)}^{-1}(1)$; this only decreases the cost of $\mathrm{OPT}(T)$. So in the rest of the proof, every $u \in S$ has depth at most $Q_{P(u)}^{-1}(1)$. In particular, this implies that for every $u \in S$ there exists some $\rho_u \in [0,1]$ such that $Q_{P(u)}(\mathrm{depth}(u)) = \rho_u$. Thus, we can write
$$\mathrm{OPT}(T) = \sum_{u\in S}\Pr(u)\Big(u + Q_{P(u)}^{-1}(\rho_u)\Big).$$
On the other hand, we can decompose the cost of the algorithm according to $S$. For every $u \in S$, define $D(u)$ to be the set of paths in $T$ from the root to a leaf that pass through $u$. Then we can write
$$\mathrm{ALG}(T) \le \sum_{u\in S}\Pr(u)\sum_{p\in D(u)}\Pr(p\mid u)\int_0^1\Big(v_p\big(Q_p^{-1}(\rho)\big) + Q_p^{-1}(\rho)\Big)\,p(\rho)\,d\rho,$$
where we use the fact that when the algorithm stops at time $t$, the depth of the stopping node is at most $t$. This implies
$$\begin{aligned}
\mathrm{ALG}(T)-\mathrm{OPT}(T)
&\le \sum_{u\in S}\Pr(u)\int_0^1\sum_{p\in D(u)}\Pr(p\mid u)\Big[v_p(Q_p^{-1}(\rho))+Q_p^{-1}(\rho)-u-Q_{P(u)}^{-1}(\rho_u)\Big]p(\rho)\,d\rho\\
&= \sum_{u\in S}\Pr(u)\int_{\rho_u}^{1}\sum_{p\in D(u)}\Pr(p\mid u)\Big[v_p(Q_p^{-1}(\rho))+Q_p^{-1}(\rho)-u-Q_{P(u)}^{-1}(\rho_u)\Big]p(\rho)\,d\rho\\
&\quad+\sum_{u\in S}\Pr(u)\int_{0}^{\rho_u}\sum_{p\in D(u)}\Pr(p\mid u)\Big[v_p(Q_p^{-1}(\rho))+Q_p^{-1}(\rho)-u-Q_{P(u)}^{-1}(\rho_u)\Big]p(\rho)\,d\rho\\
&\le \sum_{u\in S}\Pr(u)\int_{\rho_u}^{1}\sum_{p\in D(u)}\Pr(p\mid u)\Big[Q_p^{-1}(\rho)-Q_{P(u)}^{-1}(\rho_u)\Big]p(\rho)\,d\rho\\
&\quad+\sum_{u\in S}\Pr(u)\int_{0}^{\rho_u}\Big[v_{P(u)}(Q_{P(u)}^{-1}(\rho))+Q_{P(u)}^{-1}(\rho)-u-Q_{P(u)}^{-1}(\rho_u)\Big]p(\rho)\,d\rho\\
&\le \sum_{u\in S}\Pr(u)\int_{\rho_u}^{1}(\rho-\rho_u)\,u\,p(\rho)\,d\rho\\
&\quad+\sum_{u\in S}\Pr(u)\int_{0}^{\rho_u}\Big[v_{P(u)}(Q_{P(u)}^{-1}(\rho))+Q_{P(u)}^{-1}(\rho)-Q_{P(u)}^{-1}(\rho_u)\Big]p(\rho)\,d\rho - \sum_{u\in S}\Pr(u)\int_{0}^{\rho_u}u\,p(\rho)\,d\rho.
\end{aligned}$$
Here, for $\rho \le \rho_u$ the algorithm stops no later than $u$, so $Q_p^{-1}(\rho) = Q_{P(u)}^{-1}(\rho)$ and $v_p(Q_p^{-1}(\rho)) = v_{P(u)}(Q_{P(u)}^{-1}(\rho))$ for every $p \in D(u)$; the second inequality follows from the super-martingale property of the sequence of values starting from node $u$, and the third inequality follows from Lemma 3.5.

Recall that our goal is to show
$$\mathrm{ALG}(T) \le \frac{e}{e-1}\,\mathrm{OPT}(T), \quad\text{i.e.,}\quad \mathrm{ALG}(T)-\mathrm{OPT}(T)-\frac{1}{e-1}\,\mathrm{OPT}(T) \le 0.$$
For every $u \in S$, we define two functions $F_u(s)$ and $G_u(\tau)$ as follows:
$$F_u(s) := \int_0^s\Big[v_{P(u)}(Q_{P(u)}^{-1}(\rho))+Q_{P(u)}^{-1}(\rho)-Q_{P(u)}^{-1}(s)\Big]p(\rho)\,d\rho - \frac{1}{e-1}\,Q_{P(u)}^{-1}(s),$$
$$G_u(\tau) := \int_{\rho_u}^{1}(\rho-\rho_u)\,\tau\,p(\rho)\,d\rho - \int_0^{\rho_u}\tau\,p(\rho)\,d\rho - \frac{1}{e-1}\,\tau.$$
From the above discussion, we know that
$$\mathrm{ALG}(T)-\frac{e}{e-1}\,\mathrm{OPT}(T) = \mathrm{ALG}(T)-\mathrm{OPT}(T)-\frac{1}{e-1}\,\mathrm{OPT}(T) \le \sum_{u\in S}\Pr(u)\big(F_u(\rho_u)+G_u(u)\big).$$
It is therefore sufficient to show that for every $u$, $F_u(s) \le 0$ for all $s \in [0, \rho_u]$ and $G_u(\tau) \le 0$ for all $\tau \ge 0$.

We first look at $F_u(s)$. Recall that the density function is $p(\rho) = \frac{e^{\rho}}{e-1}$. We know from Lemma 3.4 that $Q_{P(u)}^{-1}(s)-Q_{P(u)}^{-1}(\rho) = \int_{\rho}^{s} v_{P(u)}(Q_{P(u)}^{-1}(w))\,dw$, and therefore
$$\begin{aligned}
F_u(s) &= \int_0^s\Big[v_{P(u)}(Q_{P(u)}^{-1}(\rho)) - \int_{\rho}^{s} v_{P(u)}(Q_{P(u)}^{-1}(w))\,dw\Big]p(\rho)\,d\rho - \frac{1}{e-1}\int_0^s v_{P(u)}(Q_{P(u)}^{-1}(w))\,dw\\
&= \int_0^s v_{P(u)}(Q_{P(u)}^{-1}(w))\,\frac{e^{w}}{e-1}\,dw - \int_0^s v_{P(u)}(Q_{P(u)}^{-1}(w))\,\frac{e^{w}-1}{e-1}\,dw - \frac{1}{e-1}\int_0^s v_{P(u)}(Q_{P(u)}^{-1}(w))\,dw\\
&= 0,
\end{aligned}$$
where in the second equality we exchange the order of integration in the double integral and use $\int_0^w p(\rho)\,d\rho = \frac{e^{w}-1}{e-1}$.

Then we look at $G_u(\tau)$. We have
$$\int_{\rho_u}^{1}(\rho-\rho_u)\,p(\rho)\,d\rho - \int_0^{\rho_u}p(\rho)\,d\rho - \frac{1}{e-1} = \frac{e^{\rho_u}-\rho_u e}{e-1} - \frac{e^{\rho_u}-1}{e-1} - \frac{1}{e-1} = -\frac{\rho_u\,e}{e-1} \le 0.$$
This implies that $G_u(\tau) \le G_u(0) = 0$ for every $\tau \ge 0$. Putting the above arguments together, we obtain $\mathrm{ALG}(T) \le \frac{e}{e-1}\,\mathrm{OPT}(T)$.
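Algorithm 2 differs from Algorithm 1 only in that it draws its threshold $\rho$ from the density $p(\rho) = e^{\rho}/(e-1)$ on $[0,1]$ used throughout the proof above. Below is a minimal sketch of sampling this threshold by inverting its CDF $F(\rho) = (e^{\rho}-1)/(e-1)$, together with a quick Monte Carlo check that the empirical mean is close to $\int_0^1 \rho\,p(\rho)\,d\rho = 1/(e-1) \approx 0.582$; the helper names are ours.

```python
import math
import random

def sample_threshold(rng=random):
    """Sample rho in [0,1] with density p(rho) = e^rho / (e - 1) via inverse CDF."""
    u = rng.random()                       # uniform in [0, 1)
    return math.log(1.0 + u * (math.e - 1.0))

def randomized_stop(q_values, rng=random):
    """Algorithm 2-style rule: stop at the first depth t with Q_P(t) >= rho."""
    rho = sample_threshold(rng)
    for t, q in enumerate(q_values):
        if q >= rho:
            return t
    return len(q_values) - 1

if __name__ == "__main__":
    samples = [sample_threshold() for _ in range(200_000)]
    print(sum(samples) / len(samples))     # ~ 1/(e-1) ~ 0.582
```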
E.4. Proof of Theorem 3.10

Proof. Let $I$ be an instance of the super-martingale stopping problem. Let $\mathrm{OPT}(I) = \min_i \mathbb{E}(i + X_i)$ and $\mathrm{OPT}^*(I) = \mathbb{E}\min_i (i + X_i)$. We construct a sequence of instances $I_N$ such that $\mathrm{OPT}(I_N) \ge \Omega(N)\,\mathrm{OPT}^*(I_N)$, showing that the gap between the two benchmarks can be arbitrarily large. Let $(X_0, X_1, \ldots)$ be the sequence of random variables of instance $I_N$. Define $X_0 = N$. For every $i \ge 0$, $X_{i+1}$ can take two possible values: given $X_i$, $X_{i+1} = e^{N} X_i$ with probability $e^{-N}$ and $X_{i+1} = 0$ with probability $1 - e^{-N}$. That is to say, the sequence of random variables is a super-martingale with mean equal to $N$. Thus, the optimal stopping rule is to simply stop at $X_0$, and $\mathrm{OPT}(I_N) = N$.

On the other hand, consider any realization $(x_0, x_1, \ldots)$ of the sequence. We notice from the construction that if $x_i = 0$ then $x_j = 0$ for every $j > i$. Denote by $i^*$ the smallest index such that $x_{i^*} = 0$. Then $\min_i (i + x_i) = i^*$ if $i^* \le N$ and $\min_i (i + x_i) = N$ if $i^* > N$. Since $\Pr(i^* = i) = (1 - e^{-N})\,e^{-(i-1)N}$, we have
$$\mathrm{OPT}^*(I_N) = \sum_{i=1}^{N} i\,(1 - e^{-N})\,e^{-(i-1)N} + \sum_{i=N+1}^{\infty} N\,(1 - e^{-N})\,e^{-(i-1)N} = O(1).$$
This implies $\mathrm{OPT}(I_N) \ge \Omega(N)\,\mathrm{OPT}^*(I_N)$.
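The gap between the two benchmarks in the proof above can be checked numerically. The sketch below evaluates, for the instance $I_N$, the prophet benchmark $\mathbb{E}\min_i(i+X_i)$ in closed form using $\Pr(i^*=i)=(1-e^{-N})e^{-(i-1)N}$, and compares it against $\mathrm{OPT}(I_N)=N$; the truncation of the tail sum is ours, but it follows directly from the geometric form of the distribution.

```python
import math

def prophet_benchmark(N):
    """E[min_i (i + X_i)] for the instance I_N from the proof of Theorem 3.10."""
    q = math.exp(-N)                      # survival probability per step
    # min_i (i + x_i) = i* if i* <= N, and = N if i* > N,
    # where Pr(i* = i) = (1 - q) * q^(i - 1).
    expectation = sum(i * (1 - q) * q ** (i - 1) for i in range(1, N + 1))
    expectation += N * q ** N             # tail: Pr(i* > N) = q^N
    return expectation

for N in (2, 5, 10, 20):
    print(N, prophet_benchmark(N))        # stays O(1) (about 1), while OPT(I_N) = N
```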
E.5. Proof of Theorem 3.11

Proof. It is sufficient to show that if we run Algorithm 1 or Algorithm 2, then $\mathrm{ALG}(\tilde T) \le \alpha\,\mathrm{ALG}(T)$. Recall that the only difference between Algorithm 1 and Algorithm 2 is the threshold $\rho$ they use: Algorithm 1 uses $\rho = 1$ and Algorithm 2 uses a random threshold. Let $\rho \in [0,1]$ be a realization of the threshold used by Algorithm 1 or Algorithm 2. We denote by $\mathrm{ALG}_\rho(\tilde T)$ and $\mathrm{ALG}_\rho(T)$ the cost of the algorithm on the corresponding instances with threshold $\rho$. In the rest of the proof, we show $\mathrm{ALG}_\rho(\tilde T) \le \alpha\,\mathrm{ALG}_\rho(T)$ for every $\rho \in [0,1]$, which directly implies $\mathrm{ALG}(\tilde T) \le \alpha\,\mathrm{ALG}(T)$.

Since the only difference between $T$ and $\tilde T$ is the value of each node, for a path $p = (v_0, v_1, \ldots, v_n)$ in $T$ we define $v_p(t) = v_i$ for $t \in [i, i+1)$, and we define $Q_p(t)$ and $Q_p^{-1}(t)$ in a similar way; the corresponding quantities for $\tilde T$ are denoted $\tilde v_p$, $\tilde Q_p$ and $\tilde Q_p^{-1}$. Using these notations, we have
$$\mathrm{ALG}_\rho(T) = \mathbb{E}_p\Big[Q_p^{-1}(\rho) + v_p(Q_p^{-1}(\rho))\Big] = \int_0^{\rho}\mathbb{E}_p v_p(Q_p^{-1}(w))\,dw + \mathbb{E}_p v_p(Q_p^{-1}(\rho)) \ge \rho\,\mathbb{E}_p v_p(Q_p^{-1}(\rho)) + \mathbb{E}_p v_p(Q_p^{-1}(\rho)).$$
Here, the second equality follows from Lemma 3.4 and the inequality follows from the super-martingale property of $T$. On the other hand, for every path $p$, since every node's value in $\tilde T$ is at most its value in $T$, we know that $\tilde Q_p^{-1}(\rho) \le Q_p^{-1}(\rho)$. If we denote $t' = \tilde Q_p^{-1}(\rho)$, this implies there exists some $\rho' \le \rho$ such that $Q_p(t') = \rho'$. In particular, since $v_p(t) \le \alpha\,\tilde v_p(t)$ for every $t$, it follows that $\rho' \ge \frac{1}{\alpha}\rho$. Thus, we can write
$$\begin{aligned}
\mathrm{ALG}_\rho(\tilde T) &= \mathbb{E}_p\Big[\tilde Q_p^{-1}(\rho') + \big(\tilde Q_p^{-1}(\rho)-\tilde Q_p^{-1}(\rho')\big) + \tilde v_p(\tilde Q_p^{-1}(\rho))\Big]
 = \mathbb{E}_p\,\tilde Q_p^{-1}(\rho') + \mathbb{E}_p\int_{\rho'}^{\rho}\tilde v_p(\tilde Q_p^{-1}(w))\,dw + \mathbb{E}_p\,\tilde v_p(\tilde Q_p^{-1}(\rho))\\
&\le \mathbb{E}_p\,Q_p^{-1}(\rho') + \alpha(\rho-\rho')\,\mathbb{E}_p v_p(Q_p^{-1}(\rho)) + \alpha\,\mathbb{E}_p v_p(Q_p^{-1}(\rho))\\
&\le \mathbb{E}_p\,Q_p^{-1}(\rho) + (\alpha-1)\,\rho\,\mathbb{E}_p v_p(Q_p^{-1}(\rho)) + \alpha\,\mathbb{E}_p v_p(Q_p^{-1}(\rho)).
\end{aligned}$$
Here, the equality follows from Lemma 3.4, the first inequality follows from the super-martingale property of $T$, and the second inequality follows from the facts that $\rho' \le \rho$ and $\rho' \ge \frac{1}{\alpha}\rho$. Thus, we obtain
$$\frac{\mathrm{ALG}_\rho(\tilde T)}{\mathrm{ALG}_\rho(T)} \le \frac{\mathbb{E}_p\,Q_p^{-1}(\rho) + (\alpha-1)\,\rho\,\mathbb{E}_p v_p(Q_p^{-1}(\rho)) + \alpha\,\mathbb{E}_p v_p(Q_p^{-1}(\rho))}{\mathbb{E}_p\,Q_p^{-1}(\rho) + \mathbb{E}_p v_p(Q_p^{-1}(\rho))} \le \max\Big\{\alpha,\; 1 + \frac{(\alpha-1)\,\rho\,\mathbb{E}_p v_p(Q_p^{-1}(\rho))}{\mathbb{E}_p\,Q_p^{-1}(\rho)}\Big\}.$$
Here we use the fact that if $a, b, c, d \ge 0$, then $\frac{a+b}{c+d} \le \max\{\frac{a}{c}, \frac{b}{d}\}$. By Lemma 3.4 and the super-martingale property, we know that
$$\mathbb{E}_p\,Q_p^{-1}(\rho) = \int_0^{\rho}\mathbb{E}_p v_p(Q_p^{-1}(w))\,dw \ge \rho\,\mathbb{E}_p v_p(Q_p^{-1}(\rho)).$$
This implies
$$1 + \frac{(\alpha-1)\,\rho\,\mathbb{E}_p v_p(Q_p^{-1}(\rho))}{\mathbb{E}_p\,Q_p^{-1}(\rho)} \le 1 + (\alpha - 1) = \alpha.$$
Thus, $\mathrm{ALG}_\rho(\tilde T) \le \alpha\,\mathrm{ALG}_\rho(T)$ for every $\rho \in [0,1]$.

F. Missing Proofs in Section 4

F.1. Proof of Theorem 4.3

As we mentioned in the main body of the paper, Theorem 4.3 holds not only for MSSC but also for a broader class of stochastic optimization problems. In this part, we give the general statement and proof of Theorem 4.3. To begin with, we define a broad class of adaptive stochastic optimization problems for which Theorem 4.3 holds.

Definition F.1 (Adaptive Stochastic Optimization with Covering Loss). Let $(X, S, \ell, D)$ be an adaptive stochastic optimization problem. For every $s \in S$, we define a family of sets of actions $C(s) \subseteq 2^X$. We say a scenario $s$ is covered if a set of actions $A \in C(s)$ has been taken. We say the loss function $\ell$ is a covering loss if for every scenario $s \in S$ and every sequence of actions $\bar a = (a_1, a_2, \ldots)$,
$$\ell(\bar a, s) = \min\{t \mid \exists A \in C(s) \text{ s.t. } A \subseteq \{a_1, \ldots, a_t\}\},$$
which is the time it takes $\bar a$ to cover $s$.

Many adaptive stochastic optimization problems, such as MSSC and the optimal decision tree problem, have covering objective functions. Next, we give a general statement of Theorem 4.3, which builds a connection between adaptive stochastic optimization with time-dependent feedback and buying information for adaptive stochastic optimization.

Theorem F.2 (General Version of Theorem 4.3). Let $(X, S, \ell, D)$ be an adaptive stochastic optimization problem with a covering loss function. If $(X, S, \ell, D)$ satisfies an $\alpha$-prophet inequality, then there is a $2\alpha$-competitive learner for the adaptive stochastic optimization problem with feedback $(X, S, \ell, D, F, C)$.

Proof. Denote by $I = (X, S, \ell, D, F, C)$ the instance of buying information for stochastic optimization and by $\mathrm{OPT}(I)$ the optimal value of the instance. Since $(X, S, \ell, D)$ satisfies an $\alpha$-prophet inequality, let $\mathcal{A}$ be an $\alpha$-competitive learner for the stochastic optimization problem with time-dependent feedback. At time round $t$, denote by $R_t$ the set of outcomes received after taking a set of actions and by $Y_t$ the set of signals received from the signaling schemes. Notice that $R_t$ and $Y_t$ are random sets that depend on the random scenario $s^*$. Then $a = \mathcal{A}(R_t, Y_t)$ is the next action taken by the learner $\mathcal{A}$. Based on these notations, we design the following algorithm, which we will show is $2\alpha$-competitive for $I = (X, S, \ell, D, F, C)$.

Algorithm 8 BUYINGINFORMATION ($2\alpha$-competitive algorithm for $(X, S, \ell, D, F, C)$)
  for t = 0, 1, 2, ... do
    Receive $R_{t-1}$, the set of outcomes received so far, and $Y_{t-1}$, the signals received so far.
    Receive the cost $c_t$ of receiving a signal from $f_{t+1}$.
    for i = 1, ..., $c_t$ do
      Take action $a_i = \mathcal{A}(R_{t-1}, Y_{t-1})$ and receive outcome $r_i$. Update $R_{t-1} \leftarrow R_{t-1} \cup \{r_i\}$.
      if $s^*$ is covered then return end if
    end for
    Pay $c_t$ to get signal $y_{t+1}$ from $f_{t+1}$. Update $R_t \leftarrow R_{t-1}$, $Y_t \leftarrow Y_{t-1} \cup \{y_{t+1}\}$.
  end for

We notice that during the execution of Algorithm 8 we count the time rounds in a different way for convenience; this does not affect the final cost of the algorithm. We now decompose the cost of Algorithm 8 into two parts. Denote Algorithm 8 by $\mathcal{A}^*$. For each scenario $s \in S$, let $\mathrm{cost}(s)$ be the total cost of $\mathcal{A}^*$ when $s$ is drawn. We write $\mathrm{cost}(s) = \mathrm{cost}_f(s) + \mathrm{cost}_c(s)$, where $\mathrm{cost}_f(s)$ is the feedback cost, the total amount $\mathcal{A}^*$ spends on buying signals when $s$ is drawn, and $\mathrm{cost}_c(s)$ is the coverage cost, the number of actions taken by $\mathcal{A}^*$ to cover $s$. That is to say,
$$\mathrm{cost}(\mathcal{A}^*, I) = \mathbb{E}_{s\sim D}\,\mathrm{cost}(s) = \mathbb{E}_{s\sim D}\,\mathrm{cost}_f(s) + \mathbb{E}_{s\sim D}\,\mathrm{cost}_c(s) \le 2\,\mathbb{E}_{s\sim D}\,\mathrm{cost}_c(s),$$
since before each signal purchase of cost $c_t$ the algorithm takes $c_t$ actions, so the feedback cost never exceeds the coverage cost. In the rest of the proof, we construct an instance $\tilde I = (X, S, \ell, D, \tilde F)$ of stochastic optimization with time-dependent feedback based on $I$ such that $\mathbb{E}_{s\sim D}\,\mathrm{cost}_c(s) \le \alpha\,\mathrm{OPT}(\tilde I)$ and $\mathrm{OPT}(\tilde I) \le \mathrm{OPT}(I)$.

We construct the feedback $\tilde F$ by constructing every possible sequence of signals received from $F$. Let $(y_0, y_1, \ldots)$ be a sequence of signals received from the signaling schemes $(f_0, f_1, \ldots)$, and let $c_t(y_t)$ be the cost of obtaining signal $y_{t+1}$ for this sequence. Then, for every $t \ge 0$, we make $c_t(y_t)$ copies of $y_t$; the corresponding signal sequence in $\tilde F$ is $(y_0, \ldots, y_0, y_1, \ldots, y_1, \ldots)$, where $y_t$ appears $c_t(y_t)$ times. Now consider Algorithm 8 again. If we ignore the steps where we pay $c_t$ to get $y_{t+1}$, the remaining algorithm is exactly $\mathcal{A}$ running on $\tilde I = (X, S, \ell, D, \tilde F)$. Since $\mathcal{A}$ is an $\alpha$-competitive learner for $\tilde I$, we know that $\mathbb{E}_{s\sim D}\,\mathrm{cost}_c(s) = \mathrm{cost}(\mathcal{A}, \tilde I) \le \alpha\,\mathrm{OPT}(\tilde I)$.

It remains to show that $\mathrm{OPT}(\tilde I) \le \mathrm{OPT}(I)$. Consider instance $I$. Assume $s$ is the drawn scenario, the corresponding sequence of signals is $(y_0, y_1, \ldots)$, and the sequence of actions taken by $\mathrm{OPT}(I)$ for this sequence of signals is $(a_0, a_1, \ldots)$. We construct a sequence of actions for instance $\tilde I$ with signal sequence $(y_0, \ldots, y_0, y_1, \ldots, y_1, \ldots)$ by modifying $(a_0, a_1, \ldots)$: whenever $\mathrm{OPT}(I)$ pays a cost $c$ to obtain the next signal after taking action $a_i$, the modified sequence takes $c$ arbitrary actions after $a_i$. For every drawn scenario $s$, the modified sequence covers $s$ at a cost that is at most that of $\mathrm{OPT}(I)$. This implies that $\mathrm{OPT}(\tilde I) \le \mathrm{OPT}(I)$. Putting things together, we have
$$\mathrm{cost}(\mathcal{A}^*, I) \le 2\,\mathbb{E}_{s\sim D}\,\mathrm{cost}_c(s) \le 2\alpha\,\mathrm{OPT}(\tilde I) \le 2\alpha\,\mathrm{OPT}(I),$$
which means Algorithm 8 is $2\alpha$-competitive for adaptive stochastic optimization with feedback.
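A schematic Python rendering of Algorithm 8's control flow is given below. The callables base_learner (standing in for the $\alpha$-competitive learner $\mathcal{A}$), take_action, buy_signal, and is_covered are placeholders for problem-specific components, not APIs defined in the paper. The point of the sketch is only the interleaving: before paying $c_t$ for the next signal, the learner first takes $c_t$ actions suggested by $\mathcal{A}$, which is what keeps the total feedback cost at most the coverage cost.

```python
def buying_information(base_learner, signal_costs, take_action, buy_signal, is_covered):
    """Sketch of Algorithm 8: interleave c_t actions with each signal purchase.

    base_learner(outcomes, signals) -> next action (the alpha-competitive learner A)
    signal_costs: iterable of integer costs c_t, the price of the (t+1)-st signal
    take_action(a) -> observed outcome; buy_signal(t) -> signal y_{t+1}
    is_covered() -> True once the drawn scenario is covered
    """
    outcomes, signals = [], []
    for t, cost in enumerate(signal_costs):
        for _ in range(cost):                       # take c_t actions before paying c_t
            action = base_learner(outcomes, signals)
            outcomes.append(take_action(action))
            if is_covered():
                return
        signals.append(buy_signal(t))               # pay c_t and receive the next signal
```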
F.2. Proof of Theorem 4.4

Before presenting the proof, we recall the model of MSSC with time-dependent feedback.

Definition F.3 (Min-Sum Set Cover with Time-Dependent Feedback). Let $B = [n]$ be a set of boxes, where each box $i$ contains an unknown number $b_i \in \{0, 1\}$. A learner can learn $b_i$ by querying box $i$, i.e., the action space is $X = B$. A scenario $s \in \{0,1\}^n$ is a binary vector that represents the numbers contained in the boxes: if scenario $s$ is realized, then $s_i = b_i$ for every box $i \in B$. A scenario $s$ is covered once a box $i$ with $s_i = 1$ is queried. Let $S$ be a set of scenarios, $D$ a probability distribution over $S$, and $F$ a sequence of feedback. A scenario $s^*$ is drawn from $D$ initially. In each round $t$, the learner receives a signal $y_t(s^*)$ from signaling scheme $f_t$ and then takes an action $a_t \in X$, querying box $a_t$ and observing the number contained in that box. Given an instance $(B, S, D, F)$ of Min-Sum Set Cover with time-dependent feedback, the goal of the learner is to construct the sequence of boxes $A$ to query so as to minimize $\mathbb{E}_{s\sim D}\,\ell(A, s)$, where $\ell(A, s)$ is the number of boxes queried until the drawn scenario $s$ is covered.

As a reminder, we restate Algorithm 3, the simple greedy algorithm that we analyze here.

Algorithm 9 GREEDY (4-competitive learner for MSSC with time-dependent feedback)
  for t = 0, 1, 2, ... do
    Receive the scenario set $S_t$ consistent with the feedback and outcomes received so far.
    Compute $\Pr(i) := \sum_{s \in S_t : s_i = 1} \Pr(s \mid S_t)$ for every $i \in B$.
    Query any box $i^* \in \arg\max\{\Pr(i) \mid i \in B\}$.
    if $s^*_{i^*} = 1$ then return end if
  end for
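A small runnable sketch of the greedy rule in Algorithm 9 is shown below: given the scenario set still consistent with everything observed so far, it queries the box that covers the largest conditional probability mass. The toy instance and the function names are ours; the sketch only illustrates the selection rule, not the feedback model.

```python
def greedy_query(scenarios, weights):
    """Pick the box i maximizing Pr(i) = probability mass of consistent scenarios with s_i = 1.

    scenarios: list of 0/1 tuples (one per consistent scenario); weights: matching probabilities.
    """
    total = sum(weights)
    n = len(scenarios[0])
    best_box, best_mass = None, -1.0
    for i in range(n):
        mass = sum(w for s, w in zip(scenarios, weights) if s[i] == 1) / total
        if mass > best_mass:
            best_box, best_mass = i, mass
    return best_box

# toy run: three boxes, three equally likely scenarios; box 0 covers two of them
scenarios = [(1, 0, 0), (1, 1, 0), (0, 0, 1)]
weights = [1 / 3, 1 / 3, 1 / 3]
print(greedy_query(scenarios, weights))  # -> 0
```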
Proof. Without loss of generality, we may assume that $D$ is the uniform distribution over $S$ and that $F$ contains only deterministic signaling schemes. This is because, given a distribution $D$, we can modify $S$ by making multiple copies of each scenario and drawing a scenario uniformly from the modified set of scenarios, according to our discussion in Appendix A. Under this assumption, we write a linear program that lower bounds $\mathrm{OPT}(I)$. We say a scenario $s$ is covered by a box $i$ if $s_i = 1$; for every scenario $s$, denote by $L_s$ the set of boxes that cover $s$. Fix the sequence of feedback $F$ and let $T(F)$ be the feedback tree induced by $F$. Let $P(s)$ be the longest path in $T(F)$ such that $s$ is contained in every node of $P(s)$. Any learner $A$ assigns a box to each node of $T(F)$ such that for every scenario $s$ there is some node $v \in P(s)$ whose assigned box covers $s$. We derive the following integer program to capture the cost of a learner. For every node $v \in T(F)$ and every box $i \in B$, let $x_{vi} \in \{0,1\}$ indicate whether $A$ assigns box $i$ to node $v$. For every node $v \in T(F)$ and every scenario $s \in v$, let $y_{vs} \in \{0,1\}$ indicate whether $s$ is not covered by any box assigned to an ancestor of $v$; here we use the notation $v' < v$ to denote that $v'$ is an ancestor of $v$. For every scenario $s$, let $\mathrm{cost}(s) := \sum_{v \in P(s)} y_{vs}$, which is the time at which $s$ is first covered by an assigned box. Then any learner $A$ gives a feasible solution to the following integer program:
$$\min \sum_{s\in S}\mathrm{cost}(s)\qquad \text{s.t.}\quad \sum_{i\in B}x_{vi}\le 1\;\;\forall v\in T(F);\qquad y_{vs}\ge 1-\sum_{v'<v}\sum_{i\in L_s}x_{v'i}\;\;\forall v\in T(F),\,s\in v;\qquad x_{vi}, y_{vs}\in\{0,1\}.$$
To upper bound the cost of Algorithm 3, we construct a feasible solution $C_g$ to the dual of the LP relaxation of this program via a charging scheme. For every node $v \in T(F)$, let $R_v$ be the set of scenarios that are still uncovered when Algorithm 3 reaches $v$, and let $X_v \subseteq R_v$ be the set of scenarios covered by the box Algorithm 3 queries at $v$. Think of every scenario $s$ as a particle that moves along its path $P(s)$; $V_s(t)$ indicates whether particle $s$ is still moving at time $t$, and $G_s(t)$ is the charge placed on $s$ at time $t$. We set $C_v = \frac{|R_v|}{c\,|X_v|}$, where $c > 0$ is a constant that we will determine later. Notice that $\{X_v\}_{v\in T(F)}$ forms a partition of $S$, so each $s$ belongs to a unique $X_v$; for every node $v \in T(F)$ and every $s \in X_v$, we set $C_s = C_v$. Denote by $C_g$ the vector we just constructed. We next show that $|S|\,\mathrm{ALG}(I) \le 4\,D(C_g)$.

Notice that
$$|S|\,\mathrm{ALG}(I) = \sum_{v\in T(F)}(\mathrm{depth}(v)+1)\,|X_v| = \sum_{v\in T(F)}|R_v|.$$
Based on this observation, we first derive the following expression for $\int_0^{\infty}\sum_{s\in S}V_s(t)\,dt$:
$$\int_0^{\infty}\sum_{s\in S}V_s(t)\,dt = \sum_{s\in S}C_s = \sum_{v\in T(F)}|X_v|\,C_v = \frac{1}{c}\sum_{v\in T(F)}|R_v| = \frac{1}{c}\,|S|\,\mathrm{ALG}(I).$$
Next, we show that $\int_0^{\infty}\sum_{s\in S}G_s(t)\,dt \le \frac{1}{c}\int_0^{\infty}\sum_{s\in S}V_s(t)\,dt$. To do this, we upper bound $\sum_{s\in S}G_s(t)$ for every $t \ge 0$. For every $s \in S$, let $P^t(s) = (v_0(s), v_1(s), \ldots, v_{\lfloor t\rfloor}(s))$ be the truncation of the path $P(s)$ to length $\lfloor t\rfloor$. For every $t$, the set $P^t$ of such truncated paths forms a partition of $S$, so we can write $\sum_{s\in S}G_s(t) = \sum_{P\in P^t}\sum_{s\in P}G_s(t)$. Let $P = (v_0, \ldots, v_{\lfloor t\rfloor}) \in P^t$ be such a truncated path. The set of particles that are moving along $P$ at time $t$ corresponds to the scenarios in $v_{\lfloor t\rfloor}$ with $C_s \ge t$. We observe that along the path $P$, $C_{v_i}$ is a step function with respect to the index $i$, and, based on the definition of $C_s$, for every $v$ and every $s \in R_v$ we have $C_s \ge C_v$. This implies that along the path $P$ there is some $i^* \le \lfloor t\rfloor$ such that the set of particles that are moving along $P$ at time $t$ corresponds to scenarios exactly in $R_{v_{i^*}} \cap v_{\lfloor t\rfloor}$. In particular, if we consider the set $P^t(i^*)$ of all paths in $P^t$ that pass through $v_{i^*}$, then at time $t$ the set of particles moving along these paths is exactly $R_{v_{i^*}}$. By the greedy property of Algorithm 3, every box can cover at most $|X_{v_{i^*}}|$ scenarios from $R_{v_{i^*}}$. Since each path $P \in P^t(i^*)$ contains at most $t$ nodes and each node is charged by at most $|X_{v_{i^*}}|$ moving particles at time $t$, we have
$$\sum_{P\in P^t(i^*)}\sum_{s\in P}G_s(t) \le t\,|X_{v_{i^*}}| \le \frac{|R_{v_{i^*}}|}{c\,|X_{v_{i^*}}|}\,|X_{v_{i^*}}| = \frac{1}{c}\,|R_{v_{i^*}}| = \frac{1}{c}\sum_{P\in P^t(i^*)}\sum_{s\in P}V_s(t).$$
Here, the second inequality follows from $C_{v_{i^*}} = \frac{|R_{v_{i^*}}|}{c\,|X_{v_{i^*}}|} \ge t$, and the last equality holds because $\sum_{P\in P^t(i^*)}\sum_{s\in P}V_s(t)$ is the number of moving particles along paths in $P^t(i^*)$, which is $|R_{v_{i^*}}|$. Thus we have
$$\int_0^{\infty}\sum_{s\in S}G_s(t)\,dt \le \frac{1}{c}\int_0^{\infty}\sum_{s\in S}V_s(t)\,dt.$$
Putting the above discussions together, we have
$$D(C_g) \ge \int_0^{\infty}\sum_{s\in S}\big(V_s(t)-G_s(t)\big)\,dt \ge \Big(1-\frac{1}{c}\Big)\int_0^{\infty}\sum_{s\in S}V_s(t)\,dt = \Big(1-\frac{1}{c}\Big)\frac{1}{c}\,|S|\,\mathrm{ALG}(I) = \frac{1}{4}\,|S|\,\mathrm{ALG}(I),$$
by setting $c = 2$ to maximize the ratio. This shows Algorithm 3 is 4-competitive.

F.3. Proof of Theorem 4.5

Proof. We consider the following instance of Min-Sum Set Cover with time-dependent feedback. Let $B$ be a set of $n$ boxes and let the set of scenarios be $S = \{s^i\}_{i=1}^n$, where $s^i_j = 1$ if $i = j$ and $0$ otherwise. $D$ is the uniform distribution over $S$. Let $A$ be any deterministic learner. We design a sequence of feedback $F_A$ such that $\mathrm{cost}(A, I) \ge n/2$, while there is a learner $A'$ such that $\mathrm{cost}(A', I) \le n/4 + O(1)$. Here, $I = (B, S, D, F_A)$ and $\mathrm{cost}(A, I)$ is the cost of a learner $A$ over instance $I$.

We describe $F_A$ via its feedback tree representation $T(F_A)$. We first fix the structure of $T(F_A)$ and then define the set of scenarios contained in each node of $T(F_A)$. Let $T(F_A)$ be a binary tree. For a node $v$ in $T(F_A)$, we denote by $L(v)$ its left child and by $R(v)$ its right child. Let $\{v_i\}_{i=1}^n$ be the path of $T(F_A)$ such that $v_{i+1} = R(v_i)$, where $v_1$ is the root of $T(F_A)$. We now define the set of scenarios contained in each node. We have $v_1 = S$. Let $A_i = A(v_i)$ be the box queried by $A$ at node $v_i$. We define $L(v_i) = \{s^{A_i}\}$ and $R(v_i) = v_i \setminus L(v_i)$. This gives the definition of $F_A$. Intuitively, every time $A$ queries a box, the feedback only tells $A$ whether it has queried the unique box that contains a $1$. That is to say, the feedback is useless for $A$, and the cost of $A$ is
$$\mathrm{cost}(A, I) = \frac{1}{n}\sum_{j=1}^{n} j = \frac{n+1}{2}.$$
On the other hand, let $A'$ be the following learner: $A'(v_i) = A_{n+1-i}$ for $i \in [n]$, and $A'(L(v_i)) = A_i$. That is, along the path $\{v_i\}_{i=1}^n$, $A'$ queries the boxes in the reverse of the order used by $A$, and at every node $L(v_i)$, $A'$ queries the box corresponding to the unique scenario contained in $L(v_i)$. This implies $\mathrm{cost}(A', I) \le n/4 + O(1)$. Thus, $\frac{\mathrm{cost}(A, I)}{\mathrm{cost}(A', I)} \ge 2 - o(1)$, which implies that no deterministic learner is $(2-\epsilon)$-competitive.
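As a sanity check of the two costs in the construction above, the sketch below evaluates both averages for a range of $n$. Under our reading of the construction, scenario $s^{A_i}$ costs $A$ exactly $i$ boxes, while $A'$ covers it either at node $L(v_i)$ at time $i+1$ or when its reversed order reaches box $A_i$ at time $n+1-i$, whichever comes first; the per-scenario formula $\min(i+1, n+1-i)$ is our reading of the construction, not a statement from the paper.

```python
def cost_A(n):
    """Average cost of the learner A against the feedback F_A: (1/n) * sum_i i."""
    return sum(range(1, n + 1)) / n

def cost_A_prime(n):
    """Average cost of the reversed learner A': (1/n) * sum_i min(i + 1, n + 1 - i)."""
    return sum(min(i + 1, n + 1 - i) for i in range(1, n + 1)) / n

for n in (10, 100, 1000, 10000):
    print(n, cost_A(n), cost_A_prime(n), cost_A(n) / cost_A_prime(n))  # ratio -> 2
```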
F.4. Proof of Theorem 4.7

Before presenting the proof, we recall the definition of buying information for MSSC.

Definition F.4 (Buying Information for Min-Sum Set Cover). Let $(B, S, D)$ be an instance of Min-Sum Set Cover, $F = \{f_t\}_{t=0}^{\infty}$ be a sequence of feedback, and $C = \{c_t\}_{t=0}^{\infty}$ be a sequence of costs, where $c_t$ is the cost of receiving a signal from $f_{t+1}$. Initially, a scenario $s^*$ is drawn from $D$. In each time round $t$, before $s^*$ is covered, the learner adaptively receives an arbitrary number of signals from the sequence $F$ by paying the corresponding costs and then selects a box to query. Given an instance $(B, S, D, F, C)$ of buying information for Min-Sum Set Cover, the goal is to make these decisions adaptively so as to minimize the expected number of queried boxes plus the expected cost paid for feedback until the random scenario is covered.

Proof. We consider the following instance of buying information for Min-Sum Set Cover. Let $B$ be a set of $n$ boxes and $S = \{s^i\}_{i=1}^n$, where $s^i_j = 1$ if $i = j$ and $0$ otherwise. $D$ is the uniform distribution over $S$. We assume the cost of obtaining any single signal is $1$. Let $A$ be any deterministic learner. We design a sequence of feedback $F_A$ for $A$, described via its feedback tree representation $T(F_A)$; we first fix the structure of $T(F_A)$ and then describe the scenarios contained in each node.

The structure of $T(F_A)$ is defined in the following way. There are $n_i + 1$ nodes in $T(F_A)$ that have depth $i$; here $n_0 = 0$ and, for $i \ge 1$, $n_i \ge 0$ is a number that depends on $A$. Furthermore, at each level of $T(F_A)$, only the rightmost node has children. In particular, for $i \ge 1$, let $v_{i+1} = R(v_i)$ be the rightmost child of $v_i$, where $v_1$ is the root of $T(F_A)$.

We notice that, given the structure of $T(F_A)$, any deterministic learner $A$ can be described in the following way using $T(F_A)$: at every node $v \in T(F_A)$, $A$ queries a set of boxes $B_v$ in some order, where $|B_v| \ge 0$. Denote by $B_i$ the set of boxes queried by $A$ at node $v_i$ and let $n_i = |B_i| \ge 0$. The set of scenarios contained in $R(v_i)$ is defined to be $\{s^j \in v_i \mid j \notin B_i\}$. Recall that there are $n_i + 1$ nodes in $T(F_A)$ of the corresponding depth, and we have defined the set of scenarios contained in one of them; to each of the remaining $n_i$ nodes we assign a unique scenario covered by $B_i$. This gives the definition of $F_A$. In particular, $F_A$ is useless for $A$, since every time $A$ asks for feedback, the feedback only tells $A$ which scenarios have not been covered so far.

Now we compute the cost of $A$. Consider the path $(v_1, v_2, \ldots, v_k)$ in $T(F_A)$ such that $\sum_{i=1}^{k}|B_i| = n$; that is to say, all scenarios are covered before the $k$-th signal is requested. Notice that $(B_1, \ldots, B_k)$ forms a partition of $S$. Let $s$ be the $j$-th scenario in $B_i$ covered by $A$; then the cost of $A$ when scenario $s$ is drawn is
$$\mathrm{cost}(A, s) = \sum_{\ell=1}^{i-1} n_\ell + (i-1) + j,$$
which implies
$$\mathrm{cost}(A, I) = \mathbb{E}_s\,\mathrm{cost}(A, s) = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i}\Big(\sum_{\ell=1}^{i-1} n_\ell + (i-1) + j\Big).$$
We consider two different cases. In the first case, $\sum_{i=1}^{n} i \le \sum_{i=1}^{k}\sum_{j=1}^{n_i} i$. We notice that any deterministic learner $A'$ that asks for no feedback and queries the boxes in a fixed order has cost $\mathbb{E}_s\,\mathrm{cost}(A', s) = \frac{1}{n}\sum_{i=1}^{n} i$. This means
$$\frac{\mathrm{cost}(A, I)}{\mathrm{cost}(A', I)} \ge \frac{\frac{2}{n}\sum_{i=1}^{n} i - 1}{\frac{1}{n}\sum_{i=1}^{n} i} = 2 - o_n(1).$$
In the second case, we assume $\sum_{i=1}^{n} i > \sum_{i=1}^{k}\sum_{j=1}^{n_i} i$. In this case, we define a deterministic learner $A'$ in the following way: $A'$ keeps asking for feedback until the feedback reveals the drawn scenario, and then $A'$ covers the drawn scenario via the unique box. It is not hard to see that any scenario in $B_i$ costs $A'$ at most $i + 1$. Thus, $\mathbb{E}_s\,\mathrm{cost}(A', s) \le 1 + \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} i$. In this case, we have
$$\frac{\mathrm{cost}(A, I)}{\mathrm{cost}(A', I)} \ge \frac{\frac{1}{n}\Big(\sum_{i=1}^{n} i + \sum_{i=1}^{k}\sum_{j=1}^{n_i} i\Big) - 1}{1 + \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} i} \ge 2 - o_n(1).$$
Thus, for every $\epsilon > 0$, there is no deterministic learner that is $(2-\epsilon)$-competitive.