# Online Revenue Maximization for Server Pricing

Shant Boodaghians¹, Federico Fusco², Stefano Leonardi², Yishay Mansour³ and Ruta Mehta¹
¹University of Illinois at Urbana-Champaign, Urbana IL 61801, USA
²Sapienza University of Rome, 00185 Rome, Italy
³Tel Aviv University, P.O. Box 39040, Tel Aviv 6997801, Israel
{boodagh2, rutameht}@illinois.edu, {fuscof, leonardi}@diag.uniroma1.it, mansour@tau.ac.il

Efficient and truthful mechanisms to price time on remote servers/machines have been the subject of much work in recent years due to the importance of the cloud market. This paper considers online revenue maximization for a unit-capacity server, when jobs are non-preemptive, in the Bayesian setting: at each time step, one job arrives, with parameters drawn from an underlying distribution. We design an efficiently computable truthful posted-price mechanism, which maximizes revenue in expectation and in retrospect, up to additive error. The prices are posted prior to learning the agent's type, and the computed pricing scheme is deterministic. We also show that the pricing mechanism is robust to learning the job distribution from samples, where polynomially many samples suffice to obtain near-optimal prices.

1 Introduction

Designing mechanisms for a desired outcome with strategic and selfish agents is an extensively studied problem in economics, with classical truthful mechanisms by [Myerson, 1981], and Vickrey-Clarke-Groves [Vickrey, 1961]. The advent of e-commerce has shifted the priority from traditional objectives to computational efficiency, leading to simple, approximate mechanisms when optimal mechanisms are computationally intractable. Starting with [Nisan and Ronen, 1999], theoretical computer science has contributed greatly to the field, in both fundamental problems and specific applications, such as designing truthful mechanisms for the maximization of welfare and revenue, or learning distributions of agent types, menu complexity, and dynamic mechanisms (e.g., [den Boer, 2015; Cole and Roughgarden, 2014]).

We consider this question in the setting of selling computational resources on remote servers or machines (cf. [Tang et al., 2017; Babaioff et al., 2017]), one of the fastest growing markets on the Internet. Several difficulties arise when selling goods in the cloud market. First of all, the goods (resources) are assigned non-preemptively and thus have strong complementarities. Furthermore, the supply (server capacity) is limited, so any mechanism should trade off immediate revenue for future supply. Finally, mechanisms must be incentive-compatible, to avoid the unpredictable effects of non-truthful, strategic behaviour on the performance of the system. This leads us to ask the following question: can we design an efficient, truthful, and revenue-maximizing mechanism to sell time non-preemptively on a unit-capacity server?

We design a posted-price mechanism which maximizes expected revenue for agents/buyers arriving online, with parameters of value, length and maximum delay drawn from an underlying distribution. Our focus will be on devising online mechanisms in the Bayesian setting. Three key aspects distinguish our problem from standard online scheduling: (i) In our setting, as time progresses, the server clears up, allowing longer jobs to be scheduled in the future if no smaller jobs are scheduled until then.
(ii) Scheduling the jobs depends both on the interests of the mechanism designer (revenue) and of the jobs being scheduled (cost). (iii) As the mechanism designer, we do not have access to job parameters in an incentive-compatible way before deciding on a posted price menu.

In our online model, time on the server is discrete. At every time step, a job arrives on the server, with a value V, length requirement L, and maximum delay D. These parameters are drawn from a common distribution, i.i.d. across jobs. The job wishes to be scheduled for at least L consecutive time slots, no more than D time units after its arrival, and wishes to pay no more than V; it will minimize price within these constraints. The mechanism designer never learns the parameters of the job. Instead, she posts a price menu of (length, price) pairs, and the minimum available delay s. The job simply accepts if it can be scheduled, i.e. if D ≥ s and, for some ℓ ≥ L, the price for ℓ is at most V. If, at time epoch t, an agent chooses option (ℓ, π_ℓ), then she pays π_ℓ and her job is allocated to the interval [t + s, t + s + ℓ]. She will choose the option which minimizes π_ℓ. Throughout this paper we assume that the random variables L, V, D are discrete and have finite support.

1.1 Our Results and Structure of the Paper

In this work, we provide conditions for the existence of a posted-price Bayesian revenue-maximizing truthful mechanism for server pricing. We also show how it can be computed efficiently, if those conditions hold.

To give these results, we first study the underlying optimization problem ignoring strategic considerations of the jobs. In Section 2 we model the problem of finding a revenue-maximizing pricing strategy as a Markov Decision Process (MDP). Given a price menu (length, price) and a state (minimum available delay) s at time t, the probability of transition to any other state at time t + 1 is obtained from the distribution of the job's parameters. In Section 3 we show that the revenue-maximizing pricing strategy can be efficiently computed via backwards induction given the distribution.

Unfortunately, the optimal posted-price strategy does not necessarily imply a truthful mechanism. For example, we show that some basic monotonicity conditions of truthful mechanisms are violated if the valuation is not positively correlated with the length of the job. On the positive side, in Section 3.3 we prove that the optimal pricing strategy is monotone in length under a distributional assumption, which we show is satisfied when the jobs' valuation follows a log-concave distribution, parametrized by length. Log-concave distributions are exactly those which have a monotone hazard rate [Wellner, 2012], a widely studied setting in the design of Bayesian mechanisms. Under the above-mentioned assumptions, we conclude that truthfulness is possible for the optimal pricing mechanism if the distributions are known.

In Section 3.4, we also show that the revenue obtained by this optimal posted-price strategy concentrates well around its mean. Concentration of the revenue obtained by the truthful optimal Bayesian mechanism is a property of practical and theoretical importance that is also used in the study of the robustness of the pricing strategy in settings of limited information on the distributions. We show (Section 4.2) that a near-optimal solution is still obtained when the distribution is known up to small error.
In Section 4.3 we complement this result by analyzing the performance of the proposed pricing strategy when the distribution is only known from samples collected through the observation of the agents' decisions. We provide a truthful posted-price ε-approximate mechanism when the number of samples is polynomial in 1/ε and the size of the support of the distribution. Finally, Section 5 is devoted to summarizing our results and describing future directions of research.

1.2 Related Work

Much recent work has focused on designing efficient mechanisms for pricing cloud resources. Chawla et al. [Chawla et al., 2017] recently studied time-of-use pricing mechanisms, to match demand to supply with deadlines and online arrivals. Their result assumes large-capacity servers, and seeks to maximize welfare in a setting in which the jobs arriving over time are not i.i.d. [Azar et al., 2015] provides a mechanism for preemptive scheduling with deadlines, maximizing the total value of completed jobs. Another possible objective for the design of incentive-compatible scheduling mechanisms is the total value of completed jobs, which have release times and deadlines. [Porter, 2004] solves this problem in an online setting, [Carroll and Grosu, 2008] in the offline setting for parallel machines, and [Ströhle et al., 2014] in the online competitive setting with uncertain supply. [Jain et al., 2013] focuses on social welfare maximization for non-preemptive scheduling on multiple servers, and obtains a constant competitive ratio as the number of servers increases. Our work differs from these by considering revenue maximization and stochastic job types which are i.i.d. over time. [Kilcioglu and Rao, 2016] addresses computing a price menu for revenue maximization with different machines. Finally, [Babaioff et al., 2017] proposes a system architecture for scheduling and pricing in cloud computing.

Posted price mechanisms (PPMs) were introduced by [Sandholm and Gilpin, 2004] and have gained attention due to their simplicity, robustness to collusion, and their ease of implementation in practice. One of the first theoretical results concerning PPMs is an asymptotic comparison to classical single-parameter mechanisms [Blumrosen and Holenstein, 2008]. They were later studied by [Chawla et al., 2010] for the objective of revenue maximization, and further strengthened by [Kleinberg and Weinberg, 2012] and [Dütting and Kleinberg, 2015]. [Feldman et al., 2015] shows that sequential PPMs can 1/2-approximate social welfare for XOS valuation functions, if the price for an item is equal to the expected contribution of the item to the social welfare.

Sample complexity for revenue maximization has recently been studied in [Cole and Roughgarden, 2014], showing that polynomially many samples suffice to obtain near-optimal Bayesian auction mechanisms. [Morgenstern and Roughgarden, 2015] propose a statistical-learning approach with arbitrarily-near-optimal revenue. The same authors study the problem of learning simple auctions from samples [Morgenstern and Roughgarden, 2016].

Notation. In what follows, the variables t, ℓ or L, v or V, and d or D are reserved for describing the parameters of a job that wishes to be scheduled. Respectively, they represent the arrival time t, required length ℓ, value v, and maximum allowed delay d. The lowercase variables represent fixed values, whereas the uppercase represent random variables.
Script-uppercase letters L, V, D represent the supports of the distributions of L, V, and D, respectively; and the bold-uppercase letters L, V, D represent the maximum values in these respective sets. Finally, π is reserved for pricing policies, whereas p is reserved for probabilities.

2 Model

Single-Machine, Non-Preemptive, Job Scheduling. A sequence of random jobs wish to be scheduled on a server, non-preemptively, within a time and cost constraint. Formally, at every time step t, a single job with parameters (L, V, D) is drawn from an underlying distribution Q over the space L × V × D. We remark that we are not assuming independence between L, V and D, so the setting is quite general. Later we only impose some mild monotone-hazard-rate assumptions on the distribution. Furthermore, our setting also contemplates the possibility of a time step with no job showing up: it corresponds to (L = 0, V = 0, D = 0). The job wishes to be scheduled for a price π ≤ V in an interval [a, b] such that a − t ≤ D and b − a ≥ L.

Price Menus. Our goal is to design a take-it-or-leave-it, posted-price mechanism which maximizes expected revenue. At each time period, the mechanism posts a price menu and an earliest-available-time s_t, indicating that times t through t + s_t − 1 have already been scheduled. (s_t will henceforth be referred to as the state of the server.) We let S := {0, . . . , D + L} be the set of all possible states and T be the time horizon. The state of the server at a given time t is naturally a random variable which depends on the earlier jobs and on the adopted policy π. As before, we will denote by s or s_t the fixed value, and by S or S_t the corresponding random variable. The price menu is given by the function π : [T] × S × L → R, i.e., if we are at time t and the server is in state s_t, then the prices are set according to π_t(s_t, ·) : L → R. The reported pair (π_t(s_t, ·), s_t) is computed by the scheduler's strategy, which we determine in this paper. Once this is posted, a job (L, V, D) is sampled i.i.d. from the distribution Q. If V ≥ π_t(s_t, ℓ) for some ℓ ≥ L, and D ≥ s_t, then the job accepts the schedule, and reports a length ℓ ≥ L which minimizes the price. Otherwise, the job reports ℓ = 0 and is not scheduled. To guarantee truthfulness, it suffices to have π_t(s, ·) be monotonically non-decreasing for every state s: the agent would not want a longer interval since it costs more, and would not want one of the shorter intervals since they cannot run the job. It should be clear that the mechanism's strategy is to always report monotone non-decreasing prices, as a decrease in the price menu would only cause more utilization of the server, without accruing more revenue. The main technical challenge in this paper, then, is to show that under some assumptions, the optimal strategy is monotone non-decreasing, and efficiently computable.
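To make the acceptance rule above concrete, the following minimal Python sketch (our own illustration; the dictionary-based menu representation and the name job_response are not from the paper) shows how a job with parameters (L, V, D) responds to a posted pair (π_t(s_t, ·), s_t).

```python
from typing import Dict, Optional, Tuple

def job_response(menu: Dict[int, float], state: int,
                 L: int, V: float, D: int) -> Optional[Tuple[int, float]]:
    """Return the (length, price) option the job buys, or None if it declines.

    menu  : posted prices pi_t(state, l) for each length l on offer
    state : earliest available delay s_t announced by the server
    The job accepts iff D >= state and some l >= L is priced at most V,
    and among those options it picks the cheapest one.
    """
    if L == 0 or D < state:          # no job arrived, or delay too long
        return None
    feasible = [(price, l) for l, price in menu.items()
                if l >= L and price <= V]
    if not feasible:                 # every affordable option is too short
        return None
    price, l = min(feasible)         # cheapest feasible option
    return l, price

# Example: a monotone menu; a job needing 2 slots, worth 7, tolerating delay 3.
print(job_response({1: 3.0, 2: 5.0, 3: 8.0}, state=1, L=2, V=7.0, D=3))  # (2, 5.0)
```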
Revenue Objective. Revenue can be measured over either a finite or an infinite discounted horizon. In the former (finite) case, only T time periods occur, and we seek to maximize the expected sum of revenue. In the infinite-horizon setting, future revenue is discounted at an exponentially decaying rate: formally, revenue at time t is worth a γ^t fraction of revenue at time 0, for some fixed γ < 1 (e.g., [Puterman, 2005]). In this paper, due to space constraints, we focus only on the finite case. We argue, though, that similar results can be shown also in the infinite discounted setting.

Recall that the job parameters are drawn independently at random from the underlying distribution, so the scheduler can only base their price menu on the state of the system, the current time, and the information on the distribution Q. Thus, the only realistic strategy is to fix a state-and-time-dependent pricing policy π : [T] × S × L → R, (t, s, ℓ) ↦ π_t(s, ℓ), where [T] := {0, 1, . . . , T}.

Let X = {X_1 := (1, L_1, V_1, D_1), X_2 := (2, L_2, V_2, D_2), . . . } be the random sequence of jobs arriving, sampled i.i.d. from the underlying distribution. Let π : [T] × S × L → R be the pricing policy. We denote by Rev_t(X, π) the revenue earned at time t with policy π and sequence X. If X_t does not buy, then Rev_t(X, π) = 0, and otherwise it is equal to π_t(s_t, L_t). We denote by CmlRev_T the total (cumulative) revenue earned over the T periods. Thus, CmlRev_T(X, π) := Σ_{t=0}^{T} Rev_t(X, π). We will also need the expected future revenue, given a current time and server state, which we denote as follows:

    U_t^π(s) = E_{X≥t} [ Σ_{i=t}^{T} Rev_i(π, X) | S_t = s ],    (1)

where the subscript X≥t of the expectation denotes that we consider only jobs arriving from time t onward. Our objective is to find the pricing policy π which maximizes U_0^π(s = 0). Call this π*, and denote the expected revenue under π* as U_t*(·).

3 Bayes-optimal Strategies for Server Pricing

In this section we seek to compute an optimal monotone pricing policy π : [T] × S × L → R which maximizes revenue in expectation over T jobs sampled i.i.d. from an underlying known distribution Q. We first model the problem of maximizing the revenue in online server pricing as a Markov Decision Process that admits an efficiently computable, optimal pricing strategy. The main contribution of this section is to show that, under a natural assumption on the distribution Q, the optimal policy is monotone. We recall that this allows us to derive truthful Bayes-optimal mechanisms.

3.1 Markov Decision Processes

We show that the theory of Markov Decision Processes is well suited to model our problem. A Markov Decision Process is, in its essence, a Markov Chain whose transition probabilities depend on the action chosen at each state, and where each transition is assigned a reward. A policy is then a function π mapping states to actions. In our setting, the states are the states of the system outlined in Section 2 (i.e., the possible delays before the earliest available time on the server), and the actions are the price menus. At every state s, a job of a random length arrives, and with some probability chooses to be scheduled, given the choice of prices. The next state is either max{s − 1, 0}, if the job does not choose to be scheduled (since we have moved forward in time), or s + ℓ − 1, if a job of length ℓ is scheduled, since we have occupied ℓ more units. The transition probabilities depend on the distribution of job lengths, and the probability that a job accepts to be scheduled given the pricing policy (action). Formally,

    P[s_{t+1} = s_t + ℓ − 1] =  P[L_t = ℓ, V_t ≥ π_t(s_t, ℓ), D_t ≥ s_t]    if ℓ ≥ 1,
                                1 − Σ_{k≥0} P[s_{t+1} = s_t + k]            if ℓ = 0.

(Transitions to state −1 should be read as transitions to state 0.) Note that a job of length ℓ may choose to purchase an interval of length greater than ℓ, which would render these transition probabilities incorrect. However, this may only happen if the larger interval is more affordable. It is therefore in the scheduler's interest to guarantee that π_t(s, ·) is monotone non-decreasing in ℓ, incentivizing truthfulness, since this increases the amount of server time available without affecting revenue. Thus we restrict ourselves to this case.
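As an illustration of these transition probabilities, the sketch below (our own; it assumes the distribution Q is given explicitly as a dictionary mapping (ℓ, v, d) triples to probabilities, and that jobs report their true length, consistent with the monotone-menu restriction above) computes the distribution of the next server state under a given menu.

```python
from collections import defaultdict
from typing import Dict, Tuple

def next_state_distribution(Q: Dict[Tuple[int, int, int], float],
                            menu: Dict[int, float],
                            s: int) -> Dict[int, float]:
    """Transition probabilities of the server-state chain under menu pi_t(s, .).

    Q[(l, v, d)] is the probability that the arriving job has length l,
    value v and maximum delay d.  A job (l, v, d) with l >= 1 is scheduled
    iff v >= menu[l] and d >= s, moving the state to s + l - 1; otherwise
    the state decreases to max(s - 1, 0).
    """
    out = defaultdict(float)
    for (l, v, d), p in Q.items():
        if l >= 1 and d >= s and v >= menu.get(l, float("inf")):
            out[s + l - 1] += p          # job of length l accepted
        else:
            out[max(s - 1, 0)] += p      # nothing scheduled this step
    return dict(out)

# Example: two equally likely job types and a flat menu.
Q = {(1, 4, 2): 0.5, (2, 3, 0): 0.5}
print(next_state_distribution(Q, menu={1: 3.0, 2: 3.0}, s=1))
```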
It remains to define the transition rewards, i.e. the revenue earned. Formally, a transition from state s_t to s_t + ℓ − 1 incurs a reward of π_t(s_t, ℓ), whereas a transition from state s_t to s_t − 1 incurs 0 reward. We wish to compute a policy π so as to maximize the expected cumulative revenue, given as the sum of all transition rewards in expectation.

3.2 Solving for the Optimal Policy with Distributional Knowledge

In this section, we present a modified MDP whose optimal policies can be efficiently computed, and show that these policies are optimal for the original MDP. For now, we assume that the mechanism designer is given access to the underlying distribution Q. In the following sections, however, we will show that if the distribution Q is estimated from samples, then solving the MDP on this estimated distribution suffices to ensure good revenue guarantees.

Since the problem has been modelled as a Markov Decision Process (MDP), we may rely on the wealth of literature available on it; in particular we will leverage the backwards induction algorithm (BIA) of [Puterman, 2005], Section 4.5. We will however need to ensure that this standard algorithm (i) runs efficiently, and (ii) returns a monotone pricing policy. Note that past prices do not contribute to future revenue insofar as the current state remains unchanged. Thus, to compute optimal current prices, we need only know the current state and the expected future revenue. This allows us to use the BIA. The idea is to compute the optimal time-dependent policy, and the incurred expected reward, for shorter horizons, then use these to recursively compute them for longer horizons. The total runtime of the BIA is O(T|S||A|), where S and A denote the state and action spaces, respectively. Note that the dependence on T is unavoidable, since any optimal policy must be time-dependent.

Recall that L and D denote the maximum values that L and D can take, respectively, and V is the set of possible values that V can take. Denote K := max{D + L, |V|}. A naïve formulation has |S| = D + L ≤ K and |A| ≤ K^L, since an action is an entire price menu. Thus, a naïve definition of the MDP bounds the runtime at K^O(K), which is intractable. Requiring monotonicity only affects lower-order terms.

Modified MDP. To avoid this exponential dependence, we can be a little more clever about the definition of the state space: instead of states being the possible server states, we define our state space as the possible (state, length) pairs. Thus, when the MDP is in state (s, ℓ), the server is in state s, and a job of length ℓ has been sampled from the distribution. Our action space then is simply the possible values of π_t(s, ℓ), and the transition probabilities and rewards become:

    P[(s, ℓ) → (s′, ℓ′) | π] =  P[V ≥ π_t(s, ℓ), D ≥ s | L = ℓ] · P[L = ℓ′]     if s′ = s + ℓ − 1,
                                P[V < π_t(s, ℓ) or D < s | L = ℓ] · P[L = ℓ′]    if s′ = s − 1,
                                0                                                 otherwise;

    R((s, ℓ) → (s′, ℓ′) | π) =  π_t(s, ℓ)    if s′ = s + ℓ − 1,
                                0             otherwise.

Therefore, we get |S| = (D + L) · L ≤ K² and |A| ≤ K. Thus, the runtime of the algorithm becomes O(TK³). It remains to prove that it is correct.
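Before proving correctness, here is a compact sketch of backwards induction on the modified MDP (our own rendering, not the paper's pseudocode): it iterates t from T down to 0 and, for each (state, length) pair, picks the posted price from the support of V that maximizes immediate plus expected future revenue. Note that the sketch does not force monotone menus; Section 3.3 gives conditions under which the computed optimum is monotone automatically.

```python
from typing import Dict, Tuple, List

def modified_bia(Q: Dict[Tuple[int, int, int], float],
                 prices: List[float], T: int, S_max: int):
    """Backwards induction over (state, length) pairs.

    Q[(l, v, d)]: job-type probabilities; prices: candidate posted prices
    (e.g. the support of V).  Returns pi[t][(s, l)] = optimal price and
    u[t][(s, l)] = optimal expected future revenue from time t onward.
    """
    p_len = {}                                   # marginal P[L = l]
    for (l, v, d), p in Q.items():
        p_len[l] = p_len.get(l, 0.0) + p

    def p_accept(s, l, price):                   # P[V >= price, D >= s | L = l]
        joint = sum(p for (l2, v, d), p in Q.items()
                    if l2 == l and v >= price and d >= s)
        return joint / p_len[l] if p_len.get(l, 0.0) > 0 else 0.0

    u = [{} for _ in range(T + 2)]               # u[T+1] is identically 0
    pi = [{} for _ in range(T + 1)]
    lengths = sorted(p_len)

    def future(t, s):                            # E_L[u_t(s, L)], clipped to valid states
        s = max(min(s, S_max), 0)
        return sum(p_len[l] * u[t].get((s, l), 0.0) for l in lengths)

    for t in range(T, -1, -1):
        for s in range(S_max + 1):
            for l in lengths:
                if l == 0:                       # "no job" arrival: only time passes
                    pi[t][(s, l)], u[t][(s, l)] = None, future(t + 1, s - 1)
                    continue
                best_price, best_val = None, float("-inf")
                for price in prices:
                    q = p_accept(s, l, price)
                    val = (q * (price + future(t + 1, s + l - 1))
                           + (1 - q) * future(t + 1, s - 1))
                    if val > best_val:
                        best_price, best_val = price, val
                pi[t][(s, l)], u[t][(s, l)] = best_price, best_val
    return pi, u

# Example: optimal prices at time 0, state 0, for a tiny instance.
Q = {(1, 2, 1): 0.25, (1, 5, 2): 0.25, (2, 6, 1): 0.25, (0, 0, 0): 0.25}
pi, u = modified_bia(Q, prices=[2, 5, 6], T=3, S_max=3)
print(pi[0][(0, 1)], pi[0][(0, 2)], round(u[0][(0, 1)], 3))
```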
We begin by claiming that these two MDPs are equivalent in the following sense:

Lemma 1. For any fixed pricing policy π : [T] × S × L → R,

    U_t^π(s) = E_L [u_t^π(s, L)],    ∀ t ≤ T, s ∈ S,

where the U_t^π(·)'s are as in (1), and the u_t^π(·, ·)'s are from the modified MDP.

This lemma, however, does not suffice on its own, as agents may behave strategically by over-reporting their length if the prices are not increasing. This would alter the transition probabilities, breaking the analysis. We will see that under a mild assumption this cannot happen, as the optimal policy for non-strategic agents will be monotone, and therefore truthful. Given this new state space, we call MODIFIEDBIA the classical backwards induction on it.

3.3 Monotonicity of the Optimal Pricing Policies

Recall that the solution of the more efficient MDP formulation is only correct if we can show that it is always monotone without considering the strategic behaviour of agents, ensuring incentive-compatibility of the optimum. An optimal monotone strategy cannot be obtained for all distributions on L, V, and D. As an example, if a job's value is a function of its length (fully correlated), the optimal policy is to price-discriminate by length; if this function is not monotone, the optimum won't be either. To this end, we introduce the following assumption, discussed below.

Assumption 1. The quantity

    P[V ≥ µ′, D ≥ s | L = ℓ] / P[V ≥ µ, D ≥ s | L = ℓ]

is monotone non-decreasing in ℓ, for any state s and 0 ≤ µ < µ′ fixed.

This is not a natural, or immediately intuitive, assumption. However, we will show that it is satisfied if the valuation of jobs follows a log-concave distribution which is parametrized by the job's length, and where the valuation is (informally) positively correlated with this length. Log-concave distributions are also commonly referred to as distributions having a monotone hazard rate, and it is common practice in economics to require this property of the agent valuations.

Lemma 2. Let V_ℓ^s denote the marginal r.v. V conditioned on L = ℓ and D ≥ s. Let Z be a continuously-supported r.v. and γ_1^s ≤ γ_2^s ≤ · · · ∈ R. If V_ℓ^s is distributed as γ_ℓ^s · Z, ⌈γ_ℓ^s · Z⌉, Z + γ_ℓ^s, or ⌈Z + γ_ℓ^s⌉, then Assumption 1 is satisfied if Z is log-concave, or if the γ's do not depend on ℓ.

Many standard (discrete) distributions are (discrete) log-concave random variables, including the uniform, Gaussian, logistic, exponential, Poisson, binomial, etc. [Wellner, 2012]. In the above, the γ terms represent a notion of spread or shifting, parametrized by the length, indicating some amount of positive correlation.
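As a quick numerical illustration (ours, not from the paper), the snippet below checks Assumption 1 for a small discrete distribution by verifying that the above ratio is non-decreasing in ℓ for every state s and every pair µ < µ′ in the value support.

```python
from itertools import combinations
from typing import Dict, Tuple

def satisfies_assumption_1(Q: Dict[Tuple[int, int, int], float],
                           states, lengths, values) -> bool:
    """Check that P[V >= mu2, D >= s | L = l] / P[V >= mu, D >= s | L = l]
    is non-decreasing in l, for every state s and every pair mu < mu2."""
    def tail(l, mu, s):                        # P[V >= mu, D >= s | L = l]
        p_l = sum(p for (l2, _, _), p in Q.items() if l2 == l)
        joint = sum(p for (l2, v, d), p in Q.items()
                    if l2 == l and v >= mu and d >= s)
        return joint / p_l if p_l > 0 else 0.0

    for s in states:
        for mu, mu2 in combinations(sorted(values), 2):    # guarantees mu < mu2
            prev = -1.0
            for l in sorted(lengths):
                denom = tail(l, mu, s)
                if denom == 0:                 # ratio undefined for this length
                    continue
                ratio = tail(l, mu2, s) / denom
                if ratio < prev - 1e-12:
                    return False
                prev = ratio
    return True

# Toy distribution where higher values tend to come with longer jobs.
Q = {(1, 1, 1): 0.2, (1, 2, 1): 0.2, (2, 2, 1): 0.2, (2, 3, 1): 0.2, (2, 4, 1): 0.2}
print(satisfies_assumption_1(Q, states=[0, 1], lengths=[1, 2], values=[1, 2, 3, 4]))
```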
It remains to show price monotonicity under the above assumption. We begin with the following lemma, which holds for arbitrary distributions.

Lemma 3. Let U_t*(s) be the expected future revenue earned starting at time t in state s, for the optimal policy computed by MODIFIEDBIA. Then the function s ↦ U_t*(s) is monotone non-increasing in s, for any fixed t.

This lemma ensures that over-selling time on the server can only hurt the mechanism. This allows us to conclude:

Lemma 4. If the job parameters satisfy the above assumption, then for all ℓ, s, t, we have π_t*(s, ℓ) ≤ π_t*(s, ℓ + 1).

Sketch. The idea is to show that, for any price µ less than the optimum π_t*(s, ℓ), the difference in revenue between charging µ and π_t*(s, ℓ) to jobs of length ℓ is less than the difference in revenue between the same prices for jobs of length ℓ + 1. This is achieved by applying the assumption to the recursive definition of future revenue, along with the previous lemma. Thus, we can conclude that the optimal price π_t*(s, ℓ + 1) must be greater than π_t*(s, ℓ).

With Lemma 4 we finally have:

Theorem 1. The online server pricing problem admits an optimal monotone pricing strategy when the variables L, V, and D satisfy Assumption 1. Moreover, when V is finitely supported, an exact optimum can be computed in time O(TK³).

3.4 Concentration Bounds on Revenue

In this section, we show that the revenue of arbitrary policies concentrates around its mean; in particular this holds for the (near) optimal strategies described above. This also allows us to argue later that, for any estimate Q̂ of Q, the policy given by executing MODIFIEDBIA on the distribution Q̂ performs well with respect to Q, both in expectation and with high probability.

To show concentration, we consider the Doob or exposure martingale of the cumulative revenue function, introduced in Section 2. Define

    R_i^π := E [CmlRev_T(π, X) | X_1, . . . , X_i],    (2)

where the X_i's are jobs in the sequence X and the expected value is taken with respect to X_{i+1}, . . . , X_T. Thus, R_0^π is the expected cumulative revenue, and R_T^π is the random cumulative revenue. To formally describe this martingale sequence, we introduce some notation, and formalize some previous notation. Recall that X_1, X_2, . . . is a sequence of jobs sampled i.i.d. from an underlying distribution Q. Fix a pricing policy π : [T] × S × L → R. Note that the state at time t is a random variable depending on both the (deterministic) pricing policy and the (random) X_1, . . . , X_{t−1}. We denote it S_t(π, X), or S_t for short. Formally, suppose X_t = (V_t, L_t, D_t); then S_{t+1}(π, X) = S_t(π, X) − 1 if either V_t < π_t(S_t, L_t) or D_t < S_t, and otherwise S_{t+1}(π, X) = S_t(π, X) + L_t − 1. Furthermore, let Rev_t(π, X) be equal to 0 in the first case above (the t-th job is not scheduled), and π_t(S_t, L_t) otherwise. Thus, S_t(π, X) and Rev_t(π, X) are functions of the random values X_1, . . . , X_t for π fixed. Note that Rev_t implicitly depends on S_t. Let X_{>i} := (X_{i+1}, X_{i+2}, . . .) and X_{≤i} := (X_1, . . . , X_i). Recalling that CmlRev_T(X, π) = Σ_{t=1}^{T} Rev_t(X, π), we have that

    R_i^π = Σ_{t=0}^{i} Rev_t(π, X_{≤t}) + U_{i+1}^π(S_{i+1}(π, X_{≤i})).

We wish to show that CmlRev_T(X, π) concentrates around its mean. Since R_0^π is the expected revenue due to π, and R_T^π is the (random) revenue observed, it suffices to show that |R_0^π − R_T^π| is small, which we do by applying Azuma's inequality after showing the bounded-differences property.

Theorem 2. Let X be a finite sequence of T jobs sampled i.i.d. from Q, and let π be any monotone policy. Then, with probability 1 − δ,

    |CmlRev_T(X, π) − E_{X′}[CmlRev_T(X′, π)]| ≤ V √(2 log(2/δ) (T + 1)).

In particular this holds for the (approximately) optimal pricing strategies of Theorem 1.
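To illustrate Theorem 2, the following small self-contained simulation (a toy instance of our own, with a hypothetical fixed monotone menu) compares the empirical spread of cumulative revenue with the bound V √(2 log(2/δ)(T + 1)).

```python
import math
import random

# Toy illustration of Theorem 2: simulate cumulative revenue under a fixed
# monotone menu and compare the empirical spread with the Azuma bound
# V * sqrt(2 * log(2/delta) * (T + 1)).  Distribution and menu are made up.
T, V_MAX, DELTA = 200, 10, 0.05
random.seed(0)

def sample_job():
    if random.random() < 0.3:
        return (0, 0, 0)                      # no job this step
    return (random.randint(1, 3),             # length L
            random.randint(1, V_MAX),         # value  V
            random.randint(0, 4))             # delay  D

def price(s, l):
    return 3 + 2 * l                          # monotone in l, independent of s

def one_run():
    s, revenue = 0, 0
    for _ in range(T + 1):
        L, V, D = sample_job()
        if L >= 1 and D >= s and V >= price(s, L):
            revenue += price(s, L)            # job buys the shortest feasible option
            s = s + L - 1
        else:
            s = max(s - 1, 0)
    return revenue

runs = [one_run() for _ in range(2000)]
mean = sum(runs) / len(runs)
bound = V_MAX * math.sqrt(2 * math.log(2 / DELTA) * (T + 1))
tail = sum(abs(r - mean) > bound for r in runs) / len(runs)
print(f"mean revenue {mean:.1f}, bound {bound:.1f}, empirical tail mass {tail:.3f}")
```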
4 Robustness of Pricing with Approximate Distributional Knowledge

In this section, we show that results analogous to Theorems 1 and 2 may be obtained even in the case in which we do not have full knowledge of the distribution Q, but only an estimate Q̂. We then show how to obtain Q̂ from samples.

4.1 Robustness of the Pricing Strategy

Suppose, for some ε > 0, we are given some estimate of Q called Q̂ = (D̂, L̂, V̂) with the following property:

    | P(L̂ = ℓ, V̂ ≥ v, D̂ ≥ s) − P(L = ℓ, V ≥ v, D ≥ s) | < ε,    ∀ s ∈ S, ℓ ∈ L and v ∈ V.    (3)

For the sake of brevity, we abuse notation and denote the condition in (3) as |Q − Q̂| < ε. Later, we will need to estimate the value P[L = ℓ, ¬(V ≥ v, D ≥ s)] given Q̂, but this is P[L = ℓ] − P[L = ℓ, V ≥ v, D ≥ s]. The left-hand term is equal to P[L = ℓ, V ≥ 0, D ≥ 0], so we have access to both terms. The estimation error is additive, so the deviation is at most 2ε.

Denote p_{t,s}^ℓ := P[V ≥ π_t(s, ℓ), D ≥ s | L = ℓ], and recall that

    U_t^π(s) := Σ_{ℓ∈L} P[L = ℓ] · ( p_{t,s}^ℓ · (π_t(s, ℓ) + U_{t+1}^π(s + ℓ − 1)) + (1 − p_{t,s}^ℓ) · U_{t+1}^π(s − 1) ),

the expected revenue from time t onwards, conditioning on S_t = s. Let Û_t^π(·) be the same as U_t^π(·), but where the variables are distributed as Q̂. As before, let U_t*(·) be U_t^π(·) for π = π*, the Bayes-optimal policy returned by our algorithm, and let Û_t*(·) be defined similarly but with respect to Q̂. We will show that Û_t*(·) is a good estimate of U_t*(·).

Lemma 5. Let Q and Q̂ be such that |Q − Q̂| < ε. Then, for all t, s,

    |U_t*(s) − Û_t*(s)| < 2(T − t) V L ε.

4.2 Learning the Job Distribution from Samples

As discussed above, we show here how to compute a Q̂ from samples of Q, such that |Q − Q̂| is small with high probability. In particular, we present a sampling procedure which respects the rules of the server pricing mechanism: when a job arrives, we only learn its length, and only if it agrees to be scheduled. Thus, we are not given full samples of Q, complicating the learning procedure. Thanks to the previous section, we know that a policy which is optimal with respect to Q̂ will be close to optimal with respect to Q. We remark, however, that the power of the results of the previous section is not exhausted by this application: one may apply the robustness results directly to specific problems in which Q̂ is subject to (small) noise, or an approximate distribution is already known from other sources.

Let X = {(L_1, V_1, D_1), . . . , (L_n, V_n, D_n)} be an i.i.d. sample of n jobs from the underlying distribution Q. Note that the expectation of an indicator is the probability of the indicated event. Fix a length ℓ, a state s, and a value v. As a consequence of Hoeffding's inequality, with probability 1 − δ, the following quantity is smaller than √(log(2/δ) / 2n):

    | (1/n) Σ_{k=1}^{n} I[L_k = ℓ, V_k ≥ v, D_k ≥ s] − P[L = ℓ, V ≥ v, D ≥ s] |.    (4)

Sampling Procedure. We wish to estimate the value P[L = ℓ, V ≥ v, D ≥ s] for all choices of ℓ, v, and s. Fixing v and s, we may repeatedly post prices π_t(s, ℓ) = v and declare that the earliest available time is s, then record (i) which jobs accept to be scheduled, and (ii) the length of each scheduled job. Let ε > 0 and n ≥ log(2/δ)/(2ε²); then by (4), the sample average of each value has error at most ε with probability 1 − δ, for any one choice of (ℓ, v, s). Repeating this process for all K² choices of v ∈ V and s ∈ S gives us estimates for each. For the estimates to hold over all choices of ℓ, v, s, it suffices to take a union bound over all K³ values (including ℓ) and scale accordingly. If we take n ≥ 3 log(2K/δ)/(2ε²) samples for each of the K² choices of v and s, then simultaneously for all ℓ, v, and s, the quantity in (4) is at most ε. So we have obtained the |Q − Q̂| < ε condition. It should be noted that, for this sampling procedure, if a job of length ℓ is scheduled, we must possibly wait at most ℓ time units before taking the next sample, to clear the buffer. This blows up the sampling time by a factor of O(L). The following result follows immediately from Lemma 5 and Hoeffding's inequality.

Lemma 6. Let n, Q, and Q̂ be as above. For all ε > 0, if n ≥ 6TK⁴ log(2K/δ)/ε², then with probability 1 − δ, |U_t*(s) − Û_t*(s)| < ε for all t, s.
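The sampling procedure can be sketched as follows (our own illustration; post_price_and_observe is a hypothetical interface to the live server that, for each posted flat price v and announced state s, reports whether the arriving job was scheduled and, if so, its length).

```python
import math
import random
from collections import defaultdict

def estimate_Q_hat(post_price_and_observe, values, states, lengths,
                   eps: float, delta: float):
    """Estimate P[L = l, V >= v, D >= s] for every (l, v, s) by repeatedly
    posting price v with announced state s and counting accepted lengths.

    post_price_and_observe(v, s, n) is assumed to return a list of n
    observations, each either None (job declined / no job) or the length
    of the job that accepted.
    """
    K = max(len(values), len(states), len(lengths))
    n = math.ceil(3 * math.log(2 * K / delta) / (2 * eps ** 2))
    Q_hat = defaultdict(float)
    for v in values:
        for s in states:
            observations = post_price_and_observe(v, s, n)
            for l in lengths:
                accepted = sum(1 for obs in observations if obs == l)
                Q_hat[(l, v, s)] = accepted / n   # estimate of P[L=l, V>=v, D>=s]
    return dict(Q_hat)

# Example with a simulated server (the true distribution is known only to the simulator).
random.seed(1)
TRUE_JOBS = [(1, 4, 2), (2, 6, 1), (0, 0, 0)]     # equally likely (L, V, D) types

def simulated_server(v, s, n):
    out = []
    for _ in range(n):
        L, V, D = random.choice(TRUE_JOBS)
        out.append(L if (L >= 1 and V >= v and D >= s) else None)
    return out

Q_hat = estimate_Q_hat(simulated_server, values=[4, 6], states=[0, 1, 2],
                       lengths=[1, 2], eps=0.05, delta=0.1)
print(Q_hat[(1, 4, 1)])   # close to P[L=1, V>=4, D>=1] = 1/3
```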
4.3 Performance of the Computed Policy

We now use the results of the previous sections to analyze the performance of the policy output by MODIFIEDBIA after the learning procedure. By the estimation of revenue, the best policy in estimated expectation is near-optimal in expectation. Since revenues from arbitrary policies concentrate, we get near-optimal revenue in hindsight. Formally, for ε > 0, Lemma 6 gives us that if the sample distribution Q̂ is computed on n ≥ 6TK⁴ log(2K/δ)/ε² samples, then with probability 1 − δ over the samples, |U_t*(s) − Û_t*(s)| ≤ ε. Note that U_{t=0}*(s = 0) is exactly the expected cumulative revenue of the optimal policy. For clarity of notation, denote ECRev_T(π | Q) := E_{X∼Q}[CmlRev_T(X, π)]; then for sufficiently many samples, |ECRev_T(π* | Q) − ECRev_T(π* | Q̂)| < ε with probability 1 − δ. This observation allows us to conclude:

Theorem 3. Let Q be the underlying distribution over jobs. Let ε > 0, and n ≥ 24TK⁴ log(8K/δ)/ε². Then in time O(TK³ + n·L), we may compute a policy π̂ which is monotone in length, and therefore incentive compatible, such that for any policy π, with probability 1 − δ,

    CmlRev_T(X, π̂) ≥ CmlRev_T(X, π) − 2V √(2 log(8/δ)(T + 1)) − ε.

Proof. We have chosen n ≥ 6TK⁴ log(2K/(δ/4))/(ε/2)². Let π* be the optimal policy for the true distribution Q. By Theorem 2, we have |CmlRev_T(X, π) − ECRev_T(π | Q)| < V √(2 log(8/δ)(T + 1)) with probability 1 − δ/4, for both π* and π̂. Furthermore, by Lemma 6, |ECRev_T(π | Q) − ECRev_T(π | Q̂)| < ε/2 with probability 1 − δ/4, for both π = π̂ and π = π*. This is because, from the point of view of π̂, Q̂ is the true distribution and Q is the estimate. Taking the union bound over all four events above, and recalling that π̂ maximizes ECRev_T(π | Q̂) and π* maximizes ECRev_T(π | Q), we get the following with probability 1 − δ:

    CmlRev_T(X, π̂) ≥ ECRev_T(π̂ | Q)  − V √(2 log(8/δ)(T + 1))            (Thm. 2)
                   ≥ ECRev_T(π̂ | Q̂)  − V √(2 log(8/δ)(T + 1)) − ε/2      (Lem. 6)
                   ≥ ECRev_T(π* | Q̂) − V √(2 log(8/δ)(T + 1)) − ε/2      (defn.)
                   ≥ ECRev_T(π* | Q)  − V √(2 log(8/δ)(T + 1)) − ε        (Lem. 6)
                   ≥ ECRev_T(π | Q)   − V √(2 log(8/δ)(T + 1)) − ε        (defn.)
                   ≥ CmlRev_T(X, π)   − 2V √(2 log(8/δ)(T + 1)) − ε,      (Thm. 2)

as desired.

5 Conclusions and Future Work

In summary, we propose to price time on a server by first learning the distribution over jobs from samples and then computing the Bayes-optimal policy from the estimated distribution. Our learning algorithm is simple: we sample the distribution through the observation of n jobs at artificially fixed prices and server states, and learn the job parameters depending on whether they accept to be scheduled. Using these observations, we build a distribution Q̂. We then run MODIFIEDBIA on Q̂ and compute an optimal policy π̂ for Q̂. We are guaranteed that the policy prices monotonically (due to Lemma 4), and hence it is incentive compatible, which implies the correctness of the estimated revenue.

Future Work. There are many natural extensions to this work. For example, one could consider a multi-server setting, settings where jobs can request to be scheduled later than the earliest available time, or settings where jobs need various quantities of differing resources, such as memory and computation time.

Acknowledgements

Shant Boodaghians and Ruta Mehta were partially supported by NSF grant CCF-1750436. Federico Fusco and Stefano Leonardi were supported by ERC Advanced Grant 788893 AMDROMA "Algorithmic and Mechanism Design Research in Online Markets" and MIUR PRIN project ALGADIMAR "Algorithms, Games, and Digital Markets". Yishay Mansour was supported in part by a grant from the Israel Science Foundation (ISF).
References

[Azar et al., 2015] Yossi Azar, Inna Kalp-Shaltiel, Brendan Lucier, Ishai Menache, Joseph Naor, and Jonathan Yaniv. Truthful online scheduling with commitments. In EC, 2015.

[Babaioff et al., 2017] Moshe Babaioff, Yishay Mansour, Noam Nisan, Gali Noti, Carlo Curino, Nar Ganapathy, Ishai Menache, Omer Reingold, Moshe Tennenholtz, and Erez Timnat. ERA: A framework for economic resource allocation for the cloud. In Proceedings of the 26th WWW Companion, pages 635-642, 2017.

[Blumrosen and Holenstein, 2008] Liad Blumrosen and Thomas Holenstein. Posted prices vs. negotiations: an asymptotic analysis. In Proceedings of the 9th EC, page 49. ACM, 2008.

[Carroll and Grosu, 2008] Thomas E. Carroll and Daniel Grosu. An incentive-compatible mechanism for scheduling non-malleable parallel jobs with individual deadlines. In Proceedings of the 2008 ICPP, pages 107-114, 2008.

[Chawla et al., 2010] Shuchi Chawla, Jason D. Hartline, David L. Malec, and Balasubramanian Sivan. Multi-parameter mechanism design and sequential posted pricing. In Proceedings of the 42nd STOC. ACM, 2010.

[Chawla et al., 2017] Shuchi Chawla, Nikhil R. Devanur, Alexander E. Holroyd, Anna R. Karlin, James B. Martin, and Balasubramanian Sivan. Stability of service under time-of-use pricing. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 184-197. ACM, 2017.

[Cole and Roughgarden, 2014] Richard Cole and Tim Roughgarden. The sample complexity of revenue maximization. In Proceedings of the 46th STOC, pages 243-252. ACM, 2014.

[den Boer, 2015] Arnoud V. den Boer. Dynamic pricing and learning: historical origins, current research, and new directions. Surveys in Operations Research and Management Science, 20(1):1-18, 2015.

[Dütting and Kleinberg, 2015] Paul Dütting and Robert Kleinberg. Polymatroid prophet inequalities. In Proceedings of the 23rd Annual European Symposium on Algorithms (ESA), pages 437-449, 2015.

[Feldman et al., 2015] Michal Feldman, Nick Gravin, and Brendan Lucier. Combinatorial auctions via posted prices. In Proceedings of the 26th SODA, pages 123-135. ACM-SIAM, 2015.

[Jain et al., 2013] Navendu Jain, Ishai Menache, Joseph Naor, and Jonathan Yaniv. A truthful mechanism for value-based scheduling in cloud computing. Theory of Computing Systems, 54:388-406, 2013.

[Kilcioglu and Rao, 2016] Cinar Kilcioglu and Justin M. Rao. Competition on price and quality in cloud computing. In WWW, 2016.

[Kleinberg and Weinberg, 2012] Robert Kleinberg and Seth M. Weinberg. Matroid prophet inequalities. In Proceedings of the 44th STOC, pages 123-136. ACM, 2012.

[Morgenstern and Roughgarden, 2015] Jamie Morgenstern and Tim Roughgarden. The pseudo-dimension of near-optimal auctions. In Proceedings of the 28th NeurIPS, Cambridge, MA, USA, 2015. MIT Press.

[Morgenstern and Roughgarden, 2016] Jamie Morgenstern and Tim Roughgarden. Learning simple auctions. In 29th COLT, volume 49 of Proceedings of Machine Learning Research, pages 1298-1318, 2016.

[Myerson, 1981] Roger B. Myerson. Optimal auction design. Mathematics of Operations Research, 6(1):58-73, 1981.

[Nisan and Ronen, 1999] Noam Nisan and Amir Ronen. Algorithmic mechanism design (extended abstract). In Proceedings of the 31st STOC, pages 129-140. ACM, 1999.

[Porter, 2004] Ryan Porter. Mechanism design for online real-time scheduling. In Proceedings of the 5th EC, pages 61-70. ACM, 2004.
[Puterman, 2005] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming (Wiley Series in Probability and Statistics). Wiley-Interscience, 2005.

[Sandholm and Gilpin, 2004] Tuomas Sandholm and Andrew Gilpin. Sequences of Take-It-or-Leave-It Offers: Near-Optimal Auctions Without Full Valuation Revelation, pages 73-91. Springer, 2004.

[Ströhle et al., 2014] Philipp Ströhle, Enrico H. Gerding, Mathijs de Weerdt, Sebastian Stein, and Valentin Robu. Online mechanism design for scheduling non-preemptive jobs under uncertain supply and demand. In International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) 2014, pages 437-444, 2014.

[Tang et al., 2017] Xiaoyong Tang, Xiaochun Li, and Zhuojun Fu. Budget-constraint stochastic task scheduling on heterogeneous cloud systems. Concurrency and Computation: Practice and Experience, 29(19), 2017.

[Vickrey, 1961] William Vickrey. Counterspeculation, auctions, and competitive sealed tenders. The Journal of Finance, 16(1):8-37, 1961.

[Wellner, 2012] Jon A. Wellner. Log-concave distributions: definitions, properties, and consequences, January 2012.