# Online Learning with Primary and Secondary Losses

Avrim Blum, Toyota Technological Institute at Chicago, avrim@ttic.edu
Han Shao, Toyota Technological Institute at Chicago, han@ttic.edu

Abstract

We study the problem of online learning with primary and secondary losses. For example, a recruiter deciding which job applicants to hire might weigh false positives and false negatives equally (the primary loss), but the applicants might weigh false negatives much more heavily (the secondary loss). We consider the following question: Can we combine expert advice to achieve low regret with respect to the primary loss, while at the same time performing not much worse than the worst expert with respect to the secondary loss? Unfortunately, we show that this goal is unachievable without any bounded variance assumption on the secondary loss. More generally, we consider the goal of minimizing the regret with respect to the primary loss while bounding the secondary loss by a linear threshold. On the positive side, we show that running any switching-limited algorithm can achieve this goal if all experts satisfy the assumption that their secondary loss does not exceed the linear threshold by more than o(T) on any time interval. If not all experts satisfy this assumption, our algorithms can still achieve this goal given access to external oracles that determine when to deactivate and reactivate experts.

1 Introduction

The online learning problem has been studied extensively in the literature and is used increasingly in many applications, including hiring, advertising, and recommender systems. One classical problem in online learning is prediction with expert advice, in which a decision maker makes a sequence of T decisions with access to K strategies (also called experts). At each time step, the decision maker observes a scalar-valued loss for each expert. The standard objective is to perform as well as the best expert in hindsight. For example, a recruiter (the decision maker) sequentially decides which job applicants to hire with the objective of minimizing errors (hiring an unqualified applicant or rejecting a qualified one). However, this may give rise to social concerns, since the decision receiver has a different objective (getting a job) which receives no attention. This problem can be modeled as an online learning problem with a primary loss (for the decision maker) and a secondary loss (for the decision receiver). Taking the social impact into consideration, we ask the following question: Can we achieve low regret with respect to the primary loss, while performing not much worse than the worst expert with respect to the secondary loss? Unfortunately, we answer this question negatively. More generally, we consider a bicriteria goal of minimizing the regret to the best expert with respect to the primary loss while minimizing the regret to a linear threshold cT with respect to the secondary loss, for some c. When the value of c is set to the average secondary loss of the worst expert with respect to the secondary loss, the objective reduces to no-regret for the primary loss while performing no worse than the worst expert with respect to the secondary loss. Other choices, e.g., the average secondary loss of the worst expert (with respect to the secondary loss) among the experts with optimal primary loss, lead to different criteria for the secondary loss.
Therefore, with the notion of regret to a linear threshold, we are able to study a more general goal. Based on this goal, we pose the following two questions:

1. If all experts have secondary losses no greater than cT + o(T) for some c, can we achieve no-regret (compete comparably with the best expert) for the primary loss while achieving secondary loss no worse than cT + o(T)?
2. If we are given some external oracle to deactivate bad experts with unsatisfactory secondary loss, can we perform as well as each expert with respect to the primary loss during the time it is active, while achieving secondary loss no worse than cT + o(T)?

These two questions are trivial in the i.i.d. setting: we can learn the best expert with respect to the primary loss within O(log(T)) rounds and then simply follow that expert. In this paper, we focus on answering these two questions in the adversarial online setting.

1.1 Contributions

An impossibility result without a bounded variance assumption. We show that without any constraint on the variance of the secondary loss, even if all experts have secondary loss no greater than cT, achieving no-regret with respect to the primary loss while bounding the secondary loss by cT + o(T) is still impossible. This answers our motivating question: we cannot achieve low regret with respect to the primary loss while performing not much worse than the worst expert with respect to the secondary loss. This result explains why minimizing one loss while bounding another is non-trivial, and why applying existing algorithms for scalar-valued losses after scalarizing the primary and secondary losses does not work. We therefore propose an assumption on experts: the secondary loss of an expert during any time interval does not exceed cT by more than O(T^α) for some α ∈ [0, 1). We then study the problem in two scenarios, a good one in which all experts satisfy this assumption, and a bad one in which experts only partially satisfy this assumption and we are given access to an external oracle that deactivates and reactivates experts.

Our results in the good scenario. In the good scenario, we show that running an algorithm with limited switching, such as Follow the Lazy Leader [Kalai and Vempala, 2005] or Shrinking Dartboard (SD) [Geulen et al., 2010], achieves both regret to the best expert with respect to the primary loss and regret to cT with respect to the secondary loss of Õ(T^{(1+α)/2}). We also provide a lower bound of Ω(T^α). From another perspective, we relax the good-scenario constraint by introducing adaptiveness to the secondary loss and constraining the variance of the secondary loss between any two switches of any algorithm, instead of that of any expert. We show that in this weaker version of the good scenario, the upper bound achieved by switching-limited algorithms matches the lower bound of Ω(T^{(1+α)/2}).

Our results in the bad scenario. In the bad scenario, we assume that we are given an external oracle that determines which experts to deactivate because they do not satisfy the bounded variance assumption. We study two oracles here. One oracle deactivates experts that do not satisfy the bounded variance assumption as soon as a violation is detected and never reactivates them. The other reactivates the inactive experts at fixed rounds. In this framework, we may only select among the active experts at each round, and we adopt a more general metric, the sleeping regret, to measure the performance on the primary loss.
We provide algorithms for the two oracles with theoretical guarantees on the sleeping regrets with respect to the primary loss and on the regret to cT with respect to the secondary loss.

1.2 Related work

One line of closely related work is online learning with multi-objective criteria. A bicriteria setting which examines not only the regret to the best expert but also the regret to a fixed mixture of all experts is investigated by Even-Dar et al. [2008], Kapralov and Panigrahy [2011], and Sani et al. [2014]. The objective of Even-Dar et al. [2009] is to learn an optimal static allocation over experts with respect to a global cost function. Another multi-objective criterion, the Pareto regret frontier studied by Koolen [2013], examines the regret to each individual expert. Different from our work, all of these criteria are studied in the setting of scalar-valued losses. The problem of multiple loss functions is studied by Chernov and Vovk [2009] under a heavy geometric restriction on the loss functions. For vector losses, one fundamental concept is the Pareto front, the set of feasible points none of which is dominated by any other point given several criteria to be optimized [Hwang and Masud, 2012, Auer et al., 2016]. However, the Pareto front contains unsatisfactory solutions, such as the one minimizing only the secondary loss, which implies that learning the Pareto front cannot achieve our goal. Another classical concept is approachability, in which a learner aims to make the averaged vector loss converge to a pre-specified target set [Blackwell, 1956, Abernethy et al., 2011]. However, we show that our fair solution is unapproachable without additional bounded variance assumptions. Approachability to an expansion of the target set based on the losses in hindsight is studied by Mannor et al. [2014]; however, the expanded target set is not guaranteed to meet our criteria. Multi-objective criteria have also been studied in multi-armed bandits [Turgay et al., 2018].

2 Problem setup

We consider the adversarial online learning setting with a set of K experts H = {1, . . . , K}. At round t = 1, 2, . . . , T, given an active expert set H_t ⊆ H, an online learner A computes a probability distribution p_t ∈ Δ_K over H with support only over H_t and selects one expert according to p_t. Simultaneously, an adversary selects two loss vectors ℓ^(1)_t, ℓ^(2)_t ∈ [0, 1]^K, where ℓ^(1)_{t,h} and ℓ^(2)_{t,h} are the primary and secondary losses of expert h ∈ H at time t. Then A observes the loss vectors and incurs expected losses ⟨p_t, ℓ^(i)_t⟩ for i ∈ {1, 2}. Let L^(i)_{T,h} = Σ_{t=1}^{T} ℓ^(i)_{t,h} denote the cumulative loss of expert h and L^(i)_{T,A} the cumulative loss of algorithm A, for i ∈ {1, 2}, during the first T rounds. We begin by focusing on the case in which the active expert set is H_t = H.

2.1 Regret notions

Traditionally, the regret (to the best expert) is used to measure the scalar-valued loss performance of a learner; it compares the loss of the learner with that of the best expert in hindsight. Similar to Even-Dar et al. [2008], we adopt the following regret notion for A with respect to the primary loss:

Reg^(1) ≜ max_{h∈H} ( E[L^(1)_{T,A}] − L^(1)_{T,h} ).

We introduce another metric for the secondary loss, called the regret to cT for some c ∈ [0, 1], which compares the secondary loss of the learner with the linear term cT:

Reg^(2)_c ≜ E[L^(2)_{T,A}] − cT.
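To make these two regret notions concrete, the following minimal sketch (my own function names and toy data, not code from the paper) computes empirical versions of Reg^(1) and Reg^(2)_c from a table of realized losses and the sequence of experts an algorithm happened to play.

```python
import numpy as np

def primary_regret(loss1, picks):
    """Reg^(1): the algorithm's cumulative primary loss minus that of the best expert."""
    T = len(picks)
    alg_loss = loss1[np.arange(T), picks].sum()        # sum_t l^(1)_{t, A_t}
    best_expert_loss = loss1.sum(axis=0).min()         # min_h L^(1)_{T, h}
    return alg_loss - best_expert_loss

def secondary_regret_to_cT(loss2, picks, c):
    """Reg^(2)_c: the algorithm's cumulative secondary loss minus the linear threshold c*T."""
    T = len(picks)
    alg_loss = loss2[np.arange(T), picks].sum()        # sum_t l^(2)_{t, A_t}
    return alg_loss - c * T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, K = 1000, 3
    loss1 = rng.uniform(size=(T, K))                   # primary losses l^(1)_{t,h} in [0, 1]
    loss2 = rng.uniform(size=(T, K))                   # secondary losses l^(2)_{t,h} in [0, 1]
    picks = rng.integers(K, size=T)                    # a (poor) uniformly random algorithm
    c = loss2.mean(axis=0).max()                       # average secondary loss of the worst expert
    print(primary_regret(loss1, picks), secondary_regret_to_cT(loss2, picks, c))
```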
Sleeping experts were developed to model the problem in which not all experts are available at all times [Blum, 1997, Freund et al., 1997]. At each round, each expert h ∈ H decides whether to be active, and the learner can then only select among the active experts, i.e., p_{t,h} can be non-zero only for active experts h. The goal is to perform, for every h ∈ H, as well as h in the rounds in which h is active. We denote by H_t the set of active experts at round t. The sleeping regret for the primary loss with respect to an expert h* is defined as

SleepReg^(1)(h*) ≜ E[ Σ_{t∈[T]: h*∈H_t} ℓ^(1)_{t,A_t} ] − Σ_{t∈[T]: h*∈H_t} ℓ^(1)_{t,h*}.

The sleeping regret notion we adopt here is different from the regret to the best ordering of experts in the sleeping-expert setting of Kleinberg et al. [2010]. Since achieving the optimal regret bound in that setting is computationally hard [Kanade and Steinke, 2014], we focus on the sleeping regret notion defined above.

2.2 Assumptions

Following standard terminology, we call an adversary oblivious if her selections are independent of the learner's actions; otherwise, we call the adversary adaptive. First, we assume that the primary loss is oblivious. This is a common assumption in the online learning literature, and it holds throughout the paper.

Assumption 1. The primary losses {ℓ^(1)_t}_{t∈[T]} are oblivious.

For an expert h ∈ H, we propose a bounded variance assumption on her secondary loss: the average secondary loss over any time interval does not exceed c by much. More formally, the assumption is stated below.

Assumption 2. For some given c, δ, α ∈ [0, 1], for every expert h ∈ H and any T1, T2 ∈ [T] with T1 ≤ T2,

Σ_{t=T1}^{T2} ( ℓ^(2)_{t,h} − c ) ≤ δT^α.

We show that such a bounded variance assumption is necessary in Section 3. We call a scenario good if all experts satisfy Assumption 2; otherwise, we call the scenario bad. The good-scenario constraint can be relaxed by introducing adaptiveness to the secondary loss. We obtain a relaxed version of the good scenario in which, for any algorithm, the average secondary loss between any two switches does not exceed c by much. More formally:

Assumption 2′. For some given c, δ, α ∈ [0, 1] and for any algorithm A, let A_t ∈ H denote the expert selected at round t. For any expert h ∈ H and any T1 ∈ [T] such that A_{T1} = h and A_{T1−1} ≠ h (with A_0 and A_{T+1} set to dummy values distinct from every expert, for notational simplicity), we have

Σ_{t=T1}^{min{t'>T1 : A_{t'}≠h} − 1} ( ℓ^(2)_{t,h} − c ) ≤ δT^α.

In the good scenario, the active expert set is H_t = H in all rounds and the goal is to minimize both Reg^(1) and Reg^(2)_c. In the bad scenario, we consider that we are given an oracle which determines H_t at each round, and the goal is to minimize SleepReg^(1)(h*) for all h* ∈ H together with Reg^(2)_c.
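As a concrete illustration of Assumption 2, here is a small sketch (again with hypothetical names, not the authors' code) that checks whether one expert's secondary-loss sequence satisfies the interval condition, using Kadane's maximum-subarray scan to find the worst interval in O(T).

```python
import numpy as np

def satisfies_assumption_2(sec_loss, c, delta, alpha):
    """Check Assumption 2 for one expert: for every interval [T1, T2],
    sum_{t=T1..T2} (l^(2)_{t,h} - c) <= delta * T**alpha.
    The worst interval is found with Kadane's maximum-subarray scan in O(T)."""
    T = len(sec_loss)
    budget = delta * T ** alpha
    best = run = 0.0
    for x in np.asarray(sec_loss, dtype=float) - c:
        run = max(x, run + x)        # largest excess of an interval ending at this round
        best = max(best, run)        # largest excess over all intervals seen so far
    return best <= budget

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    T, c, delta, alpha = 10_000, 0.25, 0.5, 0.5
    well_behaved = rng.binomial(1, 0.15, size=T).astype(float)    # average well below c
    bursty = np.concatenate([np.ones(200), np.zeros(T - 200)])    # total 200 <= c*T, but concentrated early
    print(satisfies_assumption_2(well_behaved, c, delta, alpha))  # True (with overwhelming probability)
    print(satisfies_assumption_2(bursty, c, delta, alpha))        # False: the burst exceeds delta * T^alpha
```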
3 Impossibility result without any bounded variance assumption

In this section, we show that without any additional assumption on the secondary loss, even if all experts have secondary loss no greater than cT for some c ∈ [0, 1], there exists an adversary such that any algorithm incurs E[max(Reg^(1), Reg^(2)_c)] = Ω(T).

Theorem 1. Given a fixed expert set H, there exists an adversary such that any algorithm incurs E[max(Reg^(1), Reg^(2)_c)] = Ω(T) with c = max_{h∈H} L^(2)_{T,h}/T, where the expectation is taken over the randomness of the adversary.

Proof. To prove this theorem, we construct a binary classification example as follows. For each sample with true label y ∈ {+, −} and prediction ŷ ∈ {+, −}, the primary loss is the expected 0/1 loss for incorrect prediction, i.e., E_{y,ŷ}[1{ŷ ≠ y}], and the secondary loss is the expected 0/1 loss for false negatives, i.e., E_{y,ŷ}[1{ŷ ≠ y, y = +}]. We denote by h(b) the expert predicting − with probability b and + otherwise; every expert can then be represented by a sequence of values of b. At round t, the true label is negative with probability a. We divide T evenly into two phases, {1, . . . , T/2} and {T/2 + 1, . . . , T}, in each of which the adversary generates outcomes with a different value of a, and the two experts H = {h1, h2} use different values of b in different phases. We construct two worlds that differ only in the values of a and b in phase 2, so any algorithm behaves identically in phase 1 of both worlds. The adversary chooses one of the two worlds uniformly at random. The specific values of a and b are given in Table 1. Let c = 1/16.

Table 1: The values of a and b in different phases for the binary classification example.

| experts \ phase | 1: a = 5/8 | 2 (world I): a = 3/4 | 2 (world II): a = 5/8 |
|---|---|---|---|
| h1 | b = 1/6 | b = 0 | b = 1/6 |
| h2 | b = 0 | b = 1/2 | b = 0 |

The per-round losses of expert h(b) are ℓ^(1)_{t,h(b)} = (1 − a)b + a(1 − b) and ℓ^(2)_{t,h(b)} = (1 − a)b. In phase 1 and in phase 2 of world II, ℓ^(1)_{t,h1} = 7/12, ℓ^(2)_{t,h1} = 1/16, ℓ^(1)_{t,h2} = 5/8, and ℓ^(2)_{t,h2} = 0. In phase 2 of world I, ℓ^(1)_{t,h1} = 3/4, ℓ^(2)_{t,h1} = 0, ℓ^(1)_{t,h2} = 1/2, and ℓ^(2)_{t,h2} = 1/8. Hence, for every h ∈ H, we have L^(2)_{T,h} ≤ cT = T/16 in both worlds.

Consider any algorithm which selects h1 for T1 rounds (in expectation) in phase 1 and for T2 rounds (in expectation) in phase 2 of world I. If T1 ≤ T/4, then Reg^(1) ≥ (T/2 − T1)/24 ≥ T/96 in world II; else if T1 > T/4 and T2 ≥ T1/4, then Reg^(1) ≥ T2/4 − T1/24 ≥ T1/48 > T/192 in world I; otherwise Reg^(2)_c ≥ T1/16 + (T/2 − T2)/8 − T/16 = (T1 − 2T2)/16 > T/128 in world I. In every case, we have E[max(Reg^(1), Reg^(2)_c)] = Ω(T).

The proof of Theorem 1 implies that an expert whose total secondary loss is no greater than cT but whose secondary loss is high at the beginning will consume a large part of the secondary-loss budget, which makes switching to other experts with low primary loss later costly in terms of the secondary loss. The theorem answers our motivating question negatively: we cannot achieve no-regret for the primary loss while performing no worse than the worst expert with respect to the secondary loss.

4 Results in the good scenario

In this section, we consider the problem of minimizing max(Reg^(1), Reg^(2)_c) under Assumption 2 or 2′. We first provide lower bounds of Ω(T^α) under Assumption 2 and of Ω(T^{(1+α)/2}) under Assumption 2′. Then we show that applying any switching-limited algorithm, such as Shrinking Dartboard (SD) [Geulen et al., 2010] or Follow the Lazy Leader (FLL) [Kalai and Vempala, 2005], achieves max(Reg^(1), Reg^(2)_c) = O(√(log(K) T^{1+α})) under Assumption 2 or 2′, which matches the lower bound under Assumption 2′.

4.1 Lower bound

Theorem 2. If Assumption 2 holds with some given c, δ, α, then there exists an adversary such that any algorithm incurs E[max(Reg^(1), Reg^(2)_c)] = Ω(T^α).

Proof. We construct a binary classification example to prove the lower bound. The losses and the experts H = {h1, h2} are defined via h(b) in the same way as in the proof of Theorem 1. We divide T into three phases, the first two of which have T^α rounds each, while the third has T − 2T^α rounds. Each expert uses a different b in each phase, as shown in Table 2. At each time t, the sample is negative with probability 3/4. We set c = 0. Since (ℓ^(1)_{t,h(0)}, ℓ^(2)_{t,h(0)}) = (3/4, 0) and (ℓ^(1)_{t,h(1)}, ℓ^(2)_{t,h(1)}) = (1/4, 1/4), the cumulative losses of both experts are (L^(1)_{T,h}, L^(2)_{T,h}) = (3T/4 − T^α/2, T^α/4). Any algorithm A achieving L^(1)_{T,A} ≤ 3T/4 − T^α/4 must incur Reg^(2)_c = Ω(T^α).

Table 2: The values of b in different phases for the binary classification example.

| experts \ phase | 1: T^α rounds | 2: T^α rounds | 3: T − 2T^α rounds |
|---|---|---|---|
| h1 | b = 1 | b = 0 | b = 0 |
| h2 | b = 0 | b = 1 | b = 0 |

Combined with the classical lower bound of Ω(√T) in online learning [Cesa-Bianchi and Lugosi, 2006], we obtain E[max(Reg^(1), Reg^(2)_c)] = Ω(max(T^α, √T)).
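As a quick numeric sanity check of the two constructions above, the following snippet (illustrative only) evaluates the per-round loss formulas ℓ^(1)_{t,h(b)} = (1 − a)b + a(1 − b) and ℓ^(2)_{t,h(b)} = (1 − a)b at the (a, b) values used in Tables 1 and 2, with exact fractions.

```python
from fractions import Fraction as F

def per_round_losses(a, b):
    """Expected losses of expert h(b) when the label is negative with probability a:
    primary = probability of misclassification, secondary = probability of a false negative."""
    primary = (1 - a) * b + a * (1 - b)
    secondary = (1 - a) * b
    return primary, secondary

# Table 1 (Theorem 1): phase 1 and phase 2 of world II use a = 5/8; phase 2 of world I uses a = 3/4.
print(per_round_losses(F(5, 8), F(1, 6)))   # h1 in phase 1 / world II: (7/12, 1/16)
print(per_round_losses(F(5, 8), F(0)))      # h2 in phase 1 / world II: (5/8, 0)
print(per_round_losses(F(3, 4), F(0)))      # h1 in phase 2 of world I: (3/4, 0)
print(per_round_losses(F(3, 4), F(1, 2)))   # h2 in phase 2 of world I: (1/2, 1/8)

# Table 2 (Theorem 2): a = 3/4 throughout, and each expert alternates between b = 1 and b = 0.
print(per_round_losses(F(3, 4), F(1)))      # b = 1: (1/4, 1/4)
print(per_round_losses(F(3, 4), F(0)))      # b = 0: (3/4, 0)
```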
In the relaxed version of the good scenario, we have the following theorem.

Theorem 3. If Assumption 2′ holds with some given c, δ, α, then there exists an adversary such that any algorithm incurs E[max(Reg^(1), Reg^(2)_c)] = Ω(T^{(1+α)/2}).

Sketch of the proof. Inspired by the proof of the switching lower bound by Altschuler and Talwar [2018], we construct an adversary such that any algorithm achieving Reg^(1) = O(T^{(1+α)/2}) has to switch a certain number of times. For the secondary loss, the adversary sets ℓ^(2)_{t,h} = c only if h has been selected for more than T^α rounds consecutively up to time t − 1; otherwise ℓ^(2)_{t,h} = c + δ. In this construction every switch increases the secondary loss, and we can show that either Reg^(1) or Reg^(2)_c is Ω(T^{(1+α)/2}). The complete proof can be found in Appendix A.

4.2 Algorithm

Under Assumption 2 or 2′, we may suffer an extra δT^α of secondary loss every time we switch from one expert to another. Inspired by this, we can upper bound max(Reg^(1), Reg^(2)_c) by limiting the number of switches. Given a switching-limited learner L for scalar-valued losses, e.g., Shrinking Dartboard (SD) [Geulen et al., 2010] or Follow the Lazy Leader (FLL) [Kalai and Vempala, 2005], our algorithm ASL(L) is described as follows. We divide the time horizon evenly into T^{1−α} epochs and within each epoch we select the same expert. Let e_i = {(i − 1)T^α + 1, . . . , iT^α} denote the i-th epoch and ℓ^(1)_{e_i,h} = Σ_{t∈e_i} ℓ^(1)_{t,h}/T^α denote the average primary loss of expert h in the i-th epoch. We apply L to the losses {ℓ^(1)_{e_i,h}}_{h∈H} for i = 1, . . . , T^{1−α}. Let s_SL(E) and r_SL(E) denote the expected number of switches and the regret of running L for E rounds. Then we have the following theorem.

Theorem 4. Under Assumption 2 or 2′, given a switching-limited learner L, ASL(L) achieves Reg^(1) ≤ T^α r_SL(T^{1−α}) and Reg^(2)_c ≤ δT^α (s_SL(T^{1−α}) + 1). By adopting SD or FLL as the learner L, ASL(SD) and ASL(FLL) achieve max(Reg^(1), Reg^(2)_c) = O(√(log(K) T^{1+α})).

Proof. It is immediate that Reg^(1) ≤ T^α r_SL(T^{1−α}). We denote by S the random variable counting the total number of switches and by τ_1, . . . , τ_S the time steps at which the algorithm switches. For notational simplicity, let τ_0 = 1 and τ_{S+1} = T + 1. Then

Reg^(2)_c = E_A[ Σ_{t=1}^{T} (ℓ^(2)_{t,A_t} − c) ] = E_A[ Σ_{s=0}^{S} Σ_{t=τ_s}^{τ_{s+1}−1} (ℓ^(2)_{t,A_t} − c) ] ≤ E_A[ Σ_{s=0}^{S} δT^α ] = δT^α (s_SL(T^{1−α}) + 1).

Both SD and FLL have s_SL(T^{1−α}) = O(√(log(K) T^{1−α})) and r_SL(T^{1−α}) = O(√(log(K) T^{1−α})) [Geulen et al., 2010, Kalai and Vempala, 2005], which completes the proof.
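To illustrate the reduction, here is a minimal sketch of ASL(L) with a Shrinking-Dartboard-style multiplicative-weights rule standing in for the switching-limited learner L. The function names, the learning rate, and the toy data are my own assumptions; treat this as an illustration of the epoching idea rather than the authors' implementation.

```python
import numpy as np

def asl_shrinking_dartboard(loss1, alpha, eta=0.1, seed=0):
    """Epoch-based reduction behind ASL(L): play one expert per epoch of length ~T^alpha
    and run a Shrinking-Dartboard-style learner on the epoch-averaged primary losses.
    loss1: (T, K) array of primary losses in [0, 1]. Returns the sequence of picks A_t."""
    rng = np.random.default_rng(seed)
    T, K = loss1.shape
    epoch = max(1, int(np.ceil(T ** alpha)))
    w = np.ones(K)
    current = rng.integers(K)                     # initial expert drawn uniformly
    picks = np.empty(T, dtype=int)
    for start in range(0, T, epoch):
        picks[start:start + epoch] = current      # play the same expert for the whole epoch
        avg = loss1[start:start + epoch].mean(axis=0)   # epoch-averaged primary loss
        w_new = w * (1.0 - eta) ** avg                  # multiplicative-weights shrink
        # Shrinking-Dartboard rule: keep the current expert with probability w_new/w,
        # otherwise redraw from the updated distribution; this keeps switches rare.
        if rng.random() >= w_new[current] / w[current]:
            current = rng.choice(K, p=w_new / w_new.sum())
        w = w_new
    return picks

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    T, K, alpha = 4096, 4, 0.5
    loss1 = rng.uniform(size=(T, K))
    picks = asl_shrinking_dartboard(loss1, alpha)
    print("number of switches:", int((np.diff(picks) != 0).sum()))
```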
ASL(SD) and ASL(FLL) therefore match the lower bound of Ω(T^{(1+α)/2}) under Assumption 2′. However, there is a gap between the upper bound O(T^{(1+α)/2}) and the lower bound Ω(T^α) under Assumption 2, which is left as an open question. We investigate this question a bit further by showing that the guarantee of ASL(L) cannot be improved to O(T^α): we consider the class of algorithms which depend only on the cumulative losses of the experts, i.e., for which there exists a function g : R^{2K} → Δ_K such that p_t = g(L^(1)_{t−1}, L^(2)_{t−1}). Many classical algorithms, such as Exponential Weights [Littlestone et al., 1989] and Follow the Perturbed Leader [Kalai and Vempala, 2005], belong to this class. The following theorem shows that any algorithm depending only on the cumulative losses cannot achieve a better bound than Ω(T^{(1+α)/2}), which provides some intuition for designing algorithms in future work. The detailed proof can be found in Appendix B.

Theorem 5. Under Assumption 2, any algorithm depending only on the cumulative losses of the experts incurs E[max(Reg^(1), Reg^(2)_c)] = Ω(T^{(1+α)/2}).

5 Results in the bad scenario

In the bad scenario, some experts may have secondary losses with high variance. To compete with the best expert over the period in which it has low variance, we assume that the learner is given some fixed external oracle that determines which experts to deactivate and reactivate. In this section, we consider the goal of minimizing SleepReg^(1)(h*) for all h* ∈ H together with Reg^(2)_c. We study two oracles: one deactivates an unsatisfactory expert upon detecting high variance of its secondary loss and never reactivates it; the other also deactivates unsatisfactory experts upon detection but reactivates them at fixed time steps.

5.1 The first oracle: deactivating unsatisfactory experts

The oracle is described as follows. The active expert set is initialized to contain all experts, H_1 = H. At each time t = 1, . . . , T, let

H̄_t = { h ∈ H_t : ∃ t' ≤ t, Σ_{τ=t'}^{t} ( ℓ^(2)_{τ,h} − c ) > δT^α }

denote the set of active experts which do not satisfy Assumption 2. These experts are removed from the active expert set, i.e., H_{t+1} = H_t \ H̄_t. We assume that the active expert set never becomes empty.

One direct approach is to run ASL(L) as a subroutine and restart it at time t whenever some expert was deactivated at the end of round t − 1, i.e., whenever H̄_{t−1} ≠ ∅. However, restarting leads to a linear dependency on K in the sleeping regrets. To avoid this dependency, we construct pseudo primary losses: if h is active at time t, ℓ̃^(1)_{t,h} = ℓ^(1)_{t,h}; otherwise, ℓ̃^(1)_{t,h} = 1. The probability of selecting inactive experts decays because of their high pseudo losses. Since inactive experts cannot be selected at all, we additionally maintain a mapping f : H → H which maps each expert to an active expert; if ASL(L) proposes an inactive expert h at time t, we select f(h) instead. The detailed algorithm is described in Algorithm 1. Although the algorithm takes α as an input, it is worth mentioning that α is only used to decide the length of each epoch; choosing a different epoch length yields different regret upper bounds.

Algorithm 1 A1
1: Input: T, H, α, and a learner L
2: Initialize f(h) = h for all h ∈ H.
3: Start an instance of ASL(L).
4: for t = 1, . . . , T do
5:   Get expert h_t from ASL(L).
6:   Select expert f(h_t).
7:   Feed ℓ̃^(1)_t to ASL(L).
8:   For all h with f(h) ∈ H̄_t, set f(h) = h', where h' is any expert in H_{t+1}.
9: end for

Theorem 6. Let T_h denote the number of rounds in which expert h is active. Running Algorithm 1 with learner L being SD or FLL achieves

SleepReg^(1)(h*) = O(√(log(K) T_{h*} T^α))   (1)

for all h* ∈ H, and

Reg^(2)_c = O(√(log(K) T^{1+α}) + KT^α).   (2)

Proof. Since the pseudo loss ℓ̃^(1)_{t,h*} equals ℓ^(1)_{t,h*} in every round where h* is active, we have SleepReg^(1)(h*) = O(√(log(K) T_{h*} T^α)), where the last step uses the results in Theorem 4. It is also direct to bound Reg^(2)_c ≤ δT^α (O(√(log(K) T^{1−α})) + K) = O(√(log(K) T^{1+α}) + KT^α), where the first term comes from the number of switches made by ASL(L) and the second term comes from the extra switch caused each time an expert is deactivated.

For the sleeping regret of expert h*, the right-hand side of Eq. (1) is o(T_{h*}) if T_{h*} = ω(T^α), which is consistent with the impossibility result without bounded variance in Section 3. When α ≥ 1/2, the right-hand side of Eq. (2) is dominated by KT^α. This linear dependency on K is inevitable if we want SleepReg^(1)(h) = o(T_h) for all h ∈ H; the proof is given in Appendix C.

Theorem 7. Let T_h = ω(T^α) for all h ∈ H. There exists an adversary such that any algorithm achieving SleepReg^(1)(h) = o(T_h) for all h ∈ H incurs Reg^(2)_c = Ω(KT^α) for K = O(log(T)).
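A minimal sketch of the bookkeeping in Algorithm 1 follows; the underlying switching-limited learner is abstracted behind `propose`/`update` callbacks, and all names and constants are my own assumptions rather than the authors' code. It shows the two mechanisms discussed above: inactive experts are fed pseudo loss 1, and the mapping f redirects any proposed inactive expert to an active one.

```python
import numpy as np

def run_A1(loss1, loss2, c, delta, alpha, propose, update, seed=0):
    """Skeleton of the bookkeeping in Algorithm 1. `propose(t)` returns the expert suggested
    by the underlying switching-limited learner and `update(t, pseudo)` feeds it the pseudo
    primary losses; both are placeholders supplied by the caller."""
    T, K = loss1.shape
    budget = delta * T ** alpha
    active = set(range(K))
    f = {h: h for h in range(K)}            # maps every expert to an active expert
    run_excess = np.zeros(K)                # worst interval excess ending "now", per expert
    picks = np.empty(T, dtype=int)
    for t in range(T):
        h = propose(t)
        picks[t] = f[h]                     # never play a deactivated expert
        pseudo = np.where([k in active for k in range(K)], loss1[t], 1.0)
        update(t, pseudo)                   # inactive experts are charged pseudo loss 1
        # oracle step: deactivate active experts whose secondary loss violates Assumption 2
        run_excess = np.maximum(loss2[t] - c, run_excess + loss2[t] - c)
        bad = {k for k in active if run_excess[k] > budget}
        if bad:
            active -= bad
            fallback = next(iter(active))   # assumes the active set never becomes empty
            for k in range(K):
                if f[k] in bad:
                    f[k] = fallback
    return picks

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    T, K = 2000, 5
    loss1 = rng.uniform(size=(T, K))
    loss2 = rng.uniform(size=(T, K))
    loss2[:300, 0] = 1.0                    # expert 0 has a bursty secondary loss
    cum = np.zeros(K)
    def propose(t): return int(np.argmin(cum))       # naive follow-the-leader stand-in
    def update(t, pseudo): cum[:] = cum + pseudo     # in-place update of the closed-over array
    picks = run_A1(loss1, loss2, c=0.55, delta=1.0, alpha=0.5, propose=propose, update=update)
    print("plays per expert:", np.bincount(picks, minlength=K))
```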
5.2 The second oracle: reactivating at fixed times

Now we consider the oracle which deactivates unsatisfactory experts upon detection and reactivates them at fixed times. The oracle is described as follows. At N + 1 given fixed time steps t_0 = 1, t_1, . . . , t_N with t_{n+1} − t_n = Θ(T^β) for some β > α (where t_{N+1} = T + 1 for notational simplicity), the active expert set H_t is reset to H. At time t = t_n, . . . , t_{n+1} − 2 for any n = 0, . . . , N, the experts

H̄_t = { h ∈ H_t : ∃ t' such that t_n ≤ t' ≤ t, Σ_{τ=t'}^{t} ( ℓ^(2)_{τ,h} − c ) > δT^α }

are deactivated, i.e., H_{t+1} = H_t \ H̄_t. We assume that there always exist satisfactory experts, i.e., H_{t_n − 1} ≠ ∅ for all n = 1, . . . , N + 1.

Restarting Algorithm 1 at t = t_0, . . . , t_N is one of the most direct methods. Let T^(n)_h denote the number of rounds h is active during t = t_n, . . . , t_{n+1} − 1 and T_h = Σ_{n=0}^{N} T^(n)_h denote the total number of rounds h is active. Restarting then achieves SleepReg^(1)(h*) = O(Σ_{n=0}^{N} √(log(K) T^(n)_{h*} T^α)) = O(√(log(K) T_{h*} T^α N)) and Reg^(2)_c = O(Σ_{n=0}^{N} (√(log(K) T^α (t_{n+1} − t_n)) + KδT^α)) = O(√(log(K) T^{1+α} N) + NKT^α). However, if all experts are active at all times, the upper bound on SleepReg^(1)(h*) for restarting is O(√(log(K) T^{1+α} N)) = O(√(log(K) T^{2+α−β})), which is quite large.

We therefore consider a smarter algorithm with better sleeping regrets when T_h is large. The algorithm combines the construction of meta-experts for time-selection functions by Blum and Mansour [2007], which bounds the sleeping regrets, with expert selection inside each interval based on SD [Geulen et al., 2010], which bounds the number of switches. We run the algorithm in epochs of length T^α and within each epoch we play the same expert. For simplicity, we assume that the active expert set is updated only at the beginning of each epoch, which can easily be generalized. Let e_i = {(i − 1)T^α + 1, . . . , iT^α} denote the i-th epoch and E = {e_i}_{i∈[T^{1−α}]} the set of epochs. We let ℓ^(1)_{e,h} = Σ_{t∈e} ℓ^(1)_{t,h}/T^α and ℓ^(1)_{e,A} = Σ_{t∈e} ℓ^(1)_{t,A_t}/T^α denote the average primary loss of expert h and of the algorithm in epoch e. We let H_e and H̄_e denote the active expert set at the beginning of epoch e and the set of experts deactivated at the end of epoch e. We define the time-selection function for epoch e as I_{h'}(e) = 1(h' is active in epoch e) for each h' ∈ H, and construct one meta-expert for each time-selection function, i.e., K meta-experts in total. As in Algorithm 1, we use the same expert-mapping function f and the pseudo losses ℓ̃^(1)_{e,h} = ℓ^(1)_{e,h} if h is active and ℓ̃^(1)_{e,h} = 1 otherwise. The detailed algorithm is shown as Algorithm 2, and we have the following theorem, whose detailed proof is provided in Appendix D.

Theorem 8. Running Algorithm 2 achieves

SleepReg^(1)(h*) = O(√(log(K) T^{1+α}) + T_{h*} √(log(K) T^{α−1}))

for all h* ∈ H, and

Reg^(2)_c = O(√(log(K) T^{1+α}) + √(log(K) T^α) N + NKT^α).

Algorithm 2 achieves o(T_{h*}) sleeping regret for every h* with T_{h*} = ω(T^{(1+α)/2}) and outperforms restarting Algorithm 1 when NT_{h*} = ω(T). The sleeping regret SleepReg^(1)(h*) of Algorithm 2 is O(√(log(K) T^{1+α})) when T_{h*} = Θ(T), which matches the results in Theorem 4.

Algorithm 2 A2
1: Input: T, H, α, a parameter η ∈ (0, 1), and the reset times {t_n}_{n=0}^{N}
2: Initialize f(h) = h for all h ∈ H.
3: Initialize w^{h'}_{1,h} = 1/K for all h, h' ∈ H.
4: for m = 1, . . . , T^{1−α} do
5:   w_{m,h} = Σ_{h'} I_{h'}(e_m) w^{h'}_{m,h}, W_m = Σ_h w_{m,h}, and p_{m,h} = w_{m,h}/W_m.
6:   if m ∈ {(t_n − 1)/T^α + 1}_{n=0}^{N} then draw h_m from p_m;
7:   else with probability w_{m,h_{m−1}}/w_{m−1,h_{m−1}} set h_m = h_{m−1}, and with probability 1 − w_{m,h_{m−1}}/w_{m−1,h_{m−1}} draw h_m from p_m.
8:   end if
9:   Select expert f(h_m).
10:  Update w^{h'}_{m+1,h} ← w^{h'}_{m,h} (1 − η)^{ I_{h'}(e_m) ( ℓ̃^(1)_{e_m,h} − ℓ^(1)_{e_m,A} + 1 ) } for all h, h' ∈ H.
11:  For all h with f(h) ∈ H̄_{e_m}, set f(h) = h', where h' is any expert in H_{e_{m+1}}.
12: end for
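To make the weight bookkeeping of Algorithm 2 concrete, here is a compact sketch in which each expert h' indexes a time-selection function I_{h'}, the distribution p_m mixes the weight vectors of the active selectors, and a Shrinking-Dartboard-style lazy re-draw keeps the number of switches small. The variable names are mine, and a generic multiplicative-weights update stands in for step 10, so treat the constants and the exact update as assumptions.

```python
import numpy as np

def algorithm2_sketch(avg_loss, active, reset_epochs, eta=0.1, seed=0):
    """avg_loss: (M, K) epoch-averaged primary losses; active: (M, K) boolean mask of the
    active set at the start of each epoch; reset_epochs: epochs at which the oracle reactivates
    everyone and the expert is freshly redrawn. Returns the expert played in each epoch."""
    rng = np.random.default_rng(seed)
    M, K = avg_loss.shape
    W = np.full((K, K), 1.0 / K)            # W[h', h]: weight of expert h under selector I_{h'}
    f = np.arange(K)                        # redirection map to active experts
    plays = np.empty(M, dtype=int)
    prev, prev_w = None, None
    for m in range(M):
        sel = active[m].astype(float)       # I_{h'}(e_m): selector h' is "on" iff h' is active
        w = sel @ W                         # w_{m,h} = sum_{h'} I_{h'}(e_m) * W[h', h]
        p = w / w.sum()
        if m in reset_epochs or prev is None or rng.random() >= w[prev] / prev_w[prev]:
            prev = rng.choice(K, p=p)       # fresh draw (reset epoch or dartboard miss)
        plays[m] = prev if active[m, prev] else f[prev]     # redirect if the pick is inactive
        pseudo = np.where(active[m], avg_loss[m], 1.0)      # pseudo loss 1 for inactive experts
        alg = avg_loss[m, plays[m]]
        W = W * np.exp(-eta * np.outer(sel, pseudo - alg + 1.0))  # generic MW stand-in for step 10
        prev_w = w
        if m + 1 < M:                       # remap f away from experts deactivated after this epoch
            dead = active[m] & ~active[m + 1]
            if dead.any():
                fallback = int(np.flatnonzero(active[m + 1])[0])
                f[np.isin(f, np.flatnonzero(dead))] = fallback
    return plays

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    M, K = 200, 4
    avg_loss = rng.uniform(size=(M, K))
    active = np.ones((M, K), dtype=bool)
    active[50:100, 2] = False                               # expert 2 is deactivated for a while
    plays = algorithm2_sketch(avg_loss, active, reset_epochs={0, 100})
    print("number of switches:", int((np.diff(plays) != 0).sum()))
```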
6 Discussion

We introduce the study of online learning with primary and secondary losses. We find that achieving no-regret with respect to the primary loss while performing no worse than the worst expert with respect to the secondary loss is impossible in general. We propose a bounded variance assumption on the experts under which we can control the secondary loss by limiting the number of switches, and we are thereby able to bound the regret with respect to the primary loss and the regret to cT with respect to the secondary loss.

Our work is only a first step on this problem and several questions remain open. One is the optimality under Assumption 2. As discussed above, our bounds on max(Reg^(1), Reg^(2)_c) in the good scenario are not tight, and we show that any algorithm depending only on the cumulative losses incurs Ω(T^{(1+α)/2}); this indicates that an optimal algorithm cannot depend only on the cumulative losses if the optimal bound is o(T^{(1+α)/2}). Under Assumption 2′, the upper bound of the switching-limited algorithm matches the lower bound. This possibly implies that limiting the number of switches may not be the best way to make use of the additional information provided by Assumption 2. In the bad scenario with access to the oracle which reactivates experts at fixed times, our sleeping regret bounds depend not only on T_h but also on T, which makes the bounds vacuous when T_h is small. It is unclear whether we can obtain optimal sleeping regrets depending only on T_h for all h ∈ H. AdaNormalHedge by Luo and Schapire [2015] achieves sleeping regret of O(√T_h) without any bound on the number of switches; how to achieve sleeping regret of o(T_h) with limited switching cost is of independent research interest. In the bad scenario where Assumption 2 does not hold, we assume that c is pre-specified and known to the oracle. Theorem 1 shows that achieving max(Reg^(1), Reg^(2)_c) = o(T) with c = max_h L^(2)_{T,h}/T is impossible without any external oracle. How to define a setting with an unknown c and design a reasonable oracle in that setting is an open question.

Broader Impact

This research studies a society-constrained online decision-making problem, in which we take the decision receiver's objective into consideration. Therefore, in a decision-making process (e.g., deciding whether to hire a job applicant, whether to approve a loan, or whether to admit a student to an honors class), the decision receivers (e.g., job applicants, loan applicants, students) could benefit from our study at the cost of slightly increasing the loss of the decision makers (e.g., recruiters, banks, universities). The consequences of failure of the system and biases in the data are not applicable.

Acknowledgments and Disclosure of Funding

This work was supported in part by the National Science Foundation under grant CCF-1815011.

References

Jacob Abernethy, Peter L. Bartlett, and Elad Hazan. Blackwell approachability and no-regret learning are equivalent. In Proceedings of the 24th Annual Conference on Learning Theory, pages 27–46, 2011.

Jason Altschuler and Kunal Talwar. Online learning over a finite action set with limited switching. In Conference on Learning Theory, pages 1569–1573, 2018.

Peter Auer, Chao-Kai Chiang, Ronald Ortner, and Madalina Drugan. Pareto front identification from stochastic bandit feedback. In Artificial Intelligence and Statistics, pages 939–947, 2016.

David Blackwell. An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics, 6(1):1–8, 1956.

Avrim Blum.
Empirical support for winnow and weighted-majority algorithms: Results on a calendar scheduling domain. Machine Learning, 26(1):5–23, 1997.

Avrim Blum and Yishay Mansour. From external to internal regret. Journal of Machine Learning Research, 8(Jun):1307–1324, 2007.

Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

Alexey Chernov and Vladimir Vovk. Prediction with expert evaluators' advice. arXiv preprint arXiv:0902.4127, 2009.

Eyal Even-Dar, Michael Kearns, Yishay Mansour, and Jennifer Wortman. Regret to the best vs. regret to the average. Machine Learning, 72(1-2):21–37, 2008.

Eyal Even-Dar, Robert Kleinberg, Shie Mannor, and Yishay Mansour. Online learning for global cost functions. In COLT, 2009.

Yoav Freund, Robert E. Schapire, Yoram Singer, and Manfred K. Warmuth. Using and combining predictors that specialize. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, pages 334–343, 1997.

Sascha Geulen, Berthold Vöcking, and Melanie Winkler. Regret minimization for online buffering problems using the weighted majority algorithm. In COLT, pages 132–143, 2010.

C-L Hwang and Abu Syed Md Masud. Multiple objective decision making methods and applications: A state-of-the-art survey, volume 164. Springer Science & Business Media, 2012.

Adam Kalai and Santosh Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291–307, 2005.

Varun Kanade and Thomas Steinke. Learning hurdles for sleeping experts. ACM Transactions on Computation Theory (TOCT), 6(3):1–16, 2014.

Michael Kapralov and Rina Panigrahy. Prediction strategies without loss. In Advances in Neural Information Processing Systems, pages 828–836, 2011.

Robert Kleinberg, Alexandru Niculescu-Mizil, and Yogeshwer Sharma. Regret bounds for sleeping experts and bandits. Machine Learning, 80(2-3):245–272, 2010.

Wouter M. Koolen. The Pareto regret frontier. In Advances in Neural Information Processing Systems, pages 863–871, 2013.

Nick Littlestone, Manfred K. Warmuth, et al. The weighted majority algorithm. University of California, Santa Cruz, Computer Research Laboratory, 1989.

Haipeng Luo and Robert E. Schapire. Achieving all with no parameters: AdaNormalHedge. In Conference on Learning Theory, pages 1286–1304, 2015.

Shie Mannor, Vianney Perchet, and Gilles Stoltz. Approachability in unknown games: Online learning meets multi-objective optimization. In Conference on Learning Theory, pages 339–355, 2014.

Amir Sani, Gergely Neu, and Alessandro Lazaric. Exploiting easy data in online optimization. In Advances in Neural Information Processing Systems, pages 810–818, 2014.

Eralp Turgay, Doruk Oner, and Cem Tekin. Multi-objective contextual bandit problem with similarity information. In International Conference on Artificial Intelligence and Statistics, pages 1673–1681, 2018.