# nonclairvoyant_scheduling_with_partial_predictions__aaea6953.pdf

Non-clairvoyant Scheduling with Partial Predictions

Ziyad Benomar 1 2 Vianney Perchet 1 3

The non-clairvoyant scheduling problem has gained new interest within learning-augmented algorithms, where the decision-maker is equipped with predictions without any quality guarantees. In practical settings, access to predictions may be reduced to specific instances, due to cost or data limitations. Our investigation focuses on scenarios where predictions for only B job sizes out of n are available to the algorithm. We first establish near-optimal lower bounds and algorithms in the case of perfect predictions. Subsequently, we present a learning-augmented algorithm satisfying the robustness, consistency, and smoothness criteria, and revealing a novel tradeoff between consistency and smoothness inherent in the scenario with a restricted number of predictions.

1. Introduction

Optimal job scheduling is a longstanding and actively studied class of optimization problems (Panwalkar & Iskander, 1977; Lenstra & Rinnooy Kan, 1978; Graham et al., 1979; Martel, 1982; Cheng & Sin, 1990; Lawler et al., 1993;

Pinedo, 2012), with applications in various domains spanning from supply chain management (Hall & Potts, 2003; Ivanov et al., 2016) to operating systems (Jensen et al., 1985; Ramamritham & Stankovic, 1994; Steiger et al., 2004). A particular setting is preemptive single-machine scheduling (Pinedo, 2012; Baker & Trietsch, 2013), where n jobs i [n] must be executed on the same machine, with the possibility of interrupting a job and resuming it afterward, and the objective is to minimize the sum of their completion times. An algorithm is called clairvoyant if it has initial access to the job sizes, otherwise, it is called nonclairvoyant (Motwani et al., 1994). The design of nonclairvoyant scheduling algorithms is a classical problem

1ENSAE, FAIRPLAY joint team, CREST, Palaiseau, France 2Ecole polytechnique, Palaiseau, France 3Criteo AI Lab, FAIRPLAY joint team, Paris, France. Correspondence to: Ziyad Benomar <ziyad.benomar@ensae.fr>.

Proceedings of the 41 st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

in competitive analysis and online algorithms (Borodin & El-Yaniv, 2005). In this paradigm, decisions must be made in an environment where the parameters governing the outcome are unknown or might evolve over time.

Due to the inherent difficulty of the problems in competitive analysis, the performance of any online algorithm stays bounded away from that of the optimal offline algorithm. However, the ascent of machine learning motivated the incorporation of predictions in algorithm design, which started in works such as (Munoz & Vassilvitskii, 2017; Kraska et al., 2018), then was formalized in (Lykouris & Vassilvtiskii, 2018) and (Purohit et al., 2018). Since then, learningaugmented algorithms became a popular research topic and had multiple applications (Mitzenmacher & Vassilvitskii, 2022). The outcome of these algorithms depends both on the parameters of the problem and the quality of the predictions. They are required to have a performance that is near-optimal when the predictions are accurate (consistency), near the worst-case performance without advice if the predictions are arbitrarily erroneous (robustness), and that degrades smoothly as the prediction error increases (smoothness).

In practice, predictions often incur costs and, at times, are infeasible due to the lack of data. It is, therefore, crucial to understand the limitations and the feasible improvements with a restrained number of predictions in scenarios with multiple unknown variables. This question was first investigated for the caching problem (Im et al., 2022), and very recently for metrical task systems (Sadek & Elias, 2024), in settings where the algorithm is allowed to query a limited number of predictions. It was also explored for the scheduling problem (Benomar & Perchet, 2023), assuming that the decision-maker can query the true sizes of B jobs out of n. The authors present a 2 B(B 1)

n(n 1) -competitive algorithm, and they give a lower bound on the competitive ratio of any algorithm only when B = o(n). The case of imperfect predictions, however, is not examined.

In non-clairvoyant scheduling, besides the querying model studied in the works mentioned above, predictions of the sizes of certain jobs i I may be available, where I [n], perhaps derived from previous executions of similar tasks. Assuming that I is a subset of [n] of size B, taken uniformly at random, we examine the limitations and possible improvements of non-clairvoyant algorithms.

Non-clairvoyant Scheduling with Partial Predictions

1.1. Contributions

We initiate our analysis by addressing the scenario of perfect predictions. We establish a lower bound on the (n, B)- competitive ratio of any algorithm (for fixed n 2 and B n), which extends the lower bound of 2 4 n+3 in the non-clairvoyant case (Motwani et al., 1994). Considering that B = wn + o(n) for some w [0, 1], we derive from the prior bound that the competitive ratio of any algorithm is at least 2 w ( 4

e 1)w(1 w), and we show an improved bound of 2 w (3 2

2)w(1 w). Demonstrating these bounds is considerably more challenging than the case B = 0, due to the eventual dependency between the actions of the algorithm and the known job sizes.

In the case of perfect predictions, we show that knowing only the relative order of the B job sizes, without knowledge of their values, enables a (2 B

n )-competitive algorithm, which improves substantially upon the result of (Benomar & Perchet, 2023). We propose a second algorithm leveraging the true sizes of the B jobs, yielding an (n, B)-competitive ratio of (2 B

n+1 ), which is strictly better than the former, although both are asymptotically equivalent.

Subsequently, we adapt the latter algorithm to handle imperfect predictions. While the difficulty in most works on learning-augmented algorithms lies in ensuring robustness and consistency, smoothness in the case of scheduling with limited predictions is also not immediate. Alongside the typical consistency-robustness tradeoff, our algorithm also exhibits a consistency-smoothness tradeoff. More precisely, governed by two hyperparameters λ, ρ [0, 1], the (n, B)-competitive ratio of the algorithm is at most min( 2 1 λ, C

OPT ). Here, E[η] denotes the total expected prediction error, OPT is the objective function achieved by the optimal offline algorithm, 2 1 λ is the algorithm s robustness, C

λ = 1 λ(2 B

n 1 )) its consistency, and S

n ) its smoothness factor, characterizing the sensitivity of the bound to E[η]. Notably, alterations in the parameter ρ yield opposing variations on the consistency and the smoothness factor. Nonetheless, this tradeoff vanishes for B close to 0 or n, and does not appear, for instance, in (Purohit et al., 2018), (Bampis et al., 2022) or (Lindermayr & Megow, 2022).

We illustrate our results for the case of perfect predictions in Figure 1, comparing them with the competitive ratio proved in (Benomar & Perchet, 2023).

1.2. Related work

Since their introduction in (Purohit et al., 2018; Lykouris & Vassilvtiskii, 2018), learning-augmented algorithms witnessed an exponentially growing interest, as they offered a fresh perspective for revisiting online algorithms, and pro-

0.0 0.2 0.4 0.6 0.8 1.0 Fraction B/n of known job sizes

Competitive ratio

Lower bound (Corollary 2.2) Lower bound (Corollary 2.3) CR of Algorithms 1 and 2 CR of (Benomar & Perchet, 2023)

Figure 1. Lower bounds and competitive ratios for B known job sizes.

vided new applications for machine learning in algorithm design (Mitzenmacher & Vassilvitskii, 2022) and in the implementation of data structures (Kraska et al., 2018; Lin et al., 2022). Many fundamental problems in competitive analysis were studied in this setting, such as ski rental (Gollapudi & Panigrahi, 2019; Anand et al., 2020; Bamas et al., 2020b; Diakonikolas et al., 2021; Antoniadis et al., 2021a; Maghakian et al., 2023; Shin et al., 2023), secretary (Antoniadis et al., 2020; D utting et al., 2021), matching (Dinitz et al., 2021; Chen et al., 2022; Sakaue & Oki, 2022; Jin & Ma, 2022), caching and metrical task systems (Lykouris & Vassilvtiskii, 2018; Chlkedowski et al., 2021; Antoniadis et al., 2023b;a; Christianson et al., 2023). In particular, scheduling is one of the problems that were studied most thoroughly. Different works cover various objective functions (Purohit et al., 2018; Lattanzi et al., 2020; Azar et al., 2021), prediction types (Antoniadis et al., 2021b; Merlis et al., 2023; Lassota et al., 2023), error metrics (Im et al., 2021; Lindermayr & Megow, 2022), and other aspects and applications (Wei & Zhang, 2020; Bamas et al., 2020a; Dinitz et al., 2022)

The setting of learning-augmented algorithms with limited predictions was initially explored by Im et al. (2022) for caching. The authors presented an algorithm using parsimonious predictions, with a competitive ratio increasing with the number of allowed queries. In another very recent paper (Sadek & Elias, 2024), a similar setting is studied for the more general problem of metrical task systems, where the algorithm is allowed to query a reduced number of action predictions (Antoniadis et al., 2023b), each giving the state of an optimal algorithm at the respective query step. An additional related study by Drygala et al. (2023) focuses on the ski-rental and Bahncard problems in a penalized adaptation of the setting with limited advice, where the cost of the predictions is added to the algorithm s objective function. Other works have explored related settings with different types of limited advice. For instance, the setting with a restricted number of perfect hints was examined in the con-

Non-clairvoyant Scheduling with Partial Predictions

text of online linear optimization by (Bhaskara et al., 2021) and in the multi-color secretary problem by (Benomar et al., 2023). Another setting, where the algorithm can query two types of hints one that is free but possibly inaccurate, and another that is expensive but accurate has been studied in several problems, such as correlation clustering (Silwal et al., 2023), computing minimum spanning trees in a metric space (Bateni et al., 2023), sorting (Bai & Coester, 2024), and matroid optimization (Eberle et al., 2024).

In the context of scheduling, Benomar & Perchet (2023) introduced the B-clairvoyant scheduling problem, where an algorithm can query the exact sizes of B jobs at any moment during its execution. They show that the optimal strategy involves querying the sizes of B jobs selected uniformly at random at the beginning of the process. They establish that, if B = o(n), then the competitive ratio of any algorithm is at least 2, then they provide a 2 B(B 1)

n(n 1) -competitive algorithm. The same paper also addresses the secretary problem with restricted access to binary predictions of known accuracy, and the ski-rental problem with access to an oracle whose accuracy improves progressively over time.

The limit scenario B = 0 corresponds to the non-clairvoyant scheduling problem, studied in-depth in (Motwani et al., 1994). In particular, the paper demonstrates that the competitive ratio of any non-clairvoyant algorithm is at least 2, and that it is achieved by the round-robin algorithm, executing all unfinished jobs concurrently at equal rates. On the other hand, B = n corresponds to the setting presented in (Purohit et al., 2018), where the authors introduce, for all λ (0, 1), a preferential round-robin algorithm with robustness 2 1 λ and consistency 1

1.3. Problem and notations

The decision-maker is given n jobs i [n] with unknown sizes x1, . . . , xn to schedule on a single machine, and predictions (yi)i I of (xi)i I, with I a uniformly random subset of [n] of size B. The objective is to leverage the available predictions to minimize the sum of the completion times. We assume that preemption is allowed, i.e. it is possible to interrupt the execution of a job and resume it later, which is equivalent, by neglecting the preemption cost, to assuming that the jobs can be run in parallel at rates that sum to at most 1.

To simplify the presentation, we consider that there are predictions y1, . . . , yn of x1, . . . , xn, but the decision-maker has only access to yσ(1), . . . , yσ(B), where σ is a uniformly random permutation of [n]. We denote by ηi = |xi yi| the error of the prediction yi, and by ησ = PB i=1 ησ(i) the total error of the predictions accessed by the algorithm.

Consider an algorithm A and an instance x = (x1, . . . , xn) of job sizes, we denote by A(x) the sum of the completion

times of all the jobs when executed by A. Furthermore, for all i = j [n] and t > 0, we denote by

SA i (t) the processing time spent on job i until time t,

t A i = inf{t 0 : SA i (t) = xi} its completion time,

DA ij = SA i (t A j ) the total time spent on job i before job j terminates,

and P A ij = DA ij + DA ji the mutual delay caused by i, j to each other.

When there is no ambiguity, we omit writing the dependency to A. With these notations, it holds that t A i = xi+P

j =i DA ji for all i [n]. Consequently, the objective function of A can be expressed as

1 i<j n P A ij . (1)

Observing that, for all i = j [n], if i terminates before j then P A ij xi, otherwise P A ij xj, we deduce that P A ij min(xi, xj). Equality is achieved by the clairvoyant algorithm that runs the jobs until completion in nondecreasing size order (Motwani et al., 1994), which is the optimal offline algorithm, that we denote OPT, satisfying

1 i<j n min(xi, xj) . (2)

When the predictions are perfect, for all n 2 and B n, we define the (n, B)-competitive ratio of algorithm A as the worst-case ratio between its objective, knowing the sizes of B jobs taken uniformly at random, and that of OPT, on instances of n jobs

Rn,B(A) = sup x (0, )n E[A(x)] OPT(x) ,

where the expectation E[A(x)] is taken over the permutation σ and the actions of A if it a randomized algorithm.

If the number of predictions depends on the number of jobs, i.e. B = (Bn)n 1 defines a sequence of integers, then the competitive ratio of A is defined by

CRB(A) = sup n 2 Rn,Bn(A) .

When the predictions are imperfect, the competitive ratio becomes also a function of E[ησ].

Non-clairvoyant Scheduling with Partial Predictions

2. Lower Bounds

In this section, we assume that the predictions are error-free, and we establish lower bounds on the (n, B)-competitive ratio of any algorithm, followed by a lower bound independent of n when Bn = wn + o(n) for some w [0, 1]. These lower bounds are obtained by constructing random job size instances x such that, for any deterministic algorithm A, the ratio Eσ,x[A(x)]/Ex[OPT(x)] is above them. The result then extends to randomized algorithms and yields bounds on their (n, B)-competitive ratio by using Lemma A.1, which is a consequence of Yao s principle (Yao, 1977).

In all this section, we consider i.i.d. job sizes. Therefore, we can assume without loss of generality that the B known job sizes are x1, . . . , x B.

In the non-clairvoyant case B = 0, for any algorithm A, taking i.i.d. exponentially distributed sizes gives, with easy computation, that E[P A ij] = 1 (Remark A.4), which yields, using Equations (1) and (2), the lower bound 2 4 n+3 on the competitive ratio (Motwani et al., 1994). However, if B > 0, the algorithm can act according to the information it has on (xi)i I, and the dependence between its actions and these job sizes makes the analysis more sophisticated.

For any positive and continuous function φ, and positive numbers T x > 0 we denote

Gφ(x, T) = Z T x

dt φ(t) + x φ(T x) . (3)

We prove in the following theorem a generic lower bound, using job sizes sampled independently from the distribution Pr(xi t) = 1 φ(0)

Theorem 2.1. Let φ : [0, ) [0, ) be a continuously differentiable and increasing function satisfying φ(0) > 0, φ /φ is non-increasing and R 0 dt φ(t)2 < , and let αφ a non-negative constant satisfying Z

inf T x Gφ(x, T) φ (x)

φ(x)2 dx αφ

dt φ(t)2 . (4)

If B = wn + o(n) for some w [0, 1], then it holds for any randomized algorithm A that

CRB(A) 2 2(2 αφ)w + (3 2αφ)w2 .

Moreover, if R 0 dt φ(t) < , then for all n 2 and B n

Rn,B(A) Cφ,n,B Cφ,n,B 1

R 0 dt φ(t)2 R 0 dt φ(t)

where Cφ,n,B = (2 B

n ) (3 2αφ) B

To establish this theorem, we analyze i.i.d. job sizes sampled from the distribution Pr(xi t) = 1 φ(0)

derive in Lemma A.5 a lower bound on the mutual delays incurred by these jobs during the run of any algorithm A. This involves solving a functional minimization problem, whose solution is expressed using the function Gφ defined in (3). The left term in Inequality (4) is proportional to the obtained lower bound, while the right term is proportional to E[min(xi, xj)], which is the mutual delay caused in a run of OPT. Finally, using the identity (1), this inequality, which relates the mutual delays caused respectively by executing A and OPT on the chosen job sizes, can be extended to an inequality involving the objectives of both algorithms, giving a lower bound on the competitive ratio.

If R 0 dt φ(t) = , the expectation of the job sizes is infinite. In this case, we consider a truncated distribution with a maximum a > 0. After completing the analysis, we derive a lower bound that depends on a and w, by considering B = wn + o(n) and n , then, we conclude by taking the limit a .

For any value αφ in Theorem 2.1, observe that Cφ,n,0 = 2 and Cφ,n,n = 1, which means that the lower bound interpolates properly the non-clairvoyant and clairvoyant settings. The remaining task is to choose an adequate function φ satisfying the conditions of the theorem with αφ as large as possible. We first consider exponentially distributed job sizes, often used to prove lower bounds in scheduling problems. This corresponds to φ(t) = et.

Corollary 2.2. For any algorithm A, it holds that

Rn,B(A) Cn,B 4(Cn,B 1)

with Cn,B = 2 B

n 1 . In particular, if B = wn + o(n) then

CRB(A) (2 w) ( 4

e 1)w(1 w) .

Corollary 2.2 gives, in particular, that Rn,0(A) 2 4 n+3, which corresponds exactly to the tight lower bound for the non-clairvoyant scheduling problem (Motwani et al., 1994). However, the bound is not tight for all values of B n. To refine it, we consider distributions that would be more difficult to process by the algorithm. One idea is to sample a different parameter λi E(1) independently for each i [n], then sample xi E(λi). The distribution of xi in this case is given by

Pr(xi t) = Z

0 Pr(xi t | λi = λ)e λdλ

0 e (1+t)λdλ = 1 1 + t ,

which corresponds to φ(t) = 1 + t. More generally, we consider φ(t) = (1 + t)r for r ( 1

2, 1]. Such functions

Non-clairvoyant Scheduling with Partial Predictions

φ correspond to distributions with a heavy tale and with infinite expectation (i.e. R 0 dt φ(t) = ). Using Theorem 2.1, they only yield lower bounds on the competitive ratio but not on Rn,B. Corollary 2.3 shows the bound obtained for r 1

Corollary 2.3. Let w [0, 1]. If B = wn + o(n), then it holds for any algorithm A that

CRB(A) (2 w) (3 2

Remark 2.4. If xi is sampled from the distribution induced by φ(t) = (1+t)r, then xi +1 follows a Pareto distribution with scale 1 and shape r (Arnold, 2014), which is commonly used to model the distribution of job sizes in the context of the scheduling problem.

3. Known Partial Order

Before investigating the problem within the learningaugmented framework, we introduce an algorithm exclusively for the scenario with perfect information. Subsequently, in Section 4, we present a second algorithm, that we adapt to handle possibly erroneous predictions.

The optimal algorithm OPT does not necessitate precise knowledge of job sizes. Instead, it relies solely on their ordering. This observation suggests that it might be possible to improve the competitive ratio of the non-clairvoyant case by only knowing the relative order of a subset of the job sizes. Therefore, rather than having access to the values xσ(1), . . . , xσ(B), we assume that the decision-maker is only given a priority ordering π of them, i.e. a bijection π : [B] σ([B]) satisfying xπ(1) . . . xπ(B).

Algorithm 1 Catch-up and Resume Round-Robin (CRRR)

Input: Ordering π of xσ(1), . . . , xσ(1) Set xπ(0) = 0 for i = 1 to B do

Run job π(i) for xπ(i 1) units of time while job π(i) is not finished do

Run round-robin on {σ(j)}j>B {π(i)} end while end for Run round-robin on {σ(j)}n j=B+1 until completion

In Algorithm 1 (CRRR), for all i [B], the execution of job π(i) starts only upon the completion of job π(i 1). At this moment, all jobs σ(j) for j > B are either completed or have undergone execution for xπ(i 1) units of time. CRRR then runs job π(i) for a period of length xπ(i 1) to catch up with the progress of the jobs {σ(j)}j>B. Following this synchronization phase, it runs round-robin on the set of jobs {σ(j)}j>B {π(i)} until π(i) terminates. The same process iterates with π(i + 1) afterward. Once all the jobs

σ(i) for i [B] are completed, the algorithm runs round robin on the unfinished jobs in {σ(j)}j>B.

Leveraging the ordering π, the algorithm aims to minimize the delays caused by longer jobs to shorter ones. In the ideal scenario where B = n, each job begins execution only after all shorter ones have been completed. When B < n, it is evident that the jobs {σ(i)}i [B] should be executed in the order specified by π. However, CRRR takes advantage of this ordering even further, ensuring that job xπ(i) not only avoids delaying xπ(i 1) but also does not delay any job σ(j) with j > B that has a size at most xπ(i 1).

Theorem 3.1. Algorithm CRRR satisfies

n ) (n + 1)(B + 1) Rn,B(CRRR) 2 B

Moreover, if B = wn for some w [0, 1], then CR(CRRR) = 2 w.

Theorem 3.1 shows a substantially stronger result than the one presented in (Benomar & Perchet, 2023), where the algorithm leveraging the values of the job sizes xσ(1), . . . , xσ(B) is only 2 B(B 1)

n(n 1) -competitive.

Action predictions The information provided to CRRR is the order in which the jobs {σ(i)}i [B] would be executed by OPT. This corresponds to the error-free scenario of action predictions (Antoniadis et al., 2023b; Lindermayr & Megow, 2022; Lassota et al., 2023; Sadek & Elias, 2024), where the decision-maker receives predictions regarding the actions taken by the optimal offline algorithm, rather than numeric predictions of unknown parameters. In the context of the scheduling problem, utilizing the ℓ1 norm to measure the error is not ideal for analyzing the action prediction setting (Im et al., 2021). Alternative error metrics, which account for the number of inversions in the predicted permutation in comparison to the true one (Lindermayr & Megow, 2022), would be more suitable. Therefore, adapting CRRR to imperfect action predictions is left for future research as it requires different considerations. For now, we shift our focus to introducing another algorithm that utilizes not only the priority order induced by the job sizes, but the size values themselves.

4. Predictions of the Job Sizes

We propose in this section a generic algorithm Switch, which we will adapt in the cases of perfect and imperfect predictions. The algorithm takes as input n jobs with unknown sizes and breakpoints zσ(1), . . . , zσ(B) that depend on the predictions of xσ(1), . . . , xσ(B), then it alternates running round-robin on the jobs {σ(j)}j>B and Shortest Predicted Job First (SPJF), introduced in (Purohit et al., 2018), on the

Non-clairvoyant Scheduling with Partial Predictions

jobs {σ(i)}i [B], where the moment of switching from an algorithm to another is determined by the breakpoints.

As in Section 3, we call ordering of zσ(1), . . . , zσ(B) any bijective application π : [B] σ([B]) satisfying zπ(1) . . . zπ(B). Note that, if the breakpoints are not pairwise distinct, then the ordering is not unique. In that case, Switch chooses an ordering π uniformly at random. We assume furthermore that the breakpoints induce the same order as the predictions, i.e. zσ(i) < zσ(j) yσ(i) < yσ(j) for all i, j [B].

Algorithm 2 Switch algorithm Switch(zσ, x)

Input: Breakpoints zσ = (zσ(i))i [B] π ordering of zσ chosen uniformly at random for i = 1 to B do

while min j>B

xσ(j) < 1 and max j>B Sσ(j)(t) < zπ(i) do

Run round-robin on {σ(j)}n j=B+1 end while Run job π(i) until completion end for Run round-robin on {σ(j)}n j=B+1 until completion

Consider a run of Switch, and let i [B]. The first condition for entering the while loop is the existence of j > B such that Sσ(j)(t) < xσ(j). This signifies that the jobs {σ(j)}j>B are not all completed, which is a verification feasible for the decision-maker without knowledge of the sizes {xσ(j)}j>B. The second condition means that no job σ(j) for j > B has been in execution for more than zπ(i) units of time. Given that round-robin allocates equal importance to all jobs {σ(j)}j>B, upon exiting the while loop, each job σ(j) is either completed or has been in execution for precisely zπ(i) units of time. Following this step, job π(i) is executed until completion, and the same process recurs for i + 1.

This algorithm ensures that any job xσ(i) with i B does not delay any other job j whose size is at most xj zσ(i), and the delay it causes to jobs not satisfying this condition is exactly xσ(i). This allows efficient control of the mutual delays between the jobs by conveniently choosing the breakpoints.

4.1. Perfect Predictions

Assuming that the predictions are perfect, i.e. the decisionmaker knows the exact sizes of jobs σ(1), . . . , σ(B), it is possible to set zσ(i) = xσ(i) for all i [B].

Theorem 4.1. Algorithm Switch with breakpoints zσ(i) = xσ(i) for all i [B] satisfies

Rn,B(Switch) = 2 B

n ) n + 1 .

In particular, if B = wn for some w [0, 1] then CRB(Switch) = 2 w.

Note that the (n, B)-competitive ratio above is strictly better than that of CRRR, presented in Theorem 3.1. However, both algorithms have equivalent performance when n is large. In particular, their competitive ratios coincide when B = wn .

A slight improvement on the (n, B)-competitive ratio can be obtained by introducing randomness into Switch. Indeed, consider the Run To Completion algorithm (RTC) defined in (Motwani et al., 1994), executing all the jobs until completion in a uniformly random order. Then we have the following result.

Proposition 4.2. The algorithm that runs RTC with probability 2(n B) n(n+3) 2B , and runs Switch with breakpoints zσ(i) = xσ(i) for all i [B] with the remaining probability, has an (n, B)-competitive ratio of

n ) n + 3 2B

For B = 0, the ratio above becomes 2 4 n+3, which is the best possible in the non-clairvoyant setting.

4.2. Imperfect Predictions

We assume in this section that no quality guarantees are given on the predictions {yσ(i)}i [B]. Recall that the total error ησ = PB i=1 |xσ(i) yσ(i)| is a random variable because σ is a uniformly random permutation of [n], hence our results will depend on E[ησ].

The goal is to design an algorithm that is consistent, robust, and with a competitive ratio having a smooth dependency to E[ησ]. We first study the consistency and smoothness of Switch with well-chosen breakpoints, then we show that combining it with round-robin as in (Purohit et al., 2018; Lassota et al., 2023) gives robustness guarantees.

Using the trivial breakpoints zσ(i) = yσ(i) as in the previous section is not enough to guarantee smoothness. Consider, for example, job sizes all equal to 1, and B predictions yσ(i) = 1 ϵ for an arbitrarily small ϵ. Blindly following these predictions, taking zσ(i) = yσ(i) for all i [B], results in delaying all jobs with unknown sizes by B time units compared to the case of perfect predictions. This creates a discontinuity in the competitive ratio when ϵ becomes positive, proving non-smoothness. Hence, we consider instead randomized breakpoints.

Algorithm 3 simply runs Switch with random breakpoints zσ(i) = ξyσ(i). The following lemma gives an upper bound on the algorithm s objective function depending on the distribution F of ξ.

Non-clairvoyant Scheduling with Partial Predictions

Algorithm 3 imperfect predictions Switch(ξyσ, x)

Input: predictions (yσ(i))i [B], distribution F Sample ξ F Run Switch with breakpoints zσ(i) = ξyσ(i)

Lemma 4.3. Let F be a probability distribution on (0, ), and consider the mappings h F : (0, )2 R and g F : (0, ) R defined by

h F (s, t) = t Prξ F (ξ < s

g F (s) = (1 s)Prξ F (ξ < s) + Eξ F [ξ1ξ<s] .

Let βF = sups (0,1] g F (s)

s and γF = sups 1(g F (s) + s). If βF , γF < and h F is LF -Lipschitz w.r.t. the second variable t, then for any job sizes x1, . . . , xn and B n, the expected sum of the completion times achieved by Switch with breakpoints zσ(i) = ξyσ(i) is at most

i=1 xi + C1 n,B,F X

i<j min(xi, xj) + C2 n,B,F E[ησ] ,

with C1 n,B,F = 2 B

n 2 βF γF B

n 1 and C2 n,B,F = (1 + LF + E[ξ])(n B) + B 1.

A trivial choice of ξ is the constant random variable equal to 1 a.s., but this is not enough to guarantee smoothness, as it corresponds to the Dirac distribution F = δ1, for which h F is not continuous w.r.t. to t. In the next lemma, we provide a specific choice of distribution F depending on a single parameter ρ, and we express the upper bound from the previous lemma using this parameter.

Lemma 4.4. Let ρ (0, 1] and

F : s 7 (1 e (s 1)/ρ)1s>1

a shifted exponential distribution with parameter 1/ρ, i.e. ξ 1 + E(1/ρ), then Switch with breakpoints zσ(i) = ξyσ(i) for all i [B] has an (n, B)-competitive ratio of at most 2 B

n 1 ) + 4 ρ(1 B

The previous lemma highlights a tradeoff between the smoothness and the consistency of the algorithm. Indeed, as ρ decreases, the algorithm gains in consistency, but the term 4 ρ(1 B

n multiplying E[ησ] becomes larger. However, while setting ρ close to zero results in an arbitrarily high sensitivity to the error, setting it close to 1 gives consistency of at most 2 B(B 1)

n(n 1) , which is still a decreasing function of B, interpolating the values 2 and 1 in the clairvoyant and non-clairvoyant cases. This implies

that sacrificing a small amount of consistency significantly improves smoothness.

For B = n, assuming that all the job sizes are at least 1, it holds that OPT(x) n(n + 1)/2 and the lemma gives Rn,B(η; Switch) 1 + 2η

n , matching the bound proved on (SPJF) in (Purohit et al., 2018). On the other hand, if η = 0, setting ρ = 0 results in Rn,B(0; Switch) 2 B

n . Using tighter inequalities in the proof, it is possible to retrieve the bound established in Theorem 4.1 (See Inequality (36) in Appendix C).

Preferential algorithm Now we need to adapt the algorithm to guarantee robustness in the face of arbitrarily erroneous predictions. We use the same approach as Lemma 3.1 of (Purohit et al., 2018), which consists of running concurrently a consistent algorithm and round-robin at respective rates λ, 1 λ for some λ [0, 1]. However, their result only applies for deterministic algorithms A satisfying for any instances x = (x1, . . . , xn) and x = (x 1, . . . , x n) that i [n] : xi x i = A(x) A(x ) .

Such algorithms are called monotonic. Switch with breakpoints zσ(i) = ξyσ(i) is not deterministic since its objective function depends both on σ and ξ. Nonetheless, we overcome this difficulty by proving that, conditionally to σ and ξ, its outcome is deterministic and monotonic, then we establish the following theorem.

Theorem 4.5. Let ρ (0, 1] and F = 1+E(1/ρ). Then the preferential algorithm ALGλ which runs Algorithm 3 at rate λ and round-robin at rate 1 λ has an (n, B)-competitive ratio of at most

min 2 1 λ , Cρ,n,B

Cρ,n,B = 2 B

This upper bound generalizes that of (Purohit et al., 2018). It presents a consistency-robustness tradeoff that can be tuned by adjusting the parameter λ, and a consistency-smoothness tradeoff controlled by the parameter ρ, which vanishes for B close to 0 or n, as the terms multiplying ρ and 1/ρ respectively in Cρ,n,B and Sρ,n,B become zero.

5. Experiments

In this section, we validate our theoretical findings by testing the algorithms we presented on various benchmark job sizes. In all the figures, each point is averaged over 104

independent trials.

Non-clairvoyant Scheduling with Partial Predictions

Perfect information We test the performance of Algorithms Switch and CRRR against the hard instances used to prove the lower bounds of Section 2: we consider i.i.d. job sizes sampled from the exponential distribution with parameter 1, and job sizes drawn from the distribution Φ(r, a) with parameters r = 0.51 and a = 104, characterized by the tail probability

Pr(xa i t) = (1 + t) r (1 + a) r

1 (1 + a) r 1t<a ,

This distribution is a truncated version of the one defined by Pr(x i t) = 1 (1+t)r . The bound of Corollary 2.3 is obtained by using this distribution for a > 0 and r ( 1

2, 1), and taking the limits n , a , and r 1/2.

Switch CRRR Cor 2.3

xi Φ(0.51,104)

Switch CRRR Cor 2.4

0 1/2 1 1.00

2.00 Switch CRRR Cor 2.3

Switch CRRR Cor 2.4

Empirical competitive ratio

fraction B/n of known job sizes

Figure 2. Lower bounds and ratios of Switch, CRRR

Figure 2 exhibits the empirical ratios achieved by both algorithms with a number n {20, 1000} of jobs. For n = 20, Switch outperforms CRRR for the two distributions, whereas their ratios are very close for n = 1000. This confirms that Switch and CRRR are asymptotically equivalent, as can be deduced from Theorems 3.1 and 4.1. For the exponential distribution, as expected, both algorithms have ratios above the non-asymptotic lower bound of Corollary 2.2. Meanwhile, considering the distribution Φ(0.51, 104), the empirical ratios for n = 20 are below the lower bound of Corollary 2.3, because it is proved by taking n . For n = 1000, the ratios match the lower bound.

Preferential algorithm In the remaining discussion, we refer to Switch with breakpoints zσ(i) = ξyσ(i) and ξ 1 + E(1/ρ), as Switch with parameter ρ.

We generate a synthetic instance of n = 50 job sizes, drawn independently from the Pareto distribution with scale 1 and shape 1.1. The Pareto distribution, known for its heavy

tail, is particularly suitable for modeling job sizes (Harchol Balter & Downey, 1997; Bansal & Harchol-Balter, 2001; Arnold, 2014), and it is a commonly used benchmark for learning-augmented scheduling algorithms (Purohit et al., 2018; Lindermayr & Megow, 2022). Furthermore, we consider noisy predictions yi = xi + εi for all i [50], where εi is sampled independently from a normal distribution with mean 0 and standard deviation τ.

0 20 40 60 80 100 Error parameter τ

Emprical competitive ratio

PA(ρ,λ) with B = n/2

Round-robin Switch(ρ = 0.5) PA(ρ = 0,λ = 0.5) PA(ρ = 0.5,λ = 0.5)

0 100 200 300 400 500 Error parameter τ

Switch(ρ = 0.5,B) for B [1,n]

B=0 B = 10 B = 20

B = 30 B = 35 B = 50

Figure 3. Preferential Algorithm (PA) with different parameters

Figure 3 illustrates the empirical ratio of the Preferential Algorithm (PA) across various parameter configurations, with varying error parameter τ.

The left plot displays the ratios for different λ and ρ values, with B = 25 = n/2. When λ = 0, PA becomes roundrobin. For λ = 1 and ρ = 0.5, PA simply runs Switch(ρ = 0.5), which gives an improved consistency (τ = 0), not equal to 1 because B < n and ρ > 0, and gives a ratio that deteriorates arbitrarily as τ increases. In contrast, PA with λ = 0.5 gives a weaker consistency but maintains bounded ratios, even with arbitrarily erroneous predictions. The choice of ρ = 0 exhibits a slightly better consistency compared to ρ = 0.5, in line with theoretical expectations, but there is no significant difference regarding sensitivity to errors. This should not be surprising since setting ρ > 0 ensures smoothness in the worst-case (see Figure 4), but it is not necessarily needed for all instances.

The right plot examines the influence of B on PA with parameters λ = 1 and ρ = 0.5, which corresponds to Switch with ρ = 0.5. Larger B values improve consistency and also yield a smaller sensitivity to small errors. However, for high τ values, having numerous predictions leads to faster performance deterioration compared to having fewer predictions. This shows that more predictions enhance consistency, while fewer predictions enhance robustness.

Consistency-smoothness To shed light on the tradeoff between consistency and smoothness raised in Section 4.2, we consider i.i.d. job sizes x1, . . . , x100, each taking the value 1 w.p. 1/2 and 2 w.p. 1/2, and we consider noisy predictions of the form yi = xi + εi, where εi follows a uniform distribution over [ τ, τ]. Figure 4 illustrates the evolution, for τ varying in [0, 0.15], of the empirical competitive ratio of Switch with parameter ρ {0, 0.1, 0.5}

Non-clairvoyant Scheduling with Partial Predictions

and B {50, 95},

For both values of B, the experiment reveals that larger values of ρ give bigger ratios when τ = 0 (less consistency), but on the other hand they yield less sensitivity to variations of the expected prediction error (better smoothness), which confirms our theoretical findings. In particular, for ρ = 0, a significant discontinuity arises when τ becomes positive. Figure 4 also shows that this tradeoff is less significant as B approaches n = 100, with the consistency values for ρ {0, 0.1, 0.5} drawing closer.

0.00 0.05 0.10 0.15 Error parameter τ

Empirical competitive ratio

ρ = 0, B = 50 ρ = 0.1, B = 50 ρ = 0.5, B = 50

0.00 0.05 0.10 0.15 Error parameter τ

ρ = 0, B = 95 ρ = 0.1, B = 95 ρ = 0.5, B = 95

Figure 4. Tradeoff between consistency and smoothness

6. Conclusion and Future Work

This paper explores the non-clairvoyant scheduling problem with a limited number of predicted job sizes. We give near optimal lower and upper bounds in the case of perfect predictions, and we introduce a learning-augmented algorithm raising the common consistency-robustness tradeoff and an additional consistency-smoothness tradeoff, the latter vanishing when B approaches 0 or n.

Our findings join previous works in demonstrating that online algorithms can indeed achieve improved performance even when armed with a restricted set of predictions, which is an assumption more aligned with practical scenarios. Furthermore, they affirm the necessity of studying and understanding these regimes, as they may unveil unique behaviors absent in the zeroor full-information settings.

6.1. Open Questions

Tight lower bounds In the case of perfect predictions, there is a (small) gap between the lower bounds of Section 2 and the competitive ratios of Switch and CRRR. An interesting research avenue is to close this gap, either by designing better algorithms or improving the lower bound. This could involve using Theorem 2.1 with more refined distributions.

Reduced number of action predictions Algorithm Switch leverages the job sizes predictions, not only the order they induce. Using the ℓ1 norm to measure the error is thus a suitable choice. However, as discussed in Section 3, Algorithm CRRR only uses the priority order in which

OPT runs (xσ(i))i [B]. An interesting question to explore is how to adapt it in the case of imperfect action predictions, using appropriate error measures.

Smooth and (2 B

n )-consistent algorithm Lemma 4.4 and Figure 4 emphasize that, to achieve smoothness, Switch with parameter ρ must exhibit a consistency exceeding 2 B

n . A compelling question arises: Is it possible to devise a smooth algorithm with a consistency of at most 2 B

Impact Statement

This paper presents a work whose goal is to advance the field of learning-augmented algorithms. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.

Acknowledgements

Vianney Perchet acknowledges support from the French National Research Agency (ANR) under grant number (ANR19-CE23-0026 as well as the support grant, as well as from the grant Investissements d Avenir (Lab Ex Ecodec/ANR11-LABX-0047).

Anand, K., Ge, R., and Panigrahi, D. Customizing ml predictions for online algorithms. In International Conference on Machine Learning, pp. 303 313. PMLR, 2020.

Antoniadis, A., Gouleakis, T., Kleer, P., and Kolev, P. Secretary and online matching problems with machine learned advice. Advances in Neural Information Processing Systems, 33:7933 7944, 2020.

Antoniadis, A., Coester, C., Eli as, M., Polak, A., and Simon, B. Learning-augmented dynamic power management with multiple states via new ski rental bounds. Advances in Neural Information Processing Systems, 34:16714 16726, 2021a.

Antoniadis, A., Ganje, P. J., and Shahkarami, G. A novel prediction setup for online speed-scaling. ar Xiv preprint ar Xiv:2112.03082, 2021b.

Antoniadis, A., Boyar, J., Eli as, M., Favrholdt, L. M., Hoeksma, R., Larsen, K. S., Polak, A., and Simon, B. Paging with succinct predictions. In International Conference on Machine Learning, pp. 952 968. PMLR, 2023a.

Antoniadis, A., Coester, C., Eli aˇs, M., Polak, A., and Simon, B. Online metric algorithms with untrusted predictions. ACM Transactions on Algorithms, 19(2):1 34, 2023b.

Arnold, B. C. Pareto distribution. Wiley Stats Ref: Statistics Reference Online, pp. 1 10, 2014.

Non-clairvoyant Scheduling with Partial Predictions

Azar, Y., Leonardi, S., and Touitou, N. Flow time scheduling with uncertain processing time. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pp. 1070 1080, 2021.

Bai, X. and Coester, C. Sorting with predictions. Advances in Neural Information Processing Systems, 36, 2024.

Baker, K. R. and Trietsch, D. Principles of sequencing and scheduling. John Wiley & Sons, 2013.

Bamas, E., Maggiori, A., Rohwedder, L., and Svensson, O. Learning augmented energy minimization via speed scaling. Advances in Neural Information Processing Systems, 33:15350 15359, 2020a.

Bamas, E., Maggiori, A., and Svensson, O. The primal-dual method for learning augmented algorithms. Advances in Neural Information Processing Systems, 33:20083 20094, 2020b.

Bampis, E., Dogeas, K., Kononov, A. V., Lucarelli, G., and Pascual, F. Scheduling with untrusted predictions. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI, pp. 4581 4587, 2022.

Bansal, N. and Harchol-Balter, M. Analysis of srpt scheduling: Investigating unfairness. In Proceedings of the 2001 ACM SIGMETRICS International conference on Measurement and modeling of computer systems, pp. 279 290, 2001.

Bateni, M., Dharangutte, P., Jayaram, R., and Wang, C. Metric clustering and mst with strong and weak distance oracles. ar Xiv preprint ar Xiv:2310.15863, 2023.

Benomar, Z. and Perchet, V. Advice querying under budget constraint for online algorithms. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.

Benomar, Z., Chzhen, E., Schreuder, N., and Perchet, V. Addressing bias in online selection with limited budget of comparisons. ar Xiv preprint ar Xiv:2303.09205, 2023.

Bhaskara, A., Cutkosky, A., Kumar, R., and Purohit, M. Logarithmic regret from sublinear hints. Advances in Neural Information Processing Systems, 34:28222 28232, 2021.

Borodin, A. and El-Yaniv, R. Online computation and competitive analysis. cambridge university press, 2005.

Chen, J., Silwal, S., Vakilian, A., and Zhang, F. Faster fundamental graph algorithms via learned predictions. In International Conference on Machine Learning, pp. 3583 3602. PMLR, 2022.

Cheng, T. and Sin, C. A state-of-the-art review of parallelmachine scheduling research. European Journal of Operational Research, 47(3):271 292, 1990.

Chlkedowski, J., Polak, A., Szabucki, B., and .Zolna, K. T. Robust learning-augmented caching: An experimental study. In International Conference on Machine Learning, pp. 1920 1930. PMLR, 2021.

Christianson, N., Shen, J., and Wierman, A. Optimal robustness-consistency tradeoffs for learning-augmented metrical task systems. In International Conference on Artificial Intelligence and Statistics, pp. 9377 9399. PMLR, 2023.

Diakonikolas, I., Kontonis, V., Tzamos, C., Vakilian, A., and Zarifis, N. Learning online algorithms with distributional advice. In International Conference on Machine Learning, pp. 2687 2696. PMLR, 2021.

Dinitz, M., Im, S., Lavastida, T., Moseley, B., and Vassilvitskii, S. Faster matchings via learned duals. Advances in neural information processing systems, 34:10393 10406, 2021.

Dinitz, M., Im, S., Lavastida, T., Moseley, B., and Vassilvitskii, S. Algorithms with prediction portfolios. Advances in neural information processing systems, 35: 20273 20286, 2022.

Drygala, M., Nagarajan, S. G., and Svensson, O. Online algorithms with costly predictions. In International Conference on Artificial Intelligence and Statistics, pp. 8078 8101. PMLR, 2023.

D utting, P., Lattanzi, S., Paes Leme, R., and Vassilvitskii, S. Secretaries with advice. In Proceedings of the 22nd ACM Conference on Economics and Computation, pp. 409 429, 2021.

Eberle, F., Hommelsheim, F., Lindermayr, A., Liu, Z., Megow, N., and Schl oter, J. Accelerating matroid optimization through fast imprecise oracles. ar Xiv preprint ar Xiv:2402.02774, 2024.

Gollapudi, S. and Panigrahi, D. Online algorithms for rentor-buy with expert advice. In International Conference on Machine Learning, pp. 2319 2327. PMLR, 2019.

Graham, R. L., Lawler, E. L., Lenstra, J. K., and Kan, A. R. Optimization and approximation in deterministic sequencing and scheduling: a survey. In Annals of discrete mathematics, volume 5, pp. 287 326. Elsevier, 1979.

Hall, N. G. and Potts, C. N. Supply chain scheduling: Batching and delivery. Operations Research, 51(4):566 584, 2003.

Non-clairvoyant Scheduling with Partial Predictions

Harchol-Balter, M. and Downey, A. B. Exploiting process lifetime distributions for dynamic load balancing. ACM Transactions on Computer Systems (TOCS), 15(3):253 285, 1997.

Im, S., Kumar, R., Montazer Qaem, M., and Purohit, M. Non-clairvoyant scheduling with predictions. In ACM Symposium on Parallelism in Algorithms and Architectures, 2021.

Im, S., Kumar, R., Petety, A., and Purohit, M. Parsimonious learning-augmented caching. In International Conference on Machine Learning, pp. 9588 9601. PMLR, 2022.

Ivanov, D., Dolgui, A., Sokolov, B., Werner, F., and Ivanova, M. A dynamic model and an algorithm for short-term supply chain scheduling in the smart factory industry 4.0. International Journal of Production Research, 54(2): 386 402, 2016.

Jensen, E. D., Locke, C. D., and Tokuda, H. A time-driven scheduling model for real-time operating systems. In Rtss, volume 85, pp. 112 122, 1985.

Jin, B. and Ma, W. Online bipartite matching with advice: Tight robustness-consistency tradeoffs for the two-stage model. Advances in Neural Information Processing Systems, 35:14555 14567, 2022.

Kraska, T., Beutel, A., Chi, E. H., Dean, J., and Polyzotis, N. The case for learned index structures. In Proceedings of the 2018 international conference on management of data, pp. 489 504, 2018.

Lassota, A. A., Lindermayr, A., Megow, N., and Schl oter, J. Minimalistic predictions to schedule jobs with online precedence constraints. In International Conference on Machine Learning, pp. 18563 18583. PMLR, 2023.

Lattanzi, S., Lavastida, T., Moseley, B., and Vassilvitskii, S. Online scheduling via learned weights. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1859 1877. SIAM, 2020.

Lawler, E. L., Lenstra, J. K., Kan, A. H. R., and Shmoys, D. B. Sequencing and scheduling: Algorithms and complexity. Handbooks in operations research and management science, 4:445 522, 1993.

Lenstra, J. K. and Rinnooy Kan, A. Complexity of scheduling under precedence constraints. Operations Research, 26(1):22 35, 1978.

Lin, H., Luo, T., and Woodruff, D. Learning augmented binary search trees. In International Conference on Machine Learning, pp. 13431 13440. PMLR, 2022.

Lindermayr, A. and Megow, N. Permutation predictions for non-clairvoyant scheduling. In Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 357 368, 2022.

Lykouris, T. and Vassilvtiskii, S. Competitive caching with machine learned advice. In International Conference on Machine Learning, pp. 3296 3305. PMLR, 2018.

Maghakian, J., Lee, R., Hajiesmaili, M., Li, J., Sitaraman, R., and Liu, Z. Applied online algorithms with heterogeneous predictors. In International Conference on Machine Learning, pp. 23484 23497. PMLR, 2023.

Martel, C. Preemptive scheduling with release times, deadlines, and due times. Journal of the ACM (JACM), 29(3): 812 829, 1982.

Merlis, N., Richard, H., Sentenac, F., Odic, C., Molina, M., and Perchet, V. On preemption and learning in stochastic scheduling. In International Conference on Machine Learning, pp. 24478 24516. PMLR, 2023.

Mitzenmacher, M. and Vassilvitskii, S. Algorithms with predictions. Communications of the ACM, 65(7):33 35, 2022.

Motwani, R., Phillips, S., and Torng, E. Nonclairvoyant scheduling. Theoretical computer science, 130(1):17 47, 1994.

Munoz, A. and Vassilvitskii, S. Revenue optimization with approximate bid predictions. Advances in Neural Information Processing Systems, 30, 2017.

Panwalkar, S. S. and Iskander, W. A survey of scheduling rules. Operations research, 25(1):45 61, 1977.

Pinedo, M. L. Scheduling, volume 29. Springer, 2012.

Purohit, M., Svitkina, Z., and Kumar, R. Improving online algorithms via ml predictions. Advances in Neural Information Processing Systems, 31, 2018.

Ramamritham, K. and Stankovic, J. A. Scheduling algorithms and operating systems support for real-time systems. Proceedings of the IEEE, 82(1):55 67, 1994.

Sadek, K. A. A. and Elias, M. Algorithms for caching and MTS with reduced number of predictions. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum? id=Qu Ii LSkt O4.

Sakaue, S. and Oki, T. Discrete-convex-analysis-based framework for warm-starting algorithms with predictions. Advances in Neural Information Processing Systems, 35: 20988 21000, 2022.

Non-clairvoyant Scheduling with Partial Predictions

Shin, Y., Lee, C., Lee, G., and An, H.-C. Improved learningaugmented algorithms for the multi-option ski rental problem via best-possible competitive analysis. ar Xiv preprint ar Xiv:2302.06832, 2023.

Silwal, S., Ahmadian, S., Nystrom, A., Mc Callum, A., Ramachandran, D., and Kazemi, S. M. Kwikbucks: Correlation clustering with cheap-weak and expensivestrong signals. In The Eleventh International Conference on Learning Representations, 2023. URL https: //openreview.net/forum?id=p0JSSa1Au V.

Steiger, C., Walder, H., and Platzner, M. Operating systems for reconfigurable embedded platforms: Online scheduling of real-time tasks. IEEE Transactions on computers, 53(11):1393 1407, 2004.

Wei, A. and Zhang, F. Optimal robustness-consistency trade-offs for learning-augmented online algorithms. Advances in Neural Information Processing Systems, 33: 8042 8053, 2020.

Yao, A. C.-C. Probabilistic computations: Toward a unified measure of complexity. In 18th Annual Symposium on Foundations of Computer Science (sfcs 1977), pp. 222 227. IEEE Computer Society, 1977.

Non-clairvoyant Scheduling with Partial Predictions

A. Lower bounds

A.1. Preliminary result

Lemma A.1. Let F be a probability distribution on (0, ) with finite expectation, x an array of n i.i.d. random variables sampled from F, and αn,B 0 satisfying for any deterministic algorithm A with access the sizes of B jobs that

Eσ,x[A(x)] Ex[OPT(x)] αn,B ,

then for any (deterministic or randomized) algorithm ALG, we have

Rn,B(ALG) αn,B .

Proof. Let F and αn,B be as stated in the lemma. Using Yao s minimax principle (Yao, 1977) we deduce that for any randomized algorithm ALG

Eσ,x[ALG(x)] inf A deterministic Eσ,x[A(x)]

αn,BEx[OPT(x)] ,

where the infimum is taken over all deterministic algorithms. The previous inequality can be written as

Ex Eσ[ALG(x)] αn,BOPT(x) 0 ,

and this implies that, necessarily, there exists a value x = (x n, . . . , x n) taken by x verifying Eσ[ALG(x )] αn,BOPT(x ) 0, hence

Rn,B(ALG) Eσ[ALG(x )]

OPT(x ) αn,B .

A.2. Proof of Theorem 2.1

Let A be a deterministic algorithm. Using Equation 1, it suffices to bound E[P A ij] for all i = j to deduce a bound on E[A(x)], and since P A ij is a non-negative random variable, the focus can be narrowed down to bounding Pr(P A ij > t) for all t > 0. Following the proof scheme of (Motwani et al., 1994), we introduce the following definition.

Definition A.2. Let A be a deterministic algorithm given an instance of n jobs. For all i = j [n] and t 0, we denote by u A i,j(t) the time spent on job i when the total time spent on both jobs i and j is t, assuming neither job i nor j is completed. More precisely, u A i,j(t) is defined by

u A i,j(t) = SA i inf{t 0 : SA i (t ) + SA j (t ) = t} .

u A i,j is therefore a rule defined solely by the algorithm. For an instance x = (x1, . . . , xn) of job sizes, the real time spent on i when a total time of t has been spent on both jobs i, j is given by min(u A i,j(t), xi).

For all t 0, the following Lemma allows to express the event P A ij > t using u A ij(t).

Lemma A.3. Let x1, . . . , xn be instance of n job sizes, then for any algorithm A, for any i = j [n] and t 0, the following equivalence holds P A ij > t xi > u A i,j(t) and xj > t u A j,i(t)

Proof. Let A be an algorithm and i = j [n]. We denote by Sij(t) = Si(t) + Sj(t) the total processing time spent on both jobs i and j up to time t. Assuming that job i finishes first, i.e. ti tj, no processing time is spent on job i after ti, hence Si(tj) = Si(ti) = xi, and Pij = Si(tj) + Sj(ti) = Sij(ti). By symmetry, we deduce that Pij = Sij(min(ti, tj)).

Non-clairvoyant Scheduling with Partial Predictions

Therefore, using that Si and Sj are non-decreasing and continuous, it holds for all t 0

Pij > t Sij(min(ti, tj)) > t

min(ti, tj) > inf{t : Sij(t ) t}

xi > Si(inf{t : Sij(t ) t}) and xj > Sj(inf{t : Sij(t ) t}) (5)

xi > uij(t) and xj > t uij(t) . (6)

Equivalence (5) holds because ti = inf{t 0 : Si(t ) xi}, thus for any s 0 we have t > s xi > Si(s). The same holds for j. For Equivalence (6), we simply used Definition A.2 and the observation uij(t) + uji(t) = t.

Remark A.4. In the case of non-clairvoyant algorithms, the rule defining u A ij( ) is dictated by the algorithm, independent of the job sizes. Thus, if the job sizes are sampled independently from the exponential distribution, Lemma A.3 gives for all i = j that Pr(P A ij > t) = Pr(xi > u A i,j(t)) Pr(xj > t u A j,i(t)) = e t, and it follows immediately that E[P A ij] = 1 for any deterministic algorithm. This argument, used in (Motwani et al., 1994), is not applicable in our context since the algorithm possesses access to certain job sizes, enabling the formulation of a rule for u A i,j that considers this information. Therefore, more sophisticated techniques become necessary for our analysis since the independence of the events xi > u A i,j(t) and xj > t u A j,i(t) is lost.

Lemma A.5. Let a (0, ) and φ : [0, ) [0, ) a continuously differentiable and increasing function satisfying that φ(0) > 0, t 7 φ (t)

φ(t) is non-increasing and R 0 dt φ(t)2 < . Let xa 1, . . . , xa n i.i.d. random job sizes with distribution

Pr(xa 1 t) = 1 φ(t) 1 φ(a) 1

φ(0) 1 φ(a) 1 1t<a ,

then for any algorithm A having access to the sizes of the first B jobs xa 1, . . . , xa B, it holds that

E[P A ij] Z

φ(t)2 dt o(1) a i = j B ,

E[P A ij] 2 Z

φ(t)2 dt o(1) a i = j > B ,

E[P A ij] φ(0)2 Z

inf T x Gφ(x, T) φ (x)

φ(x) dx o(1) a i B, j > B ,

where Gφ(x, T) is defined in Equation (3), and the o(1) a term does not depend on the algorithm A.

Although Lemma A.5 is stated with a (0, ), the results also hold for random variables x 1 , . . . , x n sampled from the limit distribution Pr(x 1 t) = 1 φ(0)

φ(t) , where the o(1) term becomes zero.

Before proving the lemma, let us first observe that, since φ is increasing and R 0 dt φ(t)2 < , then necessarily limx φ(x) = .

Proof. Let us first observe that for all t 0

Pr(min(x 1 , x 2 ) t) = Pr(x 1 t) Pr(x 2 t) = φ(0)2

E[min(x 1 , x 2 )] = Z

0 Pr(min(x 1 , x 2 ) t)dt = Z

In the following, we prove separately the three claims of the Lemma: in the cases where both jobs sizes xi, xj are known, both are unknown, and where only xi is known.

Non-clairvoyant Scheduling with Partial Predictions

Both job sizes are known For any i = j [n], it holds that Pij min(xa i , xa j ), and this true in particular for i = j [B].

Furthermore, for any t 0, since φ(t) φ(0), we have that the mapping a 7 φ(t) 1 φ(a) 1

φ(0) 1 φ(a) 1 is non-increasing. It follows that

Pr(xa 1 t) 1 lim a φ(t) 1 φ(a ) 1

φ(0) 1 φ(a ) 1

φ(t) 1t<a , (7)

and therefore, for any i = j B

E[Pij] E[min(xa 1, xa 2)]

0 Pr(xa 1 t)2dt

φ(t)2 1t<adt = Z a

φ(t)2 dt o(1) a .

Both job sizes are unknown For i = j > B, the algorithm ignores the job sizes xa i and xa j Therefore, ui,j is independent of xa i and xa j . Consequently, using Lemma A.3, the independence of xa i and xa j , then Inequality (7), we obtain

Pr(Pij > t) = Pr(xi > ui,j(t) and xj > t ui,j(t))

= Pr(xi > ui,j(t)) Pr(xj > t ui,j(t))

φ(0) φ(ui,j(t))1ui,j(t)<a

φ(0) φ(t ui,j(t))1t ui,j(t)<a

= φ(0)21t a<ui,j(t)<a φ(ui,j(t)φ(t ui,j(t)) .

Using that φ

φ is non-increasing, we have for any t, u 0 that

d du (φ(u)φ(t u)) = φ (u)φ(t u) φ(u)φ (t u)

= φ(u)φ(t u) φ (u)

φ(u) φ (t u)

The sign of d du (φ(u)φ(t u)) is the same as that of φ (u)

φ(u) φ (t u)

φ(t u) , which is a non-increasing function. It is null for u = t/2, hence it is non-positive for u t/2 and non-negative for u t/2. This implies that u 7 φ(u)φ(t u) is minimal for u = t/2, i.e. φ(u)φ(t u) φ( t

2)2 for all t, u 0, in particular

Pr(Pij > t) φ(0)2

φ(t/2)2 1t a<ui,j(t)<a ,

and it follows that

0 Pr(Pij > t)dt Z

2)2 1t a<ui,j(t)<adt .

Observing that, for any t 0, the mapping a 7 φ(0)2/φ( t

2)21t a<ui,j(t)<a is non-decreasing, and its limit is φ(0)2/φ( t

2)2, the monotone convergence theorem guarantees that

2)2 1t a<ui,j(t)<adt = Z

Non-clairvoyant Scheduling with Partial Predictions

therefore, for any i = j > B

φ(t)2 dt o(1) a .

Only the size of one job is known For i B and j > B, the algorithm knows xa i , therefore uij( ) can be a function of xa i . By Lemma A.3 then Inequality (7), it holds for all t 0 that

Pr(Pij > t) = Pr(xa i > ui,j(t) and xa j > t ui,j(t))

= E[1xa i >ui,j(t)1xa j >t ui,j(t)]

= E 1xa i >ui,j(t)E[1xa j >t ui,j(t) | xa i , ui,j(t)]

= E 1xa i >ui,j(t) Pr xa j > t ui,j(t) | ui,j(t)

E 1xa i >ui,j(t) φ(0)1t ui,j(t)<a

φ(t ui,j(t))

φ(0)E 1t a<ui,j(t)<xa i φ(t ui,j(t))

and we obtain by integrating over t and then using Tonelli s theorem that

0 Pr(Pij > t)dt

1t a<ui,j(t)<xa i φ(t ui,j(t)) dt .

Recall that ui,j(t) is defined as the time spent on job i when a total of t units of time have been spent on both jobs i and j, thus uij(0) = 0, uij is non-decreasing and it is 1-Lipschitz. Let us denote by V the set of functions on [0, ) satisfying these properties V = n v : [0, ) [0, ) : v(0) = 0 and v(t2) v(t1)

t2 t1 [0, 1] t1 < t2 o ,

and for all x 0 and T [0, ], we denote by Vx,T the subset of V defined as

Vx,T = n v V : inf{t 0 : v(t) x} = T o .

For any x 0 and v V, let Tv = inf{t 0 : v(t) x}, then v Vx,Tv. Since v is 1-Lipschitz and v(0) = 0, it holds that Tv x. If v(t) < x for all t 0 then Tv = . Therefore, we have for any x 0

T [x, ] Vx,T .

Furthermore, given that v is non-decreasing, v(t) < x if and only if t < Tv. Consequently, E[Pij] satisfies

E[Pij] φ(0)E inf v V

1t a<v(t)<xa i φ(t v(t)) dt

inf T [xa i , ] inf v Vxa i ,T

1t a<v(t)<xa i φ(t v(t)) dt

inf T [xa i , ] inf v Vxa i ,T

1t a<v(t) φ(t v(t))dt

Let x > 0 and T [x, ], and define v x,T : t 0 7 (t T + x)1t>T x, then for all v Vx,T we have v(t) v x,T (t) for all t [0, T]. Indeed, v(t) 0 = v x,T (t) for t [0, T x], and if v(t) < v x,T (t) for some t [T x, T] then, because v is 1-Lipschitz, we have

v(T) v(T x) + x < v x,T (T x) + x = x ,

Non-clairvoyant Scheduling with Partial Predictions

which contradicts v(T) x. Finally, as φ and v 7 1t a<v are both non-decreasing, it holds for any v Vx,T that Z T

1t a<v(t) φ(t v(t))dt Z T

1t a<v x,T (t) φ(t v x,T (t))dt

φ(t) dt + Z T

1t a<t T +x

= Z min(a,T x)

dt φ(t) + x1T x<a

( R T x 0 dt φ(t) + x φ(T x) if T x < a R a 0 dt φ(t) if T x a (9)

= Gφ(x, T) if T x < a R a 0 dt φ(t) if T x a . (10)

Taking the infimum over v Vx,T then over T [x, ] in (10) gives for any x 0 that

inf T [x, ] inf v Vx,T

1t a<v(t) φ(t v(t))dt min inf T [x,a+x) Gφ(x, T), Z a

min inf T [x, ] Gφ(x, T), Z a

and substituting into 8 leads to

E[Pij] φ(0)E min inf T [xa 1, ] Gφ(xa 1, T), Z a

Let us denote by fa and f respectively the density functions of xa 1 and x 1 . We have for any x > 0 that

dx Pr(xa 1 x) = φ (x)/φ(x)2

φ(0) 1 φ(a) 1 1x<a φ(0)φ (x)

φ(x)2 1x<a = f (x)1x<a , (11)

E[Pij] = φ(0) Z

0 min inf T [x, ] Gφ(x, T), Z a

0 min inf T [x, ] Gφ(x, T), Z a

f (x)1x<adx . (12)

Observing for all x > 0 that

inf T [x, ] Gφ(x, T) lim T Gφ(x, T) = Z

dt φ(t) (13)

and that the mapping a 7 min inf T [x, ] Gφ(x, T), R a 0 dt φ(t) f (x)1x<a is non-decreasing, the monotone convergence theorem, the continuity of the minimum, then Inequality 13 guarantee that

0 min inf T [x, ] Gφ(x, T), Z a

f (x)1x<adx

min inf T [x, ] Gφ(x, T), Z a

0 min inf T [x, ] Gφ(x, T), Z

inf T [x, ] Gφ(x, T) f (x)dx

inf T [x, ] Gφ(x, T) φ (x)

Non-clairvoyant Scheduling with Partial Predictions

Finally, the previous term is finite because

inf T [x, ] Gφ(x, T) φ (x)

0 Gφ(x, 2x) φ (x)

dt φ(t) + x φ(x)

φ (x) φ(t)2

dt φ(t)2 < .

Therefore Z

0 min inf T [x, ] Gφ(x, T), Z a

φ(t) 1t<adt

inf T [x, ] Gφ(x, T) φ (x)

φ(x)2 dx o(1) ,

and substituting into Inequality (12) gives the aimed result.

A.2.1. PROOF OF THE THEOREM

Using the previous lemmas, we can now prove the theorem.

Proof. Let x = (x1, . . . , xn) be an array of i.i.d. random job sizes with distribution Pr(xi t) = 1 φ(0)

φ(t) . Observe that

0 Pr(x1 t)dt = φ(0) Z

dt φ(t) , E[min(x1, x2)] = Z

0 Pr(x1 t)2dt = φ(0)2 Z

Lemma A.5 with a = gives for any algorithm A that E[Pij] E[min(x1, x2)] for i = j B, E[Pij] 2E[min(x1, x2)] for i = j > B, and E[Pij] αφE[min(x1, x2)] for i B, j > B. Indeed, the probability density function of x1 is t 7 φ(0)φ (t)

φ(t)2 , and we have for all i B and j > B that

φ(0)E inf T x1 Gφ(x1, T) = φ(0)2 Z

inf T x Gφ(x, T) φ (x)

dt φ(t)2 = αφE[min(x1, x2)] .

Assume that R 0 dt φ(t) < , i.e. E[x1] < , then by Equation (1) it holds that

E[A(x)] n E[x1] + X

1 i<j B E[min(x1, x2)] + X

B<i<j n 2E[min(x1, x2)] +

j=B+1 αφE[min(x1, x2)]

= n E[x1] + B(B 1)

2 + (n B)(n B 1) + αφB(n B) E[min(x1, x2)] ,

Non-clairvoyant Scheduling with Partial Predictions

the terms multiplying E[min(x1, x2)] can be reordered as follows

2 + (n B)(n B 1) + αφB(n B)

2 + (n B)(n 1 + (αφ 1)B)

2 + n(n 1) + (αφ 1)n B n B B((αφ 1)B 1)

2 + n(n 1) + (αφ 2)n B (αφ 1)B(B 1) (αφ 2)B

= n(n 1) (2 αφ)(n 1)B + ( 3

2 αφ)B(B 1) (14)

n + (3 2αφ)B(B 1)

E[A(x)] = n E[x1] + Cn,B n(n 1)

2 E[min(x1, x2)] .

On the other hand, by equation (2)

E[OPT(x)] = E

i<j min(xi, xj)

= n E[x1] + n(n 1)

2 E[min(x1, x2)] ,

E[A(x)] E[OPT(x)] n E[x1] + Cn,B n(n 1)

2 E[min(x1, x2)]

n E[x1] + n(n 1)

2 E[min(x1, x2)]

= Cn,B Cn,B 1

2 E[min(x1,x2)]

and using Lemma A.1, this yields

Rn,B(A) Cn,B Cn,B 1

2 E[min(x1,x2)]

If Bn = wn + o(n) for some w [0, 1], then

CRB(A) lim inf n Rn,Bn(A)

2 2(2 αφ)w + (3 2αφ)w2 .

Assume now that R 0 dt φ(t) = , thus E[x1] = . Let a > 0, and xa = (xa 1, . . . , xa n) be i.i.d. job sizes with distribution

Pr(xa 1 t) = 1 φ(t) 1 φ(a) 1

φ(0) 1 φ(a) 1 1t<a .

Therefore xa 1 has a finite expectation E[xa 1] a. Denoting by x1, x2 i.i.d. random variables with distribution Pr(x1 t) = 1 φ(0)

φ(t) , Lemma A.5 and Equation (1) give for any algorithm A that

E[A(xa)] n E[xa 1] + X

1 i<j B (E[min(x1, x2)] o(1) a ) + X

B<i<j n (2E[min(x1, x2)] o(1) a )

j=B+1 (αφE[min(x1, x2)] o(1) a )

= n E[xa 1] + Cn,B n(n 1)

2 E[min(x1, x2)] n2 o(1) a , (17)

Non-clairvoyant Scheduling with Partial Predictions

where the last equation is obtained with the same computation as in the case R 0 dt φ(t) < . Moreover, if a 1 then

E[min(xa 1, xa 2)] = Z

0 Pr(min(xa 1, xa 2) t)dt = Z

φ(t) 1 φ(a) 1

φ(0) 1 φ(a) 1

and we have for any t 0 that φ(t) 1 φ(a) 1

φ(0) 1 φ(a) 1 2 φ(t) 2

(φ(0) 1 φ(1) 1)2 , which is an integrable function. Furthermore, φ is

increasing and 1/φ2 is integrable, hence lim a φ(a) = . Therefore, the dominated convergence theorem gives that

lim a E[min(xa 1, xa 2)] = Z

( φ(t) 1 φ(a) 1

φ(0) 1 φ(a) 1

= E[min(x1, x2)] .

It follows that

OPT(xa) = E

i=1 xa i + X

i<j min(xa i , xa j )

= n E[xa 1] + n(n 1)

2 E[min(xa 1, xa 2)]

= n E[xa 1] + n(n 1)

2 E[min(x1, x2)] + n2 o(1) a .

Combining this with Inequality (17), then using Lemma A.1, we obtain that

Rn,B n E[xa 1] + Cn,B n(n 1)

2 E[min(x1, x2)] n2 o(1) a n E[xa 1] + n(n 1)

2 E[min(x1, x2)] + n2 o(1) a

and if B = wn + o(n) then

CRB(A) lim inf n Rn,B(A)

2 2(2 αφ)w + (3 2αφ)w2 E[min(x1, x2)] o(1) a E[min(x1, x2)] + o(1) a

and finally, taking the limit a yields

CRB(A) 2 2(2 αφ)w + (3 2αφ)w2 .

A.3. Proof of Corollary 2.2

Proof. Let us consider i.i.d. exponentially distributed job sizes. This corresponds to φ(t) = et for all t 0. It holds that φ(0) = 1 > 0, φ is increasing, φ

φ = 1 is non-increasing, and Z

dt φ(t) = Z

0 e tdt = 1 , Z

dt φ(t)2 = Z

0 e 2tdt = 1

Furthermore, it holds for all x 0 and T x that

G(x, T) = Z T x

dt φ(t) + x φ(T x)

0 e tdt + xe (T x)

= 1 + (x 1)e (T x) .

Non-clairvoyant Scheduling with Partial Predictions

Thus, for all x 0

inf T x G(x, T) =

( G(x, x) = x if x < 1 lim T G(x, T) = 1 if x 1 ,

and it follows that

inf T x Gφ(x, T) φ (x)

φ(x)2 dx = Z

inf T x Gφ(x, T) e xdx

0 xe xdx + Z

Finally, by Theorem 2.1, it holds for any algorithm A having access to the sizes of B jobs that

Rn,B(A) Cn,B Cn,B 1

4 = Cn,B 4(Cn,B 1)

Cn,B = 2 (4/e)B

n + (4/e 1)B(B 1)

In particular, if B = wn + o(n), then again by Theorem 2.1 we have

A.4. Proof of Corollary 2.3

Proof. Let r ( 1

2, 1), and consider φ : t 7 (1 + t)r. It holds that φ(0) = 1 > 0, φ is increasing, φ

φ : t 7 r 1+t is non-increasing, and R 0 dt φ(t)2 = R 0 dt (1+t)2r = 1 2r 1 < because r > 1

2. Moreover, it holds for all x 0 and T x that

G(x, T) = Z T x

dt φ(t) + x φ(T x)

dt (1 + t)r + x (1 + T x)r

= 1 1 r + (1 + T x)1 r

1 r + x (1 + T x)r ,

therefore, for all x 0

inf T x G(x, T) = 1 1 r + inf T x

(1 + T x)1 r

1 r + x (1 + T x)r

= 1 1 r + inf y [0,1]

1 r + xyr ,

where the last inequality is obtained by considering y = 1 1+T x. It holds that

1 r + xyr = y (2 r) + rxy (1 r)

= rxy (2 r) y 1

Non-clairvoyant Scheduling with Partial Predictions

The mapping y 7 y (1 r)

1 r + xyr is thus minimized on [0, ) for y = 1 rx, and it is minimized on [0, 1] for y = min(1, 1

rx). Therefore, it holds for x 1

r that y = 1 and inf T x G(x, T) = 1 1 r + ( 1 1 r + x) = x, and for x > 1

r we have y = 1 rx and

inf T x G(x, T) = 1 1 r + (rx)1 r

1 r + (rx) r

= 1 1 r + 1 1 r + 1

= 1 1 r + (rx)1 r

We deduce that

inf T x Gφ(x, T) φ (x)

φ(x)2 dx = Z 1/r

1 1 r + x r (1 + x)r+1 dx

1 1 r + (rx)1 r

r (1 + x)r+1 dx

= 1 1 r + Z 1/r

rx (1 + x)1+r dx + r1 r

(1 + x)r+1 dx

1 1 r + r1 r

(1 + x)r+1 dx . (18)

Let us denote Wr = R 1/r x1 r (1+x)r+1 dx. It holds that

(1 + x)r+1 dx

1 r dx (1 + x)2r

r x 1 + x dx (1 + x)2r (r > 1

s2 2r ds (s 1 1+x)

= s2r 1 1 s

0 + 1 2(2r 1)

s2r 1 1 sds

2/3 2r 1 + 1 2(2r 1)

s2r 1 1 sds .

It holds for all r ( 1

2, 1) that s2r 1

1 s 1 1 s, which is an integrable function on [0, 1

3], hence the dominated convergence theorem guarantees that

s2r 1 1 sds = 1

s2r 1 1 sds

Non-clairvoyant Scheduling with Partial Predictions

we deduce that

(2r 1)Wr = 3 (2r 1)p

s2r 1 1 sds

2/3 o(1) r 1/2

2/3 o(1) r 1/2

= 1 + o(1) r 1/2 .

Substituting into Inequality 18, and recalling that R 0 dt φ(t)2 = 1 2r 1, we deduce that for r close to 1

inf T x Gφ(x, T) φ (x)

φ(x)2 dx 1 1 r + r1 r

2 + o(1))(1 o(1)) Z

Consequently, by Theorem 2.1, if B = wn + o(n), it holds for any algorithm A that

2 o(1) r 1/2

2 o(1) r 1/2

and taking the limit when r 1

CR(A) 2 2(2

B. Known partial order

B.1. Preliminary results

Lemma B.1. For any real numbers x1, . . . , xn, if σ is a uniformly random permutation of [n] then

E[min(xσ(1), xσ(2))] = 2 n(n 1)

1 i<j n min(xi, xj) .

Proof. Since σ is uniformly random, we have for any i = j that E[min(xσ(1), xσ(2))] = E[min(xσ(i), xσ())], thus

E[min(xσ(1), xσ(2))] = 2 n(n 1)E h X

1 i<j n min(xσ(i), xσ(j)) i

1 i<j n min(xi, xj) .

Lemma B.2. For any real numbers x1, . . . , xn, if σ is a uniformly random permutation of [n] then

E[xσ(1)1xσ(1)<xσ(2)] E[min(xσ(1), xσ(2))] = 1 n(n 1)

1 i<j n min(xi, xj) ,

with equality if x1, . . . , xn are pairwise distinct.

Non-clairvoyant Scheduling with Partial Predictions

Proof. σ is a uniformly random permutation of [n], therefore E[xσ(i)1xσ(i)<xσ(j)] = E[xσ(1)1xσ(1)<xσ(2)] for all i = j [n]. It follows that

E[xσ(1)1xσ(1)<xσ(2)] = 1

2 E[xσ(1)1xσ(1)<xσ(2)] + E[xσ(2)1xσ(2)<xσ(1)]

2E[xσ(i)1xσ(1)<xσ(2) + xσ(2)1xσ(2) xσ(1)]

2E[min(xσ(1), xσ(2))] ,

and using Lemma B.1 concludes the proof. To have equality, it must hold that 1xσ(2)<xσ(1) = 1xσ(2) xσ(1) for any permutation, and this is true if and only if the values x1, . . . , xn are pairwise distinct.

B.2. Proof of Theorem 3.1

Proof. For any i = j B + 1, as in round-robin, Pσ(i)σ(j) = 2 min(xσ(i), xσ(j)). For i = j B, since the jobs {xσ(k)}B k=1 are run in non-decreasing order one after the other, it holds that Pσ(i)σ(j) = min(xσ(i), xσ(j)). Finally, for i B and j B + 1, the delay caused by σ(j) to π(i) is always Dσ(j)π(i) = min(xπ(i), xσ(j)). On the other hand if xσ(j) xπ(i 1) then xσ(j) terminates before phase i begins, thus job π(i) does not delay job σ(j): Dπ(i)σ(j) = 0. If xσ(j) > xπ(i 1), then after the first step of phase i, the time spent on each of the jobs π(i) and σ(j) is xπ(i 1). Both jobs are then executed with identical processing powers until one of them is terminated, thus

Dπ(i)σ(j) = min(xπ(i), xσ(j))1xσ(j)>xπ(i 1) = xπ(i)1xσ(j)>xπ(i) + xσ(j)1xπ(i) xσ(j)>xπ(i 1) .

With the convention xπ(0)=0, taking the sum over i gives

i=1 Dπ(i)σ(j) =

i=1 xπ(i)1xσ(j)>xπ(i) + xσ(j)

i=1 1xπ(i) xσ(j)>xπ(i 1)

i=1 xσ(i)1xσ(j)>xσ(i) + xσ(j)1xσ(j) xπ(B)

i=1 xσ(i)1xσ(j)>xσ(i) + xσ(j) . (19)

Using Lemma B.2, and recalling that {π(i)}B i=1 = {σ(i)}B i=1, we have in expectation,

i=1 Dσ(i)σ(j) i = E h B X

i=1 Dπ(i)σ(j) i

2 E[min(xσ(1), xσ(2))] + 1

k=1 xk , (20)

and it follows that, for any j B + 1

i=1 Pσ(i)σ(j) i = E h B X

i=1 Dσ(j)σ(i) i + E h B X

i=1 Dσ(i)σ(j) i

BE[min(xσ(1), xσ(2))] + B

2 E[min(xσ(1), xσ(2))] + 1

2 E[min(xσ(1), xσ(2))] + 1

Non-clairvoyant Scheduling with Partial Predictions

j=B+1 E h B X

i=1 Pσ(i)σ(j) i 3

2B(n B)E[min(xσ(1), xσ(2))] + 1 B

Finally, using Equation (1), the previous upper bounds on Pσ(i)σ(j) for all i = j, then Equation (16) with 3/2 instead of απ gives that the objective of CRRR on the instance x = {x1, . . . , xn} satisfies in expectation

E[CRRR(x)] =

i=1 xi + E h X

1 i<j n Pσ(i)σ(j) i

1 i<j B E[min(xσ(i), xσ(j))] + X

B<i<j n 2E[min(xσ(i), xσ(j))]

j=B+1 E h B X

i=1 Pσ(i)σ(j) i (21)

i=1 xi + B(B 1)

2 + (n B)(n B 1) + 3

2B(n B) E[min(xσ(1), xσ(2))]

i=1 xi + 2 B

2 E[min(xσ(1), xσ(2))]

i=1 xi + 2 B

1 i<j n min(xi, xj) (22)

OPT(x) , (23)

where we used B.1 for (22) then Equation (2) for (23). Therefore, the (n, B)-competitive ratio of Algorithm 1 satisfies

Rn,B(CRRR) 2 B

For proving the lower bound on Rn,B, let ε > 0 and let us consider job sizes xε i = 1 + iε for all i [n]. Observe that the only inequalities we used in the analysis are Inequalities (19) (xσ(j)1xσ(j) xπ(B) xσ(j)), and (20) given by Lemma B.2. The second one becomes equality if the job sizes are pairwise distinct, which is satisfied by {xε i}n i=1. As for Inequality (19), since the job sizes xε are pairwise distinct and all larger than 1, we can give instead a lower bound on E[xσ(j)1xσ(j) xπ(B)]

Non-clairvoyant Scheduling with Partial Predictions

for j > B as follows

E[xσ(j)1xσ(j) xπ(B)] = E[xσ(j)1σ(j) π(B)]

Pr(σ(j) < π(B)) = Pr(σ(n) < max i B σ(i))

m=1 Pr(σ(n) = m, max i B σ(i) = k)

ℓ=1 Pr(σ(n) = m, σ(ℓ) = k, i [B] \ {ℓ} : σ(i) [k 1] \ {m})

(B 1)!(n B 1)!

= B!(n B 1)!

k=B (k 1) k 2 B 1

= B B!(n B 1)!

= B B!(n B 1)!

= B B + 1 .

Therefore, instead of (19), we have the lower bound

i=1 Dσ(i)σ(j) i = E h B X

i=1 Dπ(i)σ(j) i

2 E[min(xσ(1), xσ(2))] + E[xσ(j)1xσ(j) xπ(B)] (25)

2 + B B + 1 , (26)

and thus, since we proved earlier that Dσ(i)σ(j) = min(xε σ(i),xε σ(j) for all i B and j > B, it holds that

i=1 Pσ(i)σ(j) i

i=1 E[min(xσ(i),xσ(j)] + B

2 + B B + 1 3B

2 + B B + 1

and using that xε i 1 for all i [n], we obtain by substituting the previous inequalities into (21) that

E[CRRR(xε)] n + B(B 1)

2 + (n B)(n B 1) + (n B) B

2 + B B + 1

= n + (n B) B B + 1

2 + (n B)(n B 1) + 3B

+ n(n 1) (n 1)B

= (n + 1) n B

Non-clairvoyant Scheduling with Partial Predictions

On the other hand, Equation (2), along with xε i 1 + nε for all i [n], gives

i=1 (n i + 1)(1 + nε) = n(n + 1)

2 (1 + nε) ,

and it follows that

Rn,B(CRRR) E[CRRR(xε)]

(n + 1) n B

n ) (n + 1)(B + 1)

Taking the limit ε 0 and using Inequality (24), we obtain that

n ) (n + 1)(B + 1) Rn,B(CRRR) 2 B

Finally, if Bn = wn for some w [0, 1] then Bn wn, hence CRB(CRRR) = supn 2 Rn,Bn(CRRR) 2 w, and this bound is reached by the lower bound on Rn,Bn for n , which gives that CRB(CRRR) = 2 w.

C. Predictions of the job sizes

C.1. Proof of Theorem 4.1

We first compute the mutual delays of the jobs in Switch with any breakpoints.

Lemma C.1. For any job sizes x = (x1, . . . , xn), and for any permutation σ of [n] and breakpoints zσ = (zσ(1), . . . , zσ(B)), the Switch algorithm Switch(zσ, x) satisfies for all i = j [n] that

Pσ(i)σ(j) = xσ(i)1zσ(i)<zσ(j) + xσ(j)1zσ(i)>zσ(j) + (θijxσ(i) + (1 θij)xσ(j))1zσ(i)=zσ(j) if i, j B ,

Pσ(i)σ(j) = 2 min(xσ(i), xσ(j)) if i, j > B ,

Pσ(i)σ(j) = xσ(i)1xσ(j)>zσ(i) + min(zσ(i), xσ(i)) if i B, j > B .

where θij is a Bernoulli random variable with parameter 1/2 independent of σ for all i = j [B].

Proof. Let us first define the random variables θij. For all i = j [B], if zσ(i) = zσ(j) then let θij be an independent Bernoulli random variable with parameter 1/2, and if zσ(i) = zσ(j) then let θij be the indicator that zσ(i) comes before zσ(j) with the ordering π. Since π is an ordering of the breakpoints chosen uniformly at random, then θij is a Bernoulli random variable with parameter 1/2 independently of σ.

For i = j [B], if zσ(i) < zσ(j) then job σ(i) is executed until completion before job σ(j) starts being executed, thus Dσ(i)σ(j) = xσ(i) and Dσ(j)σ(i) = 0 and Pσ(i)σ(j) = xσ(i). By symmetry, if zσ(i) > zσ(j) then Pσ(i)σ(j) = xσ(j). In the case where zσ(i) = zσ(j), since π is chosen uniformly at random among all the permutations of [B] satisfying zπ(1) . . . zπ(B), each of the jobs σ(i), σ(j) is run until completion before the other one starts with equal probability 1/2, hence Pσ(i)σ(j) = θijxσ(i) + (1 θij)xσ(j). It follows that

Pσ(i)σ(j) = xσ(i)1zσ(i)<zσ(j) + xσ(j)1zσ(i)>zσ(j) + (θijxσ(i) + (1 θij)xσ(j))1zσ(i)=zσ(j) .

For i = j > B, jobs σ(i), σ(j) are processed symmetrically and the delays they cause to each other are the same as in round-robin. Therefore Dσ(i)σ(j) = Dσ(j)σ(i) = min(xσ(i), xσ(j)), and it holds that

Pσ(i)σ(j) = 2 min(xσ(i), xσ(j)) .

Non-clairvoyant Scheduling with Partial Predictions

For i B and j > B, at the time when job σ(i) starts being executed, it holds by definition of Algorithm 2 that either job σ(j) is completed or Sσ(j) = zσ(i), hence the delay caused by job σ(j) to job σ(i) is Dσ(j)σ(i) = min(zσ(i), xσ(j)). On the other hand, job σ(i) delays job σ(j) if and only if xσ(j) > zσ(i), and in this case job σ(i) runs until completion before job σ(j) is completed, hence Dσ(i)σ(j) = xσ(i)1xσ(j)>zσ(i), and it follows that

Pσ(i)σ(j) = xσ(i)1xσ(j)>zσ(i) + min(zσ(i), xσ(i))

C.1.1. PROOF OF THE THEOREM

Using the previous lemma, we now prove Theorem 4.1

Proof. Using Lemma C.1, it holds for all i = j [n] that Pσ(i)σ(j) = min(xσ(i), xσ(j)) if i, j B, Pσ(i)σ(j) = 2 min(xσ(i), xσ(j)) if i, j > B, and if i B and j > B then

Pσ(i)σ(j) = min(xσ(i), xσ(j)) + xσ(i)1xσ(i)<xσ(j) .

Taking the expectation over σ then using Lemma B.2 gives

E[xσ(i)1xσ(i)<xσ(j)] = E[xσ(1)1xσ(1)<xσ(2)]

2E[min(xσ(1), xσ(2))] , (27)

hence for any i B and j > B

E[Pσ(i)σ(j)] 3

2E[min(xσ(1), xσ(2))] .

It follows from Equation (1) that for any instance of n job sizes x = (x1, . . . , xn), the switch algorithm 2 with breakpoints (zσ(i))B i=1 = (xσ(i))B i=1 satisfies

E[Switch(xσ, x)]

1 i<j B E[min(xσ(1), xσ(2))] + X

B<i<j n 2E[min(xσ(i), xσ(j))]

3 2E[min(xσ(i), xσ(j))]

i=1 xi + B(B 1)

2 + (n B)(n B 1) + 3

2B(n B) E[min(xσ(1), xσ(2))]

i=1 xi + 2 B

2 E[min(xσ(1), xσ(2))]

i=1 xi + 2 B

1 i<j n min(xi, xj) , (28)

where we used Lemma B.1 for the last equation. We can assume without loss of generality that x1 . . . xn, which yields P

1 i<j n min(xi, xj) = Pn i=1(n i)xi. Finally, Equation (2) gives

E[Switch(xσ, x)]

OPT(x) Pn i=1 xi + 2 B

n Pn i=1(n i)xi Pn i=1 xi + Pn i=1(n i)xi

n ) Pn i=1 xi Pn i=1(n i + 1)xi

n ) Pn i=1(n i + 1)qi , (29)

Non-clairvoyant Scheduling with Partial Predictions

where qi = (Pn j=1 xj) 1xi for all i [n]. The variables (qi)n i=1 satisfy Pn i=1 qi = 1, and we can assume without loss of generality that q1 . . . qn (i.e. x1 . . . xn). Expression (29) is maximized under these constraints for q1 = . . . = qn = 1

n. It follows for all job sizes x1, . . . , xn that

E[Switch(xσ, x)]

n ) n + 1 ,

hence Rn,B(Switch) 2 B

n ) n+1 . Let us now prove that this is exactly the competitive ratio of ALG. Observe that the only inequality we used while analyzing Switch(xσ, x) is Inequality (27), which becomes an equality if the job sizes are pairwise distinct. Let us, therefore, consider job sizes x(ε) i = 1 + iε [1, 1 + nε] for some ε > 0. All the inequalities becoming inequalities, it holds in particular that

Rn,B(Switch) E[Switch(xσ, x(ε))]

OPT(x(ε)) = 2 B

n ) Pn i=1 x(ε) i Pn i=1(n i + 1)x(ε) i

n ) Pn i=1(1 + nε) Pn i=1(n i + 1)

n (1 + nε)2(1 B

n ) n + 1 ,

and taking ε 0 yields that

Rn,B(Switch) = 2 B

n ) n + 1 .

Let w [0, 1]. If B = wn , then B/n w and it holds for all n 2 that Rn,B(Switch) 2 w, and this bound is reached for n , thus CRB(Switch) = 2 w.

C.2. Proof of Proposition 4.2

Proof. It is shown in (Motwani et al., 1994) that for any instance x = (x1, . . . , xn), the expected sum of completion times resulting from a run of RTC is E[RTC(x)] = n+1

2 Pn i=1 xi. Using Equation 28, we deduce that the algorithm ALG that runs RTC with probability 2(n B) n(n+3) 2B and runs Switch with breakpoints zσ = xσ with the remaining probability satisfies

E[ALG(x)] = 2(n B) n(n + 3) 2B RTC(x) + 1 2(n B) n(n + 3) 2B

E[Switch(xσ, x)]

= 2(n B) n(n + 3) 2B n + 1

i=1 xi + 1 2(n B) n(n + 3) 2B

i=1 xi + 2 B

1 i<j n min(xi, xj)

= (n 1)(n B)

n(n + 3) 2B + 1 n X

i=1 xi + 1 2(n B) n(n + 3) 2B

1 i<j n min(xi, xj)

n ) n + 3 2B

1 i<j n min(xi, xj)

n ) n + 3 2B

Non-clairvoyant Scheduling with Partial Predictions

C.3. Proof of Lemma 4.3

Proof. Let x1, . . . , xn Let ξ be a random variable with distribution F. Let i = j B, by symmetry, we can assume that xσ(i) xσ(j). Lemma C.1 gives that

Pσ(i)σ(j) = xσ(i)1yσ(i)<yσ(j) + xσ(j)1yσ(i)>yσ(j) + xσ(i)+xσ(j)

2 1yσ(i)=yσ(j) xσ(i)1yσ(i)<yσ(j) + xσ(j)1yσ(i) yσ(j) = xσ(i) + (xσ(j) xσ(i))1yσ(i) yσ(j) xσ(i) + (xσ(j) yσ(j) + yσ(i) xσ(i))1yσ(i) yσ(j) xσ(i) + (ησ(i) + ησ(j))

= min(xσ(i), xσ(j)) + (ησ(i) + ησ(j)) ,

and we obtain in expectation that for all i = j B

E[Pσ(i)σ(j)] E[min(xσ(i), xσ(j))] + 2

B E[ησ] . (30)

For i = j > B, Lemma C.1 gives

E[Pσ(i)σ(j)] = 2E[min(xσ(i), xσ(j))] . (31)

For i B and j > B, we have again by Lemma C.1 that

E[Pσ(i)σ(j)] = E[min(ξyσ(i), xσ(j))] + E[xσ(i)1ξyσ(i)<xσ(j)] .

Conditional on the permutation σ, i.e. taking the expectation only over ξ, the first term can be bounded as follows

E[min(ξyσ(i), xσ(j)) | σ] E[min(ξxσ(i), xσ(j)) + ξησ(i) | σ]

= E[xσ(j)1ξxσ(i) xσ(j) + ξxσ(i)1ξxσ(i)<xσ(j) | σ] + E[ξ]ησ(i)

= xσ(j) Pr(ξ xσ(j) xσ(i) | σ) + xσ(i)E h ξ1ξ< xσ(j) xσ(i) | σ i + E[ξ]ησ(i) , (32)

and for the second term, the expectation conditional on σ satisfies

E[xσ(i)1ξyσ(i)<xσ(j) | σ] E[yσ(i)1ξyσ(i)<xσ(j) + ησ(i) | σ]

= yσ(i) Pr(ξ < xσ(j)

yσ(i) | σ) + ησ(i)

= h F (xσ(j), yσ(i)) + ησ(i) h F (xσ(j), xσ(i)) + LF |xσ(i) yσ(i)| + ησ(i) (33)

= xσ(i) Pr(ξ < xσ(j) xσ(i) | σ) + (1 + LF )ησ(i) , (34)

where we used for (33) that h F is L-Lipschitz with respect to the second variable. Combining (32) and (34) yields

E[Pσ(i)σ(j) | σ] xσ(j) + (xσ(i) xσ(j)) Pr(ξ < xσ(j) xσ(i) | σ) + xσ(i)E h ξ1ξ< xσ(j) xσ(i) | σ i + (1 + LF + E[ξ])ησ(i)

= xσ(j) + xσ(i)

1 xσ(j) xσ(i) Pr(ξ < xσ(j) xσ(i) | σ) + E h ξ1ξ< xσ(j) xσ(i) | σ i!

+ (1 + LF + E[ξ])ησ(i)

= xσ(j) + xσ(i)g F xσ(j)

xσ(i) + (1 + LF + E[ξ])ησ(i) . (35)

Non-clairvoyant Scheduling with Partial Predictions

We can assume without of generality that x1 . . . xn. It holds by definition of the constants βF and γF that g F xσ(j)

xσ(i) βF xσ(j) xσ(i) if σ(j) < σ(i), and g F xσ(j)

xσ(i) γF xσ(j) xσ(i) if σ(j) > σ(i), which gives

E h xσ(j) + xσ(i)g F xσ(j)

xσ(i) i = 1

2E h xσ(j) + xσ(i)g F xσ(j)

xσ(i) | σ(j) < σ(i) i

2E h xσ(j) + xσ(i)g F xσ(j)

xσ(i) | σ(j) > σ(i) i

2 E[xσ(j) | σ(j) < σ(i)] + γF

2 E[xσ(i) | σ(j) > σ(i)]

2 E[min(xσ(i), xσ(j)) | σ(j) < σ(i)] + γF

2 E[min(xσ(i), xσ(j)) | σ(j) > σ(i)]

= 1 + βF + γF

2 E[min(xσ(i), xσ(j))] ,

thus we obtain by taking the expectation over σ in (35) that

E[Pσ(i)σ(j)] 1+βF +γF

2 E[min(xσ(i), xσ(j))] + 1+LF +E[ξ]

Using Equation 1, the inequality above, Inequalities (30), (31), then Equation 16 with 1+βF +γF

2 instead of απ, we deduce that Algorithm 3 satisfies for any job sizes x1, . . . , xn that

E[Switch(ξyσ, x)] =

1 i<j n E[Pσ(i)σ(j)]

E[min(xσ(i), xσ(j))] + 2

B E[ησ] + X

B<i<j n 2E[min(xσ(i), xσ(j))]

2 E[min(xσ(i), xσ(j))] + 1+LF +E[ξ]

i=1 xi + B(B 1)

2 + (n B)(n B 1) + 1+βF +γF

2 B(n B) E[min(xσ(1), xσ(2))]

+ B 1 + (1 + LF + E[ξ])(n B) E[ησ]

i=1 xi + n(n 1) + 1 βF +γF

2 B(B 1) 3 2 βF +γF

2 B(n 1) E[min(xσ(1), xσ(2))]

+ (B 1 + (1 + LF + E[ξ])(n B)) E[ησ]

i=1 xi + 2 3 βF γF B

n + 2 βF γF B(B 1)

i<j min(xi, xj)

+ B 1 + (1 + LF + E[ξ])(n B) E[ησ] ,

where we used Lemma B.1 for the last equality.

C.4. Proof of Lemma 4.4

Proof. Let ξ 1 + E(1/ρ). The mapping h F defined in Lemma 4.3 becomes for all s, t > 0

h F (s, t) = t F( s

t ) = t 1 e 1

t 1) 1t<s .

For all s > 0, it holds for all t < s that

t = 1 e 1 ρ ( s

Non-clairvoyant Scheduling with Partial Predictions

but the mapping u 7 ( u

ρ + 1)e u 1

ρ is decreasing on [1, ]. Indeed

ρ + 1)e u 1

ρ + 1)e u 1

and since s

t > 1 we deduce that

ρ = 1 lim u 1( u

ρ + 1)e u 1

ρ h F (s, t)

t 1 lim u ( u

ρ + 1)e u 1

thus | h F (s,t)

t | max(1, 1

ρ for all t < s. Otherwise, if t s then h F (s, t) = 0 and h F (s, ) is continuous in t = s, therefore t 7 h F (s, t) is 1

ρ-Lipschitz.

On the other hand, given that ξ > 1 a.s., it holds for s 1 that E[ξ1ξ<s] = 0, and for s > 1 that

E[ξ1ξ<s] = Pr(ξ < s) + E[(ξ 1)1ξ 1<s 1]

0 t1t<s 1 e t/ρ

= F(s) + Z s 1

= F(s) + h (t + ρ)e t/ρis 1

= 1 e (s 1)/ρ + (s 1 + ρ)e t/ρ + ρ

= 1 + ρ + (s + ρ)e (s 1)/ρ .

If follows for all s > 0 that

g F (s) = (1 s) 1 e (s 1)/ρ 1s>1 + 1 + ρ + (s + ρ)e (s 1)/ρ 1s>1

= (2 + ρ) s (1 + ρ)e (s 1)/ρ 1s>1 ,

hence βF = sup0<s 1 g F (s)

γF = sup s 1 (g F (s) + s) = sup s 1

(2 + ρ) (1 + ρ)e (s 1)/ρ = 2 + ρ .

Finally, with LF = 1

ρ, βF = 0 and γF = 2 + ρ, and observing that E[ξ] = 1 + ρ, the constants C1 n,B,F and C1 n,B,F defined in Lemma 4.3 satisfy

C1 n,B,F = 2 (1 ρ)B

n(n 1) = 2 B

C2 n,B,F (2 + ρ + 1

ρ)(n B) + B 4

ρ(n B) + B ,

and the objective function of Algorithm 3 with any job sizes x = {x1, . . . , xn} can be upper-bounded as follows

E[Switch(ξyσ, x)]

i=1 xi + C1 n,B,F X

i<j min(xi, xj) + 4

ρ(n B) + B E[ησ] ,

and by Equation (2) we obtain

E[Switch(ξyσ, x)]

Pn i=1 xi + C1 n,B,F P

i<j min(xi, xj) Pn i=1 xi + P

i<j min(xi, xj) + 4

ρ(n B) + B E[ησ] OPT(x)

= C1 n,B,F (C1 n,B,F 1) Pn i=1 xi Pn i=1 xi + P

i<j min(xi, xj) + 4

ρ(n B) + B E[ησ] OPT(x) .

Non-clairvoyant Scheduling with Partial Predictions

As we showed in the proof of Theorem 4.1, we have

Pn i=1 xi Pn i=1 xi+P

i<j min(xi,xj) 2 n+1, and the minimum is reached when all the job sizes are equal.

E[Switch(ξyσ, x)]

OPT(x) C1 n,B,F 2(C1 n,B,F 1)

n + 1 + 4 ρ(1 B

which yields after taking the supremum over instances x of n jobs

Rn,B(η; Switch) C1 n,B,F 2(C1 n,B,F 1)

n + 1 + 4 ρ(1 B

OPT(x) (36)

C1 n,B,F + 4 ρ(1 B

n 1 ) + 4 ρ(1 B

C.5. Proof of Theorem 4.5

Proof. Let ρ (0, 1], and x = (x1, . . . , xn) be an instance of job sizes all at least equal to 1. Before starting to run, the algorithm has access to predictions yσ(1), . . . , yσ(B) of xσ(1), . . . , xσ(B), it samples ξ 1+E(1/ρ) and sets the breakpoints zσ(i) = ξyσ(i) for all i [B]. With σ and the breakpoints fixed, the preferential algorithm runs Algorithm 2 at rate λ and round-robin at rate 1 λ. Lemma C.1 and Equation 1 guarantee that, with a fixed permutation σ and fixed breakpoints, Algorithm 2 is monotonic, as all the mutual delays Pσ(i)σ(j) are non-decreasing functions of xσ(i) and xσ(j). Following the proof of Lemma 3.1 in (Purohit et al., 2018), when running the preferential algorithm, the completion times of Algorithm 2 increase by a factor of 1/λ, while the completion times of round-robin increase by a factor 1/(1 λ), and since both algorithms are monotonic, the fact that some jobs are run simultaneously by both of them can only improve the objective function. Denoting Switch(ξyσ, x) the objective function of Algorithm 2 with instance x = (xi)n i=1 and breakpoints ξyσ = (ξyσ(i))B i=1, and RR(x) the objective function of round-robin with x, it follows that

ALGλ(x) min Switch(ξyσ, x)

Given that RR(x) is deterministic and 2-competitive, and using Lemma 4.4, we obtain by taking the expectation that

E[ALGλ(x)] E min RR(x)

1 λ , Switch(ξyσ, x)

1 λ , E[Switch(ξyσ, x)]

n 1 ) + 4 ρ(1 B

n n E[ησ] OPT(x) λ

which concludes the proof.