Non-clairvoyant Scheduling with Partial Predictions

Ziyad Benomar 1 2   Vianney Perchet 1 3

The non-clairvoyant scheduling problem has gained new interest within learning-augmented algorithms, where the decision-maker is equipped with predictions without any quality guarantees. In practical settings, access to predictions may be reduced to specific instances, due to cost or data limitations. Our investigation focuses on scenarios where predictions for only B job sizes out of n are available to the algorithm. We first establish near-optimal lower bounds and algorithms in the case of perfect predictions. Subsequently, we present a learning-augmented algorithm satisfying the robustness, consistency, and smoothness criteria, and revealing a novel tradeoff between consistency and smoothness inherent in the scenario with a restricted number of predictions.

1. Introduction

Optimal job scheduling is a longstanding and actively studied class of optimization problems (Panwalkar & Iskander, 1977; Lenstra & Rinnooy Kan, 1978; Graham et al., 1979; Martel, 1982; Cheng & Sin, 1990; Lawler et al., 1993; Pinedo, 2012), with applications in various domains spanning from supply chain management (Hall & Potts, 2003; Ivanov et al., 2016) to operating systems (Jensen et al., 1985; Ramamritham & Stankovic, 1994; Steiger et al., 2004). A particular setting is preemptive single-machine scheduling (Pinedo, 2012; Baker & Trietsch, 2013), where n jobs i ∈ [n] must be executed on the same machine, with the possibility of interrupting a job and resuming it afterward, and the objective is to minimize the sum of their completion times. An algorithm is called clairvoyant if it has initial access to the job sizes; otherwise, it is called non-clairvoyant (Motwani et al., 1994).
1 ENSAE, FAIRPLAY joint team, CREST, Palaiseau, France. 2 École polytechnique, Palaiseau, France. 3 Criteo AI Lab, FAIRPLAY joint team, Paris, France. Correspondence to: Ziyad Benomar. Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

The design of non-clairvoyant scheduling algorithms is a classical problem in competitive analysis and online algorithms (Borodin & El-Yaniv, 2005). In this paradigm, decisions must be made in an environment where the parameters governing the outcome are unknown or might evolve over time. Due to the inherent difficulty of the problems in competitive analysis, the performance of any online algorithm stays bounded away from that of the optimal offline algorithm. However, the ascent of machine learning motivated the incorporation of predictions in algorithm design, which started in works such as (Munoz & Vassilvitskii, 2017; Kraska et al., 2018), then was formalized in (Lykouris & Vassilvitskii, 2018) and (Purohit et al., 2018). Since then, learning-augmented algorithms became a popular research topic with multiple applications (Mitzenmacher & Vassilvitskii, 2022). The outcome of these algorithms depends both on the parameters of the problem and the quality of the predictions. They are required to have a performance that is near-optimal when the predictions are accurate (consistency), near the worst-case performance without advice if the predictions are arbitrarily erroneous (robustness), and that degrades smoothly as the prediction error increases (smoothness). In practice, predictions often incur costs and, at times, are infeasible due to the lack of data. It is, therefore, crucial to understand the limitations and the feasible improvements with a restrained number of predictions in scenarios with multiple unknown variables.
This question was first investigated for the caching problem (Im et al., 2022), and very recently for metrical task systems (Sadek & Elias, 2024), in settings where the algorithm is allowed to query a limited number of predictions. It was also explored for the scheduling problem (Benomar & Perchet, 2023), assuming that the decision-maker can query the true sizes of B jobs out of n. The authors present a (2 − B(B − 1)/(n(n − 1)))-competitive algorithm, and they give a lower bound on the competitive ratio of any algorithm only when B = o(n). The case of imperfect predictions, however, is not examined. In non-clairvoyant scheduling, besides the querying model studied in the works mentioned above, predictions of the sizes of certain jobs i ∈ I may be available, where I ⊆ [n], perhaps derived from previous executions of similar tasks. Assuming that I is a subset of [n] of size B, taken uniformly at random, we examine the limitations and possible improvements of non-clairvoyant algorithms.

1.1. Contributions

We initiate our analysis by addressing the scenario of perfect predictions. We establish a lower bound on the (n, B)-competitive ratio of any algorithm (for fixed n ≥ 2 and B ≤ n), which extends the lower bound of 2 − 4/(n + 3) in the non-clairvoyant case (Motwani et al., 1994). Considering that B = wn + o(n) for some w ∈ [0, 1], we derive from the prior bound that the competitive ratio of any algorithm is at least (2 − w) − (4/e − 1)w(1 − w), and we show an improved bound of (2 − w) − (3 − 2√2)w(1 − w). Demonstrating these bounds is considerably more challenging than the case B = 0, due to the eventual dependency between the actions of the algorithm and the known job sizes. In the case of perfect predictions, we show that knowing only the relative order of the B job sizes, without knowledge of their values, enables a (2 − B/n)-competitive algorithm, which improves substantially upon the result of (Benomar & Perchet, 2023).
We propose a second algorithm leveraging the true sizes of the B jobs, yielding a strictly better (n, B)-competitive ratio (given explicitly in Theorem 4.1), although both algorithms are asymptotically equivalent. Subsequently, we adapt the latter algorithm to handle imperfect predictions. While the difficulty in most works on learning-augmented algorithms lies in ensuring robustness and consistency, smoothness in the case of scheduling with limited predictions is also not immediate. Alongside the typical consistency-robustness tradeoff, our algorithm also exhibits a consistency-smoothness tradeoff. More precisely, governed by two hyperparameters λ, ρ ∈ [0, 1], the (n, B)-competitive ratio of the algorithm is at most min(2/(1 − λ), C_λ + S_λ · E[η]/OPT). Here, E[η] denotes the total expected prediction error, OPT is the objective function achieved by the optimal offline algorithm, 2/(1 − λ) is the algorithm's robustness, C_λ its consistency, and S_λ its smoothness factor, characterizing the sensitivity of the bound to E[η]. Notably, alterations in the parameter ρ yield opposing variations on the consistency and the smoothness factor. Nonetheless, this tradeoff vanishes for B close to 0 or n, and does not appear, for instance, in (Purohit et al., 2018), (Bampis et al., 2022) or (Lindermayr & Megow, 2022). We illustrate our results for the case of perfect predictions in Figure 1, comparing them with the competitive ratio proved in (Benomar & Perchet, 2023).

Figure 1. Lower bounds (Corollaries 2.2 and 2.3) and competitive ratios of Algorithms 1 and 2 and of (Benomar & Perchet, 2023), as a function of the fraction B/n of known job sizes.

1.2. Related work

Since their introduction in (Purohit et al., 2018; Lykouris & Vassilvitskii, 2018), learning-augmented algorithms witnessed an exponentially growing interest, as they offered a fresh perspective for revisiting online algorithms, and provided new applications for machine learning in algorithm design (Mitzenmacher & Vassilvitskii, 2022) and in the implementation of data structures (Kraska et al., 2018; Lin et al., 2022). Many fundamental problems in competitive analysis were studied in this setting, such as ski rental (Gollapudi & Panigrahi, 2019; Anand et al., 2020; Bamas et al., 2020b; Diakonikolas et al., 2021; Antoniadis et al., 2021a; Maghakian et al., 2023; Shin et al., 2023), secretary (Antoniadis et al., 2020; Dütting et al., 2021), matching (Dinitz et al., 2021; Chen et al., 2022; Sakaue & Oki, 2022; Jin & Ma, 2022), caching and metrical task systems (Lykouris & Vassilvitskii, 2018; Chłędowski et al., 2021; Antoniadis et al., 2023b;a; Christianson et al., 2023). In particular, scheduling is one of the problems that were studied most thoroughly. Different works cover various objective functions (Purohit et al., 2018; Lattanzi et al., 2020; Azar et al., 2021), prediction types (Antoniadis et al., 2021b; Merlis et al., 2023; Lassota et al., 2023), error metrics (Im et al., 2021; Lindermayr & Megow, 2022), and other aspects and applications (Wei & Zhang, 2020; Bamas et al., 2020a; Dinitz et al., 2022). The setting of learning-augmented algorithms with limited predictions was initially explored by Im et al. (2022) for caching. The authors presented an algorithm using parsimonious predictions, with a competitive ratio increasing with the number of allowed queries. In another very recent paper (Sadek & Elias, 2024), a similar setting is studied for the more general problem of metrical task systems, where the algorithm is allowed to query a reduced number of action predictions (Antoniadis et al., 2023b), each giving the state of an optimal algorithm at the respective query step. An additional related study by Drygala et al.
(2023) focuses on the ski-rental and Bahncard problems in a penalized adaptation of the setting with limited advice, where the cost of the predictions is added to the algorithm's objective function. Other works have explored related settings with different types of limited advice. For instance, the setting with a restricted number of perfect hints was examined in the context of online linear optimization by (Bhaskara et al., 2021) and in the multi-color secretary problem by (Benomar et al., 2023). Another setting, where the algorithm can query two types of hints, one that is free but possibly inaccurate and another that is expensive but accurate, has been studied in several problems, such as correlation clustering (Silwal et al., 2023), computing minimum spanning trees in a metric space (Bateni et al., 2023), sorting (Bai & Coester, 2024), and matroid optimization (Eberle et al., 2024). In the context of scheduling, Benomar & Perchet (2023) introduced the B-clairvoyant scheduling problem, where an algorithm can query the exact sizes of B jobs at any moment during its execution. They show that the optimal strategy involves querying the sizes of B jobs selected uniformly at random at the beginning of the process. They establish that, if B = o(n), then the competitive ratio of any algorithm is at least 2, then they provide a (2 − B(B − 1)/(n(n − 1)))-competitive algorithm. The same paper also addresses the secretary problem with restricted access to binary predictions of known accuracy, and the ski-rental problem with access to an oracle whose accuracy improves progressively over time. The limit scenario B = 0 corresponds to the non-clairvoyant scheduling problem, studied in-depth in (Motwani et al., 1994).
In particular, the paper demonstrates that the competitive ratio of any non-clairvoyant algorithm is at least 2, and that it is achieved by the round-robin algorithm, executing all unfinished jobs concurrently at equal rates. On the other hand, B = n corresponds to the setting presented in (Purohit et al., 2018), where the authors introduce, for all λ ∈ (0, 1), a preferential round-robin algorithm with robustness 2/(1 − λ) and consistency 1/λ.

1.3. Problem and notations

The decision-maker is given n jobs i ∈ [n] with unknown sizes x_1, ..., x_n to schedule on a single machine, and predictions (y_i)_{i∈I} of (x_i)_{i∈I}, with I a uniformly random subset of [n] of size B. The objective is to leverage the available predictions to minimize the sum of the completion times. We assume that preemption is allowed, i.e. it is possible to interrupt the execution of a job and resume it later, which is equivalent, by neglecting the preemption cost, to assuming that the jobs can be run in parallel at rates that sum to at most 1. To simplify the presentation, we consider that there are predictions y_1, ..., y_n of x_1, ..., x_n, but the decision-maker only has access to y_{σ(1)}, ..., y_{σ(B)}, where σ is a uniformly random permutation of [n]. We denote by η_i = |x_i − y_i| the error of the prediction y_i, and by η_σ = Σ_{i=1}^{B} η_{σ(i)} the total error of the predictions accessed by the algorithm. Consider an algorithm A and an instance x = (x_1, ..., x_n) of job sizes; we denote by A(x) the sum of the completion times of all the jobs when executed by A. Furthermore, for all i ≠ j ∈ [n] and t > 0, we denote by S_i^A(t) the processing time spent on job i until time t, by t_i^A = inf{t ≥ 0 : S_i^A(t) = x_i} its completion time, by D_ij^A = S_i^A(t_j^A) the total time spent on job i before job j terminates, and by P_ij^A = D_ij^A + D_ji^A the mutual delay caused by i, j to each other. When there is no ambiguity, we omit writing the dependency on A. With these notations, it holds that t_i^A = x_i + Σ_{j≠i} D_ji^A for all i ∈ [n].
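As a concrete check of these notations, the sketch below (ours, not the paper's code) computes the objective of two schedules through the mutual-delay decomposition: running jobs in increasing size order incurs a mutual delay of min(x_i, x_j) per pair, while round-robin incurs 2 min(x_i, x_j).

```python
from itertools import combinations

def opt_objective(x):
    # Shortest job first: a pair {i, j} only delays itself by min(x_i, x_j).
    return sum(x) + sum(min(a, b) for a, b in combinations(x, 2))

def rr_objective(x):
    # Round-robin: both jobs of a pair run together, doubling the mutual delay.
    return sum(x) + 2 * sum(min(a, b) for a, b in combinations(x, 2))

print(opt_objective([1, 2, 3]))  # 10, i.e. completion times 1, 3, 6
print(rr_objective([1, 2, 3]))   # 14, i.e. completion times 3, 5, 6
```

The ratio of the two quantities never exceeds 2, in line with round-robin's competitive ratio.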
Consequently, the objective function of A can be expressed as

A(x) = Σ_{i=1}^{n} x_i + Σ_{1≤i<j≤n} P_ij^A .   (1)

In particular, the optimal clairvoyant algorithm OPT runs the jobs to completion in increasing order of their sizes, so that P_ij^OPT = min(x_i, x_j), and

OPT(x) = Σ_{i=1}^{n} x_i + Σ_{1≤i<j≤n} min(x_i, x_j) .   (2)

We denote by R_{n,B}(A) the (n, B)-competitive ratio of A, i.e. the worst case over instances x with n jobs of the ratio E[A(x)]/OPT(x), where the expectation is over σ and the internal randomness of A, and, for B = wn + o(n) with w ∈ [0, 1], we write CR_B(A) for the asymptotic value of this ratio as n → ∞.

2. Lower Bounds

In contrast with the fully non-clairvoyant case, at any time t > 0, the algorithm can act according to the information it has on (x_i)_{i∈I}, and the dependence between its actions and these job sizes makes the analysis more sophisticated. For any positive and continuous function φ and positive numbers T ≥ x > 0, we denote

G_φ(x, T) = ∫_x^T dt/φ(t) + x/φ(T − x) .   (3)

We prove in the following theorem a generic lower bound, using job sizes sampled independently from the distribution Pr(x_i ≥ t) = φ(0)/φ(t).

Theorem 2.1. Let φ : [0, ∞) → [0, ∞) be a continuously differentiable and increasing function satisfying φ(0) > 0, φ′/φ non-increasing, and ∫_0^∞ dt/φ(t)² < ∞, and let α_φ be a non-negative constant satisfying

∫_0^∞ inf_{T≥x} G_φ(x, T) · φ′(x)/φ(x)² dx ≥ α_φ ∫_0^∞ dt/φ(t)² .   (4)

If B = wn + o(n) for some w ∈ [0, 1], then it holds for any randomized algorithm A that

CR_B(A) ≥ 2 − 2(2 − α_φ)w + (3 − 2α_φ)w² .

Moreover, if ∫_0^∞ dt/φ(t) < ∞, then for all n ≥ 2 and B ≤ n, R_{n,B}(A) satisfies an analogous non-asymptotic bound, with leading constant C_{φ,n,B} = (2 − B/n) − (3 − 2α_φ)(B/n)(1 − B/n) and a correction term proportional to (∫_0^∞ dt/φ(t)²) / (∫_0^∞ dt/φ(t)).

To establish this theorem, we analyze i.i.d. job sizes sampled from the distribution Pr(x_i ≥ t) = φ(0)/φ(t), and derive in Lemma A.5 a lower bound on the mutual delays incurred by these jobs during the run of any algorithm A. This involves solving a functional minimization problem, whose solution is expressed using the function G_φ defined in (3). The left term in Inequality (4) is proportional to the obtained lower bound, while the right term is proportional to E[min(x_i, x_j)], which is the mutual delay caused in a run of OPT. Finally, using the identity (1), this inequality, which relates the mutual delays caused respectively by executing A and OPT on the chosen job sizes, can be extended to an inequality involving the objectives of both algorithms, giving a lower bound on the competitive ratio. If ∫_0^∞ dt/φ(t) = ∞, the expectation of the job sizes is infinite. In this case, we consider a truncated distribution with a maximum a > 0.
After completing the analysis, we derive a lower bound that depends on a and w, by considering B = wn + o(n) and n → ∞; then, we conclude by taking the limit a → ∞. For any value α_φ in Theorem 2.1, observe that C_{φ,n,0} = 2 and C_{φ,n,n} = 1, which means that the lower bound interpolates properly between the non-clairvoyant and clairvoyant settings. The remaining task is to choose an adequate function φ satisfying the conditions of the theorem with α_φ as large as possible. We first consider exponentially distributed job sizes, often used to prove lower bounds in scheduling problems. This corresponds to φ(t) = e^t.

Corollary 2.2. For any algorithm A, it holds that R_{n,B}(A) ≥ C_{n,B} − 4(C_{n,B} − 1)/(n + 3), with C_{n,B} = (2 − B/n) − (4/e − 1)(B/n)(1 − B/n). In particular, if B = wn + o(n) then CR_B(A) ≥ (2 − w) − (4/e − 1)w(1 − w).

Corollary 2.2 gives, in particular, that R_{n,0}(A) ≥ 2 − 4/(n + 3), which corresponds exactly to the tight lower bound for the non-clairvoyant scheduling problem (Motwani et al., 1994). However, the bound is not tight for all values of B ≤ n. To refine it, we consider distributions that would be more difficult to process by the algorithm. One idea is to sample a different parameter λ_i ~ E(1) independently for each i ∈ [n], then sample x_i ~ E(λ_i). The distribution of x_i in this case is given by

Pr(x_i ≥ t) = ∫_0^∞ Pr(x_i ≥ t | λ_i = λ) e^{−λ} dλ = ∫_0^∞ e^{−(1+t)λ} dλ = 1/(1 + t) ,

which corresponds to φ(t) = 1 + t. More generally, we consider φ(t) = (1 + t)^r for r ∈ (1/2, 1]. Such functions φ correspond to distributions with a heavy tail and with infinite expectation (i.e. ∫_0^∞ dt/φ(t) = ∞). Using Theorem 2.1, they only yield lower bounds on the competitive ratio but not on R_{n,B}. Corollary 2.3 shows the bound obtained in the limit r → 1/2.

Corollary 2.3. Let w ∈ [0, 1]. If B = wn + o(n), then it holds for any algorithm A that CR_B(A) ≥ (2 − w) − (3 − 2√2)w(1 − w).

Remark 2.4.
If x_i is sampled from the distribution induced by φ(t) = (1 + t)^r, then x_i + 1 follows a Pareto distribution with scale 1 and shape r (Arnold, 2014), which is commonly used to model the distribution of job sizes in the context of the scheduling problem.

3. Known Partial Order

Before investigating the problem within the learning-augmented framework, we introduce an algorithm exclusively for the scenario with perfect information. Subsequently, in Section 4, we present a second algorithm, which we adapt to handle possibly erroneous predictions. The optimal algorithm OPT does not necessitate precise knowledge of the job sizes. Instead, it relies solely on their ordering. This observation suggests that it might be possible to improve the competitive ratio of the non-clairvoyant case by only knowing the relative order of a subset of the job sizes. Therefore, rather than having access to the values x_{σ(1)}, ..., x_{σ(B)}, we assume that the decision-maker is only given a priority ordering π of them, i.e. a bijection π : [B] → σ([B]) satisfying x_{π(1)} ≤ ... ≤ x_{π(B)}.

Algorithm 1 Catch-up and Resume Round-Robin (CRRR)
  Input: ordering π of x_{σ(1)}, ..., x_{σ(B)}
  Set x_{π(0)} = 0
  for i = 1 to B do
    Run job π(i) for x_{π(i−1)} units of time
    while job π(i) is not finished do
      Run round-robin on {σ(j)}_{j>B} ∪ {π(i)}
    end while
  end for
  Run round-robin on {σ(j)}_{j=B+1}^{n} until completion

In Algorithm 1 (CRRR), for all i ∈ [B], the execution of job π(i) starts only upon the completion of job π(i − 1). At this moment, all jobs σ(j) for j > B are either completed or have undergone execution for x_{π(i−1)} units of time. CRRR then runs job π(i) for a period of length x_{π(i−1)} to catch up with the progress of the jobs {σ(j)}_{j>B}. Following this synchronization phase, it runs round-robin on the set of jobs {σ(j)}_{j>B} ∪ {π(i)} until π(i) terminates. The same process iterates with π(i + 1) afterward. Once all the jobs σ(i) for i ∈ [B] are completed, the algorithm runs round-robin on the unfinished jobs in {σ(j)}_{j>B}.
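The dynamics above can be checked with a small event-driven simulation (a sketch of ours, not the paper's code): with B = 0 it reduces to round-robin, and with B = n and a correct ordering it reduces to the optimal schedule.

```python
def crrr_objective(x, known):
    """Sum of completion times under CRRR (event-driven sketch, ours).
    `known` lists the job indices whose relative order is given to CRRR."""
    rem = list(x)
    t, total = 0.0, 0.0
    order = sorted(known, key=lambda i: x[i])            # priority ordering pi
    unknown = sorted(set(range(len(x))) - set(known), key=lambda j: x[j])

    def round_robin(jobs, until=None):
        """Round-robin at equal rates until `until` (or everything) finishes."""
        nonlocal t, total
        live = [j for j in jobs if rem[j] > 1e-12]
        while live and (until is None or rem[until] > 1e-12):
            dt = min(rem[j] for j in live)               # time to next completion
            t += dt * len(live)
            for j in live:
                rem[j] -= dt
            total += t * sum(rem[j] <= 1e-12 for j in live)
            live = [j for j in live if rem[j] > 1e-12]

    prev = 0.0
    for i in order:
        t += prev                                        # catch-up phase for job i
        rem[i] -= prev
        if rem[i] <= 1e-12:
            total += t
        else:
            round_robin(unknown + [i], until=i)          # resume round-robin with i
        prev = x[i]
    round_robin(unknown)                                 # finish the remaining jobs
    return total

print(crrr_objective([1, 2, 3], known=[]))               # pure round-robin: 14.0
print(crrr_objective([3, 1, 2], known=[0, 1, 2]))        # full order known: 10.0 (= OPT)
```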
Leveraging the ordering π, the algorithm aims to minimize the delays caused by longer jobs to shorter ones. In the ideal scenario where B = n, each job begins execution only after all shorter ones have been completed. When B < n, it is evident that the jobs {σ(i)}_{i∈[B]} should be executed in the order specified by π. However, CRRR takes advantage of this ordering even further, ensuring that job π(i) not only avoids delaying π(i − 1) but also does not delay any job σ(j) with j > B that has a size at most x_{π(i−1)}.

Theorem 3.1. Algorithm CRRR satisfies

(2 − B/n) − n/((n + 1)(B + 1)) ≤ R_{n,B}(CRRR) ≤ 2 − B/n .

Moreover, if B = wn for some w ∈ [0, 1], then CR(CRRR) = 2 − w.

Theorem 3.1 shows a substantially stronger result than the one presented in (Benomar & Perchet, 2023), where the algorithm leveraging the values of the job sizes x_{σ(1)}, ..., x_{σ(B)} is only (2 − B(B − 1)/(n(n − 1)))-competitive.

Action predictions. The information provided to CRRR is the order in which the jobs {σ(i)}_{i∈[B]} would be executed by OPT. This corresponds to the error-free scenario of action predictions (Antoniadis et al., 2023b; Lindermayr & Megow, 2022; Lassota et al., 2023; Sadek & Elias, 2024), where the decision-maker receives predictions regarding the actions taken by the optimal offline algorithm, rather than numeric predictions of unknown parameters. In the context of the scheduling problem, utilizing the ℓ1 norm to measure the error is not ideal for analyzing the action prediction setting (Im et al., 2021). Alternative error metrics, which account for the number of inversions in the predicted permutation in comparison to the true one (Lindermayr & Megow, 2022), would be more suitable. Therefore, adapting CRRR to imperfect action predictions is left for future research as it requires different considerations. For now, we shift our focus to introducing another algorithm that utilizes not only the priority order induced by the job sizes, but the size values themselves.

4.
Predictions of the Job Sizes

We propose in this section a generic algorithm Switch, which we will adapt to the cases of perfect and imperfect predictions. The algorithm takes as input n jobs with unknown sizes and breakpoints z_{σ(1)}, ..., z_{σ(B)} that depend on the predictions of x_{σ(1)}, ..., x_{σ(B)}; then it alternates running round-robin on the jobs {σ(j)}_{j>B} and Shortest Predicted Job First (SPJF), introduced in (Purohit et al., 2018), on the jobs {σ(i)}_{i∈[B]}, where the moment of switching from one algorithm to the other is determined by the breakpoints. As in Section 3, we call an ordering of z_{σ(1)}, ..., z_{σ(B)} any bijection π : [B] → σ([B]) satisfying z_{π(1)} ≤ ... ≤ z_{π(B)}. Note that, if the breakpoints are not pairwise distinct, then the ordering is not unique. In that case, Switch chooses an ordering π uniformly at random. We assume furthermore that the breakpoints induce the same order as the predictions, i.e. z_{σ(i)} < z_{σ(j)} ⟺ y_{σ(i)} < y_{σ(j)} for all i, j ∈ [B].

Algorithm 2 Switch algorithm Switch(z_σ, x)
  Input: breakpoints z_σ = (z_{σ(i)})_{i∈[B]}
  π ← ordering of z_σ chosen uniformly at random
  for i = 1 to B do
    while min_{j>B} S_{σ(j)}(t)/x_{σ(j)} < 1 and max_{j>B} S_{σ(j)}(t) < z_{π(i)} do
      Run round-robin on {σ(j)}_{j=B+1}^{n}
    end while
    Run job π(i) until completion
  end for
  Run round-robin on {σ(j)}_{j=B+1}^{n} until completion

Consider a run of Switch, and let i ∈ [B]. The first condition for entering the while loop is the existence of j > B such that S_{σ(j)}(t) < x_{σ(j)}. This signifies that the jobs {σ(j)}_{j>B} are not all completed, which is a verification feasible for the decision-maker without knowledge of the sizes {x_{σ(j)}}_{j>B}. The second condition means that no job σ(j) for j > B has been in execution for more than z_{π(i)} units of time. Given that round-robin allocates equal importance to all jobs {σ(j)}_{j>B}, upon exiting the while loop, each job σ(j) is either completed or has been in execution for precisely z_{π(i)} units of time.
Following this step, job π(i) is executed until completion, and the same process recurs for i + 1. This algorithm ensures that any job σ(i) with i ∈ [B] does not delay any other job j whose size satisfies x_j ≤ z_{σ(i)}, and the delay it causes to jobs not satisfying this condition is exactly x_{σ(i)}. This allows efficient control of the mutual delays between the jobs by conveniently choosing the breakpoints.

4.1. Perfect Predictions

Assuming that the predictions are perfect, i.e. the decision-maker knows the exact sizes of jobs σ(1), ..., σ(B), it is possible to set z_{σ(i)} = x_{σ(i)} for all i ∈ [B].

Theorem 4.1. Algorithm Switch with breakpoints z_{σ(i)} = x_{σ(i)} for all i ∈ [B] satisfies

R_{n,B}(Switch) = (2 + (2 − B/n)(n − 1)) / (n + 1) .

In particular, if B = wn for some w ∈ [0, 1] then CR_B(Switch) = 2 − w.

Note that the (n, B)-competitive ratio above is strictly better than that of CRRR, presented in Theorem 3.1. However, both algorithms have equivalent performance when n is large. In particular, their competitive ratios coincide when B = wn and n → ∞. A slight improvement on the (n, B)-competitive ratio can be obtained by introducing randomness into Switch. Indeed, consider the Run To Completion algorithm (RTC) defined in (Motwani et al., 1994), executing all the jobs until completion in a uniformly random order. Then we have the following result.

Proposition 4.2. The algorithm that runs RTC with probability 2(n − B)/(n(n + 3) − 2B), and runs Switch with breakpoints z_{σ(i)} = x_{σ(i)} for all i ∈ [B] with the remaining probability, achieves a slightly smaller (n, B)-competitive ratio. For B = 0, this ratio becomes 2 − 4/(n + 3), which is the best possible in the non-clairvoyant setting.

4.2. Imperfect Predictions

We assume in this section that no quality guarantees are given on the predictions {y_{σ(i)}}_{i∈[B]}. Recall that the total error η_σ = Σ_{i=1}^{B} |x_{σ(i)} − y_{σ(i)}| is a random variable because σ is a uniformly random permutation of [n]; hence our results will depend on E[η_σ].
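Before turning to the design of robust breakpoints, the mechanics of Switch can be simulated directly for arbitrary breakpoints; this sketch (ours) exploits the fact that round-robin keeps the alive unknown jobs at a common processed time. With perfect breakpoints z = x it recovers round-robin at B = 0 and the optimal schedule at B = n.

```python
import math

def switch_objective(x, known, z):
    """Sum of completion times under Switch; `z` maps known indices to breakpoints."""
    rem = list(x)
    t, total = 0.0, 0.0
    alive = sorted(set(range(len(x))) - set(known), key=lambda j: x[j])
    s = 0.0                              # common processed time of alive unknown jobs

    def rr_until(target):
        """Round-robin on unknown jobs until their common progress reaches target."""
        nonlocal t, s, total, alive
        while alive and s < target:
            step = min(x[alive[0]] - s, target - s)      # next completion or target
            t += step * len(alive)
            s += step
            while alive and x[alive[0]] - s <= 1e-12:
                total += t
                alive.pop(0)

    for i in sorted(known, key=lambda i: z[i]):          # ordering pi of breakpoints
        rr_until(z[i])
        t += rem[i]                                      # run known job i to completion
        total += t
        rem[i] = 0.0
    rr_until(math.inf)
    return total

print(switch_objective([1, 2, 3], known=[], z={}))                         # 14.0
print(switch_objective([2, 1, 3], known=[0, 1, 2], z={0: 2, 1: 1, 2: 3}))  # 10.0
```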
The goal is to design an algorithm that is consistent, robust, and with a competitive ratio having a smooth dependency on E[η_σ]. We first study the consistency and smoothness of Switch with well-chosen breakpoints, then we show that combining it with round-robin as in (Purohit et al., 2018; Lassota et al., 2023) gives robustness guarantees. Using the trivial breakpoints z_{σ(i)} = y_{σ(i)} as in the previous section is not enough to guarantee smoothness. Consider, for example, job sizes all equal to 1, and B predictions y_{σ(i)} = 1 − ϵ for an arbitrarily small ϵ. Blindly following these predictions, taking z_{σ(i)} = y_{σ(i)} for all i ∈ [B], results in delaying all jobs with unknown sizes by B time units compared to the case of perfect predictions. This creates a discontinuity in the competitive ratio when ϵ becomes positive, proving non-smoothness. Hence, we consider instead randomized breakpoints. Algorithm 3 simply runs Switch with random breakpoints z_{σ(i)} = ξ y_{σ(i)}. Lemma 4.3 gives an upper bound on the algorithm's objective function depending on the distribution F of ξ.

Algorithm 3 Switch with imperfect predictions Switch(ξ y_σ, x)
  Input: predictions (y_{σ(i)})_{i∈[B]}, distribution F
  Sample ξ ~ F
  Run Switch with breakpoints z_{σ(i)} = ξ y_{σ(i)}

Lemma 4.3 upper bounds the expected objective of Algorithm 3, for any probability distribution F on (0, ∞), through two functions h_F and g_F of the breakpoint distribution, which quantify the extra delays induced by the randomized breakpoints relative to the true job sizes.

Lemma 4.4. Let ρ ∈ (0, 1] and let F be a shifted exponential distribution with parameter 1/ρ, i.e. ξ ~ 1 + E(1/ρ). Then Switch with breakpoints z_{σ(i)} = ξ y_{σ(i)} for all i ∈ [B] has an (n, B)-competitive ratio of at most

C_{ρ,n,B} + S_{ρ,n,B} · E[η_σ]/OPT(x) , with C_{ρ,n,B} = 2 − (B/n)(1 − ρ(n − B)/(n − 1)) and S_{ρ,n,B} = (4/ρ)(1 − B/n) .

The previous lemma highlights a tradeoff between the smoothness and the consistency of the algorithm. Indeed, as ρ decreases, the algorithm gains in consistency, but the smoothness factor (4/ρ)(1 − B/n) multiplying E[η_σ] becomes larger.
However, while setting ρ close to zero results in an arbitrarily high sensitivity to the error, setting it close to 1 gives a consistency of at most 2 − B(B − 1)/(n(n − 1)), which is still a decreasing function of B, interpolating between the values 2 and 1 in the non-clairvoyant and clairvoyant cases. This implies that sacrificing a small amount of consistency significantly improves smoothness. For B = n, assuming that all the job sizes are at least 1, it holds that OPT(x) ≥ n(n + 1)/2 and the lemma gives R_{n,B}(η; Switch) ≤ 1 + 2η/n, matching the bound proved for SPJF in (Purohit et al., 2018). On the other hand, if η = 0, setting ρ = 0 results in R_{n,B}(0; Switch) ≤ 2 − B/n. Using tighter inequalities in the proof, it is possible to retrieve the bound established in Theorem 4.1 (see Inequality (36) in Appendix C).

Preferential algorithm. Now we need to adapt the algorithm to guarantee robustness in the face of arbitrarily erroneous predictions. We use the same approach as Lemma 3.1 of (Purohit et al., 2018), which consists of running concurrently a consistent algorithm and round-robin at respective rates λ and 1 − λ for some λ ∈ [0, 1]. However, their result only applies to deterministic algorithms A satisfying, for any instances x = (x_1, ..., x_n) and x′ = (x′_1, ..., x′_n),

x_i ≤ x′_i for all i ∈ [n] ⟹ A(x) ≤ A(x′) .

Such algorithms are called monotonic. Switch with breakpoints z_{σ(i)} = ξ y_{σ(i)} is not deterministic since its objective function depends both on σ and ξ. Nonetheless, we overcome this difficulty by proving that, conditionally on σ and ξ, its outcome is deterministic and monotonic, and we then establish the following theorem.

Theorem 4.5. Let ρ ∈ (0, 1] and F = 1 + E(1/ρ). Then the preferential algorithm ALG_λ, which runs Algorithm 3 at rate λ and round-robin at rate 1 − λ, has an (n, B)-competitive ratio of at most

min( 2/(1 − λ) , (C_{ρ,n,B} + S_{ρ,n,B} · E[η_σ]/OPT(x)) / λ ) ,

with C_{ρ,n,B} = 2 − (B/n)(1 − ρ(n − B)/(n − 1)) and S_{ρ,n,B} = (4/ρ)(1 − B/n).

This upper bound generalizes that of (Purohit et al., 2018).
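A minimal simulation of the preferential idea (our sketch; for brevity the prediction-following component is plain SPJF rather than Algorithm 3): the shortest-predicted unfinished job receives an extra rate λ on top of the (1 − λ) round-robin share.

```python
def preferential_objective(x, y, lam):
    """Time-share a prediction-following rule (rate lam) with round-robin (rate 1 - lam)."""
    rem = list(x)
    alive = set(range(len(x)))
    t, total = 0.0, 0.0
    while alive:
        k = len(alive)
        lead = min(alive, key=lambda j: y[j])            # SPJF's current choice
        rates = {j: (1 - lam) / k + (lam if j == lead else 0.0) for j in alive}
        dt = min(rem[j] / rates[j] for j in alive if rates[j] > 0)
        t += dt                                          # advance to the next completion
        for j in list(alive):
            rem[j] -= rates[j] * dt
            if rem[j] <= 1e-12:
                total += t
                alive.discard(j)
    return total

x = [1, 2, 3]
print(preferential_objective(x, y=x, lam=1.0))   # perfect predictions, pure SPJF: 10.0
print(preferential_objective(x, y=x, lam=0.0))   # pure round-robin: 14.0
```

Even with adversarial predictions, the (1 − λ) round-robin share keeps every job progressing, which is the source of the 2/(1 − λ) robustness.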
This bound presents a consistency-robustness tradeoff that can be tuned by adjusting the parameter λ, and a consistency-smoothness tradeoff controlled by the parameter ρ, which vanishes for B close to 0 or n, as the terms multiplying ρ and 1/ρ respectively in C_{ρ,n,B} and S_{ρ,n,B} become zero.

5. Experiments

In this section, we validate our theoretical findings by testing the algorithms we presented on various benchmark job sizes. In all the figures, each point is averaged over 10^4 independent trials.

Perfect information. We test the performance of Algorithms Switch and CRRR against the hard instances used to prove the lower bounds of Section 2: we consider i.i.d. job sizes sampled from the exponential distribution with parameter 1, and job sizes drawn from the distribution Φ(r, a) with parameters r = 0.51 and a = 10^4, characterized by the tail probability

Pr(x_i^a ≥ t) = ((1 + t)^{−r} − (1 + a)^{−r}) / (1 − (1 + a)^{−r}) for t ∈ [0, a] ,

which corresponds to the truncated distribution used in the proof of Corollary 2.3, with r ∈ (1/2, 1), before taking the limits n → ∞, a → ∞, and r → 1/2.

Figure 2. Lower bounds and empirical competitive ratios of Switch and CRRR, for exponential job sizes and for x_i ~ Φ(0.51, 10^4), as a function of the fraction B/n of known job sizes.

Figure 2 exhibits the empirical ratios achieved by both algorithms with a number n ∈ {20, 1000} of jobs. For n = 20, Switch outperforms CRRR for the two distributions, whereas their ratios are very close for n = 1000. This confirms that Switch and CRRR are asymptotically equivalent, as can be deduced from Theorems 3.1 and 4.1. For the exponential distribution, as expected, both algorithms have ratios above the non-asymptotic lower bound of Corollary 2.2. Meanwhile, considering the distribution Φ(0.51, 10^4), the empirical ratios for n = 20 are below the lower bound of Corollary 2.3, because it is proved by taking n → ∞. For n = 1000, the ratios match the lower bound.
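The truncated heavy-tailed instances can be generated by inverse-CDF sampling; a short sketch under our reading of the tail formula above (function name ours):

```python
import random

def sample_phi(r, a, rng):
    """One draw from Phi(r, a): tail proportional to (1+t)^(-r) - (1+a)^(-r) on [0, a]."""
    u = rng.uniform((1 + a) ** (-r), 1.0)    # uniform tail value, inverse-CDF method
    return u ** (-1.0 / r) - 1.0

rng = random.Random(0)
xs = [sample_phi(0.51, 1e4, rng) for _ in range(1000)]
print(0.0 <= min(xs) and max(xs) <= 1e4)     # all samples fall in [0, a]: True
```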
Preferential algorithm. In the remaining discussion, we refer to Switch with breakpoints z_{σ(i)} = ξ y_{σ(i)} and ξ ~ 1 + E(1/ρ) as Switch with parameter ρ. We generate a synthetic instance of n = 50 job sizes, drawn independently from the Pareto distribution with scale 1 and shape 1.1. The Pareto distribution, known for its heavy tail, is particularly suitable for modeling job sizes (Harchol-Balter & Downey, 1997; Bansal & Harchol-Balter, 2001; Arnold, 2014), and it is a commonly used benchmark for learning-augmented scheduling algorithms (Purohit et al., 2018; Lindermayr & Megow, 2022). Furthermore, we consider noisy predictions y_i = x_i + ε_i for all i ∈ [50], where ε_i is sampled independently from a normal distribution with mean 0 and standard deviation τ.

Figure 3. Empirical competitive ratio of the Preferential Algorithm (PA) with different parameters, as a function of the error parameter τ. Left: PA(ρ, λ) with B = n/2, compared with round-robin and Switch(ρ = 0.5). Right: Switch(ρ = 0.5, B) for B ∈ [1, n].

Figure 3 illustrates the empirical ratio of the Preferential Algorithm (PA) across various parameter configurations, with varying error parameter τ. The left plot displays the ratios for different λ and ρ values, with B = 25 = n/2. When λ = 0, PA becomes round-robin. For λ = 1 and ρ = 0.5, PA simply runs Switch(ρ = 0.5), which gives an improved consistency (at τ = 0), not equal to 1 because B < n and ρ > 0, and gives a ratio that deteriorates arbitrarily as τ increases. In contrast, PA with λ = 0.5 gives a weaker consistency but maintains bounded ratios, even with arbitrarily erroneous predictions. The choice of ρ = 0 exhibits a slightly better consistency compared to ρ = 0.5, in line with theoretical expectations, but there is no significant difference regarding sensitivity to errors.
This should not be surprising, since setting ρ > 0 ensures smoothness in the worst case (see Figure 4), but it is not necessarily needed for all instances. The right plot examines the influence of B on PA with parameters λ = 1 and ρ = 0.5, which corresponds to Switch with ρ = 0.5. Larger values of B improve consistency and also yield a smaller sensitivity to small errors. However, for high values of τ, having numerous predictions leads to a faster performance deterioration compared to having fewer predictions. This shows that more predictions enhance consistency, while fewer predictions enhance robustness.

Consistency-smoothness To shed light on the tradeoff between consistency and smoothness raised in Section 4.2, we consider i.i.d. job sizes $x_1, \ldots, x_{100}$, each taking the value 1 with probability 1/2 and the value 2 with probability 1/2, and we consider noisy predictions of the form $y_i = x_i + \varepsilon_i$, where $\varepsilon_i$ follows a uniform distribution over $[-\tau, \tau]$. Figure 4 illustrates the evolution, for τ varying in [0, 0.15], of the empirical competitive ratio of Switch with parameter ρ ∈ {0, 0.1, 0.5} and B ∈ {50, 95}. For both values of B, the experiment reveals that larger values of ρ give bigger ratios when τ = 0 (less consistency), but on the other hand they yield less sensitivity to variations of the expected prediction error (better smoothness), which confirms our theoretical findings. In particular, for ρ = 0, a significant discontinuity arises as soon as τ becomes positive. Figure 4 also shows that this tradeoff becomes less significant as B approaches n = 100, with the consistency values for ρ ∈ {0, 0.1, 0.5} drawing closer.

[Figure 4. Tradeoff between consistency and smoothness: empirical competitive ratio of Switch as a function of the error parameter τ ∈ [0, 0.15], for ρ ∈ {0, 0.1, 0.5} and B ∈ {50, 95}.]

6.
Conclusion and Future Work

This paper explores the non-clairvoyant scheduling problem with a limited number of predicted job sizes. We give near-optimal lower and upper bounds in the case of perfect predictions, and we introduce a learning-augmented algorithm exhibiting the common consistency-robustness tradeoff along with an additional consistency-smoothness tradeoff, the latter vanishing when B approaches 0 or n. Our findings join previous works in demonstrating that online algorithms can achieve improved performance even when armed with a restricted set of predictions, an assumption more aligned with practical scenarios. Furthermore, they affirm the necessity of studying and understanding these regimes, as they may unveil unique behaviors absent in the zero- or full-information settings.

6.1. Open Questions

Tight lower bounds In the case of perfect predictions, there is a (small) gap between the lower bounds of Section 2 and the competitive ratios of Switch and CRRR. An interesting research avenue is to close this gap, either by designing better algorithms or by improving the lower bounds. This could involve using Theorem 2.1 with more refined distributions.

Reduced number of action predictions Algorithm Switch leverages the job size predictions, not only the order they induce. Using the ℓ1 norm to measure the error is thus a suitable choice. However, as discussed in Section 3, Algorithm CRRR only uses the priority order in which OPT runs $(x_{\sigma(i)})_{i \in [B]}$. An interesting question to explore is how to adapt it to the case of imperfect action predictions, using appropriate error measures.

Smooth and $(2 - \frac{B}{n})$-consistent algorithm Lemma 4.4 and Figure 4 emphasize that, to achieve smoothness, Switch with parameter ρ must exhibit a consistency exceeding $2 - \frac{B}{n}$.
A compelling question arises: is it possible to devise a smooth algorithm with a consistency of at most $2 - \frac{B}{n}$?

Impact Statement

This paper presents a work whose goal is to advance the field of learning-augmented algorithms. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.

Acknowledgements

Vianney Perchet acknowledges support from the French National Research Agency (ANR) under grant ANR-19-CE23-0026, as well as from the grant Investissements d'Avenir (LabEx Ecodec/ANR-11-LABX-0047).

References

Anand, K., Ge, R., and Panigrahi, D. Customizing ML predictions for online algorithms. In International Conference on Machine Learning, pp. 303–313. PMLR, 2020.

Antoniadis, A., Gouleakis, T., Kleer, P., and Kolev, P. Secretary and online matching problems with machine learned advice. Advances in Neural Information Processing Systems, 33:7933–7944, 2020.

Antoniadis, A., Coester, C., Eliáš, M., Polak, A., and Simon, B. Learning-augmented dynamic power management with multiple states via new ski rental bounds. Advances in Neural Information Processing Systems, 34:16714–16726, 2021a.

Antoniadis, A., Ganje, P. J., and Shahkarami, G. A novel prediction setup for online speed-scaling. arXiv preprint arXiv:2112.03082, 2021b.

Antoniadis, A., Boyar, J., Eliáš, M., Favrholdt, L. M., Hoeksma, R., Larsen, K. S., Polak, A., and Simon, B. Paging with succinct predictions. In International Conference on Machine Learning, pp. 952–968. PMLR, 2023a.

Antoniadis, A., Coester, C., Eliáš, M., Polak, A., and Simon, B. Online metric algorithms with untrusted predictions. ACM Transactions on Algorithms, 19(2):1–34, 2023b.

Arnold, B. C. Pareto distribution. Wiley StatsRef: Statistics Reference Online, pp. 1–10, 2014.

Azar, Y., Leonardi, S., and Touitou, N. Flow time scheduling with uncertain processing time.
In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pp. 1070–1080, 2021.

Bai, X. and Coester, C. Sorting with predictions. Advances in Neural Information Processing Systems, 36, 2024.

Baker, K. R. and Trietsch, D. Principles of sequencing and scheduling. John Wiley & Sons, 2013.

Bamas, E., Maggiori, A., Rohwedder, L., and Svensson, O. Learning augmented energy minimization via speed scaling. Advances in Neural Information Processing Systems, 33:15350–15359, 2020a.

Bamas, E., Maggiori, A., and Svensson, O. The primal-dual method for learning augmented algorithms. Advances in Neural Information Processing Systems, 33:20083–20094, 2020b.

Bampis, E., Dogeas, K., Kononov, A. V., Lucarelli, G., and Pascual, F. Scheduling with untrusted predictions. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI, pp. 4581–4587, 2022.

Bansal, N. and Harchol-Balter, M. Analysis of SRPT scheduling: Investigating unfairness. In Proceedings of the 2001 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pp. 279–290, 2001.

Bateni, M., Dharangutte, P., Jayaram, R., and Wang, C. Metric clustering and MST with strong and weak distance oracles. arXiv preprint arXiv:2310.15863, 2023.

Benomar, Z. and Perchet, V. Advice querying under budget constraint for online algorithms. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.

Benomar, Z., Chzhen, E., Schreuder, N., and Perchet, V. Addressing bias in online selection with limited budget of comparisons. arXiv preprint arXiv:2303.09205, 2023.

Bhaskara, A., Cutkosky, A., Kumar, R., and Purohit, M. Logarithmic regret from sublinear hints. Advances in Neural Information Processing Systems, 34:28222–28232, 2021.

Borodin, A. and El-Yaniv, R. Online computation and competitive analysis. Cambridge University Press, 2005.

Chen, J., Silwal, S., Vakilian, A., and Zhang, F.
Faster fundamental graph algorithms via learned predictions. In International Conference on Machine Learning, pp. 3583–3602. PMLR, 2022.

Cheng, T. and Sin, C. A state-of-the-art review of parallel-machine scheduling research. European Journal of Operational Research, 47(3):271–292, 1990.

Chłędowski, J., Polak, A., Szabucki, B., and Żołna, K. T. Robust learning-augmented caching: An experimental study. In International Conference on Machine Learning, pp. 1920–1930. PMLR, 2021.

Christianson, N., Shen, J., and Wierman, A. Optimal robustness-consistency tradeoffs for learning-augmented metrical task systems. In International Conference on Artificial Intelligence and Statistics, pp. 9377–9399. PMLR, 2023.

Diakonikolas, I., Kontonis, V., Tzamos, C., Vakilian, A., and Zarifis, N. Learning online algorithms with distributional advice. In International Conference on Machine Learning, pp. 2687–2696. PMLR, 2021.

Dinitz, M., Im, S., Lavastida, T., Moseley, B., and Vassilvitskii, S. Faster matchings via learned duals. Advances in Neural Information Processing Systems, 34:10393–10406, 2021.

Dinitz, M., Im, S., Lavastida, T., Moseley, B., and Vassilvitskii, S. Algorithms with prediction portfolios. Advances in Neural Information Processing Systems, 35:20273–20286, 2022.

Drygala, M., Nagarajan, S. G., and Svensson, O. Online algorithms with costly predictions. In International Conference on Artificial Intelligence and Statistics, pp. 8078–8101. PMLR, 2023.

Dütting, P., Lattanzi, S., Paes Leme, R., and Vassilvitskii, S. Secretaries with advice. In Proceedings of the 22nd ACM Conference on Economics and Computation, pp. 409–429, 2021.

Eberle, F., Hommelsheim, F., Lindermayr, A., Liu, Z., Megow, N., and Schlöter, J. Accelerating matroid optimization through fast imprecise oracles. arXiv preprint arXiv:2402.02774, 2024.

Gollapudi, S. and Panigrahi, D. Online algorithms for rent-or-buy with expert advice. In International Conference on Machine Learning, pp. 2319–2327. PMLR, 2019.
Graham, R. L., Lawler, E. L., Lenstra, J. K., and Kan, A. R. Optimization and approximation in deterministic sequencing and scheduling: a survey. In Annals of Discrete Mathematics, volume 5, pp. 287–326. Elsevier, 1979.

Hall, N. G. and Potts, C. N. Supply chain scheduling: Batching and delivery. Operations Research, 51(4):566–584, 2003.

Harchol-Balter, M. and Downey, A. B. Exploiting process lifetime distributions for dynamic load balancing. ACM Transactions on Computer Systems (TOCS), 15(3):253–285, 1997.

Im, S., Kumar, R., Montazer Qaem, M., and Purohit, M. Non-clairvoyant scheduling with predictions. In ACM Symposium on Parallelism in Algorithms and Architectures, 2021.

Im, S., Kumar, R., Petety, A., and Purohit, M. Parsimonious learning-augmented caching. In International Conference on Machine Learning, pp. 9588–9601. PMLR, 2022.

Ivanov, D., Dolgui, A., Sokolov, B., Werner, F., and Ivanova, M. A dynamic model and an algorithm for short-term supply chain scheduling in the smart factory industry 4.0. International Journal of Production Research, 54(2):386–402, 2016.

Jensen, E. D., Locke, C. D., and Tokuda, H. A time-driven scheduling model for real-time operating systems. In RTSS, volume 85, pp. 112–122, 1985.

Jin, B. and Ma, W. Online bipartite matching with advice: Tight robustness-consistency tradeoffs for the two-stage model. Advances in Neural Information Processing Systems, 35:14555–14567, 2022.

Kraska, T., Beutel, A., Chi, E. H., Dean, J., and Polyzotis, N. The case for learned index structures. In Proceedings of the 2018 International Conference on Management of Data, pp. 489–504, 2018.

Lassota, A. A., Lindermayr, A., Megow, N., and Schlöter, J. Minimalistic predictions to schedule jobs with online precedence constraints. In International Conference on Machine Learning, pp. 18563–18583. PMLR, 2023.

Lattanzi, S., Lavastida, T., Moseley, B., and Vassilvitskii, S. Online scheduling via learned weights.
In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1859–1877. SIAM, 2020.

Lawler, E. L., Lenstra, J. K., Kan, A. H. R., and Shmoys, D. B. Sequencing and scheduling: Algorithms and complexity. Handbooks in Operations Research and Management Science, 4:445–522, 1993.

Lenstra, J. K. and Rinnooy Kan, A. Complexity of scheduling under precedence constraints. Operations Research, 26(1):22–35, 1978.

Lin, H., Luo, T., and Woodruff, D. Learning augmented binary search trees. In International Conference on Machine Learning, pp. 13431–13440. PMLR, 2022.

Lindermayr, A. and Megow, N. Permutation predictions for non-clairvoyant scheduling. In Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 357–368, 2022.

Lykouris, T. and Vassilvitskii, S. Competitive caching with machine learned advice. In International Conference on Machine Learning, pp. 3296–3305. PMLR, 2018.

Maghakian, J., Lee, R., Hajiesmaili, M., Li, J., Sitaraman, R., and Liu, Z. Applied online algorithms with heterogeneous predictors. In International Conference on Machine Learning, pp. 23484–23497. PMLR, 2023.

Martel, C. Preemptive scheduling with release times, deadlines, and due times. Journal of the ACM (JACM), 29(3):812–829, 1982.

Merlis, N., Richard, H., Sentenac, F., Odic, C., Molina, M., and Perchet, V. On preemption and learning in stochastic scheduling. In International Conference on Machine Learning, pp. 24478–24516. PMLR, 2023.

Mitzenmacher, M. and Vassilvitskii, S. Algorithms with predictions. Communications of the ACM, 65(7):33–35, 2022.

Motwani, R., Phillips, S., and Torng, E. Nonclairvoyant scheduling. Theoretical Computer Science, 130(1):17–47, 1994.

Munoz, A. and Vassilvitskii, S. Revenue optimization with approximate bid predictions. Advances in Neural Information Processing Systems, 30, 2017.

Panwalkar, S. S. and Iskander, W. A survey of scheduling rules. Operations Research, 25(1):45–61, 1977.

Pinedo, M. L.
Scheduling, volume 29. Springer, 2012.

Purohit, M., Svitkina, Z., and Kumar, R. Improving online algorithms via ML predictions. Advances in Neural Information Processing Systems, 31, 2018.

Ramamritham, K. and Stankovic, J. A. Scheduling algorithms and operating systems support for real-time systems. Proceedings of the IEEE, 82(1):55–67, 1994.

Sadek, K. A. A. and Elias, M. Algorithms for caching and MTS with reduced number of predictions. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=QuIiLSktO4.

Sakaue, S. and Oki, T. Discrete-convex-analysis-based framework for warm-starting algorithms with predictions. Advances in Neural Information Processing Systems, 35:20988–21000, 2022.

Shin, Y., Lee, C., Lee, G., and An, H.-C. Improved learning-augmented algorithms for the multi-option ski rental problem via best-possible competitive analysis. arXiv preprint arXiv:2302.06832, 2023.

Silwal, S., Ahmadian, S., Nystrom, A., McCallum, A., Ramachandran, D., and Kazemi, S. M. KwikBucks: Correlation clustering with cheap-weak and expensive-strong signals. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=p0JSSa1AuV.

Steiger, C., Walder, H., and Platzner, M. Operating systems for reconfigurable embedded platforms: Online scheduling of real-time tasks. IEEE Transactions on Computers, 53(11):1393–1407, 2004.

Wei, A. and Zhang, F. Optimal robustness-consistency trade-offs for learning-augmented online algorithms. Advances in Neural Information Processing Systems, 33:8042–8053, 2020.

Yao, A. C.-C. Probabilistic computations: Toward a unified measure of complexity. In 18th Annual Symposium on Foundations of Computer Science (SFCS 1977), pp. 222–227. IEEE Computer Society, 1977.

A. Lower bounds

A.1. Preliminary result

Lemma A.1.
Let $F$ be a probability distribution on $(0, \infty)$ with finite expectation, $x$ an array of $n$ i.i.d. random variables sampled from $F$, and $\alpha_{n,B} \geq 0$ satisfying, for any deterministic algorithm $A$ with access to the sizes of $B$ jobs,
$$\mathbb{E}_{\sigma,x}[A(x)] \geq \alpha_{n,B}\, \mathbb{E}_x[\mathrm{OPT}(x)]\,;$$
then for any (deterministic or randomized) algorithm ALG, we have $R_{n,B}(\mathrm{ALG}) \geq \alpha_{n,B}$.

Proof. Let $F$ and $\alpha_{n,B}$ be as stated in the lemma. Using Yao's minimax principle (Yao, 1977), we deduce that for any randomized algorithm ALG,
$$\mathbb{E}_{\sigma,x}[\mathrm{ALG}(x)] \geq \inf_{A \text{ deterministic}} \mathbb{E}_{\sigma,x}[A(x)] \geq \alpha_{n,B}\, \mathbb{E}_x[\mathrm{OPT}(x)]\,,$$
where the infimum is taken over all deterministic algorithms. The previous inequality can be written as
$$\mathbb{E}_x\big[\mathbb{E}_\sigma[\mathrm{ALG}(x)] - \alpha_{n,B}\, \mathrm{OPT}(x)\big] \geq 0\,,$$
and this implies that, necessarily, there exists a value $x^\star = (x^\star_1, \ldots, x^\star_n)$ taken by $x$ verifying $\mathbb{E}_\sigma[\mathrm{ALG}(x^\star)] - \alpha_{n,B}\, \mathrm{OPT}(x^\star) \geq 0$, hence
$$R_{n,B}(\mathrm{ALG}) \geq \frac{\mathbb{E}_\sigma[\mathrm{ALG}(x^\star)]}{\mathrm{OPT}(x^\star)} \geq \alpha_{n,B}\,.$$

A.2. Proof of Theorem 2.1

Let $A$ be a deterministic algorithm. Using Equation 1, it suffices to bound $\mathbb{E}[P^A_{ij}]$ for all $i \neq j$ to deduce a bound on $\mathbb{E}[A(x)]$, and since $P^A_{ij}$ is a non-negative random variable, the focus can be narrowed down to bounding $\Pr(P^A_{ij} > t)$ for all $t > 0$. Following the proof scheme of (Motwani et al., 1994), we introduce the following definition.

Definition A.2. Let $A$ be a deterministic algorithm given an instance of $n$ jobs. For all $i \neq j \in [n]$ and $t \geq 0$, we denote by $u^A_{i,j}(t)$ the time spent on job $i$ when the total time spent on both jobs $i$ and $j$ is $t$, assuming that neither job $i$ nor $j$ is completed. More precisely, $u^A_{i,j}(t)$ is defined by
$$u^A_{i,j}(t) = S^A_i\Big(\inf\{t' \geq 0 : S^A_i(t') + S^A_j(t') = t\}\Big)\,.$$
$u^A_{i,j}$ is therefore a rule defined solely by the algorithm. For an instance $x = (x_1, \ldots, x_n)$ of job sizes, the real time spent on $i$ when a total time of $t$ has been spent on both jobs $i, j$ is given by $\min(u^A_{i,j}(t), x_i)$. For all $t \geq 0$, the following lemma allows expressing the event $P^A_{ij} > t$ using $u^A_{i,j}(t)$.

Lemma A.3. Let $x_1, \ldots$
$, x_n$ be an instance of $n$ job sizes. Then, for any algorithm $A$, any $i \neq j \in [n]$, and $t \geq 0$, the following equivalence holds:
$$P^A_{ij} > t \iff x_i > u^A_{i,j}(t) \ \text{and} \ x_j > t - u^A_{i,j}(t)\,.$$

Proof. Let $A$ be an algorithm and $i \neq j \in [n]$. We denote by $S_{ij}(t) = S_i(t) + S_j(t)$ the total processing time spent on both jobs $i$ and $j$ up to time $t$. Assuming that job $i$ finishes first, i.e. $t_i \leq t_j$, no processing time is spent on job $i$ after $t_i$, hence $S_i(t_j) = S_i(t_i) = x_i$, and $P_{ij} = S_i(t_j) + S_j(t_i) = S_{ij}(t_i)$. By symmetry, we deduce that $P_{ij} = S_{ij}(\min(t_i, t_j))$.

Therefore, using that $S_i$ and $S_j$ are non-decreasing and continuous, it holds for all $t \geq 0$ that
$$P_{ij} > t \iff S_{ij}(\min(t_i, t_j)) > t \iff \min(t_i, t_j) > \inf\{t' : S_{ij}(t') \geq t\}$$
$$\iff x_i > S_i(\inf\{t' : S_{ij}(t') \geq t\}) \ \text{and} \ x_j > S_j(\inf\{t' : S_{ij}(t') \geq t\}) \qquad (5)$$
$$\iff x_i > u_{ij}(t) \ \text{and} \ x_j > t - u_{ij}(t)\,. \qquad (6)$$
Equivalence (5) holds because $t_i = \inf\{t' \geq 0 : S_i(t') \geq x_i\}$, thus for any $s \geq 0$ we have $t_i > s \iff x_i > S_i(s)$. The same holds for $j$. For Equivalence (6), we simply used Definition A.2 and the observation $u_{ij}(t) + u_{ji}(t) = t$.

Remark A.4. In the case of non-clairvoyant algorithms, the rule defining $u^A_{ij}(\cdot)$ is dictated by the algorithm, independently of the job sizes. Thus, if the job sizes are sampled independently from the exponential distribution, Lemma A.3 gives for all $i \neq j$ that
$$\Pr(P^A_{ij} > t) = \Pr(x_i > u^A_{i,j}(t)) \cdot \Pr(x_j > t - u^A_{i,j}(t)) = e^{-t}\,,$$
and it follows immediately that $\mathbb{E}[P^A_{ij}] = 1$ for any deterministic algorithm. This argument, used in (Motwani et al., 1994), is not applicable in our context, since the algorithm possesses access to certain job sizes, enabling the formulation of a rule for $u^A_{i,j}$ that takes this information into account. Therefore, more sophisticated techniques become necessary for our analysis, since the independence of the events $x_i > u^A_{i,j}(t)$ and $x_j > t - u^A_{i,j}(t)$ is lost.

Lemma A.5.
Let a (0, ) and φ : [0, ) [0, ) a continuously differentiable and increasing function satisfying that φ(0) > 0, t 7 φ (t) φ(t) is non-increasing and R 0 dt φ(t)2 < . Let xa 1, . . . , xa n i.i.d. random job sizes with distribution Pr(xa 1 t) = 1 φ(t) 1 φ(a) 1 φ(0) 1 φ(a) 1 1t B , E[P A ij] φ(0)2 Z inf T x Gφ(x, T) φ (x) φ(x) dx o(1) a i B, j > B , where Gφ(x, T) is defined in Equation (3), and the o(1) a term does not depend on the algorithm A. Although Lemma A.5 is stated with a (0, ), the results also hold for random variables x 1 , . . . , x n sampled from the limit distribution Pr(x 1 t) = 1 φ(0) φ(t) , where the o(1) term becomes zero. Before proving the lemma, let us first observe that, since φ is increasing and R 0 dt φ(t)2 < , then necessarily limx φ(x) = . Proof. Let us first observe that for all t 0 Pr(min(x 1 , x 2 ) t) = Pr(x 1 t) Pr(x 2 t) = φ(0)2 E[min(x 1 , x 2 )] = Z 0 Pr(min(x 1 , x 2 ) t)dt = Z In the following, we prove separately the three claims of the Lemma: in the cases where both jobs sizes xi, xj are known, both are unknown, and where only xi is known. Non-clairvoyant Scheduling with Partial Predictions Both job sizes are known For any i = j [n], it holds that Pij min(xa i , xa j ), and this true in particular for i = j [B]. Furthermore, for any t 0, since φ(t) φ(0), we have that the mapping a 7 φ(t) 1 φ(a) 1 φ(0) 1 φ(a) 1 is non-increasing. It follows that Pr(xa 1 t) 1 lim a φ(t) 1 φ(a ) 1 φ(0) 1 φ(a ) 1 φ(t) 1t B, the algorithm ignores the job sizes xa i and xa j Therefore, ui,j is independent of xa i and xa j . Consequently, using Lemma A.3, the independence of xa i and xa j , then Inequality (7), we obtain Pr(Pij > t) = Pr(xi > ui,j(t) and xj > t ui,j(t)) = Pr(xi > ui,j(t)) Pr(xj > t ui,j(t)) φ(0) φ(ui,j(t))1ui,j(t) t) φ(0)2 φ(t/2)2 1t a t)dt Z 2)2 1t a B φ(t)2 dt o(1) a . Only the size of one job is known For i B and j > B, the algorithm knows xa i , therefore uij( ) can be a function of xa i . 
By Lemma A.3 then Inequality (7), it holds for all t 0 that Pr(Pij > t) = Pr(xa i > ui,j(t) and xa j > t ui,j(t)) = E[1xa i >ui,j(t)1xa j >t ui,j(t)] = E 1xa i >ui,j(t)E[1xa j >t ui,j(t) | xa i , ui,j(t)] = E 1xa i >ui,j(t) Pr xa j > t ui,j(t) | ui,j(t) E 1xa i >ui,j(t) φ(0)1t ui,j(t) t)dt 1t a 0 and T [x, ], and define v x,T : t 0 7 (t T + x)1t>T x, then for all v Vx,T we have v(t) v x,T (t) for all t [0, T]. Indeed, v(t) 0 = v x,T (t) for t [0, T x], and if v(t) < v x,T (t) for some t [T x, T] then, because v is 1-Lipschitz, we have v(T) v(T x) + x < v x,T (T x) + x = x , Non-clairvoyant Scheduling with Partial Predictions which contradicts v(T) x. Finally, as φ and v 7 1t a 0 that dx Pr(xa 1 x) = φ (x)/φ(x)2 φ(0) 1 φ(a) 1 1x 0 that inf T [x, ] Gφ(x, T) lim T Gφ(x, T) = Z dt φ(t) (13) and that the mapping a 7 min inf T [x, ] Gφ(x, T), R a 0 dt φ(t) f (x)1x B, and E[Pij] αφE[min(x1, x2)] for i B, j > B. Indeed, the probability density function of x1 is t 7 φ(0)φ (t) φ(t)2 , and we have for all i B and j > B that φ(0)E inf T x1 Gφ(x1, T) = φ(0)2 Z inf T x Gφ(x, T) φ (x) dt φ(t)2 = αφE[min(x1, x2)] . Assume that R 0 dt φ(t) < , i.e. E[x1] < , then by Equation (1) it holds that E[A(x)] n E[x1] + X 1 i 0, and xa = (xa 1, . . . , xa n) be i.i.d. job sizes with distribution Pr(xa 1 t) = 1 φ(t) 1 φ(a) 1 φ(0) 1 φ(a) 1 1t 0, φ is increasing, φ φ = 1 is non-increasing, and Z dt φ(t) = Z 0 e tdt = 1 , Z dt φ(t)2 = Z 0 e 2tdt = 1 Furthermore, it holds for all x 0 and T x that G(x, T) = Z T x dt φ(t) + x φ(T x) 0 e tdt + xe (T x) = 1 + (x 1)e (T x) . 
Non-clairvoyant Scheduling with Partial Predictions Thus, for all x 0 inf T x G(x, T) = ( G(x, x) = x if x < 1 lim T G(x, T) = 1 if x 1 , and it follows that inf T x Gφ(x, T) φ (x) φ(x)2 dx = Z inf T x Gφ(x, T) e xdx 0 xe xdx + Z Finally, by Theorem 2.1, it holds for any algorithm A having access to the sizes of B jobs that Rn,B(A) Cn,B Cn,B 1 4 = Cn,B 4(Cn,B 1) Cn,B = 2 (4/e)B n + (4/e 1)B(B 1) In particular, if B = wn + o(n), then again by Theorem 2.1 we have A.4. Proof of Corollary 2.3 Proof. Let r ( 1 2, 1), and consider φ : t 7 (1 + t)r. It holds that φ(0) = 1 > 0, φ is increasing, φ φ : t 7 r 1+t is non-increasing, and R 0 dt φ(t)2 = R 0 dt (1+t)2r = 1 2r 1 < because r > 1 2. Moreover, it holds for all x 0 and T x that G(x, T) = Z T x dt φ(t) + x φ(T x) dt (1 + t)r + x (1 + T x)r = 1 1 r + (1 + T x)1 r 1 r + x (1 + T x)r , therefore, for all x 0 inf T x G(x, T) = 1 1 r + inf T x (1 + T x)1 r 1 r + x (1 + T x)r = 1 1 r + inf y [0,1] 1 r + xyr , where the last inequality is obtained by considering y = 1 1+T x. It holds that 1 r + xyr = y (2 r) + rxy (1 r) = rxy (2 r) y 1 Non-clairvoyant Scheduling with Partial Predictions The mapping y 7 y (1 r) 1 r + xyr is thus minimized on [0, ) for y = 1 rx, and it is minimized on [0, 1] for y = min(1, 1 rx). Therefore, it holds for x 1 r that y = 1 and inf T x G(x, T) = 1 1 r + ( 1 1 r + x) = x, and for x > 1 r we have y = 1 rx and inf T x G(x, T) = 1 1 r + (rx)1 r 1 r + (rx) r = 1 1 r + 1 1 r + 1 = 1 1 r + (rx)1 r We deduce that inf T x Gφ(x, T) φ (x) φ(x)2 dx = Z 1/r 1 1 r + x r (1 + x)r+1 dx 1 1 r + (rx)1 r r (1 + x)r+1 dx = 1 1 r + Z 1/r rx (1 + x)1+r dx + r1 r (1 + x)r+1 dx 1 1 r + r1 r (1 + x)r+1 dx . (18) Let us denote Wr = R 1/r x1 r (1+x)r+1 dx. It holds that (1 + x)r+1 dx 1 r dx (1 + x)2r r x 1 + x dx (1 + x)2r (r > 1 s2 2r ds (s 1 1+x) = s2r 1 1 s 0 + 1 2(2r 1) s2r 1 1 sds 2/3 2r 1 + 1 2(2r 1) s2r 1 1 sds . 
It holds for all r ( 1 2, 1) that s2r 1 1 s 1 1 s, which is an integrable function on [0, 1 3], hence the dominated convergence theorem guarantees that s2r 1 1 sds = 1 s2r 1 1 sds Non-clairvoyant Scheduling with Partial Predictions we deduce that (2r 1)Wr = 3 (2r 1)p s2r 1 1 sds 2/3 o(1) r 1/2 2/3 o(1) r 1/2 = 1 + o(1) r 1/2 . Substituting into Inequality 18, and recalling that R 0 dt φ(t)2 = 1 2r 1, we deduce that for r close to 1 inf T x Gφ(x, T) φ (x) φ(x)2 dx 1 1 r + r1 r 2 + o(1))(1 o(1)) Z Consequently, by Theorem 2.1, if B = wn + o(n), it holds for any algorithm A that 2 o(1) r 1/2 2 o(1) r 1/2 and taking the limit when r 1 CR(A) 2 2(2 B. Known partial order B.1. Preliminary results Lemma B.1. For any real numbers x1, . . . , xn, if σ is a uniformly random permutation of [n] then E[min(xσ(1), xσ(2))] = 2 n(n 1) 1 i xπ(i 1), then after the first step of phase i, the time spent on each of the jobs π(i) and σ(j) is xπ(i 1). Both jobs are then executed with identical processing powers until one of them is terminated, thus Dπ(i)σ(j) = min(xπ(i), xσ(j))1xσ(j)>xπ(i 1) = xπ(i)1xσ(j)>xπ(i) + xσ(j)1xπ(i) xσ(j)>xπ(i 1) . With the convention xπ(0)=0, taking the sum over i gives i=1 Dπ(i)σ(j) = i=1 xπ(i)1xσ(j)>xπ(i) + xσ(j) i=1 1xπ(i) xσ(j)>xπ(i 1) i=1 xσ(i)1xσ(j)>xσ(i) + xσ(j)1xσ(j) xπ(B) i=1 xσ(i)1xσ(j)>xσ(i) + xσ(j) . (19) Using Lemma B.2, and recalling that {π(i)}B i=1 = {σ(i)}B i=1, we have in expectation, i=1 Dσ(i)σ(j) i = E h B X i=1 Dπ(i)σ(j) i 2 E[min(xσ(1), xσ(2))] + 1 k=1 xk , (20) and it follows that, for any j B + 1 i=1 Pσ(i)σ(j) i = E h B X i=1 Dσ(j)σ(i) i + E h B X i=1 Dσ(i)σ(j) i BE[min(xσ(1), xσ(2))] + B 2 E[min(xσ(1), xσ(2))] + 1 2 E[min(xσ(1), xσ(2))] + 1 Non-clairvoyant Scheduling with Partial Predictions j=B+1 E h B X i=1 Pσ(i)σ(j) i 3 2B(n B)E[min(xσ(1), xσ(2))] + 1 B Finally, using Equation (1), the previous upper bounds on Pσ(i)σ(j) for all i = j, then Equation (16) with 3/2 instead of απ gives that the objective of CRRR on the instance x = {x1, . 
. . , xn} satisfies in expectation E[CRRR(x)] = i=1 xi + E h X 1 i 0 and let us consider job sizes xε i = 1 + iε for all i [n]. Observe that the only inequalities we used in the analysis are Inequalities (19) (xσ(j)1xσ(j) xπ(B) xσ(j)), and (20) given by Lemma B.2. The second one becomes equality if the job sizes are pairwise distinct, which is satisfied by {xε i}n i=1. As for Inequality (19), since the job sizes xε are pairwise distinct and all larger than 1, we can give instead a lower bound on E[xσ(j)1xσ(j) xπ(B)] Non-clairvoyant Scheduling with Partial Predictions for j > B as follows E[xσ(j)1xσ(j) xπ(B)] = E[xσ(j)1σ(j) π(B)] Pr(σ(j) < π(B)) = Pr(σ(n) < max i B σ(i)) m=1 Pr(σ(n) = m, max i B σ(i) = k) ℓ=1 Pr(σ(n) = m, σ(ℓ) = k, i [B] \ {ℓ} : σ(i) [k 1] \ {m}) (B 1)!(n B 1)! = B!(n B 1)! k=B (k 1) k 2 B 1 = B B!(n B 1)! = B B!(n B 1)! = B B + 1 . Therefore, instead of (19), we have the lower bound i=1 Dσ(i)σ(j) i = E h B X i=1 Dπ(i)σ(j) i 2 E[min(xσ(1), xσ(2))] + E[xσ(j)1xσ(j) xπ(B)] (25) 2 + B B + 1 , (26) and thus, since we proved earlier that Dσ(i)σ(j) = min(xε σ(i),xε σ(j) for all i B and j > B, it holds that i=1 Pσ(i)σ(j) i i=1 E[min(xσ(i),xσ(j)] + B 2 + B B + 1 3B 2 + B B + 1 and using that xε i 1 for all i [n], we obtain by substituting the previous inequalities into (21) that E[CRRR(xε)] n + B(B 1) 2 + (n B)(n B 1) + (n B) B 2 + B B + 1 = n + (n B) B B + 1 2 + (n B)(n B 1) + 3B + n(n 1) (n 1)B = (n + 1) n B Non-clairvoyant Scheduling with Partial Predictions On the other hand, Equation (2), along with xε i 1 + nε for all i [n], gives i=1 (n i + 1)(1 + nε) = n(n + 1) 2 (1 + nε) , and it follows that Rn,B(CRRR) E[CRRR(xε)] (n + 1) n B n ) (n + 1)(B + 1) Taking the limit ε 0 and using Inequality (24), we obtain that n ) (n + 1)(B + 1) Rn,B(CRRR) 2 B Finally, if Bn = wn for some w [0, 1] then Bn wn, hence CRB(CRRR) = supn 2 Rn,Bn(CRRR) 2 w, and this bound is reached by the lower bound on Rn,Bn for n , which gives that CRB(CRRR) = 2 w. C. 
Predictions of the job sizes

C.1. Proof of Theorem 4.1

We first compute the mutual delays of the jobs in Switch with any breakpoints.

Lemma C.1. For any job sizes $x = (x_1, \ldots, x_n)$, any permutation $\sigma$ of $[n]$, and breakpoints $z_\sigma = (z_{\sigma(1)}, \ldots, z_{\sigma(B)})$, the algorithm Switch$(z_\sigma, x)$ satisfies for all $i \neq j \in [n]$:
$$P_{\sigma(i)\sigma(j)} = x_{\sigma(i)} \mathbf{1}_{z_{\sigma(i)} < z_{\sigma(j)}} + x_{\sigma(j)} \mathbf{1}_{z_{\sigma(i)} > z_{\sigma(j)}} + \big(\theta_{ij} x_{\sigma(i)} + (1 - \theta_{ij}) x_{\sigma(j)}\big) \mathbf{1}_{z_{\sigma(i)} = z_{\sigma(j)}} \quad \text{if } i, j \leq B\,,$$
$$P_{\sigma(i)\sigma(j)} = 2 \min(x_{\sigma(i)}, x_{\sigma(j)}) \quad \text{if } i, j > B\,,$$
$$P_{\sigma(i)\sigma(j)} = x_{\sigma(i)} \mathbf{1}_{x_{\sigma(j)} > z_{\sigma(i)}} + \min(z_{\sigma(i)}, x_{\sigma(j)}) \quad \text{if } i \leq B,\ j > B\,,$$
where $\theta_{ij}$ is a Bernoulli random variable with parameter 1/2 independent of $\sigma$, for all $i \neq j \leq B$.

Proof. Let us first define the random variables $\theta_{ij}$. For all $i \neq j \leq B$: if $z_{\sigma(i)} \neq z_{\sigma(j)}$, let $\theta_{ij}$ be an independent Bernoulli random variable with parameter 1/2, and if $z_{\sigma(i)} = z_{\sigma(j)}$, let $\theta_{ij}$ be the indicator that $z_{\sigma(i)}$ comes before $z_{\sigma(j)}$ in the ordering $\pi$. Since $\pi$ is an ordering of the breakpoints chosen uniformly at random, $\theta_{ij}$ is a Bernoulli random variable with parameter 1/2, independent of $\sigma$.

For $i \neq j \leq B$: if $z_{\sigma(i)} < z_{\sigma(j)}$, job $\sigma(i)$ is executed until completion before job $\sigma(j)$ starts being executed, thus $D_{\sigma(i)\sigma(j)} = x_{\sigma(i)}$, $D_{\sigma(j)\sigma(i)} = 0$, and $P_{\sigma(i)\sigma(j)} = x_{\sigma(i)}$. By symmetry, if $z_{\sigma(i)} > z_{\sigma(j)}$ then $P_{\sigma(i)\sigma(j)} = x_{\sigma(j)}$. In the case where $z_{\sigma(i)} = z_{\sigma(j)}$, since $\pi$ is chosen uniformly at random among all the permutations of $[B]$ satisfying $z_{\pi(1)} \leq \ldots \leq z_{\pi(B)}$, each of the jobs $\sigma(i), \sigma(j)$ is run until completion before the other one starts with equal probability 1/2, hence $P_{\sigma(i)\sigma(j)} = \theta_{ij} x_{\sigma(i)} + (1 - \theta_{ij}) x_{\sigma(j)}$. It follows that
$$P_{\sigma(i)\sigma(j)} = x_{\sigma(i)} \mathbf{1}_{z_{\sigma(i)} < z_{\sigma(j)}} + x_{\sigma(j)} \mathbf{1}_{z_{\sigma(i)} > z_{\sigma(j)}} + \big(\theta_{ij} x_{\sigma(i)} + (1 - \theta_{ij}) x_{\sigma(j)}\big) \mathbf{1}_{z_{\sigma(i)} = z_{\sigma(j)}}\,.$$

For $i \neq j > B$, jobs $\sigma(i)$ and $\sigma(j)$ are processed symmetrically and the delays they cause to each other are the same as in round-robin. Therefore $D_{\sigma(i)\sigma(j)} = D_{\sigma(j)\sigma(i)} = \min(x_{\sigma(i)}, x_{\sigma(j)})$, and it holds that
$$P_{\sigma(i)\sigma(j)} = 2 \min(x_{\sigma(i)}, x_{\sigma(j)})\,.$$
Non-clairvoyant Scheduling with Partial Predictions For i B and j > B, at the time when job σ(i) starts being executed, it holds by definition of Algorithm 2 that either job σ(j) is completed or Sσ(j) = zσ(i), hence the delay caused by job σ(j) to job σ(i) is Dσ(j)σ(i) = min(zσ(i), xσ(j)). On the other hand, job σ(i) delays job σ(j) if and only if xσ(j) > zσ(i), and in this case job σ(i) runs until completion before job σ(j) is completed, hence Dσ(i)σ(j) = xσ(i)1xσ(j)>zσ(i), and it follows that Pσ(i)σ(j) = xσ(i)1xσ(j)>zσ(i) + min(zσ(i), xσ(i)) C.1.1. PROOF OF THE THEOREM Using the previous lemma, we now prove Theorem 4.1 Proof. Using Lemma C.1, it holds for all i = j [n] that Pσ(i)σ(j) = min(xσ(i), xσ(j)) if i, j B, Pσ(i)σ(j) = 2 min(xσ(i), xσ(j)) if i, j > B, and if i B and j > B then Pσ(i)σ(j) = min(xσ(i), xσ(j)) + xσ(i)1xσ(i) B E[Pσ(i)σ(j)] 3 2E[min(xσ(1), xσ(2))] . It follows from Equation (1) that for any instance of n job sizes x = (x1, . . . , xn), the switch algorithm 2 with breakpoints (zσ(i))B i=1 = (xσ(i))B i=1 satisfies E[Switch(xσ, x)] 1 i 0. All the inequalities becoming inequalities, it holds in particular that Rn,B(Switch) E[Switch(xσ, x(ε))] OPT(x(ε)) = 2 B n ) Pn i=1 x(ε) i Pn i=1(n i + 1)x(ε) i n ) Pn i=1(1 + nε) Pn i=1(n i + 1) n (1 + nε)2(1 B n ) n + 1 , and taking ε 0 yields that Rn,B(Switch) = 2 B n ) n + 1 . Let w [0, 1]. If B = wn , then B/n w and it holds for all n 2 that Rn,B(Switch) 2 w, and this bound is reached for n , thus CRB(Switch) = 2 w. C.2. Proof of Proposition 4.2 Proof. It is shown in (Motwani et al., 1994) that for any instance x = (x1, . . . , xn), the expected sum of completion times resulting from a run of RTC is E[RTC(x)] = n+1 2 Pn i=1 xi. 
Using Equation 28, we deduce that the algorithm ALG that runs RTC with probability 2(n B) n(n+3) 2B and runs Switch with breakpoints zσ = xσ with the remaining probability satisfies E[ALG(x)] = 2(n B) n(n + 3) 2B RTC(x) + 1 2(n B) n(n + 3) 2B E[Switch(xσ, x)] = 2(n B) n(n + 3) 2B n + 1 i=1 xi + 1 2(n B) n(n + 3) 2B i=1 xi + 2 B 1 iyσ(j) + xσ(i)+xσ(j) 2 1yσ(i)=yσ(j) xσ(i)1yσ(i) B, Lemma C.1 gives E[Pσ(i)σ(j)] = 2E[min(xσ(i), xσ(j))] . (31) For i B and j > B, we have again by Lemma C.1 that E[Pσ(i)σ(j)] = E[min(ξyσ(i), xσ(j))] + E[xσ(i)1ξyσ(i) σ(i), which gives E h xσ(j) + xσ(i)g F xσ(j) xσ(i) i = 1 2E h xσ(j) + xσ(i)g F xσ(j) xσ(i) | σ(j) < σ(i) i 2E h xσ(j) + xσ(i)g F xσ(j) xσ(i) | σ(j) > σ(i) i 2 E[xσ(j) | σ(j) < σ(i)] + γF 2 E[xσ(i) | σ(j) > σ(i)] 2 E[min(xσ(i), xσ(j)) | σ(j) < σ(i)] + γF 2 E[min(xσ(i), xσ(j)) | σ(j) > σ(i)] = 1 + βF + γF 2 E[min(xσ(i), xσ(j))] , thus we obtain by taking the expectation over σ in (35) that E[Pσ(i)σ(j)] 1+βF +γF 2 E[min(xσ(i), xσ(j))] + 1+LF +E[ξ] Using Equation 1, the inequality above, Inequalities (30), (31), then Equation 16 with 1+βF +γF 2 instead of απ, we deduce that Algorithm 3 satisfies for any job sizes x1, . . . , xn that E[Switch(ξyσ, x)] = 1 i 0 h F (s, t) = t F( s t ) = t 1 e 1 t 1) 1t 0, it holds for all t < s that t = 1 e 1 ρ ( s Non-clairvoyant Scheduling with Partial Predictions but the mapping u 7 ( u ρ + 1)e u 1 ρ is decreasing on [1, ]. Indeed ρ + 1)e u 1 ρ + 1)e u 1 and since s t > 1 we deduce that ρ = 1 lim u 1( u ρ + 1)e u 1 ρ h F (s, t) t 1 lim u ( u ρ + 1)e u 1 thus | h F (s,t) t | max(1, 1 ρ for all t < s. Otherwise, if t s then h F (s, t) = 0 and h F (s, ) is continuous in t = s, therefore t 7 h F (s, t) is 1 ρ-Lipschitz. On the other hand, given that ξ > 1 a.s., it holds for s 1 that E[ξ1ξ 1 that E[ξ1ξ 0 that g F (s) = (1 s) 1 e (s 1)/ρ 1s>1 + 1 + ρ + (s + ρ)e (s 1)/ρ 1s>1 = (2 + ρ) s (1 + ρ)e (s 1)/ρ 1s>1 , hence βF = sup0