# competitive_fair_scheduling_with_predictions__a5b27224.pdf

Published as a conference paper at ICLR 2025

COMPETITIVE FAIR SCHEDULING WITH PREDICTIONS

Tianming Zhao School of Computer Science The University of Sydney

Chunqiu Xia School of Computer Science The University of Sydney

Xiaomin Chang School of Computer Science The University of Sydney

Chunhao Li School of Computer Science The University of Sydney

Wei Li School of Computer Science The University of Sydney

Albert Y. Zomaya School of Computer Science The University of Sydney

Beyond the worst-case analysis of algorithms, the learning-augmented framework considers that an algorithm can leverage possibly imperfect predictions about the unknown variables to have guarantees tied to the prediction quality. We consider online non-clairvoyant scheduling to minimize the max-stretch under this framework, where the scheduler can access job size predictions. We present a family of algorithms: Relaxed-Greedy (RG) with an O(η3

P) competitive ratio, where η denotes the prediction error for job sizes and P the maximum job size ratio; Adaptive Relaxed-Greedy with an O(λ0.5 η2.5

P) competitive ratio, where λ denotes the error for the minimum job size; Predictive Relaxed-Greedy with an O(λ0.5 φ0.5 η max{η, φ}

P) competitive ratio, where φ denotes the error for the maximum job size. We also present RGx, an algorithm that represents a tradeoff between consistency and smoothness, with an O(η2+2x P 1 x) competitive ratio. We introduce a general method using resource augmentation to bound robustness, resulting in RR-augmented RG, with a (1 + ϵ)-speed O(min{η3

ϵ }) competitive ratio. Finally, we conduct simulations on synthetic and real-world datasets to evaluate the practical performance of these algorithms.

1 INTRODUCTION

In classic online scheduling, algorithms are designed to have worst-case guarantees under incomplete information. However, this conservative approach often results in suboptimal performance, especially compared to where full information is available. Given the rapid advancements in Machine Learning (ML) predictive models, the need for worst-case guarantees under incomplete information may be too pessimistic. Beyond worst-case analysis, recent researchers have explored learning-augmented algorithms to leverage predictions to improve decision-making. With ML models providing predictions of unknown variables, these algorithms are robust and have an improved performance.

Scheduling jobs that arrive over time (preemption allowed) is a fundamental challenge in computing resource management, particularly when minimizing the maximum stretch. The stretch of a job Jj is defined as the ratio of its response time to job size, given by Cj rj

p j , where Cj denotes the completion time, rj release time, and p j job size (or processing time). The maximum stretch, termed max-stretch, measures fairness of the resource allocation: a max-stretch of k implies that each job receives at least a 1

k-speed machine. In contrast, other metrics like total completion time (P Cj) and total response time (P Cj rj) emphasize efficiency and will unfairly delay large jobs, even those released early. Studying max-stretch scheduling offers a general insight into fair resource allocation.

Previous work has considered offline max-stretch scheduling and online clairvoyant scheduling; both assume precisely known job sizes (Bender et al., 1998; 2002; Benoit et al., 2021). However, assuming perfect knowledge is overly optimistic in practice. Job sizes can be estimated using ML models (Amiri and Mohammad-Khanli, 2017) but are inherently subject to error. This raises a key question: how can we make fair scheduling decisions with imperfect information? Our objective is learning-augmented algorithms with performance guarantees tied to the prediction quality and robust

Published as a conference paper at ICLR 2025

against poor predictions. To our knowledge, this paper is the first to explore learning-augmented algorithms for fair scheduling. We define several terms before presenting our contributions.

Prediction errors. We consider several types of predictions: job size predictions per job at the job s arrival, an upfront prediction for the minimum job size, and an upfront prediction for the maximum job size. Prediction errors are defined as the multiplicative difference between predicted and actual job sizes. Let η denote the prediction error, defined as η = max1 j n ηj = max1 j n max{ p j pj , pj

p j }, where pj is the job size prediction for Jj (Zhao et al., 2022; Azar et al., 2021). We define P as the ratio of the largest and smallest job sizes: P = p max p min , where p max and p min represent the maximum and minimum job sizes. For the predicted minimum job size, the error is denoted by λ = max{ pe min p min , p min pe min }, where pe min is the predicted minimum job size. The error for the predicted

maximum job size is φ = max{ pe max p max , p max pemax }, where pe max is the predicted maximum job size.

Competitive framework, consistency, smoothness, and robustness. An online algorithm A will be compared against an offline optimal algorithm A that knows all jobs release times and sizes in advance. We say A has a competitive ratio c or is c-competitive, if SA(I) c SA (I) for any problem instance I, where SA(I) and SA (I) denote the max-stretch achieved by A and A , respectively. Consistency measures the algorithm performance under perfect predictions; smoothness under any prediction; robustness under worst-case predictions. Formally, an algorithm is said to have consistency τ (or be τ-consistent) if it is τ-competitive when η = 1; smoothness ρ (or be ρ-smooth) if it is ρ-competitive where ρ is a function of η; and robustness υ (or be υ-robust) if it is υ-competitive for any η, where υ is independent of η. The relationship between smoothness and robustness can be expressed as robustness = limη smoothness.

Resource augmentation. An algorithm A is (1 + ϵ)-speed c-competitive if SA 1+ϵ(I) c SA 1 (I) for any problem instance I, where SA 1+ϵ(I) and SA 1 (I) denote the max-stretch achieved by A and A , respectively, and the subscripts denote the speed of the processor used by the corresponding algorithm and A is an offline optimal algorithm.

CONTRIBUTIONS

1. Improved fairness in scheduling: We present an exact offline algorithm with a runtime of O(n2 log n), where n is the number of jobs. We develop a family of Relaxed-Greedy (RG) algorithms under various prediction settings, achieving an O(

P) competitive ratio matching the best-known clairvoyant result under perfect predictions. These algorithms require different types of predictions, including job size and minimum/maximum job size predictions, with varying guarantees depending on the quality and availability of the predictions. These algorithms leverage ML predictions for job sizes to optimize fairness in resource allocation. They extend the algorithmic idea proposed by Bender et al. (2002), but with predictions, they no longer require prior knowledge of exact or extreme job sizes, making them applicable to a broader range of scenarios.

2. Consistency-smoothness trade-off: We explore the trade-off between consistency and smoothness by introducing RGx with a competitive ratio of O(η2+2x P 1 x). This algorithm is parameterized, allowing users to adjust the guarantees as a function of the prediction error η, based on the confidence in the prediction quality.

3. Enhanced robustness via resource augmentation: To address the robustness limitations of O(n P) in the RG family, we propose RG+, round-robin-augmented version of RG, which achieves optimal robustness of O(n) with a (1 + ϵ)-speed and a competitive ratio of O(min{η3

ϵ }). We demonstrate how asymptotically optimal robustness can be achieved via resource augmentation when the classic technique of preferential round robin fails (Purohit et al., 2018).

4. Extensive experimental validation: We conduct experiments on synthetic and real-world datasets to evaluate the performance of these algorithms. The experimental results validate our theoretical analysis and demonstrate that RG+ consistently outperforms other algorithms in practical scenarios, showing its robustness and effectiveness.

Published as a conference paper at ICLR 2025

The family of the RG algorithms is somewhat reminiscent of the O(

P)-competitive algorithm proposed by Bender et al. (2002). The existing algorithms work in clairvoyant scheduling. They assume the minimum job size is 1 and a known maximum job size. The assumption is unrealistic, particularly when the extreme job sizes vary from time to time. Our algorithms extend the existing work to non-clairvoyant scheduling and overcome these limitations via predictions. The proposed algorithms allow variable minimum job size and do not require the exact maximum job size.

2 PRELIMINARIES

Problem definition. Consider a job system with a single unit-speed machine and n independent jobs. The jobs arrive at the machine for processing over time. We use Jj to denote the j-th arrived job (tie break arbitrarily). Job Jj has a release time rj and a job size p j, i.e., it takes (in total) p j time to complete processing job Jj. Jobs are preemptive, which means they can be preempted during their processing. A preempted job can resume processing later. Let Cj denote the completion time for job Jj. The stretch Sj for job Jj is the ratio of the response time and job size, i.e., Sj = Cj rj

p j . Our objective is to minimize the maximum stretch, known as max-stretch:

max-stretch = max1 j n Sj = max1 j n Cj rj

Related work. We review recent advances in competitive online scheduling with predictions. For single-machine static scheduling to minimize total completion time, an O(min{ 1

n ), 2 1 λ})- competitive algorithm exists (Purohit et al., 2018), where η is the additive error defined as η = Pn j=1 |pj p j|, and λ is a user-defined parameter. Follow-up works analyze robustness-consistency trade-offs (Wei and Zhang, 2020) and extend the results to dynamic scheduling and parallel machines (Bampis et al., 2022). For unrelated machine scheduling to minimize makespan, a deterministic O(min{ log η log m

log log m , log m})-competitive algorithm is developed (Li and Xian, 2021), building on (Lattanzi et al., 2020). Here, the predictions are for machine loads, η is the multiplicative error of the predictions, and m is the number of machines. For single-machine scheduling to minimize weighted response time, an O(min{µ3 log(µP), µ3 log(µD)})-competitive algorithm is developed (Azar et al., 2021), where µ is the multiplicative error for job size predictions and D is the maximum density ratio. For parallel identical machine scheduling, an O(min{µP, µ log µ + µ log n

m})-competitive algorithm is developed to minimize the mean response time, along with an O(µ2)-competitive nonmigrative algorithm for minimizing mean stretch (Azar et al., 2022). Our problem is a special case of minimizing maximum weighted response time, but prior works under the learning-augmented framework focus on min-sum objective, while our work focuses on min-max. Results on prediction error metrics (Im et al., 2021), multiple predictions (Dinitz et al., 2022), machine speed predictions (Balkanski et al., 2023), and stochastic scheduling (Merlis et al., 2023) are also available.

3 OFFLINE, ONLINE CLAIRVOYANT, ONLINE NON-CLAIRVOYANT SCHEDULING

We start with offline scheduling. We show an optimal algorithm with O(n2 log n) run time. We review the lower bound and competitive algorithms for online clairvoyant scheduling. We consider online non-clairvoyant scheduling and show a Ω(n) competitive lower bound with an O(n)-competitive algorithm. Proofs for Section 3.1 and 3.3 are in Appendix A and B.

3.1 OFFLINE SCHEDULING

We have already known that an approximation exists for offline scheduling.

Theorem 3.1 (Bender et al. (1998)). There exists an algorithm that finds a preemptive offline schedule with max-stretch at most 1 + ϵ times the optimum in time polynomial in n and 1

ϵ , for any fixed ϵ > 0.

The (1 + ϵ)-approximation algorithm is based on a bisection search over the value of the optimal max-stretch, using Earliest Deadline First (EDF) to check the feasibility of each candidate value. While the algorithm is polynomial in terms of the number of jobs n and the precision parameter ϵ, it is not purely polynomial in the input size n. This is because, traditionally, finding the exact minimum max-stretch was considered to involve an infinite search space, thus requiring a predefined precision

Published as a conference paper at ICLR 2025

ϵ to discretize the search space. Bender et al. (2002) propose that the search for the maximum stretch can be reduced to a finite set of candidates, but they do not formalize this result. Recent work (Benoit et al., 2021) provides an exact polynomial-time greedy algorithm for the case where all jobs are released at time 0. This algorithm follows the smallest job size first policy but does not apply to the general case with arbitrary release times. To close this gap, our first technical contribution is a formal proof of an exact, polynomial-time offline scheduling algorithm for arbitrary release times. The insight is identifying an O(n2)-size list of candidates that contains the optimal max-stretch. We establish the following result, with the proof provided in Appendix A. Theorem 3.2 (Optimal Offline Scheduling). There exists an algorithm that finds an offline schedule with optimal max-stretch in time polynomial in n. It admits an O(n2 log n) implementation.

3.2 ONLINE CLAIRVOYANT SCHEDULING

We consider online clairvoyant scheduling, where we know a job at its release time. We also know the exact job size p j when job Jj is released. This problem has been studied by Bender et al. (1998; 2002). We summarize the key results, including the lower bound and competitive algorithms.

Theorem 3.3 (Bender et al. (1998)). No P 1 3 2 -competitive deterministic clairvoyant algorithm exists.

Theorem 3.4 (Bender et al. (2002)). There exist O(

P)-competitive clairvoyant algorithms.

3.3 ONLINE NON-CLAIRVOYANT SCHEDULING

We consider online non-clairvoyant scheduling, where we know a job at its release time but not the job size p j at job Jj s release. The job size is revealed only after the job is completed. We give the lower bound on the competitive ratio for any deterministic algorithm. The key construction of the adversary is to force the algorithm to allocate no more than a 1

n share of processing at some point for the smallest-sized job, resulting in a stretch of n for that job. Subsequently, the rest of the jobs are assigned geometrically increasing sizes, ensuring that the optimum max-stretch is approximately 1. Theorem 3.5 (Non-clairvoyant Lower Bound). No c-competitive deterministic algorithm exists with c < n for online non-clairvoyant max-stretch scheduling, even if all jobs are released at time 0.

We show Round Robin (RR) scheduling where every job in the system shares an equal amount of processing power of the machine at any arbitrarily fine time, i.e., with k jobs in the system, each gets 1

k units of the processing done every time unit until any job completes or any job arrives), is an n-competitive online non-clairvoyant scheduling algorithm. Theorem 3.6 (n-Competitive RR). RR is n-competitive for online non-clairvoyant scheduling.

We show that any non-lazy algorithm is O(n P)-competitive. Here, non-lazy means that the algorithm does not idle the machine unnecessarily when there are active jobs. This result will be used to establish the robustness of learning-augmented algorithms in the following sections. Theorem 3.7 (Worst-case Competitive Ratio). Any non-lazy algorithm is (n 1) P + 1-competitive.

4 ONLINE SCHEDULING WITH PREDICTIONS

We consider the integration of predictions. Rather than knowing the exact job size, we have a size prediction pj for job Jj. Initially, we assume that the extreme predictions are known beforehand; specifically, the algorithm knows the smallest prediction pmin (i.e., min1 j n pj) and the largest prediction pmax (max1 j n pj) at time 0. We present the Relaxed-Greedy algorithm, which achieves an O(η3

P) competitive ratio, where η is the prediction error for job sizes. Next, we show how to remove this assumption by updating the extreme predictions observed online and using predictions of the extreme job sizes. We introduce the Adaptive Relaxed-Greedy algorithm, which uses the prediction of the minimum job size and dynamically updates the maximum job size prediction, achieving an O(λ0.5 η2.5

P) competitive ratio, where λ is the prediction error for the minimum job size. Finally, we present the Predictive Relaxed-Greedy algorithm, which uses predictions of both extreme job sizes, achieving an O(λ0.5 φ0.5 η max{η, φ}

P) competitive ratio, where φ is the prediction error for the maximum job size. We begin by outlining the algorithmic ideas, with proofs for Section 4.2, 4.3, and 4.4 provided in Appendix C, E, and F.

Published as a conference paper at ICLR 2025

4.1 ALGORITHM OVERVIEW

The key algorithmic idea is the reduction to the case with two job sizes. Recall that the stretch is defined as Cj rj

p j . If there are only two job sizes one short and the other long the jobs of the same type should be executed in a First-In-First-Out (FIFO) manner. A short job should be executed before a long job if the long job is released at the same time as, or later than, the short job. It can be shown that greedily scheduling the jobs, i.e., always running the job with the minimal stretch, results in a constant competitive ratio when there are only two job sizes (as shown in Bender et al. (2002) and Lemma C.6). Clearly, the problem becomes easier when there are only two job sizes. Our strategy is to reduce the general problem to this two-job-size case via predictions.

We classify jobs into two categories: job Jj is considered short if its job size prediction pj pmin pmax

µ , where µ is a constant, and long otherwise. We show in Appendix H that the optimal choice for µ is

3, though for now, we can ignore this constant. We then approximate the stretch for short jobs by Cj rj pmin pmax and for long jobs by Cj rj

pmax ; these are defined to be relaxed stretch. This construction follows the idea in Bender et al. (2002), but requires both pmin and pmax to account for that the minimal job size is not necessarily 1, which is assumed in Bender et al. (2002). This approach overestimates the size of a short job Jj by pmin pmax p

η2 p min p max = η

P p j, resulting in an overestimation by at most a factor of η

P. Similarly, it overestimates the size

of a long job Jj (pj > pmin pmax) by pmax pmin pmax q

P η p j = η2

meaning the overestimation is at most a factor of η2

P. Overall, the relaxed stretch overestimates job sizes by at most η2

P. With these approximations, the stretch of a job Jj is bounded by O(η2

P) times the relaxed stretch, resulting in a competitive ratio loss of O(η2

P) (Lemma C.8). However, the advantage of this approximation is that it reduces the scheduling problem to the easier two-jobsize case. In Appendix H we show that using the square root of pmin pmax provides the optimal dependency on P in the competitive ratio, though it is possible to adjust the power to px min p1 x max for any 0 < x 1

2, yielding different power dependencies on P in the competitive ratio.

Using predictions to classify jobs as short or long, we greedily schedule the jobs by running the job with the minimal relaxed stretch at any time. This heuristic minimizes the relaxed stretch to within a constant factor (

3). Finally, we observe that the optimal relaxed stretch is bounded by η times the optimal max-stretch (Lemma C.7). The max-stretch bound from Relaxed-Greedy is O(η2

P) of the optimal relaxed stretch, which is itself bounded by O(η3

P) of the optimal max-stretch.

4.2 RELAXED-GREEDY

One of the assumptions for previous algorithms is that the minimum job size is known to be 1 and that the maximum job size is also known upfront. We relax these assumptions by setting a threshold α as the geometric mean of the predicted minimum and maximum job sizes, which serves as the boundary for classifying jobs as small or large. Our analysis shows that such methods allow the performance to tie to the prediction errors for these quantities.

Definition 4.1 (α-β Relaxed-Stretch). Define α-β relaxed-stretch Sj,α,β(t) for job Jj at time t given

α and β: Sj,α,β(t) = t rj

α if Jj is active at time t and pj α; Sj,α,β(t) = t rj

β if Jj is active at time t and pj > α. We drop the prefix α-β when the context is clear.

Definition 4.2 (α-β Short and Long Jobs). Given parameters α and β, a job Jj is said to be short if pj α; a job Jj is said to be long otherwise.

We present an O(η3

P)-competitive algorithm Relaxed-Greedy (RG). Through the execution of RG, parameters α and β are set to be constants: α = pmin pmax

3 and β = pmax. It holds

α = pmin pmax

3 < pmax = β. Note that the constant

3 in α serves to yield an optimal constant in

the competitive ratio. Any constant can replace it and still maintain RG s O(η3

P) competitive ratio as long as α < β (see Appendix D for a discussion). Algorithm RG maintains the invariant of running the active job with the largest relaxed-stretch. This means that at any time t, RG looks at all

Published as a conference paper at ICLR 2025

Algorithm 1: Relaxed-Greedy Data :job size prediction pj upon job release and the extreme predictions pmin and pmax Result :schedule with a max-stretch O(η3

P) times the optimum max-stretch

1 α pmin pmax

2 Function relaxed Stretch(j, t) // computes job Jj s relaxed-stretch at time t

3 if Jj is completed before t then

4 t Cj // set variable t to be the completion time of Jj

5 return t rj

α if pj α else t rj

6 Relaxed-Greedy maintains the active job (released but not completed job) set U; initially, set U .

7 Event Function Job Release() // Jj is released

8 insert Jj into U; run Jj on the machine if U = .

9 Event Function Job Complete() // Jj has been processed for p j unit at time t

10 remove Jj from U.

11 if there is at least one job in U then

12 let Js be the job in U with the largest relaxed-stretch: Js arg max Jx U relaxed Stretch(x, t), with tie breaking by the smaller job index.

13 run Js on the machine.

15 idle the machine.

16 Event Function Job Switch() // there exists Jx U, x = s such that relaxed Stretch(x, t) > relaxed Stretch(c, t), where Jc is the currently running job

17 let Js be the job in U with the largest relaxed-stretch: Js arg max Jx U relaxed Stretch(x, t), with tie breaking by the smaller job index.

18 preempt Jc.

19 run Js on the machine.

the active jobs and processes the one with the largest relaxed-stretch until the job is completed or until there is another active job later with a larger relaxed-stretch. Note that RG does not need the knowledge of the prediction error η. The pseudo-code of RG is given in Algorithm 1.

Theorem 4.3 (The O(η3

P) Competitive Ratio). Relaxed-Greedy is

P-competitive for online scheduling with predictions.

We show that RG is Θ(n P)-competitive with the worst-case predictions. Theorem 4.4 (Robustness of RG). RG is (n 1) P +1-competitive with arbitrarily bad predictions, and this bound is tight.

Remark 4.5. Relaxed-Greedy is

P-consistent,

P-smooth, and (n 1) P + 1robust. It represents a consistency of O(

P) matching (asymptotically) the best-known clairvoyant competitive ratio but comes with the worst robustness among non-lazy algorithms.

Algorithm RG can be implemented with constant run-time for online decision-making. Observe that any later-released short (long) job has a smaller relaxed-stretch than any early-released active short (long) job. Thus, the short (long) jobs are processed in a First-In-First-Out (FIFO) manner, allowing us to maintain two queues, one for short and one for long jobs. Determining the job with the largest relaxed-stretch requires simply comparing the relaxed-stretch of the jobs at the front of the queues. We give RG s performance for its run-time and space complexity and preemptions. Proposition 4.6 (Complexity of RG). Relaxed-Greedy admits an implementation with O(1) runtime to make any online decision and O(n) space. Proposition 4.7 (Preemptions of RG). Relaxed-Greedy generates a schedule with at most n 1 preemptions. Thus, the amortized number of preemptions per job is at most 1.

4.3 ADAPTIVE RELAXED-GREEDY

We remove the assumption of knowing the extreme predictions beforehand and extend RG to Adaptive Relaxed-Greedy (ARG). We show that the absence of extreme predictions does not

Published as a conference paper at ICLR 2025

significantly alter the competitive ratio. We show that ARG achieves an O(λ0.5 η2.5

P) competitive ratio, where λ denotes the prediction error for the minimum job size. Instead of knowing the exact minimum job size prediction, Algorithm ARG can access a black-box ML model that predicts the minimum job size of all jobs, denoted by pe min. This predicted minimum job size pe min comes with a prediction error λ = max{ pe min p min , p min pe min }. Meanwhile, Algorithm ARG learns pmax along job release.

A key technical challenge in ARG is classifying jobs when α is continuously updated over time. Dynamically changing the classification of jobs might work, but the proof for its competitive ratio is difficult to establish (we did not find proof). Our solution overcomes this issue by ensuring that once a job is classified at its arrival time, based on αrj, it remains unchanged in that classification. For example, if job Jj is classified as r-long, it remains a long job, regardless of future updates of α. This enables the competitive analysis for ARG. One innovation of ARG is the use of heterogeneous predictions, i.e., predictions for different types of quantities, in learning-augmented algorithms. Our analysis reveals that these different predictions impact the algorithm s performance differently.

We introduce subscript t to denote the value of a variable at time t. Thus, pmax at time t (denoted by pmax,t) represents the maximum job size prediction seen so far, i.e., pmax,t = maxrj t pj. Parameters α and β are functions in pmin and pmax and thus are time-varying. They are updated with

updates of pmax in Job Release. Formally, ARG sets αt =

pe min pmax,t

3 and βt = max{pmax,t, αt}, where αt and βt denote the values of α and β at time t. We record αrj and βrj for every job Jj. These values are used to compute the adaptive relaxed-stretch of Jj, defined as follows.

Definition 4.8 (α-β Adaptive Relaxed-Stretch). Define α-β adaptive relaxed-stretch \ Sj,α,β(t) for job Jj at time t given α and β: \ Sj,α,β(t) = t rj

αrj if Jj is active at time t and pj αrj; \ Sj,α,β(t) = t rj

βrj if Jj is active at time t and pj > αrj. We drop the prefix α-β when the context is clear. Definition 4.9 (α-β r-short and r-long Jobs). Given parameters α and β, a job Jj is said to be r-short if pj αrj; a job Jj is said to be r-long otherwise.

The prefix r for r-short or r-long means that the classification depends on the values of α and β at rj for job Jj. It is thus possible that a r-short job released later has a larger job size prediction than a r-long job released earlier. Other than that, ARG maintains the (same) invariant of running the active job with the largest adaptive relaxed-stretch. The pseudo-code of ARG is given in Appendix E.

Theorem 4.10 (The O(λ0.5 η2.5

P) Competitive Ratio). Adaptive Relaxed-Greedy is

3 λ0.5 η2.5

P-competitive for online scheduling with predictions.

Remark 4.11. Adaptive Relaxed-Greedy is

P-consistent,

3 λ0.5 η2.5

P-smooth, and (n 1) P + 1-robust. This robustness result is tight.

The ARG s performance remains the same as RG with respect to run-time and space complexity and preemptions. Since the parameters α and β are non-decreasing, it still holds that any later-released short (long) job has a smaller adaptive relaxed-stretch than any early-released active short (long) job, allowing ARG to have the same implementation of two FIFO job queues as RG does. Proposition 4.12 (Complexity of ARG). Adaptive Relaxed-Greedy admits an implementation with O(1) run-time to make any online decision and O(n) space. Proposition 4.13 (Preemptions of ARG). Adaptive Relaxed-Greedy generates a schedule with at most n 1 preemptions. Thus, the amortized number of preemptions per job is at most 1.

The competitive result in this section does not hold if the algorithm is instead given the predicted maximum job size and has to estimate the minimum job size prediction online. There seems to be an asymmetric importance for the knowledge about extreme job sizes. Having some knowledge of the minimum job size at the start may be a requirement.

4.4 PREDICTIVE RELAXED-GREEDY

If the prediction error(s) for extreme job sizes are believed to be lower than η, the algorithm can use an additional prediction about the maximum job size. We extend RG to Predictive Relaxed-Greedy (PRG), which needs access to predictions of both the minimum job size, denoted by pe min, and the maximum job size, denoted by pe max, of all jobs, with prediction errors λ = max{ pe min p min , p min pe min }

Published as a conference paper at ICLR 2025

and φ = max{ pe max p max , p max pemax }. PRG sets static α and β independent of time: α =

pe min pemax

β = pe max, where µ is a symbolic constant µ = max{

pe min pemax + ϵ} for any small fixed ϵ > 0. This ensures α < β. Algorithm PRG maintains the same invariant of running the active job with the largest relaxed-stretch determined by the above α and β.

Theorem 4.14 (The O(λ0.5 φ0.5 η max{η, φ}

P) Competitive Ratio). Predictive Relaxed Greedy is max{ 3

µ, µ} λ0.5 φ0.5 η max{η, φ}

P-competitive.

With reasonable predictions where pe min pe max and φ η, PRG is

3 λ0.5 φ0.5 η2

Pcompetitive. With extremely bad predictions (i.e., pe min > 3 pe max), however, the competitive ratio

depends on the prediction values and becomes ( q

pe min pemax +ϵ) λ0.5 φ0.5 η max{η, φ}

P. This can be avoided by switching to ARG when pe min > 3 pe max. Combining of ARG and PRG guarantees the competitive ratio of

3 λ0.5 φ0.5 η max{η, φ}

3 λ0.5 η2.5

P. Remark 4.15. Predictive Relaxed-Greedy is max{ 3

P-consistent, max{ 3

µ, µ} λ0.5 φ0.5

η max{η, φ}

P-smooth, and (n 1) P +1-robust. This robustness result is tight. The algorithm admits an implementation with O(1) run-time to make any online decision and O(n) space and generates a schedule with at most n 1 preemptions.

5 CONSISTENCY-SMOOTHNESS TRADE-OFFS

We study the trade-off between consistency and smoothness for RG by altering the parameters (α and β) in the definition of relaxed-stretch. The idea directly applies to ARG and PRG, so this paper does not discuss them. We introduce a hyper-parameter x (0 x 1

2). Denote RGx to be Relaxed

Greedy parametrized by x, where RGx sets α = px minp1 x max

3 and β = pmax and runs the same process

as RG. This constructs a family of algorithms and RG is one member RG

1 2 . We show that RGx

interpolates between the First Come First Serve (denoted by FIFO) algorithm and RG, where FIFO is O(P)-competitive and RG is O(η3

P)-competitive. See Appendix G for proofs for this section and Appendix H for the construction process of RGx.

Theorem 5.1 (The O(η2+2x P 1 x) Competitive Ratio). RGx (0 x 1

3 η2+2x P 1 xcompetitive for online scheduling with predictions.

If we set x = 0 with α = pmax, β = pmax, we have a singular point where RG0 decays to FIFO, as every job would be classified as a short job. We drop the constant

3 here as this term in relaxedstretch does not improve the performance of RG0. Algorithm RG0 with α = pmax, β = pmax, or FIFO, achieves an O(P) competitive ratio.

Theorem 5.2 (The O(P) Competitive Ratio). RG0, or FIFO, is P-competitive.

Setting x = 1

2 gives the best consistency (O(

P)) but the worst smoothness (O(η3)). In contrast, setting x = 0 with α = pmax, β = pmax gives the worst consistency (O(P)) but the optimal smoothness as a function on η (O(1)), for which the prediction quality does not affect the performance at all. If we exclude the singular point RG0, i.e., 0 < x 1

2, increasing hyper-parameter x decreases the power of P from 1 to 1

2 and increases that of η from 2 to 3 (a visualization of the trade-offs is in Appendix I). Practitioners can choose the hyper-parameter x based on the confidence of the prediction quality: high-quality predictions deserve a high x; low-quality predictions a low x.

6 BOUNDING ROBUSTNESS VIA RESOURCE AUGMENTATION

We introduce a general method for bounding robustness. The goal is to address the weak robustness (O(n P)) of the learning-augmented algorithms, i.e., they cannot deal with the worst predictions. We first show that the recent technique of Preferential Round Robin (PRR) (Purohit et al., 2018), which allocates partial processing power to multiple algorithms, fails to address this issue with the max-stretch metric. This shows the necessity of resource augmentation (Kalyanasundaram

Published as a conference paper at ICLR 2025

and Pruhs, 2000). We produce a family of RR-augmented algorithms with asymptotically optimal robustness and near-optimal consistency. Algorithm RG+, one member of the family, is (1+ϵ)-speed O(min{η3

ϵ })-competitive. Proofs are in Appendix J.

We first review the PRR method. Suppose we have two algorithms, A and B, with competitive ratios c A and c B. Combining the two algorithms by giving x (0 < x < 1) proportion of processing power to A and 1 x processing power to B constructs a competitive ratio of min{ c A

1 x} for some metrics (e.g., total completion time) under static scheduling (all jobs are released at time 0). This method, however, requires a critical assumption that each algorithm, when given a processing power less than 1, has a competitive ratio scaled by the allocated speed. This assumption, unfortunately, fails to hold in general and, in particular, not for the max-stretch scheduling. Theorem 6.1 (Motivation for Resource Augmentation). Any algorithm is at best ϵ 1 ϵ n + 1competitive, if given a (1 ϵ)-speed machine, for any 0 < ϵ < 1.

With a slower machine, any algorithm is Ω(n)-competitive, matching the non-clairvoyant lower bound. This means a (slower) speed takes away the advantages of clairvoyance, echoing a similar phenomenon as previously observed by Kalyanasundaram and Pruhs (2000). As a result, the assumption for PRR does not hold. Theorem 6.1 also suggests that RR has an asymptotically optimal competitive ratio (O(n)) with a slower machine. This leads to the design of RR-augmented algorithms with resource augmentation. The idea is to augment a base algorithm (e.g., RG) with a higher-speed machine, using the additional speed to run RR concurrently at unit speed. Definition 6.2 (Monotonicity (Purohit et al., 2018)). A non-clairvoyant scheduling algorithm is called monotonic if given two instances with identical inputs and actual job sizes (p 1, ..., p n) and (p 1, ..., p n) such that p j p j for all j, the objective function value found by the algorithm for the first instance is no higher than that for the second. Lemma 6.3. All algorithms mentioned in this paper (RG, ARG, PRG, RGx, and RR) are monotonic. Theorem 6.4 (RR-augmented Algorithms). Given a monotonic algorithm A with competitive ratio c A, one can obtain a (1 + ϵ)-speed min{c A, n

ϵ }-competitive (RR-augmented) algorithm.

Remark 6.5. RR-augmented RG (denoted by RG+) is (1 + ϵ)-speed min{

ϵ }- competitive. Similarly, we produce ARG+ ((1+ϵ)-speed min{

3 λ0.5 η2.5

ϵ }-competitive), PRG+ ((1+ϵ)-speed min{max{ 3

µ, µ} λ0.5 φ0.5 η max{η, φ}

ϵ }-competitive), and PRGx+

((1 + ϵ)-speed min{

3 η2+2x P 1 x, n

ϵ }-competitive). While preserving their consistency, these algorithms achieve asymptotically optimal robustness, or strictly optimal with 2-speed augmentation.

7 EXPERIMENTS

We evaluate Greedy-with-Rounding (GWR) (Bender et al., 2002), RR, RG, ARG, and RG+ on synthetic and real-world datasets ((Google, 2019), (Alibaba, 2023), and Azure (Cortez et al., 2017)) in minimizing max-stretch, variance of stretch, and mean response time. Stretch variance measures fairness by comparing jobs to each other; the mean response time measures total efficiency. This section presents the results for the max-stretch minimization. The rest of the results and discussions are in Appendix K, L, M, and N. Code is available on Git Hub (Anonymous, 2024).

We measure the performance ratio, which is the ratio between the max-stretch obtained by an algorithm and the optimum. Algorithm 2 is used to compute the optimal max-stretch. We set RG+ to be (1 + 0.3)-speed RR-augmented RG, i.e., it runs RG at unit speed and RR at speed 0.3. The job sizes are generated following exponential distributions. With the maximum job size ratio P, we set p j to max{1, log X A} where X is a random variable with X U(0, 1) and A a scaling factor to ensure p j P. Given η and p j, we set pj p j exp(Y ) with Y U( log η, log η). We generate 50 independent instances for every problem set defined by n, P, η, and the range of release times.

Figures 1 to 3 are the results for synthetic datasets. Figures 1 and 2 present the performance ratio under different n and P with perfect predictions. RG and ARG outperform GWR and RR in most scenarios. RG+, with a slight 0.3 speed augmentation, consistently achieves the best performance. Figure 3 presents the results under an increasing η. While the performance ratio of RG, ARG, and RG+ increases with η, RG+ is robust to prediction error with a bounded performance ratio.

Published as a conference paper at ICLR 2025

Performance Ratio

Performance Ratio

Performance Ratio

Performance Ratio

Performance Ratio

Performance Ratio

Figure 1: Performance ratios with increasing jobs. Each box represents the distribution of the performance ratios. The number of jobs is 1000 (R1C1), 2000 (R1C2), 3000 (R1C3), 4500 (R2C1), 6000 (R2C2), and 10000 (R2C3), released uniformly in [0, 5000] with sizes in [1, 10]. Prediction error is 1 (i.e., pj = p j ).

Performance Ratio

Performance Ratio

Performance Ratio

Performance Ratio

Performance Ratio

Performance Ratio

Figure 2: Performance ratios with an increasing P . The job sizes are drawn from interval [1, P ], where P is set to 1 (R1C1), 2 (R1C2), 5 (R1C3), 10 (R2C1), 20 (R2C2), and 40 (R2C3). The number of jobs is fixed at 4500. Jobs are released in time [0, 5000] uniformly at random. The prediction error is set to 1.

1 2 4 8 16 32 64 128 512 Prediction Error

Performance Ratio

GWR RG ARG RR RG+

Figure 3: Performance ratios with an increasing prediction error η. The prediction error increases from 1 to 512. The number of jobs is fixed at 4500. Jobs are released in time [0, 5000] uniformly at random. The job sizes are drawn from [1, 10].

1 2 3 4 5 6 Prediction Error

Performance Ratio

GWR RG ARG RR RG+

Figure 4: Performance ratios with an increasing prediction error (1 to 6) on real-world trace-log data from Google Cloud.

1 2 3 4 5 6 Prediction Error

Performance Ratio

GWR RG ARG RR RG+

Figure 5: Performance ratios with an increasing prediction error (1 to 6) on real-world trace-log data from Azure Cloud.

1 2 3 4 5 6 Prediction Error

Performance Ratio

GWR RG ARG RR RG+

Figure 6: Performance ratios with an increasing prediction error (1 to 6) on real-world trace-log data from Alibaba Cloud.

Figure 4 to 6 presents the results on real-world trace-log data under different η. RG and ARG have strong performance under perfect predictions. With limited job size information, the performance ratio of ARG increases the fastest with increasing η. RG s performance ratio falls between ARG and GWR under imperfect predictions. Overall, RG+ achieves the best performance at all times.

8 CONCLUSION

Under classic scheduling settings, we propose an exact polynomial-time offline algorithm, establish a non-clairvoyant competitive lower bound of Ω(n), and present a matching O(n)-competitive algorithm. We develop a family of algorithms for online scheduling to minimize max-stretch with different types of predictions. These algorithms offer various guarantees and leverage predictions from multiple sources. They achieve a consistency of O(

P) matching the best-known clairvoyant result but exhibit relatively weak robustness of O(n P). We demonstrate how a consistencysmoothness trade-off can be achieved with a parameterized algorithm. Furthermore, we show how resource augmentation can overcome the poor robustness guarantee, resulting in asymptotically optimal robustness. Through experiments, we demonstrate the practicality of these algorithms. We anticipate that many of the techniques we employ are applicable to other problems.

Many questions remain open for future work. We have not established whether the cubic dependency on η is optimal for learning-augmented algorithms. Therefore, analyzing the lower bound on the dependency on η remains a subject for future study. While we have shown that our construction of RGx achieves the best possible trade-off for consistency and smoothness under this method, we have not demonstrated the Pareto-optimality of the trade-off. Establishing this remains an interesting direction for future work. Additionally, extending the approach to parallel identical machines seems to be a promising direction.

Published as a conference paper at ICLR 2025

ACKNOWLEDGMENTS

We sincerely thank the anonymous reviewers for their valuable feedback and constructive suggestions, which have significantly improved this work. Professor Albert Zomaya and Dr Wei Li would like to acknowledge the support of the Australian Research Council Discovery Grant (DP200103494). Dr Wei Li acknowledges the support of the Australian Research Council (ARC) through the Discovery Early Career Researcher Award (DE210100263). Dr Tianming Zhao expresses his deepest gratitude to his family. In particular, Tianming is grateful to his wife, Min Li, and his newborn son, Michael Renqian Zhao, for their love, encouragement, and inspiration throughout this journey. Their support has been an invaluable source of motivation.

Alibaba. Alibaba cluster data, 2023. URL https://github.com/alibaba/clusterdata. Git Hub repository.

Maryam Amiri and Leyli Mohammad-Khanli. Survey on prediction models of applications for resources provisioning in cloud. J. Netw. Comput. Appl., 82(C):93 113, March 2017. ISSN 1084-8045. doi: 10.1016/j.jnca.2017.01.016. URL https://doi.org/10.1016/j.jnca. 2017.01.016.

Anonymous. Experiment code and datasets for competitive fair scheduling with predictions. https://anonymous.4open.science/r/NIPS Competitive-Fair-Scheduling-with-Predictions3388, 2024. [Online; accessed Date-of-Access].

Yossi Azar, Stefano Leonardi, and Noam Touitou. Flow time scheduling with uncertain processing time. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2021, page 1070 1080, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450380539. doi: 10.1145/3406325.3451023. URL https://doi.org/10.1145/ 3406325.3451023.

Yossi Azar, Eldad Peretz, and Noam Touitou. Distortion-Oblivious Algorithms for Scheduling on Multiple Machines. In Sang Won Bae and Heejin Park, editors, 33rd International Symposium on Algorithms and Computation (ISAAC 2022), volume 248 of Leibniz International Proceedings in Informatics (LIPIcs), pages 16:1 16:18, Dagstuhl, Germany, 2022. Schloss Dagstuhl Leibniz-Zentrum f ur Informatik. ISBN 978-3-95977-258-7. doi: 10.4230/LIPIcs.ISAAC.2022.16. URL https:// drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ISAAC.2022.16.

Kenneth R. Baker. Introduction to sequencing and scheduling. Wiley, 1974.

Eric Balkanski, Tingting Ou, Clifford Stein, and Hao-Ting Wei. Scheduling with speed predictions. In Jarosław Byrka and Andreas Wiese, editors, Approximation and Online Algorithms, pages 74 89, Cham, 2023. Springer Nature Switzerland. ISBN 978-3-031-49815-2.

Evripidis Bampis, Konstantinos Dogeas, Alexander Kononov, Giorgio Lucarelli, and Fanny Pascual. Scheduling with untrusted predictions. In Lud De Raedt, editor, Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22, pages 4581 4587. International Joint Conferences on Artificial Intelligence Organization, 7 2022. doi: 10.24963/ijcai.2022/636. URL https://doi.org/10.24963/ijcai.2022/636. Main Track.

Michael A. Bender, Soumen Chakrabarti, and S. Muthukrishnan. Flow and stretch metrics for scheduling continuous job streams. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 98, page 270 279, USA, 1998. Society for Industrial and Applied Mathematics. ISBN 0898714109.

Michael A. Bender, S. Muthukrishnan, and Rajmohan Rajaraman. Improved algorithms for stretch scheduling. In Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 02, page 762 771, USA, 2002. Society for Industrial and Applied Mathematics. ISBN 089871513X.

Anne Benoit, Redouane Elghazi, and Yves Robert. Max-stretch minimization on an edge-cloud platform. In 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 766 775, 2021. doi: 10.1109/IPDPS49936.2021.00086.

Published as a conference paper at ICLR 2025

Eli Cortez, Anand Bonde, Alexandre Muzio, Mark Russinovich, Marcus Fontoura, and Ricardo Bianchini. Resource central: Understanding and predicting workloads for improved resource management in large cloud platforms. In Proceedings of the 26th Symposium on Operating Systems Principles, pages 153 167, 2017.

Michael Dinitz, Sungjin Im, Thomas Lavastida, Benjamin Moseley, and Sergei Vassilvitskii. Algorithms with prediction portfolios. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems, volume 35, pages 20273 20286. Curran Associates, Inc., 2022.

Google. Google Cluster Workload Traces 2019, 2019. URL https://research.google/ resources/datasets/google-cluster-workload-traces-2019/. Accessed: 2023-08-20.

Sungjin Im, Ravi Kumar, Mahshid Montazer Qaem, and Manish Purohit. Non-clairvoyant scheduling with predictions. In Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 21, page 285 294, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450380706. doi: 10.1145/3409964.3461790. URL https://doi.org/ 10.1145/3409964.3461790.

Bala Kalyanasundaram and Kirk Pruhs. Speed is as powerful as clairvoyance. J. ACM, 47(4): 617 643, jul 2000. ISSN 0004-5411. doi: 10.1145/347476.347479. URL https://doi.org/ 10.1145/347476.347479.

Silvio Lattanzi, Thomas Lavastida, Benjamin Moseley, and Sergei Vassilvitskii. Online scheduling via learned weights. In Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 20, page 1859 1877, USA, 2020. Society for Industrial and Applied Mathematics.

Shi Li and Jiayi Xian. Online unrelated machine load balancing with predictions revisited. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 6523 6532. PMLR, 18 24 Jul 2021. URL https://proceedings.mlr.press/v139/li21w.html.

Nadav Merlis, Hugo Richard, Flore Sentenac, Corentin Odic, Mathieu Molina, and Vianney Perchet. On preemption and learning in stochastic scheduling. In Proceedings of the 41st International Conference on Machine Learning, Proceedings of Machine Learning Research. PMLR, 21 27 Jul 2023. URL https://proceedings.mlr.press/v202/merlis23a/merlis23a. pdf.

Manish Purohit, Zoya Svitkina, and Ravi Kumar. Improving online algorithms via ml predictions. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018.

Alexander Wei and Fred Zhang. Optimal robustness-consistency trade-offs for learning-augmented online algorithms. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS 20, Red Hook, NY, USA, 2020. Curran Associates Inc. ISBN 9781713829546.

Tianming Zhao, Wei Li, and Albert Y. Zomaya. Real-time scheduling with predictions. In 2022 IEEE Real-Time Systems Symposium (RTSS), pages 331 343, 2022. doi: 10.1109/RTSS55097. 2022.00036.

Published as a conference paper at ICLR 2025

TABLE OF NOTATIONS

See Table 1 for the table of notations used in the analysis.

S Meaning n the number of jobs Jj job with index j rj release time of Jj p j job size of Jj pj job size prediction for Jj Cj completion time of Jj t a time instant CA j completion time of Jj in the schedule constructed by algorithm A I a problem instance (in proofs for Section 6 this I refers to problem input) σ a schedule constructed for some problem instance I

ηj prediction error for Jj: ηj = max{ p j pj , pj p j }

η total prediction error: η = max1 j n ηj = max1 j n max{ p j pj , pj p j } p min, p max minimum/maximum job size: p min = min1 j n p j and p max = max1 j n p j P maximum ratio of any two job sizes: P = p max p min Sj stretch of job Jj: Sj = Cj rj

p j SA j stretch of job Jj in the schedule constructed by algorithm A: SA j = CA j rj

p j S optimal max-stretch for some problem instance I SA max-stretch achieved by algorithm A for some problem instance I π an ordering of jobs pmin, pmax minimum/maximum job size prediction: pmin = min1 j n pj and pmax = max1 j n pj pe min, pe max predicted minimum/maximum job size of all jobs λ prediction error associated with pe min: λ = max{ pe min p min , p min pe min }

φ prediction error associated with pe max: φ = max{ pe max p max , p max pemax } Sj,α,β(t) α-β relaxed-stretch of job Jj at time t, abbreviated as f Sj(t) Sj,α,β α-β relaxed-stretch of job Jj at its completion, abbreviated as f Sj when α and β are clear g Sα,β maximum α-β relaxed stretch of all jobs: g Sα,β = max1 j n Sj,α,β, abbreviated as e S f S optimal max-relaxed-stretch for some problem instance I f SA max-relaxed-stretch achieved by algorithm A for some problem instance I edj deadline for job Jj to achieve the optimal max-relaxed-stretch f Pj(t) predecessors of Jj (in terms of edj) at time t: f Pj(t) = {Jx|f dx edj, Jx active at time t} f Fj(t) followers of Jj (in terms of edj) at time t: f Fj(t) = {Jx|f dx > edj, Jx active at time t} ] CEDF j completion time of Jj in the schedule constructed by EDF with deadlines edj SOP T max-relaxed-stretch of the optimal schedule for the max-stretch problem τj the earliest time after which no job delays/ -delays Jj pmax,t maximum job size prediction seen by time t: pmax,t = maxrj t pj \ Sj,α,β(t) α-β adaptive relaxed-stretch of job Jj at time t, abbreviated as c Sj(t) \ Sj,α,β α-β adaptive relaxed-stretch of job Jj at its completion, abbreviated as c Sj d Sα,β maximum α-β adaptive relaxed stretch: d Sα,β = max1 j n \ Sj,α,β, abbreviated as b S c S optimal max-adaptive-relaxed-stretch for some problem instance I c SA max-adaptive-relaxed-stretch achieved by algorithm A for some problem instance I bdj deadline for job Jj to achieve the optimal max-adaptive-relaxed-stretch c Pj(t) predecessors of Jj (in terms of bdj) at time t: c Pj(t) = {Jx|c dx bdj, Jx active at time t} c Fj(t) followers of Jj (in terms of bdj) at time t: c Fj(t) = {Jx|c dx > bdj, Jx active at time t} [ CEDF j completion time of Jj in the schedule constructed by EDF with deadlines bdj \ SOP T max-adaptive-relaxed-stretch of the optimal schedule for the max-stretch problem

Table 1: A table of notations.

Published as a conference paper at ICLR 2025

Algorithm 2: Offline Scheduling

Data :every job Jj with release time rj and job size p j Result :schedule with optimal max-stretch

1 S { ri rj p j p i |1 i, j n, p i = p j, ri rj p j p i 0} {0}

2 view set S as a sorted index-based list.

3 l, r, σo, mso 0, |S|, ,

4 while l < r do

6 assign dj rj + Sk p j for all 1 j n.

7 order jobs by increasing dj; tie breaks by smaller p j first and tie breaks arbitrarily for the rest.

8 schedule jobs with respect to this order by always running the active arrived job with the highest priority (the most front in the order); let the resulting schedule be σ.

9 ms max-stretch of σ

10 if ms < mso then

11 σo, mso σ, ms

12 if every job Jj completes no later than dj in σ then

16 return σo

A MISSING PROOFS FOR SECTION 3.1

An exact minimum (optimal) max-stretch exists. This is a product of the following general result.

Theorem A.1 (Min Max-user-defined-stretch). Fix any positive vector Γ = (γ1, γ2, ..., γn). There exists a schedule, for every problem instance, which achieves a minimum max-user-defined-stretch, defined as f SΓ = max1 j n f SΓj, where f SΓj = Cj rj

In the analysis of this section, we use the optimality of EDF: EDF minimizes the maximum lateness of the jobs and always finds a feasible schedule if one exists.

The max-stretch is a special case of the max-user-defined-stretch with Γ = (p 1, p 2, ..., p n). We show a polynomial-time exact algorithm exists. Observe that we can construct an optimal schedule using the Earliest Deadline First (EDF) algorithm if we know the optimal max-stretch. If S is the optimal max-stretch, we can assign a deadline dj rj + S p j to every job Jj and schedule jobs by EDF. This schedule has max-stretch S by the optimality of EDF. Essentially, EDF uses the deadlines to determine a job ordering and schedules jobs according to the job priority indicated by the ordering. We can construct the optimal schedule if we know the optimal job ordering, which a bisection search can find on the following set. We let S = { ri rj

p j p i |1 i, j n, p i = p j, ri rj

p j p i 0} {0} be the set of all intersections (x-axis) of lines y = rj + p j x and a zero element. Every element x S corresponds to an order of jobs via assigning dj rj + x p j and ordering jobs by increasing dj. One of these orders is the optimal job order. A bisection search on sorted S can efficiently find it. The pseudo-code of this algorithm is given in Algorithm 2.

The proof for Theorem A.1 uses the following observation. Observation A.2. If a schedule exists with max-user-defined-stretch x, there exists a schedule with max-user-defined-stretch y for any y x. If no schedule achieves max-user-defined-stretch x, no schedule achieves max-user-defined-stretch y for any y x.

Proof. Suppose a schedule σ achieves a max-user-defined-stretch x. Construct another schedule σ , which is identical to σ except for the job Jk that is completed as the last job in σ. Preempt Jk in σ before its completion and resume Jk later so that the user-defined-stretch for job Jk reaches exactly y. Thus, σ achieves a max-user-defined-stretch y.

Suppose no schedule achieves max-user-defined-stretch x. Suppose, for proof by contradiction, that there exists a schedule σ that achieves max-user-defined-stretch y with y x. Construct a schedule

Published as a conference paper at ICLR 2025

σ following the same procedure as above to produce σ with max-user-defined-stretch x. This contradicts the idea that no schedule achieves max-user-defined-stretch x.

We introduce some definitions and notations. Given an order of jobs π = (Jj1, Jj2, ..., Jjn) indicating job priority (the more front in the order, the higher the priority is), define the priority schedule with respect to π to be the schedule constructed from always running the active arrived job with the highest priority and preempting the running low priority job if necessary; the priority schedule is denoted by PS(π).

For any x S, define π[x] to be the job order constructed by ordering jobs in increasing deadlines, where dj = rj + x γj, with tie-breaking by smaller γj first and further tie-breaking by the smaller job index first (or any arbitrary order). View set S as a sorted index-based list S = (x0, x1, ..., xκ). Note that x0 = 0 by definition. We have [x0, x1), [x1, x2), ..., [xκ 1, xκ), [xκ, ) forms a partition of non-negative numbers.

Define a mapping Ψ that maps positive numbers to orders of jobs. Order Ψ(x) is defined as follows. Set deadline dj = rj + x γj for all 1 j n. Order the jobs by increasing deadlines. If two jobs have the same deadline, the one with a smaller weight (γj) comes first. If two jobs have the same deadline and size, i.e., the same release times and job sizes, the one with a smaller job index comes first (this tie-breaking criterion can be arbitrary). The resulting order is Ψ(x).

Proof for Theorem A.1. We make two observations: (1) PS(π[0]) does not achieve a max-userdefined-stretch 0; (2) PS(π[x]) achieves a max-user-defined-stretch at most x, for a large x. To

see observation 2, let x = max1 j n rj+Pn j=1 p j min1 j n γj and PS(π[x ]) achieves a max-user-defined-stretch at most x . By Observation A.2, there exists the largest index p such that PS(π[xp]) does not achieve a max-user-defined-stretch xp. We show that the optimal value for max-user-defined-stretch

exists in interval [xp, xp+1], where xp+1 denotes max1 j n rj+Pn j=1 p j min1 j n γj if p = κ. By definition of p, PS(π[xp+1]) achieves a max-user-defined-stretch at most xp+1. Let X be the max-user-definedstretch achieved by schedule PS(π[xp]). Clearly, X > xp.

We first show that PS(π[x]) is identical to each other for all x [xp, xp+1), by showing that Ψ(xp) = Ψ(x) for any x [xp, xp+1). Suppose, for proof by contradiction, there exists y (xp, xp+1) such that Ψ(xp) = Ψ(y). Let k be the smallest index such that Ψ(xp)k = Ψ(y)k (the k-th elements of Ψ(xp) and Ψ(S ) are non-equal). For reference, we let Ψ(xp)k = Jq and Ψ(y)k = Jl. Clearly, Jq is in front of Jl in Ψ(xp), but Jl is in front of Jq in Ψ(y), by the choice of k. This means that Jq has a smaller deadline than Jl in Ψ(xp), but a larger deadline in Ψ(y), by the definition of Ψ. We have: rq + xp γq < rl + xp γl and rq + y γq > rl + y γl This means that γq > γl and rq < rl. However, we have:

γq γl < y < xp+1

This contradicts with xp+1 being the smallest element (in S) greater than xp or a new element should be in S, by the definition of xp+1. Thus, we must have Ψ(xp) = Ψ(y) for all y (xp, xp+1), and PS(π[x]) is identical to each other for all x [xp, xp+1).

If X < xp+1, no schedule achieves a max-user-defined-stretch at most x for any x < X , as PS(π[x]) achieves the same max-user-defined-stretch X for all xp x < X . In this case, X is the minimum (optimal) max-user-defined-stretch. If X xp+1, no schedule achieves a max-user-defined-stretch at most x for any x < xp+1, as PS(π[x]) achieves the same max-userdefined-stretch X for all x [xp, xp+1). In this case, PS(π[xp+1]) achieves the minimum (optimal) max-user-defined-stretch xp+1.

For the rest of this section, we restrict our discussion to the max-stretch, i.e., let Γ = (p 1, p 2, ..., p n) in the max-user-defined-stretch, and prove Theorem 3.2. We can find the optimal job order with set S (S = { ri rj

p j p i |1 i < j n, p i = p j, ri rj

p j p i 0} {0}). View set S as a sorted index-based

list S = (x0, x1, ..., xκ). We have [x0, x1), ..., [xκ 1, xκ), [xκ, ) forms a partition of non-negative

Published as a conference paper at ICLR 2025

numbers. The optimal stretch S must be within one of the intervals. Suppose S [xp, xp+1), where we use xκ+1 to represent for convention. Then, PS(π[xp]) achieves the optimal max-stretch.

Lemma A.3. There exists x S such that PS(π[x]) achieves the optimal max-stretch.

Proof. This is a product of Theorem A.1.

Let x S be the critical element with PS(π[x ]) achieving the optimal max-stretch. Observe the following property enabling the bisection search.

Lemma A.4. At lease one job Jj in PS(π[x]) completes later than rj + x p j, for x S, x < x . Every job Jj in PS(π[x]) completes no later than rj + x p j, for x S, x > x .

Proof. We first show that PS(π[x]) contains at least one job Jj that completes later than rj + x p j for x S, x < xp. Suppose otherwise for proof by contradiction. There exists ˆx S, ˆx < xp such that every job Jj completes no later than rj + ˆx p j in PS(π[ˆx]). Then, we have the max-stretch of PS(π[ˆx]) to be at most ˆx < xp, which contradicts the fact that the optimal max-stretch is at least xp. We then show that every job Jj in PS(π[x]) completes no later than rj + x p j for x S, x > xp. If x S, x > xp, then it holds that x xp+1 > S . Since there exists a schedule with S max-stretch, where every job Jj completes no later than rj + S p j, a schedule where every job Jj completes no later than rj + x p j exists for x > S . By optimality of EDF, PS(π[x]) (x > xp) must ensure that every job Jj completes no later than its deadline, rj + x p j.

Lemma A.5. Algorithm 2 admits an O(n2 log n) implementation.

Proof. Computing S takes O(n2) time, as there are O(n2) pairs of jobs. For the same reason, S has O(n2) elements. Sorting S takes O(n2 log n2) = O(n2 log n) time. We sort all jobs by their release time before entering the bisection search, which takes O(n log n) time. The bisection search loop experiences O(log n2) = O(log n) iterations. Consider any iteration. Computing the job priority takes O(n) + O(n log n) = O(n log n) time. Computing the priority schedule with the static job priority takes O(n log n). This is achieved by maintaining a priority queue of active arrived jobs and processing events, including job release and job completion. There are O(n) events for job release and job completion. Every event takes O(log n) to process with the priority queue. Apart from computing the priority schedule, the rest operations in the iteration take O(n) time. Therefore, an iteration of the bisection search takes O(n log n) + O(n log n) + O(n) = O(n log n) time. The overall complexity is:

O(n2 log n) + O(log n) O(n log n) = O(n2 log n)

Theorem A.6 (Theorem 3.2 Restated). Algorithm 2 finds an offline schedule with optimal max-stretch in time polynomial in n. It admits an O(n2 log n) implementation.

Proof for Theorem 3.2. Let the optimal stretch be S . Let xp S such that PS(π[xp]) achieves max-stretch S by Lemma A.3. We show that one of the bisection search iterations visits the xp and finds the optimal job order π[xp]. First observe that the bisection search procedure terminates within a finite number of iterations. Let li, ri denote the left and right ends of the searching interval at the start of the i-th iteration of the search (initially, l0 = 0 and r0 = |S|). We show the invariant that, if li p < ri, either the i-th iteration finds xp or li+1 p < ri+1. The i-th iteration finds xp if li+ri

2 = p. We consider the other two cases: li+ri

2 < p and li+ri

2 > p. Suppose k = li+ri

2 < p. With k < p, at least one job Jj in PS(π[xk]) completes later than rj + xk p j in this iteration by Lemma A.4. Then, we have li+1 = k + 1 p (by k, p both integers) and p < ri = ri+1. Suppose k = li+ri

2 > p. With k > p, every job Jj completes no later than rj + xk p j in this iteration by Lemma A.4. Then, we have li+1 = li p and p < k = ri+1. Observe that we have le re, where e is the total number of iterations, at the termination of the search. This means that one of the iterations must visit the xp and finds the optimal job order π[xp], which shows the correctness of the algorithm. Further, Lemma A.5 shows that the algorithm admits a polynomial-time O(n2 log n) implementation.

Published as a conference paper at ICLR 2025

B MISSING PROOFS FOR SECTION 3.3

Theorem B.1 (Theorem 3.5 Restated). No c-competitive deterministic algorithm exists with c < n for online non-clairvoyant max-stretch scheduling, even if all jobs are released at time 0.

Proof for Theorem 3.5. We use proof by contradiction. Suppose there exists a c-competitive deterministic algorithm A with c < n. Consider n identical jobs all with job size 1 to be released at time 0. Run A against this set of jobs. Without loss of generality, let J1 be the job with the least amount of the processed size l1. We have l1 1

n by the pigeonhole principle.

Construct the problem instance consisting n (n 2) jobs to be released at time 0, with job sizes p 1 = 1 n + ϵ, p 2 = ψ ( 1

n + ϵ), p 3 = ψ2 ψ, p 4 = ψ3 ψ2, ..., p n = ψn 1 ψn 2. Here, p j = ψj 1 ψj 2 for j 3, and ψ is a large constant ψ = max{2, n n c} and ϵ is a small constant ϵ = min{ 1

n}. Observe that 1

n > 0 and thus ϵ > 0 by the definition of ψ. Run A against this job set. Since the algorithm is deterministic, job J1 will be completed after time 1. The max-stretch achieved by A on this instance is more than 1 p 1 = 1/( 1

n + ϵ) = n 1+n ϵ. Consider the schedule that runs J1, J2, ..., Jn sequentially in the order of job index. The stretch for job J1 is S1 = 1. The stretch for job J2 is S2 = ψ/(ψ 1

n ϵ). Observe that job Jj completes at time ψj 1 for j 3. The stretch for job Jj is:

ψj 1 ψj 2 = ψ ψ 1

for all j 3. Observe that ψ ψ 1 1 and ψ ψ 1 ψ/(ψ 1

n ϵ) by the definition of ϵ. Thus, the

max-stretch achieved by this schedule is ψ ψ 1. Therefore, the minimum max-stretch for this job set is at most ψ ψ 1. However, we have the ratio between the max-stretch achieved by A and the optimal schedule more than:

ψ ψ 1 = n 1 1

ψ 1 + n ϵ n 1 1

ψ 1 + n ( 1

which contradicts the competitive ratio of A.

Theorem B.2 (Theorem 3.6 Restated). RR is n-competitive for online non-clairvoyant scheduling.

Proof for Theorem 3.6. With RR, job Jj is processed by at least p j amount during time rj (when the job arrives) and rj + n p j. Therefore, Cj rj + n p j and Sj n for every 1 j n. We have max-stretch = max1 j n Sj n = n 1. Observe that the optimal schedule has a max-stretch of at least 1. Therefore, RR is n-competitive.

We observe instantaneous fairness achieves global fairness under the non-clairvoyant setting. This processor-sharing policy is necessary for non-clairvoyant scheduling, i.e., any algorithm must adapt a certain degree of processor sharing to be competitive. We define processor-sharing as follows. Definition B.3 (k processor-sharing). An algorithm is said to be k processor-sharing for a problem instance I, if it holds either (1) Jj is completed before t, or (2) Jj has been processed for at least t rj

k unit, for any job Jj and any time instant t [rj, ), in the execution of A on I.

Intuitively, a k processor-sharing algorithm allocates at least 1

k unit of processing power to all active jobs. By definition, RR is n processor-sharing for every problem instance. We show the criticality of processor-sharing in obtaining competitiveness in online non-clairvoyant scheduling. Theorem B.4 (Equivalence of Competitiveness and Processor-sharing). In online non-clairvoyant scheduling, a deterministic algorithm is c-competitive if it is c S processor-sharing for any problem instance I with the minimum max-stretch S . Any c-competitive algorithm must be c n processorsharing for every problem instance I. Remark B.5. Any k1 processor-sharing algorithm (for every problem instance) is k2 processorsharing for any k1 k2. As a corollary of Theorem B.4, any k processor-sharing algorithm (for every problem instance) is k-competitive.

Published as a conference paper at ICLR 2025

The proof for Theorem B.4 uses the following bound for the minimum max-stretch. Observation B.6. The minimum max-stretch is at most n for any problem instance with n jobs.

Proof. The max-stretch obtained by RR on any problem instance is at most n. Therefore, the minimum max-stretch for problem instance I is at most n.

Proof for Theorem B.4. Fix a problem instance. If an algorithm is c S processor-sharing for that instance, every job Jj completes before rj + c S p j with Sj c = c S . The max-stretch is at most c times the optimum, so the algorithm is c-competitive. Below we consider the reverse.

Let A be any c-competitive algorithm. Suppose, for proof by contradiction, that there exists a problem instance I with minimum max-stretch S , such that there exists a job Jj and a time instant t [rj, ), where Jj has been processed for less than q = t rj

c n at time t . Construct another instance I , which contains the same job set as I except for the job Jj with p j = q. Run A on I . The schedule is the same as the one generated for I before and at time t . Therefore, Jj is completed after t for I . The max-stretch achieved on I is more than t rj

q = c n. The minimum max-stretch for I is at most n by Observation B.6. Thus, algorithm A achieves a max-stretch on I more than c times the optimum, which contradicts the c competitive ratio.

Theorem B.7 (Theorem 3.7 Restated). Any non-lazy algorithm is (n 1) P + 1-competitive.

Proof for Theorem 3.7. Pick any job Jj. The algorithm is processing active jobs in time [rj, Cj]. We have Sj = Cj rj

p j (n 1) p max+p min p min = (n 1) P + 1 and max-stretch = max1 j n Sj

(n 1) P + 1.

C MISSING PROOFS FOR SECTION 4.2

Define a heuristic relaxed-stretch based on two parameters α and β (α < β).

Definition C.1 (α-β Relaxed-Stretch). Define α-β relaxed-stretch Sj,α,β(t) for job Jj at time t given α and β: Sj,α,β(t) = t rj

α if Jj is active at time t and pj α; Sj,α,β(t) = t rj

active at time t and pj > α. We use Sj,α,β to denote the α-β relaxed-stretch at the completion of job Jj: Sj,α,β = Cj rj

α if pj α; Sj,α,β = Cj rj

β if pj > α. Fixing a problem instance of

online scheduling with predictions and a schedule, we use max-relaxed-stretch, denoted by g Sα,β, to represent the maximum α-β relaxed stretch of all jobs: g Sα,β = max1 j n Sj,α,β. We drop the subscripts α and β from the notations when the context is clear for simplicity. We also drop the prefix α-β when the context is clear.

Classify jobs into short and long jobs according to their job size predictions, with a given α and β.

Definition C.2 (α-β Short and Long Jobs). Given parameters α and β, a job Jj is said to be short if pj α; a job Jj is said to be long otherwise.

Fix a problem instance I. Run RG. Let SRG and S denote the max-stretch achieved by RG and the optimal max-stretch of I. Let g SRG and f S denote the max-relaxed-stretch achieved by RG and the optimal max-relaxed-stretch of I. The quantity f S exists due to Theorem A.1, regardless of the values of α and β. We study the relations between SRG, g SRG, f S , and S . We begin with showing g SRG 3 f S . We consider how much RG differs from an optimal execution by examining the maximum amount of processing units a job can be delayed compared with the optimal schedule. Define edj to be the deadline of completing job Jj to achieve the max-relaxed-stretch, i.e., edj = rj + f S α for short job Jj and edj = rj + f S β for long job Jj. A schedule achieves the optimal max-relaxed-stretch if every job Jj completes no later than edj. We study how much any job Jj can be delayed after edj in RG.

Published as a conference paper at ICLR 2025

Observe that EDF finds a feasible schedule meeting all deadlines if one exists. The schedule obeying EDF with deadlines edj achieves the optimal max-relaxed-stretch f S . Define set f Pj(t) = {Jx|f dx edj, Jx active at time t} and set f Fj(t) = {Jx|f dx > edj, Jx active at time t} to be the predecessors and followers (in terms of the deadlines) of Jj at time t.

Definition C.3 (Definition of Delay). Job Jj is said to be delayed by another job Jk at time t, if the following four conditions are simultaneously met: (1) Jj is not completed or not released at time t, (2) Jk f Fj(t), (3) f Pj(t) = , and (4) RG is processing Jk at time t with max Jx f Pj(t) f Sx(t) < f Sk(t).

If ] CEDF j and CRG j denote the completion time for Jj in the (optimal) schedule constructed by EDF

with f S and the schedule by RG, it follows CRG j ] CEDF j + R

any job delays Jj at time t dt. The following two lemmas hold with any given α and β (α < β).

Lemma C.4 (No Job Delays a Long Job). A long job is never delayed, for any α and β (α < β).

Proof. Fix any long job Jj and a time t before the completion of Jj. Suppose, for proof by contradiction, that another job Jk f Fj(t) delays Jj at time t. Job Jk must be a short job, since otherwise, we would have rj < rk and f Sj(t) > f Sk(t). We also have f dk > edj rk + f S α > rj + f S β rk > rj. Thus, Jj is active at time t. We have:

f Sj(t) < f Sk(t) t rj

α t > rk β rj α

Observe, by the definition of delay, that Jk f Fj(t) implies rk + f S α > rj + f S β. It holds that:

t > (rj + f S β f S α) β rj α β α = rj + f S β = edj

Thus, no job delays Jj before time edj. In such a case, however, Jj completes no later than edj, which is earlier than t, contradicting the definition of delay.

Lemma C.5 (Bound on the Delay for a Short Job). A short job Jj completes no later than rj+3 f S α, for any α and β (α < β).

Proof. We claim that any short job Jj can only be delayed before time edj, i.e., some job Jk f Fj(t) delays Jj at some time t implies t < edj. We show this via proof by contradiction. Fix a short job Jj. Suppose another job Jk f Fj(t) delays Jj at some time t edj. Job Jj must be active at time t. Job Jk must be a long job, since otherwise, we would have Jk f Fj(t) implying f dk > edj, which means rk > rj and f Sj(t) > f Sk(t). We have:

f Sj(t) < f Sk(t) t rj

β t < rj β rk α

With Jk f Fj(t), it holds rk + f S β > rj + f S α. We have:

t < rj β (rj + f S α f S β) α β α = rj + f S α = edj

which contradicts Jk delays job Jj at time t edj.

Next, consider any short job Jj. Define τj to be the earliest time after which no job delays Jj, i.e., for any time t after τj and before the completion time of Jj, the processor always processes a job from f Pj(t) when f Pj(t) = , and τj is the earliest time instant such that the condition holds (τj is the last time Jj is delayed). Consider set f Pj(τj). Observe that every job Jx f Pj(τj) is delayed by the job (call it Jq) running at time instant τj, since f dx edj and edj < edq. It follows f Pj(τj) contains only short jobs, by Lemma C.4. Observe that, for any job Jx f Pj(τj), we have τj f dx, or

Published as a conference paper at ICLR 2025

equivalently rx τj f S α, as job Jx is delayed at τj. We bound the total amount of delay on Jj by P

Jx f Pj(τj) p x. Consider the optimal schedule constructed by EDF with f S . We have two cases.

Case 1 (τj < rj). In the optimal schedule, all jobs in f Pj(τj) are released no earlier than τj f S α and no later than τj and completed no later than τj + f S α. We have: X

Jx f Pj(τj) p x τj + f S α (τj f S α) = 2 f S α

Case 2 (τj rj). In the optimal schedule, all jobs in f Pj(τj) (including job Jj) are released no earlier than τj f S α and completed no later than edj = rj + f S α. We have: X

Jx f Pj(τj) p x edj (τj f S α) = rj + 2 f S α τj 2 f S α

Thus, the completion time for a short job Jj in RG is at most edj +P

Jx f Pj(τj) p x rj +3 f S α

Let f S j and g SRG j be the relaxed-stretch of Jj in the schedule constructed by EDF with f S and RG.

Lemma C.6. For any given α and β (α < β), it holds g SRG j f S for any long job Jj and g SRG j 3 f S for any short job Jj.

Proof. Pick any job Jj. If Jj is a long job, it follows by Lemma C.4 that:

CRG j ] CEDF j + Z

any job delays Jj at time t dt = ] CEDF j g SRG j f S j f S

If Jj is a short job, it follows by Lemma C.5 that:

g SRG j = CRG j rj α (rj + 3 f S α) rj

Denote SRG j and g SRG j to be the stretch and the relaxed-stretch of job Jj in the execution of RG. The

following lemmas show the relation between f S with S and SRG j with g SRG j by using α = pmin pmax

3 and β = pmax. Lemma C.7 requires α < β and pj β for all 1 j n to hold; Lemma C.8 requires the specific values for α and β to hold. They do not hold under any arbitrary α and β.

Lemma C.7. f S η S , for any given α < β and pj β for all 1 j n.

Proof. Denote SOP T to be the max-relaxed-stretch of the optimal schedule for the max-stretch

problem (the schedule with max-stretch S ). It follows f S SOP T . Consider the optimal schedule

with max-stretch S . Let C j and SOP T j denote the completion time of job Jj and the relaxed-stretch

of Jj in this schedule. Pick any job Jj. It follows pj α and SOP T j = C j rj

p j η S , if Jj is a short job. It follows pj β and SOP T j = C j rj

p j η S , if Jj is a long job. Combining these two cases gives us f S SOP T η S .

Lemma C.8. It holds SRG j

P g SRG j for any long job Jj and SRG j η

P g SRG j for

any short job Jj, with α = pmin pmax

3 and β = pmax.

Published as a conference paper at ICLR 2025

Proof. Pick any job Jj. If job Jj is a long job, it holds that pj > α = pmin pmax

3 . We have:

SRG j = CRG j rj p j = CRG j rj β β

p j g SRG j η β

pj < g SRG j η

3 pmax pmin pmax

pmin g SRG j η

If job Jj is a short job, we have:

SRG j = CRG j rj p j CRG j rj p min = CRG j rj α α

p min g SRG j

pη p min η p max

The derivation uses the observation: pmax η p max, pmin p min

η , and pmin η p min.

Combining the above results completes the proof for Theorem 4.3.

Theorem C.9 (Theorem 4.3 Restated). Relaxed-Greedy is

P-competitive for online scheduling with predictions.

Proof for Theorem 4.3. Consider the execution of RG. Pick any job Jj. If job Jj is a long job, we have, by Lemmas C.8, C.6, and C.7, that:

If job Jj is a short job, we have, by Lemmas C.8, C.6, and C.7, that:

P g SRG j η

Therefore, we have SRG = max1 j n SRG j

P S . RG is

P-competitive.

We show that RG is Θ(n P)-competitive with the worst-case predictions. Theorem C.10 (Theorem 4.4 Restated). RG is (n 1) P + 1-competitive with arbitrarily bad predictions, and this bound is tight.

Proof for Theorem 4.4. The non-lazy RG is (n 1) P + 1-competitive by Theorem 3.7. To see the bound is tight, we show that, for any small ϵ > 0, there exists a set of n jobs with the associated predictions such that the max-stretch obtained by RG is at least (n 1) P +1 ϵ times the optimum. Fix any 0 < ϵ < 1. Let ω = ϵ P (n 1) P +(1 ϵ). Construct n jobs: p 1 = 1, r1 = 0, p1 = n4 P 2

ω2 + 3 and p j = P, rj = (1 ω) + (j 2) P, pj = 1 for all j 2. For this instance, we have pmin = 1 and pmax = p1. The optimal schedule processes job J1, J2, ..., Jn in order and has max-stretch 1 + ω

P . Algorithm RG will classify job J1 as a long job and the rest as short jobs. In the execution of RG, job J1 is not processed in the intervals (rj + rj p1 1, CRG j ] as f S1(t) < f Sj(t) for t (rj + rj p1 1, CRG j ]

for all 2 j n. Thus, before CRG n , job J1 has been processed for at most:

rj p1 1 = (1 ω) +

(1 ω) + (j 2) P q

ω2 + 3 1 < 1

At CRG n , jobs J2, ..., Jn are completed, leaving J1 to be completed at time (n 1) P + 1. The max-stretch obtained by RG is (n 1) P +1, which is (n 1) P +1 ϵ times 1+ ω

P , the optimum max-stretch.

We show RG s performance for its run-time and space complexity and preemptions. Lemma C.11. A short job Jj remains to have the largest relaxed-stretch among active jobs for all time t t before its completion if it had the largest relaxed-stretch among active jobs at some time t .

Published as a conference paper at ICLR 2025

Proof. Suppose job Jj has the largest relaxed-stretch among active jobs at time t . For any short job Jk active at time t , we have f Sj(t ) f Sk(t ) rj rk. Fix any time t t . Any short job Jk active at time t (including those released after t ) has rj rk and, thus, f Sj(t) f Sk(t). For any long job Jk released after t , we have t rj

β f Sj(t) f Sk(t), as rj rk and α < β. For any long job Jk

active at time t , we have t rj

β t β rj α rk

β α t β rj α rk

β α f Sj(t) f Sk(t).

Theorem C.12 (Theorem 4.6 Restated). Relaxed-Greedy admits an implementation with O(1) run-time to make any online decision and O(n) space.

Proof for Theorem 4.6. Main two First-In-First-Out (FIFO) queues: short queue and long queue. The short/long queue contains short/long jobs in order of release time. The algorithm always processes the job with the largest relaxed-stretch at the front of the two queues and removes a job from the (front of the) queue when the job is completed. We claim that one of the jobs at the front of the queues has the overall largest relaxed-stretch at any time t. To see this, it holds that rj rk t rj

α f Sj(t) f Sk(t) for any short jobs Jj, Jk, and that rj rk t rj

β f Sj(t) f Sk(t) for any long jobs Jj, Jk. Therefore, Lines 17 and 22 in Algorithm 1 each require O(1) time to run. The rest operations take O(1) time to run. Finally, since the algorithm maintains at most all jobs in memory and a constant number of support variables, the space required is O(n).

Instead of running Job Switch at every time t, the algorithm may compute a critical time t such that t rj

β , if a long job Jk is currently being processed, where Jj and Jk are the short (if there are any) and long jobs at the front of the queues. Update or re-compute this t if Jk is completed, or a new short job is released when there is no short job prior to the job s release. The algorithm calls Job Switch only at time t , which minimizes the number of function calls. This is supported by the observation that Job Switch is executed only when a long job is being processed by Lemma C.11.

Theorem C.13 (Theorem 4.7 Restated). Relaxed-Greedy generates a schedule with at most n 1 preemptions. Thus, the amortized number of preemptions per job is at most 1.

Proof for Theorem 4.7. Preemptions are triggered by Job Switch, which is not triggered while a short job is being processed by Lemma C.11. A long job may be preempted by short jobs; these short jobs, however, will not be preempted by Lemma C.11. Thus, the number of preemptions is bounded by the total number of short jobs, which is at most n 1. No preemptions occur if all jobs have the same job size prediction. Therefore, the amortized number of preemptions (per job) is at most n 1

D DISCUSSION OF THE CONSTANT IN RELAXED-STRETCH FOR SECTION 4.2

We show that replacing the constant

3 in α with any other constant maintains RG s competitive ratio, as long as α < β. The idea applies to the family of RG-based algorithms, including ARG, PRG, RGx, and their RR-augmentations. For simplicity, we focus our discussion on RG.

Fix any constant µ (µ > q

pmin pmax ). Set α = pmin pmax

µ and β = pmax in RG. It holds α < β. With

parameters α and β, we (re-)use the definitions, lemmas, and notations from Section 4.2. We show the following theorem.

Theorem D.1. Relaxed-Greedy is max{µ, 3

P-competitive with α = pmin pmax

pmax, and µ > q

pmin pmax , for online scheduling with predictions.

Published as a conference paper at ICLR 2025

Algorithm 3: Adaptive Relaxed-Greedy Data :job size prediction pj upon job release and a prediction for minimum job size pe min Result :schedule with a max-stretch O(λ0.5 η2.5

P) times the optimum max-stretch

1 pmin pe min, pmax 1, α 1, β 1

2 Function relaxed Stretch(j, t) // computes adaptive relaxed-stretch for job Jj at t

3 if Jj is completed before t then

5 if pj αrj then

6 return t rj

8 return t rj

9 Event Function Job Release() // job Jj is released at time rj 10 if U = then

11 run Jj on the machine.

12 insert Jj into U.

13 pmax max(pmax, pj), α pmin pmax

3 , β max(pmax, α), αrj α, βrj β

14 The other functions (Job Complete, Job Switch) remain the same as in Algorithm 1.

Proof. Consider the execution of RG. Pick any job Jj. If job Jj is a long job (pj > α = pmin pmax

µ ), we have:

SRG j = CRG j rj p j = CRG j rj β β

p j g SRG j η β

pj < g SRG j η µ pmax pmin pmax

g SRG j η µ

p min/η = µ η2

P g SRG j µ η2

where the second last inequality is by Lemma C.6, and the last inequality is by Lemma C.7. If job Jj is a short job, we have:

SRG j = CRG j rj p j CRG j rj p min = CRG j rj α α p min = g SRG j pmin pmax

pη p min η p max µ p min = η

P g SRG j 3

where the second last inequality is by Lemma C.6, and the last inequality is by Lemma C.7. Therefore, we have SRG = max1 j n SRG j max{ 3

The implications of Theorem D.1 are as follows. Altering the constant

3 in α preserves RG s O(η3

P) competitive ratio, i.e., the choice does not affect the asymptotic performance guarantee. This opens the door to searching for the appropriate constant for specific applications via algorithm engineering. In addition, choosing µ =

3 yields the optimal constant in the competitive ratio, as max{ 3

3 for all µ > 0 and the minimum is obtained at µ =

3. This justifies our choice of

3 in the definition of α.

E MISSING PROOFS FOR SECTION 4.3

We introduce subscript t to denote the value of a variable at time t. Thus, pmax at time t (denoted by pmax,t) represents the maximum job size prediction seen so far, i.e., pmax,t = maxrj t pj. Parameters α and β are functions in pmin and pmax and thus are time-varying. They are updated with

updates of pmax in Job Release. Formally, ARG sets αt =

pe min pmax,t

3 and βt = max{pmax,t, αt},

Published as a conference paper at ICLR 2025

where αt and βt denote the values of α and β at time t. We record αrj and βrj for every job Jj. These values are used to compute the adaptive relaxed-stretch of Jj, defined as follows.

Definition E.1 (α-β Adaptive Relaxed-Stretch). Define α-β adaptive relaxed-stretch \ Sj,α,β(t) for job Jj at time t given α and β: \ Sj,α,β(t) = t rj

αrj if Jj is active at time t and pj αrj; \ Sj,α,β(t) = t rj

βrj if Jj is active at time t and pj > αrj. We use \ Sj,α,β to denote the α-β adaptive relaxed-stretch at the completion of job Jj: \ Sj,α,β = Cj rj

αrj if pj αrj; \ Sj,α,β = Cj rj

βrj if pj > αrj. Fixing a

schedule, we use max-adaptive-relaxed-stretch, denoted by d Sα,β to represent the maximum α-β adaptive relaxed-stretch of all jobs: d Sα,β = max1 j n \ Sj,α,β. We drop the subscripts α and β from the notations and the prefix α-β when the context is clear for simplicity.

The classification of job Jj now depends on the relationship between pj and αrj. Definition E.2 (α-β r-short and r-long Jobs). Given parameters α and β, a job Jj is said to be r-short if pj αrj; a job Jj is said to be r-long otherwise.

Definition E.3 (Minimum max-adaptive-relaxed-stretch). We use d S α,β to denote the minimum max-adaptive-relaxed-stretch, given any α and β. The existence of this quantity is due to Theorem A.1. We drop the subscripts α and β from the notations when the context is clear.

Define bdj to be the deadline of completing job Jj to achieve the minimum max-adaptive-relaxedstretch, i.e., bdj = rj + c S αrj for r-short job Jj and bdj = rj + c S βrj for r-long job Jj. A schedule achieves the optimal max-adaptive-relaxed-stretch if every job Jj completes no later than bdj. Define set c Pj(t) = {Jx|c dx bdj, Jx active at time t} and set c Fj(t) = {Jx|c dx > edj, Jx active at time t} to be the predecessors and followers of Jj at time t in terms of minimizing the max-adaptive-relaxedstretch.

Definition E.4 (Definition of -delay). Job Jj is said to be -delayed by another job Jk at time t, if the following four conditions are simultaneously met: (1) Jj is not completed or not released at time t, (2) Jk c Fj(t), (3) c Pj(t) = , and (4) ARG is processing Jk at time t with max Jx c Pj(t) c Sx(t) < c Sk(t).

The symbol is only to distinguish -delay from delay; it does not contain any other meaning. We first make the following observation. Observation E.5. For any x y, it holds (1) αx βx, (2) αx αy, and (3) βx βy.

Lemma E.6 (No Job -delays a r-long Job). A r-long job is never -delayed.

Proof. Fix a r-long job Jj. Consider, for the proof by contradiction, another job Jk c Fj(t) - delays Jj at some time t. Observe that rj < rk, since otherwise, we would have bdj = rj + c S βrj rk + c S βrk c dk. Thus, Jj is active at time t. Job Jk must be r-short, since otherwise, c Sj(t) = t rj

βrk = c Sk(t). We have:

c Sj(t) < c Sk(t) t rj

αrk βrj > αrk and t > rk βrj rj αrk

βrj αrk rj + c S βrj = bdj

where the last inequality is due to c dk > bdj rk + c S αrk > rj + c S βrj. Thus, no job -delays Jj before time bdj. However, in this case, job Jj completes no later than bdj, which is earlier than t, contradicting the definition of -delay.

Lemma E.7 (Bound on the Delay for a r-short Job). A r-short job Jj completes no later than rj + 3 c S αrj.

Proof. We claim that any r-short job can only be -delayed before bdj, i.e., some job Jk c Fj(t) -delays Jj at time t implies t < bdj. We show this via proof by contradiction. Fix a r-short job Jj.

Published as a conference paper at ICLR 2025

Suppose another job Jk c Fj(t) -delays Jj at some time t bdj. Job Jj must be active at time t. Observe that rj > rk, since otherwise, we would have bdj = rj + c S αrj rk + c S αrk c dk. Job Jk must be r-long, since otherwise, c dk = rk + c S αrk < rj + c S αrj = bdj. Then, it follows c dk > bdj rk + c S βrk > rj + c S αrj βrk > αrj. We have:

c Sk(t) > c Sj(t) t rk

αrj t < rj βrk rk αrj

βrk αrj < rj + c S αrj = bdj

where the last inequality is due to c dk > bdj rk + c S βrk > rj + c S αrj. It follows bdj t < bdj, contradicting that Jk -delays job Jj at time t bdj.

Next, consider any r-short job Jj. Let τj denote the earliest time after which no job -delays Jj. Every job Jx c Pj(τj) is -delayed at time τj by the job Jl running at time instant τj, since c dx bdj < bdl. Thus, every job Jx c Pj(τj) is r-short by Lemma E.6. It follows that, for any job Jx c Pj(τj), c dx bdj rx + c S αrx rj + c S αrj rx rj and τt c dx rx τj c S αrx τj c S αrj. We bound the total amount of delay on Jj by P

Jx c Pj(τj) p x. Consider

the optimal schedule (in minimizing adaptive-relaxed-stretch) constructed by EDF with c S . We have two cases.

Case 1 (τj < rj). In the optimal schedule, all jobs in c Pj(τj) are released no earlier than τj c S αrj and no later than τj and completed no later than τj + c S αy τj + c S αrj, where y denotes max Jx c Pj(τj) rx. We have: X

Jx c Pj(τj) p x τj + c S αrj (τj c S αrj) = 2 c S αrj

Case 2 (τj rj). In the optimal schedule, all jobs in c Pj(τj) (including job Jj) are released no earlier than τj c S αrj and completed no later than bdj = rj + c S αrj. We have: X

Jx c Pj(τj) p x bdj (τj c S αrj) = rj + 2 c S αrj τj 2 c S αrj

Thus, the completion time for a r-short job Jj in ARG is at most bdj + P

Jx c Pj(τj) p x rj + 3 c S αrj

Let [ CEDF j and CARG j be the completion time of Jj in the schedule constructed by EDF with c S and

ARG. Let c S j and [ SARG j be the adaptive relaxed-stretch of Jj in these two schedules.

Lemma E.8. It holds [ SARG j c S for any r-long job Jj and [ SARG j 3 c S for any r-short job Jj.

Proof. Pick any job Jj. If Jj is a r-long job, it follows by Lemma E.6 that:

CARG j [ CEDF j + Z

any job -delays Jj at time t dt = [ CEDF j [ SARG j c S j c S

If Jj is a r-short job by Lemma E.7, it follows that:

[ SARG j = CARG j rj αrj (rj + 3 c S αrj) rj

αrj = 3 c S

Lemma E.9. c S η S , with αt =

pe min pmax,t

3 and βt = max{pmax,t, αt}.

Proof. Let \ SOP T be the max-adaptive-relaxed-stretch of the optimal schedule for the max-stretch problem. It follows c S \ SOP T . Consider the optimal schedule with the minimum max-stretch.

Published as a conference paper at ICLR 2025

Let C j and \ SOP T j be the completion time of job Jj and the adaptive relaxed-stretch of Jj in this

schedule. Pick any job Jj. If job Jj is a r-short job, it follows pj αrj and \ SOP T j = C j rj αrj

pj η C j rj

p j η S . If job Jj is a r-long job, it follows pj maxrx rj px = pmax,rj = βrj

and \ SOP T j = C j rj βrj C j rj

pj η C j rj

p j η S . Combining these two cases gives us

c S \ SOP T = max1 j n \ SOP T j η S .

Let SARG j and [ SARG j be the stretch and adaptive relaxed-stretch of job Jj in the execution of ARG.

Lemma E.10. It holds SARG j

3 λ0.5 η1.5

P [ SARG j for any r-long job Jj and SARG j

P [ SARG j for any r-short job Jj, with αt =

pe min pmax,t

3 and βt = max{pmax,t, αt}.

Proof. Pick any job Jj. If job Jj is r-long, it follows pj > αrj = pe min pmax,rj

3 . We have:

SARG j = CARG j rj p j = CARG j rj βrj βrj

p j [ SARG j η βrj

pj < [ SARG j η

3 pmax,rj ppe min pmax,rj

pe min [ SARG j η

3 λ0.5 η1.5

If job Jj is r-short, we have:

SARG j CARG j rj p min = CARG j rj αrj αrj

p min [ SARG j

λ p min η p max

3 p min = λ0.5 η0.5

The derivation uses the observation: pmax,t = maxrx t px pmax η p max for any t 0.

Theorem E.11 (Theorem 4.10 Restated). Adaptive Relaxed-Greedy is

3 λ0.5 η2.5

Pcompetitive for online scheduling with predictions.

Proof for Theorem 4.10. If job Jj is r-long, we have, by Lemmas E.10, E.8, and E.9, that:

3 λ0.5 η1.5

3 λ0.5 η1.5

3 λ0.5 η2.5

If job Jj is r-short, we have, by Lemmas E.10, E.8, and E.9, that:

SARG j λ0.5 η0.5

P [ SARG j λ0.5 η0.5

3 λ0.5 η1.5

Therefore, we have SARG

3 λ0.5 η2.5

P S . ARG is

3 λ0.5 η2.5

P-competitive.

Algorithm ARG is (n 1) P + 1-robust. The robustness result is tight by the same construction used in Theorem 4.4 with pe min = 1, where ARG sets α = p1

3 and β = p1 at time 0, remains these parameters constants, and produces the same result as RG.

For proving the run-time and space complexity, as well as the number of preemptions, we first show an equivalent version of Lemma C.11 for ARG. Lemma E.12. A r-short job Jj remains to have the largest adaptive relaxed-stretch among active jobs for all time t t before its completion if it had the largest adaptive relaxed-stretch among active jobs at some time t .

Proof. Suppose job Jj has the largest adaptive relaxed-stretch among active jobs at time t . For any r-short job Jk active at time t , we have c Sj(t ) c Sk(t ) t rj

Fix any time t t . Any r-short job Jk active at time t (including those released after t ) has

Published as a conference paper at ICLR 2025

rj rk and αrj αrk, thus, c Sj(t) c Sk(t). For any r-long job Jk released after t , we have t rj

βrk c Sj(t) c Sk(t), as rj rk and αrj βrj βrk. For any r-long job Jk active at

time t , we have:

βrk t βrk rj αrj rk

βrk αrj t βrk rj αrj rk

βrk αrj c Sj(t) c Sk(t)

Therefore, job Jj remains to have the largest adaptive relaxed-stretch among active jobs at time t.

Theorem E.13 (Theorem 4.12 Restated). Adaptive Relaxed-Greedy admits an implementation with O(1) run-time to make any online decision and O(n) space.

Proof for Theorem 4.12. Similar to the proof for Theorem 4.6. Main two First-In-First-Out (FIFO) queues: short queue and long queue. The short queue contains r-short jobs in order of release time; the long queue contains r-long jobs in order of release time. The algorithm always processes the job with the largest adaptive relaxed-stretch at the front of the two queues and removes a job from the (front of the) queue when the job is completed. We show that one of the jobs at the front of the queues has the overall largest adaptive relaxed-stretch at any time t. To see this, it holds that rj rk t rj

αrk c Sj(t) c Sk(t) for any r-short jobs Jj, Jk and that

βrk c Sj(t) c Sk(t) for any r-long jobs Jj, Jk. Finally, as the algorithm maintains at most all jobs in memory and a constant number of support variables, the space required is at most O(n).

Theorem E.14 (Theorem 4.13 Restated). Adaptive Relaxed-Greedy generates a schedule with at most n 1 preemptions. Thus, the amortized number of preemptions per job is at most 1.

Proof for Theorem 4.13. Preemptions are triggered only by Job Switch. Observe that Job Switch is not triggered while a r-short job is being processed by Lemma E.12. A r-long job may be preempted while being processed by r-short jobs; these r-short jobs, however, will not be preempted by Lemma E.12. Thus, the number of preemptions is bounded by the total number of r-short jobs, which is at most n 1. Note that no preemptions occur if all jobs have the same job size prediction pmax. As a result, the amortized number of preemptions per job is at most n 1

F MISSING PROOFS FOR SECTION 4.4

We prove that PRG achieves an O(λ0.5 φ0.5 η max{η, φ}

P) competitive ratio. With static α and β, we (re-)use the definitions, lemmas, and notations from Section 4.2. Theorem F.1 (Theorem 4.14 Restated). Predictive Relaxed-Greedy is max{ 3

µ, µ} λ0.5 φ0.5 η

P-competitive for online scheduling with predictions.

Proof for Theorem 4.14. Consider the execution of PRG. Pick any job Jj. If job Jj is a long job, we have:

SPRG j = CPRG j rj β β

p j ] SPRG j η β

pj < ] SPRG j η µ pe max ppe min pemax ] SPRG j η µ

] SPRG j η µ

p min/λ µ λ0.5 φ0.5 η

P f S (By Lemma C.6)

If job Jj is a short job, we have:

SPRG j CPRG j rj p min ] SPRG j

λ p min φ p max µ p min 3

µ λ0.5 φ0.5

P f S (By Lemma C.6)

We then bound f S by S through SOP T . Pick any job Jj. It follows SOP T j = C j rj

p j η S , if job Jj is a short job. It follows SOP T j = C j rj

β φ C j rj p max φ C j rj

Published as a conference paper at ICLR 2025

if job Jj is a long job. Therefore, we have f S SOP T = max1 j n SOP T j max{η, φ} S , and, thus, SPRG = max1 j n SPRG j max{ 3

µ, µ} λ0.5 φ0.5 η max{η, φ}

Algorithm PRG is (n 1) P + 1-robust. The robustness result for PRG is tight by the same construction used in Theorem 4.4 with pe min = 1 and pe max = p1.

G MISSING PROOFS FOR SECTION 5

We show that RGx is O(η2+2x P 1 x)-competitive.

Theorem G.1 (Theorem 5.1 Restated). RGx (0 x 1

3 η2+2x P 1 x-competitive for online scheduling with predictions.

Proof for Theorem 5.1. Consider the execution of RGx. Pick any job Jj. If job Jj is a long job, we have:

SRGx j ] SRGx j η β

pj < ] SRGx j η

3 pmax px min p1 x max

3 η1+2x P x ] SRGx j

3 η1+2x P x f S

3 η2+2x P x S (By Lemmas C.6 and C.7)

If job Jj is a short job, we have, by Lemmas C.6 and C.7, that:

SRGx j ] SRGx j α p min ] SRGx j px min p1 x max

3 p min η P 1 x

3 η2 P 1 x S

Therefore, we have SRGx = max1 j n SRGx j

3 η2+2x P 1 x S .

Theorem G.2 (Theorem 5.2 Restated). RG0 (with α = pmax, β = pmax), or FIFO, is P-competitive.

Proof for Theorem 5.2. We first show RG0 minimizes max1 j n Cj rj (max-flow). If ω = max1 j n Cj rj (existence of ω is due to Theorem A.1), set dj = rj +ω for all 1 j n and run EDF with dj. The resulting schedule, which is identical to the schedule generated by RG0, achieves the minimum max-flow. Let Jk be the job with the largest flow in the schedule that minimizes max-stretch (e.g., C k rk = max1 j n C j rj). Consider the execution of RG0. Pick any job Jj. We have:

SRG0 j = CRG0 j rj p j max1 q n CRG0 q rq p j max1 q n C q rq p j = C k rk

p k p k p j S P

Therefore, we have SRG0 = max1 j n SRG0 j S P.

H CONSTRUCTION OF THE TRADE-OFF ALGORITHM FOR SECTION 5

We introduce parameters θ = (a1, a2, b1, b2, µ) and parametrize α = pa1 min pa2 max µ and β = pb1 min pb2 max in the definition of relaxed-stretch. This section shows the construction of RGx, the trade-off between consistency and smoothness under this parametrization.

For Lemmas C.4, C.5, and C.6 to hold, we must ensure α < β. For Lemma C.7 to hold, we must ensure pj β for all 1 j n. We will enforce these two relations after we derive the others we desire. For now, we assume that these relations hold. Run RG with the parametrized α and β. Pick any job Jj. If Jj is a long job, it holds that pj > α, and we have:

SRG j = CRG j rj p j = CRG j rj β β

p j g SRG j η β

pj < g SRG j η µ pb1 min pb2 max pa1 min pa2 max

= g SRG j µ η pb2 a2 max pa1 b1 min µ η pb2 a2 max pa1 b1 min f S µ η2 pb2 a2 max pa1 b1 min S (By Lemma C.6 and C.7)

Published as a conference paper at ICLR 2025

If Jj is a short job, we have:

SRG j = CRG j rj p j CRG j rj p min = CRG j rj α α p min = g SRG j pa1 min pa2 max µ p min

µ pa1 min pa2 max p min f S 3

µ η pa1 min pa2 max p min S (By Lemma C.6 and C.7)

µ η1+|a1| pa2 max (p min)1 a1 S

where the last inequality is due to pa1 min η|a1| (p min)a1, as pa1 min (η p min)a1 = η|a1| (p min)a1

if a1 0 and pa1 min = ( 1 pmin ) a1 ( η p min ) a1 = η|a1| (p min)a1 if a1 < 0.

The constant term to appear in the competitive ratio is max{µ, 3

µ}, which is at least

3 (the minimum is achieved at µ =

3). Therefore, we choose µ =

3 for an optimal constant term.

As we aim to relate the competitive ratio to P, the maximum ratio of any two job sizes, we enforce the power of pmax to equal that of pmin (or p min) on the denominator. Thus, we set:

a1 + a2 = b1 + b2 = 1

To enforce α < β, or α

β = pa1 b1 min pa2 b2 max

pmax )a1 b1 < 1, we set:

To enforce pj β for all 1 j n, or pmax pb1 min pb2 max = pb1 min p1 b1 max ( pmax

pmin )b1 1, we set: b1 0 With these constraints, the results (i.e., Lemmas) used in this section can hold.

We have (with a1 + a2 = b1 + b2 and a1 b1):

3 η2 pb2 a2 max pa1 b1 min S =

pmin )a1 b1 S

3 η2 (η p max p min/η )a1 b1 S

3 η2+2a1 2b1 P a1 b1 S

for Jj being a long job. We have (with a2 = 1 a1):

3 η1+|a1| (pmax

p min )1 a1 S

for Jj being a short job.

Observe that FIFO is P-competitive by Theorem 5.2. We set the power of P and that of pmax

p min (which will relate to P) to be at most 1, as FIFO will otherwise dominate our algorithm. We enforce:

a1 b1 1 a1 1 + b1 1

Fix any 0 a1 1, decreasing b1 increases both the power of η and P in term

3 η2+2a1 2b1 P a1 b1 S . We choose b1 = 0 (and therefore b2 = 1) for the optimal (minimal) competitive ratio.

With θ = (a1, 1 a1, 0, 1,

3), we have:

3 η2+2a1 P a1 S

for Jj being a long job, and:

3 η1+a1 (pmax

p min )1 a1 S

3 η1+a1 (η p max p min )1 a1 S =

3 η2 P 1 a1 S

Published as a conference paper at ICLR 2025

for Jj being a short job. Therefore, we have SRG = max1 j n SRG j

3 η2+2a1 P max{a1,1 a1} S . For the competitive ratio, RG with a1 = 1

2 dominates that with a1 = 1

2 + for any 0 < 1

2. Therefore, we restrict:

2 The competitive ratio of RG (with θ = (a1, 1 a1, 0, 1,

3) and 0 a1 1

3 η2+2a1 P 1 a1.

Replacing a1 by x (0 x 1

2), we construct RGx with α = px min p1 x max

3 and β = pmax.

I VISUALIZATION OF CONSISTENCY-SMOOTHNESS TRADE-OFFS

The trade-off between consistency and smoothness is visualized in Figure 7.

Figure 7: Trade-off between consistency and smoothness. The graphs show how the hyper-parameter x (on the x-axis) changes the competitive ratio as a function of η (left curve (x, η, η2+2x)) and P (right curve (x, P, P 1 x)), i.e., the projection on plane x = a (0 a 1

2 ) represents the competitive ratio as a function of η and P .

J MISSING PROOFS FOR SECTION 6

Theorem J.1 (Theorem 6.1 Restated). Any algorithm is at best ϵ 1 ϵ n + 1-competitive, if given a (1 ϵ)-speed machine, for any 0 < ϵ < 1.

Proof for Theorem 6.1. Construct n jobs: p j = 1, rj = j 1 for all 1 j n. Clearly, the optimal schedule with a unit-speed machine has a max-stretch of 1. Run any algorithm with a (1 ϵ)-speed machine. Let the last completed job be Jk. Clearly, we have Ck n 1 ϵ and rk n 1. Therefore, the max-stretch achieved is at least Sk = Ck rk

1 n 1 ϵ (n 1), which is ϵ 1 ϵ n + 1 times the optimum.

We prove Lemma 6.3, which claims the monotonicity of the algorithms mentioned in this paper. We first show that the RG-based algorithms (RG, ARG, PRG, RGx) are monotonic. Our proof shows that (1) any RG-based algorithm has time-varying priority execution for all problem inputs, and (2) any algorithm with time-varying priority execution for all problem inputs is monotonic. We define time-varying priority execution as follows.

Definition J.2 (Time-varying Priority Execution). Let A be an algorithm and I a problem input. We say A has a time-varying priority execution for I if there exist a partitioning of time 0 = t0 t1 t2 ... t P and a mapping F from time partitions to total orderings of jobs (i.e., F(tx, tx+1) is an order of all jobs) such that A always processes the active job that has the highest priority indicated by F(tx, tx+1) (jobs at the front of F(tx, tx+1) have higher priorities) during (tx, tx+1) for all 0 x P, where t P +1 represents .

Here, a problem input differs from a problem instance. A problem input reveals the job size prediction to an algorithm when a job is released, i.e., problem input includes release times and job size predictions. A problem instance is all the data (release times, job sizes, and job size predictions). Thus, a job set can create different problem instances by attaching different job size predictions. A

Published as a conference paper at ICLR 2025

time-varying priority execution processes jobs in a (time-varying) order that depends solely on the problem input, not the job sizes. This is key to proving monotonicity. Lemma J.3. Any RG-based algorithm has time-varying priority execution for all problem inputs.

Proof. Consider any RG-based algorithm A and any problem input I. Suppose I include n jobs (J1, ..., Jn) where job size prediction for Jj is pj and release time is rj. Let yj = α if pj α and yj = β if pj > α for all 1 j n in the case of static α and β. In the case of time-varying α and β (for ARG), let yj = αrj if pj αrj and yj = βrj if pj > αrj. The relaxed-stretch (or adaptive-relaxed-stretch in ARG) for Jj at the time t can be uniformly written as t rj

yj for any RG-based algorithm. Let 0 = t0 t1 t2 ... t P be the sorted elements in set { yi rj yj ri

yi yj | 1 i < j n, yi = yj, yi rj yj ri

yi yj 0} {0}, where every element corresponds to

the solution to equation t ri

yj with variable t for some i, j. Recall that the algorithm runs the

active job Jj with the largest t rj

yj at any time t (with tie-breaking by smaller job index first). Thus, to conclude a time-varying priority execution, we only need to show that the relative priority between jobs (in terms of t rj

yj ) for the entire job set remains the same in every interval (tx, tx+1). Suppose,

for proof by contradiction, that there are two jobs Ja and Jb with t ra

yb and t ra

yb with t < t and t , t (tx, tx+1) for some x. Then, we have:

tx < t < ya rb yb ra

ya yb < t < tx+1

which contradicts the construction of the time partitioning 0 = t0 t1 t2 ... t P .

Lemma J.4. Any algorithm with time-varying priority execution for all problem inputs is monotonic.

Proof. Consider any algorithm A with time-varying priority execution for all problem inputs, any two instances with identical input I and actual job sizes (p 1, ..., p n) and (p 1 , ..., p n ) such that p j p j

for all j. Let S(1) and S(2) be the max-stretch obtained by A on these two instances, and we denote the schedules by σ1 and σ2. To show S(1) S(2), we show C(1) j C(2) j for all 1 j n, where

C(1) j and C(2) j denote the completion time for job Jj in the two schedules. Let p(1) j (t) and p(2) j (t) denote the remaining job size of job Jj at time t in the two schedules, i.e., p j minus the total amount

of processing allocated to job Jj up to time t. To show C(1) j C(2) j , we show that algorithm A

maintains p(1) j (t) p(2) j (t) for all 1 j n, t 0. Let 0 = t0 t1 ... t P be the time partition (for algorithm A running against I) with a mapping F such that A always processes the active job that has the highest priority indicated by F(tx, tx+1).

We show p(1) j (t) p(2) j (t) for all 1 j n, t < tx, x 1 via induction on x (we slightly abuse the notation to let t P +1 represent ). The base case is to consider the interval [t0, t1). Initially, p(1) j (0) = p j p j = p(2) j (0). Fix any time t (t0, t1). The ordering F(t0, t1) defines the job priority in (t0, t1). Schedules σ1 and σ2 are processing (either) the same or different jobs at time t. If they are processing the same job Js, it follows p(1) j (t) = 0 = p(2) j (t) for every job Jj with a priority

higher than Js; p(1) j (t) = p(1) j (0) p(2) j (0) = p(2) j (t) for every job Jj with a priority lower than Js.

It also follows p(1) s (t) p(2) s (t), as p j p j for every job Jj with a higher priority than Js. If they are processing different jobs Ja (in σ1) and Jb (in σ2), it follows that Ja has a lower priority than Jb in F(t0, t1) and Jb has been completed in σ1 at time t. Therefore, any job Jj with a priority higher than Ja has been completed in σ1, i.e., p(1) j (t) = 0 p(2) j (t); any job Jj with a priority no higher

than Ja has not been processed in σ2, i.e., p(1) j (t) p j p j = p(2) j (t). The arguments hold for

every t (t0, t1). For the inductive step, suppose p(1) j (t) p(2) j (t) for all 1 j n, t < tk for

some k 1. Consider the interval [tk, tk+1). By inductive hypothesis, we have p(1) j (tk) p(2) j (tk) for all 1 j n. The inductive step uses a generalized version of the arguments to show the base case. Fix any time t (tk, tk+1). The ordering F(tk, tk+1) defines the job priority in (tk, tk+1). If σ1 and σ2 are processing the same job Js, it follows p(1) j (t) = 0 = p(2) j (t) for every job Jj with a

Published as a conference paper at ICLR 2025

priority higher than Js; p(1) j (t) = p(1) j (tk) p(2) j (tk) = p(2) j (t) for every job Jj with a priority lower

than Js; p(1) s (t) p(2) s (t), as p(1) j (tk) p(2) j (tk) for every job Jj with a higher priority than Js. If

they are processing different jobs Ja (in σ1) and Jb (in σ2), it follows that p(1) j (t) = 0 p(2) j (t) for

any job Jj with a priority higher than Ja and p(1) j (t) p(1) j (tk) p(2) j (tk) = p(2) j (t) for any job Jj with a priority no higher than Ja.

Lemma J.5 (Lemma 6.3 Restated). All algorithms mentioned in this paper (RG, ARG, PRG, RGx, and RR) are monotonic.

Proof for Lemma 6.3. Any RG-based algorithm (RG, ARG, PRG, and RGx) is monotonic by Lemmas J.3 and J.4. It is trivial to see that RR is monotonic.

Theorem J.6 (Theorem 6.4 Restated). Given a monotonic algorithm A with competitive ratio c A, one can obtain a (1 + ϵ)-speed min{c A, n

ϵ }-competitive (RR-augmented) algorithm.

Proof for Theorem 6.4. With a (1 + ϵ)-speed machine, we allocate one-unit speed to algorithm A and ϵ-unit speed to RR. That is, for every time unit, both algorithms run in parallel: algorithm A is run at a rate of 1 1+ϵ, and RR is run at a rate of ϵ 1+ϵ. Let the completion time for Jj be Cj in this schedule.

When running A standalone on a unit-speed machine, the completion time CA j satisfies

max1 j n CA j rj

p j c A S , where S is the optimal max-stretch, and c A is the competitive ratio of

algorithm A. Similarly, for RR running standalone with ϵ-speed, we have max1 j n CR j rj

where CR j is the completion time in this schedule.

Due to the monotonicity of algorithms A and RR, we have:

max 1 j n Cj rj

p j max 1 j n min{CA j , CR j } rj p j max 1 j n min{c A S , n

ϵ S } = min{c A, n

Thus, the combined algorithm achieves a competitive ratio of min{c A, n

J.1 OPTIMALITY OF RG+

The proof of Theorem B.1 can be extended to show that no deterministic algorithm can achieve a competitive ratio better than n 1+ϵ for online non-clairvoyant max-stretch scheduling with a machine of speed (1 + ϵ), even if all jobs are released at time 0. The idea of the extension is as follows.

Consider n identical jobs all with job size 1 to be released at time 0. Run any algorithm A against this set of jobs. By time 1, let J1 be the job with the least amount of the processed size l1. We have l1 1+ϵ

n by the pigeonhole principle. We can then construct an instance with n jobs, all released at time 0, where the job sizes are defined as p 1 = 1+ϵ

n + λ and the rest geometrically increasing sizes, i.e., p 2 = ψ p 1, p 3 = ψ2 ψ, p 4 = ψ3 ψ2, ..., p n = ψn 1 ψn 2, where ψ is a large constant and λ a small constant. In this setup, the optimal max-stretch using a unit-speed machine approaches 1, while the max-stretch achieved by algorithm A with a faster machine is at least approximately n 1+ϵ. This establishes a lower bound for the (1 + ϵ)-speed competitive ratio as n 1+ϵ. Therefore, algorithm RG+ achieves an asymptotically optimal robustness under speed augmentation.

K JOB SIZE DISTRIBUTIONS IN EXPERIMENTS

We aim to replicate real-world data size distributions in our synthetic datasets. Figure 8 shows the job size distributions in the real-world datasets (Google Google (2019), Alibaba Alibaba (2023), and Azure Cortez et al. (2017)). Job sizes generally follow an exponential distribution, except for Google s data, which has more randomness in the right tail. With the maximum job size ratio P, we set p j to max{1, log X A} where X is a random variable with X U(0, 1) and A a scaling

Published as a conference paper at ICLR 2025

0.0 0.2 0.4 0.6 0.8 1.0 Job Size

0.00 0.05 0.10 0.15 0.20 0.25 Job Size

0.00 0.02 0.04 0.06 0.08 0.10 0.12 Job Size

Figure 8: Real-world Job Size Distribution.

Job Size Distribution (KDE)

Figure 9: Synthetic Job Size Distribution

factor ensuring p j P. Figure 9 shows the Kernel Density Estimation (KDE) of the job sizes in the synthetic datasets, showing that the synthetic job sizes follow an exponential distribution aligned with the real-world applications.

L EXPERIMENTS FOR VARIANCE OF STRETCH

Figure 10: The variance of stretch with increasing jobs. Each box represents the distribution of the normalized variance scores. The number of jobs is set to 1000 (Row 1 Column 1), 2000 (R1C2), 3000 (R1C3), 4500 (R2C1), 6000 (R2C2), and 10000 (R2C3). Jobs are released in time [0, 5000] uniformly at random. The job sizes are drawn from [1, 10]. The prediction error is set to 1 (i.e., pj = p j ).

This section presents the experimental results for GWR, RR, RG, ARG, and RG+ in minimizing the variance of stretch, defined as:

variance of stretch = 1

j=1 (Sj S)2

Published as a conference paper at ICLR 2025

Figure 11: The variance of stretch with an increasing P . The job sizes are drawn from interval [1, P ], where P is set to 1 (R1C1), 2 (R1C2), 5 (R1C3), 10 (R2C1), 20 (R2C2), and 40 (R2C3). The number of jobs is fixed at 4500. Jobs are released in time [0, 5000] uniformly at random. The prediction error is set to 1.

where S = 1

n Pn j=1 Sj. The variance of stretch measures fairness from the view of jobs as if they are compared to each other. We run the experiments on the synthetic datasets under the same settings as Section 7. In presenting the results, we apply a min-max normalization to the variance scores, i.e., dividing the variance by the difference between the maximum and minimum variance of all problem instances under each setting.

Figures 10 and Figure 11 present the normalized variance score under increasing n and P with perfect predictions. The performance ordering of these algorithms generally aligns with that in the max-stretch minimization: a lower max-stretch performance ratio corresponds to a lower variance score. RG+ shows its superior performance under all the settings. This suggests that RG+ ensures fairness in resource allocation measured in the stretch space. In contrast, RR suffers the consistent worst variance scores. This worsened performance is the cost of ignoring the job size information in fair scheduling.

1 2 4 8 16 32 64 128 512 Prediction Error

GWR RG ARG RR RG+

Figure 12: The variance of stretch with an increasing prediction error η. The prediction error increases from 1 to 512. The number of jobs is fixed at 4500. Jobs are released in time [0, 5000] uniformly at random. The job sizes are drawn from [1, 10].

Figure 12 presents the results under an increasing η. The variance scores of GWR and RR are constants, as their executions are irrelevant to job size predictions. The variance score of RG and ARG increases with an increasing prediction error, showing a weakening fairness guarantee with worsening predictions. RG+ demonstrates its robustness in minimizing the variance of stretch under imperfect predictions: the variance score increases slowly and is capped as η increases. Overall, the distinction in the variance scores is evident: RG+ outperforms the rest, followed by RG and ARG, and then GWR, with RR the worst.

Published as a conference paper at ICLR 2025

Performance Ratio

(Response Time)

Performance Ratio

(Response Time)

Performance Ratio

(Response Time)

Performance Ratio

(Response Time)

Performance Ratio

(Response Time)

Performance Ratio

(Response Time)

Figure 13: Performance ratios for the mean response time with increasing jobs. Each box represents the distribution of the performance ratios. The number of jobs is set to 1000 (Row 1 Column 1), 2000 (R1C2), 3000 (R1C3), 4500 (R2C1), 6000 (R2C2), and 10000 (R2C3). Jobs are released in time [0, 5000] uniformly at random. The job sizes are drawn from [1, 10]. The prediction error is set to 1 (i.e., pj = p j ).

Performance Ratio

(Response Time)

Performance Ratio

(Response Time)

Performance Ratio

(Response Time)

Performance Ratio

(Response Time)

Performance Ratio

(Response Time)

Performance Ratio

(Response Time)

Figure 14: Performance ratios for the mean response time with an increasing P . The job sizes are drawn from interval [1, P ], where P is set to 1 (R1C1), 2 (R1C2), 5 (R1C3), 10 (R2C1), 20 (R2C2), and 40 (R2C3). The number of jobs is fixed at 4500. Jobs are released in time [0, 5000] uniformly at random. The prediction error is set to 1.

1 2 4 8 16 32 64 128 512 Prediction Error

Performance Ratio

(Response Time)

GWR RG ARG RR RG+

Figure 15: Performance ratios for the mean response time with an increasing prediction error η. The prediction error increases from 1 to 512. The number of jobs is fixed at 4500. Jobs are released in time [0, 5000] uniformly at random. The job sizes are drawn from [1, 10].

Published as a conference paper at ICLR 2025

M EXPERIMENTS FOR RESPONSE TIME

This section presents the experimental results for GWR, RR, RG, ARG, and RG+ in minimizing the mean response time, defined as:

mean response time = 1

The mean response time is used to measure the efficiency of scheduling. A lower response time indicates lower waiting time and an improved quality of service. While our proposed algorithms ensure fairness of scheduling, we are interested in whether this comes with a cost in efficiency. We compare the mean response time of our algorithms with that of GWR and RR. We measure the performance ratio for the response time, defined as the ratio between the mean response time obtained by an algorithm and the optimal value given a problem instance. Shortest Remaining Processing Time First (SRPT) is used to compute the optimal mean response time Baker (1974). We run the experiments on the synthetic datasets under the same settings as Section 7.

Figures 13 and 14 present the performance ratio for the mean response time under increasing n and P with perfect predictions. RR has the worst performance, as active jobs mutually delay each other when they constantly share the resource equally. The performance ordering of the other algorithms aligns with that in the max-stretch minimization. It suggests that reducing the max-stretch does not come with a high cost in response time. While RG achieves outstanding performance in minimizing response time, its augmented variant, RG+, emerges as the champion again.

Figure 15 presents the results under an increasing η. The performance ratios of RG and ARG increase marginally with an increasing prediction error, showing their robustness in minimizing response time with imperfect predictions. In addition, the performance ratio stays close to that of GWR, even with badly wrong predictions. Though the performance ratio increases as that of RG, RG+

shows its superior performance by its winning performance ratio and the robustness to the prediction error. Overall, RG+ performs well in minimizing response time while having a strong guarantee for fairness.

N EXPERIMENTS ON REAL-WORLD DATASETS

1 2 3 4 5 6 Prediction Error

Performance Ratio

GWR RG ARG RR RG+

Figure 16: Performance ratios with an increasing prediction error on real-world trace-log data from Google Cloud. The prediction error increases from 1 to 6.

Figure 16, 18, and 17 presents the results on the real-world datasets. The data are server trace logs collected from Google, Azure, and Alibaba Cloud. Each dataset contains the arrival time and job size of the jobs submitted to the cloud. We generate job size predictions with a prediction error from 1 to 6: with the prediction error η and job size p j, we set pj p j exp(Y ) with Y U( log η, log η). On Azure and Alibaba datasets, RG and ARG, under a low prediction error, are comparable to GWR, with RR the worst. The performance of RG and ARG gradually becomes worse than RR as the prediction error increases. RG+ consistently outperforms the others. The overall performance of all algorithms on Alibaba and Auzre datasets aligns with the results on the synthetic datasets. On the Google dataset, the performance ratios are not sensitive to the prediction error due to the distribution of job size being closer to uniform than exponential. RG+ consistently achieves the best performance. This time, however, RR outperforms RG, ARG, and GWR due to a high maximum job size ratio P

Published as a conference paper at ICLR 2025

1 2 3 4 5 6 Prediction Error

Performance Ratio

GWR RG ARG RR RG+

Figure 17: Performance ratios with an increasing prediction error on real-world trace-log data from Azure Cloud. The prediction error increases from 1 to 6.

1 2 3 4 5 6 Prediction Error

Performance Ratio

GWR RG ARG RR RG+

Figure 18: Performance ratios with an increasing prediction error on real-world trace-log data from Alibaba Cloud. The prediction error increases from 1 to 6.

in the Google dataset (compared to the others). RR s performance does not worsen with an increasing P, but the others do.

O ADDITIONAL EXPERIMENTS WITH A UNIT-SPEED RG+

1 2 4 8 16 32 64 128 512 Prediction Error

Performance Ratio

GWR RG ARG RR RG+ RGF

Figure 19: Performance ratios for the max-stretch with an increasing prediction error η. The prediction error increases from 1 to 512. The number of jobs is fixed at 4500. Jobs are released in time [0, 5000] uniformly at random. The job sizes are drawn from [1, 10].

We performed additional experiments where we set the speed of RG+ to be the same as the others. Figures 19, 20, and 21 presents the results for max-stretch, variance of stretch, and response time. The label RGF denotes the RG+ running with a unit-speed machine, where 0.3-speed is allocated to RR and the rest 0.7-speed to RG.

The performance of RGF is not as good as RG+ due to the slower machine, but the performance generally stays bounded in between RG and RR for all metrics. This is as expected. The intuition is that when RG performs well on an instance, some portion of the processing is shared to RR, thus decreasing the overall performance; when RR performs well, again, some portion is shared to RG,

Published as a conference paper at ICLR 2025

1 2 4 8 16 32 64 128 512 Prediction Error

GWR RG ARG RR RG+ RGF

Figure 20: The variance of stretch with an increasing prediction error η. The prediction error increases from 1 to 512. The number of jobs is fixed at 4500. Jobs are released in time [0, 5000] uniformly at random. The job sizes are drawn from [1, 10].

1 2 4 8 16 32 64 128 512 Prediction Error

Performance Ratio

(Response Time)

GWR RG ARG RR RG+ RGF

Figure 21: Performance ratios for the mean response time with an increasing prediction error η. The prediction error increases from 1 to 512. The number of jobs is fixed at 4500. Jobs are released in time [0, 5000] uniformly at random. The job sizes are drawn from [1, 10].

thus decreasing the overall performance. In return, however, the robustness is persevered, making the performance relatively stable under varying prediction errors.