# Online Budgeted Matching with General Bids

Jianyi Yang, University of Houston, Houston, TX, USA, jyang71@central.uh.edu
Pengfei Li, University of California, Riverside, Riverside, CA, USA, pli081@ucr.edu
Adam Wierman, California Institute of Technology, Pasadena, CA, USA, adamw@caltech.edu
Shaolei Ren, University of California, Riverside, Riverside, CA, USA, shaolei@ucr.edu

Online Budgeted Matching (OBM) is a classic problem with important applications in online advertising, online service matching, revenue management, and beyond. Traditional online algorithms typically assume a small-bid setting, where the maximum bid-to-budget ratio κ is infinitesimally small. While recent algorithms have tried to address scenarios with non-small or general bids, they often rely on the Fractional Last Matching (FLM) assumption, which allows for accepting partial bids when the remaining budget is insufficient. This assumption, however, does not hold for many applications with indivisible bids. In this paper, we remove the FLM assumption and tackle the open problem of OBM with general bids. We first establish an upper bound of 1 − κ on the competitive ratio for any deterministic online algorithm. We then propose a novel meta algorithm, called MetaAd, which reduces to different algorithms with the first known provable competitive ratios parameterized by the maximum bid-to-budget ratio κ ∈ [0, 1]. As a by-product, we extend MetaAd to the FLM setting and obtain provable competitive algorithms. Finally, we apply our competitive analysis to the design of learning-augmented algorithms.

1 Introduction

Online Budgeted Matching (OBM) with general bids is a fundamental online optimization problem that generalizes many important settings, such as online bipartite matching and Adwords with equal bids [23]. It has applications in various domains, including online advertising, online resource allocation, and revenue management, among others [5, 16, 32]. OBM is defined on a bipartite graph with a set of offline nodes (bidders) and a set of online nodes (queries). The task is to select an available offline node to match with an online query in each round. When an offline node is matched to an online node, a bid value is subtracted from the budget of the offline node, and a reward equal to the consumed budget is obtained. If the remaining budget of an offline node is less than the bid value of an online query, the offline node cannot be matched to the online query. The goal is to maximize the total reward throughout the entire online matching process.

OBM is challenging due to the nature of online discrete decisions. Previous works have studied this problem under one of the following two additional assumptions on bids or matching rules:

Small bids. The small-bid assumption is a special case of general bids corresponding to the maximum bid-budget ratio κ → 0. That is, while the bid values can vary arbitrarily, the size of each individual bid is infinitely small compared to each offline node's budget, and there is always enough budget for matching. Under this assumption, the first online algorithm was provided by [24], achieving an optimal competitive ratio of 1 − 1/e [23]. This competitive ratio has also been attained by subsequent algorithms based on primal-dual techniques [4, 7]. However, the small-bid assumption significantly limits these algorithms for broader applications in practice.
Take the application of matching Virtual Machines (VMs) to physical servers as an example: an online VM request typically takes up a non-negligible fraction of the total computing units in a server.

Fractional last matching (FLM). Under FLM, if an offline node has an insufficient budget for an online query, the offline node can still be matched to the query, obtaining a partial reward equal to the remaining budget. Given the limitations of small bids, some recent studies [15, 29, 30] have designed competitive algorithms for OBM with general bids by making the additional assumption of FLM. For example, under FLM, the greedy algorithm (Greedy) achieves a competitive ratio of 1/2, while other studies [4, 15, 29, 30] aim to achieve a competitive ratio greater than 1/2 under various settings and/or using randomized algorithms. Although FLM allows fractional matching of a query to an offline node with an insufficient budget, it essentially assumes that any bid is potentially divisible. This assumption may not hold in many real applications: e.g., allocating fractional physical resources to a VM can result in significant performance issues that render the allocation unacceptable, and charging a fractional advertising fee may not be allowed in online advertising.

Despite its practical relevance and theoretical importance, OBM with general bids has remained a challenging open problem in the absence of the small-bid and FLM assumptions. Specifically, an offline node may have insufficient budget and hence cannot be matched to a later query with a large value, potentially causing large sub-optimality in the worst case. This issue does not apply to small bids, as the small-bid setting implies that insufficient budgets never occur. Additionally, this challenge is alleviated in the FLM setting, where fractional matching in cases of insufficient budgets can reduce sub-optimality. Indeed, removing the small-bid and FLM assumptions fundamentally changes and adds significant challenges to the problem of OBM [30]. To further highlight the intrinsic difficulty of OBM with general bids, we formally prove in Proposition 4.1 an upper bound of 1 − κ on the competitive ratio achieved by any deterministic online algorithm, where κ ∈ [0, 1] is the maximum bid-budget ratio.

Contributions: In this paper, we address OBM without the small-bid or FLM assumptions and design a meta algorithm called MetaAd, which reduces to different algorithms with provable competitive ratios. To our knowledge, MetaAd is the first provably competitive algorithm for general bids without the FLM assumption. Specifically, MetaAd generates a discounted score for each offline node via a general discounting function, which is then used to select the offline node. The discounting function evaluates the degree of budget insufficiency given a bid-budget ratio κ ∈ [0, 1], addressing the challenge of infeasible matching due to insufficient budgets. Given different discounting functions, MetaAd yields concrete algorithms, and their competitive ratios are derived from Theorem 4.2, established through a novel proof technique. We show that with small bids (i.e., κ → 0), MetaAd recovers the optimal competitive ratio of 1 − 1/e. Furthermore, we show that MetaAd, with discounting functions from the exponential and polynomial function classes, achieves a positive competitive ratio for κ ∈ [0, 1). As an extension, we adapt the design of MetaAd to the FLM setting, resulting in a meta algorithm with provable competitive ratios for κ ∈ [0, 1] (Theorem 4.3).
The framework of MetaAd potentially opens an interesting direction for exploring concrete discounting function designs that yield high competitive ratios for settings both with and without FLM. Finally, we apply our competitive analysis to the design of LOBM, a learning-augmented algorithm for OBM, which enhances average performance while still guaranteeing a competitive ratio (Theorem 5.1). We validate the empirical benefits of MetaAd and LOBM through numerical experiments on the applications of online movie matching and VM placement on physical servers.

2 Related Work

OBM originates from the online bipartite matching problem defined by [19] 30 years ago. In 2007, [24] generalized the online b-matching problem [17] to OBM (a.k.a. Adwords). Under the special case of small bids, [24] proposes an algorithm that achieves a competitive ratio of 1 − 1/e, which is also the optimal competitive ratio under the small-bid setting [17]. In the same year, [4] provides a primal-dual algorithm and analysis for OBM under the small-bid assumption and achieves the competitive ratio of 1 − 1/e. Subsequently, [7] gives a randomized primal-dual analysis for online bipartite matching and generalizes it to OBM. In addition, OBM has also been studied under stochastic settings [9, 8, 6, 14, 25].

It is known to be very challenging to go beyond the small-bid assumption and develop a non-trivial competitive ratio for OBM with general bids in an adversarial setting. Recently, [30] points out the inherent difficulty of OBM with general bids and explains why the FLM assumption is needed to ease these challenges. With the FLM assumption, a greedy algorithm can achieve a competitive ratio of 1/2. Additionally, a deterministic algorithm proposed in [4] achieves a competitive ratio of 1 − κ − (1 − κ)/(1 + κ)^{1/κ}, which exceeds 1/2 when the maximum bid-budget ratio κ is no larger than 0.17. Some other works on OBM with FLM employ randomized algorithm designs. For example, [15] proposes a semi-random algorithm that achieves a competitive ratio of 0.5016, the best competitive ratio achieved by randomized algorithms to date. Besides, [30] extends the randomized algorithm Ranking to OBM and achieves a competitive ratio of 1 − 1/e under a strong fake-budget assumption. Moreover, [29] proposes a randomized algorithm without knowledge of budgets for the FLM setting with a competitive ratio of (1/(1 + κ))(1 − 1/e) parameterized by the maximum bid-budget ratio κ, which relies on a strong assumption that the bids are decomposable (i.e., w_{u,t} = w_u · w_t).

Despite the progress on OBM settings with FLM, OBM without FLM has remained an open challenge except under the small-bid assumption. The recent study [30] points out that OBM without FLM is difficult because, when an offline node has less leftover budget than the bid value of an online arrival, the offline node is not allowed to be matched to the arrival, potentially causing a loss equivalent to the leftover budget. [20] considers multi-tier budget constraints with a laminar structure and provides a competitive ratio without FLM, but the result does not apply to settings without the laminar structure. Additionally, [18] proves an upper bound of 0.612 on the competitive ratio (in expectation) of online randomized algorithms; an upper bound for deterministic algorithms has been lacking to formally characterize the difficulty of OBM without FLM.

3 Problem Formulation

We consider OBM with general bids.
Specifically, there is a bipartite graph G(U, V, E), where the vertices u ∈ U (i.e., offline nodes or bidders) are fixed and the vertices v ∈ V (i.e., online nodes or queries) arrive sequentially. The edge corresponding to vertices u ∈ U and v ∈ V has a bid value w_{u,v} ≥ 0, which is the amount the offline node u would like to pay for the online node v if matched. The sizes of the vertex sets are denoted as |U| = U and |V| = V, respectively. We index each online node by its arrival order, i.e., online node t arrives in the t-th round. At the beginning, each offline node u ∈ U has an initial budget b_{u,0} = B_u ≥ 0, which is the maximum amount the offline node can pay in total. At round t, a query t ∈ V arrives, the edges connected to this arrival t are revealed, and the agent chooses either to match the arrival to an available offline node from U_t = {u ∈ U | w_{u,t} > 0, b_{u,t−1} ≥ w_{u,t}}, or to skip this query without any matching. We denote the agent's action as x_t ∈ U_t ∪ {null}, where null represents skipping this arrival. If at round t an offline node x_t is matched to the query t, a reward r_t = w_{x_t,t} is earned and a bid value w_{x_t,t} is charged from the offline node x_t, i.e., b_{x_t,t} = b_{x_t,t−1} − w_{x_t,t}; for the other offline nodes u ≠ x_t, the budget remains unchanged, b_{u,t} = b_{u,t−1}. If x_t = null, no reward is earned, i.e., r_t = 0, and the budgets of all offline nodes remain the same as in the last round, i.e., b_{u,t} = b_{u,t−1}, ∀u ∈ U. The cumulative consumed budget at the end of round t is denoted as c_{u,t} = B_u − b_{u,t}, ∀u ∈ U and t ∈ [V ]. The agent aims to maximize the total reward P = Σ_{t=1}^V r_t over the entire V rounds.

The offline version of OBM can be written as a Linear Program (LP) with its primal and dual problems given in (1), where P is the primal objective and D is the dual objective. While we only need to solve the primal problem for OBM, we present the dual problem with dual variables α_u, u ∈ U, and β_t, t ∈ [V ], to facilitate the subsequent algorithm design and analysis.

(Primal) P = max Σ_{t=1}^V Σ_{u∈U} w_{u,t} x_{u,t}
  s.t. Σ_{t=1}^V w_{u,t} x_{u,t} ≤ B_u, ∀u ∈ U; Σ_{u∈U} x_{u,t} ≤ 1, ∀t ∈ [V ]; x_{u,t} ≥ 0.
(Dual) D = min Σ_{u∈U} B_u α_u + Σ_{t=1}^V β_t
  s.t. w_{u,t} α_u + β_t ≥ w_{u,t}, ∀u ∈ U, t ∈ [V ]; α_u ≥ 0, ∀u ∈ U; β_t ≥ 0, ∀t ∈ [V ].   (1)

A common performance metric for online algorithms is the competitive ratio, defined as

η = min_{G∈G} {P(G)/P*(G)},   (2)

where the minimization is taken over the set G of all possible bipartite graphs (in this paper, two graphs with different orders of online nodes are considered as two different graphs), P(G) = Σ_{t=1}^V w_{x_t,t} is the total reward obtained by an (online) algorithm for a graph G ∈ G, and P*(G) = Σ_{t=1}^V w_{x*_t,t} is the corresponding offline optimal total reward, with x*_t being the offline optimal solution to (1). Next, we formally define the bid-budget ratio κ ∈ [0, 1] in Definition 1, which is the maximum ratio of the bid value of an offline node to its total budget. We use η(κ) to denote the competitive ratio of an algorithm for OBM with bid-budget ratio κ.

Definition 1 (Bid-budget ratio). The bid-budget ratio κ ∈ [0, 1] for an example G(U, V, E) is defined as κ = sup_{u∈U, t∈V} w_{u,t}/B_u.
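To make the offline benchmark concrete, the following minimal sketch (ours, not part of the paper) solves the primal LP in (1) with SciPy; the bid matrix, budgets, and instance sizes are hypothetical placeholders.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical instance: U = 2 offline nodes, V = 3 online queries.
w = np.array([[0.5, 0.4, 0.6],       # w[u, t]: bid of offline node u for query t
              [0.3, 0.7, 0.2]])
B = np.array([1.0, 1.0])             # budgets B_u
U, V = w.shape

# Variables x[u, t], flattened row-major; maximize sum w*x <=> minimize -sum w*x.
c = -w.flatten()
A_ub, b_ub = [], []
for u in range(U):                   # budget constraints: sum_t w[u,t] x[u,t] <= B_u
    row = np.zeros(U * V)
    row[u * V:(u + 1) * V] = w[u]
    A_ub.append(row); b_ub.append(B[u])
for t in range(V):                   # matching constraints: sum_u x[u,t] <= 1
    row = np.zeros(U * V)
    row[t::V] = 1.0
    A_ub.append(row); b_ub.append(1.0)

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=(0, None))
print("offline optimum P* =", -res.fun)   # the benchmark P*(G) in the ratio (2)
```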
Many previous works [23, 24, 15, 4] assume FLM, which allows for accepting partial bids when the remaining budget is insufficient (i.e., modifying each bid w_{u,t} to w̃_{u,t} = min{w_{u,t}, b_{u,t−1}} given the remaining budget b_{u,t−1}). Without the FLM assumption, the only known competitive ratio for OBM is for the small-bid setting where κ is infinitely small and approaches zero [23, 24]. However, the small-bid and FLM assumptions do not hold in many real-world applications, as illustrated by the following examples.

Online VM placement. In this problem, a cloud manager allocates virtual machines (VMs, online nodes) to heterogeneous physical servers (offline nodes), each with a computing resource capacity of B′_u [10, 28]. When a VM request with a computing load of z_t arrives, the manager assigns it to a server. If the VM is placed on server u, the manager receives a utility of w_{u,t} = r_u z_t due to the heterogeneity of servers. The goal is to maximize the total utility Σ_{t=1}^V Σ_{u∈U} w_{u,t} x_{u,t} subject to the computing resource constraint Σ_{t=1}^V z_t x_{u,t} ≤ B′_u for each server u, which can also be written as Σ_{t=1}^V w_{u,t} x_{u,t} ≤ B_u with B_u = r_u B′_u. In this problem, VMs are not divisible and consume a non-negligible portion of the server capacity, violating both the small-bid and FLM assumptions.

Inventory management with indivisible goods. Here, a manager must match several indivisible goods (online nodes) to various resource nodes (offline nodes), each with a limited capacity (e.g., matching parcels to mail trucks or food orders to delivery vehicles). Each good can only be assigned to one node without being split, and a good t can occupy a substantial portion w_{u,t} of the resource node's capacity. The goal is to maximize the total utilization Σ_{t=1}^V Σ_{u∈U} w_{u,t} x_{u,t}, subject to the capacity constraint Σ_{t=1}^V w_{u,t} x_{u,t} ≤ 1 for each node u. In this problem, neither the small-bid nor the FLM assumption applies.

In this paper, without relying on the small-bid or FLM assumptions, we move beyond small bids and propose a meta algorithm (MetaAd) for general κ ∈ [0, 1] which reduces to many concrete competitive algorithms.

4 MetaAd: Meta Algorithm for OBM

4.1 An Upper Bound on the Competitive Ratio

In the absence of the small-bid and FLM assumptions, OBM faces a unique challenge: when an online query with a large bid arrives, there may be no offline node that both connects to the query and has a sufficient remaining budget for it. This leads to missed matches for the queries with large bids, ultimately resulting in a low competitive ratio. To formally show the inherent difficulty of OBM without the small-bid and FLM assumptions, we present an upper bound on the competitive ratio of any deterministic online algorithm in the following proposition.

Proposition 4.1. For OBM without the small-bid or FLM assumptions, the competitive ratio of any deterministic online algorithm is upper bounded by 1 − κ for κ ∈ (0, 1]. In particular, the competitive ratio of any deterministic algorithm is zero when κ = 1 without the FLM assumption.

The proof of the upper bound is deferred to Appendix A.1; its key ingredients are given below. The best competitive ratio of any deterministic algorithm is max_π min_{G∈G} CR(π, G), which is no larger than max_π min_{G∈G′} CR(π, G) for any subset G′ ⊆ G. Thus, we can prove the upper bound by constructing a subset G′ of difficult instances and deriving the best competitive ratio among deterministic algorithms for this subset G′.
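To preview the construction with concrete numbers, the following hand-worked instance (our illustration; the general version follows below and in Appendix A.1) uses one offline node with budget 1 and κ = 1/2, where the first V − 1 bids all equal ω and sum to 1/2 + ϵ, and the last bid is either 1/2 or 0:

```latex
\begin{align*}
\text{match all early bids:}\quad & P = \tfrac12 + \epsilon,\;\;
  P^\ast = 1 + \epsilon - \omega \;\Rightarrow\; P/P^\ast \to \tfrac12 = 1-\kappa
  \quad (\text{last bid} = \kappa),\\
\text{skip one early bid:}\quad & P = \tfrac12 + \epsilon - \omega,\;\;
  P^\ast = \tfrac12 + \epsilon \;\Rightarrow\; P/P^\ast \to \tfrac12
  \quad (\text{last bid} = 0,\ \omega \text{ up to} \approx \tfrac14).
\end{align*}
```

Either way, as ϵ → 0, no deterministic rule can beat a ratio of 1 − κ = 1/2 on both instance families simultaneously.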
In our constructed G′, each example has one offline node, and the bid values for the first V − 1 rounds sum up to 1 − κ + ϵ with ϵ > 0 being infinitely small. We let G′ = G′_1 ∪ G′_2, where w_{u,V} = κB_u for examples in G′_1, and w_{u,V} = 0 for examples in G′_2. The instances in G′ illustrate the dilemma between matching a query for immediate reward and saving the budget for future matches. If an algorithm chooses to match all the queries in the first V − 1 rounds, it can lose a bid of κB_u on instances in G′_1 because there is no sufficient budget to match the final query. Conversely, if an algorithm chooses to skip some queries in the first V − 1 rounds to save budget, it can lose the skipped bids (up to κB_u) on instances in G′_2 because matching the final query of instances in G′_2 earns zero bid. For these difficult instances in G′, we formally derive the largest reward ratio of a deterministic algorithm to the offline optimal one, which is the upper bound of the competitive ratio.

The upper bound 1 − κ on the competitive ratio shows that OBM becomes more difficult as the bid-budget ratio κ gets larger. Intuitively, since a bid cannot be fractionally matched, given a larger κ ∈ [0, 1], it is more likely for offline nodes to have insufficient budgets (i.e., to be unable to be matched to a query with a large bid value). Skipping a query to save budget is also risky, because the following queries may carry no positive bid. The FLM assumption alleviates this difficulty because the remaining budget can be fully spent even if it is insufficient: a greedy algorithm achieves a competitive ratio of 1/2 for general bids with FLM [23]. By contrast, the upper bound on the competitive ratio without FLM falls below 1/2 once the bid-budget ratio κ exceeds 1/2, and no non-zero competitive ratio exists when κ = 1. These observations reveal that OBM becomes strictly harder without FLM, and the analysis without the FLM assumption helps us understand OBM better.

4.2 Meta Algorithm Design

We now present MetaAd in Algorithm 1, a meta online algorithm that reduces to many concrete algorithms with provable competitive ratios for OBM in the absence of the small-bid and FLM assumptions.

Algorithm 1 Meta Algorithm (MetaAd)
Require: The function ϕ : [0, 1] → [0, 1]
1: Initialization: ∀u ∈ U, the remaining budget b_{u,0} = B_u.
2: for t = 1 to V , a new vertex t ∈ V arrives do
3:   For u ∈ U, set s_{u,t} = w_{u,t} ϕ(b_{u,t−1}/B_u) if b_{u,t−1} − w_{u,t} ≥ 0, and s_{u,t} = 0 otherwise.
4:   if ∀u ∈ U, s_{u,t} = 0 then
5:     Skip the online arrival t (x_t = null).
6:   else
7:     Select x_t = arg max_{u∈U} s_{u,t}.
8:   end if
9:   Update budget: if x_t ≠ null, b_{x_t,t} = b_{x_t,t−1} − w_{x_t,t}; and ∀u ≠ x_t, b_{u,t} = b_{u,t−1}.
10: end for

MetaAd relies on a general discounting function ϕ : [0, 1] → [0, 1]. Given a new query t, MetaAd uses ϕ to score each offline node by discounting its bid value in Line 3, and selects the node with the largest score s_{u,t} in Lines 4-7. An offline node is scored zero if it has an insufficient budget for the query t (b_{u,t−1} − w_{u,t} < 0) or a zero bid for the query t (w_{u,t} = 0). If all the offline nodes are scored zero, the algorithm skips the query t. The scoring in MetaAd reflects a balance between selecting an offline node with a large bid and saving budget for the future. To select offline nodes with large bids, the score scales with the bid value w_{u,t}. Simultaneously, an increasing function ϕ maps the normalized remaining budget b_{u,t−1}/B_u to a discounting value within [0, 1]. If an offline node u has less remaining budget, a smaller discounting value is obtained, encouraging the conservation of budget for u.
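The following is a minimal executable sketch of Algorithm 1 (our transcription, not the authors' code), assuming the bid matrix is revealed one column per round:

```python
import numpy as np

def meta_ad(w, B, phi):
    """MetaAd (Algorithm 1). w: (U, V) bids, column t revealed at round t;
    B: (U,) initial budgets; phi: increasing discounting function on [0, 1]."""
    B = np.asarray(B, dtype=float)
    b = B.copy()                              # remaining budgets b_{u,t-1}
    reward, matches = 0.0, []
    for t in range(w.shape[1]):
        # Line 3: discounted scores; zero if the bid is zero or budget is insufficient.
        s = np.where((w[:, t] > 0) & (b >= w[:, t]), w[:, t] * phi(b / B), 0.0)
        if s.max() <= 0:                      # Lines 4-5: skip the arrival (x_t = null)
            matches.append(None)
            continue
        u = int(np.argmax(s))                 # Line 7: x_t = argmax_u s_{u,t}
        b[u] -= w[u, t]                       # Line 9: charge the bid against u's budget
        reward += w[u, t]
        matches.append(u)
    return reward, matches

# e.g., the small-bid-optimal discounting of Corollary 4.2.1:
phi_smallbid = lambda x: (np.e - np.e**(1 - x)) / (np.e - 1)
```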
Algorithm 2 Dual Construction
Require: The function ϕ : [0, 1] → [0, 1], φ(x) = 1 − ϕ(1 − x).
1: Initialization: ∀u ∈ U, α_{u,0} = 0, b_{u,0} = B_u; U′ = ∅; ∀t ∈ [V ], β_t = 0.
2: for t = 1 to V , a new vertex t ∈ V arrives do
3:   Append {u | b_{u,t−1} − w_{u,t} < 0} into U′.
4:   Score s_{u,t}, select x_t for the arrival t, and update the budget b_{u,t} by Algorithm 1.
5:   Set the dual variables β_t = s_{x_t,t} and α_{u,t} = φ(c_{u,t}/B_u).
6: end for
7: For u ∉ U′, set α_u = α_{u,V}; and for u ∈ U′, set α_u = 1.

4.3 Competitive analysis

Given any monotonically increasing function ϕ : [0, 1] → [0, 1] in Algorithm 1, we get a concrete algorithm for OBM. For different bid-budget ratios κ, the competitive ratio of MetaAd is given in the main theorem below.

Theorem 4.2. If the function ϕ : [0, 1] → [0, 1] in Algorithm 1 satisfies, for a given integer n ≥ 1 and all i ≤ n, φ^{(i)}(x) > 0 where φ(x) = 1 − ϕ(1 − x), then the competitive ratio of Algorithm 1 is

η(κ) = (1 + κ^{n+1}R + max_{y∈[0,1]} Δ(y) + ϕ(κ)/(1 − κ))^{−1},

where R is the Lipschitz constant of φ^{(n)}(x) if φ^{(n)}(x) is not monotonically decreasing, and R = 0 otherwise. Additionally, Δ(y) = φ(y) − (1/y)∫₀^y φ(x)dx + (1/y)Σ_{i=1}^n κ^i (φ^{(i−1)}(y) − φ^{(i−1)}(0)).

By Theorem 4.2, we can easily get a competitive algorithm for OBM with any bid-budget ratio κ by choosing a function ϕ; the only requirement is that ϕ be monotonically increasing. In the next section, we give concrete examples of competitive algorithms by instantiating ϕ with different function classes. We defer the complete proof of Theorem 4.2 to Appendix A.2. The analysis is based on the fundamental conditions in Lemma 1 that guarantee the competitive ratio, and presents new challenges due to the absence of the small-bid and FLM assumptions.

Lemma 1 (Conditions for competitive ratio). An online algorithm achieves a competitive ratio of η ∈ [0, 1] if it selects a series of feasible actions {x_1, . . . , x_V } and there exist dual variables {β_1, · · · , β_V }, {α_1, · · · , α_U} such that
(Dual feasibility) ∀u ∈ U, t ∈ [V ], β_t ≥ w_{u,t}(1 − α_u);
(Primal-dual ratio) P ≥ η D, where P = Σ_{t=1}^V w_{x_t,t} and D = Σ_{u∈U} B_u α_u + Σ_{t=1}^V β_t.

Without the small-bid and FLM assumptions, the competitive analysis faces the following new challenges in satisfying the conditions in Lemma 1.

Dual construction for general bids. When an offline node has an insufficient budget to match a query, the remaining budget is almost zero in the small-bid setting, but it can be large and uncertain without the small-bid assumption. This introduces a new challenge in constructing dual variables that satisfy dual feasibility under budget insufficiency. To address this challenge, we present a new dual construction in Algorithm 2, where dual variables are determined based on the remaining budget and adjusted at the end of the algorithm. The constructed dual variables satisfy the dual feasibility in Lemma 1, as explained below. We define β_t as the score of the selected offline node. For any u with sufficient remaining budget (b_{u,t−1} ≥ w_{u,t}), we have β_t ≥ s_{u,t} = w_{u,t}(1 − φ(c_{u,t−1}/B_u)). By choosing α_{u,t} = φ(c_{u,t}/B_u), the dual feasibility in Lemma 1 is satisfied for any t and u with sufficient budget (b_{u,t−1} ≥ w_{u,t}), since for an increasing function φ it holds that α_u ≥ α_{u,t} for any t ∈ [V ]. Different from the small-bid setting, we need to adjust the dual variables at the end of the dual construction (Line 7 in Algorithm 2) to satisfy dual feasibility: we set α_u = 1 for any u that has an insufficient budget (b_{u,t−1} < w_{u,t}) at some round. This ensures that dual feasibility is always satisfied without FLM.
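For illustration, here is a sketch (ours) of how Algorithm 2 tracks the dual variables alongside Algorithm 1; the variable names are our own, with varphi(x) = 1 − ϕ(1 − x) playing the role of φ:

```python
import numpy as np

def dual_construction(w, B, phi):
    """Runs MetaAd while constructing the duals of Algorithm 2."""
    varphi = lambda x: 1.0 - phi(1.0 - x)     # φ(x) = 1 - ϕ(1 - x)
    B = np.asarray(B, dtype=float)
    b = B.copy()
    alpha = np.zeros(len(B))                  # α_{u,t}
    beta = np.zeros(w.shape[1])               # β_t
    starved = set()                           # U': nodes short of budget at some round
    for t in range(w.shape[1]):
        starved |= set(np.flatnonzero((w[:, t] > 0) & (b < w[:, t])))
        s = np.where((w[:, t] > 0) & (b >= w[:, t]), w[:, t] * phi(b / B), 0.0)
        if s.max() > 0:
            u = int(np.argmax(s))
            beta[t] = s[u]                    # β_t = s_{x_t,t}
            b[u] -= w[u, t]
            alpha[u] = varphi((B[u] - b[u]) / B[u])   # α_{u,t} = φ(c_{u,t}/B_u)
    for u in starved:                         # final adjustment (Line 7)
        alpha[u] = 1.0
    return alpha, beta
```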
Guarantee the primal-dual ratio. The challenges in guaranteeing the primal-dual ratio in Lemma 1 come from the unspecified discounting function ϕ and the absence of the small-bid and FLM assumptions. To solve this challenge, we derive a condition under which the primal-dual ratio P_t ≥ (1/γ)D_t holds for any round t, where γ ≥ 1, P_t = Σ_{i=1}^t w_{x_i,i} is the cumulative primal reward, and D_t = Σ_{u∈U} B_u α_{u,t} + Σ_{i=1}^t β_i is the cumulative dual. Specifically, we prove that the primal-dual ratio P_t ≥ (1/γ)D_t is satisfied for any t ∈ [V ] if for any y ∈ [0, 1] it holds that

yφ(y) − ∫₀^y φ(x)dx + Σ_{i=1}^n κ^i φ^{(i−1)}(y) + (κ^{n+1}R − γ + 1)y ≤ Σ_{i=1}^n κ^i φ^{(i−1)}(0),   (3)

where φ(x) = 1 − ϕ(1 − x) and ϕ is the discounting function. Given that the dual increase due to the final dual adjustment (Line 7 in Algorithm 2) is bounded by (ϕ(κ)/(1 − κ))·P, we can bound the final primal-dual ratio as P ≥ D/(γ + ϕ(κ)/(1 − κ)). This leads to a competitive ratio of 1/(γ + ϕ(κ)/(1 − κ)). Given any discounting function ϕ, we can solve for the γ that satisfies the condition in (3), thereby obtaining the competitive ratio of MetaAd.

4.4 Competitive Algorithm Examples

In this section, we instantiate ϕ with different functions to get concrete algorithms and competitive ratios.

4.4.1 Small Bids

We first verify that MetaAd reduces to the optimal algorithm for the small-bid setting (κ → 0) [24, 23].

Corollary 4.2.1. By choosing ϕ(x) = (e − e^{1−x})/(e − 1), MetaAd reduces to the algorithm in [24] and achieves the optimal competitive ratio of 1 − 1/e for the small-bid setting (κ → 0).

Corollary 4.2.1 shows that the competitive ratio in Theorem 4.2 is consistent with the classical results for small bids. Interestingly, our analysis shows how the optimal ϕ is obtained, as explained as follows. By solving (3) with equality and κ = 0, we get

φ(x) = (γ − 1)e^x + 1 − γ,   ϕ(x) = γ − (γ − 1)e^{1−x},   (4)

where γ ≤ e/(e − 1) ensures ϕ(x) ≥ 0 for x ∈ [0, 1]. By Theorem 4.2, we get the competitive ratio as lim_{κ→0} 1/(γ + ϕ(κ)/(1 − κ)) = 1/((2 − e)γ + e). By optimally choosing γ = e/(e − 1), we get the optimal competitive ratio of 1 − 1/e.

4.4.2 Exponential Function Class

Next, we consider the exponential function class φ(x) = C_1 e^{θx} + C_2 with 0 < θ ≤ 1. To ensure that φ(x) is an increasing function, we choose C_1 ≥ 0. Also, we choose C_2 = −C_1 to simplify the expression of the competitive ratio. Observe that φ(x) has a positive i-th derivative for every i ≥ 1; thus, we choose n = ∞ in Theorem 4.2 to eliminate the term κ^{n+1}R. By substituting φ(x) into η(κ) in Theorem 4.2, we get the corollary below.

Corollary 4.2.2. If we assign φ(x) = Ce^{θx} − C with C ≥ 0 and 0 < θ ≤ 1 in MetaAd in Algorithm 1, we get the competitive ratio as

η(κ) = (1 + C + C(1 − 1/θ + κ/(1 − κθ))(e^θ − 1) + (1 + C − Ce^{θ(1−κ)})/(1 − κ))^{−1}, if 1 − 1/θ + κ/(1 − κθ) ≥ 0;
η(κ) = (1 + C + C(1 − 1/θ + κ/(1 − κθ))θ + (1 + C − Ce^{θ(1−κ)})/(1 − κ))^{−1}, if 1 − 1/θ + κ/(1 − κθ) < 0.   (5)

We numerically solve for the optimal η(κ) for each κ ∈ [0, 1] by adjusting the parameters θ and C, and show the results in Figure 1. We observe that MetaAd achieves a non-zero competitive ratio for κ ∈ [0, 1). The competitive ratio at κ = 0 is the optimal competitive ratio 1 − 1/e of the small-bid setting. The competitive ratio monotonically decreases with κ, which coincides with the intuition that a larger κ makes budget insufficiency more likely and the problem more challenging. Also, for large enough κ, the competitive ratio of MetaAd with the exponential function is very close to the upper bound. However, there can exist other forms of exponential discounting functions that achieve higher competitive ratios.

[Figure 1: Competitive ratio without FLM versus the bid-budget ratio κ. MetaAd (Exp) represents MetaAd with φ(x) = C(e^{θx} − 1), MetaAd (Quad) represents MetaAd with φ(x) = Cx², and the 1 − κ upper bound is shown for reference.]
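The numerical search behind Figure 1 can be reproduced with a simple grid search. The sketch below (ours) evaluates our transcription of (5), restricting C ≤ 1/(e^θ − 1) so that ϕ(κ) ≥ 0; at κ = 0 it returns 1 − 1/e ≈ 0.632, matching Corollary 4.2.1.

```python
import numpy as np

def eta_exp(kappa, theta, C):
    """Competitive ratio in Corollary 4.2.2 for phi-helper C*(e^(theta*x) - 1)."""
    g = 1 - 1 / theta + kappa / (1 - kappa * theta)   # its sign picks the max of Delta
    bracket = (np.e**theta - 1) if g >= 0 else theta
    tail = (1 + C - C * np.e**(theta * (1 - kappa))) / (1 - kappa)  # phi(kappa)/(1-kappa)
    return 1.0 / (1 + C + C * g * bracket + tail)

def best_eta(kappa, grid=100):
    best = 0.0
    for theta in np.linspace(1e-3, 1.0, grid):
        for C in np.linspace(0.0, 1.0 / (np.e**theta - 1), grid):
            best = max(best, eta_exp(kappa, theta, C))
    return best

for kappa in (0.0, 0.25, 0.5, 0.75):
    print(kappa, round(best_eta(kappa), 3))
```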
Interestingly, the optimal choices of the exponential function φ(x) for different κ reveal insights into designing deterministic algorithms for OBM. When κ is less than a critical point κ̄ ≈ 0.26, the optimal choice of the exponential function is φ(x) = (e^{θx} − 1)/(e^θ − 1), and the optimal choice of θ decreases with κ ∈ [0, κ̄]. In this range, as κ becomes larger in [0, κ̄], the discounting value ϕ(b_{u,t−1}/B_u) = 1 − φ(1 − b_{u,t−1}/B_u) becomes smaller, indicating a more conservative approach to budget usage in preparation for potentially high future bids. However, as κ grows beyond κ̄, the optimal choice of the discounting function becomes ϕ(x) = 1 with C = 0, yielding a greedy algorithm. This suggests that for large enough κ, the algorithm benefits more from matching a node with a large bid immediately than from conserving budget for the future.

4.4.3 Polynomial Function Class

In this section, we explore another function class to show that MetaAd is general enough to provide competitive algorithms given different discounting functions. We consider the class of n-th order polynomial functions, i.e., φ(x) = Σ_{j=0}^n C_j x^j. We set Σ_{j=1}^n C_j ≤ 1 to ensure φ(1) ≤ 1, and set C_0 = 0 to simplify the competitive ratio. We summarize the competitive ratio of the polynomial function class and provide a concrete quadratic example in the next corollary.

Corollary 4.2.3. If we assign φ(x) = Σ_{j=1}^n C_j x^j with Σ_{j=1}^n C_j ≤ 1 and i-th derivatives φ^{(i)}(x) ≥ 0 (0 ≤ i ≤ n), MetaAd achieves a competitive ratio of

η(κ) = (1 + max_{y∈[0,1]} Δ(y) + 1/(1 − κ) − Σ_{j=1}^n C_j(1 − κ)^{j−1})^{−1},   (6)

where Δ(y) = (nC_n/(n + 1))y^n + ((1 + κ)C_n − C_{n−1}/n)y^{n−1} + Σ_{j=0}^{n−2} ((1 + κ)C_{j+1} − C_j/(j + 1) + Σ_{i=2}^{n−j} κ^i C_{i+j} (i + j)!/(j + 1)!) y^j. Specifically, given the quadratic example φ(x) = Cx² with the optimal choice C = 1, (6) simplifies to

η(κ) = (5/3 + 2κ + 2κ² + 1/(1 − κ))^{−1}.   (7)

We numerically show the results of η(κ) in Figure 1. We observe that MetaAd with the simple quadratic function φ(x) = x² also achieves a non-zero competitive ratio for κ ∈ [0, 1). However, this competitive ratio is lower than the best competitive ratio achieved by the exponential function φ(x) = C(e^{θx} − 1). The examples of exponential and quadratic functions demonstrate the strength of MetaAd in providing competitive algorithms for OBM with general bids. While MetaAd provides the first framework to obtain a non-zero competitive ratio for OBM with κ ∈ [0, 1) (in the absence of the FLM assumption), it is interesting to explore other functions ϕ under the MetaAd framework with better competitive ratios.

4.5 Extension to OBM with FLM

While MetaAd is designed for the more challenging OBM without FLM, this section demonstrates that MetaAd can be extended to provide competitive algorithms for OBM with FLM. Due to the space limitation, we defer the algorithm of MetaAd with FLM (Algorithm 3) and its analysis to Appendix B. Instead of scoring based on the true bid w_{u,t}, Algorithm 3 determines the scores based on a modified bid min{w_{u,t}, b_{u,t−1}}. Based on the modified dual construction in Algorithm 4, we get the competitive ratio for OBM with FLM in the next theorem.

Theorem 4.3. If the function ϕ : [0, 1] → [0, 1] in Algorithm 3 satisfies, for a given integer n ≥ 2 and all i ≤ n − 1, φ^{(i)}(x) > 0 and R = max_{x∈[0,1]} φ^{(n)}(x), where φ(x) = 1 − ϕ(1 − x), then the competitive ratio of Algorithm 3 is η(κ) = (1 + κ^n R + max_{y∈[0,1]} Δ(y) + ϕ(κ))^{−1}, where Δ(y) = φ(y) − (1/y)∫₀^y φ(x)dx + (1/y)Σ_{i=1}^n κ^i (φ^{(i−1)}(y) − φ^{(i−1)}(0)).

[Figure 2: Competitive ratio with FLM versus the bid-budget ratio κ. MetaAd (Exp) represents MetaAd with φ(x) = C(e^{θx} − 1), BJN2007 represents the algorithm in [4], and Greedy is shown for comparison.]
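Under our transcriptions of (5) and of Corollary B.1.1's ratio (Appendix B), the two settings can be compared directly; the only change is whether the trailing term ϕ(κ) is divided by 1 − κ:

```python
import numpy as np

def eta_exp_class(kappa, theta, C, flm):
    """Corollary B.1.1 (flm=True) vs. Corollary 4.2.2 (flm=False), our transcription."""
    g = 1 - 1 / theta + kappa / (1 - kappa * theta)
    bracket = (np.e**theta - 1) if g >= 0 else theta
    phi_kappa = 1 + C - C * np.e**(theta * (1 - kappa))   # discounting value at kappa
    tail = phi_kappa if flm else phi_kappa / (1 - kappa)
    return 1.0 / (1 + C + C * g * bracket + tail)

theta, C = 1.0, 1.0 / (np.e - 1)
for kappa in (0.1, 0.3, 0.5):
    print(kappa,
          round(eta_exp_class(kappa, theta, C, flm=True), 3),    # with FLM: larger
          round(eta_exp_class(kappa, theta, C, flm=False), 3))   # without FLM
```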
The competitive ratio with FLM in Theorem 4.3 differs from that in Theorem 4.2 only in the final terms of the denominators, which are ϕ(κ) and ϕ(κ)/(1 − κ), respectively. Thus, the competitive ratio with FLM is always larger than that without FLM given the same κ and ϕ. This improvement arises because FLM allows for accepting partial bids when budgets are insufficient, thereby reducing the potential budget waste. Similar to MetaAd without FLM, we assign the exponential function class φ(x) = C(e^{θx} − 1) to get a concrete algorithm, whose competitive ratio is given in Corollary B.1.1. We numerically solve for the optimal η(κ) for each κ ∈ [0, 1] by adjusting θ and C, and compare the results with an existing competitive algorithm for FLM, BJN2007 [4], in Figure 2. As κ → 0, both MetaAd and BJN2007 achieve the optimal small-bid competitive ratio of 1 − 1/e. However, as κ approaches 1, the competitive ratio of BJN2007 decreases to zero, while MetaAd, reducing to a greedy algorithm, maintains a competitive ratio of 1/2, the best known competitive ratio of deterministic algorithms for OBM with FLM.

5 Competitive Learning-Augmented Design

In this section, we demonstrate the application of our competitive analysis to designing learning-augmented algorithms that guarantee a competitive ratio for ML-based solutions to OBM. Our competitive analysis directly motivates a learning-augmented algorithm for OBM called LOBM; the algorithm and its analysis are deferred to Appendix C. In LOBM, we apply an ML model that, at each round, takes the features of the arriving query and the offline nodes as inputs and outputs a prediction ẑ_{u,t}. Directly using 1 − ẑ_{u,t} as a discounting value to set the score as w_{u,t}(1 − ẑ_{u,t}) can result in arbitrarily bad worst-case performance on adversarial examples. To provide a competitive guarantee for OBM, LOBM projects the ML output ẑ_{u,t} into a competitive solution space D_{u,t} in (28) and obtains a projected value z_{u,t}. The score is then set as w_{u,t}(1 − z_{u,t}) based on the projected ML output z_{u,t}. The key design of the competitive solution space is motivated by the conditions in Lemma 1, which ensure that any z value in D_{u,t} leads to the satisfaction of dual feasibility and the primal-dual ratio. The competitive solution space is based on the dual construction in Algorithm 2 with the discounting function φ(x) = (e^{θx} − 1)/(e^θ − 1), where θ > 0. Importantly, we introduce a slackness parameter λ ∈ [0, 1] in the design of D_{u,t} in (28). The parameter λ controls the size of the competitive space D_{u,t} and thereby regulates the competitive ratio of LOBM. Given a smaller λ, we get a larger competitive space D_{u,t}, and so LOBM has more flexibility to exploit the benefits of ML predictions. However, a smaller λ also leads to a smaller competitive ratio, as shown in the theorem below.

Theorem 5.1. Given the maximum bid-budget ratio κ ∈ [0, 1], θ > 0, and the slackness parameter λ ∈ [0, 1], with any ML predictions, LOBM in Algorithm 5 achieves a competitive ratio of

η̂(κ) = λ(1 − 1/e^θ) / (1 + λ[(1 − e^{−θκ})/κ − 1]^+),   (8)

where [a]^+ = max{a, 0}.

Table 1: Worst-case and average normalized reward on the MovieLens dataset. Greedy, Primal-Dual, and MetaAd do not use ML predictions; ML, LOBM-0.8, LOBM-0.5, and LOBM-0.3 are ML-based.

| | Greedy | Primal-Dual | MetaAd | ML | LOBM-0.8 | LOBM-0.5 | LOBM-0.3 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Worst-case | 0.7941 | 0.8429 | **0.8524** | 0.7903 | **0.8538** | 0.8324 | 0.8113 |
| Average | 0.9329 | 0.9340 | **0.9344** | 0.9355 | **0.9372** | 0.9371 | 0.9343 |
The best results among algorithms w/o ML predictions and the best results among ML-based algorithms are highlighted in bold font.

Theorem 5.1 shows that LOBM with a slackness parameter λ ∈ [0, 1] guarantees a competitive ratio of η̂(κ) regardless of the ML prediction quality. The parameter λ ∈ [0, 1] determines the worst-case competitive ratio and the degree of flexibility to exploit the benefit of ML predictions. When λ = 0, there is no competitive ratio guarantee and LOBM reduces to a pure ML-based algorithm; this can also be seen from the inequalities in the competitive solution space (28), which are all satisfied automatically when λ = 0. On the other hand, when λ = 1, LOBM achieves its highest competitive ratio. As λ increases from 0 to 1, the competitive solution space in (28) shrinks from the whole solution space (λ = 0) to the smallest competitive solution space (λ = 1). Therefore, the choice of the slackness parameter λ provides a trade-off between the competitive guarantee and the average performance by adjusting the level of exploitation of the ML predictions.

6 Empirical Results

We evaluate the empirical performance of MetaAd and LOBM on two applications. The first application is online movie matching, where the platform needs to match each query to a movie advertiser with a limited budget. The empirical results are obtained on the MovieLens dataset [12]. The main empirical results are shown in Table 1. We compare MetaAd with the algorithms that do not use ML (Greedy and Primal-Dual, introduced in Section D.1.1) and show that MetaAd achieves the best worst-case and average performance among them. Additionally, we validate that LOBM, with the competitive ratio guarantee of Theorem 5.1, achieves the best worst-case reward together with a good average reward. Other empirical ablation studies can be found in Section D.1.2. The second application is online VM placement, introduced in Section 3. We generate the bipartite graphs with connections between physical servers and VMs by the Barabási-Albert method [3] and assign utility values according to the prices of Amazon EC2 compute-optimized instances [2]. We defer the empirical results and ablation studies to Appendix D.2.3.

7 Conclusion

In this paper, we consider a challenging setting for OBM without the FLM and small-bid assumptions. First, we highlight the challenges by proving an upper bound on the competitive ratio of any deterministic algorithm for OBM. Then, we design the first meta algorithm, MetaAd, that achieves provable competitive ratios parameterized by the maximum bid-budget ratio κ ∈ [0, 1]. We also extend MetaAd to the setting with the additional FLM assumption. Additionally, based on the competitive analysis, we propose LOBM to take advantage of ML predictions to improve the performance with a competitive ratio guarantee, followed by its empirical validation.

Limitations and Future Directions. While we provide the first provable meta algorithms for OBM with general bids, determining the best choice of the discounting function ϕ remains an open question and an interesting problem for future exploration.

Broader impacts. By introducing a provable algorithm for OBM under more general settings, our work has the potential to advance the applications and motivate new algorithms. For applications like advertising, if large budget disparities among offline nodes exist, those with larger initial budgets could have a higher chance of being matched due to their smaller bid-to-budget ratios.
This fairness issue, also observed in prior algorithms [23, 4, 24], warrants further investigation.

Acknowledgements

Jianyi Yang, Pengfei Li and Shaolei Ren were supported in part by NSF grants CNS-2007115 and CCF-2324941. Adam Wierman was supported by NSF grants CCF-2326609, CNS-2146814, CPS-2136197, CNS-2106403, and NGSDI-2105648, as well as funding from the Resnick Sustainability Institute.

References

[1] Mohammad Ali Alomrani, Reza Moravej, and Elias B Khalil. Deep policies for online bipartite matching: A reinforcement learning approach. arXiv preprint arXiv:2109.10380, 2021.
[2] Amazon EC2. https://aws.amazon.com/ec2/.
[3] Allan Borodin, Christodoulos Karavasilis, and Denis Pankratov. An experimental study of algorithms for online bipartite matching. Journal of Experimental Algorithmics (JEA), 25:1-37, 2020.
[4] Niv Buchbinder, Kamal Jain, and Joseph Seffi Naor. Online primal-dual algorithms for maximizing ad-auctions revenue. In European Symposium on Algorithms, pages 253-264. Springer, 2007.
[5] Nikhil Devanur and Aranyak Mehta. Online matching in advertisement auctions, 2022.
[6] Nikhil R Devanur and Thomas P Hayes. The adwords problem: Online keyword matching with budgeted bidders under random permutations. In Proceedings of the 10th ACM Conference on Electronic Commerce, pages 71-78, 2009.
[7] Nikhil R Devanur, Kamal Jain, and Robert D Kleinberg. Randomized primal-dual analysis of ranking for online bipartite matching. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 101-107. SIAM, 2013.
[8] Gagan Goel and Aranyak Mehta. Adwords auctions with decreasing valuation bids. In International Workshop on Web and Internet Economics, pages 335-340. Springer, 2007.
[9] Gagan Goel and Aranyak Mehta. Online budgeted matching in random input models with applications to adwords. In SODA, volume 8, pages 982-991. Citeseer, 2008.
[10] Robert Grandl, Ganesh Ananthanarayanan, Srikanth Kandula, Sriram Rao, and Aditya Akella. Multi-resource packing for cluster schedulers. ACM SIGCOMM Computer Communication Review, 44(4):455-466, 2014.
[11] Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2024.
[12] F Maxwell Harper and Joseph A Konstan. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4):1-19, 2015.
[13] Nguyen Trung Hieu, Mario Di Francesco, and Antti Ylä-Jääski. A virtual machine placement algorithm for balanced resource utilization in cloud data centers. In 2014 IEEE 7th International Conference on Cloud Computing, pages 474-481. IEEE, 2014.
[14] Zhiyi Huang, Zhihao Gavin Tang, Xiaowei Wu, and Yuhao Zhang. Online vertex-weighted bipartite matching: Beating 1-1/e with random arrivals. ACM Transactions on Algorithms (TALG), 15(3):1-15, 2019.
[15] Zhiyi Huang, Qiankun Zhang, and Yuhao Zhang. Adwords in a panorama. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 1416-1426. IEEE, 2020.
[16] Patrick Jaillet and Xin Lu. Online resource allocation problems. Rock & Soil Mechanics, 86:3701-3704, 2011.
[17] Bala Kalyanasundaram and Kirk R Pruhs. An optimal deterministic algorithm for online b-matching. Theoretical Computer Science, 233(1-2):319-325, 2000.
[18] Michael Kapralov, Ian Post, and Jan Vondrák. Online submodular welfare maximization: Greedy is optimal. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1216-1225. SIAM, 2013.
[19] Richard M Karp, Umesh V Vazirani, and Vijay V Vazirani.
An optimal algorithm for on-line bipartite matching. In Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing, pages 352-358, 1990.
[20] Nathaniel Kell and Debmalya Panigrahi. Online budgeted allocation with general budgets. In Proceedings of the 2016 ACM Conference on Economics and Computation, pages 419-436, 2016.
[21] Pengfei Li, Jianyi Yang, Adam Wierman, and Shaolei Ren. Robust learning for smoothed online convex optimization with feedback delay. In NeurIPS, 2023.
[22] Wubin Li, Johan Tordsson, and Erik Elmroth. Modeling for dynamic cloud scheduling via migration of virtual machines. In 2011 IEEE Third International Conference on Cloud Computing Technology and Science, pages 163-171. IEEE, 2011.
[23] Aranyak Mehta. Online matching and ad allocation. Foundations and Trends in Theoretical Computer Science, 8(4):265-368, 2013.
[24] Aranyak Mehta, Amin Saberi, Umesh Vazirani, and Vijay Vazirani. Adwords and generalized online matching. Journal of the ACM (JACM), 54(5):22-es, 2007.
[25] Vahab S Mirrokni, Shayan Oveis Gharan, and Morteza Zadimoghaddam. Simultaneous approximations for adversarial and stochastic online budgeted allocation. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1690-1701. SIAM, 2012.
[26] Mayank Mishra and Anirudha Sahoo. On theory of VM placement: Anomalies in existing methodologies and their mitigation using a novel vector based approach. In 2011 IEEE 4th International Conference on Cloud Computing, pages 275-282. IEEE, 2011.
[27] Zhonghong Ou, Hao Zhuang, Jukka K Nurminen, Antti Ylä-Jääski, and Pan Hui. Exploiting hardware heterogeneity within the same instance type of Amazon EC2. In 4th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 12), 2012.
[28] Benjamin Speitkamp and Martin Bichler. A mathematical programming approach for server consolidation problems in virtualized data centers. IEEE Transactions on Services Computing, 3(4):266-278, 2010.
[29] Rajan Udwani. Adwords with unknown budgets and beyond. arXiv preprint arXiv:2110.00504, 2021.
[30] Vijay V Vazirani. Towards a practical, budget-oblivious algorithm for the adwords problem under small bids.
[31] Alexander Wei and Fred Zhang. Optimal robustness-consistency trade-offs for learning-augmented online algorithms. In NeurIPS, 2020.
[32] Dan Zhang and William L Cooper. Revenue management for parallel flights with customer-choice behavior. Operations Research, 53(3):415-431, 2005.

A Proof of Theorems in Section 4

A.1 Proof of Proposition 4.1

Proof. Proposition 4.1 can be proved as follows. Denote CR(π, G) as the competitive ratio of a deterministic algorithm π on the graph instance G. The best competitive ratio of any deterministic algorithm is max_π min_{G∈G} CR(π, G), which is no larger than max_π min_{G∈G′} CR(π, G) for any G′ ⊆ G. Thus, we can prove the upper bound of the competitive ratio by constructing an example subset G′ and deriving the resulting competitive ratio of any deterministic algorithm. Specifically, the example subset G′ is constructed as below.

Example 1. Consider a setting with only one offline node and a total budget of 1. The agent needs to decide whether or not to match an online node with a bid value w_{u,t} ≤ κ, κ ∈ (0, 1], to the offline node over V ≥ 2 rounds. The bid values in the first V − 1 rounds are all equal to ω and sum up to 1 − κ + ϵ, where ϵ is infinitely small, so we have ω ∈ (0, (1 − κ + ϵ)/⌈(1 − κ + ϵ)/κ⌉]. The bid value w_{u,V} in the last round is either zero or κ and is not known to the agent.
Thus, the constructed example subset is composed of two smaller subsets, i.e., G′ = G′_1 ∪ G′_2. In the examples of the subset G′_1, we have w_{u,V} = κ, and in the examples of the subset G′_2, we have w_{u,V} = 0.

The offline optimal solutions are different for G′_1 and G′_2 in Example 1. For G′_1, with the last bid value w_{u,V} = κ, the optimal solution is to skip one of the first V − 1 rounds; in this way, the last online node with bid κ can be matched, and the total reward is 1 + ϵ − ω. For G′_2, with the last bid value w_{u,V} = 0, the optimal solution is to match all the online nodes in the first V − 1 rounds and obtain a total reward of 1 − κ + ϵ.

For the examples in G′ in Example 1, the optimal online algorithm can be chosen from the following two. First, the algorithm can choose to match the online node to the offline node in all of the first V − 1 rounds. This algorithm is optimal for G′_2, but for G′_1 with w_{u,V} = κ, the total reward is 1 − κ + ϵ, which is less than the offline optimal reward 1 + ϵ − ω. Therefore, the competitive ratio of this algorithm in the worst case is lim_{ϵ→0} min_{ω∈(0,(1−κ+ϵ)/⌈(1−κ+ϵ)/κ⌉]} (1 − κ + ϵ)/(1 + ϵ − ω) = 1 − κ, attained when ω is infinitely small. Second, the algorithm can choose to skip one round in the first V − 1 rounds such that the last online node can be matched if it has a bid value of κ, as in G′_1. However, for G′_2 with w_{u,V} = 0, the total reward is then 1 − κ + ϵ − ω, which is less than the optimal reward 1 − κ + ϵ. Thus, the competitive ratio of this algorithm is lim_{ϵ→0} min_{ω∈(0,(1−κ+ϵ)/⌈(1−κ+ϵ)/κ⌉]} (1 − κ + ϵ − ω)/(1 − κ + ϵ) = 1 − 1/⌈(1 − κ)/κ⌉ for κ ∈ (0, 1). When κ = 1, the competitive ratio of this algorithm is min_{ω∈(0,ϵ]} (1 − κ + ϵ − ω)/(1 − κ + ϵ) = min_{ω∈(0,ϵ]} (ϵ − ω)/ϵ = 0. Therefore, the best competitive ratio of any deterministic algorithm on G′ is max_π min_{G∈G′} CR(π, G) = max{1 − κ, 1 − 1/⌈(1 − κ)/κ⌉} = 1 − κ for κ ∈ (0, 1), and 0 for κ = 1. Combining the cases κ ∈ (0, 1) and κ = 1, we get the upper bound 1 − κ on the competitive ratio of any deterministic algorithm on G′, which is also an upper bound on the competitive ratio of any deterministic algorithm for OBM.

A.2 Proof of Theorem 4.2

To prove Theorem 4.2, we first prove Lemma 1.

Proof of Lemma 1. The first condition guarantees that the dual variables are feasible. The second condition guarantees the competitive performance. Let D* and P* be the optimal dual and primal objectives. If the second condition is satisfied, then we have

P ≥ ηD ≥ ηD* ≥ ηP*,   (9)

where the second inequality holds since D* is the minimum dual objective, and the third inequality comes from weak duality. This completes the proof.

Proof of Theorem 4.2. To satisfy the primal-dual ratio P_t ≥ (1/γ)D_t, we can derive an inequality for α_{u,t} as below:

D_t = Σ_{u∈U} B_u α_{u,t} + Σ_{i=1}^t β_i = Σ_{u∈U} (B_u α_{u,t} + Σ_{i≤t, x_i=u} w_{u,i}(1 − φ(c_{u,i−1}/B_u))) ≤ γ Σ_{u∈U} c_{u,t} = γ P_t,   (10)

which is satisfied if, for each u ∈ U, B_u α_{u,t} + Σ_{i≤t, x_i=u} w_{u,i}(1 − φ(c_{u,i−1}/B_u)) ≤ γ c_{u,t}, where c_{u,t} = Σ_{i≤t, x_i=u} w_{u,i}. Since α_{u,t} = φ(c_{u,t}/B_u), we get the condition on φ:

B_u φ(c_{u,t}/B_u) ≤ Σ_{i≤t, x_i=u} w_{u,i} φ(c_{u,i−1}/B_u) + (γ − 1)c_{u,t}.   (11)

Further, since φ is an increasing function, we have the following bound for the discrete sum:

Σ_{i≤t, x_i=u} (w_{u,i}/B_u) φ(c_{u,i}/B_u) ≥ ∫₀^{c_{u,t}/B_u} φ(x)dx,

where the inequality holds by the integral inequality ∫₀^y φ(x)dx ≤ Σ_{i=1}^N x_i φ(Σ_{j=1}^i x_j) with y = Σ_{i=1}^N x_i for a positive increasing function φ. If φ″(x) ≥ 0 for x ∈ [0, 1], we have

Σ_{i≤t, x_i=u} (w_{u,i}/B_u) φ(c_{u,i−1}/B_u) ≥ ∫₀^{c_{u,t}/B_u} φ(x)dx − Σ_{i≤t, x_i=u} (w_{u,i}/B_u)² φ′(c_{u,i−1}/B_u) ≥ ∫₀^{c_{u,t}/B_u} φ(x)dx − κ ∫₀^{c_{u,t}/B_u} φ′(x)dx = ∫₀^{c_{u,t}/B_u} φ(x)dx − (κφ(c_{u,t}/B_u) − κφ(0)),   (12)

where the first inequality holds since φ″(x) ≥ 0 for x ∈ [0, 1], and the second inequality holds by the bid bound w_{u,i}/B_u ≤ κ and another integral inequality, ∫₀^y φ(x)dx ≥ Σ_{i=1}^N x_i φ(Σ_{j=1}^{i−1} x_j) with y = Σ_{i=1}^N x_i.
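Both integral inequalities are easy to verify numerically; the following quick check (ours) uses φ(x) = e^x − 1, which is positive and increasing, with random positive increments:

```python
import numpy as np

rng = np.random.default_rng(0)
phi = lambda x: np.e**x - 1
x = rng.uniform(0.01, 0.1, size=20)   # increments w_{u,i}/B_u
y = x.sum()
cum = np.cumsum(x)

integral = np.e**y - 1 - y            # closed form of the integral of phi over [0, y]
right = np.sum(x * phi(cum))          # sum_i x_i * phi(sum_{j<=i} x_j)
left = np.sum(x * phi(cum - x))       # sum_i x_i * phi(sum_{j<i} x_j)
assert left <= integral <= right      # the two integral inequalities above
print(left, integral, right)
```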
If 0 < φ″(x) ≤ R for x ∈ [0, 1], then following Eqn. (12), we have

Σ_{i≤t, x_i=u} (w_{u,i}/B_u) φ(c_{u,i−1}/B_u)
≥ ∫₀^{c_{u,t}/B_u} φ(x)dx − κ Σ_{i≤t, x_i=u} (w_{u,i}/B_u) φ′(c_{u,i−1}/B_u) − κ Σ_{i≤t, x_i=u} (w_{u,i}/B_u)(φ′(c_{u,i}/B_u) − φ′(c_{u,i−1}/B_u))   (13)
≥ ∫₀^{c_{u,t}/B_u} φ(x)dx − κ ∫₀^{c_{u,t}/B_u} φ′(x)dx − κ² Σ_{i≤t, x_i=u} (w_{u,i}/B_u) R
= ∫₀^{c_{u,t}/B_u} φ(x)dx − (κφ(c_{u,t}/B_u) − κφ(0)) − κ²R · c_{u,t}/B_u,   (14)

where the second inequality holds by the integral inequality and φ″(x) ≤ R. Therefore, we can extend this to the case where φ has an n-th derivative: if φ^{(i)}(x) > 0 for all i ≤ n, x ∈ [0, 1], then we have

Σ_{i≤t, x_i=u} (w_{u,i}/B_u) φ(c_{u,i−1}/B_u) ≥ ∫₀^{c_{u,t}/B_u} φ(x)dx − Σ_{i=1}^n κ^i (φ^{(i−1)}(c_{u,t}/B_u) − φ^{(i−1)}(0)) − κ^{n+1}R · c_{u,t}/B_u,   (15)

where R is the Lipschitz constant of φ^{(n)}(x) if φ^{(n)}(x) is not monotonically decreasing, and R = 0 otherwise. Substituting (15) into (11), the condition becomes

B_u φ(c_{u,t}/B_u) ≤ B_u (∫₀^{c_{u,t}/B_u} φ(x)dx − Σ_{i=1}^n κ^i (φ^{(i−1)}(c_{u,t}/B_u) − φ^{(i−1)}(0)) − κ^{n+1}R · c_{u,t}/B_u) + (γ − 1)c_{u,t}.   (16)

Thus, a function φ satisfies the primal-dual ratio P_t ≥ (1/γ)D_t if it satisfies, for any y ∈ [0, 1],

yφ(y) − ∫₀^y φ(x)dx + Σ_{i=1}^n κ^i φ^{(i−1)}(y) + (κ^{n+1}R − γ + 1)y ≤ Σ_{i=1}^n κ^i φ^{(i−1)}(0).   (17)

Finally, we bound Σ_{u∈U′} B_u Λ_u, where Λ_u = α_u − α_{u,V} is the dual increase at the end of the dual construction. Λ_u > 0 can hold only for u ∈ U′, because α_u = 1 for u ∈ U′. Thus, after the V rounds, we have for u ∈ U′

B_u Λ_u ≤ B_u (1 − α_{u,V}) = B_u (1 − φ(c_{u,V}/B_u)) = B_u ϕ(b_{u,V}/B_u) ≤ B_u ϕ(κ),   (18)

where the last inequality holds since b_{u,V} ≤ κB_u for any u ∈ U′. Thus, we have

Σ_{u∈U′} B_u Λ_u ≤ Σ_{u∈U′} B_u ϕ(κ) ≤ ϕ(κ) · P/(1 − κ),   (19)

where the second inequality holds because P ≥ Σ_{u∈U′} (1 − κ)B_u, given that c_{u,V} ≥ (1 − κ)B_u for u ∈ U′. Putting everything together, we have P ≥ (1/γ)(D − (ϕ(κ)/(1 − κ))P), which leads to the primal-dual ratio P ≥ D/(γ + ϕ(κ)/(1 − κ)). To satisfy (17) for any φ, we can choose γ ≥ 1 + κ^{n+1}R + φ(y) − (1/y)∫₀^y φ(x)dx + (1/y)Σ_{i=1}^n κ^i (φ^{(i−1)}(y) − φ^{(i−1)}(0)) for all y ∈ [0, 1]. Combining with Lemma 1, we get the competitive ratio in Theorem 4.2.

Algorithm 3 Meta Algorithm (MetaAd with FLM)
Require: The function ϕ : [0, 1] → [0, 1]
Initialization: ∀u ∈ U, the remaining budget b_{u,0} = B_u.
for t = 1 to V , a new vertex t ∈ V arrives do
  For u ∈ U, set s_{u,t} = w_{u,t} ϕ(b_{u,t−1}/B_u) if b_{u,t−1} − w_{u,t} ≥ 0, and set s_{u,t} = b_{u,t−1} ϕ(b_{u,t−1}/B_u) otherwise.
  if ∀u ∈ U, s_{u,t} = 0 then
    Skip the online arrival t (x_t = null).
  else
    Select x_t = arg max_{u∈U} s_{u,t}.
  end if
  Update budget: if x_t ≠ null, b_{x_t,t} = b_{x_t,t−1} − min{w_{x_t,t}, b_{x_t,t−1}}; and ∀u ≠ x_t, b_{u,t} = b_{u,t−1}.
end for

B MetaAd for OBM with FLM

In this section, we extend MetaAd to the setting with FLM by allowing offline nodes with insufficient budgets to accept fractional bid values equal to their remaining budgets in their last matching. In other words, by matching an online arrival t to an offline node u ∈ U, the agent receives an actual reward of min{w_{u,t}, b_{u,t−1}}, where w_{u,t} is the bid value and b_{u,t−1} is the available budget at the beginning of round t.

B.1 Algorithm Design

Even under the FLM assumption, OBM with general bids is challenging: when an online node t arrives, if the remaining budget b_{u,t−1} of an offline node u is smaller than the bid w_{u,t}, matching the arrival to this offline node causes a reward loss of w_{u,t} − b_{u,t−1}, which increases with the bid value w_{u,t}. With FLM, the greedy algorithm (Greedy) can achieve a competitive ratio of 0.5 [23], and the deterministic algorithm in [4] achieves a competitive ratio of 1 − κ − (1 − κ)/(1 + κ)^{1/κ}. For OBM with FLM, we use a different meta algorithm, given in Algorithm 3. When the remaining budget b_{u,t−1} of an offline node u is enough to accept arrival t (i.e., b_{u,t−1} ≥ w_{u,t}), the scoring strategy is the same as in Algorithm 1, which sets the score as s_{u,t} = w_{u,t} ϕ(b_{u,t−1}/B_u). Nonetheless, the scoring strategy differs from Algorithm 1 when the remaining budget b_{u,t−1} of an offline node u is insufficient for an online arrival t (i.e., b_{u,t−1} < w_{u,t}).
Without FLM, Algorithm 1 directly sets the score s_{u,t} to zero to avoid selecting offline node u. However, FLM allows matching an offline node u to the online arrival t and consuming all the remaining budget b_{u,t−1} to obtain a reward of b_{u,t−1}. Thus, Algorithm 3 can be greedier and sets the score as s_{u,t} = b_{u,t−1} ϕ(b_{u,t−1}/B_u) to balance the actual reward increment and the budget consumption. Given an increasing function ϕ, the score increases with the remaining budget, and it is still possible to select an offline node with an insufficient but large enough remaining budget.

B.2 Competitive Analysis

In this section, we provide the competitive ratio of Algorithm 3 for OBM with FLM and discuss the insights and analysis techniques. The competitive ratio is given in Theorem B.1, with its proof deferred to Appendix B.3. To prove the competitive ratio of MetaAd for FLM, we still need to construct dual variables α_u, u ∈ U, and β_t, t ∈ [V ], which assist the online matching with a provable competitive ratio. The dual construction procedure is given in Algorithm 4. At each round t, same as in the dual construction without FLM in Algorithm 2, β_t is set as the score of the selected offline node and α_{u,t} is set as φ(c_{u,t}/B_u). Different from Algorithm 2, α_u is set at the end of the algorithm as below to satisfy dual feasibility under the FLM assumption:

α_u = max{α_{u,V}, max_{t∈T′} (1 − (b_{u,t−1}/w_{u,t}) ϕ(b_{u,t−1}/B_u))},   (20)

where t ∈ T′ indexes the rounds when budget insufficiency happens.

Algorithm 4 Dual Construction (with FLM)
Require: The function ϕ : [0, 1] → [0, 1], φ(x) = 1 − ϕ(1 − x).
Initialization: ∀u ∈ U, α_{u,0} = 0, b_{u,0} = B_u; U′ = ∅, T′ = ∅; ∀t ∈ [V ], β_t = 0.
for t = 1 to V , a new vertex t ∈ V arrives do
  If there exists u such that b_{u,t−1} − w_{u,t} < 0, append {u | b_{u,t−1} − w_{u,t} < 0} into U′ and append t to T′.
  Score s_{u,t}, select x_t for the arrival t, and update the budget b_{u,t} by Algorithm 3.
  Set the dual variables β_t = s_{x_t,t} and α_{u,t} = φ(c_{u,t}/B_u).
end for
For u ∉ U′, set α_u = α_{u,V}; and for u ∈ U′, set α_u as in Eqn. (20).

This choice guarantees the dual feasibility β_t ≥ s_{u,t} = b_{u,t−1} ϕ(b_{u,t−1}/B_u) ≥ w_{u,t}(1 − α_u) when the offline node has an insufficient budget for an arrival. Note that the dual increment Λ_u = α_u − α_{u,V} at the end of Algorithm 4 can be less than the dual increment at the end of Algorithm 2, thus resulting in a better competitive ratio for OBM with FLM than without FLM. With the constructed dual variables, the competitive ratio of MetaAd for OBM with FLM is given in the next theorem.

Theorem B.1. If the function ϕ : [0, 1] → [0, 1] in Algorithm 3 satisfies, for a given integer n ≥ 2 and all i ≤ n − 1, φ^{(i)}(x) > 0 and R = max_{x∈[0,1]} φ^{(n)}(x), where φ(x) = 1 − ϕ(1 − x), then the competitive ratio of Algorithm 3 is η(κ) = (1 + κ^n R + max_{y∈[0,1]} Δ(y) + ϕ(κ))^{−1}, where Δ(y) = φ(y) − (1/y)∫₀^y φ(x)dx + (1/y)Σ_{i=1}^n κ^i (φ^{(i−1)}(y) − φ^{(i−1)}(0)).

Given different φ, we can get concrete competitive algorithms for OBM with FLM. In this paper, we show the example where φ is from the exponential function class.

Corollary B.1.1. If we assign φ(x) = Ce^{θx} − C with C ≥ 0 and 0 < θ ≤ 1 in MetaAd in Algorithm 3, we get the competitive ratio as

η(κ) = (1 + C + C(1 − 1/θ + κ/(1 − κθ))(e^θ − 1) + 1 + C − Ce^{θ(1−κ)})^{−1}, if 1 − 1/θ + κ/(1 − κθ) ≥ 0;
η(κ) = (1 + C + C(1 − 1/θ + κ/(1 − κθ))θ + 1 + C − Ce^{θ(1−κ)})^{−1}, if 1 − 1/θ + κ/(1 − κθ) < 0.   (21)
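The scoring difference between Algorithms 1 and 3 is a one-line change; the sketch below (our transcription) computes the FLM scores for a single arrival:

```python
import numpy as np

def flm_scores(w_t, b, B, phi):
    """Algorithm 3 scores. w_t: bids of the arrival; b, B: remaining/initial budgets."""
    s = np.where(b >= w_t,
                 w_t * phi(b / B),    # sufficient budget: score as in Algorithm 1
                 b * phi(b / B))      # insufficient: leftover budget b_{u,t-1} is spendable
    return np.where(w_t > 0, s, 0.0)  # zero-bid nodes are never selected
```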
B.3 Proof of Theorem B.1 (Theorem 4.3)

Proof. Denote the equivalent bid w̃_{u,t} = min{w_{u,t}, b_{u,t−1}}. To guarantee the primal-dual ratio P_t ≥ (1/γ)D_t, we get the condition

D_t = Σ_{u∈U} B_u α_{u,t} + Σ_{i=1}^t β_i = Σ_{u∈U} (B_u α_{u,t} + Σ_{i≤t, x_i=u} w̃_{u,i}(1 − φ(c_{u,i−1}/B_u))) ≤ γ Σ_{u∈U} c_{u,t} = γ P_t,   (22)

which is satisfied if, for each u ∈ U, B_u α_{u,t} + Σ_{i≤t, x_i=u} w̃_{u,i}(1 − φ(c_{u,i−1}/B_u)) ≤ γ c_{u,t}, where c_{u,t} = Σ_{i≤t, x_i=u} w̃_{u,i}. Since α_{u,t} = φ(c_{u,t}/B_u), we get the condition on φ:

B_u φ(c_{u,t}/B_u) ≤ Σ_{i≤t, x_i=u} w̃_{u,i} φ(c_{u,i−1}/B_u) + (γ − 1)c_{u,t}.   (23)

Same as in the setting without FLM, we can bound the discrete sum as below: if φ^{(i)}(x) > 0 for all i ≤ n − 1, x ∈ [0, 1], then we have

Σ_{i≤t, x_i=u} (w̃_{u,i}/B_u) φ(c_{u,i−1}/B_u) ≥ ∫₀^{c_{u,t}/B_u} φ(x)dx − Σ_{i=1}^n κ^i (φ^{(i−1)}(c_{u,t}/B_u) − φ^{(i−1)}(0)) − κ^n R · c_{u,t}/B_u,   (24)

where R is the Lipschitz constant of φ^{(n)}(x) if φ^{(n)}(x) is not monotonically decreasing, and R = 0 otherwise. Thus, a function φ satisfies the primal-dual ratio P_t ≥ (1/γ)D_t if it holds for any y ∈ [0, 1] that

yφ(y) − ∫₀^y φ(x)dx + Σ_{i=1}^n κ^i φ^{(i−1)}(y) + (κ^n R − γ + 1)y ≤ Σ_{i=1}^n κ^i φ^{(i−1)}(0).   (25)

Finally, we bound Σ_{u∈U′} B_u Λ_u, where Λ_u = α_u − α_{u,V}. Λ_u > 0 can hold only for u ∈ U′, because α_u = max{α_{u,V}, max_{t∈T′}(1 − (b_{u,t−1}/w_{u,t}) ϕ(b_{u,t−1}/B_u))} by Eqn. (20). For a request t ∈ T′, we have b_{u,t−1} − w_{u,t} < 0 and b_{u,t−1} ≤ κB_u. Since φ is an increasing function, it holds that α_{u,V} ≥ α_{u,t} for t ∈ [V ]. Thus, after the V rounds, we have for u ∈ U′ (with t the maximizing round in (20))

B_u Λ_u ≤ B_u (1 − (b_{u,t−1}/w_{u,t}) ϕ(b_{u,t−1}/B_u) − α_{u,t−1}) = B_u ϕ(b_{u,t−1}/B_u)(1 − b_{u,t−1}/w_{u,t}) ≤ (B_u − b_{u,t−1})(1 − φ(1 − b_{u,t−1}/B_u)) ≤ c_{u,V}(1 − φ(1 − κ)),   (26)

where the second inequality holds since w_{u,t} ≤ B_u, and the last inequality holds since B_u − b_{u,t−1} = c_{u,t−1} ≤ c_{u,V} and b_{u,t−1} ≤ κB_u. Thus, we have Σ_{u∈U′} B_u Λ_u ≤ P(1 − φ(1 − κ)) = ϕ(κ)P. Putting everything together, we have P ≥ (1/γ)(D − ϕ(κ)P), which leads to the primal-dual ratio P ≥ D/(γ + ϕ(κ)). Since choosing γ ≥ 1 + κ^n R + φ(y) − (1/y)∫₀^y φ(x)dx + (1/y)Σ_{i=1}^n κ^i (φ^{(i−1)}(y) − φ^{(i−1)}(0)) for all y ∈ [0, 1] satisfies (25), combining with Lemma 1 gives the competitive ratio in Theorem 4.3.

C Learning-Augmented OBM

In this section, we exploit the competitive solution space in MetaAd and propose to augment MetaAd with ML predictions (an algorithm called LOBM) to improve the average performance while still offering a guaranteed competitive ratio in the worst case.

C.1 Algorithm Design

A main technical challenge for OBM is to estimate the discounting value that scales down the bid values for the matching decision at round t. Thus, an ML model can potentially be leveraged to replace the manual design of the score assignment. More specifically, we can utilize an ML model to predict a discounting factor ẑ_{u,t} and set the score as s_{u,t} = w_{u,t}(1 − ẑ_{u,t}) for matching when the offline node u has a sufficient budget for the online arrival t. That is, we incorporate ML predictions into MetaAd (yielding LOBM) to explore alternative score assignment strategies that can outperform manual designs on average while still offering guaranteed competitiveness. In learning-augmented online algorithms [31, 21], there exists an intrinsic trade-off between following ML predictions for average performance improvement and achieving better robustness in the worst case. Such a trade-off between average and worst-case performance also exists in learning-augmented OBM (LOBM). To better control the trade-off, we introduce a slackness parameter λ ∈ [0, 1] that relaxes the competitiveness requirement while allowing LOBM to improve the average performance through ML-based scoring within the competitive solution space. If we blindly use the ML prediction as the discounting factor for matching, a competitive ratio cannot be guaranteed due to the lack of worst-case competitiveness of ML predictions. Thus, to ensure that LOBM still offers guaranteed competitiveness, we consider a competitive solution space based on the conditions for dual variables specified in Lemma 1.
Based on the competitive analysis with the exponential function class, we design a learning-augmented algorithm (i.e., LOBM) in Algorithm 5, which leverages the ML prediction z̃_{u,t} to improve the average performance while guaranteeing a worst-case competitive ratio. The key idea is to construct dual variables as we solve the primal problem online and to utilize the dual variables to calibrate the ML prediction z̃_{u,t}. In this way, the matching decisions by LOBM are guaranteed to be competitive in the worst case while exploiting the potential benefits of ML predictions.

We describe LOBM in Algorithm 5 as follows. At the beginning, we initialize the dual variables as zero. Whenever an online node arrives, the agent receives an ML prediction z̃_{u,t} indicating the discounting factors for all the offline nodes u ∈ U. Instead of directly using the ML prediction to set the scores and select the offline node, LOBM projects the ML prediction z̃_{u,t} onto the competitive space D_{u,t} by solving, for all u ∈ U,

z_{u,t} = arg min_{z ∈ D_{u,t}} |z − z̃_{u,t}|,  (27)

which is a key step to ensure the competitive ratio. To better utilize the potential benefit of ML predictions, the projection operation in (27) selects the discounting factor z_{u,t} from the competitive space D_{u,t} that is closest to the ML prediction z̃_{u,t}. Then, the projected value z_{u,t} is used to set the scores s_{u,t} for offline nodes with sufficient budgets for the online arrival t, while the scores for offline nodes with insufficient budgets are set to zero and these offline nodes are appended to U'. The scores based on the calibrated ML predictions are then used to select the offline node for matching.

As the key design to guarantee the competitive ratio, the competitive space D_{u,t} is based on the conditions for dual variables in Lemma 1 and the dual construction by the exponential function class (Corollary 4.2.2). The dual variables are constructed as follows. For the selected node x_t with score s_{x_t,t}, we update the dual variable as

α_{x_t,t} = α_{x_t,t−1} + w_{x_t,t} z_{x_t,t} / (λ ρ_θ B_{x_t}) + δ_{x_t,t},

where ρ_θ = 1 − e^{−θ} with θ > 0, z_{x_t,t} is the discounting factor used when setting the score of x_t (Line 4 of Algorithm 5), and

δ_{x_t,t} = [ e^{θ(1 − b_{x_t,t−1}/B_{x_t})} / (e^θ − 1) ] · ( e^{θ w_{x_t,t}/B_{x_t}} − 1 − w_{x_t,t}/B_{x_t} )

is a term depending on the bid value w_{x_t,t} and the remaining budget b_{x_t,t−1}. For unselected offline nodes u ≠ x_t, we keep the dual variables unchanged, α_{u,t} = α_{u,t−1}. The dual variable β_t is set based on the score of the selected offline node, i.e., β_t = (1/(λρ_θ)) s_{x_t,t} = (1/(λρ_θ)) w_{x_t,t}(1 − z_{x_t,t}).

By constructing dual variables in this way, when an offline node x_t is selected, the primal objective P_t = Σ_{τ=1}^{t} w_{x_τ,τ} increases by w_{x_t,t}, and the dual objective D_t = Σ_{u∈U} B_u α_{u,t} + Σ_{τ=1}^{t} β_τ increases by B_{x_t}(α_{x_t,t} − α_{x_t,t−1}) + β_t = w_{x_t,t}/(λρ_θ) + B_{x_t} δ_{x_t,t}. When an online arrival t is skipped without any matching, both the primal and dual objectives remain the same. Thus, we can always ensure that the primal and dual objectives satisfy D_t = (1/(λρ_θ)) P_t + Σ_{τ=1}^{t} B_{x_τ} δ_{x_τ,τ} for each t ∈ [V], leading to a bounded ratio of the primal objective to the dual objective at the end of each round. The parameter λ can be used to adjust the bound on the primal-dual ratio, leading to different competitive ratios. Next, we need to ensure that the conditions in Lemma 1 are always satisfied no matter which offline node u ∈ U is selected at each round t.
Thus, we construct the competitive space D_{u,t} as below, and project the ML predictions z̃_{u,t} onto D_{u,t} if they fall outside of it:

D_{u,t} = { z ≥ 0 :  (1/(λρ_θ)) w_{u,t}(1 − z) ≥ w_{u,t} − w_{u,t} α_{u,t−1},
            α_{u,t−1} + (w_{u,t}/(λρ_θ B_u)) z + δ_{u,t} ≥ ( e^{θ(1 − (b_{u,t−1} − w_{u,t})/B_u)} − 1 ) / (e^θ − 1) }.  (28)

Algorithm 5 Learning-Augmented OBM (LOBM, w/o FLM)
1: Initialization: ∀u ∈ U, b_{u,0} = B_u, α_{u,0} = 0; ∀t ∈ [V], β_t = 0.
2: for t = 1 to V (a new request t arrives) do
3:   Get the ML prediction z̃_{u,t}, ∀u ∈ U.
4:   Project z̃_{u,t} onto D_{u,t} in (28) and get z_{u,t}, ∀u ∈ U.
5:   For all u ∈ U: if b_{u,t−1} − w_{u,t} ≥ 0, set the score s_{u,t} = w_{u,t}(1 − z_{u,t}); otherwise, set s_{u,t} = 0 and append {u | b_{u,t−1} − w_{u,t} < 0} to U'.
6:   if s_{u,t} = 0 for all u ∈ U then
7:     Skip the online arrival t (x_t = null).
8:   else
9:     Select x_t = arg max_{u∈U} s_{u,t}.
10:  end if
11:  If x_t ≠ null, update the budget b_{x_t,t} = b_{x_t,t−1} − w_{x_t,t}; for u ≠ x_t, b_{u,t} = b_{u,t−1}.
12:  Update the dual variable β_t = (1/(λρ_θ)) s_{x_t,t}. If x_t ≠ null, update α_{x_t,t} = α_{x_t,t−1} + (1/(λρ_θ B_{x_t})) w_{x_t,t} z_{x_t,t} + δ_{x_t,t}.
13: end for
14: For u ∉ U', set α_u = α_{u,V}; for u ∈ U', set α_u = 1.

Since the dual variable β_t is set as (1/(λρ_θ)) s_{x_t,t} after selecting the offline node x_t with the highest score and sufficient budget, β_t is no less than (1/(λρ_θ)) s_{u,t} = (1/(λρ_θ)) w_{u,t}(1 − z_{u,t}) for any u ∈ U. Thus, as long as the first inequality in (28) is satisfied, we always have the dual feasibility β_t ≥ w_{u,t} − w_{u,t} α_{u,t−1} in Lemma 1 if all the offline nodes have sufficient budgets for the arrival t. For the offline nodes with insufficient budgets for the online arrival t, we ensure the dual feasibility β_t ≥ w_{u,t} − w_{u,t} α_u by setting their corresponding dual variables α_u to one after the matching process (Line 14 in Algorithm 5).

The second inequality in (28) sets a target for the increment of the dual variables α_{u,t}, which forces the dual variable α_{u,t} to be larger when the remaining budget becomes smaller. In this way, the score of an offline node u with less remaining budget can be set lower, making the algorithm conservative in consuming budgets. Also, since α_{u,t} is larger when the remaining budget is smaller, the second inequality in (28) guarantees a large enough dual variable α_{u,t} when u has an insufficient budget for an arrival t. This keeps the additional dual increment after the matching process (Line 14 in Algorithm 5) bounded and further guarantees a bounded primal-dual ratio in the second condition of Lemma 1.

As we discussed, if the discounting factor z_{u,t} at each round satisfies the inequalities in (28), the primal variables and the constructed dual variables satisfy the conditions in Lemma 1, and so a competitive ratio for OBM is guaranteed. The size of the set D_{u,t} is controlled by the hyper-parameter λ: with a smaller λ, the set D_{u,t} becomes larger because the inequalities are easier to satisfy. We rigorously prove that D_{u,t} is always non-empty for any λ ∈ [0, 1], which enables feasible competitive solutions that guarantee the competitive ratio bound in (29), in the robustness analysis of LOBM in Section C.2.
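Since both constraints in (28) are linear in z, D_{u,t} is an interval and the projection (27) reduces to clipping the prediction. The following Python sketch makes this explicit under our reading of the reconstructed constants (ρ_θ = 1 − e^{−θ} and δ_{u,t} as in Section C.1); the scalar interface and variable names are illustrative assumptions.

```python
import math

def project_to_competitive_space(z_hat, w, b_prev, B, alpha_prev, lam, theta):
    """Project an ML-predicted discounting factor z_hat onto D_{u,t} in (28).
    Assumes w > 0, and that the interval is non-empty (guaranteed for
    lam in [0, 1] by the feasibility argument in Section C.3)."""
    rho = 1.0 - math.exp(-theta)
    # Per-round slack term delta_{u,t} from Section C.1.
    delta = (math.exp(theta * (1.0 - b_prev / B)) / (math.exp(theta) - 1.0)) * (
        math.exp(theta * w / B) - 1.0 - w / B
    )
    # First inequality:  w(1 - z)/(lam*rho) >= w(1 - alpha_prev)  =>  z <= z_hi.
    z_hi = 1.0 - lam * rho * (1.0 - alpha_prev)
    # Second inequality: alpha_prev + w*z/(lam*rho*B) + delta >= target  =>  z >= z_lo.
    target = (math.exp(theta * (1.0 - (b_prev - w) / B)) - 1.0) / (math.exp(theta) - 1.0)
    z_lo = (target - alpha_prev - delta) * lam * rho * B / w
    return min(max(z_hat, max(z_lo, 0.0)), z_hi)
```

Note how λ shapes the interval: λ = 0 collapses both constraints, so any prediction passes through unchanged (pure ML), while λ = 1 gives the tightest interval and thus the strongest calibration.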
ML model training and inference. Given any ML predictions, Algorithm 5 provides a guarantee on the competitive ratio. Nonetheless, the average performance E_G[P(π, G)] depends on the ML model that yields the predictions. Here, we briefly discuss how to achieve high average performance by training the ML model in an environment that is aware of the design of Algorithm 5. Note first that the projection operation is differentiable, while the discrete matching decision is not. Thus, we apply policy gradient to train the ML model. Once the ML model is trained offline, it can be applied online to provide z̃_{u,t} as advice for scoring and matching by LOBM (Line 3 in Algorithm 5).

C.2 Analysis

Now, we provide a robustness analysis of LOBM and formally show that LOBM always guarantees the competitive ratio. A learning-augmented algorithm is robust if its competitive ratio is guaranteed for any problem instance given arbitrary ML predictions. We show that LOBM is robust in the sense that it offers competitive guarantees regardless of the quality of ML predictions.

Theorem C.1. Given the maximum bid-budget ratio κ ∈ [0, 1] and the slackness parameter λ ∈ [0, 1], with any ML predictions, LOBM in Algorithm 5 achieves a competitive ratio of

η̂(κ) = λ(1 − e^{−θ}) / ( 1 + λ [ (1 − e^{−θκ})/κ − 1 ]^+ ),  (29)

where [x]^+ = x if x > 0 and [x]^+ = 0 if x ≤ 0.

Theorem C.1 shows that LOBM can guarantee a competitive ratio of η̂(κ) regardless of the ML prediction quality for any slackness parameter λ ∈ [0, 1]. The parameter λ determines the requirement on the worst-case competitive ratio and the flexibility to exploit the benefit of ML predictions. When λ = 0, there is no competitiveness requirement: the inequalities in the competitive space (28) always hold, and LOBM reduces to a pure ML-based algorithm with no competitive ratio guarantee. On the other hand, when λ = 1, LOBM achieves its highest competitive ratio. When λ is chosen between 0 and 1, the competitive solution space in (28) varies from the whole solution space (λ = 0) to the smallest competitive solution space (λ = 1). Thus, LOBM achieves a flexible trade-off between the competitive guarantee and average performance by varying the level of trust in ML predictions.
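For intuition on the trade-off governed by λ and θ, the snippet below evaluates the bound in (29). Because the constants in (29) are reconstructed from a garbled source, treat these numbers as illustrative of the qualitative behavior, not as authoritative values.

```python
import math

def lobm_ratio(kappa, lam, theta):
    """Competitive ratio of LOBM per our reading of Eqn. (29)."""
    if kappa == 0.0:
        penalty = max(theta - 1.0, 0.0)  # limit of (1 - e^{-theta*k})/k - 1 as k -> 0
    else:
        penalty = max((1.0 - math.exp(-theta * kappa)) / kappa - 1.0, 0.0)
    return lam * (1.0 - math.exp(-theta)) / (1.0 + lam * penalty)

# Sanity check: with small bids (kappa -> 0), theta = 1, and lam = 1,
# the bound recovers the optimal small-bid ratio 1 - 1/e.
print(lobm_ratio(0.0, 1.0, 1.0))   # ~0.632
print(lobm_ratio(0.1, 0.5, 1.0))   # smaller lam trades guarantee for ML freedom
```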
C.3 Proof of Theorem C.1 (Theorem 5.1 in the main text)

Proof. The sequence of dual variables is constructed by Algorithm 5. We prove three claims leading to Theorem C.1: (i) the dual feasibility is satisfied, i.e., ∀u ∈ U, t ∈ [V], w_{u,t} α_u + β_t ≥ w_{u,t}; (ii) the primal-dual ratio is guaranteed, i.e., P ≥ η̂ D; and (iii) the solution of the projection (28) always exists, i.e., the feasible set of (28) is non-empty at each round.

First, we prove the feasibility of the dual variables (the first condition in Lemma 1). If β_t = 0 for a round t ∈ [V], then for each u ∈ U we have either w_{u,t} = 0, or w_{u,t} > 0 and α_u = 1 by Line 14, so β_t = 0 ≥ w_{u,t}(1 − α_u) holds for the dual construction. On the other hand, if β_t > 0 for a round t ∈ [V], then the score s_{x_t,t} must be calculated based on the projected z_{x_t,t}. Thus, we have

β_t = (1/(λρ_θ)) s_{x_t,t} ≥ (1/(λρ_θ)) s_{u,t} = (1/(λρ_θ)) w_{u,t}(1 − z_{u,t}) ≥ w_{u,t}(1 − α_{u,t−1}) ≥ w_{u,t}(1 − α_u),

where the second inequality holds by the first inequality of the set (28), and the last holds since the dual variables α_{u,t} are non-decreasing. This proves the feasibility of the dual variables.

Next, we prove that the primal-dual ratio (the second condition in Lemma 1) is satisfied. At round t, if no offline node is selected for the arrival t, the primal objective P and the dual objective D do not increase. Otherwise, the primal objective increases by w_{x_t,t} and the dual objective increases by B_{x_t}(α_{x_t,t} − α_{x_t,t−1}) + β_t = w_{x_t,t}/(λρ_θ) + B_{x_t} δ_{x_t,t}, where δ_{x_t,t} = [e^{θ(1−b_{x_t,t−1}/B_{x_t})}/(e^θ−1)] (e^{θ w_{x_t,t}/B_{x_t}} − 1 − w_{x_t,t}/B_{x_t}). By Line 14 at the end of Algorithm 5, the dual variable α_{u,V} further increases to α_u by Λ_u = α_u − α_{u,V}. Thus, the dual objective can be written as

D = (1/(λρ_θ)) Σ_{t=1}^{V} w_{x_t,t} + Σ_{t=1}^{V} B_{x_t} δ_{x_t,t} + Σ_{u∈U'} B_u Λ_u = P/(λ(1 − e^{−θ})) + Σ_{t=1}^{V} B_{x_t} δ_{x_t,t} + Σ_{u∈U'} B_u Λ_u.  (30)

To bound the right-hand side, we have

Σ_{u∈U'} B_u Λ_u ≤ [ e^θ(1 − e^{−θκ})/(e^θ − 1) ] Σ_{u∈U'} B_u ≤ [ e^θ(1 − e^{−θκ})/(e^θ − 1) ] Σ_{u∈U'} c_{u,V}/(1 − κ) ≤ [ e^θ/(e^θ − 1) ] [ (1 − e^{−θκ})/(1 − κ) ] P,  (31)

where the first inequality holds because, for u ∈ U', Λ_u = 1 − α_{u,V} ≤ 1 − (e^{θ(1−κ)} − 1)/(e^θ − 1) = e^θ(1 − e^{−θκ})/(e^θ − 1) (the remaining budget satisfies b_{u,V} ≤ κB_u), and the second holds since (1 − κ)B_u ≤ c_{u,V} for u ∈ U'. For each u ∈ U, we sum up B_u δ_{u,t} over the rounds in which u is selected and get

Σ_{t: x_t=u} B_{x_t} δ_{x_t,t} = Σ_{t: x_t=u} [ e^{θ c_{u,t−1}/B_u}/(e^θ − 1) ] ( B_u e^{θ w_{u,t}/B_u} − B_u − w_{u,t} )
  ≤ [ (e^{θκ} − κ − 1)/(κ(e^θ − 1)) ] Σ_{t: x_t=u} e^{θ c_{u,t−1}/B_u} w_{u,t}
  ≤ [ (e^{θκ} − κ − 1)/(κ(e^θ − 1)) ] (B_u/θ)( e^{θ c_{u,V}/B_u} − 1 )
  ≤ [ (e^{θκ} − κ − 1)/(θκ) ] c_{u,V},  (32)

where the first inequality holds because (e^{θx} − x − 1)/x is an increasing function for x ∈ [0, 1] and w_{u,t}/B_u ≤ κ; the second inequality holds since Σ_{t: x_t=u} e^{θ c_{u,t−1}/B_u} (w_{u,t}/B_u) ≤ ∫_0^{c_{u,V}/B_u} e^{θx} dx = (1/θ)(e^{θ c_{u,V}/B_u} − 1); and the last inequality holds because B_u(e^{θ c_{u,V}/B_u} − 1) ≤ c_{u,V}(e^θ − 1), as (e^{θz} − 1)/z is increasing in z = c_{u,V}/B_u ≤ 1. Summing up over all bidders in U, we get

Σ_{t=1}^{V} B_{x_t} δ_{x_t,t} = Σ_{u∈U} Σ_{t: x_t=u} B_{x_t} δ_{x_t,t} ≤ [ (e^{θκ} − κ − 1)/(θκ) ] P.  (33)

Continuing with inequality (30), we have

D ≤ P [ 1/(λ(1 − e^{−θ})) + (e^{θκ} − κ − 1)/(θκ) + (e^θ/(e^θ − 1)) (1 − e^{−θκ})/(1 − κ) ].  (34)

Thus, by moving terms and simplifying, we have

P ≥ [ λ(1 − e^{−θ}) / ( 1 + λ [ (1 − e^{−θκ})/κ − 1 ]^+ ) ] D = η̂(κ) D.  (35)

This proves the second condition in Lemma 1.

Finally, we prove that the solution of the projection onto (28) always exists, i.e., the feasible set of (28) is non-empty at each round. To do this, we prove by induction that

z*_{u,t} = λ_1 ρ_θ e^{θ(1 − b_{u,t−1}/B_u)} / (e^θ − 1),  for any λ_1 ∈ [λ, 1],  (36)

is always feasible for the set (28). For the first round, the initialized dual variable is α_{u,0} = 0 and the initial remaining budget is b_{u,0} = B_u, so we have

α_{u,0} + (w_{u,1}/(λρ_θ B_u)) z*_{u,1} + δ_{u,1} ≥ ( e^{θ w_{u,1}/B_u} − 1 )/(e^θ − 1),  (37)

where the inequality holds because λ_1 ≥ λ; hence the second inequality of (28) holds for t = 1. Also, it holds that

(1/(λρ_θ)) w_{u,1}(1 − z*_{u,1}) = w_{u,1} (e^θ − λ_1)/(λ(e^θ − 1)) ≥ w_{u,1} = w_{u,1}(1 − α_{u,0}),  (38)

where the inequality holds since λ_1 ≤ 1 and λ ≤ 1. Thus, the first inequality of (28) holds at initialization. Then, if the constraints are satisfied for the rounds before arrival t, we have α_{u,t−1} ≥ ( e^{θ(1−b_{u,t−1}/B_u)} − 1 )/(e^θ − 1), and thus

α_{u,t−1} + (w_{u,t}/(λρ_θ B_u)) z*_{u,t} + δ_{u,t} ≥ ( e^{θ(1−b_{u,t−1}/B_u)} − 1 )/(e^θ − 1) + e^{θ(1−b_{u,t−1}/B_u)} ( e^{θ w_{u,t}/B_u} − 1 )/(e^θ − 1) = ( e^{θ(1 − (b_{u,t−1} − w_{u,t})/B_u)} − 1 )/(e^θ − 1),  (39)

where the inequality holds because λ_1 ≥ λ. Thus, the second inequality of (28) holds. Then, since α_{u,t−1} ≥ ( e^{θ(1−b_{u,t−1}/B_u)} − 1 )/(e^θ − 1), we have

(1/(λρ_θ)) w_{u,t}(1 − z*_{u,t}) = w_{u,t} ( e^θ − λ_1 e^{θ(1−b_{u,t−1}/B_u)} )/(λ(e^θ − 1)) ≥ w_{u,t} ( e^θ − e^{θ(1−b_{u,t−1}/B_u)} )/(e^θ − 1) ≥ w_{u,t}(1 − α_{u,t−1}),  (40)

where the first inequality holds since λ ≤ 1 and λ_1 ≤ 1. Thus, the first inequality of (28) holds. In conclusion, we can always find a feasible dual-variable update z*_{u,t} in the projection set (28). ∎

Figure 3: Illustration of the scoring strategies in (a) MetaAd and (b) LOBM. The example has 3 offline nodes (u1, u2, u3). The algorithms select the offline node with the largest score.

D Empirical Results

To complement the theoretical analysis, we validate the empirical benefits of the proposed algorithms by conducting numerical experiments on an online movie matching application.

D.1 Online Movie Matching

We evaluate the performance of our algorithms on the online movie matching application based on the MovieLens dataset [12].

D.1.1 Setup

In the application of online movie matching, each movie (i.e., an offline node) has a maximum budget set by advertisers. Once an online query arrives, the bid values of the query for all the movies are revealed to the matching platform agent.

            Algorithms w/o ML predictions   |  ML-based algorithms
            Greedy   Primal-Dual   MetaAd   |  ML       LOBM-0.8   LOBM-0.5   LOBM-0.3
Worst-case  0.7941   0.8429        0.8524   |  0.7903   0.8538     0.8324     0.8113
Average     0.9329   0.9340        0.9344   |  0.9355   0.9372     0.9371     0.9343

Table 2: Worst-case and average normalized reward on the MovieLens dataset. The best results among algorithms w/o ML predictions and the best results among ML-based algorithms are highlighted in bold font.
The platform agent needs to match a movie to each query, generating a reward equivalent to the bid value and consuming a budget of the bid value from the total budget of the matched movie. The bid value is determined by the relevance of the movie to the query: if a movie is more relevant to the online query, the bid value is potentially higher. The goal of the advertising platform is to maximize the total reward while satisfying the budget constraints of each movie. In this application, the platform agent does not allow a fractional fee for any matching. Thus, this matching problem is an OBM without FLM.

We run the online movie matching application based on the real MovieLens dataset [12], which provides data on the relevance between movies and users. We generate bipartite graphs, each with U = 10 offline nodes (movies) and V = 100 online nodes (queries/users), based on the MovieLens dataset. For each graph instance, we sample 10 movies uniformly without replacement and 100 users uniformly with replacement. A bid value scaled based on relevance is assigned to each edge between an offline node and an online node. The total budget for each offline node is sampled from a normal distribution with a mean of 1 and a standard deviation of 0.1, and the maximum bid value is 0.1 (i.e., κ = 0.1). We generate 10k, 1k, and 1k graph instances based on the MovieLens dataset for training, validation, and testing, respectively. To test the ML performance with out-of-distribution/adversarial examples, we also create examples by modifying 10% of the examples in the testing dataset, randomly removing edges and/or rescaling the weights.
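For concreteness, the following is a hypothetical sketch of the instance-generation step described above. The `relevance` matrix (movies × users, derived from MovieLens) and the bid-scaling rule are our own illustrative assumptions about the preprocessing, not the released pipeline.

```python
import numpy as np

def make_instance(relevance, rng, n_movies=10, n_users=100, max_bid=0.1):
    """Sample one bipartite OBM instance mirroring the setup in D.1.1."""
    movies = rng.choice(relevance.shape[0], size=n_movies, replace=False)  # without replacement
    users = rng.choice(relevance.shape[1], size=n_users, replace=True)     # with replacement
    bids = relevance[np.ix_(movies, users)].astype(float)
    bids = max_bid * bids / bids.max()            # scale so the largest bid is 0.1 (kappa = 0.1)
    budgets = rng.normal(1.0, 0.1, size=n_movies) # mean 1, std 0.1
    return bids, budgets

rng = np.random.default_rng(0)
# train/val/test splits of 10k/1k/1k instances would be drawn by repeated calls.
```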
We compare our algorithms with the most common baselines for OBM, as listed below.

OPT: The offline optimal solution is obtained using Gurobi [11] for each graph instance.

Greedy: The greedy algorithm [23] matches an online node to the available offline node that is connected to it and has the highest bid value. Greedy has strong empirical performance and is a special case of MetaAd with θ → ∞.

Primal-Dual: Primal-Dual [23] calculates the scores of each bidder for each online node based on both the bid values and the remaining budgets, and then selects for an online node the available bidder with the highest score. It is a special case of MetaAd with θ = 1.

ML: A policy-gradient algorithm that solves the OBM problem [1]. The inputs to the policy model are the available history information, including the current bid value, the remaining budget of each offline node, and the average matched bid value.

We evaluate the performance of MetaAd with the discounting function φ(x) = (e^{θx} − 1)/(e^θ − 1) and LOBM in Algorithm 5. An illustration of MetaAd for round t is shown in Figure 3(a), and an illustration of LOBM for round t is given in Figure 3(b). LOBM-λ denotes LOBM with the slackness parameter λ in the competitive solution space (28). The optimal parameter θ, which governs the level of conservativeness in MetaAd, is tuned on the validation dataset. We also evaluate the performance of MetaAd under different choices of θ, and evaluate LOBM under different choices of the hyper-parameter λ ∈ [0, 1] in D_{u,t} in (28). For a fair comparison, we use the same neural architecture as ML in LOBM. The neural network has two layers, each with 200 hidden neurons. The neural networks are trained by the Adam optimizer with a learning rate of 10^{-3} for 50 epochs. The training process on a laptop takes around 1 hour, while inference over each instance takes less than one second.

D.1.2 Results

The empirical worst-case and average rewards (normalized by the optimal reward) on the MovieLens dataset are shown in Table 2. In this table, the parameter θ of MetaAd is 0.7 by default, obtained by tuning on a validation dataset. We find that MetaAd achieves a higher worst-case reward ratio than the alternative competitive algorithms without predictions (i.e., Greedy and Primal-Dual). Through training, ML achieves a higher average reward than the competitive algorithms without predictions. However, due to the existence of out-of-distribution testing examples, ML has a lower worst-case reward ratio than the competitive algorithms that have theoretical worst-case performance guarantees. LOBM can significantly improve the worst-case performance of ML, because the projection of ML predictions onto the competitive solution space in (28) corrects low-quality ML predictions. Interestingly, LOBM with λ = 0.8 achieves the best empirical worst-case and average performance, demonstrating the superiority of LOBM despite its competitive ratio being lower than that of MetaAd. The high average performance of LOBM shows that LOBM can effectively utilize the benefits of good ML predictions to improve the average performance while offering guaranteed competitiveness. Importantly, when λ ∈ [0, 1] decreases, the requirements on the worst-case performance are more relaxed, and hence LOBM achieves a higher average reward but a lower worst-case reward.

Figure 4: (a) Worst-case and average reward of MetaAd with different choices of θ. (b) Worst-case reward of LOBM with different choices of θ and λ. (c) Average reward of LOBM with different choices of θ and λ.

The effects of θ in MetaAd. To validate the effects of the hyper-parameter θ on the performance of MetaAd with φ(x) = (e^{θx} − 1)/(e^θ − 1), we give more details of the performance of MetaAd under different choices of θ in Fig. 4(a), reporting both the empirical worst-case and average rewards. The results show that the average reward of MetaAd is not significantly affected by the choice of θ, but θ has a large effect on the empirical worst-case reward. This is because θ controls the conservativeness of MetaAd and hence is crucial for the worst-case competitive ratio when κ ≠ 0, as discussed in Section 4.2. More specifically, a larger worst-case reward can be obtained with a smaller θ for the MovieLens dataset. The reason is that a higher level of conservativeness is needed when the maximum bid-budget ratio κ is not zero.

The effects of θ and λ in LOBM. The empirical worst-case and average rewards of LOBM with different choices of θ and λ are provided in Fig. 4(b) and Fig. 4(c), respectively.
Different choices of θ yield different competitive solution spaces, while the choices of λ specify the relaxed robustness requirements on the worst-case competitive ratio of LOBM. Thus, we can get a different competitive ratio for LOBM by setting different θ and λ, as shown in Theorem C.1. These theoretical findings are validated by our numerical results.

Figure 5: Reward (normalized by the offline optimal reward) at high percentiles (95%–100%). θ is chosen as 1.

As we can see from Fig. 4(b) and Fig. 4(c), when λ = 0, the inequalities in the robust region (28) always hold, and hence LOBM reduces to pure ML and gives the same competitive ratio and average reward as ML. When λ = 1, LOBM guarantees the same competitive ratio as MetaAd, but it does not necessarily always follow the solutions of MetaAd on each problem instance, since there exist other solutions that also satisfy the robustness requirement for certain problem instances. Therefore, when λ = 1, the competitive ratio and average reward of LOBM are close to, but can be higher than, those of MetaAd when the ML model used by LOBM is well trained. When λ lies between 0 and 1, we find that for some choices of θ, LOBM can achieve an even better average reward than ML. This improvement comes from the fact that the competitive solution space in (28) can correct some low-quality ML predictions on certain problem instances. Also, for certain choices of θ, LOBM can empirically achieve a better worst-case reward than MetaAd, because LOBM can perform well, thanks to ML predictions, on some problem instances where MetaAd does not perform well. This observation validates that LOBM can effectively utilize the ML predictions to improve the average performance while guaranteeing a worst-case competitive ratio.

Tail reward performance. Last but not least, to evaluate the performance on adversarial/out-of-distribution instances, we show in Fig. 5 the reward (normalized by the offline optimal reward) at high percentiles from 95% to 100%. We observe that the reward of ML quickly decreases as the percentile becomes higher, and it becomes the lowest at high percentiles (larger than 98%), showing that ML is vulnerable to adversarial instances. Due to its worst-case competitiveness guarantees, MetaAd achieves a relatively higher reward even at high percentiles. Moreover, since LOBM guarantees the worst-case performance through the competitive solution space, the rewards of LOBM with different λ are all higher than ML at high percentiles. The high-percentile reward of LOBM increases with λ because a larger choice of λ guarantees a higher competitive ratio according to Theorem C.1. Interestingly, the rewards of LOBM at high percentiles are even larger than those of MetaAd when λ is 0.6 or 0.8. This validates that when the ML model is well trained to provide high-quality predictions, LOBM can become more powerful and explore better matching decisions than the purely manual design of MetaAd.

D.2 Online VM Placement

D.2.1 Problem Setting

Virtual Machine (VM) placement is the process of matching newly-created VMs to the most suitable servers in cloud data centers [13, 22, 26]. In this problem, once an end user sends a VM request, the cloud operator needs to select a physical server for it. Different VM requests require different amounts of physical computing resources.
For example, the compute-optimized instances of Amazon EC2 [2] have different sizes, each requiring a different amount of computing resources. Due to hardware heterogeneity [27], the available computing resources on different servers differ, and the utilities of different servers can also be different. Our goal is to optimize the total utility of VM placement.

We consider a setup where the cloud manager allocates V VMs (online nodes) to U different physical servers (offline nodes). Based on the requirements of VMs, a VM request can be matched to a subset of the physical servers. The connections between VM requests and physical servers are represented by a bipartite graph G. A VM request t at round t, t ∈ [V], has a computing load in the number of computing units denoted as z_t. Each server u ∈ U has a limited capacity of computing units (e.g., virtual cores), denoted as B'_u. If the VM request t is placed on a server u, the manager receives a utility proportional to the computing load, w_{u,t} = r_u z_t, where r_u is the utility of one computing unit on server u. Denoting x_{u,t} ∈ {0, 1} as the decision on whether to place request t on server u, the VM placement problem can be formulated as an OBM:

max  Σ_{t=1}^{V} Σ_{u∈U} w_{u,t} x_{u,t}
s.t. Σ_{t=1}^{V} w_{u,t} x_{u,t} ≤ B_u,  ∀u ∈ U,
     Σ_{u∈U} x_{u,t} ≤ 1,  ∀t ∈ [V],
     x_{u,t} ∈ {0, 1},  ∀u ∈ U, t ∈ [V],

where w_{u,t} = r_u z_t and B_u = r_u B'_u. In this OBM, a VM request is not divisible, which means fractional matching is not allowed at any time and FLM does not apply.

D.2.2 Experiment Setting

In the experiment, the cloud manager allocates V = 100 VMs (online nodes) to U = 10 different physical servers (offline nodes). We randomly generate graphs by the Barabási–Albert method [3]. For an online node v, we sample its degree (the number of offline nodes connected to it) from a Binomial distribution B(U, d_v/U), where d_v is the average degree of node v. The average degrees of online nodes are chosen from 4, 2, and 0.5. Each server has a capacity on the number of computing units. The capacity B'_u is sampled from a uniform distribution on the range [20, 40]. The computing load of a VM request is sampled from a uniform distribution on the range [1, 4]. The utility per computing unit r_u is the price of a computing unit on server u; we choose the price (in dollars) in the range [0.08, 0.12] according to the prices of the compute-optimized instances on Amazon EC2 [2]. We randomly generate 20k, 1k, and 1k samples of BA graphs for training, validation, and testing, respectively.
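The snippet below sketches how the VM-placement bids and budgets can be instantiated from the parameters above (w_{u,t} = r_u z_t and B_u = r_u B'_u); the sampling ranges mirror the stated setup, while the variable names and omission of the graph-connectivity mask are our own simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)
U, V = 10, 100
capacity = rng.uniform(20, 40, size=U)   # B'_u: computing units per server
rate = rng.uniform(0.08, 0.12, size=V % 2 * 0 + U)  # r_u: price per computing unit
load = rng.uniform(1, 4, size=V)         # z_t: units requested by VM t

# Bids and budgets in reward units: w_{u,t} = r_u * z_t, B_u = r_u * B'_u.
bids = np.outer(rate, load)              # shape (U, V)
budgets = rate * capacity
# (Edges absent from the Barabasi-Albert graph would additionally zero out
#  the corresponding entries of `bids`.)
```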
We compare MetaAd with the algorithms without ML (Greedy and Primal-Dual, introduced in Section D.1.1). Additionally, we compare our learning-augmented algorithm LOBM with the ML algorithm [1]; LOBM-λ means LOBM with the slackness parameter λ in Eqn. (28). For the learning-based algorithms, we use neural networks with two layers, each with 200 hidden neurons, for a fair comparison. The neural networks are trained by the Adam optimizer with a learning rate of 10^{-3} for 50 epochs.

D.2.3 Results

We first show the worst-case and average rewards of different algorithms for VM placement in Table 3.

            Algorithms w/o ML predictions   |  ML-based algorithms
            Greedy   Primal-Dual   MetaAd   |  ML       LOBM-0.8   LOBM-0.5   LOBM-0.3
Worst-case  0.6528   0.7937        0.8027   |  0.8005   0.8432     0.8253     0.8277
Average     0.6950   0.8343        0.8449   |  0.9626   0.9340     0.9619     0.9610

Table 3: Worst-case and average rewards of different algorithms for VM placement. The worst-case and average rewards are normalized by the optimal rewards.

We observe that MetaAd achieves higher worst-case and average rewards than the other algorithms without ML (i.e., Greedy and Primal-Dual). This is because MetaAd is more flexible in adjusting the discounting function. Additionally, ML predictions can significantly improve the average performance compared to algorithms without ML predictions. In particular, ML even has an empirically higher worst-case reward than Greedy and Primal-Dual, although it has no theoretical guarantee in terms of the worst-case competitive ratio. Note also that the worst-case reward ratio on the finite testing dataset is an empirical evaluation; the true competitive ratio of ML, lacking a theoretical guarantee, can be much lower than presented in the table. Importantly, LOBM achieves high average performance while theoretically guaranteeing a worst-case competitive ratio, as shown in Theorem 5.1. Moreover, LOBM achieves the highest empirical worst-case reward among all the algorithms, because LOBM can effectively correct low-quality ML predictions on some difficult testing examples through the learning-augmented design, while also leveraging good ML predictions to improve the performance on other testing examples.

Effects of θ and λ. Next, we give an ablation study of MetaAd and LOBM for different choices of the parameters θ and λ in Fig. 6. The parameter θ is the constant in the exponential discounting function, and λ is the slackness parameter in the competitive space in Eqn. (28). First, we give the worst-case and average rewards of MetaAd under different choices of θ in Fig. 6(a). We find that, compared with Primal-Dual (i.e., θ = 1), MetaAd can further improve the worst-case performance in the general-bid setting by decreasing θ, which is consistent with the competitive analysis in Corollary 4.2.2. Moreover, we provide the worst-case and average rewards of LOBM under different θ and λ in Fig. 6(b) and Fig. 6(c), respectively. The results show that when λ = 0, LOBM reduces to pure ML and achieves the same worst-case and average performance as ML.
The worst-case performance can be improved by increasing λ, since LOBM with a larger λ has a higher competitive ratio guarantee according to Theorem C.1. However, the average performance can be affected when λ becomes larger, because a larger λ results in a smaller solution space for increased robustness and hence may exclude some solutions with high rewards in average cases. When λ = 1, LOBM shares the same theoretical competitive ratio bound as MetaAd, but it can still achieve better empirical worst-case and average rewards than MetaAd when the ML model is well trained.

Figure 6: (a) Worst-case and average rewards of MetaAd with the exponential function class. (b) Worst-case reward of LOBM. (c) Average reward of LOBM. The worst-case and average rewards are normalized by the optimal rewards and are calculated empirically based on a testing dataset with 1000 samples.

NeurIPS Paper Checklist

1. Claims
Question: Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope?

Answer: [Yes]

Justification: The abstract and introduction clearly state the claims made, including the contributions made in the paper and important assumptions and limitations.

Guidelines:
- The answer NA means that the abstract and introduction do not include the claims made in the paper.
- The abstract and/or introduction should clearly state the claims made, including the contributions made in the paper and important assumptions and limitations. A No or NA answer to this question will not be perceived well by the reviewers.
- The claims made should match theoretical and experimental results, and reflect how much the results can be expected to generalize to other settings.
- It is fine to include aspirational goals as motivation as long as it is clear that these goals are not attained by the paper.

2. Limitations

Question: Does the paper discuss the limitations of the work performed by the authors?

Answer: [Yes]

Justification: The limitations are discussed in the conclusion section; they mainly concern the open question about the best choice of the discounting function.

Guidelines:
- The answer NA means that the paper has no limitation, while the answer No means that the paper has limitations but those are not discussed in the paper.
- The authors are encouraged to create a separate "Limitations" section in their paper.
- The paper should point out any strong assumptions and how robust the results are to violations of these assumptions (e.g., independence assumptions, noiseless settings, model well-specification, asymptotic approximations only holding locally). The authors should reflect on how these assumptions might be violated in practice and what the implications would be.
- The authors should reflect on the scope of the claims made, e.g., if the approach was only tested on a few datasets or with a few runs. In general, empirical results often depend on implicit assumptions, which should be articulated.
- The authors should reflect on the factors that influence the performance of the approach. For example, a facial recognition algorithm may perform poorly when image resolution is low or images are taken in low lighting. Or a speech-to-text system might not be used reliably to provide closed captions for online lectures because it fails to handle technical jargon.
- The authors should discuss the computational efficiency of the proposed algorithms and how they scale with dataset size.
- If applicable, the authors should discuss possible limitations of their approach to address problems of privacy and fairness.
- While the authors might fear that complete honesty about limitations might be used by reviewers as grounds for rejection, a worse outcome might be that reviewers discover limitations that aren't acknowledged in the paper. The authors should use their best judgment and recognize that individual actions in favor of transparency play an important role in developing norms that preserve the integrity of the community. Reviewers will be specifically instructed to not penalize honesty concerning limitations.

3. Theory Assumptions and Proofs

Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof?

Answer: [Yes]

Justification: Our paper summarizes the assumptions in Section 3. The proofs of our theorems are given in the appendix.
Guidelines:
- The answer NA means that the paper does not include theoretical results.
- All the theorems, formulas, and proofs in the paper should be numbered and cross-referenced.
- All assumptions should be clearly stated or referenced in the statement of any theorems.
- The proofs can either appear in the main paper or the supplemental material, but if they appear in the supplemental material, the authors are encouraged to provide a short proof sketch to provide intuition.
- Inversely, any informal proof provided in the core of the paper should be complemented by formal proofs provided in appendix or supplemental material.
- Theorems and Lemmas that the proof relies upon should be properly referenced.

4. Experimental Result Reproducibility

Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)?

Answer: [Yes]

Justification: The paper provides detailed settings of the experiments in Section D.

Guidelines:
- The answer NA means that the paper does not include experiments.
- If the paper includes experiments, a No answer to this question will not be perceived well by the reviewers: Making the paper reproducible is important, regardless of whether the code and data are provided or not.
- If the contribution is a dataset and/or model, the authors should describe the steps taken to make their results reproducible or verifiable.
- Depending on the contribution, reproducibility can be accomplished in various ways. For example, if the contribution is a novel architecture, describing the architecture fully might suffice, or if the contribution is a specific model and empirical evaluation, it may be necessary to either make it possible for others to replicate the model with the same dataset, or provide access to the model. In general, releasing code and data is often one good way to accomplish this, but reproducibility can also be provided via detailed instructions for how to replicate the results, access to a hosted model (e.g., in the case of a large language model), releasing of a model checkpoint, or other means that are appropriate to the research performed.
- While NeurIPS does not require releasing code, the conference does require all submissions to provide some reasonable avenue for reproducibility, which may depend on the nature of the contribution. For example: (a) If the contribution is primarily a new algorithm, the paper should make it clear how to reproduce that algorithm. (b) If the contribution is primarily a new model architecture, the paper should describe the architecture clearly and fully. (c) If the contribution is a new model (e.g., a large language model), then there should either be a way to access this model for reproducing the results or a way to reproduce the model (e.g., with an open-source dataset or instructions for how to construct the dataset). (d) We recognize that reproducibility may be tricky in some cases, in which case authors are welcome to describe the particular way they provide for reproducibility. In the case of closed-source models, it may be that access to the model is limited in some way (e.g., to registered users), but it should be possible for other researchers to have some path to reproducing or verifying the results.
5. Open Access to Data and Code

Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?

Answer: [No]

Justification: We will open-source the code upon the publication of the paper.

Guidelines:
- The answer NA means that the paper does not include experiments requiring code.
- Please see the NeurIPS code and data submission guidelines (https://nips.cc/public/guides/CodeSubmissionPolicy) for more details.
- While we encourage the release of code and data, we understand that this might not be possible, so No is an acceptable answer. Papers cannot be rejected simply for not including code, unless this is central to the contribution (e.g., for a new open-source benchmark).
- The instructions should contain the exact command and environment needed to run to reproduce the results. See the NeurIPS code and data submission guidelines (https://nips.cc/public/guides/CodeSubmissionPolicy) for more details.
- The authors should provide instructions on data access and preparation, including how to access the raw data, preprocessed data, intermediate data, and generated data, etc.
- The authors should provide scripts to reproduce all experimental results for the new proposed method and baselines. If only a subset of experiments are reproducible, they should state which ones are omitted from the script and why.
- At submission time, to preserve anonymity, the authors should release anonymized versions (if applicable).
- Providing as much information as possible in supplemental material (appended to the paper) is recommended, but including URLs to data and code is permitted.

6. Experimental Setting/Details

Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results?

Answer: [Yes]

Justification: The training and test details are summarized in Section D.

Guidelines:
- The answer NA means that the paper does not include experiments.
- The experimental setting should be presented in the core of the paper to a level of detail that is necessary to appreciate the results and make sense of them.
- The full details can be provided either with the code, in appendix, or as supplemental material.

7. Experiment Statistical Significance

Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments?

Answer: [Yes]

Justification: The paper reports the high-percentile performance in Figure 5 (Section D.1.2).

Guidelines:
- The answer NA means that the paper does not include experiments.
- The authors should answer "Yes" if the results are accompanied by error bars, confidence intervals, or statistical significance tests, at least for the experiments that support the main claims of the paper.
- The factors of variability that the error bars are capturing should be clearly stated (for example, train/test split, initialization, random drawing of some parameter, or overall run with given experimental conditions).
- The method for calculating the error bars should be explained (closed form formula, call to a library function, bootstrap, etc.).
- The assumptions made should be given (e.g., Normally distributed errors).
- It should be clear whether the error bar is the standard deviation or the standard error of the mean.
- It is OK to report 1-sigma error bars, but one should state it.
- The authors should preferably report a 2-sigma error bar rather than state that they have a 96% CI, if the hypothesis of Normality of errors is not verified.
- For asymmetric distributions, the authors should be careful not to show in tables or figures symmetric error bars that would yield results that are out of range (e.g., negative error rates).
- If error bars are reported in tables or plots, the authors should explain in the text how they were calculated and reference the corresponding figures or tables in the text.

8. Experiments Compute Resources

Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments?

Answer: [Yes]

Justification: The experiments can be reproduced on a personal computer with a CPU.

Guidelines:
- The answer NA means that the paper does not include experiments.
- The paper should indicate the type of compute workers (CPU or GPU, internal cluster, or cloud provider), including relevant memory and storage.
- The paper should provide the amount of compute required for each of the individual experimental runs as well as estimate the total compute.
- The paper should disclose whether the full research project required more compute than the experiments reported in the paper (e.g., preliminary or failed experiments that didn't make it into the paper).

9. Code of Ethics

Question: Does the research conducted in the paper conform, in every respect, with the NeurIPS Code of Ethics (https://neurips.cc/public/EthicsGuidelines)?

Answer: [Yes]

Justification:

Guidelines:
- The answer NA means that the authors have not reviewed the NeurIPS Code of Ethics.
- If the authors answer No, they should explain the special circumstances that require a deviation from the Code of Ethics.
- The authors should make sure to preserve anonymity (e.g., if there is a special consideration due to laws or regulations in their jurisdiction).

10. Broader Impacts

Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed?

Answer: [Yes]

Justification: We do not foresee negative societal impacts or the need for safeguards due to the theoretical nature of our work.

Guidelines:
- The answer NA means that there is no societal impact of the work performed.
- If the authors answer NA or No, they should explain why their work has no societal impact or why the paper does not address societal impact.
- Examples of negative societal impacts include potential malicious or unintended uses (e.g., disinformation, generating fake profiles, surveillance), fairness considerations (e.g., deployment of technologies that could make decisions that unfairly impact specific groups), privacy considerations, and security considerations.
- The conference expects that many papers will be foundational research and not tied to particular applications, let alone deployments. However, if there is a direct path to any negative applications, the authors should point it out. For example, it is legitimate to point out that an improvement in the quality of generative models could be used to generate deepfakes for disinformation. On the other hand, it is not needed to point out that a generic algorithm for optimizing neural networks could enable people to train models that generate Deepfakes faster.
- The authors should consider possible harms that could arise when the technology is being used as intended and functioning correctly, harms that could arise when the technology is being used as intended but gives incorrect results, and harms following from (intentional or unintentional) misuse of the technology.
- If there are negative societal impacts, the authors could also discuss possible mitigation strategies (e.g., gated release of models, providing defenses in addition to attacks, mechanisms for monitoring misuse, mechanisms to monitor how a system learns from feedback over time, improving the efficiency and accessibility of ML).

11. Safeguards

Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)?

Answer: [NA]

Justification: We did not foresee the need for safeguards due to the theoretical nature of our work.

Guidelines:
- The answer NA means that the paper poses no such risks.
- Released models that have a high risk for misuse or dual-use should be released with necessary safeguards to allow for controlled use of the model, for example by requiring that users adhere to usage guidelines or restrictions to access the model or implementing safety filters.
- Datasets that have been scraped from the Internet could pose safety risks. The authors should describe how they avoided releasing unsafe images.
- We recognize that providing effective safeguards is challenging, and many papers do not require this, but we encourage authors to take this into account and make a best faith effort.

12. Licenses for Existing Assets

Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected?

Answer: [Yes]

Justification: All original sources of the datasets used in the paper are cited.

Guidelines:
- The answer NA means that the paper does not use existing assets.
- The authors should cite the original paper that produced the code package or dataset.
- The authors should state which version of the asset is used and, if possible, include a URL.
- The name of the license (e.g., CC-BY 4.0) should be included for each asset.
- For scraped data from a particular source (e.g., website), the copyright and terms of service of that source should be provided.
- If assets are released, the license, copyright information, and terms of use in the package should be provided. For popular datasets, paperswithcode.com/datasets has curated licenses for some datasets. Their licensing guide can help determine the license of a dataset.
- For existing datasets that are re-packaged, both the original license and the license of the derived asset (if it has changed) should be provided.
- If this information is not available online, the authors are encouraged to reach out to the asset's creators.

13. New Assets

Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets?

Answer: [NA]

Justification:

Guidelines:
- The answer NA means that the paper does not release new assets.
- Researchers should communicate the details of the dataset/code/model as part of their submissions via structured templates. This includes details about training, license, limitations, etc.
- The paper should discuss whether and how consent was obtained from people whose asset is used.
- At submission time, remember to anonymize your assets (if applicable). You can either create an anonymized URL or include an anonymized zip file.

14. Crowdsourcing and Research with Human Subjects

Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)?

Answer: [NA]

Justification:

Guidelines:
- The answer NA means that the paper does not involve crowdsourcing nor research with human subjects.
- Including this information in the supplemental material is fine, but if the main contribution of the paper involves human subjects, then as much detail as possible should be included in the main paper.
- According to the NeurIPS Code of Ethics, workers involved in data collection, curation, or other labor should be paid at least the minimum wage in the country of the data collector.

15. Institutional Review Board (IRB) Approvals or Equivalent for Research with Human Subjects

Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or institution) were obtained?

Answer: [NA]

Justification:

Guidelines:
- The answer NA means that the paper does not involve crowdsourcing nor research with human subjects.
- Depending on the country in which research is conducted, IRB approval (or equivalent) may be required for any human subjects research. If you obtained IRB approval, you should clearly state this in the paper.
- We recognize that the procedures for this may vary significantly between institutions and locations, and we expect authors to adhere to the NeurIPS Code of Ethics and the guidelines for their institution.
- For initial submissions, do not include any information that would break anonymity (if applicable), such as the institution conducting the review.