# Online Submodular Maximization via Adaptive Thresholds

Zhengchen Yang¹ and Jiping Zheng¹,²

¹College of Computer Science & Technology, Nanjing University of Aeronautics & Astronautics, Nanjing, China
²State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
{yangzc, jzh}@nuaa.edu.cn

Submodular function maximization has been studied extensively in recent years due to its numerous applications in machine learning and artificial intelligence. We study a natural online variant of this problem on massive streaming data, in which elements arrive one by one and the algorithm has to maintain a solution under a cardinality constraint k. Upon arrival of an element, the algorithm for maximizing a monotone submodular function has to decide whether to accept the element, and it may replace a previously chosen element. Existing algorithms cannot simultaneously achieve optimal performance in terms of competitive ratio, memory complexity and running time; moreover, the algorithm with the best competitive ratio performs poorly in practice. In this paper, we propose a new algorithm, ONLINEADAPTIVE, with optimal performance, which exploits adaptive thresholds to decide the acceptance of arriving elements by replacement. We prove that the competitive ratio of ONLINEADAPTIVE is at least 1/4; the ratio is about 0.2959 when k ≥ 4 and approaches 0.3178 as k tends to infinity. In addition, ONLINEADAPTIVE needs only O(k) memory and performs just one oracle call per element. Experiments on diverse datasets confirm that ONLINEADAPTIVE outperforms existing algorithms in both quality and efficiency.

1 Introduction

Submodularity captures the diminishing-returns property of set functions arising in numerous machine learning and artificial intelligence applications, such as data summarization [Mirzasoleiman et al., 2016; Kumari and Bilmes, 2021], active learning [Wei et al., 2015], feature selection [Schlegel et al., 2017], influence maximization [Becker et al., 2022], and user recommendation [Ashkan et al., 2015]. A set function $f : 2^V \to \mathbb{R}_+$ with ground set $V$ is submodular if $f(A \cup \{e\}) - f(A) \ge f(B \cup \{e\}) - f(B)$ for all sets $A \subseteq B \subseteq V$ and every element $e \in V \setminus B$. The submodular function $f$ is monotone if $f(A) \le f(B)$ for all $A \subseteq B$.

As a natural extension, online submodular maximization of monotone functions plays an important role on ubiquitous streaming data in real-world applications. Consider the example of a soccer team manager recruiting k players for the 2026 World Cup. When receiving an application from a new potential player, the manager needs to decide whether to recruit the player. When the number of players in the team would exceed k, e.g., k = 11, a replacement occurs to satisfy the cardinality constraint k. The goal of the manager is to maximize the quality of the team, which is indeed a submodular function of the chosen players [Buchbinder et al., 2015]. As another example, in online or reinforcement learning, to avoid the inefficiency of kernel prototype selection caused by multiple passes over the dataset and maintaining a large number of parallel solutions, the prototypes are selected in an online manner [Schlegel et al., 2017]. Specifically, when a prototype from an infinite or even uncountable space of observations is given, the algorithm needs to decide immediately whether to include it. The objective is to maximize the coverage achieved by the chosen kernel prototypes. Similar scenarios can be found in many other applications.
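To make the two definitions above concrete, here is a minimal, self-contained Python sketch (an illustration of ours, not from the paper's released code) of a weighted coverage function, a standard example that is both monotone and submodular:

```python
# A weighted coverage function f(S) = total weight of items covered by S.
# Coverage functions are monotone and submodular, so the two asserts below
# instantiate exactly the definitions given above.

def make_coverage(cover, weight):
    """cover: element -> set of items; weight: item -> nonnegative weight."""
    def f(S):
        covered = set().union(*(cover[e] for e in S)) if S else set()
        return sum(weight[i] for i in covered)
    return f

cover = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}
weight = {1: 1.0, 2: 2.0, 3: 1.5, 4: 0.5, 5: 1.0}
f = make_coverage(cover, weight)

A, B, e = {"a"}, {"a", "b"}, "c"    # A is a subset of B, and e is not in B
assert f(A | {e}) - f(A) >= f(B | {e}) - f(B)    # diminishing returns
assert f(A) <= f(B)                              # monotonicity
```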
In this paper, we consider the online monotone submodular maximization problem with replacement, subject to a cardinality constraint k. An immediate decision is made for each arriving element; discarded elements are never reconsidered, while accepted elements may later be replaced by incoming elements when a better result can be achieved. The representative streaming submodular maximization algorithms, such as STREAMGREEDY [Gomes and Krause, 2010], SIEVE-STREAMING [Badanidiyuru et al., 2014], SIEVE-STREAMING++ [Kazemi et al., 2019], SALSA [Norouzi-Fard et al., 2018], QUICKSTREAM [Kuhnle, 2021], and THREESIEVES [Buschjäger et al., 2021], cannot meet the requirements of the above applications since these algorithms need to cache some elements for later evaluation. Our problem also differs from the classical submodular secretary problem [Hajiaghayi et al., 2004; Bateni et al., 2013], because accepted elements may be replaced by incoming elements when the algorithm can thereby achieve a better solution.

Existing algorithms for online submodular maximization with replacement cannot simultaneously perform well in all three aspects, i.e., competitive ratio, memory complexity and running time. INDEPENDENTSETIMPROVEMENT [Chakrabarti and Kale, 2015], STREAMINGGREEDY [Chekuri et al., 2015] and PREEMPTION [Buchbinder et al., 2015; Buchbinder et al., 2019] do not achieve the best competitive ratio; they all have a competitive ratio of 1/4. Moreover, STREAMINGGREEDY and PREEMPTION need O(k) oracle calls (function evaluations) per element, leading to high time consumption. The current state-of-the-art theoretical result for online submodular maximization with replacement is achieved by the FREEDISPOSAL algorithm [Chan et al., 2017; Chan et al., 2018]. However, the memory complexity of FREEDISPOSAL is O(n), which is intolerable for large-scale datasets in practice.

Our contributions. We design a novel algorithm, ONLINEADAPTIVE, that simultaneously provides the best competitive ratio, memory complexity and running time for the online monotone submodular maximization problem with replacement. More precisely, we consider the optimization problem $\max_{S \subseteq V, |S| = k} f(S)$, where $V = \{v_1, v_2, \ldots, v_n\}$ is a possibly infinite ground set whose elements are revealed one by one, and $S \subseteq V$ is a solution set, initialized to the empty set, to which revealed elements are added according to elaborately designed thresholds. At any timestamp t (1 ≤ t ≤ n), $S_t$ is the feasible solution set after element $v_t$ is revealed. When |S| < k, elements are selected into S by the thresholds; when |S| = k, a replacement may occur for the revealed element $v_t$ to obtain a better result set. We assume that f is given as an evaluation oracle: when we specify $S \subseteq V$, the oracle returns the value of f(S). Our ONLINEADAPTIVE algorithm achieves a competitive ratio of about 0.2959 when k ≥ 4 by using adaptive thresholds for replacement, and the ratio approaches 0.3178 as k tends to infinity, matching the current state-of-the-art results. Besides, ONLINEADAPTIVE needs only O(k) memory, since its analysis relates the current solution to the set of all previously accepted elements without ever storing that set. Moreover, for each revealed element, ONLINEADAPTIVE needs only O(1) oracle calls.
We also conduct experiments on diverse datasets; the experimental results show that the quality, i.e., the submodular function value, of our ONLINEADAPTIVE algorithm is comparable to those of the other algorithms. Furthermore, ONLINEADAPTIVE runs faster than the existing algorithms, and its number of oracle calls is much smaller than those of the other algorithms.

2 Related Work

Since the seminal work [Nemhauser et al., 1978] exhibited the nice properties of submodular functions, which admit near-optimal solutions via the simple greedy algorithm GREEDY with a 1 − 1/e approximation ratio, the past decades have witnessed a flourishing study of submodular function maximization [Krause and Golovin, 2014; Bilmes, 2023]. In the following we only review the literature on monotone submodular function maximization over streaming data. Note that the identically named problem studied by [Streeter et al., 2009; Si-Salem et al., 2024] focuses on learning the submodular functions themselves and has a different process and objective; it is thus completely different from ours.

To deal with massive streaming data, STREAMGREEDY [Gomes and Krause, 2010] extends the GREEDY algorithm by simply accepting the first k elements and subsequently accepting an element if the improvement from replacing an element in the current solution set surpasses a fixed threshold. STREAMGREEDY achieves a 1/2 − ϵ approximation only if multiple passes over the streaming data are allowed, where ϵ depends on the number of passes and some specified parameters. The first proper streaming algorithm with the same theoretical guarantee is SIEVE-STREAMING [Badanidiyuru et al., 2014], which maintains a number of parallel solutions using sieves of different sizes; at any time, the solution with the maximum function value is returned as the result set. Several enhancements make SIEVE-STREAMING more efficient or practical. The SIEVE-STREAMING++ algorithm [Kazemi et al., 2019] reduces the number of sieves by offering a better lower bound on the function value. Moreover, SIEVE-STREAMING++ requires only O(k/ϵ) memory instead of the O(k log k/ϵ) of SIEVE-STREAMING. The SALSA algorithm [Norouzi-Fard et al., 2018] also achieves a 1/2 − ϵ approximation but needs only O(k log k) memory, under the assumption that the elements of the data stream arrive in random order. The QUICKSTREAM [Kuhnle, 2021] and THREESIEVES [Buschjäger et al., 2021] algorithms aim at decreasing the number of oracle calls to improve efficiency. QUICKSTREAM evaluates the function f only every c elements and achieves a 1/(4c) − ϵ approximation, while THREESIEVES only accepts informative elements of the data stream, leading to O(1) oracle calls per element and O(k) memory, and its approximation further improves to (1 − ϵ)(1 − 1/e) with high probability. Since these streaming algorithms need to maintain parallel solutions, i.e., elements are cached without an immediate decision, they are not suitable for the online setting.

To meet the requirements of online submodular maximization, some single-pass streaming algorithms are indeed applicable to online scenarios, such as STREAMINGGREEDY [Chekuri et al., 2015] and INDEPENDENTSETIMPROVEMENT [Chakrabarti and Kale, 2015]. They share the same structure: the first k elements are accepted directly, and a replacement occurs when the marginal gain of an arriving element is at least twice the minimum marginal gain among the elements in the solution set.
They both achieve a competitive ratio of 1/4 and O(k) memory complexity by recording the marginal gains; however, INDEPENDENTSETIMPROVEMENT performs only one oracle call per element instead of the O(k) oracle calls per element of the STREAMINGGREEDY algorithm. The PREEMPTION algorithm [Buchbinder et al., 2015; Buchbinder et al., 2019] is explicitly designed for submodular function maximization in the online setting. PREEMPTION also accepts the first k elements and subsequently accepts an element if the improvement from replacing an element in the current solution surpasses the average value of the elements in it. PREEMPTION achieves the same competitive ratio and memory complexity as STREAMINGGREEDY and INDEPENDENTSETIMPROVEMENT, but its theoretical analysis is much simpler. The FREEDISPOSAL algorithm [Chan et al., 2017; Chan et al., 2018] achieves the current state-of-the-art competitive ratio by delicately designing the acceptance conditions with the help of an auxiliary set storing all ever-accepted elements. FREEDISPOSAL has a competitive ratio of at least 0.2959, and the ratio approaches 0.3178 as k tends to infinity, at the cost of O(n) memory to store the auxiliary set. For large-scale streaming data, this memory consumption is intolerable, which makes FREEDISPOSAL inapplicable in real-world applications. In summary, none of the existing algorithms for online monotone submodular function maximization simultaneously achieves the optimal competitive ratio, acceptable memory complexity and the fewest oracle calls, the latter of which usually results in the least running time.

3 Preliminaries

In this section, we present some definitions and observations that are helpful in our analysis. Let [e] denote the timestamp at which element e arrives. We first define the marginal gain and the incremental value of an element e with [e] < t.

Definition 1. The marginal gain of e over its solution set $S_{[e]-1}$ is defined as $f(e \mid S_{[e]-1}) = f(S_{[e]-1} \cup \{e\}) - f(S_{[e]-1})$.

Definition 2. The incremental value of e over its solution set $S_{[e]-1}$ and the current solution set $S_{t-1}$ is defined as $f(e \mid S_{[e]-1} \cup S_{t-1}) = f(S_{[e]-1} \cup S_{t-1} \cup \{e\}) - f(S_{[e]-1} \cup S_{t-1})$.

Let $m(e)$ and $c_t(e)$ denote the marginal gain and the incremental value of e, respectively; we have $c_t(e) \le m(e)$ by the submodularity of f. To obtain a lower bound on $f(S_t)$, we use the convention $f(\emptyset) = 0$ and accumulate the marginal gains of the elements in $S_t$, which naturally yields:

Lemma 1. For any timestamp $t \in \{1, 2, \ldots, n\}$ with solution set $S_t$, we have $f(S_t) \ge \sum_{e \in S_t} m(e)$.

To achieve the optimal competitive ratio, we observe that it is related to the cardinality constraint k. Moreover, we only consider k ≥ 4, because for smaller k the trivial algorithm that keeps only the singleton with the largest value already achieves a 1/k ≥ 1/3 competitive ratio. For a fixed k, we define three constants: $\zeta := \log_2 \log_{1.2} k$, $\eta$ as the positive root of the equation $(1+x)^{k+1} = kx + x + 2$, and $\beta_0 := \frac{1 + k\eta}{(1+\eta)^k - 1}$. Next, based on these, we define three parameters $\alpha_t$, $\beta_t$ and $\tau_t$ that vary with t, in a manner similar to [Chan et al., 2017; Ene and Nguyen, 2022]:

$$\alpha_t := \exp\!\left(\left(\frac{|S_t|}{k}\right)^{\zeta} \log r\right) \cdot \eta, \qquad \beta_t := \frac{1 + k\alpha_t}{(1+\alpha_t)^k - 1}, \qquad \tau_t := \sum_{i=1}^{|S_t|} (1+\alpha_t)^{i-1}\, m(e_i),$$

where r is an adjusting parameter and $m(e_i)$ is the i-th largest marginal gain in $\{m(e) : e \in S_t\}$. With these definitions in place, we observe that our threshold $\tau_t$ satisfies a monotonicity property.
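To make these quantities concrete, the following small Python sketch (our own illustration; the function names and bisection tolerance are ours) computes η numerically and then evaluates $\alpha_t$, $\beta_t$ and $\tau_t$ from a list of recorded marginal gains; for k = 4 it reproduces η ≈ 0.2756, the value that reappears in Corollary 1 below:

```python
import math

def eta(k):
    """Positive root of (1 + x)**(k + 1) == k*x + x + 2, found by bisection."""
    g = lambda x: (1 + x) ** (k + 1) - ((k + 1) * x + 2)
    lo, hi = 1e-12, 2.0          # g changes sign on this interval for k >= 1
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    return (lo + hi) / 2

def alpha_t(size, k, r, zeta, eta_k):
    # equals eta_k at |S_t| = 0 and r * eta_k at |S_t| = k
    return math.exp((size / k) ** zeta * math.log(r)) * eta_k

def beta_t(a, k):
    return (1 + k * a) / ((1 + a) ** k - 1)

def tau_t(gains, a):
    # the i-th largest gain gets weight (1 + a)^(i-1): smaller gains
    # receive larger weights
    return sum((1 + a) ** i * g
               for i, g in enumerate(sorted(gains, reverse=True)))

k, r = 4, 2.0
zeta = math.log2(math.log(k, 1.2))
print(round(eta(k), 4))                      # ~0.2756
a = alpha_t(3, k, r, zeta, eta(k))           # |S_t| = 3 out of k = 4
print(beta_t(a, k), tau_t([0.9, 0.5, 0.2], a))
```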
Lemma 2. $\tau_t$ is monotone non-decreasing in the timestamp t, i.e., for every timestamp $t \in \{1, 2, \ldots, n\}$ we have $\tau_{t-1} \le \tau_t$.

Lemma 2 holds immediately because both $\alpha_t$ and the sorted marginal gains $m(e_i)$ are non-decreasing.

Algorithm 1: The ONLINEADAPTIVE algorithm

Input: Cardinality constraint k, parameters β₀, ζ, η, r.
Output: The solution set Sₜ at timestamp t.
1  S₀ ← ∅, τ₀ ← 0.
2  foreach arriving element vₜ do
3    m(vₜ) ← f(vₜ | Sₜ₋₁).
4    if m(vₜ) ≥ (βₜ₋₁ / k) · τₜ₋₁ then
5      if |Sₜ₋₁| < k then
6        Sₜ ← Sₜ₋₁ ∪ {vₜ}.
7      else
8        Let v′ₜ = arg min₍ₑ ∈ Sₜ₋₁₎ m(e).
9        Sₜ ← (Sₜ₋₁ ∪ {vₜ}) \ {v′ₜ}.
10   αₜ ← exp((|Sₜ|/k)^ζ · log r) · η.
11   βₜ ← (1 + kαₜ) / ((1 + αₜ)^k − 1).
12   τₜ ← Σᵢ₌₁^{|Sₜ|} (1 + αₜ)^{i−1} m(eᵢ), where m(eᵢ) is the i-th largest marginal gain in {m(e) : e ∈ Sₜ}.
13 return Sₜ

4 The ONLINEADAPTIVE Algorithm

In this section, we present our ONLINEADAPTIVE algorithm for maximizing monotone submodular functions in the online setting. We first show the advantages of our adaptive thresholding strategy, and then present ONLINEADAPTIVE based on the constants and parameters defined above.

Replacement condition. For the arriving element $v_t$, our adaptive thresholding strategy can be expressed simply as

$$m(v_t) \ \ge\ \frac{\beta_{t-1}}{k}\, \tau_{t-1}.$$

Expanding the marginal gain $m(v_t)$ and the threshold $\tau_{t-1}$ as defined, we obtain the general form of the replacement condition:

$$f(v_t \mid S_{t-1}) \ \ge\ \frac{\beta_{t-1}}{k} \sum_{i=1}^{|S_{t-1}|} (1+\alpha_{t-1})^{i-1}\, m(e_i),$$

where $e_i$ has the i-th largest marginal gain in $\{m(e) : e \in S_{t-1}\}$. These thresholds, built from the delicately chosen constants and parameters, pave the way to the state-of-the-art competitive ratio.

The algorithm. Algorithm 1 shows the details of our proposed ONLINEADAPTIVE algorithm. Unlike previous online algorithms such as STREAMINGGREEDY, INDEPENDENTSETIMPROVEMENT and PREEMPTION, the first k elements are not accepted directly but must satisfy the replacement condition (Lines 4–6), which guarantees that the accepted elements contribute enough while |Sₜ| < k. When |Sₜ₋₁| = k, the element of Sₜ₋₁ with the least contribution is replaced (Lines 8–9). After that, the parameters αₜ, βₜ and τₜ are updated along with the solution set Sₜ (Lines 10–12); αₜ and βₜ remain unchanged once |Sₜ| = k. Note that αₜ is adjusted by an additional parameter r (r > 1): since m(eᵢ) decreases as i increases, assigning the larger weight (1 + αₜ)^{i−1} (Line 12) to the smaller m(eᵢ) means we do not neglect accepted elements with small marginal gains. With the help of this adjusting parameter r, the quality of the solution set is improved, which is also verified in the experiments.

Advantages. In contrast to existing algorithms for online submodular maximization, our adaptive thresholding strategy in Algorithm 1 simultaneously offers three advantages: a) An improved competitive ratio. Compared to STREAMINGGREEDY, INDEPENDENTSETIMPROVEMENT and PREEMPTION, the adaptivity of the thresholds over t makes it possible to achieve a competitive ratio higher than 1/4. b) Memory and computation efficiency. Re-examining the constants and parameters used in our adaptive strategy, we find they depend only on the user-specified cardinality constraint k and the elements of the solution set, instead of on the set of ever-accepted elements as in FREEDISPOSAL. For FREEDISPOSAL, which has the best competitive ratio, the size of the set of ever-accepted elements is O(n) in the worst case; compared to it, our thresholding strategy is much more efficient in memory and time consumption. c) Fewer oracle calls. Since we record the marginal gain of each element with respect to the solution set at its arrival for later reuse, many oracle calls are avoided, which is especially beneficial when function evaluations are time-consuming. A minimal implementation sketch follows.
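The sketch below is a runnable Python rendering of Algorithm 1 under our reading of the pseudocode (the identifiers are ours, and the authors' released implementation may differ). Following the paper's accounting, the marginal-gain evaluation of each arriving element counts as its single oracle query:

```python
import math

def positive_root(k):
    """Bisection for the positive root of (1 + x)**(k + 1) == k*x + x + 2."""
    g = lambda x: (1 + x) ** (k + 1) - ((k + 1) * x + 2)
    lo, hi = 1e-12, 2.0
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    return (lo + hi) / 2

def online_adaptive(stream, f, k, r=1.0):
    """One pass of Algorithm 1: `f` is the set-function oracle, `stream`
    yields elements one by one, and at most k elements are retained."""
    eta = positive_root(k)
    zeta = math.log2(math.log(k, 1.2))
    S, gains = [], {}     # current solution and recorded marginal gains m(e)
    tau = 0.0             # tau_0 = 0
    beta = (1 + k * eta) / ((1 + eta) ** k - 1)   # beta_0
    for v in stream:
        m_v = f(S + [v]) - f(S)       # marginal gain of v w.r.t. S_{t-1}
        if m_v >= beta * tau / k:     # replacement condition (Line 4)
            if len(S) == k:           # evict the least-contributing element
                v_min = min(S, key=gains.get)
                S.remove(v_min)
                del gains[v_min]
            S.append(v)
            gains[v] = m_v
            # update alpha_t, beta_t and tau_t (Lines 10-12)
            alpha = math.exp((len(S) / k) ** zeta * math.log(r)) * eta
            beta = (1 + k * alpha) / ((1 + alpha) ** k - 1)
            ordered = sorted(gains.values(), reverse=True)
            tau = sum((1 + alpha) ** i * g for i, g in enumerate(ordered))
    return S
```

For instance, plugging in the coverage function from the sketch in Section 1, `online_adaptive(["a", "b", "c"], f, k=2)` returns a feasible two-element solution; a production version would keep the gains in a priority queue, as noted after Corollary 2.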
5 Theoretical Analysis

In this section, we establish the competitive ratio of our ONLINEADAPTIVE algorithm as well as its memory and computation complexities. As shown in Lemma 2, τₜ is monotone non-decreasing. More precisely, we quantify the difference between two adjacent thresholds. When $|S_{t-1}| < k$, it is easy to see that $\tau_t \ge \tau_{t-1} + m(v_t)$ whenever $v_t$ is accepted. When $|S_{t-1}| = k$, we know that $\alpha_t$ remains unchanged, with $\alpha_{t-1} = \alpha_t = r\eta$; the difference between $\tau_{t-1}$ and $\tau_t$ is then quantified by Lemma 3.

Lemma 3. When $|S_{t-1}| = k$, let $\tau_{t-1}$ and $\tau_t$ be two adjacent thresholds. Then

(a) $\tau_t \ge \tau_{t-1} + m(v_t) - (1+r\eta)^{k-1}\, m(v'_t)$, and

(b) $\tau_t \le (1+r\eta)\,\tau_{t-1} + m(v_t) - (1+r\eta)^{k}\, m(v'_t)$,

where $v_t$ is the arriving element replacing $v'_t = \arg\min_{e \in S_{t-1}} m(e)$.

Proof. When the replacement occurs, i.e., $v_t$ replaces $v'_t$ because $m(v_t)$ exceeds the threshold, the set of recorded marginal gains changes. Let $m(v_t)$ be the p-th largest marginal gain in $\{m(e) : e \in S_t\}$, and order the terms of $\tau_{t-1}$ and $\tau_t$ by marginal gains. The first p − 1 terms of $\tau_t$ are the same as in $\tau_{t-1}$; the p-th term becomes $(1+r\eta)^{p-1} m(v_t)$; and for $p+1 \le i \le k$, the i-th term of $\tau_t$ is the (i − 1)-th term of $\tau_{t-1}$ multiplied by the coefficient $1+r\eta$. Thus,

$$\tau_t - \tau_{t-1} = (1+r\eta)^{p-1} m(v_t) - (1+r\eta)^{k-1} m(v'_t) + \sum_{i=p+1}^{k} \left[(1+r\eta)^{i-1} - (1+r\eta)^{i-2}\right] m(e_{i-1}), \quad (1)$$

where $m(e_{i-1})$ is the (i − 1)-th largest marginal gain in $\{m(e) : e \in S_{t-1}\}$. Since $1 + r\eta > 1$ and $m(e_i) \ge 0$, Inequality (a) immediately holds. Further, since $m(v_t) \le m(e_i)$ for $1 \le i \le p-1$, we have

$$\sum_{i=p+1}^{k} (1+r\eta)^{i-2}\, m(e_{i-1}) = \tau_{t-1} - (1+r\eta)^{k-1} m(v'_t) - \sum_{i=1}^{p-1} (1+r\eta)^{i-1} m(e_i) \le \tau_{t-1} - (1+r\eta)^{k-1} m(v'_t) - \frac{(1+r\eta)^{p-1} - 1}{r\eta}\, m(v_t). \quad (2)$$

Combining Equations (1) and (2) with $(1+r\eta)^{i-1} - (1+r\eta)^{i-2} = r\eta\,(1+r\eta)^{i-2}$ yields Inequality (b).

Next, we show that βτ is also monotone non-decreasing.

Lemma 4. Let $\tau_{t-1}$ and $\tau_t$ be two adjacent thresholds, and $\beta_{t-1}$, $\beta_t$ the corresponding parameters. Then $\beta_{t-1}\tau_{t-1} \le \beta_t\tau_t$, i.e., βτ is monotone non-decreasing. Moreover, if $|S_{t-1}| < k$, we have $\beta_t\tau_t \ge \beta_{t-1}\tau_{t-1} \ge \beta_t\tau_{t-1}$.

Proof. When $|S_{t-1}| = k$, $\beta_{t-1}\tau_{t-1} \le \beta_t\tau_t$ holds trivially since $\beta_{t-1} = \beta_t$ and $\tau_{t-1} \le \tau_t$ (Lemma 2). When $|S_{t-1}| < k$, nothing changes if $v_t$ is rejected; if $v_t$ is accepted, then by Algorithm 1,

$$\beta_t\tau_t \ \ge\ \beta_t\big(\tau_{t-1} + m(v_t)\big) \ \ge\ \beta_t\Big(1 + \frac{\beta_{t-1}}{k}\Big)\tau_{t-1} \ \ge\ \beta_{t-1}\tau_{t-1} \ \ge\ \beta_t\tau_{t-1},$$

where the second inequality uses the acceptance condition of Line 4 and the last one uses $\beta_t \le \beta_{t-1}$.

Now let $S^*_t$ and $A_t$ denote the optimal solution set and the set of all ever-accepted elements at timestamp t, respectively. We obtain an upper bound on $f(S^*_t)$.

Lemma 5. An upper bound on the optimal solution value is $f(S^*_t) \le \sum_{e \in A_t} m(e) + \beta_t\tau_t$.

Proof. By the monotonicity and submodularity of f and $S_t \subseteq A_t$, we have

$$f(S^*_t) \le f(S^*_t \cup A_t) \le \sum_{e \in A_t} f(e \mid A_{[e]-1}) + \sum_{e \in S^*_t \setminus A_t} f(e \mid A_{[e]-1}) \le \sum_{e \in A_t} f(e \mid S_{[e]-1}) + \sum_{e \in S^*_t \setminus A_t} f(e \mid S_{[e]-1}) = \sum_{e \in A_t} m(e) + \sum_{e \in S^*_t \setminus A_t} m(e).$$

Since every element of $S^*_t \setminus A_t$ was rejected by the algorithm, i.e., $m(e) \le \frac{\beta_{[e]-1}}{k}\tau_{[e]-1}$, and since $|S^*_t \setminus A_t| \le k$, Lemma 4 further gives

$$f(S^*_t) \le \sum_{e \in A_t} m(e) + \sum_{e \in S^*_t \setminus A_t} \frac{\beta_{[e]-1}\,\tau_{[e]-1}}{k} \le \sum_{e \in A_t} m(e) + \beta_t\tau_t.$$
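Before turning to the main theorem, a quick numeric sanity check of Lemma 3 may be reassuring. The snippet below (our own, with arbitrary illustrative values for k and rη) simulates replacements on random recorded gains and verifies both bounds:

```python
import random

def tau_of(gains, a):
    """tau = sum over i of (1 + a)^(i-1) * (i-th largest gain)."""
    return sum((1 + a) ** i * g
               for i, g in enumerate(sorted(gains, reverse=True)))

random.seed(0)
k, r_eta = 8, 0.3                          # r*eta treated as one constant here
for _ in range(1000):
    gains = sorted((random.random() for _ in range(k)), reverse=True)
    v_min = gains[-1]                      # m(v'_t), the smallest recorded gain
    m_new = random.uniform(v_min, 2.0)     # gain of the accepted element v_t
    tau_prev = tau_of(gains, r_eta)
    tau_next = tau_of(gains[:-1] + [m_new], r_eta)
    # Lemma 3(a): tau_t >= tau_{t-1} + m(v_t) - (1 + r*eta)^(k-1) * m(v'_t)
    assert tau_next >= tau_prev + m_new - (1 + r_eta) ** (k - 1) * v_min - 1e-9
    # Lemma 3(b): tau_t <= (1 + r*eta)*tau_{t-1} + m(v_t) - (1 + r*eta)^k * m(v'_t)
    assert tau_next <= (1 + r_eta) * tau_prev + m_new - (1 + r_eta) ** k * v_min + 1e-9
```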
We are now ready to prove the competitive ratio of ONLINEADAPTIVE.

Theorem 1. Algorithm 1 is $\frac{1}{\rho_{k,r}}$-competitive, where

$$\rho_{k,r} = 1 + kr\eta + \frac{1 + kr\eta}{(1+r\eta)^k - 1}.$$

Proof. We first consider the case $|S_{t-1}| < k$, in which $A_t = S_t$. We have

$$\sum_{e \in A_t} m(e) + \beta_t\tau_t = \sum_{i=1}^{|S_t|} m(e_i) + \beta_t \sum_{i=1}^{|S_t|} (1+\alpha_t)^{i-1} m(e_i) \le \left(1 + k\alpha_t + \frac{1+k\alpha_t}{(1+\alpha_t)^k - 1}\right) \sum_{i=1}^{|S_t|} m(e_i),$$

where the inequality bounds the weighted sum by the unweighted one, using the fact that the weights increase in i while the gains $m(e_i)$ decrease. By Lemmas 1 and 5 and the monotonicity of $\alpha_t$, we obtain

$$f(S^*_t) \le \left(1 + k\alpha_t + \frac{1+k\alpha_t}{(1+\alpha_t)^k - 1}\right) f(S_t) \le \left(1 + kr\eta + \frac{1+kr\eta}{(1+r\eta)^k - 1}\right) f(S_t) = \rho_{k,r}\, f(S_t).$$

Next, we consider the case $|S_{t-1}| = k$. For brevity, let $U_t = \sum_{e \in A_t} m(e) + \beta_t\tau_t$ denote the upper bound on $f(S^*_t)$ and $L_t = \sum_{e \in S_t} m(e)$ the lower bound on $f(S_t)$. Since we have proved $f(S^*_t) \le U_t \le \rho_{k,r} L_t \le \rho_{k,r} f(S_t)$ for the case $|S_{t-1}| < k$, it suffices to prove the case $|S_{t-1}| = k$ by induction. If ONLINEADAPTIVE rejects $v_t$, we have $U_t = U_{t-1}$ and $L_t = L_{t-1}$, so

$$f(S^*_t) \le U_t = U_{t-1} \le \rho_{k,r} L_{t-1} = \rho_{k,r} L_t \le \rho_{k,r} f(S_t).$$

If the algorithm accepts $v_t$, we have

$$\begin{aligned} f(S^*_t) \le U_t &= U_{t-1} + m(v_t) + \beta_t\tau_t - \beta_{t-1}\tau_{t-1} = U_{t-1} + m(v_t) + \beta_t(\tau_t - \tau_{t-1}) \\ &\le U_{t-1} + (1+\beta_t)\, m(v_t) + \beta_t r\eta\, \tau_{t-1} - \beta_t (1+r\eta)^k\, m(v'_t) \\ &\le U_{t-1} + (1+\beta_t)\, m(v_t) + kr\eta\, m(v_t) - \beta_t (1+r\eta)^k\, m(v'_t) \\ &= U_{t-1} + \rho_{k,r}\, m(v_t) - \rho_{k,r}\, m(v'_t) \\ &\le \rho_{k,r} L_{t-1} + \rho_{k,r}\, m(v_t) - \rho_{k,r}\, m(v'_t) = \rho_{k,r}\big(L_{t-1} + m(v_t) - m(v'_t)\big) = \rho_{k,r} L_t \le \rho_{k,r} f(S_t). \end{aligned}$$

Here, the second and third inequalities hold by Lemma 3(b) and the acceptance condition of Algorithm 1, respectively, and the fourth inequality is induced from the case $|S_{t-1}| < k$; the last equality step also uses the identities $1 + \beta_t + kr\eta = \rho_{k,r}$ and $\beta_t(1+r\eta)^k = \rho_{k,r}$, valid since $\alpha_t = r\eta$. Overall, the competitive ratio of our ONLINEADAPTIVE algorithm is at least $\frac{1}{\rho_{k,r}}$.

Corollary 1. For r = 1, the competitive ratio of ONLINEADAPTIVE is at least 1/4. Moreover, the competitive ratio is at least 0.2959 when k ≥ 4, and approaches 0.3178 as k tends to infinity.

Proof. By the proof of Theorem 1, for r = 1 we have $\rho_{k,1} = g(\eta) = 1 + k\eta + \frac{1+k\eta}{(1+\eta)^k - 1}$, where η, the positive root of $(1+x)^{k+1} = kx + x + 2$, is exactly the value minimizing g. When k = 1, we have η = 1, and the competitive ratio is at least $\frac{1}{\rho_{1,1}} = \frac{1}{4}$. When k = 4, we have η ≈ 0.2756, so the competitive ratio is at least $\frac{1}{\rho_{4,1}} \ge \frac{1}{3.3784} \approx 0.2959$. When k approaches infinity, writing $\eta = \bar{\eta}/k$ we have $(1 + \bar{\eta}/k)^k \to e^{\bar{\eta}}$, so $g(\bar{\eta}/k) \to 1 + \bar{\eta} + \frac{1+\bar{\eta}}{e^{\bar{\eta}} - 1}$, which attains its minimum 3.1461 at $\bar{\eta} \approx 1.1461$. Hence, when k approaches infinity, the competitive ratio is at least $\frac{1}{3.1461} \approx 0.3178$.

Finally, we provide the memory complexity and the number of oracle calls of ONLINEADAPTIVE.

Corollary 2. Algorithm 1 uses O(k) memory, and the number of oracle calls is just one per element.

These results can be derived directly from the workings of Algorithm 1. Moreover, we maintain the result set in a priority queue ordered by the recorded marginal gains.
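The constants in Corollary 1 are easy to reproduce numerically; the short script below (ours) recomputes η and the resulting values of ρ:

```python
import math

def eta(k):
    """Positive root of (1 + x)**(k + 1) == k*x + x + 2, by bisection."""
    g = lambda x: (1 + x) ** (k + 1) - ((k + 1) * x + 2)
    lo, hi = 1e-12, 2.0
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    return (lo + hi) / 2

def rho(k, a):
    """rho_{k,r} evaluated at alpha = r * eta."""
    return 1 + k * a + (1 + k * a) / ((1 + a) ** k - 1)

print(rho(1, eta(1)))    # 4.0     -> competitive ratio 1/4
print(rho(4, eta(4)))    # ~3.3784 -> competitive ratio ~0.2959
# Limit k -> infinity: minimize 1 + x + (1 + x)/(e^x - 1) over x > 0.
h = lambda x: 1 + x + (1 + x) / (math.exp(x) - 1)
print(min(h(i / 1000) for i in range(1, 10000)))   # ~3.1461 -> ~0.3178
```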
6 Experimental Evaluation

In this section, we experimentally evaluate our proposed ONLINEADAPTIVE algorithm on three applications over five real-world datasets: the first three datasets correspond to the first application and the last two datasets to the second and third applications, respectively. We begin by comparing ONLINEADAPTIVE with its non-adaptive version to assess the advantages of exploiting adaptive thresholds, and subsequently compare ONLINEADAPTIVE with five well-established algorithms to evaluate its quality and efficiency. All experiments were conducted on a machine running Ubuntu 20.04 with an Intel(R) Xeon(R) E3-1225 3.30GHz CPU and 16 GB of main memory.

| Name | Size | Dim. | Reference |
|---|---|---|---|
| Forest Cover | 286,048 | 10 | [Liu et al., 2008] |
| Credit Card Fraud | 284,807 | 29 | [Pozzolo et al., 2015] |
| KDDCup99 | 48,113 | 79 | [Campos et al., 2016] |
| YouTube | 9,010 | 4 | [Kazemi et al., 2019] |
| Twitter | 42,104 | - | [Kazemi et al., 2019] |

Table 1: Five datasets corresponding to the listed applications.

Figure 1: Comparison between ONLINEADAPTIVE and its non-adaptive version ONLINENONADAPTIVE for different values of r with k = 30 on the (a) Forest Cover and (b) YouTube datasets; panels plot the function value and the relative performance against r ∈ {1, 3, 5, 7, 9}. The second row shows the relative performance with respect to ONLINENONADAPTIVE. We omit the results on the number of oracle calls, as they are identical for both algorithms.

6.1 Applications and Datasets

We maximize the corresponding submodular functions in the following three applications.

Online Kernel Prototype Selection. In this application, we want to select a set S of k kernel prototypes from each of the three datasets in the top group of Table 1, where each element e is a multi-dimensional representative vector. The goal is to maximize the log-determinant $f(S) = \frac{1}{2}\log\det(I + \kappa M_S)$ using the Gaussian kernel $K_G(x, y) = \exp\!\big(-\frac{\|x - y\|_2^2}{2\sigma^2}\big)$ (see the sketch after this subsection). Here, I denotes the identity matrix, $\kappa, \sigma \in \mathbb{R}_+$ are parameters, and $M_S$ denotes the kernel matrix over all pairs of elements of S, where $M_S[i][j] = K_G(e_i, e_j)$ for a pair of elements $(e_i, e_j)$. This function is shown to be submodular [Schlegel et al., 2017]; we set κ = 1 and $\sigma = \frac{1}{2}\sqrt{d}$ in our experiments, where d denotes the dimensionality of the elements of each dataset.

Online Video Summarization. In this application, we want to select a set S of k representative frames from the YouTube dataset of Table 1, where each element e is a 4-dimensional representative vector compressed from a frame. The goal is to maximize the log-determinant $f(S) = \log\det(I + \kappa M_S)$ using the Laplacian kernel $K_L(x, y) = \exp\!\big(-\frac{\|x - y\|_2}{\sigma}\big)$, where $M_S[i][j] = K_L(e_i, e_j)$. Note that the submodular function and the kernel used here differ from those of the first application; we set κ = 10 and σ = 1 in our experiments.

Online Text Summarization. In this application, we want to select a set S of k tweets from the Twitter dataset of Table 1. Each element e of the ground set V is a tweet with a set of keywords $W_e$ and a number of retweets $N_e$. The score of a word $w \in W_e$ for a tweet e is defined by $\mathrm{val}(w, e) = N_e$, with $\mathrm{val}(w, e) = 0$ if $w \notin W_e$. Let W be the overall set of all words appearing in the ground set V. The goal is to maximize $f(S) = \sum_{w \in W} \sqrt{\sum_{e \in S} \mathrm{val}(w, e)}$, which is shown to be submodular [Kazemi et al., 2019].
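As a concrete illustration of the first application's objective, here is a minimal NumPy sketch (ours; dataset loading is omitted, and the default σ = ½√d follows the setting above). The returned f can be plugged directly into the online_adaptive sketch from Section 4:

```python
import numpy as np

def logdet_objective(X, kappa=1.0, sigma=None):
    """Build f(S) = 0.5 * log det(I + kappa * M_S) with a Gaussian kernel,
    where S is a collection of row indices into the data matrix X (n x d)."""
    d = X.shape[1]
    if sigma is None:
        sigma = 0.5 * np.sqrt(d)          # sigma = (1/2) * sqrt(d)
    def f(S):
        if not S:
            return 0.0
        P = X[list(S)]                                        # |S| x d
        sq = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)   # pairwise ||x-y||^2
        M = np.exp(-sq / (2 * sigma ** 2))                    # kernel matrix M_S
        _, logdet = np.linalg.slogdet(np.eye(len(S)) + kappa * M)
        return 0.5 * logdet
    return f

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))   # stand-in for one of the datasets
f = logdet_objective(X)
print(f([0, 1, 2]))               # evaluate f on a small candidate solution
```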
6.2 Baselines and Evaluation Metrics

In the first experiment, we use the non-adaptive version of ONLINEADAPTIVE, called ONLINENONADAPTIVE, for comparison. For all timestamps $t \in \{1, 2, \ldots, n\}$, ONLINENONADAPTIVE fixes every $\alpha_t$ to the constant rη; consequently, the $\beta_t$ also remain unchanged. In the second experiment, we use the classical offline algorithm GREEDY and the four above-mentioned online algorithms, INDEPENDENTSETIMPROVEMENT, STREAMINGGREEDY, PREEMPTION, and FREEDISPOSAL, for comparison. GREEDY is not an online algorithm but yields the best solution quality; we include it to show how far the online algorithms are from it. Moreover, since PREEMPTION accepts a parameter c and achieves a competitive ratio of $\frac{c}{(c+1)^2}$, we set c = 1 to achieve its best ratio of 1/4 in our experiments.

To comprehensively evaluate the performance of the algorithms, we employ the following four metrics:

- Function Value: the function value f(S) of the selected solution set S, which intuitively reflects an algorithm's variation trends and its effectiveness.
- Relative Performance: the function value relative to a specified reference algorithm, i.e., ONLINENONADAPTIVE or GREEDY, which facilitates comparisons of effectiveness among the algorithms.
- Runtime: the total runtime in seconds, which provides an intuitive reflection of an algorithm's efficiency and feasibility.
- Number of Oracles: the total number of oracle calls, which likewise reflects the efficiency and feasibility of the algorithms.

Note that larger values are preferred for the first two metrics, while smaller values are better for the last two. In addition, runtime cannot be wholly substituted by the number of oracle calls, since the running time of a single oracle call varies greatly across applications. Our code is publicly available at https://github.com/dcsjzh/OnlineAdaptive.

6.3 Performance of Our Algorithm

To illustrate the advantages of the adaptive thresholding strategy, we first compare ONLINEADAPTIVE with its non-adaptive version ONLINENONADAPTIVE by varying r ∈ {1, 3, 5, 7, 9} while fixing the solution size k to 30. Since the two have exactly the same number of total oracle calls, we omit the results for this metric. As shown in Figure 1, the function values of both algorithms reach their minimum at r = 1 and increase with r, which once again highlights the phenomenon that the theoretically optimal parameter, i.e., r = 1, does not yield the best practical performance. The reason is that, as r grows, the thresholds for filtering elements become progressively better suited to increasing f(S), which improves the quality of the solution set.

Figure 2: Comparison between GREEDY (an offline algorithm), INDEPENDENTSETIMPROVEMENT, STREAMINGGREEDY, PREEMPTION, FREEDISPOSAL, and ONLINEADAPTIVE (r = 9 and r = k) under varying cardinality constraints k ∈ {10, 20, 30, 40, 50} on (a) Forest Cover, (b) Credit Card Fraud, (c) KDDCup99, (d) YouTube and (e) Twitter; rows show function value, relative performance, runtime and number of oracle calls. The second row shows the relative performance of the online algorithms with respect to GREEDY. Since FREEDISPOSAL fails to finish within the specified time, i.e., three days, on the first four datasets (Forest Cover, Credit Card Fraud, KDDCup99 and YouTube), we only present its results on the last dataset, Twitter.

Since ONLINENONADAPTIVE fills its solution with k elements earlier, it has better solution quality when r is small, e.g., r = 3.
However, once both algorithms have selected k elements, the adaptive thresholding strategy gives ONLINEADAPTIVE superior quality, because ONLINEADAPTIVE only accepts elements above the thresholds and performs fewer replacements. For the same reason, ONLINEADAPTIVE consumes less time than the ONLINENONADAPTIVE algorithm.

To evaluate the effectiveness and efficiency of ONLINEADAPTIVE, we compare it with five well-established algorithms by varying the solution size k ∈ {10, 20, 30, 40, 50}. Here, we run ONLINEADAPTIVE with r = 9 and with r = k; the former is motivated by its good performance in Figure 1, and the latter by the simple fact that users are then not required to specify r. Note that since FREEDISPOSAL uses the set of all ever-accepted elements to evaluate each arriving element, it fails to finish within the specified time, i.e., three days, on the first four datasets; thus, we only present its results on the Twitter dataset. As shown in Figure 2, apart from the offline algorithm GREEDY, our ONLINEADAPTIVE(r = 9) and ONLINEADAPTIVE(r = k) both exhibit superior performance in terms of function value, runtime and number of oracle calls. As k increases, ONLINEADAPTIVE(r = 9), ONLINEADAPTIVE(r = k) and FREEDISPOSAL maintain relatively good performance compared to GREEDY, while the relative performance of the other algorithms tends to decrease. This is consistent with the fact that the competitive ratios of the former algorithms increase with k, whereas those of the others do not. Moreover, ONLINEADAPTIVE(r = k) performs almost the best among them.

7 Conclusion

In this paper, we investigated the online submodular maximization problem under a cardinality constraint. We proposed the ONLINEADAPTIVE algorithm with an adaptive thresholding strategy, which not only simultaneously achieves the current state of the art in terms of competitive ratio, memory complexity, running time and number of oracle calls, but also demonstrates significantly better empirical performance in real-world applications than existing online algorithms. In particular, the competitive ratio of ONLINEADAPTIVE is at least 1/4; it is about 0.2959 when k ≥ 4 and approaches 0.3178 as k tends to infinity. Besides, it uses just O(k) memory and performs one oracle call per element. An interesting direction for future work is to generalize the ONLINEADAPTIVE framework to weakly submodular functions in the online setting.

References

[Ashkan et al., 2015] Azin Ashkan, Branislav Kveton, Shlomo Berkovsky, and Zheng Wen. Optimal greedy diversity for recommendation. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 1742–1748, 2015.

[Badanidiyuru et al., 2014] Ashwinkumar Badanidiyuru, Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause. Streaming submodular maximization: massive data summarization on the fly. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 671–680, 2014.

[Bateni et al., 2013] Mohammad Hossein Bateni, Mohammad Taghi Hajiaghayi, and Morteza Zadimoghaddam. Submodular secretary problem and extensions. ACM Transactions on Algorithms, 9(4):32:1–32:23, 2013.

[Becker et al., 2022] Ruben Becker, Gianlorenzo D'Angelo, Sajjad Ghobadi, and Hugo Gilbert. Fairness in influence maximization through randomization. Journal of Artificial Intelligence Research, 73:1251–1283, 2022.

[Bilmes, 2023] Jeff Bilmes. Submodular optimization. In Kevin P.
Murphy, editor, Probabilistic Machine Learning: Advanced Topics. MIT Press, 2023.

[Buchbinder et al., 2015] Niv Buchbinder, Moran Feldman, and Roy Schwartz. Online submodular maximization with preemption. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1202–1216, 2015.

[Buchbinder et al., 2019] Niv Buchbinder, Moran Feldman, and Roy Schwartz. Online submodular maximization with preemption. ACM Transactions on Algorithms, 15(3):30:1–30:31, 2019.

[Buschjäger et al., 2021] Sebastian Buschjäger, Philipp-Jan Honysz, Lukas Pfahler, and Katharina Morik. Very fast streaming submodular function maximization. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), pages 151–166, 2021.

[Campos et al., 2016] Guilherme Oliveira Campos, Arthur Zimek, Jörg Sander, Ricardo J. G. B. Campello, Barbora Micenková, Erich Schubert, Ira Assent, and Michael E. Houle. On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Mining and Knowledge Discovery, 30(4):891–927, 2016.

[Chakrabarti and Kale, 2015] Amit Chakrabarti and Sagar Kale. Submodular maximization meets streaming: matchings, matroids, and more. Mathematical Programming, 154:225–247, 2015.

[Chan et al., 2017] T.-H. Hubert Chan, Zhiyi Huang, Shaofeng H.-C. Jiang, Ning Kang, and Zhihao Gavin Tang. Online submodular maximization with free disposal: Randomization beats ¼ for partition matroids. In Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1204–1223, 2017.

[Chan et al., 2018] T.-H. Hubert Chan, Zhiyi Huang, Shaofeng H.-C. Jiang, Ning Kang, and Zhihao Gavin Tang. Online submodular maximization with free disposal. ACM Transactions on Algorithms, 14(4):56:1–56:29, 2018.

[Chekuri et al., 2015] Chandra Chekuri, Shalmoli Gupta, and Kent Quanrud. Streaming algorithms for submodular function maximization. In Proceedings of the International Colloquium on Automata, Languages, and Programming (ICALP), pages 318–330, 2015.

[Ene and Nguyen, 2022] Alina Ene and Huy Nguyen. Streaming algorithm for monotone k-submodular maximization with cardinality constraints. In Proceedings of the International Conference on Machine Learning (ICML), pages 5944–5967, 2022.

[Gomes and Krause, 2010] Ryan Gomes and Andreas Krause. Budgeted nonparametric learning from data streams. In Proceedings of the International Conference on Machine Learning (ICML), pages 391–398, 2010.

[Hajiaghayi et al., 2004] Mohammad Taghi Hajiaghayi, Robert Kleinberg, and David C. Parkes. Adaptive limited-supply online auctions. In Proceedings of the ACM Conference on Electronic Commerce (EC), pages 71–80, 2004.

[Kazemi et al., 2019] Ehsan Kazemi, Marko Mitrovic, Morteza Zadimoghaddam, Silvio Lattanzi, and Amin Karbasi. Submodular streaming in all its glory: Tight approximation, minimum memory and low adaptive complexity. In Proceedings of the International Conference on Machine Learning (ICML), pages 3311–3320, 2019.

[Krause and Golovin, 2014] Andreas Krause and Daniel Golovin. Submodular function maximization. In Tractability: Practical Approaches to Hard Problems, pages 71–104. 2014.

[Kuhnle, 2021] Alan Kuhnle. Quick streaming algorithms for maximization of monotone submodular functions in linear time. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), pages 1360–1368, 2021.

[Kumari and Bilmes, 2021] Lilly Kumari and Jeff A. Bilmes.
Submodular span, with applications to conditional data summarization. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 12344–12352, 2021.

[Liu et al., 2008] Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation forest. In Proceedings of the IEEE International Conference on Data Mining (ICDM), pages 413–422, 2008.

[Mirzasoleiman et al., 2016] Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, and Amin Karbasi. Fast constrained submodular maximization: Personalized data summarization. In Proceedings of the International Conference on Machine Learning (ICML), pages 1358–1367, 2016.

[Nemhauser et al., 1978] George L. Nemhauser, Laurence A. Wolsey, and Marshall L. Fisher. An analysis of approximations for maximizing submodular set functions - I. Mathematical Programming, 14(1):265–294, 1978.

[Norouzi-Fard et al., 2018] Ashkan Norouzi-Fard, Jakub Tarnawski, Slobodan Mitrovic, Amir Zandieh, Aidasadat Mousavifar, and Ola Svensson. Beyond 1/2-approximation for submodular maximization on massive data streams. In Proceedings of the International Conference on Machine Learning (ICML), pages 3826–3835, 2018.

[Pozzolo et al., 2015] Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson, and Gianluca Bontempi. Calibrating probability with undersampling for unbalanced classification. In Proceedings of the IEEE Symposium Series on Computational Intelligence (SSCI), pages 159–166, 2015.

[Schlegel et al., 2017] Matthew Schlegel, Yangchen Pan, Jiecao Chen, and Martha White. Adapting kernel representations online using submodular maximization. In Proceedings of the International Conference on Machine Learning (ICML), pages 3037–3046, 2017.

[Si-Salem et al., 2024] Tareq Si-Salem, Gözde Özcan, Iasonas Nikolaou, Evimaria Terzi, and Stratis Ioannidis. Online submodular maximization via online convex optimization. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 15038–15046, 2024.

[Streeter et al., 2009] Matthew J. Streeter, Daniel Golovin, and Andreas Krause. Online learning of assignments. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pages 1794–1802, 2009.

[Wei et al., 2015] Kai Wei, Rishabh K. Iyer, and Jeff A. Bilmes. Submodularity in data subset selection and active learning. In Proceedings of the International Conference on Machine Learning (ICML), pages 1954–1963, 2015.