Non-monotone Sequential Submodular Maximization

Shaojie Tang¹, Jing Yuan²
¹Naveen Jindal School of Management, University of Texas at Dallas
²Department of Computer Science and Engineering, University of North Texas
shaojie.tang@utdallas.edu, jing.yuan@unt.edu

Abstract

In this paper, we study a fundamental problem in submodular optimization known as sequential submodular maximization. The primary objective of this problem is to select and rank a sequence of items to optimize a group of submodular functions. Existing research on this problem has predominantly concentrated on the monotone setting, assuming that the submodular functions are non-decreasing. However, in various real-world scenarios, like diversity-aware recommendation systems, adding items to an existing set might negatively impact the overall utility. In response, we propose to study this problem with non-monotone submodular functions and develop approximation algorithms for both flexible and fixed length constraints, as well as a special case with identical utility functions. Empirical evaluations further validate the effectiveness of our proposed algorithms in the domain of video recommendation.

Introduction

Submodular optimization is a central problem in machine learning with applications in a wide range of fields, including data summarization (Lin and Bilmes 2011), sparse reconstruction (Das and Kempe 2011), active learning (Golovin and Krause 2011; Tang and Yuan 2022), and viral marketing (Tang and Yuan 2020). These formulations aim to select a subset of items that maximizes a submodular function. However, in many real-world applications, the objective is not only to select items but also to rank them in a specific order (Azar and Gamzu 2011; Tschiatschek, Singla, and Krause 2017; Tang and Yuan 2021b). This motivates the study of sequential submodular maximization (Asadpour et al.
2022; Zhang, Tatti, and Gionis 2022), a fundamental problem in submodular optimization. This problem involves selecting and ranking a group of k items from a ground set V. The goal is to maximize the weighted summation of k submodular functions, denoted f₁, ..., f_k : 2^V → R⁺, where each function f_j takes the first j items of the ranking sequence as input. Formally, the objective is to find a feasible sequence π = (π₁, ..., π_k) over items in V that maximizes the value of F(π) := Σ_{j∈[k]} λ_j f_j(π[j]), where λ_j denotes the weight of function j and π[j] := {π₁, ..., π_j} denotes the first j items of π.

Copyright © 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

This problem, which captures position bias in item selection, has a wide range of applications, including sequential active learning and recommendation systems (Zhang, Tatti, and Gionis 2022). For instance, it can be applied to tackle challenges in ranking products on online retail platforms (Asadpour et al. 2022). Platforms like Amazon and Airbnb face the task of not only selecting a subset of products or rooms to showcase but also arranging them in a vertical list to provide customers with an organized and customer-friendly browsing experience. Shoppers scroll through this list, depending on their level of patience, and may make a purchase from the products displayed. The platform's objective is to optimize the selection and ranking of products to maximize the likelihood of a purchase. These applications can be framed as instances of sequential submodular maximization. In this context, the parameters in F(π) can be interpreted as follows: V denotes the set of products, k the window size of displayed products, and λ_j the proportion of customers with patience level j (e.g., a customer with patience level j is willing to view the first j products π[j]).
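To make the objective concrete, here is a minimal Python sketch of evaluating F(π) for a given ranking. The topic-coverage utility and the patience weights are hypothetical stand-ins, not data from the paper.

```python
from typing import Callable, Dict, List, Sequence, Set

def sequence_value(pi: Sequence[str],
                   lambdas: List[float],
                   fs: List[Callable[[Set[str]], float]]) -> float:
    """F(pi) = sum_{j in [k]} lambda_j * f_j(pi[j]), where f_j only sees
    the prefix of the first j items; this is how position bias enters."""
    return sum(lam * f(set(pi[:j + 1]))
               for j, (lam, f) in enumerate(zip(lambdas, fs)))

# Hypothetical toy instance: every f_j is the same topic-coverage function.
topics: Dict[str, Set[int]] = {"a": {1, 2}, "b": {2, 3}, "c": {4}}

def cover(S: Set[str]) -> float:
    return float(len(set().union(*(topics[i] for i in S)))) if S else 0.0

lambdas = [0.5, 0.3, 0.2]                       # proportions of patience levels
value = sequence_value(["a", "c", "b"], lambdas, [cover, cover, cover])
```

Here each patience level j contributes λ_j times the coverage of the first j items, so the same item set ranked differently yields a different F(π).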
The function f_j(π[j]) denotes the likelihood of purchase for customers with patience level j after seeing the first j products π[j]. Typically, f_j is modeled as a submodular function. In this case, F(π) captures the expected probability of purchase when a customer is shown a sequence of products π.

While previous research on sequential submodular maximization has primarily focused on the monotone setting, where submodular functions are assumed to be non-decreasing, non-monotonicity arises in many real-world scenarios, including feature selection (Das and Kempe 2008), maximum cut (Gotovos, Karbasi, and Krause 2015), profit maximization (Tang and Yuan 2021a), and data summarization (Mirzasoleiman, Badanidiyuru, and Karbasi 2016). One such example involves designing a diversity-aware recommendation system for a vast assortment of products spanning different categories (Lin and Bilmes 2010; Mirzasoleiman, Badanidiyuru, and Karbasi 2016; Amanatidis et al. 2020; Carbonell and Goldstein 1998). The system's primary goal is to generate a sequence of products that not only have high ratings but also effectively represent the entire collection. To address this tradeoff, a commonly adopted objective function is submodular but not monotone. More details about this application can be found in the experiment section. This highlights the need to develop algorithms that can handle non-monotone submodular functions efficiently.

The Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24)

Here are the main contributions of this paper:
1. We are the first to explore the sequential submodular maximization problem with non-monotone submodular functions. We investigate two variants: one with a flexible length constraint, where the goal is to find a sequence of items with a length of at most k, and another with a fixed length constraint, where the aim is to find a sequence of items with an exact length of k.
2.
For the flexible length constraint, we introduce efficient constant-factor approximation algorithms. For the fixed length constraint, we develop an algorithm whose approximation ratio depends on the ratio k/n, where n is the size of the ground set. When all k utility functions are identical, we provide constant-factor approximation algorithms.

Additional Related Work. The traditional non-monotone submodular optimization problem, where the objective is to select a set of items to maximize a non-monotone submodular function, has been extensively studied in the literature (Gharan and Vondrák 2011; Buchbinder et al. 2014). The best-known result for this problem, subject to a cardinality constraint, is a 0.385-approximation algorithm (Buchbinder and Feldman 2019). It is crucial to emphasize that our work differs from this traditional setting. Instead of aiming to find a set of items where the order does not matter, our goal is to find a sequence of items to maximize a group of submodular functions. It will become evident later that their problem can be seen as a special case of ours. The focus on sequences introduces a new dimension of complexity compared to traditional set-based approaches. Nevertheless, we draw inspiration from previous studies (Amanatidis et al. 2020; Tang 2021) and incorporate a sampling technique to tackle the challenges arising from non-monotonicity. While other studies have explored position bias in submodular optimization (Tschiatschek, Singla, and Krause 2017; Alaei, Makhdoumi, and Malekian 2010), our problem formulation differs significantly from theirs.

Preliminaries and Problem Statement

Notations. We first introduce some useful notation. Throughout the remainder of this paper, we denote the set {1, 2, ..., m} as [m] for any positive integer m. Given a function f, we use f(i | S) to denote the marginal utility of adding i to S, i.e., f(i | S) := f(S ∪ {i}) − f(S).
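As a quick illustration of the f(i | S) notation, the following sketch uses a hypothetical topic-coverage function (not one from the paper) and shows that an item's marginal utility can only shrink as the base set grows:

```python
from typing import Dict, Set

# Hypothetical topic-coverage utility: f(S) = number of distinct topics covered.
topics: Dict[str, Set[int]] = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}}

def f(S: Set[str]) -> int:
    return len(set().union(*(topics[i] for i in S))) if S else 0

def marginal(i: str, S: Set[str]) -> int:
    """f(i | S) = f(S ∪ {i}) − f(S), the marginal utility of adding i to S."""
    return f(S | {i}) - f(S)

gain_small = marginal("c", {"a"})        # adding c to {a}
gain_large = marginal("c", {"a", "b"})   # adding c to the superset {a, b}
```

Here gain_small = 2 while gain_large = 1: the diminishing-returns inequality f(i | X) ≥ f(i | Y) for X ⊆ Y, which is exactly the submodularity condition defined next.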
We say a function f is submodular if and only if f(i | X) ≥ f(i | Y) for any two sets X and Y such that X ⊆ Y and any item i ∉ Y. Let π = (π₁, ..., π_k) be a sequence of items; we define the operation π ⊕ i as the concatenation of i to π, that is, π ⊕ i := (π₁, ..., π_k, i).

Now we are ready to introduce our research problem. Given k non-monotone submodular functions f₁, ..., f_k : 2^V → R⁺ and non-negative coefficients λ₁, ..., λ_k, the objective of the Non-monotone Sequential Submodular Maximization (NSM) problem is to find a feasible sequence π = (π₁, ..., π_k) over items in V that maximizes the value of F(π). Here F(π) := Σ_{j∈[k]} λ_j f_j(π[j]), where π[j] := {π₁, ..., π_j} denotes the first j items of π. In this paper, we adopt a non-standard notation, employing π to represent both a sequence of items and the set of items comprising this sequence. To simplify notation, define π_j = ∅ if |π| < j, where |π| denotes the number of non-empty items contained in π.

We consider two types of feasibility constraints. The first type, known as NSM with Flexible Length, imposes a constraint under which feasible sequences contain at most k items. The second type, known as NSM with Fixed Length, requires that all feasible sequences contain exactly k items from V. Both problems have additional restrictions: the same item cannot appear multiple times in a feasible sequence, and there should be no empty slots between two items. These restrictions ensure that each item is considered at most once and preserve the sequential nature of the solution.

NSM with Flexible Length

A formal description of NSM with Flexible Length is as follows:

P.1: max_π F(π) subject to |π| ≤ k.

We first provide a negative result by showing that it is impossible to find a 0.491-approximation algorithm for P.1.
This result can be shown by setting λ₁ = λ₂ = ... = λ_{k−1} = 0 and λ_k = 1 in P.1, thereby reducing the problem to maximizing a non-monotone submodular function f_k over a set of at most k items, which does not admit a 0.491-approximation (Gharan and Vondrák 2011). Thus, we establish the following lemma.

Lemma 1. It is impossible to achieve a 0.491-approximation for P.1.

Algorithm Design. Next, we present the design of our algorithm, referred to as Sampling-Greedy, which builds upon a simple greedy approach that selects items based on their marginal utility. However, since our utility function is non-monotone, a straightforward greedy strategy may result in suboptimal selections with low utility. To address this challenge, we draw inspiration from (Amanatidis et al. 2020) and introduce a sampling phase to the greedy algorithm, extending its guarantees to the non-monotone setting. Our algorithm consists of two phases:

1. RANDOM SUBSET SELECTION: We begin by selecting a random subset, denoted as R, from the ground set V. Each item i ∈ V is independently included in R with a probability p ∈ [0, 1], where p is a parameter to be optimized later. This random subset serves as the initial pool of items for subsequent processing.
2. GREEDY ALGORITHM ON R: We run a greedy algorithm on R only. This algorithm operates iteratively, augmenting the current sequence with an item that provides the greatest incremental utility.
To be precise, consider a specific round of the greedy algorithm, and let π, whose initial value is ∅, denote the current solution. Let z ∈ arg max_{i∈R} [F(π ⊕ i) − F(π)] denote the item with the largest marginal utility on top of π. If F(π ⊕ z) − F(π) > 0, we append z to π (i.e., π = π ⊕ z) and proceed to the next iteration. Note that F(π ⊕ i) − F(π) = Σ_{j∈{|π|+1, ..., k}} λ_j f_j(i | π), observing that appending i to π only affects the utility of the functions f_{|π|+1}, ..., f_k. This construction continues until one of two stopping criteria is met: either the sequence reaches a length of min{k, |R|}, or the marginal utility of every remaining item becomes non-positive. This ensures that further additions would not contribute positively to the overall utility.

Algorithm 1: Sampling-Greedy
1: E = V, π = ∅, t = 1, Q = {i ∈ E | Σ_{j∈{1,...,k}} λ_j f_j({i}) > 0}
2: while t ≤ k and Q ≠ ∅ do
3:   z = arg max_{i∈Q} Σ_{j∈{t,...,k}} λ_j f_j(i | π)
4:   E = E \ {z}
5:   let Φ_z ~ Bernoulli(p)
6:   if Φ_z = 1 then
7:     π = π ⊕ z, t ← t + 1
8:     Q = {i ∈ E | Σ_{j∈{t,...,k}} λ_j f_j(i | π) > 0}
9:   end if
10: end while
11: return π

To facilitate the analysis, we present an alternative implementation of Sampling-Greedy (a detailed description is given in Algorithm 1). Unlike the original implementation, where the entire set R is sampled at the start of the algorithm, this alternative defers the sampling decision. Instead, we toss a coin, with success probability p, after considering each item; the coin toss determines whether the item is added to the solution. It is straightforward to confirm that both versions of the algorithm produce the same output distribution.

Performance Analysis. We next analyze the approximation ratio of Sampling-Greedy. As in Algorithm 1, we introduce a random variable Φ ∈ {0, 1}^V to denote the outcome of the coin tosses; e.g., Φ_i ∈ {0, 1} represents the coin toss of item i.
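A compact Python sketch of the coin-toss variant (Algorithm 1) follows. The utility functions and the deterministic tie-breaking by sorted order are illustrative choices, not part of the paper's specification.

```python
import random
from typing import Callable, List, Sequence, Set

def sampling_greedy(V: Sequence[str],
                    fs: List[Callable[[Set[str]], float]],
                    lambdas: List[float],
                    k: int, p: float, seed: int = 0) -> List[str]:
    """Sketch of Algorithm 1: repeatedly find the item with the largest
    positive weighted marginal gain, then keep it with probability p."""
    rng = random.Random(seed)
    E, pi = set(V), []

    def gain(i: str) -> float:
        # Appending i at position t = |pi| + 1 only affects f_t, ..., f_k.
        pre = set(pi)
        return sum(lambdas[j] * (fs[j](pre | {i}) - fs[j](pre))
                   for j in range(len(pi), k))

    while len(pi) < k and E:
        z = max(sorted(E), key=gain)   # sorted() only for reproducible ties
        if gain(z) <= 0:               # best gain non-positive: Q is empty
            break
        E.remove(z)                    # z is considered exactly once
        if rng.random() < p:           # coin toss Phi_z ~ Bernoulli(p)
            pi.append(z)               # kept: position t is filled
    return pi

# Hypothetical coverage-style utilities for a quick check.
topics = {"a": {1, 2, 3}, "b": {2, 3}, "c": {4}}
cover = lambda S: float(len(set().union(*(topics[i] for i in S)))) if S else 0.0
# With p = 1 the coin always succeeds and this reduces to plain greedy.
ranking = sampling_greedy(["a", "b", "c"], [cover, cover], [0.5, 0.5], k=2, p=1.0)
```

With p = 1 on this toy instance, the greedy pass picks "a" (largest coverage) and then "c" (the only remaining item with positive marginal gain, since "b" is fully covered by "a").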
Let π* = (π*₁, ..., π*_l) denote the optimal solution of P.1, where l ≤ k. In the context of a specific run of our algorithm, where the coin tosses Φ are realized and the corresponding sequence returned by our algorithm is denoted as π(Φ), we partition the optimal solution π* into three sets as follows:

1. Set O_cons.(Φ): This set contains all items π*_j in the optimal sequence π* that were considered by our algorithm before (or when) position j was filled, but were not picked due to the random coin flips Φ. That is, O_cons.(Φ) := {π*_j ∈ π* | π*_j was considered by Algorithm 1 during the selection of π(Φ)[j] but π*_j ∉ π(Φ)[j]}, where π(Φ)[j] denotes the first j items of π(Φ).
2. Set O_not-cons.(Φ): This set consists of all items π*_j in π* that were not considered by our algorithm before (or when) position j was filled. That is, O_not-cons.(Φ) := {π*_j ∈ π* | π*_j was not considered by Algorithm 1 during the selection of π(Φ)[j]}.
3. Set O_ovlpd.(Φ) = π* \ (O_not-cons.(Φ) ∪ O_cons.(Φ)): This set contains those items π*_j in π* that were added to π(Φ) before (or when) position j was filled. That is, O_ovlpd.(Φ) := {π*_j ∈ π* | π*_j ∈ π(Φ)[j]}.

Let us consider a specific example to illustrate these three sets. Suppose the optimal sequence is π* = (a, b, c, d). Assume that the sequence of items considered by our algorithm is (b, c, d, e, f, g), and that the final sequence picked by our algorithm, based on the coin tosses Φ, is π(Φ) = (c, e, f, g). For this example, O_cons.(Φ) = {b, d}, because our algorithm considered b (resp. d) before filling position 2 (resp. 4) but did not pick it; O_not-cons.(Φ) = {a}, because our algorithm did not consider a before filling position 1; and O_ovlpd.(Φ) = {c}, because our algorithm picked c before filling position 3.
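The partition can be computed mechanically from one run's trace. The sketch below assumes a hypothetical trace format (the full considered order plus the picked subsequence) and reproduces the example above.

```python
from typing import Sequence, Set, Tuple

def partition_optimal(opt: Sequence[str],
                      considered: Sequence[str],
                      picked: Sequence[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split opt into (O_ovlpd, O_cons, O_not-cons) w.r.t. one algorithm run.

    `considered` lists items in the order the algorithm examined them, and
    `picked` is the subsequence that survived the coin tosses, so slot j
    was filled at the index of picked[j-1] within `considered`.
    """
    ovlpd, cons, notcons = set(), set(), set()
    fill_idx = [considered.index(x) for x in picked]   # when each slot was filled
    for j, item in enumerate(opt, start=1):
        if item in picked[:j]:
            ovlpd.add(item)                            # already in pi(Phi)[j]
        else:
            last = fill_idx[j - 1] if j <= len(picked) else len(considered) - 1
            if item in considered[:last + 1]:
                cons.add(item)                         # examined by then, not kept
            else:
                notcons.add(item)                      # not yet examined
    return ovlpd, cons, notcons
```

Running it on the example (opt = a, b, c, d; considered = b, c, d, e, f, g; picked = c, e, f, g) yields O_ovlpd = {c}, O_cons = {b, d}, and O_not-cons = {a}, matching the discussion above.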
By defining this partition of π*, we can effectively analyze the influence of the coin tosses Φ (or, equivalently, the random subset R in the original implementation of our algorithm) on the resulting sequence and its connection to the optimal solution. To simplify notation, we drop the random variable Φ from π(Φ), O_cons.(Φ), O_not-cons.(Φ), and O_ovlpd.(Φ) when it is clear from the context.

Before analyzing the performance of our algorithm, we introduce some additional notation. Recall that π_j = ∅ if j > |π|. Given a random output π from our algorithm and an optimal solution π*, we define F(π ∪ π*) = Σ_{j∈[k]} λ_j f_j(π[j] ∪ π*[j]) as the utility of the union of π and π*. Here, π[j] (resp. π*[j]) denotes all items from π (resp. π*) that are placed up to position j. Intuitively, π ∪ π* can be interpreted as a virtual sequence where both π_j and π*_j are added to position j for all j ∈ [k]. Similarly, we define F(π ∪ π*_ovlpd.) = Σ_{j∈[k]} λ_j f_j(π[j] ∪ (π*[j] ∩ O_ovlpd.)) as the utility of the union of π and O_ovlpd. ∩ π*. Furthermore, we define F(π ∪ π*_cons.) = Σ_{j∈[k]} λ_j f_j(π[j] ∪ (π*[j] ∩ O_cons.)) as the utility of the union of π and O_cons. ∩ π*. Finally, we define F(π ∪ π*_not-cons.) = Σ_{j∈[k]} λ_j f_j(π[j] ∪ (π*[j] ∩ O_not-cons.)) as the utility of the union of π and O_not-cons. ∩ π*. Throughout this section, all expectations are taken over the coin tosses Φ. We first provide two technical lemmas.

Lemma 2. Let p ∈ [0, 1] denote the sampling probability of each item, π denote the output of our algorithm, and π* denote the optimal solution. Then (2 + 1/p) · E[F(π)] ≥ E[F(π ∪ π*)]. (1)

Proof: By the definition of O_ovlpd., all items from π*[j] ∩ O_ovlpd. must appear in π[j] for all j ∈ [k]. Hence, π[j] ∪ (π*[j] ∩ O_ovlpd.) = π[j]. It follows that for all j ∈ [k],

f_j(π[j] ∪ (π*[j] ∩ O_ovlpd.) ∪ (π*[j] ∩ O_not-cons.) ∪ (π*[j] ∩ O_cons.)) = f_j(π[j] ∪ (π*[j] ∩ O_not-cons.) ∪ (π*[j] ∩ O_cons.)). (2)

Therefore, F(π ∪ π*) = Σ_{j∈[k]} λ_j f_j(π[j] ∪ π*[j]) = Σ_{j∈[k]} λ_j f_j(π[j] ∪ (π*[j] ∩ O_ovlpd.)
∪ (π*[j] ∩ O_not-cons.) ∪ (π*[j] ∩ O_cons.)) = Σ_{j∈[k]} λ_j f_j(π[j] ∪ (π*[j] ∩ O_not-cons.) ∪ (π*[j] ∩ O_cons.)), where the second equality uses the fact that O_cons., O_not-cons., and O_ovlpd. form a partition of π*, and the third equality follows from equality (2). For simplicity, let F(π ∪ π*_not-cons. ∪ π*_cons.) = Σ_{j∈[k]} λ_j f_j(π[j] ∪ (π*[j] ∩ O_not-cons.) ∪ (π*[j] ∩ O_cons.)); the above equality indicates that F(π ∪ π*) = F(π ∪ π*_not-cons. ∪ π*_cons.). It follows that

E[F(π ∪ π*)] = E[F(π)] + E[F(π ∪ π*_not-cons.) − F(π)] + E[F(π ∪ π*) − F(π ∪ π*_not-cons.)]
= E[F(π)] + E[F(π ∪ π*_not-cons.) − F(π)] + E[F(π ∪ π*_not-cons. ∪ π*_cons.) − F(π ∪ π*_not-cons.)], (3)

where the second equality is by the observation that F(π ∪ π*) = F(π ∪ π*_not-cons. ∪ π*_cons.). Observe that

F(π ∪ π*_not-cons. ∪ π*_cons.) − F(π ∪ π*_not-cons.)
= Σ_{j∈[k]} λ_j f_j(π[j] ∪ (π*[j] ∩ O_not-cons.) ∪ (π*[j] ∩ O_cons.)) − Σ_{j∈[k]} λ_j f_j(π[j] ∪ (π*[j] ∩ O_not-cons.))
≤ Σ_{j∈[k]} λ_j f_j(π[j] ∪ (π*[j] ∩ O_cons.)) − Σ_{j∈[k]} λ_j f_j(π[j])
= F(π ∪ π*_cons.) − F(π),

where the inequality is by the assumption that f_j is submodular for all j ∈ [k], together with the observations that π[j] ⊆ π[j] ∪ (π*[j] ∩ O_not-cons.) and that π*[j] ∩ O_cons. does not overlap with π[j] ∪ (π*[j] ∩ O_not-cons.). The latter holds because O_cons. ∩ O_not-cons. = ∅ (by the definitions of these two sets) and O_cons. ∩ π = ∅ (any item that is considered but not picked by our algorithm cannot appear in π, noting that each item is considered at most once). This, together with (3), implies that

E[F(π ∪ π*)] ≤ E[F(π)] + E[F(π ∪ π*_not-cons.) − F(π)] + E[F(π ∪ π*_cons.) − F(π)]. (4)

To prove this lemma, it suffices to show that

p · E[F(π ∪ π*_cons.) − F(π)] ≤ E[F(π)] and E[F(π ∪ π*_not-cons.) − F(π)] ≤ E[F(π)]. (5)

The proofs of these two inequalities are provided in the technical report (Tang and Yuan 2023).

Lemma 3. Let p ∈ [0, 1] denote the sampling probability of each item, π denote the output of our algorithm, and π* denote the optimal solution. Then E[F(π ∪ π*)] ≥ (1 − p) · F(π*).
Proof: We begin by presenting a result that links random sampling to submodular maximization.

Lemma 4 (Lemma 2.2 of (Buchbinder et al. 2014)). Consider a submodular set function f : 2^V → R. Let X be a subset of V, and let X(p) denote a sampled subset obtained by including each item of X with probability at most p (not necessarily independently). Then E[f(X(p))] ≥ (1 − p) · f(∅).

We define h_j : 2^V → R as h_j(T) = f_j(T ∪ π*[j]). It can be easily verified that h_j is a submodular function with h_j(∅) = f_j(π*[j]). By applying the above lemma to the function h_j, and considering that the items in π[j] are chosen with probability at most p, we conclude that E[f_j(π[j] ∪ π*[j])] = E[h_j(π[j])] ≥ (1 − p) · h_j(∅) = (1 − p) · f_j(π*[j]) for all j ∈ [k]. The following chain proves the lemma: E[F(π ∪ π*)] = E[Σ_{j∈[k]} λ_j f_j(π[j] ∪ π*[j])] = Σ_{j∈[k]} λ_j E[f_j(π[j] ∪ π*[j])] ≥ Σ_{j∈[k]} λ_j (1 − p) f_j(π*[j]) = (1 − p) Σ_{j∈[k]} λ_j f_j(π*[j]) = (1 − p) · F(π*).

Lemmas 2 and 3 imply the following main theorem.

Theorem 1. Let p ∈ [0, 1] be the sampling probability of each item, π be the output of our algorithm, and π* denote the optimal solution. Then E[F(π)] ≥ (p(1 − p)/(2p + 1)) · F(π*).

Corollary 1. Let p = (√3 − 1)/2. Then E[F(π)] ≥ 0.134 · F(π*).

Remark 1: When all utility functions f_j are monotone and submodular, Algorithm 1 with p = 1 achieves an improved approximation ratio of 1/2. This is because if p = 1, then O_cons. = ∅, observing that when p = 1 every considered item is added to the solution. It follows that inequality (4) reduces to E[F(π ∪ π*)] ≤ E[F(π)] + E[F(π ∪ π*_not-cons.) − F(π)], which can be rewritten as E[F(π ∪ π*)] ≤ E[F(π ∪ π*_not-cons.)]. This, together with inequality (5), implies that E[F(π)] ≥ E[F(π ∪ π*)]/2. Moreover, if all f_j are monotone, it is easy to verify that E[F(π ∪ π*)] ≥ F(π*). It follows that E[F(π)] ≥ E[F(π ∪ π*)]/2 ≥ F(π*)/2.
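The constants in Theorem 1 and Corollary 1 can be checked numerically. The snippet below maximizes the ratio r(p) = p(1 − p)/(2p + 1) over a grid and compares it with the closed-form maximizer p = (√3 − 1)/2, obtained by setting r′(p) = 0, i.e., solving 2p² + 2p − 1 = 0.

```python
import math

def r(p: float) -> float:
    """Approximation ratio from Theorem 1: r(p) = p(1 - p) / (2p + 1)."""
    return p * (1 - p) / (2 * p + 1)

# Closed-form stationary point: 2p^2 + 2p - 1 = 0 gives p = (sqrt(3) - 1) / 2.
p_star = (math.sqrt(3) - 1) / 2

# Grid search over p in (0, 1) as a sanity check that p_star is the maximizer.
p_grid = max((i / 10**5 for i in range(1, 10**5)), key=r)

print(round(p_star, 3), round(r(p_star), 3))  # 0.366 0.134
```

Both approaches agree: the ratio peaks at p ≈ 0.366 with value ≈ 0.134, the constant in Corollary 1.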
Remark 2: When all utility functions are homogeneous (i.e., there exists f such that f_j = f for all j ∈ [k]), Algorithm 1 becomes independent of the specific values of λ_j. This is because in Line 3 of Algorithm 1, finding the z that maximizes Σ_{j=t}^{k} λ_j f_j(i | π) is equivalent to maximizing f(i | π) when f_j = f for all j ∈ [k]. Thus, the selection of z is independent of λ_j, showcasing the algorithm's robustness to the knowledge of λ_j. In the context of recommendation systems, where λ_j represents the distribution of customers' patience levels, which may only be estimated approximately or remain unknown, our algorithm consistently delivers high-quality solutions.

NSM with Fixed Length

We next study the case with a fixed length constraint. A formal description of NSM with Fixed Length is as follows:

P.2: max_π F(π) subject to |π| = k.

By setting λ₁ = λ₂ = ... = λ_{k−1} = 0 and λ_k = 1 in P.2, our problem reduces to maximizing a non-monotone submodular function f_k over a set of exactly k items, which does not admit a 0.491-approximation (Gharan and Vondrák 2011). Hence, we establish the following lemma.

Lemma 5. It is impossible to achieve a 0.491-approximation for P.2.

Approximation Algorithms for P.2

The basic idea of our algorithm is to first apply Algorithm 1 to compute a sequence π of size at most k; then, if required, we supplement π with additional backup items to ensure that it reaches a size of exactly k. We provide a detailed description of our algorithm below:

1. Apply Algorithm 1 to compute a sequence π of size at most k.
2. If |π| = k, return π as the final output. Otherwise, randomly select k − |π| items B from the set V \ π and append them to π in an arbitrary order. The resulting sequence is returned as the final output π^{p2}.

Note that problem P.1 is a relaxation of P.2, observing that any feasible solution of P.2 is also a feasible solution of P.1.
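Phase 2 of the two-phase procedure can be sketched in a few lines; the Phase 1 output is taken as a given argument, since the padding step does not depend on how it was produced.

```python
import random
from typing import List, Sequence

def pad_to_fixed_length(V: Sequence[str], pi: List[str],
                        k: int, seed: int = 0) -> List[str]:
    """Phase 2 sketch: pad a Phase 1 sequence pi (|pi| <= k) with random
    backup items from V \\ pi so the output has exactly k items. The backups
    may contribute negative marginal utility, which is where the (1 - k/n)
    factor in the analysis comes from."""
    rng = random.Random(seed)
    backup = [v for v in V if v not in set(pi)]
    return pi + rng.sample(backup, k - len(pi))

result = pad_to_fixed_length(list("abcdef"), ["a", "b"], k=4)
```

The padded items are appended after π, so they never change the prefix utilities f_j(π[j]) for j ≤ |π|.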
As a consequence, we have the following lemma.

Lemma 6. Let π* and π° denote the optimal solutions of P.1 and P.2, respectively. Then F(π*) ≥ F(π°).

Now we are ready to present the performance bound of our algorithm. The proof of this theorem is deferred to the technical report (Tang and Yuan 2023).

Theorem 2. Let π^{p2} denote the solution returned by our algorithm. Then E_{Φ,B}[F(π^{p2})] ≥ (1 − k/n) · (p(1 − p)/(2p + 1)) · F(π°), where the expectation is taken over the random coin tosses Φ (phase 1) and the random backup set B (phase 2).

If we set p = (√3 − 1)/2 in the first phase, we have the following corollary.

Corollary 2. Let p = (√3 − 1)/2. Then E_{Φ,B}[F(π^{p2})] ≥ (1 − k/n) · 0.134 · F(π°).

The above corollary implies that when the size constraint k is relatively small compared to the total number of items n, our algorithm achieves a good approximation of the optimum. This observation is particularly relevant in scenarios such as recommendation systems, where k is often much smaller than n.

Constant-Factor Approximation Algorithm for Homogeneous Functions

Next, we examine an important special case of P.2, in which all utility functions are homogeneous; that is, there exists a submodular function f : 2^V → R⁺ such that f_j = f for all j ∈ [k]. A formal description of this special case is as follows:

max_π Σ_{j∈[k]} λ_j f(π[j]) subject to |π| = k.

We present constant-factor approximation algorithms for this special case, an improvement over the general case, where we can only achieve an approximation ratio depending on k/n. We consider two cases: k < n/2 and k ≥ n/2. The case k < n/2 is straightforward, as one can directly apply our algorithm for the general case (presented in the previous section) to achieve a constant-factor approximation: when k < n/2, we have 1 − k/n > 1/2, which, together with Corollary 2, implies the following corollary.
Corollary 3. Assume k < n/2 and let p = (√3 − 1)/2. Then E_{Φ,B}[F(π^{p2})] ≥ (0.134/2) · F(π°), where π° is the optimal solution of P.2.

The rest of this section focuses on the case k ≥ n/2. For simplicity, assume that n is even. Considering the optimal solution π°, we can partition its utility F(π°) into two parts: F(π°) = Σ_{j∈[n/2]} λ_j f(π°[j]) + Σ_{j∈{n/2+1, ..., k}} λ_j f(π°[j]). Here, the first part represents the utility obtained from the first n/2 functions, while the second part represents the utility obtained from the remaining k − n/2 functions. We further divide the analysis into two subcases based on the relationship between these two parts. Intuitively, if the first part dominates the overall utility, it suffices to find a sequence that maximizes the first part. Otherwise, if the second part contributes significantly to the overall utility, our focus is on finding a solution that maximizes the second part. Although we initially have no information about this relationship, given that π° is unknown, we can guess the relation and consider both possibilities: we evaluate the performance in each case and choose the option that yields the better overall utility as the final output.

The Case When Σ_{j∈[n/2]} λ_j f(π°[j]) ≥ Σ_{j∈{n/2+1, ..., k}} λ_j f(π°[j])

We first study the case when Σ_{j∈[n/2]} λ_j f(π°[j]) ≥ Σ_{j∈{n/2+1, ..., k}} λ_j f(π°[j]), that is, when the first n/2 functions contribute at least half of the total utility. In this scenario, our objective is to find a sequence that maximizes the utility from the first part (a formal description of this problem is given in P.2a).

P.2a: max_π Σ_{j∈[n/2]} λ_j f(π[j]) subject to |π| = k.
Let π^a denote the optimal solution of P.2a. We have

Σ_{j∈[n/2]} λ_j f(π^a[j]) ≥ Σ_{j∈[n/2]} λ_j f(π°[j]) ≥ (1/2) · F(π°), (6)

where the first inequality holds because π° is a feasible solution of P.2a while π^a is an optimal solution of P.2a, and the second inequality follows from F(π°) = Σ_{j∈[n/2]} λ_j f(π°[j]) + Σ_{j∈{n/2+1, ..., k}} λ_j f(π°[j]) together with the assumption that Σ_{j∈[n/2]} λ_j f(π°[j]) ≥ Σ_{j∈{n/2+1, ..., k}} λ_j f(π°[j]).

Now we are ready to present our algorithm. It is easy to verify that replacing the constraint |π| = k in P.2a by |π| = n/2 does not affect the value of the optimal solution, because items placed after position n/2 do not affect the utility of the first n/2 functions. A formal description of this variant is given in P.2b.

P.2b: max_π Σ_{j∈[n/2]} λ_j f(π[j]) subject to |π| = n/2.

Let π^b denote the optimal solution of P.2b. We have

Σ_{j∈[n/2]} λ_j f(π^b[j]) = Σ_{j∈[n/2]} λ_j f(π^a[j]) ≥ (1/2) · F(π°), (7)

where the equality holds because, as discussed above, P.2a and P.2b have the same optimal value, and the inequality is by inequality (6). We can use our algorithm for the general case to solve P.2b and obtain a solution π. Since the size constraint in P.2b is n/2, substituting k = n/2 into Corollary 2 shows that π is a (0.134/2)-approximation for P.2b, that is,

Σ_{j∈[n/2]} λ_j f(π[j]) ≥ (0.134/2) · Σ_{j∈[n/2]} λ_j f(π^b[j]). (8)

However, π might not be a feasible solution for the original problem P.2a, as its size might be less than k. To ensure feasibility, we append an arbitrary set of k − n/2 items from V \ π to π to obtain the final solution π′. This step ensures that the solution satisfies the size constraint and is feasible for P.2a. More importantly, it does not affect the utility of π, given that items placed after position n/2 do not affect the utility of the first n/2 functions.
As a result, we achieve a (0.134/4)-approximation for P.2, that is, F(π′) ≥ Σ_{j∈[n/2]} λ_j f(π′[j]) = Σ_{j∈[n/2]} λ_j f(π[j]) ≥ (0.134/2) · Σ_{j∈[n/2]} λ_j f(π^b[j]) ≥ (0.134/4) · F(π°), where the equality holds because, as discussed above, items placed after position n/2 do not affect the utility of the first n/2 functions, the second inequality is by inequality (8), and the third inequality is by inequality (7). This leads to the following lemma.

Lemma 7. There exists a (0.134/4)-approximation solution for P.2, assuming Σ_{j∈[n/2]} λ_j f(π°[j]) ≥ Σ_{j∈{n/2+1, ..., k}} λ_j f(π°[j]).

The Case When Σ_{j∈[n/2]} λ_j f(π°[j]) < Σ_{j∈{n/2+1, ..., k}} λ_j f(π°[j])

For this case, we design an algorithm that also achieves an approximation ratio of 0.134/4. We defer this part to the technical report (Tang and Yuan 2023).

Putting It All Together. Recall that we develop (0.134/4)-approximation algorithms for the cases Σ_{j∈[n/2]} λ_j f(π°[j]) ≥ Σ_{j∈{n/2+1, ..., k}} λ_j f(π°[j]) and Σ_{j∈[n/2]} λ_j f(π°[j]) < Σ_{j∈{n/2+1, ..., k}} λ_j f(π°[j]), respectively. Although we lack prior knowledge of which case holds, selecting the better of the two corresponding solutions guarantees an approximation ratio of 0.134/4.

Experimental Evaluation

We conduct experiments on real-world datasets to evaluate the impact of user type distributions in the context of video recommendation. Suppose we are dealing with a large video library, denoted as V, containing videos spanning multiple, potentially overlapping categories C₁, C₂, ..., C_m ⊆ V. When a user provides a set of category preferences, our platform's objective is to generate a sequence of videos π from those specified categories that maximizes the expected user engagement Σ_{j∈[k]} λ_j f_j(π[j]). Recall that k denotes the maximum window size of displayed videos, and λ_j represents the proportion of users with patience level j, who are willing to view the first j videos π[j].
Denote by U the space of user types; each user type u ∈ U is specified by a pair (j, f_j(·)). The user considers the first j videos in the list and obtains utility f_j(π[j]). The platform lacks precise knowledge of the user's exact type but is aware of the type distribution D. We consider a common characterization of f_j(·) as a submodular function. Each video s has a rating ρ_s, and we denote by w_st ∈ [0, 1] a measure of the similarity between two videos s and t. In introducing our function f_j(·), we adopt the approach outlined in (Amanatidis et al. 2020). We start with the auxiliary objective g_j(π[j]) = Σ_{s∈π[j]} Σ_{t∈V} w_st − η · Σ_{s∈π[j]} Σ_{t∈π[j]} w_st for some η ≥ 1 (Mirzasoleiman, Badanidiyuru, and Karbasi 2016). This objective takes inspiration from maximal marginal relevance (Carbonell and Goldstein 1998), emphasizing coverage while penalizing similarity. To balance between highly rated videos and those representing the entire collection, we employ the submodular function f_j(π[j]) = α · Σ_{s∈π[j]} ρ_s + β · g_j(π[j]) for α, β ≥ 0.

Datasets. We evaluate our algorithms and benchmarks on the latest MovieLens dataset (Harper and Konstan 2015), consisting of 62,423 movies, of which 13,816 have both user-generated tags and ratings. Similarities w_st are computed from these tags using pairwise minimum tag vectors, with more details below.

Algorithms and Parameters. We compare our proposed sampling-based greedy algorithms (labeled SG) against two baselines, COVDIV and QUALITY, under both the flexible length and fixed length settings. COVDIV iteratively selects the item with the largest marginal utility in g_j(π[j]), i.e., the marginal-relevance-inspired objective, until no more items with positive marginal utility can be found; it returns a sequence of items ranked in the order they were selected. QUALITY is a simple ranking method that orders individual items by non-increasing quality.
The Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24)

Figure 1: SG achieves superior utility among all three algorithms under various user type distributions.

Here quality can be measured by average ratings or scores predicted by the recommender system. Due to space limitations, and because the results are similar for these metrics, we report the results with the average-rating metric for QUALITY. For the user type distribution $D$, we use a normal mass function that approximates the normal distribution $N(\mu, \sigma^2)$ to set the values of $\lambda_j$, $j \in [k]$. We explore the impact of the user type distribution on the performance of the algorithms by varying the values of $\mu$ and $\sigma$. Each movie $i$ is linked to a tag vector $t^i \in [0, 1]^{1128}$, where each component represents the relevance score for an individual tag. We employ a widely accepted model to quantify the similarity $w_{ij}$ between two videos $i$ and $j$, defined as $w_{ij} = \sqrt{\sum_{l=1}^{1128} (\min\{t^i_l, t^j_l\})^2}$ (Amanatidis et al. 2020). This metric calculates the L2 norm of the element-wise minimum between $t^i$ and $t^j$. We set $\eta = 35$ and adjust the parameters $\alpha$ and $\beta$ to ensure that the two components in $f_j(\cdot)$ are roughly equal in magnitude. In each experimental set, we perform 100 rounds and report the average results.

Experimental Results. We measure the performance of the algorithms in terms of their expected utility under various user type distributions. As shown in Figure 1, we test under a uniform user type distribution where $\lambda_j = 1/k$, $j \in [k]$, labeled as UNIFORM on the x-axis. We also report the results under the approximated normal distribution with varying mean $\mu$, labeled as M-$\mu$ on the x-axis. To distinguish these two types of distributions, we add hatches on the bars for the uniform distribution. In our experiments, we set $k = 500$. Figures 1(a) and 1(b) show the results for NSM with flexible length, i.e., $|\pi| \le k$, and for NSM with fixed length, i.e., $|\pi| = k$, respectively.
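The tag-based similarity and the approximated-normal user-type weights can be computed as below (a sketch under our own naming; tag vectors are assumed to be equal-length lists of relevance scores in $[0, 1]$):

```python
import math

def similarity(ti, tj):
    # w_ij = sqrt(sum_l min(t_i[l], t_j[l])^2): the L2 norm of the
    # element-wise minimum of the two tag vectors.
    return math.sqrt(sum(min(a, b) ** 2 for a, b in zip(ti, tj)))

def normal_mass(k, mu, sigma):
    # "Normal mass function": evaluate the N(mu, sigma^2) density at
    # j = 1..k and normalize so the weights lambda_j sum to 1.
    dens = [math.exp(-0.5 * ((j - mu) / sigma) ** 2) for j in range(1, k + 1)]
    z = sum(dens)
    return [d / z for d in dens]
```

Normalizing the truncated densities keeps the $\lambda_j$ interpretable as proportions of users with each patience level, matching the uniform baseline $\lambda_j = 1/k$ as a special case.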
Figure 1(a) shows that SG outperforms the benchmarks under all tested user type distributions. While the benchmarks yield expected utilities of $7.29 \times 10^5$ and $6.75 \times 10^5$, respectively, under the uniform distribution, SG yields an expected utility over $1.04 \times 10^6$, a 43% increase. Under the approximated normal distribution, an increase in $\mu$ indicates a higher number of videos viewed by average users, leading to an increase in the expected utility of the algorithms. QUALITY always returns a sequence of size $k$, as adding more videos always increases the sum of ratings. However, since our objective function is non-monotone, the similar videos added by QUALITY result in a lower expected utility as $\mu$ further increases. SG and COVDIV only add items with positive marginal utility; in our experiments, SG returns a sequence of around 400 videos, and COVDIV returns around 200 videos. As $\mu$ further increases, their overall expected utilities converge, since most users will view all the videos listed. We observe that SG outperforms both benchmarks under all test settings. While QUALITY solely considers the ratings of the items, and COVDIV only considers their capability of representing the whole collection, SG strikes a superior balance between the two. This result validates the superiority of our proposed algorithm over the benchmarks. Figure 1(b) illustrates the results for NSM with fixed length, which means all three algorithms return a sequence of 500 videos. The results for QUALITY remain the same. We observe that under the uniform distribution, SG yields a lower expected utility while COVDIV yields a higher one, compared with the case of flexible length. We also observe that under the approximated normal distribution, the expected utility of SG starts to decrease as $\mu$ exceeds 400, due to the negative contribution from the last-added videos. The expected utility of COVDIV peaks at $\mu = 250$ and then declines.
The reason is that for $200 < \mu < 250$, the increase in the total rating is enough to compensate for the negative contribution from $g_j(\cdot)$, which no longer holds as $\mu$ further increases. Notably, SG outperforms the others, underscoring our method's advantage.

Acknowledgements

This work was supported in part by the CAHSI-Google Institutional Research Program.

References

Alaei, S.; Makhdoumi, A.; and Malekian, A. 2010. Maximizing sequence-submodular functions and its application to online advertising. arXiv preprint arXiv:1009.4153.
Amanatidis, G.; Fusco, F.; Lazos, P.; Leonardi, S.; and Reiffenhäuser, R. 2020. Fast Adaptive Non-Monotone Submodular Maximization Subject to a Knapsack Constraint. In Advances in Neural Information Processing Systems.
Asadpour, A.; Niazadeh, R.; Saberi, A.; and Shameli, A. 2022. Sequential Submodular Maximization and Applications to Ranking an Assortment of Products. Operations Research.
Azar, Y.; and Gamzu, I. 2011. Ranking with submodular valuations. In Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, 1070–1079. SIAM.
Buchbinder, N.; and Feldman, M. 2019. Constrained submodular maximization via a nonsymmetric technique. Mathematics of Operations Research, 44(3): 988–1005.
Buchbinder, N.; Feldman, M.; Naor, J.; and Schwartz, R. 2014. Submodular maximization with cardinality constraints. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, 1433–1452. SIAM.
Carbonell, J.; and Goldstein, J. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 335–336.
Das, A.; and Kempe, D. 2008. Algorithms for subset selection in linear regression. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, 45–54.
Das, A.; and Kempe, D.
2011. Submodular meets spectral: greedy algorithms for subset selection, sparse approximation and dictionary selection. In Proceedings of the 28th International Conference on Machine Learning, 1057–1064.
Gharan, S. O.; and Vondrák, J. 2011. Submodular maximization by simulated annealing. In Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, 1098–1116. SIAM.
Golovin, D.; and Krause, A. 2011. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. Journal of Artificial Intelligence Research, 42: 427–486.
Gotovos, A.; Karbasi, A.; and Krause, A. 2015. Non-monotone adaptive submodular maximization. In Twenty-Fourth International Joint Conference on Artificial Intelligence.
Harper, F. M.; and Konstan, J. A. 2015. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4): 1–19.
Lin, H.; and Bilmes, J. 2010. Multi-document summarization via budgeted maximization of submodular functions. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 912–920.
Lin, H.; and Bilmes, J. 2011. A class of submodular functions for document summarization. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 510–520.
Mirzasoleiman, B.; Badanidiyuru, A.; and Karbasi, A. 2016. Fast Constrained Submodular Maximization: Personalized Data Summarization. In ICML, 1358–1367.
Tang, S. 2021. Beyond pointwise submodularity: Non-monotone adaptive submodular maximization in linear time. Theoretical Computer Science, 850: 249–261.
Tang, S.; and Yuan, J. 2020. Influence maximization with partial feedback. Operations Research Letters, 48(1): 24–28.
Tang, S.; and Yuan, J. 2021a. Adaptive Regularized Submodular Maximization. In 32nd International Symposium on Algorithms and Computation (ISAAC 2021).
Schloss Dagstuhl-Leibniz-Zentrum für Informatik.
Tang, S.; and Yuan, J. 2021b. Cascade Submodular Maximization: Question Selection and Sequencing in Online Personality Quiz. Production and Operations Management, 30(7): 2143–2161.
Tang, S.; and Yuan, J. 2022. Optimal Sampling Gaps for Adaptive Submodular Maximization. In AAAI.
Tang, S.; and Yuan, J. 2023. Non-monotone Sequential Submodular Maximization. arXiv preprint arXiv:2308.08641.
Tschiatschek, S.; Singla, A.; and Krause, A. 2017. Selecting sequences of items via submodular maximization. In Thirty-First AAAI Conference on Artificial Intelligence.
Zhang, G.; Tatti, N.; and Gionis, A. 2022. Ranking with submodular functions on a budget. Data Mining and Knowledge Discovery, 36(3): 1197–1218.