Non-monotone Sequential Submodular Maximization

Shaojie Tang¹, Jing Yuan²
¹Naveen Jindal School of Management, University of Texas at Dallas
²Department of Computer Science and Engineering, University of North Texas
shaojie.tang@utdallas.edu, jing.yuan@unt.edu

Abstract

In this paper, we study a fundamental problem in submodular optimization known as sequential submodular maximization. The primary objective of this problem is to select and rank a sequence of items to optimize a group of submodular functions. Existing research on this problem has predominantly concentrated on the monotone setting, assuming that the submodular functions are non-decreasing. However, in various real-world scenarios, like diversity-aware recommendation systems, adding items to an existing set might negatively impact the overall utility. In response, we propose to study this problem with non-monotone submodular functions and develop approximation algorithms for both flexible and fixed length constraints, as well as a special case with identical utility functions. Empirical evaluations further validate the effectiveness of our proposed algorithms in the domain of video recommendation.

Introduction

Submodular optimization is a central problem in machine learning with applications in a wide range of fields, including data summarization (Lin and Bilmes 2011), sparse reconstruction (Das and Kempe 2011), active learning (Golovin and Krause 2011; Tang and Yuan 2022), and viral marketing (Tang and Yuan 2020). These formulations aim to select a subset of items that maximizes a submodular function. However, in many real-world applications, the objective is not only to select items but also to rank them in a specific order (Azar and Gamzu 2011; Tschiatschek, Singla, and Krause 2017; Tang and Yuan 2021b). This motivates the study of sequential submodular maximization (Asadpour et al.
2022; Zhang, Tatti, and Gionis 2022), a fundamental problem in submodular optimization. This problem involves selecting and ranking a group of k items from a ground set V. The goal is to maximize the weighted summation of k submodular functions, denoted f₁, ..., f_k : 2^V → R⁺, where each function f_j takes the first j items of the ranking sequence as input. Formally, the objective is to find a feasible sequence π = (π₁, ..., π_k) over items in V that maximizes the value of F(π) := Σ_{j∈[k]} λ_j f_j(π[j]), where λ_j denotes the weight of function j and π[j] := {π₁, ..., π_j} denotes the first j items of π.

Copyright © 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

This problem, which captures position bias in item selection, has a wide range of applications, including sequential active learning and recommendation systems (Zhang, Tatti, and Gionis 2022). For instance, it can be applied to tackle challenges in ranking products on online retail platforms (Asadpour et al. 2022). Platforms like Amazon and Airbnb face the task of not only selecting a subset of products or rooms to showcase but also arranging them in a vertical list to provide customers with an organized and customer-friendly browsing experience. Shoppers scroll through this list, depending on their level of patience, and may make a purchase from the products displayed. The platform's objective is to optimize the selection and ranking of products to maximize the likelihood of a purchase. These applications can be framed as instances of sequential submodular maximization. In this context, the parameters in F(π) can be interpreted as follows: V denotes the set of products, k the window size of displayed products, and λ_j the proportion of customers with patience level j (e.g., a customer with patience level j is willing to view the first j products π[j]).
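To make the objective concrete, here is a minimal Python sketch of evaluating F(π) for a given ranking. The topic-coverage utility and the patience weights are hypothetical stand-ins, not data from the paper.

```python
from typing import Callable, Dict, List, Sequence, Set

def sequence_value(pi: Sequence[str],
                   lambdas: List[float],
                   fs: List[Callable[[Set[str]], float]]) -> float:
    """F(pi) = sum_{j in [k]} lambda_j * f_j(pi[j]), where f_j only sees
    the prefix of the first j items; this is how position bias enters."""
    return sum(lam * f(set(pi[:j + 1]))
               for j, (lam, f) in enumerate(zip(lambdas, fs)))

# Hypothetical toy instance: every f_j is the same topic-coverage function.
topics: Dict[str, Set[int]] = {"a": {1, 2}, "b": {2, 3}, "c": {4}}

def cover(S: Set[str]) -> float:
    return float(len(set().union(*(topics[i] for i in S)))) if S else 0.0

lambdas = [0.5, 0.3, 0.2]                       # proportions of patience levels
value = sequence_value(["a", "c", "b"], lambdas, [cover, cover, cover])
```

Here each patience level j contributes λ_j times the coverage of the first j items, so the same item set ranked differently yields a different F(π).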
The function f_j(π[j]) denotes the likelihood of purchase for customers with patience level j after seeing the first j products π[j]. Typically, f_j is modeled as a submodular function. In this case, F(π) captures the expected probability of purchase when a customer is shown a sequence of products π.

While previous research on sequential submodular maximization has primarily focused on the monotone setting, where submodular functions are assumed to be non-decreasing, non-monotonicity arises in many real-world scenarios, including feature selection (Das and Kempe 2008), maximum cut (Gotovos, Karbasi, and Krause 2015), profit maximization (Tang and Yuan 2021a), and data summarization (Mirzasoleiman, Badanidiyuru, and Karbasi 2016). One such example involves designing a diversity-aware recommendation system for a vast assortment of products spanning different categories (Lin and Bilmes 2010; Mirzasoleiman, Badanidiyuru, and Karbasi 2016; Amanatidis et al. 2020; Carbonell and Goldstein 1998). The system's primary goal is to generate a sequence of products that not only have high ratings but also effectively represent the entire collection. To address this tradeoff, a commonly adopted objective function is submodular but not monotone. More details about this application can be found in the experiment section. This highlights the need to develop algorithms that can handle non-monotone submodular functions efficiently.

The Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24)

Here are the main contributions of this paper:
1. We are the first to explore the sequential submodular maximization problem with non-monotone submodular functions. We investigate two variants: one with a flexible length constraint, where the goal is to find a sequence of items with a length of at most k, and another with a fixed length constraint, where the aim is to find a sequence of items with an exact length of k.
2.
For the flexible length constraint, we introduce efficient constant-factor approximation algorithms. For the fixed length constraint, we develop an algorithm whose approximation ratio depends on the ratio k/n, where n is the size of the ground set. When all k utility functions are identical, we provide constant-factor approximation algorithms.

Additional Related Work. The traditional non-monotone submodular optimization problem, where the objective is to select a set of items to maximize a non-monotone submodular function, has been extensively studied in the literature (Gharan and Vondrák 2011; Buchbinder et al. 2014). The best-known result for this problem, subject to a cardinality constraint, is a 0.385-approximation algorithm (Buchbinder and Feldman 2019). It is crucial to emphasize that our work differs from this traditional setting. Instead of aiming to find a set of items where the order does not matter, our goal is to find a sequence of items to maximize a group of submodular functions. It will become evident later that their problem can be seen as a special case of ours. The focus on sequences introduces a new dimension of complexity compared to traditional set-based approaches. Nevertheless, we draw inspiration from previous studies (Amanatidis et al. 2020; Tang 2021) and incorporate a sampling technique to tackle the challenges arising from non-monotonicity. While other studies have explored position bias in submodular optimization (Tschiatschek, Singla, and Krause 2017; Alaei, Makhdoumi, and Malekian 2010), our problem formulation differs significantly from theirs.

Preliminaries and Problem Statement

Notations. We first introduce some useful notation. Throughout the remainder of this paper, we denote the set {1, 2, ..., m} as [m] for any positive integer m. Given a function f, we use f(i | S) to denote the marginal utility of adding i to S, i.e., f(i | S) := f(S ∪ {i}) − f(S).
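As a quick illustration of the f(i | S) notation, the following sketch uses a hypothetical topic-coverage function (not one from the paper) and shows that an item's marginal utility can only shrink as the base set grows:

```python
from typing import Dict, Set

# Hypothetical topic-coverage utility: f(S) = number of distinct topics covered.
topics: Dict[str, Set[int]] = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}}

def f(S: Set[str]) -> int:
    return len(set().union(*(topics[i] for i in S))) if S else 0

def marginal(i: str, S: Set[str]) -> int:
    """f(i | S) = f(S ∪ {i}) − f(S), the marginal utility of adding i to S."""
    return f(S | {i}) - f(S)

gain_small = marginal("c", {"a"})        # adding c to {a}
gain_large = marginal("c", {"a", "b"})   # adding c to the superset {a, b}
```

Here gain_small = 2 while gain_large = 1: the diminishing-returns inequality f(i | X) ≥ f(i | Y) for X ⊆ Y, which is exactly the submodularity condition defined next.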
We say a function f is submodular if and only if f(i | X) ≥ f(i | Y) for any two sets X and Y such that X ⊆ Y and any item i ∉ Y. Let π = (π₁, ..., π_k) be a sequence of items; we define the operation π ⊕ i as the concatenation of i to π, that is, π ⊕ i := (π₁, ..., π_k, i).

Now we are ready to introduce our research problem. Given k non-monotone submodular functions f₁, ..., f_k : 2^V → R⁺ and non-negative coefficients λ₁, ..., λ_k, the objective of the Non-monotone Sequential Submodular Maximization (NSM) problem is to find a feasible sequence π = (π₁, ..., π_k) over items in V that maximizes the value of F(π). Here F(π) := Σ_{j∈[k]} λ_j f_j(π[j]), where π[j] := {π₁, ..., π_j} denotes the first j items of π. In this paper, we adopt a non-standard notation, employing π to represent both a sequence of items and the set of items comprising this sequence. To simplify notation, define π_j = ∅ if |π| < j, where |π| denotes the number of non-empty items contained in π.

We consider two types of feasibility constraints. The first type, known as NSM with Flexible Length, imposes a constraint under which feasible sequences contain at most k items. The second type, known as NSM with Fixed Length, requires that all feasible sequences contain exactly k items from V. Both problems have additional restrictions: the same item cannot appear multiple times in a feasible sequence, and there should be no empty slots between two items. These restrictions ensure that each item is considered at most once and preserve the sequential nature of the solution.

NSM with Flexible Length

A formal description of NSM with Flexible Length is as follows:

P.1: max_π F(π) subject to |π| ≤ k.

We first provide a negative result by showing that it is impossible to find a 0.491-approximation algorithm for P.1.
This result can be shown by setting λ₁ = λ₂ = ... = λ_{k−1} = 0 and λ_k = 1 in P.1, thereby reducing the problem to maximizing a non-monotone submodular function f_k over a set of at most k items, which does not admit a 0.491-approximation (Gharan and Vondrák 2011). Thus, we establish the following lemma.

Lemma 1. It is impossible to achieve a 0.491-approximation for P.1.

Algorithm Design. Next, we present the design of our algorithm, referred to as Sampling-Greedy, which builds upon a simple greedy approach that selects items based on their marginal utility. However, since our utility function is non-monotone, a straightforward greedy strategy may result in suboptimal selections with low utility. To address this challenge, we draw inspiration from (Amanatidis et al. 2020) and introduce a sampling phase to the greedy algorithm, extending its guarantees to the non-monotone setting. Our algorithm consists of two phases:

1. RANDOM SUBSET SELECTION: We begin by selecting a random subset, denoted as R, from the ground set V. Each item i ∈ V is independently included in R with a probability p ∈ [0, 1], where p is a parameter to be optimized later. This random subset serves as the initial pool of items for subsequent processing.
2. GREEDY ALGORITHM ON R: We run a greedy algorithm on R only. This algorithm operates iteratively, augmenting the current sequence with an item that provides the greatest incremental utility.
To be precise, consider a specific round of the greedy algorithm, and let π, whose initial value is ∅, denote the current solution. Let z ∈ arg max_{i∈R} [F(π ⊕ i) − F(π)] denote the item with the largest marginal utility on top of π. If F(π ⊕ z) − F(π) > 0, we append z to π (i.e., π = π ⊕ z) and proceed to the next iteration. Note that F(π ⊕ i) − F(π) = Σ_{j∈{|π|+1, ..., k}} λ_j f_j(i | π), observing that appending i to π only affects the utility of the functions f_{|π|+1}, ..., f_k. This construction continues until one of two stopping criteria is met: either the sequence reaches a length of min{k, |R|}, or the marginal utility of every remaining item becomes non-positive. This ensures that further additions would not contribute positively to the overall utility.

Algorithm 1: Sampling-Greedy
1: E = V, π = ∅, t = 1, Q = {i ∈ E | Σ_{j∈{1,...,k}} λ_j f_j({i}) > 0}
2: while t ≤ k and Q ≠ ∅ do
3:   z = arg max_{i∈Q} Σ_{j∈{t,...,k}} λ_j f_j(i | π)
4:   E = E \ {z}
5:   let Φ_z ~ Bernoulli(p)
6:   if Φ_z = 1 then
7:     π = π ⊕ z, t ← t + 1
8:     Q = {i ∈ E | Σ_{j∈{t,...,k}} λ_j f_j(i | π) > 0}
9:   end if
10: end while
11: return π

To facilitate the analysis, we present an alternative implementation of Sampling-Greedy (a detailed description is given in Algorithm 1). Unlike the original implementation, where the entire set R is sampled at the start of the algorithm, this alternative defers the sampling decision. Instead, we toss a coin, with success probability p, after considering each item; the coin toss determines whether the item is added to the solution. It is straightforward to confirm that both versions of the algorithm produce the same output distribution.

Performance Analysis. We next analyze the approximation ratio of Sampling-Greedy. As in Algorithm 1, we introduce a random variable Φ ∈ {0, 1}^V to denote the outcome of the coin tosses; e.g., Φ_i ∈ {0, 1} represents the coin toss of item i.
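A compact Python sketch of the coin-toss variant (Algorithm 1) follows. The utility functions and the deterministic tie-breaking by sorted order are illustrative choices, not part of the paper's specification.

```python
import random
from typing import Callable, List, Sequence, Set

def sampling_greedy(V: Sequence[str],
                    fs: List[Callable[[Set[str]], float]],
                    lambdas: List[float],
                    k: int, p: float, seed: int = 0) -> List[str]:
    """Sketch of Algorithm 1: repeatedly find the item with the largest
    positive weighted marginal gain, then keep it with probability p."""
    rng = random.Random(seed)
    E, pi = set(V), []

    def gain(i: str) -> float:
        # Appending i at position t = |pi| + 1 only affects f_t, ..., f_k.
        pre = set(pi)
        return sum(lambdas[j] * (fs[j](pre | {i}) - fs[j](pre))
                   for j in range(len(pi), k))

    while len(pi) < k and E:
        z = max(sorted(E), key=gain)   # sorted() only for reproducible ties
        if gain(z) <= 0:               # best gain non-positive: Q is empty
            break
        E.remove(z)                    # z is considered exactly once
        if rng.random() < p:           # coin toss Phi_z ~ Bernoulli(p)
            pi.append(z)               # kept: position t is filled
    return pi

# Hypothetical coverage-style utilities for a quick check.
topics = {"a": {1, 2, 3}, "b": {2, 3}, "c": {4}}
cover = lambda S: float(len(set().union(*(topics[i] for i in S)))) if S else 0.0
# With p = 1 the coin always succeeds and this reduces to plain greedy.
ranking = sampling_greedy(["a", "b", "c"], [cover, cover], [0.5, 0.5], k=2, p=1.0)
```

With p = 1 on this toy instance, the greedy pass picks "a" (largest coverage) and then "c" (the only remaining item with positive marginal gain, since "b" is fully covered by "a").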
Let π* = (π*₁, ..., π*_l) denote the optimal solution of P.1, where l ≤ k. In the context of a specific run of our algorithm, where the coin tosses Φ are realized and the corresponding sequence returned by our algorithm is denoted as π(Φ), we partition the optimal solution π* into three sets as follows:

1. Set O_cons.(Φ): This set contains all items π*_j in the optimal sequence π* that were considered by our algorithm before (or when) position j was filled, but were not picked due to the random coin flips Φ. That is, O_cons.(Φ) := {π*_j ∈ π* | π*_j was considered by Algorithm 1 during the selection of π(Φ)[j] but π*_j ∉ π(Φ)[j]}, where π(Φ)[j] denotes the first j items of π(Φ).
2. Set O_not-cons.(Φ): This set consists of all items π*_j in π* that were not considered by our algorithm before (or when) position j was filled. That is, O_not-cons.(Φ) := {π*_j ∈ π* | π*_j was not considered by Algorithm 1 during the selection of π(Φ)[j]}.
3. Set O_ovlpd.(Φ) = π* \ (O_not-cons.(Φ) ∪ O_cons.(Φ)): This set contains those items π*_j in π* that were added to π(Φ) before (or when) position j was filled. That is, O_ovlpd.(Φ) := {π*_j ∈ π* | π*_j ∈ π(Φ)[j]}.

Let us consider a specific example to illustrate these three sets. Suppose the optimal sequence is π* = (a, b, c, d). Assume that the sequence of items considered by our algorithm is (b, c, d, e, f, g), and that the final sequence picked by our algorithm, based on the coin tosses Φ, is π(Φ) = (c, e, f, g). For this example, O_cons.(Φ) = {b, d}, because our algorithm considered b (resp. d) before filling position 2 (resp. 4) but did not pick it; O_not-cons.(Φ) = {a}, because our algorithm did not consider a before filling position 1; and O_ovlpd.(Φ) = {c}, because our algorithm picked c before filling position 3.
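The partition can be computed mechanically from one run's trace. The sketch below assumes a hypothetical trace format (the full considered order plus the picked subsequence) and reproduces the example above.

```python
from typing import Sequence, Set, Tuple

def partition_optimal(opt: Sequence[str],
                      considered: Sequence[str],
                      picked: Sequence[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split opt into (O_ovlpd, O_cons, O_not-cons) w.r.t. one algorithm run.

    `considered` lists items in the order the algorithm examined them, and
    `picked` is the subsequence that survived the coin tosses, so slot j
    was filled at the index of picked[j-1] within `considered`.
    """
    ovlpd, cons, notcons = set(), set(), set()
    fill_idx = [considered.index(x) for x in picked]   # when each slot was filled
    for j, item in enumerate(opt, start=1):
        if item in picked[:j]:
            ovlpd.add(item)                            # already in pi(Phi)[j]
        else:
            last = fill_idx[j - 1] if j <= len(picked) else len(considered) - 1
            if item in considered[:last + 1]:
                cons.add(item)                         # examined by then, not kept
            else:
                notcons.add(item)                      # not yet examined
    return ovlpd, cons, notcons
```

Running it on the example (opt = a, b, c, d; considered = b, c, d, e, f, g; picked = c, e, f, g) yields O_ovlpd = {c}, O_cons = {b, d}, and O_not-cons = {a}, matching the discussion above.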
By defining this partition of π*, we can effectively analyze the influence of the coin tosses Φ (or, equivalently, the random subset R in the original implementation of our algorithm) on the resulting sequence and its connection to the optimal solution. To simplify notation, we drop the random variable Φ from π(Φ), O_cons.(Φ), O_not-cons.(Φ), and O_ovlpd.(Φ) when it is clear from the context.

Before analyzing the performance of our algorithm, we introduce some additional notation. Recall that π_j = ∅ if j > |π|. Given a random output π from our algorithm and an optimal solution π*, we define F(π ∪ π*) = Σ_{j∈[k]} λ_j f_j(π[j] ∪ π*[j]) as the utility of the union of π and π*. Here, π[j] (resp. π*[j]) denotes all items from π (resp. π*) that are placed up to position j. Intuitively, π ∪ π* can be interpreted as a virtual sequence where both π_j and π*_j are added to position j for all j ∈ [k]. Similarly, we define F(π ∪ π*_ovlpd.) = Σ_{j∈[k]} λ_j f_j(π[j] ∪ (π*[j] ∩ O_ovlpd.)) as the utility of the union of π and O_ovlpd. ∩ π*. Furthermore, we define F(π ∪ π*_cons.) = Σ_{j∈[k]} λ_j f_j(π[j] ∪ (π*[j] ∩ O_cons.)) as the utility of the union of π and O_cons. ∩ π*. Finally, we define F(π ∪ π*_not-cons.) = Σ_{j∈[k]} λ_j f_j(π[j] ∪ (π*[j] ∩ O_not-cons.)) as the utility of the union of π and O_not-cons. ∩ π*. Throughout this section, all expectations are taken over the coin tosses Φ. We first provide two technical lemmas.

Lemma 2. Let p ∈ [0, 1] denote the sampling probability of each item, π denote the output of our algorithm, and π* denote the optimal solution. Then (2 + 1/p) · E[F(π)] ≥ E[F(π ∪ π*)]. (1)

Proof: By the definition of O_ovlpd., all items from π*[j] ∩ O_ovlpd. must appear in π[j] for all j ∈ [k]. Hence, π[j] ∪ (π*[j] ∩ O_ovlpd.) = π[j]. It follows that for all j ∈ [k],

f_j(π[j] ∪ (π*[j] ∩ O_ovlpd.) ∪ (π*[j] ∩ O_not-cons.) ∪ (π*[j] ∩ O_cons.)) = f_j(π[j] ∪ (π*[j] ∩ O_not-cons.) ∪ (π*[j] ∩ O_cons.)). (2)

Therefore, F(π ∪ π*) = Σ_{j∈[k]} λ_j f_j(π[j] ∪ π*[j]) = Σ_{j∈[k]} λ_j f_j(π[j] ∪ (π*[j] ∩ O_ovlpd.)
∪ (π*[j] ∩ O_not-cons.) ∪ (π*[j] ∩ O_cons.)) = Σ_{j∈[k]} λ_j f_j(π[j] ∪ (π*[j] ∩ O_not-cons.) ∪ (π*[j] ∩ O_cons.)), where the second equality uses the fact that O_cons., O_not-cons., and O_ovlpd. form a partition of π*, and the third equality follows from equality (2). For simplicity, let F(π ∪ π*_not-cons. ∪ π*_cons.) = Σ_{j∈[k]} λ_j f_j(π[j] ∪ (π*[j] ∩ O_not-cons.) ∪ (π*[j] ∩ O_cons.)); the above equality indicates that F(π ∪ π*) = F(π ∪ π*_not-cons. ∪ π*_cons.). It follows that

E[F(π ∪ π*)] = E[F(π)] + E[F(π ∪ π*_not-cons.) − F(π)] + E[F(π ∪ π*) − F(π ∪ π*_not-cons.)]
= E[F(π)] + E[F(π ∪ π*_not-cons.) − F(π)] + E[F(π ∪ π*_not-cons. ∪ π*_cons.) − F(π ∪ π*_not-cons.)], (3)

where the second equality is by the observation that F(π ∪ π*) = F(π ∪ π*_not-cons. ∪ π*_cons.). Observe that

F(π ∪ π*_not-cons. ∪ π*_cons.) − F(π ∪ π*_not-cons.)
= Σ_{j∈[k]} λ_j f_j(π[j] ∪ (π*[j] ∩ O_not-cons.) ∪ (π*[j] ∩ O_cons.)) − Σ_{j∈[k]} λ_j f_j(π[j] ∪ (π*[j] ∩ O_not-cons.))
≤ Σ_{j∈[k]} λ_j f_j(π[j] ∪ (π*[j] ∩ O_cons.)) − Σ_{j∈[k]} λ_j f_j(π[j])
= F(π ∪ π*_cons.) − F(π),

where the inequality is by the assumption that f_j is submodular for all j ∈ [k], together with the observations that π[j] ⊆ π[j] ∪ (π*[j] ∩ O_not-cons.) and that π*[j] ∩ O_cons. does not overlap with π[j] ∪ (π*[j] ∩ O_not-cons.). The latter holds because O_cons. ∩ O_not-cons. = ∅ (by the definitions of these two sets) and O_cons. ∩ π = ∅ (any item that is considered but not picked by our algorithm cannot appear in π, noting that each item is considered at most once). This, together with (3), implies that

E[F(π ∪ π*)] ≤ E[F(π)] + E[F(π ∪ π*_not-cons.) − F(π)] + E[F(π ∪ π*_cons.) − F(π)]. (4)

To prove this lemma, it suffices to show that

p · E[F(π ∪ π*_cons.) − F(π)] ≤ E[F(π)] and E[F(π ∪ π*_not-cons.) − F(π)] ≤ E[F(π)]. (5)

The proofs of these two inequalities are provided in the technical report (Tang and Yuan 2023).

Lemma 3. Let p ∈ [0, 1] denote the sampling probability of each item, π denote the output of our algorithm, and π* denote the optimal solution. Then E[F(π ∪ π*)] ≥ (1 − p) · F(π*).
Proof: We begin by presenting a result that links random sampling to submodular maximization.

Lemma 4 (Lemma 2.2 of (Buchbinder et al. 2014)). Consider a submodular set function f : 2^V → R. Let X be a subset of V, and let X(p) denote a sampled subset obtained by including each item of X with probability at most p (not necessarily independently). Then E[f(X(p))] ≥ (1 − p) · f(∅).

We define h_j : 2^V → R as h_j(T) = f_j(T ∪ π*[j]). It can be easily verified that h_j is a submodular function with h_j(∅) = f_j(π*[j]). By applying the above lemma to the function h_j, and considering that the items in π[j] are chosen with probability at most p, we conclude that E[f_j(π[j] ∪ π*[j])] = E[h_j(π[j])] ≥ (1 − p) · h_j(∅) = (1 − p) · f_j(π*[j]) for all j ∈ [k]. The following chain proves the lemma: E[F(π ∪ π*)] = E[Σ_{j∈[k]} λ_j f_j(π[j] ∪ π*[j])] = Σ_{j∈[k]} λ_j E[f_j(π[j] ∪ π*[j])] ≥ Σ_{j∈[k]} λ_j (1 − p) f_j(π*[j]) = (1 − p) Σ_{j∈[k]} λ_j f_j(π*[j]) = (1 − p) · F(π*).

Lemmas 2 and 3 imply the following main theorem.

Theorem 1. Let p ∈ [0, 1] be the sampling probability of each item, π be the output of our algorithm, and π* denote the optimal solution. Then E[F(π)] ≥ (p(1 − p)/(2p + 1)) · F(π*).

Corollary 1. Let p = (√3 − 1)/2. Then E[F(π)] ≥ 0.134 · F(π*).

Remark 1: When all utility functions f_j are monotone and submodular, Algorithm 1 with p = 1 achieves an improved approximation ratio of 1/2. This is because if p = 1, then O_cons. = ∅, observing that when p = 1 every considered item is added to the solution. It follows that inequality (4) reduces to E[F(π ∪ π*)] ≤ E[F(π)] + E[F(π ∪ π*_not-cons.) − F(π)], which can be rewritten as E[F(π ∪ π*)] ≤ E[F(π ∪ π*_not-cons.)]. This, together with inequality (5), implies that E[F(π)] ≥ E[F(π ∪ π*)]/2. Moreover, if all f_j are monotone, it is easy to verify that E[F(π ∪ π*)] ≥ F(π*). It follows that E[F(π)] ≥ E[F(π ∪ π*)]/2 ≥ F(π*)/2.
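The constants in Theorem 1 and Corollary 1 can be checked numerically. The snippet below maximizes the ratio r(p) = p(1 − p)/(2p + 1) over a grid and compares it with the closed-form maximizer p = (√3 − 1)/2, obtained by setting r′(p) = 0, i.e., solving 2p² + 2p − 1 = 0.

```python
import math

def r(p: float) -> float:
    """Approximation ratio from Theorem 1: r(p) = p(1 - p) / (2p + 1)."""
    return p * (1 - p) / (2 * p + 1)

# Closed-form stationary point: 2p^2 + 2p - 1 = 0 gives p = (sqrt(3) - 1) / 2.
p_star = (math.sqrt(3) - 1) / 2

# Grid search over p in (0, 1) as a sanity check that p_star is the maximizer.
p_grid = max((i / 10**5 for i in range(1, 10**5)), key=r)

print(round(p_star, 3), round(r(p_star), 3))  # 0.366 0.134
```

Both approaches agree: the ratio peaks at p ≈ 0.366 with value ≈ 0.134, the constant in Corollary 1.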
Remark 2: When all utility functions are homogeneous (i.e., there exists f such that f_j = f for all j ∈ [k]), Algorithm 1 becomes independent of the specific values of λ_j. This is because in Line 3 of Algorithm 1, finding the z that maximizes Σ_{j=t}^{k} λ_j f_j(i | π) is equivalent to maximizing f(i | π) when f_j = f for all j ∈ [k]. Thus, the selection of z is independent of λ_j, showcasing the algorithm's robustness to the knowledge of λ_j. In the context of recommendation systems, where λ_j represents the distribution of customers' patience levels, which may only be estimated approximately or remain unknown, our algorithm consistently delivers high-quality solutions.

NSM with Fixed Length

We next study the case with a fixed length constraint. A formal description of NSM with Fixed Length is as follows:

P.2: max_π F(π) subject to |π| = k.

By setting λ₁ = λ₂ = ... = λ_{k−1} = 0 and λ_k = 1 in P.2, our problem reduces to maximizing a non-monotone submodular function f_k over a set of exactly k items, which does not admit a 0.491-approximation (Gharan and Vondrák 2011). Hence, we establish the following lemma.

Lemma 5. It is impossible to achieve a 0.491-approximation for P.2.

Approximation Algorithms for P.2

The basic idea of our algorithm is to first apply Algorithm 1 to compute a sequence π of size at most k; then, if required, we supplement π with additional backup items to ensure that it reaches a size of exactly k. We provide a detailed description of our algorithm below:

1. Apply Algorithm 1 to compute a sequence π of size at most k.
2. If |π| = k, return π as the final output. Otherwise, randomly select k − |π| items B from the set V \ π and append them to π in an arbitrary order. The resulting sequence is returned as the final output π^{p2}.

Note that problem P.1 is a relaxation of P.2, observing that any feasible solution of P.2 is also a feasible solution of P.1.
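Phase 2 of the two-phase procedure can be sketched in a few lines; the Phase 1 output is taken as a given argument, since the padding step does not depend on how it was produced.

```python
import random
from typing import List, Sequence

def pad_to_fixed_length(V: Sequence[str], pi: List[str],
                        k: int, seed: int = 0) -> List[str]:
    """Phase 2 sketch: pad a Phase 1 sequence pi (|pi| <= k) with random
    backup items from V \\ pi so the output has exactly k items. The backups
    may contribute negative marginal utility, which is where the (1 - k/n)
    factor in the analysis comes from."""
    rng = random.Random(seed)
    backup = [v for v in V if v not in set(pi)]
    return pi + rng.sample(backup, k - len(pi))

result = pad_to_fixed_length(list("abcdef"), ["a", "b"], k=4)
```

The padded items are appended after π, so they never change the prefix utilities f_j(π[j]) for j ≤ |π|.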
As a consequence, we have the following lemma.

Lemma 6. Let π* and π° denote the optimal solutions of P.1 and P.2, respectively. Then F(π*) ≥ F(π°).

Now we are ready to present the performance bound of our algorithm. The proof of this theorem is deferred to the technical report (Tang and Yuan 2023).

Theorem 2. Let π^{p2} denote the solution returned by our algorithm. Then E_{Φ,B}[F(π^{p2})] ≥ (1 − k/n) · (p(1 − p)/(2p + 1)) · F(π°), where the expectation is taken over the random coin tosses Φ (phase 1) and the random backup set B (phase 2).

If we set p = (√3 − 1)/2 in the first phase, we have the following corollary.

Corollary 2. Let p = (√3 − 1)/2. Then E_{Φ,B}[F(π^{p2})] ≥ (1 − k/n) · 0.134 · F(π°).

The above corollary implies that when the size constraint k is relatively small compared to the total number of items n, our algorithm achieves a good approximation of the optimum. This observation is particularly relevant in scenarios such as recommendation systems, where k is often much smaller than n.

Constant-Factor Approximation Algorithm for Homogeneous Functions

Next, we examine an important special case of P.2, in which all utility functions are homogeneous; that is, there exists a submodular function f : 2^V → R⁺ such that f_j = f for all j ∈ [k]. A formal description of this special case is as follows:

max_π Σ_{j∈[k]} λ_j f(π[j]) subject to |π| = k.

We present constant-factor approximation algorithms for this special case, an improvement over the general case, where we can only achieve an approximation ratio depending on k/n. We consider two cases: k < n/2 and k ≥ n/2. The case k < n/2 is straightforward, as one can directly apply our algorithm for the general case (presented in the previous section) to achieve a constant-factor approximation: when k < n/2, we have 1 − k/n > 1/2, which, together with Corollary 2, implies the following corollary.
Corollary 3. Assume k < n/2 and let p = (√3 − 1)/2. Then E_{Φ,B}[F(π^{p2})] ≥ (0.134/2) · F(π°), where π° is the optimal solution of P.2.

The rest of this section focuses on the case k ≥ n/2. For simplicity, assume that n is even. Considering the optimal solution π°, we can partition its utility F(π°) into two parts: F(π°) = Σ_{j∈[n/2]} λ_j f(π°[j]) + Σ_{j∈{n/2+1, ..., k}} λ_j f(π°[j]). Here, the first part represents the utility obtained from the first n/2 functions, while the second part represents the utility obtained from the remaining k − n/2 functions. We further divide the analysis into two subcases based on the relationship between these two parts. Intuitively, if the first part dominates the overall utility, it suffices to find a sequence that maximizes the first part. Otherwise, if the second part contributes significantly to the overall utility, our focus is on finding a solution that maximizes the second part. Although we initially have no information about this relationship, given that π° is unknown, we can guess the relation and consider both possibilities: we evaluate the performance in each case and choose the option that yields the better overall utility as the final output.

The Case When Σ_{j∈[n/2]} λ_j f(π°[j]) ≥ Σ_{j∈{n/2+1, ..., k}} λ_j f(π°[j])

We first study the case when Σ_{j∈[n/2]} λ_j f(π°[j]) ≥ Σ_{j∈{n/2+1, ..., k}} λ_j f(π°[j]), that is, when the first n/2 functions contribute at least half of the total utility. In this scenario, our objective is to find a sequence that maximizes the utility from the first part (a formal description of this problem is given in P.2a).

P.2a: max_π Σ_{j∈[n/2]} λ_j f(π[j]) subject to |π| = k.
Let π^a denote the optimal solution of P.2a. We have

Σ_{j∈[n/2]} λ_j f(π^a[j]) ≥ Σ_{j∈[n/2]} λ_j f(π°[j]) ≥ (1/2) · F(π°), (6)

where the first inequality holds because π° is a feasible solution of P.2a while π^a is an optimal solution of P.2a, and the second inequality follows from F(π°) = Σ_{j∈[n/2]} λ_j f(π°[j]) + Σ_{j∈{n/2+1, ..., k}} λ_j f(π°[j]) together with the assumption that Σ_{j∈[n/2]} λ_j f(π°[j]) ≥ Σ_{j∈{n/2+1, ..., k}} λ_j f(π°[j]).

Now we are ready to present our algorithm. It is easy to verify that replacing the constraint |π| = k in P.2a by |π| = n/2 does not affect the value of the optimal solution, because items placed after position n/2 do not affect the utility of the first n/2 functions. A formal description of this variant is given in P.2b.

P.2b: max_π Σ_{j∈[n/2]} λ_j f(π[j]) subject to |π| = n/2.

Let π^b denote the optimal solution of P.2b. We have

Σ_{j∈[n/2]} λ_j f(π^b[j]) = Σ_{j∈[n/2]} λ_j f(π^a[j]) ≥ (1/2) · F(π°), (7)

where the equality holds because, as discussed above, P.2a and P.2b have the same optimal value, and the inequality is by inequality (6). We can use our algorithm for the general case to solve P.2b and obtain a solution π. Since the size constraint in P.2b is n/2, substituting k = n/2 into Corollary 2 shows that π is a (0.134/2)-approximation for P.2b, that is,

Σ_{j∈[n/2]} λ_j f(π[j]) ≥ (0.134/2) · Σ_{j∈[n/2]} λ_j f(π^b[j]). (8)

However, π might not be a feasible solution for the original problem P.2a, as its size might be less than k. To ensure feasibility, we append an arbitrary set of k − n/2 items from V \ π to π to obtain the final solution π′. This step ensures that the solution satisfies the size constraint and is feasible for P.2a. More importantly, it does not affect the utility of π, given that items placed after position n/2 do not affect the utility of the first n/2 functions.
As a result, we achieve a (0.134/4)-approximation for P.2, that is, F(π′) ≥ Σ_{j∈[n/2]} λ_j f(π′[j]) = Σ_{j∈[n/2]} λ_j f(π[j]) ≥ (0.134/2) · Σ_{j∈[n/2]} λ_j f(π^b[j]) ≥ (0.134/4) · F(π°), where the equality holds because, as discussed above, items placed after position n/2 do not affect the utility of the first n/2 functions, the second inequality is by inequality (8), and the third inequality is by inequality (7). This leads to the following lemma.

Lemma 7. There exists a (0.134/4)-approximation solution for P.2, assuming Σ_{j∈[n/2]} λ_j f(π°[j]) ≥ Σ_{j∈{n/2+1, ..., k}} λ_j f(π°[j]).

The Case When Σ_{j∈[n/2]} λ_j f(π°[j]) < Σ_{j∈{n/2+1, ..., k}} λ_j f(π°[j])

For this case, we design an algorithm that also achieves an approximation ratio of 0.134/4. We defer this part to the technical report (Tang and Yuan 2023).

Putting It All Together. Recall that we develop (0.134/4)-approximation algorithms for the cases Σ_{j∈[n/2]} λ_j f(π°[j]) ≥ Σ_{j∈{n/2+1, ..., k}} λ_j f(π°[j]) and Σ_{j∈[n/2]} λ_j f(π°[j]) < Σ_{j∈{n/2+1, ..., k}} λ_j f(π°[j]), respectively. Although we lack prior knowledge of which case holds, selecting the better of the two corresponding solutions guarantees an approximation ratio of 0.134/4.

Experimental Evaluation

We conduct experiments on real-world datasets to evaluate the impact of user type distributions in the context of video recommendation. Suppose we are dealing with a large video library, denoted as V, containing videos spanning multiple, potentially overlapping categories C₁, C₂, ..., C_m ⊆ V. When a user provides a set of category preferences, our platform's objective is to generate a sequence of videos π from those specified categories that maximizes the expected user engagement Σ_{j∈[k]} λ_j f_j(π[j]). Recall that k denotes the maximum window size of displayed videos, and λ_j represents the proportion of users with patience level j, who are willing to view the first j videos π[j].
Denote by U the space of user types; each user type u ∈ U is specified by a pair (j, f_j(·)). The user considers the first j videos in the list and obtains utility f_j(π[j]). The platform lacks precise knowledge of the user's exact type but is aware of the type distribution D. We consider a common characterization of f_j(·) as a submodular function. Each video s has a rating ρ_s, and we denote by w_st ∈ [0, 1] a measure of the similarity between two videos s and t. In introducing our function f_j(·), we adopt the approach outlined in (Amanatidis et al. 2020). We start with the auxiliary objective g_j(π[j]) = Σ_{s∈π[j]} Σ_{t∈V} w_st − η · Σ_{s∈π[j]} Σ_{t∈π[j]} w_st for some η ≥ 1 (Mirzasoleiman, Badanidiyuru, and Karbasi 2016). This objective takes inspiration from maximal marginal relevance (Carbonell and Goldstein 1998), emphasizing coverage while penalizing similarity. To balance between highly rated videos and those representing the entire collection, we employ the submodular function f_j(π[j]) = α · Σ_{s∈π[j]} ρ_s + β · g_j(π[j]) for α, β ≥ 0.

Datasets. We evaluate our algorithms and benchmarks on the latest MovieLens dataset (Harper and Konstan 2015), consisting of 62,423 movies, of which 13,816 have both user-generated tags and ratings. Similarities w_st are computed from these tags using pairwise minimum tag vectors, with more details below.

Algorithms and Parameters. We compare our proposed sampling-based greedy algorithms (labeled SG) against two baselines, COVDIV and QUALITY, under both the flexible length and fixed length settings. COVDIV iteratively selects the item with the largest marginal utility in g_j(π[j]), i.e., the marginal-relevance-inspired objective, until no more items with positive marginal utility can be found; it returns a sequence of items ranked in the order they were selected. QUALITY is a simple ranking method that orders individual items by non-increasing quality.
The Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24)

Figure 1: SG achieves superior utility among all three algorithms under various user type distributions.

Here quality can be measured by average ratings or scores predicted by the recommender system. Due to space limitations, and because the results are similar for these metrics, we report the results with the average-rating metric for QUALITY. For the user type distribution $D$, we use a normal mass function that approximates the normal distribution $N(\mu, \sigma^2)$ to set the values of $\lambda_j$, $j \in [k]$. We explore the impact of the user type distribution on the performance of the algorithms by varying the values of $\mu$ and $\sigma$. Each movie $i$ is linked to a tag vector $t^i \in [0, 1]^{1128}$, where each component represents the relevance score for an individual tag. We employ a widely accepted model to quantify the similarity $w_{ij}$ between two videos $i$ and $j$, defined as $w_{ij} = \sqrt{\sum_{l=1}^{1128} (\min\{t^i_l, t^j_l\})^2}$ (Amanatidis et al. 2020). This metric calculates the L2 norm of the element-wise minimum between $t^i$ and $t^j$. We set $\eta = 35$ and adjust the parameters $\alpha$ and $\beta$ to ensure that the two components in $f_j(\cdot)$ are roughly equal in magnitude. In each experimental set, we perform 100 rounds and report the average results.

Experimental Results. We measure the performance of the algorithms in terms of their expected utility under various user type distributions. As shown in Figure 1, we test under a uniform user type distribution where $\lambda_j = 1/k$, $j \in [k]$, labeled as UNIFORM on the x-axis. We also report the results under the approximated normal distribution with varying mean $\mu$, labeled as M-$\mu$ on the x-axis. To distinguish these two types of distributions, we add hatches on the bars for the uniform distribution. In our experiments, we set $k = 500$. Figures 1(a) and 1(b) show the results for NSM with flexible length, i.e., $|\pi| \le k$, and for NSM with fixed length, i.e., $|\pi| = k$, respectively.
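The tag-based similarity and the approximated-normal user-type weights can be computed as below (a sketch under our own naming; tag vectors are assumed to be equal-length lists of relevance scores in $[0, 1]$):

```python
import math

def similarity(ti, tj):
    # w_ij = sqrt(sum_l min(t_i[l], t_j[l])^2): the L2 norm of the
    # element-wise minimum of the two tag vectors.
    return math.sqrt(sum(min(a, b) ** 2 for a, b in zip(ti, tj)))

def normal_mass(k, mu, sigma):
    # "Normal mass function": evaluate the N(mu, sigma^2) density at
    # j = 1..k and normalize so the weights lambda_j sum to 1.
    dens = [math.exp(-0.5 * ((j - mu) / sigma) ** 2) for j in range(1, k + 1)]
    z = sum(dens)
    return [d / z for d in dens]
```

Normalizing the truncated densities keeps the $\lambda_j$ interpretable as proportions of users with each patience level, matching the uniform baseline $\lambda_j = 1/k$ as a special case.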
Figure 1(a) shows that SG outperforms the benchmarks under all tested user type distributions. While the benchmarks yield expected utilities of $7.29 \times 10^5$ and $6.75 \times 10^5$, respectively, under the uniform distribution, SG yields an expected utility over $1.04 \times 10^6$, a 43% increase. Under the approximated normal distribution, an increase in $\mu$ indicates a higher number of videos viewed by average users, leading to an increase in the expected utility of the algorithms. QUALITY always returns a sequence of size $k$, as adding more videos always increases the sum of ratings. However, since our objective function is non-monotone, the similar videos added by QUALITY result in a lower expected utility as $\mu$ further increases. SG and COVDIV only add items with positive marginal utility; in our experiments, SG returns a sequence of around 400 videos, and COVDIV returns around 200 videos. As $\mu$ further increases, their overall expected utilities converge, since most users will view all the videos listed. We observe that SG outperforms both benchmarks under all test settings. While QUALITY solely considers the ratings of the items, and COVDIV only considers their capability of representing the whole collection, SG strikes a superior balance between the two. This result validates the superiority of our proposed algorithm over the benchmarks. Figure 1(b) illustrates the results for NSM with fixed length, which means all three algorithms return a sequence of 500 videos. The results for QUALITY remain the same. We observe that under the uniform distribution, SG yields a lower expected utility while COVDIV yields a higher one, compared with the case of flexible length. We also observe that under the approximated normal distribution, the expected utility of SG starts to decrease as $\mu$ exceeds 400, due to the negative contribution from the last-added videos. The expected utility of COVDIV peaks at $\mu = 250$ and then declines.
The reason is that for $200 < \mu < 250$, the increase in the total rating is enough to compensate for the negative contribution from $g_j(\cdot)$, which no longer holds as $\mu$ further increases. Notably, SG outperforms the others, underscoring our method's advantage.

Acknowledgements

This work was supported in part by the CAHSI-Google Institutional Research Program.

References

Alaei, S.; Makhdoumi, A.; and Malekian, A. 2010. Maximizing sequence-submodular functions and its application to online advertising. arXiv preprint arXiv:1009.4153.
Amanatidis, G.; Fusco, F.; Lazos, P.; Leonardi, S.; and Reiffenhäuser, R. 2020. Fast Adaptive Non-Monotone Submodular Maximization Subject to a Knapsack Constraint. In Advances in Neural Information Processing Systems.
Asadpour, A.; Niazadeh, R.; Saberi, A.; and Shameli, A. 2022. Sequential Submodular Maximization and Applications to Ranking an Assortment of Products. Operations Research.
Azar, Y.; and Gamzu, I. 2011. Ranking with submodular valuations. In Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, 1070–1079. SIAM.
Buchbinder, N.; and Feldman, M. 2019. Constrained submodular maximization via a nonsymmetric technique. Mathematics of Operations Research, 44(3): 988–1005.
Buchbinder, N.; Feldman, M.; Naor, J.; and Schwartz, R. 2014. Submodular maximization with cardinality constraints. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, 1433–1452. SIAM.
Carbonell, J.; and Goldstein, J. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 335–336.
Das, A.; and Kempe, D. 2008. Algorithms for subset selection in linear regression. In Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, 45–54.
Das, A.; and Kempe, D.
2011. Submodular meets spectral: greedy algorithms for subset selection, sparse approximation and dictionary selection. In Proceedings of the 28th International Conference on Machine Learning, 1057–1064.
Gharan, S. O.; and Vondrák, J. 2011. Submodular maximization by simulated annealing. In Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, 1098–1116. SIAM.
Golovin, D.; and Krause, A. 2011. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. Journal of Artificial Intelligence Research, 42: 427–486.
Gotovos, A.; Karbasi, A.; and Krause, A. 2015. Non-monotone adaptive submodular maximization. In Twenty-Fourth International Joint Conference on Artificial Intelligence.
Harper, F. M.; and Konstan, J. A. 2015. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4): 1–19.
Lin, H.; and Bilmes, J. 2010. Multi-document summarization via budgeted maximization of submodular functions. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 912–920.
Lin, H.; and Bilmes, J. 2011. A class of submodular functions for document summarization. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 510–520.
Mirzasoleiman, B.; Badanidiyuru, A.; and Karbasi, A. 2016. Fast Constrained Submodular Maximization: Personalized Data Summarization. In ICML, 1358–1367.
Tang, S. 2021. Beyond pointwise submodularity: Non-monotone adaptive submodular maximization in linear time. Theoretical Computer Science, 850: 249–261.
Tang, S.; and Yuan, J. 2020. Influence maximization with partial feedback. Operations Research Letters, 48(1): 24–28.
Tang, S.; and Yuan, J. 2021a. Adaptive Regularized Submodular Maximization. In 32nd International Symposium on Algorithms and Computation (ISAAC 2021).
Schloss Dagstuhl-Leibniz-Zentrum für Informatik.
Tang, S.; and Yuan, J. 2021b. Cascade Submodular Maximization: Question Selection and Sequencing in Online Personality Quiz. Production and Operations Management, 30(7): 2143–2161.
Tang, S.; and Yuan, J. 2022. Optimal Sampling Gaps for Adaptive Submodular Maximization. In AAAI.
Tang, S.; and Yuan, J. 2023. Non-monotone Sequential Submodular Maximization. arXiv preprint arXiv:2308.08641.
Tschiatschek, S.; Singla, A.; and Krause, A. 2017. Selecting sequences of items via submodular maximization. In Thirty-First AAAI Conference on Artificial Intelligence.
Zhang, G.; Tatti, N.; and Gionis, A. 2022. Ranking with submodular functions on a budget. Data Mining and Knowledge Discovery, 36(3): 1197–1218.