# fair_submodular_cover__1b505595.pdf Published as a conference paper at ICLR 2025 FAIR SUBMODULAR COVER Wenjing Chen Shuo Xing Samson Zhou Victoria G. Crawford Machine learning algorithms are becoming increasing prevalent in the modern world, and as a result there has been significant recent study into algorithmic fairness in order to minimize the possibility of unintentional bias or discrimination in these algorithms. Submodular optimization problems also arise in many machine learning applications, including those such as data summarization and clustering where fairness is an important concern. In this paper, we initiate the study of the Fair Submodular Cover Problem (FSC). Given a ground set U, a monotone submodular function f : 2U R 0, and a threshold τ, the goal of FSC is to find a balanced subset of U with minimum cardinality such that f(S) τ. We first introduce discrete algorithms for FSC that achieve a bicriteria approximation ratio of ( 1 ε, 1 O(ε)). We then present a continuous algorithm that achieves a (ln 1 ε, 1 O(ε))-bicriteria approximation ratio, which matches the best approximation guarantee of submodular cover without a fairness constraint. Finally, we complement our theoretical results with a number of empirical evaluations that demonstrate the efficiency of our algorithms on instances of maximum coverage. 1 INTRODUCTION From high-volume applications such as online advertising and smart devices to high-impact applications such as credit assessment, medical diagnosis, and self-driving vehicles, machine learning algorithms are increasingly prevalent in technologies and decision-making processes in the modern world. However, the amount of automated decision-making has resulted in concerns about the risk of unintentional bias or discrimination (Chouldechova, 2017; Kleinberg et al., 2018; Berk et al., 2021). For example, Chierichetti et al. (2017) noted that although machine learning algorithms may not be biased or unfair by design, they may nevertheless acquire and amplify biases already present in the training data available to the algorithms. Consequently, there has recently been significant focus on achieving algorithmic fairness for a number of fundamental problems, such as classification (Zafar et al., 2017), clustering (Chierichetti et al., 2017), data summarization (Celis et al., 2018b), and matchings (Chierichetti et al., 2019). Though various definitions have been proposed, there is no universal notion of fairness; indeed, determining the correct notion of fairness is an ongoing active line of research. In fact, Kleinberg et al. (2017) showed that three common desiderata of fairness (probabilistic calibration across classes, numerical balance across classes, and statistical parity) are inherently incompatible. Nevertheless, there has been significant focus recently (Chierichetti et al., 2017; Celis et al., 2018a;b;c; Chierichetti et al., 2019; El Halabi et al., 2020; Halabi et al., 2024) on the fairness notion that demands a solution to be balanced with respect to a sensitive attribute, such as ethnicity or gender. In this work, we focus on fairness within submodular optimization. Submodular functions informally satisfy a diminishing returns property that is exhibited by many objective functions for fundamental optimization problems in machine learning. In particular, a function f : 2U R is submodular if for every X Y U and for every x U \ Y , we have f(X {x}) f(X) f(Y {x}) f(Y ). We further assume f is monotone, i.e., f(Y ) f(X) for every X Y . Thus, submodular optimization naturally arises in a wide range of applications, such as clustering and facility location (Gomes & Krause, 2010; Lindgren et al., 2016), document summarization (Lin & Bilmes, 2011; Wei et al., 2013; Sipos et al., 2012), image processing (Iyer & Bilmes, 2019), principal component analysis (Khanna et al., 2015), and recommendation systems (Leskovec et al., 2007; El-Arini & Guestrin, 2011; Chen & Crawford, 2024b; Mitrovi c et al., 2017; Yu et al., 2018; Texas A&M University, jj9754@tamu.edu, shuoxing@tamu.edu, samsonzhou@gmail.com, vcrawford@tamu.edu Published as a conference paper at ICLR 2025 Avdiukhin et al., 2019; Yaroslavtsev et al., 2020). While submodular maximization has received the most attention, submodular cover is also an important problem arising in machine learning applications (Iyer & Bilmes, 2013b; Mirzasoleiman et al., 2015; Soma & Yoshida, 2015; Norouzi-Fard et al., 2016; Mirzasoleiman et al., 2016; Ghuge et al., 2021; Ran et al., 2022; Chen & Crawford, 2024a). In monotone submodular cover (SC), the monotone submodular function arises in the constraint: Given oracle access to a monotone submodular function f : 2U R and a threshold τ, the goal of SC is to identify a subset S U of minimum cardinality such that f(S) τ. Many applications of submodular functions such as in clustering (Gomes & Krause, 2010; Lindgren et al., 2016) and data summarization (Lin & Bilmes, 2011; Wei et al., 2013; Sipos et al., 2012), are also applications where algorithmic fairness is important. Motivated by this, fair submodular maximization (FSM) has been considered under both a cardinality constraint and a fairness constraint (Celis et al., 2018a; Halabi et al., 2020). In the fair setting, we assume that each element in U is associated with a color c that denotes a protected attribute, which allows partitioning the ground set U into disjoint groups U1, . . . , UN. Then informally, the goal is to maximize the objective while selecting a representative number of elements from each color. Surprisingly, there has been no previous work studying fairness for the submodular cover problem, to the best of our knowledge. Thus in this work, we initiate the study of fairness for monotone submodular cover1. Definition 1 (Fair Submodular Cover (FSC)). Given input threshold τ, and bounds on the proportion of the elements in each group pc and qc, FSC is to find argmin S U|S| such that pc|S| |S Uc| qc|S| for all c [N] and f(S) τ. This definition of fairness incorporates multiple other existing notions of fairness, such as diversity rules (Biddle, 2017; Cohoon et al., 2013), statistical parity (Dwork et al., 2012), or proportional representation rules (Monroe, 1995; Brill et al., 2017). To guarantee the existence of feasible subsets, we assume that the inputs satisfy P c [N] pc 1 and P c [N] qc 12. To further illustrate FSC, we describe a fair data summarization application. Let U be a dataset that is split into disjoint subsets U1, ..., UN such that each subset represents some attribute. The function f is a monotone and submodular function that measures the information contained in a subset X U, such as a submodular information measure (Iyer et al., 2021). The values of pc and qc for all c [N], and τ, are input by the user. Then FSC asks to find a minimum size summary that contains sufficient information (f(S) τ), while maintaining a balanced representation amongst the attributes (determined by pc and qc). Our contributions. In this paper, we propose bicriteria approximation algorithms for FSC. An (α, β)-bicriteria approximation algorithm for FSC returns a solution set X that satisfies |X| α|OPT|, pc|X| |X Uc| qc|X| for all c [N], and f(X) βτ, where OPT is an optimal solution to the instance of FSC. Notice that the solution of a bicriteria algorithm for FSC always satisfies the fairness constraint. However, the constraint on the function value (f(X) τ) might be violated by a factor of β and therefore the solution is not necessarily feasible. But, if β to close to 1, we can get a solution that is close to being feasible. We now describe the main contributions of the paper: In our first result, we take advantage of the dual relationship between FSM and FSC, and present two algorithms that convert bicriteria approximation algorithms for FSM into bicriteria approximation algorithms for FSC in Section 2. The first algorithm, convert-fair, is designed to convert discrete algorithms for FSM into ones for FSC. In particular, convert-fair takes a (γ, β)-bicriteria approximation algorithms for FSM and converts it into a ((1 + α)β, γ)-bicriteria approximation algorithm for FSC. Our sec- 1Notice that it is not necessarily obvious what values of input τ, and pc, qc result in an instance of FSC having a feasible solution. In particular, because f is monotone it is easy to check whether there exists a set X such f(X) τ by simply testing whether τ f(U), but whether there exists such a set that also satisfies the fairness constraint is unclear. We discuss the question of existence of a feasible solution further in Appendix B.4, and throughout the paper assume that values of input τ, and pc, qc are chosen such that there does exist a feasible solution to the instance. 2Notice that this assumption is necessary: If P c [N] pc > 1, then P c [N] pc|S| > |S|. However, by the definition of fairness constraint in FSC, we can get pc|S| |S Uc|. It then follows that P c [N] pc|S| P c [N] |S Uc| = |S|, which is a contradiction, implying there are no feasible sets. Similarly, if P c [N] qc 1, we can also prove that no feasible sets satisfy the constraint. Published as a conference paper at ICLR 2025 ond conversion algorithm, convert-continuous, takes a continuous (γ, β)-bicriteria approximation algorithm and converts it into a ((1 + α)β, (1 3ε)γ 2ε γ )-bicriteria approxi- mation algorithm for FSC. Using our above result, we are now able to convert existing algorithms for FSM into algorithms for FSC. In particular, the algorithm Fair-Greedy in El Halabi et al. (2020) can be converted into bicriteria (1 + α, 1/2) for FSC. However, a factor of 1/2 on the feasibility constraint of f(X) τ means that our solution is relatively far from being feasible. Motivated by this, our subsequent results focus on developing bicriteria algorithms for FSC that produce solutions arbitrarily close to being feasible. In particular: In our second result, we develop the first bicriteria approximation algorithms for FSC that find nearly feasible solutions. In particular, we propose three bicriteria algorithms for FSM that can be paired with our converting algorithms in order to find approximate solutions for FSC that are arbitrarily close to meeting the constraint f(S) τ in FSC. The first two algorithms are the discrete algorithms greedy-fairness-bi and threshold-fairness-bi, which both achieve bicriteria approximation ratios of (1 O(ε), 1 ε), but the latter makes less queries to f compared to the former. The third algorithm is a continuous one, cont-thresh-greedy-bi, which achieves an improved (1 O(ε), ln 1 ε + 1) bicriteria approximation ratio but requires more queries to f. The theoretical analysis of all these algorithms depends on Lemma 2, which is one of our central technical contributions and distinguishes our approach from the traditional submodular cover problem. By leveraging greedy-fairness-bi and threshold-fairness-bi as subroutine algorithms for convert-fair and cont-thresh-greedy-bi as a subroutine for convert-continuous, we can obtain algorithms for FSC with a bicriteria approximation ratio of ((1 + α) 1 ε, 1 O(ε)) and ((1 + α)(ln( 1 ε) + 1), 1 O(ε)) respectively. In contrast, using existing algorithms for FSM along with the converting algorithm does not yield solutions that are very close to being feasible for FSC. Finally: We perform an experimental comparison between our discrete algorithms for FSC and the standard greedy algorithm (which does not necessarily find a fair solution) on instances of fair maximum coverage in a graph and fair image summarization. We find that our algorithms find fair solutions while the standard greedy algorithm does not, but at a cost of returning solutions of higher cardinality. 1.1 RELATED WORK Celis et al. (2018a) first gave a (1 1/e)-approximation algorithm for fair monotone submodular maximization under a cardinality constraint, which is tight given a known (1 1/e) hardness of approximation even without fairness constraints (Nemhauser & Wolsey, 1978). This is accomplished by converting their instance of FSM into monotone submodular maximization with a specific type of matroid constraint called a fairness matroid, which we describe in more detail in Section 1.2, and then using existing algorithms for submodular maximization with a matroid constraint. The standard greedy algorithm achieves a 1/2-approximation ratio for the submodular maximization with a matroid constraint (Fisher et al., 1978), and in addition there exists approximation algorithms using the multilinear extension that achieve a 1 1/e approximation guarantee (Calinescu et al., 2007; Badanidiyuru & Vondr ak, 2014). Halabi et al. (2024) gave a (1 1/e)-approximation algorithm for fair monotone submodular maximization under general matroid constraints, though their algorithm only achieves the fairness constraints in expectation. Fair submodular optimization has also been under both cardinality and matroid constraints in the streaming setting (El Halabi et al., 2020; Halabi et al., 2024). For the classical submodular cover problem without fairness constraints and integral valued f, the standard greedy algorithm, where the element of maximum marginal gain is selected one-by-one until f has reached τ, has been shown to have an approximation ratio of O(log maxe U f(e)) (Wolsey, 1982). To deal with real-valued f (as in our case), a slight variant of the greedy where we stop at (1 ε)τ instead of τ has been shown to be a (ln(1/ε), 1 ε)-bicriteria approximation algorithm (Krause et al., 2008). On the other hand, set cover is a special case of fair submodular cover (FSC), Published as a conference paper at ICLR 2025 and by the result of Feige (1998), it is not possible to achieve a (1 o(1)) ln(n) approximation to FSC unless NP has slightly superpolynomial time algorithms . Therefore, a large part of our motivation in considering bicriteria approximation guarantees for FSC is to develop constant factor approximation at the price of a small reduction to feasibility. Indeed, for a fixed ε, our algorithms achieve a constant factor approximation. If we set ε = 1/n, our continuous algorithm (Algorithm 2) for FSM with the converting algorithm (Algorithm 4) achieves a ((1 + α) ln(n) + 1), 1 7/n) bicriteria approximation guarantee for an instance of FSC, and so is very close to being feasible while having an approximation guarantee close to the lower bound. Although submodular maximization has received relatively more attention than submodular cover, the problems have a dual relationship and thus, a natural approach for submodular cover is to convert existing algorithms for submodular maximization into ones for cover (Iyer & Bilmes, 2013a; Chen & Crawford, 2024a). In particular, Iyer & Bilmes (2013a) showed that given a (γ, β)-bicriteria approximation algorithm for submodular maximization with a cardinality constraint, one can produce a ((1+α)β, γ)-bicriteria approximation algorithm for submodular cover by making log1+α(n) guesses for |OPT| in the instance of submodular cover, running the submodular maximization algorithm with the cardinality constraint set to each guess, and returning the smallest solution with f value above γτ. However, this approach does not take into account the fairness constraints and cannot be used to convert algorithms for FSM into ones for FSC. 1.2 PRELIMINARIES We now present preliminary definitions and notation that will be used throughout the paper. OPT refers to the optimal solution of a submodular optimization problem. We use [N] to denote the set {1, 2, ..., N}. The marginal gain of adding an element s to the subset S is denoted as f(S, s). In addition, for any vector v = (v1, v2, ..., v N), and any k R, we define k v = (kv1, kv2, ..., kv N), and we define v = ( v1 , v2 , ..., v N ) and v = ( v1 , v2 , ..., v N ). We now define the related problem of fair submodular maximization (FSM) of a monotone submodular function f (El Halabi et al., 2020), as the search problem of max{f(S) : S U, lc |S Uc| uc, c [N], |S| k} where lc and uc are the bound of cardinality within each small group. Without loss of generality, in this problem, it is assumed that P c [N] uc k. This is because if P c [N] uc < k, then |S| = P c [N] |S Uc| P c [N] uc k. Therefore, the problem is equivalent to setting k = P c [N] uc. Since for the cover problem, the objective is to minimize the cardinality of the solution set which means |S| is not fixed as it is in FSM, therefore we introduced the definition of fairness for FSC as a natural modification of the above problem where the fairness constraint is a proportion of the solution size as opposed to a fixed value. The set of subsets satisfying fairness constraint above for FSM is not a matroid. However, it was proven by El Halabi et al. (2020) that we can convert an instance of FSM into an instance of submodular maximization problem with a matroid constraint; we state this result as Lemma 3 in Appendix A. This matroid constraint is called a fairness matroid, which we denote as Mfair(P, κ, l, u) = {S U : |S Uc| uc, c [N], P c [N] max{|S Uc|, lc} k}, where P = {U1, ..., UN} is the partition of the ground set U, k is the total cardinality constraint, l, u NN are the lower and upper bound vectors respectively. Below we propose the idea of a β-extension of a fairness matroid, which we will use in our bicriteria algorithms for FSM. Definition 2. For any β N+, we define the β-extension of the fairness matroid to be Mβ = Mfair(P, βκ, β l, β u) = {S U : |S Uc| βuc, c [N], P c [N] max{|S Uc|, βlc} βκ}. Our continuous algorithms will use the multilinear extension of f, defined as follows. Definition 3. For any submodular objective f : 2U R+ with |U| = n, the multi-linear extension of f is defined as F(x) = P j / S(1 xj)f(S) where x [0, 1]n, and xi is the i-th coordinate of x. If we define S(x) to be a random set that contains each element i U with probability xi, then by definition, we have that F(x) = Ef(S(x)). We now present the definitions of discrete and continuous algorithms with an (α, β)-bicriteria approximation ratio for FSM, which is defined to find arg max S Mfair(U,k, l, u) f(S). Published as a conference paper at ICLR 2025 Definition 4. A discrete algorithm for FSM with an (α, β)-bicriteria approximation ratio returns a solution X such that f(X) αf(OPT) c [N], |X Uc| βuc, X c [N] max{|X Uc|, βlc} βk. Here OPT is the optimal solution of the problem FSM, i.e., OPT = arg max S Mfair(P,k, l, u) f(S). By this definition, we have that an algorithm satisfies a (α, β)-bicriteria approximation ratio for FSM i.f.f the output set S satisfies f(S) αf(OPT) and that S Mβ. Definition 5. A continuous algorithm with (α, β)-bicriteria approximation ratio for FSM returns a fractional solution x such that F(x) αf(OPT), x P(Mβ). Here OPT is the optimal solution of the problem FSM, i.e., OPT = arg max S Mfair(P,κ, l, u) f(S). Mβ is the β-extension of the fairness matroid Mfair(P, κ, l, u), and P(Mβ) is the matroid polytope of Mβ. More specifically, P(Mβ) = conv{1S : S Mβ}. 2 CONVERSION ALGORITHMS FOR FSC In this section, we introduce two algorithms that make use of the dual relationship between FSC and FSM, and convert bicriteria approximation algorithms for FSM into ones for FSC. The first algorithm, convert-fair, is designed to convert discrete algorithms for FSM into ones for FSC. In particular, convert-fair takes an (γ, β)-bicriteria approximation algorithms for FSM that runs in time T (n, κ) and converts it into a ((1+α)β, γ)-bicriteria approximation algorithm for FSC that runs in time O( log(|OP T |) log(α+1) T (n, (1 + α)|OPT|)). However, because of the matroid constraint, better approximation guarantees for FSM may be achieved by a continuous algorithm that produce a fractional solution. Motivated by this, our second converting algorithm, convert-continuous, takes a continuous (γ, β)-bicriteria approximation algorithm (where guarantees are with respect to the multilinear extension as described in Section 1.2), and converts it into a ((1 + α)β, (1 3ε)γ 2ε bicriteria approximation algorithm for FSC. In the next section, we will develop corresponding bicriteria approximation algorithms for FSM that can be used along with the results in this section in order to produce approximately optimal solutions for FSC that are arbitrarily close to being feasible. For both algorithms, it is required that the sets Uc for c [N] be sufficiently large so that our method of constructing a solution does not run out of elements to pick. In particular, we assume that the instance of FSC satisfies that P c [N] min{qc, |Uc| β(1+α)|OP T |)} 1. Recall from the definition of FSC that it is already assumed P c [N] qc 1, so this assumption is essentially requiring that there be enough elements within each set Uc of the partition, relative to parameters α and β, to ensure that the rank of the β-extension of the fairness matroid Mβ (see definition in Section 1.2) is βκ. For example, with the existing algorithm for FSM in El Halabi et al. (2020), β is 1 and therefore the assumption is met if |Uc| qc(1 + α)|OPT| for all c [N]. For our algorithms greedy-fairness-bi and cont-thresh-greedy-bi in Section 3, β would be 1/ϵ and ln(1/ϵ) respectively, and the assumption is met if |Uc| qc(1 + α)|OPT|/ϵ and |Uc| qc(1 + α) ln(1/ϵ)|OPT| for all c [N] respectively. We first consider our algorithm convert-fair for converting discrete algorithms for FSM. Pseudocode for convert-fair is provided in Algorithm 1. convert-fair takes as input an instance of FSC, a (γ, β)-bicriteria approximation algorithm for FSM, and a parameter α > 0. Each iteration of the while loop from Line 2 to Line 4 corresponds to a guess κ on the size of the optimal solution to the instance of FSC. For each guess κ, we have an instance of FSM with budget κ, fairness vector of lower bound κ p, fairness vector of upper bound κ q. We then run the algorithm for FSM on this instance to get a set S. Notice that this algorithm will convert the matroid corresponding to the instance of FSM into its β-extension. In Lines 6 to 8, convert-fair adds additional elements so that the lower bounds are met for every one of the partitions. Next in Lines 10 to 12, convert-fair then adds elements until the size constraint βκ is met, without breaking the fairness constraints. Finally, convert-fair checks if the set S satisfies f(S) γτ. If it does not, the guess of optimal solution size increases by a factor of 1 + α and the process repeats itself. Published as a conference paper at ICLR 2025 Algorithm 1 convert-fair Input: An FSC instance with threshold τ, fairness parameters p, q, partition of U P, a (γ, β)- bicriteria approximation algorithm for FSM, α > 0 Output: S U 1: κ (1 + α), S . 2: while f(S) < γτ do 3: S Run (γ, β)-approximation algorithm for FSM with fairness matroid Mfair(P, κ, pκ , qκ ) 4: κ (1 + α)κ 5: //Rounding the solution 6: for c [N] do 7: if |S Uc| < β pcκ then 8: Add new elements from Uc/S to S until |S Uc| β pcκ 9: if |S| < βκ then 10: for c [N] do 11: while |S| < βκ and |S Uc| < β qcκ do 12: Add new elements in Uc/S to S 13: return S We now state the theoretical results for convert-fair below in Theorem 1. We defer the proof and analysis of the proof of Theorem 1 to Appendix C.1. Theorem 1. Suppose P c [N] min{qc, |Uc| β(1+α)|OP T |)} 1. Then any (γ, β)-bicriteria approximation algorithm for FSM that returns a solution set in time T (n, κ) can be converted into an approximation algorithm for FSC that is a ((1 + α)β, γ)-bicriteria approximation algorithm that runs in time O( log(|OP T |) log(α+1) T (n, (1 + α)|OPT|)). We now present our algorithm convert-continuous for converting continuous algorithms. To motivate it, notice that here we can t directly use the converting theorem for discrete algorithms. There are two reasons for this: (i) The output solution is fractional so we need a rounding step, and (ii) the bicriteria approximation ratio for the continuous algorithms is on the value of the multi-linear extension and we don t have exact access to F, so we can t check directly if F(x) γτ as we did on Line 2 of convert-fair. Then we develop the converting algorithm convert-continuous for continuous algorithms. The key idea of the converting theorem is similar to convert-fair, so we defer the pseudocode of convert-continuous to Algorithm 4 in Section C.2 of the appendix. Here we describe the major differences. For each guess of optimal solution size κ, convert-continuous invokes the continuous subroutine algorithm for FSM to obtain a fractional solution x. Since F can t be queried exactly in general, we estimate F(x) by taking a sufficient number of samples in Line 4. Once the estimate of F(x) is higher than γτ, we use the pipage rounding technique to convert x into a subset S, and then use the rounding procedure analogous to that in Lines 6 to 12 in Algorithm 1 to obtain a solution set with fairness guarantee. Notice that since the solution set obtained from the pipage rounding step is only guaranteed to satisfy that Ef(S) F(x), the approximation guarantee on the function value in Theorem 5 holds in expectation. The corresponding theoretical guarantees for convert-continuous are stated below in Theorem 2. The proof of Theorem 2 can be found in Section C.2 of the appendix. Theorem 2. Suppose P c [N] min{qc, |Uc| β(1+α)|OP T |)} 1. Then with probability at least 1 δ, any (γ, β)-bicriteria approximation algorithm for FSM that returns a solution set in time T (n, κ) with probability at least 1 δ n can be converted into an approximation algorithm for FSC that is a ((1 + α)β, (1 3ε)γ 2ε γ )-bicriteria approximation algorithm where (1 3ε)γ 2ε γ holds in expectation. The query complexity is at most O log(|OP T |) log(α+1) T (n, (1 + α)|OPT|) + n log1+α |OP T | Published as a conference paper at ICLR 2025 3 BICRITERIA ALGORITHMS FOR FSM In the last section, we propose converting algorithms to convert bicriteria algorithms for FSM into ones for FSC. Existing algorithms for FSM can be used as the input (γ, β)-bicriteria subroutine, but these algorithms all return feasible solutions to the instance of FSM and have guarantees of β = 1 and γ 1 1/e. After applying the converting algorithms in Section 2 this results in solutions for FSC that are far from feasible. For example, the greedy algorithm for FSM proposed has γ = 1/2 and β = 1, and the continuous greedy algorithm for FSM has γ = 1 1/e and β = 1 (Celis et al., 2018a). In this section, we propose new algorithms that can be used for FSM where γ is arbitrarily close to 1 and β > 1. As a result, the algorithms proposed in this section can be paired with the converting theorems in Section 2 to find solutions to our instance of FSC that are arbitrarily close to being feasible. We propose three bicriteria algorithms for FSM. The first two algorithms are the discrete algorithms greedy-fairness-bi and threshold-fairness-bi, which both achieve bicriteria approximation ratios of (1 O(ε), 1 ε) where the former is in time O(nκ/ε) and the latter in time O( n ε ). The third algorithm is a continuous one, cont-thresh-greedy-bi, which achieves a (1 O(ε), ln 1 ε + 1) bicriteria approximation ratio in time O nκ ε4 ln2(n/ε) . In the case of submodular maximization with a cardinality constraint without fairness, one can find a solution with f value that is a factor of 1 ε off of that of the optimal solution by greedily adding O(ln(1/ε))κ elements beyond the cardinality constraint κ. However, existing algorithms for FSM transform the instance into an instance of submodular maximization subject to a fairness matroid constraint, and it is not clear how one can take an analogous approach and produce an infeasible solution in order to get a better approximation guarantee when dealing with a matroid constraint while maintaining a fair solution. We propose the β-extension of a fairness matroid, defined in Section 1.2, in order to get a (γ, β)-bicriteria algorithm for FSM with γ > 1 1/e. In particular, we will return a solution that is a feasible solution to the β-extension of the fairness matroid corresponding to our instance of FSM. We now introduce two lemmas concerning the β-extension of a matroid that will be needed for our algorithms and their theoretical analysis. Before we proceed to present our algorithms, we first introduce the following general lemma that helps to build a connection between the fairness matroid M and its β-extension Mβ. For the sake of simplicity in notation throughout this section and its subsequent proofs, we use the notation Mβ, representing the β-extension of Mfair(P, κ, l, u), which is defined in Section 1.2. Since the bicriteria algorithm for FSM is designed as a subroutine for the converting theorem with input Mfair(P, κ, pκ, qκ), we have that here lc = pcκ and uc = qcκ . From the fact that P c [N] pc 1 P c [N] qc, we have that P c [N] lc κ P c [N] uc. Since the bicriteria algorithm for FSM is used as a subroutine for the converting theorems for FSC, where we assume P c [N] min{qc, |Uc| β(1+α)|OP T |)} 1 in both Theorem 1 and Theorem 2, we have that rank(Mfair(P, κ, l, u)) = κ and that rank(Mβ) = βκ. Therefore, we have the following Lemma 1. c [N] lc κ P c [N] uc and rank(Mfair(P, κ, l, u)) = κ, rank(Mβ) = βκ. Next, we present Lemma 2, which is one of the most interesting and novel parts of our analysis in proposing the bicriteria algorithm for FSM, and is necessary in the proof of all the bicriteria algorithm for FSM. Notice that in the bicriteria algorithm for submodular maximization with cardinality constraint, the greedy step is achieved by simply adding the element with the highest marginal gain. However, since we need to consider the fairness constraint in our paper, we can only add the element that makes the solution set feasible. To prove the theoretical guarantee, we have to construct a mapping from the solution set to the optimal solution. To tackle this difficulty, we propose and analyze Lemma 2, which guarantees the existence of such a mapping. Lemma 2. For any β N+ and any fairness matroid Mfair(P, κ, l, u), denote Mβ as the βextended fairness matroid of Mfair(P, κ, l, u). Then for any set S Mβ with |S| = βκ, T Mfair(P, κ, l, u) with |T| = κ, and any permutation of S = (s1, s2, ..., sβκ), there exist a sequence E = (e1, e2, ..., eβκ) such that each element in T appears β times in E and that Si {ei+1} Mβ, i {0, 1, ..., βκ} Published as a conference paper at ICLR 2025 where Si = (s1, s2, ..., si) and S0 = . Notice that this mapping is not straightforward and trivial due to two reasons. First of all, the size of the optimal solution and the solution set are different. Second, the portion of |S Uc| over |T Uc| can be different for different subroup c [N]. To prove Lemma 2, we construct the mapping by an iterative proof and by dealing with different cases for each step. This lemma is of great importance in our analysis as it guarantees the existence of a mapping from the solution set to the set containing β copies of the OPT. The detailed proof of this lemma is deferred to the appendix. On the other hand, we highlight that this lemma reveals an important and general fact about the fairness matroid: For each base set T Mfair(P, κ, l, u), and each subset S Mβ that is a base set, we can find a mapping from T to a sequence E that contains β copies of T such that Si {ei} is always feasible for Mβ. Notice that since Mβ is a matroid, then for any subset S1 S2, if S2 Mβ then S1 Mβ. Consequently, the above lemma holds for not only just base set of Mβ, but also for any subset of Mβ by adding dummy variables to the end of the sequence S if the number of elements in S is less than βκ. Building upon this lemma, we propose three bicriteria algorithms in the following part. 3.1 DISCRETE BICRITERIA ALGORITHMS FOR FSM We now analyze two discrete bicriteria algorithms for FSM, greedy-fair-bi and threshold-fairness-bi. Let SMMC refer to the problem of monotone submodular maximization with a matroid constraint. greedy-fair-bi is based on the standard greedy algorithm which is well-known to produce a feasible solution with a 1/2 approximation guarantee in O(nk) time for SMMC (Fisher et al., 1978), where k is the rank of the matroid. greedy-fair-bi proceeds in a series of rounds, where at each round we select the element x U with the highest marginal gain to f that stays on the 1/ε-extension of the fairness matroid corresponding to the instance of FSM, i.e. S {x} M1/ε. threshold-fairness-bi is based on the threshold greedy algorithm (Badanidiyuru & Vondr ak, 2014), which is also a 1/2 ε approximation for SMMC but requires only O(n log k) queries of f. threshold-fairness-bi iteratively makes passes through the universe U and adds all elements into its solution with marginal gains exceeding τ that are feasible with respect to the 1/ε-extension of the fairness matroid, and this threshold is decreased by 1 ε at each round until it falls below a stopping criterion. Notice that these algorithms specifically use the β-extension of the fairness matroid, and therefore do not apply to the more general setting of submodular maximization with a matroid constraint. Pseudocode for the algorithms greedy-fair-bi and threshold-fairness-bi are included in Appendix D as Algorithms 5 and Algorithm 6 respectively. We now present the theoretical guarantees of greedy-fair-bi and threshold-fairness-bi. The key benefit of these algorithms over existing ones for FSM is that by making ε arbitrarily small and using convert-fair in Section 2, we have algorithms for FSC that are arbitrarily close to being feasible. In particular, if we use greedy-fair-bi as a subroutine in convert-fair, we have a ( 1 ε + 1, 1 ε)-bicriteria algorithms for FSC in O(n log(n)κ/ε2) queries of f. If we use threshold-fairness-bi, we get a similar approximation guarantee in O(n log(n) log(κ/ε)/ε2) queries of f. The proofs of both of these theorems can be found in Section D of the appendix. Theorem 3. Suppose that greedy-fairness-bi is run for an instance of FSM, then greedy-fairness-bi outputs a solution S that satisfies a (1 ε, 1 ε)-bicriteria approximation guarantee in at most O (nκ/ε) queries of f. Theorem 4. Suppose that threshold-fairness-bi is run for an instance of FSM with ε (0, 1). Then threshold-fairness-bi outputs a solution S that satisfies a (1 2ε, 1 ε)- bicriteria approximation guarantee in at most O (n/ε log(κ/ε)) queries of f. 3.2 CONTINUOUS ALGORITHMS FOR FSM A downside to the discrete greedy algorithms proposed in Section 3 is that we are above our budget κ by a factor of 1/ε, which is weaker than the analogous guarantee of ln(1/ε) that the greedy algorithm gives for submodular maximization with a cardinality constraint without fairness. We now introduce Published as a conference paper at ICLR 2025 Algorithm 2 cont-thresh-greedy-bi (cont-bi) 1: Input: ε, δ, M 2U 2: x 0 3: d := maxs M f(s), 4: for t = 1 to 1/ε do 5: B decreasing-threshold-procedure (x, ε, δ, d, M) 6: x x + ε 1B 7: return x Algorithm 3 decreasing-threshold-procedure (DTP) 1: Input: x, ε, δ, d, M 2U 2: B 3: Denote Mfair(P, κ ln(1/ε), l ln(1/ε), u ln(1/ε)) as Mln(1/ε). 4: for w = d; w > εd κ ; w = w(1 ε) do 5: for u U do 6: X = f(S(x + ε1B), u) 7: if B {u} Mln(1/ε)+1 then 8: ˆX average over 3κ ε3 samples from DX 9: if ˆX w then 10: B B {u} 11: w = w(1 ε) 12: return B and analyze our continuous algorithm cont-thresh-greedy-bi (cont-bi), which produces a fractional solution for FSM that achieves a (1 O(ε), ln(1/ε) + 1)-bicriteria approximation ratio in O nκ ln2(n) time. cont-bi is based on the continuous threshold greedy algorithm of Badanidiyuru & Vondr ak (2014). Compared to the discrete algorithms presented in Section 3.1, cont-bi improves the ratio on the cardinality of the solution from O(1/ε) to O(ln(1/ε)), and therefore has as strong of guarantees as the greedy algorithm without fairness. We can achieve a discrete solution with an arbitrarily small loss by employing rounding schemes, like swap rounding (Chekuri et al., 2010), on the returned fractional solution x. cont-bi iteratively takes a step of size ε in the direction 1B, where 1B is the indicator function of a set B U, over 1/ε iterations. At each step, the set B is determined by the subroutine decreasing-threshold-procedure (DTP). DTP builds B over a series of rounds corresponding to thresholds w, where w begins as the max singleton marginal gain and the rounds exit once w is sufficiently small. During each round, we iterate over the universe U, and if an element u U can be added to B while staying on the ln(1/ε) + 1-extension of the fairness matroid, then we approximate the multilinear extension f and add u to B if and only if the marginal gain is above w. Pseudocode for cont-bi is provided in Algorithm 2, and pseudocode for decreasing-threshold-procedure is provided in Algorithm 3. Theorem 5. Suppose that Algorithm 2 is run for an instance of FSM, then with probability at least 1 1 n2 , cont-thresh-greedy-bi outputs a solution S that satisfies a (1 7ε, ln( 1 ε) + 1)- bicriteria approximation guarantee in at most O nκ ε4 ln2(n/ε) queries of f. By applying a converting theorem, we can obtain the algorithm for submodular cover that achieves an approximation ratio of ((1 + α)(ln( 1 ε) + 1), 1 O(ε)), which aligns with the best-known results for bicriteria submodular cover without the fairness constraint (Chen & Crawford, 2024a; Iyer & Bilmes, 2013b). The detailed theoretical guarantee and proof of the algorithm can be found in Corollary 5.1 in the appendix. 4 EXPERIMENTS In this section, we evaluate several of our algorithms for FSC on instances of fair maximum coverage, where the objective is to identify a set of fixed nodes that optimally maximize coverage within Published as a conference paper at ICLR 2025 (a) greedy-bi (b) greedy-fairness-bi (c) threshold-fairness-bi 1500 1700 1900 2100 2300 2500 Value of the given threshold GREEDY GREEDY-Fair THRES-Fair 1500 1700 1900 2100 2300 2500 Value of the given threshold GREEDY GREEDY-Fair THRES-Fair 1500 1700 1900 2100 2300 2500 Value of the given threshold Fairness difference GREEDY GREEDY-Fair THRES-Fair (f) Fairness difference Figure 1: Performance comparison on the Twitch 5000 dataset for Maximum Coverage. 1a, 1b, 1c illustrate the distribution of users speaking different languages in the solutions produced by various algorithms with τ = 2400. f: the value of the objective submodular function. Cost: the size of the returned solution. Fairness difference: (maxc |S Uc| minc |S Uc|)/|S|. a graph. The dataset utilized is a subset of the Twitch Gamers dataset (Rozemberczki & Sarkar, 2021), comprising 5,000 vertices (users) who speak English, German, French, Spanish, Russian, or Chinese. We aim to develop a solution with a high f value exceeding a given threshold τ while ensuring a fair balance between users who speak different languages. Additional discussion about the application as well as experimental setup including parameter settings are included in Appendix G. In addition, we include experiments on instances of fair image summarization in Appendix G. We evaluate the performance of our discrete bicriteria algorithms greedy-fair-bi and threshold-fairness-bi for FSM as subroutines in our algorithm convert-fair. In addition, we consider the baseline algorithm, greedy-bi, which is the standard greedy algorithm for submodular cover without fairness. Figures 1a, 1b and 1c showcase the distribution of users speaking different languages in the solutions produced by these algorithms with τ = 2400. Figures 1d, 1e and 1f present the performance of these algorithms (f value, cost, and fairness difference) for varying values of τ. As shown in Figure 1a, with τ = 2400, over 80% of the users in the solution returned by greedy-bi are English speakers, which indicates a lack of fairness in user language distribution. While the solutions produced by greedy-fair-bi and threshold-fairness-bi exhibit significantly fairer distributions across different languages, demonstrating the effectiveness of our proposed algorithms. Further, as the value of given τ increases, the magnitude of this difference also increases (see Figure 1f). Figure 1d showcases that for all these algorithms the objective function value f(S) scales almost linearly with the threshold τ, which aligns with the theoretical guarantees of the approximation ratio. Additionally, as shown in Figure 1e, the cost of the solutions returned by our proposed algorithms is higher than that of the solution from greedy-bi. This is an expected trade-off, as our algorithms have to include more elements to maintain the approximation ratio while ensuring fairness. Overall, our proposed algorithms are efficient and effective in producing a fair solution. Published as a conference paper at ICLR 2025 ACKNOWLEDGEMENTS Samson Zhou is supported in part by NSF CCF-2335411. The work was conducted in part while Samson Zhou was visiting the Simons Institute for the Theory of Computing as part of the Sublinear Algorithms program. Victoria Crawford is supported in part by the Seed Program for AI, Computing, and Data Science created by the Texas A&M Institute for Data Science. Dmitrii Avdiukhin, Slobodan Mitrovic, Grigory Yaroslavtsev, and Samson Zhou. Adversarially robust submodular maximization under knapsack constraints. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD., pp. 148 156, 2019. Ashwinkumar Badanidiyuru and Jan Vondr ak. Fast algorithms for maximizing submodular functions. In Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, pp. 1497 1514. SIAM, 2014. Richard Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns, and Aaron Roth. Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research, 50(1):3 44, 2021. Dan Biddle. Adverse impact and test validation: A practitioner s guide to valid and defensible employment testing. Routledge, 2017. Markus Brill, Jean-Franc ois Laslier, and Piotr Skowron. Multiwinner approval rules as apportionment methods. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 414 420, 2017. Gruia Calinescu, Chandra Chekuri, Martin P al, and Jan Vondr ak. Maximizing a submodular set function subject to a matroid constraint. In International Conference on Integer Programming and Combinatorial Optimization, pp. 182 196. Springer, 2007. L. Elisa Celis, Lingxiao Huang, and Nisheeth K. Vishnoi. Multiwinner voting with fairness constraints. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI, pp. 144 151, 2018a. L. Elisa Celis, Vijay Keswani, Damian Straszak, Amit Deshpande, Tarun Kathuria, and Nisheeth K. Vishnoi. Fair and diverse dpp-based data summarization. In Proceedings of the 35th International Conference on Machine Learning, ICML, pp. 715 724, 2018b. L. Elisa Celis, Damian Straszak, and Nisheeth K. Vishnoi. Ranking with fairness constraints. In 45th International Colloquium on Automata, Languages, and Programming, ICALP, pp. 28:1 28:15, 2018c. Chandra Chekuri, Jan Vondr ak, and Rico Zenklusen. Dependent randomized rounding via exchange properties of combinatorial structures. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pp. 575 584. IEEE, 2010. Wenjing Chen and Victoria Crawford. Bicriteria approximation algorithms for the submodular cover problem. Advances in Neural Information Processing Systems, 36, 2024a. Wenjing Chen and Victoria G Crawford. Linear submodular maximization with bandit feedback. ar Xiv preprint ar Xiv:2407.02601, 2024b. Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii. Fair clustering through fairlets. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, pp. 5029 5037, 2017. Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, and Sergei Vassilvitskii. Matroids, matchings, and fairness. In The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS, 2019. Published as a conference paper at ICLR 2025 Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data, 5(2):153 163, 2017. Joanne Mc Grath Cohoon, James P. Cohoon, Seth Reichelson, and Selwyn Lawrence. Effective recruiting for diversity. In IEEE Frontiers in Education Conference, FIE, pp. 1123 1124, 2013. Pinar Duygulu, Kobus Barnard, Joao FG de Freitas, and David A Forsyth. Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In Computer Vision ECCV 2002: 7th European Conference on Computer Vision Copenhagen, Denmark, May 28 31, 2002 Proceedings, Part IV 7, pp. 97 112. Springer, 2002. Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard S. Zemel. Fairness through awareness. In Innovations in Theoretical Computer Science, pp. 214 226, 2012. Khalid El-Arini and Carlos Guestrin. Beyond keyword search: discovering relevant scientific literature. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 439 447. ACM, 2011. Marwa El Halabi, Slobodan Mitrovi c, Ashkan Norouzi-Fard, Jakab Tardos, and Jakub M Tarnawski. Fairness in streaming submodular maximization: Algorithms and hardness. Advances in Neural Information Processing Systems, 33:13609 13622, 2020. Uriel Feige. A threshold of ln n for approximating set cover. J. ACM, 45(4):634 652, 1998. Marshall L Fisher, George L Nemhauser, and Laurence A Wolsey. An analysis of approximations for maximizing submodular set functions II. Springer, 1978. Elias Frantar, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh. GPTQ: accurate post-training quantization for generative pre-trained transformers. Co RR, abs/2210.17323, 2022. Rohan Ghuge, Anupam Gupta, and Viswanath Nagarajan. The power of adaptivity for stochastic submodular cover. In International Conference on Machine Learning, pp. 3702 3712. PMLR, 2021. Ryan Gomes and Andreas Krause. Budgeted nonparametric learning from data streams. In Proceedings of the 27th International Conference on Machine Learning, ICML, pp. 391 398, 2010. Marwa El Halabi, Slobodan Mitrovic, Ashkan Norouzi-Fard, Jakab Tardos, and Jakub Tarnawski. Fairness in streaming submodular maximization: Algorithms and hardness. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems, Neur IPS, 2020. Marwa El Halabi, Suraj Srinivas, and Simon Lacoste-Julien. Data-efficient structured pruning via submodular optimization. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems, Neur IPS, 2022. Marwa El Halabi, Jakub Tarnawski, Ashkan Norouzi-Fard, and Thuy-Duong Vuong. Fairness in submodular maximization over a matroid constraint. In International Conference on Artificial Intelligence and Statistics, pp. 1027 1035, 2024. Rishabh Iyer and Jeffrey Bilmes. Near optimal algorithms for hard submodular programs with discounted cooperative costs. In Proceedings of Machine Learning Research, volume 89, pp. 276 285, 2019. Rishabh Iyer, Ninad Khargonkar, Jeff Bilmes, and Himanshu Asnani. Generalized submodular information measures: Theoretical properties, examples, optimization algorithms, and applications. IEEE Transactions on Information Theory, 68(2):752 781, 2021. Rishabh K. Iyer and Jeff A. Bilmes. Submodular optimization with submodular cover and submodular knapsack constraints. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems, pp. 2436 2444, 2013a. Rishabh K Iyer and Jeff A Bilmes. Submodular optimization with submodular cover and submodular knapsack constraints. Advances in neural information processing systems, 26, 2013b. Published as a conference paper at ICLR 2025 David Kempe, Jon M. Kleinberg, and Eva Tardos. Maximizing the spread of influence through a social network. Theory Comput., 11:105 147, 2015. Rajiv Khanna, Joydeep Ghosh, Russell Poldrack, and Oluwasanmi Koyejo. Sparse Submodular Probabilistic PCA. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, pp. 453 461, 2015. Jon Kleinberg, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. Human decisions and machine predictions. The quarterly journal of economics, 133(1):237 293, 2018. Jon M. Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. In 8th Innovations in Theoretical Computer Science Conference, ITCS, pp. 43:1 43:23, 2017. Andreas Krause, H Brendan Mc Mahan, Carlos Guestrin, and Anupam Gupta. Robust submodular observation selection. Journal of Machine Learning Research, 9(12), 2008. Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne Van Briesen, and Natalie Glance. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 420 429. ACM, 2007. Hui Lin and Jeff Bilmes. A class of submodular functions for document summarization. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pp. 510 520. Association for Computational Linguistics, 2011. Erik M. Lindgren, Shanshan Wu, and Alexandros G. Dimakis. Leveraging sparsity for efficient submodular data summarization. In Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems, pp. 3414 3422, 2016. Baharan Mirzasoleiman, Amin Karbasi, Ashwinkumar Badanidiyuru, and Andreas Krause. Distributed submodular cover: Succinctly summarizing massive data. Advances in Neural Information Processing Systems, 28, 2015. Baharan Mirzasoleiman, Morteza Zadimoghaddam, and Amin Karbasi. Fast distributed submodular cover: Public-private data summarization. Advances in Neural Information Processing Systems, 29, 2016. Slobodan Mitrovi c, Ilija Bogunovic, Ashkan Norouzi-Fard, Jakub Tarnawski, and Volkan Cevher. Streaming robust submodular maximization: A partitioned thresholding approach. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, pp. 4560 4569, 2017. Burt L Monroe. Fully proportional representation. American Political Science Review, 89(4):925 940, 1995. George L. Nemhauser and Laurence A. Wolsey. Best algorithms for approximating the maximum of a submodular set function. Math. Oper. Res., 3(3):177 188, 1978. Ashkan Norouzi-Fard, Abbas Bazzi, Ilija Bogunovic, Marwa El Halabi, Ya-Ping Hsieh, and Volkan Cevher. An efficient streaming algorithm for the submodular cover problem. Advances in Neural Information Processing Systems, 29, 2016. Yingli Ran, Zhao Zhang, and Shaojie Tang. Improved parallel algorithm for minimum cost submodular cover problem. In Conference on Learning Theory, pp. 3490 3502. PMLR, 2022. Benedek Rozemberczki and Rik Sarkar. Twitch gamers: a dataset for evaluating proximity preserving and structural role-based node embeddings. ar Xiv preprint ar Xiv:2101.03091, 2021. Ravid Shwartz-Ziv, Micah Goldblum, Yucen Lily Li, C. Bayan Bruss, and Andrew Gordon Wilson. Simplifying neural network training under class imbalance. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems, Neur IPS, 2023. Published as a conference paper at ICLR 2025 Ruben Sipos, Adith Swaminathan, Pannaga Shivaswamy, and Thorsten Joachims. Temporal corpus summarization using submodular word coverage. In Proceedings of the 21st ACM international conference on Information and knowledge management, pp. 754 763. ACM, 2012. Tasuku Soma and Yuichi Yoshida. A generalization of submodular cover via the diminishing return property on the integer lattice. Advances in neural information processing systems, 28, 2015. Kai Wei, Yuzong Liu, Katrin Kirchhoff, and Jeff Bilmes. Using document summarization techniques for speech data subset selection. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 721 726, 2013. Kai Wei, Rishabh K. Iyer, and Jeff A. Bilmes. Submodularity in data subset selection and active learning. In Proceedings of the 32nd International Conference on Machine Learning, ICML, volume 37, pp. 1954 1963, 2015. Laurence A. Wolsey. An analysis of the greedy algorithm for the submodular set covering problem. Comb., 2(4):385 393, 1982. Grigory Yaroslavtsev, Samson Zhou, and Dmitrii Avdiukhin. bring your own greedy +max: Nearoptimal 1/2-approximations for submodular knapsack. In The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS, pp. 3263 3274, 2020. Qilian Yu, Easton Li Xu, and Shuguang Cui. Streaming algorithms for news and scientific literature recommendation: Monotone submodular maximization with a $d$ -knapsack constraint. IEEE Access, 6:53736 53747, 2018. Muhammad Bilal Zafar, Isabel Valera, Manuel Gomez-Rodriguez, and Krishna P. Gummadi. Fairness constraints: Mechanisms for fair classification. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS, volume 54 of Proceedings of Machine Learning Research, pp. 962 970, 2017. A OMITTED LEMMA OF SECTION 1.2 Lemma 3 ((El Halabi et al., 2020)). If f is monotone, then solving FSM is equivalent to the problem below. max S U f(S) s.t. |S Uc| uc c [N] X c [N] max{|S Uc|, lc} k B ADDITIONAL DISCUSSION B.1 ADDITIONAL APPLICATIONS OF FSC We first describe an additional number of applications of the fair submodular cover (FSC) problem. Data summarization. The goal of data summarization is to find a small subset of a dataset, such as images, documents, etc, that summarizes the dataset. Monotone submodular functions have commonly been used in data summarization to quantify the performance of a particular subset (Lin & Bilmes, 2011; Lindgren et al., 2016), and in particular SC has been used to model the data summarization problem (Mirzasoleiman et al., 2015). In many of these applications, the items of the dataset can be classified into a number of categories over which a balanced representation in the summary would be desirable. For example, images may be people of different nationalities, or news article documents may correspond to different perspectives on an issue. The FSC formulation emphasizes maintaining a certain information threshold via the constraint f(S) τ, balancing categories through a fairness constraint, while minimizing the summary size. One particular possibility Published as a conference paper at ICLR 2025 for the constraint on f is that it be nearly its maximum value, i.e., we desire a summary that is as small as possible while maintaining nearly all of the information from the original dataset. We also note that data summarization motivates FSM (Halabi et al., 2022). In fact, for many use cases such a formulation of FSC may be more meaningful than the alternative FSM formulation, which places emphasis on restricting the summary to a particular budget. Training data subset selection. A second application is training data subset selection in machine learning, which has previously been studied from the submodular optimization perspective (Wei et al., 2015). The fairness constraint can be used to address the problem of innacuracies introduced by class imbalance during model training, which is a major concern (Shwartz-Ziv et al., 2023). This problem can be modeled by FSC, where the constraint on the f value represents a requirement that the smaller training data represent the overall dataset sufficiently well therefore promoting an accurate learned model, that the training data be balanced over the classes, and we seek to select the smallest amount of training data possible as training is computationally expensive. Neural network pruning and quantization. Due to the massive size of modern neural networks, large, highly-accurate models may require multiple GPUs to do the inference, which limits the usability of such models. Neural network pruning and quantization address this by reducing network size or memory while preserving performance. In this task, the goal is to select a subset of neurons while preserving network performance. One common approach used in neural network pruning and quantization is the reweighted input change pruning (Frantar et al., 2022), where the objective function has been proved to preserve weak submodularity (Halabi et al., 2022). Assume that our goal is to prune the network with the objective function to achieve 70% of the value of F on the ground set i.e., F(S) 0.7τ, where τ is F(U) where U is the ground set. Additionally, a fairness constraint can ensure that neurons are proportionally selected across different layers or blocks to maintain structural balance. Influence maximization in social network analysis. As a final application, influence maximization is an important problem in social network analysis. Suppose the social graph is described by G = (V, E, w), where V is the set of nodes with |V | = n, E denotes the set of edges, and w is the weight vector defined on the set of edges E. The objective function is defined on subsets of the nodes of the graph to be the expected number of nodes influenced in the graph by a chosen seed set S, and it is well-known to be an example of a monotone and submodular function (Kempe et al., 2015). In this application, FSC addresses the problem where we want to find the subset of minimum size that could influence a target fraction of nodes, f(S) τ, while ensuring fairness in node selection based on associated features (e.g., demographics). This application highlights FSC s ability to balance influence spread across diverse subgroups in social networks. B.2 COMPARISON WITH EXISTING REDUCTIONS In this section, we briefly discuss the difference between our reduction and the existing reduction from general submodular maximization to submodular cover. The standard reduction from an instance of submodular cover (SC) with objective f and threshold constraint τ to an instance of submodular maximization (SM) with objective f and budget k involves iteratively doing the following procedure: A guess is made for the size of the optimal solution (|OPT|) to the instance of SC, and this guess as the budget along with f are input into an algorithm for SM. The procedure is repeated with increasingly large guesses until a solution is found with f value sufficiently close to τ. This process is relatively straightforward since the two problems are dual to each other, and the conversion requires only a flip between the objective and the constraint. A clear description of the process is provided in Iyer & Bilmes (2013a). In contrast, in our fairness setting, the conversion from fair submodular cover (FSC) to fair submodular maximization (FSM) is less clear because of the more complex matroid structure of the fairness constraints. It is important to note that there is currently no existing formulation of the Submodular Cover (SC) problem that incorporates a matroid constraint, nor has any conversion process been developed to address such constraints. To address this, we devised a method where each guess of the size of the optimal solution for the instance of SC is used to construct an extended fairness matroid (we propose the concept of an extension to a matroid in Section 1.2). This matroid is then Published as a conference paper at ICLR 2025 input as a constraint into a bicriteria FSM algorithm, such as those developed in Sections 3.1 and 3.2. Furthermore, post-processing (Lines 6 12 in Algorithm 1 and Lines 7-13 in Algorithm 4) is required for each guess to ensure the fairness constraint is met, unlike the non-fairness setting where no such post-processing is necessary. A final difference between our fair setting and the general setting is that we are the first to introduce a converting process for multilinear extension algorithms (Algorithm 4). B.3 COMPARISON WITH ALGORITHMS FOR SM WITH MATROID CONSTRAINTS In this section, we provide a comparison between our fair submodular maximization algorithms and existing algorithms for submodular maximization under matroid constraints. First, we note that the fair submodular maximization (FSM) problem can be converted into an instance of submodular maximization with a general matroid constraint, as shown by Lemma 3 in Halabi et al. (2020). Therefore, existing algorithms for SM with a general matroid constraint can also be applied to FSM. Such algorithms return a feasible solution to the instance of SM with a matroid constraint, and achieve an approximation guarantee of at best 1 1/e. However, our goal with our algorithms for FSM is to use them as subroutines in the conversion process, and so we develop algorithms that find sets that are better approximations to the optimal solution than 1 1/e but are not necessarily feasible. i.e., we propose algorithms for FSM with bicriteria approximation ratio. To this end, we have developed the notion of an extension of the fairness matroid (see Definition 2 in Section 1.2) and shown that by transferring to this bigger constraint we can achieve better objective values. In the case of our continuous algorithm, this means traveling within an extended polytope of the β-extension of the fairness matroid. This approach enables the f value of the solution set to approach arbitrarily close to the optimal objective. B.4 OMITTED DISCUSSION ON THE FEASIBILITY OF FSC In Section 1.2, we present the problem definition for FSC. Notice that for some input values of τ, and pc, qc, there might be no feasible solution, i.e. the instance of FSC is invalid. Further, it might not be easy to check whether there exists a feasible solution with the value of submodular objective f to be higher than τ and also satisfy the fairness constraint. However, if we have an algorithm that is guaranteed to produce an approximately feasible solution to FSC assuming the instance is valid, we can use this algorithm to check whether there exists an approximately feasible solution to the instance or not. In particular, suppose we have an algorithm for valid instances of FSC that is guaranteed to produce a fair solution X such that f(X) γτ for some value γ that is close to 1. Then if we run this algorithm on the instance, we can check if the returned solution satisfies f(X) γτ and is fair, in which case we know there exists a nearly feasible solution. If the solution does not satisfy the guarantees, then the FSC instance must not have a feasible solution at all. The algorithms proposed in this paper are examples of algorithms that provide nearly feasible solutions to valid instances of FSC. Another approach to find a value for τ which makes the FSC problem feasible is as follows. We can choose a value of the solution set, denoted as r, which satisfies that qcr |Uc| for each c [N], and run any non-bicriteria algorithm for the FSM instance with the fairness matroid Mfair(P, r, pr, qr). Since that qcr |Uc|, the rank of the fairness matroid is r. Therefore, if we set τ to be f(S) where S is the output solution set of the algorithm for FSM, then the FSC problem would be feasible since S would be a feasible set for the FSC instance. C APPENDIX FOR SECTION 2 In this section, we present missing discussions and proofs from Section 2 in the main paper. We first present missing proofs of Theorem 1 about algorithm convert-fair in Section C.1. Then we present the proof of Theorem 2 about the converting algorithm convert-continuous for continuous algorithms in Section C.2. In addition, pseudocode for the algorithm convert-continuous is presented in Algorithm 4. Published as a conference paper at ICLR 2025 C.1 PROOF OF THEOREM 1 In this section, we present the missing proofs of the lemmas that are used in the proof of Theorem 1. In order to prove Theorem 1, we need the following two lemmas. Lemma 4 guarantees that the solution set S after the rounding step satisfies the fairness constraint for cover. Lemma 5 implies the inclusion relationship of the fairness matroid with the same fairness ratios. Lemma 4. For each guess κ such that κ (1+α)|OPT|, the solution set S in Algorithm 1 satisfies β |S Uc| β qc|S| β , |S| = βκ. Proof. Here we denote the solution set returned by the bicriteria algorithm for FSM as S , and the solution set after the rounding steps from Line 6 to Line 8 as S . From the definition of bicriteria approximation algorithm for FSM, we can see that the solution set returned by the subroutine algorithm for FSM satisfies that |S Uc| β qcκ X c [N] max{|S Uc|, β pcκ } βκ After the rounding steps for each group from Line 6 to Line 8, the solution set satisfies that |S Uc| = max{β pcκ , |S Uc|} for any c [N]. It then follows that β pcκ |S Uc| β qcκ . Since that P c [N] max{|S Uc|, β pcκ } βκ, we have that c [N] |S Uc| = X c [N] max{|S Uc|, β pcκ } βκ. From the assumption that P c [N] qc 1 and P c [N] min{qc, |Uc| β(1+α)|OP T |)} 1, after the second rounding phase from Line 10 to Line 12, we have |S| = βκ and that for each group c, β pcκ |S Uc| β qcκ . Since the solution set S is of cardinality βκ, then we have β |S Uc| β qc|S| Lemma 5. For any positive integers κ1, κ2 such that κ1 κ2, we have that Mfair(P, κ1, pκ1 , qκ1 ) Mfair(P, κ2, pκ2 , qκ2 ) Proof. The lemma is equivalent to prove that for any subset A Mfair(P, κ1, pκ1 , qκ1 ), we have that A is also in Mfair(P, κ2, pκ2 , qκ2 ). Since κ1 κ2, |A Uc| qcκ1 qcκ2 . For the second constraint, notice that P c [N] max{|A Uc|, pcκ1 } κ1 is equivalent to that P c [N] max{|A Uc|/κ1, pcκ1 κ1 } 1. It then follows that c [N] max{|A Uc|/κ2, pcκ2 c [N] max{|A Uc|/κ1, pcκ1 Therefore, A Mfair(P, κ2, pκ2 , qκ2 ). We now prove Theorem 1. Theorem 1. Suppose P c [N] min{qc, |Uc| β(1+α)|OP T |)} 1. Then any (γ, β)-bicriteria approximation algorithm for FSM that returns a solution set in time T (n, κ) can be converted into an approximation algorithm for FSC that is a ((1 + α)β, γ)-bicriteria approximation algorithm that runs in time O( log(|OP T |) log(α+1) T (n, (1 + α)|OPT|)). Published as a conference paper at ICLR 2025 Proof. Denote the optimal solution of the FSC as OPT. Since by Lemma 4, the fairness constraint for cover is always satisfied. When the guess of OPT satisfies that |OPT| < κ (1 + α)|OPT|, by the definition of bicriteria approximation algorithm for FSM, it follows that f(S) γ max X Mfair(P,κ, pκ , qκ ) f(X). Since κ > |OPT|, by Lemma 5, we have that Mfair(P, |OPT|, p|OPT| , q|OPT| ) Mfair(P, κ, pκ , qκ ). Therefore, it follows that max X Mfair(P,κ, pκ , qκ ) f(X) max X Mfair(P,|OP T |, p|OP T | , q|OP T | ) f(X) Since OPT is the optimal solution of FSC, we have that pc|OPT| |OPT Uc| qc|OPT| . It implies that OPT Mfair(P, |OPT|, p|OPT| , q|OPT| ). Therefore we can get max X Mfair(P,|OP T |, p|OP T | , q|OP T | ) f(X) f(OPT) τ. This means that the algorithm stops by the time when κ reaches the region of (|OPT|, (1 + α)|OPT|], which implies that the output solution set satisfies |S| = βκ β(1 + α)|OPT|. Since by Lemma 4, the fairness constraint is always satisfied, the output solution set satisfies a ((1 + α)β, γ)-approximation ratio. The algorithm makes O(log1+α |OPT|) calls to the bicriteria algorithm for FSM with κ = 1 + α, (1 + α)2, ..., (1 + α)|OPT|, so the query complexity is O(P log(|OP T |) log(α+1) i=1 T (n, (1 + α)i)). C.2 CONVERTING THEOREM FOR CONTINUOUS ALGORITHMS In this section, we present and analyze the converting algorithm for the continuous algorithms, which is denoted as convert-continuous. The algorithm description is in Algorithm 4. The main result of the algorithm is presented in Theorem 2, which we restate as follows. Theorem 2. Any continuous algorithm with a (γ, β)-bicriteria approximation ratio for FSM that returns a solution in time T (n, κ) with probability at least 1 δ n can be converted into an approximation algorithm for FSC such that with probability 1 δ, the algorithm satisfies a ((1 + α)β, (1 3ε)γ 2ε γ )-bicriteria approximation ratio where (1 3ε)γ 2ε γ holds in expectation. The query complexity is at most O(log1+α |OPT|T (n, (1 + α)|OPT|)) + n log1+α |OP T | Proof. Throughout the proof, we use OPT to denote the optimal solution of the FSM. In addition, we denote the optimal solution of FSM under the total cardinality κ as OPTκ, i.e., OPTκ = arg max S Mfair(P,κ, pκ, qκ) f(S). First of all, notice that there are at most min{n, log1+α |OPT| + 1} number of guesses of |OPT| before κ reaches |OPT| κ (1 + α)|OPT|. By taking a union bound over all guess of |OPT| we would obtain with probability at least 1 δ 2 and for each guess of |OPT|, the algorithm for FSM outputs a solution x with a bicriteria approximation ratio of (γ, β). Since F(x) n maxs Mβ f(s) nf(OPTκ), by the Chernoff bound in Lemma 8 and taking the union bound, it follows that with probability at least 1 δ 2, for each guess of |OPT|, the estimate of F(x) in Line 4 of Algorithm 4 denoted as Y , satisfies that |Y F(x)| 2εf(OPTκ) + 3εF(x). By the definition of the bicriteria approximation ratio, it follows that Y {(1 3ε)γ 2ε}f(OPTκ). Similar to the proof of Theorem 1, we can see that when κ, which is the guess of the size OPT satisfies that |OPT| κ (1 + α)|OPT|, it follows that Published as a conference paper at ICLR 2025 Algorithm 4 convert-continuous Input: An FSC instance with threshold τ, fairness parameters p, q, partition P, a (γ, β)-bicriteria approximation algorithm for FSM, α > 0 Output: S U 1: κ 1 + α , S . 2: while true do 3: x (γ, β)-bicriteria approximation for FSM with fairness matroid Mfair(P, κ, pκ, qκ) 4: Y average over n 2ε2 log( 4n δ ) samples from F(x) 5: if Y {(1 3ε)γ 2ε}τ then 6: S pipage rounding of x 7: for c [N] do 8: if |S Uc| < pcβκ then 9: Add new elements from Uc/S to S until |S Uc| pcβκ 10: if |S| < βκ then 11: for c [N] do 12: while |S| < βκ and |S Uc| < qcβκ do 13: Add new elements in Uc/S to S 14: κ (1 + α)κ 15: return S Therefore, Y {(1 3ε)γ 2ε}τ. It then follows that the algorithm stops before the guess of |OPT| satisfies |OPT| κ (1 + α)|OPT|. The value of multi-linear extension of the output fractional solution then satisfies (1 + 3ε)F(x) + 2εf(OPTκ) Y {(1 3ε)γ 2ε}τ. Combining the above inequality with that F(x) γf(OPTκ), then F(x) (1 3ε)γ 2ε 1 + 3ε + 2ε Since x P(Mβ), where Mβ is the β extension of the fairness matroid under the guess κ, then after the pipage rounding step, we would have that S Mβ, and the value of objective function satisfies Ef(S) F(x) (1 3ε)γ 2ε γ τ. After the rounding steps from Line 7 to Line 13 in Algorithm 4, we would get that the final solution set satisfies Ef(S) (1 3ε)γ 2ε 1 + 3ε + 2ε β |S Uc| β qc|S| |S| (1 + α)β|OPT|. D APPENDIX FOR SECTION 3 In this section, we present the missing content in Section 3 in the main paper. D.1 APPENDIX FOR SECTION 3.1 In this portion of appendix, we present the missing details and proofs in Section 3.1 in the main paper, which is about two discrete algorithms greedy-fair-bi and threshold-fairness-bi. We begin by presenting the proof of Lemma 2, followed by proofs of the threshold greedy algorithm threshold-fairness-bi. Finally, the pseudocode Published as a conference paper at ICLR 2025 of greedy-fair-bi and threshold-fairness-bi are presented in Algorithm 5 and Algorithm 6 respectively. First of all, we prove Lemma 2, which builds the relationship between the original fairness matroid and its β-extension for any β N+. Lemma 2. For any β N+ and any fairness matroid Mfair(P, κ, l, u), denote Mβ as the βextended fairness matroid of Mfair(P, κ, l, u). Then for any set S Mβ with |S| = βκ, T Mfair(P, κ, l, u) with |T| = κ, and any permutation of S = (s1, s2, ..., sβκ), there exist a sequence E = (e1, e2, ..., eβκ) such that each element in T appears β times in E and that Si {ei+1} Mβ, i {0, 1, ..., βκ} where Si = (s1, s2, ..., si) and S0 = . Proof. Before proving the lemma, we define some notations here. For any sequence of any length m denoted as A = (a1, a2, ..., am), we define the number of element x in the sequence as |Ax|, i.e., |Ax| := |{i : ai = x}|. In addition, we define the number of elements of group c in the sequence as |Ac|, i.e., |Ac| = |{i : ai Uc}| = P x Uc |Ax|. For the sequence E, we denote the sequence containing i-th element to the last elements as Ei, i.e., Ei = (ei, ei+1, ..., eβκ). Now we prove a stronger claim which would imply the results in the lemma. Claim 1. For any β N+, denote Mβ as the β-extension of the fairness matroid. Then for any set S Mβ, there exists a sequence E = (e1, ..., eβκ) such that for each i {0, 1, ..., βκ}, the sequence Fi = (Si, Ei+1) = (s1, s2, ..., si, ei+1, ..., eβκ) satisfies that |F c i | ucβ c [N] X c [N] max{|F c i |, lcβ} βκ. Here for each e E, we have e T. Besides, we have that for any element x T, |F x i | β. We prove the claim by induction. First, when i = βκ, Fi = S. Since S Mβ, the claim holds. Suppose the result in the claim holds for i, and we prove the claim for i 1. There are two cases. Case 1. There exists some group c0 such that |(Si 1, Ei+1)c0| lc0β 1. Since |T| = κ, |T Uc| lc for each c [N]. Therefore, in this case, there exists at least one element x Uc0 T such that |(Si 1, Ei+1)x| < β. Then choose ei = x and Ei = (x, Ei+1), the results in the claim will be satisfied. Case 2. For all group c [N], |(Si 1, Ei+1)c| lcβ. Since the sequence (Si 1, Ei+1) is of length βκ 1, we have that |(Si 1, Ei+1)| < βκ |T|β. Therefore, there exists at least one group c1 such that |(Si 1, Ei+1)c1| < |T Uc1|β. (Otherwise P c [N] |(Si 1, Ei+1)c| P c [N] |T Uc|β = βκ, which breaks the assumption.) From |(Si 1, Ei+1)c1| < |T Uc1|β, we have that there exists at least one element x T Uc1 such that |(Si 1, Ei+1)x| β 1. Then we set the i-th element in E to be x, then Ei = (x, Ei+1). It follows that |(Si 1, Ei)x| β. For each element x T/{x} , |(Si 1, Ei)x | = |(Si 1, Ei+1)x | β. Since ei = x Uc1. For group c = c1, |(Si 1, Ei)c| = |(Si 1, Ei+1)c| ucβ by the assumption that the claim holds for iteration i. For group c1, |(Si 1, Ei)c1| = |(Si 1, Ei)c| + 1 ucβ. Since for all group c [N], |(Si 1, Ei+1)c| lcβ, it follows that |(Si 1, Ei)c| lcβ. Thus X c [N] max{|(Si 1, Ei)c|, lcβ} = X c [N] |(Si 1, Ei)c| Published as a conference paper at ICLR 2025 = |(Si 1, Ei)| = βκ. Thus we prove the claim for iteration i 1 under the assumption that the claim holds for i. By induction, the claim holds for all i. For i = 0, (S0, E0) = E. From the construction of E we have that |E| = βκ, and that |Eo| = β for all o T. Since for each group c, we have |Si {ei+1} Uc| |(Si, Ei)c|. From the result in the claim, we can prove that Si {ei+1} Mβ. D.1.1 PROOF OF THEOREM 3 Theorem 3. Suppose that greedy-fairness-bi is run for an instance of FSM, then greedy-fairness-bi outputs a solution S that satisfies a (1 ε, 1 ε)-bicriteria approximation guarantee in at most O (nκ/ε) queries of f. Proof. Denote the optimal solution of max S Mfair(P,κ, l, u) f(S) as OPT, i.e., OPT = arg max S Mfair(P,κ, l, u) f(S). Since by Lemma 1, we have M1/ε = Mfair(P, κ/ε, l/ε, u/ε) is a matroid of rank κ/ε, then the Algorithm 5 ends after κ/ε steps and the output solution set satisfies |S| = κ/ε. Since S M1/ε, then |S Uc| uc/ε c [N] max c [N]{|S Uc|, lc/ε} κ/ε. Then it remains to prove that f(S) (1 ε)f(OPT). From Lemma 2, we know that there exists a sequence E that contains 1/ε copies of OPT and that at each step i, Si {ei+1} M1/ε. Then by the greedy selection strategy, we have f(Si+1) f(Si) f(Si {ei+1}) f(Si). Thus by submodularity, we have f(Si+1) f(Si) f(Si {ei+1}) f(Si) f(S, ei+1). Summing over all i, we would get i=0 f(Si+1) f(Si) i=0 f(S, ei+1). Since the sequence E contains 1/ε copies of each element in OPT, then P k ε 1 i=0 f(S, ei+1) = 1/ε P o OP T f(S, o). Since P k ε 1 i=0 f(Si+1) f(Si) = f(S) f( ) and that f is nonnegative, i=0 f(S, ei+1) 1/ε X o OP T f(S, o) f(OPT) f(S) Thus we have f(S) 1 1 + εf(OPT) (1 ε)f(OPT). D.1.2 PROOF OF THEOREM 4 Before we present the proof of the theorem, first we present the proof of the following lemma. Let us denote the solution set after the i-th element in threshold-fairness-bi as Si. By Lemma 2, we know that we can construct a sequence E = (e1, e2, .., eκ/ε) that contains 1/ε copies of OPT and that Si {ei+1} M1/ε. Then we have the following lemma. Published as a conference paper at ICLR 2025 Lemma 6. For any 0 i < κ/ε, it follows that f(Si, si+1) (1 ε) f(Si, ei+1) εd/κ. Proof. First, we consider the case if si+1 is added to the solution set and is not a dummy variable, it follows that f(Si, si+1) τ. Since Si {ei+1} M1/ε, then if ei+1 / Si, by submodularity we have f(Si, ei+1) τ/(1 ε). If ei+1 Si, then f(Si, ei+1) = 0 τ/(1 ε). Next, we consider the case if si+1 is a dummy variable, then f(Si, si+1) = 0. If ei+1 Si, then f(Si, ei+1) = 0 and the above inequality in the lemma holds. If ei+1 / Si, since Sκ1 {ei+1} M1/ε, then f(Si, ei+1) εd/κ. Therefore, we have that f(Si, si+1) (1 ε) f(Si, ei+1) εd/κ. Next, we present the proof of Theorem 4. Theorem 4. Suppose that threshold-fairness-bi is run for an instance of FSM with ε (0, 1). Then threshold-fairness-bi outputs a solution S that satisfies a (1 2ε, 1 ε)- bicriteria approximation guarantee in at most O (n/ε log(κ/ε)) queries of f. Proof. First, notice that the Algorithm 6 ends in at most (1/ε) log(κ/ε) number of iterations. Therefore, there are at most n/ε log(κ/ε) number of queries to f. Next, we prove the bicriteria approximation ratio of threshold-fairness-bi. From the description of Algorithm 6, we have that the output solution set S M1/ε, then |S Uc| uc/ε c [N] max c [N]{|S Uc|, lc/ε} κ/ε. It remains to prove that f(S) (1 2ε)f(OPT) where OPT is defined as the optimal solution of FSM, i.e., OPT = arg max S Mfair(P,κ, l, u) f(S). For simplicity, we assume the returned solution has size |S| = k1. As discussed in the proof of Theorem 3, Rank(M1/ε) = κ/ε. Here we denote the solution set as S = (s1, s2, ..., sκ/ε), and we define Si as Si = (s1, ..., si). Here si is the i-th element added to the solution set. In the case when the threshold τ drops below εd/κ at the termination and κ1 κ/ε, we can add dummy elements to S such that |S| = κ/ε. By Lemma 6, we have that there is a sequence E = (e1, e2, ..., ek/ε) that contains 1/ε copies of OPT and f(Si, si+1) (1 ε) f(Si, ei+1) εd/κ. Thus by submodularity, we have f(Si+1) f(Si) (1 ε){f(Si {ei+1}) f(Si)} εd/κ (1 ε) f(S, ei+1) εd/κ. Summing over all i, we would get i=0 f(Si+1) f(Si) (1 ε) i=0 f(S, ei+1) d. Since the sequence E contains 1/ε copies of each element in OPT, then P k ε 1 i=0 f(S, ei+1) = 1/ε P o OP T f(S, o). Since Pκ/ε 1 i=0 f(Si+1) f(Si) = f(S) f( ) and that f is nonnegative, i=0 f(S, ei+1) d o OP T f(S, o) d ε{(1 ε){f(OPT) f(S)} εf(OPT)} Published as a conference paper at ICLR 2025 Algorithm 5 greedy-fairness-bi 1: Input: ε, fairness matroid Mfair(P, κ, l, u) 2: Output: S U 3: S 4: Denote Mfair(P, κ/ε, l/ε, u/ε) as M1/ε. 5: while x s.t. S {x} M1/ε do 6: V {x U|S {x} M1/ε} 7: u arg maxx V f(S, x) 8: S S {u} return S Algorithm 6 threshold-fairness-bi 1: Input: ε, fairness matroid Mfair(P, κ, l, u) 2: Output: S U 3: S 4: Denote Mfair(P, κ/ε, l/ε, u/ε) as M1/ε 5: d max{x} M1/ε f({x}) 6: for τ = d; τ εd/k; τ τ(1 ε) do 7: for x U do 8: if S {x} M1/ε and f(S, x) τ then 9: S S {x} 10: if |S| = κ/ε then 11: return S return S ε{(1 2ε)f(OPT) (1 ε)f(S)}. By re-arranging the above equation, we have that f(S) (1 2ε)f(OPT). D.2 APPENDIX FOR SECTION 3.2 In this section, we provide the omitted content from Section 3.2 of the main paper. Specifically, we present the proof of Lemma 7, which offers the theoretical guarantee for the subroutine algorithm decreasing-threshold-procedure of the continuous algorithm cont-bi. The statement of the Lemma is as follows. Lemma 7. During each call of decreasing-threshold-procedure, the output coordinate set B satisfies that F(x + ε1B) F(x) ε{ln(1/ε) + 1}((1 6ε)f(OPT) F(x + ε1B)). Proof. For notation simplicity, we denote the rank of the matroid Mln( 1 ε )+1 as m, i.e., m := (ln( 1 ε) + 1)κ. Here we denote the output solution set as B = {b1, b2, ..., bm} where bi is the ith element that is added to set B. Here if |B| < m, then for any i > |B|, bi is defined as a dummy variable. In addition, we define Bi = {b1, b2, ..., bi}. Since Mln( 1 ε )+1 is an ln( 1 ε) + 1-extension of the original fairness matroid, then by Lemma 2, there exists a sequence E = (e1, e2, ..., em) such that E contains ln(1/ε) + 1 copies of the optimal solution OPT = {o1, o2, ..., oκ} such that Bi 1 {ei} Mln(1/ε)+1 for each i [m]. Notice that by Lemma 8, we have that with probability at least 1 ε3 2n4 , for any fixed x + 1B and fixed element u, the empirical mean ˆX(Bi, u), which is the average over 3κ ε3 samples of the Published as a conference paper at ICLR 2025 random variable X = f(S(x + ε1Bi), u) satisfies that | ˆX(Bi, u) E f(S(x + ε1Bi), u)| ε κf(OPT) + εE f(S(x + ε1Bi), u). (1) Since during the execution of cont-bi, there are at most n ε2 log(κ/ε) such estimations, by applying the union bound, we have that with probability at least 1 1 2n2 , the inequality (1) holds for each x, B and u during cont-bi. From the description in Algorithm 3, we can see that ˆX(Bi 1, bi) w. For the element ei, we have that ˆX(Bi 1, ei) w 1 ε or at the last iteration, we have that ˆX(Bi 1, ei) εd κ . Therefore, we have that ˆX(Bi 1, bi) (1 ε) ˆX(Bi 1, ei) εd κ . Since OPT = max S Mfair(P,κ, pκ, qκ) f(S) d, it then follows that (1 + ε)E f(S(x + ε1Bi 1), bi) (1 ε)2E f(S(x + ε1Bi 1), ei) 3ε By rearranging the above inequality and simple calculations, we have E f(S(x + ε1Bi 1), bi) (1 3ε)E f(S(x + ε1Bi 1), ei) 3ε By the construction of set B, we would get F(x + ε1B) F(x) = i=1 F(x + ε1Bi) F(x + ε1Bi 1) i=1 E f(S(x + ε1Bi 1), bi) i=1 (1 3ε)E f(S(x + ε1Bi 1), ei) 3εf(OPT) i=1 E f(S(x + ε1Bi 1), ei) ε) + 1)f(OPT) i=1 E f(S(x + ε1B), ei) ε) + 1)f(OPT), where the last inequality results from the submodularity of f. From Lemma 2, we have that F(x + ε1B) F(x) ε(1 3ε)(ln(1 i=1 E f(S(x + ε1B), oi) ε) + 1)f(OPT) ε(1 3ε)(ln(1 ε) + 1){f(OPT) F(x + ε1B)} ε) + 1)f(OPT) ε{ln(1/ε) + 1}((1 6ε)f(OPT) F(x + ε1B)). Published as a conference paper at ICLR 2025 D.2.1 PROOF OF THEOREM 5 Theorem 5. Suppose that Algorithm 2 is run for an instance of FSM, then with probability at least 1 1 n2 , cont-thresh-greedy-bi outputs a solution S that satisfies a (1 7ε, ln( 1 ε) + 1)- bicriteria approximation guarantee in at most O nκ ε4 ln2(n/ε) queries of f. Proof. First of all, from the description of the subroutine algorithm DTP in Algorithm 3, we can see that there are at most log(κ/ε)/ε number of iterations in the outer for loop. Therefore, the subroutine algorithm DTP takes at most O( nκ ln(n/ε) ln(κ/ε) ε3 ). Since there are at most 1 ε calls to DTP, we can prove the sample complexity. Next, we prove the bicriteria approximation ratio. By Definition 5, it is equivalent to prove that x P(Mln(1/ε)+1) and F(x) (1 7ε)f(OPT). Denote B(t) to be the output set of the t-th call to the subroutine algorithm DTP. Then it follows that the output solution set x of cont-bi can be denoted as x = P1/ε t=1 ε1B(t). Since B(t) Mln(1/ε)+1, we have that 1B(t) P(Mln(1/ε)+1). By the fact that P(Mln(1/ε)+1) is convex, we have that x P(Mln(1/ε)+1). Denote the fractional solution x after t-th step as xt, then by Lemma 7, we have F(xt+1) F(xt) ε{ln(1/ε) + 1}((1 6ε)f(OPT) F(xt+1)). For notation simplicity, we define L = ε{ln(1/ε) + 1}. It then follows that F(xt+1) F(xt) + L(1 6ε)f(OPT) 1 + L Since there are 1/ε iterations in cont-bi, the output x satisfies that x = x1/ε. By applying induction to the above inequality, we would get F(x1/ε) (1 (1 + L) 1/ε){(1 6ε)f(OPT)} (1 e 1 ε ( L2 2 L)){(1 6ε)f(OPT)} (1 ε)(1 6ε)f(OPT) (1 7ε)f(OPT). Observe that compared to greedy-fair-bi and threshold-fairness-bi, our proposed algorithm cont-bi demonstrates an enhanced approximation ratio for the cardinality of the output solution set, improving from O(1/ε) to O(ln(1/ε)). This improvement is achieved while maintaining the same order of function value violation, specifically f(S) (1 O(ε))f(OPT). However, this enhancement requires an increased number of queries. This suggests the potential for attaining comparable approximation ratios for the submodular cover problem under specific types of matroidtype constraints. Corollary 5.1. Using the Algorithm 3 as the subroutine for the converting algorithm in Algorithm 1, we obtain an algorithm that achieves an approximation ratio of ((1 + α) ln( 1 ε) + 1, 1 O(ε)) in at most O( n(1+α)|OP T | log2( n ε ) log |OP T | ε4α ) with high probability. Proof. The result of the approximation ratio can be obtained by combining Theorem 2 and Theorem 5 together. Here we provide proof of the sample complexity. Notice that for each guesses of OPT of cardinality κg, the algorithm cont-bi that runs with the input Mfair(P, κg, pκg, uκg) uses at most O( nκg log2( n ε ) ε4 ). From the result in Theorem 2 and Theorem 5, the total number of sample complexity would be O( n(1+α)|OP T | log2( n ε ) log |OP T | ε4α ). E PROOF OF TECHNICAL LEMMAS Lemma 8 (Relative + Additive Chernoff Bound (Lemma 2.3 in Badanidiyuru & Vondr ak (2014))). Let X1, ..., XN be independent random variables such that for each i, Xi [0, R] and E[Xi] = µ for all i. Let b XN = 1 N PN i=1 Xi. Then P(| b XN µ| > αµ + ε) 2 exp{ Nαε Published as a conference paper at ICLR 2025 F G R E E D Y-B I ALGORITHM Algorithm 7 greedy-bi 1: Input: ε, τ 2: Output: S U 3: S 4: while f(S) (1 ε)τ do 5: u arg maxx U f(S, x) 6: S S {u} return S G ADDITIONAL EXPERIMENTS G.1 IMPLEMENTATION Maximum Coverage. In maximum coverage problems, the objective is to identify a set of fixed nodes that optimally maximize coverage within a network or graph. Given a graph G = (V, E), where V and E respectively represent the set of vertices and nodes in the graph. Define a function N : V 2V as N(v) = {u, (u, v) E}, which represents the collection of neighbors of node v. Then the objective of this maximization problem can be defined as the monotone submodular function f(S) = | v S N(v)|. The dataset utilized in our maximum coverage experiments include Twitch 2000 and the Twitch 5000 of Twitch Gamers, which is a uniformly sampled subgraph of the Twitch Gamers dataset (Rozemberczki & Sarkar, 2021), comprising 5,000 vertices and 2,000 vertices (users) who speak English, German, French, Spanish, Russian, or Chinese respectively. We aim to develop a solution with a high f value exceeding a given threshold τ while ensuring a fair balance between users who speak different languages. Set Covering For the datasets annotated with tags, the objective of set covering is to extract a diverse subset that maximizes the objective function f(S) = | x S t(x)|, where the function t maps an element x in a set N to its corresponding tags t(x). The dataset employed in our set covering experiments is a subset of Corel5k Duygulu et al. (2002). For each item in the dataset, we randomly added a category from {0, 2, 3, 4, 5} with a probability of 0.5. Any item not assigned a random category was assigned category 1. By ensuring a balanced distribution of solutions across categories, we aim to extract a representative set with a high f value that surpasses a given threshold τ. Experimental setup for max coverage. We implement our proposed algorithm on two different instances, including the Twitch 5000 dataset and Twitch 2000. The algorithms implemented include convert-fair leveraging two subroutines provided in Appendix D: greedy-fair-bi (Algorithm 5, referred to as GREEDY-Fair ) and threshold-fairness-bi (Algorithm 6, referred to as GREEDY-Fair ). On the relatively small dataset Twitch 2000, we also implement convert-continuous using cont-bi as the subroutine ( referred to as CONTIFair ). Due to high query complexity of the continuous algorithm cont-bi, we evaluate convert-continuous heuristically by taking 5 samples per estimation in Line 8 of the decreasing-threshold-procedure. We compare our approach to the greedy baseline, greedy-bi, provided in Appendix F. To ensure a fair comparison based on the quality of the solutions, we use different default values for the parameter ε in each algorithm. This is because each algorithm has a varying approximation ratio. Specifically, we set ε = 0.1, α = 0.2, uc = 1.1/C, lc = 0.9/C for greedy-bi and greedy-fair-bi (where C is the number of groups). For threshold-fairness-bi, we use ε = 0.05 while keeping the other parameters the same. All the experiments are conducted on a single machine equipped with a 13th Gen Intel(R) Core(TM) i7-13700 CPU, 32GB of RAM, and Ubuntu 22.04.3 LTS. Each experiment with one set of parameters can be done in 120 seconds. Published as a conference paper at ICLR 2025 G.2 EXPERIMENTS SETUP FOR SET COVERING. We implement our proposed algorithm convert-fair leveraging two subroutines provided in Appendix D: greedy-fair-bi (Algorithm 5) and threshold-fairness-bi (Algorithm 6). We compare our approach to the greedy baseline, greedy-bi, provided in Appendix F. To ensure a fair comparison based on the quality of the solutions, we use different default values for the parameter ε in each algorithm. This is because each algorithm has a varying approximation ratio. Specifically, we set ε = 0.1, α = 0.2, uc = 1.1/C, lc = 0.9/C for greedy-bi and greedy-fair-bi (where C is the number of groups). For threshold-fairness-bi, we use ε = 0.05 while keeping the other parameters the same. All the experiments are conducted on a single machine equipped with a 13th Gen Intel(R) Core(TM) i7-13700 CPU, 32GB of RAM, and Ubuntu 22.04.3 LTS. Each experiment with one set of parameters can be done in 30 seconds. (a) greedy-bi (b) greedy-fairness-bi (c) threshold-fairness-bi 50 150 250 350 Value of the given threshold GREEDY GREEDY-Fair THRES-Fair 50 150 250 350 Value of the given threshold GREEDY GREEDY-Fair THRES-Fair 50 150 250 350 Value of the given threshold Fairness difference GREEDY GREEDY-Fair THRES-Fair (f) Fairness difference Figure 2: Performance comparison on the Corel dataset for Set Covering. 2a, 2b, 2c illustrate the distribution of images across various categories in the solutions produced by different algorithms with τ = 300. f: the value of the objective submodular function. Cost: the size of the returned solution. Fairness difference: (maxc |S Uc| minc |S Uc|)/|S| G.3 RESULTS G.4 RESULTS ON COREL DATASET Figures 2a, 2b and 2c showcase the distribution of images across various categories in the solutions produced by these algorithms with τ = 300. Figures 2d, 2e and 2f present the performance of these algorithms (f value, cost, and fairness difference) for varying values of τ. As shown in Figure 2a, with τ = 300, over 70% of the pictures in the solution returned by greedy-bi are labeled as category 1 . While the solutions produced by greedy-fair-bi and threshold-fairness-bi exhibit way fairer distributions across various categories as shown in Figures 2a and 2b). Similarly, as the value of given τ increases, the magnitude of this difference also increases (see 2f). Figure 2d showcases that for all these algorithms the objective function value f(S) scales almost linearly with the threshold τ, which aligns with the theoretical guarantees of the approximation ratio. Published as a conference paper at ICLR 2025 400 500 600 Value of the given threshold GREEDY-Fair GREEDY CONTI-Fair THRES-Fair 400 500 600 Value of the given threshold GREEDY-Fair GREEDY CONTI-Fair THRES-Fair Figure 3: Performance comparison on the Twitch 2000 dataset for maximum coverage. CONTIFair corresponds to the convert-fair algorithm using cont-bi as the subroutine. f: the value of the objective submodular function. Cost: the size of the returned solution. Fairness difference: (maxc |S Uc| minc |S Uc|)/|S| Notably, unlike the results presented in Section 4, our proposed algorithms achieve comparable costs to the greedy-bi solution (as shown in Figure 2e) on the Corel5k dataset. This is likely because the Corel5k dataset is less biased and the marginal gains for adding different elements are more uniform, compared to the Twitch Gamer dataset. G.5 RESULTS ON THE TWITCH 2000 DATASET The results of comparing different algorithms on the Twitch 2000 dataset are presented in Figure 3a and 3b. From the results, we can see under fairness constraints, the continuous algorithm CONTIFair achieves a lower cost compared to the discrete algorithms, GREEDY-Fair and THRES-Fair, aligning with the theoretical guarantees presented in the main paper. However, CONTI-Fair incurs a higher cost than the greedy algorithm without fairness constraint, potentially due to the limited sampling (five samples per estimation) of the multilinear extension in Line 8 in Algorithm 3, which falls short of the theoretical requirements.