# Fair Online Influence Maximization

Published in Transactions on Machine Learning Research (06/2025)

Xiangqi Wang¹, Shaokun Zhang², Jose Efraim Aguilar Escamilla³, Qingyun Wu², Xiangliang Zhang¹, Jian Kang⁴, Huazheng Wang³
¹University of Notre Dame ²Pennsylvania State University ³Oregon State University ⁴University of Rochester

Reviewed on OpenReview: https://openreview.net/forum?id=T1NjRBI5xs
Corresponding authors: xwang76@nd.edu, huazheng.wang@oregonstate.edu

Abstract

Fair influence maximization in networks has been actively studied to ensure equity in fields like viral marketing and public health. Existing studies often assume an offline setting, meaning that the learner identifies a set of seed nodes with known per-edge activation probabilities. In this paper, we study the problem of fair online influence maximization, i.e., without knowing the ground-truth activation probabilities. The learner in this problem aims to maximally propagate the information among demographic groups, while interactively selecting seed nodes and observing the activation feedback on the fly. We propose the Fair Online Influence Maximization (FOIM) framework, which can solve the online influence maximization problem under a wide range of fairness notions. Given a fairness notion, FOIM solves the problem with a combinatorial multi-armed bandit algorithm for balancing exploration and exploitation and an offline fair influence maximization oracle for seed node selection. FOIM enjoys sublinear regret when the fairness notion satisfies two mild conditions, i.e., monotonicity and bounded smoothness. Our analyses show that common fairness notions, including maximin fairness, diversity fairness, and the welfare function, all satisfy these conditions, and we prove the corresponding regret upper bounds under each notion. Extensive empirical evaluations on three real-world networks demonstrate the efficacy of our proposed framework.

1 Introduction

Influence maximization (IM) (Kempe et al., 2003) has been extensively studied with applications in viral marketing (Chen et al., 2010) and public health (Valente & Pumpuang, 2007; Yadav et al., 2018). Classic influence maximization aims to select a set of seed nodes in a social network to maximize the spread of influence (i.e., the expected number of influenced nodes). The problem is typically formulated as a stochastic diffusion process (Kempe et al., 2003; Csókás & Vinkó, 2022) in a directed graph, with the edge weights representing the probabilities that source nodes activate target nodes. Recently, concerns regarding algorithmic fairness have motivated a line of studies on fair influence maximization (Farnadi et al., 2020; Tsang et al., 2019; Fish et al., 2019; Becker et al., 2020; Rahmattalabi et al., 2021; Feng et al., 2023). These studies have primarily considered egalitarian utility (the number of influenced nodes across demographic groups) over the classic utilitarian utility (the total number of influenced nodes). With the advent of official regulations such as the GDPR (European Parliament & Council of the European Union), fair algorithms that control discrimination have received renewed interest. For example, consider a health organization aiming to provide information related to preventive care against a pandemic to as many people as possible.
Due to limited resources, the health organization is expected to select a set of influencers (seed nodes) who would propagate the vital information to others in a social network. By following egalitarian utility, an ideal choice of influencers would ensure that a certain portion of at-risk individuals from some minority (or protected) group will see the information. The code can be found at https://github.com/XiangqiWang77/FOIM.

To date, solutions to fair influence maximization almost exclusively assume an offline setting, meaning that the activation probabilities are known a priori (Farnadi et al., 2020; Rahmattalabi et al., 2021; Tsang et al., 2019; Rahmattalabi et al., 2019). However, the activation probabilities may not be fully observable in many real-world applications. Prior works in fair offline influence maximization often obtain this information via simulations over data collected offline without fairness considerations (Rahmattalabi et al., 2021). Consequently, the estimated probabilities are likely to be inaccurate for individuals in a minority group due to bias introduced in the offline data collection (e.g., fewer logged activations for individuals from the protected group). When seed selection is based on such estimates, this discrepancy can result in sub-optimal solutions to fair offline influence maximization. This motivates the study of fair online influence maximization, whose goal is to estimate the activation probabilities by repeatedly interacting with the network so as to maximize the information diffusion among demographic groups.

There are three key challenges in fair online influence maximization that must be addressed to develop efficient algorithms. The first challenge deals with the exploration-exploitation trade-off under a fairness constraint (i.e., across different demographic groups). Second, the combinatorial nature of the influence maximization problem makes finding the optimal seed set NP-hard (Kempe et al., 2003). Thus, it is non-trivial to solve the NP-hard combinatorial problem with a good exploration-exploitation trade-off under the fairness constraint. The third key challenge is theoretical. While bandit-based solutions are natural for online influence maximization without fairness considerations, extending their theoretical analysis to fair online scenarios introduces complications. Rigorous regret analyses for bandit-based solutions typically require the reward function to satisfy specific conditions such as submodularity (Hazan & Kale, 2012; Chen et al., 2018; Perrault et al., 2020), monotonicity, and bounded smoothness (Chen et al., 2014; Wang & Chen, 2017a; Wen et al., 2016). However, when considering an egalitarian reward function, it is important to ensure that the aforementioned conditions hold for the target fairness metric.

In this paper, we tackle both the algorithmic and theoretical challenges and propose a framework named FOIM for Fair Online Influence Maximization. To tackle the algorithmic challenge, FOIM first runs a combinatorial multi-armed bandit (CMAB) algorithm for parameter estimation and for balancing exploration and exploitation on the fly, and then applies an offline fair influence maximization oracle (Tsang et al., 2019; Farnadi et al., 2020) to tackle the NP-hard problem of selecting seed nodes. FOIM allows a wide range of fairness constraints (Tsang et al., 2019; Rahmattalabi et al., 2021), and can incorporate any desired CMAB algorithm.
We provide theoretical guarantees in terms of sublinear regret bounds for the FOIM framework under various group fairness notions. In a nutshell, our theoretical results generalize existing analyses of CMAB-based online influence maximization without fairness considerations. A key insight of our analysis is that common group-fairness reward functions such as maximin fairness (Tsang et al., 2019), diversity fairness (Tsang et al., 2019; Rahmattalabi et al., 2021), and welfare functions (Rahmattalabi et al., 2021) all satisfy the monotonicity and bounded smoothness conditions, which allows us to leverage existing analyses of CMAB algorithms and online influence maximization (Chen et al., 2014; Wen et al., 2016; Wu et al., 2019). Our contributions are summarized as follows.

- We study the fair online influence maximization problem, whose goal is to improve the quality of activation probability estimates and seed node selection on the fly. To the best of our knowledge, this is the first study of the fair online influence maximization problem.
- We propose a generic FOIM framework that uses a CMAB algorithm for balancing exploration and exploitation and an offline fair influence maximization oracle for seed node selection.
- Theoretically, we show that common group fairness notions for influence maximization all satisfy the monotonicity and bounded smoothness conditions, and we prove the corresponding sublinear regret upper bound for each fairness notion.
- We conduct extensive experiments on three real-world datasets, comparing the performance of FOIM with different CMAB algorithms and fairness oracles. Experimental results confirm the effectiveness of FOIM under different fairness notions.

2 Related Work

Fair influence maximization aims to maximize the influence spread while taking fairness into consideration (Farnadi et al., 2020; Tsang et al., 2019; Fish et al., 2019; Becker et al., 2020; Rahmattalabi et al., 2021). Existing works mainly consider two types of fairness notions: group fairness (Tsang et al., 2019; Rahmattalabi et al., 2021; Farnadi et al., 2020) and individual fairness (Fish et al., 2019; Becker et al., 2020). For group fairness, Tsang et al. (2019) propose a multi-objective Frank-Wolfe based algorithm to ensure maximin fairness and diversity fairness. Rahmattalabi et al. (2021) introduce the welfare function, which generalizes maximin fairness and diversity fairness, as a fairness notion. Farnadi et al. (2020) utilize mixed integer programming for fair influence maximization with respect to four different fairness notions (equality, equity, maximin fairness, and diversity fairness). For individual fairness, Fish et al. (2019) use the welfare function to maximize the minimum probability of individuals receiving the information. Becker et al. (2020) achieve individual fairness in influence maximization through randomization. Our work is substantially different from these two lines of research. Compared with Tsang et al. (2019); Rahmattalabi et al. (2021); Farnadi et al. (2020), which study influence maximization in the offline setting, we consider an online setting where the activation probabilities are not given, and we aim to estimate them by interacting with the network. Additionally, Fish et al. (2019) and Becker et al. (2020) study individual fairness, which is a different fairness notion from the focus of this paper, i.e., group fairness.

Online influence maximization assumes that the activation probabilities are not known a priori.
Several works solve online influence maximization with combinatorial multi-armed bandits, including CUCB (Chen et al., 2014), IMLinUCB (Wen et al., 2016), and IMFB (Wu et al., 2019). The key idea is to regard the propagation probability of each edge in the graph as a base arm and to learn the combinatorial set of arms as a super arm in the online setting. More specifically, CUCB (Chen et al., 2014), one of the most fundamental CMAB-based online influence maximization algorithms, is an extension of the upper confidence bound (UCB) algorithm (Auer et al., 2002) to the combinatorial setting that achieves asymptotically optimal regret. IMLinUCB (Wen et al., 2016) applies LinUCB (Li et al., 2010), a classic contextual bandit algorithm, to online influence maximization. IMFB (Wu et al., 2019) is a state-of-the-art online influence maximization algorithm, which exploits network assortativity and factorizes the edge weights into latent factors. It is worth noting that both IMLinUCB and IMFB build on CUCB and can achieve asymptotically optimal regret. However, none of them (Chen et al., 2014; Wu et al., 2019; Wen et al., 2016) take fairness into consideration, whereas our work aims to solve online influence maximization under fairness constraints.

3 Background

Influence maximization. The classical influence maximization (IM) problem considers a directed graph G = (V, E, µ), where V and E represent the sets of nodes and edges in the graph, respectively. The graph is further equipped with edge weights µ = {µ₁, µ₂, ..., µ_{|E|}}, where µ_i ∈ [0, 1] is the weight of edge e_i ∈ E for every i ∈ {1, ..., |E|}. These weights represent the probability that an active node will influence a neighboring node according to some diffusion model; in this paper, we exclusively consider the independent cascade (IC) model (Kempe et al., 2003). Formally, solving an IM problem involves choosing a seed set S ∈ 2^V with cardinality constraint |S| ≤ K that influences the largest number of nodes under the diffusion model:

$$S^* := \arg\max_{S \in 2^V} I_G(S) \quad \text{s.t. } |S| \le K, \tag{1}$$

where S* represents the optimal seed set maximizing the influence spread in graph G, and I_G(S) is the expected number of influenced nodes achieved by seed set S.

Influence maximization is a well-known NP-hard problem (Kempe et al., 2003; Nemhauser et al., 1978). This hardness, mainly due to the combinatorial nature of finding a seed set, makes finding an optimal solution computationally challenging. To circumvent this problem, most proposed learning algorithms follow a "two-step" approach. First, the learning algorithm estimates the edge weights µ using previously collected data. Then, an algorithm (known as an oracle) chooses an approximate seed set Ŝ. The key to attaining a computationally tractable algorithm is to assume that the oracle is an (α, β)-approximation oracle.

Definition 1 ((α, β)-Approximation Oracle (Chen et al., 2014)). An (α, β)-approximation oracle, in the context of influence maximization, is any algorithm that, given estimated edge weights µ̂, produces a seed set Ŝ ∈ 2^V satisfying

$$\Pr\left[ r_\mu(\hat{S}) \ge \alpha \cdot r_\mu(S^*) \right] \ge \beta, \tag{2}$$

where r_µ(S) is the reward function corresponding to the chosen seed set given edge weights µ.

Intuitively, an (α, β)-approximation oracle is allowed to produce seed sets whose reward falls short of the theoretical best by a fraction α, with probability at least β. This reduces the computational burden on the oracle by accepting approximate solutions. For example, for the greedy oracle, α is set to 1 − 1/e with β = 1.
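To make the oracle abstraction concrete, the following is a minimal sketch of a greedy seed-selection oracle; it is not the paper's implementation, and `estimate_spread` (a spread-evaluation routine, e.g., Monte-Carlo simulation of the diffusion) is an assumed helper. Under exact spread evaluation of a monotone submodular spread function, greedy selection attains the (1 − 1/e, 1) guarantee (Nemhauser et al., 1978); with sampled estimates, the guarantee degrades accordingly.

```python
def greedy_oracle(graph, mu, K, estimate_spread):
    """Sketch of a greedy (1 - 1/e, 1)-approximation oracle: repeatedly add
    the node with the largest marginal gain in (estimated) influence spread.

    graph: dict mapping each node to a list of its out-neighbors
    mu: dict mapping each directed edge (u, v) to its activation probability
    estimate_spread: callable (graph, mu, seeds) -> expected spread (assumed)
    """
    seeds = set()
    candidates = set(graph)
    for _ in range(K):
        base = estimate_spread(graph, mu, seeds)
        best_node, best_gain = None, 0.0
        for v in candidates - seeds:
            gain = estimate_spread(graph, mu, seeds | {v}) - base
            if gain > best_gain:
                best_node, best_gain = v, gain
        if best_node is None:  # no positive marginal gain remains
            break
        seeds.add(best_node)
    return seeds
```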
Online IM. Online IM, in contrast to offline influence maximization, does not assume that the per-edge activation probabilities are known beforehand. Specifically, it aims to simultaneously seek the influential nodes (exploitation) and estimate the social network (exploration), and the combinatorial multi-armed bandit (CMAB) framework is a natural choice for balancing this exploration-exploitation trade-off (Wen et al., 2016; Wu et al., 2019; Chen et al., 2014; Vaswani & Lakshmanan, 2015). Given the graph G and cardinality K, an online solution typically follows this procedure. At each step t, the learner first chooses a set of seed nodes S_t ⊆ V with cardinality limit K based on historical information, by running an offline IM algorithm (fair or not) on the current estimate of the graph. We treat such an offline algorithm as an ORACLE and denote the selection by S_t = ORACLE(G, µ̂_t, K), where µ̂_t is the estimate of the per-edge activation probabilities at step t. Then, the learner updates the estimate of the graph using the feedback from the diffusion seeded by S_t. We use r_µ(S_t) to denote the corresponding reward, which may incorporate fairness constraints. The learner's objective is to minimize the expected cumulative regret of the chosen seed sets Ŝ_t over a finite horizon of T rounds of interaction:

$$R(T) = \mathbb{E}\left[ \sum_{t=1}^{T} \left( r_\mu(S^*) - r_\mu(\hat{S}_t) \right) \right]. \tag{3}$$

Note that the reward function r_µ(S_t) is a general notation that can represent not only the influence spread but also other reward functions such as fair IM rewards.

Fairness in IM. Given its applicability to real-world problems, the IM problem highlights fairness as a major concern in algorithm design. Previous literature has introduced various fairness metrics, such as maximizing the welfare of minority groups during the IM process, but incorporating fairness in an online setting remains challenging due to the lack of consensus on effective fairness integration. This paper presents an algorithmic framework that generalizes fairness measures while enabling theoretical regret analysis within the combinatorial multi-armed bandit (CMAB) framework, specifically designed for online applications.

4 The Fair Online Influence Maximization (FOIM) Framework

We begin the presentation of our FOIM framework by extending our formulation of influence maximization (IM) to include fairness. Starting from a set of seed nodes S, we use I_{G,C_i}(S) to denote the expected number of influenced nodes that belong to the (not necessarily disjoint) group (or community) C_i. In one diffusion process, we use u = {u₁, u₂, ..., u_{|V|}} to represent the probabilities of each node being activated, and u_{C_i} = I_{G,C_i}(S)/|C_i| to represent the expected fraction of activated nodes in group C_i. Additionally, given graph G and any feasible seed set S, we use E_S to denote the set of edges in the subgraph of G induced by the seed nodes S, and Ē_S to denote the set of edges reachable from the seed set S in one IC diffusion process. We refer to a diffusion process as a single cycle of influence within network G, where each influenced node is allowed to attempt to influence its neighbor nodes once.
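Since the fairness notions defined in Section 4.1 are all expressed through the group-wise spreads I_{G,C_i}(S), the following is a minimal sketch of how these quantities can be estimated by Monte-Carlo simulation of the IC model; the data layout (adjacency dict, edge-probability dict, group dict) is our own illustrative choice, not the paper's code.

```python
import random

def simulate_ic(graph, mu, seeds, rng=random):
    """One IC diffusion: every newly activated node gets exactly one chance
    to activate each currently inactive out-neighbor, with probability equal
    to the edge weight. Returns the set of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < mu[(u, v)]:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def group_spreads(graph, mu, seeds, groups, n_sim=1000):
    """Monte-Carlo estimates of I_{G,C_i}(S) for each (possibly overlapping)
    group; `groups` maps a group name to its set of member nodes."""
    totals = {c: 0.0 for c in groups}
    for _ in range(n_sim):
        active = simulate_ic(graph, mu, seeds)
        for c, members in groups.items():
            totals[c] += len(active & members)
    return {c: totals[c] / n_sim for c in groups}
```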
Formally, given a graph G, fair influence maximization aims to find a set of nodes S under a cardinality constraint K that maximizes the group-wise influence spread, where I_{G,C_i}(S) denotes the expected number of nodes influenced by seed set S within community C_i:

$$S^* := \arg\max_{|S| \le K} \sum_{C_i \in C} I_{G,C_i}(S). \tag{4}$$

With a slight abuse of notation, we let I_G(k) be the maximum number of influenced nodes achievable under cardinality constraint k:

$$I_G(k) = \sum_{C_i \in C} I_{G,C_i}(S^*) \quad \text{s.t. } |S^*| \le k. \tag{5}$$

4.1 Fairness Constraints

In this section, we demonstrate how our FOIM framework effectively incorporates various fairness constraints, covering most of the fairness notions introduced in prior literature. After presenting the studied fairness metrics, we propose the FOIM algorithm. In addition, we provide a discussion of FOIM under the TPM condition in Appendix A.13.

FOIM with maximin fairness. Maximin fairness (Tsang et al., 2019; Diana et al., 2021) measures the influence received by the worst-off group in the influence maximization process. Given any feasible seed set S, the IM reward function with the maximin fairness constraint is

$$r^{\text{maxmin}}_{\mu}(S) = \min_{C_i \in C} \frac{I_{G,C_i}(S)}{|C_i|}, \tag{6}$$

where I_{G,C_i}(S)/|C_i| is the proportion of the expected number of influenced nodes in group C_i to its population, and µ denotes the edge weights of the graph. Searching for the seed nodes with this reward function can be written as

$$\hat{S} = \arg\max_{S \in \mathcal{S}} \; r^{\text{maxmin}}_{\mu}(S). \tag{7}$$

FOIM with diversity fairness. The key idea of the diversity fairness constraint from Tsang et al. (2019) is that no group should benefit more by leaving the IM process and allocating its proportional share of resources internally. For the i-th group C_i, we define k_i = K|C_i|/|C| as the seed budget fairly allocated to C_i based on its group size, where K is the total seed budget. Let G[C_i] denote the subgraph induced by the nodes in C_i, keeping their original connections in G. To satisfy the diversity fairness constraint, the selected seed set S should satisfy

$$I_{G,C_i}(S) \ge I_{G[C_i]}(k_i), \quad \forall C_i \in C. \tag{8}$$

Given any feasible seed set S and graph G = (V, E, µ), the IM reward function with diversity fairness is then defined as

$$r^{\text{diversity}}_{\mu}(S) = \begin{cases} \sum_{C_i \in C} I_{G,C_i}(S) & \text{if Equation (8) holds} \\ 0 & \text{otherwise,} \end{cases} \tag{9}$$

and searching for the seed nodes for Equation (9) is

$$\hat{S} = \arg\max_{S \in \mathcal{S}} \; r^{\text{diversity}}_{\mu}(S). \tag{10}$$

FOIM with the welfare function. The welfare function (Rahmattalabi et al., 2021) is a reward function that generalizes multiple fairness notions in IM, including maximin fairness and diversity fairness. It encourages uniform influence among different groups, with the trade-off between fairness and efficiency adjusted by a hyperparameter α. Mathematically, the welfare reward function is

$$r^{\text{welfare}}_{\mu}(S) = \begin{cases} \sum_{c \in C} |c| \, (u_c)^{\alpha} / \alpha, & \text{if } \alpha \ne 0 \\ \sum_{c \in C} |c| \log(u_c), & \text{if } \alpha = 0, \end{cases} \tag{11}$$

where u_c denotes the expected fraction of influenced vertices in community c, and u_c ≥ 0 for all c ∈ C. Notably, when α → −∞, Equation (11) is equivalent to the reward function for maximin fairness (Equation (6)).
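As a concrete illustration of Equations (6), (9), and (11), the sketch below computes the three fair rewards from the group-wise spreads I_{G,C_i}(S) (e.g., as estimated by `group_spreads` above); the per-group internal benchmarks I_{G[C_i]}(k_i) for the diversity constraint are assumed to be precomputed.

```python
import math

def maximin_reward(spreads, group_sizes):
    """Equation (6): the smallest proportional influence over all groups."""
    return min(spreads[c] / group_sizes[c] for c in spreads)

def diversity_reward(spreads, internal_benchmarks):
    """Equation (9): total spread if every group meets its internal
    benchmark I_{G[C_i]}(k_i) (Equation (8)), and 0 otherwise."""
    if all(spreads[c] >= internal_benchmarks[c] for c in spreads):
        return sum(spreads.values())
    return 0.0

def welfare_reward(spreads, group_sizes, alpha):
    """Equation (11); u_c is the expected influenced fraction of group c.
    The branches with alpha <= 0 require u_c > 0 for every group."""
    u = {c: spreads[c] / group_sizes[c] for c in spreads}
    if alpha == 0:
        return sum(group_sizes[c] * math.log(u[c]) for c in u)
    return sum(group_sizes[c] * u[c] ** alpha for c in u) / alpha
```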
4.2 FOIM Algorithm

We now introduce the proposed Fair Online Influence Maximization (FOIM) framework, which solves the fair online influence maximization problem with a wide range of fair rewards. Notably, FOIM is a general framework that supports the integration of any combinatorial multi-armed bandit (CMAB) algorithm under certain conditions. In Section 5, we prove that these conditions are naturally satisfied by common fair rewards and demonstrate that FOIM achieves sublinear regret (Chen et al., 2014; Kveton et al., 2014). We present the FOIM framework in Algorithm 1.

Algorithm 1: Fair Online Influence Maximization (FOIM)
1: Input: graph G, time budget T, seed set size K, reward function r_µ(S), CMAB edge-weight estimator A(G, µ̂_t, r_t), fair oracle FairORACLE(G, µ̂_t, K)
2: Initialize t ← 0
3: for t in {1, ..., T} do
4:   µ̂_t ← A(G, µ̂_{t-1}, r_{t-1})  // estimate edge weights
5:   S_t ← FairORACLE(G, µ̂_t, K)  // obtain seed nodes
6:   r_t ← r_µ(S_t)  // obtain reward
7: end for

For instance, if CUCB (Chen et al., 2014) serves as the backbone of A, µ̂ is estimated independently per edge with a UCB (Stone, 2010) algorithm at each time step. If A is IMLinUCB (Wen et al., 2016), µ̂ is estimated with the contextual bandit algorithm LinUCB, with edge features as an additional input. Another input worth mentioning is the fair oracle FairORACLE(G, µ̂_t, K), which identifies the optimal seed nodes based on the current estimated edge weights. In fact, any fair offline IM algorithm with an (α, β)-approximation guarantee can be used, such as the multi-objective Frank-Wolfe algorithm (Tsang et al., 2019) or mixed integer programming (MIP) (Farnadi et al., 2020).

Given the graph G and cardinality K, FOIM runs as follows: at each step t, the CMAB algorithm A estimates the edge weights µ̂_t according to the reward r_{t-1} and the estimate µ̂_{t-1} from the last time step. Then, FairORACLE obtains the seed nodes based on the current estimated edge weights µ̂_t, and the learner observes the reward of the obtained seed nodes. The procedure repeats until the time budget T is reached. In addition, we give a time complexity analysis of the FOIM framework in Appendix A.12.
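The sketch below mirrors Algorithm 1 with a CUCB-style estimator as A; the confidence radius √(3 ln t / (2 T_i)) follows CUCB (Chen et al., 2014), while `fair_oracle` and `play_round` (the environment interaction returning the round reward and the observed per-edge activation outcomes) are assumed interfaces, not the paper's code.

```python
import math

class CUCBEstimator:
    """Per-edge UCB estimator: empirical mean plus an exploration bonus."""
    def __init__(self, edges):
        self.counts = {e: 0 for e in edges}
        self.means = {e: 0.0 for e in edges}
        self.t = 0

    def update(self, observations):
        """observations: iterable of (edge, outcome) with outcome in {0, 1}."""
        self.t += 1
        for e, x in observations:
            self.counts[e] += 1
            self.means[e] += (x - self.means[e]) / self.counts[e]

    def estimate(self):
        mu_hat = {}
        for e, n in self.counts.items():
            if n == 0:
                mu_hat[e] = 1.0  # optimistic default forces exploration
            else:
                radius = math.sqrt(3.0 * math.log(self.t) / (2.0 * n))
                mu_hat[e] = min(1.0, self.means[e] + radius)
        return mu_hat

def foim(graph, K, T, estimator, fair_oracle, play_round):
    """FOIM loop (Algorithm 1): estimate weights, call the fair oracle,
    play the chosen seeds, and feed the edge-level feedback back."""
    rewards = []
    for _ in range(T):
        mu_hat = estimator.estimate()             # line 4: estimate weights
        seeds = fair_oracle(graph, mu_hat, K)     # line 5: seed selection
        reward, observations = play_round(seeds)  # line 6: observe reward
        estimator.update(observations)
        rewards.append(reward)
    return rewards
```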
5 Theoretical Analysis

It has been demonstrated that a CMAB algorithm achieves a sublinear regret bound when the corresponding reward function satisfies both the monotonicity and bounded smoothness conditions (Chen et al., 2014; Wu et al., 2019; Wen et al., 2016). We first state the monotonicity and bounded smoothness conditions for fairness metrics below.

Condition 1. Given graph G = (V, E, µ), we require the following two conditions on the fair online IM reward r_µ(S) for FOIM:

Monotonicity. Given any feasible seed set S, the expected reward r_µ(S) is monotonically non-decreasing with respect to the edge weights of the graph, i.e., if µ_i ≤ µ′_i for all i ∈ {1, ..., |E|}, then r_µ(S) ≤ r_{µ′}(S).

Bounded smoothness. There exists a continuous, strictly increasing (thus invertible) function f(·) with f(0) = 0, called the bounded smoothness function, such that for any two weight vectors µ and µ′ and any Λ > 0, we have |r_µ(S) − r_{µ′}(S)| ≤ f(Λ) whenever max_{i ∈ Ē_S} |µ_i − µ′_i| ≤ Λ.

Note that FOIM is a general framework for fair online influence maximization based on CMAB algorithms. Specifically, we prove that the reward functions with multiple fairness constraints satisfy the monotonicity and bounded smoothness conditions, implying the generality of FOIM as a solution to fair online influence maximization with different fairness constraints.

5.1 Fairness Analysis

Maximin fairness. In Lemma 1, we show that the reward function for fair online influence maximization with maximin fairness satisfies Condition 1, i.e., monotonicity and bounded smoothness.

Lemma 1. Given graph G = (V, E, µ), the IM reward function with the maximin fairness constraint satisfies monotonicity. It also satisfies bounded smoothness with smoothness function f(x) = |E|x.

Proof. We first prove monotonicity and then show that the reward function satisfies bounded smoothness.

P1 Monotonicity. Given two graphs G and G′ with edge weights µ and µ′, respectively, if µ_i ≤ µ′_i for all i ∈ {1, ..., |E|}, it is easy to see that for every C_i ∈ C,

$$I_{G,C_i}(S) \le I_{G',C_i}(S). \tag{12}$$

Let C_{i^*} and C_{i'} be the groups receiving the minimum proportional influence in G and G′, respectively:

$$i^* = \arg\min_{C_j \in C} \frac{I_{G,C_j}(S)}{|C_j|}, \qquad i' = \arg\min_{C_j \in C} \frac{I_{G',C_j}(S)}{|C_j|}. \tag{13}$$

Regardless of whether i* = i′ or i* ≠ i′, we have

$$\frac{I_{G,C_{i^*}}(S)}{|C_{i^*}|} \le \frac{I_{G,C_{i'}}(S)}{|C_{i'}|} \le \frac{I_{G',C_{i'}}(S)}{|C_{i'}|}, \tag{14}$$

and therefore

$$r^{\text{maxmin}}_{\mu}(S) = \frac{I_{G,C_{i^*}}(S)}{|C_{i^*}|} \le \frac{I_{G',C_{i'}}(S)}{|C_{i'}|} = r^{\text{maxmin}}_{\mu'}(S). \tag{15}$$

Thus, the monotonicity property r^maxmin_µ(S) ≤ r^maxmin_{µ′}(S) is proven.

P2 Bounded smoothness under f(x) = |E|x. Without loss of generality, suppose there exist graphs G and G′ with edge weights µ and µ′, respectively, where µ′_i = µ_i + Λ for all i ∈ E, Λ > 0. By the monotonicity of the reward function, r^maxmin_µ(S) ≤ r^maxmin_{µ′}(S) for any feasible seed set S. For any edge i = (v, u) ∈ Ē_S, the increase of the activation probability between v and every node reachable from v in G′, brought by the i-th edge, is at most Λ compared with that in graph G. Let C_min and C′_min be the demographic groups receiving the minimum proportional influence in G and G′, respectively. If C_min = C′_min, the total increase of influenced nodes in C_min brought by any single edge in Ē_S is at most |C_min|Λ, so the corresponding increase of the maximin IM reward in G′ is at most |C_min|Λ/|C_min| = Λ. If C_min ≠ C′_min, the total increase of the IM reward must be smaller than I_{G′,C_min}(S)/|C_min| − I_{G,C_min}(S)/|C_min|, since C′_min attains the minimum proportional influence in G′ while C_min attains it in G. Since I_{G′,C_min}(S)/|C_min| − I_{G,C_min}(S)/|C_min| is bounded by Λ, the reward increase is also bounded by Λ. Let s = |Ē_S|. The total reward increase brought by the edge set Ē_S is at most sΛ ≤ |E|Λ. Since increasing the weights of edges that either cannot be reached from the seed set S or belong to E_S (edges between already activated seed nodes) does not increase the reward, the reward improvement comes solely from Ē_S. We conclude that the IM reward function with maximin fairness satisfies bounded smoothness with f(x) = |E|x.

Diversity fairness. We show in Lemma 2 that the reward function for fair online influence maximization with diversity fairness satisfies Condition 1, i.e., monotonicity and bounded smoothness.

Lemma 2. Given graph G = (V, E, µ), the reward function for fair online influence maximization with the diversity fairness constraint satisfies monotonicity. It also satisfies bounded smoothness with smoothness function f(x) = |E||V|x + |V|.

Proof. We first prove monotonicity and then show that the reward function satisfies bounded smoothness. The proof uses Lemma 3, whose detailed proof is given in Appendix A.9.

Lemma 3. If the diversity constraint is satisfied under µ, it is not violated after lifting µ (i.e., adding Λ to µ).

P1 Monotonicity. Given two graphs G and G′ with edge weights µ and µ′, respectively, if µ_j ≤ µ′_j for all j ∈ {1, ..., |E|}, it is immediate that for each group C_i,

$$I_{G,C_i}(S) \le I_{G',C_i}(S), \qquad I_{G[C_i]}(k_i) \le I_{G'[C_i]}(k_i). \tag{16}$$

(1) Suppose both G and G′ satisfy the diversity constraint given S.
By Equation (16) and Equation (9), we have

$$r^{\text{diversity}}_{\mu}(S) = \sum_{C_i \in C} I_{G,C_i}(S) \le \sum_{C_i \in C} I_{G',C_i}(S) = r^{\text{diversity}}_{\mu'}(S). \tag{17}$$

(2) Suppose at least one of G and G′ violates the diversity constraint given S. By Equation (16), it is impossible that G satisfies the diversity constraint while G′ violates it (see Lemma 3 for more details). Then, if both G and G′ violate the diversity constraint, we have r^diversity_µ(S) = r^diversity_{µ′}(S) = 0; if G violates the diversity constraint while G′ satisfies it, we have r^diversity_µ(S) = 0 and r^diversity_{µ′}(S) > 0. Thus, the reward function is monotone, i.e., r^diversity_µ(S) ≤ r^diversity_{µ′}(S).

P2 Bounded smoothness under f(x) = |E||V|x + |V|. This proof is irregular in that it results in a bounded smoothness function of the form f(x) = kx + B. Its detailed proof, and the rationale behind this form, are provided in Appendix A.10.

Welfare function. With that in mind, we show in Lemma 4 that the reward function for fair online influence maximization with the welfare function as the fairness constraint satisfies monotonicity and bounded smoothness (Condition 1).

Lemma 4. Given graph G = (V, E, µ), the IM reward function with the welfare function as the fairness constraint satisfies monotonicity. It also satisfies bounded smoothness with the smoothness function

$$f(x) = \begin{cases} |E||V| \left[ (1 + x)^\alpha - 1 \right] / \alpha, & \text{if } \alpha \ge 1 \\ |E||V|\, x^\alpha / \alpha, & \text{if } 0 < \alpha \le 1 \\ |E||V| \left[ (\epsilon + x)^\alpha - \epsilon^\alpha \right] / \alpha, & \text{if } \alpha < 0, \end{cases} \tag{18}$$

where ϵ = min({u_c | c ∈ C}). For α = 0, f(x) = |E||V|[log(ϵ + x) − log ϵ], and the analysis is similar to the α < 0 case.

Proof. We first prove monotonicity and then show that the reward function satisfies bounded smoothness.

P1 Monotonicity. As shown in Rahmattalabi et al. (2021), the welfare function is monotone with respect to the expected fraction of influenced nodes in each group. For graphs G and G′ with edge weights µ and µ′, if µ_i ≤ µ′_i for all i ∈ {1, ..., |E|}, then u_c ≤ u′_c for all c ∈ C, implying r^welfare_µ(S) ≤ r^welfare_{µ′}(S). Thus, increasing edge weights preserves the monotonicity of the reward function.

P2 Bounded smoothness under Equation (18). Given graph G and any feasible seed set S, recall that E_S denotes the set of edges in the subgraph induced from G by S, and Ē_S denotes the set of edges reachable from seed set S in the expected IC diffusion process. Suppose there exist graphs G and G′ with edge weights µ and µ′, respectively, where µ′_i = µ_i + Λ for all i ∈ E, Λ > 0. The reward increase in G′ comes solely from the weight improvements on the edge set Ē_S. For any edge i ∈ Ē_S, the total increase of the reward function brought by edge i is at most

$$\sum_{c \in C} |c| (u_c + \Lambda)^\alpha / \alpha - \sum_{c \in C} |c| (u_c)^\alpha / \alpha$$

compared with that in graph G. Due to space limitations, we defer the detailed derivation of the upper bound on this quantity to Appendix A.11. In summary, by Appendix A.11, we obtain the bounded smoothness function

$$f(x) = \begin{cases} |E||V| \left[ (1 + x)^\alpha - 1 \right] / \alpha, & \text{if } \alpha \ge 1 \\ |E||V|\, x^\alpha / \alpha, & \text{if } 0 < \alpha < 1 \\ |E||V| \left[ (\epsilon + x)^\alpha - \epsilon^\alpha \right] / \alpha, & \text{if } \alpha < 0, \end{cases} \tag{19}$$

where ϵ = min({u_c | c ∈ C}).

Fairness notions incompatible with FOIM. While FOIM is compatible with most fairness metrics in the literature, it may not apply universally to all fairness notions for influence maximization, due to the requirements of the theoretical analysis. One example is equity, as defined in Farnadi et al. (2020).
Intuitively, equity asks that the expected number of influenced nodes within a group C_i be proportionate to its population ratio:

$$\frac{\mathbb{E}\left[ I_{G,C_i}(S) \right]}{\mathbb{E}\left[ I_G(S) \right]} = \frac{|C_i|}{|V|}. \tag{20}$$

This fairness notion violates the theoretical requirement of monotonicity. Consider µ′ = µ + Λ, where the weight increase falls on edges within the subgraph formed by group C_i and the increased spread does not propagate to groups other than C_i. Consequently, only the spread in C_i increases, while the spread in the other groups remains the same. Thus, E[I_{G,C_i}(S)] takes a larger share of E[I_G(S)], yet |C_i|/|V| is unchanged, so Equation (20) is violated.

5.2 FOIM Fair Regret Analysis

In this section, we analyze the regret bound of the proposed FOIM framework. We investigate the regret of FOIM with CUCB, whose analysis resembles that of other online IM algorithms such as IMLinUCB (Wen et al., 2016) and IMFB (Wu et al., 2019). In particular, we explore how the regret changes when a fair IM reward is employed. We primarily present problem-dependent regret bounds (Chen et al., 2014; Audibert & Bubeck, 2009; Audibert et al., 2011) under the different fairness notions. We define Δ^i_min and Δ^i_max as the minimum and maximum regret gaps among all bad seed sets that can trigger edge i, respectively, where each regret gap is given by Δ_{i,j} = α·r_µ(S*) − r_µ(S^j_{B,i}) (see Appendix A.1 for the full notation).

Theorem 1 (Maximin fairness regret). By Lemma 1, the regret of CUCB under maximin fairness is at most

$$\sum_{i \in \{1,\dots,|E|\},\, K_i > 0} \frac{24\,|E|^2 \ln T}{\Delta^i_{\min}\, p_i} + \left(\frac{\pi^2}{2} + 1\right) |E|\, \Delta_{\max}. \tag{21}$$

Note that the problem-dependent regret is in O(ln T/Δ). The proof is detailed in Appendix A.4.

Theorem 2 (Diversity fairness regret). By Lemma 2, the regret of CUCB under diversity fairness is at most

$$\sum_{i \in \{1,\dots,|E|\},\, K_i > 0} \left[ \frac{12 \ln T\, |E|^2 |V|^2\, \Delta^i_{\min}}{\left(\Delta^i_{\min} - |V|\right)^2 p_i} + \frac{12 \ln T\, |E|^2 |V|^2}{p_i \left(|V| - \Delta^i_{\max}\right)} \right] + \left(\frac{\pi^2}{2} + 1\right) |E|\, \Delta_{\max}. \tag{22}$$

This regret is also in O(ln T/Δ). For the detailed proof, please refer to Appendix A.3.

Theorem 3 (Welfare function regret). By Lemma 4, we analyze the regret of CUCB under the welfare function for different values of α. Notably, when α = 1, the smoothness function simplifies to f(x) = |E||V|x, which corresponds to the bounded smoothness function of the original IM problem, making the proof straightforward. When 1 ≤ α and α ≠ 2, the regret is at most

$$\left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max} + \sum_{\substack{i \in \{1,\dots,|E|\} \\ K_i > 0}} \max\left\{ \frac{24|V|^2|E|^2 \ln T}{\Delta^i_{\min}\, p_i},\; \frac{12 \ln T\, \Delta^i_{\min}}{\left(\frac{\alpha}{(2^\alpha-1)|E||V|}\right)^{\frac{2}{\alpha}} \left(\Delta^i_{\min}\right)^{\frac{2}{\alpha}} p_i} + \frac{12\alpha \ln T \left[ \left(\Delta^i_{\max}\right)^{1-\frac{2}{\alpha}} - \left(\Delta^i_{\min}\right)^{1-\frac{2}{\alpha}} \right]}{(\alpha-2)\, p_i \left(\frac{\alpha}{(2^\alpha-1)|E||V|}\right)^{\frac{2}{\alpha}}} \right\}. \tag{23}$$

When α = 2, the regret is at most

$$\sum_{i \in \{1,\dots,|E|\},\, K_i > 0} \left[ \frac{18|E||V| \ln T}{p_i} + \frac{36|E||V| \ln T\, \left(\Delta^i_{\max} - \Delta^i_{\min}\right)}{p_i} \right] + \left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max}. \tag{24}$$

When 0 < α < 1, the regret is at most

$$\sum_{\substack{i \in \{1,\dots,|E|\} \\ K_i > 0}} \left[ \frac{12 \ln T\, \Delta^i_{\min}}{\left(\frac{\alpha}{|E||V|}\right)^{\frac{2}{\alpha}} \left(\Delta^i_{\min}\right)^{\frac{2}{\alpha}} p_i} + \frac{12\alpha \ln T \left[ \left(\Delta^i_{\max}\right)^{1-\frac{2}{\alpha}} - \left(\Delta^i_{\min}\right)^{1-\frac{2}{\alpha}} \right]}{(\alpha-2)\, p_i \left(\frac{\alpha}{|E||V|}\right)^{\frac{2}{\alpha}}} \right] + \left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max}.$$

When α < 0, the regret is at most

$$\left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max} + \sum_{\substack{i \in \{1,\dots,|E|\} \\ K_i > 0}} \max\left\{ \frac{24 \ln T}{p_i}\,\Delta_{\max},\; \frac{12 \ln T\, \Delta^i_{\min}}{\left(\frac{\alpha}{[(1+\epsilon)^\alpha - \epsilon^\alpha]|E||V|}\right)^{-\frac{2}{\alpha}} \left(\Delta^i_{\min}\right)^{-\frac{2}{\alpha}} p_i} + \frac{12\alpha \ln T \left(\Delta^i_{\min}\right)^{1+\frac{2}{\alpha}}}{(\alpha+2)\, p_i \left(\frac{\alpha}{[(1+\epsilon)^\alpha - \epsilon^\alpha]|E||V|}\right)^{-\frac{2}{\alpha}}} \right\}. \tag{25}$$

For the detailed proofs, please refer to Appendix A.5.
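To make the O(ln T/Δ) scaling in Theorem 1 concrete, the following back-of-the-envelope instantiation plugs illustrative values (our own, not reported in the paper) into the per-edge term of Equation (21):

```latex
% Hypothetical values: |E| = 100, p_i = 0.5, \Delta^i_{\min} = 0.1, T = 10^4.
\frac{24\,|E|^2 \ln T}{\Delta^i_{\min}\, p_i}
  = \frac{24 \cdot 100^2 \cdot \ln 10^4}{0.1 \cdot 0.5}
  \approx \frac{2.4 \times 10^5 \cdot 9.21}{0.05}
  \approx 4.4 \times 10^{7}.
% Quadrupling the horizon to T = 4 \times 10^4 only raises \ln T from
% roughly 9.21 to 10.6, i.e., the per-edge term grows by about 15%,
% illustrating the logarithmic dependence on T.
```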
6 Empirical Results

We assess FOIM with different CMAB backbones under multiple fairness metrics. We first detail the datasets and bandit variants, then report results confirming the superior performance of FOIM and showing that the algorithm accurately recovers the edge weights that are key to optimal influence maximization.

6.1 Cumulative Regret Comparison

Datasets. We conduct our experiments on five real-world networks: NBA, German, Pokec-z, and Bail, which are commonly used for fair graph learning (Dai & Wang, 2020; Ma et al., 2022), as well as YouTube. Detailed dataset statistics and descriptions are given in Table 1 and Appendix A.7. The datasets are available at https://github.com/yushundong/Graph-Mining-Fairness-Data/tree/main/dataset/.

| Dataset | Nodes | Edges | Sensitive Attribute | Activation Probability |
| --- | --- | --- | --- | --- |
| NBA | 400 | 16,570 | Country | Jaccard index of nodes |
| German | 1,000 | 24,970 | Age | Dice ratio of nodes |
| Pokec-z | 6,185 | 15,321 | Sex | Cosine similarity of nodes |
| Bail | 18,876 | 403,977 | Race | Cosine similarity of nodes |
| YouTube | 1,134,890 | 2,987,624 | Community | Cosine similarity of nodes |

Table 1: Dataset statistics.

CMAB algorithms. We compare four different combinatorial multi-armed bandit (CMAB) algorithms in FOIM: CUCB (Chen et al., 2014), ϵ-greedy (Kempe et al., 2003), IMLinUCB (Wen et al., 2016), and IMFB (Wu et al., 2019).

- CUCB (Chen et al., 2014) estimates the per-edge activation probability independently and uses the upper confidence bound of the estimate to balance exploitation and exploration.
- ϵ-greedy (Kempe et al., 2003) estimates the per-edge activation probability independently and uses ϵ-greedy to balance exploitation (with probability 1 − ϵ) and exploration (with probability ϵ).
- IMLinUCB (Wen et al., 2016) (LinUCB) is a model-independent contextual bandit algorithm that requires features of the receiving nodes as input to estimate the reachability between node pairs.
- IMFB (Wu et al., 2019) is a factorization bandit algorithm that factorizes the activation probability on the edges into latent factors on the corresponding nodes.

Offline oracle. In addition to the CMAB algorithm, the other key component in the proposed FOIM framework is the fair oracle, i.e., the fair offline influence maximization algorithm for seed node selection. In our experiments, we use two different fair oracles: the multi-objective Frank-Wolfe (FW) algorithm (Tsang et al., 2019) (with suffix -FW) and mixed integer programming (MIP) (Farnadi et al., 2020) (with suffix -MIP). To demonstrate the efficacy of using a fair oracle, we use degree discount (Chen et al., 2009) (methods without a suffix), an offline influence maximization oracle without fairness consideration, as the baseline oracle.

Figure 1: Cumulative regrets under maximin fairness and diversity fairness in NBA. Methods with suffix -MIP use mixed integer programming (Farnadi et al., 2020) as the offline oracle with fairness consideration. Methods without a suffix use degree discount (Chen et al., 2009) as the offline oracle.

Reward and regret. For maximin fairness, we set the reward function as in Equation (6); for diversity fairness, as in Equation (9); and for the welfare function, as in Equation (11). The cumulative regrets for all methods are calculated using Equation (3). The optimal reward is calculated by the fair IM oracle using the ground-truth activation probabilities.

Hyperparameter settings. In all experiments, the dimensionalities of the susceptibility factor in IMLinUCB and the latent factors in IMFB are both set to 20. The seed set size is set to K = 300 for the Pokec-z and Bail datasets, K = 10 for NBA, and K = 50 for German.

Activation probability. For all datasets, we select the first 8 features of each node, with the sensitive attribute included. We then initialize the activation probabilities using the per-dataset similarity heuristics listed in Table 1, computed between node features.
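As an illustration of how the ground-truth activation probabilities can be initialized from node features (Table 1), the sketch below implements the Jaccard and cosine heuristics; the feature layout is our own assumption, and the Dice-ratio variant used for German follows the same pattern.

```python
import math

def jaccard(a, b):
    """Jaccard index of two binary feature sets (used for NBA)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def cosine(a, b):
    """Cosine similarity of two feature vectors (Pokec-z, Bail, YouTube)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def init_activation_probs(edges, features, sim):
    """Ground-truth edge activation probabilities from the similarity of the
    endpoint feature vectors (first 8 features, sensitive attribute
    included, per the setup above)."""
    return {(u, v): sim(features[u], features[v]) for (u, v) in edges}
```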
Regret with fairness constraints under the FW oracle. In Figure 3 and Figure 2, we present the experimental results on the effectiveness of FOIM in reducing the cumulative regret for maximin fairness, diversity fairness, and the welfare function (α = 2, α = 0.5, and α = −2) with the multi-objective Frank-Wolfe (FW) algorithm (Tsang et al., 2019) as the oracle. From the figures, we observe that the cumulative regret with fairness constraints achieved under the fair oracle is significantly lower than that of the other CMAB frameworks. This highlights the effectiveness of FOIM in improving the fairness of the algorithms.

Regret with fairness constraints under the MIP oracle (Farnadi et al., 2020). We also test the regret of FOIM with MIP (Farnadi et al., 2020) as the oracle under maximin fairness and diversity fairness. Figure 1 presents the regret with the MIP oracle on the NBA dataset. From the figure, we make a similar observation that FOIM with MIP achieves lower cumulative regret.

6.2 Estimation Error Analysis

We next verify that using a fair oracle does not degrade the accuracy of the estimated activation probabilities. To evaluate the different experiments conducted under fair or submodular oracles, we quantitatively analyze the estimation error, i.e., the absolute discrepancy between the estimated activation probability p̂_{e,t} and the ground-truth probability p_e over observed edges. That is, we track the trend of |p̂_{e,t} − p_e| over time.

Figure 2: Cumulative regrets and their standard deviations under the welfare function (α = 2, −2, 0.5) in Pokec-z and NBA. Methods with suffix -FW use the multi-objective Frank-Wolfe algorithm (Tsang et al., 2019) as the offline oracle with fairness consideration. Methods without a suffix use the degree discount (Chen et al., 2009) oracle.

Figure 3: Cumulative regrets and their standard deviations under maximin fairness and diversity fairness in the NBA and Pokec-z datasets. Methods with suffix -FW use the multi-objective Frank-Wolfe algorithm (Tsang et al., 2019) as the offline oracle with fairness consideration. Methods without a suffix use degree discount (Chen et al., 2009) as the offline oracle.

Upon examination of the results in Figure 4, it becomes apparent that the disparity in estimation error between the fair oracle and the submodular oracle is not significant. This suggests that the fair oracle estimates the network nearly as effectively as a submodular oracle such as degree discount, supporting the validity of our fair experiments and indicating that the fair modification implemented in our study is both reasonable and effective.

6.3 Additional Experiments

Figure 4: Estimation error of activation probabilities in NBA.

In Appendix A.8.2, we provide additional results on the German, Bail, and YouTube datasets with similar observations, demonstrating the lower fair regret achieved by FOIM and its scalability to networks with millions of nodes. In addition, we provide an offline versus online influence comparison in Appendix A.8.6 and the trade-off between fairness metrics and influence in Appendix A.8.7. We also provide a theoretical time complexity analysis of FOIM and real-time running results in Appendix A.12 as additional empirical support.
7 Conclusion and Future Work

To promote equity and inclusivity in the online environment, we propose the FOIM framework, which combines fair offline influence maximization solutions with online combinatorial bandit models. We demonstrate the feasibility of using fairness-constrained rewards and oracles in combinatorial bandit algorithms. Additionally, we provide a thorough analysis of the regret bounds of FOIM, using the CUCB algorithm as an example. Empirical comparisons on three real-world datasets highlight the superior performance in terms of regret under fairness notions, and we validate the efficiency and accuracy of the framework through the estimation error. However, our analysis currently focuses on group fairness notions; we leave the exploration of individual fairness for future work.

Acknowledgements

Jose Efraim Aguilar Escamilla and Huazheng Wang are supported in part by the National Science Foundation under grant IIS-2403401.

References

Jean-Yves Audibert and Sébastien Bubeck. Minimax policies for adversarial and stochastic bandits. In Annual Conference on Computational Learning Theory, 2009.

Jean-Yves Audibert, Sébastien Bubeck, and Gábor Lugosi. Minimax policies for combinatorial prediction games. In Annual Conference on Computational Learning Theory, 2011.

Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256, May 2002. doi: 10.1023/A:1013689704352.

Ruben Becker, Gianlorenzo D'Angelo, Sajjad Ghobadi, and Hugo Gilbert. Fairness in influence maximization through randomization. arXiv:2010.03438, 2020.

Christian Borgs, Mickey Brautbar, Jennifer T. Chayes, and Brendan Lucier. Maximizing social influence in nearly optimal time. In ACM-SIAM Symposium on Discrete Algorithms, 2012.

Lixing Chen, Jie Xu, and Zhuo Lu. Contextual combinatorial multi-armed bandits with volatile arms and submodular reward. Advances in Neural Information Processing Systems, 31, 2018.

Wei Chen, Yajun Wang, and Siyu Yang. Efficient influence maximization in social networks. In Knowledge Discovery and Data Mining, 2009.

Wei Chen, Chi Wang, and Yajun Wang. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2010.

Wei Chen, Yajun Wang, and Yang Yuan. Combinatorial multi-armed bandit and its extension to probabilistically triggered arms. Journal of Machine Learning Research, 17:50:1–50:33, 2014.

Eszter Julianna Csókás and Tamás Vinkó. An exact method for influence maximization based on deterministic linear threshold model. Central European Journal of Operations Research, 31:269–286, 2022.

Enyan Dai and Suhang Wang. Say no to the discrimination: Learning fair graph neural networks with limited sensitive attribute information. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, 2020.

Emily Diana, Wesley Gill, Michael Kearns, Krishnaram Kenthapadi, and Aaron Roth. Minimax group fairness: Algorithms and experiments. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, 2021.
European Parliament and Council of the European Union. Regulation (EU) 2016/679 of the European Parliament and of the Council. URL https://data.europa.eu/eli/reg/2016/679/oj.

G. Farnadi, Behrouz Babaki, and Michel Gendreau. A unifying framework for fairness-aware influence maximization. In Companion Proceedings of the Web Conference 2020, 2020.

Yuting Feng, Ankitkumar N. Patel, Bogdan Cautis, and Hossein Vahabi. Influence maximization with fairness at scale. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023.

Benjamin Fish, Ashkan Bashardoust, Danah Boyd, Sorelle A. Friedler, Carlos Eduardo Scheidegger, and Suresh Venkatasubramanian. Gaps in information access in social networks? In The World Wide Web Conference, 2019.

Elad Hazan and Satyen Kale. Online submodular minimization. Journal of Machine Learning Research, 13(10), 2012.

David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD, pp. 137–146. ACM, 2003.

Branislav Kveton, Zheng Wen, Azin Ashkan, and Csaba Szepesvári. Tight regret bounds for stochastic combinatorial semi-bandits. arXiv:1410.0949, 2014.

Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, pp. 661–670, 2010.

Jing Ma, Ruocheng Guo, Mengting Wan, Longqi Yang, Aidong Zhang, and Jundong Li. Learning fair node representations with graph counterfactual fairness. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, 2022.

George L. Nemhauser, Laurence A. Wolsey, and Marshall L. Fisher. An analysis of approximations for maximizing submodular set functions I. Mathematical Programming, 14:265–294, 1978.

Pierre Perrault, Jennifer Healey, Zheng Wen, and Michal Valko. Budgeted online influence maximization. In International Conference on Machine Learning, pp. 7620–7631. PMLR, 2020.

Aida Rahmattalabi, Phebe Vayanos, Anthony Fulginiti, Eric Rice, Bryan Wilder, Amulya Yadav, and Milind Tambe. Exploring algorithmic fairness in robust graph covering problems. Advances in Neural Information Processing Systems, 32, 2019.

Aida Rahmattalabi, Shahin Jabbari, Himabindu Lakkaraju, Phebe Vayanos, Max Izenberg, Ryan Brown, Eric Rice, and Milind Tambe. Fair influence maximization: A welfare optimization approach. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 11630–11638, 2021.

Peter Stone. Reinforcement learning. In Encyclopedia of Machine Learning and Data Mining, 2010.

Alan Tsang, Bryan Wilder, Eric Rice, Milind Tambe, and Yair Zick. Group-fairness in influence maximization. In International Joint Conference on Artificial Intelligence, 2019.

Thomas W. Valente and Patchareeya Pumpuang. Identifying opinion leaders to promote behavior change. Health Education & Behavior, 34:881–896, 2007.

Sharan Vaswani and Laks V. S. Lakshmanan. Influence maximization with bandits. arXiv:1503.00024, 2015.

Qinshi Wang and Wei Chen. Improving regret bounds for combinatorial semi-bandits with probabilistically triggered arms and its applications. In NIPS, 2017a.

Yao-Yuan Wang and Yuxin Chen. Improving online influence maximization with regret minimization. In Proceedings of the 34th International Conference on Machine Learning (ICML), pp. 3598–3607, 2017b.
Zheng Wen, Branislav Kveton, Michal Valko, and Sharan Vaswani. Online influence maximization under independent cascade model with semi-bandit feedback. In NIPS, 2016.

Qingyun Wu, Zhige Li, Huazheng Wang, Wei Chen, and Hongning Wang. Factorization bandits for online influence maximization. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019.

Amulya Yadav, Bryan Wilder, Eric Rice, Robin Petering, Jaih B. Craddock, Amanda Yoshioka-Maxwell, Mary Hemler, Laura Onasch-Vera, Milind Tambe, and Darlene Woo. Bridging the gap between theory and practice in influence maximization: Raising awareness about HIV among homeless youth. In International Joint Conference on Artificial Intelligence, 2018.

A.1 Preliminary

In this part, we present the mathematical notation employed in our theoretical analysis. The problem description and its associated notation are detailed in Table 2.

| Symbol | Description |
| --- | --- |
| G | Directed graph |
| V | Node set |
| E | Edge set |
| C_i | Nodes of group i in V |
| µ | Edge weights |
| u | Node activation probabilities |
| S | Seed set |
| K | Maximum size of the seed set |
| I_G(S) | Expected number of nodes influenced by S |
| I_{G,C_i}(S) | Expected number of nodes in group C_i influenced by seed set S |
| I_G(k) | Maximum expected number of influenced nodes under cardinality k |
| (α, β)-approximation | Pr[r_µ(S) ≥ α · r_µ(S*)] ≥ β |
| E_S | Set of edges in the subgraph induced from G by seed nodes S |
| r_µ(S) | Optimization objective with edge weights µ and seed set S |

Table 2: Summary of notation used in the problem description.

Notation used in the regret analysis. We begin by introducing the notation used below. With approximation guarantee (α, β), we call a seed set S a bad set if r_µ(S) < α·r_µ(S*), where r_µ(S*) denotes the global optimal reward. The set of bad seed sets is S_B := {S | r_µ(S) < α·r_µ(S*)}. In CMAB, each edge in the graph represents an arm of the bandit. We let S_{B,i} = {S ∈ S_B | i ∈ Ē_S} denote the set of bad seed sets in which the i-th edge can be reached. We order all bad seed sets in S_{B,i} as S¹_{B,i}, S²_{B,i}, ..., S^{K_i}_{B,i} in increasing order of their expected rewards, where K_i = |S_{B,i}|. For any bad seed set S ∈ S_B, with approximation guarantee (α, β), we define the approximate regret Δ_S := α·r_µ(S*) − r_µ(S). For the i-th edge, i ∈ {1, ..., |E|} with K_i > 0, and reward order index j ∈ {1, ..., K_i}, we define Δ_{i,j} := Δ_{S^j_{B,i}}. For convenience, we let Δ^i_max := Δ_{i,1} and Δ^i_min := Δ_{i,K_i}. Moreover, we define Δ_max := max_{i ∈ {1,...,|E|}, K_i > 0} Δ^i_max and Δ_min := min_{i ∈ {1,...,|E|}, K_i > 0} Δ^i_min.

A.2 Quoted Theorems

Our theoretical analysis relies on the following theorems from CUCB (Chen et al., 2014).

Theorem 4 (Theorem 1 of CUCB (Chen et al., 2014)). The (α, β)-approximation regret of the CUCB algorithm in T rounds using an (α, β)-approximation oracle is at most

$$\sum_{i \in \{1,\dots,|E|\},\, K_i > 0} \left( \ell_T\!\left(\Delta^i_{\min}, p_i\right) \Delta^i_{\min} + \int_{\Delta^i_{\min}}^{\Delta^i_{\max}} \ell_T(x, p_i)\, dx \right) + \left(\frac{\pi^2}{2} + 1\right) |E|\, \Delta_{\max}, \tag{26}$$

where p* = min_{i ∈ {1,...,|E|}} p_i and

$$\ell_T(\Delta, p) = \max\left\{ \frac{12 \ln T}{\left(f^{-1}(\Delta)\right)^2 p},\; \frac{24 \ln T}{\Delta\, p} \right\}, \quad 0 < p < 1. \tag{27}$$

Theorem 5 (Theorem 2 of CUCB (Chen et al., 2014)). Consider a CMAB problem with an (α, β)-approximation oracle, and let p* = min_{i ∈ {1,...,|E|}} p_i. If the bounded smoothness function is f(x) = γ·x^ω for some γ > 0 and ω > 0, the regret of CUCB is at most

$$\frac{2\gamma}{2-\omega} \left( \frac{12 |E| \ln T}{p^*} \right)^{\omega/2} T^{\,1-\omega/2} + \left(\frac{\pi^2}{2}+1\right)|E|\,\Delta_{\max} + \sum_{i \in \{1,\dots,|E|\}} \mu_i\, \Delta_{\max}. \tag{28}$$
A.3 Regret of Diversity Fairness

Here we provide a detailed proof of the regret bound for diversity fairness in Equation (22). For diversity fairness, the bounded smoothness function is f(x) = |E||V|x + |V|, whose inverse is

$$f^{-1}(\Delta) = \frac{\Delta - |V|}{|E||V|}.$$

Then, to determine ℓ_T(Δ, p), our objective is to compare 1/(f^{-1}(Δ))² with 2/Δ, following ℓ_T(Δ, p) = max{12 ln T/((f^{-1}(Δ))² p), 24 ln T/(Δ p)} from Theorem 4. For this comparison, it is important to note that Δ ≥ |V| here. Consequently, |E|²|V|²/(Δ − |V|)² is evidently greater than 2/Δ, and we conclude that

$$\ell_T(\Delta, p) = \frac{12 \ln T\, |E|^2 |V|^2}{(\Delta - |V|)^2\, p}.$$

Finally, we derive the regret bound. By employing Theorem 4, we have

$$\left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max} + \sum_{\substack{i \in \{1,\dots,|E|\} \\ K_i > 0}} \left[ \frac{12 \ln T\, |E|^2 |V|^2\, \Delta^i_{\min}}{\left(\Delta^i_{\min} - |V|\right)^2 p_i} + \frac{12 \ln T\, |E|^2 |V|^2}{p_i} \left( \frac{1}{|V| - \Delta^i_{\max}} - \frac{1}{|V| - \Delta^i_{\min}} \right) \right]. \tag{29}$$

Since 1/(|V| − Δ^i_min) is negative, we can remove it and obtain

$$\sum_{i \in \{1,\dots,|E|\},\, K_i > 0} \left[ \frac{12 \ln T\, |E|^2 |V|^2\, \Delta^i_{\min}}{\left(\Delta^i_{\min} - |V|\right)^2 p_i} + \frac{12 \ln T\, |E|^2 |V|^2}{p_i \left(|V| - \Delta^i_{\max}\right)} \right] + \left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max}. \tag{30}$$

A.4 Regret of Maximin Fairness

In this section, we provide a detailed proof of the regret bound for maximin fairness in Equation (21). For maximin fairness, we choose the bounded smoothness function f(x) = |E|x, so f^{-1}(Δ) = Δ/|E|. To determine ℓ_T(Δ, p), we need to compare 1/(f^{-1}(Δ))² = |E|²/Δ² with 2/Δ. Because the objective function of maximin fairness is a proportion, we naturally have 0 ≤ Δ ≤ 1. Consequently, |E|²/Δ² ≥ 2/Δ, and we get ℓ_T(Δ, p) = 12|E|² ln T/(Δ² p).

Finally, we show the regret bound. By employing Theorem 4, the regret upper bound is

$$\sum_{i \in \{1,\dots,|E|\},\, K_i > 0} \frac{24\, |E|^2 \ln T}{\Delta^i_{\min}\, p_i} + \left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max}. \tag{31}$$

A.5 Regret of the Welfare Function

Here we provide the detailed proofs of the regret bounds for the welfare function in Section 5.1.

A.5.1 When α ≥ 1

Based on Equation (18), the bounded smoothness function we use is f(x) = |E||V|x. To justify this form, we analyze f(Λ, α) using the Lagrange mean value theorem. Writing f(Λ, α) in difference form, f(Λ, α) = Σ_{c∈C} |c| Λ (ω_c)^{α−1} for some u_c ≤ ω_c ≤ u_c + Λ, where ω_c is the intermediate point given by the mean value theorem, we can bound it by Λ and obtain the upper bound f(Λ, α) ≤ Σ_{c∈C} |c| Λ. This implies that f(x) = |E||V|x is a valid bounded smoothness function when α > 1.

Additionally, we aim to provide an upper bound for f(x) of the form f(x) = γx^ω. For f(x) = |E||V|[(1 + x)^α − 1]/α, such an upper bound is f(x) = |E||V|(2^α − 1)x^α. This result can be obtained by comparing the gradients of the two functions and matching their endpoints.

In summary, we have three forms of the bounded smoothness function: f(x) = |E||V|x, f(x) = |E||V|[(1 + x)^α − 1]/α, and f(x) = |E||V|(2^α − 1)x^α.

To determine ℓ_T(Δ, p), we use f(x) = |E||V|[(1 + x)^α − 1]/α and proceed case by case. Recall that ℓ_T(Δ, p) = max{12 ln T/((f^{-1}(Δ))² p), 24 ln T/(Δ p)}. Our main goal is to compare 1/(f^{-1}(Δ))² with 2/Δ, where

$$f^{-1}(\Delta) = \left( \frac{\alpha \Delta}{|E||V|} + 1 \right)^{\frac{1}{\alpha}} - 1.$$

Based on the above expression and definitions, if (αΔ/(|E||V|) + 1)^{1/α} > 1 + √(Δ/2), we have ℓ_T(Δ, p) = 24 ln T/(Δ p); otherwise, if (αΔ/(|E||V|) + 1)^{1/α} ≤ 1 + √(Δ/2), we have ℓ_T(Δ, p) = 12 ln T/((f^{-1}(Δ))² p). We argue that ℓ_T(Δ, p) = 24 ln T/(Δ p) is mathematically impossible here.
If ℓ_T(Δ, p) = 24 ln T/(Δ p) happened to hold, we could reformulate the requirement (αΔ/(|E||V|) + 1)^{1/α} ≥ 1 + √(Δ/2) as (1 + √(Δ/2))^α < 1 + αΔ/(|E||V|). Given the magnitudes of |E| and |V| assumed for G and the disparity between exponential and linear growth, this inequality is almost impossible to satisfy. In conjunction with Δ_max = |V|/α as the extreme case, solving αΔ/(|E||V|) + 1 ≥ (1 + √(Δ/2))^α gives α ≤ log_{1+1/√2} 2. Thus we have ℓ_T(Δ, p) = 12 ln T/((f^{-1}(Δ))² p) if α ≤ log_{1+1/√2} 2.

Finally, we provide the problem-dependent regret bound, using ℓ_T(Δ, p) to derive the final result. In Theorem 4, for brevity, we use f(x) = γx^ω to upper bound the regret. When α ≥ log_{1+1/√2} 2, we apply Theorem 4 with f(x) = |E||V|(2^α − 1)x^α; the resulting regret bound is applicable in the case where log_{1+1/√2} 2 ≤ α and α ≠ 2:

$$\sum_{\substack{i \in \{1,\dots,|E|\} \\ K_i > 0}} \left[ \frac{12 \ln T\, \Delta^i_{\min}}{\left(\frac{\alpha}{(2^\alpha - 1)|E||V|}\right)^{\frac{2}{\alpha}} \left(\Delta^i_{\min}\right)^{\frac{2}{\alpha}} p_i} + \frac{12\alpha \ln T \left[ \left(\Delta^i_{\max}\right)^{1-\frac{2}{\alpha}} - \left(\Delta^i_{\min}\right)^{1-\frac{2}{\alpha}} \right]}{(\alpha - 2)\, p_i \left(\frac{\alpha}{(2^\alpha - 1)|E||V|}\right)^{\frac{2}{\alpha}}} \right] + \left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max}.$$

If α = 2, we simply apply Theorem 4 with f(x) = 3|E||V|x²:

$$\sum_{i \in \{1,\dots,|E|\},\, K_i > 0} \left[ \frac{18|E||V| \ln T}{p_i} + \frac{36|E||V| \ln T}{p_i} \left(\Delta^i_{\max} - \Delta^i_{\min}\right) \right] + \left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max}. \tag{32}$$

In summary, taking all cases into consideration, and noting that for α ≥ 1 we always have f(x) = |E||V|x available, in the case 1 ≤ α ≤ log_{1+1/√2} 2 the regret upper bound matches the original CUCB bound. We have the following regret bounds.

If 1 ≤ α ≤ log_{1+1/√2} 2, the regret is at most

$$\sum_{\substack{i \in \{1,\dots,|E|\} \\ K_i > 0}} \frac{24\, |V|^2 |E|^2 \ln T}{\Delta^i_{\min}\, p_i} + \left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max}. \tag{33}$$

If log_{1+1/√2} 2 ≤ α and α ≠ 2, the regret is at most

$$\sum_{\substack{i \in \{1,\dots,|E|\} \\ K_i > 0}} \left[ \frac{12 \ln T\, \Delta^i_{\min}}{\left(\frac{\alpha}{(2^\alpha - 1)|E||V|}\right)^{\frac{2}{\alpha}} \left(\Delta^i_{\min}\right)^{\frac{2}{\alpha}} p_i} + \frac{12\alpha \ln T \left[ \left(\Delta^i_{\max}\right)^{1-\frac{2}{\alpha}} - \left(\Delta^i_{\min}\right)^{1-\frac{2}{\alpha}} \right]}{(\alpha - 2)\, p_i \left(\frac{\alpha}{(2^\alpha - 1)|E||V|}\right)^{\frac{2}{\alpha}}} \right] + \left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max}. \tag{34}$$

If α = 2, the regret is at most

$$\sum_{\substack{i \in \{1,\dots,|E|\} \\ K_i > 0}} \left[ \frac{18|E||V| \ln T}{p_i} + \frac{36|E||V| \ln T \left(\Delta^i_{\max} - \Delta^i_{\min}\right)}{p_i} \right] + \left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max}. \tag{35}$$

A.5.2 When 0 < α ≤ 1

In this case, we use the bounded smoothness function f(x) = |E||V|x^α/α. To analyze ℓ_T(Δ, p), our main goal is to compare 1/(f^{-1}(Δ))² with 2/Δ, which entails comparing (αΔ/(|E||V|))^{2/α} with Δ/2. Since the upper bound of Δ is |V|/α, if α ≥ 2 log|E|/log 2, then ℓ_T(Δ, p) = 24 ln T/(Δ p). However, α ≥ 2 log|E|/log 2 clearly cannot be attained, since α ∈ (0, 1]. Thus, for 0 < α ≤ 1, we have ℓ_T(Δ, p) = 12 ln T/((f^{-1}(Δ))² p). Finally, the problem-dependent regret bound is

$$\sum_{\substack{i \in \{1,\dots,|E|\} \\ K_i > 0}} \left[ \frac{12 \ln T\, \Delta^i_{\min}}{\left(\frac{\alpha}{|E||V|}\right)^{\frac{2}{\alpha}} \left(\Delta^i_{\min}\right)^{\frac{2}{\alpha}} p_i} + \frac{12\alpha \ln T \left[ \left(\Delta^i_{\max}\right)^{1-\frac{2}{\alpha}} - \left(\Delta^i_{\min}\right)^{1-\frac{2}{\alpha}} \right]}{(\alpha - 2)\, p_i \left(\frac{\alpha}{|E||V|}\right)^{\frac{2}{\alpha}}} \right] + \left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max}. \tag{36}$$

A.5.3 When α < 0

In this case, we have f(x) = |E||V|[(1 + ϵ)^α − ϵ^α]x^{−α}/α. An upper bound of this form can be obtained by comparing gradients and ensuring that the start and end points coincide. Eventually, we have two forms of f(x), namely f(x) = |E||V|[(1 + ϵ)^α − ϵ^α]x^{−α}/α and f(x) = |E||V|[(ϵ + x)^α − ϵ^α]/α.

We then analyze ℓ_T(Δ, p). Given that f^{-1}(Δ) = (αΔ/(|E||V|) + ϵ^α)^{1/α} − ϵ for f(x) = |E||V|[(ϵ + x)^α − ϵ^α]/α, the objective is to compare 1/(f^{-1}(Δ))² with 2/Δ. Solving the inequality, we find that

$$\frac{\Delta}{2} > \left[ \left( \frac{\alpha \Delta}{|E||V|} + \epsilon^\alpha \right)^{\frac{1}{\alpha}} - \epsilon \right]^2$$

results in ℓ_T(Δ, p) = 12 ln T/((f^{-1}(Δ))² p). In the extreme scenario where Δ_max = |V|/α, the requirement for ℓ_T(Δ, p) = 12 ln T/((f^{-1}(Δ))² p) is √(Δ/2) ≥ (αΔ/(|E||V|) + ϵ^α)^{1/α} − ϵ, while the condition √(Δ/2) ≤ (αΔ/(|E||V|) + ϵ^α)^{1/α} − ϵ results in ℓ_T(Δ, p) = 24 ln T/(Δ p). Similar to the previous cases, we let θ be the solution to the following equation.
A.5.2 When 0 < α ≤ 1

In this case, we use the bounded smoothness function f(x) = |E||V|x^α/α. To analyze ℓ_T(Δ, p), our main goal is again to compare 1/(f^{-1}(Δ))² with 2/Δ, which entails comparing (αΔ/(|E||V|))^{2/α} with Δ/2. Since the upper bound of Δ is |V|^α/α, the branch ℓ_T(Δ, p) = 24 ln T/(Δ p) would require α ≥ 2 log|E|/log 2. However, α ≥ 2 log|E|/log 2 clearly cannot be attained since α ∈ (0, 1]. Thus, for 0 < α ≤ 1, we have ℓ_T(Δ, p) = 12 ln T/((f^{-1}(Δ))² p). Finally, the problem-dependent regret bound is

$$\sum_{\substack{i\in\{1,\dots,|E|\}\\K_i>0}}\left[\frac{12\ln T\,\Delta_i^{\min}}{p_i}\left(\frac{|E||V|}{\alpha\,\Delta_i^{\min}}\right)^{2/\alpha} + \frac{12\alpha\ln T\left[(\Delta_i^{\max})^{1-\frac{2}{\alpha}}-(\Delta_i^{\min})^{1-\frac{2}{\alpha}}\right]}{(\alpha-2)\,p_i}\left(\frac{|E||V|}{\alpha}\right)^{2/\alpha}\right] + \left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max} \tag{36}$$

A.5.3 When α < 0

In this case, we have f(x) = |E||V|[(1+ϵ)^α − ϵ^α]x^{−α}/α. An upper bound of this form can be obtained by matching gradients and ensuring that the starting points coincide. Eventually we have two candidate functions, namely f(x) = |E||V|[(1+ϵ)^α − ϵ^α]x^{−α}/α and f(x) = |E||V|[(ϵ+x)^α − ϵ^α]/α.

We then analyze ℓ_T(Δ, p). Given that f^{-1}(Δ) = (αΔ/(|E||V|) + ϵ^α)^{1/α} − ϵ for f(x) = |E||V|[(ϵ+x)^α − ϵ^α]/α, the objective is to compare 1/(f^{-1}(Δ))² with 2/Δ. Solving the inequality, Δ/2 > [(αΔ/(|E||V|) + ϵ^α)^{1/α} − ϵ]² results in ℓ_T(Δ, p) = 12 ln T/((f^{-1}(Δ))² p). In the extreme scenario where Δ_max = |V|^α/α, the requirement for ℓ_T(Δ, p) = 12 ln T/((f^{-1}(Δ))² p) is √2/2 ≥ (αΔ/(|E||V|) + ϵ^α)^{1/α} − ϵ, whereas the reverse condition √2/2 ≤ (αΔ/(|E||V|) + ϵ^α)^{1/α} − ϵ results in ℓ_T(Δ, p) = 24 ln T/(Δ p). Similar to the previous cases, we let θ be the solution to the following equation:

$$\left(\frac{\sqrt{2}}{2}+\epsilon\right)^{\theta} - \epsilon^{\theta} = \frac{\theta}{|E||V|} \tag{37}$$

Hence, when α > θ, the expression for ℓ_T(Δ, p) is 12 ln T/((f^{-1}(Δ))² p); otherwise, when α < θ, it becomes ℓ_T(Δ, p) = 24 ln T/(Δ p).

Finally, we provide the problem-dependent regret bound. Substituting the upper bounded smoothness function f(x) = |E||V|[(1+ϵ)^α − ϵ^α]x^{−α}/α into Theorem 4 with f(x) = γx^ω, we obtain the following problem-dependent regret bounds. If α ≥ θ, the regret is

$$\sum_{\substack{i\in\{1,\dots,|E|\}\\K_i>0}}\left[\frac{12\ln T\,\Delta_i^{\min}}{p_i}\left(\frac{\alpha\,\Delta_i^{\min}}{[(1+\epsilon)^\alpha-\epsilon^\alpha]\,|E||V|}\right)^{2/\alpha} + \frac{12\alpha\ln T\,(\Delta_i^{\min})^{1+\frac{2}{\alpha}}}{(\alpha+2)\,p_i}\left(\frac{\alpha}{[(1+\epsilon)^\alpha-\epsilon^\alpha]\,|E||V|}\right)^{2/\alpha}\right] + \left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max} \tag{38}$$

If α ≤ θ, the regret is

$$\left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max} + \sum_{i\in\{1,\dots,|E|\}}\frac{24\ln T\,\Delta_i^{\max}}{\Delta_i^{\min}\,p_i} \tag{39}$$

A.6 Problem-Independent Regret

In this section, we outline the procedural steps for obtaining the problem-independent regret bounds for maximin fairness, diversity fairness, and the welfare function.

A.6.1 Problem-independent regret bound for maximin fairness

Using the function f(x) = |E|x and applying Theorem 5, we derive the problem-independent regret

$$\sqrt{\frac{48|E|^3\,T\ln T}{p^*}} + \left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max} \tag{40}$$
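As a worked instance of the substitution into Theorem 5, taking γ = |E| and ω = 1 in the leading term of Equation (28) recovers the maximin bound (40); the bounds (41)–(45) below follow from the same substitution with the respective γ and ω:

$$\frac{2\gamma}{2-\omega}\left(\frac{12|E|\ln T}{p^*}\right)^{\omega/2} T^{1-\omega/2}\;\Bigg|_{\gamma=|E|,\ \omega=1} \;=\; 2|E|\sqrt{\frac{12|E|\ln T}{p^*}}\cdot\sqrt{T} \;=\;\sqrt{\frac{48|E|^3\,T\ln T}{p^*}}.$$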
A.6.2 Problem-independent regret bound for diversity fairness

Previous works establish bounded smoothness and Theorem 5 under the assumption f(0) = 0, so we must justify that bounded smoothness with f(0) ≠ 0 can still be used to give a problem-independent regret guarantee. We therefore modify the proof steps of Theorem 5 in CUCB to obtain a modified problem-independent regret upper bound. The result of this modification is an additional 2cT term, where f(x) = c + γx^ω. Thus, for f(x) = |V| + |E||V|x, we have the following regret bound:

$$\sqrt{\frac{48|E|^3|V|^2\,T\ln T}{p^*}} + \left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max} + 2|V|T \tag{41}$$

A.6.3 Problem-independent regret bound for welfare function

Here we discuss the problem-independent regret with respect to the value of α and ℓ_T(Δ, p).

When α ≥ 1: The value of ℓ_T(Δ, p) is given in Appendix A.5.1, and we use f(x) in the form f(x) = γx^ω, namely f(x) = (2^α − 1)|E||V|x^α/α. By Theorem 5, if 1 ≤ α ≤ log_{1+1/√2} 2 or α = 2, the problem-independent regret is

$$\sqrt{\frac{48|E|^3|V|^2\,T\ln T}{p^*}} + \left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max} \tag{42}$$

If log_{1+1/√2} 2 ≤ α and α ≠ 2, the problem-independent regret is

$$\frac{2}{2-\alpha}\cdot\frac{(2^\alpha-1)|E||V|}{\alpha}\left(\frac{12|E|\ln T}{p^*}\right)^{\alpha/2} T^{1-\alpha/2} + \left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max} \tag{43}$$

When 0 < α ≤ 1: The value of ℓ_T(Δ, p) is given in Appendix A.5.2, and we can simply use f(x) = |E||V|x^α/α for the problem-independent bound derived from Theorem 5. Thus the problem-independent regret is

$$\frac{2}{2-\alpha}\cdot\frac{|E||V|}{\alpha}\left(\frac{12|E|\ln T}{p^*}\right)^{\alpha/2} T^{1-\alpha/2} + \left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max} \tag{44}$$

When α < 0: Here f(x) = |E||V|[(1+ϵ)^α − ϵ^α]x^{−α}/α. By applying Theorem 5, we have the following regret upper bound under the condition α ≥ θ:

$$\frac{2}{2+\alpha}\cdot\frac{|E||V|\left[(1+\epsilon)^\alpha-\epsilon^\alpha\right]}{\alpha}\left(\frac{12|E|\ln T}{p^*}\right)^{-\alpha/2} T^{1+\alpha/2} + \left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max} \tag{45}$$

In addition, we point out that for α ≤ θ we have ℓ_T(Δ, p) = 24 ln T/(Δ p). With this form of ℓ_T(Δ, p), Theorem 5 applies without the (2γ/(2−ω))(12|E| ln T/p*)^{ω/2} T^{1−ω/2} part, resulting in an upper regret bound identical to the problem-dependent case, i.e.,

$$\left(\frac{\pi^2}{2}+1\right)|E|\Delta_{\max} + \sum_{i\in\{1,\dots,|E|\}}\frac{24\ln T\,\Delta_i^{\max}}{\Delta_i^{\min}\,p_i} \tag{46}$$

A.7 Dataset Descriptions

In our experiments, we use five widely used benchmark datasets for fair graph learning: NBA, German, Pokec-z, Bail, and YouTube. These datasets are standard in fair graph learning; for example, the NBA dataset was used in FairGNN (Dai & Wang, 2020) with nationality as the sensitive attribute, and the Bail dataset was used in GEAR (Ma et al., 2022) with race as the sensitive attribute. We use the same sensitive attributes as in the original references; attributes such as race are sensitive in the real scenarios where fair online influence maximization applies. The detailed dataset descriptions are as follows.

NBA is a social network of NBA players on Twitter. Two players (nodes) are connected if one follows the other on Twitter. We use the nationality of each player as the sensitive attribute.

German is a similarity graph of clients of a German bank. Nodes are clients, and two nodes are connected if their credit accounts are similar. We use the age of each client as the sensitive attribute.

Pokec-z is a sub-network of the Slovakian social network Pokec restricted to one province. Each node is a user in the selected province, and edges represent friendship among users. We use the biological sex of each user as the sensitive attribute.

YouTube is a social network of user friendships on the video-sharing platform YouTube. Each node is a YouTube user, and two nodes are connected if the users are friends. We use the community membership of each user as the sensitive attribute.

A.8 Additional Experiments

Figure 5: Cumulative regrets using submodular oracles in NBA. Methods with suffix -degree use degree discount (Chen et al., 2009) as the offline oracle, while methods with suffix -greedy use greedy selection as the offline oracle.

In this section, we provide additional experimental evidence supporting the Experiments section.

A.8.1 Comparison between different submodular oracles

We compare the performance of FOIM with two submodular oracles: degree discount and the greedy strategy. From Figure 5, we conclude that the two submodular oracles yield very similar regret. Thus, in the experiments in the main body (i.e., Figure 3 and Figure 2), we only use degree discount.

A.8.2 Experiments on German

We provide additional experimental results on the German dataset, as shown in Figure 6. From the figure, we observe that IMLinUCB performs poorly on German, with much larger regret. This might be due to a few reasons. First, for a dense network like German, the seed size we select is relatively large. Second, the sensitive attribute, i.e., age, takes more than two values (non-binary), making the objective function insensitive to the context, which is critical for IMLinUCB. Apart from IMLinUCB, we can still observe the superiority of the fair oracle under maximin regret on German.

A.8.3 Experiments on Bail dataset

We present additional experimental results on the Bail dataset, as shown in Figure 7. We employed MIP as the fair oracle and compared it with the baselines; this figure serves as additional evidence of the superiority of the fair oracle when the reward is fair.
A.8.4 Scalability demonstration

To demonstrate our algorithm's scalability to large graphs spanning millions of nodes, we tested FOIM on the YouTube dataset with MIP as the fair oracle and compared it with the baselines. The results, shown in Figure 8, serve as additional evidence of the performance of the FOIM framework. Due to the size of the dataset and computational resource constraints, we ran this experiment only once, so no standard deviation is reported.

Figure 6: Cumulative regrets under maximin fairness, diversity fairness, and welfare function (α = 2, −2, 0.5) on German. Methods with suffix -FW use the multi-objective Frank-Wolfe algorithm (Tsang et al., 2019) as the offline oracle with fairness consideration. Methods without a suffix use degree discount (Chen et al., 2009) as the offline oracle. The shaded area is the standard deviation of the corresponding method across runs.

Figure 7: Cumulative regrets under maximin fairness, diversity fairness, and welfare function (α = 2, −2, 0.5) on Bail. Methods with suffix -MIP use the Mixed Integer Programming algorithm (Farnadi et al., 2020) as the offline oracle with fairness consideration. Methods without a suffix use degree discount (Chen et al., 2009) as the offline oracle. The shaded area is the standard deviation of the corresponding method across runs.

Figure 8: Cumulative regrets under maximin fairness, diversity fairness, and welfare function (α = 2, −2, 0.5) on the YouTube dataset. Methods with suffix -MIP use the Mixed Integer Programming algorithm (Farnadi et al., 2020) as the offline oracle with fairness consideration. Methods without a suffix use degree discount (Chen et al., 2009) as the offline oracle.

A.8.5 Regret between FOIM and offline fair oracle

We evaluate the regret of a Frank-Wolfe offline oracle on the NBA dataset using the diversity fairness metric, under perturbed activation probabilities. Specifically, Gaussian noise sampled from N(0, 0.05²) is added to the base activation probabilities to simulate realistic imperfections in offline estimates. The dotted line in Figure 9 exhibits generally higher cumulative regret. This setup highlights the sensitivity of offline methods to noise in activation estimates. In contrast, the adaptive online methods dynamically adjust to the true graph structure and exhibit lower expected regret under the same noise conditions.

A.8.6 Comparison between offline fair algorithms and online fair algorithms

Figure 9: Diversity regret on NBA. Dashed line: FWOffline with N(0, 0.05²).

To support the claim in the Introduction that "Consequently, it is likely that the estimated probabilities are inconsistent with respect to individuals in the minority group due to bias introduced in the offline data selection (e.g., less logged activations for individuals from the protected group)", we performed a new experiment on the Pokec-z dataset with sex as the sensitive attribute, selecting the group labeled Sex = 1. We set the seed size to 300, observed the number of influenced nodes in the last round of online learning (100 iterations), and compared it to the output of the offline oracle (the MIP algorithm). We evaluate the results with the estimation error metric, i.e., the absolute gap between the estimated propagation probability and the ground-truth probability.
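For clarity, the estimation-error metric can be sketched as follows (a minimal illustration of ours; the variable names and the noise model are assumptions, not the released code):

```python
import numpy as np

def estimation_error(mu_hat: np.ndarray, mu_true: np.ndarray) -> float:
    """Mean absolute gap between estimated and ground-truth edge probabilities."""
    return float(np.mean(np.abs(mu_hat - mu_true)))

rng = np.random.default_rng(0)
mu_true = rng.uniform(0.0, 0.2, size=1_000)                        # ground truth
mu_hat = np.clip(mu_true + rng.normal(0, 0.05, size=1_000), 0, 1)  # noisy estimates
print(estimation_error(mu_hat, mu_true))
```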
Note that the estimation error of the offline algorithm is driven mainly by the offline data selection, which is essentially a data sampling process and largely independent of the particular fair oracle being run (Frank-Wolfe, MIP, and so on). For MIP, for example, the offline data selection is performed via Monte Carlo sampling. To ensure the reliability of our evaluation, Table 3 reports means with standard deviations over 5 parallel executions.

Table 3: Comparison between online fair algorithms and the offline fair algorithm.

                   MIP (offline)   CUCB            IMFB            IMLinUCB
Estimation error   0.219 ± 0.047   0.177 ± 0.055   0.199 ± 0.022   0.151 ± 0.050
Influenced nodes   1488 ± 14.2     1120 ± 10.7     1507 ± 18.8     1525 ± 15.0

The experiment indicates that the final seed selection of fair online learning can outperform the offline fair algorithm, supporting the statement "It may further result in sub-optimal seed selection for fair offline influence maximization that relies on the estimated information."

A.8.7 Influence spread and fairness trade-off

In addition to the online/offline comparison, Table 4 reports the influence spread of the fair algorithms and the baselines, measured by the number of activated nodes at the last round (T = 200) on the Pokec-z dataset, which contains 6,185 nodes in total. Compared to their non-fair counterparts that only maximize influence, the fair algorithms instantiated by FOIM achieve similar or smaller influence spread under the different metrics. Specifically, the influence spread is almost unaffected by diversity fairness. For maximin fairness and the welfare function, the spread is roughly 300 nodes smaller than the non-fair baselines IMLinUCB and IMFB, two efficient algorithms that leverage/estimate features of nodes and edges. The reduced influence spread under fairness constraints is expected and is the price of the trade-off between fairness and influence.

Table 4: Influence spread and fairness trade-off on the Pokec-z dataset.

Algorithm           Influence   Maximin fairness   Diversity fairness   Welfare (α = 0.5)   Welfare (α = 2)
CUCB                1289        1298               1243                 1221                1251
epsilon-greedy      1246        1234               1238                 1237                1217
IMFB                1481        1437               1483                 1460                1471
IMLinUCB            1387        1334               1419                 1338                1409
CUCB-FW             1038        1324               1220                 1163                1132
epsilon-greedy-FW   974         1372               1259                 1056                1190
IMFB-FW             1110        1431               1177                 1192                1192
IMLinUCB-FW         1074        1383               1188                 1074                1116

A.9 Additional Proof of Diversity Fairness

In this section, we prove that if the diversity constraint is satisfied in G with utility vector µ, then the constraint is not violated in G′, where µ′_i = µ_i + Λ.

Proof. Suppose the diversity constraint is satisfied with µ; we show that the constraint is not violated after lifting µ (adding Λ to µ). If we consider a single edge j and set µ′_j = µ_j + Λ, then showing that the increase of I_{G,C_i}(S) is at least the increase of I_{G[C_i]}(k_i) completes the proof, since the general lifting of µ is just a combination of such single-edge increases of the propagation weights. We therefore give the formal proof that setting µ′_j = µ_j + Λ increases I_{G,C_i}(S) by at least as much as I_{G[C_i]}(k_i).

Case 1. If edge j lies outside G[C_i], the claim holds immediately. Observe that I_{G[C_i]}(k_i) is the maximal expectation that the fair allocation k_i can achieve on the induced subgraph G[C_i], which is independent of any edge outside the subgraph.
Since I_{G,C_i}(S) ≥ I_{G[C_i]}(k_i) holds before the lifting (adding Λ), and raising an edge j outside G[C_i] can only increase I_{G,C_i}(S), the claim follows in this case.

Case 2. If edge j lies within G[C_i], the proof goes as follows. Consider lifting edge j from µ_j to µ′_j, so the propagation weight increases by Λ, and denote the end node of edge j by y. The increases of I_{G,C_i}(S) and I_{G[C_i]}(k_i) are then determined by the additional spread starting from y: the increase of I_{G[C_i]}(k_i) is at most Λ·I_{G′[C_i]}(y), and the increase of I_{G,C_i}(S) is at most Λ·I_{G′,C_i}(y). However, the increase does not always attain these maximal values. Since the spread from y does not occur on an empty graph, some nodes in C_i may already be activated; for instance, Λ·I_{G′[C_i]}(y) assigns non-zero influence to the neighbors of y, yet those neighbors may already have activation probability 1.0 because they are seed nodes.

2.1 We first discuss the case in which the increases equal Λ·I_{G′[C_i]}(y) and Λ·I_{G′,C_i}(y), and compare their magnitudes. Since G′[C_i] is a subgraph of G′, any increase that I_{G′[C_i]}(y) realizes within C_i is also realized by I_{G′,C_i}(y); moreover, I_{G′,C_i}(y) can additionally spread to groups other than C_i and propagate back into C_i. Hence I_{G′,C_i}(y) is at least I_{G′[C_i]}(y) on C_i, and the inequality can be strict.

2.2 We next analyze the case in which the increase starting from y is smaller than Λ·I_{G′[C_i]}(y) and Λ·I_{G′,C_i}(y). This happens because the spread from y does not occur on an empty graph: each node in C_i already has an expected activation probability due to I_{G[C_i]}(k_i) or I_{G,C_i}(S). For example, if the increase were Λ·I_{G′[C_i]}(y), it would allocate an increase to each node in C_i; yet for each node in C_i, the allocated increase plus its original expected activation probability cannot exceed 1. This constraint explains why, for some nodes, the realized increase must be smaller than the allocation of the spread from y.

For each node o in group C_i, we use the following notation: P_o is the allocation of I_{G′[C_i]}(y) on node o, R_o is the allocation of I_{G[C_i]}(k_i) on node o, M_o is the allocation of I_{G′,C_i}(y), and B_o is the allocation of I_{G,C_i}(S). These represent the activation probability of each node under the corresponding diffusion process; for example, I_{G′[C_i]}(y) can be expressed as Σ_{o∈C_i} P_o, and R_o = 1 or B_o = 1 indicates that o is a seed node. The realized increase at node o is

$$\mathrm{increase}(o) = \begin{cases} \Lambda P_o, & o\in C_i,\ \ 1-\Lambda P_o - R_o \ge 0,\\[2pt] 1-R_o, & o\in C_i,\ \ 1-\Lambda P_o - R_o < 0. \end{cases} \tag{47}$$

Due to the constraint in Equation (47), we can divide the nodes of C_i into two groups for I_{G[C_i]}(k_i) and I_{G,C_i}(S), respectively. We denote the node set Ω = {o ∈ C_i : ΛP_o ≤ 1 − R_o} and the node set Θ = {o ∈ C_i : ΛM_o ≤ 1 − B_o}. For a node o in Ω or Θ, the increase is ΛP_o or ΛM_o, respectively; for a node o in C_i∖Ω or C_i∖Θ, the increase is 1 − R_o or 1 − B_o.

We now compare the increases of spread in C_i under this constraint. The total increase under I_{G[C_i]}(k_i) is Σ_{o∈Ω} ΛP_o + Σ_{o∈C_i∖Ω} (1 − R_o), and, following the same logic, the total increase under I_{G,C_i}(S) is Σ_{o∈Θ} ΛM_o + Σ_{o∈C_i∖Θ} (1 − B_o). Our target is to compare these two constrained increases. From Case 2.1 and our precondition, we already have the relationships between I_{G′,C_i}(y) and I_{G′[C_i]}(y), and between I_{G[C_i]}(k_i) and I_{G,C_i}(S); these conditions can be expressed as Σ_{o∈C_i} P_o ≤ Σ_{o∈C_i} M_o and Σ_{o∈C_i} R_o ≤ Σ_{o∈C_i} B_o.
Combining these two conditions with Equation (47), we compare the two totals. Writing the increase under I_{G[C_i]}(k_i) as Σ_{o∈C_i} ΛP_o − Σ_{o∈C_i∖Ω}(ΛP_o + R_o − 1), and the increase under I_{G,C_i}(S) as Σ_{o∈C_i} ΛM_o − Σ_{o∈C_i∖Θ}(ΛM_o + B_o − 1), the gap between the latter and the former is

$$\sum_{o\in C_i}(\Lambda M_o - \Lambda P_o) + \sum_{o\in C_i\setminus\Omega}(\Lambda P_o + R_o - 1) - \sum_{o\in C_i\setminus\Theta}(\Lambda M_o + B_o - 1).$$

Since ΛM_o + B_o − 1 is positive exactly on C_i∖Θ, the sum Σ_{o∈C_i∖Θ}(ΛM_o + B_o − 1) is maximal among such sums over subsets of C_i; in particular, Σ_{o∈C_i∖Ω}(ΛM_o + B_o − 1) ≤ Σ_{o∈C_i∖Θ}(ΛM_o + B_o − 1). Thus the gap is at least

$$\sum_{o\in C_i}(\Lambda M_o - \Lambda P_o) - \sum_{o\in C_i\setminus\Omega}(\Lambda M_o + B_o - \Lambda P_o - R_o).$$

Taking the conditions and constraints into consideration, this final quantity is non-negative, which completes the proof.

A.10 Bounded Smoothness for Diversity Fairness

Proof. We consider two cases.

(1) Given feasible seed nodes S, both G and G′ satisfy the diversity constraint. In this case, the problem is equivalent to influence maximization without the fairness constraint, with the reward function being the influence spread. Thus, it satisfies bounded smoothness with function f(x) = |E||V|x (Chen et al., 2014).

(2) Given feasible seed nodes S, at least one of G and G′ violates the diversity constraint. By Lemma 3, it is impossible that G satisfies the diversity constraint while G′ violates it. If both G and G′ violate the diversity constraint, we have r^{diversity}_µ(S) = r^{diversity}_{µ′}(S) = 0. If G violates the diversity constraint while G′ satisfies it, the largest reward increase is achieved by replacing µ_i with µ′_i in G for all i ∈ {1, ..., |E_S|}, which makes r^{diversity}_{µ′} > 0. Given that the diversity constraint is already satisfied in the graph, the increase of the reward brought by each edge i in E_S is at most |V|Λ. If we regard the reward increase brought by E_S as a sequential process, i.e., the reward increase is contributed by each edge in E_S in turn, then the total increase of the reward brought by E_S in G must be smaller than (s − 1)|V|Λ + h, where h denotes the reward increase brought by the first edge that makes r^{diversity}_µ nonzero. Since h ≤ |V|, we have |r^{diversity}_µ(S) − r^{diversity}_{µ′}(S)| ≤ (s − 1)|V|Λ + |V| ≤ |E||V|Λ + |V|.

Combining (1) and (2) completes the proof of bounded smoothness with smoothness function f(x) = |E||V|x + |V|.

A.11 Detailed Steps of Bounded Smoothness of Welfare Function

In this section, we present a comprehensive analysis of the upper bound of the bound function |c|[(u_c+Λ)^α − (u_c)^α]/α arising in the welfare function, as a function of α. Given α and Λ, define g(u_c) = |c|[(u_c+Λ)^α − (u_c)^α]/α. The maximum increase of the welfare reward function brought by any one edge then equals Σ_{c∈C} g(u_c). Taking the derivative of g with respect to u_c gives

$$\frac{\partial g}{\partial u_c} = |c|\left[(u_c+\Lambda)^{\alpha-1} - (u_c)^{\alpha-1}\right].$$

(1) If α ≥ 1: since 0 < u_c ≤ 1, we have ∂g/∂u_c ≥ 0, so g(u_c) is increasing in u_c. In this case, the increase of the welfare reward function brought by any edge is upper bounded by Σ_{c∈C} |c|(1+Λ)^α/α − Σ_{c∈C} |c|/α = Σ_{c∈C} |c|[(1+Λ)^α − 1]/α.

(2) If 0 < α < 1: we have ∂g/∂u_c < 0, so g(u_c) is strictly decreasing in u_c. Consequently, the increase of the welfare reward function brought by any edge is upper bounded by Σ_{c∈C} |c|Λ^α/α.

(3) If α < 0: let ϵ = min({u_c | c ∈ C}). Since u_c > 0 for all c ∈ C, we have ϵ > 0. Since ∂g/∂u_c < 0, the increase achieves its largest value when u_c = ϵ for all c ∈ C. Therefore, the upper bound equals the welfare reward increase when u_c = ϵ for all c ∈ C, i.e., Σ_{c∈C} |c|(ϵ+Λ)^α/α − Σ_{c∈C} |c|ϵ^α/α.
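The three monotonicity claims above can be spot-checked numerically; the following sketch (ours, not part of the paper's code) evaluates g(u) on a grid, dropping the constant |c| factor:

```python
import numpy as np

Lam = 0.1
u = np.linspace(0.01, 1.0, 500)
for alpha in (2.0, 0.5, -1.0):
    g = ((u + Lam) ** alpha - u ** alpha) / alpha
    trend = "increasing" if np.all(np.diff(g) >= 0) else "decreasing"
    print(alpha, trend)
# alpha >= 1        -> increasing, maximized at u = 1;
# 0 < alpha < 1 and alpha < 0 -> decreasing, maximized as u -> 0 (resp. u = eps).
```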
A.12 Time Complexity Analysis

In this part, we compare FOIM algorithms with the original online IM algorithms. Notably, FOIM also operates within the CMAB framework, akin to the foundational online algorithms. Taking the CUCB algorithm as an example, its time complexity is O(T(|E| log|E| + ORACLE_time)) (Chen et al., 2014). Consequently, the bottleneck of our analysis hinges on the computational cost of the ORACLE, denoted ORACLE_time. Intuitively, the time complexity tends to be higher for fair oracles than for unfair ones, due to the trade-off between efficiency and fairness. The time complexity of the Frank-Wolfe fair oracle is O((|E|+m)² k² b ϵ^{-1} log²(|V|/δ)) (Tsang et al., 2019), where m is the number of demographic groups, k is the number of seed nodes, b = max_{i,j}(I_{G,C_i}(j)/|C_i|) is the maximum normalized influence on a demographic group, ϵ is an error residual, and δ is the probability threshold, with ϵ and δ appearing in the (α, β)-approximation (Tsang et al., 2019). In contrast, the time complexity of unfair oracles such as general greedy and degree discount is Ω(m|E|k · POLY(ϵ^{-1})) (Borgs et al., 2012; Kempe et al., 2003).

In addition, Table 5 reports the average running time of our algorithms (suffix -FW) and the non-fair algorithms (no suffix). The YouTube dataset (the largest) was run on Google Cloud with an N4 VM and a Google Axion Processor with 32 vCPUs and 128 GB of DDR5 memory, while the other datasets were tested on an AMD 5800H CPU. The non-fair methods (without suffix) use the efficient degree discount oracle, while the fair algorithms use the multi-objective Frank-Wolfe fair oracle. From the table, we observe that FOIM algorithms tend to have larger running times, but within an acceptable range. For example, on the largest dataset, YouTube, the fair algorithms are five to six times slower than the non-fair baselines. The additional running time arises because the fair ORACLE is slower at solving the optimization problem; it could be further reduced given more efficient fair offline IM solvers in the future.

Table 5: Average running time of algorithms on various datasets.

Algorithm           NBA       Pokec-z   German    Bail      YouTube
CUCB                0:00:15   0:01:14   0:00:57   0:08:48   0:33:41
epsilon-greedy      0:00:12   0:01:01   0:00:46   0:07:31   0:30:18
IMFB                0:00:38   0:02:13   0:02:20   0:14:48   0:59:49
IMLinUCB            0:00:26   0:02:03   0:04:53   0:12:57   0:47:48
CUCB-FW             0:02:32   0:12:47   0:06:21   1:03:45   3:46:39
epsilon-greedy-FW   0:02:24   0:12:10   0:06:07   0:53:00   3:30:51
IMFB-FW             0:03:06   0:13:59   0:08:53   1:38:29   4:18:06
IMLinUCB-FW         0:02:57   0:13:43   0:08:05   1:27:26   4:03:00
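The round structure that produces the O(T(|E| log|E| + ORACLE_time)) cost can be sketched as follows (a schematic of ours; graph, oracle, and diffuse are hypothetical interfaces, not the released implementation):

```python
import math

def cucb(graph, oracle, T):
    """Schematic CUCB loop: the per-round cost is O(|E|) for the confidence
    bounds and updates, plus one call to the (fair or non-fair) offline oracle."""
    counts = {e: 0 for e in graph.edges}     # number of observations per edge
    mu_hat = {e: 1.0 for e in graph.edges}   # optimistic initial estimates
    for t in range(1, T + 1):
        ucb = {e: min(1.0, mu_hat[e] + (math.sqrt(3 * math.log(t) / (2 * counts[e]))
                                        if counts[e] else 1.0))
               for e in graph.edges}
        seeds = oracle(graph, ucb)           # ORACLE_time: dominant for fair oracles
        for e, x in graph.diffuse(seeds):    # semi-bandit feedback (edge, activated?)
            counts[e] += 1
            mu_hat[e] += (x - mu_hat[e]) / counts[e]   # running-mean update
```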
A reward function r_µ(S) satisfies the Triggering Probability Modulated (TPM) condition if there exists a constant B > 0 such that

$$|r_\mu(S) - r_{\mu'}(S)| \le B \sum_i p_i^S\,|\mu_i - \mu'_i|, \tag{48}$$

where p_i^S is the probability that edge i is activated (i.e., traversed) under the influence process given seed set S.

Diversity fairness violates the TPM condition.

Proof. Assume a seed set S marginally satisfies the diversity constraint under edge probabilities µ, resulting in a non-zero reward:

$$r_\mu(S) > 0. \tag{49}$$

Now perturb a single edge probability by a small amount ϵ, yielding a new probability vector µ′. This small change may violate the diversity constraint, causing the reward to drop sharply:

$$r_{\mu'}(S) = 0. \tag{50}$$

Thus, the reward difference is a constant:

$$|r_\mu(S) - r_{\mu'}(S)| = c > 0, \tag{51}$$

whereas the TPM term vanishes as ϵ → 0:

$$\sum_i p_i^S\,|\mu_i - \mu'_i| \to 0. \tag{52}$$

Therefore, there exists no constant B such that

$$|r_\mu(S) - r_{\mu'}(S)| \le B \sum_i p_i^S\,|\mu_i - \mu'_i|, \tag{53}$$

implying that the TPM condition fails for any B. Hence, the diversity fairness objective does not satisfy the TPM condition.

Maximin fairness satisfies the TPM condition.

Proof. Define the maximin fairness reward as

$$r^{\mathrm{maxmin}}_\mu(S) = \min_{C\in\mathcal{C}} \frac{I_{G_C}(S, \mu)}{|C|}, \tag{54}$$

where I_{G_C}(S, µ) denotes the expected influence on group C. Let µ′ be a perturbed influence probability vector. For any group C, we have:

$$\left|\frac{I_{G_C}(S,\mu)}{|C|} - \frac{I_{G_C}(S,\mu')}{|C|}\right| = \frac{1}{|C|}\,\left|I_{G_C}(S,\mu) - I_{G_C}(S,\mu')\right|. \tag{55}$$

Assuming the Independent Cascade (IC) model and Lipschitz continuity of the influence function over edge weights:

$$\left|I_{G_C}(S,\mu) - I_{G_C}(S,\mu')\right| \le \sum_{i\in E_S} p_i^S\,|\mu_i - \mu'_i|, \tag{56}$$

where E_S is the set of edges that may be activated under seed set S. It follows that:

$$\left|\frac{I_{G_C}(S,\mu)}{|C|} - \frac{I_{G_C}(S,\mu')}{|C|}\right| \le \frac{1}{|C|}\sum_{i\in E_S} p_i^S\,|\mu_i - \mu'_i|. \tag{57}$$

Taking the worst-case deviation over all groups:

$$\left|r^{\mathrm{maxmin}}_\mu(S) - r^{\mathrm{maxmin}}_{\mu'}(S)\right| \le \max_{C\in\mathcal{C}} \frac{1}{|C|} \sum_{i\in E_S} p_i^S\,|\mu_i - \mu'_i|. \tag{58}$$

Setting

$$B = \max_{C\in\mathcal{C}} \frac{1}{|C|}, \tag{59}$$

we finally obtain

$$\left|r^{\mathrm{maxmin}}_\mu(S) - r^{\mathrm{maxmin}}_{\mu'}(S)\right| \le B \sum_{i\in E_S} p_i^S\,|\mu_i - \mu'_i|, \tag{60}$$

which satisfies the TPM condition.
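A toy instance (ours, not from the paper) makes the diversity-fairness counterexample concrete: the reward jumps discontinuously when a single edge probability crosses the constraint threshold, while the TPM right-hand side shrinks with the perturbation:

```python
def diversity_reward(mu: float, threshold: float = 0.5) -> float:
    # Stylized diversity reward: the spread counts only if the group-coverage
    # constraint (here, mu >= threshold) holds; otherwise the reward is 0.
    return 1.0 + mu if mu >= threshold else 0.0

mu, eps = 0.5, 1e-6
gap = abs(diversity_reward(mu) - diversity_reward(mu - eps))  # stays ~1.5
tpm_term = eps                                                # -> 0 with eps
print(gap, tpm_term, gap / tpm_term)  # ratio diverges: no finite B satisfies (48)
```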