# Shaping Social Activity by Incentivizing Users

Mehrdad Farajtabar, Nan Du, Hongyuan Zha, Le Song (Georgia Institute of Technology)
{mehrdad,dunan}@gatech.edu, {zha,lsong}@cc.gatech.edu

Manuel Gomez-Rodriguez (MPI for Software Systems)
manuelgr@mpi-sws.org

Isabel Valera (Univ. Carlos III in Madrid)
ivalera@tsc.uc3m.es

Abstract

Events in an online social network can be categorized roughly into endogenous events, where users just respond to the actions of their neighbors within the network, and exogenous events, where users take actions due to drives external to the network. How much external drive should be provided to each user so that the network activity can be steered towards a target state? In this paper, we model social events using multivariate Hawkes processes, which can capture both endogenous and exogenous event intensities, and derive a time-dependent linear relation between the intensity of exogenous events and the overall network activity. Exploiting this connection, we develop a convex optimization framework for determining the required level of external drive in order for the network to reach a desired activity level. We experimented with event data gathered from Twitter, and show that our method can steer the activity of the network more accurately than alternatives.

1 Introduction

Online social platforms routinely track and record a large volume of event data, which may correspond to the usage of a service (e.g., a url shortening service such as bit.ly). These events can be categorized roughly into endogenous events, where users just respond to the actions of their neighbors within the network, and exogenous events, where users take actions due to drives external to the network. For instance, a user's tweets may contain links provided by bit.ly, either because he forwards a link from his friends or because he takes the initiative to use the service to create a new link.

Can we model and exploit these data to steer the online community to a desired activity level? Specifically, can we drive the overall usage of a service to a certain level (e.g., at least twice per day per user) by incentivizing a small number of users to take more initiative? What if the goal is to make the usage level of a service more homogeneous across users? What about maximizing the overall service usage for a target group of users? Furthermore, these activity shaping problems need to be addressed under budget constraints, since incentives are usually provided in the form of monetary or credit rewards.

Activity shaping problems are significantly more challenging than traditional influence maximization problems, which aim to identify a set of users who, when convinced to adopt a product, shall influence others in the network and trigger a large cascade of adoptions [1, 2]. First, in influence maximization, the state of each user is often assumed to be binary, either adopting a product or not [1, 3, 4, 5]. However, such an assumption does not capture the recurrent nature of product usage, where the frequency of usage matters. Second, while influence maximization methods identify a set of users to provide incentives to, they do not typically provide a quantitative prescription of how much incentive should be provided to each user. Third, activity shaping concerns a larger variety of target states, such as minimum activity and homogeneity of activity, not just activity maximization.
In this paper, we address these activity shaping problems using multivariate Hawkes processes [6], which can model both endogenous and exogenous recurrent social events, and which were shown to be a good fit for such data in a number of recent works (e.g., [7, 8, 9, 10, 11, 12]). More importantly, we go beyond model fitting and derive a novel predictive formula for the overall network activity given the intensity of exogenous events in individual users, using a connection between the processes and branching processes [13, 14, 15, 16]. Based on this relation, we propose a convex optimization framework to address a diverse range of activity shaping problems under budget constraints. Compared to previous methods for influence maximization, our framework provides more fine-grained control of network activity: it not only steers the network to a desired steady-state activity level, but can also do so in a time-sensitive fashion. For example, our framework allows us to answer complex time-sensitive queries, such as: which users should be incentivized, and by how much, to steer a set of users to use a product twice per week after one month?

In addition to the novel framework, we also develop an efficient gradient-based optimization algorithm, in which the matrix exponential needed for gradient computation is approximated using a truncated Taylor series expansion [17]. This algorithm allows us to validate our framework on a variety of activity shaping tasks and to scale up to networks with tens of thousands of nodes. We also conducted experiments on a network of 60,000 Twitter users and more than 7,500,000 uses of popular url shortening services. Using held-out data, we show that our algorithm can shape the network behavior much more accurately than alternatives.

2 Modeling Endogenous-Exogenous Recurrent Social Events

We model the events generated by m users in a social network as an m-dimensional counting process N(t) = (N_1(t), N_2(t), \ldots, N_m(t))^\top, where N_i(t) records the total number of events generated by user i up to time t. Furthermore, we represent each event as a tuple (u_i, t_i), where u_i is the user identity and t_i is the event timing. Let the history of the process up to time t be H_t := {(u_i, t_i) | t_i \leq t}, and let H_{t^-} be the history until just before time t. Then the increment of the process, dN(t), in an infinitesimal window [t, t + dt] is parametrized by the intensity \lambda(t) = (\lambda_1(t), \ldots, \lambda_m(t))^\top \geq 0, i.e.,

$$\mathbb{E}[\, dN(t) \mid H_{t^-} \,] = \lambda(t)\, dt. \qquad (1)$$

Intuitively, the larger the intensity \lambda(t), the greater the likelihood of observing an event in the time window [t, t + dt]. For instance, a Poisson process on [0, \infty) can be viewed as a special counting process with a constant intensity function \lambda, independent of time and history. To model the presence of both endogenous and exogenous events, we decompose the intensity into two terms,

$$\underbrace{\lambda(t)}_{\text{overall event intensity}} \;=\; \underbrace{\lambda^{(0)}(t)}_{\text{exogenous event intensity}} \;+\; \underbrace{\lambda^{*}(t)}_{\text{endogenous event intensity}},$$

where the exogenous event intensity models drive from outside the network, and the endogenous event intensity models interactions within the network. We assume that hosts of social platforms can potentially drive up or down the exogenous event intensity by providing incentives to users; endogenous events, in contrast, are generated due to users' own interests or under the influence of network peers, and the hosts do not interfere with them directly.
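For concreteness, the following minimal Python sketch (not part of the original paper; the toy event list, user count, and window length are illustrative assumptions) builds the counting process N(t) from a list of event tuples (u_i, t_i) and forms a crude empirical estimate of the intensity by averaging event counts over a trailing time window.

```python
import numpy as np

def counting_process(events, m, t):
    """N(t): number of events generated by each of the m users up to time t.
    `events` is a list of (user, time) tuples with user in {0, ..., m-1}."""
    N = np.zeros(m, dtype=int)
    for u, ti in events:
        if ti <= t:
            N[u] += 1
    return N

def empirical_intensity(events, m, t, window):
    """Crude estimate of lambda(t): events per unit time in (t - window, t]."""
    counts = np.zeros(m)
    for u, ti in events:
        if t - window < ti <= t:
            counts[u] += 1
    return counts / window

# Toy example: three users, event times in minutes.
events = [(0, 1.0), (1, 2.5), (0, 3.0), (2, 7.2), (0, 8.1)]
print(counting_process(events, m=3, t=5.0))                  # -> [2 1 0]
print(empirical_intensity(events, m=3, t=10.0, window=5.0))  # events per minute
```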
The key questions in the activity shaping context are how to model the endogenous event intensity in a way that is realistic for recurrent social interactions, and how to link the exogenous event intensity to the endogenous event intensity. We assume that the exogenous event intensity is independent of the history and time, i.e., \lambda^{(0)}(t) = \lambda^{(0)}.

2.1 Multivariate Hawkes Process

Recurrent endogenous events often exhibit the characteristics of self-excitation, where a user tends to repeat what he has been doing recently, and mutual excitation, where a user simply follows what his neighbors are doing due to peer pressure. These social phenomena have been analogized to the occurrence of earthquakes [18] and the spread of epidemics [19], and can be well captured by multivariate Hawkes processes [6], as shown in a number of recent works (e.g., [7, 8, 9, 10, 11, 12]). More specifically, a multivariate Hawkes process is a counting process whose intensity has a particular form. We assume that the strength of influence between users is parameterized by a sparse nonnegative influence matrix A = (a_{uu'})_{u,u' \in [m]}, where a_{uu'} > 0 means that user u' directly excites user u. We also allow A to have nonnegative diagonal entries to model self-excitation of a user. Then, the intensity of the u-th dimension is

$$\lambda_u(t) = \lambda^{(0)}_u + \sum_{i\,:\,t_i < t} a_{u u_i}\, e^{-\omega (t - t_i)},$$

where the exponential term is a triggering kernel with bandwidth \omega that modulates how much the influence of a past event has decayed by time t.

Exploiting a connection between Hawkes processes and branching processes, the expected overall intensity \mu(t) = \mathbb{E}[\lambda(t)] can be written as a time-dependent linear function of the constant exogenous intensity, \mu(t) = \Psi(t)\lambda^{(0)}, where the m \times m matrix \Psi(t) is determined by A and \omega (Theorem 3). Each activity shaping task is then formulated as a convex program in \lambda^{(0)} with an objective g(\lambda^{(0)}) and budget constraints, and the gradients of the objectives are: (i) Capped activity maximization: \nabla g(\lambda^{(0)}) = \Psi(t)^\top v, where v is defined such that v_j = 1 if \alpha_j > \mu_j, and v_j = 0, otherwise. (ii) Minimax activity shaping: \nabla g(\lambda^{(0)}) = \Psi(t)^\top e, where e is defined such that e_j = 1 if \mu_j = \mu_{\min}, and e_j = 0, otherwise. (iii) Least-squares activity shaping: \nabla g(\lambda^{(0)}) = 2\Psi(t)^\top B^\top (B\Psi(t)\lambda^{(0)} - v). (iv) Activity homogenization: \nabla g(\lambda^{(0)}) = \Psi(t)^\top \ln(\Psi(t)\lambda^{(0)}) + \Psi(t)^\top \mathbf{1}, where \ln(\cdot) on a vector is the element-wise natural logarithm.

Since the activity maximization and the minimax activity shaping tasks require only one evaluation of \Psi(t) times a vector, Algorithm 1 can be used directly. However, computing the gradient for least-squares activity shaping and activity homogenization is slightly more involved, and one needs to be careful about the order in which the operations are performed (refer to Appendix B for details). Equipped with an efficient way to compute gradients, we solve the corresponding convex optimization problem for each activity shaping task by applying projected gradient descent (PGD) [26] with the appropriate gradient.¹ Algorithm 2 summarizes the key steps.

¹ For nondifferentiable objectives, subgradient algorithms can be used instead.

6 Experimental Evaluation

We evaluate our framework using both simulated and real-world held-out data, and show that our approach significantly outperforms several baselines. The appendix contains additional experiments.

Dataset description and network inference. We use data gathered from Twitter as reported in [27], which comprises all public tweets posted by 60,000 users during an 8-month period, from January 2009 to September 2009. For every user, we record the times she uses any of six popular url shortening services (refer to Appendix C for details). We evaluate the performance of our framework on a subset of 2,241 active users, linked by 4,901 edges, which we call the 2K dataset, and we evaluate its scalability on the full 60,000 users, linked by 200,000 edges, which we call the 60K dataset. The 2K dataset accounts for 691,020 url shortening service uses, while the 60K dataset accounts for 7.5 million uses. Finally, we treat the events associated with each service as an independent cascade. In the experiments, we estimated the nonnegative influence matrix A and the exogenous intensity \lambda^{(0)} by maximizing the log-likelihood, as in previous work [8, 9, 12]. We used a temporal resolution of one minute and selected the bandwidth \omega = 0.1 by cross validation. Loosely speaking, \omega = 0.1 corresponds to losing 70% of the initial influence after 10 minutes, which may be explained by the rapid rate at which each user's news feed gets updated.
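The evaluation below simulates cascades with Ogata's thinning algorithm [28]. The following Python sketch illustrates, under the exponential-kernel model above, how such a simulation can be implemented; the network size, influence matrix, exogenous rates, and time horizon are illustrative assumptions, not values from the paper.

```python
import numpy as np

def hawkes_intensity(t, events, lambda0, A, omega):
    """lambda_u(t) = lambda0_u + sum_{i: t_i < t} A[u, u_i] * exp(-omega * (t - t_i))."""
    lam = lambda0.copy()
    for u_i, t_i in events:
        if t_i < t:
            lam += A[:, u_i] * np.exp(-omega * (t - t_i))
    return lam

def simulate_hawkes(lambda0, A, omega, T, seed=0):
    """Sample one cascade on [0, T] from a multivariate Hawkes process with
    exponential kernel, via Ogata's thinning algorithm [28]."""
    rng = np.random.default_rng(seed)
    events, t = [], 0.0
    # With exponential kernels the total intensity only decays between events,
    # so its current (right-continuous) value upper-bounds it until the next event.
    lam_bar = lambda0.sum()
    while True:
        t += rng.exponential(1.0 / lam_bar)          # candidate event time
        if t > T:
            break
        lam = hawkes_intensity(t, events, lambda0, A, omega)
        if rng.uniform() * lam_bar <= lam.sum():     # accept with prob lambda(t) / lam_bar
            u = rng.choice(len(lambda0), p=lam / lam.sum())  # attribute event to a user
            events.append((u, t))
            lam_bar = lam.sum() + A[:, u].sum()      # intensity jump caused by the new event
        else:
            lam_bar = lam.sum()                      # tighten the bound after a rejection
    return events

# Toy example: 3 users, weak mutual excitation, exogenous rate 0.1 events/minute each.
lambda0 = np.full(3, 0.1)
A = np.array([[0.0, 0.3, 0.0],
              [0.2, 0.1, 0.0],
              [0.0, 0.4, 0.0]])
cascade = simulate_hawkes(lambda0, A, omega=0.1, T=500.0)
print(len(cascade), "events")
```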
Evaluation schemes. We focus on three tasks: capped activity maximization, minimax activity shaping, and least-squares activity shaping. We set the total budget to C = 0.5, which corresponds to supporting a total extra activity equal to 0.5 actions per unit time, and assume all users entail the same cost. In the capped activity maximization task, we set the upper limit \alpha on each user's intensity by adding a nonnegative random vector to their inferred initial intensity. In the least-squares activity shaping task, we set B = I and aim to create three user groups: less-active, moderate, and super-active. We use three different evaluation schemes, with an increasing resemblance to a real-world scenario:

Theoretical objective: We compute the expected overall (theoretical) intensity by applying Theorem 3 to the optimal exogenous event intensities for each of the three activity shaping tasks, together with the learned A and \omega, and then compute and report the value of the objective functions.

[Figure 2: Columns: (a) theoretical objective, (b) simulated objective, (c) held-out data (rank correlation). Row 1: capped activity maximization (CAM, XMU, WEI, DEG, PRK; y-axis: sum of users' activity). Row 2: minimax activity shaping (MMASH, UNI, MINMU, LP, GRD; y-axis: minimum activity). Row 3: least-squares activity shaping (LSASH, PROP, LSGRD; y-axis: Euclidean distance). The x-axes show the logarithm of time. * means statistically significant at the 0.01 level with a paired t-test between our method and the second best.]

Simulated objective: We simulate 50 cascades with Ogata's thinning algorithm [28], using the optimal exogenous event intensities for each of the three activity shaping tasks, and the learned A and \omega. We then estimate the overall event intensity empirically from the simulated cascades, by computing a running average over non-overlapping time windows, and report the value of the objective functions based on this estimated overall intensity. Appendix D provides a comparison between the simulated and the theoretical objectives.

Held-out data: The most interesting evaluation scheme would entail carrying out real interventions in a social platform. Since this is very challenging to do, we instead use held-out data to simulate such a process, proceeding as follows. We first partition the 8-month data into 50 five-day-long contiguous intervals. Then, we use one interval for training and the remaining 49 intervals for testing. Suppose interval 1 is used for training; the procedure is then as follows:

1. We estimate A_1, \omega_1 and \lambda^{(0)}_1 using the events from interval 1. Then, we fix A_1 and \omega_1, and estimate \lambda^{(0)}_i for all other intervals, i = 2, \ldots, 49.

2. Given A_1 and \omega_1, we find the optimal exogenous event intensities, \lambda^{(0)}_{opt}, for each of the three activity shaping tasks by solving the associated convex program. We then sort the estimated \lambda^{(0)}_i (i = 2, \ldots, 49) according to their similarity to \lambda^{(0)}_{opt}, measured by the Euclidean distance \|\lambda^{(0)}_i - \lambda^{(0)}_{opt}\|_2.

3. We estimate the overall event intensity for each of the 49 intervals (i = 2, \ldots, 49), as in the simulated objective evaluation scheme, and sort these intervals according to the value of their corresponding objective function.

4. Last, we compute and report the rank correlation score between the two orderings obtained in steps 2 and 3.² The larger the rank correlation, the better the method.

We repeat this procedure 50 times, choosing a different interval for training each time, and compute and report the average rank correlation. More details can be found in the appendix.

² Rank correlation = number of pairs with consistent ordering / total number of pairs.
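Footnote 2 defines the rank correlation simply as the fraction of item pairs ordered consistently by the two orderings. The short Python sketch below computes this score; the interval indices and scores are made up for illustration and are not from the paper.

```python
from itertools import combinations

def rank_correlation(score_a, score_b):
    """Fraction of item pairs ordered consistently by the two scores
    (footnote 2: number of consistent pairs / total number of pairs)."""
    items = list(score_a.keys())
    pairs = list(combinations(items, 2))
    consistent = sum(
        1 for i, j in pairs
        if (score_a[i] - score_a[j]) * (score_b[i] - score_b[j]) > 0
    )
    return consistent / len(pairs)

# Toy example: 4 held-out intervals scored by (i) similarity of lambda0_i to
# lambda0_opt (negated distance, so larger = more similar) and (ii) objective value.
similarity = {2: -0.10, 3: -0.35, 4: -0.20, 5: -0.05}
objective  = {2: 1.90, 3: 1.20, 4: 1.60, 5: 2.10}
print(rank_correlation(similarity, objective))   # -> 1.0 (perfectly consistent orderings)
```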
Capped activity maximization (CAM). We compare our method to a number of alternatives: XMU, a heuristic based on \mu(t) without optimization; DEG and WEI, heuristics based on the degree of the user; and PRANK, a heuristic based on PageRank (refer to Appendix C for further details). The first row of Figure 2 summarizes the results for the three different evaluation schemes. We find that our method (CAM) consistently outperforms the alternatives. For the theoretical objective, CAM is 11% better than the second best, DEG. The difference in overall users' intensity with respect to DEG is about 0.8 which, roughly speaking, leads to an increase of at least 0.8 × 60 × 24 × 30 = 34,560 in the overall number of events in a month. In terms of simulated objective and held-out data, the results are similar and provide empirical evidence that, compared to other heuristics, degree is an appropriate surrogate for influence, while, based on the poor performance of XMU, it seems that high activity does not necessarily entail being influential. To elaborate on the interpretability of the real-world experiment on held-out data, consider for example the difference in rank correlation between CAM and DEG, which is almost 0.1. Roughly speaking, this means that incentivizing users based on our approach agrees with the ordering of real activity patterns in 0.1 × (50 × 49)/2 = 122.5 more pairs of realizations.

Minimax activity shaping (MMASH). We compare our method to a number of alternatives: UNI, a heuristic based on equal allocation; MINMU, a heuristic based on \mu(t) without optimization; LP, a linear-programming-based heuristic; and GRD, a greedy approach to leveraging the activity (see Appendix C for more details). The second row of Figure 2 summarizes the results for the three different evaluation schemes. We find that our method (MMASH) consistently outperforms the alternatives. For the theoretical objective, it is about 2× better than the second best, LP. Importantly, the difference between MMASH and LP is not trifling: the least active user carries out about 10^{-4} × 60 × 24 × 30 ≈ 4.3 more actions on average over a month. As one may have expected, GRD and LP are the best among the heuristics. The poor performance of MINMU, which is directly related to the objective of MMASH, may be because it assigns the budget to low-activity users regardless of their influence. In contrast, our method cleverly distributes the budget to the users whose actions trigger many other users' actions (including those with low activity), and thus benefits from the budget the most. In terms of simulated objective and held-out data, the algorithms' performance becomes more similar.

Least-squares activity shaping (LSASH). We compare our method to two alternatives.
PROP: assigning the budget proportionally to the desired activity; and LSGRD: greedily allocating the budget according to the difference between the current and the desired activity (refer to Appendix C for more details). The third row of Figure 2 summarizes the results for the three different evaluation schemes. We find that our method (LSASH) consistently outperforms the alternatives. Perhaps surprisingly, PROP, despite its simplicity, seems to perform slightly better than LSGRD. This may be due to the way it allocates the budget to users: it does not aim to strictly fulfill each user's target activity, but instead benefits more users by assigning the budget proportionally. Refer to Appendix E for additional experiments.

Sparsity and Activity Shaping. In some applications there is a limit on the number of users we can incentivize. In our proposed framework, we can handle this requirement by including a sparsity constraint in the optimization problem. In order to maintain the convexity of the optimization problem, we consider an ℓ1 regularization term, where a regularization parameter γ provides the trade-off between sparsity and the activity shaping goal. Refer to Appendix F for more details and experimental results for different values of γ.

Scalability. The most computationally demanding part of the proposed algorithm is the evaluation of matrix exponentials, which we scale up by utilizing techniques from matrix algebra such as the GMRES and Al-Mohy methods. As a result, we are able to run our methods on the 60K dataset in a reasonable amount of time, in contrast with a naive implementation of the matrix exponential evaluations. Refer to Appendix G for detailed experimental results on scalability. Appendix H discusses the limitations of our framework and future work.
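As a rough illustration of the scalability point, the sketch below computes the action of a matrix exponential on a vector without ever forming the dense exponential, using scipy.sparse.linalg.expm_multiply, which implements the Al-Mohy-Higham algorithm [17]. The matrix (A − ωI)·t is only a stand-in for the kind of quantity that arises with exponential kernels; it is not the paper's exact Ψ(t), and the sizes, densities, and parameter values are made up.

```python
import numpy as np
from scipy.sparse import random as sparse_random, identity
from scipy.sparse.linalg import expm_multiply

# Illustrative sizes and parameters only; the real matrices come from the inferred network.
m, omega, t = 10_000, 0.1, 60.0
A = sparse_random(m, m, density=1e-4, format="csr", random_state=0)  # sparse influence matrix
M = (A - omega * identity(m, format="csr")) * t                      # e.g., (A - omega*I) * t
v = np.ones(m) / m                                                   # vector to propagate

# Action of the matrix exponential on v, without forming the dense m-by-m exp(M).
y = expm_multiply(M, v)   # SciPy's implementation of the Al-Mohy-Higham algorithm [17]
print(y.shape)            # (10000,)
```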
Acknowledgement. This project was supported in part by NSF IIS1116886, NSF/NIH BIGDATA 1R01GM108341, NSF CAREER IIS1350983 and a Raytheon Faculty Fellowship to Le Song. Isabel Valera acknowledges the support of Plan Regional-Programas I+D of Comunidad de Madrid (AGES-CM S2010/BMD-2422) and Ministerio de Ciencia e Innovación of Spain (project DEIPRO TEC2009-14504-C02-00 and program Consolider-Ingenio 2010 CSD2008-00010 COMONSENS).

References

[1] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In KDD, pages 137–146. ACM, 2003.
[2] Matthew Richardson and Pedro Domingos. Mining knowledge-sharing sites for viral marketing. In KDD, pages 61–70. ACM, 2002.
[3] Wei Chen, Yajun Wang, and Siyu Yang. Efficient influence maximization in social networks. In KDD, pages 199–208. ACM, 2009.
[4] Manuel G. Rodriguez and Bernhard Schölkopf. Influence maximization in continuous time diffusion networks. In ICML, 2012.
[5] Nan Du, Le Song, Manuel Gomez-Rodriguez, and Hongyuan Zha. Scalable influence estimation in continuous-time diffusion networks. In NIPS 26, 2013.
[6] Thomas J. Liniger. Multivariate Hawkes Processes. PhD thesis, Swiss Federal Institute of Technology Zurich, 2009.
[7] Charles Blundell, Jeff Beck, and Katherine A. Heller. Modelling reciprocating relationships with Hawkes processes. In NIPS, 2012.
[8] Ke Zhou, Hongyuan Zha, and Le Song. Learning social infectivity in sparse low-rank networks using multi-dimensional Hawkes processes. In AISTATS, 2013.
[9] Ke Zhou, Hongyuan Zha, and Le Song. Learning triggering kernels for multi-dimensional Hawkes processes. In ICML, 2013.
[10] Tomoharu Iwata, Amar Shah, and Zoubin Ghahramani. Discovering latent influence in online social activities via shared cascade Poisson processes. In KDD, pages 266–274. ACM, 2013.
[11] Scott W. Linderman and Ryan P. Adams. Discovering latent network structure in point process data. arXiv preprint arXiv:1402.0914, 2014.
[12] Isabel Valera, Manuel Gomez-Rodriguez, and Krishna Gummadi. Modeling adoption of competing products and conventions in social media. arXiv preprint arXiv:1406.0516, 2014.
[13] Ian Dobson, Benjamin A. Carreras, and David E. Newman. A branching process approximation to cascading load-dependent system failure. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences, pages 10 pp. IEEE, 2004.
[14] Jakob Gulddahl Rasmussen. Bayesian inference for Hawkes processes. Methodology and Computing in Applied Probability, 15(3):623–642, 2013.
[15] Alejandro Veen and Frederic P. Schoenberg. Estimation of space-time branching process models in seismology using an EM-type algorithm. JASA, 103(482):614–624, 2008.
[16] Jiancang Zhuang, Yosihiko Ogata, and David Vere-Jones. Stochastic declustering of space-time earthquake occurrences. JASA, 97(458):369–380, 2002.
[17] Awad H. Al-Mohy and Nicholas J. Higham. Computing the action of the matrix exponential, with an application to exponential integrators. SIAM Journal on Scientific Computing, 33(2):488–511, 2011.
[18] David Marsan and Olivier Lengliné. Extending earthquakes' reach through cascading. Science, 319(5866):1076–1079, 2008.
[19] Shuang-Hong Yang and Hongyuan Zha. Mixture of mutually exciting processes for viral diffusion. In ICML, pages 1–9, 2013.
[20] Theodore E. Harris. The Theory of Branching Processes. Courier Dover Publications, 2002.
[21] Alan G. Hawkes. Spectra of some self-exciting and mutually exciting point processes. Biometrika, 58(1):83–90, 1971.
[22] John Frank Charles Kingman. Poisson Processes, volume 3. Oxford University Press, 1992.
[23] Manuel Gomez-Rodriguez, Krishna Gummadi, and Bernhard Schölkopf. Quantifying information overload in social media and its impact on social contagions. In ICWSM, 2014.
[24] Gene H. Golub and Charles F. Van Loan. Matrix Computations, volume 3. JHU Press, 2012.
[25] Youcef Saad and Martin H. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM Journal on Scientific and Statistical Computing, 7(3):856–869, 1986.
[26] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, England, 2004.
[27] Meeyoung Cha, Hamed Haddadi, Fabricio Benevenuto, and P. Krishna Gummadi. Measuring user influence in Twitter: The million follower fallacy. In ICWSM, 2010.
[28] Yosihiko Ogata. On Lewis' simulation method for point processes. IEEE Transactions on Information Theory, 27(1):23–31, 1981.