# RAIN: Social Role-Aware Information Diffusion

Yang Yang, Jie Tang, Cane Wing-ki Leung, Yizhou Sun, Qicong Chen, Juanzi Li, Qiang Yang

Department of Computer Science and Technology, Tsinghua University, China
Tsinghua National Laboratory for Information Science and Technology (TNList), China
Huawei Noah's Ark Lab, Hong Kong
College of Computer and Information Science, Northeastern University, USA

{sherlockbourne, cane.leung}@gmail.com, ljz@keg.tsinghua.edu.cn, jietang@tsinghua.edu.cn, yzsun@ccs.neu.edu, qyang@cse.ust.hk

Information diffusion, which studies how information is propagated in social networks, has attracted considerable research effort recently. However, most existing approaches do not distinguish social roles that nodes may play in the diffusion process. In this paper, we study the interplay between users' social roles and their influence on information diffusion. We propose a Role-Aware INformation diffusion model (RAIN) that integrates social role recognition and diffusion modeling into a unified framework. We develop a Gibbs-sampling based algorithm to learn the proposed model using historical diffusion data. The proposed model can be applied to different scenarios. For instance, at the micro-level, the proposed model can be used to predict whether an individual user will repost a specific message; while at the macro-level, we can use the model to predict the scale and the duration of a diffusion process. We evaluate the proposed model on a real social media data set. Our model performs much better in both micro- and macro-level prediction than several alternative methods.

## Introduction

Information diffusion, also known as diffusion of innovations, is the study of how information propagates in or between networks (Rogers 2010). Central to information diffusion is the influence of individual nodes (or users in online social networks).
In representative information diffusion models, such as the Linear Threshold (LT) model (Granovetter 1978) and the Independent Cascade (IC) model (Goldenberg, Libai, and Muller 2001), every directed link from a user v to another user u in a given network is associated with a non-negative weight that reflects how much influence user v has on user u in information diffusion.

In reality, the information diffusion process is complex, as is the influence of one user on another. How information diffuses in a network is affected by the structure of the network, in which users' structural properties reflect their social roles in different communities (Wasserman and Faust 1994). Users' social roles in turn affect the influence they may have on other users, and hence the information diffusion process. On Twitter, where a tweet corresponds to a piece of information and retweeting corresponds to information diffusion, one study reveals that 25% of information diffusion is controlled by 1% of users serving the role of structural hole spanners, who are bridges between otherwise disconnected communities in a network (Lou and Tang 2013). Another study shows that 50% of URLs on Twitter are posted by fewer than 1% of users who act as opinion leaders, that is, people taking central positions in a community (Wu et al. 2011). Compared with posts originating from ordinary users, those from opinion leaders not only attract many more retweets (larger diffusion scales), but also have longer lifespans (longer diffusion lengths). All these findings suggest that it is crucial to consider users' social roles in information diffusion modeling. Social roles and diffusion are not independent of each other in nature.

Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

To further motivate our study on social role-aware information diffusion, we present an exploratory analysis on a large social network with 200 million users and 174 million microblog messages. Each post (message) in this network is considered a piece of information, while reposting (or retweeting in Twitter) corresponds to the diffusion of information. We analyze how users taking three roles, namely opinion leaders, structural hole spanners, and ordinary users, influence other users' probability of reposting a message. Figure 1 provides the results. When an opinion leader reposts a message, the probability that her follower v will subsequently repost the message is 12 times higher than when the message is reposted by an ordinary user in the first place (corresponding to the two-step flow theory (Lazarsfeld, Berelson, and Gaudet 1944)). More interestingly, if the number of reposting opinion leaders, all followed by v, reaches 3, the probability that v will subsequently repost decreases significantly, but keeps increasing after that. Regarding this finding, we conjecture that 2-3 opinion leaders are sufficient to spread a piece of information throughout a community, making their followers unwilling to repost a message that most of their friends would already know. However, when a message attracts the attention of more than 3 opinion leaders in a community, it may have become so influential and popular that reposting it becomes a social norm that other users might want to adopt, which turns information overload into information everywhere.

Figure 1: Diffusion influence analysis. We study how users with different roles affect other users' probability of reposting a certain message. The y-axis denotes the probability that a user v will repost a certain message. The x-axis denotes the number of v's followees who reposted the message before v did.

Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence

Results on structural hole spanners show a different story. The probability that v reposts a message keeps increasing with the number of her reposting followees who are structural hole spanners. As structural hole spanners bridge otherwise disconnected communities, they tend to bring information that a certain community is rarely exposed to, and thus may be able to interest v more easily (Burt 2001). This result also suggests that most users try to bridge information flow between different groups.

To summarize, the probability that a user will repost a message depends strongly on the roles of her followees who reposted the message. It is therefore crucial to capture users' social roles when modeling the information diffusion process. Intuitively, a user may play multiple roles with respect to different communities or social circles, thus exhibiting different influential strengths in different diffusion processes. For instance, one may act as an opinion leader when speaking on her area of expertise, and as a structural hole spanner when forwarding a piece of news from her colleagues to her family members. How to effectively uncover the social roles users play in information diffusion processes remains an open problem.

In this paper, we approach this problem through a role-aware information diffusion model. There are two intuitions behind our model. First, a user may play multiple social roles in a network, as noted. We therefore propose to learn a probability distribution over social roles for each user, allowing a user to play different roles in different diffusion processes. Second, as social roles and the diffusion process are interrelated, we can exploit the observed diffusion in a network to help infer the unobserved roles of users and the influence of each role. As such, our model takes as input a social network and its information diffusion traces.
It then jointly learns the social role distributions of users and the influence of each role by utilizing both users' structural properties and their behaviors as observed in the diffusion traces.

Figure 2: Illustration of RAIN. Notice that $r_2$ is the social role that $v_2$ plays when she tries to activate $v_1$; an $r$ with no subscript indicates the role sampled for generating a user's social attributes.

We summarize our technical contributions as follows:

- We propose the problem of role-aware information diffusion modeling in online social networks.
- We formulate a generative model and devise a Gibbs sampler that integrates social role learning and diffusion modeling into a unified probabilistic framework.
- Employing a large real-world network as experimental data, we conduct extensive experiments to validate the proposed model against several baselines.

## Social Role-Aware Diffusion Model

### Formulation

Let $G = (V, E, X)$ be a social network, where $V$ is a set of users, $E \subseteq V \times V$ is a set of links between users, $e_{vu} \in E$ denotes a directed (follow) link from user $v$ to $u$ ($v, u \in V$), and $X$ is a $|V| \times K$ social attribute matrix, with each row $x_v = \{x_1, \dots, x_K \mid x_i \in \mathbb{R}\}$ representing the $K$ social attributes of user $v$. The $K$ social attributes can be defined based on application-specific needs. Examples include PageRank score (Page et al. 1999), network constraint score (Burt 2009; Lou and Tang 2013), node degree, etc. For each node $v \in V$, we use $B(v) = \{u \mid u \in V, e_{vu} \in E\}$ to denote the set of followees of $v$.

Different messages are propagated over $G$. When a user $v$ posts or reposts a specific message $i$ at time $t$, we say that $v$ is activated with respect to $i$ at $t$ (and will stay active after $t$). To model the intuition that a user may take different social roles in different diffusion processes, we associate each user with a social role distribution:

Definition 1. Social Role Distribution.
The social role distribution of user $v \in V$ is denoted by $\theta_v$, which is an $R$-dimensional vector satisfying $\sum_r \theta_{vr} = 1$. $\theta_{vr}$ is the probability that $v$ plays role $r$ when diffusing a message.

### Model Description

We propose a social Role-Aware INformation diffusion model (RAIN) for learning users' social roles and modeling information diffusion simultaneously. Figure 2 illustrates our model. RAIN determines the social role distribution of each user according to both her structural attributes and her behavior in the diffusion process.

Table 1: Notations in the proposed model.

| Symbol | Description |
| --- | --- |
| $R$ | number of latent roles |
| $K$ | total number of social attributes of users |
| $T$ | the largest timestamp in the given diffusion trees |
| $\Delta t$ | diffusion time delay |
| $t_{iu}$ | the time when $u$ becomes active to diffuse $i$ |
| $y_{itu}$ | a binary variable denoting whether user $u$ is activated for message $i$ at time $t$ |
| $r_u$ | a latent variable denoting the social role of user $u$ |
| $z^t_{iuv}$ | a latent variable indicating whether user $u$ successfully activates user $v$ to diffuse $i$ at time $t$ |
| $\theta_v$ | social role distribution of user $v$ |
| $\rho_r$ | Bernoulli distribution over $z_{iuv}$ associated with $r$ |
| $\lambda_r$ | geometric distribution over $\Delta t$ associated with $r$ |
| $\mu_{rk}, \delta_{rk}$ | mean and precision of the Gaussian distribution used to sample the $k$-th attribute of users with $r$ |

Inspired by the work in (Lou and Tang 2013), we consider three social roles in this paper, namely opinion leaders, structural hole spanners, and ordinary users. Existing work detects social roles of users based only on their social attributes. For example, Burt (2009) treats users with small network constraint scores as structural hole spanners, while users with high PageRank scores are often considered opinion leaders (Page et al. 1999). However, using these methods alone to identify the roles of users falls short in detecting the different roles that a certain user may take in different diffusion processes.
In RAIN, the social role distribution of each user is determined not only by her social attributes but also by her information diffusion behaviors. Overall, our generative model contains two parts: users' social attribute generation and information diffusion process generation.

Generative process. We first introduce the diffusion process generation. Inspired by our exploratory analysis, which reveals that the social role of a user affects her influential strength and diffusion delay, we introduce per-role parameters $\rho_r$ and $\lambda_r$: respectively, the probability that a user playing role $r$ will activate another user successfully, and the probability that she will cause a 1-timestamp diffusion delay. We then use a diffusion function (e.g., a threshold function or a cascade function) parametrized by $\rho_r$ and $\lambda_r$ to determine whether a user will become active. In this paper, to make things concrete, we focus on the Independent Cascade model.

More specifically, we first generate the influential strength and diffusion delay with respect to each social role $r$: $\rho_r \sim \mathrm{Beta}(\beta)$, $\lambda_r \sim \mathrm{Beta}(\gamma)$. Consider a message $i$ that is first posted by user $u$ at time $t$. User $u$ has a chance to activate each inactive follower $v$: first, we sample the role $r$ that $u$ plays when she tries to activate $v$: $r \sim \mathrm{Mult}(\theta_u)$. Next, we generate a diffusion delay $\Delta t$ according to the geometric distribution $P(\Delta t \mid \lambda_r)$. At time $t' = t + \Delta t + 1$, we toss a coin, $z^{t'}_{iuv} \sim \mathrm{Bernoulli}(\rho_r)$, to determine whether $u$ succeeds in activating $v$. At any time, user $v$ becomes active if at least one of her followees activates her successfully. Notice that multiple activation attempts are sequenced in an arbitrary order. After $v$ becomes active, she executes the diffusion process just described to try to activate her inactive followers. The process terminates when no more activation is possible.

For the social attribute generation process, we assume that each attribute of a user $v$ is sampled according to a Gaussian distribution.
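
As a rough illustration of the diffusion part of the generative process described above, the following Python sketch simulates a single cascade. It is not the authors' implementation: the names `simulate_cascade`, `sample_role`, `sample_lag`, `followers`, and `horizon` are hypothetical, and the sketch assumes that an activation attempt lands $k \ge 1$ timestamps after $u$ becomes active with probability $\lambda_r (1 - \lambda_r)^{k-1}$, which is the timing convention implied by the exponent in Eq. (1).

```python
import heapq
import random


def sample_role(theta_u, rng):
    """Draw a role index r ~ Mult(theta_u)."""
    return rng.choices(range(len(theta_u)), weights=theta_u, k=1)[0]


def sample_lag(lam_r, rng, max_lag):
    """Draw a lag k >= 1 with P(k) = lam_r * (1 - lam_r) ** (k - 1).

    Returns None when the lag would exceed max_lag (outside the horizon).
    """
    k = 1
    while k <= max_lag:
        if rng.random() < lam_r:
            return k
        k += 1
    return None


def simulate_cascade(followers, theta, rho, lam, seed_user, horizon=50, rng=None):
    """Simulate one cascade and return {user: activation time}.

    followers[u] lists the users following u (assumed adjacency format);
    theta[u] is u's role distribution; rho[r], lam[r] are per-role parameters.
    """
    rng = rng or random.Random()
    active_at = {seed_user: 0}
    queue = [(0, seed_user)]  # activations are processed chronologically
    while queue:
        t, u = heapq.heappop(queue)
        if active_at[u] != t:
            continue  # stale entry: u was activated earlier by another followee
        for v in followers.get(u, ()):
            if active_at.get(v, horizon + 1) <= t:
                continue  # v was already active when u became active
            r = sample_role(theta[u], rng)
            lag = sample_lag(lam[r], rng, max_lag=horizon - t)
            if lag is None or rng.random() >= rho[r]:
                continue  # attempt falls outside the horizon, or the coin toss failed
            t_new = t + lag
            if t_new < active_at.get(v, horizon + 1):
                active_at[v] = t_new  # the earliest successful attempt wins
                heapq.heappush(queue, (t_new, v))
    return active_at
```

Repeating such a simulation many times and counting how often each user ends up active gives a Monte Carlo estimate of per-user activation probabilities, which is one plausible way to rank users for a given message.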
Users with the same social role have similar social attributes and share the same Gaussian distribution. Thus, we first generate each user $v$'s social role distribution: $\theta_v \sim \mathrm{Dir}(\alpha)$. Then, for each role $r$, we generate $K$ Gaussian parameters: $(\mu_{rk}, \delta_{rk}) \sim \mathrm{NG}(\tau)$, for $k = 1, \dots, K$. Next, for the $k$-th attribute of user $v$, we generate a latent variable: $r \sim \mathrm{Mult}(\theta_v)$. Finally, we generate that attribute: $x_{vk} \sim \mathcal{N}(\mu_{rk}, \delta_{rk}^{-1})$. Table 1 summarizes the major notations used in RAIN.

Likelihood function. For each message $i$, we define $A_{it}$ as the set of users who become active at time $t$, $D_{it} = A_{i0} \cup \dots \cup A_{it}$ as the set of users who are active by time $t$, and the binary variable $y_{itu}$ to denote whether user $u$ is activated ($y_{itu} = 1$) or not ($y_{itu} = 0$) with respect to message $i$ at time $t$. For user $v$, $z^t_{i \cdot v} = (z^t_{iuv})_{u \in B(v) \cap D_{i,t-1}}$ is an indicator vector: $z^t_{iuv} = 1$ if user $u$ succeeds in activating user $v$ at time $t$ to diffuse message $i$, and $z^t_{iuv} = 0$ if user $u$ fails to activate $v$ within time $[t_{iu} + 1, t]$, where $t_{iu}$ indicates the time $u$ was activated to diffuse message $i$. We consider the probability that user $u$ succeeds in activating one of her followers $v$ at time $t$ ($z^t_{iuv} = 1$), taking $u$'s social role information into account:

$$\varphi^t_{iuv} = \sum_r \rho_r \lambda_r (1 - \lambda_r)^{t - t_{iu} - 1} \theta_{ur} \tag{1}$$

If user $v$ is not activated by user $u \in B(v) \cap D_{i,t-1}$ within the time period $[t_{iu} + 1, t]$, then $z^t_{iuv} = 0$ with probability:

$$\varepsilon^t_{iuv} = \sum_r \theta_{ur} \left[ \rho_r (1 - \lambda_r)^{t - t_{iu}} + 1 - \rho_r \right] \tag{2}$$

Based on Eqs. (1) and (2), the probability that user $v$ is active at time $t$ can be expressed as:

$$P(v \in A_{it}) = \prod_{u \in B(v) \cap D_{i,t-1}} \left( \varphi^t_{iuv} + \varepsilon^t_{iuv} \right) - \prod_{u \in B(v) \cap D_{i,t-1}} \varepsilon^t_{iuv} \tag{3}$$

Further, the probability that user $v$ is never activated by the last timestamp $T$ can be written as:

$$P(v \notin D_{iT}) = \prod_{u \in B(v) \cap D_{iT}} \sum_r (1 - \rho_r) \theta_{ur} \tag{4}$$

For the social attribute generation part, we have:

$$P(x_{uk}) = \sum_r \theta_{ur} \sqrt{\frac{\delta_{rk}}{2\pi}} \exp\left\{ -\frac{\delta_{rk} (x_{uk} - \mu_{rk})^2}{2} \right\} \tag{5}$$

Based on Eqs.
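(3) to (5), we obtain the following likelihood function:

$$\mathcal{L} = \prod_{i} \left[ \prod_{t} \prod_{v \in A_{it}} P(v \in A_{it}) \prod_{v \notin D_{iT}} P(v \notin D_{iT}) \right] \prod_{u} \prod_{k=1}^{K} P(x_{uk}) \prod_{u} \prod_{r=1}^{R} P(\theta_{ur} \mid \alpha) \prod_{r=1}^{R} \left\{ P(\rho_r \mid \beta) + P(\lambda_r \mid \gamma) \right\} \prod_{r=1}^{R} \prod_{k=1}^{K} P(\mu_{rk}, \delta_{rk} \mid \tau) \tag{6}$$

### Model Learning

We employ Gibbs sampling (Resnik and Hardisty 2010; Yang et al. 2014) to estimate the unknown parameters in the proposed model. Specifically, we begin with the posterior for sampling the latent variable $r$ for each social attribute of a user $u$:

$$P(r_{uk} \mid \mathbf{r}_{\neg uk}, \mathbf{x}) = \frac{n^{\neg uk}_{u r_{uk}} + \alpha}{\sum_r \left( n^{\neg uk}_{ur} + \alpha \right)} \times \frac{\Gamma\left(\tau_2 + n_{r_{uk}k}/2\right)}{\Gamma\left(\tau_2 + n^{\neg uk}_{r_{uk}k}/2\right)} \times \frac{\sqrt{\tau_1 + n^{\neg uk}_{r_{uk}k}} \; \eta\left(n^{\neg uk}_{r_{uk}k}, \bar{x}^{\neg uk}_{r_{uk}k}, s^{\neg uk}_{r_{uk}k}\right)}{\sqrt{\tau_1 + n_{r_{uk}k}} \; \eta\left(n_{r_{uk}k}, \bar{x}_{r_{uk}k}, s_{r_{uk}k}\right)} \tag{7}$$

where the counter $n_{ur}$ (resp. $n_{rk}$) denotes the number of times $r$ is sampled with user $u$ (resp. with the $k$-th social attribute); $\bar{x}_{rk}$ and $s_{rk}$ are respectively the mean and variance of the $k$-th social attribute associated with role $r$. The superscript $\neg uk$ on the counters indicates exclusion of the current observation (the $k$-th structural attribute of user $u$) from the counts. One challenge in Eq. (7) is the calculation of Gamma functions, which we approximated in this work using Stirling's formula (Abramowitz and Stegun 1970). The function $\eta(\cdot)$ is used to simplify the presentation of Eq. (7) and is defined as:

$$\eta(\cdot) = \left[ \tau_3 + \frac{1}{2} \left( n_{r_{uk}k} s_{r_{uk}k} + \frac{\tau_1 n_{r_{uk}k} \left( \bar{x}_{r_{uk}k} - \tau_0 \right)^2}{\tau_1 + n_{r_{uk}k}} \right) \right]^{\tau_2 + n_{r_{uk}k}/2} \tag{8}$$

In Eqs. (7) and (8), $\tau$ is the parameter of the normal-gamma prior. Similarly, we evaluate the posterior for sampling the latent variables $(\Delta t, r, z)$ for each diffusion process:

$$P(r_{iuv}, \Delta t_{iuv}, z_{iuv} \mid \mathbf{r}_{\neg iuv}, \Delta\mathbf{t}_{\neg iuv}, \mathbf{z}_{\neg iuv}, \mathbf{y}) = \frac{n^{\neg iuv}_{u r_{iuv}} + \alpha}{\sum_r \left( n^{\neg iuv}_{ur} + \alpha \right)} \times \frac{n^{\neg iuv}_{z_{iuv} r_{iuv}} + \beta_1^{z_{iuv}} \beta_0^{1 - z_{iuv}}}{n^{\neg iuv}_{1 r_{iuv}} + \beta_1 + n^{\neg iuv}_{0 r_{iuv}} + \beta_0} \times \frac{\left( n^{\neg iuv}_{r_{iuv}} + \gamma_1 \right) \prod_{j=0}^{\Delta t_{iuv} - 2} \left( s^{\neg iuv}_{r_{iuv}} - n^{\neg iuv}_{r_{iuv}} + \gamma_0 + j \right)}{\prod_{j=0}^{\Delta t_{iuv} - 1} \left( \gamma_1 + s^{\neg iuv}_{r_{iuv}} + \gamma_0 + j \right)} \times \Phi \tag{9}$$

where $n_r$ (resp. $n_{zr}$) denotes the number of times $r$ is sampled (resp. with $z$); $s_r$ denotes the sum of the $\Delta t$ values that have been sampled with $r$. We use $\Phi$ to indicate $\frac{P(\mathbf{y} \mid \mathbf{z}, \Delta\mathbf{t})}{P(\mathbf{y}_{\neg iuv} \mid \mathbf{z}_{\neg iuv}, \Delta\mathbf{t}_{\neg iuv})}$ for brevity. Intuitively, $\Phi$ is used to handle contradictions that arise during the sampling process.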
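
The activation terms $P(v \in A_{it})$ that enter the likelihood are built from the per-followee quantities of Eqs. (1) and (2). The Python sketch below evaluates them numerically under stated assumptions; the function names `phi`, `eps`, and `prob_activated_at` and the dictionary-based data layout are hypothetical and not taken from the paper. It also makes visible why Eq. (3) is a difference of two products: $\varphi^t_{iuv} + \varepsilon^t_{iuv}$ equals $\varepsilon^{t-1}_{iuv}$, so Eq. (3) is the probability that $v$ is not activated by time $t-1$ minus the probability that $v$ is not activated by time $t$.

```python
from math import prod


def phi(theta_u, rho, lam, lag):
    """Eq. (1): u activates v exactly `lag` = t - t_iu >= 1 steps after u became active."""
    return sum(th * p * l * (1.0 - l) ** (lag - 1)
               for th, p, l in zip(theta_u, rho, lam))


def eps(theta_u, rho, lam, lag):
    """Eq. (2): u has not activated v within `lag` = t - t_iu steps."""
    return sum(th * (p * (1.0 - l) ** lag + 1.0 - p)
               for th, p, l in zip(theta_u, rho, lam))


def prob_activated_at(t, active_followees, theta, rho, lam):
    """Eq. (3): probability that v becomes active exactly at time t.

    active_followees maps each followee u in B(v) that is active by t - 1
    to its activation time t_iu (assumed layout); theta[u] is u's role distribution.
    """
    not_by_previous = prod(
        phi(theta[u], rho, lam, t - t_u) + eps(theta[u], rho, lam, t - t_u)
        for u, t_u in active_followees.items()
    )
    not_by_t = prod(
        eps(theta[u], rho, lam, t - t_u)
        for u, t_u in active_followees.items()
    )
    return not_by_previous - not_by_t
```

For a single followee with one role, $\rho = 0.5$ and $\lambda = 0.5$, the probability of activation one step after the followee becomes active is $\rho\lambda = 0.25$, which is what `prob_activated_at(1, {"u": 0}, {"u": [1.0]}, [0.5], [0.5])` evaluates to.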
Further details about $\Phi$ and other implementation notes are available online (footnote 1).

We now estimate the model parameters from the sampling results. The updating rules for $\theta$, $\lambda$, and $\rho$ can be deduced as:

$$\theta_{ur} = P(\tilde{r} = r \mid \mathbf{r}, \Delta\mathbf{t}, \mathbf{z}, \mathbf{y}) = \frac{n_{ur} + \alpha}{\sum_{r'} \left( n_{ur'} + \alpha \right)}$$

$$\lambda_r = P(\Delta\tilde{t} = 1 \mid \tilde{r} = r, \mathbf{r}, \Delta\mathbf{t}, \mathbf{z}, \mathbf{y}) = \frac{n_r + \gamma_1}{\gamma_1 + s_r + \gamma_0}$$

$$\rho_r = P(\tilde{z} = 1 \mid \tilde{r} = r, \mathbf{r}, \Delta\mathbf{t}, \mathbf{z}, \mathbf{y}) = \frac{n_{1r} + \beta_1}{n_{1r} + \beta_1 + n_{0r} + \beta_0}$$

where $\tilde{r}$, $\Delta\tilde{t}$, and $\tilde{z}$ respectively represent a new observation of $r$, $\Delta t$, and $z$. Note that the updating rules of both $\mu_{rk}$ and $\delta_{rk}$ involve an integration that is intractable. Hence, we approximate $\mu_{rk}$ and $\delta_{rk}$ as $E(\mu_{rk})$ and $E(\delta_{rk})$ respectively, according to (Bernardo and Smith 2009):

$$\mu_{rk} \approx E(\mu_{rk}) = \frac{\tau_0 \tau_1 + n_{rk} \bar{x}_{rk}}{\tau_1 + n_{rk}}, \qquad \delta_{rk} \approx E(\delta_{rk}) = \frac{2\tau_2 + n_{rk}}{2\tau_3 + n_{rk} s_{rk} + \frac{\tau_1 n_{rk} \left( \bar{x}_{rk} - \tau_0 \right)^2}{\tau_1 + n_{rk}}}$$

## Experimental Results

All data and code used here are publicly available (footnote 1).

### Experimental Setup

Data set. We conduct experiments on real data from Tencent Weibo (footnote 2), a popular Twitter-like microblogging service in China. The complete data set contains the directed following networks and tweets (posting logs) of over 200 million users. If there exists a following link from a user v to another user u, we say that v is a follower of u, and that u is a followee of v. Similar to Twitter, there are two types of posts in Tencent Weibo, namely original posts (tweets) and reposts (or retweets). The reposting log of an original post essentially represents an information diffusion process. We extracted the complete following relationships between users and all posting logs of November 1st, 2011 as the training set, and those of November 2nd, 2011 as the test set to evaluate the proposed model. In total, we have 184,491 users and 4,588,559 original posts. We removed from both the training and test sets original posts that were reposted by fewer than 5 users, and use the remaining 242,831 original posts for experiments.
We further categorize posts in our data set based on their topics, as existing work has discovered that the information diffusion behavior of users depends on the topic of the information (Yang and Leskovec 2010). Specifically, we first use LDA (Blei, Ng, and Jordan 2003) to extract latent topics from all the posts in our data set, and assign each post to the topic to which it is most relevant. Due to the space limitation, we only present the results on the 4 most popular topics: campus, horoscope, movie, and history.

Footnote 1: http://arnetminer.org/rain/
Footnote 2: http://t.qq.com/

Tasks. We evaluate the proposed model, RAIN, based on the following two tasks. (1) At the micro-level, how accurate is RAIN in predicting whether a user will repost a given message? (2) At the macro-level, how well does RAIN predict the scale and duration of a diffusion process?

### Micro-Level Evaluation

Evaluation setting. Given an original post (message) on a particular topic, we aim to identify users who will most likely repost this message. Specifically, for each original post in the test set, we rank all users according to their probability of reposting the given message as predicted by RAIN and several baseline methods (described below). Note that on average, each original message in our data set was only reposted by 0.008% of users. We consider the following baselines in our experiments:

Count. Given an original post i, this method ranks users, in descending order, by the number of followees who have reposted i. This method assumes that a user's reposting decision only depends on her followees' decisions.

SVM. This method predicts whether user v will repost message i based on three features: the number of v's followers who have reposted i, the number of v's followees who have reposted i, and the number of times v has previously reposted a message posted by the author of i. Similar features have been utilized in (Zhang et al. 2013).
This method then trains a Ranking SVM (Joachims 2002; 2006) to predict v's probability of reposting i. For Ranking SVM, we use Tree Rank SVM (Airola, Pahikkala, and Salakoski 2011) to handle our large-scale data.

IC Model. This method employs the traditional Independent Cascade (IC) model (Goldenberg, Libai, and Muller 2001; Kempe, Kleinberg, and Tardos 2003). We estimate the parameters of the IC model from the training set by the learning algorithm proposed in (Kimura et al. 2011).

RAIN. This is the proposed social role-aware diffusion model. For each message i, both this method and the IC model use simulation to estimate the probability of a user being activated and rank all users by that probability. We empirically set the model parameters as: R = 10, α = 0.1, β = (1, 1), and γ = (1, 1).

Performance comparison. Table 2 shows the performance of RAIN and the baselines in the micro-level prediction task. Overall, all models perform unsatisfactorily, which is not surprising given the small percentage of positive instances in the data set (around 0.008%). RAIN outperforms the baselines by 32.6% in terms of MAP (mean average precision on all instances). Due to the lack of supervised information, Count performs worst on all topics. SVM generates mixed performance. It performs well on local topics (e.g., "horoscope", as people tend to be interested in posts about their own constellations), but falls short on more global topics (e.g., "movie"). This can be explained by the fact that SVM optimizes the reposting probability of each user independently by considering only her local diffusion features, while neglecting the overall mechanism behind the whole diffusion process. For IC, its performance is hindered by the over-fitting problem resulting from its large number of unknown parameters to learn. RAIN addresses this problem by allowing users with the same social role to share the same diffusion patterns, thus greatly reducing the number of model parameters.

Table 2: Performance of repost prediction on several topics.

| Topic | Method | P@10 | P@50 | P@100 | MAP |
| --- | --- | --- | --- | --- | --- |
| Campus | Count | 0.028 | 0.010 | 0.006 | 0.068 |
| | SVM | 0.098 | 0.045 | 0.032 | 0.127 |
| | IC Model | 0.231 | 0.142 | 0.102 | 0.259 |
| | RAIN | 0.228 | 0.145 | 0.106 | 0.263 |
| Horoscope | Count | 0.019 | 0.010 | 0.006 | 0.005 |
| | SVM | 0.124 | 0.162 | 0.088 | 0.263 |
| | IC Model | 0.149 | 0.111 | 0.098 | 0.125 |
| | RAIN | 0.171 | 0.121 | 0.102 | 0.130 |
| Movie | Count | 0.015 | 0.007 | 0.004 | 0.009 |
| | SVM | 0.094 | 0.111 | 0.060 | 0.199 |
| | IC Model | 0.227 | 0.147 | 0.147 | 0.236 |
| | RAIN | 0.229 | 0.173 | 0.144 | 0.238 |
| History | Count | 0.191 | 0.056 | 0.033 | 0.096 |
| | SVM | 0.154 | 0.051 | 0.030 | 0.221 |
| | IC Model | 0.206 | 0.134 | 0.135 | 0.230 |
| | RAIN | 0.225 | 0.171 | 0.134 | 0.262 |

Social role analysis. We further study how social roles influence the diffusion process of messages with different topics. To conduct this experiment, we first analyze the estimated Gaussian parameters of RAIN, which summarize the structural properties of users taking a certain role, to uncover the meaning of the latent roles learned by RAIN. For instance, a latent role with a high PageRank score is considered to represent the opinion leader role. Next, we group users into opinion leaders, structural hole spanners, and ordinary users. Finally, we use RAIN to perform per-group predictions and analyze the results. We present four more topics in this experiment: society, health, political, and travel. As Figure 3 shows, RAIN can better predict the diffusion behavior of opinion leaders and structural hole spanners, as ordinary users tend to behave more randomly. Furthermore, opinion leaders can be better predicted on more regional and specialized topics (e.g., "campus", "society", and "political"), while structural hole spanners can be better predicted on more general topics, which tend to propagate from one community to another more easily (e.g., "movie", "history", and "travel").

Figure 3: Social role analysis.

Figure 4: Diffusion scale distributions of the different topics in the test set. Surviving panel titles: (b) Horoscope, (d) History.

Figure 5: Diffusion duration distributions of the different topics in the test set. Surviving panel titles: (b) Horoscope, (d) History.

### Macro-Level Evaluation

Evaluation setting. At the macro-level, we use the fitted model to predict the scale and duration of a diffusion process. Specifically, we first trace the diffusion process of each topic by selecting all original posts relevant to that topic. Then, we evaluate how accurately RAIN can predict, for each topic, its diffusion scale, defined as the number of reposts of the original posts under that topic, and its diffusion duration, defined as the last reposting time of these posts. We use the IC model as the baseline for comparison.

Scale and duration prediction. Figs. 4(a)-(d) show the diffusion scale prediction results for the 8 different topics. The x-axis in each sub-figure denotes the number of reposts, and the y-axis denotes the proportion of original posts with a particular number of reposts. Overall, RAIN performs better, while the baseline method tends to overestimate diffusion scale. Figs. 5(a)-(d) show the diffusion duration prediction results of the two models. The x-axis in each sub-figure denotes the time interval between the posting time of an original post and its latest repost time, while the y-axis shows the proportion of original posts with a particular diffusion duration.

## Related Work

Recent years have seen extensive modeling efforts on information diffusion (Lerman and Ghosh 2010; Gomez Rodriguez, Leskovec, and Krause 2010; Leskovec et al. 2007; Sadikov et al. 2011), with the two fundamental types of models being Linear Threshold (LT) models (Granovetter 1978) and Independent Cascade (IC) models (Goldenberg, Libai, and Muller 2001).
Both types of models assume that the tendency of an inactive user to become active increases monotonically with the number of her active neighbors. However, the experiments conducted in this paper show that the probability of a user becoming active is not a simple monotonic function of the number of her active neighbors, but depends on the social roles of her followees.

Social influence and conformity is another related topic. Barbieri et al. (2013) studied social influence from a topic modeling perspective. Myers et al. (2012) considered external influence in information diffusion. In their model, information can be diffused to a node through links in the given network or through the influence of external sources. Tang et al. (2009) studied the problem of learning influence probabilities between users in social networks. Tang et al. (2013) further investigated how conformity influences users' behaviors, and Zhang et al. (2014) extended the problem with awareness of social roles. Rodriguez et al. (2013) applied survival theory to generalize some existing diffusion models into a multiplicative model. In contrast to our work, these studies focus only on the diffusion process without considering how different types of users may influence that process.

In this paper, we study a novel problem of social role-aware information diffusion, with an emphasis on understanding the interplay between users' social roles and their influence on information diffusion. We propose a social role-aware information diffusion (RAIN) model, which integrates social role extraction and diffusion modeling into a unified framework. We evaluate the proposed model on a real social media data set at both micro- and macro-levels. Compared with several alternative methods, our model shows better performance.

## Acknowledgements

Yang Yang and Jie Tang are supported by National 863 project (No. 2014AA015103), National 973 projects (No. 2014CB340506, No. 2012CB316006, No.
2011CB302302), NSFC (No. 61222212), and a research fund from Huawei Inc. Qiang Yang and Cane Leung have been supported in part by National 973 project 2014CB340304 and Hong Kong RGC Projects 621013, 620812, and 621211. Yizhou Sun is supported by Yahoo! ACE Award and NEU TIER 1 Grant.

## References

Abramowitz, M., and Stegun, I. 1970. Handbook of mathematical functions. Dover Publishing Inc. New York.

Airola, A.; Pahikkala, T.; and Salakoski, T. 2011. An improved training algorithm for the linear ranking support vector machine. In ICANN 2011, 134–141.

Barbieri, N.; Bonchi, F.; and Manco, G. 2013. Topic-aware social influence propagation models. Knowledge and information systems 37(3):555–584.

Bernardo, J. M., and Smith, A. F. 2009. Bayesian theory, volume 405. Wiley.com.

Blei, D. M.; Ng, A. Y.; and Jordan, M. I. 2003. Latent Dirichlet allocation. JMLR 3:993–1022.

Burt, R. S. 2001. Structural holes versus network closure as social capital. Social capital: Theory and research 31–56.

Burt, R. S. 2009. Structural holes: The social structure of competition. Harvard University Press.

Goldenberg, J.; Libai, B.; and Muller, E. 2001. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing letters 12(3):211–223.

Gomez Rodriguez, M.; Leskovec, J.; and Krause, A. 2010. Inferring networks of diffusion and influence. In KDD '10, 1019–1028.

Granovetter, M. 1978. Threshold models of collective behavior. American journal of sociology 83(6):1420.

Joachims, T. 2002. Optimizing search engines using clickthrough data. In KDD '02, 133–142.

Joachims, T. 2006. Training linear SVMs in linear time. In KDD '06, 217–226.

Kempe, D.; Kleinberg, J.; and Tardos, E. 2003. Maximizing the spread of influence through a social network. In KDD '03, 137–146.

Kimura, M.; Saito, K.; Ohara, K.; and Motoda, H. 2011. Learning information diffusion model in a social network for predicting influence of nodes. Intelligent Data Analysis 15(4):633–652.

Lazarsfeld, P. F.; Berelson, B.; and Gaudet, H. 1944. The people's choice: How the voter makes up his mind in a presidential election. New York: Duell, Sloan and Pearce.

Lerman, K., and Ghosh, R. 2010. Information contagion: An empirical study of the spread of news on Digg and Twitter social networks. In ICWSM '10, 90–97.

Leskovec, J.; McGlohon, M.; Faloutsos, C.; Glance, N. S.; and Hurst, M. 2007. Patterns of cascading behavior in large blog graphs. In SDM '07, 551–556.

Lou, T., and Tang, J. 2013. Mining structural hole spanners through information diffusion in social networks. In WWW '13, 825–836.

Myers, S. A.; Zhu, C.; and Leskovec, J. 2012. Information diffusion and external influence in networks. In KDD '12, 33–41.

Page, L.; Brin, S.; Motwani, R.; and Winograd, T. 1999. The PageRank citation ranking: Bringing order to the web. Technical Report SIDL-WP-1999-0120, Stanford University.

Resnik, P., and Hardisty, E. 2010. Gibbs sampling for the uninitiated. Technical report, DTIC Document.

Rodriguez, M. G.; Leskovec, J.; and Schölkopf, B. 2013. Modeling information propagation with survival theory. In ICML '13, 666–674.

Rogers, E. M. 2010. Diffusion of innovations. Simon and Schuster.

Sadikov, E.; Medina, M.; Leskovec, J.; and Garcia-Molina, H. 2011. Correcting for missing data in information cascades. In WSDM '11, 55–64.

Tang, J.; Sun, J.; Wang, C.; and Yang, Z. 2009. Social influence analysis in large-scale networks. In KDD '09, 807–816.

Tang, J.; Wu, S.; and Sun, J. 2013. Confluence: Conformity influence in large social networks. In KDD '13, 347–355.

Wasserman, S., and Faust, K. 1994. Social Network Analysis: Methods and Applications. Cambridge University Press. chapter 9.

Wu, S.; Hofman, J. M.; Mason, W. A.; and Watts, D. J. 2011. Who says what to whom on Twitter. In WWW '11, 705–714.

Yang, J., and Leskovec, J. 2010. Modeling information diffusion in implicit networks. In ICDM '10, 599–608.

Yang, Y.; Jia, J.; Zhang, S.; Wu, B.; Li, J.; and Tang, J. 2014. How do your friends on social media disclose your emotions? In AAAI '14, 1–7.

Zhang, J.; Liu, B.; Tang, J.; Chen, T.; and Li, J. 2013. Social influence locality for modeling retweeting behaviors. In AAAI '13, 2761–2767.

Zhang, J.; Tang, J.; Zhuang, H.; Leung, C. W.-K.; and Li, J. 2014. Role-aware conformity influence modeling and analysis in social networks. In AAAI '14, 958–965.