# learning_diffusions_without_timestamps__439c9a00.pdf The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19) Learning Diffusions without Timestamps Hao Huang,1 Qian Yan,1 Ting Gan,1 Di Niu,2 Wei Lu,3 Yunjun Gao4 1School of Computer Science, Wuhan University, China 2Department of Electrical and Computer Engineering, University of Alberta, Canada 3School of Information and DEKE, MOE, Renmin University of China, China 4College of Computer Science and Technology, Zhejiang University, China {haohuang, qy, ganting}@whu.edu.cn, dniu@ualberta.ca, lu-wei@ruc.edu.cn, gaoyj@zju.edu.cn To learn the underlying parent-child influence relationships between nodes in a diffusion network, most existing approaches require timestamps that pinpoint the exact time when node infections occur in historical diffusion processes. In many real-world diffusion processes like the spread of epidemics, monitoring such infection temporal information is often expensive and difficult. In this work, we study how to carry out diffusion network inference without infection timestamps, using only the final infection statuses of nodes in each historical diffusion process, which are more readily accessible in practice. Our main result is a probabilistic model that can find for each node an appropriate number of most probable parent nodes, who are most likely to have generated the historical infection results of the node. Extensive experiments on both synthetic and real-world networks are conducted, and the results verify the effectiveness and efficiency of our approach. Introduction In real life, many underlying influence relationships among people form various diffusion networks, spreading different contents such as information, viewpoints, or even viruses. Diffusion network inference aims to uncover these influence relationships based on the spread results observed in historical diffusion processes. Therefore, this problem is fundamental in many applications such as information propagation (He et al. 2015), viral marketing (Leskovec, Adamic, and Huberman 2007), and epidemic prevention (Wallinga and Teunis 2004), in which the inferred influence relationships can intuitively illustrate the latent diffusion paths, and help users to better predict, promote or prevent future diffusion events. Most existing approaches to diffusion network inference are based on the following basic ideas: nodes that are infected sequentially within a time interval are assumed to have influence relationships, and the previously infected ones are regarded as potential parent nodes of the subsequently infected ones. Hence, these approaches require the observed spread results used by them (known as cascades) to include the exact infection timestamps of the infected nodes in each diffusion process. To infer diffusion networks Copyright c 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. with given cascades, a major method is adopting a convex optimization framework to find influence relationships that maximize the likelihood of the cascades, and utilizing techniques, such as sequential quadratic programming (Myers and Leskovec 2010; Gomez-Rodriguez, Balduzzi, and Sch olkopf 2011), block coordinate descent (Du et al. 2012), stochastic and proximal gradient methods (Gomez Rodriguez, Leskovec, and Sch olkopf 2013b; Daneshmand et al. 2014), survival theory (Gomez-Rodriguez, Leskovec, and Sch olkopf 2013a), EM algorithm (Wang et al. 2014; Rong, Zhu, and Cheng 2016), and sparse recovery (Pouget-Abadie and Horel 2015), to approximate the optimal solution. While several other approaches adopt non-convex optimization frameworks, each of them finally decouples the non-convex problem into multiple smaller convex problems (Netrapalli and Sanghavi 2012; Narasimhan, Parkes, and Singer 2015; Kalimeris et al. 2018). Another effective method is transforming the problem of diffusion network inference into that of submodular optimization (Gomez-Rodriguez, Leskovec, and Krause 2010; Gomez-Rodriguez and Sch olkopf 2012) by constructing a likelihood function of cascades with the property of submodularity. Then, a near-optimal solution of the submodular optimization problem can be achieved by applying greedy algorithm. It has been validated that with adequate amount of complete and correct cascades, the influence relationships can be accurately recovered, even using some simple inference approaches (Abrahao, Chierichetti, and Kleinberg 2013). In addition, some more sophisticated approaches are proposed to handle the case that cascades have partial incorrect infection timestamps (Sefer and Kingsford 2015) or miss partial snapshots of the network (He et al. 2016; Lokhov 2016). Although the cascade-based approaches have shown their efficacy on diffusion network inference, acquiring the cascades are often expensive in many real-world diffusion processes, such as the spread of epidemics and the viral marketing campaigns. This is because, unlike tracing the infection timestamps of nodes in online social networks (e.g., Facebook and Weibo), monitoring the infections of nodes to obtain the cascades through these real-world diffusion processes is labor/resource demanding and time consuming. Furthermore, due to a few unavoidable objective factors such as a long incubation period (i.e., the time from infection to illness), the observed cascades may not reflect the exact occurrence time of infections. To avoid the limitation of cascade-based approaches, new techniques are required to learn diffusions without the infection timestamps. To the best of our knowledge, only two existing works have partially addressed this problem by learning either from path traces (referred to as PATH henceforth) or from the seed and resulting sets of infected nodes (referred to as S2R henceforth). In PATH (Gripon and Rabbat 2013), the learner is assumed to have all pathconnected triples, i.e., three nodes that are activated along a diffusion path through the network. Although PATH has a solid mathematical foundation, the triples are not always naturally observable in practice. Even if complete and correct cascades are available, inferring exact triples is still challenging. S2R (Amin, Heidari, and Kearns 2014) calculates the lifting effect of each seed node u to another infected node v, which measures the increase in the probability of v s infection on the condition that u is previously infected. Then, S2R adds a directed edge (i.e., an influence relationship) from u to v, if u has the greatest current lifting effect to v. It s worth noting that if there is no priori knowledge on the number of edges in the network, S2R will keep adding edges until all nodes are connected. Aiming at a more general solution to diffusion learning, in this paper, we study the problem of how to infer diffusion networks with only the final infection statuses of the nodes in historical diffusion processes, which are more readily accessible in practice. We propose an effective and efficient algorithm called TWIND (Diffusion Network Inference Without Timestamps) for this problem. TWIND reveals potential parent-child influence relationships by finding for each node a set of most probable parent nodes, who are most likely to have generated the observed infection results. To this end, we present a probabilistic model to quantify the possibility of inferred influence relationships given the observed final infection statuses. Based on the model, we can also theoretically derive an upper limit on the amount of most probable parent nodes for each node in the network, which helps TWIND to prevent its inferred diffusion network from containing too many low-probability influence relationships not in the original diffusion network. Furthermore, to reduce redundant computation during the execution of TWIND, we disqualify the insignificant candidate parent nodes whose infections have rather weak correlations with the infections of the corresponding child nodes, and exclude them from the selection of most probable parent nodes. In summary, our key contributions include the following: (1) We propose a new infection timestamp-free approach for diffusion network inference. To execute this approach, there is no need to monitor the infection timestamps of nodes through each diffusion process, or to worry about the correctness of observed infection timestamps. Except for the final infection statuses of nodes, the approach does not need any other information of infections or priori knowledge on the network. (2) We theoretically guarantee that TWIND will find for each node an limited number of most probable parent nodes, avoiding an overly complex inferred network. In what follows, we first present our problem statement, and then elaborate our proposed TWIND algorithm, followed by reporting experimental results and our findings before concluding the paper. Problem Statement A diffusion network can be represented as a directed graph G = {V, E}, where V = {v1, v2, ..., vn} is the set of n nodes in the network, and E is the set of m directed edges (i.e., influence relationships) between the nodes. A directed edge from a parent node vi to a child node vj indicates that when vi is infected and vj is uninfected, vi will successfully infect vj with a certain probability (which can be regarded as the edge weight of this directed edge). As a few existing approaches have presented how to calculate the edge weight based on observed infection status results for a specified edge (Yan et al. 2017), in this paper, we focus on inferring the unknown directed edge set of the objective network. Our problem can be formulated as follows. Given: a set S = {S1, ..., Sβ} of infection status results observed on a diffusion network G in β diffusion processes, where Sℓ= {sℓ 1, ..., sℓ n} is a n-dimensional vector that records the final infection status sℓ i {0, 1} (1 for infected status and 0 for uninfected status) of each node vi V observed in the ℓ-th diffusion process (ℓ {1, . . . , β}). Infer: the edge set E of the diffusion network G. The TWIND Algorithm In this section, we first introduce how to identify the most probable parent nodes for each node in the network, followed by a theoretical analysis of the upper limit on the amount of most probable parent nodes, and then we present how to reduce redundant computation during the identification of most probable parent nodes before giving the detailed steps of our TWIND algorithm. We conclude this section with a complexity analysis on our approach. Identification of Most Probable Parent Nodes Let matrix A Rn n be a network structure variable, of which each element Aij {0, 1} (i, j {1, . . . , n}) indicating whether there is a directed edge from node vi to node vj (1 for yes, 0 for no), diffusion network inference using infection status results S is equivalent to finding a optimal A that maximizes the following conditional probability: P(A | S) = P(A, S) A Q P(A , S) (1) where set Q is the set of all possible matrices of A, and the value of A Q P(A , S) is a certain constant. Maximizing probability P(A | S) is equivalent to maximizing the joint probability P(A, S), which can be calculated as follows. B P(S|A, B)f(B|A)P(A)d B B P(S|A, B)f(B|A)d B (2) where B is a n n block matrix related to A. If Aij = 0, then Bij is a 2 2 zero matrix; if Aij = 1, then Bij is a 2 2 nonnegative matrix, of which each element refers to a conditional probability P(Xj | Xi) 0, where Xi {0, 1} and Xj {0, 1} are the infection status variables of nodes vi and vj, respectively. Since each historical diffusion process is independent to each other, each Sℓis generated independently. Moreover, since the infection of each node can be only affected by its parent nodes during each diffusion process, the relationship P(X1, . . . , Xn) = n i=1 P(Xi | XFi) holds, where Fi is the parent node set of node vi, and XFi is the infection status variables of the parent nodes of vi. Then, we can reformulate the probability P(S|A, B) in Eq. (2) as follows. P(S|A, B) = β ℓ=1 P(Sℓ| A, B) ℓ=1 P(X1 = sℓ 1, . . . , Xn = sℓ n | A, B) i=1 P(Xi = sℓ i | XFi = πℓ i, B) k=1 P(Xi = sk | XFi = πij, B)Nijk k=1 θNijk ijk where πℓ i refers to the infection statuses of vi s parent nodes in the ℓ-th diffusion process, sk {0, 1} is the k-th possible infection status of a node (without loss of generality, s1 = 0, s2 = 1), 2|Fi| refers to the number of all possible combinations of the infection statuses of vi s parent nodes, πij is the corresponding j-th possible combination, Nijk is the number of times situation Xi = sk XFi = πij appears in S, θijk is equal to P(Xi = sk | XFi = πij, B), and vi, 2|Fi| j=1 2 k=1 Nijk = β, θij1 + θij2 = 1. Let f(θij1, θij2) denote the probability density function of (θij1, θij2). Since there is no correlation between the influences of different parent nodes to a same child node (or a same parent node to different child nodes), relationship f(B|A) = n i=1 2|Fi| j=1 f(θij1, θij2) holds. Combining it with Eq. (3), we can reformulate the calculation of probability P(A, S) as follows. ( n i=1 2|Fi| j=1 2 k=1 θNijk ijk ) ( n i=1 2|Fi| j=1 f(θij1, θij2) ) ( 2 k=1 θNijk ijk f(θij1, θij2) ) dθij1dθij2 (4) As there is no prior knowledge on the value of θijk, we are indifferent to regard every possible value of θijk. In other words, we assume that distribution f(θij1, θij2) is uniform, for 1 i n, 1 j 2|Fi|. Then, the value of f(θij1, θij2), denoted as cij, is a constant. According to the property of probability density function, we have f(θij1, θij2)dθij1dθij2 cijdθij1dθij2 = 1 (5) According to Dirichlet s integral, we have dθij1dθij2 = 1 (2 1)! = 1 (6) k=1 θNijk ijk = Nij1! Nij2! (Nij1 + Nij2 + 2 1)! Combining Eqs. (5) & (6), we have f(θij1, θij2) = cij = 1. Combining this conclusion with Eq. (7), we can finally formulate the calculation of probability P(A, S) as follows. P(A, S) = P(A) Nij1! Nij2! (Nij1 + Nij2 + 1)! (8) To estimate P(A, S), we need a prior probability P(A) for each possible network structure, and the numbers Nij1 and Nij2 determined by each node vi and its parent nodes Fi in corresponding network structure. Nij1 and Nij2 can be counted from the observed infection status results S, while there is no prior knowledge on the network structure to estimate P(A). Furthermore, as there are 2n(n 1) possible network structures, it is not feasible to apply Eq. (8) for each possible network structure when n is large. Therefore, we equally treat each possible network structure by assuming equal priors on A, i.e., the prior probability P(A) is equal to a certain constant. Then, to maximize P(A, S), what we need to do is finding for each node vi a optimal parent node set Fi that maximizes the following scoring function. g(vi, Fi) = log Nij1! Nij2! (Nij1 + Nij2 + 1)! ( log Nij1! + log Nij2! log(Nij1 + Nij2 + 1)! where the base of log is 2. Then, the nodes in this optimal Fi are regarded as the most probable parent nodes of vi. Given the above scoring function, we can utilize greedy search to find the most probable parent nodes for vi. It starts from an empty parent node set Fi, and expands the set Fi by incrementally adding a node combination (i.e., a subset of V \{vi}) that can most increase the value of current g(vi, Fi) until this value dose not increase. In this way, we can efficiently achieve a local optimal solution of Fi. A similar greedy-search strategy is commonly used in many other applications, such as influence maximization (Tang, Xiao, and Shi 2014) and classification (Huang et al. 2014), due to its high efficiency and nice search performance. Upper Limit on Amount of Parent Nodes At the beginning of the greedy search for most probable parent nodes of a node vi, Fi = and current g(vi, Fi) can be calculated as follows. g(vi, ) = 2 k=1 log Nik! log(β + 1)! (10) where Nik is the number of times situation Xi = sk appears in S, and 2 k=1 Nik = β. Since ( N e )N < N! < e( N 2 )N always holds for any positive integer N, we can deduce a lower bound on the value of g(vi, ) as follows. k=1 log (Nik )Nik log e (β + 1 k=1 Nik log Nik e log e (β + 1) log β + 1 k=1 Nik log β (β + 1) log β + 1 e βH(Xi) log e (β + 1) log β + 1 where H(Xi) is the entropy of variable Xi. When the greedy search method adds a few nodes into set Fi (i.e., Fi = ), the following inequality should hold. g(vi, Fi) > g(vi, ) (12) Moreover, although there are 2|Fi| possible combinations of the infection statuses of nodes in Fi, some combinations may not have instances in the observed infection status results S. We denote the number of these non-existent combinations as δi. It can be obtained by checking how many of the 2|Fi| possible combinations have instances in S. As each of these existent combinations has at least one instance in S, i.e., Nij1+Nij2 1, we can have relationship 2|Fi| j=1 log(Nij1 + Nij2 + 1) 2|Fi| δi j=1 log(1 + 1). In addition, given the fact that N! N N, we can deduce a upper bound on the value of g(vi, Fi) as follows. k=1 log Nijk Nijk ) k=1 Nijk log ( β Nijk ) (2|Fi| δi) k=1 Nijk log β 2|Fi| + δi = β log β 2|Fi| + δi βH(Xi, XFi) (13) where H(Xi, XFi) is the entropy of variables Xi and XFi. Combining Eqs. (11) (13), we have relationship β log β βH(Xi, XFi) 2|Fi| + δi e βH(Xi) log e (β + 1) log β + 1 which can be translated as 2|Fi| < (β + 1) log ( eβ + 1 ) + δi βH(XFi|Xi) (15) where H(XFi|Xi) = H(Xi, XFi) H(Xi) is the entropy of XFi conditioned on Xi . As relationship H(XFi|Xi) 0 always holds, we have |Fi| < log ( (β + 1) log ( eβ + 1 Therefore, by using the greedy search method to find a set Fi of most probable parent nodes for a node vi, the upper limit η for the set size of Fi is log ( (β + 1) log ( e β+1 2 ) + δi ) . In practice, if there is enough historical data logged in S, i.e., β δi, we can adopt a fast estimation on the value of η by using η = log ( (β + 1) log ( e β+1 2 )) . Based on this η, each possible node combination that has the chance to be added into current Fi should satisfy the condition that when this node combination is added into current Fi, the size of new Fi will not be greater than η. Pruning of Candidate Parent Nodes In a given diffusion network with a node set V , each node vj V \{vi} could be a candidate parent node for node vi V , resulting in η i=1 ( i n 1 ) possible parent node combinations for vi, where n is the number of nodes in V . To avoid redundant computation, we should prune the candidate parent nodes to reduce the number of possible parent node combinations for each node vi in the network. Given the fact that the infections of nodes are affected by their parent nodes, the infection statuses of the parent nodes and corresponding child nodes should have correlation. In other words, if the infection statuses of two nodes are independent or have extremely low correlation to each other, there is a very low probability that these nodes have influence relationship between them. To quantify the correlation between two variables, mutual information (abbreviated as MI) is a commonly used criterion and can be calculated as MI(Xi, Xj) = P(Xi, Xj) log P(Xi, Xj) P(Xi)P(Xj). (17) A greater value of MI(Xi, Xj) indicates a stronger correlation between the infection statuses of nodes vi and vj. Furthermore, in a real-world diffusion network, each node vi often has a finite number of parent nodes. Many other nodes in this network do not have influence relationships to vi, and their infection statuses often have no (or very low) correlations to the infection status of vi, resulting in very small MI values (close to 0). These very small MI values form a compact cluster with a very small mean (close to 0). Algorithm 1: The TWIND Algorithm Input : Node set V = {v1, . . . , vn}, infection status results S = {S1, . . . , Sβ} observed on V . Output: The diffusion network G = {V, E}. 1 E = ; // the set of directed edge 2 vi V, vj V , calculate MI(Xi, Xj) by Eq. (17); 3 Partition all MI values into two groups by K-means (with K = 2 and one mean fixed to 0), and set τ to the greatest MI value in the group with mean closer to 0; 4 η = log ( (β + 1) log ( e β+1 2 )) ; //|Fi| s upper limit 5 for each vi V do 6 Fi = ; // inferred parent node set of vi 7 Pi = ; // candidate parent node set of vi 8 Ci = ;//set of possible parent node combinations 9 for each vj V (j = i) do 10 if MI(Xi, Xj) > τ then 11 Pi = Pi {vj}; // merge vj into Pi 12 for each W Pi, |W| η do 13 Calculate g (vi, W) by Eq. (9); 14 Ci = {Ci, W}; // add a new element W to Ci 15 while Ci = do 16 W = arg max W Ci g (vi, W); 17 if |Fi W | η then 18 Fi = Fi W ; 19 Ci = Ci\W ; 20 E = {(vj, vi)|vj Fi} E; // (vj, vi) is directed Inspired by this kind of situations, we can carry out a heuristic pruning method to screen out insignificant candidate parent nodes for each node. As the very small MI values form a compact cluster with a mean close to 0, we can execute K-means with K = 2 and fix one of the two means to 0 through all the iterations of K-means. This modified Kmeans algorithm can efficiently partition all MI values into two groups, in which one group has a mean very close to 0. Let τ be the greatest MI value in the group with a mean closer to 0, then for each MI(Xi, Xj) τ, we regard the corresponding node vj as an insignificant candidate parent node for vi, and exclude it from the candidate parent node set of vi, since the very small MI value means that there is a high probability that vj has no influence relationship to vi. To infer a diffusion network with observed infection status results S, we introduce how to identify the most probable parent nodes for each node in the network, deduce an upper limit on the amount of most probable parent nodes, and present a heuristic pruning method to help avoid redundant computation during the identification of most probable parent nodes. Based on these preparing work, we propose an algorithm called TWIND, which is outlined in Algorithm 1. TWIND takes as inputs node set V of the objective diffusion network G and a set S of infection status results observed on V in β diffusion processes, and consists of two main phases, i.e., (1) the phase of pruning candidate parent nodes, which calculates the MI value for each node pair by Eq. (17) (lines 2), and performs K-means to select candidate parent nodes with greater MI values (lines 3, 9 11), and (2) the phase of greedy search for the parent node set Fi of each node vi (lines 12 20), which first traverses all possible parent node combinations and calculates corresponding scores by scoring function proposed in Eq. (9) (lines 12 14), and then continuously expands the parent node set Fi with the highest scored parent node combinations until the size of Fi is equal to the upper bound η or there is no more possible parent node combination (lines 15 19). Finally, a directed edge from each node in Fi to vi will be added into the inferred edge set E of the objective diffusion network G (line 20). Complexity Analysis In TWIND, the most computationally expensive process consists of the following two parts. (1) In the phase of pruning candidate parent nodes, calculating MI values requires O(βn2) time, and performing K-means on these MI values takes O(tn2) time, where n is the number of nodes in the network, β is the number of historical diffusion processes, and t is the number of iterations of K-means (t n). (2) Since there are at most η i=1 (i κ ) ηκη possible parent node combinations for each node, scoring each possible parent node combination takes at most O(βη) time, scoring all of them takes at most O(η2κηnβ) time, where η n is the upper-bound of parent node set size, and κ is the number of candidate parent nodes for each node. As the candidate parent nodes are pruned by our proposed heuristic method, κ is usually much less than n, i.e., κ n. In summary, the overall time complexity of TWIND is about O(βn2 + tn2 + η2κηnβ), where t n, η n, and κ n. Therefore, the runtime of TWIND mainly depends on the network size and the number of diffusion processes. Experimental Evaluation In this section, we first introduce the experimental setup, and then verify the effectiveness and efficiency of our TWIND algorithm on synthetic and real-world networks. To this end, we investigate the effects of diffusion network size, diffusion network s average degree, initial infection ratio, and the amount of diffusion processes on the accuracy performance and runtime of TWIND. All algorithms in the experiments are implemented in Java, running on a desktop PC with Intel Core i3-6100 CPU at 3.70GHz and 8GB RAM. Experimental Setup Network. We adopt LFR benchmark graphs (Lancichinetti, Fortunato, and Radicchi 2008) as the synthetic networks. By setting different generation parameters, such as the number of nodes and the average degree of each node, we generate two series of LFR benchmark graphs with properties summarized in Table 1. In addition, we adopt two real-world networks, i.e., Net Sci (Newman 2006) which Table 1: Properties of LFR benchmark graphs Graphs Number of Nodes Average Degree LFR1-5 100,150,200,250,300 4 LFR6-10 200 2,3,4,5,6 is a coauthorship network containing 379 scientists and 1602 coauthorships, and DUNF (Wang et al. 2014) which is a microblogging network containing 750 users and 2974 following relationships. Infection Data. The infection status results S can be obtained by simulating β times of diffusion processes on each network with randomly selected initially infected nodes in each simulation (α denotes the initial infection ratio). Corresponding cascades are also recorded for cascade-based tested algorithms in the experiments. In each diffusion process, each infected node tries to infect its uninfected child nodes with a transmission rate, which subjects to a Gaussian distribution with a mean of 0.3 and a standard deviation of 0.05, to make about 95% of transmission rate values are within a range from 0.2 to 0.4. Performance Criterion. To evaluate the accuracy performance of TWIND algorithm, we report the F-score (i.e., the harmonic mean of precision and recall) of its inferred directed edges, which can be calculated as follows. Precision = NT P NT P + NF P , Recall = NT P NT P + NF N , F-score = 2 Precision Recall Precision + Recall , where NT P refers to the number of true positives, i.e., the true edges which are correctly inferred by the algorithm; NF P refers to the number of false positives, i.e., the wrong inferred edges which are not in the real network; and NF N refers to the number of false negatives, i.e., the true edges which are not correctly inferred by the algorithm. Benchmark Algorithms. We compare our algorithm with a classical convex programming-based approach Net Rate (Gomez-Rodriguez, Balduzzi, and Sch olkopf 2011), a state-of-the-art non-convex programming-based approach using hyper-parameters (referred to as Hyper henceforth) (Kalimeris et al. 2018), a high performance submodularity-based approach Mul Tree (Gomez-Rodriguez and Sch olkopf 2012), and an efficient infection timestampfree approach S2R (Amin, Heidari, and Kearns 2014) for performance comparison. Since Net Rate infers the transmission rate between each two node in the network, we give Net Rate a privilege in accuracy performance comparison, i.e., by calculating the F-score of edges whose transmission rates are greater than a threshold, we use different thresholds to find a highest F-score and report this F-score as the final accuracy performance of Net Rate. Moreover, since Mul Tree and S2R need users to specify the number of edges to be inferred, we use the real number m of edges in the network as an input of these two algorithms. 100 150 200 250 300 0.1 Network Size S2R Net Rate Hyper Mul Tree (a) F-score 100 150 200 250 300 Network Size Runtime (seconds) S2R Net Rate Hyper Mul Tree TWIND (b) Runtime Figure 1: Effect of diffusion network size 2 3 4 5 6 0 Average Degree of Network S2R Net Rate Hyper Mul Tree TWIND (a) F-score 2 3 4 5 6 10 2 Average Degree of Network Runtime (seconds) S2R Net Rate Hyper Mul Tree TWIND (b) Runtime Figure 2: Effect of average degree of diffusion network Effect of Diffusion Network Size To study the effect of diffusion network size on algorithm performance, we adopt five synthetic networks, i.e., LFR1 5, of which the sizes vary from 100 to 300. We simulate 150 times of diffusion processes on each network (i.e., β = 150). In each simulation, 0.15n nodes are randomly selected as the initial infected nodes (i.e., α = 0.15). Fig. 1 illustrates the F-score and runtime of each tested algorithm, from which we can observe that (1) a greater diffusion network size tends to degrade the accuracy performance of Net Rate, Hyper, Mul Tree and S2R, while the accuracy performance of TWIND is reasonably insensitive to diffusion network size and outperforms that of the others. (2) The runtime of each tested algorithm increases with the growth of diffusion network size. S2R executes the fastest (but with a low accuracy performance), and TWIND is reasonably more efficient than Net Rate, Hyper and Mul Tree. Effect of Average Degree of Diffusion Network To study the effect of diffusion network s average degree on algorithm performance, we test the algorithms on five synthetic networks, i.e., LFR6 10, of which the average degrees vary from 2 to 6. We simulate 150 times of diffusion processes on each network (i.e., β = 150). In each simulation, 0.15n nodes are randomly selected as the initial infected nodes (i.e., α = 0.15). Fig. 2 illustrates the F-score and runtime of each algorithm, from which we can observe that (1) diffusion networks with greater average degrees degrade the accuracy performance of Hyper, Mul Tree, S2R and TWIND. The accuracy performance of Net Rate increases when the average degree increases from 2 to 5, and decreases after the 0.05 0.1 0.15 0.2 0.25 0 Initial Infection Ratio S2R Net Rate Hyper Mul Tree TWIND (a) F-score 0.05 0.1 0.15 0.2 0.25 10 2 Initial Infection Ratio Runtime (seconds) S2R Net Rate Hyper Mul Tree TWIND (b) Runtime Figure 3: Effect of initial infection ratio on Net Sci 0.05 0.1 0.15 0.2 0.25 0 Initial Infection Ratio S2R Net Rate Hyper Mul Tree TWIND (a) F-score 0.05 0.1 0.15 0.2 0.25 Initial Infection Ratio Runtime (seconds) S2R Net Rate Hyper Mul Tree TWIND (b) Runtime Figure 4: Effect of initial infection ratio on DUNF average degree exceeds 5. Compared with the other tested algorithms, our TWIND algorithm has a reasonably better accuracy performance. (2) The runtimes of Net Rate, Hyper, Mul Tree, S2R and TWIND increase with the growth of average degree, and TWIND shows a significant advantage on efficiency performance over Net Rate, Hyper and Mul Tree. Effect of Initial Infection Ratio The ratio of initially infected nodes may affect the number of final infected nodes. To study the effect of initial infection ratio on algorithm performance, we test the algorithms on two real-world networks Net Sci and DUNF with different initial infection ratios α (vary from 0.05 to 0.25). For each initial infection ratio, we simulate 150 times of diffusion processes on each network (i.e., β = 150). Figs. 3 & 4 illustrate the F-score and runtime of each algorithm on Net Sci and DUNF, respectively. From the figures we can observe that (1) a greater initial infection ratio tends to improve the accuracy performance of Mul Tree, while degrading the accuracy performance of Net Rate, Hyper and S2R. TWIND is reasonably insensitive to initial infection ratio and has better accuracy performance. (2) The increase of initial infection ratio has little effect on the runtime of Net Rate, Hyper, S2R and TWIND, but results in more runtime for Mul Tree. Similar results can also be observed on synthetic networks LRF1 10. Effect of Amount of Diffusion Processes The inference of diffusion network is based on the observed diffusion results of diffusion processes. Hence, the amount of diffusion processes may affect the accuracy performance of diffusion network inference. Generally, more diffusion 50 100 150 200 250 Number of Diffusion Processes S2R Net Rate Hyper Mul Tree TWIND (a) F-score 50 100 150 200 250 Number of Diffusion Processes Runtime (seconds) S2R Net Rate Hyper Mul Tree TWIND (b) Runtime Figure 5: Effect of number of diffusion processes on Net Sci 50 100 150 200 250 Number of Diffusion Processes S2R Net Rate Hyper Mul Tree TWIND (a) F-score 50 100 150 200 250 Number of Diffusion Processes Runtime (seconds) S2R Net Rate Hyper Mul Tree TWIND (b) Runtime Figure 6: Effect of number of diffusion processes on DUNF processes will contain more information about diffusion network, and help the diffusion network inference algorithms to achieve more accurate inference results. To study the effect of the amount of diffusion processes on algorithm performance, we test the algorithms on two real-world networks Net Sci and DUNF with different number β of diffusion processes (β varies from 50 to 250). In each diffusion process, we randomly selected 0.15n nodes as the initial infection nodes (α = 0.15). Figs. 5 & 6 illustrate the F-score and runtime of each algorithm on Net Sci and DUNF, respectively. From the figures we can observe that (1) a greater amount of diffusion processes often helps the tested algorithms to achieve more accurate results on diffusion network structure inference. TWIND often has a better accuracy performance compared with the other tested algorithms. (2) To analyze the infection status results observed from more diffusion processes, the tested algorithms often require more runtime. Compared with Net Rate, Hyper and Mul Tree, TWIND shows a significant advantage on efficiency performance. Similar results can also be observed on synthetic networks LRF1 10. Conclusion In this paper, we have investigated the problem of how to infer diffusion networks using only the infection statuses of nodes observed in historical diffusion processes. Towards this, we have proposed a probabilistic model to identify most probable parent nodes for each node in the objective network, and theoretically driven a upper limit on the amount of each node s most probable parent nodes. Furthermore, we have also presented a heuristic pruning method for candidate parent nodes to reduce redundant computation during the identification of the most probable parent nodes. Extensive experiments on both synthetic and real-world networks have been conducted, and the results have verified the effectiveness and efficiency of our approach. Acknowledgments This research was supported partly by the NSFC Grants 61502347, 61502504 and 61522208, the Technological Innovation Major Projects of Hubei Province under Grant No. 2017AAA125, Beijing Municipal Science and Technology Projects under Grant No. Z171100005117002, and the Science and Technology Program of Wuhan City under Grant No. 2018010401011288, in which Di s work was supported in part by the NSERC Canada under the grant CRDPJ 479555 Niu. Wei Lu is the corresponding author. Abrahao, B.; Chierichetti, F.; and Kleinberg, R. 2013. Trace complexity of network inference. In KDD 2013, 491 499. Amin, K.; Heidari, H.; and Kearns, M. 2014. Learning from contagion(without timestamps). In ICML 2014, 1845 1853. Daneshmand, H.; Gomez-Rodriguez, M.; Song, L.; and Sch olkopf, B. 2014. Estimating diffusion network structures: Recovery conditions, sample complexity & softthresholding algorithm. In ICML 2014, 793 801. Du, N.; Song, L.; Smola, A.; and Yuan, M. 2012. Learning networks of heterogeneous influence. In NIPS 2012, 2780 2788. Gomez-Rodriguez, M., and Sch olkopf, B. 2012. Submodular inference of diffusion networks from multiple trees. In ICML 2012, 489 496. Gomez-Rodriguez, M.; Balduzzi, D.; and Sch olkopf, B. 2011. Uncovering the temporal dynamics of diffusion networks. In ICML 2011, 561 568. Gomez-Rodriguez, M.; Leskovec, J.; and Krause, A. 2010. Inferring networks of diffusion and influence. In KDD 2010, 1019 1028. Gomez-Rodriguez, M.; Leskovec, J.; and Sch olkopf, B. 2013a. Modeling information propagation with survival theory. In ICML 2013, 666 674. Gomez-Rodriguez, M.; Leskovec, J.; and Sch olkopf, B. 2013b. Structure and dynamics of information pathways in online media. In WSDM 2013, 23 32. Gripon, V., and Rabbat, M. 2013. Reconstructing a graph from path traces. In ISIT 2013, 2488 2492. He, X.; Rekatsinas, T.; Foulds, J.; Getoor, L.; and Liu, Y. 2015. Hawkestopic: A joint model for network inference and topic modeling from text-based cascades. In ICML 2015, 871 880. He, X.; Xu, K.; Kempe, D.; and Liu, Y. 2016. Learning influence functions from incomplete observations. In NIPS 2016, 2065 2073. Huang, H.; Chiew, K.; Gao, Y.; Q, H.; and Li, Q. 2014. Rare category exploration. Expert Systems with Applications 41(9):4197 4210. Kalimeris, D.; Singer, Y.; Subbian, K.; and Weinsberg, U. 2018. Learning diffusion using hyperparameters. In ICML 2018, 2420 2428. Lancichinetti, A.; Fortunato, S.; and Radicchi, F. 2008. Benchmark graphs for testing community detection algorithms. Physical Review E 78(4). Leskovec, J.; Adamic, L. A.; and Huberman, B. A. 2007. The dynamics of viral marketing. ACM Transactions on the Web 1(1):5. Lokhov, A. 2016. Reconstructing parameters of spreading models from partial observations. In NIPS 2016, 3467 3475. Myers, S., and Leskovec, J. 2010. On the convexity of latent social network inference. In NIPS 2010, 1741 1749. Narasimhan, H.; Parkes, D. C.; and Singer, Y. 2015. Learnability of influence in networks. In NIPS 2015, 3186 3194. Netrapalli, P., and Sanghavi, S. 2012. Learning the graph of epidemic cascades. In SIGMETRICS 2012, 211 222. Newman, M. E. J. 2006. Finding community structure in networks using the eigenvectors of matrices. Physical Review E 74(3):036104. Pouget-Abadie, J., and Horel, T. 2015. Inferring graphs from cascades: A sparse recovery framework. In ICML 2015, 977 986. Rong, Y.; Zhu, Q.; and Cheng, H. 2016. A model-free approach to infer the diffusion network from event cascade. In CIKM 2016, 1653 1662. Sefer, E., and Kingsford, C. 2015. Convex risk minimization to infer networks from probabilistic diffusion data at multiple scales. In ICDE 2015, 663 674. Tang, Y.; Xiao, X.; and Shi, Y. 2014. Influence maximization: Near-optimal time complexity meets practical efficiency. In SIGMOD 2014, 75 86. Wallinga, J., and Teunis, P. 2004. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. American Journal of Epidemiology 160(6):509 516. Wang, S.; Hu, X.; Yu, P.; and Li, Z. 2014. MMRate: Inferring multi-aspect diffusion networks with multi-pattern cascades. In KDD 2014, 1246 1255. Yan, Q.; Huang, H.; Gao, Y.; Lu, W.; and He, Q. 2017. Group-level influence maximization with budget constraint. In DASFAA 2017, 625 641.