# fragmentation_coagulation_based_mixed_membership_stochastic_blockmodel__5d2a5a03.pdf

The Thirty-Fourth AAAI Conference on Artiﬁcial Intelligence (AAAI-20)

Fragmentation Coagulation Based Mixed Membership Stochastic Blockmodel

Zheng Yu,1 Xuhui Fan,2 Marcin Pietrasik,1 Marek Reformat1,3

1Department of Electrical and Computer Engineering, University of Alberta 2School of Mathematics & Statistics, University of New South Wales 3Information Technology Institute, University of Social Sciences, Poland {zy3, pietrasi, reformat}@ualberta.ca, xuhui.fan@unsw.edu.au

The Mixed-Membership Stochastic Blockmodel (MMSB) is proposed as one of the state-of-the-art Bayesian relational methods suitable for learning the complex hidden structure underlying the network data. However, the current formulation of MMSB suffers from the following two issues: (1), the prior information (e.g. entities community structural information) can not be well embedded in the modelling; (2), community evolution can not be well described in the literature. Therefore, we propose a non-parametric fragmentation coagulation based Mixed Membership Stochastic Blockmodel (fc MMSB). Our model performs entity-based clustering to capture the community information for entities and linkagebased clustering to derive the group information for links simultaneously. Besides, the proposed model infers the network structure and models community evolution, manifested by appearances and disappearances of communities, using the discrete fragmentation coagulation process (DFCP). By integrating the community structure with the group compatibility matrix we derive a generalized version of MMSB. An efﬁcient Gibbs sampling scheme with Polya Gamma (PG) approach is implemented for posterior inference. We validate our model on synthetic and real world data.

Introduction Analysis of complex networks is an important research topic leading to a variety of useful applications. To this end, many interesting and promising approaches have been proposed to address various challenges in investigating these complex networks. The Mixed-Membership Stochastic Blockmodel (MMSB) (Airoldi et al. 2008) is one such state-ofthe-art model in using Bayesian methods to discover meaningful underlying hidden structure. In general, MMSB assumes each entity in the network has a mixed-membership distribution over the groups. To generate the link between two entities, each entity would sample a belonging group from its mixed-membership distribution. The compatibility value between these two sampled groups would then determine the probability of generating this link. MMSB has garnered considerable interest in recent years, however, it is not good at embedding certain prior infor-

Copyright c 2020, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved.

mation such as, for instance, the entities community structure. When the entities in the network are assumed to have a mixed-membership distribution over the groups, the entity itself would belong to only one community. That is to say, we should consider two types of clustering in MMSB: entity-based clustering (i.e. communities for entities) and linkage-based clustering (i.e. groups for links.) For example, each footballer can play multiple positions (groups) in one match while only belonging to one team (community). This situation is quite common in the real world. Besides, consider the more general example in Figure 1 where there are three communities {Ci|i 1, 2, 3} in the network, each composed of two groups (G1 i , G2 i ). If we use a 6 6 compatibility matrix, this will hinder interpretability because entities that should belong to groups in the same community may belong to groups in different communities. Under this setting, the MMSB can t not infer any community information about entities. Moreover, the size of the compatibility matrix is bigger than the true one (or the proposed one in Figure 1.) which may lead to an overﬁtting problem. Furthermore, another issue will also be prominent under the dynamic setting. Recall that with respect to temporal dynamics, most of MMSB-based temporal models focus on correlation among groups in the adjacent time slice. However, the size of their compatibility matrices is same across time which leads to another shortcoming. Consider, for instance, a simple case where there is a complex network with just 2 time slices. At time slice 1, there is one community that consists of 4 groups. It is reasonable to use MMSB with a 4 4 compatibility matrix to represent it. However, at time slice 2, the community splits into two communities. Each community still consists of 4 groups but the entities originally in the same group may have different relations based on the community they belong to. Thus a compatibility matrix of size 8 8 is more suitable at time slice 2. This causes a problem when selecting the compatibility matrix size in the MMSB. Choosing the 4 4 matrix will lead to an underﬁtting problem while choosing the 8 8 one will lead to an overﬁtting problem. In this work, we focus on the following problems:

In a complex network, we should consider two types of clustering: entity-based clustering (communities for en-

Figure 1: An example to illustrate the intuition of the proposed model. Each community (C1, C2 and C3) consists of two groups G1 and G2. Entities within each group are represented by black dots. Four types of interactions are considered: within/across groups and within/across communities. In MMSB, a 6 6 compatibility matrix can be used (left part). In our model, it is represented by 2 compatibility matrices: one representing the group relations within communities and another representing the group relations across communities. (right part)

tities); and linkage-based clustering (groups for links). MMSB-based models only adapt the second one in both static and dynamic setting and this will hinder community interpretability.

Community evolution exists in complex networks across time. MMSB-based models are not able to capture these changes by merely adjusting the size of the compatibility matrix as they use a ﬁxed size compatibility matrix across time.

To handle these two problems, we propose the fragmentation coagulation based Mixed Membership Stochastic Blockmodel (fc MMSB). To enrich the structure of MMSB, we introduce a community level to MMSB in which the Chinese restaurant process (CRP) is used to partition entities. Due to the nonparametric property of CPR, the number of communities doesn t need to be speciﬁed and this makes the model more ﬂexible. For entities in the same community, MMSB is carried out independently to enable each entity to hold multiple groups. To distinguish the group relations within/across communities, we make use of two compatibility matrices, one for modeling relations between groups in the same community and one for modeling relations between groups in different communities. Speciﬁcally, we introduce an across community adjustment parameter which acts as a modiﬁer on the intra group relations across communities so that intra group relations are different if the groups belong to different communities. Furthermore, to handle the issue in the dynamic setting, we incorporate the discrete fragmentation coagulation process (DFCP) (Elliott and Teh 2012; Luo et al. 2017) to model the community evolution across time. This allows us to release the limitation of the ﬁxed size compatibility matrix in MMSB across time. The reason is that DFCP can automat-

ically learn the number of communities at each time slice. Also, the changes in the number of communities would inﬂuence the entities group membership. Therefore, this will inﬂuence the size of compatibility matrix implicitly. Besides, DFCP helps to model situations such as community splitting and merging while also generalizing MMSB such that when there is only one community in the network, it just turns back to the vanilla MMSB. With this approach, communities can merge into super communities or split into small communities.

Model Formulation In fc MMSB, our task is to do link prediction for the unobserved entity interactions, based on the observed ones. We focus on binary-valued interaction with a total number of N entities at T time slices. Formally, these interactions can be deﬁned as a binary 3-d tensor X {0, 1}T N N, where xt ij = 1 represents a directed interaction between entity ui and entity uj at time slice t, and xt ij = 0 represents no interaction. Other format of the observed interactions is possible by considering different forms of the likelihood functions.

Modelling Community Evolution Using DFCP In our model, each entity (individual) is associated with a community, so community evolution inﬂuences relations between entities. Consider, for example, a scenario where corporations are communities, the branches within these corporations (IT, accounting, etc.) are groups, and the network models relations between employees. In the case of a corporate merger, the interactions between employees in the same branches of the merging corporations will increase. In general, we can categorize community evolution into four types: appearance, disappearance, split, and merge. We use fragmentation and coagulation to depict all four types

of changes such that coagulation and fragmentation correspond to merging and splitting, respectively. Community appearance and disappearance can be viewed as extensions of community splits and merges. Since communities evolve, it is hard to know the number of communities a priori, thus our model infers the number of communities using nonparametric Bayes. We adopt the DFCP framework to implement these two operations. DFCP is a non-parametric dynamic clustering process where clusters are ﬁrst split (fragmentation) and then merged (coagulation). DFCP performs the fragmentation and coagulation processes alternately. To describe the procedures of fragmentation and coagulation, we deﬁne a set of disjoint non-empty subsets, νt = {χt 1, ..., χt r} where χt h is a latent community h at t and r is the number of communities at time t. Furthermore, each subset χt h consists of disjoint entities ui in the network. Figure 2 provides the visualization of fragmentation and coagulation processes. In our model, we process fragmentation and coagulation at times t 1 and t, respectively. At time t 1 , the fragmentation process partitions each community χt 1 h from νt 1 while at time t the obtained partitions are coagulated into a new set of communities νt = {χt 1 , ..., χt r }. Now, we provide the generative process for communities using DFCP. To sample community indicator zt i for each entity ui where i {1, ..., N}, we start an initialization with CRP at t = 0 as: Init(zit) : p(zi0 = h|z0 i)

= |χ0 h|/(N + ζ 1) if χ0 h ν0 i ζ/(N + ζ 1) if χ0 h =

where z0 i is the community indicator for all entities excluding entity ui, ζ is concentration parameter, ν0 i is the set ν0

excluding ui, |χ0 h| is the number of entities in χ0 h and is a new community at t = 0. In the fragmentation part, each community splits into small communities and executes a CRP partition independently. The fragmentation process at t = 0 is summarized as: Frag(zit) : p(zt i = h|νt 1 i , νt i, zt 1 i = q) =

|χt h|/(|χt 1 q | + ζ 1) if χt 1 q νt 1 i , χt h νt i ζ/(|χt 1 q | + ζ 1) if χt 1 q νt 1 i , χt h = 1 if χt 1 q = , χt h = 0 otherwise

We note that all the elements in χt h also belong to χt 1 q . In the coagulation part, we execute a CRP partition on the set of communities. The coagulation process at t is summarized as: Coal(zit ) : p(zit = e|νt i, νt, zit = h) =

1 if χt e νt i, χt h νt i |Ω|/(|νt| + η 1) if χt e νt i, χt h = η/(|νt| + η 1) if χt e = , χt h = 0 otherwise

Figure 2: Visualization of fragmentation and coagulation processes in fc MMSB. For example, the community of {1, 2, 3, 4, 5} at time t 1 will ﬁrst be split into 3 small sub-communities {1}, {2, 3}, {4, 5} and then be reclustered into communities at time t .

where η is the concentration parameter for the coagulation process and Ω represents the communities at t which belong to the community set with index e at time t . = {χt v|χt v χt e }.

Generating Relations In reality, it is common that an entity plays roles in multiple groups. For example, a doctor may be the supervisor of a nurse and the subordinate of the hospital director. Therefore, we induce MMSB to each entity at the group level by imposing a mixed membership vector θt i on each entity ui at a time slice t. (θt i is a membership of entity ui over K groups where k θt,k i = 1). For each pair of entities ui and uj, we sample group indicators gt i j, gt i j from Multinomial(θt i) and Multinomial(θt j). The arrow in gt i j and gt i j indicates the sender (from ui to uj) and the receiver (from uj to ui), respectively. Now, we construct a compatibility matrix to predict entity relations xt ij based on the community and group indicators. Imagine that there are several communities consisting of groups inside a complex network. It is quite common that the inner structure (group relations) of each community is similar. For example, each company has sales and marketing departments. Besides, groups within the community are more likely to have tighter interactions than ones across communities. Moreover, across community, groups with similar functionality are more probable to have interactions. Therefore two assumptions are made to construct these relations. First, group pair relations within communities are consistent. We use a compatibility matrix, B, to model all within community group relations. Second, interactions between entities from the same group but in different communities may be different from ones in the same group and community. To account for this we add a K-array across community adjustment parameter Q to on-diagonal values of the B. This provides a ﬂexible way to model the differences of within-group entity relations based on whether the entities are in the same community. Furthermore, we set the value of relations between entities that do not share commu-

Figure 3: Graphical model of fc MMSB. Hyperparameters are not shown. t and t denote the time index of fragmentation and coagulation process respectively. Notation: zt = {zt i|i {1, ..., N}}.

nity nor group to a small value, ϵ. For each pair of entities ui and uj, we sample xt ij from Bernoulli( 1 1+exp ( yt ij)) where

Blk if zt i = zt j, gt i j = l, gt i j = k Bkk + Qk if zt i = zt j, gt i j = gt i j = k ϵ otherwise

Group pairs are always correlated in the real world. For example, employee-employer relations can be unidirectional while employee-employee may be bidirectional. We are interested in the correlation of group pairs so the Inverse Wishart prior is imposed on the variance σkl of the normal distribution of Blk and Bkl. Finally, we share the group-level compatibility matrix B and adjustment parameter K-ary Q across time due to the data sparsity. In summary, the fc MMSB generative model is as follows:

To generate compatibility matrix B sample σkl Invwishart(ϕ, ψ) sample (Blk, Bkl) N(μkl, σkl) sample Bkk N(μB, σB) For each across community adjustment parameter Qk sample Qk N(μQ, σQ) For each mixed membership of entity ui sample θt i Dirichlet(α) For each community indicator zt i sample z0 i Init(zi0) sample zt i Frag(zit) sample zt i Coal(zit ) To generate each directed relations xt ij sample sender group gt i j Multinomial(θt i) sample receiver group gt i j Multinomial(θt j)

sample xt ij Bernoulli( 1

1+e yt ij )

We give the graphical model of fc MMSB in Figure 3.

Inference Our model is intractable for exact inference, instead we derive a Gibbs sampling scheme for posterior inference. The target is to predict the unobserved relations between entities by inferring parameters z, θ, B, Q, g and σ. The parameter in bold represents its total set. The joint distribution p(x, z, θ, B, Q, g|ϵ, α, ζ, η) can be expressed as:

i,j,t p(xt ij|zt i, zt j, Qgt i j, Bgt i jgt i j, ϵ)

i Init(z0 i )

i,t Frag(zt i)Coal(zt i )

k p(Qk|μQ, σQ)p(Bkk|μB, σB)

i,j,t p(gt i j|θt i)

l,k,l =k p(Blk, Bkl|μkl, σkl)

i,t p(θt i|α)

Sampling Blk, Bkl(l = k) Using Polya-Gamma

For simplicity, the (Blk, Bkl) pair is denoted as a vector ˆB in this section. The Polya-Gamma (PG) data augmentation is implemented for ˆB. Following (Polson, Scott, and Windle 2013), (eφ) m

(1+eφ)n can be expressed as 2 neκφE{e wφ2/2} with a PG variable ω PG(n, 0), where κ = m n/2. Furthermore, with conditional distribution p(w|φ), we have ω|φ PG(n, φ). Assuming that the prior of φ follows N(μ, σ) with likelihood (eφ) m

(1+eφ)n , the posterior of φ is a Gaussian distribution. Therefore, the true posterior of φ can be derived by updating φ and ω alternately. In our model, ˆB is updated via PG approach by alternately sampling ˆB, ωlk, ωkl: ˆB| N(μ , σ ) ωlk PG(nlk, Blk), ωkl PG(nkl, Bkl)

μ = σ (κ + σklμkl)

σ = (Ω + σkl 1) 1

κ = (κlk, κkl). Ω is a diagonal matrix of ωlk and ωkl. κlk = n1 lk nlk/2. Here nlk = t,i,j I[gt i j = l] I[gt i j = k] I[zt i = zt j] and n1 lk =

t,i,j I[gt i j = l] I[gt i j = k] I[zt i = zt j] I[xt ij = 1] where I is an indicator function. As the sampling scheme of Bll and Ql is similar with ˆB, we omit the procedure here.

Sampling gt i j Collapsed Gibbs sampling is used on gt i j by marginalizing over θt i. The posterior of gt i j can be expressed as:

p(gt i j = k| ) [eyt ij] I[xt ij=1]

1 + eyt ij ni j k (t) + αk

k ni j k (t) + αk

where ni j k (t) =

l,l =j I[gt i l = k].

Sampling z The prior of latent communities sequence zi is:

pprior(zi) = Init(zi 0) Coal(zi 0 ) . . . Frag(zi T 1)

so the posterior of zi can be described as:

p(zi| ) p(xi , x i|z, θ, B, Q, g, ϵ) pprior(zi)

[eyt ij] I[xt ij=1]

1 + eyt ij [eyt ji] I[xt ji=1]

1 + eyt ji pprior(zi)

where yt ij follows the previous deﬁnition in section 3. For computational simplicity, we use forward-backward algorithm on p(z T i | ). Here xi = {xt ij|j {1, ..., N}, t {0, ..., T 1}}, x i is deﬁned similarly.

Sampling σkl As the prior and likelihood of σkl are a conjugate pair, we give the posterior of σkl directly.

σkl| Invwishart(1 + ϕ, ψ + ( ˆB μkl)( ˆB μkl) )

Prediction In the previous sections, we derived the samples at each iteration. We would like to use these samples to estimate the unobserved relations. Our prediction target at iteration s, ˆxt[s] ij , is expressed as ˆθt i B ˆθt j, where the superscript of ˆθt i is the transpose of the vector. Here each dimension of

θt j is ˆθt,k i = ni k(t)+αk

k ni k(t)+αk and ni k(t) =

j I[gt i j = k].

Each entry Blk of B is 1 1+exp ( Ylk) and Ylk = I[zt i = zt j]Blk+I[l = k]I[zt i = zt j](Blk+Qk)+I[l = k]I[zt i = zt j]ϵ.

Related Work The Stochastic Block Model (SBM) presents is an earlier approach on modelling network data. In general, SBM builds on fundamental works (Aldous 1981) and (Hoover 1979) that bring the notion of relational data exchangeability (Fan, Li, and Sisson 2018a; 2018b). Blockmodels (Dyer and Frieze 1989) and (Snijders and Nowicki 1997) leverage interactions between entities to generate corresponding clusters that have a symmetric adjacency matrix. The Inﬁnite Relational Model (IRM) (Kemp et al. 2006) makes an extension to SBM by allowing the number of clusters to be undetermined. Our proposed work is based on MMSB, the key contribution of which is to allow each entity to hold multiple groups in a network. Another important class of relation model, Poisson matrix factorization model, is also used in the evaluation section. Bayesian Poisson Tensor Factorization (BPTF) (Schein et al. 2015) models the dyadic events via a tensor factorization. (Zhou 2015) focuses on utilizing hierarchical gamma process on static networks mainly. (Yang and Koeppl 2018a; 2018b) make substantial contributions of incorporating the completely random measures into the modelling, and (Fan et al. 2019) proposes a deep and scalable version of the Mixed Membership Stochastic Blockmodel.

Figure 4: AUC comparison on synthetic data.

DFCP provides ﬂexibility to model the process of splitting and merging of communities. The predecessor to DFCP, the fragmentation coagulation process (FCP) (Teh, Blundell, and Elliott 2011) is a continuous limit of DFCP. The main drawback of FCP is that at most two clusters can undergo a merge operation or one cluster can be split into at most two clusters at the same time. There is no constraint on the number of clusters to be split or merged in DFCP. We derive a highly efﬁcient sampling scheme via a data augmentation approach (PG) (Polson, Scott, and Windle 2013). This approach offers a model that has a Bernoulli likelihood with a Gaussian prior transferred by the logistic function. Naturally, (Durante and Dunson 2014) implements PG approach on Gaussian process (GP). Furthermore, (Zhou et al. 2012) improves Poisson regression by LGNB model with PG approach.

Evaluation Synthetic Data To demonstrate the problem of the MMSB mentioned in the introduction, we generate a synthetic dataset with N = 100 and T = 2, the generative process for which is described as follows:

1. Instantiate a network structure of three communities containing two groups each. For each time slice, generate the mixed membership for 100 entities by sampling the Dirichlet distribution with parameters [0.8, 0.2] or [0.2, 0.8] depending on the group. Set B to be a 2 2 compatibility matrix with high on-diagonal values and low off-diagonal values.

2. For time slice 1, if both entities belong to the same community perform step 3, otherwise set the entity relation to 0. For time slice 2, if both entities belong to the same community and group perform step 3, otherwise set the entity relation to 0.

3. Generate entity relations using the Bernoulli distribution with parameter (θ i Bθj) for the relation between ui and uj.

Model Coleman Student net Mining reality Hypertext 2009 Infectious

CN 0.881 0.018 0.839 0.019 0.873 0.004 0.776 0.006 0.883 0.014 MMSB 0.880 0.016 0.914 0.011 0.885 0.007 0.867 0.005 0.965 0.001 T-MBM 0.881 0.005 0.896 0.010 0.861 0.002 0.790 0.004 0.838 0.008 BPTF 0.908 0.013 0.909 0.021 0.922 0.001 0.874 0.006 0.843 0.011 SVD++ 0.833 0.006 0.735 0.004 0.614 0.011 DRGPM 0.823 0.014 0.933 0.003 0.904 0.008 0.988 0.000 MNE 0.891 0.024 0.940 0.020 0.813 0.004 0.872 0.008 0.900 0.017 Deep Walk 0.914 0.018 0.910 0.018 0.759 0.004 0.816 0.005 0.910 0.014

fc MMSB 0.908 0.009 0.954 0.006 0.935 0.004 0.902 0.001 0.981 0.001

Table 1: Model performance: AUC (mean and standard deviation) on the real dataset. Note: * represents a dynamic model.

Figure 5: Comparison of AUC between MMSB and fc MMSB on the Coleman dataset.

For evaluation, we randomly split the data into 2 subsets: 80% for training and 20% for testing. We compare our model with two different MMSB models varying in the number of groups in the compatibility matrix. The train and test AUC results are provided in Figure 4. We notice that when the number of groups in MMSB is 2, it is underﬁtting relative to fc MMSB with 2 groups. When the number of groups in MMSB is 6, there are two possible outcomes: overﬁtting and not overﬁtting. The overﬁtting of the MMSB is demonstrated by the higher train AUC and lower test AUC on time slice 2 compared to our model. Overﬁtting is not always the outcome, however, and the stochastic nature of the MMSB means that on different runs, the MMSB may achieve similar results to our model, as shown by MMSB-non in Figure 4. This demonstrates the problem of choosing the number of groups in the MMSB.

Prediction Relations

To demonstrate the potential of our fc MMSB model, we use ﬁve real-world datasets for validation. We use the relation prediction task to validate our model. The area under the ROC (Receiver Operating Characteristic) curve (AUC) is used as a performance metric. Here, we randomly select 80% data for training and leave the 20% for testing. Each experiment is run for ﬁve times, and we report the AUC results with their mean and standard deviation values. Five real-world datasets are described as follows:

Figure 6: Left: compatibility matrix in MMSB. Middle: compatibility matrix within community in fc MMSB. Right: compatibility matrix across community in fc MMSB.

The Coleman dataset (Coleman and others 1964) contains the information about the friendships of boys in an Illinois high-school. It records the three closest friends for each student in the fall of 1957 and spring of 1958. The binarized dataset is a 73 73 2 asymmetric matrix.

The Student net dataset (Fan, Cao, and Da Xu 2014) describes the relations between students. We binarize the relations at each time slice, leading to a 50 50 3 asymmetric matrix.

Mining Reality dataset (Eagle and Pentland 2006) records contact data of 96 students at the Massachusetts Institute of Technology (MIT) over 9 months in 2004. The dataset is split into 10 time slices, then we set each entity pair value to be 1 at that time slice if they have at least one contact during that time. Thus, it leads to a 96 96 10 symmetric matrix.

The Hypertext 2009 dataset (Isella et al. 2011) records the contact network ACM Hypertext 2009 conference attendees. The relation between two attendees is 1 if they have a face-to-face contact over 20 seconds. We split the dataset into 10 time slices and binarize it, leading to 113 113 10 symmetric matrix.

The Infectious dataset (Isella et al. 2011) describes the face-to-face interactions between people during the exhibition INFECTIOUS: STAY AWAY in 2009 at the Science Gallery in Dublin. Each relation is 1 if those two people had face-to-face contact for at least 20 seconds. We binarize the relations at each time slice, leading to a 410 410 10 symmetric matrix.

Figure 7: Visualization of community clustering on the Student net dataset across time.

Figure 8: Left: User activeness across time. Middle: The recovered number of user interactions. Right: The original number of user interactions.

General Performance We use eight baseline methods for comparison. One structure-based model: Common neighbor (CN) (Newman 2001). Five feature or cluster based models: Mixed Membership Stochastic Blockmodel (MMSB) with Gibbs sampling (Airoldi et al. 2008), Temporal Tensorial Mixed Membership Stochastic Blockmodel (T-MBM) (Tarr es-Deulofeu et al. 2019), Bayesian Poisson Tensor Factorization (BPTF) (Schein et al. 2015), Collaborative ﬁltering with temporal dynamics (SVD++) (Koren 2009) and Dependent relational gamma process model (DRGPM) (Yang and Koeppl 2018a). Two embedding based models: Scalable Multiplex Network Embedding (MNE) (Zhang et al. 2018) and Deep Walk (Perozzi, Al-Rfou, and Skiena 2014). We show the results in Table 1. The overall result of fc MMSB is competitive with DRGPM and outperforms the other state-of-the-art models. This may result from fc MMSB, with its ﬂexible structure, being more suitable for long time series datasets in which the number of communities may vary across time. The DRGPM performance on Student net dataset may suffer from the short time sequence of the dataset. Compared with vanilla MMSB, fc MMSB also shows its advantage on both short and long time series dataset. We compare our model with MMSB by varying the group number parameter on the Coleman dataset in Figure 5. fc MMSB achieved better AUC on both train and test sets. When we increased the group number, train AUC on both models increased. Due to the ﬂexible structure of fc MMSB, the margin of train and test AUC between fc MMSB and MMSB is relatively larger with smaller group numbers. While the train AUC of MMSB is relatively close to that of fc MMSB, the test AUC is lower. Besides, we compare fc MMSB with vanilla MMSB by looking at the trained compatibility matrix for the Hyper-

text dataset in Figure 6. We see that the MMSB compatibility matrix is similar to the within-community matrix in fc MMSB. However, there is a moderate difference in fc MMSB between the within-community and across communities matrices for the entry (2,2). This shows that the group-pair relation within a community is not same as the one across communities, therefore MMSB with its single compatibility matrix, cannot properly model this network. This is why the fc MMSB is better than MMSB on the Hypertext dataset; its more ﬂexible structure is better at modeling multiple communities. In comparing the compatibility matrices of other datasets, we ﬁnd this to be the case with other datasets as well. We also observe that the second role of the membership covers the main part for most people due to sparsity which can be interpreted as the inactive role. This interpretation is consistent with the compatibility matrix. In Figure 7, we visualize the community clustering result on the Student net dataset. We ﬁnd the data points are dense along the diagonal. This is consistent with our assumption that the interactions within the community are tighter than the ones across communities. Besides, we ﬁnd that most entities belong to the same communities across time, even though the community index may change. This shows why DFCP is used in our model since DFCP constructs a temporal dependency for communities across time. Furthermore, to show the dynamic of membership in the Infectious dataset, for each time slice, we randomly select one user who is active at that time slice. To show the intensity of user activeness, we deﬁne activeness, AC, for each user i at time slice t to be ACt i = θt i B 1. We present the user activeness and the recovered number of users interactions with the original one in Figure 8. It is interesting that the user is active in consecutive time segments. Meanwhile, comparing the user activeness with the original user interactions, it is easy to observe that they have correlations. This shows the membership used for user activeness really reﬂects the characteristic of the data. Also the recovered number of user interactions is similar with the original one in Figure 8. Besides, we ﬁnd that BPTF got the relatively low AUC compared with the other four datasets. It seems that the tight correlation of features across time inherent in BPTF does not ﬁt this dataset. Overall, fc MMSB is stable in both dense (Coleman, Student net, Mining reality) and sparse (Hypertext, Infectious) datasets.

Conclusion In this work, we highlight two problems in MMSB: the structure in MMSB is unable to encapsulate the prior information like the community structure of entities in the static case; and modelling the community evolution using a ﬁxed size compatibility matrix may suffer underﬁtting/overﬁtting in the dynamic case. To overcome these two problems, we developed the fragmentation coagulation based Mixed Membership Stochastic Blockmodel (fc MMSB). Speciﬁcally, we used CRP for entity-based clustering to capture the community information of entities and MMSB for linkagebased clustering to derive the group information for links simultaneously. Besides, we utilized DFCP to infer the community structure (including the number of communities)

among entities and evolution (appearance/disappearance or split/merge). Our model combines a group-level compatibility matrix with a community adjustment parameter to satisfy the four types of entity pair relations: within and across communities and groups. Our model uniﬁes these techniques to derive a generalized MMSB. Furthermore, a PG approach is implemented for an efﬁcient sampling scheme to infer hidden variables. Finally, we demonstrate the fc MMSB outperforms and is competitive with the state-of-the-art methods through experiments on real datasets.

References Airoldi, E. M.; Blei, D. M.; Fienberg, S. E.; and Xing, E. P. 2008. Mixed membership stochastic blockmodels. Journal of Machine Learning Research 9(Sep):1981 2014. Aldous, D. J. 1981. Representations for partially exchangeable arrays of random variables. Journal of Multivariate Analysis 11(4):581 598. Coleman, J. S., et al. 1964. Introduction to mathematical sociology. Introduction to mathematical sociology. Durante, D., and Dunson, D. 2014. Bayesian logistic gaussian process models for dynamic networks. In Artiﬁcial Intelligence and Statistics, 194 201. Dyer, M. E., and Frieze, A. M. 1989. The solution of some random np-hard problems in polynomial expected time. Journal of Algorithms 10(4):451 489. Eagle, N., and Pentland, A. S. 2006. Reality mining: sensing complex social systems. Personal and ubiquitous computing 10(4):255 268. Elliott, L., and Teh, Y. W. 2012. Scalable imputation of genetic data with a discrete fragmentation-coagulation process. In Advances in Neural Information Processing Systems, 2852 2860. Fan, X.; Li, B.; Sisson, S. A.; Li, C.; and Chen, L. 2019. Scalable deep generative relational model with high-order node dependence. In Neur IPS. Fan, X.; Cao, L.; and Da Xu, R. Y. 2014. Dynamic inﬁnite mixed-membership stochastic blockmodel. IEEE transactions on neural networks and learning systems 26(9):2072 2085. Fan, X.; Li, B.; and Sisson, S. A. 2018a. The binary space partitioning-tree process. In AISTATS, volume 84, 1859 1867. Fan, X.; Li, B.; and Sisson, S. A. 2018b. Rectangular bounding process. In Neur IPS, 7631 7641. Hoover, D. N. 1979. Relations on probability spaces and arrays of random variables. Preprint, Institute for Advanced Study, Princeton, NJ 2. Isella, L.; Stehl e, J.; Barrat, A.; Cattuto, C.; Pinton, J.-F.; and Van den Broeck, W. 2011. What s in a crowd? analysis of face-to-face behavioral networks. Journal of theoretical biology 271(1):166 180. Kemp, C.; Tenenbaum, J. B.; Grifﬁths, T. L.; Yamada, T.; and Ueda, N. 2006. Learning systems of concepts with an inﬁnite relational model. In AAAI, volume 3, 5.

Koren, Y. 2009. Collaborative ﬁltering with temporal dynamics. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, 447 456. ACM. Luo, L.; Li, B.; Koprinska, I.; Berkovsky, S.; and Chen, F. 2017. Tracking the evolution of customer purchase behavior segmentation via a fragmentation-coagulation process. In IJCAI, 2414 2420. Newman, M. E. 2001. Clustering and preferential attachment in growing networks. Physical review E 64(2):025102. Perozzi, B.; Al-Rfou, R.; and Skiena, S. 2014. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, 701 710. ACM. Polson, N. G.; Scott, J. G.; and Windle, J. 2013. Bayesian inference for logistic models using p olya gamma latent variables. Journal of the American statistical Association 108(504):1339 1349. Schein, A.; Paisley, J.; Blei, D. M.; and Wallach, H. 2015. Bayesian poisson tensor factorization for inferring multilateral relations from sparse dyadic event counts. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1045 1054. ACM. Snijders, T. A., and Nowicki, K. 1997. Estimation and prediction for stochastic blockmodels for graphs with latent block structure. Journal of classiﬁcation 14(1):75 100. Tarr es-Deulofeu, M.; Godoy-Lorite, A.; Guimer a, R.; and Sales-Pardo, M. 2019. Tensorial and bipartite block models for link prediction in layered networks and temporal networks. Physical Review E 99(3):032307. Teh, Y. W.; Blundell, C.; and Elliott, L. 2011. Modelling genetic variations using fragmentation-coagulation processes. In Advances in neural information processing systems, 819 827. Yang, S., and Koeppl, H. 2018a. Dependent relational gamma process models for longitudinal networks. In International Conference on Machine Learning, 5547 5556. Yang, S., and Koeppl, H. 2018b. A poisson gamma probabilistic model for latent node-group memberships in dynamic networks. In Thirty-Second AAAI Conference on Artiﬁcial Intelligence. Zhang, H.; Qiu, L.; Yi, L.; and Song, Y. 2018. Scalable multiplex network embedding. In IJCAI. Zhou, M.; Li, L.; Dunson, D.; and Carin, L. 2012. Lognormal and gamma mixed negative binomial regression. In Proceedings of the... International Conference on Machine Learning. International Conference on Machine Learning, volume 2012, 1343. NIH Public Access. Zhou, M. 2015. Inﬁnite edge partition models for overlapping community detection and link prediction. In Artiﬁcial intelligence and statistics, 1135 1143.