# User Profile Preserving Social Network Embedding

Daokun Zhang(a), Jie Yin(b), Xingquan Zhu(c,d), Chengqi Zhang(a)

(a) Centre for Artificial Intelligence, FEIT, University of Technology Sydney, Australia
(b) Data61, CSIRO, Australia
(c) Dept. of Computer & Electrical Engineering and Computer Science, Florida Atlantic University, USA
(d) School of Computer Science, Fudan University, China

Daokun.Zhang@student.uts.edu.au; Jie.Yin@csiro.au; xqzhu@cse.fau.edu; Chengqi.Zhang@uts.edu.au

## Abstract

This paper addresses social network embedding, which aims to embed social network nodes, together with user profile information, into a latent low-dimensional space. Most existing works on network embedding consider network structure only and ignore user-generated content that could be potentially helpful in learning a better joint network representation. Different from the rich node content in citation networks, user profile information in social networks is useful but noisy, sparse, and incomplete. To properly utilize this information, we propose a new algorithm called User Profile Preserving Social Network Embedding (UPP-SNE), which incorporates user profiles with network structure to jointly learn a vector representation of a social network. The theme of UPP-SNE is to embed user profile information via a non-linear mapping into a consistent subspace, where network structure is seamlessly encoded to jointly learn informative node representations. Extensive experiments on four real-world social networks show that, compared to state-of-the-art baselines, our method learns better social network representations and achieves substantial performance gains in node classification and clustering tasks.

## 1 Introduction

The huge growth of online social networks, e.g., Facebook, Twitter, Google Talk, and WeChat, has revolutionized the way people connect, express themselves, and share content with others in today's cyber society. Users in online social networks are connected with each other to form a social graph (e.g., the friendship graph). One of the most critical problems in social network analysis is the automatic classification of users into meaningful groups based on their social graphs, which has many useful practical applications such as user search, targeted advertising, and recommendation systems. Therefore, it is essential to accurately learn useful information from social networks.

One promising strategy is to learn a vector representation of a social network: each network node is represented as a low-dimensional vector such that the information conveyed by the original social network is effectively captured. As a result, existing machine learning methods can be directly applied in the low-dimensional vector space to perform network analytic tasks such as node classification, network clustering, etc. Recently, a series of algorithms have been proposed for network representation learning (NRL), such as DeepWalk [Perozzi et al., 2014], LINE [Tang et al., 2015], GraRep [Cao et al., 2015], and node2vec [Grover and Leskovec, 2016]. These approaches have been shown to be effective in a variety of network analytic tasks, ranging from node classification [Sen et al., 2008], anomaly detection [Bhuyan et al., 2014], and community detection [Yang et al., 2013], to link prediction [Lü and Zhou, 2011].
However, most of them consider network structure only, e.g., the links between nodes, and ignore other user-generated content (e.g., text, user profiles) that could potentially benefit network representation learning and subsequent analytic tasks. In this paper, we are mainly concerned with the problem of social network embedding, which embeds each user in a social network into a latent low-dimensional space. In social networks, users are not only connected by social relationships (e.g., friendship or the follower-followee relationship), but are also associated with user profile information, consisting of attributes such as gender, geographic location, interests, or school/affiliation. Such profile information can reflect and affect the forming of community structures and social circles [Leskovec and Mcauley, 2012; Yang et al., 2013]. Motivated by the fact that user profile information is potentially helpful in learning a better joint network representation, we focus on studying how user profile information can be leveraged and incorporated into the learning of social network representations.

Indeed, several very recently developed algorithms [Yang et al., 2015; Pan et al., 2016] have attempted to utilize node content information, such as textual features of each node in citation networks, for effective network representation learning. These existing works have confirmed that node content indeed provides crucial information for learning better network representations. However, as we will demonstrate in Section 5, these methods are mainly designed for consistent node content and fail to work for user profiles in social networks. This is mainly attributed to two reasons. First, today's online social networks rely on users to manually input profile attributes, so attributes in profiles can be very sparse, incomplete, and noisy. The values of user profile attributes are often long-tail distributed, and the values of some attributes like school or address may occur very infrequently or are simply missing. Second, different from node content such as posts and comments that are topic-centric, user profiles are depicted by user attributes on different dimensions, such as interests or school/affiliation, and the values on these dimensions are completely distinct and inconsistent. It is thus very difficult to find useful information in user profiles that could complement network structure towards learning a joint vector representation of social networks.

In Table 1, we summarize the major characteristics of information sources available for network embedding.

Table 1: Characteristics of information sources used for network embedding

| Info. Sources | Consistency | Sparsity | Noise | Incompleteness |
|---------------|-------------|----------|--------|----------------|
| Structure     | Medium      | High     | Medium | Low            |
| Node content  | High        | Medium   | Low    | Low            |
| User profile  | Low         | High     | High   | High           |

Our analysis and empirical study confirm that user profiles are largely different from node content features, and therefore existing rich-node-content based network embedding methods are ineffective in handling user profiles for representation learning. To overcome the above-mentioned difficulties, we propose a new algorithm called User Profile Preserving Social Network Embedding (UPP-SNE), which incorporates user profile information with network structure to jointly learn a vector representation of social networks.
The theme of the UPP-SNE algorithm is to learn a joint embedding representation by performing a non-linear mapping on user profiles guided by network structure. In this feature reconstruction process, network structure helps filter out noisy information from user profiles and embed them into a consistent subspace, into which topology structure is seamlessly encoded to jointly learn an informative embedding representation. The interplay between user profile information and network structure enables them to complement each other towards learning a joint network representation that preserves both network proximity and user profile affinity. The effectiveness of the UPP-SNE algorithm is validated on four real-world social networks for the tasks of node classification and clustering. Experimental results show that UPP-SNE's representations yield superior performance to the state-of-the-art baselines.

The main contribution of this paper is threefold:

- We introduce user profile preserving social network embedding, in which user profiles that widely exist in social networks are leveraged for effective and informative social network embedding;
- We propose UPP-SNE, which employs a non-linear mapping to incorporate noisy, sparse, and incomplete user profiles with network structure to learn joint embeddings;
- We empirically evaluate the effectiveness of the proposed algorithm through node classification and clustering tasks on real-world social networks, showing its superior performance over the state-of-the-art baselines.

## 2 Related Work

In this section, we review two lines of NRL algorithms, namely network structure preserving NRL methods, which are based on network structure only, and content augmented NRL methods, which combine node content with network structure to enhance network representation learning.

### 2.1 Network Structure Preserving NRL Methods

DeepWalk [Perozzi et al., 2014] is one of the pioneering works for learning node representations in networks. Following the idea of Skip-Gram [Mikolov et al., 2013], DeepWalk generates node context using truncated random walks and learns node representations that allow nodes sharing similar context to be represented similarly. LINE [Tang et al., 2015] formulates a clearer objective function to preserve the first-order and second-order proximity. GraRep [Cao et al., 2015] further considers higher-order proximities that describe the representation similarity between nodes sharing indirect neighbors. Very recently, SDNE [Wang et al., 2016] was proposed to learn non-linear network representations by applying a deep autoencoder model to the node adjacency matrix and exploiting the first-order proximity as supervised information. To capture both the local and global network structure, node2vec [Grover and Leskovec, 2016] exploits biased random walks to generate context nodes, and then applies DeepWalk [Perozzi et al., 2014] to learn node representations. The above NRL algorithms consider only network structure, without taking advantage of the user-generated content widely available in social networks to learn more informative network representations.

### 2.2 Content Augmented NRL Methods

Text-associated DeepWalk (TADW) [Yang et al., 2015] is the first attempt to import textual features into NRL. By proving that DeepWalk is equivalent to matrix factorization, TADW incorporates textual features into network embedding through matrix factorization.
TADW can be regarded as a special case of our proposed algorithm in which a linear mapping is performed on node content features. TriDNR [Pan et al., 2016] further exploits supervised labels to learn better node representations by modeling the inter-node relationship, node-word correlation, and label-word correspondence. TADW and TriDNR are both designed for information networks with rich text content on nodes (e.g., citation networks), but are ineffective for social networks with noisy user profile information. LANE [Huang et al., 2017] proposes to learn three types of latent node representations via spectral techniques from the node content-level similarity matrix, the network adjacency matrix, and the node label-level similarity matrix, and to project them into a common embedding space. TriDNR and LANE primarily focus on utilizing supervised labels to enhance network representation learning. From their reported results, label information contributes more to the performance gains, while the potential of node content is not sufficiently exploited.

## 3 Problem Definition

We consider a social network given as an undirected social graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges. Each node $v_i \in V$ is characterized by a $d$-dimensional feature vector $x_i$ that describes its profile information. Social network embedding aims to map each node $v_i \in V$ into a low-dimensional space; the mapped image $\Phi(v_i)$ of node $v_i$ is taken as the learned node representation. The learned network representations should satisfy three properties: (1) low-dimensional, i.e., the dimension of the learned representations should be much smaller than $|V|$, the dimension of the original adjacency matrix representation; (2) network structure preserving, i.e., the original network structure is fully preserved in the embedded space; and (3) user profile preserving, i.e., user profile information is sufficiently exploited to complement network structure for learning a joint vector representation. Note that we focus on user profiles in this study, but our problem can be flexibly generalized to handle other modalities of node content.

## 4 User Profile Preserving Social Network Embedding

In this section, we first review preliminaries on DeepWalk, and then describe our proposed solution in detail.

### 4.1 DeepWalk

DeepWalk [Perozzi et al., 2014] generalizes the Skip-Gram model from word representation learning to the learning of node representations in a network. Given a random walk node sequence $S_r = \{v_{r_1}, v_{r_2}, \ldots, v_{r_{|S_r|}}\}$ on the given network, DeepWalk learns a representation for node $v_{r_i}$ by using it to predict its context nodes within a $t$-sized window, $\{v_{r_{i-t}}, \ldots, v_{r_{i+t}}\} \setminus v_{r_i}$. This is achieved by maximizing the conditional probability of the occurrence of context nodes given the current node:

$$\max_{\Phi} \; \log \Pr\left(\{v_{r_{i-t}}, \ldots, v_{r_{i+t}}\} \setminus v_{r_i} \mid \Phi(v_{r_i})\right). \tag{1}$$

By making the conditional independence assumption, the probability $\Pr(\{v_{r_{i-t}}, \ldots, v_{r_{i+t}}\} \setminus v_{r_i} \mid \Phi(v_{r_i}))$ is calculated as

$$\Pr\left(\{v_{r_{i-t}}, \ldots, v_{r_{i+t}}\} \setminus v_{r_i} \mid \Phi(v_{r_i})\right) = \prod_{j=i-t,\, j \neq i}^{i+t} \Pr\left(v_{r_j} \mid \Phi(v_{r_i})\right). \tag{2}$$

Above, the probability $\Pr(v_{r_j} \mid \Phi(v_{r_i}))$ is modeled by softmax:

$$\Pr\left(v_{r_j} \mid \Phi(v_{r_i})\right) = \frac{\exp\left(\Psi(v_{r_j})^{\top} \Phi(v_{r_i})\right)}{\sum_{v \in V} \exp\left(\Psi(v)^{\top} \Phi(v_{r_i})\right)}, \tag{3}$$

where $\Phi(v)$ and $\Psi(v)$ are the input and output representations of node $v$, respectively.
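To make Eqs. (1)-(3) concrete, the following is a minimal, illustrative Python sketch of the two DeepWalk ingredients just described: extracting (center, context) node pairs within a $t$-sized window of a walk, and the softmax probability of Eq. (3). The function names and array layouts are our own assumptions for illustration, not the authors' released code.

```python
import numpy as np

def walk_contexts(walk, t):
    """Yield (center, context) node pairs within a t-sized window of one walk."""
    for i, center in enumerate(walk):
        for j in range(max(0, i - t), min(len(walk), i + t + 1)):
            if j != i:
                yield center, walk[j]

def context_prob(Phi, Psi, i, j):
    """Softmax probability Pr(v_j | Phi(v_i)) of Eq. (3).

    Phi, Psi: |V| x m arrays of input/output node representations.
    """
    scores = Psi @ Phi[i]          # Psi(v)^T Phi(v_i) for every node v
    scores -= scores.max()         # shift for numerical stability
    expd = np.exp(scores)
    return expd[j] / expd.sum()
```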
### 4.2 The UPP-SNE Algorithm

In DeepWalk, the representation $\Phi(v_i)$ of node $v_i$ is learned from scratch, independently of node content features. However, as confirmed by existing works (e.g., [Yang et al., 2015; Pan et al., 2016]), node content can provide crucial information to enhance network representation learning. Thus, we propose to construct node representations via a non-linear mapping $\varphi(\cdot)$ from node profile features:

$$\Phi(v_i) = \varphi(x_i). \tag{4}$$

Kernel mapping is a common choice for the non-linear mapping, in which $\varphi(x_i)$ could be infinite-dimensional. However, according to [Rahimi et al., 2007], the infinite-dimensional kernel space can be approximated by a low-dimensional feature space, which provides the potential for $\varphi(x)$ to be low-dimensional. As shown in [Rahimi et al., 2007], for shift-invariant kernels, the approximated low-dimensional feature mapping $\varphi(\cdot)$ can be written as (with high probability):

$$\varphi(x_i) = \frac{1}{\sqrt{m}} \left[ \cos(\mu_1^{\top} x_i), \ldots, \cos(\mu_m^{\top} x_i), \sin(\mu_1^{\top} x_i), \ldots, \sin(\mu_m^{\top} x_i) \right]^{\top}, \tag{5}$$

where $\{\mu_1, \ldots, \mu_m\}$ are the $m$ projection directions sampled according to the distribution given by the Fourier transform of the kernel function.

In social networks, user profile attributes and network structure are highly interrelated. For example, users with similar attributes are much more likely to be friends, and groups of users with common attributes often form dense communities. Thus, in order to capture this dependency, we propose to encode network structure into the mapping $\varphi(\cdot)$ within the DeepWalk framework, such that noisy information in user profiles can be filtered out, and useful information consistent with network structure can be preserved in the embedded space.

In DeepWalk, to learn the node representation $\Phi(v)$ for each node $v \in V$, we need to solve the joint optimization problem over all positions $i$ in the random walk sequences:

$$\max_{\Phi} \; \sum_{i} \log \Pr\left(\{v_{r_{i-t}}, \ldots, v_{r_{i+t}}\} \setminus v_{r_i} \mid \Phi(v_{r_i})\right). \tag{6}$$

By making the conditional independence assumption, Eq. (6) is reformulated as

$$\max_{\Phi} \; \sum_{i} \sum_{j=i-t,\, j \neq i}^{i+t} \log \Pr\left(v_{r_j} \mid \Phi(v_{r_i})\right). \tag{7}$$

For convenience, Eq. (7) can be reformulated as

$$\max_{\Phi} \; \sum_{i=1}^{|V|} \sum_{j=1}^{|V|} n(v_i, v_j) \log \Pr\left(v_j \mid \Phi(v_i)\right), \tag{8}$$

where $n(v_i, v_j)$ is the number of times that $v_j$ occurs in $v_i$'s context in a given set of random walk sequences. If $v_j$ never occurs in $v_i$'s context, the value of $n(v_i, v_j)$ is 0. Following Eq. (3), the probability $\Pr(v_j \mid \Phi(v_i))$ is modeled by softmax. Here, we set $\Phi(v_i)$ to $\varphi(x_i)$ and denote by $\nu_j$ the output representation $\Psi(v_j)$, a $2m$-dimensional parameter vector associated with $v_j$. Therefore, the probability $\Pr(v_j \mid \Phi(v_i))$ is formulated as

$$\Pr\left(v_j \mid \Phi(v_i)\right) = \frac{\exp\left(\nu_j^{\top} \varphi(x_i)\right)}{\sum_{k=1}^{|V|} \exp\left(\nu_k^{\top} \varphi(x_i)\right)}. \tag{9}$$

After substituting Eq. (9) into Eq. (8) and adopting negative sampling [Gutmann and Hyvärinen, 2012], we obtain the objective function of the optimization task in Eq. (8):

$$O = \sum_{i=1}^{|V|} \sum_{j=1}^{|V|} n(v_i, v_j) \left[ \log \sigma\left(\nu_j^{\top} \varphi(x_i)\right) + \sum_{k=1}^{K} \log \sigma\left(-\nu_{N_i^k}^{\top} \varphi(x_i)\right) \right], \tag{10}$$

where $\sigma(\cdot)$ is the sigmoid function with $\sigma(x) = \frac{1}{1 + \exp(-x)}$, and $N_i^k$ is the index of the $k$-th sampled negative node for node $v_i$.

To construct the node representation $\Phi(v)$ for each node $v \in V$, the parameters $\{\mu_1, \ldots, \mu_m\}$ and $\{\nu_1, \ldots, \nu_{|V|}\}$ are solved by gradient ascent on $O$:

$$\mu_s \leftarrow \mu_s + \eta \frac{\partial O}{\partial \mu_s}, \; s = 1, \ldots, m; \qquad \nu_s \leftarrow \nu_s + \eta \frac{\partial O}{\partial \nu_s}, \; s = 1, \ldots, |V|; \tag{11}$$

where the gradients are calculated as

$$\frac{\partial O}{\partial \mu_s} = \sum_{i=1}^{|V|} \sum_{j=1}^{|V|} n(v_i, v_j) \left[ \sigma\left(-\nu_j^{\top} \varphi(x_i)\right) \frac{\partial \varphi(x_i)}{\partial \mu_s} \nu_j - \sum_{k=1}^{K} \sigma\left(\nu_{N_i^k}^{\top} \varphi(x_i)\right) \frac{\partial \varphi(x_i)}{\partial \mu_s} \nu_{N_i^k} \right], \tag{12}$$

$$\frac{\partial O}{\partial \nu_s} = \sum_{i=1}^{|V|} \sum_{j=1}^{|V|} n(v_i, v_j) \left[ \mathbf{1}_s(j)\, \sigma\left(-\nu_j^{\top} \varphi(x_i)\right) \varphi(x_i) - \sum_{k=1}^{K} \mathbf{1}_s(N_i^k)\, \sigma\left(\nu_{N_i^k}^{\top} \varphi(x_i)\right) \varphi(x_i) \right], \tag{13}$$

where $\frac{\partial \varphi(x_i)}{\partial \mu_s}$ is a $d \times 2m$ Jacobian matrix and $\mathbf{1}_s(\cdot)$ is an indicator function, defined as

$$\mathbf{1}_s(x) = \begin{cases} 1 & \text{if } x = s, \\ 0 & \text{if } x \neq s. \end{cases} \tag{14}$$

The learning rate $\eta$ is specified by line search.
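For concreteness, the following illustrative Python sketch implements the approximate kernel mapping of Eq. (5) and evaluates the negative-sampling objective of Eq. (10). The names and data layouts are our own assumptions, not the authors' implementation; in UPP-SNE the projection directions $\mu_s$ are learned rather than sampled once as in plain random Fourier features.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rff_map(X, mu):
    """Approximate shift-invariant kernel mapping of Eq. (5).

    X:  n x d profile feature matrix.
    mu: m x d projection directions (learned in UPP-SNE).
    Returns an n x 2m matrix whose i-th row is phi(x_i).
    """
    Z = X @ mu.T                                    # entries mu_s^T x_i
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(mu.shape[0])

def objective(phi, nu, counts, negatives):
    """Negative-sampling objective O of Eq. (10).

    phi:       n x 2m mapped features (rows phi(x_i)).
    nu:        n x 2m output parameter vectors.
    counts:    dict {(i, j): n(v_i, v_j)} of context co-occurrence counts.
    negatives: dict {i: list of K negative node ids for v_i}.
    """
    O = 0.0
    for (i, j), n_ij in counts.items():
        term = np.log(sigmoid(nu[j] @ phi[i]))
        term += sum(np.log(sigmoid(-nu[k] @ phi[i])) for k in negatives[i])
        O += n_ij * term
    return O
```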
**Algorithm 1: The UPP-SNE Algorithm**

**Input:** a given social network $G = (V, E)$
**Output:** node representation $\Phi(v)$ for each node $v \in V$

1. Start $\gamma$ random walks from each node;
2. Count $n(u, v)$ for each node context pair $(u, v)$ in all random walk sequences;
3. Generate $K$ negative samples $N_i^1, \ldots, N_i^K$ for each node $v_i$;
4. Initialize $\{\mu_s\}_{s=1}^{m}$ and $\{\nu_s\}_{s=1}^{|V|}$ with random numbers;
5. **repeat**
6. Calculate $\frac{\partial O}{\partial \mu_s}$ for each $s$ according to Eq. (12);
7. Specify the learning rate $\eta$ for updating $\mu$ by line search;
8. Update $\mu_s$ for each $s$ according to Eq. (11);
9. Calculate $\frac{\partial O}{\partial \nu_s}$ for each $s$ according to Eq. (13);
10. Specify the learning rate $\eta$ for updating $\nu$ by line search;
11. Update $\nu_s$ for each $s$ according to Eq. (11);
12. **until** convergence or a fixed number of iterations;
13. Construct $\Phi(v)$ for each $v \in V$ with $\{\mu_s\}_{s=1}^{m}$;
14. **return** $\Phi(v)$ for each $v \in V$.

The description of the UPP-SNE algorithm is given in Algorithm 1. In lines 1-2, $\gamma$ random walk sequences are started from each node to calculate the statistics $n(v_i, v_j)$. In lines 5-12, the optimization problem of Eq. (8) is solved by gradient iteration, which accounts for the main time consumption of our algorithm. During the iterations, the most time-consuming steps are lines 6 and 9, i.e., the calculation of $\frac{\partial O}{\partial \mu_s}$ and $\frac{\partial O}{\partial \nu_s}$ for each $s$. The time complexity of line 6 and line 9 is $O(\mathrm{nnz}(n) \cdot m \cdot \bar{d} \cdot K)$ and $O(\mathrm{nnz}(n) \cdot m \cdot K)$, respectively, where $\mathrm{nnz}(n)$ is the number of non-zero entries of $n(v_i, v_j)$ and $\bar{d}$ is the average number of non-zero entries of the node profile features $x_i$. As the sparsity of the matrix $n(v_i, v_j)$ and of the profile features $x_i$ is exploited, the calculation is efficient.
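A hedged sketch of the pre-processing steps of Algorithm 1 (lines 1-3) follows: generating truncated random walks, counting the context statistics $n(u, v)$, and drawing negative samples. The helper names and the uniform negative-sampling distribution are our assumptions for illustration; the paper does not specify the sampling distribution.

```python
import random
from collections import Counter

def generate_walks(adj, gamma, walk_len):
    """Line 1 of Algorithm 1: gamma truncated random walks per start node.
    adj: dict mapping each node to a list of its neighbours."""
    walks = []
    for _ in range(gamma):
        for start in adj:
            walk = [start]
            while len(walk) < walk_len and adj[walk[-1]]:
                walk.append(random.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

def count_contexts(walks, t):
    """Line 2 of Algorithm 1: count n(u, v) over all walks with a t-sized window."""
    n = Counter()
    for walk in walks:
        for i, u in enumerate(walk):
            for j in range(max(0, i - t), min(len(walk), i + t + 1)):
                if j != i:
                    n[(u, walk[j])] += 1
    return n

def draw_negatives(nodes, K):
    """Line 3 of Algorithm 1: K negative samples per node (uniform sampling
    with replacement here, as an assumption)."""
    nodes = list(nodes)
    return {v: random.choices(nodes, k=K) for v in nodes}
```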
## 5 Experiments

We evaluate the validity of the proposed algorithm through node classification and clustering tasks.

### 5.1 Benchmark Social Networks

We perform experiments on four real-world social networks, listed as follows:

- Google+¹ is an ego-network of a Google+ user, in which nodes represent the user's friends. The network has 1206 nodes and 66918 links. We use people's gender as the class label. Each node is described by a 940-dimensional bag-of-words vector constructed from tree-structured user profiles [Leskovec and Mcauley, 2012].
- Ego-Facebook¹ is an ego-network of a Facebook user. This network consists of 755 nodes and 60050 links. People's education type is used as the class label. Each node is described by a 477-dimensional vector [Leskovec and Mcauley, 2012].
- Hamilton and Rochester² are two of the collection of 100 US university Facebook networks [Traud et al., 2012]. The two networks contain 2118 nodes with 87486 edges, and 4145 nodes with 145305 edges, respectively. Each node's profile is described by 7 user attributes: student/faculty status flag, gender, major, second major/minor, dorm/house, year, and high school. We select the student/faculty status flag as the class label and construct a 144-dimensional and a 235-dimensional binary feature vector for Hamilton and Rochester, respectively.

¹ https://snap.stanford.edu/data/
² https://escience.rpi.edu/data/DA/fb100/

### 5.2 Baseline Methods

We compare our algorithm with two groups of baselines.

Network structure preserving NRL algorithms:

- DeepWalk [Perozzi et al., 2014] learns network representations with the Skip-Gram model by using random walks to generate context for each node.
- LINE-1 [Tang et al., 2015] is the version of LINE that models the first-order proximity.
- LINE-2 [Tang et al., 2015] is the version of LINE that models the second-order proximity.
- node2vec [Grover and Leskovec, 2016] uses biased random walks to capture both local and global structure.

Content augmented NRL algorithms:

- TADW [Yang et al., 2015] learns node representations by encoding node text features into the matrix factorization version of DeepWalk. It can be considered a degraded version of UPP-SNE that uses a linear mapping.
- LANE [Huang et al., 2017] learns three kinds of latent node representations by preserving node pairwise similarities defined by node content, links, and labels, and projects them into a common embedding space. Here, we only use the version that does not consider labels.

### 5.3 Parameter Settings

For UPP-SNE, we set the number of random walks per node $\gamma$ to 40, the walk length $l$ to 40, the window size $t$ to 10, and the number of iterations to 40. All baselines use their default parameters as reported. The dimension of the learned node representations $m$ is set to 128 for all algorithms.

### 5.4 Node Classification

We first verify the effectiveness of the UPP-SNE algorithm for multi-class node classification. To make fair comparisons, we vary the training ratio from 1% to 10% in increments of 1%. For each training ratio, we randomly split the training set and test set 10 times and report the averaged accuracy (an illustrative sketch of this protocol is given after the tables below).

Tables 2-4 report the classification accuracy of all algorithms on Google+, Hamilton, and Rochester, where the best results are bold-faced.

Table 2: Node classification accuracy (%) on Google+

| Training Ratio | 1% | 2% | 3% | 4% | 5% | 6% | 7% | 8% | 9% | 10% |
|---|---|---|---|---|---|---|---|---|---|---|
| DeepWalk | 55.24 | 56.51 | 60.85 | 62.17 | 63.41 | 62.30 | 65.86 | 66.08 | 67.06 | 67.39 |
| LINE-1 | 52.46 | 54.97 | 58.28 | 59.81 | 60.74 | 61.30 | 62.27 | 63.08 | 63.67 | 63.37 |
| LINE-2 | 52.14 | 56.24 | 58.78 | 61.18 | 62.49 | 63.33 | 64.83 | 64.83 | 65.81 | 65.50 |
| node2vec | 55.69 | 56.27 | 60.27 | 62.29 | 62.85 | 61.69 | 65.15 | 65.76 | 66.94 | 66.38 |
| TADW | 52.61 | 57.14 | 59.39 | 64.61 | 66.22 | 67.91 | 69.89 | **70.27** | **71.89** | **71.72** |
| LANE | 50.40 | 54.13 | 55.21 | 55.34 | 54.96 | 55.79 | 55.87 | 56.70 | 56.50 | 57.25 |
| UPP-SNE | **61.86** | **69.99** | **69.20** | **69.78** | **70.33** | **70.95** | **70.37** | 69.11 | 68.92 | 68.46 |

Table 3: Node classification accuracy (%) on Hamilton

| Training Ratio | 1% | 2% | 3% | 4% | 5% | 6% | 7% | 8% | 9% | 10% |
|---|---|---|---|---|---|---|---|---|---|---|
| DeepWalk | 84.64 | 86.76 | 87.63 | 89.40 | 89.67 | 90.11 | 90.37 | 90.81 | 90.79 | 90.51 |
| LINE-1 | 80.68 | 83.16 | 83.98 | 86.94 | 87.36 | 88.22 | 89.11 | 89.56 | 89.89 | 89.90 |
| LINE-2 | 80.23 | 82.41 | 83.03 | 86.78 | 86.79 | 87.87 | 88.36 | 89.05 | 89.15 | 89.55 |
| node2vec | 85.16 | **87.46** | 88.01 | 90.18 | 90.14 | 90.62 | 90.89 | 91.45 | 91.38 | 91.20 |
| TADW | 81.57 | 84.00 | 85.27 | 87.54 | 87.48 | 88.48 | 88.56 | 89.92 | 89.11 | 89.33 |
| LANE | 79.54 | 80.04 | 79.96 | 81.43 | 81.16 | 82.27 | 82.94 | 83.62 | 83.34 | 84.34 |
| UPP-SNE | **85.35** | 86.89 | **89.14** | **90.68** | **90.76** | **91.71** | **91.79** | **92.07** | **92.51** | **92.41** |

Table 4: Node classification accuracy (%) on Rochester

| Training Ratio | 1% | 2% | 3% | 4% | 5% | 6% | 7% | 8% | 9% | 10% |
|---|---|---|---|---|---|---|---|---|---|---|
| DeepWalk | 82.49 | 83.58 | 84.14 | 84.40 | 85.08 | 85.43 | 85.65 | 85.71 | 85.66 | 86.13 |
| LINE-1 | 81.41 | 81.58 | 82.19 | 82.90 | 83.62 | 84.21 | 84.67 | 84.75 | 84.87 | 85.47 |
| LINE-2 | 81.33 | 81.29 | 81.80 | 82.43 | 83.09 | 83.71 | 84.16 | 84.29 | 84.43 | 85.15 |
| node2vec | 82.24 | 82.93 | 83.29 | 83.99 | 84.57 | 84.99 | 85.52 | 85.25 | 85.33 | 86.02 |
| TADW | 80.68 | 80.83 | 81.44 | 81.70 | 81.98 | 82.33 | 82.83 | 83.16 | 83.32 | 84.26 |
| LANE | 80.91 | 80.90 | 80.84 | 80.98 | 81.10 | 81.09 | 81.11 | 81.22 | 81.17 | 81.48 |
| UPP-SNE | **83.28** | **84.88** | **85.89** | **86.14** | **87.20** | **87.83** | **87.61** | **87.82** | **88.33** | **88.20** |

We can see that, in most cases, UPP-SNE outperforms both the network structure preserving baselines and the content augmented baselines. On Google+, UPP-SNE's representations achieve the best accuracy at all training ratios except 8%, 9% and 10%. On Hamilton, UPP-SNE performs best on 9 out of 10 training ratios. On Rochester, UPP-SNE beats all baselines at all training ratios.
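As referenced above, the repeated-split classification protocol can be reproduced with a minimal sketch like the following, assuming the learned representations are available as a NumPy array. The choice of a linear (logistic regression) classifier is our assumption; the paper does not name the classifier used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def classification_accuracy(embeddings, labels, train_ratio, n_splits=10, seed=0):
    """Average accuracy over n_splits random train/test splits."""
    scores = []
    for s in range(n_splits):
        X_tr, X_te, y_tr, y_te = train_test_split(
            embeddings, labels, train_size=train_ratio, random_state=seed + s)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        scores.append(clf.score(X_te, y_te))
    return np.mean(scores)
```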
The performance gains of the UPP-SNE algorithm over the structure preserving NRL baselines indicate that, with user profile information properly exploited, better social network representations can be learned. This confirms the usefulness of user profiles in learning social network representations and the capacity of UPP-SNE to effectively capture useful information from user profiles to complement network structure for learning better user representations.

As reflected in Tables 3 and 4, on Hamilton and Rochester, TADW performs even worse than the NRL algorithms that preserve network structure only. This implies that, on the two Facebook networks, structural relationships are more indicative when noisy user profile information is not properly exploited. This observation also shows that the linear mapping used by TADW cannot effectively filter noisy information out of user profiles, thereby degrading its performance. We can also see that LANE suffers from the same problem in exploiting user profiles to learn social network representations. LANE learns latent user profile representations and latent network representations separately, and projects them to obtain the final representations. Because the learning of the two kinds of latent representations is uncorrelated, the interplay between user profile and network structure is not well captured, resulting in unsatisfactory performance.

### 5.5 Node Clustering

To evaluate our method on node clustering tasks, we apply k-means to the learned node representations and use accuracy and NMI [Strehl and Ghosh, 2003] to assess the quality of the clustering results (see the sketch at the end of this subsection). We repeat the clustering process 20 times to reduce the sensitivity of k-means to the initial randomly selected centroids, and report the averaged results.

Figure 3 reports the clustering results on Google+ and Ego-Facebook.

[Figure 3: The performance of node clustering; accuracy and NMI of DeepWalk, LINE-1, LINE-2, node2vec, TADW, LANE, and UPP-SNE on (a) Google+ and (b) Ego-Facebook.]

As we can see, UPP-SNE consistently yields the best clustering results. On Google+, compared with the second best results, UPP-SNE achieves a 200% improvement over LINE-2 on NMI and a 4% improvement over LANE on accuracy, demonstrating the superior performance of our method. On Ego-Facebook, UPP-SNE significantly outperforms TADW, yielding 14% and 11% performance gains on accuracy and NMI, respectively. This again confirms UPP-SNE's better capability in handling and incorporating noisy user profiles to learn social network representations.
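As referenced above, a brief sketch of the clustering evaluation under the stated protocol, using scikit-learn's k-means and NMI. Computing clustering accuracy via a best one-to-one matching between clusters and classes (Hungarian assignment) is our assumption of standard practice; labels are assumed to be integer-coded in {0, ..., k-1}.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def clustering_scores(embeddings, labels, n_runs=20, seed=0):
    """Average k-means clustering accuracy and NMI over n_runs repetitions."""
    k = len(set(labels))
    accs, nmis = [], []
    for r in range(n_runs):
        pred = KMeans(n_clusters=k, n_init=10,
                      random_state=seed + r).fit_predict(embeddings)
        # Clustering accuracy: best one-to-one matching between predicted
        # clusters and true classes via the Hungarian algorithm.
        overlap = np.zeros((k, k), dtype=int)
        for p, t in zip(pred, labels):
            overlap[p, t] += 1
        rows, cols = linear_sum_assignment(-overlap)
        accs.append(overlap[rows, cols].sum() / len(labels))
        nmis.append(normalized_mutual_info_score(labels, pred))
    return np.mean(accs), np.mean(nmis)
```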
### 5.6 Convergence Analysis

We also conduct experiments to investigate the convergence property of solving the optimization problem in Eq. (8). We vary the number of iterations from 1 to 60 and plot the corresponding value of the objective function on Google+, Hamilton, and Rochester, as shown in Figure 1.

[Figure 1: Convergence of the objective function; objective value versus the number of iterations on (a) Google+, (b) Hamilton, and (c) Rochester.]

We can see that our algorithm converges quickly, within only 10 iterations, on all three networks.

### 5.7 Parameter Sensitivity Study

Experiments are performed to analyze UPP-SNE's sensitivity to three parameters: (1) $m$, the dimension of the learned node representations; (2) $\gamma$, the number of random walks from each node; and (3) $t$, the window size for collecting context nodes. In turn, we fix two of the three parameters and study the effect of the third on the classification accuracy. Figure 2 shows the accuracy of multi-class node classification on Google+, Hamilton, and Rochester when the training ratio is set to 5%.

[Figure 2: The effect of parameters $m$ (32 to 256), $\gamma$ (10 to 80), and $t$ (4 to 20) on classification accuracy for Google+, Hamilton, and Rochester.]

We can see that the performance of UPP-SNE is stable with respect to different values of the parameters.

## 6 Conclusion

In this paper, we proposed a user profile preserving social network embedding method. We argued that, although a handful of methods exist to leverage node content for network representation learning, node content and user profiles are largely different: while the former is topic-centric, the latter is noisy, inconsistent, highly incomplete, and uninformative on its own. As a result, existing node content augmented network representation learning methods are ineffective at leveraging node profile information. Accordingly, we proposed to use a non-linear mapping to combine network structure and node profile information, in order to jointly learn an informative representation. Experiments on node classification and clustering tasks demonstrate the superior performance of the proposed method in learning effective social network representations.

## Acknowledgements

This work is partially supported by the Australian Research Council (ARC) under discovery grant DP140100545, and by the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning. Daokun Zhang is supported by the China Scholarship Council (CSC), No. 201506300082, and a supplementary postgraduate scholarship from CSIRO.

## References

[Bhuyan et al., 2014] Monowar H. Bhuyan, Dhruba Kumar Bhattacharyya, and Jugal K. Kalita. Network anomaly detection: methods, systems and tools. IEEE Communications Surveys & Tutorials, 16(1):303-336, 2014.

[Cao et al., 2015] Shaosheng Cao, Wei Lu, and Qiongkai Xu. GraRep: Learning graph representations with global structural information. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, pages 891-900. ACM, 2015.

[Grover and Leskovec, 2016] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 855-864. ACM, 2016.

[Gutmann and Hyvärinen, 2012] Michael U. Gutmann and Aapo Hyvärinen. Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. Journal of Machine Learning Research, 13(2):307-361, 2012.

[Huang et al., 2017] Xiao Huang, Jundong Li, and Xia Hu. Label informed attributed network embedding. In Proceedings of the 10th ACM International Conference on Web Search and Data Mining, pages 731-739, 2017.

[Leskovec and Mcauley, 2012] Jure Leskovec and Julian J. Mcauley. Learning to discover social circles in ego networks.
In Advances in Neural Information Processing Systems, pages 539-547, 2012.

[Lü and Zhou, 2011] Linyuan Lü and Tao Zhou. Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications, 390(6):1150-1170, 2011.

[Mikolov et al., 2013] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26, pages 3111-3119, 2013.

[Pan et al., 2016] Shirui Pan, Jia Wu, Xingquan Zhu, Chengqi Zhang, and Yang Wang. Tri-party deep network representation. In Proceedings of the 25th International Joint Conference on Artificial Intelligence, pages 1895-1901, 2016.

[Perozzi et al., 2014] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 701-710. ACM, 2014.

[Rahimi et al., 2007] Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems 20, pages 1177-1184, 2007.

[Sen et al., 2008] Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi-Rad. Collective classification in network data. AI Magazine, 29(3):93, 2008.

[Strehl and Ghosh, 2003] Alexander Strehl and Joydeep Ghosh. Cluster ensembles: a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3:583-617, 2003.

[Tang et al., 2015] Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, pages 1067-1077. ACM, 2015.

[Traud et al., 2012] Amanda L. Traud, Peter J. Mucha, and Mason A. Porter. Social structure of Facebook networks. Physica A: Statistical Mechanics and its Applications, 391(16):4165-4180, 2012.

[Wang et al., 2016] Daixin Wang, Peng Cui, and Wenwu Zhu. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1225-1234. ACM, 2016.

[Yang et al., 2013] Jaewon Yang, Julian McAuley, and Jure Leskovec. Community detection in networks with node attributes. In Proceedings of the 13th IEEE International Conference on Data Mining, pages 1151-1156. IEEE, 2013.

[Yang et al., 2015] Cheng Yang, Zhiyuan Liu, Deli Zhao, Maosong Sun, and Edward Y. Chang. Network representation learning with rich text information. In Proceedings of the 24th International Joint Conference on Artificial Intelligence, pages 2111-2117, 2015.