Interest Inference via Structure-Constrained Multi-Source Multi-Task Learning

Xuemeng Song, Liqiang Nie, Luming Zhang, Maofu Liu, Tat-Seng Chua
National University of Singapore; Wuhan University of Science and Technology
{sxmustc, nieliqiang, zglumg}@gmail.com, liumaofu@wust.edu.cn, chuats@comp.nus.edu.sg

Abstract

User interest inference from social networks is a fundamental problem for many applications. It usually exhibits dual heterogeneities: a user's interests are complementarily and comprehensively reflected by multiple social networks, and interests are inter-correlated in a nonuniform way rather than independent of each other. Although great success has been achieved by previous approaches, few of them consider these dual heterogeneities simultaneously. In this work, we propose a structure-constrained multi-source multi-task learning scheme that co-regularizes the source consistency and the tree-guided task relatedness. Meanwhile, it is able to jointly learn the task-sharing and task-specific features. Comprehensive experiments on a real-world dataset validated our scheme. In addition, we have released our dataset to facilitate the research community.

1 Introduction

User interest inference is the basis of many applications, such as adaptive E-learning [Abel et al., 2011a] and personalized services [Pennacchiotti and Popescu, 2011]. Take targeted advertising as an example: it is natural to market cosmetics to ladies who are keen on beauty. Meanwhile, we have recently witnessed many people with diverse interests participating in multiple social networks simultaneously. This trend has been statistically validated by a survey result: 52% of online adults use multiple social media services.[1] Multiple social networks comprehensively convey users' interests from different viewpoints. For instance, users may update their daily interests on Facebook, follow accounts they are interested in on Twitter, and ask or answer questions they care about on Quora. Thus, fusing cues from multiple sources can potentially boost the performance of user interest inference by a large margin.

[1] According to the Pew Research Internet Project's Social Media Update 2014: http://www.pewinternet.org/.

Inferring user interests from multiple social networks, however, is non-trivial for the following reasons. (a) Source integration. Although users' footprints on heterogeneous social networks describe their interests from different views, they should characterize the same interest preference consistently. Therefore, how to effectively and comprehensively fuse them is one tough challenge. (b) Interest relatedness characterization. Interests are usually not independent but correlated in a nonuniform way. For example, given a set of interests I = {basketball, football, travel, cooking}, the relatedness between basketball and football may be stronger than that between basketball and cooking: in our dataset, most users who like to play basketball are more likely to spend their spare time on football than on cooking. In the context of user interest inference, each interest is usually aligned with one task. Consequently, the second challenge is how to capture and characterize the relatedness among tasks and how to incorporate it into multi-task learning. (c) Discriminant feature selection. The discriminative power of features differs from task to task.
Learning task-sharing and task-specific features effectively is thus significant for user interest inference, which poses another crucial challenge.

There are three lines of research dedicated to the problem of user interest inference. The first is single-source single-task learning [Pennacchiotti and Popescu, 2011]. In this context, neither the relatedness among tasks nor the complementary information across sources is explored. The second line of effort is multi-task learning [Xue et al., 2007]. These methods take the task relatedness into account to boost the learning performance and to alleviate the problem of insufficient training samples that traditional single-task learning faces. It has been observed that learning multiple related tasks simultaneously can improve the modeling accuracy and lead to better learning performance, especially in cases where only a limited number of positive training samples exist for each task [Fei and Huan, 2013]. The third category of approaches is multi-source learning [Abel et al., 2011b; 2013]. Instead of sticking to a single source, these methods aggregate multiple sources to infer users' interests. It should be noted that the last two categories have complementary weaknesses: existing multi-task learning explores the relatedness among tasks but overlooks the consistency among different sources of a single task, whereas existing multi-source learning ignores the value of the label information of other related tasks.

As an improvement over existing work, we propose a structure-constrained multi-source multi-task learning (SM2L) scheme to infer users' interests. In particular, our scheme jointly regularizes two important aspects. One is source consistency: the rationale is that the interests reflected by different social networks for the same person should be similar, and hence the disagreement among the prediction results should be penalized. The other is tree-guided task relatedness modeling. Based on prior knowledge, we organize all the tasks (interests) into a tree structure, which can effectively capture the various relatedness among tasks. Specifically, the tree structure settles all tasks in leaf nodes and characterizes the relatedness among them by internal nodes. Moreover, the higher the level at which an internal node is located, the weaker the relatedness imposed on its children tasks. This is accomplished by a tree-guided group lasso regularizer. Meanwhile, SM2L learns representative features for individual tasks and for groups of related tasks. A potential benefit of sharing training instances among tasks is that the data scarcity problem can be alleviated. Extensive experiments on a real-world dataset validated our scheme. We have released our compiled dataset[2], which will facilitate other researchers in reproducing our approach and comparatively verifying their own ideas.

[2] The compiled dataset is publicly accessible via: http://msmt.farbox.com/.

2 Related Work

The problem of user interest inference from multiple social networks exhibits dual heterogeneities: each task (interest) corresponds to features from multiple sources. The most related work therefore lies in the area of multi-view multi-task learning. [He and Lawrence, 2011] proposed a graph-based iterative framework for multi-view multi-task learning (IteM2) in the context of text classification.
Given task pairs, IteM2 projects them onto a new reproducing kernel Hilbert space based on the common views they share. However, it is a transductive model and thus cannot generate predictive models for independent, unseen samples. To address this intrinsic limitation of transductive models, [Zhang and Huan, 2012] presented an inductive multi-view multi-task learning model (regMVMT). It employs a co-regularization term to achieve model consistency on unlabeled samples from different views, while another regularization function is applied across multiple tasks to encourage the learned models to be similar. Noticeably, the implicit assumption that all tasks are uniformly related, made in the absence of prior knowledge, might be inappropriate. Realizing this limitation, the authors proposed a revised model (regMVMT+) that incorporates a component to automatically infer the task relatedness. As a generalization of regMVMT, an inductive convex shared structure learning algorithm for the multi-view multi-task problem (CSL-MTMV) was developed in [Jin et al., 2013]; CSL-MTMV considers the shared predictive structure among multiple tasks. Notably, only a limited number of works have been published on multi-view multi-task learning, and few of them have been applied to user interest inference.

Distinguished from these existing methods, which maximize the agreement between views using unlabeled data, SM2L works in the supervised setting with two advantages: 1) SM2L considers source consistency and tree-guided relatedness among tasks simultaneously; 2) SM2L allows the learning of task-sharing and task-specific features using a weighted group lasso, where the weights can be learned from prior knowledge.

3 User Interest Inference

This section details the proposed SM2L scheme for user interest inference.

3.1 Notation

We first introduce the notation used throughout this section. We use bold capital letters (e.g., $\mathbf{X}$) and bold lowercase letters (e.g., $\mathbf{x}$) to denote matrices and vectors, respectively. We adopt non-bold letters (e.g., $x$) to represent scalars, and Greek letters (e.g., $\lambda$) for regularization parameters. Unless stated otherwise, all vectors are column vectors.

Suppose we have a set of $N$ labeled data samples, $S \ge 2$ sources, and $T \ge 2$ tasks. Let $D_s$ denote the number of features extracted from the $s$-th source, and let $\mathbf{X}_s \in \mathbb{R}^{N \times D_s}$ denote the feature matrix generated from source $s$, where each row represents a user sample. The feature dimension over all sources is thus $D = \sum_{s=1}^{S} D_s$, and the whole feature matrix can be written as $\mathbf{X} = [\mathbf{X}_1, \mathbf{X}_2, \dots, \mathbf{X}_S] \in \mathbb{R}^{N \times D}$. The label matrix can be represented as $\mathbf{Y} = [\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_T] \in \mathbb{R}^{N \times T}$, where $\mathbf{y}_t = (y_t^1, y_t^2, \dots, y_t^N)^T \in \mathbb{R}^N$ is the label vector of the $t$-th task.

3.2 Problem Formulation

For each task, we can learn $S$ predictive models, each generated from one source and defined as

$$f_{st}(\mathbf{X}_s) = \mathbf{X}_s \mathbf{w}_{st}, \qquad (1)$$

where $\mathbf{w}_{st} = (w_{st}^1, w_{st}^2, \dots, w_{st}^{D_s})^T \in \mathbb{R}^{D_s}$ represents the linear mapping function for the $t$-th task with respect to the $s$-th source. The final predictive model for task $t$ can be reinforced via a linear combination of these $S$ models. Without prior knowledge of source confidence, we treat all sources equally:

$$f_t(\mathbf{X}) = \frac{1}{S} \sum_{s=1}^{S} f_{st}(\mathbf{X}_s). \qquad (2)$$

In multi-class problems, tasks are usually inter-correlated. Multi-source multi-task learning is thus proposed to model their relatedness while seamlessly integrating multiple sources.
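As a concrete reading of Eqns. (1)-(2), the following minimal NumPy sketch (our illustration; the names are ours, not from any released code) builds the per-source predictions for one task and averages them uniformly:

```python
import numpy as np

def predict_task(Xs_list, w_list):
    """Combine the per-source linear predictors for one task (Eqns. 1-2).

    Xs_list : list of S feature matrices X_s, each of shape (N, D_s)
    w_list  : list of S weight vectors w_st, each of shape (D_s,)
    Returns the averaged prediction f_t of shape (N,).
    """
    S = len(Xs_list)
    per_source = [Xs @ w for Xs, w in zip(Xs_list, w_list)]  # f_st(X_s) = X_s w_st
    return sum(per_source) / S  # uniform combination, Eqn. (2)

# Toy example: S = 2 sources describing N = 4 users.
rng = np.random.default_rng(0)
Xs_list = [rng.normal(size=(4, 3)), rng.normal(size=(4, 5))]
w_list = [rng.normal(size=3), rng.normal(size=5)]
print(predict_task(Xs_list, w_list))
```

In the full scheme, the per-source weight vectors $\mathbf{w}_{st}$ are not learned independently; the remainder of this section couples them through a tree-guided penalty and a source-consistency term.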
To select discriminant features, a group lasso is incorporated into the multi-task learning component. Let $\mathbf{W} = (\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_T) \in \mathbb{R}^{D \times T}$ denote the linear mapping block matrix, where $\mathbf{w}_t = (\mathbf{w}_{1t}^T, \mathbf{w}_{2t}^T, \dots, \mathbf{w}_{St}^T)^T \in \mathbb{R}^D$. Multi-source multi-task learning with group lasso can be formalized as

$$\min_{\mathbf{W}} \; \frac{1}{2N}\sum_{t=1}^{T}\Big\|\mathbf{y}_t - \frac{1}{S}\sum_{s=1}^{S}\mathbf{X}_s\mathbf{w}_{st}\Big\|^2 + \lambda \sum_{s=1}^{S}\sum_{d=1}^{D_s}\|\mathbf{w}_s^d\|_2, \qquad (3)$$

where $\mathbf{w}_s^d = (w_{s1}^d, w_{s2}^d, \dots, w_{sT}^d)$ collects the weights of the $d$-th feature of source $s$ across all tasks, $\sum_{s=1}^{S}\sum_{d=1}^{D_s}\|\mathbf{w}_s^d\|_2 = \|\mathbf{W}\|_{2,1}$, and $\lambda$ is the nonnegative regularization parameter that regulates the sparsity of the solution $\mathbf{W}$. When $T \ge 2$, the weights of one feature across all tasks are first grouped by the $L_2$ norm, and all features are then grouped by the $L_1$ norm. The $L_{2,1}$-norm penalty is thus able to select features based on their strength over all tasks, so we can simultaneously learn the task-sharing and task-specific features. Obviously, when $T = 1$, this formulation reduces to the Lasso [Tibshirani, 1996].

However, the above optimization problem simply assumes that all tasks share a common set of relevant input features, which might be unrealistic in many real-world scenarios. For example, in our work the tasks "basketball" and "football" tend to share a common set of relevant input features, which are less likely to be useful for the task "cooking". This consideration propels us to assume that the relatedness among different tasks can be characterized by a tree $\mathcal{T}$ with a set of nodes $V$. In particular, the leaf nodes represent the tasks, while the internal nodes denote groupings of leaf nodes. Intuitively, each node $v \in V$ of the tree $\mathcal{T}$ can be associated with a group $G_v$, which consists of all the leaf nodes (tasks) belonging to the subtree rooted at $v$. Moreover, the higher the level at which an internal node is located, the weaker the relatedness it controls; the root of $\mathcal{T}$ is assigned the highest level. To characterize the strength of relatedness among tasks, we assign a weight $e_v$ to each node $v \in V$ according to prior knowledge, via a hierarchical agglomerative clustering algorithm [Schickel-Zuber and Faltings, 2007]. As illustrated in Figure 1, the tasks "basketball" and "football" are more correlated with each other than either is with "cooking". Thus, in Figure 1, "basketball" and "football" are first grouped in node $v_4$ with weight $e_{v_4} = 0.6$; these two tasks are then grouped, together with "cooking", in a higher-level internal node $v_5$ with weight $e_{v_5} = 0.4$.

Figure 1: Illustration of inter-interest relatedness in a tree structure.

We mathematically formulate the source integration and the tree-constrained[3] group lasso into one unified model,

$$\min_{\mathbf{W}} \; \frac{1}{2N}\sum_{t=1}^{T}\Big\|\mathbf{y}_t - \frac{1}{S}\sum_{s=1}^{S}\mathbf{X}_s\mathbf{w}_{st}\Big\|^2 + \lambda \sum_{s=1}^{S}\sum_{d=1}^{D_s}\sum_{v \in V} e_v \|\mathbf{w}_{s,G_v}^d\|_2, \qquad (4)$$

where $\mathbf{w}_{s,G_v}^d$ is the vector of coefficients $\{w_{st}^d : t \in G_v\}$. In addition, we assume that the mapping functions from all sources agree with one another as much as possible, and therefore introduce a regularization term that models the consistency of results among different sources. The final objective function $\Gamma$ is

$$\Gamma = \frac{1}{2N}\sum_{t=1}^{T}\Big\|\mathbf{y}_t - \frac{1}{S}\sum_{s=1}^{S}\mathbf{X}_s\mathbf{w}_{st}\Big\|^2 + \lambda \sum_{s=1}^{S}\sum_{d=1}^{D_s}\sum_{v \in V} e_v \|\mathbf{w}_{s,G_v}^d\|_2 + \frac{\mu}{2N}\sum_{t=1}^{T}\sum_{s < s'} \big\|\mathbf{X}_s\mathbf{w}_{st} - \mathbf{X}_{s'}\mathbf{w}_{s't}\big\|^2, \qquad (5)$$

where $\mu$ is the nonnegative regularization parameter that regulates the disagreement among the models learned from different sources.

[3] Beyond the tree structure, our model can be extended to incorporate other structures, such as graphs.

3.3 Optimization

Considering that the second term in Eqn. (5) is not differentiable, we use an equivalent formulation of it, proven in [Bach, 2008], to facilitate the optimization:

$$\Gamma = \frac{1}{2N}\sum_{t=1}^{T}\Big\|\mathbf{y}_t - \frac{1}{S}\sum_{s=1}^{S}\mathbf{X}_s\mathbf{w}_{st}\Big\|^2 + \frac{\lambda}{2}\Big(\sum_{s=1}^{S}\sum_{d=1}^{D_s}\sum_{v \in V} e_v \|\mathbf{w}_{s,G_v}^d\|_2\Big)^2 + \frac{\mu}{2N}\sum_{t=1}^{T}\sum_{s < s'} \big\|\mathbf{X}_s\mathbf{w}_{st} - \mathbf{X}_{s'}\mathbf{w}_{s't}\big\|^2. \qquad (6)$$

Still, the squared sum of $L_2$ norms in the above formulation remains non-smooth, which makes it intractable to solve directly.
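Before turning to the variational reformulation below, the following minimal sketch (ours, not the paper's code) shows how the tree-guided penalty, the second term of Eqn. (5), can be evaluated; the tree is encoded as a list of $(e_v, G_v)$ pairs, mirroring Figure 1's example, and including the leaves with weight 1 is our assumption, consistent with the weighting scheme described later in Section 3.4.

```python
import numpy as np

def tree_guided_penalty(W_blocks, tree_groups):
    """Evaluate the tree-guided group-lasso term of Eqn. (5).

    W_blocks    : list of S matrices W_s of shape (D_s, T); row d is w_s^d
    tree_groups : list of (e_v, G_v) pairs, one per tree node v, where
                  G_v lists the leaf tasks under node v
    """
    penalty = 0.0
    for Ws in W_blocks:                      # sum over sources s
        for d in range(Ws.shape[0]):         # sum over features d
            for e_v, G_v in tree_groups:     # sum over tree nodes v
                penalty += e_v * np.linalg.norm(Ws[d, G_v])  # e_v ||w^d_{s,G_v}||_2
    return penalty

# Toy tree over T = 3 tasks, mirroring Figure 1: three leaves, the node
# v4 = {basketball, football} with e = 0.6, and the root v5 with e = 0.4.
tree = [(1.0, [0]), (1.0, [1]), (1.0, [2]), (0.6, [0, 1]), (0.4, [0, 1, 2])]
W_blocks = [np.ones((4, 3)), np.ones((2, 3))]
print(tree_guided_penalty(W_blocks, tree))
```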
To make the objective tractable, we further resort to another variational formulation [Argyriou et al., 2008] of Eqn. (6). According to the Cauchy-Schwarz inequality, for an arbitrary vector $\mathbf{b} \in \mathbb{R}^M$ with $\mathbf{b} \ne \mathbf{0}$, we have

$$\Big(\sum_{i=1}^{M} |b_i|\Big)^2 \le \sum_{i=1}^{M} \theta_i^{-1} b_i^2, \qquad (7)$$

where the $\theta_i$'s are introduced variables satisfying $\sum_{i=1}^{M}\theta_i = 1$ and $\theta_i > 0$, and equality holds for $\theta_i = |b_i| / \|\mathbf{b}\|_1$. Based on this preliminary, we can derive the following inequality,

$$\Big(\sum_{s=1}^{S}\sum_{d=1}^{D_s}\sum_{v \in V} e_v \|\mathbf{w}_{s,G_v}^d\|_2\Big)^2 \le \sum_{s=1}^{S}\sum_{d=1}^{D_s}\sum_{v \in V} \frac{e_v^2 \|\mathbf{w}_{s,G_v}^d\|_2^2}{q_{s,d,v}}, \qquad (8)$$

where we introduce the variables $q_{s,d,v}$. Equality is attained when

$$q_{s,d,v} = \frac{e_v \|\mathbf{w}_{s,G_v}^d\|_2}{\sum_{s=1}^{S}\sum_{d=1}^{D_s}\sum_{v \in V} e_v \|\mathbf{w}_{s,G_v}^d\|_2}. \qquad (9)$$

Consequently, minimizing $\Gamma$ is equivalent to minimizing the following convex objective function, jointly over $\mathbf{W}$ and the $q_{s,d,v}$ (subject to $\sum_{s,d,v} q_{s,d,v} = 1$ and $q_{s,d,v} > 0$),

$$\frac{1}{2N}\sum_{t=1}^{T}\Big\|\mathbf{y}_t - \frac{1}{S}\sum_{s=1}^{S}\mathbf{X}_s\mathbf{w}_{st}\Big\|^2 + \frac{\lambda}{2}\sum_{s=1}^{S}\sum_{d=1}^{D_s}\sum_{v \in V}\frac{e_v^2\|\mathbf{w}_{s,G_v}^d\|_2^2}{q_{s,d,v}} + \frac{\mu}{2N}\sum_{t=1}^{T}\sum_{s<s'}\big\|\mathbf{X}_s\mathbf{w}_{st} - \mathbf{X}_{s'}\mathbf{w}_{s't}\big\|^2. \qquad (10)$$

To facilitate the computation of the derivative of the objective with respect to $\mathbf{w}_{st}$, we define a diagonal matrix $\mathbf{Q}_{st} \in \mathbb{R}^{D_s \times D_s}$ as

$$\mathbf{Q}_{st}(d, d) = \sum_{v \in V :\, t \in G_v} \frac{e_v^2}{q_{s,d,v}}. \qquad (11)$$

Finally, we have the objective function

$$\Gamma = \frac{1}{2N}\sum_{t=1}^{T}\Big\|\mathbf{y}_t - \frac{1}{S}\sum_{s=1}^{S}\mathbf{X}_s\mathbf{w}_{st}\Big\|^2 + \frac{\lambda}{2}\sum_{t=1}^{T}\sum_{s=1}^{S}\mathbf{w}_{st}^T \mathbf{Q}_{st}\mathbf{w}_{st} + \frac{\mu}{2N}\sum_{t=1}^{T}\sum_{s<s'}\big\|\mathbf{X}_s\mathbf{w}_{st} - \mathbf{X}_{s'}\mathbf{w}_{s't}\big\|^2. \qquad (12)$$

We adopt an alternating optimization strategy to solve Eqn. (12) [Kim and Xing, 2010]: we alternately optimize $\mathbf{w}_{st}$ and $q_{s,d,v}$, optimizing one variable with the other fixed in each iteration, and repeat this procedure until the objective value converges. When $q_{s,d,v}$ is fixed, the derivative of $\Gamma$ with respect to $\mathbf{w}_{st}$ is

$$\frac{\partial \Gamma}{\partial \mathbf{w}_{st}} = \frac{1}{NS}\mathbf{X}_s^T\Big(\frac{1}{S}\sum_{s'=1}^{S}\mathbf{X}_{s'}\mathbf{w}_{s't} - \mathbf{y}_t\Big) + \lambda\mathbf{Q}_{st}\mathbf{w}_{st} + \frac{\mu}{N}\sum_{s' \ne s}\mathbf{X}_s^T\big(\mathbf{X}_s\mathbf{w}_{st} - \mathbf{X}_{s'}\mathbf{w}_{s't}\big). \qquad (13)$$

Setting Eqn. (13) to zero and rearranging terms, all the $\mathbf{w}_{st}$'s for a given task $t$ can be learned jointly from the linear system

$$\begin{bmatrix} \mathbf{L}_{11} & \mathbf{L}_{12} & \cdots & \mathbf{L}_{1S} \\ \mathbf{L}_{21} & \mathbf{L}_{22} & \cdots & \mathbf{L}_{2S} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{L}_{S1} & \mathbf{L}_{S2} & \cdots & \mathbf{L}_{SS} \end{bmatrix} \begin{bmatrix} \mathbf{w}_{1t} \\ \mathbf{w}_{2t} \\ \vdots \\ \mathbf{w}_{St} \end{bmatrix} = \begin{bmatrix} \mathbf{b}_{1t} \\ \mathbf{b}_{2t} \\ \vdots \\ \mathbf{b}_{St} \end{bmatrix}, \qquad (14)$$

where $\mathbf{L}_t \in \mathbb{R}^{D \times D}$ is a sparse block matrix with $S \times S$ blocks, and $\mathbf{w}_t \in \mathbb{R}^D$ and $\mathbf{b}_t \in \mathbb{R}^D$ are block vectors with $S$ blocks each. The blocks $\mathbf{L}_{ss}$, $\mathbf{L}_{ss'}$ ($s' \ne s$), and $\mathbf{b}_{st}$ are defined as

$$\mathbf{L}_{ss} = \frac{1}{NS^2}\mathbf{X}_s^T\mathbf{X}_s + \frac{\mu(S-1)}{N}\mathbf{X}_s^T\mathbf{X}_s + \lambda\mathbf{Q}_{st}, \quad \mathbf{L}_{ss'} = \frac{1}{NS^2}\mathbf{X}_s^T\mathbf{X}_{s'} - \frac{\mu}{N}\mathbf{X}_s^T\mathbf{X}_{s'}, \quad \mathbf{b}_{st} = \frac{1}{NS}\mathbf{X}_s^T\mathbf{y}_t. \qquad (15)$$

By the definition of a positive-definite matrix, $\mathbf{L}_t$ can easily be proven to be positive definite and hence invertible, which yields the closed-form solution

$$\mathbf{w}_t = \mathbf{L}_t^{-1}\mathbf{b}_t. \qquad (16)$$

Furthermore, each $\mathbf{w}_t$ can be computed individually, which saves considerable space and time. In turn, with $\mathbf{w}_t$ fixed, we optimize $q_{s,d,v}$ in closed form according to Eqn. (9).
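Putting Eqns. (9), (11), and (15)-(16) together, here is a condensed NumPy sketch of the alternating procedure. It reuses the $(e_v, G_v)$ tree encoding from the earlier sketch, initializes $\mathbf{W}$ randomly, and adds small $\epsilon$-guards so the q-update stays defined when group norms vanish; it is our illustration of the updates above, not the authors' released implementation.

```python
import numpy as np

def fit_sm2l(X_list, Y, tree_groups, lam=0.1, mu=0.1, n_iter=20, eps=1e-8):
    """Alternating optimization of Eqn. (12); returns S matrices W_s (D_s, T)."""
    S, (N, T) = len(X_list), Y.shape
    Ds = [X.shape[1] for X in X_list]
    offs = np.concatenate(([0], np.cumsum(Ds)))
    rng = np.random.default_rng(0)
    W = [rng.normal(scale=1e-2, size=(D, T)) for D in Ds]
    XtX = {(a, b): X_list[a].T @ X_list[b] for a in range(S) for b in range(S)}

    for _ in range(n_iter):
        # q-step (Eqn. 9): closed form given the current weights.
        num = {(s, d, v): e_v * np.linalg.norm(W[s][d, G_v])
               for s in range(S) for d in range(Ds[s])
               for v, (e_v, G_v) in enumerate(tree_groups)}
        Z = sum(num.values()) + eps
        q = {k: val / Z + eps for k, val in num.items()}

        # w-step (Eqns. 11 and 15-16): one block linear system per task.
        for t in range(T):
            L = np.zeros((offs[-1], offs[-1]))
            b = np.zeros(offs[-1])
            for s in range(S):
                # Diagonal entries of Q_st (Eqn. 11).
                Qd = np.array([sum(e_v ** 2 / q[(s, d, v)]
                                   for v, (e_v, G_v) in enumerate(tree_groups)
                                   if t in G_v)
                               for d in range(Ds[s])])
                for s2 in range(S):
                    blk = XtX[(s, s2)] / (N * S ** 2)
                    if s2 == s:
                        blk = blk + mu * (S - 1) / N * XtX[(s, s)] + lam * np.diag(Qd)
                    else:
                        blk = blk - mu / N * XtX[(s, s2)]
                    L[offs[s]:offs[s + 1], offs[s2]:offs[s2 + 1]] = blk
                b[offs[s]:offs[s + 1]] = X_list[s].T @ Y[:, t] / (N * S)
            wt = np.linalg.solve(L, b)       # w_t = L_t^{-1} b_t (Eqn. 16)
            for s in range(S):
                W[s][:, t] = wt[offs[s]:offs[s + 1]]
    return W
```

As the text notes, the Gram blocks $\mathbf{X}_s^T\mathbf{X}_{s'}$ are cached once (the `XtX` dictionary) since they do not change across iterations; only $\mathbf{Q}_{st}$ is rebuilt each round.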
3.4 Construction of the Interest Tree Structure

We employ a hierarchical agglomerative clustering algorithm to construct the tree structure. One challenge is that an interest is usually represented by a single concept, which makes it hard to measure the similarities among interests and hence to apply the clustering algorithm. Towards this end, two types of prior knowledge are utilized.

1) External source. We exploit an external source, the Web, where a huge amount of prior knowledge about interests is encoded implicitly. We transform each interest into a query and submit it to the Google search engine. We collect the top 10 webpages and then employ the BoilerPipe library[4] [Kohlschütter et al., 2010] to extract the clean main content from the returned webpages. Each interest can thereby be represented by a document, to which the bag-of-words model [Mitchell, 1997] with the TF-IDF term weighting scheme [Salton and McGill, 1983] can be applied, and the similarities among interests can then be evaluated.

2) Internal source. Although the external source provides general prior knowledge, we believe that the internal prior knowledge stored in our dataset also plays a vital role in user interest inference. Driven by this consideration, we propose to measure the similarities among interests based on their co-occurrence in users' LinkedIn profiles in our dataset.[5] It deserves attention that we exploit all available LinkedIn profiles that exhibit users' personal interests, rather than only those of the subset of users selected for the task of interest inference. Suppose we have a set of interests $I = \{In_1, In_2, \dots, In_T\}$ and a set of documents $DD = \{d_1, d_2, \dots, d_N\}$, where $d_l$ contains all interests of user $l$. Let $c(j, k, l) = 1$ if and only if interests $In_j$ and $In_k$ both occur in $d_l$, and $c(j, k, l) = 0$ otherwise. The co-occurrence matrix $\mathbf{H}$ is then defined as

$$\mathbf{H}(j, k) = \begin{cases} \dfrac{\sum_l c(j,k,l)}{\sum_{j'} \sum_l c(j',k,l)} & \text{if } j \ne k; \\[2pt] 1 & \text{otherwise}. \end{cases} \qquad (17)$$

Each row of $\mathbf{H}$ corresponds to the co-occurrence of one interest with the others. We then use the Jensen-Shannon divergence [Bordag, 2008] to measure the similarities among interests.

Given either similarity measure, we apply the hierarchical agglomerative clustering algorithm to these enriched interests and build the tree structure. To assign appropriate weights to nodes, we utilize the normalized height $h_v$ of the subtree rooted at node $v$ to characterize its weight $e_v$, where $e_v = 1 - h_v$. This assignment guarantees the aforementioned condition that a higher node corresponds to weaker relatedness. Note that we normalize the heights of all nodes such that the root node is at height 1. We thus derive two models, SM2L-e and SM2L-i, based on the two types of prior knowledge, respectively.

[4] https://code.google.com/p/boilerpipe/.
[5] Users may list a set of personal interests in their LinkedIn profiles.
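The following sketch illustrates the internal-source (SM2L-i) variant of this construction under stated assumptions: SciPy's average-linkage agglomerative clustering (the paper does not specify the linkage criterion), Jensen-Shannon distances between rows of $\mathbf{H}$, and the weighting $e_v = 1 - h_v$ with heights normalized so the root sits at height 1. It returns the $(e_v, G_v)$ encoding used by the earlier sketches.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import jensenshannon, pdist

def build_interest_tree(H):
    """Build (e_v, G_v) node groups from the co-occurrence matrix H (Eqn. 17)."""
    T = H.shape[0]
    # Pairwise Jensen-Shannon distances between interest co-occurrence profiles.
    dist = pdist(H, metric=jensenshannon)
    Z = linkage(dist, method="average")      # linkage criterion is our assumption
    root_h = Z[:, 2].max()
    groups = [(1.0, [t]) for t in range(T)]  # leaves sit at height 0, so e_v = 1
    members = {t: [t] for t in range(T)}
    for i, (a, b, h, _) in enumerate(Z):     # merges, bottom-up
        members[T + i] = members[int(a)] + members[int(b)]
        groups.append((1.0 - h / root_h, sorted(members[T + i])))  # e_v = 1 - h_v
    return groups

# Toy example: 4 interests with a block co-occurrence structure.
H = np.array([[1.0, 0.6, 0.1, 0.1], [0.6, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.5], [0.1, 0.1, 0.5, 1.0]])
print(build_interest_tree(H))
```

Note that under this literal normalization the root group receives weight 0 and drops out of the penalty, whereas the weights in Figure 1 suggest the authors' normalization keeps it positive; treat this detail as approximate.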
3.5 Complexity Discussion

To analyze the complexity of SM2L, we need to account for the time cost of constructing $\mathbf{Q}$, $\mathbf{L}_t$, and $\mathbf{b}_t$, defined in Eqns. (11) and (15), as well as of computing the inverse of $\mathbf{L}_t$. Assuming $D \gg S$, the construction of the diagonal matrix $\mathbf{Q}$ has a time complexity of $O(DT)$, and the construction of the matrix $\mathbf{L}_t$ has a time complexity of $O(ND^2)$. Because the matrix products $\mathbf{X}_s^T\mathbf{X}_{s'}$ and the vectors $\mathbf{b}_t$ in Eqn. (15) remain the same across all iterations, and $\mathbf{L}_t$ is symmetric, we can reduce the practical time consumption remarkably. In addition, computing the inverse of $\mathbf{L}_t$ has complexity $O(D^3)$ by the standard method, so the total complexity is $O(D^3T)$. We notice that the speed bottleneck lies in the number of features and the number of tasks, not in the number of data samples. As $D$ is usually small, SM2L should be computationally efficient.

4 Experiments

We cast the problem of user interest inference as a structure-constrained multi-source multi-task learning problem. In particular, we explored four popular social networks: Twitter, Facebook, Quora, and LinkedIn.

4.1 Dataset Construction

To construct the benchmark dataset, we first need to tackle the problem of "social account alignment", which aims to identify the same users across different social networks by linking their multiple social accounts [Abel et al., 2013]. To accurately establish this mapping, we employed the emerging social service Quora, which encourages users to explicitly list their multiple social accounts in their Quora profiles.[6] We collected candidates from Quora by the breadth-first-search method. In the end, we harvested 172,235 Quora user profiles and only retained those who provided their Facebook, Twitter, and LinkedIn accounts in their Quora profiles. Based on these mappings, we launched a crawler to collect their historical social content, including their basic profiles, social posts, and relations. To build the ground truth, we employed the structured "Additional Information" field of users' LinkedIn profiles, which usually contains information about users' personal interests. Interests listed in LinkedIn profiles are usually represented as comma-separated phrases, which facilitates the ground truth construction to a large extent. To obtain representative interests, we filtered out the interests liked by fewer than 15 users, finally obtaining 74 interests.[7] We then retained only the users who expressed these interests in their LinkedIn profiles, ultimately obtaining 1,607 users. Figure 2 shows the user frequency distribution with respect to the number of interests over our dataset.

Figure 2: Distribution of user frequency with respect to the number of interests over our dataset.

[6] One representative example can be seen via https://www.quora.com/Martijn-Sjoorda.
[7] These interests are available at http://msmt.farbox.com/.

4.2 Feature Extraction

To informatively describe users, we extracted two kinds of features: user topics and contextual topics.

User topics. We explored the topic distributions of users' social posts to infer users' interests. We generated topic distributions using Latent Dirichlet Allocation (LDA) [Blei et al., 2003], which has been widely found useful in latent topic modeling [Cimiano et al., 2009; Iwata et al., 2009]. Based on perplexity [Li et al., 2010], we ultimately obtained 89-, 24-, and 119-dimensional topic-level features over users' Twitter[8], Facebook[9], and Quora[10] data, respectively.

[8] Users' Twitter data refers to their historical tweets.
[9] Users' Facebook data refers to their historical timelines.
[10] Users' Quora data refers to their historical questions and answers.

Contextual topics. We define users' contextual topics as the topics of users' connections. As the saying goes, "birds of a feather flock together"; we believe that contextual topics intuitively reflect the contexts of users and further disclose their interests. In particular, we studied followee connections on Twitter because they intuitively reflect the topics users are concerned with. As bio descriptions are usually provided by users to briefly express themselves and may indicate their summarized interests, we merged the bio descriptions of a user's followees into a document, to which we further applied the LDA model. We utilized perplexity to tune the dimension of the topic-level features over these bio documents and obtained a 64-dimensional feature space. In this work, we only explored contextual topics on Twitter, since bio descriptions are usually missing on Facebook and Quora.

4.3 On Evaluation Metrics

For the task of user interest inference, precision is more important than recall. We thus validated our scheme via two metrics: S@K and P@K. S@K represents the mean probability that a correct interest is captured within the top K recommended interests. P@K stands for the proportion of the top K recommended interests that are correct. All experiments were conducted on a server equipped with an Intel(R) Xeon(R) CPU X5650 at 2.67 GHz, 48 GB RAM, 24 cores, and the 64-bit CentOS 5.4 operating system.
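As a concrete reading of these two metrics, here is a small NumPy sketch; `scores` is an N x T matrix of predicted interest scores (e.g., from Eqn. (2)) and `labels` a boolean N x T ground-truth matrix. Interpreting S@K as a top-K hit rate is our reading of the definition above.

```python
import numpy as np

def s_at_k(scores, labels, k):
    """S@K: fraction of users with at least one correct interest in the top K."""
    top = np.argsort(-scores, axis=1)[:, :k]
    return float(np.mean([labels[i, top[i]].any() for i in range(len(scores))]))

def p_at_k(scores, labels, k):
    """P@K: proportion of the top-K recommended interests that are correct."""
    top = np.argsort(-scores, axis=1)[:, :k]
    return float(np.mean([labels[i, top[i]].mean() for i in range(len(scores))]))

# Toy check: 2 users, 4 interests.
scores = np.array([[0.9, 0.1, 0.5, 0.2], [0.2, 0.8, 0.1, 0.7]])
labels = np.array([[1, 0, 0, 0], [0, 0, 1, 1]], dtype=bool)
print(s_at_k(scores, labels, 2), p_at_k(scores, labels, 2))  # 1.0 0.5
```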
4.4 On Model Comparison

We compared SM2L with the following five baselines.

SVM: The first baseline is a traditional single-source single-task learning method, the support vector machine (SVM) [Cortes and Vapnik, 1995], which simply concatenates the features generated from different sources into a single feature vector and learns each task individually. We chose the formulation with the radial basis function kernel, implemented with LIBSVM [Chang and Lin, 2011].

RLS: The second baseline is the regularized least squares (RLS) model [Kim et al., 2007], which also learns each task individually and minimizes the objective function $\frac{1}{2N}\big\|\mathbf{y}_t - \sum_{s=1}^{S}\frac{1}{S}\mathbf{X}_s\mathbf{w}_{st}\big\|^2 + \frac{\lambda}{2}\|\mathbf{w}_t\|^2$.

regMVMT: The third baseline is the regularized multi-view multi-task learning model introduced in [Zhang and Huan, 2012]. This model regularizes both the source consistency and the task relatedness; however, it simply assumes uniform relatedness among tasks.

SM2L-eu: The fourth baseline is a derivation of SM2L-e. It constructs the tree structure from the external source in the same manner as SM2L-e but assigns uniform weights to all nodes.

SM2L-iu: The fifth baseline is a derivation of SM2L-i, which constructs the tree structure from the internal source but weights all nodes uniformly.

We adopted the grid search strategy to determine the optimal values of the regularization parameters among $\{10^r : r \in \{-12, \dots, 1\}\}$. The experimental results reported in this work are averages over 10-fold cross validation. Noticeably, we tuned the K in S@K and P@K from 1 to 10 and reported the optimal performance for each fold; generally, S@K reaches its maximum at K = 10, while K = 1 is preferable for P@K.

Table 1: Performance comparison among various models.

    Approach   P@K (%)   S@K (%)
    SVM          8.69     54.69
    RLS         24.32     73.86
    regMVMT     24.69     74.54
    SM2L-eu     25.50     73.80
    SM2L-iu     24.56     74.11
    SM2L-e      25.72     74.57
    SM2L-i      26.50     74.85

Table 1 shows the performance comparison between the baselines and our proposed scheme. We observed that SM2L-i and SM2L-e both outperform the single-source single-task learners SVM and RLS, which verifies the significance of considering source consistency and task relatedness simultaneously. Moreover, it is not unexpected that SVM achieves the worst performance; a possible explanation is the insufficiency of positive training samples for certain interests. For example, only 24 positive training samples are available for the interest "surfing". In addition, the less satisfactory performance of regMVMT, as compared to SM2L-i and SM2L-e, confirms that it is advisable to characterize the task relatedness in a tree structure instead of correlating all tasks uniformly. Besides, SM2L-i and SM2L-e show superiority over SM2L-iu and SM2L-eu, respectively, which leads us to conclude that modeling the strength of relatedness among tasks merits particular attention. Last but not least, SM2L-i performs better than SM2L-e, which demonstrates the importance of the prior knowledge extracted from our internal source.

In practice, the running time of regMVMT is remarkably higher than that of SM2L: regMVMT takes about 562 seconds per iteration, 114 times that of SM2L. This is mainly attributed to the computation of the inverse of a matrix of dimension $DT$, which requires $O(D^3T^3)$ time, making regMVMT far more time-consuming than SM2L.
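For completeness, here is a hypothetical sketch of the tuning protocol described above (grid over $\{10^r\}$, 10-fold cross validation, scored by P@K). It reuses the earlier `fit_sm2l` and `p_at_k` sketches, so it inherits their assumptions; the fold construction and the scoring choice are ours.

```python
import numpy as np
from sklearn.model_selection import KFold

def predict_all(X_list, W_blocks):
    """Score matrix (N, T): uniform combination of per-source models (Eqn. 2)."""
    return sum(X @ W for X, W in zip(X_list, W_blocks)) / len(X_list)

def tune_sm2l(X_list, Y, tree, k=1):
    """Grid search over {10^r : r = -12..1} with 10-fold CV, scored by P@K."""
    grid = [10.0 ** r for r in range(-12, 2)]
    best_score, best_params = -np.inf, None
    for lam in grid:
        for mu in grid:
            folds = []
            for tr, te in KFold(10, shuffle=True, random_state=0).split(Y):
                W = fit_sm2l([X[tr] for X in X_list], Y[tr], tree, lam, mu)
                folds.append(p_at_k(predict_all([X[te] for X in X_list], W),
                                    Y[te] > 0, k))
            if np.mean(folds) > best_score:
                best_score, best_params = np.mean(folds), (lam, mu)
    return best_params, best_score
```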
4.5 On Source Comparison

To shed light on the descriptiveness of multiple social network integration, we conducted experiments over various source combinations. Table 2 shows the performance of SM2L-i over each individual social network and their various combinations. We noted that the more sources we incorporated, the better the performance achieved, which suggests complementary rather than mutually conflicting relationships among the sources. Moreover, we found that aggregating data from all three social networks achieves better performance than any single source. Interestingly, we observed that SM2L over Twitter alone achieves a much better performance than Quora or Facebook alone. This may be because we additionally extracted contextual topics, apart from user topics, in Twitter, which reveal users' interests more directly. Note that SM2L degenerates to multi-task learning when the problem involves only a single source.

Table 2: Contribution of individual social networks and their various combinations.

    Social network combination   P@K (%)   S@K (%)
    Twitter                       24.75     73.05
    Facebook                      19.59     69.74
    Quora                         20.97     68.19
    Twitter+Facebook              25.51     74.98
    Twitter+Quora                 24.89     74.41
    Facebook+Quora                22.52     71.80
    Twitter+Facebook+Quora        26.50     74.85

5 Conclusions and Future Work

This paper presented a structure-constrained multi-source multi-task learning scheme in the context of user interest inference. In particular, the scheme takes both the source consistency and the tree-guided task relatedness into consideration by introducing two regularizers into the objective function. Moreover, the proposed model is able to effectively select task-sharing and task-specific features by employing a weighted group lasso, where the weights can be learned from two kinds of prior knowledge: an external source and an internal source. Experimental results demonstrate the effectiveness of our proposed scheme. Currently, we only study users' distributed textual data; in the future, we will extend our work to investigate users' visual information on social media services.

Acknowledgments

This research is supported by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the IDM Programme Office.

References

[Abel et al., 2011a] Fabian Abel, Ilknur Celik, Claudia Hauff, Laura Hollink, and Geert-Jan Houben. U-Sem: Semantic enrichment, user modeling and mining of usage data on the social web. arXiv preprint arXiv:1104.0126, 2011.
[Abel et al., 2011b] Fabian Abel, Eelco Herder, and Daniel Krause. Extraction of professional interests from social web profiles. In UMAP, 2011.
[Abel et al., 2013] Fabian Abel, Eelco Herder, Geert-Jan Houben, Nicola Henze, and Daniel Krause. Cross-system user modeling and personalization on the social web. UMUAI, 2013.
[Argyriou et al., 2008] Andreas Argyriou, Theodoros Evgeniou, and Massimiliano Pontil. Convex multi-task feature learning. Machine Learning, 2008.
[Bach, 2008] Francis R. Bach. Consistency of the group lasso and multiple kernel learning. JMLR, 2008.
[Blei et al., 2003] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. JMLR, 2003.
[Bordag, 2008] Stefan Bordag. A comparison of co-occurrence and similarity measures as simulations of context. In CICLing, 2008.
[Chang and Lin, 2011] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. TIST, 2011.
[Cimiano et al., 2009] Philipp Cimiano, Antje Schultz, Sergej Sizov, Philipp Sorg, and Steffen Staab. Explicit versus latent concept models for cross-language information retrieval. In IJCAI, 2009.
[Cortes and Vapnik, 1995] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 1995.
[Fei and Huan, 2013] Hongliang Fei and Jun Huan. Structured feature selection and task relationship inference for multi-task learning. KAIS, 2013.
[He and Lawrence, 2011] Jingrui He and Rick Lawrence. A graph-based framework for multi-task multi-view learning. In ICML, 2011.
[Iwata et al., 2009] Tomoharu Iwata, Shinji Watanabe, Takeshi Yamada, and Naonori Ueda. Topic tracking model for analyzing consumer purchase behavior. In IJCAI, 2009.
[Jin et al., 2013] Xin Jin, Fuzhen Zhuang, Shuhui Wang, Qing He, and Zhongzhi Shi. Shared structure learning for multiple tasks with multiple views. In ECML/PKDD, 2013.
[Kim and Xing, 2010] Seyoung Kim and Eric P. Xing. Tree-guided group lasso for multi-task regression with structured sparsity. In ICML, 2010.
[Kim et al., 2007] S. Kim, Kwangmoo Koh, Michael Lustig, Stephen Boyd, and Dimitry Gorinevsky. A method for large-scale l1-regularized least squares problems with applications in signal processing and statistics. J-STSP, 2007.
[Kohlschütter et al., 2010] Christian Kohlschütter, Peter Fankhauser, and Wolfgang Nejdl. Boilerplate detection using shallow text features. In WSDM, 2010.
[Li et al., 2010] Daifeng Li, Bing He, Ying Ding, Jie Tang, Cassidy Sugimoto, Zheng Qin, Erjia Yan, Juanzi Li, and Tianxi Dong. Community-based topic modeling for social tagging. In CIKM, 2010.
[Mitchell, 1997] Tom M. Mitchell. Machine Learning. McGraw-Hill, 1997.
[Pennacchiotti and Popescu, 2011] Marco Pennacchiotti and Ana-Maria Popescu. A machine learning approach to Twitter user classification. In ICWSM, 2011.
[Salton and McGill, 1983] Gerard Salton and Michael J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1983.
[Schickel-Zuber and Faltings, 2007] Vincent Schickel-Zuber and Boi Faltings. Using hierarchical clustering for learning the ontologies used in recommendation systems. In KDD, 2007.
[Tibshirani, 1996] Robert Tibshirani. Regression shrinkage and selection via the lasso. JRSS B, 1996.
[Xue et al., 2007] Ya Xue, Xuejun Liao, Lawrence Carin, and Balaji Krishnapuram. Multi-task learning for classification with Dirichlet process priors. JMLR, 2007.
[Zhang and Huan, 2012] Jintao Zhang and Jun Huan. Inductive multi-task learning with multiple view data. In KDD, 2012.