Location-Sensitive User Profiling Using Crowdsourced Labels

Wei Niu, James Caverlee, Haokai Lu
Department of Computer Science and Engineering, Texas A&M University
College Station, TX 77840, USA
{wei,caverlee,hlu}@cse.tamu.edu

Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

In this paper, we investigate the impact of spatial variation on the construction of location-sensitive user profiles. We demonstrate evidence of spatial variation over a collection of Twitter Lists, wherein we find that crowdsourced labels are constrained by distance. For example, we find that energy in San Francisco is more associated with the green movement, whereas in Houston it is more associated with oil and gas. We propose a three-step framework for location-sensitive user profiling: first, it constructs a crowdsourced label similarity graph, where each labeler and labelee are annotated with a geographic coordinate; second, it transforms this similarity graph into a directed weighted tree that imposes a hierarchical structure over these labels; third, it embeds this location-sensitive folksonomy into a user profile ranking algorithm that outputs a ranked list of candidate labels for a partially observed user profile. Through extensive experiments over a Twitter list dataset, we demonstrate the effectiveness of this location-sensitive user profiling.

Introduction

User profiles are a valuable component of many applications, including recommender systems, search engines, question-answering systems, and online social networks. These profiles provide insight into the interests and expertise of each user, and can lead to improved personalization of the underlying system (?; ?; ?). Many systems rely on an explicit definition of a user profile, for example by filling in an About section in a social media profile or by directly selecting topics of interest on a question-answering system. Alternatively, implicit user profiles can be uncovered through methods like query log mining, running Latent Dirichlet Allocation (LDA) over a user's posts, or applying matrix factorization approaches to identify hidden (or latent) topics of interest (?; ?; ?; ?).

In a complementary direction, recent years have seen the development of crowdsourced user profile construction, e.g., (?; ?; ?). In this scenario, crowds of users apply descriptive labels to other users, so that in the aggregate these labels provide a crowdsourced user profile of the target user. For example, Twitter Lists and LinkedIn's Skill tags provide a partial perspective on what users are known for by aggregating crowd labeling knowledge. However, the vast majority of users have no crowd labels; their expertise is essentially hidden from important applications such as personalized recommendation, community detection, and expert mining. In this paper, we aim to extend the reach of these crowdsourced methods, so as to construct robust user profiles for the long tail of users for whom we have incomplete labels.

A natural approach to extend the reach of these crowd-generated labels is to apply existing tag recommendation methods (?; ?; ?; ?; ?). However, many of these approaches have viewed tag relationships without regard for the local variations that are inherent in real-world crowdsourced labels of users.
For example, we find in a sample of Twitter Lists that the label energy in San Francisco is more associated with the green movement, whereas in Houston it is more associated with oil and gas. These spatial variations are a critical component of crowdsourced labels and require careful consideration. Beyond the presence of different relationships across locations, there is often variation in the strength of a relationship from location to location. For example, stock and finance are more closely related in New York City than in Portland. Further, there is even the potential for varying location-specific senses of a tag (polysemy). For example, the tag rockets in the Houston area may be associated with the local NBA team rather than with other senses of the word.

Hence, we explore the impact of spatial variation on the construction of location-sensitive user profiles. Our main intuition is that spatial variation over crowdsourced labels can be modeled in a location-sensitive folksonomy to provide a comprehensive and up-to-date picture of location-aware topics, topic relations, and a fine-grained topic-level view of the social media corpus, which may mitigate the sparsity inherent in the raw labels. Recent studies (?; ?) have shown how hierarchical topic structures can improve ranking and recommendation, indicating the importance of folksonomies. Thus, we aim to study the impact of location-sensitive hierarchical structures of crowdsourced tags on user profiling.

Concretely, we propose an approach for location-sensitive user profiling as illustrated in Figure 1. First, we construct a crowdsourced label similarity graph induced from crowdsourced labels, where each labeler and labelee are annotated with a geographic coordinate; this similarity graph varies by location to capture spatial variations of the kind identified above (e.g., the similarity graph for San Francisco will link energy with green movement). Second, we transform this similarity graph into a directed weighted tree that imposes a hierarchical structure over these labels, such that labels like sports are higher in the tree, whereas labels like rockets are lower, thereby providing finer granularity for building user profiles. Finally, we embed this location-sensitive folksonomy into a user profile ranking algorithm that outputs a ranked list of candidate labels for a partially observed user profile. Through extensive experiments over a Twitter list dataset, we demonstrate the effectiveness of this location-sensitive user profile estimation.

Figure 1: Overall approach: constructing location-sensitive user profiles from crowdsourced labels.

Related Work

In this section, we highlight several research directions that inform the work presented here.

User and Resource Profiling. User profiling is critical for enabling effective information services. Many efforts are devoted to profiling a user's topical interests for applications in personalized search (?), targeted advertisement (?), social media (?), and expert search (?). Many other research efforts aim to reveal users' demographic information such as age and gender (?; ?; ?). For example, Li et al. (?) propose a co-profiling approach that jointly profiles users' attributes such as employer, college, and circles of friends. Meanwhile, another line of research focuses on recommending tags to resources, targeting the sparsity of collaborative tagging in order to construct comprehensive tag profiles (?; ?; ?).
Some proposed approaches are based on topic models, association rule mining, and context information (?; ?; ?). Later works construct hierarchical folksonomies to assist resource tag prediction (?; ?). In our work, we leverage a location-sensitive folksonomy for user profiling, and we adopt a learning-to-rank approach that automatically weighs a group of factors to minimize the prediction error.

Folksonomy Construction. Considerable research has been devoted to folksonomy generation (?; ?; ?; ?; ?; ?). Many approaches consider the co-occurrence of tag pairs (?; ?; ?). Generally, they first identify subsumption relations between tag pairs using unsupervised approaches and then prune these relations into a tree. Our method differs in two key aspects: (i) we construct a location-sensitive folksonomy that models the knowledge structures focused on a particular location; and (ii) we propose to induce the folksonomy using an optimization algorithm that best conserves the graph structure.

Geographic Influence. The impact of geographic distance has been widely studied for online social interactions (?; ?). Additionally, there are also studies of spatial variation over query logs and social media, e.g., (?; ?; ?; ?; ?). Our intuition is that due to the geographic, cultural, and structural differences among locations, there could be corresponding differences in how people organize information in these locations.

Location-Sensitive User Profiling

Our overarching goal is to estimate high-quality user profiles that respect this observed spatial variation. We assume some partial coverage of users via existing crowdsourced tags (e.g., from Twitter Lists or LinkedIn's Skill tags), but that many tags are unknown. That is, given a user $u$'s full (but hidden) tag profile $P(u)$, we have visibility only into some portion of this profile $P_k(u)$, where $P_k(u) \subseteq P(u)$. The goal is to estimate the unseen tags $t_i$ of $u$, where $t_i \in P(u) \setminus P_k(u)$. Our intuition is that the spatial variation in how tags are applied can be carefully modeled to create high-quality user profiles.

Spatial Variation in Crowdsourced Labels

In this section, we provide data-driven evidence for spatial variation in crowdsourced labels from a collection of Twitter lists (described more fully in Experiments). Twitter lists are one form of crowdsourced tagging, whereby individual users can add other users to a curated list annotated with a name.

Figure 2: Probability of tagging as a function of distance between labeler and user.

How does distance impact tagging? We begin by investigating in Figure 2 the impact of distance on the probability that a list labeler will include another user on a list. We observe that the probability of tagging decays exponentially with distance, which indicates a user is less likely to be tagged by a labeler as the distance between them increases. This spatial locality is a well-known property of many offline relationships and has been confirmed repeatedly even in online scenarios where distance is not inherently a limiting factor. This locality of tagging suggests that a method for user profile prediction induced from these crowd-based tags should reflect local knowledge; that is, since tags are not uniformly applied across distances, there may exist local variations of interest.

Example location-sensitive relationships. We consider the relationships between pairs of labels across different locations in our dataset.
Representing each tag as a location-specific vector (see the following section for additional details), we show in Figure 3 the relationships of a group of tag pairs at multiple locations using cosine similarity. The x-axis shows each tag pair, and each color bar represents the similarity at a location. We observe that the magnitude of tag-pair relations varies across locations. For example, we find the similarity between general concepts like nba and basketball tends to be relatively even across locations, with London having the lowest value; finance and stock have the highest similarity in New York and the lowest in Houston. Another typical example is the similarity of energy with green and with oil. Interestingly, we notice that energy and oil have the strongest relationship in Houston, while energy and green bond most closely in San Francisco. This fits our understanding of these locations, since Texas is a major oil and gas hub, while San Francisco is a more eco-friendly community. These phenomena suggest that location-sensitive user profiling has the potential to reflect the characteristics of these locations. Next, we shed light on the three-step framework (as shown in Figure 1).

Figure 3: Example tag pair similarity across locations.

Crowdsourced Label Similarity Graph

Consider a group of users $U = \{u_1, u_2, ..., u_n\}$, where each user $u$ is associated with a geographic coordinate $l_u$ and a tag profile $P_u$ that contains a variable number of (tag, frequency) pairs $\{(t_1, f_1), (t_2, f_2), ...\}$. Our goal is to build a location-sensitive tag similarity graph from these profiles.

We begin by proposing a distance weighting scheme that weights the profile tags of a user according to how far this user is from the target location. Our intuition is that a distant labeler is considered less knowledgeable about local users. We adopt a model popularized in the GIS literature, the zone of indifference model, for capturing this spatial influence. The key idea is to combine inverse distance with a fixed distance band model: all users within the distance band are considered equally important, and once beyond the threshold distance, a user's influence drops off quickly. We empirically set the distance band to 50 miles for large cities, which defines a circular area centered at the target location. Hence, the weight of a user with respect to the target location $l_t$ is

$$w_u(l_t) = \begin{cases} 1 & \text{if } d \le 50 \\ \left(\dfrac{d - 50}{50} + 1\right)^{-\alpha} & \text{otherwise} \end{cases}$$

where $d$ is the distance from the user's location $l_u$ to $l_t$ and $\alpha$ is a constant, set experimentally. Thus, we utilize the whole dataset for constructing the location-sensitive folksonomy for each location. This avoids the sparsity issues that would arise if we were to build a location-sensitive folksonomy using only locally available tags, and it mitigates data imbalances across locations (so a smaller city is not penalized in folksonomy creation relative to a larger city).

We represent each tag $t_i$ at target location $l_t$ as a tf-idf vector over the users who are labeled with the tag, where each user dimension is scaled by her corresponding influence $w_u(l_t)$, yielding the weighted vector $t'_i$. Tag vectors therefore vary across locations according to the users and weights. With these tag vectors $t'_i$, we then construct a directed tag similarity graph.
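To make the weighting and tag-vector construction concrete, the following is a minimal Python sketch, not the authors' code: the function names, the fixed user ordering, and the pre-computed tf-idf input are illustrative assumptions, and the decay beyond the band simply follows the piecewise form described above.

```python
import numpy as np

def zone_of_indifference_weight(d, band=50.0, alpha=1.0):
    """Labeler weight w.r.t. a target location: full weight inside the distance
    band, smooth decay outside. alpha is set experimentally in the paper; the
    default here is arbitrary."""
    if d <= band:
        return 1.0
    return ((d - band) / band + 1.0) ** (-alpha)

def location_weighted_tag_vectors(tag_user_tfidf, user_distances, band=50.0, alpha=1.0):
    """Scale each tag's tf-idf vector (dimensions = users, in a fixed order)
    by the users' distance weights for the target location.

    tag_user_tfidf: dict tag -> np.array aligned with user_distances
    user_distances: np.array of distances (miles) from each user to the target
    """
    w = np.array([zone_of_indifference_weight(d, band, alpha) for d in user_distances])
    return {tag: vec * w for tag, vec in tag_user_tfidf.items()}

def cosine(u, v):
    """Cosine similarity between two location-weighted tag vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v) / denom if denom else 0.0
```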
We compare four similarity measures for tag pairs. Three are symmetric measures: cosine, the RBF kernel, and pointwise mutual information (PMI). The fourth measure is asymmetric (the strength of one tag's relation to a second tag is not necessarily the same as in the reverse direction) and is based on a modified version of the traditional association-rule notion of confidence, which we term modified confidence, or MConf.

Modified confidence. We adopt the idea of confidence from association rule mining, which has been used for inferring subsumption relations between tags. Since we would like to predict tags that are as specific as possible rather than very general tags, we propose a modified confidence metric for the directed pair $(t_i \rightarrow t_j)$:

$$\mathrm{MConf}(t_i \rightarrow t_j) = \frac{C(t_i, t_j)}{F(t_i)} \left(1 - \frac{|\log F(t_j) - \log F(t_i)|}{\log(\max F)}\right)$$

where $C(t_i, t_j) = \min(f_1, f_2)$ is the co-occurrence frequency of the tag pair, $F(t_i) = \sum_u f_i$ is the overall frequency of $t_i$ in the corpus, and $\max F$ is the overall frequency of the most used tag in the corpus. Here we use confidence as a criterion for ordering tags, and we only keep the direction for which $\mathrm{MConf}(t_i \rightarrow t_j) \ge \mathrm{MConf}(t_j \rightarrow t_i)$. The reason for multiplying by a damping weight is to avoid directly connecting tags with a large frequency difference: even if confidence is high, there may be intermediate tags that fit between the two. As tag frequency versus number of tags follows a power-law distribution, we model the frequency difference with a damping factor expressed as a fraction in log scale. When there is no frequency difference the factor is 1, and when there is a large frequency difference the factor decays toward 0.
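A small sketch of how MConf could be computed over tag pairs, assuming in-memory dictionaries of pair co-occurrence counts and overall tag frequencies; all names here are illustrative rather than taken from the paper's implementation.

```python
import math

def mconf(ti, tj, cooccur, freq, max_freq):
    """MConf(ti -> tj): co-occurrence relative to F(ti), damped by the log-scale
    frequency gap between the two tags (1 when frequencies match, decaying
    toward 0 as they diverge). Assumes freq values >= 1 and max_freq > 1."""
    c = cooccur.get((ti, tj)) or cooccur.get((tj, ti), 0)
    damping = 1.0 - abs(math.log(freq[tj]) - math.log(freq[ti])) / math.log(max_freq)
    return (c / freq[ti]) * damping

def directed_confidence_edges(tags, cooccur, freq, max_freq):
    """Keep only the direction with the larger modified confidence for each pair."""
    edges = {}
    for i, ti in enumerate(tags):
        for tj in tags[i + 1:]:
            forward = mconf(ti, tj, cooccur, freq, max_freq)
            backward = mconf(tj, ti, cooccur, freq, max_freq)
            if forward >= backward:
                edges[(ti, tj)] = forward
            else:
                edges[(tj, ti)] = backward
    return edges
```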
Location-Sensitive Folksonomy Construction

Given these measures of tag similarity, which capture both user and location influences, and the resulting similarity graph, we next induce a folksonomy bound to a target location $l$, represented as a directed rooted tree (arborescence) $T = (V, E)$ in which each tag has a unique parent. The node set $V$ contains all unique tags that users in $U$ have been labeled with, and the edge set $E$ contains subsumption relations between tag pairs. The abstractness of each tag is controlled by its level in the tree, and we can further assign weights to edges to capture the similarity between tag pairs. Note that we build a different location-sensitive folksonomy for each location of interest (e.g., one for Houston, one for Chicago).

In order to define an order of abstractness for the tags, we calculate the closeness centrality of each tag, defined as $\mathrm{Centrality}(t) = \sum_j \mathrm{sim}(t, t_j)$, which sums each tag's similarity with all other tags. This definition forces general tags to have high centrality. Since the modified confidence metric already assigns a direction for a pair of tags, this step is not needed in that case.

We then organize the tags and relations into a directed weighted graph $G$. To do this, we initialize $G$ with a ROOT node and add an edge for a tag pair when its similarity is above a pre-defined threshold (set so that 50% of tag pairs are related), as in Algorithm 1 (lines 2-3). The weight of the edge is set to the similarity value. Then, we assign a direction to each tag pair, from the high-centrality node to the low-centrality node. As the graph is very likely not connected, we make the ROOT node point to every other node, with edge weight equal to the pre-defined threshold, so that the graph is weakly connected.

A directed rooted tree has a root vertex and exactly one directed path from the root to any other vertex. A straightforward criterion is to find a tree that maximizes the edge weights. In essence, this follows a greedy strategy used previously by Heymann and Garcia-Molina (2006), who proposed to iteratively add nodes, in decreasing centrality order, to the tree position that maximizes similarity. From a graph point of view, we can apply the Chu-Liu/Edmonds algorithm (Chu and Liu 1965) over the similarity graph. The core procedure is finding, for each node $t$ other than the ROOT, the incoming edge of highest weight (with ties broken arbitrarily). Since the edge directions are predefined according to centrality, the graph is guaranteed to have no cycles, and we can simplify the algorithm to forgo the cycle check.

Generalized Cost Function. Although different metrics are adopted for characterizing the relation between a pair of tags, they share a similar greedy strategy of minimizing the cost function $\sum_{t \in G} -\mathrm{sim}(t, t_p)$, where $\mathrm{sim}(t, t_p)$ represents the similarity between a tag $t$ and its parent. Here we introduce a new minimum-cost tree formation algorithm which builds upon the simplified Chu-Liu/Edmonds algorithm and generalizes the cost function. Concretely, the proposed folksonomy generation procedure is formalized as Algorithm 1.

Algorithm 1: Min-Cost Tree Formation
Input: Tag vectors
1   Calculate the similarity between each pair of nodes (t_i, t_j)
2   Initialize a directed weighted graph G with a ROOT node
3   Add an edge (t_i, t_j) when Sim(t_i, t_j) > threshold
4   Assign directions to edges following centrality order
5   Add an edge between ROOT and each node with weight = threshold
6   while n(edge) > n - 1 do
7       for each edge (t_i, t_j) do
8           G' <- remove edge (t_i, t_j) from graph G
9           Find the shortest path from t_i to t_j in G'
10          Calculate the cost of deleting edge (t_i, t_j)
12      while edge not removed do
13          edge = increasing cost sequence[i]
14          if the edge is not the last incoming edge to t_j then remove the edge from G and break
16  return G

After constructing the directed weighted graph $G$, we next convert this graph to a tree $T$ with minimum cost, where the cost characterizes the structural change to $G$. We define the cost of deleting the edge $(t_i, t_j)$ as $\mathrm{sim}(t_i, t_j) \cdot d_{i,j}$, where $d_{i,j}$ represents the shortest path length from $t_i$ to $t_j$ in the graph excluding the edge $(t_i, t_j)$. After deleting an edge, the two corresponding nodes lose that connection, and we need to identify a new shortest route that connects them. The intuition is that we want to maintain the coherence of the structure after deleting edges, such that more similar tags tend to stay closer to each other. To do so, in each iteration we calculate the cost of deleting each remaining edge in the graph $G'$, and then find an edge with minimum cost that is not the last remaining edge pointing to its child node (so that no node becomes isolated). The algorithm stops when $n - 1$ edges are left, with each node having exactly one parent. Thus our goal is to minimize the structure conservation cost for converting $G$ to a tree $T$:

$$\sum_{e = (t_i, t_j) \in G,\ e \notin T} \mathrm{sim}(t_i, t_j) \cdot d_{i,j}$$

However, this algorithm is computationally costly: whenever an edge is deleted, the shortest path between each pair of nodes must be recomputed, which amounts to $O(E^2)$ shortest-path calculations. Hence, we provide an approximation, shown in Algorithm 2, in which we calculate the cost of deleting each edge only once, in the original graph. In increasing order of this cost, we iteratively delete edges until there is a unique parent for each node. The output is a location-sensitive folksonomy.

Algorithm 2: Approximation Algorithm
Input: Tag vectors
1   Construct the directed weighted graph G according to lines 1-5 of Algorithm 1
2   for each edge (t_i, t_j) do
3       G' <- remove edge (t_i, t_j) from graph G
4       Find the shortest path from t_i to t_j in G'
5       Calculate the cost of deleting edge (t_i, t_j)
6   while n(edge) > n - 1 do
7       for each edge in increasing cost order do
8           if the edge is not the last incoming edge to t_j then remove the edge from G
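The following sketch illustrates one reading of the approximation in Algorithm 2; it is not the authors' implementation. The detour between a deleted edge's endpoints is measured here as an unweighted hop count (the text only says "shortest path length"), and the input is assumed to be a graph already directed by centrality with ROOT edges included.

```python
from collections import defaultdict, deque

def detour_length(adj, src, dst, skip_edge):
    """BFS hop count from src to dst while ignoring skip_edge; returns a large
    constant when no alternative path exists (an illustrative choice)."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj[node]:
            if (node, nxt) == skip_edge or nxt in seen:
                continue
            if nxt == dst:
                return dist + 1
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return 10 ** 6

def approximate_folksonomy(edges, tags):
    """Compute each edge's deletion cost once on the original graph
    (similarity * detour length), then delete edges in increasing cost order,
    never orphaning a node, until every tag keeps exactly one parent.

    edges: dict (parent, child) -> similarity, with directions already set by
           centrality and ROOT -> tag edges included (weight = threshold).
    tags:  collection of non-ROOT tag nodes.
    """
    adj, indegree = defaultdict(list), defaultdict(int)
    for parent, child in edges:
        adj[parent].append(child)
        indegree[child] += 1
    cost = {e: w * detour_length(adj, e[0], e[1], skip_edge=e) for e, w in edges.items()}
    remaining = dict(edges)
    for parent, child in sorted(cost, key=cost.get):   # increasing deletion cost
        if len(remaining) <= len(tags):                # one parent per tag remains
            break
        if indegree[child] > 1:                        # keep a node's last incoming edge
            del remaining[(parent, child)]
            indegree[child] -= 1
    return remaining                                   # surviving edges form the tree
```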
Folksonomy-Informed Profiling

In this section we turn to applying the location-sensitive folksonomy to profile construction. We begin by finding candidate tags from the folksonomy, and then embed these candidates in a learning-to-rank framework for ordering the tags.

Finding Candidate Tags. Given a user's seen tag profile $P_s(u)$, we first leverage the location-sensitive folksonomy to select a set of candidate tags. To accomplish this, we locate each seen tag of $P_s(u)$ in the folksonomy and collect the parent, children, and sibling tags of this seen tag as candidate tags. The hierarchical structure acts as a good filter and thus controls the number and quality of candidate tags. We then order the candidate tags according to different strategies for prediction. Formally, given user $u$ and a set of candidate tags $T_c(u) = \{t_1, ..., t_k\}$, we aim to find a scoring function to rank the tags in $T_c(u)$ for $u$.

Ranking Candidate Tags. We adopt a learning-to-rank approach for personalized candidate tag ranking. The advantage is that it automatically assigns an optimal weight to each feature. We apply a pairwise learning algorithm, RankSVM (Joachims 2002). Here we consider each user as a query, and we assign each candidate tag an integer relevance score from 1 to 3 according to its actual count in the user's unseen profile. RankSVM first generates a set of pairwise constraints, then transforms the problem into a two-class classification problem according to those constraints, and learns an SVM model. Finally, in the ranking phase, rank scores are calculated according to the margin value. We train the model on the training set, and an L2 regularization term is added to prevent overfitting.

Table 1 lists the set of features we rely on to generate a preference order over the candidate tags; a total of 13 features are used for training the model.

Feature   Description
f(s)      log-scale overall frequency (freq) of the seed tag
f(t)      log-scale overall freq of the candidate tag
fu(s)     log-scale unique-user freq of the seed tag
fu(t)     log-scale unique-user freq of the candidate tag
Ssim      similarity with the seed tag
Hsim      highest similarity with existing tags
Hfsim     Hsim weighted by freq
Sumsim    sum of similarity with all seen tags
Sw1       sum of similarity with existing seed tags, weighted by f(s)
Sw2       sum of similarity with existing seed tags, weighted by fu(s)
pcnt      freq of the candidate tag being a parent
scnt      freq of the candidate tag being a sibling
ccnt      freq of the candidate tag being a child
Table 1: Features for ranking candidate tags from the location-sensitive folksonomy.

The features can be grouped into three categories: user-specific features, tag features, and folksonomy structure features. User-specific features include fu(s), fu(t), Hsim, Sumsim, Hfsim, Sw1, and Sw2; these features are retrieved from a user's seen profile and represent characteristics of the user. Tag features include f(s), f(t), and Ssim; these provide intrinsic properties of the candidate tags. Folksonomy structure features include pcnt, ccnt, and scnt, which are uniquely defined by the folksonomy and provide extra clues for making good predictions. The intuition here is that predicting a parent tag is more likely to be correct than predicting a sibling or child, as parent tags are more general and have the largest overlap with the candidate. For example, inferring the parent sports from football is more likely to be correct than inferring the sibling volleyball or the child football player.
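As an illustration of candidate generation and a few of the Table 1 features, here is a compact sketch; it covers only a subset of the 13 features (the seed-weighted variants are omitted), and the helper names and exact formulas are assumptions for exposition.

```python
from collections import defaultdict
import math

def folksonomy_index(tree_edges):
    """Index a folksonomy given as (parent, child) pairs."""
    parent, children = {}, defaultdict(set)
    for p, c in tree_edges:
        parent[c] = p
        children[p].add(c)
    return parent, children

def candidate_tags(seen_tags, parent, children, root="ROOT"):
    """Collect parents, children, and siblings of seen tags, tracking how often
    each candidate appears in each structural role (pcnt / ccnt / scnt)."""
    roles = defaultdict(lambda: {"pcnt": 0, "ccnt": 0, "scnt": 0})
    for seed in seen_tags:
        p = parent.get(seed)
        if p and p != root:
            roles[p]["pcnt"] += 1
        for child in children.get(seed, set()):
            roles[child]["ccnt"] += 1
        if p:
            for sibling in children.get(p, set()) - {seed}:
                roles[sibling]["scnt"] += 1
    # Exclude tags the user is already known to have.
    return {t: r for t, r in roles.items() if t not in seen_tags}

def simple_features(candidate, seen_tags, sim, freq, role_counts):
    """A few illustrative features for one candidate tag; sim(a, b) is the
    location-sensitive tag similarity and freq[t] an overall tag count."""
    sims = [sim(candidate, s) for s in seen_tags]
    return {
        "f_t": math.log(1 + freq.get(candidate, 0)),  # log-scale overall frequency
        "Hsim": max(sims, default=0.0),                # highest similarity with seen tags
        "Sumsim": sum(sims),                           # sum of similarity with seen tags
        **role_counts,                                 # pcnt / ccnt / scnt
    }
```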
Experiments

Data Preparation. We rely on a Twitter list dataset containing 15 million list relationships in which the geo-coordinates of the labelers and users are known (?). In our experiments, the tags included in the folksonomy are extracted from each list name, and the users on a list are endowed with those tags in their profiles. These tags contain the multifaceted opinions of actual labelers, which means they can be complex and noisy. Hence, we apply text processing techniques such as case folding, stopword removal, and noun singularization. We also separate string patterns like Food Drink into the two words food and drink. We use a language identification package (Lui and Baldwin 2012) to filter out non-English tags. To guarantee the informativeness and quality of the tags, we filter out infrequent tags with fewer than 5 labelers and 10 labelees. Twitter has a 25-character length limit for list names, and empirically we find nearly all list names do not exceed three words; we also include bigrams. Finally, the size of the tag set is 10,489.

Profile Prediction Setup. For each of nine selected locations, a random sample of local users is held out. We construct a location-sensitive folksonomy for the location based on the rest of the whole dataset. Following that, we predict the user profiles for users in the held-out data. For each user, the seen tag set $P_k(u)$ is a random 25% of his profile $P(u)$; we then try to predict the remaining 75% of unseen tags. We only consider users with more than 10 tags overall. The results reported for every profiling experiment in this paper, including baselines, are based on four-fold cross validation and averaged over the nine locations.

Baselines. We consider two approaches, based on collaborative filtering and Bayesian personalized ranking, as baselines.

Collaborative Filtering with K Nearest Neighbors (CF-KNN). In the CF-based profiling approach, we first identify the top-k local users who share the most similar tags with the target user. To maintain consistency with the other approaches, we assume each user profile contains only 1/4 of the tags. Here, we apply cosine similarity to measure user similarity. Then, we aggregate the tags of the 50 nearest neighboring users, weighted by their similarity, and make predictions in decreasing order of tag frequency in the collective neighbor profile.

Bayesian Personalized Ranking with Matrix Factorization (BPR-MF). We consider these tags as implicit feedback, and our goal is to identify an optimal preference ranking of tags for each user. We thus experiment with two variations of the state-of-the-art Bayesian personalized ranking criterion (Rendle et al. 2009). In the first setting, we train a separate model for each location by considering only its local users; we denote this LBPR. We model user $i$'s affinity to tag $j$ as $r_{ij} = p_i \cdot q_j + b_j$, where $p_i$ and $q_j$ represent the latent factors of the user and the tag, respectively, and $b_j$ represents the overall preference for tag $j$. In the second setting, we train on the whole dataset and explicitly model location-aware preferences; we denote this LABPR. We define user $i$'s affinity to tag $j$ as $r_{ij} = p_i \cdot q_j + g_{l(i)j} + b_j$, where the latent factor $g_{l(i)j}$ represents the regional popularity of tag $j$ at user $i$'s home location (?). For reproducibility, the number of negative samples, the number of iterations, and the number of user and tag latent factors are set to 200, 80, and 20, respectively; regularization weights are set to 0.02.

Evaluation Metrics. The evaluation metrics we use are Precision@k (P@k) and Actual Frequency@k (AF@k). P@k measures how reliably predictions can be made: a high P@k value implies that users have been labeled with the predicted tags, while a high AF@k indicates that users have been labeled many times with the predicted tags. Both measurements reported later are averaged over the test data. We consider the quality of prediction for the top-1 tag as well as the top-5 tags. AF@k is defined as $AF@k = \frac{1}{k}\sum_{i=1}^{k} f_u(t_i)$, where $t_i$ is the $i$th predicted tag.
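A tiny sketch of the two metrics on hypothetical inputs (illustrative only, not the paper's evaluation code):

```python
def precision_at_k(predicted, unseen_tags, k):
    """P@k: fraction of the top-k predicted tags found in the user's held-out tags."""
    return sum(1 for t in predicted[:k] if t in unseen_tags) / k

def actual_frequency_at_k(predicted, unseen_counts, k):
    """AF@k: average number of times the user was actually labeled with each of
    the top-k predicted tags (0 for tags the user never received)."""
    return sum(unseen_counts.get(t, 0) for t in predicted[:k]) / k

# Illustrative usage with made-up data:
# precision_at_k(["sports", "nba", "oil"], {"nba", "energy"}, k=3)  -> 0.333...
# actual_frequency_at_k(["sports", "nba"], {"nba": 7}, k=2)         -> 3.5
```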
We now turn to the task of user profile construction based on location-sensitive folksonomies. We first compare performance across different ranking strategies, followed by profiling performance across different folksonomy generation approaches and their local and general versions. Finally, we compare location-sensitive folksonomy-informed profiling with the other baselines.

Comparing Ranking Strategies

Given the candidate tags identified from the folksonomy, our goal is to generate a personalized ranking over these tags so that the actual tags rank at the top. Here we compare the learning-to-rank (LTR) based approach with several baselines in Table 2: (i) rank the candidate tags in decreasing order of similarity with the seen tag that subsumes the candidate tag; (ii) rank with a hierarchical criterion, primarily by the frequency of the corresponding seen tag associated with the user and secondarily by decreasing similarity with that seen tag; (iii) rank by overall tag popularity; (iv) rank by the aggregated similarity with the seen tag set.

Methods              P@1     P@5     AF@1    AF@5
Highest similarity   0.333   0.285   4.82    4.24
Freq. & similarity   0.507   0.360   14.6    6.87
Overall popularity   0.607   0.501   25.9    15.1
Sumsimilarity        0.651   0.552   25.3    18.5
Learning to rank     0.763   0.677   26.5    19.2
Table 2: Comparing tag ranking approaches. The LTR-based approach results in the best precision and also identifies the tags used most often (AF). Statistical significance of differences with LTR is assessed with a paired t-test at 0.05.

We observe that the LTR-based approach outperforms all baselines in terms of precision, indicating that highly ranked tags are more likely to be actual tags. Moreover, we find a similar trend in terms of AF@5, which represents the actual number of times a user possesses the predicted tags. All of these results imply the effectiveness of the proposed features and feature weighting scheme. Among the baselines, we find that overall popularity and Sumsimilarity are relatively strong predictors.

Location-Sensitive vs. General Folksonomy

Methods   LS P@1   LS P@5   G P@1   G P@5   LS AF   G AF
MS-MC     0.754    0.663    0.653   0.571   19.2    15.2
SC-MC     0.763    0.677    0.656   0.571   19.2    15.4
MS-COS    0.751    0.656    0.662   0.521   16.4    11.9
SC-COS    0.756    0.663    0.662   0.531   18.1    12.8
MS-PMI    0.389    0.349    0.352   0.320   1.24    1.53
SC-PMI    0.402    0.362    0.363   0.332   1.21    1.62
MS-RBF    0.566    0.413    0.471   0.422   9.94    8.22
SC-RBF    0.581    0.430    0.465   0.410   10.1    12.0
Table 3: Comparing location-sensitive (LS) and general (G) folksonomies in profile tag prediction. All location-sensitive versions are statistically significantly different from the general versions according to a paired t-test at 0.05.
In Table 3, we compare the location-sensitive folksonomy versus the general folksonomy over all design choices for the application of profile construction. For each design choice, we acquire both a location-sensitive and a general version for each of the nine selected locations. The general folksonomy is constructed using the whole dataset, excluding the distance and location factors. The reported results are averaged over the nine locations. We observe that, overall, the location-sensitive versions always beat their general counterparts in terms of P@k and AF@5, regardless of design choice. The margin in terms of P@5 is around 0.1, and AF@5 is higher by more than 20%. This result justifies the location-sensitive folksonomy, since it better captures local knowledge structures, and it also demonstrates the effectiveness of how we model distance influence. Moreover, we find that the performance is consistent across locations.

We next compare the four similarity metrics used for constructing the tag similarity graph at the heart of location-sensitive folksonomy construction. Design choices with modified confidence (MC) perform best in terms of P@k and AF@5, with the versions built on top of cosine similarity slightly lower, followed by the RBF kernel and PMI. The extremely low performance of PMI indicates that it may not be suitable for this scenario; after inspecting the folksonomy, we observe that many of the hierarchical relations are incorrect or meaningless. PMI only considers the co-occurrence of tags without taking the relative frequency difference into account. In our experiment design, since we do not set a minimum tag occurrence for each user (to avoid sparsity), many tags show up only once on a user, which creates noise for an approach like PMI.

Last but not least, we evaluate the proposed structure conservation cost function (SC) against the maximum similarity (MS) baseline. SC aims to construct a folksonomy that makes the smallest change to the similarity graph. We observe in Table 3 that applying SC leads to an incremental improvement in profile construction. For example, in the cosine case, we notice that P@5 and AF@5 are slightly better for SC. About 9.5% of the relations differ between the folksonomies generated using the two cost functions, meaning that the SC approach makes some structural adjustments at some sacrifice to the connect-most-similar-first strategy. Considering the limited difference between the two folksonomies, this increase in performance can be attributed to a better macro-structure.

Evaluating User Profiling

Finally, we compare location-sensitive folksonomy-informed user profiling with the CF-KNN and BPR-MF baselines.

Methods    P@1     P@5     AF@1    AF@5    tfidf@5
CF-KNN     0.656   0.542   25.6    18.0    36.1
LBPR       0.731   0.650   22.6    12.3    30.2
LABPR      0.771   0.673   24.1    16.2    34.6
LS Folk.   0.763   0.677   26.5    19.2    42.6
Table 4: Comparing tag prediction approaches. Statistical significance of differences with the LS Folk. approach is assessed with a paired t-test at 0.05.

As we observe in Table 4, the proposed approach outperforms CF-KNN and the locally trained BPR-MF (LBPR) in both P@k and AF@k, and it exhibits similar performance compared with the location-aware BPR-MF (LABPR) approach. Even though LABPR does not consider frequency, which is also important, its latent factors effectively capture user preferences over tags and location-based preferences for tags. However, the BPR-MF based approaches are computationally costly as the numbers of users and tags increase.
The CF-KNN approach is not robust in sparse conditions: for example, when there are few similar users, the predictions made by CF-KNN can be very inaccurate. We use the average TF-IDF score of the top five predicted tags (tfidf@5) as a metric that reflects how important and informative a predicted tag is to a user within the actual tag collection; the score is averaged over users and locations. We notice that the proposed LS Folk. approach yields the highest tfidf@5, showing its capability to identify uniquely important tags for a user. We also notice that the CF-KNN and BPR-MF based approaches have a strong tendency to predict general, high-frequency tags. For CF-KNN, highly general tags are very likely to rank near the top. For BPR-MF, the implicit feedback formulation neglects differences in the importance of the seen tags and tends to predict tags that are seen on many users. These tags are often at a high level of abstraction and thus provide only vague insight into a user. For example, if the predicted tag is peep (short for people), little new information is contributed about the target user. In order to precisely profile a user, we expect concrete and specific tags, and additionally we wish to have a diverse tag space. In contrast, we observe that the location-sensitive folksonomy-informed approach performs much better at predicting diverse, specific tags.

Conclusion

In this work, we explored the impact of spatial variation on the construction of location-sensitive user profiles. Concretely, we proposed a location-sensitive folksonomy-informed framework toward the goal of improved user profiling. Through extensive experiments, we have demonstrated that such a location-sensitive folksonomy is more effective in identifying relevant tags, and that the learning-to-rank strategy is helpful for optimizing feature weights and leads to high-quality user profile tags.

References

Ahmed, A.; Low, Y.; Aly, M.; Josifovski, V.; and Smola, A. J. 2011. Scalable distributed inference of dynamic user interests for behavioral targeting. In SIGKDD.
Backstrom, L.; Kleinberg, J.; Kumar, R.; and Novak, J. 2008. Spatial variation in search engine queries. In WWW.
Bhattacharya, P.; Ghosh, S.; Kulshrestha, J.; Mondal, M.; Zafar, M. B.; Ganguly, N.; and Gummadi, K. P. 2014. Deep twitter diving: Exploring topical groups in microblogs at scale. In CSCW.
Brodersen, A.; Scellato, S.; and Wattenhofer, M. 2012. Youtube around the world: geographic popularity of videos. In WWW.
Brooks, C. H., and Montanez, N. 2006. Improved annotation of the blogosphere via autotagging and hierarchical clustering. In WWW.
Cheng, Z.; Caverlee, J.; Barthwal, H.; and Bachani, V. 2014. Who is the barbecue king of texas?: a geo-spatial approach to finding local experts on twitter. In SIGIR.
Cheng, Z.; Caverlee, J.; and Lee, K. 2010. You are where you tweet: a content-based approach to geo-locating twitter users. In CIKM.
Chu, Y.-J., and Liu, T.-H. 1965. On shortest arborescence of a directed graph. Scientia Sinica.
Ghosh, S.; Sharma, N.; Benevenuto, F.; Ganguly, N.; and Gummadi, K. 2012. Cognos: crowdsourcing search for topic experts in microblogs. In SIGIR.
Heymann, P., and Garcia-Molina, H. 2006. Collaborative creation of communal hierarchical taxonomies in social tagging systems.
Heymann, P.; Ramage, D.; and Garcia-Molina, H. 2008. Social tag prediction. In SIGIR.
Hong, L.; Doumith, A. S.; and Davison, B. D. 2013. Co-factorization machines: modeling user interests and predicting individual decisions in twitter. In WSDM.
Hu, L.; Sun, A.; and Liu, Y. 2014. Your neighbors affect your ratings: on geographical neighborhood influence to rating prediction. In SIGIR.
Ikeda, K.; Hattori, G.; Ono, C.; Asoh, H.; and Higashino, T. 2013. Twitter user profiling based on text and community mining for market analysis. Knowledge-Based Systems.
Jiang, M.; Cui, P.; Liu, R.; Yang, Q.; Wang, F.; Zhu, W.; and Yang, S. 2012. Social contextual recommendation. In CIKM.
Joachims, T. 2002. Optimizing search engines using clickthrough data. In SIGKDD.
Kaltenbrunner, A.; Scellato, S.; et al. 2012. Far from the eyes, close on the web: impact of geographic distance on online social interactions. In Proceedings of the 2012 ACM Workshop on Online Social Networks.
Krestel, R.; Fankhauser, P.; and Nejdl, W. 2009. Latent dirichlet allocation for tag recommendation. In RecSys.
Li, J.; Ritter, A.; and Hovy, E. H. 2014. Weakly supervised user profile extraction from twitter. In ACL (1).
Li, R.; Wang, C.; and Chang, K. C.-C. 2014. User profiling in an ego network: co-profiling attributes and relationships. In WWW.
Liu, Q.; Chen, E.; Xiong, H.; Ding, C. H.; and Chen, J. 2012a. Enhancing collaborative filtering by user interest expansion via personalized ranking. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics.
Liu, X.; Song, Y.; Liu, S.; and Wang, H. 2012b. Automatic taxonomy construction from keywords. In SIGKDD.
Lu, H., and Caverlee, J. 2015. Exploiting geo-spatial preference for personalized expert recommendation. In RecSys.
Lui, M., and Baldwin, T. 2012. langid.py: An off-the-shelf language identification tool. In Proceedings of the ACL 2012 System Demonstrations.
Majumder, A., and Shrivastava, N. 2013. Know your personalization: learning topic level personalization in online services. In WWW.
Plangprasopchok, A., and Lerman, K. 2009. Constructing folksonomies from user-specified relations on flickr. In WWW.
Qiu, F., and Cho, J. 2006. Automatic identification of user interest for personalized search. In WWW.
Rakesh, V.; Singh, D.; Vinzamuri, B.; and Reddy, C. K. 2014. Personalized recommendation of twitter lists using content and network information. In ICWSM.
Rego, A. S. C.; Marinho, L. B.; and Pires, C. E. S. 2015. A supervised learning approach to detect subsumption relations between tags in folksonomies. In ACM Symposium on Applied Computing.
Rendle, S.; Freudenthaler, C.; Gantner, Z.; and Schmidt-Thieme, L. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence.
Ribeiro, I. S.; Santos, R. L.; Gonçalves, M. A.; and Laender, A. H. 2015. On tag recommendation for expertise profiling: A case study in the scientific domain. In WSDM.
Scellato, S.; Noulas, A.; Lambiotte, R.; and Mascolo, C. 2011. Socio-spatial properties of online location-based social networks.
Schmitz, P. 2006. Inducing ontology from flickr tags. In Collaborative Web Tagging Workshop at WWW.
Sigurbjornsson, B., and Van Zwol, R. 2008. Flickr tag recommendation based on collective knowledge. In WWW.
Song, Y.; Qiu, B.; and Farooq, U. 2011. Hierarchical tag visualization and application for tag recommendations. In CIKM.
Tuarob, S.; Pouchard, L. C.; and Giles, C. L. 2013. Automatic tag recommendation for metadata annotation using probabilistic topic modeling. In Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries.
Verma, C.; Mahadevan, V.; Rasiwasia, N.; Aggarwal, G.; Kant, R.; Jaimes, A.; and Dey, S. 2015. Construction and evaluation of ontological tag trees. Expert Systems with Applications.
Wang, S.; Tang, J.; Wang, Y.; and Liu, H. 2015. Exploring implicit hierarchical structures for recommender systems. In IJCAI.
Weng, J.; Lim, E.-P.; Jiang, J.; and He, Q. 2010. Twitterrank: finding topic-sensitive influential twitterers. In WSDM.
Xia, X.; Lo, D.; Wang, X.; and Zhou, B. 2013. Tag recommendation in software information sites. In MSR.
Yin, H.; Cui, B.; Chen, L.; Hu, Z.; and Huang, Z. 2014. A temporal context-aware model for user behavior modeling in social media systems. In SIGMOD.
Zhang, H.; Korayem, M.; You, E.; and Crandall, D. J. 2012. Beyond co-occurrence: discovering and visualizing tag relationships from geo-spatial and temporal similarities. In WSDM.
Zhao, Z.; Cheng, Z.; Hong, L.; and Chi, E. H. 2015. Improving user topic interest profiles by behavior factorization. In WWW.
Zhong, E.; Liu, N.; Shi, Y.; and Rajan, S. 2015. Building discriminative user profiles for large-scale content recommendation. In SIGKDD.
Zhu, X.; Ming, Z.-Y.; Hao, Y.; and Zhu, X. 2015. Tackling data sparseness in recommendation using social media based topic hierarchy modeling. In IJCAI.