Constrained Preference Embedding for Item Recommendation

Xin Wang, Congfu Xu*, Yunhui Guo, Hui Qian
College of Computer Science and Technology, Zhejiang University, China
{cswangxinm, xucongfu, gyhui, qianhui}@zju.edu.cn
*Corresponding author

To learn users' preferences, their feedback information is commonly modeled as scalars and integrated into matrix factorization (MF) based algorithms. With MF techniques, the preference degree is computed as the product of user and item vectors, and is therefore also represented by a scalar. In this paper, by contrast, we express users' feedback as constrained vectors and call the idea constrained preference embedding (CPE): we regard users, items, and all user behavior as vectors. We find this viewpoint more flexible and powerful than traditional MF for item recommendation. For example, under the proposed assumption, users' heterogeneous actions can be mined coherently, because all entities and actions are transferred into a space of the same dimension. In addition, CPE is able to model feedback whose preference degree is uncertain. To test our assumption, we propose two CPE-based models for item recommendation, CPE-s and CPE-ps, and show that the popular pair-wise ranking model BPR-MF can be deduced from CPE-s through some restrictions and variations. In the experiments, we evaluate CPE and the proposed algorithms and demonstrate their effectiveness.

1 Introduction

How to represent customers' behavior is an important question when designing item recommendation algorithms. Unfortunately, no general and ideal solution exists for all application scenarios so far. For modeling users' explicit feedback such as rating scores, a successful assumption is to represent them as different integers. The primary methods for leveraging them are matrix factorization (MF) techniques, in which users and items are represented by low-rank latent factors (i.e., numeric vectors) and the preference degree is computed as the product of the related vectors.

However, tying scalars absolutely to users' feedback may lead to some problems. For example, although the rating scores are uniformly spaced, the preference degree may not be linear. As we know, users' attitudes tend to follow a long-tail distribution: most users prefer giving 3, 4 and 5 stars, and hence the difference between 3 stars and 1 star should be more pronounced than the difference between 5 stars and 3 stars. What's more, in applications users' behavior is not limited to rating scores but can comprise heterogeneous actions. For example, a user may give some tags to a favorite book, add a pair of shoes to a shopping cart, and visit some pages about children's clothing. Traditional MF based approaches face two problems in this situation. First, it is difficult to ascertain the preference degree of those actions. Unlike rating scores, translating "giving some tags" or "adding an item to a shopping cart" into real numbers is difficult because we cannot exactly assign values to the preference degree. The situation is even worse when we are not sure whether one type of behavior is more positive than another. Most MF based algorithms ignore these problems, directly encode all heterogeneous feedback as the integer 1 and the unobserved correlations as 0, and hence fail to capture the differentiated information in each type of feedback.
Second, for most MF based item recommendation algorithms, different kinds of preference information are finally translated into the same user and item latent space, so the value in each dimension is hard to explain. Simply embedding heterogeneous information into two types of vectors is inadequate and inflexible as we import more and more kinds of information from e-commerce sites into recommendation algorithms.

To deal with the discussed problems, in this paper we introduce a novel method called constrained preference embedding (CPE) to model users' behavior. In CPE, we no longer regard behavior information as numerical values but embed it in a high-dimensional space together with users and items. In other words, all entities and feedback are represented by $d$-dimensional vectors; e.g., the rating scores from one to five are expressed as five vectors. A modified "add approximation" is then employed to model (user, item) correlations. This process is similar to some knowledge relationship mining methods for relationship discovery [Bordes et al., 2013; Chen et al., 2013], but we emphasize fine-grained actions that carry degree information. In CPE, the L2-norm of a feedback vector represents the preference degree in a relative way, which avoids correlating feedback with explicit numbers. Another advantage of the proposed idea is that users' behavior shares an isomorphic structure and can be congruently modeled by a unified method. Because we express preference degree relatively, each type of behavior is assigned a satisfaction value that auto-adjusts to the dataset rather than an absolute number. In addition, by embedding users' feedback, CPE translates different kinds of behavior information into feedback vectors instead of restricting it to the user and item space.

In Section 2, we review matrix factorization (MF) techniques and related MF based item recommendation approaches. Section 3 introduces our constrained preference embedding (CPE) method. Based on CPE, Section 4 proposes two item recommendation algorithms, CPE-s and CPE-ps. Section 5 tests CPE's performance on real-world datasets, and Section 6 concludes.

2 Background

2.1 Matrix Factorization

Based on low-rank assumptions, matrix factorization methods [Koren et al., 2009; Koren, 2008] represent each user and item as a $d$-dimensional vector. The preference of user $u$ for item $i$ is represented as a scalar $r_{u,i} \in \mathbb{R}$. If we denote the vector of $u$ as $v_u \in \mathbb{R}^{d \times 1}$ and the vector of $i$ as $v_i \in \mathbb{R}^{d \times 1}$, the preference of $u$ for $i$ can be predicted by $v_u^T v_i$. To learn the factors, a probabilistic version of MF (PMF) [Salakhutdinov and Mnih, 2008] assumes $r_{u,i}$ is sampled from a Gaussian distribution with mean $v_u^T v_i$, and optimizes the maximum posterior over all observed $(u, r_{u,i}, i)$ triplets.
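For concreteness, here is a minimal sketch of the MF scoring and the regularized squared-error objective that PMF's MAP estimate minimizes; the variable names, sizes, random initialization and regularization constant are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

d, n_users, n_items = 10, 100, 200               # illustrative sizes
rng = np.random.default_rng(0)
V_u = rng.normal(scale=0.1, size=(n_users, d))   # user latent factors
V_i = rng.normal(scale=0.1, size=(n_items, d))   # item latent factors

def predict(u, i):
    """Predicted preference of user u for item i: the product v_u^T v_i."""
    return V_u[u] @ V_i[i]

def mf_loss(triplets, lam=0.01):
    """Regularized squared error over observed (u, i, r_ui) triplets."""
    loss = 0.0
    for u, i, r in triplets:
        loss += (predict(u, i) - r) ** 2
        loss += lam * (V_u[u] @ V_u[u] + V_i[i] @ V_i[i])
    return loss
```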
2.2 Preference Ranking

To learn a ranked list of items, point-wise, pair-wise and list-wise preference ranking algorithms have been proposed on top of the matrix factorization assumptions. For example, iMF [Hu et al., 2008; Lin et al., 2014] and OCCF [Pan et al., 2008] consider both observed and unobserved (user, item) correlations in a point-wise way and optimize the following loss function:

$$\min \sum_{(u,i)\in D_1 \cup D_0} (v_u^T v_i - r_{u,i})^2 + \mathcal{R} \qquad (1)$$

where $D_1$ and $D_0$ are the observed and unobserved sets, and $\mathcal{R}$ denotes the regularization term $\lambda_u\|v_u\|_2^2 + \lambda_i\|v_i\|_2^2$.

The MF based pair-wise methods are similar to iMF or OCCF, but they adopt different preference comparison structures. For example, the state-of-the-art pair-wise algorithm BPR-MF [Rendle et al., 2009; Pan and Chen, 2013] assumes that user $u$ prefers an observed item $i$ to an unobserved item $j$, and directly optimizes their relationship by logistic regression on $(v_u^T v_i - v_u^T v_j)$, i.e., it maximizes

$$\max \sum_{(u,i)\in D_1, (u,j)\in D_0} \ln\sigma(v_u^T v_i - v_u^T v_j) + \mathcal{R} \qquad (2)$$

For MF based list-wise algorithms, each action is compared with a list of actions. For example, CofiRank [Weimer et al., 2007] directly optimizes NDCG [Valizadegan et al., 2009] scores based on maximum margin matrix factorization models, and ListRank [Cao et al., 2007; Shi et al., 2010] optimizes a cross-entropy function over item lists based on PMF.

[Figure 1: Illustration of CPE. Traditional MF expresses users and items as vectors $\{v_{u_1},\dots,v_{u_n}\}$ and $\{v_{i_1},\dots,v_{i_m}\}$ and predicts the rating scores, treated as scalar values, by the product $r_{u,i} \approx v_u^T v_i$. CPE instead regards the rating scores as vectors $\{v_{f_1},\dots,v_{f_5}\}$ and directly mines the three types of vectors in the high-dimensional space through the add approximation $v_u + v_f \approx v_i$ and the preference constraint $\|v_{f_1}\|_2^2 > \|v_{f_2}\|_2^2 > \|v_{f_3}\|_2^2 > \|v_{f_4}\|_2^2 > \|v_{f_5}\|_2^2$.]

As the above loss functions show, these MF based algorithms are powerful for modeling isomorphic feedback with assigned preference degree information, but some of them may be inflexible and ineffective when the following situations occur:

(1) It is difficult to determine the preference degree $p(f)$ of feedback $f$ (e.g., $\{p(f_a) = ?,\ p(f_b) = ?,\ p(f_c) = ?\}$).
(2) We are only given the relative positiveness of some types of feedback (e.g., $\{p(f_a) < p(f_b),\ p(f_c) < p(f_d)\}$).
(3) We try to leverage heterogeneous feedback and combine it in a single model (e.g., $\{p(f_a) = 1,\ p(f_b) = 2,\ p(f_c) = ?,\ p(f_d) < p(f_e),\ p(f_g) = ?,\ p(f_h) = ?,\ \dots\}$).

Here $p(\cdot)$ denotes preference degree, and $f$ is a type of feedback, e.g., "giving 2 stars", "click" or "browse". For example, if we denote "giving two stars" as feedback $b$, then $p(f_b)$ may be 2, i.e., $p(f_b) = r_{u,i}$ if $r_{u,i} = 2$.

3 Constrained Preference Embedding

In this section, we introduce our proposed method, constrained preference embedding (CPE), and discuss some of its characteristics. We denote a type of feedback $f$ as $v_f \in \mathbb{R}^{d \times 1}$. The rating scores $\{1, 2, 3, 4, 5\}$ are therefore represented as $\{v_{(1\,\text{star})}, v_{(2\,\text{stars})}, v_{(3\,\text{stars})}, v_{(4\,\text{stars})}, v_{(5\,\text{stars})}\}$, and users' preference can be optimized through the following objective:

$$\min L_{u\in U,\, f\in F,\, i\in I}(v_u, v_f, v_i) \qquad (3)$$

where $U$, $I$ and $F$ are the user, item and feedback sets, and $v_u$ and $v_i$ are the user and item vectors. In this paper, we assume users, items and preference obey the add approximation rule: $v_u + v_f$ should be close to $v_i$ if $u$ gives $f$ to $i$. The idea is illustrated in Figure 1. Based on the rating matrix and the add approximation, we should optimize $v_{u_1} + v_{(1\,\text{star})} \rightarrow v_{i_2}$, $v_{u_1} + v_{(5\,\text{stars})} \rightarrow v_{i_5}$, $v_{u_3} + v_{(1\,\text{star})} \rightarrow v_{i_3}$ and $v_{u_4} + v_{(2\,\text{stars})} \rightarrow v_{i_5}$, where $\rightarrow$ indicates "approximates".
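As a minimal sketch of the add approximation just described (with randomly initialized illustrative vectors, not the authors' code), the quantity CPE drives toward zero for each observed triplet looks like this:

```python
import numpy as np

d = 10
rng = np.random.default_rng(0)
v_u = rng.normal(size=d)                         # a user vector
v_i = rng.normal(size=d)                         # an item vector
v_f = {k: rng.normal(size=d) for k in range(6)}  # feedback vectors f_0..f_5

def add_approx_error(v_u, v_f, v_i):
    """Squared Euclidean distance d(v_u + v_f - v_i)^2: small when
    v_u + v_f 'approximates' v_i, as required when u gives f to i."""
    diff = v_u + v_f - v_i
    return diff @ diff
```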
For item recommendation, we also need to introduce collaborative filtering [Sarwar et al., 2001] features and preference degree information by placing constraints on users' feedback. We assume that if $u$ has a strong preference for $i$, $v_u$ and $v_i$ should be close; if $u$ has similar preference for $i_1$ and $i_2$, $v_{i_1}$ and $v_{i_2}$ should be close; and if $u_1$ and $u_2$ have similar preference for $i$, $v_{u_1}$ and $v_{u_2}$ should be close. To achieve this, we control the L2-norm of $v_f$ so that it is smaller when $f$ is more positive. Because $\|v_f\|_2$ is the Euclidean length of $v_f$ and the add approximation gives $v_f \approx v_i - v_u$, under the former rules, the stronger the preference of $u$ for $i$, the shorter the distance between $v_u$ and $v_i$. Correlated (user, item), (user, user) or (item, item) pairs therefore have vectors with similar direction and length.

4 CPE for Item Recommendation

In this section, we propose two item recommendation methods, CPE-s and CPE-ps, based on our CPE assumption. CPE-s adopts the add approximation rule and a pair-wise preference comparison strategy, and optimizes a loss function with soft constraints. CPE-ps is similar to CPE-s but is based on vector projection. In this paper, we primarily consider rating scores and unobserved (user, item) correlations (denoted "unobserved feedback") for studying our models.

4.1 CPE-s

We assume that if user $u$ has a strong preference for item $i$, she will give it a higher rating score. Our task is therefore to minimize the difference between $v_u + v_f$ and $v_i$ for each triplet $(u, f, i)$. Given the users' rating matrix, the loss function over all triplets is

$$\min \sum_{(u,f,i)\in D} d(v_u + v_f - v_i)^2 + \mathcal{R} \quad \text{s.t.}\ \{\|v_{f_p}\|_2^2 < \|v_{f_q}\|_2^2 \mid p > q\},\ f_p, f_q \in F \qquad (4)$$

where $(u, f, i) \in D$ ranges over the observed and unobserved triplets, $d(\cdot)$ is the Euclidean distance, and $\mathcal{R}$ controls the norm of the parameters. For the rating scores, $v_{f_k}$ is the vector of the "giving $k$ star(s)" action and comes from the set $\{v_{f_k} \mid k = 1, 2, 3, 4, 5\}$; the unobserved feedback is denoted $v_{f_0}$. In our model, $f_p$ is assumed to be more positive than $f_q$ for all $p > q$, which means the correlations of observed triplets are stronger than those of unobserved ones. All vectors are randomly initialized. The vectors $v_u$, $v_i$ and $v_f$ in the above loss function are optimized with L2 regularization terms; this is important for CPE-s because it prevents the learning algorithm from trivially minimizing the objective by artificially increasing the norms of $v_u$, $v_i$ and $v_f$.

Instead of directly optimizing the loss function with constraints on preference, we convert it to the following soft unconstrained function:

$$\min \sum_{(u,f,i)\in D} d(v_u + v_f - v_i)^2 - \lambda_e \sum_{p>q} w_{f_p,f_q} \ln C(\|v_{f_q}\|_2^2 - \|v_{f_p}\|_2^2) + \mathcal{R} \qquad (5)$$

where $p$ and $q$ index $F$; $\lambda_e$ is a hyperparameter; $w_{f_p,f_q}$ denotes the weight of the preference comparison between $f_p$ and $f_q$, computed as the product of $f_p$'s and $f_q$'s frequencies in the triplets, which guarantees that the feedback priority adjusts to the dataset instead of to predetermined values. $C(\cdot)$ can be any monotonically increasing function; we adopt the sigmoid function.
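The following is a minimal sketch of the soft CPE-s objective in Eq. (5) over a sampled batch; the data layout, the frequency-product weights passed in as `pairs`, and the hyperparameter names are illustrative assumptions rather than the reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cpe_s_loss(triplets, pairs, V_u, V_i, V_f, lam_e=1.0, lam=0.01):
    """triplets: (u, f, i) index tuples; pairs: (f_p, f_q, w) with p > q,
    where w is the frequency-product comparison weight w_{fp,fq}."""
    loss = 0.0
    for u, f, i in triplets:
        diff = V_u[u] + V_f[f] - V_i[i]
        loss += diff @ diff                              # add-approximation term
        loss += lam * (V_u[u] @ V_u[u] + V_i[i] @ V_i[i] + V_f[f] @ V_f[f])
    for fp, fq, w in pairs:                              # soft norm-ordering constraint
        gap = V_f[fq] @ V_f[fq] - V_f[fp] @ V_f[fp]      # ||v_fq||^2 - ||v_fp||^2
        loss -= lam_e * w * np.log(sigmoid(gap))
    return loss
```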
4.2 CPE-ps

For CPE-s, if both $u_1$ and $u_2$ give $i$ the same rating score, $v_{u_1}$ and $v_{u_2}$ should be very similar, which means CPE-s may suppress personalized information. The reason is that we keep the same user and item vectors for all elements of $F$ (i.e., $\{f_k \mid k = 0, 1, 2, 3, 4, 5\}$).

To overcome this shortcoming, we adopt a projection method inspired by [Wang et al., 2014b] to model vector relationships. We create a plane $P_f$ with normal vector $w_f$ for each type of feedback $v_f$. Then, for each triplet $(v_u, v_f, v_i)$, we project $v_u$ and $v_i$ onto $P_f$, denoting the projected vectors as $v_{\perp,u}$ and $v_{\perp,i}$ respectively. Finally, we optimize $(v_{\perp,u}, v_f, v_{\perp,i})$ as in CPE-s, where $v_{\perp,u}$ and $v_{\perp,i}$ are computed according to

$$v_{\perp,u} = v_u - w_f^T v_u\, w_f, \qquad v_{\perp,i} = v_i - w_f^T v_i\, w_f \qquad (6)$$

The per-triplet loss is therefore $d(v_{\perp,u} + v_f - v_{\perp,i})^2$, as illustrated in Figure 2(a). The L2-norm of $w_f$ is restricted to 1 to control $v_{\perp,u}$ and $v_{\perp,i}$. Based on our CPE assumption, the loss function is

$$\min \sum_{(u,f,i)\in D} d(v_{\perp,u} + v_f - v_{\perp,i})^2 + \mathcal{R}$$
$$\text{s.t.}\ \{\|v_{f_p}\|_2^2 < \|v_{f_q}\|_2^2 \mid p > q\},\ f_p, f_q \in F; \quad \|w_f\|_2^2 = 1,\ f \in F; \quad w_f^T v_f/\|v_f\|_2 = 0,\ f \in F \qquad (7)$$

where $d(\cdot)$ is the Euclidean distance, and the constraint $w_f^T v_f/\|v_f\|_2 = 0$ guarantees that $v_f$ lies in the translated plane. Similar to CPE-s, we do not directly optimize the above function but convert it to the following soft unconstrained loss function:

$$\min \sum_{(u,f,i)\in D} d(v_{\perp,u} + v_f - v_{\perp,i})^2 - \lambda_e \sum_{p>q} w_{f_p,f_q} \ln C(\|v_{f_q}\|_2^2 - \|v_{f_p}\|_2^2) + \mathcal{R} \qquad (8)$$

where $\|w_f\|_2^2$ is constrained to 1 in the learning procedure and $w_{f_p,f_q}$ is the comparison weight as in Eq. (5). CPE-ps improves on CPE-s because it can exploit more personalized information. For example, $v_{u_1}$ and $v_{u_2}$ can differ even if $v_{\perp,u_1}$ equals $v_{\perp,u_2}$. That is, users' preferences can be shared through the projected correlations while some personalized information is retained.

[Figure 2: Illustration of (a) CPE-ps and (b) SCPE.]

4.3 Learning and Item Recommendation

The proposed algorithms are trained with stochastic gradient descent (SGD) to optimize $v_u$, $v_f$ and $v_i$. Specifically, for preference embedding and positiveness learning, in each iteration we randomly sample $t_1$ triplets according to the distribution of each feedback type in $D_1$ and $t_2$ unobserved triplets, and optimize them according to Eq. (5) and Eq. (8) until they converge to a stable state. Note that for item recommendation in the experiments, we directly update the related $v_u$ and $v_i$ on the $\ln C(\|v_{f_q}\|_2^2 - \|v_{f_p}\|_2^2)$ part for better learning efficiency and performance; this form can be expressed by Eq. (13), which we discuss later. The preference of $u$ for $i$ is computed according to Eq. (9) for CPE-s and Eq. (10) for CPE-ps, and the top-$k$ items with the greatest $p_{u,i}$ are recommended to $u$:

$$p_{u,i} = 1/(\|v_i - v_u\|_2^2 + 1) \qquad (9)$$
$$p_{u,i} = 1/(\|v_{\perp,i} - v_{\perp,u}\|_2^2 + 1) \qquad (10)$$

where $p_{u,i}$ denotes the preference degree.
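A minimal sketch of the CPE-ps projection of Eq. (6) and the scoring rule of Eq. (10), assuming the normal vectors $w_f$ are already unit-norm; this is illustrative, not the reference code.

```python
import numpy as np

def project(v, w_f):
    """Project v onto the hyperplane P_f with unit normal w_f:
    v_perp = v - (w_f^T v) w_f."""
    return v - (w_f @ v) * w_f

def score_cpe_ps(v_u, v_i, w_f):
    """Preference degree p_{u,i} per Eq. (10): larger when the projected
    user and item vectors are closer."""
    diff = project(v_i, w_f) - project(v_u, w_f)
    return 1.0 / (diff @ diff + 1.0)
```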
4.4 Discussion

In this section, we discuss some features of CPE and explain why they make preference learning flexible.

Semi-constrained preference embedding (SCPE). An advantage of CPE is that it can model users' heterogeneous actions in a unified way. In applications, although ratings can be directly modeled by their scores, it is hard to assign a certain preference degree to some types of implicit feedback (e.g., "click" and "browse"), because we are not sure, for example, whether "click" and "browse" are more positive than "giving 2 stars". MF based algorithms cannot directly model the latter information well; a compromise is to represent all types of user behavior as implicit feedback and correlate them with 1 (e.g., BPR-MF). Such approaches lose the diversity of the behavior information. By contrast, a modified CPE method, which we call semi-constrained preference embedding (SCPE), may provide a fine-grained solution.

Specifically, in SCPE all elements of $F$ are considered in the add approximation part, but the L2-norm of the unknown feedback is not constrained during learning. For example, given the set $\{f_k \mid k = a, b, c, d, e\}$ and the prior knowledge that $p(f_a) > p(f_b)$, $p(f_c) > p(f_d)$, and $p(f_e) = ?$, the loss function can be

$$\min L(v_u, v_i, v_{f_a}, v_{f_b}, v_{f_c}, v_{f_d}, v_{f_e}) - \ln C_1(\|v_{f_b}\|_2^2 - \|v_{f_a}\|_2^2) - \ln C_2(\|v_{f_d}\|_2^2 - \|v_{f_c}\|_2^2) \qquad (11)$$

where $f_e$ is modeled in $L(\cdot)$ to help correlate (user, item) pairs and predict users' potential preference, as illustrated in Figure 2(b). We also find that, besides benefiting item recommendation, SCPE can help learn the preference degree of $f_e$ with the help of $f_{i\in\{a,b,c,d\}}$. SCPE may therefore also be used to study whether a user's behavior is positive or not according to some assistant behavior; we study this in the experiments.

Relation between CPE-s and BPR-MF. As discussed above, BPR-MF is an effective pair-wise ranking algorithm for item recommendation, with the loss function given in Eq. (2). For CPE-s, if we take $C(\cdot)$ in Eq. (5) to be the sigmoid and denote $f_p$ as positive feedback and $f_n$ as negative feedback, the main part of the soft preference restriction can be expressed as

$$\ln\sigma(\|v_{f_n}\|_2^2 - \|v_{f_p}\|_2^2) \qquad (12)$$

Under the pair-wise CPE assumption, for two triplets $(u, f_p, i)$ and $(u, f_n, j)$, the distance between $v_u$ and $v_i$ should be shorter than the distance between $v_u$ and $v_j$. Based on the add approximation, we replace $v_{f_p}$ with $v_i - v_u$ and $v_{f_n}$ with $v_j - v_u$, and constrain the L2-norm of the item vectors to a constant $c$. Hence, our task is to maximize

$$\sum \ln\sigma(\|v_j - v_u\|_2^2 - \|v_i - v_u\|_2^2) = \sum \ln\sigma(2[v_u^T v_i - v_u^T v_j] + \|v_j\|_2^2 - \|v_i\|_2^2), \quad \text{s.t.}\ \|v_i\|_2 = \|v_j\|_2 = c \qquad (13)$$

Since the norm constraint makes the term $\|v_j\|_2^2 - \|v_i\|_2^2$ vanish, Eq. (13) reduces, up to a factor of 2, to the BPR-MF objective of Eq. (2). As we can see, with some restrictions and variations, we can deduce BPR-MF with a constraint on the item L2-norm from pair-wise CPE. This reveals the potential relation between CPE-s and BPR-MF, and also shows why it is reasonable to express $p_{u,i}$ as $1/(\|v_i - v_u\|_2^2 + 1)$.

5 Experiments

5.1 Datasets and Evaluation Metrics

In this section, we study CPE on real-world datasets from different domains and categories. The numbers of users and items and the sparsity of each dataset are listed in Table 1. The first five datasets are from Amazon.com [McAuley and Leskovec, 2013]; the movies, books and music datasets are from Douban.com [Wang et al., 2014a]. Each dataset is subdivided into three parts: 80% is used for training, 10% for validation, and the last 10% for test. The evaluation metrics we use are NDCG [Valizadegan et al., 2009], precision, and F1 [Wang et al., 2014a] scores.

Table 1: Datasets used in the experiments.

| dataset | #users | #items | sparsity |
|---|---|---|---|
| beauty (Amazon) | 167,725 | 29,004 | 0.99995 |
| tools&games (Amazon) | 283,514 | 51,004 | 0.99997 |
| clothing&accessories (Amazon) | 128,794 | 66,370 | 0.99993 |
| shoes (Amazon) | 73,590 | 48,410 | 0.99989 |
| industrial&scientific (Amazon) | 29,590 | 22,622 | 0.99980 |
| movies (Douban) | 5,664 | 10,013 | 0.97423 |
| books (Douban) | 10,024 | 10,115 | 0.99280 |
| music (Douban) | 10,208 | 11,200 | 0.99157 |
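Since NDCG is the main ranking metric above, here is a minimal sketch of NDCG@K with binary relevance; the binary-gain simplification and the function shape are assumptions for illustration, not necessarily the exact variant used in the paper.

```python
import numpy as np

def ndcg_at_k(ranked_items, relevant, k=5):
    """ranked_items: recommended item ids in rank order;
    relevant: set of held-out items the user actually interacted with."""
    gains = [1.0 if item in relevant else 0.0 for item in ranked_items[:k]]
    dcg = sum(g / np.log2(r + 2) for r, g in enumerate(gains))
    idcg = sum(1.0 / np.log2(r + 2) for r in range(min(k, len(relevant))))
    return dcg / idcg if idcg > 0 else 0.0
```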
5.2 Baselines and Parameter Settings

To comprehensively evaluate our proposed models, we compare them with the following state-of-the-art ranking methods:

point-wise: iMF (explicit, implicit)
pair-wise: BPR (implicit), GBPR (implicit)
list-wise: ListRank (explicit), CofiRank (explicit)

The baselines are carefully chosen so that each algorithm is representative of its class. ListRank [Cao et al., 2007; Shi et al., 2010] and CofiRank [Weimer et al., 2007] mainly optimize observed (user, item) correlations, while iMF [Hu et al., 2008; Lin et al., 2014], BPR [Rendle et al., 2009] and GBPR [Pan and Chen, 2013] consider both observed and unobserved correlations. Details of these methods are discussed in the previous sections.

For all approaches, the learning rate is set to 0.05 and the latent dimension to 10 (i.e., $d = 10$). The regularization coefficient is selected from {1, 0.1, 0.01, 0.001}, and $t_1 = t_2 = 2$. Because of the weighted sampling method adopted for SGD, we set $\lambda_e$ and the comparison weights $w_{f_p,f_q}$ in Eq. (5) and Eq. (8) to 1. For GBPR-MF, the group size is 3.

5.3 Analysis of Preference Embedding

We compute the L2-norms of the feedback vectors (i.e., $\{v_{f_i} \mid i = 0, 1, 2, 3, 4, 5\}$) after learning CPE-s or CPE-ps and plot them as column charts in Figure 3. Due to limited space, we only provide the results learned by CPE-s.

[Figure 3: L2-norms of the learned feedback vectors on the beauty, tools&games, clothing&accessories, shoes, movies and books datasets. U denotes the unobserved (user, item) pairs, and the numbers 1-5 denote the feedback for the corresponding rating scores; each bar shows the vector length.]

According to the figure, $\|v_{f_0}\|_2$ is much greater than $\|v_{f_i}\|_2$ for $i \in \{1,\dots,5\}$ on all datasets, which implies that whether a user gives a rating score to an item at all is more significant than which score she chooses. This results from the fact that the number of $(u, f_0, i)$ triplets is much larger than that of the other triplets, combined with the assumption that the members of $\{f_i \mid i = 1,\dots,5\}$ are more positive than $f_0$. Hence, the difference between $\|v_{f_0}\|_2$ and the other norms has an important influence on model optimization. For the same reason, the gap between $\|v_{f_i}\|_2$ for $i \in \{4,5\}$ and $\|v_{f_j}\|_2$ for $j \in \{1,2\}$ is greater than $\|v_{f_4}\|_2 - \|v_{f_5}\|_2$ or $\|v_{f_1}\|_2 - \|v_{f_2}\|_2$. The lengths of the feedback vectors learned by our models thus intuitively reflect the behavior distribution of the real-world datasets.

5.4 Analysis of Item Recommendation

The item recommendation results on the 8 real-world datasets are listed in Table 2. Interestingly, the selected point-wise and pair-wise algorithms outperform the list-wise ranking methods, which is inconsistent with our intuition. The reason is that the compared list-wise ranking methods mainly consider explicit rating scores and ignore unobserved (user, item) pairs; on our sparse datasets, CofiRank and ListRank therefore cannot take full advantage of the available data.

[Figure 4: L2-norm comparison of two feedback vectors learned by SCPE with some assistant feedback on the beauty dataset, plotted against the number of outer iterations.]

It is obvious that our proposed models are better than BPR and GBPR on most datasets. On average, CPE-s (or CPE-ps) improves precision-5 by 14.75%, recall-5 by 15.41%, and NDCG-5 by 16.17%, owing to the preference embedding assumptions.
First, CPE-s and CPE-ps consider six different levels of users' feedback, whereas BPR and GBPR primarily focus on binary information; our models can therefore exploit more information. Second, our methods transfer each type of behavior into its own feedback vector rather than restricting it to the user and item space, and are therefore more effective than the other baselines.

We vary the latent dimension $d \in \{3, 5, 10, 15\}$ to test the stability of the compared approaches. The NDCG-5 outcomes on beauty and clothing&accessories are shown in Figure 5. We find that when $d$ is large, the advantage of CPE is obvious, because the feedback vectors can capture more preference-related information and loosen the low-rank assumption. The benefit shrinks as $d$ decreases, but thanks to the effectiveness of preference embedding, CPE still outperforms the baselines.

Finally, we study the NDCG-K, precision-K and recall-K scores with different recommendation list sizes $K \in \{1, 3, 5, 7\}$, with selected results shown in Figure 6. Clearly, CPE-s and CPE-ps remain stable as $K$ varies.

Table 2: Precision, recall and NDCG scores on 8 real-world datasets; the recommendation list size is 5. [a] denotes Amazon and [d] denotes Douban datasets.

| metric | method | beauty[a] | tools&games[a] | clothing&accessories[a] | shoes[a] | industrial&scientific[a] | movies[d] | books[d] | music[d] |
|---|---|---|---|---|---|---|---|---|---|
| Precision-5 | iMF | 0.0306 | 0.0075 | 0.1054 | 0.1212 | 0.0434 | 0.1125 | 0.0231 | 0.0378 |
| | BPR | 0.0380 | 0.0093 | 0.1456 | 0.1845 | 0.0597 | 0.1403 | 0.0405 | 0.0570 |
| | GBPR | 0.0374 | 0.0094 | 0.1502 | 0.1896 | 0.0575 | 0.1444 | 0.0429 | 0.0598 |
| | ListRank | 0.0149 | 0.0029 | 0.0482 | 0.0802 | 0.0232 | 0.0903 | 0.0170 | 0.0138 |
| | CofiRank | 0.0164 | 0.0039 | 0.0607 | 0.0819 | 0.0285 | 0.1043 | 0.0256 | 0.0242 |
| | CPE-s | 0.0431 | 0.0119 | 0.1731 | 0.1957 | 0.0780 | 0.1434 | 0.0440 | 0.0619 |
| | CPE-ps | 0.0443 | 0.0121 | 0.1669 | 0.2092 | 0.0802 | 0.1410 | 0.0460 | 0.0637 |
| Recall-5 | iMF | 0.1232 | 0.0288 | 0.2919 | 0.3484 | 0.0889 | 0.0139 | 0.0085 | 0.0128 |
| | BPR | 0.1499 | 0.0358 | 0.3788 | 0.4880 | 0.1181 | 0.0202 | 0.0169 | 0.0230 |
| | GBPR | 0.1477 | 0.0366 | 0.3887 | 0.5005 | 0.1152 | 0.0213 | 0.0179 | 0.0247 |
| | ListRank | 0.0663 | 0.0131 | 0.1656 | 0.2612 | 0.0537 | 0.0106 | 0.0054 | 0.0023 |
| | CofiRank | 0.0754 | 0.0174 | 0.1993 | 0.2715 | 0.0641 | 0.0089 | 0.0082 | 0.0045 |
| | CPE-s | 0.1678 | 0.0458 | 0.4414 | 0.4895 | 0.1504 | 0.0207 | 0.0187 | 0.0272 |
| | CPE-ps | 0.1729 | 0.0468 | 0.4108 | 0.5416 | 0.1652 | 0.0208 | 0.0195 | 0.0278 |
| NDCG-5 | iMF | 0.0996 | 0.0239 | 0.2327 | 0.2709 | 0.0819 | 0.1134 | 0.0229 | 0.0391 |
| | BPR | 0.1305 | 0.0305 | 0.3281 | 0.4344 | 0.1184 | 0.1427 | 0.0417 | 0.0601 |
| | GBPR | 0.1273 | 0.0313 | 0.3398 | 0.4509 | 0.1137 | 0.1437 | 0.0441 | 0.0632 |
| | ListRank | 0.0480 | 0.0083 | 0.1215 | 0.1912 | 0.0408 | 0.1003 | 0.0180 | 0.0142 |
| | CofiRank | 0.0526 | 0.0128 | 0.1443 | 0.1978 | 0.0565 | 0.1097 | 0.0271 | 0.0254 |
| | CPE-s | 0.1510 | 0.0398 | 0.3982 | 0.4355 | 0.1518 | 0.1462 | 0.0456 | 0.0673 |
| | CPE-ps | 0.1569 | 0.0411 | 0.3685 | 0.4870 | 0.1620 | 0.1484 | 0.0483 | 0.0683 |

[Figure 5: Performance comparison (NDCG-5) with different latent dimension d on beauty and clothing&accessories.]

5.5 Analysis of Semi-CPE (SCPE)

We use rating scores and unobserved correlations to study SCPE's ability to identify the positiveness of behavior. Specifically, two types of feedback $f_p$ and $f_q$ are randomly selected from $\{f_k \mid k = 0, 1, \dots, 5\}$ to simulate unknown behavior; every feedback vector is then embedded by SCPE-s, with $f_p$ and $f_q$ left unconstrained in the optimization procedure. We test whether SCPE-s can correctly rank $p(f_p)$ and $p(f_q)$. In the experiments, we choose the following two cases: $\{p = 3, q = 1\}$ and $\{p = 4, q = 2\}$.
$\|v_{f_1}\|_2$ and $\|v_{f_3}\|_2$ learned by SCPE-s on the beauty dataset are plotted in Figure 4(a), and $\|v_{f_2}\|_2$ and $\|v_{f_4}\|_2$ are shown in Figure 4(b). Clearly, when the algorithm converges, $\|v_{f_1}\|_2$ is persistently greater than $\|v_{f_3}\|_2$, and $\|v_{f_2}\|_2$ is greater than $\|v_{f_4}\|_2$. This means that, with the other assistant feedback, SCPE-s can automatically infer that "giving 3 stars" is more positive than "giving 1 star", and "giving 4 stars" is more positive than "giving 2 stars". The results are in line with our expectations and can be explained by the collaborative filtering features of SCPE discussed before. SCPE may therefore provide a practical way for e-commerce websites to analyze and compare users' heterogeneous feedback.

[Figure 6: Performance comparison (precision-K and NDCG-K) with different recommendation list sizes on clothing&accessories.]

6 Conclusion

In this paper, we introduced the constrained preference embedding (CPE) assumption and two models based on it (CPE-s and CPE-ps) for preference learning. We discussed semi-constrained preference embedding (SCPE) and showed its effectiveness at modeling users' feedback with uncertain preference degree. We also demonstrated the relationship between CPE and BPR-MF. In the experiments, CPE proved effective from different perspectives.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grants No. 61272303 and 61472347). We thank Dr. Zhongyuan Wang of MSRA for helpful discussions on embedding techniques.

References

[Bordes et al., 2013] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, NIPS '13, pages 2787-2795, 2013.
[Cao et al., 2007] Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. Learning to rank: From pairwise approach to listwise approach. In Proceedings of the 24th International Conference on Machine Learning, ICML '07, pages 129-136. ACM, 2007.
[Chen et al., 2013] Danqi Chen, Richard Socher, Christopher D. Manning, and Andrew Y. Ng. Learning new facts from knowledge bases with neural tensor networks and semantic word vectors. arXiv preprint arXiv:1301.3618, 2013.
[Hu et al., 2008] Yifan Hu, Yehuda Koren, and Chris Volinsky. Collaborative filtering for implicit feedback datasets. In Proceedings of the 8th International Conference on Data Mining, ICDM '08, pages 263-272. IEEE, 2008.
[Koren et al., 2009] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30-37, 2009.
[Koren, 2008] Yehuda Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '08, pages 426-434. ACM, 2008.
[Lin et al., 2014] Christopher H. Lin, Ece Kamar, and Eric Horvitz. Signals in the silence: Models of implicit feedback in a recommendation system for crowdsourcing. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, AAAI '14, pages 22-28, 2014.
[McAuley and Leskovec, 2013] Julian McAuley and Jure Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. In Proceedings of the 7th ACM Conference on Recommender Systems, RecSys '13, pages 165-172. ACM, 2013.
[Pan and Chen, 2013] Weike Pan and Li Chen. GBPR: Group preference based Bayesian personalized ranking for one-class collaborative filtering. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence, IJCAI '13, pages 2691-2697. AAAI Press, 2013.
[Pan et al., 2008] Rong Pan, Yunhong Zhou, Bin Cao, Nathan Nan Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. One-class collaborative filtering. In Proceedings of the 8th International Conference on Data Mining, ICDM '08, pages 502-511. IEEE, 2008.
[Rendle et al., 2009] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, UAI '09, pages 452-461. AUAI Press, 2009.
[Salakhutdinov and Mnih, 2008] Ruslan Salakhutdinov and Andriy Mnih. Probabilistic matrix factorization. In Advances in Neural Information Processing Systems, volume 20, NIPS '08, 2008.
[Sarwar et al., 2001] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, WWW '01, pages 285-295. ACM, 2001.
[Shi et al., 2010] Yue Shi, Martha Larson, and Alan Hanjalic. List-wise learning to rank with matrix factorization for collaborative filtering. In Proceedings of the 4th ACM Conference on Recommender Systems, RecSys '10, pages 269-272. ACM, 2010.
[Valizadegan et al., 2009] Hamed Valizadegan, Rong Jin, Ruofei Zhang, and Jianchang Mao. Learning to rank by optimizing NDCG measure. In Advances in Neural Information Processing Systems, NIPS '09, pages 1883-1891, 2009.
[Wang et al., 2014a] Xin Wang, Weike Pan, and Congfu Xu. HGMF: Hierarchical group matrix factorization for collaborative recommendation. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, CIKM '14, pages 769-778. ACM, 2014.
[Wang et al., 2014b] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, AAAI '14, pages 1112-1119, 2014.
[Weimer et al., 2007] Markus Weimer, Alexandros Karatzoglou, Quoc Viet Le, and Alex Smola. Maximum margin matrix factorization for collaborative ranking. In Advances in Neural Information Processing Systems, NIPS '07, 2007.