# Personalized Privacy-Preserving Social Recommendation

Xuying Meng,1,2 Suhang Wang,3 Kai Shu,3 Jundong Li,3 Bo Chen,4 Huan Liu,3 Yujun Zhang1
1Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
2University of Chinese Academy of Sciences, Beijing 100049, China
3Computer Science and Engineering, Arizona State University, Tempe, 85281, USA
4Department of Computer Science, Michigan Technological University, Houghton, 49931, USA
{mengxuying, zhmj}@ict.ac.cn, {suhang.wang, kai.shu, jundongl, huan.liu}@asu.edu, bchen@mtu.edu

Privacy leakage is an important issue for social recommendation. Existing privacy-preserving social recommendation approaches usually allow the recommender to fully control users' information. This may be problematic since the recommender itself may be untrusted, leading to serious privacy leakage. Besides, building social relationships requires sharing interests as well as other private information, which may lead to more privacy leakage. Although users are sometimes allowed to hide their sensitive private data through privacy settings, the data that is shared can still be abused by adversaries to infer sensitive private information. Supporting social recommendation with the least privacy leakage to an untrusted recommender and to other users (i.e., friends) is an important yet challenging problem. In this paper, we aim to achieve privacy-preserving social recommendation under personalized privacy settings. We propose PrivSR, a novel framework for privacy-preserving social recommendation, in which users can model ratings and social relationships privately. Meanwhile, by allocating different noise magnitudes to personalized sensitive and non-sensitive ratings, we can protect users' privacy against the untrusted recommender and friends. Theoretical analysis and experimental evaluation on real-world datasets demonstrate that our framework can protect users' privacy while retaining the effectiveness of the underlying recommender system.

## Introduction

The recommender system has become an imperative component of myriad online commercial platforms. With the increasing popularity of social networks, recommender systems can take advantage of rich social relationships to further improve the effectiveness of recommendation (Tang, Hu, and Liu 2013; Wang et al. 2017; Shu et al. 2018). Despite their effectiveness, these social relationship-based recommender systems (i.e., social recommendation) may introduce another source of privacy leakage. For example, by observing a victim user's ratings on products such as adult or medical items, an attacker may infer the victim's private sexual inclinations and health conditions (Fredrikson et al. 2014), which may be further abused for financial benefit (Nikolaenko et al. 2013).

In practice, a privacy-preserving social recommender system, which can produce accurate recommendation results without sacrificing users' privacy, is very necessary. There are a few mechanisms dedicated along this line. However, most of them suffer from the following defects. First, a vast majority of existing efforts (Liu and Terzi 2010; Jorgensen and Yu 2014) heavily rely on the assumption that the recommender is fully trusted.
They neglect the fact that the recommender itself may be untrusted and may conduct malicious behaviors, causing serious privacy leakage. Second, some other works (Hoens, Blanton, and Chawla 2010; Tang and Wang 2016) rely on cryptography to prevent users' exact inputs from being leaked to the untrusted recommender. Nonetheless, it has been shown that attackers can still infer sensitive information about victim users based on their influence on the final results (McSherry and Mironov 2009). In addition, the cryptographic process is usually expensive and may bring large computational overhead. Third, some of the existing works (Machanavajjhala, Korolova, and Sarma 2011; Jorgensen and Yu 2014; Hua, Xia, and Zhong 2015) rely on friends' history ratings to make recommendations. These methods, however, do not differentiate sensitive and non-sensitive ratings and simply treat them equally, which contradicts real-world scenarios. In practice, social media sites such as IMDB and Facebook (Facebook provides public pages for products, e.g., https://www.facebook.com/pages/Google-Earth/107745592582048) allow users to specify the visibility of their ratings on products. Treating all ratings as equally sensitive, and thus not exposing any non-sensitive ratings, makes it difficult to attract common-interest friends and to make effective recommendations, sacrificing user experience in the long run. Our work allows disclosing non-sensitive ratings, but prevents sensitive ratings from being inferred from the exposed non-sensitive ones.

Resolving all the aforementioned defects is necessary for building an effective privacy-preserving social recommender system, which is a very challenging task due to the following reasons. First, to eliminate the assumption that the recommender is fully trusted, we need to change the recommender system from a fully centralized manner to a semi-centralized manner. In other words, instead of fully relying on the recommender, we now allow users and the recommender to collaborate in the course of recommendation. Specifically, users take part in the learning process upon their own ratings, while the recommender can only access non-sensitive ratings, and both parties interact with each other to make the final recommendation. In such a semi-centralized manner, however, private information may still be leaked during each interaction, and eliminating such leakage is necessary yet challenging. Second, to avoid using expensive cryptographic techniques, differential privacy (Dwork et al. 2006) can be used to provide a provable privacy guarantee with small computational overhead. However, differential privacy requires adding noise, which may degrade recommendation effectiveness. This is exacerbated when non-sensitive ratings are exposed and used as background knowledge to infer sensitive ratings. Third, users are often allowed to configure their privacy settings in practice. Due to the idiosyncrasy of different users, their personalized privacy settings can be quite diverse. Protecting sensitive ratings under these personalized, diversified privacy settings is not straightforward.

In this work, we perform an initial study of privacy-preserving social recommendation based on personalized privacy settings.
In particular, we propose a novel framework, PrivSR, that can protect users' sensitive ratings from being leaked to the untrusted recommender and friends while retaining the effectiveness of recommendation. Our design is mainly based on matrix factorization-based social recommendation, a popular social recommendation approach. Our basic idea is three-fold: 1) We divide the learning process of user latent vectors into small components for each specific user, and utilize objective perturbation to provide a privacy guarantee under differential privacy. 2) We divide the ratings into sensitive and non-sensitive ratings, and attach only the sensitive ratings with small privacy budgets, i.e., large-magnitude noise. In this way, the modeling of non-sensitive ratings is not significantly affected, which helps retain recommendation effectiveness. 3) We decouple the noise perturbation into small pieces, each of which can be processed independently by an individual user. In this way, each user can decide his/her own noise magnitude locally, while the entire process still satisfies differential privacy. We summarize our contributions as follows:

- We are the first to study the problem of privacy-preserving social recommendation with personalized privacy.
- We propose a novel social recommendation framework, PrivSR. PrivSR works in a semi-centralized manner and relies on differential privacy with well-balanced privacy budgets to handle the untrusted recommender and friends while retaining recommendation effectiveness.
- We theoretically prove that PrivSR satisfies ϵ-differential privacy, and empirically validate its effectiveness on real-world datasets. The results are encouraging: PrivSR provides a good balance between privacy protection and recommendation accuracy.

## Preliminaries and Related Work

**Differential privacy.** Differential privacy (Dwork et al. 2006) is a popular privacy-preserving technique, which effectively perturbs the raw dataset by injecting noise and ensures that the output is not significantly affected by the removal/addition of a single rating. Considering its provable privacy guarantee and light computational overhead, we use differential privacy in our proposed framework.

**Definition 1 (ϵ-Differential Privacy (Dwork et al. 2006)).** A randomized algorithm $f$ satisfies ϵ-differential privacy if, for any two datasets $D_1$ and $D_2$ which differ in at most one rating, and for any possible anonymized output dataset $\tilde{D} \in Range(f)$,

$$\Pr[f(D_1) = \tilde{D}] \le e^{\epsilon} \Pr[f(D_2) = \tilde{D}] \quad (1)$$

where $Range(f)$ denotes the output range of algorithm $f$. The probability is taken over the randomness of $f$, and the privacy budget ϵ defines the magnitude of privacy being achieved, where ϵ is a positive real number and the smaller the ϵ, the harder it is to infer users' private data.

The Laplace mechanism (Dwork et al. 2006) is commonly used to satisfy ϵ-differential privacy by adding i.i.d. noise drawn from $Lap(GS(D)/\epsilon)$ to each output, where the global sensitivity $GS(D)$ is the maximal change in the output that any single rating in the input $D$ can cause. Since Laplace noise is less convenient to decompose than normally distributed noise, it can be expressed as a combination of exponential and normal random variables (Kotz, Kozubowski, and Podgorski 2012):

**Lemma 1.** If a random number $h \sim Exp(1)$ and a random number $c \sim N(0, 1)$, then for any real number $b > 0$, $b\sqrt{2h}\,c \sim Lap(b)$.
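As a quick aside (ours, not from the paper), the following Python sketch illustrates Lemma 1: scaling a standard normal draw by $b\sqrt{2h}$ with $h \sim Exp(1)$ yields a $Lap(b)$ sample, which can be checked empirically since the mean absolute value of $Lap(b)$ is $b$. The function name is illustrative.

```python
import numpy as np

def laplace_via_exp_normal(b, size, rng):
    """Sample Lap(b) as b * sqrt(2h) * c with h ~ Exp(1) and c ~ N(0, 1) (Lemma 1)."""
    h = rng.exponential(scale=1.0, size=size)  # h ~ Exp(1)
    c = rng.standard_normal(size)              # c ~ N(0, 1)
    return b * np.sqrt(2.0 * h) * c

rng = np.random.default_rng(0)
samples = laplace_via_exp_normal(b=2.0, size=1_000_000, rng=rng)
print(np.mean(np.abs(samples)))  # ≈ 2.0, the mean absolute value of Lap(2.0)
```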
**Inference and reconstruction attacks.** An inference attack is conducted to infer whether an individual rating is included in the training set (Shokri et al. 2017); differential privacy is widely used to defend against inference attacks (Tang and Wang 2016) by adding noise to perturb and reduce each individual's impact on the trained model. A reconstruction attack is conducted to predict the exact values of some sensitive features of a target victim based on background information. A few existing works explored how to reconstruct a model to predict users' sensitive information (Fredrikson, Jha, and Ristenpart 2015; Komarova, Nekipelov, and Yakovlev 2013). For example, Komarova et al. (2013) attempted to infer the sensitive features of an individual given a fixed statistical estimate from combined public and private sources. Fredrikson et al. (2014) demonstrated that differential privacy mechanisms can mitigate reconstruction attacks only when the privacy budget is very small, which unfortunately degrades the effectiveness of the model significantly. Wang, Si, and Wu (2015) were the first to propose balancing utility and privacy in regression models based on the functional mechanism (Zhang et al. 2012). However, these existing mechanisms cannot be applied to handle the reconstruction attack in social recommendation, since the way the recommendation model is reconstructed is completely different: the attackers can utilize non-sensitive ratings to inversely infer a victim user's latent features, and then reconstruct the user's sensitive ratings by matrix factorization (Koren, Bell, and Volinsky 2009).

*Figure 1: An example of privacy attacks in social recommendation from the perspective of victim user $u_1$. Assume there are six items, and $u_1$ has rated four of them with personalized privacy settings. $u_1$ exposes processed outputs $S_1$, $S_2$, $S_3$ and $S_4$ to the recommender and to $u_2$, $u_3$ and $u_4$. The black arrows show the exposure directions. The attackers, colored red, conduct the attacks shown in the gray boxes, and the red dashed arrows show the process of the attacks.*

**Social recommendation.** Since users' preferences may be similar to or influenced by their friends', social relationships are widely employed to improve recommendation effectiveness on top of matrix factorization (Ma et al. 2011), which we select as our basic model. Let $U = \{u_1, \ldots, u_n\}$ be a set of $n$ users and $V = \{v_1, \ldots, v_m\}$ be a set of $m$ items. We denote $u_i$'s rating on item $v_j$ as $R_{ij}$ and use $F_i$ to represent the set of $u_i$'s friends. The social recommendation algorithm can be written as:

$$\min_{U,V} \sum_{i=1}^{n}\sum_{j=1}^{m} I_{ij}(R_{ij} - u_i^T v_j)^2 + \alpha \sum_{i=1}^{n}\sum_{f \in F_i} S_{if}\,\|u_i - u_f\|_F^2 \quad (2)$$

where $I_{ij} = 1$ if we observed a rating from $u_i$ to $v_j$, and $I_{ij} = 0$ otherwise. The rating matrix $R \in \mathbb{R}^{n \times m}$ is decomposed into a user latent matrix $U = [u_i]_{i \in [n]} \in \mathbb{R}^{K \times n}$ and an item latent matrix $V = [v_j]_{j \in [m]} \in \mathbb{R}^{K \times m}$, where $u_i \in \mathbb{R}^{K}$ and $v_j \in \mathbb{R}^{K}$ denote the user latent vector of user $u_i$ and the item latent vector of $v_j$ respectively, and $K$ is the number of latent dimensions. $\|\cdot\|_F^2$ denotes the Frobenius norm, and $S_{if}$ is the cosine similarity between the ratings of $u_i$ and $u_f$ on the same items, which regularizes the impact of friends' user latent vectors.
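For concreteness, the following numpy sketch (ours, with illustrative names and a dense-matrix layout) evaluates the non-private objective of Eq. (2); this is the quantity that the privacy-preserving design in the rest of the paper perturbs and decentralizes.

```python
import numpy as np

def soreg_objective(R, I, U, V, S, friends, alpha):
    """Non-private social recommendation loss of Eq. (2).

    R: n x m ratings, I: n x m observation mask, U: K x n user latent matrix,
    V: K x m item latent matrix, S: n x n cosine similarities over co-rated items,
    friends: dict mapping user index i to the list of friend indices F_i,
    alpha: weight of the social regularization term.
    """
    rating_loss = np.sum(I * (R - U.T @ V) ** 2)
    social_loss = sum(S[i, f] * np.sum((U[:, i] - U[:, f]) ** 2)
                      for i in friends for f in friends[i])
    return rating_loss + alpha * social_loss
```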
## Problem Statement

In social recommendation, there are three types of actors: users, friends, and the recommender. Among them, the friends and the recommender may be untrusted, and may be curious about or even misuse users' sensitive ratings. We use a concrete example (shown in Figure 1) to illustrate potential privacy leakage. To model history ratings in matrix factorization-based social recommendation, the victim user $u_1$ is required to share some processed outputs with the recommender and with friends $u_2$, $u_3$, $u_4$. However, the attackers can learn sensitive information from the outputs exposed during the learning process: (1) To update $v_j$, the recommender requires user $u_1$, who has rated item $v_j$, to share $S_1$, which is calculated from the rating $R_{1j}$ and the user latent vector $u_1$. If there is an item $v_j$ for which $u_1$ regards $R_{1j}$ as non-sensitive and publishes it, the recommender can combine $S_1$ and $R_{1j}$ to compute $u_1$, and further obtain the sensitive ratings $R_{11}$ and $R_{13}$. (2) With the exposed non-sensitive ratings, the recommender may conduct a reconstruction attack to infer an approximate latent vector $\tilde{u}_1$, from which all of $u_1$'s ratings may be disclosed. (3) The malicious friend $u_3$ requires the user latent vector $u_1$ for social regularization, from which $u_3$ may learn $u_1$'s ratings by computing $u_1^T V$.

To formally define our problem, we first describe the notation used in this paper. When user $u_i$ rates item $v_j$ (i.e., $R_{ij}$), $u_i$ specifies his/her privacy setting on $R_{ij}$ as private, shared with friends, or public. We use $F_{ij} = 1$ to indicate that $R_{ij}$ is a sensitive rating, visible only to user $u_i$ due to privacy concerns; otherwise $F_{ij} = 0$. Similarly, $G_{ij} = 1$ indicates that $R_{ij}$ is a non-sensitive rating, visible to friends or the public; otherwise $G_{ij} = 0$. As $F_{ij} = 1$ and $G_{ij} = 1$ are mutually exclusive, we have $I_{ij} = F_{ij} + G_{ij}$ for all observed ratings. We then define the set of sensitive ratings as $R_s = \{R_{ij} \mid (i, j) \text{ s.t. } F_{ij} = 1\}$, and the set of non-sensitive ratings as $R_n = \{R_{ij} \mid (i, j) \text{ s.t. } G_{ij} = 1\}$. With these definitions, our privacy-preserving social recommendation problem can be formally defined as: given the observed values in $R$, the set of friends $F$, a set of sensitive ratings $R_s$, and a set of non-sensitive ratings $R_n$, we want to infer the missing values in $R$ without privacy leakage of $R_s$.

## Private Social Recommendation

*Figure 2: The proposed framework PrivSR.*

Our proposed framework, PrivSR, aims to allow recommender systems to incorporate social relationships without leaking sensitive ratings to the untrusted recommender and friends. To achieve this goal, we perform the following. First, we incorporate social relationships into traditional recommender systems with consideration of both non-sensitive and sensitive ratings. We divide the entire framework into a users' ratings component and a social relationships component (Figure 2), keep balanced noise perturbation on sensitive and non-sensitive ratings in the users' ratings component, and utilize only non-sensitive ratings to model social similarity with untrusted friends in the social relationships component. Second, to remove the centralized control of the untrusted recommender or any third party, we require the recommender and the users to collaborate to perform recommendation. We allocate different resources to the recommender and to individual users, as shown in the green part of Figure 2: the recommender can only access non-sensitive ratings $R_n$ and shares the updated item latent matrix $V$ with everyone for recommendation purposes. Apart from public information, every user holds his/her private information, including all of his/her ratings $R_i$ and friends set $F_i$, on the local machine.
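As a rough illustration of this division of data (our sketch; the field names are hypothetical), the recommender only ever materializes the non-sensitive ratings and the shared item matrix, while everything else stays on each user's machine:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class RecommenderState:          # semi-centralized server side
    V: np.ndarray                # K x m item latent matrix, shared with everyone
    R_n: dict                    # exposed non-sensitive ratings {(i, j): value}

@dataclass
class UserState:                 # kept on user u_i's local machine
    u_i: np.ndarray              # K-dim user latent vector, never shared in the clear
    ratings: dict                # all of u_i's ratings {j: value}, sensitive or not
    sensitive: set               # item indices j with F_ij = 1
    friends: set                 # friend set F_i
```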
In particular, since the user latent vector $u_i$ can be used to obtain sensitive ratings (e.g., by computing $u_i^T V$), $u_i$ should also be kept locally.

### Modeling Sensitive and Non-sensitive Ratings

A non-trivial task in the design of PrivSR is to model ratings without leaking sensitive ratings, especially in the face of personalized privacy settings and public non-sensitive ratings, which may be used by an adversary as background information to infer the sensitive ratings. We present the basic model based on matrix factorization (Koren, Bell, and Volinsky 2009; Wang, Tang, and Liu 2015; Wang et al. 2015; Meng et al. 2018a). Since $I_{ij} = F_{ij} + G_{ij}$, the objective function considering rating sensitivity can be written as:

$$\min_{U,V} \sum_{i=1}^{n}\sum_{j=1}^{m} (F_{ij} + G_{ij})(R_{ij} - u_i^T v_j)^2 \quad (3)$$

To conduct recommendation in a semi-centralized manner and protect privacy from the untrusted recommender, we utilize gradient descent to decouple and update the latent vector $u_i$ of each user. Because the gradient of Eq. (3) w.r.t. $u_i$ is $\sum_{j=1}^{m} 2(F_{ij}+G_{ij})(u_i^T v_j - R_{ij})v_j$, which only involves $u_i$ and $V$, each $u_i$ can be updated locally with the shared $V$ and kept private. On the other hand, to update $v_j$, the gradient of Eq. (3) w.r.t. $v_j$ is $\sum_{i=1}^{n} 2(F_{ij}+G_{ij})(u_i^T v_j - R_{ij})u_i$, which requires each user $u_i$ who has rated $v_j$ to submit a copy of $\sigma_j^i = 2(F_{ij}+G_{ij})(u_i^T v_j - R_{ij})u_i$ to the recommender. This individual submission raises serious privacy concerns: (1) attackers can easily obtain $u_i$ when $G_{ij} = 1$, and then all sensitive ratings are exposed through $u_i^T V$; and (2) attackers can conduct an inference attack from the contribution of a particular user $u_i$. Although encryption techniques may solve the first problem and ensure the recommender only learns the final summation rather than the exact value from each user, the untrusted recommender can still conduct the inference attack described in the second problem.

To tackle these problems, we apply the objective perturbation method (Chaudhuri, Monteleoni, and Sarwate 2011) with ϵ-differential privacy, and perturb each individual's involvement by adding noise to the objective function. We introduce noise into Eq. (3) as:

$$\min_{U,V} \sum_{i=1}^{n}\sum_{j=1}^{m} (F_{ij} + G_{ij})\left[(R_{ij} - u_i^T v_j)^2 + v_j^T o_j^i\right] \quad (4)$$

where $o_j = \sum_i o_j^i \in \mathbb{R}^{K \times 1}$ is a noise vector, and each user $u_i$ protects $\sigma_j^i$ by adding $o_j^i$ to the derivative w.r.t. $v_j$. A third privacy concern then arises: attackers can still obtain users' sensitive ratings from the exposed non-sensitive ratings by performing a reconstruction attack. This can be prevented only when the privacy budget ϵ used for noise sampling is extremely small (Fredrikson et al. 2014); however, small privacy budgets lead to large noise magnitudes, degrading recommendation effectiveness. Thus the uniform noise $o_j$, which ignores personalized privacy settings, would substantially reduce the effectiveness of recommendation.

To protect users' privacy while retaining recommendation effectiveness, we allocate balanced privacy budgets to sensitive and non-sensitive ratings:

$$\min_{U,V} \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij}\left[(R_{ij} - u_i^T v_j)^2 + v_j^T x_j^i\right] + \sum_{i=1}^{n}\sum_{j=1}^{m} G_{ij}\left[(R_{ij} - u_i^T v_j)^2 + v_j^T y_j^i\right] \quad (5)$$

where $x_j = \sum_i x_j^i \in \mathbb{R}^{K \times 1}$ and $y_j = \sum_i y_j^i \in \mathbb{R}^{K \times 1}$ are the noise vectors for $\sum_i \sigma_j^i$ with sensitive and non-sensitive ratings respectively. We allocate a much smaller privacy budget $\epsilon_s$ to sensitive ratings and a larger $\epsilon_n$ to non-sensitive ones, with $\epsilon_s = \beta\epsilon_n$, where $\beta \in (0, 1]$ controls the relative noise magnitude. Sensitive ratings thus receive stronger privacy protection via the small privacy budget $\epsilon_s$.
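The following sketch (ours, not the paper's exact construction: here each user simply draws the full Laplace noise locally, whereas the paper's distributed construction, in which only the sum of the per-user pieces is Laplace, is described next) illustrates how the balanced budgets enter a user's perturbed contribution to the $v_j$ update:

```python
import numpy as np

def perturbed_contribution(u_i, v_j, r_ij, is_sensitive, eps_s, eps_n, delta, rng):
    """User u_i's perturbed share of the gradient of Eq. (5) w.r.t. item vector v_j.

    sigma = 2 (u_i^T v_j - R_ij) u_i is masked with Laplace noise of scale
    2*Delta*sqrt(K)/eps, using the small budget eps_s (more noise) for a
    sensitive rating and the larger eps_n (less noise) otherwise.
    """
    K = u_i.shape[0]
    sigma = 2.0 * (u_i @ v_j - r_ij) * u_i
    eps = eps_s if is_sensitive else eps_n        # eps_s = beta * eps_n, beta in (0, 1]
    noise = rng.laplace(scale=2.0 * delta * np.sqrt(K) / eps, size=K)
    return sigma + noise                          # only this perturbed vector leaves the user
```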
We set the privacy budget of the derived $V$ as $\epsilon = \frac{\beta\epsilon_n}{1+\beta}$. Since $\epsilon_n > \epsilon_s$, Theorem 1 shows that our model can effectively protect sensitive ratings while retaining recommendation effectiveness with balanced privacy budgets. However, it is difficult for users to independently select the $y_j^i$ such that $\sum_i y_j^i \sim Lap(2\Delta\sqrt{K}/\epsilon_n)$; the same holds for $x_j^i$, and we use $y_j^i$ as an example. Although a sum of Laplace random variables no longer follows a Laplace distribution, a sum of normal random variables still follows a normal distribution. Therefore, based on Lemma 1, the recommender first constructs $h_j \in \mathbb{R}^{K}$, where each element of $h_j$ is randomly and independently drawn from $Exp(1)$. The recommender then shares $h_j$ with the users in $R_{n,j}$, where we define $R_{n,j}$ (resp. $R_{s,j}$) as the set of users who gave $v_j$ non-sensitive (resp. sensitive) ratings. After that, each user selects $c_{ij}^{n} \in \mathbb{R}^{K}$, where each element of $c_{ij}^{n}$ is randomly and independently drawn from $N(0, 1/|R_{n,j}|)$. Then $\sigma_j^i$ can be protected using the noise $\frac{2\Delta\sqrt{2Kh_j}}{\epsilon_n}\circ c_{ij}^{n}$ built from $h_j$ and $c_{ij}^{n}$, and the summation of the noise satisfies $\sum_{i \in R_{n,j}} \frac{2\Delta\sqrt{2Kh_j}}{\epsilon_n}\circ c_{ij}^{n} \sim Lap\big(\frac{2\Delta\sqrt{K}}{\epsilon_n}\big)$.

**Theorem 1.** Let $\Delta$ denote the difference between the maximal rating and the minimal rating. If each element of $x_j$ and $y_j$ is independently and randomly selected from $Lap\big(\frac{2\Delta\sqrt{K}}{\epsilon_s}\big)$ and $Lap\big(\frac{2\Delta\sqrt{K}}{\epsilon_n}\big)$ respectively, the derived $V$ satisfies ϵ-differential privacy. (A detailed proof can be found in Meng et al. 2018b.)

### Modeling Social Relationships

Social regularization, formulated as $\sum_{i=1}^{n}\sum_{f\in F_i} S_{if}\|u_i - u_f\|_F^2$ based on Eq. (2), requires calculating the similarity $S_{if}$ over all sensitive and non-sensitive ratings, and exchanging friends' very sensitive information, i.e., their latent vectors $u_f$. Without a fully trusted recommender, this sensitive information may be leaked in the course of optimization. To protect sensitive ratings from untrusted friends, we only utilize non-sensitive ratings for the calculation of $S_{if}$. Also, to protect each friend's $u_f$ during the gradient-descent optimization of $\sum_{i=1}^{n}\sum_{f\in F_i} S_{if}\|u_i - u_f\|_F^2$, we first calculate the gradient w.r.t. $u_i$ as $\sum_{f\in F_i} 2S_{if}u_i - \sum_{f\in F_i} 2S_{if}u_f$, where we set $\sigma_i^f = 2S_{if}u_f$. To keep friends from sharing $u_f$ with user $u_i$, we add perturbation terms that hide the friends' user latent vectors $u_f$:

$$\min_{U} \sum_{i=1}^{n}\Big[\sum_{f \in F_i} S_{if}\|u_i - u_f\|_F^2 + u_i^T q_i\Big] \quad (6)$$

where $q_i = \sum_f q_i^f \in \mathbb{R}^{K\times1}$ is the noise vector, and each $q_i^f$ comes from friend $u_f$ for the derived $u_i$. For $u_f$ to help his/her friend $u_i$ learn $u_i$ locally while not leaking $u_f$ through the submission of $\sigma_i^f$, we add noise in Eq. (6). In this way, each friend can send the perturbed value $q_i^f - \sigma_i^f$ to user $u_i$. Theorem 2 requires $\sum_f q_i^f \sim Lap(2\sqrt{K}/\epsilon)$, so we let each user construct $h_i$ with elements drawn from $Exp(1)$ and share $h_i$ with all his/her friends. Each friend then randomly and independently selects $c_{if}$ with elements drawn from $N(0, 1/|F_i|)$. Then $\sigma_i^f$ can be protected by the noise $\frac{2\sqrt{2Kh_i}}{\epsilon}\circ c_{if}$, and the summation satisfies $\sum_{f\in F_i} \frac{2\sqrt{2Kh_i}}{\epsilon}\circ c_{if} \sim Lap\big(\frac{2\sqrt{K}}{\epsilon}\big)$.

**Theorem 2.** If each element of $q_i$ is independently and randomly selected from $Lap\big(\frac{2\sqrt{K}}{\epsilon}\big)$, the derived $U$ satisfies ϵ-differential privacy. (A detailed proof can be found in Meng et al. 2018b.)
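The following numpy sketch (ours; variable names are illustrative) mirrors this construction: a coordinator samples $h$ once, each participant contributes a small normally distributed piece scaled by $\sqrt{2h}$, and only the sum of the pieces follows the target Laplace distribution. The same routine covers the $x_j^i$, $y_j^i$, and $q_i^f$ cases by changing the scale and the set of participants.

```python
import numpy as np

def distributed_laplace_shares(n_parties, K, scale, rng):
    """Split Lap(scale) noise (per coordinate) across n_parties using Lemma 1.

    The coordinator draws h ~ Exp(1) once and shares it; party i draws
    c_i ~ N(0, 1/n_parties) and submits scale * sqrt(2h) * c_i. Since
    sum_i c_i ~ N(0, 1), the summed noise follows Lap(scale) per coordinate,
    while no single share does.
    """
    h = rng.exponential(1.0, size=K)                       # shared by the coordinator
    return [scale * np.sqrt(2.0 * h) * rng.normal(0.0, np.sqrt(1.0 / n_parties), size=K)
            for _ in range(n_parties)]

rng = np.random.default_rng(0)
K, n_parties, delta, eps_n = 10, 50, 4.0, 1.0
scale = 2.0 * delta * np.sqrt(K) / eps_n                   # target Lap(2*Delta*sqrt(K)/eps_n)
total_noise = sum(distributed_laplace_shares(n_parties, K, scale, rng))
```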
### The Proposed Framework PrivSR

To protect users' privacy from the untrusted recommender through the sensitive and non-sensitive ratings component, and from untrusted friends through the social relationships component, the final objective of PrivSR, which protects sensitive ratings while retaining recommendation effectiveness, is the following optimization problem:

$$\min_{U,V} \mathcal{J} = \sum_{i=1}^{n}\sum_{j=1}^{m} F_{ij}\left[(R_{ij} - u_i^T v_j)^2 + v_j^T x_j^i\right] + \sum_{i=1}^{n}\sum_{j=1}^{m} G_{ij}\left[(R_{ij} - u_i^T v_j)^2 + v_j^T y_j^i\right] + \alpha\sum_{i=1}^{n}\Big[\sum_{f \in F_i} S_{if}\|u_i - u_f\|_F^2 + u_i^T q_i\Big] + \lambda(\|U\|_F^2 + \|V\|_F^2) \quad (7)$$

where α is a scalar controlling the contribution of social relationships and $\lambda(\|U\|_F^2 + \|V\|_F^2)$, with scalar λ, is used to avoid over-fitting. We use gradient descent to minimize the objective function. The gradients of Eq. (7) w.r.t. $v_j$ and $u_i$ are:

$$\frac{\partial\mathcal{J}}{\partial v_j} = \sum_{i=1}^{n} F_{ij}\left[2(u_i^T v_j - R_{ij})u_i + x_j^i\right] + \sum_{i=1}^{n} G_{ij}\left[2(u_i^T v_j - R_{ij})u_i + y_j^i\right] + 2\lambda v_j \quad (8)$$

$$\frac{\partial\mathcal{J}}{\partial u_i} = \sum_{j=1}^{m} 2I_{ij}(u_i^T v_j - R_{ij})v_j + 2\alpha\sum_{f\in F_i}S_{if}(u_i - u_f) + \alpha\sum_{f\in F_i}q_i^f + 2\lambda u_i \quad (9)$$

To address the challenge of protecting sensitive ratings against the untrusted recommender and friends, we conduct objective perturbation with balanced privacy budgets in a semi-centralized way, as described in Algorithm 1. To preserve privacy, the item latent matrix is updated on the recommender's side with perturbed information from users, and the user latent vectors are updated on each user's side individually with the shared $V$ and perturbed friends' user latent vectors.

**Algorithm 1: PrivSR**

    Input: J, ϵ, γ, β, λ; user u_i holds its u_i and F_i
    Output: R~
    1:  Initialize U and V
    2:  while not converged do
    3:    for j = 1, ..., m do
    4:      // Calculate v_j on the recommender's side
    5:      for i in R_{s,j} do
    6:        Send 2(u_i^T v_j - R_ij)u_i + x_j^i to the recommender
    7:      for i in R_{n,j} do
    8:        Send 2(u_i^T v_j - R_ij)u_i + y_j^i to the recommender
    9:      Update v_j as v_j = v_j - γ ∂J/∂v_j
    10:   for i = 1, ..., n do
    11:     // Calculate u_i on user u_i's side
    12:     for f in F_i do
    13:       Send q_i^f - 2S_if u_f to user u_i
    14:     Update u_i as u_i = u_i - γ ∂J/∂u_i
    15: Return R~ = U^T V

Next, we briefly describe Algorithm 1. To help the recommender update $v_j$ in lines 4 through 9 with Eq. (8), users individually send the recommender the required information perturbed with privacy budget $\epsilon_n$ or $\epsilon_s$. To help user $u_i$ update $u_i$ in lines 11 through 14 with Eq. (9), each of $u_i$'s friends sends perturbed results with independent random noise $q_i^f$. After the algorithm converges, we obtain the predicted ratings $\tilde{R}$ from the optimized $U$ and $V$.

Note that statistical information from users' submissions across iterations may be exploited by attackers. For example, to obtain a targeted sensitive rating $R_{ij}$, the untrusted recommender can collect $\tilde\sigma_j^i(t) = \sigma_j^i(t) + x_j^i$ in the $t$-th iteration, where $\sigma_j^i(t) = 2F_{ij}(u_i^T(t)v_j(t) - R_{ij})u_i(t)$. If $x_j^i$ were fixed, then $\tilde\sigma_j^i(t) - \tilde\sigma_j^i(t-1) = \sigma_j^i(t) - \sigma_j^i(t-1)$, and the impact of the noise would be eliminated. Therefore, we need to ensure $x_j^i$ is freshly sampled in each iteration to eliminate the influence of such statistics (Rajkumar and Agarwal 2012). Similarly, $y_j^i$ and $q_i^f$ are also resampled in each iteration.

**Security analysis.** Theorem 3 confirms that PrivSR achieves the desired security: after Algorithm 1 converges, our model satisfies ϵ-differential privacy against the untrusted recommender and friends.

**Theorem 3.** PrivSR satisfies ϵ-differential privacy. (A detailed proof can be found in Meng et al. 2018b.)

## Experimental Evaluation

In this section, we conduct an experimental evaluation to validate the effectiveness of PrivSR. We aim to answer two questions: (1) can PrivSR improve recommendation effectiveness by incorporating sensitive ratings and social relationships,
and (2) can it protect sensitive ratings under a reconstruction attack while retaining recommendation effectiveness? In the following, we first introduce our datasets and experimental settings, then conduct the experimental evaluation, and finally analyze the impact of parameters.

**Datasets and experimental settings.** Two publicly available datasets, Ciao (http://www.ciao.co.uk/) and Epinions (http://www.epinions.com/), are used for evaluation. In both datasets, users can rate products from 1 to 5 and establish social relations with others. Detailed statistics of the two datasets are shown in Table 1. The two datasets possess social relations of different sparsity, which helps validate the effectiveness and generality of PrivSR. For each dataset, to simulate the setting of personalized privacy preferences, we randomly select x percent of the ratings as sensitive ratings and the remaining (100 − x) percent as non-sensitive ratings. We vary x over {0, 10, ..., 50} and use five-fold cross-validation in the following experiments.

Table 1: Statistics of datasets

| Dataset  | # users | # items | # ratings | # relationships |
|----------|---------|---------|-----------|-----------------|
| Ciao     | 7,193   | 21,889  | 183,415   | 28,513          |
| Epinions | 17,950  | 49,760  | 508,936   | 14,017          |

We use the popular metric Mean Absolute Error (MAE), defined as $\sum_{(u_i,v_j)\in\mathcal{R}} |R_{ij} - \tilde{R}_{ij}| / |\mathcal{R}|$, where $\mathcal{R}$ is the set of ratings in the testing set. For recommendation effectiveness, a smaller MAE indicates better performance. For the reconstruction attack on sensitive ratings, a larger MAE indicates better privacy protection. Note that previous work demonstrated that a small improvement in MAE can have a significant impact on the quality of the top-few recommendations (Koren 2008). We compare three representative state-of-the-art recommendation approaches:

- MF: matrix factorization (MF) decomposes the user-item rating matrix into two low-rank matrices for recommendation (Koren, Bell, and Volinsky 2009).
- SoReg: this method adds social regularization to matrix factorization to represent social constraints in recommender systems (Ma et al. 2011).
- DPMF: differentially private matrix factorization (DPMF) treats all ratings as private and uses equally perturbed noise for latent matrix learning (Hua, Xia, and Zhong 2015).

For each approach, the parameters are tuned via cross-validation on the training data. We set $\gamma = 10^{-4}$, $\lambda = 10^{-3}$, $\alpha = 10^{-2}$ and the number of latent dimensions $K = 10$. For convenience, we fix β = 0.1 for PrivSR in the first two experiments; accordingly, $\epsilon_s = 1.1\epsilon$ and $\epsilon_n = 11\epsilon$. More details about parameter selection are discussed below.

**Recommendation effectiveness comparison.** To answer the first question, we evaluate recommendation effectiveness on the test datasets. We do not provide sensitive ratings to MF and SoReg, and only provide them to DPMF and PrivSR, since MF and SoReg offer no privacy protection for sensitive ratings. The average MAE results are shown in Figure 3, from which we observe:

- When x = 0, PrivSR with ϵ = 0.1 performs almost as well as SoReg, which confirms that noise perturbation on non-sensitive ratings does not significantly affect recommendation effectiveness.
- In general, PrivSR with ϵ = 0.1 steadily outperforms the other methods across different percentages of sensitive ratings, even though we attach noise with low privacy budgets. This confirms the effectiveness of the well-balanced privacy budgets for sensitive and non-sensitive ratings.
- Although the privacy budget of PrivSR with ϵ = 0.05 is much smaller than that of DPMF with ϵ = 0.1, the recommendation effectiveness of PrivSR is still better than DPMF in most cases. In practice, the percentage of sensitive ratings is usually not too high, so PrivSR can still achieve very good recommendation effectiveness.

*Figure 3: Recommendation effectiveness comparison. (a) MAE on Ciao; (b) MAE on Epinions.*

*Figure 4: Privacy protection comparison. (a) MAE on Ciao; (b) MAE on Epinions.*

Based on the aforementioned observations, we conclude that PrivSR outperforms the state-of-the-art recommender systems on recommendation effectiveness by utilizing rich social relationships and perturbing balanced noise.

**Privacy protection comparison.** To answer the second question, we simulate the reconstruction attack. There are multiple options for conducting a reconstruction attack (Fredrikson, Jha, and Ristenpart 2015); we conduct it with a matrix factorization-based model. Since attackers can obtain both $V$ and $R_n$, they can infer a rough user latent profile $\tilde{u}_i$ of the victim $u_i$ by solving:

$$\min_{\tilde{u}_i} \sum_{j=1}^{m} G_{ij}(R_{ij} - \tilde{u}_i^T v_j)^2 \quad (10)$$

Using gradient descent, all sensitive ratings can then be estimated from $\tilde{U}$ and $V$. We want to protect sensitive ratings so that the prediction of sensitive ratings is inaccurate; a larger MAE value on sensitive ratings thus represents better privacy protection. From Figure 4, we obtain the following observations:

- Noise perturbation helps increase the level of privacy protection against reconstruction attacks.
- With a similar privacy budget, the levels of privacy protection provided by PrivSR and DPMF are similar. However, PrivSR achieves much better recommendation effectiveness thanks to the different privacy budgets for sensitive and non-sensitive ratings. We perform a t-test on the recommendation effectiveness of PrivSR and DPMF with the same privacy budget for sensitive ratings; the results show that the improvement is statistically significant. These results indicate that PrivSR achieves a better balance between privacy protection and recommendation effectiveness.
- PrivSR with a lower privacy budget can significantly increase the level of privacy protection while retaining good recommendation effectiveness, especially when the percentage of private ratings x is not too large.

Based on the aforementioned observations, we conclude that PrivSR outperforms the state-of-the-art recommender systems on privacy protection while retaining good recommendation effectiveness.

*Figure 5: MAE with varying parameters. (a) MAE when varying ϵ; (b) MAE when varying β.*

**Impact of parameters ϵ and β.** For simplicity, we set x = 10, based on real-world statistics (https://techcrunch.com/2009/10/05/twitter-data-analysis-an-investors-perspective-2). We then randomly select 10% of the ratings of the entire datasets as the sensitive rating set. To understand the impact of ϵ and β, we vary ϵ over {0.01, 0.05, 0.1, 0.5, 1} with fixed β = 1, and we vary β over {0.01, 0.05, 0.1, 0.5, 1} with fixed ϵ = 0.01. The MAE results are shown in Figure 5, from which we observe that: 1) A larger privacy budget means less noise, resulting in better recommendation effectiveness and worse privacy protection. This is a common observation about the trade-off between privacy and utility (Meng et al. 2016; Koren, Bell, and Volinsky 2009; Wang, Si, and Wu 2015).
2) With fixed ϵ, the recommendation effectiveness stays roughly the same, while a larger β indicates a larger privacy budget for sensitive ratings and a smaller one for non-sensitive ratings, which weakens the privacy protection of the sensitive ratings.

## Conclusion and Future Work

In this paper, we study the problem of privacy-preserving social recommendation with personalized privacy settings. We propose a novel differential privacy-preserving framework that works in a semi-centralized way and can protect users' sensitive ratings while retaining recommendation effectiveness. Theoretical analysis and experimental evaluation on real-world datasets demonstrate the effectiveness of the proposed framework for recommendation as well as privacy protection. Several directions can be further investigated. First, in this paper we build our model on matrix factorization, which is a point-based model; in the future we will study privacy-preserving social recommendation using ranking-based models such as BPR (Rendle et al. 2009). Second, we only consider static data in this paper; we will next study this problem for temporal and dynamic data (Koren 2010).

## Acknowledgment

This work was supported, in part, by the National Science Foundation of China (61672500, 61572474 and 61402446) and the Program of International S&T Cooperation (2016YFE0121500). Suhang Wang and Huan Liu were supported by the National Science Foundation (NSF) under grant #1614576 and the Office of Naval Research (ONR) under grant N00014-16-1-2257.

## References

Chaudhuri, K.; Monteleoni, C.; and Sarwate, A. D. 2011. Differentially private empirical risk minimization. Journal of Machine Learning Research 12(3):1069–1109.

Dwork, C.; McSherry, F.; Nissim, K.; and Smith, A. 2006. Calibrating noise to sensitivity in private data analysis. In TCC, 265–284.

Fredrikson, M.; Lantz, E.; Jha, S.; Lin, S.; Page, D.; and Ristenpart, T. 2014. Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing. In USENIX Security, 17–32.

Fredrikson, M.; Jha, S.; and Ristenpart, T. 2015. Model inversion attacks that exploit confidence information and basic countermeasures. In CCS, 1322–1333.

Hoens, T. R.; Blanton, M.; and Chawla, N. V. 2010. A private and reliable recommendation system for social networks. In SocialCom, 816–825.

Hua, J.; Xia, C.; and Zhong, S. 2015. Differentially private matrix factorization. In IJCAI, 1763–1770.

Jorgensen, Z., and Yu, T. 2014. A privacy-preserving framework for personalized, social recommendations. In EDBT, 571–582.

Komarova, T.; Nekipelov, D.; and Yakovlev, E. 2013. Estimation of treatment effects from combined data: Identification versus data security. In ICCAS-SICE, 3066–3071.

Koren, Y.; Bell, R. M.; and Volinsky, C. 2009. Matrix factorization techniques for recommender systems. IEEE Computer 42(8):30–37.

Koren, Y. 2008. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In SIGKDD, 426–434.

Koren, Y. 2010. Collaborative filtering with temporal dynamics. Commun. ACM 53(4):89–97.

Kotz, S.; Kozubowski, T.; and Podgorski, K. 2012. The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance. Springer Science & Business Media.

Liu, K., and Terzi, E. 2010. A framework for computing the privacy scores of users in online social networks. TKDD 5(1):6:1–6:30.

Ma, H.; Zhou, D.; Liu, C.; Lyu, M. R.; and King, I. 2011.
Recommender systems with social regularization. In WSDM, 287–296.

Machanavajjhala, A.; Korolova, A.; and Sarma, A. D. 2011. Personalized social recommendations: accurate or private. PVLDB 4(7):440–450.

McSherry, F., and Mironov, I. 2009. Differentially private recommender systems: building privacy into the Netflix Prize contenders. In SIGKDD, 627–636.

Meng, X.; Xu, Z.; Chen, B.; and Zhang, Y. 2016. Privacy-preserving query log sharing based on prior n-word aggregation. In TrustCom, 722–729.

Meng, X.; Wang, S.; Liu, H.; and Zhang, Y. 2018a. Exploiting emotion on reviews for recommender systems. In AAAI.

Meng, X.; Wang, S.; Shu, K.; Li, J.; Chen, B.; Liu, H.; and Zhang, Y. 2018b. Personalized privacy-preserving social recommendation. https://github.com/mxyenguing/PrivSR/blob/master/appendix.pdf.

Nikolaenko, V.; Ioannidis, S.; Weinsberg, U.; Joye, M.; Taft, N.; and Boneh, D. 2013. Privacy-preserving matrix factorization. In CCS, 801–812.

Rajkumar, A., and Agarwal, S. 2012. A differentially private stochastic gradient descent algorithm for multiparty classification. In AISTATS, 933–941.

Rendle, S.; Freudenthaler, C.; Gantner, Z.; and Schmidt-Thieme, L. 2009. BPR: Bayesian personalized ranking from implicit feedback. In UAI, 452–461.

Shokri, R.; Stronati, M.; Song, C.; and Shmatikov, V. 2017. Membership inference attacks against machine learning models. In SP, 3–18.

Shu, K.; Wang, S.; Tang, J.; Wang, Y.; and Liu, H. 2018. CrossFire: Cross media joint friend and item recommendations. In WSDM.

Tang, Q., and Wang, J. 2016. Privacy-preserving friendship-based recommender systems. IEEE Transactions on Dependable & Secure Computing PP(99):1–1.

Tang, J.; Hu, X.; and Liu, H. 2013. Social recommendation: a review. Social Network Analysis and Mining 3(4):1113–1133.

Wang, S.; Tang, J.; Wang, Y.; and Liu, H. 2015. Exploring implicit hierarchical structures for recommender systems. In IJCAI, 1813–1819.

Wang, S.; Wang, Y.; Tang, J.; Shu, K.; Ranganath, S.; and Liu, H. 2017. What your images reveal: Exploiting visual contents for point-of-interest recommendation. In WWW, 391–400.

Wang, Y.; Si, C.; and Wu, X. 2015. Regression model fitting under differential privacy and model inversion attack. In IJCAI, 1003–1009.

Wang, S.; Tang, J.; and Liu, H. 2015. Toward dual roles of users in recommender systems. In CIKM, 1651–1660.

Zhang, J.; Zhang, Z.; Xiao, X.; Yang, Y.; and Winslett, M. 2012. Functional mechanism: regression analysis under differential privacy. PVLDB 5(11):1364–1375.