# fedrec_lossless_federated_recommendation_with_explicit_feedback__c3c86ffc.pdf

Fed Rec++: Lossless Federated Recommendation with Explicit Feedback

Feng Liang, Weike Pan*, Zhong Ming* National Engineering Laboratory for Big Data System Computing Technology College of Computer Science and Software Engineering Shenzhen University, Shenzhen 518060, China liangfeng2018@email.szu.edu.cn, panweike@szu.edu.cn, mingz@szu.edu.cn

With the marriage of federated machine learning and recommender systems for privacy-aware preference modeling and personalization, there comes a new research branch called federated recommender systems aiming to build a recommendation model in a distributed way, i.e., each user is represented as a distributed client where his/her original rating data are not shared with the server or the other clients. Notice that, besides the sensitive information of a speciﬁc rating score assigned to a certain item by a user, the information of a user s rated set of items shall also be well protected. Some very recent works propose to randomly sample some unrated items for each user and then assign some virtual ratings, so that the server can not identify the scores and the set of rated items easily during the server-client interactions. However, the virtual ratings assigned to the randomly sampled items will inevitably introduce some noise to the model training process, which will then cause loss in recommendation performance. In this paper, we propose a novel lossless federated recommendation method (Fed Rec++) by allocating some denoising clients (i.e., users) to eliminate the noise in a privacy-aware manner. We further analyse our Fed Rec++ in terms of security and losslessness, and discuss its generality in the context of existing works. Extensive empirical studies clearly show the effectiveness of our Fed Rec++ in providing accurate and privacy-aware recommendation without much additional communication cost.

Introduction In the era of information overload, it is often difﬁcult for people to ﬁnd what they like among a huge number of items. Recommender systems solve the problem by exploiting users historical data to recommend some items that the users may like. Traditional collaborative ﬁltering (CF) algorithms (Mnih and Salakhutdinov 2007; Koren 2008; Rendle 2012) need to collect all the users rating data in one central place, e.g., the server, for model training. With the increasing awareness of privacy and the publishing of some related privacy protection laws such as GDPR (EU 2016), collecting users data may not be feasible in many cases. Recently, federated machine learning (Mc Mahan et al. 2017; Yang et al. 2019; Kairouz et al. 2019) has been proposed for protecting users privacy in machine learning al-

Copyright c 2021, Association for the Advancement of Artiﬁcial Intelligence (www.aaai.org). All rights reserved.

gorithms and systems (including recommender systems). Some recent works (Ammad-ud-din et al. 2019; Chai et al. 2020) revisit some recommendation algorithms in the new federated learning paradigm. Speciﬁcally, the original rating data are always kept locally in each client (i.e., user) in the whole process of model training, and each client only uploads the corresponding model parameters to the server in order to update the model jointly. For example, federated collaborative ﬁltering (FCF) (Ammad-ud-din et al. 2019) focuses on item ranking with implicit feedback, and treats all the unrated items as negative ones, which may cause bias in model training and also high communication cost during the server-client interactions. Fed MF (Chai et al. 2020) uses the homomorphic encryption technology to encrypt the items gradients before uploading them to the server in order to protect the users privacy. A federated meta learning work (Jalalirad et al. 2019) combines a meta learning method called REPTILE (Nichol, Achiam, and Schulman 2018) with federated learning for rating prediction with explicit feedback, which is able to ﬁne tune the model parameters for each user. Federated multi-view matrix factorization (FED-MVMF) (Flanagan et al. 2020) combines multiview matrix factorization with federated learning in order to protect the users original rating data when modeling multiparty data. However, it will leak the users rating behaviors (i.e., the set of items rated by a user) similar to the aforementioned methods. We can see that most existing federated recommendation methods will either bias the model training or do not protect the users rating behaviors well. Recently, SDCF (Jiang, Li, and Lin 2019) proposes a two-stage randomized response algorithm to perturb the rated and unrated items of each user, and then calculates and uploads their gradients to the server. In this way, the server can not identify the set of items rated by the users easily, which thus protects the users rating behaviors. Fed Rec (Lin et al. 2020) also uploads the gradients of the rated items and the randomly sampled unrated items of the users. Notice that it uses a hybrid ﬁlling strategy to assign some virtual ratings to the unrated items. However, both of them will introduce some noise to the gradients, which will cause loss in the recommendation performance. In order to eliminate the gradient noise, we propose to allocate some denoising users (i.e., clients) to eliminate the noise caused by the randomly sampled items and their as-

The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

signed virtual ratings. Our denoising strategy is secure in terms of protecting users privacy even the server colludes with the denoising clients. We then incorporate the denoising component into the very recent method Fed Rec (Lin et al. 2020) and obtain our solution called Fed Rec++. As far as we know, we are the ﬁrst to study gradient noise elimination in federated recommendation. We summarize our main contributions as follows. (i) We propose a novel lossless federated recommendation method (Fed Rec++) for modeling users explicit feedback, which is able to completely eliminate the gradient noise brought by the virtual ratings assigned to the randomly sampled unrated items. (ii) We discuss the relationships of our Fed Rec++ with existing works, e.g., our Fed Rec++ reduces to Fed Rec (Lin et al. 2020) when the component of noise elimination is removed, and also analyse its security in protecting users privacy. (iii) We conduct extensive empirical studies on three public datasets, and ﬁnd our Fed Rec++ is able to protect users privacy without sacriﬁcing the recommendation performance.

Related Work Probabilistic Matrix Factorization In probabilistic matrix factorization (PMF) (Mnih and Salakhutdinov 2007), the rating of a user u to an item i is calculated via the inner product of their latent feature vectors, i.e., ˆrui = Uu V T i , where Uu , Vi R1 d.

Federated Recommendation with Explicit Feedback In federated recommendation with explicit feedback (Fed Rec) (Lin et al. 2020), in order to protect a user s rating behaviors, i.e., the set of items Iu rated by a user u, the authors design an effective hybrid ﬁlling (HF) strategy to randomly sample some unrated items. Firstly, it randomly samples |I u| unrated items of user u from I\Iu, where |I u| = ρ|Iu| with ρ {1, 2, 3}. Secondly, it uses the average rating or predicted rating of user u to a sampled item i as a virtual rating r ui. Thirdly, it calculates the gradients of user u to the rated items and the unrated items, i.e., Vi , i Iu I u, and then uploads these gradients to the server. In this way, Fed Rec with the HF strategy achieves the purpose of protecting the user s original rating records and the rating behaviors in the preference modeling process. In particular, the virtual rating r ui is as follows,

Pm k=1 yukruk Pm k=1 yuk , t<Tpredict

U u V T i , t Tpredict

where t denotes the number of iterations that have been executed in model training, and Tpredict is a parameter that determines when to start using the predicted rating as a virtual rating to a sampled unrated item i. In Fed Rec with PMF (Mnih and Salakhutdinov 2007) as the backbone model, the gradient of each item i is as follows,

u Ui U i V HF EF (u, i)

|Ui U i| . (2)

Notice that Ui U i denotes the users that have rated or virtually rated item i, and

V HF EF (u, i) =

( (Uu V T i rui)Uu + λVi , yui = 1

(Uu V T i r ui)Uu + λVi , yui = 0 (3)

where rui and r ui are the true observed rating and the virtual rating of user u to item i, respectively. Although Fed Rec achieves privacy protection in rating prediction, the randomly sampled items in the hybrid ﬁlling strategy introduces some noise to the recommendation model, which inevitably affects the performance. This motivates us to design a lossless version of Fed Rec, which is critical to be deployed in a real-world application.

Secure Distributed Collaborative Filtering Secure distributed collaborative ﬁltering (SDCF) (Jiang, Li, and Lin 2019) is a distributed recommendation framework for protecting users original rating data, recommendation model and rating behaviors (i.e., each user s rated items). SDCF divides the recommendation model into two parts, including some public elements and some personal elements. The public elements refer to the items latent factors which can be shared with the server and the other clients, and the personal elements are the users rating data and users latent factors that are kept locally in their own clients. Hence, SDCF can protect the privacy in users rating data and the recommendation model in a similar way to that of Fed Rec (Lin et al. 2020). Moreover, SDCF uses stochastic gradient langevin dynamics (SGLD) (Welling and Teh 2011) as a gradient descent method so as to defend differential attacks and prevent users latent factors being leaked to the server. However, there may still be the leakage of the users rating behaviors. For this reason, SDCF uses a two-stage randomized response algorithm to perturb the rated items and unrated items of each user, and then uploads the corresponding items gradients to the server. Finally, SDCF can thus protect the users rating behaviors similar to that of Fed Rec by uploading the virtually rated items gradients. As another issue, users have no ratings for the unrated items, hence the users can not calculate the values of the loss in the unrated items gradients via rui Uu V T i (i.e., eui), i I\Iu. To solve this problem, SDCF samples some virtual eui, i I\Iu from the distribution of eui, i Iu of the user u. However, this strategy will also introduce some noise to the gradient at each iteration of model training. From the above discussions of Fed Rec and SDCF, we can see that noise elimination is a common challenge for performance improvement in privacy-ware recommendation tasks.

Decentralized Distributed Matrix Factorization Decentralized matrix factorization (DMF) (Chen et al. 2018) is a distributed POI recommendation framework for protecting users rating data and solving the problem of computation and storage in the server. DMF keeps users rating data in the corresponding local clients and utilizes these rating data to calculate the global items latent factors and the local items latent factors. And then each client synchronously sends the global items gradients to his/her neighboring

clients who are chosen by a random walk method. Although this framework saves the resource of the server and avoids the risk of the rating data of all the users being leaked from the server to malicious attackers, there still exists the leakage of users rating behaviors. Speciﬁcally, each user will receive the items gradients from their own neighbors at each iteration of model training, and these items gradients contain the items IDs. Hence, this user will know the rated items of its neighbors, i.e., rating behaviors of its neighbors. PDMFRec (Duriakova et al. 2019) is also a decentralized distributed POI recommendation framework with a novel, user-centric and privacy-enhanced matrix factorization method. This framework builds a user s adjacency graph in trustworthy clients via co-rated items between users, which solves the privacy problem of user geographic location leaked by DMF (Chen et al. 2018) when constructing the user adjacency graph. Furthermore, PDMFRec also proposes two privacy-protection settings that allow users to control the privacy-protection level. In the ﬁrst setting, it allows each user to only choose parts of the rated items to take part in the construction of the user s adjacency graph. In the second setting, the rated items hidden by each user in the ﬁrst setting also do not take part in model training. Compared with DMF, PDMFRec has comparable performance, better privacy-protection level, i.e., protecting the users rating behaviors and ensuring the anonymity of the sending client. The anonymity of the sending client in PDMFRec inspires us to design an effective noise elimination strategy to prevent the denoising clients from colluding with the server, which will be described in detail later.

Our Solution: Fed Rec++ In this section, we describe our proposed lossless federated recommendation method Fed Rec++ for modeling users explicit feedback in detail. We will ﬁrst deﬁne our studied problem and then describe our solution on noise elimination in both the server and the clients. We then discuss the generality of our Fed Rec++ and its security in privacy protection.

Problem Deﬁnition In federated recommendation with explicit feedback, we have a set of rating records for each user u, i.e., Ru = {(u, i, rui); i Iu}, where Iu is a set of items rated by user u and rui is the rating of user u assigned to item i. Our goal is to estimate the preference of user u to the unrated items without exposing the rating scores and the rating behaviors of each user u (i.e., Ru and Iu), which is the main difference between federated recommendation and traditional recommendation. We put some commonly used notations in Table 1.

Eliminate the Gradient Noise in the Server In the beginning, the server initializes the model parameters Vi , i I, and sends them to each client, i.e., step 1 in Figure 1. When each ordinary client u U\ U completes the computation of the item gradients, i.e., V HF EF (u, i), i Iu I u, by using the local rating data, the server will receive the item gradients from each ordinary client u U\ U, i.e.,

n number of users (i.e., clients) m number of items R = {1, . . . , 5} rating range rui R rating of user u to item i R = {(u, i, rui)} rating records in training data Ru rating records w.r.t. user u in R Rte = {(u, i, rui)} rating records in test data I the whole set of items Iu items rated by user u I u, |I u| = ρ|Iu| sampled unrated items w.r.t. user u from I\Iu U the whole set of users Ui users who rated item i U i users who virtually rated item i U u i users who virtually rated item i w.r.t. denoiser u U = { u} denoising clients (denoisers) yui {0, 1} indicator variable d R number of latent dimensions Uu , U u R1 d user-speciﬁc latent feature vector Vi R1 d item-speciﬁc latent feature vector ˆrui predicted rating of user u to item i γ learning rate ρ sampling parameter c number of clients in training η number of denoising clients λ tradeoff parameter T iteration number

Table 1: Some notations and explanations used in the paper.

Figure 1: Illustration of the interactions between the server and each client in our lossless federated recommendation (Fed Rec++). Notice that Fed Rec (Lin et al. 2020) is a special case of our Fed Rec++ without the denoising clients.

step 2 in Figure 1. Then the server can calculate the summation of the gradients of each item i I as follows,

u U\ U V HF EF (u, i), (4)

where U\ U denotes the ordinary clients (excluding the denoising clients). Since V HF EF (u, i) with i Iu I u contain the gradients of client u to unrated items I u, the server can

not identify each client u s rated items Iu easily. Hence, the rating behaviors of each client u are protected. Notice that Vi in Eq.(4) is not immediately divided by |Ui U i| as that in Fed Rec (Lin et al. 2020) via Eq.(2), but is divided by the number of users that have rated item i, i.e., |Ui|, after noise elimination for more accurate preference modeling. We will show the details in Eq.(5) and Eq.(6). The gradient noise (i.e., V HF EF (u, i), i I u) from each ordinary client u U\ U will inevitably bias the modeling of the user s preferences, which will be more serious when a larger value of ρ is used (Lin et al. 2020). In order to eliminate the gradient noise in Vi in Eq.(4), we design a speciﬁc algorithm for noise elimination in the server. Firstly, the server randomly selects some denoising clients U U as denoisers, which will be used to collect the gradient noise, i.e., V HF EF (u, i), i I u, u U\ U, from the ordinary clients. Secondly, each denoising client u sends the summation of the noisy gradients of the ordinary clients (i.e., V S( u, i)) to the server in order to eliminate the noise in Vi in Eq.(4), which corresponds to step 3 in Figure 1. Notice that the server will also receive the number of users who virtually rated item i, i.e., |U u i |, from each denoising client u U in step 3 in Figure 1. Thirdly, once the server has received V S( u, i) from all the denoisers U, the server can eliminate the gradient noise in Vi in Eq.(4) as follows,

u U V S( u, i), (5)

where V S( u, i) also contains the gradients for item i of the denoiser u itself, i.e., V HF EF ( u, i), i I u. We will describe the details in Eq.(9). After the server has eliminated the gradient noise in Vi , the server can then calculate the number of users who rated item i, i.e., |Ui| = |Ui U i| P

u U |U u i |, and update Vi as follows,

where γ denotes the learning rate. We depict the whole process of eliminating the gradient noise in the server in Algorithm 1.

Eliminate the Gradient Noise in the Client After each client u U has received the item-speciﬁc latent feature vectors Vi , i I from the server, i.e., step 1 in Figure 1, each client u U can use its own local rating data to calculate the gradient Uu ,

P i Iu( eui Vi + λUu )

where eui = rui ˆrui. Moreover, we can calculate the gradients V HF EF (u, i), i Iu I u via Eq.(3), which is the same as that in Fed Rec (Lin et al. 2020). Notice that Uu is used to update Uu locally in each client, and V HF EF (u, i) are sent to the server to update Vi with the denoising information, i.e., step 2 in Figure 1. Notice that if a client u is a denoising client, it only needs to calculate the gradients V HF EF (u, i) with i Iu rather than

Algorithm 1 The algorithm of Fed Rec++ in the server.

1: Randomly select some clients as denoisers, i.e., U. 2: Initialize the model parameters Vi , i = 1, 2, . . . , m and send them to each client u U. 3: for t = 1, 2, . . . , T do 4: for each client u U in parallel do 5: Client Training(Vi , i = 1, 2, . . . , m; TRAINING; u; u; U; t). 6: end for 7: Synchronize(). /*Wait for the clients to complete calculation.*/ 8: for i = 1, 2, . . . , m do 9: Calculate the gradient Vi via Eq.(4) and also |Ui U i|. 10: end for 11: for each client u U in parallel do 12: Client Training(NULL; COLLECTING; 0; u; NULL; 0). 13: end for 14: Synchronize(). /*Wait for all the clients to complete.*/ 15: for i = 1, 2, . . . , m do 16: Eliminate the gradient noise in Vi via Eq.(5). 17: Calculate the number of users who rated item i, i.e., |Ui| = |Ui U i| P

u U |U u i |. 18: Update Vi via Eq.(6). 19: end for 20: Decrease the learning rate γ 0.9γ. 21: end for

the gradients V HF EF (u, i) with i Iu I u, because the gradients V HF EF (u, i) with i Iu would be mixed with the gradient noise of the ordinary clients before being sent to the server. Hence, the privacy of the denoising client u does not leak towards the server, i.e., the rating behaviors of client u are protected. When each ordinary client u U\ U sends V HF EF (u, i) with i Iu I u to the server, they need to send the gradient noise (i.e., V N(u, i), i I u) to a denoising client u U at the same time, i.e., from the ordinary clients to the denoising clients of step 2 in Figure 1. We have the gradients V N(u, i) as follows,

V N(u, i) = V HF EF (u, i), i I u, (8)

where I u denotes the sampled unrated items w.r.t. user u. Notice that transferring information between clients is a prominent specialty of the decentralized distributed framework. In our Fed Rec++, we adopt this specialty by sending gradient noise to the denoising clients, which can help eliminate the gradient noise. For each denoising client u U, it does not need to send V HF EF ( u, i), i I u to the server immediately, because they can send V S( u, i) containing V HF EF ( u, i), i I u to the server after collecting the gradient noise from the ordinary clients. We have V S( u, i) as follows,

V S( u, i) = X

u u V N(u, i) V HF EF ( u, i), i I u, (9)

Algorithm 2 Client Training(Vi , i = 1, 2, . . . , m; OPERATION; u; u; U; t), i.e., the algorithm of Fed Rec++ in the client.

1: if OPERATION == TRAINING then 2: Sample items I u from I\Iu with |I u| = ρ|Iu|. 3: Assign Uu to U u , and update U u via U u U u γ U u in Tlcoal iterations. 4: Assign a virtual rating for item i, i I u via Eq.(1). 5: Calculate the gradient Uu via Eq.(7). 6: Update Uu via Uu Uu γ Uu . 7: for i Iu I u do 8: Calculate V HF EF (u, i) via Eq.(3). 9: end for 10: Upload V HF EF (u, i) with i Iu I u to the server. 11: if u is not a denoising client then 12: Calculate the gradient noise V N(u, i) with i I u via Eq.(8), and send it to a denoiser u U. 13: else 14: Calculate V S(u, i) via Eq.(9). 15: end if 16: else if OPERATION == COLLECTING then 17: Receive the gradient noise V N(u, i) with i I u from user u U\ U. 18: Calculate the summation of the gradient noise P u u V N(u, i) and the number of users who virtually rated item i w.r.t. the denoiser u, i.e., |U u i |. 19: Send V S( u, i) and |U u i | to the server. 20: end if

where the ﬁrst term denotes the summation of the gradient noise sent to the denoising client u. Notice that the server can then use Eq.(5) to completely remove the noise, and the resulting Vi on the left side of Eq.(5) contains the pure gradient of the rated item by the corresponding ordinary and/or denoising clients (i.e., users). Notice that each ordinary client u U\ U only sends its own gradient noise to one denoising client. And each denoising client u does not need to send its own gradient noise (i.e., V N( u, i)) to other denoisers, because each denoiser u does not need to send the gradient noise V HF EF ( u, i), i I u to the server and thus the server does not need to eliminate the gradient noise in V HF EF ( u, i), u U. When each denoiser u calculates the summation of the gradient noise from the ordinary clients, they can also calculate the number of users who virtually rated item i w.r.t. the denoiser u at the same time, i.e., |U u i |. After each denoiser u U obtains V S( u, i), they send V S( u, i) and |U u i | to the server, i.e., step 3 in Figure 1. Notice that V S( u, i) in Eq.(9) will not leak the privacy of the denoising clients towards the server, because V S( u, i) contains the gradient noise information of the ordinary client u U\ U. We describe the whole process of eliminating the gradient noise in the client in Algorithm 2.

Discussions

The designed noise elimination strategy in the server and in the client are actually quite generic and can be applied to

other recommendation methods. For example, as introduced before, SDCF (Jiang, Li, and Lin 2019) uses a two-stage random response algorithm to perturb the rated items Iu and the unrated items I u of each user, and then calculates the gradients V SDCF i with i I u of each user to the unrated items. The gradients V SDCF i with i I u can be rewritten as follows,

V SDCF i = γ(eui Uu + Vi Λ) N(0, γI), i I u, (10)

where γ is the learning rate, eui with i I u are sampled from the distribution of eui with i Iu, Λ is a diagonal matrix related to the Gamma distribution of the regularization term of Vi , and I is an identity matrix with appropriate dimension. We can see that the gradients V SDCF i with i I u are similar to the gradient noise V N(u, i) with i I u in Eq.(8) in our Fed Rec++. Hence, we can eliminate the noise introduced by the two-stage random response algorithm in SDCF via the denoising strategy in our Fed Rec++.

Privacy Analysis In this subsection, we analyze how our Fed Rec++ protects the user privacy in modeling explicit feedback. Firstly, each user s original rating records are always kept locally in the client in the whole process, which ensures the security of the original data. Secondly, our Fed Rec++ adopts a hybrid ﬁlling strategy (Lin et al. 2020) to assign a virtual rating to each randomly sampled unrated item, which protects the users rating behaviors. Thirdly, each ordinary client u transfers the gradients of the sampled unrated items (i.e., the gradient noise V N(u, i), i I u) to a denoising client, which again does not reveal the user s rating behaviors (i.e., Iu). Fourthly, the denoising client cannot identify the source (i.e., the sender) of the gradient, because the gradient does not contain the sensitive information of user ID, which guarantees the anonymity of each ordinary client (Duriakova et al. 2019). Finally, even if the server colludes with the denoising clients, the server cannot obtain the rating behaviors of a speciﬁc user according to the gradient noise of the ordinary clients as collected by the denoising clients because of the anonymity of the clients (i.e., the denoising clients do not know which ordinary client the gradient noise belongs to). Hence, the server cannot obtain a user u s rating behaviors Iu via comparing the item ID of V HF EF (u, i), i Iu I u uploaded to the server by user u and V N(u, i), i I u sent to a denoising client by user u.

Experiments We conduct experiments on three public datasets to study the effectiveness and efﬁciency of our Fed Rec++. We focus on the following three research questions: (i) RQ1: Is the denoising strategy in our Fed Rec++ lossless? (ii) RQ2: How is the impact of the number of clients participating in model training c? (iii) RQ3: What is the communication cost with different numbers of denoising clients η? Following Fed Rec (Lin et al. 2020), we use multi-thread programming in Java to simulate the interactions among the clients and the server, where the server and each client is modeled as one thread.

Datasets and Evaluation Metrics Besides using the two datasets in Fed Rec (Lin et al. 2020), i.e., Movie Lens 100K (ML100K) and Movie Lens 1M (ML1M), we also include a subset from Netﬂix (NF5K5K). Speciﬁcally, ML100K contains 100, 000 ratings of 1, 682 movies from 943 users; ML1M contains 1, 000, 209 ratings of 3, 952 movies from 6, 040 users; and NF5K5K contains 7, 944, 473 ratings of 5, 000 most popular movies from 5, 000 most active users. We process each dataset as follows: (i) we randomly divide the dataset into ﬁve parts with the same size; (ii) we take four parts as the training data, and the remaining one part as the test data; and (iii) we repeat the second step four times to get ﬁve different copies of training data and test data. We conduct experiments and report the average performance on these ﬁve copies of data. We use two commonly used evaluation metrics, i.e., RMSE and MAE, for performance evaluation. Notice that the losslessness of our denoising strategy is independent of the datasets and the evaluation metrics.

Baselines and Parameter Settings In order to study the effectiveness of our Fed Rec++, in particular of the merit of losslessness, we compare our Fed Rec++ with the most closely related work, i.e., Fed Rec (Lin et al. 2020). We have analyzed the generality of the denoising strategy, i.e., it is also applicable to SDCF (Jiang, Li, and Lin 2019), and will study its applicability to other works as the future work. In both Fed Rec and our Fed Rec++, we use PMF (Mnih and Salakhutdinov 2007) as the backbone model and the hybrid ﬁlling strategy for virtual ratings (Lin et al. 2020). And we randomly assign ordinary clients to denoising clients for eliminating the gradient noise. For parameter conﬁgurations, we mainly follow Fed Rec (Lin et al. 2020). In particular, we ﬁx the number of latent features d = 20 and the number of iterations T = 100. We search the best value of the learning rate γ {0.7, 0.8, . . . , 1.4}, and have γ = 0.8, γ = 0.8 and γ = 1.0 on ML100K, ML1M and NF5K5K, respectively. We search the best value of the tradeoff parameter on the regularization terms α {0.1, 0.01, 0.001}, and have α = 0.001 on all the three datasets. We use different values of the sampling parameter ρ {0, 1, 2, 3}. We choose the best value of the iteration number Tpredict for starting ﬁlling the sampled unrated items via Eq.(1) and the iteration number Tlocal for locally training U u both from {5, 10, 15}, and have (Tpredict, Tlocal) = (10, 10), (Tpredict, Tlocal) = (5, 15) and (Tpredict, Tlocal) = (5, 15) on ML100K with ρ = 1, ρ = 2 and ρ = 3, respectively; and have (Tpredict, Tlocal) = (10, 15), (Tpredict, Tlocal) = (10, 15) and (Tpredict, Tlocal) = (10, 15) on ML1M with ρ = 1, ρ = 2 and ρ = 3, respectively; and have (Tpredict, Tlocal) = (5, 10), (Tpredict, Tlocal) = (5, 10) and (Tpredict, Tlocal) = (5, 15) on NF5K5K with ρ = 1, ρ = 2 and ρ = 3, respectively. All the hyper parameters are searched according to the MAE performance on the ﬁrst copy of each dataset.

Results RQ1: Losslessness We report the performance of Fed Rec and our Fed Rec++ in Table 2, from which we can have

Data Algorithm MAE RMSE ρ

Fed Rec 0.7418 0.0048 0.9424 0.0064 0 Fed Rec 0.7440 0.0043 0.9432 0.0056 1 Fed Rec++ 0.7417 0.0049 0.9422 0.0063 Fed Rec 0.7445 0.0045 0.9431 0.0057 2 Fed Rec++ 0.7422 0.0047 0.9430 0.0061 Fed Rec 0.7447 0.0043 0.9431 0.0054 3 Fed Rec++ 0.7416 0.0049 0.9421 0.0064

Fed Rec 0.7193 0.0012 0.9106 0.0015 0 Fed Rec 0.7217 0.0011 0.9129 0.0012 1 Fed Rec++ 0.7198 0.0011 0.9113 0.0013 Fed Rec 0.7239 0.0011 0.9152 0.0011 2 Fed Rec++ 0.7195 0.0011 0.9109 0.0013 Fed Rec 0.7263 0.0010 0.9178 0.0010 3 Fed Rec++ 0.7196 0.0013 0.9109 0.0015

Fed Rec 0.7139 0.0007 0.9090 0.0008 0 Fed Rec 0.7148 0.0004 0.9102 0.0005 1 Fed Rec++ 0.7137 0.0008 0.9088 0.0012 Fed Rec 0.7152 0.0005 0.9104 0.0005 2 Fed Rec++ 0.7137 0.0002 0.9089 0.0002 Fed Rec 0.7160 0.0007 0.9110 0.0005 3 Fed Rec++ 0.7138 0.0005 0.9090 0.0004

Table 2: Recommendation performance of Fed Rec and our Fed Rec++ with different values of ρ {1, 2, 3}. Notice that we ﬁx c = n and η = 1 in our Fed Rec++, and copy the results of Fed Rec on ML100K and ML1M from (Lin et al. 2020) for reference and direct comparison.

the following observations: (i) The performance of our Fed Rec++ with ρ {1, 2, 3} is almost the same with that of Fed Rec with ρ = 0 (i.e., without introducing gradient noise), which means that our Fed Rec++ is able to completely eliminate the noise introduced when assigning virtual ratings to the sampled unrated items (i.e., the denoising strategy is lossless). The results are very promising and clearly show that our Fed Rec++ ensures privacy in model training without sacriﬁcing the recommendation accuracy. (ii) The performance of Fed Rec decreases with a larger value of ρ for higher security, which is expected because the volume of noise is proportional to the value of ρ. On the contrary, the performance of our Fed Rec++ does not decrease accordingly, which means that it can well eliminate the noise regardless of a small or large value of ρ.

ML100K ML1M

Figure 2: Recommendation performance of Fed Rec and our Fed Rec++ with different values of c {0.2n, 0.6n, n} and ρ {1, 2, 3}. Notice that we ﬁx η = 1.

Data Denoising Ordinary Client η Client Client

0 170 170 0 81,550 254 258 1 571 256 350 n/4 251 256 294 n/2

0 265 265 0 804,058 397 399 1 903 397 551 n/4 393 397 460 n/2

0 2535 2535 0 6,324,227 3799 3801 1 7309 3800 4608 n/4 3537 3799 4172 n/2

Table 3: Average communication cost per iteration of each denoising client, each ordinary client and each (denoising or ordinary) client in our Fed Rec++ with different numbers of denoising clients η {0, 1, n/4, n/2}. Notice that we ﬁx c = n and ρ = 1. The unit of each cost is 80 bytes occupied by one latent vector.

RQ2: Impact of the Number of Clients in Training In this subsection, we study the inﬂuence of the number of clients participating in model training. We ﬁx η = 1, and report the results of Fed Rec and our Fed Rec++ with c {0.2n, 0.6n, n} and ρ {0, 1, 2, 3} in Figure 2. Notice that the results on NF5K5K are similar to that on ML100K. We can obtain the following observations: (i) When c = 0.2n, the performance of Fed Rec decreases fast as ρ increases, and the overall performance of our Fed Rec++ is signiﬁcantly better than that of Fed Rec on each corresponding value of ρ, which again shows the effectiveness of the noise elimination strategy in our Fed Rec++. Because there are fewer clients participating in model training when c = 0.2n, the model training is insufﬁcient and the error is higher as expected. Hence, when c = 0.2n, the performance of both Fed Rec and our Fed Rec++ with ρ = 1, 2, 3 are worse than that in Table 2. (ii) When c {0.6n, n}, the performance of Fed Rec on ML100K does not decrease much with the increased values of ρ, while its performance on ML1M decreases. This means that we may choose to use the noise elimination strategy in our Fed Rec++ appropriately for different datasets. Importantly, the performance of our Fed Rec++ with ρ {1, 2, 3} is almost the same with that in Table 2, which means that we may only use 60% clients in model training to achieve comparable performance as that of using all the clients.

RQ3: Communication Cost In this subsection, we study the impact of the number of denoising clients η {0, 1, n/4, n/2} in our Fed Rec++ on the communication cost. Notice that when the number of denoising clients is larger than the number of ordinary clients (i.e., η > n/2), the item gradients of the redundant denoising clients are not uploaded to the server together with the item gradient noise

of the ordinary clients, which will then cause the rating behaviors of these denoising clients to be exposed to the server. We ﬁx c = n and ρ = 1, and report the cost in Table 3. Notice that when η = 0, we do not use the noise elimination strategy and our Fed Rec++ reduces to Fed Rec. And we do not count the communication cost of the clients receiving the model parameter Vi , i I sent by the server, because this part of communication cost is the same for Fed Rec and our Fed Rec++. From Table 3, we can see: (i) Using more denoising clients (i.e., a larger value of η) can more efﬁciently process the gradient noise in parallel, which thus reduces the cost of each denoising client. (ii) The cost of each ordinary client is almost the same when η {1, n/4, n/2}, which is expected since it is independent of the number of denoising clients. (iii) When η {n/4, n/2}, the cost of each client is higher than that when η {0, 1}, because there are more denoising clients sending gradients to the server. Quantitatively, the cost of each client is at most 350 170 = 180, 551 265 = 286, and 4608 2535 = 2073 more on ML100K, ML1M and NF5K5K, respectively. Because there are 100 iterations in model training, the averaged additional communication costs for each client are 180 80 100 bytes = 1.37 MB, 286 80 100 bytes = 2.18 MB, and 2073 80 100 bytes = 15.8 MB, on ML100K, ML1M and NF5K5K, respectively. We can see that the noise elimination strategy in our Fed Rec++ only consumes a small amount of communication cost of clients, which shows another merit of our Fed Rec++.

Conclusions and Future Work

In this paper, we study an emerging problem, i.e., privacyaware recommendation with explicit feedback. In particular, we propose a novel and lossless federated recommendation method called Fed Rec++, for which we use some denoising clients to completely eliminate the noise caused by the assigned virtual ratings to some randomly sampled items. We also conduct privacy analysis to show that our Fed Rec++ is able to protect the user privacy well. Moreover, our Fed Rec++ is a generic solution, which embodies Fed Rec (Lin et al. 2020) as a special case, and the denoising strategy can also be used in the other privacy-aware recommendation method such as SDCF (Jiang, Li, and Lin 2019). Experimental results on three public datasets show the effectiveness (i.e., losslessness) and efﬁciency (i.e., low communication cost) of our Fed Rec++.

For future works, we are interested in generalizing our Fed Rec++ to some models with ranking losses (e.g., pairwise loss (Rendle et al. 2009) or listwise loss (Wu, Hsieh, and Sharpnack 2018)), neural network models, and some vertical federated machine learning settings (Yang et al. 2019; Zhang et al. 2020). We are also interested in federating some more advanced recommendation models such as those based on deep learning techniques (He et al. 2017; Liang et al. 2018; Sun et al. 2019). Moreover, we will further explore the applicability of the denoising strategy to other works.

Acknowledgments We thank the support of National Natural Science Foundation of China Nos. 61872249 and 61836005. Weike Pan and Zhong Ming are co-corresponding authors for this work.

References Ammad-ud-din, M.; Ivannikova, E.; Khan, S. A.; Oyomno, W.; Fu, Q.; Tan, K. E.; and Flanagan, A. 2019. Federated Collaborative Filtering for Privacy-Preserving Personalized Recommendation System. Co RR abs/1901.09888. URL https://arxiv.org/abs/2004.04256.

Chai, D.; Wang, L.; Chen, K.; and Yang, Q. 2020. Secure Federated Matrix Factorization. IEEE Intelligent Systems doi: 10.1109/MIS.2020.3014880.

Chen, C.; Liu, Z.; Zhao, P.; Zhou, J.; and Li, X. 2018. Privacy Preserving Point-of-Interest Recommendation Using Decentralized Matrix Factorization. In Proceedings of the 32nd AAAI Conference on Artiﬁcial Intelligence, 257 264.

Duriakova, E.; Tragos, E. Z.; Smyth, B.; Hurley, N.; Pe na, F. J.; Symeonidis, P.; Geraci, J.; and Lawlor, A. 2019. PDMFRec: A Decentralised Matrix Factorisation with Tunable User-Centric Privacy. In Proceedings of the 13th ACM Conference on Recommender Systems, 457 461.

EU. 2016. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (Text with EEA relevance). https://eur-lex.europa.eu/eli/reg/2016/ 679/oj. Accessed April 27, 2019.

Flanagan, A.; Oyomno, W.; Grigorievskiy, A.; Tan, K. E.; Khan, S. A.; and Ammad-ud-din, M. 2020. Federated Multiview Matrix Factorization for Personalized Recommendations. Co RR abs/2004.04256. URL https://arxiv.org/abs/ 2004.04256.

He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; and Chua, T. 2017. Neural Collaborative Filtering. In Proceedings of the 26th International Conference on World Wide Web, 173 182.

Jalalirad, A.; Scavuzzo, M.; Capota, C.; and Sprague, M. R. 2019. A Simple and Efﬁcient Federated Recommender System. In Proceedings of the 6th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, 53 58.

Jiang, J.; Li, C.; and Lin, S. 2019. Towards A More Reliable Privacy-Preserving Recommender System. Information Sciences 482: 248 265.

Kairouz, P.; Mc Mahan, H. B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A. N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; D Oliveira, R. G. L.; Rouayheb, S. E.; Evans, D.; Gardner, J.; Garrett, Z.; Gasc on, A.; Ghazi, B.; Gibbons, P. B.; Gruteser, M.; Harchaoui, Z.; He, C.; He, L.; Huo, Z.; Hutchinson, B.; Hsu, J.; Jaggi, M.; Javidi, T.; Joshi, G.; Khodak, M.; Konecn y, J.; Korolova, A.; Koushanfar, F.; Koyejo, S.; Lepoint, T.; Liu, Y.; Mittal, P.; Mohri, M.; Nock, R.; Ozg ur, A.; Pagh, R.; Raykova, M.; Qi, H.; Ramage, D.; Raskar, R.; Song, D.; Song, W.; Stich, S. U.; Sun, Z.; Suresh,

A. T.; Tram er, F.; Vepakomma, P.; Wang, J.; Xiong, L.; Xu, Z.; Yang, Q.; Yu, F. X.; Yu, H.; and Zhao, S. 2019. Advances and Open Problems in Federated Learning. Co RR abs/1912.04977. URL http://arxiv.org/abs/1912.04977.

Koren, Y. 2008. Factorization Meets the Neighborhood: A Multifaceted Collaborative Filtering Model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 426 434.

Liang, D.; Krishnan, R. G.; Hoffman, M. D.; and Jebara, T. 2018. Variational Autoencoders for Collaborative Filtering. In Proceedings of the 2018 World Wide Web Conference on World Wide Web, 689 698.

Lin, G.; Liang, F.; Pan, W.; and Ming, Z. 2020. Fed Rec: Federated Recommendation with Explicit Feedback. IEEE Intelligent Systems doi:10.1109/MIS.2020.3017205.

Mc Mahan, B.; Moore, E.; Ramage, D.; Hampson, S.; and y Arcas, B. A. 2017. Communication-Efﬁcient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artiﬁcial Intelligence and Statistics, 1273 1282.

Mnih, A.; and Salakhutdinov, R. R. 2007. Probabilistic Matrix Factorization. In Proceedings of the 21st International Conference on Neural Information Processing Systems, 1257 1264.

Nichol, A.; Achiam, J.; and Schulman, J. 2018. On First Order Meta-Learning Algorithms. Co RR abs/1803.02999. URL http://arxiv.org/abs/1803.02999.

Rendle, S. 2012. Factorization Machines with lib FM. ACM Transaction on Intelligent System and Technology 3(3): 57:1 57:22.

Rendle, S.; Freudenthaler, C.; Gantner, Z.; and Schmidt Thieme, L. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the 25th Conference on Uncertainty in Artiﬁcial Intelligence, 452 461.

Sun, F.; Liu, J.; Wu, J.; Pei, C.; Lin, X.; Ou, W.; and Jiang, P. 2019. BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 1441 1450.

Welling, M.; and Teh, Y. W. 2011. Bayesian Learning via Stochastic Gradient Langevin Dynamics. In Proceedings of the 28th International Conference on Machine Learning, 681 688.

Wu, L.; Hsieh, C.; and Sharpnack, J. 2018. SQL-Rank: A Listwise Approach to Collaborative Ranking. In Proceedings of the 35th International Conference on Machine Learning, ICML 18, 5311 5320.

Yang, Q.; Liu, Y.; Chen, T.; and Tong, Y. 2019. Federated Machine Learning: Concept and Applications. ACM Transactions on Intelligent Systems and Technology 10(2): 12:1 12:19.

Zhang, C.; Liu, Y.; Wang, L.; Liu, Y.; Li, L.; and Zheng, N. 2020. Joint Intelligence Ranking by Federated Multiplicative Update. IEEE Intelligent Systems 35(4): 15 24.