# preference_diffusion_for_recommendation__f697594d.pdf

Published as a conference paper at ICLR 2025

PREFERENCE DIFFUSION FOR RECOMMENDATION

Shuo Liu1,2, An Zhang2, Guoqing Hu3, Hong Qian1, Tat-Seng Chua2
1East China Normal University, China
2National University of Singapore, Singapore
3University of Science and Technology of China, China
shuoliu@stu.ecnu.edu.cn, anzhang@u.nus.edu, hl15671953077@ustc.mail.edu.cn, hqian@cs.ecnu.edu.cn, dcscts@nus.edu.sg

Recommender systems aim to predict personalized item rankings by modeling user preference distributions derived from historical behavior data. While diffusion models (DMs) have recently gained attention for their ability to model complex distributions, current DM-based recommenders typically rely on traditional objectives such as mean squared error (MSE) or standard recommendation objectives. These approaches are either suboptimal for personalized ranking tasks or fail to exploit the full generative potential of DMs. To address these limitations, we propose PreferDiff, an optimization objective tailored for DM-based recommenders. PreferDiff reformulates the traditional Bayesian Personalized Ranking (BPR) objective into a log-likelihood generative framework, enabling it to effectively capture user preferences by integrating multiple negative samples. To handle the resulting intractability, we employ variational inference and minimize the variational upper bound. Furthermore, we replace MSE with cosine error to improve alignment with recommendation tasks, and we balance generative learning and preference modeling to enhance the training stability of DMs. PreferDiff has three appealing properties. First, it is the first personalized ranking loss designed specifically for DM-based recommenders. Second, it improves ranking performance and accelerates convergence by effectively addressing hard negatives.
Third, we establish its theoretical connection to Direct Preference Optimization (DPO), demonstrating its potential to align user preferences within a generative modeling framework. Extensive experiments across six benchmarks validate PreferDiff's superior recommendation performance. Our code is available at https://github.com/lswhim/PreferDiff.

1 INTRODUCTION

Recommender systems endeavor to model the user preference distribution based on historical behavior data (He & McAuley, 2016; Wang et al., 2019; Rendle, 2022) and to predict personalized item rankings. Recently, diffusion models (DMs) (Sohl-Dickstein et al., 2015; Ho et al., 2020; Yang et al., 2024) have gained considerable attention for their robust capacity to model complex data distributions and their versatility across a wide range of applications and input modalities: text (Li et al., 2022; Lovelace et al., 2023), images (Dhariwal & Nichol, 2021; Ho & Salimans, 2022), and videos (Ho et al., 2022a;b). As a result, there has been growing interest in employing DMs as recommenders. These DM-based recommenders apply the diffusion-then-denoising process to the user's historical interaction data to uncover the potential target item, typically following one of three approaches: modeling the distribution of the next item (Yang et al., 2023b; Wang et al., 2024b; Li et al., 2024), capturing the user preference distribution (Wang et al., 2023b; Zhao et al., 2024; Hou et al., 2024a; Zhu et al., 2024), or focusing on the distribution of time intervals for predicting the user's next action (Ma et al., 2024a). However, prevalent DM-based recommenders often routinely rely on standard generative loss functions, such as mean squared error (MSE), or adapt established recommendation objectives, such as Bayesian personalized ranking (BPR) (Rendle et al., 2009) and (binary) cross entropy (Sun et al., 2019), without any modification.
(An Zhang is the corresponding author.)

Figure 1: Illustration of user preference distributions modeled by DM-based recommenders. (a) Neglecting the negative item distribution leads to predicted items potentially being closer to negative items. (b) Incorporating negative sampling enhances the understanding of user preferences.

Despite their empirical success, two key limitations in their training objectives have been identified, which may hinder further advancements in this field:

DM-based recommenders inheriting generative objective functions (Yang et al., 2023b) lack a comprehensive understanding of user preference sequences. They model user behavior by considering only the items users have interacted with, neglecting the critical role of negative items in recommendation (Chen et al., 2023a; 2024; Zhang et al., 2024). As illustrated in Figure 1(a), although the predicted item centroid is close to the positive item, the sampling process of DMs tends to place the final predicted item embedding in high-density regions (red in Figure 1(a)(b)). This can result in the predicted item embedding being too close to negative items, thereby degrading personalized ranking performance. Enabling DMs to understand what users may dislike can alleviate this issue, as illustrated in Figure 1(b).

DM-based recommenders simply employing standard recommendation training objectives hinder their generative ability. This type of DM-based recommender treats DMs primarily as noise-resistant models that focus on ranking or classification rather than on generation. While this approach can mitigate the impact of noisy interactions inherent in recommender systems (Wang et al., 2023b; Li et al., 2024), it may not fully exploit the generative and generalization capabilities of DMs, whose primary objective is to maximize the data log-likelihood.
To redesign a diffusion optimization objective specifically tailored to model user preference distributions for personalized ranking, we aim to simultaneously encode user dislikes and enhance the generative capability of the ranking objective. Our approach extends the classical and widely adopted BPR objective to incorporate multiple negative samples, while also clarifying its connection to likelihood-based generative models, exemplified by DMs (Yang et al., 2024). BPR only seeks to maximize the rating margin between positive and negative items, which may still leave negative items with high scores. In contrast, our core idea focuses on modeling user preference distributions, where the distribution of positive items diverges from that of negative items, conditioned on the user's personalized interaction history. To this end, we propose a training objective specifically designed for DM-based recommenders, called PreferDiff, which effectively integrates negative samples to better capture user preference distributions. Specifically, by applying softmax normalization, we transform BPR from a rating ranking into a log-likelihood ranking, leading to the formulation of L_BPR-Diff. However, since DMs are latent variable models (Ho et al., 2020), direct optimization through gradient descent is intractable. To address this intractability, we derive a variational upper bound for L_BPR-Diff using variational inference, which serves as a surrogate optimization target. Furthermore, we replace the original MSE with cosine error (Hou et al., 2022b), allowing generated items to better align with the similarity calculations used in recommendation tasks and controlling the scale of embeddings (Chen et al., 2023c).
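To make the role of the cosine error concrete, here is a minimal NumPy sketch (illustrative only, not the authors' implementation; `pred` and `target` stand for the predicted and ground-truth item embeddings):

```python
import numpy as np

def mse_error(pred, target):
    # Standard diffusion reconstruction objective: mean squared error,
    # which is sensitive to the absolute scale of the embeddings.
    return float(np.mean((pred - target) ** 2))

def cosine_error(pred, target, eps=1e-8):
    # Cosine error: 1 - cosine similarity. Scale-invariant, so it matches
    # the cosine/inner-product scoring typically used to rank items.
    num = float(np.dot(pred, target))
    den = float(np.linalg.norm(pred) * np.linalg.norm(target)) + eps
    return 1.0 - num / den
```

For instance, doubling the target embedding leaves the cosine error essentially unchanged while the MSE grows, which is why the cosine form aligns better with similarity-based ranking.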
Additionally, we extend L_BPR-Diff to incorporate multiple negative samples, enabling the model to inject richer preference information during training, while implementing an efficient strategy that prevents redundant denoising steps from excessive negative samples. Finally, we balance generative learning and preference learning to achieve a trade-off that enhances both training stability and model performance, culminating in the final objective function, L_PreferDiff.

Benefiting from a comprehensive understanding of user preference distributions, PreferDiff has three appealing properties. First, PreferDiff is the first personalized ranking loss specifically designed for DM-based recommenders, incorporating multiple negatives to model user preference distributions. Second, gradient analysis reveals that PreferDiff handles hard negatives by assigning higher gradient weights to item sequences where the DM incorrectly assigns a higher likelihood to negative items than to positive ones (Chen et al., 2022; Fan et al., 2023; Zhang et al., 2023) (cf. Section 3.2). This not only improves recommendation performance but also accelerates training (cf. Section 4.1). Third, from a preference learning perspective, we find that PreferDiff is connected to Direct Preference Optimization (Rafailov et al., 2023) under certain conditions, indicating its potential to align user preferences through generative modeling in diffusion-based recommenders (cf. Section 3.2). We evaluate the effectiveness of PreferDiff through extensive experiments and comparisons with baseline models on six widely adopted public benchmarks (cf. Section 4.1). Furthermore, by simply replacing item ID embeddings with item semantic embeddings from advanced text-embedding modules, PreferDiff shows strong generalization for sequential recommendation across untrained domains and platforms, without introducing additional components (cf. Section 4.2).
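The shift from a pairwise rating margin to a softmax-normalized log-likelihood ranking can be illustrated with a small NumPy sketch (a simplification over raw scores; the paper's actual objective operates on diffusion log-likelihoods):

```python
import numpy as np

def log_sigmoid(x):
    # Numerically fine for moderate |x|.
    return -np.log1p(np.exp(-x))

def bpr_loss(s_pos, s_neg):
    # Classical BPR: push the positive score above a single negative score.
    return float(-log_sigmoid(s_pos - s_neg))

def softmax_ranking_loss(s_pos, s_negs):
    # Softmax-normalized ranking: negative log-likelihood of the positive
    # under a softmax over the positive plus multiple negatives.
    scores = np.concatenate(([s_pos], np.asarray(s_negs, dtype=float)))
    scores = scores - scores.max()  # numerical stability
    return float(-(scores[0] - np.log(np.sum(np.exp(scores)))))
```

With a single negative, the softmax form reduces exactly to BPR; with many negatives, it jointly suppresses all of them instead of enforcing only a pairwise margin.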
2 PRELIMINARY

In this section, we first formally introduce the task of sequential recommendation, and then introduce the foundations of DM-based recommenders that model the next-item distribution.

Sequential Recommendation. Suppose each user has a historical interaction sequence {i_1, i_2, ..., i_{n-1}}, representing their interactions in chronological order, and i_n is the next target item. For each sequence, we randomly sample negative items from the batch or from the candidate set, resulting in H = {i_v}_{v=1}^{|H|}. Moreover, each item i is associated with a unique item ID or additional descriptive information (e.g., title, brand, and category). Via an ID-embedding or text-embedding module, each item can be transformed into a corresponding vector e ∈ R^{1×d}. Therefore, the historical interaction sequence and the negative item set can be transformed into c = {e_1, e_2, ..., e_{n-1}} and H = {e_v}_{v=1}^{|H|}. The goal of sequential recommendation is to produce a personalized ranking over the whole candidate set, namely, to predict the next item i_n the user may prefer, given the sequence c and the negative item set H.

Diffusion Models for Sequential Recommendation. In this section, we introduce the use of guided DMs to model the conditional next-item distribution p(i_n | c).

Sampling procedure (steps 8-13):
8: z ~ N(0, I) if t > 1, else z = 0 (sample noise if not the final step)
9: ê_0 = (1 + w) F_θ(ê_t, M(c), t) - w F_θ(ê_t, Φ, t) (apply classifier-free guidance)
10: ε̂_θ = (ê_t - √α_t ê_0) / √(1 - α_t) (compute the predicted noise)
11: ê_{t-1} = √α_{t-1} ê_0 + √(1 - α_{t-1}) ε̂_θ (DDIM update step when σ_t = 0)
12: end for
13: return ê_0

Table 5: Detailed Statistics of Datasets after Preprocessing.
(The first six columns are the fully trained recommendation datasets; the last five are the general sequential recommendation datasets.)

| Dataset | Sports | Beauty | Toys | Steam | ML-1M | Yahoo!R1 | Pretraining | Validation | CDs | Movies | Steam |
|---|---|---|---|---|---|---|---|---|---|---|---|
| #Sequences | 35,598 | 22,363 | 19,412 | 39,795 | 6,040 | 50,000 | 746,688 | 101,501 | 112,379 | 297,529 | 39,795 |
| #Items | 18,357 | 12,101 | 11,924 | 9,265 | 3,706 | 23,589 | 68,668 | 8,623 | 15,520 | 25,925 | 9,265 |
| #Interactions | 256,598 | 162,150 | 138,444 | 437,733 | 60,400 | 500,000 | 3,258,523 | 452,415 | 457,589 | 2,053,497 | 437,733 |

We sort all sequences chronologically for each dataset, then split the data into training, validation, and test sets with an 8:1:1 ratio, while preserving the last 10 interactions as the historical sequence.

Amazon 2014 [1]. Here, we choose three public real-world benchmarks (i.e., Sports, Beauty, and Toys) that have been widely utilized in recent studies (Rajput et al., 2023). We use the common five-core datasets (Hou et al., 2022a), filtering out users and items with fewer than five interactions across all datasets. Following previous work (Yang et al., 2023b), we set the maximum length of a user interaction sequence to 10.

Amazon 2018 [2]. Following prior works (Hou et al., 2022a; Li et al., 2023a), we select five distinct product review categories, namely Automotive, Electronics, Grocery and Gourmet Food, Musical Instruments, and Tools and Home Improvement, as pretraining datasets. Cell Phones and Accessories is used as the validation set for early stopping. In line with previous research (Yang et al., 2023b), we filter out items with fewer than 20 interactions and user interaction sequences shorter than 5, capping the maximum length of each user's interaction sequence at 10.

Steam is a game review dataset collected from Steam [3]. Due to the large number of game reviews, we filter out users and items with fewer than 20 interactions.

ML-1M is a movie rating dataset collected by GroupLens [4]. We filter out users and items with fewer than 20 interactions.

Yahoo!R1 is a music rating dataset collected by Yahoo [5].
We filter out users and items with fewer than 20 interactions.

D.2 IMPLEMENTATION DETAILS

For a fair comparison, all experiments are conducted in PyTorch using a single Tesla V100-SXM3 32GB GPU and an Intel(R) Xeon(R) Gold 6248R CPU. We optimize all methods using the AdamW optimizer, and all model parameters are initialized with standard normal initialization. We fix the embedding dimension to 64 for all models except DM-based recommenders, as the latter only demonstrate strong performance with higher embedding dimensions, as discussed in Section 4.3. Since our focus is not on network architecture, and for fair comparison, we adopt a lightweight configuration for baseline models that employ a Transformer backbone [6], using a single layer with two attention heads. Notably, all baselines, unless otherwise specified, use cross-entropy as the loss function, as recent studies (Zhang et al., 2024; Klenitskiy & Vasilev, 2023; Zhai et al., 2023) have demonstrated its effectiveness. For PreferDiff, for each user sequence, we treat the other next items (a.k.a. labels) in the same batch as negative samples. We set the default diffusion timestep to 2000 and the DDIM step to 20, with p_u = 0.1 and β linearly increasing in the range [1e-4, 0.02] for all DM-based sequential recommenders (e.g., DreamRec). We empirically find that tuning these parameters may lead to better recommendation performance; however, as this is not the focus of the paper, we do not elaborate on it. The hyperparameter (e.g., learning rate) search space for PreferDiff and the baseline models is provided in Table 11, while the best hyperparameters for PreferDiff are listed in Table 12.
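As a minimal NumPy sketch of the pieces fixed above, namely the linear β schedule over T = 2000 steps and one deterministic DDIM update with classifier-free guidance as in the sampling procedure of the appendix (function names are ours; this is illustrative, not the released code):

```python
import numpy as np

def linear_beta_schedule(T=2000, beta_start=1e-4, beta_end=0.02):
    # Linearly increasing noise levels and their cumulative products alpha_bar_t.
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def ddim_step_cfg(e_t, alpha_bar_t, alpha_bar_prev, w, f_cond, f_uncond):
    # One deterministic DDIM update (sigma_t = 0): f_cond / f_uncond are the
    # denoiser outputs with and without the sequence condition.
    e0_hat = (1.0 + w) * f_cond - w * f_uncond          # classifier-free guidance
    eps_hat = (e_t - np.sqrt(alpha_bar_t) * e0_hat) / np.sqrt(1.0 - alpha_bar_t)
    return np.sqrt(alpha_bar_prev) * e0_hat + np.sqrt(1.0 - alpha_bar_prev) * eps_hat
```

When the guided denoiser recovers the clean embedding exactly, this update reproduces the forward process at the earlier timestep, which is easy to verify numerically.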
[1] https://cseweb.ucsd.edu/~jmcauley/datasets/amazon/links.html
[2] https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/
[3] https://github.com/kang205/SASRec
[4] https://grouplens.org/datasets/movielens/1m/
[5] https://webscope.sandbox.yahoo.com/
[6] https://github.com/YangZhengyi98/DreamRec/

D.3 BASELINES OF SEQUENTIAL RECOMMENDATION

Traditional sequential recommenders:
GRU4Rec (Hidasi et al., 2016) adopts RNNs to model user behavior sequences for session-based recommendation. Here, following previous work (Kang & McAuley, 2018; Yang et al., 2023b), we treat each user's interaction sequence as a session.
SASRec (Kang & McAuley, 2018) adopts a unidirectional self-attention network to model user behavior sequences.
BERT4Rec (Sun et al., 2019) adapts the original text-based BERT model with the cloze objective for modeling user behavior sequences. We adopt the masking implementation from Ren et al. (2024b).

Contrastive-learning-based sequential recommenders:
CL4SRec (Xie et al., 2022) incorporates contrastive learning into a Transformer-based sequential recommendation model to obtain more robust results. We adopt the implementation [7] from Ren et al. (2024b).

Generative sequential recommenders:
TIGER (Rajput et al., 2023) introduces codebook-based identifiers through RQ-VAE, which quantizes semantic information into code sequences for generative recommendation. Since the source code is unavailable, we implement it using the HuggingFace Transformers APIs, following the original paper by utilizing T5 (Ni et al., 2022) as the backbone. For quantization, we employ FAISS (Johnson et al., 2019), which is widely used [8] in recent recommendation studies (Hou et al., 2023).
DM-based sequential recommenders:
DiffRec (Wang et al., 2023b) introduces the application of diffusion to user interaction vectors (i.e., multi-hot vectors) for collaborative recommendation, where 1 denotes a positive interaction and 0 indicates a potential negative interaction. We adopt the authors' public implementation [9].
DreamRec (Yang et al., 2023b) uses the historical interaction sequence as conditional guidance for the diffusion model to enable personalized recommendation, and uses MSE as the training objective. We adopt the authors' public implementation [10].
DiffuRec (Li et al., 2024) introduces a DM to reconstruct the target item embedding from a Transformer backbone using the user's historical interaction behaviors, and uses CE as the training objective. We adopt the authors' public implementation [11].

Text-based sequential recommenders:
MoRec (Yuan et al., 2023) utilizes item features from text descriptions or images, encoded using a text or vision encoder, and applies a dimensional transformation to match the appropriate dimension for recommendation. Here, we utilize OpenAI-3-large embeddings, SASRec as the backbone, and transform the dimension to 64.
LLM2Bert4Rec (Harte et al., 2023) proposes initializing item embeddings with textual embeddings. In our implementation, we use OpenAI-3-large embeddings, BERT4Rec as the backbone, and apply PCA to reduce the dimensionality to 64, as mentioned in the original paper.

Notably, the performance of TIGER and LLM2Bert4Rec is inconsistent with their original papers because of differences in evaluation settings: both papers use the leave-one-out evaluation setting, which differs from the user-split setting used in our work.

Results with Other Backbones. Here, we present a comparison of PreferDiff with other recommenders using a different backbone, namely GRU. As shown in Table 6, PreferDiff still outperforms DreamRec across all datasets, further validating its versatility.
[7] https://github.com/HKUDS/SSLRec/
[8] https://github.com/facebookresearch/faiss
[9] https://github.com/YiyanXu/DiffRec/
[10] https://github.com/YangZhengyi98/DreamRec/
[11] https://github.com/WHUIR/DiffuRec/

Empirically, we find that, unlike SASRec, which performs better with a Transformer than with a GRU, PreferDiff performs better with GRU as the backbone on the Sports and Toys datasets than with a Transformer. This could be due to the relatively shallow Transformer used, which makes the GRU easier to fit. More suitable network architectures for DM-based recommenders will be explored in future work.

Table 6: Comparison of the performance of sequential recommenders with GRU as the backbone. The improvement achieved by PreferDiff is significant (p-value < 0.05). (Columns: Sports and Outdoors / Beauty / Toys and Games, each reporting R@5, N@5, R@10, N@10.)

| Model | R@5 | N@5 | R@10 | N@10 | R@5 | N@5 | R@10 | N@10 | R@5 | N@5 | R@10 | N@10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GRU4Rec | 0.0022 | 0.0020 | 0.0030 | 0.0023 | 0.0093 | 0.0078 | 0.0102 | 0.0081 | 0.0097 | 0.0087 | 0.0100 | 0.0090 |
| SASRec | 0.0047 | 0.0036 | 0.0067 | 0.0042 | 0.0138 | 0.0090 | 0.0219 | 0.0116 | 0.0133 | 0.0097 | 0.0170 | 0.0109 |
| DreamRec | 0.0201 | 0.0147 | 0.0230 | 0.0165 | 0.0431 | 0.0290 | 0.0543 | 0.0321 | 0.0484 | 0.0343 | 0.0591 | 0.0382 |
| PreferDiff | 0.0216 | 0.0165 | 0.0250 | 0.0176 | 0.0451 | 0.0313 | 0.0590 | 0.0358 | 0.0530 | 0.0385 | 0.0623 | 0.0415 |

D.4 LEAVE-ONE-OUT

Evaluation. The leave-one-out strategy is another widely adopted evaluation protocol in sequential recommendation. For each user's interaction sequence, the final item serves as the test instance, the penultimate item is reserved for validation, and the remaining preceding interactions are used for training. During testing, the ground-truth item of each sequence is ranked against a set of candidate items, allowing a comprehensive assessment of the model's ranking capabilities. Performance is evaluated by computing ranking-based metrics over the test set, and the final reported result is the average metric across all users in the test set.
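The protocol above can be sketched in a few lines of Python (an illustration of our reading of the protocol, not the evaluation harness used in the experiments):

```python
import numpy as np

def leave_one_out(seq):
    # Last item -> test target, penultimate -> validation target,
    # everything earlier -> training sequence.
    assert len(seq) >= 3
    return {"train": seq[:-2],
            "valid": (seq[:-2], seq[-2]),
            "test": (seq[:-1], seq[-1])}

def recall_at_k(rank, k):
    # rank: 1-indexed position of the ground-truth item among candidates.
    return 1.0 if rank <= k else 0.0

def ndcg_at_k(rank, k):
    # With a single relevant item, NDCG@K reduces to 1 / log2(rank + 1).
    return 1.0 / np.log2(rank + 1) if rank <= k else 0.0
```

Per-user metrics computed this way are then averaged over the test set, as described above.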
Table 7: Detailed Statistics of Datasets after Preprocessing in the Leave-One-Out Setting.

| Datasets | Sports | Beauty | Toys | Automotive | Music | Office |
|---|---|---|---|---|---|---|
| #Sequences | 35,598 | 22,363 | 19,412 | 2,929 | 1,430 | 4,906 |
| #Items | 18,357 | 12,101 | 11,924 | 1,863 | 901 | 2,421 |
| #Interactions | 296,337 | 198,502 | 167,597 | 20,473 | 10,261 | 53,258 |
| Avg. Length | 8.32 | 8.87 | 8.63 | 6.99 | 7.17 | 10.86 |

Datasets. In addition to the original three datasets (Sports, Toys, and Beauty) used in TIGER, we select three further product review categories from Amazon 2014, namely Automotive, Musical Instruments, and Office Products, for a more comprehensive comparison. Here, we utilize the common five-core datasets, filtering out users and items with fewer than five interactions across all datasets.

Baselines. Here, we directly report baseline results (e.g., S3-Rec (Zhou et al., 2020), P5 (Geng et al., 2022), FDSA (Hao et al., 2023)) from TIGER (Rajput et al., 2023), and evaluate DreamRec (Yang et al., 2023b) and the proposed PreferDiff ourselves.

Results. Tables 8 and 9 present the performance of PreferDiff compared with six categories of sequential recommenders. For brevity, R stands for Recall and N for NDCG. The top-performing and runner-up results are shown in bold and underlined, respectively. Improv denotes the relative improvement percentage of PreferDiff over the best baseline. We observe that, in the leave-one-out setting, PreferDiff demonstrates competitive recommendation performance compared to the baselines. Specifically, on the larger datasets (i.e., Sports and Beauty), PreferDiff performs on par with TIGER, while on the Toys dataset and the three smaller datasets, PreferDiff achieves a significant lead. This may be because PreferDiff adopts the same manner as DreamRec, where recommendation is not included in the training process; with a smaller number of items, this approach can yield more precise recommendation performance.

D.5 GENERAL SEQUENTIAL RECOMMENDATION

Pretraining Datasets.
Here, we introduce more details about the pretraining datasets. Following previous work (Hou et al., 2022a; Li et al., 2023a), we select five different product review categories from Amazon 2018 (Ni et al., 2019), namely Automotive, Cell Phones and Accessories, Grocery and Gourmet Food, Musical Instruments, and Tools and Home Improvement, as pretraining datasets. Cell Phones and Accessories is selected as the validation dataset for early stopping when Recall@5

Table 8: Performance comparison on sequential recommendation under leave-one-out. The last row depicts the % improvement of PreferDiff relative to the best baseline. (Columns: Sports and Outdoors / Beauty / Toys and Games, each reporting R@5, N@5, R@10, N@10.)

| Methods | R@5 | N@5 | R@10 | N@10 | R@5 | N@5 | R@10 | N@10 | R@5 | N@5 | R@10 | N@10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| P5 | 0.0061 | 0.0041 | 0.0095 | 0.0052 | 0.0163 | 0.0107 | 0.0254 | 0.0136 | 0.0070 | 0.0050 | 0.0121 | 0.0066 |
| Caser | 0.0116 | 0.0072 | 0.0194 | 0.0097 | 0.0205 | 0.0131 | 0.0347 | 0.0176 | 0.0176 | 0.0166 | 0.0270 | 0.0141 |
| HGN | 0.0189 | 0.0120 | 0.0313 | 0.0159 | 0.0325 | 0.0206 | 0.0540 | 0.0257 | 0.0266 | 0.0321 | 0.0497 | 0.0277 |
| GRU4Rec | 0.0129 | 0.0086 | 0.0204 | 0.0111 | 0.0164 | 0.0113 | 0.0283 | 0.0137 | 0.0137 | 0.0097 | 0.0176 | 0.0084 |
| BERT4Rec | 0.0115 | 0.0075 | 0.0191 | 0.0099 | 0.0263 | 0.0184 | 0.0407 | 0.0214 | 0.0170 | 0.0161 | 0.0310 | 0.0183 |
| FDSA | 0.0182 | 0.0128 | 0.0288 | 0.0156 | 0.0261 | 0.0201 | 0.0407 | 0.0228 | 0.0228 | 0.0150 | 0.0381 | 0.0199 |
| SASRec | 0.0233 | 0.0162 | 0.0412 | 0.0209 | 0.0462 | 0.0387 | 0.0605 | 0.0318 | 0.0463 | 0.0463 | 0.0675 | 0.0374 |
| S3-Rec | 0.0251 | 0.0161 | 0.0385 | 0.0204 | 0.0380 | 0.0244 | 0.0647 | 0.0327 | 0.0327 | 0.0294 | 0.0700 | 0.0376 |
| DreamRec | 0.0087 | 0.0071 | 0.0096 | 0.0075 | 0.0318 | 0.0257 | 0.0624 | 0.0273 | 0.0422 | 0.0347 | 0.0689 | 0.0362 |
| TIGER | 0.0264 | 0.0181 | 0.0400 | 0.0225 | 0.0454 | 0.0321 | 0.0648 | 0.0384 | 0.0521 | 0.0371 | 0.0712 | 0.0432 |
| PreferDiff | 0.0275 | 0.0190 | 0.0405 | 0.0218 | 0.0455 | 0.0317 | 0.0660 | 0.0388 | 0.0603 | 0.0403 | 0.0851 | 0.0483 |
| Improve | 4.16% | 4.97% | 1.25% | -3.1% | 0.22% | -1.25% | 1.85% | 1.04% | 15.73% | 8.63% | 19.52% | 11.81% |

Table 9: Performance comparison on sequential recommendation under leave-one-out.
The last row depicts the % improvement of PreferDiff relative to the best baseline. (Columns: Automotive / Music / Office, each reporting R@5, N@5, R@10, N@10.)

| Methods | R@5 | N@5 | R@10 | N@10 | R@5 | N@5 | R@10 | N@10 | R@5 | N@5 | R@10 | N@10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DreamRec | 0.0543 | 0.0400 | 0.0683 | 0.0445 | 0.0622 | 0.0414 | 0.0783 | 0.0467 | 0.0523 | 0.0378 | 0.0699 | 0.0434 |
| TIGER | 0.0454 | 0.0290 | 0.0745 | 0.0383 | 0.0532 | 0.0358 | 0.0840 | 0.0456 | 0.0462 | 0.0299 | 0.0746 | 0.0390 |
| PreferDiff | 0.0649 | 0.0463 | 0.0864 | 0.0532 | 0.0650 | 0.0453 | 0.0874 | 0.0526 | 0.0538 | 0.0379 | 0.0850 | 0.0480 |
| Improve | 19.52% | 15.75% | 15.97% | 19.55% | 4.50% | 9.42% | 4.04% | 12.63% | 2.87% | 0.26% | 13.90% | 10.60% |

(i.e., R@5) shows no improvement for 20 consecutive epochs. The detailed statistics of each dataset used for pretraining are shown in Table 10. Clearly, the pretraining datasets have no domain overlap with the unseen datasets used in Section 4.2.

Table 10: Detailed Statistics of Pretraining Datasets.

| Datasets | Automotive | Phones | Tools | Instruments | Food |
|---|---|---|---|---|---|
| #Sequences | 193,651 | 157,212 | 240,799 | 27,530 | 127,496 |
| #Items | 18,703 | 12,839 | 22,854 | 2,494 | 11,778 |
| #Interactions | 806,939 | 544,339 | 1,173,154 | 110,151 | 623,940 |
| Avg. Length | 7.26 | 6.51 | 7.19 | 7.06 | 7.24 |

Baselines. Here, we introduce more details about the baselines for the general sequential recommendation task. Notably, for a fair comparison, we employ the text-embedding-3-large model (Liu et al., 2025a) from OpenAI (Neelakantan et al., 2022) as the text encoder, instead of BERT (Devlin et al., 2019), in UniSRec and MoRec to convert identical item descriptions (e.g., title, category, brand) into vector representations, as it has been proven to deliver commendable performance in recommendation (Harte et al., 2023). Different from the Mixture-of-Experts (MoE) whitening used in UniSRec, we employ identical ZCA whitening (Bell & Sejnowski, 1997) on the textual item embeddings for MoRec and our proposed PreferDiff.

UniSRec (Hou et al., 2022a) uses textual item embeddings from a frozen text encoder and adapts to a new domain using an MoE-enhanced adaptor. We adopt the authors' public implementation [12].
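A minimal NumPy sketch of ZCA whitening as used here (illustrative; the `eps` regularizer is our addition for numerical stability):

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    # ZCA whitening: decorrelate the embedding dimensions while staying as
    # close as possible to the original coordinates (unlike PCA whitening,
    # which also rotates into the eigenbasis).
    Xc = X - X.mean(axis=0, keepdims=True)
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T  # cov^(-1/2)
    return Xc @ W
```

After whitening, the empirical covariance of the embeddings is (up to `eps`) the identity, which normalizes the scale of textual embeddings before they enter the recommender.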
MoRec (Yuan et al., 2023) uses textual item embeddings from a frozen text encoder and applies a dimension transformation technique. The architecture is the same as previously mentioned.

[12] https://github.com/RUCAIBox/UniSRec

Positive Correlation Between Training Data Scale and General Sequential Recommendation Performance. Here, we explore how the scale of training data impacts the general sequential recommendation performance of PreferDiff-T. For brevity, we use initials to represent each dataset; for example, A stands for Automotive and P for Phones, and AP indicates that the pretraining data includes the training sets of both the Automotive and Phones datasets. We observe that both NDCG and HR increase as the training data grows, indicating that PreferDiff-T can effectively learn general knowledge to model user preference distributions through pretraining on diverse datasets and transfer this knowledge to unseen datasets via advanced textual representations. Further studies can explore whether homogeneous datasets lead to greater performance improvements (e.g., whether Amazon Book data provides a larger boost for Goodreads than other datasets) and investigate the limits of data scalability for PreferDiff-T.

Figure 4: Positive Correlation Between Training Data Scale and General Sequential Recommendation Performance. (a) NDCG@5 on Steam; (b) HR@5 on Steam.

D.6 HYPERPARAMETER SEARCH SPACE

Here, we introduce the hyperparameter search space for the baselines and PreferDiff.

Table 11: Hyperparameter Search Space for Baselines.
| Model | Search Space |
|---|---|
| GRU4Rec | lr ∈ {1e-2, 1e-3, 1e-4, 1e-5}, weight decay = 0 |
| SASRec | lr ∈ {1e-2, 1e-3, 1e-4, 1e-5}, weight decay = 0 |
| Bert4Rec | lr ∈ {1e-2, 1e-3, 1e-4, 1e-5}, weight decay = 0, mask probability ∈ {0.2, 0.4, 0.6, 0.8} |
| CL4SRec | lr ∈ {1e-2, 1e-3, 1e-4, 1e-5}, weight decay = 0, λ ∈ {0.1, 0.3, 0.5, 1.0, 3.0} |
| DiffRec | lr ∈ {1e-2, 1e-3, 1e-4, 1e-5}, weight decay = 0, noise scale ∈ {1e-1, 1e-2, 1e-3, 1e-4, 1e-5}, T ∈ {2, 5, 20, 50, 100} |
| DreamRec | lr ∈ {1e-2, 1e-3, 1e-4, 1e-5}, weight decay = 0, embedding size ∈ {64, 128, 256, 1024, 1536, 3072}, w ∈ {0, 2, 4, 6, 8, 10} |
| DiffuRec | lr ∈ {1e-2, 1e-3, 1e-4, 1e-5}, weight decay = 0, embedding size ∈ {64, 128, 256, 1024, 1536, 3072} |
| UniSRec | lr ∈ {1e-2, 1e-3, 1e-4, 1e-5}, weight decay = 0, λ ∈ {0.05, 0.1, 0.3, 0.5, 1.0, 3.0} |
| TIGER | lr ∈ {1e-2, 1e-3, 1e-4, 1e-5}, weight decay ∈ {0, 1e-1, 1e-2, 1e-3} |
| MoRec | lr ∈ {1e-2, 1e-3, 1e-4, 1e-5}, weight decay = 0, text-encoder = text-embedding-3-large |
| LLM2Bert4Rec | lr ∈ {1e-2, 1e-3, 1e-4, 1e-5}, weight decay = 0, text-encoder = text-embedding-3-large |
| PreferDiff | lr ∈ {1e-2, 1e-3, 1e-4, 1e-5}, λ ∈ {0.2, 0.4, 0.6, 0.8}, embedding size ∈ {64, 128, 256, 1024, 1536, 3072}, w ∈ {0, 2, 4, 6, 8, 10} |

Table 12: Best Hyperparameters for PreferDiff on Sports, Beauty, and Toys.

| Dataset | learning rate | weight decay | λ | w | embedding size |
|---|---|---|---|---|---|
| Sports | 1e-4 | 0 | 0.4 | 2 | 3072 |
| Beauty | 1e-4 | 0 | 0.8 | 6 | 3072 |
| Toys | 1e-4 | 0 | 0.5 | 4 | 3072 |

E HYPERPARAMETER ANALYSIS FOR PREFERDIFF

E.1 THE NUMBER OF NEGATIVE SAMPLES FOR PREFERDIFF

Figure 5: Effect of λ for PreferDiff.
Figure 6: Effect of the Number of Negative Samples for PreferDiff.

Here, we discuss the impact of the number of negative samples on PreferDiff. As shown in Figure 6, we observe that in cases where the number of items is relatively small (e.g., Beauty and Toys), 8
negative samples are sufficient. However, as the number of items increases, the required number of negative samples also grows (e.g., on Sports).

E.2 IMPORTANCE OF GUIDANCE STRENGTH FOR PREFERDIFF

Figure 7: Effect of w for PreferDiff.

w controls the weight of personalized guidance during the inference stage of PreferDiff. As shown in Figure 7, increasing w can enhance recommendation performance. However, an excessively large w may reduce the generalization capability of DMs, negatively impacting the recommender's performance. Therefore, we recommend setting w ∈ [2, 4].

E.3 DIFFERENT TEXT ENCODERS

Obtaining Item Embeddings from Advanced Text Encoders. Here, we introduce the process for obtaining item embeddings from current advanced text encoders (Liu et al., 2025b). For encoder-based large language models, such as BERT (Devlin et al., 2019) and RoBERTa (Liu et al., 2019), we leverage the final hidden state representation associated with the [CLS] token (Hou et al., 2024b). For convenience, we directly utilize the Sentence-Transformers APIs [13]. For other large language models, including T5 (Ni et al., 2022), LLaMA-7B (Touvron et al., 2023), and Mistral-7B (Jiang et al., 2023), we utilize the output of the last transformer block corresponding to the final input token (Vaswani et al., 2017). For closed-source large language models, such as text-embedding-ada-v2 and text-embedding-3-large, we obtain item embeddings directly via the OpenAI APIs [14] (Neelakantan et al., 2022).

[13] https://huggingface.co/sentence-transformers
[14] https://platform.openai.com/docs/guides/embeddings

Table 13: Comparison of PreferDiff-T performance with different text encoders.
(Columns: Sports and Outdoors / Beauty / Toys and Games, each reporting R@5, N@5, R@10, N@10.)

| Text Encoder | R@5 | N@5 | R@10 | N@10 | R@5 | N@5 | R@10 | N@10 | R@5 | N@5 | R@10 | N@10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BERT | 0.0022 | 0.0020 | 0.0030 | 0.0023 | 0.0104 | 0.0128 | 0.0154 | 0.0148 | 0.0051 | 0.0022 | 0.0068 | 0.0044 |
| T5 | 0.0011 | 0.0009 | 0.0014 | 0.0011 | 0.0241 | 0.0198 | 0.0282 | 0.0212 | 0.0283 | 0.0240 | 0.0309 | 0.0248 |
| RoBERTa | 0.0115 | 0.0098 | 0.0135 | 0.0102 | 0.0331 | 0.0256 | 0.0393 | 0.0276 | 0.0391 | 0.0303 | 0.0438 | 0.0319 |
| Mistral-7B | 0.0166 | 0.0130 | 0.0213 | 0.0146 | 0.0375 | 0.0287 | 0.0456 | 0.0312 | 0.0427 | 0.0328 | 0.0505 | 0.0353 |
| LLaMA-7B | 0.0171 | 0.0126 | 0.0205 | 0.0137 | 0.0402 | 0.0297 | 0.0483 | 0.0323 | 0.0397 | 0.0298 | 0.0494 | 0.0330 |
| OpenAI-Ada-V2 | 0.0160 | 0.0126 | 0.0183 | 0.0134 | 0.0407 | 0.0318 | 0.0469 | 0.0338 | 0.0396 | 0.0315 | 0.0467 | 0.0339 |
| OpenAI-3-large | 0.0182 | 0.0145 | 0.0222 | 0.0158 | 0.0429 | 0.0327 | 0.0532 | 0.0360 | 0.0460 | 0.0351 | 0.0525 | 0.0387 |

Results. Table 13 shows PreferDiff-T with item embeddings encoded by text encoders of varying parameter sizes and architectures. We observe the following.

Positive correlation between LLM size and recommendation performance. OpenAI-3-large outperforms all other models, indicating that larger language models (LLMs) tend to yield better results in recommendation tasks: larger models generate richer and more semantically stable embeddings, which improve PreferDiff's ability to capture user preferences.

High-quality embeddings improve generalization. Models like Mistral-7B and LLaMA-7B, although smaller than OpenAI-3-large, still perform relatively well across metrics. This suggests that while model size matters, the quality of the embeddings plays a crucial role; especially on Beauty, these models provide embeddings with sufficient semantic power to enhance recommendation quality.
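The two pooling strategies described in E.3 can be sketched as follows (operating on a toy hidden-state matrix; `hidden_states` is assumed to be token-major, which is our convention here):

```python
import numpy as np

def cls_pooling(hidden_states):
    # Encoder-style LMs (BERT, RoBERTa): take the final hidden state of the
    # [CLS] token, which sits at position 0 of the sequence.
    return hidden_states[0]

def last_token_pooling(hidden_states, attention_mask):
    # Decoder-style LMs (T5, LLaMA, Mistral): take the hidden state of the
    # final non-padding input token.
    last = int(np.sum(attention_mask)) - 1
    return hidden_states[last]
```

In practice, the pooled vector is then whitened and used as the frozen item embedding for PreferDiff-T.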
E.4 ANALYSIS OF LEARNED ITEM EMBEDDINGS

Figure 8: t-SNE Visualization and Gaussian Kernel Density Estimation of Learned Item Embeddings on Amazon Beauty. (a) SASRec; (b) Dream Rec; (c) Prefer Diff.

To further analyze the item space learned by Prefer Diff, we reduce the dimensionality of the learned item embeddings using t-SNE (Van der Maaten & Hinton, 2008; Liu et al., 2024a; Qian et al., 2024)15 to visualize their underlying distribution. Due to the large number of items in Amazon Beauty, we randomly select 2,000 items as examples. Then, we apply Gaussian kernel density estimation (Botev et al., 2010)16 to analyze the density distribution of the reduced item embeddings and visualize the results using contour plots. The red regions indicate areas where a high concentration of items is clustered. From Figure 8, we can observe that, compared with SASRec, Prefer Diff explores the item space more thoroughly (covering most regions); compared with Dream Rec, it exhibits a stronger clustering effect (with high-density regions concentrated in specific areas), better reflecting the similarities between items and resulting in better recommendation performance.

15 https://scikit-learn.org/dev/modules/generated/sklearn.manifold.TSNE.html
16 https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html

F DISCUSSION

F.1 COMPARISON ON OTHER BACKGROUND DATASETS

To further validate the effectiveness of Prefer Diff, we include Yahoo! R1 (Music) as an additional dataset, along with two other commonly used datasets in sequential recommendation: Steam (Game) and ML-1M (Movie). These datasets provide a diverse set of user-item interaction patterns, allowing us to comprehensively evaluate the performance of our proposed Prefer Diff. We utilize the same data preprocessing technique and the same evaluation setting as introduced in our paper for all three datasets, except Yahoo! R1.
Due to its large size (over one million users), we are unable to provide results for the entire dataset during the rebuttal period. Instead, we randomly sampled 50,000 users for our experiments. We will include the full-scale results on Yahoo! R1 in the final revised version of the paper. The experimental results are shown in Table 14.

Table 14: Performance Comparison Across Background Datasets (Recall@5 / NDCG@5)

Model | Yahoo (Music) | Steam (Game) | ML-1M (Movie)
GRU4Rec | 0.0548 / 0.0491 | 0.0379 / 0.0325 | 0.0099 / 0.0089
SASRec | 0.0996 / 0.0743 | 0.0695 / 0.0635 | 0.0132 / 0.0102
Bert4Rec | 0.1028 / 0.0840 | 0.0702 / 0.0643 | 0.0215 / 0.0152
TIGIR | 0.1128 / 0.0928 | 0.0603 / 0.0401 | 0.0430 / 0.0272
Dream Rec | 0.1302 / 0.1025 | 0.0778 / 0.0572 | 0.0464 / 0.0314
Prefer Diff | 0.1408 / 0.1106 | 0.0814 / 0.0680 | 0.0629 / 0.0439

These results validate the effectiveness of our proposed Prefer Diff across datasets with different backgrounds.

F.2 COMPARISON ON VARIABLE USER HISTORY

We conduct additional experiments to evaluate the performance of Prefer Diff under different maximum history lengths {10, 20, 30, 40, 50}. Notably, since the historical interaction sequences in the original three datasets (Sports, Beauty, Toys) are relatively short, with an average length of around 10, we select two additional commonly used datasets, Steam and ML-1M (Kang & McAuley, 2018; Sun et al., 2019), for further experiments. These datasets were processed and evaluated following the same evaluation settings and data preprocessing protocols in our paper, which differ from the leave-one-out split in Kang & McAuley (2018); Sun et al. (2019).
The results are as follows:

Table 15: Performance Comparison on Steam Dataset (Recall@5 / NDCG@5)

Model | 10 | 20 | 30 | 40 | 50
SASRec | 0.0698 / 0.0634 | 0.0676 / 0.0610 | 0.0663 / 0.0579 | 0.0668 / 0.0610 | 0.0704 / 0.0587
Bert4Rec | 0.0702 / 0.0643 | 0.0689 / 0.0621 | 0.0679 / 0.0609 | 0.0684 / 0.0618 | 0.0839 / 0.0574
TIGIR | 0.0603 / 0.0401 | 0.0704 / 0.0483 | 0.0676 / 0.0488 | 0.0671 / 0.0460 | 0.0683 / 0.0481
Dream Rec | 0.0778 / 0.0572 | 0.0746 / 0.0512 | 0.0741 / 0.0548 | 0.0749 / 0.0571 | 0.0846 / 0.0661
Prefer Diff | 0.0814 / 0.0680 | 0.0804 / 0.0664 | 0.0806 / 0.0612 | 0.0852 / 0.0643 | 0.0889 / 0.0688

Table 16: Performance Comparison on ML-1M Dataset (Recall@5 / NDCG@5)

Model | 10 | 20 | 30 | 40 | 50
SASRec | 0.0201 / 0.0137 | 0.0242 / 0.0131 | 0.0306 / 0.0179 | 0.0217 / 0.0138 | 0.0205 / 0.0134
Bert4Rec | 0.0215 / 0.0152 | 0.0265 / 0.0146 | 0.0331 / 0.0200 | 0.0248 / 0.0154 | 0.0198 / 0.0119
TIGIR | 0.0451 / 0.0298 | 0.0430 / 0.0270 | 0.0430 / 0.0289 | 0.0364 / 0.0238 | 0.0430 / 0.0276
Dream Rec | 0.0464 / 0.0314 | 0.0480 / 0.0349 | 0.0514 / 0.0394 | 0.0497 / 0.0350 | 0.0447 / 0.0377
Prefer Diff | 0.0629 / 0.0439 | 0.0513 / 0.0365 | 0.0546 / 0.0408 | 0.0596 / 0.0420 | 0.0546 / 0.0399

From Table 15 and Table 16, we can observe that Prefer Diff consistently outperforms other baselines across different lengths of user historical interactions.

F.3 WHY ARE DREAMREC AND PREFERDIFF SENSITIVE TO THE EMBEDDING DIMENSION?

Here, we try to explain the reason. Since there is no robust theoretical proof at this stage, we propose a hypothesis supported by simple theoretical reasoning and experimental validation. We conjecture that the challenge is inherent to DDPM (Ho et al., 2020) itself, as it is designed to be variance-preserving (Song et al., 2021b).
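Before the formal derivation, this conjecture can be probed numerically: under the DDPM forward marginal, the per-dimension variance of the noised embeddings stays near 1 only when the clean embeddings themselves have near-unit variance. A minimal numpy sketch, in which the table size, dimension, and schedule value ᾱ_t = 0.3 are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 5000, 64          # hypothetical item count and embedding size
alpha_bar = 0.3          # cumulative noise-schedule value at some step t

def noised_variance(E):
    """Mean per-dimension variance of the DDPM forward marginal
    sqrt(alpha_bar) * E + sqrt(1 - alpha_bar) * eps."""
    eps = rng.standard_normal(E.shape)
    E_t = np.sqrt(alpha_bar) * E + np.sqrt(1.0 - alpha_bar) * eps
    return E_t.var(axis=0).mean()

E_std = rng.standard_normal((N, d))          # standard-normal init
E_uni = rng.uniform(-0.1, 0.1, size=(N, d))  # small uniform init

print(noised_variance(E_std))  # ~1.0: variance preserved
print(noised_variance(E_uni))  # ~0.70: collapses toward 1 - alpha_bar
```

With standard-normal initialization the noised marginal keeps unit variance at every step, whereas a small-range initialization gives ᾱ_t·Var(E_0) + (1 − ᾱ_t) ≈ 1 − ᾱ_t, so the signal term is drowned out early in the forward process.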
For one target item, the forward process in vector form is:

Forward Process: e_0^t = √(ᾱ_t) e_0 + √(1 − ᾱ_t) ε

Here, e_0 ∈ R^{1×d} represents the target item embedding, e_0^t represents the noised target item embedding, ᾱ_t denotes the degree of noise added, and ε is noise sampled from a standard Gaussian distribution. Considering the whole item embedding matrix E_0 ∈ R^{N×d}, where N represents the total number of items, we can rewrite the previous formula in matrix form:

E_0^t = √(ᾱ_t) E_0 + √(1 − ᾱ_t) ε

Taking the variance of both sides of the equation:

Var(E_0^t) = ᾱ_t Var(E_0) + (1 − ᾱ_t) I

For the process to be variance-preserving, Var(E_0) must be close to an identity matrix, so that Var(E_0^t) ≈ I at every step. This is relatively easy to achieve for data like images or text, as these data are fixed during the training process and can be normalized beforehand. However, in recommendation, the item embeddings are randomly initialized and updated dynamically during training. We empirically find that initializing item embeddings with a standard normal distribution is also a key factor for the success of Dream Rec and Prefer Diff. The results are shown as follows:

Table 17: Performance of Different Initialization Methods on Various Datasets (Recall@5 / NDCG@5)

Embedding Initialization | Sports | Beauty | Toys
Uniform | 0.0039 / 0.0026 | 0.0013 / 0.0037 | 0.0015 / 0.0011
Kaiming Uniform | 0.0025 / 0.0019 | 0.0040 / 0.0027 | 0.0051 / 0.0028
Kaiming Normal | 0.0023 / 0.0021 | 0.0049 / 0.0028 | 0.0041 / 0.0029
Xavier Uniform | 0.0011 / 0.0007 | 0.0036 / 0.0021 | 0.0051 / 0.0029
Xavier Normal | 0.0014 / 0.0007 | 0.0067 / 0.0037 | 0.0042 / 0.0023
Standard Normal | 0.0185 / 0.0147 | 0.0429 / 0.0323 | 0.0473 / 0.0367

We can observe that initializing item embeddings with a standard normal distribution is key to the success of diffusion-based recommenders. This experiment validates the aforementioned hypothesis. Furthermore, we also examine the final inferred item embeddings of Dream Rec, Prefer Diff, and SASRec.
As shown in Figure 9, interestingly, we observe that the covariance matrices of the final item embeddings for Dream Rec and Prefer Diff are almost identity matrices, while SASRec does not exhibit this property. This indicates that Dream Rec and Prefer Diff rely on high-dimensional embeddings to adequately represent a larger number of items. The identity-like covariance structure suggests that diffusion-based recommenders distribute variance evenly across embedding dimensions, requiring more dimensions to capture the complexity and diversity of the item space effectively. This further validates our hypothesis that maintaining a proper variance distribution of the item embeddings is crucial for the effectiveness of current diffusion-based recommenders.

Figure 9: Covariance Matrix Visualization of Learned Item Embeddings on Amazon Beauty. (a) SASRec; (b) Dream Rec; (c) Prefer Diff.

We have tried several dimensionality reduction techniques (e.g., projection layers) and regularization techniques (e.g., enforcing the item embedding covariance matrix to be an identity matrix). However, these approaches empirically led to a significant drop in model performance. We conjecture that one possible solution to this issue is to explore the use of Variance Exploding (VE) diffusion models (Song et al., 2021b). Unlike variance-preserving diffusion models, which maintain a constant variance throughout the diffusion process, VE diffusion models increase the variance over time.

F.4 TRAINING AND INFERENCE TIME COMPARISON

Table 18: Training and Inference Time Comparison for Prefer Diff and Baselines.
Dataset | Model | Training Time (s/epoch / s/total) | Inference Time (s/epoch)
Sports | SASRec | 2.67 / 35 | 0.47
Sports | Bert4Rec | 7.87 / 79 | 0.65
Sports | TIGIR | 11.42 / 1069 | 24.14
Sports | Dream Rec | 24.32 / 822 | 356.43
Sports | Prefer Diff | 29.78 / 558 | 6.11
Beauty | SASRec | 1.05 / 36 | 0.37
Beauty | Bert4Rec | 3.66 / 80 | 0.40
Beauty | TIGIR | 5.41 / 1058 | 10.19
Beauty | Dream Rec | 14.78 / 525 | 297.06
Beauty | Prefer Diff | 18.05 / 430 | 3.80
Toys | SASRec | 0.80 / 56 | 0.22
Toys | Bert4Rec | 3.11 / 93 | 0.23
Toys | TIGIR | 3.76 / 765 | 4.21
Toys | Dream Rec | 15.43 / 552 | 309.45
Toys | Prefer Diff | 16.07 / 417 | 3.29

In this subsection, we illustrate the training and inference time comparison between Prefer Diff and baseline methods, as efficiency is critically important for the practical application of recommenders in real-world scenarios. As shown in Table 18, Figure 10 and Figure 11, we can observe that Prefer Diff, thanks to our adoption of DDIM for skip-step sampling, requires less training time and significantly shorter inference time compared to Dream Rec, another diffusion-based recommender. Compared to traditional deep learning methods like SASRec and Bert4Rec, Prefer Diff has longer training and inference times but achieves much better recommendation performance. Furthermore, compared to recent generative recommendation methods such as TIGIR, which rely on autoregressive models and use beam search during inference, Prefer Diff demonstrates shorter training and inference times, highlighting its efficiency and practicality in real-world scenarios.

Figure 10: Recall@5 and Total Training Time for Prefer Diff and Baselines.
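The DDIM skip-step sampling mentioned above is what keeps Prefer Diff's inference time low: instead of denoising through all T timesteps, the sampler visits only a short, evenly spaced subsequence. A minimal deterministic (η = 0) sketch, with a dummy denoiser standing in for the trained, history-conditioned network; the schedule and step counts are illustrative, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def denoiser(x_t, t):
    # Stand-in for the trained network predicting the clean target-item
    # embedding x0 from x_t; a real model conditions on user history.
    return 0.9 * x_t

def ddim_sample(x_T, num_steps):
    # Deterministic DDIM over an evenly spaced subsequence of timesteps,
    # so inference needs num_steps network calls instead of T.
    ts = np.linspace(T - 1, 0, num_steps).astype(int)
    x = x_T
    for i in range(len(ts) - 1):
        t, t_prev = ts[i], ts[i + 1]
        x0_pred = denoiser(x, t)
        # Implied noise at step t, then jump directly to t_prev.
        eps = (x - np.sqrt(alpha_bar[t]) * x0_pred) / np.sqrt(1 - alpha_bar[t])
        x = np.sqrt(alpha_bar[t_prev]) * x0_pred + np.sqrt(1 - alpha_bar[t_prev]) * eps
    return x

x = ddim_sample(rng.standard_normal(64), num_steps=20)
print(x.shape)
```

Shrinking `num_steps` trades recommendation quality for latency, which is exactly the knob explored in Subsection F.5.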
Figure 11: Recall@5 and Inference Time for Prefer Diff and Baselines.

F.5 TRADE-OFF BETWEEN RECOMMENDATION PERFORMANCE AND INFERENCE TIME

As introduced in Subsection F.4, Prefer Diff demonstrates significantly lower inference time compared to Dream Rec, averaging around 3 seconds per batch. However, this may still be unacceptable for real-time recommendation scenarios with strict latency constraints. In this subsection, we show how adjusting the number of denoising steps can effectively balance recommendation performance and inference time. As shown in Figure 12 and Table 19, we observe that by adjusting the number of denoising steps, Prefer Diff can ensure practicality for real-time recommendation tasks. This flexibility allows for a trade-off between inference speed and recommendation performance, making Prefer Diff adaptable to various latency constraints while maintaining competitive effectiveness.

Figure 12: Relationship Between Denoising Steps and Recommendation Performance on Sports, Beauty, and Toys. (a) Recall@5.

Table 19: Adjusting Denoising Steps for a Trade-Off Between Recommendation Performance and Inference Time.

Model (Inference Time) | Sports | Beauty | Toys
SASRec (0.33s) | 0.0047 / 0.0036 | 0.0138 / 0.0090 | 0.0133 / 0.0097
BERT4Rec (0.42s) | 0.0101 / 0.0060 | 0.0174 / 0.0112 | 0.0226 / 0.0139
TIGER (12.85s) | 0.0093 / 0.0073 | 0.0236 / 0.0151 | 0.0185 / 0.0135
Dream Rec (320.98s) | 0.0155 / 0.0130 | 0.0406 / 0.0299 | 0.0440 / 0.0323
Prefer Diff (Denoising Step=1, 0.35s) | 0.0162 / 0.0131 | 0.0384 / 0.0289 | 0.0437 / 0.0340
Prefer Diff (Denoising Step=2, 0.43s) | 0.0165 / 0.0133 | 0.0398 / 0.0309 | 0.0438 / 0.0341
Prefer Diff (Denoising Step=4, 0.65s) | 0.0177 / 0.0137 | 0.0402 / 0.0296 | 0.0433 / 0.0342
Prefer Diff (Denoising Step=20, 3s) | 0.0185 / 0.0147 | 0.0429 / 0.0323 | 0.0473 / 0.0367

F.6 CONNECTION OF PREFERDIFF AND DPO

In Prefer Diff, we aim to design a diffusion optimization objective specially tailored to model user preference distributions for personalized ranking. Therefore, we reformulate the classic recommendation objective, Bayesian personalized ranking (Rendle et al., 2009), into log-likelihood rankings, which meet the requirement of generative modeling in diffusion models. We were also pleasantly surprised to discover that the one-negative-sample version of Prefer Diff's formulation, L_BPR-Diff, is indeed related to the well-known DPO (Rafailov et al., 2023), which stems from Reinforcement Learning from Human Feedback. To further validate the rationality of our proposed L_BPR-Diff, we intentionally aligned some aspects of our final formulation with DPO in terms of mathematical expression. However, there are significant distinctions between Prefer Diff and DPO.

Table 20: Comparison with DPO and Diffusion-DPO (Recall@5 / NDCG@5)

Models | Sports | Beauty | Toys
Dream Rec + DPO (β = 1) | 0.0031 / 0.0015 | 0.0067 / 0.0053 | 0.0030 / 0.0022
Dream Rec + DPO (β = 5) | 0.0036 / 0.0026 | 0.0053 / 0.0034 | 0.0036 / 0.0023
Dream Rec + DPO (β = 10) | 0.0019 / 0.0011 | 0.0075 / 0.0056 | 0.0046 / 0.0034
Dream Rec + Diffusion-DPO (β = 1) | 0.0129 / 0.0101 | 0.0308 / 0.0244 | 0.0324 / 0.0261
Dream Rec + Diffusion-DPO (β = 5) | 0.0132 / 0.0113 | 0.0321 / 0.0251 | 0.0340 / 0.0272
Dream Rec + Diffusion-DPO (β = 10) | 0.0133 / 0.0115 | 0.0281 / 0.0223 | 0.0345 / 0.0281
Prefer Diff | 0.0185 / 0.0147 | 0.0429 / 0.0323 | 0.0473 / 0.0367

First, Prefer Diff is an optimization objective specifically tailored to model user preferences in diffusion-based recommenders.
It is designed to align with the unique characteristics of the diffusion process, ensuring its effectiveness in recommendation tasks; we also replace the MSE loss with a cosine loss. Second, unlike DPO and Diffusion-DPO (Wallace et al., 2024), Prefer Diff incorporates multiple negative samples and proposes a theoretically guaranteed, efficient strategy to reduce the computational overhead of denoising caused by the increased number of negative samples in diffusion models. This innovation allows Prefer Diff to scale effectively while maintaining high performance, making it well-suited for large-negative-sample scenarios in recommendation tasks. Third, unlike DPO and Diffusion-DPO, Prefer Diff is trained in an end-to-end manner without relying on a reference model. In contrast, DPO and Diffusion-DPO require a two-stage process, where the first stage involves training a reference model. This significantly increases training overhead, which is often unacceptable in practical recommendation scenarios.

To further validate the aforementioned distinctions, we conduct experiments on three datasets using DPO and Diffusion-DPO. Specifically, we select β, a crucial hyperparameter in DPO, with values of 1, 5, and 10, and integrate each objective with Dream Rec for a fair comparison. The results are shown in Table 20. We can observe that Prefer Diff outperforms DPO and Diffusion-DPO by a large margin on all three datasets. This further validates the effectiveness of our proposed Prefer Diff, demonstrating that it is specifically tailored to model user preferences in diffusion-based recommenders.
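For concreteness, the flavour of this objective can be sketched as follows. This is a loose numpy illustration of a BPR-style ranking over denoising errors, with cosine error in place of MSE and multiple averaged negatives; it is not the paper's exact L_BPR-Diff (which involves the variational bound), and the function names and the balancing weight `lam` are assumptions made for illustration:

```python
import numpy as np

def cosine_error(pred, target):
    # 1 - cosine similarity: the denoising error used in place of MSE.
    cos = (pred * target).sum(-1) / (
        np.linalg.norm(pred, axis=-1) * np.linalg.norm(target, axis=-1))
    return 1.0 - cos

def bpr_diff_loss(pred_pos, e_pos, pred_negs, e_negs, beta=1.0, lam=0.5):
    """BPR reformulated over denoising errors: the model should
    reconstruct the positive item better than the averaged negatives."""
    err_pos = cosine_error(pred_pos, e_pos)                 # (B,)
    err_neg = cosine_error(pred_negs, e_negs).mean(axis=-1) # (B,)
    # -log sigmoid(beta * (err_neg - err_pos)), computed stably.
    ranking = np.logaddexp(0.0, beta * (err_pos - err_neg))
    # Balance generative reconstruction against preference modeling.
    return (lam * err_pos + (1.0 - lam) * ranking).mean()

rng = np.random.default_rng(0)
B, d, K = 4, 16, 3  # batch size, embedding dim, number of negatives
loss = bpr_diff_loss(rng.standard_normal((B, d)), rng.standard_normal((B, d)),
                     rng.standard_normal((B, K, d)), rng.standard_normal((B, K, d)))
print(float(loss))
```

Note that, as in DPO, the ranking term depends only on the gap between the positive and negative errors, but no reference model appears anywhere, matching the end-to-end training described above.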