# Learning to Rank in Generative Retrieval

Yongqi Li¹, Nan Yang², Liang Wang², Furu Wei², Wenjie Li¹
¹The Hong Kong Polytechnic University, ²Microsoft
liyongqi0@gmail.com, {nanya,wangliang,fuwei}@microsoft.com, cswjli@comp.polyu.edu.hk

Generative retrieval stands out as a promising new paradigm in text retrieval that aims to generate identifier strings of relevant passages as the retrieval target. This generative paradigm taps into powerful generative language models, distinct from traditional sparse or dense retrieval methods. However, learning to generate alone is insufficient for generative retrieval: the model learns to generate identifiers of relevant passages as an intermediate goal and then converts the predicted identifiers into the final passage rank list. The disconnect between the learning objective of autoregressive models and the desired passage ranking target leads to a learning gap. To bridge this gap, we propose a learning-to-rank framework for generative retrieval, dubbed LTRGR. LTRGR enables generative retrieval to learn to rank passages directly, optimizing the autoregressive model toward the final passage ranking target via a rank loss. The framework only requires an additional learning-to-rank training phase to enhance current generative retrieval systems and adds no burden to the inference stage. We conducted experiments on three public benchmarks, and the results demonstrate that LTRGR achieves state-of-the-art performance among generative retrieval methods. The code and checkpoints are released at https://github.com/liyongqi67/LTRGR.

## Introduction

Text retrieval is a crucial task in information retrieval and has a significant impact on various language systems, including search ranking (Nogueira and Cho 2019) and open-domain question answering (Chen et al. 2017). At its core, text retrieval involves learning a ranking model that assigns scores to documents given a query, a process known as learning to rank. This paradigm has remained popular for decades and has evolved into point-wise, pair-wise, and list-wise methods. Currently, the dominant implementation is the dual-encoder approach (Lee, Chang, and Toutanova 2019; Karpukhin et al. 2020), which encodes queries and passages into vectors in a semantic space and employs a list-wise loss to learn the similarities.

An emerging alternative to the dual-encoder approach is generative retrieval (Tay et al. 2022; Bevilacqua et al. 2022), which employs autoregressive language models to generate identifier strings of passages as an intermediate target for retrieval. An identifier is a distinctive string that represents a passage, such as a Wikipedia title for a Wikipedia passage. The predicted identifiers are then mapped to ranked passages as the retrieval results. In this manner, generative retrieval treats passage retrieval as a standard sequence-to-sequence task, maximizing the likelihood of the passage identifiers given the input query, distinct from previous learning-to-rank approaches.
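Formally, this sequence-to-sequence view scores an identifier $I = (i_1, \ldots, i_l)$ for a query $q$ with the standard autoregressive factorization (shown here for concreteness; the notation matches the generation loss defined in the Method section):

$$p_\theta(I \mid q) = \prod_{j=1}^{l} p_\theta(i_j \mid q; I_{<j}),$$

where $I_{<j}$ denotes the identifier tokens preceding $i_j$ and $\theta$ denotes the model parameters.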
There are two main approaches to generative retrieval regarding the identifier type. One approach, exemplified by the DSI system and its variants (Tay et al. 2022), assigns a unique numeric ID to each passage, so that predicted numeric IDs correspond to passages on a one-to-one basis. However, this approach requires memorizing the mappings from passages to their numeric IDs, making it ineffective on large corpora. The other approach (Bevilacqua et al. 2022) takes text spans from the passages as identifiers. While text span-based identifiers remain effective on a large-scale corpus, they no longer correspond uniquely to passages, so a heuristic function is employed to rank all the passages associated with the predicted identifiers. Following this line, Li et al. proposed multiview identifiers, which have achieved comparable results on commonly used benchmarks with large-scale corpora. In this work, we follow the latter approach to generative retrieval.

Despite its rapid development and substantial potential, generative retrieval remains constrained. It relies on a heuristic function to convert predicted identifiers into a passage rank list, a function that requires sensitive hyperparameters and sits outside the learning framework. More importantly, generative retrieval generates identifiers as an intermediate goal rather than directly ranking candidate passages. This disconnect between the learning objective of generative retrieval and the intended passage ranking target creates a learning gap. Consequently, even when the autoregressive model becomes proficient at generating accurate identifiers, the predicted identifiers cannot guarantee an optimal passage ranking order.

Tackling these issues is challenging, as they are inherent to the novel generative paradigm in text retrieval. However, a silver lining emerges from the long evolution of the learning-to-rank paradigm, which has demonstrated adeptness in optimizing the passage ranking objective. Inspired by this progress, we propose to enhance generative retrieval by integrating it with the classical learning-to-rank paradigm. Our objective is for generative retrieval not merely to generate fragments of passages but to directly acquire the skill of ranking passages, bridging the gap between the learning focus of generative retrieval and the envisaged passage ranking target.

In pursuit of this goal, we propose a learning-to-rank framework for generative retrieval, dubbed LTRGR. LTRGR involves two distinct training phases, as depicted in Figure 1: the learning-to-generate phase and the learning-to-rank phase. In the initial learning-to-generate phase, we train an autoregressive model via the generation loss, consistent with prior generative retrieval methods: the model takes queries as input and outputs the identifiers of target passages. Subsequently, the queries from the training dataset are fed into the trained model to predict associated identifiers, and these predicted identifiers are mapped to a passage rank list via a heuristic function. The subsequent learning-to-rank phase further trains the autoregressive model using a rank loss over this passage rank list, optimizing the model toward an optimal passage ranking order. LTRGR folds the heuristic mapping into the learning process, rendering the whole retrieval pipeline end-to-end and trained with the objective of passage ranking. During inference, we use the trained model to retrieve passages as in typical generative retrieval, so the LTRGR framework only requires an additional training phase and adds no burden to the inference stage.
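To make the learning-to-rank phase concrete, below is a minimal PyTorch-style sketch, not the paper's actual implementation. It assumes a margin-based pair-wise rank loss over passages from the heuristic rank list, and it assumes a passage is scored by aggregating the model's likelihoods of the predicted identifiers matched to it; the helper names (`identifier_log_likelihood`, `passage_score`, `rank_loss`) are illustrative.

```python
import torch
import torch.nn.functional as F

def identifier_log_likelihood(model, tokenizer, query, identifier):
    """Sum of token log-probabilities of one identifier given the query,
    computed with teacher forcing on a seq2seq autoregressive model."""
    inputs = tokenizer(query, return_tensors="pt")
    labels = tokenizer(identifier, return_tensors="pt").input_ids
    logits = model(**inputs, labels=labels).logits        # (1, L, vocab)
    log_probs = F.log_softmax(logits, dim=-1)
    token_ll = log_probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    return token_ll.sum()

def passage_score(model, tokenizer, query, matched_identifiers):
    # Illustrative aggregation: score a passage by summing the likelihoods
    # of the predicted identifiers that occur in it.
    scores = [identifier_log_likelihood(model, tokenizer, query, ident)
              for ident in matched_identifiers]
    return torch.stack(scores).sum()

def rank_loss(positive_score, negative_score, margin=1.0):
    # Margin-based pair-wise loss: require the relevant passage to outscore
    # an irrelevant one from the rank list by at least `margin`.
    return F.relu(margin - positive_score + negative_score)
```

Because the passage scores are computed from the model's own token likelihoods, gradients from the rank loss flow back into the autoregressive model, which is what allows the ranking objective to reshape generation.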
We evaluate the proposed method on three widely used datasets, and the results demonstrate that LTRGR achieves the best performance in generative retrieval. The key contributions are summarized as follows:

- We introduce the concept of incorporating learning to rank within generative retrieval, effectively aligning the learning objective of generative retrieval with the desired passage ranking target.
- LTRGR establishes a connection between the generative retrieval paradigm and the classical learning-to-rank paradigm. This connection opens doors for further advances in this area, such as exploring diverse rank loss functions and negative sample mining.
- With only an additional learning-to-rank training phase and no added burden at inference, LTRGR achieves state-of-the-art performance in generative retrieval on three widely used benchmarks.

## Related Work

### Generative Retrieval

Generative retrieval is an emerging retrieval paradigm that generates identifier strings of passages as the retrieval target. Instead of generating entire passages, this approach uses identifiers to reduce the amount of useless information and make it easier for the model to memorize and learn (Li et al. 2023b). Different types of identifiers have been explored in various search scenarios, including titles (Web URLs), numeric IDs, and substrings (De Cao et al. 2020; Li et al. 2023a; Tay et al. 2022; Bevilacqua et al. 2022; Ren et al. 2023). Li et al. (2023b) proposed multiview identifiers that represent a passage from different perspectives to enhance generative retrieval and achieve state-of-the-art performance. Despite its potential advantages, generative retrieval still has issues inherent to this new paradigm, as discussed in the previous section. Our work aims to address these issues by combining generative retrieval with the learning-to-rank paradigm.

### Learning to Rank

Learning to rank refers to machine learning techniques for training models on ranking tasks (Li 2011). It has been developed over several decades and is typically applied in document retrieval. Because it can derive large-scale training data from search logs and automatically create the ranking model, learning to rank is one of the key technologies for modern web search. Learning-to-rank approaches can be categorized into point-wise (Cossock and Zhang 2006; Li, Wu, and Burges 2007; Crammer and Singer 2001), pair-wise (Freund et al. 2003; Burges et al. 2005), and list-wise (Cao et al. 2007; Xia et al. 2008) methods according to the learning target. The point-wise and pair-wise approaches transform the ranking problem into classification and pair-wise classification, respectively, and thus ignore the group structure of ranking. The list-wise approach addresses ranking more directly by taking entire ranking lists as instances in both learning and prediction; it maintains the group structure of ranking, and ranking evaluation measures can be incorporated more directly into its loss functions, as illustrated below.
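As a quick, self-contained illustration of these categories (an example of ours, not drawn from the paper), the pair-wise view penalizes mis-ordered score pairs, while the list-wise view treats the whole candidate list as a single instance:

```python
import torch
import torch.nn.functional as F

scores = torch.tensor([2.1, 0.3, -0.5])  # model scores for three candidates
relevant = 0                              # index of the relevant candidate

# Pair-wise: hinge loss over (relevant, irrelevant) score pairs.
pairwise_loss = sum(F.relu(1.0 - scores[relevant] + scores[j])
                    for j in range(len(scores)) if j != relevant)

# List-wise: softmax cross-entropy over the whole list of scores,
# keeping the group structure of the ranking intact.
listwise_loss = F.cross_entropy(scores.unsqueeze(0), torch.tensor([relevant]))
```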
## Method

Given a query text $q$, the retrieval system must retrieve a list of passages $\{p_1, p_2, \ldots, p_n\}$ from a corpus $C$, where both queries and passages consist of sequences of text tokens. As illustrated in Figure 1, LTRGR involves two training stages: learning to generate and learning to rank. In this section, we first provide an overview of how a typical generative retrieval system works, i.e., learning to generate, and then describe our learning-to-rank framework within the context of generative retrieval.

### Learning to Generate

We first train an autoregressive language model using the standard sequence-to-sequence loss. In practice, we follow the current state-of-the-art generative retrieval method, MINDER (Li et al. 2023b), to train the autoregressive language model; please refer to MINDER for further details.

**Training.** We develop an autoregressive language model, referred to as AM, to generate multiview identifiers. The model takes as input the query text and an identifier prefix, and produces a corresponding identifier of the relevant passage as output. The identifier prefix can be one of three types, "title", "substring", or "pseudo-query", representing the three different views; the target text for each view is the title, a random substring, or a pseudo-query of the target passage, respectively. During training, the samples of the three views are randomly shuffled to train the autoregressive model.

Figure 1: This illustration depicts our proposed learning-to-rank framework for generative retrieval, which involves two stages of training. (a) Learning to generate: LTRGR first trains an autoregressive model via the generation loss, as in a typical generative retrieval system. (b) Learning to rank: LTRGR continues training the model via the passage rank loss, which aligns generative retrieval training with the desired passage ranking target.

For each training sample, the objective is to minimize the sum of the negative log-likelihoods of the tokens $\{i_1, \ldots, i_j, \ldots, i_l\}$ of a target identifier $I$, whose length is $l$. The generation loss is formulated as

$$\mathcal{L}_{gen} = -\sum_{j=1}^{l} \log p_\theta(i_j \mid q; I_{<j}),$$

where $I_{<j}$ denotes the identifier tokens preceding $i_j$.
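Below is a minimal sketch of one learning-to-generate step, assuming a Hugging Face BART-style seq2seq model (MINDER builds on BART, but the prefix format, data fields, and function name here are illustrative assumptions, not MINDER's literal interface):

```python
import random
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

def learning_to_generate_step(query, views):
    """One training step on a single (query, passage) pair.

    `views` maps each view name to its target identifier text, e.g.
    {"title": ..., "substring": ..., "pseudo-query": ...}. One view is
    sampled per step, mirroring the random shuffling of view samples.
    """
    prefix, target_identifier = random.choice(list(views.items()))
    source = f"{query} || {prefix}"  # query plus the identifier-prefix hint
    inputs = tokenizer(source, return_tensors="pt")
    labels = tokenizer(target_identifier, return_tensors="pt").input_ids
    # The built-in seq2seq loss is the token-level cross-entropy, i.e.
    # the negative log-likelihood L_gen above (averaged over tokens).
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```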