Journal of Machine Learning Research 25 (2024) 1-35. Submitted 4/23; Revised 3/24; Published 4/24.

Generative Adversarial Ranking Nets

Yinghua Yao (eva.yh.yao@gmail.com), Yuangang Pan (yuangang.pan@gmail.com), Jing Li (j.lee9383@gmail.com), Ivor W. Tsang (ivor.tsang@gmail.com)
Center for Frontier AI Research, A*STAR, Singapore, and Institute of High Performance Computing, A*STAR, Singapore

Xin Yao (xinyao@ln.edu.hk)
Department of Computing and Decision Sciences, Lingnan University, Hong Kong

Editor: Matthew Hoffman

Abstract

We propose a new adversarial training framework, Generative Adversarial Ranking Nets (GARNet), to learn from user preferences among a list of samples so as to generate data meeting user-specific criteria. In brief, GARNet consists of two modules: a ranker and a generator. The generator fools the ranker into placing generated samples at the top, while the ranker learns to rank generated samples at the bottom. Meanwhile, the ranker learns to rank samples with respect to the property of interest by training on preferences collected over real samples. The adversarial ranking game between the ranker and the generator enables an alignment between the generated data distribution and the user-preferred data distribution, with theoretical guarantees and empirical verification. Specifically, we first prove that when training with full preferences on a discrete property, the learned distribution of GARNet rigorously coincides with the distribution specified by the given score vector based on user preferences.
The theoretical results are then extended to partial preferences on a discrete property and further generalized to preferences on a continuous property. Meanwhile, numerous experiments show that GARNet can retrieve the distribution of user-desired data based on full/partial preferences in terms of various properties of interest (i.e., discrete/continuous properties, single/multiple properties). Code is available at https://github.com/EvaFlower/GARNet.

Keywords: Generative Adversarial Network, Controllable Generation, User Preferences, Adversarial Ranking, Relativistic f-Divergence

(The first version of this work was done when the author was a Ph.D. student at Southern University of Science and Technology (SUSTech), China, and University of Technology Sydney (UTS), Australia. *The corresponding author.)

©2024 Yinghua Yao, Yuangang Pan, Jing Li, Ivor W. Tsang and Xin Yao. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v25/23-0461.html.

1. Introduction

Generative Adversarial Networks (GANs) (Goodfellow et al., 2014; Arjovsky et al., 2017; Huang et al., 2022) are a popular class of generative models that approximate a data distribution via adversarial training between two networks: a generator learns to generate data that resembles real data, whereas a discriminator learns to distinguish real data from synthetic data produced by the generator. Although unconditional generative modeling presents an intriguing technical challenge, it offers no control over what is generated and thus has limited practical relevance. In contrast, controllable generative models can generate data matching user-desired properties, meeting the needs of many real-world applications (Engel et al., 2018).
Recent progress in this field has been driven by using predefined attributes/classes, or an evaluator that can assess specific properties of samples, to direct the generation process of GANs (Mirza and Osindero, 2014; Asokan and Seelamantula, 2020; Engel et al., 2018; Gupta and Zou, 2019; Samanta et al., 2020). However, in many real-world applications, users may not be able to explicitly articulate the data they are interested in via desired class/attribute labels or high values from a property evaluator (Engel et al., 2018). Instead, more naturally, users often implicitly describe their preferences by ranking two or more samples in terms of a certain property of interest (Fürnkranz and Hüllermeier, 2017). In particular, GANs are supposed to generate more (or even exclusively) user-desired data (i.e., data for which users have high preferences) by learning from preferences, which has many promising applications. Here are some examples. (1) In image retrieval, visual comparisons match the user's mental model of the desired content more precisely than conventional textual feedback (Yu and Kovashka, 2020). Thus, data generated from preferences is a good representation of user needs and can be used to retrieve user-desired images from a database. (2) In drug design, experts may not be able to clearly describe the structure of desired drugs for target diseases but can express preferences over candidate drug structures based on their expertise or lab experiments. The drug discovery process can then be formulated as learning the user-desired drug data distribution from these preferences. (3) In reinforcement learning (RL) without an off-the-shelf reward function, it is feasible to elicit human preferences between pairs of agent trajectories (Christiano et al., 2017; Wirth et al., 2017).
Then, the generative model can produce user-desired demonstrations by learning from the preferences, and the RL task can be solved as imitation learning on these generated demonstrations. A few works (Yu and Kovashka, 2020; Yao et al., 2022) have applied user preferences to direct the generation of GANs. However, they require heavy interaction with humans to obtain preferences on generated samples during training in order to generate user-desired data. Moreover, these studies only considered the simplest pairwise preferences, failing to handle more general partial preferences. In this study, we consider teaching GANs to learn a user-preferred data distribution directly from a more general form of user preferences, i.e., a collection of partial preferences over the training samples, without extra annotations on generated samples during training. Incorporating such preferences into the training of GANs poses several challenges. First, learning a user-preferred data distribution means that distribution alignment is imposed between a portion of the training data and the generated data, whereas GANs match the generated data distribution to the whole training data distribution (Goodfellow et al., 2014; Mao et al., 2017). Therefore, the original GAN framework cannot achieve the goal of this study. Second, the discriminator in GANs is designed to distill label information from individual samples (Goodfellow et al., 2014; Mirza and Osindero, 2014; Odena et al., 2017), so it cannot deal with preference data defined over a list of samples, which is usually formulated as a listwise ranking problem (Lin, 2010). Third, the key to eliciting adversarial classification in GANs is to assign generated samples fake labels when training the discriminator and real labels when training the generator, respectively (Goodfellow et al., 2014; Dai et al., 2017).
Nevertheless, constructing such adversarial supervision for generated samples in the context of user preferences (a.k.a. listwise ranking (Cao et al., 2007; Pan et al., 2022)) has not been explored. To tackle these challenges, we propose a new distribution matching mechanism between the generated data and the high-ranked data that still takes all training data into account. Meanwhile, in order to learn from user preferences, GANs need to handle a list of samples. Motivated by learning to rank (Cao et al., 2007), we tailor-design a ranker to replace the discriminator in GANs. We thus reform the adversarial learning framework of GANs and propose Generative Adversarial Ranking Nets (GARNet) to learn from user preferences. Specifically, GARNet consists of a ranker and a generator. While a vanilla discriminator performs a binary classification task on an individual sample, our proposed ranker learns to rank samples within partial ranking lists with respect to the property of interest. In addition, the ranker assigns generated samples to the lowest position. As the generator targets the user-preferred distribution, it is trained to synthesize samples that fool the ranker into ranking them above the real samples. Competition between the ranker and the generator drives both modules to improve until the generator converges to the user-preferred data distribution. The main contributions are summarized as follows: We propose Generative Adversarial Ranking Nets (GARNet) for data generation guided by user preferences. In particular, we design an adversarial ranking game between a ranker and a generator, which enables a distribution alignment between the generated data and a portion of the training data (i.e., user-desired data) (Corollary 7).
We prove that the objective function of the ranker in GARNet defines a relativistic f-divergence between the user-preferred data distribution and the generated data distribution, while the generator minimizes this divergence to learn the user-preferred data distribution. Meanwhile, we prove that the distribution learned by GARNet is rigorously determined by the score vector given by the users (Proposition 5). We empirically show that GARNet can learn a user-preferred data distribution (determined by the given score vector) from preferences in terms of discrete and continuous properties. We further show that GARNet retrieves the distribution of user-desired data from partial preferences better than various baselines. The potential of GARNet is also validated in the application of imbalanced learning.

2. Background and Problem Statement

In this section, we first review existing work that applies labels or property evaluators to control generation with desired properties. Then we formulate our problem setting: controllable generation by preferences.

2.1 Property-oriented Generation by Labels/Evaluators

Many works (Mirza and Osindero, 2014; Odena et al., 2017; Miyato and Koyama, 2018; Sohn et al., 2015; Li et al., 2020; Pumarola et al., 2020; Ho and Salimans, 2022; Ding et al., 2020; Wang et al., 2021; Triess et al., 2022) proposed conditional variants of generative models that introduce conditioning variables C to enable user control over generation, i.e., modeling the conditional distribution p(x|C). However, a condition C built from discrete class labels and attribute labels (Zhu et al., 2017) restricts user control (discussed in Section 2.2.1).
Conditions involving continuous labels allow fine-grained control, but they require complete knowledge about the properties, implying that there exists a universal evaluation criterion to annotate data with absolute values in terms of the properties, which can be hard to access in some real-world applications (Christiano et al., 2017). A few works (Engel et al., 2018; Gupta and Zou, 2019; Samanta et al., 2020) proposed to apply an accessible property evaluator h(x), which provides evaluations of data samples w.r.t. a specified property, to control the generation of generative models. However, such evaluators may not exist in real-world applications (Christiano et al., 2017), since they also require complete knowledge about the properties. In addition, optimizing for desired properties requires extensive feedback loops in which all generated samples are sent to the evaluator h(x) for evaluation (Gupta and Zou, 2019), which incurs heavy cost, especially when the evaluation process is expensive. Further, these methods must carefully balance the goals of generation quality and of producing data with desired properties (Engel et al., 2018; Samanta et al., 2020).

2.2 Our Problem: Property-oriented Generation by Preferences

Let $X = \{x_n\}_{n=1}^N$ denote a training dataset with N samples and $S = \{s_m\}_{m=1}^M$ denote a collection¹ of M preferences defined on subsets of X. In particular, each preference $s \in S$ is an ordered list, namely,

$$s := s^1 > s^2 > \cdots > s^l, \quad s^i \in X, \; i = 1, 2, \ldots, l, \qquad (1)$$

where $2 \le l \le N$ denotes the number of samples contained in s; l usually varies across preferences. Note that with $l \ll N$, annotating preferences relies only on partial knowledge about the properties (Fürnkranz and Hüllermeier, 2017), because it involves only comparisons within small subsets of the dataset.
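As a minimal sketch of this data format (all names and sizes here are hypothetical illustrations, not the paper's code), a collection S of partial preferences can be stored as ordered index lists over the training set:

```python
import numpy as np

# A toy rendering of Section 2.2: each preference is an ordered list of
# indices into the training set X, with earlier entries preferred over
# later ones. N, X, sample_preference and S are illustrative assumptions.
rng = np.random.default_rng(0)
N = 100                                   # number of training samples
X = rng.normal(size=(N, 8))               # toy training data

def sample_preference(l, n=N, rng=rng):
    """Draw a partial preference of length l as an ordered index list."""
    assert 2 <= l <= n                    # a preference compares at least two samples
    return list(rng.choice(n, size=l, replace=False))

# A collection S of M = 20 partial preferences with varied lengths
# (2 <= l << N), so annotating them only requires comparisons within
# small subsets of X.
S = [sample_preference(l=rng.integers(2, 6)) for _ in range(20)]
print(len(S), min(map(len, S)), max(map(len, S)))
```

Because each list touches only a handful of samples, this matches the claim that annotation needs only partial knowledge of the property.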
Our target is to learn a generative model $P_g(x)$ that equals the user-preferred data distribution $P_u(x)$, from the partial preferences S.

1. Regarding data collection, the preferences S are collected among multiple users following the practice in rank aggregation (Lin, 2010; Saquil et al., 2018; Pan et al., 2018).

Table 1: Main mathematical notations in this paper. Elements in a set (e.g., X) are indexed with subscripts (e.g., $x_i$), whereas objects ranked at the i-th position of a ranking list s are indexed with superscripts ($s^i$).

| Notation | Explanation |
| --- | --- |
| $X$ | training samples $X = \{x_n\}_{n=1}^N$ |
| $Y$ | class labels $Y = \{y_1, y_2, \ldots, y_T\}$ endowed with an order $y_1 > y_2 > \cdots > y_T$ |
| $O$ | score space for a continuous property |
| $y(x)$ | function that maps sample x to its class label |
| $o(x)$ | function that maps sample x to its underlying ground-truth ranking score |
| $x_i > x_j$ | sample $x_i$ is preferred over sample $x_j$, $x_i, x_j \in X$ |
| $s$ | preference, $s := s^1 > s^2 > \cdots > s^l$, $s^i \in X$ |
| $S$ | a collection of preferences $S = \{s_m\}_{m=1}^M$ over training data X |
| $\pi(s)$ | ground-truth score vector for preference s |
| $s^{(R)}$ | target preference to train the ranker, i.e., $s^{(R)} := s^1 > \cdots > s^l > x_g$ |
| $s^{(G)}$ | target preference to train the generator, i.e., $s^{(G)} := x_g > s^1 > \cdots > s^l$ |
| $P_u$ | user-preferred data distribution |
| $P_g$ | distribution of the generative model |
| $P_i$ | distribution of $X_i = \{x \mid y(x) = y_i\}$ |
| $D_f$ | relativistic f-divergence |
| $R$ | ranker |
| $G$ | generator |

To be specific, $P_u(x)$ should allocate high density to globally high-ranked data and low (or even zero) density to globally low-ranked data². For example, as shown in Fig. 1, we assume that shoe images with different strengths of the "open" attribute are distributed evenly. The collected pairwise preferences reveal that users prefer more open shoes. Accordingly, the generative model is expected to learn a distribution that assigns larger density to shoe images with larger "open" values.
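The target $P_u$ can be pictured with a toy reweighted mixture, anticipating the mixture form $P_u = \sum_i q_i P_i$ used later in Theorem 4. The class count and weights below are made-up illustrations, not values from the paper:

```python
import numpy as np

# A toy picture of P_u: a mixture over T = 4 ordered classes whose
# weights q decrease with rank, so the user-preferred distribution puts
# more mass on globally high-ranked data. All numbers are illustrative.
rng = np.random.default_rng(1)
T = 4
q = np.array([0.6, 0.25, 0.1, 0.05])      # q_1 > q_2 > ... > q_T, sums to 1
classes = rng.choice(T, size=10_000, p=q) # draw a class index, then x ~ P_i
counts = np.bincount(classes, minlength=T) / 10_000
print(counts)                             # empirical proportions track q
```

Sampling a class index first and then a sample from that class is the standard two-stage view of a mixture; the adversarial game later makes the generator realize such a reweighting implicitly.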
Note that the generative model has the advantage of generating novel images different from the training data (Song and Ermon, 2019).

Figure 1: Illustration of user-preferred data distribution learning guided by user preferences (on the "open" attribute); the panels show training data and generated data.

2. User preferences help to derive a global ranking over all the data (Cao et al., 2007; Lu and Boutilier, 2011). Global high-ranked (low-ranked) data denote those data ranked high (low) in the global ranking.

2.2.1 Controlled by Preferences vs. Conditioned on Labels

Class labels or attribute labels are argued to have limited descriptive capacity (Parikh and Grauman, 2011), since such category-level labels cannot capture intra-category differences. In contrast, user preferences can describe fine-grained information (Parikh and Grauman, 2011; Chen et al., 2013; Raman and Joachims, 2014). For example, people feel conflicted when asked to decide whether a face image is smiling or not, as there always exist many images that are difficult to categorize, apart from a small number of obviously smiling and non-smiling images. Instead, people can compare a set of images and rank them by the strength of the smiling attribute. On the other hand, since generation conditioned on class labels is dominated by classes with sufficient training observations, a class with limited samples would be overlooked by the generative model. Generation controlled by preferences, in contrast, can emphasize the modeling of preferred classes. We will empirically show that preference-guided distributional generation can serve as a remedy for extremely imbalanced class learning, where generation conditioned on labels fails (see Section 7.4).

3. GARNet for Preferences w.r.t.
a Discrete Property

First of all, we consider a scenario in which the ranking relation among training samples is based on a discrete property. In particular, each training sample belongs to one of a finite set of classes that possess a natural order.

Definition 1 (Preferences w.r.t. a discrete property) Let $X = \{x_n\}_{n=1}^N$ be a set of training instances and $Y = \{y_1, y_2, \ldots, y_T\}$ be a set of class labels endowed with an order $y_1 > y_2 > \cdots > y_T$, where T is the number of ordered classes. Assume each training instance $x_i$ is assigned a label $y(x_i) \in Y$, where $y(\cdot)$ is a function that outputs the label of a sample. The preference over a group of samples $\{x_{i_1}, x_{i_2}, \ldots\} \subseteq X$ is defined as

$$x_{i_1} > x_{i_2} > x_{i_3} > \cdots, \quad \text{if } y(x_{i_1}) > y(x_{i_2}) > y(x_{i_3}) > \cdots. \qquad (2)$$

We ignore preference lists containing samples with the same labels, since such lists can simply be transformed into preferences of the form of Eq. (2) by removing redundant samples (Pan et al., 2022).

Example. Suppose users prefer small digits on the MNIST dataset. Then for digits 0 to 9, the order would be 0 > 1 > 2 > ... > 9, according to which users express their preferences over MNIST images.

In the remainder of this section, we propose a framework called Generative Adversarial Ranking Nets (GARNet) for distribution learning guided by the above-mentioned preferences (problem definition in Section 2.2). GARNet consists of a ranker R (parameterized by $\theta_R$) and a generator G (parameterized by $\theta_G$). In particular, an adversarial ranking process is defined between the generator and the ranker: they are trained against each other on a ranking goal defined between the real data and the generated data (see Fig. 2). The competition between the generator and the ranker drives the generated distribution $P_g(x)$ (abbreviated $P_g$) to align with the user-preferred data distribution $P_u(x)$ (abbreviated $P_u$).
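Definition 1 can be sketched in code (a toy illustration of ours, not the paper's implementation; the helper name and label values are hypothetical): a group of labeled samples induces a preference by sorting on the label order and dropping ties, in line with the convention of removing samples with equal labels.

```python
# Toy sketch of Definition 1 with the MNIST-style order "smaller digit
# preferred". preference_from_labels and the label values are made up.
def preference_from_labels(indices, labels):
    """Return sample indices ordered by preference (smaller label first)."""
    seen, pref = set(), []
    for i in sorted(indices, key=lambda i: labels[i]):
        if labels[i] not in seen:         # drop redundant equal-label samples
            seen.add(labels[i])
            pref.append(i)
    return pref

labels = [3, 0, 7, 0, 9, 1]               # hypothetical digit labels of 6 samples
print(preference_from_labels([0, 1, 2, 3, 4, 5], labels))
```

The returned list reads left to right as $x_{i_1} > x_{i_2} > \cdots$ in the sense of Eq. (2).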
Figure 2: Framework of GARNet. (Components: random noise, a preference over a subset of X, a generated sample, the target preference for the ranker, and the target preference for the generator.)

3.1 Given Full Preferences

We open the description of GARNet with the case of a collection of full preferences $S = \{s_m\}_{m=1}^M$. That is, the length l of each preference equals the number of ordered classes T. Namely,

$$s := s^1 > s^2 > \cdots > s^l, \quad l = T, \quad \text{where } s^i \in X, \; i = 1, 2, \ldots, l. \qquad (3)$$

A full preference s includes samples from all ordered classes. Consequently,

$$y(s^i) = y_i, \quad \forall s \in S, \; i = 1, 2, \ldots, T. \qquad (4)$$

Denoting by $P_i$ the distribution of the data $X_i = \{x \mid y(x) = y_i, x \in X\}$, we have $s^i \sim P_i$.

3.1.1 Listwise Ranking

The ranker R learns a ranking function from the preferences S. We employ ListNet (Cao et al., 2007), a cross-entropy-based learning-to-rank (L2R) objective that maximizes:

$$L_{L2R}(\pi(s), R(s)) = \sum_{i=1}^{l} \sigma(\pi(s))_i \log \sigma(R(s))_i. \qquad (5)$$

With a slight abuse of notation, $s = (s^1, s^2, \ldots, s^l)$. $\pi(s)$ denotes the ground-truth score vector for the preference s, which can be given explicitly or implicitly by humans (Cao et al., 2007). $R(\cdot)$ denotes the score vector for the preference s predicted by the ranker R. Specifically, the ranker R acts as a nonlinear feature extractor with a scalar output. Given the feature matrix $w = [w_1, w_2, \ldots, w_l]$ of l samples, the ranker outputs a score vector $R(w) = [R(w_1), R(w_2), \ldots, R(w_l)]$, where $R(w_i) \in \mathbb{R}$. $\sigma(\cdot)$ is a softmax function that takes a list of scalar values $r = [r_1, r_2, \ldots, r_l]$ as input, i.e., $\sigma(r)_i = e^{r_i} / \sum_{j=1}^{l} e^{r_j}$. $\sigma(\pi(s))_i$ is the top-1 probability of object $s^i$, i.e., the probability of $s^i$ being ranked on top given the scores $\pi(s)$ of all objects. Similarly, $\sigma(R(s))_i$ gives the top-1 probability of object $s^i$ given the score vector R(s).
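Eq. (5) can be checked with a small NumPy sketch (function names are ours; this is not the paper's code):

```python
import numpy as np

# Numeric sketch of the ListNet objective of Eq. (5): the (negative)
# cross-entropy between the top-1 probabilities induced by the ground-truth
# scores pi(s) and by the ranker's predicted scores R(s); it is maximized.
def softmax(r):
    e = np.exp(r - np.max(r))             # numerically stabilized sigma(.)
    return e / e.sum()

def listnet_top1_ce(pi_s, r_s):
    """L_L2R(pi(s), R(s)) = sum_i sigma(pi(s))_i * log sigma(R(s))_i."""
    return float(np.sum(softmax(pi_s) * np.log(softmax(r_s))))

pi = np.array([3.0, 2.0, 1.0])            # ground-truth score vector pi(s)
print(listnet_top1_ce(pi, pi))            # best when sigma(R(s)) = sigma(pi(s))
print(listnet_top1_ce(pi, pi[::-1]))      # worse when the predicted order flips
print(listnet_top1_ce(pi, pi + 7.0))      # unchanged: softmax is shift-invariant
```

The shift-invariance shown in the last line is the same translation invariance of softmax invoked later in the proof of Proposition 5.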
3.1.2 Generative Adversarial Ranking

Given a preference s and a generated sample $x_g = G(z) \sim P_g(x)$, where $z \sim Z$ is random noise, the ranker R incorporates the generated sample $x_g$ into the ranking list s and outputs a new ranking score vector $R(s, x_g)$. Motivated by the vanilla GAN (Goodfellow et al., 2014), which designs a target real/fake label for each generated sample to trigger the adversarial game between the discriminator and the generator, we construct a target preference for the generated list of samples consisting of s and $x_g$, i.e., $(s, x_g)$, as adversarial supervision. To promote the competition between the ranker R and the generator G, we design the target preference in two different ways, namely,

target preference for R: $s^{(R)} := s^1 > \cdots > s^T > x_g$, (6a)
target preference for G: $s^{(G)} := x_g > s^1 > \cdots > s^T$. (6b)

The generator fools the ranker into grading the generated samples as the best (Eq. (6b)), while the ranker learns to rank them at the lowest position (Eq. (6a)).

Algorithm 1 Generative Adversarial Ranking Nets
1: Input: training data $X = \{x_n\}_{n=1}^N$, user preferences $S = \{s_m\}_{m=1}^M$, batch size B, score vector $\pi$, ranker R and generator G.
2: Output: generator G for the user-preferred data distribution, i.e., $P_g(x) = P_u(x)$.
3: repeat
4:   Sample a mini-batch of preferences $\{s_i\}_{i=1}^B$ from S.
5:   Get fake samples $\{x_{g_i}\}_{i=1}^B$ from the generator G, i.e., $x_{g_i} = G(z_i)$ where $z_i$ is random noise.
6:   Following Eq. (6a), construct target preferences $\{s_i^{(R)}\}_{i=1}^B$ for the ranker R.
7:   Train the ranker R according to Eq. (8a).
8:   Following Eq. (6b), construct target preferences $\{s_i^{(G)}\}_{i=1}^B$ for the generator G.
9:   Train the generator G according to Eq. (8b).
10: until convergence

We then define the objectives for the ranker R and the generator G as follows, respectively:

$$\sup_{R: X \to \mathbb{R}} \mathbb{E}_{s \sim S, \, x_g \sim P_g} \left[ L_{L2R}\left(\pi(s^{(R)}), R(s^{(R)})\right) \right], \qquad (7a)$$

$$\sup_{G: Z \to X} \mathbb{E}_{s \sim S, \, x_g \sim P_g} \left[ L_{L2R}\left(\pi(s^{(G)}), R(s^{(G)})\right) \right].$$
(7b)

Thus, the training losses for the ranker and the generator on a mini-batch can be formulated as follows, respectively:

$$\frac{1}{B} \sum_{i=1}^{B} L_{L2R}\left(\pi(s_i^{(R)}), R(s_i^{(R)})\right), \qquad (8a)$$

$$\frac{1}{B} \sum_{i=1}^{B} L_{L2R}\left(\pi(s_i^{(G)}), R(s_i^{(G)})\right), \qquad (8b)$$

where B is the batch size. The training algorithm is summarized in Algorithm 1.

GAN theories usually show that GANs estimate a divergence between the generated data distribution and the real data distribution, to support that they are good estimators of the target data distribution. Similarly, we will show that GARNet is a good estimator of the user-preferred data distribution by deriving a new divergence. For most GAN variants, the objective function of the discriminator at optimum is a divergence (Jolicoeur-Martineau, 2020; Goodfellow et al., 2014; Arjovsky et al., 2017). In the following, we prove that training the ranker in GARNet (Eq. (7a)) is equivalent to estimating a divergence between the user-preferred data distribution $P_u(x)$ and the generated data distribution $P_g(x)$. The generator in GARNet (Eq. (7b)) is trained to minimize this divergence so as to achieve $P_g(x) = P_u(x)$.

Definition 2 (Divergence) Let $P \in \mathcal{S}$ and $Q \in \mathcal{S}$ be probability distributions, where $\mathcal{S}$ is the set of all probability distributions with common support. A function $D: (\mathcal{S}, \mathcal{S}) \to [0, +\infty)$ is a divergence if it respects the following two conditions:

$$D(P, Q) \ge 0; \qquad D(P, Q) = 0 \iff P = Q. \qquad (9)$$

Theorem 3 (Relativistic f-divergence (Jolicoeur-Martineau, 2020)) Let $f: \mathbb{R} \to \mathbb{R}$ be a concave function such that $f(0) = 0$, f is differentiable at 0, $f'(0) \ne 0$, $\sup_v f(v) = M > 0$, and $\arg\sup_v f(v) > 0$. Assume P and Q are two distributions with support X. Then

$$D_f(P, Q) = \sup_{C: X \to \mathbb{R}} \mathbb{E}_{x \sim P, \, \tilde{x} \sim Q} \left[ f(C(x) - C(\tilde{x})) \right] \qquad (10)$$

is a relativistic f-divergence.

Theorem 3 shows that the optimal discriminator in relativistic GANs (Jolicoeur-Martineau, 2019) estimates a relativistic f-divergence between the real data distribution and the generated data distribution.
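To make the adversarial ranking of Eqs. (6)-(8) concrete, the two target preferences can be assembled and scored with stand-in scalar scores (a sketch of ours with no networks involved; the paper pins down $\pi(s^{(R)})$ only later in Proposition 5, and our `pi_G` is simply an illustrative mirrored choice):

```python
import numpy as np

# Schematic sketch of Eqs. (6)-(8). All numbers, and the mirrored pi_G,
# are illustrative assumptions, not values from the paper.
def softmax(r):
    e = np.exp(r - np.max(r))
    return e / e.sum()

def l2r(pi, r):                           # ListNet loss of Eq. (5)
    return float(np.sum(softmax(pi) * np.log(softmax(r))))

T, a, b, d = 3, 10.0, 0.0, 5.0
pi_R = np.array([a + (T - 1 - i) * d for i in range(T)] + [b])  # for s^(R)
pi_G = np.array([a] + [b - i * d for i in range(T)])            # for s^(G)

r_real = np.array([4.0, 2.0, 1.0])        # ranker scores R(s^1), ..., R(s^T)
r_fake = -3.0                             # ranker score R(x_g)

# Eq. (6a): the ranker's target puts the generated sample last ...
loss_R = l2r(pi_R, np.append(r_real, r_fake))
# ... while Eq. (6b): the generator's target puts it first.
loss_G = l2r(pi_G, np.append(r_fake, r_real))
print(loss_R, loss_G)
```

With the fake score ranked low, the ranker's target (6a) is nearly satisfied while the generator's target (6b) is far from satisfied, which is exactly the tension the adversarial game exploits.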
The discriminator of relativistic GAN, adapted from the concave function $f(z) = \log(\mathrm{sigmoid}(z)) + \log(2)$, defines a pairwise ranking loss between a real sample and a fake sample, and we find that

$$f(C(x) - C(\tilde{x})) = \log\!\left(\frac{1}{1 + e^{-(C(x) - C(\tilde{x}))}}\right) + \log(2) = \log\!\left(\frac{e^{C(x)}}{e^{C(x)} + e^{C(\tilde{x})}}\right) + \log(2) \overset{1}{=} L_{L2R}(\pi(s), C(s)) + \log(2),$$

where step 1 is valid if $\sigma(\pi(s)) = [1, 0]$ and $s := x > \tilde{x}$ with $x \sim P$ and $\tilde{x} \sim Q$. Inspired by the connection between the relativistic f-divergence (Eq. (10)) and the tailored loss of GARNet (Eq. (6a), Eq. (7a)), we argue that the optimal ranker of GARNet also implicitly estimates a relativistic f-divergence, but between the user-preferred data distribution and the generated data distribution, which will be proven in Theorem 4. Accordingly, we introduce a new relativistic f-divergence between the user-preferred data distribution $P_u$ and the generated data distribution $P_g$, which generalizes the target data distribution P(x) in Theorem 3 to the mixture distribution $P_u = \sum_{i=1}^{T} q_i P_i$ with a user-specified mixing ratio, where $q_1 > q_2 > \cdots > q_T$ and $\sum_i q_i = 1$. In particular, $P_i$ is the distribution of $X_i = \{x \mid y(x) = y_i, x \in X\}$. The mixture distribution $P_u$ allocates a larger density $q_i$ to the higher-ranked class i, consistent with our problem setting in Section 2.2.

Theorem 4 (Relativistic f-divergence between $P_u$ and $P_g$) Let $f: \mathbb{R} \to \mathbb{R}$ be a concave function such that $f(0) = 0$, f is differentiable at 0, $f'(0) \ne 0$, $\sup_v f(v) = M > 0$, and $\arg\sup_v f(v) > 0$. Let $P_u$ be the mixture distribution of the whole ordered data, i.e., $P_u = \sum_{i=1}^{T} q_i P_i$, where $q_1 > q_2 > \cdots > q_T$ and $\sum_i q_i = 1$. Let $P_g$ be the distribution of generated data $x_g$. Then

$$D_f(P_u, P_g) = \sup_{R: X \to \mathbb{R}} \mathbb{E}_{s^1 \sim P_1, \ldots, s^T \sim P_T, \, x_g \sim P_g} \left[ f\!\left( \sum_{i=1}^{T} q_i R(s^i) - R(x_g) \right) \right]$$

is a relativistic f-divergence between $P_u$ and $P_g$.

According to Definition 2, we prove that $D_f(P_u, P_g)$ is a divergence between $P_u$ and $P_g$ in three steps:
#1 Prove that $D_f(P_u, P_g) \ge 0$.
#2 Prove that $P_u = \sum_{i=1}^{T} q_i P_i = P_g \implies D_f(P_u, P_g) = 0$.
#3 Prove that $D_f(P_u, P_g) = 0 \implies P_u = \sum_{i=1}^{T} q_i P_i = P_g$.
The details are left to Appendix A.

Instead of targeting general preferences, which require T + 1 hyperparameters for the ground-truth score vector $\pi(s^{(R)})$ in Eq. (7a), we focus on preferences whose first T terms of $\pi(s^{(R)})$ form an arithmetic progression, which depends only on the first term and the common difference, making it generally applicable to preferences of various lengths. In particular, we demonstrate that when the first T terms of $\pi(s^{(R)})$ are chosen as an arithmetic progression, the optimal ranker of GARNet approximates the relativistic f-divergence between the user-preferred data distribution and the generated data distribution shown in Theorem 4.

Proposition 5 Given the training dataset $X = \{x_n\}_{n=1}^N$ and a collection of full preferences $S = \{s_m\}_{m=1}^M$, where each preference $s := s^1 > s^2 > \cdots > s^T \in S$ and $s^i \sim P_i$, let $P_u = \sum_{i=1}^{T} q_i P_i$ be the mixture distribution of the whole ordered data, where $q_i = \sigma(\pi(s^{(R)}))_i$, and let $P_g$ be the distribution of generated data $x_g$. Given a fixed generator G, the optimal ranker $R^*$ of GARNet (Eq. (7a)) approximates the relativistic f-divergence between $P_u$ and $P_g$, i.e., $D_f(P_u, P_g)$, if $\pi(s^{(R)}) = [a + (T-1)d, a + (T-2)d, \ldots, a, b]$ and $e^{b-a} \to 0$.

Proof First of all, given the definition of $\pi(s^{(R)})$, we have

$$\lim_{e^{b-a} \to 0} q_{T+1} = \lim_{e^{b-a} \to 0} \sigma(\pi(s^{(R)}))_{T+1} = \lim_{e^{b-a} \to 0} \frac{e^b}{e^b + \sum_{j=1}^{T} e^{a + (T-j)d}} \overset{1}{=} 0, \qquad (13a)$$

$$\lim_{e^{b-a} \to 0} \sum_{i=1}^{T} q_i = \lim_{e^{b-a} \to 0} (1 - q_{T+1}) \overset{2}{=} 1, \qquad (13b)$$

where steps 1 and 2 follow the squeeze theorem (Stewart et al., 2020), given $0 < q_{T+1} < e^{b-a}$ and $e^{b-a} \to 0$. Note that in practice we set $a - b = 10$, so that $e^{b-a} \approx 0$. According to Eq.
(5), we have

$$L_{L2R}\left(\pi(s^{(R)}), R(s^{(R)})\right) = \sum_{i=1}^{T+1} \sigma(\pi(s^{(R)}))_i \log \sigma(R(s^{(R)}))_i \qquad (14)$$
$$\overset{1}{=} \sum_{i=1}^{T} q_i \log \frac{e^{R(s^i)}}{e^{R(x_g)} + \sum_{j=1}^{T} e^{R(s^j)}} + q_{T+1} \log \frac{e^{R(x_g)}}{e^{R(x_g)} + \sum_{j=1}^{T} e^{R(s^j)}}$$
$$\overset{2}{\approx} \sum_{i=1}^{T} q_i R(s^i) - \log\!\left( e^{R(x_g)} + \sum_{j=1}^{T} e^{R(s^j)} \right) = -\log\!\left( e^{R(x_g) - \sum_{i=1}^{T} q_i R(s^i)} + \sum_{j=1}^{T} e^{R(s^j) - \sum_{i=1}^{T} q_i R(s^i)} \right).$$

Step 1 follows the definition of the target preference $s^{(R)}$ in Eq. (6a). Step 2 is valid due to Eq. (13a) and Eq. (13b). Meanwhile, when the ranker approaches the optimum $R^*$ for a fixed generator G, i.e., $\sigma(R^*(s^{(R)})) = \sigma(\pi(s^{(R)}))$, we have $R^*(s^{(R)}) = \pi(s^{(R)}) + \delta$ due to the translation invariance of softmax (Laha et al., 2018), i.e., $\sigma(r + \delta) = \sigma(r)$. Therefore, the first T terms of $R^*(s^{(R)})$ also form an arithmetic progression with the same common difference d as $\pi(s^{(R)})$. Then we have

$$R^*(s^j) - \sum_{i=1}^{T} q_i R^*(s^i) = \left[R^*(s^1) - (j-1)d\right] - \sum_{i=1}^{T} q_i \left[R^*(s^1) - (i-1)d\right] \overset{1}{=} -(j-1)d - d\sum_{i=1}^{T} q_i (1-i) =: c_j, \qquad (15)$$

where $c_j$ is a constant for $j = 1, 2, \ldots, T$, exclusively determined by the pre-specified score vector $\pi(s)$. Step 1 is valid due to Eq. (13b).

Therefore, the objective for the ranker R can be approximated as follows:

$$\overset{1}{\approx} \sup_{R: X \to \mathbb{R}} \mathbb{E}_{s \sim S, \, x_g \sim P_g} \left[ -\log\!\left( c + e^{-\left(\sum_{i=1}^{T} q_i R(s^i) - R(x_g)\right)} \right) \right]$$
$$\overset{2}{=} \sup_{R: X \to \mathbb{R}} \mathbb{E}_{s^1 \sim P_1, \ldots, s^T \sim P_T, \, x_g \sim P_g} \left[ -\log\!\left( c + e^{-\left(\sum_{i=1}^{T} q_i R(s^i) - R(x_g)\right)} \right) \right]$$
$$\overset{3}{=} \sup_{R: X \to \mathbb{R}} \mathbb{E}_{s^1 \sim P_1, \ldots, s^T \sim P_T, \, x_g \sim P_g} \left[ -\log\!\left( c + e^{-\left(\sum_{i=1}^{T} q_i R(s^i) - R(x_g)\right)} \right) + \log(c + 1) \right]$$
$$\overset{4}{=} \sup_{R: X \to \mathbb{R}} \mathbb{E}_{s^1 \sim P_1, \ldots, s^T \sim P_T, \, x_g \sim P_g} \left[ f\!\left( \sum_{i=1}^{T} q_i R(s^i) - R(x_g) \right) \right].$$

Step 1 is reasonable because $\sum_{j=1}^{T} e^{R(s^j) - \sum_{i=1}^{T} q_i R(s^i)}$ in Eq. (14) degenerates to a constant $c = e^{c_1} + e^{c_2} + \cdots + e^{c_T}$ following Eq. (15), independent of the ranker as the ranker approaches the optimum. Step 2 is valid since $y(s^i) = y_i$ according to Eq. (4). Step 3 is valid due to the addition of a constant. Step 4 is obtained by denoting $f(z) = \log\frac{1}{c + e^{-z}} + \log(c + 1)$. Note that f is concave; $f(0) = 0$; f is differentiable at 0; $f'(0) \ne 0$; $\sup_z f(z) = M > 0$; and $\arg\sup_z f(z) > 0$. According to Theorem 4, the optimal ranker of GARNet approximately estimates a relativistic f-divergence.
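The limiting behavior of the weights $q_i = \sigma(\pi(s^{(R)}))_i$ invoked in Eq. (13), and the large-d limit used later for Corollary 7, can be checked numerically (our sketch, not the paper's code):

```python
import numpy as np

# Numeric sanity check: with pi(s^(R)) = [a+(T-1)d, ..., a, b] and a-b = 10,
# the weight q_{T+1} of the generated slot vanishes, the first T weights
# sum to ~1, and q_1 -> 1 as the common difference d grows.
def score_vector(T, a=10.0, b=0.0, d=1.0):
    """pi(s^(R)): arithmetic first part for s^1..s^T, offset b for x_g."""
    return np.array([a + (T - 1 - i) * d for i in range(T)] + [b])

def softmax(r):
    e = np.exp(r - np.max(r))
    return e / e.sum()

for T in (3, 5, 10):
    q = softmax(score_vector(T))
    print(T, q[-1], q[:-1].sum())         # q_{T+1} < e^{b-a}; first T sum to ~1

for d in (0.1, 1.0, 5.0):
    q1 = softmax(score_vector(10, d=d))[0]
    print(d, q1)                          # q_1 approaches 1 as e^{-d} -> 0
```

The bound $0 < q_{T+1} < e^{b-a}$ from the squeeze-theorem step is visible directly: with $a - b = 10$, every printed $q_{T+1}$ stays below $e^{-10} \approx 4.5 \times 10^{-5}$.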
Remark 6 In theory, it seems feasible to directly adopt the following objectives to learn $P_u$ according to Theorem 4:

$$\sup_{R: X \to \mathbb{R}} \mathbb{E}_{s^1 \sim P_1, \ldots, s^T \sim P_T, \, x_g \sim P_g} \left[ f\!\left( \sum_{i=1}^{T} q_i R(s^i) - R(x_g) \right) \right],$$
$$\sup_{G: Z \to X} \mathbb{E}_{s^1 \sim P_1, \ldots, s^T \sim P_T, \, x_g \sim P_g} \left[ f\!\left( R(x_g) - \sum_{i=1}^{T} q_i R(s^i) \right) \right],$$

where the concave function f can be defined as $f(z) = \log(\mathrm{sigmoid}(z)) + \log(2)$, as in Jolicoeur-Martineau (2019). However, in practice, the above objective cannot provide sufficient gradient for the generator G to learn properly. Specifically, in early training the generator G is poor, and the ranker R can easily assign lower values to the generated sample $x_g$ than to real samples; $\sum_{i=1}^{T} q_i R(s^i) - R(x_g)$ then becomes large and the gradient vanishes.

Based on Proposition 5, we set a and b to 10 and 0, respectively. The common difference d of the arithmetic progression in $\pi(s^{(R)})$ determines the mixture distribution $P_u = \sum_{i=1}^{T} q_i P_i$. Namely, the mixture distribution $P_u$ is close to the distribution of top-1 data when d is sufficiently large, which leads to Corollary 7, while $P_u$ is close to a uniform mixture of the whole ordered data when d is very small.

Corollary 7 The optimal ranker of GARNet (Eq. (7a)) defines a relativistic f-divergence between the distribution of top-1 data $P_1$ and the generated data distribution $P_g$, i.e., $D_f(P_1, P_g)$, when $e^{-d} \to 0$.

Proof According to the definition of $q_1$ in Proposition 5, we have

$$q_1 = \frac{e^{a + (T-1)d}}{\sum_{i=1}^{T} e^{a + (T-i)d} + e^b} = \frac{1}{1 + e^{-d} + \cdots + e^{-(T-1)d} + e^{b - (a + (T-1)d)}} > \frac{1}{1 + T e^{-d}}.$$

Since $\frac{1}{1 + T e^{-d}} < q_1 < 1$, we have $q_1 \to 1$ by the squeeze theorem (Stewart et al., 2020) when $e^{-d} \to 0$. The proof is completed.

Proposition 8 Given the optimal ranker $R^*$ as demonstrated in Proposition 5, the generator of GARNet (Eq. (7b)) minimizes the divergence between $P_u$ and $P_g$, i.e., $D_f(P_u, P_g)$.
Proof Given $R^*$, the objective for the generator can be formulated as:

$$\sup_{G: Z \to X} \mathbb{E}_{s \sim S, \, x_g \sim P_g} \left[ L_{L2R}\left(\pi(s^{(G)}), R^*(s^{(G)})\right) \right]. \qquad (19)$$

We update the generator parameters $\theta_G$ as follows:

$$\theta_G^{t+1} = \theta_G^t + \nabla_{\theta_G} \mathbb{E}_{s \sim S, \, x_g \sim P_g} \left[ L_{L2R}\left(\pi(s^{(G)}), R^*(s^{(G)})\right) \right], \qquad (20)$$

which maximizes the ranker's output $R^*(x_g)$ for the generated samples. Then $D_f(P_u, P_g)$ is minimized.

Proposition 5 and Proposition 8 demonstrate that the distribution learned by GARNet (Eq. (7)) is determined by the pre-specified score vector $\pi(s)$. We present a case study on MNIST (Lecun et al., 1998) to illustrate this, setting d to 0.1, 0.5, 1 and 5, respectively. As shown in Fig. 3, the proportion of generated data from GARNet is almost consistent with the top-1 probability calculated from the specified score vector, i.e., $q = \sigma(\pi(s))$. In terms of preferences over a discrete property, the most desirable data belongs to the global top-1 class, i.e., $y(x) = y_1$. As clarified in Corollary 7, we can simply set a sufficiently large d (e.g., 5) to achieve this goal.

Figure 3: GARNet learns the distribution $P_u$ with different pre-specified score vectors $\pi(s)$ from full preferences w.r.t. a discrete property on MNIST (preferring small digits, i.e., 0 > 1 > ... > 9). Panels (a)-(d) correspond to d = 0.1, 0.5, 1, 5, where d is the common difference for $\pi(s^{(R)})$ (Proposition 5). $q = \sigma(\pi(s))$ is the ground-truth top-1 probability of each digit class; $\hat{q}$ is the proportion of each digit class among the data generated by GARNet, with digit values evaluated by a pretrained digit classifier.

3.2 Given Partial Preferences

In real-world situations, full preferences are not always available. A more realistic scenario is that partial preferences of various lengths are provided as the training data (Lin, 2010). Namely,

$$s := s^1 > s^2 > \cdots > s^l, \quad l \le T.$$
(21)

Similarly, we construct the adversarial ranking process between the ranker and the generator:

ranker:    s^{(R)} := s_1 > … > s_l > x_g;   (22a)
generator: s^{(G)} := x_g > s_1 > … > s_l.   (22b)

We adopt the objective (Eq. (7)) to learn the user-preferred distribution P_u(x) via the adversarial ranking of Eq. (22). Thanks to the consistency of the score function (Lemma 9), our GARNet can still learn the desired distribution from partial preferences.

Lemma 9 (Consistency of the score function (Xia et al., 2008)) Given a collection of partial preferences S = {s^m}_{m=1}^M, where 2 ≤ |s^m| ≤ T, the ListNet loss (Eq. (5)) adopted in our GARNet can recover the optimal score function for the ranker if S contains sufficient preferences and the samples X are well covered.^3

According to Lemma 9, the optimal ranker can recover s_1 > s_2 > … > s_T > x_g. Therefore, GARNet can still learn a user-preferred data distribution from partial preferences, assigning greater density to higher-ranked data. We present a case study on MNIST (Lecun et al., 1998) with partial preferences (2 ≤ l ≤ 10). We set the ground-truth score vector π(s^{(R)}) for Eq. (22a) as [a + (l−1)d, a + (l−2)d, …, a, b], similar to Proposition 5. As shown in Fig. 4, the proportion of generated data from GARNet is larger when the digit class is ranked higher, which is consistent with user preferences. In particular, when d is sufficiently large, GARNet generates only high-ranked data, namely digits zero and one.

3. This does not mean that every sample must be included in the preferences; it only requires that the included samples be approximately uniformly distributed in the score space.

Figure 4: GARNet learns the distribution P_u with different pre-specified score vectors π(s) from partial preferences (2 ≤ l ≤ 10) w.r.t. a discrete property on MNIST (preferring small digits); panels (a)-(d) correspond to d = 0.1, 0.5, 1, 5.
d is the common difference of the ground-truth score vector π(s^{(R)}) for Eq. (22a). q̂ is the proportion of each digit class among the data generated by GARNet, and q is copied from the results in Fig. 3 for reference. The digit values of the generated data are evaluated by a pretrained digit classifier.

We found that when training on relatively long preferences (7 ≤ l ≤ 10), GARNet converges to the top-1 data distribution when setting d = 5. Note that the resulting proportion per digit q̂ is no longer consistent with the pre-specified values q, since y(s_i) ≡ y_i for all s ∈ S and i = 1, 2, …, T no longer holds for partial preferences.

4. GARNet for Preferences w.r.t. a Continuous Property

We consider a scenario in which the ranking relation among training samples is based on a continuous property. In particular, each training sample x_i is associated with an underlying score o(x_i) that represents the user's preference for x_i in terms of that property.

Definition 10 (Preferences w.r.t. a continuous property) Given a training dataset X = {x_n}_{n=1}^N, where each training instance x_n ∈ X has an underlying ground-truth ranking score o(x_n) ∈ [A, B] in terms of a certain continuous property, the preference among a list of samples {x_{i_1}, x_{i_2}, …} ⊆ X is defined as

x_{i_1} > x_{i_2} > x_{i_3} > …,  if o(x_{i_1}) > o(x_{i_2}) > o(x_{i_3}) > ….   (23)

Example. Suppose users prefer smiling face images. Then users assign higher preference to images with bigger smiles, i.e., images whose score o(x) in terms of the smiling attribute is larger.

Theorem 11 Given a collection of preferences in terms of a continuous property over [A, B] (Eq. (23)), it can be transformed equivalently into preferences in terms of finitely many ordered classes if we treat samples with close ranking scores as ties.

Proof By the Heine-Borel theorem (Jeffreys et al., 1999), the interval [A, B] admits a finite cover of open intervals

O = { (o_i − ϵ_i, o_i + ϵ_i) | o_i ∈ [A, B], ϵ_i > 0, i = 1, 2, …
, T' },   (24)

which is a finite subcover of [A, B]. Without loss of generality, we assume

o_1 > o_2 > … > o_{T'}.   (25)

Then we can construct a union of non-overlapping intervals that covers the score interval [A, B]:

∪_{i=T',…,2} [o_i − ϵ_i, o_i + ϵ_i)  ∪  [o_1 − ϵ_1, o_1 + ϵ_1],   (26)

where o_{T'} − ϵ_{T'} = A, o_1 + ϵ_1 = B, and o_i + ϵ_i = o_{i−1} − ϵ_{i−1} for i = T', …, 2. Meanwhile, we define a pair of samples within the same interval as a tie, following Zhou et al. (2008), namely

x_m = x_n  for any o(x_m), o(x_n) ∈ O_i,   (27)

where O_i = [o_i − ϵ_i, o_i + ϵ_i) for i = T', T'−1, …, 2 and O_1 = [o_1 − ϵ_1, o_1 + ϵ_1]. By assigning the same label y_i to all samples within each interval O_i, we obtain a set of ordered class labels {y_i}_{i=1}^{T'} with y_1 > y_2 > … > y_{T'}, namely

y(x_i) ≡ y_i  if o(x_i) ∈ O_i.   (28)

To sum up, the preferences in terms of a continuous property (Definition 10) approximate the preferences in terms of T' ordered classes.

Figure 5: GARNet learns the distribution P_u with different pre-specified score vectors π(s) from full preferences w.r.t. a continuous property on LFW (preferring smiling faces); panels show the proportions of the top-1 through top-5 classes for different d. q = σ(π(s)) gives the top-1 probability of each class, and q̂ is the proportion of each class among the data generated by GARNet. The top-1 class comprises images located in the score interval of the real images ranked in the top 20% w.r.t. the smile attribute; the top-2 class corresponds to the interval of the real images ranked between the top 20% and the top 40%; and so forth. Labels for generated data are obtained from the score given by a pretrained ranker for face images w.r.t. the smile attribute.

Remark 12 The assumption of Eq. (27) is in line with real-world applications.
People usually cannot distinguish a pair of samples when the difference between them is small and would annotate them as ties (Zhou et al., 2008); here ϵ_i is the indistinguishable score difference for each O_i. Admittedly, for o(x_1) ∈ O_{i−1} and o(x_2) ∈ O_i, x_1 and x_2 are also indistinguishable when o(x_1) − o(x_2) is small. Fortunately, assigning them to two adjacent ordered classes y_{i−1} and y_i is still reasonable, because the order gap between y_{i−1} and y_i is not significant when T' is sufficiently large.

According to Theorem 11, learning from preferences in terms of a continuous property can be discretized as learning from preferences over a finite number of ordered classes. Therefore, we can adopt GARNet, which defines an adversarial ranking goal (Eq. (6)), to learn the desired distribution from full preferences in terms of the continuous property. We present a case study on LFW (Huang et al., 2008) w.r.t. the smile attribute in Fig. 5 to justify this. Simply, we discretize the continuous ranking scores w.r.t. the smile attribute over real images into five ordered classes: images ranked in the top 20% in terms of the smile attribute are assigned to the top-1 class; images ranked between the top 20% and the top 40% are assigned to the top-2 class; and so forth. To label a generated sample, we first apply a pretrained ranker for face images w.r.t. the smile attribute to predict the sample's score, and then assign the class whose score interval contains that score. As shown in Fig. 5, the proportion of generated data from GARNet is almost consistent with the top-1 probability calculated from the specified score vector, i.e., q = σ(π(s)). Similarly, we can adopt GARNet (Eq. (22)) to learn the desired distribution from partial preferences in terms of the continuous property (see Section 7.2 for more empirical studies).
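The quantile binning used for the Fig. 5 evaluation can be sketched as follows (an illustrative helper of ours, not the training procedure; it assumes distinct scores):

```python
def ordered_classes(scores, n_classes=5):
    """Discretize continuous ranking scores into ordered classes by rank:
    the top 1/n_classes fraction -> class 1 (most preferred), the next
    fraction -> class 2, and so on. Assumes distinct scores."""
    ranked = sorted(scores, reverse=True)
    n = len(ranked)
    # map each score to its class via its rank among all scores
    label = {s: rank * n_classes // n + 1 for rank, s in enumerate(ranked)}
    return [label[s] for s in scores]

scores = [0.9, 0.1, 0.5, 0.7, 0.3, 0.8, 0.2, 0.6, 0.4, 0.0]
print(ordered_classes(scores))  # -> [1, 5, 3, 2, 4, 1, 4, 2, 3, 5]
```

The same binning labels a generated sample once a pretrained ranker has assigned it a score.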
In practice, we do not explicitly discretize the continuous ranking scores into a finite number of classes and define a so-called top-1 class as the most desirable data. Instead, we regard the generative model as more consistent with user preferences when the generated data has higher ranking scores.

5. GARNet for a Mixture of Preferences

We extend GARNet to a more general circumstance in which a mixture of user preferences is collected from distinct groups of users. In particular, we consider a simple situation where each group of users ranks the samples in terms of a specific attribute, i.e., {S_u}_{u=1}^U. The objective of GARNet in this context can be formulated as follows:

sup_R  Σ_{u=1}^U E_{s_u∼S_u, x_g∼P_g} [ L_{L2R}( π(s_u^{(R)}), R_u(s_u^{(R)}) ) ],   (29a)
sup_G  Σ_{u=1}^U E_{s_u∼S_u, x_g∼P_g} [ L_{L2R}( π(s_u^{(G)}), R_u(s_u^{(G)}) ) ].   (29b)

For simplicity, we extend the scalar output of the ranker R to a U-dimensional vector so as to learn from {S_u}_{u=1}^U simultaneously; R_u denotes the u-th output of the ranker.

Conditional GARNet. When provided with a mixture of preferences in which one group of preferences may conflict with another, GARNet achieves a compromise, as verified empirically in the experiments. However, for exactly opposing attributes, such as "open" versus "not open", a practical way is to model them with two separate GARNet models, or to extend GARNet to a conditional version, denoted CGARNet, conditioned on each group of preferences. The extension is similar to extending GAN to conditional GAN (Mirza and Osindero, 2014): both the ranker and the generator can be conditioned on an attribute label.

6. Discussion on GANs Extended to Preference-guided Generation

In this section, we discuss how current GAN variants can be extended to preference-guided generation, and their drawbacks. To simplify the discussion, here we aim to learn the distribution that is most consistent with user preferences, i.e., the distribution of top-1 data P_1.
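Before turning to those baselines, note that the mixture objective of Eq. (29) in Section 5 is simply a sum of per-group ListNet-style top-one losses, one per ranker head. A schematic plain-Python sketch (all helper names are ours, not the paper's code; a real implementation would share one network with a U-dimensional output head):

```python
import math

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def listnet_top1_loss(pi, r):
    """ListNet-style top-one loss: cross-entropy between the top-one
    probabilities induced by the target scores pi and the ranker outputs r."""
    p, q = softmax(pi), softmax(r)
    return -sum(a * math.log(b) for a, b in zip(p, q))

def mixture_loss(target_scores, head_outputs):
    """Eq. (29a)-style total: sum the per-group loss over the U ranker heads,
    where head_outputs[u] holds head u's scores for group u's ranked list."""
    return sum(listnet_top1_loss(p, r)
               for p, r in zip(target_scores, head_outputs))

# Two user groups (say, 'comfort' and 'open'), each with its own target vector.
targets = [[15.0, 10.0, 0.0], [15.0, 10.0, 0.0]]
outputs = [[14.0, 9.0, -2.0], [13.0, 8.0, -1.0]]
total = mixture_loss(targets, outputs)
```

Each head only receives gradient from its own group's preference lists, which is what lets a single ranker serve several attributes at once.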
Preference-guided generative adversarial learning. To apply the vanilla GAN (denoted GAN-0) (Goodfellow et al., 2014) to this task, an intuitive approach is to first construct a dataset consisting of the user-desired samples and then perform regular GAN training on it, obtaining a generator that outputs only the desired samples. Two possible strategies can be applied: (1) Selecting desired samples using partial preferences directly (GAN-1): a new dataset is constructed by selecting the top-1 sample from each partial preference s^m. Since the top-1 samples in partial preferences are not necessarily the globally top-ranked samples, this strategy produces biased training data that inevitably includes undesired samples; as a result, the derived distribution is not precisely the user-desired one. It also suffers from insufficient-data issues when the amount of training data is small. (2) Selecting desired samples with a ranking proxy (GAN-2): we train a global ranking proxy using partial preferences (Cao et al., 2007) and then apply the proxy to select the globally high-ranked samples as a new training dataset; for example, the top 30% of the training samples are selected as the desired dataset. Despite its feasibility, this strategy may also incur insufficient-data issues, especially when the desired data in the original training dataset is limited. Feedback GAN (FBGAN) (Gupta and Zou, 2019) is based on the second strategy and remedies its drawback by introducing high-ranked generated samples into the previous training data. Since the ratio of high-ranked samples in the training data gradually increases with the training epochs, the training data will ideally converge to the user-desired samples. However, FBGAN may suffer from severe data-quality issues, since naively treating generated samples as training data will degrade the generation quality.
GANs plus a ranking module. Another idea is to introduce a ranker as an extra critic (Saquil et al., 2018) to promote the generation of user-desired samples, while the original GAN discriminator is kept as a critic to guarantee generation quality; we dub this GAN-RK. The ranker learns to rank samples from partial preferences and outputs high ranking scores for user-desired samples. The discriminator can be a classifier that distinguishes real from fake (Goodfellow et al., 2014) or can define a distribution discrepancy (Arjovsky et al., 2017). For the generator, besides the goal of aligning the generated distribution with the real distribution as guided by the discriminator, an extra goal is to generate samples with high ranking scores as judged by the ranker. However, the two goals conflict: one requires the generator to synthesize samples resembling the whole dataset, while the other requires generating only part of the data, i.e., samples with high ranking scores. Thus, the desired data distribution cannot be learned well.

Generator with a naive ranker. Another naive approach is to use only a ranker, trained on user preferences over real data, to guide a generator; we call this GR. However, this approach suffers from poor data quality, since the model contains no modeling of data authenticity.

7. Experiments

We empirically verify that our GARNet can learn the data distribution most consistent with user preferences. Furthermore, we demonstrate the potential of GARNet for improving imbalanced-class learning.

Datasets: (1) The MNIST dataset (Lecun et al., 1998) consists of 28×28 images of digits zero to nine. We use its training set (50K images) for the experiments. We suppose that users prefer smaller digits, so an image with a smaller digit value is ranked higher.
For instance, the partial ranking list over the four digits 1, 3, 2, 9 is s := 1 > 2 > 3 > 9. As there are 10 digits in total, the maximal preference length l is 10. In particular, the lengths of the preferences included in the training data vary from 7 to 10 so that GARNet can converge to the top-1 data distribution. (2) The Labeled Faces in the Wild (LFW) dataset (Huang et al., 2008) consists of 13,143 celebrity face images taken in the wild. LFW-10 consists of a subset of 2,000 images from LFW along with about 500 pairwise comparisons per attribute. We take the pairwise comparisons in terms of the smile attribute as the user preferences for the training data. Since LFW-10 has limited pairwise comparisons, we exploit a pretrained ranker to augment the preferences for the training data. In addition, since LFW-10 has limited training images (1K), we also take all 13,143 images from LFW as our training data. Specifically, we pretrain a ranker to rank images in terms of the smile attribute from the given pairwise comparisons of LFW-10, then use it to output ranking scores for all LFW training samples and construct pairwise preferences from these scores. If face image x_a has a bigger smile than face image x_b, then the preference is s := x_a > x_b. (3) The UT-Zap50K dataset (Yu and Grauman, 2014) contains 50,025 shoe images from Zappos.com, with pairwise comparisons over several attributes. We use all pairs, i.e., UT-Zap50K-1, UT-Zap50K-2, and UT-Zap50K-lexi, as training data. The pairwise comparisons in terms of the comfort (4,483 pairwise comparisons), open (4,351), and sporty (4,085) attributes are taken as the user preferences for the training data, respectively. As for LFW, we exploit a pretrained ranker to augment the preferences: we pretrain a ranker on the given pairwise comparisons for a specified attribute and then construct additional pairwise preferences from the scores it assigns.
If shoe image x_a is more comfortable/open/sporty than shoe image x_b, then the preference is s := x_a > x_b.

Baselines: We consider GAN (Goodfellow et al., 2014), FBGAN (Gupta and Zou, 2019), GAN-RK (Saquil et al., 2018) (GAN plus an additional ranker), and GR (a generator with a ranker trained only with partial preferences) as our baselines. GAN is trained with three kinds of subsets: the entire data; the subset of locally high-ranked samples (top-1 samples in the partial preferences); and the subset of globally high-ranked samples (e.g., top 10% for MNIST; top 50% for LFW and UT-Zap50K) selected by a surrogate ranker. These three variants are denoted GAN-0, GAN-1, and GAN-2, respectively.

Figure 6: Comparison w.r.t. the generation of user-preferred digits (small digits) on MNIST.

Network & Hyperparameters: All methods are implemented on the WGAN-GP architecture (DCGAN version) (Gulrajani et al., 2017) unless otherwise mentioned. For training we use the Adam optimizer (Kingma and Ba, 2015) with learning rate 2×10^{−4}, β_1 = 0.5, and β_2 = 0.999. According to Proposition 5 and Corollary 7, we set the ground-truth score vector for s^{(R)} as π(s^{(R)}) = [10 + 5(l−1), 10 + 5(l−2), …, 10, 0] for all datasets, which makes GARNet learn the data distribution most consistent with user preferences, since q_1 → 1. We simply set the ground-truth score vector for s^{(G)} as π(s^{(G)}) = [10 + 5(l−2), 10 + 5(l−3), …, 10, 5, 0]. For MNIST, the batch size is 50 and the number of training iterations is 100K. For LFW and UT-Zap50K, the batch size is 64 and the number of training iterations is 200K. Training images are resized to 32×32 unless otherwise mentioned. To rank images in terms of a certain attribute, we adopt the pairwise ranking loss of Burges et al. (2005); a pretrained VGG16 (Simonyan and Zisserman, 2015) is used to map an image to its ranking score.
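The ground-truth score vectors defined above (with a = 10, b = 0, d = 5), together with the resulting top-1 probability q = σ(π(s)), can be sketched in plain Python (the helper names are ours, not the paper's code):

```python
import math

def score_vector_ranker(l, a=10.0, b=0.0, d=5.0):
    """pi(s^(R)) = [a+(l-1)d, ..., a, b] for the ranker list s_1 > ... > s_l > x_g."""
    return [a + (l - 1 - i) * d for i in range(l)] + [b]

def score_vector_generator(l, a=10.0, b=0.0, d=5.0):
    """pi(s^(G)) = [a+(l-2)d, ..., a, a-d, b] for the generator list x_g > s_1 > ... > s_l."""
    return [a + (l - 2 - i) * d for i in range(l)] + [b]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

pi_R = score_vector_ranker(l=10)  # full preference over the 10 MNIST digits
q = softmax(pi_R)                 # q = sigma(pi(s)) as in Figs. 3-5
# With d = 5 the top-1 probability is already close to 1 (Corollary 7).
assert q[0] > 0.99
```

For example, score_vector_ranker(3) yields [20.0, 15.0, 10.0, 0.0], matching the stated arithmetic progression with the generated sample pinned at b = 0.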
Evaluation Metrics: (1) For MNIST, mean digit (MD) is adopted as the performance measure for the generation of preferred digits. The digit values of generated data are evaluated by a pretrained digit classifier. For each method, we randomly generate 50K digits and calculate their MD. (2) For LFW and UT-Zap50K, mean score (MS) is adopted as the performance measure for the generation of user-preferred images. The scores are evaluated by a pretrained ranker that learns to rank images in terms of the target attribute from the given preferences. Fréchet Inception Distance (FID) (Heusel et al., 2017) measures visual quality w.r.t. the high-ranked real images. For each method, we randomly select 50K generated images and conduct the evaluation.

7.1 Preferences w.r.t. Discrete Digits

In this section, we apply GARNet to generate user-preferred digits on MNIST using partial preferences.

Table 2: Comparison of various methods on MNIST w.r.t. mean digit (MD; lower is better). Best results are in bold. FBGAN suffers from mode collapse (Fig. 6d). GR generates meaningless results (Fig. 6f), so its MD is not collected.

Method:  Real   GAN-0  GAN-1  GAN-2  GAN-RK  GR   FBGAN  GARNet
MD:      4.45   4.49   0.75   0.15   4.63    -    0.00   0.00

According to Fig. 6 and Table 2, only our GARNet successfully learns the distribution of the top-ranked digits from user preferences. Table 2 shows that, except for GAN-0, GAN-RK, and GR, the methods achieve smaller MD than the training data (denoted Real), meaning that these generative models can learn from user preferences to some extent. However, only FBGAN and GARNet converge to generating top-ranked digits, with MD of 0.00. On the other hand, GAN-0 is the vanilla GAN trained on all training data, so its MD is close to that of the real images. GAN-RK fails to learn from user preferences because of the conflict between the discriminator and the ranker.
GR produces no meaningful generations because it involves no distribution matching between real and generated samples; its MD is therefore not applicable. Fig. 6 shows that: (1) GARNet generates top-ranked digits (digit zero) with high quality (Fig. 6g). (2) GAN-1 still generates digits that are not top-ranked, such as digit two (Fig. 6b), since the constructed training subset contains undesired samples. (3) GAN-2 achieves better performance than GAN-1 but still fails to converge to the distribution of the top-ranked digit. (4) FBGAN suffers from mode collapse, its generated zeros being exactly identical, since its selection of high-ranked generated data does not maintain sample diversity.

7.2 Preferences w.r.t. a Continuous Attribute

As shown in the MNIST experiments, GAN-1, GAN-2, FBGAN, and our GARNet can learn from user preferences. Therefore, we compare GARNet only with these baselines on generating user-preferred facial images on LFW w.r.t. the smile attribute, and user-preferred shoe images on UT-Zap50K w.r.t. the comfort, open, and sporty attributes, respectively. We quantitatively evaluate learning from user preferences via mean score (MS) and generation quality via FID in Table 3. Meanwhile, we plot the score densities of generated samples of the various methods and of training samples in Fig. 7. The evaluation results on LFW (smile) show that GAN-1, GAN-2, and GARNet all shift toward a distribution of data with larger scores, i.e., they learn from user preferences and generate face images with bigger smiles. (1) Our GARNet learns the best user-preferred distribution, indicated by the best MS, along with the best image quality, indicated by the lowest FID score. (2) GAN-1 and GAN-2 have poorer quality than GARNet.
(3) FBGAN suffers from mode collapse and its generated images have very poor quality (see Appendix), as confirmed by its large FID score. Similar conclusions hold on UT-Zap50K. For further visual inspection, we place GARNet's generated samples and training samples (a.k.a. real samples) along the smile score axis in Fig. 8a. GARNet's generations mainly (over 95%) cover the real images ranked in the top 50% w.r.t. the smile attribute. GARNet even generates images with scores above the maximum score in the LFW dataset, showing its potential to generate data beyond existing rankings. We likewise place GARNet's generated samples and training samples along the comfort score axis in Fig. 8b; over 75% of the generated images cover the real images ranked in the top 50% w.r.t. the comfort attribute. Results for the open and sporty attributes are given in the Appendix.

Table 3: Comparison on LFW face data and UT-Zap50K shoe data w.r.t. the performance measure (MS; higher is better) and quality score (FID; lower is better). The best results are highlighted in bold. Since FBGAN suffers from mode collapse and poor image quality (large FID; see its generated images in the Appendix), its MS is not collected.

                         MS                                    FID
Dataset                  Real   GAN-1  GAN-2  FBGAN  GARNet    GAN-1  GAN-2   FBGAN   GARNet
LFW (smile)              0.59   1.41   1.14   -      2.29      47.55  37.40   89.36   22.22
UT-Zap50K (comfort)     -5.06  -2.66  -0.91   -     -0.90      49.20  77.99   253.53  39.16
UT-Zap50K (open)        -0.36   0.93   0.99   -      3.75      63.92  112.93  265.49  46.78
UT-Zap50K (sporty)       4.09   5.28   4.48   -      7.07      41.02  84.59   431.44  32.87

7.3 User Control on a Mixture of Preferences

To demonstrate that GARNet generalizes to a mixture of user preferences, i.e., preferences on multiple attributes, we conduct experiments with the comfort and open attributes on the training data. To achieve better generation, we apply a more advanced GAN architecture (Karras et al., 2020).
Meanwhile, the training images are resized to 64×64 for better resolution. In Fig. 9, we visualize real images and generated images by ordering them with the pretrained rankers for the open and comfort attributes, respectively. We observe that GARNet generates shoe images that are more comfortable or more open. In addition, the least comfortable and the least open shoe images are not generated by GARNet, as users dislike them. It is worth mentioning that GARNet achieves a compromise between a pair of moderately conflicting attributes. Intuitively, open and comfort are conflicting attributes: for example, sports shoes are considered comfortable but are only slightly open. As shown in Fig. 9, when learning from preferences w.r.t. the open and comfort attributes, GARNet avoids generating samples ranked lowest in either attribute, instead generating relatively closed and comfortable shoes or relatively uncomfortable and open shoes.

Figure 7: Comparison of GAN-1 (row 1), GAN-2 (row 2), and our GARNet (row 3) in terms of score density on LFW (smile) and UT-Zap50K (comfort, open, sporty). The green point denotes the mean score of generated samples, and the red point that of real samples. Note that the x-axis denotes ranking scores of samples rather than the ordered classes of Fig. 5; similar observations (a preferred class has a larger proportion) hold if the ranker scores are coarsened into a limited number of ordered classes.
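The percentile rank used to place images along the score axes in Fig. 8 (and Fig. 12 in the Appendix) reduces to a one-line count; a sketch with a hypothetical helper:

```python
def percentile_rank(score, population):
    """Percentile rank: the percentage of scores in `population` that are
    strictly less than `score` (the definition used for the score axes)."""
    below = sum(1 for s in population if s < score)
    return 100.0 * below / len(population)

pop = [1.0, 2.0, 3.0, 4.0, 5.0]
print(percentile_rank(3.0, pop))  # -> 40.0 (two of five scores lie below 3.0)
```

Computing it once over the real scores and once over the generated scores gives the "real percentile rank" and "generated percentile rank" axes, respectively.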
Figure 8: Real images (32×32, above the axis) vs. images generated by GARNet (below the axis) w.r.t. the score axis of an attribute, for (a) LFW (smile) and (b) UT-Zap50K (comfort). The percentile rank of a given score is the percentage of scores in its frequency distribution that are less than that score; the real (generated) percentile rank is computed among real (generated) images.

Figure 9: GARNet for multiple attributes. Images (64×64) are placed w.r.t. their comfort and open scores. The results are obtained with an advanced GAN variant, StyleGAN2-ADA (Karras et al., 2020), and thus have better quality.

Figure 10: Generated samples (32×32) and score densities (w.r.t. the open score) for CGARNet conditioned on the (a) open and (b) not-open attributes, respectively. The green point denotes the mean score of generated samples, and the red point that of real samples.

7.3.1 Conditional GARNet

We apply our conditional GARNet (CGARNet) to exactly opposing attributes, i.e., open and not open. Fig. 10 shows that: (1) conditioned on the open attribute, CGARNet generates shoe images with large open-attribute values, whose score density lies in a region of large open scores; (2) conditioned on the not-open attribute, CGARNet generates shoe images with small open-attribute values, whose score density lies in a region of small open scores.

7.4 GARNet on Preferences vs. GANs on Labels

In this section, we show that preference-guided generation outperforms generation conditioned on labels in extremely imbalanced class learning.
Note that conditional generation can synthesize samples of the minority data, showing a promising application to imbalanced-data classification: promoting the minority class via data augmentation in a pre-processing manner.

Figure 11: (a) GARNet boosts imbalanced classification on MNIST with digit six as the minority class. The gain matrix (zeros not shown) is obtained as C' − C, where C' and C are the confusion matrices calculated on the GARNet-boosted data and the original MNIST data, respectively; the color denotes the confusion matrix (%) on the original MNIST data. (b)-(c) Visual results of GARNet and Elastic-InfoGAN on MNIST with extremely imbalanced data.

Experimental setup: On the MNIST dataset (Lecun et al., 1998), we randomly pick 0.5% of the samples of digit six to constitute the minority class; all samples of the remaining classes are retained. Note that the imbalance ratio reaches minor/major ≈ 1:200. We construct partial preferences simply by preferring digit six over any other class. We compare the classification performance of a CNN classifier without and with data augmentation by GARNet.

Baselines: Elastic-InfoGAN (Ojha et al., 2020) is a state-of-the-art generative model for imbalanced data generation. A standard GAN trained only on the minority class is excluded as a baseline, since there are too few samples (0.5% × 4,951 ≈ 25) for training a generative model.

Fig. 11b shows that our GARNet successfully generates the user-desired data, i.e., digit six. Although there are few digit-six images in the training set, we can construct sufficient partial preferences between the minority class and the majority classes.
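A sketch of that construction (hypothetical helper; integers stand in for images):

```python
import random

def minority_preferences(minority, majority, n_pairs, seed=0):
    """Pairwise partial preferences 'minority sample > majority sample'.
    Even with ~25 minority images, pairing each against many majority
    images yields an arbitrarily large preference set."""
    rng = random.Random(seed)
    return [(rng.choice(minority), rng.choice(majority))
            for _ in range(n_pairs)]

six_imgs = list(range(25))             # stand-ins for the ~25 digit-six images
other_imgs = list(range(1000, 50000))  # stand-ins for the majority samples
prefs = minority_preferences(six_imgs, other_imgs, n_pairs=10000)
```

This is why preference supervision scales where label supervision does not: the number of usable training pairs grows with the product of the class sizes, not with the minority-class size alone.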
Therefore, our GARNet can learn the direction of user preferences and generate the desired data. In contrast, Fig. 11c shows that Elastic-InfoGAN fails to generate digit six, since it relies on class labels and the digit-six samples are too limited. Hence our GARNet can be used to improve imbalanced-class learning via data augmentation, whereas Elastic-InfoGAN cannot. Fig. 11a shows the combined confusion matrix of the two settings: with and without augmentation by GARNet. The color indicates the confusion matrix of the CNN trained on the imbalanced data, where every class has high purity except digit six, which is somewhat confused with digits four, five, and eight. By replenishing digit six with GARNet and training the same CNN architecture (CNN-GARNet), the purity of digit six increases by 8.9%, demonstrating that GARNet can efficiently augment the minority data in imbalanced scenarios.

8. Conclusion

This paper presents a novel adversarial ranking framework, GARNet, to learn from user preferences. In particular, we prove that GARNet is equivalent to optimizing a divergence between the user-preferred data distribution (determined by the given score vector) and the generated data distribution, which theoretically guarantees GARNet as a good estimator of the desired data distribution. Meanwhile, we empirically show that GARNet obtains the corresponding distributions when different score vectors are specified. Numerous experiments demonstrate that GARNet learns the distribution that best matches user preferences compared with various baselines. A study of imbalanced-class learning validates the advantage of preference-guided GARNet over GANs conditioned on labels.
GARNet is modeled under the homogeneity assumption of rank aggregation, namely that only one ground-truth full ranking exists (Lin, 2010), which is in line with the actual situation in many real-world applications: peer grading (Raman and Joachims, 2014), image rating (Liang and Grauman, 2014), document recommendation (Sellamanickam et al., 2011), opinion analysis (Kim et al., 2015), and bioinformatics (Chatterjee et al., 2018). This assumption may break when the preferences are sampled from heterogeneous user groups, i.e., a mixture of preferences: (1) One group of users dominates the ranking. The homogeneity assumption then remains valid by simply taking the samples from the majority of preferences as clean preferences and the rest as noisy preferences; robust rank aggregation (Pan et al., 2022) can be incorporated into our GARNet to handle these noisy preferences, which is left for future exploration. (2) The preferences from different groups of users are balanced. The homogeneity assumption on users is then no longer valid, and it is more reasonable and realistic to assume heterogeneous users (Zhao et al., 2016), which is called mixture rank aggregation. We conduct a preliminary exploration of the simple case where the group affiliation of each preference is known, and leave the general case of mixture ranking for future work.

Acknowledgements

This work was supported by the A*STAR Centre for Frontier AI Research, the A*STAR Career Development Fund (Grant No. C222812019), the A*STAR GAP project (Grant No. I23D1AG079), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X386), NSFC (Grant No. 62250710682), and the Program for Guangdong Provincial Key Laboratory (Grant No. 2020B121201001).

Appendix A. Proof of Theorem 4

Proof Let R* = arg sup_{R: X → R} E_{s_1∼P_1, …, s_T∼P_T, x_g∼P_g} [ f( Σ_{i=1}^T q_i R(s_i) − R(x_g) ) ] be the best choice of R.

#1 Prove that D_f(P_u, P_g) ≥ 0.
Let $R_w(x) = c$ for all $x \in \mathcal{X}$ (the worst possible choice of $R$), where $c$ is a constant. Then

$$
D_f(P_u, P_g) = \mathbb{E}_{\substack{s_1 \sim P_1, \dots, s_T \sim P_T \\ x_g \sim P_g}}\left[f\left(\sum_{i=1}^{T} q_i R^*(s_i) - R^*(x_g)\right)\right]
\ge \mathbb{E}_{\substack{s_1 \sim P_1, \dots, s_T \sim P_T \\ x_g \sim P_g}}\left[f\left(\sum_{i=1}^{T} q_i R_w(s_i) - R_w(x_g)\right)\right]
\overset{1}{=} f(0) = 0, \tag{30}
$$

where step 1 is valid because $\sum_{i=1}^{T} q_i = 1$, so that $\sum_{i=1}^{T} q_i R_w(s_i) - R_w(x_g) = c - c = 0$.

#2 Prove that $P_u = \sum_{i=1}^{T} q_i P_i = P_g \Rightarrow D_f(P_u, P_g) = 0$.

$$
\begin{aligned}
D_f(P_u, P_g) &= \mathbb{E}_{\substack{s_1 \sim P_1, \dots, s_T \sim P_T \\ x_g \sim P_g}}\left[f\left(\sum_{i=1}^{T} q_i R^*(s_i) - R^*(x_g)\right)\right] \\
&\overset{1}{\le} \mathbb{E}_{s_1 \sim P_1, \dots, s_T \sim P_T}\left[f\left(\sum_{i=1}^{T} q_i R^*(s_i) - \mathbb{E}_{x_g \sim P_g}[R^*(x_g)]\right)\right] \\
&\overset{2}{\le} f\left(\sum_{i=1}^{T} q_i\, \mathbb{E}_{s_i \sim P_i}[R^*(s_i)] - \mathbb{E}_{x_g \sim P_g}[R^*(x_g)]\right) \\
&\overset{3}{=} f\left(\sum_{i=1}^{T} q_i\, \mathbb{E}_{x \sim P_i}[R^*(x)] - \mathbb{E}_{x \sim P_g}[R^*(x)]\right) \\
&= f\left(\int_{\mathcal{X}} R^*(x) \sum_{i=1}^{T} q_i P_i\, dx - \int_{\mathcal{X}} R^*(x) P_g\, dx\right) \\
&\overset{4}{=} f(0) = 0,
\end{aligned}
$$

where steps 1 and 2 follow Jensen's inequality for a concave $f$, step 3 is valid due to a change of variables, and step 4 is valid because $P_g = \sum_{i=1}^{T} q_i P_i$. Since $D_f(P_u, P_g) \ge 0$ (Eq. (30)), we have $D_f(P_u, P_g) = 0$.

#3 Prove that $D_f(P_u, P_g) = 0 \Rightarrow P_u = \sum_{i=1}^{T} q_i P_i = P_g$.

We prove this by contraposition. Namely, we prove that $P_u \neq P_g \Rightarrow D_f(P_u, P_g) \neq 0$. Let $H = \{x \mid P_u(x) > P_g(x)\}$. Since $P_u \neq P_g$, we have $H \neq \emptyset$. Let $u = \int_H P_u(x)\, dx$ and $v = \int_H P_g(x)\, dx$; then $(1 - u) = \int_{\mathcal{X} \setminus H} P_u(x)\, dx$ and $(1 - v) = \int_{\mathcal{X} \setminus H} P_g(x)\, dx$. According to the definition, $u > 0$, $v > 0$, and $u > v$.

Let $R_\alpha(x) = \begin{cases} \alpha & \text{if } x \in H \\ 0 & \text{otherwise} \end{cases}$, where $\alpha \neq 0$. Then we have

$$
\begin{aligned}
L(\alpha) &= \mathbb{E}_{\substack{s_1 \sim P_1, \dots, s_T \sim P_T \\ x_g \sim P_g}}\left[f\left(\sum_{i=1}^{T} q_i R_\alpha(s_i) - R_\alpha(x_g)\right)\right] \\
&\overset{1}{=} \mathbb{E}_{\substack{s_1 \sim P_1, \dots, s_T \sim P_T \\ x_g \sim P_g}}\left[f\left(\sum_{i=1}^{T} q_i \big(R_\alpha(s_i) - R_\alpha(x_g)\big)\right)\right] \\
&\overset{2}{\ge} \mathbb{E}_{\substack{s_1 \sim P_1, \dots, s_T \sim P_T \\ x_g \sim P_g}}\left[\sum_{i=1}^{T} q_i f\big(R_\alpha(s_i) - R_\alpha(x_g)\big)\right] \\
&= \sum_{i=1}^{T} q_i\, \mathbb{E}_{\substack{s_i \sim P_i \\ x_g \sim P_g}}\left[f\big(R_\alpha(s_i) - R_\alpha(x_g)\big)\right] \\
&\overset{3}{=} \sum_{i=1}^{T} q_i\, \mathbb{E}_{\substack{x_u \sim P_i \\ x_g \sim P_g}}\left[f\big(R_\alpha(x_u) - R_\alpha(x_g)\big)\right] \\
&= \int_{\mathcal{X}} \int_{\mathcal{X}} f\big(R_\alpha(x_u) - R_\alpha(x_g)\big) \Big(\sum_{i=1}^{T} q_i P_i(x_u)\Big) P_g(x_g)\, dx_u\, dx_g \\
&\overset{4}{=} \int_{\mathcal{X}} \int_{\mathcal{X}} f\big(R_\alpha(x_u) - R_\alpha(x_g)\big) P_u(x_u) P_g(x_g)\, dx_u\, dx_g \\
&= f(\alpha)\, u (1 - v) + f(-\alpha)\, v (1 - u), \tag{32}
\end{aligned}
$$

where the last equality splits both integrals over $H$ and $\mathcal{X} \setminus H$ and uses $f(0) = 0$ for the two terms in which $x_u$ and $x_g$ fall on the same side. Step 1 is valid because $\sum_{i=1}^{T} q_i = 1$; step 2 follows Jensen's inequality for the concave $f$; step 3 is valid due to the change of variables; step 4 is obtained by denoting $P_u = \sum_{i=1}^{T} q_i P_i$.

Since $u(1 - v) > v(1 - u)$, there exists $\alpha^* > 0$ such that $L(\alpha^*) > 0$, according to Lemma A.3 in Jolicoeur-Martineau (2020). Thus, letting $\alpha = \alpha^*$, we have Eq. (32) $> 0$. Therefore,

$$
D_f(P_u, P_g) = \mathbb{E}_{\substack{s_1 \sim P_1, \dots, s_T \sim P_T \\ x_g \sim P_g}}\left[f\left(\sum_{i=1}^{T} q_i R^*(s_i) - R^*(x_g)\right)\right]
\ge \mathbb{E}_{\substack{s_1 \sim P_1, \dots, s_T \sim P_T \\ x_g \sim P_g}}\left[f\left(\sum_{i=1}^{T} q_i R_{\alpha^*}(s_i) - R_{\alpha^*}(x_g)\right)\right] = L(\alpha^*) > 0.
$$

The proof is completed.

Appendix B. Additional Experimental Results

Figure 12: Real images (32×32, above the axis) vs. generated images (GARNet, below the axis) w.r.t. the score axis of attributes: (a) UT-Zap50K open; (b) UT-Zap50K sporty. The percentile rank of a given score is the percentage of scores in its frequency distribution that are less than that score. Real (generated) percentile ranks are calculated among real (generated) images.

Figure 13: Visual results (32×32, mode collapse) of FBGAN on LFW (smile) and UT-Zap50K (comfort, open and sporty).

Figure 14: Verification that GARNet generates novel images (32×32), i.e., generated images are not exactly alike any training images, on LFW (smile) and UT-Zap50K (comfort, open and sporty). Nearest neighbors are measured by the ℓ2 distance between images. The first column shows generated samples from GARNet; the other columns show the generated samples' nearest neighbors in the training dataset.

References

Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In International Conference on Machine Learning, pages 214–223, 2017.

Siddarth Asokan and Chandra Sekhar Seelamantula. Teaching a GAN what not to learn. In Hugo Larochelle, Marc'Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors, Advances in Neural Information Processing Systems, 2020.

Christopher J. C. Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Gregory N. Hullender. Learning to rank using gradient descent. In International Conference on Machine Learning, 2005.
Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. Learning to rank: from pairwise approach to listwise approach. In International Conference on Machine Learning, pages 129–136, 2007.

Sujoy Chatterjee, Anirban Mukhopadhyay, and Malay Bhattacharyya. A weighted rank aggregation approach towards crowd opinion analysis. Knowledge-Based Systems, 149:47–60, 2018.

Xi Chen, Paul N. Bennett, Kevyn Collins-Thompson, and Eric Horvitz. Pairwise ranking aggregation in a crowdsourced setting. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, pages 193–202, 2013.

Paul F. Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences. In Advances in Neural Information Processing Systems, pages 4299–4307, 2017.

Zihang Dai, Zhilin Yang, Fan Yang, William W. Cohen, and Russ R. Salakhutdinov. Good semi-supervised learning that requires a bad GAN. Advances in Neural Information Processing Systems, 30, 2017.

Xin Ding, Yongwei Wang, Zuheng Xu, William J. Welch, and Z. Jane Wang. CcGAN: Continuous conditional generative adversarial networks for image generation. In International Conference on Learning Representations, 2020.

Jesse H. Engel, Matthew D. Hoffman, and Adam Roberts. Latent constraints: Learning to generate conditionally from unconditional generative models. In International Conference on Learning Representations, 2018.

Johannes Fürnkranz and Eyke Hüllermeier. Preference learning. In Claude Sammut and Geoffrey I. Webb, editors, Encyclopedia of Machine Learning and Data Mining, pages 1000–1005. Springer, 2017.

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in Neural Information Processing Systems, 27:2672–2680, 2014.

Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C. Courville.
Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems, pages 5767–5777, 2017.

Anvita Gupta and James Zou. Feedback GAN for DNA optimizes protein functions. Nature Machine Intelligence, 1(2):105, 2019.

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems, 30, 2017.

Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022.

Gary B. Huang, Marwan Mattar, Tamara Berg, and Eric Learned-Miller. Labeled Faces in the Wild: A database for studying face recognition in unconstrained environments. In Workshop on Faces in Real-Life Images: Detection, Alignment, and Recognition, 2008.

Jian Huang, Yuling Jiao, Zhen Li, Shiao Liu, Yang Wang, and Yunfei Yang. An error analysis of generative adversarial networks for learning distributions. Journal of Machine Learning Research, 23(116):1–43, 2022.

Harold Jeffreys, Bertha Jeffreys, and Bertha Swirles. Methods of Mathematical Physics. Cambridge University Press, 1999.

Alexia Jolicoeur-Martineau. The relativistic discriminator: a key element missing from standard GAN. In International Conference on Learning Representations, 2019.

Alexia Jolicoeur-Martineau. On relativistic f-divergences. In International Conference on Machine Learning, pages 4931–4939. PMLR, 2020.

Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, and Timo Aila. Training generative adversarial networks with limited data. In Advances in Neural Information Processing Systems, 2020.

Min Ji Kim, Farzad Farnoud, and Olgica Milenkovic. HyDRA: gene prioritization via hybrid distance-score rank aggregation. Bioinformatics, 31(7):1034–1043, 2015.

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.
Anirban Laha, Saneem Ahmed Chemmengath, Priyanka Agrawal, Mitesh Khapra, Karthik Sankaranarayanan, and Harish G. Ramaswamy. On controllable sparse alternatives to softmax. Advances in Neural Information Processing Systems, 31, 2018.

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, November 1998.

Xiao Li, Chenghua Lin, Chaozheng Wang, and Frank Guerin. Latent space factorisation and manipulation via matrix subspace projection. In International Conference on Machine Learning, 2020.

Lucy Liang and Kristen Grauman. Beyond comparing image pairs: Setwise active learning for relative attributes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 208–215, 2014.

Shili Lin. Rank aggregation methods. Wiley Interdisciplinary Reviews: Computational Statistics, 2(5):555–570, 2010.

Tyler Lu and Craig Boutilier. Learning Mallows models with pairwise preferences. In International Conference on Machine Learning, pages 145–152, 2011.

Xudong Mao, Qing Li, Haoran Xie, Raymond Y. K. Lau, Zhen Wang, and Stephen Paul Smolley. Least squares generative adversarial networks. In International Conference on Computer Vision, pages 2794–2802, 2017.

Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.

Takeru Miyato and Masanori Koyama. cGANs with projection discriminator. In International Conference on Learning Representations, 2018.

Augustus Odena, Christopher Olah, and Jonathon Shlens. Conditional image synthesis with auxiliary classifier GANs. In International Conference on Machine Learning, pages 2642–2651, 2017.

Utkarsh Ojha, Krishna Kumar Singh, Cho-Jui Hsieh, and Yong Jae Lee. Elastic-InfoGAN: Unsupervised disentangled representation learning in class-imbalanced data. In Advances in Neural Information Processing Systems, 2020.

Yuangang Pan, Bo Han, and Ivor W. Tsang.
Stagewise learning for noisy k-ary preferences. Machine Learning, 107(8):1333–1361, 2018.

Yuangang Pan, Ivor W. Tsang, Weijie Chen, Gang Niu, and Masashi Sugiyama. Fast and robust rank aggregation against model misspecification. Journal of Machine Learning Research, 23, 2022.

Devi Parikh and Kristen Grauman. Relative attributes. In International Conference on Computer Vision, pages 503–510. IEEE, 2011.

Albert Pumarola, Stefan Popov, Francesc Moreno-Noguer, and Vittorio Ferrari. C-Flow: Conditional generative flow models for images and 3D point clouds. In IEEE Conference on Computer Vision and Pattern Recognition, pages 7949–7958, 2020.

Karthik Raman and Thorsten Joachims. Methods for ordinal peer grading. In International Conference on Knowledge Discovery and Data Mining, pages 1037–1046, 2014.

Bidisha Samanta, Abir De, Gourhari Jana, Vicenç Gómez, Pratim Kumar Chattaraj, Niloy Ganguly, and Manuel Gomez-Rodriguez. NeVAE: A deep generative model for molecular graphs. Journal of Machine Learning Research, 21(1):4556–4588, 2020.

Yassir Saquil, Kwang In Kim, and Peter M. Hall. Ranking CGANs: Subjective control over semantic image attributes. In British Machine Vision Conference, page 131. BMVA Press, 2018.

Sundararajan Sellamanickam, Priyanka Garg, and Sathiya Keerthi Selvaraj. A pairwise ranking based approach to learning with positive and unlabeled examples. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pages 663–672, 2011.

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015.

Kihyuk Sohn, Honglak Lee, and Xinchen Yan. Learning structured output representation using deep conditional generative models. Advances in Neural Information Processing Systems, 28:3483–3491, 2015.

Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution.
In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8–14, 2019, Vancouver, BC, Canada, pages 11895–11907, 2019.

James Stewart, Daniel K. Clegg, and Saleem Watson. Multivariable Calculus. Cengage Learning, 2020.

Larissa T. Triess, André Bühler, David Peter, Fabian B. Flohr, and Marius Zöllner. Point cloud generation with continuous conditioning. In International Conference on Artificial Intelligence and Statistics, pages 4462–4481. PMLR, 2022.

Ze Wang, Seunghyun Hwang, Zichen Miao, and Qiang Qiu. Image generation using continuous filter atoms. Advances in Neural Information Processing Systems, 34:17826–17838, 2021.

Christian Wirth, Riad Akrour, Gerhard Neumann, Johannes Fürnkranz, et al. A survey of preference-based reinforcement learning methods. Journal of Machine Learning Research, 18(136):1–46, 2017.

Fen Xia, Tie-Yan Liu, Jue Wang, Wensheng Zhang, and Hang Li. Listwise approach to learning to rank: theory and algorithm. In Proceedings of the 25th International Conference on Machine Learning, pages 1192–1199, 2008.

Yinghua Yao, Yuangang Pan, Ivor W. Tsang, and Xin Yao. Differential-critic GAN: Generating what you want by a cue of preferences. IEEE Transactions on Neural Networks and Learning Systems, 2022.

A. Yu and K. Grauman. Fine-grained visual comparisons with local learning. In IEEE Conference on Computer Vision and Pattern Recognition, June 2014.

Zac Yu and Adriana Kovashka. SynthArch: Interactive image search with attribute-conditioned synthesis. In Conference on Computer Vision and Pattern Recognition, CVPR Workshops, pages 645–654, 2020.

Zhibing Zhao, Peter Piech, and Lirong Xia. Learning mixtures of Plackett-Luce models. In International Conference on Machine Learning, pages 2906–2914. PMLR, 2016.

Ke Zhou, Gui-Rong Xue, Hongyuan Zha, and Yong Yu. Learning to rank with ties.
In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 275–282, 2008.

Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In International Conference on Computer Vision, pages 2223–2232, 2017.