# Diverse Rare Sample Generation with Pretrained GANs

Subeen Lee¹, Jiyeon Han¹, Soyeon Kim¹ and Jaesik Choi¹,²
¹Korea Advanced Institute of Science and Technology (KAIST), South Korea
²INEEJI, South Korea
{sbrblee7, j.han, soyeon.k, jaesik.choi}@kaist.ac.kr

**Abstract.** Deep generative models are proficient at generating realistic data but struggle to produce rare samples in low-density regions, owing to the scarcity of such samples in training datasets and to the mode collapse problem. While recent methods aim to improve the fidelity of generated samples, they often reduce diversity and coverage by ignoring rare and novel samples. This study proposes a novel approach for generating diverse rare samples from high-resolution image datasets with pretrained GANs. Our method employs gradient-based optimization of latent vectors within a multi-objective framework and utilizes normalizing flows for density estimation on the feature space. This enables the generation of diverse rare images, with controllable parameters for rarity, diversity, and similarity to a reference image. We demonstrate the effectiveness of our approach both qualitatively and quantitatively across various datasets and GANs, without retraining or fine-tuning the pretrained GANs.

Code: https://github.com/sbrblee/DivRareGen

## 1 Introduction

Deep generative models have shown impressive generative capabilities across various domains. The primary focus of current generative model research is on enhancing the fidelity of generated images (De Vries, Drozdzal, and Taylor 2020; Karras, Laine, and Aila 2019; Azadi et al. 2018; Turner et al. 2019). However, these approaches often compromise sample diversity and encounter difficulties generating rare samples, primarily because such samples are underrepresented in the training dataset (Sehwag et al. 2022; Lee et al. 2021). In GANs, this issue is worsened by the mode collapse problem (Thanh-Tung and Tran 2020).
Investigating rare samples is crucial for several reasons: it enhances the creation of synthetic datasets that embody diversity and creativity (Sehwag et al. 2022; Agarwal, D'souza, and Hooker 2022), ensures fairness in generative processes (Teo, Abdollahzadeh, and Cheung 2023; Hwang et al. 2020), and aligns with the human tendency to favor unique features (Snyder and Lopez 2001; Lynn and Harris 1997). Additionally, exploring edge cases and unusual scenarios is essential in various domains, such as drug discovery or molecular design (Sagar et al. 2023; Zeng et al. 2022) and natural hazard analysis (Ma, Mei, and Xu 2024).

Copyright 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Several studies have been conducted to enhance the overall diversity of GAN-generated outputs and to promote the generation of rare samples (Chang et al. 2024; Allahyani et al. 2023; Humayun, Balestriero, and Baraniuk 2022, 2021; Heyrani Nobari, Rashad, and Ahmed 2021; Ghosh et al. 2018; Tolstikhin et al. 2017; Srivastava et al. 2017; Chen et al. 2016). Due to the high computational cost of training GANs (Karras, Laine, and Aila 2019; Brock, Donahue, and Simonyan 2018), techniques that avoid model retraining are appealing. For example, Humayun, Balestriero, and Baraniuk (2022) proposed a resampling technique for pretrained GANs with a controllable fidelity-diversity tradeoff parameter; however, this method requires extensive sampling to cover the data manifold fully. On the other hand, Chang et al. (2024) proposed a method to obtain diverse samples that satisfy text conditions by optimizing latent vectors with a quality-diversity objective. Han et al. (2023) propose a rarity score for samples using relative density measures based on k-nearest neighbor (k-NN) manifolds in the feature space of pretrained classifiers. Although k-NN density estimation is straightforward and reliable (Naeem et al. 2020; Kynkäänniemi et al.
2019), its non-differentiable nature complicates gradient-based optimization. In contrast, normalizing flows (NFs) excel at high-dimensional density estimation in differentiable form (Papamakarios et al. 2021; Kingma and Dhariwal 2018; Dinh, Sohl-Dickstein, and Bengio 2016; Dinh, Krueger, and Bengio 2014). We employ NFs for density estimation in the feature space, which incurs a lower training cost than retraining or fine-tuning GANs, and analyze how NF-based density estimates relate to the rarity score.

The Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25)

Figure 1: Examples of rare samples generated by our method. Left: Our method produces diverse rare images for a single reference, with variations even within the same rare attribute (e.g., hats of different shapes and colors). Right: Generated rare attributes include accessories like hats, non-brown hair colors, extreme ages, and non-white races. Pose refers to head orientation, and Acc. denotes accessories. Rare attributes are highlighted in bold. (Panel titles: Wearing a Hat; Brown Hair → Colorful Hair; Getting Very Young or Old; White → Non-White.)

This study aims to generate diverse rare samples for given high-resolution image datasets and GANs. Our method does not require any fine-tuning or retraining of the GANs but instead explores the latent space of the given model through gradient-based optimization. Our contributions are as follows:

- Our method can generate diverse versions of rare images by utilizing the multi-start method in optimization, without being trapped at the same local optima.
- Rarity and diversity of the generated images, as well as similarity to the initial image, can be controlled via a multi-objective optimization framework.
- We demonstrate the effectiveness of our method on various high-resolution image datasets and GANs, both qualitatively and quantitatively.

## 2 Related Work

**Rare Generation.** Han et al. (2023) introduced the rarity score to quantify the uniqueness of individual samples, distinguishing it from conventional metrics that primarily evaluate fidelity or diversity of generated samples (Kynkäänniemi et al. 2019; Zhang et al. 2018; Heusel et al. 2017). The rarity score is defined as the minimum k-nearest neighbor distance (k-NND) among the real samples whose k-NN spheres contain the target sample, with higher scores indicating lower density within the real data manifold. However, obtaining rare samples has received limited attention. Sehwag et al. (2022) addressed this by leveraging a pretrained classifier to estimate likelihoods and adapting the sampling process of diffusion probabilistic models to target low-density regions while maintaining fidelity. Their method focuses on sampling from regions far from class mean vectors in the feature space and penalizes deviations from the overall mean vector of the real data. However, this approach is class-conditional and depends on a Gaussian likelihood function. On the other hand, Humayun, Balestriero, and Baraniuk (2022) tackled the mode collapse issue in GANs with Polarity sampling, a fidelity-diversity controllable resampling strategy for pretrained GANs. It approximates the GAN's output space using continuous piecewise affine splines. By tuning ρ, sampling can focus on modes (ρ < 0) or anti-modes (ρ > 0), with higher ρ increasing diversity by targeting low-density regions. However, it does not guarantee the fidelity of the selected samples and requires extensive sampling and Jacobian matrix computations, leading to high computational costs.

**Quality-Preserved Diverse Generation Using Pretrained GANs.** Generating rare samples is important, but maintaining quality is also crucial for their usefulness (Amabile 2018).
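As a concrete illustration of the k-NND-based rarity score discussed above, the following is a minimal NumPy sketch of the definition (the minimum k-NND over the real samples whose k-NN spheres contain the target); it is not the authors' implementation, and the helper names are ours.

```python
import numpy as np

def knnd(real, k=3):
    """k-th nearest neighbor distance for every real feature vector."""
    d = np.linalg.norm(real[:, None, :] - real[None, :, :], axis=-1)
    d.sort(axis=1)
    return d[:, k]  # column 0 is the zero self-distance

def rarity_score(x, real, k=3):
    """Minimum k-NND over the real samples whose k-NN sphere contains x.
    Returns None (the N/A case) when x lies outside every sphere, i.e.
    outside the estimated real manifold."""
    radii = knnd(real, k)
    dist = np.linalg.norm(real - x, axis=1)
    covering = radii[dist <= radii]
    return covering.min() if covering.size else None
```

A sample deep inside a dense cluster is covered by spheres with small radii (low score), while a sample covered only by the large sphere of an isolated real point gets a high score, matching the intuition that high scores indicate low density.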
Achieving diverse, high-fidelity samples is similar to finding multiple solutions in combinatorial optimization. Chang et al. (2024) addressed this by proposing a quality-diversity algorithm that updates the latent vector to balance quality and diversity, using the CLIP score (Radford et al. 2021) to measure similarity and diversity. In our work, we also optimize the latent vector to generate diverse samples; however, we prioritize rarity as the main objective rather than quality and use Euclidean distance in arbitrary feature spaces, avoiding the additional text constraints required by the CLIP score. To prevent low-fidelity samples, we apply a constraint that keeps the sampled data within the real data manifold.

**Reference-based Generation.** Finding diverse rare variations of a given initial image relates to reference-based image generation, encompassing tasks such as domain adaptation (Yang et al. 2023), editing (Xia et al. 2023), and conditional generation (Casanova et al. 2021). While these approaches involve additional training costs for each attribute (Yang et al. 2023) or reference (Xia et al. 2023), or require a different GAN training scheme (Casanova et al. 2021), our method requires only a single training phase for the density estimator, which is then shared across multiple references with pretrained GANs.

**Density Estimation for Images.** The rarity score identifies samples in low-density regions of the real data manifold, making it valuable for detecting rare generations. However, the non-differentiable nature of k-NN-based manifold estimation complicates gradient-based optimization for directly obtaining rare samples. In contrast, extensive research has been conducted on estimating density in high-dimensional spaces. Normalizing flows (NFs), as likelihood-based probabilistic models, use a sequence of invertible functions to transform a simple density into a complex one, potentially representing multi-modal distributions while preserving data relationships (Papamakarios et al.
2021; Kingma and Dhariwal 2018; Dinh, Sohl-Dickstein, and Bengio 2016; Dinh, Krueger, and Bengio 2014). While NFs may struggle with out-of-distribution data due to their focus on low-level features (Kirichenko, Izmailov, and Wilson 2020), training them on the feature space of a pretrained network, which includes high-level semantic information, can help mitigate this issue (Esser, Rombach, and Ommer 2020).

Figure 2: Schematic diagram of the objective function of our method: (1) rare optimization with multi-start, (2) diversity constraint, (3) similarity constraint. $x^* = f(G(z^*))$ and $x_i = f(G(z_i))$ for brevity. (Legend: initial point $x^*$; optimized points $x_i$; optimization path; random noise addition; penalizing boundary; normalizing flow manifold; k-NN manifold.)

**Multi-Start Method for Diverse Solutions.** We frame our problem of obtaining diverse rare samples for each given reference as identifying multiple local minima around the reference in the data distribution. A straightforward approach to this problem is the multi-start method, an algorithm that iteratively searches for local optima starting from multiple initial points (Feo and Resende 1995; Rochat and Taillard 1995; Rinnooy Kan and Timmer 1987). This method selects several starting positions and applies a local search algorithm to each, aiming to locate distinct local optima. Although easy to implement, it does not always ensure that different starting points reach different local optima (Tarek and Huang 2022). To address this issue, we add diversity and similarity constraints to the objective function, encouraging each initial point to converge to a different minimum.
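The pitfall noted above, that plain multi-start need not find distinct optima, can be shown with a hedged toy example: gradient descent on $f(x) = (x^2 - 1)^2$, which has two minima at $x = \pm 1$ (the function and all settings are illustrative, not from the paper). Perturbed copies of one reference start all lie in the same basin, so every run collapses to the same optimum.

```python
import numpy as np

def descend(x, lr=0.05, steps=200):
    """Plain gradient descent on f(x) = (x**2 - 1)**2, minima at x = -1 and x = +1."""
    for _ in range(steps):
        x = x - lr * 4 * x * (x**2 - 1)  # f'(x) = 4x(x^2 - 1)
    return x

# Multi-start initialization analogous to z_i = z* + eps around one reference point.
starts = 0.4 + np.array([-0.15, -0.05, 0.0, 0.05, 0.15])
optima = np.array([descend(s) for s in starts])
# All starts are in the basin of x = +1, so every run finds the same local minimum;
# this is what the diversity constraint in the objective is meant to prevent.
```

This is exactly the failure mode the diversity and similarity terms address: without them, nothing pushes the perturbed runs toward different minima.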
## 3 Methods

### 3.1 Problem Statement

Given a GAN generator $G = G(z)$, an arbitrary initial latent vector $z^* \in \mathbb{R}^m$, a feature extractor $f = f(I)$, and a density estimator $g = g(x)$, our objective is to find a set of latent vectors $\{z_i\}_{i=1}^N$ that generates diverse rare samples similar to the image generated from $z^*$ (referred to as the reference for the rest of the paper). Here, $I \in \mathbb{R}^{w \times h \times 3}$ and $x \in \mathbb{R}^n$ denote an image and a feature vector, respectively. For simplicity, we denote $x^* = f(G(z^*))$ and $x_i = f(G(z_i))$. In this study, we define similarity via the Euclidean distance in the feature space, $d(x_1, x_2) = \lVert x_1 - x_2 \rVert$, a metric shown to align well with human perception (Zhang et al. 2018). We propose a multi-objective optimization framework that integrates rarity, diversity, and similarity regularization, as illustrated in Fig. 2, and provide a detailed explanation in the subsequent sections.

### 3.2 Rare Sample Generation

For rarity, the density estimated by $g$ is used. NFs are employed due to their remarkable performance in high-dimensional density estimation, though any differentiable density estimator can be applied. NFs provide the exact log-likelihood of individual samples, allowing us to directly define the objective function to minimize as $\mathcal{L}_{rare}(x) = g(x) = \log p(x)$. To control the similarity between the generated rare image and the reference image, we incorporate a regularization term inspired by Chang et al. (2024). Specifically, we define the similarity loss as $\mathcal{L}_{sim}(x) = (\max(d(x, x^*), d^*) - d^*)^2$, which penalizes samples that exceed a predefined boundary of radius $d^*$, referred to as the penalizing boundary throughout this paper. Specifically, we use the distance to the $k'$-nearest neighbor in the fake k-NN manifold for $d^*$. During optimization, we also strictly accept only samples that lie inside this boundary as well as inside the real k-NN manifold $\Phi_{real}$. The optimization goal is formulated by combining the two objectives, rarity and similarity regularization, as follows.
$$\min_{z,\; x = f(G(z))} \; \mathcal{L}_{rare}(x) + \lambda_1 \mathcal{L}_{sim}(x) \quad \text{subject to } x \in \Phi_{real} \text{ and } d(x, x^*) \le d^* \tag{1}$$

### 3.3 Diverse Rare Sample Generation

We utilize the multi-start method to obtain diverse rare images by adding small random noises to $z^*$, generating multiple starting points for optimization. Specifically, $\{z_i\}_{i=1}^N$ are initialized as $z_i = z^* + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$. However, as shown in Fig. 2 (1), this does not guarantee different rare images; the starting points may converge to the same local minima. To address this issue, we add a diversity constraint to the objective, $\mathcal{L}_{div}(x_i) = -\sum_{j \neq i} d(x_i, x_j)^2$, ensuring that the samples are far from each other, as shown in Fig. 2 (2). This term is inspired by the expected distances in feature space, similar to the concept of Maximum Mean Discrepancy (MMD) (Gretton et al. 2006). Combining all objectives, the multi-objective optimization problem is formulated as follows.

$$\min_{z_i,\; x_i = f(G(z_i))} \; \mathcal{L}_{rare}(x_i) + \lambda_1 \mathcal{L}_{sim}(x_i) + \lambda_2 \mathcal{L}_{div}(x_i) \quad \text{subject to } x_i \in \Phi_{real} \text{ and } d(x_i, x^*) \le d^* \tag{2}$$

## 4 Experimental Results

We validate our proposed method using high-resolution image datasets with a resolution of 1024×1024, including Flickr-Faces-HQ (FFHQ) (Karras, Laine, and Aila 2019), Animal Faces-HQ (AFHQ) (Choi et al. 2020), and MetFaces (Karras et al. 2020a). StyleGAN2 with config-f (Karras et al. 2020b) and StyleGAN2-ADA (Karras et al. 2020a) are utilized. For feature extraction, we employ the VGG16-fc2 architecture (Simonyan and Zisserman 2015). As the density estimator, the Glow architecture (Kingma and Dhariwal 2018) is adapted to accommodate the high dimensionality of the feature space. The optimization is performed using the Adam optimizer (Diederik 2014) with a learning rate of $2 \times 10^{-2}$, combined with a StepLR scheduler. The best optimization results are recorded when the lowest loss is achieved according to Equation (2). Additional details, including computational cost, are provided in Appendix B.
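The Eq. (2) objective and its multi-start loop can be sketched in a few lines of NumPy under heavy stand-in assumptions: a unit Gaussian log-density replaces the trained normalizing flow $g$, the composition $f \circ G$ is taken as the identity so that gradients have closed forms, and the hard constraints (real-manifold and boundary acceptance) are omitted, being handled by the accept/reject step described above. All names and constants here are illustrative, not the paper's implementation.

```python
import numpy as np

def optimize_rare(x_ref, n_starts=4, sigma=0.3, d_star=2.0,
                  lam1=30.0, lam2=0.02, lr=0.01, steps=600, seed=0):
    """Multi-start gradient descent on L_rare + lam1*L_sim + lam2*L_div.

    Stand-ins: log p(x) = -0.5*||x||^2 (unit Gaussian instead of the NF),
    and x = z directly (identity instead of f(G(z))).
    """
    rng = np.random.default_rng(seed)
    # Multi-start initialization: x_i = x* + eps, eps ~ N(0, sigma^2 I)
    xs = x_ref + sigma * rng.standard_normal((n_starts, x_ref.size))
    for _ in range(steps):
        # grad of L_rare = log p(x): d/dx (-0.5 ||x||^2) = -x
        g_rare = -xs
        # grad of L_sim = (max(d - d*, 0))^2 w.r.t. x
        diff = xs - x_ref
        d = np.maximum(np.linalg.norm(diff, axis=1, keepdims=True), 1e-12)
        g_sim = 2.0 * np.maximum(d - d_star, 0.0) * diff / d
        # grad of sum_i L_div(x_i) = -sum_i sum_{j != i} ||x_i - x_j||^2
        g_div = -4.0 * (n_starts * xs - xs.sum(axis=0))
        xs = xs - lr * (g_rare + lam1 * g_sim + lam2 * g_div)
    return xs
```

Minimizing $\log p$ pushes each run toward low-density regions, the similarity term pulls a run back once it drifts more than $d^*$ from the reference, and the diversity term spreads the runs apart so they settle in different low-density directions around the reference.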
### 4.1 Generation of Rare Facial Attributes with StyleGAN2 (FFHQ-StyleGAN2)

**Quantitative Results.** As a baseline, 10,000 synthetic samples are generated using latent vectors from StyleGAN2 with a truncation parameter of ψ = 1.0. With our method, we generate ten rare samples for each of 1,000 initial latent vectors from the baseline, using parameters λ1 = 30.0, λ2 = 0.002, σ = 0.1, and k′ = 100¹. The choice of the parameters is explained in Appendix C. For Polarity sampling, 250,000 latent vectors and their corresponding Jacobian matrices are obtained from the authors (Humayun, Balestriero, and Baraniuk 2022), and 10,000 samples are resampled using ρ ∈ {1.0, 5.0} (anti-mode sampling). Results are evaluated with metrics including the Rarity Score (RS) (Han et al. 2023), precision (Prec.) and recall (Rec.) for fidelity and diversity (Kynkäänniemi et al. 2019), the LPIPS score for diversity (Zhang et al. 2018), and the FID score (Heusel et al. 2017), as shown in Table 1. Each metric is computed using 10,000 generated samples, with the LPIPS score averaged over 10,000 random sample pairs. Significant differences in LPIPS scores between sampling methods are confirmed using an unpaired t-test. For the real k-NN manifold, k = 3 is used. Our method improves both rarity and diversity compared to the baseline, even though the optimization uses only 10% of the baseline samples as references. The FID score worsens because more samples are generated in low-density regions, reducing the number of samples near the data distribution's modes. Polarity sampling also enhances rarity and diversity but sacrifices precision, as it primarily targets low-density regions in the GAN's output space rather than the real manifold, often generating out-of-distribution samples (structural zeros; Kim and Bansal 2023). Furthermore, in contrast to our objective of generating rare samples similar to a given reference, Polarity sampling is not designed for this purpose.
Finally, since Polarity sampling operates by resampling from an initial set, its diversity is heavily dependent on the size of that initial set. An additional comparison with Polarity sampling is provided in Appendix G.

¹ For the fake k-NN manifold estimation, 10,000 generated samples are used.

| Model | RS | Prec. | Rec. | LPIPS | FID |
|---|---|---|---|---|---|
| Baseline | 18.88 | 0.69 | 0.56 | 0.73 | 4.17 |
| Polarity (ρ = 1.0) | 24.71 | 0.39 | 0.70 | 0.75 | 33.28 |
| Polarity (ρ = 5.0) | 24.83 | 0.38 | 0.71 | 0.75 | 34.11 |
| Ours | 23.50 | 0.92 | 0.65 | 0.76 | 7.38 |

Table 1: Quantitative evaluation for Section 4.1.

| Category | Attribute | FFHQ | Reference | Ours (%) |
|---|---|---|---|---|
| Age | 0-9 | 7.51 | 7.60 | 8.56 |
| Age | 10-69 | 92.23 | 92.20 | 90.98 |
| Age | Over 70 | 0.25 | 0.20 | 0.45 |
| Gender | Male | 45.46 | 47.44 | 48.50 |
| Gender | Female | 54.53 | 52.55 | 51.49 |
| Race | White | 62.97 | 64.66 | 61.52 |
| Race | Non-White | 37.03 | 35.33 | 38.47 |
| Head Pose | Front | 60.98 | 55.25 | 45.73 |
| Head Pose | Not Front | 39.01 | 44.75 | 54.26 |

Table 2: Percentage of age, gender, race, and head pose attributes predicted by FaceXFormer for Section 4.1.

| Group | LFWA Attribute | FFHQ | Reference | Ours (%) |
|---|---|---|---|---|
| <10% | Pale Skin | 0.12 | 0.20 | 0.37 |
| <10% | Mustache | 0.39 | 0.30 | 0.46 |
| <10% | Bald | 0.88 | 1.20 | 2.18 |
| <10% | Wearing Necktie | 0.96 | 0.80 | 1.23 |
| <10% | Pointy Nose | 1.42 | 1.80 | 2.95 |
| <10% | Gray Hair | 2.43 | 1.90 | 3.12 |
| <10% | Receding Hairline | 3.32 | 2.40 | 3.92 |
| <10% | Big Lips | 4.66 | 5.90 | 7.26 |
| <10% | Blond Hair | 5.04 | 5.20 | 5.72 |
| <10% | Eyeglasses | 6.10 | 5.70 | 6.51 |
| <10% | Wearing Hat | 8.07 | 7.80 | 11.03 |
| >10% | Brown Hair | 19.76 | 22.92 | 13.78 |

Table 3: Percentage of LFWA attributes predicted by FaceXFormer for Section 4.1, sorted by FFHQ (%). The entire table is in Table 11.

**Generated Rare Facial Attributes.** Our method successfully increases the percentages of rare attributes, as shown in Tables 2 and 3. To identify rare facial attributes in FFHQ, we employ FaceXFormer (Narayan et al. 2024), which provides multiple face-related features including age, gender, race, head pose, and the attributes from the Deep Learning Face Attributes in the Wild (LFWA) dataset (Liu et al. 2015).
Real data attributes with lower percentages include extreme ages (very young or old), male gender, non-white races, non-frontal head poses², non-natural skin colors, bald or receding hairlines, hair colors other than brown, and accessories.

² The head pose is predicted in the form of (θ1, θ2, θ3) = (pitch, yaw, roll). We define Front as a head pose with −15° < θ1, θ2, θ3 < 15°.

Figure 3: Examples of high- and low-density real and fake samples for Section 4.1. (Panels: high k-NND real samples; low-likelihood real samples; low-likelihood fake samples; out-of-k-NN-manifold fake samples (rarity score = N/A); high rarity score fake samples.)

Additionally, other rare attributes can be identified qualitatively. We select and visualize the top- and bottom-ranked samples based on k-NN-based and likelihood-based density estimates in Fig. 3. For real samples, the k-NND (Loftsgaarden and Quesenberry 1965) is employed, while the rarity score (Han et al. 2023) is used for fake samples. Likelihoods are estimated using the NF model, excluding samples outside the real k-NN manifold. Rare samples exhibit characteristics such as objects obscuring faces, face painting, various hats, colorful eyeglasses, and artifacts. Notably, samples with undefined (N/A) rarity scores may include high-fidelity, artifact-free images, which arise from underestimated regions of the k-NN manifold.

**Qualitative Results.** As shown in Fig. 1 (Right) and Fig. 4, our method generates samples with rare attributes, including hats, hair colors other than natural brown, very young or old age, non-white races such as Black, Indian, and Asian, non-frontal head poses, eyeglasses, bald or receding hairlines, colorful backgrounds or T-shirts, hair accessories, and unique skin colors. Moreover, the rare samples generated by our method show diversity, as shown in Fig. 1 (Left) and Fig. 5. Starting from initial vectors with small noise variations, we generate diverse rare images that retain perceptual similarity to the reference.
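The Front criterion in footnote 2 is a simple thresholding rule; the sketch below is ours (the function name and the assumption that the predicted angles are in degrees are not from the paper).

```python
def is_front(pitch, yaw, roll, thresh=15.0):
    """'Front' head pose per footnote 2: all three Euler angles
    (pitch, yaw, roll), assumed in degrees, strictly inside (-15, 15)."""
    return all(-thresh < angle < thresh for angle in (pitch, yaw, roll))
```

Any pose failing this test, e.g. a 20° yaw, falls into the rarer Not Front category.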
Additional examples are provided in Appendix D.

### 4.2 Animal Face and Artwork Generation with StyleGAN2-ADA

**Quantitative & Qualitative Results.** As a baseline, 5,000 synthetic samples are generated using latent vectors from StyleGAN2-ADA with a truncation parameter of ψ = 1.0. With our method, we generate five rare samples for each of 1,000 initial latent vectors from the baseline, using parameters λ1 = 200.0, λ2 = 0.02, σ = 0.01, and k′ = 100. We also evaluate the results using various metrics, as presented in Table 4. Given that the AFHQ and MetFaces datasets are relatively small, we use the KID score (Bińkowski et al. 2018) instead of the FID score, as the KID score is inherently unbiased (Karras et al. 2020a). Our method effectively enhances rarity and diversity across all three datasets compared to the baseline. In Fig. 7, we visualize examples generated by our method. More examples are in Figs. 14 and 15.

Figure 4: Examples of rare samples generated by our method for Section 4.1. (Panels: not-front head pose; eyeglasses; hair accessory; unique skin color; bald or receding hairline; colorful background or T-shirt.)

Figure 5: Examples of diverse rare samples generated by our method for Section 4.1. (Panels: initial image; generated diverse rare samples.)

**Generated Rare Attributes of Animal Faces and Artwork.** To identify rare cat and dog breeds in the AFHQ datasets, we employ Model Soups (Wortsman et al. 2022) for zero-shot classification of the cat and dog classes in the ImageNet dataset (Deng et al. 2009), which includes five cat classes and 120 dog classes. In the AFHQ Dog dataset, each dog class represents less than 5% of the total, so we grouped the dogs into broader categories based on appearance: Toy, Hound, Scent Hound, Terrier, Sporting, Non-Sporting, Herding, and Working. Further details are provided in Appendix E. The classification results are shown in Table 5, and our method successfully increases the percentages of minor classes.
We also apply FaceXFormer to the MetFaces dataset and observe rare attributes similar to those in FFHQ, as shown in Table 6. To further identify rare attributes within the datasets, we visualize the high- and low-likelihood samples in Fig. 6.

| Data | Model | RS | Prec. | Rec. | LPIPS | KID ×10³ |
|---|---|---|---|---|---|---|
| AFHQ Cat | Ref. | 17.47 | 0.76 | 0.49 | 0.73 | 0.46 |
| AFHQ Cat | Ours | 20.71 | 0.76 | 0.68 | 0.78 | 2.03 |
| AFHQ Dog | Ref. | 21.19 | 0.77 | 0.56 | 0.75 | 1.10 |
| AFHQ Dog | Ours | 27.32 | 0.85 | 0.76 | 0.80 | 2.73 |
| MetFaces | Ref. | 16.90 | 0.80 | 0.44 | 0.74 | 0.97 |
| MetFaces | Ours | 21.25 | 0.86 | 0.62 | 0.78 | 2.15 |

Table 4: Quantitative evaluation for Section 4.2.

Figure 6: Examples of high- and low-likelihood real samples from the AFHQ Cat, AFHQ Dog, and MetFaces datasets.

For the AFHQ Cat dataset, high-likelihood samples predominantly consist of brown-colored Tabby cats, whereas low-likelihood samples encompass a broader range of classes. In the AFHQ Dog dataset, high-likelihood samples are primarily drawn from the Herding and Sporting groups, including breeds such as Shetland Sheepdogs, Collies, and Retrievers. In contrast, low-likelihood samples span a variety of groups and exhibit greater diversity in head poses, backgrounds, and facial expressions. In the MetFaces dataset, high-likelihood samples predominantly include European-style oil paintings, while low-likelihood samples include statues, drawings, and other styles of painting. As shown in Fig. 7, such rare attributes can also be observed in the results of our method.

### 4.3 Ablation Study on the Objective Function

Our objective function includes three components: the rarity, similarity, and diversity terms. To evaluate their effectiveness, we conduct an ablation study using the FFHQ dataset and StyleGAN2, keeping the density estimator and parameters fixed. We optimize ten samples for each of 100 initial latent vectors across different objective combinations. First, we assess results using only $\mathcal{L}_{rare}$.
Adding $\mathcal{L}_{sim}$ ensures that samples stay within a similarity boundary around the reference, potentially finding rarer and more diverse samples inside the boundary. Finally, incorporating $\mathcal{L}_{div}$ completes the full objective.

Figure 7: Examples of rare samples generated by our method for Section 4.2. (Panels: Tabby → rare cat breed; unique facial expression; Western style → non-Western style; eyeglasses, hat; Herding/Sporting → rare dog group; colorful background.)

| Class | ImageNet Attribute | Real | Reference | Ours (%) |
|---|---|---|---|---|
| Cat | Tabby | 56.76 | 58.00 | 55.10 |
| Cat | Others | 43.23 | 42.00 | 44.90 |
| Dog | Herding | 21.48 | 21.70 | 17.68 |
| Dog | Sporting | 18.50 | 17.30 | 15.24 |
| Dog | Working | 15.44 | 14.60 | 16.84 |
| Dog | Toy | 14.93 | 17.30 | 19.44 |
| Dog | Terrier | 9.11 | 9.60 | 7.72 |
| Dog | Non-Sporting | 7.02 | 5.90 | 7.38 |
| Dog | Scent Hound | 6.92 | 8.10 | 8.92 |
| Dog | Hound | 3.05 | 2.60 | 3.76 |

Table 5: Percentage of the cat-related breeds and dog-related groups in ImageNet classes predicted by Model Soups. Sorted in descending order of Real (%). The entire table is in Table 12.

| Category | Attribute | MetF. | Reference | Ours (%) |
|---|---|---|---|---|
| Age | 0-9 | 5.19 | 3.23 | 5.21 |
| Age | 10-69 | 94.20 | 95.96 | 93.55 |
| Age | Over 70 | 0.60 | 0.80 | 1.23 |
| Gender | Male | 42.84 | 41.71 | 49.62 |
| Gender | Female | 57.15 | 58.28 | 50.37 |
| Race | White | 73.64 | 77.97 | 77.96 |
| Race | Indian | 9.86 | 9.19 | 9.03 |
| Race | Black | 3.38 | 2.02 | 2.33 |
| Race | Asian | 2.86 | 4.04 | 5.25 |
| Race | Others | 10.24 | 6.76 | 5.40 |
| Head Pose | Front | 42.54 | 46.06 | 32.86 |
| Head Pose | Not Front | 57.45 | 53.93 | 67.14 |
| LFWA | Eyeglasses | 0.07 | 0.10 | 0.33 |
| LFWA | Wearing Hat | 12.34 | 9.89 | 12.18 |

Table 6: Percentage of age, gender, race, head pose, Eyeglasses, and Wearing Hat attributes predicted by FaceXFormer. MetF. refers to the MetFaces dataset. The entire table is in Table 17.

| Objective | RS | LPIPS |
|---|---|---|
| $\mathcal{L}_{rare}$ | 18.99 | 0.752 |
| $\mathcal{L}_{rare} + \lambda_1 \mathcal{L}_{sim}$ | 21.11 | 0.766 |
| $\mathcal{L}_{rare} + \lambda_1 \mathcal{L}_{sim} + \lambda_2 \mathcal{L}_{div}$ | 21.28 | 0.768 |

Table 7: Rarity scores (RS) and LPIPS scores for the ablation study on the objective function.

Figure 8: Correlation plots between the negative log-likelihood estimated by the normalizing flow and the k-NND of real samples (k = 3; Pearson corr. = 0.928, p-value ≈ 0) and the rarity score of fake samples (Pearson corr. = 0.815, p-value ≈ 0).
The results in Table 7 demonstrate the effectiveness of each term.

### 4.4 Relationship with the Rarity Score

We use the rarity score (Han et al. 2023) to measure sample rarity and demonstrate that our method improves rarity compared to other sampling methods. Although directly optimizing the rarity score is challenging, our likelihood-based objective effectively guides samples to locally low-density regions. To compare k-NN-based density measures (k-NND for real samples and the rarity score for fake samples) with NF-estimated density measures, we visualize the scatter plots and compute the Pearson correlation coefficients, as represented in Fig. 8. We observe a high Pearson correlation coefficient of 0.928 for real samples and 0.815 for fake samples, with p-values < 10⁻⁸, excluding samples with undefined rarity scores. Although the NF estimates likelihood across the entire feature space, the rarity score is undefined outside the real k-NN manifold. This allows for out-of-manifold samples of sufficient quality, as shown at the bottom of Fig. 3. In Fig. 9, we visualize an optimization example that starts with an undefined rarity score but eventually gains and increases the rarity score. The sample, initially in an underestimated k-NN region, becomes rare by moving to a low-likelihood region, altering the reference image to achieve curlier blonde hair and a non-frontal head pose. We plot the NF-estimated density using RBF kernel interpolation together with UMAP (McInnes, Healy, and Melville 2018) dimensionality reduction on the real feature space and its inverse transformation function. Further details are provided in Appendix F.

Figure 9: Example of the optimization path with a real k-NN manifold and a heatmap of likelihoods estimated by the normalizing flow. Notably, the local k-NN manifold includes only the three nearest real data points (k = 3) for each point, rather than the entire manifold. (Legend: N/A point; optimized points; real points; manifold.)

## 5 Conclusion

We proposed a novel algorithm that generates diverse rare samples using multi-start gradient-based optimization while avoiding low-quality samples. Users can control rarity, diversity, and similarity to the reference through a multi-objective approach. Our method successfully increased the prevalence of rare attributes across various image generation domains. We also provided an experimental comparison between k-NN-based and normalizing-flow-based density estimation methods. We hope this work contributes to advancing creativity in deep generative models. However, some limitations remain. The results rely on the GAN's capabilities and require an additional density estimator. Exploring other generative models might improve outcomes and eliminate the need for extra training. Additionally, our method alters multiple attributes simultaneously; integrating it with other image manipulation techniques could allow for more controlled manipulation.

## Acknowledgments

This work was partly supported by the KAIST-NAVER Hypercreative AI Center, and by the Korean Institute of Information & Communications Technology Planning & Evaluation and the Korean Ministry of Science and ICT under grant agreements No. RS-2019-II190075 (Artificial Intelligence Graduate School Program (KAIST)), No. RS-2022-II220984 (Development of Artificial Intelligence Technology for Personalized Plug-and-Play Explanation and Verification of Explanation), and No. RS-2022-II220184 (Development and Study of AI Technologies to Inexpensively Conform to Evolving Policy on Ethics).

## References

Agarwal, C.; D'souza, D.; and Hooker, S. 2022. Estimating example difficulty using variance of gradients. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10368-10378.
Allahyani, M.; Alsulami, R.; Alwafi, T.; Alafif, T.; Ammar, H.; Sabban, S.; and Chen, X. 2023. DivGAN: A diversity enforcing generative adversarial network for mode collapse reduction. Artificial Intelligence, 317: 103863.
Amabile, T. M. 2018. Creativity in context: Update to the social psychology of creativity. Routledge.
Azadi, S.; Olsson, C.; Darrell, T.; Goodfellow, I.; and Odena, A. 2018. Discriminator rejection sampling. arXiv preprint arXiv:1810.06758.
Bińkowski, M.; Sutherland, D. J.; Arbel, M.; and Gretton, A. 2018. Demystifying MMD GANs. arXiv preprint arXiv:1801.01401.
Brock, A.; Donahue, J.; and Simonyan, K. 2018. Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096.
Casanova, A.; Careil, M.; Verbeek, J.; Drozdzal, M.; and Romero Soriano, A. 2021. Instance-conditioned GAN. Advances in Neural Information Processing Systems, 34: 27517-27529.
Chang, A.; Fontaine, M. C.; Booth, S.; Matarić, M. J.; and Nikolaidis, S. 2024. Quality-Diversity Generative Sampling for Learning with Synthetic Data. Proceedings of the AAAI Conference on Artificial Intelligence, 38(18): 19805-19812.
Chen, X.; Duan, Y.; Houthooft, R.; Schulman, J.; Sutskever, I.; and Abbeel, P. 2016. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. Advances in Neural Information Processing Systems, 29.
Choi, Y.; Uh, Y.; Yoo, J.; and Ha, J.-W. 2020. StarGAN v2: Diverse image synthesis for multiple domains. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8188-8197.
Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; and Fei-Fei, L. 2009. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248-255. IEEE.
DeVries, T.; Drozdzal, M.; and Taylor, G. W. 2020. Instance selection for GANs. Advances in Neural Information Processing Systems, 33: 13285-13296.
Diederik, P. K. 2014.
Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. Dinh, L.; Krueger, D.; and Bengio, Y. 2014. NICE: Non-linear independent components estimation. arXiv preprint arXiv:1410.8516. Dinh, L.; Sohl-Dickstein, J.; and Bengio, S. 2016. Density estimation using Real NVP. arXiv preprint arXiv:1605.08803. Esser, P.; Rombach, R.; and Ommer, B. 2020. A disentangling invertible interpretation network for explaining latent representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9223–9232. Feo, T. A.; and Resende, M. G. 1995. Greedy randomized adaptive search procedures. Journal of Global Optimization, 6: 109–133. Ghosh, A.; Kulharia, V.; Namboodiri, V. P.; Torr, P. H.; and Dokania, P. K. 2018. Multi-agent diverse generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 8513–8521. Gretton, A.; Borgwardt, K.; Rasch, M.; Schölkopf, B.; and Smola, A. 2006. A kernel method for the two-sample-problem. Advances in Neural Information Processing Systems, 19. Han, J.; Choi, H.; Choi, Y.; Kim, J.; Ha, J.-W.; and Choi, J. 2023. Rarity Score: A New Metric to Evaluate the Uncommonness of Synthesized Images. In International Conference on Learning Representations (ICLR). Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; and Hochreiter, S. 2017. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. Advances in Neural Information Processing Systems, 30. Heyrani Nobari, A.; Rashad, M. F.; and Ahmed, F. 2021. CreativeGAN: Editing generative adversarial networks for creative design synthesis. In International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, volume 85383, V03AT03A002. American Society of Mechanical Engineers. Humayun, A. I.; Balestriero, R.; and Baraniuk, R. 2021.
MaGNET: Uniform sampling from deep generative network manifolds without retraining. arXiv preprint arXiv:2110.08009. Humayun, A. I.; Balestriero, R.; and Baraniuk, R. 2022. Polarity sampling: Quality and diversity control of pre-trained generative networks via singular values. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10641–10650. Hwang, S.; Park, S.; Kim, D.; Do, M.; and Byun, H. 2020. FairFaceGAN: Fairness-aware facial image-to-image translation. arXiv preprint arXiv:2012.00282. Karras, T.; Aittala, M.; Hellsten, J.; Laine, S.; Lehtinen, J.; and Aila, T. 2020a. Training generative adversarial networks with limited data. Advances in Neural Information Processing Systems, 33: 12104–12114. Karras, T.; Laine, S.; and Aila, T. 2019. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4401–4410. Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; and Aila, T. 2020b. Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8110–8119. Kim, E.-J.; and Bansal, P. 2023. A deep generative model for feasible and diverse population synthesis. Transportation Research Part C: Emerging Technologies, 148: 104053. Kingma, D. P.; and Dhariwal, P. 2018. Glow: Generative flow with invertible 1x1 convolutions. Advances in Neural Information Processing Systems, 31. Kirichenko, P.; Izmailov, P.; and Wilson, A. G. 2020. Why normalizing flows fail to detect out-of-distribution data. Advances in Neural Information Processing Systems, 33: 20578–20589. Kynkäänniemi, T.; Karras, T.; Laine, S.; Lehtinen, J.; and Aila, T. 2019. Improved precision and recall metric for assessing generative models. In NeurIPS. Lee, J.; Kim, H.; Hong, Y.; and Chung, H. W. 2021.
Self-diagnosing GAN: Diagnosing underrepresented samples in generative adversarial networks. Advances in Neural Information Processing Systems, 34: 1925–1938. Liu, Z.; Luo, P.; Wang, X.; and Tang, X. 2015. Deep Learning Face Attributes in the Wild. In Proceedings of International Conference on Computer Vision (ICCV). Loftsgaarden, D. O.; and Quesenberry, C. P. 1965. A nonparametric estimate of a multivariate density function. The Annals of Mathematical Statistics, 36(3): 1049–1051. Lynn, M.; and Harris, J. 1997. Individual differences in the pursuit of self-uniqueness through consumption. Journal of Applied Social Psychology, 27(21): 1861–1883. Ma, Z.; Mei, G.; and Xu, N. 2024. Generative deep learning for data generation in natural hazard analysis: motivations, advances, challenges, and opportunities. Artificial Intelligence Review, 57(6): 160. McInnes, L.; Healy, J.; and Melville, J. 2018. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426. Naeem, M. F.; Oh, S. J.; Uh, Y.; Choi, Y.; and Yoo, J. 2020. Reliable fidelity and diversity metrics for generative models. In ICML. Narayan, K.; VS, V.; Chellappa, R.; and Patel, V. M. 2024. FaceXFormer: A Unified Transformer for Facial Analysis. arXiv preprint arXiv:2403.12960. Papamakarios, G.; Nalisnick, E.; Rezende, D. J.; Mohamed, S.; and Lakshminarayanan, B. 2021. Normalizing flows for probabilistic modeling and inference. Journal of Machine Learning Research, 22(57): 1–64. Radford, A.; Kim, J. W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, 8748–8763. PMLR. Rinnooy Kan, A.; and Timmer, G. T. 1987. Stochastic global optimization methods part I: Clustering methods. Mathematical Programming, 39: 27–56. Rochat, Y.; and Taillard, E. D. 1995.
Probabilistic diversification and intensification in local search for vehicle routing. Journal of Heuristics, 1: 147–167. Sagar, D.; Risheh, A.; Sheikh, N.; and Forouzesh, N. 2023. Physics-Guided Deep Generative Model For New Ligand Discovery. In Proceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, 1–9. Sehwag, V.; Hazirbas, C.; Gordo, A.; Ozgenel, F.; and Canton, C. 2022. Generating high fidelity data from low-density regions using diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 11492–11501. Simonyan, K.; and Zisserman, A. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. In ICLR. Snyder, C. R.; and Lopez, S. J. 2001. Handbook of positive psychology. Oxford University Press. Srivastava, A.; Valkov, L.; Russell, C.; Gutmann, M. U.; and Sutton, C. 2017. VEEGAN: Reducing mode collapse in GANs using implicit variational learning. Advances in Neural Information Processing Systems, 30. Tarek, M.; and Huang, Y. 2022. Simplifying deflation for nonconvex optimization with applications in Bayesian inference and topology optimization. arXiv preprint arXiv:2201.11926. Teo, C. T.; Abdollahzadeh, M.; and Cheung, N.-M. 2023. Fair generative models via transfer learning. Proceedings of the AAAI Conference on Artificial Intelligence, 37(2): 2429–2437. Thanh-Tung, H.; and Tran, T. 2020. Catastrophic forgetting and mode collapse in GANs. In 2020 International Joint Conference on Neural Networks (IJCNN), 1–10. IEEE. Tolstikhin, I. O.; Gelly, S.; Bousquet, O.; Simon-Gabriel, C.-J.; and Schölkopf, B. 2017. AdaGAN: Boosting generative models. Advances in Neural Information Processing Systems, 30. Turner, R.; Hung, J.; Frank, E.; Saatchi, Y.; and Yosinski, J. 2019. Metropolis-Hastings generative adversarial networks. In International Conference on Machine Learning, 6345–6353. PMLR. Wortsman, M.; Ilharco, G.; Gadre, S.
Y.; Roelofs, R.; Gontijo Lopes, R.; Morcos, A. S.; Namkoong, H.; Farhadi, A.; Carmon, Y.; Kornblith, S.; et al. 2022. Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. In International Conference on Machine Learning, 23965–23998. PMLR. Xia, M.; Shu, Y.; Wang, Y.; Lai, Y.-K.; Li, Q.; Wan, P.; Wang, Z.; and Liu, Y.-J. 2023. FEditNet: few-shot editing of latent semantics in GAN spaces. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, 2919–2927. Yang, C.; Shen, Y.; Zhang, Z.; Xu, Y.; Zhu, J.; Wu, Z.; and Zhou, B. 2023. One-shot generative domain adaptation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 7733–7742. Zeng, X.; Wang, F.; Luo, Y.; Kang, S.-g.; Tang, J.; Lightstone, F. C.; Fang, E. F.; Cornell, W.; Nussinov, R.; and Cheng, F. 2022. Deep generative molecular design reshapes drug discovery. Cell Reports Medicine, 3(12). Zhang, R.; Isola, P.; Efros, A. A.; Shechtman, E.; and Wang, O. 2018. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 586–595.