Disguised Copyright Infringement of Latent Diffusion Models

Yiwei Lu* 1,2, Matthew Y.R. Yang* 1, Zuoqiu Liu* 1, Gautam Kamath 1,2, Yaoliang Yu 1,2

*Equal contribution. 1School of Computer Science, University of Waterloo, Waterloo, Canada. 2Vector Institute. Correspondence to: Yiwei Lu. Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

Abstract

Copyright infringement may occur when a generative model produces samples substantially similar to some copyrighted data that it had access to during the training phase. The notion of access usually refers to including copyrighted samples directly in the training dataset, which one may inspect to identify an infringement. We argue that such visual auditing largely overlooks a concealed copyright infringement, where one constructs a disguise that looks drastically different from the copyrighted sample yet still induces the effect of training Latent Diffusion Models on it. Such disguises only require indirect access to the copyrighted material and cannot be visually distinguished, thus easily circumventing current auditing tools. In this paper, we provide a better understanding of such disguised copyright infringement by uncovering the disguise generation algorithm, the revelation of the disguises, and, importantly, how to detect them to augment the existing toolbox. Additionally, we introduce a broader notion of acknowledgment for comprehending such indirect access. Our code is available at https://github.com/watml/disguised_copyright_infringement.

1. Introduction

Generative models, especially the recent Latent Diffusion Models (LDMs) (Rombach et al. 2022), have shown a tremendous ability to generate new images, even of creative or artistic form, according to text prompts. Such models are trained on a large corpus of data, which may consist of copyrighted material. Additionally, prior works have established that such generative models are prone to regurgitating content from their training data (Ippolito et al. 2023; Zhang et al. 2021; Carlini et al. 2022; Vyas et al. 2023; Somepalli et al. 2023a; Somepalli et al. 2023b), which may also be copyrighted.

In this paper, we focus on copyright law within the jurisdiction of the United States. To establish a copyright violation, two factors must be present. First, the accused must have had access to the copyrighted material. Second, the accused must produce content that bears substantial similarity to the copyrighted material (reproducing). Note that the definition of substantial similarity can be ambiguous. Within the context of images, its definition appears to be relatively broad (Steinberg v. Columbia Pictures Industries, Inc. 1987), and in particular encompasses near-exact copies.

Turning our attention to the former access criterion: the natural way to establish that a model had access to a particular piece of copyrighted material is to inspect its training data. For example, in the case of Andersen v. Stability AI Ltd. (Dist. Court 2023), the case was allowed to proceed based on the fact that copyrighted images were found in LAION-5B (Schuhmann et al. 2022) (the training data used for Stable Diffusion) using haveibeentrained.com. We challenge the perspective that such visual auditing is sufficient to establish access to copyrighted material.
Our results show that it is possible to conceal copyrighted images within the training dataset for LDMs. Specifically, LDMs are equipped with a fixed encoder for dimension reduction such that the diffusion learning process occurs in the latent space. This structure can be maliciously exploited to generate disguised copyrighted samples: given a copyrighted image, we show how to generate a disguise that is visually different from the copyrighted sample but shares similar latent information. The closeness of the two samples in the latent space can be quantitatively measured by a distance function, or qualitatively revealed by a concept extraction tool called textual inversion (Gal et al. 2022), both of which we demonstrate in our empirical study. Our study reveals the possibility of creating a new training dataset that does not appear to directly or blatantly contain any copyrighted data. Nonetheless, if a model is trained on this derivative training dataset, it behaves similarly as if the copyrighted data were present. Such disguises may still constitute copyright infringement, even though they access the proprietary data only indirectly. In Figure 1, we display a comparison between the previous copyright infringement phenomenon (direct access) and the disguised copyright infringement (indirect access).

Figure 1: An overview of conventional (with direct access to copyrighted material) and disguised (with indirect access) copyright infringement for latent diffusion models. For direct access, training an LDM-based model on copyrighted material xc and reproducing xc is subject to copyright infringement. For indirect access, one trains the same model on disguised samples xd, which are drastically different from xc, but is still able to reproduce xc during inference.

Clearly, there was still access to the copyrighted material in the latter training pipeline, which raises the following questions: What constitutes access? How can it be quantified? We answer the first question by introducing a notion of acknowledgment: any sample that contains similar latent information to that of a copyrighted sample should be considered as acknowledging it, despite possible visual dissimilarity. To quantify acknowledgment in practice, a deeper inspection than visually auditing the training set is required. Thus we further propose a two-step detection method: (1) a feature similarity search for screening suspects; (2) an encoder-decoder examination to confirm disguises, which augments the existing criterion.

In summary, we make the following contributions:
- We challenge the current access criterion and point out its insufficiency in more delicate cases of copyright infringement;
- We propose an algorithm that demonstrably crafts disguised copyrighted data to conceal the content (or concepts) of copyrighted images in the training set;
- We show that disguised data contain copyrighted information in the latent space, such that by fine-tuning textual inversion or DreamBooth on them, or by training an LDM on them, the model reproduces copyrighted data during inference;
- We propose methods to detect such disguises, which further encourage the expansion and quantification of access in the context of copyright infringement.

2. Background

2.1. Diffusion Models

We focus on latent diffusion models (Rombach et al. 2022) as they are ideal for text-to-image generation. We first recall the objective of regular diffusion models (Sohl-Dickstein et al. 2015):
$$ L_{DM} = \mathbb{E}_{x, \varepsilon, t}\left[ \left\| \varepsilon - \varepsilon_\theta(x_t, t) \right\|_2^2 \right], $$

where $\varepsilon \sim \mathcal{N}(0, I)$, t is a timestep sampled uniformly from {1, ..., T}, and xt is a noisy version of the input sample x at timestep t. Briefly, diffusion models are probabilistic models designed to learn a data distribution p(x) by gradually denoising a normally distributed variable, which corresponds to learning the reverse process of a fixed Markov chain of length T. Note that the intermediate xt are all in pixel space, which makes the training and inference of diffusion models expensive. To address this problem, Rombach et al. (2022) propose to perform the diffusion process in the latent space, namely latent diffusion models (LDMs). Specifically, LDMs utilize a pre-trained (and fixed) autoencoder architecture which consists of an encoder E and a decoder D, where E is only used during training and D is only used during inference. The objective of LDM can be expressed as:

$$ L_{LDM} = \mathbb{E}_{x, \varepsilon, t}\left[ \left\| \varepsilon - \varepsilon_\theta(E(x_t), t) \right\|_2^2 \right]. $$

When we perform text-to-image tasks, a text condition (or prompt) y needs to be considered. Such a text condition is compressed into the latent space with a pre-trained text-embedding model cθ(·) (typically a BERT (Devlin et al. 2018) text encoder) and fed into LDM training as a conditioning vector:

$$ L_{LDM\text{-}c} = \mathbb{E}_{x, y, \varepsilon, t}\left[ \left\| \varepsilon - \varepsilon_\theta(E(x_t), t, c_\theta(y)) \right\|_2^2 \right]. \quad (1) $$

Intuitively, LDMs perform the diffusion training process on a compressed version of the training data, while the compression procedure is deterministic. As a result, it is possible that two training samples share similar latent representations but differ significantly visually. Based on this intuition, we propose a method to generate disguised data, demonstrating that the use of pre-trained autoencoders may conceal the inclusion of copyrighted data in the training set.

2.2. Data Poisoning Attacks

Data poisoning attacks refer to the threat of contaminating part of the training data such that a machine learning model trained on the mixture of clean and poisoned data is influenced toward certain behaviors. Existing data poisoning attacks mainly focus on poisoning supervised models (classifiers). Based on the objective, there exist (1) indiscriminate attacks (e.g., Biggio et al. 2012; Koh and Liang 2017; Koh et al. 2022; Muñoz-González et al. 2017; Lu et al. 2022; Lu et al. 2023; Lu et al. 2024), which seek to decrease the overall test accuracy; (2) targeted attacks (e.g., Shafahi et al. 2018; Aghakhani et al. 2021; Guo and Liu 2020; Zhu et al. 2019; Geiping et al. 2021) that only alter the prediction on specific test examples; (3) backdoor attacks (e.g., Gu et al. 2017; Tran et al. 2018; Chen et al. 2017; Saha et al. 2020) that trigger if a particular pattern appears in an example; (4) unlearnable examples (e.g., Liu and Chawla 2010; Huang et al. 2021; Yu et al. 2022; Fowl et al. 2021b; Fowl et al. 2021a; Sandoval-Segura et al. 2022; Fu et al. 2021) that aim to reduce the utility of the data towards training a model. Moreover, the recent Nightshade attack (Shan et al. 2024) shows some success in poisoning LDMs on prompt-specific text-to-image tasks; Wan et al. (2023) and Jiang et al. (2023) also explore data poisoning attacks against large language models.

Copyright infringement can be regarded as a special case of data poisoning, where one includes some poisoned data as a subset of a training set χ, such that after training an LDM on χ, the model reproduces a copyrighted sample xc.
For direct access, poisoned data simply refers to direct copies of xc. In this paper, we also examine indirect access, where one can construct the poisoned data so that they are visually different from xc, using an adaptation of the data poisoning attack in Shafahi et al. (2018).

2.3. Textual Inversion

Art generation and the related reproduction of copyrighted material frequently happen in LDM-based generative tools. For example, textual inversion (Gal et al. 2022) is a popular tool for modifying personalized images with language-guided LDMs. Textual inversion uses a small set of images (typically three to five) that depicts the target concept S (a placeholder in a text prompt, e.g., "A photo of S") across multiple settings such as varied backgrounds or poses. The concept S is tokenized, converted to an embedding ν, and then encoded as part of the text embedding cθ(y). The embedding ν is acquired by optimizing the LDM loss in Equation 1 over the training images:

$$ \nu^* = \operatorname*{argmin}_{\nu} \; L_{LDM\text{-}c}. \quad (2) $$

Optimization is performed using the same training scheme as the original LDM while keeping εθ, cθ and E fixed. After obtaining ν*, it can be combined with various text prompts (typically derived from the CLIP ImageNet templates (Radford et al. 2021)) for generation. Overall, textual inversion is a great tool for capturing particular concepts and may also be used to reproduce copyrighted content.

3. Generating Disguises

We describe how to conceal copyrighted images within the training dataset. Specifically, we demonstrate that one can construct a disguised sample xd that is visually distinct from a copyrighted sample xc but contains essentially similar information in the latent space (measured via the distance between feature representations). We first recall the feature matching attack (Shafahi et al. 2018) designed for targeted data poisoning, and then discuss how to adapt the attack to generate disguises.

3.1. Feature Matching Attack

We recall that targeted attacks aim to change the prediction (e.g., causing misclassification) of a model (typically a classifier) on a targeted test sample xt (with label yt) by training on a mixture of the existing clean data and poisoned data. Specifically, Shafahi et al. (2018) propose a feature matching attack, where a poisoned sample xp is acquired by making imperceptible changes to a base sample xb with label yb (different from yt) such that the feature representation of the poisoned sample f(xp) matches that of the target sample f(xt), where the model f is a (usually fixed) pre-trained feature extractor. By training the model (e.g., with a softmax layer on top of f) on xp, it is likely to misclassify the target sample xt as the wrong (base) label yb.

Figure 2: An illustration of the algorithm to generate disguises. We aim to optimize the loss L consisting of an input space constraint that measures the distance between the base image xb and the disguise xd in the input space, and a feature space constraint that measures the distance between the copyrighted xc and xd in the feature space.

Note that the generality of the feature matching attack is restricted in the context of classification tasks because it is only applicable when the feature extractor f is fixed, which may only be the case for fine-tuning or transfer learning. However, we will show how it can be used to craft disguised copyrighted images in the next paragraph.
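To make the feature matching idea concrete, the following is a minimal PyTorch-style sketch of the basic attack described above. The choice of optimizer, the proximity penalty beta, and the interface of the fixed feature extractor f are illustrative assumptions, not the exact recipe of Shafahi et al. (2018).

```python
import torch

def feature_matching_poison(f, x_target, x_base, beta=0.1, lr=0.01, steps=1000):
    """Craft a poison whose features match the target's while staying visually
    close to the base image (sketch of the feature matching attack).

    f: a fixed, pre-trained feature extractor mapping images to feature tensors.
    x_target, x_base: image tensors with a batch dimension, values in [0, 1].
    """
    f.eval()
    with torch.no_grad():
        target_feat = f(x_target)                       # features to imitate, f(x_t)

    x_poison = x_base.clone().requires_grad_(True)      # start from the base image
    opt = torch.optim.Adam([x_poison], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        feat_loss = (f(x_poison) - target_feat).pow(2).sum()   # match f(x_t)
        input_loss = (x_poison - x_base).pow(2).sum()          # stay near x_b
        loss = feat_loss + beta * input_loss
        loss.backward()
        opt.step()
        with torch.no_grad():
            x_poison.clamp_(0.0, 1.0)                   # keep a valid image
    return x_poison.detach()
```

In the adaptation below, the classifier's feature extractor f is replaced by the LDM's fixed encoder E, and the target sample becomes the copyrighted image.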
Adapting to copyright infringement: Training LDMs on a dataset χ amounts to performing a regular diffusion learning process on a latent dataset ζ, where ζ = E(χ). Suppose there exists a set of N copyrighted data {x_c^i}_{i=1}^N ⊆ χ, which we aim to substitute with the same number of disguised data {x_d^i}_{i=1}^N such that the latent dataset ζ stays intact. Compared with data poisoning, the target sample xt is set to xc, the feature extractor f is set to E, the base sample can be any uncopyrighted image different from xc, the sample we aim to optimize is xd, and the feature matching algorithm can be adapted accordingly. Conveniently, the encoder E is pre-trained and its weights are fixed during diffusion training, thus making the above attack realistic.

3.2. Disguised Copyrighted Image Generation

Now we introduce our disguise generation algorithm for individual images. Formally, given a target copyrighted image xc, a base image xb chosen to be visually different from xc, distance measures D1(·,·) and D2(·,·) for the input space and the latent space respectively, and a fixed pre-trained encoder E, we aim to construct a disguised image xd that satisfies:

$$ D_1(x_b, x_d) \le \gamma_1, \qquad D_2(E(x_c), E(x_d)) \le \gamma_2, $$

where γ1 is the input threshold that bounds the visual distance between xd and xb, and γ2 is the threshold on the feature distance between xc and xd; both can be determined empirically. We can then express the objective function as follows:

$$ \operatorname*{argmin}_{x_d} \; \alpha D_1(x_b, x_d) + D_2(E(x_c), E(x_d)), $$

where α is a tunable hyperparameter for controlling the tradeoff. To illustrate, the extreme case α = 0 refers to the scenario where there is no input space constraint on xd, while α = ∞ refers to no feature space constraint. Our algorithm is summarized in Algorithm 1 and Figure 2.

Algorithm 1: Disguise Generation
Input: copyrighted image xc, base image xb, pre-trained encoder E, input threshold γ1, feature threshold γ2, distance measure on the input space D1(·,·), distance measure on the feature space D2(·,·), hyperparameter on the input space constraint α, learning rate η.
1: Initialize the disguise xd with the base image xb
2: repeat
3:   D1 ← D1(xb, xd)            // image distance
4:   D2 ← D2(E(xc), E(xd))      // feature distance
5:   L ← αD1 + D2               // calculate loss
6:   xd ← xd − η ∇_{xd} L       // update disguise
7:   xd ← Proj_Γ(xd)            // project to the admissible set
8: until D1 ≤ γ1 and D2 ≤ γ2
9: return disguise xd

Connection to Nightshade: Shafahi et al. (2018) introduced the idea of attacking a model by manipulating training data points so that their feature representation resembles that of a different point. They applied this principle to targeted data poisoning attacks against classifiers, whereas Shan et al. (2024) (in their work Nightshade) and we apply it to LDMs. Despite this shared high-level idea, there are crucial technical differences in how the two works instantiate the algorithm. Specifically, Nightshade intends to stop the LDM from producing the image containing the correlated concept, whereas we aim to urge the LDM to reproduce the target (copyrighted) image through indirect access. Furthermore, the two works employ this algorithm towards very different ends. Nightshade aims to sabotage the connection between an image and its corresponding text prompt, thus requiring a pair of (image, text) as input. In contrast, our method only requires images as input.

Figure 3: The disguised symbol on textual inversion.
The first row from left to right: Column (1) the designated copyrighted symbol; Columns (2)-(5) four disguises xd generated with different xb. The second row: images generated by textual inversion after training on the above xd. 4. Revealing Disguises Recall that the set of disguised data {xi d}N i=1 is constructed to substitute their copyrighted counterpart {xi c}N i=1 such that the latent dataset ζ is not changed much. To evaluate whether this goal is accomplished, we extract the latent information contained in the acquired samples xd using textual inversion to qualitatively reveal the disguises. We also report the feature threshold γ2 for quantitative analysis. Moreover, examination of textual inversion also demonstrates how disguised copyright infringement may occur in the wild, i.e., reproducing copyrighted materials using products that utilize pre-trained LDMs to generate arts, which bring major concerns for human artists. Such LDM-based methods usually require only a few images as the training set χ, and can be easily substituted by disguises entirely. In Appendix F, we will further extend our evaluation to Dream Booth (Ruiz et al. 2023) and LDM training. 4.1. Experimental settings Note that the copyrighted images xc used in our experiments may not be actually copyrighted, but used as a substitute to demonstrate disguised copyright infringement. LDM: We adopt the official Py Torch implementation1 of conditional LDM (Rombach et al. 2022) and acquire the pre-trained weights2(including that of the encoder E, the denoising U-Net εθ and the text embedding cθ( )) of a 1.45B parameter KL-regularized LDM-8 (8 denotes downsampling factor) model conditioned on language prompts on LAION- 1https://github.com/Comp Vis/latent-diffu sion 2https://ommer-lab.com/files/latent-diffu sion/nitro/txt2img-f8-large/model.ckpt Figure 4: We show the disguised copyrighted content on textual inversion. The first row: the designated copyrighted image xc (The Sunflowers by Vincent Van Gogh); the second row: three disguises xd generated with different xb; the third row: images generated by textual inversion after training on the above xd. 400M (Schuhmann et al. 2021). We apply the pre-trained encoder E in Algorithm 1 to generate disguises, and the entire pre-trained LDM for textual inversion. Note that the LDM is not re-trained in our experiments. Generating Disguises: Throughout our experiments, we apply the pre-trained KL-regularized encoder E, and set the input distance measure D1( ) as a sum of the multi-scale structural similarity index (MS-SSIM) loss (Wang et al. 2003) and L1 loss following the analysis of (Khare et al. 2021), and the feature distance measure to be the L2 loss: D2(E(xc), E(xd)) = E(xc) E(xd) 2. The choice of the copyrighted image xc, base image xb, the input threshold γ1, the feature threshold γ2 and the hyperparameter on the input constraint α are task-dependent (for different copyrighted material), and we specify them in their corresponding paragraphs below. We set the admissible set to be in the range of [0, 1] as the legitimate (normalized) image pixel value and run the algorithm for 100000 epochs (early stop if the stopping criteria are reached) for all experiments. Textual inversion: After acquiring the disguises xd, we feed them (we generate N = {3, 4} disguises for extracting the same concept, following the normal recipe) into textual inversion (Gal et al. 2022) and optimize the embedding ν with Equation 2. 
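To make the embedding optimization of Equation 2 concrete, here is a minimal sketch of this inner loop. The pre-trained `encoder` (E) and `unet` (εθ) modules, the `embed_prompt` helper that splices ν into the frozen text embedding of a prompt such as "a photo of *", and the simplified noise schedule are our assumptions; this is not the official implementation.

```python
import torch

def learn_concept_embedding(encoder, unet, embed_prompt, init_embedding,
                            images, num_timesteps=1000, lr=5e-3, steps=3000):
    """Optimize a single token embedding nu (Eq. 2) on a few training images,
    keeping the encoder E, denoiser eps_theta, and text encoder c_theta frozen."""
    encoder.requires_grad_(False)
    unet.requires_grad_(False)

    # nu starts from the embedding of the chosen initial word.
    nu = init_embedding.clone().requires_grad_(True)
    opt = torch.optim.Adam([nu], lr=lr)

    for step in range(steps):
        x = images[step % len(images)].unsqueeze(0)       # one training image
        z = encoder(x)                                     # latent representation
        t = torch.randint(0, num_timesteps, (1,))
        noise = torch.randn_like(z)
        # Simplistic noising for illustration; the real model uses its own schedule.
        a = 1.0 - t.float() / num_timesteps
        z_t = a.sqrt() * z + (1.0 - a).sqrt() * noise
        cond = embed_prompt(nu)                            # text conditioning with nu spliced in (assumed helper)
        pred = unet(z_t, t, cond)                          # eps_theta prediction
        loss = (pred - noise).pow(2).mean()                # LDM loss, Eq. (1)
        opt.zero_grad(); loss.backward(); opt.step()
    return nu.detach()
```

Only the single embedding ν receives gradient updates; E, εθ, and cθ stay frozen, matching the training scheme described above.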
We adopt the official Py Torch implementa- Disguised Copyright Infringement of Latent Diffusion Models Figure 5: We show the disguised copyrighted style on textual inversion. The first row: the designated copyrighted style xc (in the style of The Starry Night by Vincent Van Gogh); the second row: disguises xd generated with different xb; the third row: images generated by textual inversion after training on the above xd. tion3, which utilizes the pre-trained KL-regularized LDM-8 with the same encoder E. Note that training textual inversion requires an initial word as input, which is an initialization for the concept S . Throughout our experiments, we choose the initial word to match the visual appearance of the disguises (thus the concept of the base image) and we observe that changing the initial word does not lead to a significant difference for the algorithm. For generating new images, we select from the pool of prompts included in the existing implementation of textual inversion4 which we specify for each task below. 4.2. Concepts learned on disguises We demonstrate the concepts learned by textual inversion on the disguises with respect to different copyrighted patterns. (1) a copyrighted symbol; (2) copyrighted content; and (3) style. We note that although the style is generally not copyrightable, there are still ongoing debates about the ethics of using copyrighted artwork to train a model that can imitate an artist s style5. For this reason, we also examine 3https://github.com/rinongal/textual_inve rsion 4https://github.com/rinongal/textual_inve rsion/blob/main/ldm/data/personalized.py 5https://news.bloomberglaw.com/ip-law/a i-imitating-artist-style-drives-call-to-r ethink-copyright-law the case of hiding an artist s style in disguised images. Disguised symbol: We first show the easiest scenario to generate disguises, namely copyrighted symbol. In Figure 3, we pick the symbol A as xc (note that here we abstract xc as the symbol A , in our experiment, it is combined with each base image to create the corresponding copyrighted image xc, see Figure 8 in Appendix A) and choose four images of a water bottle as base images xb. The base images do not need to be carefully constructed and are photos taken by ourselves. Each disguise xd is created by taking a pair of xc and xb as input to Algorithm 1. We set γ1 = 0.05, γ2 = 0.35 (normalized to the range of [0, 1]) as the threshold of input and feature distance, respectively, and α = 8000. By feeding xd into textual inversion with the text prompt a photo of a * , we reproduce the target symbol A without being exposed to the semantic information of the copyrighted content. Disguised content We then show a more challenging task to generate disguised content. In Figure 4, we pick three drawings of The Sunflowers6 by Vincent Van Gogh as xc (first row). Due to the difficulty of the task, we cannot choose any image as xb. Thus we first blur xc such that they lose their semantic information and retain the color pattern, then we add simple sketches of houses7 as base images xb (second row). In Figure 14 in Appendix A, we show the background is essential for our purpose, where the disguises are ineffective with a white background. We set γ1 = 0.08, γ2 = 0.26, α = 4000. By feeding xd into textual inversion with the text prompt a photo of a * , we recover the content of the sunflowers (third row) without containing the semantic information of xd. Disguised style Finally, we show the results for style scraping. 
In Figure 5, we pick three drawings8 in the style of The Starry Night by Vincent Van Gogh as xc (first row). The base images xb (second row) are the target images with another style (watercolor), generated with Ada INbased (Huang and Belongie 2017) style transfer9. We set γ1 = 0.03, γ2 = 0.33, α = 2000. By feeding xd into textual inversion with the text prompt a painting in the style of * , we reproduce the style of xc (The Starry Night) while textual inversion only learns from images that visually resemble watercolor style. 6From left to right: F456 (1888), F453 (1888), F455 (1889). 7These simple sketches of houses were generated by Chat GPT with the text prompt create a very very simple line art of a very simple house in the middle with a white background 8From left to right: The Neckarfront in Tubingen, Germany (photo by Andreas Praefcke); The Faroe Islands (courtesy Listasavn Føroya); Taj Mahal (adamkaz/Getty Images) in the style of The Starry Night, generated with Neural Style Transfer (Gatys et al. 2015). 9https://github.com/tyui592/Ada IN_Pytorch Disguised Copyright Infringement of Latent Diffusion Models no constraint (α = 0) base (α = ) Figure 6: The effect of tuning the hyperparameter α on disguises (Rows 1-2: symbol; Rows 3-4: style). Rows 1 and 3 show disguises generated with different α; Rows 2 and 4 show the images generated by textual inversion by training on the disguises above. Note that the criterion on the input and feature threshold may not be satisfied in this series of experiments and we only train the disguise generation algorithm until convergence. In summary, for disguised symbols, generating the disguises xd with any base image xb is easy. However, for generating disguised content and style, one needs to choose the base image carefully for successful optimization. 4.3. Visual appearance of disguises Next, we control the visual appearance of xd by tuning α, which is the weighting parameter of the input space constraint. In Figure 6, we show the tradeoff of tuning the hyperparameter α: (1) a smaller α indicates weaker input space constraint and it could lead to an ineffective disguise, where xd still visually contains the copyrighted material; (2) in contrast, a bigger α shifts the optimization focus away from feature matching to generate latent embeddings distinct from the copyrighted content s. In the latter case, the tradeoff is that despite having a strong disguise that visually hides the copyrighted content, the textual inversion process may not successfully learn the copyrighted material, thus rendering our attack pointless. Note that there is no input constraint for the extreme case α = 0, and the visual appearance of disguises largely depends on the initialization (we initialize with xb above) and we show the results for different initialization in Figure 15 in Appendix A. 4.4. Detection Next, we introduce a two-step detection method specifically for disguised samples that go beyond browsing through the training set, e.g., haveibeentrained.com. (1) Feature similarity search: Quantitatively, a disguised sample has a similar feature representation with that of a copyrighted sample, i.e., D2(E(xc), E(xd)) γ2. As we have provided the feature threshold γ2 which is sufficient for replicating the copyrighted content for different tasks, one can use this as a reference threshold to detect possible disguises. 
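A minimal sketch of this screening step is given below; the dataset interface and the use of a plain L2 latent distance are our assumptions. The flagged suspects are then passed to the encoder-decoder examination described next.

```python
import torch

@torch.no_grad()
def screen_suspects(encoder, x_c, dataset, gamma2):
    """Step (1): flag training images whose latent lies within gamma2 of E(x_c).

    encoder: the fixed pre-trained encoder E of the LDM.
    x_c:     the copyrighted image under examination (tensor, values in [0, 1]).
    dataset: an iterable of (index, image) pairs; this interface is an assumption.
    """
    encoder.eval()
    z_c = encoder(x_c.unsqueeze(0)).flatten()
    suspects = []
    for idx, x in dataset:
        z = encoder(x.unsqueeze(0)).flatten()
        d2 = torch.linalg.vector_norm(z - z_c)       # L2 feature distance D2
        if d2 <= gamma2:
            suspects.append((idx, d2.item()))
    # Each suspect should then be confirmed with the encoder-decoder examination
    # (step 2): decode it and check whether D(E(x)) resembles x_c rather than x.
    return sorted(suspects, key=lambda p: p[1])
```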
Specifically, given an encoder E (which is usually easily accessible), a copyrighted image xc which needs examination for infringement, one simply goes through the training set and computes E(x) for every single sample and Disguised Copyright Infringement of Latent Diffusion Models Figure 7: Results for the encoder-decoder examination. The first row shows the disguises xd and the second row includes the output of the autoencoder: D(E(xd)). Columns 1-3, 4-6, and 7-9 show results for symbol, content, and style, respectively. The autoencoder reveals the copyrighted information contained in the disguises. compare with that of xc. However, such a search alone only extracts suspects, which may not be true disguises. To rule out the sheer chance of collisions in the feature space, we perform another examination below. (2) Encoder-decoder examination: The encoder architecture in LDMs is part of an autoencoder architecture (e.g., the KL-based VAE), where the encoder and decoder are used separately for encoding and inference. Consequently, the decoder D can be naturally used to detect disguises qualitatively. Specifically, for a well-trained autoencoder, D(E(xc)) xc, while for disguises E(xd) E(xc), thus we have D(E(xd)) xc. In Figure 7, we show that the encoder-decoder architecture is a great detection tool for disguises, where the output of the autoencoder reveals the copyrighted content hidden in xd. 4.5. A broader definition of access Our experiments have demonstrated that by generating disguises, one can indirectly acquire access to copyrighted material as part of the training data to fuel LDM-based models, which could reproduce the copyrighted samples during inference or deployment. We expand the definition of direct access to acknowledgment to cover such scenarios as possible copyright infringement: any training data x that contains similar latent information as that of copyrighted image xc, measured by their similarity in the latent feature representation, even visually different from xc, shall be considered to have an acknowledgment of xc. The notion of acknowledgment may also augment existing regulatory frameworks (e.g., the White House Executive Order on AI (Biden 2023)) on AI governance in terms of data quality evaluation. Additionally, our quantification of acknowledgment (the detection method) provide a timely tool for auditing beyond black-box access, which was pointed out to be insufficient in (Casper et al. 2024). 5. Conclusion In this paper, we review the current access criterion of containing copyrighted material in the training set (direct access) in copyright infringement of generative models and point out its insufficiency by introducing disguised copyright infringement (indirect access). Specifically, such an infringement is realized by injecting disguised samples into the training set, which urges LDMs to produce copyrighted content. Such disguises are generated with a simple algorithm and demonstrated to share the same concept with their target copyrighted images using the textual inversion tool. To alleviate the concern on the disguises, we expand the current visual auditing (browsing the training set) with additional tools, i.e., feature similarity search and encoderdecoder examination to better identify these disguises. Furthermore, we propose a broader definition of acknowledgment to cover this new type of copyright violation. 
Limitations and future work: One interesting future work is to quantify the number of disguises needed for reproducing in large-scale training, which can be further linked to the quantification of memorization of such models (Carlini et al. 2022; Somepalli et al. 2023a; Somepalli et al. 2023b; Carlini et al. 2023; Ippolito et al. 2023; Zhang et al. 2021)). Additionally, although our algorithms can generate descent disguises, we believe there is still room for improvement for optimization. Finally, one extension we didn t touch is the possibility of chopping copyrighted data and hiding it in several images. It is intriguing to explore whether it is possible to generate such a smuggler s dataset and detection towards it. Learning from noisy data A simultaneous and independent work (Daras et al. 2024) considers learning LDMs from noisy data. Although the techniques are very different, our works share a similar implication: the training dataset may not immediately resemble the generations produced, thus al- Disguised Copyright Infringement of Latent Diffusion Models lowing copyright issues to be disguised from an auditor who manually inspects the dataset. In the work of Daras et al. (2024), the training data is the original data with Gaussian noise applied to it. In our case, the training data has the original data hidden in the latent space. In Section 4.5, we argue that if training data contains the same latent information as some copyrighted data, then it should be acknowledged. Daras et al. (2024) show that even when the latent space is corrupted the copyrighted information could still be contained, which could raise a stronger disguised copyright infringement attack. Acknowledgment and fair use Finally we discuss the anticipated impact of our definition of acknowledgment on judicial decisions. Specifically, we believe acknowledgment will not directly affect the current fair use doctrine or change its decision. To further illustrate the above claim, we first propose two principles: (1) copyright infringement is not fair use: although generative AI law is not rigorously defined, we generally assume that reproducing the copyrighted content is not fair use when there s direct access; (2) disguised infringement does not (appear to) use the copyrighted data: there is no decision of fair use doctrine here as there is no apparent usage of the copyrighted data, not to mention fair use . Thus the decision is on the originality of the output (reproduction of the copyrighted content) but not fair use. As a result, the notion of acknowledgment aims to identify disguised copyright infringement in the second principle and challenge the decision of originality . Based on the acquired outcome (reproduction), there would be no scenarios that acknowledgment can change the decision of fair use according to the first principle. Impact Statement In this paper, we adopt a simple algorithm to disguise images, which we realize may be applied as a concealed copyright infringement tool. We are surprised at how easy it is for Generative AI to circumvent the copyright law, which was originally designed for regulating human scrapers. This seemingly undetectable infringing threat may bring disadvantages for human artists, who are already losing the battle with Gen AI. 
Furthermore, disguised images may not only be a concern to copyright owners but also a threat to Generative AI companies: if these disguises are accidentally collected as part of the training data, copyright infringement could be triggered unconsciously and unwillingly. As computer scientists, we are obligated to reveal the existence of such possible unlawful acts and provide technical tools to identify such behaviors. We expect our paper to be a prudent tool such that disguised copyright infringement and its corresponding detection methods are recognized by law experts. Acknowledgements We thank the reviewers for the critical comments that have largely improved the presentation and precision of this paper. We gratefully acknowledge funding support from NSERC and the Canada CIFAR AI Chairs program. YY is also supported by the Ontario early researcher program. Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute. Aghakhani, H., D. Meng, Y.-X. Wang, C. Kruegel, and G. Vigna (2021). Bullseye polytope: A scalable cleanlabel poisoning attack with improved transferability . In: IEEE European Symposium on Security and Privacy (Euro S&P), pp. 159 178. Biden, J. R. (2023). Executive order on the safe, secure, and trustworthy development and use of artificial intelligence . Biggio, B., B. Nelson, and P. Laskov (2012). Poisoning attacks against support vector machines . In: Proceedings of the 29th International Conference on Machine Learning (ICML), pp. 1467 1474. Carlini, N., D. Ippolito, M. Jagielski, K. Lee, F. Tramer, and C. Zhang (2022). Quantifying Memorization Across Neural Language Models . In: The Eleventh International Conference on Learning Representations. Carlini, N., J. Hayes, M. Nasr, M. Jagielski, V. Sehwag, F. Tramer, B. Balle, D. Ippolito, and E. Wallace (2023). Extracting training data from diffusion models . In: 32nd USENIX Security Symposium (USENIX Security 23), pp. 5253 5270. Casper, S. et al. (2024). Black-box access is insufficient for rigorous AI audits . ar Xiv preprint ar Xiv:2401.14446. Chen, X., C. Liu, B. Li, K. Lu, and D. Song (2017). Targeted backdoor attacks on deep learning systems using data poisoning . ar Xiv:1712.05526. Daras, G., A. G. Dimakis, and C. Daskalakis (2024). Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data . ar Xiv preprint ar Xiv:2404.10177. Devlin, J., M.-W. Chang, K. Lee, and K. Toutanova (2018). Bert: Pre-training of deep bidirectional transformers for language understanding . ar Xiv preprint ar Xiv:1810.04805. Disguised Copyright Infringement of Latent Diffusion Models Dist. Court, N. C. (2023). Andersen v. STABILITY AI LTD. Case No. 23-cv-00201-WHO. Fowl, L., P.-y. Chiang, M. Goldblum, J. Geiping, A. Bansal, W. Czaja, and T. Goldstein (2021a). Preventing unauthorized use of proprietary data: Poisoning for secure dataset release . ar Xiv preprint ar Xiv:2103.02683. Fowl, L., M. Goldblum, P.-y. Chiang, J. Geiping, W. Czaja, and T. Goldstein (2021b). Adversarial Examples Make Strong Poisons . In: Advances in Neural Information Processing Systems, pp. 30339 30351. Fu, S., F. He, Y. Liu, L. Shen, and D. Tao (2021). Robust unlearnable examples: Protecting data privacy against adversarial learning . In: International Conference on Learning Representations. Gal, R., Y. Alaluf, Y. Atzmon, O. Patashnik, A. H. Bermano, G. Chechik, and D. Cohen-Or (2022). 
An image is worth one word: Personalizing text-to-image generation using textual inversion . ar Xiv preprint ar Xiv:2208.01618. Gatys, L. A., A. S. Ecker, and M. Bethge (2015). A neural algorithm of artistic style . ar Xiv preprint ar Xiv:1508.06576. Geiping, J., L. H. Fowl, W. R. Huang, W. Czaja, G. Taylor, M. Moeller, and T. Goldstein (2021). Witches Brew: ial Scale Data Poisoning via Gradient Matching . In: International Conference on Learning Representations. Gu, T., B. Dolan-Gavitt, and S. Garg (2017). Badnets: Identifying vulnerabilities in the machine learning model supply chain . ar Xiv:1708.06733. Guo, J. and C. Liu (2020). Practical Poisoning Attacks on Neural Networks . In: European Conference on Computer Vision, pp. 142 158. Huang, H., X. Ma, S. M. Erfani, J. Bailey, and Y. Wang (2021). Unlearnable Examples: Making Personal Data Unexploitable . In: International Conference on Learning Representations. Huang, X. and S. Belongie (2017). Arbitrary style transfer in real-time with adaptive instance normalization . In: Proceedings of the IEEE international conference on computer vision, pp. 1501 1510. Ippolito, D., F. Tramèr, M. Nasr, C. Zhang, M. Jagielski, K. Lee, C. A. Choquette-Choo, and N. Carlini (2023). Preventing generation of verbatim memorization in language models gives a false sense of privacy . In: Proceedings of the 16th International Natural Language Generation Conference. Association for Computational Linguistics, pp. 28 53. Jiang, S., S. R. Kadhe, Y. Zhou, L. Cai, and N. Baracaldo (2023). Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks . ar Xiv preprint ar Xiv:2312.04748. Khare, N., P. S. Thakur, P. Khanna, and A. Ojha (2021). Analysis of Loss Functions for Image Reconstruction Using Convolutional Autoencoder . In: International Conference on Computer Vision and Image Processing. Springer, pp. 338 349. Koh, P. W. and P. Liang (2017). Understanding black-box predictions via influence functions . In: Proceedings of the 34th International Conference on Machine Learning (ICML), pp. 1885 1894. Koh, P. W., J. Steinhardt, and P. Liang (2022). Stronger Data Poisoning Attacks Break Data Sanitization Defenses . Machine Learning, vol. 111, pp. 1 47. Liu, W. and S. Chawla (2010). Mining adversarial patterns via regularized loss minimization . Machine learning, vol. 81, no. 1, pp. 69 83. Lu, Y., G. Kamath, and Y. Yu (2022). Indiscriminate Data Poisoning Attacks on Neural Networks . Transactions on Machine Learning Research. (2023). Exploring the limits of model-targeted indiscriminate data poisoning attacks . In: International Conference on Machine Learning. PMLR, pp. 22856 22879. Lu, Y., M. Y. Yang, G. Kamath, and Y. Yu (2024). Indiscriminate Data Poisoning Attacks on Pre-trained Feature Extractors . ar Xiv preprint ar Xiv:2402.12626. Muñoz-González, L., B. Biggio, A. Demontis, A. Paudice, V. Wongrassamee, E. C. Lupu, and F. Roli (2017). Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization . In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security (AISec), pp. 27 38. Radford, A. et al. (2021). Learning transferable visual models from natural language supervision . In: International conference on machine learning. PMLR, pp. 8748 8763. Rombach, R., A. Blattmann, D. Lorenz, P. Esser, and B. Ommer (2022). High-resolution image synthesis with latent diffusion models . 
In: Proceedings of the IEEE/CVF Disguised Copyright Infringement of Latent Diffusion Models conference on computer vision and pattern recognition, pp. 10684 10695. Ruiz, N., Y. Li, V. Jampani, Y. Pritch, M. Rubinstein, and K. Aberman (2023). Dreambooth: Fine tuning text-toimage diffusion models for subject-driven generation . In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 22500 22510. Saha, A., A. Subramanya, and H. Pirsiavash (2020). Hidden trigger backdoor attacks . In: Proceedings of the AAAI Conference on Artificial Intelligence. Sandoval-Segura, P., V. Singla, J. Geiping, M. Goldblum, T. Goldstein, and D. W. Jacobs (2022). Autoregressive Perturbations for Data Poisoning . In: Advances in Neural Information Processing Systems. Schuhmann, C., R. Kaczmarczyk, A. Komatsuzaki, A. Katta, R. Vencu, R. Beaumont, J. Jitsev, T. Coombes, and C. Mullis (2021). LAION-400M: Open Dataset of CLIPFiltered 400 Million Image-Text Pairs . In: Neur IPS Workshop Datacentric AI. FZJ-2022-00923. Jülich Supercomputing Center. Schuhmann, C. et al. (2022). Laion-5b: An open large-scale dataset for training next generation image-text models . Advances in Neural Information Processing Systems, vol. 35, pp. 25278 25294. Shafahi, A., W. R. Huang, M. Najibi, O. Suciu, C. Studer, T. Dumitras, and T. Goldstein (2018). Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks . In: Advances in Neural Information Processing Systems (Neur IPS), pp. 6103 6113. Shan, S., W. Ding, J. Passananti, S. Wu, H. Zheng, and B. Y. Zhao (2024). Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models . In: 2024 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, pp. 212 212. Sohl-Dickstein, J., E. Weiss, N. Maheswaranathan, and S. Ganguli (2015). Deep unsupervised learning using nonequilibrium thermodynamics . In: International conference on machine learning. PMLR, pp. 2256 2265. Somepalli, G., V. Singla, M. Goldblum, J. Geiping, and T. Goldstein (2023a). Diffusion art or digital forgery? investigating data replication in diffusion models . In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6048 6058. (2023b). Understanding and Mitigating Copying in Diffusion Models . In: Advances in Neural Information Processing Systems (Neur IPS). Steinberg v. Columbia Pictures Industries, Inc. (1987). Tran, B., J. Li, and A. Madry (2018). Spectral Signatures in Backdoor Attacks . In: Advances in Neural Information Processing Systems (Neur IPS). Vyas, N., S. Kakade, and B. Barak (2023). On provable copyright protection for generative models . In: Proceedings of the 40 th International Conference on Machine Learning. Wan, A., E. Wallace, S. Shen, and D. Klein (2023). Poisoning Language Models During Instruction Tuning . ar Xiv preprint ar Xiv:2305.00944. Wang, Z., E. P. Simoncelli, and A. C. Bovik (2003). Multiscale structural similarity for image quality assessment . In: The Thrity-Seventh Asilomar Conference on Signals, Systems & Computers, 2003. Vol. 2. Ieee, pp. 1398 1402. Yu, D., H. Zhang, W. Chen, J. Yin, and T.-Y. Liu (2022). Availability Attacks Create Shortcuts . In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 2367 2376. Zhang, C., D. Ippolito, K. Lee, M. Jagielski, F. Tramèr, and N. Carlini (2021). Counterfactual memorization in neural language models . ar Xiv preprint ar Xiv:2112.12938. Zhu, C., W. R. Huang, H. Li, G. Taylor, C. Studer, and T. 
Goldstein (2019). Transferable clean-label poisoning attacks on deep neural nets . In: International Conference on Machine Learning, pp. 7614 7623. Disguised Copyright Infringement of Latent Diffusion Models A. Additional Experiments on Revealing Disguises We provide additional experimental results on revealing disguises to further support our observations. Textual inversion on xc and xb: Recall that Figure 3, Figure 4 and Figure 5 show the disguised images xd and the new images generated by textual inversion. To further show the effect of our algorithm on generating disguises, we directly perform a textual inversion (with the same setting in our main experiments) on the copyrighted images xc and the base images xb below. Firstly, for copyrighted symbols: Learning on the copyrighted images xc: Figure 8: Textual inversion on images xc with the copyrighted symbol A . The first row shows the copyrighted images xc, and the second row shows new images generated by textual inversion after learning on the above xc. Learning on the base images xb: Figure 9: Textual inversion on base images xb (different images of a bottle). The first row shows the base images xb, and the second row shows new images generated by textual inversion after learning on the above xb. Comparing with Figure 3, we observe that although the disguises xd look visually similar to their corresponding base images xb, they contain drastically different latent information. Disguised Copyright Infringement of Latent Diffusion Models Next, we show the results for copyrighted content in the same format: Learning on the copyrighted images xc: Figure 10: Textual inversion on copyrighted images xc (The Sunflowers by Vincent Van Gogh). The first row shows the copyrighted images xc, and the second row shows new images generated by textual inversion after learning on the above xc. Learning on the base images xb: Figure 11: Textual inversion on base images xb (images of sketched houses with the blurry version of the corresponding xc as the background). The first row shows the base images xb, and the second row shows new images generated by textual inversion after learning on the above xb. Compared to Figure 4, we observe that the background (blurry version of the xc) does not provide any information regarding the copyrighted images xc, and the success of the disguises is accomplished by our generating algorithm. Disguised Copyright Infringement of Latent Diffusion Models Finally, we show the results for style scraping: Learning on the copyrighted images xc: Figure 12: Textual inversion on copyrighted images xc (images in the style of The Starry Night by Vincent Van Gogh). The first row shows the copyrighted images xc, and the second row shows new images generated by textual inversion after learning on the above xc. Learning on the base images xb: Figure 13: Textual inversion on base images xb (images in watercolor style). The first row shows the base images xb, and the second row shows new images generated by textual inversion after learning on the above xb. Compared to Figure 5, we again confirm that the disguises xd generated for style scraping look visually similar to their corresponding base images xb but contain latent information similar to xc. Choice of xb for generating disguised content: Recall that for generating disguises in Figure 4, we choose a blurry version of xc as background to retain the color pattern. Here we demonstrate the necessity of the background by substituting it for a white background as base images xb. 
In Figure 14, we observe that the disguises fail to reproduce the copyrighted content on textual inversion.

Figure 14: Failure to learn the copyrighted content (The Sunflowers) from a disguised sketch of the house with a white background. The first row: disguises; the second row: images generated by textual inversion.

Initialization for unbounded disguises: In Figure 6 (first column), we show the disguises generated without any input constraints. Note that the disguise (for the copyrighted symbol) still contains the base image as background, as the disguise is initialized with xb. To confirm, we show two different initializations (zero tensor and Gaussian noise) in Figure 15 and observe that the background diminishes while the disguises still contain the copyrighted symbol.

Figure 15: Textual inversion on the disguised symbol without any input constraints. We deploy different initializations for generating disguises: the first row: zero tensor; the second row: Gaussian noise.

B. Data augmentation

In textual inversion, images are horizontally flipped with 50% probability. In this section, we explore how this affects the success of our disguises. The top two rows of Figure 16 show the results of training textual inversion on our disguise as well as its horizontally flipped augmentation. For the simpler concept of watermarking, textual inversion learns a concept similar to the intended one even when the input image is flipped. On the other hand, when the concept becomes more difficult, textual inversion struggles to learn the concept under a horizontal flip augmentation (top right of Figure 16).

Figure 16: Textual inversion on different disguises. Top left: textual inversion on a single watermarking disguise. Top right: textual inversion on a single content disguise. Bottom row: same as top row, but with robust disguises.

To ensure that a horizontal flip augmentation would not damage the quality of our poison, we construct a new robust poison that additionally penalizes the distance between the features of the horizontally flipped poison and the horizontally flipped copyrighted image xc. This can be done with a simple change to the objective, from

$$ \operatorname*{argmin}_{x_d} \; \alpha D_1(x_b, x_d) + D_2(E(x_c), E(x_d)) $$

to

$$ \operatorname*{argmin}_{x_d} \; \alpha D_1(x_b, x_d) + D_2(E(x_c), E(x_d)) + D_2(E(\mathrm{hflip}(x_c)), E(\mathrm{hflip}(x_d))). $$

The bottom two rows of Figure 16 demonstrate the effectiveness of the robust poison and its qualitative improvement under horizontal flipping.

Table 1: The encoder-decoder examination's threshold ζ, mean reconstruction losses, false positive rate, and binary AUROC across symbol, content, and style disguises. The second group of rows is for the setting of a mixture with base images, and the third group of rows is for the setting of a mixture with 100 randomly selected Imagenette images.

|                                   | Symbol | Content | Style  |
| E_xd[L_reconstruct(xd)]           | 0.0856 | 0.1632  | 0.2667 |
| Threshold ζ                       | 0.0703 | 0.1412  | 0.2506 |
| E_xb[L_reconstruct(xb)]           | 0.0502 | 0.0256  | 0.0551 |
| FPR (xb misclassified)            | 1/4    | 0/3     | 0/3    |
| AUC                               | 0.8750 | 1       | 1      |
| E_xclean[L_reconstruct(xclean)]   | 0.0655 | -       | -      |
| FPR (xclean misclassified)        | 42/100 | 2/100   | 0/100  |
| AUC                               | 0.7550 | 0.9933  | 1      |

C.
Quantitative detection In Section 4.4, we show how to qualitatively distinguish a disguise with encoder-decoder examination. In this section, we further examine two quantitative detection scenarios with/without the presence of the copyrighted image xc: Disguised Copyright Infringement of Latent Diffusion Models Without the presence of xc: Specifically, we observe that in Figure 7, for disguises xd, the input xd and output of the autoencoder D(E(xd)) are significantly different, i.e., the reconstruction loss D1(D(E(xd)), xd) is big (recall that D1 is the distance measure in the input space, a sum of MS-SSIM loss and L1 loss ). This property differs from normal images, which are expected to have low reconstruction loss. Leveraging this intuition, we develop a detection criterion for disguised images: x is a disguise if Lreconstruct(x) := D1(D(E(x)), x) ζ for some threshold ζ. The choice of ζ differs on different tasks (depending on the choice of the task, which we specify below). Note that this detection mechanism does not require any knowledge of the copyrighted image and is simply a data sanitization method that can be applied to any dataset. In Table 1, we show the effectiveness of our proposed method across the three tasks we performed in Section 4 (copyrighted symbol, content, and style). Row 2 shows the mean of the reconstruction loss for the disguises we generated for each task; Row 3 shows the choice of threshold ζ as the lowest reconstruction loss out of all the disguised images such that there are no false negatives. Rows 4-6 examine the detection performance in terms of FPR (False Positive Rate) and AUC (Are Under the Curve) within the pool of the mix of the disguises xd and their corresponding base images xb; Rows 7-9 performs the same examination on a larger pool of the mix of the disguises xd and a subset of 100 images randomly chosen from Image Net. We observe that for copyrighted symbols, as the copyrighted images xc (see Figure 8) and the base images xb (see Figure 9) share the same background and only differ on the symbol, the reconstruction loss of xd is low (mean of 0.0856) and falls into the range of that of clean images, thus having a higher FPR and a lower AUC than the other two tasks. With the presence of xc: In practical scenarios (e.g., examination in court), one may already have acquired the copyrighted images xc and want to find its (possible) corresponding disguise from a dataset. In such scenarios, one may directly use the detection method introduced in Section 4.4. We repeat the above experiment using our feature similarity search as a first step, which we recall, screen disguises using the criterion D2(E(xc), E(xd)) γ2. We acquire perfect AUC score and obtain no false positives across all tasks, which indicate that the second step (encoder-decoder examination) is not required for this specific task. D. Circumventing the detection? A stronger disguise may be generated to circumvent our detection method (e.g., the encoder-decoder examination). This can be possibly accomplished by modifying the disguise generation objection. 
Recall that our previous objective is:

$$ \operatorname*{argmin}_{x_d} \; \alpha D_1(x_b, x_d) + D_2(E(x_c), E(x_d)). $$

Here we wish to also minimize the input distance between xd and its autoencoder output, D1(D(E(xd)), xd); thus we can modify our objective to:

$$ \operatorname*{argmin}_{x_d} \; \alpha \big( D_1(x_b, x_d) + D_1(D(E(x_d)), x_d) \big) + D_2(E(x_c), E(x_d)). $$

We perform experiments on the copyrighted symbols and observe in Figure 17 that the additional term adds a constraint such that xd and D(E(xd)) are visually similar (row 2). However, performing textual inversion on these images also fails to reveal the copyrighted symbol. We conclude that this method might not be ideal, and additional study might be required.

E. Additional experiments on disguised style

In this section, we exhibit additional experimental results on style transfer by extending the choices of the (designated) copyrighted style xc and base images xb.

Disguises in the style of Claude Monet: Recall that in Section 4, we choose the base images xb to be images with a watercolor style. In the following experiments, we aim to change xb to images with the style of Claude Monet and still generate disguises xd that implicitly contain the style of Vincent Van Gogh. In Figure 18, we first create xc using style transfer (Huang and Belongie 2017): San Giorgio Maggiore at Dusk (1908-1912) by Claude Monet transferred to the style of The Starry Night by Vincent Van Gogh. In Figure 19, we create the disguise xd (row 1, column 2) with the
E. Additional experiments on disguised style

In this section, we present additional experimental results on style transfer by extending the choices of the (designated) copyrighted style xc and the base images xb.

Disguises in the style of Claude Monet: Recall that in Section 4, we choose base images xb with the watercolor style. In the following experiments, we instead choose xb to be images in the style of Claude Monet and still generate disguises xd that implicitly contain the style of Vincent Van Gogh. In Figure 18, we first create xc using style transfer (Huang and Belongie 2017), which transfers San Giorgio Maggiore at Dusk (1908-1912) by Claude Monet to the style of The Starry Night by Vincent Van Gogh. In Figure 19, we create the disguise xd (row 1, column 2) with the visual appearance of the base xb and show that the images generated by textual inversion after training on xd still contain the style of The Starry Night.

Figure 18: Creating the copyrighted image xc (third column) using AdaIN-based style transfer (Huang and Belongie 2017). xc is San Giorgio Maggiore at Dusk by Claude Monet transferred to the style of The Starry Night by Vincent Van Gogh. Panels: (1) San Giorgio Maggiore at Dusk; (2) The Starry Night; (3) xc: (1) with the style of (2).

Figure 19: Additional results on the disguised copyrighted style with textual inversion, where we generate a disguise xd that visually resembles the style of Claude Monet but contains the style of Vincent Van Gogh in the latent space. Rows: the disguise xd; images generated by textual inversion after training on xd.

Scraping the style of other artists: In the previous experiments, we choose the style of The Starry Night as the designated copyrighted style. Next, we show results for disguised style scraping of another artist, Cecily Brown10. In Figure 20, we first create xc using style transfer (Huang and Belongie 2017), which transfers Taj Mahal (adamkaz/Getty Images) to the style of the painting with serial number 56542 by Cecily Brown11. In Figure 21, we create the disguise xd (row 1, column 2) with the visual appearance of the base xb (row 1, column 1) and show that the images generated by textual inversion after training on xd contain the style of Cecily Brown. Finally, in Figure 22, we compare two different disguises xd and observe that images which look visually similar can contain drastically different styles in the feature space, which again demonstrates the threat of disguised copyright infringement.

10 https://en.wikipedia.org/wiki/Cecily_Brown
11 https://www.artnews.com/art-in-america/features/cecily-brown-56542/

Figure 20: Creating the copyrighted image xc (third column) using AdaIN-based style transfer (Huang and Belongie 2017). xc is Taj Mahal transferred to the style of Cecily Brown. Panels: (1) Taj Mahal (watercolor); (2) Cecily Brown-56542; (3) xc: (1) with the style of (2).

Figure 21: Additional results on the disguised copyrighted style with textual inversion, where we generate a disguise xd that visually resembles the watercolor style but contains the style of Cecily Brown in the latent space. Rows: the disguise xd; images generated by textual inversion after training on xd.

Figure 22: Comparison between different disguises xd we generated that contain drastically different styles in the feature space but are visually similar in the input space. Panels: Taj Mahal in watercolor style; Taj Mahal in The Starry Night style; Taj Mahal in Cecily Brown style; Taj Mahal in watercolor style; xd (The Starry Night style); xd (Cecily Brown style).

Choice of xb: In the previous experiments, we construct the disguises xd by choosing a base image xb that has a different style from xc but contains the same content. In Figure 23, we also explore the possibility of choosing a random base image from ImageNet (row 1, column 1). Our results show that although the style is partially learned by textual inversion from the disguise xd, it also loses the structural information of xc.

Figure 23: Additional results on the disguised copyrighted style with textual inversion, where we generate a disguise xd that visually resembles a random base image but is intended to contain the style of The Starry Night in the latent space. Rows: the disguise xd; images generated by textual inversion after training on xd.

F. Disguises in the Wild

We have previously revealed the concept hidden in the disguised data using textual inversion, which is itself an LDM-based application that can possibly generate copyrighted material. In this section, we further extend our evaluation to other LDM-based pipelines that change the parameters of the U-Net. Specifically, we consider DreamBooth (Ruiz et al. 2023), which fine-tunes a pre-trained LDM with only a few disguised images (similar to textual inversion), and mixed-training on a dataset that contains both clean data and a small portion of disguises.

F.1. Evaluation on DreamBooth

Main results: In Section 4, we reveal the latent (copyrighted) information contained in the acquired samples xd using textual inversion, where the diffusion model (i.e., the U-Net in the LDM) is not retrained. To further demonstrate the effectiveness of the disguises xd and examine disguised copyright infringement in other scenarios, we also evaluate on DreamBooth (Ruiz et al. 2023), an LDM-based method that fine-tunes on a small set of images for subject-driven generation. Specifically, we choose five images from the DreamBooth dataset12 as the (designated) copyrighted images xc and generate their corresponding disguises xd using Algorithm 1, choosing noisy versions of xc as the base images xb, training for 10000 epochs, and setting the weight parameter α = 1000.
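The base construction for this experiment can be sketched as follows. The generate_disguise routine is a hypothetical stand-in for Algorithm 1 (reusing the encode, d1, and d2 helpers from the earlier sketches, with an Adam optimizer and learning rate as illustrative choices), and the noise level sigma is an illustrative assumption, since only "noisy versions of xc" is specified above.

```python
import torch

# Hypothetical stand-in for Algorithm 1: minimize
#   alpha * D1(x_b, x_d) + D2(E(x_c), E(x_d))
# over x_d, reusing encode, d1, and d2 from the sketches above.
def generate_disguise(x_b, x_c, alpha=1000.0, steps=10000, lr=0.01):
    z_c = encode(x_c).detach()
    x_d = x_b.clone().requires_grad_(True)
    opt = torch.optim.Adam([x_d], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = alpha * d1(x_b, x_d) + d2(z_c, encode(x_d))
        loss.backward()
        opt.step()
        with torch.no_grad():
            x_d.clamp_(0, 1)
    return x_d.detach()

def disguise_dreambooth_images(copyrighted_images, sigma=0.1):
    # For each copyrighted image x_c, use a noisy copy as the base x_b
    # (sigma is an illustrative noise level) and run the disguise generation
    # with the settings reported above (alpha = 1000, 10000 epochs).
    disguises = []
    for x_c in copyrighted_images:  # tensors in [0, 1], shape (1, 3, H, W)
        x_b = (x_c + sigma * torch.randn_like(x_c)).clamp(0, 1)
        disguises.append(generate_disguise(x_b, x_c, alpha=1000.0, steps=10000))
    return disguises
```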
After generating the disguises xd, we apply DreamBooth for fine-tuning (on xd only) and inference, following the general recipe13. In Figure 24, we show the copyrighted images xc in the first row, their corresponding disguises xd in the second row, and images generated by DreamBooth after fine-tuning on the above xd in the third row. Our results qualitatively confirm that the copyrighted information in xc can be reproduced by fine-tuning on xd with DreamBooth.

12 https://github.com/google/dreambooth
13 https://github.com/huggingface/diffusers/tree/main/examples/dreambooth

Figure 24: Disguised copyright infringement with DreamBooth. The first row: the designated copyrighted images xc from the DreamBooth dataset; the second row: the corresponding disguises xd generated with our Algorithm 1; the third row: images generated by DreamBooth after training on the above disguises xd.

Evaluation on model utility: To further examine the practicality of the disguises, we verify that the model utility is not degraded after training (or fine-tuning) on xd. To this end, we perform the same text-to-image generation task with the prompt a photo of an astronaut riding a horse on Mars before and after training on xd with the LDM14. Specifically, the before model is Stable Diffusion v1-4, and the after model is obtained by fine-tuning the before model on the dog disguise (second row, first column in Figure 24). We show our results in Figure 25 for 6 inferences and observe that the generation ability of the model is not compromised. We will show in the next section that this observation extends to LDM training.

14 We perform inference following the recipe in https://huggingface.co/CompVis/stable-diffusion-v1-4

Figure 25: Utility validation: images generated by the LDM using the prompt a photo of an astronaut riding a horse on Mars before/after fine-tuning with our disguise in Figure 24 (row 2, column 1) using DreamBooth. Rows: images generated before fine-tuning; images generated after fine-tuning.
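The before/after utility check can be sketched as follows, a minimal sketch assuming the diffusers StableDiffusionPipeline; "path/to/dreambooth-finetuned-model" is a hypothetical local directory produced by the DreamBooth fine-tuning run, not a path from our experiments.

```python
import torch
from diffusers import StableDiffusionPipeline

prompt = "a photo of an astronaut riding a horse on Mars"

# "Before" model: the public Stable Diffusion v1-4 checkpoint.
before = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# "After" model: hypothetical output directory of the DreamBooth run on the
# dog disguise (Figure 24, row 2, column 1).
after = StableDiffusionPipeline.from_pretrained(
    "path/to/dreambooth-finetuned-model", torch_dtype=torch.float16
).to("cuda")

# Generate several samples from each model and compare them visually.
for i in range(6):
    before(prompt).images[0].save(f"before_{i}.png")
    after(prompt).images[0].save(f"after_{i}.png")
```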
F.2. Mixed-training on CelebA-HQ

In the previous experiments, we performed textual inversion and DreamBooth fine-tuning where the disguises xd constitute the entire training set. In this section, we extend our experiments to a more challenging setting where the disguises xd only constitute a small portion of the training set and are overwhelmed by the clean samples.

Specifically, we randomly take a subset of 1000 images from CelebA-HQ/25615 as clean training samples, take another subset of 5 samples (outside of the clean set), and add a (designated) copyrighted symbol A to all of them to construct xc (first row in Figure 26). Next, we construct the corresponding disguises xd (second row in Figure 26), which visually remove the symbol, using Algorithm 1, and duplicate each disguise 20 times to obtain a disguise set of 100 images. Overall, we acquire a training set of 1100 images where the disguise fraction is 1/11. We initialize the model with the official CompVis checkpoint16 to preserve reasonable generation performance and train the model on the mixed dataset for 200 DDIM steps. In Figure 26 (rows 3-5), we show the images generated by the model after training (20 inferences) and observe that 3 out of 20 images contain the copyrighted symbol, which demonstrates the effectiveness of the disguises in mixed-training scenarios.

Figure 26: Disguised copyright infringement during mixed-training for unconditional generation on CelebA-HQ. The first row: the designated copyrighted images xc with a red symbol A; the second row: the corresponding disguises xd; the third to fifth rows: images generated by the LDM after training on 1000 clean images and 100 disguises (each disguise above contributes 20 copies). Images highlighted with a red box indicate reproductions of the copyrighted symbol.

15 https://www.tensorflow.org/datasets/catalog/celeb_a_hq
16 https://huggingface.co/CompVis/ldm-celebahq-256
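A minimal sketch of the mixed-dataset construction described above is given below. The load_celeba_hq and stamp_symbol routines are hypothetical placeholders, generate_disguise refers to the Algorithm 1 stand-in sketched in Appendix F.1, and the choice of the symbol-free originals as the base images xb is our assumption for illustration.

```python
import random

def build_mixed_training_set(load_celeba_hq, stamp_symbol, generate_disguise,
                             n_clean=1000, n_copyrighted=5, n_copies=20):
    """Assemble 1000 clean CelebA-HQ images plus 5 disguises duplicated 20
    times each, giving 1100 images with a disguise fraction of 1/11."""
    images = load_celeba_hq()                      # list of (1, 3, 256, 256) tensors in [0, 1]
    random.shuffle(images)

    clean = images[:n_clean]                       # clean training samples
    originals = images[n_clean:n_clean + n_copyrighted]  # outside the clean set

    disguises = []
    for x in originals:
        x_c = stamp_symbol(x)                      # copyrighted target carrying the symbol
        x_b = x                                    # assumed base: the symbol-free original
        x_d = generate_disguise(x_b, x_c)          # Algorithm 1 sketch from Appendix F.1
        disguises.extend([x_d] * n_copies)         # each disguise contributes 20 copies

    mixed = clean + disguises                      # 1000 + 100 = 1100 images
    assert len(disguises) / len(mixed) == 1 / 11   # disguise fraction of 1/11
    return mixed
```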